Meiosis and Genetic Recombination: English Literature and Chinese Translation (2)

This study was initiated in an attempt to address these shortcomings by developing a more powerful predictor for identifying DNA recombination spots. The proposed predictor is called iRSpot-EL, where ‘i’ stands for ‘identify’, ‘RSpot’ for ‘recombination spot’, and ‘EL’ for ‘ensemble learning’.

Developing a new predictor usually serves two purposes: to stimulate theoretical studies in the relevant areas, and to make it easier for experimental scientists to obtain the information they need. To realize these aims, the rest of this article follows five guidelines (Chou, 2011): (i) benchmark dataset, (ii) sample representation, (iii) operation algorithm, (iv) validation, and (v) web-server.

2 Materials and methods

2.1 Benchmark dataset

A reliable and stringent benchmark dataset is pivotal to developing an accurate prediction method. In the literature, a benchmark dataset usually consists of a training dataset and a testing dataset: the former is used to train the proposed model, and the latter to test it. However, as pointed out in a comprehensive review (Chou and Shen, 2007b), there is no need to split a benchmark dataset into training and testing subsets when a prediction method is validated by the jackknife test or subsampling (K-fold) cross-validation, because the outcome is then effectively a combination of many different independent dataset tests. To facilitate comparison of the proposed predictor with existing ones, this study adopted the widely used benchmark dataset (Chen et al., 2013; Jiang et al., 2007; Liu et al., 2012; Qiu et al., 2014), which can be formulated as

$$S = S^{+} \cup S^{-} \qquad (1)$$

where $S$ is the benchmark dataset; $S^{+}$ is the positive subset, containing 490 DNA segments (hotspot samples) whose relative hybridization ratios (Gerton et al., 2000) are higher than 1.5 (Jiang et al., 2007); $S^{-}$ is the negative subset, containing 591 DNA segments (coldspot samples) whose relative hybridization ratios (Gerton et al., 2000) are lower than 0.82 (Jiang et al., 2007); and $\cup$ denotes the union in set theory. To reduce redundancy and homology bias, the CD-HIT software (Li et al., 2001) was used to remove sequences with more than 75% similarity, leaving 478 hotspots (positive samples) and 572 coldspots (negative samples). For readers’ convenience, these 478 hotspot samples and 572 coldspot samples, together with their detailed sequences, are given in Supplementary Materials S1.
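The dataset construction and validation logic described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function names `build_benchmark` and `k_fold_splits`, the toy segment IDs and ratio values, and the random seed are all hypothetical. The sketch labels segments with the two thresholds quoted above, forms $S = S^{+} \cup S^{-}$ as a labeled list, and generates K-fold splits in which every sample is tested exactly once; setting K equal to the number of samples gives the jackknife test. The CD-HIT redundancy-removal step is not reproduced here.

```python
import random
from typing import Dict, List, Tuple

# Thresholds on the relative hybridization ratio quoted in the text (Jiang et al., 2007)
HOT_THRESHOLD = 1.5    # strictly above -> hotspot (positive subset S+)
COLD_THRESHOLD = 0.82  # strictly below -> coldspot (negative subset S-)


def build_benchmark(ratios: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (S+, S-) as sorted segment IDs; segments between the thresholds are dropped."""
    positives = sorted(seg for seg, r in ratios.items() if r > HOT_THRESHOLD)
    negatives = sorted(seg for seg, r in ratios.items() if r < COLD_THRESHOLD)
    return positives, negatives


def k_fold_splits(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Partition indices 0..n_samples-1 into k disjoint test folds.

    Each index appears in exactly one test fold; k == n_samples is the
    jackknife (leave-one-out) test.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    splits = []
    for i, test in enumerate(folds):
        # training set = every index outside the current test fold
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# Toy usage with hypothetical segment IDs and ratios (not data from the paper)
ratios = {"seg1": 2.3, "seg2": 0.5, "seg3": 1.0, "seg4": 1.7, "seg5": 0.2}
pos, neg = build_benchmark(ratios)  # pos == ['seg1', 'seg4'], neg == ['seg2', 'seg5']
benchmark = [(s, 1) for s in pos] + [(s, 0) for s in neg]  # S = S+ ∪ S-, label 1 = hotspot
jackknife = k_fold_splits(len(benchmark), k=len(benchmark))  # 4 rounds, one test sample each
```

Segments whose ratios fall between 0.82 and 1.5 (such as the hypothetical `seg3`) belong to neither subset, so the sketch simply drops them. Because every sample is tested exactly once across the folds, the pooled results cover the whole benchmark without a separate held-out test set.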

2.2 Pseudo k-tuple nucleotide composition

With the avalanche of biological sequences emerging in the post-genomic age, one of the most challenging problems in computational biology is how to represent a biological sequence as a vector while still preserving its key patterns or characteristics. This matters because nearly all existing machine-learning algorithms were developed to handle vectors rather than sequences, as elaborated in a recent review (Chou, 2015). Unfortunately, a vector defined in a discrete model (e.g., a simple composition vector) may lose all sequence-order information and sequence-pattern characteristics. To overcome this problem for protein/peptide sequences, the pseudo amino acid composition (PseAAC) (Chou, 2001) was introduced; it has become an important tool (Cao et al., 2013; Du et al., 2012, 2014) widely used in nearly all areas of computational proteomics [see the long list of references cited in Chou (2011)]. Encouraged by the success of PseAAC, the pseudo k-tuple nucleotide composition (PseKNC) (Chen et al., 2014, 2015b; Liu et al., 2015a, 2016b) was introduced to formulate DNA/RNA sequences, and it has been increasingly used in computational genetics and genomics (see, e.g., a recent review (Chen et al., 2015a) and the long list of references cited therein). Recently, a web-server called ‘Pse-in-One’ was developed for generating various modes of pseudo components for DNA/RNA and protein/peptide sequences (Liu et al., 2015b).
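The idea behind PseKNC can be illustrated with a small Python sketch. It assumes one commonly described form of PseKNC: the first $4^k$ components are normalized k-tuple frequencies (the discrete, order-free part), and the remaining λ components are sequence-order correlation factors θ_j, each the average squared difference of dinucleotide property values j positions apart, scaled by a weight w. The function names, the default w = 0.5, and the use of the G/C count of each dinucleotide as the only "property" are illustrative assumptions; published PseKNC variants typically use standardized physicochemical property values and tuned λ and w, which are not reproduced here.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def kmer_composition(seq: str, k: int) -> List[float]:
    """Normalized frequencies of all 4**k k-tuples in lexicographic order (order-free part)."""
    if not set(seq) <= set(NUCLEOTIDES):
        raise ValueError("sequence must contain only uppercase A, C, G, T")
    total = len(seq) - k + 1  # number of overlapping k-tuples
    if total < 1:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def correlation_factors(seq: str, props: List[Dict[str, float]], lam: int) -> List[float]:
    """theta_j for j = 1..lam: mean squared property difference between dinucleotides j apart."""
    n = len(seq)
    if not 1 <= lam <= n - 2:
        raise ValueError("lam must satisfy 1 <= lam <= len(seq) - 2")
    thetas = []
    for j in range(1, lam + 1):
        n_pairs = n - j - 1  # dinucleotide pairs (i, i + j) that fit in the sequence
        total = 0.0
        for i in range(n_pairs):
            a, b = seq[i:i + 2], seq[i + j:i + j + 2]
            # average the squared difference over all supplied properties
            total += sum((p[a] - p[b]) ** 2 for p in props) / len(props)
        thetas.append(total / n_pairs)
    return thetas


def pse_knc(seq: str, k: int, props: List[Dict[str, float]], lam: int, w: float = 0.5) -> List[float]:
    """4**k k-tuple components followed by lam weighted correlation components; sums to 1."""
    freqs = kmer_composition(seq, k)
    thetas = correlation_factors(seq, props, lam)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Illustrative "property": number of G/C bases in each dinucleotide (0, 1 or 2)
GC_COUNT = {a + b: float((a in "GC") + (b in "GC")) for a in NUCLEOTIDES for b in NUCLEOTIDES}

vec = pse_knc("ACGTACGGTCA", k=2, props=[GC_COUNT], lam=2)  # 16 + 2 = 18 components
```

Keeping only the output of `kmer_composition` corresponds to the discrete model described in the paragraph, which discards sequence order; the extra θ components are what carry the "pseudo" sequence-order information into a fixed-length vector.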
