This study was initiated in an attempt to address these shortcomings by developing a more powerful predictor for identifying DNA recombination spots. The proposed predictor is called iRSpot-EL, where ‘i’ stands for ‘identify’, ‘RSpot’ for ‘recombination spot’ and ‘EL’ for ‘ensemble learning’.

Developing a new predictor usually serves two purposes. One is to stimulate theoretical studies in the relevant areas, and the other is to make it easier for experimental scientists to get the information they need. To realize these, the rest of this article is presented according to the following five guidelines (Chou, 2011): (i) benchmark dataset, (ii) sample representation, (iii) operation algorithm, (iv) validation, and (v) web-server.

2 Materials and methods

2.1 Benchmark dataset

A reliable and stringent benchmark is pivotal to the development of an accurate prediction method. In the literature, a benchmark dataset usually consists of a training dataset and a testing dataset: the former is used to train a proposed model, and the latter to test it. As pointed out in a comprehensive review (Chou and Shen, 2007b), however, there is no need to separate a benchmark dataset into a training dataset and a testing dataset when a prediction method is validated by the jackknife or subsampling (K-fold) cross-validation, because the outcome thus obtained is actually a combination of many different independent dataset tests. In this study, to facilitate comparison of the proposed predictor with existing ones, we adopted the widely used benchmark dataset (Chen et al., 2013; Jiang et al., 2007; Liu et al., 2012; Qiu et al., 2014), which can be formulated as

$$S = S^{+} \cup S^{-} \tag{1}$$

where $S$ is the benchmark dataset; $S^{+}$ is the positive subset, containing 490 DNA segments (hotspot samples) with relative hybridization ratios (Gerton et al., 2000) higher than 1.5 (Jiang et al., 2007); $S^{-}$ is the negative subset, containing 591 DNA segments (coldspot samples) with relative hybridization ratios (Gerton et al., 2000) lower than 0.82 (Jiang et al., 2007); and the symbol $\cup$ denotes union in set theory. To reduce redundancy and homology bias, the CD-HIT software (Li et al., 2001) was used to remove sequences whose similarity is >75%. Finally, 478 hotspots (positive samples) and 572 coldspots (negative samples) were obtained. For readers’ convenience, the 478 hotspot samples and 572 coldspot samples, together with their detailed sequences, are given in Supplementary Materials S1.
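The construction of the two subsets and the cross-validation argument can be illustrated with a minimal Python sketch. It assumes the relative hybridization ratios are available as a mapping from segment identifier to ratio. The function names (`build_benchmark`, `k_fold_indices`), the segment identifiers and the toy ratios are hypothetical and are not taken from the benchmark dataset. The CD-HIT redundancy-removal step is not reproduced here.

```python
from typing import Dict, List, Tuple

HOT_THRESHOLD = 1.5    # ratio above this -> hotspot (threshold from Jiang et al., 2007)
COLD_THRESHOLD = 0.82  # ratio below this -> coldspot (threshold from Jiang et al., 2007)


def build_benchmark(ratios: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split segment IDs into positive (hotspot) and negative (coldspot) subsets.

    Segments whose ratio lies between the two thresholds belong to neither subset.
    """
    positive = sorted(seg for seg, r in ratios.items() if r > HOT_THRESHOLD)
    negative = sorted(seg for seg, r in ratios.items() if r < COLD_THRESHOLD)
    return positive, negative


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Partition indices 0..n_samples-1 into k test folds (round-robin assignment).

    With k == n_samples each fold holds exactly one sample, i.e. the jackknife.
    """
    return [list(range(fold, n_samples, k)) for fold in range(k)]


# Hypothetical ratios for illustration only; not data from the paper.
toy_ratios = {"seg_a": 2.3, "seg_b": 0.5, "seg_c": 1.1, "seg_d": 1.7, "seg_e": 0.8}
s_pos, s_neg = build_benchmark(toy_ratios)
benchmark = s_pos + s_neg  # union of the two disjoint subsets
folds = k_fold_indices(len(benchmark), k=2)
```

Every index appears in exactly one test fold, so each sample is tested once by a model trained on the remaining samples. In this sense the cross-validation outcome combines many independent dataset tests without a fixed training/testing split.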

2.2 Pseudo k-tuple nucleotide composition

With the avalanche of biological sequences emerging in the post-genomic age, one of the most challenging problems in computational biology is how to formulate a biological sequence as a vector while still keeping its key pattern or characteristics. This is because nearly all existing machine-learning algorithms were developed to handle vectors rather than sequence samples, as elaborated in a recent review (Chou, 2015). Unfortunately, a vector defined in a discrete model may completely lose all the sequence-order information or sequence pattern characteristics. To overcome this problem for protein/peptide sequences, the pseudo amino acid composition (PseAAC) (Chou, 2001) was introduced, and it has become an important tool (Cao et al., 2013; Du et al., 2012, 2014) widely used in nearly all areas of computational proteomics [see the long list of references cited in Chou (2011)]. Encouraged by the successes of PseAAC, the pseudo k-tuple nucleotide composition (PseKNC) (Chen et al., 2014, 2015b; Liu et al., 2015a, 2016b) was introduced to formulate DNA/RNA sequences, and it has been increasingly used in computational genetics and genomics [see, e.g. a recent review (Chen et al., 2015a) and the long list of references cited therein]. Recently, a web-server called ‘Pse-in-One’ was developed for generating various modes of pseudo components for DNA/RNA and protein/peptide sequences (Liu et al., 2015b).
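To make the idea of a discrete vector model concrete, the following Python sketch computes the plain k-tuple nucleotide composition of a DNA sequence, i.e. the normalized frequency of each of the 4^k possible k-tuples. This is only the simple discrete representation, not PseKNC itself: the pseudo components that capture sequence-order information are not included. The function name `k_tuple_composition` and the example sequence are hypothetical.

```python
from itertools import product
from typing import List


def k_tuple_composition(seq: str, k: int = 2) -> List[float]:
    """Return normalized frequencies of all 4**k k-tuples, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = 0
    # Slide a window of width k along the sequence, one nucleotide at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-ACGT symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical example sequence; its dinucleotide windows are AC, CG, GT, TA, AC.
vector = k_tuple_composition("ACGTAC", k=2)
```

The result is a fixed-length vector (16 components for k = 2) regardless of sequence length, which is what allows sequences to be fed to vector-based learning algorithms. Only local order within each k-tuple is recorded, so sequences with different long-range arrangements can map to similar vectors.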
