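Abstract Antifreeze proteins (AFPs) protect many organisms living in extremely cold conditions against low temperatures. They keep cells and body fluids from freezing and have high application value in biotechnology. To date, antifreeze proteins have been found in vertebrates, invertebrates, plants, bacteria, and other organisms. With the arrival of the post-genomic era, a growing number of unannotated protein sequences is accumulating, and detecting antifreeze proteins from sequence information alone has become an important problem. Antifreeze proteins are diverse at both the sequence and the structure level, so identifying them by sequence similarity alone often fails. Building on previous work, we introduce a new feature based on genetic (evolutionary) information and fuse it with the PseAAC feature. On an independent test set the method reaches an accuracy of 88.05%, surpassing AFP-PseAAC, the state-of-the-art prediction tool, and its Youden's Index is also better than that of other prediction tools. The experimental results indicate that introducing features based on genetic information markedly improves the performance of antifreeze protein prediction.
Keywords antifreeze protein prediction; feature representation; support vector machine; random forest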
Title Sequence-based prediction of AFPs
Abstract Antifreeze proteins (AFPs) protect organisms living at low temperatures from the cold. They prevent cells and body fluids from freezing and have a wide range of biotechnological applications. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, etc. With the enormous amount of genomic data available today, a rapid, specific and highly precise automated approach is desirable for the identification and annotation of AFPs. Although AFPs share a common function, they show a high degree of diversity in sequence and structure. Therefore, search methods based on sequence similarity often fail to identify AFPs in sequence databases. A new descriptor named MEDP, based on evolutionary information, is introduced and fused with the PseAAC feature. Our method achieves an accuracy of 88.05% on an independent dataset, which is higher than the state of the art. This high accuracy suggests that evolutionary information is effective in improving the performance of AFP prediction.
Keywords AFPs prediction; feature representation; SVM; random forest
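The abstract reports performance as accuracy and Youden's Index. As a reminder of how these two measures relate to a binary confusion matrix, here is a minimal Python sketch. The function names and the counts in the usage lines are hypothetical illustrations, not the evaluation code or the results of this work.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all test proteins that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def youden_index(tp: int, tn: int, fp: int, fn: int) -> float:
    """Youden's Index J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # share of true AFPs that were recovered
    specificity = tn / (tn + fp)  # share of non-AFPs that were rejected
    return sensitivity + specificity - 1.0


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, tn, fp, fn = 80, 90, 10, 20
acc = accuracy(tp, tn, fp, fn)
j = youden_index(tp, tn, fp, fn)
print(f"accuracy = {acc:.4f}, Youden's Index = {j:.4f}")
```

Youden's Index weights sensitivity and specificity equally, so on a test set with far more non-AFPs than AFPs it is less easily inflated by the majority class than accuracy is.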