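Abstract
DNA methylation, a biochemical process that occurs mainly on cytosine, is ubiquitous in the tissues and cells of diverse organisms. By preventing transcription proteins from binding to genes, it can affect gene transcription and thereby repress gene expression. DNA methylation therefore plays an important role in development and in disease, and is regarded as an important epigenetic mark. Identifying methylation sites is thus of great significance both for basic biological research and for drug development. Researchers have previously tried various methods to identify DNA methylation sites, but these are either time-consuming and labor-intensive or insufficiently accurate. With the Human Genome Project and the adoption of many high-throughput detection technologies, DNA sequence data have grown explosively. A method that can identify DNA methylation sites efficiently and accurately is urgently needed.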

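To address these problems, this thesis uses a support vector machine (SVM) as the prediction engine on a valid benchmark dataset. Two feature vectors, position-specific nucleotide/dinucleotide propensity (PSNP/PSDP), are extracted with statistical methods and combined with the conventional nucleotide composition (NC) feature to encode DNA sequences. Classifier performance is evaluated on the benchmark dataset with the rigorous jackknife test. Compared with the best existing prediction results, the Matthews correlation coefficient (MCC) improves by 23.1%, which shows that the overall prediction accuracy for DNA methylation sites is substantially improved.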
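The abstract does not spell out how PSNP and PSDP are computed. One common formulation of position-specific propensity features is assumed here and is not taken from this thesis. Each position of a fixed-length window is scored by the difference between the frequency of the observed nucleotide (or dinucleotide) at that position in the positive training samples and its frequency in the negative training samples. The minimal sketch below builds such tables and concatenates them with the 4-dimensional NC vector. `propensity_table`, `encode` and the toy sequences are hypothetical illustrations.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGT"

def propensity_table(pos_seqs: List[str], neg_seqs: List[str], k: int) -> List[Dict[str, float]]:
    """For each start position i: z[i][kmer] = freq in positives - freq in negatives.

    k=1 gives a PSNP table, k=2 a PSDP table (assumed formulation).
    All sequences are assumed to have the same length (a fixed window around the site).
    """
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    length = len(pos_seqs[0])
    table = []
    for i in range(length - k + 1):
        pos_counts = Counter(s[i:i + k] for s in pos_seqs)
        neg_counts = Counter(s[i:i + k] for s in neg_seqs)
        table.append({
            m: pos_counts[m] / len(pos_seqs) - neg_counts[m] / len(neg_seqs)
            for m in kmers
        })
    return table

def encode(seq: str, psnp, psdp) -> List[float]:
    """Concatenate PSNP (L values), PSDP (L-1 values) and NC (4 values)."""
    f_psnp = [psnp[i].get(seq[i], 0.0) for i in range(len(seq))]
    f_psdp = [psdp[i].get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]
    f_nc = [seq.count(n) / len(seq) for n in NUCLEOTIDES]  # A, C, G, T fractions
    return f_psnp + f_psdp + f_nc

# Toy training windows (hypothetical data, not the thesis's benchmark set)
positives = ["ACGTA", "ACGTC"]
negatives = ["TCGTA", "GCGAA"]
psnp = propensity_table(positives, negatives, k=1)
psdp = propensity_table(positives, negatives, k=2)
print(encode("ACGTA", psnp, psdp))
# -> [1.0, 0.0, 0.0, 0.5, -0.5, 1.0, 0.0, 0.5, 0.0, 0.4, 0.2, 0.2, 0.2]
```

A window of length L therefore yields an L + (L - 1) + 4 dimensional vector, which is what an SVM would then be trained on.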

Keywords: DNA methylation; position-specific nucleotide propensity; support vector machine; jackknife test

Abstract
DNA methylation, a biochemical process that occurs predominantly on cytosine, is widespread across organisms, tissues and cells. By impeding the binding of transcription proteins to genes, DNA methylation can affect transcription and thereby repress gene expression. It therefore plays an important role in development and disease through epigenetic gene regulation, and is regarded as an important epigenetic mark. Identifying methylation sites is consequently of great significance for both basic research and drug development. Although a number of methods have been developed for this purpose, they are either time-consuming or of limited accuracy. With the Human Genome Project and the widespread application of high-throughput detection technologies, DNA sequence data have grown explosively, creating an urgent need for a method that can identify DNA methylation sites efficiently and accurately.

To address these problems, this dissertation selects a valid benchmark dataset and uses a support vector machine (SVM) as the prediction engine. Feature vectors of the DNA sequences are extracted with statistical methods. Two new features, position-specific nucleotide/dinucleotide propensity (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to encode DNA sequences. Classifier performance is evaluated on the benchmark dataset with the rigorous jackknife test. Compared with the best previously reported results, the Matthews correlation coefficient (MCC) improves by 23.1%, which shows that the prediction accuracy for DNA methylation sites is significantly improved.
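The jackknife test is leave-one-out cross-validation. Each sample is predicted by a model trained on all remaining samples. The predictions are then scored with the Matthews correlation coefficient, MCC = (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows that loop in plain Python. To stay dependency-free, it uses a nearest-centroid classifier as a stand-in for the SVM. `jackknife_predict`, `mcc` and `nearest_centroid` are hypothetical helpers. In practice one might pass a function wrapping scikit-learn's `SVC` instead.

```python
import math
from typing import Callable, List

def jackknife_predict(X: List[List[float]], y: List[int],
                      fit_predict: Callable[[List[List[float]], List[int], List[float]], int]) -> List[int]:
    """Leave-one-out (jackknife): predict each sample with a model trained on all the others."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(fit_predict(train_X, train_y, X[i]))
    return predictions

def mcc(y_true: List[int], y_pred: List[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = methylated, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        members = [v for v, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy feature vectors (hypothetical), three per class
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
preds = jackknife_predict(X, y, nearest_centroid)
print(preds, mcc(y, preds))  # -> [0, 0, 0, 1, 1, 1] 1.0
```

`fit_predict` receives the raw training data for each round, so it is also the natural place to rebuild the PSNP/PSDP tables without the held-out sample. Computing those tables once on the full dataset would leak the held-out label into its own features.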

Keywords: DNA methylation; PSNP; SVM; Jackknife test

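Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Current Research in China and Abroad

1.3 Main Work of This Thesis

Chapter 2 Overview of Methylation Site Identification

2.1 Methylation Site Identification Workflow

2.2 Benchmark Dataset

2.3 Feature Extraction

2.3.1 Statistical-Feature-Based Methods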
