English paper: SVM-based prediction of protein–DNA interactions (2)

iii. INTRODUCTION

DNA-binding proteins are frequently involved in regulating the expression of genes. Some of them bind to specific DNA motifs, whereas histones, which form part of the packaged structure of DNA, bind to it less specifically. Proteins that repair DNA, such as uracil-DNA glycosylase, also interact closely with it. Proteins usually bind to DNA in the major groove, although there are exceptions. Past reports have shown that 2%–3% of a prokaryotic genome and 6%–7% of a eukaryotic genome encode DNA-binding proteins [3]. The interactions can be formed by different domains, such as the zinc finger or the helix-turn-helix. These interactions are involved in a variety of biological processes, including DNA replication, DNA repair, viral infection, DNA packing, and DNA modification [4].

Understanding the molecular mechanisms by which proteins called transcription factors (TFs) recognize their specific binding sites encoded in genomic DNA is one of the main, long-standing issues in molecular biophysics. Surprisingly, some experiments have demonstrated that the DNA surrounding specific TF binding sites greatly influences binding specificity. We expect that our results will contribute to the understanding of the molecular and biophysical principles of transcriptional regulation, and will improve the ability to predict how DNA sequences influence gene expression programs in the cells of living organisms.

The use of computational methods to predict DNA–protein interactions has many advantages. It is less tedious and less time-consuming than carrying out the physical experiments. It is also financially beneficial: fewer samples have to be obtained for experimentation, so much of the money spent on buying and maintaining materials is saved.

The practical applications of DNA–protein interaction prediction are vast. Drugs can target proteins that bind to DNA, and molecules can be designed to bind to the double helix and interfere with the interactions between DNA and proteins. One example is the use of telomerase inhibitors, an area related to cancer treatment. Telomeres are located at the ends of chromosomes; they protect the ends from damage and help to ensure that DNA replication occurs as intended. In somatic cells, the life span has an 'end date'. A tumour cell, on the other hand, keeps its telomere ends stable, so that it can continue to survive. This presents a predicament for treatment, which telomerase inhibitors are intended to address.

Several data sets have been created for DNA-binding site identification. DBP374 was the largest data set used, which makes it well suited to initiating new studies. New research on DNA–protein interactions can employ a data set that has already been used in the literature, which allows direct comparison with previous studies. Additionally, two specific databases are devoted to protein–DNA interactions using available information from the PDB. The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological molecules such as proteins and nucleic acids. The data is usually obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-electron microscopy. It is submitted by biologists and biochemists from all over the globe and can be accessed freely on the Internet through the websites of the PDB's affiliate organisations. The PDB is a very important tool in biological research. Many scientific journals now require scientists to submit their structure data to the PDB. Many other databases use protein structures deposited in the PDB; for example, SCOP and CATH do so.

In a protein–DNA complex, an amino acid residue in the protein is defined as a binding site if the distance between any atom of this residue and any atom of the DNA molecule is less than a specific cut-off value. Previous studies on DNA–protein binding site prediction have used various definitions of DNA-binding sites [6]. Kuznetsov stated that a cut-off distance of 4.5 Å gave the best separation between binding and nonbinding residues when using evolutionary and structural information to predict binding sites, while Si et al. [6] applied cut-off distances of 3.5, 4.0, 4.5, 5.0, 5.5, and 6.0 Å, together with the solvent accessible surface area (ASA), to two data sets and chose 3.5 Å as the most appropriate definition.

DNA-binding residues tend to be exposed to the solvent so that they can make contact with the DNA structure, which makes relative solvent accessibility a useful predictive feature. Some studies have focused only on surface residues in the prediction [12]. Like the secondary structure, the relative ASA can be predicted from the protein sequence or calculated from the protein structure using specific software. The relative ASA of each residue in a protein was calculated in the absence of the DNA molecule (non-complexed), where the non-complexed structure was taken to be the protein structure extracted alone from the PDB file. Surface accessible residues were defined as residues with a relative ASA of >5%.

The sequence similarity among proteins in the data set is important to the prediction outcome. Current methods recommend that the similarity level be kept below 30%–35%. A single representative from each protein set was identified, and sub-sequences of other proteins in the data set were eliminated.
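
The binding-site definition and the surface filter described above can be illustrated with a short sketch. It is not the pipeline used in the studies cited; the function names, the toy coordinates, and the data layout (each residue as a list of atom coordinates in Å) are assumptions made for illustration only. The default cut-off of 3.5 Å and the 5% relative ASA threshold are the values quoted in the text.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def label_binding_residues(residues, dna_atoms, cutoff=3.5):
    """Label each residue True if ANY of its atoms lies closer than
    `cutoff` (in Å) to ANY DNA atom, and False otherwise.

    `residues` is a list of residues, each a list of (x, y, z) atom
    coordinates; `dna_atoms` is a list of (x, y, z) coordinates.
    Both layouts are hypothetical, chosen only for this example.
    """
    return [
        any(dist(atom, dna_atom) < cutoff
            for atom in residue_atoms
            for dna_atom in dna_atoms)
        for residue_atoms in residues
    ]


def surface_mask(relative_asa, threshold=5.0):
    """Mark residues whose relative ASA (in percent) exceeds `threshold`."""
    return [asa > threshold for asa in relative_asa]


# Toy coordinates (not taken from any real PDB entry).
residues = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # residue 0: nearest DNA atom is 2.0 Å away
    [(10.0, 0.0, 0.0)],                  # residue 1: nearest DNA atom is 7.0 Å away
]
dna_atoms = [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

# The label of a residue depends on the chosen cut-off.
for cutoff in (3.5, 7.5):
    print(cutoff, label_binding_residues(residues, dna_atoms, cutoff))
```

Because the label depends on the cut-off, data sets built with different definitions (for example 3.5 Å versus 4.5 Å) are not directly comparable. In practice the coordinates would be read from a PDB file and the relative ASA would come from dedicated software, as the text notes.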
