Thesis/Dissertation 92522010 Detailed Record




Author 李見信 (Jian-Sin Li)   Graduating department Computer Science and Information Engineering
Thesis title 利用決策樹以蛋白質序列及結構預測熱穩定性
(Prediction of protein thermostability using Decision Tree base on sequence and structure features)
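  1. The access permission for this electronic thesis is: approved for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.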

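Abstract (Chinese) Information about protein thermostability is closely tied to the production of biochemical materials. Recent studies of protein thermostability mostly compare homologous proteins to identify the features that matter for thermal stability. Among these, the distribution of specific amino acids in the sequence, particular sequence patterns, and structural properties such as hydrogen bonds, disulfide bonds, and salt bridges are all considered important for thermostability. The aim of this study is to integrate these features into a system that can predict protein thermostability.
This study uses data from the prokaryote optimal growth temperature database (PGTdb) and from PDB, which allows a large number of proteins to be included. Important features are first extracted one by one, and a feature selection algorithm then keeps the features that have a higher linear correlation with optimal growth temperature. Machine learning methods are then applied to build a prediction model with stable performance. In the process we also found that (E+F+M+R)/residue and charged/noncharged are linearly correlated with protein thermostability. Finally, we built two prediction systems. The first needs only the protein sequence as input to predict the protein's thermostability. If the protein structure is known, the second system gives a more accurate prediction.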
Abstract (English) Information about protein thermostability is closely related to the production of many biomaterials. Recent research on protein thermostability identifies the features that are significant for thermal stability by comparing homologous proteins. The amino acid composition and special patterns in the sequence, together with hydrogen bonds, disulfide bonds, salt bridges, and other properties of the protein structure, are considered important for thermostability. In this study, we present a system that integrates these factors to predict protein thermostability. A large number of proteins are taken from PGTdb and PDB. First, various features are extracted from sequences and structures. Then, a feature selection algorithm filters for the features that have a higher linear correlation coefficient with thermostability. Lastly, we apply these features to machine learning approaches to build a prediction system. In this research we find that two features, (E+F+M+R)/residue and charged/noncharged, are linearly correlated with thermostability. We finally establish two prediction systems: one predicts protein thermostability from the protein sequence alone, and the other achieves better performance when the protein structure is known.
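The sequence-only part of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It computes the two ratio features named above and keeps features whose Pearson correlation with growth temperature passes a cutoff. Several details are assumptions and are not taken from the thesis: "charged" is taken to mean D, E, K, and R, the function names `sequence_features`, `pearson`, and `select_features` are hypothetical, and the `threshold` value is arbitrary. The thesis's actual feature definitions and selection algorithm may differ.

```python
import math

# Assumption: "charged" residues are D, E, K, R. The thesis may use a
# different definition (for example, it might include H).
CHARGED = set("DEKR")
EFMR = set("EFMR")


def sequence_features(seq):
    """Compute the two ratio features named in the abstract for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    efmr = sum(1 for aa in seq if aa in EFMR)
    charged = sum(1 for aa in seq if aa in CHARGED)
    noncharged = n - charged
    if noncharged == 0:
        raise ValueError("sequence has no noncharged residues")
    return {
        "efmr_per_residue": efmr / n,
        "charged_over_noncharged": charged / noncharged,
    }


def pearson(xs, ys):
    """Pearson linear correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(rows, temps, threshold=0.3):
    """Keep features whose |r| with temps is at least threshold.

    rows  : list of feature dicts, one per protein
    temps : optimal growth temperature of each protein's source organism
    threshold : hypothetical cutoff, not a value from the thesis
    """
    kept = {}
    for name in rows[0]:
        r = pearson([row[name] for row in rows], temps)
        if abs(r) >= threshold:
            kept[name] = r
    return kept
```

In use, `sequence_features` would be called once per protein sequence, and the resulting rows passed to `select_features` together with the growth temperatures looked up for each source organism. The surviving features would then be handed to a classifier such as a decision tree.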
Keywords (Chinese) ★ thermostability
★ protein
★ decision tree
Keywords (English) ★ protein
★ thermostability
★ Decision Tree
Table of contents CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.1.1. Protein Thermostability
1.1.2. Primary structure of protein
1.1.3. Secondary structure of protein
1.1.4. Tertiary structure of protein
1.2 MOTIVATION
1.3 GOAL
CHAPTER 2 RELATED WORK
2.1 PGTDB
2.2 PDB
2.3 STRUCTURE AND SEQUENCE PROPERTIES AND PROTEIN THERMOSTABILITY
CHAPTER 3 MATERIAL AND METHOD
3.1 SYSTEM FLOW
3.2 MATERIAL
3.2.1. Sampling Data
3.2.2. Candidate Feature
3.2.3. Data Generation
3.3 METHOD
3.3.1. Feature Selection
3.3.2. Naïve Bayes
3.3.3. Decision Tree
3.3.4. Neural Network
3.3.5. Model Evaluation
3.3.6. Training and Predict System Flow
CHAPTER 4 RESULT
4.1 CORRELATION COEFFICIENT
4.2 CROSS VALIDATION RESULT
4.2.1. Three Test Case
4.2.2. Compare three machine learning approach result
4.2.3. Decision Tree detail result
4.3 PREDICT RESULT
CHAPTER 5 DISCUSSION AND CONCLUSION
REFERENCE
APPENDIX
Advisor 洪炯宗 (Jorng-Tzong Horng)   Approval date 2005-7-20