Thesis/Dissertation 952211003 Detailed Record




Author 吳行展 (Sing-Jhan Wu)   Department Institute of Systems Biology and Bioinformatics
Thesis Title 以支持向量機鑑別原核生物之嗜寒、中溫、嗜熱、及超嗜熱蛋白質
(Discrimination of psychrophilic, mesophilic, thermophilic, and hyperthermophilic proteins in prokaryotes using Support Vector Machine)
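  1. The access permission for this electronic thesis is: approved for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed to users only for searching, reading, and printing for personal, non-profit academic research purposes.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.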

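Abstract (Chinese) Protein thermostability is an important topic in both basic science and industrial applications. Many studies compare the sequences and structures of homologous proteins to identify the factors that matter for thermostability. Previous research has found that many properties of a protein sequence, such as amino acid composition, hydrophobic interaction, and ionic interaction, are closely related to protein thermostability. Psychrophilic proteins are also important in industrial applications, yet they have been studied far less than thermophilic proteins. The purpose of this study is to analyze various physicochemical features of proteins, to develop a system that can predict thermophilic and psychrophilic proteins, and to explore how different features relate across the four temperature classes. Using data provided by the NCBI prokaryotic genome project, we extracted a large number of proteins together with the associated temperature information, computed their features, and applied a feature selection algorithm to filter out the important factors correlated with temperature. We then used machine learning to build a prediction model with stable performance. We believe that three forms of amino acid composition (amino acid composition, dipeptide composition, and pseudo amino acid composition) have a significant effect on the temperature classification of proteins.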
Abstract (English) The study of protein thermostability plays an important role in both basic and applied research. Most studies on protein thermostability focus on structural or sequence comparison among homologous proteins in order to identify the factors that affect thermostability. Key properties that influence protein thermostability have been found, such as amino acid composition, hydrophobic interaction, and ionic interaction. However, the properties that correlate with protein psychrophilicity are less studied. The purpose of this study is to analyze the properties of selected pools of proteins, to develop a method for predicting thermostability or psychrophilicity, and to identify the key features. We used data provided by the NCBI prokaryotic genome project to select 86470 proteins together with temperature data (the optimal growth temperatures of the source prokaryotes), calculated protein features, and then applied a feature selection algorithm. The vital temperature-related factors selected were amino acid composition, dipeptide composition, and pseudo amino acid composition. A machine learning method was used to build a robust prediction model of protein thermostability and psychrophilicity. We believe these three types of amino acid composition have a significant effect on protein temperature classification.
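As an illustration of two of the feature types named in the abstract, the sketch below computes amino acid composition (20 residue fractions) and dipeptide composition (400 ordered-pair fractions) from a sequence string. It is not the author's implementation: the function names are hypothetical, residues outside the 20 standard letters are simply skipped (an assumption), and pseudo amino acid composition is omitted because it depends on physicochemical scales that this record does not specify.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue; non-standard letters are ignored."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent standard residues."""
    seq = seq.upper()
    valid = set(DIPEPTIDES)
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in valid)
    total = sum(counts.values())
    return {dp: (counts[dp] / total if total else 0.0) for dp in DIPEPTIDES}


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional list."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[aa] for aa in AMINO_ACIDS] + [dpc[dp] for dp in DIPEPTIDES]
```

A vector of this kind, one per protein, is the sort of input that a classifier such as a support vector machine could be trained on, with each protein labeled by the temperature class of its source organism.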
Keywords (Chinese) ★ Support vector machine
★ Protein thermostability
★ Protein psychrophilicity
★ Machine learning algorithms
Keywords (English) ★ Machine learning algorithms
★ Support vector machine
★ Protein thermostability
★ Protein psychrophilicity
Table of Contents Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Background
1.1.1 Extremophile
1.1.2 Protein
1.1.3 Protein thermostability
1.1.4 Protein psychrophilicity
1.1.5 Optimal growth temperature
1.2 Motivation
1.3 Problem
1.4 Goal
Chapter 2. Related Works
2.1 NCBI Entrez Genome Project Database
2.2 Temperature information
2.2.1 PGTdb
2.2.2 Culture collection center
2.3 Phylogenetic analysis tool
2.3.1 PHYLIP
2.3.2 iToL
2.4 Machine learning and statistic tool
2.4.1 WEKA
2.4.2 LIBSVM
2.4.3 SPSS
2.5 Recent prediction tool on protein thermostability
Chapter 3. Materials and Methods
3.1 Materials
3.1.1 Prokaryotes and optimal growth temperature
3.1.2 Protein sequences
3.1.3 Physicochemical properties of protein
3.2 Methods
3.2.1 System flow
3.2.2 Microorganisms sampling
3.2.3 Protein sampling
3.2.4 Feature selection
3.2.5 Statistical test
3.2.6 Machine learning technique
3.2.7 Performance index
Chapter 4. Results
4.1 Phylogenetic tree
4.2 Key feature
4.3 Statistical analysis of protein features from four categories
4.4 Discrimination of proteins from four categories
4.5 Discrimination of thermophilic and mesophilic proteins
4.6 Discrimination of psychrophilic and mesophilic proteins
Chapter 5. Discussion
Chapter 6. Conclusion
References
Appendix
Advisors 黃雪莉、洪炯宗
(Shir-Ly Huang, Jorng-Tzong Horng)
Approval Date 2008-7-24