Protein thermostability is an important topic in both basic science and industrial applications. Many studies compare the sequences and structures of homologous proteins to identify the factors that contribute to thermostability. Previous work has linked several properties to protein thermostability, including amino acid composition, hydrophobic interactions, and ionic interactions. Psychrophilic proteins are also of considerable industrial value, yet the properties associated with them have been studied far less than those of thermophilic proteins. The purpose of this study is to analyze a range of physicochemical features of proteins, to develop a system that predicts whether a protein is thermophilic or psychrophilic, and to examine how different features relate to four temperature classes. Using data from the NCBI prokaryotic genome project, we collected 86470 proteins together with the optimal growth temperatures of their source prokaryotes. We computed features for each protein, applied a feature selection algorithm to retain the factors most strongly related to temperature, and then used machine learning methods to build a prediction model with stable performance. We believe that three forms of amino acid composition (amino acid composition, dipeptide composition, and pseudo amino acid composition) have a significant effect on the temperature classification of proteins.
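
As a rough illustration of the first two composition features named in the abstract, the sketch below computes amino acid composition (20 residue frequencies) and dipeptide composition (400 adjacent-pair frequencies) from a raw sequence string. It is a minimal sketch under stated assumptions, not the pipeline used in this study: the function names, the restriction to the 20 standard residues, and the handling of non-standard characters are all assumptions made for illustration. Pseudo amino acid composition is left out because it depends on physicochemical property tables and parameters that the abstract does not specify.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_STANDARD = set(AMINO_ACIDS)


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard residue (20 values)."""
    residues = [aa for aa in sequence.upper() if aa in _STANDARD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Return the relative frequency of each adjacent residue pair (400 values).

    Pairs containing a non-standard character are skipped rather than
    bridged, so no artificial neighbours are created.
    """
    seq = sequence.upper()
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in _STANDARD and seq[i + 1] in _STANDARD
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    total = len(pairs)
    return {
        a + b: counts[a + b] / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, for demonstration only.
    example = "MKTAYIAKQR"
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), len(dpc))
```

Concatenating the two dictionaries' values in a fixed key order gives a 420-dimensional feature vector per protein, which could then be passed to a feature selection step and a classifier of the reader's choice.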