Thesis 89522065: Details


Name: 王世賢 (Shih-Hsien Wang)   Department: Computer Science and Information Engineering
Thesis title: 應用資料探勘技術於蛋白質保留性胺基酸序列之關聯性
(Study of Motif Correlation in Proteins by Data Mining)
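  1. This electronic thesis is authorized for immediate open access.
  2. The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast this work without permission, to avoid infringing the law.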

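Abstract (Chinese, in translation): During evolution, some regions of a protein sequence tend to be conserved more than others, and these conserved regions often play important roles in the structure or function of the protein. Correlations between motifs in protein sequences may hint at the biological functions of proteins, and such information also offers clues for the evolutionary analysis of human genes and those of other species. Our main goal is to find correlations between motifs in protein sequences. In this study, the protein sequences were extracted from the PIR-NREF database and the motifs from the PROSITE database. We use data mining methods to find the correlations between motifs in protein sequences.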
Abstract (English): In protein sequences, some regions are better conserved than others during evolution, and these conserved regions generally play an important role in the function or structure of proteins. Knowledge of the correlations between protein motifs may shed new light on the biological functions of proteins and offer a basis for analyzing evolution in the human genome and other genomes. The aim of this work is to find correlations between motifs in protein sequences. The protein sequences used in this study are taken from the PIR-NREF database, and the motifs from the PROSITE database. We apply a data mining approach to discover these motif correlations.
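Before any mining, each sequence has to be reduced to the set of PROSITE motifs it contains. The following is a minimal sketch of that mapping step, assuming the motifs are PROSITE patterns in their basic syntax and the PIR-NREF sequences are already loaded into a dict. File parsing is omitted, PROSITE profiles are not covered, and `prosite_to_regex`, `motif_sets` and the toy sequences are hypothetical names and data, not taken from the thesis.

```python
import re

# Three short patterns written in PROSITE syntax, used here only as examples.
MOTIFS = {
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}.",
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK].",
    "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE].",
}

# One PROSITE element, optionally followed by a repeat such as (2) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Covers x, [..], {..}, (n) and (n,m) repeats, the '<' and '>' terminal
    anchors and the trailing '.'; '>' inside brackets (e.g. [G>]) is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, low, high = _ELEMENT.fullmatch(element).groups()
        if core == "x":               # any amino acid
            regex = "."
        elif core.startswith("{"):    # any amino acid except those listed
            regex = "[^" + core[1:-1] + "]"
        else:                         # a single residue or an allowed set [..]
            regex = core
        if low:                       # repeat count, e.g. x(2) -> .{2}
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def motif_sets(sequences: dict[str, str], motifs: dict[str, str]) -> dict[str, frozenset[str]]:
    """Map each sequence ID to the set of motif names found anywhere in it."""
    compiled = {name: re.compile(prosite_to_regex(p)) for name, p in motifs.items()}
    return {
        seq_id: frozenset(name for name, rx in compiled.items() if rx.search(seq))
        for seq_id, seq in sequences.items()
    }
```

For example, the made-up sequence `MNGSAK` maps to `{ASN_GLYCOSYLATION, PKC_PHOSPHO_SITE}`. Only presence is recorded, not position or count, which is the form an association-rule miner expects as one "transaction" per protein.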
Keywords (Chinese): 蛋白質 (protein)
Keywords (English): mining, motif, protein
Table of Contents
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Protein Databases
2.2 Protein domain family database
2.3 Protein Structure Related Databases
2.4 Association rules
Chapter 3 Our Approach
3.1 Materials
3.2 Preprocessing and Mapping
3.3 Mining Association Rules
Chapter 4 Results
4.1 Environments of Implementation
4.2 Mining Result
Chapter 5 Discussion
Chapter 6 Conclusions
References
Appendix A
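The outline names association rules (Sections 2.4 and 3.3) as the mining step. Below is a minimal Apriori-style sketch, assuming each protein has already been reduced to the set of motifs it contains. Here the support of a motif set X is the fraction of proteins containing every motif in X, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X). The function names, the toy motif labels M1 to M3 and the thresholds are hypothetical, and the thesis may well use a different algorithm.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style, level-wise search for frequent motif sets.

    transactions: list of frozensets (one per protein, holding its motifs).
    Returns {itemset: support}, where support is the fraction of
    transactions that contain every motif in the itemset.
    """
    n = len(transactions)
    candidates = {frozenset([m]) for t in transactions for m in t}
    frequent, k = {}, 1
    while candidates:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets; drop any k-set that has an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Generate rules (lhs, rhs, support, confidence) with
    confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules
```

With the toy transactions `{M1, M2}`, `{M1, M2, M3}`, `{M1, M3}` and `{M2}` and a minimum support of 0.5, five motif sets are frequent; raising the minimum confidence to 0.8 leaves the single rule M3 -> M1 (support 0.5, confidence 1.0). Every subset of a frequent set is itself frequent, so `frequent[lhs]` is always defined when rules are generated.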
Advisor: 洪炯宗 (Jorng-Tzong Horng)   Approval date: 2002-06-25

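For questions about this thesis, please contact the Outreach Services Section of the National Central University Library, tel. (03) 422-7151 ext. 57407, or by e-mail.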