Protein Function Prediction from Protein Interaction Networks by Latent Semantic Indexing

Author

Guan-Hong Lin (林冠宏)

Department

Department of Computer Science and Information Engineering

Thesis Title

Protein Function Prediction from Protein Interaction Networks by Latent Semantic Indexing
(經由潛在語義的線索從蛋白質交互作用網路進行蛋白質功能的預測)

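The author has agreed to make this electronic thesis openly available immediately.
Full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.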

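Abstract (translated from Chinese)

Understanding the roles that proteins play in the cell has long been an important problem in biology. In recent years, new experimental techniques have emerged, some of which produce a large volume of results in a single experiment. The two-hybrid system, for example, can generate a large amount of protein interaction data in one experiment, and such data usually carry biologically meaningful information implicitly.
In this thesis, we propose a method based on latent semantic indexing that extracts the biologically meaningful information hidden in protein interaction networks. In information retrieval, polysemy (one word with several meanings) and synonymy (several words with one meaning) have long been major causes of incorrect retrieval results, and latent semantic indexing is able to address both problems. Protein interaction networks often contain erroneous or ambiguous information, and we use latent semantic indexing to filter it out. Our results show that the method does filter out this information and retrieves proteins that are highly related in function.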

Abstract (English)

Determining protein function is one of the most important tasks in the post-genomic era. Large-scale experimental results such as protein interaction networks are now available, and these data often carry information about protein function.
In this thesis, we present an approach based on Latent Semantic Indexing (LSI) to extract this information from protein interaction networks. LSI is an information retrieval technique that addresses the synonymy and polysemy problems. Because biologists believe that protein interaction networks contain many false positives and false negatives, we use the properties of LSI to filter out erroneous and ambiguous information retrieved from these networks. Our results show that the approach can identify functionally related proteins in cells.
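
The idea in the abstract can be illustrated with a minimal sketch in Python. It is not the thesis's implementation. It assumes one plausible modeling choice that the abstract does not spell out: the interaction network is encoded as a symmetric adjacency matrix with self-loops, so that each protein acts as a "document" whose "terms" are the protein itself and its interaction partners. The protein names `P1` to `P6`, the edge list, the function name `lsi_protein_similarity`, and the rank `k=2` are all hypothetical. The matrix is reduced by a truncated singular value decomposition (via NumPy's `numpy.linalg.svd`), and proteins are compared by cosine similarity in the reduced space.

```python
import numpy as np

def lsi_protein_similarity(edges, proteins, k=2):
    """Cosine similarity between proteins in a rank-k latent space."""
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    # Assumed modeling: symmetric adjacency matrix with self-loops, so each
    # column describes a protein by itself plus its interaction partners.
    a = np.eye(n)
    for u, v in edges:
        a[index[u], index[v]] = 1.0
        a[index[v], index[u]] = 1.0
    # Full SVD; singular values come back sorted in descending order.
    _, s, vt = np.linalg.svd(a)
    # Truncation: keep the k largest singular values and their right
    # singular vectors, giving one k-dimensional row per protein.
    coords = vt[:k, :].T * s[:k]
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # guard against division by zero
    unit = coords / norms
    return unit @ unit.T

# Hypothetical toy network: two triangles joined by one edge (P3-P4),
# which plays the role of a possibly spurious interaction.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P4", "P5"), ("P4", "P6"), ("P5", "P6"),
         ("P3", "P4")]
idx = {p: i for i, p in enumerate(proteins)}
sim = lsi_protein_similarity(edges, proteins, k=2)

print("P1 vs P2:", sim[idx["P1"], idx["P2"]])
print("P1 vs P3:", sim[idx["P1"], idx["P3"]])
print("P3 vs P4:", sim[idx["P3"], idx["P4"]])
print("P1 vs P5:", sim[idx["P1"], idx["P5"]])
```

In this toy network, P3 ends up closer to the other members of its own triangle than to P4, even though P3 and P4 interact directly. That is the kind of noise tolerance the abstract attributes to LSI, although a six-protein example says nothing about how the method behaves on real data.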

Keywords

★ protein function prediction
★ protein interaction network
★ latent semantic indexing

Table of Contents

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 RELATED WORK
2.1 Methods based on Sequence Similarity
2.2 Methods based on Biological Experiment Data
2.3 Comparison between These Directions
3 MATERIALS AND METHODS
3.1 Introduction of Latent Semantic Indexing
3.1.1 Term-Document Matrix
3.1.2 Truncated Singular Value Decomposition
3.1.3 Similarity Definition
3.1.4 Basic Properties of LSI
3.2 Latent Semantic Indexing of Protein Interaction Network
3.2.1 Modeling
3.2.2 Similarity and Clustering
4 EXPERIMENTS, RESULTS, AND DISCUSSIONS
4.1 Data Handling
4.2 Validations of Our Method
4.2.1 Experiment Processes
4.2.2 Similarity Test Results
4.2.3 Clustering Test Results
4.3 Fault Tolerance Experiments
4.3.1 Experiment Process
4.3.2 Fault Tolerance Results
4.4 Comparison with Other Methods
5 CONCLUSION AND FUTURE WORK
REFERENCE

Advisor

Chin-Wen Ho (何錦文)

Date Approved

2005-07-18
