Thesis/Dissertation 93522004: Detailed Record


Author Jun-Li Kuo (郭俊利)   Department Computer Science and Information Engineering
Thesis title 針對生物微晶片資料利用決策樹選取關鍵基因
(Gene selection by decision tree and classification for microarray gene expression data)
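Full text
  1. The author has granted immediate open access to this electronic thesis.
  2. The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. To avoid violating the law, do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.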

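Abstract (Chinese) Combining microarray experiments with computational analysis is an emerging technology in cancer research. The expression levels of tens of thousands of genes are used to predict whether particular cancer phenotypes are present, and even to derive rules that explain the causes of cancer and how it acts, and to develop drug treatments that suppress it. The approach applies not only to cancer but to any poorly understood disease. Gene selection is an important step in analyzing microarray data because it tells us which genes are discriminative for a disease and take part in key regulation. However, studying this problem with computer science techniques such as numerical analysis, machine learning, and data mining runs into two problems: the very high dimensionality of the attributes, and overfitting of the trained model.
We therefore aim to design a workflow that analyzes microarray data and selects the genes most likely to influence cancer. These genes have accurate discriminative power and can be used to build a trained classification model. Finally, we test the workflow on published datasets and on a breast cancer dataset from our collaborator, National Taiwan University Hospital, and find that our system indeed achieves very good accuracy. Based on an evaluation using data mining techniques, we believe the workflow can reduce overfitting of the trained model and achieve predictive performance. Biologists who have microarray data can use the workflow to obtain truly discriminative genes, which may then be verified with biological techniques, greatly reducing the cost and labor of experiments.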
Abstract (English) Gene selection helps in analyzing microarray gene expression data. However, it is very difficult to obtain satisfactory classification results with machine learning techniques because of the curse of dimensionality and overfitting, i.e. the number of features is very large but the samples are few. We therefore design a system flow that attempts to avoid these two problems and selects a small set of significant biomarker genes for diagnosis, so that samples can be classified correctly. Furthermore, we test the system on several microarray datasets; its good performance indicates that it is useful and reliable.
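The table of contents of the thesis lists resampling, tree gathering, and gene selecting as the gene selection steps, but this record does not describe them in detail. The following is a minimal Python sketch of one plausible reading, not the author's implementation: bootstrap resamples are drawn, a one-level decision tree (a stump, used here as a simplified stand-in for a full decision tree) is fitted to each, and the genes that the trees use most often are kept. The function names `best_stump` and `select_genes`, the parameters, and the toy expression matrix are all hypothetical.

```python
import random
from collections import Counter


def best_stump(X, y):
    """Return (gene_index, threshold, errors) for the single-gene split
    with the fewest misclassified samples, or None if no split exists."""
    best = None
    for g in range(len(X[0])):
        values = sorted(set(row[g] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            # Rule: predict class 1 when expression > t (or the flipped rule).
            wrong = sum((row[g] > t) != bool(label) for row, label in zip(X, y))
            err = min(wrong, len(y) - wrong)
            if best is None or err < best[2]:
                best = (g, t, err)
    return best


def select_genes(X, y, n_resamples=50, top_k=2, seed=0):
    """Count how often each gene is chosen by a stump over bootstrap
    resamples and return the top_k most frequently chosen gene indices."""
    rng = random.Random(seed)
    votes = Counter()
    n = len(X)
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a one-class resample carries no split information
        stump = best_stump(Xb, yb)
        if stump is not None:
            votes[stump[0]] += 1
    return [g for g, _ in votes.most_common(top_k)]


# Toy data (hypothetical): 8 samples x 4 genes.
# Gene 0 is flat, gene 1 separates the classes, genes 2 and 3 are noise.
X = [
    [5.0, 1.0, 2.1, 7.3],
    [5.0, 1.2, 6.4, 1.8],
    [5.0, 0.9, 3.3, 4.9],
    [5.0, 1.1, 5.7, 2.2],
    [5.0, 3.0, 2.4, 6.1],
    [5.0, 3.2, 6.0, 1.5],
    [5.0, 2.9, 3.9, 5.2],
    [5.0, 3.1, 5.1, 2.7],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = select_genes(X, y, top_k=1)
print("selected gene indices:", selected)
```

Counting votes across resamples, rather than trusting a single tree, is one simple way to make the chosen genes less dependent on any one small sample, which is the overfitting concern raised in the abstract.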
Keywords (Chinese) ★ classification
★ microarray
★ gene
★ cancer
Keywords (English) ★ cancer
★ gene selection
★ microarray
★ classification
Table of contents Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Goal
Chapter 2 Related Works
2.1 Other gene selection methods
2.2 WEKA
2.3 KEGG
Chapter 3 System Flow
3.1 Data input
3.2 Gene Selection
3.2.1 Resampling
3.2.2 Tree gathering
3.2.3 Gene selecting
3.3 Classification
Chapter 4 Materials
4.1 Public datasets
4.2 NTU hospital data
Chapter 5 Results
5.1 The performance for public datasets
5.2 The performance for NTU hospital data
5.2.1 Metastasis diagnosis
5.2.2 Her2-positive diagnosis
Chapter 6 Discussion
References
Appendix
Advisor Jorng-Tzong Horng (洪炯宗)   Approval date 2006-7-18

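For questions about this thesis, contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.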