Thesis/Dissertation 93532025: Detailed Record




Author Tsung-Pai Chen (陳聰百)   Department In-Service Master's Program, Department of Computer Science and Information Engineering
Thesis title Analysis of high-resolution protein mass spectra based on peak feature selection
(利用峰點特徵值來分析高解析度蛋白質質譜資料)
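  1. The author has authorized immediate open access to this electronic thesis.
  2. Once open access takes effect, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.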

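Abstract (Chinese) Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry are techniques currently used to identify biomarkers. This thesis uses a SELDI-TOF ovarian cancer dataset from the National Cancer Institute (USA) and a MALDI-TOF oral cancer dataset from Chang Gung University. Samples in both datasets are divided into a control group and a cancer patient group. Our goal is to reduce the high dimensionality of the mass spectra and extract meaningful peak features. Feature extraction involves baseline correction, peak detection, and spectra alignment. Feature selection uses the Kolmogorov-Smirnov test (KS test), logistic regression, and random forest. Once the discriminatory features have been selected, three classification methods are applied to classify the datasets.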
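The KS-based feature selection step mentioned above can be sketched as follows. This is a minimal illustration in Python, not the thesis's implementation (the table of contents indicates the thesis worked in R). The names `ks_statistic` and `rank_peaks` and the data layout (one list of aligned peak intensities per spectrum) are assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap between empirical CDFs can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_peaks(control, cancer, top_n):
    """Return the indices of the top_n peaks with the largest KS statistic.

    control, cancer: lists of spectra, each a list of intensities with one
    entry per aligned peak (hypothetical layout chosen for this sketch).
    """
    n_peaks = len(control[0])
    scores = []
    for j in range(n_peaks):
        d = ks_statistic([s[j] for s in control], [s[j] for s in cancer])
        scores.append((d, j))
    # Largest statistic first; ties broken by peak index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:top_n]]
```

Calling `rank_peaks(control, cancer, 50)` would return 50 peak indices ordered from most to least discriminatory under this criterion. A larger statistic means the intensity distributions of the two groups differ more at that peak.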
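We selected the 50 and the 100 most discriminatory peak features, respectively, and ran 1000 repetitions of randomized 10-fold cross-validation. Classification with regression tree with bagging, k-nearest neighbor, and SVM (support vector machine) performed well in terms of sensitivity, specificity, accuracy, and precision. We also developed a mass-spectrum correlation query system to identify peaks that are highly correlated in the cancer and non-cancer groups. The analysis pipeline proposed here yields a relatively small set of peak features with enough discriminatory power for classification and correlation analysis.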
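For reference, the four reported measures can be computed from the confusion counts of a two-class prediction. The sketch below uses the standard definitions. The function name `classification_metrics` and the convention that label 1 marks the cancer group are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, accuracy, and precision for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }
```

In a repeated cross-validation setting, these values would typically be computed per test fold and then averaged over all folds and repetitions.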
Abstract (English) SELDI-TOF and MALDI-TOF mass spectrometry are techniques currently used to identify cancer biomarkers. This work focuses on an ovarian cancer dataset generated with SELDI-TOF by the National Cancer Institute, USA, and an oral cancer dataset generated with MALDI-TOF by the Proteomics Center of Chang Gung University, Taiwan. The aim is to reduce the high dimensionality of the mass spectra and extract significant peak features for further study. Baseline subtraction, peak detection, spectra alignment, and normalization are used for feature extraction. The Kolmogorov-Smirnov test, logistic regression, and random forest are used for feature selection. After feature selection, three methods were applied to classify the two classes of the ovarian cancer datasets using the discriminatory peak features. The 50 and 100 most discriminatory peak features were each used for classification with 1000 replications of 10-fold proportional cross-validation. Regression tree with bagging, k-nearest neighbor, and SVM classifiers all yielded good accuracy, precision, sensitivity, and specificity. We also developed a correlation-based query system to identify highly correlated peaks in the cancer and non-cancer groups. The proposed analysis pipeline provides a relatively small peak-feature set that is discriminatory enough for classification and correlation-based studies.
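The repeated 10-fold proportional cross-validation can be sketched as below. The record does not define "proportional", so this example assumes it means that each fold keeps roughly the same class proportions as the full dataset (often called stratified cross-validation). The function names and the seeding scheme are hypothetical.

```python
import random
from collections import defaultdict


def proportional_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal members round-robin, continuing where the previous class stopped,
        # so fold sizes stay balanced across classes.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


def repeated_cv_splits(labels, k=10, repeats=1000, seed=0):
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV."""
    for r in range(repeats):
        # A different seed per repetition gives a fresh random partition.
        folds = proportional_folds(labels, k, seed + r)
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield r, train, sorted(test)
```

Each yielded split would be used to fit a classifier on `train` and evaluate it on `test`. With `repeats=1000` and `k=10`, that gives 10,000 train/test evaluations per classifier and feature set.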
Keywords (Chinese) ★ spectra alignment
★ peak detection
★ mass spectrometer
★ classification
★ baseline correction
Keywords (English) ★ feature selection
★ SELDI-TOF
★ MALDI-TOF
★ classification
★ peak detection
Table of Contents CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.2 MOTIVATION
1.3 GOAL
CHAPTER 2 RELATED WORKS
2.1 MASS SPECTROMETRY
2.2 LOGISTIC REGRESSION IN R
2.3 REGRESSION TREE WITH BAGGING IN R
2.4 SUPPORT VECTOR MACHINE IN R
2.5 K-NEAREST-NEIGHBOR CLASSIFICATION IN R
2.6 RANDOM FOREST IN R
2.7 LITERATURE REVIEWS
2.7.1 Data preprocessing and classification
2.7.2 Correlation study
CHAPTER 3 MATERIALS AND METHODS
3.1 MATERIAL
3.2 METHODS
3.2.1 Preprocessing for feature extraction
3.2.2 Feature selection
3.2.3 Classification of mass spectra
3.2.4 Correlation associated peak-feature networks
3.3 SOFTWARE
CHAPTER 4 RESULT
4.1 N-FOLD PROPORTIONAL CROSS-VALIDATION
4.2 RESULTS COMPARISON AND HEAT MAP OF NCI DATA
4.3 TEN-FOLD PROPORTIONAL CROSS-VALIDATION
4.4 CORRELATION QUERY SYSTEM
CHAPTER 5 DISCUSSION AND CONCLUSION
REFERENCE
APPENDIX A-1
APPENDIX A-2
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
Advisors Kuang-Den Chen (陳廣典), Jorng-Tzong Horng (洪炯宗)
Approval date 2006-7-17