Copy number variation detection with two methods: Pair-wise Gaussian Merging and Hidden Markov Model

Name

楊立行 (Li-hsing Young)

Department

Graduate Institute of Systems Biology and Bioinformatics

Thesis title

Copy number variation detection with two methods: Pair-wise Gaussian Merging and Hidden Markov Model
(Original title: 使用兩種方法偵測基因體拷貝數變異：成對高斯合併法與隱藏馬可夫模型)

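Related theses

★ A study of the dynamic properties of the Myb2 protein of Trichomonas vaginalis
★ Analysis of reverse-complement symmetry around the replication origins and termini of prokaryotic genomes
★ A Comparison of Genome Copy Number Variation Analysis using two Methods: Hidden Markov Model and Pair-wise Gaussian Merging
★ A z-test method for differential gene expression analysis that uses whole-chip data as the population
★ GSLHC: a method that uses gene sets and hierarchical clustering to characterize bioactive compounds by biological function groups
★ An all-statistics algorithm for testing the sensitivity of microarray gene expression measurements
★ Development and validation of a new antibody microarray platform using a novel antibody immobilization strategy
★ Drug-resistant colon cancer cells produce high carcinoembryonic antigen and might not be cancer-initiating cells
★ Progression of cartilage degeneration in post-traumatic arthritis: a genomic profiling study in a rat model
★ Integrated functional genomic analysis of Alzheimer's disease and brain aging: a discussion of theoretical issues behind recent failures in Alzheimer's drug development
★ Predicting regulatory genes from time-series microarray data
★ Establishing a cellular and molecular biology model of neural signal transmission using a rat pheochromocytoma cell line
★ A framework for finding repurposed drug combinations for the systematic treatment of complex diseases: application to colorectal adenoma
★ Characterizing the genotype of cancer stem cells by functions related to epithelial-mesenchymal transition and proliferation
★ Changes in functional gene pathways and hub gene networks in specific brain regions in normal brain aging and Alzheimer's disease, derived from differentially co-expressed gene pairs
★ Identifying changes in key functional and regulatory pathways in specific brain regions in normal brain aging and Alzheimer's disease by selecting genes according to disease progression trends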

Files

Full text in the thesis system: never to be released

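Abstract (translated from the Chinese)

Many studies now indicate that phenotypic and trait differences between species, as well as the pathogenesis of diseases and cancer, are associated with genetic variation. Across the whole genome, copy number variations (CNVs) occur at a large scale, that is, the segments in which they occur cover a relatively large fraction of the genome, which suggests that CNV is one of the key sources of genetic diversity. Detecting genome-wide CNV has therefore gradually become an important approach to the study of genetics. However, human tissue samples are difficult and costly to obtain, which makes a comprehensive assessment of genome-wide CNV difficult. To overcome this limitation we use the mouse as our model: microarray technology for this organism has reached a good standard, samples are easier to obtain, and rich information on differences between strains is available. In addition, in contrast to earlier, smaller sets of array data, we favor algorithms suited to larger sample sizes and to the large volume of data produced by higher-resolution microarrays for downstream analysis.

We compare the mouse genome-wide CNVs called from comparative genomic hybridization (CGH) arrays by two different algorithms, the Hidden Markov Model (HMM) and Pair-wise Gaussian Merging (PGM). Besides finding significant differences between the two in segment length and position, we further analyze the strengths and weaknesses of the two algorithms and attempt to select more suitable parameter values for PGM in this experiment. The two algorithms rest on entirely different theoretical foundations, which leads to different strategies for calling segments. Concretely, the CNV segments called by PGM have a broader length distribution than those called by HMM, while the number of segments may be smaller. The most likely reason is that the basic assumption of HMM is that the start and end of a CNV segment both occur at random, with probabilities determined by the training (known) data. As a result, the lengths of the segments predicted by HMM are highly similar; in other words, they vary far less than those predicted by PGM. This is a characteristic of random events, yet we know that the spatial distribution of biological events is mostly non-random. PGM, by contrast, makes no randomness assumption about the events. On this basis we consider the PGM predictions more likely to be correct. This inference agrees with the conclusions of earlier comparisons of PGM with other detection methods, including HMM. Finally, for this comparison, our conclusion is that the results of the two methods differ considerably, but their evaluation must still be decided by experiment.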
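The statement that HMM-predicted segment lengths resemble one another can be illustrated with a generic property of first-order Markov chains: if a state is left with a fixed probability at every probe, the length of a segment follows a geometric distribution with mean 1/p. The sketch below is a minimal simulation under that assumption only. The names `simulate_segment_lengths` and `p_exit` and the value 0.1 are hypothetical; they are not taken from the thesis or from its trained model.

```python
import random


def simulate_segment_lengths(p_exit, n_segments, seed=0):
    """Draw segment lengths for a Markov state that is left with
    probability p_exit at each probe (hypothetical illustration)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_segments):
        length = 1
        # Stay in the state until the exit event occurs.
        while rng.random() >= p_exit:
            length += 1
        lengths.append(length)
    return lengths


def mean(values):
    return sum(values) / len(values)


lengths = simulate_segment_lengths(p_exit=0.1, n_segments=20000)
# The sample mean should lie close to 1 / p_exit.
print(round(mean(lengths), 1))
```

In this toy setting the lengths are governed by a single scale, 1/`p_exit`, with an exponentially decaying tail, which is one way to read the abstract's remark that random start and end events constrain the spread of segment lengths.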

Abstract (English)

Extensive studies have associated phenotypic differences, disease susceptibility and the pathogenesis of cancer with genetic variation. At a genome-wide scale, copy number variation (CNV) regions cover a large share of the genome, which suggests that CNV is an important form of genetic diversity. Detection of CNV in whole-genome DNA has therefore become an important method for understanding genetics. However, because human tissue samples are difficult and expensive to obtain, a global assessment of CNVs remains challenging. To overcome this limitation, we use a reliable technology, the mouse whole-genome CGH microarray, as our biological model source; complete information on mouse strains is also available in databases. Additionally, in contrast to the smaller data sets of previous work, we prefer algorithms that scale to larger sample sizes and higher-resolution microarrays for follow-up analysis. We compare two algorithms, Pair-wise Gaussian Merging (PGM) and Hidden Markov Model (HMM), for detecting copy number variations in the mouse genome. Besides finding that the results of the two algorithms differ significantly, we analyze the advantages and disadvantages of each and try to choose more appropriate parameters for PGM. The two algorithms rest on different theories, which results in different detection strategies. Specifically, compared with HMM, PGM yields a wider distribution of CNV-region sizes but a lower count of CNV regions. We suggest that one reason is that HMM is a stochastic generative model for time series defined by a finite set of states, whose probabilities depend on training (past) data. The segment lengths of the CNVs predicted by HMM are therefore similar; in other words, compared with PGM, HMM predicts a narrower distribution of CNV segment lengths. This is a property of random events; however, the spatial distribution of biological events is known to be largely non-random. PGM, on the other hand, makes no randomness assumption. On this basis, we suggest that the results of PGM are more accurate. This inference is consistent with earlier comparisons of PGM with other algorithms, including HMM. Finally, we conclude that the results of the two methods diverge considerably, but the ultimate evaluation still depends on future experiments.
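This record does not describe how PGM works, so the following is only a simplified sketch of the general idea suggested by the method's name: adjacent segments are compared pairwise with a z-statistic and merged while they remain statistically indistinguishable. The function `merge_segments`, the known noise level `sigma`, the greedy merge order and the toy data are all assumptions made for illustration. The threshold 3.28 echoes the Z0 value listed in the thesis table of contents, but its role here is assumed, and the actual PGM procedure may differ.

```python
import math


def merge_segments(values, sigma, z0=3.28):
    """Greedily merge adjacent segments whose means differ by less than
    z0 standard errors (hypothetical simplification, not the thesis code)."""
    # Each segment is [start, end_exclusive, mean]; start with one probe each.
    segments = [[i, i + 1, float(v)] for i, v in enumerate(values)]
    while len(segments) > 1:
        best_i, best_z = None, None
        for i in range(len(segments) - 1):
            a, b = segments[i], segments[i + 1]
            na, nb = a[1] - a[0], b[1] - b[0]
            # z-statistic for the difference of two means with known sigma.
            z = abs(a[2] - b[2]) / (sigma * math.sqrt(1.0 / na + 1.0 / nb))
            if best_z is None or z < best_z:
                best_i, best_z = i, z
        if best_z >= z0:
            break  # every remaining neighbor pair differs significantly
        a, b = segments[best_i], segments[best_i + 1]
        na, nb = a[1] - a[0], b[1] - b[0]
        merged_mean = (a[2] * na + b[2] * nb) / (na + nb)
        segments[best_i:best_i + 2] = [[a[0], b[1], merged_mean]]
    return [(s, e, m) for s, e, m in segments]


# Toy log-ratio profile with one elevated region (made-up numbers).
log_ratios = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.0, -0.1]
segments = merge_segments(log_ratios, sigma=0.1)
for start, end, seg_mean in segments:
    print(start, end, round(seg_mean, 2))
```

Nothing in this sketch fixes a preferred segment length: a segment keeps growing as long as its neighbors are statistically similar, which is consistent with the abstract's observation that PGM makes no randomness assumption and produces a wider range of segment sizes.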

Keywords

★ comparative genomic hybridization chip
★ copy number variation
★ Hidden Markov Model
★ Pair-wise Gaussian Merging

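Table of contents

Chinese abstract
English abstract
Acknowledgments
Contents
1. Introduction
1-1 Copy number variation
1-2 Selection of biological samples
1-3 Comparative genomic hybridization arrays
1-4 Pair-wise Gaussian Merging
1-5 Hidden Markov Model
1-6 Related work
2. Research content and methods
2-1 Parameter selection for Pair-wise Gaussian Merging
2-2 How CNVs are called after running PGM
3. Results
3-1 Overall comparison of the Pair-wise Gaussian Merging and Hidden Markov Model results
3-2 Array-by-array comparison of Pair-wise Gaussian Merging and Hidden Markov Model on 64 mouse CGH arrays
3-3 Research workflow
4. Discussion
5. References
6. Appendix
URL: table of CNV segments
Supplementary table: CNV statistics for each of the 64 arrays (unmerged), detected by PGM run with Z0 = 3.28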

References

[1] Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C. et al. (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science, 315, 848–853.
[2] Donna G. Albertson and Daniel Pinkel. (2003) Genomic microarrays in human genetic disease and cancer. Human Molecular Genetics , Vol. 12 DOI: 10.1093/hmg/ddg261
[3] Chao Xie and Martti T Tammi. (2009) CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics, 10:80 doi:10.1186/1471-2105-10-80
[4] Henrichsen CN, Vinckenbosch N, Zöllner S, Chaignat E, Pradervand S, Schütz F, Ruedi M, Kaessmann H, Reymond A. (2009) Segmental copy number variation shapes tissue transcriptomes. Nat. Genet., 41, 424–429.
[5] Chih-Hao Chen, Hsing-Chung Lee, Qingdong Ling, Hsiao-Rong Chen, Yi-An Ko, Tsong-Shan Tsou, Sun-Chong Wang, Li-Ching Wu and H. C. Lee. (2011) An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes. Nucleic Acids Research, 1–7 doi:10.1093/nar/gkr137.
[6] Jia Li. Hidden Markov Model.
[7] Kees Jong, Elena Marchiori, Gerrit Meijer, A. v. d. Vaart and Bauke Ylstra. et al. (2004) Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics, 20, 3636–3637.
[8] Lai WR, Johnson MD, Kucherlapati R, Park PJ. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21: 3763–3770.
[9] Jane Fridlyand, Antoine M. Snijders, Dan Pinkel, Donna G. Albertson and Ajay N. Jain. (2004) Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal., 90, 132–153.
[10] Adam B. Olshen, E. S. Venkatraman, Robert Lucito and Michael Wigler. (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.
[11] Hupé P, Stransky N, Thiery JP, Radvanyi F, Barillot E. (2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics, 20, 3413–3422.
[12] 鄭主佑 et al. (2010) A Comparison of Genome Copy Number Variation Analysis using two Methods: Hidden Markov Model and Pair-wise Gaussian Merging. Master's thesis, National Central University.

Advisor

李弘謙 (Hoong-Chien Lee)

Date of approval

2011-6-29
