Copy number variation detection with two methods: Pair-wise Gaussian Merging and Hidden Markov Model

DC field	Value	Language
DC.contributor	系統生物與生物資訊研究所	zh_TW
DC.creator	楊立行	zh_TW
DC.creator	Li-hsing Young	en_US
dc.date.accessioned	2011-06-29T07:39:07Z
dc.date.available	2011-06-29T07:39:07Z
dc.date.issued	2011
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=982213004
dc.contributor.department	系統生物與生物資訊研究所	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	目前，很多的研究顯示物種間的顯型、特徵差異以及疾病、癌症發病機制跟遺傳變異有關聯。而在全基因組中，由於拷貝數變異發生的規模,也就是發生拷貝數變異的區段涵蓋在基因組程度較高，推論拷貝數變異是遺傳多樣性的關鍵來源之一。因此，檢測全基因體拷貝數變異來研究遺傳學逐漸變成一個重要的方法。然而，因為取得人類組織樣本不容易、花費高，使得評估全面性的全基因體拷貝數變異較為困難，為了克服這樣的限制,我們採用了目前微陣列晶片技術水準也達到了不錯水平的生物個體，取得來源較易、品系間的差異也有豐富資訊可供查詢的老鼠來當作我們的模型。此外，相對於之前數量較少的晶片實驗數據而言，我們傾向使用符合處理更大樣本數量以及更高解析度微陣列晶片龐大數據的演算法來做後續分析。我們比較兩種不同的演算法─隱藏馬可夫模型以及成對高斯合併法─執行在比較性全基因體雜交晶片所判定的老鼠全基因體拷貝數變異結果，除了發現兩者在長度及位置上有顯著的差異，我們更進一步地分析兩種演算法的優缺點，並試著挑選成對高斯合併法在本實驗較適當的參數數值。雖然，兩種演算法背後有著截然不同的理論支持，會導致判定區段時的策略有所不同，但具體地說，成對高斯合併法判定的拷貝數變異區段相對於隱藏馬可夫模型判定的結果，有著較廣的區段長度分布，相對地，區段個數卻可能比較少。其最可能原因是隱藏馬可夫模型的基本假設是拷貝變異區段的開始與結束都隨機發生，而機率由訓練（已知）的數據決定。因此，隱藏馬可夫模型所預測的變異區段的長度有很高的相似性。也就是說相較於成對高斯合併法，隱藏馬可夫模型所預測的變異區段長度的變化遠遠較小。這是隨機事件的特性，然而我們知道生物事件的空間分佈大部分是不隨機的。相反的，成對高斯合併法則沒有對事件作任何的隨機假設。基於以上觀點，我們認為高斯合併法預測結果的正確性較高。這個推論與之前成對高斯合併法與其它偵測方法，包括隱藏馬可夫模型，作比較之後所取得的結論相符合。最後就本比較而言，我們的結論是兩個方法的結果有相當大的差異，但評價仍須由實驗決定。	zh_TW
dc.description.abstract	Extensive studies have associated phenotypic differences, disease susceptibility, and cancer pathogenesis with genetic variation. At the genome-wide scale, copy number variation (CNV) regions cover a large portion of the genome, which suggests that CNV is a key source of genetic diversity. Detecting CNV across the whole genome has therefore become an important approach in genetics. However, human tissue samples are difficult and expensive to obtain, which makes a global assessment of CNVs challenging. To overcome this limitation, we use the mouse as our model organism: whole-genome CGH microarray technology for the mouse is reliable, samples are easier to obtain, and extensive information on mouse strains is available in databases. In addition, whereas earlier studies worked with smaller array datasets, we favor algorithms suited to larger sample sizes and higher-resolution microarrays for follow-up analysis. We compare two algorithms, Pair-wise Gaussian Merging (PGM) and the Hidden Markov Model (HMM), for detecting CNVs in the mouse genome from comparative genomic hybridization (CGH) array data. Beyond finding that the two sets of results differ significantly in segment length and position, we analyze the advantages and drawbacks of each algorithm and try to choose more appropriate parameter values for PGM. The two algorithms rest on different theoretical foundations, which leads to different detection strategies. Specifically, compared with HMM, PGM yields a wider distribution of CNV region sizes but potentially fewer CNV regions. We suggest that the most likely reason is that HMM is a stochastic generative model for time series, defined by a finite set of states, in which the start and end of a CNV segment are treated as random events whose probabilities are determined by the training (known) data. As a result, the segment lengths of the CNVs predicted by HMM are similar to one another; in other words, HMM predicts a narrower distribution of CNV segment lengths than PGM does. This is a property of random events, yet the spatial distribution of biological events is mostly non-random. PGM, in contrast, makes no assumption of randomness. On this basis, we suggest that the PGM results are more accurate. This inference agrees with earlier comparisons of PGM against other detection methods, including HMM. Finally, we conclude that the results of the two methods diverge considerably, but the final evaluation will depend on future experiments.	en_US
DC.subject	比較性全基因體雜交晶片	zh_TW
DC.subject	隱藏馬可夫模型	zh_TW
DC.subject	成對高斯合併法	zh_TW
DC.subject	拷貝數變異	zh_TW
DC.subject	comparative genomic hybridization chip	en_US
DC.subject	copy number variation	en_US
DC.subject	Hidden Markov Model	en_US
DC.subject	Pair-wise Gaussian Merging	en_US
DC.title	使用兩種方法偵測基因體拷貝數變異：成對高斯合併法與隱藏馬可夫模型	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Copy number variation detection with two methods: Pair-wise Gaussian Merging and Hidden Markov Model	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 982213004: full metadata record