Abstract (English) |
Extensive studies have sought to associate phenotypic differences, disease susceptibility, and cancer pathogenesis with genetic variation. At the genome-wide scale, copy number variation (CNV) regions cover a substantial share of the genome, which suggests that CNV is an important form of genetic diversity. Detecting CNV in whole-genome DNA has therefore become an important method for understanding genetics. However, because human tissue samples are difficult to obtain and costly, a global assessment of CNVs remains challenging. To overcome this limitation, we use a reliable technology, the mouse whole-genome CGH microarray, as our source of biological model data; complete information on mouse strains is also available in databases. In addition, whereas earlier data sets occupied relatively little memory overall, follow-up analyses will involve larger sample sizes and higher-resolution microarrays, so we look for algorithms suited to that scale. We compare two algorithms, Pair-wise Gaussian Merging (PGM) and the Hidden Markov Model (HMM), for detecting copy number variations in the mouse genome. The results of the two algorithms differ significantly; we further analyze the advantages and shortcomings of each, and we try to choose more appropriate parameters for PGM. The two algorithms rest on different theories, which lead to different detection strategies. Specifically, compared with HMM, PGM yields a wider distribution of CNV-region sizes but a lower count of CNV regions. We suggest that one reason is that an HMM is a stochastic generative model for time series, defined by a finite set of states, whose probabilities depend on training (the past). The segment lengths of the CNVs predicted by HMM are therefore similar; in other words, compared with PGM, HMM predicts a narrower distribution of CNV segment lengths. HMM in effect treats the signal as a random variable, whereas the spatial distribution of biological features is known to be non-random. PGM, by contrast, makes no such randomness assumption. On this basis, we suggest that the PGM results are more precise. This inference is consistent with earlier comparisons of PGM with other algorithms (including HMM). Finally, we conclude that the results of the two methods diverge considerably, but a definitive evaluation will depend on future experiments.
|
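As an illustration of the HMM idea described in the abstract (a finite set of hidden states with trained transition probabilities), the following is a minimal sketch of Viterbi decoding over array CGH log2 ratios. It is not the implementation used in the thesis: the three states, the emission means, the noise level `SIGMA`, the self-transition probability `STAY`, and the function names are all assumptions chosen for the example.

```python
import math

# All parameters below are illustrative assumptions, not values from the thesis.
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}  # assumed log2-ratio means
SIGMA = 0.2   # assumed common noise standard deviation
STAY = 0.98   # assumed probability of remaining in the same state


def log_emission(state, x):
    """Log density of a Gaussian emission for one probe."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log transition probability; leaving a state is split evenly."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(ratios):
    """Return the most probable state path for a sequence of log2 ratios."""
    if not ratios:
        return []
    # Uniform prior over the initial state.
    score = {s: math.log(1.0 / len(STATES)) + log_emission(s, ratios[0])
             for s in STATES}
    back = []
    for x in ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES,
                            key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev]
                              + log_transition(best_prev, cur)
                              + log_emission(cur, x))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def cnv_segments(path):
    """Collapse runs of identical non-neutral states into (start, end, state)."""
    segments = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "neutral":
                segments.append((start, i - 1, path[start]))
            start = i
    return segments
```

Calling `cnv_segments(viterbi(ratios))` on a list of per-probe log2 ratios returns the called gain and loss regions as probe-index ranges. In this sketch the self-transition probability directly controls how readily the model opens or closes a segment, which is one way to see how trained transition probabilities can shape the lengths of the predicted regions.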
References |
[1] Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C. et al. (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science, 315, 848–853.
[2] Donna G. Albertson and Daniel Pinkel. (2003) Genomic microarrays in human genetic disease and cancer. Human Molecular Genetics, Vol. 12. doi:10.1093/hmg/ddg261
[3] Chao Xie and Martti T Tammi. (2009) CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics, 10:80. doi:10.1186/1471-2105-10-80
[4] Henrichsen CN, Vinckenbosch N, Zöllner S, Chaignat E, Pradervand S, Schütz F, Ruedi M, Kaessmann H, Reymond A. (2009) Segmental copy number variation shapes tissue transcriptomes. Nat. Genet., 41, 424–429.
[5] Chih-Hao Chen, Hsing-Chung Lee, Qingdong Ling, Hsiao-Rong Chen, Yi-An Ko, Tsong-Shan Tsou, Sun-Chong Wang, Li-Ching Wu and H. C. Lee. (2011) An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes. Nucleic Acids Research, 1–7 doi:10.1093/nar/gkr137.
[6] Jia Li. Hidden Markov Model.
[7] Kees Jong, Elena Marchiori, Gerrit Meijer, A. v. d. Vaart and Bauke Ylstra et al. (2004) Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics, 20, 3636–3637.
[8] Lai WR, Johnson MD, Kucherlapati R, Park PJ. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21: 3763–3770.
[9] Jane Fridlyand, Antoine M. Snijders, Dan Pinkel, Donna G. Albertson and Ajay N. Jain. (2004) Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal., 90, 132–153.
[10] Adam B. Olshen, E. S. Venkatraman, Robert Lucito and Michael Wigler. (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.
[11] Hupé P, Stransky N, Thiery JP, Radvanyi F, Barillot E. (2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics, 20, 3413–3422.
[12] 鄭主佑 et al. (2010) A Comparison of Genome Copy Number Variation Analysis using two Methods: Hidden Markov Model and Pair-wise Gaussian Merging. Master's thesis, National Central University.
|