Thesis 92522047: Detailed Record




Name 江國立 (Kuo-Li Chiang)   Department Computer Science and Information Engineering
Thesis title Human LTR classification and prediction using Profile Hidden Markov Models
(人類長終端重覆序列之分類與預測使用隱藏馬可夫模型)
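Access rights
  1. This electronic thesis is authorized for immediate open access.
  2. Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.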

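Abstract (translated from Chinese) About 8% of the human genome consists of long terminal repeat
retrotransposons (LTR elements). The long terminal repeats are the most variable part of
transposable elements (TEs). Most human LTR retrotransposons originate from human
endogenous retroviruses (HERVs), whose taxonomy is a hard problem because retroviruses
themselves are highly variable. In the human genome, retrotranslocation produces solitary
LTRs and incomplete retroviral sequences. Within the regulatory regions of LTRs, promoters
and enhancers are preserved during mobilization and evolution. We therefore extract these
conserved regions as features to build Hidden Markov Models, which we use to detect and
classify LTRs. In our experiments, we recovered most of the LTRs found by RepeatMasker.
This thesis discusses the classification performance of our method and the phenomena observed.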
Abstract (English) About 8% of the human genome is annotated as LTR elements. The long terminal
repeats (LTRs) of LTR elements are the most divergent part of transposable elements (TEs).
Most human LTR elements derive from human endogenous retroviruses (HERVs).
The taxonomy of HERVs remains an unresolved problem because of the diversity of retroviruses.
Solitary LTRs and partial retroviral sequences in the human genome are the result of
retrotranslocation. LTRs contain promoters and enhancers as regulatory sites, and these
can be conserved as LTR elements mobilize. We therefore capture the
conserved regions as fingerprints of LTRs and build them into profile Hidden Markov
Models, which we use to classify and predict LTRs. In our experiments,
most of the known LTRs detected by RepeatMasker are also found by our
approach. We discuss the performance of our LTR classifier and the patterns observed in its results.
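The idea of turning conserved regions into per-family profiles and classifying a sequence by its best-scoring profile can be sketched as follows. This is only an illustration under simplifying assumptions, not the thesis's implementation: it uses ungapped, match-state-only profiles (a position-specific log-odds matrix) with a uniform background, whereas a full profile HMM, as built by tools such as HMMER, also models insertions and deletions. The function names, the family labels, and the toy motif instances are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_profile(aligned_motifs, pseudocount=1.0):
    """Column-wise log-odds scores from equal-length, gap-free motif instances.

    Simplification: match states only, uniform background (assumption).
    """
    background = 1.0 / len(ALPHABET)
    total = len(aligned_motifs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(aligned_motifs[0])):
        counts = Counter(seq[col] for seq in aligned_motifs)
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return profile

def best_hit(profile, sequence):
    """Return (score, offset) of the best ungapped window in the sequence."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown symbols such as 'N' contribute a neutral score of 0.
        score = sum(col.get(base, 0.0) for col, base in zip(profile, window))
        best = max(best, (score, start))
    return best

def classify(profiles, sequence, threshold=0.0):
    """Return the family whose profile scores highest, or None below threshold."""
    scores = {family: best_hit(p, sequence)[0] for family, p in profiles.items()}
    family = max(scores, key=scores.get)
    return family if scores[family] > threshold else None

# Toy, made-up motif instances for two hypothetical LTR families.
profiles = {
    "family_A": build_profile(["TGTGGG", "TGTGGA", "TGTGGG"]),
    "family_B": build_profile(["CCAATC", "CCAATT", "CCAATC"]),
}

print(classify(profiles, "AAATGTGGGAAA"))
```

In practice the score threshold would have to be calibrated per family, since short or low-complexity profiles can score above zero on unrelated sequence.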
Keywords ★ long terminal repeats (長終端重覆序列)
★ human endogenous retrovirus (人類內生性反轉錄病毒)
★ classification (分類)
Thesis contents Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Goal
Chapter 2 Related Work
2.1 Profile Hidden Markov Models
2.2 MEME: motif discovery
2.3 RepBase Update and RepeatMasker
2.4 Tandem repeats finder
2.5 BlastClust
2.6 CompareACE
Chapter 3 Material and Methods
3.1 Overview
3.2 Material
3.3 Data preprocessing
3.4 Motif extraction and building profile HMMs
3.5 Motif scanning and LTR detection
3.6 Model evaluation
3.7 Implementation
Chapter 4 Experiment and Result
4.1 Performance of classification
4.2 Family-to-family specificity of classification
4.3 Case study: LTR prediction in human chromosome 22
Chapter 5 Discussion and Conclusion
References
Appendix
References Altschul, S. F., T. L. Madden, et al. (1997). "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs." Nucleic Acids Res 25(17):
3389-402.
Bailey, T. L. and C. Elkan (1994). "Fitting a mixture model by expectation maximization
to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol 2: 28-36.
Bailey, T. L. and C. Elkan (1995). "The value of prior knowledge in discovering motifs
with MEME." Proc Int Conf Intell Syst Mol Biol 3: 21-9.
Bannert, N. and R. Kurth (2004). "Retroelements and the human genome: new
perspectives on an old relation." Proc Natl Acad Sci U S A 101 Suppl 2: 14572-9.
Benson, G. (1999). "Tandem repeats finder: a program to analyze DNA sequences."
Nucleic Acids Res 27(2): 573-80.
Brown, T. A. (2002). The Repetitive DNA Content of Genomes. GENOMES: 59-64.
Cheng, B. Y., J. G. Carbonell, et al. (2005). "Protein classification based on text document
classification techniques." Proteins 58(4): 955-70.
Eddy, S. R. (1998). "Profile hidden Markov models." Bioinformatics 14(9): 755-63.
Eddy, S. R. (2003). HMMER User's Guide.
Han, J. and M. Kamber (2000). Chapter 7 Classification and Prediction. Data Mining: Concepts and
Techniques, Morgan Kaufmann Publishers.
Havecker, E. R., X. Gao, et al. (2004). "The diversity of LTR retrotransposons." Genome
Biol 5(6): 225.
Henikoff, S., E. A. Greene, et al. (1997). "Gene families: the taxonomy of protein
paralogs and chimeras." Science 278(5338): 609-14.
Hubbard, T., D. Barker, et al. (2002). "The Ensembl genome database project." Nucleic
Acids Res 30(1): 38-41.
Hughes, J. D., P. W. Estep, et al. (2000). "Computational identification of cis-regulatory
elements associated with groups of functionally related genes in Saccharomyces
cerevisiae." J Mol Biol 296(5): 1205-14.
Hughey, R. and A. Krogh (1996). "Hidden Markov models for sequence analysis:
extension and analysis of the basic method." Comput Appl Biosci 12(2): 95-107.
Juretic, N., T. E. Bureau, et al. (2004). "Transposable element annotation of the rice
genome." Bioinformatics 20(2): 155-60.
Jurka, J. (2000). "Repbase update: a database and an electronic journal of repetitive
elements." Trends Genet 16(9): 418-20.
Jurka, J., P. Klonowski, et al. (1996). "CENSOR--a program for identification and
elimination of repetitive elements from DNA sequences." Comput Chem 20(1):
119-21.
Lower, R., J. Lower, et al. (1996). "The viruses in all of us: characteristics and biological
significance of human endogenous retrovirus sequences." Proc Natl Acad Sci U S
A 93(11): 5177-84.
Madera, M. and J. Gough (2002). "A comparison of profile hidden Markov model
procedures for remote homology detection." Nucleic Acids Res 30(19): 4321-8.
Mager, D. L. and P. Medstrand (2003). "Retroviral repeat sequences." Nature Encyclopedia of the
Human Genome.
Patience, C., D. A. Wilkinson, et al. (1997). "Our retroviral heritage." Trends Genet 13(3):
116-20.
Sgourakis, N. G., P. G. Bagos, et al. (2005). "A method for the prediction of GPCRs
coupling specificity to G-proteins using refined profile Hidden Markov Models."
BMC Bioinformatics 6(1): 104.
Smit, A., Hubley, R & Green, P (1996-2004). "RepeatMasker Open-3.0."
Thompson, J. D., D. G. Higgins, et al. (1994). "CLUSTAL W: improving the sensitivity
of progressive multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice." Nucleic Acids Res
22(22): 4673-80.
Yada, T., Y. Totoki, et al. (1998). "Automatic extraction of motifs represented in the
hidden Markov model from a number of DNA sequences." Bioinformatics 14(4):
317-25.
Advisor 洪炯宗 (Jorng-Tzong Horng)   Approval date 2005-7-20

For questions about this thesis, please contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.