Construction of a Repeat Sequence Database for Organisms


Author

林家煌 (Jason Lin)

Department

Department of Computer Science and Information Engineering

Thesis Title

Construction of a Repeat Sequence Database for Organisms
(A Design and Implementation of a Database for Repeat Sequences)

Related Theses

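★ Development of a Respiratory Muscle Trainer System Using Embedded Systems
★ A Bidirectional Study of Erectile Dysfunction and Ischemic Heart Disease: A Nationwide Population-Based Cohort Study in Taiwan
★ A Visualization Tool for Microbial Drug-Resistance Data from Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Tandem Mass Spectrometry
★ An Application for Analyzing Heart Rate Variability and Detecting Arrhythmia with Wearable Devices
★ Building an Automated System to Analyze the Association Between Any Two Diseases Through a Cohort Study Design Using the Longitudinal Health Insurance Database
★ Risk of Stroke in Glaucoma Patients with Comorbid Diabetes Treated with Metformin and Sulfonylurea: A Population-Based Observational Study in Taiwan
★ Analysis and Prediction of Specific Sites in Protein Interactions Using a Prediction System Built on Composition, Sequence, and Spatial Features
★ Design of a News Semantic Feature Extraction Pipeline and Analysis of Its Correlation with Stock Price Changes
★ Design and Implementation of an Automated Platform for Analyzing Drug-Disease Associations
★ Building an Automated Financial Report Analysis System for Stock Price Prediction
★ Building an Automated System for Analyzing Associations Between Diseases and Cancer
★ Design and Implementation of a Virtual Keyboard Based on Inertial Sensors
★ Implementation of a Healthcare Monitoring System
★ Using Mobile Phones to Measure Hand-Grip Ball Force and Related Data
★ A Comprehensive Search for Cancer-Associated Diseases Using Association Analysis
★ A Comprehensive Search for Diseases Associated with Rheumatoid Arthritis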


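Access permission for this electronic thesis: immediate open access.
Once open access takes effect, users are authorized to search, read, and print the electronic full text only for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, as doing so may violate the law.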

Abstract (Chinese)

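To date, many genomes have been fully sequenced and published, including human chromosome 22, which was sequenced only recently and is about 30 million base pairs long. These sequences hold a great deal of information about genome structure and function, and repeat sequences are one of the research topics that most interest biologists. Repeats make up a large fraction of genomic sequence, and biologists have already identified many regulatory mechanisms within them. Analyzing repeats can therefore deepen our understanding of how chromosome structure is organized and how genes relate to species evolution. In this thesis, we design and implement a repeat sequence database. It mainly contains direct, inverted, and palindromic repeats at least 20 base pairs long, so it can be regarded as a complete database of all such repeats of 20 bp or more; the resulting volume of repeat data is very large. We implement the database with the ORACLE relational database management system and examine the key factors that affect the query performance of the whole system.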
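To make the three repeat classes named in the abstract concrete, here is a minimal Python sketch; it is not the thesis's detection method. It assumes the standard molecular-biology definitions: a direct repeat is a second copy in the same orientation, an inverted repeat is a downstream copy of the reverse complement, and a palindromic sequence equals its own reverse complement (the thesis may define these classes differently). It finds exact k-mer matches only, and the helper names `reverse_complement`, `is_palindromic`, and `find_repeats` are hypothetical.

```python
"""Sketch: exact direct and inverted k-mer repeats, plus a palindrome test.

Helper names are hypothetical; definitions follow common textbook usage,
which may differ from the thesis.
"""
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (non-ACGT symbols kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome reads the same as its own reverse complement (e.g. GAATTC)."""
    return seq == reverse_complement(seq)


def find_repeats(genome: str, k: int = 20):
    """Yield (kind, pos1, pos2) for exact k-mer repeats, each pair once with pos1 < pos2.

    direct:   the k-mer at pos2 equals the k-mer at pos1 (same orientation)
    inverted: the k-mer at pos2 equals the reverse complement of the k-mer at pos1
    A longer repeat shows up as several overlapping k-mer pairs; a real tool
    would extend and merge them.
    """
    # Index every k-mer by its start positions.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    for kmer, starts in positions.items():
        # Direct repeats: every pair of occurrences of the same k-mer.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                yield ("direct", starts[a], starts[b])
        # Inverted repeats: a later occurrence of the reverse complement.
        # .get() avoids inserting into the dict while iterating over it.
        rc = reverse_complement(kmer)
        for i in starts:
            for j in positions.get(rc, []):
                if i < j:
                    yield ("inverted", i, j)
```

With the default `k=20`, `find_repeats` matches the 20-bp minimum length stated in the abstract. Because it seeds on exact k-mers, a 25-bp direct repeat is reported as six overlapping 20-mer pairs, which a production pipeline would merge into one record.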

Abstract (English)

Many genomes have recently been sequenced and published, including human chromosome 22, which is more than 30 million base pairs (30 Mb) long. These sequences provide an enormous amount of information for studying how the genome as a whole is organized and how it functions. Repeat sequences are one subject of particular biological interest. They are the most abundant sequences in the extragenic regions of genomes, and biologists have found that many regulatory elements are also located there. Repeats may play an important role in forming chromatin structure in the nucleus, and they contain important clues for studies of genetic evolution and phylogeny. In this thesis, we design and implement a database for repeat sequences. The database currently contains direct, inverted, and palindromic repeats, with lengths ranging from 20 to several hundred base pairs, and it holds a very large number of repeats. We implement the database on the ORACLE relational database management system and discuss the physical design factors that affect the performance of database operations.
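The abstract points to physical design as a key factor in query performance. As a rough illustration of one such factor, indexing repeat coordinates so that range searches can use an index scan, the sketch below uses Python's built-in `sqlite3` module as a stand-in for ORACLE. The table `repeat_copy`, its columns, and the sequence label `chr22` are hypothetical, and Oracle-specific structures such as partitioned tables, index-organized tables, and bitmap indexes are not modeled.

```python
import sqlite3

# Hypothetical schema: one row per copy (occurrence) of a repeat.
# sqlite3 stands in for ORACLE; no Oracle-specific storage options are modeled.
SCHEMA = """
CREATE TABLE repeat_copy (
    repeat_id INTEGER NOT NULL,  -- groups the copies of one repeat
    seq_id    TEXT    NOT NULL,  -- chromosome or genome label (hypothetical)
    kind      TEXT    NOT NULL CHECK (kind IN ('direct', 'inverted', 'palindromic')),
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    CHECK (end_pos >= start_pos)
);
-- Composite B-tree index: equality on seq_id, then a range scan on start_pos.
CREATE INDEX idx_copy_range ON repeat_copy (seq_id, start_pos);
"""


def build_demo_db(rows):
    """Create an in-memory database and load (repeat_id, seq_id, kind, start, end) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO repeat_copy VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def repeats_in_range(conn, seq_id, lo, hi):
    """Return copies lying entirely inside [lo, hi] on seq_id, ordered by start.

    Because end_pos >= start_pos (enforced by the CHECK), bounding start_pos on
    both sides loses no results and lets the index scan stop at hi.
    """
    return conn.execute(
        "SELECT repeat_id, kind, start_pos, end_pos FROM repeat_copy "
        "WHERE seq_id = ? AND start_pos BETWEEN ? AND ? AND end_pos <= ? "
        "ORDER BY start_pos",
        (seq_id, lo, hi, hi),
    ).fetchall()
```

For example, after loading a few made-up rows, `repeats_in_range(conn, "chr22", 0, 300)` returns only the `chr22` copies that both start and end inside that window. The redundant `start_pos <= hi` bound is the design point: it gives the composite index an upper limit instead of scanning to the end of the sequence.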

Keywords (Chinese)

★ database
★ repeat sequences

Keywords (English)

★ repeat sequences
★ repeats
★ repeat
★ database

Table of Contents

Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Goal and Purpose
1.4 Related Work
1.5 Organization of the Thesis
Chapter 2 System Design
2.1 Infrastructure
2.2 Database
2.3 Data Model
2.4 Communication
Chapter 3 Implementation
3.1 Storage Structure of Repeat Sequences
3.2 Partitions of Tables and Indexes
3.3 Nested Table and Index-Organized Table
3.4 Bitmap Index
3.5 Functions of Utilities
3.6 Search Tools on Web
3.7 Search Flow
Chapter 4 Statistics
Chapter 5 Performance Evaluation
5.1 Experiment 1: Search by features
5.2 Experiment 2: Search by range
Chapter 6 Discussion and Conclusion
References
Appendix 1. Abbreviation of Organism
Appendix 2. A Flat File of RSDB


Advisor

洪炯宗(Jorng-Tzong Horng)

Approval Date

2000-7-6
