To date, many genomes have been sequenced and published, including the recently completed human chromosome 22, which is more than 30 million base pairs (30 Mb) long. These sequences hold an enormous amount of information about how the genome as a whole is organized and how it functions. Repeat sequences are among the topics of greatest interest to biologists. They make up a large proportion of genomic sequence and are the most abundant sequences in the extragenic regions of genomes, where biologists have found a large number of regulatory elements. They may play an important role in the formation of chromatin structure in the nucleus, and analyzing them offers important clues for the study of chromosome organization, genetic evolution, and phylogeny. In this thesis, we design and implement a database of repeat sequences. The database currently contains direct, inverted, and palindromic repeats, each at least 20 base pairs long and ranging up to several hundred base pairs, so it can be regarded as a complete collection of repeats of length 20 or more. The number of such repeats is very large. We implement the database on the ORACLE relational database management system, and we discuss the key physical design factors that affect the performance of queries and other operations on the database.
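To make the three repeat classes concrete, the following minimal Python sketch indexes every fixed-length window of a sequence and sorts the windows into direct, inverted, and palindromic repeats. It is an illustration only, not the method used to build the database: the definitions (direct = the same k-mer at two or more positions, inverted = a k-mer whose reverse complement also occurs, palindromic = a k-mer equal to its own reverse complement) are a common convention assumed here, and the function names are hypothetical. It also reports fixed-length k-mers rather than maximal repeats.

```python
from collections import defaultdict

# Translation table for DNA complementation (assumes an A/C/G/T alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_positions(seq, k):
    """Map each length-k substring to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def classify_repeats(seq, k=20):
    """Classify k-mers under the assumed definitions (hypothetical helper).

    direct:      k-mer occurring at two or more positions
    palindromic: k-mer equal to its own reverse complement
    inverted:    k-mer whose (different) reverse complement also occurs;
                 each pair is reported once, keyed by the smaller string
    """
    index = kmer_positions(seq, k)
    direct, inverted, palindromic = [], [], []
    for kmer, positions in index.items():
        if len(positions) > 1:
            direct.append((kmer, positions))
        rc = reverse_complement(kmer)
        if rc == kmer:
            palindromic.append((kmer, positions))
        elif rc in index and kmer < rc:
            inverted.append((kmer, positions, index[rc]))
    return direct, inverted, palindromic
```

With the database's minimum length one would call `classify_repeats(sequence, k=20)`; a smaller `k` is convenient for checking the logic by hand on short strings. For a chromosome-scale sequence an in-memory dictionary like this would be far too large, which is one reason a dedicated index structure or a database-backed design becomes necessary.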