Studying gene regulatory mechanisms is an important part of understanding how eukaryotic genomes operate. The availability of large amounts of gene expression data makes it possible to explore these mechanisms computationally. In general, a group of genes in an organism that share the same expression pattern is likely to be regulated by the same set of transcription factors, so it is feasible to predict computationally the binding sites to which those transcription factors bind. Traditional analysis methods, however, are tedious, inconvenient, and time-consuming. The aim of this study is to design and implement an automated, integrated system for predicting transcription factor binding sites, called RgS-Miner. Given a group of genes as input, RgS-Miner analyzes the upstream regulatory regions of those genes and uses statistically grounded computational methods to predict the transcription factor binding sites that may co-regulate them. The system also applies data mining methods to find occurrence associations among transcription factor binding sites. A web interface lets users query, run, and analyze data directly, and a graphical user interface makes the prediction results easier to understand. Compared with other systems, ours gives biologists a more convenient tool for analyzing gene regulatory mechanisms in eukaryotic genomes.

The availability of genome-wide gene expression data provides sets of genes from which the mechanisms underlying a common transcriptional response can be deciphered. The identification of transcription factor binding sites provides valuable information on gene expression and regulation. Biological data and analysis methods for studying gene expression and transcriptional regulatory sequences have recently become available. However, users must work through a complicated analysis process: querying data from different databases, analyzing gene upstream regions with different prediction tools, and finally converting among different data formats. Beyond methods for predicting transcriptional regulatory sites, new automated and integrated methods for higher-level analysis of gene upstream sequences are needed. Since the identification of regulatory sites requires a large set of biological databases, methods for efficient and integrated data management are also crucial. In this dissertation, we propose a predictive system, designated RgS-Miner, which takes a group of genes as input, i.e., a set of genes considered likely to share common regulatory mechanisms, and is capable of predicting transcriptional regulatory sites in eukaryotes and detecting co-occurrence of these regulatory sites.
The system integrates several regulatory site detection methods: known site matching, over-represented oligonucleotide detection, and DNA motif discovery. Three case studies on the yeast and human genomes are carried out with the proposed system. In addition, the system builds a biological data warehouse that integrates a variety of heterogeneous biological databases. Compared with other systems, our system is a useful tool for analyzing transcriptional regulatory sites when users investigate the regulation of gene expression.
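To make the idea of over-represented oligonucleotide detection concrete, the following is a minimal Python sketch, not the RgS-Miner implementation. It assumes a simple binomial model: each k-mer's frequency in a background sequence set gives its expected count in the upstream regions of the input genes, and k-mers whose observed count exceeds that expectation by a chosen z-score are reported. The function names (`count_kmers`, `overrepresented_kmers`), the pseudocount, and the default threshold are all illustrative assumptions; the statistical model actually used by the system is not specified here.

```python
from collections import Counter
from math import sqrt


def count_kmers(sequences, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
                counts[kmer] += 1
    return counts


def overrepresented_kmers(upstream, background, k=6, min_z=3.0):
    """Return (kmer, observed, expected, z) tuples sorted by descending z.

    Assumption: a binomial model in which each k-mer position in the
    upstream set is an independent draw with the background probability.
    """
    fg = count_kmers(upstream, k)
    bg = count_kmers(background, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    if n_fg == 0 or n_bg == 0:
        return []

    n_possible = 4 ** k
    results = []
    for kmer, observed in fg.items():
        # Add-one pseudocount so k-mers unseen in the background
        # still get a nonzero probability.
        p = (bg.get(kmer, 0) + 1) / (n_bg + n_possible)
        expected = n_fg * p
        sd = sqrt(n_fg * p * (1 - p))
        z = (observed - expected) / sd
        if z >= min_z:
            results.append((kmer, observed, expected, z))

    results.sort(key=lambda r: r[3], reverse=True)
    return results


if __name__ == "__main__":
    # Toy upstream regions that share the hexamer CACGTG (made-up data).
    upstream = ["TTTCACGTGAAA", "GGCACGTGCC", "ATCACGTGTA"]
    background = [
        "ATATATATATATATATATAT",
        "GCGCGCGCGCGCGCGCGCGC",
        "AATTCCGGAATTCCGGAATT",
    ]
    for kmer, observed, expected, z in overrepresented_kmers(upstream, background)[:3]:
        print(kmer, observed, round(expected, 4), round(z, 1))
```

With toy inputs this small, almost every k-mer clears the threshold, so only the ranking is meaningful. On real upstream sets the background would typically be all upstream regions of the genome, and overlapping occurrences and multiple testing would need more careful treatment than this sketch provides.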