DC Field | Value | Language |
DC.contributor | 物理學系 | zh_TW |
DC.creator | 魏建豪 | zh_TW |
DC.creator | Jian-Hao Wei | en_US |
dc.date.accessioned | 2011-08-29T07:39:07Z | |
dc.date.available | 2011-08-29T07:39:07Z | |
dc.date.issued | 2011 | |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=93242003 | |
dc.contributor.department | 物理學系 | zh_TW |
DC.description | 國立中央大學 | zh_TW |
DC.description | National Central University | en_US |
dc.description.abstract | 一個普遍被使用的數理統計方法-齊普夫定律,1994年被Mantegna與他的研究團隊使用在基因序列k字串的發生頻率與其排名的解析上(k字串齊普夫解析),強調非編碼區有類語言的冪次規則。不過,這樣的結論被大量的質疑與討論。
我們整理不同的齊普夫分佈研究領域,發現觀察的重點雖不盡相同,但事件總數為N時,各別事件在隨機狀態時機率均為1/N。然而,基因序列在序列的p(序列A+T含量所佔比)越遠離一半時,各別字串的機率在隨機狀態差異越大,因此在非隨機狀態中,機率不等是受到p與生物特徵兩個因素造成,影響齊普夫分佈的解析判斷。
這個研究中,我們運用不同p的基因體序列與其對應的隨機序列的數據,證實k字串齊普夫子集解析法可以去除p的影響,改善k字串齊普夫解析難以定義隨機序列冪次的障礙,確立子集解析的優勢。
另外,我們擬合四個函式(直線、指數、對數、冪次)選定足以代表物種特徵的「高頻字」(高頻率出現的字串),並嘗試找出865個物種高頻字冪次的普適性。研究結果顯示物種的冪次與其物種複雜度有關,傳達基因複製的演化結果。
| zh_TW |
dc.description.abstract | Zipf’s law characterizes the relation between the frequency of a word in a text and the rank of that word in the frequency table. It states that if the text is written in a natural language, then the frequency versus rank relation is approximately a power law. For a few years in the mid-to-late 1990s, Zipf’s law was intensely discussed in the context of genomic sequences, but no clear consensus was reached as to whether, as a general rule, the word frequencies in genomic sequences, or in some specific portion thereof, obey Zipf’s law (a genomic word is an oligonucleotide of a given length; we call a k-nucleotide word a k-mer). Here we revisit the issue by studying the frequency versus rank relations of a large number of complete genomes, and of parts of genomes having different biological functions. We show that the nucleotide composition influences the frequency versus rank relation of a genomic sequence strongly enough to mask whatever Zipf’s-law behavior the sequence may possess. Once this influence is removed, all genomes obey the same broadly defined classes of Zipf’s laws, with the most important class-defining factor being the k-mer length, that is, the integer k. For eukaryotes, the Zipf’s laws for the exonic and intronic segments of the genome differ significantly. Based on the observation that the Zipf’s law of a sequence is determined by the subset of k-mers having the highest frequencies of occurrence, we derive a relation between the Zipf’s-law exponent and the high-frequency tail of the frequency distribution, and infer that for genomes in general the high-frequency tail is best represented by an exponential function, as opposed to linear, logarithmic, or power-law functions.
| en_US |
DC.subject | 高頻字 | zh_TW |
DC.subject | 排名 | zh_TW |
DC.subject | 字的發生頻率 | zh_TW |
DC.subject | 全基因序列 | zh_TW |
DC.subject | 語言 | zh_TW |
DC.subject | 齊普夫定律 | zh_TW |
DC.subject | 編碼區 | zh_TW |
DC.subject | 非編碼區 | zh_TW |
DC.subject | 外顯子 | zh_TW |
DC.subject | 內含子 | zh_TW |
DC.subject | 頻率分佈 | zh_TW |
DC.subject | 冪次分佈 | zh_TW |
DC.subject | coding parts | en_US |
DC.subject | high-frequency words | en_US |
DC.subject | ranking | en_US |
DC.subject | k-mers | en_US |
DC.subject | frequency of occurrence of words | en_US |
DC.subject | complete genome sequences | en_US |
DC.subject | noncoding parts | en_US |
DC.subject | Zipf’s law | en_US |
DC.subject | natural language | en_US |
DC.subject | exons | en_US |
DC.subject | introns | en_US |
DC.subject | power-law distribution | en_US |
DC.subject | frequency distribution | en_US |
DC.title | 基因序列的k 字齊普夫子集解析 | zh_TW |
dc.language.iso | zh-TW | zh-TW |
DC.title | k-tuple Zipf m-Set analysis on DNA | en_US |
DC.type | 博碩士論文 | zh_TW |
DC.type | thesis | en_US |
DC.publisher | National Central University | en_US |
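
The abstract above describes ranking k-mers by frequency of occurrence and reading a Zipf exponent off the resulting frequency versus rank relation. The following is a minimal illustrative sketch of that general idea in Python, not the thesis's actual procedure. All function names are hypothetical, the exponent is estimated with a plain least-squares fit on log-log values, and the grouping of k-mers by their number of A/T letters is only an assumption about how a composition-based subset might be formed.

```python
from collections import Counter
import math


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string; windows with non-ACGT letters are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def rank_frequency(counts):
    """Return (rank, frequency) pairs, most frequent first, with ranks starting at 1."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(ranked):
    """Estimate the exponent a in freq ~ rank**(-a) by least squares on log-log values.

    Needs at least two distinct ranks and strictly positive frequencies.
    """
    xs = [math.log(r) for r, _ in ranked]
    ys = [math.log(f) for _, f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def at_subsets(counts):
    """Group k-mers by m, the number of A or T letters they contain.

    Assumption for illustration only: within one such subset every k-mer has
    the same probability in a random sequence of fixed A+T content.
    """
    groups = {}
    for word, c in counts.items():
        m = sum(ch in "AT" for ch in word)
        groups.setdefault(m, Counter())[word] = c
    return groups
```

For a sequence `seq`, `zipf_exponent(rank_frequency(kmer_counts(seq, k)))` gives one exponent for the whole k-mer set, and applying the same two calls to each value returned by `at_subsets` gives one exponent per subset.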