Abstract: | We show that the relative spectral width and the Shannon information of a frequency distribution are simply related. Measurements of the relative spectral widths of the frequency distributions of words two to ten nucleotides long (k-mers, k = 2 to 10) in 108 complete bacterial genomes reveal a set of universal "root-sequence lengths" shared by all bacterial genomes; these lengths are independent of genome length and base composition but grow exponentially with k. For a given k, the relative spectral width of every bacterial genome is the same as that of a random sequence whose length is the root-sequence length for k-mers. We use computer modelling to show that such properties of bacterial genomes are reproduced by "quasireplicas": sequences "grown" by maximally stochastic short-segmental duplications from initial random root sequences about 200 bases long. Ideal for storing large amounts of information, quasireplicas are self-organized, complex, aperiodic sequences that appear on short scales as high-multiple replicas of random sequences and on large scales as random sequences. From our findings we infer that genomes began to grow by duplication in a world with only DNA/RNA and devoid of proteins, when the ancestral genomes were about 200 bases long and had acquired a rudimentary duplication machinery. |
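The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions: the relative spectral width is taken to be the standard deviation of the k-mer counts divided by their mean, k-mers are counted with overlap, and the growth model copies a uniformly chosen short segment to a uniformly chosen position. The function names and the parameters `root_len`, `target_len` and `max_seg` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers; k-mers that never occur get a count of 0."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product(ALPHABET, repeat=k)]


def relative_spectral_width(counts):
    """Assumed definition: standard deviation of the counts over their mean."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return math.sqrt(var) / mean


def shannon_entropy(counts):
    """Shannon entropy in bits of the normalised k-mer frequency distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def grow_by_duplication(root_len, target_len, max_seg, rng):
    """Grow a random root sequence by repeated short-segment duplication.

    Assumed model: pick a segment of 1..max_seg bases at a random position
    and insert a copy of it at another random position, until the sequence
    reaches target_len. Requires root_len >= max_seg.
    """
    seq = [rng.choice(ALPHABET) for _ in range(root_len)]
    while len(seq) < target_len:
        seg_len = rng.randint(1, max_seg)
        start = rng.randrange(0, len(seq) - seg_len + 1)
        insert_at = rng.randrange(0, len(seq) + 1)
        seq[insert_at:insert_at] = seq[start:start + seg_len]
    return "".join(seq[:target_len])


if __name__ == "__main__":
    rng = random.Random(0)
    grown = grow_by_duplication(root_len=200, target_len=20000, max_seg=25, rng=rng)
    plain = "".join(rng.choice(ALPHABET) for _ in range(20000))
    for name, seq in (("grown", grown), ("random", plain)):
        counts = kmer_counts(seq, 2)
        print(name, relative_spectral_width(counts), shannon_entropy(counts))
```

Running the script prints the relative spectral width and Shannon entropy of the 2-mer distribution for a grown sequence and for a plain random sequence of the same length, so the two can be compared directly. A narrower distribution corresponds to an entropy closer to the maximum of 2k bits, which is the kind of simple relation between the two quantities that the abstract refers to.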