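Segmental duplication is regarded as the principal driving force of genome evolution and growth, yet the precise relation between biological complexity and genome length remains unclear. After analyzing 865 complete chromosomes with the oligonucleotide frequency method, we find that whole-genome sequences share three universal features: equivalent length, inverse symmetry, and long-range correlation. Guided by these universal statistical properties, we propose a growth mechanism based on segmental duplication and build a model that reproduces the universal features of genomes. In the model, a randomly chosen segment is duplicated and the copy is placed by random insertion, inverse-string insertion, or concatenated insertion; single-point mutations complete the growth mechanism. While searching for parameter values that match the universal features of genomes, we find that the properties of model sequences show no clear association with the length of the duplicated segments but depend more strongly on point mutation. With suitably chosen parameters, model sequences have the same statistical properties as whole-genome sequences, which supports the view that the model's growth mechanism may be the main biochemical mechanism of genome growth.

Segmental duplication has long been considered an important driving force in genome growth and evolution, but a quantitative description of the duplication process and its relation to the complexity of genome structure has been lacking. We use word frequencies to analyze complete genomes and take their non-trivial universal statistical properties (equivalent length, inverse symmetry, and long-range variation) as clues for specifying the nature of the segmental duplication process. We use a minimal genome growth model based on random segmental duplication (RSD) to generate genome-length sequences and compare their statistical properties with those of real genomes. With a few biologically meaningful universal parameters, the RSD model describes well most of the prominent and non-trivial statistical properties of genomes, including the universality of their equivalent lengths and their patterns of long-range variation and inverse symmetry. Our results indicate that neutral and mostly random segmental duplication is a dominant characteristic of genome growth, with duplicated segments (DS) typically 500 to 5000 nucleotides long. About 70% of the duplication events are “tandem” (the DS is proximal to its origin), and about 30% are inverse (the DS is made from one strand to the other). Occasionally a whole genome is inversely duplicated.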
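The growth procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`grow`, `kmer_counts`, `inverse_symmetry_gap`), the seed length, and the one-mutation-per-duplication rate are hypothetical choices made for illustration. The sketch also assumes that "inverse" means reverse complement, and that the tandem (about 70%) and inverse (about 30%) choices are drawn independently; the abstract does not settle either point.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def grow(target_len, seed_len=1000, min_seg=500, max_seg=5000,
         p_tandem=0.7, p_inverse=0.3, mut_per_dup=1, rng=None):
    """Grow a sequence by random segmental duplication plus point mutation.

    seed_len and mut_per_dup are illustrative assumptions; the segment
    length range and the two probabilities follow the abstract.
    """
    rng = rng or random.Random()
    seq = [rng.choice("ACGT") for _ in range(seed_len)]
    while len(seq) < target_len:
        # Pick a random segment of the current sequence to duplicate.
        seg_len = min(rng.randint(min_seg, max_seg), len(seq))
        start = rng.randrange(len(seq) - seg_len + 1)
        seg = seq[start:start + seg_len]
        # Assumption: an inverse copy is the reverse complement.
        if rng.random() < p_inverse:
            seg = list(reverse_complement("".join(seg)))
        # Tandem: insert right after the origin; otherwise anywhere.
        if rng.random() < p_tandem:
            pos = start + seg_len
        else:
            pos = rng.randrange(len(seq) + 1)
        seq[pos:pos] = seg
        # Single-point mutations: replace a base with a different one.
        for _ in range(mut_per_dup):
            i = rng.randrange(len(seq))
            seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(seq[:target_len])


def kmer_counts(seq, k):
    """Count overlapping words (k-mers) of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def inverse_symmetry_gap(seq, k):
    """0.0 when every word occurs as often as its reverse complement."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    return diff / (2 * total)


if __name__ == "__main__":
    genome = grow(50_000, rng=random.Random(0))
    print(len(genome), inverse_symmetry_gap(genome, 4))
```

Passing a seeded `random.Random` makes a run reproducible, so the effect of changing `p_inverse` or `mut_per_dup` on the word-frequency statistics can be compared across otherwise identical runs. The gap measure here is only one simple way to quantify inverse symmetry and is not taken from the source.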