Abstract: In recent years, deep learning has driven rapid progress in music generation, yet most existing systems are built on Western music data and aesthetics, offering limited support for Asian music in key aspects such as pentatonic scale usage, melodic vocabulary, accompaniment texture, and rhythmic design. To address this gap, we developed a web-based Asian music generation system centered on symbolic representation and focused on five styles: Taiwanese Opera, Hakka Folk Song, Traditional Jiangnan Style, Traditional Qin Style, and Japanese Nakashi. The system builds its music database with our customized score representation, YNote. To cope with limited data and fine-grained stylistic vocabulary, we designed a pipeline that combines data augmentation with rule-based preprocessing: a Hidden Markov Model first augments the melody data to improve trainability under small-sample conditions; Simulated Annealing then extracts a melodic skeleton that serves as a shared structural reference for multi-part generation; and a Finite State Machine encodes functional harmony rules as state transitions, providing controllable and traceable constraints on chord progressions.
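The finite-state-machine idea above can be sketched as follows. This is a minimal illustration, not the system's actual rule set: the states (Roman-numeral chord functions) and the transition table are hypothetical stand-ins showing how functional harmony rules become checkable, traceable state transitions.

```python
import random

# Illustrative FSM: each chord function maps to the chord functions it may
# legally move to. These particular transitions are assumptions for the
# sketch, not the rules used by the described system.
TRANSITIONS = {
    "I":  ["ii", "IV", "V", "vi"],
    "ii": ["V"],
    "IV": ["ii", "V", "I"],
    "V":  ["I", "vi"],
    "vi": ["ii", "IV"],
}

def generate_progression(start="I", length=8, seed=0):
    """Walk the FSM to produce a chord progression of the given length."""
    rng = random.Random(seed)
    progression = [start]
    while len(progression) < length:
        progression.append(rng.choice(TRANSITIONS[progression[-1]]))
    return progression

def is_valid(progression):
    """Traceability check: every adjacent pair must be a legal transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(progression, progression[1:]))
```

Because every generated progression is a path through an explicit transition table, any chord choice can be traced back to the rule that permitted it, which is the controllability property the abstract describes.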
In the deep generation stage, melody generation integrates three strategies: a Markov chain method, a deep model that combines supervised and reinforcement learning, and a text-based GPT-2 approach enhanced with reinforcement learning. The system also provides generation modules for accompaniment, harmony, and percussion, and a final integration stage aligns the parts, merges structural elements, and outputs complete pieces. The system offers Chinese and English web interfaces with a parameter-selection workflow, so that after configuring the style and parts, users can produce pieces with a recognizable regional style. Experimental results and feedback from professional musicians indicate that the system performs well in terms of style consistency and musicality, and that it can support applications such as music education, live performance, and multimedia content production. Finally, we outline future directions, including expanding the dataset, modeling more complex musical structures, and strengthening interactive support for iterative creation, to further enhance the system's applicability and extensibility.
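The Markov-chain melody strategy mentioned above can be sketched as a first-order pitch-transition model: count which note follows which in a training corpus, then sample new melodies from those counts. The toy pentatonic corpus below is a hypothetical stand-in for the system's YNote-encoded data, and the note-name encoding is an assumption for the sketch.

```python
import random
from collections import defaultdict

def train(corpus):
    """Count first-order pitch transitions across all training melodies."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in corpus:
        for a, b in zip(melody, melody[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, start, length, seed=0):
    """Sample a melody by weighted choice over the learned transitions."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        nxt = counts[melody[-1]]
        if not nxt:  # dead end: no observed continuation
            break
        notes, weights = zip(*nxt.items())
        melody.append(rng.choices(notes, weights=weights, k=1)[0])
    return melody

# Toy pentatonic corpus standing in for the real training data.
corpus = [["C4", "D4", "E4", "G4", "A4", "G4", "E4", "D4", "C4"]]
model = train(corpus)
print(sample(model, "C4", 8))
```

A first-order chain like this only captures local note-to-note tendencies; the abstract's deeper models (supervised plus reinforcement learning, and RL-enhanced GPT-2) are the strategies intended to capture longer-range structure.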