Generate Jiangnan Percussion Rhythm with GPT-2 and CycleGAN (GPT-2 及 CycleGAN 生成江南風格音樂打擊節奏)


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/95265

Title:	Generate Jiangnan Percussion Rhythm with GPT-2 and CycleGAN (GPT-2 及 CycleGAN 生成江南風格音樂打擊節奏)
Author:	Liu, Chao-Yu (劉洮語)
Contributor:	Institute of Software Engineering (軟體工程研究所)
Keywords:	Percussion Rhythm; GPT-2; CycleGAN; Jiangnan
Date:	2024-07-17
Uploaded:	2024-10-09 16:36:33 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	Since the emergence of automatic music generation technology, we have witnessed a series of advancements. From early neural networks such as DNNs and CNNs to the more recent GANs and LSTMs, each technique has brought new possibilities to music composition. Recently, the application of GPT-2 (Generative Pre-trained Transformer 2) has attracted particular attention. GPT-2 is a pre-trained language model based on the Transformer architecture; it was initially used for natural language processing tasks but has been extended to music generation in recent years. Compared with other techniques, GPT-2 offers significant advantages, such as the ability to train on large amounts of music data and thereby better capture musical structure and style. Its pre-training enables it to generate music segments that are more fluent and natural and that show greater creativity and diversity. In addition, GPT-2 scales well and can be applied to music generation tasks across different genres and styles. However, using GPT-2 for music generation also poses challenges: the model may carry inherent biases or an incomplete understanding of music, so the generated music may lack emotional expression or creativity and may require further post-processing and adjustment.
CycleGAN (Cycle-Consistent Generative Adversarial Network) has also emerged in the field of music generation. CycleGAN uses generative adversarial networks for unpaired image-to-image translation and introduces a cycle-consistency loss to keep the generated content coherent and realistic. In music generation, this technique is particularly suited to converting one musical style into another. For example, CycleGAN can learn and generate music segments with specific stylistic features, which is especially effective for preserving the characteristics of traditional music. Compared with GPT-2, CycleGAN has advantages in capturing and maintaining stylistic consistency but may be slightly lacking in creativity and diversity. Combining the strengths of GPT-2 and CycleGAN can therefore yield a more balanced result, producing natural and creative music segments while preserving the features of the original style.
In this study, we explored the use of both GPT-2 and CycleGAN to automatically generate percussion rhythms that accompany the main melodies of Chinese Jiangnan silk and bamboo (Jiangnan sizhu) music. We extracted music segments from traditional Jiangnan silk and bamboo pieces and used this data to train a GPT-2 model to generate distinctive percussion rhythms. We also used CycleGAN, which generates percussion rhythms in the Jiangnan silk and bamboo style by learning transformations between different musical styles. We found that GPT-2 has advantages in generating diverse and innovative rhythms, while CycleGAN is better at generating coherent, stylized rhythms that preserve the features of the original style. Combining the strengths of both can further improve the quality of generated percussion rhythms, providing a new path for the automated composition of Chinese classical music.
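Note on the cycle-consistency loss mentioned in the abstract: the idea is that mapping a sample to the other domain and back should return something close to the original. The following is only a minimal sketch of that idea, not the thesis's implementation. The function names, the toy mappings (`shift_up`, `shift_down`, `halve`), and the small numeric vectors standing in for melody and rhythm features are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Mapping = Callable[[Vector], List[float]]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    xs: Sequence[Vector], ys: Sequence[Vector], g: Mapping, f: Mapping
) -> float:
    """Average L1 error of x -> g -> f -> x plus y -> f -> g -> y."""
    forward = sum(l1_distance(f(g(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(g(f(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, for illustration only).
def shift_up(v: Vector) -> List[float]:
    return [t + 1.0 for t in v]


def shift_down(v: Vector) -> List[float]:
    return [t - 1.0 for t in v]


def halve(v: Vector) -> List[float]:
    return [t * 0.5 for t in v]


melody_batch = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
rhythm_batch = [[1.0, 2.0, 1.0, 2.0]]

# shift_down exactly inverts shift_up, so the round trip is lossless.
perfect = cycle_consistency_loss(melody_batch, rhythm_batch, shift_up, shift_down)
# halve does not invert shift_up, so the round trip leaves a residual error.
imperfect = cycle_consistency_loss(melody_batch, rhythm_batch, shift_up, halve)
print(perfect, imperfect)
```

In an actual CycleGAN the two mappings are neural network generators, and this term is added to the adversarial losses during training; the sketch only shows how the round-trip error is measured.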
Appears in Collections:	[Institute of Software Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	22

All items in NCUIR are protected by original copyright.
