Master's/Doctoral Thesis 110522033: Detailed Record




Name 黃千豪 (Chien-Hao Huang)   Department Computer Science and Information Engineering
Thesis Title 結合深度監督式學習與強化式學習的音樂旋律生成
(Combining Deep Supervised Learning and Reinforcement Learning for Music Melody Generation)
Related Theses
★ A Grouping Mechanism Based on Social Relationships in edX Online Discussion Forums
★ A 3D Visualized Facebook Interaction System Built with Kinect
★ An Assessment System for Smart Classrooms Built with Kinect
★ An Intelligent Metropolitan Route Planning Mechanism for Mobile Device Applications
★ Dynamic Texture Transfer Based on Analysis of Key Momentum Correlations
★ A Seam Carving System That Preserves Straight-Line Structures in Images
★ A Community Recommendation Mechanism for Open Online Community Learning Environments
★ System Design of an Interactive Situated Learning Environment for English as a Foreign Language
★ An Emotional Color Transfer Mechanism with Skin-Color Preservation
★ A Gesture Recognition Framework for Virtual Keyboards
★ Error Analysis of Fractional-Power Grey Prediction Models and Development of a Computer Toolbox
★ Real-Time Human Skeleton Motion Construction Using Inertial Sensors
★ Real-Time 3D Modeling Based on Multiple Cameras
★ A Grouping Mechanism for Genetic Algorithms Based on Complementarity and Social Network Analysis
★ A Virtual Musical Instrument Performance System with Real-Time Hand Tracking
★ A Real-Time Virtual Musical Instrument Performance System Based on Neural Networks
Files  Full text available for browsing in the system after 2025-7-5.
Abstract (Chinese) In this study, we combine deep supervised learning and reinforcement learning for symbolic music generation. When deep learning is used to model symbolic music, a music clip can be treated as a sequence of symbols along time, so models capable of capturing temporal information are typically used, as in text modeling, natural language processing, and other sequence-modeling tasks. In such supervised approaches, a deep neural network can automatically capture musical features from an existing dataset. However, human compositions usually contain well-defined structures and conventional rules of music theory that make them more pleasing to listeners. These constraints, which are difficult to satisfy with supervised learning alone, can be imposed on the neural network through reinforcement learning. By combining these two major training frameworks of deep learning, we can make the model imitate the style of an existing dataset while also controlling specific behaviors of the generated melody. We also investigate the design of the input representation and the architecture so that the model can capture structural features of music more easily. In the experiments, we focus on monophonic melody generation in the Chinese Jiangnan style, and we verify the quality and characteristics of the generated results as well as the effectiveness of the different modules in the architecture.
Abstract (English) In this work, we present a symbolic music melody generation method that combines supervised learning and reinforcement learning. When deep learning is applied to symbolic music modeling, music clips can be processed as sequences of symbols along time, so sequence models capable of capturing temporal information are typically used, as in other sequence modeling tasks such as text modeling and natural language processing. In this kind of supervised approach, a deep neural network can automatically capture musical features from an existing dataset. However, music written by human composers usually follows well-defined structures and conventional rules of music theory that please the audience. These constraints can be enforced on the neural network with reinforcement learning, which is difficult to achieve with supervised learning techniques alone. By combining these two major training paradigms of deep learning, we can make the model mimic the style of the existing dataset while also controlling specific behaviors of the generated melody. We also investigate the design of the input representation and the architecture so that the model can capture structural features of music more easily. In the experiments, we focus on monophonic melody generation of Chinese Jiangnan-style music, and we validate the quality and characteristics of the generated results as well as the effectiveness of different modules in the architecture.
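To make the two-phase training idea in the abstract concrete, the following is a minimal sketch in PyTorch (which the thesis itself builds on, per references [27] and [28]) of an LSTM melody model that is first trained with teacher-forced next-token prediction on an existing corpus and then fine-tuned with a PPO-style clipped policy-gradient update driven by a hand-crafted reward. This is not the thesis implementation: the names (MelodyLSTM, supervised_step, ppo_style_step, toy_reward, VOCAB, SEQ_LEN) are illustrative assumptions, the reward is a toy penalty on melodic leaps larger than an octave standing in for the pitch, duration, and bar-profile reward models listed in Chapter IV, and the sequence-level probability ratio with a critic-free advantage is a simplification of full recurrent PPO.

# Minimal illustrative sketch (not the thesis code): an LSTM melody model is first
# trained with supervised next-token prediction, then fine-tuned with a PPO-style
# clipped policy-gradient update under a hand-crafted music-theory reward.
# All names (MelodyLSTM, toy_reward, VOCAB, ...) are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 128      # e.g. MIDI-like pitch tokens; duration tokens would be handled analogously
SEQ_LEN = 32     # length of each sampled melody, in tokens

class MelodyLSTM(nn.Module):
    def __init__(self, vocab=VOCAB, emb=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.head(out), state            # logits over the next token

def supervised_step(model, batch, optimizer):
    """Phase 1: teacher-forced next-token prediction on an existing corpus."""
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def toy_reward(tokens):
    """Stand-in reward: penalize melodic leaps larger than an octave (12 semitones)."""
    leaps = (tokens[:, 1:] - tokens[:, :-1]).abs()
    return -(leaps > 12).float().mean(dim=1)     # one scalar reward per sequence

def ppo_style_step(model, old_model, optimizer, batch_size=8, clip=0.2):
    """Phase 2: sample melodies, score them, and apply a clipped surrogate loss."""
    with torch.no_grad():
        tokens = torch.randint(0, VOCAB, (batch_size, 1))    # random seed note
        state, seq = None, [tokens]
        for _ in range(SEQ_LEN - 1):
            logits, state = model(tokens, state)
            tokens = torch.multinomial(F.softmax(logits[:, -1], dim=-1), 1)
            seq.append(tokens)
        seq = torch.cat(seq, dim=1)                          # sampled melodies
        advantage = toy_reward(seq)                          # critic-free advantage

    def seq_logprob(m):
        logits, _ = m(seq[:, :-1])
        logp = F.log_softmax(logits, dim=-1)
        return logp.gather(-1, seq[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=1)

    with torch.no_grad():
        logp_old = seq_logprob(old_model)                    # frozen sampling policy
    ratio = torch.exp(seq_logprob(model) - logp_old)
    loss = -torch.min(ratio * advantage,
                      torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In use, old_model would be a frozen copy of the current policy (for example copy.deepcopy(model)) refreshed before each round of updates, so the clipped ratio keeps every step close to the policy that generated the samples; a full treatment would also carry the LSTM hidden state through the rollout, as in the PPO-with-LSTM architecture of Section 4-4.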
Keywords (Chinese) ★ 人工智慧 (Artificial Intelligence)
★ 深度學習 (Deep Learning)
★ 音樂生成 (Music Generation)
Keywords (English) ★ Artificial Intelligence
★ Deep Learning
★ Music Generation
Table of Contents
Chinese Abstract
English Abstract
Table of Contents
I Introduction
II Related Works
III Background
 3-1 LSTM Sequence Generation
 3-2 PPO (Proximal Policy Optimization) Algorithms
IV Method
 4-1 Architecture Overview
 4-2 Hierarchical Recurrent Neural Network (Bar Profile)
 4-3 Input Representation and Additional Rhythmic Information
 4-4 PPO with LSTM Architecture
 4-5 Positional Encoding
 4-6 Reward Functions Design
  4-6-1 Pitch Model
  4-6-2 Duration Model
  4-6-3 Bar Profile Model
 4-7 Implementation Details
V Experiments
 5-1 Dataset
 5-2 Training
 5-3 Evaluation
  5-3-1 Characteristics
  5-3-2 Positional Encoding
  5-3-3 Additional Rhythmic Information
  5-3-4 RL Tuner Method
 5-4 Result Examples
VI Discussions and Conclusions
References
References [1] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. “Proximal Policy Optimization Algorithms”. arXiv:1707.06347, 2017.
[2] Natasha Jaques, Shixiang Gu, Richard E. Turner, Douglas Eck. “Tuning Recurrent Neural Networks with Reinforcement Learning”. arXiv:1611.02796v2, 2016.
[3] Hado van Hasselt, Arthur Guez, David Silver. “Deep Reinforcement Learning with Double Q-learning”. arXiv:1509.06461, 2015.
[4] Sepp Hochreiter, Jürgen Schmidhuber. “Long Short-Term Memory”. Neural Computation, 1997.
[5] Peter M. Todd. “A Connectionist Approach to Algorithmic Composition”. Computer Music Journal (CMJ), Vol. 13, No. 4, pp. 27-43, 1989.
[6] Michael C. Mozer. “Neural Network Music Composition by Prediction: Exploring the Benefits of Psychophysical Constraints and Multi-scale Processing”. Connection Science, Vol. 6 (2-3):247-280, 1994.
[7] Alex Graves. “Generating Sequences With Recurrent Neural Networks”. arXiv:1308.0850v5, 2014.
[8] Douglas Eck, Jürgen Schmidhuber. “A First Look at Music Composition using LSTM Recurrent Neural Networks”. IDSIA/USI-SUPSI, Technical Report No. IDSIA-07-02, Switzerland, 2002.
[9] Gaëtan Hadjeres, François Pachet, Frank Nielsen. “DeepBach: a Steerable Model for Bach Chorales Generation”. arXiv:1612.01010v2, 2016.
[10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. “Generative Adversarial Networks”. arXiv:1406.2661, 2014.
[11] Olof Mogren. “C-RNN-GAN: Continuous recurrent networks with adversarial training”. arXiv:1611.09904, 2016.
[12] Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang. “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment”. arXiv:1709.06298v2, 2017.
[13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. “Attention Is All You Need”. arXiv:1706.03762v5, 2017.
[14] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck. “Music Transformer: Generating Music with Long-Term Structure”. arXiv:1809.04281v3, 2018.
[15] Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang. “RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning”. arXiv:2002.03082, 2020.
[16] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. “Playing Atari with Deep Reinforcement Learning”. arXiv:1312.5602, 2013.
[17] Long short-term memory, Wikipedia. https://en.wikipedia.org/wiki/Long_short-term_memory#/media/File:LSTM_Cell.svg
[18] Xuefei Huang, Seung Ho Hong, Mengmeng Yu, Yuemin Ding, Junhui Jiang. “Demand Response Management for Industrial Facilities: A Deep Reinforcement Learning Approach”. IEEE Access, vol. 7, pp. 82194-82205, 2019.
[19] Daphné Lafleur, Sarath Chandar, Gilles Pesant. “Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints”. 28th International Conference on Principles and Practice of Constraint Programming (CP 2022), 2022.
[20] Harish Kumar, Balaraman Ravindran. “Polyphonic Music Composition with LSTM Neural Networks and Reinforcement Learning”. arXiv:1902.01973v2, 2019.
[21] Training an RNN without Supervision, Machine Learning for Scientists. https://ml-lectures.org/docs/unsupervised_learning/ml_unsupervised-2.html
[22] Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding”. arXiv:1611.05416v2, 2016.
[23] Pedro Borges. Deep Learning: Recurrent Neural Networks. https://medium.com/deeplearningbrasilia/deep-learning-recurrent-neural-networks-f9482a24d010
[24] Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu. “A Hierarchical Recurrent Neural Network for Symbolic Melody Generation”. arXiv:1712.05274v2, 2017.
[25] Recurrent PPO, Stable Baselines3 - Contrib https://sb3-contrib.readthedocs.io/en/master/modules/ppo_recurrent.html
[26] Sho Takase, Naoaki Okazaki. “Positional Encoding to Control Output Sequence Length”. arXiv:1904.07418, 2019.
[27] PyTorch. https://pytorch.org/
[28] LSTM. PyTorch. https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
[29] Shulei Ji, Xinyu Yang, Jing Luo, and Juan Li. “RL-Chord: CLSTM-Based Melody Harmonization Using Deep Reinforcement Learning”. IEEE Transactions on Neural Networks and Learning Systems (Early Access), 2023.
[30] Diederik P. Kingma, Jimmy Lei Ba. “Adam: A Method for Stochastic Optimization”. arXiv:1412.6980v9, 2014.
[31] Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan. “This Time with Feeling: Learning Expressive Musical Performance”. arXiv:1808.03715, 2018.
[32] Proximal Policy Optimization. OpenAI. https://openai.com/research/openai-baselines-ppo
[33] John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel. “Trust Region Policy Optimization”. arXiv:1502.05477v5, 2015.
[34] Magenta, Google. https://magenta.tensorflow.org/
[35] Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet. “Deep Learning Techniques for Music Generation – A Survey”. arXiv:1709.01620v4, 2017.
[36] Nonchord tone, Wikipedia. https://en.wikipedia.org/wiki/Nonchord_tone
[37] 施國琛, 張儷瓊, 黃志方, 孫沛立. “Erhu Performance and Music Style Analysis Using Artificial Intelligence” (“以人工智慧實踐二胡演奏行為暨音樂風格分析”). National Science and Technology Council (國家科學及技術委員會), NSTC 112-2420-H-008-002.
[38] “中國民間歌曲集成” (Anthology of Chinese Folk Songs). 中國民間歌曲集成全國編輯委員會 (National Editorial Committee of the Anthology of Chinese Folk Songs), 1988.
[39] 蒲亨強. “江蘇地域音樂文化” (Regional Music Culture of Jiangsu). 2014.
[40] 劉健. “葫蘆絲作品風格研究” (A Study on the Musical Style of Hulusi Works). 2013.
[41] 周美妤. “朱昌耀《揚州小調》、《江南春色》作品分析與詮釋” (An Analysis and Interpretation of Zhu Chang-Yao's “Yangzhou Xiaodiao” and “Jiangnan Chunse”). 2016.
Advisor 施國琛 (Kuo-Chen Shih)   Review Date 2023-7-11
