Name: Wan-Chin Ting (丁婉芩)
Department: Computer Science and Information Engineering
Thesis title: 使用生成對抗網路進行音樂風格轉換 (Using Generative Adversarial Network for Music Transformation)
Full text available in the thesis system after 2025-07-04.
Abstract (Chinese): Automatic music generation refers to the process of creating music with computer algorithms and artificial intelligence techniques. It has a long history that can be traced back to the mid-20th century. Researchers have employed a range of methods and techniques over the years; from rule-based systems and evolutionary algorithms to the rise of machine learning and neural networks, automatic music generation has made remarkable progress and can now produce more creative and diverse musical works. At the same time, advances in digital audio processing have made the analysis and transformation of music easier and more accurate. Automatic music generation has a wide range of applications: it not only offers new possibilities and sources of inspiration for music composition, but also helps people save time and resources by quickly generating music that meets their needs.
Gatys et al. [1] introduced the term "style transfer" in the context of neural networks; it usually refers to preserving the explicit content features of one image while applying the salient style features of another image to it. Music style transfer separates and recombines the musical content and musical style of different pieces to produce novel music with creative, synthetic characteristics. Musical style can refer to musical features at any level, and the boundary between content and style is highly dynamic, depending on the timbre, performance style, or compositional objective functions associated with different style transfer problems [2].
This study proposes a method that realizes music style transfer within the Generative Adversarial Network (GAN) framework [3]. During preprocessing we convert the music into pianoroll images, treating the music as pictures, and apply the CycleGAN model [4] to transfer between a piece's opening and closing phrases (its anacrusis and coda) and the complete piece, so that a user who supplies only these phrases obtains the corresponding complete piece. In the implementation we use not only deep learning frameworks but also domain-specific musical knowledge for data processing, further analyzing and optimizing the transformation results to improve their quality and practicality. We also compare how different generator and discriminator architectures perform on our dataset.
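As a rough illustration of the pianoroll preprocessing described above (not the thesis's own code), the sketch below renders a MIDI file as a binary pianoroll image; the pretty_midi and Pillow libraries, the sampling rate, and the file names are assumptions chosen for the example.

```python
# Illustrative sketch only: one plausible way to turn MIDI into a
# pianoroll image; libraries, resolution, and file names are assumed.
import numpy as np
import pretty_midi
from PIL import Image

def midi_to_pianoroll_image(midi_path: str, fs: int = 8) -> Image.Image:
    """Render a MIDI file as a 128 x T black-and-white pianoroll image.

    fs is the number of time steps per second; at 120 BPM, fs=8 is
    roughly sixteenth-note resolution.
    """
    pm = pretty_midi.PrettyMIDI(midi_path)
    roll = pm.get_piano_roll(fs=fs)          # (128, T) array of velocities
    binary = (roll > 0).astype(np.uint8) * 255  # binarize to note on/off
    # Flip vertically so high pitches appear at the top of the image.
    return Image.fromarray(binary[::-1, :], mode="L")

img = midi_to_pianoroll_image("phrase.mid")  # hypothetical input file
img.save("phrase_pianoroll.png")
```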
The advantage of this method is that it automatically generates the corresponding complete piece, which improves the practicality of music style transfer. This research offers new ideas and methods for the fields of music style transfer and automatic music generation. Future work can further explore transfer between different musical styles and apply it to music composition, music education, and other areas, enriching and expanding the possibilities of music creation.
Abstract (English): Automatic music generation refers to the process of creating music using computer algorithms and artificial intelligence techniques. It has a long history and can be traced back to the mid-20th century. Researchers have employed various methods and techniques over the years, ranging from rule-based systems and evolutionary algorithms to the emergence of machine learning and neural networks. Automatic music generation has made significant progress, enabling the generation of more creative and diverse music compositions. Furthermore, the impact of digital audio processing technology has made music analysis and transformation easier and more accurate.
Automatic music generation has a wide range of applications. It not only provides new possibilities and sources of inspiration for music composition but also helps save time and resources by quickly generating music compositions that meet specific requirements.
In the context of neural networks, Gatys et al. [1] introduced the term "style transfer," which typically refers to preserving the explicit content features of an image and applying the salient style features of another image to it. In the case of music, style transfer involves separating and recombining the musical content and musical style of different music compositions to generate novel music with creative and synthesized characteristics. Music style can refer to different levels of musical features, and the boundary between content and style is highly dynamic, depending on factors such as timbre, performance style, or compositional objectives, which are associated with different style transfer problems [2].
This study proposes a method for music style transfer using the Generative Adversarial Network (GAN) framework [3]. During preprocessing, we convert the music into pianoroll images, treating music as images, and utilize the CycleGAN model [4] for style transfer between musical phrases and complete compositions. This allows users to provide only the musical phrases and obtain the corresponding complete compositions. In the implementation of the method, we not only employ deep learning frameworks but also utilize domain-specific music knowledge for data processing to further analyze and optimize the transformation results, enhancing the quality and practicality of the conversion. We also compare the performance of different generator and discriminator architectures on our dataset.
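For readers unfamiliar with the CycleGAN objective [4] that this method builds on, the following is a minimal sketch of the generator-side loss applied to pianoroll tensors; the module names G_AB, G_BA, D_A, D_B, the least-squares adversarial term, and the weight lam are illustrative assumptions, not the thesis implementation.

```python
# Minimal CycleGAN generator loss sketch (PyTorch); the networks are
# assumed nn.Module instances, e.g. ResNet generators and PatchGAN
# discriminators as in the original CycleGAN paper [4].
import torch
import torch.nn.functional as F

def generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B, lam=10.0):
    """One generator-side step: adversarial + cycle-consistency terms.

    real_A: pianoroll batch from domain A (e.g., phrases)
    real_B: pianoroll batch from domain B (e.g., complete pieces)
    """
    fake_B = G_AB(real_A)          # translate A -> B
    fake_A = G_BA(real_B)          # translate B -> A
    pred_B = D_B(fake_B)           # discriminator scores for the fakes
    pred_A = D_A(fake_A)
    # Least-squares adversarial terms: generators try to score as real (1).
    adv = F.mse_loss(pred_B, torch.ones_like(pred_B)) + \
          F.mse_loss(pred_A, torch.ones_like(pred_A))
    # Cycle consistency: translating there and back should reconstruct.
    cyc = F.l1_loss(G_BA(fake_B), real_A) + F.l1_loss(G_AB(fake_A), real_B)
    return adv + lam * cyc         # lam = 10 as in the CycleGAN paper
```

The cycle-consistency term is what makes training possible on unpaired examples of the two domains, since no aligned phrase/complete-piece pairs are required.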
The advantage of this method lies in its ability to automatically generate corresponding complete compositions, thus enhancing the practicality of music style transfer. This research provides new ideas and methods for the field of music style transfer and automatic music generation. Future research directions can further explore transformations between different music styles and apply them to music composition, music education, and other domains, enriching and expanding the possibilities of music creation.
Keywords (Chinese):
★ Music style transfer
★ Generative Adversarial Network (GAN)
★ Automatic music generation
★ Pianoroll images
★ Anacrusis and coda
Keywords (English):
★ Music style transfer
★ Generative Adversarial Network (GAN)
★ Automatic music generation
★ Pianoroll images
★ Anacrusis and Coda
Table of Contents
Abstract (Chinese)
Abstract (English)
Content
1. Introduction
1.1. Motivation and background of the research
1.2. Research purpose and contributions
2. Related Work
2.1. Existing methods for music style transfer
2.1.1. Rule-based methods
2.1.2. Neural network-based methods
2.1.3. Hybrid methods
2.2. Music representation methods
2.2.1. Digital Audio Representation
2.2.2. Spectral Representation
2.2.3. Note-based Representation
2.2.4. Music Symbol Representation
2.2.5. Hybrid Representation
2.3. Application of pianoroll images in music processing
2.4. Advantages and disadvantages of cycleGAN models in image transformation
3. Methodology
3.1. Method for generating pianoroll images
3.2. Introduction and preprocessing of the dataset
3.3. Design and implementation of the cycleGAN model
3.3.1. Model overview
3.3.2. Generator Design
3.3.3. Discriminator Design
3.3.4. Loss function
3.4. Post-processing
4. Experiments and Results
4.1. Training
4.1.1. Training Strategy
4.1.2. Training Optimization Methods and Results
4.2. Evaluation Metrics
4.3. Discussion of Research Findings
5. Conclusion
6. References
References
[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016.
[2] Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, and Li Su. Play as you like: Timbre-enhanced multi-modal music style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1061–1068, 2019.
[3] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. In Advances in Neural Information Processing Systems (NIPS), 2014.
[4] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
[5] David Cope. Experiments in musical intelligence (EMI): Non-linear linguistic-based composition. Journal of New Music Research, 18(1-2):117–139, 1989.
[6] Iannis Xenakis. Musiques formelles: nouveaux principes formels de composition musicale. 1981.
[7] https://koenigproject.nl/project-1/, 2019. Accessed July 2021.
[8] Peter M. Todd. A connectionist approach to algorithmic composition. Computer Music Journal, 13(4):27–43, 1989.
[9] Gerhard Nierhaus. Algorithmic composition: paradigms of automated music generation. Springer Science & Business Media, 2009.
[10] Lejaren A. Hiller Jr. and Leonard M. Isaacson. Musical composition with a high-speed digital computer. In Audio Engineering Society Convention 9. Audio Engineering Society, 1957.
[11] Gaëtan Hadjeres, François Pachet, and Frank Nielsen. DeepBach: a steerable model for Bach chorales generation. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1362–1371. PMLR, 2017.
[12] Michael C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3):247–280, 1994.
[13] Douglas Eck. A network of relaxation oscillators that finds downbeats in rhythms. In Georg Dorffner, Horst Bischof, and Kurt Hornik, editors, Artificial Neural Networks - ICANN 2001, International Conference, Vienna, Austria, August 21-25, 2001, Proceedings, volume 2130 of Lecture Notes in Computer Science, pages 1239–1247. Springer, 2001.
[14] Douglas Eck and Jürgen Schmidhuber. Learning the long-term structure of the blues. In José R. Dorronsoro, editor, Artificial Neural Networks - ICANN 2002, International Conference, Madrid, Spain, August 28-30, 2002, Proceedings, volume 2415 of Lecture Notes in Computer Science, pages 284–289. Springer, 2002.
[15] Jamshed J. Bharucha and Peter M. Todd. Modeling the perception of tonal structure with neural nets. Computer Music Journal, 13(4):44–53, 1989.
[16] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1):43–76, 2021.
[17] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
[18] Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[19] Adam Roberts, Jesse H. Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music. In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 4361–4370. PMLR, 2018.
[20] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS'14, pages 2672–2680, Cambridge, MA, USA, 2014. MIT Press.
[21] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
[22] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[23] Kunihiko Fukushima and Sei Miyake. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15(6):455–469, 1982.
[24] Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. Object recognition with gradient-based learning. In David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla, editors, Shape, Contour and Grouping in Computer Vision, volume 1681 of Lecture Notes in Computer Science, page 319. Springer, 1999.
[25] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008, 2017.
[26] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, and Douglas Eck. Music transformer: Generating music with long-term structure. arXiv preprint arXiv:1809.04281, 2018.
[27] Shuqi Dai, Zheng Zhang, and Gus Xia. Music style transfer: A position paper. In Proceedings of the 6th International Workshop on Musical Metacreation (MUME), 2018.
[28] Gwen Henderson. Sampling and quantization. https://slideplayer.com/slide/15463344/
[29] Bracewell, Ronald N., The Fourier Transform and Its Applications, Second Edition. McGraw-Hill Book Company. New York, 1978. pp. 356-381.
[30] Davide Baccherini, Donatella Merlini, and Renzo Sprugnoli. Tablatures for stringed instruments and generating functions. 2007.
[31] Muhaddisa Barat Ali, Irene Yu-Hua Gu, Mitchel S. Berger, Johan Pallud, Derek Southwell, Georg Widhalm, Alexandre Roux, Tomás Gomez Vecchio, and Asgeir Store Jakola. Domain mapping and deep learning from multiple MRI clinical datasets for prediction of molecular subtypes in low grade gliomas. 2020.
[32] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[33] Chuan Li and Michael Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. arXiv preprint arXiv:1604.04382, 2016.
[34] Tijmen Tieleman and Geoffrey Hinton. RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
[35] Leslie N. Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017.
[36] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
[37] Tijmen Tieleman and Geoffrey Hinton. RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
[38] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
[39] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 448–456, 2015.
[40] Ivan P. Yamshchikov and Alexey Tikhonov. Music generation with variational recurrent autoencoder supported by history. arXiv preprint arXiv:1705.05458, 2017.
[41] Li-Chia Yang and Alexander Lerch. On the evaluation of generative models in music. Neural Computing and Applications, 32(9):4773–4784, 2020.
Advisor: Kuo-Chen Shih (施國琛)
Approval date: 2023-07-13