Graduate Thesis 108522606: Detailed Record




Author: CHAPUIS Vincent (沙文森)    Department: Computer Science and Information Engineering
Thesis Title: 以人工智慧方法驅動音樂轉錄與生成
(AI Driven Music Transcription and Generation)
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access full text is licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese)
Music transcription and generation is a broad field of research that has drawn many curious and hopeful researchers, who actively pursue innovative techniques that go beyond the current state of the art, using technical breakthroughs to accomplish tasks that were previously out of reach. To better understand the difficulties in this field, this thesis conducts an in-depth literature review of music transcription and generation, proposes an improved algorithm for each difficulty, and verifies its robustness through experiments. First, the thesis adopts the WaoN algorithm to transcribe notes and develops an easy-to-use graphical user interface. Next, it shows how, given an adequate dataset, a recurrent neural network built on a convolutional neural network base can be trained with deep learning to reach the expected results. In addition, it shows how newer models such as the Transformer can use the transcribed notes as material for music generation. All experiments in this thesis take MIDI files as the deep-learning input and are implemented with the PyTorch framework. Finally, the thesis discusses the experimental results in depth and considers how this research could be further refined into an interactive product that offers composers a friendlier graphical interface.
Abstract (English)
Music transcription and generation is a wide field that has been explored by many with hope and curiosity: hope to reach and surpass human skill and creativity, and curiosity to find new ways of accomplishing tasks that were either difficult or impossible for previously existing technology. In this thesis, we explore this field and review the existing techniques used to carry out these tasks. We then introduce the several approaches tested during our research, either to try new methods or to improve on the current state of the art. We first used an algorithmic approach based on the WaoN algorithm to transcribe music notes, and developed a Graphical User Interface to support this task. We then show how a deep-learning approach, a Convolutional Neural Network linked with a Recurrent Neural Network, can give satisfying results when an adequate dataset is chosen, and how deep learning can also be a great asset for generating music with cutting-edge models such as the Transformer. For all these tasks we mainly used the MIDI file format and Python frameworks such as PyTorch. We finally discuss how these techniques can support composers, helping them create new music and refine their ideas, and how future work on this subject could focus on creating an ergonomic user interface for production use.
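
To make the two technical components of the abstract concrete, here are two brief sketches. First, the transcription side: a Convolutional Neural Network front end feeding a Recurrent Neural Network that predicts framewise note activity. This is a minimal, hypothetical PyTorch sketch with assumed dimensions (a 229-bin spectrogram, 88 piano pitches, arbitrary layer sizes); it is not the architecture actually trained in the thesis.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of a CNN-into-RNN framewise transcriber of the kind
    # the abstract describes. Every dimension and layer choice here is an
    # assumption made for illustration, not the model trained in the thesis.
    class CnnRnnTranscriber(nn.Module):
        def __init__(self, n_bins=229, n_pitches=88, hidden=256):
            super().__init__()
            # CNN front end: local time-frequency features from the spectrogram.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),  # pool along frequency only, keep every frame
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),
            )
            # RNN: temporal context across frames (onsets, sustained notes).
            self.rnn = nn.LSTM(32 * (n_bins // 4), hidden,
                               batch_first=True, bidirectional=True)
            # One logit per (frame, pitch) cell of the piano roll.
            self.head = nn.Linear(2 * hidden, n_pitches)

        def forward(self, spec):
            # spec: (batch, frames, freq_bins), e.g. a log-magnitude spectrogram
            x = self.conv(spec.unsqueeze(1))             # -> (B, C, T, F//4)
            b, c, t, f = x.shape
            x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
            x, _ = self.rnn(x)                           # -> (B, T, 2*hidden)
            return self.head(x)                          # -> (B, T, n_pitches)

    # Training treats transcription as multi-label classification per frame.
    model = CnnRnnTranscriber()
    spec = torch.randn(2, 100, 229)                      # 2 clips, 100 frames each
    roll = torch.randint(0, 2, (2, 100, 88)).float()     # binary piano-roll targets
    loss = nn.BCEWithLogitsLoss()(model(spec), roll)
    loss.backward()

Framing transcription as one binary label per (frame, pitch) cell also makes the imbalance problem named in the table of contents ("Imbalanced Dataset", "Simple Strategy for unbalanced Dataset", "Triplet Ranking") concrete: silent cells vastly outnumber sounding notes. Second, the generation side: serializing MIDI into a sequence of event tokens and training a Transformer to predict the next event. Again a hypothetical sketch: the vocabulary size (388, a common size for MIDI performance-event encodings) and all dimensions are assumptions, and the actual Music Transformer additionally uses relative positional self-attention, which is omitted here for brevity.

    # Hypothetical sketch of Transformer-based symbolic music generation:
    # MIDI is serialized into event tokens and the model is trained to
    # predict the next token. All sizes are illustrative assumptions.
    class MusicTokenModel(nn.Module):
        def __init__(self, vocab=388, d_model=256, n_heads=4, n_layers=4, max_len=2048):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            self.pos = nn.Embedding(max_len, d_model)    # absolute positions
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.out = nn.Linear(d_model, vocab)

        def forward(self, tokens):
            t = tokens.size(1)
            # Causal mask: each event may only attend to earlier events.
            mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
            x = self.embed(tokens) + self.pos(torch.arange(t))
            return self.out(self.encoder(x, mask=mask))  # next-event logits

    model = MusicTokenModel()
    tokens = torch.randint(0, 388, (2, 64))              # 2 token sequences
    logits = model(tokens)
    # Shifted cross-entropy: position i predicts token i+1.
    loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 388),
                                 tokens[:, 1:].reshape(-1))
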
Keywords (Chinese): ★ Music
★ Deep Learning
★ Transcription
★ Generation
Keywords (English): ★ Music
★ Deep Learning
★ Transcription
★ Generation
Table of Contents
Abstract
Acknowledgements
1 Introduction
2 Related Work
2.1 Music
2.1.1 Definition
2.1.2 Notation
MIDI
PianoRoll
Music Score
2.1.3 Digitization
2.2 Deep Learning
2.2.1 Concept
2.3 Automatic Music Transcription
2.3.1 Impact of Spectrogram type on performance
2.3.2 Onsets and frames
2.3.3 Imbalanced Dataset
2.4 Music Generation
2.4.1 MidiNet, a DCGAN
2.4.2 MuseGAN
2.4.3 Transformer
2.4.4 Music Transformer
3 Methodology
3.1 A first algorithmic approach
3.1.1 Embedded Application
3.1.2 WaoN, a Wave-to-Notes transcriber
3.1.3 SoX, the Swiss Army knife of audio manipulation
3.1.4 The experiment
3.2 Deep Learning Metrics
3.2.1 Transcription
Common Metrics
First used Metric
Second used Metric
ROC, precision vs recall
Third Metric: mAP
Fourth Metric: mir_eval
3.2.2 Generation
Subjective Metrics
Objective Metrics
3.3 Deep Learning Architecture
3.3.1 Proof of Concept
First Proof of Concept
Second Proof of Concept
3.3.2 Transcription
Data Pre-Processing
First Modification: Short Range Detector
Second Modification: Triplet-Ranking
Third Unfinished Modification: Transformer
3.3.3 Generation
Different Dataset and Style premise
4 Experiment
4.1 Dataset
4.1.1 MAPS
4.1.2 MusicNet
4.1.3 Maestro
4.1.4 Dataset Comparison
4.2 Results
4.2.1 Transcription: Short Range Detector
Reproduction
Ablation-like study
Unbalanced Dataset Discovery
Simple Strategy for unbalanced Dataset
Advanced Method: Triplet Ranking
4.2.2 Generation: Different Style Inference
5 Discussion and Conclusion
5.1 Interpretations and insights
5.2 Future Work
References
References
[1] Paula Branco, Luís Torgo, and Rita P. Ribeiro. "A Survey of Predictive Modelling under Imbalanced Distributions". In: CoRR abs/1505.01658 (2015). arXiv: 1505.01658. URL: http://arxiv.org/abs/1505.01658.
[2] Kin Wai Cheuk, Kat Agres, and Dorien Herremans. "The impact of Audio input representations on neural network based music transcription". In: ArXiv abs/2001.09989 (2020).
[3] Kyunghyun Cho et al. "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches". In: ArXiv abs/1409.1259 (2014).
[4] Hao-Wen Dong et al. "MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks". In: ArXiv abs/1709.06298 (2017).
[5] Qi Dong, Shaogang Gong, and Xiatian Zhu. "Imbalanced Deep Learning by Minority Class Incremental Rectification". In: CoRR abs/1804.10851 (2018). arXiv: 1804.10851. URL: http://arxiv.org/abs/1804.10851.
[6] Jeffrey L. Elman. "Finding Structure in Time". In: Cognitive Science 14.2 (1990), pp. 179–211. DOI: 10.1207/s15516709cog1402_1.
[7] Valentin Emiya. "Transcription automatique de la musique de piano". 2008. URL: https://pastel.archives-ouvertes.fr/pastel-00004867/document.
[8] Kunihiko Fukushima. "Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". In: Biological Cybernetics 36 (1980), pp. 193–202.
[9] Ian Goodfellow et al. "Generative Adversarial Nets". In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2672–2680. URL: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
[10] Curtis Hawthorne et al. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset". In: International Conference on Learning Representations. 2019. URL: https://openreview.net/forum?id=r1lYRjC9F7.
[11] Curtis Hawthorne et al. "Onsets and Frames: Dual-Objective Piano Transcription". In: CoRR abs/1710.11153 (2017). arXiv: 1710.11153. URL: http://arxiv.org/abs/1710.11153.
[12] Sepp Hochreiter and Jürgen Schmidhuber. "Long Short-Term Memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[13] Anna Huang et al. "Visualizing Music Self-Attention". 2018. URL: https://openreview.net/pdf?id=ryfxVNEajm.
[14] Cheng-Zhi Anna Huang et al. "Music Transformer: Generating Music with Long-Term Structure". In: arXiv preprint arXiv:1809.04281 (2018).
[15] Muhammad Huzaifah. "Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks". In: CoRR abs/1706.07156 (2017). arXiv: 1706.07156. URL: http://arxiv.org/abs/1706.07156.
[16] Rainer Kelz et al. "On the Potential of Simple Framewise Approaches to Piano Transcription". In: CoRR abs/1612.05153 (2016). arXiv: 1612.05153. URL: http://arxiv.org/abs/1612.05153.
[17] Colin Raffel et al. "mir_eval: A Transparent Implementation of Common MIR Metrics". In: Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR). 2014.
[18] F. Rosenblatt. "The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain". In: Psychological Review 65.6 (1958), pp. 386–408.
[19] John Thickstun, Zaid Harchaoui, and Sham M. Kakade. "Learning Features of Music from Scratch". In: International Conference on Learning Representations (ICLR). 2017. URL: https://arxiv.org/pdf/1611.09827.pdf.
[20] Ashish Vaswani et al. "Attention is All you Need". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. Curran Associates, Inc., 2017, pp. 5998–6008. URL: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
[21] Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. "MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions". In: CoRR abs/1703.10847 (2017).
Advisors: Timothy K. Shih (施國琛), Frederic Lassabe    Date of Approval: 2020-7-9
