Graduate Thesis 108522606: Detailed Record




Author: CHAPUIS Vincent (沙文森)    Department: Computer Science and Information Engineering
Thesis Title: 以人工智慧方法驅動音樂轉錄與生成
(AI Driven Music Transcription and Generation)
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access full text is licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese)
Music transcription and generation is a broad field of research that has drawn many curious and hopeful researchers, who actively pursue innovative techniques that go beyond the current state of the art, using technical breakthroughs to accomplish tasks that were previously out of reach. To better understand the difficulties in this field, this thesis conducts an in-depth literature review of music transcription and generation, proposes an improved algorithm for each difficulty, and verifies its robustness through experiments. First, the thesis adopts the WaoN algorithm to transcribe notes and develops an easy-to-use graphical user interface. Next, it shows how, given an adequate dataset, a recurrent neural network built on a convolutional neural network base can be trained with deep learning to reach the expected results. In addition, it shows how newer models such as the Transformer can use the transcribed notes as material for music generation. All experiments in this thesis take MIDI files as the deep-learning input and are implemented with the PyTorch framework. Finally, the thesis discusses the experimental results in depth and considers how this research could be further refined into an interactive product that offers composers a friendlier graphical interface.
Abstract (English)
Music transcription and generation is a wide field that has been explored by many with hope and curiosity: hope to reach and surpass human skill and creativity, and curiosity to find new ways of accomplishing tasks that were either difficult or impossible for previously existing technology. In this thesis, we explore this field and review the existing techniques used to carry out these tasks. We then introduce the several approaches tested during our research, either to try new methods or to improve on the current state of the art. We first used an algorithmic approach based on the WaoN algorithm to transcribe music notes, and developed a Graphical User Interface to support this task. We then show how a deep-learning approach, a Convolutional Neural Network linked with a Recurrent Neural Network, can give satisfying results when an adequate dataset is chosen, and how deep learning can also be a great asset for generating music with cutting-edge models such as the Transformer. For all these tasks we mainly used the MIDI file format and Python frameworks such as PyTorch. We finally discuss how these techniques can support composers, helping them create new music and refine their ideas, and how future work on this subject could focus on creating an ergonomic user interface for production use.
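
To make the two technical components of the abstract concrete, here are two brief sketches. First, the transcription side: a Convolutional Neural Network front end feeding a Recurrent Neural Network that predicts framewise note activity. This is a minimal, hypothetical PyTorch sketch with assumed dimensions (a 229-bin spectrogram, 88 piano pitches, arbitrary layer sizes); it is not the architecture actually trained in the thesis.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of a CNN-into-RNN framewise transcriber of the kind
    # the abstract describes. Every dimension and layer choice here is an
    # assumption made for illustration, not the model trained in the thesis.
    class CnnRnnTranscriber(nn.Module):
        def __init__(self, n_bins=229, n_pitches=88, hidden=256):
            super().__init__()
            # CNN front end: local time-frequency features from the spectrogram.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),  # pool along frequency only, keep every frame
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),
            )
            # RNN: temporal context across frames (onsets, sustained notes).
            self.rnn = nn.LSTM(32 * (n_bins // 4), hidden,
                               batch_first=True, bidirectional=True)
            # One logit per (frame, pitch) cell of the piano roll.
            self.head = nn.Linear(2 * hidden, n_pitches)

        def forward(self, spec):
            # spec: (batch, frames, freq_bins), e.g. a log-magnitude spectrogram
            x = self.conv(spec.unsqueeze(1))             # -> (B, C, T, F//4)
            b, c, t, f = x.shape
            x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
            x, _ = self.rnn(x)                           # -> (B, T, 2*hidden)
            return self.head(x)                          # -> (B, T, n_pitches)

    # Training treats transcription as multi-label classification per frame.
    model = CnnRnnTranscriber()
    spec = torch.randn(2, 100, 229)                      # 2 clips, 100 frames each
    roll = torch.randint(0, 2, (2, 100, 88)).float()     # binary piano-roll targets
    loss = nn.BCEWithLogitsLoss()(model(spec), roll)
    loss.backward()

Framing transcription as one binary label per (frame, pitch) cell also makes the imbalance problem named in the table of contents ("Imbalanced Dataset", "Simple Strategy for unbalanced Dataset", "Triplet Ranking") concrete: silent cells vastly outnumber sounding notes. Second, the generation side: serializing MIDI into a sequence of event tokens and training a Transformer to predict the next event. Again a hypothetical sketch: the vocabulary size (388, a common size for MIDI performance-event encodings) and all dimensions are assumptions, and the actual Music Transformer additionally uses relative positional self-attention, which is omitted here for brevity.

    # Hypothetical sketch of Transformer-based symbolic music generation:
    # MIDI is serialized into event tokens and the model is trained to
    # predict the next token. All sizes are illustrative assumptions.
    class MusicTokenModel(nn.Module):
        def __init__(self, vocab=388, d_model=256, n_heads=4, n_layers=4, max_len=2048):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            self.pos = nn.Embedding(max_len, d_model)    # absolute positions
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.out = nn.Linear(d_model, vocab)

        def forward(self, tokens):
            t = tokens.size(1)
            # Causal mask: each event may only attend to earlier events.
            mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
            x = self.embed(tokens) + self.pos(torch.arange(t))
            return self.out(self.encoder(x, mask=mask))  # next-event logits

    model = MusicTokenModel()
    tokens = torch.randint(0, 388, (2, 64))              # 2 token sequences
    logits = model(tokens)
    # Shifted cross-entropy: position i predicts token i+1.
    loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 388),
                                 tokens[:, 1:].reshape(-1))
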
Keywords (Chinese): ★ Music
★ Deep Learning
★ Transcription
★ Generation
Keywords (English): ★ Music
★ Deep Learning
★ Transcription
★ Generation
Table of Contents
Abstract
Acknowledgements
1 Introduction
2 Related Work
2.1 Music
2.1.1 Definition
2.1.2 Notation
MIDI
PianoRoll
Music Score
2.1.3 Digitization
2.2 Deep Learning
2.2.1 Concept
2.3 Automatic Music Transcription
2.3.1 Impact of Spectrogram type on performance
2.3.2 Onsets and frames
2.3.3 Imbalanced Dataset
2.4 Music Generation
2.4.1 MidiNet, a DCGAN
2.4.2 MuseGAN
2.4.3 Transformer
2.4.4 Music Transformer
3 Methodology
3.1 A first algorithmic approach
3.1.1 Embedded Application
3.1.2 WaoN, a Wave-to-Notes transcriber
3.1.3 SoX, the Swiss Army knife of audio manipulation
3.1.4 The experiment
3.2 Deep Learning Metrics
3.2.1 Transcription
Common Metrics
First used Metric
Second used Metric
ROC, precision vs recall
Third Metric: mAP
Fourth Metric: mir_eval
3.2.2 Generation
Subjective Metrics
Objective Metrics
3.3 Deep Learning Architecture
3.3.1 Proof of Concept
First Proof of Concept
Second Proof of Concept
3.3.2 Transcription
Data Pre-Processing
First Modification: Short Range Detector
Second Modification: Triplet-Ranking
Third Unfinished Modification: Transformer
3.3.3 Generation
Different Dataset and Style premise
4 Experiment
4.1 Dataset
4.1.1 MAPS
4.1.2 MusicNet
4.1.3 Maestro
4.1.4 Dataset Comparison
4.2 Results
4.2.1 Transcription: Short Range Detector
Reproduction
Ablation-like study
Unbalanced Dataset Discovery
Simple Strategy for unbalanced Dataset
Advanced Method: Triplet Ranking
4.2.2 Generation: Different Style Inference
5 Discussion and Conclusion
5.1 Interpretations and insights
5.2 Future Work
References
References
[1] Paula Branco, Luís Torgo, and Rita P. Ribeiro. "A Survey of Predictive Modelling under Imbalanced Distributions". In: CoRR abs/1505.01658 (2015). arXiv: 1505.01658. URL: http://arxiv.org/abs/1505.01658.
[2] Kin Wai Cheuk, Kat Agres, and Dorien Herremans. "The impact of Audio input representations on neural network based music transcription". In: ArXiv abs/2001.09989 (2020).
[3] Kyunghyun Cho et al. "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches". In: ArXiv abs/1409.1259 (2014).
[4] Hao-Wen Dong et al. "MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks". In: ArXiv abs/1709.06298 (2017).
[5] Qi Dong, Shaogang Gong, and Xiatian Zhu. "Imbalanced Deep Learning by Minority Class Incremental Rectification". In: CoRR abs/1804.10851 (2018). arXiv: 1804.10851. URL: http://arxiv.org/abs/1804.10851.
[6] Jeffrey L. Elman. "Finding Structure in Time". In: Cognitive Science 14.2 (1990), pp. 179–211. DOI: 10.1207/s15516709cog1402_1.
[7] Valentin Emiya. "Transcription automatique de la musique de piano". 2008. URL: https://pastel.archives-ouvertes.fr/pastel-00004867/document.
[8] Kunihiko Fukushima. "Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". In: Biological Cybernetics 36 (1980), pp. 193–202.
[9] Ian Goodfellow et al. "Generative Adversarial Nets". In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2672–2680. URL: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
[10] Curtis Hawthorne et al. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset". In: International Conference on Learning Representations. 2019. URL: https://openreview.net/forum?id=r1lYRjC9F7.
[11] Curtis Hawthorne et al. "Onsets and Frames: Dual-Objective Piano Transcription". In: CoRR abs/1710.11153 (2017). arXiv: 1710.11153. URL: http://arxiv.org/abs/1710.11153.
[12] Sepp Hochreiter and Jürgen Schmidhuber. "Long Short-Term Memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[13] Anna Huang et al. "Visualizing Music Self-Attention". 2018. URL: https://openreview.net/pdf?id=ryfxVNEajm.
[14] Cheng-Zhi Anna Huang et al. "Music Transformer: Generating Music with Long-Term Structure". In: arXiv preprint arXiv:1809.04281 (2018).
[15] Muhammad Huzaifah. "Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks". In: CoRR abs/1706.07156 (2017). arXiv: 1706.07156. URL: http://arxiv.org/abs/1706.07156.
[16] Rainer Kelz et al. "On the Potential of Simple Framewise Approaches to Piano Transcription". In: CoRR abs/1612.05153 (2016). arXiv: 1612.05153. URL: http://arxiv.org/abs/1612.05153.
[17] Colin Raffel et al. "mir_eval: A Transparent Implementation of Common MIR Metrics". In: Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR). 2014.
[18] F. Rosenblatt. "The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain". In: Psychological Review 65.6 (1958), pp. 386–408.
[19] John Thickstun, Zaid Harchaoui, and Sham M. Kakade. "Learning Features of Music from Scratch". In: International Conference on Learning Representations (ICLR). 2017. URL: https://arxiv.org/pdf/1611.09827.pdf.
[20] Ashish Vaswani et al. "Attention is All you Need". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. Curran Associates, Inc., 2017, pp. 5998–6008. URL: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
[21] Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. "MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions". In: CoRR abs/1703.10847 (2017).
Advisors: Timothy K. Shih (施國琛), Frederic Lassabe    Date of Approval: 2020-7-9
