Large-Scale Music Retrieval System Using Machine Learning Approaches

Name

Tzu-Hsiang Huang (黃梓翔)

Department

Department of Communication Engineering

Thesis Title

Large-Scale Music Retrieval System Using Machine Learning Approaches
(基於機器學習方法之巨量音樂檢索系統)

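Related Theses

★ Region-Weighted Super-Resolution for Satellite Images
★ Adaptive High-Dynamic-Range Image Fusion Algorithm Extending the Linear Characteristics of the Exposure Curve
★ H.264 Video Encoding Complexity Control Implemented on a RISC Architecture
★ Articulation Disorder Assessment Based on Convolutional Recurrent Neural Networks
★ Few-Shot Image Segmentation with Mask Generation by a Meta-Learning Classification Weight Transfer Network
★ Implicit Representation with an Attention Mechanism for Reconstructing 3D Human Models from Images
★ Object Detection Using Adversarial Graph Neural Networks
★ 3D Face Reconstruction Based on Weakly Supervised Learning of a Morphable Model
★ Low-Latency Singing Voice Conversion Architecture for Edge Computing Devices Using Unsupervised Disentangled Representation Learning
★ Human Pose Estimation with FMCW Radar Based on a Sequence-to-Sequence Model
★ Monocular Camera Semantic Scene Completion Based on a Multi-Level Attention Mechanism
★ Contactless Real-Time Vital Sign Monitoring with a Single FMCW Radar Based on Temporal Convolutional Networks
★ Video Traffic Characterization and Management on Video-on-Demand Networks
★ High-Quality Voice Conversion Based on Linear Predictive Coding and Frame Pitch-Period Synchronization
★ Tone Adjustment Based on Formant Variation Extracted by Speech Resampling
★ Transmission Efficiency Optimization of Real-Time Fine-Granularity-Scalable Video over Wireless LANs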

Files

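This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users solely for academic research purposes, for personal, non-profit searching, reading, and printing.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.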

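Abstract (translated from Chinese)

In the era of big data, the volume of multimedia information on the Internet is growing exponentially, and locating specific multimedia content accurately has become an important research topic.
This system follows the theoretical framework of cover song identification. It uses content-based musical features of a song to remove the differences in timbre, key, and minor structure that arise when a piece is performed with different instruments, languages, or singers, and then finds the songs in the database whose melodic features resemble those of the input song.
In content-based music retrieval, songs differ in duration, so earlier studies compared the input song against every song in the database with a high-complexity matching procedure to compute pairwise similarity, and finally returned a list of the most similar songs in the database. This approach maximizes identification accuracy but consumes too many computing resources to be feasible on a large-scale database. This study proposes a system that quickly retrieves specific similar songs from a large-scale database. The system extracts spectral features from the music, compresses them with the two-dimensional Fourier transform, and merges the results into a fixed-length vector. Machine learning methods such as K-Means, principal component analysis (PCA), and linear discriminant analysis (LDA) then strengthen the pattern characteristics of the vector, so that every song in the database is projected into a single vector space. The system directly compares the vector distance between the query song and the database songs and returns the most similar music as the result list. The system not only greatly improves the efficiency of content-based music retrieval but also explores the potential of combining music retrieval with machine learning.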
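One reason a two-dimensional Fourier transform suits this kind of feature compression can be sketched briefly: the magnitude of the transform does not change when the input is circularly shifted, so a key transposition (a rotation along the pitch axis) leaves the descriptor unchanged. The sketch below is only an illustration under stated assumptions. It assumes the spectral feature is a 12-row pitch-class patch, which the abstract does not specify, and it uses a naive standard-library DFT rather than an FFT library. The names `dft2_magnitude`, `rotate_rows`, and the toy `patch` are hypothetical.

```python
import cmath

def dft2_magnitude(patch):
    """Return the magnitude of the naive 2D DFT of a small 2D list."""
    rows, cols = len(patch), len(patch[0])
    magnitudes = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += patch[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        magnitudes.append(out_row)
    return magnitudes

def rotate_rows(patch, shift):
    """Circularly shift the rows, imitating a transposition by `shift` steps."""
    shift %= len(patch)
    return patch[-shift:] + patch[:-shift]

# Toy patch (assumption): 12 pitch classes x 4 time frames, arbitrary values.
patch = [[((3 * r + 5 * c) % 7) / 7 for c in range(4)] for r in range(12)]
transposed = rotate_rows(patch, 2)

original_mag = dft2_magnitude(patch)
transposed_mag = dft2_magnitude(transposed)

# Largest element-wise difference between the two magnitude descriptors.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(original_mag, transposed_mag)
    for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

The same invariance holds for circular shifts along the time axis, which is why the magnitude is a convenient starting point for a fixed-length vector. A practical system would replace the naive loops with an FFT routine.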

Abstract (English)

In this work, we propose a music retrieval system that can search for similar music in a large-scale database.
Large-scale similar-music recognition must calculate a song-to-song similarity that can accommodate differences in timing, key, and tempo. A simple vector distance measure is not powerful enough to perform the similar-music recognition task, but expensive solutions such as dynamic time warping do not scale to millions of instances, which makes similar-music recognition impractical for commercial-scale applications. In this work, we use the content-based music features of songs as input and transform them into semantic vectors with the 2D Fourier transform. We also explore different machine learning approaches to learn and reinforce the patterns of these semantic vectors. By projecting the songs into the semantic vector space, we can use an efficient nearest-neighbor algorithm to compare the similarity of songs and retrieve the most similar songs in the large-scale database.
The proposed system is not only efficient enough to perform scalable content-based music retrieval, but also exploits the potential of machine learning approaches, making similar-music recognition faster and more accurate.
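The final retrieval step described in the abstract, ranking database songs by vector distance to the query, can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration, not the thesis implementation: the three-dimensional vectors, the song identifiers, and the function name `retrieve` are all hypothetical, and the learned projection that would produce such vectors is not shown.

```python
import heapq
import math

def retrieve(query, database, top_k=3):
    """Return the ids of the top_k database vectors closest to the query."""
    scored = (
        (math.dist(query, vector), song_id)
        for song_id, vector in database.items()
    )
    # nsmallest keeps only the top_k best candidates while scanning.
    return [song_id for _, song_id in heapq.nsmallest(top_k, scored)]

# Hypothetical fixed-length song vectors, one per database entry.
database = {
    "song_a": [0.90, 0.10, 0.00],
    "song_b": [0.10, 0.80, 0.10],
    "song_a_cover": [0.85, 0.15, 0.05],
    "song_c": [0.00, 0.20, 0.90],
}
query = [0.88, 0.12, 0.02]

ranking = retrieve(query, database, top_k=2)
print(ranking)
```

Each query costs one distance computation per database song, which is far cheaper than an alignment-based comparison such as dynamic time warping. For very large collections, an index structure could replace the linear scan.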

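Keywords (Chinese)

★ Music information retrieval (音樂資訊檢索)
★ Cover song identification (翻唱歌曲辨識)
★ Two-dimensional Fourier transform (二維傅立葉轉換)
★ Machine learning (機器學習)

Keywords (English)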

★ Music information retrieval
★ Cover song identification
★ 2D-Fourier transform
★ Machine learning

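Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Thesis Organization
Chapter 2 Music Information Retrieval
2-1 Music Retrieval Features
2-1-1 Low-Level Features
2-1-2 Mid-Level Features
2-1-3 Two-Dimensional Fourier Transform
2-2 Cover Song Identification
2-2-1 Types and Musical Characteristics of Cover Songs
2-2-2 Cover Song Identification Methods
2-3 Music Feature Matching Methods
2-3-1 Euclidean Distance
2-3-2 Manhattan Distance
2-3-3 Cosine Distance
Chapter 3 Machine Learning
3-1 K-Means Algorithm
3-2 Principal Component Analysis
3-3 Linear Discriminant Analysis
3-4 Nearest Neighbor Classification
3-4-1 Search Algorithms
Chapter 4 Proposed Architecture
4-1 Feature Extraction
4-2 Feature Preprocessing
4-3 Feature Learning and Transformation
4-4 Retrieval System
Chapter 5 Experiments and Analysis
5-1 Experimental Environment
5-1-1 Experimental Database
5-1-2 Performance Evaluation Methods
5-1-3 Experimental Design
5-2 Binary Choice Experiment
5-2-1 Parameter Selection
5-3 Retrieval Experiment
5-3-1 Parameter Selection on the Training Set
5-3-2 Parameter Selection on the Test Set
5-3-3 Parameter Selection on the Large-Scale Database
5-4 Comparison and Analysis of Experimental Results
Chapter 6 Conclusions and Future Work
References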
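
Section 2-3 of the outline names three measures for comparing feature vectors. As a brief illustration, they can be written as follows. The function names are hypothetical, and the cosine distance is taken here as one minus the cosine similarity, a common convention that the thesis itself may define differently.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(euclidean_distance([0, 0], [3, 4]))
print(manhattan_distance([0, 0], [3, 4]))
print(cosine_distance([1, 0], [0, 1]))
```

Euclidean and Manhattan distances are sensitive to vector magnitude, while the cosine distance depends only on direction, so the choice affects how overall feature energy influences the ranking.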

References

[1] Serra, Joan, Emilia Gómez, and Perfecto Herrera, "Audio cover song identification and similarity: background, approaches, evaluation, and beyond", Advances in Music Information Retrieval, pp. 307-332, Springer Berlin Heidelberg, 2010.
[2] Tzanetakis, George, Andrey Ermolinskyi, and Perry Cook, "Pitch histograms in audio and symbolic music information retrieval", Journal of New Music Research, pp. 143-152, 2003.
[3] T. Bertin-Mahieux and D. P. W. Ellis, "Large-scale cover song recognition using hashed chroma landmarks", 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 117-120, New Paltz, NY, 2011.
[4] Bertin-Mahieux, Thierry, and Daniel P. W. Ellis, "Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude", International Society for Music Information Retrieval Conference (ISMIR), 2012.
[5] Khadkevich, Maksim, and Maurizio Omologo, "Large-Scale Cover Song Identification Using Chord Profiles", International Society for Music Information Retrieval Conference (ISMIR), 2013.
[6] M. Marolt, "A Mid-Level Representation for Melody-Based Retrieval in Audio Collections", IEEE Transactions on Multimedia, vol. 10, no. 8, pp. 1617-1625, Dec. 2008.
[7] Schmidt, Erik, and Youngmoo Kim, "Learning Rhythm And Melody Features With Deep Belief Networks", International Society for Music Information Retrieval Conference (ISMIR), 2013.
[8] O. Nieto and J. P. Bello, "Music segment similarity using 2D-Fourier Magnitude Coefficients", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 664-668, Florence, 2014.
[9] 林銀議, Signals and Systems (信號與系統), 2nd ed., Wu-Nan Book Inc. (五南圖書出版股份有限公司), Taipei, 2009.
[10] Slaney, Malcolm, Kilian Weinberger, and William White, "Learning a metric for music similarity", International Symposium on Music Information Retrieval (ISMIR), 2008.
[11] J. Schluter and C. Osendorfer, "Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine", 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), pp. 118-123, Honolulu, HI, 2011.
[12] J. Stephen Downie, "MIREX 2006: Audio Cover Song", 2006, from http://www.music-ir.org/mirex/wiki/2006:Audio_Cover_Song
[13] Ranjani, S. Sri, et al., "Application of SHAZAM-Based Audio Fingerprinting for Multilingual Indian Song Retrieval", Advances in Communication and Computing, pp. 81-92, Springer India, 2015.
[14] Bertin-Mahieux, Thierry, et al., "The million song dataset", International Society for Music Information Retrieval Conference (ISMIR), Vol. 2, No. 9, 2011.
[15] Pedregosa, Fabian, et al., "Scikit-learn: Machine learning in Python", Journal of Machine Learning Research, vol. 12, pp. 2825-2830, Oct. 2011.
[16] E. J. Humphrey, J. P. Bello, and Y. LeCun, "Moving beyond feature design: Deep architectures and automatic feature learning in music informatics", International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal, October 2012.
[17] Honglak Lee, Peter Pham, Yan Largman, and Andrew Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks", Advances in Neural Information Processing Systems 22, 2009.
[18] Hamel, Philippe, and Douglas Eck, "Learning Features from Music Audio with Deep Belief Networks", International Society for Music Information Retrieval Conference (ISMIR), 2010.
[19] Humphrey, Eric J., Juan P. Bello, and Yann LeCun, "Feature learning and deep architectures: new directions for music informatics", Journal of Intelligent Information Systems, pp. 461-481, 2013.
[20] Y. Kim, H. Lee and E. M. Provost, "Deep learning for robust feature generation in audiovisual emotion recognition", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3687-3691, Vancouver, BC, 2013.
[21] Dieleman, Sander, and Benjamin Schrauwen, "Multiscale approaches to music audio feature learning", International Society for Music Information Retrieval Conference (ISMIR), Pontifícia Universidade Católica do Paraná, 2013.
[22] S. Dieleman and B. Schrauwen, "End-to-end learning for music audio", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964-6968, Florence, 2014.
[23] Coates, Adam, Honglak Lee, and Andrew Y. Ng, "An analysis of single-layer networks in unsupervised feature learning", Ann Arbor, 2010.

Advisor

Pao-Chi Chang (張寶基)

Approval Date

2016-7-26
