Design and Implementation for Content-based Singer Classification on Compressed Domain Audio Data

Name

黃昱翔 (Yu-siang Huang)

Department

Department of Electrical Engineering

Thesis Title

Design and Implementation for Content-based Singer Classification on Compressed Domain Audio Data

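Related Theses

★ Low-Memory Hardware Design for Real-Time SIFT Feature Point Extraction
★ Access Control System with Real-Time Face Detection and Face Recognition
★ Self-Propelled Vehicle with Real-Time Automatic Following
★ Lossless Compression Algorithm and Implementation for Multi-Lead ECG Signals
★ Offline Custom Voice and Speaker Wake-Word System and Its Embedded Implementation
★ Wafer Map Defect Classification and Embedded System Implementation
★ Speech Densely Connected Convolutional Networks for Small-Footprint Keyword Spotting
★ G2LGAN: Data Augmentation on Imbalanced Datasets for Wafer Map Defect Classification
★ Algorithm Design Techniques for Compensating the Finite Precision of Multiplierless Digital Filters
★ Design and Implementation of a Programmable Viterbi Decoder
★ Low-Cost Vector Rotator IP Design Based on Extended Elementary-Angle CORDIC
★ Analysis and Architecture Design of a JPEG2000 Still Image Coding System
★ Low-Power Turbo Code Decoder for Communication Systems
★ Platform-Based Design for Multimedia Communications
★ Design and Implementation of a Digital Watermarking System for MPEG Encoders
★ Algorithm Development for Video Error Concealment with Data Reuse Considerations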

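Access rights for this electronic thesis: approved for immediate open access.
The open-access full text is licensed to users only for academic research, for personal, non-commercial searching, reading, and printing.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid infringing the law.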

Abstract (Chinese)

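In this thesis, we propose an automatic singer classification method implemented in the MP3 and AAC compressed music domain. Unlike earlier work in the MP3 compressed domain, which used MDCT (Modified Discrete Cosine Transform) coefficients, this thesis uses Mel-Frequency Cepstral Coefficients (MFCC) as the recognition features. Although MFCCs are widely used in music classification and speaker recognition, most such studies are not carried out in the compressed domain, because MFCCs cannot be obtained directly from compressed data. In this thesis, we use a modified MFCC computation that allows MFCCs to be obtained from MP3 and AAC compressed music. In addition, we use a Gaussian Mixture Model (GMM) to describe the distribution of MFCC vectors in the feature space. To find the most similar singer (class), we use Maximum Likelihood Classification (MLC), which assigns each input MFCC vector to its most similar cluster. Finally, we implement the algorithm on two different embedded platforms, Socle CDK and ITRI PAC Duo. The experimental results confirm the feasibility of the proposed method.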
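
The abstract states that a modified MFCC computation makes MFCCs obtainable from MP3 and AAC data, but it does not give the formula. The following is a minimal sketch under stated assumptions, not the thesis's own method: it applies a triangular mel filterbank directly to the squared MDCT coefficients of one frame (as decoded from the bitstream, before the IMDCT), takes the logarithm of each filter energy, and applies a DCT-II. The function names, the 26 filters, the 13 cepstral coefficients, and the assumption that MDCT line k is centred at (k + 0.5)·fs/(2N) Hz are illustrative choices.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters laid over n_bins MDCT lines spanning 0..fs/2."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 equally spaced points on the mel scale (edges + centres)
    hz_points = [mel_to_hz(i * top / (n_filters + 1)) for i in range(n_filters + 2)]
    # Assumption: MDCT line k is centred at (k + 0.5) * nyquist / n_bins Hz
    bin_freqs = [(k + 0.5) * nyquist / n_bins for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        weights = []
        for f in bin_freqs:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc_from_mdct(mdct, sample_rate, n_filters=26, n_ceps=13):
    """Hypothetical helper: MFCC-like vector computed from one frame of MDCT lines."""
    bank = mel_filterbank(n_filters, len(mdct), sample_rate)
    # Log energy of each mel band; squared MDCT values stand in for |FFT|^2
    log_e = [math.log(sum(w * c * c for w, c in zip(weights, mdct)) + 1e-10)
             for weights in bank]
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    return [sum(log_e[j] * math.cos(math.pi * n * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for n in range(n_ceps)]
```

For an MP3 long-block granule, `mdct` would hold the 576 frequency lines; for an AAC long window, the 1024 spectral coefficients. Working on the MDCT spectrum skips both the IMDCT and the separate FFT that a conventional time-domain MFCC front end requires.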

Abstract (English)

In this thesis, we propose a singer classification approach that automatically identifies the singer of unknown MP3 or AAC audio data. Unlike previous research on singer identification in the MP3 compressed domain, we use Mel-Frequency Cepstral Coefficients (MFCC) rather than MDCT (modified discrete cosine transform) coefficients as the feature. Although MFCCs are often used in music classification and speaker recognition, they cannot be obtained directly from compressed music data such as MP3 and AAC. In this thesis, we introduce a modified method for calculating MFCC vectors in the MP3 and AAC compressed domains. In addition, a GMM (Gaussian mixture model) is used to describe the distribution of MFCC vectors in the MFCC feature space. Then, to find the nearest singer, we use maximum likelihood classification (MLC) to assign each input MFCC vector to its nearest group. Finally, we implement our approach on two embedded platforms, Socle CDK and ITRI PAC Duo. In addition to the two embedded platforms, two operating system configurations are adopted: Linux and Android. The experimental results verify the feasibility of the proposed approach.
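
The abstract names the GMM and maximum likelihood classification steps but not their parameters. A minimal sketch, assuming each singer is represented by a diagonal-covariance GMM whose weights, means, and variances have already been trained (training is omitted). `classify_frames` performs the per-vector maximum likelihood assignment described above; the song-level majority vote in `classify_song` is an assumption, not a step the abstract states. `DiagGMM`, `classify_frames`, and `classify_song` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class DiagGMM:
    weights: list    # mixture weights, assumed to sum to 1
    means: list      # one mean vector per component
    variances: list  # one diagonal variance vector per component

    def log_likelihood(self, x):
        """log p(x) under the mixture, combined with log-sum-exp for stability."""
        logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            logs.append(ll)
        m = max(logs)
        return m + math.log(sum(math.exp(l - m) for l in logs))


def classify_frames(frames, models):
    """Assign each MFCC vector to the singer whose GMM gives it the highest likelihood."""
    return [max(models, key=lambda name: models[name].log_likelihood(x))
            for x in frames]


def classify_song(frames, models):
    """Assumed song-level decision: the singer chosen for the most frames."""
    return Counter(classify_frames(frames, models)).most_common(1)[0][0]
```

Summing each singer's per-frame log-likelihoods and taking the maximum is a common alternative to the vote; the two usually agree but can differ when frames are ambiguous.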

Keywords (Chinese)

★ compressed domain
★ content-based
★ singer identification
★ singer classification

Keywords (English)

★ classification
★ identification
★ content-based
★ compressed domain
★ MP3
★ AAC

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Thesis Organization
Chapter 2 Compressed Audio Decoding
2.1 MP3 Decoding
2.2 AAC Decoding
Chapter 3 Background of Music Classification
3.1 Comparison between Music Information Retrieval and Singer Classification
3.2 Basic Structure of a Music Classification System
3.2.1 Feature Extraction
3.2.1.1 MFCC
3.2.1.2 LPCC
3.2.1.3 OFCC
3.2.2 Similarity Computing
Chapter 4 Previous Singer Identification Approach in the MP3 Compressed Domain
4.1 Phoneme Segmentation
4.2 PMCV Computing
4.3 Similarity Computing
4.4 Results of Liu's Approach
Chapter 5 Proposed Approach
5.1 MFCC Computing in the Compressed Domain
5.2 Gaussian Mixture Model
5.3 Singer Classification
5.3.1 K-means Clustering
5.3.2 Maximum Likelihood Classification
Chapter 6 Experimental Results and Implementation
6.1 Experimental Results
6.2 Implementation on Socle CDK
6.3 Implementation on ITRI PAC Duo
Chapter 7 Conclusions and Future Work
References

References

[1] Wang Y, Yaroslavsky L, Vilermo M (2000) On the relationship between MDCT, SDFT and DFT. Proceedings of the 5th International Conference on Signal Processing, vol 1, pp. 44-47
[2] Chang LY, Yu XQ, Wan WG, Li CL, Xu XQ (2009) Research and realization of speech segmentation in MP3 compressed domain. Journal of Computer Applications 29(4):1188-1192
[3] Logan B (2000) Mel frequency cepstral coefficients for music modeling. Proceedings of the 1st International Symposium on Music Information Retrieval.
[4] Langlois T, Marques G (2009) A music classification method based on timbral features. Proceedings of the 10th International Society for Music Information Retrieval Conference, pp. 81-86
[5] Mesaros A, Virtanen T, Klapuri A (2007) Singer identification in polyphonic music using vocal separation and pattern recognition methods. Proceedings of the 8th International Conference on Music Information Retrieval, pp. 375-378
[6] Liu CC, Huang CS (2002) A singer identification technique for content-based classification of MP3 music objects. Proceedings of the 11th International Conference on Information and Knowledge Management, pp. 438-445
[7] Tsai WH, Wang HM (2006) Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals. IEEE Transactions on 14(1): 330-341
[8] Tsai WH, Liao SJ, Lai C (2008) Automatic identification of simultaneous singers in duet recordings. Proceedings of the 9th International Conference on Music Information Retrieval, pp. 115-120
[9] Sigurdsson S, Petersen KB, Lehn-Schiøler T (2006) Mel frequency cepstral coefficients: An evaluation of robustness of MP3 encoded music. Proceedings of the 7th International Conference on Music Information Retrieval.
[10] Gu HY, You ZR (2008) A speaker-clustering method using GMM and k-means. Proceedings of the 13th Taiwanese Association for Artificial Intelligence.
[11] Peng X, Xu W, Wang B (2005) Speaker clustering via novel pseudo-divergence of Gaussian mixture models. Proceedings of the 1st Natural Language Processing and Knowledge Engineering conference, pp. 111-114
[12] Hasan MR, Jamil M, Rahman MGRMS (2004) Speaker identification using Mel frequency cepstral coefficients. Proceedings of the 3rd International Conference on Electrical and Computer Engineering, pp. 566-568
[13] Maddage NC, Xu C, Wang Y (2004) Singer identification based on vocal and instrumental models. Proceedings of the 17th International Conference on Pattern Recognition, pp. 375-378
[14] Shen J, Cui B, Shepherd J, Tan KL (2006) Towards efficient automated singer identification in large music databases. Proceedings of the 29th Special Interest Group on Information Retrieval, pp. 59-66
[15] Mesaros A, Astola J (2005) The Mel-frequency cepstral coefficients in the context of singer identification. Proceedings of the 6th International Conference on Music Information Retrieval.
[16] Panagakis Y, Kotropoulos C, Arce GR (2009) Music genre classification using locality preserving non-negative tensor factorization and sparse representations. Proceedings of the 10th International Society for Music Information Retrieval, pp. 249-254
[17] Abeßer J, Lukashevich H, Dittmar C, Schuller G (2009) Genre classification using bass-related high-level features and playing styles. Proceedings of the 10th International Society for Music Information Retrieval, pp. 453-458
[18] Panagakis I, Benetos E, Kotropoulos C (2008) Music genre classification: a multilinear approach. Proceedings of the 9th International Society for Music Information Retrieval, pp. 583-588
[19] Sony Ericsson TrackID, http://www.sonyericsson.com/product/trackid/
[20] Lidy T, Rauber A (2005) Evaluation of feature extractors and psycho-acoustic transformations for music genre classification. Proceedings of the 6th International Society for Music Information Retrieval, pp. 34-41
[21] Shazam, http://www.shazam.com/
[22] Jang JS. Audio Signal Processing and Recognition Chapter 12: Speech Features, http://neural.cs.nthu.edu.tw/jang/books/audiosignalprocessing/speechFeatureMfcc.asp
[23] ISO/IEC JTC1/SC29/WG11 No.1650 “ISO 13818-7 (MPEG-2 Advanced Audio Coding, AAC)”, Apr. 1997.
[24] Bouman CA (2005) Cluster: An unsupervised algorithm for modeling Gaussian mixtures. Tech. rep., School of Electrical Engineering, Purdue University, https://engineering.purdue.edu/bouman/software/cluster
[25] Socle CDK platform, http://www.socle-tech.com.tw/en/service_9.html
[26] K-means clustering from Wiki, http://en.wikipedia.org/wiki/K-means_clustering
[27] Simple DirectMedia Layer, http://www.libsdl.org/
[28] PAC Product Home, http://pac.itri.org.tw/
[29] Android NDK, http://developer.android.com/sdk/ndk/index.html
[30] MP3 coding from Wiki, http://en.wikipedia.org/wiki/MP3#Decoding_audio

Advisor

蔡宗漢(Tsung-han Tsai)

Approval Date

2010-11-18

推文