Abstract
In this thesis we propose a singer classification approach that automatically identifies the singer of unknown MP3 or AAC audio data. Previous research on singer identification in the MP3 compressed domain uses MDCT (modified discrete cosine transform) coefficients as features. We instead use Mel-Frequency Cepstral Coefficients (MFCC). Although MFCCs are widely used in music classification and speaker recognition, they cannot be obtained directly from compressed music data such as MP3 and AAC. We therefore introduce a modified method for calculating MFCC vectors in the MP3 and AAC compressed domains. A Gaussian mixture model (GMM) is then used to describe the distribution of MFCC vectors in the MFCC feature space. To find the nearest singer, maximum likelihood classification (MLC) assigns each input MFCC vector to its nearest group. Finally, we implement the approach on two embedded platforms, Socle CDK and ITRI PAC Duo. In addition, two operating system configurations are adopted: Linux and Android. The experimental results verify the feasibility of the proposed approach.
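The abstract only outlines the pipeline: MFCC from compressed-domain data, a GMM per singer, and maximum likelihood classification. The sketch below illustrates one plausible reading of those steps in standard-library Python. It is not the thesis's modified MFCC method. It assumes that the dequantized MDCT coefficients of one long block (576 per granule in MP3, 1024 per frame in AAC) are already available from the decoder's Huffman-decoding and dequantization stages. It also assumes that their squares serve as a rough power spectrum and that each singer's diagonal-covariance GMM has been trained elsewhere. The function names, the 20-filter bank, the 13 cepstral coefficients, and the majority vote across frames are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def hz_to_mel(f):
    # A common mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bins, sample_rate, n_filters):
    """Triangular mel filters evaluated at the centre frequency of each MDCT bin.

    Assumption: the n_bins coefficients of one long block evenly span
    0..sample_rate/2, so bin k is centred at (k + 0.5) * sample_rate / (2 * n_bins).
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    centres = [(k + 0.5) * nyquist / n_bins for k in range(n_bins)]
    bank = []
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in centres:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_mdct(coeffs, bank, n_ceps=13, floor=1e-10):
    """MFCC vector from one block of dequantized MDCT coefficients (hypothetical helper)."""
    # Assumption: squared MDCT coefficients approximate the power spectrum.
    power = [c * c for c in coeffs]
    log_energy = [math.log(max(sum(w * p for w, p in zip(row, power)), floor))
                  for row in bank]
    m = len(log_energy)
    # DCT-II of the log mel-band energies yields the cepstral coefficients.
    return [sum(log_energy[j] * math.cos(math.pi * n * (j + 0.5) / m) for j in range(m))
            for n in range(n_ceps)]


def gmm_log_likelihood(x, gmm):
    """log p(x) under a diagonal-covariance GMM given as (weight, means, variances) tuples."""
    terms = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xd, md, vd in zip(x, means, variances):
            ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        terms.append(ll)
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_singer(frames, models):
    """Assign each frame to its maximum-likelihood singer, then take a majority vote.

    The per-frame assignment follows the abstract; the majority vote is an assumption.
    """
    votes = Counter(
        max(models, key=lambda singer: gmm_log_likelihood(x, models[singer]))
        for x in frames
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D models and frames, for illustration only.
    models = {
        "singer_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
        "singer_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
    }
    print(identify_singer([[0.1, -0.2], [0.3, 0.0], [4.9, 5.2]], models))  # singer_a
```

In practice `mel_filterbank(576, 44100, 20)` would be built once and reused for every MP3 granule. The MP3 hybrid filterbank (polyphase subbands followed by MDCT) and its short blocks make the uniform-bin assumption only approximate. Summing per-frame log-likelihoods per singer would be a reasonable alternative to voting.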