Thesis 101522003: Detailed Record




Name: 陳膺任 (Chen-Ying Ren)    Department: Department of Computer Science and Information Engineering
Thesis Title: A Study on Hierarchical Dirichlet Process Mixture Model Based Music Emotion Annotation (基於階層式狄氏程序混合模型之音樂情緒標註之研究)
Related Theses:
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Front-End Processing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Trend Prediction for Financial Products
★ A Study on Integrating Deep Learning to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Predicting Alzheimer's Disease Progression and Stroke Surgery Survival with Deep Learning
Files: full text not publicly available (access permanently restricted)
Abstract (Chinese): Music plays a pivotal role in people's lives. Now that information is digitized and ever easier to store, music databases have grown enormously, so an automatic classification scheme is needed to help people quickly find the music they want. Traditionally, categories have been defined by artist or genre, but what truly affects us in music is the emotion it conveys; research on annotating and retrieving music by emotional category has therefore been attracting growing attention.

Conventional emotion models construct each class separately, yet in the real world the boundaries between emotions are not clear-cut: emotion classes blur into and overlap one another. Taking this into account, we exploit the component-sharing property of the hierarchical Dirichlet process mixture model (HDPMM) to establish links among the class models, and propose a music emotion annotation and retrieval system. Because component sharing may also cause confusion between classes, we add a discriminative factor to the system based on the concept of linear discriminant analysis. In addition, the system represents each emotion class by a weight vector over a global set of emotion components. For test data, we propose three different methods to produce the corresponding weight vector, which we then use to decide whether the test data contain a given emotion.

Experimental results show that the proposed system performs better in both annotation and retrieval, and we also discuss the differences among the methods for computing the weight vectors of test data.
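The component-sharing property invoked above is the defining feature of the hierarchical Dirichlet process of Teh et al.: each per-emotion distribution is drawn from a DP whose base measure is itself a shared DP draw, so every class reuses one global pool of components. In the standard stick-breaking form (textbook notation, not necessarily the symbols used in the thesis), the model that yields the per-class weight vectors reads:

    % Hierarchical Dirichlet process mixture, standard stick-breaking form
    \begin{align*}
    G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H), &
      G_0 &= \sum_{k=1}^{\infty} \beta_k\, \delta_{\phi_k}, \quad \beta \sim \mathrm{GEM}(\gamma) \\
    G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), &
      G_j &= \sum_{k=1}^{\infty} \pi_{jk}\, \delta_{\phi_k}, \quad \pi_j \sim \mathrm{DP}(\alpha_0, \beta) \\
    \theta_{ji} \mid G_j &\sim G_j, &
      x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji})
    \end{align*}

Since every G_j places its mass on the same global atoms \phi_k, emotion class j is fully characterized by its weight vector \pi_j over the shared components, which is precisely the representation the system above exploits.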
Abstract (English): The development of digital technology has enabled the storage of large music collections. For the convenience of users, some music database applications tag songs with class labels. Traditionally, music was classified by artist or genre, but the real influence of music is the emotion it releases. Therefore, researchers have recently been studying music emotion annotation and retrieval methods.

Traditionally, the model of each emotion was constructed individually, but an emotion cannot be defined clearly in the real world because the classes of emotions usually overlap. Accordingly, this thesis proposes a music annotation and retrieval system based on the hierarchical Dirichlet process mixture model (HDPMM), whose components can be shared among the emotion models. Moreover, an improvement to the HDPMM is proposed by adding a discriminant factor to the system, based on the concept of linear discriminant analysis. The proposed system represents each emotion using a vector of weighting coefficients over a global set of components. Three methods are proposed to compute the weighting coefficients of testing data, and these coefficients are used to determine whether the testing data contain certain emotional content.

Experimental results show that the proposed system performs well in automatic music emotion annotation and retrieval tasks. Finally, an evaluation of the three methods for computing the weighting coefficients of testing data is also discussed.
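As a purely illustrative sketch of the scoring step this abstract describes, the following Python fragment tags a test clip using per-emotion weight vectors over one shared pool of Gaussian components. Every name and value below is a hypothetical placeholder: the means, covariances, and weights stand in for quantities the thesis would obtain from HDPMM posterior inference, and none of the three proposed weight-estimation methods is reproduced.

    # Illustrative sketch only: annotation scoring with per-emotion weight
    # vectors over a SHARED global pool of Gaussian components. All
    # parameters are random placeholders for quantities an HDPMM would
    # infer (e.g., via Gibbs sampling).
    import numpy as np
    from scipy.stats import multivariate_normal
    from scipy.special import logsumexp

    rng = np.random.default_rng(0)
    K, D = 8, 4                                     # hypothetical: 8 shared components, 4-dim features
    emotions = ["happy", "sad", "angry", "tender"]  # hypothetical label set

    mus = rng.normal(size=(K, D))                   # shared component means
    covs = [np.eye(D) for _ in range(K)]            # shared component covariances
    # One weight vector per emotion over the SAME K components (rows sum to 1).
    pi = rng.dirichlet(np.ones(K), size=len(emotions))

    def mean_loglik(X, weights):
        """Mean per-frame log-likelihood of frames X under the mixture
        defined by `weights` over the shared component pool."""
        comp = np.stack([multivariate_normal.logpdf(X, m, c)
                         for m, c in zip(mus, covs)], axis=1)  # (N, K)
        return logsumexp(comp + np.log(weights), axis=1).mean()

    X_test = rng.normal(size=(100, D))              # stand-in for MFCC-like feature frames
    scores = np.array([mean_loglik(X_test, w) for w in pi])
    probs = np.exp(scores - logsumexp(scores))      # normalize across emotions

    # Annotation: tag the clip with every emotion whose score clears a threshold.
    for emotion, p in zip(emotions, probs):
        print(f"{emotion}: {p:.3f}" + ("  <- tag" if p > 1.0 / len(emotions) else ""))

For retrieval, the same per-emotion scores would instead rank clips against a queried emotion rather than threshold a single clip.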
Keywords (Chinese): ★ 階層式狄氏程序 (Hierarchical Dirichlet Process)
★ 音樂情緒辨識 (Music Emotion Recognition)
Keywords (English): ★ Hierarchical Dirichlet Process
★ Music Emotion Recognition
Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1.1. Foreword
1.2. Research Motivation and Objectives
1.3. Thesis Organization and Chapter Overview
Chapter 2  Related Work and Literature Review
2.1. Music Emotion Features
2.1.1. Root Mean Square Energy
2.1.2. Event Density
2.1.3. Roughness
2.1.4. Chromagram
2.1.5. Mode
2.1.6. Zero Crossing Rate
2.1.7. Mel-scale Frequency Cepstral Coefficients (MFCC)
2.2. Review of Music Emotion Classification and Classifier Methods
2.2.1. Gaussian Mixture Model (GMM)
2.2.2. Support Vector Machine (SVM)
2.2.3. Context-Aware Music Classification Methods
2.2.4. Numerical Representations of Music Emotion
Chapter 3  Hierarchical Dirichlet Process
3.1. Introduction to the Dirichlet Process
3.2. Constructions of the Dirichlet Process
3.2.1. Stick-Breaking Process
3.2.2. Chinese Restaurant Process
3.3. Dirichlet Process Mixture Model
3.4. Hierarchical Dirichlet Process Mixture Model
3.4.1. Posterior Sampling via the Chinese Restaurant Franchise
3.4.2. Posterior Sampling with the Augmented Representation
3.4.3. Posterior Sampling via Direct Assignment
Chapter 4  Music Emotion Annotation System
4.1. Introduction
4.2. HDPMM-Based Music Emotion Annotation System
4.3. Combining the Hierarchical Dirichlet Process with Sparse Representation
4.4. System with a Discriminative Factor
Chapter 5  Experimental Results
5.1. Experimental Setup and Environment
5.2. Music Emotion Annotation Experiments
5.3. Music Emotion Retrieval Experiments
Chapter 6  Conclusions and Future Research Directions
References
Advisor: 王家慶 (Jia-Ching Wang)    Approval Date: 2014-8-26
