A Study on Hierarchical Representation of Audio Based on Bayesian Nonparametric Tree-Structured Mixture Model

Name

Chi-hung Lo (羅基宏)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Study on Hierarchical Representation of Audio Based on Bayesian Nonparametric Tree-Structured Mixture Model
(基於貝氏非參數樹狀結構混合模型之階層式音訊表示法之研究)

Files

Full text in the system: never open to the public

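Abstract (Chinese)

Organizing things into a hierarchy is an intuitive way for people to classify them; examples include product categories on shopping websites and subject categories in bookstores. This thesis brings the idea of hierarchical classification to audio classification by proposing the Bayesian Nonparametric Tree-structured Mixture Model. The model represents audio data as a tree: nodes near the root model the components shared across audio files, while nodes near the leaves model the components unique to individual files. The nested Chinese restaurant process (nCRP) serves as the prior distribution over the tree structure, so the width and depth of the tree are determined automatically from the data, and in theory the tree can grow without bound. This unsupervised learning approach addresses problems such as model selection and over-estimation.

This thesis uses the Gibbs sampling algorithm for model inference. Sampling from the posterior yields each audio file's features on the tree structure; these serve as post-clustering feature parameters, which are then fed to a classifier for audio classification experiments. We experiment on several kinds of audio data, including environmental sounds, guitar playing techniques, music genres, and music sub-genres. The results show that on datasets whose classes sound perceptually similar to one another, the model clusters better and thereby improves the final recognition rate.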

Abstract (English)

Organizing things hierarchically is intuitive to humans; for example, items are organized hierarchically on shopping websites and in bookstores. In this work, we bring this idea to the audio file classification problem by developing the Bayesian nonparametric tree-structured mixture model. The model constructs a tree-structured representation of audio files. The root node of the tree represents the parts shared between different audio files, and the leaf nodes represent the parts unique to each file. We use the nested Chinese restaurant process (nCRP) as the prior distribution for the tree-structured model. The model automatically adjusts the width and depth of the tree and can, in theory, be extended to an infinite tree. This unsupervised learning method addresses the problems of model selection and over-estimation.
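As an illustration of the nCRP prior named in the abstract, the following is a minimal sketch of drawing root-to-leaf paths from a nested Chinese restaurant process. It is not the thesis implementation: the `Node` class, the function names, the concentration parameter `gamma`, and the fixed depth of 3 are assumptions made for this example (the thesis states that depth is also adapted to the data).

```python
import random

class Node:
    """One tree node; count = number of audio files whose path passes through it."""
    def __init__(self):
        self.count = 0
        self.children = []

def crp_choose(counts, gamma, rng):
    """Pick an existing child with probability proportional to its count,
    or a new child (index len(counts)) with probability proportional to gamma."""
    r = rng.random() * (sum(counts) + gamma)
    for idx, c in enumerate(counts):
        r -= c
        if r < 0:
            return idx
    return len(counts)

def sample_path(root, depth, gamma, rng):
    """Draw one path of `depth` nodes from the nCRP and update the visit counts."""
    root.count += 1
    path, node = [root], root
    for _ in range(depth - 1):
        k = crp_choose([c.count for c in node.children], gamma, rng)
        if k == len(node.children):
            node.children.append(Node())  # branch never visited before
        node = node.children[k]
        node.count += 1
        path.append(node)
    return path

rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Because popular branches attract more files while `gamma` keeps a nonzero chance of opening a new branch, the tree width grows with the data rather than being fixed in advance.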

We use the Gibbs sampling algorithm for model inference. Sampling from the posterior assigns every audio file a path on the tree and a distribution of its frames over the levels of that path. We use this result as the clustering feature and feed it to a classifier to obtain the recognition result. In our experiments, we use several types of audio databases, including environmental sounds, guitar playing technique clips, music genres, and music sub-genres. The results show that the proposed model improves the recognition rate.
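One way to picture the clustering feature described above is the following minimal sketch, which turns a file's sampled path and per-frame level assignments into a fixed-length vector. The exact feature layout used in the thesis is not given in this record, so the function `path_feature`, the integer node ids, and the proportion-per-node encoding are all assumptions for illustration.

```python
from collections import Counter

def path_feature(path, frame_levels, num_nodes):
    """Build a vector over all tree nodes for one audio file.

    path         -- node ids from root to leaf (hypothetical integer ids)
    frame_levels -- level index on the path assigned to each frame
    num_nodes    -- total number of nodes in the learned tree
    Each node on the path receives the fraction of frames assigned to its level;
    nodes off the path stay at zero.
    """
    if not frame_levels:
        raise ValueError("frame_levels must not be empty")
    counts = Counter(frame_levels)
    feat = [0.0] * num_nodes
    for level, node_id in enumerate(path):
        feat[node_id] = counts.get(level, 0) / len(frame_levels)
    return feat

# Example: a 3-level path through nodes 0 -> 2 -> 5 in a 7-node tree, 8 frames.
feature = path_feature(path=[0, 2, 5],
                       frame_levels=[0, 0, 1, 2, 2, 2, 2, 1],
                       num_nodes=7)
```

A vector of this kind has the same length for every file, so it could be passed to any standard classifier.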

Keywords

★ Bayesian Nonparametric Mixture Model

Table of Contents

Chapter Contents

Abstract (Chinese)

Abstract (English)

Chapter Contents

List of Figures

List of Tables

Chapter 1 Introduction

1.1. Preface

1.2. Research Motivation and Objectives

1.3. Thesis Organization and Chapter Overview

Chapter 2 Literature Review

2.1. Review of Conventional Classifiers

2.1.1. Gaussian Mixture Model (GMM)

2.1.2. Support Vector Machine (SVM)

2.1.3. Sparse Representation Classifier (SRC)

2.2. Audio Topic Models

2.2.1. Parametric Model

2.2.2. Mixture Model

2.2.3. Latent Dirichlet Allocation (LDA)

2.2.4. Gaussian Latent Dirichlet Allocation (Gaussian-LDA)

2.3. Dirichlet Process (DP)

2.4. Constructions of the Dirichlet Process

2.4.1. Stick-breaking Process

2.4.2. Chinese Restaurant Process (CRP)

2.5. Bayesian Nonparametric Mixture Model

2.5.1. Introduction

2.5.2. Hierarchical Dirichlet Process (HDP)

Chapter 3 Bayesian Nonparametric Tree-Structured Mixture Model

3.1. Nested Chinese Restaurant Process (nCRP)

3.2. Hierarchical Latent Dirichlet Allocation (hLDA)

3.2.1. Model Description

3.2.2. Model Inference

3.3. Gaussian Hierarchical Latent Dirichlet Allocation (Gaussian-hLDA)

3.3.1. Model Description

3.3.2. Model Inference

3.3.3. Algorithm

3.3.4. Computational Speed-up

3.3.5. Model Comparison

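Chapter 4 Audio Recognition System and Experimental Results

4.1. Overview of the Audio Recognition System

4.2. Experimental Setup and Results

Chapter 5 Conclusions and Future Work

References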

Advisor

Jia-Ching Wang (王家慶)

Approval Date

2015-8-26