A Study on Hierarchical Representation of Audio based on Bayesian Nonparametric Tree-Structured Mixture Model

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/68973

Title:	A Study on Hierarchical Representation of Audio based on Bayesian Nonparametric Tree-Structured Mixture Model (基於貝氏非參數樹狀結構混合模型之階層式音訊表示法之研究)
Author:	Lo, Chi-hung (羅基宏)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Bayesian Nonparametric Mixture Model
Date:	2015-08-26
Upload time:	2015-09-23 14:48:01 (UTC+8)
Publisher:	National Central University
Abstract:	Organizing things into a hierarchy is an intuitive way for people to classify them; the product categories of a shopping website and the subject catalog of a bookstore are familiar examples. This thesis brings the idea of hierarchical classification to audio classification by proposing the Bayesian nonparametric tree-structured mixture model. The model represents audio data as a tree: nodes near the root model the components shared across audio files, and nodes near the leaves model the components unique to each file. The nested Chinese restaurant process (nCRP) serves as the prior distribution over the tree structure, so the width and depth of the tree adapt automatically to the data and the tree can, in theory, grow without bound. This unsupervised learning approach addresses problems such as model selection and over-estimation. Model inference is carried out with the Gibbs sampling algorithm. Sampling from the posterior assigns each audio file a path through the tree and a distribution of its frames over the levels of that path. These are used as clustering features and passed to a classifier for audio classification. Experiments on several audio databases (environmental sounds, guitar playing techniques, music genres, and music sub-genres) show that the model clusters better on databases whose classes sound perceptually similar to one another, which in turn improves the recognition rate.
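To illustrate the nCRP prior mentioned in the abstract, the following Python sketch draws root-to-leaf paths from a nested Chinese restaurant process. It is not the thesis's implementation: the function name `sample_ncrp_path`, the concentration parameter `gamma`, and the fixed `depth` truncation are assumptions introduced here for illustration (the abstract describes a tree whose depth also adapts to the data), and the sketch covers only the prior, not the Gibbs sampler.

```python
import random
from collections import defaultdict


def sample_ncrp_path(counts, depth, gamma, rng):
    """Draw one path from a depth-truncated nCRP prior (illustrative only).

    A node is identified by the tuple of child indices leading to it from
    the root, so the root is the empty tuple. `counts` maps each node to the
    number of paths that have passed through it and is updated in place.
    """
    path = ()
    counts[path] += 1
    for _ in range(depth - 1):
        # Children of the current node that earlier paths already opened.
        children = sorted(
            n for n in counts if len(n) == len(path) + 1 and n[:-1] == path
        )
        # Existing child: weight = its visit count. New child: weight = gamma.
        weights = [counts[c] for c in children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(children):
            path = path + (len(children),)  # open a new branch
        else:
            path = children[choice]
        counts[path] += 1
    return path


# Hypothetical usage: 20 audio files, each assigned a path in a 3-level tree.
rng = random.Random(0)
counts = defaultdict(int)
paths = [sample_ncrp_path(counts, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Because popular branches are chosen in proportion to how often they have been visited, files tend to share nodes near the root while `gamma` controls how readily new branches appear.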
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	708	View/Open

All items in NCUIR are protected by original copyright.
