New Segmentation Method and Acoustical Features for Unsupervised Audio Change Detection

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/48520

Title:	New Segmentation Method and Acoustical Features for Unsupervised Audio Change Detection
Authors:	辜振禹;Zhen-yu Gu
Contributors:	Graduate Institute of Computer Science and Information Engineering
Keywords:	speaker segmentation; speaker change detection
Date:	2011-08-23
Issue Date:	2012-01-05 14:57:00 (UTC+8)
Abstract:	Audio segmentation comprises two tasks: speech segmentation and environmental sound segmentation. Both divide an audio stream into segments such that each segment contains only one speaker or one environmental sound. For speech segmentation, this thesis proposes recasting the traditional speaker change detection problem as a speaker verification problem. To cope with insufficient training data, the speaker models are trained with support vector machines (SVMs). Because SVM training is computationally expensive, a two-stage search strategy is adopted to reduce computation time: in the first stage, the simpler generalized likelihood ratio (GLR) locates candidate change points; in the second stage, each candidate is confirmed with the proposed SVM-based adjacent-window similarity criterion. Experimental results show that the proposed criterion performs better than the conventional Bayesian information criterion (BIC). As for acoustical features, mel-frequency cepstral coefficients (MFCCs) are used for speaker segmentation. Because environmental sounds vary more widely, a feature set based on a non-uniform scale-frequency map (SFM) is proposed for them; it is obtained by decomposing the audio signal with the matching pursuit algorithm. Experimental results on environmental sound segmentation demonstrate that the proposed non-uniform SFM-based feature set is more noise-robust and more discriminative than MFCC.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
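
The first stage described in the abstract (a GLR scan that proposes candidate change points) can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: it assumes each window is modelled by a single full-covariance Gaussian, and the function names (`glr_distance`, `candidate_change_points`), the window length, the hop size, and the mean-plus-one-standard-deviation threshold are all hypothetical choices made for the example. The second-stage SVM confirmation is not shown.

```python
import numpy as np


def _logdet_cov(frames, ridge=1e-6):
    """Log-determinant of the maximum-likelihood covariance of frames (n x d).

    A small ridge term (an assumption of this sketch) keeps the matrix
    well conditioned for short windows.
    """
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    cov = cov + ridge * np.eye(frames.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def glr_distance(left, right):
    """Gaussian GLR between two adjacent windows of feature frames.

    Compares "one Gaussian for both windows" against "one Gaussian per
    window"; larger values suggest the windows come from different sources.
    """
    n1, n2 = len(left), len(right)
    both = np.vstack([left, right])
    return 0.5 * ((n1 + n2) * _logdet_cov(both)
                  - n1 * _logdet_cov(left)
                  - n2 * _logdet_cov(right))


def candidate_change_points(features, win=100, hop=10, threshold=None):
    """Slide two adjacent windows over the features and keep GLR peaks.

    Returns (picks, centers, scores): picks are frame indices whose GLR
    score is a local maximum above the threshold.
    """
    centers, scores = [], []
    for t in range(win, len(features) - win + 1, hop):
        centers.append(t)
        scores.append(glr_distance(features[t - win:t], features[t:t + win]))
    scores = np.asarray(scores)
    if threshold is None:
        # Hypothetical default: mean plus one standard deviation of the scores.
        threshold = scores.mean() + scores.std()
    picks = [centers[i] for i in range(1, len(scores) - 1)
             if scores[i] > threshold
             and scores[i] >= scores[i - 1]
             and scores[i] >= scores[i + 1]]
    return picks, centers, scores


if __name__ == "__main__":
    # Synthetic stand-in for MFCC frames: two sources, change at frame 500.
    rng = np.random.default_rng(0)
    feats = np.vstack([rng.normal(0.0, 1.0, (500, 4)),
                       rng.normal(3.0, 1.0, (500, 4))])
    picks, _, _ = candidate_change_points(feats)
    print("candidate change points (frame indices):", picks)
```

In a two-stage design like the one the abstract outlines, the frames returned in `picks` would then be passed to a more expensive verifier, so the threshold here can be permissive: missed candidates cannot be recovered later, whereas false alarms can still be rejected.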
