NCU Institutional Repository - downloads of theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/48520
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/48520


    Title: 應用於非監督式音訊轉換偵測之新型方法及特徵參數;New Segmentation Method and Acoustical Features for Unsupervised Audio Change Detection
    Authors: 辜振禹;Zhen-yu Gu
    Contributors: Graduate Institute of Computer Science and Information Engineering
    Keywords: speaker segmentation; speaker change detection
    Date: 2011-08-23
    Issue Date: 2012-01-05 14:57:00 (UTC+8)
    Abstract: Audio segmentation comprises two tasks: speech segmentation and environmental sound segmentation. Its goal is to divide an audio stream into segments, each containing only one speaker or one environmental sound.
    For speech segmentation, this thesis proposes a new concept that recasts the traditional speaker change detection problem as a speaker verification problem. To cope with insufficient training data, we train the speaker models with support vector machines (SVMs). Because SVM training is computationally expensive, we adopt a two-stage search strategy. In the first stage, the simpler generalized likelihood ratio (GLR) is used to find candidate change points. In the second stage, each candidate is confirmed by the proposed SVM-based adjacent window similarity criterion, which reduces the overall computation time. Experimental results show that the proposed segmentation method performs better than the conventional Bayesian information criterion (BIC) algorithm.
    For acoustical features, we use mel-frequency cepstral coefficients (MFCCs) for speaker segmentation. Because environmental sounds vary more widely, we propose a feature set based on the non-uniform scale frequency map (SFM), which is obtained by decomposing the audio signal with the matching pursuit algorithm. Experimental results on environmental sound segmentation demonstrate that the proposed non-uniform SFM-based feature set is more noise-robust and more discriminative than MFCCs.
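    Illustrative sketch (not part of the thesis record): the first-stage GLR search named in the abstract could look roughly like the code below. It assumes diagonal-covariance Gaussian models for each window, a fixed window length, and a hand-picked threshold; the abstract specifies none of these, and the function names are hypothetical. The SVM-based confirmation stage is not shown.

```python
import math
import random


def log_var_sum(frames):
    """Sum over dimensions of the log maximum-likelihood variance."""
    n = len(frames)
    total = 0.0
    for d in range(len(frames[0])):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def glr_distance(left, right):
    """Log generalized likelihood ratio between two windows.

    Compares "one Gaussian per window" against "one Gaussian for both",
    using diagonal covariances (an assumption made for this sketch).
    """
    both = left + right
    return 0.5 * (len(both) * log_var_sum(both)
                  - len(left) * log_var_sum(left)
                  - len(right) * log_var_sum(right))


def glr_candidates(frames, win=50, threshold=30.0):
    """Return frame indices whose GLR score is a local peak above threshold."""
    scores = [glr_distance(frames[t - win:t], frames[t:t + win])
              for t in range(win, len(frames) - win + 1)]
    candidates = []
    for i, score in enumerate(scores):
        lo, hi = max(0, i - win), min(len(scores), i + win + 1)
        if score >= threshold and score == max(scores[lo:hi]):
            candidates.append(i + win)  # map score index back to frame index
    return candidates


# Synthetic 2-D "features": the source changes at frame 200.
rng = random.Random(0)
stream = ([[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
          + [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(200)])
print(glr_candidates(stream))
```

    In a two-stage design such as the one the abstract describes, each returned index would then be passed to a second, more expensive check, so the threshold here can be set loosely to favor recall.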
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File          Description    Size    Format
    index.html                   0Kb     HTML      510


    All items in NCUIR are protected by copyright, with all rights reserved.
