基於壓縮感測於語音增強及盲訊號源分離之研究;A Study on Speech Enhancement and Blind Source Separation Using Compressive Sensing

Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/62961

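Title:	基於壓縮感測於語音增強及盲訊號源分離之研究;A Study on Speech Enhancement and Blind Source Separation Using Compressive Sensing
Author:	王家慶
Contributor:	Department of Computer Science and Information Engineering, National Central University (國立中央大學資訊工程系)
Keywords:	Information science; Software
Date:	2012-12-01
Upload time:	2014-03-17 14:15:24 (UTC+8)
Publisher:	National Science Council, Executive Yuan (行政院國家科學委員會)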
Abstract:	Research period: 10108~10207 (ROC calendar, i.e., August 2012 to July 2013).

Compressive sensing is an active research topic in the international academic community. It can reconstruct the original signal efficiently from only a subset of the signal samples, so the sampling rate can fall below the Nyquist rate. Because compressive sensing is still a new research area, little speech processing research has been based on it. The goal of this project is to propose new speech enhancement and blind source separation methods using compressive sensing. The project proceeds in three steps: developing a speech enhancement technique based on compressive sensing, developing a blind source separation technique based on compressive sensing, and improving the previous two techniques.

The first step is to develop the speech enhancement technique based on compressive sensing. First, we will establish the sparse representation model for the speech signal. Working with the frame-based power spectrum, we can train an overcomplete dictionary by iterating between sparse coding and dictionary updating. This overcomplete dictionary will then be used for compressive sensing. For noisy speech, we develop a missing data mask technique, which includes non-stationary noise estimation and SNR estimation at each time-frequency point of the noisy speech. Applying the missing data mask yields the reliable time-frequency points. With the overcomplete dictionary, the enhanced speech spectrum is then reconstructed by compressive sensing followed by an adjustment procedure.

The second step is to develop the blind source separation technique based on compressive sensing. In a microphone array, speech sources from different channels are mixed. This project develops suitable spectrum-related features so that different speech sources can be separated. The extracted features are clustered, and outliers are removed, using the expectation-maximization (EM) algorithm. In each frequency bin, each speech source is extracted using the corresponding mask. The permutation problem in blind source separation is solved by estimating the direction of arrival (DOA) of each speech source. In this way, the partial spectrum of each speech source can be obtained. With the overcomplete dictionary, the whole spectrum of each speech source is then reconstructed by compressive sensing.

The third step is to improve the previous two techniques. In the speech enhancement step, the overcomplete dictionary described above is fixed and does not take the properties of the speech currently being processed into account. Hence, we will develop an adaptive dictionary that gives the processed speech a sparser representation. Because the dictionary cannot reconstruct the noise, the part of the noisy speech that is not reconstructed can serve as the noise estimate. This method can overcome the drawback of noise overestimation. For blind source separation, this project will develop multi-stage compressive sensing. For each time-frequency point generated by compressive sensing, a confidence measure is constructed by checking whether its total power is similar to that of the mixed speech. The time-frequency points with a high confidence measure will also be used as reliable measurements for the next stage of compressive sensing. Furthermore, this project will develop a multi-range selection technique for compressive sensing in blind source separation. This technique provides more flexibility and more measurements by training multi-frame and multi-frequency-band dictionaries.

This project spans a total of three years. The main objectives of the first year are:
1. To develop the sparse representation model for the speech signal.
2. To build a proper overcomplete dictionary from a speech database.
3. To develop a non-stationary noise estimation technique.
4. To develop a missing data mask technique.
5. To develop a speech enhancement technique based on compressive sensing.

The main objectives of the second year are:
1. To develop a feature set for blind source separation.
2. To develop a clustering technique and an outlier elimination technique based on EM.
3. To solve the permutation problem of blind source separation by DOA.
4. To construct the partial spectrum of each speech source.
5. To develop a blind source separation technique based on compressive sensing.

The main objectives of the third year are:
1. To develop an adaptive dictionary technique for speech enhancement.
2. To develop a noise estimation technique using compressive sensing for speech enhancement.
3. To develop a multi-stage compressive sensing technique for blind source separation.
4. To develop a multi-range compressive sensing technique for blind source separation.
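The reconstruction idea in the abstract (recovering a whole spectrum from only the reliable time-frequency points, given an overcomplete dictionary) can be illustrated with a minimal Python sketch. This is not the project's implementation. The abstract does not name a sparse solver, so Orthogonal Matching Pursuit (OMP) is swapped in here as an assumption. The dictionary is random rather than trained on speech, the reliable mask is drawn at random rather than derived from noise and SNR estimates, and the values are not constrained to be non-negative as a real power spectrum would be. All names (`omp`, `reconstruct_spectrum`, `reliable_mask`) and sizes are hypothetical.

```python
import numpy as np


def omp(dictionary, observed, sparsity):
    """Greedy Orthogonal Matching Pursuit (assumed solver, sparsity >= 1).

    Finds a code with at most `sparsity` non-zeros such that
    dictionary @ code approximates `observed`.
    """
    code = np.zeros(dictionary.shape[1])
    residual = observed.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        # Least-squares refit on all atoms selected so far.
        coeffs, *_ = np.linalg.lstsq(dictionary[:, support], observed, rcond=None)
        residual = observed - dictionary[:, support] @ coeffs
    if support:
        code[support] = coeffs
    return code


def reconstruct_spectrum(dictionary, noisy_spectrum, reliable_mask, sparsity):
    """Fit a sparse code on the reliable bins only, then rebuild every bin."""
    sub = dictionary[reliable_mask]          # dictionary rows at reliable bins
    norms = np.linalg.norm(sub, axis=0)      # normalize atoms for atom selection
    code = omp(sub / norms, noisy_spectrum[reliable_mask], sparsity) / norms
    return dictionary @ code, code


# Toy data: every value below is an illustrative assumption.
rng = np.random.default_rng(0)
n_bins, n_atoms, sparsity, n_reliable = 64, 128, 3, 40

dictionary = rng.standard_normal((n_bins, n_atoms))  # stand-in for a trained dictionary
true_code = np.zeros(n_atoms)
true_code[rng.choice(n_atoms, sparsity, replace=False)] = rng.uniform(1.0, 2.0, sparsity)
clean = dictionary @ true_code

# Stand-in for the missing data mask: True marks a reliable bin.
reliable_mask = np.zeros(n_bins, dtype=bool)
reliable_mask[rng.choice(n_bins, n_reliable, replace=False)] = True

# Corrupt only the unreliable bins with strong noise.
noisy = clean.copy()
noisy[~reliable_mask] += 5.0 * rng.standard_normal(n_bins - n_reliable)

estimate, code = reconstruct_spectrum(dictionary, noisy, reliable_mask, sparsity)
print("relative error vs. clean spectrum:",
      np.linalg.norm(estimate - clean) / np.linalg.norm(clean))
```

The sparse code is fitted using the reliable bins alone, and the unreliable bins are filled in by multiplying the full dictionary with that code, which is the role the abstract assigns to compressive sensing after the missing data mask has been applied.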
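Relation:	Science and Technology Policy Research and Information Center, National Applied Research Laboratories (財團法人國家實驗研究院科技政策研究與資訊中心)
Appears in collections:	[Department of Computer Science and Information Engineering] Research Projects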

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	341

All items in NCUIR are protected by original copyright.
