A Study of Archive Speech Enhancement (典藏語音強化之研究)


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/47134

Title:	A Study of Archive Speech Enhancement (典藏語音強化之研究)
Author:	王家慶
Contributor:	Department of Computer Science and Information Engineering (資訊工程系)
Keywords:	Digital Archive; Speech Enhancement; Background Noise; Reverberant Speech; Dereverberation Filter; Interfering Signal; Information Science--Software
Date:	2010-08-01
Upload time:	2011-07-14 09:59:24 (UTC+8)
Publisher:	National Science Council, Executive Yuan (行政院國家科學委員會)
Abstract:	This two-year project develops two techniques for enhancing digitally archived speech: (1) a technique for enhancing speech that has already been recorded, and (2) a technique for recording high-quality speech. The recorded-speech enhancement technique focuses on removing background noise and reverberant speech. The high-quality recording technique uses multi-microphone (microphone array) processing to remove background noise, reverberant speech, and interfering signals at the same time. For background noise removal, the project proposes a generalized subspace speech enhancement approach that can handle non-stationary noise. Depending on the noise type, either a minimum mean square error (MMSE) estimator or a spectral domain constrained (SDC) estimator is used.
To obtain the optimal MMSE estimator, the speech signal is modeled with a Laplacian distribution and with a Gaussian mixture model, respectively. For SDC-based estimation, an auditory masking threshold is computed and used to design the generalized subspace SDC estimator. For non-stationary noise, the project first calculates a time-frequency smoothing factor. This factor is used to compute the speech-presence probability, which in turn allows the current noise spectrum to be estimated from the previous noise spectrum.
For reverberant speech removal, the harmonic structure of speech is used to estimate the direct sound within the reverberant speech. First, an F0 estimator finds the fundamental frequency (F0) of the reverberant speech, and a harmonic filter is constructed from that F0. Filtering the input speech with the harmonic filter yields speech with harmonic structure, which serves as the estimate of the direct sound. Finally, the ratio of the averaged estimated direct sound to the reverberant speech spectrum is used to design the dereverberation filter.
For interfering signal removal, the project exploits the sparsity of speech signals to estimate the mixing matrix of the target speech and the interfering signals. At each time-frequency point, a maximum a posteriori (MAP) method selects the event with the highest probability. Depending on the event type, an optimal combinatorial approach or a heuristic approach is used to reconstruct the unknown source signals. The separated signals are then transformed from the frequency domain to the time domain, and the target speech can be selected manually or automatically.
The project is planned to run for two years. The main items for the first year are:
1. Complete the collection of archive speech data.
2. Complete the method for estimating non-stationary noise.
3. Complete the proposed MMSE and SDC estimators.
4. Complete the proposed generalized subspace approach for background noise removal.
5. Complete the robust fundamental frequency estimator.
6. Complete the method for removing reverberant speech.
7. Complete the integration of background noise removal and reverberant speech removal into a recorded speech enhancement technique.
The main items for the second year are:
1. Complete the microphone array (multi-channel recording) setup and the corresponding recorded speech database.
2. Complete the method for estimating the mixing matrix based on hierarchical clustering and permutation solving.
3. Complete the MAP approach for obtaining the best event.
4. Complete the optimal combinatorial approach and the heuristic approach for reconstructing the unknown source signals.
5. Complete the method for removing interfering signals.
6. Complete the high-quality speech recording technique, which removes background noise, reverberant speech, and interfering signals based on microphone array processing.
Research period: 2010-08 to 2011-07 (recorded as 9908 ~ 10007 in the ROC calendar)
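The abstract's non-stationary noise step (estimating the current noise spectrum from the previous one, guided by a speech-presence probability) can be illustrated with a minimal sketch. This is not the project's method: it assumes a common recursive-averaging form, in the style of minima-controlled recursive averaging, in which the smoothing factor for each frequency bin grows toward 1 as speech becomes more likely. The function name `update_noise_psd`, its parameters, and the default `alpha` are hypothetical, and the speech-presence probabilities are taken as given rather than computed.

```python
def update_noise_psd(noise_psd, noisy_power, presence_prob, alpha=0.95):
    """One frame of a speech-presence-weighted recursive noise update.

    Illustrative assumption only, not the estimator from the project.

    noise_psd     : previous noise power estimate, one value per frequency bin
    noisy_power   : power of the current noisy frame, one value per bin
    presence_prob : speech-presence probability per bin, each in [0, 1]
    alpha         : base smoothing constant (hypothetical default)
    """
    updated = []
    for prev, power, p in zip(noise_psd, noisy_power, presence_prob):
        # Time-frequency smoothing factor: equals alpha when speech is absent
        # (p = 0) and approaches 1 when speech is likely (p = 1), which
        # freezes the noise estimate in that bin.
        smoothing = alpha + (1.0 - alpha) * p
        # Blend the previous noise estimate with the current noisy power.
        updated.append(smoothing * prev + (1.0 - smoothing) * power)
    return updated


# Usage: bin 0 is judged to contain speech, bin 1 is judged to be noise only.
previous = [1.0, 1.0]
current_power = [4.0, 4.0]
new_estimate = update_noise_psd(previous, current_power, [1.0, 0.0], alpha=0.9)
```

In this sketch the estimate in a bin judged to contain speech stays at its previous value, while a noise-only bin moves toward the current frame's power at a rate set by `alpha`.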
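Relation:	Science and Technology Policy Research and Information Center, National Applied Research Laboratories (財團法人國家實驗研究院科技政策研究與資訊中心)
Appears in collections:	[Department of Computer Science and Information Engineering] Research Projects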

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	603

All items in NCUIR are protected by the original copyright.
