A Study of Archive Speech Enhancement (II) (典藏語音強化之研究 (II))


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/49598

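Title:	A Study of Archive Speech Enhancement (II) (典藏語音強化之研究 (II))
Author:	王家慶
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Digital Archive; Enhancement; Background Noise; Reverberant Speech; Interfering Signal; Research field: Information Science -- Software
Date:	2011-08-01
Upload time:	2012-01-17 19:05:26 (UTC+8)
Publisher:	National Science Council, Executive Yuan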
Abstract:	This project develops two techniques for improving the quality of digital archive speech: (1) enhancement of already-recorded speech, and (2) high-quality speech recording. Enhancement of recorded speech focuses on removing background noise and reverberation. High-quality recording uses multi-microphone (microphone array) processing to remove background noise, reverberation, and interfering sources at the same time. In the previous one-year phase, A Study of Archive Speech Enhancement (I), we completed a reverberation removal method and a prototype of the background noise removal method. In the current phase, A Study of Archive Speech Enhancement (II), we plan to optimize the background noise removal method and develop an interfering-source removal method.
To optimize background noise removal, we propose two strategies: (1) perceptually motivated subspace speech enhancement, which addresses the drawback that conventional algorithms focus only on SNR improvement by also taking the characteristics of human auditory perception into account; and (2) improved non-stationary noise estimation. For the perceptual subspace enhancement, a rough estimate of the enhanced speech is first obtained with the generalized subspace approach, and its auditory masking threshold is computed according to the auditory masking effect. Because this threshold is expressed in the Fourier (FFT) domain, it is then transformed into the eigen-domain. Finally, the spectral-domain-constrained estimator of the generalized subspace approach is designed from the transformed threshold, so that the residual noise in the final enhanced speech stays below the auditory masking threshold.
Because the quality of the non-stationary noise estimate decides whether this perceptual framework succeeds, the second strategy is an improved non-stationary noise estimation method. To exploit the dependence between frames, we use a sliding window at each individual FFT frequency to build a covariance matrix, and apply eigen-decomposition to separate the speech-plus-noise subspace from the noise subspace. The noise subspace is then used to estimate the current noise spectrum. In addition, because environmental noise can resemble speech, we design a weighting factor that sets the relative weight of the speech-plus-noise subspace and the noise subspace. Three robust features are first extracted and used to train a support vector machine classifier; the distance between the test feature vector and the hyperplane then determines the weighting factor.
For interfering-source removal, the project exploits the sparsity of speech signals to estimate the mixing matrix of the target speech and the interfering source. Because speech signals are harmonic, we can rely on their sparsity and use a maximum a posteriori (MAP) criterion to find the most probable event at each time-frequency point. Depending on the event type, an optimal combinatorial approach or a heuristic approach is used to recover the unknown source signals.
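The sparsity idea behind the mixing-matrix estimate can be illustrated with a small sketch. It is not the project's method: it assumes two real-valued mixtures (the project works on time-frequency coefficients), assumes at most one source is active at each point, and replaces the MAP and combinatorial steps with a simple energy-weighted direction histogram. The function name `estimate_mixing_directions` and its parameters `n_bins` and `min_energy` are hypothetical.

```python
import math


def estimate_mixing_directions(x1, x2, n_sources, n_bins=180, min_energy=1e-3):
    """Estimate unit-norm mixing-matrix columns from two real-valued mixtures.

    Illustrative assumption: at most one source is active at each point, so
    every mixture vector (x1[i], x2[i]) lies along one mixing-matrix column.
    """
    hist = [0.0] * n_bins
    for a, b in zip(x1, x2):
        energy = math.hypot(a, b)
        if energy < min_energy:
            continue  # near-silent points give no reliable direction
        # A column and its negation describe the same direction, so fold
        # the angle into [0, pi).
        theta = math.atan2(b, a) % math.pi
        idx = min(int(theta / math.pi * n_bins), n_bins - 1)
        hist[idx] += energy  # energy-weighted vote for this direction

    # Keep circular local maxima of the histogram, strongest first.
    peaks = [
        i for i in range(n_bins)
        if hist[i] > 0.0
        and hist[i] >= hist[i - 1]
        and hist[i] > hist[(i + 1) % n_bins]
    ]
    peaks.sort(key=lambda i: hist[i], reverse=True)

    # Convert the strongest bins back to unit vectors (bin centers).
    angles = sorted((i + 0.5) * math.pi / n_bins for i in peaks[:n_sources])
    return [(math.cos(t), math.sin(t)) for t in angles]
```

Each returned pair is one estimated column of a 2 x `n_sources` mixing matrix, up to sign and scale. With noisy data the histogram would normally be smoothed before peak picking, which this sketch omits.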
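Finally, the separated frequency-domain signals are transformed back to the time domain, and the target speech is selected either manually or automatically. The planned goals of this phase comprise eleven items, beginning with: 1. Design of the robust features, and training and testing of the support vector machine. 2. The improved non-stationary noise estimation method.
Research period: 10008 ~ 10107 (ROC calendar year and month, i.e. 2011-08 to 2012-07)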
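The abstract states that the distance between a test feature vector and the support vector machine hyperplane determines the weighting factor, but it does not give the mapping. The sketch below shows one possible reading, assuming a linear hyperplane w.x + b = 0, a logistic mapping from signed distance to a weight in the range 0 to 1, and a convex blend of two noise-spectrum estimates. The names `signed_distance`, `weighting_factor`, `blend_noise_estimates`, and the `slope` parameter are all hypothetical, as is the choice of which estimate receives the weight.

```python
import math


def signed_distance(features, weights, bias):
    """Signed Euclidean distance from a feature vector to w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not be all zero")
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm


def weighting_factor(distance, slope=1.0):
    """Map a signed distance to a weight in [0, 1] (assumed logistic mapping)."""
    z = slope * distance
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch: avoids overflow for very negative z
    return e / (1.0 + e)


def blend_noise_estimates(est_speech_plus_noise, est_noise, alpha):
    """Per-frequency convex combination of two noise-spectrum estimates.

    Assumption: alpha weights the speech-plus-noise subspace estimate and
    (1 - alpha) weights the noise subspace estimate.
    """
    return [
        alpha * a + (1.0 - alpha) * b
        for a, b in zip(est_speech_plus_noise, est_noise)
    ]
```

In use, the three robust features of a frame would be passed to `signed_distance` together with the trained hyperplane parameters, and the resulting weight would be passed to `blend_noise_estimates`. A vector far on one side of the hyperplane gives a weight near 0 or 1, while a vector near the hyperplane gives a weight near 0.5.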
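Relation:	Science and Technology Policy Research and Information Center, National Applied Research Laboratories
Appears in collections:	[Department of Computer Science and Information Engineering] Research Projects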


All items in NCUIR are protected by original copyright.
