Study of Speech Dereverberation Based on Deep Learning Approach

DC field	Value	Language
DC.contributor	電機工程學系 (Department of Electrical Engineering)	zh_TW
DC.creator	林金賢	zh_TW
DC.creator	Jin-Sian Lin	en_US
dc.date.accessioned	2021-01-25T07:39:07Z
dc.date.available	2021-01-25T07:39:07Z
dc.date.issued	2021
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107521087
dc.contributor.department	電機工程學系 (Department of Electrical Engineering)	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	回響通常是由天花板、牆壁和地板的聲音反射所造成的，在我們生活的環境中處處都會有回響的存在。對於正常人耳而言，回響所造成的影響並不明顯，不過對於助聽器或其他聽覺輔具的使用者而言，回響會嚴重影響他們語音接收的品質，即使在安靜的環境下，也可能會聽不清楚。現有的傳統除回響方法，雖然也可以表現出相當不錯的性能，但它們仍都需要已知的環境特性來抑制回響，這在真實環境下很難去實現。現今，深度學習發展迅速，利用大量的訓練資料來訓練深度神經網路(Deep neural network, DNN)便可以得到輸出與輸入之間的非線性關係，改善了傳統方法對環境的依賴性。本論文利用實驗室先前所錄製的TMHINT(Taiwan mandarin hearing in noise test)句子作為實驗語料，模擬了許多不同環境下的回響語料來進行訓練(2160句)及測試(480句)，再從語音中萃取對數功率聲譜(Logarithmic power spectrogram, LPS)作為輸入特徵，讓深度神經網路來進行監督式學習。本實驗中使用的神經網路架構有深層降噪自編碼器(Deep denoise autoencoder, DDAE)與整體式深度與集成學習演算法(Integrated deep and ensemble learning algorithm, IDEA)，並比較他們彼此間的優劣勢及結合其他網路架構所呈現出來的結果，依據不同的訓練目標，網路的性能也不一致。在這我們也比較了映射(Mapping)與遮罩(Masking)方式的區別。為了證實比較結果的可信度，我們使用了國外語音研究上常用的TIMIT語料，加以驗證我們的結果。最後，藉由語音品質感知度(Perceptual evaluation of speech quality, PESQ)與短時客觀語音清晰度(Short time objective intelligibility, STOI)等評估方法來對各項結果做評估，來找出最合適的網路架構及輸出目標。評估結果表明，DDAE與IDEA兩者跟殘差網路(Residual networks)做結合的效益是最佳的(PESQ平均值2.2以上、STOI平均值0.8以上)，而在遮罩目標下，DDAE無論是在架構上或是回響抑制能力上的表現，都明顯優於IDEA。	zh_TW
dc.description.abstract	Reverberation, generally caused by sound reflections from ceilings, floors, and walls, exists everywhere in the environment we live in. For people with normal hearing, the effect of reverberation is not obvious. However, for people who use hearing aids or other assistive hearing devices, reverberation significantly degrades the quality of speech reception; even in a quiet environment, it can prevent them from hearing clearly. Although traditional dereverberation approaches can achieve reasonably good performance, they rely on prior knowledge of the environmental characteristics, which is difficult to obtain in real environments. Deep learning has developed rapidly and offers a powerful tool for dereverberation: by training deep neural networks (DNNs) on a large amount of data, the nonlinear relationship between input and output can be learned. Compared with traditional methods, DNNs lessen the dependence on the environment and improve performance. In this thesis, sentences from the TMHINT (Taiwan Mandarin Hearing in Noise Test) corpus, previously recorded by our research team, are used as the speech material, and reverberant speech is simulated under many different conditions for training (2160 sentences) and testing (480 sentences). The logarithmic power spectrum (LPS) is extracted from the speech as the input feature, and the DNN is trained by supervised learning. The network architectures used in the experiments are the deep denoising autoencoder (DDAE) and the integrated deep and ensemble learning algorithm (IDEA). This research compares their advantages and disadvantages and evaluates the results of combining them with other network architectures. Different training targets for the same network are also compared, as are the mapping and masking approaches. To verify the credibility of the comparison, we also ran experiments on the TIMIT corpus. Perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) are used to assess the results and to identify the most suitable network architecture and output target. The evaluation results show that combining either DDAE or IDEA with a residual network performs best among all methods (average PESQ of 2.2 or higher and average STOI of 0.8 or higher). Furthermore, with a masking target, DDAE clearly outperforms IDEA in both architecture and dereverberation capability.	en_US
DC.subject	深度學習	zh_TW
DC.subject	回響抑制	zh_TW
DC.subject	Deep learning	en_US
DC.subject	Dereverberation	en_US
DC.title	深度學習用於語音回響抑制之研究	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Study of speech dereverberation based on deep learning approach	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 107521087: full metadata record