NCU Institutional Repository: Item 987654321/86849


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86849


    Title: 基於知識蒸餾之單通道語音增強;Single channel speech enhancement based on knowledge distillation
    Authors: 高康捷;Kao, Kang Jie
    Contributors: Department of Computer Science and Information Engineering
    Keywords: single-channel speech enhancement;knowledge distillation;deep neural network
    Date: 2021-10-27
    Issue Date: 2021-12-07 13:19:58 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, deep neural networks have advanced rapidly in the field of speech enhancement, and large network architectures with many deep layers achieve better noise reduction. In practice, however, applications such as instant messaging and real-time speech recognition mostly run on mobile devices, smart home appliances, and similar hardware with limited computing power and insufficient resources for heavy computation. To overcome this problem, recent research therefore favors low-latency, lightweight models that achieve comparable or better results with fewer parameters.
    Building on the Dual-Signal Transformation LSTM Network (DTLN), this thesis proposes a knowledge distillation training method. In this scheme, the teacher model is a trained DTLN with deepened layers and widened width, while the student model keeps the original DTLN configuration. Because DTLN is formed by cascading two LSTM (Long Short-Term Memory) networks, the teacher distills knowledge into the two parts of the student separately. Experimental results show that this approach achieves a better distillation effect, yielding a student model with an equivalent number of parameters but improved noise reduction.
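    The record does not include the author's training code, so the following is a minimal, hypothetical PyTorch-style sketch of the idea described in the abstract: a frozen, larger teacher guides each of the student's two cascaded LSTM stages with its own distillation loss, alongside the usual enhancement loss. The class and function names (TwoStageLSTM, distillation_step) and the loss weights are illustrative assumptions, not the thesis's actual DTLN implementation.

    # Hypothetical sketch of stage-wise knowledge distillation for a
    # two-stage (DTLN-style) enhancement model; names and weights are
    # illustrative, not the author's code.
    import torch
    import torch.nn as nn

    class TwoStageLSTM(nn.Module):
        """Two cascaded LSTM masking stages, a simplified stand-in for DTLN."""
        def __init__(self, feat_dim=257, hidden=128, layers=2):
            super().__init__()
            self.stage1 = nn.LSTM(feat_dim, hidden, layers, batch_first=True)
            self.mask1 = nn.Sequential(nn.Linear(hidden, feat_dim), nn.Sigmoid())
            self.stage2 = nn.LSTM(feat_dim, hidden, layers, batch_first=True)
            self.mask2 = nn.Sequential(nn.Linear(hidden, feat_dim), nn.Sigmoid())

        def forward(self, spec_mag):
            h1, _ = self.stage1(spec_mag)
            out1 = spec_mag * self.mask1(h1)      # stage-1 masked output
            h2, _ = self.stage2(out1)
            out2 = out1 * self.mask2(h2)          # stage-2 masked output
            return out1, out2

    def distillation_step(student, teacher, noisy_mag, clean_mag,
                          alpha=0.5, beta=0.5):
        """One training step: supervised loss on the final output plus
        per-stage distillation losses against the frozen teacher."""
        mse = nn.MSELoss()
        with torch.no_grad():
            t_out1, t_out2 = teacher(noisy_mag)   # teacher is pre-trained, frozen
        s_out1, s_out2 = student(noisy_mag)
        loss_task = mse(s_out2, clean_mag)        # ordinary enhancement loss
        loss_kd = alpha * mse(s_out1, t_out1) + beta * mse(s_out2, t_out2)
        return loss_task + loss_kd

    In this sketch, the teacher could be instantiated as TwoStageLSTM(hidden=512, layers=3) and the student as TwoStageLSTM(hidden=128, layers=2); the two models interact only through the per-stage distillation losses, so the student keeps its original parameter count while being guided stage by stage.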
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 KB, HTML; 139 views)


    All items in NCUIR are protected by copyright, with all rights reserved.
