|Abstract: ||本研究的目的是在可自動情境分類與補償之雙麥克風除噪系統的後端加入語音增強功能，主要是為了讓雙麥克風系統能夠藉由語音增強的功能，進一步提高對訊號的除噪效果，讓整個系統的輸出能夠有更高的語音理解度。語音增強系統主要分為噪音估測策略和語音估測函數兩部分，研究中所使用的噪音估測策略: 最小統計法(Minimum Statistics, MS)、最小控制遞迴平均(Minima-Controlled Recursive Averaging, MCRA)、改良最小控制遞迴平均(Improved Minima-Controlled Recursive Averaging, IMCRA)、Loizou改良最小控制遞迴平均(MCRA-L)、變異量控制平滑係數(Constrained Variance Spectral Smoothing, CVS)、正反向最小控制遞迴平均法(MCRA-FB)；語音估測函數:最大相似度策略(Maximum-Likelihood, ML)、對數頻譜振幅估測(Log-Spectral Amplitude, LSA)、最大後驗振幅估測(Maximum A Posteriori Amplitude, MAPA)、改良版韋納濾波器(Wiener-type)和韋納濾波器(Wiener Filter)。|
在本研究中使用MATLAB(The MathWorks, Natick, Massachusetts, USA)來進行軟體上的語音增強系統的模擬，軟體模擬主要是針對不同訊噪比(Signal-to-Noise Ratio, SNR)的噪音語音輸入訊號進行語音增強處理後語音品質的評估，然後評選出最好的語音增強系統的搭配；最後將語音增強系統結合可自動情境分類與補償之雙麥克風除噪系統，將其實現在TMS320C6713開發板(Texas Instruments, Dallas, Texas, USA)，並與未加入語音增強前的除噪系統進行語音品質的評估比較，語音品質的評估主要是使用語音品質客觀評量(Perceptual Evaluation of Speech Quality, PESQ)與主觀的語音接收閥值(Speech Reception Threshold, SRT)做為評估的指標，輸入訊號的SNR範圍落在30dB到-30dB之間。
在客觀評量方面，軟體模擬結果顯示，當使用CVS搭配上MAPA時，在輸入訊號的SNR值為30 dB的情況下，對於PESQ也有0.45的改善，而在訊號的SNR值為10 dB時，PESQ更有高達0.65的改善，為了在開發板上進行即時運算，在硬體上只能使用MCRA搭配上MAPA的語音增強系統，硬體上實現上實驗結果顯示，在SNR值為30 dB的情況下造成了PESQ下降0.36，而在SNR值低於10 dB以下時，由於自動情境分類系統會自動開啟方向性麥克風，此時在方向性麥克風與語音增強的雙重作用下，能有效的降低語音的失真與提高語音品質，在SNR值為0 dB時，PESQ則有了最高0.27的提升。
在主觀評量方面，使用HINT Pro聽力檢查儀(Bio-logic, Chicago, IL, USA)對五位年齡介於23到26歲之間的受測者進行在不同噪音環境下的SRT測試，實驗結果顯示，受測者的SRT平均上升了8.54dB，加入語音增強系統後SRT不但沒有改善，反而還變得比原本差了，這是因為經語音增強處理後的聲音音量變得太小，導致語音品質雖然改善了，可是SRT卻不降反升。由以上的實驗結果可以驗證，使用較短的音框長度時，加入語音增強系統後雖然會在低噪音環境下造成些許的失真，不過在高噪音環境下仍然能夠有效的提升語音品質，而在使用較長的音框長度且音框間有疊合時，語音增強的效果更是能大幅度的提高語音理解度，由於為了能夠在開發板上即時運算，只能使用較短的音框長度，如果能加入放大器，更能使整個系統在實際運用上，達到跟客觀評量一樣的效果。
The purpose of this research was to add a speech enhancement stage to the back end of a dual-microphone automatic scene classification and auto-matching noise reduction system, so that the system could suppress noise further and deliver output with higher speech intelligibility. The speech enhancement system is divided into two parts: a noise-estimation strategy and a speech-estimation function. The noise-estimation strategies used in this research were Minimum Statistics (MS), Minima-Controlled Recursive Averaging (MCRA), Improved Minima-Controlled Recursive Averaging (IMCRA), Minima-Controlled Recursive Averaging-Loizou (MCRA-L), Constrained Variance Spectral Smoothing (CVS), and Forward-Backward MCRA (MCRA-FB). The speech-estimation functions were Maximum-Likelihood (ML), Log-Spectral Amplitude (LSA), Maximum A Posteriori Amplitude (MAPA), a modified Wiener filter (Wiener-type), and the Wiener filter.
MATLAB (The MathWorks, Natick, Massachusetts, USA) was first used to simulate the speech enhancement system in software. The simulation evaluated speech quality after enhancement for noisy speech inputs at different signal-to-noise ratios (SNRs), and the best-performing combination of noise-estimation strategy and speech-estimation function was then selected. Finally, the selected speech enhancement system was combined with the automatic scene classification and auto-matching noise reduction system, implemented on a TMS320C6713 DSP Starter Kit (Texas Instruments, Dallas, Texas, USA), and compared with the noise reduction system without speech enhancement. Speech quality was evaluated with the objective Perceptual Evaluation of Speech Quality (PESQ) measure and the subjective speech reception threshold (SRT), for input SNRs ranging from 30 dB to -30 dB.
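As an illustration of how a noisy test input at a chosen SNR can be prepared, the following is a minimal sketch in Python rather than the MATLAB code used in the study. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sketch assumes the SNR is defined as the ratio of average speech power to average noise power over the whole signal.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it to `speech`.

    Assumption: SNR is computed from average power over the full signal.
    """
    if len(speech) != len(noise) or not speech:
        raise ValueError("speech and noise must be non-empty and equal in length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # Noise power required for the requested SNR.
    target_noise_power = p_speech / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Return the SNR in dB of `noisy`, treating `noisy - speech` as the noise component."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_resid = sum((y - s) ** 2 for s, y in zip(speech, noisy)) / len(speech)
    return 10.0 * math.log10(p_speech / p_resid)
```

Calling `mix_at_snr` once per target value between 30 dB and -30 dB would give a set of inputs covering the range described above.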
In the objective evaluation, the simulation results showed that combining CVS with MAPA improved the PESQ score by 0.45 at an input SNR of 30 dB and by 0.65 at 10 dB. To allow real-time processing on the development board, only MCRA combined with MAPA could be used in the hardware implementation. The hardware results showed that speech enhancement lowered the PESQ score by 0.36 at an input SNR of 30 dB. When the SNR fell below 10 dB, the automatic scene classification system switched on the directional microphone automatically. The combined effect of the directional microphone and speech enhancement effectively reduced speech distortion and improved speech quality, with the largest PESQ improvement, 0.27, at an input SNR of 0 dB.
For the subjective evaluation, the SRTs of five normal-hearing subjects aged 23 to 26 were measured in different noise conditions with the HINT Pro system (Bio-logic, Chicago, IL, USA). Speech enhancement did not improve the SRT; it made it worse than with the original system. The average SRT rose by 8.54 dB because the volume of the signal processed by the speech enhancement system became too low, even though the objective speech quality improved. These results suggest that, with a shorter frame length, speech enhancement introduces some distortion in low-noise (high-SNR) conditions but still improves speech quality effectively in high-noise (low-SNR) conditions. With a longer frame length and overlapping frames, speech enhancement can greatly improve speech intelligibility, but only the shorter frame length could be used for real-time processing on the development board. If an amplifier stage were added, the whole system could achieve in practical use the same performance as in the objective evaluation.
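The two-part structure described in this abstract, a noise-estimation strategy followed by a speech-estimation function, can be sketched for a single frame as below. This is a deliberately simplified stand-in and not any of the algorithms evaluated in the study: the noise update is a plain recursive average gated by a fixed ratio threshold (not MCRA, IMCRA, or CVS), and the gain is a basic Wiener rule with a maximum-likelihood a priori SNR estimate (not MAPA or LSA). The names `update_noise` and `wiener_gain` and the parameters `alpha`, `ratio_threshold`, and `gain_floor` are hypothetical.

```python
def update_noise(noise_psd, noisy_psd, alpha=0.9, ratio_threshold=2.0):
    """Noise-estimation step (simplified): recursively average bins that look speech-free.

    A bin is treated as speech-free when its noisy power is below
    `ratio_threshold` times the current noise estimate (an assumed rule).
    """
    updated = []
    for n, y in zip(noise_psd, noisy_psd):
        if y < ratio_threshold * n:
            updated.append(alpha * n + (1.0 - alpha) * y)
        else:
            updated.append(n)  # likely speech: hold the previous estimate
    return updated


def wiener_gain(noise_psd, noisy_psd, gain_floor=0.1):
    """Speech-estimation step (simplified): per-bin Wiener gain xi / (1 + xi).

    xi is the a priori SNR, estimated here as max(posterior SNR - 1, 0).
    """
    gains = []
    for n, y in zip(noise_psd, noisy_psd):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimated: pass the bin through
            continue
        xi = max(y / n - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains
```

In a full system each gain would multiply the corresponding short-time spectral magnitude before resynthesis. Because every gain in this sketch is at most 1, the enhanced frame never carries more energy than the noisy frame.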