Speech Enhancement Based on Compressive Sensing

Name

施志豪 (Zhih-hao Shih)

Department

Department of Computer Science and Information Engineering

Thesis Title

基於壓縮感測之語音增強
(Speech enhancement based on compressive sensing)

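Related Theses

★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Application and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Recognition and Detection of Calcaneal Fractures in CT Images Using Deep Learning and Speeded-Up Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction of Financial Products
★ Integrating Deep Learning Methods to Predict Age and Aging Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction of Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Stroke Surgery Survival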

Full text: never to be released for online viewing

Abstract (translated from the Chinese)

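Speech enhancement has long been a problem that many researchers have tried to solve, yet to this day no satisfactory method has been developed that can handle noises with widely varying characteristics. Because recording environments are often unsuitable and recording devices are imperfect, noise is unavoidable. A noisy signal degrades subsequent speech processing, so effective speech enhancement is important.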
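Speech enhancement can be viewed as an estimation problem: accurately estimating the speech signal from a noisy signal. If we assume that speech follows some statistical model and that the noise is a random variable uncorrelated with the speech, we can exploit this property and obtain the enhanced speech by optimizing some error criterion. However, which statistical model speech follows and which error criterion should be used remain open questions.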
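To make the estimation view concrete, here is a minimal sketch of one classical estimator of this kind, the Wiener filter surveyed in Chapter 2. It is not the method proposed in this thesis. Assuming speech and noise are uncorrelated and mean-square error is the criterion, each time-frequency bin is scaled by ξ/(1+ξ), where ξ is the a priori SNR. The function name `wiener_enhance`, the power-subtraction estimate of ξ, and the assumption that the noise power spectrum is already known are illustrative choices, not details from the thesis.

```python
import numpy as np

def wiener_enhance(noisy_mag, noise_psd, floor=1e-3):
    """Apply a per-bin Wiener gain to a magnitude spectrogram (illustrative sketch).

    noisy_mag : (freq, frames) magnitude of the noisy STFT.
    noise_psd : (freq,) noise power per frequency bin; ASSUMED known here,
                e.g. averaged over frames believed to contain no speech.
    """
    noisy_psd = noisy_mag ** 2
    # Crude a priori SNR estimate by power subtraction, floored to stay positive.
    xi = np.maximum(noisy_psd / noise_psd[:, None] - 1.0, floor)
    # MMSE (Wiener) gain for uncorrelated speech and noise; always in (0, 1).
    gain = xi / (1.0 + xi)
    return gain * noisy_mag
```

High-SNR bins pass through almost unchanged, while bins near the noise floor are strongly attenuated. The fixed floor is a simple guard against negative SNR estimates; practical systems usually smooth ξ over time instead.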
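In the past decade, compressive sensing has been proposed as a new method of signal sampling and reconstruction, and it offers a new way to estimate signals. This thesis therefore investigates how to combine compressive sensing with speech enhancement. First, the signal is transformed into the time-frequency domain, and we assume that the resulting spectrogram can be mapped into a sparse transform domain. We then process the noisy spectrogram using a missing data technique together with compressive sensing.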
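The first two steps described above (moving to the time-frequency domain and deciding which cells can be trusted) might look like the following sketch. The abstract does not say how the mask is built; thresholding an estimated local SNR, the frame and hop sizes, and the helper names `stft_mag` and `missing_data_mask` are assumptions made for illustration. Cells where the mask is `False` are treated as missing and left for a later compressive-sensing step to fill in.

```python
import numpy as np

def stft_mag(x, frame_len=256, hop=128):
    """Magnitude spectrogram, shape (frame_len // 2 + 1, n_frames), Hann-windowed."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Stack windowed frames as columns, then take the one-sided FFT of each column.
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def missing_data_mask(noisy_mag, noise_psd, snr_threshold_db=0.0):
    """Boolean mask: True = reliable (speech-dominated), False = missing.

    ASSUMPTION: noise_psd (per frequency bin) is known or estimated elsewhere.
    """
    noisy_psd = noisy_mag ** 2
    # Estimated speech power by spectral subtraction, kept strictly positive.
    speech_psd = np.maximum(noisy_psd - noise_psd[:, None], 1e-12)
    local_snr_db = 10.0 * np.log10(speech_psd / noise_psd[:, None])
    return local_snr_db > snr_threshold_db
```

A 0 dB threshold is a common starting point; raising it keeps fewer but more trustworthy cells, which shifts more of the work onto the imputation step.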
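Our experimental results show that the proposed method performs well under many types of noise. It is also suitable for noise that traditional methods cannot handle. Finally, we further examine the kinds of noise characteristics that our method is particularly well suited to.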

Abstract (English)

Speech enhancement is an active research topic to which many researchers have devoted considerable effort. However, there is still no satisfactory method that can deal with the wide variety of noise types. Noise is inevitable because of improper recording environments or imperfect recording devices, and it degrades subsequent speech processing. Speech enhancement is therefore a very important topic.
Speech enhancement can be regarded as an estimation problem in which the clean speech is estimated from a noisy measurement. Assuming that the speech signal satisfies some statistical model and that the noise is an uncorrelated random process, the enhanced signal can be estimated according to some distortion measure. However, which speech model and distortion measure should be used is still an open issue.
In the past decade, compressive sensing has been proposed as a new method of signal acquisition and reconstruction, and it offers a new perspective on signal estimation. Hence, this thesis explores how to perform speech enhancement by applying compressive sensing.
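A rough sketch of how compressive sensing can fill in missing time-frequency cells: model one spectrogram frame `y` as `D @ a` with a sparse coefficient vector `a`, estimate `a` from the reliable rows of `y` and `D` only, and replace the unreliable entries with the reconstruction `D @ a`. Orthogonal matching pursuit (OMP) is used here as a stand-in sparse solver, and the dictionary `D` is a caller-supplied placeholder. The thesis constructs its own overcomplete dictionary and may use a different solver, so `omp` and `cs_impute` are hypothetical helpers, not the author's implementation.

```python
import numpy as np

def omp(A, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit: greedy n_nonzero-sparse solution of y ≈ A @ a."""
    norms = np.linalg.norm(A, axis=0) + 1e-12  # guard against all-zero columns
    residual = np.asarray(y, dtype=float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break  # y is already explained by the selected atoms
        # Select the (norm-adjusted) atom most correlated with the residual.
        k = int(np.argmax(np.abs(A.T @ residual) / norms))
        if k not in support:
            support.append(k)
        # "Orthogonal" step: refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    a = np.zeros(A.shape[1])
    a[support] = coef
    return a

def cs_impute(y, mask, D, n_nonzero):
    """Fill entries of y where mask is False, using a sparse code fit on reliable rows.

    y    : (n,) observed frame (values at masked-out positions are ignored).
    mask : (n,) boolean, True = reliable.
    D    : (n, n_atoms) dictionary; HYPOTHETICAL, supplied by the caller.
    """
    a = omp(D[mask], y[mask], n_nonzero)  # sparse recovery from reliable rows only
    estimate = D @ a                       # reconstruct every row, including missing ones
    return np.where(mask, y, estimate)     # keep reliable cells, impute the rest
```

For a whole spectrogram, `cs_impute` would be applied column by column with a missing data mask of the same shape. Reliable cells are passed through unchanged, so only the masked cells are altered; how well they are recovered depends on how sparsely the dictionary represents clean speech.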
Our experimental results show that the proposed method performs well across various noise types. In addition, it handles noise that traditional methods cannot address well considerably better than those methods do.

Keywords (translated from the Chinese)

★ missing data imputation
★ speech enhancement
★ compressive sensing
★ noise removal
★ missing data mask

Keywords (English)

★ missing data imputation
★ compressive sensing (CS)
★ speech enhancement
★ noise removal
★ missing data mask

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgement
The List of Figures
The List of Tables
Description of Symbols
Contents
Chapter 1 Introduction
1.1 Background
1.2 The Concept of Speech Enhancement
1.3 Motivation
1.4 Organization of the Thesis
Chapter 2 Literature Survey of Speech Enhancement Techniques
2.1 Spectral Subtraction (SS)
2.2 Wiener Filtering
2.3 Spectral Subtraction with MMSE
2.4 Signal Subspace Approach
2.5 Other Speech Enhancement Methods
Chapter 3 Sparse Representation and Compressive Sensing
3.1 Sparse Representation
3.2 Compressive Sensing
Chapter 4 The Proposed Method
4.1 Construct an Overcomplete Dictionary
4.2 Missing Data Mask
4.3 Estimating Missing Data by CS
Chapter 5 Experimental Results
5.1 Experiment Environment and Installation
5.2 CS Imputation
5.3 The Comparison of the Performance
5.4 The Artificial Tonal Noise
Chapter 6 Conclusion and Future Work
Reference

Advisor

王家慶 (Jia-ching Wang)

Approval Date

2012-8-6