References
[1] J.-C. Wang, H.-P. Lee, J.-F. Wang, and C.-B. Lin, “Robust environmental sound recognition for home automation,” IEEE Trans. Automation Science and Engineering, vol. 5, no. 1, pp. 25–31, Jan. 2008.
[2] M. Vacher, F. Portet, A. Fleury, and N. Noury, “Development of audio sensing technology for ambient assisted living: Applications and challenges,” International Journal of E-Health and Medical Communications, vol. 2, no. 1, pp. 35–37, Mar. 2011.
[3] M. Vacher, D. Istrate, F. Portet, T. Joubert, T. Chevalier, S. Smidtas, B. Meillon, B. Lecouteux, M. Sehili, P. Chahuara, and S. Meniard, “The sweet-home project: Audio technology in smart homes to improve well-being and reliance,” in Proc. 33rd Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, Boston, Massachusetts, United States, 2011, Aug. 30–Sep. 03, pp. 5291–5294.
[4] A. Fleury, N. Noury, M. Vacher, H. Glasson, and J. F. Serignat, “Sound and speech detection and recognition in a health smart home,” in Proc. 30th Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, Vancouver, British Columbia, Canada, 2008, Aug. 20–25, pp. 4644–4647.
[5] M. A. M. Shaikh, A. R. F. Rebordao, A. Nakasone, H. Prendinger, and K. Hirose, “An automatic approach to virtual living based on environmental sound cues,” in Proc. 3rd Int. Conf. Affective Computing and Intelligent Interaction and Workshops, Amsterdam, Netherlands, 2009, Sep. 10–12, pp. 1–6.
[6] J. Chen, A. H. Kam, J. Zhang, N. Liu, and L. Shue, “Bathroom activity monitoring based on sound,” in Proc. 3rd Int. Conf. Pervasive Computing, Munich, Germany, 2005, May 08–13, pp. 47–61.
[7] S. Chu, S. Narayanan, and C.-C. J. Kuo, “Environmental sound recognition with time-frequency audio features,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142–1158, Aug. 2009.
[8] J.-C. Wang, J.-F. Wang, K. W. He, and C.-S. Hsu, “Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor,” in Proc. Int. Joint Conf. Neural Networks, Vancouver, British Columbia, Canada, 2006, Jul. 16–21, pp. 1731–1735.
[9] S. P. Ebenezer, A. Papandreou-Suppappola, and S. B. Suppappola, “Recognition of acoustic emissions using modified matching pursuit,” EURASIP Journal on Applied Signal Processing, vol. 2004, no. 3, pp. 347–357, 2004.
[10] E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search, and retrieval of audio,” IEEE MultiMedia, vol. 3, no. 3, pp. 27–36, Fall 1996.
[11] J. T. Foote, “Content-based retrieval of music and audio,” in Proc. 1997 SPIE Conf. Multimedia Storage and Archiving Systems II, Dallas, Texas, United States, 1997, Nov. 03, pp. 138–147.
[12] S. Pfeiffer, S. Fischer, and W. Effelsberg, “Automatic audio content analysis,” in Proc. 4th ACM Int. Conf. Multimedia, Boston, Massachusetts, United States, 1996, Nov. 18–22, pp. 21–30.
[13] S. Z. Li, “Content-based audio classification and retrieval using the nearest feature line method,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 5, pp. 619–625, Sep. 2000.
[14] G. Guo and S. Z. Li, “Content-based audio classification and retrieval by support vector machines,” IEEE Trans. Neural Networks, vol. 14, no. 1, pp. 209–215, Jan. 2003.
[15] C.-C. Lin, S.-H. Chen, T.-K. Truong, and Y. Chang, “Audio classification and categorization based on wavelets and support vector machine,” IEEE Trans. Speech and Audio Processing, vol. 13, no. 5, pp. 644–651, Sep. 2005.
[16] J. Zheng, G. Wei, and C. Yang, “Modified local discriminant bases and its application in audio feature extraction,” in Proc. Int. Forum on Information Technology and Application, Chengdu, China, 2009, May 15–17, pp. 42–52.
[17] K. Umapathy, S. Krishnan, and S. Jimaa, “Multigroup recognition of audio signals using time-frequency parameters,” IEEE Trans. Multimedia, vol. 7, no. 2, pp. 308–315, Apr. 2005.
[18] K. Umapathy and S. Krishnan, “Time-width versus frequency band mapping of energy distributions,” IEEE Trans. Signal Processing, vol. 55, no. 3, pp. 978–989, Mar. 2007.
[19] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, 2nd ed. New York, NY: Springer-Verlag, Apr. 1999.
[20] S. Wang, A. Sekey, and A. Gersho, “An objective measure for predicting subjective quality of speech coders,” IEEE Journal on Selected Areas in Communications, vol. 10, no. 5, pp. 819–829, 1992.
[21] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993.
[22] J. D. Durrant and J. H. Lovrinic, Bases of Hearing Science, 3rd ed. Baltimore, MD: Lippincott Williams and Wilkins, Jan. 1995.
[23] B. Moore, An Introduction to the Psychology of Hearing, 5th ed. Bingley, United Kingdom: Emerald Group Publishing Ltd., Jan. 2003.
[24] W. A. Yost and R. R. Fay, Auditory Perception of Sound Sources. New York, NY: Springer-Verlag, Nov. 2007.
[25] A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, Feb. 2001.
[26] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, Jul. 1997.
[27] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[28] W. Brent, “Perceptually based pitch scales in cepstral techniques for percussive timbre identification,” in Proc. International Computer Music Conference, Montreal, Québec, Canada, 2009, Aug. 16–21, pp. 121–124.
[29] J. M. Grey and J. W. Gordon, “Perceptual effects of spectral modifications on musical timbres,” Journal of the Acoustical Society of America, vol. 63, no. 5, pp. 1493–1500, 1978.
[30] D. L. Swets and J. J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, Aug. 1996.
[31] Freesound. Available: http://www.freesound.org.
[32] Free Sound Effects Archive. Available: http://www.grsites.com/archive/sounds/.
[33] B. Mailhé, R. Gribonval, F. Bimbot, and P. Vandergheynst, “A low complexity orthogonal matching pursuit for sparse signal approximation with shift-invariant dictionaries,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 2009, Apr. 19–24, pp. 3445–3448.
[34] J.-C. Wang, J.-F. Wang, C.-B. Lin, K.-T. Jian, and W.-H. Kuo, “Content-based audio classification using support vector machines and independent component analysis,” in Proc. 18th Int. Conf. Pattern Recognition, Hong Kong, China, 2006, Aug. 20–24, pp. 157–160.
[35] A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo, “CLEAR evaluation of acoustic event detection and recognition systems,” in Proc. 1st Int. Evaluation Workshop on Recognition of Events, Activities and Relationships, Southampton, United Kingdom, 2006, Apr. 06–07, pp. 311–322.
[36] K. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, Aug. 2012.
[37] P. C. Loizou, Speech Enhancement: Theory and Practice, 1st ed. Boca Raton, FL: CRC Press, Jun. 2007.
[38] D. R. Raymond, R. C. Marchany, M. I. Brownfield, and S. F. Midkiff, “Effects of denial-of-sleep attacks on wireless sensor network MAC protocols,” IEEE Trans. Vehicular Technology, vol. 58, no. 1, pp. 367–380, Jan. 2009.
[39] M. Peng, Y. Xiao, and P. P. Wang, “Error analysis and Kernel density approach of scheduling sleeping nodes in cluster-based wireless sensor networks,” IEEE Trans. Vehicular Technology, vol. 58, pp. 5105-5114, Nov. 2009.
[40] F. Talantzis, A. Pnevmatikakis, and A. G. Constantinides, “Audio-visual active speaker tracking in cluttered indoor environments,” IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, pp. 799-807, Jun. 2008.
[41] J. Nishimura and T. Kuroda, “Versatile recognition using Haar-like feature and cascaded classifier,” IEEE Sensors Journal, vol. 10, pp. 942-951, May 2010.
[42] J. Du and W. Shi, “App-MAC: An application-aware event-oriented MAC protocol for multimodality wireless sensor networks,” IEEE Trans. Vehicular Technology, vol. 57, pp. 3723–3731, Nov. 2008.
[43] A. Fleury, M. Vacher, and N. Noury, “SVM-based multimodal recognition of activities of daily living in health smart homes: sensors, algorithms, and first experimental results,” IEEE Trans. Information Technology in Biomedicine, vol. 14, pp. 274-283, Mar. 2010.
[44] C. N. Doukas and I. Maglogiannis, “Emergency fall incidents detection in assisted living environments utilizing motion, sound, and visual perceptual components,” IEEE Trans. Information Technology in Biomedicine, vol. 15, pp. 277–289, Mar. 2011.
[45] V. Wan and S. Renals, “Speaker verification using sequence discriminant support vector machines,” IEEE Trans. Speech and Audio Processing, vol. 13, pp. 203–210, Mar. 2005.
[46] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. Wiley, 2002.
[47] S. Makino, H. Sawada and T. W. Lee, Blind Speech Separation. Springer, 2007.
[48] S. C. Douglas, M. Gupta, H. Sawada, and S. Makino, “Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, pp. 1540–1550, Jul. 2007.
[49] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, “Blind source separation based on a fast-convergence algorithm combining ICA and beamforming,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, pp. 666–678, Mar. 2006.
[50] A. Belouchrani and M. G. Amin, “Blind source separation based on time-frequency signal representations,” IEEE Trans. Signal Processing, vol. 46, pp. 2888–2898, Nov. 1998.
[51] Y. Zhang and M. G. Amin, “Signal averaging of time-frequency distributions for signal recovery in uniform linear arrays,” IEEE Trans. Signal Processing, vol. 48, pp. 2892-2902, Oct. 2000.
[52] J. F. Cardoso, “Blind signal separation: Statistical principles,” Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, Oct. 1998.
[53] K. Todros and J. Tabrikian, “Blind separation of independent sources using Gaussian mixture model,” IEEE Trans. Signal Processing, vol. 55, pp. 3645–3658, Jul. 2007.
[54] M. Welling and M. Weber, “A constrained EM algorithm for independent component analysis,” Neural Comput., vol. 13, pp. 677–689, 2001.
[55] R. Courant and D. Hilbert, Methods of Mathematical Physics. New York: Interscience Publishers, 1953.
[56] V. Vapnik, Statistical Learning Theory, New York: Wiley, 1998.
[57] J. C. Wang, J. F. Wang, and Y. S. Weng, “Chip design of MFCC extraction for speech recognition,” Integration, the VLSI Journal, vol. 32, pp. 111–131, 2002.
[58] M. Vacher, F. Portet, A. Fleury, and N. Noury, “Challenges in the processing of audio channels for ambient assisted living,” in Proc. 12th IEEE Int. Conf. e-Health Networking Applications and Services, Lyon, France, 2010, Jul. 01–03, pp. 330–337.
[59] C. R. Baker, K. Armijo, S. Belka, M. Benhabib, V. Bhargava, N. Burkhart, A. Der Minassians, G. Dervisoglu, L. Gutnik, M. B. Haick, C. Ho, M. Koplow, J. Mangold, S. Robinson, M. Rosa, M. Schwartz, C. Sims, H. Stoffregen, A. Waterbury, E. S. Leland, T. Pering, and P. K. Wright, “Wireless sensor networks for home health care,” in Proc. 21st Int. Conf. Advanced Information Networking and Applications Workshops, Niagara Falls, Canada, 2007, May 21–23, pp. 832–837.
[60] A. Sleman and R. Moeller, “Integration of wireless sensor network services into other home and industrial networks using device profile for web services (DPWS),” in Proc. 3rd Int. Conf. Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008, Apr. 07–11, pp. 1–5.
[61] H. Yan, H. Huo, Y. Xu, and M. Gidlund, “Wireless sensor network based E-health system—Implementation and experimental results,” IEEE Trans. Consumer Electronics, vol. 56, no. 4, pp. 2288–2295, Nov. 2010.
[62] P. Gajbhiye and A. Mahajan, “A survey of architecture and node deployment in wireless sensor network,” in Proc. 1st Int. Conf. Applications of Digital Information and Web Technologies, Czech Republic, 2008, Aug. 04–06, pp. 426–430.
[63] H. Chen, C. K. Tse, and J. Feng, “Source extraction in bandwidth constrained wireless sensor networks,” IEEE Trans. Circuits and Systems II: Express Briefs, vol. 55, no. 9, pp. 947–951, Sep. 2008.
[64] T. Routtenberg and J. Tabrikian, “MIMO-AR system identification and blind source separation for GMM-distributed sources,” IEEE Trans. Signal Processing, vol. 57, pp. 1717–1730, May 2009.
[65] S. Winter, W. Kellermann, H. Sawada, and S. Makino, “MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization,” EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 24717, 12 pages, 2007.
[66] A. Graps, “An introduction to wavelets,” IEEE Computational Science and Engineering, vol. 2, no. 2, pp. 50-61, 1995.
[67] Wang, Y. Zhao, Y. T. Hou, and Y. L. Li, “A novel construction of SVM compound kernel function,” in Proc. 2010 Int. Conf. Logistics Systems and Intelligent Management, 2010, Jan. 09–10, vol. 3, pp. 1462–1465.
[68] L. Parra and C. Spence, “Convolutive blind source separation of non-stationary sources,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[69] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[70] T. Kemp, M. Schmidt, M. Westphal, and A. Waibel, “Strategies for automatic segmentation of audio data,” in Proc. Int. Conf. Acoust., Speech, Signal Process., vol. 3, 2000, pp. 1423–1426.
[71] R. C. Luo and O. Chen, “Mobile Sensor Node Deployment and Asynchronous Power Management for Wireless Sensor Networks,” IEEE Trans. Industrial Electronics, vol. 59, no. 5, pp. 2377-2385, May 2012.
[72] Zhang, R. Simon, and H. Aydin, “Harvesting-aware energy management for time-critical wireless sensor networks with joint voltage and modulation scaling,” IEEE Trans. Industrial Informatics, vol. 9, no. 1, pp. 514–526, Feb. 2013.
[73] Caione, D. Brunelli, and L. Benini, “Distributed compressive sampling for lifetime optimization in dense wireless sensor networks,” IEEE Trans. Industrial Informatics, vol. 8, no. 1, pp. 30–40, Feb. 2012.
[74] P. T. A. Quang and D.-S. Kim, “Enhancing real-time delivery of gradient routing for industrial wireless sensor networks,” IEEE Trans. Industrial Informatics, vol. 8, no. 1, pp. 61–68, Feb. 2012.
[75] T. M. Chiwewe and G. P. Hancke, “A distributed topology control technique for low interference and energy efficiency in wireless sensor networks,” IEEE Trans. Industrial Informatics, vol. 8, no. 1, pp. 11–19, Feb. 2012.
[76] P. Bofill, “Underdetermined blind separation of delayed sound sources in the frequency domain,” Neurocomputing, vol. 55, no. 3–4, pp. 627–641, 2003.
[77] P. Bofill and M. Zibulevsky, “Underdetermined blind source separation using sparse representations,” Signal Processing, vol. 81, pp. 2353-2362, Jun. 2001.
[78] Y. Li, S. I. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Trans. Signal Processing, vol. 54, pp. 423–437, Feb. 2006.
[79] T. Jebara, R. Kondor, and A. Howard, “Probability product kernels,” Journal of Machine Learning Research, vol. 5, pp. 819-844, Aug. 2004.
[80] B. Ghoraani and S. Krishnan, “Time-frequency matrix feature extraction and classification of environmental audio signals,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2197–2209, Sep. 2011.
[81] J. C. Wang, C. H. Yang, J. F. Wang, and H. P. Lee, “Robust speaker identification and verification,” IEEE Computational Intelligence Magazine, vol. 2, no. 2, pp. 52–59, May 2007.
[82] C. H. Yang and J. F. Wang, “Noise suppression based on approximate KLT with wavelet packet expansion,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002, pp. I-565–I-568.
[83] Y. Nagata, K. Mitsubori, T. Kagi, T. Fujioka, and M. Abe, “Fast implementation of KLT-based speech enhancement using vector quantization”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2086-2097, Nov. 2006.
[84] J. Huang and Y. Zhao, “A DCT-based fast signal subspace technique for robust speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 8, no. 6, pp. 747-751, Nov. 2000.
[85] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251–266, Jul. 1995.
[86] A. Rezayee and S. Gazor, “An adaptive KLT approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 2, pp. 87-95, Feb. 2001.
[87] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993, vol. 2, pp. 355-358.
[88] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1979, pp. 208–211.
[89] P. C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
[90] S. H. Jensen, P. C. Hansen, S. D. Hansen, and J. A. Sorensen, “Reduction of broad-band noise in speech by truncated QSVD,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 6, pp. 439–448, Nov. 1995.
[91] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, Chestnut Hill, MA, 1998.
[92] M. V. Wickerhauser, “Fast approximate Karhunen–Loève expansions,” Yale Univ., May 1990. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.1489&rep=rep1&type=pdf.
[93] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software. Wellesley, MA: A K Peters, 1994.
[94] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, Mar. 1984.
[95] H. Krim, D. Tucker, S. Mallat, and D. Donoho, “On denoising and best signal representation,” IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2225–2238, Nov. 1999.
[96] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992.
[97] E. Visser, M. Otsuka, and T. W. Lee, “A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments,” Speech Communication, vol. 41, no. 2, pp. 393–407, Oct. 2003.
[98] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113–120, Apr. 1979.
[99] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109–1121, Dec. 1984.
[100] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2, pp. 443–445, Apr. 1985.
[101] I. Cohen, “Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator,” IEEE Signal Process. Letters, vol. 9, no. 4, pp. 113-116, Apr. 2002.
[102] J. Huang and Y. Zhao, “A DCT-based fast signal subspace technique for robust speech recognition,” IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 747-751, Nov. 2000.
[103] A. Rezayee and S. Gazor, “An adaptive KLT approach for speech enhancement,” IEEE Trans. Speech Audio Process., vol. 9, no. 2, pp. 87-95, Feb. 2001.
[104] Y. Nagata, K. Mitsubori, T. Kagi, T. Fujioka, and M. Abe, “Fast implementation of KLT-based speech enhancement using vector quantization,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, pp. 2086-2097, Nov. 2006.
[105] C. H. Yang and J. F. Wang, “Noise suppression based on approximate KLT with wavelet packet expansion,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2002, vol. 1, pp. I-565–I-568.
[106] A. Aissa-El-Bey, K. Abed-Meraim, and Y. Grenier, “Blind separation of underdetermined convolutive mixtures using their time-frequency representation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, pp. 1540–1550, Jul. 2007.
[107] F. Abrard and Y. Deville, “A time-frequency blind signal separation method applicable to underdetermined mixtures of dependent sources,” Signal Processing, vol. 85, pp. 1389–1403, Jul. 2005.
[108] H. Sawada, S. Araki, R. Mukai, and S. Makino, “Blind extraction of dominant target sources using ICA and time-frequency masking,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, pp. 2165-2173, Nov. 2006.
[109] V. G. Reju, S. N. Koh, and I. Y. Soon, “Underdetermined convolutive blind source separation via time-frequency masking,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 101–116, Jan. 2010.
[110] S. Araki, H. Sawada, R. Mukai, and S. Makino, “Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors,” Signal Processing, vol. 87, pp. 1833–1847, Feb. 2007.
[111] A. Cichocki, J. Karhunen, W. Kasprzak, and R. Vigario, “Neural networks for blind separation with unknown number of sources,” Neurocomputing, vol. 24, pp. 55-93, Feb. 1999.
[112] A. Rosenberg, C.-H. Lee, and F. Soong, “Cepstral channel normalization techniques for HMM-based speaker verification,” in Proc. ICSLP, 1994, vol. 4, pp. 1835–1838.
[113] M. Holmberg, D. Gelbart, and W. Hemmert, “Automatic speech recognition with an adaptation model motivated by auditory processing,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 43–49, Jan. 2006.
[114] M. Cooke, P. Green, L. Josifovski, and A. Vizinho, “Robust automatic speech recognition with missing and unreliable acoustic data,” Speech Commun., vol. 34, pp. 267–285, 2001.
[115] L. Josifovski, M. Cooke, P. Green, and A. Vizinho, “State based imputation of missing data for robust speech recognition and speech enhancement,” in Proc. Eurospeech, 1999, pp. 2837–2840.
[116] J. F. Gemmeke and B. Cranen, “Using sparse representations for missing data imputation in noise robust speech recognition,” in Proc. EUSIPCO, 2008.
[117] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001.
[118] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.