Master's/Doctoral Thesis 107521605 – Detailed Record




Author: 女哲藹 (Harisma Khoirun Nisa)    Graduate department: Department of Electrical Engineering
Thesis title: 應用於人工電子耳編碼策略之H-ELM架構的語音回響消除法
(Speech Dereverberation Based on H-ELM framework for Cochlear Implant Coding Strategy)
Related theses
★ Investigation of independent component analysis for acoustic signal separation in real environments
★ Segmentation and three-dimensional gray-level interpolation of oral MRI images
★ Design of a digital asthma peak-flow monitoring system
★ Effects of combining cochlear implants and hearing aids on Mandarin speech recognition
★ Simulation of Mandarin speech recognition performance with the advanced combination encoder strategy for cochlear implants: analysis combined with hearing aids
★ A functional MRI study of the neural correlates of Mandarin speech production
★ Construction of a three-dimensional biomechanical tongue model using the finite element method
★ Construction of an MRI-based three-dimensional tongue atlas
★ A simulation study of the relationship between calcium oxalate concentration changes in renal tubules and calcium oxalate stone formation
★ Automatic segmentation of tongue structures in oral MR images
★ A study of the electrical matching of a microwave output window
★ Development of a software-based hearing aid simulation platform: noise reduction
★ Development of a software-based hearing aid simulation platform: feedback cancellation
★ Simulating the effects of channel number, stimulation rate, and binaural hearing on Mandarin speech recognition in noise for cochlear implants
★ Using artificial neural networks to study the neural correlates of Mandarin tone production
★ Construction of computer-simulated physiological systems for teaching
  1. Access rights for this electronic thesis: the author has agreed to immediate open access.
  2. The open-access electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) In real environments, human speech is corrupted by background noise and reverberation. The effect is especially severe for cochlear implant users, because reverberation degrades the quality and intelligibility of the speech received by the implant. The purpose of this study is to use deep learning to enhance speech intelligibility and quality. The Hierarchical Extreme Learning Machine (HELM) framework, comprising the original HELM and the Highway HELM, can be trained quickly on various reverberant conditions to suppress reverberation effectively. A mapping target and an ideal ratio mask (IRM) were used as the HELM training targets, and performance was evaluated with the Taiwan Mandarin Hearing in Noise Test (TMHINT) corpus and the short-time objective intelligibility (STOI) measure. The experimental results show that, under the STOI metric, the mapping target improved the score from 0.677 to 0.683, whereas the masking target changed it from 0.677 to 0.641. The two architectures, however, showed no clear difference in reverberation suppression: the original HELM and the Highway HELM improved the score from 0.683 to 0.706 and from 0.683 to 0.707, respectively. The dereverberated speech produced by the HELM framework was further processed with cochlear implant coding strategies, including the advanced combination encoder (ACE), envelope enhancement (EE), and fundamental frequency modulation (F0mod), to simulate the listening performance of implant users. The results show that the mapping-based HELM framework effectively improves speech intelligibility for the ACE and EE strategies.
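As a rough illustration of the two training targets mentioned in the abstract, the short NumPy sketch below computes a log-magnitude mapping target and an IRM-style masking target from a pair of time-aligned clean and reverberant signals. It is a minimal sketch under assumed STFT settings (16 kHz sampling, 512-sample windows); the function name and parameters are hypothetical and do not come from the thesis's actual feature pipeline.

    import numpy as np
    from scipy.signal import stft

    def training_targets(clean, reverb, fs=16000, win=512, hop=256, eps=1e-8):
        # STFT of the time-aligned clean and reverberant utterances.
        _, _, S_clean = stft(clean, fs=fs, nperseg=win, noverlap=win - hop)
        _, _, S_rev = stft(reverb, fs=fs, nperseg=win, noverlap=win - hop)

        # Input feature: log-magnitude spectrogram of the reverberant speech.
        x = np.log(np.abs(S_rev) + eps)

        # Mapping target: log-magnitude spectrogram of the clean speech.
        y_map = np.log(np.abs(S_clean) + eps)

        # Masking target: ideal-ratio-mask-style target in [0, 1], treating the
        # reverberant energy not explained by the clean speech as interference.
        p_clean = np.abs(S_clean) ** 2
        p_interf = np.maximum(np.abs(S_rev) ** 2 - p_clean, 0.0)
        y_irm = np.sqrt(p_clean / (p_clean + p_interf + eps))

        return x, y_map, y_irm

At inference time, a mapping model predicts the clean log-magnitude directly, whereas a masking model predicts the mask and the enhanced magnitude is obtained by multiplying that mask with the reverberant magnitude spectrogram.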
Abstract (English) In real conditions, human speech is distorted by background noise and reverberation, which degrade speech intelligibility and quality, especially for cochlear implant (CI) users. Reverberant environmental noise is one of the main challenges for CI users' speech understanding in everyday life. The purpose of this study is to increase the intelligibility and perceived quality of speech using machine learning. The Hierarchical Extreme Learning Machine (HELM) framework, including the original HELM and the Highway HELM, attenuates reverberation effectively while remaining fast to train. Two training targets, spectral mapping and the ideal ratio mask (IRM), were applied within this framework to evaluate speech-enhancement performance. The Taiwan Mandarin Hearing in Noise Test (TMHINT) dataset and the short-time objective intelligibility (STOI) measure were used to evaluate the HELM framework. The experimental results show that the average STOI scores of the mapping target (0.677 to 0.683) were better than those of the masking target (0.677 to 0.641) for attenuating reverberation. The choice between the original HELM (0.683 to 0.706) and the Highway HELM (0.683 to 0.707), however, had no significant effect on the result. The dereverberated speech produced by the HELM framework was further processed by cochlear implant sound coding strategies, namely the Advanced Combination Encoder (ACE), Envelope Enhancement (EE), and fundamental frequency modulation (F0mod), to simulate the listening performance of CI users. The results show that the mapping-based HELM framework could improve speech intelligibility with both the ACE and EE strategies.
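For readers unfamiliar with the extreme learning machine building block behind HELM, the sketch below shows why training is fast: the hidden-layer weights are drawn at random and only the output weights are solved in closed form. This is a minimal single-hidden-layer regressor written for illustration; the class name, layer sizes, and regularization are assumptions and do not reproduce the hierarchical architecture or hyperparameters used in this thesis.

    import numpy as np

    class TinyELM:
        # Single-hidden-layer ELM regressor: random projection + least-squares readout.
        def __init__(self, n_hidden=512, reg=1e-3, seed=0):
            self.n_hidden, self.reg = n_hidden, reg
            self.rng = np.random.default_rng(seed)

        def _hidden(self, X):
            # Fixed random nonlinear projection of the input features.
            return np.tanh(X @ self.W + self.b)

        def fit(self, X, Y):
            # X: (frames, features) reverberant features; Y: (frames, targets),
            # e.g. clean log-magnitudes (mapping) or an ideal ratio mask (masking).
            self.W = 0.1 * self.rng.standard_normal((X.shape[1], self.n_hidden))
            self.b = 0.1 * self.rng.standard_normal(self.n_hidden)
            H = self._hidden(X)
            # Ridge-regularized closed-form solution for the output weights.
            A = H.T @ H + self.reg * np.eye(self.n_hidden)
            self.beta = np.linalg.solve(A, H.T @ Y)
            return self

        def predict(self, X):
            return self._hidden(X) @ self.beta

A hierarchical variant would stack several such modules (for example, ELM autoencoders for feature extraction followed by a supervised ELM regressor); the enhanced spectrogram is then resynthesized and can be scored against the clean reference with an objective measure such as STOI.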
Keywords (Chinese) ★ Hierarchical Extreme Learning Machine (HELM)
★ reverberation
★ mapping target
★ masking target
★ feature learning
★ cochlear implant
Keywords (English) ★ Hierarchical Extreme Learning Machine (HELM)
★ dereverberation
★ feature learning
★ mapping target
★ masking target
★ CI strategies
Table of contents
ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
1.1. Background and Motivation
1.2. Literature Review
1.3. Objective of the Thesis
1.4. Thesis Outline
CHAPTER 2
2.1. Speech Dereverberation
2.1.1. Traditional Dereverberation
2.1.2. Machine Learning Based Speech Enhancement
2.2. Extreme Learning Machine
2.3. Hierarchical Extreme Learning Machine
2.4. Cochlear Implant Strategy
2.4.1. Advanced Combination Encoder (ACE) Strategy
2.4.2. Fundamental Frequency Modulation (F0mod)
2.4.3. Envelope Enhancement Strategy
CHAPTER 3
3.1. HELM Framework for Speech Dereverberation
3.2. Feature Training Target
3.2.1. Mapping Feature Learning
3.2.2. Masking Training Target
3.3. Reverberation Configuration
3.4. Dataset
3.5. Investigating Speech Dereverberation Results with Sound Coding Strategies
3.6. Evaluation Metrics
3.6.1. Short-Time Objective Intelligibility (STOI)
3.6.2. Log Spectral Distance (LSD)
CHAPTER 4
4.1. Experimental Study and Evaluation Results
4.1.1. Different Feature Training Targets
4.1.2. Effect of Window Size and Neuron Layer
4.1.3. Evaluation with Different Speaker Numbers
4.1.4. Speech Dereverberation with Sound Coding Strategies
4.2. Discussion
4.2.1. The Effect of Window Size
4.2.2. Comparison of the HELM Framework and IDEA for Speech Dereverberation
4.2.3. Investigation of Signal Similarity across Different Sound Coding Strategies
CHAPTER 5
5.1. Conclusion
5.2. Future Work
REFERENCES
APPENDICES
Advisor: Wu, Chao-Min (吳炤民)    Date of approval: 2021-01-26