Master's and Doctoral Theses — Detailed Record 105521115




Author: Jian-Jie Liu (劉建杰)    Department: Department of Electrical Engineering
Thesis Title: Densely Connected Convolutional Networks (DenseNet) for Acoustic Scene Classification
(Chinese title: 密集卷積網路應用在聲學場景分類)
Related Theses
★ Low-Memory Hardware Design for Real-Time SIFT Feature Extraction
★ Real-Time Face Detection and Recognition for an Access Control System
★ Autonomous Vehicle with Real-Time Automatic Following
★ Lossless Compression Algorithm for Multi-Lead ECG Signals and Its Implementation
★ Offline Customizable Voice and Speaker Wake-Word System with Embedded Implementation
★ Wafer Map Defect Classification and Its Embedded System Implementation
★ Densely Connected Convolutional Networks for Small-Footprint Keyword Spotting in Speech
★ G2LGAN: Data Augmentation on Imbalanced Datasets for Wafer Map Defect Classification
★ Algorithm Design Techniques for Compensating the Finite Precision of Multiplierless Digital Filters
★ Design and Implementation of a Programmable Viterbi Decoder
★ Low-Cost Vector Rotator IP Design Based on Extended Elementary-Angle CORDIC
★ Analysis and Architecture Design of a JPEG2000 Still-Image Coding System
★ Low-Power Turbo Code Decoder for Communication Systems
★ Platform-Based Design for Multimedia Communications
★ Design and Implementation of a Digital Watermarking System for MPEG Encoders
★ Algorithm Development for Video Error Concealment and Its Data-Reuse Considerations
  1. The electronic full text of this thesis is approved for immediate open access.
  2. Once open access takes effect, the electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese) With the development of smart cities and autonomous driving technology, the information carried by environmental sounds in daily life is becoming increasingly important. Once this information is converted correctly and effectively, it lets us further analyze and respond to the environment we are in. In recent years, with advances in GPUs and the arrival of the big-data era, deep learning has continued to deliver major breakthroughs in many different fields, and its impact on daily life is most tangible in computer vision and natural language processing.
In the field of sound source separation there is an important concept, Computational Auditory Scene Analysis (CASA). A key goal of CASA is to place a robot in some acoustic scene, such as a street intersection, an airport lobby, or even a shopping mall, and have it fully understand the acoustic environment it is in: which sound sources are present, and where each of them is located. Our surroundings contain a wide variety of audio-capture devices, and through the Internet of Things, mobile devices can conveniently serve as sources of data collection.
This thesis applies deep neural networks to acoustic scene classification on TAU Urban Acoustic Scenes 2020 Mobile, the public dataset of the DCASE Challenge 2020. The challenge, endorsed by the IEEE AASP, is currently the largest competition in this field and is in its sixth edition this year; it is co-organized by CMU, INRIA (France), and Tampere University (Finland), and co-sponsored by Google and Audio Analytic (an audio-processing company in Cambridge, UK). The log-mel spectrogram serves as the main acoustic feature, and the neural network is based on the DenseNet architecture. Classifying the 10 acoustic scene classes in the dataset, the system ultimately reaches 65.84% accuracy, higher than the baseline system.
Abstract (English) With the development of smart cities and autonomous driving technology, the information contained in environmental sounds is becoming more and more important in daily life. After correct and effective conversion, this information allows us to further analyze and respond to the environment we live in.
In recent years, with the advancement of GPUs and the advent of the big-data era, deep learning has continued to bring major breakthroughs in various fields, and its impact on everyday life is most tangible in computer vision and natural language processing.
In the field of sound source separation, there is an important concept called Computational Auditory Scene Analysis (CASA). An important goal of this concept is to place a robot in an acoustic scene, such as a street intersection, an airport lobby, or even a shopping center, and have it fully understand the acoustic environment it is in, knowing which sound sources are present and where each one is located. Our daily surroundings contain very diverse audio-receiving devices, and through the concept of the Internet of Things, mobile devices can conveniently become sources of data collection.
This thesis classifies the acoustic scenes of TAU Urban Acoustic Scenes 2020 Mobile, the public dataset of the DCASE Challenge 2020, using deep neural networks. The competition is endorsed by the IEEE AASP and is the largest in this field; held for the sixth time this year, it is co-organized by CMU, INRIA (France), and Tampere University (Finland), and co-sponsored by Google and Audio Analytic (an audio-processing company in Cambridge, UK). The log-mel spectrogram is used as the main acoustic feature, and the neural network is based on the DenseNet structure. Classifying the 10 acoustic scene types in the dataset, the system finally achieves 65.84% accuracy, which is higher than the baseline system.
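As context for the feature pipeline named above, the following is a minimal sketch of log-mel spectrogram extraction using librosa. This record does not state the thesis's analysis parameters, so the sample rate, FFT size, hop length, and number of mel bands below are illustrative assumptions only.

    # Minimal log-mel spectrogram sketch; parameter values are assumed,
    # not taken from the thesis.
    import librosa
    import numpy as np

    def log_mel_spectrogram(path, sr=44100, n_fft=2048, hop_length=1024, n_mels=40):
        y, sr = librosa.load(path, sr=sr)              # load audio at the target rate
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)    # log scale; shape (n_mels, frames)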
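The DenseNet structure mentioned above connects each layer to every preceding layer by concatenating feature maps (Huang et al., CVPR 2017). Below is a minimal PyTorch sketch of one dense block; the growth rate and depth are illustrative assumptions rather than the configuration used in the thesis.

    # Minimal dense-block sketch in PyTorch (illustrative sizes).
    import torch
    import torch.nn as nn

    class DenseLayer(nn.Module):
        def __init__(self, in_channels, growth_rate):
            super().__init__()
            self.bn = nn.BatchNorm2d(in_channels)
            self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)

        def forward(self, x):
            out = self.conv(torch.relu(self.bn(x)))   # BN-ReLU-Conv ordering
            return torch.cat([x, out], dim=1)         # concatenate with all prior maps

    class DenseBlock(nn.Module):
        def __init__(self, in_channels, growth_rate, num_layers):
            super().__init__()
            self.block = nn.Sequential(*[
                DenseLayer(in_channels + i * growth_rate, growth_rate)
                for i in range(num_layers)])

        def forward(self, x):
            return self.block(x)

In a complete classifier, several such blocks would be stacked with transition layers (1x1 convolution plus pooling) in between, ending in global average pooling and a 10-way output for the dataset's 10 scene classes.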
Keywords (Chinese) ★ 聲學場景分類    Keywords (English) ★ Acoustic Scene Classification
Table of Contents
Abstract (Chinese)
Abstract (English)
1. Introduction
  1-1 Research Background and Motivation
  1-2 Related Work
  1-3 Detection and Classification of Acoustic Scenes and Events Challenge 2020
  1-4 TAU Urban Acoustic Scenes Dataset
  1-5 Thesis Organization
2. Densely Connected Convolutional Networks
  2-1 Overview of Principles
  2-2 Convolutional Neural Networks
    2-2-1 Receptive Field
    2-2-2 Weight Sharing
    2-2-3 Pooling Layer
    2-2-4 Fully Connected Layer
    2-2-5 Activation Functions
  2-3 Densely Connected Convolutional Networks
    2-3-1 Model
    2-3-2 Training and Prediction
3. Acoustic Scene Classification System
  3-1 DenseNet Model and Dataset
  3-2 Acoustic Feature Extraction
  3-3 Training Method and Parameters
  3-4 Experimental Results
  3-5 System Architecture and Results
4. Conclusion
References
Advisor: Tsung-Han Tsai (蔡宗漢)    Date of Approval: 2020-7-29
