NCU Institutional Repository: Item 987654321/84175


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84175


    Title: Densely Connected Convolutional Networks (DenseNet) for Acoustic Scene Classification
    Authors: 劉建杰;Liu, Jian-Jie
    Contributors: Department of Electrical Engineering
    Keywords: Acoustic Scene Classification
    Date: 2020-07-29
    Issue Date: 2020-09-02 18:27:05 (UTC+8)
    Publisher: National Central University
    Abstract: With the development of smart cities and autonomous driving technology, the information carried by everyday environmental sounds is becoming increasingly important. When converted correctly and effectively, this information allows us to further analyze and understand the environment we live in. In recent years, with advances in GPUs and the arrival of the big-data era, deep learning has continued to produce major breakthroughs across many fields, most visibly in computer vision and natural language processing.
    In the field of sound source separation, an important concept is Computational Auditory Scene Analysis (CASA). One of its key goals is to place a machine in an acoustic scene, such as a street intersection, an airport lobby, or a shopping mall, and have it fully understand the acoustic environment it is in: which sound sources are present and where each one is located. Our surroundings contain a wide variety of audio-receiving devices, and through the Internet of Things, mobile devices can conveniently serve as sources of data collection.
    This thesis applies deep neural networks to acoustic scene classification on the public TAU Urban Acoustic Scenes 2020 Mobile dataset from the DCASE Challenge 2020. The challenge, endorsed by the IEEE AASP, is the largest competition in this field and was held for the sixth time this year; it is co-organized by CMU, INRIA (France), and Tampere University (Finland), and sponsored by Google and Audio Analytic (an audio processing company in Cambridge, UK). Log-mel spectrograms are used as the acoustic features, and the neural network is based on the DenseNet architecture. Classifying the 10 acoustic scene classes in the dataset, the system achieves 65.84% accuracy, exceeding the baseline system.
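    The log-mel spectrogram front end mentioned in the abstract can be sketched as below. This is a NumPy-only illustration, not the thesis's actual pipeline: the parameter values (2048-sample Hann windows, 512-sample hop, 40 mel bands, 22.05 kHz audio) are illustrative assumptions, and real systems typically use a tuned library implementation such as librosa.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(y, sr, n_fft=2048, hop=512, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project the power spectrum onto mel bands and compress with a log.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# Example: one second of a 440 Hz tone at 22.05 kHz.
sr = 22050
t = np.arange(sr) / sr
feat = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(feat.shape)  # (frames, mel bands) -> (40, 40)
```

The resulting 2-D time-frequency array is what gets fed to the convolutional network as an image-like input.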
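    The defining idea of the DenseNet architecture named in the abstract is dense connectivity: within a block, each layer receives the channel-wise concatenation of the block input and all earlier layers' outputs, and contributes a fixed number of new channels (the growth rate). The toy sketch below shows only that wiring pattern; the random linear maps with ReLU are a hypothetical stand-in for the BN-ReLU-conv composites of the real network, and the layer count and growth rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, n_layers=3, growth=4):
    # Each layer consumes the concatenation of ALL previous feature
    # maps (dense connectivity) and emits `growth` new channels.
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=-1)   # reuse every earlier feature
        w = rng.standard_normal((inp.shape[-1], growth))
        feats.append(np.maximum(inp @ w, 0.0)) # toy layer: linear map + ReLU
    return np.concatenate(feats, axis=-1)

x = rng.standard_normal((40, 40, 8))  # (frames, mel bands, channels)
out = dense_block(x)
print(out.shape)  # channels grow to 8 + 3*4 = 20 -> (40, 40, 20)
```

Because earlier features are reused rather than recomputed, the block's output channel count grows linearly with depth, which is the feature-reuse property DenseNet exploits.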
    Appears in Collections:[Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

