Name: 林郁凱 (Yu-Kai Lin)   Department: Graduate Institute of Software Engineering
Thesis Title: 深度類神經網路於環境音偵測之應用與改良
(The Applications and Improvements of Deep Neural Networks in Environmental Sound Recognition)
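Abstract (Chinese, translated): Neural networks already achieve excellent results in sound recognition, and many different acoustic features have been tried as network inputs for training and recognition. However, feeding the raw audio signal directly into a network, to test whether the network can extract acoustic features by itself, remains a challenge. This thesis improves the architecture of existing raw-signal networks. Using deep neural networks with many layers, it improves how well the raw-signal input is analyzed. It also explores appropriate parameter settings through a spectrogram-like conversion. The final proposed 1d-2d network reaches an accuracy of 73.55% on ESC50.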
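The abstract does not spell out how the "spectrogram-like conversion" works. Below is a minimal pure-Python sketch of one common way to build a 1d-2d pipeline, written under stated assumptions. It is not the thesis's implementation.

The idea is that a bank of 1D convolution filters slides over the raw waveform with a frame-like stride. The resulting (filters x time) map is then treated as a one-channel 2D image for a 2D convolution. In this analogy, each filter row plays the role of a frequency bin and the stride plays the role of a hop size.

Several details are assumptions chosen only for illustration:
- The function names `conv1d_bank` and `conv2d_valid` are hypothetical.
- The values of 16 filters, a 64-sample kernel and a stride of 32 are arbitrary, not the thesis's settings.
- The filter weights are random stand-ins for learned weights.

```python
import random

def conv1d_bank(signal, filters, stride):
    """Hypothetical helper: valid 1D cross-correlation of `signal` with each filter.

    Returns a K x T nested list (K filters, T frames); row k is filter k's
    ReLU-activated response over time, i.e. a spectrogram-like map.
    """
    k_len = len(filters[0])
    n_frames = (len(signal) - k_len) // stride + 1
    fmap = []
    for w in filters:
        row = []
        for t in range(n_frames):
            start = t * stride  # stride acts like a hop size between frames
            acc = sum(w[i] * signal[start + i] for i in range(k_len))
            row.append(max(acc, 0.0))  # ReLU activation
        fmap.append(row)
    return fmap

def conv2d_valid(image, kernel):
    """Hypothetical helper: valid 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

rng = random.Random(42)
signal = [rng.uniform(-1.0, 1.0) for _ in range(1024)]  # stand-in raw waveform
n_filters, kernel_size, stride = 16, 64, 32              # illustrative values only
filters = [[rng.gauss(0.0, 0.1) for _ in range(kernel_size)] for _ in range(n_filters)]

fmap = conv1d_bank(signal, filters, stride)  # 16 x 31 spectrogram-like map
kernel = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
out2d = conv2d_valid(fmap, kernel)           # 3x3 valid conv gives 14 x 29
```

The frame size, network depth, number of filters and kernel shapes are exactly the settings the thesis studies (Sections 4.2 and 4.3). In this sketch they are free parameters, not tuned values.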
Abstract (English): Neural networks have achieved excellent results in sound recognition, and many kinds of acoustic features have been tried as network inputs. However, it is still an open question whether a neural network can efficiently extract features from the raw audio signal. This study improves on the raw-signal-input networks of previous research. With deeper network architectures, our network analyzes the raw signal effectively. We also discuss several network settings, and with a spectrogram-like conversion our network reaches an accuracy of 73.55% on the open audio dataset ESC50.
In addition, this study proposes a network architecture that combines different networks fed with different features. With the help of global pooling, a flexible fusion method is integrated into the network. Our experiments successfully combine two networks that take different audio feature inputs: the raw audio signal and the log-mel spectrum. With these settings, the proposed ParallelNet reaches an accuracy of 81.55% on ESC50, which is on par with human recognition accuracy.
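The abstract credits global pooling with making the fusion flexible. The reason is that the two branches may output feature maps with different numbers of channels and different spatial sizes. Pooling each channel down to a single value makes both outputs fixed-length vectors, and these can simply be concatenated.

Below is a minimal pure-Python sketch of that idea, written under stated assumptions. It is not the ParallelNet implementation:
- Global average pooling is used here; the thesis only says "global pooling".
- The branch shapes (64x5x12 and 128x4x9) are made up.
- The function names are hypothetical.
- The random values stand in for learned feature maps and weights.

The only fact taken from outside the thesis is that ESC-50 has 50 classes.

```python
import math
import random

def global_avg_pool(fmap):
    """Hypothetical helper: collapse a [channels][height][width] map to one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def fuse(fmap_a, fmap_b):
    """Hypothetical helper: pool each branch, then concatenate; spatial sizes may differ."""
    return global_avg_pool(fmap_a) + global_avg_pool(fmap_b)

def linear_softmax(x, weights, bias):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_map(c, h, w, rng):
    """Random stand-in for a branch's output feature map."""
    return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]

rng = random.Random(0)
raw_branch = random_map(64, 5, 12, rng)   # hypothetical raw-signal branch output
mel_branch = random_map(128, 4, 9, rng)   # hypothetical log-mel branch output
fused = fuse(raw_branch, mel_branch)      # length 64 + 128 = 192

n_classes = 50                            # ESC-50 has 50 classes
W = [[rng.gauss(0.0, 0.01) for _ in range(len(fused))] for _ in range(n_classes)]
b = [0.0] * n_classes
probs = linear_softmax(fused, W, b)       # class probabilities summing to 1
```

The fused vector's length depends only on the channel counts of the two branches. Either branch can therefore change its time or frequency resolution without changing the classifier's input size.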
Keywords (Chinese, translated): ★ Deep Neural Network
★ Convolutional Neural Network
★ Environmental Sound Recognition
★ Feature Fusion
Keywords (English): ★ Deep Neural Network
★ Convolutional Neural Network
★ Environmental Sound Recognition
★ Feature Combination
Table of Contents: Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
2. Background
2.1 Related Works of Environmental Sound Recognition
2.2 Review of Neural Networks
2.2.1 Feed-Forward Neural Networks
2.2.2 Convolutional Neural Networks
2.2.3 Convolutional Layers
2.2.4 Activation Layers
2.2.5 Pooling Layers
2.2.6 Fully Connected Layers
2.2.7 Loss Functions
2.2.8 Model Initialization
2.2.9 Batch Normalization
3. Methods Development
3.1 Data Sets
3.2 Data Preprocessing
3.3 Data Augmentations
3.4 Network Customizations
3.4.1 Network configuration
3.4.2 Network parallelization
4. Results and Discussion
4.1 Experiment Setup
4.2 The architecture of 1D network
4.2.1 Frame Size
4.2.2 Network depth
4.2.3 Number of filters
4.3 The architecture of 2D network
4.3.1 Kernel Shapes
4.4 The parallel network
4.4.1 Data augmentation
4.4.2 The effect of pre-training
4.5 Network Conclusion
5. Conclusion and Perspectives
Bibliography
Advisor: 蘇木春 (Mu-Chun Su)   Approval Date: 2018-8-23
