Environmental Sound Recognition with Confidence Estimation (應用於環境聲音辨識之可信度估測)

Name

謝珳棋 (Wen-Chi Hsieh)

Department

Department of Computer Science and Information Engineering

Thesis Title

應用於環境聲音辨識之可信度估測
(Environmental Sound Recognition with Confidence Estimation)

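Related Theses

★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Application and Design of Speech Synthesis and Speaker Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Recognition and Detection of Calcaneal Fractures in CT Images Using Deep Learning and Speeded-Up Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Trend Prediction for Financial Products
★ A Study on Predicting Age and Aging Genes with Integrated Deep Learning Methods
★ A Study of End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Predicting Alzheimer's Disease Progression and Stroke Surgery Survival with Deep Learning Methods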

Files

Full text: never open to the public

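Abstract (translated from Chinese)

In recent years, environmental sound recognition has become a new research topic in home automation applications. In a home automation system, correctly recognizing the sounds in the environment is the basis for carrying out tasks. For a recognition system, the choice of features and classifier plays an important role in determining the recognition rate. This thesis uses nonuniform scale-frequency maps as the feature and chooses the Gaussian process as the classifier. Besides the features and the classifier, however, the reliability of the training data also affects the recognition rate. This thesis therefore proposes a new data confidence estimation method to implement outlier detection. The method uses a predefined dictionary to represent the feature parameters as a Gaussian distribution. From the parameters of this Gaussian distribution we define two confidence values, called data confidence and dimension confidence, and propose two corresponding kernel functions for use in the Gaussian process. We set a threshold to decide whether a data point is an outlier: if the confidence of a data point is below the threshold, the point is regarded as an outlier; otherwise it is a normal data point.
For dictionary selection, this thesis discusses how the estimated confidence differs across several matrix-factorization-based dictionaries, namely standard nonnegative matrix factorization, semi-nonnegative matrix factorization, sparse nonnegative matrix factorization, principal component analysis, and two-dimensional (semi-)nonnegative matrix factorization. The test database is a 20-class environmental sound database. Experimental results show that the confidence estimated with the sparse nonnegative matrix factorization dictionary is more discriminative and performs better in the proposed outlier detection algorithm.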
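
The thresholding rule described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the confidence formulas, so the confidence values are treated as precomputed inputs, and the function name `flag_outliers`, the sample values, and the threshold of 0.3 are all hypothetical.

```python
from typing import List, Sequence


def flag_outliers(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Return True for each data point whose confidence is below the threshold.

    The confidence values are assumed to be precomputed; how the thesis
    derives them from the dictionary-based Gaussian model is not shown here.
    """
    return [c < threshold for c in confidences]


# Hypothetical confidence values for five data points (not from the thesis).
confidences = [0.91, 0.15, 0.78, 0.42, 0.05]
is_outlier = flag_outliers(confidences, threshold=0.3)

# Indices of the points that would be kept as normal data.
normal_points = [i for i, flagged in enumerate(is_outlier) if not flagged]
```

The comparison is strict, matching the wording that a point is an outlier when its confidence is less than the threshold. A point exactly at the threshold is therefore kept as normal data.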

Abstract (English)

In recent years, environmental sound recognition has become a new research topic in home automation. In a home automation system, the sound recognized by the system is the basis for performing tasks. For a recognition system, the features and the classifier play important roles in performance. This thesis adopts nonuniform scale-frequency maps (nSFMs) as the feature and chooses the Gaussian process as the classifier. Apart from features and classifiers, however, the reliability of the data should also be taken into consideration. We therefore propose a new confidence estimation approach for outlier detection. Two confidence measures, called data confidence and dimension confidence, are defined, and two corresponding kernels are proposed for the Gaussian process. A threshold is set to decide whether a data point is an outlier: if the confidence value of the data point is less than the threshold, the data point is regarded as an outlier; otherwise, it is treated as a normal data point.
For dictionary selection, several matrix-factorization-based dictionaries are discussed: standard nonnegative matrix factorization (NMF), Semi-NMF, sparse NMF, principal component analysis (PCA), and 2D (Semi-)NMF. Experiments are conducted on a 20-class environmental sound database. The results indicate that the confidence values estimated with the sparse NMF dictionary are more discriminative and yield better performance in the proposed outlier detection approach.
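
To make the idea of a matrix-factorization dictionary concrete, here is a minimal sketch of standard NMF using the well-known Lee and Seung multiplicative update rules. It is not the thesis's implementation and it is not the sparse NMF variant that the results favor. The function name `nmf_multiplicative`, the matrix sizes, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (features x samples) as W @ H.

    W (features x rank) plays the role of the dictionary and H (rank x samples)
    holds the nonnegative activations. Uses multiplicative updates that
    minimize the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Update the activations with the dictionary held fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update the dictionary with the activations held fixed.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic nonnegative data with an exact rank-3 structure (illustrative only).
rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 20))

W, H = nmf_multiplicative(V, rank=3)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates only multiply by nonnegative ratios, `W` and `H` stay nonnegative throughout. Sparse NMF adds a sparsity penalty on top of this kind of factorization, and Semi-NMF relaxes the nonnegativity constraint on one factor; neither is shown here.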

Keywords (Chinese)

★ 可信度估測 (confidence estimation)
★ 環境聲音辨識 (environmental sound recognition)
★ 高斯程序 (Gaussian process)

Keywords (English)

★ Confidence estimation
★ Environmental sound recognition
★ Gaussian process

Table of Contents

Chinese Abstract
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Organization of the Thesis
Chapter 2 Environmental Sound Recognition
2.1 Features
2.2 Classifiers
2.2.1 K-Nearest Neighbors (K-NN)
2.2.2 Support Vector Machine (SVM)
2.2.3 Gaussian Mixture Model (GMM)
2.3 Nonuniform Scale-Frequency Maps (nSFMs)
Chapter 3 Gaussian Process
3.1 Kernel Functions in a Gaussian Process
Chapter 4 Proposed Outlier Detection Method
4.1 Previous Works on Outlier Detection
4.2 Nonnegative Matrix Factorization
4.2.1 Semi-NMF
4.2.2 Sparse NMF
4.2.3 Principal Component Analysis (PCA)
4.2.4 Two-Dimensional NMF
4.2.5 Two-Dimensional Semi-NMF
4.2.6 Two-Dimensional Convex-NMF
4.3 Confidence Interval
4.4 Bayesian Learning
4.5 Definition of the Confidence Measure
4.6 Outlier Detection using Confidence Measure
4.7 Gaussian Process with Confidence Kernels
Chapter 5 Experimental Results
5.1 Database
5.2 Evaluation of Outlier Definition
5.3 Toy Examples
5.4 Outlier Detection on the ES Database
5.5 Gaussian Processes with Confidence Kernels
5.6 Environmental Sound Recognition using NMF
Chapter 6 Conclusions and Future Works
References
Publication List

Advisor

王家慶 (Jia-Ching Wang)

Approval Date

2015-08-24
