Joint Kernel Dictionary Learning via Collaborative Representation and Its Application to Sound Event Classification

Name

廖唯鈞 (Wei-Chung Liao)

Department

Department of Computer Science and Information Engineering

Thesis Title

Joint Kernel Dictionary Learning via Collaborative Representation and Its Application to Sound Event Classification
(original title: 協同表示之聯合核化字典學習及其於聲音事件辨識)

Files

Full text: never open to the public

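Abstract (Chinese)

Sound event recognition has gradually become an important topic in today's society. Applications such as security surveillance systems, environmental sound recognition, and home care systems are closely tied to people's daily lives. To achieve accurate recognition, classifiers ranging from traditional ones such as the support vector machine (SVM) and the Gaussian mixture model (GMM) to the recently popular sparse representation classifier (SRC) can all obtain good results.
Building on the sparse representation classifier, this thesis proposes an improved training method for the sparse dictionary it uses. A classification error term is added to the training objective function, and the sparsity constraint uses the ℓ2-norm rather than the ℓ1-norm. A simple linear classifier is trained jointly with the dictionary, which strengthens discriminative ability and saves time. In addition, we use a kernel method to project the training data into a high-dimensional feature space, which enhances the discriminative and reconstructive ability of the dictionary and makes the system more flexible. Applying online learning makes the algorithm more efficient when the training data change over time.
On a 17-class sound database, the proposed method reaches a recognition rate of 80.56% and outperforms the other algorithms compared, confirming that the proposed improvements raise the discriminative ability of the dictionary. Its testing time is also far shorter than that of SRC and CRC.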
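
The training objective described in this abstract can be written out in one common form. The following is a sketch under assumptions, not the formulation taken from the thesis itself: the symbols Y (training features), D (dictionary), X (codes), H (one-hot label matrix), W (linear classifier), the weights α and λ, and the use of a squared ℓ2 penalty are all introduced here for illustration.

```latex
% Assumed joint objective: reconstruction + classification error + l2 penalty on the codes
\min_{D,\,W,\,X}\; \|Y - DX\|_F^2 + \alpha\,\|H - WX\|_F^2 + \lambda\,\|X\|_F^2

% With D and W fixed, setting the gradient with respect to X to zero gives a closed form
X = \left(D^\top D + \alpha\, W^\top W + \lambda I\right)^{-1}
    \left(D^\top Y + \alpha\, W^\top H\right)
```

Under these assumptions, the closed-form update for X is what an ℓ1 penalty would not allow, which is consistent with the time saving the abstract attributes to the ℓ2-norm.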

Abstract (English)

Environmental sound classification is becoming increasingly common in daily life, with applications such as security surveillance, environment detection, and health care. To classify sounds from different events accurately, one can use traditional classifiers such as the SVM and GMM, or the popular SRC, and obtain good classification results.
In this thesis, we present a joint kernel dictionary learning (JKDL) method based on sparse representation. Using the ℓ2-norm instead of the ℓ1-norm preserves performance while greatly reducing computation time. Adding a classification error term to the objective function trains a simple linear classifier together with the dictionary and strengthens the relationship between the two. The kernel method plays an important role, strengthening both the reconstructive and the discriminative ability of the dictionary. The dictionary update step is performed iteratively by taking partial derivatives of the objective function in feature space. The online dictionary learning approach handles dynamic training data more efficiently than the batch approach.
Experiments on a 17-class sound database indicate that the proposed method achieves an accuracy of 80.56%. In addition, the average time needed to classify a test sample is notably shorter than that of SRC and CRC.
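
To illustrate why an ℓ2 penalty makes test-time coding cheap, here is a minimal Python sketch. It is not the thesis implementation: the function names, the regularization weight `lam`, the random dictionary `D`, and the random linear classifier `W` are all hypothetical, and the kernel mapping and dictionary training are left out.

```python
import numpy as np

def collaborative_code(D, y, lam=0.1):
    """Closed-form solution of argmin_x ||y - D x||^2 + lam * ||x||^2.

    D: (features, atoms) dictionary, y: (features,) test sample.
    No iterative l1 solver is needed, only one linear solve.
    """
    n_atoms = D.shape[1]
    gram = D.T @ D + lam * np.eye(n_atoms)   # regularized Gram matrix
    return np.linalg.solve(gram, D.T @ y)    # code vector of length n_atoms

def classify(D, W, y, lam=0.1):
    """Code the sample, then apply a linear classifier W to the code."""
    x = collaborative_code(D, y, lam)
    return int(np.argmax(W @ x))             # index of the highest class score

# Hypothetical data, only to show the calls: 20-dim features, 8 atoms, 3 classes.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
W = rng.standard_normal((3, 8))
y = rng.standard_normal(20)

x = collaborative_code(D, y)
label = classify(D, W, y)
```

Since the regularized Gram matrix depends only on the dictionary, its factorization could be computed once and reused for every test sample, which is one plausible source of the speed advantage over ℓ1-based coding.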

Keywords (Chinese)

★ 字典學習 (dictionary learning)
★ 稀疏表示 (sparse representation)

Keywords (English)

★ dictionary learning
★ sparse representation

Table of Contents

Contents
CHAPTER 1 INTRODUCTION
1-1 MOTIVATION
1-2 OUTLINE OF THESIS
1-3 KEY NOTES OF CHAPTERS
CHAPTER 2 RELATED WORKS AND LITERATURE REVIEW
2-1 FUNDAMENTAL OF DICTIONARY LEARNING
2-1.1 Sparse Dictionary Learning
2-1.2 Dictionary Learning via Collaborative Representation
2-2 UNSUPERVISED DICTIONARY LEARNING
2-3 SUPERVISED DICTIONARY LEARNING
2-4 SPARSE REPRESENTATION-BASED CLASSIFIER
2-5 KERNEL TRICK
2-6 ONLINE LEARNING
CHAPTER 3 PROPOSED JOINT KERNEL DICTIONARY LEARNING
3-1 JOINT DICTIONARY LEARNING
3-2 OPTIMIZATION FOR JOINT DICTIONARY LEARNING
3-3 JOINT KERNEL DICTIONARY LEARNING (JKDL)
3-4 OPTIMIZATION FOR JKDL
3-5 ONE-VERSUS-ONE CLASSIFIER EXTENSION
3-6 CLASSIFICATION
3-7 CLASSIFICATION ON ONE-VERSUS-ONE CLASSIFIERS
3-8 ONLINE JKDL
CHAPTER 4 EXPERIMENTAL RESULTS
4-1 ENVIRONMENT OF EXPERIMENTS
4-2 PARAMETERS SELECTION
4-3 EFFECT OF DIFFERENT TRAINING DATA NUMBER
4-4 EFFECT OF DIFFERENT DICTIONARY SIZE
4-5 ONLINE JKDL
CHAPTER 5 CONCLUSION AND FUTURE WORKS
REFERENCE

Advisor

王家慶 (Jia-Ching Wang)

Approval Date

2014-8-26
