A Study of Locality Preserved Joint Dictionary Learning

Name

沈正勝 (Seksan Mathulaprangsan)

Department

Department of Computer Science and Information Engineering

Thesis Title

聯合局部保留字典學習法研究
(A Study of Locality Preserved Joint Dictionary Learning)

Related Theses

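★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Design and Application of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Spoken Narration System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Speeded-Up Robust Features
★ A Personalized Collaborative Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction of Financial Products
★ A Study on Integrating Deep Learning Methods to Predict Age and Aging Genes
★ A Study of End-to-End Chinese Speech Synthesis
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction of Exchange-Traded Funds
★ Investigating the Correlation between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning Methods to Predict Alzheimer's Disease Progression and Survival after Stroke Surgery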

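This electronic thesis is authorized for immediate open access.
The openly accessible full text is licensed to users only for searching, reading, and printing for personal, non-commercial academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.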

Abstract (Chinese version)

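This thesis combines the locality-preserving technique with dictionary learning methods to improve their performance in speech emotion recognition and object recognition.
First, for image object recognition, we propose two novel locality-preserving dictionary learning methods. The first is the discriminative locality-preserving K-SVD (DLP-KSVD), which introduces label information into the locality-preserving term. The second is the label-consistent LP-KSVD (LCLP-KSVD), which uses label consistency as a constraint to further strengthen the discrimination between classes.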
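Both proposed methods build on K-SVD. For reference, the following is a minimal Python/NumPy sketch of the standard K-SVD dictionary-update step for a single atom. It is the generic algorithm, not the thesis's implementation: the locality and label terms are omitted, and the function name `ksvd_atom_update` is hypothetical.

```python
import numpy as np


def ksvd_atom_update(Y, D, X, k):
    """Generic K-SVD update of atom k (hypothetical helper, not the thesis code).

    Y: data (d x n), D: dictionary (d x K), X: sparse codes (K x n).
    Only samples that currently use atom k are touched, so the
    sparsity pattern of X outside row k's support is preserved.
    """
    omega = np.nonzero(X[k])[0]          # samples whose code uses atom k
    if omega.size == 0:
        return D, X                      # unused atom: left as is in this sketch
    D, X = D.copy(), X.copy()
    # Residual on those samples with atom k's contribution added back.
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    # Best rank-1 approximation of E gives the new atom and its coefficients.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                    # unit-norm atom
    X[k, omega] = s[0] * Vt[0]
    return D, X
```

Because the previous atom and coefficients are themselves a rank-1 candidate for `E`, the total reconstruction error cannot increase after this step. The sparse-coding stage (for example, orthogonal matching pursuit) is not shown.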
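Next, for speech emotion recognition, this thesis proposes a locality-preserved joint nonnegative matrix factorization method (LP-JNMF), which learns highly discriminative shared features by simultaneously reconstructing the speech features and training a simple linear classifier. In addition, we introduce a locality-preserving constraint so that the learned features preserve the manifold of the high-dimensional input features.
Experimental results show that the proposed methods outperform several state-of-the-art dictionary learning methods in object recognition and speech emotion recognition.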

Abstract (English version)

This study uses the locality-preserving technique, which exploits the geometric structure of the data, to improve the performance of dictionary learning approaches on pattern recognition tasks, including speech emotion recognition and object recognition.
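The locality-preserving idea follows Locality Preserving Projections: build a neighbourhood graph on the training samples and penalize codes of neighbouring samples that lie far apart. A minimal NumPy sketch, assuming a k-nearest-neighbour graph with heat-kernel weights. The graph construction, the parameters `k` and `t`, and the function names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def knn_heat_affinity(X, k=5, t=1.0):
    """Symmetric k-NN affinity with heat-kernel weights (assumed construction).

    X is d x n, one sample per column.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        # The k closest samples other than i itself.
        neighbours = [j for j in np.argsort(dist2[i]) if j != i][:k]
        S[i, neighbours] = np.exp(-dist2[i, neighbours] / t)
    # Link i and j if either one is among the other's k nearest neighbours.
    return np.maximum(S, S.T)


def graph_laplacian(S):
    """L = D - S, with D the diagonal degree matrix."""
    return np.diag(S.sum(axis=1)) - S


def locality_penalty(A, L):
    """tr(A L A^T) = 0.5 * sum_ij S_ij * ||a_i - a_j||^2 for codes A (K x n)."""
    return float(np.trace(A @ L @ A.T))
```

The identity in `locality_penalty` explains why minimizing tr(A L A^T) keeps the codes of nearby samples close. Each edge weight S_ij scales the squared distance between codes a_i and a_j, so a constant code matrix has zero penalty.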
First, to fully exploit the locality-preserving technique for object recognition, two novel dual-layer locality-preserving methods were developed. The first, discriminative LP-KSVD (DLP-KSVD), incorporates label information into the locality-preserving term. The second, label-consistent LP-KSVD (LCLP-KSVD), adds a label-consistent constraint to the original LP-KSVD model to penalize sparse codes from different classes and thereby improve discriminative power.
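A common form of the label-consistent constraint is the one in LC-KSVD (Jiang et al.). That method adds alpha * ||Q - AX||_F^2 + beta * ||H - WX||_F^2 to the reconstruction error ||Y - DX||_F^2 and stacks the terms so that ordinary K-SVD learns D, A, and W together. The sketch below illustrates this stacking, assuming that LCLP-KSVD uses the same form. The locality term is omitted. `supervised_affinity` is a hypothetical example of how label information could enter the locality graph for DLP-KSVD, not the thesis's definition.

```python
import numpy as np


def ideal_codes(labels, atom_labels):
    """LC-KSVD 'discriminative sparse code' matrix Q (K x n):
    Q[k, j] = 1 when atom k is assigned to the class of sample j."""
    return (np.asarray(atom_labels)[:, None] == np.asarray(labels)[None, :]).astype(float)


def one_hot(labels, n_classes):
    """Label matrix H (classes x n) for the linear classifier term."""
    H = np.zeros((n_classes, len(labels)))
    H[labels, np.arange(len(labels))] = 1.0
    return H


def supervised_affinity(S, labels):
    """Hypothetical label-aware locality graph: keep only same-class edges."""
    labels = np.asarray(labels)
    return S * (labels[:, None] == labels[None, :])


def joint_objective(Y, D, Q, A, Hc, W, X, alpha, beta):
    """||Y - DX||^2 + alpha ||Q - AX||^2 + beta ||Hc - WX||^2 (Frobenius norms)."""
    return (np.linalg.norm(Y - D @ X) ** 2
            + alpha * np.linalg.norm(Q - A @ X) ** 2
            + beta * np.linalg.norm(Hc - W @ X) ** 2)


def stacked(Y, D, Q, A, Hc, W, alpha, beta):
    """Stack data and parameters so one K-SVD run can learn D, A and W jointly."""
    Y_new = np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * Hc])
    D_new = np.vstack([D, np.sqrt(alpha) * A, np.sqrt(beta) * W])
    return Y_new, D_new
```

Because the square roots of the weights are folded into the stacked rows, the plain reconstruction error of the stacked problem equals the three-term objective. Any K-SVD implementation can therefore be reused unchanged.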
Second, a novel approach for speech emotion recognition, named locality-preserved joint NMF (LP-JNMF), is introduced. It pursues two goals jointly: learning a dictionary that reconstructs the input acoustic features, and learning a simple linear classifier for annotation. Because the learned representations are shared between the dictionary and the annotation matrix, they are pushed toward discriminative solutions. Moreover, to preserve the manifold of the input acoustic features, a locality penalty term is added to the objective function of joint dictionary learning, which further improves the discriminability of the learned dictionary.
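One possible objective for this method uses squared Frobenius losses and a graph-Laplacian penalty in the style of graph-regularized NMF (Cai et al.):

||X - WH||_F^2 + alpha * ||Y - UH||_F^2 + lam * tr(H L H^T)

It is minimized over nonnegative W, U, and H. Here X holds acoustic features, Y holds one-hot emotion labels, and L = D - S is the Laplacian of a neighbourhood graph S. The sketch below uses multiplicative updates. The exact objective, weights, and update rules of LP-JNMF in the thesis may differ, and all function names are hypothetical.

```python
import numpy as np


def lp_jnmf_sketch(X, Y, S, k, alpha=1.0, lam=0.1, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates (assumed formulation) for
        ||X - W H||^2 + alpha ||Y - U H||^2 + lam tr(H L H^T),  L = D - S.
    X: features x n, Y: classes x n (one-hot), S: n x n nonnegative affinity."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))      # dictionary for acoustic features
    U = rng.random((Y.shape[0], k))      # annotation (classifier) matrix
    H = rng.random((k, X.shape[1]))      # shared nonnegative activations
    Dg = np.diag(S.sum(axis=1))          # degree matrix of the graph
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        U *= (Y @ H.T) / (U @ H @ H.T + eps)
        # Gradient of the objective w.r.t. H split into negative (num) and positive (den) parts.
        num = W.T @ X + alpha * U.T @ Y + lam * H @ S
        den = W.T @ W @ H + alpha * U.T @ U @ H + lam * H @ Dg + eps
        H *= num / den
    return W, U, H


def objective(X, Y, S, W, U, H, alpha, lam):
    L = np.diag(S.sum(axis=1)) - S
    return (np.linalg.norm(X - W @ H) ** 2 + alpha * np.linalg.norm(Y - U @ H) ** 2
            + lam * np.trace(H @ L @ H.T))


def predict(x, W, U, n_iter=200, eps=1e-10):
    """Encode a test vector with W fixed, then take the arg max of U h (assumed rule)."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ x) / (W.T @ W @ h + eps)
    return int(np.argmax(U @ h))
```

These updates are equivalent to graph-regularized NMF applied to the stacked data [X; sqrt(alpha) Y] with the stacked dictionary [W; sqrt(alpha) U]. This is the same stacking idea used for joint dictionary learning, and it keeps every factor nonnegative.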
Experimental results show that the proposed methods outperform the baseline algorithms, which are state-of-the-art dictionary learning algorithms for object recognition and speech emotion recognition.

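Keywords (Chinese version)

★ 字典學習 (dictionary learning)
★ 聯合詞典學習 (joint dictionary learning)
★ 局部特徵保留 (locality preserving)
★ 非負矩陣分解 (nonnegative matrix factorization)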

Keywords (English version)

★ dictionary learning
★ joint dictionary learning
★ locality preserving
★ nonnegative matrix factorization

Table of Contents

Abstract
Chinese Abstract
Table of Contents
List of Figures
List of Tables
Abbreviations
Publications
Chapter 1 Introduction
1.1 Background
1.2 Research Problem and Scope
1.3 Thesis Organization
Chapter 2 Related Works
2.1 Sparse Dictionary Learning
2.2 Dictionary Learning via Collaborative Representation
2.3 Joint Dictionary Learning
2.3.1 Discriminative K-SVD
2.3.2 Optimization for Joint Dictionary Learning
2.4 Nonnegative Matrix Factorization
2.5 Joint NMF
2.6 Locality Preserving Projection
Chapter 3 Proposed Models
3.1 Locality-preserving K-SVD Based Joint Dictionary Learning and Classifier
3.1.1 DLP-KSVD and LCLP-KSVD System Overview
3.1.2 Discriminative LP-KSVD (DLP-KSVD)
3.1.3 Label-consistent LP-KSVD (LCLP-KSVD)
3.1.4 Locality-incorporated Dictionary Learning
3.2 Locality Preserved Joint Nonnegative Matrix Factorization
3.2.1 LP-JNMF System Overview
3.2.2 The Proposed LP-JNMF
Chapter 4 Experimental Results
4.1 Experiments of the Proposed Locality-preserving K-SVD Based JDL Methods on the Caltech101 Dataset
4.1.1 Comparison of Effects of Variously Sized Training Data
4.1.2 Comparison of Effects of Variously Sized Dictionaries
4.2 Experiments of the Proposed LP-JNMF on MHMC
4.2.1 Effect of Different Sizes of Bases
4.2.2 Effect of Different Types of Features
4.2.3 Effect of Different Sizes of Training Data
Chapter 5 Discussions
5.1 Pros and Cons of the Proposed Models
Chapter 6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works
Bibliography

Advisor

王家慶(Jia-Ching Wang)

Approval Date

2019-5-1
