Master's/Doctoral Thesis 104522006: Detailed Record




Author: Shao-Hui Wu (吳紹暉)    Department: Computer Science and Information Engineering
Thesis Title: Latent Semantic Learning with Hierarchical Representation for Audio and Image Classification
(Chinese title: 具潛藏語意的階層型表示應用於音訊以及影像分類)
Related Theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Applications of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Speeded-Up Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction for Financial Instruments
★ A Study on Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Market Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Survival after Stroke Surgery
Files: [Endnote RIS format]    [BibTeX format]    Full text in the system: never open to the public
Abstract (Chinese): Hierarchical classification is a common framework for classification problems. For example, to buy a bottle of shampoo in a supermarket, we might first locate the household-goods section, then the bath-products section, and finally pick up the shampoo; or, when hungry, we might first choose between noodles and rice, then choose among spaghetti, ramen, or everyday home-style noodles, and finally settle on what to eat. This thesis proposes a feature representation based on Gaussian hierarchical latent Dirichlet allocation (G-hLDA). Through a tree structure, frame-level features such as MFCCs are clustered into groups at each level, and the resulting representation is applied to audio and image classification. Unlike conventional feature representations, the proposed hierarchical structure reveals, through the tree, how similar the classes are to one another; the representation also captures the latent semantics behind the audio and images. In the model, each audio clip or image is treated as a document in a topic model and each frame-level feature as a word; latent topics are inferred for each clip or image, and the nested Chinese restaurant process (nCRP) is used to construct a tree structure over the latent topics. Compared with hierarchical latent Dirichlet allocation (hLDA), this approach works directly on the features extracted from the data, reducing the quantization error introduced by vector quantization; compared with Gaussian latent Dirichlet allocation (G-LDA), it uncovers the hierarchy among latent topics and resolves the model selection problem.
In this thesis, the experiments are divided into audio classification and image classification. For audio classification we use a guitar-playing-technique dataset, in which different classes share many similar components; for image classification we use a natural-scene dataset. Experimental results show that the proposed method performs better on both the image and audio classification problems.
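To illustrate the nCRP prior described in the abstract above, the following minimal Python sketch samples one root-to-leaf path per document through a lazily grown tree. This shows only the tree prior, not the thesis's full G-hLDA inference, and the class and function names (Node, sample_path) are our own illustrative choices:

import random

class Node:
    """A node (restaurant) in the nCRP tree; children are created lazily."""
    def __init__(self):
        self.count = 0        # number of documents whose path visits this node
        self.children = []

def sample_path(root, depth, gamma=1.0):
    """Draw one document's root-to-leaf path from the nested CRP prior.

    At each level, an existing child k is chosen with probability
    n_k / (n + gamma) and a brand-new child with probability
    gamma / (n + gamma), where n is the total count over existing children.
    """
    node = root
    node.count += 1
    path = [node]
    for _ in range(depth - 1):
        n = sum(child.count for child in node.children)
        r = random.uniform(0.0, n + gamma)
        chosen, acc = None, 0.0
        for child in node.children:
            acc += child.count
            if r <= acc:
                chosen = child
                break
        if chosen is None:               # open a new table, i.e. a new subtree
            chosen = Node()
            node.children.append(chosen)
        chosen.count += 1
        path.append(chosen)
        node = chosen
    return path

# Example: assign 10 "documents" (audio clips or images) to depth-3 paths.
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0) for _ in range(10)]
print("branches at the root:", len(root.children))

Documents that share a path prefix share the coarser topics near the root, which is how the tree encodes the between-class similarity mentioned in the abstract.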
Abstract (English)
Hierarchical classification is one of the most popular methods for classification problems; for example, items are organized hierarchically on shopping websites and in bookstores. In this work, we propose a topic model for discovering the hierarchical latent characteristics behind frame-level features. In our model, frame-level features are regarded as words and clip-level features as documents. A Gaussian hierarchical latent Dirichlet allocation (G-hLDA) is proposed to find the latent topics behind continuous features. Unlike other methods, G-hLDA captures latent semantics and constructs a tree-structured representation. We use the nested Chinese restaurant process (nCRP) as the prior distribution over the tree structure. Compared with hierarchical latent Dirichlet allocation (hLDA), G-hLDA handles continuous features directly instead of transforming them into discrete words, reducing the information loss caused by discretization-based vector quantization. Compared with Gaussian latent Dirichlet allocation (G-LDA), it discovers the hierarchy behind the latent topics and resolves the model selection problem.
In this thesis, we conduct experiments on audio classification and image classification. For audio classification we use a guitar-playing-technique dataset; for image classification we use a natural-scene dataset. The experimental results demonstrate that the proposed method outperforms the baselines in terms of F-score and accuracy.
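As a rough sketch of the surrounding pipeline (frame-level features, then a clip-level representation, then an SVM), the Python snippet below substitutes a simple vector-quantization codebook histogram (cf. Section 2.3.1.1 of the table of contents) for the G-hLDA representation, since the full model is beyond a short example. It assumes librosa and scikit-learn are available; the file names, labels, and helper functions are hypothetical placeholders, not the thesis's actual code:

import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def frame_features(path, sr=22050, n_mfcc=20):
    """Frame-level MFCCs: the 'words' in the document/word analogy."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def clip_histograms(clips, codebook):
    """Quantize each clip's frames and form a normalized codeword histogram."""
    hists = []
    for frames in clips:
        codes = codebook.predict(frames)
        h = np.bincount(codes, minlength=codebook.n_clusters).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return np.vstack(hists)

# Hypothetical file lists and labels (placeholders, not the GPT dataset layout).
train_paths, train_labels = ["clip1.wav", "clip2.wav"], [0, 1]
train_clips = [frame_features(p) for p in train_paths]

# Stand-in clip-level representation: a VQ codebook histogram, where the
# thesis would instead use the G-hLDA tree-based representation.
codebook = KMeans(n_clusters=64, random_state=0).fit(np.vstack(train_clips))
X_train = clip_histograms(train_clips, codebook)

clf = SVC(kernel="rbf").fit(X_train, train_labels)   # SVM classifier (Section 3.2)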
Keywords (Chinese) ★ Hierarchical Representation
★ Hierarchical Latent Dirichlet Allocation
★ Gaussian Component
Keywords (English) ★ Hierarchical Representation
★ Hierarchical Latent Dirichlet Allocation
★ Gaussian Component
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1.1  Preface
1.2  Motivation and Objectives
1.3  Thesis Organization
Chapter 2  Related Work
2.1  Audio Features
2.1.1  Spectrogram
2.1.2  Linear Prediction Coefficients (LPC)
2.1.3  Chroma Feature
2.1.4  Mel-Spectrum
2.1.5  Mel-Frequency Cepstral Coefficients (MFCCs)
2.2  Image Features
2.2.1  Local Binary Patterns (LBP)
2.2.2  Histogram of Oriented Gradients (HOG)
2.2.3  Scale-Invariant Feature Transform (SIFT)
2.2.4  Speeded-Up Robust Features (SURF)
2.2.5  Spatial Pyramid Matching (SPM)
2.3  Feature Learning
2.3.1  Codebook-Based Feature Representation
2.3.1.1  Vector Quantization (VQ)
2.3.1.2  Sparse Coding (SC)
2.3.1.3  Nonnegative Matrix Factorization (NMF)
2.3.2  Topic-Model-Based Feature Representation
2.3.2.1  Latent Dirichlet Allocation (LDA)
2.3.2.2  Supervised Latent Dirichlet Allocation (sLDA)
2.3.2.3  Gaussian Latent Dirichlet Allocation (G-LDA)
2.3.2.4  Hierarchical Latent Dirichlet Allocation (hLDA)
Chapter 3  Hierarchical Representation with Latent Semantics
3.1  System Overview
3.2  Support Vector Machine (SVM)
3.3  Gaussian Hierarchical Latent Dirichlet Allocation (G-hLDA)
3.3.1  Model Description
3.3.2  Model Inference
3.3.3  Prior with a Discriminative Term
3.3.4  Feature Formation
3.3.5  Algorithm Description
Chapter 4  Experimental Results
4.1  Experimental Setup
4.2  Experimental Analysis
4.2.1  Convergence Analysis
4.2.2  Effect of Tree Depth
4.2.3  Discriminative Power of the Hierarchical Representation
4.3  Experimental Results
4.3.1  Audio Classification Results
4.3.2  Image Classification Results
Chapter 5  Conclusions and Future Work
References
Advisor: Jai-Ching Wang (王家慶)    Date of Approval: 2017-8-18