Latent Semantic Learning with Hierarchical Representation for Audio and Image Classification (具潛藏語意的階層型表示應用於音訊以及影像分類)

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/74737

Title:	Latent Semantic Learning with Hierarchical Representation for Audio and Image Classification (具潛藏語意的階層型表示應用於音訊以及影像分類)
Author:	Wu, Shao-Hui (吳紹暉)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Hierarchical Representation; Hierarchical Latent Dirichlet Allocation; Gaussian Component
Date:	2017-08-18
Upload time:	2017-10-27 14:37:55 (UTC+8)
Publisher:	National Central University
Abstract:	Hierarchical classification is a common framework for classification problems. For example, to buy a bottle of shampoo in a supermarket, we might first find the household goods section, then the bath products aisle, and finally the shampoo. Similarly, when hungry, we might first choose between noodles and rice, then among pasta, ramen, and ordinary home-style noodles, before settling on a dish. This thesis proposes a feature representation based on Gaussian hierarchical latent Dirichlet allocation (G-hLDA). Using a tree structure, frame-level features such as MFCCs are clustered into several classes at each level of the hierarchy, and the resulting representation is applied to audio and image classification. Unlike conventional feature representations, the proposed hierarchical structure reveals how similar the classes are to one another, and the representation also captures the latent semantics behind audio and images. In the model, each audio clip or image is treated as a document in a topic model and each frame-level feature as a word. The model finds the latent topics of each clip or image and uses the nested Chinese restaurant process (nCRP) as the prior distribution for the tree structure over those topics. Compared with hierarchical latent Dirichlet allocation (hLDA), G-hLDA works directly on the continuous features extracted from the data instead of transforming them into discrete words, which reduces the quantization error introduced by vector quantization. Compared with Gaussian latent Dirichlet allocation (G-LDA), it uncovers the hierarchy among latent topics and addresses the problem of model selection. The experiments cover audio classification and image classification. For audio, we use a guitar playing technique dataset, in which different classes share many similar components. For images, we use a natural scene dataset. The experimental results show that the proposed method outperforms the baselines in terms of F-score and accuracy.
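The tree prior named in the abstract, the nested Chinese restaurant process, can be illustrated with a small sketch. The code below is not taken from the thesis: the names `Node`, `sample_ncrp_path`, and `build_tree`, the fixed tree depth of 3, and the concentration parameter `gamma` are all assumptions chosen for illustration. It covers only the prior over root-to-leaf paths. It omits the Gaussian topic distributions over frame-level features and any inference from data.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node (topic) in the tree; `count` is the number of documents routed through it."""
    count: int = 0
    children: list[Node] = field(default_factory=list)


def sample_ncrp_path(root: Node, depth: int, gamma: float, rng: random.Random) -> list[int]:
    """Draw one root-to-leaf path with `depth` levels and update the visit counts.

    At each level, an existing child is chosen with probability proportional
    to its count, and a brand-new child with probability proportional to
    `gamma` (hypothetical name for the concentration parameter).
    Returns the child index taken at each level below the root.
    """
    path = []
    node = root
    node.count += 1
    for _ in range(depth - 1):
        # Last weight stands for "open a new child".
        weights = [child.count for child in node.children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(node.children):
            node.children.append(Node())
        node = node.children[choice]
        node.count += 1
        path.append(choice)
    return path


def build_tree(num_documents: int, depth: int = 3, gamma: float = 1.0, seed: int = 0):
    """Sequentially assign `num_documents` documents to paths in a shared tree."""
    rng = random.Random(seed)
    root = Node()
    paths = [sample_ncrp_path(root, depth, gamma, rng) for _ in range(num_documents)]
    return root, paths


if __name__ == "__main__":
    root, paths = build_tree(num_documents=20)
    print("children of root:", len(root.children))
    print("first five paths:", paths[:5])
```

Because the number of children at each node grows with the data rather than being fixed in advance, a prior of this kind is one way to avoid choosing the number of topics by hand, which is the model selection issue the abstract refers to.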
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	468

All items in NCUIR are protected by their original copyright.
