Thesis 104522023: Detailed Record




Name: Min-Che Hsieh (謝旻哲)    Graduate Department: Computer Science and Information Engineering
Title: 基於高斯程序回歸模型與變異型自編碼器之強健性聲音辨識方法
(Robust Audio Recognition Based on Gaussian Process Regression Model and Variational Auto-encoder)
Related theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Narration System
★ Recognition and Detection of Calcaneal Fractures in CT Images Using Deep Learning and Accelerated Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction for Financial Instruments
★ A Study on Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Stroke Surgery Survival
Files: full text permanently restricted (never open to the public)
Abstract (Chinese) Hearing occupies a large part of people's lives: with the sense of hearing, sound gives people a clearer picture of their surroundings and adds color to daily life. Among the wide variety of sound types, robust features combined with automated classification methods can help people grasp emergencies quickly or improve learning, so the classification of environmental and instrumental sounds, together with the robustness of such classification, has been receiving growing attention.
A traditional auto-encoder performs reconstruction mainly through a neural network [29], which makes it convenient to attach various classifiers and improve recognition. The variational auto-encoder (VAE), by contrast, introduces stochastic variational inference [25]: the variational lower bound is reparameterized so that it can be optimized with stochastic gradient methods, and a recognition model is used to approximate the otherwise intractable posterior distribution. The Gaussian process regression model likewise derives a lower bound by training its parameters. We combine the lower bounds of the variational auto-encoder and the Gaussian process regression model so that the parameters of both are trained jointly, reducing the time of training them separately and achieving better optimization.
In the experiments, to demonstrate the robustness of this model, we compare recognition performance with and without noise, and we also discuss different initial parameter settings to understand their effect on convergence speed and recognition accuracy.
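For reference, the lower bound mentioned above is, in the standard VAE formulation of [18][25],

\[
\log p_{\theta}(x) \;\ge\; \mathcal{L}_{\mathrm{VAE}}(\theta,\phi;x)
= \mathbb{E}_{q_{\phi}(z\mid x)}\big[\log p_{\theta}(x\mid z)\big]
- D_{\mathrm{KL}}\big(q_{\phi}(z\mid x)\,\big\|\,p(z)\big),
\qquad z = \mu + \sigma \odot \epsilon,\;\; \epsilon \sim \mathcal{N}(0, I).
\]

One natural, though here only assumed, reading of the joint training described in this abstract is a summed objective \( \mathcal{L}_{\mathrm{joint}} = \mathcal{L}_{\mathrm{VAE}} + \mathcal{L}_{\mathrm{GP}} \), where \( \mathcal{L}_{\mathrm{GP}} \) is the Gaussian process regression lower bound; the exact combination and weighting used in the thesis are developed in Chapter 4, not here.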
Abstract (English)
The sense of hearing plays an important role in daily life. For those who can hear, sound not only makes the surrounding situation clearer but also enriches life. Among the wide variety of sound types, applying robust features and automated classification methods can help us understand different kinds of emergencies more quickly or enhance learning. The classification of environmental and instrumental sounds, and the robustness of such classification, have therefore received increasing attention.
In a traditional auto-encoder, inputs such as images and audio are reconstructed through a neural network [29], which makes it convenient to attach various classifiers and improve recognition. The variational auto-encoder, by contrast, introduces stochastic variational inference [25]: it reparameterizes the variational lower bound so that it can be optimized with stochastic gradient methods, and then uses a recognition model to approximate the otherwise intractable posterior distribution. The Gaussian process regression model also derives a lower bound by training its parameters. We combine the lower bounds of the variational auto-encoder and the Gaussian process regression model and train the parameters of both models jointly, achieving the best optimization while reducing training time.
In the experimental part, to show the robustness of this model, we compare recognition performance on noisy and clean data. We also discuss different initial parameter settings to examine their effect on convergence speed and recognition accuracy.
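As a concrete illustration of the reparameterization just described, the following is a minimal Python/NumPy sketch of the standard Gaussian reparameterization and the KL term that enters the VAE lower bound [18]. It is a generic textbook example, not code from the thesis; the function names and the two-dimensional toy values are our own.

import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Sampling this way keeps the path from (mu, log_var) to z
    differentiable, which is what lets the variational lower
    bound be optimized with stochastic gradients [18][22].
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl(mu, log_var):
    """KL( N(mu, sigma^2 I) || N(0, I) ), the regularizer in the VAE bound."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# Toy usage: a 2-dimensional latent code for one input (made-up values).
mu = np.array([0.3, -0.1])
log_var = np.array([-1.0, -0.5])
z = reparameterize(mu, log_var)
print("sampled z:", z, " KL term:", gaussian_kl(mu, log_var))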
Keywords (Chinese) ★ 高斯程序回歸模型 (Gaussian process regression model)
★ 變異型自編碼器 (variational auto-encoder)
★ 變異推理 (variational inference)
Keywords (English) ★ Gaussian Process Regression Model
★ Variational Auto-encoder
★ variational inference
Table of Contents
Chinese Abstract i
Abstract ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1  Introduction 1
1.1 Background 1
1.2 Motivation and Objectives 2
1.3 Methodology and Chapter Overview 3
Chapter 2  Related Work and Literature Review 4
2.1 Feature Learning 4
2.1.1 Raw Data Processing 4
2.1.2 Nonnegative Matrix Factorization (NMF) 5
2.1.3 Sparse Representation (SR) 6
2.1.4 Principal Component Analysis (PCA) 8
2.2 Classifiers 9
2.2.1 Support Vector Machine (SVM) 9
2.2.2 Bayesian Support Vector Machine (BSVM) 11
2.2.3 Gaussian Process (GP) 13
2.2.4 Gaussian Mixture Model (GMM) 16
Chapter 3  Auto-encoders 20
3.1 Auto-encoder 20
3.2 The Variational Auto-encoder Model 24
3.2.1 The Variational Bound 24
3.2.2 The Stochastic Gradient Variational Bayes Estimator and the AEVB Algorithm 25
3.2.3 Variational Auto-encoder 28
Chapter 4  Combining the Variational Auto-encoder 30
4.1 The Variational Bound 30
4.1.1 Kernel Function: Linear 30
4.1.2 Kernel Function: RBF 31
4.2 Predictive Classification 33
Chapter 5  Experimental Results 35
5.1 Experimental Setup and Environment 35
5.2 Experimental Procedure 37
5.3 Robust Recognition of Guitar Playing Techniques 38
Chapter 6  Conclusions and Future Work 45
References 46
References

[1] D. Lee and H. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[2] D. Lee and H. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13 (Proc. NIPS 2000), MIT Press, 2001.
[3] D. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.
[4] Y. Lin and D. Lee, "Bayesian L1-norm sparse learning," in Proc. ICASSP, 2006, vol. 5, pp. 605–608.
[5] D. Wipf and B. Rao, "Sparse Bayesian learning for basis selection," IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, 2004.
[6] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[7] M. I. Mandel and D. P. W. Ellis, "Multiple-instance learning for music information retrieval," in Proc. ISMIR, 2008.
[8] S. Andrews, I. Tsochantaridis, and T. Hofmann, "Support vector machines for multiple-instance learning," in Advances in Neural Information Processing Systems, vol. 15, pp. 561–568, MIT Press, 2003.
[9] Y. Chen, J. Bi, and J. Z. Wang, "MILES: Multiple-instance learning via embedded instance selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1931–1947, 2006.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012.
[11] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag New York, 1995.
[12] R. Henao, X. Yuan, and L. Carin, "Bayesian nonlinear SVMs and factor modeling," in Proc. NIPS, 2014.
[13] N. G. Polson and S. L. Scott, "Data augmentation for support vector machines," Bayesian Analysis, 2011.
[14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2005.
[15] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504–507, 2006.
[16] A. Ng, "Sparse autoencoder," CS294A Lecture Notes, 2011.
[17] D. M. Blei, M. I. Jordan, and J. W. Paisley, "Variational Bayesian inference with stochastic search," in Proc. ICML, 2012, pp. 1367–1374.
[18] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. ICLR, 2014.
[19] A. Mnih and K. Gregor, "Neural variational inference and learning in belief networks," arXiv preprint arXiv:1402.0030, 2014.
[20] C. Kaae Sønderby et al., "How to train deep variational autoencoders and probabilistic ladder networks," arXiv preprint arXiv:1602.02282, 2016.
[21] T. Salimans, "A structured variational auto-encoder for learning deep hierarchies of sparse features," arXiv preprint arXiv:1602.08734, 2016.
[22] D. P. Kingma and M. Welling, "Stochastic gradient VB and the variational auto-encoder," in Proc. ICLR, 2014.
[23] Y. Pu et al., "Variational autoencoder for deep learning of images, labels and captions," in Advances in Neural Information Processing Systems, 2016.
[24] Y. Pu et al., "Variational autoencoder for deep learning of images, labels and captions: supplementary material."
[25] M. D. Hoffman et al., "Stochastic variational inference," Journal of Machine Learning Research, vol. 14, no. 1, pp. 1303–1347, 2013.
[26] D. Cireşan et al., "Multi-column deep neural network for traffic sign classification," Neural Networks, vol. 32, pp. 333–338, 2012.
[27] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.
[28] L. Su, L.-F. Yu, and Y.-H. Yang, "Sparse cepstral, phase codes for guitar playing technique classification," in Proc. ISMIR, 2014.
[29] J. Quiñonero-Candela and C. E. Rasmussen, "A unifying view of sparse approximate Gaussian process regression," Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005.
[30] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23–38, 1998.
[31] G. T. Herman, Image Reconstruction from Projections: Implementation and Applications, 1979.
[32] M. W. Berry et al., "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 155–173, 2007.
[33] J. Wright et al., "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[34] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1–3, pp. 37–52, 1987.
[35] M.-H. Yang and N. Ahuja, "Gaussian mixture model for human skin color and its applications in image and video databases," in Proc. SPIE Storage and Retrieval for Image and Video Databases, 1999.
[36] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[37] J. M. Joyce, "Kullback–Leibler divergence," in International Encyclopedia of Statistical Science, Springer Berlin Heidelberg, 2011, pp. 720–722.
[38] V. A. Sotiris, P. W. Tse, and M. G. Pecht, "Anomaly detection through a Bayesian support vector machine," IEEE Transactions on Reliability, vol. 59, no. 2, pp. 277–286, 2010.
[39] Z. Zheng and G. I. Webb, "Lazy learning of Bayesian rules," Machine Learning, vol. 41, no. 1, pp. 53–84, 2000.
[40] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47–60, 1996.
[41] D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet, "Semantic annotation and retrieval of music and sound effects," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 467–476, 2008.
[42] A. Mnih and K. Gregor, "Neural variational inference and learning in belief networks," in Proc. ICML, 2014.
[43] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, 2015.
[44] B. Schölkopf et al., "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2758–2765, 1997.
[45] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and variational inference in deep latent Gaussian models," arXiv preprint arXiv:1401.4082, 2014.
[46] G.-D. Wu and C.-T. Lin, "Word boundary detection with mel-scale frequency bank in noisy environment," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 5, pp. 541–554, 2000.
[47] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[48] L. Deng et al., "Binary coding of speech spectrograms using a deep auto-encoder," in Proc. Interspeech, 2010.
Advisor: Jia-Ching Wang (王家慶)    Date of Approval: 2017-08-18