Thesis 101522104: Detailed Record




Name: 溫偉森 (David Gunawan)   Department: Computer Science and Information Engineering
Thesis title: Automatic Recognition of Audio Signal Using Dynamic Local Binary Patterns Based on Fourier Transform
(Original title: 應用以傅立葉轉換為基礎之動態性局部二值模式於自動化聲音訊號辨識)
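Related theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Application and Design of Speech Synthesis and Speaker Conversion
★ Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Speeded-Up Robust Features
★ Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Financial Product Trend Prediction
★ A Study on Integrating Deep Learning Methods to Predict Age and Aging Genes
★ A Study of End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Predicting Alzheimer's Disease Progression and Stroke Surgery Survival Using Deep Learning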
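  1. This electronic thesis is authorized for immediate open access.
  2. The open-access electronic full text is licensed only for individual, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.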

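Abstract (Chinese): Sound recognition has long been an important topic because its development makes daily life more convenient, and in recent years the technology has been widely applied in mobile devices such as smartphones and tablets. Developing an audio recognition system that performs well is therefore very important. Sounds can be subdivided into many types; in this thesis, we study environmental sound events.
The proposed recognition system is based on the Fourier transform. It combines the proposed dynamic Local Binary Pattern (LBP) Uniform method with smoothing filters, and uses the Variance Measure (VAR) as preprocessing to enhance the edge texture and contrast of the time-frequency image.
In the proposed system, a box filter and a Gaussian filter are used to smooth the time-frequency image obtained from the Fourier transform. In addition, we take into account the differences in energy distribution across the time-frequency image and propose the dynamic Local Binary Pattern (LBP) Uniform method. The thesis divides the spectrogram into regions covering different frequency bands and dynamically adjusts the resolution for each band by reducing the dimensionality of its LBP histogram. The resulting feature parameters are classified with a Support Vector Machine (SVM) to recognize environmental sounds.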
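The band-wise histogram reduction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the LBP histogram is reduced, so `reduce_histogram` simply sums contiguous groups of bins as a placeholder rule. The target sizes 40-20-10 appear in the table of contents, but which band receives which size is assumed here (low band first). The function names and the `labels` input (a 2-D array of per-pixel uniform-LBP bin indices, computed elsewhere) are hypothetical.

```python
import numpy as np

def reduce_histogram(hist, target_dim):
    """Shrink a histogram by summing contiguous groups of bins.

    Placeholder rule (assumption): the thesis abstract does not specify
    how the dimensionality reduction is performed.
    """
    groups = np.array_split(np.asarray(hist, dtype=float), target_dim)
    return np.array([g.sum() for g in groups])

def dynamic_lbp_features(labels, n_bins=59, band_dims=(40, 20, 10)):
    """Concatenate reduced LBP histograms computed per frequency band.

    labels    : 2-D int array of LBP bin indices, rows ordered from low
                to high frequency (assumed layout).
    band_dims : target histogram size per band, low band first (assumed).
    """
    bands = np.array_split(labels, len(band_dims), axis=0)
    parts = []
    for band, dim in zip(bands, band_dims):
        hist = np.bincount(band.ravel(), minlength=n_bins).astype(float)
        hist /= hist.sum()                      # normalise each band separately
        parts.append(reduce_histogram(hist, dim))
    return np.concatenate(parts)

# Random labels stand in for real uniform-LBP output (59 possible bins).
rng = np.random.default_rng(0)
labels = rng.integers(0, 59, size=(129, 61))
features = dynamic_lbp_features(labels)
print(features.shape)  # (70,)
```

The concatenated vector (40 + 20 + 10 = 70 values here) is the kind of feature that would then be passed to a classifier such as an SVM.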
Abstract (English): Sound recognition has become an important application on many devices. The sounds to be recognized vary in type, e.g., musical instrument sounds, environmental sounds, and speech. In this study, we use environmental sounds in our experiments.
A time-frequency representation (spectrogram) of an audio signal is a form of texture image, so recognizing it can be treated as an image classification problem. In this thesis, we introduce a simple image classification method based on the local binary pattern (LBP), with image smoothing applied before feature extraction to reduce noise in the spectrogram image.
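As a rough illustration of treating a spectrogram as a texture image, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform and summarizes it with a uniform LBP histogram (8 neighbours, radius 1, 59 bins). The frame length, hop size, window, test signal, and all function names are assumptions made for the example. The diagonal neighbours are taken from the 3x3 grid rather than interpolated on a circle, which is a common simplification.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)).T)

def lbp_8_1(image):
    """Basic LBP code (8 neighbours on the 3x3 grid) for every interior pixel."""
    h, w = image.shape
    center = image[1:-1, 1:-1]
    # Neighbour offsets in circular order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def uniform_lookup(bits=8):
    """Map each code to a bin: one per uniform pattern, one shared bin for the rest."""
    table = np.zeros(2 ** bits, dtype=np.int64)
    next_bin, non_uniform = 0, []
    for code in range(2 ** bits):
        pattern = [(code >> i) & 1 for i in range(bits)]
        transitions = sum(pattern[i] != pattern[(i + 1) % bits]
                          for i in range(bits))
        if transitions <= 2:            # at most two 0/1 changes around the circle
            table[code] = next_bin
            next_bin += 1
        else:
            non_uniform.append(code)
    table[non_uniform] = next_bin
    return table, next_bin + 1

def lbp_uniform_histogram(image):
    """Normalised histogram of uniform-LBP labels over the whole image."""
    table, n_bins = uniform_lookup(8)
    labels = table[lbp_8_1(image)]
    hist = np.bincount(labels.ravel(), minlength=n_bins).astype(float)
    return hist / hist.sum()

# Synthetic one-second tone plus noise stands in for an environmental sound.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
features = lbp_uniform_histogram(spectrogram(signal))
print(features.shape)  # (59,)
```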
We combine spectrograms and uniform LBP with an image filter and a variance measure (VAR) for contrast enhancement. We also introduce a dynamic LBP method that reduces the feature dimension by a different amount for each sub-band (high, middle, and low frequency). After applying the image filter as pretreatment and VAR for contrast enhancement, we concatenate all these features.
To remove image noise, we use two types of smoothing filter: a box filter (mean filter) and a Gaussian filter. Filtering is applied as a pretreatment before feature extraction to improve recognition. To enhance local image texture contrast, such as object edges and corners, we use the VAR function. A support vector machine serves as the classifier.
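The two smoothing filters and the VAR measure can be sketched as below. This is a minimal example under assumptions: the kernel size (3x3), the Gaussian sigma, the edge-replication padding, and the function names are illustrative choices, not values taken from the thesis. VAR is computed as the variance of the 8 neighbours on the 3x3 grid, a simplification of the circular neighbourhood usually paired with LBP.

```python
import numpy as np

def filter2d_same(image, kernel):
    """Slide an odd-sized kernel over the image; edges are replicated
    so the output keeps the input shape."""
    kh, kw = kernel.shape
    h, w = image.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def box_kernel(size=3):
    """Box (mean) filter: every pixel in the window has equal weight."""
    return np.full((size, size), 1.0 / (size * size))

def gaussian_kernel(size=3, sigma=1.0):
    """Gaussian filter: weights fall off with distance from the centre."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def local_variance(image):
    """VAR: variance of the 8 neighbour values around each interior pixel."""
    h, w = image.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    neighbours = np.stack([image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                           for dy, dx in offsets])
    return neighbours.var(axis=0)

# A random image stands in for a noisy spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((64, 48))
box = filter2d_same(spec, box_kernel(3))
gauss = filter2d_same(spec, gaussian_kernel(3, sigma=1.0))
var_map = local_variance(gauss)
print(box.shape, gauss.shape, var_map.shape)  # (64, 48) (64, 48) (62, 46)
```

The box filter weights all pixels in the window equally, while the Gaussian filter gives the centre pixel the largest weight, so it tends to blur edges less for the same window size.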
Keywords (Chinese) ★ Fourier Transform
★ Audio Signal Recognition
★ Local Binary Patterns
Keywords (English) ★ Local Binary Patterns
★ Automatic Recognition
★ Audio Signal
Table of contents
Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
I. INTRODUCTION
II. IMAGE FILTER
2-1. Box or Mean Filter
2-2. Gaussian Filter
2-3. Comparison of Box Filter and Gaussian Filter
III. LOCAL BINARY PATTERN
3-1. Basic LBP
3-2. Common Use of LBP
3-3. LBP Uniform
3-4. Local Image Texture Contrast
IV. PROPOSED SYSTEM
4-1. System Overview
4-2. Feature Extraction
4-2-1. Fourier Transform
4-2-2. LBP Basic Method
4-2-3. LBP Uniform Method
4-2-4. Proposed Dynamic LBP Method
V. EXPERIMENT AND RESULTS
6-1. Original or Individual Method Result
6-2. Add Pre-Treatment or Image Filter Result
6-3. VAR after Image Filter Result
6-4. Proposed Dynamic LBP Method
6-4-1. $LBP_{8,1}^{u2}$ with 40-20-10
6-4-2. $LBP_{8,1}^{u2}$ with 50-30-20
6-4-3. $LBP_{16,2}^{u2}$ with 100-70-40
6-5. Feature Combination
6-6. Summary Result
VI. CONCLUSION
REFERENCES
Advisor: 王家慶 (Jia-Ching Wang)   Approval date: 2014-8-27