Multi-Channel Method for Multiple Pitch Streaming

Author

官志誼 (Chih-Yi Kuan)

Department

Department of Computer Science and Information Engineering

Thesis Title

多通道之多重音頻串流方法之研究
(Multi-Channel Method for Multiple Pitch Streaming)

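Related Theses

★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Application and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Narration System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Speeded-Up Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction of Financial Products
★ Integrating Deep Learning Methods to Predict Age and Aging Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction of Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Stroke Surgery Survival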

Full text: never made publicly available

Abstract (Chinese, translated)

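Fundamental frequency (F0) analysis is an important topic in digital signal processing and extends to many related research areas; it is important in both music and speech. This thesis addresses pitch streaming for mixtures of multiple monophonic sources. The proposed system takes three inputs: the number of sources, the F0 detection results, and the mixture audio. The system works in two stages. The first stage extracts a feature vector for each pitch in the F0 detection results; the second stage clusters all of these data and outputs the pitch stream of each source, in short, which pitches each source plays at each moment.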
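The final output described above, the pitches each source plays in each frame, can be sketched as a regrouping of labeled detections. The function name `build_pitch_streams` and the (frame, F0, label) input layout are hypothetical choices for illustration, not the thesis's data format; the sketch assumes the clustering stage has already assigned every detected pitch a source label.

```python
def build_pitch_streams(frames, pitches_hz, labels, num_sources):
    """Group clustered pitch detections into one stream per source (hypothetical helper).

    frames[i]     : frame index of detection i
    pitches_hz[i] : F0 of detection i, in Hz
    labels[i]     : source label in range(num_sources) from the clustering stage
    Returns a list with one dict per source, mapping frame index -> F0.
    """
    if not (len(frames) == len(pitches_hz) == len(labels)):
        raise ValueError("frames, pitches_hz and labels must have equal length")
    streams = [dict() for _ in range(num_sources)]
    for frame, f0, label in zip(frames, pitches_hz, labels):
        if not 0 <= label < num_sources:
            raise ValueError(f"label {label} is out of range")
        # Each source is monophonic, so it may hold at most one pitch per frame.
        if frame in streams[label]:
            raise ValueError(f"source {label} received two pitches in frame {frame}")
        streams[label][frame] = f0
    return streams


# Two sources, two frames: source 0 plays around 220 Hz, source 1 around 330 Hz.
streams = build_pitch_streams([0, 0, 1, 1], [220.0, 330.0, 222.0, 328.0], [0, 1, 0, 1], 2)
print(streams)  # [{0: 220.0, 1: 222.0}, {0: 330.0, 1: 328.0}]
```

Raising an error when one source receives two pitches in the same frame is a simplification: the thesis lists a separate collision-handling step (Section 5.3) whose details are not given in this record.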
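For features, this thesis proposes a new multi-channel direction feature and fuses it with timbre features into a more robust feature. For clustering, we propose two different architectures based on particle swarm optimization (PSO) and incorporate domain knowledge into them to improve accuracy. The system is also specifically designed for recordings in which the sources have close pitch ranges and the pitch streams cross frequently, and it achieves better accuracy on such recordings.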
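Domain knowledge of this kind is often expressed as pairwise constraints, and the table of contents lists correlation and mutual-exclusion tables (Section 4.2.3). The sketch below is one plausible way to build such tables and is an assumption, not the thesis's definition: two pitches in the same frame get a cannot-link (each source is monophonic), and a pitch gets a must-link to a pitch in the next frame only when that pairing, within a hypothetical `max_jump` threshold in semitones, is unambiguous in both directions.

```python
import math
from itertools import combinations


def semitone_distance(f1_hz, f2_hz):
    # Pitch distance in semitones; independent of any reference frequency.
    return abs(12.0 * math.log2(f1_hz / f2_hz))


def build_constraint_tables(frames, pitches_hz, max_jump=1.0):
    """Return (must_link, cannot_link) as sets of index pairs (i, j) with i < j.

    frames[i] is the frame index and pitches_hz[i] the F0 of detection i.
    max_jump is a hypothetical continuity threshold in semitones.
    """
    by_frame = {}
    for i, fr in enumerate(frames):
        by_frame.setdefault(fr, []).append(i)

    def close(a, b):
        return semitone_distance(pitches_hz[a], pitches_hz[b]) <= max_jump

    must_link, cannot_link = set(), set()
    for fr, idx in by_frame.items():
        # Monophonic sources: simultaneous pitches must come from different sources.
        for i, j in combinations(idx, 2):
            cannot_link.add((min(i, j), max(i, j)))
        # Continuity: link i to j in the next frame only if each is the other's
        # sole close neighbour, so crossing streams do not produce conflicting links.
        nxt = by_frame.get(fr + 1, [])
        for i in idx:
            fwd = [j for j in nxt if close(i, j)]
            if len(fwd) == 1 and [k for k in idx if close(k, fwd[0])] == [i]:
                j = fwd[0]
                must_link.add((min(i, j), max(i, j)))
    return must_link, cannot_link
```

For two well-separated streams (220 Hz and 330 Hz in frame 0, 222 Hz and 328 Hz in frame 1) this yields must-links (0, 2) and (1, 3) and cannot-links (0, 1) and (2, 3); when two simultaneous pitches are both within `max_jump` of the same next-frame pitch, no must-link is emitted.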

Abstract (English)

Fundamental frequency (F0) analysis of sound mixtures provides important information in audio signal processing. F0 information can be used in many applications: in music, for music information retrieval, automatic music transcription, melody extraction, and instrument identification; in speech research, for speech separation, speech recognition, and prosody analysis. This thesis addresses source transcription of polyphonic audio, which consists of two stages. Stage one, known as multiple F0 estimation, detects the pitch values produced by the different sources in every time frame. Stage two clusters all the pitches detected in stage one into pitch trajectories, each originating from a single source. The main focus of this thesis is source clustering of the pitches detected in polyphonic audio in which different sources play at the same time.
Although many works have addressed source transcription, multi-pitch streaming, and multi-F0 source clustering, various challenges remain. In feature extraction, because different sources play simultaneously, their pitch contours frequently overlap in the mixture, which makes it hard to extract source-characterizing features for each F0 value, especially for music. To address this problem, we adopt a multi-channel approach to improve the source-characterizing features.
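The thesis's direction feature is not specified in this abstract. As a hedged illustration of a multi-channel cue computed per detected pitch, the sketch below follows the DUET idea of inter-channel level and phase differences (cf. [43]): it reads the spectra of two channels at the harmonics of one F0 and returns the median amplitude ratio and the delay implied by the phase difference. The function name, the use of a single frame, the periodic Hann window, and `num_harmonics` are all assumptions.

```python
import numpy as np


def harmonic_direction_feature(frame_ch1, frame_ch2, f0_hz, fs, num_harmonics=5):
    """Hypothetical DUET-style spatial cue for one detected pitch in one frame.

    Returns (level_ratio, delay_samples): medians over the harmonics of f0_hz of
    |X2/X1| and of the inter-channel delay implied by the phase of X2 * conj(X1).
    """
    x1 = np.asarray(frame_ch1, dtype=float)
    x2 = np.asarray(frame_ch2, dtype=float)
    n = len(x1)
    window = np.hanning(n + 1)[:-1]  # periodic Hann window of length n
    X1 = np.fft.rfft(window * x1)
    X2 = np.fft.rfft(window * x2)
    levels, delays = [], []
    for k in range(1, num_harmonics + 1):
        b = int(round(k * f0_hz * n / fs))  # DFT bin nearest to harmonic k
        if b > n // 2:
            break  # harmonic lies above the Nyquist frequency
        if abs(X1[b]) < 1e-12:
            continue  # no energy in the reference channel at this bin
        cross = X2[b] * np.conj(X1[b])
        levels.append(abs(X2[b]) / abs(X1[b]))
        omega = 2.0 * np.pi * b / n  # bin centre frequency in rad/sample
        # x2[t] = a * x1[t - d]  =>  angle(cross) = -omega * d
        delays.append(-np.angle(cross) / omega)
    if not levels:
        raise ValueError("no usable harmonics for this F0")
    return float(np.median(levels)), float(np.median(delays))


# Channel 2 is channel 1 attenuated by half and delayed by 3 samples.
fs, n, f0 = 8000, 1024, 125.0
t = np.arange(n)
x1 = sum(np.sin(2 * np.pi * k * f0 * t / fs + k) / k for k in range(1, 6))
x2 = 0.5 * sum(np.sin(2 * np.pi * k * f0 * (t - 3) / fs + k) / k for k in range(1, 6))
print(harmonic_direction_feature(x1, x2, f0, fs))  # approximately (0.5, 3.0)
```

The delay estimate is only unambiguous while the phase shift at the highest harmonic used stays within ±π; in reverberant recordings the per-harmonic values scatter, which is one reason to take a median.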
Once the source-characterizing feature of each pitch has been extracted, the next step is to cluster all detected pitches into their corresponding sources. Because supervised approaches require isolated recordings of each source to train models, our approach is unsupervised. We introduce a new constrained PSO clustering method that performs this task more accurately.
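The abstract does not give the update rules of the proposed constrained PSO clustering, so the sketch below is a generic substitute rather than the thesis's algorithm: a standard inertia-weight PSO in which each particle encodes k centroids, points are assigned to their nearest centroid, and the fitness adds a penalty for every violated cannot-link pair (for example, two pitches detected in the same frame). All parameter names and default values (`w`, `c1`, `c2`, `penalty`, `n_particles`, `n_iter`) are illustrative assumptions.

```python
import numpy as np


def constrained_pso_cluster(X, k, cannot_link, n_particles=20, n_iter=100,
                            w=0.7, c1=1.5, c2=1.5, penalty=10.0, seed=0):
    """Generic PSO clustering with a cannot-link penalty (illustrative sketch).

    Fitness = sum of squared distances to the nearest centroid
              + penalty * number of cannot-link pairs placed in the same cluster.
    Returns (labels, centroids) of the best particle found.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    def assign(cents):
        dist = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
        return dist.argmin(axis=1), dist.min(axis=1)

    def fitness(cents):
        labels, dmin = assign(cents)
        violations = sum(labels[i] == labels[j] for i, j in cannot_link)
        return dmin.sum() + penalty * violations

    # Initialise each particle's centroids at k distinct random data points.
    pos = np.stack([X[rng.choice(n, size=k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(pbest_fit.argmin())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard inertia-weight velocity update, then move every particle.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        for p in range(n_particles):
            f = fitness(pos[p])
            if f < pbest_fit[p]:  # update personal best, and global best if needed
                pbest[p], pbest_fit[p] = pos[p].copy(), f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[p].copy(), f
    return assign(gbest)[0], gbest
```

Because the constraints enter only through the penalty term, the nearest-centroid assignment can still violate a cannot-link pair when the feature geometry strongly favours it; a stricter formulation would search over constrained label assignments directly.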
This thesis introduces a novel scheme for source transcription of polyphonic sound mixtures. Our approach needs three inputs: the multi-channel mixture audio, multi-pitch estimation (MPE) values, and the number of sources. As MPE input we use both ground-truth multi-pitch values and the output of the multi-pitch estimation method of Duan et al. We then use these MPE values to extract both timbre and direction features and concatenate the two features with STD weighting. After feature extraction, we use constrained PSO clustering to search over possible cluster assignments and find the one that minimizes timbre and direction inconsistency. Finally, we map the clustering result back to the individual pitch labels and output the pitch trajectory of each source.
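The abstract only says that the timbre and direction features are concatenated with STD weighting. One common reading, assumed here, is to standardize each feature dimension by its standard deviation before concatenating, so that a timbre vector (for example, hypothetical 13-dimensional MFCCs) and a low-dimensional direction vector contribute on comparable scales. The function name and the `direction_weight` parameter are hypothetical.

```python
import numpy as np


def fuse_features(timbre, direction, direction_weight=1.0, eps=1e-12):
    """Concatenate per-pitch timbre and direction features after z-scoring each column.

    timbre    : array of shape (num_pitches, d_t)
    direction : array of shape (num_pitches, d_d), or (num_pitches,)
    direction_weight scales the standardized direction block (hypothetical knob).
    """
    def standardize(F):
        F = np.asarray(F, dtype=float)
        if F.ndim == 1:
            F = F[:, None]
        # Divide by the per-dimension standard deviation so no block dominates by scale.
        return (F - F.mean(axis=0)) / (F.std(axis=0) + eps)

    return np.hstack([standardize(timbre), direction_weight * standardize(direction)])


rng = np.random.default_rng(0)
timbre = rng.normal(100.0, 50.0, size=(200, 13))   # large-scale timbre features
direction = rng.normal(0.0, 0.01, size=(200, 2))   # tiny-scale spatial features
fused = fuse_features(timbre, direction)
print(fused.shape)  # (200, 15)
```

Without the standardization, Euclidean distances in the fused space would be driven almost entirely by the timbre block in this example; with it, each column has unit variance and `direction_weight` controls the balance explicitly.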

Keywords (Chinese, translated)

★ Fundamental frequency analysis
★ Multiple pitch streaming
★ Particle swarm optimization

Keywords (English)

★ pitch detection
★ Multi pitch streaming
★ PSO

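Table of Contents

Chinese Abstract
English Abstract
List of Figures
List of Tables
List of Symbols
Contents
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Research Methods and Chapter Overview
Chapter 2 Literature Review
Chapter 3 Feature Extraction
3.1 Timbre Features
3.2 Direction Features
3.3 Feature Fusion
Chapter 4 Clustering Methods
4.1 K-Means Clustering Algorithm
4.2 Constrained Particle Swarm Optimization Clustering Algorithm
4.2.1 Particle Swarm Optimization Algorithm
4.2.2 Applying Particle Swarm Optimization to Clustering
4.2.3 Correlation and Mutual-Exclusion Tables
4.2.4 A Search Method for Data Satisfying More Correlations
4.2.5 Constrained Particle Swarm Clustering Algorithm
Chapter 5 Overall System Architecture
5.1 System Architecture
5.2 Group Voting
5.3 Collision Handling
Chapter 6 Experiments
6.1 Evaluation Metrics and Ground-Truth Preparation
6.2 Datasets and Mixture Setup
6.3 Feature Evaluation
6.4 Clustering Method Evaluation
6.5 Overall System Evaluation
Chapter 7 Conclusions and Future Work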

References

[1] R. Hennequin, B. David, and R. Badeau, “Score informed audio source separation using a parametric model of non-negative spectrogram,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2011, pp. 45–48.
[2] Z. Duan and B. Pardo, “Soundprism: An online system for score-informed source separation of music audio,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp. 1205–1215, Dec. 2011.
[3] J. Ganseman, P. Scheunders, G. J. Mysore, and J. S. Abel, “Evaluation of a score-informed source separation system,” in Proc. Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2010, pp. 219–224.
[4] J. Woodruff, B. Pardo, and R. B. Dannenberg, “Remixing stereo music with score-informed source separation,” in Proc. Int. Conf. Music Inf. Retrieval (ISMIR), 2006, pp. 314–349.
[5] A. Klapuri and M. Davy, Eds., Signal Processing Methods for Music Transcription. New York, NY, USA: Springer, 2006.
[6] D. Campbell, K. Palomäki, and G. Brown, “A MATLAB simulation of shoebox room acoustics for use in research and teaching,” Comput. Inf. Syst. J., vol. 9, no. 3, pp. 48–51, Oct. 2005.
[7] N. Kroher and E. Gómez, “Automatic transcription of flamenco singing from polyphonic music recordings,” IEEE/ACM Trans. Audio, Speech, Lang. Process., pp. 901–913, 2016, doi: 10.1109/TASLP.2016.2531284.
[8] A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Amer., vol. 111, pp. 1917–1930, 2002.
[9] M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. Bello, and S. Dixon, “Computer-aided melody note transcription using the Tony software: Accuracy and efficiency,” in Proc. 1st Int. Conf. Technologies for Music Notation and Representation, 2015.
[10] Z. Duan, J. Han, and B. Pardo, “Multi-pitch streaming of harmonic sound mixtures,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, Jan.
[11] P. Leveau, E. Vincent, G. Richard, and L. Daudet, “Instrument-specific harmonic atoms for mid-level music representation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 1, pp. 116–128, Jan. 2008.
[12] V. Arora and L. Behera, “On-line melody extraction from polyphonic audio using harmonic cluster tracking,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 3, pp. 520–530, Mar. 2013.
[13] J. Salamon, E. Gómez, D. P. W. Ellis, and G. Richard, “Melody extraction from polyphonic music signals: Approaches, applications, and challenges,” IEEE Signal Process. Mag., pp. 118–134, 2014.
[14] P. Mowlaee, R. Saeidi, M. G. Christensen, Z.-H. Tan, T. Kinnunen, P. Franti, and S. H. Jensen, “A joint approach for single-channel speaker identification and speech separation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 9, pp. 2586–2601, Nov. 2012.
[15] M. Cooke, J. R. Hershey, and S. Rennie, “Monaural speech separation and recognition challenge,” Comput. Speech Lang., vol. 24, pp. 1–15, 2010.
[16] Y.-K. Lee, I. S. Lee, and O.-W. Kwon, “Single channel speech separation using phase-based methods,” IEEE Trans. Consum. Electron., pp. 2453–2459, 2010.
[17] D.-N. Jiang, W. Zhang, L.-Q. Shen, and L.-H. Cai, “Prosody analysis and modeling for emotional speech synthesis,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2005, pp. 281–284.
[18] S. Sigtia, E. Benetos, and S. Dixon, “An end-to-end neural network for polyphonic piano music transcription,” IEEE/ACM Trans. Audio, Speech, Lang. Process., pp. 927–939, 2016, doi: 10.1109/TASLP.2016.2533858.
[19] V. Arora and L. Behera, “Musical source clustering and identification in polyphonic audio,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1003–1012, Jun. 2014.
[20] T. Heittola, A. Klapuri, and T. Virtanen, “Musical instrument recognition in polyphonic audio using source-filter model for sound separation,” in Proc. Int. Symp. Music Inf. Retrieval (ISMIR), 2009.
[21] E. Benetos, M. Kotti, and C. Kotropoulos, “Musical instrument classification using non-negative matrix factorization algorithms and subset feature selection,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2006, pp. 221–224.
[22] R. Jaiswal, D. FitzGerald, D. Barry, E. Coyle, and S. Rickard, “Clustering NMF basis functions using shifted NMF for monaural sound source separation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2011, pp. 245–248.
[23] F. Rigaud, A. Falaize, B. David, and L. Daudet, “Does inharmonicity improve an NMF-based piano transcription model?” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 11–15.
[24] P. Smaragdis, B. Raj, and M. Shashanka, “A probabilistic latent variable model for acoustic modeling,” Adv. Models for Acoust. Process., NIPS, vol. 148, 2006.
[25] G. Grindlay and D. P. W. Ellis, “Transcribing multi-instrument polyphonic music with hierarchical eigeninstruments,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp. 1159–1169, Oct. 2011.
[26] V. Arora and L. Behera, “Semi-supervised polyphonic source identification using PLCA based graph clustering,” in Proc. Int. Symp. Music Inf. Retrieval (ISMIR), 2013.
[27] V. Arora and L. Behera, “Musical source clustering and identification in polyphonic audio,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1003–1012, Jun. 2014.
[28] L. G. Martins, J. J. Burred, G. Tzanetakis, and M. Lagrange, “Polyphonic instrument recognition using spectral clustering,” in Proc. Int. Symp. Music Inf. Retrieval (ISMIR), 2007.
[29] M. Wohlmayr, M. Stark, and F. Pernkopf, “A probabilistic interaction model for multipitch tracking with factorial hidden Markov models,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 799–810, May 2011.
[30] F. Bach and M. Jordan, “Discriminative training of hidden Markov models for multiple pitch tracking,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2005, pp. 489–492.
[31] M. Bay, A. F. Ehmann, J. W. Beauchamp, P. Smaragdis, and J. S. Downie, “Second fiddle is important too: Pitch tracking individual voices in polyphonic music,” in Proc. Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2012, pp. 319–324.
[32] S. Araki, H. Sawada, R. Mukai, and S. Makino, “Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors,” Signal Process., vol. 87, pp. 1833–1847, 2007.
[33] G. Guo and S. Z. Li, “Content-based audio classification and retrieval by support vector machines,” IEEE Trans. Neural Netw., vol. 14, no. 1, p. 209, Jan. 2003.
[34] S. Li, X.-J. Wang, and Y. Zhang, “X-SPA: Spatial characteristic PSO clustering algorithm with efficient estimation of the number of cluster,” in Proc. 5th Int. Conf. Fuzzy Systems and Knowledge Discovery (FSKD), 2008.
[35] R. F. Abdel-Kader, “Genetically improved PSO algorithm for efficient data clustering,” in Proc. 2nd Int. Conf. Machine Learning and Computing (ICMLC), 2010.
[36] Z. Duan, B. Pardo, and C. Zhang, “Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2121–2133, Nov. 2010.
[37] D. Campbell, K. Palomäki, and G. Brown, “A MATLAB simulation of shoebox room acoustics for use in research and teaching,” Comput. Inf. Syst. J., vol. 9, no. 3, pp. 48–51, Oct. 2005.
[38] R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello, “MedleyDB: A multitrack dataset for annotation-intensive MIR research,” in Proc. Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2014.
[39] MIREX Multi-F0 Development Dataset. [Online]. Available: http://www.music-ir.org/mirex/wiki/MIREX_HOME
[40] F. Cummins, M. Grimaldi, T. Leonard, and J. Simko, “The CHAINS corpus: CHAracterizing INdividual Speakers,” School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland.
[41] X. Wang, Z. Huang, and Y. Zhou, “Underdetermined DOA estimation and blind separation of non-disjoint sources in time-frequency domain based on sparse representation method,” J. Syst. Eng. Electron., vol. 25, no. 1, pp. 17–25, 2014.
[42] K. Ichige, Y. Ishikawa, and H. Arai, “High resolution 2-D DOA estimation using second-order partial-differential of MUSIC spectrum,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2008, pp. 1152–1155.
[43] A. Jourjine, S. Rickard, and Ö. Yılmaz, “Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), vol. 12, 2000, pp. 2985–2988.
[44] S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada, “Underdetermined blind separation for speech in real environments with sparseness and ICA,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), vol. III, 2004, pp. 881–88
[45] J.-H. Choi and J.-H. Chang, “Dual-microphone voice activity detection technique based on two-step power level difference ratio,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1069–1081, 2014.

Advisor

王家慶(Jia-Ching Wang)

Approval Date

2016-8-25
