Abstract: | Fundamental frequency (F0) analysis is an important topic in digital signal processing. It extends to many related research areas and matters in both music and speech. This thesis focuses on pitch streaming for mixtures of multiple monophonic sources. The proposed system takes three inputs: the number of sources, the F0 estimation results, and the mixture audio. It works in two stages. The first stage uses the F0 estimation results to extract a feature vector for each detected pitch. The second stage clusters all of these pitches and outputs a pitch stream for each source, that is, which pitch each source is playing at every time frame. For features, we propose a new multi-channel direction feature and fuse it with timbre features into a more robust representation. For clustering, we propose two different architectures based on particle swarm optimization (PSO) and incorporate domain knowledge into them to improve accuracy. The system is also designed specifically for recordings in which the sources have close pitch ranges and their pitch streams cross frequently, and it achieves better accuracy on such recordings. ;Fundamental frequency (F0) analysis of sound mixtures is an important problem in audio signal processing. F0 information supports many applications: in music, music information retrieval, automatic music transcription, melody extraction, and instrument identification; in speech research, speech separation, speech recognition, and prosody analysis. This paper addresses source transcription of polyphonic audio, which consists of two stages. Stage one, known as multiple F0 estimation, detects the pitch values produced by the different sources in every time frame. Stage two clusters the pitches detected in stage one into pitch trajectories, one per source. The main focus of this paper is source clustering of the pitches detected in polyphonic audio, where several sources play at the same time. Although many methods have been proposed for source transcription, multi-pitch streaming, and multi-F0 source clustering, the task still poses several challenges. In feature extraction, because the sources play simultaneously and their pitch contours overlap frequently in the mixture, it is hard to extract a source-characterizing feature for each F0 value, especially for music. To address this, we adopt a multi-channel approach to improve the source-characterizing features. Once the feature of each pitch has been extracted, the next step is to cluster all detected pitches into their corresponding sources.
Because supervised approaches require isolated recordings to train their models, our approach is unsupervised. We introduce a new Constrained PSO clustering method that handles this task more accurately. This paper introduces a novel scheme for source transcription of polyphonic sound mixtures. The approach needs three inputs: the multi-channel mixture, the multi-pitch estimation (MPE) values, and the number of sources. As MPE input, we use both ground-truth multi-pitch values and the output of a multi-pitch estimation method by Duan et al. These MPE values are then used to extract timbre and direction features, which are concatenated with standard-deviation (STD) weighting. After feature extraction, Constrained PSO clustering searches the space of possible cluster assignments to minimize timbre and direction inconsistency. Finally, the clustering result is mapped back to the label of each individual pitch, and the pitch trajectory of each source is output. |
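The pipeline above (STD-weighted feature concatenation followed by constrained PSO clustering) can be illustrated with a small sketch. This is not the thesis's implementation and does not reproduce either of its two PSO architectures. It assumes three things. First, "STD weight" is read as scaling each feature group by the inverse of its mean per-dimension standard deviation. Second, the constraint is a cannot-link rule: since every source is monophonic, pitches detected in the same frame must belong to different sources. Third, the optimizer is a textbook global-best PSO in which each particle encodes the k source centroids. The function names `std_weighted_concat`, `constrained_assign`, and `pso_cluster`, and the toy data, are hypothetical.

```python
import itertools
import random
import statistics


def std_weighted_concat(timbre, direction):
    """Concatenate per-pitch timbre and direction features.

    Assumption: "STD weight" is read as scaling each group by
    1 / (mean per-dimension std) so both groups are on a comparable scale.
    """
    def group_scale(rows):
        s = statistics.mean(statistics.pstdev(dim) for dim in zip(*rows))
        return 1.0 / s if s > 0 else 1.0

    wt, wd = group_scale(timbre), group_scale(direction)
    return [[v * wt for v in a] + [v * wd for v in b]
            for a, b in zip(timbre, direction)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_assign(feats, frames, centroids):
    """Assign pitches to source centroids under an assumed cannot-link rule:
    pitches in the same frame go to different sources. Assumes each frame
    holds at most len(centroids) pitches."""
    k = len(centroids)
    by_frame = {}
    for i, t in enumerate(frames):
        by_frame.setdefault(t, []).append(i)
    labels, total = [0] * len(feats), 0.0
    for idx in by_frame.values():
        def frame_cost(perm):
            return sum(sq_dist(feats[i], centroids[c]) for i, c in zip(idx, perm))
        # Brute-force the cheapest injective pitch-to-source mapping for this frame.
        best = min(itertools.permutations(range(k), len(idx)), key=frame_cost)
        for i, c in zip(idx, best):
            labels[i] = c
        total += frame_cost(best)
    return labels, total


def pso_cluster(feats, frames, k, n_particles=20, iters=60,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; each particle is a list of k centroids and its
    fitness is the constrained within-cluster squared distance."""
    rng = random.Random(seed)
    dim = len(feats[0])
    # Start each particle's centroids at k distinct data points.
    pos = [[list(feats[j]) for j in rng.sample(range(len(feats)), k)]
           for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[row[:] for row in p] for p in pos]
    pcost = [constrained_assign(feats, frames, p)[1] for p in pos]
    g = min(range(n_particles), key=pcost.__getitem__)
    gbest, gcost = [row[:] for row in pbest[g]], pcost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for c in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][c][d] = (w * vel[i][c][d]
                                    + c1 * r1 * (pbest[i][c][d] - pos[i][c][d])
                                    + c2 * r2 * (gbest[c][d] - pos[i][c][d]))
                    pos[i][c][d] += vel[i][c][d]
            cost = constrained_assign(feats, frames, pos[i])[1]
            if cost < pcost[i]:
                pbest[i], pcost[i] = [row[:] for row in pos[i]], cost
                if cost < gcost:
                    gbest, gcost = [row[:] for row in pos[i]], cost
    return constrained_assign(feats, frames, gbest)


# Hypothetical toy data: 6 frames, two sources, one pitch per source per frame.
timbre_a = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.0, 0.0], [0.1, 0.1], [0.05, 0.0]]
timbre_b = [[1.0, 0.9], [0.9, 1.0], [0.95, 0.95], [1.0, 1.0], [0.9, 0.9], [0.95, 1.0]]
dir_a = [[-1.0], [-0.9], [-1.1], [-1.0], [-0.95], [-1.05]]
dir_b = [[1.0], [0.9], [1.1], [1.0], [0.95], [1.05]]
timbre, direction, frames, truth = [], [], [], []
for t in range(6):
    pair = [(timbre_a[t], dir_a[t], "A"), (timbre_b[t], dir_b[t], "B")]
    for tim, dr, src in (pair if t % 2 == 0 else pair[::-1]):
        timbre.append(tim)
        direction.append(dr)
        frames.append(t)
        truth.append(src)
feats = std_weighted_concat(timbre, direction)
labels, cost = pso_cluster(feats, frames, k=2)
print(list(zip(truth, labels)))
```

The toy example alternates the order of the two pitches within each frame. The printout therefore shows whether the labels follow source identity rather than list position. Brute-forcing the permutations is only practical for small k. It stands in for whatever per-frame constraint handling the actual method uses.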