dc.description.abstract | Fundamental frequency analysis of multiple sound mixtures is a important information in audio signal processing. To know the information of fundamental frequency can be extended for several applications like in music information retrieval, automatic music transcription, melody extraction, instrument identification. In speech research, like speech separation, speech recognition and prosody analysis. This paper aims at source transcription of polyphonic audio, can be consisting of two stages .Stage one is to detecting each pitches values provide by different sources in every time frame is known as multiple F0 estimation. Stage two is to clustering all the pitch which detected in stage one into a single pitch trajectory originating from the corresponding sources. The main focus on this paper is to do source clustering of the detected pitch in polyphonic audio signal which the pitch provide by different sources playing at the same time.
Although many works have been proposed to do source transcription, multi pitch streaming, multi F0 source clustering there are still various challenges in this task. In feature extraction, since the different sound sources playing simultaneously, the pitch contour numerous overlap in the mixture audio, is hard to generating the source characterizing feature corresponding to different F0 values, especially in music case. To solve this problem we adopt multi -channel approach to improve the source characterizing feature.
While source characterizing feature corresponding to each pitches has been extracted. The next step is to clustering all the detected pitches into corresponding sources. Since the supervised approaches require more information of isolated recordings to training models, our approach focus on the unsupervised way. We introduce a new Constrained PSO clustering which can deal with this task more precise.
This paper introduce a novel scheme for the source transcription of polyphonic sound mixture. Our approach need three inputs: multi-channel mixture sound, multi pith estimation (MPE) values, number of sources. We use the Ground truths multi pitch values and some other multi pitch estimation work provide by Duan et al. as MPE input. Then, use this MPE value to extract both timbre and direction feature, and concatenation two feature with the STD weight. After feature extraction, we use the Constrained PSO clustering to try all the possible of the clustering distribution and to find minimize timbre and direction inconsistency. Finally, we can map the clustering result back to each individual pitches labels and output the every single pitch trajectory from corresponding sources. | en_US |