dc.description.abstract | The goal of BSS is solving cocktail party problem. Imagine a room with a number of persons and microphones for recording. When people are speaking at the same time, each microphone registers a different mixture of individual speaker′s audio signals. And the task of BSS is to untangle these mixtures into their sources. There are various applications including
mobile telephony, multiuser communication systems, voice reinforce in noisy environment.
The mixtures recorded by microphones will be transformed to frequency domain with STFT (Short-Time Fourier Transform). Owing to the characteristics, sparseness and the disjointness ,of the source signal, we can obtain those features from the mixtures during feature extraction
step. The features are represented as complex number. Afterwards, by utilizing K-meansalgorithm, we divide those features into N group, where N is the number of sources. Prior to transform the separated signal back to time domain, we adopt mask design to label the target signal, for example, if the target signal is a speech signal, we will label it one, otherwise zero.
To solve the convolutive blind source separation (BSS) problem, this thesis presents a new method which utilizing a fast time frequency mask technique. We first define two features, which are normalized level-ratio and phase-difference. Next, we reduce the variance of feature in order to obtain lower iterations of K-Means clustering. Afterwards, with K-means algorithm, we cluster the features by assigning them to the nearest group. In the end, according to the clustered features, a time frequency mask is generated. The method is not only easy, but also faster without reducing the quality of the target signal. In real environment,we also evaluate the separated signal in terms of SDR (signal to distortion ratio) and SIR (signal to interference ratio). | en_US |