dc.description.abstract | In recent years, smart speakers have surged in popularity. Amazon's smart speaker, Echo, has changed how customers use home appliances, and its voice assistant, Alexa, lets customers issue commands by voice. Smart speaker technology is divided into a front end and a back end. The front end refers to the device itself and includes noise reduction, speech enhancement, echo cancellation, and voice activity detection, among others; the back end refers to the server side and includes speech recognition and semantic understanding, among others. Companies have invested heavily in these technologies.
In this thesis, we build on previous research to implement robust wake word detection on an embedded system. The system combines two smart speaker techniques: wake word detection and noise reduction. For wake word detection, Mel-frequency cepstral coefficients (MFCCs) are extracted from the speech signal as features and fed to a convolutional neural network, whose outputs are the probabilities of each wake word class; these probabilities determine whether a wake word has been detected. For noise reduction, the short-time Fourier transform (STFT) is applied to the mixed signal to obtain its time-frequency representation, and the energy of that representation is used as input to train a recurrent neural network. The network outputs a noise mask and a speech mask, which are then applied in a generalized eigenvalue (GEV) beamformer to achieve noise reduction. | en_US |