dc.description.abstract | In recent years, using language models to improve the output of end-to-end speech recognition models has become the mainstream approach in monolingual speech recognition. In code-switching speech recognition, however, training data is extremely scarce, and traditional model architectures have difficulty learning multilingual semantic information. To address these two problems, this paper introduces a masked language model into the code-switching speech recognition system, aiming to produce more accurate results through general linguistic knowledge and bidirectional contextual information. The masked language model is first pre-trained in a self-supervised manner on unlabeled data to acquire general linguistic knowledge, and is then transferred and adapted to the code-switching speech recognition domain. In addition, because training the masked language model exploits complete bidirectional contextual information, it also greatly enhances the model's semantic understanding and overall performance. We therefore take advantage of the masked language model to build a code-switching language model and re-score the output sequences of the end-to-end speech recognition model, improving the performance of the overall system. Specifically, we propose to either replace the traditional causal language model or add the masked language model on top of a standard speech recognition system, and we conduct experiments on the code-switching corpus SEAME. Compared with the standard architecture, the two systems achieve relative mixed error rate reductions of up to 7% and 8.4%, respectively, demonstrating that our proposed method addresses the aforementioned problems and enhances the performance of the code-switching speech recognition system. | en_US |