dc.description.abstract | In recent years, speech recognition research has gradually shifted toward end-to-end models, which simplify the overall modeling pipeline. The 2015 paper "Listen, Attend and Spell" was the first to apply the Seq-to-Seq architecture with an Attention mechanism to end-to-end speech recognition, laying the foundation for current end-to-end speech recognition models. Unfortunately, Attention is based on full-sequence modeling: it cannot recognize partial sequences and therefore cannot be used directly in streaming applications. To address this limitation, this thesis applies a Layer-level Time-Limited Attention Mask (L-TLAM), which improves the model's ability to model incomplete sequences and alleviates the excessive indirect attention caused by stacked network layers, achieving better streaming speech recognition.
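The abstract does not spell out L-TLAM's exact construction, but the underlying idea of a time-limited attention mask can be sketched generically: each position may attend only to a bounded window of past and future frames, and because masks compose across stacked layers, the effective future context grows with depth, which is why per-layer limits matter for streaming latency. The function name and context sizes below are illustrative assumptions, not the thesis's actual scheme.

```python
import numpy as np

def time_limited_mask(seq_len, left_context, right_context):
    """Boolean attention mask: position i may attend to positions j
    with i - left_context <= j <= i + right_context.

    This is a generic time-restricted mask for illustration; the
    layer-level limits of L-TLAM are not specified in the abstract.
    """
    idx = np.arange(seq_len)
    rel = idx[None, :] - idx[:, None]  # rel[i, j] = j - i
    return (rel >= -left_context) & (rel <= right_context)

# With a per-layer right context r and N stacked layers, the top layer
# indirectly sees roughly N * r future frames, so limiting the mask at
# every layer (rather than only at the input) bounds streaming latency.
mask = time_limited_mask(6, left_context=2, right_context=1)
```

Masks of this shape are typically added (as -inf on masked entries) to the attention score matrix before the softmax.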
Punctuation marks are an integral part of textual information, indicating pauses, tone, and the nature and function of words. However, the corpora generally used to train speech recognition systems do not include punctuation, so a recognizer cannot directly produce punctuated transcripts. In the second part of this thesis, in order to integrate punctuation prediction into speech recognition training, we train the main speech recognition task on the Transducer architecture and use Multi-task Learning to share the Transducer's language-model-like Predictor between two tasks: 1) context representation for the acoustic model, and 2) punctuation prediction. The first task provides the textual context information required by the ASR task; the second provides the textual semantic information needed to predict punctuation. Finally, the thesis also introduces a language modeling task to strengthen the Predictor's semantic comprehension, further improving the accuracy of both the speech recognition and punctuation prediction tasks. | en_US