dc.description.abstract | The purpose of Neural Machine Translation (NMT) is to translate a source sentence into a target sentence with deep learning models while preserving the semantic meaning of the source sentence and producing correct syntax. Recently, the Transformer has become one of the most commonly used models. It captures the global information of a sentence through the Self-Attention Mechanism and performs well on many Natural Language Processing (NLP) tasks. However, some studies have indicated that the Self-Attention Mechanism learns repetitive information and cannot learn the local information of a text effectively. Therefore, we modify the Self-Attention Mechanism in the Transformer and propose Gated Attention and Clustered Attention, which add a gating mechanism and the K-means clustering algorithm, respectively. Moreover, Gated Attention includes a Top-k% method and a Threshold method. These approaches centralize the Attention Map so that the model improves its ability to capture local information and learns more diverse relationships within sentences. Hence, the Transformer can provide higher-quality translations.
In this work, we apply Clustered Attention, as well as the Top-k% and Threshold methods of Gated Attention, to Chinese-to-English translation tasks, achieving 24.69, 25.30, and 24.69 BLEU, respectively. The best result of the hybrid model that combines both attention mechanisms is 24.88 BLEU, which does not surpass using a single attention mechanism. In our experiments, the proposed models outperform the vanilla Transformer. Furthermore, we observe that using only one attention mechanism helps the Transformer learn textual information better and also achieves the goal of Attention Map centralization. | en_US |
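The abstract does not give implementation details for the proposed attention variants. Below is a minimal sketch, assuming the Top-k% gating keeps only the largest k% of scaled dot-product scores in each row of the attention map before the softmax; the function name, tensor shapes, and topk_ratio parameter are illustrative assumptions, not the thesis's actual code.

import torch
import torch.nn.functional as F

def top_k_percent_attention(q, k, v, topk_ratio=0.3):
    """Scaled dot-product attention keeping only the top-k% scores per query row.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    topk_ratio: assumed fraction of key positions each query may attend to.
    """
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5   # (B, H, L, L)

    # Number of key positions to keep for each query row (at least 1).
    seq_len = scores.size(-1)
    keep = max(1, int(seq_len * topk_ratio))

    # Find the k-th largest score in each row and mask everything below it,
    # so the softmax mass concentrates on a few positions ("centralized" map).
    kth_value = scores.topk(keep, dim=-1).values[..., -1:]       # (B, H, L, 1)
    scores = scores.masked_fill(scores < kth_value, float("-inf"))

    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v), attn

# Example call with random tensors (batch=2, heads=8, length=10, head_dim=64).
q = k = v = torch.randn(2, 8, 10, 64)
out, attn_map = top_k_percent_attention(q, k, v, topk_ratio=0.3)

Under the same assumptions, the Threshold method would instead mask scores that fall below a fixed cutoff value, and Clustered Attention would group attention with a K-means step rather than a per-row top-k rule; the exact placement of the gating (before or after the softmax) is a design choice the abstract does not specify.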