dc.description.abstract | Paraphrase generation is an important task in natural language processing (NLP). The goal is to produce a sentence that retains the original semantic meaning but has a different syntactic structure. Approaches to this task can be classified as supervised, semi-supervised, and unsupervised learning. Supervised methods have produced several promising results and achieved good performance on various metrics. Semi-supervised and unsupervised methods, however, are still at an early research stage, and relatively little work discusses them. For this reason, this research explores unsupervised paraphrase generation.
In addition, for both supervised and unsupervised paraphrase generation, some researchers are exploring methods of controlling the generation. The main purpose is to preserve the important vocabulary in the sentence so that the meaning does not change. For example, in the sentence "Trump has a dog", "Trump" and "dog" are words that cannot be replaced: if "Trump" were converted into "Hillary Clinton", the meaning of the entire sentence would change. There are several ways to control generation: some use syntactic structure to achieve controllable generation, while others modify the model architecture. In our research, we modified the structure of the Transformer model by introducing the concept of Named Entities (NEs). The reason is that words tagged as NEs are usually irreplaceable in a sentence, so in this study we assume that words with NE tags are irreplaceable. In the training phase, we therefore expect the model to learn the words with NE tags. To this end, we combine the embedding of the NE tags with the positional encoding and the input token embedding for model training.
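The input-layer combination described above can be sketched as follows. This is a minimal toy illustration, not the thesis's actual implementation: the lookup tables, dimensions, and tag set are invented for the example, and in the real model the token and NE-tag embeddings would be learned matrices.

```python
import math

# Toy sketch: the model input is the element-wise sum of the token
# embedding, the NE-tag embedding, and the sinusoidal positional encoding.
D_MODEL = 4  # assumed toy embedding size

def positional_encoding(pos, d_model=D_MODEL):
    """Standard sinusoidal positional encoding for one position."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Toy lookup tables (learned parameters in the real model).
TOKEN_EMB = {"Trump": [0.1] * D_MODEL, "has": [0.2] * D_MODEL,
             "a": [0.3] * D_MODEL, "dog": [0.4] * D_MODEL}
NE_EMB = {"PERSON": [1.0] * D_MODEL, "O": [0.0] * D_MODEL}  # "O" = no entity

def input_representation(tokens, ne_tags):
    """Sum token embedding + NE-tag embedding + positional encoding per position."""
    out = []
    for pos, (tok, tag) in enumerate(zip(tokens, ne_tags)):
        pe = positional_encoding(pos)
        out.append([t + n + p for t, n, p in
                    zip(TOKEN_EMB[tok], NE_EMB[tag], pe)])
    return out

reps = input_representation(["Trump", "has", "a", "dog"],
                            ["PERSON", "O", "O", "O"])
```

The NE-tag embedding is simply added to the existing Transformer input, so the architecture is otherwise unchanged and the model can learn to keep NE-tagged tokens intact.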
Based on the experimental results, we propose a method to judge whether entities are effectively retained: we calculate the recall of NEs. Our recall score is better than that of the baseline model. We also compare against the baseline on its main evaluation metric, iBLEU. iBLEU is an extension of BLEU with a penalty mechanism: it measures how well the generated sentence retains the meaning of the target sentence while penalizing copying of the input. In our results, the iBLEU scores are better than the benchmark, which further shows that our method of using NE constraints is promising. | en_US |
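The two measures above can be sketched in a few lines. This is a hedged illustration, not the thesis's evaluation code: the alpha value is a common choice from the iBLEU literature rather than a value stated in the abstract, and `bleu` here is a toy unigram-precision stand-in for a real BLEU implementation.

```python
def ne_recall(source_entities, generated_tokens):
    """Fraction of the source sentence's named entities preserved in the output."""
    if not source_entities:
        return 1.0
    kept = sum(1 for ent in source_entities if ent in generated_tokens)
    return kept / len(source_entities)

def bleu(candidate, reference):
    # Stand-in score: unigram precision (a real system would use full BLEU).
    if not candidate:
        return 0.0
    ref = set(reference)
    return sum(1 for tok in candidate if tok in ref) / len(candidate)

def ibleu(candidate, reference, source, alpha=0.9):
    """iBLEU = alpha * BLEU(cand, ref) - (1 - alpha) * BLEU(cand, src):
    rewards similarity to the target while penalizing copying the input."""
    return alpha * bleu(candidate, reference) - (1 - alpha) * bleu(candidate, source)

src = "Trump has a dog".split()
cand = "Trump owns a dog".split()
ref = "Trump owns a dog".split()
print(ne_recall(["Trump"], cand))          # entity "Trump" is kept -> 1.0
print(round(ibleu(cand, ref, src), 3))
```

The penalty term is what distinguishes iBLEU from plain BLEU: a system that simply copies the input would score high on BLEU against a close reference but is pushed down by the second term.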