Master's/Doctoral Thesis 107423026: Detailed Record




Name 林佳蒼 (Chia-Tsang Lin)   Department 資訊管理學系 (Department of Information Management)
Thesis Title 多向注意力機制於翻譯任務改進之研究 (A Study on Improving the Multi-Head Attention Mechanism for Translation Tasks)
Related Theses
★ A Web-Based Collaborative Teaching Design Platform: The Junior High School Grade 1-9 Curriculum as an Example
★ Applying Content Management Mechanisms to Frequently Asked Questions (FAQ)
★ Applying Mobile Multi-Agent Technology to Course Scheduling Systems
★ A Study of Access Control Mechanisms and Domestic Information Security Regulations
★ Introducing an NFC Mobile Transaction Mechanism into a Credit Card System
★ App-Based Recommendation Services in E-Commerce: Company P as an Example
★ Building a Service-Oriented System to Improve Production Processes: Company W's PMS System as an Example
★ Planning and Implementing a TSM Platform for NFC Mobile Payment
★ Keyword Marketing for Semiconductor Distributors: Company G as an Example
★ A Study of Domestic Track and Field Competition Information Systems: The 2014 National Intercollegiate Track and Field Open as an Example
★ Evaluating the Implementation of a ULD Tracking Management System for Airline Ramp Handling Operations: Company F as an Example
★ A Study of Information Security Management Maturity after Adopting an ISMS: Case Company B as an Example
★ Applying Data Mining Techniques to Movie Recommendation: Online Video Platform F as an Example
★ Using BI Visualization Tools for Security Log Analysis: Company S as an Example
★ An Empirical Study of a Real-Time Analysis System for Privileged Account Login Behavior
★ Detecting and Handling Abnormal Email System Usage: Company T as an Example
Files
  1. Access to this electronic thesis: the author has agreed to immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Machine translation is one of the most popular research topics in natural language processing, and many models have been proposed over the years. Among them, the Transformer uses the multi-head attention mechanism to greatly improve translation accuracy, yet most subsequent studies still focus on inventing new models or adjusting architectures rather than optimizing the original Transformer. This study therefore improves the multi-head attention in the Transformer: by applying masks, it strengthens the attention mechanism's ability to learn local information within the input sentence without adding training parameters or training time, raising the Transformer's accuracy by 3.6 to 11.3% on Chinese-English translation and by 17.4% on German-English translation.
Abstract (English) Neural Machine Translation (NMT) is one of the most popular research topics in Natural Language Processing (NLP). Many new models are proposed by researchers around the world every year. Recently, the Transformer, a model built solely on attention mechanisms, has outperformed many NMT models. However, most research on the Transformer focuses on model innovation rather than optimizing the original model itself. This work therefore modifies the Multi-head Self-Attention module so that it better learns local information about the input sentence. The resulting model improves the BLEU score by 3.6 to 11.3% on Chinese-English translation and by 17.4% on German-English translation.
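To make the masking idea in the abstracts concrete, the sketch below shows one way a Boolean local-window mask can be applied inside scaled dot-product attention so that a head only sees an n-gram-like neighborhood of each position; because the mask is a fixed pattern applied to the attention scores, it adds no trainable parameters. This is only an illustrative sketch under assumed details (PyTorch, a window of ±2 tokens, and the helper names local_window_mask and masked_attention are all assumptions), not the thesis's actual mixed multi-head implementation.

```python
# Sketch: local-window masking inside scaled dot-product attention.
# Assumptions (not from the thesis): PyTorch, a fixed window of +/- 2 tokens.
import torch
import torch.nn.functional as F

def local_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where |i - j| > window, i.e. positions each query should NOT see.
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window

def masked_attention(q, k, v, window: int = 2):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    mask = local_window_mask(q.size(-2), window).to(scores.device)
    scores = scores.masked_fill(mask, float("-inf"))  # hide out-of-window tokens
    return torch.matmul(F.softmax(scores, dim=-1), v)
```

In a mixed multi-head setup of the kind the table of contents describes, such masked heads would presumably be combined with ordinary unmasked heads so the model keeps both local (n-gram) and sentence-level views of the input.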
Keywords (Chinese) ★ 自然語言處理 (Natural Language Processing)
★ 機器翻譯 (Machine Translation)
★ 注意力機制 (Attention Mechanism)
★ Transformer
Keywords (English) ★ Natural Language Processing
★ Machine Translation
★ Attention mechanism
★ Transformer
Table of Contents Abstract (Chinese) I
Abstract II
Table of Contents III
1. Introduction 1
1-1 Research Background 1
1-2 Research Motivation 2
1-3 Research Objectives 3
1-4 Thesis Structure 3
2. Literature Review 4
2-1 Encoder-Decoder Architecture 4
2-1-1 RNN Sequence-to-Sequence Models 5
2-2 The Transformer Model 8
2-3 Attention Mechanisms 9
2-3-1 Development of Attention Mechanisms 10
2-3-2 Multi-Head Attention 12
2-3-3 Performance of Multi-Head Attention 13
3. Research Method 15
3-1 Data Preprocessing 15
3-2 Translation Model 17
3-2-1 Word Embeddings 18
3-2-2 Add & Norm 18
3-2-3 Feed Forward 19
3-2-4 Mixed Multi-Head Self-Attention 19
3-2-5 Output 21
3-3 Evaluation 22
4. Experiments 23
4-1 Experimental Environment 23
4-2 Experimental Design 24
4-2-1 Experiment 1: Optimal N-gram Parameter 24
4-2-2 Experiment 2: Results of Mixed Multi-Head Attention 25
4-2-3 Experiment 3: Performance on the German-English Translation Task 27
4-2-4 Experiment 4: Case Studies 28
5. Conclusions and Future Directions 32
5-1 Conclusions 32
5-2 Research Limitations 32
5-3 Future Research Directions 32
References 34
References Aiken, M. (2019). An Updated Evaluation of Google Translate Accuracy. Studies in Linguistics and Literature, 3(3), 253. https://doi.org/10.22158/sll.v3n3p253
Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. ArXiv:1607.06450 [Cs, Stat]. http://arxiv.org/abs/1607.06450
Baan, J., ter Hoeve, M., van der Wees, M., Schuth, A., & de Rijke, M. (2019). Do Transformer Attention Heads Provide Transparency in Abstractive Summarization? ArXiv:1907.00570 [Cs]. http://arxiv.org/abs/1907.00570
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv:1409.0473 [Cs, Stat]. http://arxiv.org/abs/1409.0473
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. ArXiv:1607.04606 [Cs]. http://arxiv.org/abs/1607.04606
Chen, B., & Cherry, C. (2014). A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU. Proceedings of the Ninth Workshop on Statistical Machine Translation, 362–367. https://doi.org/10.3115/v1/W14-3346
Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., Jones, L., Parmar, N., Schuster, M., Chen, Z., Wu, Y., & Hughes, M. (2018). The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. ArXiv:1804.09849 [Cs]. http://arxiv.org/abs/1804.09849
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. ArXiv:1406.1078 [Cs, Stat]. http://arxiv.org/abs/1406.1078
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. ArXiv:1901.02860 [Cs, Stat]. http://arxiv.org/abs/1901.02860
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14(2), 179–211. https://doi.org/10.1207/s15516709cog1402_1
Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1243–1252.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 315–323. http://proceedings.mlr.press/v15/glorot11a.html
Hao, J., Wang, X., Shi, S., Zhang, J., & Tu, Z. (2019). Multi-Granularity Self-Attention for Neural Machine Translation. ArXiv:1909.02222 [Cs]. http://arxiv.org/abs/1909.02222
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Huang, K., Altosaar, J., & Ranganath, R. (2019). ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. ArXiv:1904.05342 [Cs]. http://arxiv.org/abs/1904.05342
Iida, S., Kimura, R., Cui, H., Hung, P.-H., Utsuro, T., & Nagata, M. (2019). Attention over Heads: A Multi-Hop Attention for Neural Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 217–222. https://doi.org/10.18653/v1/P19-2030
Li, J., Tu, Z., Yang, B., Lyu, M. R., & Zhang, T. (2018). Multi-Head Attention with Disagreement Regularization. ArXiv:1810.10183 [Cs]. http://arxiv.org/abs/1810.10183
Liu, Y. (2019). Fine-tune BERT for Extractive Summarization. ArXiv:1903.10318 [Cs]. http://arxiv.org/abs/1903.10318
Medina, J. R., & Kalita, J. (2018). Parallel Attention Mechanisms in Neural Machine Translation. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 547–552. https://doi.org/10.1109/ICMLA.2018.00088
Michel, P., Levy, O., & Neubig, G. (2019). Are Sixteen Heads Really Better than One? In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 32 (pp. 14014–14024). Curran Associates, Inc. http://papers.nips.cc/paper/9551-are-sixteen-heads-really-better-than-one.pdf
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 3111–3119.
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311–318. https://doi.org/10.3115/1073083.1073135
Rush, A. (2018). The Annotated Transformer. Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 52–60. https://doi.org/10.18653/v1/W18-2509
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1715–1725. https://doi.org/10.18653/v1/P16-1162
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 3104–3112.
Tiedemann, J. (2012). Parallel Data, Tools and Interfaces in OPUS. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 2214–2218. http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
Vig, J. (2019). Visualizing Attention in Transformer-Based Language Representation Models. ArXiv:1904.02679 [Cs, Stat]. http://arxiv.org/abs/1904.02679
Voita, E., Talbot, D., Moiseev, F., Sennrich, R., & Titov, I. (2019). Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 5797–5808. https://doi.org/10.18653/v1/P19-1580
Vosoughi, S., Vijayaraghavan, P., & Roy, D. (2016). Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’16, 1041–1044. https://doi.org/10.1145/2911451.2914762
Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional Block Attention Module. ArXiv:1807.06521 [Cs]. http://arxiv.org/abs/1807.06521
Wu, F., Fan, A., Baevski, A., Dauphin, Y. N., & Auli, M. (2019). Pay Less Attention with Lightweight and Dynamic Convolutions. ArXiv:1901.10430 [Cs]. http://arxiv.org/abs/1901.10430
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., … Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. ArXiv:1609.08144 [Cs]. http://arxiv.org/abs/1609.08144
Yang, B., Tu, Z., Wong, D. F., Meng, F., Chao, L. S., & Zhang, T. (2018). Modeling Localness for Self-Attention Networks. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4449–4458. https://doi.org/10.18653/v1/D18-1475
Advisor 林熙禎 (Shi-Jen Lin)   Date of Approval 2020-7-20