Graduate Thesis 109423043: Detailed Record




Author: Rui-Yang Qiu (邱睿揚)    Department: Information Management
Thesis Title: Conditional Contrastive Learning for Multilingual Neural Machine Translation
Related Theses
★ An Empirical Study of Multi-Label Text Classification: Comparing Word Embeddings with Traditional Techniques
★ Network Protocol Correlation Analysis Based on Graph Neural Networks
★ Learning Shared Representations Across and Within Modalities
★ Hierarchical Classification and Regression with Feature Selection
★ Applying Symptoms to Sentiment Analysis of Patient-Written Diaries
★ An Attention-Based Open-Domain Dialogue System
★ Domain-Specific Tasks: Applications of Commonsense-Based BERT Models
★ Analyzing Text Sentiment Intensity Based on Hardware-Device Differences Among Social Media Users
★ On the Effectiveness of Machine Learning and Feature Engineering for Monitoring Anomalous Cryptocurrency Transactions
★ Applying LSTM Networks and Machine Learning to Metro Switch Machines for Optimal Maintenance-Time Reminders
★ Network Traffic Classification Based on Semi-Supervised Learning
★ ERP Log Analysis: A Case Study of Company A
★ Enterprise Information Security: An Exploratory Study of Network Packet Collection, Analysis, and Network Behavior
★ Applying Data Mining Techniques to Customer Relationship Management: A Case Study of Bank C's Digital Deposits
★ On the Usability and Efficiency of Face Image Generation and Augmentation
★ Data Augmentation with Synthetic Text for Imbalanced Text Classification
Files: Full text available in the system after 2027-08-26.
Abstract (Chinese): Translation has long been a topic of great interest in natural language processing. Although many earlier studies on well-known translation tasks have contributed to translation performance and quality, they were limited to a single language pair or "English-centric" settings. Recently, a growing body of research has targeted multilingual, non-English-centric settings, aiming to build multilingual machine translation systems.
Current multilingual machine translation methods still suffer from data imbalance during training. Recent studies have used contrastive learning to narrow the representation gap between languages and thereby improve multilingual translation performance. Meanwhile, a study in computer vision has questioned the negative-sampling strategy of well-known contrastive learning frameworks, arguing that selecting negative samples without any condition leads to unstable learning; it therefore hypothesized that conditionally selecting negative samples improves contrastive learning, and verified this empirically. Since no existing work examines the negative-sampling problem of contrastive learning on multilingual translation tasks, this study investigates methods for conditionally selecting negative samples in contrastive learning.
The experimental results show that our proposed conditional contrastive learning method fell short of our expectations in both supervised directional translation and zero-shot translation. The current amount of training data is insufficient for the contrastive objective, a form of self-supervised learning, to effectively improve multilingual machine translation, so the translations of the multi-task learning model struggle to surpass those of the supervised learning model.
We further analyzed the shared sentence representations learned by our proposed model, visually comparing them with those of the single-task m-Transformer and a contrastive learning model. The comparison shows that our model effectively learns cross-lingual shared sentence representations, with results similar to the contrastive learning model. We consider our advantage to lie in sampling the learning targets in advance, rather than using all candidates as in previous approaches; using fewer learning targets reduces learning instability.
Abstract (English): The translation task has always been a topic of great interest in the field of natural language processing. Even though many previous studies on well-known translation tasks have contributed to translation performance and effectiveness, they have been limited to a single language pair or "English-centric" settings. Recently, more and more research has addressed multilingual, non-English-centric machine translation systems.
Current multilingual machine translation methods still suffer from data imbalance during training. Recent studies have used contrastive learning to reduce the representation gap between languages and thereby improve the performance of multilingual machine translation. On the other hand, a study in the field of computer vision has questioned the negative-sampling strategy of well-known contrastive learning frameworks, arguing that unconditional selection of negative samples causes unstable learning; it hypothesized that conditional selection of negative samples improves the performance of contrastive learning, and demonstrated its effectiveness. Since no prior work has examined the negative-sampling problem of contrastive learning on multilingual translation tasks, this study investigates methods for conditionally selecting negative samples in contrastive learning.
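The abstract does not spell out how the conditional selection of negatives is implemented. As a minimal sketch, assuming an InfoNCE-style objective over parallel sentence pairs and a hardness condition (keeping only the top-k most similar in-batch candidates as negatives), one possible reading in PyTorch is shown below; the function name, the top-k criterion, and the hyperparameter values are illustrative assumptions rather than the thesis's exact method.

    import torch
    import torch.nn.functional as F

    def conditional_contrastive_loss(src_repr, tgt_repr, k=16, temperature=0.1):
        # src_repr, tgt_repr: (batch, dim) sentence representations of parallel
        # source/target sentences; row i of each matrix forms a positive pair.
        src = F.normalize(src_repr, dim=-1)
        tgt = F.normalize(tgt_repr, dim=-1)
        sim = src @ tgt.t() / temperature  # (batch, batch) scaled cosine similarities
        batch = sim.size(0)
        pos = sim.diag().unsqueeze(1)      # similarity of each positive pair
        mask = torch.eye(batch, dtype=torch.bool, device=sim.device)
        neg = sim.masked_fill(mask, float('-inf'))  # drop positives from candidates
        # Conditional negative selection: keep only the k hardest (most similar)
        # in-batch candidates instead of contrasting against all of them.
        hard_neg, _ = neg.topk(min(k, batch - 1), dim=-1)
        logits = torch.cat([pos, hard_neg], dim=1)  # positive sits at index 0
        labels = torch.zeros(batch, dtype=torch.long, device=sim.device)
        return F.cross_entropy(logits, labels)

In a multi-task setup such as the one described in this thesis, a loss of this form would typically be added to the translation cross-entropy with a weighting coefficient.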
From the experimental results, we found that our proposed conditional contrastive learning method was not as effective as expected in either supervised directional translation or zero-shot translation. The insufficient amount of training data prevents the self-supervised contrastive learning method from effectively improving multilingual machine translation, making it difficult for the multi-task learning model to outperform the supervised learning model in translation.
We further analyze the shared sentence representations learned by our proposed model, visualizing and comparing them with those of the single-task m-Transformer and a contrastive learning model, and show that our model effectively learns cross-lingual shared sentence representations, with results similar to the contrastive learning model. We believe our advantage lies in sampling the learning targets beforehand, rather than using all candidates as in the previous approach; using fewer learning targets reduces learning instability.
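As one way to reproduce the kind of visual comparison described above, the sketch below projects pooled sentence representations of multilingual parallel sentences into two dimensions with t-SNE and colours the points by language; well-aligned shared representations should overlap across languages. The pooling step and the t-SNE settings are assumptions for illustration, not details taken from the thesis.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_shared_representations(reprs, languages):
        # reprs: (n_sentences, dim) pooled sentence representations, e.g.
        # mean-pooled encoder states; languages: one language code per row.
        reprs = np.asarray(reprs, dtype=np.float32)
        points = TSNE(n_components=2, init='pca', random_state=0).fit_transform(reprs)
        for lang in sorted(set(languages)):
            idx = [i for i, l in enumerate(languages) if l == lang]
            plt.scatter(points[idx, 0], points[idx, 1], s=8, label=lang)
        plt.legend()
        plt.title('Shared sentence representations by language')
        plt.show()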
Keywords (Chinese)
★ machine translation (機器翻譯)
★ deep neural network (深度神經網絡)
★ contrastive learning (對比式學習)
★ negative sampling (負採樣)
Keywords (English)
★ deep neural network
★ contrastive learning
★ negative sampling
Table of Contents
摘要 (Chinese Abstract)
Abstract
Acknowledgement
Table of contents
List of Tables
List of Figures
1. Introduction
   1.1. Overview
   1.2. Motivation
   1.3. Objectives
   1.4. Thesis Organization
2. Related Works
   2.1. Neural Machine Translation
      2.1.1. Encoder-Decoder Framework
      2.1.2. Attention Mechanism
   2.2. Multilingual Neural Machine Translation
      2.2.1. Multi-Way Machine Translation
      2.2.2. Low-Resource Machine Translation
      2.2.3. Multi-Source Machine Translation
   2.3. Contrastive Learning
      2.3.1. Contrastive Learning in Computer Vision
      2.3.2. Contrastive Learning in Natural Language Processing
3. Methodology
   3.1. Model Architecture
   3.2. Multi-Task Learning
      3.2.1. Machine Translation
      3.2.2. Conditional Contrastive Learning
   3.3. Flow Chart
   3.4. Datasets
   3.5. Experiment Settings
      3.5.1. Preprocessing
      3.5.2. Model Settings
   3.6. Experiment Design
4. Experiment Results
   4.1. Experiment 1 - The effectiveness of our proposed method in supervised directional translation
      4.1.1. Experiment 1 Results
      4.1.2. Summary of Experiment 1
   4.2. Experiment 2 - The effectiveness of our proposed method in zero-shot translation
      4.2.1. Experiment 2 Results
   4.3. Analysis and Discussion
5. Conclusion
   5.1. Overall Summary
   5.2. Contributions
   5.3. Limitations
   5.4. Future Work
Reference
Advisor: Shih-Wen Ke (柯士文)    Approval Date: 2022-08-26
