References
[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA.
[2] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota.
[3] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.
[4] Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1), 39-71.
[5] Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their Applications, 13(4), 18-28.
[6] Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning (ICML).
[7] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
[8] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
[9] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems, 26.
[10] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.
[11] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135-146.
[12] Dale, R., & Kilgarriff, A. (2011). Helping Our Own: The HOO 2011 Pilot Shared Task. Proceedings of the 13th European Workshop on Natural Language Generation, Nancy, France.
[13] Dale, R., Anisimoff, I., & Narroway, G. (2012). HOO 2012: A report on the preposition and determiner error correction shared task. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, Montréal, Canada.
[14] Ng, H. T., Wu, S. M., Wu, Y., Hadiwinoto, C., & Tetreault, J. (2013). The CoNLL-2013 Shared Task on Grammatical Error Correction. Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, Sofia, Bulgaria.
[15] Ng, H. T., Wu, S. M., Briscoe, T., Hadiwinoto, C., Susanto, R. H., & Bryant, C. (2014). The CoNLL-2014 Shared Task on Grammatical Error Correction. Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, Baltimore, Maryland.
[16] Yu, L.-C., Lee, L.-H., & Chang, L.-P. (2014). Overview of Grammatical Error Diagnosis for Learning Chinese as a Foreign Language. Proceedings of the 1st Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA), Nara, Japan.
[17] Lee, L.-H., Yu, L.-C., & Chang, L.-P. (2015). Overview of the NLP-TEA 2015 Shared Task for Chinese Grammatical Error Diagnosis. Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications, Beijing, China.
[18] Lee, L.-H., Rao, G., Yu, L.-C., Xun, E., Zhang, B., & Chang, L.-P. (2016). Overview of NLP-TEA 2016 Shared Task for Chinese Grammatical Error Diagnosis. Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016), Osaka, Japan.
[19] Bryant, C., Felice, M., Andersen, Ø. E., & Briscoe, T. (2019). The BEA-2019 Shared Task on Grammatical Error Correction. Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, Florence, Italy.
[20] Daudaravicius, V., Banchs, R. E., Volodina, E., & Napoles, C. (2016). A Report on the Automatic Evaluation of Scientific Writing Shared Task. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[21] Leydesdorff, L. (1998). Theories of citation? Scientometrics, 43(1), 5-25.
[22] Li, Z., & Ho, Y.-S. (2008). Use of citation per publication as an indicator to evaluate contingent valuation research. Scientometrics, 75(1), 97-110.
[23] Small, H. (2018). Characterizing highly cited method and non-method papers using citation contexts: The role of uncertainty. Journal of Informetrics, 12(2), 461-480.
[24] Ritchie, A. (2009). Citation context analysis for information retrieval. PhD thesis, University of Cambridge.
[25] Cohan, A., & Goharian, N. (2017). Scientific article summarization using citation-context and article's discourse structure. arXiv preprint arXiv:1704.06619.
[26] Jurgens, D., Kumar, S., Hoover, R., McFarland, D., & Jurafsky, D. (2018). Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics, 6, 391-406.
[27] Cohan, A., Ammar, W., Van Zuylen, M., & Cady, F. (2019). Structural scaffolds for citation intent classification in scientific publications. arXiv preprint arXiv:1904.01608.
[28] Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Crawford, M., Downey, D., Dunkelberger, J., Elgohary, A., Feldman, S., & Ha, V. (2018). Construction of the Literature Graph in Semantic Scholar. arXiv preprint arXiv:1805.02262.
[29] Li, L., Xie, Y., Liu, W., Liu, Y., Jiang, Y., Qi, S., & Li, X. (2020). CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization. Proceedings of the First Workshop on Scholarly Document Processing, Online.
[30] Grundkiewicz, R., Junczys-Dowmunt, M., & Heafield, K. (2019). Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data. Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, Florence, Italy.
[31] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[32] Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234-1240.
[33] Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., & McDermott, M. (2019). Publicly Available Clinical BERT Embeddings. Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, Minnesota, USA.
[34] Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A Pretrained Language Model for Scientific Text. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.
[35] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
[36] Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. International Conference on Learning Representations (ICLR).
[37] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. (2018). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium.
[38] Jiang, Z., Yu, W., Zhou, D., Chen, Y., Feng, J., & Yan, S. (2020). ConvBERT: Improving BERT with Span-based Dynamic Convolution. arXiv preprint arXiv:2008.02496.
[39] Wu, F., Fan, A., Baevski, A., Dauphin, Y. N., & Auli, M. (2019). Pay less attention with lightweight and dynamic convolutions. arXiv preprint arXiv:1901.10430.
[40] Tay, Y., Bahri, D., Metzler, D., Juan, D.-C., Zhao, Z., & Zheng, C. (2020). Synthesizer: Rethinking self-attention in transformer models. arXiv preprint arXiv:2005.00743.
[41] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2019). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
[42] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
[43] Lo, K., Wang, L. L., Neumann, M., Kinney, R., & Weld, D. (2020). S2ORC: The Semantic Scholar Open Research Corpus. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.
[44] Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Crawford, M., Downey, D., Dunkelberger, J., Elgohary, A., Feldman, S., Ha, V., Kinney, R., Kohlmeier, S., Lo, K., Murray, T., Ooi, H.-H., Peters, M., Power, J., Skjonsberg, S., Wang, L., Wilhelm, C., Yuan, Z., van Zuylen, M., & Etzioni, O. (2018). Construction of the Literature Graph in Semantic Scholar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), New Orleans, Louisiana.
[45] Shen, Z., Ma, H., & Wang, K. (2018). A Web-scale system for scientific knowledge exploration. Proceedings of ACL 2018, System Demonstrations, Melbourne, Australia.
[46] Witte, R., & Sateli, B. (2016). Combining Off-the-shelf Grammar and Spelling Tools for the Automatic Evaluation of Scientific Writing (AESW) Shared Task 2016. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[47] Remse, M., Mesgar, M., & Strube, M. (2016). Feature-Rich Error Detection in Scientific Writing Using Logistic Regression. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[48] Flickinger, D., Goodman, M., & Packard, W. (2016). UW-Stanford System Description for AESW 2016 Shared Task on Grammatical Error Detection. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[49] Mamani Sanchez, L., & Franco-Penya, H.-H. (2016). Combined Tree Kernel-based classifiers for Assessing Quality of Scientific Text. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[50] Lee, L.-H., Lin, B.-L., Yu, L.-C., & Tseng, Y.-H. (2016). The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[51] Schmaltz, A., Kim, Y., Rush, A. M., & Shieber, S. (2016). Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA.
[52] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942.
[53] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. Advances in Neural Information Processing Systems, 32.