Master's/Doctoral Thesis 107521135: Full Metadata Record

DC Field | Value | Language
dc.contributor | Department of Electrical Engineering (電機工程學系) | zh_TW
dc.creator | 王昱翔 | zh_TW
dc.creator | Yuh-Shyang Wang | en_US
dc.date.accessioned | 2021-10-06T07:39:07Z
dc.date.available | 2021-10-06T07:39:07Z
dc.date.issued | 2021
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107521135
dc.contributor.department | Department of Electrical Engineering (電機工程學系) | zh_TW
dc.description | National Central University (國立中央大學) | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | Automated writing assessment can help writers reduce errors in semantic expression and improve writing quality. In the field of scientific papers in particular, a large number of writers are not native speakers of English, and an automated assessment tool can save them proofreading time and labor costs. We propose the SynBERT model to extract sentence information and determine whether sentences in scientific English papers need language editing. Taking ELECTRA, a BERT-derived model, as the base and improving on it, we use scientific papers as training data, combine three different attention mechanisms (self-attention, span-based dynamic convolution, and random synthesizer attention) into a proposed synthesizer-based mixed-attention mechanism, use replaced token detection as the pre-training objective of the language model, and finally fine-tune the model for scientific English writing assessment. We use the AESW 2016 dataset from the shared task on evaluating scientific English writing as the experimental data for assessing model performance. The task goal is to judge whether a sentence needs language editing to conform to the writing style of scientific papers. It provides three data splits (a training set, a development set, and a test set) containing 1,196,940, 148,478, and 143,804 sentences respectively, of which about 40% require language editing. Experimental results and error analysis show that our proposed SynBERT achieves the best F1-score of 65.26% on this task, outperforming both the models used in the original competition (MaxEnt, SVM, LSTM, CNN) and more recent models (BERT, RoBERTa, XLNet, ELECTRA). | zh_TW
dc.description.abstract | Automated writing assessment can help writers reduce semantic errors and improve writing quality, especially in the field of scientific papers, where many authors are not native English speakers. An automated evaluation tool can help writers save the time and labor cost of proofreading. We propose the SynBERT model to extract sentence information for classifying whether sentences in scientific English papers require language editing. We use the ELECTRA model as the base architecture and improve it by using scientific papers as training data and by integrating three different attention mechanisms (self-attention, span-based dynamic convolution, and the random synthesizer) into the proposed synthesizer-based mixed attention. We use replaced token detection as the pre-training task of the language model and fine-tune the pre-trained model on the grammatical error detection task. We use the AESW 2016 dataset as the experimental data for model evaluation. The goal of this task is to determine whether a sentence needs language editing to meet the writing style of scientific papers. It provides three data splits, a training set, a development set, and a test set, containing 1,196,940, 148,478, and 143,804 sentences, respectively, of which about 40% need language editing. Our proposed SynBERT model achieves the best F1-score of 65.26%, which is better than the methods used in the competition (i.e., MaxEnt, SVM, LSTM, and CNN) and outperforms the recent models (i.e., BERT, RoBERTa, XLNet, and ELECTRA). | en_US
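The abstract describes a mixed attention that blends standard (input-dependent) dot-product self-attention with a random synthesizer, whose attention logits are learned parameters independent of the input. The following NumPy sketch illustrates that idea only; it is not the thesis's actual SynBERT implementation, and the function names, the single-head shapes, and the scalar mixing weight `alpha` are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mixed_attention(X, Wq, Wk, Wv, R, alpha=0.5):
    """Blend dot-product self-attention with a random-synthesizer head.

    X: (seq_len, d) token representations.
    Wq, Wk, Wv: (d, d) projection matrices.
    R: (seq_len, seq_len) learned, input-independent attention logits
       (the random synthesizer).
    alpha: assumed scalar weight mixing the two attention maps.
    """
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dot_attn = softmax(Q @ K.T / np.sqrt(d))   # input-dependent map
    rand_attn = softmax(R)                     # input-independent map
    A = alpha * dot_attn + (1 - alpha) * rand_attn
    return A @ V

# Toy usage with random weights (stand-ins for learned parameters).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Wq = rng.normal(size=(16, 16)) * 0.1
Wk = rng.normal(size=(16, 16)) * 0.1
Wv = rng.normal(size=(16, 16)) * 0.1
R = rng.normal(size=(6, 6))
out = mixed_attention(X, Wq, Wk, Wv, R)
```

Because both maps are row-stochastic after the softmax, any convex combination of them is also a valid attention distribution, so the mixed map can be dropped in wherever a standard attention map is used.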
dc.subject | Scientific English (科技英文) | zh_TW
dc.subject | writing assessment (寫作評估) | zh_TW
dc.subject | pre-trained language models (預訓練語言模型) | zh_TW
dc.subject | mixed attention (混合注意力) | zh_TW
dc.subject | synthesizers (合成器) | zh_TW
dc.subject | Scientific English | en_US
dc.subject | writing evaluation | en_US
dc.subject | pre-trained language models | en_US
dc.subject | mixed-attentions | en_US
dc.subject | synthesizers | en_US
dc.title | Improving the BERT Model with Synthesizer-Based Mixed Attention for Scientific Language Editing (運用合成器混合注意力改善BERT模型於科學語言編輯) | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Improving BERT Model with Synthesizers based Mixed-Attentions for Scientific Language Editing | en_US
dc.type | Master's/Doctoral Thesis (博碩士論文) | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
