Master's/Doctoral Thesis 106423012: Complete Metadata Record

DC Field | Value | Language
dc.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
dc.creator | 黃晧誠 | zh_TW
dc.creator | Hao-Cheng Huang | en_US
dc.date.accessioned | 2019-07-19T07:39:07Z
dc.date.available | 2019-07-19T07:39:07Z
dc.date.issued | 2019
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106423012
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
dc.description | 國立中央大學 | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | Pre-training is extremely important in natural language processing, yet recent transfer-learning research on Chinese is scarce, and most existing models rely on feature-based methods and static embeddings. This study therefore proposes incorporating a deeper feature of Chinese, stroke order, into the input dimensions to learn sub-character features. Building on the recently proposed pre-trained models ELMo (a feature-based method) and BERT (a fine-tuning method), we modify both to investigate the effect of stroke order on Chinese pre-trained models, and propose the ELMo+S and BERT+S models, which capture stroke features with a convolutional neural network. Finally, evaluation on the downstream XNLI and LCQMC datasets shows that stroke features provide no clear benefit to either pre-trained model. | zh_TW
dc.description.abstract | Pre-training is extremely important in natural language processing. However, Chinese studies on transfer learning are relatively few, and most of them use feature-based methods with static embeddings. Therefore, this study proposes using a deeper Chinese feature, strokes, integrated into the input dimensions to learn sub-character characteristics, building on the recently proposed pre-trained models ELMo (a feature-based method) and BERT (a fine-tuning method). We propose the ELMo+S and BERT+S models, which consider stroke features through a convolutional neural network. Finally, the results show that stroke features are not significantly helpful for these two pre-trained models on the downstream XNLI and LCQMC datasets. | en_US
dc.subject | 預訓練 (pre-training) | zh_TW
dc.subject | 表徵 (representation) | zh_TW
dc.subject | 自然語言處理 (natural language processing) | zh_TW
dc.subject | 中文 (Chinese) | zh_TW
dc.subject | 筆順 (stroke order) | zh_TW
dc.subject | Pre-training | en_US
dc.subject | Representation | en_US
dc.subject | Natural language processing | en_US
dc.subject | Chinese | en_US
dc.subject | Stroke | en_US
dc.title | 中文筆順預訓練效能之研究 (A Study on the Performance of Chinese Stroke-Order Pre-training) | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.type | 博碩士論文 (master's/doctoral thesis) | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
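
The abstracts above describe fusing stroke-order features into the input representations of ELMo and BERT through a convolutional neural network (the proposed ELMo+S and BERT+S models). The thesis itself is not reproduced in this record, so the following is only a minimal sketch of one way a stroke CNN could be wired into a character-embedding layer; the layer sizes, vocabulary size, stroke codes, and the fusion-by-addition step are illustrative assumptions, not the author's actual implementation.

```python
import torch
import torch.nn as nn

class StrokeCNN(nn.Module):
    """Encode each character's stroke-order sequence into a fixed-size
    vector with a 1-D convolution and max-pooling over the stroke axis."""
    def __init__(self, num_stroke_types=6, stroke_dim=32, out_dim=768, kernel_size=3):
        super().__init__()
        # Stroke code 0 is reserved for padding (an assumed convention).
        self.stroke_emb = nn.Embedding(num_stroke_types, stroke_dim, padding_idx=0)
        self.conv = nn.Conv1d(stroke_dim, out_dim, kernel_size, padding=kernel_size // 2)

    def forward(self, stroke_ids):
        # stroke_ids: (batch, chars, max_strokes) integer stroke codes
        b, c, s = stroke_ids.shape
        x = self.stroke_emb(stroke_ids.view(b * c, s))   # (b*c, strokes, stroke_dim)
        x = self.conv(x.transpose(1, 2))                 # (b*c, out_dim, strokes)
        x = x.max(dim=-1).values                         # (b*c, out_dim)
        return x.view(b, c, -1)                          # (batch, chars, out_dim)

# Hypothetical fusion: add the stroke feature to the character embedding
# before a pre-trained encoder (ELMo- or BERT-style) consumes the sequence.
vocab_size, hidden = 21128, 768                        # illustrative sizes only
char_emb = nn.Embedding(vocab_size, hidden)
stroke_cnn = StrokeCNN(out_dim=hidden)

char_ids = torch.randint(1, vocab_size, (2, 10))       # (batch, chars)
stroke_ids = torch.randint(0, 6, (2, 10, 20))          # (batch, chars, strokes)
fused = char_emb(char_ids) + stroke_cnn(stroke_ids)    # (batch, chars, hidden)
print(fused.shape)                                     # torch.Size([2, 10, 768])
```

Addition is only one possible fusion choice; concatenation followed by a projection would work equally well in this sketch. The fused embeddings would then replace the plain character embeddings at the encoder's input.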
