A Study of the Effectiveness of Chinese Stroke-Order Pre-training

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81281

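Title:	A Study of the Effectiveness of Chinese Stroke-Order Pre-training (中文筆順預訓練效能之研究)
Author:	Huang, Hao-Cheng (黃晧誠)
Contributor:	Department of Information Management
Keywords:	Pre-training; Representation; Natural language processing; Chinese; Stroke
Date:	2019-07-19
Upload time:	2019-09-03 15:42:34 (UTC+8)
Publisher:	National Central University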
Abstract:	Pre-training is extremely important in natural language processing. However, recent transfer-learning research in natural language processing has paid relatively little attention to Chinese, and most existing Chinese models rely on feature-based and static embedding methods. This study therefore proposes using a deeper feature of Chinese characters, stroke order, as an additional input dimension for learning sub-character features. Building on two recently proposed pre-training models, ELMo (a feature-based method) and BERT (a fine-tuning method), we modify each to examine how stroke order affects Chinese pre-training, and propose the ELMo+S and BERT+S models, which use a convolutional neural network to incorporate stroke features (the "S" stands for Stroke). Finally, we evaluate both models on the downstream XNLI and LCQMC datasets; the results show that stroke features are not significantly helpful for either pre-training model.
Appears in collections:	[Graduate Institute of Information Management] Doctoral and Master's Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	243

All items in NCUIR are protected by original copyright.
