Thesis/Dissertation 107423007 Full Metadata Record

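DC Field | Value | Language
DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
DC.creator | 李昕儒 | zh_TW
DC.creator | Hsin-Ju Lee | en_US
dc.date.accessioned | 2020-07-20T07:39:07Z |
dc.date.available | 2020-07-20T07:39:07Z |
dc.date.issued | 2020 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107423007 |
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW
DC.description | National Central University | en_US
dc.description.abstract | When training data is insufficient, data augmentation is one of the common techniques for improving downstream task performance. However, compared with augmentation methods for images, there is almost no consensus on how to augment text data. The reason is that general transformation rules (flipping, rotation, cropping, and so on) are easy to define for images, whereas changing the order of a passage of text can easily alter its original meaning. In this study, we propose a data augmentation framework, SDA: Semantic-based Data Augmentation, which uses existing labeled data to find augmentation samples with the same semantics among large amounts of unlabeled data, in order to improve performance on text classification tasks. SDA uses sampling to find texts in an external unlabeled corpus that are semantically similar to the original labeled data, and assigns them the same labels as the original labeled texts to enlarge the training data. Through experiments, this study demonstrates the usefulness of semantically similar unlabeled texts for downstream classification tasks; within the same framework, we use text representations trained with different training objectives. We first examine how well different representation methods capture semantics, and then evaluate the effect of adding different numbers of augmented samples to the training set. SDA is conceptually simple, yet it is highly effective at improving downstream classification performance. On six of seven classification datasets, SDA clearly outperforms other common augmentation methods. Moreover, SDA not only surpasses other augmentation methods in performance gains; compared with real data, that is, the case where original labeled data is added to the training set, it also achieves classification performance on par with real data. | zh_TW
dc.description.abstract | Data augmentation is among the most widely used techniques for improving the performance of downstream tasks when training data is insufficient. However, there is little agreement on augmentation approaches for text data, such as transformation rules. In this study, we propose a flexible augmentation framework, SDA: Semantic-based Data Augmentation, which aims to improve performance on text classification tasks. SDA augments an insufficient set of training documents by sampling external unlabeled documents that are semantically similar to the existing training documents. This study sheds new light on the usefulness of semantics. We incorporate advanced representation methods into our framework. We first investigate how well different representation methods capture semantics and then evaluate the effect of adding different quantities of semantically similar texts to the training data. SDA is conceptually simple and shows promising performance. It obtains remarkable results on seven classification datasets. Moreover, SDA not only outperforms the data augmentation benchmarks, but also achieves performance comparable to that obtained when real labeled documents are added to the training data. Through the experiments and analysis, we found that SDA can be applied to improve the performance of classifiers for a wide range of classification tasks, such as sentiment analysis and opinion polarity detection, even when training documents are severely insufficient. | en_US
DC.subject | 資料增益 (data augmentation) | zh_TW
DC.subject | 文本語意相似度 (semantic textual similarity) | zh_TW
DC.subject | 深度學習 (deep learning) | zh_TW
DC.subject | 文字分類 (text classification) | zh_TW
DC.subject | data augmentation | en_US
DC.subject | semantic textual similarity | en_US
DC.subject | deep learning | en_US
DC.subject | text classification | en_US
DC.title | 基於語意之資料增益方法於文本分類任務 (Semantic-based Data Augmentation Method for Text Classification Tasks) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | SDA: Semantic-based Data Augmentation on Text Classification Tasks | en_US
DC.type | 博碩士論文 (thesis/dissertation) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US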
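
The abstract describes SDA as selecting external unlabeled documents that are semantically similar to existing labeled documents and giving them the same labels. The following is a minimal sketch of that idea only, not the thesis's implementation. It makes several assumptions that do not come from the record: a toy bag-of-words vector stands in for the learned text representations, a top-k nearest-neighbor rule with a similarity cutoff stands in for the sampling step, and the names `embed`, `cosine`, `augment`, `k`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a learned representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment(labeled, unlabeled, k=1, threshold=0.3):
    """For each (text, label) pair, copy the label to the k most similar
    unlabeled texts whose similarity is at least `threshold`.
    Both k and threshold are hypothetical parameters for this sketch."""
    pool = [(text, embed(text)) for text in unlabeled]
    added = []
    for text, label in labeled:
        query = embed(text)
        ranked = sorted(pool, key=lambda item: cosine(query, item[1]), reverse=True)
        for candidate, vec in ranked[:k]:
            if cosine(query, vec) >= threshold:
                added.append((candidate, label))
    return labeled + added


# Made-up example data, for illustration only.
labeled = [
    ("the movie was great fun", "pos"),
    ("the plot was dull and slow", "neg"),
]
unlabeled = [
    "a great movie and great fun",
    "slow and dull plot overall",
    "stock prices fell on tuesday",
]
augmented = augment(labeled, unlabeled)
```

In this toy run, the two unlabeled sentences that share vocabulary with a labeled sentence inherit its label, while the unrelated sentence is left out. Note that this sketch allows one unlabeled text to be picked by several labeled texts with different labels; a real pipeline would need a rule for such conflicts.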
