Graduate Thesis 108423062: Detailed Record




Name: 何逸家 (Yi-Jia He)    Department: Information Management
Thesis Title: 透過弱擴增與強擴增輔助半監督式學習中的文本分類 (Assisting Text Classification in Semi-supervised Learning through Weak and Strong Augmentation)
Related Theses:
★ A Web-Based Collaborative Instructional Design Platform: A Case Study of the Nine-Year Integrated Junior High School Curriculum
★ Applying a Content Management Mechanism to Frequently Asked Questions (FAQ)
★ Applying Mobile Multi-Agent Technology to a Course Scheduling System
★ A Study of Access Control Mechanisms and Domestic Information Security Regulations
★ Introducing an NFC Mobile Transaction Mechanism into a Credit Card System
★ App-Based Recommendation Services in E-Commerce: A Case Study of Company P
★ Building a Service-Oriented System to Improve Production Processes: A Case Study of Company W's PMS System
★ Planning and Deployment of a TSM Platform for NFC Mobile Payment
★ Applying Keyword Marketing at a Semiconductor Distributor: A Case Study of Company G
★ A Study of a Domestic Track and Field Competition Information System: A Case Study of the 2014 National Intercollegiate Track and Field Open
★ Evaluating the Deployment of a ULD Tracking Management System for Airline Ground Ramp Operations: A Case Study of Company F
★ A Study of Information Security Management Maturity after Adopting an Information Security Management System: A Case Study of Company B
★ Applying Data Mining Techniques to Movie Recommendation: A Case Study of Online Video Platform F
★ Using BI Visualization Tools for Security Log Analysis: A Case Study of Company S
★ An Empirical Study of a Real-Time Analysis System for Privileged Account Login Behavior
★ Detecting and Handling Abnormal Email System Usage: A Case Study of Company T
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. Once open access takes effect, the electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese): Semi-supervised learning makes effective use of unlabeled data to improve model performance, and recent research has shown that it excels in the low-data regime, where only a small amount of labeled data is available. Current approaches to exploiting unlabeled data rely mainly on consistency training paired with a suitable artificial label as the training target. In natural language processing, however, the consistency-training designs of current methods are not rigorous enough and the artificial labels are not sufficiently meaningful, so the model not only fails to learn enough from the unlabeled data and easily overfits the labeled data, but can even be harmed by poor-quality artificial labels. This study therefore applies weak augmentation and strong augmentation to the unlabeled data and combines them with a confidence threshold to build a more rigorous consistency-training process; it also uses mixup augmentation to combine labeled and unlabeled data, helping the model avoid overfitting the labeled data. Experimental results confirm that, using only 10 labeled examples per class, the proposed method achieves 87.88% accuracy on the AG NEWS text classification dataset, 1.58% higher than current methods, and 67.3% accuracy on the Yahoo! Answers dataset, 3.5% higher than current methods.
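The thresholded consistency step described in the abstract above can be summarized as a minimal PyTorch sketch: the classifier first pseudo-labels the weakly augmented text, low-confidence predictions are masked out, and the model is then trained to reproduce the surviving pseudo-labels on the strongly augmented view. The function name, the 0.95 threshold, and the model(batch) -> logits interface are illustrative assumptions, not the thesis's exact implementation.

import torch
import torch.nn.functional as F

def consistency_loss(model, weak_batch, strong_batch, threshold=0.95):
    # Pseudo-label the weakly augmented view without tracking gradients.
    with torch.no_grad():
        weak_probs = F.softmax(model(weak_batch), dim=-1)
        confidence, pseudo_labels = weak_probs.max(dim=-1)
        mask = (confidence >= threshold).float()   # confidence-based masking

    # Train the model to predict those pseudo-labels on the strongly augmented view.
    strong_logits = model(strong_batch)
    per_example = F.cross_entropy(strong_logits, pseudo_labels, reduction="none")
    return (per_example * mask).mean()

In a full training loop this unlabeled-data loss would be added to the ordinary supervised cross-entropy computed on the labeled batch, matching the abstract's combination of consistency training and supervised training.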
Abstract (English): Semi-supervised learning can effectively use unlabeled data to improve the performance of deep learning models, and recent research has shown that it performs remarkably well in the low-data regime. The current mainstream approach to using unlabeled data is consistency training with a suitable artificial label as the training target. In natural language processing, however, these approaches are still not rigorous enough and the training targets are not sufficiently meaningful, so the model not only fails to extract enough information from the unlabeled data and easily overfits the labeled data, but can also be harmed by poor-quality training targets. This work therefore presents a more rigorous consistency-training process that uses weakly and strongly augmented unlabeled data with confidence-based masking; in addition, we mix the labeled and unlabeled data so that both can be exploited together, which helps the model avoid overfitting the labeled data. Our approach outperforms current approaches on two text classification datasets, AG NEWS and Yahoo! Answers, while using only 10 labeled examples per class.
Specifically, it achieves 87.88% accuracy on AG NEWS and 67.3% accuracy on Yahoo! Answers with 10 labeled examples per class, exceeding the current state of the art by 1.58% and 3.5%, respectively.
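As a rough illustration of the mixing step mentioned above, the sketch below interpolates labeled and unlabeled examples in the spirit of mixup (Zhang et al., 2017) and MixText (Chen et al., 2020): sentence representations and their (one-hot or pseudo) targets are blended with a Beta-sampled coefficient and the model is trained on the blend with a soft cross-entropy. The embedding-level mixing, the Beta(0.75, 0.75) prior, and the model(embeddings) -> logits interface are assumptions for illustration only, not the thesis's exact recipe.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_step(model, labeled_emb, labels, unlabeled_emb, pseudo_labels,
               num_classes, alpha=0.75):
    # Sample the interpolation coefficient and keep the mix biased toward the first input.
    lam = Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)

    # One-hot targets for both the true labels and the pseudo-labels.
    y_l = F.one_hot(labels, num_classes).float()
    y_u = F.one_hot(pseudo_labels, num_classes).float()

    # Interpolate inputs and targets, then train with a soft cross-entropy.
    mixed_emb = lam * labeled_emb + (1.0 - lam) * unlabeled_emb
    mixed_targets = lam * y_l + (1.0 - lam) * y_u
    log_probs = F.log_softmax(model(mixed_emb), dim=-1)
    return -(mixed_targets * log_probs).sum(dim=-1).mean()

Mixing in this way lets every optimization step draw on both the small labeled set and the much larger unlabeled set, which is how the abstract argues overfitting on the labeled data is reduced.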
Keywords (Chinese): ★ 半監督式學習 (Semi-supervised Learning)
★ 一致性訓練 (Consistency Training)
★ 資料擴增 (Data Augmentation)
★ 文本分類 (Text Classification)
★ 自然語言處理 (Natural Language Processing)
Keywords (English): ★ Semi-supervised Learning
★ Consistency Training
★ Data Augmentation
★ Text Classification
★ Natural Language Processing
Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures vi
List of Tables viii
1. Introduction 1
1-1 Research Background 1
1-2 Research Motivation 3
1-3 Research Objectives 4
1-4 Thesis Organization 5
2. Literature Review 6
2-1 Data Augmentation in Natural Language Processing 7
2-1-1 Word Replacement (Random Replacement) 8
2-1-2 Random Noise Injection 10
2-1-3 Back Translation 11
2-1-4 Text Surface Transformation 11
2-1-5 Instance Crossover Augmentation 12
2-1-6 Syntax-Tree Manipulation 13
2-1-7 Mixup Augmentation (wordMixup & sentMixup) 13
2-1-8 Data Augmentation Based on Pre-trained Models 14
2-2 Transfer Learning in Natural Language Processing 15
2-2-1 Autoencoding Language Models 15
2-2-2 Autoregressive Language Models 16
2-2-3 Permutation Language Models 16
2-3 Semi-supervised Learning 17
2-3-1 Pseudo-Label 19
2-3-2 Π-Model / Temporal Ensembling 20
2-3-3 Mean Teacher 21
2-3-4 Virtual Adversarial Training (VAT) 22
2-3-5 MixMatch 23
2-3-6 Unsupervised Data Augmentation (UDA) 24
2-3-7 VAMPIRE 25
2-3-8 MixText 26
2-3-9 Comparison of Semi-supervised Learning Studies over the Years 27
3. Research Methodology 29
3-1 Research Process 29
3-2 Semi-supervised Learning Framework 30
3-3 Weak and Strong Augmentation of Unlabeled Data 32
3-4 Text Classification Model 34
3-5 Self-Target Prediction for Unlabeled, Weakly Augmented, and Strongly Augmented Data 34
3-6 Mixup Augmentation of Labeled, Unlabeled, and Strongly Augmented Data 35
3-7 Consistency Training and Supervised Training 38
3-8 Summary of the Overall Framework 40
4. Experimental Design and Analysis 44
4-1 Preliminaries 44
4-2 Main Experiments 46
4-2-1 Experiment 1: Performance Comparison with Current Semi-supervised Text Classification Studies 46
4-2-2 Experiment 2: Ablation Study 50
4-3 Experiment Summary 52
5. Conclusions and Future Research Directions 54
5-1 Conclusions 54
5-2 Research Limitations 54
5-3 Future Research Directions 55
References 56
References:
[1] 黃晉豪 (2020). Using an attention mechanism to assist data augmentation in text classification (以注意力機制輔助文本分類中的資料增益) [Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan, Taiwan]. http://ir.lib.ncu.edu.tw/handle/987654321/84051#.YOdMz-gzaUk
[2] Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. (2019). MixMatch: A Holistic Approach to Semi-Supervised Learning. arXiv:1905.02249 [cs, stat]. http://arxiv.org/abs/1905.02249
[3] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., & Askell, A. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
[4] Chang, M.-W., Ratinov, L.-A., Roth, D., & Srikumar, V. (2008). Importance of Semantic Representation: Dataless Classification. AAAI, 2, 830–835.
[5] Chapelle, O., Scholkopf, B., & Zien, A. (2009). Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews]. IEEE Transactions on Neural Networks, 20(3), 542–542.
[6] Chawla, N. V., & Karakoulas, G. (2005). Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains. Journal of Artificial Intelligence Research, 23, 331–366. https://doi.org/10.1613/jair.1509
[7] Chen, J., Yang, Z., & Yang, D. (2020). Mixtext: Linguistically-informed interpolation of hidden space for semi-supervised text classification. arXiv preprint arXiv:2004.12239.
[8] Coulombe, C. (2018). Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs. arXiv:1812.04718 [cs]. http://arxiv.org/abs/1812.04718
[9] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[10] Grandvalet, Y., & Bengio, Y. (2005). Semi-supervised learning by entropy minimization. CAP, 367, 281–296.
[11] Guo, H., Mao, Y., & Zhang, R. (2019). Augmenting Data with Mixup for Sentence Classification: An Empirical Study. arXiv:1905.08941 [cs]. http://arxiv.org/abs/1905.08941
[12] Gururangan, S., Dang, T., Card, D., & Smith, N. A. (2019). Variational Pretraining for Semi-supervised Text Classification. arXiv:1906.02242 [cs]. http://arxiv.org/abs/1906.02242
[13] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., & Kingsbury, B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
[14] Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
[15] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. International conference on machine learning, 448–456.
[16] Jawahar, G., Sagot, B., & Seddah, D. (2019). What does BERT learn about the structure of language? ACL 2019-57th Annual Meeting of the Association for Computational Linguistics.
[17] Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., & Liu, Q. (2020). TinyBERT: Distilling BERT for Natural Language Understanding. arXiv:1909.10351 [cs]. http://arxiv.org/abs/1909.10351
[18] Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
[19] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
[20] Krogh, A., & Hertz, J. A. (1992). A simple weight decay can improve generalization. Advances in neural information processing systems, 950–957.
[21] Kumar, V., Choudhary, A., & Cho, E. (2021). Data Augmentation using Pre-trained Transformer Models. arXiv:2003.02245 [cs]. http://arxiv.org/abs/2003.02245
[22] Laine, S., & Aila, T. (2016). Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242.
[23] Lee, D.-H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. Workshop on challenges in representation learning, ICML, 3(2), 896.
[24] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
[25] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
[26] Loper, E., & Bird, S. (2002). Nltk: The natural language toolkit. arXiv preprint cs/0205028.
[27] Luque, F. M. (2019). Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis. arXiv:1909.11241 [cs]. http://arxiv.org/abs/1909.11241
[28] Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, 142–150.
[29] Mendes, P. N., Jakob, M., & Bizer, C. (2012). DBpedia: A multilingual cross-domain knowledge base.
[30] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[31] Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748
[32] Miyato, T., Maeda, S., Koyama, M., & Ishii, S. (2018). Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8), 1979–1993.
[33] Oliver, A., Odena, A., Raffel, C., Cubuk, E. D., & Goodfellow, I. J. (2019). Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. arXiv:1804.09170 [cs, stat]. http://arxiv.org/abs/1804.09170
[34] Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., Grangier, D., & Auli, M. (2019). fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038.
[35] Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://doi.org/10.3115/v1/D14-1162
[36] Prechelt, L. (1998). Early stopping - but when? In Neural Networks: Tricks of the Trade (pp. 55–69). Springer.
[37] Qiu, X., Sun, T., Xu, Y., Shao, Y., Dai, N., & Huang, X. (2020). Pre-trained Models for Natural Language Processing: A Survey. Science China Technological Sciences, 63(10), 1872–1897. https://doi.org/10.1007/s11431-020-1647-3
[38] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
[39] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. arXiv:1506.02640 [cs]. http://arxiv.org/abs/1506.02640
[40] Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks.
[41] Sajjadi, M., Javanmardi, M., & Tasdizen, T. (2016). Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Advances in neural information processing systems, 29, 1163–1171.
[42] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929–1958.
[43] Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014). DeepFace: Closing the Gap to Human-Level Performance in Face Verification. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 1701–1708. https://doi.org/10.1109/CVPR.2014.220
[44] Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv preprint arXiv:1703.01780.
[45] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Lukasz, & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 5998–6008.
[46] Wang, W. Y., & Yang, D. (2015). That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2557–2563. https://doi.org/10.18653/v1/D15-1306
[47] Wei, J., & Zou, K. (2019). Eda: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196.
[48] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger, S., … Rush, A. M. (2020). HuggingFace’s Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 [cs]. http://arxiv.org/abs/1910.03771
[49] Xie, Q., Dai, Z., Hovy, E., Luong, M.-T., & Le, Q. V. (2019). Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848.
[50] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32.
[51] Yu, A. W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., & Le, Q. V. (2018). QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv:1804.09541 [cs]. http://arxiv.org/abs/1804.09541
[52] Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
[53] Zhang, X., Zhao, J., & LeCun, Y. (2016). Character-level Convolutional Networks for Text Classification. arXiv:1509.01626 [cs]. http://arxiv.org/abs/1509.01626
[54] Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 3(1), 1–130.
Advisor: 林熙禎 (Si-Jin Lin)    Date of Approval: 2021-8-11
