References
Agibetov, A., Blagec, K., Xu, H., Samwald, M., 2018. Fast and scalable neural embedding models for biomedical sentence classification. BMC Bioinformatics 19, 541. https://doi.org/10.1186/s12859-018-2496-4
Sun, A., Lim, E.-P., 2001. Hierarchical text classification and evaluation, in: Proceedings 2001 IEEE International Conference on Data Mining. pp. 521–528. https://doi.org/10.1109/ICDM.2001.989560
Arora, S., Liang, Y., Ma, T., 2019. A simple but tough-to-beat baseline for sentence embeddings. Presented at the 5th International Conference on Learning Representations, ICLR 2017.
Badimala, P., Mishra, C., Modam Venkataramana, R.K., Bukhari, S., Dengel, A., 2019. A Study of Various Text Augmentation Techniques for Relation Classification in Free Text. pp. 360–367. https://doi.org/10.5220/0007311003600367
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T., 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5, 135–146. https://doi.org/10.1162/tacl_a_00051
Bouthillier, X., Konda, K., Vincent, P., Memisevic, R., 2016. Dropout as data augmentation. arXiv:1506.08700 [cs, stat].
Bowman, S.R., Angeli, G., Potts, C., Manning, C.D., 2015. A large annotated corpus for learning natural language inference, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2015, Association for Computational Linguistics, Lisbon, Portugal, pp. 632–642. https://doi.org/10.18653/v1/D15-1075
Cambria, E., Poria, S., Gelbukh, A., Thelwall, M., 2017. Sentiment Analysis Is a Big Suitcase. IEEE Intell. Syst. 32, 74–80. https://doi.org/10.1109/MIS.2017.4531228
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., St. John, R., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Strope, B., Kurzweil, R., 2018a. Universal Sentence Encoder for English, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Brussels, Belgium, pp. 169–174. https://doi.org/10.18653/v1/D18-2029
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., St. John, R., Constant, N., Guajardo-Céspedes, M., Yuan, S., Tar, C., Sung, Y., Strope, B., Kurzweil, R., 2018b. Universal Sentence Encoder. arXiv:1803.11175 [cs].
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A., 2014. Return of the Devil in the Details: Delving Deep into Convolutional Nets. arXiv:1405.3531 [cs].
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078 [cs, stat].
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A., 2017a. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2017, Association for Computational Linguistics, Copenhagen, Denmark, pp. 670–680. https://doi.org/10.18653/v1/D17-1070
Das, A., Yenala, H., Chinnakotla, M., Shrivastava, M., 2016. Together we stand: Siamese Networks for Similar Question Retrieval, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Presented at the ACL 2016, Association for Computational Linguistics, Berlin, Germany, pp. 378–387. https://doi.org/10.18653/v1/P16-1036
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K., 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs].
Edunov, S., Ott, M., Auli, M., Grangier, D., 2018. Understanding Back-Translation at Scale, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, pp. 489–500. https://doi.org/10.18653/v1/D18-1045
Fedus, W., Goodfellow, I., Dai, A., 2018. MaskGAN: Better Text Generation via Filling in the ____.
Fernández Anta, A., Núñez Chiroque, L., Morere, P., Santos Méndez, A., 2013. Sentiment analysis and topic detection of Spanish tweets: a comparative study of NLP techniques.
Garay-Maestre, U., Gallego, A.-J., Calvo-Zaragoza, J., 2019. Data Augmentation via Variational Auto-Encoders, in: Vera-Rodriguez, R., Fierrez, J., Morales, A. (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 29–37. https://doi.org/10.1007/978-3-030-13469-3_4
Goodfellow, I., Shlens, J., Szegedy, C., 2015. Explaining and Harnessing Adversarial Examples, in: International Conference on Learning Representations.
Han, D., Liu, Q., Fan, W., 2018. A new image classification method using CNN transfer learning and web data augmentation. Expert Systems with Applications 95, 43–56. https://doi.org/10.1016/j.eswa.2017.11.028
Han, S., Gao, J., Ciravegna, F., 2019. Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model With Large-Scale Credibility Corpus.
He, Z., Xie, L., Chen, X., Zhang, Y., Wang, Y., Tian, Q., 2019. Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data. arXiv:1909.09148 [cs, stat].
Hou, Y., Liu, Y., Che, W., Liu, T., 2018. Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding, in: Proceedings of the 27th International Conference on Computational Linguistics. Presented at the COLING 2018, Association for Computational Linguistics, Santa Fe, New Mexico, USA, pp. 1234–1245.
Howard, A.G., 2013. Some Improvements on Deep Convolutional Neural Network Based Image Classification. arXiv:1312.5402 [cs].
Howard, J., Ruder, S., 2018. Universal Language Model Fine-tuning for Text Classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Presented at the ACL 2018, Association for Computational Linguistics, Melbourne, Australia, pp. 328–339. https://doi.org/10.18653/v1/P18-1031
Joachims, T., 1998. Text categorization with Support Vector Machines: Learning with many relevant features, in: Nédellec, C., Rouveirol, C. (Eds.), Machine Learning: ECML-98, Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 137–142. https://doi.org/10.1007/BFb0026683
Johnstone, I.M., Titterington, D.M., 2009. Statistical challenges of high-dimensional data. Proc. R. Soc. A 367, 4237–4253. https://doi.org/10.1098/rsta.2009.0159
Kafle, K., Yousefhussien, M.A., Kanan, C., 2017. Data Augmentation for Visual Question Answering, in: INLG. https://doi.org/10.18653/v1/w17-3529
Kim, Y., 2014. Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882 [cs].
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kiros, J., Chan, W., 2018. InferLite: Simple Universal Sentence Representations from Natural Language Inference Data, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2018, Association for Computational Linguistics, Brussels, Belgium, pp. 4868–4874. https://doi.org/10.18653/v1/D18-1524
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Torralba, A., Urtasun, R., Fidler, S., 2015. Skip-thought Vectors, in: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15. MIT Press, Cambridge, MA, USA, pp. 3294–3302.
Kobayashi, S., 2018. Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations. arXiv:1805.06201 [cs].
Konno, T., Iwazume, M., 2018. Icing on the Cake: An Easy and Quick Post-Learnig Method You Can Try After Deep Learning. arXiv:1807.06540 [cs, stat].
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems. pp. 1097–1105.
Le, Q., Mikolov, T., 2014. Distributed Representations of Sentences and Documents, in: Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14. JMLR.org, pp. II-1188–II-1196.
Li, J., Jia, R., He, H., Liang, P., 2018. Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Presented at the NAACL-HLT 2018, Association for Computational Linguistics, New Orleans, Louisiana, pp. 1865–1874. https://doi.org/10.18653/v1/N18-1169
Liu, P., Qiu, X., Huang, X., 2016. Recurrent neural network for text classification with multi-task learning, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16. AAAI Press, New York, New York, USA, pp. 2873–2879.
Liu, P.J., Saleh, M.A., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., Shazeer, N., 2018. Generating Wikipedia by Summarizing Long Sequences.
Lu, S., Zhu, Y., Zhang, W., Wang, J., Yu, Y., 2018. Neural Text Generation: Past, Present and Beyond. arXiv:1803.07133 [cs].
Malandrakis, N., Shen, M., Goyal, A., Gao, S., Sethi, A., Metallinou, A., 2019. Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents. arXiv:1910.03487 [cs, stat].
Manning, C.D., Schütze, H., 1999. Foundations of statistical natural language processing. MIT Press.
Mariani, G., Scheidegger, F., Istrate, R., Bekas, C., Malossi, C., 2018. BAGAN: Data Augmentation with Balancing GAN. arXiv:1803.09655 [cs, stat].
McCann, B., Bradbury, J., Xiong, C., Socher, R., 2017. Learned in Translation: Contextualized Word Vectors, in: NIPS.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J., 2013. Distributed Representations of Words and Phrases and Their Compositionality, in: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13. Curran Associates Inc., USA, pp. 3111–3119.
Miller, G.A., 1995. WordNet: a lexical database for English. Commun. ACM 38, 39–41. https://doi.org/10.1145/219717.219748
Mitra, T., Gilbert, E., 2015. Credbank: A large-scale social media corpus with associated credibility annotations, in: Ninth International AAAI Conference on Web and Social Media.
Moreno-Barea, F.J., Strazzera, F., Jerez, J.M., Urda, D., Franco, L., 2018. Forward Noise Adjustment Scheme for Data Augmentation, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, Bangalore, India, pp. 728–734. https://doi.org/10.1109/SSCI.2018.8628917
Nicosia, M., Moschitti, A., 2017. Learning Contextual Embeddings for Structural Semantic Similarity using Categorical Information, in: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Presented at the CoNLL 2017, Association for Computational Linguistics, Vancouver, Canada, pp. 260–270. https://doi.org/10.18653/v1/K17-1027
Pagliardini, M., Gupta, P., Jaggi, M., 2018. Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Presented at the NAACL-HLT 2018, Association for Computational Linguistics, New Orleans, Louisiana, pp. 528–540. https://doi.org/10.18653/v1/N18-1049
Pang, B., Lee, L., 2005. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales, in: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05). Presented at the ACL 2005, Association for Computational Linguistics, Ann Arbor, Michigan, pp. 115–124. https://doi.org/10.3115/1219840.1219855
Pang, B., Lee, L., 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, in: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04). Presented at the ACL 2004, Barcelona, Spain, pp. 271–278. https://doi.org/10.3115/1218955.1218990
Papadaki, M., Chalkidis, I., Michos, A., 2017. Data Augmentation Techniques for Legal Text Analytics.
Pennington, J., Socher, R., Manning, C., 2014. GloVe: Global Vectors for Word Representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp. 1532–1543. https://doi.org/10.3115/v1/D14-1162
Pérez, L.A., 2019. The Effect of Embeddings on SQuAD v2.0.
Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L., 2018. Deep contextualized word representations. arXiv:1802.05365 [cs].
Polson, N.G., Scott, S.L., 2011. Data augmentation for support vector machines. Bayesian Analysis 6, 1–23.
Radford, A., 2018. Improving Language Understanding by Generative Pre-Training.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 9.
Reimers, N., Gurevych, I., 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Presented at the EMNLP-IJCNLP 2019, Association for Computational Linguistics, Hong Kong, China, pp. 3980–3990. https://doi.org/10.18653/v1/D19-1410
Rong, X., 2014. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
Schuller, B.W., 2018. Data Augmentation and Deep Learning for Hate Speech Detection.
Sennrich, R., Haddow, B., Birch, A., 2016. Improving Neural Machine Translation Models with Monolingual Data, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pp. 86–96. https://doi.org/10.18653/v1/P16-1009
Set (mathematics), 2020. Wikipedia.
Shen, T., Lei, T., Barzilay, R., Jaakkola, T., 2017. Style Transfer from Non-Parallel Text by Cross-Alignment. arXiv:1705.09655 [cs].
Shorten, C., Khoshgoftaar, T.M., 2019. A survey on Image Data Augmentation for Deep Learning. J Big Data 6, 60. https://doi.org/10.1186/s40537-019-0197-0
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C., 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2013, Association for Computational Linguistics, Seattle, Washington, USA, pp. 1631–1642.
Takahashi, N., Gygli, M., Pfister, B., Van Gool, L., 2016. Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection. arXiv:1604.07160 [cs].
Tang, D., Qin, B., Liu, T., 2015. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pp. 1422–1432.
Turney, P.D., Pantel, P., 2010. From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research 37, 141–188.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention Is All You Need. arXiv:1706.03762 [cs].
Wang, S., Manning, C., 2012. Baselines and Bigrams: Simple, Good Sentiment and Topic Classification, in: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Presented at the ACL 2012, Association for Computational Linguistics, Jeju Island, Korea, pp. 90–94.
Wang, W.Y., Yang, D., 2015. That’s So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pp. 2557–2563. https://doi.org/10.18653/v1/D15-1306
Wei, J., Zou, K., 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. arXiv:1901.11196 [cs].
Wiebe, J., Wilson, T., Cardie, C., 2005. Annotating Expressions of Opinions and Emotions in Language. Language Res Eval 39, 165–210. https://doi.org/10.1007/s10579-005-7880-9
Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D., 2016. Understanding Data Augmentation for Classification: When to Warp?, in: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, Gold Coast, Australia, pp. 1–6. https://doi.org/10.1109/DICTA.2016.7797091
Wu, R., Yan, S., Shan, Y., Dang, Q., Sun, G., 2015. Deep image: Scaling up image recognition. arXiv preprint arXiv:1501.02876.
Xie, Q., Dai, Z., Hovy, E., Luong, M.-T., Le, Q.V., 2019. Unsupervised Data Augmentation for Consistency Training. arXiv:1904.12848 [cs, stat].
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V., 2019. XLNet: Generalized autoregressive pretraining for language understanding, in: Advances in Neural Information Processing Systems. pp. 5753–5763.
Young, T., Hazarika, D., Poria, S., Cambria, E., 2018. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine 13, 55–75.
Yu, A.W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., Le, Q.V., 2018. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv:1804.09541 [cs].
Zhang, X., Zhao, J., LeCun, Y., 2015a. Character-level Convolutional Networks for Text Classification, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 649–657.
Zhang, X., Zhao, J., LeCun, Y., 2015b. Character-level convolutional networks for text classification, in: Advances in Neural Information Processing Systems. pp. 649–657.