References
Ah-Pine, J., Morales, E.P.S., 2016. A Study of Synthetic Oversampling for Twitter Imbalanced Sentiment Analysis.
Akkaradamrongrat, S., Kachamas, P., Sinthupinyo, S., 2019. Text Generation for Imbalanced Text Classification, in: 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, Chonburi, Thailand, pp. 181–186. https://doi.org/10.1109/JCSSE.2019.8864181
Akosa, J., 2017. Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data.
Ali, A., Shamsuddin, S.M., Ralescu, A.L., 2015. Classification with class imbalance problem: A Review.
Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein Generative Adversarial Networks, in: International Conference on Machine Learning. pp. 214–223.
Bahuleyan, H., Mou, L., Vechtomova, O., Poupart, P., 2018. Variational Attention for Sequence-to-Sequence Models. arXiv:1712.08207 [cs].
Bengio, S., Vinyals, O., Jaitly, N., Shazeer, N., 2015. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.
Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C., 2019. MixMatch: A Holistic Approach to Semi-Supervised Learning. arXiv:1905.02249 [cs, stat].
Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Jozefowicz, R., Bengio, S., 2016. Generating Sentences from a Continuous Space. arXiv:1511.06349 [cs].
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. CRC Press.
Caccia, M., Caccia, L., Fedus, W., Larochelle, H., Pineau, J., Charlin, L., 2020. Language GANs Falling Short. arXiv:1811.02549 [cs].
Chawla, N.V., 2005. Data Mining for Imbalanced Datasets: An Overview, in: Maimon, O., Rokach, L. (Eds.), Data Mining and Knowledge Discovery Handbook. Springer US, Boston, MA, pp. 853–867. https://doi.org/10.1007/0-387-25465-X_40
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357. https://doi.org/10.1613/jair.953
Che, T., Li, Y., Zhang, R., Hjelm, R.D., Li, W., Song, Y., Bengio, Y., 2017. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983 [cs].
Chen, E., Lin, Y., Xiong, H., Luo, Q., Ma, H., 2011. Exploiting probabilistic topic models to improve text categorization under class imbalance. Information Processing & Management 47, 202–214. https://doi.org/10.1016/j.ipm.2010.07.003
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp. 1724–1734. https://doi.org/10.3115/v1/D14-1179
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs].
Cífka, O., Severyn, A., Alfonseca, E., Filippova, K., 2018. Eval all, trust a few, do wrong to none: Comparing sentence generation models.
Colah’s blog, 2015. Understanding LSTM Networks. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
d’Autume, C. de M., Rosca, M., Rae, J., Mohamed, S., 2020. Training language GANs from Scratch. arXiv:1905.09922 [cs, stat].
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K., 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs].
Esuli, A., Sebastiani, F., 2006. SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining.
Frank, E., Bouckaert, R.R., 2006. Naive Bayes for Text Classification with Unbalanced Classes, in: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (Eds.), Knowledge Discovery in Databases: PKDD 2006. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 503–510. https://doi.org/10.1007/11871637_49
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification. Neurocomputing 321, 321–331. https://doi.org/10.1016/j.neucom.2018.09.013
Ganganwar, V., 2012. An overview of classification algorithms for imbalanced datasets 2, 6.
Ger, S., Klabjan, D., 2019. Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification. arXiv:1901.02514 [cs, stat].
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative Adversarial Nets.
Google Code, 2016. word2vec. https://code.google.com/archive/p/word2vec/
Guo, J., Lu, S., Cai, H., Zhang, W., Yu, Y., Wang, J., 2017. Long Text Generation via Adversarial Training with Leaked Information. arXiv:1709.08624 [cs].
Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G., 2008. On the Class Imbalance Problem, in: 2008 Fourth International Conference on Natural Computation. IEEE, Jinan, Shandong, China, pp. 192–201. https://doi.org/10.1109/ICNC.2008.871
He, H., Bai, Y., Garcia, E.A., Li, S., 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, Hong Kong, China, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
Han, H., Wang, W.-Y., Mao, B.-H., 2005. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning, in: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (Eds.), Advances in Intelligent Computing, Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 878–887. https://doi.org/10.1007/11538059_91
Harris, Z.S., 1954. Distributional Structure. WORD 10, 146–162. https://doi.org/10.1080/00437956.1954.11659520
Hinton, G.E., Osindero, S., Teh, Y.-W., 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation 18, 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527
Hu, F., Li, H., 2013. A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE. https://doi.org/10.1155/2013/694809
Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., Xing, E.P., 2018. Toward Controlled Generation of Text. arXiv:1703.00955 [cs, stat].
Huszár, F., 2015. How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? arXiv:1511.05101 [cs, math, stat].
Ibrahim, M., Torki, M., El-Makky, N., 2018. Imbalanced Toxic Comments Classification Using Data Augmentation and Deep Learning, in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, Orlando, FL, pp. 875–878. https://doi.org/10.1109/ICMLA.2018.00141
Ioffe, S., Szegedy, C., 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 [cs].
Jiang, H., 2016. Sentiment Analysis on Imbalanced Airline Data.
Joachims, T., 1998. Text categorization with Support Vector Machines: Learning with many relevant features, in: Nédellec, C., Rouveirol, C. (Eds.), Machine Learning: ECML-98. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 137–142. https://doi.org/10.1007/BFb0026683
Jolicoeur-Martineau, A., 2018. The relativistic discriminator: a key element missing from standard GAN. arXiv:1807.00734 [cs, stat].
Karras, T., Aila, T., Laine, S., Lehtinen, J., 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196 [cs, stat].
Kawthekar, P., Rewari, R., Bhooshan, S., 2017. Evaluating Generative Models for Text Generation.
Kingma, D.P., Mohamed, S., Jimenez Rezende, D., Welling, M., 2014. Semi-supervised Learning with Deep Generative Models, in: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems 27. Curran Associates, Inc., pp. 3581–3589.
Kingma, D.P., Welling, M., 2014. Auto-Encoding Variational Bayes. arXiv:1312.6114 [cs, stat].
Kobayashi, S., 2018. Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations. arXiv:1805.06201 [cs].
Kotsiantis, S.B., Pintelas, P.E., 2003. Mixture of Expert Agents for Handling Imbalanced Data Sets. ANNALS OF MATHEMATICS 1, 10.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet Classification with Deep Convolutional Neural Networks, in: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp. 1097–1105.
Kubat, M., Holte, R.C., Matwin, S., Kohavi, R., Provost, F., 1998. Machine learning for the detection of oil spills in satellite radar images, in: Machine Learning. pp. 195–215.
Kubat, M., Matwin, S., 1997. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, in: Proceedings of the Fourteenth International Conference on Machine Learning. pp. 179–186.
Li, G., Wang, J., Zheng, Y., Franklin, M.J., 2016. Crowdsourced Data Management: A Survey.
Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., Jurafsky, D., 2017. Adversarial Learning for Neural Dialogue Generation. arXiv:1701.06547 [cs].
Li, Y., Sun, G., Zhu, Y., 2010. Data Imbalance Problem in Text Classification, in: 2010 Third International Symposium on Information Processing (ISIP). IEEE, Qingdao, Shandong, China, pp. 301–305. https://doi.org/10.1109/ISIP.2010.47
Lin, K., Li, D., He, X., Zhang, Z., Sun, M., 2017. Adversarial Ranking for Language Generation.
Liu, Y., Loh, H.T., Sun, A., 2009. Imbalanced text classification: A term weighting approach. Expert Systems with Applications 36, 690–701. https://doi.org/10.1016/j.eswa.2007.10.042
Longadge, M.R., Dongre, S.S., Malik, D.L., 2013. Class Imbalance Problem in Data Mining: Review 2, 6.
Lowe, R., Pow, N., Serban, I., Pineau, J., 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, in: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, Prague, Czech Republic, pp. 285–294. https://doi.org/10.18653/v1/W15-4640
Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., Smolley, S.P., 2017. Least Squares Generative Adversarial Networks. arXiv:1611.04076 [cs].
Miao, Z., Li, Y., Wang, X., Tan, W.-C., 2020. Snippext: Semi-supervised Opinion Mining with Augmented Data. arXiv:2002.03049 [cs]. https://doi.org/10.1145/3366423.3380144
Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., Khudanpur, S., 2010. Recurrent Neural Network Based Language Model.
Miller, G.A., 1995. WordNet: a lexical database for English. Commun. ACM 38, 39–41. https://doi.org/10.1145/219717.219748
Mirza, M., Osindero, S., 2014. Conditional Generative Adversarial Nets. arXiv:1411.1784 [cs, stat].
Moreo, A., Esuli, A., Sebastiani, F., 2016. Distributional Random Oversampling for Imbalanced Text Classification, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’16. ACM Press, Pisa, Italy, pp. 805–808. https://doi.org/10.1145/2911451.2914722
Mosolova, A.V., Fomin, V.V., Bondarenko, I.Y., 2018. Text Augmentation for Neural Networks.
Nair, V., Hinton, G.E., 2010. Rectified Linear Units Improve Restricted Boltzmann Machines.
Pennington, J., Socher, R., Manning, C., 2014. Glove: Global Vectors for Word Representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp. 1532–1543. https://doi.org/10.3115/v1/D14-1162
Porter, M., 2006. The Porter Stemming Algorithm. https://tartarus.org/martin/PorterStemmer/
Qiu, S., Xu, B., Zhang, J., Wang, Y., Shen, X., de Melo, G., Long, C., Li, X., 2020. EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks, in: Companion Proceedings of the Web Conference 2020. Presented at WWW ’20: The Web Conference 2020, ACM, Taipei, Taiwan, pp. 249–252. https://doi.org/10.1145/3366424.3383552
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., 2018a. Improving Language Understanding by Generative Pre-Training.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., 2018b. Language Models are Unsupervised Multitask Learners.
Ramos, J., 2013. Using TF-IDF to Determine Word Relevance in Document Queries.
Rosario, R., 2017. A Data Augmentation Approach to Short Text Classification.
Saif, M.A., Medvedev, A.N., Medvedev, M.A., Atanasova, T., 2018. Classification of online toxic comments using the logistic regression and neural networks models. AIP Conference Proceedings 2048, 060011. https://doi.org/10.1063/1.5082126
Sandfort, V., Yan, K., Pickhardt, P.J., Summers, R.M., 2019. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Scientific Reports 9, 1–9. https://doi.org/10.1038/s41598-019-52737-x
Schuster, M., Paliwal, K.K., 1997. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681. https://doi.org/10.1109/78.650093
Semeniuta, S., Severyn, A., Gelly, S., 2019. On Accurate Evaluation of GANs for Language Generation. arXiv:1806.04936 [cs].
Sennrich, R., Haddow, B., Birch, A., 2016. Improving Neural Machine Translation Models with Monolingual Data, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Presented at the ACL 2016, Association for Computational Linguistics, Berlin, Germany, pp. 86–96. https://doi.org/10.18653/v1/P16-1009
Hochreiter, S., Schmidhuber, J., 1997. Long Short-Term Memory. Neural Computation 9, 1735–1780.
Shleifer, S., 2019. Low Resource Text Classification with Backtranslation.
Shorten, C., Khoshgoftaar, T.M., 2019. A survey on Image Data Augmentation for Deep Learning. https://link.springer.com/article/10.1186/s40537-019-0197-0 (accessed 3.5.20).
Silfverberg, M., Wiemerslage, A., Liu, L., Mao, L.J., 2017. Data Augmentation for Morphological Reinflection, in: Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection. Presented at the CoNLL 2017, Association for Computational Linguistics, Vancouver, pp. 90–99. https://doi.org/10.18653/v1/K17-2010
Sun, A., Lim, E.-P., Liu, Y., 2009. On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems 48, 191–201. https://doi.org/10.1016/j.dss.2009.07.011
Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y., 2007. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition 40, 3358–3378. https://doi.org/10.1016/j.patcog.2007.04.009
Sutskever, I., Vinyals, O., Le, Q.V., 2014. Sequence to Sequence Learning with Neural Networks. arXiv:1409.3215 [cs].
Tayyar Madabushi, H., Kochkina, E., Castelle, M., 2019. Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data, in: Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda. Association for Computational Linguistics, Hong Kong, China, pp. 125–134. https://doi.org/10.18653/v1/D19-5018
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention Is All You Need. arXiv:1706.03762 [cs].
Wang, J., Lu, W.F., Loh, H.T., 2012. P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification. Presented at the ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, American Society of Mechanical Engineers Digital Collection, pp. 1089–1098. https://doi.org/10.1115/DETC2011-47313
Wang, X., Sheng, Y., Deng, H., Zhao, Z., 2019. CharCNN-SVM for Chinese Text Datasets Sentiment Classification with Data Augmentation.
Wei, J., Zou, K., 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. arXiv:1901.11196 [cs].
Wu, X., Lv, S., Zang, L., Han, J., Hu, S., 2019. Conditional BERT Contextual Augmentation, in: Rodrigues, J.M.F., Cardoso, P.J.S., Monteiro, J., Lam, R., Krzhizhanovskaya, V.V., Lees, M.H., Dongarra, J.J., Sloot, P.M.A. (Eds.), Computational Science – ICCS 2019. Springer International Publishing, Cham, pp. 84–95. https://doi.org/10.1007/978-3-030-22747-0_7
Xie, Q., Dai, Z., Hovy, E., Luong, M.-T., Le, Q.V., 2019. Unsupervised Data Augmentation for Consistency Training. arXiv:1904.12848 [cs, stat].
Xie, Z., Wang, S.I., Li, J., Levy, D., Nie, A., Jurafsky, D., Ng, A.Y., 2017. Data Noising as Smoothing in Neural Network Language Models.
Xu, J., Ren, X., Lin, J., Sun, X., 2018. DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text. arXiv:1802.01345 [cs].
Yan, X., Yang, J., Sohn, K., Lee, H., 2016. Attribute2Image: Conditional Image Generation from Visual Attributes. arXiv:1512.00570 [cs].
Yang, Z., Hu, Z., Salakhutdinov, R., Berg-Kirkpatrick, T., 2017. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions. arXiv:1702.08139 [cs].
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E., 2016. Hierarchical Attention Networks for Document Classification, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pp. 1480–1489. https://doi.org/10.18653/v1/N16-1174
Ye-hang, Z., 2007. Text tendency categorization method based on class space model [WWW Document]. URL /paper/Text-tendency-categorization-method-based-on-class-Ye-hang/759cdffe9892204a5f8df7856fca5096ca6e9d59 (accessed 3.7.20).
Yu, A.W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., Le, Q.V., 2018. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv:1804.09541 [cs].
Yu, L., Zhang, W., Wang, J., Yu, Y., 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. arXiv:1609.05473 [cs].
Yu, S., Yang, J., Liu, D., Li, R., Zhang, Y., Zhao, S., 2019. Hierarchical Data Augmentation and the Application in Text Classification. IEEE Access 7, 185476–185485. https://doi.org/10.1109/ACCESS.2019.2960263
Zhang, X., Zhao, J., LeCun, Y., 2015. Character-level Convolutional Networks for Text Classification, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 649–657.
Zhang, Y., Gan, Z., Fan, K., Chen, Z., Henao, R., Shen, D., Carin, L., 2017. Adversarial Feature Matching for Text Generation. arXiv:1706.03850 [cs, stat].