Theses and Dissertations: Detailed Record 111521099




Name: Yu-Wen Tzeng (曾郁雯)    Department: Electrical Engineering
Thesis Title: Contextual Embeddings Enhanced Heterogeneous Graph Attention Networks for Multi-Label Classification of Psychological Counseling Texts (上下文嵌入增強異質圖注意力網路模型於心理諮詢文本多標籤分類)
Related Theses:
★ Hierarchical Cluster Attention-Based Encoder-Decoders for Multi-Answer Summarization of Medical Questions
★ Exploring Gated Graph Neural Networks for Sentiment Intensity Prediction of Psychological Counseling Texts
Full Text: viewable in the system after 2029-07-23 (currently embargoed)
Abstract (Chinese): Multi-label text classification (MLTC) predicts one or more predefined labels for each piece of text. Because implicit relationships exist among the labels and their correlations are difficult to fully exploit, current models generally perform poorly on this task. This study combines graph neural networks (GNNs) with the Transformer model, constructing the relationships between content words and labels as a heterogeneous graph. Leveraging the graph-structure update capability of GNNs and the self-attention mechanism of the Transformer, we propose the Contextual Embeddings Enhanced Heterogeneous Graph Attention Networks (CE-HeterGAT) model, which aims to strengthen text feature representations and improve multi-label classification performance. We build a heterogeneous graph over content words and labels using five types of edges. The graph nodes comprise content words, label words, and one virtual node; the edge types comprise sequential relationships between words, dependency-syntax relationships between words, semantic relationships between words and label words, co-occurrence relationships between label words, and edges between the virtual node and all words. A graph attention network then learns node representations over the heterogeneous graph. In parallel, BERT (Bidirectional Encoder Representations from Transformers) captures the contextual relationships of the text. Finally, the two kinds of features pass through our attention decoder to obtain node representations for the whole text and predict the final labels.
We constructed two Chinese multi-label text classification datasets for psychological counseling. In total, we collected 4,473 online counseling posts and manually annotated the topics and events in their content, yielding the Psycho-MLTopic dataset with 11 topic labels and the Psycho-MLEvent dataset with 52 event labels. Experiments and performance evaluations show that our proposed CE-HeterGAT outperforms other related models (TextCNN, Bi-LSTM, BERT, GCN, GAT, TextGCN, SAT, UGformer, Exphormers), with especially significant improvements in the Macro-F1 score, demonstrating that a heterogeneous graph structure and graph neural networks combined with contextual information can effectively improve multi-label text classification performance.
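Since the gains above are reported mainly in Macro-F1, a minimal Python sketch may help clarify the metric (the label matrices below are illustrative toy values, not drawn from Psycho-MLTopic or Psycho-MLEvent; scikit-learn is assumed to be available):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label ground truth and predictions: 4 texts, 3 labels,
# in binary indicator format (rows = texts, columns = labels).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Macro-F1 averages the per-label F1 scores with equal weight per label,
# so infrequent labels count as much as frequent ones.
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))  # (1 + 2/3 + 2/3) / 3 ≈ 0.778
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))  # pools all decisions: 0.8
```

Because every label contributes equally to the average, Macro-F1 is especially sensitive to the many low-frequency labels in a 52-label event set, which makes it a demanding metric for this task.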
Abstract (English): Multi-Label Text Classification (MLTC) is the task of assigning one or more predefined labels to a given text. Because the implicit relationships among labels are difficult to discover, existing methods still fall short of fully exploiting label correlations. This study explores the combination of Graph Neural Networks (GNNs) and a Transformer model, using a heterogeneous graph to represent the relationships between content words and labels. Leveraging the graph-structure processing capabilities of GNNs and the self-attention mechanism of Transformers, we propose the Contextual Embeddings Enhanced Heterogeneous Graph Attention Networks (CE-HeterGAT) model, which aims to enhance text feature representations and improve multi-label classification performance. We construct a heterogeneous graph comprising content nodes, label nodes, and a virtual node, connected by five edge types: 1) sequential relationships between content words; 2) syntactic relationships between content words; 3) semantic relationships between content words and label words; 4) conditional co-occurrence relationships between label words; and 5) edges between the virtual node and all content words. A graph attention network is then used to learn the node representations of the heterogeneous graph. Simultaneously, the BERT Transformer captures the contextual relationships within the texts. Finally, we use a cross-attention decoder to obtain the fused node representations and predict the final label classifications.
We collected 4,473 online psychological counseling texts and manually annotated them with multiple labels, resulting in the Psycho-MLTopic dataset with 11 topic labels and the Psycho-MLEvent dataset with 52 event labels. Experimental results and performance evaluations show that our proposed CE-HeterGAT model outperforms other related models (TextCNN, Bi-LSTM, BERT, GCN, GAT, TextGCN, SAT, UGformer, Exphormers), with especially significant improvements in the Macro-F1 score, demonstrating that combining a heterogeneous graph structure with contextual information in graph neural networks effectively enhances multi-label text classification performance.
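To make the fusion described above concrete, the following is a minimal PyTorch-style sketch under stated assumptions, not the thesis implementation: the five edge types are collapsed into one adjacency matrix, a random tensor stands in for BERT's contextual token embeddings, the graph attention layer is single-head, and all module names (GATLayer, CEHeterGATSketch), dimensions, and the mean-pooling step are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Simplified single-head graph attention layer (Velickovic et al., 2018)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        z = self.W(h)
        N = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(N, N, -1),
                           z.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))   # (N, N) attention logits
        e = e.masked_fill(adj == 0, float("-inf"))    # attend only along edges
        return F.elu(torch.softmax(e, dim=-1) @ z)

class CEHeterGATSketch(nn.Module):
    """Toy fusion of graph node features with contextual (BERT-like) embeddings."""
    def __init__(self, dim: int, num_labels: int):
        super().__init__()
        self.gat = GATLayer(dim, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, node_feats, adj, contextual_embeds):
        graph_repr = self.gat(node_feats, adj).unsqueeze(0)   # (1, N, dim)
        # Graph nodes query the contextual token embeddings (cross-attention).
        fused, _ = self.cross_attn(graph_repr, contextual_embeds, contextual_embeds)
        doc_repr = fused.mean(dim=1)                          # pool node representations
        return torch.sigmoid(self.classifier(doc_repr))       # per-label probabilities

# Toy usage: 6 content-word nodes + 3 label nodes + 1 virtual node = 10 nodes.
N, dim, num_labels = 10, 64, 11
adj = torch.eye(N)             # self-loops
adj[-1, :] = 1.0               # virtual-node edges: virtual node to every node
adj[:, -1] = 1.0               # ...and every node back to the virtual node
adj[0, 1] = adj[1, 0] = 1.0    # e.g., one sequential edge between adjacent words
ctx = torch.randn(1, 20, dim)  # stand-in for BERT token embeddings (20 tokens)
model = CEHeterGATSketch(dim, num_labels)
print(model(torch.randn(N, dim), adj, ctx).shape)  # torch.Size([1, 11])
```

In the actual model, the five edge types are treated separately rather than merged into a single adjacency matrix, and the thesis's own attention decoder replaces the generic multi-head cross-attention used here.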
Keywords (Chinese): ★ multi-label classification
★ heterogeneous graph
★ graph attention networks
★ contextual embeddings
★ psychological counseling
Keywords (English): ★ multi-label text classification
★ heterogeneous graph
★ graph attention networks
★ contextual embeddings
★ psychological counseling
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Chapter Overview
Chapter 2: Related Work
2-1 Multi-Label Text Classification
2-2 Word Embeddings
2-3 Neural Networks
2-4 Graph Neural Networks
2-5 Combining Graph Neural Networks with Transformers
Chapter 3: Methodology
3-1 System Architecture
3-2 Heterogeneous Graph Construction
3-2-1 Sequential Edges
3-2-2 Syntactic Edges
3-2-3 Co-occurrence Edges
3-2-4 Semantic Edges
3-2-5 Virtual Node Edges
3-3 Heterogeneous Graph Attention Network
3-4 Text Encoder
3-5 Attention Decoder
3-6 Classification Layer
Chapter 4: Experiments and Performance Evaluation
4-1 Dataset Construction
4-2 Evaluation Metrics
4-3 Experimental Settings
4-4 Model Comparison
4-5 Ablation Study
4-6 Graph Attention Network Analysis
4-7 In-Depth Analysis
4-8 Error Analysis
Chapter 5: Conclusions
5-1 Conclusions
5-2 Future Work
References
Appendix
Appendix 1: Complete Text Examples
Appendix 2: Classification Label Categories
Appendix 3: High-, Medium-, and Low-Frequency Label Categories
References
Berger, M. J. (2015). Large scale multi-label text classification with semantic word vectors. Technical report, Stanford University.
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 1757-1771
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint, arXiv:1312.6203.
Chen, D., O’Bray, L., & Borgwardt, K. (2022). Structure-aware transformer for graph representation learning. Proceedings of the 39th International Conference on Machine Learning, 3469-3489
Clare, A., & King, R. D. (2001). Knowledge discovery in multi-label phenotype data. European Conference on Principles of Data Mining and Knowledge Discovery, 42-53
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 37-46
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 273-297
Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. Proceedings of the 30th International Conference on Neural Information Processing Systems, 3844–3852
Deng, Y. C., Tsai, C. Y., Wang, Y. R., Chen, S. H., & Lee, L. H. (2022). Predicting Chinese Phrase-Level Sentiment Intensity in Valence-Arousal Dimensions With Linguistic Dependency Features. IEEE Access, 126612-126620
Deng, Y. C., Wang, Y. R., Chen, S. H., & Lee, L. H. (2023). Toward Transformer Fusions for Chinese Sentiment Intensity Prediction in Valence-Arousal Dimensions. IEEE Access, 109974-109982
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171-4186
Elisseeff, A., & Weston, J. (2001). A kernel method for multi-labelled classification. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 681–687
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 378
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 1735-1780
Holmes, T. H., & Rahe, R. H. (1967). The social readjustment rating scale. Journal of Psychosomatic Research, 213-218
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. In M. Lapata, P. Blunsom, & A. Koller, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 427-431
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In A. Moschitti, B. Pang, & W. Daelemans, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1746-1751
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. arXiv preprint, arXiv:1609.02907.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 84–90
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159-174
Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P. S., & He, L. (2022). A Survey on Text Classification: From Traditional to Deep Learning. ACM Transactions on Intelligent Systems and Technology, Article 31
Li, Y., Tarlow, D., Brockschmidt, M., & Zemel, R. (2015). Gated graph sequence neural networks. arXiv preprint, arXiv:1511.05493.
Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classification with multi-task learning. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2873–2879
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint, arXiv:1301.3781.
Nguyen, D. Q., Nguyen, T. D., & Phung, D. (2022). Universal Graph Transformer Self-Attention Networks. Companion Proceedings of the Web Conference 2022, 193–196
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543
Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. The Morgan Kaufmann Series in Machine Learning.
Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., & Beaini, D. (2022). Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 14501-14515
Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 333-359
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2009). The Graph Neural Network Model. IEEE Transactions on Neural Networks, 61-80
Shirzad, H., Velingker, A., Venkatachalam, B., Sutherland, D. J., & Sinop, A. K. (2023). Exphormer: Sparse transformers for graphs. Proceedings of the 40th International Conference on Machine Learning, 31613-31632
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining Multi-label Data. In O. Maimon & L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook, 667-685
Van Nguyen, M., Lai, V. D., Veyseh, A. P. B., & Nguyen, T. H. (2021). Trankit: A light-weight transformer-based toolkit for multilingual natural language processing. arXiv preprint, arXiv:2101.03289.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30 (NIPS 2017).
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2018). Graph Attention Networks. arXiv preprint, arXiv:1710.10903.
Yao, L., Mao, C., & Luo, Y. (2019). Graph convolutional networks for text classification. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 7370-7377
Zhang, M.-L., & Zhou, Z.-H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2038-2048
Advisors: Kuo-Kai Shyu, Lung-Hao Lee (徐國鎧、李龍豪)    Approval Date: 2024-07-26