
    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93191


    Title: Label Graph Convolutions Enhanced Hypergraph Attention Networks for Chinese Multi-Label Text Classification in the Healthcare Domain (標籤圖卷積增強式超圖注意力網路之中文健康照護文本多重分類)
    Author: Kao, Hao-Chuan (高浩銓)
    Contributor: Department of Electrical Engineering
    Keywords: embedding; graph neural networks; hypergraph neural networks; text classification; health informatics
    Date: 2023-03-10
    Uploaded: 2024-09-19 16:47:13 (UTC+8)
    Publisher: National Central University
    Abstract: Multi-label text classification automatically assigns one or more predefined category labels to a piece of text. Common applications include sentiment analysis, topic detection, and news classification. We propose a Label Graph Convolutions Enhanced Hypergraph Attention Networks (LGC-HyperGAT) model, in which hypergraph attention networks model the relationships between words and sentences in the text, label graph convolutional networks capture the implicit correlations among the labels, and the two networks are then connected to predict the labels of the content. We use two experimental datasets. (1) Chinese healthcare dataset (HealthDoc): we crawled health-related news, articles, and blogs from the web. After the text was preprocessed, three undergraduate students were trained to annotate the category labels manually. A total of 2,724 documents were annotated, with 1,096.91 words per document on average. There are 9 category labels: disease, health protection, mental health, treatment, examination, ingredient, caution, drug, and elder. The total number of labels is 8,731, an average of 3.21 labels per document. (2) Chinese depression dataset (PsychPark): this data was collected from the PsychPark website (http://www.psychpark.org). Users describe their mental health conditions, and doctors then assign multiple labels according to the psychological problems described. The dataset contains 2,831 texts with 247.89 words on average. The total number of labels is 4,425 across 21 categories, an average of 1.56 labels per document. In the experiments, the proposed LGC-HyperGAT model achieved the best Macro-F1 scores of 0.725 and 0.35 on the HealthDoc and PsychPark datasets respectively, outperforming the related models (CNN, LSTM, Bi-LSTM, FastText, BERT, Graph-CNN, TextGCN, Text-Level-GNN, HyperGAT). Error analysis indicates that the implicit features learned by the label classifier effectively improve the performance of multi-label text classification.
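For readers unfamiliar with the metric, the following is a minimal sketch of how a Macro-F1 score for multi-label predictions is commonly computed: F1 is calculated per label and then averaged with equal weight for every label. It is not taken from the thesis. The function names `macro_f1` and `label_cardinality`, the toy documents, and the convention of scoring a label as 0 when it has no gold or predicted instances are all assumptions made for illustration. The second function only reproduces the "average labels per document" arithmetic from the dataset statistics in the abstract.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Average of per-label F1 scores, each label weighted equally."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no gold and no predicted
        # instances contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def label_cardinality(total_labels: int, num_docs: int) -> float:
    """Average number of labels per document."""
    return total_labels / num_docs


# Hypothetical toy data using three of the HealthDoc label names.
LABELS = ["disease", "drug", "elder"]
GOLD = [{"disease", "drug"}, {"drug"}, {"elder"}]
PRED = [{"disease"}, {"drug", "elder"}, {"elder"}]

if __name__ == "__main__":
    print(f"Macro-F1 on toy data: {macro_f1(GOLD, PRED, LABELS):.4f}")
    print(f"HealthDoc labels/doc: {label_cardinality(8731, 2724):.2f}")
    print(f"PsychPark labels/doc: {label_cardinality(4425, 2831):.2f}")
```

Because every label counts equally, rare labels influence Macro-F1 as much as frequent ones, which is one reason the metric is often reported for datasets with many categories and few labels per document.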
    Appears in Collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    20


    All items in NCUIR are protected by their original copyright.

