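Name 鄭新禹 (Xin-Yu Zheng)   Department Department of Information Management
Thesis title Applying Symptoms to Sentiment Analysis of Patient-Authored Journals (病徵應用於病患自撰日誌之情緒分析)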
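Abstract (Chinese) In recent years social media has developed rapidly. People are used to sharing their feelings on these platforms, and when they run into a problem they also turn first to social ...
The dataset for this study comes from the UK medical forum DailyStrength, whose patient-authored journals contain many specialized medical ...
... but rather to distinguish the difference in degree between Bad and Horrible, so as to identify the high-risk group with extremely poor emotions and give timely help ...
This study is carried out through four experimental parts: (1) examining the traditional text representations Bag-of-words and Word
Embedding as a Baseline on patient journals; compared with traditional domains, the best accuracy is only 57%, showing that the commonly used ...
... raises prediction accuracy by 3~4%; (3) using semi-supervised and hierarchical architectures to help strengthen the distinction between Bad and Horrible emotions,
finding that using the semi-supervised method to add training samples, applied within the hierarchical architecture, reaches 65% accuracy, but compared with past ...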
Abstract (English) Social media has developed rapidly in recent years. People have become used to sharing their own journals with online communities, and when they have a problem, social media is the first place they go to seek answers. This study therefore uses sentiment analysis to uncover hidden value in this text and to enable related applications. Past studies have applied sentiment analysis mainly to movie reviews, product reviews, and similar text; less research has targeted the medical field. This study therefore uses patient-authored text as its sentiment analysis dataset, in order to find out whether a user is currently suffering from disease and to find ways to help them. The dataset comes from the UK medical forum DailyStrength. We found that patient-authored text contains many medical terms, such as drug names, symptoms, and diseases, and that these words, combined with different adverbs of degree or adjectives, shift the emotion toward excellent, good, bad, or horrible. Symptoms in particular are often used to express physical condition. The purpose of this study is therefore to apply symptoms to the sentiment analysis of patient-authored text. The goal is not only to separate positive from negative emotions but also to distinguish bad from horrible, in order to identify high-risk groups and give timely help. The research method is divided into four parts. First, we establish a baseline for the bag-of-words and word embedding representations on patient-authored text. The best accuracy is only 57%, showing that the most common text representations have limited effect on patient-authored text. The second part compares three representations of mentioned symptoms against the baseline and finds that they improve prediction accuracy by 3% to 4%, confirming that using symptoms can improve prediction accuracy.
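As a rough illustration of the first two parts, the sketch below builds a bag-of-words count vector and appends symptom-mention indicators to it. It is a minimal sketch under stated assumptions, not the thesis's implementation: the record does not specify the three symptom representations, so the binary indicator used here, the `SYMPTOM_LEXICON` entries, the sample journals, and all function names are hypothetical.

```python
from collections import Counter
import re

# Hypothetical symptom lexicon; the record does not list the thesis's vocabulary.
SYMPTOM_LEXICON = {"headache", "nausea", "fatigue", "insomnia"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every token seen in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector over the vocabulary (unknown tokens are ignored)."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


def symptom_features(text, lexicon=SYMPTOM_LEXICON):
    """One indicator per lexicon entry: 1 if the symptom is mentioned, else 0."""
    present = set(tokenize(text))
    return [1 if symptom in present else 0 for symptom in sorted(lexicon)]


def represent(text, vocabulary):
    """Concatenate the bag-of-words counts with the symptom indicators."""
    return bag_of_words(text, vocabulary) + symptom_features(text)


# Made-up journal entries, for illustration only.
journals = [
    "Terrible headache and nausea again today",
    "Feeling good today, no headache",
]
vocab = build_vocabulary(journals)
vectors = [represent(j, vocab) for j in journals]
```

Each resulting vector could then be passed to any standard classifier; the symptom columns simply give the model an explicit signal that plain word counts may dilute.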
The third part uses semi-supervised learning and a hierarchical structure to help distinguish between bad and horrible emotions. Using the semi-supervised method to increase the training samples achieves 65% accuracy in the hierarchical structure, but the improvement is not significant compared with the accuracy of traditional classification. Finally, we use manual evaluation to explore the reasons, dividing the texts into long and short texts. In short texts, there is a large gap between objective analysis and the patient's subjective feelings. In long texts, human assessment and machine assessment are more inconsistent.
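The hierarchical idea in the third part can be sketched as a two-stage decision: a first classifier picks the polarity branch, and a second, branch-specific classifier picks the fine-grained label. This is only one plausible reading of the abstract. The exact hierarchy used in the thesis is not given in this record, and the toy stage classifiers, the two-element feature layout, and all names below are hypothetical stand-ins for trained models.

```python
from typing import Callable, Sequence

# A classifier here is any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def make_hierarchical(
    polarity: Classifier,
    positive_stage: Classifier,
    negative_stage: Classifier,
) -> Classifier:
    """Compose a coarse polarity classifier with two fine-grained ones."""

    def predict(features: Sequence[float]) -> str:
        # Stage 1: decide the coarse branch.
        branch = polarity(features)
        # Stage 2: let the branch-specific classifier pick the final label.
        if branch == "negative":
            return negative_stage(features)
        return positive_stage(features)

    return predict


# Toy stand-ins (assumptions): features = [sentiment_score, symptom_count].
def toy_polarity(features: Sequence[float]) -> str:
    return "negative" if features[0] < 0 else "positive"


def toy_positive(features: Sequence[float]) -> str:
    return "Excellent" if features[0] > 0.5 else "Good"


def toy_negative(features: Sequence[float]) -> str:
    return "Horrible" if features[1] >= 2 else "Bad"


predict = make_hierarchical(toy_polarity, toy_positive, toy_negative)
```

Splitting the decision this way lets the second-stage model for the negative branch be trained only on Bad and Horrible examples, which is where additional semi-supervised training samples would plausibly be added.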
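Keywords (Chinese) ★ Social media (社群媒體)
★ Natural language processing (自然語言處理)
★ Sentiment analysis (情緒分析)
★ Patient-authored journals (病患自撰日誌)
Keywords (English) ★ Social media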
★ Natural Language Processing
★ Sentiment analysis
★ Patient-authored text
Table of contents Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
1. Introduction
1.1. Research background
1.2. Research motivation
1.3. Research objectives
1.4. Thesis organization
2. Related work
2.1. Introduction to sentiment analysis
2.1.1. Related research on sentiment analysis in the medical domain
2.2. Text vector representations
2.2.1. Bag-of-words
2.2.2. TF-IDF (Term Frequency-Inverse Document Frequency)
2.2.3. Word2vec
2.2.4. GloVe (Global Vectors for Word Representation)
2.2.5. ELMo (Embeddings from Language Models)
2.3. Machine Learning Techniques
2.3.1. Support Vector Machine (SVM)
2.3.2. Random Forest
2.3.3. Artificial Neural Network (ANN)
2.4. Evaluation
3. Research method
3.1. Dataset
3.2. Preprocessing
3.3. Experimental methods and procedure
3.3.1. Preliminary experiment: baseline of traditional methods on patient journals
3.3.2. Experiment 1: Vector representations of mentioned symptoms
3.3.3. Experiment 2: Hierarchical architecture applied to Neutral data
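3.3.4. Evaluation method
4. Experimental results and analysis
4.1. Preliminary experiment: baseline of traditional methods on patient-authored journals
4.2. Experiment 1: Mentioned-symptom representations
4.3. Experiment 2: Hierarchical architecture
4.3.1. Questionnaire evaluation
4.4. Overall analysis
5. Summary
5.1. Conclusions
5.2. Contributions
5.3. Future work
References
Appendix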
Advisor 柯士文 Approval date 2019-7-23
