Sentiment Analysis of Patient-Authored Journals Using Symptoms

Name

鄭新禹 (Xin-Yu Zheng)

Department

Department of Information Management

Thesis title

Sentiment Analysis of Patient-Authored Journals Using Symptoms

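This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.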

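Abstract (translated from Chinese)

Social media has grown rapidly in recent years. People habitually share their moods on these platforms and, when they run into a problem, turn first to online communities to ask others with similar experiences. Against this background, this study uses sentiment analysis to uncover hidden value in social data and to derive related applications. Sentiment analysis has mostly been applied to product and movie reviews, and few studies address the medical domain. This study therefore takes patient journals from a medical forum as its corpus, with the aim of detecting whether a poster is currently suffering from illness, so that the people around them can offer help and improve the psychological side of the treatment process.

The dataset comes from DailyStrength, a UK medical forum. Patient-authored journals contain many medical terms, such as drug names, symptoms, and diseases. Combined with adverbs or adjectives of different degree, these terms shift the sentiment to Excellent, Good, Bad, or Horrible. Symptoms are usually terms that directly express how the body feels. The purpose of this study is therefore to examine whether combining symptoms with sentiment analysis improves sentiment recognition in patient-authored journals. The goal is not only to separate positive from negative sentiment but also to distinguish Bad from Horrible, so that high-risk patients with extremely poor sentiment can be identified and helped in time.

The study consists of four sets of experiments. (1) A baseline for the traditional text representations, bag-of-words and word embedding, on patient journals: the best accuracy is only 57%, lower than in the domains where these methods are traditionally applied, which shows that commonly used text representations have limited effect on patient-authored journals. (2) Three symptom-mention representations, which show that symptoms do raise prediction accuracy by 3 to 4 percentage points. (3) Semi-supervised learning and a hierarchical architecture to help separate Bad from Horrible: using the semi-supervised method to add training samples within the hierarchical architecture reaches 65% accuracy, but the improvement over conventional classification is not significant. (4) A manual evaluation of how the patient's subjective feeling differs from a third party's objective reading in long and short texts. In short texts, human judgments agree closely with the machine learning results, which indicates a large gap between objective analysis and the patient's subjective feeling. In long texts, human and machine judgments agree less. The inference is that human readers follow context and shifts in tone more easily in long texts and therefore judge sentiment more readily than the machine learning models.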
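The abstract does not spell out the three symptom-mention representations used in experiment (2). As a rough illustration of the general idea only, the sketch below appends two symptom features (a mention count and a 0/1 flag) to a plain bag-of-words vector. The `SYMPTOMS` lexicon, the sample sentences, and all function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter
import re

# Hypothetical symptom lexicon (assumption); the thesis vocabulary is not given here.
SYMPTOMS = {"headache", "nausea", "fatigue", "insomnia"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(docs):
    """Map every token in the corpus to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_with_symptoms(text, vocab):
    """Return bag-of-words counts followed by two symptom features:
    the number of symptom mentions and a 0/1 flag for any mention."""
    counts = Counter(tokenize(text))
    bow = [counts.get(tok, 0) for tok in vocab]  # dicts keep insertion order
    n_symptoms = sum(counts[s] for s in SYMPTOMS)
    return bow + [n_symptoms, 1 if n_symptoms else 0]

# Made-up example journal entries (assumption, not from the dataset).
docs = [
    "Terrible headache and nausea again today",
    "Feeling good after a long walk",
]
vocab = build_vocab(docs)
vectors = [bow_with_symptoms(d, vocab) for d in docs]
```

Keeping the symptom features as extra columns leaves the baseline representation untouched, so any downstream classifier can be trained with and without them for comparison.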

Abstract (English)

Social media has developed rapidly in recent years. People are used to sharing their own journals with the community, and when they have a problem they first turn to social media to seek answers. This study therefore uses sentiment analysis to find hidden value in such data and to derive related applications. Past studies applied sentiment analysis mainly to movie reviews, product reviews, and similar material, and less research targets the medical field. This study uses patient-authored text as the dataset for sentiment analysis, in order to find out whether a user is currently suffering from disease and to find ways to help them. The source of the dataset is DailyStrength, a UK medical forum. The patient-authored text contains many medical terms, such as drug names, symptoms, and diseases. Combined with different adverbs of degree or adjectives, these words shift the emotion to excellent, good, bad, or horrible. Symptoms are often used to express physical condition, so the purpose of this study is to apply symptoms to the sentiment analysis of patient-authored text. The aim is not only to tell positive from negative emotions but also to distinguish bad from horrible, in order to identify high-risk groups and give timely help. The research method is divided into four parts. The first part establishes a baseline for the bag-of-words and word embedding representations on patient-authored text. The best accuracy is only 57%, showing that the most common text representations have limited effect on patient-authored text. The second part compares three symptom-mention representations with the baseline and finds that they improve prediction accuracy by 3 to 4 percentage points, confirming that symptoms help prediction.
The third part uses a semi-supervised method and a hierarchical structure to help distinguish between bad and horrible emotions. The semi-supervised method is used to increase the training samples and achieves 65% accuracy in the hierarchical structure, but the improvement over traditional classification is not significant. Finally, a manual evaluation explores the reasons, dividing the texts into long and short ones. In short texts there is a large gap between objective analysis and the patient's subjective feelings. In long texts, human assessment and machine assessment are less consistent.
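The third part combines pseudo-labeling with a two-stage (hierarchical) classifier. The sketch below shows one way these pieces can fit together, under strong simplifying assumptions: a toy nearest-centroid classifier stands in for the SVM, random forest, and neural network models named in the thesis, each text is reduced to a single made-up numeric feature, and pseudo-labels are added only at the polarity stage. None of the names, thresholds, or data come from the thesis.

```python
import math

def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def centroid_predict(model, x):
    """Return (label, margin): the nearest label and the distance gap to the runner-up."""
    dists = sorted((math.dist(c, x), lab) for lab, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def pseudo_label(X, y, X_unlabeled, min_margin):
    """Add unlabeled points predicted with margin >= min_margin, then refit."""
    model = centroid_fit(X, y)
    X_aug, y_aug = list(X), list(y)
    for x in X_unlabeled:
        label, margin = centroid_predict(model, x)
        if margin >= min_margin:
            X_aug.append(x)
            y_aug.append(label)
    return centroid_fit(X_aug, y_aug)

POLARITY = {"Excellent": "pos", "Good": "pos", "Bad": "neg", "Horrible": "neg"}

def fit_hierarchy(X, y, X_unlabeled=(), min_margin=1.0):
    """Stage 1: positive vs negative (with pseudo-labels). Stage 2: one model per polarity."""
    top = pseudo_label(X, [POLARITY[l] for l in y], X_unlabeled, min_margin)
    fine = {}
    for pol in ("pos", "neg"):
        rows = [(x, l) for x, l in zip(X, y) if POLARITY[l] == pol]
        fine[pol] = centroid_fit([r[0] for r in rows], [r[1] for r in rows])
    return top, fine

def predict_hierarchy(models, x):
    """Route x through the polarity model, then the matching fine-grained model."""
    top, fine = models
    pol, _ = centroid_predict(top, x)
    label, _ = centroid_predict(fine[pol], x)
    return label

# Toy one-dimensional "sentiment score" features (assumption, for illustration only).
X = [[4.0], [2.0], [-2.0], [-4.0]]
y = ["Excellent", "Good", "Bad", "Horrible"]
X_unlabeled = [[3.0], [-3.5], [0.1]]  # the last point is too ambiguous to pseudo-label
models = fit_hierarchy(X, y, X_unlabeled, min_margin=1.0)
prediction = predict_hierarchy(models, [-3.8])
```

Splitting the decision into two stages lets the second-stage model concentrate on the Bad versus Horrible boundary, which is the distinction the study cares about most. The margin threshold controls how much unlabeled data is trusted.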

Keywords

★ Social media
★ Natural Language Processing
★ Sentiment analysis
★ Patient-authored text

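Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of contents
List of figures
List of tables
1. Introduction
1.1. Research background
1.2. Research motivation
1.3. Research objectives
1.4. Thesis structure
2. Related work
2.1. Introduction to sentiment analysis
2.1.1. Related work on sentiment analysis in the medical domain
2.2. Text vector representations
2.2.1. Bag-of-words
2.2.2. TF-IDF (Term Frequency-Inverse Document Frequency)
2.2.3. Word2vec
2.2.4. GloVe (Global Vectors for Word Representation)
2.2.5. ELMo (Embeddings from Language Models)
2.3. Machine Learning Techniques
2.3.1. Support Vector Machine (SVM)
2.3.2. Random Forest
2.3.3. Artificial Neural Network (ANN)
2.4. Evaluation
3. Research method
3.1. Dataset
3.2. Preprocessing
3.3. Experimental methods and workflow
3.3.1. Preliminary experiment: baseline of traditional methods on patient journals
3.3.2. Experiment 1: vector representations of symptom mentions
3.3.3. Experiment 2: hierarchical architecture
3.3.3.1. Applying Pseudo-Labeling to Neutral data
3.3.4. Evaluation method
4. Experimental results and analysis
4.1. Preliminary experiment: baseline of traditional methods on patient-authored journals
4.2. Experiment 1: symptom-mention representations
4.3. Experiment 2: hierarchical architecture
4.3.1. Questionnaire evaluation
4.4. Overall analysis
5. Conclusion
5.1. Conclusions
5.2. Contributions
5.3. Future work
References
Appendix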

Advisor

柯士文

Approval date

2019-7-23