Thesis 109522144: Detailed Record




Name: Ting-Yu Gao (高廷瑜)    Department: Computer Science and Information Engineering
Thesis Title: Mobile Virtual Therapist for Multi-Modal Depression-Level Assessment
Related Theses
★ Reliable Multicast Transmission with Multiple Tree Structures
★ Design and Development of Cross-Platform Widgets on Embedded Mobile Devices
★ Implementing a Lightweight GUI Library for Handheld Multimedia Players on ARM-Based Embedded Systems
★ A Scalable QoS-Aware GStreamer Module for Networked Mobile Devices
★ Developing a Scalable, Cross-Platform GSM/HSDPA Engine for Mobile Network Devices
★ Efficient Multi-Format Decoding Management on Single-Chip Multimedia Devices
★ IMS Client Design and Instant-Messaging Module Development: Implementing the Personal Information Exchange and Instant Message Modules
★ Implementing a User-Friendly Embedded Small-Screen Web Browser on Portable Multimedia Devices
★ Implementing an IMS-Based Real-Time Voice and Video Call Engine Using Open-Source Libraries
★ Embedded E-Book Development: Customized Download Service Implementation and Data Storage Management Design
★ Efficient Frame-Reference Handling and Multimedia-Metadata-Aware Player Design on a Digital Set-Top Box
★ Developing E-Books with Digital Security: Efficient Update Module and Database Implementation
★ Developing a Next-Generation IMS Client for Heterogeneous Wireless Broadband Systems
★ Designing and Implementing a Reconfigurable GUI on a Portable Digital Set-Top Box
★ Friendly GUI design and possibility support for E-book Reader based Android client
★ Effective GUI Design and Memory Usage Management for Android-based Services
Files: Full text available in the repository after 2027-08-13.
Abstract: Depression not only afflicts hundreds of millions of people but also contributes to the global disability and healthcare burden. The primary method of diagnosing depression relies on the judgment of medical professionals during clinical interviews with patients, which is subjective and time-consuming. Recent studies have demonstrated that text, audio, facial attributes, heart rate, and eye movement can be used for depression assessment. In this work, we construct a virtual therapist for automated depression assessment on mobile devices; it actively guides users through voice dialogue and adapts the conversation content via emotion perception. During the conversation, features extracted from text, audio, facial attributes, heart rate, and eye movement are used for multi-modal depression-level assessment. We employ a feature-level fusion framework to integrate the five modalities and a deep neural network to classify the level of depression: healthy, mild, moderate, or severe depression, as well as bipolar disorder (formerly called manic depression). In experiments with data from 168 subjects, feature-level fusion of all five modal features achieves the highest overall accuracy, 90.26%.
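As a rough illustration of the feature-level fusion described in the abstract, the Python sketch below concatenates pre-extracted feature vectors from the five modalities and passes the fused vector to a small feed-forward network that outputs the five depression-level classes. All names, feature dimensions, and layer sizes here are illustrative assumptions for exposition, not the thesis's actual architecture.

    # Minimal sketch of feature-level (early) fusion for five-class
    # depression-level classification. Feature dimensions and layer sizes
    # are hypothetical, not the values used in the thesis.
    import torch
    import torch.nn as nn

    MODALITY_DIMS = {
        "text": 768,   # e.g., a sentence-embedding size (assumed)
        "audio": 88,   # e.g., an eGeMAPS-style acoustic set (assumed)
        "face": 512,   # facial-attribute embedding (assumed)
        "hrv": 8,      # heart-rate-variability statistics (assumed)
        "eye": 16,     # eye-movement statistics (assumed)
    }
    NUM_CLASSES = 5    # healthy, mild, moderate, severe, bipolar disorder

    class FeatureLevelFusionNet(nn.Module):
        def __init__(self, modality_dims, num_classes):
            super().__init__()
            self.modalities = list(modality_dims)
            fused_dim = sum(modality_dims.values())
            self.classifier = nn.Sequential(
                nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(256, 64), nn.ReLU(),
                nn.Linear(64, num_classes),
            )

        def forward(self, features):
            # Feature-level fusion: concatenate every modality's vector
            # into one vector before any classification layers run.
            fused = torch.cat([features[m] for m in self.modalities], dim=-1)
            return self.classifier(fused)

    if __name__ == "__main__":
        model = FeatureLevelFusionNet(MODALITY_DIMS, NUM_CLASSES)
        batch = {m: torch.randn(4, d) for m, d in MODALITY_DIMS.items()}
        logits = model(batch)         # shape: (4, NUM_CLASSES)
        print(logits.argmax(dim=-1))  # predicted class index per subject

Concatenating features before classification, rather than averaging per-modality decisions, is what distinguishes feature-level from decision-level fusion and matches the abstract's description of integrating the five modalities in a single network.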
Keywords
★ virtual human
★ depression recognition
★ multi-modal fusion
Table of Contents
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Symbols
1. Introduction
2. Related Work
3. Methodology
3-1 Virtual Therapist Construction
3-2 Dialogue Management
3-2-1 Uni-modal Emotion Recognition
3-2-2 3-Pass Algorithm
3-3 Feature Extraction
3-3-1 Text
3-3-2 Audio
3-3-3 Facial Attributes
3-3-4 Heart Rate Variability
3-3-5 Eye Movement
3-4 Multi-modal Depression Level Assessment
3-5 Evaluation Metrics
4. Experimental Results
5. Discussion
6. Conclusion and Future Work
References
Advisor: Eric Hsiao-Kuang Wu (吳曉光)    Date of Approval: 2022-08-03