dc.description.abstract | Depression afflicts hundreds of millions of people and imposes a substantial global disability and healthcare burden. The primary method of diagnosing depression relies on the judgment of medical professionals in clinical interviews with patients, which is subjective and time-consuming. Recent studies have demonstrated that text, audio, facial attributes, heart rate, and eye movement can be utilized for depression assessment. In this paper, we construct a virtual therapist for automatic depression assessment on mobile devices that actively guides users through voice dialogue and adapts the conversation content based on emotion perception. During the conversation, features from text, audio, facial attributes, heart rate, and eye movement are extracted for multi-modal depression-level assessment. We employ a feature-level fusion framework to integrate the five modalities and a deep neural network to classify subjects into five categories: healthy, mild depression, moderate depression, severe depression, and bipolar disorder (formerly called manic depression). Experimental results on data from 168 subjects reveal that feature-level fusion of all five modal features achieves the highest accuracy, 90.26 percent. | en_US |
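As a rough illustration of the feature-level fusion the abstract describes, here is a minimal sketch in PyTorch: per-modality feature vectors are concatenated into one vector (early fusion) and passed to a small fully connected classifier with five output classes. The per-modality feature dimensions, hidden size, and network depth are hypothetical placeholders, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Feature-level fusion: concatenate per-modality feature vectors,
    then classify with a small fully connected network."""

    def __init__(self, modality_dims, num_classes=5, hidden=128):
        super().__init__()
        fused_dim = sum(modality_dims)  # dimension after concatenation
        self.net = nn.Sequential(
            nn.Linear(fused_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, features):
        # features: list of tensors, one per modality, each of shape (batch, dim)
        fused = torch.cat(features, dim=1)  # feature-level (early) fusion
        return self.net(fused)

# Hypothetical feature dimensions for text, audio, facial attributes,
# heart rate, and eye movement (placeholders, not from the paper).
dims = [128, 64, 64, 16, 32]
model = FusionClassifier(dims)
batch = [torch.randn(8, d) for d in dims]
logits = model(batch)  # shape (8, 5): healthy, mild, moderate, severe, bipolar
```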