dc.description.abstract | Bipolar disorder is a mental disorder that can seriously affect individuals. Early and accurate diagnosis is crucial; however, misdiagnosing bipolar disorder as depression can lead to incorrect treatment. Therefore, it is important to develop tools that support clinicians in making accurate diagnoses. Machine learning approaches can provide such tools. Recently, audio has become an important research domain, with a growing number of studies using audio data to predict mental disorders. However, collecting enough audio data to train a classifier is costly and impractical. Transfer learning offers a viable way to address this shortage of data. In this paper, we conduct a comprehensive study of the effectiveness of audio and textual features, comparing conventional hand-crafted features with learned features. Our results show that learned features outperform conventional hand-crafted features. Additionally, we explore multimodal approaches that combine audio and textual data and find that, while they do not surpass the performance of audio features alone, they do improve on textual features alone. These findings highlight the potential of learned features and multimodal approaches in supporting the accurate diagnosis of bipolar disorder and suggest a promising direction for future research. | en_US |