This study investigates the impact of music genre on psychological arousal and emotional valence, and performs emotion recognition based on audio features. In the experiments, music pieces of various genres from the GTZAN dataset were played, and audio features were extracted using the pretrained YAMNet model. Emotion labels were derived from the DEAM dataset by mapping each genre's average valence and arousal onto the four quadrants of Russell's two-dimensional emotion model.
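A minimal sketch of this labeling pipeline, assuming YAMNet is loaded from TensorFlow Hub and that DEAM's valence/arousal ratings are centered at the scale midpoint (e.g., 5 on its 1–9 scale) before thresholding; the helper names and quadrant numbering are illustrative, not taken from the source:

```python
import numpy as np
import tensorflow_hub as hub

# Pretrained YAMNet from TensorFlow Hub (expects mono 16 kHz float32 audio).
yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

def extract_embeddings(waveform: np.ndarray) -> np.ndarray:
    """Return YAMNet's per-frame 1024-d embedding sequence for a waveform."""
    scores, embeddings, spectrogram = yamnet(waveform.astype(np.float32))
    return embeddings.numpy()  # shape: (num_frames, 1024)

def russell_quadrant(valence: float, arousal: float) -> int:
    """Map centered mean valence/arousal to a Russell quadrant:
    1 = positive/high-arousal, 2 = negative/high-arousal,
    3 = negative/low-arousal, 4 = positive/low-arousal."""
    if arousal >= 0:
        return 1 if valence >= 0 else 2
    return 3 if valence < 0 else 4
```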
A Flutter app was also developed for real-time genre recognition. The genre classification model, which combines YAMNet features with a fully connected neural network, achieved an F1-score of 0.87 on the GTZAN test set, with genres such as Classical and Metal performing particularly well. For emotion quadrant classification, the semantic feature sequences (learned by YAMNet and the genre classification branch) were fed into an LSTM to capture temporal information and classify emotions. This approach achieved an F1-score of 0.93 on the four-quadrant emotion classification task, demonstrating the effectiveness of sequential deep learning architectures for music emotion prediction.
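The two branches could be sketched in Keras as follows; layer sizes, dropout, and the optimizer are assumptions, since the source does not specify them, and for simplicity the LSTM here consumes the raw YAMNet embedding sequence rather than the genre branch's intermediate features:

```python
import tensorflow as tf

# Genre branch: mean-pooled YAMNet embedding -> dense head -> 10 GTZAN genres.
emb = tf.keras.Input(shape=(1024,), name='yamnet_embedding')
h = tf.keras.layers.Dense(256, activation='relu')(emb)
h = tf.keras.layers.Dropout(0.3)(h)
genre_out = tf.keras.layers.Dense(10, activation='softmax', name='genre')(h)
genre_model = tf.keras.Model(emb, genre_out)

# Emotion branch: per-frame feature sequence -> LSTM -> 4 Russell quadrants.
seq = tf.keras.Input(shape=(None, 1024), name='embedding_sequence')
x = tf.keras.layers.LSTM(128)(seq)
quad_out = tf.keras.layers.Dense(4, activation='softmax', name='quadrant')(x)
emotion_model = tf.keras.Model(seq, quad_out)

genre_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
emotion_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
```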
Further analysis revealed that music genres occupy distinct regions of the emotion space: genres such as Metal and Rock tend toward high-arousal negative emotions, whereas Classical and Jazz are associated with positive, low-arousal states. These findings confirm that combining musical semantic features with sequential modeling enables effective recognition of music-induced emotion, highlighting potential applications in automated emotion analysis, intelligent music recommendation, and emotion regulation.
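The reported genre distribution can be visualized by plotting each genre's mean valence and arousal in the quadrant plane; a sketch assuming the per-genre means have already been computed from centered DEAM annotations (the function name and styling are illustrative):

```python
import matplotlib.pyplot as plt

def plot_genre_emotion_space(genre_means):
    """Scatter each genre's mean (valence, arousal) with quadrant axes.

    genre_means: dict mapping genre name -> (mean_valence, mean_arousal),
    e.g. computed from DEAM annotations after centering at the midpoint.
    """
    fig, ax = plt.subplots(figsize=(6, 6))
    for genre, (v, a) in genre_means.items():
        ax.scatter(v, a)
        ax.annotate(genre, (v, a), textcoords='offset points', xytext=(5, 5))
    ax.axhline(0, color='gray', lw=0.8)  # arousal midline
    ax.axvline(0, color='gray', lw=0.8)  # valence midline
    ax.set_xlabel('Valence')
    ax.set_ylabel('Arousal')
    ax.set_title('Genre means in Russell emotion space')
    plt.show()
```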