NCU Institutional Repository: Item 987654321/98200


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98200


    Title: A Music Emotion Recognition Framework Based on Genre-Informed Feature Representation
    Authors: Wu, Zong-Lin (吳宗霖)
    Contributors: Executive Master Program of Computer Science and Information Engineering
    Keywords: Music Emotion Recognition; Genre Classification; YAMNet; LSTM; Deep Learning
    Date: 2025-08-01
    Issue Date: 2025-10-17 12:28:45 (UTC+8)
    Publisher: National Central University
    Abstract: This study investigates the impact of music genres on psychological arousal and emotional valence, and performs emotion recognition based on audio features. During the experiments, music pieces of various genres from the GTZAN dataset were played, and audio features were extracted using the pretrained YAMNet model. Emotion labels were derived from the DEAM music dataset by mapping each genre's average valence and arousal to the four quadrants of the Russell two-dimensional emotion model.
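    A minimal sketch of the feature-extraction and labeling steps described above, assuming TensorFlow Hub's public YAMNet release; the quadrant numbering and the 1-9 scale midpoint are illustrative assumptions, not values reported in the thesis:

        # Sketch: extract YAMNet embeddings and map mean valence/arousal to a
        # Russell quadrant. Quadrant numbering and the scale midpoint are
        # assumptions for illustration, not the thesis's reported setup.
        import numpy as np
        import tensorflow_hub as hub

        yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

        def extract_embeddings(waveform: np.ndarray) -> np.ndarray:
            """waveform: 1-D float32 array, mono, 16 kHz, values in [-1, 1]."""
            scores, embeddings, spectrogram = yamnet(waveform)
            return embeddings.numpy()  # shape (num_frames, 1024), ~0.48 s hop

        def russell_quadrant(valence: float, arousal: float,
                             midpoint: float = 5.0) -> int:
            """Map mean valence/arousal (e.g. DEAM's 1-9 static scale) to
            Russell quadrants: 1 = positive/high-arousal, 2 = negative/high-arousal,
            3 = negative/low-arousal, 4 = positive/low-arousal."""
            if arousal >= midpoint:
                return 1 if valence >= midpoint else 2
            return 3 if valence < midpoint else 4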

    A Flutter app was also developed to provide real-time genre recognition. The genre classification model, which combines YAMNet features with a fully connected neural network, achieved an F1-score of 0.87 on the GTZAN test set, with genres such as Classical and Metal recognized particularly well. For emotion quadrant classification, the genre-semantic feature sequences (learned by YAMNet and the genre classification branch) were then fed into an LSTM model to capture temporal information. This approach achieved an F1-score of 0.93 on the four-quadrant emotion classification task, demonstrating the effectiveness of sequential deep learning architectures for music emotion prediction.
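    A sketch of the two model stages the abstract describes, in Keras; the layer sizes, dropout rate, and sequence length below are illustrative assumptions rather than the thesis's reported configuration:

        # Stage 1: fully connected genre head on single YAMNet embeddings.
        # Stage 2: LSTM over embedding sequences for emotion quadrants.
        import tensorflow as tf

        NUM_GENRES = 10     # GTZAN covers 10 genres
        NUM_QUADRANTS = 4   # Russell quadrants
        SEQ_LEN = 20        # assumed number of YAMNet frames per clip
        EMB_DIM = 1024      # YAMNet embedding width

        genre_model = tf.keras.Sequential([
            tf.keras.Input(shape=(EMB_DIM,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),
        ])
        genre_model.compile(optimizer="adam",
                            loss="sparse_categorical_crossentropy",
                            metrics=["accuracy"])

        emotion_model = tf.keras.Sequential([
            tf.keras.Input(shape=(SEQ_LEN, EMB_DIM)),
            tf.keras.layers.LSTM(128),
            tf.keras.layers.Dense(NUM_QUADRANTS, activation="softmax"),
        ])
        emotion_model.compile(optimizer="adam",
                              loss="sparse_categorical_crossentropy",
                              metrics=["accuracy"])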

    Further analysis revealed that different music genres occupy distinct regions of the emotional space: Metal and Rock tend toward high-arousal negative emotions, whereas Classical and Jazz are more associated with positive, low-arousal states. The findings confirm that combining musical semantic features with sequential modeling enables effective recognition of music-induced emotions, highlighting its potential for automated emotion analysis, intelligent music recommendation, and emotion regulation applications.
    Appears in Collections: [Executive Master of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.
