A Study on Enhancing Infant Cry Classification Performance via Contrastive Learning


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/97575

Title:	A Study on Enhancing Infant Cry Classification Performance via Contrastive Learning
Author:	葉錡; Chi, Yeh
Contributor:	Department of Business Administration
Keywords:	Infant Cry Classification; Supervised Contrastive Learning; SimCLR; Convolutional Neural Networks; Support Vector Machine
Date:	2025-07-16
Uploaded:	2025-10-17 11:38:07 (UTC+8)
Publisher:	National Central University
Abstract:	Infant cries reflect fundamental needs such as hunger, emotional distress, or insecurity, and serve as infants' primary means of communication. Accurate classification of cry types helps caregivers promptly understand and respond to an infant's physiological or psychological state. However, because acoustic features overlap substantially across cry types, models struggle to distinguish between categories. This is particularly problematic when labeled data are limited, making it difficult for deep learning models to learn discriminative representations effectively. To address this issue, this study integrates a supervised contrastive learning (SCL) framework to enhance both the representational quality and the classification performance of infant cry recognition models. In the upstream training stage, a SimCLR-based architecture was adopted, coupled with three convolutional neural networks of varying depths (CNN3, CNN5, and CNN7) as encoders. SCL training was used to reduce intra-class distances and enlarge inter-class separations in the embedding space, improving the distinctiveness of the learned features. In the downstream classification task, the study compared two classifiers, a linear classifier and a support vector machine (SVM), on four types of infant cries: angry, hungry, insecure, and sleepy. Experimental results show that models pre-trained with supervised contrastive learning outperform those trained with unsupervised contrastive learning using the NT-Xent loss function. The supervised model reached a highest test accuracy of 82%, compared with 66.8% for the unsupervised approach. Furthermore, among the CNN architectures evaluated, CNN5 integrated with SimCLR yielded the best classification performance. These findings indicate that supervised contrastive learning not only improves feature representation quality but also enhances the generalization and overall accuracy of infant cry classification models.
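The difference between the two pre-training objectives named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the function names, the temperature of 0.1, and the toy embeddings are assumptions made for illustration. The supervised loss treats every other sample with the same label as a positive, and NT-Xent is written as the special case in which a sample's only positive is its other augmented view.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    The temperature default is an assumption, not a value from the thesis.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(z.shape[0], dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors without any positive are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent: the only positive of a sample is its other augmented view."""
    n = len(view_a)
    z = np.concatenate([np.asarray(view_a), np.asarray(view_b)], axis=0)
    # Giving both views of sample i the pseudo-label i reduces the
    # supervised loss above to the unsupervised NT-Xent loss.
    pseudo_labels = np.tile(np.arange(n), 2)
    return supervised_contrastive_loss(z, pseudo_labels, temperature)


# Toy 2-D embeddings: two tight clusters (made-up data, not cry features).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = supervised_contrastive_loss(toy, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy, [0, 1, 0, 1])
```

On the toy data, labels that agree with the clusters give a lower loss than labels that cut across them, which is the pull-together, push-apart behaviour the abstract describes. The batch must hold at least two samples and at least one same-label pair, otherwise the mean is taken over an empty set.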
Appears in Collections:	[Graduate Institute of Business Administration] Theses and Dissertations

