Name: 楊千郁 (Chien-Yu Yang)   Department: Department of Business Administration
Thesis title: A Study on Enhancing the Performance of Infant Cry Classification Models Using Generative AI
(透過生成式 AI 增強嬰兒哭聲分類模型效能之研究)
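Abstract (Chinese, translated): Infant cries are like adult speech: infants cry to express their needs and feelings, enabling caregivers … To address this problem, we propose using a GAN to generate additional infant cry samples for augmented training … poop/pee, sleepy) real infant cry samples, and used the WaveGAN generative model to generate, one category at a time, … model trained on the … dataset. This shows that using a GAN can effectively expand the training dataset and improve the model's generalization ability and accuracy. Based on these results, we believe that using a GAN to generate additional infant cry samples is a …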
Abstract (English): Infant cries, like adult speech, are a means for infants to express their needs
and feelings, giving caregivers cues for providing appropriate care.
However, comprehensive datasets of infant cries are lacking, so deep learning models
that predict infant needs from cry sounds perform poorly.
This study explores using Generative Adversarial Networks (GANs) to improve infant
cry classification models. Because infant cry data are scarce, the predictive
performance of deep learning classifiers suffers. To address this, we propose using
GANs to generate additional infant cry samples that augment the training dataset and
thereby improve the classifier's predictive performance.
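The table of contents lists WGAN-GP (Section 3-2-2) as the loss used to train WaveGAN. As a reference point only, the standard WGAN-GP objective of Gulrajani et al. (2017) is written out below, where D is the critic, G the generator, P_r the distribution of real cry clips, P_g the distribution of generated clips, and lambda the gradient-penalty weight (set to 10 in the original WGAN-GP paper). Whether the thesis uses exactly this form and this lambda is an assumption.

```latex
% Critic (discriminator) loss: Wasserstein estimate plus gradient penalty
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]

% Penalty points are sampled on straight lines between real and generated clips
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \qquad \epsilon \sim U[0, 1]

% Generator loss, with \tilde{x} = G(z) and z drawn from the latent prior
L_G = -\,\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big]
```

The penalty term pushes the critic's gradient norm toward 1 on the interpolated points, which is how WGAN-GP approximately enforces the 1-Lipschitz constraint without weight clipping.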
In this study, we collected real infant cry samples for five needs (anger, hunger,
insecure, poopee (poop/pee), sleepy). We then used WaveGAN to generate additional
cry samples for each need category and combined the generated samples with the
original dataset to form an augmented dataset. We trained one classifier on the
augmented dataset and, for comparison, another on the original dataset alone, and
compared the two models' performance. We used Long Short-Term Memory (LSTM)
deep learning models to classify the infant cry needs.
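The comparison described above can be sketched as a data-preparation step. This is a hypothetical sketch, not the thesis code: the abstract does not say how clips were stored, how many were generated, or how the test set was chosen. It assumes each clip is a (path, label) pair, uses a placeholder `sample_generator` in place of a trained per-class WaveGAN, and assumes generated clips are added to the training split only, so that both classifiers are scored on the same held-out real recordings.

```python
import random
from collections import Counter

# Labels as listed in the abstract; the exact label strings used in the thesis are an assumption.
CLASSES = ("anger", "hunger", "insecure", "poopee", "sleepy")


def stratified_split(samples, test_ratio=0.2, seed=0):
    """Split real (path, label) pairs into train/test, keeping class proportions."""
    rng = random.Random(seed)
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append((path, label))
    train, test = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_ratio))  # at least one test clip per class
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def sample_generator(label, n):
    """Hypothetical stand-in for drawing n clips from a WaveGAN trained on one class."""
    return [(f"gen/{label}/{i:04d}.wav", label) for i in range(n)]


def build_datasets(real_samples, n_generated_per_class, test_ratio=0.2, seed=0):
    """Return (original_train, augmented_train, test).

    Generated clips are added to the training split only, so the baseline and
    the augmented classifier are evaluated on the same real, held-out clips.
    """
    original_train, test = stratified_split(real_samples, test_ratio, seed)
    generated = []
    for label in CLASSES:  # one (hypothetical) generator per need category
        generated.extend(sample_generator(label, n_generated_per_class))
    augmented_train = original_train + generated
    return original_train, augmented_train, test


# Toy example: 10 hypothetical real recordings per class, 20 generated clips per class.
real = [(f"real/{c}/{i:03d}.wav", c) for c in CLASSES for i in range(10)]
original_train, augmented_train, test = build_datasets(real, n_generated_per_class=20)
print(len(original_train), len(augmented_train), len(test))  # → 40 140 10
print(Counter(label for _, label in augmented_train))
```

Keeping synthetic clips out of the test split is the design choice that makes the two models comparable: each LSTM would then be measured only on real cries it never saw during training.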
The experimental results show that the model trained on the augmented dataset, which
includes the generated data, significantly outperforms the model trained only on the
original dataset. This indicates that GANs can effectively augment the training
dataset and improve the model's generalization ability and accuracy. We therefore
conclude that generating additional infant cry samples with GANs is an effective way
to improve the predictive performance of infant cry classification models, which can
help improve the quality and efficiency of infant care.
Keywords: ★ Infant cry
★ Deep learning
★ Generative Adversarial Networks
★ Long Short-Term Memory
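Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Research Framework
Chapter 2 Literature Review
2-1 Audio Feature Extraction
2-1-1 Audio Feature Extraction Methods and Applications
2-1-2 Mel-Frequency Cepstral Coefficients (MFCC)
2-2 Infant Cry Classification Models
2-3 Data Augmentation Methods
2-3-1 Generative Adversarial Networks (GAN)
2-3-2 WaveGAN
Chapter 3 Research Methods
3-1 Research Process
3-2 Generative Adversarial Networks (GAN)
3-2-1 WaveGAN
3-2-2 Loss Function: WGAN-GP
3-2-3 Phase Shuffle
3-3 Cry Audio Feature Extraction
3-4 Classification Model
3-4-1 Recurrent Neural Networks (RNN)
3-4-2 Long Short-Term Memory (LSTM)
3-4-3 Loss Function
Chapter 4 Experiments
4-1 Data Collection
4-2 Data Generation
4-2-1 WaveGAN Architecture and Parameter Settings
4-2-2 Analysis of Generative Model Results
4-2-3 Creating the Augmented Dataset
4-3 Classification Model Parameter Settings
4-4 Analysis of Experimental Results
Chapter 5 Conclusions and Suggestions
5-1 Conclusions
5-2 Research Limitations
5-3 Suggestions for Future Research
References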
Advisor: 許秉瑜 (Ping-Yu Hsu)   Approval date: 2024-7-15
