NCU Institutional Repository - theses and dissertations, past exam papers, journal articles, research projects, and more available for download: Item 987654321/90827


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90827


    Title: Deep Neural Networks for Audio, Speech, and Image Applications (深度神經網路於音訊、語音和影像之研究)
    Author: An, Dang Thi Thuy (鄧氏陲殷)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: EMix; speech emotion recognition; acoustic scene classification; MixStyleFreq; image retrieval; content-based image retrieval; beauty product image retrieval
    Date: 2023-02-23
    Upload time: 2023-05-09 18:07:20 (UTC+8)
    Publisher: National Central University
    Abstract: This work aims to advance several problems in artificial intelligence: speech emotion recognition (SER), acoustic scene classification (ASC), and content-based image retrieval (CBIR). These problems come from different domains and have many practical applications. For example, SER can be used in human-machine interaction and mental healthcare, while ASC helps a system understand its surrounding environment, which is useful for robot navigation, context awareness, and surveillance. CBIR identifies the images in a database that are relevant to a given query image and can be used in many kinds of image search. In this thesis, we propose approaches based on deep neural networks (DNNs) to address these problems.
    Specifically, we develop a simple yet effective data augmentation (DA) method for the SER problem. SER is difficult because data are scarce and labels are ambiguous, so DNN models are prone to overfitting, which can lead to poor generalization on test data. Our DA method creates new data samples that may be noisier or less ambiguous than the original ones, and in our experiments on two public datasets it outperforms other DA methods. In ASC, we focus on the performance degradation that occurs when DNN models are used in a cross-device setting, where the training and test data are recorded with different devices. We propose an ASC system with two DA methods: MixStyleFreq, to reduce domain gaps, and spectrum correction, to mitigate the bias of DNNs toward dominant devices. These methods significantly improve generalization performance compared with other DA methods and achieve competitive results. Finally, we develop a fully end-to-end DNN model for the beauty product image retrieval problem in CBIR. This model requires no manual feature aggregation or post-processing, and experimental results on the Perfect-500K dataset show that it is effective and achieves high retrieval accuracy.
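    Note on CBIR: the abstract describes CBIR as finding the database images relevant to a query image. The sketch below illustrates only that generic retrieval step, assuming each image has already been mapped to a feature vector and that relevance is scored by cosine similarity. It is not the thesis's end-to-end model; the function names, image IDs, and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vec, database, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; a real system would obtain
# these from a trained network rather than write them by hand.
database = {
    "lipstick_01": [0.9, 0.1, 0.0],
    "lipstick_02": [0.8, 0.2, 0.1],
    "shampoo_01": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, database, top_k=2)
for image_id, score in results:
    print(image_id, round(score, 3))
```

    In this toy example the two lipstick entries rank above the shampoo entry because their vectors point in nearly the same direction as the query.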
    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    60


    All items in NCUIR are protected by their original copyright.
