NCU Institutional Repository: Item 987654321/98325


    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98325


    Title: Prototype Generation Network for Few-Shot Open-Set Keyword Spotting in Custom Voice Control Systems (original title: 應用於自訂語音控制系統的小樣本開集關鍵詞偵測之原型產生網路)
    Author: Wang, Yu-Ching (王俞擎)
    Contributor: Department of Electrical Engineering
    Keywords: keyword spotting; few-shot learning; few-shot open-set recognition
    Date: 2025-06-03
    Upload time: 2025-10-17 12:37:55 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: A growing number of devices and systems need to support user-defined voice commands. Conventional keyword spotting (KWS) neural networks fix their keyword set before training, and users cannot change it afterwards, so they cannot meet this need. Large Vocabulary Continuous Speech Recognition (LVCSR) models can recognize nearly any user-defined command, but their storage requirements are too large. Few-shot open-set keyword spotting, which needs only a few spoken examples of each command from the user, is therefore an attractive solution. Previous few-shot models based on metric learning, however, suffer from prototypes that do not represent their classes well. In this thesis, we design several model architectures to address this problem, evaluate them on the Google Speech Commands (GSC) dataset, and achieve state-of-the-art accuracy.
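
    The metric-based approach that the abstract refers to can be sketched roughly as follows. This is a minimal illustration of generic prototype-based few-shot open-set classification, not the prototype generation network proposed in the thesis. The toy two-dimensional embeddings, the keyword names, the function names, and the rejection threshold are all assumptions made for the example; a real system would obtain embeddings from a trained audio encoder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """Average the few enrolled examples of each keyword into one prototype.

    `support` maps a keyword label to a list of embedding vectors
    (hypothetical encoder outputs for the user's recordings).
    """
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query, prototypes, reject_distance):
    """Return the nearest keyword, or "unknown" if no prototype is close enough.

    The distance threshold is what makes the classifier open-set: utterances
    far from every enrolled keyword are rejected instead of being forced
    into one of the known classes.
    """
    label, distance = min(
        ((lbl, euclidean(query, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= reject_distance else "unknown"


# Toy 2-D embeddings standing in for encoder outputs (assumed values).
support = {
    "lights_on": [[1.0, 0.0], [0.8, 0.2]],
    "lights_off": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)

print(classify([0.95, 0.05], prototypes, reject_distance=0.5))
print(classify([5.0, 5.0], prototypes, reject_distance=0.5))
```

    With only a few examples per keyword, the averaged prototype can sit far from the true center of its class, which is the weakness of metric-based models that the abstract describes.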
    Appears in collections: [Graduate Institute of Electrical Engineering] Doctoral and Master's Theses

    Files in this item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     7


    All items in NCUIR are protected by the original copyright.
