NCU Institutional Repository: Item 987654321/98325


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98325


    Title: Prototype Generation Network for Few-Shot Open-Set Keyword Spotting in Custom Voice Control Systems (應用於自訂語音控制系統的小樣本開集關鍵詞偵測之原型產生網路)
    Authors: Wang, Yu-Ching (王俞擎)
    Contributors: Department of Electrical Engineering
    Keywords: keyword spotting; few-shot learning; few-shot open-set recognition
    Date: 2025-06-03
    Issue Date: 2025-10-17 12:37:55 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, a growing number of devices and systems have needed to support user-defined voice commands. Conventional keyword spotting neural networks cannot meet this need, because their vocabulary is fixed before training and users cannot change it afterwards. Large Vocabulary Continuous Speech Recognition (LVCSR) models can recognize nearly any user-defined command, but their storage requirements are excessively large. Few-shot open-set keyword spotting, which requires users to provide only a few examples of each voice command, is therefore an ideal solution to this problem. However, previous few-shot models based on metric learning suffer from prototypes that do not represent their corresponding classes well. In this thesis, we design several model architectures to address this issue, evaluate them on the Google Speech Commands (GSC) dataset, and achieve state-of-the-art accuracy.
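For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the conventional metric-based baseline it refers to, not the thesis's prototype generation network. It assumes that each keyword's prototype is simply the mean of a few enrolled embeddings and that open-set rejection is a fixed distance threshold. The function names, the toy 2-D embeddings, and the threshold value are all hypothetical.

```python
import math


def mean_prototype(embeddings):
    """Average equal-length embedding vectors into a single prototype."""
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_open_set(query, prototypes, reject_threshold):
    """Return the nearest keyword label, or None when the query is
    farther than reject_threshold from every prototype (unknown word)."""
    label, dist = min(
        ((name, euclidean(query, proto)) for name, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= reject_threshold else None


# Hypothetical enrollment: three example embeddings per user-defined keyword.
# A real system would obtain these from an audio encoder; here they are toy values.
support = {
    "lights_on": [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]],
    "play_music": [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]],
}
prototypes = {name: mean_prototype(vecs) for name, vecs in support.items()}

print(classify_open_set([0.9, 0.1], prototypes, reject_threshold=0.5))
print(classify_open_set([3.0, 3.0], prototypes, reject_threshold=0.5))
```

With these toy values the first query lies close to the `lights_on` prototype and is accepted, while the second is far from both prototypes and is rejected as unknown. With only a few enrollment examples, a plain mean can be a poor summary of a class, which is the weakness the abstract attributes to earlier metric-based models.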
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation


    All items in NCUIR are protected by copyright, with all rights reserved.
