In recent years, an increasing number of devices and systems have required support for user-defined voice commands. Conventional keyword spotting neural networks cannot meet this demand: the set of keywords they recognize is fixed before training and cannot be changed by the user. Large Vocabulary Continuous Speech Recognition (LVCSR) models, by contrast, can recognize nearly any user-defined command, but their storage requirements are prohibitively large. Few-shot open-set keyword spotting, which requires only a few user-provided examples of each voice command, is therefore an ideal solution. However, previous metric-based few-shot models suffer from prototypes that do not accurately represent their corresponding classes. In this paper, we design several model architectures to address this issue, evaluate them on the Google Speech Commands (GSC) dataset, and achieve state-of-the-art accuracy.
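
To make the prototype issue concrete, the following is a minimal sketch of a metric-based few-shot classifier in the style of Prototypical Networks, where each class prototype is the mean of its few support embeddings and each query is assigned to the nearest prototype. The encoder is omitted and all names here are illustrative assumptions, not the architectures proposed in this work; the sketch only illustrates why a mean over a handful of noisy utterance embeddings can misrepresent a class.

```python
import numpy as np


def class_prototypes(support_emb, support_labels, num_classes):
    """Prototype of each class = mean of its few support embeddings.

    With only a handful of (possibly noisy) utterance embeddings per
    class, this mean can lie far from the true class centroid -- the
    weakness that the architectures in this work aim to address.
    """
    dim = support_emb.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = support_emb[support_labels == c].mean(axis=0)
    return protos


def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(
        query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)


# Toy usage: 3 keywords, 5 support utterances each, 16-dim embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(15, 16))
labels = np.repeat(np.arange(3), 5)
protos = class_prototypes(support, labels, num_classes=3)
queries = rng.normal(size=(4, 16))
print(classify(queries, protos))
```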