In recent years, an increasing number of devices and systems have required support for user-defined voice commands. Conventional keyword spotting neural networks cannot meet this demand: the set of keywords they recognize is fixed before training and cannot be changed by the user. Large Vocabulary Continuous Speech Recognition (LVCSR) models, by contrast, can recognize nearly any user-defined command, but their storage requirements are prohibitively large. Few-shot open-set keyword spotting, which requires only a few user-provided examples of each voice command, is therefore an ideal solution. However, previous metric-based few-shot models suffer from prototypes that do not accurately represent their corresponding classes. In this paper, we design several model architectures to address this issue, evaluate them on the Google Speech Commands (GSC) dataset, and achieve state-of-the-art accuracy.
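
To make the prototype issue concrete, the following is a minimal sketch of a metric-based few-shot classifier in the style of Prototypical Networks, where each class prototype is the mean of its few support embeddings and each query is assigned to the nearest prototype. The encoder is omitted and all names here are illustrative assumptions, not the architectures proposed in this work; the sketch only illustrates why a mean over a handful of noisy utterance embeddings can misrepresent a class.

```python
import numpy as np


def class_prototypes(support_emb, support_labels, num_classes):
    """Prototype of each class = mean of its few support embeddings.

    With only a handful of (possibly noisy) utterance embeddings per
    class, this mean can lie far from the true class centroid -- the
    weakness that the architectures in this work aim to address.
    """
    dim = support_emb.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = support_emb[support_labels == c].mean(axis=0)
    return protos


def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(
        query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)


# Toy usage: 3 keywords, 5 support utterances each, 16-dim embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(15, 16))
labels = np.repeat(np.arange(3), 5)
protos = class_prototypes(support, labels, num_classes=3)
queries = rng.normal(size=(4, 16))
print(classify(queries, protos))
```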