通過語音命令修正實現對話式用戶界面;Toward Conversational User Interface via Voice Command Correction


Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98329

Title:	通過語音命令修正實現對話式用戶界面;Toward Conversational User Interface via Voice Command Correction
Authors:	丁仕杰;Ding, Shi-Jie
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Automatic Speech Recognition;Error Correction;Voice Command;Automatic Correction System;Spelling Error Correction;Chinese Natural Language Processing
Date:	2025-07-25
Issue Date:	2025-10-17 12:38:23 (UTC+8)
Publisher:	National Central University
Abstract:	Artificial intelligence has advanced rapidly in recent years, and automatic speech recognition (ASR) has improved markedly, seeing wide use in dialogue systems, smart home appliances, and voice assistants. In practice, however, ASR still makes frequent errors, particularly those caused by pronunciation variation and homophones, so that the recognized text diverges from the intended meaning; for example, 「這個程式很棒」 ("this program is great") is misrecognized as 「這個城市很棒」 ("this city is great"). Prior work has focused mostly on automatic error correction, which is effective to a degree but still struggles with proper nouns such as personal names and with user-specific terms. To address this, we propose a voice-command-based ASR error correction system that lets users issue natural-language voice commands such as "add", "delete", and "modify" to correct recognition results precisely and reduce keyboard input. The system comprises three core modules: (1) an input classifier, which decides whether a spoken input is a statement or a command; (2) a command classifier, which identifies the command type; and (3) a command labeler, which marks the error position and the corresponding replacement content. To train these modules, we use the SIGHAN-15 and zh-tw-wikipedia corpora, simulate recognition errors with TTS and ASR, and generate natural commands using large language models (LLMs) together with Chinese character components and common words, imitating how users would make corrections in real use. Experiments show that the two original models each correctly fix more than 80% of the erroneous sentences on their respective datasets, demonstrating good accuracy and fault tolerance. A Model-Mix model trained on the combination of the two datasets also delivers stable, strong correction performance overall. The system is deployed as an API that other ASR applications can integrate, and real command data is continuously collected to improve the models. We also integrate LLMs into the system to improve command understanding and broaden its range of applications, and we test whether an LLM can understand correction commands. In summary, this work presents a practical, flexible ASR correction workflow that addresses the limitations of automatic correction and substantially reduces users' manual input effort.
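As an illustration of the final step the abstract describes, the following minimal Python sketch applies an already-labeled correction command to a transcript. It is not the thesis implementation: the `LabeledCommand` structure, the `apply_command` function, and the choice to insert added text directly after the target span are all assumptions made for this example, and the trained input classifier, command classifier, and command labeler are taken as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledCommand:
    """Hypothetical combined output of the command classifier and labeler."""
    kind: str                       # "add", "delete", or "modify"
    target: str                     # span in the transcript located by the labeler
    content: Optional[str] = None   # text to insert or substitute, if any


def apply_command(transcript: str, cmd: LabeledCommand) -> str:
    """Apply one labeled correction to the first occurrence of the target."""
    pos = transcript.find(cmd.target)
    if pos < 0:
        # Target span not found: leave the transcript unchanged.
        return transcript
    end = pos + len(cmd.target)
    if cmd.kind == "delete":
        return transcript[:pos] + transcript[end:]
    if cmd.kind == "modify":
        return transcript[:pos] + (cmd.content or "") + transcript[end:]
    if cmd.kind == "add":
        # Assumed semantics: insert the new content right after the target.
        return transcript[:end] + (cmd.content or "") + transcript[end:]
    raise ValueError(f"unknown command type: {cmd.kind}")


# The homophone example from the abstract: replace 城市 (city) with 程式 (program).
fixed = apply_command("這個城市很棒", LabeledCommand("modify", "城市", "程式"))
print(fixed)
```

Keeping the command as plain data separates the language-understanding modules from the text edit itself, so the same edit routine could serve whichever model produces the labels.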
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	29	View/Open
