藉由加入多重語音辨識結果來改善對話狀態追蹤;Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/75937

Title:	藉由加入多重語音辨識結果來改善對話狀態追蹤;Improving Dialogue State Tracking by incorporating multiple Automatic Speech Recognition results
Author:	蕭又誠;Hsiao, Yu-Cheng
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Dialogue system;Automatic Speech Recognition;State Tracking;Deep Learning;Reinforcement Learning
Date:	2018-01-25
Upload time:	2018-04-13 11:23:00 (UTC+8)
Publisher:	National Central University
Abstract:	In recent years, dialogue systems have changed how people communicate with computers. In the past, people had to issue specific commands or instructions to make a computer perform a task. The goal now is for the computer to understand the user's intent from the conversation and help accomplish the user's goal. Unlike chit-chat bots, task-oriented dialogue systems (TDS) exist to complete specific tasks, such as booking a restaurant, and are therefore more complex to build. First, a TDS must understand the user's intent through language understanding (LU). Second, it needs dialogue management, which performs dialogue state tracking (DST) and dialogue policy selection. Third, it must generate a natural-language response for the user. Dialogue management is arguably the most difficult of these components, and how accurately the dialogue state is tracked strongly affects the overall result. Our research focuses on dialogue state tracking, and our experiments use the Dialog State Tracking Challenge 2 (DSTC2) dataset. According to its statistics, the word error rate (WER) of automatic speech recognition (ASR) is 30%. Most studies feed only the top ASR result into their DST models. We propose using multiple ASR results, both to improve tracking accuracy and to tolerate erroneous speech recognition output. We use reinforcement learning to decide, in each dialogue turn, which lower-ranked ASR results to consider in addition to the top-1 result. A DST model then predicts a dialogue state for each selected ASR result. Finally, we aggregate these predictions and select the most probable state as the output for the turn.
Our method achieves an accuracy of 59.98% on the test set, outperforming the baseline that uses only the top ASR result as input. In future work, we plan to incorporate language understanding information from the ASR results into our method.
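The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `aggregate_states`, the weighted-sum rule, and the use of per-hypothesis weights (for example, normalized ASR confidence scores) are all assumptions made for illustration. The reinforcement-learning selection of hypotheses is not shown; the caller is assumed to pass in only the hypotheses that were already selected.

```python
from collections import defaultdict
from typing import Dict, List

# Per-hypothesis DST output: slot -> {candidate value -> probability}.
# This shape is an assumption for illustration, not the thesis's data format.
SlotDist = Dict[str, Dict[str, float]]


def aggregate_states(predictions: List[SlotDist], weights: List[float]) -> Dict[str, str]:
    """Combine DST predictions from several ASR hypotheses.

    Each prediction is weighted (hypothetically, by ASR confidence), the
    weighted probabilities are summed per slot value, and the most probable
    value is returned for every slot.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for dist, weight in zip(predictions, weights):
        for slot, values in dist.items():
            for value, prob in values.items():
                totals[slot][value] += weight * prob

    # Pick the highest-scoring value for each slot.
    return {slot: max(values, key=values.get) for slot, values in totals.items()}


if __name__ == "__main__":
    # Three selected ASR hypotheses for one turn (made-up numbers).
    predictions = [
        {"food": {"thai": 0.6, "chinese": 0.4}},
        {"food": {"thai": 0.2, "chinese": 0.8}, "area": {"north": 0.9, "south": 0.1}},
        {"food": {"thai": 0.3, "chinese": 0.7}},
    ]
    weights = [0.5, 0.3, 0.2]
    print(aggregate_states(predictions, weights))
```

In this made-up example the top-1 hypothesis alone would yield `thai`, but the lower-ranked hypotheses shift the aggregate toward `chinese`, which is the kind of correction that using multiple ASR results is meant to allow.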
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	214

All items in NCUIR are protected by the original copyright.
