Improving Dialogue State Tracking by Incorporating Multiple Automatic Speech Recognition Results

Name

Yu-Cheng Hsiao (蕭又誠)

Department

Department of Computer Science and Information Engineering (資訊工程學系)

Thesis title

Improving Dialogue State Tracking by Incorporating Multiple Automatic Speech Recognition Results
(藉由加入多重語音辨識結果來改善對話狀態追蹤)

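This electronic thesis is authorized for immediate open access.
Once an electronic full text has reached its open-access date, users are licensed only to search, read, and print it for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.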

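Abstract (translated from Chinese)

In recent years, the development of dialogue systems has changed the way people communicate with computers. In the past, people had to issue specific commands or perform specific actions to make a computer act. Today the goal is for the computer to understand the user's intent from the conversation and help achieve the user's goal. Compared with chit-chat bots, task-oriented dialogue bots focus on completing the user's task, and therefore have many problems to overcome. First, the system must understand the user's intent through natural language understanding. Second, the system must perform dialogue management to decide the current dialogue state and the next step. Third, the system must generate natural language sentences as feedback to the user.
Dialogue management is arguably the most difficult of these problems, and whether the dialogue state can be tracked accurately greatly affects the outcome of the dialogue system. Speech recognition results currently have an error rate of about 30%. Although many systems directly use only the best speech recognition result as input for dialogue state tracking, our goal is to improve tracking accuracy by taking multiple speech recognition results as input, which also makes the system more tolerant of erroneous speech recognition results.
We take multiple speech recognition results as input and use reinforcement learning to decide which of them to consider in each dialogue turn. We then aggregate the resulting predictions and select the most probable one as the dialogue state for that turn. Our method achieves an accuracy of 59.98% on the test set, which is better than a system that uses only the best speech recognition result.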
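As a rough illustration of the aggregation step described above, the following Python sketch averages the slot-value distributions predicted for each selected speech recognition hypothesis and picks the most probable value per slot. The function name `aggregate_states`, the dictionary layout, and the optional `weights` argument are assumptions made for illustration. This record does not specify how the thesis actually implements the aggregation.

```python
from collections import defaultdict


def aggregate_states(per_hypothesis_states, weights=None):
    """Combine per-hypothesis slot-value distributions into one dialogue state.

    per_hypothesis_states: list of dicts, one per selected ASR hypothesis,
        each mapping slot -> {value: probability}.
    weights: optional per-hypothesis weights (hypothetical, e.g. ASR
        confidence scores); defaults to equal weighting.
    Returns a dict mapping each slot to its most probable value.
    """
    if weights is None:
        weights = [1.0] * len(per_hypothesis_states)
    total = sum(weights)

    # Accumulate the weighted average probability of every slot value.
    combined = defaultdict(lambda: defaultdict(float))
    for state, weight in zip(per_hypothesis_states, weights):
        for slot, distribution in state.items():
            for value, prob in distribution.items():
                combined[slot][value] += weight * prob / total

    # For each slot, keep the value with the highest aggregated probability.
    return {slot: max(dist, key=dist.get) for slot, dist in combined.items()}


# Hypothetical DST outputs for the top-1 and top-2 ASR hypotheses.
top1 = {"food": {"thai": 0.6, "chinese": 0.4}}
top2 = {"food": {"chinese": 0.9, "thai": 0.1}}
print(aggregate_states([top1, top2]))  # {'food': 'chinese'}
```

With equal weights, the second hypothesis overturns the top-1 prediction because its distribution is much more confident. Passing larger weights for higher-ranked hypotheses would make the top-1 result harder to override.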

Abstract (English)

The development of dialogue systems has changed how humans and computers communicate. In the past, people used commands or instructions to ask computers to perform tasks. We now expect the computer to understand the user's intent from the dialogue and accomplish the user's goal. Unlike chit-chat bots, task-oriented dialogue systems (TDSs) aim to accomplish specific tasks, such as booking a restaurant, so they are more complex than chit-chat bots. First, a TDS needs to understand the user's intent through language understanding (LU). Second, a TDS requires dialogue management to perform dialog state tracking (DST) and dialog policy selection. Finally, the system generates a natural language sentence to respond to the user.
Dialogue management is the most difficult part of the task-oriented dialogue system architecture. Our research focuses on dialog state tracking. We use the Dialog State Tracking Challenge 2 (DSTC2) dataset in our experiments. According to the dataset statistics, the word error rate (WER) of automatic speech recognition (ASR) is 30%.
Most studies use only the top ASR result as the input to their DST models. We propose using multiple ASR results. We use reinforcement learning to select useful lower-ranked ASR results in addition to the top-1 result, and a DST model to predict the dialog state for each selected ASR result. The final step aggregates all of these dialog states into our system's output. Our method achieves an accuracy of 59.98% on the test set, which is better than the baseline that uses only the top ASR result as input. In the future, we plan to incorporate language understanding information from the ASR results into our method.
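To make the 30% word error rate quoted in the abstract more concrete, here is a minimal Python sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration. The record does not say which scoring tool produced the DSTC2 figure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("i want thai food", "i want tie food"))  # 0.25
```

In this example a single misrecognized word changes the requested cuisine, which is the kind of error that lower-ranked ASR hypotheses may help recover.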

Keywords

★ Dialogue system
★ Automatic Speech Recognition
★ State Tracking
★ Deep Learning
★ Reinforcement Learning

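Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Dialogue Systems and Dialogue State Tracking
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Research on Dialogue State Tracking
2.2 Research on Reinforcement Learning
Chapter 3 Analysis of Experimental Data
3.1 Ontology of the Restaurant Domain
3.2 Dialogue Management in the Dataset
3.3 Analysis of Speech Recognition Results
Chapter 4
4.1 Reinforcement Learning Module
4.1.1 State
4.1.2 Action
4.1.3 Reward
4.1.4 Experience Replay
4.1.5 Deep Q Network
4.2 Dialogue Tracking Module
Chapter 5 Experimental Results and Discussion
5.1 Experimental Results and Discussion
5.2 Error Analysis
Chapter 6 Conclusion and Future Work
References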

References

1. Henderson, M., B. Thomson, and J.D. Williams. The Second Dialog State Tracking Challenge. in SIGDIAL Conference. 2014.
2. Ren, H., et al. Dialog State Tracking using Conditional Random Fields. in SIGDIAL Conference. 2013.
3. Henderson, M., B. Thomson, and S. Young. Word-based dialog state tracking with recurrent neural networks. in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). 2014.
4. Mrkšić, N., et al., Multi-domain dialog state tracking using recurrent neural networks. arXiv preprint arXiv:1506.07190, 2015.
5. Henderson, M., B. Thomson, and J.D. Williams. The third dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2014 IEEE. 2014. IEEE.
6. Kim, S., et al., The fourth dialog state tracking challenge, in Dialogues with Social Robots. 2017, Springer. p. 435-449.
7. Kim, S., et al. The fifth dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2016 IEEE. 2016. IEEE.
8. Watkins, C.J. and P. Dayan, Q-learning. Machine learning, 1992. 8(3-4): p. 279-292.
9. Rummery, G.A. and M. Niranjan, On-line Q-learning using connectionist systems. Vol. 37. 1994: University of Cambridge, Department of Engineering.
10. Peters, J. and S. Schaal. Policy gradient methods for robotics. in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. 2006. IEEE.
11. Peters, J. and S. Schaal, Natural actor-critic. Neurocomputing, 2008. 71(7): p. 1180-1190.
12. Mnih, V., et al., Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
13. Henderson, M., et al. Discriminative spoken language understanding using word confusion networks. in Spoken Language Technology Workshop (SLT), 2012 IEEE. 2012. IEEE.
14. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural computation, 1997. 9(8): p. 1735-1780.
15. Kingma, D. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
16. Plátek, O., et al., Recurrent Neural Networks for Dialogue State Tracking. arXiv preprint arXiv:1606.08733, 2016.
17. Schaul, T., et al., Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
18. Van Hasselt, H., A. Guez, and D. Silver. Deep Reinforcement Learning with Double Q-Learning. in AAAI. 2016.
19. Wang, Z., et al., Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.

Advisor

Tzong-Han Tsai (蔡宗翰)

Approval date

2018-1-25
