Abstract (English)
Nowadays, the development of dialogue systems has changed how humans and computers communicate. In the past, people used explicit commands or instructions to make computers perform tasks; now we expect the computer to understand the user's intent from the dialogue and accomplish the user's goal. Unlike chit-chat bots, task-oriented dialogue systems (TDSs) aim to accomplish specific tasks, such as booking a restaurant, so a TDS is considerably more complex than a chit-chat bot. First, a TDS needs to understand the user's intent through Language Understanding (LU). Second, it requires dialogue management to perform dialog state tracking (DST) and dialog policy selection. Finally, the system generates a natural-language response to the user.
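The LU → DST → policy → NLG pipeline described above can be sketched as follows. This is a minimal toy illustration of the stage interfaces only; all function names, the keyword rules, and the templates are assumptions for illustration, not part of the thesis system.

```python
# Hypothetical sketch of the classic task-oriented dialogue pipeline.
# Each stage is a toy stand-in showing only the data flow between stages.

def understand(utterance):
    """Language Understanding: map a user utterance to intent + slots."""
    # Toy keyword rule, purely illustrative.
    if "restaurant" in utterance:
        return {"intent": "book_restaurant", "slots": {"food": "italian"}}
    return {"intent": "unknown", "slots": {}}

def track_state(state, lu_result):
    """Dialog State Tracking: merge new slot values into the running state."""
    new_state = dict(state)
    new_state.update(lu_result["slots"])
    return new_state

def select_action(state):
    """Dialog policy: choose the next system action from the state."""
    return "request_area" if "area" not in state else "offer_restaurant"

def generate(action):
    """Natural Language Generation: render the chosen action as text."""
    templates = {"request_area": "Which area would you like?",
                 "offer_restaurant": "I found a place for you."}
    return templates[action]

state = {}
lu = understand("I want an italian restaurant")
state = track_state(state, lu)
print(generate(select_action(state)))  # -> Which area would you like?
```

In a real system each stage is a learned model; the point here is only that DST sits between understanding and policy, which is why its errors propagate to the final response.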
Dialogue management is the most difficult part of the task-oriented dialogue system architecture, and our research focuses on dialog state tracking. We use the Dialog State Tracking Challenge 2 (DSTC2) dataset in our experiments. According to the corpus statistics, the word error rate (WER) of its automatic speech recognition (ASR) output is 30%.
Most previous studies use only the top ASR hypothesis as the input to their DST models. We propose to use multiple ASR hypotheses instead: we apply reinforcement learning to select useful lower-ranked ASR hypotheses in addition to the top-1 result, use a DST model to predict the dialog state of each selected hypothesis, and finally aggregate all the predicted dialog states into the system's output. Our method achieves an accuracy of 59.98% on the test set, outperforming the baseline that uses only the top ASR hypothesis as input. In the future, we plan to incorporate language-understanding information from the ASR hypotheses into our method.
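The select-then-aggregate idea above can be sketched as follows. The selector here is a simple score threshold and the DST model is a toy keyword matcher; both are illustrative stand-ins (the thesis learns the selection decision with reinforcement learning and uses a trained DST model), and all names and scores below are assumptions.

```python
# Illustrative sketch: always keep the top-1 ASR hypothesis, add useful
# lower-ranked hypotheses, run DST on each selected hypothesis, and
# aggregate the per-hypothesis dialog states by score-weighted voting.

def select_hypotheses(asr_nbest, keep_threshold=0.2):
    """Keep top-1 unconditionally; keep lower ranks above a score threshold.
    (A stand-in for the learned, RL-based selector.)"""
    selected = [asr_nbest[0]]
    selected += [h for h in asr_nbest[1:] if h["score"] >= keep_threshold]
    return selected

def toy_dst(hypothesis):
    """Stand-in DST model: extract a 'food' slot value from the text."""
    for food in ("italian", "chinese", "indian"):
        if food in hypothesis["text"]:
            return {"food": food}
    return {}

def aggregate(hypotheses):
    """Score-weighted vote over the per-hypothesis dialog states."""
    votes = {}
    for h in hypotheses:
        for slot, value in toy_dst(h).items():
            votes.setdefault(slot, {})
            votes[slot][value] = votes[slot].get(value, 0.0) + h["score"]
    return {slot: max(vals, key=vals.get) for slot, vals in votes.items()}

nbest = [{"text": "cheap italian food", "score": 0.5},
         {"text": "cheap indian food",  "score": 0.3},
         {"text": "jeep in the mood",   "score": 0.1}]
print(aggregate(select_hypotheses(nbest)))  # -> {'food': 'italian'}
```

The motivation is that with a 30% WER the top-1 hypothesis is often wrong, so lower-ranked hypotheses can recover slot values the best hypothesis misses, while the selector filters out the noise they would otherwise add.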
References
1. Henderson, M., B. Thomson, and J.D. Williams. The Second Dialog State Tracking Challenge. in SIGDIAL Conference. 2014.
2. Ren, H., et al. Dialog State Tracking using Conditional Random Fields. in SIGDIAL Conference. 2013.
3. Henderson, M., B. Thomson, and S. Young. Word-based dialog state tracking with recurrent neural networks. in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). 2014.
4. Mrkšić, N., et al., Multi-domain dialog state tracking using recurrent neural networks. arXiv preprint arXiv:1506.07190, 2015.
5. Henderson, M., B. Thomson, and J.D. Williams. The third dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2014 IEEE. 2014. IEEE.
6. Kim, S., et al., The fourth dialog state tracking challenge, in Dialogues with Social Robots. 2017, Springer. p. 435-449.
7. Kim, S., et al. The fifth dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2016 IEEE. 2016. IEEE.
8. Watkins, C.J. and P. Dayan, Q-learning. Machine learning, 1992. 8(3-4): p. 279-292.
9. Rummery, G.A. and M. Niranjan, On-line Q-learning using connectionist systems. Vol. 37. 1994: University of Cambridge, Department of Engineering.
10. Peters, J. and S. Schaal. Policy gradient methods for robotics. in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. 2006. IEEE.
11. Peters, J. and S. Schaal, Natural actor-critic. Neurocomputing, 2008. 71(7): p. 1180-1190.
12. Mnih, V., et al., Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
13. Henderson, M., et al. Discriminative spoken language understanding using word confusion networks. in Spoken Language Technology Workshop (SLT), 2012 IEEE. 2012. IEEE.
14. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural computation, 1997. 9(8): p. 1735-1780.
15. Kingma, D. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
16. Plátek, O., et al., Recurrent Neural Networks for Dialogue State Tracking. arXiv preprint arXiv:1606.08733, 2016.
17. Schaul, T., et al., Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
18. Van Hasselt, H., A. Guez, and D. Silver. Deep Reinforcement Learning with Double Q-Learning. in AAAI. 2016.
19. Wang, Z., et al., Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.