Thesis 106522093: Detailed Record




Author: Bo-Hao Chang (張博皓)    Department: Computer Science and Information Engineering
Title: Improving User Intent Classification by Incorporating Multimodal Information
(official English title: Incorporate Multi-modal Context for Improving User Intent Classification Work)
Related theses:
★ A Real-time Embedding Increasing for Session-based Recommendation with Graph Neural Networks
★ Modifying Training Objectives Based on the Primary Diagnosis for ICD-10 Classification of Discharge Summaries
★ A Hybrid Approach to Identifying Heart Disease Risk Factors and Their Progression in Electronic Health Records
★ A Rapid Adoption Method for Requirement Analysis Deliverables Based on PowerDesigner Specifications
★ Question Retrieval in Community Forums
★ Unsupervised Event Type Identification in Historical Texts: Garrison (Wei-Suo) Events in the Ming Shilu as a Case Study
★ Analyzing Character Relationships in Literary Fiction with Natural Language Processing: An Interactive Visualization
★ Extracting Function-level Biological Representation Statements from Biomedical Text: A K-Nearest-Neighbor Algorithm Inspired by Principal Component Analysis
★ Building Article Representation Vectors from a Classification System for Cross-lingual Online Encyclopedia Linking
★ Code-Mixing Language Model for Sentiment Analysis in Code-Mixing Data
★ Improving Dialogue State Tracking by Incorporating Multiple Speech Recognition Results
★ Dialogue Systems for Chinese Online Customer Service Assistants: A Case Study in the Telecom Domain
★ Applying Recurrent Neural Networks to Answer Questions at the Appropriate Time
★ Improving User Intent Classification with Multi-task Learning
★ Improving the Pivot-language Approach to Named-entity Transliteration with Transfer Learning
★ Finding Experts on Community Q&A Sites with History Vectors and Topic-expertise Vectors
Files: full text available in the thesis system after 2024-6-30
Abstract (Chinese): Question answering (QA) systems are now popular in every field related to natural language processing; restaurants and transit stations, for example, often have their own QA systems or chatbots. In this study, we propose a workflow for building a multimodal QA system that helps people complete guided tasks. We demonstrate the system on a robot-assembly task of our own design, in which the system attempts to resolve the problems people encounter during assembly. Many domains maintain collections of frequently asked questions (FAQs), such as customer-service systems and product repair manuals. We present a workflow that leverages FAQ knowledge to build a QA system, covering data collection, intent definition, and intent classification. We also introduce a multimodal architecture to overcome the bottleneck that traditional single-modality systems encounter. The experimental results show that combining textual and visual information improves performance on the intent classification task. In practice, this workflow can be transferred to many similar tasks, and we hope it contributes to the field of smart manufacturing.
Abstract (English): Nowadays, question answering (QA) systems are popular for many purposes in the field of natural language processing; researchers have developed QA systems for restaurants, bus stations, and more. In this research, we propose a workflow for building a multimodal question answering system that helps a human complete an instruction-following task. We demonstrate our work on Meccanoid, a personal robot developed by Spin Master: when users encounter a problem while assembling Meccanoid, they ask our system, and the system provides the best guide as a solution. An FAQ is a list of frequently asked questions and answers on a particular topic, commonly found in customer service or in product operation manuals. In this thesis, we present a complete workflow for transferring the knowledge in an FAQ into a question answering system for the Meccanoid robot-assembly task, including the methods of data collection, user intent definition, and intent classification. Furthermore, we introduce a multimodal architecture to address the bottleneck that a traditional single-modality system may encounter. The experimental results show that combining visual and textual context improves the performance of intent classification. The proposed workflow should generalize to other domains depending on the requester's demands, and we hope it can be adapted to the smart manufacturing field.
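The abstract's core idea, combining textual and visual context for intent classification, can be sketched as a simple late-fusion classifier. This is a minimal illustration, not the thesis's actual model: the dimensions, the one-hot "assembly stage" vector, and the random weights standing in for a trained model are all hypothetical.

```python
# Hypothetical late-fusion sketch: concatenate a text feature vector
# (e.g. a sentence embedding of the user's question) with a visual
# feature vector (e.g. the detected assembly stage), then apply a
# linear classifier over the fused representation.
import numpy as np

rng = np.random.default_rng(0)

N_INTENTS = 21        # the thesis's 21-class setting
TEXT_DIM = 768        # e.g. a BERT-style sentence embedding (assumed)
VISUAL_DIM = 8        # e.g. a one-hot assembly-stage vector (assumed)

def fuse(text_vec: np.ndarray, visual_vec: np.ndarray) -> np.ndarray:
    """Late fusion by simple concatenation of the two modalities."""
    return np.concatenate([text_vec, visual_vec])

def predict_intent(fused: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    """Linear classifier over the fused vector (argmax of the logits)."""
    logits = W @ fused + b
    return int(np.argmax(logits))

# Toy usage with random weights standing in for a trained model.
W = rng.normal(size=(N_INTENTS, TEXT_DIM + VISUAL_DIM))
b = np.zeros(N_INTENTS)
text_vec = rng.normal(size=TEXT_DIM)
visual_vec = np.eye(VISUAL_DIM)[3]   # pretend the robot is at stage 3
intent = predict_intent(fuse(text_vec, visual_vec), W, b)
assert 0 <= intent < N_INTENTS
```

Concatenation is only one fusion choice; the point is that the visual stage signal lets the classifier disambiguate questions that are textually identical but arise at different assembly steps.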
Keywords:
★ Multimodality
★ Natural Language Processing
★ Multi-class Classification
Table of Contents:
Abstract (Chinese)
Abstract
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Knowledge Transfer from FAQ
  1.3 Meccanoid Final Assembly Work
  1.4 Multimodality
  1.5 Organization
2 Related Work
  2.1 Overview of Classification Tasks
  2.2 Auto-encoder
  2.3 Multimodality
  2.4 Object Recognition
3 Method
  3.1 Task Definition
    3.1.1 Intent Classification
    3.1.2 Multimodal Intent Classification
  3.2 Data Collection
    3.2.1 Pre-Data Collection: Wizard-of-Oz Pilot Study
    3.2.2 User Question Collection
  3.3 Question Reformulation-based Intent Classifier
    3.3.1 Word Embedding Layer
    3.3.2 BERT
    3.3.3 Question Reformulation (QR)
    3.3.4 Classifier
  3.4 Visual Stage Discriminative Model (V-SDM)
    3.4.1 Visual Data Collection & Training
  3.5 Visual & Textual Sensitive Intent Classifier
4 Result
  4.1 Meccanoid Multimodal Dataset
  4.2 Evaluation Methodologies
    4.2.1 Accuracy and Confusion Matrix
    4.2.2 Mean Reciprocal Rank (MRR)
  4.3 Experiment Result
    4.3.1 Intent Classification (13 class vs. 21 class)
    4.3.2 Multimodal Intent Classification (21 class)
    4.3.3 Error Analysis and Discussion
5 Conclusion
Bibliography
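One of the evaluation methodologies listed above, Mean Reciprocal Rank (Section 4.2.2), can be sketched in a few lines. This is a generic definition of the metric, not the thesis's evaluation code; the toy ranked lists are illustrative.

```python
# Mean Reciprocal Rank (MRR): for each query, take the reciprocal of the
# rank at which the correct answer first appears in the system's ranked
# output (0 if absent), then average over all queries.
def mean_reciprocal_rank(ranked_lists, gold_labels):
    """ranked_lists[i] is a list of predicted labels, best first;
    gold_labels[i] is the correct label for query i."""
    total = 0.0
    for preds, gold in zip(ranked_lists, gold_labels):
        if gold in preds:
            total += 1.0 / (preds.index(gold) + 1)  # rank is 1-based
    return total / len(gold_labels)

# Toy example: gold answer ranked 1st, 2nd, and 3rd across three queries,
# so MRR = (1 + 1/2 + 1/3) / 3.
ranked = [["a", "b"], ["b", "a"], ["c", "d", "a"]]
gold = ["a", "a", "a"]
print(round(mean_reciprocal_rank(ranked, gold), 4))  # 0.6111
```

Unlike plain accuracy, MRR gives partial credit when the correct intent is ranked near, but not at, the top, which is why it appears alongside accuracy in the results chapter.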
Advisor: Tzong-Han Tsai (蔡宗翰)    Date of Approval: 2019-7-23
