Question answering systems are becoming increasingly popular in natural language processing, especially in smart factory settings. In this study, we simulate a factory environment by having users carry out a multi-step robot assembly task. We design an interactive dialogue system that handles user questions by classifying the user's intent during the assembly process. Because the same question can call for a different answer depending on the assembly step the user is currently at, our system combines user utterances with a real-time video feed of the user to better situate each question and analyze its intent. In addition, we introduce two auxiliary tasks to support user intent classification: (a) dialogue act classification and (b) step-independence classification, which determines whether the current question is related to the current step. Through the combination of multimodal and multitask learning, our proposed model improves the accuracy of user intent classification.