Graduate Thesis 111423052: Detailed Record




Name 郭恩淳 (En-Chun Kuo)   Graduate Department Information Management
Thesis Title 開發與評估大型語言模型驅動之臺灣電影產業問答系統:應對專業領域知識散佈之挑戰
(Developing and Evaluating a Large Language Model-Powered QA System for Taiwan's Film Industry: Addressing the Challenge of Dispersed Knowledge)
Full-Text Access
  1. This electronic thesis is approved for immediate open access.
  2. The open-access electronic full text is licensed solely for personal, non-commercial searching, reading, and printing for academic research purposes.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Across specialized domains worldwide, knowledge has long been scattered across many sources, making information searches time-consuming and prone to turning up inaccurate material. To address this lack of centralized domain knowledge, this study takes Taiwan's film industry as its case. To help film-industry practitioners and the general public find information about Taiwanese cinema faster and more conveniently, this research develops a question-answering system for Taiwan's film industry, built on mature natural language processing (NLP) and retrieval-augmented generation (RAG) techniques and implemented with the open-source LangChain framework. The goal is to help users extract relevant information efficiently while reducing the data-leakage risks that can arise from the general-purpose generative AI chatbots (e.g., ChatGPT) in common use today.
This study conducted four experiments to evaluate the system against general-purpose generative AI chatbots and a commercial, paid RAG tool. The results show that the system has a clear advantage in answer accuracy on questions about Taiwan's film industry, achieving an accuracy above 60%. The significance of this research is threefold: (1) it presents the first prototype of a dedicated question-answering system for the film industry in Taiwan that operates in Traditional Chinese; (2) it lets film-industry professionals obtain reliable answers through simple queries, improving decision-making efficiency while reducing the risk of data leakage; and (3) it offers a reference approach for people who want to explore other specialized fields with less effort spent searching for information. The study also discusses the importance of domain-specific question-answering systems and their integration with rapidly evolving generative AI technologies, in the hope that the approach can be extended to other professional domains or combined with more types of data sources to enrich and strengthen this Taiwan film QA system.
Abstract (English) To address the challenges posed by dispersed domain knowledge in the Taiwanese movie industry, this research introduces a specialized Question Answering (QA) system. The system integrates advances in Natural Language Processing (NLP) and Retrieval-Augmented Generation (RAG) technology through the open-source LangChain framework. It is designed to help industry professionals efficiently extract relevant information while minimizing the risk of data leakage associated with general-purpose chatbots.
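The abstract describes a RAG pipeline built with LangChain. The following Python sketch illustrates how such a pipeline can be assembled; the thesis does not publish its code, so the file path, models, chunk sizes, and example query below are illustrative assumptions (written against a LangChain 0.1-era API and requiring the langchain, langchain-community, langchain-openai, and faiss-cpu packages plus an OpenAI API key), not the system's actual implementation.

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load a locally curated corpus of Taiwan-film articles (hypothetical path).
docs = TextLoader("data/taiwan_film_articles.txt", encoding="utf-8").load()

# 2. Split the articles into overlapping chunks so retrieval returns focused passages.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 3. Embed the chunks into a local FAISS index; the corpus stays on-premise
#    instead of being pasted into a public chatbot.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 4. Chain a retriever and an LLM: the top-k retrieved passages are injected
#    into the prompt, grounding answers in the local documents.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)

print(qa.invoke({"query": "《海角七號》是哪一年上映的?"})["result"])

Keeping the embeddings and the vector index local is what lets such a system answer from a curated corpus without uploading proprietary documents to a public chatbot, which is the data-leakage concern the abstract raises.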
This study conducted four experiments to evaluate the system's performance against widely used AI chatbots and a commercial RAG tool. The results showed the system's superior accuracy, achieving over 60% on domain-specific queries and surpassing both the generative AI chatbots and the commercial RAG tool. The significance of this research includes: (1) developing Taiwan's first specialized question-answering system for the film industry in Traditional Chinese, (2) enabling film professionals to quickly access reliable information, enhancing decision-making and reducing data-leakage risks, and (3) offering a reference for individuals in various fields to ease information searches. The study also highlights the value of domain-specific QA systems and their integration with evolving generative AI technologies. Future applications may extend to other fields or incorporate more diverse data sources, further strengthening this system for the Taiwanese film industry.
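As a hedged illustration of the accuracy metric reported above, a question set can be scored as the fraction of answers a grader accepts as correct; the grading rule and the single sample question below are hypothetical placeholders, not the thesis's evaluation procedure.

from typing import Callable

def accuracy(qa_pairs: list[tuple[str, str]],
             answer_fn: Callable[[str], str],
             is_correct: Callable[[str, str], bool]) -> float:
    # Fraction of questions whose system answer the grader accepts as correct.
    hits = sum(is_correct(answer_fn(question), gold) for question, gold in qa_pairs)
    return hits / len(qa_pairs)

# Toy usage: a one-item question set with a containment-based grader.
sample = [("《悲情城市》的導演是誰?", "侯孝賢")]
print(accuracy(sample, lambda q: "《悲情城市》由侯孝賢執導。", lambda pred, gold: gold in pred))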
Keywords (Chinese) ★ Generative AI
★ Natural Language Processing
★ Retrieval-Augmented Generation
★ LangChain open-source framework
★ Taiwan film industry
★ Domain-specific QA system
Keywords (English) ★ Generative AI
★ Retrieval-Augmented Generation (RAG)
★ Large Language Model (LLM)
★ LangChain
★ Question Answering System
★ Taiwan Movie Industry
Table of Contents CHINESE ABSTRACT i
ENGLISH ABSTRACT ii
ACKNOWLEDGEMENT iii
TABLE OF CONTENTS iv
FIGURES vi
TABLES viii
1. INTRODUCTION 1
1.1. Research Background and Motivation 1
1.2. Research Question 7
1.3. Research Purpose 8
1.4. Research Importance 8
1.5. Following Structure 9
2. RELATED WORKS 11
2.1. The Movie Industry in Taiwan 11
2.2. The Evolutionary QA System 13
2.3. The Emergence of LLMs and Generative AI 16
2.4. RAG Models and Domain-Specific QA systems 19
2.5. Challenges in Implementing Generative AI 24
2.6. Assessing the Correctness of QA Systems 28
3. FRAMEWORK AND IMPLEMENTATION OF TWMovQA SYSTEM 31
3.1. Framework of the TWMovQA System 31
3.2. Creating Movie QA System Based on Taiwanese Film Industry 39
3.3. Foundation of TWMovQA System: Movie-related Question-Answer Sets 40
4. EXPERIMENTAL DESIGN AND PERFORMANCE METRIC OF TWMovQA SYSTEM 44
4.1. Experiments Structure 44
4.2. Detailed Experiment Descriptions 49
4.3. Evaluation Process and Performance Metric 55
5. EXPERIMENTAL RESULTS AND DISCUSSION 62
5.1. Experiment I: Evaluation of Knowledge Extraction from Diverse Articles 62
5.2. Experiment II: Assessing QA System Performance with Increasing Text Inputs 84
5.3. Experiment III: Assessing Knowledge Extraction Across Texts 89
5.4. Experiment IV: Evaluating Hallucination in Movie QA System 93
5.5. Discussion of TWMovQA System Implementation 96
5.6. Discussion of Evaluation Methods and Experiment Results 98
6. CONCLUSIONS 107
6.1. Research Results: The TWMovQA System 107
6.2. Managerial Implications 108
6.3. Current Limitation 109
6.4. Future Work 111
REFERENCES 113
APPENDIX 118
I. The Local Documents Utilized in the Experiments 118
II. The Sample of Question Set 118
III. Detailed Experiment Result for Experiment I: Phase Two 119
IV. The Unrelated Question Set of Experiment IV 122
Advisors 陳毓鐸、蘇雅惠 (Yu-To Chen, Yea-Huey Su)   Date of Approval 2024-11-27