A Feasibility Study on the Application of RAG Technology for Taiwan Questionnaire Generation

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/96298

Title:	基於RAG技術於台灣問卷生成之可行性研究;A Feasibility Study on the Application of RAG Technology for Taiwan Questionnaire Generation
Authors:	薛竣祐;Hsueh, Chun-Yu
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Large Language Model (LLM);Retrieval-Augmented Generation (RAG);LLM-as-a-Judge;Traditional Chinese Dataset
Date:	2025-01-13
Issue Date:	2025-04-09 17:38:05 (UTC+8)
Publisher:	National Central University
Abstract:	In Taiwan, questionnaire surveys are an important and frequently used tool in social analysis and academic research, covering a broad range of topics that includes national policy, social issues, and technological development. However, designing an effective and targeted questionnaire is a complex and time-consuming task: the designer must weigh multiple factors, such as the audience, the social context, timeliness, and even policy promotion, and shape the content accordingly. This study therefore applies Natural Language Processing (NLP) techniques to help survey designers with these difficulties. By combining Retrieval-Augmented Generation (RAG) with TAIDE, a Large Language Model (LLM) optimized for Traditional Chinese, we built a flexible and efficient questionnaire generation system. First, in collaboration with the Center for Survey Research (CSR) at Academia Sinica, we built a Traditional Chinese questionnaire dataset covering a wide range of topics, collecting 136 original questionnaire PDF files that had been used in real surveys. Using text recognition (OCR) and image recognition methods, we extracted 2,252 questions and their answer options from the original documents. In addition, professional questionnaire designers annotated 531 binary topic-question pairs to provide rich contextual information, so that the model can draw on real-world context when generating questions. Second, we developed a highly adaptive questionnaire generator that adjusts dynamically to different topics and produces high-quality questions that meet current research needs without frequent retraining. To evaluate generation quality, we combined LLM-as-a-Judge with Human Evaluation for rigorous testing and validation. The results indicate that questions generated by our system significantly outperform traditional human-designed questionnaires in topic relevance and helpfulness, while improving the adaptability, accuracy, and efficiency of questionnaire design. This study lays a foundation for applying large language models to questionnaire design and demonstrates the quality and potential of RAG for questionnaire generation. The findings indicate that RAG can effectively meet diverse survey needs, opening new possibilities for automated, efficient, and versatile questionnaire generation.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML
