Name: 薛竣祐 (Chun-Yu Hsueh)
Department: 資訊工程學系 (Computer Science and Information Engineering)
Thesis title: 基於RAG技術於台灣問卷生成之可行性研究 (A Feasibility Study on the Application of RAG Technology for Taiwan Questionnaire Generation)
Full text available in the thesis system after 2026-12-31.
Abstract (Chinese): In Taiwan, questionnaire surveys are an important and widely used tool in social analysis and academic research, covering a broad range of topics including national policy, social issues, and technological development. However, designing an effective and well-targeted questionnaire is a complex and time-consuming task: the content must account for multiple factors such as the target audience, the social environment, timeliness, and even policy promotion. To address this, this study applies Natural Language Processing (NLP) techniques to help practitioners overcome the difficulties of the questionnaire design process. By combining Retrieval-Augmented Generation (RAG) with TAIDE, a Large Language Model (LLM) optimized for Traditional Chinese, we built a flexible and efficient questionnaire generation system.
First, in collaboration with the Center for Survey Research at Academia Sinica, we built a Traditional Chinese questionnaire dataset covering a wide range of topics. We collected 136 original questionnaire PDFs that had been used in real surveys and, using text and image recognition, extracted 2,252 questions and answer options from them. Professional questionnaire designers then annotated 531 binary topic-question pairs, providing rich contextual information and ensuring that the model can reference and learn from real-world usage when generating questions. In addition, we developed a highly adaptive questionnaire generator that adjusts dynamically to different topics, producing high-quality questions that meet current research needs without frequent retraining.
Next, to assess generation quality, we combined LLM-as-a-Judge with human evaluation for rigorous testing and validation. The results show that our system significantly outperforms traditionally hand-designed questionnaires in both topic relevance and helpfulness, while improving the adaptability, precision, and efficiency of questionnaire design.
This research lays a foundation for applying large language models to questionnaire design, validating the quality and development potential of RAG for questionnaire generation. The results indicate that RAG can effectively meet diverse survey needs, opening new possibilities for automated, efficient, and diversified questionnaire generation.

Abstract (English): In Taiwan, surveys are widely used tools in social analysis and academic research, covering a broad range of topics, including national policies, social issues, and technological developments. However, designing effective and targeted surveys is a complex and time-consuming task that requires careful consideration of multiple factors, such as audience, social context, timeliness, and even policy promotion, all of which guide the content of the survey. In light of these challenges, this study aims to leverage Natural Language Processing (NLP) techniques to assist researchers in overcoming obstacles in the survey design process. By integrating Retrieval-Augmented Generation (RAG) technology with the Traditional Chinese-optimized Large Language Model (LLM) TAIDE, we developed a flexible and efficient questionnaire generation system.
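As a rough illustration of the retrieve-then-generate flow described above, the following is a minimal sketch using a toy corpus and simple token-overlap retrieval. The function names and corpus are illustrative assumptions, not the thesis code: the actual system retrieves from the CSR survey dataset with proper embeddings and prompts TAIDE.

```python
def retrieve(topic, corpus, k=2):
    """Rank stored survey questions by token overlap with the topic (toy retriever)."""
    topic_tokens = set(topic.lower().split())
    scored = sorted(
        corpus,
        key=lambda q: len(topic_tokens & set(q.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(topic, retrieved):
    """Assemble the augmented prompt an LLM (e.g. TAIDE) would receive."""
    context = "\n".join(f"- {q}" for q in retrieved)
    return (
        f"Reference questions:\n{context}\n\n"
        f"Write a new survey question about: {topic}"
    )

corpus = [
    "How often do you use social media each day",
    "How satisfied are you with national health policy",
    "Do you trust information shared on social media",
]
prompt = build_prompt("social media usage", retrieve("social media usage", corpus))
print(prompt)
```

Because the retrieved examples are injected into the prompt rather than baked into model weights, the same generator adapts to a new topic simply by changing what is retrieved, which is why no frequent retraining is needed.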
First, in collaboration with the Research Center for Survey Research (CSR) at Academia Sinica, we built a comprehensive Traditional Chinese survey dataset, collecting 136 original survey PDFs used in real-world applications. Through optical character recognition (OCR) and image recognition techniques, we extracted 2,252 survey questions and options from the original documents. Additionally, professional survey designers annotated 531 binary topic-question pairs to provide rich contextual information, ensuring that the model could reference and learn from real-world contexts when generating survey questions. Furthermore, we developed a highly adaptive questionnaire generator system capable of dynamically adjusting to different topics, enabling it to generate high-quality questions that meet current research needs without frequent retraining.
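To sketch the extraction step that produced the 2,252 question-and-option records, the snippet below groups OCR output lines into structured records. The line format and field names are assumptions for demonstration only; the thesis applied RapidOCR to the 136 original survey PDFs, whose layouts are messier than this.

```python
import re

def parse_survey_lines(lines):
    """Group numbered questions ("1. ...") with their lettered options ("(a) ...")."""
    records, current = [], None
    for line in lines:
        if re.match(r"^\d+\.", line):           # a new question starts
            current = {"question": line, "options": []}
            records.append(current)
        elif re.match(r"^\([a-z]\)", line) and current:
            current["options"].append(line)      # option belongs to the last question
    return records

ocr_lines = [
    "1. How often do you read the news?",
    "(a) Daily",
    "(b) Weekly",
    "2. Do you trust online news sources?",
    "(a) Yes",
    "(b) No",
]
records = parse_survey_lines(ocr_lines)
print(len(records), len(records[0]["options"]))  # → 2 2
```

The annotated 531 topic-question pairs would then be layered on top of records like these, linking each extracted question to a topic label for retrieval.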
To evaluate the quality of the generated questions, we conducted rigorous testing and validation using a combination of LLM-as-a-Judge and Human Evaluation methods. The results indicate that our system significantly outperforms traditional human-designed surveys in terms of topic relevance and helpfulness, enhancing the adaptability, accuracy, and efficiency of survey design.
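A hedged sketch of the LLM-as-a-Judge side of this evaluation: a judge model is prompted with a rubric, its numeric reply is parsed, and scores are averaged per system. The rubric wording and score parsing here are assumptions, and the judge replies are mocked; a real run would query the judge LLM and pair the result with human evaluation as the thesis does.

```python
def judge_prompt(topic, question):
    """Build the rubric prompt shown to the judge model (illustrative wording)."""
    return (
        f"Topic: {topic}\n"
        f"Candidate survey question: {question}\n"
        "Rate relevance and helpfulness from 1 to 5. Reply with a number only."
    )

def parse_score(reply, lo=1, hi=5):
    """Extract the judge's numeric rating, clamped to the rubric range."""
    digits = [int(ch) for ch in reply if ch.isdigit()]
    if not digits:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return min(max(digits[0], lo), hi)

def mean_score(replies):
    """Average parsed scores across a system's generated questions."""
    scores = [parse_score(r) for r in replies]
    return sum(scores) / len(scores)

# Mocked judge replies; a real run would collect these from the judge LLM.
print(mean_score(["5", "Score: 4", "3"]))  # → 4.0
```

Clamping and strict parsing matter in practice because judge models sometimes return prose around the number; rejecting unparseable replies keeps the averages honest.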
This study lays a foundation for applying large language models in the field of questionnaire survey design, demonstrating the high quality and potential of RAG technology in questionnaire generation. The findings indicate that RAG can effectively address diverse survey needs, opening up new possibilities for the automation, high performance, and versatile application of questionnaire generation in the future.

Keywords (Chinese):
★ 大型語言模型 (LLM)
★ 擷取增強生成 (RAG)
★ LLM-as-a-Judge
★ 繁體中文資料集

Keywords (English):
★ Large Language Model (LLM)
★ Retrieval-Augmented Generation (RAG)
★ LLM-as-a-Judge
★ Traditional Chinese Dataset

Table of Contents:
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Research Goal
2 Related Work
2.1 Retrieval-Augmented Generation
2.2 LLM-as-a-Judge
3 Methodology
3.1 Task Description
3.2 Architecture
3.3 Retrieval-Augmented Generation
3.4 LLaMA 2 and TAIDE
3.5 Evaluation
3.5.1 Automatic Evaluation
3.5.2 Human Evaluation
4 Experiment
4.1 Dataset
4.2 Experimental Settings
4.2.1 RAG Experiment
4.2.2 Specific Question Type Experiment
4.2.3 Comparative Experiment
4.3 Experimental Results
4.3.1 RAG Experiment Results
4.3.2 Specific Question Type Experiment Results
4.3.3 Comparative Experiment Results
5 Analysis and Discussion
5.1 Comparison of Methods and Generated Results
5.2 Score Analysis
5.2.1 RAG Experiment
5.2.2 Specific Question Type Experiment
5.2.3 Comparative Experiment
5.3 Discussion on the Advantages of RAG
6 Conclusion
7 Future Work
7.1 Enhancing the Quality of RapidOCR Data Extraction
7.2 Leveraging TAIDE's Multi-Turn Dialogue Capabilities
7.3 Expanding the Traditional Chinese Dataset
A Dataset Example
A.1 Original PDF Data
A.2 Annotated Data
B Comparative Analysis with Existing Tools
B.1 Strengths
B.2 Weaknesses
B.3 Case Study
C Analysis and Results of Multi-Turn Dialogue Experiment
C.1 Background and Objective
C.2 Experiment
C.2.1 Experiment Design
C.2.2 Experiment Results
C.3 Analysis and Conclusion
C.3.1 Analysis
C.3.2 Conclusion
Bibliography

Advisor: 蔡宗翰 · Approval date: 2025-01-13