dc.description.abstract | In Taiwan, surveys are widely used in social analysis and academic research, covering a broad range of topics, including national policies, social issues, and technological developments. However, designing effective and targeted surveys is a complex and time-consuming task that requires careful consideration of multiple factors, such as audience, social context, timeliness, and even policy promotion, all of which shape the content of the survey. To address these challenges, this study applies Natural Language Processing (NLP) techniques to assist researchers in the survey design process. By integrating Retrieval-Augmented Generation (RAG) with TAIDE, a Large Language Model (LLM) optimized for Traditional Chinese, we developed a flexible and efficient questionnaire generation system.
First, in collaboration with the Center for Survey Research (CSR) at Academia Sinica, we built a comprehensive Traditional Chinese survey dataset from 136 original survey PDFs used in real-world applications. Using optical character recognition (OCR) and image recognition techniques, we extracted 2,252 survey questions and their options from the original documents. In addition, professional survey designers annotated 531 topic-question pairs with binary labels to provide rich contextual information, so that the model can draw on real-world context when generating survey questions. We then developed a highly adaptive questionnaire generation system that adjusts dynamically to different topics, enabling it to generate high-quality questions that meet current research needs without frequent retraining.
To evaluate the quality of the generated questions, we tested and validated them using a combination of LLM-as-a-Judge and human evaluation. The results indicate that questions generated by our system significantly outperform those from traditional human-designed surveys in topic relevance and helpfulness, enhancing the adaptability, accuracy, and efficiency of survey design.
This study lays a foundation for applying large language models to questionnaire survey design and demonstrates the potential of RAG for generating high-quality questionnaires. The findings indicate that RAG can effectively address diverse survey needs, opening up new possibilities for automated, high-performance, and versatile questionnaire generation. | en_US |