Manual generation of datasets for language models has long been a labor-intensive task. However, with the rapid evolution of open-source large language models (LLMs) in recent years, researchers have increasingly leveraged LLMs to assist in dataset generation. This study therefore proposes a fully open-source architecture that uses the Gemma 2-27B model as its core language model. The primary goal is to automate the generation of training datasets for LLMs, thereby reducing human effort and improving performance on quantitative evaluation metrics.
This research will explore which training strategies, across combinations of fine-tuning and retrieval-augmented generation (RAG), yield the highest quantitative scores. It will also examine whether incorporating chain-of-thought (CoT) reasoning during generation improves the results. Evaluation will be conducted using cosine similarity and LLM-as-a-judge metrics, and the results will be compared against existing public datasets.
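As a rough sketch of the cosine-similarity metric mentioned above: generated and reference answers would each be embedded as vectors, and similarity computed between them. The embedding values below are illustrative placeholders, not outputs of any actual model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence embeddings for a generated answer and a
# reference answer (real embeddings would come from an encoder model).
generated = np.array([0.2, 0.7, 0.1])
reference = np.array([0.25, 0.65, 0.05])

score = cosine_similarity(generated, reference)
# score close to 1.0 indicates the generated answer is semantically
# near the reference answer under the embedding model.
```

In practice the score would be averaged over the whole evaluation set and reported alongside the LLM-as-a-judge ratings.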
Finally, the system will be deployed to a LINE Bot via ngrok, enabling a human-AI interactive interface and a lightweight MCP tool calling mechanism using prompt-based control. The user interface will also support dynamic model switching for flexible operation.
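The prompt-based tool calling described above can be sketched as follows: the available tools are listed in the prompt, and the model's reply is parsed as either a JSON tool request or a plain-text answer. The tool names, registry structure, and JSON convention here are illustrative assumptions, not the system's actual implementation.

```python
import json

# Hypothetical tool registry; names and behaviors are illustrative only.
TOOLS = {
    "search_docs": lambda query: f"results for {query}",
    "get_time": lambda: "2025-01-01T00:00:00Z",
}

def build_tool_prompt(user_message: str) -> str:
    """Embed the tool list in the prompt so the model can choose one."""
    tool_list = ", ".join(TOOLS)
    return (
        f"Available tools: {tool_list}. "
        'Reply with JSON like {"tool": "name", "args": {...}} '
        f"to call a tool, or answer in plain text. User: {user_message}"
    )

def dispatch(model_reply: str) -> str:
    """Parse the model's reply; invoke a tool if one was requested."""
    try:
        call = json.loads(model_reply)
        return str(TOOLS[call["tool"]](**call.get("args", {})))
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not a valid tool request: treat the reply as a direct answer.
        return model_reply
```

A LINE Bot webhook handler would pass each incoming message through `build_tool_prompt`, send the prompt to the selected model, and route the reply through `dispatch` before responding to the user.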