Traditional visualization-oriented natural language interfaces (V-NLIs) rely primarily on drop-down menus to generate charts, which is less intuitive than natural language input. As a result, recent research has increasingly incorporated large language models (LLMs) to generate visualizations from natural language commands. While LLMs have demonstrated strong performance on visualization tasks, their growing GPU and hardware resource demands make local deployment difficult. This has led to the adoption of lightweight small language models (SLMs) as an alternative.
Although SLMs offer the advantage of lightweight deployment, they still face several challenges in generating visualizations, such as producing chart parameters that do not meet user expectations or yielding inconsistent outputs for the same input. To address these issues, we propose an agent-centered framework. The agent receives the tool name and parameters generated separately by the SLM, assembles them into a complete tool call (also known as a function call), and executes the corresponding visualization tool operation. These operations include frontend chart updates, backend data queries, and the handling of missing input information or non-visualization requests, thereby enhancing the flexibility and stability of the overall interaction process.
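The assembly step the agent performs can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the tool names (`update_chart`, `query_data`), their required parameters, and the status labels are all hypothetical placeholders standing in for whatever the real system defines.

```python
import json

# Hypothetical tool registry; names and required parameters are
# illustrative, not the actual API of the system described above.
TOOLS = {"update_chart", "query_data"}
REQUIRED_PARAMS = {
    "update_chart": {"chart_type", "x", "y"},
    "query_data": {"table", "columns"},
}

def assemble_tool_call(tool_name: str, raw_params: str) -> dict:
    """Combine the SLM's separately generated tool name and JSON
    parameter string into one complete tool call, flagging gaps."""
    if tool_name not in TOOLS:
        # Unknown or non-visualization request: hand back to the user.
        return {"status": "reject", "reason": f"unsupported tool: {tool_name}"}
    try:
        params = json.loads(raw_params)
    except json.JSONDecodeError:
        return {"status": "error", "reason": "unparsable parameters"}
    missing = REQUIRED_PARAMS[tool_name] - params.keys()
    if missing:
        # Incomplete input: ask the user to supply the missing fields.
        return {"status": "ask_user", "missing": sorted(missing)}
    return {"status": "ok", "call": {"name": tool_name, "arguments": params}}
```

For example, `assemble_tool_call("update_chart", '{"x": "month"}')` would return an `ask_user` status listing `chart_type` and `y` as missing, which is how the flow tolerates incomplete user input instead of emitting a malformed call.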
In our experiments, we used the publicly available ChartGPT dataset for evaluation and adopted two metrics commonly used in machine-translation evaluation, ROUGE-L and BLEU, to assess the accuracy of parameter generation. The results show that the fine-tuned Phi-3.5 model significantly improves parameter generation accuracy: it outperforms ChartGPT, an existing SLM-based visualization system, by approximately 1\% in ROUGE-L, with comparable performance in BLEU. Furthermore, compared with three general-purpose language models not fine-tuned for visualization tasks (xLAM, LLaMA3.2, and Qwen2.5), our model achieves an average improvement of approximately 78\% in ROUGE-L and 48\% in BLEU. In addition, tests on a private 6G dataset demonstrate that the system can still generate appropriate visualization charts even when user input is incomplete, showcasing strong robustness.
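For reference, ROUGE-L scores a candidate against a reference by the length of their longest common subsequence (LCS); the sketch below computes the standard LCS-based F1 variant on whitespace tokens. This is a textbook formulation for illustration only; the thesis's actual evaluation may use a different tokenizer or library, and BLEU (modified n-gram precision with a brevity penalty) is omitted here for brevity.

```python
def lcs_len(a: list, b: list) -> int:
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ta == tb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens: harmonic mean of
    LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

An exact parameter match scores 1.0, while a candidate missing part of the reference (e.g. `"bar chart"` vs. `"bar chart of sales"`) is penalized through recall, which is why small gains in ROUGE-L reflect closer agreement with the ground-truth chart parameters.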