Browser Agent Performance Bottleneck Analysis and Improvement Challenges

Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98216

Title:	Browser Agent Performance Bottleneck Analysis and Improvement Challenges
Authors:	李德泰;LI, DE-TAI
Contributors:	Executive Master of Computer Science and Information Engineering
Keywords:	AI Agent;WebVoyager;Browser-use;Browser Agent;RAG
Date:	2025-08-21
Issue Date:	2025-10-17 12:30:19 (UTC+8)
Publisher:	National Central University
Abstract:	In recent years, Large Language Models (LLMs) have improved significantly in language understanding, reasoning, and task execution. As web interfaces increasingly serve as unified access points to heterogeneous information systems, LLM-based browser agents have become an important research direction for building general-purpose intelligent agents. This study focuses on WebVoyager, an open-source project released in 2024, and investigates its performance when executing tasks on real-world websites. Preliminary experiments show that the original model struggles with sites that have highly dynamic content, complex structures, or visually oriented layouts, and that both its comprehension and its execution efficiency suffer as a result. To improve the agent's adaptability, this research proposes improvements to the core capabilities and overall architecture of WebVoyager and conducts a comparative analysis with another mainstream open-source system, Browser-use, covering perception, planning, and execution. The evaluation uses the WebVoyager benchmark dataset and includes empirical testing on the relevant tasks. Furthermore, this study integrates a Retrieval-Augmented Generation (RAG) mechanism: lightweight knowledge texts constructed in this study give the agent basic knowledge of a website's structure and functionality before it executes a task, which improves its comprehension and operational accuracy. Experimental results show that the RAG-enhanced WebVoyager achieves an 8.7% improvement in task success rate and outperforms Browser-use in most test scenarios. These findings demonstrate the practical benefit of external knowledge integration for LLM decision quality and for the generalization ability of browser agents.
Appears in Collections:	[Executive Master of Computer Science and Information Engineering] Electronic Thesis & Dissertation

