Please use this permanent URL to cite or link to this item:
https://ir.lib.ncu.edu.tw/handle/987654321/98211
| Title: | A Comparative Analysis of Question Decomposition, Self-Correction, and Prompt Optimization Mechanisms in Large Language Model Agents for Complex Exploratory Question Answering (解析複雜探索式問答與大型語言模型代理的問題分解、自我修正與提示優化機制比較分析) |
| Author: | Zhang, Jin-Qi (張金騏) |
| Contributor: | Department of Computer Science and Information Engineering |
| Keywords: | Large Language Models; LLM Agents; Question Answering; Metacognition; Prompt Optimization |
| Date: | 2025-07-01 |
| Upload Time: | 2025-10-17 12:29:51 (UTC+8) |
| Publisher: | National Central University |
| Abstract: | To address the challenges of Complex Exploratory Question Answering (CEQA)—characterized by ambiguity, openness, and multi-perspective demands—this study investigates the architectural design and behavioral dynamics of Large Language Model (LLM) agents. We propose CQ Solver, a novel agent framework that models the problem space as a dynamically evolving Directed Acyclic Graph (DAG) and incorporates explicit metacognitive operations into its reasoning loop. To evaluate its effectiveness, we systematically compare CQ Solver against two baselines: ReAct and its decomposition-enhanced variant, Q-Decomp ReAct. Experimental results confirm that explicit question decomposition brings significant and broad improvements. However, the core finding of this study is the performance trade-off introduced by metacognition: on powerful models like GPT-4o, metacognitive mechanisms improve technical answer quality (e.g., factuality), but at the cost of lower user satisfaction. This effect diminishes—or even reverses—on weaker models, highlighting the dependence of metacognition on model capacity. Additionally, prompt refinement yields statistically significant but modest gains for open-source models. Most notably, our behavioral analysis reveals a negative correlation between tool usage frequency and answer quality. This challenges common assumptions: frequent interactions with the environment do not indicate thorough reasoning, but rather suggest a form of cognitive struggle. In contrast, high-quality answers often stem from fewer but strategically focused actions. By introducing a metacognitively grounded agent architecture and a fresh behavioral perspective on reasoning efficiency, this work lays the groundwork for future LLM agents capable of greater autonomy and self-regulation. |
| Appears in Collections: | [Graduate Institute of Computer Science and Information Engineering] Theses & Dissertations |
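
The abstract describes CQ Solver's central mechanism only at a high level: a problem space modeled as a dynamically evolving DAG of sub-questions, with explicit metacognitive operations inside the reasoning loop. The thesis's implementation is not attached to this record, so the following Python sketch is a hypothetical illustration of that idea; every name in it (ProblemDAG, SubQuestion, metacognitive_review, answer_fn) is an assumption for illustration, not the author's code.

```python
# Hypothetical sketch: the problem space as a dynamically evolving DAG of
# sub-questions, with an explicit metacognitive check in the reasoning loop.
# Names here are illustrative assumptions, not the thesis's implementation.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class SubQuestion:
    qid: str
    text: str
    parents: list[str] = field(default_factory=list)  # dependencies by qid
    answer: Optional[str] = None

class ProblemDAG:
    """A question graph that grows as the agent decomposes the task."""
    def __init__(self, root_text: str):
        self.nodes: dict[str, SubQuestion] = {}
        self.add("root", root_text)

    def add(self, qid: str, text: str, parents: Optional[list[str]] = None):
        self.nodes[qid] = SubQuestion(qid, text, parents or [])

    def ready(self) -> list[SubQuestion]:
        """Unanswered sub-questions whose dependencies are all answered."""
        return [n for n in self.nodes.values()
                if n.answer is None
                and all(self.nodes[p].answer is not None for p in n.parents)]

def metacognitive_review(node: SubQuestion, draft: str) -> bool:
    """Stand-in for the self-check step: in a real agent this would be an
    LLM call that critiques the draft answer before it is committed."""
    return bool(draft.strip())

def solve(dag: ProblemDAG, answer_fn: Callable[[SubQuestion], str]) -> str:
    """Reasoning loop: answer ready sub-questions in dependency order,
    gating each draft behind the metacognitive check."""
    while (ready := dag.ready()):
        node = ready[0]
        draft = answer_fn(node)  # e.g. an LLM call, possibly with tool use
        if metacognitive_review(node, draft):
            node.answer = draft
        # else: revise, re-decompose, or retry (elided in this sketch)
    return dag.nodes["root"].answer

# Usage: decompose first, then make the root depend on its sub-questions,
# so the final synthesis only runs once all sub-answers are in place.
dag = ProblemDAG("How do major viewpoints differ on question X?")
dag.add("q1", "What is viewpoint A's position?")
dag.add("q2", "What is viewpoint B's position?")
dag.nodes["root"].parents = ["q1", "q2"]
print(solve(dag, answer_fn=lambda n: f"[answer to: {n.text}]"))
```

Note the design choice this sketch mirrors: because the DAG gates each node on its dependencies, the agent is pushed toward a few well-ordered actions rather than unbounded environment interaction, which is consistent with the abstract's finding that fewer, strategically focused actions correlate with higher answer quality.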
Files in This Item:
| File | Description | Size | Format | Views |
| index.html | | 0Kb | HTML | 15 |
All items in NCUIR are protected by copyright, with all rights reserved.