NCU Institutional Repository: Item 987654321/98352


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98352


    Title: Optimizing Multi-Hop Question Answering through Post-Retrieval Denoising and Reordering
    Authors: 高璽媛;Kao, Hsi-Yuan
    Contributors: Department of Information Management
    Keywords: Multi-Hop Question Answering (QA);Retrieval-Augmented Generation (RAG);Passage Reordering;Noise Filtering;Large Language Models (LLMs);Information Retrieval
    Date: 2025-07-24
    Issue Date: 2025-10-17 12:40:15 (UTC+8)
    Publisher: National Central University
    Abstract: Multi-hop question answering requires models to retrieve and reason over multiple interdependent evidence passages to arrive at correct answers. Retrieval-augmented generation (RAG) frameworks enhance factual coverage by fetching relevant documents, but they often suffer from two key shortcomings: distracting noise in the retrieved context and a lack of logical passage ordering. To address these issues, we introduce a two-stage post-retrieval refinement pipeline. First, DenoiseLM filters out irrelevant or misleading passages, reducing hallucinations and focusing the model's attention on useful evidence. Second, OrderLM restructures the remaining passages into a coherent reasoning chain that mirrors the natural inferential steps.
    Empirical evaluations on the MuSiQue benchmark show that DenoiseLM alone yields an average Answer F1 improvement of +6.4 points over raw retrieval, OrderLM alone contributes only +0.1 point, and the full DenoiseLM + OrderLM cascade achieves a +6.9-point gain. Notably, few-shot in-context learning (ICL) variants of both modules match or exceed fully fine-tuned counterparts on out-of-sample tasks, offering a resource-efficient cross-domain solution. These results underscore the pivotal role of structuring retrieved content, both by relevance and by logical sequence, in bolstering the robustness and precision of multi-hop QA systems.
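    The data flow of the two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: in the thesis, DenoiseLM and OrderLM are LLM-based components (fine-tuned or few-shot prompted), whereas here they are stood in for by hypothetical caller-supplied functions `is_relevant` and `hop_index`, named purely for this sketch.

    ```python
    # Sketch of the post-retrieval refinement cascade: filter first, then order.
    # `is_relevant` and `hop_index` are hypothetical stand-ins for the LLM modules.

    def denoise(question, passages, is_relevant):
        """Stage 1 (DenoiseLM): drop passages judged irrelevant or misleading."""
        return [p for p in passages if is_relevant(question, p)]

    def reorder(question, passages, hop_index):
        """Stage 2 (OrderLM): arrange surviving passages into a reasoning chain."""
        return sorted(passages, key=lambda p: hop_index(question, p))

    def refine(question, retrieved, is_relevant, hop_index):
        """Full cascade: DenoiseLM output feeds OrderLM."""
        return reorder(question, denoise(question, retrieved, is_relevant), hop_index)

    # Toy usage with keyword heuristics standing in for the trained models.
    question = "Where was the director of Film X born?"
    retrieved = [
        "hop2: The director was born in Taipei.",
        "noise: Film Y grossed $10M at the box office.",
        "hop1: Film X was directed by A. Chen.",
    ]
    chain = refine(
        question,
        retrieved,
        is_relevant=lambda q, p: not p.startswith("noise"),
        hop_index=lambda q, p: int(p[3]),
    )
    # The distractor is removed and the two evidence passages are put in hop order.
    ```

    The ordering step runs only on DenoiseLM's output, which matches the reported ablations: filtering contributes most of the gain, while reordering adds a small further improvement on top of the cleaned context.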
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 KB, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

