Multi-hop question answering requires models to retrieve and reason over multiple interdependent evidence passages to arrive at correct answers. Retrieval-augmented generation (RAG) frameworks enhance factual coverage by fetching relevant documents, but they often suffer from two key shortcomings: distracting noise in the retrieved context and a lack of logical passage ordering. To address these issues, we introduce a two-stage post-retrieval refinement pipeline. First, DenoiseLM filters out irrelevant or misleading passages, reducing hallucinations and focusing the model's attention on useful evidence. Second, OrderLM restructures the remaining passages into a coherent reasoning chain that mirrors the natural inferential steps. Empirical evaluations on the MuSiQue benchmark show that DenoiseLM alone yields an average Answer F1 improvement of +6.4 points over raw retrieval, OrderLM alone contributes only +0.1 points, and the full DenoiseLM + OrderLM cascade achieves a +6.9-point gain. Notably, few-shot in-context learning (ICL) variants of both modules match or exceed their fully fine-tuned counterparts on out-of-sample tasks, offering a resource-efficient solution for cross-domain generalization. These results underscore the pivotal role of structuring retrieved content, both by relevance and by logical sequence, in bolstering the robustness and precision of multi-hop QA systems.
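To make the shape of the cascade concrete, the following is a minimal sketch of the post-retrieval refinement flow, assuming hypothetical callable interfaces for DenoiseLM and OrderLM (the actual prompts, model backbones, and I/O formats are not specified in this abstract).

```python
# Minimal sketch of the two-stage post-retrieval refinement pipeline.
# The denoise_lm and order_lm callables below are placeholders for the
# paper's DenoiseLM and OrderLM modules; their interfaces are assumptions.
from typing import Callable, List

def refine_context(
    question: str,
    retrieved: List[str],
    denoise_lm: Callable[[str, List[str]], List[str]],  # keeps only relevant passages
    order_lm: Callable[[str, List[str]], List[str]],     # reorders passages into a reasoning chain
) -> List[str]:
    """Apply DenoiseLM then OrderLM to raw retrieval results before answer generation."""
    # Stage 1: filter out irrelevant or misleading passages.
    relevant = denoise_lm(question, retrieved)
    # Stage 2: restructure the surviving passages into a coherent reasoning chain.
    return order_lm(question, relevant)

if __name__ == "__main__":
    # Toy usage with trivial stand-ins for the two modules.
    passages = [
        "Paris is the capital of France.",
        "Bananas are yellow.",
        "France borders Germany.",
    ]
    drop_fruit = lambda q, ps: [p for p in ps if "Banana" not in p]
    keep_order = lambda q, ps: ps
    print(refine_context(
        "Which country bordering Germany has Paris as its capital?",
        passages, drop_fruit, keep_order,
    ))
```

The ordered passage list produced by this cascade would then be passed to the answer-generating reader in place of the raw retrieval output.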