NCU Institutional Repository (中大機構典藏) — Item 987654321/98598


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98598


    Title: Design and Implementation of a Personalized Knowledge Base Retrieval System with a Multimodal Retrieval-Augmented Generation Architecture
    Authors: 陳志昇;Chen, Zhi-Sheng
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Retrieval-Augmented Generation; Large Language Model; Multimodality
    Date: 2025-08-19
    Issue Date: 2025-10-17 12:59:01 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: Large language models (LLMs) often suffer from hallucinations in knowledge question answering, making it difficult for them to accurately cite a user's personal information or document content. Retrieval-Augmented Generation (RAG) improves LLM accuracy by retrieving external knowledge during response generation, but most existing RAG methods focus only on text and cannot fully exploit the visual information contained in multimodal personal knowledge bases.
    To address this challenge, this study designs and implements a personalized knowledge base retrieval system built on a multimodal RAG architecture. It combines locally deployed semantic retrieval and generation models: the multilingual embedding model BGE-M3 converts document text and queries into high-dimensional vectors for dense retrieval of relevant passages; the BGE-reranker-v2-m3 cross-encoder re-ranks the initial results to improve retrieval precision; and the multimodal large model Qwen-2.5-VL generates the final answer from both the retrieved text passages and the related images.
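The two-stage retrieval described above (dense retrieval, then cross-encoder re-ranking) can be sketched as follows. This is a minimal, hypothetical illustration: the real system uses BGE-M3 embeddings and the BGE-reranker-v2-m3 cross-encoder, which are replaced here by trivial stand-in scoring functions so the control flow is runnable without any models.

```python
import math

def embed(text):
    """Stand-in for BGE-M3: a tiny bag-of-letters vector (illustrative only)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dense_retrieve(query, chunks, top_k=3):
    """Stage 1: rank all chunks by embedding similarity to the query."""
    qv = embed(query)
    scored = sorted(((cosine(qv, embed(c)), c) for c in chunks), reverse=True)
    return [c for _, c in scored[:top_k]]

def rerank(query, candidates, top_k=2):
    """Stage 2 stand-in for the cross-encoder: score query and chunk jointly
    (here: word overlap) and keep only the best candidates."""
    qwords = set(query.lower().split())
    scored = sorted(((len(qwords & set(c.lower().split())), c) for c in candidates),
                    reverse=True)
    return [c for _, c in scored[:top_k]]

chunks = [
    "RAG retrieves external knowledge before generation",
    "embedding models map text to dense vectors",
    "the weather forecast predicts rain tomorrow",
]
query = "how does RAG retrieve knowledge"
hits = rerank(query, dense_retrieve(query, chunks))
print(hits[0])
```

The design point the thesis relies on is that the cheap dense stage narrows the corpus to a small candidate set, so the expensive joint query-chunk scoring of the cross-encoder only runs on a few passages.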
    The system is orchestrated with LangChain and runs its models locally via Ollama to preserve personal data privacy. Experiments show that it effectively extracts text and image information from user PDFs in both Chinese and English to answer complex questions, reducing hallucinations and providing more accurate and richer answers. This work discusses the technical design and advantages of each system component, compares the system with related research, and demonstrates the feasibility and benefits of applying multimodal retrieval and generation to personalized knowledge bases.
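The generation step hands the reranked text passages and the related page images to a vision-language model. A hypothetical sketch of how such a request might be assembled for a Qwen-2.5-VL model served locally through Ollama is shown below; the message shape (a `content` string plus a base64 `images` list) follows Ollama's chat API, but the model tag, prompt wording, and helper name are this sketch's assumptions, not the thesis author's code. The payload is only built, not sent.

```python
import base64

def build_multimodal_prompt(question, text_chunks, image_bytes_list):
    """Pack retrieved chunks and images into one Ollama-style chat request.
    Illustrative only: names and prompt wording are assumptions."""
    context = "\n\n".join(f"[chunk {i + 1}] {c}" for i, c in enumerate(text_chunks))
    return {
        "model": "qwen2.5-vl",  # assumed local model tag
        "messages": [{
            "role": "user",
            "content": (
                "Answer using only the context below.\n\n"
                f"{context}\n\nQuestion: {question}"
            ),
            # Ollama's chat API takes images as base64-encoded strings
            "images": [base64.b64encode(b).decode() for b in image_bytes_list],
        }],
    }

payload = build_multimodal_prompt(
    "What does Figure 2 show?",
    ["Figure 2 plots retrieval precision against corpus size."],
    [b"\x89PNG fake image bytes"],
)
print(payload["model"])
```

In the deployed system this request would be sent to the local Ollama server (e.g. via its `/api/chat` endpoint), so neither the documents nor the images ever leave the user's machine.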
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.
