

    Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/99238


    Title: 生醫文獻的檢索增強生成系統發展 (Development of a retriever-augmented generation system for biomedical literature)
    Author: 曾郁蓉 (Tseng, Yu Jung)
    Contributor: 生物醫學工程研究所 (Institute of Biomedical Engineering)
    Keywords: retriever-augmented generation system; large language model; prompt engineering; embedding model; reranker model; keyword table index
    Date: 2025-11-12
    Upload time: 2026-03-06 18:24:52 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract:





    This study aims to optimize the retriever configuration and response quality of
    Retrieval-Augmented Generation (RAG) systems for biomedical literature.
    As large language models (LLMs) become increasingly prevalent in specialized domains, avoiding hallucinations, improving retrieval accuracy, and enhancing response quality are critical challenges. Using three representative biomedical papers as the data source, experiments were conducted with different chunk sizes, embedding models, top-k retrieval parameters, reranker models, and keyword table indexes, and performance was compared across evaluation metrics including hit rate, mean reciprocal rank (MRR), response correctness, faithfulness, and relevancy. The results indicate the following optimal retriever configuration: chunk size = 1024 tokens; the OpenAI text-embedding-3-large embedding model; a combination of vector index and keyword index; and the Jina Reranker (jina-reranker-v1-tiny-en, Top-k = 10, rerank Top-n = 5), with GPT-based models for response generation at temperature = 0 to improve faithfulness and relevancy. This study demonstrates that jointly optimizing retrieval and generation significantly improves the accuracy of answers drawn from biomedical literature. However, limitations such as latency and the need for regular data-source updates remain. Future work includes developing more advanced QA systems and integrating the Model Context Protocol (MCP) to build an intelligent biomedical literature search engine.
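    The retrieval pipeline reported above can be sketched in miniature. This is an illustrative stand-in, not the thesis's implementation: the hashed bag-of-words embedding replaces OpenAI text-embedding-3-large, the word-overlap scorer replaces the Jina reranker, and chunking is by characters rather than tokens. Only the pipeline shape — chunking, vector + keyword retrieval, top-k candidate union, rerank to top-n — follows the configuration described in the abstract.

    ```python
    # Minimal hybrid-retrieval sketch (hypothetical stand-ins for the real models).
    import math

    def chunk(text: str, size: int = 1024) -> list[str]:
        """Split text into fixed-size character chunks (the real system uses tokens)."""
        return [text[i:i + size] for i in range(0, len(text), size)]

    def embed(text: str, dim: int = 64) -> list[float]:
        """Toy hashed bag-of-words vector standing in for a real embedding model."""
        vec = [0.0] * dim
        for word in text.lower().split():
            vec[hash(word) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def cosine(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    def retrieve(query: str, chunks: list[str], top_k: int = 10, top_n: int = 5) -> list[str]:
        q_vec = embed(query)
        q_words = set(query.lower().split())
        # Vector index: rank chunks by cosine similarity to the query embedding.
        vec_hits = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:top_k]
        # Keyword table index: chunks that share at least one query term.
        kw_hits = [c for c in chunks if q_words & set(c.lower().split())][:top_k]
        # Union the candidate sets, then rerank by term overlap (reranker stand-in)
        # and keep the top-n chunks for the generation step.
        candidates = list(dict.fromkeys(vec_hits + kw_hits))
        reranked = sorted(candidates,
                          key=lambda c: len(q_words & set(c.lower().split())),
                          reverse=True)
        return reranked[:top_n]
    ```

    In the actual system the top-n reranked chunks would then be passed as context to a GPT model at temperature 0 to generate the grounded answer.
    
    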
    Appears in collections: [Institute of Biomedical Engineering] Master's and Doctoral Theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    9      View/Open


    All items in NCUIR are protected by original copyright.

