Please use this permanent URL to cite or link to this item:
https://ir.lib.ncu.edu.tw/handle/987654321/99378
| Title: | 針對文言文歷史文獻的領域感知型檢索流程:以魏晉南北朝為案例研究; A Domain-Aware Retrieval Pipeline for Classical Chinese Historical Texts: A Case Study on the Wei–Jin Northern and Southern Dynasties |
| Author: | 柯函君; Ko, Han-Chun |
| Contributor: | Department of Computer Science and Information Engineering |
| Keywords: | Digital Humanities; Historical Information Retrieval; Wei-Jin Northern and Southern Dynasties; Retrieval-Augmented Generation; Classical Chinese; Volume-and-Section Structure Awareness; Metadata-Augmented Tex |
| Date: | 2026-01-23 |
| Upload time: | 2026-03-06 18:50:40 (UTC+8) |
| Publisher: | National Central University |
| Abstract: | With the rapid advancement of Large Language Models (LLMs) and retrieval-augmented techniques inspired by Retrieval-Augmented Generation (RAG), effective retrieval over historical corpora that are structurally complex and linguistically distant from modern usage has become a key challenge in Digital Humanities. In the study of early medieval Chinese history, researchers often formulate queries in modern Chinese to locate related passages scattered across multiple volumes and sections of Classical Chinese historical texts. This cross-linguistic and cross-structural mismatch is difficult for traditional retrieval approaches that rely on semantic similarity alone, particularly for long-tail queries that require aggregating passages from multiple sources. This thesis focuses on the historical materials of the Wei–Jin Northern and Southern Dynasties and introduces a domain-aware retrieval framework, the Wei–Jin Northern and Southern Dynasties Historical Records Retrieval Pipeline (WJNS pipeline). The proposed system strengthens query-side representations and incorporates awareness of the volume-and-section hierarchy of classical historical documents.
Specifically, the framework integrates three components: alias- and dynasty-informed query expansion to address name variation and contextual relevance, HyDE-style pseudo-document generation in Classical Chinese to bridge the linguistic gap, and Metadata-Augmented Text (MAT) representation to encode structural and contextual metadata directly into the retrieval process. Together, these components aim to improve retrieval effectiveness in scenarios characterized by name ambiguity, dispersed relevant passages, and heterogeneous document organization. To evaluate the framework systematically, this study constructs a retrieval benchmark of 297 expert-crafted modern Chinese queries, divided into three difficulty levels by question complexity. Evaluation uses five-fold cross-validation. Under the adopted evaluation protocol, the WJNS pipeline achieves statistically significant and consistent improvements on metrics such as Recall@10 and nDCG@10, with particularly notable gains for complex and long-tail queries spanning multiple historical sources. Overall, rather than introducing a single monolithic model, this thesis presents a modular, domain-informed retrieval framework and provides a curated benchmark dataset to support future research at the intersection of Information Retrieval and Digital Humanities. |
| Appears in Collections: | [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations |
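The abstract reports results in terms of Recall@10 and nDCG@10. As a rough illustration of what these two metrics measure, here is a minimal Python sketch. It assumes binary relevance judgments and unique passage IDs in each ranking; the record does not state the thesis's exact metric definitions, and the function names and passage IDs below are hypothetical.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant passage at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, pid in enumerate(ranked_ids[:k])
        if pid in relevant_ids
    )
    # The ideal ranking places every relevant passage at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical passage IDs, for illustration only.
ranked = ["p3", "p7", "p1", "p9"]
relevant = {"p1", "p3", "p5"}

recall = recall_at_k(ranked, relevant)
ndcg = ndcg_at_k(ranked, relevant)
print(f"Recall@10 = {recall:.3f}, nDCG@10 = {ndcg:.3f}")
```

Recall@10 rewards finding as many relevant passages as possible within the top ten, while nDCG@10 additionally rewards placing them near the top of the ranking. This distinction matters for queries whose relevant passages are dispersed across several sources.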
Files in This Item:
| File | Description | Size | Format | Views |
| index.html | | 0Kb | HTML | 20 | View/Open |
All items in NCUIR are protected by the original copyright.