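In digital humanities research on ancient Chinese literature, several studies have explored text alignment techniques that help historians compare different texts, but these studies do not align texts from the perspective of "same meaning." This study therefore introduces the concept of the paraphrase identification task from natural language processing to find passages with the same meaning across different texts, using the Book of the Later Han (後漢書), the Records of the Three Kingdoms (三國志), and the Zizhi Tongjian (資治通鑑) as examples. Adopting state-of-the-art natural language processing techniques for paraphrase identification, however, raises two limitations: (1) insufficient training data and (2) the text length limit of attention-based methods. To address these problems, this study proposes a weakly supervised learning framework that applies two-stage training to paraphrase identification in ancient Chinese literature (SPITAC). The method has two main parts: pseudo-labeled training set generation and two-stage training. In pseudo-labeled training set generation, a rule-based method automatically produces the training dataset, which addresses the shortage of training data. To handle the text length limit, a sentence filter removes unimportant sentences so that the text fits within the maximum length. The two-stage training design helps the classifier better identify hard negative samples, which improves model performance. Experimental results show that the weakly supervised method achieves performance close to that of supervised learning. In the ablation study, the sentence filter and two-stage training effectively improve performance, raising the F1 score by 4.14 and surpassing the baseline model. Finally, this study demonstrates and analyzes the method's results on real texts, and discusses the difficulties of the task and directions for future improvement.

Implementing text alignment on ancient Chinese literature offers significant assistance to academics investigating historical events, particularly because descriptions of the same event may vary across different texts. These variations are valuable research material. However, current studies rarely align text from the perspective of the "same event". To develop a tool that better matches the practical conditions of text alignment in ancient Chinese literature, we build on ideas from prior work. We redefine the notion of "paraphrase" in the Paraphrase Identification task (a Natural Language Processing task that determines whether two texts convey the same meaning) to support text alignment for ancient Chinese literature. This work encounters two primary challenges: 1) the lack of training data and 2) the input length limit of attention-based methods. To address these issues, we propose the Event Alignment Model for Ancient Chinese Literature without Requirement of Manually Labeled Data (EAMAC). In this framework, we use ChatGPT to generate a training set, thereby overcoming the lack of training data. We resolve the text length limitation with a data slicing method that reduces each paragraph to within a maximum length. In addition, we use the GujiBERT model for paraphrase identification.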
Experimental results show that our proposed EAMAC significantly outperforms the baseline and exhibits considerable stability and applicability when applied to other texts.
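
The abstract does not say how the data slicing step is carried out. As a rough illustration only, one plausible way to keep each paragraph within a maximum input length is to cut at sentence-ending punctuation and pack sentences greedily into chunks. The function name `slice_paragraph`, the punctuation set, and the default limit of 510 characters (a common budget for BERT-style models with a 512-token input, assuming roughly one token per character) are all assumptions, not details taken from this work.

```python
import re
from typing import List


def slice_paragraph(text: str, max_len: int = 510) -> List[str]:
    """Split `text` into chunks of at most `max_len` characters.

    Hypothetical sketch: chunks end at sentence-final punctuation where
    possible; a single sentence longer than `max_len` is cut by length.
    """
    # Split right after sentence-ending punctuation, keeping it attached.
    sentences = [s for s in re.split(r"(?<=[。!?;])", text) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit cannot fit in any chunk,
        # so flush the current chunk and cut the sentence by length.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > max_len:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "甲乙。丙丁。戊己。"
    for chunk in slice_paragraph(sample, max_len=6):
        print(len(chunk), chunk)
```

Cutting at sentence boundaries keeps each chunk readable on its own, which matters when chunks from two texts are later compared pairwise. A real pipeline would more likely measure length in tokenizer tokens than in characters.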