NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research project reports available for download): Item 987654321/98152

    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98152


    Title: Sentence Semantic Level Evaluation: Leveraging Word Gloss Disambiguation for Interpretability
    Authors: Liao, Jen-Min (廖振閔)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Word Sense Disambiguation; Gloss-Based Dictionary; Traditional Chinese; Classical Chinese; Dataset Creation; Natural Language Processing; Sentence Simplification; Sentence Difficulty Quantification; Semantic Analysis
    Date: 2025-04-22
    Issue Date: 2025-10-17 12:26:22 (UTC+8)
    Publisher: National Central University
    Abstract: This study develops a fine-grained sentence difficulty scorer based on Word Sense Disambiguation (WSD). The scorer addresses a limitation of the metrics commonly used for simplification tasks, such as BLEU and SARI, which require multiple reference answers, and it makes the evaluation more interpretable. WSD remains an important challenge in natural language processing, especially for low-resource languages. To address the scarcity of non-English WSD datasets, this research automatically creates a gloss-based sense inventory, demonstrating that existing dictionary resources can mitigate data limitations.

    Specifically, we employ Word Gloss Disambiguation (WGD), a technique related to WSD but focused more narrowly on word glosses. Although WGD is not an entirely new concept, previous studies have not clearly distinguished it from WSD. We build WGD models from two Traditional Chinese dictionaries and recast the prompts as multiple-choice questions. With the TAIDE Llama3 8B model, this yields accuracies of 86.9% on Modern Chinese and 80.8% on Classical Chinese; under a GPT-4o API setting, the scores rise to 89.8% and 83.2%, respectively.
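
The multiple-choice framing described above can be illustrated with a minimal Python sketch. The prompt wording, the function names `build_wgd_prompt` and `parse_choice`, and the English example glosses are all hypothetical; the abstract does not give the thesis's actual prompt template or dictionary entries.

```python
from string import ascii_uppercase
from typing import List, Optional


def build_wgd_prompt(sentence: str, target: str, glosses: List[str]) -> str:
    """Format one gloss-disambiguation item as a multiple-choice question.

    Assumption: each candidate gloss of the target word becomes one
    lettered option; the wording below is illustrative only.
    """
    if not 1 <= len(glosses) <= len(ascii_uppercase):
        raise ValueError("need between 1 and 26 candidate glosses")
    options = "\n".join(
        f"{label}. {gloss}" for label, gloss in zip(ascii_uppercase, glosses)
    )
    return (
        f"Sentence: {sentence}\n"
        f"Target word: {target}\n"
        "Which gloss matches the target word in this sentence?\n"
        f"{options}\n"
        "Answer with a single letter."
    )


def parse_choice(reply: str, glosses: List[str]) -> Optional[str]:
    """Map a model reply such as 'B' or 'b.' back to the chosen gloss."""
    letter = reply.strip()[:1].upper()
    index = ascii_uppercase.find(letter) if letter else -1
    return glosses[index] if 0 <= index < len(glosses) else None


# Illustrative data, not taken from the dictionaries used in the thesis.
glosses = ["a financial institution", "the land alongside a river"]
prompt = build_wgd_prompt("She sat on the bank and fished.", "bank", glosses)
```

Returning `None` for an unparseable reply keeps malformed model output from being silently counted as a valid choice when accuracy is computed.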

    Although licensing constraints prevent us from distributing the final dataset, we provide all processing steps and clear instructions for accessing the original dictionaries to ensure reproducibility. The models compute the gloss difficulty of each word in a sentence and, from these values, assess the overall difficulty of the sentence. On the CSS and MCTS simplification datasets, translated using Google and OpenCC, our method's agreement with annotators exceeded 72% on both.
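
As a rough illustration of how per-word gloss difficulties could roll up into a sentence score and an agreement rate, the sketch below makes two assumptions the abstract does not confirm: the sentence score is the arithmetic mean of the word-level difficulty values, and agreement is the fraction of sentence pairs where the scorer and the annotator give the same answer to "is the rewritten sentence simpler?". The function names and sample numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def sentence_difficulty(gloss_levels: Sequence[float]) -> float:
    """Aggregate word-level gloss difficulties into one sentence score.

    Assumption: arithmetic mean; the thesis may aggregate differently.
    """
    if not gloss_levels:
        raise ValueError("sentence has no scored words")
    return mean(gloss_levels)


def agreement_rate(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    annotator_says_simpler: List[bool],
) -> float:
    """Fraction of (original, rewritten) pairs where the scorer's verdict
    matches the annotator's verdict."""
    if not pairs or len(pairs) != len(annotator_says_simpler):
        raise ValueError("pairs and labels must be non-empty and aligned")
    matches = 0
    for (original, rewritten), label in zip(pairs, annotator_says_simpler):
        # The scorer calls the rewrite simpler when its score is lower.
        predicted = sentence_difficulty(rewritten) < sentence_difficulty(original)
        matches += predicted == label
    return matches / len(pairs)


# Made-up difficulty levels for three sentence pairs.
pairs = [([3, 5, 4], [2, 3, 2]), ([2, 2], [2, 4]), ([1, 1], [1, 1])]
labels = [True, True, False]
rate = agreement_rate(pairs, labels)
```

Because each word contributes its own difficulty value, a low or high sentence score can be traced back to the specific glosses that drove it, which is the interpretability benefit the abstract emphasizes.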

    This study demonstrates the potential of WGD for evaluating sentence simplification. Although the method cannot yet assess whether the meaning of a whole sentence is preserved after simplification, its fine-grained analysis of word glosses makes the evaluation of sentence simplification more interpretable.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File        Description  Size  Format
    index.html               0Kb   HTML    46


    All items in NCUIR are protected by copyright, with all rights reserved.
