NCU Institutional Repository - theses, past exam questions, journal articles, and research projects: Item 987654321/98191
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98191


    Title: 大型語言模型於多文本摘要之效能與成本效益分析:多段式架構、提示工程與評估方法之探討;Evaluating Large Language Models for Multi-Document News Summarization: Architecture Design, Prompting Strategies, and Cost-Effectiveness
    Authors: 傅孟淳;Fu, Meng-Chun
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Large Language Models; Multi-document Summarization; Prompt Engineering; Cost-effectiveness; LLM-based Evaluation
    Date: 2025-07-01
    Issue Date: 2025-10-17 12:28:26 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, the number and volume of news reports have grown explosively,
    making it increasingly important to effectively digest multiple news articles and
    generate summaries. Moreover, with the latest large language models (LLMs) such as
    GPT-4o relaxing token limitations, it is worth reconsidering whether traditional
    multi-stage summarization methods still hold an advantage, and whether different prompt
    engineering techniques can further improve summary quality.
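    The single-stage versus multi-stage contrast described above can be sketched as two pipeline shapes. This is a minimal illustration, assuming a placeholder `call_llm` function; the thesis's actual prompts, models, and pipeline details are not given in this record.

```python
# Hypothetical sketch of the two pipeline shapes compared in the study.
# `call_llm` is a stand-in, not a real API; swap in any chat-completion client
# (e.g. one backed by GPT-4o or Llama-3.3-70B).
def call_llm(prompt: str) -> str:
    """Placeholder for a model call; returns a dummy summary string."""
    return f"[summary of {len(prompt)} chars of input]"

def single_stage(articles: list[str]) -> str:
    # One call: concatenate all articles and rely on a large context window.
    joined = "\n\n---\n\n".join(articles)
    return call_llm(f"Summarize these news articles:\n\n{joined}")

def multi_stage(articles: list[str]) -> str:
    # Stage 1: summarize each article separately.
    partials = [call_llm(f"Summarize this article:\n\n{a}") for a in articles]
    # Stage 2: merge the partial summaries into one final summary.
    return call_llm("Combine these summaries into one:\n\n" + "\n".join(partials))
```

    With relaxed token limits, the single-stage path avoids the information loss that stage-1 compression can introduce, which is consistent with the study's finding that single-stage quality is higher.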
    This study utilizes large language models (LLMs) for multi-document news
    summarization and public opinion analysis. By crawling sports news published within
    the past 24 hours, we analyze trending topics and sentiment trends, and ultimately
    generate summaries that incorporate the results of public opinion analysis. We evaluate
    the summarization performance of four models (GPT-4o, Llama-3.3-70B, Mixtral-8x7B,
    and Gemma2-9B) and compare single-stage and multi-stage summarization
    methods. In terms of prompt engineering, we design and test various strategies,
    including Simple prompts, Few-shot learning, Chain-of-Thought (CoT), and Instruct
    prompts, to explore their effects on summary quality.
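    The four prompting strategies named above differ only in how the task text is framed. The builders below are illustrative templates, assuming wording of our own; the exact prompts used in the thesis are not given in this record.

```python
# Hypothetical prompt builders for the four strategies tested (Simple,
# Few-shot, Chain-of-Thought, Instruct). Template wording is illustrative only.
def simple_prompt(articles: str) -> str:
    # Simple: a bare task statement with no extra guidance.
    return f"Summarize the following news articles:\n\n{articles}"

def few_shot_prompt(articles: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: prepend (input, reference summary) demonstrations before the task.
    shots = "\n\n".join(f"Articles:\n{src}\nSummary:\n{ref}" for src, ref in examples)
    return f"{shots}\n\nArticles:\n{articles}\nSummary:"

def cot_prompt(articles: str) -> str:
    # Chain-of-Thought: ask for intermediate reasoning before the final summary.
    return ("Read the articles, first list the key events and sentiments step by step, "
            f"then write a final summary.\n\n{articles}")

def instruct_prompt(articles: str) -> str:
    # Instruct: a role and explicit output constraints.
    return ("You are a news editor. Write a neutral summary covering every "
            f"major topic in the articles below.\n\n{articles}")
```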
    To further improve the cost-effectiveness of summary evaluation, we adopt
    LLM-based automatic evaluation methods and compare them with human expert assessments.
    We also examine whether summarization research remains necessary in the context of
    tools like NotebookLM becoming available.
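    LLM-based evaluation of the kind described above is often set up as an "LLM-as-judge" call that returns a rubric score. A minimal sketch follows, assuming a placeholder `judge_llm` and a 1-5 scale of our own choosing; the thesis's actual judge model and rubric are not given in this record.

```python
# Minimal LLM-as-judge sketch. `judge_llm` is a stand-in that would be
# replaced by a real model call; it returns a fixed rubric line here.
import re

def judge_llm(prompt: str) -> str:
    """Placeholder judge; a real client would send `prompt` to an LLM."""
    return "Coherence: 4/5"

def score_summary(articles: str, summary: str, criterion: str = "Coherence") -> int:
    # Ask the judge to rate one criterion and parse the "N/5" score it returns.
    prompt = (f"Rate the {criterion} of this summary on a 1-5 scale, "
              f"answering as '{criterion}: N/5'.\n\n"
              f"Articles:\n{articles}\n\nSummary:\n{summary}")
    reply = judge_llm(prompt)
    match = re.search(r"(\d)\s*/\s*5", reply)
    return int(match.group(1)) if match else 0
```

    The cost comparison then reduces to judge-model token charges per summary versus an expert's hourly rate, which is what makes the automatic route attractive at scale.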
    The results show that single-stage summarization outperforms multi-stage
    summarization in terms of quality, and Few-shot prompts significantly enhance
    semantic consistency and the preservation of key information. Among the LLMs tested,
    Llama-3.3-70B performs better than GPT-4o in the news summarization task,
    demonstrating that open-source models can compete with commercial ones in specific
    application scenarios. Additionally, NotebookLM’s performance on multi-document
    news summarization is inferior to the summaries generated by our proposed method,
    indicating that in certain tasks, further in-depth research is still required. Furthermore,
    in the application of public opinion analysis, LLMs are effective in identifying news
    topics and analyzing textual sentiment, thereby contributing to higher-quality
    summarization.
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

