NCU Institutional Repository — Item 987654321/98191


    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98191


    Title: Evaluating Large Language Models for Multi-Document News Summarization: Architecture Design, Prompting Strategies, and Cost-Effectiveness
    Author: Fu, Meng-Chun (傅孟淳)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Large Language Models; Multi-document Summarization; Prompt Engineering; Cost-effectiveness; LLM-based Evaluation
    Date: 2025-07-01
    Uploaded: 2025-10-17 12:28:26 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, the number and volume of news reports have grown explosively,
    making it increasingly important to effectively digest multiple news articles and
    generate summaries. Moreover, with the latest large language models (LLMs) such as
    GPT-4o relaxing token limitations, it is worth reconsidering whether traditional multi-
    stage summarization methods still hold an advantage, and whether different prompt
    engineering techniques can further improve summary quality.
    This study utilizes large language models (LLMs) for multi-document news
    summarization and public opinion analysis. By crawling sports news published within
    the past 24 hours, we analyze trending topics and sentiment trends, and ultimately
    generate summaries that incorporate the results of public opinion analysis. We evaluate
    the summarization performance of four models: GPT-4o, Llama-3.3-70B, Mixtral-
    8x7B, and Gemma2-9B, and compare single-stage and multi-stage summarization
    methods. In terms of prompt engineering, we design and test various strategies,
    including Simple prompts, Few-shot learning, Chain-of-Thought (CoT), and Instruct
    prompts, to explore their effects on summary quality.
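The single-stage versus multi-stage comparison described above can be sketched as follows. This is a minimal sketch, not the thesis's implementation: `call_llm` is a hypothetical stand-in for any of the four models' chat APIs, stubbed here so the control flow runs offline.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model call; a real version would query an LLM API
    (GPT-4o, Llama-3.3-70B, Mixtral-8x7B, or Gemma2-9B)."""
    return f"summary({len(prompt)} chars)"

def single_stage_summary(articles: list[str]) -> str:
    # Single stage: concatenate all articles into one prompt, feasible
    # now that models such as GPT-4o accept very long contexts.
    prompt = "Summarize the following news articles:\n\n" + "\n---\n".join(articles)
    return call_llm(prompt)

def multi_stage_summary(articles: list[str]) -> str:
    # Multi stage (map-reduce): summarize each article individually,
    # then summarize the partial summaries into a final summary.
    partials = [call_llm("Summarize this article:\n" + a) for a in articles]
    return call_llm("Combine these partial summaries:\n" + "\n".join(partials))
```

The prompting strategies compared in the study (Simple, Few-shot, CoT, Instruct) would vary only the prompt text passed to `call_llm`, leaving the pipeline structure unchanged.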
    To further improve the cost-effectiveness of summary evaluation, we adopt LLM-
    based automatic evaluation methods and compare them with human expert assessments.
    We also examine whether summarization research remains necessary in the context of
    tools like NotebookLM becoming available.
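The LLM-based automatic evaluation the study compares against human expert assessment follows the common "LLM-as-judge" pattern. The rubric dimensions and the `judge_call` stub below are illustrative assumptions, not the thesis's exact criteria.

```python
import json

def judge_call(prompt: str) -> str:
    """Hypothetical judge-model call; stubbed with a fixed JSON response.
    A real version would send the prompt to an evaluator LLM."""
    return json.dumps({"coherence": 4, "coverage": 5, "faithfulness": 4})

def evaluate_summary(source: str, summary: str) -> dict:
    # Ask the judge model to score the summary on a 1-5 rubric and
    # return machine-readable JSON, which is far cheaper per summary
    # than recruiting human expert annotators.
    prompt = (
        "Rate the summary against the source on a 1-5 scale for "
        "coherence, coverage, and faithfulness. Reply in JSON.\n\n"
        f"SOURCE:\n{source}\n\nSUMMARY:\n{summary}"
    )
    return json.loads(judge_call(prompt))
```

The cost-effectiveness comparison then reduces to API cost per evaluated summary versus expert annotation cost, under whatever rubric is chosen.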
    The results show that single-stage summarization outperforms multi-stage
    summarization in terms of quality, and Few-shot prompts significantly enhance
    semantic consistency and the preservation of key information. Among the LLMs tested,
    Llama-3.3-70B performs better than GPT-4o in the news summarization task,
    demonstrating that open-source models can compete with commercial ones in specific
    application scenarios. Additionally, NotebookLM’s performance on multi-document
    news summarization is inferior to the summaries generated by our proposed method,
    indicating that in certain tasks, further in-depth research is still required. Furthermore,
    in the application of public opinion analysis, LLMs are effective in identifying news
    topics and analyzing textual sentiment, thereby contributing to higher-quality
    summarization.
    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Theses & Dissertations

    Files in this item:

    File: index.html    Size: 0Kb    Format: HTML    Views: 25


    All items in NCUIR are protected by copyright, with all rights reserved.

