Intelligent Generation of News Lead (新聞導言之智能生成)

Name

鍾文翔 (Wen-Xiang Zhong)

Department

Department of Information Management

Thesis Title

新聞導言之智能生成
(Intelligent generation of news lead)

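Related Theses

★ A Study of Domestic Track-and-Field Competition Information Systems: The Case of the 2014 (ROC Year 103) National Intercollegiate Track and Field Open Information System
★ A Study of Gene Microarray Image Analysis for Biochips
★ IPv6 Technology Roadmap and Development Strategy for Taiwan's Information Appliance Industry
★ IPv6 Technology Roadmap and Development Strategy for Taiwan's Third-Generation Mobile Communications Industry
★ Factors Influencing Consumers' Intention to Adopt E-Book Readers
★ Mapping Information Literacy to the Functions of E-Learning Platforms
★ An Indicator Model and Data Analysis of Taiwanese Business Clusters
★ Requirements Elicitation for Futures-Wheel-Assisted Software Development
★ Using Workflow Diagrams to Show How Futures Research Methods Fit the Foresight Research Process
★ An Object-Oriented System Architecture for Fitting Futures Research Methods to Foresight Research
★ Applying TRIZ to Identify Core Factors for Constructing a New E-Commerce Canvas
★ The Effects of Business Strategy, Information Strategy, and Human Resource Management Strategy on Organizational Performance
★ Detecting Buffer Overflows in Program Source Code with Colored Petri Nets
★ Design and Implementation of a Simple and Flexible Software Agent Communication Protocol
★ Key Success Factors of Taiwan's Chinese Herbal Medicine Manufacturing Industry Using the Analytic Hierarchy Process
★ Constructing and Predicting Gene Regulatory Networks from Microarray Data Analysis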

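This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast this work without permission, to avoid legal liability.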

Abstract (Chinese, translated)

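The news lead is a very important part of a news story. Placed at the beginning, it presents the article's key points in the most concise wording possible to entice readers to read the full report. Leads fall into two broad categories: hard news leads and soft news leads. A hard lead usually covers the when, where, what, who, why, and how of the story (5W1H for short) and aims to describe the core of the news as fully as possible in a short space; a soft lead instead relies on novel or suspenseful techniques to catch the reader's interest. In current natural language processing research, however, there are many studies on generating news headlines and news summaries but comparatively few on generating leads automatically.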
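This study builds a framework for automatically generating leads. Starting from an analysis of the writing techniques and elements of leads, it combines TextRank with Word2Vec, sentence position, sentence length, and title overlap rate to identify the key events in a news article and obtain a set of topic sentences. It then applies part-of-speech tagging, named entity recognition, semantic role labeling, and related methods to this sentence set to extract the news 5W1H elements, and finally generates hard news leads and soft news leads separately.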
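The key-event step can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: sentences are assumed to be pre-segmented token lists (for example from Jieba or CKIP), a toy dictionary stands in for a trained Word2Vec model, TextRank edges are the cosine similarities of averaged word vectors, and the blend weights and the exact position, length, and title-overlap formulas are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(tokens, wv, dim):
    """Average the word vectors of in-vocabulary tokens (zero vector if none)."""
    vecs = [wv[t] for t in tokens if t in wv]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def textrank(sim, d=0.85, iters=50):
    """Weighted PageRank over a symmetric similarity matrix with a zero diagonal."""
    n = len(sim)
    out_w = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(sim[j][i] / out_w[j] * scores[j]
                              for j in range(n) if j != i and out_w[j] > 0)
            for i in range(n)
        ]
    return scores

def rank_sentences(sentences, title, wv, dim, weights=(0.4, 0.2, 0.2, 0.2)):
    """Return sentence indices ordered from most to least likely key-event sentence."""
    n = len(sentences)
    vecs = [sentence_vector(s, wv, dim) for s in sentences]
    sim = [[0.0 if i == j else max(0.0, cosine(vecs[i], vecs[j]))
            for j in range(n)] for i in range(n)]
    tr = textrank(sim)
    max_tr = max(tr)                      # always > 0 because of the (1 - d) term
    max_len = max(len(s) for s in sentences) or 1
    title_set = set(title)
    w_tr, w_pos, w_len, w_ov = weights    # hypothetical blend weights
    final = []
    for i, s in enumerate(sentences):
        position = 1.0 - i / n            # earlier sentences score higher
        length = len(s) / max_len         # longer sentences carry more detail
        overlap = len(title_set & set(s)) / len(title_set) if title_set else 0.0
        final.append(w_tr * tr[i] / max_tr + w_pos * position
                     + w_len * length + w_ov * overlap)
    return sorted(range(n), key=lambda i: final[i], reverse=True)

# Toy 2-D "Word2Vec" vectors and a pre-segmented three-sentence article.
wv = {"earthquake": [1.0, 0.0], "quake": [0.9, 0.1], "tainan": [0.8, 0.2],
      "rescue": [0.7, 0.3], "weather": [0.0, 1.0], "sunny": [0.1, 0.9]}
sentences = [["earthquake", "tainan", "rescue"], ["quake", "rescue"], ["weather", "sunny"]]
title = ["earthquake", "tainan"]
print(rank_sentences(sentences, title, wv, dim=2))  # → [0, 1, 2]
```

The off-topic weather sentence ranks last because it is dissimilar to the others and shares no words with the title; in practice the weights would have to be tuned against manually labeled key sentences.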
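For hard news leads, the study extracts the features of seven common hard lead types (narrative, descriptive, quotation, question, commentary, conclusion, and contrast), such as research findings, location descriptions, questions, and quotations, and then combines the 5W1H elements with these lead features to generate a hard news lead. For soft news leads, it uses sentence structures that conceal 5W1H elements to create suspense, and successfully generates soft news leads.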
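A hard lead can then be assembled by filling 5W1H slots into a sentence frame and attaching one extracted lead feature. The sketch below is hypothetical: the English sentence frame, the `NewsEvent` class, and the rule that descriptive, question, and contrast features go before the core sentence while quotation, commentary, and conclusion features go after it are illustrative assumptions, not the templates used in the thesis (which generates Chinese leads).

```python
from dataclasses import dataclass

@dataclass
class NewsEvent:
    """5W1H slots for one key event; an empty string means 'not found'."""
    who: str = ""
    what: str = ""
    when: str = ""
    where: str = ""
    why: str = ""
    how: str = ""

# Hypothetical placement rule for the lead feature of each hard-lead type.
# The narrative type uses no extra feature.
FEATURE_BEFORE = {"descriptive", "question", "contrast"}
FEATURE_AFTER = {"quotation", "commentary", "conclusion"}

def hard_lead(ev: NewsEvent, lead_type: str = "narrative", feature: str = "") -> str:
    """Fill the 5W1H slots into one sentence frame and attach a lead feature."""
    core = " ".join(p for p in (ev.who, ev.what, ev.how) if p)
    if ev.where:
        core += f" in {ev.where}"
    if ev.why:
        core += f" because {ev.why}"
    if ev.when:
        core = f"{ev.when}, {core}"
    sentence = core[:1].upper() + core[1:] + "."
    if feature and lead_type in FEATURE_BEFORE:
        return f"{feature} {sentence}"
    if feature and lead_type in FEATURE_AFTER:
        return f"{sentence} {feature}"
    return sentence

ev = NewsEvent(who="firefighters", what="rescued 12 residents",
               when="On Monday", where="Tainan", how="by helicopter")
print(hard_lead(ev))
# On Monday, firefighters rescued 12 residents by helicopter in Tainan.
print(hard_lead(ev, "question", "Who comes when the roads are gone?"))
```

Keeping the 5W1H core fixed and varying only the attached feature is one simple way to obtain several lead styles from the same extracted event.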
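Using these methods, the study generates both hard and soft news leads, ensuring that the generated leads contain sufficient key news information and that different lead types can be produced according to user needs.
Beyond reducing the manpower and time users need to write leads, the study gives the generated leads a variety of writing styles that can be adjusted to user needs, and the resulting leads help readers grasp the news quickly.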

Abstract (English)

The news lead is a crucial part of a news story. Placed at the beginning of the article, it states the story's key points in the most concise wording possible to draw readers into the full report. Leads are written in many ways but usually convey the story's 5W1H information: when, where, who, what, why, and how. In natural language processing, there is a large body of research on generating headlines and summaries, but little on generating leads automatically.
This research establishes a framework for automatically generating leads. Guided by an analysis of lead writing techniques and elements, it combines TextRank and Word2Vec with sentence position, sentence length, and title overlap rate to identify the key events in a news article and obtain a set of topic sentences. Part-of-speech tagging, named entity recognition, semantic role labeling, and other methods are then applied to these sentences to extract the 5W1H elements, from which hard news leads and soft news leads are generated separately.
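The 5W1H extraction step can be illustrated by mapping semantic role labels onto slots and falling back on named entities when a role is missing. This sketch assumes the SRL and NER output has already been produced by a toolkit such as LTP; the PropBank-style label names (`A0`, `A1`, `TMP`, `LOC`, `MNR`, `PRP`, `CND`), the NER tags, and the role-to-slot table are assumptions for illustration, not the exact mapping used in the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical role-to-slot table using PropBank-style label names.
ROLE_TO_SLOT = {
    "A0": "who",     # agent
    "TMP": "when",   # temporal adjunct
    "LOC": "where",  # locative adjunct
    "MNR": "how",    # manner adjunct
    "PRP": "why",    # purpose/reason adjunct
    "CND": "why",    # condition adjunct, treated as a cause (assumption)
}
# Hypothetical NER tags consulted when a slot has no matching role.
NER_FALLBACK = {"who": ("PERSON", "ORG"), "when": ("DATE", "TIME"), "where": ("LOC", "GPE")}

def extract_5w1h(predicate: str,
                 roles: List[Tuple[str, str]],
                 entities: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map one predicate's SRL arguments (label, text) and NER spans (tag, text) to 5W1H."""
    slots = {k: "" for k in ("who", "what", "when", "where", "why", "how")}
    patient = ""
    for label, text in roles:
        slot = ROLE_TO_SLOT.get(label)
        if slot and not slots[slot]:
            slots[slot] = text           # keep the first argument for each slot
        elif label == "A1" and not patient:
            patient = text
    slots["what"] = f"{predicate} {patient}".strip()  # what = predicate + patient
    for slot, tags in NER_FALLBACK.items():
        if not slots[slot]:
            slots[slot] = next((t for tag, t in entities if tag in tags), "")
    return slots

print(extract_5w1h(
    "rescued",
    [("A0", "firefighters"), ("A1", "12 residents"), ("MNR", "by helicopter")],
    [("DATE", "Monday"), ("GPE", "Tainan")],
))
```

Here "when" and "where" come from the NER fallback because the sentence has no temporal or locative role; "why" stays empty, which is common in short news sentences.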
Features of seven common hard news lead types are extracted, and the 5W1H elements are combined with these lead features to produce a hard news lead. For soft news leads, sentence structures that hide 5W1H elements are used to generate the lead.
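For the soft lead, one way to realize "hiding" 5W1H elements is to swap chosen slots for vague placeholders and close with a teaser question about what was withheld. The placeholder phrases, teaser questions, and `soft_lead` function below are hypothetical; the thesis's actual soft-lead syntax may differ.

```python
# Hypothetical vague stand-ins for hidden slots (missing keys mean "drop the slot").
VAGUE = {"who": "someone", "when": "recently", "where": "an undisclosed place"}
# Hypothetical teaser questions, one per hideable slot.
TEASER = {
    "who": "Who was it?",
    "when": "When did it happen?",
    "where": "Where did it happen?",
    "how": "How was it done?",
    "why": "What was behind it?",
}

def soft_lead(slots, hide=("who",)):
    """Hide the chosen 5W1H slots behind vague wording and end on a teaser question."""
    s = {k: (VAGUE.get(k, "") if k in hide else v) for k, v in slots.items()}
    body = " ".join(p for p in (s.get("who"), s.get("what"), s.get("how")) if p)
    if s.get("where"):
        body += f" in {s['where']}"
    if s.get("why"):
        body += f" because {s['why']}"
    if s.get("when"):
        body = f"{s['when']}, {body}"
    sentence = body[:1].upper() + body[1:] + "."
    teasers = [TEASER[k] for k in hide if k in TEASER]
    return " ".join([sentence] + teasers[:1])   # ask about the first hidden slot only

slots = {"who": "firefighters", "what": "rescued 12 residents", "when": "On Monday",
         "where": "Tainan", "why": "", "how": "by helicopter"}
print(soft_lead(slots))
# On Monday, someone rescued 12 residents by helicopter in Tainan. Who was it?
```

Choosing which slot to hide is the main stylistic lever: hiding "who" builds suspense around an actor, while hiding "how" or "why" shifts the reader's curiosity to the cause or method.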
Using these methods, this research produces both hard and soft news leads, ensuring that the generated leads contain enough key news information and that different lead types can be generated according to users' needs.
This research not only reduces the manpower and time users need to write leads but also gives the generated leads a variety of writing styles that can be adjusted to users' needs. The generated leads also let readers grasp the news quickly.

Keywords (Chinese)

★ News lead (新聞導言)
★ Event extraction (事件提取)
★ 5w1h
★ TextRank
★ Word2Vec

Keywords (English)

★ News introduction
★ Event extraction
★ 5w1h
★ TextRank
★ Word2Vec

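Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2: Literature Review
2.1 News Leads
2.2 Chinese Word Segmentation Systems
2.3 Automatic Summarization
2.4 Event Extraction
2.5 Summary Evaluation Methods
Chapter 3: Research Methodology
3.1 System Architecture
3.2 Distinguishing Lead Types
3.3 Data Preprocessing
3.4 Key Event Identification
3.5 Extracting News 5W1H Elements and Lead Features
3.6 Generating News Leads
3.7 News Lead Evaluation
3.8 Summary
Chapter 4: Results
4.1 Key Sentence Extraction Results
4.2 News 5W1H Element Extraction Results
4.3 Lead Feature Extraction Results
4.4 Analysis of Generated News Leads
4.5 Lead Evaluation
Chapter 5: Conclusion
5.1 Conclusions
5.2 Limitations and Future Directions
References
Appendix 1: List of Semantic Role Labels
Appendix 2: List of Dependency Parsing Relations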

Advisor

薛義誠 (Yih-Chearng Shiue)

Approval Date

2020-08-24
