dc.description.abstract | The news lead is an important part of a news article. It appears at the beginning of the article and states the key content in the most concise text possible, to attract readers to read the entire report. Leads can be written in many ways, but they usually contain the 5W1H information of the news: when, where, who, what, why, and how. In natural language processing, there is a lot of research on generating headlines and summaries, but little research on automatic lead generation.
This research establishes a framework for automatically generating leads. After analyzing the writing techniques and elements of leads, the framework uses TextRank and Word2Vec together with sentence position, sentence length, and title overlap rate to identify the key events in a news article and obtain a set of topic sentences. POS tagging, named entity tagging, semantic role labeling, and other methods are then applied to these sentences to extract the 5W1H elements of the news, which are used to generate hard news leads and soft news leads respectively.
Seven common hard news lead types are derived from existing hard news leads, and the 5W1H elements are combined with the features of each lead type to produce a hard news lead. For soft news, the lead is generated using sentence patterns that conceal the 5W1H elements.
Using these methods, this research generates both hard news leads and soft news leads, ensuring that each generated lead contains enough key news information and that different types of lead can be generated according to users' needs.
This research not only reduces the manpower and time users need to write leads, but also gives the generated leads a variety of writing styles that can be changed according to users' needs. The generated leads also allow readers to understand the news quickly. | en_US |