Name: 余東霖 (Tung-lin Yu)   Department: Information Management
Thesis title: Two-phase Classification Approach for Identifying News Category
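Abstract (Chinese): Many past studies have addressed the problem of determining news categories, but they focused only on the technical side, that is, on building more efficient or more accurate algorithms on top of existing algorithmic frameworks. They ignored a human perspective on news classification, namely imitating the process that news workers actually follow when they classify news. This research therefore develops an algorithm that imitates the process experts follow when classifying news. After interviewing news workers, we found that their classification process can roughly be divided into two steps. First, they quickly skim the article, looking for representative keywords that help them classify it. Second, if the keywords they find do not help them decide, or the keywords are not representative enough of a news category, they examine the whole content of the news in detail.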
Abstract (English): The news classification problem is concerned with assigning the correct category to unclassified news. Although a large number of past studies have addressed this problem, a common weakness of these studies is that their classification algorithms were usually designed from a technical perspective and seldom considered how experts actually classify news in practice. In this research, we first observe how media workers classify news in their daily operations, and we find that their classification process mainly consists of the following operations. (1) If some important keywords or phrases are present in the news, they directly assign the news to the corresponding category. (2) Otherwise, they examine the whole content of the news in detail to determine which category it belongs to. (3) Since a news category may contain several independent but related subcategories, the news is usually classified by assigning it to the most appropriate subcategory, which in turn determines its category.
  By imitating the above working process, we propose a news classification algorithm. In the learning phase, we use associative classification rules to find representative keywords for each category. In addition, we generate a number of subcategories by clustering the news under each category. In the classification phase, we assign an unclassified news item to the category given by the associative classification rules if the rule's confidence is high enough. Otherwise, we determine the category by measuring the similarity between the unclassified news and the subcategories. The experimental comparison shows that our approach achieves better and more stable classification performance than traditional algorithms.
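The two-phase procedure described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's implementation: rules are restricted to single-keyword antecedents (keyword → category) filtered by support and confidence, subcategories come from a plain k-means over term-frequency vectors with cosine similarity, and the function names `learn` and `classify`, the parameters `min_sup`, `min_conf`, `k` and `threshold`, and their default values are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: rules have single-keyword antecedents (keyword -> category),
# and subcategories come from plain k-means on term-frequency vectors.


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iters=10):
    """Cluster one category's documents into at most k subcategory centroids."""
    k = min(k, len(vectors))
    centers = [dict(v) for v in vectors[:k]]  # deterministic init: first k documents
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        # keep the old center if a cluster ends up empty
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def learn(docs, min_sup=0.1, min_conf=0.6, k=2):
    """Learning phase. docs is a list of (tokens, category) pairs."""
    n = len(docs)
    kw_count, kw_cat_count = Counter(), Counter()
    for tokens, cat in docs:
        for t in set(tokens):
            kw_count[t] += 1
            kw_cat_count[(t, cat)] += 1
    # Rules keyword -> category, kept if support and confidence pass the thresholds.
    rules = {}
    for (t, cat), c in kw_cat_count.items():
        sup, conf = c / n, c / kw_count[t]
        if sup >= min_sup and conf >= min_conf and (t not in rules or conf > rules[t][1]):
            rules[t] = (cat, conf)
    # Subcategories: cluster the documents of each category separately.
    by_cat = defaultdict(list)
    for tokens, cat in docs:
        by_cat[cat].append(Counter(tokens))
    subcats = [(cat, c) for cat, vecs in by_cat.items() for c in kmeans(vecs, k)]
    return rules, subcats


def classify(tokens, rules, subcats, threshold=0.8):
    """Classification phase: confident rule first, otherwise most similar subcategory."""
    matched = [rules[t] for t in sorted(set(tokens)) if t in rules]
    if matched:
        cat, conf = max(matched, key=lambda r: r[1])
        if conf >= threshold:
            return cat, "rule"
    vec = Counter(tokens)
    cat, _ = max(subcats, key=lambda s: cosine(vec, s[1]))
    return cat, "similarity"


docs = [
    (["nba", "game", "score"], "sports"),
    (["baseball", "game", "pitcher"], "sports"),
    (["stock", "market", "index"], "finance"),
    (["bank", "rate", "market"], "finance"),
]
rules, subcats = learn(docs, min_sup=0.4, min_conf=0.6, k=2)
print(classify(["market", "crash"], rules, subcats))       # ('finance', 'rule')
print(classify(["team", "score", "win"], rules, subcats))  # ('sports', 'similarity')
```

With `min_sup=0.4` on this toy data, only `game` and `market` become rules. The second query contains neither, so it falls back to subcategory similarity, which corresponds to the experts' second step of reading the whole article.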
Keywords: ★ Text Mining
★ News Classification
★ Clustering
★ Associative Classification Rule
Contents: Abstract
Abstract (Chinese)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 The idea of our approach
  1.4 Objective
Chapter 2 Literature Review
  2.1 News (document) classification
  2.2 Text preprocessing
  2.3 Algorithm selection
  2.4 Our work
Chapter 3 Algorithm
  3.1 Sketch of the proposed approach
  3.2 Symbol definition
  3.3 Batch process
  3.4 Online process
Chapter 4 Performance Evaluation
  4.1 Data collections
  4.2 Measurements
  4.3 Control variables optimization
  4.4 Experimental comparison
Chapter 5 Conclusion
  5.1 Contribution
  5.2 Future work
References
Advisor: 陳彥良 (Yen-liang Chen)   Approval date: 2010-7-14
