An automatic approach for finding keywords to classify opposite concepts

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81136

Title:	An automatic approach for finding keywords to classify opposite concepts
Authors:	楊景都;Yang, Ching-Tu
Contributors:	Department of Information Management
Keywords:	Information Retrieval;Opinion Mining;Sentiment Analysis;Seed Set
Date:	2019-06-19
Issue Date:	2019-09-03 15:36:31 (UTC+8)
Publisher:	National Central University
Abstract:	With advances in technology and the growth of data volume, opinion mining has become one of the most popular tasks in information retrieval in recent years. To understand the semantic orientation of a text, we must first analyze the semantic orientation of each word in it. Previous studies describe several methods for word-level opinion mining, including the corpus-based method, the dictionary-based method, and the seed set method. This study focuses on the seed set method. A seed set is a list of words labeled as positive or negative, and it is the starting point for much further research and application in opinion mining. Traditional seed sets have a drawback: most studies obtain them either by manual selection or by reusing data from earlier work. If a user wants to explore a different pair of opposing concepts, manual selection again costs time, labor, and money, and it is hard to find an existing seed set that matches the concepts in a short time. To address these problems, this study first uses the lexical database WordNet to extract words semantically related to the opposing concepts. It then applies the word-embedding tool word2vec to keep the words with higher semantic similarity to the concepts, and finally uses Google Search to retain the more commonly used words as the seed set. The contribution of this research is an automated approach for selecting a seed set for any pair of semantically opposite concepts. The approach improves the efficiency of seed set selection and is not limited to the traditional positive and negative concepts; it builds a seed set for whatever opposing concepts the user specifies. Besides using the selected seed set to determine the semantic orientation of words, users can apply it to further opinion mining applications.
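The three stages named in the abstract (candidate extraction, similarity filtering, popularity filtering) can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's implementation: the dictionaries RELATED, VECTORS, and HITS are hypothetical toy stand-ins for WordNet, word2vec, and Google Search result counts, and the thresholds min_similarity and min_hits are invented for the example, since the record does not state the actual criteria.

```python
import math

# Hypothetical stand-in for WordNet: concept -> semantically related words.
RELATED = {
    "hot": ["warm", "scorching", "torrid", "spicy"],
    "cold": ["chilly", "frigid", "gelid", "aloof"],
}

# Hypothetical 2-D vectors standing in for word2vec embeddings.
VECTORS = {
    "hot": (1.0, 0.1), "cold": (-1.0, 0.1),
    "warm": (0.9, 0.2), "scorching": (0.95, 0.05),
    "torrid": (0.8, 0.3), "spicy": (0.1, 1.0),
    "chilly": (-0.9, 0.2), "frigid": (-0.95, 0.1),
    "gelid": (-0.85, 0.2), "aloof": (-0.1, 1.0),
}

# Hypothetical usage counts standing in for search-engine hit counts.
HITS = {
    "warm": 900, "scorching": 120, "torrid": 8, "spicy": 700,
    "chilly": 300, "frigid": 150, "gelid": 3, "aloof": 90,
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_seed_set(concept, min_similarity=0.8, min_hits=100):
    """Run the three stages for one concept and return its seed words."""
    # Stage 1: collect candidate words related to the concept.
    candidates = RELATED.get(concept, [])
    # Stage 2: keep candidates whose vector is close to the concept's vector.
    similar = [
        w for w in candidates
        if w in VECTORS
        and cosine(VECTORS[concept], VECTORS[w]) >= min_similarity
    ]
    # Stage 3: keep only words that are used often enough.
    return [w for w in similar if HITS.get(w, 0) >= min_hits]


def build_opposite_seed_sets(concept_a, concept_b, **thresholds):
    """Build one seed set per side of a pair of opposing concepts."""
    return {
        concept_a: build_seed_set(concept_a, **thresholds),
        concept_b: build_seed_set(concept_b, **thresholds),
    }


print(build_opposite_seed_sets("hot", "cold"))
```

With this toy data, "spicy" and "aloof" are dropped at the similarity stage and "torrid" and "gelid" at the popularity stage. In a real setting each dictionary lookup would be replaced by a call to the corresponding resource.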
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	110	View/Open
