An automatic approach for finding keywords to classify opposite concepts

Name

楊景都 (Ching-Tu Yang)

Department

Department of Information Management

Thesis title

An automatic approach for finding keywords to classify opposite concepts

Files

Full text available in the system from 2025-6-1 onward

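Abstract (translated from the Chinese)

With advances in technology and the growth of data, opinion mining has become one of the most popular tasks in information retrieval in recent years. In opinion mining, understanding the semantic orientation of a text first requires analyzing the semantic orientation of each word in it. Prior work offers several approaches to word-level opinion mining, including corpus-based methods, dictionary-based methods, and seed set methods. This study focuses on the seed set method. A seed set is a list of words labeled as positive or negative, and it is the starting point from which researchers develop further opinion mining research and applications. The weakness of the traditional approach is that seed sets have mostly been obtained by manual selection or by reusing data from earlier studies. If a user wants to study a different pair of opposing concepts, manual selection again costs time, labor, and money, while reusing earlier work rarely yields a seed set that matches the concepts within a short time. To address these problems, this study uses the lexical database WordNet to extract words semantically related to each concept, applies word2vec word vectors to keep the words most semantically similar to the concept, and finally uses Google Search to retain the more widely used words as the seed set. The contribution of this study is an automated method for selecting a seed set for any pair of semantically opposing concepts. The method improves the efficiency of seed set selection and is not restricted to the traditional positive/negative pair; seed sets can be built for whatever opposing concepts the user specifies. Besides using the selected seed set to judge the semantic orientation of words, users can also apply it to further opinion mining applications.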
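The last use mentioned above, judging a word's semantic orientation from a seed set, can be illustrated with pointwise mutual information (PMI), which the thesis's table of contents lists under 2.3.3. The following is only a minimal sketch of a PMI-based association score, not necessarily the classification procedure the thesis uses. The function names, the counts, and the rule of scoring unseen pairs as 0 are all assumptions made for illustration.

```python
import math

def pmi(joint, count_a, count_b, total):
    """PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), estimated from counts."""
    if joint == 0 or count_a == 0 or count_b == 0:
        return 0.0  # assumption: treat unseen pairs as neutral
    return math.log2((joint * total) / (count_a * count_b))

def orientation(word, seeds_a, seeds_b, counts, cooc, total):
    """Positive score leans toward concept A, negative toward concept B."""
    def association(seeds):
        return sum(
            pmi(cooc.get((word, s), 0), counts[word], counts[s], total)
            for s in seeds
        )
    return association(seeds_a) - association(seeds_b)

# Made-up document counts, for illustration only.
total_docs = 1000
counts = {"leash": 40, "puppy": 100, "kitten": 100}
cooc = {("leash", "puppy"): 20, ("leash", "kitten"): 2}

# Hypothetical one-word seed sets for the opposing concepts "dog" and "cat".
score = orientation("leash", ["puppy"], ["kitten"], counts, cooc, total_docs)
label = "dog" if score > 0 else "cat"
print(label, round(score, 2))
```

With these toy counts, "leash" co-occurs with the "dog" seed more often than chance and with the "cat" seed less often than chance, so the score is positive and the word is assigned to "dog".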

Abstract (English)

With the development of technology and the growth of data volume, opinion mining has become one of the most popular tasks in the field of information retrieval in recent years. In opinion mining, if we want to understand the semantic orientation of a text, we must first analyze the semantic orientation of each word in the text. According to past studies, there are many methods for word-level opinion mining, such as the corpus method, the dictionary method, and the seed set method. This study focuses on the seed set method. A seed set is a list of words with positive and negative labels. With a seed set, scholars can carry out further research on and applications of opinion mining. Traditional ways of constructing seed sets have a shortcoming: most studies have obtained seed sets by manual selection or by citing past research data. If users want to explore other opposing concepts, it is hard for them to re-establish a seed set or find the right resources in a short time. To solve this problem, this study first uses the lexical database WordNet to extract words related to the opposing concepts. We then use the word-vector tool word2vec to keep the words with higher similarity to the concepts. Finally, we use Google Search to retain the more popular words as the seed set of the opposing concepts. The contribution of this research is an automated approach for selecting a seed set for any pair of semantically opposite concepts. Our approach not only boosts the efficiency of seed set selection but also is not limited to the traditional opposite concepts (positive and negative).
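The three-stage selection described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the thesis's implementation: the candidate list stands in for words extracted from WordNet, the two-dimensional vectors stand in for word2vec embeddings, and the hit counts stand in for Google Search result counts. The function name `select_seed_set`, the `min_similarity` threshold, and the `top_k` cutoff are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_seed_set(concept, candidates, vectors, hit_counts,
                    min_similarity=0.5, top_k=2):
    # Stage 1 (assumed done): `candidates` are words related to `concept`,
    # e.g. gathered from a lexical database such as WordNet.
    # Stage 2: keep candidates whose vector is close to the concept vector.
    similar = [
        w for w in candidates
        if w in vectors
        and cosine(vectors[concept], vectors[w]) >= min_similarity
    ]
    # Stage 3: prefer widely used words, here ranked by a popularity count.
    ranked = sorted(similar, key=lambda w: hit_counts.get(w, 0), reverse=True)
    return ranked[:top_k]

# Toy stand-ins for WordNet candidates, word2vec vectors, and search hits.
candidates = ["puppy", "hound", "canine", "frank"]
vectors = {
    "dog": [1.0, 0.0],
    "puppy": [0.9, 0.1],
    "hound": [0.8, 0.3],
    "canine": [0.7, 0.2],
    "frank": [0.0, 1.0],
}
hit_counts = {"puppy": 900, "hound": 300, "canine": 500, "frank": 1000}

seeds = select_seed_set("dog", candidates, vectors, hit_counts)
print(seeds)
```

In this toy data, "frank" is the most popular word but is dropped at the similarity stage, which shows why the popularity filter is applied only after the semantic filter.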

Keywords

★ Information Retrieval
★ Opinion Mining
★ Sentiment Analysis
★ Seed Set

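Table of contents

Abstract (Chinese)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2 Literature Review
2.1 Word-Level Opinion Mining Methods
2.1.1 Corpus-Based Methods
2.1.2 Dictionary-Based Methods
2.1.3 Seed Set Methods
2.2 Approaches to Building Seed Sets
2.3 Theoretical Foundations
2.3.1 WordNet
2.3.2 word2vec
2.3.3 PMI (pointwise mutual information)
Chapter 3 Methodology
3.1 Research Framework
3.2 WordNet
3.2.1 Method Description
3.2.2 Example
3.3 word2vec
3.3.1 Method Description
3.3.2 Example
3.4 Google Search
3.4.1 Method Description
3.4.2 Example
3.5 Word Classification
3.5.1 Method Description
3.5.2 Example
Chapter 4 Experiments
4.1 Experimental Design
4.2 Experimental Details
4.2.1 Opposing Concepts Examined in Prior Research
4.2.2 Opposing Concepts Not Examined in Prior Research
4.3 Experimental Results
Chapter 5 Conclusion
5.1 Findings
5.2 Limitations and Future Work
References
Appendix 1: Positive Word List
Appendix 2: Negative Word List
Appendix 3: "Dog" Word List
Appendix 4: "Cat" Word List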

Advisor

陳彥良 (Yen-Liang Chen)

Approval date

2019-6-19
