應用歌手辨識及角色標注於輿情意見目標分析之研究;Singer Recognition and Semantic Role Labeling for Opinion Target Extraction from Social Network


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81058

Title:	應用歌手辨識及角色標注於輿情意見目標分析之研究;Singer Recognition and Semantic Role Labeling for Opinion Target Extraction from Social Network
Authors:	黎桂如;Li, Gui-Ru
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Deep Learning;Named Entity Recognition;Semantic Role Labeling;Opinion Target Detection
Date:	2019-06-28
Issue Date:	2019-09-03 15:31:45 (UTC+8)
Publisher:	National Central University
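Abstract:	Network volume (online buzz) detection is a commonly used technique in market research; a common approach treats the number of times something is mentioned as an indicator of its popularity. However, it is questionable whether mention counts alone suffice as a measure of network volume: the true opinion target of a clause may not be the person mentioned. This thesis therefore aims to identify opinion targets in social network data. Because the colloquial writing found on social networks does not follow formal conventions, extracting opinion targets is challenging for models. To address these problems, this study uses deep learning architectures for Chinese singer name recognition (NER) and semantic role labeling (SRL), and applies custom rules to detect the opinion target of each clause (opinion target detection, OTD). We use a deep learning model for singer recognition and compare the effect of Word2Vec character embeddings and BERT embeddings on performance. For the SRL task, we follow Zhou et al. [38] in using additional features and adopt the highway network architecture of Zhang et al. [37] to build and train the model, with the aim of improving performance. Finally, for the OTD task, we design custom rules that merge the NER and SRL results to detect opinion targets. The training data consists of posts collected from social networking sites with a custom crawler; the test data consists of posts randomly selected from social networking sites in the same way and serves as the benchmark for evaluating the models. Experimental results show that our singer recognition model reaches an F1 of 44% when extracting previously unseen singers, the SRL model reaches an F1 of 71% when identifying the semantic roles in a clause, and the OTD task reaches a precision of 73%.;Social networks are a good resource for collecting public opinion given the diversity and variety of their content, especially user-generated content (UGC). UGC is any type of content created by users, such as pictures, videos, text, and comments. Opinions extracted from UGC can inform commercial policy, so extracting them correctly is an important problem. A common method is to treat the number of times an entity is mentioned as an important indicator of network volume. This raises two questions: Are the opinions really about the target entities? And is the number of opinions sufficient for network volume analysis? UGC has two notable characteristics: entities are written in varied forms, and sentences are fragmentary. The former means that entities may appear as nicknames or contain punctuation, which can lower NER performance. The latter means that users omit words from their sentences, which can lower SRL performance. These NER and SRL problems in turn lower the performance of opinion target detection. A major challenge is therefore how to recognize entities and semantic roles in large UGC corpora. In this study, we combine Named Entity Recognition (NER) and Semantic Role Labeling (SRL) to perform opinion target detection (OTD) on UGC.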
In the NER task, we compare the performance of CRF++ and neural network models. In the SRL task, we use highway connections and additional features to improve performance. Finally, we design rules that combine the NER and SRL results for the OTD task. The results show that our NER model achieves 44% F1 on out-of-vocabulary entity extraction. On the SRL and OTD tasks, we achieve 71% F1 and 73% precision, respectively.
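The abstract does not spell out the rules that combine the NER and SRL outputs, so the following is only a minimal sketch of what such a rule could look like, not the thesis's actual method. The `Span` type, the `ROLE_PRIORITY` order, the `detect_opinion_targets` function, and the example clause with its offsets and labels are all hypothetical. It illustrates the abstract's point that an entity can be mentioned without being the opinion target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the clause
    end: int    # exclusive character offset into the clause
    label: str  # e.g. "SINGER" for NER, "A0"/"A1" for SRL


def overlaps(a: Span, b: Span) -> bool:
    """True when the two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


# ASSUMPTION: illustrative role order, not taken from the thesis.
ROLE_PRIORITY = ("A1", "A0")


def detect_opinion_targets(clause, entities, roles):
    """Return the entities overlapping the first prioritized role that has any."""
    for role_label in ROLE_PRIORITY:
        hits = [
            clause[e.start:e.end]
            for e in entities
            if any(r.label == role_label and overlaps(e, r) for r in roles)
        ]
        if hits:
            return hits
    return []  # entities may be mentioned, but none fills a prioritized role


# Hypothetical NER and SRL output for one clause.
clause = "I love Jay Chou more than JJ Lin"
entities = [Span(7, 15, "SINGER"), Span(26, 32, "SINGER")]
roles = [Span(0, 1, "A0"), Span(7, 15, "A1")]
targets = detect_opinion_targets(clause, entities, roles)
print(targets)
```

In this made-up clause both singers would each add one to a mention count, but only the entity that fills the prioritized role is returned as the target.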
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

