Abstract: | Online reviews strongly influence both consumers' purchasing behavior and sellers' sales strategies, and positive reviews in particular encourage purchases. To boost sales, many sellers therefore hire writers to produce fake positive reviews that promote their products and mislead consumers. In current research on distinguishing truthful from fake reviews, models that rely on linguistic category features extracted from one specific set of reviews can perform well on that set, yet their accuracy drops sharply when they are tested on a different dataset. Related work has gradually moved from detecting fake reviews within a single domain to detecting them across domains, for example Li, Ott, Cardie, and Hovy (2014), Ren and Ji (2017), and W. Liu, Jing, and Li (2019). Whether the detection model is built from linguistic features of the reviews or from combined approaches such as neural networks, these studies all face a loss of accuracy, and none of them clearly explains why particular words can be used for cross-domain prediction. This thesis uses the truthful and fake reviews collected by Ott et al. (2011) and Li, Ott, Cardie, and Hovy (2014) in three domains (hotel, restaurant, doctor). Drawing on psychological theory, we take the Stimulus-Organism-Response (S-O-R) framework as a basis and combine it with LIWC (Linguistic Inquiry and Word Count) to build a classification model that can be used across domains, and we add frequent-feature extraction from word2vec word vectors to counter the sharp drop in cross-domain accuracy reported in earlier work. In the experiments, method one, which feeds S-O-R and review feature weights into the classification algorithms, reaches an accuracy of 63.6% with its best-performing classifier, a DNN. Method two, which feeds frequent features of the word vectors into the classification algorithms, reaches an accuracy of 73.75% with its best-performing classifier, a random forest. |
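As an illustration only, the sketch below shows the general idea behind LIWC-style features (the share of a review's words that fall into each dictionary category) and a cross-domain split in which a model would be trained on one domain and tested on another. It is not the thesis's implementation: the lexicon `TOY_LEXICON`, its category names, and the helpers `category_proportions` and `cross_domain_split` are hypothetical, and the real LIWC dictionary is proprietary and far larger.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration; not the real LIWC dictionary.
TOY_LEXICON = {
    "posemo": {"great", "love", "wonderful", "amazing"},
    "negemo": {"bad", "terrible", "awful", "rude"},
    "i": {"i", "me", "my", "mine"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens belonging to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for cat, words in lexicon.items():
            if tok in words:
                counts[cat] += 1
    # An empty review gets 0.0 for every category instead of dividing by zero.
    return {cat: (counts[cat] / total if total else 0.0) for cat in lexicon}


def cross_domain_split(reviews, train_domain, test_domain):
    """Split reviews so training and testing use different domains."""
    train = [r for r in reviews if r["domain"] == train_domain]
    test = [r for r in reviews if r["domain"] == test_domain]
    return train, test


if __name__ == "__main__":
    # Made-up reviews, used only to exercise the helpers.
    reviews = [
        {"domain": "hotel", "text": "I love my room", "label": "fake"},
        {"domain": "restaurant", "text": "The waiter was rude", "label": "truthful"},
    ]
    train, test = cross_domain_split(reviews, "hotel", "restaurant")
    for r in train + test:
        print(r["domain"], category_proportions(r["text"]))
```

Category proportions of this kind are length-normalized, so they can be compared between short and long reviews; whether they transfer between domains is the question the thesis studies.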