A Study on Distinguishing Genuine and Fake Reviews Across Domains: A BERT-Based Model

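Name

陳莉茿 (LI-JU CHEN)

Department

Department of Business Administration

Thesis Title

A Study on Distinguishing Genuine and Fake Reviews Across Domains: A BERT-Based Model
(Identify Deceptive Reviews in Cross-domain Content with BERT)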

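This electronic thesis is authorized for immediate open access.
Open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.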

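Abstract (translated from Chinese)

Online reviews carry significant influence in e-commerce, and consumers increasingly rely on them to make purchasing decisions. However, unethical businesses may spread fake reviews to manipulate consumer opinion, and the experiments of Ott et al. (2011) [19] showed that humans identify fake reviews with an accuracy of only 57.3%. Moreover, for cross-domain classification of genuine and fake reviews, research on textual features and rules shared across domains is still lacking. Because models depend too heavily on data from the same source, the accuracy of a given model drops sharply when it is tested on other datasets.
This study therefore proposes a model based on Bidirectional Encoder Representations from Transformers (BERT) that replaces the domain-specific words appearing in a review with [MASK], in order to overcome the large stylistic differences between reviews from different domains. The study uses reviews from the restaurant, hotel, and doctor domains collected by Ott et al. (2011) [19] and Li et al. (2014) [33], and additionally includes genuine Yelp reviews as training data. Finally, the experimental results of MASK-BERT are compared with those of Ren & Ji (2017) [25], the best results reported to date. In the cross-domain setting, the best F1-score is 88.49%; for the doctor domain, whose content differs most from the others, Accuracy also improves by 15-20% once the proposed masking mechanism is applied.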
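The Accuracy and F1-score figures quoted above are standard binary-classification metrics. As a minimal sketch of how such numbers are typically computed, assuming the label 1 stands for a deceptive review and 0 for a genuine one (the abstract does not state the label encoding or the averaging scheme the thesis uses, and the function name `accuracy_and_f1` is hypothetical):

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Assumption: `positive` marks the deceptive class; the thesis's own
    label encoding is not given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Accuracy: share of reviews whose predicted label matches the true label.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # F1: harmonic mean of precision and recall on the positive class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy example: four reviews, one deceptive review is missed.
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy alone can look good on imbalanced data, which is one common reason to report F1 alongside it.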

Abstract (English)

Online reviews play a significant role in e-commerce. Consumers increasingly rely on them when making purchasing decisions. However, unethical businesses may spread deceptive reviews to manipulate consumers' opinions. Research by Ott et al. (2011) [19] showed that humans can identify fraudulent reviews with an accuracy of only 57.3%. In addition, recent research faces a crucial challenge: cross-domain classification models rely too heavily on similar datasets from the same domain, which causes a sharp decline in accuracy when they are tested on datasets from a different domain. Currently, there is a lack of research on text features or rules that can be shared across domains.
Hence, our study proposes a model based on Bidirectional Encoder Representations from Transformers (BERT). We suggest replacing domain-specific words in reviews with [MASK] to overcome the significant stylistic differences between reviews from different domains. Our research uses reviews from Ott et al. (2011) [19] and Li et al. (2014) [33] in the restaurant, hotel, and doctor domains, supplemented with genuine Yelp reviews as training data. Finally, we compare the results of MASK-BERT with the state-of-the-art approach of Ren & Ji (2017) [25]. In the cross-domain setting, particularly in the doctor domain, where the content differs most from the other domains, our proposed masking mechanism improves accuracy by up to 15-20%.
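The masking idea described in the abstract can be illustrated with a minimal sketch. It assumes a hand-written list of domain-specific words per domain; the `DOMAIN_WORDS` table and the `mask_domain_words` helper are hypothetical, and the abstract does not say how the thesis actually selects the words to mask.

```python
import re

MASK_TOKEN = "[MASK]"

# Hypothetical domain vocabularies, for illustration only. The thesis's
# real procedure for choosing domain-specific words is not described here.
DOMAIN_WORDS = {
    "hotel": {"hotel", "room", "lobby", "concierge"},
    "restaurant": {"restaurant", "waiter", "menu", "dish"},
    "doctor": {"doctor", "clinic", "nurse", "appointment"},
}


def mask_domain_words(review: str, domain: str) -> str:
    """Replace each domain-specific word in `review` with [MASK]."""
    vocab = DOMAIN_WORDS[domain]
    # Split into alternating word / non-word chunks so that spacing and
    # punctuation are preserved when the chunks are joined back together.
    chunks = re.findall(r"\w+|\W+", review)
    return "".join(MASK_TOKEN if c.lower() in vocab else c for c in chunks)


print(mask_domain_words("The hotel room was clean and the concierge was kind.", "hotel"))
# prints: The [MASK] [MASK] was clean and the [MASK] was kind.
```

After masking, a hotel review and a doctor review share more of their surface vocabulary, which is the intuition behind reducing the domain gap. The masked text would then go to a BERT tokenizer for fine-tuning; that step is omitted here because it depends on the chosen library and checkpoint.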

Keywords (translated from Chinese)

★ cross-domain
★ BERT
★ fake reviews
★ deception detection
★ masked information

Keywords (English)

★ cross-domain
★ BERT
★ fraud reviews
★ deception detection
★ masking information

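Table of Contents

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-1-1 Influence of Online Reviews
1-1-2 Sources of Fake Reviews
1-1-3 Applying Models to Genuine/Fake Review Classification
1-2 Research Motivation
1-2-1 Labeling of Fake Reviews
1-2-2 Results of Prior Research
1-3 Research Objectives
1-4 Research Structure
Chapter 2 Literature Review
2-1 BERT Applied to Cross-domain Genuine/Fake Review Classification
2-2 Definition of Cross-domain
2-3 Review of Literature on Algorithms Applied to Cross-domain Genuine/Fake Review Classification
Chapter 3 Research Method
3-1 Research Process
3-2 BERT
3-3 MASK mechanism
3-4 Fine-tuning
3-4-1 AE-BERT (Auto-encoder based on BERT)
3-4-2 MASK-BERT (MASK mechanism based on BERT)
Chapter 4 Experiments
4-1 Data Collection
4-2 Data Preprocessing
4-2-1 MongoDB
4-2-2 Feature Generation
4-3 Hyperparameters
4-4 Experimental Results and Analysis
4-4-1 Loss Function
4-4-2 In-domain
4-4-3 Cross-domain
Chapter 5 Conclusions and Suggestions for Future Research
5-1 Research Conclusions
5-2 Research Limitations and Future Suggestions
Chapter 6 References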

References

[1] Cao, N., Ji, S., Chiu, D.K.W., He, M. and Sun, X., (2020). A Deceptive Review Detection Framework: Combination of Coarse and Fine-grained Features. Expert Systems with Applications.
[2] Zhang, D., Li, W., Niu, B. and Wu, C., (2023). A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information. Decision Support Systems, 166, 113911.
[3] Du, C., Sun, H., Wang, J., Qi, Q. and Liao, J., (2020). Adversarial and domain-aware BERT for cross-domain sentiment analysis. Proceedings of the 58th annual meeting of the Association for Computational Linguistics (pp. 4019-4028).
[4] Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F. and Vaughan, J. W., (2010). A theory of learning from different domains. Machine Learning, 79.
[5] Salunkhe, A., (2021). Attention-based Bidirectional LSTM for Deceptive Opinion Spam Classification. arXiv preprint arXiv:2112.14789v1.
[6] Devlin, J., Chang, M. W., Lee, K. and Toutanova, K., (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805v2.
[7] Lee, K. D., Han, K. and Myaeng, S. H., (2016). Capturing word choice patterns with LDA for fake review detection in sentiment analysis. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics.
[8] Salminen, J., Kandpal, C., Kamel, A. M., Jung, S. G. and Jansen, B. J., (2022). Creating and detecting fake reviews of online products. Journal of Retailing and Consumer Services, 64, 102771.
[9] Hernández-Castañeda, Á., Calvo, H., Gelbukh, A. and Flores, J. J. G., (2017). Cross-domain deception detection using support vector networks. Soft Computing, 21, 585-595.
[10] Alsubari, S. N., Deshmukh, S. N., Alqarni, A. A., Alsharif, N., Aldhyani, T. H., Alsaade, F. W. and Khalaf, O. I., (2022). Data analytics for the identification of fake reviews using supervised learning. Computers, Materials & Continua, 70(2), 3189-3204.
[11] Cao, Z., Zhou, Y., Yang, A. and Peng, S., (2021). Deep transfer learning mechanism for fine-grained cross-domain sentiment classification. Connection Science, 33(4), 911-928.
[12] Cagnina, L. C. and Rosso, P., (2017). Detecting deceptive opinions: intra and cross-domain classification using an efficient representation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 25(Suppl. 2), 151-174.
[13] Qu, Z., Jia, Q., Lyu, C., Liu, J., Liu, X. and Zheng, K., (2022). Detecting Fake Reviews with Generative Adversarial Networks for Mobile Social Networks. Security and Communication Networks, 2022.
[14] Alsubari, S. N., Deshmukh, S. N., Al-Adhaileh, M. H., Alsaade, F. W. and Aldhyani, T. H., (2021). Development of integrated neural network model for identification of fake reviews in E-commerce using multidomain datasets. Applied Bionics and Biomechanics, 2021.
[15] Wei, C. S., Hsu, P. Y., Huang, C. W., Cheng, M. S. and Prassida, G. F., (2020). Devising a Cross-Domain Model to Detect Fake Review Comments. Advances in Computational Collective Intelligence: 12th International Conference, ICCCI 2020, Da Nang, Vietnam, November 30–December 3, 2020, Proceedings 12 (pp. 714-725). Springer International Publishing.
[16] Wu, Y., Ngai, E. W., Wu, P. and Wu, C., (2020). Fake online reviews: Literature review, synthesis, and directions for future research. Decision Support Systems, 132, 113280.
[17] Jia, S., Zhang, X., Wang, X. and Liu, Y., (2018). Fake reviews detection based on LDA. 2018 4th International Conference on Information Management (ICIM) (pp. 280-283). IEEE.
[18] Lin, T. Y., Goyal, P., Girshick, R., He, K. and Dollár, P., (2017). Focal loss for dense object detection. Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).
[19] Ott, M., Choi, Y., Cardie, C. and Hancock, J. T., (2011). Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint arXiv:1107.4557.
[20] Wang, Z., Gu, S. and Xu, X., (2018). GSLDA: LDA-based group spamming detection in product reviews. Applied Intelligence, 48, 3094-3107.
[21] Li, Z., Wei, Y., Zhang, Y. and Yang, Q., (2018). Hierarchical attention transfer network for cross-domain sentiment classification. Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1).
[22] Gupta, P., Gandhi, S. and Chakravarthi, B. R., (2021). Leveraging transfer learning techniques-BERT, RoBERTa, ALBERT and DistilBERT for fake review detection. Forum for Information Retrieval Evaluation (pp. 75-82).
[23] Sánchez-Junquera, J., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. and Stamatatos, E., (2020). Masking domain-specific information for cross-domain deception detection. Pattern Recognition Letters, 135, 122-130.
[24] Dos Santos, B. N., Marcacini, R. M. and Rezende, S. O., (2021). Multi-domain aspect extraction using bidirectional encoder representations from transformers. IEEE Access, 9, 91604-91613.
[25] Ren, Y. and Ji, D., (2017). Neural networks for deceptive opinion spam detection: An empirical study. Information Sciences, 385, 213-224.
[26] Loper, E. and Bird, S., (2002). NLTK: The natural language toolkit. arXiv preprint cs/0205028.
[27] Redko, I., Habrard, A. and Sebban, M., (2019). On the analysis of adaptability in multi-source domain adaptation. Machine Learning, 108(8-9), 1635-1652.
[28] Luca, M., (2016). Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School NOM Unit Working Paper, (12-016).
[29] Floh, A., Koller, M. and Zauner, A., (2013). Taking a deeper look at online reviews: The asymmetric effect of valence intensity on shopping behaviour. Journal of Marketing Management, 29(5-6), 646-670.
[30] Hasanat, M. W., Hoque, A., Shikha, F. A., Anwar, M., Hamid, A. B. A. and Tat, H. H., (2020). The impact of coronavirus (COVID-19) on e-business in Malaysia. Asian Journal of Multidisciplinary Studies, 3(1), 85-90.
[31] Li, J., Cardie, C. and Li, S., (2013). Topicspam: a topic-model based approach for spam detection. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 217-221).
[32] Klaus, T. and Changchit, C., (2019). Toward an understanding of consumer attitudes on online review usage. Journal of Computer Information Systems, 59(3), 277-286.
[33] Li, J., Ott, M., Cardie, C. and Hovy, E., (2014). Towards a general rule for identifying deceptive opinion spam. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1566-1576).
[34] Fellbaum, C., (2010). WordNet. Theory and applications of ontology: computer applications (pp. 231-243). Dordrecht: Springer Netherlands.

Advisor

許秉瑜 (Ping-Yu Hsu)

Approval Date

2023-7-26