A Study on the Consistency between Review Star Ratings and Actual Ratings Using Text Mining Techniques

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95466

Title:	以文字探勘技術探討評論星級評分與實際評分之間的一致性研究;A Study on the Consistency between Review Star Ratings and Actual Ratings Using Text Mining Techniques
Author:	Chuang, Wan-Li (莊婉麗)
Contributor:	In-service Master's Program, Department of Information Management
Keywords:	sentiment analysis;text vectorization;feature categories;review ratings;consistency
Date:	2024-06-24
Upload time:	2024-10-09 16:53:06 (UTC+8)
Publisher:	National Central University
Abstract:	With the prevalence of social media, electronic word-of-mouth (eWOM) is generated and spread rapidly, and online reviews have become an important factor in consumers' purchasing decisions. In the movie domain, viewers often leave text reviews and star ratings on platforms such as IMDb, and these data contain rich information about user preferences. However, most existing studies focus on sentiment analysis of review text and on building recommendation systems; few have tried to predict the consistency between review star ratings and actual ratings. To address this gap, this study uses movie review data from IMDb, collected with a web crawler, as its dataset. After preprocessing, the dataset contains 21,150 records, from which the top 3 movie genres are selected. Features are divided into text and non-text types according to whether text mining techniques are used to derive them, giving 9 features in total. For rating prediction, the study applies several regression methods: Random Forest, Gradient Boosting Machine, AdaBoost, and Linear Regression.
The prediction models are trained with 10-fold cross-validation, and prediction accuracy is assessed with regression evaluation metrics. The study focuses on using different text vectorization techniques to improve the accuracy of movie review star rating prediction and on identifying the key features that influence ratings. Across four experiments, Doc2Vec performs best in review star rating prediction, which marks it as an important text feature. The rating prediction results for the Action genre are better than those for the other three categories. Further analysis of combinations of text and non-text features shows that composite features can significantly improve prediction accuracy. Experiment 4 compares a full-feature model with a feature-selection model and finds that using all features gives better predictions on this dataset. These results demonstrate the practicality of text mining techniques and offer a new technical approach to movie review analysis.
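The evaluation protocol described in the abstract (10-fold cross-validation scored with regression metrics) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a one-feature least-squares model in place of Random Forest, Gradient Boosting Machine, AdaBoost, or multi-feature Linear Regression, and it uses synthetic data in place of the IMDb reviews. The function names, the `sentiment` feature, and the choice of MAE and RMSE as metrics are all hypothetical, chosen only for illustration.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_simple_linear(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=10, seed=0):
    """Return (mean MAE, mean RMSE) over k held-out folds.

    Assumes len(xs) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(xs), k, seed)
    maes, rmses = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        slope, intercept = fit_simple_linear(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        errors = [ys[i] - (slope * xs[i] + intercept) for i in held_out]
        maes.append(sum(abs(e) for e in errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(maes) / k, sum(rmses) / k


# Synthetic stand-in data (an assumption, not the thesis dataset):
# one hypothetical sentiment score per review and a noisy star rating.
rng = random.Random(42)
sentiment = [rng.uniform(-1.0, 1.0) for _ in range(200)]
stars = [5.5 + 4.0 * s + rng.gauss(0.0, 0.5) for s in sentiment]

mae, rmse = cross_validate(sentiment, stars, k=10)
print(f"10-fold MAE={mae:.3f}, RMSE={rmse:.3f}")
```

Each review is held out exactly once, so every error is measured on data the model did not see during fitting. To compare models or feature sets in the way the four experiments do, one would swap in a different regressor or feature vector and compare the averaged metrics.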
Appears in Collections:	[In-service Master's Program, Department of Information Management] Master's and Doctoral Theses

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	23

All items in NCUIR are protected by original copyright.
