dc.description.abstract | With the prevalence of social media, the generation and dissemination of electronic word-of-mouth (eWOM) have expanded rapidly, and online reviews have become an important factor influencing consumer purchasing decisions. In the movie domain, consumers often leave text reviews and star ratings on platforms such as IMDb after watching a movie, and these data contain rich information about user preferences. However, most existing studies have focused on sentiment analysis of review texts and the construction of recommendation systems; few have examined predicting the consistency between review star ratings and actual ratings. To address this research gap, this study uses movie review data from the IMDb website as an example, collecting movie reviews and related information with web crawling techniques to form the research dataset. After preprocessing, the dataset contains 21,150 data points, and the top 3 movie genres are selected. The data are then divided into text and non-text features according to whether text mining techniques are used, resulting in 9 types of features. For model construction, the study employs several regression methods for rating prediction, including Random Forest, Gradient Boosting Machine, AdaBoost, and Linear Regression. The experiments use 10-fold cross-validation to train the prediction models, and regression evaluation metrics are used to assess prediction accuracy. The main focus of this research is on using different text vectorization techniques to improve the accuracy of movie review star rating prediction and on identifying the key features that influence the ratings. Across four experiments, Doc2Vec performs best in review rating prediction, highlighting its importance as a text feature. The study also shows that rating prediction results for the Action genre are better than those for the other three genres.
Further analysis of combined text and non-text features shows that composite features significantly improve prediction accuracy. Experiment 4 compares the full-feature model with the feature-selection model; the results show that using all features yields better predictions on this dataset. These results not only demonstrate the practicality of text mining techniques but also provide new technical approaches for movie review analysis. | en_US |