A Study on the Consistency between Review Star Ratings and Actual Ratings Using Text Mining Techniques

DC field	Value	Language
DC.contributor	資訊管理學系在職專班	zh_TW
DC.creator	莊婉麗	zh_TW
DC.creator	Wan-Li Chuang	en_US
dc.date.accessioned	2024-06-24T07:39:07Z
dc.date.available	2024-06-24T07:39:07Z
dc.date.issued	2024
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=110453036
dc.contributor.department	資訊管理學系在職專班	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	隨著社群媒體的盛行，eWOM的生成和傳播迅速擴散，線上評論已成為影響消費者購買與否的重要因素之一。在電影領域，消費者在觀影後常常會在IMDb等平台上留下文字評論和星級評分，這些數據蘊藏著豐富的用戶偏好訊息。然而，大多數現有研究都將重點放在了評論文本的情感分析和推薦系統的構建上，過去較少文獻針對評論星級評分與實際評分一致性進行預測。針對這一研究缺口，本文以 IMDb 網站電影評論數據為例，使用爬蟲技術蒐集電影評論與資訊做為本研究資料集，預處理後共有21,150筆資料，並取出前3大類電影類別，再以是否使用文字探勘技術拆分為文本類型及非文本類型共9種特徵。在模型構建方面研究採用了多種迴歸方法進行評分預測包括隨機森林、梯度提升機、自適應增強和線性迴歸。實驗使用十折交叉驗證來訓練預測模型，最後使用迴歸評估指標評估模型預測準確率。本研究主要聚焦於使用不同的文字向量技術來提高電影評論星級預測的準確率，並識別影響評分的關鍵特徵。透過四項實驗，發現Doc2Vec在評論星級評分預測中表現最佳，突顯為重要的文本特徵。研究也顯示，發現Action電影類別的評分預測結果優於其他3類。進一步分析文本與非文本特徵的組合發現，複合特徵可顯著提升預測精度。實驗四比較全特徵模型與特徵選擇模型，結果發現在本資料集上使用全部特徵可以取得更好的預測效果。這些結果不僅證明了文字探勘技術的實用性，也為電影評論分析提供了新的技術途徑。	zh_TW
dc.description.abstract	With the prevalence of social media, electronic word-of-mouth (eWOM) is generated and spread rapidly, and online reviews have become one of the important factors influencing consumer purchasing decisions. In the movie domain, consumers often leave text reviews and star ratings on platforms such as IMDb after watching a movie, and these data contain rich information about user preferences. However, most existing studies focus on sentiment analysis of review texts and the construction of recommendation systems; few predict the consistency between review star ratings and actual ratings. To address this research gap, this study takes movie review data from the IMDb website as an example and uses web crawling to collect movie reviews and movie information as the research dataset. After preprocessing, the dataset contains 21,150 records, and the top 3 movie genres are selected. The features are then divided into text and non-text types according to whether text mining techniques are used, giving 9 features in total. For model construction, the study applies several regression methods to rating prediction: Random Forest, Gradient Boosting Machine, AdaBoost, and Linear Regression. The experiments use 10-fold cross-validation to train the prediction models, and regression evaluation metrics are used to assess prediction accuracy. The research focuses on using different text vectorization techniques to improve the accuracy of review star rating prediction and on identifying the key features that influence ratings. Across four experiments, Doc2Vec performs best in review star rating prediction, which marks it as an important text feature. The study also shows that rating prediction results for the Action genre are better than those for the other three genres. Further analysis of combinations of text and non-text features shows that composite features can significantly improve prediction accuracy. Experiment 4 compares the full-feature model with the feature-selection model and finds that using all features gives better predictive performance on this dataset. These results not only demonstrate the practicality of text mining techniques but also provide a new technical approach for movie review analysis.	en_US
DC.subject	情感分析	zh_TW
DC.subject	文字向量	zh_TW
DC.subject	特徵類別	zh_TW
DC.subject	評論星級評分	zh_TW
DC.subject	一致性	zh_TW
DC.subject	sentiment analysis	en_US
DC.subject	text vectorization	en_US
DC.subject	feature categories	en_US
DC.subject	review ratings	en_US
DC.subject	consistency	en_US
DC.title	以文字探勘技術探討評論星級評分與實際評分之間的一致性研究	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	A Study on the Consistency between Review Star Ratings and Actual Ratings Using Text Mining Techniques	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 110453036: full metadata record
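
The abstract describes training regression models with 10-fold cross-validation and scoring them with regression evaluation metrics. The record does not give the thesis's features, hyperparameters, or the specific metrics used, so the following is only a minimal sketch of that evaluation loop under stated assumptions: it uses synthetic data, a single numeric feature, a plain least-squares line in place of the four models named in the abstract, and MAE and RMSE as assumed metrics. All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=10, seed=0):
    """Return the mean MAE and mean RMSE over k held-out folds."""
    folds = k_fold_indices(len(xs), k, seed)
    maes, rmses = [], []
    for held_out in folds:
        test_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in test_set]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        slope, intercept = fit_simple_linear(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        errors = [ys[i] - (slope * xs[i] + intercept) for i in held_out]
        maes.append(sum(abs(e) for e in errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(maes) / k, sum(rmses) / k


# Synthetic stand-in data (assumption): one numeric feature per review
# and a star rating clipped to a 1-10 scale.
rng = random.Random(42)
features = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ratings = [
    min(10.0, max(1.0, 5.5 + 3.0 * x + rng.gauss(0.0, 1.0))) for x in features
]

mean_mae, mean_rmse = cross_validate(features, ratings, k=10)
print(f"10-fold MAE:  {mean_mae:.3f}")
print(f"10-fold RMSE: {mean_rmse:.3f}")
```

Each sample is held out exactly once, so every prediction that is scored comes from a model that never saw that sample during fitting. Replacing `fit_simple_linear` with another regressor leaves the rest of the loop unchanged.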