Abstract: | Online reviews have become an important reference for consumers making purchasing decisions. They provide a wealth of information but also confront consumers with information overload. Online reviews of consumer products fall into two kinds: reviews of experience goods and reviews of search goods. Reviews of search goods generally focus on product features, whereas reviews of experience goods are more subjective and emotional because each person's experience differs, so it is often difficult to judge whether such a review is helpful for a purchase decision. This study takes movie reviews as its research object and uses NLP techniques to analyze how the feature categories of reviews affect the helpfulness of movie reviews, with the aim of helping consumers find helpful reviews of experience goods among a large number of reviews. Movie reviews and movie information were collected from the IMDb website with a web crawler to form the data set. After preprocessing, the data set contains 116,593 records, organized into three movie genres (Action, Adventure, and Drama) and nine feature categories, which are divided into text and non-text types according to whether text mining techniques are used to derive them.
For regression prediction, the study adopts the supervised machine learning models Random Forest, XGBoost, and AdaBoost and designs five experiments, in which combinations of feature categories are chosen by forward selection in stepwise regression. The experimental results show that Random Forest gives the best results among the prediction methods. For the non-text feature categories, the results indicate that review readers believe reviews from reviewers they trust and refer to those reviewers' votes on the movie; in addition, both a longer time since the review was posted and a shorter interval between the movie's release and the review's posting help predict review helpfulness. For the text feature categories, the BERT text vector is the most important single text feature category. In terms of combinations, the Drama genre draws on more text feature categories than the Action/Adventure genres, because Drama movies carry more plot and readers read the review itself in more detail rather than relying only on its sentiment or keywords. When non-text and text feature categories are used together, the text feature categories do not necessarily improve the accuracy of the review helpfulness model. Finally, the experimental data show that stepwise regression can effectively identify combinations of feature categories and improve the accuracy of review helpfulness prediction. |
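
The forward selection step named in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the category names, the toy error function, and the `min_gain` stopping threshold are all assumptions made for illustration, and the numbers are invented rather than taken from the experiments. In practice, `score_fn` would presumably return a cross-validated error of a regressor such as Random Forest trained on the selected feature categories.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def forward_select(
    categories: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[str], float]:
    """Greedy forward selection over feature categories.

    score_fn returns an error to minimize for a given set of categories.
    Starting from the empty set, each step adds the category that lowers
    the error the most; selection stops when the best addition improves
    the error by less than min_gain.
    """
    selected: List[str] = []
    remaining = list(categories)
    best_err = float("inf")
    while remaining:
        # Score every candidate category added to the current selection.
        trials = [(score_fn(frozenset(selected + [c])), c) for c in remaining]
        err, candidate = min(trials)
        if best_err - err < min_gain:
            break  # no remaining category improves the error enough
        selected.append(candidate)
        remaining.remove(candidate)
        best_err = err
    return selected, best_err


# Hypothetical categories with made-up error reductions (not study results).
TOY_GAIN: Dict[str, float] = {
    "reviewer_trust": 0.30,
    "review_age": 0.15,
    "bert_vector": 0.10,
    "sentiment": 0.00,
}


def toy_error(subset: FrozenSet[str]) -> float:
    """Stand-in for a cross-validated model error.

    Baseline error 1.0, minus each selected category's gain, plus a small
    penalty per category so that uninformative categories are rejected.
    """
    return 1.0 - sum(TOY_GAIN[c] for c in subset) + 0.02 * len(subset)


if __name__ == "__main__":
    chosen, error = forward_select(list(TOY_GAIN), toy_error)
    print(chosen, error)
```

Passing the scoring step in as a callable keeps the selection loop independent of the model, so the same loop could be reused with each of the three regressors compared in the study.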