NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/92641


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92641


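    Title: Research on feature selection methods based on machine learning in wine review text classification (original title: 基於機器學習之特徵選取方法於葡萄酒評論文本分類之研究)
    Authors: Lu, Hao-Yu (呂浩瑜)
    Contributors: Executive Master of Information Management program (資訊管理學系在職專班)
    Keywords: text classification; machine learning; wine reviews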
    Date: 2023-06-15
    Issue Date: 2023-10-04 16:07:21 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: With the widespread use of the internet and social media, and under the impact of the Covid-19 pandemic, wine consumers increasingly rely on online reviews to make purchasing decisions. This study compares the effectiveness of different text feature extraction methods for wine review text classification, with the aim of contributing to the development and application of such classification techniques and of helping consumers choose wine more efficiently. We first crawled 1,500 reviews from the VIVINO wine review website and asked experts to label their aroma and taste categories. After data preprocessing, we used three methods, TF-IDF, Doc2vec, and BERT word embeddings, to generate word vectors. We then paired each representation with five classification models (Naive Bayes, Logistic Regression, Random Forest, Support Vector Machine, and XGBoost) to explore the performance and applicability of different feature representations and classifiers in text classification. The results show that, for all five target variables of this wine dataset, the most suitable combination was the TF-IDF text transformer paired with the XGBoost classification model, which reached a prediction accuracy above 0.8 in every case. Moreover, using the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance slightly improved the models' results, especially accuracy and precision. However, when the original sample is very large, SMOTE may not be worth using, because it requires considerable additional processing time and yields only a slight improvement in performance.
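    Illustration (not part of the thesis record): the abstract names TF-IDF as the feature representation that worked best. The following is a minimal pure-Python sketch of how TF-IDF turns review text into numeric vectors. The whitespace tokenizer, the smoothed IDF formula, the function name tfidf_vectors, and the three sample reviews are all assumptions made for illustration; the abstract does not state which TF-IDF variant, tokenizer, or library the author used.

```python
import math
from collections import Counter


def tfidf_vectors(reviews):
    """Return (vocabulary, vectors) for a list of review strings.

    Hypothetical helper: raw term counts weighted by a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. The thesis may use a
    different weighting or normalization.
    """
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    docs = [review.lower().split() for review in reviews]
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Document frequency: number of reviews containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocabulary}

    vectors = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        vectors.append([counts[t] * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Made-up reviews, not taken from the thesis dataset.
reviews = [
    "ripe cherry and soft tannins",
    "cherry and oak with firm tannins",
    "crisp citrus and mineral finish",
]
vocabulary, vectors = tfidf_vectors(reviews)
```

    Terms that occur in every review (here "and") receive the lowest weight, while terms specific to one review (here "citrus") are weighted higher. Vectors of this kind could then be passed to any of the five classifiers the abstract lists.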
    Appears in Collections:[Executive Master of Information Management] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
