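Name: Yu-Hung Chang (張鈺鴻)  Department: Executive Master's Program, Department of Information Management
Thesis Title: Applying Text Mining Techniques to Construct Machine Learning Models for Predicting Customer Complaint Categories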
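Abstract (Chinese version): With advances in technology, customers and consumers can publish or share their views on a product's quality and a service's strengths and weaknesses through many different channels. When a negative review appears, many netizens follow up with responses, and the issue can then spread in a ripple effect that draws public attention. We refer to such negative reviews as customer complaints. At present, enterprises mostly handle customer complaints on social platforms manually: customer service center staff collect the complaint reviews and then process them further, which is often too slow to be effective. Complaint messages also usually contain highly useful, extractable information. They typically express dissatisfaction or a desire to see the product improve, so analyzing them is important for an organization.
This study uses review messages collected from the Google Play platform as its dataset: 31,401 records posted between January 1, 2014 and April 30, 2020. The unstructured complaint messages were processed with supervised machine learning through text mining and feature extraction. The feature words were analyzed with the Orange data mining tool and used to build a keyword vocabulary (bag of words), followed by topic modeling and labeling. Four algorithms commonly used in classification and prediction research, Naïve Bayes (NB), k-nearest neighbors (KNN), random forest (RF), and support vector machine (SVM), were then applied to analyze the complaints and predict their problem category. The complaints are divided into six topic categories. The results show that for six-class (multi-class) classification, a corpus combining multiple parts of speech achieves higher prediction accuracy than a single-part-of-speech corpus, whereas for binary classification, the single-part-of-speech corpus of action transitive verbs achieves the best accuracy. This confirms that the proposed approach can effectively predict customer complaint categories and save the time spent classifying complaints manually.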
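As a rough illustration of the bag-of-words plus classifier step described above, the following is a minimal sketch of multinomial Naïve Bayes, one of the four algorithms named, written in plain Python rather than the Orange workflow the study actually used. It assumes the reviews have already been segmented into words; the toy reviews and the category names `login`, `payment`, and `crash` are hypothetical and do not come from the thesis dataset or its six categories.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on pre-segmented documents (lists of tokens)."""
    # Vocabulary of the bag-of-words representation: every token seen in training.
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    # Per-class bag of words: token -> frequency across that class's documents.
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    log_prior, log_likelihood = {}, {}
    for c, n_c in class_counts.items():
        log_prior[c] = math.log(n_c / len(docs))
        # Additive (Laplace) smoothing keeps unseen class/token pairs non-zero.
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest posterior log-probability."""
    log_prior, log_likelihood = model
    scores = {
        # Tokens outside the training vocabulary are ignored.
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already-segmented toy reviews and labels (illustration only).
train_docs = [
    ["cannot", "login", "password"],
    ["login", "failed", "account"],
    ["payment", "failed", "refund"],
    ["charged", "twice", "refund"],
    ["app", "crash", "startup"],
    ["crash", "after", "update"],
]
train_labels = ["login", "login", "payment", "payment", "crash", "crash"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["refund", "not", "received"]))  # payment
```

Working in log space avoids numeric underflow when a review contains many tokens; swapping in KNN, RF, or SVM would reuse the same bag-of-words features with a different decision rule.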
Abstract (English): With the advancement of technology, customers and consumers can publish or share the pros and cons of products and services through a variety of channels. When negative opinions or reviews appear, many netizens follow up with responses, and sometimes the issue is discussed widely enough to cause a ripple effect that attracts public attention. These negative comments can be called customer complaints. At present, companies handle customer complaints on social platforms mostly by having customer service center staff manually collect the complaints and process them further, which is often too slow. Customer complaint messages also usually contain highly useful information: they express dissatisfaction and hopes for improvement, which makes analyzing them very important for the organization.

The dataset used in this research consists of 31,401 user reviews posted on the Google Play platform between January 1, 2014 and April 30, 2020. In this study, customer complaints are analyzed and their problem categories predicted with supervised machine learning methods: Naive Bayes, k-nearest neighbors, random forest, and support vector machine. Feature words were extracted from the unstructured user complaints and analyzed with the Orange data mining tool; a keyword vocabulary was then built, modeled, and labeled into six main categories. The results show that for multi-class classification, a keyword corpus combining multiple parts of speech achieves higher prediction accuracy than a single-part-of-speech corpus, while for binary classification, the single-part-of-speech corpus of action transitive verbs achieves the highest accuracy. This confirms that customer complaints can be classified efficiently, saving the time spent on manual classification.
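The abstract reports accuracy for both a six-class setting and a binary setting but does not state how the binary tasks were constructed. Assuming a one-vs-rest scheme (one category against all others), the sketch below computes multi-class accuracy and per-category one-vs-rest accuracy from a list of true and predicted labels. The category names and predictions are hypothetical, and the helpers `accuracy`, `one_vs_rest`, and `evaluate` are illustrative names, not part of any tool used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest(labels, positive):
    """Collapse multi-class labels into a binary task: `positive` vs. all others."""
    return [label == positive for label in labels]


def evaluate(y_true, y_pred):
    """Return (multi-class accuracy, {category: one-vs-rest binary accuracy})."""
    multi = accuracy(y_true, y_pred)
    binary = {
        c: accuracy(one_vs_rest(y_true, c), one_vs_rest(y_pred, c))
        for c in sorted(set(y_true))
    }
    return multi, binary


# Hypothetical labels and predictions (illustration only).
y_true = ["login", "payment", "crash", "login", "payment", "crash"]
y_pred = ["login", "payment", "login", "login", "crash", "crash"]
multi_acc, binary_acc = evaluate(y_true, y_pred)
print(round(multi_acc, 3), {c: round(a, 3) for c, a in binary_acc.items()})
```

When both settings are scored on the same predictions, each one-vs-rest accuracy can never be lower than the multi-class accuracy, because every correct multi-class prediction is also correct in every binary view; the two kinds of accuracy figures are therefore best compared within a setting rather than across settings.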
Keywords: text mining, classification prediction, supervised learning
Keywords (Chinese version): ★ Text Mining
★ Classification Prediction
★ Supervised Machine Learning
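Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2 Literature Review
2.1 Text Mining and Natural Language Processing (NLP)
2.2 Literature on Machine Learning Methods for Predicting Customer Complaint Categories
2.3 Literature on Customer Complaint Issues and Text Analysis of Complaints
2.4 Summary
Chapter 3 Research Methodology
3.1 Data Sources
3.2 Data Preprocessing
3.3 Chinese Word Segmentation and Feature Selection
3.4 Text Vectors
3.5 Text Mining
3.6 Research Variables
3.7 Analysis Methods and Tool Packages
3.8 Experimental Design and Evaluation
Chapter 4 Empirical Results
4.1 Experimental Data
4.2 Experimental Results
4.3 Variable Importance Ranking (Ranks)
4.4 General Discussion
Chapter 5 Conclusions and Recommendations
5.1 Conclusions and Contributions
5.2 Research Limitations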
Advisor: Ya-Han Hu (胡雅涵)  Approval Date: 2021-06-03
