Thesis/Dissertation 105423018: Full Metadata Record

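DC field | Value | Language
DC.contributor | Department of Information Management | zh_TW
DC.creator | 黃靖雅 | zh_TW
DC.creator | Jing-Ya Huang | en_US
dc.date.accessioned | 2018-6-22T07:39:07Z |
dc.date.available | 2018-6-22T07:39:07Z |
dc.date.issued | 2018 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=105423018 |
dc.contributor.department | Department of Information Management | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
dc.description.abstract | Today, almost anything can be publicly reviewed, including the newspapers, magazines, and books you have read. Online reviews are regarded as trustworthy, and users can provide them in several forms, such as star ratings, text, images, and videos. Most users check online reviews before buying a product or trying an experience, and when the volume of information online becomes too large, information overload results. We therefore apply data mining to review data, using machine learning methods to process and filter this large volume of information. This study uses an online review helpfulness dataset. During data cleaning, we found that missing data is very common in such real-world data. Moreover, the existing literature contains no comparison of how supervised learning algorithms perform at predicting and imputing missing values on real-world data. We therefore designed two experiments. Experiment 1 applies missing value imputation methods to the reviewer data in an online review helpfulness dataset that contains missing values, so that a good prediction model can be built to help travelers or hotel operators find the most helpful reviews. Experiment 2 examines other missing-data patterns that may arise in the real world by programmatically simulating 10% to 50% missing data; besides comparing the performance of the different imputation methods, it also identifies the best imputation method for the online review domain. The experiments use three types of techniques: traditional case deletion; imputation by mean/mode, KNN, and the support vector machine (SVM) widely used in academia; and a decision tree method, which is less sensitive to missing values and handles incomplete data directly without imputation. The results show that handling the incomplete data directly with a decision tree yields the best classification accuracy. We believe this contribution can help future users handle missing values more appropriately and efficiently, so that they can move on to data analysis sooner. | zh_TW
dc.description.abstract | In today's world, everyone can comment on many public posts, including newspapers, magazines, and books they have read. Online reviews are considered trustworthy. Users can provide online reviews in several ways, such as star ratings, text, images, and videos. Most users also browse reviews on websites before purchasing goods or experiences. The sheer volume of information on the Internet causes information overload; data mining techniques can therefore be employed to address this problem. This thesis studies the helpfulness of online hotel reviews. During data preprocessing, we found that real-world review datasets very commonly contain a certain number of missing attribute values. In the literature, no study focuses on examining the performance of different types of techniques for handling incomplete online review datasets. The experiment is composed of two studies. In the first study, the dataset is collected from TripAdvisor, where some reviewer-related information is missing, such as reviewer level, age, sex, etc. Three types of techniques are compared: case deletion; imputation methods, including mean/mode, KNN, and SVM; and directly handling the incomplete dataset without imputation using C5.0. In the second study, missing rates of 10% to 50% are simulated on the dataset. The experimental results of the two studies show that the C5.0 decision tree algorithm is the best choice among the compared techniques for dealing with missing values in online review datasets. | en_US
DC.subject | data preprocessing | zh_TW
DC.subject | missing values | zh_TW
DC.subject | imputation methods | zh_TW
DC.subject | online reviews | zh_TW
DC.subject | data preprocessing | en_US
DC.subject | missing value | en_US
DC.subject | imputation | en_US
DC.subject | online review | en_US
DC.title | A Study of Missing Value Imputation on Online Review Helpfulness Datasets | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Evaluation of missing value imputation methods for the helpfulness of online reviews | en_US
DC.type | thesis/dissertation | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US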
