Author 洪彥群 (Yen-Chun Hung)   Department In-service Master's Program, Department of Information Management
Thesis title 利用資料探勘技術建立商用複合機銷售預測模型
(Applying Data Mining Techniques to Construct the Sale Forecast Model for Multiple Function Devices)
Abstract (Chinese) A commercial multifunction device combines copying, printing, faxing, scanning, and other functions in a single unit,
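[...] operations. In the experimental procedure, the data are divided into continuous and discrete data, and different classifier experiments are run on each with the data mining tool Weka (version 3.6.9) in an attempt to obtain the best sales forecasting model. The discrete data are produced by dividing the case company's monthly sales quantity into 3 classes according to the normal distribution.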
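The three-class split described above can be sketched as follows. This is a minimal illustration, not the thesis's actual procedure: the abstract does not state the cut points, so the thresholds at one standard deviation below and above the mean, the function name `discretize_by_normal`, the class labels, and the sample figures are all assumptions.

```python
from statistics import mean, pstdev


def discretize_by_normal(sales, k=1.0):
    """Label each monthly sales figure as 'low', 'medium', or 'high'.

    Assumption: the cut points sit at mean - k*std and mean + k*std.
    The abstract only says the split follows the normal distribution.
    """
    mu = mean(sales)
    sigma = pstdev(sales)  # population standard deviation
    lower, upper = mu - k * sigma, mu + k * sigma
    labels = []
    for value in sales:
        if value < lower:
            labels.append("low")
        elif value > upper:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


# Made-up monthly unit sales, for illustration only.
monthly_sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]
labels = discretize_by_normal(monthly_sales)
```

With cut points at one standard deviation, most months fall into the middle class, so a classifier trained on such labels sees imbalanced classes; a smaller `k` would give a more even split.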
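[...] components analysis) after dimension selection, for both continuous and discrete predictions. For continuous data, this study uses four single classifiers (Linear Regression, MultilayerPerceptron, SMOreg, and kNN) and validates them with the Additive Regression and Bagging multiple classifiers. For discrete data, it uses six single classifiers (MultilayerPerceptron, SMO, LibSVM, kNN, CART, and BayesNet) and validates them with the AdaBoost and Bagging multiple classifiers.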
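To make the pairing of a single classifier with a multiple classifier concrete, the sketch below wraps a kNN regressor in bagging: each round trains on a bootstrap resample and the predictions are averaged. It illustrates the general idea only and is not Weka's implementation; the function names, the feature layout, and the sample rows are hypothetical.

```python
import random
from statistics import mean


def knn_predict(train, query, k=3):
    """Predict by averaging the targets of the k nearest training rows.

    train is a list of (features, target) pairs; distance is squared Euclidean.
    """
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    return mean(target for _, target in by_distance[:k])


def bagging_predict(train, query, n_models=10, k=3, seed=0):
    """Average kNN predictions over bootstrap resamples of the training set."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        predictions.append(knn_predict(sample, query, k))
    return mean(predictions)


# Hypothetical rows: ([month index, indicator score], units sold).
train = [([1, 20], 100), ([2, 22], 110), ([3, 25], 130),
         ([4, 27], 140), ([5, 30], 160), ([6, 32], 170)]
prediction = bagging_predict(train, [4, 28])
```

Because each prediction is an average of training targets, the bagged estimate always stays within the range of the observed sales values.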
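The experimental results show that PCA has little effect on the prediction results for either continuous or discrete data. For continuous data, SMOreg performs best, with the lowest overall error rate; for discrete data, LibSVM achieves the higher accuracy.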
Abstract (English) Multifunction devices are a type of office machine that combines e-mail, fax, copy, printing, and scanning functions. They are designed to give users easy and prompt operation. In the literature on data mining applications, very few studies focus on B2B sales forecasting in Taiwan. Moreover, there is no comparative study of the applicability of data mining techniques to different types of forecasting results, namely continuous and discrete prediction outputs. Therefore, the objective of this thesis is to compare different supervised learning techniques for the sales forecast of multifunction devices. The thesis provides guidelines for the case company to conduct sales forecasts and gives academics a reference on the B2B industry.
In the experiments, attributes related to sales are collected from historical data, and the completeness of the data in each attribute is also taken into account. Next, the historical selling quantity (i.e., a continuous value) is used as the prediction output. In addition, the selling quantity is further divided into 3 classes according to the normal distribution for comparison. To find out the effect of feature selection on the forecasting result, PCA (principal component analysis) is used to select more representative attributes from the original data set. For model construction, different single and multiple classification techniques are compared.
The experimental results show that performing feature selection does not significantly affect the final prediction results for either continuous or discrete prediction output. For continuous prediction without PCA, the support vector machine (SVM) performs the best in terms of MAE (Mean Absolute Error). For discrete prediction without PCA, the SVM outperforms the other models in terms of prediction accuracy.
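The two evaluation measures named above can be computed as in the following minimal sketch. The function names and the sample values are made up for illustration and are not results from the thesis.

```python
def mean_absolute_error(actual, predicted):
    """MAE: the average of |actual - predicted| over all observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Fraction of observations whose predicted class equals the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only.
mae = mean_absolute_error([120, 150, 180, 90], [110, 155, 170, 100])
acc = accuracy(["low", "medium", "high", "medium"],
               ["low", "medium", "medium", "medium"])
```

MAE applies to the continuous output (lower is better), while accuracy applies to the 3-class output (higher is better), so the two settings cannot be compared on a single scale.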
Keywords (Chinese) ★ 資料探勘 (data mining)
★ 銷售預測 (sales forecasting)
★ 單一分類器與多重分類器 (single and multiple classifiers)
Keywords (English) ★ Data Mining
★ Sales Forecast
★ Single Classifiers
★ Multiple Classifiers
Table of Contents
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Subject and Scope
1.4.1 Introduction to the Case Company
1.4.2 Scope of Data Collection
1.4.3 Limitations of Data Collection
Chapter 2 Literature Review
2.1 Introduction to Data Mining
2.1.1 Definition of Data Mining
2.1.2 Common Functions of Data Mining
2.1.3 The Data Mining Process
2.2 Purposes and Methods of Sales Forecasting
2.2.1 Purposes of Sales Forecasting
2.2.2 Methods of Sales Forecasting
2.3 Introduction to Business Cycle Indicators
2.3.1 Monitoring Indicator
2.3.2 Monitoring Lights
2.4 Review of Related Sales Forecasting Theses
Chapter 3 Research Method
3.1 Supervised Learning Techniques
3.1.1 Linear Regression
3.1.2 Multilayer Perceptron
3.1.3 SVM
3.1.4 kNN
3.1.5 CART
3.1.6 Bayes Network
3.1.7 Adaboost
3.1.8 Bagging
3.2 Experimental Procedure
3.2.1 Data Sources
3.2.2 Principal Component Analysis
3.2.3 Model Construction Procedure
3.2.4 Continuous Data
3.2.5 Discrete Data
3.2.6 K-Fold Cross-Validation
Chapter 4 Results
4.1 Results for Continuous Data
4.1.1 Linear Regression
4.1.2 MultilayerPerceptron
4.1.3 SMOreg
4.1.4 kNN
4.1.5 Summary of Continuous Data Results
4.2 Results for Discrete Data
4.2.1 MultilayerPerceptron
4.2.2 CART
4.2.3 LibSVM
4.2.4 SMO
4.2.5 kNN
4.2.6 BayesNetwork
4.2.7 Summary of Discrete Data Results
4.3 Discussion
Chapter 5 Conclusion
5.1 Research Conclusions
5.2 Research Contributions
5.3 Research Limitations and Future Research Directions
5.3.1 Research Limitations
5.3.2 Suggested Directions for Future Research
References
