Naive Bayes Classifier with Principal Components Analysis

Name

Jo-Ping Wu (吳若平)

Department

Graduate Institute of Industrial Management

Thesis Title

Naive Bayes Classifier with Principal Components Analysis for Continuous Attributes
(結合主成分分析之貝氏分類模型)

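This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.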

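Abstract (Chinese)

With the arrival of the big data era, data volumes are growing rapidly, and the speed of processing and classifying data has become an important part of data mining. The Naïve Bayes classifier is a simple and practical classification method grounded in Bayes' theorem: it predicts the class from prior and posterior probabilities under the assumption that the attributes are mutually independent. It is a supervised learning method. Its greatest advantage is that it obtains classification results quickly through simple computations. The independence assumption exists precisely to make this fast computation possible, but most real-world data are dependent and do not satisfy it. The Naïve Bayes classifier therefore has two main drawbacks: real data do not satisfy the independence assumption, and the classifier can only be used with categorical variables.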
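
The decision rule described in this paragraph can be written compactly. The symbols below are introduced here for illustration and are not taken from the thesis: $C$ is the class variable, $x_1, \dots, x_n$ are the observed attribute values, and $\hat{c}$ is the predicted class. Bayes' theorem combined with the independence assumption gives:

```latex
\begin{align}
P(C = c \mid x_1, \dots, x_n)
  &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \\
  &\propto P(c) \prod_{i=1}^{n} P(x_i \mid c), \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c).
\end{align}
```

The second line is where the independence assumption enters: the joint likelihood factors into one term per attribute, and the denominator is dropped because it does not depend on $c$.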
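Our proposed method keeps the simplicity and speed of the Naïve Bayes classifier while removing the problem that the attributes of real data do not satisfy the independence assumption. We use a principal component analysis transformation to turn the attributes into mutually linearly independent components, then discretize the continuous data into categorical variables, and finally apply the Naïve Bayes classifier, thereby improving the accuracy of the prediction model.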
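
A pipeline of this shape (PCA, then discretization, then a categorical Naïve Bayes) might be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the equal-width binning, the Laplace smoothing, and the synthetic two-class data are all assumptions made here, and the example relies on NumPy.

```python
import numpy as np

def fit_pca(X):
    """Return column means and eigenvectors (as columns), largest eigenvalue first."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

def equal_width_edges(Z, n_bins):
    """Interior cut points per column (equal-width binning is an assumption here)."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    return np.linspace(lo, hi, n_bins + 1)[1:-1]  # shape (n_bins - 1, n_features)

def discretize(Z, edges):
    """Map every continuous score to a bin index in 0..n_bins-1."""
    cols = [np.digitize(Z[:, j], edges[:, j]) for j in range(Z.shape[1])]
    return np.stack(cols, axis=1)

def fit_naive_bayes(B, y, n_bins, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log conditionals (alpha is assumed)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # log_cond[k, j, b] = log P(component j falls in bin b | class k)
    log_cond = np.empty((len(classes), B.shape[1], n_bins))
    for k, c in enumerate(classes):
        Bc = B[y == c]
        for j in range(B.shape[1]):
            counts = np.bincount(Bc[:, j], minlength=n_bins) + alpha
            log_cond[k, j] = np.log(counts / counts.sum())
    return classes, log_prior, log_cond

def predict(B, classes, log_prior, log_cond):
    """Pick the class with the largest log prior plus summed log conditionals."""
    cols = np.arange(B.shape[1])
    per_class = [log_cond[k, cols, B].sum(axis=1) for k in range(len(classes))]
    scores = log_prior + np.stack(per_class, axis=1)
    return classes[np.argmax(scores, axis=1)]

# Synthetic data (hypothetical): two attributes that are strongly correlated,
# so the raw attributes violate the independence assumption.
rng = np.random.default_rng(0)
n = 200
base0 = rng.normal(0.0, 1.0, n)
base1 = rng.normal(5.0, 1.0, n)
X = np.vstack([
    np.column_stack([base0, base0 + rng.normal(0.0, 0.3, n)]),
    np.column_stack([base1, base1 + rng.normal(0.0, 0.3, n)]),
])
y = np.array([0] * n + [1] * n)

mean, components = fit_pca(X)
Z = (X - mean) @ components            # principal component scores
edges = equal_width_edges(Z, n_bins=5)
B = discretize(Z, edges)               # categorical version of the scores
model = fit_naive_bayes(B, y, n_bins=5)
train_accuracy = (predict(B, *model) == y).mean()
print(f"training accuracy: {train_accuracy:.3f}")
```

Note that PCA makes the components uncorrelated over the whole sample, which is weaker than full statistical independence within each class, so the transformation relaxes the assumption rather than guaranteeing it.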
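We test and compare the models on data from the UCI repository and construct a new model that outperforms other classifiers (such as the original Naïve Bayes classifier, decision tree, and logistic regression). We analyze and test each model, measure its accuracy and confidence interval, and observe how the data perform under the different prediction models. Finally, we examine how different discretization methods, and reducing the dimension based on the principal component analysis results, affect the accuracy of the overall model.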
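
Reporting accuracy together with a confidence interval can be illustrated with a short sketch. The abstract does not say which interval the thesis uses, so the normal-approximation (Wald) interval below, the function name `accuracy_with_interval`, and the toy labels are assumptions made for illustration only.

```python
import math

def accuracy_with_interval(y_true, y_pred, z=1.96):
    """Accuracy plus a normal-approximation (Wald) interval, clipped to [0, 1].

    z = 1.96 corresponds to a 95% confidence level (an assumed choice).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)

# Hypothetical labels and predictions for ten test cases
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
acc, lower, upper = accuracy_with_interval(y_true, y_pred)
print(f"accuracy = {acc:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only ten cases the interval is very wide, which is why comparisons between classifiers are more meaningful on larger test sets or with cross-validation.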

Abstract (English)

With the progress of science and technology, data are growing rapidly, and classification speed has become an important part of data mining. The Naïve Bayes classifier is a simple and practical classification method based on applying Bayes' theorem with a strong independence assumption between the features. However, this assumption is unrealistic in many real situations.
We propose a classification method, PC-Naïve, based on the Naïve Bayes classifier. It keeps the simplicity and speed of the Naïve Bayes classifier while relaxing its vital independence assumption. We use principal components analysis to transform the original data so that the attributes become mutually linearly independent. We then discretize the transformed data and calculate the prior and conditional probabilities. Finally, we obtain the posterior probabilities and classify the data.
We use examples to present the classification procedure and compare the accuracy of four models: the PC-Naïve model, the traditional Naïve Bayes model, the decision tree model, and the stepwise logistic regression model. Finally, we discuss how accuracy varies with the number of dimensions and with the discretization method.
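
To see why the choice of discretization method can matter, the sketch below contrasts two common schemes, equal-width and equal-frequency binning. The abstract does not name the methods the thesis compares, so both schemes, the helper names, and the toy values are assumptions chosen for illustration.

```python
from bisect import bisect_right

def equal_width_cuts(values, n_bins):
    """Cut points that split the observed range into intervals of equal length."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def equal_frequency_cuts(values, n_bins):
    """Cut points at sample quantiles, so bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]

def assign_bins(values, cuts):
    """Bin index = number of cut points less than or equal to the value."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical continuous attribute with a single outlier
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 10.0]
width_bins = assign_bins(values, equal_width_cuts(values, 3))
freq_bins = assign_bins(values, equal_frequency_cuts(values, 3))
print("equal width:    ", width_bins)
print("equal frequency:", freq_bins)
```

In this toy case the outlier stretches the equal-width range so that almost every value lands in the first bin, while equal-frequency binning spreads the values evenly. Either behavior changes the conditional probability tables that the classifier estimates.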

Keywords (Chinese)

★ Classification method
★ Bayesian classification
★ Principal component analysis

Keywords (English)

★ Classification
★ Naïve Bayesian Classifier
★ Principal Components Analysis

Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Background and Motivation
1-2 Research Objectives and Framework
Chapter 2 Literature Review
2-1 Classification
2-2 Naïve Bayesian Classifier
2-3 Principal Components Analysis
Chapter 3 Methodology
Chapter 4 Numerical Example
4-1 The “Glass (1987)” data problem
4-2 The “Pima Indians Diabetes (1990)” data problem
4-3 Accuracy of different dimensions
4-4 Different discretization methods with data
4-5 Different settings of attribute numbers in Glass data
Chapter 5 Conclusion and Future Research
References

Advisor

Fu-Shiang Tseng (曾富祥)

Approval Date

2015-7-6
