Master's/doctoral thesis 984203045: detailed record




Name Chi-yuan Chu (朱啟源)   Department Information Management
Thesis title A Study of Data Preprocessing: The Case of Genetic Algorithms
(Feature and Instance Selection Using Genetic Algorithms: An Empirical Study)
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. Once open access takes effect, the electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Feature selection and instance selection are two important data preprocessing techniques in data mining. Given a dataset, feature selection removes irrelevant or redundant features, while instance selection discards duplicate or erroneous samples. Genetic algorithms have been the most widely used algorithms for these preprocessing tasks in past studies. However, the two techniques have usually been examined separately, so it remains unclear how performing feature and instance selection together differs, in performance and results, from performing either one alone. The aim of this study is therefore to perform feature selection and instance selection with genetic algorithms and to examine how the ordering of the two preprocessing steps affects classification performance on datasets from different domains. The experiments are based on four large-scale and four small-scale datasets from different domains, evaluated with classifiers such as support vector machines and k-nearest neighbor. The eight datasets differ in both feature dimensionality and number of samples, so that the approach can be applied not only to datasets from different domains but also to datasets that differ greatly in scale. Beyond comparing the different preprocessing schemes, this study further analyzes the characteristics of the datasets in order to determine, in terms of both classification accuracy and computational time, which preprocessing method suits which kind of dataset. By identifying such rules and guidelines, datasets from different domains can achieve better performance in classification accuracy or in experimental efficiency.
Abstract (English) Feature selection and instance selection are two important data preprocessing steps in data mining: the former aims at removing irrelevant and/or redundant features from a given dataset, and the latter at discarding faulty data. In particular, genetic algorithms have been widely used for these tasks in related studies. However, these two preprocessing tasks are generally considered separately in the literature, and the performance differences between performing feature and instance selection together and performing either one individually are unknown. Therefore, the aim of this thesis is to perform feature selection and instance selection based on genetic algorithms, using different orderings, and to examine the classification performance over datasets from different domains. Experimental results based on four small-scale and four large-scale datasets, containing various numbers of features and data samples, show that performing both feature and instance selection usually makes the classifiers (i.e., support vector machines and k-nearest neighbor) perform slightly worse than feature selection or instance selection alone. However, while there is no significant difference in classification accuracy between these preprocessing methods, the combination of feature and instance selection greatly reduces the computational cost of training the classifiers compared with either method alone. Considering both classification effectiveness and efficiency, performing feature and instance selection together is the optimal choice for data preprocessing in data mining.
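The GA-based wrapper approach the abstract describes (a binary chromosome marks which features to keep, and a classifier's accuracy serves as the fitness function) can be sketched in a few lines. This is an illustrative toy, not the thesis's actual implementation: the synthetic dataset, the leave-one-out 1-NN fitness, and every GA parameter (population size, tournament selection, elitism, crossover and mutation rates) are assumptions chosen for the sketch.

```python
import random

def knn_accuracy(X, y, mask):
    """Leave-one-out 1-nearest-neighbor accuracy using only features where mask == 1."""
    idx = [i for i, m in enumerate(mask) if m]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            # squared Euclidean distance restricted to the selected features
            d = sum((X[i][f] - X[j][f]) ** 2 for f in idx)
            if d < best_d:
                best_j, best_d = j, d
        correct += (y[best_j] == y[i])
    return correct / len(X)

def ga_feature_selection(X, y, pop_size=20, generations=20,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; fitness is leave-one-out 1-NN accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda c: knn_accuracy(X, y, c)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [list(pop[0]), list(pop[1])]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = list(p1)
            if rng.random() < crossover_rate:  # one-point crossover
                cut = rng.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [b ^ (rng.random() < mutation_rate) for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy data: feature 0 determines the class, features 1-4 are pure noise.
data_rng = random.Random(42)
X = [[i / 10.0] + [data_rng.random() for _ in range(4)] for i in range(10)]
y = [0 if row[0] < 0.5 else 1 for row in X]
mask = ga_feature_selection(X, y)
```

Instance selection works the same way with the chromosome masking rows instead of columns, and a combined scheme can concatenate both masks into a single chromosome.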
Keywords ★ data mining
★ feature selection
★ instance selection
★ genetic algorithms
Table of contents Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 6
1.4 Research Procedure 8
Chapter 2 Literature Review 10
2.1 Feature Selection 10
2.2 Instance Selection 12
2.3 Genetic Algorithms 13
Chapter 3 Research Method 18
3.1 Datasets 18
3.2 Data Preprocessing: Feature Selection as an Example 19
3.3 Experimental Procedure 22
3.4 Parameter Settings of the Genetic Algorithm 25
3.5 Classifier Design 25
Chapter 4 Experimental Results 30
4.1 Results on the Small-Scale Datasets 30
4.2 Results on the Large-Scale Datasets 32
4.3 Comparison of Computational Cost 36
4.4 Suggestions from the Experimental Results 42
Chapter 5 Conclusions and Suggestions 44
5.1 Conclusions 44
5.2 Future Work and Suggestions 46
References 47
References
Chinese:
洪振富, 2010. A study of distance-based features for automatic data classification, master's thesis, National Central University.
謝欣宏, 2002. A study of train-driver scheduling and rostering for the Taiwan Railways solved with genetic algorithms, master's thesis, National Chiao Tung University.
English:
D.R. Wilson, T.R. Martinez, 2000. Reduction techniques for instance-based learning algorithms, Machine Learning, Vol. 38, No. 3, pp. 257-286.
I. Bose, R.K. Mahapatra, 2001. Business data mining: a machine learning perspective, Information & Management, Vol. 39, No. 3, pp. 221-225.
U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, 1996. Advances in knowledge discovery and data mining, The MIT Press.
J. Han, M. Kamber, 2000. Data mining: concepts and techniques. Morgan Kaufmann.
S.F. Crone, S. Lessmann, R. Stahlbock, 2006. The impact of preprocessing on data mining: an evaluation of classifier sensitivity in direct marketing, European Journal of Operational Research, Vol. 173, No. 3, pp. 781-800.
C.C. Aggarwal, P.S. Yu, 2001. Outlier detection for high dimensional data, in Proc. ACM SIGMOD Int. Conf. Management of Data, Santa Barbara, California, pp. 37-46.
V. Barnett, T. Lewis, 1994. Outliers in statistical data. John Wiley & Sons, New York.
T. Reinartz, 2002. A unifying view on instance selection, Data Mining and Knowledge Discovery, Vol. 6, No. 2, pp. 191-210.
J. Yang, S. Olafsson, 2006. Optimization-based feature selection with adaptive instance sampling, Computers & Operations Research, Vol. 33, No. 11, pp. 3088-3106.
J. Li, M.T. Manry, P.L. Narasimha, C. Yu, 2006. Feature selection using a piecewise linear network, IEEE Transactions on Neural Networks, Vol. 17, No. 5, pp. 1101-1115.
I. Guyon, A. Elisseeff, 2003. An introduction to variable and feature selection, Journal of Machine Learning Research, Vol. 3, pp. 1157-1182.
S. Gunal, R. Edizkan, 2008. Subspace based feature selection for pattern recognition, Information Sciences, Vol. 178, pp. 3716-3726.
A. Kuri-Morales, F. Rodríguez-Erazo, 2009. A search space reduction methodology for data mining in large databases, Engineering Applications of Artificial Intelligence, Vol. 22, pp. 57-65.
S. Piramuthu, 2004. Evaluating feature selection methods for learning in data mining applications, European Journal of Operational Research, Vol. 156, pp. 483-494.
C.-F. Tsai, 2009. Feature selection in bankruptcy prediction, Knowledge-Based Systems, Vol. 22, No. 2, pp. 120-127.
J.-S. Wang, J.-C. Chiang, 2008. A cluster validity measure with outlier detection for support vector clustering, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, Vol. 38, No. 1, pp. 78-89.
D. Fragoudis, D. Meretakis, S. Likothanassis, 2002. Integrating feature and instance selection for text classification, in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pp. 501-506.
J.T. De Souza, R.A.F. Do Carmo, G. Augusto, L. De Campos, 2008. A novel approach for integrating feature and instance selection, in Proc. Int. Conf. Machine Learning and Cybernetics, pp. 374-379.
J. Derrac, S. Garcia, F. Herrera, 2010. A survey on evolutionary instance selection and generation, International Journal of Applied Metaheuristic Computing, Vol. 1, No. 1, pp. 60-92.
M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, A.K. Jain, 2000. Dimensionality reduction using genetic algorithms, IEEE Transactions on Evolutionary Computation, Vol. 4, No. 2, pp. 164-171.
J.R. Cano, F. Herrera, M. Lozano, 2003. Using evolutionary algorithms as instance selection for data reduction: an experimental study, IEEE Transactions on Evolutionary Computation, Vol. 7, No. 6, pp. 561-575.
M. Kudo, J. Sklansky, 2000. Comparison of algorithms that select features for pattern classifiers, Pattern Recognition, Vol. 33, pp. 25-41.
W.B. Powell, 2007. Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience.
M. Dash, H. Liu, 1997. Feature selection methods for classifications, Intelligent Data Analysis, Vol. 1, No. 3, pp. 131-156.
U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, 1996. From data mining to knowledge discovery in databases, AI Magazine, Vol. 17, No. 3, pp. 37-54.
A. Ghoting, S. Parthasarathy, M.E. Otey, 2008. Fast mining of distance-based outliers in high-dimensional datasets, Data Mining and Knowledge Discovery, Vol. 16, pp. 349-364.
J. Derrac, S. Garcia, F. Herrera, 2010. IFS-CoCo: instance and feature selection based on cooperative coevolution with nearest neighbor rule, Pattern Recognition, Vol. 43, pp. 2082-2105.
J.-F. Ramirez-Cruz, V. Alarcon-Aquino, O. Fuentes, L. Garcia-Banuelos, 2006. Instance Selection and Feature Weighting Using Evolutionary Algorithms, in Proc. Int. Conf. Computing, pp. 73-79.
F. Ros, S. Guillaume, M. Pintore, J.R. Chretien, 2008. Hybrid genetic algorithm for dual selection, Pattern Analysis and Applications, Vol. 11, pp. 179-198.
H. Ahn, K.-J. Kim, 2009. Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach, Applied Soft Computing, Vol. 9, No. 2, pp. 599-607.
J.J. Grefenstette, 1986. Optimization of control parameters of genetic algorithms, IEEE Transactions on Systems, Man and Cybernetics, Vol. 16, No. 1, pp. 122-128.
S.-Y. Ho, C.-C. Liu, S. Liu, 2002. Design of an optimal nearest neighbor classifier using an intelligent genetic algorithm, Pattern Recognition Letters, Vol. 23, pp. 1495-1503.
K.J. Kim, I. Han, 2000. Genetic algorithm approach to feature discretization in artificial neural network for the prediction of stock price index, Expert Systems with Applications, Vol. 19, No. 2, pp. 125-132.
L.I. Kuncheva, L.C. Jain, 1999. Nearest neighbor classifier : simultaneous editing and feature selection, Pattern Recognition Letters, Vol. 20, pp. 1149-1156.
H. Byun, S.-W. Lee, 2003. A survey on pattern recognition applications of support vector machines, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 17, No. 3, pp. 459-486.
H. Liu, H. Motoda, 2002. On issues of instance selection, Data Mining and Knowledge Discovery, Vol. 6, pp. 115-130.
N. Jankowski, M. Grochowski, 2004. Comparison of instances selection algorithms I: algorithms survey, in Proc. Int. Conf. Artificial Intelligence and Soft Computing, pp. 598-603.
M. Grochowski, N. Jankowski, 2004. Comparison of instances selection algorithms II: results and comments, in Proc. Int. Conf. Artificial Intelligence and Soft Computing, pp. 580-585.
D.E. Goldberg, 1989. Genetic algorithms in search, optimization, and machine learning, Addison-Wesley.
P.G. Espejo, S. Ventura, F. Herrera, 2010. A survey on the application of genetic programming to classification, IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, Vol. 40, No. 2, pp. 121-144.
D.L. Wilson, 1972. Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man and Cybernetics, Vol. 2, pp. 408-421.
R.L. Haupt, S.E. Haupt, 1998. Practical genetic algorithms, Wiley, New York.
M. Gen, R. Cheng, 2000. Genetic algorithms and engineering optimization, John Wiley & Sons.
C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167.
B. Schölkopf, C.J.C. Burges, A.J. Smola (eds.), 1999. Advances in kernel methods: support vector learning, The MIT Press, Cambridge, MA.
R. Kohavi, 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection, in Proc. Int. Joint Conf. Artificial Intelligence, Vol. 2, pp. 1137-1145.
R. Sikora, S. Piramuthu, 2007. Framework for efficient feature selection in genetic algorithm based data mining, European Journal of Operational Research, Vol. 180, No. 2, pp. 723-737.
Advisors Chih-fong Tsai (蔡志豐), Chun-shien Li (李俊賢)
Date of approval 2011-7-20
