Apply Genetic Algorithms to Discretization (應用遺傳演算法於離散化連續性屬性之研究)

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/13158

Title:	Apply Genetic Algorithms to Discretization (應用遺傳演算法於離散化連續性屬性之研究)
Author:	邱獻良;Hsien-Lian Chiu
Contributors:	Graduate Institute of Information Management
Keywords:	rule induction;rough set theory;genetic algorithm;discretization
Date:	2005-06-23
Uploaded:	2009-09-22 15:25:03 (UTC+8)
Publisher:	National Central University Library
Abstract:	Discretization of continuous attributes is one of the main problems to be solved in data mining. It can be viewed as the problem of selecting a set of cut points for the attributes. Most past studies concentrated on finding a minimal set of cut points while preserving the consistency of the original data. However, maintaining too high a level of consistency may lead a classification algorithm to induce too many rules that generalize poorly. Discretization should therefore take generality into account as well as consistency, because general rules are usually easy to understand and interpret. This study proposes a genetic-algorithm-based discretization approach whose aim is to efficiently find an optimal compromise set of cut points between the generality and consistency criteria. Two sets of experiments were conducted on data sets from the UCI (University of California, Irvine) Machine Learning Repository. The empirical results show that the approach can generate a simplified discretization according to the user's requirements and can help a classifier induce general rules with high predictive accuracy.
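The abstract does not give the chromosome encoding, the fitness function, or the genetic operators, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes a bit-mask chromosome over candidate cut points (midpoints between consecutive distinct values), a fitness that is a weighted sum of consistency and a simplicity term used here as a stand-in for generality, and a plain GA with tournament selection, uniform crossover, bit-flip mutation, and elitism. All function names and the weight `w` are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def candidate_cuts(rows):
    """Midpoints between consecutive distinct values, as (attribute, value) pairs."""
    cuts = []
    for a in range(len(rows[0])):
        vals = sorted({r[a] for r in rows})
        cuts += [(a, (lo + hi) / 2) for lo, hi in zip(vals, vals[1:])]
    return cuts


def discretize(row, cuts, mask):
    """Map a row to a tuple of interval indices under the selected cuts."""
    per_attr = defaultdict(list)
    for (a, v), bit in zip(cuts, mask):
        if bit:
            per_attr[a].append(v)  # stays sorted: cuts are generated in order
    return tuple(bisect_right(per_attr[a], x) for a, x in enumerate(row))


def consistency(rows, labels, cuts, mask):
    """Fraction of rows that agree with the majority class of their cell."""
    cells = defaultdict(Counter)
    for r, y in zip(rows, labels):
        cells[discretize(r, cuts, mask)][y] += 1
    return sum(max(c.values()) for c in cells.values()) / len(rows)


def fitness(rows, labels, cuts, mask, w=0.7):
    """Assumed trade-off: w * consistency + (1 - w) * simplicity."""
    simplicity = 1 - sum(mask) / max(len(mask), 1)  # fewer cuts = simpler
    return w * consistency(rows, labels, cuts, mask) + (1 - w) * simplicity


def evolve(rows, labels, w=0.7, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Plain GA over bit masks; returns (selected cuts, best fitness)."""
    rng = random.Random(seed)
    cuts = candidate_cuts(rows)

    def score(mask):
        return fitness(rows, labels, cuts, mask, w)

    pop = [[rng.randint(0, 1) for _ in cuts] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # tournament of size 2
            p2 = max(rng.sample(pop, 2), key=score)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return [c for c, bit in zip(cuts, best) if bit], score(best)
```

Raising `w` favors consistency and tends to keep more cut points; lowering it favors fewer cuts and coarser intervals, which is one way to expose the trade-off to the decision maker.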
Appears in Collections:	[Graduate Institute of Information Management] Theses & Dissertations

All items in NCUIR are protected by the original copyright.
