dc.description.abstract | Discretization and feature selection are essential preprocessing techniques in many data mining and knowledge discovery tasks. The main goal of discretization is to transform quantitative data into qualitative data; the main goal of feature selection is to select relevant attributes, reducing the model by finding an optimal attribute subset according to chosen evaluation measures, in order to achieve better accuracy or efficiency. After such preprocessing, the data can be regarded as a simplified and concise representation of the information, which can be used by many classification algorithms.
Distributed Decision Generation Based on Granular Computing (DGAGC), a classification algorithm recently developed by National Central University, Taiwan, has an excellent recognition rate on categorical datasets. However, its recognition rate on numerical data is relatively low. To improve DGAGC, we propose a new algorithm for discretizing numerical data. First, the proposed algorithm decides whether each attribute should be treated as categorical or numerical in DGAGC. Second, the numerical data are preprocessed by the SC algorithm, a classification algorithm for numerical data also recently developed by National Central University, Taiwan. Third, the SC algorithm transforms the numerical data into the corresponding categorical data. Finally, the transformed data, together with the untransformed categorical data, are passed to DGAGC for classification. We compare the proposed classification algorithm with other well-known classification algorithms on datasets from the UCI and KEEL repositories. The results show that the proposed algorithm improves DGAGC's handling of numerical data and achieves a relatively good recognition rate compared with existing popular classification algorithms. | en_US |