dc.description.abstract | As data mining techniques have been explored extensively, incorporating discovered knowledge into business operations can yield a competitive advantage. Most current data mining techniques are designed to work on transformed data files: the raw data tables must be converted into specific formats before mining methods can be applied, and previous work has pointed out that such data transformation usually consumes considerable resources. Therefore, this study proposes new methods that integrate mining algorithms directly with enterprise transaction databases.
In this study, two methods are proposed to discover knowledge from the raw data of enterprise systems. The first, named FPN, is developed to mine frequent patterns from transaction tables. Traditionally, data mining techniques have seldom been applied in real time. In many cases, however, decisions have to be made in a short time; for example, decisions on promoting fresh agricultural goods in retail stores must be made daily, within one or two hours. The FPN method therefore offers the following advantages for supporting real-time mining in enterprise systems: (i) the raw data of enterprise systems are used directly; (ii) when the threshold is tuned, only newly qualified data are read and the data structure built for the original data is kept intact; (iii) product assortments centered on a particular product can be performed effectively; and (iv) the mining algorithm performs better than popular mining algorithms.
The second method, Char, is proposed to find characteristics from database tables, such as customer tables or product tables. In contrast to traditional data generalization or induction methods, Char does not need a concept tree in advance and can generate a manual set of characteristic rules that are precise enough to describe the main characteristics of the data. The simulation results show that the characteristic rules found by Char are efficient as well as consistent, regardless of the number of records and attributes in the dataset. | en_US |