dc.description.abstract | In this dissertation, we focus on how to devise an efficient and effective algorithm for discovering inter-transaction associations such as, periodic patterns, frequent continuities, frequent episodes and sequential pattern. Firstly, we propose a 3-phase FITS model in inter-transaction association mining. We adopt both horizontal and vertical formats to increase the mining efficiency. Furthermore, we focus on the application of FITS to closed pattern mining to reduce the number of patterns to be enumerated. The insight is “If an intra-transaction pattern is not a closed pattern, it will not be a closed frequent inter-transaction pattern”. The bi-format and bi-phase reduction are applied to overcome the problem of the duplicate item extensions especially for closed pattern mining. We have applied the FITS model to all inter-transaction mining tasks with a little modification.
Although the FITS model can be used for periodic pattern mining, it is not efficient enough since the constraints on periodicy are not fully utilized. Therefore, we propose a more general model, SMCA, to mine asynchronous periodic patterns from a complex sequence and correct some problem of the previous works. A 4-phase algorithm, including SPMiner, MPMiner, CPMiner and APMiner, is devised to discover periodic patterns from a transactional database presented in vertical format. The essential idea of SPMiner is to trace the possible segments for period p by a hash table. Besides, to avoid additional scans over the transactional database, we propose a segment-based combination to reduce redundant generation and testing. The experiments have demonstrated good performance of the proposed model on several inter-transaction patterns. Although the efficiency improvement is based on the requirement of additional memory cost, the memory cost can be further reduced by disk-based or partition-based approaches, which in turn also prove to be better than state-of-the-art algorithms. In summary, the proposed model can be orders of magnitude faster than previous works with a modest memory cost. | en_US |