序列樣式探勘之研究; The Research of Mining Frequent Sequential Patterns

Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/12910

Title:	序列樣式探勘之研究;The Research of Mining Frequent Sequential Patterns
Authors:	陳仕昇;Shih-Sheng Chen
Contributors:	Graduate Institute of Information Management
Keywords:	data mining;sequential pattern;sequential data;frequent pattern
Date:	2003-06-30
Issue Date:	2009-09-22 15:18:18 (UTC+8)
Publisher:	National Central University Library
Abstract:	Among the many kinds of data, ordered sequence data is an important research topic with broad applications in both science and business: in science, the study of DNA sequences; in business, the analysis of users' browsing behavior on shopping websites. Data mining techniques can extract frequent sequential patterns from sequence data for users or decision makers to put to various uses. In this dissertation, we further subdivide the sequential patterns studied in prior work into fixed patterns and variable patterns, so that users or decision makers can better understand the knowledge and rules hidden in large volumes of data. Besides distinguishing these kinds of sequential patterns, the proposed algorithm runs no slower than PrefixSpan, which is currently a very fast algorithm. We also propose sampling-based algorithms for mining general sequential patterns and continuous sequential patterns, respectively. These algorithms have three advantages: first, they can handle large volumes of data, as Apriori-like algorithms can; second, they are as efficient as pattern-growth-like algorithms; third, they can be combined with existing algorithms for mining general and continuous sequential patterns, and likewise with the hybrid-pattern mining algorithm proposed in this dissertation. The work applies mainly to data with sequential properties, for example market segmentation by consumer behavior in marketing, or the maintenance of web pages and system performance on websites, and gives users a reference for analysis and decision making.
Mining sequential patterns in databases is an important issue with many applications in commercial and scientific domains. For example, finding patterns in DNA sequences and analyzing users’ web site browsing patterns can help to discover important knowledge about genetic evolution and consumer behavior, respectively. Existing studies on finding sequential patterns fall into two categories, namely continuous and discontinuous patterns. In the first category, patterns are composed of consecutive elements of a sequence. In the second category, patterns can be composed of elements that are separated by wild cards, which can denote zero or more than one element. Although many studies have been published on finding one kind of pattern or the other, none can find both. Nor can they find discontinuous patterns formed of several continuous sub-patterns. The dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine them. The algorithm is as fast as PrefixSpan for mining sequential patterns. Algorithms such as PrefixSpan reach full speed only when the data volume is small enough to fit in main memory. In the dissertation, we also propose a sampling-based approach to find discontinuous patterns and continuous patterns. This approach has three advantages. First, like Apriori-like algorithms, it can mine frequent patterns from huge data sets, but it does not need to scan the database many times. Second, it is as efficient as pattern-growth algorithms such as PrefixSpan, and it does not need to compress the database into memory. Third, it can work with any known algorithm for mining discontinuous or continuous patterns. The algorithms developed in the dissertation are important because they can be applied to mine knowledge from sequential data, which are generated often in our daily life.
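The three pattern kinds named in the abstract can be illustrated with a small Python sketch. It is not the dissertation's algorithm: it only tests whether a given pattern occurs in a sequence and computes its support. The function names, the toy database, and the representation of a hybrid pattern as a list of continuous segments separated by gaps are all assumptions made for illustration, as is the simplification that a gap may skip any number of elements, including none.

```python
from typing import Callable, List, Sequence


def find_continuous(seq: Sequence, pat: Sequence, start: int = 0) -> int:
    """Return the index just past the first contiguous occurrence of pat
    in seq at or after start, or -1 if there is none."""
    n, m = len(seq), len(pat)
    for i in range(start, n - m + 1):
        if list(seq[i:i + m]) == list(pat):
            return i + m
    return -1


def contains_continuous(seq: Sequence, pat: Sequence) -> bool:
    """Continuous pattern: the elements of pat appear consecutively in seq."""
    return find_continuous(seq, pat) >= 0


def contains_discontinuous(seq: Sequence, pat: Sequence) -> bool:
    """Discontinuous pattern: the elements of pat appear in order,
    with gaps allowed between them (assumed: any gap length)."""
    it = iter(seq)
    return all(x in it for x in pat)  # each `in` consumes the iterator


def contains_hybrid(seq: Sequence, segments: List[Sequence]) -> bool:
    """Hybrid pattern (assumed representation): continuous segments that
    appear in order, with gaps allowed between segments. Taking the
    leftmost match of each segment leaves the most room for later ones."""
    pos = 0
    for seg in segments:
        pos = find_continuous(seq, seg, pos)
        if pos < 0:
            return False
    return True


def support(db: List[Sequence], matches: Callable[[Sequence], bool]) -> float:
    """Fraction of sequences in db that contain the pattern."""
    return sum(1 for s in db if matches(s)) / len(db)


# Toy database (hypothetical); each string is one sequence of items.
db = ["abcd", "abxcd", "axbcd", "acbd"]

print(support(db, lambda s: contains_continuous(s, "abcd")))        # 0.25
print(support(db, lambda s: contains_discontinuous(s, "abcd")))     # 0.75
print(support(db, lambda s: contains_hybrid(s, ["ab", "cd"])))      # 0.5
```

In this toy data the hybrid pattern "ab", gap, "cd" sits between the two classical kinds: it is stricter than the discontinuous pattern (it rejects "axbcd") and looser than the continuous one (it accepts "abxcd").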
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation
