Mining Sequential Patterns with Multiple Minimum Supports

Name

Chia-Sheng Lin (林佳生)

Department

Department of Information Management

Thesis Title

Mining Sequential Patterns with Multiple Minimum Supports
(利用多重門檻值挖掘序列規則)

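This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.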

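Abstract (Chinese)

Sequential patterns have become increasingly important in recent years. Traditional methods for mining sequential patterns are all built on the same approach: a single threshold is used to find the sequential patterns whose occurrence counts exceed it. Using a single threshold implies that all items have similar characteristics, or that they occur in the database with similar frequencies, which is seldom the case in the real world.
In this thesis, we first extend the traditional single-threshold model to mine sequential patterns that satisfy different thresholds, and then design an algorithm called MS-PrefixSpan. The main idea of MS-PrefixSpan is to use a conditional minimum support as the threshold for filtering the items in a projected database: if an item's occurrence count in the projected database exceeds the threshold, the item is treated as a candidate length-1 sequential pattern. The conditional minimum support is adjusted gradually for each projected database so that it reflects the actual minimum support of each maximal sequential pattern. In addition, to establish that MS-PrefixSpan finds exactly the maximal sequential patterns, we provide a theorem that shows its correctness. Finally, our experimental results show that MS-PrefixSpan does substantially reduce both the execution time and the number of sequential patterns produced.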

Abstract (English)

Sequential pattern mining has become increasingly important in recent years. Traditional sequential pattern mining algorithms share the same model: they find all sequential patterns that satisfy a single user-specified minimum support. However, using a single minimum support implies that all items in the data are of the same nature or have similar frequencies in the database, which is often not the case in real-life applications.
In this thesis, we first extend the traditional model, which applies one minimum support to all sequential patterns, to a model with multiple item supports. Second, we develop an effective algorithm called MS-PrefixSpan. Its general idea is to use a conditional minimum support as the threshold that qualifies items in each projected database as candidate length-1 sequential patterns. The conditional minimum support is gradually adjusted for each projected database to reflect the actual minimum support of each maximal sequential pattern. To establish that MS-PrefixSpan finds all, and only, the maximal sequential patterns that satisfy their own MSSP, we also provide a theorem proving its correctness. Third, our experimental results show that MS-PrefixSpan can substantially reduce both the execution time and the number of sequential patterns produced.
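The idea of filtering each projected database with a conditional threshold can be illustrated with a minimal sketch. It is not the thesis's MS-PrefixSpan algorithm, and it rests on several labeled assumptions: every sequence element is a single item, each item has its own minimum item support (MIS) given as an absolute count, a pattern's required support is the smallest MIS among its items, and the conditional threshold is taken as the smallest MIS among the prefix items and the items still present in the projected database. Maximality filtering is omitted, and the names `mine_multi_support`, `db`, and `mis` are hypothetical.

```python
from collections import Counter


def mine_multi_support(db, mis):
    """Return {pattern: support} for patterns meeting their own threshold.

    Assumption: a pattern's threshold is the smallest MIS of its items.
    db  -- list of sequences, each a tuple of single items
    mis -- dict mapping item -> minimum item support (absolute count)
    """
    results = {}

    def grow(prefix, prefix_min, projected):
        # Count the suffixes in which each item occurs; this equals the
        # support of prefix + (item,) in the original database.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        if not counts:
            return
        # Assumed conditional threshold: no extension of this prefix can
        # require less support than this value, so pruning below it is safe.
        cond = min([prefix_min] + [mis[item] for item in counts])
        for item, count in sorted(counts.items()):
            if count < cond:
                continue  # neither this pattern nor any extension can qualify
            pattern = prefix + (item,)
            pattern_min = min(prefix_min, mis[item])
            if count >= pattern_min:
                results[pattern] = count
            # Project: keep what follows the first occurrence of the item.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, pattern_min, next_db)

    grow((), float("inf"), [tuple(s) for s in db])
    return results


# Toy data: "a" and "b" are common, "c" is less common, "d" is rare.
db = [
    ("a", "b", "c"),
    ("a", "b"),
    ("a", "c", "b"),
    ("b", "d"),
    ("a", "b"),
]
mis = {"a": 4, "b": 4, "c": 2, "d": 1}

patterns = mine_multi_support(db, mis)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

On this toy database the sketch keeps `("b", "d")` with support 1 because the MIS of `d` is 1, while it rejects `("b", "c")`, which also has support 1, because the MIS of `c` is 2. A single threshold of 4 would lose every pattern containing `c` or `d`, and a single threshold of 1 would admit every subsequence that occurs at all.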

Keywords (Chinese)

★ Data Mining
★ Sequential Patterns
★ Multiple Minimum Supports

Keywords (English)

★ PrefixSpan
★ Multiple Minimum Supports
★ Data Mining
★ Sequential Patterns

Table of Contents

Abstract
Table of Contents
List of Illustrations
List of Tables
1. Introduction
2. Background
2.1 Mining sequential patterns
2.1.1 Problem statement
2.1.2 The concept of GSP
2.2 PrefixSpan
2.2.1 The concept of PrefixSpan
2.3 Mining association rules with multiple minimum supports
2.3.1 The extended model
2.3.2 The concept of MSapriori
3. Mining sequential patterns with multiple minimum supports
3.1 The extended model
3.2 Why not build on an Apriori-like algorithm?
3.3 MS-PrefixSpan
3.3.1 Definitions in MS-PrefixSpan
3.3.2 MS-PrefixSpan algorithm
3.3.3 Examples of MS-PrefixSpan
3.3.4 Correctness of MS-PrefixSpan
4. Experiment
4.1 Synthetic data generation and setting the MIS value of each item
4.2 Different betas with different minimum supports
4.3 BackCheck of MS-PrefixSpan
4.4 Scale-up
5. Conclusion
References

Advisor

Shi-Jen Lin (林熙禎)

Approval Date

2003-7-5
