序列樣式探勘之研究

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：13

、訪客IP：3.147.79.7

姓名

陳仕昇(Shih-Sheng Chen) 查詢紙本館藏

畢業系所

資訊管理學系

論文名稱

序列樣式探勘之研究
(The Research of Mining Frequent Sequential Patterns)

相關論文

★ 零售業商業智慧之探討	★ 有線電話通話異常偵測系統之建置
★ 資料探勘技術運用於在學成績與學測成果分析 -以高職餐飲管理科為例	★ 利用資料採礦技術提昇財富管理效益 -以個案銀行為主
★ 晶圓製造良率模式之評比與分析－以國內某DRAM廠為例	★ 商業智慧分析運用於學生成績之研究
★ 運用資料探勘技術建構國小高年級學生學業成就之預測模式	★ 應用資料探勘技術建立機車貸款風險評估模式之研究－以A公司為例
★ 績效指標評估研究應用於提升研發設計品質保證	★ 基於文字履歷及人格特質應用機械學習改善錄用品質
★ 以關係基因演算法為基礎之一般性架構解決包含限制處理之集合切割問題	★ 關聯式資料庫之廣義知識探勘
★ 考量屬性值取得延遲的決策樹建構	★ 從序列資料中找尋偏好圖的方法 - 應用於群體排名問題
★ 利用分割式分群演算法找共識群解群體決策問題	★ 以新奇的方法有序共識群應用於群體決策問題

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

[檢視]

[下載]

本電子論文使用權限為同意立即開放。
已達開放權限電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。
請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。

摘要(中)

在眾多的資料中，具有有序性的序列資料是一個重要的研究議題，不管是科學上及商業上皆有廣泛的運用，在科學上如DNA序列的研究；在商業上如分析購物網站上使用者的瀏覽行為。我們可利用資料探勘技術可從序列資料中挖掘出高頻序樣式，提供使用者或決策者作不同的用途。在本論文中，我們將先前所研究的循序樣式，再細分成固定樣式及變動樣式，讓使用者或決策者更能瞭解潛藏在大量資料中更多的知識及規則。我們所提出的演算法可將循序樣式作區分外，其執行的效率不比目前執行效率相當快的PrefixSpan差。論文中，我們亦提出以抽樣為基礎的演算法分別挖掘一般的循序樣式及連續型循序樣式，我們演算法的優點有三，一是可處理大量的資料如同Apriori-like演算法，二是有效率如同Pattern Growth-like演算法，三是可與目前挖掘一般的循序樣式及連續型循序樣式的演算法相結合，且可與本論文提出挖掘混合樣式演算法相同。本論文主要應用於具有序列性質的資料，如在行銷上的依消費者行為作市場區隔，網站上網頁及系統效率維護等，提供使用者作為分析及決策的參考。

摘要(英)

Mining sequential patterns in databases is an important issue with many applications on commercial and scientific domains. For example, finding the patterns of DNA sequences and analyzing users’ web site browsing patterns can help to discover important knowledge in genetic evolution and consumer behavior, respectively.
Existing studies on finding sequential patterns can be classified into two categories, namely continuous and discontinuous patterns. In the first category, patterns are composed of elements in consecutive sequences. In the second category, patterns can be composed by elements that are separated by wild cards, which can denote zero or more than one elements. Although many researches have been published to find either kind of the patterns, no one can find both of them. Neither can they find the discontinuous patterns formed of several continuous sub-patterns. The dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine hybrid patterns. The algorithm is as fast as PrefixSpan for mining sequential patterns.
Algorithms such as PrefixSpan require data volume to be small enough to fit in the main memory of machines to gain the full speed. In the dissertation, we also propose a sampling-based approach to find discontinuous patterns and continuous patterns. There are three advantages in this approach. First, it can mine frequent patterns from huge data as Apriori-like algorithms but need not to scan database many times. Second, it is as efficient as Pattern-growth algorithm like PrefixSpan and need not compress the database into the memory. Third, it can work with any known algorithm in mining discontinuous or continuous patterns.
The algorithms developed in the dissertation are important because they can be applied to mine knowledge from sequential data which are generated often in our daily life.

關鍵字(中)

★ 資料探勘
★ 序列樣式
★ 序列式資料

關鍵字(英)

★ frequent pattern
★ sequential pattern
★ data mining

論文目次

第１章簡介 1
第１節研究動機 1
第２節研究貢獻 3
第３節論文組織 3
第２章相關研究及問題定義 4
第１節循序樣式的研究 4
第２節挖掘高頻率循序樣式的主要方法 7
第３節挖掘循序樣式的問題 11
第３章挖掘混合循序樣式 17
第１節資料結構 17
第２節演算法 18
第３節改進的演算法 32
第４節產生序列規則 40
第５節實驗結果 42
第６節本章結語 49
第４章以抽樣為基礎挖掘循序樣式 50
第１節問題定義及演算法架構 50
第２節資料結構及演算法 53
第３節實驗結果 63
第４節本章結語 69
第５章以抽樣為基礎挖掘循序連續樣式 70
第１節問題定義及演算法架構 70
第２節資料結構及演算法 73
第３節實驗結果 81
第４節本章結語 87
第６章結論及未來發展 88
參考文獻 90

參考文獻

[1] R. Agrawal, C. Faloutsos, and A. Swami, Efficient similarity search in sequence databases, 4th International Conference on Foundations of Data Organization and Algorithms, 1993, pp. 69-84.
[2] R. Agrawal, D. Gunopulos, F. Leymann, Mining process models from workflow logs, Proceedings of the 6th International Conference on Extending Database Technology, 1998, pp. 469-483.
[3] R. Agrawal, K. Lin, H.S. Sawhney, and K. Shim, Fast similarity search in the presence of noise, scaling, and translation in time-series databases. Proceedings of the 21th International Conference on Very Large Data Bases, 1995, pp. 490-501.
[4] R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD International conference on Management of Data, 1993, pp. 207-216.
[5] R. Agrawal and R. Srikant, Fast algorithms for mining association rules. Proceedings of the 20th Internatioonal Conference on Very Large Data Bases, 1994, pp. 478-499.
[6] R. Agrawal and R. Srikant, Mining sequential patterns, Proceedings of the 7th International Conference on Data Engineering, 1995, pp. 3-14.
[7] R. Agrawal and J.C. Shafer. Parallel mining of association rules: design, implementation, and experience. IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996, pp. 962-969.
[8] A. Chidanand, L. Bing, Pednault, Edwin P D, Smyth, Padhraic C. Apté, B. Liu, E. P. D. Pednault, P. Smyth, Evolving data mining into solutions for insights - business applications of data mining, Communications of the ACM, 45(8), 2002, pp.49-53.
[9] D.J. Cook and L.B. Holder, Graph-based data mining, IEEE Intelligent Systems, 15(2), 2002, pp. 32-41.
[10] M.S. Chen, J. Han, and P.S. Yu, Data mining: an overview from a database perspective, IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996, pp. 866-883.
[11] R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems, 1(1), 1999, pp. 5-32.
[12] M.-S. Chen, J.S. Park, and P.S. Yu, Efficient data mining for path traversal patterns, IEEE Transactions on Knowledge and Data Engineering, 10(2), 1998, pp. 209-221.
[13] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, 1994, pp. 419-429.
[14] M. N. Garofalakis, R. Rastogi and K. Shim, Mining Sequential Pattern with Regular Expression Constraints, IEEE Transactions on Knowledge and Data Engineering, 14(3), 2002, pp. 530-552.
[15] M. N. Garofalakis, R. Rastogi and K. Shim, SPIRIT: Sequential Pattern Mining with Regular Expression Constraints, Proceedings of 25th International Conference on Very Large Data Bases, 1999, pp. 223-234.
[16] M. Greenberg and S. S. McDonald, Successful needs/benefits segmentation: a user’s guide, The Journal of Consumer Marketing, 6(3), 1989, pp. 29-36.
[17] J. Han, G. Dong, and Y. Yin, Efficient mining of partial periodic patterns in time series database, Proceedings of the 15th International Conference on Data Engineering, 1999, pp. 106-115.
[18] J. Han and Y. Fu, Mining multiple-level association rules in large databases, IEEE Transactions on Knowledge and Data Engineering, 11(5), 1999, pp. 798-804.
[19] J. Han, W. Gong, and Y. Yin, Mining segment-wise periodic patterns in time-related databases, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 214-218.
[20] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U Dayal, M. Hsu, FreeSpan: frequent pattern-projected sequential pattern mining, Proceedings of the 6th ACM SIGKDD international conference on Knowledge discovery and data mining, 2000, pp.355-359.
[21] M. Kuramochi and G. Karypis, Frequent subgraph discovery, Proceedings of the 2001 IEEE International Conference on Data Mining, 2001, pp. 313-320.
[22] M.-Y. Lin and S.-Y. Lee, Fast discovery of sequential patterns by memory indexing, Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, 2002, pp. 150-160.
[23] M.-Y. Lin, S.-Y. Lee and S.-S. Wang, DELISP: efficient discovery of generalized sequential patterns by delimited pattern-growth technology, Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2002, pp. 198-209.
[24] F. Masseglia, F. Cathala and P. Poncelet, The PSP approach for mining sequential patterns, Second European Symposium on Principles of Data Mining and Knowledge Discovery, 1998, pp.176-184.
[25] B. Mobasher, N. Jain, E. Han, and J. Srivastava, Web mining: pattern discovery from world wide web transactions, Technical Report TR96-050, Department of Computer Science, University of Minnesota, 1996.
[26] F. Masseglia, P. Poncelet and M.Teisseire, Incremental mining incremental mining of sequential patterns in large databases, Actes des 16imes Journes Bases de Donnes Avances, 2000.
[27] H. Mannila, H. Toivonen, and A.I. Verkamo, Discovery of frequent episodes in event sequences, Data Mining and Knowledge Discovery, 1(3), 1997, pp. 259-289.
[28] A. Nanopoulos and Y. Manolopoulos, Mining patterns from graph traversals, Data & Knowledge Engineering 37(3), 2001, pp. 243-266.
[29] J.S. Park, M.-S. Chen, and P.S. Yu, An effective hash based algorithm for mining association rules, Proceedings of the ACM SIGMOD international Conference on Management of Data, 1995, pp 175-186.
[30] J. Pei, J. Han, B. Mortazavi-Asl, and H. Zhu, Mining access pattern efficiently from web logs, Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000, pp. 396-407.
[31] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, Proceedings of the 17th International Conference on Data Engineering, 2001, pp. 215-224.
[32] S. Parthasarathy, M. J. Zaki, M. Ogihara and S. Dwarkadas, Incremental and Interactive Sequence Mining, 8th International Conference on Information and Knowledge Management, 1999, pp 251-258.
[33] R. Srikant and R. Agrawal, Mining generalized association rules, Proceedings of the 21th International Conference on Very Large Data Bases, 1995, pp. 407-419.
[34] R. Srikant and R. Agrawa, Mining sequential patterns: generalizations and performance improvements, Proceedings of the 5th International Conference on Extending Database Technology, 1996, pp. 3-17.
[35] M. Seno and G. Karypis, SLPMiner: An Algorithm for Finding Frequent Sequential Patterns Using Length Decreasing Support Constraint, Proceedings of the 2002 IEEE International Conference on Data Mining, 2002, pp. 418-425.
[36] H. Toivonen, Sampling Large Databases for Association Rules, Proceedings of 22th International Conference on Very Large Data Bases, 1996, pp. 134-145.
[37] M. S. Tsechansky, N. Pliskin, G. Rabinowitz, and A. Porath, Mining relational patterns from multiple relational tables, Decision Supports Systems, 27, 1999, pp. 177-195.
[38] R. E Valdes-Perez, Discovery tools for science apps, Communications of the ACM, 42(11), 1999, pp. 37-41.
[39] J.T. Wang, G.W. Chirn, T.G. Marr, B. Shapiro, D. Shasha, and K. Zhang, Combinatorial pattern discovery for scientific data: some preliminary results. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, 1994, pp. 115-125.
[40] C.-Y. Wang, T.-P. Hong and S.-S. Tseng, Maintenance of sequential patterns for record modification using Pre-large Sequences. IEEE International Conference on Data Mining, 2002, pp. 693-696.
[41] K. Wand and H. Liu Discovering structural associaton of semistructured data, IEEE Transactions on Knowledge and Data Engineering, 12(2), 2000, pp. 353-371.
[42] K. Wang and J. Tan, Incremental discovery of sequential patterns, The ACM-SIGMOD's 96 Data Mining Workshop: On Research Issues on Data Mining and Knowledge Discovery, 1996, pp. 95-102.
[43] Y. Xiao and M.H. Dunham, Efficient mining of traversal patterns, Data & Knowledge Engineering, 39, 2001, pp. 191-214.
[44] M.J. Zaki, Efficient enumeration of frequent sequences, Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, 1998, pp. 68-75.
[45] M.J. Zaki, N. Lesh, and M. Ogihara. PlanMine: sequence mining for plan failures. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 369-374.
[46] M. Zhang, B. Kao, D. W.-L. Cheung, C. L. Yip, Efficient algorithms for incremental update of frequent sequences, Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2002, pp. 186-197.
[47] M. Zhang, B. Kao, C. L. Yip, A comparison study on algorithms for incremental update of frequent sequences, Proceedings of the 2002 IEEE International Conference on Data Mining, 2002, pp. 554-561.
[48] M. J., Zaki, N. Lesh and M. Ogihara, PlanMine: predicting plan failures using sequence mining, Artificial Intelligence Review, 14(6), 2000, pp. 421-446.

指導教授

陳彥良(Yen-Liang Chen)

審核日期

2003-7-3

推文