Interval-based and Point-based Sequential Pattern Mining

Name

吳欣怡(Shin-Yi Wu)

Department

Department of Information Management

Thesis Title

區間式及點式序列樣式探勘
(Interval-based and Point-based Sequential Pattern Mining)

Related Theses

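★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining Techniques to Analyze School Grades and General Scholastic Ability Test Results: A Case Study of a Vocational High School Food and Beverage Management Department
★ Using Data Mining Techniques to Improve Wealth Management Performance: A Case Study of a Bank
★ Evaluation and Analysis of Wafer Manufacturing Yield Models: A Case Study of a Domestic DRAM Fab
★ A Study of Business Intelligence Analysis Applied to Student Grades
★ Using Data Mining Techniques to Build a Predictive Model of Academic Achievement for Upper-Grade Elementary School Students
★ Applying Data Mining Techniques to Build a Risk Assessment Model for Motorcycle Loans: A Case Study of Company A
★ Applying Performance Indicator Evaluation to Improve R&D Design Quality Assurance
★ Applying Machine Learning to Text Résumés and Personality Traits to Improve Hiring Quality
★ A General Framework Based on Relational Genetic Algorithms for Solving Set Partitioning Problems with Constraint Handling
★ Generalized Knowledge Discovery in Relational Databases
★ Decision Tree Construction Considering Delays in Acquiring Attribute Values
★ A Method for Finding Preference Graphs from Sequence Data, Applied to Group Ranking Problems
★ Using a Partitional Clustering Algorithm to Find Consensus Groups for Group Decision-Making Problems
★ A Novel Ordered Consensus Group Method Applied to Group Decision-Making Problems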

Files

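The author has authorized immediate open access to this electronic thesis.
Once open access takes effect, users are licensed to search, read, and print the full text only for personal, non-profit academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid legal liability.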

Abstract (Chinese)

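Data mining techniques can be applied in many fields, such as marketing analysis, decision support, fraud detection, and business management. Data mining research has developed many techniques for extracting useful information from large volumes of data, and sequential pattern mining is one of the most important. Previous sequential pattern mining techniques were mostly designed for point-based events; that is, in the sequence data they mine, every event occurs at a single point in time. In many applications, however, an event does not necessarily occur at a single time point but may last over a period of time; such an event is called an interval-based event. Finding frequent patterns in sequences composed of interval-based events is called interval-based sequential pattern mining. In still other applications, the events in a sequence are not all point-based or all interval-based; both kinds may occur. Such sequences are called hybrid event sequences, and finding frequent patterns in them is called hybrid sequential pattern mining. Because traditional sequential pattern mining methods cannot be used to mine interval-based or hybrid sequential patterns, this dissertation proposes two methods, one for mining interval-based sequential patterns and one for mining hybrid sequential patterns. A series of experiments on both synthetic and real data shows that both methods are efficient and effective.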
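To make the notion of a frequent sequential pattern concrete for the classic point-based case, the following minimal sketch counts the support of a pattern in a small sequence database, where each sequence is an ordered list of itemsets (events at the same time point). It is illustrative only: the function names, the helper `s`, and the toy database are hypothetical, and this is plain support counting, not the mining methods proposed in this dissertation.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if pattern is a subsequence of sequence: each pattern itemset
    must be a subset of a distinct sequence itemset, in the same order."""
    i = 0  # index of the next pattern itemset still to be matched
    for itemset in sequence:
        # Greedy earliest matching is sufficient for subsequence containment.
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(db: List[Sequence[Itemset]], pattern: Sequence[Itemset]) -> float:
    """Fraction of sequences in db that contain pattern."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


def s(*groups: str) -> List[Itemset]:
    """Hypothetical helper: build a sequence from strings such as 'ab', 'c'."""
    return [frozenset(g) for g in groups]


db = [s("a", "bc", "d"), s("ab", "d"), s("b", "a", "c"), s("a", "c", "d")]
pattern = s("a", "d")  # "a, followed later by d"
print(support(db, pattern))  # 0.75
```

With a minimum support of, say, 0.5, the pattern (a)(d) would count as frequent here because three of the four sequences contain it.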

Abstract (English)

Data mining is useful in various domains, such as market analysis, decision support, fraud detection, and business management. Many approaches have been proposed to extract useful information from data, and sequential pattern mining is one of the most important. Previous studies of sequential pattern mining have discovered patterns from point-based event sequences. However, in some applications, event sequences may contain interval-based events or hybrid events (a mixture of point-based and interval-based events). Frequent patterns discovered from interval-based event sequences are called temporal patterns, and those discovered from hybrid event sequences are called hybrid temporal patterns. Because existing sequential pattern mining methods cannot be applied to mine temporal patterns or hybrid temporal patterns, this study develops new methods for discovering both. Experiments on synthetic and real datasets verify that both proposed methods are efficient and effective.
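One common way to represent interval-based and hybrid sequences without losing information is to replace each interval event by its start and finish endpoints, sort all endpoints by time, and group endpoints that occur at the same instant; a point event then appears as a single marker. The sketch below assumes that encoding. The `Event` type, the `+`/`-` suffixes, and `to_endpoint_sequence` are hypothetical names, and this is not claimed to be the exact representation or data transformation used in this dissertation.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    label: str
    start: int
    end: Optional[int] = None  # None marks a point-based event (assumption)


def to_endpoint_sequence(events: List[Event]) -> List[Tuple[str, ...]]:
    """Encode events as time-ordered groups of endpoint markers.

    An interval event A contributes 'A+' at its start and 'A-' at its end;
    a point event C contributes a single 'C'. Markers sharing a timestamp
    are placed in the same group, sorted for a deterministic order.
    """
    points: List[Tuple[int, str]] = []
    for e in events:
        if e.end is None:
            points.append((e.start, e.label))
        else:
            if e.end < e.start:
                raise ValueError(f"interval {e.label} ends before it starts")
            points.append((e.start, e.label + "+"))
            points.append((e.end, e.label + "-"))
    points.sort()  # groupby needs the points ordered by timestamp
    return [tuple(sorted(marker for _, marker in group))
            for _, group in groupby(points, key=lambda p: p[0])]


overlaps = [Event("A", 1, 5), Event("B", 3, 8)]
meets = [Event("A", 1, 5), Event("B", 5, 8)]
hybrid = [Event("A", 1, 5), Event("C", 3)]
print(to_endpoint_sequence(overlaps))  # [('A+',), ('B+',), ('A-',), ('B-',)]
print(to_endpoint_sequence(meets))     # [('A+',), ('A-', 'B+'), ('B-',)]
print(to_endpoint_sequence(hybrid))    # [('A+',), ('C',), ('A-',)]
```

Because every endpoint is kept, relations such as "A overlaps B" and "A meets B" produce different encodings, so a pattern over these sequences cannot be confused between them. This simple sketch does not distinguish repeated occurrences of the same label, which a full representation would need to handle.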

Keywords (Chinese)

★ Sequential Patterns
★ Time-Interval Patterns
★ Hybrid Sequential Patterns
★ Data Mining
★ Interval-based Sequential Patterns

Keywords (English)

★ Data Mining
★ Sequential Patterns
★ Temporal Patterns
★ Interval-based Event Sequence
★ Hybrid Event Sequence

Table of Contents

ABSTRACT
CHINESE ABSTRACT
ACKNOWLEDGMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 APPLICATIONS OF TEMPORAL PATTERN MINING
1.2 APPLICATIONS OF HYBRID TEMPORAL PATTERN MINING
1.3 ORGANIZATION OF THIS DISSERTATION
CHAPTER 2 RELATED WORKS
2.1 BACKGROUND
2.2 DATA MINING RESEARCH
2.3 SEQUENTIAL PATTERN MINING RESEARCH
2.4 CLASSIC SEQUENTIAL PATTERN MINING METHODS
2.4.1 GSP
2.4.2 PrefixSpan
CHAPTER 3 TEMPORAL PATTERN MINING
3.1 MOTIVATION
3.2 NONAMBIGUOUS REPRESENTATION
3.2.1 Problem Definition
3.2.2 Why Our Format Is Unambiguous
3.3 ALGORITHM FOR MINING TEMPORAL PATTERNS
3.3.1 Data Transformation
3.3.2 The TPrefixSpan Algorithm
3.3.3 Correctness and Completeness
3.4 EXPERIMENTS
3.4.1 Performance Evaluation
3.4.2 Real Case Analyses
3.4.3 Predictive Accuracy
3.5 SUMMARY
CHAPTER 4 HYBRID TEMPORAL PATTERN MINING
4.1 PROBLEM DEFINITIONS
4.2 TEMPORAL RELATIONS BETWEEN HYBRID EVENTS
4.3 ALGORITHM FOR MINING HYBRID TEMPORAL PATTERNS
4.4 EXPERIMENTS
4.4.1 Performance Evaluation
4.4.2 Real Case Analyses
4.5 SUMMARY
CHAPTER 5 USAGE GUIDE
5.1 IN FINANCE DOMAIN
5.2 IN ELECTRONIC COMMERCE DOMAIN
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
REFERENCES

Advisor

陳彥良(Yen-Liang Chen)

Approval Date

2007-7-4
