HybridGMiner: The Mining Strategy on Frequent Isomorphism Graph Structure

Name

何承道 (Cheng-Tao Ho)

Department

Department of Computer Science and Information Engineering

Thesis Title

頻繁同構圖形探勘策略之研究
(HybridGMiner: The Mining Strategy on Frequent Isomorphism Graph Structure)

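This electronic thesis has been authorized for immediate open access.
Open-access electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.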

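Abstract (Chinese)

As techniques for mining frequent itemsets and sequential patterns have matured, it is natural to turn to a form of pattern mining that captures a broader range of relationships in data: graph mining. Graph mining has many applications. Well-known domains include chemistry, biology, and computer networks, along with any other real-world data that can be modeled as graph patterns; all of these domains need graph pattern mining techniques to support data analysis and prediction. The main challenge in graph mining is solving the subgraph/graph isomorphism problem. In this thesis, we propose an algorithm that combines a graph canonical form with an embedding structure to mine graph databases efficiently. The main idea is to use the canonical form to eliminate duplicate enumeration and to carefully record where each graph pattern occurs in the database (the embedding list), which completely avoids subgraph isomorphism checking. Experiments show that the proposed algorithm outperforms gSpan in mining efficiency on both synthetic and real data.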
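
The role of a canonical form in avoiding duplicate enumeration can be illustrated with a minimal sketch. This is not the canonical form used by HybridGMiner or gSpan; it is a brute-force stand-in that tries every vertex ordering and keeps the smallest code, so it is only practical for very small graphs. The names `make_edges` and `canonical_code`, and the use of an empty string to mark a missing edge (which assumes edge labels are non-empty strings), are assumptions made for this illustration.

```python
from itertools import permutations


def make_edges(edge_list):
    """Map each undirected edge {u, v} to its label."""
    return {frozenset((u, v)): label for u, v, label in edge_list}


def canonical_code(vertex_labels, edges):
    """Return the smallest code over all vertex orderings of a labeled graph.

    Two graphs receive the same code exactly when they are isomorphic,
    so a miner can detect that it has already enumerated a pattern.
    Brute force: the cost grows factorially with the number of vertices.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        labels = tuple(vertex_labels[v] for v in perm)
        # Upper triangle of the reordered adjacency matrix; "" means no edge.
        upper = tuple(
            edges.get(frozenset((perm[i], perm[j])), "")
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, upper)
        if best is None or code < best:
            best = code
    return best


# The same C-O-C pattern written with two different vertex numberings.
g1 = (["C", "O", "C"], make_edges([(0, 1, "s"), (1, 2, "d")]))
g2 = (["O", "C", "C"], make_edges([(0, 1, "d"), (0, 2, "s")]))
print(canonical_code(*g1) == canonical_code(*g2))  # prints True
```

Practical miners replace the exhaustive search with a code that can be built and checked incrementally during pattern growth; the sketch only shows why a single agreed-upon representative per isomorphism class removes duplicates.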

Abstract (English)

As the mining of frequent itemsets and sequential patterns has matured, it is natural to explore other kinds of patterns, such as graph structures. Graph mining has a wide range of applications, including chemistry, biology, and computer networks. The main challenge in graph mining is solving the graph/subgraph isomorphism problem. We therefore propose an algorithm, HybridGMiner, that combines earlier pattern mining techniques with graph mining techniques to mine all frequent subgraph patterns efficiently. The algorithm adopts a canonical form to avoid duplicate enumeration and uses an embedding list structure to avoid subgraph isomorphism checking entirely. Our empirical study on synthetic and real datasets shows that HybridGMiner achieves a substantial performance gain over the gSpan algorithm.
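
The embedding-list idea mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's actual data structure: the graph representation and the helper names `make_graph`, `single_vertex_embeddings`, `extend`, and `support` are hypothetical. Each embedding records which database graph a pattern occurs in and which data vertices its pattern vertices map to, so growing the pattern by one edge only requires scanning the neighbors of an already-matched vertex.

```python
from collections import defaultdict


def make_graph(vertex_labels, edge_list):
    """Build an undirected labeled graph with adjacency lists."""
    adj = defaultdict(list)
    for u, v, label in edge_list:
        adj[u].append((v, label))
        adj[v].append((u, label))
    return {"labels": vertex_labels, "adj": adj}


def single_vertex_embeddings(db, label):
    """Embedding list of the one-vertex pattern carrying `label`."""
    return [
        (gid, (v,))
        for gid, g in enumerate(db)
        for v, vertex_label in enumerate(g["labels"])
        if vertex_label == label
    ]


def extend(db, embeddings, from_pos, edge_label, new_label):
    """Grow the pattern by one edge from pattern vertex `from_pos`
    to a new vertex, reusing the recorded occurrences."""
    grown = []
    for gid, mapping in embeddings:
        g = db[gid]
        for v, label in g["adj"][mapping[from_pos]]:
            if (label == edge_label
                    and g["labels"][v] == new_label
                    and v not in mapping):
                grown.append((gid, mapping + (v,)))
    return grown


def support(embeddings):
    """Number of distinct database graphs containing the pattern."""
    return len({gid for gid, _ in embeddings})


db = [
    make_graph(["C", "O", "C"], [(0, 1, "s"), (1, 2, "s")]),
    make_graph(["C", "O"], [(0, 1, "s")]),
    make_graph(["C", "N"], [(0, 1, "s")]),
]
c_only = single_vertex_embeddings(db, "C")
c_o = extend(db, c_only, 0, "s", "O")
print(support(c_only), support(c_o))  # prints 3 2
```

Because support is read directly from the grown list, no separate subgraph isomorphism test is run against the database. The cost is the memory needed to store every occurrence, and this sketch does not handle extensions that close a cycle between two existing pattern vertices.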

Keywords (Chinese)

★ graph mining
★ pattern mining
★ graph isomorphism

Keywords (English)

★ pattern mining
★ graph isomorphism
★ graph structures
★ graph mining

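Table of Contents

Chapter 1 Introduction
1.1. Research Motivation and Objectives
1.2. Thesis Organization
Chapter 2 Problem Definition
Chapter 3 Related Work
3.1. Breadth-First Search (BFS) Algorithms
3.1.1. The AGM Algorithm
3.1.2. The FSG Algorithm
3.2. Depth-First Search (DFS) Algorithms
3.2.1. The gSpan Algorithm
3.2.2. The MoFa Algorithm
3.2.3. The FFSM Algorithm
3.2.4. The Gaston Algorithm
3.3. Overall Comparison of the Algorithms
Chapter 4 The HybridGMiner Algorithm
4.1. Challenges in Graph Mining
4.2. Architecture of the HybridGMiner Algorithm
4.2.1. Pattern Enumeration
4.2.2. Search Space Pruning
4.2.3. Graph Pattern Extension
4.3. Pseudo Code
Chapter 5 Experimental Results
5.1. Synthetic Data
5.1.1. Data Generator
5.1.2. Experimental Results and Analysis
5.2. Real-World Data
Chapter 6 Conclusion
References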

Advisor

張嘉惠 (Chia-Hui Chang)

Approval Date

2006-7-17
