Mining Frequent Tree Structures without Candidate Pattern Generation


Name

Jiun-Hung Tung (童俊宏)

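Department

Department of Computer Science and Information Engineering

Thesis Title

Mining Frequent Tree Structures without Candidate Pattern Generation
(MINT: Mining Frequent Rooted Induced Unordered Tree without Candidate Generation)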

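Related Theses

★ A Study on Recognizing Schedule Invitation Emails and Extracting Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of an Extraction System for Semi-structured Internet Data
★ Mining and Applications of Non-simple Traversal Paths
★ Improvements to Association Rule Mining on Incremental Data
★ Applying the Chi-square Test of Independence to Associative Classification
★ Design and Study of a Chinese Information Extraction System
★ Visualization of Non-numerical Data and Clustering that Combines Subjective and Objective Views
★ A Study of Associated Word Groups in Document Summarization
★ Cleaning Web Pages: Page Segmentation and Data Region Extraction
★ Design and Study of a Question Answering System Using Sentence Classification and Ranking
★ Efficient Mining of Compact Frequent Continuity Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates Applied to Cluster Visualization
★ A Study on Automatically Generating Web Crawlers from Browsing History
★ A Study of Template and Data Analysis for Dynamic Web Pages
★ A Study on Automating Data Integration across Homogeneous Web Pages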


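This electronic thesis is authorized for immediate open access.
Once the open-access date has been reached, the electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.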

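Abstract (Chinese)

Tree mining is an important problem in the field of data mining, with applications in web log analysis, bioinformatics, and semi-structured documents. However, previous work in this area first generates candidate patterns and then tests whether each one is frequent, discarding those that are not. This approach spends a great deal of time and space on candidate generation and testing. In this thesis, we therefore use the concept of local frequency to design an algorithm, which we call MINT, that mines "rooted", "induced", "unordered" tree patterns without generating candidates. We compare MINT with HybridTreeMiner on synthetic datasets produced by a data generator and on real web log data. The experimental results show that, even for a data type as complex as trees, searching for locally frequent items still achieves good performance.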
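
Because the target patterns are unordered, two trees that differ only in the order of siblings represent the same pattern, which is why a canonical form appears among the keywords. The sketch below shows one simple way to build such a form in Python. The nested `(label, children)` representation and the function name `canonical_form` are assumptions made for illustration; this is not the canonical form defined in the thesis.

```python
from collections import Counter

# Assumed representation: a node is (label, [child, child, ...]).

def canonical_form(node):
    """Return a nested tuple that is identical for trees equal up to sibling order."""
    label, children = node
    # Canonicalize each child first, then sort so sibling order no longer matters.
    return (label, tuple(sorted(canonical_form(child) for child in children)))

t1 = ("A", [("B", [("C", [])]), ("D", [])])
t2 = ("A", [("D", []), ("B", [("C", [])])])  # t1 with its siblings swapped
t3 = ("A", [("B", []), ("C", [])])           # a structurally different tree

# Group a small database by canonical form to count isomorphic trees.
counts = Counter(canonical_form(t) for t in (t1, t2, t3))
```

Here `t1` and `t2` map to the same key, so a miner that indexes patterns by canonical form counts them once rather than treating each sibling ordering as a separate pattern.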

Abstract (English)

Tree pattern mining is an important problem in data mining, with many emerging applications including web log analysis, bioinformatics, and semi-structured documents. However, most previous work follows a candidate-generation-and-test approach: candidate patterns are enumerated from shorter frequent patterns in the manner of Apriori. Because this approach spends a lot of time and space on candidate generation and testing, in this thesis we adopt the idea of pattern growth to mine frequent rooted induced unordered trees without candidate generation. In the performance study, we use synthetic datasets and real-world application datasets to compare our algorithm with HybridTreeMiner. The experiments show that our algorithm is efficient and cost-effective.
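
The contrast with candidate generation can be illustrated on a deliberately restricted pattern class. The Python sketch below grows only downward label paths (chains), not general induced unordered subtrees, and it is not the MINT algorithm; the tree representation, the function name `mine_frequent_paths`, and the toy database are all assumptions. At each step it scans the occurrences of the current pattern, counts which child labels are frequent at those occurrences, and extends the pattern only with those labels, so extensions that never occur in the data are never enumerated.

```python
from collections import defaultdict

# Assumed representation: a node is (label, [child, child, ...]).

def iter_nodes(node):
    """Yield every node of a tree in preorder."""
    yield node
    for child in node[1]:
        yield from iter_nodes(child)

def mine_frequent_paths(database, min_support):
    """Return {label_path: support} for downward paths found in >= min_support trees."""
    results = {}

    def grow(pattern, occurrences):
        # occurrences: (tree_id, node) pairs where the last label of pattern matched.
        extensions = defaultdict(list)
        for tree_id, node in occurrences:
            for child in node[1]:
                extensions[child[0]].append((tree_id, child))
        for label, occs in sorted(extensions.items()):
            # Support counts distinct trees, not individual occurrences.
            support = len({tree_id for tree_id, _ in occs})
            if support >= min_support:  # grow only with locally frequent labels
                new_pattern = pattern + (label,)
                results[new_pattern] = support
                grow(new_pattern, occs)

    # Length-1 patterns: every node of every tree is a possible starting point.
    starts = defaultdict(list)
    for tree_id, root in enumerate(database):
        for node in iter_nodes(root):
            starts[node[0]].append((tree_id, node))
    for label, occs in sorted(starts.items()):
        support = len({tree_id for tree_id, _ in occs})
        if support >= min_support:
            results[(label,)] = support
            grow((label,), occs)
    return results

database = [
    ("A", [("B", [("C", [])]), ("D", [])]),
    ("A", [("D", []), ("B", [("C", [])])]),
    ("A", [("B", [])]),
]
patterns = mine_frequent_paths(database, min_support=2)
```

With this toy database and `min_support=2`, the path `("A", "B", "C")` is reached by extending `("A", "B")` at its recorded occurrences, while `("A", "C")` is never counted because no `A` node has a `C` child.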

Keywords (Chinese)

★ subtree
★ canonical form
★ support
★ frequent
★ pattern

Keywords (English)

★ canonical form
★ subtree
★ pattern
★ frequent
★ support

Table of Contents

Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Motivation and Objectives
1.2. Thesis Organization
Chapter 2 Problem Definition
Chapter 3 Related Work
3.1. The Unot Algorithm
3.2. The uFreqt Algorithm
3.3. The HybridTreeMiner Algorithm
3.4. The RootedTreeMiner Algorithm
3.5. Comparison of Related Work
Chapter 4 Algorithm
4.1. Challenges in Tree Structure Mining
4.2. Algorithm Framework
4.2.1. Canonical Form
4.2.2. Pattern Enumeration
4.2.3. Label Range Computing for Extension Points
4.2.4. Tree Pattern Growth Mechanism (Extension)
4.3. The Algorithm
Chapter 5 Experimental Results
5.1. Synthetic Datasets
5.1.1. Description of the Data Generator
5.1.2. Experimental Analysis on Synthetic Datasets
5.2. Real Datasets
5.2.1. Experimental Analysis on Real Datasets
Chapter 6 Conclusion
References

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In proceedings of the 1994 International Conference on Very Large Data Bases (VLDB’94), Sept. 1994, 487-499.
[2] T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa, Efficient Substructure Discovery from Large Semi-structured Data. In proceedings of the 2nd SIAM International Conference on Data Mining, April 2002.
[3] T. Asai, H. Arimura, T. Uno, and S. Nakano: Discovering Frequent Substructures in Large Unordered Trees. In proceedings of 6th International Conference on Discovery Science, October 2003.
[4] Y. Chi, Y. Yang, and R. R. Muntz, Indexing and Mining Free Trees. In proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03), November 2003.
[5] Y. Chi, Y. Yang, and R. R. Muntz, HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms. In proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM’04), June 2004.
[6] Y. Chi, Y. Yang, and R. R. Muntz, Canonical Forms for Labeled Trees and Their Applications in Frequent Subtree Mining. Journal of Knowledge and Information Systems (KAIS), August 2005, 203-234.
[7] Y. Chi, Y. Yang, Y. Xia, and R. R. Muntz: CMTreeMiner, Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees. IEEE Transactions on Knowledge and Data Engineering, 17(2), February, 2005.
[8] J. Han, J. Pei, Y. Yin, and R. Mao, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Journal of Data Mining and Knowledge Discovery, 8(1), 53-87, 2004.
[9] K. Y. Huang, C. H. Chang and K. Z. Lin, PROWL: An efficient frequent continuity mining algorithm on event sequences. In proceedings of 6th International Conference on Data Warehousing and Knowledge Discovery (DaWak), 2004.
[10] S. Nijssen and J. N. Kok: Efficient Discovery of Frequent Unordered Trees. 1st international Workshop on Mining Graphs, Trees and Sequences, 2003.
[11] H. Tan, T. S. Dillon, F. Hadzic, E. Chang, and L. Feng, IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding. In proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2006), 450-461, April 9-12, 2006.
[12] C. Wang, M. Hong, J. Pei, H. Zhou, W. Wang, and B. Shi, Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining. In proceedings of PAKDD, 2004.
[13] Y. Xiao, J. F. Yao, Z. Li, and M. H. Dunham, Efficient Data Mining for Maximal Frequent Subtrees. In proceedings of the 3rd IEEE international Conference on Data Mining, 2003.
[14] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, H-Mine: Hyper-Structure Mining of Frequent Pattern in Large Database. In proceedings of International Conference on Data Mining (ICDM), 2001.
[15] J. Pei, J. Han, B. M. Asl, and H. Pinto, PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In proceedings of 17th International Conference on Data Engineering (ICDE), 2001.
[16] J. Punin, M. Krishnamoorthy, M. Zaki, LOGML: Log markup language for web usage mining. In WEBKDD Workshop (with SIGKDD), August 2001.
[17] Y. Xiao, J. F. Yao, and G. Yang, Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees. International Journal of Data Warehousing and Mining (IJDWM), 1(2), 44-66, April-June 2005.
[18] M. J. Zaki, C. C. Aggarwal, XRules: An Effective Structural Classifier for XML Data. In proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2003.
[19] M. J. Zaki, Efficiently Mining Frequent Trees in a Forest, Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1021-1035, August 2005.
[20] M. J. Zaki, Efficiently Mining Frequent Embedded Unordered Trees. Fundamenta Informaticae, 2005.

Advisor

Chia-Hui Chang (張嘉惠)

Approval Date

2006-7-21
