NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/12910


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/12910


    Title: 序列樣式探勘之研究;The Research of Mining Frequent Sequential Patterns
    Authors: 陳仕昇;Shih-Sheng Chen
    Contributors: Graduate Institute of Information Management
    Keywords: data mining; sequential pattern; sequential data; frequent pattern
    Date: 2003-06-30
    Issue Date: 2009-09-22 15:18:18 (UTC+8)
    Publisher: National Central University Library
    Abstract: Mining sequential patterns in databases is an important problem with many applications in commercial and scientific domains. For example, finding patterns in DNA sequences can reveal knowledge about genetic evolution, and analyzing users' web site browsing patterns can reveal knowledge about consumer behavior. Existing studies on sequential pattern mining fall into two categories: continuous patterns and discontinuous patterns. A continuous pattern consists of elements that appear consecutively in a sequence. A discontinuous pattern may consist of elements separated by wildcards, each of which can stand for zero or more elements. Although many methods have been published for finding either kind of pattern, none finds both, and none finds discontinuous patterns made up of several continuous sub-patterns. This dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine them. The algorithm is as fast as PrefixSpan at mining sequential patterns. Algorithms such as PrefixSpan reach full speed only when the data is small enough to fit in main memory. The dissertation also proposes a sampling-based approach for finding discontinuous and continuous patterns, which has three advantages. First, like Apriori-like algorithms, it can mine frequent patterns from very large data sets, but it does not need to scan the database many times. Second, it is as efficient as pattern-growth algorithms such as PrefixSpan, but it does not need to compress the database into memory. Third, it can work with any known algorithm for mining discontinuous or continuous patterns, as well as with the hybrid-pattern algorithm proposed in the dissertation. The algorithms apply to sequential data, which arises often in daily life, for example in segmenting markets by consumer behavior and in maintaining web pages and system performance on web sites, and they give users and decision makers a basis for analysis and decisions.
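The three kinds of pattern described in the abstract can be illustrated with a small sketch. This is not the dissertation's mining algorithm; it only checks whether a given pattern occurs in a sequence and counts its support. The function names, the representation of a hybrid pattern as a list of continuous segments, and the toy DNA-like database are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def contains_continuous(seq: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` occurs as a run of consecutive elements of `seq`."""
    n, m = len(seq), len(pattern)
    return any(list(seq[i:i + m]) == list(pattern) for i in range(n - m + 1))


def contains_discontinuous(seq: Sequence, pattern: Sequence) -> bool:
    """True if the elements of `pattern` occur in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `item in it` advances the iterator, so each match must come after the last.
    return all(item in it for item in pattern)


def contains_hybrid(seq: Sequence, segments: Sequence[Sequence]) -> bool:
    """True if each continuous segment occurs in order, with gaps of zero or
    more elements between segments (hypothetical representation)."""
    pos = 0
    for seg in segments:
        m = len(seg)
        for i in range(pos, len(seq) - m + 1):
            if list(seq[i:i + m]) == list(seg):
                pos = i + m  # the next segment must start after this one ends
                break
        else:
            return False  # this segment was not found after the previous one
    return True


def support(db: Sequence[Sequence], occurs: Callable, pattern) -> float:
    """Fraction of sequences in `db` that contain `pattern` under `occurs`."""
    return sum(occurs(s, pattern) for s in db) / len(db)


# Toy database of sequences (an assumption, not data from the dissertation).
db = ["ACGTAC", "AGCT", "TTACG"]
continuous_support = support(db, contains_continuous, "AC")
discontinuous_support = support(db, contains_discontinuous, "AC")
hybrid_found = contains_hybrid("ACGTAC", ["AC", "AC"])
```

Matching each segment at its leftmost possible position is sufficient for this existence check, because an earlier match never leaves less room for the segments that follow. A pattern would be called frequent when its support meets a user-chosen minimum threshold.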
    Appears in Collections: [Graduate Institute of Information Management] Electronic Thesis & Dissertation


    All items in NCUIR are protected by copyright, with all rights reserved.
