

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/12910


    Title: The Research of Mining Frequent Sequential Patterns (序列樣式探勘之研究)
    Author: Shih-Sheng Chen (陳仕昇)
    Contributor: Graduate Institute of Information Management
    Keywords: data mining; sequential pattern; sequential data; frequent pattern
    Date: 2003-06-30
    Upload time: 2009-09-22 15:18:18 (UTC+8)
    Publisher: National Central University Library
    Abstract (translated from the Chinese): Among the many kinds of data, ordered sequence data is an important research topic with wide use in both science and business: in science, for example, the study of DNA sequences; in business, the analysis of users' browsing behavior on shopping websites. Data mining techniques can extract frequent sequential patterns from sequence data and offer them to users or decision makers for various purposes. In this dissertation, we further divide the previously studied sequential patterns into fixed patterns and variable patterns, so that users or decision makers can better understand the knowledge and rules hidden in large volumes of data. Besides distinguishing these kinds of sequential patterns, the proposed algorithm runs no slower than PrefixSpan, which is currently among the fastest. We also propose sampling-based algorithms for mining general sequential patterns and continuous sequential patterns. These algorithms have three advantages: first, they can handle large volumes of data, as Apriori-like algorithms can; second, they are as efficient as pattern-growth-like algorithms; third, they can be combined with existing algorithms for mining general and continuous sequential patterns, and likewise with the hybrid-pattern mining algorithm proposed in this dissertation. The work mainly applies to data with sequential properties, such as market segmentation by consumer behavior in marketing and the maintenance of web pages and system performance on websites, and gives users a reference for analysis and decision making.

    Abstract (English): Mining sequential patterns in databases is an important problem with many applications in commercial and scientific domains. For example, finding patterns in DNA sequences and analyzing users' website browsing patterns can help to discover important knowledge about genetic evolution and consumer behavior, respectively. Existing studies on finding sequential patterns fall into two categories: continuous patterns and discontinuous patterns. In the first category, patterns are composed of elements that appear consecutively in a sequence. In the second category, patterns can be composed of elements separated by wildcards, each of which can denote zero or more elements. Although many studies have been published on finding either kind of pattern, none can find both. Nor can they find discontinuous patterns formed from several continuous sub-patterns. The dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine them. The algorithm is as fast as PrefixSpan for mining sequential patterns. Algorithms such as PrefixSpan require the data volume to be small enough to fit in main memory to reach full speed. In the dissertation, we also propose a sampling-based approach to find discontinuous patterns and continuous patterns. This approach has three advantages. First, it can mine frequent patterns from huge data sets, as Apriori-like algorithms can, but need not scan the database many times. Second, it is as efficient as pattern-growth algorithms such as PrefixSpan and need not compress the database into memory. Third, it can work with any known algorithm for mining discontinuous or continuous patterns. The algorithms developed in the dissertation are important because they can be applied to mine knowledge from sequential data, which are generated often in daily life.
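The three pattern kinds named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's mining algorithm; it only checks whether a given pattern occurs in a sequence and counts its support. It assumes each sequence is a string of single-character items and that a hybrid pattern is written as a list of continuous blocks separated by implicit wildcards. The names `find_block`, `contains_hybrid`, `support`, and the toy database `db` are all hypothetical.

```python
from typing import Sequence


def find_block(seq: str, block: str, start: int) -> int:
    """Return the index just past the earliest contiguous occurrence of
    `block` in `seq` at or after `start`, or -1 if there is none."""
    n, m = len(seq), len(block)
    for i in range(start, n - m + 1):
        if seq[i:i + m] == block:
            return i + m
    return -1


def contains_hybrid(seq: str, blocks: Sequence[str]) -> bool:
    """True if every block occurs contiguously, in order and without
    overlap, with zero or more elements allowed between blocks.
    Matching each block at its earliest position is sufficient here."""
    pos = 0
    for block in blocks:
        pos = find_block(seq, block, pos)
        if pos < 0:
            return False
    return True


def support(db: Sequence[str], blocks: Sequence[str]) -> float:
    """Fraction of sequences in `db` that contain the pattern."""
    return sum(contains_hybrid(s, blocks) for s in db) / len(db)


# Hypothetical toy database of five sequences.
db = ["abcd", "abxcd", "acbd", "xabycd", "axbcd"]

continuous = ["abcd"]                # one block: a, b, c, d adjacent
discontinuous = ["a", "b", "c", "d"] # a * b * c * d
hybrid = ["ab", "cd"]                # ab * cd

print(support(db, continuous))     # 0.2
print(support(db, discontinuous))  # 0.8
print(support(db, hybrid))         # 0.6
```

Under this representation, a continuous pattern is a hybrid pattern with a single block and a discontinuous pattern is one whose blocks all have length one, which is why a single containment check covers all three cases.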
    Appears in collections: [Graduate Institute of Information Management] Theses and Dissertations


