

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/68058


    Title: 樹狀資料之中心集群分析;Center-based clustering with tree structured data
    Authors: 林政延;Lin,Zheng-Yan
    Contributors: Graduate Institute of Industrial Management
    Keywords: Data mining; Tree-structured data; Tree edit distance; Non-hierarchical clustering
    Date: 2015-07-06
    Issue Date: 2015-09-23 10:19:54 (UTC+8)
    Publisher: National Central University
    Abstract: Cluster analysis is one of the most widely used techniques in data mining; its main purpose is to group objects according to their characteristics. Traditionally it has been applied mostly to numerical and categorical data, and studies on clustering tree-structured data remain few. Yet tree-structured data appear throughout daily life in many forms, such as bills of materials (BOM), file structures, and XML documents. This study therefore addresses how to apply cluster analysis to tree-structured data.
    Clustering methods fall into two families: hierarchical and non-hierarchical. Most existing studies on clustering tree-structured data are hierarchical, although non-hierarchical clustering has advantages of its own in terms of results. We therefore focus on non-hierarchical clustering of tree-structured data. Another key consideration in cluster analysis is the similarity measure. We adopt the tree edit distance, the most widely used similarity measure for tree-structured data: the distance between two trees is the minimum number of edit steps needed to transform one tree into the other.
    Syu (2014) proposed a method for applying non-hierarchical clustering to string data. Compared with string data, however, tree-structured data require attention not only to the order of the data but also to structural characteristics such as levels, nodes, and arcs (branches). Taking these characteristics into account, we combine the concepts of K-means and K-modes as the basis for determining the center of a cluster. With the centers determined in this way, non-hierarchical clustering can be applied to tree-structured data.
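To make the two ingredients named in the abstract concrete, the sketch below computes a tree edit distance and uses it for center-based clustering. It is an illustration under stated assumptions, not the thesis's method: trees are assumed to be ordered and labeled, every insert, delete, and relabel is assumed to cost 1, and the cluster center is taken to be a medoid (the member tree with the smallest total distance to the others), which stands in for the K-means/K-modes center construction that the abstract does not spell out. All function names here are hypothetical.

```python
from functools import lru_cache

def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (assumed cost model)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    (v_label, v_children), f_rest = f[-1], f[:-1]
    (w_label, w_children), g_rest = g[-1], g[:-1]
    # Delete the rightmost root of f: its children move up into its place.
    delete = forest_distance(f_rest + v_children, g) + 1
    # Insert the rightmost root of g (symmetric case).
    insert = forest_distance(f, g_rest + w_children) + 1
    # Match the two rightmost roots, relabeling if their labels differ.
    match = (forest_distance(v_children, w_children)
             + forest_distance(f_rest, g_rest)
             + (v_label != w_label))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def assign(dist, medoids):
    """Label each tree with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def cluster_trees(trees, k, max_iter=20):
    """Medoid-based clustering; a stand-in for the thesis's center definition."""
    dist = [[tree_edit_distance(a, b) for b in trees] for a in trees]
    medoids = list(range(k))    # naive initialization: the first k trees
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:     # keep the old center if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids), medoids

# Toy data: two families of small trees, interleaved.
trees = [
    tree("a", tree("b"), tree("c")),
    tree("x", tree("y", tree("z"))),
    tree("a", tree("b")),
    tree("x", tree("y")),
    tree("a", tree("b"), tree("c"), tree("d")),
    tree("x", tree("y", tree("z")), tree("w")),
]
labels, medoids = cluster_trees(trees, k=2)
print(labels, medoids)
```

A medoid is used because the "mean" of a set of trees is not directly defined; any center construction that returns a tree could be substituted in the update step without changing the assignment step.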
    Appears in Collections: [Graduate Institute of Industrial Management] Theses and Dissertations

