NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/44727


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/44727


    Title: An Efficient Distributed Hierarchical-Clustering Algorithm for Large Scale Data Set (適用於大資料集高效率的分散式階層分群演算法)
    Author: An-Cing Huang (黃安慶)
    Contributor: Graduate Institute of Computer Science and Information Engineering
    Keywords: Hierarchical Clustering; Parallel Computing; Distributed Computing; MPI; Distributed System
    Date: 2010-07-28
    Upload time: 2010-12-09 13:53:51 (UTC+8)
    Publisher: National Central University
    Abstract: With advances in information technology, the volume of data that must be processed in many fields has gradually grown beyond what a single computer can handle. Hierarchical clustering algorithms in particular must store a very large amount of data during execution, so they run into many problems when the input is large.

    This research therefore proposes a computing framework that parallelizes hierarchical clustering and distributes it across multiple computers. A preset threshold is used to filter out data that does not need to be stored and to decompose the original hierarchy into many sub-clusters, each of which can be processed by hierarchical clustering independently. These sub-clusters are then processed in parallel to speed up execution. Based on this computing model, the framework is implemented with the Message Passing Interface (MPI) library.

    The main advantage of the proposed framework is that it greatly reduces the storage space and execution time that hierarchical clustering requires. An individual application can adopt the framework by making modest modifications to the program developed in this work. Experimental results also show considerable improvements in both execution time and storage space, which makes the framework suitable for a variety of applications.

    Clustering is a common and important technique in many research areas. Clustering algorithms usually focus on a small dataset that can be analyzed by a single machine. However, as new hardware and techniques are developed for collecting data, datasets can grow to an extremely large scale in many domains, such as astronomy, high energy physics, and aircraft engine diagnostics. The time complexity of hierarchical clustering algorithms is polynomial, between O(N^2) and O(N^3), so the computation cost grows very fast as the input becomes large. Hierarchical clustering algorithms therefore cannot be used directly in this situation, because they cannot guarantee that users will get results back in a bounded amount of time.

    This research focuses on how to run hierarchical clustering in parallel. The traditional hierarchical clustering algorithm is an unsupervised learning algorithm that does not require the data to be labeled in advance or the number of clusters to be specified. These characteristics make it adaptable to many kinds of data. The goal of our research is to use a parallel computing architecture to speed up execution, to minimize the storage space needed by traditional hierarchical clustering algorithms, and to refine the clustering process.

    We propose a Parallelized Hierarchical Clustering Algorithm, a modified hierarchical agglomerative algorithm that can be adapted to a distributed environment. The algorithm performs grouping in parallel and reduces both the computation load and the amount of data transmitted when processing large datasets.
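
    The threshold-based decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several parts are assumptions: the abstract does not say how the threshold splits the data, so the sketch links any two points whose distance is at most the threshold and treats each connected component as a sub-cluster; it uses naive single linkage inside each sub-cluster; and it uses a thread pool as a stand-in for MPI processes. The function names (`split_by_threshold`, `single_linkage`, `parallel_hierarchical_clustering`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math


def split_by_threshold(points, threshold):
    """Group point indices into sub-clusters (assumed rule: link points
    whose distance is <= threshold, then take connected components).
    Distances are compared and discarded, never stored in a matrix."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def single_linkage(points, members):
    """Naive single-linkage agglomeration on one sub-cluster.
    Returns the merges as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[m] for m in members]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (a < b always holds here).
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[p], points[q])
                    for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


def parallel_hierarchical_clustering(points, threshold, workers=4):
    """Split by threshold, then cluster each sub-cluster independently.
    The thread pool only stands in for MPI ranks; it shows the task
    structure and does not give a CPU speedup in CPython."""
    groups = split_by_threshold(points, threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dendrograms = list(pool.map(lambda g: single_linkage(points, g), groups))
    return groups, dendrograms
```

    For example, calling `parallel_hierarchical_clustering([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)], threshold=2.0)` splits the five points into the index groups `[[0, 1, 2], [3, 4]]` and builds one small merge history per group. Under the linking rule assumed here, no merge below the threshold ever crosses a group boundary, which is why the groups can be clustered independently.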
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     719


    All items in NCUIR are protected by their original copyright.
