A Distributed Decision Generation Algorithm based on Granular Computing Using Spark

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/74732

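Title:	植基於Spark系統之分散式粒化運算決策產生演算法; A Distributed Decision Generation Algorithm based on Granular Computing Using Spark
Author:	林子晏; Lin, Zi-Yan
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Classification algorithm; Distributed Decision Generation Algorithm based on Granular Computing; Hadoop; Spark; DGAGC; Classification
Date:	2017-08-16
Upload time:	2017-10-27 14:37:45 (UTC+8)
Publisher:	National Central University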
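Abstract:	A classification algorithm works in two phases. The first phase is training, in which already-classified data are used to map feature values to their corresponding classes. The second phase is classification, in which the features of other, unclassified data are used to assign them to classes. DGAGC is a classification algorithm suited to discrete data; continuous data require additional processing. Our earlier research made DGAGC support the Hadoop MapReduce computing model. However, the Hadoop MapReduce version covers only DGAGC training, and classification exists only as a single-machine version. Of the two phases, training takes the most time. This thesis proposes Spark versions of DGAGC training and classification in order to improve on the execution efficiency of the Hadoop version when the computational load of the data set is not large. As for DGAGC classification, the single-machine version cannot perform prediction when the prediction model is too large, so a Spark version of DGAGC classification is proposed to address this problem.; The DGAGC algorithm, developed by National Central University, is a classification algorithm based on association-rule mining and searching. The DGAGC algorithm also specifies a distributed computing approach for model training, which is implemented on top of Hadoop MapReduce. In this study, we propose a new distributed computing approach for the DGAGC algorithm based on Apache Spark. With Spark's support for in-memory computing, the new distributed DGAGC algorithm achieves a lower average execution time for model training on four different training data sets. In addition, we propose a distributed version of DGAGC for data classification.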
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	162

All items in NCUIR are protected by original copyright.
