A Distributed Decision Generation Algorithm based on Granular Computing Using Spark

DC field	Value	Language
DC.contributor	Department of Computer Science and Information Engineering	zh_TW
DC.creator	林子晏	zh_TW
DC.creator	Zi-Yan Lin	en_US
dc.date.accessioned	2017-08-16T07:39:07Z
dc.date.available	2017-08-16T07:39:07Z
dc.date.issued	2017
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=103522021
dc.contributor.department	Department of Computer Science and Information Engineering	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
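dc.description.abstract	A classification algorithm works in two phases. The first phase is training, which uses already-classified data to map data features to their corresponding classes. The second phase is classification, which assigns classes to other, unclassified data based on its features. DGAGC is a classification algorithm suited to discrete data; continuous data requires additional processing. Our previous work made DGAGC support the Hadoop MapReduce computing model. However, the Hadoop MapReduce version covers only DGAGC training, and classification is available only in a single-machine version. Of the two phases, training is the more time-consuming. This thesis proposes Spark versions of DGAGC training and classification to improve on the execution efficiency of the Hadoop version when the computational load of the data set is not large. As for DGAGC classification, the single-machine version cannot perform prediction when the prediction model is too large, so we propose a Spark version of DGAGC classification to address this problem.	zh_TW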
dc.description.abstract	The DGAGC algorithm, developed at National Central University, is a classification algorithm based on association-rule mining and searching. The DGAGC algorithm also specifies a distributed computing approach for model training, which is implemented on top of Hadoop MapReduce. In this study, we propose a new distributed computing approach for the DGAGC algorithm based on Apache Spark. With the in-memory computing support provided by Spark, the new distributed DGAGC algorithm achieves a lower average execution time for model training across four different training data sets. In addition, we propose a distributed version of DGAGC for data classification.	en_US
DC.subject	Classification algorithm	zh_TW
DC.subject	Distributed decision generation algorithm based on granular computing	zh_TW
DC.subject	Hadoop	en_US
DC.subject	Spark	en_US
DC.subject	DGAGC	en_US
DC.subject	Classification	en_US
DC.title	植基於Spark系統之分散式粒化運算決策產生演算法	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	A Distributed Decision Generation Algorithm based on Granular Computing Using Spark	en_US
DC.type	Master's and doctoral theses	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Full metadata record for thesis 103522021