Thesis/Dissertation 103423018: Full Metadata Record

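DC field | Value | Language
DC.contributor | Department of Information Management (資訊管理學系) | zh_TW
DC.creator | 陳毅寰 | zh_TW
DC.creator | Yi-Huan Chen | en_US
dc.date.accessioned | 2016-08-05T07:39:07Z |
dc.date.available | 2016-08-05T07:39:07Z |
dc.date.issued | 2016 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=103423018 |
dc.contributor.department | Department of Information Management (資訊管理學系) | zh_TW
DC.description | National Central University (國立中央大學) | zh_TW
DC.description | National Central University | en_US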
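dc.description.abstract | The big data era has arrived, and as the volume of data to be processed grows by leaps and bounds, so does the noise in the data. We therefore perform instance selection before data mining, removing noise and keeping representative data, to ensure the quality of the subsequent data mining. However, once the number of data points reaches a certain order of magnitude, the complexity of instance selection as a pre-processing step rises sharply. This degrades the selection results and, in turn, the subsequent data mining results. In addition, any given instance selection algorithm performs better on some datasets or problems and worse on others, and no single algorithm can achieve the best selection results on all datasets. This study proposes DCIS, a distributed instance selection process framework. Following the divide-and-conquer concept, DCIS reduces the problem to several sub-problems and performs instance selection on each of them in turn. It then performs one final aggregation-and-filtering step. This improves selection quality, so that different instance selection algorithms all achieve better selection results within the DCIS framework. Using small datasets, this study experiments step by step with different aggregation-and-filtering methods, classifiers, and numbers of groups to finalize the DCIS architecture. It then evaluates the effectiveness of DCIS on large datasets. The results show that DCIS improves the selection quality of different instance selection algorithms on both large and small datasets, which in turn benefits the subsequent data mining results. | zh_TW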
dc.description.abstract | In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore performed as a data pre-processing step that removes noise and keeps representative data before mining, so that the quality of the mining results is preserved. As the amount of data grows, however, the computational complexity of instance selection increases, which degrades both the selection results and the subsequent data mining results. Moreover, no single instance selection algorithm gives the best result on every dataset or problem. In this work, we propose a divide-and-conquer-based instance selection framework, namely DCIS. First, it divides the original dataset into several smaller sub-datasets (groups). Second, it applies an instance selection algorithm to each group in turn to select its representative data. Finally, it merges the selected subsets and filters the merged set once more to produce the final result. Using small datasets, we examine how DCIS performs with different numbers of sub-datasets in the first step, different combination methods in the final step, and different classifiers. Large-scale datasets are then used to assess the applicability of DCIS. The experimental results show that DCIS improves the selection quality of different instance selection algorithms on both small and large-scale datasets, which in turn benefits the subsequent data mining results. | en_US
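DC.subject | Big data (巨量資料) | zh_TW
DC.subject | Divide and Conquer | zh_TW
DC.subject | Data pre-processing: instance selection (資料前處理(樣本選取)) | zh_TW
DC.subject | Classification mining (分類探勘) | zh_TW
DC.subject | big data | en_US
DC.subject | divide and conquer | en_US
DC.subject | instance selection | en_US
DC.subject | classification | en_US
DC.title | A Study of Divide-and-Conquer Instance Selection for Big Data Mining (分治式樣本選取法於巨量資料探勘之研究) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.type | Thesis/dissertation (博碩士論文) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US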
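The abstract describes DCIS only at the level of its three steps. The Python sketch below walks through that flow: divide into groups, select within each group, then merge and filter once more. Several choices in it are assumptions and not taken from the thesis. The groups are formed by random partitioning. Edited Nearest Neighbor (ENN) serves as the base instance selection algorithm. The final "aggregation filtering" step simply re-runs the same selector on the merged subsets. The names `dcis` and `enn_select` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A labelled instance: (feature vector, class label)
Instance = Tuple[Tuple[float, ...], str]
# Any instance selection algorithm: takes instances, returns the kept subset
Selector = Callable[[List[Instance]], List[Instance]]


def enn_select(data: List[Instance], k: int = 3) -> List[Instance]:
    """Edited Nearest Neighbor (assumed base selector, not the thesis's):
    keep an instance only if the majority label among its k nearest
    neighbours (excluding itself) matches its own label."""
    if len(data) <= k:
        return list(data)  # too few instances to vote; keep them all
    kept: List[Instance] = []
    for i, (x, y) in enumerate(data):
        # Distances from instance i to every other instance in this group
        neighbours = sorted(
            (math.dist(x, other_x), other_y)
            for j, (other_x, other_y) in enumerate(data)
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == y:
            kept.append((x, y))
    return kept


def dcis(data: Sequence[Instance], n_groups: int, select: Selector,
         seed: int = 0) -> List[Instance]:
    """Hypothetical divide-and-conquer instance selection sketch."""
    # 1. Divide: shuffle, then deal instances round-robin into n_groups groups
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    groups = [shuffled[g::n_groups] for g in range(n_groups)]
    # 2. Conquer: run the base selector on each group in turn
    selected: List[Instance] = []
    for group in groups:
        selected.extend(select(group))
    # 3. Combine: filter the merged selection once more
    return select(selected)


# Usage (illustrative data, not from the thesis): two tight clusters
# plus one mislabelled "noise" point placed inside cluster "a".
cluster_a = [((i * 0.2, j * 0.2), "a") for i in range(5) for j in range(4)]
cluster_b = [((10 + i * 0.2, 10 + j * 0.2), "b") for i in range(5) for j in range(4)]
noise = ((0.3, 0.3), "b")
data = cluster_a + cluster_b + [noise]
result = dcis(data, n_groups=2, select=enn_select)
print(noise in result)  # → False
```

Splitting N instances into G groups makes the group-level ENN passes cost roughly G·(N/G)² = N²/G distance computations in total instead of N². The final pass over the merged selection, however, still scales with the square of that subset's size.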
