    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/106421


    Title: Big data mining with parallel computing: A comparison of distributed and MapReduce methodologies
    Authors: Tsai, Chih-Fong; Lin, Wei-Chao; Ke, Shih-Wen (柯士文)
    Contributor: Department of Information Management, College of Management
    Keywords: Algorithms; Big Data; Cloud computing; Data mining; Datasets; Distributed; MapReduce; Parallel computing; Studies
    Date: 2016-12-01
    Upload time: 2026-04-23 13:22:09 (UTC+8)
    Publisher: Elsevier Inc., New York
    Abstract:
    • The performance of the distributed and MapReduce methodologies on big datasets is compared.
    • In particular, the mining accuracy and efficiency of the two methodologies are examined.
    • The MapReduce-based procedure performs very stably across different numbers of nodes.
    • Moreover, the MapReduce procedure requires the least computational cost to process big datasets.

    Mining with big data, or big data mining, has become an active research area. With current methodologies and data mining software tools, it is very difficult for a single personal computer to deal efficiently with very large datasets. Parallel and cloud computing platforms are considered a better solution for big data mining. Parallel computing is based on dividing a large problem into smaller ones, each of which is carried out individually by a single processor; these processes are performed concurrently in a distributed and parallel manner. There are two common methodologies for tackling the big data problem. The first is the distributed procedure based on the data parallelism paradigm, in which a given big dataset can be manually divided into n subsets and n algorithms are executed on the corresponding n subsets, respectively. The final result is obtained by combining the outputs produced by the n algorithms. The second is the MapReduce-based procedure on a cloud computing platform. This procedure is composed of the map and reduce processes: the former performs filtering and sorting, and the latter performs a summary operation to produce the final result. In this paper, we compare the performance of the distributed and MapReduce methodologies over large-scale datasets in terms of mining accuracy and efficiency. The experiments are based on four large-scale datasets used for data classification problems.
    The results show that the classification performance of the MapReduce-based procedure is very stable no matter how many computer nodes are used, and is better than that of the baseline single-machine and distributed procedures except on the class-imbalance dataset. In addition, the MapReduce procedure requires the least computational cost to process these big datasets.
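    The two procedures described in the abstract can be contrasted in a short, self-contained Python sketch. It is a hypothetical illustration, not the authors' implementation: the abstract names neither the classifiers nor the MapReduce platform, so the sketch assumes a nearest-centroid classifier as the mining algorithm, a majority vote as the way the distributed procedure combines its n outputs, and plain in-process functions standing in for a real MapReduce runtime. All function names (`split`, `distributed_predict`, `map_phase`, `shuffle`, `reduce_phase`, `mapreduce_centroids`) and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[Point, str]   # (feature vector, class label)
Model = Dict[str, Point]     # class label -> centroid
Value = Tuple[Point, int]    # intermediate value: (vector, count)


def split(data: Sequence[Record], n: int) -> List[List[Record]]:
    """Divide the dataset into n roughly equal subsets (round-robin)."""
    return [list(data[i::n]) for i in range(n)]


def train_centroids(data: Iterable[Record]) -> Model:
    """Hypothetical base learner: the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def predict(model: Model, x: Point) -> str:
    """Return the class whose centroid is nearest to x (squared Euclidean)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


# Distributed procedure: one model per subset, outputs combined afterwards.
def distributed_predict(data: Sequence[Record], n: int, x: Point) -> str:
    models = [train_centroids(part) for part in split(data, n) if part]
    votes = Counter(predict(m, x) for m in models)  # assumed combiner: majority vote
    return votes.most_common(1)[0][0]


# MapReduce procedure: map -> shuffle/sort -> reduce, yielding a single model.
def map_phase(partition: List[Record]) -> Iterator[Tuple[str, Value]]:
    """Map: emit one intermediate (label, (vector, 1)) pair per record."""
    for x, y in partition:
        yield y, (x, 1)


def shuffle(mapped: Iterable[Iterable[Tuple[str, Value]]]) -> Dict[str, List[Value]]:
    """Group intermediate pairs by key and sort by key, as the runtime does between phases."""
    groups: Dict[str, List[Value]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[Value]) -> Tuple[str, Point]:
    """Reduce: summarize one class by summing its vectors and counts."""
    total = [0.0] * len(values[0][0])
    count = 0
    for vec, c in values:
        for j, v in enumerate(vec):
            total[j] += v
        count += c
    return key, tuple(t / count for t in total)


def mapreduce_centroids(data: Sequence[Record], n: int) -> Model:
    mapped = [map_phase(part) for part in split(data, n)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


# Hypothetical toy data: two well-separated classes of 2-D points.
data: List[Record] = (
    [((float(i % 7), float(i % 5)), "a") for i in range(50)]
    + [((10.0 + i % 7, 8.0 + i % 5), "b") for i in range(50)]
)

if __name__ == "__main__":
    single = train_centroids(data)
    for n in (1, 2, 4, 8):
        same = mapreduce_centroids(data, n) == single
        print(n, same, distributed_predict(data, n, (1.0, 1.0)))
```

    In this toy setting the MapReduce-style model is the same whatever the number of partitions, because per-class sums and counts are additive; each model in the voting ensemble, by contrast, is trained on only 1/n of the records. This is a property of the simple aggregation chosen here and is not offered as the paper's explanation of its results.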
    Publication date: 2016-12
    Source: The Journal of Systems and Software, 2016-12, Vol. 122, pp. 83-92
    Source database: Elsevier ScienceDirect Journals Complete
    Copyright: 2016 Elsevier Inc.
    Copyright: Copyright Elsevier Sequoia S.A. Dec 2016
    Identifier: ISSN: 0164-1212
    Identifier: EISSN: 1873-1228
    Identifier: DOI: 10.1016/j.jss.2016.09.007
    Identifier: CODEN: JSSODM
    Appears in collections: [Department of Information Management] Journal articles

    All items in NCUIR are protected by their original copyright.
