    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92658

    Title: Data Resampling and Discretization Methods for Class Imbalanced Data (應用資料重採樣與資料離散化方法於類別不平衡問題之研究)
    Authors: LIN, JIA-YANG (林家暘)
    Contributors: Department of Information Management
    Keywords: data preprocessing; data resampling; data discretization; class imbalance; data mining
    Date: 2023-07-27
    Issue Date: 2023-10-04 16:07:44 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: In recent years, with the rapid growth of artificial intelligence, many industries have invested in related research, using their existing data to develop intelligent applications suited to their own business. In the real world, however, various human and environmental factors cause data to be naturally skewed and unevenly distributed. This class imbalance problem is widespread across industries and domains and can degrade the intelligent models used in related applications, which makes it an important practical concern. This study therefore applies the data-level oversampling technique SMOTE (Synthetic Minority Over-sampling Technique) together with the supervised discretization methods ChiMerge and MDLP, and examines how different combinations and orders of these preprocessing steps affect binary class imbalance problems. In addition, to better understand how different resampling methods differ in handling class imbalance, the study includes several resampling methods (SMOTE with different sampling strategies, the Tomek Links undersampling method, and a hybrid of the two) and further investigates how different combinations and orders of preprocessing steps affect multiclass imbalance problems.
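The resampling steps named in the abstract can be illustrated with a short NumPy sketch. This is not the thesis's implementation: the function names `smote`, `tomek_links`, and `smote_tomek` are hypothetical, distances are assumed to be plain Euclidean, and the hybrid shown (oversample with SMOTE, then drop both members of every Tomek link) is one common way of combining the two methods, assumed here for illustration. Used on its own as an undersampler, Tomek Links typically removes only the majority-class member of each link. In practice a library such as imbalanced-learn (`SMOTE`, `TomekLinks`, `SMOTETomek`) would normally be used instead.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling (hypothetical helper, Euclidean distance assumed).

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def tomek_links(X, y):
    """Indices of samples in a Tomek link: a pair of opposite-class samples
    that are each other's nearest neighbour (both members are returned)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mutual = nn[nn] == np.arange(len(X))
    return np.flatnonzero(mutual & (y[nn] != y))


def smote_tomek(X, y, minority, k=5, rng=None):
    """Hybrid sketch for a binary problem (assumed variant): SMOTE the minority
    class up to the majority size, then remove both members of every Tomek link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new > 0:
        X = np.vstack([X, smote(X_min, n_new, k=k, rng=rng)])
        y = np.concatenate([y, np.full(n_new, minority)])
    keep = np.setdiff1d(np.arange(len(X)), tomek_links(X, y))
    return X[keep], y[keep]
```

When the minority class starts out smaller, `smote_tomek(X, y, minority=1, rng=0)` returns a binary dataset with equal class counts: SMOTE balances the classes, and each Tomek link then removes exactly one sample from each class.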
    This study uses binary and multiclass datasets from the UCI and KEEL repositories to compare single preprocessing methods with combined preprocessing methods on binary and multiclass imbalance problems, with the aim of clarifying where each preprocessing method applies and providing effective solutions and recommendations. According to the experimental results, for binary class imbalance problems the study recommends first discretizing the features with MDLP and then balancing the dataset with SMOTE, which improves the classification performance of SVM, C4.5, and random forest (RF). For multiclass imbalance problems where time cost is not a concern, it recommends first balancing the dataset with a resampling method and then discretizing the features with ChiMerge, which yields more robust and accurate results. If data processing and model computation efficiency are a priority, it recommends first balancing the dataset with a resampling method and then discretizing the features with MDLP, which obtains fairly accurate results efficiently.
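MDLP, recommended above as the first step for binary problems and as the efficient discretizer for multiclass ones, can be sketched as recursive entropy-based splitting with a minimum-description-length stopping rule. The sketch below assumes the standard Fayyad and Irani formulation for a single numeric feature; `mdlp_cut_points` and `entropy` are hypothetical names, and the brute-force search over cut points is chosen for readability, not speed.

```python
import numpy as np


def entropy(y):
    """Class entropy in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mdlp_cut_points(x, y):
    """Cut points for one numeric feature (hypothetical helper, Fayyad-Irani MDL rule assumed)."""
    order = np.argsort(x, kind="mergesort")
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y)[order]
    cuts = []

    def split(lo, hi):
        xs, ys = x[lo:hi], y[lo:hi]
        n = hi - lo
        ent = entropy(ys)
        best = None
        # Find the cut that minimises the weighted class entropy of the two parts.
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue  # only cut between distinct values
            e = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
            if best is None or e < best[0]:
                best = (e, i)
        if best is None:
            return
        e, i = best
        gain = ent - e
        k, k1, k2 = (len(np.unique(s)) for s in (ys, ys[:i], ys[i:]))
        delta = np.log2(3 ** k - 2) - (k * ent - k1 * entropy(ys[:i]) - k2 * entropy(ys[i:]))
        # MDL criterion: accept the cut only if the gain exceeds its encoding cost.
        if gain <= (np.log2(n - 1) + delta) / n:
            return
        cuts.append((xs[i - 1] + xs[i]) / 2)
        split(lo, lo + i)
        split(lo + i, hi)

    split(0, len(x))
    return sorted(cuts)
```

The resulting cut points can be applied with `np.digitize(x, cuts)`, which maps each value to the index of its interval.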
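ChiMerge, recommended above for multiclass problems when time cost is not a concern, works bottom-up: every distinct value starts as its own interval, and the adjacent pair with the lowest chi-square statistic is merged until every remaining pair differs significantly. The sketch below is an illustration under assumptions, not the thesis's configuration: `chimerge` is a hypothetical name, the default `threshold=2.706` is roughly the 90% quantile of the chi-square distribution with one degree of freedom (the binary case; for c classes a value such as `scipy.stats.chi2.ppf(0.90, c - 1)` could be used), and `max_intervals` is an optional extra cap.

```python
import numpy as np


def chimerge(x, y, threshold=2.706, max_intervals=None):
    """ChiMerge cut points for one numeric feature (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    values = np.unique(x)
    # One initial interval per distinct value; each row holds per-class counts.
    table = [np.array([np.sum((x == v) & (y == c)) for c in classes], dtype=float)
             for v in values]
    lows = list(values)  # lower bound of each interval

    def chi2(a, b):
        """Chi-square statistic of the 2 x c table formed by two adjacent intervals."""
        obs = np.vstack([a, b])
        expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0) / obs.sum()
        mask = expected > 0  # classes absent from both intervals contribute nothing
        return float(((obs - expected)[mask] ** 2 / expected[mask]).sum())

    while len(table) > 1:
        scores = [chi2(table[i], table[i + 1]) for i in range(len(table) - 1)]
        i = int(np.argmin(scores))
        too_many = max_intervals is not None and len(table) > max_intervals
        if scores[i] >= threshold and not too_many:
            break  # all adjacent intervals differ significantly
        table[i] = table[i] + table[i + 1]  # merge the most similar pair
        del table[i + 1], lows[i + 1]
    return lows[1:]  # each interval after the first starts at a cut point
```

With these cut points, `np.digitize(x, cuts)` assigns each value to its merged interval, since a value equal to a cut point belongs to the interval that starts there.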
    Appears in Collections: [Graduate Institute of Information Management] Master's and Doctoral Theses

    All items in NCUIR are protected by copyright, with all rights reserved.

    Copyright National Central University. National Central University Library, all rights reserved.