A study on strategies of improving performance under class imbalanced problem (在資料不平衡下提升分類器性能之策略研究)

DC Field	Value	Language
DC.contributor	工業管理研究所	zh_TW
DC.creator	陳詠俊	zh_TW
DC.creator	Yung-Chun Chen	en_US
dc.date.accessioned	2022-07-11T07:39:07Z
dc.date.available	2022-07-11T07:39:07Z
dc.date.issued	2022
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109426024
dc.contributor.department	工業管理研究所	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	分類問題是在機器學習上相當重要的一個研究主題，透過模型我們可以自動地將龐大資料中的標籤分類出來，讓決策者能省下大量的時間就從交易紀錄、機台資料等來源中得到可用的資訊。當中類別不平衡（Class imbalanced）是相當重要的一個問題，若資料中的類別數量相差較大，會使得模型難以正確的分類。過去的研究已經提出了相當多的方法來改善此問題，但主要都著重在分類指標分數的提高，對於改善時產生的潛在變異性著墨較少。忽略使用改善方法造成的不穩定性，決策者依照模型分類出的結果可能受到訓練資料不同而有較大差異，造成決策上的錯估。本篇研究嘗試探討在類別不平衡的情況下，找到一種可以穩定提高分類器表現的策略。希望此策略能協助決策者做出穩健的決策，而不必擔心訓練當中可能的不確定。在本篇研究中，我們以兩種真實資料來呈現類別不平衡問題。設計不同類別數量的不平衡比例，或是資料集大小，檢視對分類器的影響。當中使用了三種常見的分類器，分別是 Logistic Regression、Support Vector Machine 以及 Random Forest。根據實驗結果，我們從中試著找到影響提高模型表現時的穩定與否的主要原因，並提出一個用以量測穩定性的指標。最後，我們提出一套能讓模型在類別不平衡下穩定的提高表現的策略。	zh_TW
dc.description.abstract	Classification is a common topic in machine learning. Classification models can recognize labels automatically, which saves a great deal of time and makes the massive amount of information in digital transactions or machine logs usable. The class imbalance problem is one of the most important and widely studied issues in this field: when the ratio between classes is imbalanced, classifiers cannot classify well. Researchers have proposed several methods to address this problem. However, most of these methods focus only on improving certain performance measures. If the variation in the results is ignored, decision makers may over- or underestimate the classifiers because of differences in the training datasets, leading to unsuitable decisions. In this study, we try to find a strategy that stably improves classifier performance under class imbalance. With this strategy, decision makers can make robust decisions without worrying about large variation in the classification results. We conduct a series of experiments with two real-world datasets to present the class imbalance problem, using different imbalance ratios and dataset sizes. Three classification models are used in the experiments: Logistic Regression, Support Vector Machine, and Random Forest. We examine the effects of cost-sensitive and under-sampling methods with these three models. Based on the experimental results, we try to identify the main causes of stability and propose a method to describe the stability of improvement methods. Finally, we propose a strategy for raising the ability of classifiers in a stable way.	en_US
DC.subject	分類	zh_TW
DC.subject	資料類別不平衡	zh_TW
DC.subject	成本敏感方法	zh_TW
DC.subject	穩定性	zh_TW
DC.subject	classification	en_US
DC.subject	class imbalanced problem	en_US
DC.subject	cost-sensitive methods	en_US
DC.subject	stability	en_US
DC.title	在資料不平衡下提升分類器性能之策略研究	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	A study on strategies of improving performance under class imbalanced problem	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 109426024: full metadata record