Machine learning development of imbalanced data and application of visual recognition model

DC field	Value	Language
DC.contributor	機械工程學系	zh_TW
DC.creator	許哲彰	zh_TW
DC.creator	Che-Chang Hsu	en_US
dc.date.accessioned	2019-07-24T07:39:07Z
dc.date.available	2019-07-24T07:39:07Z
dc.date.issued	2019
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=953403018
dc.contributor.department	機械工程學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	不平衡數據集在機器學習的許多應用場景中是一個普遍存在的問題。如何在訓練集的某些類擁有較多的樣本，而某些類只有相對較少的樣本情況下，解決傳統分類器對少類分類失準的問題已成為機器學習目前面臨的一個挑戰。本研究從算法層面(algorithm level)出發，提出一種結合貝葉斯分類器與支持向量機的新模型，即重新平衡支持向量機(SVM-rebalancing)。在這個學習過程中，重新平衡參數(分類權值參數)提供了一個使各類別的分類權值趨於平衡的協調，並藉由求解重新平衡規劃問題使少類樣本獲得有效的可識別性。本研究次要旨在瞭解造成錯誤分類的可能來源是否不僅是不平衡，還是尚有其他因素導致這些誤分類。鑒於模式識別的純預測模型缺乏可視化理解訊息，像類神經網路和支持向量機這樣的黑盒方法(black box)無法提供可解釋的模型，造成了對誤分類的原因無法探究其根源。因此，本研究提出對核函數進行多元尺度變換的前處理以來建構低維數據的表示空間。在實踐中，可視化辨識模型表明數據的重疊分布、多峰分布、偏態分布也是造成分類器的分類性能不佳的其他原因。最後，本研究給予一項建議是:採用這樣的可視化辨識模型策略能夠告訴我們數據結構所出現的問題，一旦想再繼續提升分類器的性能時就能往該方面進行後續改良。	zh_TW
dc.description.abstract	Imbalanced data is a common problem in many application domains of machine learning. When some classes in the training set have many samples and others have relatively few, traditional classifiers tend to misclassify minority-class samples, and solving this has become a challenge for machine learning. Working at the algorithm level, this study proposes a new model that combines a Bayesian classifier with a support vector machine (SVM), namely SVM-rebalancing. During learning, the rebalance parameter (a classification weight parameter) coordinates the classification weights of the classes so that they tend toward balance, and solving the rebalancing programming problem makes minority-class samples effectively identifiable. A secondary aim is to understand whether imbalance is the only possible source of misclassification or whether other factors also contribute. Because purely predictive pattern recognition models lack visually interpretable information, black-box methods such as neural networks and support vector machines cannot provide an interpretable model, which makes it impossible to trace misclassifications to their root causes. This study therefore proposes a pre-processing step that applies multidimensional scaling to the kernel function to construct a low-dimensional representation space for the data. In practice, the visual recognition model shows that overlapping, multimodal, and skewed data distributions are further causes of poor classifier performance. Finally, this study suggests that such a visual recognition model strategy can reveal problems in the data structure, so that subsequent efforts to improve classifier performance can be directed at them.	en_US
DC.subject	重新平衡支持向量機	zh_TW
DC.subject	可視化辨識模型	zh_TW
DC.subject	多元尺度變換	zh_TW
DC.subject	SVM-rebalancing	en_US
DC.subject	visual recognition model	en_US
DC.subject	multidimensional scaling	en_US
DC.title	不平衡數據的機器學習發展暨可視化辨識模型之應用	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Machine learning development of imbalanced data and application of visual recognition model	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 953403018: full metadata record