

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81636


    Title: Machine learning development of imbalanced data and application of visual recognition model (不平衡數據的機器學習發展暨可視化辨識模型之應用)
    Author: Hsu, Che-Chang (許哲彰)
    Contributor: Department of Mechanical Engineering
    Keywords: SVM-rebalancing; visual recognition model; multidimensional scaling
    Date: 2019-07-24
    Upload time: 2019-09-03 16:34:34 (UTC+8)
    Publisher: National Central University
    Abstract: Imbalanced data sets are a common problem in many machine learning applications. When some classes in the training set have many samples and others have relatively few, traditional classifiers tend to misclassify minority-class samples, and addressing this has become a challenge for machine learning. Working at the algorithm level, this study proposes a new model that combines a Bayesian classifier with a support vector machine (SVM), namely SVM-rebalancing. During learning, the rebalance parameter (a classification weight parameter) coordinates the classification weights of the classes so that they tend toward balance, and solving the rebalancing programming problem makes minority samples effectively identifiable. A secondary aim is to determine whether imbalance is the only source of misclassification or whether other factors also contribute. Purely predictive pattern recognition models lack visual, interpretable information: black-box methods such as neural networks and support vector machines cannot provide an interpretable model, so the root causes of misclassification cannot be investigated. This study therefore proposes applying multidimensional scaling to the kernel function as a pre-processing step to construct a visual, low-dimensional representation space for the data. In practice, the visual recognition model shows that overlapping, multimodal, and skewed data distributions are further causes of poor classifier performance. Finally, the study suggests that such a visual recognition model strategy can reveal problems in the data structure, so that later efforts to improve classifier performance can target those problems.
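
    Illustrative note (not part of the thesis record): the abstract does not give the exact procedure for applying multidimensional scaling to the kernel function. The sketch below shows one plausible reading, assuming an RBF kernel, the standard kernel-induced distance d_ij^2 = K_ii + K_jj - 2 K_ij, and classical (Torgerson) MDS. The function names `rbf_kernel` and `kernel_mds`, the `gamma` values, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_mds(K, n_components=2):
    """Classical MDS on the feature-space distances induced by kernel matrix K."""
    n = K.shape[0]
    # Squared kernel-induced distances: d_ij^2 = K_ii + K_jj - 2 K_ij.
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    # Double centering turns squared distances into an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the largest eigenpairs; clip small negative eigenvalues to zero.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales


# Synthetic imbalanced toy data (an assumption, not the thesis data):
# 40 majority-class points and 5 minority-class points in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 5)),
               rng.normal(1.5, 1.0, size=(5, 5))])
y = np.array([0] * 40 + [1] * 5)

# 2-D coordinates that can be scatter-plotted and colored by class label y.
Z = kernel_mds(rbf_kernel(X, gamma=0.1))
```

    Plotting the two columns of `Z` colored by `y` gives a low-dimensional view in which class overlap, multiple modes, or skew may become visible. Whether the thesis uses this particular variant of MDS is not stated in the abstract.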
    Appears in collections: [Graduate Institute of Mechanical Engineering] Theses and Dissertations

    Files in this item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     134


    All items in NCUIR are protected by original copyright.
