NCU Institutional Repository (中大機構典藏), providing theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/65619


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/65619


    Title: 兩階段混合學習法於資料分類之研究; A Two-Stage Hybrid Learning Approach for Effective Pattern Classification
    Author: 游孟綸; You, Mon-loon
    Contributor: Department of Information Management (資訊管理學系)
    Keywords: data mining; instance selection; data reduction; machine learning; support vector machines
    Date: 2014-07-09
    Upload time: 2014-10-15 17:06:23 (UTC+8)
    Publisher: National Central University (國立中央大學)
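    Abstract (translated from the Chinese): Enterprises today often need to find valuable knowledge in very large databases and data warehouses. However, the larger the database, the more noisy data it contains. This noise lowers the accuracy of data mining, and the sheer volume of data increases the time required for knowledge discovery.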
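    Instance selection can filter out some of this noise during data pre-processing and is currently the most commonly used data reduction method. However, different instance selection algorithms select different subsets of the data, and over-selection or under-selection often occurs, which in turn affects the accuracy of data mining. This study therefore proposes a new data pre-processing procedure, TSHLA (Two-Stage Hybrid Learning Approach), and applies it to data classification. Instance selection is first performed on the training set, and separate SVM models are trained on the data that the instance selection algorithm judges to be noisy and on the data it judges to be non-noisy. The test data are then compared by KNN similarity: test data more similar to the noisy set are classified by the model trained on the noisy set, and test data more similar to the non-noisy set are classified by the model trained on the non-noisy set. The aim is to recover samples in the noisy set that were filtered out but are in fact useful. The two sets of predictions are finally merged into the final result.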
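    The experiments are divided into two parts. In both, the instance selection step is tested with three well-performing algorithms: IB3, DROP3, and GA. The first part tests TSHLA on 50 small datasets, with SVM as the classifier. The second part uses large datasets (more than 100,000 instances), again with SVM as the classifier, and compares accuracy against conventional instance selection.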
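The first stage described above, in which instance selection divides the training set into retained and filtered (noisy) instances, can be illustrated with a small sketch. The abstract names IB3, DROP3, and GA but gives no implementation details, so this sketch swaps in Wilson's Edited Nearest Neighbour (ENN) rule as a simpler stand-in. The function names `knn_labels` and `enn_split`, the toy data, and the choice of k are all assumptions made for illustration, not part of the thesis.

```python
from collections import Counter
import math


def knn_labels(point, data, labels, k, skip=None):
    """Return the labels of the k points in `data` nearest to `point`."""
    order = sorted(
        (i for i in range(len(data)) if i != skip),
        key=lambda i: math.dist(point, data[i]),
    )
    return [labels[i] for i in order[:k]]


def enn_split(data, labels, k=3):
    """Split indices into (kept, filtered) using Wilson's ENN rule.

    An instance is filtered (treated as noise) when the majority label of
    its k nearest neighbours, excluding itself, disagrees with its own label.
    ENN is a stand-in here; the thesis uses IB3, DROP3, and GA.
    """
    kept, filtered = [], []
    for i, (x, y) in enumerate(zip(data, labels)):
        neighbours = knn_labels(x, data, labels, k, skip=i)
        majority = Counter(neighbours).most_common(1)[0][0]
        (kept if majority == y else filtered).append(i)
    return kept, filtered


# Toy data (hypothetical): two clusters, plus one point whose label
# disagrees with its neighbourhood.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
y = ["a", "a", "a", "b", "b", "b", "b"]

kept_idx, filtered_idx = enn_split(X, y, k=3)
print("kept:", kept_idx)          # kept: [0, 1, 2, 4, 5, 6]
print("filtered:", filtered_idx)  # filtered: [3]
```

In a conventional pipeline the filtered indices would simply be discarded. TSHLA instead keeps both index lists, so that a separate model can be trained on each subset.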
    Abstract (English version): Enterprises increasingly need to extract knowledge from very large databases. However, such large datasets usually contain a certain amount of noisy data, which is likely to degrade data mining performance. In addition, processing large-scale datasets usually takes a long time.
    Instance selection, a widely used data reduction approach, can filter noisy data out of large datasets. However, different instance selection algorithms applied to datasets from different domains filter out different data as noise, which is likely to result in over-selection or under-selection, since there is no exact definition of an outlier. The quality of data mining results can suffer as a consequence. This thesis therefore proposes a new data pre-processing procedure, TSHLA (Two-Stage Hybrid Learning Approach), for effective data classification. First, instance selection is performed on a given training dataset to separate the noisy data from the non-noisy data, and an individual SVM classifier is trained on each of the two subsets. Then, KNN is used to compare the similarity of the test data to the two subsets. The test data are thereby divided into a noisy and a non-noisy test set, and each set is fed into its corresponding SVM classifier for classification.
    There are two experimental studies in this thesis, and three instance selection algorithms are compared: IB3, DROP3, and GA. The first study is based on 50 small UCI datasets, and the second on large-scale datasets containing more than 100,000 data samples. In addition, the proposed TSHLA is compared with a baseline without instance selection and with one based on the conventional instance selection approach.
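The second stage, routing each test instance to one of two classifiers and merging the predictions, might look roughly like the sketch below. It assumes the noisy/non-noisy flags for the training set have already been produced by some instance selection algorithm. To stay dependency-free it uses a tiny nearest-centroid classifier as a stand-in for the SVM used in the thesis; any object with scikit-learn style `fit`/`predict` methods (for example `sklearn.svm.SVC`) could be passed as `make_model` instead. The names `route`, `tshla_predict`, and `NearestCentroid`, the majority-vote routing rule, and the toy data are assumptions for illustration, since the abstract does not specify these details.

```python
from collections import Counter
import math


class NearestCentroid:
    """Minimal stand-in classifier with fit/predict (the thesis uses SVM)."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid per class: the coordinate-wise mean of its points.
        self.centroids_ = {
            label: tuple(sum(col) / len(pts) for col in zip(*pts))
            for label, pts in groups.items()
        }
        return self

    def predict(self, X):
        return [min(self.centroids_,
                    key=lambda c: math.dist(x, self.centroids_[c]))
                for x in X]


def route(x, X_train, is_noisy, k=3):
    """True when most of x's k nearest training points were flagged noisy."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    votes = Counter(is_noisy[i] for i in order[:k])
    return votes[True] > votes[False]


def tshla_predict(X_train, y_train, is_noisy, X_test, make_model, k=3):
    """Train one model per subset, route each test point, merge predictions."""
    models = {}
    for flag in (True, False):
        idx = [i for i, f in enumerate(is_noisy) if f == flag]
        models[flag] = make_model().fit([X_train[i] for i in idx],
                                        [y_train[i] for i in idx])
    routes = [route(x, X_train, is_noisy, k) for x in X_test]
    preds = [models[r].predict([x])[0] for x, r in zip(X_test, routes)]
    return preds, routes


# Toy data (hypothetical): six instances kept by instance selection and
# four flagged as noisy, whose labels follow a different local pattern.
X_train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2),
           (1.0, 1.0), (1.2, 1.1), (1.1, 1.2),
           (3.0, 0.0), (3.2, 0.1), (3.0, 1.0), (3.2, 1.1)]
y_train = ["a", "a", "a", "b", "b", "b", "b", "b", "a", "a"]
is_noisy = [False] * 6 + [True] * 4

X_test = [(0.1, 0.1), (3.1, 0.0), (3.1, 1.2)]
preds, routes = tshla_predict(X_train, y_train, is_noisy, X_test,
                              NearestCentroid)
print(preds)   # ['a', 'b', 'a']
print(routes)  # [False, True, True]
```

In this toy run the last two test points fall nearest to instances flagged as noisy, so they are classified by the model trained on the noisy subset rather than by the model trained on the retained data.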
    Appears in Collections: [Graduate Institute of Information Management] Theses and Dissertations

    Files in This Item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    425


    All items in NCUIR are protected by the original copyright.

