NCU Institutional Repository (中大機構典藏): Item 987654321/48987


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/48987


    Title: Support Vector Oriented Instance Selection for Text Classification
    Authors: Che-wei Chang (張哲瑋)
    Contributors: Institute of Information Management
    Keywords: machine learning; support vector machines; text classification; data reduction; instance selection
    Date: 2011-07-22
    Date of Upload: 2012-01-05 15:12:16 (UTC+8)
    Abstract: Instance selection is a technique in the data mining field, yet despite today's ever-growing data volumes it has received relatively little attention. This study proposes an instance selection algorithm, SVOIS, developed from the concept of the support vector machine (SVM) and targeted at text classification. SVOIS is compared with several well-known instance selection algorithms: ENN, IB3, ICF, and DROP3. The choice of classifiers also differs from these prior methods: this thesis uses not only k-NN but also the binary classifier SVM as the basis for comparison. Because SVM training often takes a long time, and that time grows with the number of instances, we expect SVOIS not only to help SVM but possibly to help k-NN more than the other instance selection algorithms do. Finally, experiments are conducted on binary-class text datasets, with the other algorithms also implemented for comparison, to verify that SVOIS outperforms them. The results show that after instance selection on the text datasets, SVOIS achieves higher classification accuracy than the other algorithms while also reducing the data size.

    Since the volume of online information is increasing rapidly, instance selection has become one of the major techniques for managing text data. In this paper, a novel instance selection method, namely Support Vector Oriented Instance Selection (SVOIS), is proposed for text classification. SVOIS attempts to find the support vectors in the original feature space through a linear regression plane, where the instances to be selected as support vectors need to satisfy two criteria. The first is that the distances between the original instances and their class centers need to be smaller than a pre-defined value. The instances fulfilling this criterion are regarded as the regression data used to identify a regression plane. The second criterion is based on the distances between the regression data and the regression plane, which is analogous to the margin of SVM. In particular, these distances need to be larger than a pre-defined value, and the regression data fulfilling this criterion are called support vectors and are used for classifier training and classification. More specifically, the two distance thresholds should be neither so large that all instances are selected nor so small that very few support vectors remain. This paper compares SVOIS with four state-of-the-art algorithms: ENN, IB3, ICF, and DROP3. The experimental results over the TechTC-100 dataset show that SVOIS allows SVM and k-NN to provide similar or better classification accuracy than the baseline without instance selection, and it also outperforms the state-of-the-art algorithms in terms of effectiveness and efficiency.
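    The abstract describes two geometric selection criteria. The Python sketch below illustrates one plausible reading of those criteria; it is not the authors' implementation. The function name svois_select, the two threshold parameters center_dist_max and plane_dist_min, and the use of the class label as the regression target are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def svois_select(X, y, center_dist_max, plane_dist_min):
    """Sketch of the two SVOIS selection criteria described in the abstract.

    X : (n_samples, n_features) feature matrix of the training texts
    y : (n_samples,) binary class labels, e.g. {0, 1}
    center_dist_max : criterion 1 threshold (distance to the class center)
    plane_dist_min  : criterion 2 threshold (distance to the regression plane)
    Returns the indices of the instances kept as "support vectors".
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    # Criterion 1: keep instances whose distance to their own class center
    # is smaller than a pre-defined value; they become the "regression data".
    centers = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    center_dist = np.array([np.linalg.norm(x - centers[c]) for x, c in zip(X, y)])
    reg_idx = np.flatnonzero(center_dist < center_dist_max)
    X_reg, y_reg = X[reg_idx], y[reg_idx]

    # Fit a linear regression plane over the regression data.
    # Assumption: the class label is used as the regression target.
    plane = LinearRegression().fit(X_reg, y_reg)
    w, b = plane.coef_, plane.intercept_

    # Criterion 2: keep regression instances whose distance to the plane
    # (analogous to the SVM margin) is larger than a pre-defined value.
    plane_dist = np.abs(X_reg @ w + b - y_reg) / np.sqrt(w @ w + 1.0)
    return reg_idx[plane_dist > plane_dist_min]
```

    A classifier such as SVM or k-NN would then be trained only on the returned subset, e.g. SVC(kernel='linear').fit(X[idx], y[idx]); training on the reduced set is where the training-time savings reported for SVM would come from.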
    Appears in Collections: [Institute of Information Management] Theses & Dissertations

    Files in This Item:

    File          Description    Size    Format    Views
    index.html    -              0Kb     HTML      783


    All items in NCUIR are protected by copyright, with all rights reserved.

