Thesis 984203039: Complete Metadata Record

DC field  Value  Language
DC.contributor  Department of Information Management  zh_TW
DC.creator  張哲瑋  zh_TW
DC.creator  Che-wei Chang  en_US
dc.date.accessioned  2011-7-22T07:39:07Z
dc.date.available  2011-7-22T07:39:07Z
dc.date.issued  2011
dc.identifier.uri  http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=984203039
dc.contributor.department  Department of Information Management  zh_TW
DC.description  National Central University  zh_TW
DC.description  National Central University  en_US
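dc.description.abstract  Instance selection is a technique in the field of data mining, yet despite today's continually growing data volumes, few studies focus on it. This study proposes an instance selection algorithm, called SVOIS, developed from the concept of the support vector machine (SVM). It performs instance selection specifically for text classification and is compared with several well-known instance selection algorithms: ENN, IB3, ICF, and DROP3. The choice of classifiers also differs from those methods: this thesis uses not only k-NN but also SVM, a binary classifier, as a basis for comparison. Because training an SVM often takes a long time, and that time grows with the number of instances, we expect that SVOIS will not only help SVM but may also benefit k-NN more than other instance selection algorithms do. Finally, experiments are conducted on binary-class text datasets, and the other algorithms are also implemented for comparison, to verify that SVOIS outperforms them. The experimental results show that, on text datasets, the accuracy after instance selection with SVOIS is higher than that of the other algorithms, and that SVOIS also reduces the amount of data.  zh_TW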
dc.description.abstract  Since the number and size of online information sources are increasing rapidly, instance selection has become one of the major techniques for managing text data. In this paper, a novel instance selection method, Support Vector Oriented Instance Selection (SVOIS), is proposed for text classification. SVOIS attempts to find the support vectors in the original feature space through a linear regression plane, where the instances to be selected as support vectors need to satisfy two criteria. The first is that the distance between an original instance and its class center must be smaller than a pre-defined value. The instances fulfilling this criterion are regarded as the regression data and are used to identify a regression plane. The second criterion is based on the distances between the regression data and the regression plane, which play a role similar to the margin of an SVM. These distances must be larger than a pre-defined value, and the regression data fulfilling this criterion are called support vectors and are used for classifier training and classification. The two distance thresholds should be neither so large that all instances are selected nor so small that very few support vectors remain. This paper compares SVOIS with four state-of-the-art algorithms: ENN, IB3, ICF, and DROP3. The experimental results on the TechTC-100 dataset show that SVOIS allows SVM and k-NN to provide similar or better classification accuracy than the baseline without instance selection, and that it also outperforms the state-of-the-art algorithms in terms of effectiveness and efficiency.  en_US
DC.subject  machine learning  zh_TW
DC.subject  support vector machine  zh_TW
DC.subject  text classification  zh_TW
DC.subject  data reduction  zh_TW
DC.subject  instance selection  zh_TW
DC.subject  support vector machines  en_US
DC.subject  machine learning  en_US
DC.subject  text classification  en_US
DC.subject  data reduction  en_US
DC.subject  instance selection  en_US
DC.title  Support Vector Oriented Instance Selection for Text Classification  zh_TW
dc.language.iso  zh-TW  zh-TW
DC.title  Support Vector Oriented Instance Selection for Text Classification  en_US
DC.type  Master's/doctoral thesis  zh_TW
DC.type  thesis  en_US
DC.publisher  National Central University  en_US
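
The two selection criteria described in the English abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how the regression plane is fitted, so the sketch assumes a least-squares fit of binary labels (coded -1/+1) on the features, and the function name `svois_sketch` and the parameters `center_radius` and `plane_margin` are hypothetical names for the two pre-defined values.

```python
import numpy as np


def svois_sketch(X, y, center_radius, plane_margin):
    """Return indices of instances kept by the two distance criteria.

    Assumes a binary problem; the plane-fitting step is an assumption,
    not the method documented in the thesis.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)

    # Criterion 1: keep instances whose distance to their own class
    # center is smaller than center_radius (the "regression data").
    dist_to_center = np.empty(len(X))
    for label in labels:
        mask = y == label
        center = X[mask].mean(axis=0)
        dist_to_center[mask] = np.linalg.norm(X[mask] - center, axis=1)
    reg_idx = np.flatnonzero(dist_to_center < center_radius)

    # Assumed step: fit a plane w.x + b = 0 by least squares, using
    # -1 for the first label and +1 for the other as regression targets.
    Xr = X[reg_idx]
    targets = np.where(y[reg_idx] == labels[0], -1.0, 1.0)
    A = np.hstack([Xr, np.ones((len(Xr), 1))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    w, b = coef[:-1], coef[-1]

    # Criterion 2: keep regression data whose distance to the plane
    # is larger than plane_margin (the selected "support vectors").
    dist_to_plane = np.abs(Xr @ w + b) / np.linalg.norm(w)
    return reg_idx[dist_to_plane > plane_margin]
```

Raising `center_radius` admits more instances into the regression data, while raising `plane_margin` discards more of them, which mirrors the abstract's remark that both thresholds need to be tuned so that neither all instances nor very few are selected.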
