A Comparison of Single and Hybrid Feature Selection Methods

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/74751

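Title:	A Comparison of Single and Hybrid Feature Selection Methods (基於單一與混合特徵選取方法之比較)
Author:	張櫻馨; Ying-Hsin Chang
Contributor:	Department of Information Management
Keywords:	Data Mining; Machine Learning; Information Fusion; Feature Selection; Support Vector Machines; KDD
Date:	2017-07-04
Upload time:	2017-10-27 14:38:17 (UTC+8)
Publisher:	National Central University (國立中央大學)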
Abstract:	Today we face not only the problem of big data but also the need for timely information. To mine data and find interesting patterns under limited time and resources, the first thing to consider is data pre-processing: applying feature selection to the data before it is passed to a classifier can raise the model's prediction accuracy and, in turn, help users make decisions. This thesis examines feature selection as a pre-processing step that removes irrelevant and redundant features (attributes of the data) from a given dataset. In other words, a feature selection algorithm extracts the useful features, or the values sufficient to represent the whole dataset, and these features are reassembled into a new dataset that is then fed to a support vector machine (SVM) classifier, with the aim of improving both the accuracy and the running efficiency of the model. Most existing work uses single (competitive) feature selection; this thesis introduces the concept of information fusion to combine the results of multiple feature selection methods. The experiments use 28 complete datasets drawn from the UCI public datasets and other public datasets to compare single (competitive) feature selection with hybrid feature selection. They further examine how datasets of different dimensionality and type respond to each approach, in order to determine whether hybrid feature selection based on information fusion helps across the various kinds of datasets and whether it substantially improves the classification accuracy of the prediction model.
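
The idea of fusing several feature selection results can be illustrated with a minimal sketch. The abstract does not say which fusion rules the thesis applies, so the three rules below (union, intersection, majority vote) are assumptions chosen for illustration, and the selected column indices are hypothetical rather than results from the thesis.

```python
from collections import Counter


def fuse_union(subsets):
    """Keep every feature chosen by at least one selector."""
    return sorted(set().union(*subsets))


def fuse_intersection(subsets):
    """Keep only the features chosen by every selector (needs >= 1 subset)."""
    return sorted(set.intersection(*map(set, subsets)))


def fuse_majority(subsets):
    """Keep the features chosen by more than half of the selectors."""
    votes = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, n in votes.items() if n > len(subsets) / 2)


# Hypothetical outputs of three single feature selection methods,
# given as column indices of the original dataset (not thesis data).
selected_a = [0, 2, 3, 5]
selected_b = [2, 3, 4]
selected_c = [1, 2, 5]
subsets = [selected_a, selected_b, selected_c]

union = fuse_union(subsets)                # most permissive rule
intersection = fuse_intersection(subsets)  # most restrictive rule
majority = fuse_majority(subsets)          # middle ground

print("union:", union)
print("intersection:", intersection)
print("majority:", majority)
```

In a pipeline like the one the abstract describes, the fused index list would be used to keep only those columns of the dataset before training the SVM classifier. Union tends to keep more features, while intersection keeps fewer and may discard useful ones.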
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	388

All items in NCUIR are protected by their original copyright.
