Full Metadata Record for Thesis 106423053

DC Field | Value | Language
dc.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
dc.creator | 葉奇瑋 | zh_TW
dc.creator | Chi-Wei Yeh | en_US
dc.date.accessioned | 2019-07-23T07:39:07Z
dc.date.available | 2019-07-23T07:39:07Z
dc.date.issued | 2019
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106423053
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
dc.description | 國立中央大學 (National Central University) | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | Finding suitable ways to organize chaotic data and to extract valuable information from it has become the vision of the big data era. As data grow in quantity and complexity, data scientists no longer focus on how well a model trains; instead they hope to use different computation methods or operating architectures to find clues hidden in the data, and ultimately to turn these findings into effective ways of improving numerical prediction. For numerical prediction on a dataset, linear regression, neural networks, and support vector regression are the more common ways to build regression prediction models. When training a regression model, better numerical predictions are pursued not only by tuning the model's parameters but also, during data preprocessing, by using feature selection to filter out less relevant or redundant features and clustering to organize the data into coherent groups. This study takes a hierarchical structure as its experimental prototype and extends it, applying layered clustering and feature selection during the preprocessing of datasets with many records and many features. To make the experimental comparisons clear, the study uses datasets of different domains, sizes, and feature counts, combining different clustering, feature selection, and regression algorithms within the hierarchical structure to train models and compute numerical prediction errors. Analysis and comparison of the results across algorithm combinations show that the proposed structure lowers the root mean square error or mean absolute error by 0 to 1 relative to regression alone or clustering plus regression. In addition, the experiments identify the hierarchical clustering methods (K-means, C-means), feature selection methods (Mutual Information, Information Gain), and regression method (Multi-layer Perceptron) that perform best on average across the different datasets. | zh_TW
dc.description.abstract | The vision of the big data era is to find suitable ways to sort numerous, messy data and to extract valuable information from them. As the quantity and complexity of data increase, data scientists no longer focus on the strengths and weaknesses of model training but concentrate on using different computation methods or operating architectures to find clues in the data, ultimately seeking ways to improve numerical prediction accuracy from these findings. For numerical prediction on a dataset, the common ways to build regression models are linear regression, neural networks, and support vector regression. Pursuing better numerical predictions requires tuning the parameters within each model; beyond that, we use feature selection to remove less relevant or redundant features and clustering to organize the data into groups. In this study, a hierarchical structure is used as the experimental prototype and extended to handle datasets with large numbers of instances and features. To obtain clear comparisons, our study combines different clustering, feature selection, and regression algorithms to train models and compute numerical prediction errors on datasets from different domains with varying sizes and feature counts. By analyzing and comparing the experimental results of multiple algorithm combinations, our study finds that hierarchical classification and regression with feature selection improves root mean square error and mean absolute error over regression alone or hierarchical classification and regression without feature selection. In addition, our study finds that the hierarchical structure using K-means and C-means clustering, Mutual Information and Information Gain feature selection, and Multi-layer Perceptron regression achieves the best average performance across the different datasets. | en_US
dc.subject | 階層式架構 (Hierarchical Structure) | zh_TW
dc.subject | 線性回歸 (Linear Regression) | zh_TW
dc.subject | 特徵選取 (Feature Selection) | zh_TW
dc.subject | 分類 (Classification) | zh_TW
dc.subject | 分群 (Clustering) | zh_TW
dc.subject | K-means | en_US
dc.subject | C-means | en_US
dc.subject | Gaussian Maximum | en_US
dc.subject | Chi Square | en_US
dc.subject | Mutual Information | en_US
dc.subject | Information Gain | en_US
dc.subject | Support Vector Machine | en_US
dc.subject | Multilayer Perceptron | en_US
dc.subject | Support Vector Regression | en_US
dc.subject | Linear Regression | en_US
dc.title | Hierarchical Classification and Regression with Feature Selection | en_US
dc.language.iso | en_US | en_US
dc.type | 博碩士論文 (thesis) | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
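
The abstract above describes a three-stage pipeline: cluster the training data, select features within each cluster, then fit a regressor per cluster and route each test instance through its nearest cluster's selector and model. Below is a minimal illustrative sketch of that idea, not the thesis's actual code: it assumes scikit-learn, uses the California housing data as a stand-in for the thesis's datasets, and the cluster count (3) and retained feature count (5) are arbitrary choices. K-means, Mutual Information, and a Multi-layer Perceptron are used here because the abstract names them as the best-performing combination on average.

```python
# Illustrative sketch (assumptions noted above), not the thesis's implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import fetch_california_housing
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: partition the training data (K-means; the thesis also tests C-means).
k = 3  # assumed cluster count
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train_s)

# Steps 2-3: within each cluster, keep the most informative features
# (mutual information; Information Gain is the thesis's other option)
# and train a Multi-layer Perceptron regressor on that feature subset.
selectors, models = {}, {}
for c in range(k):
    mask = km.labels_ == c
    sel = SelectKBest(mutual_info_regression, k=5).fit(X_train_s[mask], y_train[mask])
    mdl = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
    mdl.fit(sel.transform(X_train_s[mask]), y_train[mask])
    selectors[c], models[c] = sel, mdl

# Prediction: route each test row to its cluster's selector and model.
test_clusters = km.predict(X_test_s)
y_pred = np.empty_like(y_test)
for c in range(k):
    mask = test_clusters == c
    if mask.any():
        y_pred[mask] = models[c].predict(selectors[c].transform(X_test_s[mask]))

# The thesis evaluates with root mean square error and mean absolute error.
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
print("MAE :", mean_absolute_error(y_test, y_pred))
```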
