A Comparison among Three Logistic Regression Models under Positive and Unlabeled Data

DC field	Value	Language
DC.contributor	數學系	zh_TW
DC.creator	莊渝涵	zh_TW
DC.creator	Yu-Han Jhuang	en_US
dc.date.accessioned	2021-08-19T07:39:07Z
dc.date.available	2021-08-19T07:39:07Z
dc.date.issued	2021
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=108221015
dc.contributor.department	數學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	大數據時代的來臨，我們常面臨資料的標記品質不佳的情況。在傳統監督學習的二分類問題中，資料中含有部分的錯誤標記導致其訓練出的模型產生偏差。其中有一種含有錯誤標記的資料類型為僅含有正確標記的正標籤(positive)資料以及混雜大量負標籤(negative)及少量正標籤的未標記(unlabeled)資料，簡稱PU類型資料。在本文中我們比較文獻中所提出的三種邏輯斯迴歸的變型，分別是c-邏輯斯迴歸、ξ-邏輯斯迴歸以及γ-邏輯斯迴歸在PU類型資料的表現。我們藉由模擬實驗來比較這三種方法在PU類型資料下的參數估計準確性及分類正確性。實際資料分析使用UCI Machine Learning Repository中的兩筆資料集，分別是Wisconsin乳癌的資料集(WDBC)和Pima Indians糖尿病的資料集(Pima)。	zh_TW
dc.description.abstract	With the advent of the big data era, we often encounter data whose labels are of poor quality. In binary classification problems of traditional supervised learning, mislabeled observations in the data bias the trained model. One type of mislabeled data consists of correctly labeled positive data together with unlabeled data, which mix a large number of negative observations with a small number of positive ones; such data are referred to as positive and unlabeled (PU) data. In this thesis, we compare three logistic regression variants proposed in the literature, namely c-logistic regression, ξ-logistic regression, and γ-logistic regression, on PU data. Through simulation experiments, we compare the parameter estimation accuracy and the classification accuracy of the three methods under PU data. For the real-data analysis, we apply the three methods to two data sets from the UCI Machine Learning Repository: the Wisconsin breast cancer diagnostic data set (WDBC) and the Pima Indians diabetes data set (Pima).	en_US
DC.subject	邏輯斯迴歸	zh_TW
DC.subject	錯標機制	zh_TW
DC.subject	參數估計	zh_TW
DC.subject	PU類型資料	zh_TW
DC.subject	穩健估計	zh_TW
DC.subject	Logistic regression	en_US
DC.subject	Mislabeling mechanism	en_US
DC.subject	Parameter estimation	en_US
DC.subject	Positive and unlabeled data	en_US
DC.subject	Robust estimation	en_US
DC.title	在PU類型資料之下比較三種邏輯斯迴歸模型	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	A Comparison among Three Logistic Regression Models under Positive and Unlabeled Data	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Master's/doctoral thesis 108221015: complete metadata record