Master's/Doctoral Thesis 108221022: Detailed Record




Name  Chen, Chun-Ting (陳俊廷)    Department  Mathematics
Thesis Title  A Comparison among Subsampling Methods for Logistic Regression
(邏輯斯迴歸的子取樣方法之比較)
Related Theses
★ New insights on "A semi-parametric model for wearable sensor-based physical activity monitoring data with informative device wear"
★ A parametric model for wearable sensor-based physical activity monitoring data with informative device wear
★ Analysis of variance for functional data via random-projection dimension reduction, with an application to wearable device data
★ Comparing three logistic regression models for positive-unlabeled (PU) data
★ Two-step analysis of covariance for functional data, with an application to wearable device data
★ A test of independence between two random fields with spatio-temporal effects
★ Model selection for Kronecker envelope principal component analysis and its applications
  1. The author has agreed to make the electronic full text of this thesis openly available immediately.
  2. The open-access electronic full text is licensed for personal, non-profit academic research only: searching, reading, and printing.
  3. Please observe the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract  To make inferences about or predictions for a binary variable of interest, a common approach is to fit a logistic regression model with that variable as the response. When labeling the response incurs extra cost and only a small portion of the samples can be labeled under a limited budget, how to select the subsample to be labeled so that the logistic regression model can be estimated efficiently becomes an important issue. The main purpose of this thesis is to address this subsampling problem, efficiently estimating the parameters when the explanatory variables are known but the responses are unlabeled. We first review the subsampling methods proposed by Wang et al. (2018) and Hsu et al. (2019). We then propose variants of their methods, motivated by our setting and by optimal design theory, which are expected to be more efficient in this framework. We compare the performance of these methods through simulation studies and a real-world data analysis.
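The record does not reproduce any of the algorithms, but the optimal-design idea mentioned in the abstract can be illustrated with a small sketch. The Python snippet below is a hypothetical illustration of greedy D-optimal selection for logistic regression given a pilot coefficient estimate; it is not the thesis's GATED or mEMSE procedure, and the function name greedy_d_optimal_subsample, the ridge term, and the demo values are invented for this example.

import numpy as np

def greedy_d_optimal_subsample(X, beta_pilot, n_label, ridge=1e-6):
    """Greedily pick n_label rows of X for labeling so that the logistic
    Fisher information matrix at beta_pilot has (approximately) maximal
    determinant (D-optimality).

    By the matrix determinant lemma, adding a point x with weight w = p(1-p)
    multiplies det(M) by 1 + w * x' M^{-1} x, so each greedy step takes the
    remaining candidate with the largest w_i * x_i' M^{-1} x_i.
    """
    n, d = X.shape
    assert n_label <= n
    p = 1.0 / (1.0 + np.exp(-X @ beta_pilot))      # fitted success probabilities
    w = p * (1.0 - p)                              # logistic information weights
    M = ridge * np.eye(d)                          # small ridge keeps M invertible
    selected = []
    available = np.ones(n, dtype=bool)
    for _ in range(n_label):
        M_inv = np.linalg.inv(M)
        gain = w * np.einsum("ij,jk,ik->i", X, M_inv, X)   # w_i * x_i' M^{-1} x_i
        gain[~available] = -np.inf
        i = int(np.argmax(gain))
        selected.append(i)
        available[i] = False
        M = M + w[i] * np.outer(X[i], X[i])        # update the information matrix
    return np.array(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5000, 3))             # known explanatory variables
    beta_pilot = np.array([0.5, -1.0, 0.2])        # pilot estimate (hypothetical)
    idx = greedy_d_optimal_subsample(X, beta_pilot, n_label=100)
    print("selected indices:", idx[:10], "...")

Because only the explanatory variables and a pilot estimate enter this criterion, such a selection can be carried out before any additional responses are labeled, which matches the setting described in the abstract.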
Keywords ★ A-optimality
★ D-optimality
★ Logistic regression
★ Subsampling
Table of Contents  Abstract (in Chinese)
Abstract (in English)
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
2. Methods
2.1 The greedy active learning algorithm (GATE)
2.2 The greedy active learning algorithm under D-optimal design (GATED)
2.3 Minimum mean squared error subsampling (mMSE)
2.4 Minimum expected mean squared error subsampling (mEMSE)
2.5 Comparison of the methods
3. Simulation Studies
3.1 Simulation settings
3.2 Results
4. Real Data Analysis
5. Conclusion
References
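Sections 2.3 and 2.4 of the outline above refer to A-optimality-motivated (minimum-MSE-type) subsampling in the spirit of Wang et al. (2018). The sketch below is only a rough, generic illustration of that idea: the exact probabilities and the two-step weighting in Wang et al. (2018) may differ, their setting assumes the labels of the full sample are already available (unlike the label-free setting of this thesis), and the function name mmse_style_subsample and its defaults are invented for this example.

import numpy as np

def mmse_style_subsample(X, y, beta_pilot, n_label, rng=None):
    """Draw a subsample with probabilities motivated by an A-optimality
    (minimum-MSE-type) criterion: points with large residual |y - p| and a
    large leverage-like norm ||M^{-1} x|| are treated as more informative
    for the logistic MLE. Generic sketch, not the exact published algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = 1.0 / (1.0 + np.exp(-X @ beta_pilot))      # pilot fitted probabilities
    w = p * (1.0 - p)                              # logistic information weights
    M = (X * w[:, None]).T @ X / len(X)            # information-matrix proxy
    M_inv = np.linalg.inv(M)
    score = np.abs(y - p) * np.linalg.norm(X @ M_inv, axis=1)
    prob = score / score.sum()                     # sampling probabilities
    return rng.choice(len(X), size=n_label, replace=True, p=prob)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5000, 3))
    beta_true = np.array([1.0, -0.5, 0.8])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    beta_pilot = np.zeros(3)                       # crude pilot for illustration
    idx = mmse_style_subsample(X, y, beta_pilot, n_label=200, rng=rng)
    print("subsample size:", len(idx))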
References  [1] Deng, X., Joseph, V. R., Sudjianto, A., and Wu, C. F. J. (2009). Active learning through sequential design, with applications to detection of money laundering. Journal of the American Statistical Association, 104(487), 969-981.
[2] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. Retrieved 2021/07/15, from http://archive.ics.uci.edu/ml.
[3] Ford, I., Torsney, B., and Wu, C. F. J. (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society: Series B (Methodological), 54(2), 569-583.
[4] Hsu, H. L., Chang, Y. C. I., and Chen, R. B. (2019). Greedy active learning algorithm for logistic regression models. Computational Statistics & Data Analysis, 129, 119-134.
[5] Huang, S. H., Huang, M. N. L., and Lin, C. W. (2020). Optimal designs for binary response models with multiple nonnegative variables. Journal of Statistical Planning and Inference, 206, 75-83.
[6] Kabera, G. M., Haines, L. M., and Ndlovu, P. (2015). The analytic construction of D-optimal designs for the two-variable binary logistic regression model without interaction. Statistics, 49(5), 1169-1186.
[7] Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 96, 202-207.
[8] Wang, H., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829-844.
Advisor  Huang, Shih-Hao (黃世豪)    Date of Approval  2021-08-19
