Abstract (English)
In the financial, telecommunications, and medical industries, classification problems are ubiquitous. For example, a bank may predict a depositor's credit rating from input variables such as age, annual income, education, and repayment history, where the response is qualitative. More and more deep learning models are being developed for such purposes, reflecting the importance of classification problems. On the other hand, as data sizes grow rapidly while computing resources remain limited, various data reduction methods have been proposed. In this thesis, we use the concept of data reduction to develop a classification predictor. We illustrate the proposed method through simulations and real examples.
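To make the idea concrete, the sketch below illustrates the general workflow the abstract describes: reduce a large data set to a small subset, then fit a classifier on the subset only. This is a minimal illustration, not the thesis's actual method; uniform random subsampling and a nearest-centroid classifier are stand-ins chosen for simplicity, and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class "big" data: class 0 centered at -1, class 1 at +1.
n, p = 10_000, 4
y_full = rng.integers(0, 2, size=n)
X_full = rng.normal(size=(n, p)) + (2 * y_full[:, None] - 1)

# Data reduction step: keep only a small uniform subsample.
m = 200
idx = rng.choice(n, size=m, replace=False)
X_sub, y_sub = X_full[idx], y_full[idx]

# Fit a simple nearest-centroid classifier on the reduced data alone.
centroids = np.array([X_sub[y_sub == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    # Assign each point to the class whose centroid is closest.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Evaluate the subset-trained predictor on the full data set.
acc = (predict(X_full) == y_full).mean()
print(f"accuracy on full data: {acc:.3f}")
```

Despite training on only 2% of the observations, the classifier recovers the well-separated class structure; more sophisticated reduction schemes (e.g., the subdata-selection methods in the references) aim to choose the subset more informatively than uniform sampling does.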
References
Chenlu Shi and Boxin Tang (2021). Model-robust subdata selection for big data, Journal of Statistical Theory and Practice, 15(82).
Elizabeth D. Schifano, Jing Wu, Chun Wang, Jun Yan, and Ming-Hui Chen (2016). Online updating of statistical inference in the big data setting, Technometrics, 58(3), 393–403.
Erchin Serpedin, Thomas Chen, and Dinesh Rajan (2012). Mathematical Foundations for Signal Processing, Communications, and Networking, CRC Press, 381–385.
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (2013). An Introduction to Statistical Learning: with Applications in R, Springer, New York, NY.
HaiYing Wang, Min Yang, and John Stufken (2018). Information-based optimal subdata selection for big data linear regression, Journal of the American Statistical Association, 114(525), 393–405.
HaiYing Wang, Rong Zhu, and Ping Ma (2018). Optimal subsampling for large sample logistic regression, Journal of the American Statistical Association, 113(522), 829–844.
Leo Breiman (2001). Random forests, Machine Learning, 45, 5–32.
Lin Wang, Jake Elmstedt, Weng Kee Wong, and Hongquan Xu (2021). Orthogonal subsampling for big data linear regression, Annals of Applied Statistics, 15(3), 1273–1290.
Nan Lin and Ruibin Xi (2011). Aggregated estimating equation estimation, Statistics and Its Interface, 4(1), 73–83.
Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan (2006). Sampling algorithms for l2 regression and applications, SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1127–1136.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition), Springer-Verlag.
V. Roshan Joseph and Akhil Vakayil (2021). SPlit: an optimal method for data splitting, Technometrics, 64(2), 166–176.
V. Roshan Joseph and Simon Mak (2021). Supervised compression of big data, Statistical Analysis and Data Mining, 14(3), 217–229.
William Fithian and Trevor Hastie (2014). Local case-control sampling: efficient subsampling in imbalanced data sets, Annals of Statistics, 42(5), 1693–1724.
Yaqiong Yao and HaiYing Wang (2020). A review on optimal subsampling methods for massive datasets, Journal of Data Science, 19(1), 151–172.
Yaqiong Yao and Ying Wang (2021). A selective review on statistical techniques for big data, Modern Statistical Methods for Health Research, 223–245.
Zizhu Fan, Yong Xu, and David Zhang (2011). Local linear discriminant analysis framework using sample neighbors, IEEE Transactions on Neural Networks, 22(7), 1119–1132.