Master's/Doctoral Thesis 106225016: Detailed Record




Name: Zih-Rong Wu (吳姿蓉)    Department: Graduate Institute of Statistics
Thesis title: Subdata Selection: A- and I-optimalities
Related theses
★ Optimal Multi-platform Designs Based on Two Statistical Approaches
★ On the Construction of Multi-Stratum Factorial Designs
★ A Compression-Based Partitioning Estimate Classifier
★ On the Study of Feedforward Neural Networks: an Experimental Design Approach
★ Bayesian Optimization for Hyperparameter Tuning with Robust Parameter Design
★ Unreplicated Designs for Random Noise Exploration
★ Optimal Designs for Simple Directed/Weighted Network Structures
★ Study on the Prediction Capability of Two Aliasing Indices for Gaussian Random Fields
★ Predictive Subdata Selection for Gaussian Process Modeling
★ Optimal Designs on Undirected Network Structures for Network-Based Models
★ Data Reduction for Subsample in Gaussian Process
★ Gaussian Process Modeling with Weighted Additive Kernels
Full text: not open to the public (access permanently restricted)
Abstract (Chinese): With advances in technology, the sizes of datasets are growing exponentially. Computing tools have also improved, but their progress is dwarfed by the increase in data volume. Efficient statistical and computational tools for analyzing big data are therefore urgently needed, and this thesis addresses how to extract important information from data at limited cost. Consider linear regression with n observations and p covariates. For n ≫ p, existing approaches draw random subsamples from the full data; however, under the linear regression model these methods remain computationally demanding. Wang et al. (2018) proposed an alternative called the information-based optimal subdata selection (IBOSS) method, whose idea is to select a subsample of small size that preserves most of the information in the full data. In this thesis, we adopt the A-optimality criterion, which seeks to minimize the average variance of the estimators of the regression coefficients, and the I-optimality criterion, which seeks to minimize the average prediction variance over the design space.
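The abstract above cites the IBOSS method of Wang et al. (2018) [10]. As background only, here is a minimal Python sketch of the D-optimality-motivated selection rule from that paper, on which this thesis builds; the function name iboss_select and the NumPy implementation details are illustrative assumptions, not code from the thesis.

import numpy as np

def iboss_select(X, k):
    # Sketch of the D-optimality IBOSS rule of Wang et al. (2018):
    # for each covariate in turn, keep the r = k/(2p) not-yet-selected
    # rows with the smallest values and the r rows with the largest values.
    n, p = X.shape
    r = k // (2 * p)                 # points taken from each tail of each covariate
    selected = np.zeros(n, dtype=bool)
    for j in range(p):
        remaining = np.flatnonzero(~selected)
        order = remaining[np.argsort(X[remaining, j])]
        selected[order[:r]] = True   # r smallest values of covariate j
        selected[order[-r:]] = True  # r largest values of covariate j
    return np.flatnonzero(selected)

# Illustrative usage on synthetic data: select k = 1000 of n = 100000 rows,
# then fit least squares on the selected subdata only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100000, 5))
idx = iboss_select(X, k=1000)

Wang et al. (2018) implement the tail selection with partial sorting so the cost stays linear in n; the full argsort above is used only for clarity of the sketch.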
Abstract (English): With advances in technology, the sizes of datasets are growing exponentially. While computing power keeps increasing, it is dwarfed by the phenomenal growth in data volume. Efficient statistical and computational tools for analyzing huge datasets are therefore urgently needed, so that one can extract important information from data at limited cost. Consider linear regression with n responses and p covariates. Existing approaches for n ≫ p take random subsamples from the full data; however, for linear regression on the full data, many existing methods remain computationally demanding. Wang et al. (2018) proposed an alternative approach called the information-based optimal subdata selection (IBOSS) method. The idea is to select subdata of a small size that preserves most of the information in the full data. In this thesis, we adopt the A-optimality criterion, which seeks to minimize the average variance of the estimators of the regression coefficients, and the I-optimality criterion, which seeks to minimize the average prediction variance over the design space.
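To make the two criteria concrete, here is a minimal formulation under the standard linear model; the notation (X_S for the model matrix of subdata S, \mathcal{X} for the design space, \mu for a measure on it) is assumed for illustration and is not drawn from the thesis itself.

% Model: y = X_S \beta + \varepsilon on subdata S of size k, with Var(\varepsilon) = \sigma^2 I,
% so the least-squares estimator satisfies Var(\hat{\beta}_S) = \sigma^2 (X_S^\top X_S)^{-1}.
% A-optimality: minimize the average variance of the coefficient estimators.
\min_{S \,:\, |S| = k} \ \operatorname{tr}\!\left[ (X_S^\top X_S)^{-1} \right]
% I-optimality: minimize the average prediction variance over the design space \mathcal{X}.
\min_{S \,:\, |S| = k} \ \int_{\mathcal{X}} x^\top (X_S^\top X_S)^{-1} x \,\mathrm{d}\mu(x)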
Keywords:
★ Linear regression
★ Optimal design
★ Informative subdata selection
★ Particle swarm optimization
Table of Contents
1 Introduction
2 Literature review
  2.1 Linear model
  2.2 Subsampling-based methods
  2.3 Information-based method
3 Subdata selection: A-optimality
  3.1 A-optimal designs
    3.1.1 Numerical examples
  3.2 Standardization
    3.2.1 A-optimal designs
    3.2.2 Numerical examples
  3.3 Algorithm
4 Subdata selection: I-optimality
  4.1 I-optimal designs
  4.2 Numerical examples
5 Simulation
6 Real example
7 Conclusion
References
References

[1] R. Tibshirani. "Regression shrinkage and selection via the lasso." (1996).

[2] E. Candes and T. Tao. "The Dantzig selector: Statistical estimation when p is much larger than n." (2007).

[3] J. Fan and J. Lv. "Sure independence screening for ultra-high dimensional feature space." (2008).

[4] N. Meinshausen, L. Meier, and P. Bühlmann. "p-values for high-dimensional regression." (2008).

[5] P. Drineas, M. W. Mahoney, and S. Muthukrishnan. "Sampling algorithms for ℓ2 regression and applications." In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms. (2006).

[6] P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlós. "Faster least squares approximation." (2011).

[7] P. Ma and X. Sun. "Leveraging for big data regression." (2015).

[8] P. Ma, M. W. Mahoney, and B. Yu. "A statistical perspective on algorithmic leveraging." (2014).

[9] P. Ma, M. W. Mahoney, and B. Yu. "A statistical perspective on algorithmic leveraging." (2015).

[10] H. Wang, M. Yang, and J. Stufken. "Information-based optimal subdata selection for big data linear regression." (2018).

[11] J. Kiefer. "Optimum experimental designs." (1959).

[12] R. Bellman. "Some inequalities for positive definite matrices." (1980).
Advisor: Ming-Chung Chang (張明中)    Date of approval: 2019-07-02
