Electronic Thesis/Dissertation Record 108225016




Name: 陳奕儒 (Yi-Ju Chen)    Graduate program: Graduate Institute of Statistics (統計研究所)
Thesis title: Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data
Related theses
★ Gamma-EM clustering for longitudinal data
★ Bayesian method for sparse principal component analysis
★ Sparse Bayesian Estimation with High-dimensional Binary Response Data
★ Q-learning combined with supervised learning for stock market applications
★ γ-EM approach to latent orientations for cryo-electron microscopy image clustering analysis
★ Contrastive Principal Component Analysis for High-Dimension, Low-Sample-Size Data with Noise-Reduction
★ Trading strategies based on Q-learning and unsupervised learning
★ Visualizing state changes in the stock market
★ Principal Components on t-SNE
  1. This electronic thesis is approved for immediate open access.
  2. Open-access electronic full texts are licensed to users solely for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Principal component analysis (PCA) is a widely used linear dimensionality reduction method that preserves the variability among variables while reducing dimension. PCA is typically used to visualize a single dataset; contrastive principal component analysis (CPCA) is a generalization of traditional PCA. CPCA applies when multiple datasets are present (e.g., an experimental group and a control group), and it can explore the low-dimensional structure unique to a target dataset while referring to the other datasets. However, although CPCA has been shown in many fields to find important data patterns that PCA ignores (Abid et al., 2017), it lacks a statistical model explaining why it can identify the variation of interest. In this thesis, we propose model assumptions for CPCA. We decompose the target data into a signal matrix of interest and a nuisance matrix that is not of interest, and show that the influence of the nuisance matrix on the target data can be removed by CPCA. We also use simulation studies to illustrate the advantage of CPCA in recovering the signal matrix. In addition, based on our model assumptions for CPCA, we propose a new method for selecting the contrast parameter, which is crucial to performing CPCA. Finally, by tuning the contrast parameter, we recover the data patterns of interest in a synthetic-image example and verify that our new selection method achieves the same effect.
Abstract (English) Principal Component Analysis (PCA) is a commonly used linear dimensionality reduction method and is often used to visualize a single dataset. Contrastive Principal Component Analysis (CPCA) can be used when multiple datasets are available: it explores the low-dimensional structure unique to a target dataset while referring to other datasets. However, while CPCA has been shown in many fields to find important data patterns that PCA ignores (Abid et al., 2017), it lacks a statistical model that explains why it can identify the variation of interest. In this thesis, we propose a statistical model for CPCA. We decompose the target data into a signal matrix that we are interested in and a nuisance matrix that we are not, and show that the influence of the nuisance matrix on the target data can be removed by CPCA. We also illustrate, through simulation studies, the advantage of CPCA in recovering the signal matrix. Furthermore, we propose a new method, based on our model, for selecting the contrast parameter that is critical to performing CPCA. Finally, we recover the data patterns of interest in a synthetic-image example by tuning the contrast parameter, and verify that our new selection method achieves the same effect.
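To make the projection concrete, below is a minimal sketch in Python/NumPy of the contrastive projection introduced by Abid et al. (2017) and summarized above: the top eigenvectors of the target covariance minus α times the background covariance. The function name cpca, its arguments, and its defaults are illustrative assumptions, not the thesis's implementation.

    # A minimal sketch of the CPCA projection, assuming rows are
    # observations and columns are features; the interface here is
    # illustrative, not the thesis's actual code.
    import numpy as np

    def cpca(target, background, alpha, k=2):
        """Embed `target` along the top-k contrastive directions."""
        # Center both datasets and form their sample covariance matrices.
        Xt = target - target.mean(axis=0)
        Xb = background - background.mean(axis=0)
        cov_t = Xt.T @ Xt / (Xt.shape[0] - 1)
        cov_b = Xb.T @ Xb / (Xb.shape[0] - 1)

        # Contrastive covariance: target variance penalized by alpha
        # times background variance.
        contrast = cov_t - alpha * cov_b

        # eigh returns eigenvalues in ascending order, so reverse the
        # eigenvector columns and keep the first k (largest eigenvalues).
        eigvals, eigvecs = np.linalg.eigh(contrast)
        directions = eigvecs[:, ::-1][:, :k]
        return Xt @ directions  # (n, k) low-dimensional embedding

Setting alpha = 0 reduces this to ordinary PCA of the target data, while larger values of alpha increasingly suppress directions along which the background varies; the thesis's contribution is a model-based rule for choosing this contrast parameter.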
Keywords ★ subgroup discovery
★ visualization
★ feature selection
★ denoising
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
1 Introduction
  1.1 Related works
  1.2 PCA
2 Method
  2.1 CPCA
  2.2 The CPCA Algorithm
  2.3 Model
  2.4 Estimation
  2.5 Contrast parameter α
3 Theory
4 Numerical Study
  4.1 Setting
    4.1.1 Target dataset
    4.1.2 Background dataset
  4.2 Simulation
5 Application: synthetic images
  5.1 Handwritten digits on grassy backgrounds
  5.2 Merchandise on grassy backgrounds
6 Conclusion
  6.1 Future works
Bibliography
References
Abid, A., Zhang, M. J., Bagaria, V. K., & Zou, J. (2017). Contrastive principal component analysis. arXiv preprint arXiv:1709.06716.
Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y.-H., & Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand Journal of Statistics, 60(1), 4-19.
Cox, M. A. A., & Cox, T. F. (2008). Multidimensional scaling. In Handbook of data visualization (pp. 315-347). Berlin, Heidelberg: Springer Berlin Heidelberg.
du Prel, J.-B., Röhrig, B., Hommel, G., & Blettner, M. (2010). Choosing statistical tests: Part 12 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 107(19), 343-348.
Fujiwara, T., Kwon, O.-H., & Ma, K.-L. (2020). Supporting analysis of dimensionality reduction results with contrastive learning. IEEE Transactions on Visualization and Computer Graphics, 26, 45-55.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 498-520.
Hung, H., Wu, P., Tu, I., & Huang, S. (2012). On multilinear principal component analysis of order-two tensors. Biometrika, 99(3), 569-583.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008). MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks, 19(1), 18-39. doi: 10.1109/TNN.2007.901277
Obeya, P. O., & Akinlabi, G. O. (2021). Application of the regular perturbation method for the solution of first-order initial value problems. Journal of Physics: Conference Series, 1734(1), 012021. doi: 10.1088/1742-6596/1734/1/012021
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579-2605.
Yata, K., & Aoshima, M. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics - Theory and Methods, 38. doi: 10.1080/03610910902936083
Yata, K., & Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105(1), 193-215.
Yata, K., & Aoshima, M. (2013). PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis, 122, 334-354. doi: 10.1016/j.jmva.2013.08.003
Yata, K., & Aoshima, M. (2016). Reconstruction of a high-dimensional low-rank matrix. Electronic Journal of Statistics, 10(1), 895-917. doi: 10.1214/16-EJS1128
Ye, J. (2004). Generalized low rank approximations of matrices. In Proceedings of the twenty-first international conference on machine learning (p. 112). Association for Computing Machinery.
Zhu, P., & Knyazev, A. (2013). Angles between subspaces and their tangents. Journal of Numerical Mathematics, 21(4). doi: 10.1515/jnum-2013-0013
Advisor: 王紹宣 (Shao-Hsuan Wang)    Date of approval: 2022-08-01
