Abstract (English) |
Contrastive principal component analysis (cPCA) is a dimensionality reduction technique for settings in which datasets are collected under different conditions, e.g., a treatment and a control experiment; it is especially useful for visualizing and exploring patterns that are specific to one dataset. In this study, we propose a new methodology for cPCA in high-dimension, low-sample-size (HDLSS) settings. The proposed method, called cPCA-NR, applies the noise-reduction (NR) method proposed by Yata and Aoshima (2012) to mitigate the adverse effects of noisy data points, improving the robustness and reliability of the dimensionality reduction process. In a simulation study, we demonstrate that cPCA-NR outperforms traditional PCA in terms of classification accuracy and clustering performance. Moreover, the proposed method is resilient to noisy data, achieving notable improvements in scenarios with high levels of noise. These results establish the potential of cPCA-NR as a valuable tool for applications such as image recognition, anomaly detection, and data visualization. |
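For orientation, the two ingredients named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis's cPCA-NR procedure: the abstract does not state how the two pieces are combined. The function names `cpca_directions` and `nr_eigenvalues`, the contrast parameter `alpha`, and the choice of NumPy are assumptions made for this example. The first function follows the standard cPCA formulation (top eigenvectors of the target covariance minus `alpha` times the background covariance); the second follows the usual statement of the Yata and Aoshima (2012) noise-reduction eigenvalue estimate, computed from the n x n dual sample covariance matrix.

```python
import numpy as np


def cpca_directions(target, background, alpha=1.0, k=2):
    """Standard cPCA: top-k eigenpairs of C_target - alpha * C_background.

    Rows are observations, columns are features. `alpha` is the contrast
    parameter (hypothetical default chosen for illustration).
    """
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    # The contrast matrix is symmetric, so eigh is appropriate.
    eigvals, eigvecs = np.linalg.eigh(c_target - alpha * c_background)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    return eigvals[order], eigvecs[:, order]


def nr_eigenvalues(data):
    """Noise-reduced eigenvalue estimates for j = 1, ..., n - 2.

    Uses the n x n dual covariance matrix, which is cheap when the number
    of features greatly exceeds the sample size n (the HDLSS setting).
    """
    n = data.shape[0]
    centered = data - data.mean(axis=0)
    dual = centered @ centered.T / (n - 1)
    lam = np.sort(np.linalg.eigvalsh(dual))[::-1]  # descending order
    j = np.arange(1, n - 1)  # j = 1, ..., n - 2
    # Sum of the eigenvalues remaining after the first j.
    residual = np.trace(dual) - np.cumsum(lam[: n - 2])
    # Subtract the average remaining eigenvalue as the noise correction.
    return lam[: n - 2] - residual / (n - 1 - j)
```

Because the correction term is non-negative, each noise-reduced estimate is no larger than the corresponding raw sample eigenvalue.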
References |
[1] Abid, A., Zhang, M. J., Bagaria, V. K., and Zou, J. “Contrastive principal component analysis.” arXiv preprint arXiv:1709.06716, 2017.
[2] Ahn, J. “High dimension, low sample size data analysis.” The University of North Carolina at Chapel Hill, 2006.
[3] Yata, K. and Aoshima, M. “Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations.” Journal of Multivariate Analysis, vol. 105, no. 1, pp. 193–215, 2012.
[4] Yata, K. and Aoshima, M. “PCA Consistency for Non-Gaussian Data in High Dimension, Low Sample Size Context.” Communications in Statistics - Theory and Methods, vol. 38, no. 16-17, pp. 2634–2652, 2009.
[5] Chen, Y.-J. and Wang, S.-H. “Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data.” Master’s thesis, National Central University, 2022.
[6] Hotelling, H. “Analysis of a complex of statistical variables into principal components.” Journal of Educational Psychology, vol. 24, pp. 498–520, 1933.
[7] Fujiwara, T., Kwon, O.-H., and Ma, K.-L. “Supporting analysis of dimensionality reduction results with contrastive learning.” IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 1, pp. 45–55, 2020.
[8] Kaish, I., Hossain, J., Papalexakis, E., and Chen, J. “COVID-19 or flu? Discriminative knowledge discovery of COVID-19 symptoms from Google Trends data.” 4th International Workshop on Epidemiology meets Data Mining and Knowledge discovery, 2021.
[9] Marchetti-Bowick, M. “Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data.” Ph.D. thesis, Carnegie Mellon University, 2020. |