Contrastive Principal Component Analysis for High-Dimension, Low-Sample-Size Data with Noise-Reduction


Name

賴彥儒 (Yen-Ru Lai)

Department

Graduate Institute of Statistics

Thesis Title

Contrastive Principal Component Analysis for High-Dimension, Low-Sample-Size Data with Noise-Reduction

Related Theses

★ Gamma-EM Clustering on Longitudinal Data	★ Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data
★ Bayesian method for sparse principal component analysis	★ Sparse Bayesian Estimation with High-dimensional Binary Response Data
★ Application of Q-Learning Combined with Supervised Learning to the Stock Market	★ γ-EM approach to latent orientations for cryo-electron microscopy image clustering analysis
★ Trading Strategies Based on Q-learning and Unsupervised Learning	★ Visualizing State Changes in the Stock Market
★ Principal Components on t-SNE

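Files

This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.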

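Abstract (Chinese)

Contrastive principal component analysis (cPCA) is a dimensionality reduction technique that is useful in specific settings where datasets are collected under different conditions, such as treatment and control experiments; it is used in particular to visualize and explore patterns that belong to only one dataset. In this study, we propose a new method for cPCA in the high-dimension, low-sample-size (HDLSS) setting. The method, called cPCA-NR, borrows the noise-reduction (NR) method proposed by Yata and Aoshima (2012) to mitigate the adverse effects of noisy data points and to improve the robustness and reliability of the dimensionality reduction process. In a simulation study, we show that cPCA-NR outperforms traditional PCA in classification accuracy and clustering performance. The method is also strongly resilient to noisy data, achieving notable improvements in high-noise settings. These results highlight the superior performance of cPCA-NR and establish it as a valuable tool for applications such as image recognition, anomaly detection, and data visualization.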

Abstract (English)

Contrastive Principal Component Analysis (cPCA) is a dimensionality reduction technique that is useful when datasets are collected under different conditions, e.g., a treatment and a control experiment, especially for visualizing and exploring patterns that are specific to one dataset. In this study, we propose a new methodology for cPCA in high-dimension, low-sample-size (HDLSS) data situations. The proposed method, called cPCA-NR, applies the noise-reduction (NR) method proposed by Yata and Aoshima (2012) to mitigate the adverse effects of noisy data points, improving the robustness and reliability of the dimensionality reduction process. In a simulation study, we demonstrate that cPCA-NR outperforms traditional PCA in terms of classification accuracy and clustering performance. Moreover, the proposed method exhibits strong resilience to noisy data, achieving notable improvements in scenarios with high levels of noise. The results highlight the superior performance of cPCA-NR, establishing its potential as a valuable tool for various applications, such as image recognition, anomaly detection, and data visualization.
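The abstract names two ingredients, cPCA and the NR method, without giving formulas. The following is a minimal NumPy sketch of each ingredient on its own, assuming the contrastive-covariance formulation of Abid et al. [1] (leading eigenvectors of the target covariance minus α times the background covariance) and the noise-reduced eigenvalue estimator of Yata and Aoshima [3] computed from the n × n dual sample covariance. The function names, the parameter `alpha`, and the choice to keep the two steps separate are illustrative assumptions; this record does not state how the thesis combines them into cPCA-NR.

```python
import numpy as np


def cpca_directions(target, background, alpha, k):
    """Top-k contrastive directions (hypothetical helper).

    Rows are samples. Returns the k leading eigenvectors of
    C_target - alpha * C_background as columns of a (d, k) array.
    """
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(c_target - alpha * c_background)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    return eigvecs[:, order]


def nr_eigenvalues(data):
    """Noise-reduced eigenvalue estimates (hypothetical helper).

    Rows are samples (n of them). Uses the n x n dual covariance S_D and
    returns, for j = 1, ..., n - 2,
        lam_j - (tr(S_D) - sum_{i <= j} lam_i) / (n - 1 - j),
    where lam_1 >= lam_2 >= ... are the eigenvalues of S_D.
    """
    n = data.shape[0]
    centered = data - data.mean(axis=0)
    dual = centered @ centered.T / (n - 1)
    lam = np.sort(np.linalg.eigvalsh(dual))[::-1]
    j = np.arange(1, n - 1)  # j = 1, ..., n - 2
    residual = np.trace(dual) - np.cumsum(lam)[: n - 2]
    return lam[: n - 2] - residual / (n - 1 - j)
```

Working with the n × n dual matrix rather than the d × d covariance keeps the eigenproblem small when d is much larger than n, which is the HDLSS setting the abstract targets. The subtracted term averages the eigenvalue mass left after the first j components, which is the noise contribution that inflates raw sample eigenvalues in high dimension.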

Keywords (Chinese)

★ subgroup discovery
★ visualization
★ feature selection
★ denoising

Keywords (English)

★ subgroup discovery
★ visualizing
★ feature selection
★ denoising

Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Chapter 1 Introduction
Chapter 2 Review
2.1 cPCA
2.2 Noise-Reduction Methodology
2.3 cPCA-NR
Chapter 3 Numerical Study
3.1 Case 1: with background
3.1.1 Setting
3.1.2 Simulation
3.1.2.1 Multivariate normal distribution
3.1.2.2 Multivariate t-distribution
3.2 Case 2: without background
3.2.1 Setting
3.2.2 Simulation
3.2.2.1 Multivariate normal distribution
3.2.2.2 Multivariate t-distribution
Chapter 4 Application
4.1 Handwritten digits on flower backgrounds
Chapter 5 Conclusion
References

References

[1] Abid, A., Zhang, M. J., Bagaria, V. K., and Zou, J. “Contrastive principal component analysis.” arXiv preprint arXiv:1709.06716, 2017.
[2] Ahn, J. “High dimension, low sample size data analysis.” The University of North Carolina at Chapel Hill, 2006.
[3] Yata, K. and Aoshima, M. “Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations.” Journal of Multivariate Analysis, vol. 105, no. 1, pp. 193-215, 2012.
[4] Yata, K. and Aoshima, M. “PCA consistency for non-Gaussian data in high dimension, low sample size context.” Communications in Statistics - Theory and Methods, vol. 38, no. 16-17, pp. 2634-2652, 2009.
[5] Chen, Y.-J. and Wang, S.-H. “Contrastive principal component analysis for high dimension, low sample size data.” Master’s thesis, National Central University, 2022.
[6] Hotelling, H. “Analysis of a complex of statistical variables into principal components.” Journal of Educational Psychology, vol. 24, pp. 498-520, 1933.
[7] Fujiwara, T., Kwon, O.-H., and Ma, K.-L. “Supporting analysis of dimensionality reduction results with contrastive learning.” IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 1, pp. 45-55, 2020.
[8] Kaish, I., Hossain, J., Papalexakis, E., and Chen, J. “COVID-19 or flu? Discriminative knowledge discovery of COVID-19 symptoms from Google Trends data.” 4th International Workshop on Epidemiology meets Data Mining and Knowledge discovery, 2021.
[9] Marchetti-Bowick, M. “Structured sparse regression methods for learning from high-dimensional genomic data.” Ph.D. thesis, Carnegie Mellon University, 2020.

Advisor

王紹宣 (Shao-Hsuan Wang)

Approval Date

2023-07-25