Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89628

Title:	Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data
Authors:	陳奕儒;Chen, Yi-Ju
Contributors:	Graduate Institute of Statistics
Keywords:	subgroup discovery;visualizing;feature selection;denoising
Date:	2022-08-01
Issue Date:	2022-10-04 11:49:58 (UTC+8)
Publisher:	National Central University
Abstract:	Principal component analysis (PCA) is a widely used linear dimensionality reduction method that preserves the variability among the variables in the data, and it is often used to visualize a single dataset. Contrastive principal component analysis (CPCA) is a generalization of traditional PCA for situations with multiple datasets, such as a treatment group and a control group: it explores the low-dimensional structure unique to a specific dataset with reference to the other datasets. Although CPCA has been shown in many fields to find important data patterns that PCA ignores (Abubakar Abid, 2017), it lacks a statistical model explaining why it can identify the variation we are interested in. In this work, we propose a statistical model for CPCA. We decompose the target data into a signal matrix that is of interest and a nuisance matrix that is not, and we try to show that CPCA can remove the influence of the nuisance matrix on the target data. We also illustrate, through simulation analysis, the advantage of CPCA in recovering the signal matrix. Furthermore, based on the proposed model, we propose a new method for choosing the contrast parameter, which is important for performing CPCA. Finally, we find the data patterns of interest in a synthetic image example by adjusting the contrast parameter, and we verify that the new method for choosing the contrast parameter achieves the same effect.
Appears in Collections:	[Graduate Institute of Statistics] Electronic Thesis & Dissertation
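
To make the abstract's description concrete, here is a minimal sketch of the standard contrastive PCA computation. It takes the leading eigenvectors of the target covariance minus the contrast parameter times the background covariance. This is the usual formulation of the method cited in the abstract, not the statistical model or the parameter-selection method proposed in the thesis. The function name `cpca`, the synthetic data, and the values `alpha = 0` and `alpha = 2` are illustrative assumptions. For a stable illustration the toy data also has more samples than features, unlike the high dimension, low sample size setting.

```python
import numpy as np

def cpca(target, background, alpha, n_components=2):
    """Hypothetical helper: contrastive PCA via eigendecomposition of
    cov(target) - alpha * cov(background)."""
    # Center each dataset on its own feature means
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)
    # Sample covariance matrices, shape (features, features)
    cov_t = target_c.T @ target_c / (target.shape[0] - 1)
    cov_b = background_c.T @ background_c / (background.shape[0] - 1)
    # Contrastive covariance: symmetric, but not necessarily positive semidefinite
    contrast = cov_t - alpha * cov_b
    eigvals, eigvecs = np.linalg.eigh(contrast)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]
    return target_c @ directions, directions

# Toy data (assumed): feature 0 carries nuisance variation shared by both
# datasets; feature 1 carries a two-subgroup signal present only in the target.
rng = np.random.default_rng(0)
n, p = 500, 10
background = rng.normal(size=(n, p))
background[:, 0] += rng.normal(scale=3.0, size=n)
target = rng.normal(size=(n, p))
target[:, 0] += rng.normal(scale=3.0, size=n)
target[:, 1] += rng.choice([-2.0, 2.0], size=n)

_, dirs_pca = cpca(target, background, alpha=0.0)   # alpha = 0 is ordinary PCA
_, dirs_cpca = cpca(target, background, alpha=2.0)  # contrast removes nuisance
print("PCA top direction loads on feature", int(np.argmax(np.abs(dirs_pca[:, 0]))))
print("CPCA top direction loads on feature", int(np.argmax(np.abs(dirs_cpca[:, 0]))))
```

In this toy setup, ordinary PCA is dominated by the shared nuisance feature, while a positive contrast parameter shifts the leading direction toward the target-only signal. How large the contrast parameter should be is exactly the choice the thesis addresses, and this sketch does not attempt it.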