PBMC Cell Composition Inference from RNA-seq Expression Profile Based on Deep Learning

Name

Yen-Lin Chen (陳彥霖)

Department

Computer Science and Information Engineering (資訊工程學系)

Thesis Title

PBMC Cell Composition Inference from RNA-seq Expression Profile Based on Deep Learning
(基於深度學習從核醣核酸定序表達譜推斷外周血單核細胞之細胞組成)

Files

Full text in the system: never open to the public

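Abstract (translated from the Chinese)

Single-cell transcriptome sequencing is a promising technique that provides detailed information about gene expression patterns at the single-cell level. However, it is very costly, especially when profiling large numbers of cells. To overcome this limitation, researchers have developed methods, including ones based on deep learning, that infer cell composition from next-generation RNA sequencing data. These machine learning-based methods typically require large amounts of training data, which has led to data generation techniques that produce pseudo-bulk RNA-seq samples for training cell deconvolution models. However, these data generation methods still have room for improvement. In this study, we use the Dirichlet distribution to generate synthetic RNA-seq samples that more closely resemble real-world scenarios. We build several cell deconvolution models based on deep neural networks and self-attention, train them on the Dirichlet-generated data, and aim to outperform existing methods. To evaluate the models, we use two real human peripheral blood mononuclear cell (PBMC) datasets as test benchmarks. Our models outperform other existing methods on both PBMC datasets, with Pearson correlations of approximately 0.87 and 0.78, respectively. Notably, our models predict cell types that make up smaller proportions of human PBMCs more precisely. Because datasets differ from one another, we also emphasize the importance of building a separate model for each specific dataset to optimize performance.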
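The Dirichlet-based generation of pseudo-bulk samples mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `simulate_pseudo_bulk`, the parameters `cells_per_sample` and `alpha`, and the choice to sum raw counts of cells sampled with replacement are all hypothetical.

```python
import numpy as np

def simulate_pseudo_bulk(sc_expr, cell_types, n_samples=100,
                         cells_per_sample=500, alpha=1.0, seed=0):
    """Build pseudo-bulk samples from a single-cell count matrix.

    sc_expr    : (n_cells, n_genes) array of single-cell counts
    cell_types : (n_cells,) array of cell type labels
    alpha      : symmetric Dirichlet concentration (assumed value);
                 smaller values give more skewed compositions
    Returns (bulk, fractions, types).
    """
    rng = np.random.default_rng(seed)
    sc_expr = np.asarray(sc_expr)
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)
    idx_by_type = [np.flatnonzero(cell_types == t) for t in types]

    bulk = np.zeros((n_samples, sc_expr.shape[1]))
    fractions = np.zeros((n_samples, len(types)))
    for i in range(n_samples):
        # Draw target cell type proportions from a Dirichlet distribution.
        target = rng.dirichlet(np.full(len(types), alpha))
        # Convert the proportions into integer cell counts per type.
        counts = rng.multinomial(cells_per_sample, target)
        for idx, n in zip(idx_by_type, counts):
            if n > 0:
                # Sample cells of this type with replacement and sum them.
                picked = rng.choice(idx, size=n, replace=True)
                bulk[i] += sc_expr[picked].sum(axis=0)
        # The training label is the realized fraction of each cell type.
        fractions[i] = counts / cells_per_sample
    return bulk, fractions, types
```

Each row of `fractions` would serve as the regression target for the corresponding row of `bulk`. Using the realized counts instead of the drawn Dirichlet vector as the label is a design choice made here so that labels match the cells actually mixed.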

Abstract (English)

Single-cell RNA sequencing (scRNA-seq) is a promising technique that provides detailed information about gene expression patterns at the single-cell level. However, it can be prohibitively expensive, particularly when profiling a large number of cells. To overcome this limitation, researchers have developed methods, including ones based on deep learning, to infer cell composition from next-generation RNA sequencing (RNA-seq) data. These machine learning-based methods typically require a large amount of training data, which has led to the development of data generation techniques that produce pseudo-bulk RNA-seq samples for training cell deconvolution models. However, these data generation methods still have room for improvement. In this study, we use the Dirichlet distribution to generate synthetic RNA-seq samples that more closely resemble real-world scenarios. We construct deep learning-based deconvolution models trained on this Dirichlet-generated data, aiming to outperform existing methods. To evaluate the models' effectiveness, we use two real human peripheral blood mononuclear cell (PBMC) datasets as testing benchmarks. Our models outperform other existing methods on these two PBMC datasets, with Pearson correlations of approximately 0.87 and 0.78, respectively. Notably, our models achieve more precise predictions for cell types with smaller proportions in human PBMCs. Because of the variance between datasets, we also emphasize the importance of building individual models for specific datasets to optimize performance.
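The Pearson correlations reported in the abstract compare predicted and measured cell type fractions. The abstract does not say whether the correlation is computed over all values at once or per cell type, so the sketch below shows both variants as an assumption; the function names are hypothetical and this is not the thesis's evaluation code.

```python
import numpy as np

def pearson_overall(true_frac, pred_frac):
    """Pearson r over all (sample, cell type) pairs, flattened."""
    t = np.asarray(true_frac, dtype=float).ravel()
    p = np.asarray(pred_frac, dtype=float).ravel()
    return float(np.corrcoef(t, p)[0, 1])

def pearson_per_cell_type(true_frac, pred_frac):
    """Pearson r for each cell type (column) across samples.

    Both inputs have shape (n_samples, n_cell_types).
    A column with zero variance yields NaN.
    """
    t = np.asarray(true_frac, dtype=float)
    p = np.asarray(pred_frac, dtype=float)
    # Center each column, then apply the definition of Pearson's r.
    tc = t - t.mean(axis=0)
    pc = p - p.mean(axis=0)
    denom = np.sqrt((tc ** 2).sum(axis=0) * (pc ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (tc * pc).sum(axis=0) / denom
```

The per-cell-type variant is the more informative of the two when rare cell types matter, because a flattened correlation tends to be dominated by the abundant types.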

Keywords

★ Peripheral blood mononuclear cells (外周血單核細胞)
★ Deep learning (深度學習)
★ Dirichlet distribution (狄利克雷分佈)

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Related Works
1.3 Motivation and Goal
Chapter 2 Materials and Methods
2.1 Datasets
2.2 scRNA-seq Preprocessing and Analysis
2.3 RNA-seq Preprocessing and Analysis
2.4 Bulk RNA-seq Simulation from scRNA-seq Dataset
2.4.1 The Random RNA-seq Data Generation Method
2.4.2 The Dirichlet RNA-seq Data Generation Method
2.5 Model Input Preprocessing
2.6 Model Architectures
2.6.1 Deep Neural Network (DNN)
2.6.2 Multi-Deep Neural Networks (multiDNNs)
2.6.3 Multi-head Self-Attention (selfAttention)
2.7 Model Training
2.8 Evaluation Metrics
Chapter 3 Results and Discussions
3.1 Manual Cell Type Identification
3.2 Comparison of Cell Type Identification
3.3 Pseudo-bulk RNA-seq Data Analysis
3.4 Performance of Models
3.5 Performance of Each Cell Type
3.6 Comparison with Other Related Methods
Chapter 4 Conclusions
References

Advisors

Jorng-Tzong Horng (洪炯宗), Li-Ching Wu (吳立青)

Approval Date

2023-07-28
