Organism Homo sapiens
The recent emergence of COVID-19 presents a major global crisis. Profound knowledge gaps remain about the interaction between the virus and the immune system. Here, we used a systems biology approach to analyze immune responses in 76 COVID-19 patients and 69 age and sex- matched controls, from Hong Kong and Atlanta. Mass cytometry revealed prolonged plasmablast and effector T cell responses, reduced myeloid expression of HLA-DR and inhibition of mTOR signaling in plasmacytoid DCs (pDCs) during infection. Production of pro-inflammatory cytokines plasma levels of inflammatory mediators, including EN-RAGE, TNFSF14, and Oncostatin-M, which correlated with disease severity, and increased bacterial DNA and endotoxin in plasma in and reduced HLA-DR and CD86 but enhanced EN-RAGE expression in myeloid cells in severe transient expression of IFN stimulated genes in moderate infections, consistent with transcriptomic analysis of bulk PBMCs, that correlated with transient and low levels of plasma COVID-19.
Overall design: RNAseq analysis of PBMCs in a group of 17 COVID-19 subjects and 17 healthy controls
library(DESeq2)
library(RColorBrewer)
library(pheatmap)
library(tidyverse)
library(dplyr)
[1] covid-19 covid-19 covid-19 covid-19 covid-19 covid-19 covid-19 covid-19 covid-19 covid-19 covid-19
[12] covid-19 covid-19 covid-19 healthy healthy healthy healthy healthy healthy healthy healthy
[23] healthy healthy healthy healthy healthy healthy healthy healthy healthy
Levels: covid-19 healthy
dds <- DESeqDataSetFromMatrix(countData=covid_count2, colData=coldata, design=~condition)
Note: levels of factors in the design contain characters other than
letters, numbers, '_' and '.'. It is recommended (but not required) to use
only letters, numbers, and delimiters '_' or '.', as these are safe characters
for column names in R. [This is a message, not an warning or error]
dds
class: DESeqDataSet
dim: 60683 31
metadata(1): version
assays(1): counts
rownames(60683): ENSG00000223972 ENSG00000227232 ... ENSG00000277475 ENSG00000268674
rowData names(0):
colnames(31): S147_nCoV001EUHM.Draw.1 S149_nCoV002EUHM.Draw.2 ... S185_266 S186_SHXA14
colData names(1): condition
Considered factors
Since, Genes are same for both conditions. This factor can be ignorable.
Note: levels of factors in the design contain characters other than
letters, numbers, '_' and '.'. It is recommended (but not required) to use
only letters, numbers, and delimiters '_' or '.', as these are safe characters
for column names in R. [This is a message, not an warning or error]
S147_nCoV001EUHM.Draw.1 S149_nCoV002EUHM.Draw.2 S150_nCoV003EUHM.Draw.1 S151_nCoV004EUHM.Draw.1
0.9321496 0.8067540 1.1298006 0.7740941
S152_nCoV006EUHM.Draw.1 S153_nCoV007EUHM.Draw.1 S154_nCoV0010EUHM.Draw.1 S156_nCOV024EUHM.Draw.1
0.8336102 0.8110747 0.7640154 0.8630665
S157_nCOV0029EUHM S176_nCoV025EUHM.Draw.1 S177_nCoV025EUHM.Draw.2 S178_nCoV028EUHM.Draw.1
1.0968596 0.8322252 1.0140217 1.1993037
S179_nCoV033EUHM.Draw.1 S180_nCoV034EUHM.Draw.1 S061_257 S062_258
0.8447828 0.8883697 1.0779545 1.1385925
S063_259 S064_260 S065_261 S066_265
1.0724791 1.1217222 1.2982392 1.1273679
S067_270 S068_272 S069_273 S070_279
1.1581996 1.0428990 1.0838983 1.3094015
S071_280 S181_255 S182_SHXA10 S183_263
1.1116219 0.9276268 0.9395406 1.1970649
S184_SHXA18 S185_266 S186_SHXA14
1.1121298 1.0263585 1.0723443
normalized_dds <- counts(dds, normalized = TRUE)
vsd_obj <- vst(dds, blind = TRUE)
assayed_vsd <- assay(vsd_obj)
correlated_vsd <- cor(assayed_vsd)
Pre_QC Heatmap
Pre-QC PCA
In the Unsupervised Clustering, The biological replicates based on condition seems to cluster. Some outliers sample (175, 155, 145) are removed. So, proceeded with 14 Covid and 17 Healthy Samples.
pheatmap(correlated_vsd, annotation = select(meta,condition), cutree_cols = 2)
plotPCA(vsd_obj, intgroup = "condition")
43% cause of variance explained by the Covid-19 infected condition.
mean_count <- apply(covid_count2[,1:31], 1, mean)
var_count <- apply(covid_count2[,1:31],1,var)
disp <- data.frame(mean_count,var_count)
ggplot(disp)+
geom_point(aes(x=mean_count,y=var_count))+
scale_x_log10()+
scale_y_log10()+
xlab("Mean")+
ylab("Variance")+
ggtitle("Dispersion Plot")
plotDispEsts(dds_obj)
Clustered Nicely around the Maximum Liklihood Fitted Line. Proceeded with Blue dots to avoid false positives.
dds_results <- results(dds_obj,
contrast = c("condition", "covid-19","healthy"),
alpha = 0.05)
plotMA(dds_results, ylim = c(-8,8))
out of 52595 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 4651, 8.8%
LFC < 0 (down) : 2603, 4.9%
outliers [1] : 0, 0%
low counts [2] : 21241, 40%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
The red dots in the MAplot representing the real DE genes. But for lower mean counts the fold change is pretty high. Log fold change shrinkage is done to standardize the dispersion. The DESeq package have excluded 21241 NA/0 count genes.
dds_results2 <- results(dds_obj,
contrast = c("condition", "covid-19","healthy"),
alpha = 0.05,
lfcThreshold = 0.32)
dds_lfcdone <- lfcShrink(dds_obj,
contrast = c("condition", "covid-19","healthy"),
res = dds_results2)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).
Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895
plotMA(dds_lfcdone, ylim = c(-8,8))
DataFrame with 6 rows and 2 columns
type description
<character> <character>
baseMean intermediate mean of normalized counts for all samples
log2FoldChange results log2 fold change (MAP): condition covid-19 vs healthy
lfcSE results standard error: condition covid-19 vs healthy
stat results Wald statistic: condition covid-19 vs healthy
pvalue results Wald test p-value: condition covid-19 vs healthy
padj results BH adjusted p-values
out of 52595 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1360, 2.6%
LFC < 0 (down) : 206, 0.39%
outliers [1] : 0, 0%
low counts [2] : 22253, 42%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
So, 3% genes are differentially expressed.
dds_lfcdone <- data.frame(dds_lfcdone)%>%
tibble::rownames_to_column(var = "ensgene")
filtered_genes <- left_join(x = dds_lfcdone,
y = grch38[,c("ensgene", "symbol", "description")],
by = "ensgene")
ensgene baseMean log2FoldChange lfcSE
Length:61030 Min. : 0.0 Min. :-3.042 Min. :0.000
Class :character 1st Qu.: 0.1 1st Qu.:-0.029 1st Qu.:0.237
Mode :character Median : 0.5 Median : 0.070 Median :0.321
Mean : 259.4 Mean : 0.182 Mean :0.320
3rd Qu.: 12.2 3rd Qu.: 0.351 3rd Qu.:0.448
Max. :322992.8 Max. : 6.054 Max. :0.488
NA's :8209 NA's :8209
stat pvalue padj symbol
Min. :-8.506 Min. :0.000 Min. :0.000 Length:61030
1st Qu.: 0.000 1st Qu.:0.477 1st Qu.:0.929 Class :character
Median : 0.018 Median :0.899 Median :1.000 Mode :character
Mean : 0.344 Mean :0.722 Mean :0.845
3rd Qu.: 0.450 3rd Qu.:1.000 3rd Qu.:1.000
Max. :15.930 Max. :1.000 Max. :1.000
NA's :8209 NA's :8209 NA's :30511
description
Length:61030
Class :character
Mode :character
significant_genes <- subset(filtered_genes, padj<0.05)%>%
arrange(padj)
significant_genes
summary(significant_genes)
ensgene baseMean log2FoldChange lfcSE stat
Length:1573 Min. : 0.65 Min. :-2.536 Min. :0.04185 Min. :-8.506
Class :character 1st Qu.: 6.04 1st Qu.: 1.038 1st Qu.:0.20133 1st Qu.: 3.221
Mode :character Median : 45.72 Median : 1.620 Median :0.28804 Median : 3.893
Mean : 596.99 Mean : 1.406 Mean :0.29586 Mean : 3.874
3rd Qu.: 232.77 3rd Qu.: 2.008 3rd Qu.:0.40303 3rd Qu.: 5.340
Max. :65942.90 Max. : 6.054 Max. :0.48757 Max. :15.930
pvalue padj symbol description
Min. :0.000e+00 Min. :0.000e+00 Length:1573 Length:1573
1st Qu.:6.160e-08 1st Qu.:4.750e-06 Class :character Class :character
Median :5.841e-05 Median :2.263e-03 Mode :character Mode :character
Mean :4.553e-04 Mean :1.048e-02
3rd Qu.:7.024e-04 3rd Qu.:1.815e-02
Max. :2.579e-03 Max. :4.996e-02
exp_heatdata <- normalized_dds[significant_genes$ensgene,]
heat_colors <- brewer.pal(6, "YlOrRd")
pheatmap(exp_heatdata,
color = heat_colors,
cluster_rows = T,
show_rownames = F,
annotation = select(meta,condition),
cutree_cols = 2,
scale = "row")
filtered_genes <- filtered_genes%>%
mutate(threshold = padj <0.05)
ggplot(filtered_genes)+
geom_point(aes(x = log2FoldChange, y=log10(padj), color=threshold))+
xlab("log2 fold change") +
ylab("-log10 adjusted p-value") +
theme(legend.position = "none",
plot.title = element_text(size = rel(1.5), hjust = 0.5),
axis.title = element_text(size = rel(1.25)))
NA
NA
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] annotables_0.1.91 magrittr_1.5 forcats_0.5.0
[4] stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3
[7] readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[10] tidyverse_1.3.0 pheatmap_1.0.12 RColorBrewer_1.1-2
[13] DESeq2_1.24.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[16] BiocParallel_1.18.1 matrixStats_0.57.0 Biobase_2.44.0
[19] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3
[22] S4Vectors_0.22.1 BiocGenerics_0.30.0 plotly_4.9.2.1
[25] ggplot2_3.3.2
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 ellipsis_0.3.1 rsconnect_0.8.16 htmlTable_2.1.0
[5] XVector_0.24.0 base64enc_0.1-3 fs_1.3.1 rstudioapi_0.11
[9] farver_2.0.3 bit64_4.0.5 AnnotationDbi_1.46.1 fansi_0.4.1
[13] lubridate_1.7.4 xml2_1.2.2 splines_3.6.1 geneplotter_1.62.0
[17] knitr_1.30 Formula_1.2-4 jsonlite_1.7.1 broom_0.5.3
[21] annotate_1.62.0 cluster_2.1.0 dbplyr_1.4.4 png_0.1-7
[25] compiler_3.6.1 httr_1.4.2 backports_1.1.5 assertthat_0.2.1
[29] Matrix_1.2-18 lazyeval_0.2.2 cli_2.1.0 htmltools_0.5.0
[33] tools_3.6.1 gtable_0.3.0 glue_1.3.1 GenomeInfoDbData_1.2.1
[37] Rcpp_1.0.3 cellranger_1.1.0 vctrs_0.3.4 nlme_3.1-143
[41] crosstalk_1.1.0.1 xfun_0.18 rvest_0.3.6 lifecycle_0.2.0
[45] XML_3.99-0.3 zlibbioc_1.30.0 scales_1.1.1 hms_0.5.3
[49] yaml_2.2.1 memoise_1.1.0 gridExtra_2.3 rpart_4.1-15
[53] latticeExtra_0.6-29 stringi_1.4.4 RSQLite_2.2.1 genefilter_1.66.0
[57] checkmate_2.0.0 rlang_0.4.8 pkgconfig_2.0.3 bitops_1.0-6
[61] evaluate_0.14 lattice_0.20-38 htmlwidgets_1.5.2 labeling_0.4.2
[65] bit_4.0.4 tidyselect_0.2.5 R6_2.4.1 generics_0.0.2
[69] Hmisc_4.4-1 DBI_1.1.0 pillar_1.4.6 haven_2.3.1
[73] foreign_0.8-74 withr_2.3.0 survival_3.1-8 RCurl_1.98-1.2
[77] nnet_7.3-12 modelr_0.1.8 crayon_1.3.4 rmarkdown_2.5
[81] jpeg_0.1-8.1 locfit_1.5-9.4 grid_3.6.1 readxl_1.3.1
[85] data.table_1.13.0 blob_1.2.1 reprex_0.3.0 digest_0.6.26
[89] xtable_1.8-4 munsell_0.5.0 viridisLite_0.3.0
A work by Md. Tabassum Hossain Emon
emon.biotech.10th@gmail.com