library(ggplot2)
# Rows are genes and columns are samples, so transpose to make each sample an observation
data.df <- read.csv("Downloads/Phagocyte_PCA.csv", row.names = 1)
gene.PCA <- prcomp(t(data.df))
# Extract the proportion of variance explained by each principal component
PCA.summary <- summary(gene.PCA)
PCA.importance <- PCA.summary$importance
prop.var <- PCA.importance["Proportion of Variance", ]
# Build the scree plot data, keeping the components in their original order
scree.df <- data.frame(prop.var)
rownames(scree.df) <- names(prop.var)
PCs <- factor(rownames(scree.df), levels = rownames(scree.df))
ggplot(data = scree.df, aes(x = PCs, y = prop.var)) +
  geom_col() +
  ylab("Proportion of variance captured") +
  xlab("Principal component")
biplot(gene.PCA)
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length =
## arrow.len): zero-length arrow is of indeterminate angle and so skipped
# Remove any stale PCA.df from a previous run; skip if it does not exist
if (exists("PCA.df")) rm(PCA.df)
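# Sample annotations; these assume the 24 samples are ordered by treatment,
# then donor, then cell type
PCA.df <- data.frame(gene.PCA$x)
PCA.df$donor <- rep(c(rep("D4", 4), rep("D5", 4), rep("D6", 4)), 2)
PCA.df$cell.type <- rep(c("DCs", "Macs", "Monos", "PMNs"), 6)
PCA.df$treatment <- c(rep("Buffer", 12), rep("MRSA", 12))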
ggplot(PCA.df, aes(x = PC1, y = PC2, colour = cell.type, shape = treatment)) +
  geom_point(size = 3) +
  coord_equal() +
  theme_minimal()
ggplot(PCA.df, aes(x = PC1, y = PC2, colour = donor, shape = treatment)) +
  geom_point(size = 3) +
  coord_equal() +
  theme_minimal()
loadings.df <- as.data.frame(gene.PCA$rotation)
ggplot(loadings.df, aes(x = PC1, y = PC2)) +
  geom_text(label = rownames(loadings.df), size = 3) +
  xlab("PC1 loading") + ylab("PC2 loading") +
  coord_equal() +
  theme_minimal()
Q1. PC1 = 0.6303 (63.03%)
PC2 = 0.2200 (22.00%)
The first two principal components together account for approximately 85.03% of the total variance in the data.
Q2. Cumulative through PC2 = 0.8503 (85.03%)
Cumulative through PC3 = 0.89243 (89.24%)
Cumulative through PC4 = 0.92548 (92.55%)
At least four principal components are required to capture 90% of the variance.
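
The arithmetic behind Q1 and Q2 can be checked with a short sketch. It uses only the rounded values reported above rather than the `prcomp` output, and the names `cum.var` and `n.needed` are assumptions introduced for illustration.

```r
# Cumulative proportion of variance, copied from the values reported above.
# PC1 alone is 0.6303; PC1 + PC2 is 0.6303 + 0.2200.
cum.var <- c(PC1 = 0.6303, PC2 = 0.6303 + 0.2200, PC3 = 0.89243, PC4 = 0.92548)

# Index of the first component whose cumulative proportion reaches 90%
n.needed <- unname(which(cum.var >= 0.90)[1])
n.needed
```

With the full analysis loaded, the same check could be run on `PCA.importance["Cumulative Proportion", ]` instead of hand-copied values.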
Q3. One disadvantage I notice is that the plot from Step 11 is difficult to read: many gene labels overlap, making it hard to identify which genes truly contribute to each component.
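
One possible way to reduce the overlap is to label only the genes with the largest loadings. The sketch below uses simulated loadings, not the lab data; the data frame `toy.loadings`, the placeholder gene names, and the cutoff of five labels are all assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated loadings for 50 placeholder genes (not the lab data)
toy.loadings <- data.frame(
  PC1 = rnorm(50, sd = 0.1),
  PC2 = rnorm(50, sd = 0.1),
  row.names = paste0("gene", 1:50)
)

# Distance from the origin: how strongly a gene loads on PC1 and PC2 together
toy.loadings$magnitude <- sqrt(toy.loadings$PC1^2 + toy.loadings$PC2^2)

# Keep labels only for the five genes furthest from the origin
top.idx <- order(toy.loadings$magnitude, decreasing = TRUE)[1:5]
top.genes <- toy.loadings[top.idx, ]
top.genes$gene <- rownames(top.genes)

# All genes as grey points, with text labels on the top five only
p <- ggplot(toy.loadings, aes(x = PC1, y = PC2)) +
  geom_point(colour = "grey60") +
  geom_text(data = top.genes, aes(label = gene), size = 3, vjust = -0.8) +
  xlab("PC1 loading") + ylab("PC2 loading") +
  coord_equal() +
  theme_minimal()
p
```

The trade-off is that genes below the cutoff are no longer identified on the plot, so the cutoff would need to be chosen with the question in mind.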
Q4. In the first plot, there were three clusters: DCs and Macs form one cluster on the left; Monos sit at high PC2 values, in separate clusters on either side; and PMNs form a cluster at high PC1. In the second plot, the separation was much weaker and harder to distinguish. This suggests that cell type, rather than donor or treatment, primarily drives the variation in gene expression.
Q5. The gene with the highest weight for PC1 is C5AR1, and for PC2, it is MMP9.
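
Rather than reading the top genes off the plot, they can be looked up directly in the rotation matrix. The sketch below takes "highest weight" to mean the largest absolute loading, which is an assumption, and it runs on a made-up matrix `toy.rotation` with placeholder gene names rather than on `gene.PCA$rotation`.

```r
# Toy loadings with placeholder gene names (not the lab data)
toy.rotation <- matrix(
  c( 0.70, -0.10,
    -0.20,  0.85,
     0.05,  0.30),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("PC1", "PC2"))
)

# Row index of the largest absolute loading in each column
top.idx <- apply(abs(toy.rotation), 2, which.max)

# Translate the row indices into gene names, one per component
top.by.pc <- rownames(toy.rotation)[top.idx]
names(top.by.pc) <- colnames(toy.rotation)
top.by.pc
```

Using `abs()` treats strongly negative loadings as equally important; dropping it would return the most positive loading instead.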
Q6. In the PCA plot, a black square with an X marks the unknown sample. It lies between the orange MRSA cluster and the blue buffer cluster but remains aligned with the other monocytes along PC1 and PC2, suggesting that the mystery sample could be a monocyte.
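
How the unknown sample was placed on the plot is not stated here. One common approach, sketched below on simulated data, is to project the new sample onto the existing components with `predict()`. The objects `toy.expr`, `toy.PCA`, and `unknown` are hypothetical stand-ins, not the lab data.

```r
set.seed(42)
# Simulated expression matrix: 6 samples (rows) x 4 placeholder genes (columns)
toy.expr <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("sample", 1:6), paste0("gene", 1:4))
)

toy.PCA <- prcomp(toy.expr)

# A hypothetical new sample measured on the same genes
unknown <- matrix(
  rnorm(4), nrow = 1,
  dimnames = list("unknown", colnames(toy.expr))
)

# predict() centers the new sample with the training means and then applies
# the existing rotation, so the result is in the same PC coordinates
unknown.scores <- predict(toy.PCA, newdata = unknown)
unknown.scores[, c("PC1", "PC2")]
```

Because the unknown sample is not used to fit the components, its position reflects only how it compares with the samples that defined them.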
Q7. In our lab, we analyzed a subset of 250 genes, most of which were not strongly required across conditions. As a result, PCA effectively captured overall gene expression. In Figure B, which includes around 9,000 transcripts, the clusters are tighter and more distinct.
Q8. The main question of the study is how MRSA might inhibit the body’s natural immune response, facilitating bacterial infection.
Q9. In Figures 1D through 1H, the researchers compared the response to MRSA infection with the response of cells treated with IFN-γ and IL-4. The red and blue bars, which indicate genes down- or upregulated by each treatment, show that MRSA caused few changes, whereas IFN-γ and IL-4 affected more genes.
Q10. The figure providing the clearest evidence that Staphylococcus aureus causes a muted response is Figure 1H, which uses color to show gene activation and suppression. Under IFN-γ and IL-4 there are strong color changes, whereas MRSA produces few, indicating a weak response and inhibited immune activity.
Q11. All RNA sequencing data generated in the study are available from the NCBI Gene Expression Omnibus (GEO).