This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

setwd('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis')

library(fgsea)
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
library(stats)
library(cowplot)
library(DESeq2)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min


Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

    findMatches

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs,
    colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges,
    colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians,
    colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats,
    rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans,
    rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages
    'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians
library(ggplot2)
library(EnhancedVolcano)
Loading required package: ggrepel
library(genefilter)

Attaching package: ‘genefilter’

The following objects are masked from ‘package:MatrixGenerics’:

    rowSds, rowVars

The following objects are masked from ‘package:matrixStats’:

    rowSds, rowVars
library(calibrate)
Loading required package: MASS

Attaching package: ‘MASS’

The following object is masked from ‘package:genefilter’:

    area
library(RColorBrewer)
library(ComplexHeatmap)
Loading required package: grid
========================================
ComplexHeatmap version 2.16.0
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference

If you use it in published research, please cite either one:
- Gu, Z. Complex Heatmap Visualization. iMeta 2022.
- Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
    genomic data. Bioinformatics 2016.


The new InteractiveComplexHeatmap package can directly export static 
complex heatmaps into an interactive Shiny app with zero effort. Have a try!

This message can be suppressed by:
  suppressPackageStartupMessages(library(ComplexHeatmap))
========================================


Attaching package: ‘ComplexHeatmap’

The following object is masked from ‘package:genefilter’:

    dist2
library(goseq)
Loading required package: BiasedUrn
Loading required package: geneLenDataBase
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      

Attaching package: ‘geneLenDataBase’

The following object is masked from ‘package:S4Vectors’:

    unfactor
library(geneLenDataBase)
library(gage)
library(AnnotationDbi)

Attaching package: ‘AnnotationDbi’

The following object is masked from ‘package:MASS’:

    select
library(org.Mm.eg.db)
library(reshape2)
library(pathview)

##############################################################################
Pathview is an open source software package distributed under GNU General
Public License version 3 (GPLv3). Details of GPLv3 is available at
http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
formally cite the original Pathview paper (not just mention it) in publications
or products. For details, do citation("pathview") within R.

The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
license agreement (details at http://www.kegg.jp/kegg/legal.html).
##############################################################################
library(dplyr)

Attaching package: ‘dplyr’

The following object is masked from ‘package:AnnotationDbi’:

    select

The following object is masked from ‘package:MASS’:

    select

The following object is masked from ‘package:Biobase’:

    combine

The following object is masked from ‘package:matrixStats’:

    count

The following objects are masked from ‘package:GenomicRanges’:

    intersect, setdiff, union

The following object is masked from ‘package:GenomeInfoDb’:

    intersect

The following objects are masked from ‘package:IRanges’:

    collapse, desc, intersect, setdiff, slice, union

The following objects are masked from ‘package:S4Vectors’:

    first, intersect, rename, setdiff, setequal, union

The following objects are masked from ‘package:BiocGenerics’:

    combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(VennDiagram)
Loading required package: futile.logger
library(magrittr)

Attaching package: ‘magrittr’

The following object is masked from ‘package:GenomicRanges’:

    subtract
library(clusterProfiler)
clusterProfiler v4.8.1  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/

If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

Attaching package: ‘clusterProfiler’

The following object is masked from ‘package:AnnotationDbi’:

    select

The following object is masked from ‘package:MASS’:

    select

The following object is masked from ‘package:IRanges’:

    slice

The following object is masked from ‘package:S4Vectors’:

    rename

The following object is masked from ‘package:stats’:

    filter
library(AnnotationDbi)
library(PoiClaClu)
library(tidyr)

Attaching package: ‘tidyr’

The following object is masked from ‘package:magrittr’:

    extract

The following object is masked from ‘package:reshape2’:

    smiths

The following object is masked from ‘package:S4Vectors’:

    expand
library(tibble)
library(msigdbr)
library(ggpubr)

Attaching package: ‘ggpubr’

The following object is masked from ‘package:VennDiagram’:

    rotate

The following object is masked from ‘package:cowplot’:

    get_legend
library(GO.db)
library(GOstats)
Loading required package: Category
Loading required package: Matrix

Attaching package: ‘Matrix’

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack

The following object is masked from ‘package:S4Vectors’:

    expand

Loading required package: graph
Warning: replacing previous import ‘utils::findMatches’ by ‘S4Vectors::findMatches’ when loading ‘AnnotationForge’
Attaching package: ‘GOstats’

The following object is masked from ‘package:AnnotationDbi’:

    makeGOGraph
library(gageData)
countDataNK <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",")
metaDataNK <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

#First attempting to look at the data without subsetting based on condition
dds <- DESeqDataSetFromMatrix(countData=countDataNK, DataFrame(metaDataNK), design=~genotype, tidy = TRUE)
Warning: some variables in design formula are characters, converting to factors
dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 495 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
vsd <- vst(dds, blind=FALSE)

# plotPCA(vsd, intgroup="condition")
# plotPCA(vsd, intgroup="genotype")
# plotPCA(vsd, intgroup= c("genotype", "condition"))


pcaData <- plotPCA(vsd, intgroup=c("condition", "genotype"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))

pcaplot<-ggplot(pcaData, aes(PC1, PC2, color=genotype, shape=condition)) +
  geom_point(size=5) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  theme_light()

ggsave(filename = "PCA-plot.tiff",
  plot = pcaplot, device = "tiff", path = NULL, scale = 1, width = 5, height = 5, units = c("in"), dpi = 600)
pcaplot

#Trying to make a VENN diagram indiscriminate of subsets - just all subsets and genotype as the main comparitor

resA_DKO <- results(dds, contrast=c("genotype","DKO","NCR"))
resB_FOXO1 <- results(dds, contrast=c("genotype","FOXO1KO","NCR"))
resC_GFI1 <- results(dds, contrast=c("genotype","GFI1KO","NCR"))

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA_DKO <- get_lfc_sign(resA_DKO, thr=1.1)
resB_FOXO1 <- get_lfc_sign(resB_FOXO1, thr=1.1)
resC_GFI1 <- get_lfc_sign(resC_GFI1, thr=1.1)

resA_DKO <- resA_DKO[order(resA_DKO$pvalue, decreasing=FALSE), ]
resB_FOXO1 <- resB_FOXO1[order(resB_FOXO1$pvalue, decreasing=FALSE), ]
resC_GFI1 <- resC_GFI1[order(resC_GFI1$pvalue, decreasing=FALSE), ]

table(resA_DKO$padj < 0.05, resA_DKO$lfc_thr)
       
           -1     0     1
  FALSE   634 19662   674
  TRUE    475  3108   575
table(resB_FOXO1$padj < 0.05, resB_FOXO1$lfc_thr)
       
           -1     0     1
  FALSE    87 17197    95
  TRUE    222  1967   140
table(resC_GFI1$padj < 0.05, resC_GFI1$lfc_thr)
       
           -1     0     1
  FALSE    46 19414   184
  TRUE     52  1087   280
resA_DKO$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA_DKO$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA_DKO$name <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resB_FOXO1 files
resB_FOXO1$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB_FOXO1$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB_FOXO1$name <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resC_GFI1 files
resC_GFI1$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC_GFI1$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC_GFI1$name <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resASig_NK <- resA_DKO[which(resA_DKO$padj < 0.05 & abs(resA_DKO$log2FoldChange) >= 1.1 & resA_DKO$baseMean >= 20), ]
resBSig_NK <- resB_FOXO1[which(resB_FOXO1$padj < 0.05 & abs(resB_FOXO1$log2FoldChange) >= 1.1 & resB_FOXO1$baseMean >= 20), ]
resCSig_NK <- resC_GFI1[which(resC_GFI1$padj < 0.05 & abs(resC_GFI1$log2FoldChange) >= 1.1 & resC_GFI1$baseMean >= 20), ]

write.csv(resASig_NK, "NK_NCR_vs_DKO_0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(resBSig_NK, "NK_NCR_vs_FOXO1KO_0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(resCSig_NK, "NK_NCR_vs_GFI1KO_0.05pAdj1.1L2FC20BaseMean.csv")


##VENN diagram DEG of ALL NK indiscriminate of Subset

set1NK <- resASig_NK$symbol
set2NK <- resBSig_NK$symbol
set3NK <- resCSig_NK$symbol
colors <- c("slateblue1", "goldenrod1", "indianred1")

set1NK<-na.omit(set1NK)
set2NK<-na.omit(set2NK)
set3NK<-na.omit(set3NK)

venn.diagram(
  x = list(set1NK, set2NK, set3NK),
  category.names = c("DKO", "FOXO1KO", "GFI1KO"),
  filename = 'DEG ALL NK VennDiagram.png',
  output = TRUE,
  imagetype = "png",
  scaled = FALSE,
  col = "black",
  fill = colors,
  cat.col = colors,
  cat.cex = 1.5,   # Adjust the font size to make the labels smaller
  cat.pos = 0,     # Move the labels outside the circles
  cat.dist = c(0.05, 0.05, -0.45),  # Adjust the distance for each category label individually
  margin = 0.15
)
[1] 1
countData <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",", row.names = 1)
metaData <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

metaData$genotype <- factor(metaData$genotype, levels = c("NCR", "FOXO1KO", "GFI1KO", "DKO"))
metaData$condition <- factor(metaData$condition, levels = c("Imm", "M1", "M2"))

######################### within each condition, how does KO compare to WT########################################

dds_meta1 <- metaData %>% filter(condition == "M1")
dds_1 <- DESeqDataSetFromMatrix(countData=countData[,which(colnames(countData) %in% dds_meta1$Geneid)], dds_meta1, design=~genotype)
#dds_1 <- dds_1[rowSums(counts(dds_1)) > 8, ]
dds_1 <- DESeq(dds_1)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
resA <- results(dds_1, contrast=c("genotype","DKO","NCR"))
resB <- results(dds_1, contrast=c("genotype","FOXO1KO","NCR"))
resC <- results(dds_1, contrast=c("genotype","GFI1KO","NCR"))
# change the last variable in the sequence to be the control and the second to be the comparitor - this resA here, NCR (control) vs DKO 

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA <- get_lfc_sign(resA, thr=1.1)
resB <- get_lfc_sign(resB, thr=1.1)
resC <- get_lfc_sign(resC, thr=1.1)

resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
resB <- resB[order(resB$pvalue, decreasing=FALSE), ]
resC <- resC[order(resC$pvalue, decreasing=FALSE), ]

table(resA$padj < 0.05, resA$lfc_thr)
       
           -1     0     1
  FALSE   910 19202  1251
  TRUE    416  1526   484
table(resB$padj < 0.05, resB$lfc_thr)
       
           -1     0     1
  FALSE   668 18836   692
  TRUE    250   818   152
table(resC$padj < 0.05, resC$lfc_thr)
       
           -1     0     1
  FALSE   543 20370   889
  TRUE     43   504   262
 

#MA plot

plotMA(
  resA,
  #padj = 0.05,
  #log2Foldchange = 1.5,
  alpha = 0.5,
  #select.top.method = c("padj", "log2FoldChange"),
  #top = 20,
  #select.top.method = c("padj", "log2FoldChange"),
  main = "M1 NK cells - NCR Vs DKO",
  xlab = "Average log2 counts per million",
  ylab = "Log2 Fold Change",
  ylim = c(-15, 15),
  colNonSig = "black",
  #palette = c("#B31B21", "#1465AC", "darkgray"),
  #colSig = "firebrick1",
  colSig = "dodgerblue",
  returnData = FALSE,
  MLE = FALSE,
  #ggtheme = theme_classic(),
)

resA$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA$name <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resB files
resB$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB$name <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resC files
resC$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC$name <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
# Add normalized count to the result Table
resAanno<- merge(as.data.frame(resA), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resBanno<- merge(as.data.frame(resB), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resCanno<- merge(as.data.frame(resC), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)

write.csv(resAanno, "M1_NCR_vs_DKO.csv")
write.csv(resBanno, "M1_NCR_vs_FOXO1KO.csv")
write.csv(resCanno, "M1_NCR_vs_GFI1KO.csv")
## This is just me playing around with a volcano plot first to see lots of genes and then a second one where I've handpicked some genes to map on 

#volcano plot resA
volplot_resA_discovery <- EnhancedVolcano(resA, 
                           lab = resA$symbol, 
                           x = "log2FoldChange", 
                           y = "padj", 
                           border = "full", 
                           borderColour = "black", 
                           gridlines.major = F, 
                           gridlines.minor = F,
                           title = 'M1 - DKO vs NCR',
                           subtitle = bquote(italic('FDR <= 0.05 and absolute FC >= 1.1')),
                              #selectLab = c('Eomes','FOXO1','Gfi1','Id3','Pdcd1','Gzma','Gzmb', 'Tbx21'),
                           pCutoff = 0.00005,
                           FCcutoff = 1.1,
                           pointSize = 3.0,
                           labSize = 3.5,
                           legendPosition = 'none',
                           xlim = c(-10, 15),
                              #legendLabels = c("NS", expression(Log[2] ~ FC), "p-value", expression(p - value ~ and ~ log[2] ~ FC)),
                           #drawConnectors = TRUE, 
                           max.overlaps = 100,
                           lengthConnectors = unit(0.001, "npc"),
                           typeConnectors = 'closed',
                           endsConnectors = "last",
                           widthConnectors = 0.1,
                              #col=c("grey", "green4", "blue", "red3"),
                           colAlpha = 0.5
)
Warning: One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value...
volplot_resA_discovery

lab_italics <- paste0("italic('", resA$symbol, "')")
selectLab_italics = paste0(
  "italic('",
  c('Dusp7','Ptpn5','Fcgr3', 'Tcf7','Icos','Vegfc','Gmfg','Cphx1','Gpr15', 'Thy1', 'Nek10', 'Hopx', 'FOXO1', 'Gfi1'),
  "')")

Volplot_M1resA <- EnhancedVolcano(resA,
                lab = lab_italics,
                x = 'log2FoldChange',
                y = 'pvalue',
                selectLab = selectLab_italics,
                xlab = bquote(~Log[2]~ 'fold change'),
                pCutoff = 0.00005,
                FCcutoff = 1.1,
                pointSize = 3.0,
                labSize = 6.0,
                labCol = 'black',
                title = 'M1 - DKO vs NCR',
                subtitle = bquote(italic('FDR <= 0.05 and absolute FC >= 1.1')),
                #border = "full",
                labFace = 'bold',
                boxedLabels = TRUE,
                gridlines.major = F, 
                gridlines.minor = F,
                parseLabels = TRUE,
                col = c('grey30', 'green4', 'blue1', 'red3'),
                colAlpha = 4/5,
                legendPosition = 'bottom',
                legendLabSize = 14,
                legendIconSize = 4.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black') + coord_flip()
Warning: One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value...Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Volplot_M1resA

countDataTOTAL <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",", row.names = 1) 
metaDataTOTAL <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

metaDataTOTAL$genotype <- factor(metaDataTOTAL$genotype, levels = c("NCR", "FOXO1KO", "GFI1KO", "DKO"))
metaDataTOTAL$condition <- factor(metaDataTOTAL$condition, levels = c("Imm", "M1", "M2"))

 
######################### within each condition, how does KO compare to WT########################################

dds_meta1 <- metaDataTOTAL %>% filter(condition == "M1")
dds_1 <- DESeqDataSetFromMatrix(countData=countDataTOTAL[,which(colnames(countDataTOTAL) %in% dds_meta1$Geneid)], dds_meta1, design=~genotype)
#dds_1 <- dds_1[rowSums(counts(dds_1)) > 8, ]
dds_1 <- DESeq(dds_1)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
resA <- results(dds_1, contrast=c("genotype","DKO","NCR"))
resB <- results(dds_1, contrast=c("genotype","FOXO1KO","NCR"))
resC <- results(dds_1, contrast=c("genotype","GFI1KO","NCR"))
# change the last variable in the sequence to be the control and the second to be the comparitor - this resA here, NCR (control) vs DKO 

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA <- get_lfc_sign(resA, thr=1.1)
resB <- get_lfc_sign(resB, thr=1.1)
resC <- get_lfc_sign(resC, thr=1.1)

resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
resB <- resB[order(resB$pvalue, decreasing=FALSE), ]
resC <- resC[order(resC$pvalue, decreasing=FALSE), ]

table(resA$padj < 0.05, resA$lfc_thr)
       
           -1     0     1
  FALSE   910 19202  1251
  TRUE    416  1526   484
table(resB$padj < 0.05, resB$lfc_thr)
       
           -1     0     1
  FALSE   668 18836   692
  TRUE    250   818   152
table(resC$padj < 0.05, resC$lfc_thr)
       
           -1     0     1
  FALSE   543 20370   889
  TRUE     43   504   262
#Add Entrez, Symbol and gene name to resA files
resA$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resA$name <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resB files
resB$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resB$name <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
#Add Entrez, Symbol and gene name to resC files
resC$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
resC$name <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="GENENAME", keytype="ENSEMBL", multiVals="first")
'select()' returned 1:many mapping between keys and columns
# Add normalized count to the result Table
resA<- merge(as.data.frame(resA), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resB<- merge(as.data.frame(resB), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resC<- merge(as.data.frame(resC), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)


#PCA plot
rld <- rlog(dds_1)
# plotPCA(rld, intgroup = c("genotype"))
# plotPCA(rld, intgroup = c("genotype", "condition"))
my_theme <- theme(panel.grid.major = element_blank(),
                  panel.grid.minor = element_blank(),
                  panel.background = element_rect(fill = "white", color = "black"),
                  panel.border = element_rect(color = "black", fill = NA, linewidth = 0.1))
plotPCA(rld, intgroup = "genotype", ntop = 500, returnData = FALSE) + my_theme + geom_point(size=5) + labs(title = "PCA Plot - M1 Gene Expression Data")


resASig <- resA[which(resA$padj < 0.05 & abs(resA$log2FoldChange) >= 1.1 & resA$baseMean >= 20), ]
resBSig <- resB[which(resB$padj < 0.05 & abs(resB$log2FoldChange) >= 1.1 & resB$baseMean >= 20), ]
resCSig <- resC[which(resC$padj < 0.05 & abs(resC$log2FoldChange) >= 1.1 & resC$baseMean >= 20), ]

mat <- assay(rld)
idA <- resASig$Row.names
DEgenes <- mat[idA,]
colData(rld)
DataFrame with 20 rows and 5 columns
           Geneid condition genotype     batch sizeFactor
      <character>  <factor> <factor> <integer>  <numeric>
JD.5         JD.5        M1   NCR            1   0.930689
JD.8         JD.8        M1   NCR            1   1.011171
JD.11       JD.11        M1   NCR            1   0.994989
JD.14       JD.14        M1   NCR            1   0.788172
JD.17       JD.17        M1   GFI1KO         1   0.740554
...           ...       ...      ...       ...        ...
D11           D11        M1  DKO             2    1.31403
E2             E2        M1  NCR             2    1.42852
E5             E5        M1  GFI1KO          2    1.19067
E8             E8        M1  FOXO1KO         2    1.56719
E11           E11        M1  DKO             2    1.03036
annotation <- data.frame(genotype=colData(rld)$genotype)
genotype_colors <- list("genotype"=c("NCR" = "grey65", "FOXO1KO" = "goldenrod1", "GFI1KO" = "indianred1", "DKO" = "slateblue1"))

bwr_99 <- colorRampPalette(c("cyan", "black", "yellow"))(n = 99)

#Dendrogram
pheatmap(DEgenes, 
         scale = "row", 
         show_rownames = F,
         show_colnames = F,
         color = bwr_99,
         clustering_distance_rows = "correlation", 
         annotation_col = annotation, 
         annotation_colors = genotype_colors,
         treeheight_row = 30,
         treeheight_col = 15,
         main="M1 DEG")

# Candidate gene heatmaps

genesEff <- c("Lag3", "Batf3", "Tnfsf8", "Gzmm", "Il21r", "Itk", "Tnfsf10", "Tlr3", "Foxo1", "Socs1", "Cd2ap", "Gzmk", "Il18r1", "Il12rb2", "Zeb2", "Prf1", "Gmzb", "Ifng", "Ccl4", "Sh2d1b1")
genesDevDiff <- c("Lef1", "Gata6", "Hhat", "Eomes", "Runx2", "Sox4", "Id3", "Tox", "Il4ra", "Dtx1", "Cd27", "Tcf7", "Klrg1", "Tbx21", "Spi1", "Id2")
genesCellSurface <- c("Kit", "Thy1", "Klra1", "Klra3", "Klra4", "Klra8", "Klra9", "Klra17", "Klrb1a", "Klrc1", "Klrc3", "Klrd1", "Ncr1", "Cd244a", "Klrg1", "Cd226", "Il2rb", "Klrb1c", "Fcgr3", "Sell", "Cd27", "H2-Ab1", "Cd86", "Cd69", "Klrd1", "Lair1", "Lag3", "Nkrp1", "Ccr5", "Ccr7", "Cxcr1", "Cxcr2", "Cxcr3", "Cxcr4", "Cx3cr1", "S1pr5", "Ccr2", "Il2ra", "Il12rb1", "Il15ra", "Il18r1", "Fcgr1", "Fcgr2a", "Fcgr2b", "Fcgr2c", "Fcgr3")
genesMetabolism <- c("Ppargc1a", "Ppara", "Ppard", "Pparg", "Sirt1", "Foxo1", "Foxo3", "Foxo4", "Acox1", "Acadm", "Acads", "Acsl1", "Acsl3", "Acsl4", "Acsl5", "Acsl6", "Akr1b8", "Akr1b3", "Aldh1a1", "Aldh2", "Aldh3a1", "Aldh4a1", "Aldh5a1", "Cpt1a", "Cpt1b", "Cpt2", "Fasn", "Fabp1", "Fabp2", "Fabp3", "Fabp4", "Fabp5", "Glut1", "Glut2", "Glut3", "Glut4", "G6pd", "Pck1", "Pck2", "Hk1", "Hk2", "Gck", "Pdk1", "Pdk2", "Pdk3")
genesProliferationCellSurvival <- c("Klf2", "Bbc3", "Klf4", "Il7r", "Bcl2a1b", "Bag3", "Bcl2", "Bcl2l1",  "Bcl2l11", "Mcl1", "Bax", "Bad", "Bid", "Bak1", "Casp3", "Casp8", "Casp9", "Xiap", "Survivin", "Pten", "Akt1", "Akt2", "Akt3", "Mtor", "Rb1", "E2f1", "E2f", "E2f3", "Tp53", "Cdkn1a", "Cdkn1b", "Cdkn2a", "Cdkn2b", "Nfkb1", "Nfkb2", "Rela", "Relb", "Tnf", "Tnfrsf1a", "Tnfrsf1b", "Mapk1", "Mapk3",  "Mapk8", "Mapk9", "Mapk14", "Mapk12", "Mapk13", "Map3k1", "Map3k5", "Map3k7")
genesTranscriptionFactors <- c("Gfi1", "Gfi1b", "Eomes", "Tbx21", "Nfil3", "Id2", "Nfkbia", "Nfkbid", "Nfkbiz", "Spi1", "Nr4a1", "Ncr1", "Tcf7", "Hobit", "Blimp1", "Gata3", "Il7r", "Tox", "Tox2", "Zbtb16", "Rora", "Ets1", "Eomes", "Tox2", "Tbx21", "Zeb2", "Runx3", "Id2", "Nfil3", "Tox", "Gata3", "Prdm1", "Batf", "Batf3", "Irf4", "Irf8", "Stat4", "Stat5", "c-Myc", "Notch1", "Ttf1", "Nkx3-1", "Gfi1", "Gfi1b", "Foxo1", "Foxo3", "Foxo4", "Spi1", "Spib", "Spic", "Ets1", "Ets2", "Pax5", "Pax8")
genesMigration <- c("Cxcr3", "Ccr5", "Ccr2", "Gpr183", "Ccr7", "Itgb3", "Cd63", "Itgad", "Gpr155", "Itgb5", "Ccr10", "Cxcr4", "Sell", "Itga4", "Itga6", "Itga1", "S1pr5", "Cxcr3", "Cx3cr1", "Ccr7", "S1pr5", "Cxcr4", "Cxcr6", "Ccr5", "Ccr9", "Ccr2", "Ccr6", "Itga4", "Itgb1", "Itgb2", "ItgaL", "ItgaM", "Itga1", "ItgaE", "ItgaD", "Itgb7", "ItgaE", "Selp", "Sele", "Sell", "Cldn1", "Cldn2", "Cldn3", "Cldn4", "Cldn5", "Cldn7", "Cldn8", "Cldn9", "Cldn10", "Cldn11", "Cldn12", "Cldn15", "Cldn16", "Cldn17", "Cldn18", "Cldn19", "Cldn20")
genesActivationInhibition <- c("Klra8", "Klra7", "Klra4", "Klra3", "Klra1", "Klrd1", "Cd94", "Lair-1", "Ptpn6", "Shp1", "Cish", "Cd244a", "Cd38", "Il6st", "Klrc1", "Cd101", "Hmbox1", "Klra3", "Klra1", "Cd226", "Cd96", "Cd200r1", "Ahr", "Cd72", "Tigit", "Cd69", "Ncr", "Klrb1c", "Klrb1b", "Cd244", "Ly49h", "Klrk1", "Cd226", "Cd25", "Il2rb", "Ifngr1", "Ifnar1", "Stat4", "Stat5", "Stat1", "Prf1", "Gzma", "Gzmb", "Gzmk", "Gzmm", "Tnfsf10", "Fasl", "Fas", "Ifng")
genesCytokineChemokine <- c("Ifng", "Tnf", "Csf2", "Il10", "Il2", "Il12", "Il15", "Il18", "Il22", "Il27", "Il33", "Cxcl1", "Cxcl2", "Cxcl3", "Cxcl4", "Cxcl5", "Cxcl9", "Cxcl10", "Cxcl11", "Cxcl12", "Cxcl13", "Cxcl16", "Ccl1", "Ccl2", "Ccl3", "Ccl4", "Ccl5", "Ccl6", "Ccl7", "Ccl8", "Ccl9", "Ccl10", "Ccl11", "Ccl12", "Ccl17", "Ccl19", "Ccl20", "Ccl21", "Ccl22", "Ccl25", "Ccl27", "Ccl28")
genesProteinValidation <- c("Itga2", "Itgam", "Thy1", "Thy1.2", "Klrb1c", "Nkrp1c", "Kit", "Tbx21", "Eomes", "Foxo1", "Ncr1", "Ncr", "Cxcr3", "Cx3cr1", "Tox", "Klrg1", "Sell", "Havcr2", "Il7r", "Icos", "Tcf7", "Il18r1", "Cd226", "Cd38", "Lag3")


plot_genes_heatmap <- function(resA, expression_mat, dds_meta1, keep_genes, keep_groups = c("NCR", "FOXO1KO", "DKO", "GFI1KO"), padj_cutoff = 0.05, log2FoldChange_cutoff = 1.1, baseMean_cutoff = 20, ...) {
  resA$significant <- ifelse(resA$padj < padj_cutoff & abs(resA$log2FoldChange) >= log2FoldChange_cutoff & resA$baseMean >= baseMean_cutoff & !is.na(resA$symbol), "yes", "no")
  df1 <- data.frame(expression_mat, "Row.names" = row.names(expression_mat))
  df2 <- data.frame("Row.names" = resA$Row.names, "significant" = resA$significant, "genes" = resA$symbol)
  mat <- dplyr::full_join(df1, df2, by = "Row.names")  #df3, df4,
  mat <- mat[, -which(colnames(mat) == "Row.names")]
  mat <- mat %>% filter(genes %in% keep_genes)
  row.names(mat) <- mat[, ncol(mat)]
  mat <- mat[, -ncol(mat)]
  mat <- mat[rowSums(mat[, -ncol(mat)]) > 0, ]
  row_anno <- mat[, "significant", drop = F]
  mat <- mat[, dds_meta1 %>% filter(genotype %in% keep_groups) %>% dplyr::select(Geneid) %>% unlist() %>% as.character()]
  col_anno <- dds_meta1 %>%
    filter(genotype %in% keep_groups) %>%
    dplyr::select(genotype)
  row.names(col_anno) <- dds_meta1 %>%
    filter(genotype %in% keep_groups) %>%
    dplyr::select(Geneid) %>%
    unlist() %>%
    as.character()
  
  genotype_colors <- list("genotype" = c("NCR" = "grey65", "FOXO1KO" = "goldenrod1", "GFI1KO" = "indianred1", "DKO" = "slateblue1"))
  sig_col <- list("significant" = c("yes" = "#e15179", "no" = "#e7e7e7"))
  pheatmap(mat,
           annotation_colors = c(genotype_colors, sig_col),
           annotation_col = col_anno,
           annotation_row = row_anno,
           ...
  )
}

plot_genes_heatmap(resA, assay(rld), dds_meta1, genesProteinValidation, keep_groups = c("NCR", "FOXO1KO", "DKO", "GFI1KO"),
                   show_rownames = T,
                   cluster_cols = T,
                   show_colnames = F,
                   cluster_rows = T,
                   scale="row",
                   #border_color = NA,
                   treeheight_row = 10,
                   clustering_distance_rows = "correlation",
                   color = bwr_99)
Warning: The input is a data frame, convert it to the matrix.

##VENN diagram DEG

set1 <- resASig$symbol
set2 <- resBSig$symbol
set3 <- resCSig$symbol
colors <- c("slateblue1", "goldenrod1", "indianred1")

set1<-na.omit(set1)
set2<-na.omit(set2)
set3<-na.omit(set3)

venn.diagram(
  x = list(set1, set2, set3),
  category.names = c("DKO", "FOXO1KO", "GFI1KO"),
  filename = 'DEG M1 VennDiagram.png',
  output = TRUE,
  imagetype = "png",
  scaled = FALSE,
  col = "black",
  fill = colors,
  cat.col = colors,
  cat.cex = 1.5,   # Adjust the font size to make the labels smaller
  cat.pos = 0,     # Move the labels outside the circles
  cat.dist = c(0.05, 0.05, -0.45),  # Adjust the distance for each category label individually
  margin = 0.15
)
[1] 1
#GO BP analysis 

#BP
orderSigresA_GO_UP <- resASig %>% filter(padj < 0.05 & (log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))
orderSigresA_GO_DOWN <- resASig %>% filter(padj < 0.05 & (log2FoldChange) <= -1.1 & baseMean >= 20 & !is.na(symbol))


egoBP_UPresA <- enrichGO(gene         = orderSigresA_GO_UP$entrez,
                 OrgDb         = org.Mm.eg.db,
                 keyType       = 'ENTREZID',
                 ont           = "BP",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.05,
                 )
head(egoBP_UPresA)

dotplot(egoBP_UPresA, showCategory = 15)


egoBP_DOWNresA <- enrichGO(gene         = orderSigresA_GO_DOWN$entrez,
                     OrgDb         = org.Mm.eg.db,
                     keyType       = 'ENTREZID',
                     ont           = "BP",
                     pAdjustMethod = "BH",
                     pvalueCutoff  = 0.05,
)
head(egoBP_DOWNresA)
dotplot(egoBP_DOWNresA, showCategory = 15)


write.csv(egoBP_UPresA, "M1_NCR_vs_DKO_GOBPup.csv")
write.csv(egoBP_DOWNresA, "M1_NCR_vs_DKO_GOBPdown.csv")
#KEGG pathway analysis

orderSigresA_GO_UP <- resASig %>% filter(padj < 0.05 & (log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))

KEGGup <- enrichKEGG(gene         = orderSigresA_GO_UP$entrez,
                 organism     = 'mmu',
                 keyType = "ncbi-geneid",
                 pvalueCutoff = 1,
                 qvalueCutoff = 1)
Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
Warning: the 'wininet' method is deprecated for http:// and https:// URLsReading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
Warning: the 'wininet' method is deprecated for http:// and https:// URLsReading KEGG annotation online: "https://rest.kegg.jp/conv/ncbi-geneid/mmu"...
Warning: the 'wininet' method is deprecated for http:// and https:// URLs
dotplot(KEGGup, showCategory = 10)




orderSigresA_GO_DOWN <- resASig %>% filter(padj < 0.05 & (log2FoldChange) <= -1.1 & baseMean >= 20 & !is.na(symbol))

KEGGdown <- enrichKEGG(gene         = orderSigresA_GO_DOWN$entrez,
                 organism     = 'mmu',
                 keyType = "ncbi-geneid",
                 pvalueCutoff = 1,
                 qvalueCutoff = 1)
dotplot(KEGGdown, showCategory = 10)


write.csv(KEGGup, "M1_NCR_vs_DKO_KEGGup.csv")
write.csv(KEGGdown, "M1_NCR_vs_DKO_KEGGdown.csv")
#GSEA

#change log2foldchange to negative values to see underexpression?
#DKO
tmp <- resA[!is.na(resA$symbol), ]
tmp <- tmp[order(tmp$stat, decreasing = T), ]
testgenes <- tmp$stat
names(testgenes) <- tmp$symbol
testgenes <- testgenes[!is.na(testgenes)]

gene_sets <- msigdbr(species = "mouse", category = "H")

gene_sets <- gene_sets %>%
  dplyr::select(gs_name, gene_symbol)

gsearesA <- GSEA(geneList = testgenes,
                TERM2GENE = gene_sets)
preparing geneSet collections...
GSEA analysis...
Warning: There are ties in the preranked stats (4.56% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.Warning: There are duplicate gene names, fgsea may produce unexpected results.Warning: 'package:stats' may not be available when loadingWarning: 'package:stats' may not be available when loadingleading edge analysis...
done...
dotplot(gsearesA)


View(gsearesA)
dfgsearesA <- as.data.frame(gsearesA)



#attempting to overlap dfgsearesA/B/C

# str(dfgsearesA)
# str(dfgsearesB)
# str(dfgsearesC)
# 
# celltypes = c('Imm', 'M1', 'M2')
# groups = c("NCR", "FOXO1KO", "GFI1KO", "DKO")



#adding a score for GSEA
resdf2 <- resA %>%
  arrange(padj) %>%
  mutate(gsea_metric = -log10(padj) * sign(log2FoldChange))

resdf2 <- resdf2 %>%
  mutate(padj = case_when(padj == 0 ~ .Machine$double.xmin,
                          TRUE ~ padj)) %>%
  mutate(gsea_metric = -log10(padj) * sign(log2FoldChange))
resdf2

resdf2 <- resdf2 %>%
  filter(! is.na(gsea_metric)) %>%
  arrange(desc(gsea_metric))
View(resdf2)

hist(resdf2$gsea_metric, breaks = 100)


ranks <- resdf2 %>%
  select(symbol, gsea_metric) %>%
  distinct(symbol, .keep_all = TRUE) %>%
  deframe()

head(ranks)
   Ptpn5 Cdc42bpb    Grhl3    Bend5   Rhbdf2      Dsp 
307.6527 307.6527 228.2902 188.3544 172.5150 147.6078 
gseares <- GSEA(geneList = ranks,
                TERM2GENE = gene_sets)
preparing geneSet collections...
GSEA analysis...
Warning: There are ties in the preranked stats (44.04% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.Warning: 'package:stats' may not be available when loadingno term enriched under specific pvalueCutoff...
gsearesdf <- as.data.frame(gseares)

dotplot(gseares)
Error in `$<-.data.frame`(`*tmp*`, ".sign", value = "activated") : 
  replacement has 1 row, data has 0
# # compared to Wt, how does KO change in condition
# ############################################################################################################
# You helped write this initial code 2 months ago - I haven't fully thought through how I would present data that look at IMM vs M1 vs M2 in each KO - maybe this comparison could be useful for looking at how pathways or key transcription factors change as the cells develop from IMM into M1 into M2

# # FOXO1KO vs WT and Imm vs M1
# dds_meta1 <- metaDataTOTAL %>% filter((genotype %in% c("NCR", "FOXO1KO")) & (condition %in% c("Imm", "M1")))
# dds_1 <- DESeqDataSetFromMatrix(countData=countDataTOTAL[,which(colnames(countDataTOTAL) %in% dds_meta1$Geneid)], dds_meta1, design= ~ 0 + condition + genotype + genotype:condition)
# dds_1 <- dds_1[rowSums(counts(dds_1)) > 5, ]
# dds_1 <- DESeq(dds_1)
# 
# resultsNames(dds_1)
# resA <- results(dds_1, name = 'conditionM1.genotypeFOXO1KO')
# 
# get_lfc_sign <- function(df, thr = 1){
#   df$lfc_thr <- 0
#   df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
#   df$lfc_thr[which(df$log2FoldChange <= (-1 * thr))] <- -1
#   return(df)
# }
# 
# resA <- get_lfc_sign(resA, thr=1)
# 
# resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
# 
# table(resA$padj < 0.05, resA$lfc_thr)
# 
# write.csv(resA, "filenames.csv")

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

---
title: "Belz Lab NK cell RNA seq v0.1 11.8.23"
output: html_notebook
---

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code. 

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*. 

```{r}
setwd('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis')

library(fgsea)
library(stats)
library(cowplot)
library(DESeq2)
library(ggplot2)
library(EnhancedVolcano)
library(genefilter)
library(calibrate)
library(RColorBrewer)
library(ComplexHeatmap)
library(goseq)
library(geneLenDataBase)
library(gage)
library(AnnotationDbi)
library(org.Mm.eg.db)
library(reshape2)
library(pathview)
library(dplyr)
library(VennDiagram)
library(magrittr)
library(clusterProfiler)
library(AnnotationDbi)
library(PoiClaClu)
library(tidyr)
library(tibble)
library(msigdbr)
library(ggpubr)
library(GO.db)
library(GOstats)
library(gageData)
```

```{r}
#Looking at the data without consideration of condition, only genotype

countDataNK <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",")
metaDataNK <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

#First attempting to look at the data without subsetting based on condition
dds <- DESeqDataSetFromMatrix(countData=countDataNK, DataFrame(metaDataNK), design=~genotype, tidy = TRUE)
dds <- DESeq(dds)

vsd <- vst(dds, blind=FALSE)

# plotPCA(vsd, intgroup="condition")
# plotPCA(vsd, intgroup="genotype")
# plotPCA(vsd, intgroup= c("genotype", "condition"))


pcaData <- plotPCA(vsd, intgroup=c("condition", "genotype"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))

pcaplot<-ggplot(pcaData, aes(PC1, PC2, color=genotype, shape=condition)) +
  geom_point(size=5) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  theme_light()

ggsave(filename = "PCA-plot.tiff",
  plot = pcaplot, device = "tiff", path = NULL, scale = 1, width = 5, height = 5, units = c("in"), dpi = 600)
pcaplot
```

```{r}
#Trying to make a VENN diagram indiscriminate of subsets - just all subsets and genotype as the main comparitor

resA_DKO <- results(dds, contrast=c("genotype","DKO","NCR"))
resB_FOXO1 <- results(dds, contrast=c("genotype","FOXO1KO","NCR"))
resC_GFI1 <- results(dds, contrast=c("genotype","GFI1KO","NCR"))

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA_DKO <- get_lfc_sign(resA_DKO, thr=1.1)
resB_FOXO1 <- get_lfc_sign(resB_FOXO1, thr=1.1)
resC_GFI1 <- get_lfc_sign(resC_GFI1, thr=1.1)

resA_DKO <- resA_DKO[order(resA_DKO$pvalue, decreasing=FALSE), ]
resB_FOXO1 <- resB_FOXO1[order(resB_FOXO1$pvalue, decreasing=FALSE), ]
resC_GFI1 <- resC_GFI1[order(resC_GFI1$pvalue, decreasing=FALSE), ]

table(resA_DKO$padj < 0.05, resA_DKO$lfc_thr)
table(resB_FOXO1$padj < 0.05, resB_FOXO1$lfc_thr)
table(resC_GFI1$padj < 0.05, resC_GFI1$lfc_thr)

resA_DKO$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resA_DKO$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resA_DKO$name <- mapIds(org.Mm.eg.db, keys=row.names(resA_DKO), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resB_FOXO1 files
resB_FOXO1$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resB_FOXO1$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resB_FOXO1$name <- mapIds(org.Mm.eg.db, keys=row.names(resB_FOXO1), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resC_GFI1 files
resC_GFI1$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resC_GFI1$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resC_GFI1$name <- mapIds(org.Mm.eg.db, keys=row.names(resC_GFI1), column="GENENAME", keytype="ENSEMBL", multiVals="first")

resASig_NK <- resA_DKO[which(resA_DKO$padj < 0.05 & abs(resA_DKO$log2FoldChange) >= 1.1 & resA_DKO$baseMean >= 20), ]
resBSig_NK <- resB_FOXO1[which(resB_FOXO1$padj < 0.05 & abs(resB_FOXO1$log2FoldChange) >= 1.1 & resB_FOXO1$baseMean >= 20), ]
resCSig_NK <- resC_GFI1[which(resC_GFI1$padj < 0.05 & abs(resC_GFI1$log2FoldChange) >= 1.1 & resC_GFI1$baseMean >= 20), ]

write.csv(resASig_NK, "NK_NCR_vs_DKO_0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(resBSig_NK, "NK_NCR_vs_FOXO1KO_0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(resCSig_NK, "NK_NCR_vs_GFI1KO_0.05pAdj1.1L2FC20BaseMean.csv")


##VENN diagram DEG of ALL NK indiscriminate of Subset

set1NK <- resASig_NK$symbol
set2NK <- resBSig_NK$symbol
set3NK <- resCSig_NK$symbol
colors <- c("slateblue1", "goldenrod1", "indianred1")

set1NK<-na.omit(set1NK)
set2NK<-na.omit(set2NK)
set3NK<-na.omit(set3NK)

venn.diagram(
  x = list(set1NK, set2NK, set3NK),
  category.names = c("DKO", "FOXO1KO", "GFI1KO"),
  filename = 'DEG ALL NK VennDiagram.png',
  output = TRUE,
  imagetype = "png",
  scaled = FALSE,
  col = "black",
  fill = colors,
  cat.col = colors,
  cat.cex = 1.5,   # Adjust the font size to make the labels smaller
  cat.pos = 0,     # Move the labels outside the circles
  cat.dist = c(0.05, 0.05, -0.45),  # Adjust the distance for each category label individually
  margin = 0.15
)
```

```{r}
#note this is now just metaData and countData - not total as we're subsetting populations

countData <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",", row.names = 1)
metaData <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

metaData$genotype <- factor(metaData$genotype, levels = c("NCR", "FOXO1KO", "GFI1KO", "DKO"))
metaData$condition <- factor(metaData$condition, levels = c("Imm", "M1", "M2"))

######################### within each condition, how does KO compare to WT########################################

dds_meta1 <- metaData %>% filter(condition == "M1")
dds_1 <- DESeqDataSetFromMatrix(countData=countData[,which(colnames(countData) %in% dds_meta1$Geneid)], dds_meta1, design=~genotype)
#dds_1 <- dds_1[rowSums(counts(dds_1)) > 8, ]
dds_1 <- DESeq(dds_1)

resA <- results(dds_1, contrast=c("genotype","DKO","NCR"))
resB <- results(dds_1, contrast=c("genotype","FOXO1KO","NCR"))
resC <- results(dds_1, contrast=c("genotype","GFI1KO","NCR"))
# change the last variable in the sequence to be the control and the second to be the comparitor - this resA here, NCR (control) vs DKO 

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA <- get_lfc_sign(resA, thr=1.1)
resB <- get_lfc_sign(resB, thr=1.1)
resC <- get_lfc_sign(resC, thr=1.1)

resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
resB <- resB[order(resB$pvalue, decreasing=FALSE), ]
resC <- resC[order(resC$pvalue, decreasing=FALSE), ]

table(resA$padj < 0.05, resA$lfc_thr)
table(resB$padj < 0.05, resB$lfc_thr)
table(resC$padj < 0.05, resC$lfc_thr)
 

#MA plot

plotMA(
  resA,
  #padj = 0.05,
  #log2Foldchange = 1.5,
  alpha = 0.5,
  #select.top.method = c("padj", "log2FoldChange"),
  #top = 20,
  #select.top.method = c("padj", "log2FoldChange"),
  main = "M1 NK cells - NCR Vs DKO",
  xlab = "Average log2 counts per million",
  ylab = "Log2 Fold Change",
  ylim = c(-15, 15),
  colNonSig = "black",
  #palette = c("#B31B21", "#1465AC", "darkgray"),
  #colSig = "firebrick1",
  colSig = "dodgerblue",
  returnData = FALSE,
  MLE = FALSE,
  #ggtheme = theme_classic(),
)
```


```{r}
#PCA plot

rld <- rlog(dds_1)
# plotPCA(rld, intgroup = c("genotype"))
# plotPCA(rld, intgroup = c("genotype", "condition"))
my_theme <- theme(panel.grid.major = element_blank(),
                  panel.grid.minor = element_blank(),
                  panel.background = element_rect(fill = "white", color = "black"),
                  panel.border = element_rect(color = "black", fill = NA, linewidth = 0.1))
plotPCA(rld, intgroup = "genotype", ntop = 500, returnData = FALSE) + my_theme + geom_point(size=5) + labs(title = "PCA Plot - M1 Gene Expression Data")
```


```{r}
resA$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resA$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resA$name <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resB files
resB$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resB$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resB$name <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resC files
resC$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resC$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resC$name <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="GENENAME", keytype="ENSEMBL", multiVals="first")


# Add normalized count to the result Table
resAanno<- merge(as.data.frame(resA), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resBanno<- merge(as.data.frame(resB), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resCanno<- merge(as.data.frame(resC), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)

write.csv(resAanno, "M1_NCR_vs_DKO.csv")
write.csv(resBanno, "M1_NCR_vs_FOXO1KO.csv")
write.csv(resCanno, "M1_NCR_vs_GFI1KO.csv")
```

```{r}
## This is just me playing around with a volcano plot first to see lots of genes and then a second one where I've handpicked some genes to map on 

#volcano plot resA
volplot_resA_discovery <- EnhancedVolcano(resA, 
                           lab = resA$symbol, 
                           x = "log2FoldChange", 
                           y = "padj", 
                           border = "full", 
                           borderColour = "black", 
                           gridlines.major = F, 
                           gridlines.minor = F,
                           title = 'M1 - DKO vs NCR',
                           subtitle = bquote(italic('FDR <= 0.05 and absolute FC >= 1.1')),
                              #selectLab = c('Eomes','FOXO1','Gfi1','Id3','Pdcd1','Gzma','Gzmb', 'Tbx21'),
                           pCutoff = 0.00005,
                           FCcutoff = 1.1,
                           pointSize = 3.0,
                           labSize = 3.5,
                           legendPosition = 'none',
                           xlim = c(-10, 15),
                              #legendLabels = c("NS", expression(Log[2] ~ FC), "p-value", expression(p - value ~ and ~ log[2] ~ FC)),
                           #drawConnectors = TRUE, 
                           max.overlaps = 100,
                           lengthConnectors = unit(0.001, "npc"),
                           typeConnectors = 'closed',
                           endsConnectors = "last",
                           widthConnectors = 0.1,
                              #col=c("grey", "green4", "blue", "red3"),
                           colAlpha = 0.5
)
volplot_resA_discovery
ggsave(filename = "Volplot M1 DKO NK cells_search.tiff",
       plot = volplot_resA_discovery, device = "tiff", path = NULL, scale = 1, width = 8, height = 8, units = c("in"), dpi = 600)
```


```{r}
######## second volcano plot 


lab_italics <- paste0("italic('", resA$symbol, "')")
selectLab_italics = paste0(
  "italic('",
  c('Dusp7','Ptpn5','Fcgr3', 'Tcf7','Icos','Vegfc','Gmfg','Cphx1','Gpr15', 'Thy1', 'Nek10', 'Hopx', 'FOXO1', 'Gfi1'),
  "')")

Volplot_M1resA <- EnhancedVolcano(resA,
                lab = lab_italics,
                x = 'log2FoldChange',
                y = 'pvalue',
                selectLab = selectLab_italics,
                xlab = bquote(~Log[2]~ 'fold change'),
                pCutoff = 0.00005,
                FCcutoff = 1.1,
                pointSize = 3.0,
                labSize = 6.0,
                labCol = 'black',
                title = 'M1 - DKO vs NCR',
                subtitle = bquote(italic('FDR <= 0.05 and absolute FC >= 1.1')),
                #border = "full",
                labFace = 'bold',
                boxedLabels = TRUE,
                gridlines.major = F, 
                gridlines.minor = F,
                parseLabels = TRUE,
                col = c('grey30', 'green4', 'blue1', 'red3'),
                colAlpha = 4/5,
                legendPosition = 'bottom',
                legendLabSize = 14,
                legendIconSize = 4.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black') + coord_flip()
Volplot_M1resA
```

```{r}
countDataTOTAL <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.count.csv', header = TRUE, sep = ",", row.names = 1) 
metaDataTOTAL <- read.csv('R://JDBG2022-A6322/Results/Sequencing/First and Second Batch combined analysis/JD.metadata.csv', header = TRUE, sep = ",")

metaDataTOTAL$genotype <- factor(metaDataTOTAL$genotype, levels = c("NCR", "FOXO1KO", "GFI1KO", "DKO"))
metaDataTOTAL$condition <- factor(metaDataTOTAL$condition, levels = c("Imm", "M1", "M2"))

 
######################### within each condition, how does KO compare to WT########################################

dds_meta1 <- metaDataTOTAL %>% filter(condition == "M1")
dds_1 <- DESeqDataSetFromMatrix(countData=countDataTOTAL[,which(colnames(countDataTOTAL) %in% dds_meta1$Geneid)], dds_meta1, design=~genotype)
#dds_1 <- dds_1[rowSums(counts(dds_1)) > 8, ]
dds_1 <- DESeq(dds_1)

resA <- results(dds_1, contrast=c("genotype","DKO","NCR"))
resB <- results(dds_1, contrast=c("genotype","FOXO1KO","NCR"))
resC <- results(dds_1, contrast=c("genotype","GFI1KO","NCR"))
# change the last variable in the sequence to be the control and the second to be the comparitor - this resA here, NCR (control) vs DKO 

  get_lfc_sign <- function(df, thr = 1.1){
  df$lfc_thr <- 0
  df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
  df$lfc_thr[which(df$log2FoldChange <= (-1.1 * thr))] <- -1
  return(df)
}

resA <- get_lfc_sign(resA, thr=1.1)
resB <- get_lfc_sign(resB, thr=1.1)
resC <- get_lfc_sign(resC, thr=1.1)

resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
resB <- resB[order(resB$pvalue, decreasing=FALSE), ]
resC <- resC[order(resC$pvalue, decreasing=FALSE), ]

table(resA$padj < 0.05, resA$lfc_thr)
table(resB$padj < 0.05, resB$lfc_thr)
table(resC$padj < 0.05, resC$lfc_thr)

#Add Entrez, Symbol and gene name to resA files
resA$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resA$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resA$name <- mapIds(org.Mm.eg.db, keys=row.names(resA), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resB files
resB$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resB$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resB$name <- mapIds(org.Mm.eg.db, keys=row.names(resB), column="GENENAME", keytype="ENSEMBL", multiVals="first")

#Add Entrez, Symbol and gene name to resC files
resC$symbol <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="SYMBOL", keytype="ENSEMBL", multiVals="first")
resC$entrez <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="ENTREZID", keytype="ENSEMBL", multiVals="first")
resC$name <- mapIds(org.Mm.eg.db, keys=row.names(resC), column="GENENAME", keytype="ENSEMBL", multiVals="first")


# Add normalized count to the result Table
resA<- merge(as.data.frame(resA), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resB<- merge(as.data.frame(resB), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)
resC<- merge(as.data.frame(resC), as.data.frame(counts(dds_1, normalized=TRUE)), by="row.names", sort=FALSE)


#PCA plot
rld <- rlog(dds_1)
# plotPCA(rld, intgroup = c("genotype"))
# plotPCA(rld, intgroup = c("genotype", "condition"))
my_theme <- theme(panel.grid.major = element_blank(),
                  panel.grid.minor = element_blank(),
                  panel.background = element_rect(fill = "white", color = "black"),
                  panel.border = element_rect(color = "black", fill = NA, linewidth = 0.1))
plotPCA(rld, intgroup = "genotype", ntop = 500, returnData = FALSE) + my_theme + geom_point(size=5) + labs(title = "PCA Plot - M1 Gene Expression Data")

resASig <- resA[which(resA$padj < 0.05 & abs(resA$log2FoldChange) >= 1.1 & resA$baseMean >= 20), ]
resBSig <- resB[which(resB$padj < 0.05 & abs(resB$log2FoldChange) >= 1.1 & resB$baseMean >= 20), ]
resCSig <- resC[which(resC$padj < 0.05 & abs(resC$log2FoldChange) >= 1.1 & resC$baseMean >= 20), ]

mat <- assay(rld)
idA <- resASig$Row.names
DEgenes <- mat[idA,]
colData(rld)

annotation <- data.frame(genotype=colData(rld)$genotype)
genotype_colors <- list("genotype"=c("NCR" = "grey65", "FOXO1KO" = "goldenrod1", "GFI1KO" = "indianred1", "DKO" = "slateblue1"))

bwr_99 <- colorRampPalette(c("cyan", "black", "yellow"))(n = 99)

#Dendrogram
pheatmap(DEgenes, 
         scale = "row", 
         show_rownames = F,
         show_colnames = F,
         color = bwr_99,
         clustering_distance_rows = "correlation", 
         annotation_col = annotation, 
         annotation_colors = genotype_colors,
         treeheight_row = 30,
         treeheight_col = 15,
         main="M1 DEG")
```

```{r}
#TOP 40 DEG

orderSigresA <- resASig %>% filter(padj < 0.05 & abs(log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))
orderSigresB <- resBSig %>% filter(padj < 0.05 & abs(log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))
orderSigresC <- resCSig %>% filter(padj < 0.05 & abs(log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))

write.csv(orderSigresA, "M1_NCR_vs_DKO0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(orderSigresB, "M1_NCR_vs_FOXO1KO0.05pAdj1.1L2FC20BaseMean.csv")
write.csv(orderSigresC, "M1_NCR_vs_GFI1KO0.05pAdj1.1L2FC20BaseMean.csv")


id1 <- orderSigresA$Row.names
id2 <- orderSigresA$symbol
topDE <- mat[id1, dds_meta1 %>% filter(genotype %in% c("DKO", "NCR")) %>% select(Geneid) %>% unlist %>% as.character]
rownames(topDE) <- id2
top40DE <- head(topDE, n=40)
#bwr_90 <- colorRampPalette(c("blue", "white", "red"))(n = 99)


#Top 40 DE genes from dds_1
pheatmap(top40DE, 
         scale = "row", 
         show_rownames = T,
         show_colnames = F,
         clustering_distance_rows = "correlation",
         annotation_colors = genotype_colors,
         border_color = NA,
         cellheight = 12,
         legend = T,
        #angle_col = c("270", "0", "45", "90", "315")
         cellwidth = 12,
         treeheight_col = 5,
         treeheight_row = 5,
         annotation_col = dds_meta1 %>% filter(genotype %in% c("DKO", "NCR")) %>% select(genotype), 
         #main="DKO - Top 40 M1 DEGs", 
         color = bwr_99)
```

```{r}
# Candidate gene heatmaps

genesEff <- c("Lag3", "Batf3", "Tnfsf8", "Gzmm", "Il21r", "Itk", "Tnfsf10", "Tlr3", "Foxo1", "Socs1", "Cd2ap", "Gzmk", "Il18r1", "Il12rb2", "Zeb2", "Prf1", "Gmzb", "Ifng", "Ccl4", "Sh2d1b1")
genesDevDiff <- c("Lef1", "Gata6", "Hhat", "Eomes", "Runx2", "Sox4", "Id3", "Tox", "Il4ra", "Dtx1", "Cd27", "Tcf7", "Klrg1", "Tbx21", "Spi1", "Id2")
genesCellSurface <- c("Kit", "Thy1", "Klra1", "Klra3", "Klra4", "Klra8", "Klra9", "Klra17", "Klrb1a", "Klrc1", "Klrc3", "Klrd1", "Ncr1", "Cd244a", "Klrg1", "Cd226", "Il2rb", "Klrb1c", "Fcgr3", "Sell", "Cd27", "H2-Ab1", "Cd86", "Cd69", "Klrd1", "Lair1", "Lag3", "Nkrp1", "Ccr5", "Ccr7", "Cxcr1", "Cxcr2", "Cxcr3", "Cxcr4", "Cx3cr1", "S1pr5", "Ccr2", "Il2ra", "Il12rb1", "Il15ra", "Il18r1", "Fcgr1", "Fcgr2a", "Fcgr2b", "Fcgr2c", "Fcgr3")
genesMetabolism <- c("Ppargc1a", "Ppara", "Ppard", "Pparg", "Sirt1", "Foxo1", "Foxo3", "Foxo4", "Acox1", "Acadm", "Acads", "Acsl1", "Acsl3", "Acsl4", "Acsl5", "Acsl6", "Akr1b8", "Akr1b3", "Aldh1a1", "Aldh2", "Aldh3a1", "Aldh4a1", "Aldh5a1", "Cpt1a", "Cpt1b", "Cpt2", "Fasn", "Fabp1", "Fabp2", "Fabp3", "Fabp4", "Fabp5", "Glut1", "Glut2", "Glut3", "Glut4", "G6pd", "Pck1", "Pck2", "Hk1", "Hk2", "Gck", "Pdk1", "Pdk2", "Pdk3")
genesProliferationCellSurvival <- c("Klf2", "Bbc3", "Klf4", "Il7r", "Bcl2a1b", "Bag3", "Bcl2", "Bcl2l1",  "Bcl2l11", "Mcl1", "Bax", "Bad", "Bid", "Bak1", "Casp3", "Casp8", "Casp9", "Xiap", "Survivin", "Pten", "Akt1", "Akt2", "Akt3", "Mtor", "Rb1", "E2f1", "E2f", "E2f3", "Tp53", "Cdkn1a", "Cdkn1b", "Cdkn2a", "Cdkn2b", "Nfkb1", "Nfkb2", "Rela", "Relb", "Tnf", "Tnfrsf1a", "Tnfrsf1b", "Mapk1", "Mapk3",  "Mapk8", "Mapk9", "Mapk14", "Mapk12", "Mapk13", "Map3k1", "Map3k5", "Map3k7")
genesTranscriptionFactors <- c("Gfi1", "Gfi1b", "Eomes", "Tbx21", "Nfil3", "Id2", "Nfkbia", "Nfkbid", "Nfkbiz", "Spi1", "Nr4a1", "Ncr1", "Tcf7", "Hobit", "Blimp1", "Gata3", "Il7r", "Tox", "Tox2", "Zbtb16", "Rora", "Ets1", "Eomes", "Tox2", "Tbx21", "Zeb2", "Runx3", "Id2", "Nfil3", "Tox", "Gata3", "Prdm1", "Batf", "Batf3", "Irf4", "Irf8", "Stat4", "Stat5", "c-Myc", "Notch1", "Ttf1", "Nkx3-1", "Gfi1", "Gfi1b", "Foxo1", "Foxo3", "Foxo4", "Spi1", "Spib", "Spic", "Ets1", "Ets2", "Pax5", "Pax8")
genesMigration <- c("Cxcr3", "Ccr5", "Ccr2", "Gpr183", "Ccr7", "Itgb3", "Cd63", "Itgad", "Gpr155", "Itgb5", "Ccr10", "Cxcr4", "Sell", "Itga4", "Itga6", "Itga1", "S1pr5", "Cxcr3", "Cx3cr1", "Ccr7", "S1pr5", "Cxcr4", "Cxcr6", "Ccr5", "Ccr9", "Ccr2", "Ccr6", "Itga4", "Itgb1", "Itgb2", "ItgaL", "ItgaM", "Itga1", "ItgaE", "ItgaD", "Itgb7", "ItgaE", "Selp", "Sele", "Sell", "Cldn1", "Cldn2", "Cldn3", "Cldn4", "Cldn5", "Cldn7", "Cldn8", "Cldn9", "Cldn10", "Cldn11", "Cldn12", "Cldn15", "Cldn16", "Cldn17", "Cldn18", "Cldn19", "Cldn20")
genesActivationInhibition <- c("Klra8", "Klra7", "Klra4", "Klra3", "Klra1", "Klrd1", "Cd94", "Lair-1", "Ptpn6", "Shp1", "Cish", "Cd244a", "Cd38", "Il6st", "Klrc1", "Cd101", "Hmbox1", "Klra3", "Klra1", "Cd226", "Cd96", "Cd200r1", "Ahr", "Cd72", "Tigit", "Cd69", "Ncr", "Klrb1c", "Klrb1b", "Cd244", "Ly49h", "Klrk1", "Cd226", "Cd25", "Il2rb", "Ifngr1", "Ifnar1", "Stat4", "Stat5", "Stat1", "Prf1", "Gzma", "Gzmb", "Gzmk", "Gzmm", "Tnfsf10", "Fasl", "Fas", "Ifng")
genesCytokineChemokine <- c("Ifng", "Tnf", "Csf2", "Il10", "Il2", "Il12", "Il15", "Il18", "Il22", "Il27", "Il33", "Cxcl1", "Cxcl2", "Cxcl3", "Cxcl4", "Cxcl5", "Cxcl9", "Cxcl10", "Cxcl11", "Cxcl12", "Cxcl13", "Cxcl16", "Ccl1", "Ccl2", "Ccl3", "Ccl4", "Ccl5", "Ccl6", "Ccl7", "Ccl8", "Ccl9", "Ccl10", "Ccl11", "Ccl12", "Ccl17", "Ccl19", "Ccl20", "Ccl21", "Ccl22", "Ccl25", "Ccl27", "Ccl28")
genesProteinValidation <- c("Itga2", "Itgam", "Thy1", "Thy1.2", "Klrb1c", "Nkrp1c", "Kit", "Tbx21", "Eomes", "Foxo1", "Ncr1", "Ncr", "Cxcr3", "Cx3cr1", "Tox", "Klrg1", "Sell", "Havcr2", "Il7r", "Icos", "Tcf7", "Il18r1", "Cd226", "Cd38", "Lag3")


plot_genes_heatmap <- function(resA, expression_mat, dds_meta1, keep_genes, keep_groups = c("NCR", "FOXO1KO", "DKO", "GFI1KO"), padj_cutoff = 0.05, log2FoldChange_cutoff = 1.1, baseMean_cutoff = 20, ...) {
  resA$significant <- ifelse(resA$padj < padj_cutoff & abs(resA$log2FoldChange) >= log2FoldChange_cutoff & resA$baseMean >= baseMean_cutoff & !is.na(resA$symbol), "yes", "no")
  df1 <- data.frame(expression_mat, "Row.names" = row.names(expression_mat))
  df2 <- data.frame("Row.names" = resA$Row.names, "significant" = resA$significant, "genes" = resA$symbol)
  mat <- dplyr::full_join(df1, df2, by = "Row.names")  #df3, df4,
  mat <- mat[, -which(colnames(mat) == "Row.names")]
  mat <- mat %>% filter(genes %in% keep_genes)
  row.names(mat) <- mat[, ncol(mat)]
  mat <- mat[, -ncol(mat)]
  mat <- mat[rowSums(mat[, -ncol(mat)]) > 0, ]
  row_anno <- mat[, "significant", drop = F]
  mat <- mat[, dds_meta1 %>% filter(genotype %in% keep_groups) %>% dplyr::select(Geneid) %>% unlist() %>% as.character()]
  col_anno <- dds_meta1 %>%
    filter(genotype %in% keep_groups) %>%
    dplyr::select(genotype)
  row.names(col_anno) <- dds_meta1 %>%
    filter(genotype %in% keep_groups) %>%
    dplyr::select(Geneid) %>%
    unlist() %>%
    as.character()
  
  genotype_colors <- list("genotype" = c("NCR" = "grey65", "FOXO1KO" = "goldenrod1", "GFI1KO" = "indianred1", "DKO" = "slateblue1"))
  sig_col <- list("significant" = c("yes" = "#e15179", "no" = "#e7e7e7"))
  pheatmap(mat,
           annotation_colors = c(genotype_colors, sig_col),
           annotation_col = col_anno,
           annotation_row = row_anno,
           ...
  )
}

plot_genes_heatmap(resA, assay(rld), dds_meta1, genesProteinValidation, keep_groups = c("NCR", "FOXO1KO", "DKO", "GFI1KO"),
                   show_rownames = T,
                   cluster_cols = T,
                   show_colnames = F,
                   cluster_rows = T,
                   scale="row",
                   #border_color = NA,
                   treeheight_row = 10,
                   clustering_distance_rows = "correlation",
                   color = bwr_99)
 
```

```{r}
##VENN diagram DEG

set1 <- resASig$symbol
set2 <- resBSig$symbol
set3 <- resCSig$symbol
colors <- c("slateblue1", "goldenrod1", "indianred1")

set1<-na.omit(set1)
set2<-na.omit(set2)
set3<-na.omit(set3)

venn.diagram(
  x = list(set1, set2, set3),
  category.names = c("DKO", "FOXO1KO", "GFI1KO"),
  filename = 'DEG M1 VennDiagram.png',
  output = TRUE,
  imagetype = "png",
  scaled = FALSE,
  col = "black",
  fill = colors,
  cat.col = colors,
  cat.cex = 1.5,   # Adjust the font size to make the labels smaller
  cat.pos = 0,     # Move the labels outside the circles
  cat.dist = c(0.05, 0.05, -0.45),  # Adjust the distance for each category label individually
  margin = 0.15
)
```

```{r}
#GO BP analysis 

#BP
orderSigresA_GO_UP <- resASig %>% filter(padj < 0.05 & (log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))
orderSigresA_GO_DOWN <- resASig %>% filter(padj < 0.05 & (log2FoldChange) <= -1.1 & baseMean >= 20 & !is.na(symbol))


egoBP_UPresA <- enrichGO(gene         = orderSigresA_GO_UP$entrez,
                 OrgDb         = org.Mm.eg.db,
                 keyType       = 'ENTREZID',
                 ont           = "BP",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.05,
                 )
head(egoBP_UPresA)
dotplot(egoBP_UPresA, showCategory = 15)

egoBP_DOWNresA <- enrichGO(gene         = orderSigresA_GO_DOWN$entrez,
                     OrgDb         = org.Mm.eg.db,
                     keyType       = 'ENTREZID',
                     ont           = "BP",
                     pAdjustMethod = "BH",
                     pvalueCutoff  = 0.05,
)
head(egoBP_DOWNresA)
dotplot(egoBP_DOWNresA, showCategory = 15)

write.csv(egoBP_UPresA, "M1_NCR_vs_DKO_GOBPup.csv")
write.csv(egoBP_DOWNresA, "M1_NCR_vs_DKO_GOBPdown.csv")
```


```{r}
#KEGG pathway analysis

orderSigresA_GO_UP <- resASig %>% filter(padj < 0.05 & (log2FoldChange) >= 1.1 & baseMean >= 20 & !is.na(symbol))

KEGGup <- enrichKEGG(gene         = orderSigresA_GO_UP$entrez,
                 organism     = 'mmu',
                 keyType = "ncbi-geneid",
                 pvalueCutoff = 1,
                 qvalueCutoff = 1)
dotplot(KEGGup, showCategory = 10)



orderSigresA_GO_DOWN <- resASig %>% filter(padj < 0.05 & (log2FoldChange) <= -1.1 & baseMean >= 20 & !is.na(symbol))

KEGGdown <- enrichKEGG(gene         = orderSigresA_GO_DOWN$entrez,
                 organism     = 'mmu',
                 keyType = "ncbi-geneid",
                 pvalueCutoff = 1,
                 qvalueCutoff = 1)
dotplot(KEGGdown, showCategory = 10)

write.csv(KEGGup, "M1_NCR_vs_DKO_KEGGup.csv")
write.csv(KEGGdown, "M1_NCR_vs_DKO_KEGGdown.csv")
```

```{r}
#GSEA

#change log2foldchange to negative values to see underexpression?
#DKO
tmp <- resA[!is.na(resA$symbol), ]
tmp <- tmp[order(tmp$stat, decreasing = T), ]
testgenes <- tmp$stat
names(testgenes) <- tmp$symbol
testgenes <- testgenes[!is.na(testgenes)]

gene_sets <- msigdbr(species = "mouse", category = "H")

gene_sets <- gene_sets %>%
  dplyr::select(gs_name, gene_symbol)

gsearesA <- GSEA(geneList = testgenes,
                TERM2GENE = gene_sets)

dotplot(gsearesA)

View(gsearesA)
dfgsearesA <- as.data.frame(gsearesA)



#attempting to overlap dfgsearesA/B/C

# str(dfgsearesA)
# str(dfgsearesB)
# str(dfgsearesC)
# 
# celltypes = c('Imm', 'M1', 'M2')
# groups = c("NCR", "FOXO1KO", "GFI1KO", "DKO")



#adding a score for GSEA
resdf2 <- resA %>%
  arrange(padj) %>%
  mutate(gsea_metric = -log10(padj) * sign(log2FoldChange))

resdf2 <- resdf2 %>%
  mutate(padj = case_when(padj == 0 ~ .Machine$double.xmin,
                          TRUE ~ padj)) %>%
  mutate(gsea_metric = -log10(padj) * sign(log2FoldChange))
resdf2

resdf2 <- resdf2 %>%
  filter(! is.na(gsea_metric)) %>%
  arrange(desc(gsea_metric))
View(resdf2)

hist(resdf2$gsea_metric, breaks = 100)

ranks <- resdf2 %>%
  select(symbol, gsea_metric) %>%
  distinct(symbol, .keep_all = TRUE) %>%
  deframe()

head(ranks)

gseares <- GSEA(geneList = ranks,
                TERM2GENE = gene_sets)
gsearesdf <- as.data.frame(gseares)

dotplot(gseares)

View(gsearesdf)
 
```
```{r}
# # compared to Wt, how does KO change in condition
# ############################################################################################################
# You helped write this initial code 2 months ago - I haven't fully thought through how I would present data that look at IMM vs M1 vs M2 in each KO - maybe this comparison could be useful for looking at how pathways or key transcription factors change as the cells develop from IMM into M1 into M2

# # FOXO1KO vs WT and Imm vs M1
# dds_meta1 <- metaDataTOTAL %>% filter((genotype %in% c("NCR", "FOXO1KO")) & (condition %in% c("Imm", "M1")))
# dds_1 <- DESeqDataSetFromMatrix(countData=countDataTOTAL[,which(colnames(countDataTOTAL) %in% dds_meta1$Geneid)], dds_meta1, design= ~ 0 + condition + genotype + genotype:condition)
# dds_1 <- dds_1[rowSums(counts(dds_1)) > 5, ]
# dds_1 <- DESeq(dds_1)
# 
# resultsNames(dds_1)
# resA <- results(dds_1, name = 'conditionM1.genotypeFOXO1KO')
# 
# get_lfc_sign <- function(df, thr = 1){
#   df$lfc_thr <- 0
#   df$lfc_thr[which(df$log2FoldChange >= thr)] <- 1
#   df$lfc_thr[which(df$log2FoldChange <= (-1 * thr))] <- -1
#   return(df)
# }
# 
# resA <- get_lfc_sign(resA, thr=1)
# 
# resA <- resA[order(resA$pvalue, decreasing=FALSE), ]
# 
# table(resA$padj < 0.05, resA$lfc_thr)
# 
# write.csv(resA, "filenames.csv")
```


Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
