Lipid mediators are signaling molecules derived from fatty acids that regulate inflammation and immune responses. By binding to cell-surface receptors, lipid mediators can regulate transcription and alter methylation. The goal of this study is to perform a cross-sectional analysis on patients who tested positive for the cyclic citrullinated peptide antibody (CCP) and determine whether specific CpG sites are associated with individual lipid mediators.
47 participants have both DNAm and lipid mediator information for at least 1 visit and up to 3 visits across 3 cell types. Initial quality control included adjusting for batch and implementing a range filter on our CpGs from Illumina’s EPICv1 BeadChip array.
For each of the 26 lipid mediators (LMs), linear models were fit with LM as the predictor of interest and m-value of CpG as the outcome, while adjusting for age, sex, and batch. Batch (binary: in batch 1 or other) was a covariate in the model based on PCA plots that suggested a cell-specific batch effect (2.1). A separate model was run for each of the three cell types, Bcell, Tmem, and Tnai. Using the p-values from this probe-level analysis, probes were ranked from smallest p-value to largest. A set-enrichment analysis was completed using the methylRRA() function from the methylGSA R package. All defaults were used except for the minimum and maximum number of genes for a gene set. The minimum number of genes that a gene set must contain is 10 and the maximum number is 1000. These parameters were the same for both KEGG and GO (3.1,3.2). To correct for multiple testing the Benjamini and Hochberg False Discovery Rate (FDR) was used, and the significance cut-off was FDR < 0.05.
CpG and Sample Size Summary
Below is a file that contains information about sample size and number of CpGs for the dataset used in this analysis. We started with 865,859 probes and between 21 and 23 percent of probes were filtered out due to low quality. We also excluded sex chromosomes for this analysis. We used 47 samples in our model, which was the same across Bcells, Tmem, and Tnai cells. These 47 samples are in the TIPRA cohort and are all CCP positive.
Patient Demographics
Below is a summary of several clinical variables for visit 1, which is the group we are using. The dataset is mostly non-Hispanic white females with an average age of 58.47.
Below is a table showing how CpGs intersect between cell types after a FDR filter of 0.05. Most CpGs are in only one cell type but there are some that are present in all 3 cell types and a small number that are present in two of the three cell types.
Below is a Venn diagram of all significant DMRs and how they intersect between Bcells, Tmem, and Tnai. Most of the significant DMRs were present in all three cell types and Bcells had the most significant DMRs that were not shared between cell types.
Below is an UpSet plot comparing how the significant DMRs intersect between lipid mediators. There is very little overlap.
Below is an UpSet plot comparing how the significant DMRs intersect between mediator pathways. There is very little overlap.
The tables below shows the genes associated with each significant DMR as well as the direction of methylation for Bcell, Tnai, and Tmem cell types. All significant DMRs in have a SIDAK p-value below 0.05. DMRs with a single probe were also filtered out.
Bcell_estimate <- read_csv("/Users/eckco/Desktop/Norris_May_Update/DMR/All_Bcell_DMR_dataset_with_estimate.csv", show_col_types = FALSE)
# Exclude the first column
Bcell_estimate <- Bcell_estimate[, -1]
# Remove extra rows and columns
Bcell_estimate <- Bcell_estimate %>%
select(-c("CpG","fdr"))
# Keep only one row per DMR
# Keep only one row per DMR using base R unique
#Bcell_estimate <- unique(Bcell_estimate, by = "DMR_Names")
# Base R approach to keep only one row per unique DMR_Names
Bcell_estimate <- Bcell_estimate[!duplicated(Bcell_estimate$DMR_Names), ]
names(Bcell_estimate)[names(Bcell_estimate) =="sidak"] <- "SIDAK"
Bcell_estimate$SIDAK <- formatC(Bcell_estimate$SIDAK, format = "e", digits = 2)
# Rename Total column
Bcell_estimate <- Bcell_estimate %>%
rename(numProbes = total)
full_names <- read.csv("/Users/eckco/Desktop/Norris_May_Update/finalLMlist_wParentFAInfo_updatedEnzymeInfo_Jan2024.csv")
# Prepare the full_names dataset by selecting Analyte and choosing between Pathway and Pathway2
full_names_pathway <- full_names %>%
mutate(pathway = ifelse(is.na(Pathway) | Pathway == "", Pathway2, Pathway)) %>%
select(Analyte, pathway)
# Merge in the pathway data using Analyte = LM
Bcell_estimate <- Bcell_estimate %>%
left_join(full_names_pathway, by = c("LM" = "Analyte"))
names(Bcell_estimate)[names(Bcell_estimate) == "pathway"] <- "Pathway"
Bcell_estimate <- Bcell_estimate[, c(5,11,1,2,3,4,6,8,9,10)]
Bcell_estimate <- Bcell_estimate[, c(7,1,2,3,4,5,6,8,9,10)]
library(dplyr)
Bcell_estimate <- Bcell_estimate %>%
mutate(UCSC_RefGene_Name = sapply(strsplit(UCSC_RefGene_Name, ";"), function(genes) {
unique_genes <- unique(trimws(genes))
paste(unique_genes, collapse = ";")
}))%>%
filter(!is.na(UCSC_RefGene_Name) & UCSC_RefGene_Name != "")
datatable(
Bcell_estimate,
rownames = FALSE,
options = list(
pageLength = 10,
dom = 'Bfrtip', # B: Buttons, f: filter, r: processing, t: table, i: info, p: pagination
buttons = c('copy', 'csv', 'excel', 'pdf')
),
extensions = 'Buttons',
caption = "Click to explore or download the full table"
)
Number of unique genes for Bcells:
71
Tmem_estimate <- read_csv("/Users/eckco/Desktop/Norris_May_Update/DMR/All_Tmem_DMR_dataset_with_estimate.csv", show_col_types = FALSE)
# Exclude the first column
Tmem_estimate <- Tmem_estimate[, -1]
# Remove extra rows and columns
Tmem_estimate <- Tmem_estimate %>%
select(-c("CpG","fdr"))
# Keep only one row per DMR
# Keep only one row per DMR using base R unique
#Tmem_estimate <- unique(Tmem_estimate, by = "DMR_Names")
# Base R approach to keep only one row per unique DMR_Names
Tmem_estimate <- Tmem_estimate[!duplicated(Tmem_estimate$DMR_Names), ]
names(Tmem_estimate)[names(Tmem_estimate) =="sidak"] <- "SIDAK"
Tmem_estimate$SIDAK <- formatC(Tmem_estimate$SIDAK, format = "e", digits = 2)
# Rename Total column
Tmem_estimate <- Tmem_estimate %>%
rename(numProbes = total)
full_names <- read.csv("/Users/eckco/Desktop/Norris_May_Update/finalLMlist_wParentFAInfo_updatedEnzymeInfo_Jan2024.csv")
# Prepare the full_names dataset by selecting Analyte and choosing between Pathway and Pathway2
full_names_pathway <- full_names %>%
mutate(pathway = ifelse(is.na(Pathway) | Pathway == "", Pathway2, Pathway)) %>%
select(Analyte, pathway)
# Merge in the pathway data using Analyte = LM
Tmem_estimate <- Tmem_estimate %>%
left_join(full_names_pathway, by = c("LM" = "Analyte"))
names(Tmem_estimate)[names(Tmem_estimate) == "pathway"] <- "Pathway"
Tmem_estimate <- Tmem_estimate[, c(5,11,1,2,3,4,6,8,9,10)]
Tmem_estimate <- Tmem_estimate[, c(7,1,2,3,4,5,6,8,9,10)]
library(dplyr)
Tmem_estimate <- Tmem_estimate %>%
mutate(UCSC_RefGene_Name = sapply(strsplit(UCSC_RefGene_Name, ";"), function(genes) {
unique_genes <- unique(trimws(genes))
paste(unique_genes, collapse = ";")
}))%>%
filter(!is.na(UCSC_RefGene_Name) & UCSC_RefGene_Name != "")
datatable(
Tmem_estimate,
rownames = FALSE,
options = list(
pageLength = 10,
dom = 'Bfrtip', # B: Buttons, f: filter, r: processing, t: table, i: info, p: pagination
buttons = c('copy', 'csv', 'excel', 'pdf')
),
extensions = 'Buttons',
caption = "Click to explore or download the full table"
)
Number of unique genes for Tmem:
67
Tnai_estimate <- read_csv("/Users/eckco/Desktop/Norris_May_Update/DMR/All_Tnai_DMR_dataset_with_estimate.csv", show_col_types = FALSE)
# Exclude the first column
Tnai_estimate <- Tnai_estimate[, -1]
# Remove extra rows and columns
Tnai_estimate <- Tnai_estimate %>%
select(-c("CpG","fdr"))
# Keep only one row per DMR
# Keep only one row per DMR using base R unique
#Tnai_estimate <- unique(Tnai_estimate, by = "DMR_Names")
# Base R approach to keep only one row per unique DMR_Names
Tnai_estimate <- Tnai_estimate[!duplicated(Tnai_estimate$DMR_Names), ]
names(Tnai_estimate)[names(Tnai_estimate) =="sidak"] <- "SIDAK"
Tnai_estimate$SIDAK <- formatC(Tnai_estimate$SIDAK, format = "e", digits = 2)
# Rename Total column
Tnai_estimate <- Tnai_estimate %>%
rename(numProbes = total)
full_names <- read.csv("/Users/eckco/Desktop/Norris_May_Update/finalLMlist_wParentFAInfo_updatedEnzymeInfo_Jan2024.csv")
# Prepare the full_names dataset by selecting Analyte and choosing between Pathway and Pathway2
full_names_pathway <- full_names %>%
mutate(pathway = ifelse(is.na(Pathway) | Pathway == "", Pathway2, Pathway)) %>%
select(Analyte, pathway)
# Merge in the pathway data using Analyte = LM
Tnai_estimate <- Tnai_estimate %>%
left_join(full_names_pathway, by = c("LM" = "Analyte"))
names(Tnai_estimate)[names(Tnai_estimate) == "pathway"] <- "Pathway"
Tnai_estimate <- Tnai_estimate[, c(5,11,1,2,3,4,6,8,9,10)]
Tnai_estimate <- Tnai_estimate[, c(7,1,2,3,4,5,6,8,9,10)]
library(dplyr)
Tnai_estimate <- Tnai_estimate %>%
mutate(UCSC_RefGene_Name = sapply(strsplit(UCSC_RefGene_Name, ";"), function(genes) {
unique_genes <- unique(trimws(genes))
paste(unique_genes, collapse = ";")
}))%>%
filter(!is.na(UCSC_RefGene_Name) & UCSC_RefGene_Name != "")
datatable(
Tnai_estimate,
rownames = FALSE,
options = list(
pageLength = 10,
dom = 'Bfrtip', # B: Buttons, f: filter, r: processing, t: table, i: info, p: pagination
buttons = c('copy', 'csv', 'excel', 'pdf')
),
extensions = 'Buttons',
caption = "Click to explore or download the full table"
)
Number of unique genes for Tnai:
61
Below is a Venn diagram comparing the intersecting pathways using KEGG across cell types.
Below is a Venn diagram comparing the intersection of pathways with the top 10 lowest padj values across cell types using KEGG.
Below is a table comparing the intersecting KEGG pathways that are present in all 3 cell types. There are 35 total pathways.
Below is a Venn diagram comparing the intersecting pathways using GO ontology profiles across cell types.
Below is a Venn diagram comparing the intersection of pathways with the top 10 lowest padj values across cell types using GO.
1.1 Code for pathway intersection plots
# # Define your directory path
# input_dir <- "/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tnai/Tnai_GO_Results/Sig_Results"
#
# # List all CSV files in the directory
# file_list <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)
#
# # Function to read each file and add LM column
# read_and_label <- function(file_path) {
# filename <- basename(file_path)
# lm <- str_extract(filename, "LM[^_\\.]+")
# df <- read_csv(file_path, col_types = cols(.default = "c"))
# df$LM <- lm
# return(df)
# }
#
#
# # Apply the function to all files and bind them
# combined_df <- file_list %>%
# lapply(read_and_label) %>%
# bind_rows()
#
# length(combined_df)
# combined_df <- combined_df[, c(10,1,2,3,4,5,6,7,8,9)]
# # View the result
# print(head(combined_df))
#write_csv(combined_df, "Tnai_Merged_GO_Results.csv")
Bcell_Kegg <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Bcell/Bcell_Kegg_Results/More_Pathways_Sig_Data/Sig_Results_More_Pathways/Bcell_Merged_KEGG_Results.csv")
head(Bcell_Kegg)
Bcell_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Bcell/Bcell_GO_Results/Sig_Results/Merged_GO_Results.csv")
head(Bcell_GO)
Tmem_Kegg <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tmem/Tmem_KEGG_Results/Significant_Results/Tmem_KEGG_combined_Significant_Pathways.csv")
head(Tmem_Kegg)
Tmem_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tmem/Tmem_GO_Results/Sig_Results/Tmem_Merged_GO_Results.csv")
head(Tmem_GO)
Tnai_KEGG <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tnai/Tnai_KEGG_Results/Sig_Results/Tnai_Merged_KEGG_Results.csv")
head(Tnai_KEGG)
Tnai_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tnai/Tnai_GO_Results/Sig_Results/Tnai_Merged_GO_Results.csv")
head(Tnai_GO)
############################################
# --------------------------
# Load the KEGG datasets
# --------------------------
Bcell_Kegg <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Bcell/Bcell_Kegg_Results/More_Pathways_Sig_Data/Sig_Results_More_Pathways/Bcell_Merged_KEGG_Results.csv")
Tmem_Kegg <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tmem/Tmem_KEGG_Results/Significant_Results/Tmem_KEGG_combined_Significant_Pathways.csv")
Tnai_KEGG <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tnai/Tnai_KEGG_Results/Sig_Results/Tnai_Merged_KEGG_Results.csv")
# --------------------------
# Filter by adjusted p-value < 0.05
# --------------------------
Bcell_filtered <- Bcell_Kegg %>% filter(padj < 0.05)
Tmem_filtered <- Tmem_Kegg %>% filter(padj < 0.05)
Tnai_filtered <- Tnai_KEGG %>% filter(padj < 0.05)
# --------------------------
# Venn Diagram by ID
# --------------------------
# Extract IDs
venn_list <- list(
Bcell = Bcell_filtered$ID,
Tmem = Tmem_filtered$ID,
Tnai = Tnai_filtered$ID
)
# Create improved Venn diagram
venn.plot <- venn.diagram(
x = venn_list,
category.names = c("Bcell", "Tmem", "Tnai"),
filename = NULL, # Keep in R session
fill = c("skyblue", "lightgreen", "salmon"),
alpha = 0.5,
cex = 1.8,
fontfamily = "sans",
cat.cex = 1.5,
cat.fontfamily = "sans",
cat.pos = c(-20, 20, 0), # Adjust label positions
cat.dist = c(0.06, 0.06, 0.03),
margin = 0.2,
scaled = FALSE
)
# Plot it to screen
grid.newpage()
grid.draw(venn.plot)
png("KEGG_Venn_Diagram.png", width = 1600, height = 1200, res = 200)
grid.newpage()
grid.draw(venn.plot)
grid.text("KEGG Pathway Intersections", x = 0.5, y = 0.95, gp = gpar(fontsize = 20, fontface = "bold"))
dev.off()
# --------------------------
# 2. UpSet Plot by Description
# --------------------------
# Get top 10 unique Descriptions from each
top_desc <- function(df, label) {
df %>%
filter(!is.na(Description)) %>%
distinct(Description, .keep_all = TRUE) %>%
top_n(0, wt = -padj) %>%
pull(Description)
}
Bcell_top <- top_desc(Bcell_filtered, "Bcell")
Tmem_top <- top_desc(Tmem_filtered, "Tmem")
Tnai_top <- top_desc(Tnai_filtered, "Tnai")
# Create a binary matrix for UpSet plot
all_descriptions <- unique(c(Bcell_top, Tmem_top, Tnai_top))
upset_df <- data.frame(
Description = all_descriptions,
Bcell = all_descriptions %in% Bcell_top,
Tmem = all_descriptions %in% Tmem_top,
Tnai = all_descriptions %in% Tnai_top
)
# Convert logical to integer for UpSetR
upset_df[, -1] <- lapply(upset_df[, -1], as.integer)
rownames(upset_df) <- upset_df$Description
upset_df <- upset_df[, -1]
png("KEGG_Upset_Plot.png", width = 1600, height = 1200, res = 200)
# Plot UpSet
upset(upset_df, sets = c("Bcell", "Tmem", "Tnai"),
keep.order = TRUE,
order.by = "freq",
mainbar.y.label = "Intersection Size",
sets.x.label = "Top 10 lowest padj values per Cell Type")
#grid.text("KEGG Top Pathway Intersections", x = 0.5, y = 0.95, gp = gpar(fontsize = 20, fontface = "bold"))
dev.off()
####################################################
# Load required libraries
library(dplyr)
library(VennDiagram)
library(grid)
library(UpSetR)
####################################################
# --------------------------
# Load the GO datasets
# --------------------------
Bcell_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Bcell/Bcell_GO_Results/Sig_Results/Merged_GO_Results.csv")
Tmem_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tmem/Tmem_GO_Results/Sig_Results/Tmem_Merged_GO_Results.csv")
Tnai_GO <- read.csv("/Users/eckco/Desktop/Norris_Lab/Modeling_Results/DMR/Pathway_Results/Tnai/Tnai_GO_Results/Sig_Results/Tnai_Merged_GO_Results.csv")
head(Tnai_GO)
# --------------------------
# Filter by adjusted p-value < 0.05
# --------------------------
Bcell_filtered <- Bcell_GO %>% filter(padj < 0.05)
Tmem_filtered <- Tmem_GO %>% filter(padj < 0.05)
Tnai_filtered <- Tnai_GO %>% filter(padj < 0.05)
# --------------------------
# Venn Diagram by ID
# --------------------------
venn_list <- list(
Bcell = Bcell_filtered$ID,
Tmem = Tmem_filtered$ID,
Tnai = Tnai_filtered$ID
)
venn.plot <- venn.diagram(
x = venn_list,
category.names = c("Bcell", "Tmem", "Tnai"),
filename = NULL,
fill = c("skyblue", "lightgreen", "salmon"),
alpha = 0.5,
cex = 1.8,
fontfamily = "sans",
cat.cex = 1.8,
cat.fontfamily = "sans",
cat.pos = c(-15, 15, 180), # Bcell left, Tmem right, Tnai bottom
cat.dist = c(0.08, 0.08, 0.10), # push labels out
margin = 0.2,
scaled = FALSE
)
# Save Venn diagram to file
png("GO_Venn_Diagram.png", width = 1600, height = 1200, res = 200)
grid.newpage()
grid.draw(venn.plot)
grid.text("GO Pathway Intersections", x = 0.5, y = 0.95, gp = gpar(fontsize = 20, fontface = "bold"))
dev.off()
# --------------------------
# 2. UpSet Plot by ID
# --------------------------
# Helper to get top 10 unique IDs
top_desc <- function(df) {
df %>%
filter(!is.na(ID)) %>%
distinct(ID, .keep_all = TRUE) %>%
#slice_min(padj, n = 10, with_ties=FALSE)
slice_min(padj, n = 10) %>%
pull(ID)
}
Bcell_top <- top_desc(Bcell_filtered)
length(Bcell_top)
Tmem_top <- top_desc(Tmem_filtered)
Tnai_top <- top_desc(Tnai_filtered)
# Create UpSet data frame
all_IDs <- unique(c(Bcell_top, Tmem_top, Tnai_top))
upset_df <- data.frame(
ID = all_IDs,
Bcell = all_IDs %in% Bcell_top,
Tmem = all_IDs %in% Tmem_top,
Tnai = all_IDs %in% Tnai_top
)
# Convert to binary matrix
upset_df[, -1] <- lapply(upset_df[, -1], as.integer)
rownames(upset_df) <- upset_df$ID
upset_df <- upset_df[, -1]
# Plot UpSet
png("GO_Upset_Plot.png", width = 1600, height = 1200, res = 200)
upset(upset_df, sets = c("Bcell", "Tmem", "Tnai"),
keep.order = TRUE,
order.by = "freq",
mainbar.y.label = "Intersection Size",
sets.x.label = "Top 10 lowest padj values per Cell Type")
dev.off()