We conducted two-sample Mendelian Randomization (MR) and colocalization
analyses to evaluate the causal effect of acetyl-amino acid levels,
instrumented by variants within a 1 Mb region surrounding the
transcription start site (TSS) of the ACY1 gene, on seven metabolic
outcomes. Genetic instruments were derived from the METSIM cohort;
outcome summary statistics were obtained from the UK Biobank BMI GWAS of
Jurgens et al. (2022), MRC-IEU GWAS-VCF datasets, and the GWAS Catalog,
as described below. All analyses were performed using R (version 4.5.0)
and Python (version 3.12.9).
Summary statistics for acetyl-amino acids were sourced from the
METSIM (Metabolic Syndrome in Men) study, a population-based cohort of
Finnish men. The data, available in GRCh38 coordinates, were downloaded
from the PheWeb repository
(https://pheweb.org/metsim-metab/) as compressed files.
The downloaded files are compressed but lack a .gz extension, so they were renamed and then decompressed with the following command:
for f in C*; do mv "$f" "$f.gz"; bgzip -d "$f.gz"; done
This yielded tab-separated value (tsv) files containing association statistics for approximately 6,000 individuals. Each file includes the following columns:
- chromosome (chrom)
- position (pos)
- reference (ref) and alternate (alt) alleles
- effect size (beta) and its standard error (sebeta)
- p-value (pval)
- minor allele frequency (maf)
- rsIDs (rsids)
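Once a metabolite file has been downloaded and decompressed (commands below), its structure can be sanity-checked in R; a minimal sketch using C1083, assuming it sits in the working directory:
library(data.table)
# Read one decompressed METSIM metabolite file (tab-separated, with header)
c1083 <- fread("C1083")
# Columns used by the downstream formatting code
expected_cols <- c("chrom", "pos", "ref", "alt", "beta", "sebeta", "pval", "maf", "rsids")
stopifnot(all(expected_cols %in% colnames(c1083)))
head(c1083)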
The following acetyl-amino acids were fetched (GRCh38 coordinates):
- N-acetylmethionine (C1083, 6,135 samples): wget https://pheweb.org/metsim-metab/download/C1083
- N-acetylalanine (C1110, 6,136 samples): wget https://pheweb.org/metsim-metab/download/C1110
- N-acetylisoleucine (C100001276, 4,946 samples): wget https://pheweb.org/metsim-metab/download/C100001276
- N-acetylglutamate (C100000282, 5,864 samples): wget https://pheweb.org/metsim-metab/download/C100000282
- N-acetylserine (C100001851, 6,133 samples): wget https://pheweb.org/metsim-metab/download/C100001851

BMI summary statistics were obtained from the Jurgens et al. (2022)
GWAS of European-ancestry individuals in the UK Biobank, available at
https://personal.broadinstitute.org/ryank/Jurgens_Pirruccello_2022_GWAS_Sumstats.zip.
The file GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv
provides association results for approximately 460,000 participants,
originally reported in GRCh37 (hg19) coordinates. Key columns include:
- CHR and BP: chromosome and base-pair position (GRCh37)
- SNP: rsID
- ALLELE1 and ALLELE0: effect and other allele
- A1FREQ: effect allele frequency
- BETA and SE: effect size and standard error
- P: p-value

BMI summary statistics were downloaded and processed as follows:
wget https://personal.broadinstitute.org/ryank/Jurgens_Pirruccello_2022_GWAS_Sumstats.zip
# Select: GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv
# Use LiftOver to convert from GRCh37 to GRCh38 coordinates
The following bash commands were used to decompress and format
MRC-IEU outcome files (.vcf.gz) into tab-separated value
(tsv) files with calculated p-values:
# Step 1: Decompress all .vcf.gz files
for f in ukb-b-12493.vcf.gz ukb-b-12651.vcf.gz ukb-b-15541.vcf.gz ukb-b-970.vcf.gz ebi-a-GCST006867.vcf.gz; do
gunzip -k "$f"
done
# Step 2: Convert to tsv and calculate pval from LP (-log10 p)
# Note: the awk below splits the first sample column (field 10) on ':' and assumes its first
# four values are ordered beta : SE : -log10(p) : EAF, as assumed for these GWAS-VCF files
for f in ukb-b-12493.vcf ukb-b-12651.vcf ukb-b-15541.vcf ukb-b-970.vcf ebi-a-GCST006867.vcf; do
outfile="${f/.vcf/_formatted.tsv}"
echo -e "CHR\tBP\tSNP\tREF\tALT\tEAF\tBETA\tSE\tLP\tPVAL" > "$outfile"
awk 'BEGIN{OFS="\t"}
!/^#/ {
split($10, a, ":");
beta = a[1];
se = a[2];
lp = a[3];
eaf = a[4];
pval = 10^(-lp);
print $1, $2, $3, $4, $5, eaf, beta, se, lp, pval
}' "$f" >> "$outfile"
done
The non-alcoholic fatty liver disease summary statistics (GWAS Catalog study GCST90091033, already on GRCh38) were decompressed and reformatted into the same layout:
gunzip -k 34841290-GCST90091033-EFO_0003095.h.tsv.gz
awk 'BEGIN{OFS="\t"; print "CHR","BP","SNP","REF","ALT","EAF","BETA","SE","LP","PVAL"}
NR>1 {
lp = -log($21)/log(10);
print $3, $4, $2, $5, $6, $11, $7, $20, lp, $21
}' 34841290-GCST90091033-EFO_0003095.h.tsv > 34841290-GCST90091033-EFO_0003095_formatted.tsv
To align the BMI data with the acetylaminos data (GRCh38), we
converted the GRCh37 coordinates to GRCh38 using a three-step
bioinformatics pipeline implemented in Python and
Unix:
Conversion to BED Format: A Python
script (make_bed.py) transformed the BMI TSV file into a
BED format, extracting chromosome, start (BP - 1 for
0-based coordinates), end (BP), and SNP ID. The output was
saved as
GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed.
LiftOver: The UCSC liftOver tool
was applied using the chain file hg19ToHg38.over.chain.gz
to map GRCh37 coordinates to GRCh38. Successfully mapped SNPs were
written to
GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed, with
unmapped SNPs logged in unmapped_SNPs.bed.
Merging with Original Data: A second
Python script (format_bmi.py) processed the
lifted BED file, adjusting coordinates to 1-based notation and merging
them with the original TSV data using the SNP column. The
merged dataset was sorted by SNP, processed in chunks (500,000 rows) to
manage memory, and renamed to conform to the TwoSampleMR R
package requirements: effect_allele (ALLELE1),
other_allele (ALLELE0), eaf
(A1FREQ), beta (BETA),
se (SE), pval (P),
and BP_GRCh38 (new position). The final output was saved as
merged_gwas_data_grch38_twosample.tsv.
Both datasets underwent cleaning and harmonization in R
using the data.table, dplyr, and
tidyr packages:
Acetylaminos Data: Rows with missing or invalid
rsIDs were filtered out
(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)),
and the data were formatted for MR analysis with columns including
chr.exposure, pos.exposure,
beta.exposure, se.exposure,
pval.exposure, effect_allele.exposure,
other_allele.exposure, eaf.exposure,
samplesize.exposure (set to each metabolite's sample size listed above), and SNP.
BMI Data: Similar filtering removed invalid
SNPs, and the data were formatted with chr.outcome,
pos.outcome, beta.outcome,
se.outcome, pval.outcome,
effect_allele.outcome, other_allele.outcome,
eaf.outcome, samplesize.outcome (460,000), and
SNP.
Harmonization: The harmonise_data
function from the TwoSampleMR package aligned acetylaminos
(exposure) and BMI (outcome) data by matching alleles and removing
palindromic SNPs with ambiguous effects, producing a harmonized dataset
saved as
harmonized_[exposure]_Jurgens_BMI_dat_[date].csv.
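For orientation, the TwoSampleMR column-naming convention and the harmonise_data call look like this in miniature (a sketch on hypothetical toy values, not the study data; the full pipeline appears in the scripts below):
library(TwoSampleMR)
# Toy single-SNP exposure and outcome data frames, already named per the convention above
# (the rsID and all numbers here are made up for illustration)
exposure_toy <- data.frame(
  SNP = "rs12345",
  beta.exposure = 0.45, se.exposure = 0.05, pval.exposure = 1e-18,
  effect_allele.exposure = "A", other_allele.exposure = "G",
  eaf.exposure = 0.12, samplesize.exposure = 6135,
  exposure = "N-acetylmethionine", id.exposure = "N-acetylmethionine"
)
outcome_toy <- data.frame(
  SNP = "rs12345",
  beta.outcome = 0.010, se.outcome = 0.004, pval.outcome = 0.012,
  effect_allele.outcome = "A", other_allele.outcome = "G",
  eaf.outcome = 0.11, samplesize.outcome = 460000,
  outcome = "BMI", id.outcome = "BMI"
)
# Align alleles between exposure and outcome; ambiguous palindromic SNPs are flagged and removed
dat_toy <- harmonise_data(exposure_dat = exposure_toy, outcome_dat = outcome_toy)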
The BMI summary statistics were reported on GRCh37; GRCh38 coordinates were needed to match the METSIM data and for other post-GWAS analyses.
This was a three-step process:
Step 1: Convert the BMI summary statistics from TSV to BED format
This first script (make_bed.py) converts the BMI summary statistics file (in TSV format) to a BED file format.
import csv
# Open the input TSV file and prepare to write to the output BED file
with open('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv', 'r') as tsvfile, open('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed', 'w') as bedfile:
    tsv_reader = csv.DictReader(tsvfile, delimiter='\t')
    for row in tsv_reader:
        # Extract the necessary information
        chromosome = f"chr{row['CHR']}"
        start = int(row['BP']) - 1  # Convert to 0-based for BED format
        end = row['BP']
        snp_id = row['SNP']
        # Write to BED file
        bedfile.write(f"{chromosome}\t{start}\t{end}\t{snp_id}\n")
print("Conversion complete. Check 'GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed' for the output.")
Step 2: liftOver
# Arguments, in order: input BED file; chain file for converting GRCh37 (hg19) → GRCh38 (hg38);
# output file for successfully mapped SNPs; output file for unmapped SNPs
liftOver \
  GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed \
  hg19ToHg38.over.chain.gz \
  GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed \
  unmapped_SNPs.bed
Step 3: This second script (format_bmi.py) does the following:
- Reads the BED file: loads GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed (CHR, START, END, SNP) and copies the END positions to a new column called BP_GRCh38.
- Sorts the BED data: sorts by the SNP column for efficient merging later.
- Processes the GWAS TSV file: reads the GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv file in chunks (500,000 rows at a time) to avoid memory overflow, sorting each chunk by SNP for faster and more efficient merging.
- Merges: left-joins each chunk with the BED data on the SNP column, adding BP_GRCh38 (new genome build position) to the GWAS data.
- Renames columns: renames key columns to fit the format required by the TwoSampleMR package: ALLELE1 → effect_allele, ALLELE0 → other_allele, A1FREQ → eaf (effect allele frequency), BETA → beta, SE → se (standard error), P → pval.
- Saves merged results: writes merged_gwas_data_grch38_original.tsv (original column names) and merged_gwas_data_grch38_twosample.tsv (renamed) for downstream analysis with the TwoSampleMR package.
import pandas as pd
# Define chunk size for reading large TSV files in smaller parts
chunk_size = 500000  # Adjust based on memory availability
# Initialize an empty list to store the merged chunks
merged_chunks = []
try:
    # Step 1: Load the BED file (without headers)
    bed_columns = ['CHR', 'START', 'END', 'SNP']
    bed_data = pd.read_csv('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed', sep='\t', header=None, names=bed_columns)
    print("Loaded BED file successfully.", flush=True)
    # Step 2: The BED END column is already the one-based position (START + 1), so use it directly
    bed_data['BP_GRCh38'] = bed_data['END']
    print("Converted BED file positions to one-based.", flush=True)
    # Step 3: Sort the BED data by the SNP column to optimize merging
    bed_data = bed_data.sort_values(by='SNP')
    print("Sorted BED data by SNP.", flush=True)
    # Step 4: Process the TSV file in chunks, and merge each chunk with the BED data
    for chunk in pd.read_csv('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv', sep='\t', chunksize=chunk_size):
        print("Processing chunk...", flush=True)
        # Sort the chunk by SNP for more efficient merging
        chunk = chunk.sort_values(by='SNP')
        # Merge the chunk with the BED data on SNP. Both inputs carry a CHR column,
        # so pandas appends _x/_y suffixes: CHR_x is the original GWAS chromosome (used downstream in R)
        merged_chunk = pd.merge(chunk, bed_data[['CHR', 'SNP', 'BP_GRCh38']], on='SNP', how='left')
        # Append the merged chunk to the list of merged chunks
        merged_chunks.append(merged_chunk)
    # Step 5: Concatenate all merged chunks together into one DataFrame
    merged_data = pd.concat(merged_chunks)
    print("Merged all chunks successfully.", flush=True)
    # Step 6: Save the merged data to a new TSV file before changing column names
    merged_data.to_csv('merged_gwas_data_grch38_original.tsv', sep='\t', index=False)
    print("Saved 'merged_gwas_data_grch38_original.tsv' successfully.", flush=True)
    # Step 7: Perform the column renaming for TwoSampleMR, keeping both BP columns
    merged_data.rename(columns={
        'SNP': 'SNP',
        'CHR': 'CHR',                # Keep the original chromosome column
        'BP': 'BP',                  # Keep the original (GRCh37) BP
        'BP_GRCh38': 'BP_GRCh38',    # Keep the new GRCh38 position
        'ALLELE1': 'effect_allele',  # ALLELE1 -> effect_allele
        'ALLELE0': 'other_allele',   # ALLELE0 -> other_allele
        'A1FREQ': 'eaf',             # A1FREQ -> effect allele frequency
        'BETA': 'beta',              # BETA -> beta
        'SE': 'se',                  # SE -> standard error
        'P': 'pval'                  # P -> p-value
    }, inplace=True)
    # Step 8: Save the final TwoSampleMR-compatible file in TSV format
    merged_data.to_csv('merged_gwas_data_grch38_twosample.tsv', sep='\t', index=False)
    print("Saved 'merged_gwas_data_grch38_twosample.tsv' successfully.", flush=True)
    # Step 9 (optional / didn't work): Save the merged data in Parquet format for faster reading in the future
    # merged_data.to_parquet('merged_gwas_data_grch38.parquet', index=False)
    # print("Saved 'merged_gwas_data_grch38.parquet' successfully (Parquet format).", flush=True)
except Exception as e:
    print(f"An error occurred: {e}", flush=True)
Re-install liftOver so that it can be used from the base conda environment.
#!/bin/bash
# install_liftover.sh
# Script to install UCSC liftOver on Mac M1/M2/M3 (Apple Silicon) correctly
# Downloads Intel (x86_64) binary because Rosetta handles it automatically
echo "Navigating to /opt/homebrew/bin (standard Homebrew location for Apple Silicon)..."
cd /opt/homebrew/bin || { echo "/opt/homebrew/bin not found, trying /usr/local/bin"; cd /usr/local/bin || exit 1; }
echo "Removing any existing broken liftOver binary..."
sudo rm -f liftOver
echo "Downloading working macOS x86_64 (Intel) liftOver binary from UCSC..."
sudo curl -O http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/liftOver
echo "Making liftOver executable..."
sudo chmod +x liftOver
echo "Testing liftOver installation..."
if liftOver --help > /dev/null 2>&1; then
echo "✅ liftOver installed successfully and ready to use!"
else
echo "❌ liftOver installation failed. Please check manually."
fi
# Use liftOver: skip non-alcoholic fatty liver disease, as already on build 38.
cd /Users/charleenadams/mr_acetylaminos_bmi
for f in *_formatted.tsv; do
# Skip the file that's already on GRCh38
if [[ "$f" == "34841290-GCST90091033-EFO_0003095_formatted.tsv" ]]; then
echo "Skipping $f (already GRCh38)"
continue
fi
prefix="${f/.tsv/}"
# Step 1: Create BED file for liftOver (0-based start)
awk 'BEGIN{OFS="\t"} NR>1 {print "chr"$1, $2-1, $2, $3}' "$f" > "${prefix}_lift.bed"
# Step 2: Run liftOver
liftOver "${prefix}_lift.bed" hg19ToHg38.over.chain.gz "${prefix}_lifted.bed" "${prefix}_unmapped.bed"
# Step 3: Map lifted SNPs back (chr prefix removed)
awk 'BEGIN{OFS="\t"} {gsub(/^chr/,"",$1); print $4, $1, $3}' "${prefix}_lifted.bed" > "${prefix}_map.tsv"
# Step 4: Join lifted coordinates back into the original data
awk 'BEGIN{OFS="\t"} FNR==NR{a[$1]=$3; next}
FNR==1{print $0, "BP38"; next}
{print $0, (a[$3] ? a[$3] : "NA")}' "${prefix}_map.tsv" "$f" > "${prefix}_lifted38.tsv"
done
Instrument Selection: From the filtered dataset,
genetic instruments were selected with a relaxed p-value threshold of
5 × 10⁻⁴ (versus the conventional 5 × 10⁻⁸) to maximize power within the
ACY1 TSS region, justified by prior evidence of acetyl-amino acid
associations at this locus. Instruments were required to have an
F-statistic > 10 to ensure strength, calculated as \[ F = \left(
\frac{\beta_{exposure}}{SE_{exposure}} \right)^2 \]. Steiger
filtering (steiger_filtering) retained only SNPs that explained more
variance in the exposure than in the outcome, consistent with the
assumed exposure-to-outcome direction.
Linkage Disequilibrium (LD) Clumping:
Independent instruments were identified using ld_clump with
a clumping window of 500 kb (versus the default 10,000 kb) and an
r² threshold of 0.001, referencing the 1000 Genomes
European (EUR) population reference panel (bfile = EUR).
MR Methods: Five MR methods were applied:
- Inverse variance weighted (mr_ivw)
- MR-Egger (mr_egger_regression)
- Weighted median (mr_weighted_median)
- MR-Lasso (mr_lasso)
- Contamination mixture, MR-ConMix (mr_conmix)
Sensitivity Analyses: Heterogeneity was assessed
with mr_heterogeneity, pleiotropy with
mr_pleiotropy_test, and leave-one-out analysis with
mr_leaveoneout. Wald ratio tests were also computed for each
instrument individually. When only one instrument was available, only
the Wald ratio test was performed.
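For reference, the per-instrument Wald ratio and its delta-method standard error, as computed in the code below, are \[ \beta_{Wald} = \frac{\beta_{outcome}}{\beta_{exposure}}, \qquad SE_{Wald} = \sqrt{\frac{SE_{outcome}^2}{\beta_{exposure}^2} + \frac{\beta_{outcome}^2 \, SE_{exposure}^2}{\beta_{exposure}^4}} \], with a two-sided p-value obtained from the normal approximation.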
# Clear the environment
rm(list = ls())
# Load required libraries
library(data.table)
library(dplyr)
library(TwoSampleMR)
library(MRInstruments)
library(MendelianRandomization)  # mr_input(), mr_lasso(), mr_conmix()
library(ieugwasr)                # ld_clump()
library(openxlsx)
library(ggplot2)
library(scales)
# Define exposure file information
exposure_info <- list(
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C1083",
exposure_name = "N-acetylmethionine",
sample_size = 6135
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C1110",
exposure_name = "N-acetylalanine",
sample_size = 6136
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100001276",
exposure_name = "N-acetylisoleucine",
sample_size = 4946
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100000282",
exposure_name = "N-acetylglutamate",
sample_size = 5864
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100001851",
exposure_name = "N-acetylserine",
sample_size = 6133
)
)
# Loop through each exposure file
for (exp in exposure_info) {
exposure_file <- exp$file
exposure_name <- exp$exposure_name
sample_size <- exp$sample_size
today <- Sys.Date()
# Create results directory for this exposure with p5E4 threshold
results_dir <- paste0("/Users/charleenadams/mr_acetylaminos_bmi/results_", exposure_name, "_p5E4/")
if (!dir.exists(results_dir)) {
dir.create(results_dir, recursive = TRUE)
cat("Created directory:", results_dir, "\n")
}
# ---------------------------------------------
# Load and Format Exposure Data
# ---------------------------------------------
exp_data <- fread(exposure_file) %>% as.data.frame()
filtered_exp_df <- exp_data %>%
filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
arrange(rsids)
exp_dat <- filtered_exp_df %>%
mutate(
chr.exposure = chrom,
pos.exposure = pos,
beta.exposure = beta,
se.exposure = sebeta,
exposure = exposure_name,
id.exposure = exposure_name,
pval.exposure = pval,
SNP.exposure = rsids,
SNP = rsids,
effect_allele.exposure = alt,
other_allele.exposure = ref,
eaf.exposure = maf,
samplesize.exposure = sample_size,
id_col = nearest_genes
)
# ---------------------------------------------
# Load and Format BMI Data
# ---------------------------------------------
bmi <- fread("/Users/charleenadams/mr_acetylaminos_bmi/merged_gwas_data_grch38_twosample.tsv")
filtered_bmi_df <- bmi %>%
filter(!is.na(SNP) & SNP != "" & grepl("^rs", SNP)) %>%
arrange(SNP)
out_dat <- filtered_bmi_df %>%
mutate(
SNP = SNP,
SNP.outcome = SNP,
chr.outcome = CHR_x,
pos.outcome = BP_GRCh38,
beta.outcome = beta,
se.outcome = se,
pval.outcome = pval,
effect_allele.outcome = effect_allele,
other_allele.outcome = other_allele,
eaf.outcome = eaf,
samplesize.outcome = 460000,
id.outcome = "BMI",
outcome = "BMI"
)
# ---------------------------------------------
# Harmonize Data
# ---------------------------------------------
dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
harmonized_file <- paste0(results_dir, "harmonized_", exposure_name, "_Jurgens_BMI_dat_", today, ".csv")
write.csv(dat, harmonized_file, row.names = FALSE)
# ---------------------------------------------
# Select 1MB around ACY1 region
# ---------------------------------------------
# Load the local annotation file
local_annotation <- read.table("/Users/charleenadams/mr_ukbppp_chd/mart_export_clean.txt",
header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# Filter for the gene ACY1
acy1_data <- local_annotation[local_annotation$Gene.name == "ACY1", ]
# Extract the TSS and chromosome
acy1_tss <- min(acy1_data$Transcript.start..bp.)
acy1_chrom <- acy1_data$Chromosome.scaffold.name[1]
window_size <- 500000
tss_start <- acy1_tss - window_size
tss_end <- acy1_tss + window_size
# Display the results
cat("TSS for ACY1:", acy1_tss, "\n")
cat("500kb upstream:", tss_start, "\n")
cat("500kb downstream:", tss_end, "\n")
# Filter data within 1Mb of ACY1 TSS
dat_filtered <- dat %>%
filter(chr.outcome == acy1_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end)
filtered_file <- paste0(results_dir, "filtered_ACY1_1Mb_", exposure_name, "_Jurgens_BMI_", today, ".csv")
write.csv(dat_filtered, file = filtered_file, row.names = FALSE)
cat("Filtered data saved to:", filtered_file, "\n")
cat("Number of observations within 1 Mb of ACY1 TSS:", nrow(dat_filtered), "\n")
# ---------------------------------------------
# MR
# ---------------------------------------------
# Step 1: Filter Instruments
dat_filtered <- read.csv(filtered_file)
instruments <- dat_filtered %>%
filter(mr_keep == TRUE, pval.exposure < 5e-4) %>% # Relaxed threshold to 5e-4
mutate(
rsid = SNP,
pval = pval.exposure,
id = paste0(exposure_name, " (1MB around ACY1 TSS)"),
F_stat = (beta.exposure / se.exposure)^2 # Compute F-statistic
) %>%
filter(F_stat > 10) # Exclude weak instruments
cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")
# Step 1.5: Steiger Filtering
instruments <- steiger_filtering(instruments)
instruments <- instruments %>% filter(steiger_dir == TRUE) # Keep only SNPs with correct direction
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))
# Subset instruments to keep only TwoSampleMR, F-stat, and Steiger fields
instruments_subset <- instruments %>%
dplyr::select(
SNP,
effect_allele.exposure, other_allele.exposure,
effect_allele.outcome, other_allele.outcome,
beta.exposure, se.exposure, pval.exposure,
beta.outcome, se.outcome, pval.outcome,
eaf.exposure, eaf.outcome,
id.exposure, exposure,
id.outcome, outcome,
samplesize.exposure, samplesize.outcome,
mr_keep, action,
F_stat,
steiger_dir, steiger_pval
)
# Step 2: Local LD Clumping
clumped <- ld_clump(
dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",
clump_r2 = 0.001,
clump_kb = 500 # Reduced window
)
clumped_dat <- instruments %>% dplyr::filter(SNP %in% clumped$rsid)
cat("Local clumping completed. Number of SNPs retained:", nrow(clumped_dat), "\n")
cat("Preview of clumped data:\n")
print(head(clumped_dat))
# Subset clumped_dat to keep only TwoSampleMR, F-stat, and Steiger fields
clumped_dat_subset <- clumped_dat %>%
dplyr::select(
SNP,
effect_allele.exposure, other_allele.exposure,
effect_allele.outcome, other_allele.outcome,
beta.exposure, se.exposure, pval.exposure,
beta.outcome, se.outcome, pval.outcome,
eaf.exposure, eaf.outcome,
id.exposure, exposure,
id.outcome, outcome,
samplesize.exposure, samplesize.outcome,
mr_keep, action,
F_stat,
steiger_dir, steiger_pval
)
# Step 3: Perform Selected MR Analyses
# 3.1: Inverse Variance Weighted (IVW)
ivw_result <- tryCatch({
mr(clumped_dat, method_list = "mr_ivw")
}, error = function(e) {
cat("Error in IVW for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.2: MR-Egger
egger_result <- tryCatch({
mr(clumped_dat, method_list = "mr_egger_regression")
}, error = function(e) {
cat("Error in MR-Egger for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.3: Weighted Median
weighted_median_result <- tryCatch({
mr(clumped_dat, method_list = "mr_weighted_median")
}, error = function(e) {
cat("Error in Weighted Median for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.4: MR-Lasso
mr_input <- mr_input(
bx = clumped_dat$beta.exposure,
bxse = clumped_dat$se.exposure,
by = clumped_dat$beta.outcome,
byse = clumped_dat$se.outcome,
exposure = exposure_name,
outcome = "BMI",
snps = clumped_dat$SNP
)
lasso_result <- tryCatch({
mr_lasso(mr_input)
}, error = function(e) {
cat("Error in MR-Lasso for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.5: Contamination Mixture (ConMix)
conmix_result <- tryCatch({
mr_conmix(mr_input)
}, error = function(e) {
cat("Error in MR-ConMix for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.6: Heterogeneity Test
heterogeneity_result <- tryCatch({
mr_heterogeneity(clumped_dat)
}, error = function(e) {
cat("Error in Heterogeneity Test for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.7: Pleiotropy Test
pleiotropy_result <- tryCatch({
mr_pleiotropy_test(clumped_dat)
}, error = function(e) {
cat("Error in Pleiotropy Test for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# 3.8: Leave-One-Out Analysis
loo_result <- tryCatch({
mr_leaveoneout(clumped_dat)
}, error = function(e) {
cat("Error in Leave-One-Out for", exposure_name, ":", conditionMessage(e), "\n")
NULL
})
# Step 4: Perform Wald Ratio Tests for Each Instrument
wald_ratios <- clumped_dat %>%
mutate(
wald_beta = beta.outcome / beta.exposure,
wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
method = paste("Wald Ratio:", SNP)
) %>%
dplyr::select(SNP, wald_beta, wald_se, pval, method)
cat("\n=== Wald Ratio Tests for Each Instrument for", exposure_name, "===\n")
print(wald_ratios)
# Step 5: Prepare Data for Forest Plot
ivw_df <- if (!is.null(ivw_result)) ivw_result %>% mutate(method = "IVW") else NULL
egger_df <- if (!is.null(egger_result)) egger_result %>% mutate(method = "MR-Egger") else NULL
weighted_median_df <- if (!is.null(weighted_median_result)) weighted_median_result %>% mutate(method = "Weighted Median") else NULL
lasso_df <- if (!is.null(lasso_result)) {
data.frame(method = "MR-Lasso", b = lasso_result@Estimate,
se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
NULL
}
conmix_se <- if (!is.null(conmix_result)) {
(conmix_result@CIUpper - conmix_result@CILower) / (2 * 1.96) # SE = (CIUpper - CILower) / (2 * 1.96) for 95% CI
} else {
NA
}
conmix_df <- if (!is.null(conmix_result)) {
data.frame(method = "MR-ConMix", b = conmix_result@Estimate,
se = conmix_se, pval = conmix_result@Pvalue)
} else {
NULL
}
mr_results <- bind_rows(
ivw_df,
egger_df,
weighted_median_df,
lasso_df,
conmix_df,
wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
mutate(method = as.factor(method)) %>%
filter(!is.na(method) & method != "NA") %>%
arrange(b) %>%
mutate(method = factor(method, levels = unique(method)))
# Step 6: Create Beautiful Forest Plot with Dynamic Colors (No Numbers)
base_colors <- c(
"IVW" = "#1F78B4",
"MR-Egger" = "#FF7F00",
"Weighted Median" = "#33A02C",
"MR-Lasso" = "#FB9A99",
"MR-ConMix" = "#E41A1C"
)
wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
wald_colors <- hue_pal()(n_wald)
names(wald_colors) <- wald_methods
} else {
wald_colors <- NULL
}
color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]
# Dynamically determine x-axis limits based on confidence intervals
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)
forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
geom_point(size = 3) +
geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
labs(
title = paste0("Mendelian Randomization Estimates:\n", exposure_name, " (1MB around ACY1 TSS) on Jurgens BMI"),
x = "Causal Effect (Beta)",
y = ""
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.y = element_text(size = 12, face = "bold"),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
panel.grid.major = element_line(color = "grey90"),
panel.grid.minor = element_blank(),
legend.position = "none",
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
) +
scale_color_manual(values = color_list) +
xlim(x_min - 0.02, x_max + 0.02)
# Step 7: Create Forest Plot Ordered by Method
desired_order <- c(
"IVW",
"MR-Egger",
"MR-Lasso",
"MR-ConMix",
"Weighted Median",
unique(wald_ratios$method)
)
desired_order <- desired_order[desired_order %in% unique(mr_results$method)]
mr_results$method <- factor(mr_results$method, levels = desired_order)
mr_results <- mr_results[order(mr_results$method), ]
forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se),
height = 0.2, na.rm = TRUE, linewidth = 0.8) +
geom_point(size = 3) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
labs(
title = paste0("Mendelian Randomization Estimates:\n", exposure_name, " (1MB around ACY1 TSS)\non Jurgens BMI"),
x = "Causal Effect (Beta)",
y = ""
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.y = element_text(size = 12, face = "bold"),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
panel.grid.major = element_line(color = "grey90"),
panel.grid.minor = element_blank(),
legend.position = "none",
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
) +
scale_color_manual(values = color_list) +
xlim(x_min - 0.02, x_max + 0.02)
# Save plots
plot_file <- paste0(results_dir, "MR_Forestplot_", exposure_name, "_Jurgens_BMI_", today, ".png")
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_", exposure_name, "_Jurgens_BMI_By_Method_", today, ".png")
ggsave(plot_file_by_method, plot = forest_plot_by_method, dpi = 300, width = 8, height = 5)
cat("Forest plot by method saved to:", plot_file_by_method, "\n")
# Step 8: Save Everything in One Excel Spreadsheet
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")
add_formatted_sheet <- function(wb, sheet_name, data, title) {
addWorksheet(wb, sheet_name)
if (is.null(data) || nrow(data) == 0 || ncol(data) == 0) {
writeData(wb, sheet_name, paste("No data available for", sheet_name), startRow = 1, startCol = 1)
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
} else {
writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
setRowHeights(wb, sheet_name, rows = 1, heights = 20)
setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}
}
# TOC with all sections
toc_data <- data.frame(
Sheet = c("IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix",
"Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
"Instruments", "Clumped_Data"),
Title = c("Inverse Variance Weighted (IVW) Results", "MR-Egger Results",
"Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
"Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results",
"Pleiotropy Test Results", "Leave-One-Out Analysis Results",
"Filtered Instruments Data", "Clumped SNPs Data")
)
toc_data <- toc_data[complete.cases(toc_data), ] # Remove any NA rows
addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")
# Add all sheets
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
lasso_df <- data.frame(
Exposure = lasso_result@Exposure,
Outcome = lasso_result@Outcome,
Estimate = lasso_result@Estimate,
StdError = lasso_result@StdError,
CILower = lasso_result@CILower,
CIUpper = lasso_result@CIUpper,
Pvalue = lasso_result@Pvalue,
SNPs = lasso_result@SNPs,
Valid = lasso_result@Valid,
ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
RegEstimate = lasso_result@RegEstimate,
RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
Lambda = lasso_result@Lambda
)
add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
addWorksheet(wb, "MR-Lasso")
writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}
if (!is.null(conmix_result)) {
conmix_df <- data.frame(
Exposure = conmix_result@Exposure,
Outcome = conmix_result@Outcome,
Estimate = conmix_result@Estimate,
Pvalue = conmix_result@Pvalue,
SNPs = conmix_result@SNPs,
Psi = conmix_result@Psi,
CILower = conmix_result@CILower,
CIUpper = conmix_result@CIUpper,
CIRange = paste(conmix_result@CIRange, collapse = ", "),
CIMin = conmix_result@CIMin,
CIMax = conmix_result@CIMax,
CIStep = conmix_result@CIStep,
Valid = paste(conmix_result@Valid, collapse = ", "),
ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
Alpha = conmix_result@Alpha
)
add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
addWorksheet(wb, "MR-ConMix")
writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments_subset, "Filtered Instruments Data")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat_subset, "Clumped SNPs Data")
excel_file <- paste0(results_dir, "MR_Results_", exposure_name, "_Jurgens_BMI_", today, ".xlsx")
saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")
}
# Clear the environment
rm(list = ls())
# Load required libraries
library(data.table)
library(dplyr)
library(TwoSampleMR)
library(MRInstruments)
library(MendelianRandomization)
library(ieugwasr)  # ld_clump()
library(openxlsx)
library(ggplot2)
library(scales)
# Note for Mac M3 users:
# Ensure dependencies are installed:
# - Install Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# - Install plink: brew install plink
# - Install R packages if missing: install.packages(c("data.table", "dplyr", "TwoSampleMR", "MRInstruments", "openxlsx", "ggplot2", "scales", "MendelianRandomization", "ieugwasr"))
# - ld_clump() also uses the genetics.binaRies package (from the MRC-IEU R-universe) to locate a local plink binary
# Define master results directory
master_results_dir <- "/Users/charleenadams/mr_acetylaminos_bmi/results/"
if (!dir.exists(master_results_dir)) {
dir.create(master_results_dir, recursive = TRUE)
cat("Created master results directory:", master_results_dir, "\n")
}
# Define exposure file information
exposure_info <- list(
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C1083",
exposure_name = "N-acetylmethionine",
sample_size = 6135
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C1110",
exposure_name = "N-acetylalanine",
sample_size = 6136
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100001276",
exposure_name = "N-acetylisoleucine",
sample_size = 4946
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100000282",
exposure_name = "N-acetylglutamate",
sample_size = 5864
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/C100001851",
exposure_name = "N-acetylserine",
sample_size = 6133
)
)
# Define outcome file information
outcome_info <- list(
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/merged_gwas_data_grch38_twosample.tsv",
outcome_name = "BMI",
sample_size = 460000,
id = "BMI"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/ukb-b-12651_formatted_lifted38.tsv",
outcome_name = "PureHypercholesterolaemia",
sample_size = 463010,
id = "ukb-b-12651"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/ukb-b-970_formatted_lifted38.tsv",
outcome_name = "Type2DiabetesNoComplications",
sample_size = 463010,
id = "ukb-b-970"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/ukb-b-12493_formatted_lifted38.tsv",
outcome_name = "EssentialHypertension",
sample_size = 463010,
id = "ukb-b-12493"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/ukb-b-15541_formatted_lifted38.tsv",
outcome_name = "Obesity",
sample_size = 463010,
id = "ukb-b-15541"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/34841290-GCST90091033-EFO_0003095_formatted.tsv",
outcome_name = "NonAlcoholicFattyLiverDisease",
sample_size = 778614,
id = "GCST90091033"
),
list(
file = "/Users/charleenadams/mr_acetylaminos_bmi/ebi-a-GCST006867_formatted_lifted38.tsv",
outcome_name = "Type2Diabetes",
sample_size = 655666,
id = "ebi-a-GCST006867"
)
)
# Load local annotation file for ACY1 TSS (outside loops for efficiency)
local_annotation <- read.table("/Users/charleenadams/mr_ukbppp_chd/mart_export_clean.txt",
header = TRUE, sep = "\t", stringsAsFactors = FALSE)
acy1_data <- local_annotation[local_annotation$Gene.name == "ACY1", ]
acy1_tss <- min(acy1_data$Transcript.start..bp.)
acy1_chrom <- acy1_data$Chromosome.scaffold.name[1]
window_size <- 500000
tss_start <- acy1_tss - window_size
tss_end <- acy1_tss + window_size
cat("TSS for ACY1:", acy1_tss, "\n")
cat("500kb upstream:", tss_start, "\n")
cat("500kb downstream:", tss_end, "\n")
# Double loop through each exposure and outcome
for (exp in exposure_info) {
exposure_file <- exp$file
exposure_name <- exp$exposure_name
sample_size_exp <- exp$sample_size
# Load and format exposure data (once per exposure)
exp_data <- fread(exposure_file) %>% as.data.frame()
filtered_exp_df <- exp_data %>%
filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
arrange(rsids)
exp_dat <- filtered_exp_df %>%
mutate(
chr.exposure = chrom,
pos.exposure = pos,
beta.exposure = beta,
se.exposure = sebeta,
exposure = exposure_name,
id.exposure = exposure_name,
pval.exposure = pval,
SNP.exposure = rsids,
SNP = rsids,
effect_allele.exposure = alt,
other_allele.exposure = ref,
eaf.exposure = maf,
samplesize.exposure = sample_size_exp,
id_col = nearest_genes
)
for (out in outcome_info) {
outcome_file <- out$file
outcome_name <- out$outcome_name
sample_size_out <- out$sample_size
outcome_id <- out$id
today <- Sys.Date()
# Create results subdirectory
results_dir <- paste0(master_results_dir, exposure_name, "_", outcome_name, "/")
if (!dir.exists(results_dir)) {
dir.create(results_dir, recursive = TRUE)
cat("Created directory:", results_dir, "\n")
}
# Check if analysis is already complete
excel_file <- paste0(results_dir, "MR_Results_", exposure_name, "_", outcome_name, "_", today, ".xlsx")
if (file.exists(excel_file)) {
cat("Analysis for", exposure_name, "on", outcome_name, "already complete. Skipping.\n")
next
}
# ---------------------------------------------
# Load and Format Outcome Data
# ---------------------------------------------
out_data <- fread(outcome_file) %>% as.data.frame()
cat("Columns in", outcome_file, ":\n")
print(colnames(out_data))
filtered_out_df <- out_data %>%
filter(!is.na(SNP) & SNP != "" & grepl("^rs", SNP)) %>%
arrange(SNP)
# Standardize column names, handling BMI-specific names
if (outcome_id == "BMI") {
out_dat <- filtered_out_df %>%
mutate(
SNP = SNP,
SNP.outcome = SNP,
chr.outcome = CHR_x,
pos.outcome = BP_GRCh38,
beta.outcome = beta,
se.outcome = se,
pval.outcome = pval,
effect_allele.outcome = effect_allele,
other_allele.outcome = other_allele,
eaf.outcome = eaf,
samplesize.outcome = sample_size_out,
id.outcome = outcome_id,
outcome = outcome_name
)
} else {
out_dat <- filtered_out_df %>%
mutate(
SNP = SNP,
SNP.outcome = SNP,
chr.outcome = CHR,
pos.outcome = BP,
beta.outcome = BETA,
se.outcome = SE,
pval.outcome = PVAL,
effect_allele.outcome = ALT,
other_allele.outcome = REF,
eaf.outcome = EAF,
samplesize.outcome = sample_size_out,
id.outcome = outcome_id,
outcome = outcome_name
)
}
# ---------------------------------------------
# Harmonize Data
# ---------------------------------------------
dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
harmonized_file <- paste0(results_dir, "harmonized_", exposure_name, "_", outcome_name, "_", today, ".csv")
write.csv(dat, harmonized_file, row.names = FALSE)
cat("Harmonized data saved to:", harmonized_file, "\n")
# ---------------------------------------------
# Filter 1MB around ACY1 TSS
# ---------------------------------------------
dat_filtered <- dat %>%
filter(chr.outcome == acy1_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end)
filtered_file <- paste0(results_dir, "filtered_ACY1_1Mb_", exposure_name, "_", outcome_name, "_", today, ".csv")
write.csv(dat_filtered, file = filtered_file, row.names = FALSE)
cat("Filtered data saved to:", filtered_file, "\n")
cat("Number of observations within 1 Mb of ACY1 TSS:", nrow(dat_filtered), "\n")
# ---------------------------------------------
# MR Analyses
# ---------------------------------------------
# Step 1: Filter Instruments
dat_filtered <- read.csv(filtered_file)
instruments <- dat_filtered %>%
filter(mr_keep == TRUE, pval.exposure < 5e-4) %>%
mutate(
rsid = SNP,
pval = pval.exposure,
id = paste0(exposure_name, " (1MB around ACY1 TSS)"),
F_stat = (beta.exposure / se.exposure)^2
) %>%
filter(F_stat > 10)
cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")
# Check if instruments are empty
if (nrow(instruments) == 0) {
cat("No valid instruments found for", exposure_name, "on", outcome_name, ". Skipping MR analyses.\n")
# Create placeholder Excel file
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
addWorksheet(wb, "Summary")
writeData(wb, "Summary", paste("No valid instruments found for", exposure_name, "on", outcome_name),
startRow = 1, startCol = 1)
addStyle(wb, "Summary", title_style, rows = 1, cols = 1)
saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Placeholder Excel file saved to:", excel_file, "\n")
next
}
# Step 1.5: Steiger Filtering
instruments <- tryCatch({
steiger_filtering(instruments)
}, error = function(e) {
cat("Error in Steiger filtering for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
instruments # Return original instruments to avoid breaking
})
# Check if steiger_dir exists before filtering
if ("steiger_dir" %in% colnames(instruments)) {
instruments <- instruments %>% filter(steiger_dir == TRUE)
} else {
cat("Warning: steiger_dir column not found after steiger_filtering for", exposure_name, "on", outcome_name, ". Proceeding without Steiger filtering.\n")
}
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))
# Subset instruments
instruments_subset <- instruments %>%
dplyr::select(
SNP,
effect_allele.exposure, other_allele.exposure,
effect_allele.outcome, other_allele.outcome,
beta.exposure, se.exposure, pval.exposure,
beta.outcome, se.outcome, pval.outcome,
eaf.exposure, eaf.outcome,
id.exposure, exposure,
id.outcome, outcome,
samplesize.exposure, samplesize.outcome,
mr_keep, action,
F_stat,
steiger_dir, steiger_pval
)
# Step 2: Local LD Clumping
clumped <- ld_clump(
dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",
clump_r2 = 0.001,
clump_kb = 500
)
clumped_dat <- instruments %>% dplyr::filter(SNP %in% clumped$rsid)
cat("Local clumping completed. Number of SNPs retained:", nrow(clumped_dat), "\n")
cat("Preview of clumped data:\n")
print(head(clumped_dat))
# Subset clumped data
clumped_dat_subset <- clumped_dat %>%
dplyr::select(
SNP,
effect_allele.exposure, other_allele.exposure,
effect_allele.outcome, other_allele.outcome,
beta.exposure, se.exposure, pval.exposure,
beta.outcome, se.outcome, pval.outcome,
eaf.exposure, eaf.outcome,
id.exposure, exposure,
id.outcome, outcome,
samplesize.exposure, samplesize.outcome,
mr_keep, action,
F_stat,
steiger_dir, steiger_pval
)
# Step 3: Perform MR Analyses
ivw_result <- tryCatch({
mr(clumped_dat, method_list = "mr_ivw")
}, error = function(e) {
cat("Error in IVW for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
egger_result <- tryCatch({
mr(clumped_dat, method_list = "mr_egger_regression")
}, error = function(e) {
cat("Error in MR-Egger for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
weighted_median_result <- tryCatch({
mr(clumped_dat, method_list = "mr_weighted_median")
}, error = function(e) {
cat("Error in Weighted Median for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
mr_input <- mr_input(
bx = clumped_dat$beta.exposure,
bxse = clumped_dat$se.exposure,
by = clumped_dat$beta.outcome,
byse = clumped_dat$se.outcome,
exposure = exposure_name,
outcome = outcome_name,
snps = clumped_dat$SNP
)
lasso_result <- tryCatch({
mr_lasso(mr_input)
}, error = function(e) {
cat("Error in MR-Lasso for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
conmix_result <- tryCatch({
mr_conmix(mr_input)
}, error = function(e) {
cat("Error in MR-ConMix for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
heterogeneity_result <- tryCatch({
mr_heterogeneity(clumped_dat)
}, error = function(e) {
cat("Error in Heterogeneity Test for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
pleiotropy_result <- tryCatch({
mr_pleiotropy_test(clumped_dat)
}, error = function(e) {
cat("Error in Pleiotropy Test for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
loo_result <- tryCatch({
mr_leaveoneout(clumped_dat)
}, error = function(e) {
cat("Error in Leave-One-Out for", exposure_name, "on", outcome_name, ":", conditionMessage(e), "\n")
NULL
})
# Step 4: Wald Ratio Tests
wald_ratios <- clumped_dat %>%
mutate(
wald_beta = beta.outcome / beta.exposure,
wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
method = paste("Wald Ratio:", SNP)
) %>%
dplyr::select(SNP, wald_beta, wald_se, pval, method)
cat("\n=== Wald Ratio Tests for", exposure_name, "on", outcome_name, "===\n")
print(wald_ratios)
# Step 5: Prepare Data for Forest Plot
ivw_df <- if (!is.null(ivw_result)) ivw_result %>% mutate(method = "IVW") else NULL
egger_df <- if (!is.null(egger_result)) egger_result %>% mutate(method = "MR-Egger") else NULL
weighted_median_df <- if (!is.null(weighted_median_result)) weighted_median_result %>% mutate(method = "Weighted Median") else NULL
lasso_df <- if (!is.null(lasso_result)) {
data.frame(method = "MR-Lasso", b = lasso_result@Estimate,
se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
NULL
}
conmix_se <- if (!is.null(conmix_result)) {
(conmix_result@CIUpper - conmix_result@CILower) / (2 * 1.96)
} else {
NA
}
conmix_df <- if (!is.null(conmix_result)) {
data.frame(method = "MR-ConMix", b = conmix_result@Estimate,
se = conmix_se, pval = conmix_result@Pvalue)
} else {
NULL
}
mr_results <- bind_rows(
ivw_df,
egger_df,
weighted_median_df,
lasso_df,
conmix_df,
wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
mutate(method = as.factor(method)) %>%
filter(!is.na(method) & method != "NA") %>%
arrange(b) %>%
mutate(method = factor(method, levels = unique(method)))
# Step 6: Create Forest Plot
base_colors <- c(
"IVW" = "#1F78B4",
"MR-Egger" = "#FF7F00",
"Weighted Median" = "#33A02C",
"MR-Lasso" = "#FB9A99",
"MR-ConMix" = "#E41A1C"
)
wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
wald_colors <- hue_pal()(n_wald)
names(wald_colors) <- wald_methods
} else {
wald_colors <- NULL
}
color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)
forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
geom_point(size = 3) +
geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
labs(
title = paste0("MR Estimates:\n", exposure_name, " (1MB around ACY1 TSS) on ", outcome_name),
x = "Causal Effect (Beta)",
y = ""
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.y = element_text(size = 12, face = "bold"),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
panel.grid.major = element_line(color = "grey90"),
panel.grid.minor = element_blank(),
legend.position = "none",
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
) +
scale_color_manual(values = color_list) +
xlim(x_min - 0.02, x_max + 0.02)
# Step 7: Create Forest Plot Ordered by Method
desired_order <- c(
"IVW",
"MR-Egger",
"MR-Lasso",
"MR-ConMix",
"Weighted Median",
unique(wald_ratios$method)
)
desired_order <- desired_order[desired_order %in% unique(mr_results$method)]
mr_results$method <- factor(mr_results$method, levels = desired_order)
mr_results <- mr_results[order(mr_results$method), ]
forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se),
height = 0.2, na.rm = TRUE, linewidth = 0.8) +
geom_point(size = 3) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
labs(
title = paste0("MR Estimates:\n", exposure_name, " (1MB around ACY1 TSS) on ", outcome_name),
x = "Causal Effect (Beta)",
y = ""
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.y = element_text(size = 12, face = "bold"),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
panel.grid.major = element_line(color = "grey90"),
panel.grid.minor = element_blank(),
legend.position = "none",
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
) +
scale_color_manual(values = color_list) +
xlim(x_min - 0.02, x_max + 0.02)
# Save plots
plot_file <- paste0(results_dir, "MR_Forestplot_", exposure_name, "_", outcome_name, "_", today, ".png")
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_", exposure_name, "_", outcome_name, "_By_Method_", today, ".png")
ggsave(plot_file_by_method, forest_plot_by_method, dpi = 300, width = 8, height = 5)
cat("Forest plot by method saved to:", plot_file_by_method, "\n")
# Step 8: Save Results in Excel
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")
add_formatted_sheet <- function(wb, sheet_name, data, title) {
addWorksheet(wb, sheet_name)
if (is.null(data) || nrow(data) == 0 || ncol(data) == 0) {
writeData(wb, sheet_name, paste("No data available for", sheet_name), startRow = 1, startCol = 1)
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
} else {
writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
setRowHeights(wb, sheet_name, rows = 1, heights = 20)
setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}
}
toc_data <- data.frame(
Sheet = c("IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix",
"Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
"Instruments", "Clumped_Data"),
Title = c("Inverse Variance Weighted (IVW) Results", "MR-Egger Results",
"Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
"Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results",
"Pleiotropy Test Results", "Leave-One-Out Analysis Results",
"Filtered Instruments Data", "Clumped SNPs Data")
)
toc_data <- toc_data[complete.cases(toc_data), ]
addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
lasso_df <- data.frame(
Exposure = lasso_result@Exposure,
Outcome = lasso_result@Outcome,
Estimate = lasso_result@Estimate,
StdError = lasso_result@StdError,
CILower = lasso_result@CILower,
CIUpper = lasso_result@CIUpper,
Pvalue = lasso_result@Pvalue,
SNPs = lasso_result@SNPs,
Valid = lasso_result@Valid,
ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
RegEstimate = lasso_result@RegEstimate,
RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
Lambda = lasso_result@Lambda
)
add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
addWorksheet(wb, "MR-Lasso")
writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}
if (!is.null(conmix_result)) {
conmix_df <- data.frame(
Exposure = conmix_result@Exposure,
Outcome = conmix_result@Outcome,
Estimate = conmix_result@Estimate,
Pvalue = conmix_result@Pvalue,
SNPs = conmix_result@SNPs,
Psi = conmix_result@Psi,
CILower = conmix_result@CILower,
CIUpper = conmix_result@CIUpper,
CIRange = paste(conmix_result@CIRange, collapse = ", "),
CIMin = conmix_result@CIMin,
CIMax = conmix_result@CIMax,
CIStep = conmix_result@CIStep,
Valid = paste(conmix_result@Valid, collapse = ", "),
ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
Alpha = conmix_result@Alpha
)
add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
addWorksheet(wb, "MR-ConMix")
writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments_subset, "Filtered Instruments Data")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat_subset, "Clumped SNPs Data")
saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")
}
}
# Load required packages
library(coloc)
library(data.table)
library(stringr)
# Set working directory
setwd("/Users/charleenadams/mr_acetylaminos_bmi/results")
# Find all filtered CSV files recursively
filtered_files <- list.files(path = ".", pattern = "^filtered_.*\\.csv$", recursive = TRUE, full.names = TRUE)
# Today's date
today_date <- format(Sys.Date(), "%Y-%m-%d")
# Create output directory if needed
dir.create("coloc_results", showWarnings = FALSE)
# Mapping of known case/control numbers
case_control_info <- list(
"PureHypercholesterolaemia" = list(ncase = 22622, ncontrol = 440388),
"Type2DiabetesNoComplications" = list(ncase = 12045, ncontrol = 450965),
"EssentialHypertension" = list(ncase = 54358, ncontrol = 408652),
"Obesity" = list(ncase = 4688, ncontrol = 458322),
"NonAlcoholicFattyLiverDisease" = list(ncase = 8434, ncontrol = 770180, n_total = 778614),
"Type2Diabetes" = list(ncase = 61714, ncontrol = 593952)
)
# Start looping through each file
for (file in filtered_files) {
cat("Running coloc for:", file, "\n")
# Read the data
dat <- fread(file)
# ❗ Remove duplicated SNPs (keep first occurrence only)
dat <- dat[!duplicated(dat$SNP), ]
# Build clean exposure_outcome name
clean_name <- gsub("^.*/filtered_|\\.csv$", "", file)
# Split the name by "_"
split_name <- strsplit(clean_name, "_")[[1]]
# Outcome is second-to-last element before the date
outcome_name <- split_name[length(split_name) - 1]
# Decide outcome type
if (tolower(outcome_name) == "bmi") {
outcome_type <- "quant"
s_outcome <- NULL
} else {
outcome_type <- "cc"
if (outcome_name %in% names(case_control_info)) {
info <- case_control_info[[outcome_name]]
if (!is.na(info$ncase) & !is.na(info$ncontrol)) {
s_outcome <- info$ncase / (info$ncase + info$ncontrol)
} else if (!is.na(info$n_total)) {
s_outcome <- 0.05 # fallback for NAFLD if needed
} else {
stop(paste0("No valid case/control info for outcome: ", outcome_name))
}
} else {
stop(paste0("Outcome not recognized in case_control_info: ", outcome_name))
}
}
# Build exposure dataset
dataset1 <- list(
beta = dat$beta.exposure,
varbeta = (dat$se.exposure)^2,
snp = dat$SNP,
MAF = dat$eaf.exposure,
type = "quant",
N = unique(dat$samplesize.exposure)
)
# Build outcome dataset
dataset2 <- list(
beta = dat$beta.outcome,
varbeta = (dat$se.outcome)^2,
snp = dat$SNP,
MAF = dat$eaf.outcome,
type = outcome_type,
N = unique(dat$samplesize.outcome)
)
if (!is.null(s_outcome)) {
dataset2$s <- s_outcome
}
# Run coloc.abf
coloc_res <- coloc.abf(dataset1, dataset2)
# Save coloc result
output_file <- paste0("coloc_results/coloc_", clean_name, "_", today_date, ".rds")
saveRDS(coloc_res, file = output_file)
}
# Compile the results
# Find all coloc result .rds files
coloc_files <- list.files(path = "coloc_results", pattern = "^coloc_.*\\.rds$", full.names = TRUE)
# Initialize empty list to store results
coloc_summary_list <- list()
# Loop through each coloc result file
for (file in coloc_files) {
cat("Collecting:", file, "\n")
# Read coloc result
res <- readRDS(file)
# Extract posterior probabilities
pp <- res$summary
# Extract exposure and outcome properly from filename
clean_name <- gsub("^coloc_|\\.rds$", "", basename(file))
parts <- unlist(strsplit(clean_name, "_"))
# parts[1] = ACY1
# parts[2] = 1Mb
# parts[3] = exposure (e.g., N-acetylalanine)
# parts[4] = outcome (e.g., BMI)
# parts[5] = date (ignore)
if (length(parts) < 5) {
stop(paste("Filename", file, "does not have expected structure!"))
}
exposure <- parts[3]
outcome <- parts[4]
# Assemble into a row
coloc_summary_list[[file]] <- data.frame(
exposure = exposure,
outcome = outcome,
PP.H0 = pp["PP.H0.abf"],
PP.H1 = pp["PP.H1.abf"],
PP.H2 = pp["PP.H2.abf"],
PP.H3 = pp["PP.H3.abf"],
PP.H4 = pp["PP.H4.abf"]
)
}
# Combine all into one data frame
coloc_summary_df <- rbindlist(coloc_summary_list)
# Save to CSV
fwrite(coloc_summary_df, file = "coloc_summary_table_2025-04-27.csv")
cat("✅ Coloc summary table saved as coloc_summary_table_2025-04-27.csv\n")
The results are shared through a Shiny app deployed at https://yodamendel.shinyapps.io/mr_acetylaminos_app/.
# Run with:
# shiny::runApp("/Users/charleenadams/mr_acetylaminos_bmi/mr_acetylaminos_app")
# rsconnect::deployApp('/Users/charleenadams/mr_acetylaminos_bmi/mr_acetylaminos_app')
library(shiny)
# Set CRAN repository to avoid mrcieu.r-universe.dev issues
options(repos = c(CRAN = "https://cloud.r-project.org"))
# Define file paths for results, plots, and colocalization summary
base_path <- file.path("data", "results")
exposures <- c("N-acetylmethionine", "N-acetylalanine", "N-acetylisoleucine", "N-acetylglutamate", "N-acetylserine")
outcomes <- c("BMI", "PureHypercholesterolaemia", "Type2DiabetesNoComplications", "EssentialHypertension",
"Obesity", "NonAlcoholicFattyLiverDisease", "Type2Diabetes")
file_date <- "2025-04-27" # Fixed date matching the result files
coloc_file <- file.path("data", "coloc_summary_table_2025-04-27 copy.csv") # Path to colocalization summary
# Generate file paths dynamically for MR results and plots
file_paths <- list()
plot_paths <- list()
for (exp in exposures) {
for (out in outcomes) {
folder <- file.path(base_path, paste0(exp, "_", out))
key <- paste(exp, out, sep = "_")
file_paths[[key]] <- file.path(folder, paste0("MR_Results_", exp, "_", out, "_", file_date, ".xlsx"))
plot_paths[[key]] <- file.path(folder, paste0("MR_Forestplot_", exp, "_", out, "_By_Method_", file_date, ".png"))
}
}
# UI
ui <- fluidPage(
tags$head(
tags$style(HTML("
#logo_container {
text-align: center;
margin-top: 20px;
margin-bottom: 20px;
opacity: 0;
transition: opacity 0.5s ease-in-out;
}
#logo_container.visible {
opacity: 1;
}
.image-grid {
display: flex;
flex-wrap: wrap;
justify-content: center;
gap: 20px;
margin-top: 20px;
}
.image-grid img {
width: 200px;
height: 200px;
object-fit: contain;
}
.error-message {
color: red;
font-size: 16px;
text-align: center;
}
")),
tags$script(HTML("
$(document).on('shiny:connected', function() {
var acy1Image = $('.image-grid');
var logoContainer = $('#logo_container');
$(window).on('scroll', function() {
var scrollPosition = $(window).scrollTop() + $(window).height();
var acy1Bottom = acy1Image.offset().top + acy1Image.outerHeight();
if (scrollPosition > acy1Bottom) {
logoContainer.addClass('visible');
} else {
logoContainer.removeClass('visible');
}
});
});
"))
),
div(style = "text-align: center; margin-bottom: 20px;",
h1("Mendelian Randomization and Colocalization Results", style = "color: #A94800;"),
h4(HTML("Analysis of Five Acetyl-Amino Acids (1MB around <i>ACY1</i> TSS) on Seven Metabolic Outcomes"), style = "color: #A94800;"),
tags$a(href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
target = "_blank",
"Our Lab",
style = "font-size: 18px; color: #A94800; text-decoration: none; font-weight: bold;")
),
div(style = "background-color: #FFF3E0; padding: 20px; border-radius: 10px; margin-bottom: 20px; border: 1px solid #D2691E;",
h3(HTML("The Story of Acetyl-Amino Acids and <i>ACY1</i>"), style = "color: #D2691E; text-align: center; margin-bottom: 20px;"),
p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
HTML("Amino acids are the building blocks of life, but their acetylated forms—like <i>N</i>-acetylmethionine, <i>N</i>-acetylalanine, <i>N</i>-acetylisoleucine, <i>N</i>-acetylglutamate, and <i>N</i>-acetylserine—might hold secrets to metabolic health. These metabolites, regulated by the enzyme encoded by the <i>ACY1</i> gene, are more than just biochemical byproducts. Emerging evidence suggests they could influence conditions like obesity, diabetes, and liver disease.")),
p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
HTML("<i>ACY1</i>, or aminoacylase-1, is an enzyme that breaks down <i>N</i>-acetylated amino acids in the kidneys and other tissues. Genetic variants near its transcription start site (TSS) can alter its activity, potentially affecting circulating levels of these metabolites. But do these acetyl-amino acids actually cause changes in metabolic outcomes, or are they just along for the ride? That’s where Mendelian Randomization (MR) and colocalization come in.")),
p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
HTML("We used MR to test causality between five acetyl-amino acids and seven metabolic outcomes—BMI, pure hypercholesterolemia, type 2 diabetes (with and without complications), essential hypertension, obesity, and non-alcoholic fatty liver disease—forming 35 exposure-outcome pairs. Our exposure data came from the METSIM cohort (~6,000 Finnish men), and outcome data were sourced from large-scale GWAS (up to 778,614 participants).")),
p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
HTML("The MR pipeline employed five methods—IVW, MR-Egger, Weighted Median, MR-Lasso, and ConMix—with a relaxed p-value threshold (5E-4) and 500 kb clumping for independent SNPs. Sensitivity analyses checked for heterogeneity and pleiotropy. Additionally, we performed colocalization analysis on all 35 pairs to assess whether the genetic signals for exposures and outcomes share a common causal variant, strengthening causal inference.")),
p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
HTML("The results? 35 MR analyses showed hints of causality for some pairs, though not all pairs could be tested and most were null. Colocalization was null for all, though. Download the MR results, forest plots, or colocalization summary below to explore the full story. (If we want to use any figures, I'll regenerate them to remove or fix the titles.)"))
),
sidebarLayout(
sidebarPanel(
width = 3,
style = "background-color: #FFE8D6; padding: 15px; border-radius: 5px;",
h3("Instructions", style = "color: #A94800;"),
p("Select an acetyl-amino acid and metabolic outcome to download MR results (Excel) or forest plots (PNG). Download the colocalization summary (CSV) for all 35 pairs.", style = "color: #A94800;"),
hr(),
h3("Download Results", style = "color: #A94800;"),
selectInput("exposure", "Select Acetyl-Amino Acid:",
choices = exposures, selected = exposures[1]),
selectInput("outcome", "Select Metabolic Outcome:",
choices = outcomes, selected = outcomes[1]),
uiOutput("download_results_ui"),
br(),
uiOutput("download_plot_ui"),
br(),
downloadButton("download_coloc", "Colocalization Summary"),
hr(),
div(style = "border: 1px solid #D2691E; padding: 10px; background-color: #D2691E; border-radius: 5px;",
strong("Objective", style = "color: #FFFFFF;"),
p(HTML("This app shares MR and colocalization analyses of five acetyl-amino acids (1MB around <i>ACY1</i> TSS) on seven metabolic outcomes."),
style = "color: #FFFFFF;")
)
),
mainPanel(
width = 9,
div(style = "text-align: center; margin-top: 20px;",
div(style = "margin-bottom: 10px;",
p(HTML("Tap any acetyl-amino acid image to see methods and pipeline"),
style = "color: #A94800; font-size: 24px; font-weight: bold; text-decoration: none;")
),
div(class = "image-grid",
tags$a(href = "https://rpubs.com/YodaMendel/1303448", target = "_blank",
img(id = "acy1_image", src = "image.png", alt = "N-acetylmethionine")),
tags$a(href = "https://rpubs.com/YodaMendel/1303448", target = "_blank",
img(src = "image2.png", alt = "N-acetylalanine")),
tags$a(href = "https://rpubs.com/YodaMendel/1303448", target = "_blank",
img(src = "image3.png", alt = "N-acetylisoleucine")),
tags$a(href = "https://rpubs.com/YodaMendel/1303448", target = "_blank",
img(src = "image4.png", alt = "N-acetylglutamate")),
tags$a(href = "https://rpubs.com/YodaMendel/1303448", target = "_blank",
img(src = "image5.png", alt = "N-acetylserine"))
)
),
div(id = "logo_container",
tags$a(href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
target = "_blank",
img(src = "BIDMC_HMS_Stacked-LockUp.png", height = "100px", width = "auto"))
)
)
)
)
# Server
server <- function(input, output, session) {
# Reactive file path based on user selection
selected_file <- reactive({
file_paths[[paste(input$exposure, input$outcome, sep = "_")]]
})
selected_plot <- reactive({
plot_paths[[paste(input$exposure, input$outcome, sep = "_")]]
})
# Dynamic UI for results download
output$download_results_ui <- renderUI({
file_path <- selected_file()
if (file.exists(file_path)) {
downloadButton("download_results", "Download MR Results")
} else {
div(class = "error-message",
paste("Results file not available for", input$exposure, "and", input$outcome))
}
})
output$download_results <- downloadHandler(
filename = function() {
paste0("MR_Results_", input$exposure, "_", input$outcome, "_", file_date, ".xlsx")
},
content = function(file) {
tryCatch({
file.copy(selected_file(), file)
cat("Downloaded results file:", selected_file(), "\n")
}, error = function(e) {
cat("Error downloading results file:", selected_file(), "\nError:", conditionMessage(e), "\n")
stop(e)
})
}
)
# Dynamic UI for plot download
output$download_plot_ui <- renderUI({
plot_path <- selected_plot()
if (file.exists(plot_path)) {
downloadButton("download_plot", "Download Forest Plot")
} else {
div(class = "error-message",
paste("Forest plot not available for", input$exposure, "and", input$outcome))
}
})
output$download_plot <- downloadHandler(
filename = function() {
paste0("MR_Forestplot_", input$exposure, "_", input$outcome, "_By_Method_", file_date, ".png")
},
content = function(file) {
tryCatch({
file.copy(selected_plot(), file)
cat("Downloaded plot file:", selected_plot(), "\n")
}, error = function(e) {
cat("Error downloading plot file:", selected_plot(), "\nError:", conditionMessage(e), "\n")
stop(e)
})
}
)
# Download handler for colocalization summary
output$download_coloc <- downloadHandler(
filename = "coloc_summary_table_2025-04-27.csv",
content = function(file) {
tryCatch({
file.copy(coloc_file, file)
cat("Downloaded colocalization file:", coloc_file, "\n")
}, error = function(e) {
cat("Error downloading colocalization file:", coloc_file, "\nError:", conditionMessage(e), "\n")
stop(e)
})
},
contentType = "text/csv"
)
}
# Run app
shinyApp(ui = ui, server = server)