Here we perform a Mendelian randomization (MR) of circulating ACY1 levels on body mass index (BMI) in participants in the UK Biobank (UKB). Our genetic instruments for ACY1 come from Europeans (“discovery” data🚩; flagged for a reason) enrolled in the UK Biobank Pharma Proteomics Project (UKB-PPP)[1], and our BMI summary statistics come from Jurgens et al. (2023)[2]. We do so first using three biologically informed SNPs as instruments with the inverse-variance weighted (IVW) method, after investigating these SNPs for independence with Plink2 and LDlink, considering both D’ and r2. We perform Steiger filtering to ascertain whether our hypothesized direction of effect is plausible and heterogeniety and pleiotropy tests to assess violations to the MR assumption against horizontal pleiotriopy. We follow this with several MR sensitivity analyses, including MRLap to explore the impact of sample overlap and winner’s curse (when initially reported effect sizes of significant genetic variants are overestimated) on our estimates and MR-PRESSO with an instrument that includes more SNPs.
Sun BB, Chiou J, Traylor M, Benner C, Hsu YH, Richardson TG, Surendran P, Mahajan A, Robins C, Vasquez-Grinnell SG, Hou L, Kvikstad EM, Burren OS, Davitte J, Ferber KL, Gillies CE, Hedman ÅK, Hu S, Lin T, Mikkilineni R, Pendergrass RK, Pickering C, Prins B, Baird D, Chen CY, Ward LD, Deaton AM, Welsh S, Willis CM, Lehner N, Arnold M, Wörheide MA, Suhre K, Kastenmüller G, Sethi A, Cule M, Raj A; Alnylam Human Genetics; AstraZeneca Genomics Initiative; Biogen Biobank Team; Bristol Myers Squibb; Genentech Human Genetics; GlaxoSmithKline Genomic Sciences; Pfizer Integrative Biology; Population Analytics of Janssen Data Sciences; Regeneron Genetics Center; Burkitt-Gray L, Melamud E, Black MH, Fauman EB, Howson JMM, Kang HM, McCarthy MI, Nioi P, Petrovski S, Scott RA, Smith EN, Szalma S, Waterworth DM, Mitnaul LJ, Szustakowski JD, Gibson BW, Miller MR, Whelan CD. Plasma proteomic associations with genetics and health in the UK Biobank. Nature. 2023 Oct;622(7982):329-338. doi: 10.1038/s41586-023-06592-6. Epub 2023 Oct 4. PMID: 37794186; PMCID: PMC10567551.
Jurgens SJ, Pirruccello JP, Choi SH, Morrill VN, Chaffin M, Lubitz SA, Lunetta KL, Ellinor PT. Adjusting for common variant polygenic scores improves yield in rare variant association analyses. Nat Genet. 2023 Apr;55(4):544-548. doi: 10.1038/s41588-023-01342-w. Epub 2023 Mar 23. PMID: 36959364; PMCID: PMC11078202.
From the Synapse UKB-PPP website:
The UK Biobank Pharma Proteomics Project (UKB-PPP) is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,219 UKB participants. As part of a collaborative analysis across the thirteen UKB-PPP partners, we conducted comprehensive protein quantitative trait loci (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 85% are newly discovered, in addition to ancestry-specific pQTL mapping in non-Europeans. We identify independent secondary associations in 87% of cis and 30% of trans loci, expanding the catalogue of genetic instruments for downstream analyses.
The UKB-PPP pQTL summary statistics are available via Synapse for the “discovery” UKB EUR population. To download, it requires creating a Synapse account, obtaining a personal access token, and adding files (protein and RSID mapping files and pQTL summary statistics) to a download list.
After logging into Synapse and adding desired files to the download list, these are the steps:
cd /Users/charleenadams/acy1_bmi
conda create -n synapse_env -c conda-forge mamba
conda activate synapse_env
pip install synapseclient
synapse -h
synapse get-download-list
synapse login "cd...." eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ6SjVSSzpMN1RMOlQ3TDc6M1ZYNjpKRU9VOjY0NFI6VTNJWDo1S1oyOjdaQ0s6RlBUSCIsImFsZyI6IlJTMjU2In0.eyJhY2Nlc3MiOnsic2NvcGUiOlsidmlldyIsImRvd25sb2FkIiwibW9kaWZ5Il0sIm9pZGNfY2xhaW1zIjp7fX0sInRva2VuX3R5cGUiOiJQRVJTT05BTF9BQ0NFU1NfVE9LRU4iLCJpc3MiOiJodHRwczovL3JlcG8tcHJvZC5wcm9kLnNhZ2ViYXNlLm9yZy9hdXRoL3YxIiwiYXVkIjoiMCIsIm5iZiI6MTcyOTcwNjgzOSwiaWF0IjoxNzI5NzA2ODM5LCJqdGkiOiIxMzA3MyIsInN1YiI6IjM0NzI3ODMifQ.B65yULl25FZ1vmFOXEE6RaNYOhXTXZ5ov5JLVP47nJpejRA6oMfcwOcY1z4RH7bLFQ48xzRhc1I8kSbexid8jpoKnQJlBI8mVNItvv92dZNDrNN_8cvXtHNLVb6O12jwIZ0XDuQxcLuxm0kF52IZSXWMtwBzBeyKckkNYL9LTbQKDmFRmOpwZGVjkZyT7wsiuLLjj9bTcMfFcY-IstD7xnK4O51D_UejK6vg-Eh0Uw23i4NHhMAA02-D-tZq1yV0X8yI0EIOFoLQ3AKtXgrr4sE6zDGz4KTPaqVM557Vv9Aqk2KtnfiZRSWWxWYPbJm1Ty8AYLBDv6Iq-tdAval8kw
I placed a copy of the RSID and protein mapping files at: /Users/charleenadams/acy1_bmi/maps
In terminal window, look at the rsid map for chromosome 1. It’s nice that they give us positions for both POS19 and POS38.
zless - olink_rsid_map_mac5_info03_b0_7_chr1_patched_v2.tsv.gz
ID REF ALT rsid POS19 POS38
1:10177:A:AC:imp:v1 A AC rs367896724 10177 10177
# ls *.tar | wc -l
for file in *.tar; do tar -xf "$file"; done
The rsid numbers need to be merged into the pQTL summary statistics files for MR. When we unpack the files, they produce subdirectories, though in this case, we just get one subdirectory for ACY1. So, the code below goes into subdirectories and annotates all the files in all of them with their corresponding rsid mapping files by chromosome or would if we needed it to.
#!/bin/bash
# Base directory where your data is located
base_dir="/Users/charleenadams/acy1_bmi"
# Directory where RSID map files are located
map_dir="$base_dir/maps"
# Log file to track progress
log_file="$base_dir/add_rsid_log.txt"
echo "Log started on $(date)" > "$log_file"
# Loop through each subdirectory in the base directory
for dir in "$base_dir"/*; do
if [[ -d "$dir" ]]; then
echo "Processing directory: $dir" | tee -a "$log_file"
# Loop through each .gz file in the subdirectory
for file in "$dir"/*.gz; do
# Extract chromosome number from file name
chrom=$(basename "$file" | sed -E 's/.*chr([0-9XY]+).*/\1/')
# Define the RSID map file for this chromosome
map_file="$map_dir/olink_rsid_map_mac5_info03_b0_7_chr${chrom}_patched_v2.tsv.gz"
output_file="${file%.gz}_rsid_added.gz"
echo " Processing file: $file (Chromosome $chrom)" | tee -a "$log_file"
# Check if the chromosome number was extracted and the RSID map file exists
if [[ -n "$chrom" && -f "$map_file" ]]; then
echo " Using RSID map: $map_file" | tee -a "$log_file"
# Remove existing rsid_added file to avoid overwrite prompts
rm -f "$output_file"
# Join the original file with the RSID map based on the ID column, adding GENPOS37 (POS19) and GENPOS38 (POS38) while keeping the original ID
gunzip -c "$file" | \
awk 'BEGIN {OFS="\t"}
NR==FNR {map[$1]=$4; genpos37[$1]=$5; genpos38[$1]=$6; next}
($3 in map) {print $0, map[$3], genpos37[$3], genpos38[$3]}
!($3 in map) {print $0, ".", ".", "."}' <(gunzip -c "$map_file") - | \
gzip -c > "$output_file"
# Confirm successful processing
if [[ -s "$output_file" ]]; then
echo " RSID added with GENPOS37 and GENPOS38. Output saved to $output_file" | tee -a "$log_file"
else
echo " Warning: $output_file is empty!" | tee -a "$log_file"
fi
else
echo " Error: RSID map file for chromosome $chrom not found or chromosome not identified." | tee -a "$log_file"
fi
done
fi
done
echo "Log ended on $(date)" >> "$log_file"
echo "Processing complete. Check $log_file for details."
#Run with:
chmod +x add_rsid.sh
nohup ./add_rsid.sh > add_rsid_log.txt 2>&1 &
ps -p 27745 -o stat,etime
Lots of things can go wrong in loops. Good to watch their progress.
find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec ls {} +
find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec rm {} +
#!/bin/bash
# Define base directory
base_dir="/Users/charleenadams/acy1_bmi"
# Log file for tracking
log_file="$base_dir/merge_rsid_files_log.txt"
echo "Processing started on $(date)" > "$log_file"
# Loop through directories to process, excluding specific ones
for dir in "$base_dir"/*/; do
# Skip directories you don’t want to process
if [[ "$dir" == *"maps"* || "$dir" == *"BMI_males_GIANT"* || "$dir" == *"scripts"* ]]; then
continue
fi
echo "Processing directory: $dir" | tee -a "$log_file"
# Collect all '_rsid_added.gz' files in this directory
rsid_files=("$dir"*_rsid_added.gz)
# If there are files to merge
if [[ ${#rsid_files[@]} -gt 0 ]]; then
merged_file="${dir%/}_merged_rsid_added.gz"
# Extract header from the first chromosome file
gunzip -c "${rsid_files[0]}" | head -n 1 > temp_header.txt
# Create a temporary file for content without headers
> temp_content.txt
# Append each file's content without header to the temporary content file
for file in "${rsid_files[@]}"; do
gunzip -c "$file" | tail -n +2 >> temp_content.txt
done
# Concatenate header and content into final merged file
cat temp_header.txt temp_content.txt | gzip -c > "$merged_file"
rm temp_header.txt temp_content.txt
# Confirm if the merged file was created successfully and isn’t empty
if [[ -s "$merged_file" ]]; then
echo " Merged file created: $merged_file" | tee -a "$log_file"
else
echo " Warning: Merged file is empty for directory $dir" | tee -a "$log_file"
fi
else
echo " No rsid_added.gz files found in directory $dir" | tee -a "$log_file"
fi
done
echo "Processing completed on $(date)" >> "$log_file"
echo "Check $log_file for details.""
# Run with:
chmod +x chrom_merge.sh
nohup ./chrom_merge.sh > chrom_merge_log.txt 2>&1 &
# Note: produces two log files only because I added one in the script not just to nohup.
I looked at the GIANT BMI data, but our SNPs aren’t in it, maybe because it’s an older GWAS; didn’t include as many SNPs.
We will use the Jurgens (UKB) BMI summary statistics instead but need to consider methods for sample overlap, such as MRLap.
#!/bin/bash
# Define the input VCF file and output TSV file paths
vcf_file="/Users/charleenadams/acy1_bmi/BMI_males_GIANT/ieu-a-785.vcf.gz"
output_tsv="/Users/charleenadams/acy1_bmi/BMI_males_GIANT/ieu-a-785.tsv"
# Add the header line to the output file
echo -e "CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tieu-a-785" > "$output_tsv"
# Convert VCF to TSV and append the data to the output file
bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO\t%FORMAT[\t%SAMPLE]\n' "$vcf_file" >> "$output_tsv"
echo "Conversion completed. Output saved to: $output_tsv"
library(data.table)
acy1 <- fread("/Users/charleenadams/acy1_bmi/ACY1_Q03154_OID20137_v1_Cardiometabolic_merged_rsid_added")
# If the first column contains all fields as a single string, use `tstrsplit` to separate it
acy1 <- acy1[, c("CHROM", "GENPOS", "ID", "ALLELE0", "ALLELE1", "A1FREQ", "INFO", "N",
"TEST", "BETA", "SE", "CHISQ", "LOG10P", "EXTRA") :=
tstrsplit(`CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA`,
split = " ", fixed = TRUE)][,
`CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA` := NULL]
# Define columns to be converted to numeric
numeric_columns <- c("CHROM", "GENPOS", "A1FREQ", "INFO", "N", "BETA", "SE", "CHISQ", "LOG10P")
# Convert specified columns to numeric
acy1[, (numeric_columns) := lapply(.SD, as.numeric), .SDcols = numeric_columns]
# Convert -LOG10P to p-value assuming LOG10P represents -log10(p-value)
acy1$Pval <- 10^(acy1$LOG10P * -1)
write.table(acy1,"/Users/charleenadams/acy1_bmi/acy1_formatted.txt",
row.names = FALSE, col.names = TRUE, quote = FALSE, sep = '\t')
From EUR 1000 Genomes (MAF<0.001)
#!/bin/bash
# ----------------------------------------------------------------------------------
# Script: create_bfiles_for_all_chromosomes.sh
# Description: Convert VCF.gz files for all chromosomes to PLINK binary files and merge them.
# Usage: ./create_bfiles_for_all_chromosomes.sh
# ----------------------------------------------------------------------------------
# ------------------------------
# Step 1: Define Paths
# ------------------------------
# Directory containing VCF files (one per chromosome)
VCF_DIR="/Users/charleenadams/temp_BI/LD_matrices/1000Genomes_scratch/annotated/eur"
# Output directory for PLINK binary files
OUTPUT_DIR="/Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001"
# Prefix for merged PLINK files
MERGED_PREFIX="${OUTPUT_DIR}/EUR"
# Ensure output directory exists
mkdir -p "$OUTPUT_DIR"
# ------------------------------
# Step 2: Check for Required Tools
# ------------------------------
command_exists () {
command -v "$1" >/dev/null 2>&1 ;
}
if ! command_exists plink2 ; then
echo "Error: plink2 is not installed or not in your PATH."
exit 1
fi
if ! command_exists tabix ; then
echo "Error: tabix is not installed or not in your PATH."
exit 1
fi
# ------------------------------
# Step 3: Process Each Chromosome
# ------------------------------
echo "Processing chromosomes..."
for CHR in {1..22}; do
# Define input VCF and output prefix for the current chromosome
INPUT_VCF="${VCF_DIR}/ALL.chr${CHR}.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_annotated_EUR.vcf.gz"
OUTPUT_PREFIX="${OUTPUT_DIR}/ALL.chr${CHR}.maf001_filtered_EUR"
# Check if VCF file exists
if [ ! -f "$INPUT_VCF" ]; then
echo "Warning: VCF file for chromosome $CHR not found. Skipping..."
continue
fi
# Index the VCF file if necessary
INDEX_FILE="${INPUT_VCF}.tbi"
if [ ! -f "$INDEX_FILE" ]; then
echo "Indexing VCF file for chromosome $CHR..."
tabix -p vcf "$INPUT_VCF"
fi
# Convert VCF to PLINK binary files
echo "Converting chromosome $CHR to PLINK binary format..."
plink2 \
--vcf "$INPUT_VCF" \
--keep-allele-order \
--maf 0.001 \
--mac 1 \
--make-bed \
--max-alleles 2 \
--snps-only just-acgt \
--out "$OUTPUT_PREFIX" \
--threads 16 \
--memory 64000
if [ $? -ne 0 ]; then
echo "Error: Failed to process chromosome $CHR. Skipping..."
continue
fi
echo "Chromosome $CHR processed successfully: $OUTPUT_PREFIX"
done
# ------------------------------
# Step 4: Merge All Chromosomes
# ------------------------------
echo "Merging all chromosomes into a single PLINK dataset..."
# Create a merge list
MERGE_LIST="${OUTPUT_DIR}/merge_list.txt"
ls "${OUTPUT_DIR}/ALL.chr"*.maf001_filtered_EUR.bed | sed 's/.bed//' > "$MERGE_LIST"
# Use the `.fam` file from the first chromosome as the reference for merging
FIRST_FAM="${OUTPUT_DIR}/ALL.chr1.maf001_filtered_EUR.fam"
if [ ! -f "$FIRST_FAM" ]; then
echo "Error: .fam file for chr1 not found. Cannot merge chromosomes."
exit 1
fi
# Merge using PLINK2
plink2 \
--pmerge-list "$MERGE_LIST" \
--fam "$FIRST_FAM" \
--make-bed \
--out "$MERGED_PREFIX" \
--threads 16 \
--memory 64000
if [ $? -ne 0 ]; then
echo "Error: Merging chromosomes failed."
exit 1
fi
echo "All chromosomes merged successfully: ${MERGED_PREFIX}.bed, ${MERGED_PREFIX}.bim, ${MERGED_PREFIX}.fam"
echo "Done."
# Run with:
nohup ./create_bfiles_from_vcf_MAF001.sh > create_bfiles_from_vcf_MAF001.log.txt 2>&1 &
cat ALL.chr{1..22}.maf001_filtered_EUR.bim > EUR.bim
cp ALL.chr1.maf001_filtered_EUR.fam EUR.fam
library(data.table)
# Load PLINK .bim file
plink_bim2 <- fread("/Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR.bim",header = FALSE, col.names = c("CHR", "SNP", "CM", "BP", "ALT_PLINK", "REF_PLINK"))
cat("PLINK .bim file contains", nrow(plink_bim2), "variants.\n")
# Check for duplicate SNPs
duplicates <- plink_bim2[duplicated(plink_bim2$SNP), ]
dim(duplicates)
# Count the total number of duplicates
num_duplicates <- sum(duplicated(plink_bim2$SNP))
cat("Total duplicate SNPs:", num_duplicates, "\n")
# Remove SNPs with missing IDs
plink_bim2_clean <- plink_bim2[plink_bim2$SNP != ".", ]
cat("Filtered PLINK .bim file contains", nrow(plink_bim2_clean), "variants.\n")
# Merge summary statistics with PLINK .bim file on rsid
exp_dat <- fread("/Users/charleenadams/acy1_bmi/acy1_formatted.txt")
harmonized_data <- merge(exp_dat, plink_bim2_clean, by.x = "rsid", by.y = "SNP", all.x = TRUE)
# Check allele alignment
harmonized_data[, flip := (ALLELE1 == REF_PLINK & ALLELE0 == ALT_PLINK)]
# Flip effect sizes where necessary
harmonized_data[flip == TRUE, `:=`(
ALLELE1 = ALT_PLINK,
ALLELE0 = REF_PLINK,
BETA = -BETA # Flip the effect size
)]
# Remove SNPs that cannot be matched
harmonized_data <- harmonized_data[!is.na(REF_PLINK) & !is.na(ALT_PLINK)]
# Save the harmonized summary statistics for ld_clump()
fwrite(harmonized_data, "/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt", sep = "\t")
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
#agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]
# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]
# Prepare exposure data (instrument)
exp_dat <- data.frame(
SNP = instrument$rsid,
beta.exposure = instrument$BETA,
se.exposure = instrument$SE,
eaf.exposure = instrument$A1FREQ,
effect_allele.exposure = instrument$ALLELE1,
other_allele.exposure = instrument$ALLELE0,
pval.exposure = instrument$Pval,
samplesize.exposure = instrument$N,
chr.exposure = instrument$CHROM,
pos.exposure = instrument$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1 exposure"
exp_dat$id.exposure <- "ACY1 exposure"
# LD clump: install these first, though we didn't use this due to our need for lower frequencies alleles
# devtools::install_github("explodecomputer/genetics.binaRies")
# genetics.binaRies::get_plink_binary()
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid
# Create the summary stats file
exp_dat <- as.data.table(exp_dat)
summary_stats <- exp_dat[, .(SNP = rsid, P = pval)]
# Write to a text file
write.table(
summary_stats,
file = "/Users/charleenadams/acy1_bmi/summary_stats.txt",
quote = FALSE,
row.names = FALSE,
col.names = TRUE)
plink2 \
--bfile /Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR \
--clump summary_stats.txt \
--clump-p1 5e-8 \
--clump-r2 0.001 \
--clump-kb 250 \
The results suggest they are. But is this due to the low frequencies of the alleles? Combination of rare variants, so low statistical power to detect LD?
clumping_res <- fread("/Users/charleenadams/acy1_bmi/clumped_results_stringent.clumps")
datatable(clumping_res,
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Clumping Results from Plink2')
I was sceptical that our SNPs are independent. This section explains the pairwise LD between the three SNPs (rs34017492, rs121912698, rs121912701) using LDlink.
Color Coding
Interpretation
Conclusion
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]
# Prepare exposure data (instrument)
exp_dat <- data.frame(
SNP = instrument$rsid,
beta.exposure = instrument$BETA,
se.exposure = instrument$SE,
eaf.exposure = instrument$A1FREQ,
effect_allele.exposure = instrument$ALLELE1,
other_allele.exposure = instrument$ALLELE0,
pval.exposure = instrument$Pval,
samplesize.exposure = instrument$N,
chr.exposure = instrument$CHROM,
pos.exposure = instrument$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
SNP = bmi$SNP,
beta.outcome = bmi$BETA,
se.outcome = bmi$SE,
eaf.outcome = bmi$A1FREQ,
effect_allele.outcome = bmi$ALLELE1,
other_allele.outcome = bmi$ALLELE0,
pval.outcome = bmi$P,
samplesize.outcome = 460000,
chr.outcome = bmi$CHR,
pos.outcome = bmi$BP)
# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"
# Harmonize datasets for MR
harmonized_data <- harmonise_data(
exposure_dat = exp_dat,
outcome_dat = out_dat)
write.csv(harmonized_data,"/Users/charleenadams/acy1_bmi/harmonized_data.csv")
In this section, we perform Steiger filtering to ascertain whether our instrumental SNPs explain more of the variance in in ACY1 than BMI. This is screen for whether we’d gotten the hypothesized direction of effect correct (we did; more of the variance in ACY1 than BMI is explained by instrumental SNPs). Then we explore whether there is heterogeneity in the effects of our instruments (a sign of possible pleiotropy) and whether there is evidence of horizontal pleiotropy in the IVW using the MR-Egger intercept test.
harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Run Steiger filtering
dat.steiger <- steiger_filtering(harmonized_data)
# Round numerical values in the 'res' dataframe to desired decimal places
dat.steiger_rounded <- dat.steiger
num_cols <- sapply(dat.steiger_rounded, is.numeric) # Identify numeric columns
dat.steiger_rounded[, num_cols] <- lapply(dat.steiger_rounded[, num_cols], round, digits = 3)
col.list <- c("rsq.exposure", "rsq.outcome","steiger_dir","steiger_pval","SNP","effect_allele.exposure", "other_allele.exposure", "effect_allele.outcome","other_allele.outcome", "beta.exposure","beta.outcome","eaf.exposure", "eaf.outcome","se.exposure", "se.outcome","pval.exposure","pval.outcome")
steiger.list <- dat.steiger_rounded[,col.list]
datatable(steiger.list,
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Harmonized Data for MR with Steiger Info')
Results indicate no evidence for heterogeneity.
library(DT)
harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Heterogeneity
het <- mr_heterogeneity(dat.steiger)
write.csv(dat.steiger, "/Users/charleenadams/acy1_bmi/dat.steiger.csv")
# Round numerical values in the 'res' dataframe to desired decimal places
het_rounded <- het
num_cols <- sapply(het_rounded, is.numeric) # Identify numeric columns
het_rounded[, num_cols] <- lapply(het_rounded[, num_cols], round, digits = 3)
het_rounded$id.exposure=NULL
het_rounded$id.outcome=NULL
datatable(het_rounded,
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Heterogeneity Test Results')
Results indicate no evidence for unbalanced pleiotropy in the IVW using the MR-Egger intercept test.
library(DT)
harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Pleiotropy
plt <- mr_pleiotropy_test(dat.steiger)
# Round numerical values in the 'res' dataframe to desired decimal places
plt_rounded <- plt
num_cols <- sapply(plt_rounded, is.numeric) # Identify numeric columns
plt_rounded[, num_cols] <- lapply(plt_rounded[, num_cols], round, digits = 3)
plt_rounded$id.exposure=NULL
plt_rounded$id.outcome=NULL
# Create the datatable with centered alignment
datatable(plt_rounded,
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Pleiotropy Test Results')
IVW and weighted median MR tests provide evidence of causality. However, we need to be cautious due to sample overlap; it can inflate the effect estimates.
outcome | exposure | method | nsnp | b | se | pval | |
---|---|---|---|---|---|---|---|
3 | BMI | ACY1 | Inverse variance weighted | 3 | 0.021 | 0.008 | 0.014 |
1 | BMI | ACY1 | MR Egger | 3 | 0.011 | 0.016 | 0.627 |
2 | BMI | ACY1 | Weighted median | 3 | 0.019 | 0.009 | 0.025 |
5 | BMI | ACY1 | Weighted mode | 3 | 0.019 | 0.009 | 0.168 |
Choose a wider swathe of SNPs…
# Load the required library
library(data.table)
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]
# Prepare exposure data (instrument)
exp_dat <- data.frame(
SNP = agnostic_instruments$rsid,
beta.exposure = agnostic_instruments$BETA,
se.exposure = agnostic_instruments$SE,
eaf.exposure = agnostic_instruments$A1FREQ,
effect_allele.exposure = agnostic_instruments$ALLELE1,
other_allele.exposure = agnostic_instruments$ALLELE0,
pval.exposure = agnostic_instruments$Pval,
samplesize.exposure = agnostic_instruments$N,
chr.exposure = agnostic_instruments$CHROM,
pos.exposure = agnostic_instruments$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid
# Create the summary stats file
exp_dat <- as.data.table(exp_dat)
summary_stats <- exp_dat[, .(SNP = rsid, P = pval)]
# Write to a text file
write.table(
summary_stats,
file = "/Users/charleenadams/acy1_bmi/summary_stats_large.txt",
quote = FALSE,
row.names = FALSE,
col.names = TRUE)
plink2 \
--bfile /Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR \
--clump summary_stats_large.txt \
--clump-p1 5e-8 \
--clump-r2 0.001 \
--clump-kb 250 \
--out clumped_results_large
Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) is a widely used method in Mendelian Randomization (MR) analysis that provides several key advantages:
library(data.table)
library(MRPRESSO)
library(kableExtra)
library(ggplot2)
library(dplyr)
# Step 1: Load the dataset
clumping_results <- fread("/Users/charleenadams/acy1_bmi/clumped_results_large.clumps")
#print(colnames(clumping_results))
# Step 1: Rename the `#CHROM` column to `CHROM`
setnames(clumping_results, old = "#CHROM", new = "CHROM")
# Filter rows where SP2 is "."
independent_snps <- clumping_results[SP2 == ".", .(CHROM, POS, ID, P)]
# Save the independent SNPs to a new file
fwrite(
independent_snps,
file = "independent_snps_for_MR.txt",
sep = "\t",
col.names = TRUE
)
#cat("Independent SNPs have been extracted and saved to 'independent_snps_for_MR.txt'.")
# Read in the ACY1 data
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]
agnostic_instruments <- agnostic_instruments[which(agnostic_instruments$rsid%in%independent_snps$ID),]
# Prepare exposure data (instrument)
exp_dat <- data.frame(
SNP = agnostic_instruments$rsid,
beta.exposure = agnostic_instruments$BETA,
se.exposure = agnostic_instruments$SE,
eaf.exposure = agnostic_instruments$A1FREQ,
effect_allele.exposure = agnostic_instruments$ALLELE1,
other_allele.exposure = agnostic_instruments$ALLELE0,
pval.exposure = agnostic_instruments$Pval,
samplesize.exposure = agnostic_instruments$N,
chr.exposure = agnostic_instruments$CHROM,
pos.exposure = agnostic_instruments$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"
# Columns needed by ld_clump(), though we clumped directly with Plink
# Kept for legacy:
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid
# Read in BMI data
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
SNP = bmi$SNP,
beta.outcome = bmi$BETA,
se.outcome = bmi$SE,
eaf.outcome = bmi$A1FREQ,
effect_allele.outcome = bmi$ALLELE1,
other_allele.outcome = bmi$ALLELE0,
pval.outcome = bmi$P,
samplesize.outcome = 460000,
chr.outcome = bmi$CHR,
pos.outcome = bmi$BP)
# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"
# Harmonize datasets for MR
harmonized_data <- harmonise_data(
exposure_dat = exp_dat,
outcome_dat = out_dat)
dat_indep_steiger <- steiger_filtering(harmonized_data)
write.csv(dat_indep_steiger,"/Users/charleenadams/acy1_bmi/harmonized_data_steiger_mr.csv")
mr_expanded_plt <- mr_pleiotropy_test(dat_indep_steiger)
mr_expanded_plt
## id.exposure id.outcome outcome exposure egger_intercept se pval
## 1 ACY1 BMI BMI ACY1 0.001501008 0.002049837 0.4760925
mr_expanded_het <- mr_heterogeneity(dat_indep_steiger)
mr_expanded_het
## id.exposure id.outcome outcome exposure method Q
## 1 ACY1 BMI BMI ACY1 MR Egger 11.24004
## 2 ACY1 BMI BMI ACY1 Inverse variance weighted 11.77624
## Q_df Q_pval
## 1 14 0.6670847
## 2 15 0.6958811
mr_expanded_res <- mr(dat_indep_steiger)
#mr_expanded_res
# Round numerical values in the 'res' dataframe to desired decimal places
res_rounded <- mr_expanded_res
num_cols <- sapply(res_rounded, is.numeric) # Identify numeric columns
res_rounded[, num_cols] <- lapply(res_rounded[, num_cols], round, digits = 3)
res_rounded$id.exposure=NULL
res_rounded$id.outcome=NULL
# Filter out rows where the method is "Simple mode"
res_rounded <- res_rounded[res_rounded$method != "Simple mode", ]
res_rounded <- res_rounded[order(res_rounded$method),]
# Generate the table
kable(
res_rounded,
format = "html",
caption = "MR Expanded Results for ACY1 on BMI",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center", bootstrap_options = c("striped", "hover")) %>%
column_spec(1:7, width = "10em") # Adjust column width as necessary
outcome | exposure | method | nsnp | b | se | pval | |
---|---|---|---|---|---|---|---|
3 | BMI | ACY1 | Inverse variance weighted | 16 | 0.018 | 0.009 | 0.051 |
1 | BMI | ACY1 | MR Egger | 16 | 0.013 | 0.011 | 0.251 |
2 | BMI | ACY1 | Weighted median | 16 | 0.015 | 0.013 | 0.267 |
5 | BMI | ACY1 | Weighted mode | 16 | 0.008 | 0.016 | 0.609 |
# Prepare the data for plotting
res_plot <- res_rounded %>%
mutate(
lower = b - 1.96 * se, # Lower bound of the confidence interval
upper = b + 1.96 * se, # Upper bound of the confidence interval
method = factor(method, levels = unique(method)) # Ensure consistent ordering of methods
)
# Create the forest plot
ggplot(res_plot, aes(x = method, y = b, ymin = lower, ymax = upper, color = method)) +
geom_pointrange(size = 0.8) + # Points with error bars
geom_hline(yintercept = 0, linetype = "dashed", color = "grey") + # Add a dashed line at 0
coord_flip() + # Flip coordinates for horizontal orientation
labs(
title = "Forest Plot of Expanded MR Results for ACY1 on BMI",
x = "Method",
y = "Effect Size (Beta)",
color = "Method"
) +
theme_minimal(base_size = 14) +
theme(
legend.position = "top",
axis.title.y = element_text(size = 12),
axis.title.x = element_text(size = 12)
)
ggsave("forest_plot_expanded_acy1_bmi.png", width = 8, height = 6, dpi = 300)
## MRPRESSO
# Prepare the data for MR-PRESSO
mrpresso_input <- data.frame(
SNP = dat_indep_steiger$rsid, # SNP identifiers
Y_effect = dat_indep_steiger$beta.outcome, # Outcome effect sizes
Y_se = dat_indep_steiger$se.outcome, # Outcome standard errors
E1_effect = dat_indep_steiger$beta.exposure, # Exposure effect sizes
E1_se = dat_indep_steiger$se.exposure # Exposure standard errors
)
# Check the structure of the input data
#print(head(mrpresso_input))
# Run MR-PRESSO using the global method
tryCatch({
mrpresso_results <- mr_presso(
BetaOutcome = "Y_effect", # Outcome effect size column
BetaExposure = "E1_effect", # Exposure effect size column
SdOutcome = "Y_se", # Outcome standard error column
SdExposure = "E1_se", # Exposure standard error column
OUTLIERtest = TRUE, # Run the outlier test
DISTORTIONtest = TRUE, # Run the distortion test
data = mrpresso_input, # Input data
NbDistribution = 1000, # Number of bootstrap iterations
SignifThreshold = 0.05 # Significance threshold
)
# Display the results
print(mrpresso_results)
# Save the results to a file
saveRDS(mrpresso_results, file = "/Users/charleenadams/acy1_bmi/mrpresso_results.rds")
cat("MR-PRESSO analysis completed successfully. Results saved to /Users/charleenadams/acy1_bmi/mrpresso_results.rds\n")
}, error = function(e) {
cat("Error encountered during MR-PRESSO analysis:\n")
cat(e$message, "\n")
})
## $`Main MR results`
## Exposure MR Analysis Causal Estimate Sd T-stat P-value
## 1 E1_effect Raw 0.01810085 0.008201552 2.207003 0.04331256
## 2 E1_effect Outlier-corrected NA NA NA NA
##
## $`MR-PRESSO results`
## $`MR-PRESSO results`$`Global Test`
## $`MR-PRESSO results`$`Global Test`$RSSobs
## [1] 14.00632
##
## $`MR-PRESSO results`$`Global Test`$Pvalue
## [1] 0.668
##
##
##
## MR-PRESSO analysis completed successfully. Results saved to /Users/charleenadams/acy1_bmi/mrpresso_results.rds
# Save MR-PRESSO results
write.csv(mrpresso_results$`Main MR results`, file = "mrpresso_results.csv", row.names = FALSE)
# Extract main results from MR-PRESSO output
mrpresso_main_results <- as.data.frame(mrpresso_results$`Main MR results`)
# Generate a kable table
kable(
mrpresso_main_results,
format = "html",
caption = "MR-PRESSO Results for ACY1 on BMI",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center", bootstrap_options = c("striped", "hover"))
Exposure | MR Analysis | Causal Estimate | Sd | T-stat | P-value |
---|---|---|---|---|---|
E1_effect | Raw | 0.0181008 | 0.0082016 | 2.207003 | 0.0433126 |
E1_effect | Outlier-corrected | NA | NA | NA | NA |
MRlap accounts for the winner’s curse, which refers to the tendency for associations found in the initial discovery sample to be overestimated when evaluated in a subsequent sample. In MR studies, this can lead to inflated estimates of causal effects. IMPORTANTLY, we used the UKB-PPP “Discovery” pQTLs for ACY1. For this reason, the strength of the association between SNPs and ACY1 could be inflated and lead to incorrect conclusions.
MRlap uses cross-trait linkage disequilibrium score regression (LDSC) to estimate and adjust for winner’s curse and sample overlap. By doing so, it provides more accurate estimates of causal effects, reducing the likelihood of overestimating the strength of associations.
library(TwoSampleMR)
library(MRlap)
library(data.table)
# Step 1: Load the dataset
clumping_results <- fread("/Users/charleenadams/acy1_bmi/clumped_results_large.clumps")
# Step 1: Rename the `#CHROM` column to `CHROM`
setnames(clumping_results, old = "#CHROM", new = "CHROM")
# Filter rows where SP2 is "."
independent_snps <- clumping_results[SP2 == ".", .(CHROM, POS, ID, P)]
# Save the independent SNPs to a new file
fwrite(
independent_snps,
file = "independent_snps_for_MR.txt",
sep = "\t",
col.names = TRUE
)
#cat("Independent SNPs have been extracted and saved to 'independent_snps_for_MR.txt'.")
# Read in the ACY1 data
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
exp_dat <- data.frame(
SNP = acy1$rsid,
beta.exposure = acy1$BETA,
se.exposure = acy1$SE,
eaf.exposure = acy1$A1FREQ,
effect_allele.exposure = acy1$ALLELE1,
other_allele.exposure = acy1$ALLELE0,
pval.exposure = acy1$Pval,
samplesize.exposure = acy1$N,
chr.exposure = acy1$CHROM,
pos.exposure = acy1$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"
# Columns needed by ld_clump(), though we clumped directly with Plink
# Kept for legacy:
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid
# Read in BMI data
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
SNP = bmi$SNP,
beta.outcome = bmi$BETA,
se.outcome = bmi$SE,
eaf.outcome = bmi$A1FREQ,
effect_allele.outcome = bmi$ALLELE1,
other_allele.outcome = bmi$ALLELE0,
pval.outcome = bmi$P,
samplesize.outcome = 460000,
chr.outcome = bmi$CHR,
pos.outcome = bmi$BP)
# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"
# Step 2: Prepare the exposure data
exp_dat <- as.data.table(exp_dat)
exposure_data <- exp_dat[, list(
rsid = SNP, # SNP identifier
a1 = effect_allele.exposure, # Alternate (effect) allele
a2 = other_allele.exposure, # Reference allele
beta = beta.exposure, # Effect size
se = se.exposure, # Standard error
N = samplesize.exposure, # Sample size
chr = chr.exposure, # Chromosome
pos = pos.exposure # Position
)]
# Step 3: Save the exposure data
fwrite(
exposure_data,
file = "/Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv",
sep = "\t"
)
#cat("Exposure data saved to: /Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv\n")
# Step 4: Prepare the outcome data
out_dat <- as.data.table(out_dat)
outcome_data <- out_dat[, list(
rsid = SNP, # SNP identifier
a1 = effect_allele.outcome, # Alternate (effect) allele
a2 = other_allele.outcome, # Reference allele
beta = beta.outcome, # Effect size
se = se.outcome, # Standard error
N = samplesize.outcome, # Sample size
chr = chr.outcome, # Chromosome
pos = pos.outcome # Position
)]
# Step 5: Save the outcome data
fwrite(
outcome_data,
file = "/Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv",
sep = "\t"
)
#cat("Outcome data saved to: /Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv\n")
# Step 6: Summary for verification
#cat("\nSummary of prepared datasets:\n")
#cat("Exposure data:\n")
#cat("\nOutcome data:\n")
# Load necessary libraries
# Step 1: Set file paths
exposure_file <- "/Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv"
outcome_file <- "/Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv"
# Step 2: Set file paths for LDSC input (required by MRlap)
ld_path <- "/Users/charleenadams/acy1_bmi/eur_w_ld_chr" # Path to LD score files
hm3_file <- "/Users/charleenadams/acy1_bmi/eur_w_ld_chr/w_hm3.snplist" # Path to HapMap3 SNP list
# Step 3: Read in the exposure and outcome data
exposure_data <- fread(exposure_file)
outcome_data <- fread(outcome_file)
exposure_data <- as.data.frame(exposure_data)
outcome_data <- as.data.frame(outcome_data)
# Step 4: Run MRlap
results <- MRlap(
exposure = exposure_data,
exposure_name = "ACY1",
outcome = outcome_data,
outcome_name = "BMI",
ld = ld_path,
hm3 = hm3_file
)
## <<< Preparation of analysis >>>
## > Checking parameters
## The p-value threshold used for selecting MR instruments is: 5e-08
## The distance used for pruning MR instruments is: 500 Kb
## Distance-based pruning will be used for MR instruments
## > Processing exposure (ACY1) summary statistics...
## # Preparation of the data...
## The data.frame used as input is: "exposure_data".
## SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - BETA column, ok - SE column, ok - N column, ok
## > Processing outcome (BMI) summary statistics...
## # Preparation of the data...
## The data.frame used as input is: "outcome_data".
## SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - BETA column, ok - SE column, ok - N column, ok
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Performing cross-trait LDSC >>>
## > Munging exposure data...
## > Munging outcome data...
## > Running cross-trait LDSC...
## Please consider saving the log files and checking them to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files
## > Cleaning temporary files...
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Running IVW-MR >>>
## > Identifying IVs...
## 359 IVs with p < 5e-08
## 0 IVs excluded - more strongly associated with the outcome than with the exposure, p < 1e-03
## Pruning : distance : 500 Kb
## 18 IVs left after pruning
## > Performing MR
## IVW-MR observed effect: 0.000643 ( 0.0151 )
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Estimating corrected effect >>>
## > Estimating genetic architecture parameters...
## > Estimating corrected effect...
## corrected effect: -0.00218 ( 0.0171 )
## covariance between observed and corrected effect: 0.00026
## 8000 simulations were used to estimate the variance and the covariance.
## > Testing difference between observed and corrected effect...
## Runtime of the analysis: 5 minute(s) and 1 second(s).
# Step 6: Print a summary of results
#cat("MRlap analysis completed. Results saved to: /Users/charleenadams/acy1_bmi/mrlap_results.tsv\n")
This table summarizes the results after correcting for biases in the MR analysis.
observed_effect | observed_effect_se | m_IVs | IVs | observed_effect_p | corrected_effect | corrected_effect_se | corrected_effect_p | test_difference | p_difference |
---|---|---|---|---|---|---|---|---|---|
0.000643 | 0.0151 | 18 | rs10119644, rs112875651, rs121912698, rs1260326, … | 0.966 | -0.00218 | 0.0169 | 0.897 | 1.22 | 0.221 |
This table presents the linkage disequilibrium score regression (LDSC) results.
The LDSC intercept analysis supports the validity of our MR results by demonstrating that cross-trait confounding due to sample overlap is negligible. However, the fact that the corrected effect estimate is null indicates that we should be cautious in claiming causality from the IVW and MR-PRESSO analyses of ACY1 on BMI, which use the DISCOVERY UKB-PPP instruments. If there are other pQTL sources for ACY1, we should perform this analysis with them and compare.
h2_exp | h2_exp_se | int_exp | h2_out | h2_out_se | int_out | gcov | gcov_se | rg | int_crosstrait | int_crosstrait_se |
---|---|---|---|---|---|---|---|---|---|---|
0.154 | 0.0173 | 0.99 | 0.26 | 0.00705 | 1.08 | 0.0707 | 0.00657 | 0.353 | 0.088 | 0.0075 |
This table assesses the polygenicity of the exposure trait (ACY1).
polygenicity | perSNP_heritability |
---|---|
0.000546 | 0.000245 |
The MRlap analysis revealed the following key insights:
The observed and corrected causal effects of ACY1 on BMI were near zero and statistically non-significant, suggesting no robust causal link.
LDSC analysis indicated a moderate genetic correlation (rg = 0.353) between ACY1 and BMI, consistent with shared genetic architecture but not causality.
The genetic architecture of ACY1 is sparse, with a low polygenicity estimate (0.000546), meaning only a small subset of SNPs contributes to its heritability.
These results collectively suggest that while ACY1 and BMI share some genetic background, ACY1 may not exert a strong causal influence on BMI. But MRlap isn’t set up to address rare variants…
# Load necessary library
library(openxlsx)
# Create a new Excel workbook
wb <- createWorkbook()
# Function to handle list elements and write them to Excel
write_list_to_sheet <- function(wb, sheet_name, data) {
addWorksheet(wb, sheet_name)
# If the data is a list, convert it to a data frame
if (is.list(data)) {
data <- as.data.frame(t(data))
colnames(data) <- names(data)
}
writeData(wb, sheet_name, data)
}
# Write MRcorrection results
write_list_to_sheet(wb, "MRcorrection", results$MRcorrection)
# Write LDSC results
write_list_to_sheet(wb, "LDSC", results$LDSC)
# Write GeneticArchitecture results
write_list_to_sheet(wb, "GeneticArchitecture", results$GeneticArchitecture)
# Save the workbook to your desired path
saveWorkbook(wb, file = "/Users/charleenadams/acy1_bmi/mrlap_results.xlsx", overwrite = TRUE)
cat("MRlap results have been saved to 'mrlap_results.xlsx' with three tabs.")
## MRlap results have been saved to 'mrlap_results.xlsx' with three tabs.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.49 MRPRESSO_1.0 kableExtra_1.4.0 MRlap_0.0.3.2
## [5] DT_0.33 biomaRt_2.62.0 readxl_1.4.3 TwoSampleMR_0.6.8
## [9] corrplot_0.95 tidyr_1.3.1 ggrepel_0.9.6 data.table_1.16.2
## [13] dplyr_1.1.4 openxlsx_4.2.7.1 ggplot2_3.5.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 mnormt_2.1.1 httr2_1.0.6
## [4] rlang_1.1.4 magrittr_2.0.3 e1071_1.7-16
## [7] compiler_4.4.2 RSQLite_2.3.7 gdata_3.0.1
## [10] png_0.1-8 systemfonts_1.1.0 vctrs_0.6.5
## [13] quadprog_1.5-8 stringr_1.5.1 pkgconfig_2.0.3
## [16] crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0
## [19] XVector_0.46.0 labeling_0.4.3 splitstackshape_1.4.8
## [22] pbivnorm_0.6.0 utf8_1.2.4 rmarkdown_2.29
## [25] tzdb_0.4.0 UCSC.utils_1.2.0 ragg_1.3.3
## [28] purrr_1.0.2 bit_4.5.0 xfun_0.49
## [31] zlibbioc_1.52.0 cachem_1.1.0 GenomeInfoDb_1.42.0
## [34] jsonlite_1.8.9 progress_1.2.3 blob_1.2.4
## [37] psych_2.4.6.26 lavaan_0.6-19 parallel_4.4.2
## [40] prettyunits_1.2.0 R6_2.5.1 bslib_0.8.0
## [43] stringi_1.8.4 jquerylib_0.1.4 cellranger_1.1.0
## [46] iterators_1.0.14 Rcpp_1.0.13-1 R.utils_2.12.3
## [49] readr_2.1.5 IRanges_2.40.0 Matrix_1.7-1
## [52] tidyselect_1.2.1 rstudioapi_0.17.1 yaml_2.3.10
## [55] codetools_0.2-20 doParallel_1.0.17 curl_6.0.0
## [58] lattice_0.22-6 tibble_3.2.1 plyr_1.8.9
## [61] Biobase_2.66.0 withr_3.0.2 KEGGREST_1.46.0
## [64] evaluate_1.0.1 proxy_0.4-27 BiocFileCache_2.14.0
## [67] zip_2.3.1 xml2_1.3.6 Biostrings_2.74.0
## [70] pillar_1.9.0 filelock_1.0.3 foreach_1.5.2
## [73] stats4_4.4.2 generics_0.1.3 vroom_1.6.5
## [76] S4Vectors_0.44.0 hms_1.1.3 munsell_0.5.1
## [79] scales_1.3.0 gtools_3.9.5 class_7.3-22
## [82] glue_1.8.0 tools_4.4.2 GenomicSEM_0.0.5
## [85] grid_4.4.2 crosstalk_1.2.1 AnnotationDbi_1.68.0
## [88] colorspace_2.1-1 nlme_3.1-166 GenomeInfoDbData_1.2.13
## [91] cli_3.6.3 rappdirs_0.3.3 textshaping_0.4.0
## [94] fansi_1.0.6 viridisLite_0.4.2 svglite_2.1.3
## [97] gtable_0.3.6 R.methodsS3_1.8.2 sass_0.4.9
## [100] digest_0.6.37 mgsub_1.7.3 BiocGenerics_0.52.0
## [103] htmlwidgets_1.6.4 farver_2.1.2 R.oo_1.27.0
## [106] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
## [109] httr_1.4.7 bit64_4.5.2
Overall, the MR analysis supports a potential causal influence of ACY1 on BMI due to:
The use of a rare variant with a strong effect on ACY1, serving as a robust instrument for standard IVW MR, with confirmation by MR-PRESSO using an expanded list of instruments.
Potential bias from sample overlap is negligible based on the LDSC intercept test.
Winner’s curse may still have inflated the MR estimate.
Thus, MRlap failed to confirm the effect because:
It relies on polygenic signals and cross-trait LD correlations, which are less sensitive to rare variants with strong effects and/or our IVW and MR-PRESSO analyses are biased due to use of UKB-PPP discovery samples for the ACY1 instruments.
The remaining portion of the document examines the Benson instruments with Wald ratios and combinations of IVW analyses.
NOTE: Although it isn’t necessary, I have calculated the weights for the 3-SNP IVW analysis by hand and performed the 3-SNP IVW by hand to show you how that is done.
IVW Estimate (\(\beta_{IVW}\)):
\[ \beta_{IVW} = \frac{\sum w_i \cdot \beta_{\text{exposure},i} \cdot \beta_{\text{outcome},i}}{\sum w_i \cdot \beta_{\text{exposure},i}^2} \]
where \(w_i = \frac{1}{se_{\text{outcome},i}^2}\) are the inverse-variance weights.
Standard Error (\(se_{IVW}\)):
\[ se_{IVW} = \sqrt{\frac{1}{\sum w_i \cdot \beta_{\text{exposure},i}^2}} \]
p-value:
\[ p = 2 \cdot \text{pnorm}(-|\beta_{IVW} / se_{IVW}|) \]
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]
# Prepare exposure data (instrument)
exp_dat <- data.frame(
SNP = instrument$rsid,
beta.exposure = instrument$BETA,
se.exposure = instrument$SE,
eaf.exposure = instrument$A1FREQ,
effect_allele.exposure = instrument$ALLELE1,
other_allele.exposure = instrument$ALLELE0,
pval.exposure = instrument$Pval,
samplesize.exposure = instrument$N,
chr.exposure = instrument$CHROM,
pos.exposure = instrument$POS19)
# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
SNP = bmi$SNP,
beta.outcome = bmi$BETA,
se.outcome = bmi$SE,
eaf.outcome = bmi$A1FREQ,
effect_allele.outcome = bmi$ALLELE1,
other_allele.outcome = bmi$ALLELE0,
pval.outcome = bmi$P,
samplesize.outcome = 460000,
chr.outcome = bmi$CHR,
pos.outcome = bmi$BP)
# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"
# Harmonize datasets for MR
harmonized_data <- harmonise_data(
exposure_dat = exp_dat,
outcome_dat = out_dat)
# Calculate weights as the inverse variance of beta.outcome
harmonized_data$weights <- 1 / (harmonized_data$se.outcome^2)
# Calculate the numerator and denominator for the IVW estimate
harmonized_data$IVW_numerator <- harmonized_data$weights * harmonized_data$beta.exposure * harmonized_data$beta.outcome
harmonized_data$IVW_denominator <- harmonized_data$weights * (harmonized_data$beta.exposure^2)
# Sum across all SNPs to get the IVW estimate
beta_IVW <- sum(harmonized_data$IVW_numerator) / sum(harmonized_data$IVW_denominator)
# Calculate the standard error
se_IVW <- sqrt(1 / sum(harmonized_data$IVW_denominator))
# Calculate the p-value
p_IVW <- 2 * pnorm(-abs(beta_IVW / se_IVW))
# Add results as new columns
harmonized_data$beta_IVW <- beta_IVW
harmonized_data$se_IVW <- se_IVW
harmonized_data$p_IVW <- p_IVW
# View the results
print(harmonized_data)
## SNP effect_allele.exposure other_allele.exposure
## 1 rs121912698 T C
## 2 rs121912701 A G
## 3 rs34017492 G C
## effect_allele.outcome other_allele.outcome beta.exposure beta.outcome
## 1 T C -1.691910 -0.0311163
## 2 A G -0.520153 -0.0214348
## 3 G C 0.487914 0.0115997
## eaf.exposure eaf.outcome remove palindromic ambiguous id.outcome se.outcome
## 1 0.00440744 0.003927 FALSE FALSE FALSE BMI 0.0150565
## 2 0.00476360 0.004107 FALSE FALSE FALSE BMI 0.0147407
## 3 0.00235621 0.002368 FALSE TRUE FALSE BMI 0.0219029
## pval.outcome samplesize.outcome chr.outcome pos.outcome outcome se.exposure
## 1 0.0387695 460000 3 52022837 BMI 0.0537128
## 2 0.1459120 460000 3 52023042 BMI 0.0515046
## 3 0.5963920 460000 3 52018149 BMI 0.0820835
## pval.exposure samplesize.exposure chr.exposure pos.exposure exposure
## 1 8.933055e-218 33693 3 52022837 ACY1
## 2 5.571857e-24 33693 3 52023042 ACY1
## 3 2.779457e-09 33693 3 52018149 ACY1
## id.exposure action SNP_index mr_keep weights IVW_numerator IVW_denominator
## 1 ACY1 2 1 TRUE 4411.151 232.22937 12627.1825
## 2 ACY1 2 1 TRUE 4602.182 51.31145 1245.1624
## 3 ACY1 2 1 TRUE 2084.475 11.79741 496.2303
## beta_IVW se_IVW p_IVW
## 1 0.02055446 0.008342441 0.01374567
## 2 0.02055446 0.008342441 0.01374567
## 3 0.02055446 0.008342441 0.01374567
# Repeat the 3-SNP meta-analytic analysis performed earlier; just to have it here too for easy viewing.
metaMR3 <- mr(harmonized_data)
metaMR3
## id.exposure id.outcome outcome exposure method nsnp
## 1 ACY1 BMI BMI ACY1 MR Egger 3
## 2 ACY1 BMI BMI ACY1 Weighted median 3
## 3 ACY1 BMI BMI ACY1 Inverse variance weighted 3
## 4 ACY1 BMI BMI ACY1 Simple mode 3
## 5 ACY1 BMI BMI ACY1 Weighted mode 3
## b se pval
## 1 0.01090458 0.016409839 0.62661534
## 2 0.01909664 0.008561989 0.02572107
## 3 0.02055446 0.008342441 0.01374567
## 4 0.02109267 0.016009197 0.31834507
## 5 0.01850858 0.009186963 0.18152333
# Step 1: Run Wald ratio MR for each SNP individually
wald_results <- data.frame()
for (i in 1:nrow(harmonized_data)) {
wald_result <- mr_wald_ratio(
b_exp = harmonized_data$beta.exposure[i],
b_out = harmonized_data$beta.outcome[i],
se_exp = harmonized_data$se.exposure[i],
se_out = harmonized_data$se.outcome[i]
)
wald_results <- rbind(wald_results, data.frame(
SNP = harmonized_data$SNP[i],
Estimate = wald_result$b,
SE = wald_result$se,
P = wald_result$pval
))
}
# Step 2: Run IVW MR for all combinations of two SNPs
library(gtools) # For combinations
ivw_results <- data.frame()
snp_combinations <- combinations(n = length(benson_list), r = 2, v = benson_list)
for (i in 1:nrow(snp_combinations)) {
selected_snps <- snp_combinations[i, ]
subset_data <- harmonized_data[harmonized_data$SNP %in% selected_snps, ]
# Perform IVW MR
ivw_result <- mr_ivw(
b_exp = subset_data$beta.exposure,
b_out = subset_data$beta.outcome,
se_exp = subset_data$se.exposure,
se_out = subset_data$se.outcome
)
ivw_results <- rbind(ivw_results, data.frame(
SNPs = paste(selected_snps, collapse = ","),
Estimate = ivw_result$b,
SE = ivw_result$se,
P = ivw_result$pval
))
}
# Display Results
print("Wald Ratio Results for Individual SNPs:")
## [1] "Wald Ratio Results for Individual SNPs:"
print(wald_results)
## SNP Estimate SE P
## 1 rs121912698 0.01839123 0.008899114 0.0387685
## 2 rs121912701 0.04120864 0.028339162 0.1459120
## 3 rs34017492 0.02377407 0.044890903 0.5963917
print("IVW Results for SNP Combinations:")
## [1] "IVW Results for SNP Combinations:"
print(ivw_results)
## SNPs Estimate SE P
## 1 rs121912698,rs121912701 0.02043929 0.008490340 0.01606841
## 2 rs121912698,rs34017492 0.01859477 0.008729243 0.03315765
## 3 rs121912701,rs34017492 0.03624046 0.023963576 0.13045395
# Save results to files
fwrite(wald_results, "/Users/charleenadams/acy1_bmi/wald_results.csv")
fwrite(ivw_results, "/Users/charleenadams/acy1_bmi/ivw_results.csv")
In this top table, I’m providing the betas for exposures and outcomes next to each other so that you can more easily see how they produce Wald ratios.
SNP | Effect Allele | Other Allele | Beta (Exposure) | Beta (Outcome) | SE (Outcome) | Wald Estimate | Wald SE | Wald P |
---|---|---|---|---|---|---|---|---|
rs121912698 | T | C | -1.6919 | -0.0311 | 0.0151 | 0.0184 | 0.0089 | 0.0388 |
rs121912701 | A | G | -0.5202 | -0.0214 | 0.0147 | 0.0412 | 0.0283 | 0.1459 |
rs34017492 | G | C | 0.4879 | 0.0116 | 0.0219 | 0.0238 | 0.0449 | 0.5964 |
SNPs | Estimate | SE | P |
---|---|---|---|
rs121912698, rs121912701, rs34017492 | 0.0206 | 0.0083 | 0.0137 |
rs121912698, rs121912701 | 0.0204 | 0.0085 | 0.0161 |
rs121912698, rs34017492 | 0.0186 | 0.0087 | 0.0332 |
rs121912701, rs34017492 | 0.0362 | 0.0240 | 0.1305 |
Method | Estimate | SE | P |
---|---|---|---|
Inverse Variance Weighted | 0.0206 | 0.0083 | 0.0137 |
Weighted Median | 0.0191 | 0.0087 | 0.0289 |
MR Egger | 0.0109 | 0.0164 | 0.6266 |
Weighted Mode | 0.0185 | 0.0093 | 0.1840 |
The Benson-SNP analysis of circulating ACY1 on BMI indicates that an INCREASE in circulating levels of ACY1 INCREASE BMI.
NOTE: MR results are interpreted in relation to higher levels of the instrumented trait, unless the GWAS was explicitly set up to reflect a decreasing trait (which is uncommon). The interpretation focuses on the trait, not the SNPs. For example, the minor allele of rs121912698 decreases ACY1 levels and also decreases BMI, meaning higher levels of ACY1 increase BMI. Similarly, the minor allele of rs34017492 increases ACY1 levels and also increases BMI. In this analysis, all three SNPs show a consistent alignment between the direction of their effects on ACY1 and BMI with rs121912698 and rs121912701 both decreasing ACY1 and BMI and rs34017492 increasing ACY1 and increasing BMI.
However, this is not always the case. Another pattern seen in MR occurs when the minor allele’s effect on the exposure trait is opposite to its effect on the outcome trait. In such instances, the Wald ratio (calculated as the outcome beta divided by the exposure beta) will indicate that a higher level of the exposure decreases the outcome. This relationship holds regardless of whether the exposure beta is negative and the outcome beta positive, or vice versa. For instance, suppose the minor allele for rs123 increases ACY1 but decreases BMI. The MR interpretation is that an increase in the levels of ACY1 decrease BMI. But suppose rs123 decreases ACY1 but increases BMI. The MR interpretation is still that an increase in the levels of ACY1 decrease BMI.