Mendelian Randomization of ACY1 and BMI

1 Introduction
2 UKB-PPP
3 Getting the Data
4 Bioinformatic File Preparation
5 Can’t Use the Giant BMI Summary Statistics
6 Format ACY1 Data for Clumping
7 Instrument Selection
8 Format Jurgens BMI Summary Statistics
9 IVW MR Results
10 More Instruments
- 10.1 Larger Instrument Set Clumping
11 MR-PRESSO with Expanded Instrument
12 MRlap
13 MRlap Results
14 Summary
15 Next Steps
- 15.1 Formulas for IVW Mendelian Randomization
16 Wald ratio and 2-SNP IVW Analyses
17 Cleaner Visual of the Benson MR Results

1 Introduction

Here we perform a Mendelian randomization (MR) of circulating ACY1 levels on body mass index (BMI) in participants in the UK Biobank (UKB). Our genetic instruments for ACY1 come from Europeans (“discovery” data🚩; flagged for a reason) enrolled in the UK Biobank Pharma Proteomics Project (UKB-PPP)[1], and our BMI summary statistics come from Jurgens et al. (2023)[2]. We do so first using three biologically informed SNPs as instruments with the inverse-variance weighted (IVW) method, after investigating these SNPs for independence with Plink2 and LDlink, considering both D’ and r2. We perform Steiger filtering to ascertain whether our hypothesized direction of effect is plausible and heterogeniety and pleiotropy tests to assess violations to the MR assumption against horizontal pleiotriopy. We follow this with several MR sensitivity analyses, including MRLap to explore the impact of sample overlap and winner’s curse (when initially reported effect sizes of significant genetic variants are overestimated) on our estimates and MR-PRESSO with an instrument that includes more SNPs.

Sun BB, Chiou J, Traylor M, Benner C, Hsu YH, Richardson TG, Surendran P, Mahajan A, Robins C, Vasquez-Grinnell SG, Hou L, Kvikstad EM, Burren OS, Davitte J, Ferber KL, Gillies CE, Hedman ÅK, Hu S, Lin T, Mikkilineni R, Pendergrass RK, Pickering C, Prins B, Baird D, Chen CY, Ward LD, Deaton AM, Welsh S, Willis CM, Lehner N, Arnold M, Wörheide MA, Suhre K, Kastenmüller G, Sethi A, Cule M, Raj A; Alnylam Human Genetics; AstraZeneca Genomics Initiative; Biogen Biobank Team; Bristol Myers Squibb; Genentech Human Genetics; GlaxoSmithKline Genomic Sciences; Pfizer Integrative Biology; Population Analytics of Janssen Data Sciences; Regeneron Genetics Center; Burkitt-Gray L, Melamud E, Black MH, Fauman EB, Howson JMM, Kang HM, McCarthy MI, Nioi P, Petrovski S, Scott RA, Smith EN, Szalma S, Waterworth DM, Mitnaul LJ, Szustakowski JD, Gibson BW, Miller MR, Whelan CD. Plasma proteomic associations with genetics and health in the UK Biobank. Nature. 2023 Oct;622(7982):329-338. doi: 10.1038/s41586-023-06592-6. Epub 2023 Oct 4. PMID: 37794186; PMCID: PMC10567551.
Jurgens SJ, Pirruccello JP, Choi SH, Morrill VN, Chaffin M, Lubitz SA, Lunetta KL, Ellinor PT. Adjusting for common variant polygenic scores improves yield in rare variant association analyses. Nat Genet. 2023 Apr;55(4):544-548. doi: 10.1038/s41588-023-01342-w. Epub 2023 Mar 23. PMID: 36959364; PMCID: PMC11078202.

2 UKB-PPP

From the Synapse UKB-PPP website:

The UK Biobank Pharma Proteomics Project (UKB-PPP) is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,219 UKB participants. As part of a collaborative analysis across the thirteen UKB-PPP partners, we conducted comprehensive protein quantitative trait loci (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 85% are newly discovered, in addition to ancestry-specific pQTL mapping in non-Europeans. We identify independent secondary associations in 87% of cis and 30% of trans loci, expanding the catalogue of genetic instruments for downstream analyses.

3 Getting the Data

The UKB-PPP pQTL summary statistics are available via Synapse for the “discovery” UKB EUR population. To download, it requires creating a Synapse account, obtaining a personal access token, and adding files (protein and RSID mapping files and pQTL summary statistics) to a download list.

After logging into Synapse and adding desired files to the download list, these are the steps:

cd /Users/charleenadams/acy1_bmi
conda create -n synapse_env -c conda-forge mamba
conda activate synapse_env
pip install synapseclient
synapse -h
synapse get-download-list
synapse login "cd...." eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ6SjVSSzpMN1RMOlQ3TDc6M1ZYNjpKRU9VOjY0NFI6VTNJWDo1S1oyOjdaQ0s6RlBUSCIsImFsZyI6IlJTMjU2In0.eyJhY2Nlc3MiOnsic2NvcGUiOlsidmlldyIsImRvd25sb2FkIiwibW9kaWZ5Il0sIm9pZGNfY2xhaW1zIjp7fX0sInRva2VuX3R5cGUiOiJQRVJTT05BTF9BQ0NFU1NfVE9LRU4iLCJpc3MiOiJodHRwczovL3JlcG8tcHJvZC5wcm9kLnNhZ2ViYXNlLm9yZy9hdXRoL3YxIiwiYXVkIjoiMCIsIm5iZiI6MTcyOTcwNjgzOSwiaWF0IjoxNzI5NzA2ODM5LCJqdGkiOiIxMzA3MyIsInN1YiI6IjM0NzI3ODMifQ.B65yULl25FZ1vmFOXEE6RaNYOhXTXZ5ov5JLVP47nJpejRA6oMfcwOcY1z4RH7bLFQ48xzRhc1I8kSbexid8jpoKnQJlBI8mVNItvv92dZNDrNN_8cvXtHNLVb6O12jwIZ0XDuQxcLuxm0kF52IZSXWMtwBzBeyKckkNYL9LTbQKDmFRmOpwZGVjkZyT7wsiuLLjj9bTcMfFcY-IstD7xnK4O51D_UejK6vg-Eh0Uw23i4NHhMAA02-D-tZq1yV0X8yI0EIOFoLQ3AKtXgrr4sE6zDGz4KTPaqVM557Vv9Aqk2KtnfiZRSWWxWYPbJm1Ty8AYLBDv6Iq-tdAval8kw

I placed a copy of the RSID and protein mapping files at: /Users/charleenadams/acy1_bmi/maps

4 Bioinformatic File Preparation

4.1 Look at RSID Map

In terminal window, look at the rsid map for chromosome 1. It’s nice that they give us positions for both POS19 and POS38.

zless - olink_rsid_map_mac5_info03_b0_7_chr1_patched_v2.tsv.gz

ID      REF     ALT     rsid    POS19   POS38
1:10177:A:AC:imp:v1     A       AC      rs367896724     10177   10177

4.2 Untar the ACY1 Data

# ls *.tar | wc -l 
for file in *.tar; do tar -xf "$file"; done

4.3 Merge RSID Map with the ACY1 Data

The rsid numbers need to be merged into the pQTL summary statistics files for MR. When we unpack the files, they produce subdirectories, though in this case, we just get one subdirectory for ACY1. So, the code below goes into subdirectories and annotates all the files in all of them with their corresponding rsid mapping files by chromosome or would if we needed it to.

#!/bin/bash

# Base directory where your data is located
base_dir="/Users/charleenadams/acy1_bmi"
# Directory where RSID map files are located
map_dir="$base_dir/maps"

# Log file to track progress
log_file="$base_dir/add_rsid_log.txt"
echo "Log started on $(date)" > "$log_file"

# Loop through each subdirectory in the base directory
for dir in "$base_dir"/*; do
    if [[ -d "$dir" ]]; then
        echo "Processing directory: $dir" | tee -a "$log_file"

        # Loop through each .gz file in the subdirectory
        for file in "$dir"/*.gz; do
            # Extract chromosome number from file name
            chrom=$(basename "$file" | sed -E 's/.*chr([0-9XY]+).*/\1/')

            # Define the RSID map file for this chromosome
            map_file="$map_dir/olink_rsid_map_mac5_info03_b0_7_chr${chrom}_patched_v2.tsv.gz"
            output_file="${file%.gz}_rsid_added.gz"

            echo "  Processing file: $file (Chromosome $chrom)" | tee -a "$log_file"

            # Check if the chromosome number was extracted and the RSID map file exists
            if [[ -n "$chrom" && -f "$map_file" ]]; then
                echo "    Using RSID map: $map_file" | tee -a "$log_file"

                # Remove existing rsid_added file to avoid overwrite prompts
                rm -f "$output_file"

                # Join the original file with the RSID map based on the ID column, adding GENPOS37 (POS19) and GENPOS38 (POS38) while keeping the original ID
                gunzip -c "$file" | \
                awk 'BEGIN {OFS="\t"}
                     NR==FNR {map[$1]=$4; genpos37[$1]=$5; genpos38[$1]=$6; next}
                     ($3 in map) {print $0, map[$3], genpos37[$3], genpos38[$3]}
                     !($3 in map) {print $0, ".", ".", "."}' <(gunzip -c "$map_file") - | \
                gzip -c > "$output_file"

                # Confirm successful processing
                if [[ -s "$output_file" ]]; then
                    echo "    RSID added with GENPOS37 and GENPOS38. Output saved to $output_file" | tee -a "$log_file"
                else
                    echo "    Warning: $output_file is empty!" | tee -a "$log_file"
                fi
            else
                echo "    Error: RSID map file for chromosome $chrom not found or chromosome not identified." | tee -a "$log_file"
            fi
        done
    fi
done

echo "Log ended on $(date)" >> "$log_file"
echo "Processing complete. Check $log_file for details."

#Run with: 
chmod +x add_rsid.sh
nohup ./add_rsid.sh > add_rsid_log.txt 2>&1 &
ps -p 27745 -o stat,etime

4.4 Check Progress

Lots of things can go wrong in loops. Good to watch their progress.

List all files ending in _rsid_added.gz in all subdirectories
Remove all files ending in _rsid_added.gz in all subdirectories (in case they are made incorrectly)

find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec ls {} +
find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec rm {} +

4.5 Merge CHRs

#!/bin/bash

# Define base directory
base_dir="/Users/charleenadams/acy1_bmi"

# Log file for tracking
log_file="$base_dir/merge_rsid_files_log.txt"
echo "Processing started on $(date)" > "$log_file"

# Loop through directories to process, excluding specific ones
for dir in "$base_dir"/*/; do
    # Skip directories you don’t want to process
    if [[ "$dir" == *"maps"* || "$dir" == *"BMI_males_GIANT"* || "$dir" == *"scripts"* ]]; then
        continue
    fi
    
    echo "Processing directory: $dir" | tee -a "$log_file"

    # Collect all '_rsid_added.gz' files in this directory
    rsid_files=("$dir"*_rsid_added.gz)

    # If there are files to merge
    if [[ ${#rsid_files[@]} -gt 0 ]]; then
        merged_file="${dir%/}_merged_rsid_added.gz"
        
        # Extract header from the first chromosome file
        gunzip -c "${rsid_files[0]}" | head -n 1 > temp_header.txt

        # Create a temporary file for content without headers
        > temp_content.txt

        # Append each file's content without header to the temporary content file
        for file in "${rsid_files[@]}"; do
            gunzip -c "$file" | tail -n +2 >> temp_content.txt
        done

        # Concatenate header and content into final merged file
        cat temp_header.txt temp_content.txt | gzip -c > "$merged_file"
        rm temp_header.txt temp_content.txt

        # Confirm if the merged file was created successfully and isn’t empty
        if [[ -s "$merged_file" ]]; then
            echo "  Merged file created: $merged_file" | tee -a "$log_file"
        else
            echo "  Warning: Merged file is empty for directory $dir" | tee -a "$log_file"
        fi
    else
        echo "  No rsid_added.gz files found in directory $dir" | tee -a "$log_file"
    fi
done

echo "Processing completed on $(date)" >> "$log_file"
echo "Check $log_file for details.""

# Run with:
chmod +x chrom_merge.sh
nohup ./chrom_merge.sh > chrom_merge_log.txt 2>&1 &
# Note: produces two log files only because I added one in the script not just to nohup.

5 Can’t Use the Giant BMI Summary Statistics

I looked at the GIANT BMI data, but our SNPs aren’t in it, maybe because it’s an older GWAS; didn’t include as many SNPs.

We will use the Jurgens (UKB) BMI summary statistics instead but need to consider methods for sample overlap, such as MRLap.

#!/bin/bash

# Define the input VCF file and output TSV file paths
vcf_file="/Users/charleenadams/acy1_bmi/BMI_males_GIANT/ieu-a-785.vcf.gz"
output_tsv="/Users/charleenadams/acy1_bmi/BMI_males_GIANT/ieu-a-785.tsv"

# Add the header line to the output file
echo -e "CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tieu-a-785" > "$output_tsv"

# Convert VCF to TSV and append the data to the output file
bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO\t%FORMAT[\t%SAMPLE]\n' "$vcf_file" >> "$output_tsv"

echo "Conversion completed. Output saved to: $output_tsv"

6 Format ACY1 Data for Clumping

library(data.table)
acy1 <- fread("/Users/charleenadams/acy1_bmi/ACY1_Q03154_OID20137_v1_Cardiometabolic_merged_rsid_added")

# If the first column contains all fields as a single string, use `tstrsplit` to separate it
acy1 <- acy1[, c("CHROM", "GENPOS", "ID", "ALLELE0", "ALLELE1", "A1FREQ", "INFO", "N", 
                 "TEST", "BETA", "SE", "CHISQ", "LOG10P", "EXTRA") := 
                 tstrsplit(`CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA`, 
                           split = " ", fixed = TRUE)][, 
                           `CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA` := NULL]

# Define columns to be converted to numeric
numeric_columns <- c("CHROM", "GENPOS", "A1FREQ", "INFO", "N", "BETA", "SE", "CHISQ", "LOG10P")

# Convert specified columns to numeric
acy1[, (numeric_columns) := lapply(.SD, as.numeric), .SDcols = numeric_columns]

# Convert -LOG10P to p-value assuming LOG10P represents -log10(p-value)
acy1$Pval <- 10^(acy1$LOG10P * -1)
write.table(acy1,"/Users/charleenadams/acy1_bmi/acy1_formatted.txt",
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = '\t')

6.1 Obtain Bfiles from PLINK2

From EUR 1000 Genomes (MAF<0.001)

#!/bin/bash

# ----------------------------------------------------------------------------------
# Script: create_bfiles_for_all_chromosomes.sh
# Description: Convert VCF.gz files for all chromosomes to PLINK binary files and merge them.
# Usage: ./create_bfiles_for_all_chromosomes.sh
# ----------------------------------------------------------------------------------

# ------------------------------
# Step 1: Define Paths
# ------------------------------

# Directory containing VCF files (one per chromosome)
VCF_DIR="/Users/charleenadams/temp_BI/LD_matrices/1000Genomes_scratch/annotated/eur"

# Output directory for PLINK binary files
OUTPUT_DIR="/Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001"

# Prefix for merged PLINK files
MERGED_PREFIX="${OUTPUT_DIR}/EUR"

# Ensure output directory exists
mkdir -p "$OUTPUT_DIR"

# ------------------------------
# Step 2: Check for Required Tools
# ------------------------------

command_exists () {
    command -v "$1" >/dev/null 2>&1 ;
}

if ! command_exists plink2 ; then
    echo "Error: plink2 is not installed or not in your PATH."
    exit 1
fi

if ! command_exists tabix ; then
    echo "Error: tabix is not installed or not in your PATH."
    exit 1
fi

# ------------------------------
# Step 3: Process Each Chromosome
# ------------------------------

echo "Processing chromosomes..."

for CHR in {1..22}; do
    # Define input VCF and output prefix for the current chromosome
    INPUT_VCF="${VCF_DIR}/ALL.chr${CHR}.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_annotated_EUR.vcf.gz"
    OUTPUT_PREFIX="${OUTPUT_DIR}/ALL.chr${CHR}.maf001_filtered_EUR"

    # Check if VCF file exists
    if [ ! -f "$INPUT_VCF" ]; then
        echo "Warning: VCF file for chromosome $CHR not found. Skipping..."
        continue
    fi

    # Index the VCF file if necessary
    INDEX_FILE="${INPUT_VCF}.tbi"
    if [ ! -f "$INDEX_FILE" ]; then
        echo "Indexing VCF file for chromosome $CHR..."
        tabix -p vcf "$INPUT_VCF"
    fi

    # Convert VCF to PLINK binary files
    echo "Converting chromosome $CHR to PLINK binary format..."
    plink2 \
      --vcf "$INPUT_VCF" \
      --keep-allele-order \
      --maf 0.001 \
      --mac 1 \
      --make-bed \
      --max-alleles 2 \
      --snps-only just-acgt \
      --out "$OUTPUT_PREFIX" \
      --threads 16 \
      --memory 64000

    if [ $? -ne 0 ]; then
        echo "Error: Failed to process chromosome $CHR. Skipping..."
        continue
    fi

    echo "Chromosome $CHR processed successfully: $OUTPUT_PREFIX"
done

# ------------------------------
# Step 4: Merge All Chromosomes
# ------------------------------

echo "Merging all chromosomes into a single PLINK dataset..."

# Create a merge list
MERGE_LIST="${OUTPUT_DIR}/merge_list.txt"
ls "${OUTPUT_DIR}/ALL.chr"*.maf001_filtered_EUR.bed | sed 's/.bed//' > "$MERGE_LIST"

# Use the `.fam` file from the first chromosome as the reference for merging
FIRST_FAM="${OUTPUT_DIR}/ALL.chr1.maf001_filtered_EUR.fam"

if [ ! -f "$FIRST_FAM" ]; then
    echo "Error: .fam file for chr1 not found. Cannot merge chromosomes."
    exit 1
fi

# Merge using PLINK2
plink2 \
  --pmerge-list "$MERGE_LIST" \
  --fam "$FIRST_FAM" \
  --make-bed \
  --out "$MERGED_PREFIX" \
  --threads 16 \
  --memory 64000

if [ $? -ne 0 ]; then
    echo "Error: Merging chromosomes failed."
    exit 1
fi

echo "All chromosomes merged successfully: ${MERGED_PREFIX}.bed, ${MERGED_PREFIX}.bim, ${MERGED_PREFIX}.fam"
echo "Done."

# Run with: 
nohup ./create_bfiles_from_vcf_MAF001.sh > create_bfiles_from_vcf_MAF001.log.txt 2>&1 &

6.2 Merge the CHR Bfiles

cat ALL.chr{1..22}.maf001_filtered_EUR.bim > EUR.bim
cp ALL.chr1.maf001_filtered_EUR.fam EUR.fam

6.3 Harmonize Bfiles and ACY1 Data

library(data.table)

# Load PLINK .bim file
plink_bim2 <- fread("/Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR.bim",header = FALSE, col.names = c("CHR", "SNP", "CM", "BP", "ALT_PLINK", "REF_PLINK"))
cat("PLINK .bim file contains", nrow(plink_bim2), "variants.\n")

# Check for duplicate SNPs
duplicates <- plink_bim2[duplicated(plink_bim2$SNP), ]
dim(duplicates)

# Count the total number of duplicates
num_duplicates <- sum(duplicated(plink_bim2$SNP))
cat("Total duplicate SNPs:", num_duplicates, "\n")

# Remove SNPs with missing IDs
plink_bim2_clean <- plink_bim2[plink_bim2$SNP != ".", ]
cat("Filtered PLINK .bim file contains", nrow(plink_bim2_clean), "variants.\n")

# Merge summary statistics with PLINK .bim file on rsid
exp_dat <- fread("/Users/charleenadams/acy1_bmi/acy1_formatted.txt")  
harmonized_data <- merge(exp_dat, plink_bim2_clean, by.x = "rsid", by.y = "SNP", all.x = TRUE)

# Check allele alignment
harmonized_data[, flip := (ALLELE1 == REF_PLINK & ALLELE0 == ALT_PLINK)]

# Flip effect sizes where necessary
harmonized_data[flip == TRUE, `:=`(
  ALLELE1 = ALT_PLINK,
  ALLELE0 = REF_PLINK,
  BETA = -BETA  # Flip the effect size
)]

# Remove SNPs that cannot be matched
harmonized_data <- harmonized_data[!is.na(REF_PLINK) & !is.na(ALT_PLINK)]

# Save the harmonized summary statistics for ld_clump()
fwrite(harmonized_data, "/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt", sep = "\t")

7 Instrument Selection

acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")

#agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]

# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]

# Prepare exposure data (instrument)
exp_dat <- data.frame(
  SNP = instrument$rsid,
  beta.exposure = instrument$BETA,
  se.exposure = instrument$SE,
  eaf.exposure = instrument$A1FREQ,
  effect_allele.exposure = instrument$ALLELE1,
  other_allele.exposure = instrument$ALLELE0,
  pval.exposure = instrument$Pval,
  samplesize.exposure = instrument$N,
  chr.exposure = instrument$CHROM,
  pos.exposure = instrument$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1 exposure"
exp_dat$id.exposure <- "ACY1 exposure"

# LD clump: install these first, though we didn't use this due to our need for lower frequencies alleles
# devtools::install_github("explodecomputer/genetics.binaRies")
# genetics.binaRies::get_plink_binary()

exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid

# Create the summary stats file
exp_dat <- as.data.table(exp_dat)
summary_stats <- exp_dat[, .(SNP = rsid, P = pval)]

# Write to a text file
write.table(
  summary_stats,
  file = "/Users/charleenadams/acy1_bmi/summary_stats.txt",
  quote = FALSE,
  row.names = FALSE,
  col.names = TRUE)

7.1 Clump the 3 SNPs in Plink2

 plink2 \                            
  --bfile /Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR \
  --clump summary_stats.txt \
  --clump-p1 5e-8 \
  --clump-r2 0.001 \
  --clump-kb 250 \

7.2 Are the SNPs Independent?

The results suggest they are. But is this due to the low frequencies of the alleles? Combination of rare variants, so low statistical power to detect LD?

clumping_res <- fread("/Users/charleenadams/acy1_bmi/clumped_results_stringent.clumps")
datatable(clumping_res,
          options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
          rownames = FALSE,
          caption = 'Clumping Results from Plink2')

7.3 LDlink’s Bokeh Plot

I was sceptical that our SNPs are independent. This section explains the pairwise LD between the three SNPs (rs34017492, rs121912698, rs121912701) using LDlink.

7.3.1 Legend for Bokeh Plot

Color Coding

Purple: Just the diagonal.
Blue: Perfect linkage disequilibrium (D’ = 1.0).
White: No correlation (r² = 0.0).

Interpretation

The D’ values for all SNP pairs are 1.0, suggesting that there is no historical recombination between them.
The r² values are 0.0, indicating that the SNPs are statistically independent in this dataset.
This discrepancy may be due to the very low allele frequencies of these SNPs, making their statistical correlation negligible despite perfect linkage (D’ = 1.0).

Conclusion

While the D’ metric suggests the absence of recombination, the lack of statistical correlation (r²) indicates that these SNPs may act independently in terms of genetic association due to their low frequencies. However, shouldn’t we be skeptical due to the lack of power to detect patterns between the rare variants?

8 Format Jurgens BMI Summary Statistics

acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")

# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]

# Prepare exposure data (instrument)
exp_dat <- data.frame(
  SNP = instrument$rsid,
  beta.exposure = instrument$BETA,
  se.exposure = instrument$SE,
  eaf.exposure = instrument$A1FREQ,
  effect_allele.exposure = instrument$ALLELE1,
  other_allele.exposure = instrument$ALLELE0,
  pval.exposure = instrument$Pval,
  samplesize.exposure = instrument$N,
  chr.exposure = instrument$CHROM,
  pos.exposure = instrument$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"

bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
  SNP = bmi$SNP,
  beta.outcome = bmi$BETA,
  se.outcome = bmi$SE,
  eaf.outcome = bmi$A1FREQ,
  effect_allele.outcome = bmi$ALLELE1,
  other_allele.outcome = bmi$ALLELE0,
  pval.outcome = bmi$P,
  samplesize.outcome = 460000,         
  chr.outcome = bmi$CHR,
  pos.outcome = bmi$BP)

# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"

# Harmonize datasets for MR
harmonized_data <- harmonise_data(
  exposure_dat = exp_dat,
  outcome_dat = out_dat)
write.csv(harmonized_data,"/Users/charleenadams/acy1_bmi/harmonized_data.csv")

9 IVW MR Results

In this section, we perform Steiger filtering to ascertain whether our instrumental SNPs explain more of the variance in in ACY1 than BMI. This is screen for whether we’d gotten the hypothesized direction of effect correct (we did; more of the variance in ACY1 than BMI is explained by instrumental SNPs). Then we explore whether there is heterogeneity in the effects of our instruments (a sign of possible pleiotropy) and whether there is evidence of horizontal pleiotropy in the IVW using the MR-Egger intercept test.

harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Run Steiger filtering
dat.steiger <- steiger_filtering(harmonized_data)

# Round numerical values in the 'res' dataframe to desired decimal places
dat.steiger_rounded <- dat.steiger
num_cols <- sapply(dat.steiger_rounded, is.numeric)  # Identify numeric columns
dat.steiger_rounded[, num_cols] <- lapply(dat.steiger_rounded[, num_cols], round, digits = 3) 
col.list <- c("rsq.exposure", "rsq.outcome","steiger_dir","steiger_pval","SNP","effect_allele.exposure", "other_allele.exposure", "effect_allele.outcome","other_allele.outcome",   "beta.exposure","beta.outcome","eaf.exposure", "eaf.outcome","se.exposure", "se.outcome","pval.exposure","pval.outcome")
steiger.list <- dat.steiger_rounded[,col.list]
datatable(steiger.list,
          options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
          rownames = FALSE,
          caption = 'Harmonized Data for MR with Steiger Info')

9.1 Heterogeneity Test

Results indicate no evidence for heterogeneity.

library(DT)
harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Heterogeneity
het <- mr_heterogeneity(dat.steiger)
write.csv(dat.steiger, "/Users/charleenadams/acy1_bmi/dat.steiger.csv")

# Round numerical values in the 'res' dataframe to desired decimal places
het_rounded <- het
num_cols <- sapply(het_rounded, is.numeric)  # Identify numeric columns
het_rounded[, num_cols] <- lapply(het_rounded[, num_cols], round, digits = 3)  
het_rounded$id.exposure=NULL
het_rounded$id.outcome=NULL
datatable(het_rounded,
          options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
          rownames = FALSE,
          caption = 'Heterogeneity Test Results')

9.2 Pleiotropy Test

Results indicate no evidence for unbalanced pleiotropy in the IVW using the MR-Egger intercept test.

library(DT)
harmonized_data <- read.csv("/Users/charleenadams/acy1_bmi/harmonized_data.csv")
# Pleiotropy 
plt <- mr_pleiotropy_test(dat.steiger)

# Round numerical values in the 'res' dataframe to desired decimal places
plt_rounded <- plt
num_cols <- sapply(plt_rounded, is.numeric)  # Identify numeric columns
plt_rounded[, num_cols] <- lapply(plt_rounded[, num_cols], round, digits = 3) 
plt_rounded$id.exposure=NULL
plt_rounded$id.outcome=NULL
# Create the datatable with centered alignment
datatable(plt_rounded,
          options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
          rownames = FALSE,
          caption = 'Pleiotropy Test Results')

9.3 IVW MR Results

IVW and weighted median MR tests provide evidence of causality. However, we need to be cautious due to sample overlap; it can inflate the effect estimates.

MR Results for ACY1 on BMI
	outcome	exposure	method	nsnp	b	se	pval
3	BMI	ACY1	Inverse variance weighted	3	0.021	0.008	0.014
1	BMI	ACY1	MR Egger	3	0.011	0.016	0.627
2	BMI	ACY1	Weighted median	3	0.019	0.009	0.025
5	BMI	ACY1	Weighted mode	3	0.019	0.009	0.168

10 More Instruments

Choose a wider swathe of SNPs…

# Load the required library
library(data.table)

acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]

# Prepare exposure data (instrument)
exp_dat <- data.frame(
  SNP = agnostic_instruments$rsid,
  beta.exposure = agnostic_instruments$BETA,
  se.exposure = agnostic_instruments$SE,
  eaf.exposure = agnostic_instruments$A1FREQ,
  effect_allele.exposure = agnostic_instruments$ALLELE1,
  other_allele.exposure = agnostic_instruments$ALLELE0,
  pval.exposure = agnostic_instruments$Pval,
  samplesize.exposure = agnostic_instruments$N,
  chr.exposure = agnostic_instruments$CHROM,
  pos.exposure = agnostic_instruments$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"

exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid

# Create the summary stats file
exp_dat <- as.data.table(exp_dat)
summary_stats <- exp_dat[, .(SNP = rsid, P = pval)]

# Write to a text file
write.table(
  summary_stats,
  file = "/Users/charleenadams/acy1_bmi/summary_stats_large.txt",
  quote = FALSE,
  row.names = FALSE,
  col.names = TRUE)

10.1 Larger Instrument Set Clumping

Performed with Plink2

 plink2 \                            
  --bfile /Users/charleenadams/acy1_bmi/EUR_1000Gen_bfiles_MAF_001/EUR \
  --clump summary_stats_large.txt \
  --clump-p1 5e-8 \
  --clump-r2 0.001 \
  --clump-kb 250 \
  --out clumped_results_large

11 MR-PRESSO with Expanded Instrument

Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) is a widely used method in Mendelian Randomization (MR) analysis that provides several key advantages:

11.1 1. Detection and Correction of Horizontal Pleiotropy

Horizontal pleiotropy occurs when genetic variants influence the outcome through pathways unrelated to the exposure, violating MR assumptions.
MR-PRESSO:
- Identifies outlier variants responsible for pleiotropy using residual sum of squares (RSS).
- Corrects for the influence of these outliers to refine causal estimates.

11.2 2. Outlier Identification

MR-PRESSO performs an outlier test to identify SNPs with large residuals, suggesting they may exert pleiotropic effects.
This allows researchers to address violations of the instrumental variable assumptions without removing all variants indiscriminately.

11.3 3. Distortion Test

The distortion test assesses whether removing outlier SNPs significantly changes the causal estimate.
This provides evidence on whether pleiotropy is distorting the MR results, offering a robust sensitivity analysis.

11.4 4. Improved Robustness

By removing pleiotropic outliers, MR-PRESSO enhances the reliability of causal estimates compared to simpler methods like inverse-variance weighted (IVW) MR.
It is particularly useful in cases where pleiotropy is suspected or where the data contain outliers.

11.5 5. Bootstrap Method for Inference

MR-PRESSO uses a bootstrap-based approach to derive p-values and confidence intervals, which can improve statistical inference in small datasets or datasets with skewed distributions.

library(data.table)
library(MRPRESSO)
library(kableExtra)
library(ggplot2)
library(dplyr)

# Step 1: Load the dataset
clumping_results <- fread("/Users/charleenadams/acy1_bmi/clumped_results_large.clumps")
#print(colnames(clumping_results))

# Step 1: Rename the `#CHROM` column to `CHROM`
setnames(clumping_results, old = "#CHROM", new = "CHROM")

# Filter rows where SP2 is "."
independent_snps <- clumping_results[SP2 == ".", .(CHROM, POS, ID, P)]

# Save the independent SNPs to a new file
fwrite(
  independent_snps,
  file = "independent_snps_for_MR.txt",
  sep = "\t",
  col.names = TRUE
)
#cat("Independent SNPs have been extracted and saved to 'independent_snps_for_MR.txt'.")

# Read in the ACY1 data
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")
agnostic_instruments <- acy1[which(acy1$Pval<0.00000005),]
agnostic_instruments <- agnostic_instruments[which(agnostic_instruments$rsid%in%independent_snps$ID),]

# Prepare exposure data (instrument)
exp_dat <- data.frame(
  SNP = agnostic_instruments$rsid,
  beta.exposure = agnostic_instruments$BETA,
  se.exposure = agnostic_instruments$SE,
  eaf.exposure = agnostic_instruments$A1FREQ,
  effect_allele.exposure = agnostic_instruments$ALLELE1,
  other_allele.exposure = agnostic_instruments$ALLELE0,
  pval.exposure = agnostic_instruments$Pval,
  samplesize.exposure = agnostic_instruments$N,
  chr.exposure = agnostic_instruments$CHROM,
  pos.exposure = agnostic_instruments$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"

# Columns needed by ld_clump(), though we clumped directly with Plink
# Kept for legacy:
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid

# Read in BMI data 
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
  SNP = bmi$SNP,
  beta.outcome = bmi$BETA,
  se.outcome = bmi$SE,
  eaf.outcome = bmi$A1FREQ,
  effect_allele.outcome = bmi$ALLELE1,
  other_allele.outcome = bmi$ALLELE0,
  pval.outcome = bmi$P,
  samplesize.outcome = 460000,         
  chr.outcome = bmi$CHR,
  pos.outcome = bmi$BP)

# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"

# Harmonize datasets for MR
 harmonized_data <- harmonise_data(
   exposure_dat = exp_dat,
   outcome_dat = out_dat)
 
 dat_indep_steiger <- steiger_filtering(harmonized_data)
 write.csv(dat_indep_steiger,"/Users/charleenadams/acy1_bmi/harmonized_data_steiger_mr.csv")
 
 mr_expanded_plt <- mr_pleiotropy_test(dat_indep_steiger)
 mr_expanded_plt

##   id.exposure id.outcome outcome exposure egger_intercept          se      pval
## 1        ACY1        BMI     BMI     ACY1     0.001501008 0.002049837 0.4760925

 mr_expanded_het <- mr_heterogeneity(dat_indep_steiger)
 mr_expanded_het

##   id.exposure id.outcome outcome exposure                    method        Q
## 1        ACY1        BMI     BMI     ACY1                  MR Egger 11.24004
## 2        ACY1        BMI     BMI     ACY1 Inverse variance weighted 11.77624
##   Q_df    Q_pval
## 1   14 0.6670847
## 2   15 0.6958811

 mr_expanded_res <- mr(dat_indep_steiger)
 #mr_expanded_res

 # Round numerical values in the 'res' dataframe to desired decimal places
res_rounded <- mr_expanded_res
num_cols <- sapply(res_rounded, is.numeric)  # Identify numeric columns
res_rounded[, num_cols] <- lapply(res_rounded[, num_cols], round, digits = 3)
res_rounded$id.exposure=NULL
res_rounded$id.outcome=NULL

# Filter out rows where the method is "Simple mode"
res_rounded <- res_rounded[res_rounded$method != "Simple mode", ]
res_rounded <- res_rounded[order(res_rounded$method),]

# Generate the table
kable(
  res_rounded,
  format = "html",
  caption = "MR Expanded Results for ACY1 on BMI",
  align = "c"
) %>%
  kable_styling(full_width = FALSE, position = "center", bootstrap_options = c("striped", "hover")) %>%
  column_spec(1:7, width = "10em")  # Adjust column width as necessary

MR Expanded Results for ACY1 on BMI
	outcome	exposure	method	nsnp	b	se	pval
3	BMI	ACY1	Inverse variance weighted	16	0.018	0.009	0.051
1	BMI	ACY1	MR Egger	16	0.013	0.011	0.251
2	BMI	ACY1	Weighted median	16	0.015	0.013	0.267
5	BMI	ACY1	Weighted mode	16	0.008	0.016	0.609

# Prepare the data for plotting
res_plot <- res_rounded %>%
  mutate(
    lower = b - 1.96 * se,  # Lower bound of the confidence interval
    upper = b + 1.96 * se,  # Upper bound of the confidence interval
    method = factor(method, levels = unique(method))  # Ensure consistent ordering of methods
  )

# Create the forest plot
ggplot(res_plot, aes(x = method, y = b, ymin = lower, ymax = upper, color = method)) +
  geom_pointrange(size = 0.8) +  # Points with error bars
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey") +  # Add a dashed line at 0
  coord_flip() +  # Flip coordinates for horizontal orientation
  labs(
    title = "Forest Plot of Expanded MR Results for ACY1 on BMI",
    x = "Method",
    y = "Effect Size (Beta)",
    color = "Method"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "top",
    axis.title.y = element_text(size = 12),
    axis.title.x = element_text(size = 12)
  )

ggsave("forest_plot_expanded_acy1_bmi.png", width = 8, height = 6, dpi = 300) 

## MRPRESSO

# Prepare the data for MR-PRESSO
mrpresso_input <- data.frame(
  SNP = dat_indep_steiger$rsid,               # SNP identifiers
  Y_effect = dat_indep_steiger$beta.outcome,  # Outcome effect sizes
  Y_se = dat_indep_steiger$se.outcome,        # Outcome standard errors
  E1_effect = dat_indep_steiger$beta.exposure, # Exposure effect sizes
  E1_se = dat_indep_steiger$se.exposure       # Exposure standard errors
)

# Check the structure of the input data
#print(head(mrpresso_input))

# Run MR-PRESSO using the global method
tryCatch({
  mrpresso_results <- mr_presso(
    BetaOutcome = "Y_effect",         # Outcome effect size column
    BetaExposure = "E1_effect",      # Exposure effect size column
    SdOutcome = "Y_se",              # Outcome standard error column
    SdExposure = "E1_se",            # Exposure standard error column
    OUTLIERtest = TRUE,              # Run the outlier test
    DISTORTIONtest = TRUE,           # Run the distortion test
    data = mrpresso_input,           # Input data
    NbDistribution = 1000,           # Number of bootstrap iterations
    SignifThreshold = 0.05           # Significance threshold
  )
  
  # Display the results
  print(mrpresso_results)
  
  # Save the results to a file
  saveRDS(mrpresso_results, file = "/Users/charleenadams/acy1_bmi/mrpresso_results.rds")
  cat("MR-PRESSO analysis completed successfully. Results saved to /Users/charleenadams/acy1_bmi/mrpresso_results.rds\n")
}, error = function(e) {
  cat("Error encountered during MR-PRESSO analysis:\n")
  cat(e$message, "\n")
})

## $`Main MR results`
##    Exposure       MR Analysis Causal Estimate          Sd   T-stat    P-value
## 1 E1_effect               Raw      0.01810085 0.008201552 2.207003 0.04331256
## 2 E1_effect Outlier-corrected              NA          NA       NA         NA
## 
## $`MR-PRESSO results`
## $`MR-PRESSO results`$`Global Test`
## $`MR-PRESSO results`$`Global Test`$RSSobs
## [1] 14.00632
## 
## $`MR-PRESSO results`$`Global Test`$Pvalue
## [1] 0.668
## 
## 
## 
## MR-PRESSO analysis completed successfully. Results saved to /Users/charleenadams/acy1_bmi/mrpresso_results.rds

# Save MR-PRESSO results
write.csv(mrpresso_results$`Main MR results`, file = "mrpresso_results.csv", row.names = FALSE)

# Extract main results from MR-PRESSO output
mrpresso_main_results <- as.data.frame(mrpresso_results$`Main MR results`)

# Generate a kable table
kable(
  mrpresso_main_results,
  format = "html",
  caption = "MR-PRESSO Results for ACY1 on BMI",
  align = "c"
) %>%
  kable_styling(full_width = FALSE, position = "center", bootstrap_options = c("striped", "hover"))

MR-PRESSO Results for ACY1 on BMI
Exposure	MR Analysis	Causal Estimate	Sd	T-stat	P-value
E1_effect	Raw	0.0181008	0.0082016	2.207003	0.0433126
E1_effect	Outlier-corrected	NA	NA	NA	NA

12 MRlap

MRlap accounts for the winner’s curse, which refers to the tendency for associations found in the initial discovery sample to be overestimated when evaluated in a subsequent sample. In MR studies, this can lead to inflated estimates of causal effects. IMPORTANTLY, we used the UKB-PPP “Discovery” pQTLs for ACY1. For this reason, the strength of the association between SNPs and ACY1 could be inflated and lead to incorrect conclusions.

MRlap uses cross-trait linkage disequilibrium score regression (LDSC) to estimate and adjust for winner’s curse and sample overlap. By doing so, it provides more accurate estimates of causal effects, reducing the likelihood of overestimating the strength of associations.

library(TwoSampleMR)
library(MRlap)
library(data.table)

# Step 1: Load the dataset
clumping_results <- fread("/Users/charleenadams/acy1_bmi/clumped_results_large.clumps")
# Step 1: Rename the `#CHROM` column to `CHROM`
setnames(clumping_results, old = "#CHROM", new = "CHROM")

# Filter rows where SP2 is "."
independent_snps <- clumping_results[SP2 == ".", .(CHROM, POS, ID, P)]

# Save the independent SNPs to a new file
fwrite(
  independent_snps,
  file = "independent_snps_for_MR.txt",
  sep = "\t",
  col.names = TRUE
)
#cat("Independent SNPs have been extracted and saved to 'independent_snps_for_MR.txt'.")

# Read in the ACY1 data
acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")

exp_dat <- data.frame(
  SNP = acy1$rsid,
  beta.exposure = acy1$BETA,
  se.exposure = acy1$SE,
  eaf.exposure = acy1$A1FREQ,
  effect_allele.exposure = acy1$ALLELE1,
  other_allele.exposure = acy1$ALLELE0,
  pval.exposure = acy1$Pval,
  samplesize.exposure = acy1$N,
  chr.exposure = acy1$CHROM,
  pos.exposure = acy1$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"

# Columns needed by ld_clump(), though we clumped directly with Plink
# Kept for legacy:
exp_dat$rsid <- exp_dat$SNP
exp_dat$pval <- exp_dat$pval.exposure
exp_dat$id <- exp_dat$rsid

# Read in BMI data 
bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
  SNP = bmi$SNP,
  beta.outcome = bmi$BETA,
  se.outcome = bmi$SE,
  eaf.outcome = bmi$A1FREQ,
  effect_allele.outcome = bmi$ALLELE1,
  other_allele.outcome = bmi$ALLELE0,
  pval.outcome = bmi$P,
  samplesize.outcome = 460000,         
  chr.outcome = bmi$CHR,
  pos.outcome = bmi$BP)

# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"

# Step 2: Prepare the exposure data
exp_dat <- as.data.table(exp_dat)
exposure_data <- exp_dat[, list(
  rsid = SNP,                           # SNP identifier
  a1 = effect_allele.exposure,          # Alternate (effect) allele
  a2 = other_allele.exposure,           # Reference allele
  beta = beta.exposure,                 # Effect size
  se = se.exposure,                     # Standard error
  N = samplesize.exposure,              # Sample size
  chr = chr.exposure,                   # Chromosome
  pos = pos.exposure                    # Position
)]

# Step 3: Save the exposure data
fwrite(
  exposure_data,
  file = "/Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv",
  sep = "\t"
)
#cat("Exposure data saved to: /Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv\n")

# Step 4: Prepare the outcome data
out_dat <- as.data.table(out_dat)
outcome_data <- out_dat[, list(
  rsid = SNP,                           # SNP identifier
  a1 = effect_allele.outcome,           # Alternate (effect) allele
  a2 = other_allele.outcome,            # Reference allele
  beta = beta.outcome,                  # Effect size
  se = se.outcome,                      # Standard error
  N = samplesize.outcome,               # Sample size
  chr = chr.outcome,                    # Chromosome
  pos = pos.outcome                     # Position
)]

# Step 5: Save the outcome data
fwrite(
  outcome_data,
  file = "/Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv",
  sep = "\t"
)
#cat("Outcome data saved to: /Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv\n")

# Step 6: Summary for verification
#cat("\nSummary of prepared datasets:\n")
#cat("Exposure data:\n")
#cat("\nOutcome data:\n")

# Load necessary libraries
# Step 1: Set file paths
exposure_file <- "/Users/charleenadams/acy1_bmi/exposure_data_MRlap.tsv"
outcome_file <- "/Users/charleenadams/acy1_bmi/outcome_data_MRlap.tsv"

# Step 2: Set file paths for LDSC input (required by MRlap)
ld_path <- "/Users/charleenadams/acy1_bmi/eur_w_ld_chr"  # Path to LD score files
hm3_file <- "/Users/charleenadams/acy1_bmi/eur_w_ld_chr/w_hm3.snplist"  # Path to HapMap3 SNP list

# Step 3: Read in the exposure and outcome data
exposure_data <- fread(exposure_file)
outcome_data <- fread(outcome_file)
exposure_data <- as.data.frame(exposure_data)
outcome_data <- as.data.frame(outcome_data)

# Step 4: Run MRlap
results <- MRlap(
  exposure = exposure_data,
  exposure_name = "ACY1",
  outcome = outcome_data,
  outcome_name = "BMI",
  ld = ld_path,
  hm3 = hm3_file
)

## <<< Preparation of analysis >>> 
## > Checking parameters 
## The p-value threshold used for selecting MR instruments is: 5e-08 
## The distance used for pruning MR instruments is:  500 Kb 
## Distance-based pruning will be used for MR instruments 
## > Processing exposure (ACY1) summary statistics... 
## # Preparation of the data... 
## The data.frame used as input is: "exposure_data".  
##    SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - BETA column, ok - SE column, ok - N column, ok 
## > Processing outcome (BMI) summary statistics... 
## # Preparation of the data... 
## The data.frame used as input is: "outcome_data".  
##    SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - BETA column, ok - SE column, ok - N column, ok 
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 
## <<< Performing cross-trait LDSC >>>  
## > Munging exposure data...

## > Munging outcome data... 
## > Running cross-trait LDSC... 
##   Please consider saving the log files and checking them to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files
## > Cleaning temporary files... 
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 
## <<< Running IVW-MR >>>  
## > Identifying IVs... 
##     359 IVs with p < 5e-08 
##     0 IVs excluded - more strongly associated with the outcome than with the exposure, p < 1e-03 
##    Pruning : distance :  500 Kb 
##     18 IVs left after pruning 
## > Performing MR 
##     IVW-MR observed effect: 0.000643 ( 0.0151 )
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 
## <<< Estimating corrected effect >>>  
## > Estimating genetic architecture parameters... 
## > Estimating corrected effect... 
##     corrected effect: -0.00218 ( 0.0171 )
##     covariance between observed and corrected effect: 0.00026 
##             8000 simulations were used to estimate the variance and the covariance.
## > Testing difference between observed and corrected effect... 
## Runtime of the analysis:  5  minute(s) and  1  second(s).

# Step 6: Print a summary of results
#cat("MRlap analysis completed. Results saved to: /Users/charleenadams/acy1_bmi/mrlap_results.tsv\n")

13 MRlap Results

13.1 1. MR Correction Results

This table summarizes the results after correcting for biases in the MR analysis.

Observed Effect: The initial observed causal effect was 0.000643 with a standard error of 0.0151, based on 18 instruments (IVs).
Corrected Effect: After correction, the causal effect was -0.00218, remaining statistically non-significant (p = 0.897).
Test Diff: The observed and corrected effects did not differ significantly (p = 0.221), suggesting minimal bias correction was required.

MR Correction Results
observed_effect	observed_effect_se	m_IVs	IVs	observed_effect_p	corrected_effect	corrected_effect_se	corrected_effect_p	test_difference	p_difference
0.000643	0.0151	18	rs10119644, rs112875651, rs121912698, rs1260326, …	0.966	-0.00218	0.0169	0.897	1.22	0.221

13.2 LDSC Results

This table presents the linkage disequilibrium score regression (LDSC) results.

Heritability (h²):
- Exposure (ACY1): 15.4% (SE = 1.73%)
- Outcome (BMI): 26% (SE = 0.705%)
Genetic Correlation (rg): The genetic correlation between exposure and outcome is 0.353, indicating moderate shared genetic etiology.
Intercept Terms: Intercepts suggest low cross-trait confounding (e.g., cross-trait intercept = 0.088 with SE = 0.0075).

13.3 Interpretation of LDSC Cross-Trait Intercept

Cross-Trait Intercept: 0.088 (SE = 0.0075)
95% Confidence Interval: (0.073, 0.103)

13.3.1 Implications for our Benson and MR-PRESSO MR:

Evidence of Low Sample Overlap Bias:
- The cross-trait intercept is close to zero, and its 95% confidence interval does not include values that would indicate substantial bias.
- This strongly suggests that sample overlap is unlikely to be a major source of bias in our MR results.
Robustness of MR Findings:
- With minimal evidence of confounding from sample overlap, the MR estimates are more likely to reflect true causal relationships rather than artifacts of shared samples.

13.3.2 Caution Still Warranted, Though:

The LDSC intercept analysis supports the validity of our MR results by demonstrating that cross-trait confounding due to sample overlap is negligible. However, the fact that the corrected effect estimate is null indicates that we should be cautious in claiming causality from the IVW and MR-PRESSO analyses of ACY1 on BMI, which use the DISCOVERY UKB-PPP instruments. If there are other pQTL sources for ACY1, we should perform this analysis with them and compare.

LDSC Results
h2_exp	h2_exp_se	int_exp	h2_out	h2_out_se	int_out	gcov	gcov_se	rg	int_crosstrait	int_crosstrait_se
0.154	0.0173	0.99	0.26	0.00705	1.08	0.0707	0.00657	0.353	0.088	0.0075

13.4 Genetic Architecture Results

This table assesses the polygenicity of the exposure trait (ACY1).

Polygenicity: A polygenicity estimate of 0.000546 indicates that ACY1 is influenced by a small number of SNPs.
Per SNP Heritability: Each SNP contributes approximately 0.000245 to the total heritability, highlighting the sparse genetic architecture.

Genetic Architecture Results
polygenicity	perSNP_heritability
0.000546	0.000245

13.5 Summary Interpretation of MRlap Results

The MRlap analysis revealed the following key insights:

The observed and corrected causal effects of ACY1 on BMI were near zero and statistically non-significant, suggesting no robust causal link.
LDSC analysis indicated a moderate genetic correlation (rg = 0.353) between ACY1 and BMI, consistent with shared genetic architecture but not causality.
The genetic architecture of ACY1 is sparse, with a low polygenicity estimate (0.000546), meaning only a small subset of SNPs contributes to its heritability.

These results collectively suggest that while ACY1 and BMI share some genetic background, ACY1 may not exert a strong causal influence on BMI. But MRlap isn’t set up to address rare variants…

13.6 Saving the MRlap Results

# Load necessary library
library(openxlsx)

# Create a new Excel workbook
wb <- createWorkbook()

# Function to handle list elements and write them to Excel
write_list_to_sheet <- function(wb, sheet_name, data) {
  addWorksheet(wb, sheet_name)
  # If the data is a list, convert it to a data frame
  if (is.list(data)) {
    data <- as.data.frame(t(data))
    colnames(data) <- names(data)
  }
  writeData(wb, sheet_name, data)
}

# Write MRcorrection results
write_list_to_sheet(wb, "MRcorrection", results$MRcorrection)

# Write LDSC results
write_list_to_sheet(wb, "LDSC", results$LDSC)

# Write GeneticArchitecture results
write_list_to_sheet(wb, "GeneticArchitecture", results$GeneticArchitecture)

# Save the workbook to your desired path
saveWorkbook(wb, file = "/Users/charleenadams/acy1_bmi/mrlap_results.xlsx", overwrite = TRUE)

cat("MRlap results have been saved to 'mrlap_results.xlsx' with three tabs.")

## MRlap results have been saved to 'mrlap_results.xlsx' with three tabs.

sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.49        MRPRESSO_1.0      kableExtra_1.4.0  MRlap_0.0.3.2    
##  [5] DT_0.33           biomaRt_2.62.0    readxl_1.4.3      TwoSampleMR_0.6.8
##  [9] corrplot_0.95     tidyr_1.3.1       ggrepel_0.9.6     data.table_1.16.2
## [13] dplyr_1.1.4       openxlsx_4.2.7.1  ggplot2_3.5.1    
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3               mnormt_2.1.1            httr2_1.0.6            
##   [4] rlang_1.1.4             magrittr_2.0.3          e1071_1.7-16           
##   [7] compiler_4.4.2          RSQLite_2.3.7           gdata_3.0.1            
##  [10] png_0.1-8               systemfonts_1.1.0       vctrs_0.6.5            
##  [13] quadprog_1.5-8          stringr_1.5.1           pkgconfig_2.0.3        
##  [16] crayon_1.5.3            fastmap_1.2.0           dbplyr_2.5.0           
##  [19] XVector_0.46.0          labeling_0.4.3          splitstackshape_1.4.8  
##  [22] pbivnorm_0.6.0          utf8_1.2.4              rmarkdown_2.29         
##  [25] tzdb_0.4.0              UCSC.utils_1.2.0        ragg_1.3.3             
##  [28] purrr_1.0.2             bit_4.5.0               xfun_0.49              
##  [31] zlibbioc_1.52.0         cachem_1.1.0            GenomeInfoDb_1.42.0    
##  [34] jsonlite_1.8.9          progress_1.2.3          blob_1.2.4             
##  [37] psych_2.4.6.26          lavaan_0.6-19           parallel_4.4.2         
##  [40] prettyunits_1.2.0       R6_2.5.1                bslib_0.8.0            
##  [43] stringi_1.8.4           jquerylib_0.1.4         cellranger_1.1.0       
##  [46] iterators_1.0.14        Rcpp_1.0.13-1           R.utils_2.12.3         
##  [49] readr_2.1.5             IRanges_2.40.0          Matrix_1.7-1           
##  [52] tidyselect_1.2.1        rstudioapi_0.17.1       yaml_2.3.10            
##  [55] codetools_0.2-20        doParallel_1.0.17       curl_6.0.0             
##  [58] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
##  [61] Biobase_2.66.0          withr_3.0.2             KEGGREST_1.46.0        
##  [64] evaluate_1.0.1          proxy_0.4-27            BiocFileCache_2.14.0   
##  [67] zip_2.3.1               xml2_1.3.6              Biostrings_2.74.0      
##  [70] pillar_1.9.0            filelock_1.0.3          foreach_1.5.2          
##  [73] stats4_4.4.2            generics_0.1.3          vroom_1.6.5            
##  [76] S4Vectors_0.44.0        hms_1.1.3               munsell_0.5.1          
##  [79] scales_1.3.0            gtools_3.9.5            class_7.3-22           
##  [82] glue_1.8.0              tools_4.4.2             GenomicSEM_0.0.5       
##  [85] grid_4.4.2              crosstalk_1.2.1         AnnotationDbi_1.68.0   
##  [88] colorspace_2.1-1        nlme_3.1-166            GenomeInfoDbData_1.2.13
##  [91] cli_3.6.3               rappdirs_0.3.3          textshaping_0.4.0      
##  [94] fansi_1.0.6             viridisLite_0.4.2       svglite_2.1.3          
##  [97] gtable_0.3.6            R.methodsS3_1.8.2       sass_0.4.9             
## [100] digest_0.6.37           mgsub_1.7.3             BiocGenerics_0.52.0    
## [103] htmlwidgets_1.6.4       farver_2.1.2            R.oo_1.27.0            
## [106] memoise_2.0.1           htmltools_0.5.8.1       lifecycle_1.0.4        
## [109] httr_1.4.7              bit64_4.5.2

14 Summary

Overall, the MR analysis supports a potential causal influence of ACY1 on BMI due to:

The use of a rare variant with a strong effect on ACY1, serving as a robust instrument for standard IVW MR, with confirmation by MR-PRESSO using an expanded list of instruments.
Potential bias from sample overlap is negligible based on the LDSC intercept test.
Winner’s curse may still have inflated the MR estimate.

Thus, MRlap failed to confirm the effect because:

It relies on polygenic signals and cross-trait LD correlations, which are less sensitive to rare variants with strong effects and/or our IVW and MR-PRESSO analyses are biased due to use of UKB-PPP discovery samples for the ACY1 instruments.

15 Next Steps

Look for other sources of ACY1 instruments.
Consider running other MR methods, such as GSMR (Generalized Summary-data-based Mendelian Randomization)… NOTE: We opted to zoom in on the Benson instruments.

The remaining portion of the document examines the Benson instruments with Wald ratios and combinations of IVW analyses.

NOTE: Although it isn’t necessary, I have calculated the weights for the 3-SNP IVW analysis by hand and performed the 3-SNP IVW by hand to show you how that is done.

15.1 Formulas for IVW Mendelian Randomization

IVW Estimate (\(\beta_{IVW}\)):

\[ \beta_{IVW} = \frac{\sum w_i \cdot \beta_{\text{exposure},i} \cdot \beta_{\text{outcome},i}}{\sum w_i \cdot \beta_{\text{exposure},i}^2} \]

where \(w_i = \frac{1}{se_{\text{outcome},i}^2}\) are the inverse-variance weights.
Standard Error (\(se_{IVW}\)):

\[ se_{IVW} = \sqrt{\frac{1}{\sum w_i \cdot \beta_{\text{exposure},i}^2}} \]
p-value:

\[ p = 2 \cdot \text{pnorm}(-|\beta_{IVW} / se_{IVW}|) \]

16 Wald ratio and 2-SNP IVW Analyses

acy1 <- fread("/Users/charleenadams/acy1_bmi/harmonized_summary_statistics.txt")

# Instrument suggestion from Mark
benson_list <- c("rs121912698","rs121912701","rs34017492")
instrument <- acy1[which(acy1$rsid%in%benson_list),]

# Prepare exposure data (instrument)
exp_dat <- data.frame(
  SNP = instrument$rsid,
  beta.exposure = instrument$BETA,
  se.exposure = instrument$SE,
  eaf.exposure = instrument$A1FREQ,
  effect_allele.exposure = instrument$ALLELE1,
  other_allele.exposure = instrument$ALLELE0,
  pval.exposure = instrument$Pval,
  samplesize.exposure = instrument$N,
  chr.exposure = instrument$CHROM,
  pos.exposure = instrument$POS19)

# Set the exposure label
exp_dat$exposure <- "ACY1"
exp_dat$id.exposure <- "ACY1"

bmi <- fread("/Users/charleenadams/acy1_bmi/GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv")
out_dat <- data.frame(
  SNP = bmi$SNP,
  beta.outcome = bmi$BETA,
  se.outcome = bmi$SE,
  eaf.outcome = bmi$A1FREQ,
  effect_allele.outcome = bmi$ALLELE1,
  other_allele.outcome = bmi$ALLELE0,
  pval.outcome = bmi$P,
  samplesize.outcome = 460000,         
  chr.outcome = bmi$CHR,
  pos.outcome = bmi$BP)

# Set the outcome labels
out_dat$outcome <- "BMI"
out_dat$id.outcome <- "BMI"

# Harmonize datasets for MR
harmonized_data <- harmonise_data(
  exposure_dat = exp_dat,
  outcome_dat = out_dat)

# Calculate weights as the inverse variance of beta.outcome
harmonized_data$weights <- 1 / (harmonized_data$se.outcome^2)

# Calculate the numerator and denominator for the IVW estimate
harmonized_data$IVW_numerator <- harmonized_data$weights * harmonized_data$beta.exposure * harmonized_data$beta.outcome
harmonized_data$IVW_denominator <- harmonized_data$weights * (harmonized_data$beta.exposure^2)

# Sum across all SNPs to get the IVW estimate
beta_IVW <- sum(harmonized_data$IVW_numerator) / sum(harmonized_data$IVW_denominator)

# Calculate the standard error
se_IVW <- sqrt(1 / sum(harmonized_data$IVW_denominator))

# Calculate the p-value
p_IVW <- 2 * pnorm(-abs(beta_IVW / se_IVW))

# Add results as new columns
harmonized_data$beta_IVW <- beta_IVW
harmonized_data$se_IVW <- se_IVW
harmonized_data$p_IVW <- p_IVW

# View the results
print(harmonized_data)

##           SNP effect_allele.exposure other_allele.exposure
## 1 rs121912698                      T                     C
## 2 rs121912701                      A                     G
## 3  rs34017492                      G                     C
##   effect_allele.outcome other_allele.outcome beta.exposure beta.outcome
## 1                     T                    C     -1.691910   -0.0311163
## 2                     A                    G     -0.520153   -0.0214348
## 3                     G                    C      0.487914    0.0115997
##   eaf.exposure eaf.outcome remove palindromic ambiguous id.outcome se.outcome
## 1   0.00440744    0.003927  FALSE       FALSE     FALSE        BMI  0.0150565
## 2   0.00476360    0.004107  FALSE       FALSE     FALSE        BMI  0.0147407
## 3   0.00235621    0.002368  FALSE        TRUE     FALSE        BMI  0.0219029
##   pval.outcome samplesize.outcome chr.outcome pos.outcome outcome se.exposure
## 1    0.0387695             460000           3    52022837     BMI   0.0537128
## 2    0.1459120             460000           3    52023042     BMI   0.0515046
## 3    0.5963920             460000           3    52018149     BMI   0.0820835
##   pval.exposure samplesize.exposure chr.exposure pos.exposure exposure
## 1 8.933055e-218               33693            3     52022837     ACY1
## 2  5.571857e-24               33693            3     52023042     ACY1
## 3  2.779457e-09               33693            3     52018149     ACY1
##   id.exposure action SNP_index mr_keep  weights IVW_numerator IVW_denominator
## 1        ACY1      2         1    TRUE 4411.151     232.22937      12627.1825
## 2        ACY1      2         1    TRUE 4602.182      51.31145       1245.1624
## 3        ACY1      2         1    TRUE 2084.475      11.79741        496.2303
##     beta_IVW      se_IVW      p_IVW
## 1 0.02055446 0.008342441 0.01374567
## 2 0.02055446 0.008342441 0.01374567
## 3 0.02055446 0.008342441 0.01374567

# Repeat the 3-SNP meta-analytic analysis performed earlier; just to have it here too for easy viewing.
metaMR3 <- mr(harmonized_data)
metaMR3

##   id.exposure id.outcome outcome exposure                    method nsnp
## 1        ACY1        BMI     BMI     ACY1                  MR Egger    3
## 2        ACY1        BMI     BMI     ACY1           Weighted median    3
## 3        ACY1        BMI     BMI     ACY1 Inverse variance weighted    3
## 4        ACY1        BMI     BMI     ACY1               Simple mode    3
## 5        ACY1        BMI     BMI     ACY1             Weighted mode    3
##            b          se       pval
## 1 0.01090458 0.016409839 0.62661534
## 2 0.01909664 0.008561989 0.02572107
## 3 0.02055446 0.008342441 0.01374567
## 4 0.02109267 0.016009197 0.31834507
## 5 0.01850858 0.009186963 0.18152333

# Step 1: Run Wald ratio MR for each SNP individually
wald_results <- data.frame()
for (i in 1:nrow(harmonized_data)) {
  wald_result <- mr_wald_ratio(
    b_exp = harmonized_data$beta.exposure[i],
    b_out = harmonized_data$beta.outcome[i],
    se_exp = harmonized_data$se.exposure[i],
    se_out = harmonized_data$se.outcome[i]
  )
  wald_results <- rbind(wald_results, data.frame(
    SNP = harmonized_data$SNP[i],
    Estimate = wald_result$b,
    SE = wald_result$se,
    P = wald_result$pval
  ))
}

# Step 2: Run IVW MR for all combinations of two SNPs
library(gtools) # For combinations
ivw_results <- data.frame()
snp_combinations <- combinations(n = length(benson_list), r = 2, v = benson_list)
for (i in 1:nrow(snp_combinations)) {
  selected_snps <- snp_combinations[i, ]
  subset_data <- harmonized_data[harmonized_data$SNP %in% selected_snps, ]
  
  # Perform IVW MR
  ivw_result <- mr_ivw(
    b_exp = subset_data$beta.exposure,
    b_out = subset_data$beta.outcome,
    se_exp = subset_data$se.exposure,
    se_out = subset_data$se.outcome
  )
  
  ivw_results <- rbind(ivw_results, data.frame(
    SNPs = paste(selected_snps, collapse = ","),
    Estimate = ivw_result$b,
    SE = ivw_result$se,
    P = ivw_result$pval
  ))
}

# Display Results
print("Wald Ratio Results for Individual SNPs:")

## [1] "Wald Ratio Results for Individual SNPs:"

print(wald_results)

##           SNP   Estimate          SE         P
## 1 rs121912698 0.01839123 0.008899114 0.0387685
## 2 rs121912701 0.04120864 0.028339162 0.1459120
## 3  rs34017492 0.02377407 0.044890903 0.5963917

print("IVW Results for SNP Combinations:")

## [1] "IVW Results for SNP Combinations:"

print(ivw_results)

##                      SNPs   Estimate          SE          P
## 1 rs121912698,rs121912701 0.02043929 0.008490340 0.01606841
## 2  rs121912698,rs34017492 0.01859477 0.008729243 0.03315765
## 3  rs121912701,rs34017492 0.03624046 0.023963576 0.13045395

# Save results to files
fwrite(wald_results, "/Users/charleenadams/acy1_bmi/wald_results.csv")
fwrite(ivw_results, "/Users/charleenadams/acy1_bmi/ivw_results.csv")

17 Cleaner Visual of the Benson MR Results

In this top table, I’m providing the betas for exposures and outcomes next to each other so that you can more easily see how they produce Wald ratios.

SNP	Effect Allele	Other Allele	Beta (Exposure)	Beta (Outcome)	SE (Outcome)	Wald Estimate	Wald SE	Wald P
rs121912698	T	C	-1.6919	-0.0311	0.0151	0.0184	0.0089	0.0388
rs121912701	A	G	-0.5202	-0.0214	0.0147	0.0412	0.0283	0.1459
rs34017492	G	C	0.4879	0.0116	0.0219	0.0238	0.0449	0.5964

17.0.1 3-SNP and 2-SNP Inverse Variance Weighted (IVW) MR

SNPs	Estimate	SE	P
rs121912698, rs121912701, rs34017492	0.0206	0.0083	0.0137
rs121912698, rs121912701	0.0204	0.0085	0.0161
rs121912698, rs34017492	0.0186	0.0087	0.0332
rs121912701, rs34017492	0.0362	0.0240	0.1305

17.0.2 3-SNP Meta-analytic MR

Method	Estimate	SE	P
Inverse Variance Weighted	0.0206	0.0083	0.0137
Weighted Median	0.0191	0.0087	0.0289
MR Egger	0.0109	0.0164	0.6266
Weighted Mode	0.0185	0.0093	0.1840

The Benson-SNP analysis of circulating ACY1 on BMI indicates that an INCREASE in circulating levels of ACY1 INCREASE BMI.

NOTE: MR results are interpreted in relation to higher levels of the instrumented trait, unless the GWAS was explicitly set up to reflect a decreasing trait (which is uncommon). The interpretation focuses on the trait, not the SNPs. For example, the minor allele of rs121912698 decreases ACY1 levels and also decreases BMI, meaning higher levels of ACY1 increase BMI. Similarly, the minor allele of rs34017492 increases ACY1 levels and also increases BMI. In this analysis, all three SNPs show a consistent alignment between the direction of their effects on ACY1 and BMI with rs121912698 and rs121912701 both decreasing ACY1 and BMI and rs34017492 increasing ACY1 and increasing BMI.

However, this is not always the case. Another pattern seen in MR occurs when the minor allele’s effect on the exposure trait is opposite to its effect on the outcome trait. In such instances, the Wald ratio (calculated as the outcome beta divided by the exposure beta) will indicate that a higher level of the exposure decreases the outcome. This relationship holds regardless of whether the exposure beta is negative and the outcome beta positive, or vice versa. For instance, suppose the minor allele for rs123 increases ACY1 but decreases BMI. The MR interpretation is that an increase in the levels of ACY1 decrease BMI. But suppose rs123 decreases ACY1 but increases BMI. The MR interpretation is still that an increase in the levels of ACY1 decrease BMI.