Mendelian randomization (MR) of N-acetyltaurine (NAT) (1 Mb around PTER TSS) and body mass index (BMI)

1 Methods

1.1 Study Overview

We conducted a two-sample Mendelian randomization (MR) analysis to evaluate the causal effect of N-acetyltaurine (NAT) levels on body mass index (BMI), using genetic instruments for NAT located within a 1 Mb window centered on the transcription start site (TSS) of the PTER gene. Genetic associations with NAT were taken from the METSIM cohort, and BMI summary statistics from the Jurgens et al. (2022) UK Biobank (UKB) GWAS. All analyses were performed in R (version 4.x) and Python (version 3.x); the packages and tools used are listed in Section 1.7.

1.2 Data Sources

1.2.1 Exposure Data: N-Acetyltaurine (NAT)

Summary statistics for NAT were sourced from the METSIM (Metabolic Syndrome in Men) study, a population-based cohort of Finnish men. The NAT data, available in GRCh38 coordinates, were downloaded from the PheWeb repository (https://pheweb.org/metsim-metab/download/C100005466) as a compressed file (C100005466). This file was decompressed using bgzip -c -d to yield C100005466.uncompressed, containing association statistics for NAT across 6,099 individuals. The dataset includes:

Chromosome (chrom)
Position (pos)
Reference (ref) and alternate (alt) alleles
Effect sizes (beta)
Standard errors (sebeta)
P-values (pval)
Minor allele frequencies (maf)
rsIDs (rsids)

1.2.2 Outcome Data: Body Mass Index (BMI)

BMI summary statistics were obtained from the Jurgens et al. (2022) GWAS of European ancestry individuals in the UK Biobank (https://personal.broadinstitute.org/ryank/Jurgens_Pirruccello_2022_GWAS_Sumstats.zip). The file GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv provides association results for 460,000 participants, originally reported in GRCh37 (hg19) coordinates. Key columns include:

Chromosome (CHR)
Base pair position (BP)
SNP identifier (SNP)
Effect allele (ALLELE1)
Other allele (ALLELE0)
Effect allele frequency (A1FREQ)
Beta coefficient (BETA)
Standard error (SE)
P-value (P)

1.3 Coordinate Conversion for BMI Data

To align the BMI data with the NAT data (GRCh38), we converted the GRCh37 coordinates to GRCh38 in three steps, using two Python scripts and the UCSC liftOver command-line tool:

Conversion to BED Format: A Python script (make_bed.py) transformed the BMI TSV file into a BED format, extracting chromosome, start (BP - 1 for 0-based coordinates), end (BP), and SNP ID. The output was saved as GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed.
LiftOver: The UCSC liftOver tool was applied using the chain file hg19ToHg38.over.chain.gz to map GRCh37 coordinates to GRCh38. Successfully mapped SNPs were written to GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed, with unmapped SNPs logged in unmapped_SNPs.bed.
Merging with Original Data: A second Python script (format_bmi.py) took the END coordinate of each lifted BED interval as the 1-based GRCh38 position (BP_GRCh38) and left-joined it onto the original TSV by the SNP column, reading the TSV in chunks of 500,000 rows (each sorted by SNP) to limit memory use. Columns were then renamed to the names expected by the TwoSampleMR R package: effect_allele (ALLELE1), other_allele (ALLELE0), eaf (A1FREQ), beta (BETA), se (SE), and pval (P), with the new position retained as BP_GRCh38. The final output was saved as merged_gwas_data_grch38_twosample.tsv.
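The coordinate bookkeeping in these steps can be sketched with the Python standard library. This is a minimal illustration, not format_bmi.py itself: it assumes the four-column BED layout written by make_bed.py (chrom, start, end, SNP), treats the END of each single-base interval as the 1-based GRCh38 position, and left-joins those positions onto GWAS rows so that SNPs liftOver could not map are kept without a position. The function names and toy rows are hypothetical.

```python
import csv
import io

def to_bed_interval(bp):
    """1-based SNP position -> BED interval (0-based start, half-open end)."""
    return bp - 1, bp

def lifted_positions(bed_text):
    """Map SNP ID -> 1-based GRCh38 position from a lifted 4-column BED (chrom, start, end, SNP)."""
    positions = {}
    for line in io.StringIO(bed_text):
        if not line.strip():
            continue
        _chrom, start, end, snp = line.rstrip("\n").split("\t")
        if int(end) - int(start) != 1:
            raise ValueError(f"{snp}: expected a single-base interval")
        positions[snp] = int(end)  # END of a single-base interval is the 1-based position
    return positions

def left_join(tsv_text, positions):
    """Add BP_GRCh38 to every GWAS row; SNPs absent from the lifted BED get None."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        row["BP_GRCh38"] = positions.get(row["SNP"])
        rows.append(row)
    return rows

# Toy example: made-up SNP IDs and positions
bed = "chr10\t16000099\t16000100\tsnp_a\n"
gwas = "SNP\tCHR\tBP\nsnp_a\t10\t16100000\nsnp_b\t10\t16200000\n"
for r in left_join(gwas, lifted_positions(bed)):
    print(r["SNP"], r["BP"], r["BP_GRCh38"])
```

The left join is what keeps unmapped SNPs in the output; they simply carry an empty GRCh38 position and drop out later when filtering on position.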

1.4 Data Cleaning and Harmonization

Both datasets underwent cleaning and harmonization in R using the data.table, dplyr, and tidyr packages:

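NAT Data: Rows with missing or invalid rsIDs were filtered out (!is.na(rsids) & rsids != "" & grepl("^rs", rsids)), and the data were formatted for MR analysis with columns including chr.exposure, pos.exposure, beta.exposure, se.exposure, pval.exposure, effect_allele.exposure (alt), other_allele.exposure (ref), eaf.exposure, samplesize.exposure (6,099), and SNP. Because eaf.exposure was taken from the reported minor allele frequency (maf), it equals the effect-allele frequency only where the alt allele is the minor allele.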
BMI Data: Similar filtering removed invalid SNPs, and the data were formatted with chr.outcome, pos.outcome, beta.outcome, se.outcome, pval.outcome, effect_allele.outcome, other_allele.outcome, eaf.outcome, samplesize.outcome (460,000), and SNP.
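Harmonization: The harmonise_data function from the TwoSampleMR package (default action = 2) aligned the BMI (outcome) effects to the NAT (exposure) effect alleles, flipping signs and strands where needed, and flagged palindromic SNPs whose allele frequencies were too close to 0.5 to infer strand (mr_keep = FALSE) so that they were excluded from MR. The harmonized dataset was saved as harmonized_nat_Jurgens_BMI_dat_[date].csv.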
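The harmonization logic can be sketched for a single SNP. This is a simplified illustration in the spirit of harmonise_data's default behaviour, not TwoSampleMR's implementation: it aligns the outcome to the exposure effect allele, tries the complementary strand for non-palindromic SNPs, and uses allele frequencies to resolve palindromic (A/T, C/G) SNPs. The 0.08 ambiguity band around 0.5 is an assumed value chosen for illustration, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
AMBIGUOUS_BAND = 0.08  # assumption: palindromic SNPs with |eaf - 0.5| < 0.08 are treated as ambiguous

def is_palindromic(a1, a2):
    """A/T or C/G SNPs read the same on both strands."""
    return COMPLEMENT.get(a1) == a2

def harmonise(exp_ea, exp_oa, exp_eaf, out_ea, out_oa, out_beta, out_eaf):
    """Align one outcome SNP to the exposure effect allele.

    Returns (outcome beta, outcome eaf, keep); keep=False plays the role of mr_keep = FALSE.
    """
    exp_ea, exp_oa, out_ea, out_oa = (a.upper() for a in (exp_ea, exp_oa, out_ea, out_oa))
    if is_palindromic(exp_ea, exp_oa):
        if {out_ea, out_oa} != {exp_ea, exp_oa}:
            return None, None, False
        if abs(exp_eaf - 0.5) < AMBIGUOUS_BAND or abs(out_eaf - 0.5) < AMBIGUOUS_BAND:
            return out_beta, out_eaf, False  # strand cannot be inferred from frequency
        if out_ea != exp_ea:  # alleles reported in swapped order
            out_beta, out_eaf = -out_beta, 1 - out_eaf
        if (exp_eaf < 0.5) != (out_eaf < 0.5):  # frequencies disagree: assume opposite strand
            out_beta, out_eaf = -out_beta, 1 - out_eaf
        return out_beta, out_eaf, True
    # Non-palindromic: try the reported strand, then the complementary strand
    for ea, oa in ((out_ea, out_oa), (COMPLEMENT.get(out_ea), COMPLEMENT.get(out_oa))):
        if (ea, oa) == (exp_ea, exp_oa):
            return out_beta, out_eaf, True
        if (ea, oa) == (exp_oa, exp_ea):
            return -out_beta, 1 - out_eaf, True
    return None, None, False  # alleles cannot be reconciled

print(harmonise("A", "G", 0.30, "G", "A", 0.02, 0.70))  # alleles swapped
print(harmonise("A", "T", 0.20, "A", "T", 0.02, 0.80))  # palindromic, opposite strand
print(harmonise("A", "T", 0.47, "A", "T", 0.02, 0.48))  # ambiguous palindromic SNP
```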

1.5 Defining the PTER Region

To focus on the PTER locus, we used the biomaRt R package to query the Ensembl database (GRCh38, hsapiens_gene_ensembl dataset) for the PTER gene’s TSS. Attributes retrieved included:

ensembl_gene_id
external_gene_name
chromosome_name
transcription_start_site
strand

The smallest TSS coordinate was selected (min(transcription_start_site)), which corresponds to the most upstream TSS for a gene on the plus strand, and a 1 Mb window (±500 kb) was defined around it on chromosome 10. The harmonized dataset was then restricted to SNPs within this window, using the lifted-over BMI positions (chr.outcome == pter_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end), and saved as filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv.
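Because min() returns the smallest coordinate, it picks the most upstream TSS only for a plus-strand gene; on the minus strand the most upstream TSS is the largest coordinate. A strand-aware version of the window selection is sketched below, assuming the strand attribute is coded 1 and -1 as Ensembl reports it; the helper names and TSS coordinates are made up for illustration.

```python
def upstream_tss(tss_positions, strand):
    """Most upstream TSS: smallest coordinate on the plus strand (1), largest on the minus strand (-1)."""
    return min(tss_positions) if strand == 1 else max(tss_positions)

def in_window(chrom, pos, target_chrom, centre, half_width=500_000):
    """True if a SNP lies within centre +/- half_width on the target chromosome (bounds inclusive).

    Chromosomes are compared as strings so that 10 and "10" match.
    """
    return str(chrom) == str(target_chrom) and centre - half_width <= pos <= centre + half_width

tss = [1_000_000, 1_002_500, 1_010_000]  # made-up TSS coordinates for one gene
centre = upstream_tss(tss, strand=1)
print(centre, in_window(10, 1_400_000, "10", centre), in_window(10, 1_600_000, "10", centre))
```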

1.6 Mendelian Randomization Analyses

We implemented two primary MR approaches to assess the causal effect of NAT on BMI within the PTER region, using R packages TwoSampleMR, MendelianRandomization, ieugwasr, and MRInstruments, with visualization via ggplot2.

1.6.1 Suite of MR Approaches

Instrument Selection: From the filtered dataset, SNPs retained after harmonization (mr_keep == TRUE) were selected at a relaxed p-value threshold of 5 × 10⁻⁶ (versus the conventional 5 × 10⁻⁸) to maximize power within the PTER TSS region, justified by prior evidence that this region is associated with NAT. Instruments were also required to have an F-statistic > 10, approximated per SNP as \[ F = \left( \frac{\beta_{exposure}}{SE_{exposure}} \right)^2 \]. Steiger filtering (steiger_filtering) then retained only SNPs explaining more variance in NAT than in BMI (steiger_dir == TRUE), consistent with an effect running from the exposure to the outcome.
Linkage Disequilibrium (LD) Clumping: Independent instruments were identified using ld_clump with a clumping window of 500 kb (versus the default 10,000 kb) and an r² threshold of 0.001, with LD taken from the 1000 Genomes European (EUR) reference panel (bfile = EUR).
MR Methods: Five MR methods were applied:
- Inverse Variance Weighted (IVW): Assumes all instruments are valid, or that any horizontal pleiotropy is balanced (mr_ivw).
- MR-Egger: Allows directional pleiotropy, estimated by the regression intercept, under the InSIDE assumption (mr_egger_regression).
- Weighted Median: Consistent if at least 50% of the weight comes from valid instruments (mr_weighted_median).
- MR-Lasso: Identifies and excludes pleiotropic outliers by penalizing SNP-specific intercepts with a lasso penalty (mr_lasso).
- Contamination Mixture (ConMix): Models the ratio estimates as a mixture of valid and invalid instruments (mr_conmix).
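Sensitivity Analyses: Heterogeneity was assessed with mr_heterogeneity (Cochran's Q), directional pleiotropy with mr_pleiotropy_test (MR-Egger intercept), and the influence of individual SNPs with mr_leaveoneout. Wald ratios were also computed for each instrument individually, with standard errors from the second-order delta method (assuming independent exposure and outcome estimates).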
Output: Results were saved in an Excel file (MR_Results_PTER_Jurgens_BMI_[date].xlsx) with sheets for each method, heterogeneity, pleiotropy, and instrument details, alongside a forest plot (MR_Forestplot_PTER_Jurgens_BMI_[date].png) generated using ggplot2.
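The point estimators behind these methods are short enough to sketch from their standard formulas. These are minimal stdlib illustrations, not TwoSampleMR's or MendelianRandomization's code: the IVW standard error follows a multiplicative random-effects scheme (residual variance floored at 1), MR-Egger returns only the slope and intercept, and the weighted-median weights (inverse second-order variance of each ratio estimate) are an assumption that may differ from a package's defaults. MR-Lasso and ConMix are omitted, and all function names are hypothetical.

```python
import math

def ivw(bx, by, sy):
    """IVW slope: weighted regression of by on bx through the origin (weights 1/sy^2).
    SE uses multiplicative random effects, never smaller than the fixed-effect SE."""
    w = [1 / s**2 for s in sy]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    b = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    k = len(bx)
    phi = sum(wi * (y - b * x) ** 2 for wi, x, y in zip(w, bx, by)) / (k - 1) if k > 1 else 1.0
    return b, math.sqrt(max(phi, 1.0) / sxx)

def cochran_q(bx, by, sy):
    """Cochran's Q around the IVW estimate; compare with a chi-square on k - 1 df."""
    b, _ = ivw(bx, by, sy)
    return sum((y - b * x) ** 2 / s**2 for x, y, s in zip(bx, by, sy))

def egger(bx, by, sy):
    """MR-Egger slope and intercept (weights 1/sy^2), after orienting SNPs so bx > 0."""
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1 / s**2 for s in sy]
    mx = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    my = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    slope = (sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
             / sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs)))
    return slope, my - slope * mx

def wald_ratio(bx, sx, by, sy):
    """Per-SNP ratio estimate, second-order delta-method SE, and two-sided p-value."""
    b = by / bx
    se = math.sqrt(sy**2 / bx**2 + by**2 * sx**2 / bx**4)
    return b, se, math.erfc(abs(b / se) / math.sqrt(2))

def weighted_median(bx, sx, by, sy):
    """Weighted median of the ratio estimates (needs at least 2 SNPs)."""
    ratios = sorted(wald_ratio(*v)[:2] for v in zip(bx, sx, by, sy))
    b = [r for r, _ in ratios]
    w = [1 / se**2 for _, se in ratios]
    total, running, p = sum(w), 0.0, []
    for wi in w:
        running += wi
        p.append((running - 0.5 * wi) / total)  # standardised cumulative weight at each estimate
    i = max(j for j, pj in enumerate(p) if pj < 0.5)
    return b[i] + (b[i + 1] - b[i]) * (0.5 - p[i]) / (p[i + 1] - p[i])

# Toy summary statistics for three hypothetical SNPs (not study data)
bx, sx = [0.30, 0.25, 0.40], [0.05, 0.05, 0.06]
by, sy = [-0.010, -0.006, -0.013], [0.002, 0.002, 0.002]
print("IVW", ivw(bx, by, sy))
print("Egger", egger(bx, by, sy))
print("Weighted median", weighted_median(bx, sx, by, sy))
print("Q", cochran_q(bx, by, sy))
```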

1.6.2 MR with COJO Instruments

Instrument Selection: Identical to the Suite approach, instruments were filtered at p < 5 × 10⁻⁶ and F > 10, followed by Steiger filtering.
Conditional Joint Analysis (COJO): The GCTA software (version 1.9x) was used to perform COJO analysis (gcta64 --cojo-slct --cojo-p 5e-6) on the filtered instruments, formatted as a tab-delimited file (cojo_input.txt) with SNP, alleles (A1, A2), frequency, beta, SE, p-value, and sample size. LD was estimated using the 1000 Genomes EUR reference panel. Conditionally independent SNPs were extracted from cojo_output.jma.cojo and merged back into the instrument set, updating beta, SE, and p-values.
MR Methods: The same five MR methods (IVW, MR-Egger, Weighted Median, MR-Lasso, ConMix) were applied to the COJO-selected instruments, with identical sensitivity analyses.
Output: Results were saved in MR_Results_PTER_Jurgens_BMI_COJO_[date].xlsx and visualized in MR_Forestplot_PTER_Jurgens_BMI_COJO_[date].png.
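The COJO input and output handling can be sketched as follows. This assumes the standard GCTA-COJO summary format (columns SNP, A1, A2, freq, b, se, p, N, with A1 the allele that freq and b refer to) and that the .jma.cojo output carries joint estimates in columns named bJ, bJ_se and pJ; both should be checked against the GCTA documentation for the version used. The mapping from the TwoSampleMR-style instrument table is an assumption, the function names are hypothetical, and because eaf.exposure holds the reported MAF, the freq column inherits that approximation.

```python
import csv
import io

COJO_COLUMNS = ["SNP", "A1", "A2", "freq", "b", "se", "p", "N"]

def write_cojo_input(instruments, handle):
    """Write a tab-delimited COJO summary file; A1 must be the allele that freq and b refer to."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COJO_COLUMNS)
    for s in instruments:
        writer.writerow([s["SNP"], s["effect_allele.exposure"], s["other_allele.exposure"],
                         s["eaf.exposure"], s["beta.exposure"], s["se.exposure"],
                         s["pval.exposure"], s["samplesize.exposure"]])

def read_joint_estimates(handle):
    """Read joint estimates (bJ, bJ_se, pJ) per SNP from a tab-delimited .jma.cojo file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {r["SNP"]: (float(r["bJ"]), float(r["bJ_se"]), float(r["pJ"])) for r in reader}

# Usage (hypothetical): with open("cojo_input.txt", "w", newline="") as fh: write_cojo_input(rows, fh)
```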

1.6.3 Alternative Approach

An additional MR analysis with a stricter p-value threshold (5 × 10⁻⁸) and a 10,000 kb clumping window was explored but not prioritized, as the relaxed threshold and smaller window better suited the PTER-specific hypothesis.
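To see why the clumping window matters, here is a simplified greedy clumping sketch: repeatedly take the most significant remaining SNP as an index SNP and drop other SNPs within the window whose r² with it exceeds the threshold. PLINK's clumping, which ld_clump calls, has further options (such as index and secondary p-value thresholds) and estimates r² from the reference panel; here r² comes from a hand-written dictionary, and the SNP names, positions and r² values are made up.

```python
def clump(snps, ld_r2, r2_threshold=0.001, window_kb=500):
    """Greedy LD clumping: keep the most significant remaining SNP, then drop SNPs
    within window_kb of it whose r^2 with it exceeds r2_threshold."""
    remaining = sorted(snps, key=lambda s: s["pval"])
    index_snps = []
    while remaining:
        lead = remaining.pop(0)
        index_snps.append(lead["rsid"])
        remaining = [
            s for s in remaining
            if abs(s["pos"] - lead["pos"]) > window_kb * 1000
            or ld_r2.get(frozenset((lead["rsid"], s["rsid"])), 0.0) <= r2_threshold
        ]
    return index_snps

snps = [
    {"rsid": "snp1", "pos": 16_000_000, "pval": 1e-9},
    {"rsid": "snp2", "pos": 16_200_000, "pval": 1e-7},
    {"rsid": "snp3", "pos": 16_800_000, "pval": 2e-6},
]
ld = {frozenset(("snp1", "snp2")): 0.30, frozenset(("snp1", "snp3")): 0.02}
print(clump(snps, ld, window_kb=500))     # snp3 is 800 kb from snp1, so it survives
print(clump(snps, ld, window_kb=10_000))  # the wider window also prunes snp3
```

With a window wider than the 1 Mb region, every pair of SNPs in the region is eligible for pruning, which is the trade-off the two settings represent.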

1.7 Statistical Software and Visualization

Languages: R (version 4.x) and Python (version 3.x).
R Packages: TwoSampleMR, MendelianRandomization, ieugwasr, MRInstruments, data.table, dplyr, tidyr, readxl, openxlsx, ggplot2, ggrepel, corrplot, RhpcBLASctl, biomaRt, scales.
Python Libraries: pandas, csv.
Other Tools: liftOver (UCSC), GCTA (version 1.9x), PLINK (via genetics.binaRies).
Visualization: Forest plots were created with ggplot2, using distinct colors for each MR method and error bars representing 95% confidence intervals, saved at 600 DPI.
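The error bars are symmetric 95% intervals, b ± 1.96·SE, and the MR-ConMix SE used for plotting is backed out of its reported confidence interval by inverting the same relation. A minimal sketch of both directions, using the exact normal quantile rather than the rounded 1.96 in the R code; the function names are hypothetical, and the back-calculation is only approximate for ConMix, whose intervals need not be symmetric.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def ci95(b, se):
    """Symmetric 95% confidence interval, as drawn by the forest-plot error bars."""
    return b - Z * se, b + Z * se

def se_from_ci(lower, upper):
    """Approximate SE from a 95% CI, assuming the interval is symmetric around the estimate."""
    return (upper - lower) / (2 * Z)

lo, hi = ci95(-0.05, 0.02)
print(round(lo, 4), round(hi, 4), round(se_from_ci(lo, hi), 4))
```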

2 Getting the Data

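# NAT (METSIM; GRCh38 coordinates)
wget https://pheweb.org/metsim-metab/download/C100005466
bgzip -c -d C100005466 > C100005466.uncompressed

# BMI (Jurgens et al. 2022; GRCh37 coordinates)
wget https://personal.broadinstitute.org/ryank/Jurgens_Pirruccello_2022_GWAS_Sumstats.zip
# From the archive, use: GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv
# Then use liftOver to obtain GRCh38 coordinates (see below)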
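If bgzip is not available, Python's gzip module can usually read the file directly, since BGZF is a gzip-compatible format. A minimal sketch that streams the file and applies the same rsID filter used later in R (rsids non-empty and starting with "rs"); the rsids column name comes from the PheWeb fields listed in 1.2.1, and the function name is hypothetical.

```python
import csv
import gzip
import re

RSID = re.compile(r"^rs")

def read_rsid_rows(path):
    """Stream a (b)gzipped, tab-delimited summary-statistics file and yield rows
    whose 'rsids' field is non-empty and starts with 'rs'."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            rsid = row.get("rsids") or ""
            if RSID.match(rsid):
                yield row

# Usage (hypothetical): n_rows = sum(1 for _ in read_rsid_rows("C100005466"))
```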

3 Bioinformatics Pipeline Prep

The BMI summary statistics were in GRCh37 coordinates. I wanted GRCh38 coordinates to match the NAT data and for other post-GWAS analyses.

This was a 3-step process:

Converting the BMI summary statistics to BED format.
Running liftOver to obtain GRCh38 coordinates.
Merging the lifted-over coordinates with the original TSV.

Step 1: Convert BMI from TSV to BED format

This first script (make_bed.py) writes a four-column BED file (chromosome, 0-based start, end, SNP ID) from the BMI summary statistics TSV.

import csv

# Open the input TSV file and prepare to write to the output BED file
with open('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv', 'r') as tsvfile, open('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed', 'w') as bedfile:
    tsv_reader = csv.DictReader(tsvfile, delimiter='\t')
    
    for row in tsv_reader:
        # Extract the necessary information
        chromosome = f"chr{row['CHR']}"
        start = int(row['BP']) - 1  # Convert to 0-based for BED format
        end = row['BP']
        snp_id = row['SNP']
        
        # Write to BED file
        bedfile.write(f"{chromosome}\t{start}\t{end}\t{snp_id}\n")

print("Conversion complete. Check 'GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed' for the output.")

Step 2: liftOver

# Arguments: input BED (GRCh37); chain file for GRCh37 (hg19) -> GRCh38 (hg38);
# output BED of successfully mapped SNPs; output file for unmapped SNPs.
liftOver \
  GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.bed \
  hg19ToHg38.over.chain.gz \
  GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed \
  unmapped_SNPs.bed
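A quick check of how many SNPs failed to lift over can be run on the unmapped file. liftOver writes each unmapped record preceded by a comment line giving the reason (for example `#Deleted in new`); this sketch assumes that layout and simply tallies records per reason. The function name is hypothetical.

```python
from collections import Counter

def summarise_unmapped(lines):
    """Tally unmapped BED records by the '#reason' comment line that precedes each one."""
    counts = Counter()
    reason = "unknown"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line.lstrip("#").strip()
        else:
            counts[reason] += 1
    return counts

# Usage (hypothetical): with open("unmapped_SNPs.bed") as fh: print(summarise_unmapped(fh))
example = ["#Deleted in new\n", "chr1\t99\t100\trs_x\n", "#Split in new\n", "chr2\t9\t10\trs_y\n"]
print(summarise_unmapped(example))
```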

Step 3: This second script (format_bmi.py) does the following:

Reads the BED File:

Reads GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed containing:
- Chromosome (CHR)
- Start and end positions
- SNP identifiers
Converts coordinates to 1-based:
- For a single-base BED interval, END (= START + 1) is the 1-based position, so END is copied to a new column called BP_GRCh38.

Sorts BED Data:

Sorts the BED data by the SNP column (pandas does not require sorted keys to merge; this only orders the data).

Processes the GWAS TSV File:

Reads the large GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv file in chunks (500,000 rows at a time) to limit memory use.
Sorts each chunk by SNP so that each merged chunk is ordered by SNP.

Merges:

Merges the chunked data with the BED file using the SNP column.
Adds the BP_GRCh38 (new genome build position) to the GWAS data.
Uses a left join to ensure all GWAS SNPs are preserved, even if there isn’t a matching position in the BED file.

Rename Columns:

Renames key columns to fit the format required by the TwoSampleMR package:

ALLELE1 → effect_allele
ALLELE0 → other_allele
A1FREQ → eaf (effect allele frequency)
BETA → beta
SE → se (standard error)
P → pval

Saves Merged Results:

Combines all the merged chunks into one large DataFrame.
Saves it as merged_gwas_data_grch38_original.tsv (original column names) before renaming, and as merged_gwas_data_grch38_twosample.tsv after renaming; the latter is the file used downstream with the TwoSampleMR package.

import pandas as pd

# Define chunk size for reading large TSV files in smaller parts
chunk_size = 500000  # Adjust based on memory availability

# Initialize an empty list to store the merged chunks
merged_chunks = []

try:
    # Step 1: Load the lifted-over BED file (no header row)
    bed_columns = ['CHR', 'START', 'END', 'SNP']
    bed_data = pd.read_csv('GWAS_sumstats_EUR__invnorm_bmi__TOTALsample_hg38.bed', sep='\t', header=None, names=bed_columns)
    print("Loaded BED file successfully.", flush=True)

    # Step 2: For a single-base BED interval, END (= START + 1) is the 1-based position
    bed_data['BP_GRCh38'] = bed_data['END']
    print("Converted BED file positions to one-based.", flush=True)

    # Step 3: Drop repeated SNP/position rows (e.g., multi-allelic sites sharing an rsID)
    # so the left join cannot multiply GWAS rows, then sort by SNP.
    # pandas does not need sorted keys to merge; sorting only orders the data.
    bed_data = bed_data.drop_duplicates(subset=['SNP', 'CHR', 'BP_GRCh38']).sort_values(by='SNP')
    print("Deduplicated and sorted BED data by SNP.", flush=True)

    # Step 4: Process the TSV file in chunks and left-join each chunk to the BED data
    tsv_path = 'GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv'
    for i, chunk in enumerate(pd.read_csv(tsv_path, sep='\t', chunksize=chunk_size), start=1):
        print(f"Processing chunk {i}...", flush=True)
        chunk = chunk.sort_values(by='SNP')  # keeps each merged chunk ordered by SNP

        # Both tables have a CHR column, so pandas suffixes them:
        # CHR_x = original GRCh37 chromosome (TSV), CHR_y = lifted chromosome (BED, e.g. 'chr10').
        # SNPs that failed liftOver are kept, with BP_GRCh38 = NaN.
        merged_chunk = pd.merge(chunk, bed_data[['CHR', 'SNP', 'BP_GRCh38']], on='SNP', how='left')
        merged_chunks.append(merged_chunk)

    # Step 5: Concatenate all merged chunks into one DataFrame
    merged_data = pd.concat(merged_chunks, ignore_index=True)
    print("Merged all chunks successfully.", flush=True)

    # Step 6: Save the merged data with the original column names
    merged_data.to_csv('merged_gwas_data_grch38_original.tsv', sep='\t', index=False)
    print("Saved 'merged_gwas_data_grch38_original.tsv' successfully.", flush=True)

    # Step 7: Rename columns for TwoSampleMR; SNP, CHR_x, CHR_y, BP and BP_GRCh38 keep their names
    merged_data.rename(columns={
        'ALLELE1': 'effect_allele',  # ALLELE1 -> effect_allele
        'ALLELE0': 'other_allele',   # ALLELE0 -> other_allele
        'A1FREQ': 'eaf',             # A1FREQ -> effect allele frequency
        'BETA': 'beta',              # BETA -> beta
        'SE': 'se',                  # SE -> standard error
        'P': 'pval'                  # P -> p-value
    }, inplace=True)

    # Step 8: Save the final TwoSampleMR-compatible file in TSV format
    merged_data.to_csv('merged_gwas_data_grch38_twosample.tsv', sep='\t', index=False)
    print("Saved 'merged_gwas_data_grch38_twosample.tsv' successfully.", flush=True)

    # Step 9 (optional, disabled because it did not work here): Parquet output for faster reading.
    # DataFrame.to_parquet requires the pyarrow or fastparquet package.
    # merged_data.to_parquet('merged_gwas_data_grch38.parquet', index=False)
    # print("Saved 'merged_gwas_data_grch38.parquet' successfully (Parquet format).", flush=True)

except Exception as e:
    print(f"An error occurred: {e}", flush=True)
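Because format_bmi.py also merges the BED's CHR column, pandas suffixes the two chromosome columns CHR_x (original GRCh37, from the TSV) and CHR_y (lifted, from the BED), and the R code in Section 4 pairs CHR_x with BP_GRCh38. A short QC pass over the merged TSV can confirm that this pairing is safe by counting SNPs that failed to lift over, SNPs that liftOver moved to a different chromosome, and repeated SNP IDs. This is a minimal stdlib sketch; the function names are hypothetical and the column names are those the script writes.

```python
import csv

def lifted_chrom(value):
    """Normalise a lifted chromosome label such as 'chr10' to '10'; empty means not lifted."""
    value = str(value or "")
    return value[3:] if value.startswith("chr") else value

def qc_merged(rows):
    """Summarise merged rows carrying SNP, CHR_x (GRCh37), CHR_y (lifted) and BP_GRCh38."""
    missing = changed = 0
    seen, duplicated = set(), set()
    for r in rows:
        if r["BP_GRCh38"] in (None, ""):
            missing += 1
        elif str(r["CHR_x"]) != lifted_chrom(r["CHR_y"]):
            changed += 1
        if r["SNP"] in seen:
            duplicated.add(r["SNP"])
        seen.add(r["SNP"])
    return {"missing_grch38": missing, "chromosome_changed": changed,
            "duplicated_snp_ids": len(duplicated)}

def qc_file(path):
    """Run qc_merged over a tab-delimited file such as merged_gwas_data_grch38_twosample.tsv."""
    with open(path, newline="") as fh:
        return qc_merged(csv.DictReader(fh, delimiter="\t"))
```

A non-zero chromosome_changed count would mean that CHR_x no longer describes where BP_GRCh38 points for those SNPs.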

4 Suite of MR Approaches

Performed with a relaxed p-value threshold (5E-6, vs 5E-8) and a clumping window of 500 kb.
Relaxing the p-value threshold for instrument selection is justified here, given the focus on a TSS region and our a priori knowledge. I’m less concerned about a SNP in PTER’s TSS region being a false positive for a NAT association than about other violations of the MR assumptions; the real challenge is horizontal pleiotropy, which I’ve addressed using a range of MR methods and sensitivity analyses.
The default window for ld_clump is 10,000 kb, which is 10x larger than our 1 Mb PTER region, so I chose a much smaller window (500 kb).
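The reasoning behind the relaxed threshold can be made concrete with a little arithmetic. The sketch computes (i) the expected number of chance hits among the SNPs in the window, treating each test as an independent null (LD makes the effective number of tests smaller, so this overstates the risk), (ii) the smallest approximate F-statistic, F ≈ z², that a SNP passing a given two-sided threshold can have, and (iii) a direction-only Steiger check using the common conversion r² = F / (F + N − 2) for a continuous trait. The window size of 5,000 SNPs is hypothetical (the script reports the real count via nrow(dat_filtered)), steiger_filtering in TwoSampleMR also computes a test p-value that this sketch omits, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def expected_null_hits(n_snps, alpha):
    """Expected number of SNPs passing alpha by chance if all n_snps tests were independent nulls."""
    return n_snps * alpha

def prob_any_null_hit(n_snps, alpha):
    """P(at least one chance hit) = 1 - (1 - alpha)^n, computed stably."""
    return -math.expm1(n_snps * math.log1p(-alpha))

def min_f_at_threshold(alpha):
    """Smallest approximate F = z^2 a SNP can have and still pass a two-sided threshold alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def r2_from_bsen(beta, se, n):
    """Approximate variance explained in a continuous trait from beta, SE and N."""
    f = (beta / se) ** 2
    return f / (n - 2 + f)

def steiger_direction_ok(bx, sx, nx, by, sy, ny):
    """True if the SNP explains more variance in the exposure than in the outcome."""
    return r2_from_bsen(bx, sx, nx) > r2_from_bsen(by, sy, ny)

n_window = 5_000  # hypothetical number of SNPs in the 1 Mb window
for alpha in (5e-6, 5e-8):
    print(alpha, expected_null_hits(n_window, alpha),
          round(prob_any_null_hit(n_window, alpha), 4), round(min_f_at_threshold(alpha), 1))
```

Because F ≈ z², any SNP meeting p < 5 × 10⁻⁶ has an F of roughly 21, so as long as the reported p-values agree with beta/SE, the F > 10 filter should not remove additional instruments at this threshold.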

rm(list = ls())

# Load Required Libraries

# Mendelian Randomization
library(TwoSampleMR)          # Core package for Mendelian Randomization (MR) analyses
library(MendelianRandomization) # Additional MR methods, including MR-Lasso
library(ieugwasr)             # For local LD clumping with ld_clump()
library(MRInstruments)        # Curated instrument datasets (e.g., GWAS catalog, QTL resources)

# Data Wrangling & Handling
library(dplyr)                # Data manipulation (filtering, joining, summarizing)
library(tidyr)                # Data reshaping
library(data.table)           # Fast and efficient data handling
library(readxl)               # Reading Excel files
library(openxlsx)             # Writing and formatting Excel files efficiently

# Statistical & Visualization
library(ggplot2)              # For plotting
library(ggrepel)              # Improved text labeling in plots
library(corrplot)             # Visualization of correlation matrices
library(RhpcBLASctl)          # Control multithreading for efficiency in MR analyses
library(biomaRt)              # Querying Ensembl for gene annotation
library(scales)               # For dynamic color generation

# Set Working Directory & Setup Folder for Project
setwd("/Users/charleenadams/temp_BI/mr_nat_pter_bmi")
if (!dir.exists("results_expanded_p5E6")) dir.create("results_expanded_p5E6", recursive = TRUE)

# ---------------------------------------------
# Load and Format NAT Data
# ---------------------------------------------

nat <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/C100005466.uncompressed") %>% as.data.frame()
filtered_nat_df <- nat %>% 
  filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
  arrange(rsids)

exp_dat <- filtered_nat_df %>%
  mutate(
    chr.exposure = chrom,
    pos.exposure = pos,   
    beta.exposure = beta,
    se.exposure = sebeta,
    exposure = "N-acetyltaurine",
    id.exposure = "N-acetyltaurine",
    pval.exposure = pval,
    SNP.exposure = rsids,
    SNP = rsids,
    effect_allele.exposure = alt,
    other_allele.exposure = ref,
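    eaf.exposure = maf,   # minor allele frequency; equals the alt (effect) allele frequency only when alt is the minor allele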
    samplesize.exposure = 6099,
    id_col = nearest_genes
  )

# ---------------------------------------------
# Load and Format BMI Data
# ---------------------------------------------

bmi <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/merged_gwas_data_grch38_twosample.tsv")
filtered_bmi_df <- bmi %>%
  filter(!is.na(SNP) & SNP != "" & grepl("^rs", SNP)) %>%
  arrange(SNP)

out_dat <- filtered_bmi_df %>%
  mutate(
    SNP = SNP,
    SNP.outcome = SNP,
    chr.outcome = CHR_x,   # CHR_x: original GRCh37 chromosome (pandas merge suffix; CHR_y is the lifted label)
    pos.outcome = BP_GRCh38,
    beta.outcome = beta,
    se.outcome = se,
    pval.outcome = pval,
    effect_allele.outcome = effect_allele,
    other_allele.outcome = other_allele,
    eaf.outcome = eaf,
    samplesize.outcome = 460000,
    id.outcome = "BMI",
    outcome = "BMI"
  )

# ---------------------------------------------
# Harmonize Data
# ---------------------------------------------

dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
today <- Sys.Date()
write.csv(dat, paste0("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/harmonized_nat_Jurgens_BMI_dat_", today, ".csv"), row.names = FALSE)

# ---------------------------------------------
# Select 1MB around PTER region
# ---------------------------------------------

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
pter_data <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "chromosome_name", 
                                  "transcription_start_site", "strand"),
                   filters = "external_gene_name", values = "PTER", mart = ensembl)
pter_tss <- min(pter_data$transcription_start_site)  # smallest TSS = most upstream only for a plus-strand gene (see pter_data$strand)
pter_chrom <- pter_data$chromosome_name[1]
window_size <- 500000
tss_start <- pter_tss - window_size
tss_end <- pter_tss + window_size

dat_filtered <- dat %>%
  filter(chr.outcome == pter_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end)
write.csv(dat_filtered, file = "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv", row.names = FALSE)
cat("Filtered data saved to: /Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv\n")
cat("Number of observations within 1 Mb of PTER TSS:", nrow(dat_filtered), "\n")

# ---------------------------------------------
# MR
# ---------------------------------------------

# Step 1: Filter Instruments
dat_filtered <- read.csv("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv")
instruments <- dat_filtered %>%
  filter(mr_keep == TRUE, pval.exposure < 5e-6) %>%  # Relaxed threshold
  mutate(
    rsid = SNP,
    pval = pval.exposure,
    id = "N-acetyltaurine (1MB around PTER TSS)",
    F_stat = (beta.exposure / se.exposure)^2  # Compute F-statistic
  ) %>%
  filter(F_stat > 10)  # Exclude weak instruments

cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")

# Step 1.5: Steiger Filtering
instruments <- steiger_filtering(instruments)
instruments <- instruments %>% filter(steiger_dir == TRUE)  # Keep only SNPs with correct direction
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))

# Subset instruments to keep only TwoSampleMR, F-stat, and Steiger fields
instruments_subset <- instruments %>%
  dplyr::select(
    SNP, 
    effect_allele.exposure, other_allele.exposure,
    effect_allele.outcome, other_allele.outcome,
    beta.exposure, se.exposure, pval.exposure,
    beta.outcome, se.outcome, pval.outcome,
    eaf.exposure, eaf.outcome,
    id.exposure, exposure,
    id.outcome, outcome,
    samplesize.exposure, samplesize.outcome,
    mr_keep, action,
    F_stat,
    steiger_dir, steiger_pval
  )

# Step 2: Local LD Clumping
clumped <- ld_clump(
  dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
  plink_bin = genetics.binaRies::get_plink_binary(),
  bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",
  clump_r2 = 0.001,
  clump_kb = 500  # Reduced window
)
clumped_dat <- instruments %>% dplyr::filter(SNP %in% clumped$rsid)
cat("Local clumping completed. Number of SNPs retained:", nrow(clumped_dat), "\n")
cat("Preview of clumped data:\n")
print(head(clumped_dat))

# Subset clumped_dat to keep only TwoSampleMR, F-stat, and Steiger fields
clumped_dat_subset <- clumped_dat %>%
  dplyr::select(
    SNP, 
    effect_allele.exposure, other_allele.exposure,
    effect_allele.outcome, other_allele.outcome,
    beta.exposure, se.exposure, pval.exposure,
    beta.outcome, se.outcome, pval.outcome,
    eaf.exposure, eaf.outcome,
    id.exposure, exposure,
    id.outcome, outcome,
    samplesize.exposure, samplesize.outcome,
    mr_keep, action,
    F_stat,
    steiger_dir, steiger_pval
  )

# Step 3.0: Perform Many MR Analyses at Once
mr_result <- mr_wrapper(clumped_dat)

# Extract the nested list components from mr_wrapper
estimates <- mr_result$`N-acetyltaurine.BMI`$estimates
heterogeneity <- mr_result$`N-acetyltaurine.BMI`$heterogeneity
directional_pleiotropy <- mr_result$`N-acetyltaurine.BMI`$directional_pleiotropy
info <- mr_result$`N-acetyltaurine.BMI`$info
snps_retained <- mr_result$`N-acetyltaurine.BMI`$snps_retained

# Step 3: Perform Selected MR Analyses
# 3.1: Inverse Variance Weighted (IVW)
ivw_result <- mr(clumped_dat, method_list = "mr_ivw")

# 3.2: MR-Egger
egger_result <- mr(clumped_dat, method_list = "mr_egger_regression")

# 3.3: Weighted Median
weighted_median_result <- mr(clumped_dat, method_list = "mr_weighted_median")

# 3.4: MR-Lasso
mr_input <- mr_input(
  bx = clumped_dat$beta.exposure,
  bxse = clumped_dat$se.exposure,
  by = clumped_dat$beta.outcome,
  byse = clumped_dat$se.outcome,
  exposure = "N-acetyltaurine",
  outcome = "BMI",
  snps = clumped_dat$SNP
)
lasso_result <- tryCatch({
  mr_lasso(mr_input)
}, error = function(e) {
  cat("Error in MR-Lasso:", conditionMessage(e), "\n")
  NULL
})

# 3.5: Contamination Mixture (ConMix)
conmix_result <- tryCatch({
  mr_conmix(mr_input)
}, error = function(e) {
  cat("Error in MR-ConMix:", conditionMessage(e), "\n")
  NULL
})

# 3.6: Heterogeneity Test
heterogeneity_result <- mr_heterogeneity(clumped_dat)

# 3.7: Pleiotropy Test
pleiotropy_result <- mr_pleiotropy_test(clumped_dat)

# 3.8: Leave-One-Out Analysis
loo_result <- mr_leaveoneout(clumped_dat)

# Step 4: Perform Wald Ratio Tests for Each Instrument
wald_ratios <- clumped_dat %>%
  mutate(
    wald_beta = beta.outcome / beta.exposure,
    wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
    pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
    method = paste("Wald Ratio:", SNP)
  ) %>%
  dplyr::select(SNP, wald_beta, wald_se, pval, method)

cat("\n=== Wald Ratio Tests for Each Instrument ===\n")
print(wald_ratios)

# Step 5: Prepare Data for Forest Plot
ivw_df <- ivw_result %>% mutate(method = "IVW")
egger_df <- egger_result %>% mutate(method = "MR-Egger")
weighted_median_df <- weighted_median_result %>% mutate(method = "Weighted Median")

lasso_df <- if (!is.null(lasso_result)) {
  data.frame(method = "MR-Lasso", b = lasso_result@Estimate, 
             se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
  NULL
}

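conmix_se <- if (!is.null(conmix_result)) {
  # Approximate SE from the 95% CI: SE = (CIUpper - CILower) / (2 * 1.96).
  # ConMix CIs can be asymmetric or split into several intervals, so use the outermost bounds.
  (max(conmix_result@CIUpper) - min(conmix_result@CILower)) / (2 * 1.96)
} else {
  NA
}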

conmix_df <- if (!is.null(conmix_result)) {
  data.frame(method = "MR-ConMix", b = conmix_result@Estimate, 
             se = conmix_se, pval = conmix_result@Pvalue)
} else {
  NULL
}

mr_results <- bind_rows(
  ivw_df,
  egger_df,
  weighted_median_df,
  lasso_df,
  conmix_df,
  wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
  mutate(method = as.factor(method)) %>%
  filter(!is.na(method) & method != "NA") %>%
  arrange(b) %>%
  mutate(method = factor(method, levels = unique(method)))

# Step 6: Create Beautiful Forest Plot with Dynamic Colors (No Numbers)
base_colors <- c(
  "IVW" = "#1F78B4",
  "MR-Egger" = "#FF7F00",
  "Weighted Median" = "#33A02C",
  "MR-Lasso" = "#FB9A99",
  "MR-ConMix" = "#E41A1C"
)

wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
  wald_colors <- hue_pal()(n_wald)
  names(wald_colors) <- wald_methods
} else {
  wald_colors <- NULL
}

color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]

forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  labs(
    title = "Mendelian Randomization Estimates:\n N-acetyltaurine (1MB around PTER TSS) on Jurgens BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  xlim(-0.15, 0.05)  # Fixed limits: points or error bars extending past them are dropped (ggplot2 warns)

# Step 7: Save Results and Plot
today <- Sys.Date()
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/results_expanded_p5E6/"
excel_file <- paste0(results_dir, "MR_Results_PTER_Jurgens_BMI_", today, ".xlsx")
plot_file <- paste0(results_dir, "MR_Forestplot_PTER_Jurgens_BMI_", today, ".png")

if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
  cat("Created directory:", results_dir, "\n")
}

# 7.01. Save same plot ordered by method

# Dynamically determine x-axis limits based on confidence intervals
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)

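# Order rows: summary methods first, then per-SNP Wald ratios sorted by name
# (Wald-ratio SNPs in this run: rs117372132, rs142238737, rs3802555, rs4747286, rs7075357).
# Building the order from the data avoids NA levels if the instrument set changes.
summary_order <- c("IVW", "MR-Egger", "MR-Lasso", "MR-ConMix", "Weighted Median")
desired_order <- c(
  summary_order[summary_order %in% as.character(mr_results$method)],
  sort(wald_methods)
)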

# Force the order in the dataset itself
mr_results$method <- factor(mr_results$method, levels = desired_order)

# Sort the dataset based on the new factor levels BEFORE plotting
mr_results <- mr_results[order(mr_results$method), ]

forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  # Plot error bars FIRST to prevent them from being hidden
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), 
                 height = 0.2, na.rm = TRUE, linewidth = 0.8) +
  # Plot points over error bars
  geom_point(size = 3) +
  # Add reference line at zero
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  # Labels and title
  labs(
    title = "Mendelian Randomization Estimates:\n N-acetyltaurine (1MB around PTER TSS)\non Jurgens BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  # Styling
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  # Adjust x-limits dynamically based on data
  xlim(x_min - 0.02, x_max + 0.02)

# Print plot to check output
print(forest_plot_by_method)

# Save the reordered plot
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/results_expanded_p5E6/"
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_PTER_Jurgens_BMI_By_Method_", today, ".png")
ggsave(plot_file_by_method, plot = forest_plot_by_method, dpi = 300, width = 8, height = 5)

# 7.1: Save Everything in One Excel Spreadsheet
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")

add_formatted_sheet <- function(wb, sheet_name, data, title) {
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
  writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
  mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
  setRowHeights(wb, sheet_name, rows = 1, heights = 20)
  setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
  addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}

# TOC with all sections
toc_data <- data.frame(
  Sheet = c("Estimates", "Heterogeneity", "Directional_Pleiotropy", "Info", "SNPs_Retained",
            "IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix", 
            "Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
            "Instruments", "Clumped_Data"),
  Title = c("MR Causal Estimates (mr_wrapper)", "Heterogeneity Results (mr_wrapper)", 
            "Directional Pleiotropy Results (mr_wrapper)", "Summary Info (mr_wrapper)", 
            "SNPs Retained (mr_wrapper)",
            "Inverse Variance Weighted (IVW) Results", "MR-Egger Results", 
            "Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
            "Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results", 
            "Pleiotropy Test Results", "Leave-One-Out Analysis Results",
            "Filtered Instruments Data", "Clumped SNPs Data")
)
toc_data <- toc_data[complete.cases(toc_data), ]  # Safeguard only: the TOC is hard-coded, and failed MR-Lasso/ConMix runs still get a placeholder sheet

addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")

# Add all sheets
add_formatted_sheet(wb, "Estimates", estimates, "MR Causal Estimates for N-acetyltaurine on BMI (mr_wrapper)")
add_formatted_sheet(wb, "Heterogeneity", heterogeneity, "Heterogeneity Results (mr_wrapper)")
add_formatted_sheet(wb, "Directional_Pleiotropy", directional_pleiotropy, "Directional Pleiotropy Results (Egger Intercept, mr_wrapper)")
add_formatted_sheet(wb, "Info", info, "Summary Information and Diagnostics (mr_wrapper)")
add_formatted_sheet(wb, "SNPs_Retained", snps_retained, "SNPs Retained After Filtering (mr_wrapper)")

add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")

if (!is.null(lasso_result)) {
  lasso_df <- data.frame(
    Exposure = lasso_result@Exposure,
    Outcome = lasso_result@Outcome,
    Estimate = lasso_result@Estimate,
    StdError = lasso_result@StdError,
    CILower = lasso_result@CILower,
    CIUpper = lasso_result@CIUpper,
    Pvalue = lasso_result@Pvalue,
    SNPs = lasso_result@SNPs,
    Valid = lasso_result@Valid,
    ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
    RegEstimate = lasso_result@RegEstimate,
    RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
    Lambda = lasso_result@Lambda
  )
  add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
  addWorksheet(wb, "MR-Lasso")
  writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}

if (!is.null(conmix_result)) {
  conmix_df <- data.frame(
    Exposure = conmix_result@Exposure,
    Outcome = conmix_result@Outcome,
    Estimate = conmix_result@Estimate,
    Pvalue = conmix_result@Pvalue,
    SNPs = conmix_result@SNPs,
    Psi = conmix_result@Psi,
    CILower = conmix_result@CILower,
    CIUpper = conmix_result@CIUpper,
    CIRange = paste(conmix_result@CIRange, collapse = ", "),
    CIMin = conmix_result@CIMin,
    CIMax = conmix_result@CIMax,
    CIStep = conmix_result@CIStep,
    Valid = paste(conmix_result@Valid, collapse = ", "),
    ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
    Alpha = conmix_result@Alpha
  )
  add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
  addWorksheet(wb, "MR-ConMix")
  writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}

add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments_subset, "Filtered Instruments Data")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat_subset, "Clumped SNPs Data")

saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")

# 7.2: Save Forest Plot
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")

5 MR with COJO Instruments

With a relaxed p-value threshold (5E-6)

rm(list = ls())

# Load Required Libraries
library(TwoSampleMR)          # Core MR analyses
library(MendelianRandomization) # MR-Lasso and ConMix
library(ieugwasr)             # LD clumping
library(MRInstruments)        # Curated instrument datasets
library(dplyr)                # Data manipulation
library(tidyr)                # Data reshaping
library(data.table)           # Fast data handling
library(readxl)               # Read Excel
library(openxlsx)             # Write formatted Excel
library(ggplot2)              # Plotting
library(ggrepel)              # Text labeling in plots
library(corrplot)             # Correlation matrices
library(RhpcBLASctl)          # Multithreading control
library(biomaRt)              # Gene annotation
library(scales)               # Color generation
library(pheatmap)             # Annotated heatmaps

# Set Working Directory & Setup Folder
setwd("/Users/charleenadams/temp_BI/mr_nat_pter_bmi")
if (!dir.exists("cojo")) dir.create("cojo", recursive = TRUE)

# ---------------------------------------------
# Load and Format NAT Data
# ---------------------------------------------
nat <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/C100005466.uncompressed") %>% as.data.frame()
filtered_nat_df <- nat %>% 
  filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
  arrange(rsids)

exp_dat <- filtered_nat_df %>%
  mutate(
    chr.exposure = chrom,
    pos.exposure = pos,   
    beta.exposure = beta,
    se.exposure = sebeta,
    exposure = "N-acetyltaurine",
    id.exposure = "N-acetyltaurine",
    pval.exposure = pval,
    SNP.exposure = rsids,
    SNP = rsids,
    effect_allele.exposure = alt,
    other_allele.exposure = ref,
    eaf.exposure = maf,  # NB: METSIM MAF; equals the alt (effect) allele frequency only where alt is the minor allele
    samplesize.exposure = 6099,
    id_col = nearest_genes
  )

# ---------------------------------------------
# Load and Format BMI Data
# ---------------------------------------------
bmi <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/merged_gwas_data_grch38_twosample.tsv")
filtered_bmi_df <- bmi %>%
  filter(!is.na(SNP) & SNP != "" & grepl("^rs", SNP)) %>%
  arrange(SNP)

out_dat <- filtered_bmi_df %>%
  mutate(
    SNP = SNP,
    SNP.outcome = SNP,
    chr.outcome = CHR_x,
    pos.outcome = BP_GRCh38,
    beta.outcome = beta,
    se.outcome = se,
    pval.outcome = pval,
    effect_allele.outcome = effect_allele,
    other_allele.outcome = other_allele,
    eaf.outcome = eaf,
    samplesize.outcome = 460000,
    id.outcome = "BMI",
    outcome = "BMI"
  )

# ---------------------------------------------
# Harmonize Data
# ---------------------------------------------
dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
today <- Sys.Date()
write.csv(dat, paste0("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/harmonized_nat_Jurgens_BMI_dat_", today, ".csv"), row.names = FALSE)
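`harmonise_data()` aligns each SNP so that the exposure and outcome effects refer to the same effect allele. The core idea can be sketched in base R for the simple case. This is a simplified illustration under stated assumptions, not TwoSampleMR's implementation: it ignores strand flips, and it flags palindromic (A/T, C/G) SNPs instead of resolving them with allele frequencies. `align_to_exposure()` is a hypothetical helper.

```r
# Hypothetical helper: align outcome effects to the exposure effect allele.
# Simplified: no strand flips; palindromic SNPs are flagged, not resolved.
align_to_exposure <- function(ea_exp, oa_exp, ea_out, oa_out, beta_out, eaf_out) {
  ea_exp <- toupper(ea_exp); oa_exp <- toupper(oa_exp)
  ea_out <- toupper(ea_out); oa_out <- toupper(oa_out)
  same    <- ea_exp == ea_out & oa_exp == oa_out   # already aligned
  swapped <- ea_exp == oa_out & oa_exp == ea_out   # alleles reversed in the outcome
  palindromic <- paste0(ea_exp, oa_exp) %in% c("AT", "TA", "CG", "GC")
  data.frame(
    beta_out = ifelse(swapped, -beta_out, beta_out),  # flip sign when swapped
    eaf_out  = ifelse(swapped, 1 - eaf_out, eaf_out), # flip frequency when swapped
    keep     = (same | swapped) & !palindromic        # drop mismatches and ambiguous SNPs
  )
}

# Toy values: the second SNP has its alleles reversed in the outcome,
# the third is palindromic (A/T)
res <- align_to_exposure(
  ea_exp = c("A", "G", "A"), oa_exp = c("G", "T", "T"),
  ea_out = c("A", "T", "A"), oa_out = c("G", "G", "T"),
  beta_out = c(0.02, 0.03, -0.01), eaf_out = c(0.3, 0.6, 0.5)
)
print(res)
```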

# ---------------------------------------------
# Select 1MB around PTER region
# ---------------------------------------------
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
pter_data <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "chromosome_name", 
                                  "transcription_start_site", "strand"),
                   filters = "external_gene_name", values = "PTER", mart = ensembl)
pter_tss <- min(pter_data$transcription_start_site)
pter_chrom <- pter_data$chromosome_name[1]
window_size <- 500000
tss_start <- pter_tss - window_size
tss_end <- pter_tss + window_size

dat_filtered <- dat %>%
  filter(chr.outcome == pter_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end)
write.csv(dat_filtered, file = "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv", row.names = FALSE)
cat("Filtered data saved to: /Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv\n")
cat("Number of observations within 1 Mb of PTER TSS:", nrow(dat_filtered), "\n")

# ---------------------------------------------
# MR with COJO Analysis
# ---------------------------------------------

# Step 1: Filter Instruments
dat_filtered <- read.csv("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_Jurgens_BMI_2025-02-19.csv")
instruments <- dat_filtered %>%
  filter(mr_keep == TRUE, pval.exposure < 5e-6) %>%  # Relaxed threshold
  mutate(
    rsid = SNP,
    pval = pval.exposure,
    id = "N-acetyltaurine (1MB around PTER TSS)",
    F_stat = (beta.exposure / se.exposure)^2  # Compute F-statistic
  ) %>%
  filter(F_stat > 10)  # Exclude weak instruments

cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")

# Step 1.5: Steiger Filtering
instruments <- steiger_filtering(instruments)
instruments <- instruments %>% filter(steiger_dir == TRUE)
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
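The F-statistic filter and the Steiger filter both rest on the variance each SNP explains. One common approximation from summary statistics (the one in TwoSampleMR's `get_r_from_bsen()`) is F = (β/SE)² and r² = F/(F + N − 2). `steiger_filtering()` derives r² for each trait when `r.exposure`/`r.outcome` are not supplied (the exact helper it uses may vary by package version) and tests whether r² is larger for the exposure. The sketch below reproduces only the directional comparison behind `steiger_dir`, not the significance test; `r2_from_bsen()` and `steiger_direction()` are hypothetical names.

```r
# Approximate variance explained by a SNP from beta, SE and sample size:
# F = (beta / se)^2, r^2 = F / (F + N - 2)
r2_from_bsen <- function(beta, se, n) {
  f_stat <- (beta / se)^2
  f_stat / (f_stat + n - 2)
}

# Direction only: TRUE when the SNP explains more variance in the exposure
# than in the outcome (the full Steiger test also returns a p-value)
steiger_direction <- function(beta_exp, se_exp, n_exp, beta_out, se_out, n_out) {
  r2_from_bsen(beta_exp, se_exp, n_exp) > r2_from_bsen(beta_out, se_out, n_out)
}

# Toy SNP on the sample-size scale of this analysis (NAT N = 6,099; BMI N = 460,000)
f_exp <- (0.30 / 0.05)^2
print(f_exp)   # about 36, so it would pass the F > 10 filter
print(steiger_direction(0.30, 0.05, 6099, 0.005, 0.0015, 460000))   # TRUE
```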
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))

# Step 2: Prepare COJO input (.ma format): SNP A1 A2 freq b se p N, in this order.
# GCTA reads these columns by position, so only these eight are written.
# freq must be the frequency of A1; eaf.exposure holds the METSIM MAF (see above).
cojo_input <- instruments %>%
  dplyr::select(SNP,
                A1 = effect_allele.exposure,
                A2 = other_allele.exposure,
                freq = eaf.exposure,
                b = beta.exposure,
                se = se.exposure,
                p = pval.exposure,
                N = samplesize.exposure)
# Save COJO input file
fwrite(cojo_input, "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_input.txt", sep = "\t", quote = FALSE)

cojo_input <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_input.txt")

# Step 3: Run COJO Analysis (requires GCTA installed)
# Note: This step assumes GCTA is installed and accessible from your terminal/command line
# Adjust the GCTA path if necessary or run this command manually in your terminal
system("gcta64 --bfile /Users/charleenadams/1000G_bfiles/EUR/EUR --cojo-file /Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_input.txt --cojo-slct --cojo-p 5e-6 --out cojo_output")

# --cojo-slct writes these files to the working directory:
# cojo_output.cma.cojo
# cojo_output.jma.cojo
# cojo_output.ldr.cojo
# cojo_output.log
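The `system()` call above passes a single shell string and ignores failures. A sketch of the same GCTA-COJO call via base R's `system2()`, which returns the exit status so a failed run stops the script instead of silently reading output files left over from an earlier run. `cojo_args()` is a hypothetical helper, `/path/to/EUR` is a placeholder for the 1000 Genomes EUR PLINK prefix, and the flags are exactly those in the original command. `system2()` does not quote its arguments, so paths are wrapped in `shQuote()`.

```r
# Hypothetical helper: the same GCTA-COJO flags as above, as a character vector
cojo_args <- function(bfile, ma_file, p_thresh, out_prefix) {
  c("--bfile", shQuote(bfile),
    "--cojo-file", shQuote(ma_file),
    "--cojo-slct",
    "--cojo-p", format(p_thresh, scientific = TRUE),
    "--out", shQuote(out_prefix))
}

cojo_cmd_args <- cojo_args("/path/to/EUR", "cojo_input.txt", 5e-6, "cojo_output")
print(cojo_cmd_args)

# Run only when gcta64 is on the PATH; system2() returns the exit status
if (nzchar(Sys.which("gcta64"))) {
  status <- system2("gcta64", cojo_cmd_args)
  if (status != 0) stop("GCTA-COJO failed with exit status ", status)
}
```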

# Read the LD correlation (r) matrix of the selected SNPs written by GCTA-COJO
ld_matrix <- as.matrix(read.table("cojo_output.ldr.cojo", header = TRUE, row.names = 1))
isSymmetric(ld_matrix)  # Should print TRUE

# Square r to obtain r^2
ld_matrix_r2 <- ld_matrix^2

# heatmap
heatmap(as.matrix(ld_matrix), symm = TRUE, main = "LD Matrix Heatmap")

# Create the customized heatmap
pheatmap(ld_matrix,
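         # Diverging palette: blue (r = -1) through white (r = 0) to yellow (r = 1);
         # explicit breaks pin the colour scale to [-1, 1] so white always marks r = 0
         color = colorRampPalette(c("#1f77b4", "white", "#ffcc00"))(100),
         breaks = seq(-1, 1, length.out = 101),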
         # No clustering since LD matrices are position-based
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         # Display correlation values in cells
         display_numbers = TRUE,
         number_color = "black",          # Number color
         fontsize_number = 14,             # Font size for numbers
         # Customize borders
         border_color = "gray30",         # Thin gray borders around cells
         # Legend customization
         legend = TRUE,                   # Include legend (default)
         legend_breaks = seq(-1, 1, 0.5), # Custom breaks for legend
         legend_labels = c("-1", "-0.5", "0", "0.5", "1"), # Custom labels
         # Labels and title
         main = "LD Matrix Heatmap", # Title
         fontsize = 14,                   # Base font size for the plot
         fontsize_row = 12,                # Row label font size
         fontsize_col = 12,                # Column label font size
         angle_col = 45,                  # Rotate column labels for readability
         # Output settings
         filename = "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/custom_ld_heatmap.png", # Save file name
         width = 10,                      # Width in inches
         height = 10)                     # Height in inches (pheatmap has no resolution argument; it sets PNG resolution itself)
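COJO fits the selected SNPs jointly, so some LD among them is expected, but it is worth checking how much remains. A minimal sketch, assuming the matrix holds correlations (r) with matching row and column names as read above; `ld_pairs_above()` is a hypothetical helper and the r² threshold of 0.1 is an arbitrary illustrative value.

```r
# Hypothetical helper: list SNP pairs whose r^2 exceeds a threshold,
# given a square, symmetric correlation (r) matrix with SNP dimnames
ld_pairs_above <- function(r_mat, r2_threshold = 0.1) {
  r_mat <- as.matrix(r_mat)
  stopifnot(nrow(r_mat) == ncol(r_mat), isSymmetric(unname(r_mat)))
  r2 <- r_mat^2
  idx <- which(upper.tri(r2) & r2 > r2_threshold, arr.ind = TRUE)  # each pair once
  data.frame(
    snp1 = rownames(r2)[idx[, "row"]],
    snp2 = colnames(r2)[idx[, "col"]],
    r2 = r2[idx]
  )
}

# Toy 3-SNP correlation matrix (r, not r^2)
snps <- c("rsA", "rsB", "rsC")
r_toy <- matrix(c(1.0, 0.5, 0.1,
                  0.5, 1.0, 0.2,
                  0.1, 0.2, 1.0), nrow = 3, dimnames = list(snps, snps))
print(ld_pairs_above(r_toy, r2_threshold = 0.1))
```

Applied to the real data this would be `ld_pairs_above(ld_matrix, 0.1)`.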

# Read COJO results
cojo_results <- fread("cojo_output.jma.cojo") %>%
  dplyr::select(SNP, CHR = Chr, BP = bp, bJ, bJ_se, pJ)

# Merge COJO results back with instruments to get full MR data
instruments_cojo <- instruments %>%
  inner_join(cojo_results, by = "SNP") %>%
  mutate(beta.exposure = bJ,       # Update beta with conditional estimate
         se.exposure = bJ_se,      # Update SE with conditional estimate
         pval.exposure = pJ)       # Update p-value with conditional estimate

cat("Number of conditionally independent instruments after COJO:", nrow(instruments_cojo), "\n")
cat("Preview of COJO-selected instruments:\n")
print(head(instruments_cojo))

# Step 4: Perform MR with COJO-Selected Instruments
mr_result <- mr_wrapper(instruments_cojo)
estimates <- mr_result$`N-acetyltaurine.BMI`$estimates
heterogeneity <- mr_result$`N-acetyltaurine.BMI`$heterogeneity
directional_pleiotropy <- mr_result$`N-acetyltaurine.BMI`$directional_pleiotropy
info <- mr_result$`N-acetyltaurine.BMI`$info
snps_retained <- mr_result$`N-acetyltaurine.BMI`$snps_retained

# Individual MR Analyses with COJO Instruments
ivw_result <- mr(instruments_cojo, method_list = "mr_ivw")
egger_result <- mr(instruments_cojo, method_list = "mr_egger_regression")
weighted_median_result <- mr(instruments_cojo, method_list = "mr_weighted_median")

# MendelianRandomization input object (named mr_obj so it does not shadow
# the mr_input() constructor)
mr_obj <- mr_input(
  bx = instruments_cojo$beta.exposure,
  bxse = instruments_cojo$se.exposure,
  by = instruments_cojo$beta.outcome,
  byse = instruments_cojo$se.outcome,
  exposure = "N-acetyltaurine",
  outcome = "BMI",
  snps = instruments_cojo$SNP
)
lasso_result <- tryCatch({
  mr_lasso(mr_obj)
}, error = function(e) {
  cat("Error in MR-Lasso:", conditionMessage(e), "\n")
  NULL
})
conmix_result <- tryCatch({
  mr_conmix(mr_obj)
}, error = function(e) {
  cat("Error in MR-ConMix:", conditionMessage(e), "\n")
  NULL
})

heterogeneity_result <- mr_heterogeneity(instruments_cojo)
pleiotropy_result <- mr_pleiotropy_test(instruments_cojo)
loo_result <- mr_leaveoneout(instruments_cojo)
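For reference, the IVW estimate and Cochran's Q reported in the Results can be reproduced from the harmonized per-SNP estimates. The sketch below assumes first-order weights (1/se_y²) and a multiplicative random-effects SE that is never smaller than the fixed-effect SE; as far as I know this matches what `mr_ivw()` and `mr_heterogeneity()` report in TwoSampleMR, but it should be checked against the package output. `ivw_with_q()` and the toy numbers are hypothetical.

```r
# Minimal IVW sketch with Cochran's Q (first-order weights 1 / se_y^2)
ivw_with_q <- function(bx, by, se_y) {
  w <- 1 / se_y^2
  beta <- sum(w * bx * by) / sum(w * bx^2)        # weighted regression through the origin
  se_fixed <- sqrt(1 / sum(w * bx^2))             # fixed-effect SE
  q <- sum(w * (by - beta * bx)^2)                # Cochran's Q (weighted residual SS)
  df <- length(bx) - 1
  phi <- q / df                                   # residual dispersion
  list(
    beta = beta,
    se_fixed = se_fixed,
    se_random = se_fixed * max(1, sqrt(phi)),     # multiplicative random effects
    q = q,
    df = df,
    q_pval = pchisq(q, df, lower.tail = FALSE),
    i2 = max(0, (q - df) / q)                     # Higgins' I^2
  )
}

# Toy harmonized estimates for five SNPs (not the study data)
bx   <- c(0.30, 0.25, 0.40, 0.35, 0.20)
by   <- c(-0.006, -0.002, -0.009, -0.004, -0.005)
se_y <- c(0.002, 0.0025, 0.002, 0.003, 0.0022)

res <- ivw_with_q(bx, by, se_y)
str(res)

# Cross-check the point estimate against a weighted lm() through the origin
fit <- lm(by ~ 0 + bx, weights = 1 / se_y^2)
all.equal(res$beta, unname(coef(fit)[1]))
```

On the real data this would be `ivw_with_q(instruments_cojo$beta.exposure, instruments_cojo$beta.outcome, instruments_cojo$se.outcome)`.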

wald_ratios <- instruments_cojo %>%
  mutate(
    wald_beta = beta.outcome / beta.exposure,
    wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
    pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
    method = paste("Wald Ratio:", SNP)
  ) %>%
  dplyr::select(SNP, wald_beta, wald_se, pval, method)

cat("\n=== Wald Ratio Tests for COJO Instruments ===\n")
print(wald_ratios)
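The Wald ratio SE above uses the second-order delta method, ignoring any covariance between the SNP-exposure and SNP-outcome estimates (reasonable for non-overlapping samples such as METSIM and UK Biobank). TwoSampleMR's `mr_wald_ratio()` reports the first-order SE, `se_out / |b_exp|`, so the two differ most when the exposure association is imprecise. A sketch comparing them on toy values; `wald_ratio()` is a hypothetical helper.

```r
# Single-SNP Wald ratio with first- and second-order delta-method SEs
# (second order assumes zero covariance between bx and by)
wald_ratio <- function(bx, sx, by, sy) {
  ratio <- by / bx
  se_first  <- sy / abs(bx)
  se_second <- sqrt(sy^2 / bx^2 + by^2 * sx^2 / bx^4)
  data.frame(
    ratio = ratio,
    se_first = se_first,
    se_second = se_second,
    p_second = 2 * pnorm(abs(ratio / se_second), lower.tail = FALSE)
  )
}

w <- wald_ratio(bx = 0.5, sx = 0.05, by = 0.01, sy = 0.002)
print(w)   # ratio 0.02, se_first 0.004, se_second about 0.00447
```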

# Step 5: Prepare Data for Forest Plot
ivw_df <- ivw_result %>% mutate(method = "IVW")
egger_df <- egger_result %>% mutate(method = "MR-Egger")
weighted_median_df <- weighted_median_result %>% mutate(method = "Weighted Median")
lasso_df <- if (!is.null(lasso_result)) {
  data.frame(method = "MR-Lasso", b = lasso_result@Estimate, 
             se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
  NULL
}
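# Approximate SE from the ConMix CI (used only to draw the forest-plot bar).
# The CI can be asymmetric or a union of intervals, so use its outer bounds.
conmix_se <- if (!is.null(conmix_result)) {
  (max(conmix_result@CIUpper) - min(conmix_result@CILower)) / (2 * 1.96)
} else {
  NA
}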
conmix_df <- if (!is.null(conmix_result)) {
  data.frame(method = "MR-ConMix", b = conmix_result@Estimate, 
             se = conmix_se, pval = conmix_result@Pvalue)
} else {
  NULL
}

mr_results <- bind_rows(
  ivw_df,
  egger_df,
  weighted_median_df,
  lasso_df,
  conmix_df,
  wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
  mutate(method = as.factor(method)) %>%
  filter(!is.na(method) & method != "NA") %>%
  arrange(b) %>%
  mutate(method = factor(method, levels = unique(method)))

# Step 6: Create Forest Plot
base_colors <- c(
  "IVW" = "#1F78B4",
  "MR-Egger" = "#FF7F00",
  "Weighted Median" = "#33A02C",
  "MR-Lasso" = "#FB9A99",
  "MR-ConMix" = "#E41A1C"
)
wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
  wald_colors <- hue_pal()(n_wald)
  names(wald_colors) <- wald_methods
} else {
  wald_colors <- NULL
}
color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]

forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  labs(
    title = "Mendelian Randomization Estimates (COJO):\n N-acetyltaurine (1MB around PTER TSS)\non Jurgens BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  coord_cartesian(xlim = c(-0.15, 0.05))  # Zoom without dropping rows whose CI extends past the limits

# Step 7: Save Results and Plot
today <- Sys.Date()
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo/"
excel_file <- paste0(results_dir, "MR_Results_PTER_Jurgens_BMI_COJO_", today, ".xlsx")
plot_file <- paste0(results_dir, "MR_Forestplot_PTER_Jurgens_BMI_COJO_", today, ".png")

if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
  cat("Created directory:", results_dir, "\n")
}

# 7.01. Save same plot ordered by method

# Dynamically determine x-axis limits based on confidence intervals
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)

# Explicitly set the correct order by method name
desired_order <- c(
  "IVW",
  "MR-Egger",
  "MR-Lasso",
  "MR-ConMix",
  "Weighted Median",
  "Wald Ratio: rs117110974",
  "Wald Ratio: rs117372132",
  "Wald Ratio: rs142238737",
  "Wald Ratio: rs45485296",
  "Wald Ratio: rs7084722",
  "Wald Ratio: rs1023275",
  "Wald Ratio: rs61844133"
)

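# Force the order in the dataset itself; append any method not listed above
# so that it is not silently converted to NA and dropped from the plot
desired_order <- c(desired_order, setdiff(as.character(mr_results$method), desired_order))
mr_results$method <- factor(mr_results$method, levels = desired_order)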

# Sort the dataset based on the new factor levels BEFORE plotting
mr_results <- mr_results[order(mr_results$method), ]
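An alternative to listing each Wald-ratio label by hand is to build the factor levels from the data, so the plot keeps working if COJO selects a different set of SNPs (any label missing from the levels would otherwise become NA). `order_methods()` is a hypothetical helper; the summary-method order mirrors `desired_order`, and Wald-ratio rows keep the order in which they appear.

```r
# Hypothetical helper: fixed order for summary methods that are present,
# followed by every other label (e.g. Wald ratios) in order of appearance
order_methods <- function(methods,
                          summary_order = c("IVW", "MR-Egger", "MR-Lasso",
                                            "MR-ConMix", "Weighted Median")) {
  methods <- as.character(methods)
  present_summary <- summary_order[summary_order %in% methods]
  others <- setdiff(unique(methods), summary_order)
  factor(methods, levels = c(present_summary, others))
}

m <- c("Wald Ratio: rs1", "IVW", "Weighted Median", "MR-Egger", "Wald Ratio: rs2")
f <- order_methods(m)
print(levels(f))
```

In this script it would be used as `mr_results$method <- order_methods(mr_results$method)`.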

forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  # Draw error bars first so the points are layered on top of them
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), 
                 height = 0.2, na.rm = TRUE, linewidth = 0.8) +
  # Plot points over error bars
  geom_point(size = 3) +
  # Add reference line at zero
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  # Labels and title
  labs(
    title = "Mendelian Randomization Estimates (COJO):\n N-acetyltaurine (1MB around PTER TSS)\non Jurgens BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  # Styling
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  # Adjust x-limits dynamically based on data
  xlim(x_min - 0.02, x_max + 0.02)

# Print plot to check output
print(forest_plot_by_method)

# Save the reordered plot
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo/"
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_PTER_COJO_Jurgens_BMI_By_Method_", today, ".png")
ggsave(plot_file_by_method, plot = forest_plot_by_method, dpi = 300, width = 8, height = 5)


# 7.1: Save Everything in One Excel Spreadsheet
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")

add_formatted_sheet <- function(wb, sheet_name, data, title) {
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
  writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
  mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
  setRowHeights(wb, sheet_name, rows = 1, heights = 20)
  setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
  addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}

toc_data <- data.frame(
  Sheet = c("Estimates", "Heterogeneity", "Directional_Pleiotropy", "Info", "SNPs_Retained",
            "IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix", 
            "Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
            "Instruments", "COJO_Results"),
  Title = c("MR Causal Estimates (mr_wrapper)", "Heterogeneity Results (mr_wrapper)", 
            "Directional Pleiotropy Results (mr_wrapper)", "Summary Info (mr_wrapper)", 
            "SNPs Retained (mr_wrapper)",
            "Inverse Variance Weighted (IVW) Results", "MR-Egger Results", 
            "Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
            "Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results", 
            "Pleiotropy Test Results", "Leave-One-Out Analysis Results",
            "Filtered Instruments Data", "COJO Conditionally Independent SNPs")
)
toc_data <- toc_data[complete.cases(toc_data), ]
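A hard-coded TOC can drift out of sync with the sheets actually written, for example when a sheet is commented out but its TOC row is not. One way to avoid that is to describe each sheet once and derive the TOC from the same list. A sketch, assuming each entry holds the data frame and its title; `build_toc()` and the `sheets` list are hypothetical.

```r
# Hypothetical helper: derive the TOC from the sheets that will be written,
# skipping entries whose data is NULL
build_toc <- function(sheets) {
  keep <- !vapply(sheets, function(s) is.null(s$data), logical(1))
  data.frame(
    Sheet = names(sheets)[keep],
    Title = unname(vapply(sheets[keep], function(s) s$title, character(1))),
    row.names = NULL
  )
}

# Toy entries; the middle one has no data and is left out of the TOC
sheets <- list(
  IVW = list(data = data.frame(x = 1),
             title = "Inverse Variance Weighted (IVW) Results"),
  Clumped_Data = list(data = NULL, title = "Clumped SNPs Data"),
  COJO_Results = list(data = data.frame(x = 2),
                      title = "COJO Conditionally Independent SNPs")
)
toc <- build_toc(sheets)
print(toc)
```

The sheets could then be written with `for (nm in toc$Sheet) add_formatted_sheet(wb, nm, sheets[[nm]]$data, sheets[[nm]]$title)`.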

addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")

add_formatted_sheet(wb, "Estimates", estimates, "MR Causal Estimates for N-acetyltaurine on BMI (mr_wrapper)")
add_formatted_sheet(wb, "Heterogeneity", heterogeneity, "Heterogeneity Results (mr_wrapper)")
add_formatted_sheet(wb, "Directional_Pleiotropy", directional_pleiotropy, "Directional Pleiotropy Results (Egger Intercept, mr_wrapper)")
add_formatted_sheet(wb, "Info", info, "Summary Information and Diagnostics (mr_wrapper)")
add_formatted_sheet(wb, "SNPs_Retained", snps_retained, "SNPs Retained After Filtering (mr_wrapper)")
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
  lasso_df <- data.frame(
    Exposure = lasso_result@Exposure,
    Outcome = lasso_result@Outcome,
    Estimate = lasso_result@Estimate,
    StdError = lasso_result@StdError,
    CILower = lasso_result@CILower,
    CIUpper = lasso_result@CIUpper,
    Pvalue = lasso_result@Pvalue,
    SNPs = lasso_result@SNPs,
    Valid = lasso_result@Valid,
    ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
    RegEstimate = lasso_result@RegEstimate,
    RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
    Lambda = lasso_result@Lambda
  )
  add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
  addWorksheet(wb, "MR-Lasso")
  writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}
if (!is.null(conmix_result)) {
  conmix_df <- data.frame(
    Exposure = conmix_result@Exposure,
    Outcome = conmix_result@Outcome,
    Estimate = conmix_result@Estimate,
    Pvalue = conmix_result@Pvalue,
    SNPs = conmix_result@SNPs,
    Psi = conmix_result@Psi,
    CILower = conmix_result@CILower,
    CIUpper = conmix_result@CIUpper,
    CIRange = paste(conmix_result@CIRange, collapse = ", "),
    CIMin = conmix_result@CIMin,
    CIMax = conmix_result@CIMax,
    CIStep = conmix_result@CIStep,
    Valid = paste(conmix_result@Valid, collapse = ", "),
    ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
    Alpha = conmix_result@Alpha
  )
  add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
  addWorksheet(wb, "MR-ConMix")
  writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments, "Filtered Instruments Data")
# No Clumped_Data sheet here: COJO replaces LD clumping in this analysis
add_formatted_sheet(wb, "COJO_Results", cojo_results, "COJO Conditionally Independent SNPs")

saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")

# 7.2: Save Forest Plot
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")

6 Results (COJO-selected SNPs)

6.0.1 Main Finding

Across multiple MR methods, including Inverse Variance Weighted (IVW), Weighted Median, MR-Lasso, and MR-ConMix, there is consistent evidence, under the MR assumptions, of a negative causal effect of NAT on BMI: genetically higher NAT levels are associated with lower BMI. The IVW method yielded a beta coefficient of -0.016 (SE = 0.006, p = 0.012). The Weighted Median method gave a similar, slightly larger estimate (beta = -0.021, SE = 0.006, p = 0.001), and MR-ConMix yielded beta = -0.033 (p = 0.016). MR-Lasso returned the same estimate as IVW (beta = -0.016, SE = 0.006, p = 0.012), which is what would be expected if no instrument was flagged as invalid; it therefore adds little independent support beyond the IVW result.

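Although most MR methods point to a negative association, MR-Egger did not detect a significant association (beta = -0.002, SE = 0.014, p = 0.886). MR-Egger generally has low power, especially with few instruments, and its standard error here is more than twice that of IVW, so this null result is not strong evidence against an effect.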
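Approximate 95% confidence intervals (estimate ± 1.96 × SE), computed from the rounded estimates above and so possibly differing slightly from the exported values, show that the MR-Egger interval is wide enough to include both the IVW and Weighted Median estimates:

```latex
\begin{aligned}
\text{IVW:} &\quad -0.016 \pm 1.96 \times 0.006 \approx (-0.028,\ -0.004)\\
\text{Weighted Median:} &\quad -0.021 \pm 1.96 \times 0.006 \approx (-0.033,\ -0.009)\\
\text{MR-Egger:} &\quad -0.002 \pm 1.96 \times 0.014 \approx (-0.029,\ 0.025)
\end{aligned}
```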

6.1 Heterogeneity and Pleiotropy

Heterogeneity tests indicated moderate variability among the SNP-specific estimates. Cochran's Q from the IVW model was 13.3 (degrees of freedom = 6, p = 0.039), suggesting heterogeneity among the genetic instruments. No significant directional pleiotropy was detected by the MR-Egger intercept test (intercept = -0.004, SE = 0.004, p = 0.320), but this test also has limited power with seven instruments, so the heterogeneity still warrants caution. The robust methods handle this in different ways. MR-Lasso adds a per-SNP pleiotropy term to the IVW regression and applies L1 regularization, which shrinks these terms toward zero; SNPs whose terms remain non-zero are treated as invalid and excluded from the final estimate. MR-ConMix models the SNP-specific ratio estimates as a mixture of valid instruments, centered on the causal effect, and invalid instruments with a wider distribution; here Psi, the standard deviation of that invalid-instrument distribution, was 0.024, and Alpha (0.05) is the significance level used for its confidence interval. Together, these methods make the causal inference more robust to some violations of the MR assumptions.
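To put "moderate" on a common scale, Higgins' I² can be computed from the reported Q statistic (13.3 on 6 degrees of freedom). Roughly 55% of the variation among the per-SNP estimates would then exceed what sampling error alone predicts, although I² is imprecise with only seven instruments:

```latex
I^2 = \frac{Q - \mathrm{df}}{Q} = \frac{13.3 - 6}{13.3} \approx 0.55
```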

Replication in additional populations is warranted to confirm causality and assess the robustness of these findings, particularly because the exposure GWAS (METSIM) included only Finnish men.

7 Extra Analysis with More-Stringent Criteria

### MR with pval (5E-8) and strict clumping window 10000 kb

<button id="Strict-MR-button" onclick="toggleVisibility('Strict-MR')">Strict-MR</button>
<div id="Strict-MR" style="display:none;">  


# ---------------------------------------------
# Load and Format NAT Data
# ---------------------------------------------

# NB: Data is on GRCh38.
# The sample size is 6099 (known from website) for NAT from the METSIM PheWeb data.

# wget https://pheweb.org/metsim-metab/download/C100005466
# bgzip -c -d C100005466 > C100005466.uncompressed
nat <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/C100005466.uncompressed")
nat <- as.data.frame(nat)

# Filter out rows with missing or invalid RSIDs
filtered_nat_df <- nat %>% 
  filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
  arrange(rsids)

# Format exposure data with TwoSampleMR column names (harmonized below)
exp_dat <- filtered_nat_df %>%
  mutate(chr.exposure = chrom,
         pos.exposure = pos,   
         beta.exposure = beta,
         se.exposure = sebeta,
         exposure = "N-acetyltaurine",
         id.exposure = "N-acetyltaurine",
         pval.exposure = pval,
         SNP.exposure = rsids,
         SNP = rsids,
         effect_allele.exposure = alt,
         other_allele.exposure = ref,
         eaf.exposure = maf,  # NB: METSIM MAF; equals the alt (effect) allele frequency only where alt is the minor allele
         samplesize.exposure = 6099,
         id_col = nearest_genes)

# ---------------------------------------------
# Load and Format BMI Data
# ---------------------------------------------

# I previously lifted GWAS_sumstats_EUR__invnorm_bmi__TOTALsample.tsv over from GRCh37 to GRCh38 with LiftOver
bmi <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/merged_gwas_data_grch38_twosample.tsv")

# NB: The sample size is 460000, known from the Jurgens 2022 README
# Filter out rows with missing or invalid SNPs
filtered_bmi_df <- bmi %>%
  filter(!is.na(SNP) & SNP != "" & grepl("^rs", SNP)) %>%
  arrange(SNP)

# Format outcome data with TwoSampleMR column names (harmonized below)
out_dat <- filtered_bmi_df %>%
  mutate(SNP = SNP,
         SNP.outcome = SNP,
         chr.outcome = CHR_x,
         pos.outcome = BP_GRCh38,
         beta.outcome = beta,
         se.outcome = se,
         pval.outcome = pval,
         effect_allele.outcome = effect_allele,
         other_allele.outcome = other_allele,
         eaf.outcome = eaf,
         samplesize.outcome = 460000,
         id.outcome = "BMI",
         outcome = "BMI")

# ---------------------------------------------
# Harmonize Data (Exposure and Outcome)
# ---------------------------------------------

dat <- harmonise_data(
  exposure_dat = exp_dat,
  outcome_dat = out_dat)

# Save with today's date appended to the filename
today <- Sys.Date()  

# Construct the filename with the date
file_path <- paste0("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/harmonized_nat_Jurgens_BMI_dat_", today, ".csv")

# Write the CSV file
write.csv(dat, file = file_path, row.names = FALSE)

# ---------------------------------------------
# Select 1MB around PTER region
# ---------------------------------------------

# Step 1: Connect to Ensembl and retrieve TSS for PTER
# Using GRCh38 (hg38) - adjust to GRCh37 if needed
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Define attributes to fetch: TSS, chromosome, strand, etc.
attributes <- c("ensembl_gene_id", "external_gene_name", "chromosome_name", 
                "transcription_start_site", "strand")

# Query Ensembl for PTER
pter_data <- getBM(attributes = attributes,
                   filters = "external_gene_name",
                   values = "PTER",
                   mart = ensembl)

# Inspect the result
print("PTER gene data from Ensembl:")
print(pter_data)

# PTER may have multiple transcripts with different TSSs.
# min() gives the most upstream (5'-most) TSS only for a forward-strand gene;
# for a reverse-strand gene it would be max(), so check pter_strand below.
# Adjust logic if a specific transcript or criteria is needed.
pter_tss <- min(pter_data$transcription_start_site)  # Most upstream TSS if strand == 1
pter_chrom <- pter_data$chromosome_name[1]           # Should be "10" for PTER (10p13)
pter_strand <- pter_data$strand[1]                   # 1 (forward) or -1 (reverse)

cat("Selected TSS for PTER:", pter_tss, "on chromosome", pter_chrom, "\n")
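Because `min()` picks the 5'-most TSS only for forward-strand genes, a strand-aware choice and the ±500 kb window filter can be sketched as two small functions. `most_upstream_tss()` and `in_tss_window()` are hypothetical helpers, and the coordinates in the usage lines are made up for illustration. It may also be worth dropping rows of `pter_data` on non-standard scaffolds (patch or haplotype `chromosome_name` values) before choosing the TSS.

```r
# Hypothetical helper: 5'-most TSS given Ensembl strand coding (1 = forward, -1 = reverse)
most_upstream_tss <- function(tss, strand) {
  if (strand == 1) min(tss) else max(tss)
}

# Hypothetical helper: TRUE for rows on `chrom` within +/- half_window bp of `tss`
in_tss_window <- function(chr, pos, chrom, tss, half_window = 500000) {
  as.character(chr) == as.character(chrom) &
    pos >= tss - half_window & pos <= tss + half_window
}

# Made-up coordinates for illustration
tss_fwd <- most_upstream_tss(c(16400000, 16430000), strand = 1)   # 16400000
tss_rev <- most_upstream_tss(c(16400000, 16430000), strand = -1)  # 16430000
keep <- in_tss_window(chr = c(10, 10, 3),
                      pos = c(16000000, 17000000, 16400000),
                      chrom = "10", tss = tss_fwd)
print(keep)   # TRUE FALSE FALSE
```

Here that would be `pter_tss <- most_upstream_tss(pter_data$transcription_start_site, pter_strand)`.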

# Step 2: Define the 1 Mb window around TSS (500 kb upstream and downstream)
window_size <- 500000  # 500 kb in base pairs
tss_start <- pter_tss - window_size  # 500 kb upstream
tss_end <- pter_tss + window_size    # 500 kb downstream

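# Step 3: Filter the data for observations within this window
# The TSS is in GRCh38. NAT positions (pos.exposure) are natively GRCh38, but the Jurgens
# BMI statistics were reported in GRCh37, so pos.outcome must have been lifted over to
# GRCh38 for this filter to be valid. Also ensure chromosome coding matches ("10" vs "chr10").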
dat_filtered <- dat %>%
  filter(chr.outcome == pter_chrom &  # Match chromosome
         pos.outcome >= tss_start &   # Within 500 kb upstream
         pos.outcome <= tss_end)      # Within 500 kb downstream

# Step 4: Inspect and save the filtered data
print("Filtered observations within 1 Mb of PTER TSS:")
print(head(dat_filtered))

# Save with today's date (e.g., 2025-02-19) appended to the filename
today <- Sys.Date()  # e.g., "2025-02-19"
output_file <- paste0("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/",
                      "filtered_PTER_1Mb_Jurgens_BMI_", today, ".csv")
write.csv(dat_filtered, file = output_file, row.names = FALSE)

cat("Filtered data saved to:", output_file, "\n")

# Optional: Count how many observations were selected
cat("Number of observations within 1 Mb of PTER TSS:", nrow(dat_filtered), "\n")
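Whether `min()` returns the most upstream TSS depends on strand: upstream means smaller coordinates on the forward strand (+1) and larger coordinates on the reverse strand (-1). The sketch below uses two hypothetical helpers, `most_upstream_tss()` and `in_tss_window()`, with made-up coordinates, to show a strand-aware choice followed by the same inclusive +/- 500 kb window filter.

```r
# Hypothetical helper: most upstream TSS given the gene strand
# (forward strand = smallest coordinate, reverse strand = largest coordinate)
most_upstream_tss <- function(tss, strand) {
  if (strand == 1) min(tss) else max(tss)
}

# Hypothetical helper: TRUE for variants on `chrom` within +/- half_window bp of the TSS
in_tss_window <- function(chr, pos, chrom, tss, half_window = 500000) {
  chr == chrom & pos >= tss - half_window & pos <= tss + half_window
}

# Made-up coordinates for a reverse-strand gene with two transcripts
tss <- most_upstream_tss(c(16400000, 16420000), strand = -1)
keep <- in_tss_window(chr = c("10", "10", "3"),
                      pos = c(16000000, 16900000, 16420000),
                      chrom = "10", tss = tss)
print(keep)
```

The third variant sits at the right position but on another chromosome, which is why the chromosome check (and matching chromosome coding) matters as much as the position range.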

# ---------------------------------------------
# MR
# ---------------------------------------------

# Step 1: Filter Instruments ----
# Purpose: Select strong, valid genetic instruments for N-acetyltaurine exposure

# Re-read the filtered file written above (replace output_file with a dated path to
# resume from an earlier run)
dat_filtered <- read.csv(output_file)
instruments <- dat_filtered
instruments <- instruments[which(instruments$mr_keep %in% TRUE), ]
instruments <- instruments[which(instruments$pval.exposure < 5E-8), ]
instruments$rsid <- instruments$SNP
instruments$pval <- instruments$pval.exposure
instruments$id <- "N-acetyltaurine (1MB around PTER TSS; Phosphotriesterase Related; N-acetyltaurine hydrolase)"

# Verify the filtered instruments
cat("Dimensions of instruments after filtering:", dim(instruments), "\n")
cat("Preview of instruments:\n")
print(head(instruments))

# Step 2: Local LD Clumping ----
# Purpose: Ensure independence among SNPs for MR methods
clumped <- ieugwasr::ld_clump(
  dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
  plink_bin = genetics.binaRies::get_plink_binary(),
  bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",  # Path to local EUR bfile
  clump_r2 = 0.001,    # LD threshold
  clump_kb = 10000     # Clumping window
)

# Filter original data to retain only clumped SNPs
clumped_dat <- instruments %>% dplyr::filter(SNP %in% clumped$rsid)

cat("Local clumping completed. Number of SNPs retained:", nrow(clumped_dat), "\n")
cat("Preview of clumped data:\n")
print(head(clumped_dat))
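LD clumping keeps the most significant SNP in each LD block and drops SNPs correlated with it. The toy function `greedy_clump()` is hypothetical and uses a made-up r² matrix; it shows only the greedy r² rule. PLINK's `--clump`, which `ld_clump()` runs against the local reference panel, additionally restricts comparisons to the `clump_kb` window and applies p-value thresholds.

```r
# Toy greedy clumping (illustration only; no distance window or p-value thresholds)
greedy_clump <- function(snps, pval, r2, r2_threshold = 0.001) {
  remaining <- snps[order(pval)]   # most significant SNP first
  kept <- character(0)
  while (length(remaining) > 0) {
    index <- remaining[1]          # current index SNP
    kept <- c(kept, index)
    remaining <- remaining[-1]
    # drop every remaining SNP in LD with the index SNP above the threshold
    remaining <- remaining[r2[index, remaining] <= r2_threshold]
  }
  kept
}

snps <- c("snp1", "snp2", "snp3")
r2 <- matrix(c(1,   0.5, 0,
               0.5, 1,   0,
               0,   0,   1),
             nrow = 3, dimnames = list(snps, snps))
kept <- greedy_clump(snps, pval = c(1e-10, 1e-9, 1e-8), r2 = r2)
print(kept)
```

`snp2` is removed because it is correlated (r² = 0.5) with the stronger `snp1`, while the independent `snp3` is retained.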

# Step 3: Perform MR Analyses ----
# Overview: Multiple MR methods

# 3.1: Inverse Variance Weighted (IVW)
ivw_result <- mr(clumped_dat, method_list = "mr_ivw")
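For reference, the IVW point estimate is a weighted regression of SNP-outcome on SNP-exposure effects through the origin, with weights 1/se_y². The sketch below computes the fixed-effect version in closed form with a hypothetical helper, `ivw_fixed()`, and checks it against `lm()`. The last step shows a multiplicative random-effects SE, which, to our understanding, is how TwoSampleMR's `mr_ivw` scales its SE (dividing the regression SE by min(1, residual SE)). The effect sizes are made up.

```r
# Fixed-effect IVW from summary statistics (sketch)
ivw_fixed <- function(bx, by, se_y) {
  w <- bx^2 / se_y^2                        # inverse variance of each Wald ratio (first order)
  est <- sum(bx * by / se_y^2) / sum(w)
  c(b = est, se = 1 / sqrt(sum(w)))
}

bx   <- c(0.30, 0.45, 0.25, 0.60)           # made-up SNP-exposure effects
by   <- c(0.010, 0.020, 0.004, 0.025)       # made-up SNP-outcome effects
se_y <- c(0.005, 0.006, 0.004, 0.007)

fe  <- ivw_fixed(bx, by, se_y)
fit <- lm(by ~ 0 + bx, weights = 1 / se_y^2)  # same point estimate via regression
# Multiplicative random effects: SE is inflated only when the residual SE exceeds 1
se_re <- summary(fit)$coefficients[1, 2] / min(1, summary(fit)$sigma)
print(c(fe, se_re = se_re))
```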

# 3.2: MR-Egger
egger_result <- mr(clumped_dat, method_list = "mr_egger_regression")
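MR-Egger differs from IVW by allowing an intercept, which estimates average directional (unbalanced) pleiotropy; this is the intercept later reported by `mr_pleiotropy_test()`. Because the intercept depends on allele coding, SNPs are first oriented so that every exposure effect is positive. `mr_egger_sketch()` is a hypothetical minimal version on made-up data, not the TwoSampleMR implementation (it omits, for example, any SE rescaling).

```r
# MR-Egger sketch: orient SNPs, then weighted regression WITH an intercept
mr_egger_sketch <- function(bx, by, se_y) {
  orient <- ifelse(bx < 0, -1, 1)     # flip SNPs with negative exposure effects
  bx <- bx * orient
  by <- by * orient
  lm(by ~ bx, weights = 1 / se_y^2)   # intercept = average directional pleiotropy
}

# Made-up data generated as by = 0.01 + 0.5 * bx after orientation;
# the first SNP is coded on the other allele (negative bx)
bx_raw <- c(-0.30, 0.45, 0.25, 0.60)
by_raw <- c(-0.16, 0.235, 0.135, 0.31)
se_y   <- c(0.010, 0.012, 0.008, 0.015)
fit <- mr_egger_sketch(bx_raw, by_raw, se_y)
print(coef(fit))
```

Without the orientation step, recoding a single SNP's alleles would change the intercept, which is why the estimate is only interpretable after all exposure effects share a sign.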

# 3.3: MR-PRESSO
# mr_presso() stops with "Not enough instrumental variables" unless there are more than
# (number of exposures + 2) SNPs, i.e. at least 4 for a single exposure
if (nrow(clumped_dat) >= 4) {
  presso_result <- mr_presso(
    BetaOutcome = "beta.outcome",
    BetaExposure = "beta.exposure",
    SdOutcome = "se.outcome",
    SdExposure = "se.exposure",
    OUTLIERtest = TRUE,
    DISTORTIONtest = TRUE,
    data = clumped_dat,
    NbDistribution = 1000,
    SignifThreshold = 0.05
  )
} else {
  cat("MR-PRESSO requires at least 4 SNPs for a single exposure. Skipping MR-PRESSO.\n")
  presso_result <- NULL
}

# 3.4: Weighted Median
weighted_median_result <- mr(clumped_dat, method_list = "mr_weighted_median")
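The weighted median estimator orders the per-SNP Wald ratios, turns their weights into a cumulative distribution, and interpolates the ratio at the 50% point, so it remains consistent when up to half of the weight comes from invalid instruments. `weighted_median_sketch()` is a hypothetical illustration of that interpolation step. In practice the weights would be inverse variances of the Wald ratios; the SE, which `mr_weighted_median` obtains by bootstrap, is not computed here.

```r
# Weighted median of ratio estimates (interpolation sketch)
weighted_median_sketch <- function(ratio, weights) {
  o <- order(ratio)
  r <- ratio[o]
  w <- weights[o] / sum(weights)       # normalized weights in ratio order
  if (length(r) == 1) return(r)
  cum <- cumsum(w) - 0.5 * w           # cumulative weight at each ordered estimate
  below <- max(which(cum < 0.5))       # last estimate below the 50% point
  # linear interpolation between the two estimates straddling 50%
  r[below] + (r[below + 1] - r[below]) * (0.5 - cum[below]) / (cum[below + 1] - cum[below])
}

weighted_median_sketch(c(3, 1, 2), c(1, 1, 1))    # equal weights: ordinary median
weighted_median_sketch(c(1, 2, 3), c(1, 1, 10))   # pulled toward the heavily weighted 3
```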

# 3.5: MR-Lasso
# Prepare MRInput object for MendelianRandomization package
mr_input <- mr_input(
  bx = clumped_dat$beta.exposure,
  bxse = clumped_dat$se.exposure,
  by = clumped_dat$beta.outcome,
  byse = clumped_dat$se.outcome,
  exposure = "N-acetyltaurine",
  outcome = "BMI",
  snps = clumped_dat$SNP
)

lasso_result <- tryCatch({
  mr_lasso(mr_input)
}, error = function(e) {
  cat("Error in MR-Lasso:", conditionMessage(e), "\n")
  NULL
})

# Expanded reporting for MR-Lasso
if (!is.null(lasso_result)) {
  cat("\n=== Detailed MR-Lasso Results ===\n")
  cat("Exposure:", lasso_result@Exposure, "\n")
  cat("Outcome:", lasso_result@Outcome, "\n")
  cat("Estimate (Beta):", lasso_result@Estimate, "\n")
  cat("Standard Error:", lasso_result@StdError, "\n")
  cat("95% CI Lower:", lasso_result@CILower, "\n")
  cat("95% CI Upper:", lasso_result@CIUpper, "\n")
  cat("Alpha (Significance Level):", lasso_result@Alpha, "\n")
  cat("P-value:", lasso_result@Pvalue, "\n")
  cat("Number of variants (SNPs):", lasso_result@SNPs, "\n")
  cat("Number of valid instruments (Valid):", lasso_result@Valid, "\n")
  if (length(lasso_result@ValidSNPs) > 0) {
    cat("Valid SNPs (RSIDs):", paste(lasso_result@ValidSNPs, collapse = ", "), "\n")
  } else {
    cat("No valid SNPs identified in MR-Lasso.\n")
  }
  cat("Regularization Estimate:", lasso_result@RegEstimate, "\n")
  cat("Regularization Intercept:", paste(lasso_result@RegIntercept, collapse = ", "), "\n")
  cat("Tuning Parameter (Lambda):", lasso_result@Lambda, "\n")
  cat("All SNPs in mr_input:", paste(mr_input@snps, collapse = ", "), "\n")
  print(lasso_result)
}

# Expanded reporting for MR-PRESSO
if (!is.null(presso_result)) {
  cat("\n=== Detailed MR-PRESSO Results ===\n")
  cat("Main MR Results:\n")
  print(presso_result$`Main MR results`)
  if (!is.null(presso_result$`MR-PRESSO results`$`Global Test`)) {
    cat("Global Test Results:\n")
    cat("RSSobs:", presso_result$`MR-PRESSO results`$`Global Test`$RSSobs, "\n")
    cat("P-value:", presso_result$`MR-PRESSO results`$`Global Test`$Pvalue, "\n")
  }
  # Outlier and distortion tests are nested under `MR-PRESSO results`; raw and
  # outlier-corrected estimates are rows of `Main MR results`, printed above
  presso_tests <- presso_result$`MR-PRESSO results`
  if (!is.null(presso_tests$`Outlier Test`)) {
    cat("Outlier Test Results:\n")
    print(presso_tests$`Outlier Test`)
  }
  if (!is.null(presso_tests$`Distortion Test`)) {
    cat("Distortion Test Results:\n")
    print(presso_tests$`Distortion Test`)
  }
}

# 3.6: Heterogeneity Test
heterogeneity_result <- mr_heterogeneity(clumped_dat)
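`mr_heterogeneity()` reports Cochran's Q. For IVW, Q is the weighted sum of squared residuals from the IVW fit, compared with a chi-squared distribution on (number of SNPs minus 1) degrees of freedom; a large Q means the Wald ratios disagree more than sampling error allows, which is consistent with pleiotropy. The hypothetical `cochran_q_ivw()` below sketches this on made-up data.

```r
# Cochran's Q for the IVW estimate (sketch)
cochran_q_ivw <- function(bx, by, se_y) {
  b_ivw <- sum(bx * by / se_y^2) / sum(bx^2 / se_y^2)
  q <- sum((by - b_ivw * bx)^2 / se_y^2)    # weighted squared residuals
  df <- length(bx) - 1
  c(Q = q, df = df, p = pchisq(q, df, lower.tail = FALSE))
}

bx   <- c(0.30, 0.45, 0.25, 0.60)
se_y <- c(0.010, 0.012, 0.008, 0.015)
res_null <- cochran_q_ivw(bx, 0.5 * bx, se_y)                    # perfectly homogeneous ratios
res_het  <- cochran_q_ivw(bx, c(0.15, 0.225, 0.125, 0.9), se_y)  # fourth SNP is an outlier
print(rbind(res_null, res_het))
```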

# 3.7: Pleiotropy Test
pleiotropy_result <- mr_pleiotropy_test(clumped_dat)

# Step 4: Perform Wald Ratio Tests for Each Instrument ----
wald_ratios <- clumped_dat %>%
  mutate(
    wald_beta = beta.outcome / beta.exposure,  # Causal effect (beta) for each SNP
    wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),  # Second-order delta-method SE (includes exposure SE)
    pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),  # P-value
    method = paste("Wald Ratio:", SNP)  # Label each Wald ratio with its RSID
  ) %>%
  dplyr::select(SNP, wald_beta, wald_se, pval, method)

cat("\n=== Wald Ratio Tests for Each Instrument ===\n")
print(wald_ratios)
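The Wald-ratio SE above is the second-order delta-method approximation; dropping its second term gives the first-order SE, which ignores uncertainty in the SNP-exposure effect. `wald_ratio()` is a hypothetical helper, with made-up inputs, that computes both so the difference is visible.

```r
# Wald ratio with first- or second-order delta-method SE (sketch)
wald_ratio <- function(by, bx, se_y, se_x, second_order = TRUE) {
  b <- by / bx
  var_b <- se_y^2 / bx^2                                   # first-order term
  if (second_order) var_b <- var_b + by^2 * se_x^2 / bx^4  # adds exposure uncertainty
  se <- sqrt(var_b)
  data.frame(b = b, se = se, pval = 2 * pnorm(abs(b / se), lower.tail = FALSE))
}

w1 <- wald_ratio(by = 0.02, bx = 0.4, se_y = 0.01, se_x = 0.05, second_order = FALSE)
w2 <- wald_ratio(by = 0.02, bx = 0.4, se_y = 0.01, se_x = 0.05)
print(rbind(first_order = w1, second_order = w2))
```

The second-order SE is always at least as large as the first-order one, and the gap grows as the instrument gets weaker (smaller bx relative to se_x).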

# Step 5: Prepare Data for Forest Plot ----
# Pre-process each result separately, excluding Weighted Mode
ivw_df <- ivw_result %>% mutate(method = "IVW")
egger_df <- egger_result %>% mutate(method = "MR-Egger")
weighted_median_df <- weighted_median_result %>% mutate(method = "Weighted Median")

presso_df <- if (!is.null(presso_result)) {
  as.data.frame(presso_result$`Main MR results`) %>%
    dplyr::filter(`Causal Estimate` != "NA" & !is.na(`Causal Estimate`)) %>%  # Remove NA entries
    dplyr::mutate(method = "MR-PRESSO", b = `Causal Estimate`, se = Sd, pval = `P-value`) %>%
    dplyr::select(method, b, se, pval)
} else {
  NULL
}

lasso_df <- if (!is.null(lasso_result)) {
  data.frame(method = "MR-Lasso", b = lasso_result@Estimate, 
             se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
  NULL
}

# Combine all results, including Wald ratios, ensuring exact betas and SEs from results
mr_results <- bind_rows(
  ivw_df,
  egger_df,
  weighted_median_df,
  presso_df,
  lasso_df,
  wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
  mutate(method = as.factor(method))  # Ensure method is a factor for plotting

# Filter out any NA methods or entries for the plot
mr_results <- mr_results %>%
  filter(!is.na(method) & method != "NA")

# Order methods by magnitude of effect (beta) from smallest to largest
mr_results <- mr_results %>%
  arrange(b) %>%
  mutate(method = factor(method, levels = unique(method)))  # Re-level for ordered plotting; unique() avoids duplicate levels (e.g. raw and outlier-corrected MR-PRESSO rows)

# Step 6: Create Beautiful Forest Plot Using ggplot2 ----
# Define beautiful, distinct colors for each method
beautiful_colors <- c(
  "IVW" = "#1F78B4",           # Soft Blue
  "MR-Egger" = "#FF7F00",      # Bright Orange
  "Weighted Median" = "#33A02C", # Vibrant Green
  "MR-PRESSO" = "#6A3D9A",     # Rich Purple
  "MR-Lasso" = "#FB9A99",      # Coral Pink
  "Wald Ratio: rs117372132" = "#E41A1C",  # Deep Red
  "Wald Ratio: rs142238737" = "#FF6B6B",  # Light Red
  "Wald Ratio: rs3802555" = "#FFD700",    # Gold
  "Wald Ratio: rs7075357" = "#ADFF2F"     # Green-Yellow
)

# Dynamically adjust colors based on available methods in mr_results
available_methods <- mr_results$method
color_list <- beautiful_colors[names(beautiful_colors) %in% available_methods]

forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  geom_point(size = 3) +  # Colored points for estimates
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +  # Horizontal confidence intervals
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +  # Zero line
  labs(
    title = "Mendelian Randomization Estimates: N-acetyltaurine (1MB around PTER TSS) on Jurgens BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",  # Remove legend entirely
    panel.background = element_rect(fill = "white", color = NA),  # Ensure no background lines
    plot.background = element_rect(fill = "white", color = NA)  # Ensure no plot background lines
  ) +
  scale_color_manual(values = color_list)  # Apply beautiful colors

# Step 7: Save Results and Plot to Files ----
today <- Sys.Date()  # e.g., "2025-02-19"
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/results/"
excel_file <- paste0(results_dir, "MR_Results_PTER_Jurgens_BMI_", today, ".xlsx")
plot_file <- paste0(results_dir, "MR_Forestplot_PTER_Jurgens_BMI_", today, ".png")

# Ensure the results directory exists
if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
  cat("Created directory:", results_dir, "\n")
}

# 7.1: Save Excel Spreadsheet ----
wb <- createWorkbook()

# Define styles
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")

# Helper function to add and format a sheet
add_formatted_sheet <- function(wb, sheet_name, data, title) {
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
  writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
  mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
  setRowHeights(wb, sheet_name, rows = 1, heights = 20)
  setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
  addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}

# TOC data
toc_data <- data.frame(
  Sheet = c("IVW", "MR-Egger", if (!is.null(presso_result)) "MR-PRESSO", 
            "Weighted_Median", if (!is.null(lasso_result)) "MR-Lasso",
            "Wald_Ratios", "Heterogeneity", "Pleiotropy", "Instruments", "Clumped_Data"),
  Title = c("Inverse Variance Weighted (IVW) Results", "MR-Egger Results", 
            if (!is.null(presso_result)) "MR-PRESSO Results", 
            "Weighted Median Results", 
            if (!is.null(lasso_result)) "MR-Lasso Results",
            "Wald Ratio Tests for Each Instrument", 
            "Heterogeneity Test Results", "Pleiotropy Test Results", 
            "Filtered Instruments Data", "Clumped SNPs Data")
)
toc_data <- toc_data[complete.cases(toc_data), ]

# Add TOC sheet
addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")

# Add analysis sheets
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
if (!is.null(presso_result)) {
  # Combine Main MR results and Global Test into a single data frame for the MR-PRESSO sheet
  main_results <- as.data.frame(presso_result$`Main MR results`) %>%
    dplyr::filter(`Causal Estimate` != "NA" & !is.na(`Causal Estimate`))  # Remove NA entries
  global_test <- presso_result$`MR-PRESSO results`$`Global Test`
  if (!is.null(global_test)) {
    # Report the global test in its own columns: RSSobs is a test statistic, not a
    # P-value, and Pvalue can be a string such as "<0.001" when it falls below
    # 1/NbDistribution, so it is kept as text
    global_test_df <- data.frame(
      Exposure = "Global Test",
      `MR Analysis` = "Global Test",
      RSSobs = as.numeric(global_test$RSSobs),
      `Global Test P-value` = as.character(global_test$Pvalue),
      Type = "Global Test",
      check.names = FALSE
    )
    presso_df <- bind_rows(main_results %>% mutate(Type = "Main Results"), global_test_df)
  } else {
    presso_df <- main_results %>% mutate(Type = "Main Results")
  }
  add_formatted_sheet(wb, "MR-PRESSO", presso_df, "MR-PRESSO Results")
  # Outlier and distortion tests are nested under `MR-PRESSO results`
  presso_tests <- presso_result$`MR-PRESSO results`
  if (!is.null(presso_tests$`Outlier Test`)) {
    add_formatted_sheet(wb, "MR-PRESSO_Outliers", presso_tests$`Outlier Test`, "MR-PRESSO Outlier Test Results")
  }
  if (!is.null(presso_tests$`Distortion Test`)) {
    # The distortion test is a list (outlier indices, coefficient, P-value); flatten it to one row
    distortion_df <- as.data.frame(lapply(presso_tests$`Distortion Test`, function(x) paste(x, collapse = ", ")),
                                   check.names = FALSE)
    add_formatted_sheet(wb, "MR-PRESSO_Distortion", distortion_df, "MR-PRESSO Distortion Test Results")
  }
}
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
  lasso_df <- data.frame(
    Exposure = lasso_result@Exposure,
    Outcome = lasso_result@Outcome,
    Estimate = lasso_result@Estimate,
    StdError = lasso_result@StdError,
    CILower = lasso_result@CILower,
    CIUpper = lasso_result@CIUpper,
    Alpha = lasso_result@Alpha,
    Pvalue = lasso_result@Pvalue,
    SNPs = lasso_result@SNPs,
    Valid = lasso_result@Valid,
    ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
    RegEstimate = lasso_result@RegEstimate,
    RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
    Lambda = lasso_result@Lambda
  )
  add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Instruments", instruments, "Filtered Instruments Data")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat, "Clumped SNPs Data")

# Save the workbook
saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved beautifully to:", excel_file, "\n")

# 7.2: Save Forest Plot ----
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")  # Adjusted for more lines
cat("Forest plot saved to:", plot_file, "\n")

8 FinnGen Replication Attempt (Spoiler: It’s NULL)

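Settings: exposure p-value threshold relaxed to 5E-6, plus F-statistic (> 10) and Steiger filtering. A 500 kb clumping step is written out below but commented out, because this replication reuses the independent instruments selected for the Jurgens BMI analysis.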

rm(list = ls())

# Load Required Libraries

# Mendelian Randomization
library(TwoSampleMR)          # Core package for Mendelian Randomization (MR) analyses
library(MendelianRandomization) # Additional MR methods, including MR-Lasso
library(ieugwasr)             # For local LD clumping with ld_clump()
library(MRInstruments)        # Curated instrument datasets (e.g., GWAS catalog, QTL instruments)

# Data Wrangling & Handling
library(dplyr)                # Data manipulation (filtering, joining, summarizing)
library(tidyr)                # Data reshaping
library(data.table)           # Fast and efficient data handling
library(readxl)               # Reading Excel files
library(openxlsx)             # Writing and formatting Excel files efficiently

# Statistical & Visualization
library(ggplot2)              # For plotting
library(ggrepel)              # Improved text labeling in plots
library(corrplot)             # Visualization of correlation matrices
library(RhpcBLASctl)          # Control multithreading for efficiency in MR analyses
library(biomaRt)              # Querying Ensembl for gene annotation
library(scales)               # For dynamic color generation

# Set Working Directory & Setup Folder for Project
setwd("/Users/charleenadams/temp_BI/mr_nat_pter_bmi")
if (!dir.exists("finngen_rep_P06")) dir.create("finngen_rep_P06", recursive = TRUE)

# ---------------------------------------------
# Load and Format NAT Data
# ---------------------------------------------

nat <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/C100005466.uncompressed") %>% as.data.frame()
filtered_nat_df <- nat %>% 
  filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
  arrange(rsids)

exp_dat <- filtered_nat_df %>%
  mutate(
    chr.exposure = chrom,
    pos.exposure = pos,   
    beta.exposure = beta,
    se.exposure = sebeta,
    exposure = "N-acetyltaurine",
    id.exposure = "N-acetyltaurine",
    pval.exposure = pval,
    SNP.exposure = rsids,
    SNP = rsids,
    effect_allele.exposure = alt,
    other_allele.exposure = ref,
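    eaf.exposure = maf,   # MAF used as a proxy for EAF; it differs from the alt-allele frequency when alt is the major allele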
    samplesize.exposure = 6099,
    id_col = nearest_genes
  )

# ---------------------------------------------
# Load and Format BMI Data (FinnGen)
# ---------------------------------------------

bmi <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/summary_stats_release_finngen_R12_BMI_IRN")
filtered_bmi_df <- bmi %>%
  filter(!is.na(rsids) & rsids != "" & grepl("^rs", rsids)) %>%
  arrange(rsids) %>%
  rename(CHR = `#chrom`)

out_dat <- filtered_bmi_df %>%
  mutate(
    SNP = rsids,
    SNP.outcome = rsids,
    chr.outcome = CHR,
    pos.outcome = pos,
    beta.outcome = beta,
    se.outcome = sebeta,
    pval.outcome = pval,
    effect_allele.outcome = alt,
    other_allele.outcome = ref,
    eaf.outcome = af_alt,
    samplesize.outcome = 500348,
    id.outcome = "FinnGen BMI",
    outcome = "FinnGen BMI",
    id_col = nearest_genes   # Column value, consistent with exp_dat (not the literal string)
  )

# ---------------------------------------------
# Harmonize Data
# ---------------------------------------------

dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat)
today <- Sys.Date()
write.csv(dat, paste0("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/harmonized_nat_finngen_BMI_dat_", today, ".csv"), row.names = FALSE)

# ---------------------------------------------
# Select 1MB around PTER region
# ---------------------------------------------

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
pter_data <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "chromosome_name", 
                                  "transcription_start_site", "strand"),
                   filters = "external_gene_name", values = "PTER", mart = ensembl)
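# Most upstream TSS: smallest coordinate on the forward strand, largest on the reverse strand
pter_tss <- if (pter_data$strand[1] == 1) {
  min(pter_data$transcription_start_site)
} else {
  max(pter_data$transcription_start_site)
}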
pter_chrom <- pter_data$chromosome_name[1]
window_size <- 500000
tss_start <- pter_tss - window_size
tss_end <- pter_tss + window_size

dat_filtered <- dat %>%
  filter(chr.outcome == pter_chrom & pos.outcome >= tss_start & pos.outcome <= tss_end)
write.csv(dat_filtered, file = "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_finngen_BMI_2025-02-19.csv", row.names = FALSE)
cat("Filtered data saved to: /Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_finngen_BMI_2025-02-19.csv\n")
cat("Number of observations within 1 Mb of PTER TSS:", nrow(dat_filtered), "\n")

# ---------------------------------------------
# MR
# ---------------------------------------------

# Step 1: Filter Instruments
dat_filtered <- read.csv("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_finngen_BMI_2025-02-19.csv")
instruments <- dat_filtered %>%
  filter(mr_keep == TRUE, pval.exposure < 5e-6) %>%  # Relaxed threshold
  mutate(
    rsid = SNP,
    pval = pval.exposure,
    id = "N-acetyltaurine (1MB around PTER TSS)",
    F_stat = (beta.exposure / se.exposure)^2  # Compute F-statistic
  ) %>%
  filter(F_stat > 10)  # Exclude weak instruments

cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")
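The per-SNP F-statistic used here, (beta / se)², is the usual single-variant approximation. For one SNP it is equivalent to F = R²(n - 2)/(1 - R²) when R² is derived from the same t-statistic, as the round trip below shows. The helper names are hypothetical, n = 6099 is the METSIM sample size used above, and the beta and SE are made up. F > 10 is a conventional rule of thumb against weak-instrument bias, not a guarantee.

```r
# Single-SNP F-statistic and its R2-based equivalent (sketch)
f_from_bse <- function(beta, se) (beta / se)^2
r2_from_f  <- function(f, n) f / (n - 2 + f)           # variance explained implied by F
f_from_r2  <- function(r2, n) r2 * (n - 2) / (1 - r2)  # back to F for one variant

f  <- f_from_bse(beta = 0.12, se = 0.02)   # z = 6, so F = 36
r2 <- r2_from_f(f, n = 6099)
print(c(F = f, R2 = r2, F_again = f_from_r2(r2, 6099)))
```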

# Step 1.5: Steiger Filtering
instruments <- steiger_filtering(instruments)
instruments <- instruments %>% filter(steiger_dir == TRUE)  # Keep only SNPs with correct direction
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))
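Steiger filtering keeps SNPs that explain more variance in the exposure than in the outcome, which supports the assumed exposure-to-outcome direction. For continuous traits, r² can be approximated from beta, SE and sample size as F / (n - 2 + F), with F = (beta / se)². The sketch below (hypothetical helpers, made-up effects, sample sizes from this analysis) checks only the direction; `steiger_filtering()` additionally reports a p-value for the difference (`steiger_pval`).

```r
# Approximate variance explained from summary statistics (continuous trait)
r2_from_bsen <- function(b, se, n) {
  f <- (b / se)^2
  f / (n - 2 + f)
}

# TRUE when the SNP explains more variance in the exposure than in the outcome
steiger_direction_sketch <- function(bx, se_x, n_x, by, se_y, n_y) {
  r2_from_bsen(bx, se_x, n_x) > r2_from_bsen(by, se_y, n_y)
}

ok <- steiger_direction_sketch(bx = 0.30, se_x = 0.03, n_x = 6099,
                               by = 0.01, se_y = 0.005, n_y = 500348)
reversed <- steiger_direction_sketch(bx = 0.01, se_x = 0.005, n_x = 500348,
                                     by = 0.30, se_y = 0.03, n_y = 6099)
print(c(ok = ok, reversed = reversed))
```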

# Subset instruments to keep only TwoSampleMR, F-stat, and Steiger fields
instruments_subset <- instruments %>%
  dplyr::select(
    SNP, 
    effect_allele.exposure, other_allele.exposure,
    effect_allele.outcome, other_allele.outcome,
    beta.exposure, se.exposure, pval.exposure,
    beta.outcome, se.outcome, pval.outcome,
    eaf.exposure, eaf.outcome,
    id.exposure, exposure,
    id.outcome, outcome,
    samplesize.exposure, samplesize.outcome,
    mr_keep, action,
    F_stat,
    steiger_dir, steiger_pval
  )

# Use the same instruments as in the Jurgens BMI analysis
snps <- c("rs117372132",
          "rs142238737",
          "rs3802555",
          "rs4747286",
          "rs7075357")

test <- instruments_subset[which(instruments_subset$SNP %in% snps), ]
clumped_dat <- test

# We will skip clumping since we are using the same set of independent instruments as was chosen for the Jurgens BMI analysis

# # Step 2: Local LD Clumping
# clumped <- ld_clump(
#   dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
#   plink_bin = genetics.binaRies::get_plink_binary(),
#   bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",
#   clump_r2 = 0.001,
#   clump_kb = 500  # Reduced window
# )
# clumped_dat <- instruments %>% dplyr::filter(SNP %in% clumped$rsid)
# cat("Local clumping completed. Number of SNPs retained:", nrow(clumped_dat), "\n")
# cat("Preview of clumped data:\n")
# print(head(clumped_dat))
# 
# # Subset clumped_dat to keep only TwoSampleMR, F-stat, and Steiger fields
# clumped_dat_subset <- clumped_dat %>%
#   dplyr::select(
#     SNP, 
#     effect_allele.exposure, other_allele.exposure,
#     effect_allele.outcome, other_allele.outcome,
#     beta.exposure, se.exposure, pval.exposure,
#     beta.outcome, se.outcome, pval.outcome,
#     eaf.exposure, eaf.outcome,
#     id.exposure, exposure,
#     id.outcome, outcome,
#     samplesize.exposure, samplesize.outcome,
#     mr_keep, action,
#     F_stat,
#     steiger_dir, steiger_pval
#   )

# Step 3.0: Perform Many MR Analyses at Once
mr_result <- mr_wrapper(clumped_dat)

# Extract the nested list components from mr_wrapper
estimates <- mr_result$`N-acetyltaurine.FinnGen BMI`$estimates
heterogeneity <- mr_result$`N-acetyltaurine.FinnGen BMI`$heterogeneity
directional_pleiotropy <- mr_result$`N-acetyltaurine.FinnGen BMI`$directional_pleiotropy
info <- mr_result$`N-acetyltaurine.FinnGen BMI`$info
snps_retained <- mr_result$`N-acetyltaurine.FinnGen BMI`$snps_retained

# Print the structure to confirm the data is there
print(estimates)
print(heterogeneity)
print(directional_pleiotropy)
print(info)
print(snps_retained)

# Step 3: Perform Selected MR Analyses
# 3.1: Inverse Variance Weighted (IVW)
ivw_result <- mr(clumped_dat, method_list = "mr_ivw")

# 3.2: MR-Egger
egger_result <- mr(clumped_dat, method_list = "mr_egger_regression")

# 3.3: Weighted Median
weighted_median_result <- mr(clumped_dat, method_list = "mr_weighted_median")

# 3.4: MR-Lasso
mr_input <- mr_input(
  bx = clumped_dat$beta.exposure,
  bxse = clumped_dat$se.exposure,
  by = clumped_dat$beta.outcome,
  byse = clumped_dat$se.outcome,
  exposure = "N-acetyltaurine",
  outcome = "BMI",
  snps = clumped_dat$SNP
)
lasso_result <- tryCatch({
  mr_lasso(mr_input)
}, error = function(e) {
  cat("Error in MR-Lasso:", conditionMessage(e), "\n")
  NULL
})

# 3.5: Contamination Mixture (ConMix)
conmix_result <- tryCatch({
  mr_conmix(mr_input)
}, error = function(e) {
  cat("Error in MR-ConMix:", conditionMessage(e), "\n")
  NULL
})

# 3.6: Heterogeneity Test
heterogeneity_result <- mr_heterogeneity(clumped_dat)

# 3.7: Pleiotropy Test
pleiotropy_result <- mr_pleiotropy_test(clumped_dat)

# 3.8: Leave-One-Out Analysis
loo_result <- mr_leaveoneout(clumped_dat)
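Leave-one-out analysis re-estimates the causal effect after dropping each SNP in turn, to check whether a single variant drives the result. To our understanding, `mr_leaveoneout()` uses IVW by default. `loo_ivw()` below is a hypothetical fixed-effect version on made-up data in which the fourth SNP is an outlier.

```r
# Leave-one-out fixed-effect IVW (sketch)
loo_ivw <- function(snp, bx, by, se_y) {
  ivw <- function(i) sum(bx[i] * by[i] / se_y[i]^2) / sum(bx[i]^2 / se_y[i]^2)
  data.frame(
    left_out = snp,
    b = vapply(seq_along(snp), function(k) ivw(-k), numeric(1))  # drop SNP k
  )
}

res <- loo_ivw(snp  = c("snpA", "snpB", "snpC", "snpD"),
               bx   = c(0.30, 0.45, 0.25, 0.60),
               by   = c(0.15, 0.225, 0.125, 0.90),   # snpD does not follow by = 0.5 * bx
               se_y = c(0.010, 0.012, 0.008, 0.015))
print(res)
```

Only the row that leaves out `snpD` returns to 0.5, which is the pattern that flags an influential variant.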

# Step 4: Perform Wald Ratio Tests for Each Instrument
wald_ratios <- clumped_dat %>%
  mutate(
    wald_beta = beta.outcome / beta.exposure,
    wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
    pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
    method = paste("Wald Ratio:", SNP)
  ) %>%
  dplyr::select(SNP, wald_beta, wald_se, pval, method)

cat("\n=== Wald Ratio Tests for Each Instrument ===\n")
print(wald_ratios)

# Step 5: Prepare Data for Forest Plot
ivw_df <- ivw_result %>% mutate(method = "IVW")
egger_df <- egger_result %>% mutate(method = "MR-Egger")
weighted_median_df <- weighted_median_result %>% mutate(method = "Weighted Median")

lasso_df <- if (!is.null(lasso_result)) {
  data.frame(method = "MR-Lasso", b = lasso_result@Estimate, 
             se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
  NULL
}

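conmix_se <- if (!is.null(conmix_result)) {
  # Approximate SE from the 95% CI width: SE = (CIUpper - CILower) / (2 * 1.96).
  # ConMix CIs need not be symmetric and can be a union of intervals, so the outermost
  # bounds are used; treat this pseudo-SE (and the plotted interval) as indicative only.
  (max(conmix_result@CIUpper) - min(conmix_result@CILower)) / (2 * 1.96)
} else {
  NA
}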

conmix_df <- if (!is.null(conmix_result)) {
  data.frame(method = "MR-ConMix", b = conmix_result@Estimate, 
             se = conmix_se, pval = conmix_result@Pvalue)
} else {
  NULL
}
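The SE conversion for ConMix assumes a single symmetric 95% interval and hard-codes 1.96 (the exact quantile, `qnorm(0.975)`, is about 1.95996). Because ConMix can return a non-symmetric interval, or several disjoint intervals when its likelihood is multimodal, a more defensive option is to return NA in the multi-interval case. `pseudo_se_from_ci()` is a hypothetical sketch of that idea with made-up bounds.

```r
# Pseudo-SE from a confidence interval; NA when the CI is a union of intervals
pseudo_se_from_ci <- function(ci_lower, ci_upper, level = 0.95) {
  if (length(ci_lower) != 1 || length(ci_upper) != 1) return(NA_real_)
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% CI
  (ci_upper - ci_lower) / (2 * z)
}

se <- pseudo_se_from_ci(0.10, 0.10 + 2 * qnorm(0.975) * 0.05)   # recovers 0.05
pseudo_se_from_ci(c(-0.2, 0.1), c(-0.1, 0.3))                    # disjoint CI: NA
```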

mr_results <- bind_rows(
  ivw_df,
  egger_df,
  weighted_median_df,
  lasso_df,
  conmix_df,
  wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
  mutate(method = as.factor(method)) %>%
  filter(!is.na(method) & method != "NA") %>%
  arrange(b) %>%
  mutate(method = factor(method, levels = unique(method)))

# Step 6: Create Beautiful Forest Plot with Dynamic Colors (No Numbers)
base_colors <- c(
  "IVW" = "#1F78B4",
  "MR-Egger" = "#FF7F00",
  "Weighted Median" = "#33A02C",
  "MR-Lasso" = "#FB9A99",
  "MR-ConMix" = "#E41A1C"
)

wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
  wald_colors <- hue_pal()(n_wald)
  names(wald_colors) <- wald_methods
} else {
  wald_colors <- NULL
}

color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]

# Dynamically determine x-axis limits based on confidence intervals
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)

forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  # Plot error bars FIRST to prevent them from being hidden
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), 
                 height = 0.2, na.rm = TRUE, linewidth = 0.8) +
  # Plot points over error bars
  geom_point(size = 3) +
  # Add reference line at zero
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  # Labels and title
  labs(
    title = "Mendelian Randomization Estimates:\n N-acetyltaurine (1MB around PTER TSS)\n on FinnGen BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  # Styling
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  # Adjust x-limits dynamically based on data
  xlim(x_min - 0.02, x_max + 0.02)

# Print plot to check output
print(forest_plot)

# Step 7.0: Save Results and Plot
today <- Sys.Date()
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/finngen_rep_P06/"
excel_file <- paste0(results_dir, "MR_Results_PTER_FinnGen_BMI_", today, ".xlsx")
plot_file <- paste0(results_dir, "MR_Forestplot_PTER_FinnGen_BMI_", today, ".png")

if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
  cat("Created directory:", results_dir, "\n")
}

# Step 7.0.1: Save Same Plot but Ordered by Methods
# Generate the reordered forest plot by method

# Dynamically determine x-axis limits based on confidence intervals
x_min <- min(mr_results$b - 1.96 * mr_results$se, na.rm = TRUE)
x_max <- max(mr_results$b + 1.96 * mr_results$se, na.rm = TRUE)

# Explicitly set the correct order by method name
desired_order <- c(
  "IVW",
  "MR-Egger",
  "MR-Lasso",
  "MR-ConMix",
  "Weighted Median",
  "Wald Ratio: rs117372132",
  "Wald Ratio: rs142238737",
  "Wald Ratio: rs3802555",
  "Wald Ratio: rs4747286",
  "Wald Ratio: rs7075357"
)

# Force the order in the dataset itself
mr_results$method <- factor(mr_results$method, levels = desired_order)

# Sort the dataset based on the new factor levels BEFORE plotting
mr_results <- mr_results[order(mr_results$method), ]

forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  # Plot error bars FIRST to prevent them from being hidden
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), 
                 height = 0.2, na.rm = TRUE, linewidth = 0.8) +
  # Plot points over error bars
  geom_point(size = 3) +
  # Add reference line at zero
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  # Labels and title
  labs(
    title = "Mendelian Randomization Estimates:\n N-acetyltaurine (1MB around PTER TSS)\non FinnGen BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  # Styling
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  # Adjust x-limits dynamically based on data
  xlim(x_min - 0.02, x_max + 0.02)

# Print plot to check output
print(forest_plot_by_method)

# Save the reordered plot
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_PTER_FinnGen_BMI_By_Method_", today, ".png")
ggsave(plot_file_by_method, plot = forest_plot_by_method, dpi = 300, width = 8, height = 5)

# 7.1: Save Everything in One Excel Spreadsheet
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")

add_formatted_sheet <- function(wb, sheet_name, data, title) {
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
  writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
  mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
  setRowHeights(wb, sheet_name, rows = 1, heights = 20)
  setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
  addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}

# TOC with all sections
toc_data <- data.frame(
  Sheet = c("Estimates", "Heterogeneity", "Directional_Pleiotropy", "Info", "SNPs_Retained",
            "IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix", 
            "Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
            "Instruments", "Clumped_Data"),
  Title = c("MR Causal Estimates (mr_wrapper)", "Heterogeneity Results (mr_wrapper)", 
            "Directional Pleiotropy Results (mr_wrapper)", "Summary Info (mr_wrapper)", 
            "SNPs Retained (mr_wrapper)",
            "Inverse Variance Weighted (IVW) Results", "MR-Egger Results", 
            "Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
            "Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results", 
            "Pleiotropy Test Results", "Leave-One-Out Analysis Results",
            "Filtered Instruments Data", "Clumped SNPs Data")
)
toc_data <- toc_data[complete.cases(toc_data), ]  # Defensive only: the TOC is hard-coded, so no rows should be dropped

addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")

# Add all sheets
add_formatted_sheet(wb, "Estimates", estimates, "MR Causal Estimates for N-acetyltaurine on BMI (mr_wrapper)")
add_formatted_sheet(wb, "Heterogeneity", heterogeneity, "Heterogeneity Results (mr_wrapper)")
add_formatted_sheet(wb, "Directional_Pleiotropy", directional_pleiotropy, "Directional Pleiotropy Results (Egger Intercept, mr_wrapper)")
add_formatted_sheet(wb, "Info", info, "Summary Information and Diagnostics (mr_wrapper)")
add_formatted_sheet(wb, "SNPs_Retained", snps_retained, "SNPs Retained After Filtering (mr_wrapper)")

add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")

if (!is.null(lasso_result)) {
  lasso_df <- data.frame(
    Exposure = lasso_result@Exposure,
    Outcome = lasso_result@Outcome,
    Estimate = lasso_result@Estimate,
    StdError = lasso_result@StdError,
    CILower = lasso_result@CILower,
    CIUpper = lasso_result@CIUpper,
    Pvalue = lasso_result@Pvalue,
    SNPs = lasso_result@SNPs,
    Valid = lasso_result@Valid,
    ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
    RegEstimate = lasso_result@RegEstimate,
    RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
    Lambda = lasso_result@Lambda
  )
  add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
  addWorksheet(wb, "MR-Lasso")
  writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}

if (!is.null(conmix_result)) {
  conmix_df <- data.frame(
    Exposure = conmix_result@Exposure,
    Outcome = conmix_result@Outcome,
    Estimate = conmix_result@Estimate,
    Pvalue = conmix_result@Pvalue,
    SNPs = conmix_result@SNPs,
    Psi = conmix_result@Psi,
    CILower = conmix_result@CILower,
    CIUpper = conmix_result@CIUpper,
    CIRange = paste(conmix_result@CIRange, collapse = ", "),
    CIMin = conmix_result@CIMin,
    CIMax = conmix_result@CIMax,
    CIStep = conmix_result@CIStep,
    Valid = paste(conmix_result@Valid, collapse = ", "),
    ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
    Alpha = conmix_result@Alpha
  )
  add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
  addWorksheet(wb, "MR-ConMix")
  writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}

add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments_subset, "Filtered Instruments Data")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat, "Clumped SNPs Data")

saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")

# 7.2: Save Forest Plot
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")
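The forest plots above draw every estimate as b ± 1.96·se. To first order, the IVW row is an inverse-variance-weighted average of the per-SNP Wald ratios. Below is a minimal Python sketch of that arithmetic, assuming plain lists of summary statistics (`bx`, `by`, `sey`); the function names are hypothetical. The multiplicative random-effects SE follows our understanding of how TwoSampleMR's `mr_ivw` rescales its standard error. This is an illustration, not the package's code.

```python
import math


def ivw_estimate(bx, by, sey):
    """First-order IVW: weighted regression of by on bx through the origin
    with weights 1/sey^2. Returns (beta, se_fixed, se_random)."""
    if not (len(bx) == len(by) == len(sey)) or len(bx) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    w = [1.0 / s ** 2 for s in sey]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    sxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    beta = sxy / sxx
    se_fixed = math.sqrt(1.0 / sxx)
    # Residual standard error of the weighted regression (df = n - 1)
    n = len(bx)
    if n > 1:
        rss = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
        sigma = math.sqrt(rss / (n - 1))
    else:
        sigma = 1.0
    # Multiplicative random effects: inflate the SE only when sigma > 1
    se_random = se_fixed * max(1.0, sigma)
    return beta, se_fixed, se_random


def ci95(b, se, z=1.96):
    """Normal-approximation 95% CI, as drawn by the forest plots."""
    return b - z * se, b + z * se
```

The estimate equals sum(w·ratio)/sum(w) with w = bx²/sey². A single SNP with a large |bx| and a small sey can therefore dominate the pooled estimate. That is one reason to plot the per-SNP Wald ratios alongside IVW.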

9 FinnGen Replication COJO (Spoiler: Also NULL)
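In the script that follows, an inner join swaps each instrument's marginal METSIM estimate for COJO's joint estimate (`bJ`, `bJ_se`, `pJ`). GCTA-COJO reports `bJ` relative to its `refA` allele, and the R code drops that column before merging. The swap is therefore only safe when `refA` equals `effect_allele.exposure`. Below is a minimal Python sketch of the same substitution with an explicit allele check. It uses hypothetical record types (`Instrument`, `CojoRow`) rather than the pipeline's data frames. Note that the script applies its F-statistic filter to the marginal estimates, before this step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instrument:  # hypothetical: exposure side of one harmonised SNP
    snp: str
    effect_allele: str
    other_allele: str
    beta: float
    se: float
    pval: float


@dataclass(frozen=True)
class CojoRow:  # hypothetical: one row of a .jma.cojo file
    snp: str
    ref_a: str  # allele that bJ refers to (refA column)
    b_j: float
    b_j_se: float
    p_j: float


def apply_cojo(instruments, cojo_rows):
    """Inner join on SNP, replacing marginal estimates with joint ones.
    The sign is flipped when refA is the exposure's other allele; SNPs whose
    refA matches neither allele are dropped rather than guessed."""
    by_snp = {row.snp: row for row in cojo_rows}
    out = []
    for inst in instruments:
        row = by_snp.get(inst.snp)
        if row is None:
            continue  # not selected by COJO
        ref = row.ref_a.upper()
        if ref == inst.effect_allele.upper():
            sign = 1.0
        elif ref == inst.other_allele.upper():
            sign = -1.0
        else:
            continue  # allele mismatch: cannot align safely
        out.append(replace(inst, beta=sign * row.b_j, se=row.b_j_se, pval=row.p_j))
    return out


def f_stat(beta, se):
    """Approximate per-SNP F-statistic, (beta / se)^2."""
    return (beta / se) ** 2
```

Dropping a SNP on an allele mismatch, rather than guessing, is a conservative choice. Palindromic A/T or C/G SNPs would also need an allele-frequency check, which this sketch does not attempt.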

rm(list = ls())

# Load Required Libraries
library(TwoSampleMR)          # Core MR analyses
library(MendelianRandomization) # MR-Lasso and ConMix
library(ieugwasr)             # LD clumping
library(MRInstruments)        # Curated instrument datasets
library(dplyr)                # Data manipulation
library(tidyr)                # Data reshaping
library(data.table)           # Fast data handling
library(readxl)               # Read Excel
library(openxlsx)             # Write formatted Excel
library(ggplot2)              # Plotting
library(ggrepel)              # Text labeling in plots
library(corrplot)             # Correlation matrices
library(RhpcBLASctl)          # Multithreading control
library(biomaRt)              # Gene annotation
library(scales)               # Color generation
library(pheatmap)             # Purrty heatmap

# Set Working Directory & Setup Folder
setwd("/Users/charleenadams/temp_BI/mr_nat_pter_bmi")
if (!dir.exists("cojo_finngen")) dir.create("cojo_finngen", recursive = TRUE)

# ---------------------------------------------
# MR with COJO Analysis
# ---------------------------------------------

# Start with the same COJO instruments from the Jurgens MR analysis

# Read COJO results
cojo_results <- fread("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_output.jma.cojo") %>%
  dplyr::select(SNP, CHR = Chr, BP = bp, bJ, bJ_se, pJ)

# Step 1: Filter Instruments
dat_filtered <- read.csv("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/filtered_PTER_1Mb_finngen_BMI_2025-02-19.csv")
instruments <- dat_filtered %>%
  filter(mr_keep == TRUE, pval.exposure < 5e-6) %>%  # Relaxed threshold
  mutate(
    rsid = SNP,
    pval = pval.exposure,
    id = "N-acetyltaurine (1MB around PTER TSS)",
    F_stat = (beta.exposure / se.exposure)^2  # Approximate per-SNP F-statistic
  ) %>%
  filter(F_stat > 10)  # Exclude weak instruments

cat("Number of instruments after F-stat filtering:", nrow(instruments), "\n")

# Step 1.5: Steiger Filtering
instruments <- steiger_filtering(instruments)
instruments <- instruments %>% filter(steiger_dir == TRUE)  # Keep only SNPs with correct direction
cat("Number of instruments after Steiger filtering:", nrow(instruments), "\n")
cat("Preview of Steiger-filtered instruments:\n")
print(head(instruments))

# Step 2: Merge COJO results back with instruments to get full MR data.
# Assumes COJO's refA (the allele bJ refers to) matches effect_allele.exposure.
instruments_cojo <- instruments %>%
  inner_join(cojo_results, by = "SNP") %>%
  mutate(beta.exposure = bJ,       # Update beta with conditional estimate
         se.exposure = bJ_se,      # Update SE with conditional estimate
         pval.exposure = pJ)       # Update p-value with conditional estimate

cat("Number of conditionally independent instruments after COJO:", nrow(instruments_cojo), "\n")
cat("Preview of COJO-selected instruments:\n")
print(head(instruments_cojo))

# Step 3: Subset instruments to keep only TwoSampleMR, F-stat, and Steiger fields
instruments_subset <- instruments_cojo %>%
  dplyr::select(
    SNP, 
    effect_allele.exposure, other_allele.exposure,
    effect_allele.outcome, other_allele.outcome,
    beta.exposure, se.exposure, pval.exposure,
    beta.outcome, se.outcome, pval.outcome,
    eaf.exposure, eaf.outcome,
    id.exposure, exposure,
    id.outcome, outcome,
    samplesize.exposure, samplesize.outcome,
    mr_keep, action,
    F_stat,
    steiger_dir, steiger_pval
  )

# Step 4: Perform MR with COJO-Selected Instruments
mr_result <- mr_wrapper(instruments_cojo)
estimates <- mr_result$`N-acetyltaurine.FinnGen BMI`$estimates
heterogeneity <- mr_result$`N-acetyltaurine.FinnGen BMI`$heterogeneity
directional_pleiotropy <- mr_result$`N-acetyltaurine.FinnGen BMI`$directional_pleiotropy
info <- mr_result$`N-acetyltaurine.FinnGen BMI`$info
snps_retained <- mr_result$`N-acetyltaurine.FinnGen BMI`$snps_retained

# Print the structure to confirm the data is there
print(estimates)
print(heterogeneity)
print(directional_pleiotropy)
print(info)
print(snps_retained)

# Individual MR Analyses with COJO Instruments
ivw_result <- mr(instruments_cojo, method_list = "mr_ivw")
egger_result <- mr(instruments_cojo, method_list = "mr_egger_regression")
weighted_median_result <- mr(instruments_cojo, method_list = "mr_weighted_median")

# Build the MendelianRandomization input object (named mr_obj so it does not mask mr_input())
mr_obj <- mr_input(
  bx = instruments_cojo$beta.exposure,
  bxse = instruments_cojo$se.exposure,
  by = instruments_cojo$beta.outcome,
  byse = instruments_cojo$se.outcome,
  exposure = "N-acetyltaurine",
  outcome = "BMI",
  snps = instruments_cojo$SNP
)
lasso_result <- tryCatch({
  mr_lasso(mr_obj)
}, error = function(e) {
  cat("Error in MR-Lasso:", conditionMessage(e), "\n")
  NULL
})
conmix_result <- tryCatch({
  mr_conmix(mr_obj)
}, error = function(e) {
  cat("Error in MR-ConMix:", conditionMessage(e), "\n")
  NULL
})

heterogeneity_result <- mr_heterogeneity(instruments_cojo)
pleiotropy_result <- mr_pleiotropy_test(instruments_cojo)
loo_result <- mr_leaveoneout(instruments_cojo)

wald_ratios <- instruments_cojo %>%
  mutate(
    wald_beta = beta.outcome / beta.exposure,
    wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
    pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
    method = paste("Wald Ratio:", SNP)
  ) %>%
  dplyr::select(SNP, wald_beta, wald_se, pval, method)

cat("\n=== Wald Ratio Tests for COJO Instruments ===\n")
print(wald_ratios)

# Step 5: Prepare Data for Forest Plot
ivw_df <- ivw_result %>% mutate(method = "IVW")
egger_df <- egger_result %>% mutate(method = "MR-Egger")
weighted_median_df <- weighted_median_result %>% mutate(method = "Weighted Median")
lasso_df <- if (!is.null(lasso_result)) {
  data.frame(method = "MR-Lasso", b = lasso_result@Estimate, 
             se = lasso_result@StdError, pval = lasso_result@Pvalue)
} else {
  NULL
}
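# Approximate SE from the ConMix CI; the interval can be asymmetric or split into
# several pieces, so take the outermost bounds
conmix_se <- if (!is.null(conmix_result)) {
  (max(conmix_result@CIUpper) - min(conmix_result@CILower)) / (2 * 1.96)
} else {
  NA
}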
conmix_df <- if (!is.null(conmix_result)) {
  data.frame(method = "MR-ConMix", b = conmix_result@Estimate, 
             se = conmix_se, pval = conmix_result@Pvalue)
} else {
  NULL
}

mr_results <- bind_rows(
  ivw_df,
  egger_df,
  weighted_median_df,
  lasso_df,
  conmix_df,
  wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
  mutate(method = as.factor(method)) %>%
  filter(!is.na(method) & method != "NA") %>%
  arrange(b) %>%
  mutate(method = factor(method, levels = unique(method)))

# Step 6: Create Forest Plot
base_colors <- c(
  "IVW" = "#1F78B4",
  "MR-Egger" = "#FF7F00",
  "Weighted Median" = "#33A02C",
  "MR-Lasso" = "#FB9A99",
  "MR-ConMix" = "#E41A1C"
)
wald_methods <- unique(wald_ratios$method)
n_wald <- length(wald_methods)
if (n_wald > 0) {
  wald_colors <- hue_pal()(n_wald)
  names(wald_colors) <- wald_methods
} else {
  wald_colors <- NULL
}
color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]

forest_plot <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  labs(
    title = "Mendelian Randomization Estimates (COJO):\n N-acetyltaurine (1MB around PTER TSS)\non FinnGen BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  coord_cartesian(xlim = c(-0.15, 0.05))  # Zoom without dropping error bars that extend past the limits

# Step 7: Save Results and Plot
today <- Sys.Date()
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_finngen/"
excel_file <- paste0(results_dir, "MR_Results_PTER_FinnGen_BMI_COJO_", today, ".xlsx")
plot_file <- paste0(results_dir, "MR_Forestplot_PTER_FinnGen_BMI_COJO_", today, ".png")

if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
  cat("Created directory:", results_dir, "\n")
}

# 7.0.01: Ordered

# Explicitly set the correct order by method name
desired_order <- c(
  "IVW",
  "MR-Egger",
  "MR-Lasso",
  "MR-ConMix",
  "Weighted Median",
  "Wald Ratio: rs117110974",
  "Wald Ratio: rs117372132",
  "Wald Ratio: rs142238737",
  "Wald Ratio: rs45485296",
  "Wald Ratio: rs7084722",
  "Wald Ratio: rs1023275",
  "Wald Ratio: rs61844133"
)

# Force the order
mr_results$method <- factor(mr_results$method, levels = desired_order, ordered = TRUE)
color_list <- c(base_colors, wald_colors)
available_methods <- unique(mr_results$method)
color_list <- color_list[names(color_list) %in% available_methods]

forest_plot_by_method <- ggplot(mr_results, aes(x = b, y = method, color = method)) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = b - 1.96 * se, xmax = b + 1.96 * se), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  labs(
    title = "Mendelian Randomization Estimates (COJO):\n N-acetyltaurine (1MB around PTER TSS)\non FinnGen BMI",
    x = "Causal Effect (Beta)",
    y = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  ) +
  scale_color_manual(values = color_list) +
  coord_cartesian(xlim = c(-0.15, 0.05))  # Zoom without dropping error bars that extend past the limits

# Save the reordered plot
results_dir <- "/Users/charleenadams/temp_BI/mr_nat_pter_bmi/cojo_finngen/"
plot_file_by_method <- paste0(results_dir, "MR_Forestplot_PTER_COJO_finngen_BMI_By_Method_", today, ".png")
ggsave(plot_file_by_method, plot = forest_plot_by_method, dpi = 300, width = 8, height = 5)

# 7.1: Save Everything in One Excel Spreadsheet
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")

add_formatted_sheet <- function(wb, sheet_name, data, title) {
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
  writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
  mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
  setRowHeights(wb, sheet_name, rows = 1, heights = 20)
  setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
  addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}

toc_data <- data.frame(
  Sheet = c("Estimates", "Heterogeneity", "Directional_Pleiotropy", "Info", "SNPs_Retained",
            "IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix", 
            "Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
            "Instruments", "COJO_Results"),  # No Clumped_Data sheet is written for the COJO workbook
  Title = c("MR Causal Estimates (mr_wrapper)", "Heterogeneity Results (mr_wrapper)", 
            "Directional Pleiotropy Results (mr_wrapper)", "Summary Info (mr_wrapper)", 
            "SNPs Retained (mr_wrapper)",
            "Inverse Variance Weighted (IVW) Results", "MR-Egger Results", 
            "Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
            "Wald Ratio Tests for Each Instrument", "Heterogeneity Test Results", 
            "Pleiotropy Test Results", "Leave-One-Out Analysis Results",
            "Filtered Instruments Data", "COJO Conditionally Independent SNPs")
)
toc_data <- toc_data[complete.cases(toc_data), ]

addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")

add_formatted_sheet(wb, "Estimates", estimates, "MR Causal Estimates for N-acetyltaurine on BMI (mr_wrapper)")
add_formatted_sheet(wb, "Heterogeneity", heterogeneity, "Heterogeneity Results (mr_wrapper)")
add_formatted_sheet(wb, "Directional_Pleiotropy", directional_pleiotropy, "Directional Pleiotropy Results (Egger Intercept, mr_wrapper)")
add_formatted_sheet(wb, "Info", info, "Summary Information and Diagnostics (mr_wrapper)")
add_formatted_sheet(wb, "SNPs_Retained", snps_retained, "SNPs Retained After Filtering (mr_wrapper)")
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted (IVW) Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
  lasso_df <- data.frame(
    Exposure = lasso_result@Exposure,
    Outcome = lasso_result@Outcome,
    Estimate = lasso_result@Estimate,
    StdError = lasso_result@StdError,
    CILower = lasso_result@CILower,
    CIUpper = lasso_result@CIUpper,
    Pvalue = lasso_result@Pvalue,
    SNPs = lasso_result@SNPs,
    Valid = lasso_result@Valid,
    ValidSNPs = if (length(lasso_result@ValidSNPs) > 0) paste(lasso_result@ValidSNPs, collapse = ", ") else "None",
    RegEstimate = lasso_result@RegEstimate,
    RegIntercept = paste(lasso_result@RegIntercept, collapse = ", "),
    Lambda = lasso_result@Lambda
  )
  add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
} else {
  addWorksheet(wb, "MR-Lasso")
  writeData(wb, "MR-Lasso", "MR-Lasso Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-Lasso", title_style, rows = 1, cols = 1)
}
if (!is.null(conmix_result)) {
  conmix_df <- data.frame(
    Exposure = conmix_result@Exposure,
    Outcome = conmix_result@Outcome,
    Estimate = conmix_result@Estimate,
    Pvalue = conmix_result@Pvalue,
    SNPs = conmix_result@SNPs,
    Psi = conmix_result@Psi,
    CILower = conmix_result@CILower,
    CIUpper = conmix_result@CIUpper,
    CIRange = paste(conmix_result@CIRange, collapse = ", "),
    CIMin = conmix_result@CIMin,
    CIMax = conmix_result@CIMax,
    CIStep = conmix_result@CIStep,
    Valid = paste(conmix_result@Valid, collapse = ", "),
    ValidSNPs = paste(conmix_result@ValidSNPs, collapse = ", "),
    Alpha = conmix_result@Alpha
  )
  add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
} else {
  addWorksheet(wb, "MR-ConMix")
  writeData(wb, "MR-ConMix", "MR-ConMix Analysis Failed", startRow = 1, startCol = 1)
  addStyle(wb, "MR-ConMix", title_style, rows = 1, cols = 1)
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests for Each Instrument")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis Results")
add_formatted_sheet(wb, "Instruments", instruments_subset, "Filtered Instruments Data")
#add_formatted_sheet(wb, "Clumped_Data", clumped_dat_subset, "Clumped SNPs Data")
add_formatted_sheet(wb, "COJO_Results", cojo_results, "COJO Conditionally Independent SNPs")

saveWorkbook(wb, excel_file, overwrite = TRUE)
cat("Excel file saved to:", excel_file, "\n")

# 7.2: Save Forest Plot
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat("Forest plot saved to:", plot_file, "\n")
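The COJO script computes each per-SNP Wald ratio with the second-order delta-method SE. It also back-calculates a pseudo-SE for MR-ConMix from its confidence interval, so that ConMix can share the forest plot. Below is a minimal Python sketch of that arithmetic; the function names are hypothetical. The `erfc` identity reproduces R's `2 * pnorm(abs(z), lower.tail = FALSE)`. ConMix's profile-likelihood interval can be asymmetric or split into pieces, so the CI-derived SE is only an approximation.

```python
import math


def wald_ratio(bx, bxse, by, byse):
    """Wald ratio by/bx with the second-order delta-method SE:
    sqrt(byse^2/bx^2 + by^2 * bxse^2 / bx^4), ignoring any bx/by covariance."""
    if bx == 0:
        raise ValueError("bx must be non-zero")
    ratio = by / bx
    se = math.sqrt(byse ** 2 / bx ** 2 + (by ** 2 * bxse ** 2) / bx ** 4)
    return ratio, se


def two_sided_p(est, se):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))."""
    z = abs(est / se)
    return math.erfc(z / math.sqrt(2.0))


def se_from_ci(lower, upper, z=1.96):
    """Approximate SE from a 95% CI, treating it as symmetric."""
    return (upper - lower) / (2.0 * z)


def se_from_intervals(lowers, uppers, z=1.96):
    """Collapse a possibly non-contiguous CI to its outermost bounds first."""
    return se_from_ci(min(lowers), max(uppers), z)
```

The bxse term matters most when an instrument is weak: as |bx| shrinks, the by²·bxse²/bx⁴ term grows fastest. This is one reason the F-statistic filter precedes the Wald ratio step.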

10 Full and Downloadable Results

11 Code for Results App

Execute with: shiny::runApp("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/mr_nat_pter_bmi_app")

tree /Users/charleenadams/temp_BI/mr_nat_pter_bmi/mr_nat_pter_bmi_app/
/Users/charleenadams/temp_BI/mr_nat_pter_bmi/mr_nat_pter_bmi_app/
├── app.R
├── cojo_output.cma.cojo
├── cojo_output.jma.cojo
├── cojo_output.ldr.cojo
├── cojo_output.log
├── custom_ld_heatmap.png
├── directory_structure.txt
├── rsconnect
│   └── shinyapps.io
│       └── yodamendel
│           └── mr_nat_pter_bmi_app.dcf
└── www
    ├── BIDMC_HMS_Stacked-LockUp.png
    ├── cojo
    │   ├── MR_Forestplot_PTER_COJO_Jurgens_BMI_By_Method_2025-03-01.png
    │   ├── MR_Forestplot_PTER_Jurgens_BMI_COJO_2025-02-20.png
    │   └── MR_Results_PTER_Jurgens_BMI_COJO_2025-02-20.xlsx
    ├── cojo_finngen
    │   ├── MR_Forestplot_PTER_COJO_finngen_BMI_By_Method_2025-03-01.png
    │   ├── MR_Forestplot_PTER_FinnGen_BMI_COJO_2025-03-01.png
    │   └── MR_Results_PTER_FinnGen_BMI_COJO_2025-03-01.xlsx
    ├── finngen_rep_P06
    │   ├── MR_Forestplot_PTER_FinnGen_BMI_2025-03-01.png
    │   ├── MR_Forestplot_PTER_FinnGen_BMI_By_Method_2025-03-01.png
    │   └── MR_Results_PTER_FinnGen_BMI_2025-03-01.xlsx
    ├── hy-w587486.gif
    ├── mr_nat_pter_bmi.Rmd
    ├── nat2.png
    └── results_expanded_p5E6
        ├── MR_Forestplot_PTER_Jurgens_BMI_2025-02-20.png
        ├── MR_Forestplot_PTER_Jurgens_BMI_By_Method_2025-03-01.png
        └── MR_Results_PTER_Jurgens_BMI_2025-02-20.xlsx

*Excludes `.renvignore` (invisible file to prevent rendering the .Rmd)

# Run with:
# shiny::runApp("/Users/charleenadams/temp_BI/mr_nat_pter_bmi/mr_nat_pter_bmi_app")
# rsconnect::deployApp('/Users/charleenadams/temp_BI/mr_nat_pter_bmi/mr_nat_pter_bmi_app')

library(shiny)

# Define file paths relative to www/
file1 <- "www/results_expanded_p5E6/MR_Results_PTER_Jurgens_BMI_2025-02-20.xlsx"
file2 <- "www/cojo/MR_Results_PTER_Jurgens_BMI_COJO_2025-02-20.xlsx"
file3 <- "www/finngen_rep_P06/MR_Results_PTER_FinnGen_BMI_2025-03-01.xlsx"
file4 <- "www/cojo_finngen/MR_Results_PTER_FinnGen_BMI_COJO_2025-03-01.xlsx"
plot1 <- "www/results_expanded_p5E6/MR_Forestplot_PTER_Jurgens_BMI_By_Method_2025-03-01.png"
plot2 <- "www/cojo/MR_Forestplot_PTER_COJO_Jurgens_BMI_By_Method_2025-03-01.png"
plot3 <- "www/finngen_rep_P06/MR_Forestplot_PTER_FinnGen_BMI_By_Method_2025-03-01.png"
plot4 <- "www/cojo_finngen/MR_Forestplot_PTER_COJO_finngen_BMI_By_Method_2025-03-01.png"
rmd_file <- "www/mr_nat_pter_bmi.Rmd"  # Rmd file for download

# UI
ui <- fluidPage(
  tags$head(
    tags$style(HTML("
      #logo_container {
        text-align: center;
        margin-top: 20px;
        margin-bottom: 20px;
        opacity: 0;
        transition: opacity 0.5s ease-in-out;
      }
      #logo_container.visible {
        opacity: 1;
      }
    ")),
    tags$script(HTML("
      $(document).on('shiny:connected', function() {
        var natImage = $('#nat_image');
        var logoContainer = $('#logo_container');
        
        $(window).on('scroll', function() {
          var scrollPosition = $(window).scrollTop() + $(window).height();
          var natBottom = natImage.offset().top + natImage.outerHeight();
          
          if (scrollPosition > natBottom) {
            logoContainer.addClass('visible');
          } else {
            logoContainer.removeClass('visible');
          }
        });
      });
    "))
  ),
  
  div(style = "text-align: center; margin-bottom: 20px;",
      h1("Mendelian Randomization Results", style = "color: #A94800;"),
      h4(HTML("Analysis of <i>N</i>-acetyltaurine (1MB around <i>PTER</i> TSS) on BMI"), style = "color: #A94800;"),
      tags$a(href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
             target = "_blank",
             "Our Lab",
             style = "font-size: 18px; color: #A94800; text-decoration: none; font-weight: bold;")
  ),
  
  div(style = "background-color: #FFF3E0; padding: 20px; border-radius: 10px; margin-bottom: 20px; border: 1px solid #D2691E;",
      h3(HTML("The Story of <i>N</i>-acetyltaurine"), style = "color: #D2691E; text-align: center; margin-bottom: 20px;"),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Taurine—your body makes it, but not always enough, which is why it’s in energy drinks and why your cat would straight-up die without it. It’s one of the most abundant amino acids in humans, posted up in your brain, muscles, and liver like it owns the place. But taurine doesn’t just sit there—it has an entire metabolic network, and among its lesser-known metabolites is <i>N</i>-acetyltaurine (NAT), quietly minding its business.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "Until now."),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Turns out, NAT isn’t just some metabolic footnote. Its levels fluctuate with exercise, diet, and (because of course) alcohol consumption. And the body has a dedicated enzyme, PTER (phosphotriesterase-related), whose entire job is to break it down.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("In mice (<a href='https://www.nature.com/articles/s41586-024-07801-6' target='_blank' style='color: #A94800; text-decoration: underline;'>Nature, 2024</a>), PTER hangs out in the kidney, liver, and brainstem, converting NAT back into taurine and acetate. But when researchers knocked out <i>Pter</i>, things got weird. These mice had sky-high NAT levels, ate less, dodged obesity, and handled glucose like metabolic overachievers. Then researchers gave extra NAT to obese mice, and (same deal) they ate less and lost weight.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("So what’s the takeaway? NAT might be a major player in appetite and energy balance, with PTER acting as the bouncer. Meaning, science may one day figure out how to tell your body, “You’re full. Step away from the fridge.”")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "But that’s mice."),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "Now, enter humans and Mendelian Randomization—the genetics truth serum."),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Say some wellness influencer swears turmeric lattes melted their belly fat. “Studies show!” they chirp. You roll your eyes. Correlation is not causation, but MR is here to fix that. It’s basically a genetic lie detector, using DNA variants (randomly assigned at conception) to figure out whether something actually affects weight, or if it’s just some rich person’s organic grocery bill talking.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Genes are set in stone before life muddies the waters with gym habits and late-night stress eating. So MR locks onto these genetic markers (SNPs near a metabolite’s hotspot) and runs the numbers. It’s essentially a natural randomized trial: far fewer sketchy confounding factors, as long as the variants affect BMI only through the metabolite and not through some pleiotropic side door.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "Now, let’s put it to work on NAT and BMI."),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("This MR showdown focuses on NAT using genetic instruments from a 1 Mb region around the <i>PTER</i> gene’s transcription start site (TSS) to see if NAT actually influences BMI. The dataset? A Finnish cohort called Metabolic Syndrome in Men (METSIM)—6,099 men who probably sweat out their problems in saunas—paired with BMI data from 460,000 Brits in the UK Biobank, a sample size bigger than a royal wedding guest list. Since the BMI data was originally in GRCh37, we gave it a power liftOver to GRCh38—a genome upgrade, if you will.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "The MR pipeline was tight:"),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Data was cleaned up in R and Python, harmonized through TwoSampleMR to ditch flaky SNPs, then processed using IVW, MR-Egger, Weighted Median, MR-Lasso, and MR-ConMix. A COJO step ensured only conditionally independent SNPs made the cut. Graphs? Forest plots in 600 DPI ggplot2 excellence, because science should look good.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        "And the results? NAT’s genetic signals suggest it’s taking swings at BMI—and winning."),
      tags$ul(
        tags$li("IVW lands at -0.016 (p = 0.012), meaning genetically predicted higher NAT levels are linked to lower BMI.", 
                style = "font-size: 16px; line-height: 1.6; color: #333333;"),
        tags$li("Weighted Median flexes at -0.021 (p = 0.001).", 
                style = "font-size: 16px; line-height: 1.6; color: #333333;"),
        tags$li("MR-ConMix comes in hot at -0.033 (p = 0.016).", 
                style = "font-size: 16px; line-height: 1.6; color: #333333;"),
        tags$li("MR-Lasso backs it up with -0.016 (p = 0.012).", 
                style = "font-size: 16px; line-height: 1.6; color: #333333;"),
        tags$li("MR-Egger, as usual, is a buzzkill at -0.002 (p = 0.892), but the bigger picture holds: NAT appears to be nudging BMI downward.", 
                style = "font-size: 16px; line-height: 1.6; color: #333333;")
      ),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("One caveat: heterogeneity is making some noise (Q = 13.3, p = 0.039), meaning the SNPs might be a little rowdy. The MR-Egger intercept (p = 0.320) shows no evidence that directional pleiotropy is hijacking the analysis, and the robustness checks hold. Still, replication matters, and the FinnGen replication (with or without COJO) didn't get any traction.")),
      p(style = "text-align: justify; font-size: 16px; line-height: 1.6; color: #333333;",
        HTML("Bottom line? This MR study suggests NAT—and specifically the SNPs near <i>PTER</i> controlling it—might be playing a role in keeping BMI in check. Next time someone credits their turmeric latte with melting fat, you might just smile and think about NAT."))
  ),
  
  sidebarLayout(
    sidebarPanel(
      width = 3,
      style = "background-color: #FFE8D6; padding: 15px; border-radius: 5px;",
      h3("Instructions", style = "color: #A94800;"),
      p("Download the full results (Excel), forest plots (PNG), and pipeline script (.Rmd).", br(), br(),
        "Files will have today's date appended.", style = "color: #A94800;"),
      hr(),
      h3("Download Results", style = "color: #A94800;"),
      downloadButton("download_file1", "Jurgens MR Results"),
      br(), br(),
      downloadButton("download_file2", "Jurgens COJO MR Results"),
      br(), br(),
      downloadButton("download_file3", "FinnGen MR Results"),
      br(), br(),
      downloadButton("download_file4", "FinnGen COJO MR Results"),
      br(), br(),
      downloadButton("download_plot1", "Jurgens MR Forest"),
      br(), br(),
      downloadButton("download_plot2", "Jurgens COJO MR Forest"),
      br(), br(),
      downloadButton("download_plot3", "FinnGen MR Forest"),
      br(), br(),
      downloadButton("download_plot4", "FinnGen COJO MR Forest"),
      br(), br(),
      downloadButton("download_rmd", "Pipeline Script"),
      hr(),
      
      div(style = "border: 1px solid #D2691E; padding: 10px; background-color: #D2691E; border-radius: 5px;",
          strong("Objective", style = "color: #FFFFFF;"),
          p(HTML("This app breaks down MR analyses of <i>N</i>-acetyltaurine (1 Mb around the <i>PTER</i> TSS) and BMI (<a href='https://pmc.ncbi.nlm.nih.gov/articles/PMC11078202/#S15' target='_blank' style='color: #87CEEB; text-decoration: underline;'>Jurgens et al., 2023</a>). The goal? Make it clear and fun enough that Mom (or at least a non-MR scientist) doesn’t glaze over."), 
            style = "color: #FFFFFF;")
      )
    ),
    
    mainPanel(
      width = 9,
      h3("Key Findings from MR Analyses", style = "color: #A94800;"),
      
      h4("What We Found", style = "margin-top: 20px; color: #A94800;"),
      p(HTML("Our Mendelian Randomization (MR) analyses investigated the causal relationship between <i>N</i>-acetyltaurine levels (instrumented by variants within 1 Mb of the <i>PTER</i> transcription start site, TSS) and BMI, using UK Biobank summary statistics from Jurgens et al. (2022) (n = 460,000). We employed two instrument-selection strategies and then attempted replication:"), 
        style = "color: #333333;"),
      tags$ul(
        tags$li(strong("Suite of MR Approaches:"), 
                "Used a relaxed p-value threshold (p < 5E-6) and a 500 kb clumping window to select instruments, then applied multiple MR methods (IVW, MR-Egger, Weighted Median, MR-Lasso, MR-ConMix) to address horizontal pleiotropy and assess robustness.", 
                style = "color: #333333;"),
        tags$li(strong("MR with COJO Instruments:"), 
                "Used conditional and joint analysis (COJO) with the same relaxed threshold (p < 5E-6) to identify conditionally independent instruments, then applied the same suite of MR methods.", 
                style = "color: #333333;"),
        tags$li(strong("Attempted Replication:"), 
                "Reran the same instruments and methods with FinnGen BMI summary statistics (n = 500,348) as the outcome.", 
                style = "color: #333333;")
      ),
      
      p("Key observations from the analyses:", style = "color: #333333;"),
      tags$ul(
        tags$li(HTML("Both Jurgens (UK Biobank) analyses suggest a <strong>negative causal effect</strong> of <i>N</i>-acetyltaurine on BMI: some (but not all) estimates indicate lower BMI with higher genetically predicted <i>N</i>-acetyltaurine levels."), 
                style = "color: #333333;"),
        tags$li("Our replication analysis using FinnGen, however, did not confirm the original findings. Although the same SNPs were used, the lack of significance may reflect differences in how these variants influence BMI in the FinnGen population.", 
                style = "color: #333333;")
      ),
      
      div(style = "text-align: center; margin-top: 20px;",
          div(style = "margin-bottom: 2px;",
              p(HTML("Tap <i>N</i>-acetyltaurine to see methods and pipeline"), 
                style = "color: #A94800; font-size: 24px; font-weight: bold; text-decoration: none;")
          ),
          tags$a(href = "https://rpubs.com/YodaMendel/1274627", target = "_blank",
                 img(id = "nat_image", src = "nat2.png", height = "400px", width = "400px"))
      ),
      
      div(id = "logo_container",
          tags$a(href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
                 target = "_blank",
                 img(src = "BIDMC_HMS_Stacked-LockUp.png", height = "100px", width = "auto"))
      )
    )
  )
)

# Server
server <- function(input, output) {
  
  output$download_file1 <- downloadHandler(
    filename = function() { paste0("MR_Results_PTER_Jurgens_BMI_Suite_", Sys.Date(), ".xlsx") },
    content = function(file) { file.copy(file1, file) }
  )
  
  output$download_file2 <- downloadHandler(
    filename = function() { paste0("MR_Results_PTER_Jurgens_BMI_COJO_", Sys.Date(), ".xlsx") },
    content = function(file) { file.copy(file2, file) }
  )
  
  output$download_file3 <- downloadHandler(
    filename = function() { paste0("MR_Results_PTER_FinnGen_BMI_Suite_", Sys.Date(), ".xlsx") },
    content = function(file) { file.copy(file3, file) }
  )
  
  output$download_file4 <- downloadHandler(
    filename = function() { paste0("MR_Results_PTER_FinnGen_BMI_COJO_", Sys.Date(), ".xlsx") },
    content = function(file) { file.copy(file4, file) }
  )
  
  output$download_plot1 <- downloadHandler(
    filename = function() { paste0("MR_Forestplot_PTER_Jurgens_BMI_Suite_", Sys.Date(), ".png") },
    content = function(file) { file.copy(plot1, file) }
  )
  
  output$download_plot2 <- downloadHandler(
    filename = function() { paste0("MR_Forestplot_PTER_Jurgens_BMI_COJO_", Sys.Date(), ".png") },
    content = function(file) { file.copy(plot2, file) }
  )
  
  output$download_plot3 <- downloadHandler(
    filename = function() { paste0("MR_Forestplot_PTER_FinnGen_BMI_Suite_", Sys.Date(), ".png") },
    content = function(file) { file.copy(plot3, file) }
  )
  
  output$download_plot4 <- downloadHandler(
    filename = function() { paste0("MR_Forestplot_PTER_FinnGen_BMI_COJO_", Sys.Date(), ".png") },
    content = function(file) { file.copy(plot4, file) }
  )
  
  output$download_rmd <- downloadHandler(
    filename = function() { paste0("mr_nat_pter_bmi_", Sys.Date(), ".Rmd") },
    content = function(file) { file.copy(rmd_file, file) }
  )
}
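
# Illustrative sketch only: a hypothetical helper that is NOT called by the app
# and is NOT the pipeline's own code. It shows how a fixed-effect inverse-variance
# weighted (IVW) estimate and the heterogeneity statistic Q, both reported in the
# results text, can be computed from harmonised per-SNP summary statistics.
# Assumptions: alleles are already aligned, and first-order weights are used.
# TwoSampleMR's IVW gives the same point estimate but, by default, a
# multiplicative random-effects standard error rather than this fixed-effect one.
ivw_fixed_sketch <- function(beta_exp, beta_out, se_out) {
  stopifnot(length(beta_exp) == length(beta_out),
            length(beta_out) == length(se_out),
            length(beta_exp) >= 2)
  ratio <- beta_out / beta_exp          # per-SNP Wald ratio estimates
  se_ratio <- se_out / abs(beta_exp)    # first-order SE of each ratio
  w <- 1 / se_ratio^2                   # inverse-variance weights
  est <- sum(w * ratio) / sum(w)        # IVW estimate: weighted mean of the ratios
  se <- sqrt(1 / sum(w))                # fixed-effect standard error
  q <- sum(w * (ratio - est)^2)         # heterogeneity statistic Q
  q_df <- length(ratio) - 1
  list(
    estimate = est,
    se = se,
    p = 2 * pnorm(-abs(est / se)),
    Q = q,
    Q_df = q_df,
    Q_p = pchisq(q, df = q_df, lower.tail = FALSE)
  )
}
# Example (made-up inputs): ivw_fixed_sketch(c(0.1, 0.2), c(-0.002, -0.004), c(0.001, 0.001))
# returns an estimate of -0.02 with Q near 0, since both SNP ratios agree.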

# Run app
shinyApp(ui = ui, server = server)