We conducted a two-sample Mendelian randomization (MR) analysis to evaluate the causal effect of selected UKB-PPP proteins on systolic blood pressure (SBP), using cis instruments drawn from a 1 Mb region surrounding each protein's transcription start site (TSS).
I had previously obtained the pQTL files. For how to programmatically download files from UKB-PPP and add rsids, refer to:
- /Users/charleenadams/temp_BI/chemokine_rgs_olink/processed_ukbppp_chemokine_list/ukbpp_chemokines_40.Rmd
- https://rpubs.com/YodaMendel/1268255
- https://rpubs.com/YodaMendel/1243451
Download the SBP outcome GWAS (ieu-b-38) from IEU OpenGWAS:
cd /Volumes/LaCie/tahir/mr_ukbppp_bp
wget https://gwas.mrcieu.ac.uk/files/ieu-b-38/ieu-b-38.vcf.gz
From Evangelou et al. (2018), which describes the ieu-b-38 GWAS:
“Here, we report genome-wide discovery analyses of BP traits - systolic (SBP), diastolic (DBP) and pulse pressure (PP) - in people of European ancestry drawn from UK Biobank (UKB) and the International Consortium of Blood Pressure-Genome Wide Association Studies (ICBP). We adopted a combination of a one- and two-stage study design to test common and low-frequency single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) ≥ 1% associated with BP traits. In all, we studied over 1 million people of European descent, including replication data from the US Million Veterans Program (MVP, N=220,520) and the Estonian Genome Centre, University of Tartu (EGCUT, N=28,742) Biobank.”
For more details, refer to the full paper.
Step 1: Bash script to convert the VCF to a TSV and format the headers for TwoSampleMR
Dependencies:
• gunzip (to decompress the VCF)
• awk (to extract fields and compute p-values from LP)
Run with:
cd /Volumes/LaCie/tahir/mr_ukbppp_bp/scripts
chmod +x format_ieu-b-38.vcf.sh
nohup ./format_ieu-b-38.vcf.sh > format_ieu-b-38.vcf.log 2>&1 &
ps -p 96562 -o stat,etime  # monitor the background job (substitute the PID nohup reports)
# Step 1: Decompress ieu-b-38.vcf.gz without deleting the original
cd /Volumes/LaCie/tahir/mr_ukbppp_bp
gunzip -k ieu-b-38.vcf.gz
# Step 2: Convert the decompressed .vcf file to a .tsv format and calculate p-values from logp
outfile="ieu-b-38_formatted.tsv"
# Write header
echo -e "CHR\tBP\tSNP\tREF\tALT\tEAF\tBETA\tSE\tLP\tPVAL" > "$outfile"
# Extract fields and calculate pval
awk 'BEGIN{OFS="\t"}
!/^#/ {
split($10, a, ":");
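# assumes the GWAS-VCF FORMAT order ES:SE:LP:AF for the sample column (confirm against the VCF header)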
beta = a[1];
se = a[2];
lp = a[3];
eaf = a[4];
pval = 10^(-lp);
print $1, $2, $3, $4, $5, eaf, beta, se, lp, pval
}' ieu-b-38.vcf >> "$outfile"
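As a quick sanity check on the formatted file (a minimal sketch in R; column names follow the header written above):
library(data.table)
sbp <- fread("/Volumes/LaCie/tahir/mr_ukbppp_bp/ieu-b-38_formatted.tsv", nrows = 1000)
head(sbp)
all.equal(sbp$PVAL, 10^(-sbp$LP), tolerance = 1e-5)  # PVAL should reproduce 10^(-LP) up to printing precision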
# Clean MR_bp_proteins.csv
# Output file name
outfile="MR_bp_proteins_cleaned.csv"
# Process:
# - Remove lines that exactly match "Protein" or "Assay" (case-sensitive)
# - Trim leading/trailing whitespace
# - Remove completely empty lines
# - Remove funky hidden characters (like ^M)
# Run
awk '{
gsub(/^[ \t]+|[ \t]+$/, ""); # Trim leading/trailing spaces
gsub(/\r/, ""); # Remove carriage returns if any
if ($0 != "" && $0 != "Protein" && $0 != "Assay") print $0;
}' MR_bp_proteins.csv > "$outfile"
echo "✅ Cleaned file saved as: $outfile"
cd /Volumes/LaCie/tahir/mr_ukbppp_bp/scripts
chmod +x copy_173_ukbppp.sh
./copy_173_ukbppp.sh
#!/bin/bash
# Source and destination directories
SRC_DIR="/Users/charleenadams/ukbppp/coloc_clin_mol_extracted"
DEST_DIR="/Volumes/LaCie/tahir/mr_ukbppp_bp/data_173"
PROTEIN_LIST="/Volumes/LaCie/tahir/mr_ukbppp_bp/MR_bp_proteins_cleaned.csv"
# Make sure the destination directory exists
mkdir -p "$DEST_DIR"
# Start the process
echo "🚀 Starting directory search and copy..."
# Loop through each protein name (the header rows were already removed during cleaning,
# so read every line instead of skipping the first with tail -n +2)
while IFS= read -r PROTEIN; do
echo "🔍 Searching for folders matching: *$PROTEIN*"
# Find matching directories
MATCHES=$(find "$SRC_DIR" -type d -name "*$PROTEIN*" 2>/dev/null)
if [[ -n "$MATCHES" ]]; then
echo "$MATCHES" | while read -r DIR; do
echo "✅ Copying $DIR to $DEST_DIR"
cp -R "$DIR" "$DEST_DIR"
done
else
echo "❌ NOT FOUND: No directory found for $PROTEIN"
fi
done < "$PROTEIN_LIST"
echo "🎉 All done!"
Check how many directories copied over: 198 rather than 173, because Olink includes multiple assays for some gene names.
find . -mindepth 1 -maxdepth 1 -type d | wc -l
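To see where the extra 25 directories came from (a sketch, assuming the GENE_UNIPROT_OID_v1_Panel directory naming seen above):
dirs <- list.dirs("/Volumes/LaCie/tahir/mr_ukbppp_bp/data_173", recursive = FALSE)
genes <- sub("_.*$", "", basename(dirs))
table(genes)[table(genes) > 1]  # gene names matched by more than one Olink assay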
conda env list  # check which environments are available
Python 3 handles the merging, the header/delimiter issues, and the LOG10P-to-P conversion.
Step 1:
source ~/venvs/mr_env/bin/activate #created previously
cd /Volumes/LaCie/tahir/mr_ukbppp_bp/scripts
nano consolidate_chrs.py
chmod +x consolidate_chrs.py
nohup ./consolidate_chrs.py > consolidate_chrs.log 2>&1 &
#!/usr/bin/env python3
import os
import psutil # <-- Need this for RAM and CPU stats! Install via: pip install psutil
import pandas as pd
import numpy as np
import gzip
import multiprocessing
import time
# Base directory where the subdirectories exist
base_dir = "/Volumes/LaCie/tahir/mr_ukbppp_bp/data_173"
def auto_tune_cores(ram_per_process_gb=4, cpu_threshold=80):
"""
Dynamically calculate number of cores based on available RAM and CPU usage.
"""
# Total logical cores
total_cores = os.cpu_count()
# Get available RAM (in GB)
available_ram_gb = psutil.virtual_memory().available / (1024 ** 3)
# Estimate safe core usage based on RAM
cores_by_ram = int(available_ram_gb // ram_per_process_gb)
# CPU load (average over 1 minute)
cpu_load_percent = psutil.cpu_percent(interval=1)
# If CPU is already working hard, reduce the core count
if cpu_load_percent > cpu_threshold:
print(f"⚠️ CPU load is {cpu_load_percent}%. Throttling cores down.")
cores_by_cpu = max(1, total_cores // 4)
else:
cores_by_cpu = total_cores
# Take the safer limit
optimal_cores = max(1, min(cores_by_ram, cores_by_cpu, total_cores - 1))
print(f"🧠 Available RAM: {available_ram_gb:.2f} GB")
print(f"⚙️ CPU load: {cpu_load_percent:.2f}%")
print(f"🚀 Using {optimal_cores} out of {total_cores} cores.")
return optimal_cores
def process_subdir(subdir):
"""Function to process and merge files in a single subdirectory"""
subdir_path = os.path.join(base_dir, subdir)
if not os.path.isdir(subdir_path):
print(f"⛔ Skipping {subdir} (not a directory)")
return
merged_file_path = os.path.join(subdir_path, f"{subdir}_merged.tsv.gz")
if os.path.exists(merged_file_path):
print(f"✅ Already merged: {merged_file_path}")
return
merged_data = []
file_count = 0
print(f"🔨 Processing {subdir}...")
for filename in sorted(os.listdir(subdir_path)):
if not filename.endswith("_rsid_added.gz"):
continue # Skip non-target files
file_path = os.path.join(subdir_path, filename)
try:
with gzip.open(file_path, "rt") as f:
df = pd.read_csv(f, sep=r"\s+", engine="python")
# Standardize column names
df.columns = df.columns.str.strip().str.upper()
if "LOG10P" not in df.columns:
raise ValueError(f"LOG10P column missing in {filename}, columns are: {df.columns.tolist()}")
# Add the P-value
df["P"] = 10 ** (-df["LOG10P"])
merged_data.append(df)
file_count += 1
except Exception as e:
print(f"💥 Error processing {filename} in {subdir}: {e}")
if merged_data and file_count >= 23:
merged_df = pd.concat(merged_data, ignore_index=True)
merged_df.to_csv(merged_file_path, sep="\t", index=False, compression="gzip")
print(f"🎉 Merged {file_count} files in {subdir} -> {merged_file_path}")
elif merged_data:
print(f"⚠️ Only {file_count} files processed in {subdir}, not merging.")
else:
print(f"🚫 No valid files found in {subdir}")
if __name__ == "__main__":
# Auto-tune cores based on system load
num_cores = auto_tune_cores()
# Get all subdirectories to process
all_subdirs = [d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d))]
print(f"🔧 Starting parallel processing for {len(all_subdirs)} directories using {num_cores} cores...\n")
# Start the pool
with multiprocessing.Pool(processes=num_cores) as pool:
pool.map(process_subdir, all_subdirs)
print("\n✅ ALL DONE!")
Step 2: Check headers in R
This verifies that the merge worked and that the expected columns (including the new P column) are present.
# Load necessary library
library(data.table)
# Define file path
file_path <- "/Volumes/LaCie/tahir/mr_ukbppp_bp/data_173/ACAN_P16112_OID20159_v1_Cardiometabolic/ACAN_P16112_OID20159_v1_Cardiometabolic_merged.tsv.gz"
# Read the gzipped TSV file using gunzip -c (macOS-friendly)
data <- fread(cmd = paste("gunzip -c", file_path), sep = "\t", header = TRUE)
# Display first few rows
print(head(data))
# Summary statistics for numeric columns
summary(data)
# Check column names
colnames(data)
table(data$CHROM)
Copy the merged files to data_173/mr_files
mkdir -p /Volumes/LaCie/tahir/mr_ukbppp_bp/data_173/mr_files
find /Volumes/LaCie/tahir/mr_ukbppp_bp/data_173 -name "*_merged.tsv.gz" -exec cp {} /Volumes/LaCie/tahir/mr_ukbppp_bp/data_173/mr_files/ \;
cd /Volumes/LaCie/tahir/mr_ukbppp_bp   # the formatted outcome file was written here in Step 1
cp ieu-b-38_formatted.tsv /Volumes/LaCie/tahir/mr_ukbppp_bp/data_173/mr_files/
Run the MR loop script (mr_loop.R) with:
nohup Rscript /Volumes/LaCie/tahir/mr_ukbppp_bp/scripts/mr_loop.R > /Volumes/LaCie/tahir/mr_ukbppp_bp/scripts/mr_loop.log 2>&1 &
ps -p 30446 -o stat,etime
rm(list = ls())
# Load libraries
library(TwoSampleMR)
library(MendelianRandomization)
library(ieugwasr)
library(MRInstruments)
library(dplyr)
library(tidyr)
library(data.table)
library(openxlsx)
library(ggplot2)
library(scales)
library(RhpcBLASctl)
library(patchwork)
library(genetics.binaRies)
# Define %||% operator for fallback
`%||%` <- function(x, y) if (length(x) > 0) x else y
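# Example: character(0) %||% "fallback" returns "fallback"; note this length()-based version
# differs from rlang's %||%, which tests is.null() instead.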
# Set memory limit (note: R_MAX_VSIZE is normally read at R startup, e.g. from ~/.Renviron; setting it mid-session may have no effect)
Sys.setenv("R_MAX_VSIZE" = 128000)
cat("Max vector size set to:", Sys.getenv("R_MAX_VSIZE"), "MB\n")
# Set directories and files
base_dir <- "/Volumes/LaCie/tahir/mr_ukbppp_bp"
data_dir <- file.path(base_dir, "data_173", "mr_files")
results_dir <- file.path(base_dir, "results_173_tss")
if (!dir.exists(results_dir)) dir.create(results_dir, recursive = TRUE)
# Check if data_dir exists
if (!dir.exists(data_dir)) {
stop(sprintf("Data directory %s does not exist. Please verify the path.", data_dir))
}
# Outcome configuration for SBP
outcome_config <- list(
name = "SBP",
file = file.path(data_dir, "ieu-b-38_formatted.tsv"),
id = "ieu-b-38"
)
# Debug: Print outcome file path
cat("Checking outcome file:", outcome_config$file, "\n")
# Verify outcome file exists
if (!file.exists(outcome_config$file)) {
stop(sprintf("Outcome file %s does not exist. Please verify the file name and path.", outcome_config$file))
}
# Annotation file
local_annotation_file <- file.path(base_dir, "mart_clean_37_alias.txt")
if (!file.exists(local_annotation_file)) {
stop(sprintf("Local annotation file %s not found", local_annotation_file))
}
# List protein files
protein_files <- list.files(data_dir, pattern = "^[A-Z0-9]+_[A-Z0-9]+_OID[0-9]+_v1_.*\\.tsv(\\.gz)?$", full.names = TRUE)
protein_files <- protein_files[!grepl(basename(outcome_config$file), protein_files)]
# Debug: List all files in data_dir
all_files <- list.files(data_dir, full.names = TRUE)
cat("All files in data_dir:\n")
print(all_files)
cat("Protein files to process:\n")
print(protein_files)
if (length(protein_files) == 0) {
stop("No protein files found. Check the file pattern or data_dir contents.")
}
# Load annotation file
gene_data <- fread(local_annotation_file, data.table = FALSE)
cat(sprintf("Loaded local annotation file with %d genes\n", nrow(gene_data)))
# Set up logging
log_file <- file.path(base_dir, "mr_analysis.log")
sink(log_file, append = TRUE, split = TRUE)
cat(sprintf("Starting sequential MR analysis on %s\n", Sys.time()))
forest_plots <- list()
# Format exposure function
format_exposure <- function(file, protein_chrom, tss_start, tss_end) {
dat <- fread(file) %>%
filter(
CHROM == protein_chrom,
POS19 >= tss_start,
POS19 <= tss_end
) %>%
dplyr::select(
SNP = RSID,
chr.exposure = CHROM,
pos.exposure = POS19,
effect_allele.exposure = ALLELE1,
other_allele.exposure = ALLELE0,
eaf.exposure = A1FREQ,
beta.exposure = BETA,
se.exposure = SE,
pval.exposure = P,
samplesize.exposure = N
) %>%
mutate(
exposure = sub("_v1_.*$", "", basename(file)),
id.exposure = exposure,
chr.exposure = as.character(chr.exposure)
) %>%
filter(!is.na(SNP) & grepl("^rs", SNP))
cat(sprintf("Exposure SNPs for %s in TSS window: %d\n", basename(file), nrow(dat)))
return(dat)
}
# Format outcome function (adds the constant outcome sample size needed by steiger_filtering and coloc)
format_outcome <- function(file, protein_chrom, tss_start, tss_end) {
dat <- fread(file)
# Debug: Print column names
cat("Columns in outcome file:", colnames(dat), "\n")
dat <- dat %>%
filter(
CHR == protein_chrom,
BP >= tss_start,
BP <= tss_end
) %>%
dplyr::select(
SNP = SNP,
chr.outcome = CHR,
pos.outcome = BP,
effect_allele.outcome = ALT,
other_allele.outcome = REF,
eaf.outcome = EAF,
beta.outcome = BETA,
se.outcome = SE,
pval.outcome = PVAL
) %>%
mutate(
outcome = outcome_config$name,
id.outcome = outcome_config$id,
chr.outcome = as.character(chr.outcome),
samplesize.outcome = 757601 # UKB + ICBP discovery sample size from Evangelou et al. (2018)
) %>%
filter(!is.na(SNP) & grepl("^rs", SNP))
cat(sprintf("Outcome SNPs in TSS window (chr %s, %d-%d): %d\n", protein_chrom, tss_start, tss_end, nrow(dat)))
return(dat)
}
# Function to add formatted Excel sheet
add_formatted_sheet <- function(wb, sheet_name, data, title, olink_id) {
if (is.null(data) || !is.data.frame(data) || nrow(data) == 0 || ncol(data) == 0) {
cat(sprintf("Skipping sheet '%s' for %s: Invalid or empty data\n", sheet_name, olink_id))
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, "No data available")
return()
}
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}
# Modified perform_mr function
perform_mr <- function(protein_file, outcome_file, results_dir, gene_data, clump_r2, clump_label) {
olink_id <- sub("_v1_.*$", "", basename(protein_file))
protein_dir <- file.path(results_dir, clump_label, olink_id)
today <- Sys.Date()
excel_file <- file.path(protein_dir, sprintf("MR_Results_%s_%s_r2%s_%s.xlsx", olink_id, outcome_config$name, clump_label, today))
tss_harmonized_file <- file.path(protein_dir, sprintf("tss_harmonized_%s_%s_r2%s_%s.csv", olink_id, outcome_config$name, clump_label, today))
plot_file <- file.path(protein_dir, sprintf("MR_Forestplot_%s_%s_r2%s_%s.png", olink_id, outcome_config$name, clump_label, today))
# Check for existing files
existing_harmonized <- list.files(protein_dir, pattern = sprintf("(full|tss)_harmonized_%s_%s_r2%s_.*\\.csv$", olink_id, outcome_config$name, clump_label), full.names = TRUE)
if (length(existing_harmonized) > 0 || file.exists(excel_file) || file.exists(plot_file)) {
cat(sprintf("Skipping %s (r2 = %s): Existing files found (e.g., %s)\n", olink_id, clump_r2, existing_harmonized[1] %||% excel_file))
return(NULL)
}
if (!dir.exists(protein_dir)) dir.create(protein_dir, recursive = TRUE)
cat(sprintf("Processing %s with clump_r2 = %s...\n", olink_id, clump_r2))
# Get TSS and chromosome
protein <- strsplit(olink_id, "_")[[1]][1]
cat(sprintf("Searching for protein: %s\n", protein))
protein_data <- gene_data %>%
filter(`Gene name` == protein) %>%
dplyr::select(`Chromosome/scaffold name`, `Transcription start site (TSS)`, `Gene stable ID version`, `Gene name`, `Gene Synonym`)
if (nrow(protein_data) == 0) {
cat(sprintf("No match found for Gene name '%s', trying Gene Synonym...\n", protein))
protein_data <- gene_data %>%
filter(`Gene Synonym` == protein) %>%
dplyr::select(`Chromosome/scaffold name`, `Transcription start site (TSS)`, `Gene stable ID version`, `Gene name`, `Gene Synonym`)
}
if (nrow(protein_data) == 0) {
cat(sprintf("No TSS data found for %s in either Gene name or Gene Synonym; skipping\n", olink_id))
return(NULL)
}
cat(sprintf("Protein data for %s:\n", olink_id))
print(protein_data)
chrom_col <- "Chromosome/scaffold name"
tss_col <- "Transcription start site (TSS)"
tss_column <- protein_data[[tss_col]]
if (all(is.na(tss_column)) || length(tss_column) == 0) {
cat(sprintf("No valid TSS values for %s; skipping\n", olink_id))
return(NULL)
}
protein_tss <- min(tss_column, na.rm = TRUE)
protein_chrom <- protein_data[[chrom_col]][1]
if (is.infinite(protein_tss) || is.na(protein_tss) || is.na(protein_chrom)) {
cat(sprintf("Invalid TSS (%s) or chromosome (%s) for %s; skipping\n", protein_tss, protein_chrom, olink_id))
return(NULL)
}
cat(sprintf("Using TSS: %d, Chromosome: %s for %s\n", protein_tss, protein_chrom, olink_id))
window_size <- 500000
tss_start <- protein_tss - window_size
tss_end <- protein_tss + window_size
# Filter and harmonize TSS data
exp_dat <- format_exposure(protein_file, protein_chrom, tss_start, tss_end)
if (nrow(exp_dat) == 0) {
cat(sprintf("No exposure SNPs in TSS window for %s; skipping\n", olink_id))
return(NULL)
}
out_dat <- format_outcome(outcome_file, protein_chrom, tss_start, tss_end)
if (nrow(out_dat) == 0) {
cat(sprintf("No outcome SNPs in TSS window for %s; skipping\n", olink_id))
return(NULL)
}
dat <- harmonise_data(exposure_dat = exp_dat, outcome_dat = out_dat, action = 1)
write.csv(dat, tss_harmonized_file, row.names = FALSE)
cat(sprintf("TSS harmonized data for %s saved to: %s\n", olink_id, tss_harmonized_file))
instruments <- dat %>%
filter(mr_keep == TRUE, pval.exposure < 5e-6) %>%
mutate(
rsid = SNP,
pval = pval.exposure,
id = sprintf("%s exposure", olink_id),
F_stat = (beta.exposure / se.exposure)^2
) %>%
filter(F_stat > 10)
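# (beta/se)^2 approximates the per-SNP F statistic; F > 10 is the conventional weak-instrument cutoff.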
cat(sprintf("Number of instruments for %s after F-stat filtering: %d\n", olink_id, nrow(instruments)))
if (nrow(instruments) == 0) {
cat(sprintf("No instruments for %s after F-stat filtering; skipping\n", olink_id))
return(NULL)
}
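# Steiger filtering keeps SNPs that explain more variance in the exposure (protein) than in
# the outcome, i.e. a direction of effect consistent with protein -> SBP.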
instruments <- steiger_filtering(instruments)
if (!"steiger_dir" %in% colnames(instruments) || nrow(instruments) == 0) {
cat(sprintf("Steiger filtering produced no valid instruments for %s; skipping\n", olink_id))
return(NULL)
}
instruments <- instruments %>% filter(steiger_dir == TRUE)
cat(sprintf("Number of instruments for %s after Steiger filtering: %d\n", olink_id, nrow(instruments)))
if (nrow(instruments) == 0) {
cat(sprintf("No instruments for %s after Steiger filtering; skipping\n", olink_id))
return(NULL)
}
clumped_dat <- NULL
tryCatch({
clumped <- ld_clump(
dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id),
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = "/Users/charleenadams/1000G_bfiles/EUR/EUR",
clump_r2 = clump_r2,
clump_kb = 500
)
clumped_dat <- instruments %>% filter(SNP %in% clumped$rsid)
cat(sprintf("Number of SNPs retained for %s after clumping (r2 = %s): %d\n", olink_id, clump_r2, nrow(clumped_dat)))
}, error = function(e) {
cat(sprintf("LD clumping failed for %s (r2 = %s): %s\nSkipping MR analysis.\n", olink_id, clump_r2, e$message))
return(NULL)
})
if (is.null(clumped_dat) || nrow(clumped_dat) == 0) {
cat(sprintf("No SNPs retained for %s after clumping (r2 = %s); skipping MR analysis\n", olink_id, clump_r2))
return(NULL)
}
# Perform MR analyses
ivw_result <- mr(clumped_dat, method_list = "mr_ivw")
egger_result <- if (nrow(clumped_dat) >= 3) {
tryCatch(mr(clumped_dat, method_list = "mr_egger_regression"), error = function(e) NULL)
} else NULL
weighted_median_result <- mr(clumped_dat, method_list = "mr_weighted_median")
mr_input <- mr_input(
bx = clumped_dat$beta.exposure,
bxse = clumped_dat$se.exposure,
by = clumped_dat$beta.outcome,
byse = clumped_dat$se.outcome,
exposure = olink_id,
outcome = outcome_config$name,
snps = clumped_dat$SNP
)
lasso_result <- tryCatch(mr_lasso(mr_input), error = function(e) {
cat(sprintf("MR-Lasso failed for %s: %s\n", olink_id, e$message)); NULL
})
conmix_result <- tryCatch(mr_conmix(mr_input), error = function(e) {
cat(sprintf("MR-ConMix failed for %s: %s\n", olink_id, e$message)); NULL
})
heterogeneity_result <- mr_heterogeneity(clumped_dat)
pleiotropy_result <- if (nrow(clumped_dat) >= 3) mr_pleiotropy_test(clumped_dat) else NULL
loo_result <- mr_leaveoneout(clumped_dat)
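# Per-SNP Wald ratios: b_Y / b_X, with first-order delta-method SE:
# Var(b_Y/b_X) ≈ Var(b_Y)/b_X^2 + b_Y^2 * Var(b_X)/b_X^4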
wald_ratios <- clumped_dat %>%
mutate(
wald_beta = beta.outcome / beta.exposure,
wald_se = sqrt((se.outcome^2 / beta.exposure^2) + ((beta.outcome^2 * se.exposure^2) / (beta.exposure^4))),
pval = 2 * pnorm(abs(wald_beta / wald_se), lower.tail = FALSE),
method = paste("Wald Ratio:", SNP)
) %>%
dplyr::select(SNP, wald_beta, wald_se, pval, method)
# Process MR results for forest plot
mr_results <- bind_rows(
if (nrow(ivw_result) > 0) ivw_result[, c("method", "b", "se", "pval")],
if (!is.null(egger_result) && nrow(egger_result) > 0) egger_result[, c("method", "b", "se", "pval")],
if (nrow(weighted_median_result) > 0) weighted_median_result[, c("method", "b", "se", "pval")],
if (!is.null(lasso_result)) data.frame(
method = "MR-Lasso",
b = lasso_result@Estimate,
se = lasso_result@StdError,
pval = lasso_result@Pvalue
),
if (!is.null(conmix_result)) data.frame(
method = "MR-ConMix",
b = conmix_result@Estimate,
se = (conmix_result@CIUpper - conmix_result@CILower)/(2*1.96),
pval = conmix_result@Pvalue
),
wald_ratios %>% dplyr::select(method, b = wald_beta, se = wald_se, pval)
) %>%
mutate(
method = factor(method, levels = unique(method)),
or = exp(b),
or_lower = exp(b - 1.96 * se),
or_upper = exp(b + 1.96 * se)
) %>%
na.omit()
# Define colors for forest plot
base_colors <- c("IVW" = "#1F78B4", "MR-Egger" = "#FF7F00", "Weighted Median" = "#33A02C",
"MR-Lasso" = "#FB9A99", "MR-ConMix" = "#E41A1C")
if (nrow(mr_results) == 0 || !("method" %in% names(mr_results))) {
cat(sprintf("Warning: No valid MR results for %s; using default colors\n", olink_id))
color_list <- base_colors
} else {
wald_methods <- unique(mr_results$method[grepl("Wald Ratio", mr_results$method)])
n_wald <- length(wald_methods)
wald_colors <- if (n_wald > 0) setNames(hue_pal()(n_wald), wald_methods) else NULL
color_list <- c(base_colors, wald_colors)
color_list <- color_list[names(color_list) %in% unique(mr_results$method)]
}
# Create forest plot
x_min <- min(mr_results$or_lower, na.rm = TRUE)
x_max <- max(mr_results$or_upper, na.rm = TRUE)
forest_plot <- ggplot(mr_results, aes(x = or, y = method, color = method)) +
geom_errorbarh(aes(xmin = or_lower, xmax = or_upper), height = 0.2, na.rm = TRUE, linewidth = 0.8) +
geom_point(size = 3) +
geom_vline(xintercept = 1, linetype = "dashed", color = "grey50") +
labs(
title = sprintf("Mendelian Randomization Estimates:\n %s on %s (r2 = %s)", protein, outcome_config$name, clump_r2),
x = "Causal Effect (Beta)",
y = ""
) +
theme_minimal() +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
axis.text.y = element_text(size = 12, face = "bold"),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
panel.grid.major = element_line(color = "grey90"),
panel.grid.minor = element_blank(),
legend.position = "none",
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
) +
scale_color_manual(values = color_list) +
xlim(x_min - 0.02 * (x_max - x_min), x_max + 0.02 * (x_max - x_min))
forest_plots[[paste(olink_id, clump_label, sep = "_")]] <<- forest_plot
# Save outputs to Excel
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")
toc_data <- data.frame(
Sheet = c("IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix",
"Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
"Instruments", "Clumped_Data"),
Title = c("Inverse Variance Weighted Results", "MR-Egger Results",
"Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
"Wald Ratio Tests", "Heterogeneity Test Results",
"Pleiotropy Test Results", "Leave-One-Out Analysis",
"Filtered Instruments", "Clumped SNPs Data")
)
addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")
add_formatted_sheet <- function(wb, sheet_name, data, title) {
if (is.null(data) || !is.data.frame(data) || nrow(data) == 0 || ncol(data) == 0) {
cat(sprintf("Skipping sheet '%s' for %s: Invalid or empty data\n", sheet_name, olink_id))
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, "No data available")
return()
}
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}
add_formatted_sheet(wb, "IVW", ivw_result, "Inverse Variance Weighted Results")
add_formatted_sheet(wb, "MR-Egger", egger_result, "MR-Egger Results")
add_formatted_sheet(wb, "Weighted_Median", weighted_median_result, "Weighted Median Results")
if (!is.null(lasso_result)) {
lasso_df <- data.frame(
Exposure = lasso_result@Exposure,
Outcome = lasso_result@Outcome,
Estimate = lasso_result@Estimate,
StdError = lasso_result@StdError,
CILower = lasso_result@CILower,
CIUpper = lasso_result@CIUpper,
Pvalue = lasso_result@Pvalue,
SNPs = lasso_result@SNPs,
Valid = lasso_result@Valid,
ValidSNPs = paste(lasso_result@ValidSNPs, collapse = ", ")
)
add_formatted_sheet(wb, "MR-Lasso", lasso_df, "MR-Lasso Results")
}
if (!is.null(conmix_result)) {
conmix_df <- data.frame(
Exposure = conmix_result@Exposure,
Outcome = conmix_result@Outcome,
Estimate = conmix_result@Estimate,
Pvalue = conmix_result@Pvalue,
CILower = conmix_result@CILower,
CIUpper = conmix_result@CIUpper,
SNPs = conmix_result@SNPs
)
add_formatted_sheet(wb, "MR-ConMix", conmix_df, "MR-ConMix Results")
}
add_formatted_sheet(wb, "Wald_Ratios", wald_ratios, "Wald Ratio Tests")
add_formatted_sheet(wb, "Heterogeneity_Test", heterogeneity_result, "Heterogeneity Test Results")
add_formatted_sheet(wb, "Pleiotropy_Test", pleiotropy_result, "Pleiotropy Test Results")
add_formatted_sheet(wb, "Leave_One_Out", loo_result, "Leave-One-Out Analysis")
add_formatted_sheet(wb, "Instruments", instruments, "Filtered Instruments")
add_formatted_sheet(wb, "Clumped_Data", clumped_dat, "Clumped SNPs Data")
saveWorkbook(wb, excel_file, overwrite = FALSE)
ggsave(plot_file, forest_plot, width = 12, height = 10, dpi = 600, bg = "white")
cat(sprintf("Results for %s (r2 = %s) saved to: %s and %s\n", olink_id, clump_r2, excel_file, plot_file))
gc(verbose = TRUE)
}
# Main analysis loop for multiple clump_r2 values
for (clump_r2 in c(0.1, 0.01, 0.001)) {
clump_label <- paste0("r2_", gsub("\\.", "", clump_r2))
for (protein_file in protein_files) {
perform_mr(protein_file, outcome_config$file, results_dir, gene_data, clump_r2, clump_label)
}
}
cat(sprintf("MR analysis completed on %s\n", Sys.time()))
sink()
# Master file creation
today <- Sys.Date()
master_file <- file.path(base_dir, sprintf("Master_MR_Results_%s_%s.xlsx", outcome_config$name, today))
result_files <- list.files(results_dir, pattern = sprintf("MR_Results_.*_%s_r2.*\\.xlsx$", outcome_config$name),
full.names = TRUE, recursive = TRUE)
tabs <- c("IVW", "MR-Egger", "Weighted_Median", "MR-Lasso", "MR-ConMix",
"Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out",
"Instruments", "Clumped_Data")
master_data <- setNames(vector("list", length(tabs)), tabs)
extract_protein_id <- function(file) {
file_name <- basename(file)
protein_id <- sub(sprintf("MR_Results_(.*)_%s_r2.*\\.xlsx$", outcome_config$name), "\\1", file_name)
# NB: file names contain "r2r2_<digits>" (the sprintf adds "r2" to clump_label, which already
# starts with "r2_"), so this pattern finds no match and sub() returns the full file name;
# the Python post-processing script below recovers the r2 value from it.
r2_value <- sub(".*_r2([0-9]+)_.*\\.xlsx$", "\\1", file_name)
list(protein_id = protein_id, r2_value = r2_value)
}
for (file in result_files) {
info <- extract_protein_id(file)
protein_id <- info$protein_id
r2_value <- info$r2_value
cat(sprintf("Processing %s (r2 = %s)...\n", protein_id, r2_value))
wb <- tryCatch({
loadWorkbook(file)
}, error = function(e) {
cat(sprintf("Failed to load %s: %s\nSkipping\n", protein_id, e$message))
return(NULL)
}, warning = function(w) {
cat(sprintf("Warning loading %s: %s\n", protein_id, w$message))
suppressWarnings(loadWorkbook(file))
})
if (is.null(wb)) next
sheet_names <- sheets(wb)
for (tab in tabs) {
if (tab %in% sheet_names) {
data <- tryCatch({
readWorkbook(wb, sheet = tab, startRow = 3, colNames = TRUE)
}, error = function(e) {
cat(sprintf("Error reading %s in %s: %s\n", tab, protein_id, e$message))
return(NULL)
}, warning = function(w) {
cat(sprintf("Warning reading %s in %s: %s\n", tab, protein_id, w$message))
suppressWarnings(readWorkbook(wb, sheet = tab, startRow = 3, colNames = TRUE))
})
if (is.null(data) || nrow(data) == 0 ||
(nrow(data) == 1 && length(data[1, 1]) > 0 && data[1, 1] == "No data available")) {
next
}
if ("chr.exposure" %in% names(data)) {
data$chr.exposure <- as.character(data$chr.exposure)
}
if ("chr.outcome" %in% names(data)) {
data$chr.outcome <- as.character(data$chr.outcome)
}
data$Protein_ID <- protein_id
data$Clump_R2 <- r2_value
if (is.null(master_data[[tab]])) {
master_data[[tab]] <- data
} else {
master_data[[tab]] <- bind_rows(master_data[[tab]], data)
}
}
}
}
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")
toc_data <- data.frame(
Sheet = tabs,
Title = c("Inverse Variance Weighted Results", "MR-Egger Results",
"Weighted Median Results", "MR-Lasso Results", "MR-ConMix Results",
"Wald Ratio Tests", "Heterogeneity Test Results",
"Pleiotropy Test Results", "Leave-One-Out Analysis",
"Filtered Instruments", "Clumped SNPs Data")
)
addWorksheet(wb, "TOC")
writeData(wb, "TOC", "Table of Contents", startRow = 1, startCol = 1)
mergeCells(wb, "TOC", cols = 1:2, rows = 1)
addStyle(wb, "TOC", title_style, rows = 1, cols = 1)
writeData(wb, "TOC", toc_data, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "TOC", cols = 1:2, widths = "auto")
add_formatted_sheet <- function(wb, sheet_name, data, title) {
if (is.null(data) || nrow(data) == 0) {
cat(sprintf("Skipping sheet '%s': No data\n", sheet_name))
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, "No data available")
return()
}
addWorksheet(wb, sheet_name)
writeData(wb, sheet_name, title, startRow = 1, startCol = 1)
writeData(wb, sheet_name, data, startRow = 3, startCol = 1, headerStyle = header_style)
mergeCells(wb, sheet_name, cols = 1:ncol(data), rows = 1)
setColWidths(wb, sheet_name, cols = 1:ncol(data), widths = "auto")
addStyle(wb, sheet_name, title_style, rows = 1, cols = 1)
}
for (tab in tabs) {
add_formatted_sheet(wb, tab, master_data[[tab]], toc_data$Title[tabs == tab])
}
saveWorkbook(wb, master_file, overwrite = TRUE)
cat(sprintf("Master file saved to: %s\n", master_file))
Python post-processing: parse the r2 value out of the File column of the master Excel and add it as a new R2 column.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
import re
# File paths
input_file = "/Volumes/LaCie/tahir/mr_ukbppp_bp/Master_MR_Results_SBP_2025-04-30_Modified.xlsx"
output_file = "/Volumes/LaCie/tahir/mr_ukbppp_bp/Master_MR_Results_SBP_2025-04-30_Modified_v3.xlsx"
# Load the Excel file
wb = openpyxl.load_workbook(input_file)
sheets_to_process = ["IVW", "MR-Egger", "Weighted_Median", "Wald_Ratios", "Heterogeneity_Test", "Pleiotropy_Test", "Leave_One_Out", "Instruments", "Clumped_Data"]
# Function to extract r2 value from file name and map to R2
def extract_and_convert_r2(file_name):
if pd.isna(file_name):
return pd.NA
# Extract the r2 value from the file name (e.g., r2r2_0001)
match = re.search(r'r2r2_(\d+)_', str(file_name))
if not match:
print(f"Could not extract r2 value from file name: {file_name}")
return pd.NA
r2_value = match.group(1) # e.g., "0001"
# Map to R2
if r2_value == "0001":
return "0.001"
elif r2_value == "001":
return "0.01"
elif r2_value == "01":
return "0.1"
else:
print(f"Unexpected r2 value extracted: {r2_value}")
return pd.NA
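# e.g. extract_and_convert_r2("MR_Results_..._SBP_r2r2_0001_2025-04-30.xlsx") returns "0.001"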
for sheet_name in sheets_to_process:
print(f"Processing sheet: {sheet_name}")
# Read the sheet into a DataFrame
df = pd.read_excel(input_file, sheet_name=sheet_name, header=2)
# Check if the sheet has data
if df.empty or (df.shape[0] == 1 and df.iloc[0, 0] == "No data available"):
print(f"Sheet {sheet_name} is empty or has no data. Skipping.")
continue
# Ensure File column exists (should already be renamed from Clump_R2)
if "File" not in df.columns:
if "Clump_R2" in df.columns:
df = df.rename(columns={"Clump_R2": "File"})
print(f"Renamed Clump_R2 to File in sheet {sheet_name}")
else:
print(f"Sheet {sheet_name} does not have a File or Clump_R2 column. Adding R2 with NA values.")
df["R2"] = pd.NA
# Write back to sheet and continue
ws = wb[sheet_name]
for row in ws["A3":f"{openpyxl.utils.get_column_letter(ws.max_column)}{ws.max_row}"]:
for cell in row:
cell.value = None
for r_idx, row in enumerate(dataframe_to_rows(df, index=False, header=True), 3):
for c_idx, value in enumerate(row, 1):
ws.cell(row=r_idx, column=c_idx, value=value)
continue
# Debug: Print unique values in File column
print(f"Unique values in File column for sheet {sheet_name}: {df['File'].unique()}")
# Create new R2 column by extracting r2 from File column
df["R2"] = df["File"].apply(extract_and_convert_r2)
# Debug: Check if R2 column has values
print(f"R2 column values for sheet {sheet_name}: {df['R2'].unique()}")
# Write the modified DataFrame back to the sheet
ws = wb[sheet_name]
# Clear existing data below the title (row 1) and header (row 3)
for row in ws["A3":f"{openpyxl.utils.get_column_letter(ws.max_column)}{ws.max_row}"]:
for cell in row:
cell.value = None
# Write the updated DataFrame starting from row 3
for r_idx, row in enumerate(dataframe_to_rows(df, index=False, header=True), 3):
for c_idx, value in enumerate(row, 1):
ws.cell(row=r_idx, column=c_idx, value=value)
# Save the modified workbook
wb.save(output_file)
print(f"Modified Excel file saved to: {output_file}")
Colocalization analysis: run coloc.abf on each tss_harmonized file to test for a shared causal variant between protein levels and SBP.
# Clear environment
rm(list = ls())
# Load required libraries
library(coloc)
library(data.table)
library(dplyr)
library(openxlsx)
# Set base directory
base_dir <- "/Volumes/LaCie/tahir/mr_ukbppp_bp"
results_dir <- file.path(base_dir, "results_173_tss")
output_file <- file.path(base_dir, sprintf("Coloc_Results_SBP_%s.xlsx", Sys.Date()))
# Define subdirectory to process (choose one)
r2_dirs <- "r2_01" # You can change this to "r2_001" or "r2_0001" if desired
# Find all tss_harmonized files in the specified r2 directory
tss_files <- unlist(lapply(r2_dirs, function(r2_dir) {
protein_dirs <- list.dirs(file.path(results_dir, r2_dir), recursive = FALSE)
unlist(lapply(protein_dirs, function(protein_dir) {
files <- list.files(protein_dir, pattern = "^tss_harmonized_.*\\.csv$", full.names = TRUE)
return(files)
}))
}))
if (length(tss_files) == 0) {
stop("No tss_harmonized files found in the subdirectory.")
}
cat(sprintf("Found %d tss_harmonized files to process in %s.\n", length(tss_files), r2_dirs))
# Function to extract protein ID from file name
extract_info <- function(file_path) {
file_name <- basename(file_path)
# Extract protein ID (e.g., WIF1_Q9Y5W5_OID21464)
protein_id <- sub("^tss_harmonized_(.*)_SBP_r2r2_.*\\.csv$", "\\1", file_name)
list(protein_id = protein_id, file_name = file_name)
}
# Function to run colocalization analysis
run_coloc <- function(tss_file) {
# Extract info
info <- extract_info(tss_file)
protein_id <- info$protein_id
file_name <- info$file_name
cat(sprintf("Processing %s...\n", protein_id))
# Read the harmonized data
dat <- tryCatch({
fread(tss_file)
}, error = function(e) {
cat(sprintf("Failed to read %s: %s\n", tss_file, e$message))
return(NULL)
})
# Check if data is empty or insufficient
if (is.null(dat) || nrow(dat) < 1) {
cat(sprintf("No data in %s. Skipping.\n", tss_file))
return(NULL)
}
# Verify required columns exist
required_cols <- c("SNP", "beta.exposure", "se.exposure", "samplesize.exposure",
"beta.outcome", "se.outcome", "samplesize.outcome",
"eaf.exposure", "eaf.outcome")
missing_cols <- setdiff(required_cols, colnames(dat))
if (length(missing_cols) > 0) {
cat(sprintf("Missing required columns in %s: %s\n", tss_file, paste(missing_cols, collapse = ", ")))
return(NULL)
}
# Remove duplicate SNPs (keep first occurrence)
if (any(duplicated(dat$SNP))) {
cat(sprintf("Found %d duplicate SNPs in %s. Keeping first occurrence.\n",
sum(duplicated(dat$SNP)), protein_id))
dat <- dat[!duplicated(dat$SNP), ]
}
# Check for NA or invalid values
if (any(is.na(dat$beta.exposure)) || any(is.na(dat$se.exposure)) ||
any(is.na(dat$beta.outcome)) || any(is.na(dat$se.outcome)) ||
any(is.na(dat$eaf.exposure)) || any(is.na(dat$eaf.outcome))) {
cat(sprintf("NA values found in critical columns for %s. Skipping.\n", protein_id))
return(NULL)
}
# Compute MAF (minor allele frequency) as min(eaf, 1-eaf)
maf_exposure <- pmin(dat$eaf.exposure, 1 - dat$eaf.exposure)
maf_outcome <- pmin(dat$eaf.outcome, 1 - dat$eaf.outcome)
# Prepare exposure dataset (protein)
exposure_data <- list(
snp = dat$SNP,
beta = dat$beta.exposure,
varbeta = (dat$se.exposure)^2,
N = dat$samplesize.exposure[1], # Assuming same sample size for all SNPs
MAF = maf_exposure,
type = "quant"
)
# Prepare outcome dataset (SBP)
outcome_data <- list(
snp = dat$SNP,
beta = dat$beta.outcome,
varbeta = (dat$se.outcome)^2,
N = dat$samplesize.outcome[1], # Assuming same sample size for all SNPs
MAF = maf_outcome,
type = "quant"
)
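# coloc.abf() uses default priors p1 = 1e-4, p2 = 1e-4, p12 = 1e-5 unless overridden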
# Run colocalization analysis
coloc_result <- tryCatch({
coloc.abf(dataset1 = exposure_data, dataset2 = outcome_data)
}, error = function(e) {
cat(sprintf("Coloc failed for %s: %s\n", protein_id, e$message))
return(NULL)
})
if (is.null(coloc_result)) {
return(NULL)
}
# Extract summary results
summary <- coloc_result$summary
result <- data.frame(
Protein_ID = protein_id,
File = file_name,
NSNPs = summary["nsnps"],
PP_H0 = summary["PP.H0.abf"],
PP_H1 = summary["PP.H1.abf"],
PP_H2 = summary["PP.H2.abf"],
PP_H3 = summary["PP.H3.abf"],
PP_H4 = summary["PP.H4.abf"],
stringsAsFactors = FALSE
)
return(result)
}
# Process all files
coloc_results <- lapply(tss_files, run_coloc)
coloc_results <- do.call(rbind, Filter(Negate(is.null), coloc_results))
if (is.null(coloc_results) || nrow(coloc_results) == 0) {
stop("No colocalization results to save.")
}
# Clean and format the results (remove R2 and P-value Threshold columns)
coloc_results_clean <- coloc_results %>%
rename(
"Protein ID" = Protein_ID,
"No. SNPs" = NSNPs,
"PP H0 (No Association)" = PP_H0,
"PP H1 (Exposure Only)" = PP_H1,
"PP H2 (Outcome Only)" = PP_H2,
"PP H3 (Distinct Variants)" = PP_H3,
"PP H4 (Shared Variant)" = PP_H4
) %>%
mutate(
across(starts_with("PP "), ~ round(.x, 4))
) %>%
dplyr::select(
"Protein ID", File, "No. SNPs",
"PP H0 (No Association)", "PP H1 (Exposure Only)", "PP H2 (Outcome Only)",
"PP H3 (Distinct Variants)", "PP H4 (Shared Variant)"
)
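# By convention, PP H4 > 0.8 is taken as strong evidence of a shared causal variant, e.g.:
# coloc_results_clean[coloc_results_clean$`PP H4 (Shared Variant)` > 0.8, ]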
# Create Excel workbook with formatted headers
wb <- createWorkbook()
title_style <- createStyle(fontSize = 14, fontColour = "black", textDecoration = "bold", halign = "center")
header_style <- createStyle(fontColour = "white", fgFill = "#C8A2C8", textDecoration = "bold", halign = "center", border = "TopBottomLeftRight")
# Add sheet for colocalization results
addWorksheet(wb, "Colocalization Results")
writeData(wb, "Colocalization Results", "Table 1. Colocalization Analysis Results for Proteins and SBP", startRow = 1, startCol = 1)
mergeCells(wb, "Colocalization Results", cols = 1:ncol(coloc_results_clean), rows = 1)
addStyle(wb, "Colocalization Results", title_style, rows = 1, cols = 1)
writeData(wb, "Colocalization Results", coloc_results_clean, startRow = 3, startCol = 1, headerStyle = header_style)
setColWidths(wb, "Colocalization Results", cols = 1:ncol(coloc_results_clean), widths = "auto")
# Save the workbook
saveWorkbook(wb, output_file, overwrite = TRUE)
cat(sprintf("Colocalization results saved to: %s\n", output_file))
The results are shared via a Shiny app: https://yodamendel.shinyapps.io/mr_ukbppp_bp_app/
# Run with:
# rsconnect::deployApp('/Volumes/LaCie/tahir/mr_ukbppp_bp/mr_ukbppp_bp_app')
library(shiny)
# ---- UI ----
ui <- fluidPage(
# ---- HEAD STYLES ----
tags$head(
tags$style(HTML("
body {
font-family: 'Lato', sans-serif;
color: #1A2526;
background-color: #FFFFFF; /* Pure white background */
}
#main_title {
color: #005566;
margin-bottom: 10px;
font-size: 32px;
font-weight: bold;
}
#lab_link {
display: inline-block;
margin-top: 10px;
font-size: 18px;
color: #A51C30;
text-decoration: none;
font-weight: bold;
}
#lab_link:hover {
color: #7A1422;
}
.info-box {
background-color: #F5F9FF;
border: 2px solid #005566;
border-radius: 10px;
padding: 20px;
margin-bottom: 30px;
box-shadow: 2px 2px 8px rgba(0,0,0,0.1);
}
.download-box {
background-color: #FFF5F5;
border: 2px solid #A51C30;
border-radius: 10px;
padding: 20px;
margin-bottom: 30px;
box-shadow: 2px 2px 8px rgba(0,0,0,0.1);
}
.download-button {
margin-bottom: 15px;
}
.btn-primary {
background-color: #005566;
border-color: #003f4d;
font-size: 16px;
padding: 10px 20px;
border-radius: 8px;
color: #ffffff;
text-decoration: none;
display: inline-block;
transition: background-color 0.3s ease;
}
.btn-primary:hover {
background-color: #007c91;
color: #ffffff;
}
#logo_container {
text-align: center;
margin-top: 30px;
margin-bottom: 20px;
}
#logo_image {
width: 200px;
height: auto;
}
"))
),
# ---- HEADER TITLE & LAB LINK ----
div(
style = "text-align: center; margin-bottom: 10px;",
h1(HTML("Cis-MR and Colocalization of Selected UKB-PPP Proteins & SBP"), id = "main_title")
),
div(
style = "text-align: center; margin-bottom: 30px;",
tags$a(
href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
target = "_blank",
"🔬 Our Lab",
id = "lab_link"
)
),
# ---- DESCRIPTION BOX ----
div(class = "info-box",
h3("🧬 Analysis Summary"),
p(HTML("
This platform showcases Mendelian Randomization (MR) and colocalization analyses of selected proteins from the UK Biobank Pharma Proteomics Project (UKB-PPP) with systolic blood pressure (SBP).<br><br>
<b>Dataset:</b> We used GWAS summary statistics for SBP (Dataset ID: ieu-b-38) from Evangelou et al. (2018, PMID: <a href='https://pmc.ncbi.nlm.nih.gov/articles/PMC6284793/' target='_blank'>30224653</a>), a meta-analysis of UK Biobank and the International Consortium of Blood Pressure (ICBP).<br><br>
<b>MR Analysis:</b> We constructed <b>cis-acting genetic instruments</b> (1 Mb around the TSS), filtered at <b>P < 5e-6</b> with LD clumping at <b>R² thresholds of 0.1, 0.01, and 0.001</b>. Effect estimates (b) are presented as <b>betas</b>.<br><br>
<b>Colocalization Analysis:</b> We performed colocalization analysis to test for shared causal variants between protein levels and SBP, reporting posterior probabilities for hypotheses H0 to H4.<br><br>
<b>Downloadable File Includes:</b><br><br>
✅ <b>MR Results:</b> Results for selected UKB-PPP proteins across three R² thresholds (0.1, 0.01, 0.001), including IVW, MR-Egger, Weighted Median, MR-LASSO, MR-ConMix, Wald Ratios, heterogeneity, pleiotropy, and instrument data.<br><br>
✅ <b>Colocalization Results:</b> Posterior probabilities for shared causal variants between protein levels and SBP (last tab in the Excel) <br><br>
<b>Notes and Scripts</b> (Walkthrough + Annotated Code):<br>
👉 <a href='https://rpubs.com/YodaMendel/1304438' target='_blank'>https://rpubs.com/YodaMendel/1304438</a>
"))
),
# ---- DOWNLOAD BOX ----
div(class = "download-box",
h3("📦 Download Results"),
p("Click below to download the master Excel file containing MR and colocalization results."),
div(
class = "download-button",
downloadButton(
outputId = "download_mr_results",
label = "Download Master MR and Coloc Results (Excel)",
class = "btn-primary"
)
)
),
# ---- BIDMC LOGO WITH HYPERLINK ----
div(
id = "logo_container",
tags$a(
href = "https://www.bidmc.org/research/research-by-department/medicine/cardiovascular-medicine/personal-genomics-and-cardiometabolic-disease",
target = "_blank",
tags$img(
src = "BIDMC_HMS_Stacked-LockUp.png",
id = "logo_image",
alt = "Beth Israel Deaconess Medical Center Logo"
)
)
)
)
# ---- SERVER ----
server <- function(input, output, session) {
# Download handler for the MR and colocalization results Excel file
output$download_mr_results <- downloadHandler(
filename = function() {
"Master_MR_Results_SBP_2025-04-30_Modified_v3.xlsx"
},
content = function(file) {
file.copy("www/Master_MR_Results_SBP_2025-04-30_Modified_v3.xlsx", file)
},
contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)
}
# ---- RUN APP ----
shinyApp(ui = ui, server = server)