This pipeline uses linkage disequilibrium score regression (LDSC) to calculate genetic correlations (rg) between 33 Olink chemokines.
LDSC requires linkage disequilibrium (LD) scores based on r² values, unlike SuSiE and coloc, which rely on the signed correlation coefficient (r) in matrix form. These LD scores are also aggregated and weighted. We are starting with the UK Biobank Pharma Proteomics Project (UKB-PPP) summary statistics for Europeans because precomputed LD scores for Europeans are available from the Alkes Lab. For our MESA and JHS populations, we will need to calculate these scores ourselves, which is more complex than generating LD matrices and could require substantial storage space. Additionally, LDSC generally works best with sample sizes of > 5,000, so MESA and JHS may be slightly underpowered for this approach. To test this pipeline with chemokine data, we are using the UKB-PPP dataset.
This setup is designed for flexibility and scalability; for example, it could be adapted to process additional UKB-PPP proteins in about 3 hours (meaning the scripts can be revised in about 3 hours), significantly reducing the manual effort needed. Run time for generating genetic correlations varies depending on the number of protein-protein pairs analyzed. This loop for LDSC, which generated 528 genetic correlations from 33 chemokines, ran for about 5 hours. (Note: This pipeline contains many scripts. The genetic correlation script was not parallelized for this initial run, as the focus was on obtaining results; future runs would benefit from parallelization to improve efficiency.)
Direct quote from the Synapse UKB-PPP website:
The UK Biobank Pharma Proteomics Project (UKB-PPP) is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,219 UKB participants. As part of a collaborative analysis across the thirteen UKB-PPP partners, we conducted comprehensive protein quantitative trait loci (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 85% are newly discovered, in addition to ancestry-specific pQTL mapping in non-Europeans. We identify independent secondary associations in 87% of cis and 30% of trans loci, expanding the catalogue of genetic instruments for downstream analyses.
The UKB-PPP pQTL summary statistics are available via synapse for the “discovery” UKB EUR population. To download, it requires creating a synapse account, obtaining a personal access token, and adding files to a download list.
In a terminal window, programmatically acquire the download list of selected chemokine summary statistics.
I used the protein mapping file, which I’d downloaded previously to select the chemokines (i.e., I looked at them in Excel). I placed a copy of the rsid and protein mapping files at:
/Users/charleenadams/temp_BI/chemokine_rgs_olink/maps
(synapse_env) charleenadams@Charleens-MacBook-Pro maps % ls
olink_protein_map_1.5k_v1.tsv olink_rsid_map_mac5_info03_b0_7_chr20_patched_v2.tsv.gz
olink_protein_map_3k_v1.tsv olink_rsid_map_mac5_info03_b0_7_chr21_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr10_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr22_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr11_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr2_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr12_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr3_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr13_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr4_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr14_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr5_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr15_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr6_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr16_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr7_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr17_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr8_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr18_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chr9_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr19_patched_v2.tsv.gz olink_rsid_map_mac5_info03_b0_7_chrXY_patched_v2.tsv.gz
olink_rsid_map_mac5_info03_b0_7_chr1_patched_v2.tsv olink_rsid_map_mac5_info03_b0_7_chrX_patched_v2.tsv.gz
Here are the 33 chemokines I selected. Would you like other ones? I’m not sure if we want Chemokine-like protein TAFA-5. I kept it for now.
| UKBPPP_ProteinID | Olink Target Fullname | OlinkID | UniProt | Assay |
|---|---|---|---|---|
| CCL13:Q99616:OID20655:v1 | C-C motif chemokine 13 | OID20655 | Q99616 | CCL13 |
| CCL14:Q16627:OID20401:v1 | C-C motif chemokine 14 | OID20401 | Q16627 | CCL14 |
| CCL15:Q16663:OID20328:v1 | C-C motif chemokine 15 | OID20328 | Q16663 | CCL15 |
| CCL16:O15467:OID20334:v1 | C-C motif chemokine 16 | OID20334 | O15467 | CCL16 |
| CCL17:Q92583:OID20745:v1 | C-C motif chemokine 17 | OID20745 | Q92583 | CCL17 |
| CCL18:P55774:OID20395:v1 | C-C motif chemokine 18 | OID20395 | P55774 | CCL18 |
| CCL19:Q99731:OID21030:v1 | C-C motif chemokine 19 | OID21030 | Q99731 | CCL19 |
| CCL2:P13500:OID21004:v1 | C-C motif chemokine 2 | OID21004 | P13500 | CCL2 |
| CCL20:P78556:OID20671:v1 | C-C motif chemokine 20 | OID20671 | P78556 | CCL20 |
| CCL21:O00585:OID20686:v1 | C-C motif chemokine 21 | OID20686 | O00585 | CCL21 |
| CCL22:O00626:OID20765:v1 | C-C motif chemokine 22 | OID20765 | O00626 | CCL22 |
| CCL23:P55773:OID20693:v1 | C-C motif chemokine 23 | OID20693 | P55773 | CCL23 |
| CCL24:O00175:OID20772:v1 | C-C motif chemokine 24 | OID20772 | O00175 | CCL24 |
| CCL25:O15444:OID20674:v1 | C-C motif chemokine 25 | OID20674 | O15444 | CCL25 |
| CCL26:Q9Y258:OID20546:v1 | C-C motif chemokine 26 | OID20546 | Q9Y258 | CCL26 |
| CCL27:Q9Y4X3:OID20121:v1 | C-C motif chemokine 27 | OID20121 | Q9Y4X3 | CCL27 |
| CCL28:Q9NRJ3:OID20569:v1 | C-C motif chemokine 28 | OID20569 | Q9NRJ3 | CCL28 |
| CCL3:P10147:OID20610:v1 | C-C motif chemokine 3 | OID20610 | P10147 | CCL3 |
| CCL4:P13236:OID20695:v1 | C-C motif chemokine 4 | OID20695 | P13236 | CCL4 |
| CCL5:P13501:OID20412:v1 | C-C motif chemokine 5 | OID20412 | P13501 | CCL5 |
| CCL7:P80098:OID20523:v1 | C-C motif chemokine 7 | OID20523 | P80098 | CCL7 |
| CCL8:P80075:OID21466:v1 | C-C motif chemokine 8 | OID21466 | P80075 | CCL8 |
| CXCL10:P02778:OID20697:v1 | C-X-C motif chemokine 10 | OID20697 | P02778 | CXCL10 |
| CXCL11:O14625:OID21042:v1 | C-X-C motif chemokine 11 | OID21042 | O14625 | CXCL11 |
| CXCL13:O43927:OID21024:v1 | C-X-C motif chemokine 13 | OID21024 | O43927 | CXCL13 |
| CXCL14:O95715:OID20458:v1 | C-X-C motif chemokine 14 | OID20458 | O95715 | CXCL14 |
| CXCL16:Q9H2A7:OID20282:v1 | C-X-C motif chemokine 16 | OID20282 | Q9H2A7 | CXCL16 |
| CXCL17:Q6UXB2:OID20622:v1 | C-X-C motif chemokine 17 | OID20622 | Q6UXB2 | CXCL17 |
| CXCL3:P19876:OID20788:v1 | C-X-C motif chemokine 3 | OID20788 | P19876 | CXCL3 |
| CXCL5:P42830:OID20207:v1 | C-X-C motif chemokine 5 | OID20207 | P42830 | CXCL5 |
| CXCL6:P80162:OID20613:v1 | C-X-C motif chemokine 6 | OID20613 | P80162 | CXCL6 |
| CXCL9:Q07325:OID20675:v1 | C-X-C motif chemokine 9 | OID20675 | Q07325 | CXCL9 |
| TAFA5:Q7Z5A7:OID21321:v1 | Chemokine-like protein TAFA-5 | OID21321 | Q7Z5A7 | TAFA5 |
library(DT)
table <- fread("/Users/charleenadams/temp_BI/chemokine_rgs_olink/protein_map_interactive.csv")
datatable(table,
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Olink Protein Map')
After logging into Synapse and adding desired files to the download list, these are the steps.
cd /Users/charleenadams/temp_BI/chemokine_rgs_olink
conda create -n synapse_env -c conda-forge mamba
conda activate synapse_env
pip install synapseclient
synapse -h
synapse get-download-list
synapse login "cdadams" eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ6SjVSSzpMN1RMOlQ3TDc6M1ZYNjpKRU9VOjY0NFI6VTNJWDo1S1oyOjdaQ0s6RlBUSCIsImFsZyI6IlJTMjU2In0.eyJhY2Nlc3MiOnsic2NvcGUiOlsidmlldyIsImRvd25sb2FkIiwibW9kaWZ5Il0sIm9pZGNfY2xhaW1zIjp7fX0sInRva2VuX3R5cGUiOiJQRVJTT05BTF9BQ0NFU1NfVE9LRU4iLCJpc3MiOiJodHRwczovL3JlcG8tcHJvZC5wcm9kLnNhZ2ViYXNlLm9yZy9hdXRoL3YxIiwiYXVkIjoiMCIsIm5iZiI6MTcyOTcwNjgzOSwiaWF0IjoxNzI5NzA2ODM5LCJqdGkiOiIxMzA3MyIsInN1YiI6IjM0NzI3ODMifQ.B65yULl25FZ1vmFOXEE6RaNYOhXTXZ5ov5JLVP47nJpejRA6oMfcwOcY1z4RH7bLFQ48xzRhc1I8kSbexid8jpoKnQJlBI8mVNItvv92dZNDrNN_8cvXtHNLVb6O12jwIZ0XDuQxcLuxm0kF52IZSXWMtwBzBeyKckkNYL9LTbQKDmFRmOpwZGVjkZyT7wsiuLLjj9bTcMfFcY-IstD7xnK4O51D_UejK6vg-Eh0Uw23i4NHhMAA02-D-tZq1yV0X8yI0EIOFoLQ3AKtXgrr4sE6zDGz4KTPaqVM557Vv9Aqk2KtnfiZRSWWxWYPbJm1Ty8AYLBDv6Iq-tdAval8kw
RSID maps are provided per chromosome.
In terminal window, look at the rsid map for chromosome 1. It’s nice that they give us positions for both POS19 and POS38.
zless - olink_rsid_map_mac5_info03_b0_7_chr1_patched_v2.tsv.gz
ID REF ALT rsid POS19 POS38
1:10177:A:AC:imp:v1 A AC rs367896724 10177 10177
1:10352:T:TA:imp:v1 T TA rs201106462 10352 10352
1:10511:G:A:imp:v1 G A rs534229142 10511 10511
We have to manually select the Olink files for the download list, which means it can be easy to miss them. I’d identified 33, but when adding them to the download list, it’s difficult to eyeball. Below I’m counting that we got 33 and then unpacking them.
ls *.tar | wc -l
for file in *.tar; do tar -xf "$file"; done
The rsid numbers need to be merged into the chemokine summary statistics files for LDSC. When we unpack the files, they produce 33 subdirectories. So, the code below goes into the subdirectories and annotates all the files in all of them with their corresponding rsid mapping files by chromosome.
#!/bin/bash
# Base directory where your data is located
base_dir="/Users/charleenadams/temp_BI/chemokine_rgs_olink"
# Directory where RSID map files are located
map_dir="$base_dir/maps"
# Log file to track progress
log_file="add_rsid_log.txt"
echo "Log started on $(date)" > $log_file
# Loop through each subdirectory in the base directory
for dir in "$base_dir"/*; do
if [[ -d "$dir" ]]; then
echo "Processing directory: $dir" | tee -a $log_file
# Loop through each .gz file in the subdirectory
for file in "$dir"/*.gz; do
# Extract chromosome number from file name without using -P option
chrom=$(basename "$file" | sed -E 's/.*chr([0-9XY]+).*/\1/')
# Define the RSID map file for this chromosome
map_file="$map_dir/olink_rsid_map_mac5_info03_b0_7_chr${chrom}_patched_v2.tsv.gz"
output_file="${file%.gz}_rsid_added.gz"
echo " Processing file: $file (Chromosome $chrom)" | tee -a $log_file
# Check if the chromosome number was extracted and the RSID map file exists
if [[ -n "$chrom" && -f "$map_file" ]]; then
echo " Using RSID map: $map_file" | tee -a $log_file
# Remove existing rsid_added file to avoid overwrite prompts
rm -f "$output_file"
# Join the original file with the RSID map based on the ID column
gunzip -c "$file" | \
awk 'NR==FNR{a[$1]=$4; next} $3 in a {$3=a[$3]}1' <(gunzip -c "$map_file") - | \
gzip -c > "$output_file"
# Confirm successful processing
if [[ -s "$output_file" ]]; then
echo " RSID added and output saved to $output_file" | tee -a $log_file
else
echo " Warning: $output_file is empty!" | tee -a $log_file
fi
else
echo " Error: RSID map file for chromosome $chrom not found or chromosome not identified." | tee -a $log_file
fi
done
fi
done
echo "Log ended on $(date)" >> $log_file
echo "Processing complete. Check $log_file for details."
#Run with:
chmod +x add_rsid.sh
nohup ./add_rsid.sh > add_rsid_log.txt 2>&1 &
ps -p 27745 -o stat,etime
Lots of things can go wrong in loops. Good to watch their progress.
find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec ls {} +
find . -type f \( -name '*_rsid_added.gz' -o -name '*_rsid_added_rsid_added.gz' \) -exec rm {} +
Each chemokine subdirectory contains summary statistics by chromosome. They must be merged for LDSC so that there is one summary statistics file per chemokine. (Well, I bet LDSC would let us run things one chromosome at a time, but setting that up sounds more hellish than merging, and merging is cleaner for this purpose.)
#!/bin/bash
# Define base directory
base_dir="/Users/charleenadams/temp_BI/chemokine_rgs_olink"
# Log file for tracking
log_file="$base_dir/merge_rsid_files_log.txt"
echo "Processing started on $(date)" > $log_file
# Find directories to process, excluding specific ones
for dir in "$base_dir"/*/; do
# Skip directories you don’t want to process
if [[ "$dir" == *"maps"* || "$dir" == *"1000G_Phase3_weights_hm3_no_MHC"* || "$dir" == *"ldsc"* || "$dir" == *"rsconnect"* ]]; then
continue
fi
echo "Processing directory: $dir" | tee -a $log_file
# Collect all '_rsid_added.gz' files in this directory
rsid_files=("$dir"*_rsid_added.gz)
# If there are files to merge
if [[ ${#rsid_files[@]} -gt 0 ]]; then
merged_file="${dir%/}_merged_rsid_added.gz"
# Extract header from the first chromosome file
gunzip -c "${rsid_files[0]}" | head -n 1 > temp_header.txt
# Use GNU parallel to process each file without header in parallel
export -f gunzip
export -f tail
export log_file
export merged_file
export temp_header.txt
parallel -j4 'gunzip -c {} | tail -n +2 >> temp_content.txt' ::: "${rsid_files[@]}"
# Concatenate header and content into final merged file
cat temp_header.txt temp_content.txt | gzip -c > "$merged_file"
rm temp_header.txt temp_content.txt
# Confirm if the merged file was created successfully and isn’t empty
if [[ -s "$merged_file" ]]; then
echo " Merged file created: $merged_file" | tee -a $log_file
else
echo " Warning: Merged file is empty for directory $dir" | tee -a $log_file
fi
else
echo " No rsid_added.gz files found in directory $dir" | tee -a $log_file
fi
done
echo "Processing completed on $(date)" >> $log_file
echo "Check $log_file for details."
# Run with:
chmod +x chemokine_chrom_merge.sh
nohup ./chemokine_chrom_merge.sh > chemokine_chrom_merge_log.txt 2>&1 &
I obtained the precalculated LD scores for Europeans within 1000 Genomes Project Phase 3, as provided by the Alkes lab. These files were specifically generated using HapMap3 SNPs within the European subset of 1000 Genomes samples, excluding the MHC region, which is particularly complex in terms of LD structure due to high genetic diversity and extended linkage disequilibrium in that region.
Using these files is standard practice for LDSC analysis focused on European ancestry because they help ensure comparability across studies, minimize confounding by population structure, and leverage the better-characterized LD patterns in this population.
I’m also using a HapMap3 snplist: w_hm3.snplist.
The Olink files have LOG10P. LDSC needs non-transformed p-values.
#!/bin/bash
# Loop through each *merged_rsid_added.gz file
for file in *_merged_rsid_added.gz; do
# Define the output filename
output_file="${file%_merged_rsid_added.gz}_with_pval.gz"
# Use awk to calculate P-value from LOG10P and write to a new gzipped file
gunzip -c "$file" | \
awk 'BEGIN {OFS="\t"} NR==1 {print $0, "P"} NR>1 {pval = (10 ^ -$13); print $0, pval}' | \
gzip > "$output_file"
echo "Processed file saved: $output_file"
done
#Run with:
chmod +x pval.sh
nohup ./pval.sh> pval_log.txt 2>&1 &
LDSC requires that summary statistics be in a certain form. Getting them into that form is something they refer to as “munging.” I tried munging just one file. It worked.
python /Users/charleenadams/temp_BI/chemokine_rgs_olink/ldsc/munge_sumstats.py \
--sumstats CCL13_Q99616_OID20655_v1_Inflammation_with_pval.gz \
--snp rsid \
--a1 ALLELE1 \
--a2 ALLELE0 \
--frq A1FREQ \
--N-col N \
--p P \
--signed-sumstats BETA,0 \
--out CCL13_Q99616_OID20655_v1_Inflammation_with_pval_TEST \
--merge-alleles /Users/charleenadams/temp_BI/chemokine_rgs_olink/w_hm3.snplist
#!/bin/bash
# Define base directory
base_dir="/Users/charleenadams/temp_BI/chemokine_rgs_olink"
output_dir="$base_dir/munged"
# Create the output directory if it doesn't exist
mkdir -p "$output_dir"
# Loop through each file ending with "_with_pval.gz" in the base directory
for file in "$base_dir"/*_with_pval.gz; do
# Extract the base name without the extension
base_name=$(basename "$file" "_with_pval.gz")
# Define the output filename for munging
output_file="$output_dir/${base_name}_munged"
# Perform the munging, specifying the column names explicitly
munge_sumstats.py \
--sumstats "$file" \
--snp rsid \
--a1 ALLELE1 \
--a2 ALLELE0 \
--frq A1FREQ \
--N-col N \
--p P \
--signed-sumstats BETA,0 \
--out "$output_file" \
--merge-alleles /Users/charleenadams/temp_BI/chemokine_rgs_olink/w_hm3.snplist
echo "Munged file created: $output_file"
done
# Run with:
chmod +x munge_loop.sh
nohup ./munge_loop.sh > munge_loop_log.txt 2>&1 &
NB: This works but is slow! Next time we grab more sets of UKB-PPP summary statistics, it will be worth running this step in parallel.
#!/bin/bash
# Define base directory and reference LD scores
base_dir="/Users/charleenadams/temp_BI/chemokine_rgs_olink"
ldscores="$base_dir/1000G_Phase3_weights_hm3_no_MHC" # Adjust path if necessary
# Directory for the munged files
munged_dir="$base_dir/munged"
# Loop over pairs of munged files
for file1 in "$munged_dir"/*.sumstats.gz; do
for file2 in "$munged_dir"/*.sumstats.gz; do
# Avoid redundant calculations
if [[ "$file1" < "$file2" ]]; then
# Extract the filename without path and suffix for clarity in output
base_file1=$(basename "${file1%_munged.sumstats.gz}")
base_file2=$(basename "${file2%_munged.sumstats.gz}")
output_prefix="$munged_dir/${base_file1}_${base_file2}_correlation"
# Count unique SNPs for each file pair
M_count=$(gunzip -c "$file1" "$file2" | awk 'NR > 1 {print $1}' | sort | uniq | wc -l)
# Run LDSC for genetic correlation
./ldsc/ldsc.py \
--rg "$file1","$file2" \
--ref-ld-chr "$ldscores"/weights.hm3_noMHC. \
--w-ld-chr "$ldscores"/weights.hm3_noMHC. \
--out "$output_prefix" \
--not-M-5-50 \
--M "$M_count"
echo "Genetic correlation output created: ${output_prefix}.log"
fi
done
done
# Run with:
chmod +x ldsc_chemokines.sh
nohup ./ldsc_chemokines.sh > ldsc_chemokines_log.txt 2>&1 &
The results are generated as log files and don’t provide the genetic correlations until the end. Create a loop to extract the results from each file and merge them together so that we can look at them.
#!/bin/bash
# Define the output file and write the header only once
output_file="genetic_correlation_master_results.txt"
echo -e "p1\tp2\trg\tse\tz\tp\th2_obs\th2_obs_se\th2_int\th2_int_se\tgcov_int\tgcov_int_se" > "$output_file"
# Loop over each .log file to extract the result line
for log_file in *.log; do
# Extract lines that have .sumstats.gz followed by numeric values
awk '/\.sumstats\.gz/ && /[0-9]+\.[0-9]+/' "$log_file" >> "$output_file"
done
#Run with:
chmod +x extract_rgs_from_correlation_logs.sh
nohup ./extract_rgs_from_correlation_logs.sh > extract_rgs_from_correlation_logs_log.txt 2>&1 &
“P1” and “P2” refer to “trait 1” and “trait 2” in LDSC. The file names for each chemokine were originally used to index the traits. But this is messy to look at. I want to display the results by the chemokine symbol nicely in a heatmap.
#!/bin/bash
# Define input and output files
input_file="genetic_correlation_master_results.txt"
output_file="formatted_genetic_correlation_results.txt"
# Write the header to the output file
head -n 1 "$input_file" > "$output_file"
# Process each line, extracting and formatting the p1 and p2 fields
tail -n +2 "$input_file" | while read -r line; do
# Extract the full paths for p1 and p2
p1=$(echo "$line" | awk '{print $1}')
p2=$(echo "$line" | awk '{print $2}')
# Extract only the chemokine label (e.g., CCL13, CCL14) from each path
p1_label=$(echo "$p1" | sed -E 's|.*/([A-Z]+[0-9]+)_.*|\1|')
p2_label=$(echo "$p2" | sed -E 's|.*/([A-Z]+[0-9]+)_.*|\1|')
# Replace the original paths in the line with the formatted labels
formatted_line=$(echo "$line" | sed -E "s|$p1|$p1_label|; s|$p2|$p2_label|")
# Append the formatted line to the output file
echo "$formatted_line" >> "$output_file"
done
echo "Formatting complete. Output saved to $output_file."
#Run with:
chmod +x format_p1_p2.sh
./format_p1_p2.sh
These results provide estimates of genetic correlation between pairs of chemokine traits (p1 and p2), along with additional information regarding the heritability of p2 and genetic covariance between the traits. Below is a description of each column.
The rg and p columns are key for understanding the degree and significance of genetic overlap between the two traits, while the heritability and intercept values provide additional context about the genetic architecture of p2 and any potential confounding influences affecting the estimates.
library(DT)
datatable(data, #datatable(head(data, n = 10),
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Genetic Correlation (from LDSC) Results')
# Load necessary libraries
library(ggplot2)
library(reshape2)
library(dplyr)
library(tidyr)
setwd("/Users/charleenadams/temp_BI/chemokine_rgs_olink/munged")
# Load the data
data <- read.table("formatted_genetic_correlation_results.txt", header = TRUE, stringsAsFactors = FALSE)
# Ensure symmetry by adding reversed rows
data_symmetric <- bind_rows(data, data %>% rename(p1 = p2, p2 = p1))
# Create the matrix for the rg values
rg_matrix <- data_symmetric %>%
select(p1, p2, rg) %>%
spread(key = p2, value = rg, fill = NA)
# Set row names as p1 and drop the p1 column
rownames(rg_matrix) <- rg_matrix$p1
rg_matrix <- rg_matrix %>% select(-p1)
# Replace NA values with 0 for clustering purposes (optional, adjust as needed)
rg_matrix[is.na(rg_matrix)] <- 0
# Perform hierarchical clustering
dist_matrix <- dist(rg_matrix) # Calculate distance
cluster_rows <- hclust(dist_matrix) # Row clustering
cluster_cols <- hclust(dist(t(rg_matrix))) # Column clustering
# Reorder the matrix based on clustering
rg_matrix_ordered <- rg_matrix[cluster_rows$order, cluster_cols$order]
# Melt the reordered matrix for ggplot
rg_long <- melt(as.matrix(rg_matrix_ordered), na.rm = TRUE)
colnames(rg_long) <- c("p1", "p2", "rg")
# Plot the heatmap with hierarchical clustering
ggplot(rg_long, aes(x = p1, y = p2, fill = rg)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0,
limit = c(-1, 1), space = "Lab", name = "Genetic Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 10, hjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_blank(),
legend.position = "bottom",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8)) +
coord_fixed()
Hierarchical Clustering Heatmap for Genetic Correlation of 33 Olink Chemokines
# Define the number of clusters you'd like to identify (adjust as needed)
num_clusters <- 5 # or any number of clusters you want to cut at
# Cut the dendrograms to get clusters for rows and columns
row_clusters <- cutree(cluster_rows, k = num_clusters)
col_clusters <- cutree(cluster_cols, k = num_clusters)
# Convert clusters to a data frame for rows (chemokines)
row_cluster_df <- data.frame(Chemokine = names(row_clusters), Cluster = row_clusters)
row_cluster_df <- row_cluster_df[order(row_cluster_df$Cluster),]
# Print row clusters in a table format
print("Row Clusters (Chemokines grouped by row):")
## [1] "Row Clusters (Chemokines grouped by row):"
print(row_cluster_df)
## Chemokine Cluster
## CCL13 CCL13 1
## CCL17 CCL17 1
## CCL2 CCL2 1
## CCL24 CCL24 1
## CCL26 CCL26 1
## CCL27 CCL27 1
## CCL4 CCL4 1
## CCL5 CCL5 1
## CCL8 CCL8 1
## CXCL11 CXCL11 1
## CXCL3 CXCL3 1
## CXCL5 CXCL5 1
## CXCL6 CXCL6 1
## CCL14 CCL14 2
## CCL7 CCL7 2
## CCL15 CCL15 3
## CCL19 CCL19 3
## CCL20 CCL20 3
## CCL21 CCL21 3
## CCL23 CCL23 3
## CXCL10 CXCL10 3
## CXCL9 CXCL9 3
## CCL16 CCL16 4
## CCL18 CCL18 4
## CCL22 CCL22 4
## CCL3 CCL3 4
## CXCL13 CXCL13 4
## CCL25 CCL25 5
## CCL28 CCL28 5
## CXCL14 CXCL14 5
## CXCL16 CXCL16 5
## CXCL17 CXCL17 5
## TAFA5 TAFA5 5
# If you want to save it as a CSV file, you can use:
# write.csv(row_cluster_df, "row_clusters.csv", row.names = FALSE)
# Load necessary libraries
library(ggplot2)
library(reshape2)
library(dplyr)
library(tidyr)
setwd("/Users/charleenadams/temp_BI/chemokine_rgs_olink/munged")
# Load the data
data <- read.table("formatted_genetic_correlation_results.txt", header = TRUE, stringsAsFactors = FALSE)
# Ensure symmetry by adding reversed rows
data_symmetric <- bind_rows(data, data %>% rename(p1 = p2, p2 = p1))
# Create the matrix for the rg values
rg_matrix <- data_symmetric %>%
select(p1, p2, rg) %>%
spread(key = p2, value = rg, fill = 0) # Replace NAs with 0 for k-means
# Set row names as p1 and drop the p1 column
rownames(rg_matrix) <- rg_matrix$p1
rg_matrix <- rg_matrix %>% select(-p1)
# Define the number of clusters
set.seed(42) # For reproducibility
k <- 5 # Change this based on desired clusters
# Apply k-means clustering
kmeans_result <- kmeans(rg_matrix, centers = k)
cluster_assignment <- kmeans_result$cluster
# Order rows and columns by cluster
rg_matrix_ordered <- rg_matrix[order(cluster_assignment), order(cluster_assignment)]
# Melt the ordered matrix for ggplot
rg_long <- melt(as.matrix(rg_matrix_ordered), na.rm = TRUE)
colnames(rg_long) <- c("p1", "p2", "rg")
# Plot the heatmap with k-means clustering
ggplot(rg_long, aes(x = p1, y = p2, fill = rg)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0,
limit = c(-1, 1), space = "Lab", name = "Genetic Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 10, hjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_blank(),
legend.position = "bottom",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8)) +
coord_fixed() +
ggtitle(paste("Genetic Correlation Heatmap with K-means Clustering (k =", k, ")"))
K-means Clustering Heatmap for Genetic Correlation of 33 Olink Chemokines
# Load necessary libraries
library(dplyr)
library(tidyr)
# Define number of clusters
k <- 5
# K-means clustering on the correlation matrix
set.seed(42)
kmeans_result <- kmeans(rg_matrix, centers = k)
kmeans_clusters <- as.data.frame(kmeans_result$cluster)
colnames(kmeans_clusters) <- "Kmeans_Cluster"
kmeans_clusters$Chemokine <- rownames(kmeans_clusters)
# Hierarchical clustering on the distance matrix
dist_matrix <- dist(rg_matrix)
hclust_result <- hclust(dist_matrix)
hierarchical_clusters <- cutree(hclust_result, k = k) # Assign to 'k' clusters
hierarchical_clusters <- as.data.frame(hierarchical_clusters)
colnames(hierarchical_clusters) <- "Hierarchical_Cluster"
hierarchical_clusters$Chemokine <- rownames(hierarchical_clusters)
# Combine results into one table for comparison
cluster_comparison <- left_join(kmeans_clusters, hierarchical_clusters, by = "Chemokine")
cluster_comparison <- cluster_comparison %>% arrange(Kmeans_Cluster, Hierarchical_Cluster)
# Display the table
knitr::kable(cluster_comparison, caption = "Comparison of K-means and Hierarchical Clustering Assignments")
| Kmeans_Cluster | Chemokine | Hierarchical_Cluster |
|---|---|---|
| 1 | CCL13 | 1 |
| 1 | CCL17 | 1 |
| 1 | CCL2 | 1 |
| 1 | CCL26 | 1 |
| 1 | CCL5 | 1 |
| 1 | CCL8 | 1 |
| 1 | CXCL11 | 1 |
| 1 | CXCL3 | 1 |
| 1 | CXCL5 | 1 |
| 1 | CXCL6 | 1 |
| 2 | CCL4 | 1 |
| 2 | CCL15 | 3 |
| 2 | CCL19 | 3 |
| 2 | CCL20 | 3 |
| 2 | CCL21 | 3 |
| 2 | CCL23 | 3 |
| 2 | CXCL10 | 3 |
| 2 | CXCL9 | 3 |
| 2 | CCL18 | 4 |
| 2 | CXCL13 | 4 |
| 3 | CCL24 | 1 |
| 3 | CCL27 | 1 |
| 3 | CCL25 | 5 |
| 3 | CCL28 | 5 |
| 3 | CXCL14 | 5 |
| 3 | CXCL16 | 5 |
| 3 | CXCL17 | 5 |
| 3 | TAFA5 | 5 |
| 4 | CCL16 | 4 |
| 4 | CCL3 | 4 |
| 5 | CCL14 | 2 |
| 5 | CCL7 | 2 |
| 5 | CCL22 | 4 |
We analyzed genetic correlations among 33 Olink chemokines using LDSC with UKB-PPP data for European ancestry. LDSC required linkage disequilibrium (LD) scores based on squared correlation coefficients (r²), unlike SuSiE and coloc, which rely on signed correlations (r). Chemokine summary statistics were obtained from Synapse, with SNPs mapped via rsid maps, and were processed to meet LDSC input standards by converting log-transformed p-values to standard p-values. Genetic correlations were estimated for each chemokine pair using precomputed LD scores from the 1000 Genomes European dataset, focusing on HapMap3 SNPs outside the MHC region to avoid genetic diversity confounding. For clustering, we applied both hierarchical and k-means clustering to visualize chemokine relationships, using a five-cluster solution for comparison. Results indicated consistent groupings across methods, highlighting chemokine relationships (mostly positively genetically correlated) and cluster stability across clustering techniques.
In the conversation between us (me, Rob, and Usman) on 10/30/2024, we noted that I had not included some chemokines. I identified some that were missed.
| UKBPPP Protein ID | Olink Target Full Name | Olink ID | UniProt | Assay |
|---|---|---|---|---|
| CXCL8:P10145:OID20153:v1 | Interleukin-8 | OID20153 | P10145 | CXCL8 |
| CXCL8:P10145:OID20631:v1 | Interleukin-8 | OID20631 | P10145 | CXCL8 |
| CXCL8:P10145:OID20997:v1 | Interleukin-8 | OID20997 | P10145 | CXCL8 |
| CXCL8:P10145:OID21430:v1 | Interleukin-8 | OID21430 | P10145 | CXCL8 |
| CXCL1:P09341:OID20762:v1 | Growth-regulated alpha protein | OID20762 | P09341 | CXCL1 |
| CCL11:P51671:OID20668:v1 | Eotaxin | OID20668 | P51671 | CCL11 |
I could fetch these from Synapse and re-run the pipeline to munge these and obtain the pairwise genetic correlations. This would give me the chance to parallelize the genetic correlations script. But before doing this, we might want to think about it more (see NOTE 3).
Note that there are four versions of CXCL8. I’m not sure why Olink has done that, but I’m guessing it has to do with QC. When the genetic correlations are finished, I’ll need to re-name these so that they don’t look like duplicates in the correlation plot.
Some things done with GenomicSEM require a genetic correlation matrix. So, obtaining these now kills two birds with one stone, though I don’t know if LDSC can be done within GenomicSEM or if so whether it would be more efficient to do so. Regardless, some hypotheses need a genetic correlation matrix to answer with GenomicSEM. Usman is particularly interested in GenomicSEM for multivariate analyses.
The new laptop is here! Our Gerzsten drive isn’t mapped to it yet, and I need to get VPN installed, which could take several days. But I could fetch all the UKB-PPP files for EUR from Synapse, if we want. I suspect the compressed EUR files will be about 2-3 TB. After adding the rsids and pvals, which creates new files, and after munging, which also creates new files, we’ll be up to ~6-7TB. (I could perhaps add the rsids and convert from LOG10P to p-values in one script to reduce the times I create new files.) I think with the right parallelizing, it should take 5-7 days to run genetic correlations with the UKB-PPP EUR set of ~2900 protein summary statistics. This is worth it if we want to do lots of different GenomicSEM analyses (e.g., latent analyses, multivariable GWASes, etc). We would not want to have any other data on the new laptop until this is done. If it turns out to be too computationally burdensome on its CPU, then we will have tested the scripts and could move it to AWS.
In addition to noting some chemokines that I hadn’t included, we decided in the conversation on Wednesday that I’d do some data mining to explore the landscape of jointly significant SNPs for the chemokines and that I’d do so at two p-value thresholds: P<5e8 and P<5e11. Below I have extracted the P<5e11 SNPs (and P<5e8 SNPs) from the EUR Olink summary statistics for the 33 chemokines. So far, the document below display the P<5e11 results.
We also spoke about the use of the Sun (UKB-PPP) paper’s supplement to find shared significant SNPs across the chemokines. Looks like they used P<1.7xe-11 (vs what I chose above). Also of note is that the table I used from their supplement (ST9) chose the most significant SNP for a region (based on their dynamic region defintion, I presume).
Rob observed that the strong genetic correlations between some chemokine traits might point to issues with Olink’s assay. In particular, see these for CCL7 and CCL8 and CCL19 and CCL21 pairs.
| p1 | p2 | rg | se | z | p | h2_obs | h2_obs_se | h2_int | h2_int_se | gcov_int | gcov_int_se |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CCL7 | CCL8 | 0.7954 | 0.0582 | 13.6576 | 1.8184e-42 | 0.1771 | 0.0244 | 1.0511 | 0.0436 | 0.4577 | 0.0087 |
| CCL19 | CCL21 | 0.7743 | 0.0654 | 11.8390 | 2.4542e-32 | 0.2744 | 0.0384 | 1.0036 | 0.0098 | 0.3442 | 0 |
It’s worth just stating it explicitly that “genetic correlations” are generated from a process that looks across the genome, accounting for LD, in order to compare the genetic architectures of two traits. Genetic correlations aren’t about finding the most significant hits.
In contrast, the comparative analysis I’m doing below is a manual data dive focused on significant hits and seeing how they compare across the chemokines. This approach isn’t focused solely on cis hits for the chemokines, which will become clearer as you see the results.
This loop below iterates through the original (but rsid annotated and p-value added) EUR Olink chemokine summary statistics by creating two subdirectories for the files with only the SNPs with the desired p-value thresholds: subset_pval_filtered_5e11 and subset_pval_filtered_5e8. Then, within each of the two directories, it creates a file that compares chemokine-by-chemokine for SNPs that are significant at the threshold.
#!/bin/bash
# Define base directory and threshold directories
base_dir="/Users/charleenadams/temp_BI/chemokine_rgs_olink"
threshold_dirs=("subset_pval_filtered_5e8" "subset_pval_filtered_5e11")
# Original file path for extracting the header (just for reference)
original_file="${base_dir}/CCL2_P13500_OID21004_v1_Neurology_with_pval.gz"
# Function to process a directory
process_directory() {
local dir_path="$1"
local output_file="${dir_path}/common_snps_comparison_p_too.txt"
# Assuming columns are always in this order: CHROM, GENPOS, rsid, ..., P (last column)
local CHROM_COL=1
local GENPOS_COL=2
local RSID_COL=3
local P_COL=$(awk 'NR==1{print NF; exit}' "$original_file") # Assuming P is the last column
# Write header to the single output file, including P value columns
echo -e "File1\tFile2\tSNP\tCHROM\tGENPOS\tFile1_P\tFile2_P" > "$output_file"
# Collect all files in the directory
local files=("$dir_path"/*_pval_filtered_*.txt)
# Iterate through each file combination
for ((i=0; i<${#files[@]}; i++)); do
for ((j=i+1; j<${#files[@]}; j++)); do
local ref_file="${files[i]}"
local comp_file="${files[j]}"
ref_name=$(basename "$ref_file" .txt)
comp_name=$(basename "$comp_file" .txt)
# Use awk with here-document to find common SNPs
awk -v p_col=$P_COL '
BEGIN {
OFS="\t"
}
NR==FNR {
snp_list[$3] = $1 "\t" $2 "\t" $(NF) # Store CHROM, GENPOS, and P value (NF is the last field)
next
}
($3 in snp_list) {
split(snp_list[$3], ref, "\t")
print "'"$ref_name"'", "'"$comp_name"'", $3, ref[1], ref[2], ref[3], $(NF)
}' "$ref_file" "$comp_file" >> "$output_file"
done
done
}
# Loop over each threshold directory
for threshold_dir in "${threshold_dirs[@]}"; do
dir_path="${base_dir}/${threshold_dir}"
process_directory "$dir_path"
done
echo "Processing complete for both directories."
# Run with:
chmod +x find_common_sig_chemokine_snps_p_too.sh
nohup ./find_common_sig_chemokine_snps_p_too.sh > find_common_sig_chemokine_snps_p_too_log.txt 2>&1 &
I read the (P<5e11) results into R to look at them. I realized it would be helpful to know the transcription start site (TSS) for each chemokine to ascertain whether the shared significant SNPs are cis or trans in relation to the chemokines. And it occurred to me that I might be able to tackle my pet obessesion about whether the trans SNPs are in enhancers.
This step is so that we can visualize where the TSSes are for our chemokines and think about them in relation to significant SNPs that are trans.
library(data.table)
library(biomaRt)
filtered_5e11 <- fread("/Users/charleenadams/temp_BI/chemokine_rgs_olink/subset_pval_filtered_5e11/common_snps_comparison_p_too.txt")
# Extract characters before the first underscore in File1 and File2
filtered_5e11[, chemokine1 := sub("_.*", "", File1)]
filtered_5e11[, chemokine2 := sub("_.*", "", File2)]
# Reorder columns as specified
filtered_5e11 <- filtered_5e11[, .(chemokine1, chemokine2, SNP, CHROM, GENPOS, File1_P, File2_P, File1, File2)]
# Define the list of unique chemokines from your data
chemokines <- unique(c(filtered_5e11$chemokine1, filtered_5e11$chemokine2))
# Connect to Ensembl's human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Query TSS information for only the main chromosomes (1-22, X, Y, MT)
tss_data <- getBM(
attributes = c("hgnc_symbol", "chromosome_name", "transcription_start_site"),
filters = c("hgnc_symbol", "chromosome_name"),
values = list(chemokines, c(1:22, "X", "Y", "MT")),
mart = ensembl
)
# Convert to data.table if it's not already
if (!is.data.table(tss_data)) {
tss_data <- as.data.table(tss_data)
}
# Rename columns for clarity
setnames(tss_data, c("hgnc_symbol", "chromosome_name", "transcription_start_site"),
c("chemokine", "TSS_chrom", "TSS_pos"))
# Remove duplicate entries in tss_data based on chemokine name
tss_data <- tss_data[!duplicated(tss_data$chemokine)]
# Now proceed with the merge
annotated_data <- merge(filtered_5e11, tss_data, by.x = "chemokine1", by.y = "chemokine", all.x = TRUE)
setnames(annotated_data, "TSS_chrom", "TSS_chrom1")
setnames(annotated_data, "TSS_pos", "TSS_pos1")
# Merge TSS info for chemokine2
annotated_data <- merge(annotated_data, tss_data, by.x = "chemokine2", by.y = "chemokine", all.x = TRUE)
setnames(annotated_data, "TSS_chrom", "TSS_chrom2")
setnames(annotated_data, "TSS_pos", "TSS_pos2")
# Reorder columns if needed
setcolorder(annotated_data, c("chemokine1", "chemokine2", "SNP", "CHROM", "GENPOS", "TSS_chrom1", "TSS_pos1", "TSS_chrom2", "TSS_pos2", "File1_P", "File2_P", "File1", "File2"))
# Extract only the chromosome number from TSS_chrom1 and TSS_chrom2, ensuring only primary chromosomes are kept
annotated_data[, TSS_chrom1 := sub("^.*?(\\d+|X|Y|MT).*", "\\1", TSS_chrom1)]
annotated_data[, TSS_chrom2 := sub("^.*?(\\d+|X|Y|MT).*", "\\1", TSS_chrom2)]
filtered_5e11 <- annotated_data
# Assume your data is in a data.table called `filtered_5e11`
snp_data <- copy(filtered_5e11)
# Connect to the Ensembl regulatory feature dataset
ensembl_reg <- useMart("ENSEMBL_MART_FUNCGEN", dataset = "hsapiens_regulatory_feature", host = "https://www.ensembl.org")
# Retrieve enhancer regions using the correct filter and attributes
enhancer_data <- getBM(
attributes = c("chromosome_name", "chromosome_start", "chromosome_end"),
filters = "regulatory_feature_type_name",
values = "enhancer",
mart = ensembl_reg
)
# Convert enhancer data to a data.table
enhancer_data <- as.data.table(enhancer_data)
setnames(enhancer_data, c("chromosome_name", "chromosome_start", "chromosome_end"), c("CHROM", "start_position", "end_position"))
# Ensure the chromosome columns in both tables are numeric
snp_data[, CHROM := as.integer(CHROM)]
enhancer_data[, CHROM := as.integer(CHROM)]
# Prepare enhancer data for overlap join
setkey(enhancer_data, CHROM, start_position, end_position)
snp_data[, enhancer := "Not in Enhancer"] # Default value
# Create temporary range columns for the SNP position
snp_data[, `:=`(GENPOS_start = GENPOS, GENPOS_end = GENPOS)]
# Use foverlaps to find matches between SNPs and enhancer regions
setkey(snp_data, CHROM, GENPOS_start, GENPOS_end)
overlap_result <- foverlaps(snp_data, enhancer_data, by.x = c("CHROM", "GENPOS_start", "GENPOS_end"), by.y = c("CHROM", "start_position", "end_position"), nomatch = 0)
# Annotate SNPs that fall in enhancer regions
snp_data[overlap_result, enhancer := "In Enhancer", on = .(CHROM, GENPOS_start = GENPOS)]
# Remove temporary columns
snp_data[, `:=`(GENPOS_start = NULL, GENPOS_end = NULL)]
filtered_5e11 <- snp_data
# Reorder columns if needed
setcolorder(filtered_5e11, c("chemokine1", "chemokine2", "SNP", "CHROM", "GENPOS", "TSS_chrom1", "TSS_pos1", "TSS_chrom2", "TSS_pos2", "enhancer", "File1_P", "File2_P", "File1", "File2"))
write.table(filtered_5e11, "/Users/charleenadams/temp_BI/chemokine_rgs_olink/filtered_5e11.txt",
row.names = FALSE, col.names = TRUE, quote = FALSE, sep = '\t')
chemokine1: Identifies the first chemokine in each analyzed pair, represented by its standard abbreviation (e.g., “CCL19” for C-C motif chemokine ligand 19).
chemokine2: Specifies the second chemokine in each pair, represented by its standard abbreviation. Pairing allows for comparative analysis of genomic regions between the two chemokines.
SNP: Refers to the rsID identifier
of shared, significant SNPs between chemokine1 and
chemokine2 (e.g., “rs6679677”). Each SNP was filtered for
statistical significance, representing variants of interest that are
common between the chemokine pairs.
CHROM: The chromosome number on which the shared SNP is located.
GENPOS: The genomic position (GRCh38) of the SNP
in base pairs on the chromosome specified in CHROM. This
precise location allows alignment with transcription start sites (TSS)
or enhancer regions for further analysis.
TSS_chrom1: The chromosome where the
transcription start site (TSS) of chemokine1 is located, as
identified by Ensembl’s human gene dataset
(hsapiens_gene_ensembl). This information helps assess
whether the SNP is on the same chromosome (cis) or a different
chromosome (trans) in relation to chemokine1.
TSS_pos1: The exact position (in base pairs) of
the TSS for chemokine1 on TSS_chrom1,
retrieved from Ensembl. This positional data aids in proximity analysis
between the SNP and chemokine1.
TSS_chrom2: The chromosome where the TSS for
chemokine2 is located, retrieved from Ensembl. This value
allows determination of whether chemokine2 is in a cis or
trans relationship to the SNP.
TSS_pos2: The specific genomic position (in base
pairs) of the TSS for chemokine2 on
TSS_chrom2, aiding in proximity evaluation relative to the
SNP for chemokine2.
enhancer: Indicates whether each SNP falls
within an enhancer region (“In Enhancer”) or outside of one (“Not in
Enhancer”). Enhancer regions were identified through Ensembl’s
regulatory feature dataset (hsapiens_regulatory_feature),
focusing specifically on entries classified as “enhancer.”
File1_P and File2_P: P-values
associated with each SNP for chemokine1 and
chemokine2, respectively. These values quantify the
strength of association of the SNP with each chemokine trait.
File1 and File2: Metadata from Olink describing the specific assay and condition information for each chemokine. This includes assay details, context of measurement, and versioning, providing additional context for each chemokine’s data.
I’m going to provide all the results in a table after giving just the results for CCL2 and CXCL10. I just wanted to give you the data for each of them in case you wanted to explore them individually. In the full data I’ll post below these, you can also key into them, but it’s easier separately. So, that’s why I’m being redundant.
# Filter for rows where chemokine1 is "CCL2"
ccl2_shared <- filtered_5e11[chemokine1 == "CCL2"]
library(DT)
datatable(ccl2_shared, #datatable(head(data, n = 10),
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Commonly Significant SNPs Between CCL2 and 32 Other Chemokines (P<5e11)')
# Filter for rows where chemokine1 is "CXCL10"
CXCL10_shared <- filtered_5e11[chemokine1 == "CXCL10"]
library(DT)
datatable(CXCL10_shared, #datatable(head(data, n = 10),
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Commonly Significant SNPs Between CXCL10 and 32 Other Chemokines (P<5e11)')
library(DT)
datatable(filtered_5e11, #datatable(head(data, n = 10),
options = list(pageLength = 10, autoWidth = TRUE, scrollX = TRUE),
rownames = FALSE,
caption = 'Commonly Significant SNPs Between Chemokines (P<5e11)')