sumstats: CXCL3, CCL4, CCL27 (userGWAS)
The analysis integrates Genomic Structural Equation Modeling (GenomicSEM) with multivariate summary statistics of three chemokines—CXCL3, CCL4, and CCL27—identified as moderately genetically correlated through linkage disequilibrium score regression (LDSC). By leveraging multivariate methods, the study highlights the advantages of capturing shared genetic architecture across traits while reducing noise compared to traditional univariate approaches.
Key findings include the identification of two genomic loci, on chromosomes 2 and 7, containing genome-wide significant variants (e.g., rs1260326 in GCKR and rs7779873 in CD36) with potential implications in metabolic pathways. Functional annotation with CADD (PHRED-scaled scores) and FUMA supports the biological relevance of these loci.
The value of GenomicSEM lies in its ability to provide a robust framework for testing the multivariate genetic architecture of traits, enhancing statistical power and uncovering pleiotropic effects that may remain undetected in pairwise or univariate analyses. By integrating functional annotations and mapping tools, this approach strengthens the interpretation of genetic findings and their translational potential in complex traits, such as inflammatory and metabolic disorders.
nohup Rscript merged_munged_40.R > merged_munged_40.log 2>&1 &
nohup Rscript add_p_sumstats.R > add_p_sumstats.log 2>&1 &
nohup Rscript sumstats_on_sumstats.R > sumstats_on_sumstats.log 2>&1 &
nohup Rscript ldsc40.R > ldsc40.log 2>&1 &
# Path to the saved plot
plot_path <- "/Users/charleenadams/temp_BI/chemokine_rgs_olink/processed_ukbppp_chemokine_list/munged_40/merged_munged_protein_results/pvals_fix/ldsc_results_rgs/paLDSC_plot.png"
# Include the plot
knitr::include_graphics(plot_path)
Structural Equation Modeling (SEM) is a statistical technique used to model and analyze complex relationships between observed (measured) and unobserved (latent) variables. It combines elements of multiple regression, factor analysis, and path analysis into a single, flexible framework.
Likewise, Genomic Structural Equation Modeling (GenomicSEM) is a statistical framework for fitting SEMs to GWAS summary statistics. It adapts SEM principles to work with genetic covariance matrices, enabling the study of complex relationships among traits, shared genetic architectures, and causal pathways. It uses two matrices, both estimated with multivariable LDSC: the genetic covariance matrix (S) and the sampling covariance matrix of the estimates in S (V).
Modeling:
General SEM Framework:
Parameter Estimation:
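As a rough illustration of what parameter estimation means for a single latent factor with three indicators, the sketch below solves a one-factor model in closed form. The matrix `S` holds hypothetical values chosen for illustration, not the LDSC estimates from this analysis, and the closed-form solution is a textbook shortcut for a just-identified model rather than the weighted estimation that GenomicSEM performs.

```r
# Hypothetical genetic covariance matrix S for three traits
# (illustrative values only; NOT the LDSC estimates from this analysis).
traits <- c("CXCL3", "CCL4", "CCL27")
S <- matrix(c(0.100, 0.040, 0.030,
              0.040, 0.120, 0.035,
              0.030, 0.035, 0.090),
            nrow = 3, byrow = TRUE, dimnames = list(traits, traits))

# Genetic correlations implied by S.
rg <- cov2cor(S)

# One-factor model with the factor variance fixed to 1:
#   off-diagonal: S[i, j] = lambda_i * lambda_j
#   diagonal:     S[i, i] = lambda_i^2 + theta_i
# With three indicators the model is just-identified, so the loadings
# can be solved directly from the three covariances.
lambda <- c(sqrt(S[1, 2] * S[1, 3] / S[2, 3]),
            sqrt(S[1, 2] * S[2, 3] / S[1, 3]),
            sqrt(S[1, 3] * S[2, 3] / S[1, 2]))
names(lambda) <- traits

# Residual (trait-specific) genetic variances.
theta <- diag(S) - lambda^2

# Model-implied covariance matrix; it should reproduce S.
Sigma <- outer(lambda, lambda) + diag(theta)
dimnames(Sigma) <- dimnames(S)

print(round(rg, 3))
print(round(lambda, 3))
print(max(abs(Sigma - S)))
```

Because three indicators give exactly as many covariances as free parameters, the implied matrix matches `S` exactly here; with more traits or added constraints the fit would no longer be perfect, and the precision of each element of S would start to influence the estimates.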
nohup Rscript usergwas_3chemokines.R > usergwas_3chemokines.log 2>&1 &
nohup Rscript /Users/charleenadams/ukbppp/comp_manhattan.R > comp_manhattan.log 2>&1 &
## CXCL3 Manhattan Plot
## CCL4 Manhattan Plot
## CCL27 Manhattan Plot
## Latent Factor Manhattan Plot (no filtering)
## Stringent QC: Labeled Latent Factor Manhattan Plot
Together, Cluster of Differentiation 36 (CD36) and
Glucokinase Regulatory Protein (GCKR) are critical to
understanding the genetic architecture of metabolic disorders. These
genes provide valuable insights for developing therapeutic strategies
targeting:
- Diabetes
- Dyslipidemia
- Cardiovascular Diseases
- Non-alcoholic fatty liver disease (NAFLD)
Definition: Combined Annotation Dependent Depletion (CADD) is a computational tool that scores the deleteriousness (harmfulness) of single nucleotide variants (SNVs) and small insertions/deletions (indels) in the genome.
Purpose: It combines multiple annotations (e.g., conservation, regulatory impact, protein function) into a single score to predict how damaging a variant might be.
Output: A PHRED-scaled score, which ranks a variant's deleteriousness relative to all possible SNVs in the reference genome (a score of 10 corresponds to the top 10%, 20 to the top 1%).
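To make the PHRED scale concrete, the sketch below converts between a PHRED-scaled score and the fraction of variants ranked at least as deleterious, using the standard relation PHRED = -10 * log10(rank / total). The function names are hypothetical helpers written for this note, not part of CADD.

```r
# PHRED-scaled score from the fraction of variants ranked at least as deleterious.
phred_from_fraction <- function(top_fraction) {
  -10 * log10(top_fraction)
}

# Inverse: fraction of variants ranked at least as deleterious for a given score.
fraction_from_phred <- function(phred) {
  10^(-phred / 10)
}

# Top 10%, 1%, and 0.1% map to scores of 10, 20, and 30.
print(phred_from_fraction(c(0.10, 0.01, 0.001)))

# A PHRED of 13.22 (rs1260326) corresponds to roughly the top 5%.
print(fraction_from_phred(13.22))
```

Read this way, a PHRED of 13.22 places a variant in roughly the top 5% of predicted deleteriousness, while scores below 10 fall outside the top 10%.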
| SNP | CHR | BP | MAF | A1 | A2 | est | SE | Pval_Estimate | nearest_gene | Consequence | ConsDetail | GeneName | PHRED | PhastCons | PhyloP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs1728918 | 2 | 27635463 | 0.272366 | A | G | 0.04235485 | 0.006950637 | 1.103550e-09 | PPM1G | UPSTREAM | upstream | PPM1G | 2.458 | 0.060 | 0.283 |
| rs1260326 | 2 | 27730940 | 0.410537 | T | C | 0.04512821 | 0.006290023 | 7.253483e-13 | GCKR | NON_SYNONYMOUS | splice,missense | GCKR | 13.220 | 0.995 | 0.943 |
| rs780094 | 2 | 27741237 | 0.410537 | T | C | 0.04275407 | 0.006290005 | 1.067132e-11 | GCKR | INTRONIC | intron | GCKR | 1.852 | 0.000 | -0.872 |
| rs780093 | 2 | 27742603 | 0.411531 | T | C | 0.04217555 | 0.006287704 | 1.978192e-11 | GCKR | INTRONIC | intron | GCKR | 1.541 | 0.000 | -0.109 |
| rs7779873 | 7 | 80211423 | 0.453280 | G | A | -0.03564340 | 0.006215646 | 9.782121e-09 | CD36 | INTRONIC | intron | CD36 | 5.362 | 0.002 | 0.173 |
| rs6961069 | 7 | 80218961 | 0.419483 | C | T | -0.03646031 | 0.006270293 | 6.071746e-09 | CD36 | INTRONIC | intron | CD36 | 5.150 | 0.000 | 0.382 |
| rs13236689 | 7 | 80236014 | 0.424453 | T | G | -0.03506074 | 0.006260322 | 2.137726e-08 | CD36 | INTRONIC | intron | CD36 | 8.262 | 0.005 | -0.008 |
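The p-values in the table can be sanity-checked from the estimates and standard errors. The sketch below assumes that `Pval_Estimate` is a two-sided Wald p-value (Z = est / SE); that is an assumption about how the column was produced, and the data frame simply re-enters the values from the table above.

```r
# Values copied from the table above.
snps <- data.frame(
  SNP = c("rs1728918", "rs1260326", "rs780094", "rs780093",
          "rs7779873", "rs6961069", "rs13236689"),
  est = c(0.04235485, 0.04512821, 0.04275407, 0.04217555,
          -0.03564340, -0.03646031, -0.03506074),
  SE = c(0.006950637, 0.006290023, 0.006290005, 0.006287704,
         0.006215646, 0.006270293, 0.006260322),
  reported_p = c(1.103550e-09, 7.253483e-13, 1.067132e-11, 1.978192e-11,
                 9.782121e-09, 6.071746e-09, 2.137726e-08)
)

# Wald Z statistic and two-sided p-value (assumed definition of Pval_Estimate).
snps$Z <- snps$est / snps$SE
snps$p <- 2 * pnorm(-abs(snps$Z))

# Conventional genome-wide significance threshold.
snps$genome_wide <- snps$p < 5e-8

print(snps[, c("SNP", "Z", "p", "reported_p", "genome_wide")])
```

If the recomputed and reported p-values agree closely, the assumption about the test holds for these rows; a large mismatch would point to a different test or a post-hoc correction.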
rs1260326 (PHRED: 13.22) is of higher interest because it is a non-synonymous (missense) coding variant in GCKR. Variants within CD36 show low conservation but may play roles in regulatory mechanisms.
PPM1G encodes a serine/threonine phosphatase that is part of the protein phosphatase 2C (PP2C) family. It plays a critical role in:
- Dephosphorylation: Removes phosphate groups from
serine and threonine residues on proteins.
- Cell Cycle Regulation: Supports proper cell cycle
progression.
- Pre-mRNA Splicing: Ensures accurate and efficient
gene expression by regulating pre-mRNA splicing.
- Cell Stress Response: Dephosphorylates proteins
involved in stress adaptation.
- Cancer Biology: Altered expression has been linked to
tumorigenesis.
- Neurological Function: Plays a role in neuronal
signaling and brain activity.
## GTEx Heatmap
FUMA's default for defining independent significant SNPs is PLINK clumping at r2 < 0.6. This threshold is user-defined. I could have specified a stricter independence threshold, but these SNPs are "independent" at r2 < 0.6.
| Genomic Locus | uniqID | rsID | chr | pos |
|---|---|---|---|---|
| 1 | 2:27635463:A:G | rs1728918 | 2 | 27635463 |
| 1 | 2:27730940:C:T | rs1260326 | 2 | 27730940 |
| 2 | 7:80211423:A:G | rs7779873 | 7 | 80211423 |
| 2 | 7:80218961:C:T | rs6961069 | 7 | 80218961 |
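For reference, the r2 statistic behind this independence threshold is the squared correlation between alleles at two SNPs. The sketch below computes it from two-SNP haplotype counts. The function name and the counts are hypothetical, chosen only to illustrate the formula; they are not the haplotype counts for the SNPs in the table.

```r
# r2 between two biallelic SNPs from haplotype counts.
# n_AB, n_Ab, n_aB, n_ab: counts of the four two-SNP haplotypes.
ld_r2 <- function(n_AB, n_Ab, n_aB, n_ab) {
  n <- n_AB + n_Ab + n_aB + n_ab
  p_AB <- n_AB / n              # frequency of the A-B haplotype
  p_A <- (n_AB + n_Ab) / n      # frequency of allele A at SNP 1
  p_B <- (n_AB + n_aB) / n      # frequency of allele B at SNP 2
  D <- p_AB - p_A * p_B         # linkage disequilibrium coefficient
  D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
}

# Hypothetical counts: moderate LD, below the 0.6 threshold.
print(ld_r2(40, 10, 10, 40))

# Hypothetical counts: perfect LD.
print(ld_r2(50, 0, 0, 50))
```

Two SNPs can therefore both pass as "independent" at r2 < 0.6 while still being substantially correlated, which is why checking the actual pairwise value is informative.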
I wanted to know the actual r2 between the SNPs that FUMA identified as "independent" at the r2 < 0.6 threshold, so I used LDpair: