Study Overview

SPTLC3 (Serine Palmitoyltransferase Long Chain Base Subunit 3) encodes a catalytic subunit of serine palmitoyltransferase (SPT), the enzyme responsible for the first and rate-limiting step in de novo sphingolipid biosynthesis.

Prior knockout studies in model organisms have demonstrated that loss of SPTLC3 function alters circulating lipid levels, but the direction and magnitude of effect from a common variant in humans is not pre-specified. This PheWAS is exploratory: we are looking for associations in either direction.

This report summarizes PheWAS meta-analysis across 8 variants on chromosome 20 and 1803 unique disease outcomes (phecodes), combining results from 6 genetic ancestry groups (AFR, AMR, EAS, EUR, MID, SAS). Per-ancestry PheWAS results were generated using PheTK and combined using random-effects meta-analysis (REML) via the metafor package.

Variants Tested

All eight variants fall within SPTLC3 on chromosome 20 (positions ~13.07–13.16 Mb, hg38), spanning exons 3 through 11. The table below summarizes the molecular annotation for each variant. Consequence type is color-coded: missense variants alter the amino acid sequence, synonymous variants do not, intronic variants fall outside coding sequence, and splice region variants may affect mRNA splicing.

variant_metadata |>
  arrange(position) |>
  select(rsid, position, exon, dna_change, protein_change,
         consequence, aou_freq, allele_count) |>
  mutate(aou_freq = round(aou_freq, 3)) |>
  rename(
    rsID             = rsid,
    Position         = position,
    Exon             = exon,
    `DNA Change`     = dna_change,
    `Protein Change` = protein_change,
    Consequence      = consequence,
    `AoU Freq`       = aou_freq,
    `Allele Count`   = allele_count
  ) |>
  kable(format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) |>
  column_spec(
    6,
    color = "white",
    background = case_when(
      variant_metadata |> arrange(position) |> pull(consequence) == "Missense"
        ~ "#e74c3c",
      variant_metadata |> arrange(position) |> pull(consequence) == "Synonymous"
        ~ "#3498db",
      variant_metadata |> arrange(position) |> pull(consequence) == "Splice Region, Intron"
        ~ "#e67e22",
      TRUE ~ "#95a5a6"
    )
  )
rsID Position Exon DNA Change Protein Change Consequence AoU Freq Allele Count
rs243887 13,072,370 3 c.418T>G p.(Leu140Val) Missense 0.769 638,048
rs243888 13,072,387 3 c.435A>G p.(Ser145=) Synonymous 0.656 544,144
rs117004417 13,079,985 4-5 c.607+5488C>T Intron 0.006 5,305
rs77523068 13,080,015 4-5 c.607+5518G>T Intron 0.014 11,706
rs6109692 13,091,168 5 c.693C>T p.(Phe231=) Synonymous 0.095 78,432
rs77696068 13,093,478 5-6 c.733-6G>A Splice Region, Intron 0.016 13,327
rs6078938 13,154,121 10 c.1398T>C p.(Tyr466=) Synonymous 0.160 132,725
rs61738161 13,160,073 11 c.1486G>A p.(Ala496Thr) Missense 0.041 34,414

The chart below shows the ancestry composition of allele carriers for each variant. Variants differ substantially in which ancestry groups carry them — for example, rs77523068 is predominantly carried by AFR individuals (88%), while rs117004417 is predominantly EAS (25%) and EUR (52%).

ancestry_long <- variant_metadata |>
  arrange(position) |>
  select(rsid, AFR, EAS, EUR, AMR, MID, SAS, REM) |>
  pivot_longer(-rsid, names_to = "Ancestry", values_to = "Percent") |>
  mutate(
    rsid    = factor(rsid, levels = variant_metadata |>
                       arrange(position) |> pull(rsid)),
    Ancestry = factor(Ancestry,
                      levels = c("AFR", "AMR", "EAS", "EUR", "MID", "SAS", "REM"))
  )

ancestry_colors <- c(
  AFR = "#E6851E", AMR = "#9B6DB5", EAS = "#1B9E77",
  EUR = "#3498db", MID = "#e74c3c", SAS = "#F39C12", REM = "#95a5a6"
)

plot_ly(
  ancestry_long,
  x       = ~rsid,
  y       = ~Percent,
  color   = ~Ancestry,
  colors  = ancestry_colors,
  type    = "bar",
  text    = ~paste0(Ancestry, ": ", Percent, "%"),
  hoverinfo = "text"
) |>
  layout(
    barmode = "stack",
    title   = list(text = "Ancestry Composition of Allele Carriers by Variant",
                   x = 0),
    xaxis   = list(title = "Variant (rsID)", tickangle = -30),
    yaxis   = list(title = "Percentage of Carriers (%)", range = c(0, 101)),
    legend  = list(title = list(text = "Ancestry"))
  )

Phecode Category Coverage

phecode_counts <- meta_results |>
  distinct(phecode, phecode_category) |>
  count(phecode_category, name = "n_phecodes") |>
  arrange(n_phecodes)

plot_ly(
  phecode_counts,
  x         = ~n_phecodes,
  y         = ~reorder(phecode_category, n_phecodes),
  type      = "bar",
  orientation = "h",
  marker    = list(color = "#3498db"),
  text      = ~n_phecodes,
  textposition = "outside",
  hovertemplate = "%{y}: %{x} phecodes<extra></extra>"
) |>
  layout(
    title  = list(text = "Phecodes Tested per Disease Category", x = 0),
    xaxis  = list(title = "Number of Phecodes"),
    yaxis  = list(title = ""),
    margin = list(l = 140)
  )

Quality Control

Before examining results, we verify that the meta-analysis behaved as expected statistically. The QC section examines p-value distributions, ancestry group coverage, and between-ancestry heterogeneity.

P-value Distribution

Under the null hypothesis, p-values follow a uniform distribution between 0 and 1. The red dashed line marks the expected bin count under a perfect uniform distribution. Inflation (skew toward low p-values) would indicate confounding or population stratification; deflation would suggest overly conservative analysis.

expected_per_bin <- meta_results |>
  count(variant_id) |>
  summarise(avg = mean(n)) |>
  pull(avg) / 50

meta_results |>
  ggplot(aes(x = pval)) +
  geom_histogram(bins = 50, fill = "#3498db", color = "white", linewidth = 0.2) +
  geom_hline(
    yintercept = expected_per_bin,
    linetype   = "dashed", color = "firebrick", linewidth = 0.8
  ) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "P-value", y = "Count",
    title   = "P-value Distribution by Variant",
    caption = "Red dashed line = expected count under uniform distribution"
  ) +
  theme_bw()

P-value distributions are broadly uniform across all eight variants, consistent with the null hypothesis. Slight enrichment of small p-values is expected even under the null when testing ~1,800 phecodes simultaneously.

QQ Plot

A QQ plot compares observed -log10(p) values against what would be expected under a uniform null. Points on the diagonal indicate null-consistent behavior; departure above the line at the upper right indicates signal or inflation.

meta_results |>
  filter(!is.na(pval)) |>
  group_by(variant_id) |>
  arrange(pval, .by_group = TRUE) |>
  mutate(
    expected = -log10(seq_along(pval) / (n() + 1)),
    observed = -log10(pval)
  ) |>
  ungroup() |>
  ggplot(aes(x = expected, y = observed)) +
  geom_point(alpha = 0.4, size = 0.8, color = "#3498db") +
  geom_abline(slope = 1, intercept = 0, color = "firebrick", linewidth = 0.8) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "Expected -log10(p)", y = "Observed -log10(p)",
    title = "QQ Plot by Variant"
  ) +
  theme_bw()

QQ plots track closely along the diagonal for all variants. The modest tail departure is consistent with random sampling variability rather than systematic inflation.

Number of Ancestries Per Meta-Analysis

Each phecode-variant combination was meta-analyzed across whichever ancestry groups had sufficient data. Higher k produces more precise estimates. Meta-analyses with k = 2 are retained but excluded from the top hits table.

meta_results |>
  count(variant_id, n_groups) |>
  ggplot(aes(x = factor(n_groups), y = n, fill = factor(n_groups))) +
  geom_col(show.legend = FALSE) +
  scale_fill_brewer(palette = "Blues") +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "Number of Ancestry Groups (k)", y = "Number of Phecodes",
    title = "Ancestry Group Coverage per Variant"
  ) +
  theme_bw()

For the four largest variants (≥33K carriers), most phecodes were testable across all 6 ancestry groups. Smaller variants show more k = 2–3 results, reflecting insufficient carrier counts in smaller ancestry groups.

Heterogeneity (I²)

I² quantifies the proportion of total variance attributable to true between-ancestry differences rather than sampling error.

  • I² < 25% — Low heterogeneity; consistent effect sizes across ancestries
  • I² 25–50% — Moderate heterogeneity
  • I² > 50% — High heterogeneity; effect sizes differ meaningfully

Only phecode-variant pairs with k ≥ 3 are shown, as I² is unreliable at k = 2.

meta_results |>
  filter(n_groups >= 3) |>
  ggplot(aes(x = i2)) +
  geom_histogram(bins = 40, fill = "#2ecc71", color = "white", linewidth = 0.2) +
  geom_vline(xintercept = c(25, 50, 75), linetype = "dashed",
             color = "firebrick", linewidth = 0.7) +
  geom_text(
    data = tibble(x = c(25, 50, 75), label = c("Low", "Moderate", "High")),
    aes(x = x, y = Inf, label = label),
    vjust = 2, hjust = -0.15, size = 3, color = "firebrick",
    inherit.aes = FALSE
  ) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "I² (%)", y = "Count",
    title   = "Heterogeneity Distribution by Variant",
    caption = "Restricted to meta-analyses with k ≥ 3 ancestry groups"
  ) +
  theme_bw()

The large spike at I² = 0% reflects the REML boundary behavior described below.

REML Boundary Cases

When the data cannot support a positive τ² (between-study variance), REML sets τ² = 0, collapsing the model to fixed effects. Approximately 50% of meta-analyses hit this boundary across every variant, consistent with a largely null study where effect sizes are small and k is low.

boundary_df <- meta_results |>
  filter(n_groups >= 3) |>
  group_by(variant_id) |>
  summarise(
    n_total    = n(),
    n_boundary = sum(tau2_boundary, na.rm = TRUE),
    pct        = round(100 * n_boundary / n_total, 1),
    .groups    = "drop"
  ) |>
  left_join(select(variant_metadata, variant_id, rsid), by = "variant_id") |>
  arrange(pct)

plot_ly(
  boundary_df,
  x    = ~pct,
  y    = ~reorder(rsid, pct),
  type = "bar",
  orientation = "h",
  marker      = list(color = "#7570B3"),
  text        = ~paste0(pct, "%  (", n_boundary, " / ", n_total, ")"),
  textposition = "outside",
  hovertemplate = "%{y}<br>Boundary: %{x}%<extra></extra>"
) |>
  layout(
    title  = list(text = "REML τ² = 0 Boundary Rate by Variant (k ≥ 3)", x = 0),
    xaxis  = list(title = "% of Meta-Analyses at Boundary", range = c(0, 75)),
    yaxis  = list(title = ""),
    margin = list(l = 110)
  )

Results

Significance Thresholds

Because we are testing 1803 phecodes simultaneously across 8 variants, multiple testing correction is essential. Three thresholds are applied:

  • Bonferroni divides α = 0.05 by the number of phecodes tested, assuming independence. This is conservative because phecodes are correlated.
  • FDR (Benjamini-Hochberg) controls the expected proportion of false discoveries. More appropriate for correlated phecode structure.
  • Suggestive (p < 1×10⁻³) is a commonly used exploratory cutoff in PheWAS.
Threshold Cutoff Rationale
Bonferroni 2.77e-05 0.05 / 1803 unique phecodes
FDR (BH) 0.05 (q-value) Benjamini-Hochberg correction across all tests
Suggestive 1.00e-03 Commonly used exploratory threshold in PheWAS

No associations reached Bonferroni correction or FDR < 0.05 for any variant.

Meta-Analysis Manhattan Plots

Each point represents one phecode, colored by disease category. Hover over any point to see full details including beta, 95% CI, p-value, k, and I². The solid red line marks the Bonferroni threshold; the dashed red line marks the suggestive threshold. Under the null, approximately 2 phecodes per variant are expected above the suggestive threshold by chance alone.

ggplot(
  meta_results |> distinct(phecode_category),
  aes(x = 1, y = 1, color = phecode_category)
) +
  geom_point(alpha = 0, size = 3) +
  scale_color_manual(values = category_palette, name = "Disease Category") +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3, override.aes = list(size = 3, alpha = 1)))

for (variant in unique(meta_results$variant_id)) {
  print(htmltools::tagList(plot_manhattan(variant)))
  cat(get_suggestive_text(variant))
}

Suggestive hits (p < 0.001):

RE_470 — Acute bronchiolitis (Respiratory) Beta (log-OR): -0.555 95% CI: [-0.874, -0.237] P-value: 6.27e-04 k: 3 I²: 0%

No associations above the suggestive threshold.

Suggestive hits (p < 0.001):

GU_586.2 — Cyst of kidney (Genitourinary) Beta (log-OR): 0.391 95% CI: [0.183, 0.599] P-value: 2.25e-04 k: 4 I²: 0%

No associations above the suggestive threshold.

Suggestive hits (p < 0.001):

NS_331.1 — Tension headache (Neurological) Beta (log-OR): -0.133 95% CI: [-0.209, -0.057] P-value: 6.15e-04 k: 4 I²: 0%

MS_713.31 — Pain in shoulder (Muscloskeletal) Beta (log-OR): -0.05 95% CI: [-0.078, -0.021] P-value: 6.65e-04 k: 6 I²: 0%

Suggestive hits (p < 0.001):

CV_416 — Cardiac arrhythmia and conduction disorders (Cardiovascular) Beta (log-OR): -0.133 95% CI: [-0.21, -0.055] P-value: 8.13e-04 k: 5 I²: 0%

Suggestive hits (p < 0.001):

PP_907 — Placental disorders (Pregnancy) Beta (log-OR): -0.227 95% CI: [-0.341, -0.113] P-value: 9.73e-05 k: 3 I²: 0%

Suggestive hits (p < 0.001):

CV_417.2 — Tachycardia (Cardiovascular) Beta (log-OR): 0.11 95% CI: [0.046, 0.174] P-value: 8.01e-04 k: 4 I²: 0%

MB_282.2 — Personal history of nicotine dependence (Mental) Beta (log-OR): 0.083 95% CI: [0.034, 0.132] P-value: 8.67e-04 k: 6 I²: 0%

Per-Ancestry Manhattan Plots

The plots below show the per-ancestry results from PheTK before meta-analysis, allowing you to see whether signals are driven by a specific ancestry group or are consistent across populations. Each row of panels represents one variant; each panel within a row shows one ancestry group. Hover over any point for phecode details including case and control counts.

for (variant in unique(meta_results$variant_id)) {
  print(htmltools::tagList(plot_ancestry_manhattan(variant)))
  cat("\n\n")
}

SPTLC3 Cardiovascular and Lipid Associations

Given SPTLC3’s role in sphingolipid biosynthesis, we specifically examine lipid and cardiovascular phecodes for any association signal regardless of direction. The heatmap below shows the meta-analytic beta (log-OR) for each of these phecodes across all eight variants. An interesting pattern emerges from the data: effect estimates trend predominantly negative for lipid and broad cardiovascular outcomes, suggesting possible protection — but this is an observation from the data, not a pre-specified hypothesis being confirmed.

heatmap_data <- meta_results |>
  filter(phecode %in% hypothesis_phecodes) |>
  left_join(variant_metadata, by = "variant_id") |>
  mutate(
    variant_label = paste0(rsid, "<br>N=", scales::comma(approx_n)),
    variant_label = factor(variant_label,
      levels = paste0(
        variant_metadata$rsid[match(variant_order, variant_metadata$variant_id)],
        "<br>N=",
        scales::comma(variant_metadata$approx_n[
          match(variant_order, variant_metadata$variant_id)])
      )
    ),
    phecode = factor(phecode, levels = rev(phecode_order)),
    tooltip = paste0(
      "<b>", phecode, ": ", phecode_string, "</b><br>",
      "Variant: ", rsid, " (", variant_id, ")<br>",
      "N carriers: ", scales::comma(approx_n), "<br>",
      "Beta (log-OR): ", round(estimate, 3), "<br>",
      "95% CI: [", round(ci_lb, 3), ", ", round(ci_ub, 3), "]<br>",
      "P-value: ", formatC(pval, format = "e", digits = 2), "<br>",
      "k (ancestries): ", n_groups, "<br>",
      "I²: ", round(i2, 1), "%"
    )
  )

plot_ly(
  heatmap_data,
  x = ~variant_label, y = ~phecode, z = ~estimate,
  text = ~tooltip, hoverinfo = "text",
  type = "heatmap",
  colorscale = list(
    c(0, "#2166ac"), c(0.5, "#ffffff"), c(1, "#d6604d")
  ),
  zmin = -0.6, zmax = 0.6,
  colorbar = list(title = "Beta<br>(log-OR)")
) |>
  add_markers(
    data        = filter(heatmap_data, pval < 0.05),
    x           = ~variant_label,
    y           = ~phecode,
    marker      = list(symbol = "asterisk", size = 8, color = "black",
                       line = list(width = 1.5, color = "black")),
    hoverinfo   = "skip",
    showlegend  = FALSE,
    inherit     = FALSE
  ) |>
  layout(
    title = list(
      text = paste0(
        "Effect Direction for Cardiovascular and Lipid Phecodes<br>",
        "<sup>Blue = protective (β < 0) | Red = risk-increasing (β > 0) | ",
        "✶ = nominally significant (p < 0.05)</sup>"
      ),
      x = 0
    ),
    xaxis  = list(title = "Variant (ordered by carrier N)", tickangle = -30),
    yaxis  = list(title = ""),
    margin = list(l = 120, b = 120)
  )

Eight of the fifteen phecodes show a negative direction (β < 0) in ≥75% of variants, including all lipid metabolism phecodes and the broadest cardiovascular outcomes. The notable exception is cerebral atherosclerosis (CV_436.5), which trends risk-increasing in the larger variants with nominally significant positive associations in the 33K and 122K variants. This warrants follow-up.

Top Suggestive Associations

All phecode-variant associations with p < 1.00e-03 and k ≥ 3 ancestry groups. None survive Bonferroni or FDR correction. Use the search box to filter by phenotype, category, or variant. Click any column header to sort.

meta_results |>
  filter(pval < suggestive_thresh, n_groups >= 3) |>
  arrange(pval) |>
  left_join(variant_metadata, by = "variant_id") |>
  select(rsid, variant_id, approx_n, phecode, phecode_string, phecode_category,
         estimate, ci_lb, ci_ub, pval, fdr, n_groups, i2) |>
  mutate(
    across(c(estimate, ci_lb, ci_ub), ~ round(.x, 3)),
    pval = round(pval, 5),
    fdr  = round(fdr, 4),
    i2   = round(i2, 1)
  ) |>
  rename(
    rsID           = rsid,
    Variant        = variant_id,
    `Carriers (N)` = approx_n,
    Phecode        = phecode,
    Phenotype      = phecode_string,
    Category       = phecode_category,
    `Beta (log-OR)`= estimate,
    `CI Lower`     = ci_lb,
    `CI Upper`     = ci_ub,
    `P-value`      = pval,
    FDR            = fdr,
    k              = n_groups,
    `I²`           = i2
  ) |>
  datatable(
    filter   = "top",
    rownames = FALSE,
    options  = list(
      pageLength = 15,
      scrollX    = TRUE,
      columnDefs = list(list(className = "dt-center", targets = "_all"))
    )
  ) |>
  formatStyle(
    "Beta (log-OR)",
    background = styleInterval(0, c("#d6eaf8", "#fadbd8")),
    color      = "black"
  )

Interpretation

Summary

This PheWAS meta-analysis tested 1803 disease outcomes across 8 SPTLC3 variants in up to 389,000 participants from six ancestry groups. The main findings are:

  1. No associations survived multiple testing correction — neither Bonferroni (p < 2.77e-05) nor FDR (q < 0.05) — across any variant or phecode.

  2. An exploratory signal of interest emerges in lipid and cardiovascular phecodes — effect estimates trend negative (toward lower disease odds) in the majority of variants for lipid and broad cardiovascular outcomes. This was not a pre-specified directional hypothesis but is consistent with what would be expected if these variants reduce SPTLC3 activity.

  3. No unexpected off-target associations — the top suggestive hits are scattered across unrelated disease categories with no cross-variant replication, consistent with random chance.

The study provides directional support for the SPTLC3 cardiovascular protection hypothesis but falls short of the statistical power needed for confirmatory evidence.

Why Did the Lipid and Cardiovascular Associations Not Reach Significance?

The most likely explanation is insufficient power for modest effect sizes. Protective associations with common diseases like hyperlipidemia, where the variant effect on lipid levels is real but small (|β| ≈ 0.04–0.05 log-OR), would require substantially larger sample sizes to clear Bonferroni correction across 1,800 phecodes. The consistent protective direction in the larger variants is encouraging but confidence intervals still include zero.

The most likely explanation is insufficient power for modest effect sizes. If these variants have a small effect on lipid levels (|β| ≈ 0.03–0.05 log-OR), detecting that association against a Bonferroni threshold correcting for 1,800 phecodes would require substantially larger sample sizes than the largest variant here (~389K). The nominally significant associations in the largest variants are encouraging but confidence intervals include zero.

Trans-Ethnic Consistency

The predominantly low I² values and ~50% REML boundary rate indicate effect sizes are broadly consistent across ancestry groups. There is no evidence of ancestry-specific effects being averaged away. The directional signal in the hypothesis-relevant phecodes appears consistent across ancestries rather than being driven by a single group.

The Cerebral Atherosclerosis Exception

CV_436.5 (Cerebral atherosclerosis) diverges from the otherwise protective pattern, showing nominally significant positive (risk-increasing) associations in the 33K and 122K variants. Cerebrovascular and coronary atherosclerosis have distinct pathophysiology and different genetic architectures in prior GWAS, making this divergence plausible. However, given the small case counts and lack of replication in the larger variants, this should be treated as hypothesis-generating.

Limitations

Power. The Bonferroni threshold of p < 2.77e-05 is stringent for detecting modest protective effects (|β| ≈ 0.03–0.05 log-OR).

EHR phenotyping. Diagnostic codes underascertain disease and may misclassify treated patients, attenuating associations with conditions like hyperlipidemia.

Small k. Smaller variants had many phecode-variant pairs testable in only 2–3 ancestry groups, producing unstable heterogeneity estimates.

Multiple testing burden. Testing 14,424 total associations means the suggestive hits carry a non-trivial false positive rate and require replication before interpretation.


Report generated with R 4.4.0 · metafor 5.0.1 · tidyverse 2.0.0 · plotly 4.10.4