Study Overview

SPTLC3 (Serine Palmitoyltransferase Long Chain Base Subunit 3) encodes a catalytic subunit of serine palmitoyltransferase (SPT), the enzyme responsible for the first and rate-limiting step in de novo sphingolipid biosynthesis.

Prior knockout studies in model organisms have shown that loss of SPTLC3 function alters circulating lipid levels, but the direction and magnitude of effect for common variants in humans are not pre-specified. This PheWAS is therefore exploratory: we test for associations in either direction.

This report summarizes a PheWAS meta-analysis of 8 variants on chromosome 20 against 1803 unique disease outcomes (phecodes), combining results from 6 genetic ancestry groups (AFR, AMR, EAS, EUR, MID, SAS). Per-ancestry PheWAS results were generated with PheTK and combined by random-effects meta-analysis (REML estimation) using the metafor package.
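
As a rough illustration of the combining step, the sketch below fits one random-effects model for a single phecode-variant pair. The per-ancestry estimates are hypothetical (they are not study results), the effect sizes are assumed to be log odds ratios, and the output column names other than pval, n_groups, and i2 are our own choices for illustration rather than the pipeline's actual schema.

```r
library(metafor)

# Hypothetical per-ancestry estimates for one phecode-variant pair
# (assumed log odds ratios; illustrative values, not study results)
per_ancestry <- data.frame(
  ancestry = c("AFR", "AMR", "EAS", "EUR"),
  beta     = c(0.12, 0.05, 0.20, 0.08),
  se       = c(0.06, 0.09, 0.15, 0.03)
)

# Random-effects model with tau^2 estimated by REML
fit <- rma(yi = beta, sei = se, data = per_ancestry, method = "REML")

# Collect the fields a per-pair summary row would need
meta_row <- data.frame(
  beta     = as.numeric(fit$beta),
  se       = fit$se,
  ci_lower = fit$ci.lb,
  ci_upper = fit$ci.ub,
  pval     = fit$pval,
  n_groups = fit$k,
  tau2     = fit$tau2,
  i2       = fit$I2
)
print(meta_row)
```

Running a fit like this once per phecode-variant pair and stacking the rows would give a table shaped like the meta_results object used in the rest of this report.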

Variants Tested

All eight variants fall within SPTLC3 on chromosome 20 (positions ~13.07–13.16 Mb, hg38), spanning exons 3 through 11. The table below summarizes the molecular annotation for each variant. Consequence type is color-coded: missense variants alter the amino acid sequence, synonymous variants do not, intronic variants fall outside coding sequence, and splice region variants may affect mRNA splicing.

# Sort once so table rows and per-row colors stay aligned
variant_tbl <- variant_metadata |> arrange(position)

# Background color for the Consequence column, one value per row
consequence_bg <- case_when(
  variant_tbl$consequence == "Missense"              ~ "#e74c3c",
  variant_tbl$consequence == "Synonymous"            ~ "#3498db",
  variant_tbl$consequence == "Splice Region, Intron" ~ "#e67e22",
  TRUE                                               ~ "#95a5a6"
)

variant_tbl |>
  select(rsid, position, exon, dna_change, protein_change,
         consequence, aou_freq, allele_count) |>
  mutate(aou_freq = round(aou_freq, 3)) |>
  rename(
    rsID             = rsid,
    Position         = position,
    Exon             = exon,
    `DNA Change`     = dna_change,
    `Protein Change` = protein_change,
    Consequence      = consequence,
    `AoU Freq`       = aou_freq,
    `Allele Count`   = allele_count
  ) |>
  kable(format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) |>
  column_spec(6, color = "white", background = consequence_bg)

rsID	Position	Exon	DNA Change	Protein Change	Consequence	AoU Freq	Allele Count
rs243887	13,072,370	3	c.418T>G	p.(Leu140Val)	Missense	0.769	638,048
rs243888	13,072,387	3	c.435A>G	p.(Ser145=)	Synonymous	0.656	544,144
rs117004417	13,079,985	4-5	c.607+5488C>T	—	Intron	0.006	5,305
rs77523068	13,080,015	4-5	c.607+5518G>T	—	Intron	0.014	11,706
rs6109692	13,091,168	5	c.693C>T	p.(Phe231=)	Synonymous	0.095	78,432
rs77696068	13,093,478	5-6	c.733-6G>A	—	Splice Region, Intron	0.016	13,327
rs6078938	13,154,121	10	c.1398T>C	p.(Tyr466=)	Synonymous	0.160	132,725
rs61738161	13,160,073	11	c.1486G>A	p.(Ala496Thr)	Missense	0.041	34,414

The chart below shows the ancestry composition of allele carriers for each variant. Variants differ substantially in which ancestry groups carry them: for example, rs77523068 is carried predominantly by AFR individuals (88%), while carriers of rs117004417 are mostly EUR (52%) and EAS (25%).

ancestry_long <- variant_metadata |>
  arrange(position) |>
  select(rsid, AFR, EAS, EUR, AMR, MID, SAS, REM) |>
  pivot_longer(-rsid, names_to = "Ancestry", values_to = "Percent") |>
  mutate(
    rsid    = factor(rsid, levels = variant_metadata |>
                       arrange(position) |> pull(rsid)),
    Ancestry = factor(Ancestry,
                      levels = c("AFR", "AMR", "EAS", "EUR", "MID", "SAS", "REM"))
  )

ancestry_colors <- c(
  AFR = "#E6851E", AMR = "#9B6DB5", EAS = "#1B9E77",
  EUR = "#3498db", MID = "#e74c3c", SAS = "#F39C12", REM = "#95a5a6"
)

plot_ly(
  ancestry_long,
  x       = ~rsid,
  y       = ~Percent,
  color   = ~Ancestry,
  colors  = ancestry_colors,
  type    = "bar",
  text    = ~paste0(Ancestry, ": ", Percent, "%"),
  hoverinfo = "text"
) |>
  layout(
    barmode = "stack",
    title   = list(text = "Ancestry Composition of Allele Carriers by Variant",
                   x = 0),
    xaxis   = list(title = "Variant (rsID)", tickangle = -30),
    yaxis   = list(title = "Percentage of Carriers (%)", range = c(0, 101)),
    legend  = list(title = list(text = "Ancestry"))
  )

Phecode Category Coverage

phecode_counts <- meta_results |>
  distinct(phecode, phecode_category) |>
  count(phecode_category, name = "n_phecodes") |>
  arrange(n_phecodes)

plot_ly(
  phecode_counts,
  x         = ~n_phecodes,
  y         = ~reorder(phecode_category, n_phecodes),
  type      = "bar",
  orientation = "h",
  marker    = list(color = "#3498db"),
  text      = ~n_phecodes,
  textposition = "outside",
  hovertemplate = "%{y}: %{x} phecodes<extra></extra>"
) |>
  layout(
    title  = list(text = "Phecodes Tested per Disease Category", x = 0),
    xaxis  = list(title = "Number of Phecodes"),
    yaxis  = list(title = ""),
    margin = list(l = 140)
  )

Quality Control

Before examining results, we verify that the meta-analysis behaved as expected statistically. The QC section examines p-value distributions, ancestry group coverage, and between-ancestry heterogeneity.

P-value Distribution

Under the null hypothesis, p-values are uniformly distributed between 0 and 1. The red dashed line marks the expected count per bin under a perfectly uniform distribution. Inflation (an excess of low p-values) can indicate confounding or population stratification; deflation suggests an overly conservative analysis.

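# Expected count per bin under uniformity: average number of non-missing
# p-values per variant, spread evenly over 50 bins
expected_per_bin <- meta_results |>
  filter(!is.na(pval)) |>
  count(variant_id) |>
  summarise(avg = mean(n)) |>
  pull(avg) / 50

meta_results |>
  filter(!is.na(pval)) |>
  ggplot(aes(x = pval)) +
  # binwidth + boundary give 50 equal-width bins on [0, 1], matching the / 50 above
  geom_histogram(binwidth = 0.02, boundary = 0, fill = "#3498db", color = "white", linewidth = 0.2) +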
  geom_hline(
    yintercept = expected_per_bin,
    linetype   = "dashed", color = "firebrick", linewidth = 0.8
  ) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "P-value", y = "Count",
    title   = "P-value Distribution by Variant",
    caption = "Red dashed line = expected count under uniform distribution"
  ) +
  theme_bw()

P-value distributions are broadly uniform across all eight variants, consistent with the null hypothesis. Some small p-values are expected by chance even under the null: with ~1,800 phecodes per variant, roughly 90 tests (5%) fall below p = 0.05 without any true association.

QQ Plot

A QQ plot compares observed -log10(p) values against what would be expected under a uniform null. Points on the diagonal indicate null-consistent behavior; departure above the line at the upper right indicates signal or inflation.

meta_results |>
  filter(!is.na(pval)) |>
  group_by(variant_id) |>
  arrange(pval, .by_group = TRUE) |>
  mutate(
    expected = -log10(seq_along(pval) / (n() + 1)),
    observed = -log10(pval)
  ) |>
  ungroup() |>
  ggplot(aes(x = expected, y = observed)) +
  geom_point(alpha = 0.4, size = 0.8, color = "#3498db") +
  geom_abline(slope = 1, intercept = 0, color = "firebrick", linewidth = 0.8) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "Expected -log10(p)", y = "Observed -log10(p)",
    title = "QQ Plot by Variant"
  ) +
  theme_bw()

QQ plots track closely along the diagonal for all variants. The modest tail departure is consistent with random sampling variability rather than systematic inflation.
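
A single-number companion to the QQ plot is the genomic inflation factor λ, the ratio of the median observed 1-df chi-square statistic to its median under the null. The sketch below is not part of this report's pipeline; the function name inflation_factor is hypothetical and the p-values are simulated stand-ins for one variant's phecode tests.

```r
# Genomic inflation factor from a vector of p-values:
# median observed 1-df chi-square statistic divided by its null median
inflation_factor <- function(pval) {
  pval  <- pval[!is.na(pval)]
  chisq <- qchisq(pval, df = 1, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = 1)
}

# Simulated null p-values standing in for one variant's 1803 phecode tests
set.seed(42)
null_p      <- runif(1803)
lambda_null <- inflation_factor(null_p)
lambda_null
```

Values near 1 agree with a QQ plot that tracks the diagonal; values well above 1 would point to systematic inflation. Applied to the real data, the function could be run per variant with group_by(variant_id) and summarise().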

Number of Ancestries Per Meta-Analysis

Each phecode-variant combination was meta-analyzed across whichever ancestry groups had sufficient data; k denotes the number of groups contributing. Higher k generally yields more precise estimates. Meta-analyses with k = 2 are retained but excluded from the top hits table.

meta_results |>
  count(variant_id, n_groups) |>
  ggplot(aes(x = factor(n_groups), y = n, fill = factor(n_groups))) +
  geom_col(show.legend = FALSE) +
  scale_fill_brewer(palette = "Blues") +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "Number of Ancestry Groups (k)", y = "Number of Phecodes",
    title = "Ancestry Group Coverage per Variant"
  ) +
  theme_bw()

For the most common variants (≥33K carriers), most phecodes were testable in all 6 ancestry groups. Rarer variants show more k = 2–3 results, reflecting insufficient carrier counts in the smaller ancestry groups.

Heterogeneity (I²)

I² quantifies the proportion of total variance attributable to true between-ancestry differences rather than sampling error.

I² < 25% — Low heterogeneity; consistent effect sizes across ancestries
I² 25–50% — Moderate heterogeneity
I² > 50% — High heterogeneity; effect sizes differ meaningfully

Only phecode-variant pairs with k ≥ 3 are shown, as I² is unreliable at k = 2.
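
To make the definition concrete, the sketch below computes the classic Higgins-Thompson I² from Cochran's Q using inverse-variance weights. This is an illustration only: the function name and inputs are hypothetical, and the I² that metafor reports for a REML fit is derived from the estimated τ², so it will not always match this Q-based value.

```r
# Higgins-Thompson I^2 (in percent) from Cochran's Q
i2_from_q <- function(beta, se) {
  w      <- 1 / se^2                      # inverse-variance weights
  pooled <- sum(w * beta) / sum(w)        # fixed-effect pooled estimate
  q      <- sum(w * (beta - pooled)^2)    # Cochran's Q
  df     <- length(beta) - 1
  if (q <= 0) return(0)
  max(0, (q - df) / q) * 100              # truncated at 0
}

# Hypothetical estimates from three ancestry groups
consistent <- i2_from_q(beta = c(0.10, 0.11, 0.09), se = c(0.05, 0.06, 0.08))
divergent  <- i2_from_q(beta = c(0.30, -0.20, 0.05), se = c(0.05, 0.06, 0.08))
c(consistent = consistent, divergent = divergent)
```

The first set of estimates agrees more closely than sampling error alone would predict, so I² is truncated to 0; the second set disagrees in sign and size and lands well above the 50% mark.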

meta_results |>
  filter(n_groups >= 3) |>
  ggplot(aes(x = i2)) +
  geom_histogram(bins = 40, fill = "#2ecc71", color = "white", linewidth = 0.2) +
  # Dashed lines at the band edges used in the text: low < 25, moderate 25-50, high > 50
  geom_vline(xintercept = c(25, 50), linetype = "dashed",
             color = "firebrick", linewidth = 0.7) +
  geom_text(
    # Each label sits just right of the lower edge of its band
    data = tibble(x = c(0, 25, 50), label = c("Low", "Moderate", "High")),
    aes(x = x, y = Inf, label = label),
    vjust = 2, hjust = -0.15, size = 3, color = "firebrick",
    inherit.aes = FALSE
  ) +
  facet_wrap(~ variant_id, nrow = 2) +
  labs(
    x = "I² (%)", y = "Count",
    title   = "Heterogeneity Distribution by Variant",
    caption = "Restricted to meta-analyses with k ≥ 3 ancestry groups"
  ) +
  theme_bw()

The large spike at I² = 0% reflects the REML boundary behavior described below.

REML Boundary Cases

When the data cannot support a positive τ² (between-study variance, here the variance between ancestry groups), REML sets τ² = 0 and the model reduces to a fixed-effect model. Approximately 50% of meta-analyses hit this boundary for every variant, consistent with a largely null study in which effect sizes are small and k is low.
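
The boundary behavior can be reproduced with a small sketch. The estimates below are hypothetical and chosen to agree more closely than their standard errors would predict; the 1e-8 tolerance used to flag the boundary is our assumption, since this report does not state how the tau2_boundary column was derived.

```r
library(metafor)

# Hypothetical estimates with less spread than sampling error alone predicts,
# so there is no excess variance to attribute to tau^2
beta <- c(0.10, 0.11, 0.09, 0.10)
se   <- c(0.05, 0.06, 0.08, 0.07)

re_fit <- rma(yi = beta, sei = se, method = "REML")  # random effects
fe_fit <- rma(yi = beta, sei = se, method = "FE")    # fixed effect

# Flag the boundary with a small tolerance (the tolerance is an assumption)
tau2_boundary <- re_fit$tau2 < 1e-8

re_beta <- as.numeric(re_fit$beta)
fe_beta <- as.numeric(fe_fit$beta)
c(tau2 = re_fit$tau2, re_beta = re_beta, fe_beta = fe_beta)
```

With τ² at zero the random-effects weights equal the inverse-variance weights, so the two pooled estimates coincide.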

boundary_df <- meta_results |>
  filter(n_groups >= 3) |>
  group_by(variant_id) |>
  summarise(
    n_total    = n(),
    n_boundary = sum(tau2_boundary, na.rm = TRUE),
    pct        = round(100 * n_boundary / n_total, 1),
    .groups    = "drop"
  ) |>
  left_join(select(variant_metadata, variant_id, rsid), by = "variant_id") |>
  arrange(pct)

plot_ly(
  boundary_df,
  x    = ~pct,
  y    = ~reorder(rsid, pct),
  type = "bar",
  orientation = "h",
  marker      = list(color = "#7570B3"),
  text        = ~paste0(pct, "%  (", n_boundary, " / ", n_total, ")"),
  textposition = "outside",
  hovertemplate = "%{y}<br>Boundary: %{x}%<extra></extra>"
) |>
  layout(
    title  = list(text = "REML τ² = 0 Boundary Rate by Variant (k ≥ 3)", x = 0),
    xaxis  = list(title = "% of Meta-Analyses at Boundary", range = c(0, 75)),
    yaxis  = list(title = ""),
    margin = list(l = 110)
  )

Results

Significance Thresholds

Because we are testing 1803 phecodes for each of 8 variants, multiple testing correction is essential. Three thresholds are applied:

Bonferroni divides α = 0.05 by the number of phecodes tested, assuming independent tests. It is conservative here because phecodes are correlated.
FDR (Benjamini-Hochberg) controls the expected proportion of false discoveries among the associations called significant. It is less conservative than Bonferroni and better suited to the correlated phecode structure.
Suggestive (p < 1×10⁻³) is a commonly used exploratory cutoff in PheWAS.

Threshold	Cutoff	Rationale
Bonferroni	2.77e-05	0.05 / 1803 unique phecodes
FDR (BH)	0.05 (q-value)	Benjamini-Hochberg correction across all tests
Suggestive	1.00e-03	Commonly used exploratory threshold in PheWAS
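
The three cutoffs in the table can be applied as in the sketch below. The p-values are simulated stand-ins rather than study results, and the tier labels are hypothetical; BH is applied here to one variant's tests for simplicity, whereas the report applies it across all tests.

```r
n_phecodes <- 1803
alpha      <- 0.05

bonferroni_cutoff <- alpha / n_phecodes   # about 2.77e-05
suggestive_cutoff <- 1e-3

# Simulated null p-values standing in for one variant's results
set.seed(1)
pval <- runif(n_phecodes)

# Benjamini-Hochberg adjusted p-values (q-values)
qval <- p.adjust(pval, method = "BH")

# Assign each test to the strictest threshold it passes
tier <- ifelse(pval < bonferroni_cutoff, "Bonferroni",
        ifelse(qval < alpha,             "FDR",
        ifelse(pval < suggestive_cutoff, "Suggestive", "None")))
table(tier)
```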

No association reached Bonferroni significance or FDR q < 0.05 for any variant.

Meta-Analysis Manhattan Plots

Each point represents one phecode, colored by disease category. Hover over any point to see full details including beta, 95% CI, p-value, k, and I². The solid red line marks the Bonferroni threshold; the dashed red line marks the suggestive threshold. Under the null, approximately 2 phecodes per variant (1803 × 0.001 ≈ 1.8) are expected to pass the suggestive threshold by chance alone.
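
The chance expectation for suggestive hits can be worked out directly. The sketch below treats the 1803 tests as independent, which is a simplifying assumption given that phecodes are correlated; the variable names are ours.

```r
n_phecodes        <- 1803
suggestive_cutoff <- 1e-3

# Expected number of suggestive hits per variant under the null
expected_hits <- n_phecodes * suggestive_cutoff   # 1.803

# Probability of seeing at least one suggestive hit by chance
p_at_least_one <- 1 - dbinom(0, size = n_phecodes, prob = suggestive_cutoff)

# Count that chance alone would exceed only about 5% of the time
upper_95 <- qbinom(0.95, size = n_phecodes, prob = suggestive_cutoff)

c(expected_hits = expected_hits, p_at_least_one = p_at_least_one, upper_95 = upper_95)
```

A variant with one or two points above the dashed line is therefore unremarkable on its own; only a count clearly beyond upper_95 would suggest more signal than chance predicts.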

ggplot(
  meta_results |> distinct(phecode_category),
  aes(x = 1, y = 1, color = phecode_category)
) +
  geom_point(alpha = 0, size = 3) +
  scale_color_manual(values = category_palette, name = "Disease Category") +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3, override.aes = list(size = 3, alpha = 1)))

for (variant in unique(meta_results$variant_id)) {
  print(htmltools::tagList(plot_manhattan(variant)))
  cat(get_suggestive_text(variant))
}