SPTLC3 (Serine Palmitoyltransferase Long Chain Base Subunit 3) encodes a catalytic subunit of serine palmitoyltransferase (SPT), the enzyme responsible for the first and rate-limiting step in de novo sphingolipid biosynthesis.
Prior knockout studies in model organisms have demonstrated that loss of SPTLC3 function alters circulating lipid levels, but the direction and magnitude of effect from a common variant in humans is not pre-specified. This PheWAS is exploratory: we are looking for associations in either direction.
This report summarizes PheWAS meta-analysis across 8
variants on chromosome 20 and 1803 unique disease
outcomes (phecodes), combining results from 6 genetic
ancestry groups (AFR, AMR, EAS, EUR, MID, SAS). Per-ancestry
PheWAS results were generated using PheTK and combined
using random-effects meta-analysis (REML) via the metafor
package.
All eight variants fall within SPTLC3 on chromosome 20 (positions ~13.07–13.16 Mb, hg38), spanning exons 3 through 11. The table below summarizes the molecular annotation for each variant. Consequence type is color-coded: missense variants alter the amino acid sequence, synonymous variants do not, intronic variants fall outside coding sequence, and splice region variants may affect mRNA splicing.
variant_metadata |>
arrange(position) |>
select(rsid, position, exon, dna_change, protein_change,
consequence, aou_freq, allele_count) |>
mutate(aou_freq = round(aou_freq, 3)) |>
rename(
rsID = rsid,
Position = position,
Exon = exon,
`DNA Change` = dna_change,
`Protein Change` = protein_change,
Consequence = consequence,
`AoU Freq` = aou_freq,
`Allele Count` = allele_count
) |>
kable(format.args = list(big.mark = ",")) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) |>
column_spec(
6,
color = "white",
background = case_when(
variant_metadata |> arrange(position) |> pull(consequence) == "Missense"
~ "#e74c3c",
variant_metadata |> arrange(position) |> pull(consequence) == "Synonymous"
~ "#3498db",
variant_metadata |> arrange(position) |> pull(consequence) == "Splice Region, Intron"
~ "#e67e22",
TRUE ~ "#95a5a6"
)
)
| rsID | Position | Exon | DNA Change | Protein Change | Consequence | AoU Freq | Allele Count |
|---|---|---|---|---|---|---|---|
| rs243887 | 13,072,370 | 3 | c.418T>G | p.(Leu140Val) | Missense | 0.769 | 638,048 |
| rs243888 | 13,072,387 | 3 | c.435A>G | p.(Ser145=) | Synonymous | 0.656 | 544,144 |
| rs117004417 | 13,079,985 | 4-5 | c.607+5488C>T | — | Intron | 0.006 | 5,305 |
| rs77523068 | 13,080,015 | 4-5 | c.607+5518G>T | — | Intron | 0.014 | 11,706 |
| rs6109692 | 13,091,168 | 5 | c.693C>T | p.(Phe231=) | Synonymous | 0.095 | 78,432 |
| rs77696068 | 13,093,478 | 5-6 | c.733-6G>A | — | Splice Region, Intron | 0.016 | 13,327 |
| rs6078938 | 13,154,121 | 10 | c.1398T>C | p.(Tyr466=) | Synonymous | 0.160 | 132,725 |
| rs61738161 | 13,160,073 | 11 | c.1486G>A | p.(Ala496Thr) | Missense | 0.041 | 34,414 |
The chart below shows the ancestry composition of allele carriers for each variant. Variants differ substantially in which ancestry groups carry them — for example, rs77523068 is predominantly carried by AFR individuals (88%), while rs117004417 is predominantly EAS (25%) and EUR (52%).
ancestry_long <- variant_metadata |>
arrange(position) |>
select(rsid, AFR, EAS, EUR, AMR, MID, SAS, REM) |>
pivot_longer(-rsid, names_to = "Ancestry", values_to = "Percent") |>
mutate(
rsid = factor(rsid, levels = variant_metadata |>
arrange(position) |> pull(rsid)),
Ancestry = factor(Ancestry,
levels = c("AFR", "AMR", "EAS", "EUR", "MID", "SAS", "REM"))
)
ancestry_colors <- c(
AFR = "#E6851E", AMR = "#9B6DB5", EAS = "#1B9E77",
EUR = "#3498db", MID = "#e74c3c", SAS = "#F39C12", REM = "#95a5a6"
)
plot_ly(
ancestry_long,
x = ~rsid,
y = ~Percent,
color = ~Ancestry,
colors = ancestry_colors,
type = "bar",
text = ~paste0(Ancestry, ": ", Percent, "%"),
hoverinfo = "text"
) |>
layout(
barmode = "stack",
title = list(text = "Ancestry Composition of Allele Carriers by Variant",
x = 0),
xaxis = list(title = "Variant (rsID)", tickangle = -30),
yaxis = list(title = "Percentage of Carriers (%)", range = c(0, 101)),
legend = list(title = list(text = "Ancestry"))
)
phecode_counts <- meta_results |>
distinct(phecode, phecode_category) |>
count(phecode_category, name = "n_phecodes") |>
arrange(n_phecodes)
plot_ly(
phecode_counts,
x = ~n_phecodes,
y = ~reorder(phecode_category, n_phecodes),
type = "bar",
orientation = "h",
marker = list(color = "#3498db"),
text = ~n_phecodes,
textposition = "outside",
hovertemplate = "%{y}: %{x} phecodes<extra></extra>"
) |>
layout(
title = list(text = "Phecodes Tested per Disease Category", x = 0),
xaxis = list(title = "Number of Phecodes"),
yaxis = list(title = ""),
margin = list(l = 140)
)
Before examining results, we verify that the meta-analysis behaved as expected statistically. The QC section examines p-value distributions, ancestry group coverage, and between-ancestry heterogeneity.
Under the null hypothesis, p-values follow a uniform distribution between 0 and 1. The red dashed line marks the expected bin count under a perfect uniform distribution. Inflation (skew toward low p-values) would indicate confounding or population stratification; deflation would suggest overly conservative analysis.
expected_per_bin <- meta_results |>
count(variant_id) |>
summarise(avg = mean(n)) |>
pull(avg) / 50
meta_results |>
ggplot(aes(x = pval)) +
geom_histogram(bins = 50, fill = "#3498db", color = "white", linewidth = 0.2) +
geom_hline(
yintercept = expected_per_bin,
linetype = "dashed", color = "firebrick", linewidth = 0.8
) +
facet_wrap(~ variant_id, nrow = 2) +
labs(
x = "P-value", y = "Count",
title = "P-value Distribution by Variant",
caption = "Red dashed line = expected count under uniform distribution"
) +
theme_bw()
P-value distributions are broadly uniform across all eight variants, consistent with the null hypothesis. Slight enrichment of small p-values is expected even under the null when testing ~1,800 phecodes simultaneously.
A QQ plot compares observed -log10(p) values against what would be expected under a uniform null. Points on the diagonal indicate null-consistent behavior; departure above the line at the upper right indicates signal or inflation.
meta_results |>
filter(!is.na(pval)) |>
group_by(variant_id) |>
arrange(pval, .by_group = TRUE) |>
mutate(
expected = -log10(seq_along(pval) / (n() + 1)),
observed = -log10(pval)
) |>
ungroup() |>
ggplot(aes(x = expected, y = observed)) +
geom_point(alpha = 0.4, size = 0.8, color = "#3498db") +
geom_abline(slope = 1, intercept = 0, color = "firebrick", linewidth = 0.8) +
facet_wrap(~ variant_id, nrow = 2) +
labs(
x = "Expected -log10(p)", y = "Observed -log10(p)",
title = "QQ Plot by Variant"
) +
theme_bw()
QQ plots track closely along the diagonal for all variants. The modest tail departure is consistent with random sampling variability rather than systematic inflation.
Each phecode-variant combination was meta-analyzed across whichever ancestry groups had sufficient data. Higher k produces more precise estimates. Meta-analyses with k = 2 are retained but excluded from the top hits table.
meta_results |>
count(variant_id, n_groups) |>
ggplot(aes(x = factor(n_groups), y = n, fill = factor(n_groups))) +
geom_col(show.legend = FALSE) +
scale_fill_brewer(palette = "Blues") +
facet_wrap(~ variant_id, nrow = 2) +
labs(
x = "Number of Ancestry Groups (k)", y = "Number of Phecodes",
title = "Ancestry Group Coverage per Variant"
) +
theme_bw()
For the four largest variants (≥33K carriers), most phecodes were testable across all 6 ancestry groups. Smaller variants show more k = 2–3 results, reflecting insufficient carrier counts in smaller ancestry groups.
I² quantifies the proportion of total variance attributable to true between-ancestry differences rather than sampling error.
Only phecode-variant pairs with k ≥ 3 are shown, as I² is unreliable at k = 2.
meta_results |>
filter(n_groups >= 3) |>
ggplot(aes(x = i2)) +
geom_histogram(bins = 40, fill = "#2ecc71", color = "white", linewidth = 0.2) +
geom_vline(xintercept = c(25, 50, 75), linetype = "dashed",
color = "firebrick", linewidth = 0.7) +
geom_text(
data = tibble(x = c(25, 50, 75), label = c("Low", "Moderate", "High")),
aes(x = x, y = Inf, label = label),
vjust = 2, hjust = -0.15, size = 3, color = "firebrick",
inherit.aes = FALSE
) +
facet_wrap(~ variant_id, nrow = 2) +
labs(
x = "I² (%)", y = "Count",
title = "Heterogeneity Distribution by Variant",
caption = "Restricted to meta-analyses with k ≥ 3 ancestry groups"
) +
theme_bw()
The large spike at I² = 0% reflects the REML boundary behavior described below.
When the data cannot support a positive τ² (between-study variance), REML sets τ² = 0, collapsing the model to fixed effects. Approximately 50% of meta-analyses hit this boundary across every variant, consistent with a largely null study where effect sizes are small and k is low.
boundary_df <- meta_results |>
filter(n_groups >= 3) |>
group_by(variant_id) |>
summarise(
n_total = n(),
n_boundary = sum(tau2_boundary, na.rm = TRUE),
pct = round(100 * n_boundary / n_total, 1),
.groups = "drop"
) |>
left_join(select(variant_metadata, variant_id, rsid), by = "variant_id") |>
arrange(pct)
plot_ly(
boundary_df,
x = ~pct,
y = ~reorder(rsid, pct),
type = "bar",
orientation = "h",
marker = list(color = "#7570B3"),
text = ~paste0(pct, "% (", n_boundary, " / ", n_total, ")"),
textposition = "outside",
hovertemplate = "%{y}<br>Boundary: %{x}%<extra></extra>"
) |>
layout(
title = list(text = "REML τ² = 0 Boundary Rate by Variant (k ≥ 3)", x = 0),
xaxis = list(title = "% of Meta-Analyses at Boundary", range = c(0, 75)),
yaxis = list(title = ""),
margin = list(l = 110)
)
Because we are testing 1803 phecodes simultaneously across 8 variants, multiple testing correction is essential. Three thresholds are applied:
| Threshold | Cutoff | Rationale |
|---|---|---|
| Bonferroni | 2.77e-05 | 0.05 / 1803 unique phecodes |
| FDR (BH) | 0.05 (q-value) | Benjamini-Hochberg correction across all tests |
| Suggestive | 1.00e-03 | Commonly used exploratory threshold in PheWAS |
No associations reached Bonferroni correction or FDR < 0.05 for any variant.
Each point represents one phecode, colored by disease category. Hover over any point to see full details including beta, 95% CI, p-value, k, and I². The solid red line marks the Bonferroni threshold; the dashed red line marks the suggestive threshold. Under the null, approximately 2 phecodes per variant are expected above the suggestive threshold by chance alone.
ggplot(
meta_results |> distinct(phecode_category),
aes(x = 1, y = 1, color = phecode_category)
) +
geom_point(alpha = 0, size = 3) +
scale_color_manual(values = category_palette, name = "Disease Category") +
theme_void() +
theme(legend.position = "bottom") +
guides(color = guide_legend(nrow = 3, override.aes = list(size = 3, alpha = 1)))
for (variant in unique(meta_results$variant_id)) {
print(htmltools::tagList(plot_manhattan(variant)))
cat(get_suggestive_text(variant))
}
Suggestive hits (p < 0.001):
RE_470 — Acute bronchiolitis (Respiratory) Beta (log-OR): -0.555 95% CI: [-0.874, -0.237] P-value: 6.27e-04 k: 3 I²: 0%
No associations above the suggestive threshold.
Suggestive hits (p < 0.001):
GU_586.2 — Cyst of kidney (Genitourinary) Beta (log-OR): 0.391 95% CI: [0.183, 0.599] P-value: 2.25e-04 k: 4 I²: 0%
No associations above the suggestive threshold.
Suggestive hits (p < 0.001):
NS_331.1 — Tension headache (Neurological) Beta (log-OR): -0.133 95% CI: [-0.209, -0.057] P-value: 6.15e-04 k: 4 I²: 0%
MS_713.31 — Pain in shoulder (Muscloskeletal) Beta (log-OR): -0.05 95% CI: [-0.078, -0.021] P-value: 6.65e-04 k: 6 I²: 0%
Suggestive hits (p < 0.001):
CV_416 — Cardiac arrhythmia and conduction disorders (Cardiovascular) Beta (log-OR): -0.133 95% CI: [-0.21, -0.055] P-value: 8.13e-04 k: 5 I²: 0%
Suggestive hits (p < 0.001):
PP_907 — Placental disorders (Pregnancy) Beta (log-OR): -0.227 95% CI: [-0.341, -0.113] P-value: 9.73e-05 k: 3 I²: 0%
Suggestive hits (p < 0.001):
CV_417.2 — Tachycardia (Cardiovascular) Beta (log-OR): 0.11 95% CI: [0.046, 0.174] P-value: 8.01e-04 k: 4 I²: 0%
MB_282.2 — Personal history of nicotine dependence (Mental) Beta (log-OR): 0.083 95% CI: [0.034, 0.132] P-value: 8.67e-04 k: 6 I²: 0%
The plots below show the per-ancestry results from PheTK before meta-analysis, allowing you to see whether signals are driven by a specific ancestry group or are consistent across populations. Each row of panels represents one variant; each panel within a row shows one ancestry group. Hover over any point for phecode details including case and control counts.
for (variant in unique(meta_results$variant_id)) {
print(htmltools::tagList(plot_ancestry_manhattan(variant)))
cat("\n\n")
}
Given SPTLC3’s role in sphingolipid biosynthesis, we specifically examine lipid and cardiovascular phecodes for any association signal regardless of direction. The heatmap below shows the meta-analytic beta (log-OR) for each of these phecodes across all eight variants. An interesting pattern emerges from the data: effect estimates trend predominantly negative for lipid and broad cardiovascular outcomes, suggesting possible protection — but this is an observation from the data, not a pre-specified hypothesis being confirmed.
heatmap_data <- meta_results |>
filter(phecode %in% hypothesis_phecodes) |>
left_join(variant_metadata, by = "variant_id") |>
mutate(
variant_label = paste0(rsid, "<br>N=", scales::comma(approx_n)),
variant_label = factor(variant_label,
levels = paste0(
variant_metadata$rsid[match(variant_order, variant_metadata$variant_id)],
"<br>N=",
scales::comma(variant_metadata$approx_n[
match(variant_order, variant_metadata$variant_id)])
)
),
phecode = factor(phecode, levels = rev(phecode_order)),
tooltip = paste0(
"<b>", phecode, ": ", phecode_string, "</b><br>",
"Variant: ", rsid, " (", variant_id, ")<br>",
"N carriers: ", scales::comma(approx_n), "<br>",
"Beta (log-OR): ", round(estimate, 3), "<br>",
"95% CI: [", round(ci_lb, 3), ", ", round(ci_ub, 3), "]<br>",
"P-value: ", formatC(pval, format = "e", digits = 2), "<br>",
"k (ancestries): ", n_groups, "<br>",
"I²: ", round(i2, 1), "%"
)
)
plot_ly(
heatmap_data,
x = ~variant_label, y = ~phecode, z = ~estimate,
text = ~tooltip, hoverinfo = "text",
type = "heatmap",
colorscale = list(
c(0, "#2166ac"), c(0.5, "#ffffff"), c(1, "#d6604d")
),
zmin = -0.6, zmax = 0.6,
colorbar = list(title = "Beta<br>(log-OR)")
) |>
add_markers(
data = filter(heatmap_data, pval < 0.05),
x = ~variant_label,
y = ~phecode,
marker = list(symbol = "asterisk", size = 8, color = "black",
line = list(width = 1.5, color = "black")),
hoverinfo = "skip",
showlegend = FALSE,
inherit = FALSE
) |>
layout(
title = list(
text = paste0(
"Effect Direction for Cardiovascular and Lipid Phecodes<br>",
"<sup>Blue = protective (β < 0) | Red = risk-increasing (β > 0) | ",
"✶ = nominally significant (p < 0.05)</sup>"
),
x = 0
),
xaxis = list(title = "Variant (ordered by carrier N)", tickangle = -30),
yaxis = list(title = ""),
margin = list(l = 120, b = 120)
)
Eight of the fifteen phecodes show a negative direction (β < 0) in ≥75% of variants, including all lipid metabolism phecodes and the broadest cardiovascular outcomes. The notable exception is cerebral atherosclerosis (CV_436.5), which trends risk-increasing in the larger variants with nominally significant positive associations in the 33K and 122K variants. This warrants follow-up.
All phecode-variant associations with p < 1.00e-03 and k ≥ 3 ancestry groups. None survive Bonferroni or FDR correction. Use the search box to filter by phenotype, category, or variant. Click any column header to sort.
meta_results |>
filter(pval < suggestive_thresh, n_groups >= 3) |>
arrange(pval) |>
left_join(variant_metadata, by = "variant_id") |>
select(rsid, variant_id, approx_n, phecode, phecode_string, phecode_category,
estimate, ci_lb, ci_ub, pval, fdr, n_groups, i2) |>
mutate(
across(c(estimate, ci_lb, ci_ub), ~ round(.x, 3)),
pval = round(pval, 5),
fdr = round(fdr, 4),
i2 = round(i2, 1)
) |>
rename(
rsID = rsid,
Variant = variant_id,
`Carriers (N)` = approx_n,
Phecode = phecode,
Phenotype = phecode_string,
Category = phecode_category,
`Beta (log-OR)`= estimate,
`CI Lower` = ci_lb,
`CI Upper` = ci_ub,
`P-value` = pval,
FDR = fdr,
k = n_groups,
`I²` = i2
) |>
datatable(
filter = "top",
rownames = FALSE,
options = list(
pageLength = 15,
scrollX = TRUE,
columnDefs = list(list(className = "dt-center", targets = "_all"))
)
) |>
formatStyle(
"Beta (log-OR)",
background = styleInterval(0, c("#d6eaf8", "#fadbd8")),
color = "black"
)
This PheWAS meta-analysis tested 1803 disease outcomes across 8 SPTLC3 variants in up to 389,000 participants from six ancestry groups. The main findings are:
No associations survived multiple testing correction — neither Bonferroni (p < 2.77e-05) nor FDR (q < 0.05) — across any variant or phecode.
An exploratory signal of interest emerges in lipid and cardiovascular phecodes — effect estimates trend negative (toward lower disease odds) in the majority of variants for lipid and broad cardiovascular outcomes. This was not a pre-specified directional hypothesis but is consistent with what would be expected if these variants reduce SPTLC3 activity.
No unexpected off-target associations — the top suggestive hits are scattered across unrelated disease categories with no cross-variant replication, consistent with random chance.
The study provides directional support for the SPTLC3 cardiovascular protection hypothesis but falls short of the statistical power needed for confirmatory evidence.
The most likely explanation is insufficient power for modest effect sizes. Protective associations with common diseases like hyperlipidemia, where the variant effect on lipid levels is real but small (|β| ≈ 0.04–0.05 log-OR), would require substantially larger sample sizes to clear Bonferroni correction across 1,800 phecodes. The consistent protective direction in the larger variants is encouraging but confidence intervals still include zero.
The most likely explanation is insufficient power for modest effect sizes. If these variants have a small effect on lipid levels (|β| ≈ 0.03–0.05 log-OR), detecting that association against a Bonferroni threshold correcting for 1,800 phecodes would require substantially larger sample sizes than the largest variant here (~389K). The nominally significant associations in the largest variants are encouraging but confidence intervals include zero.
The predominantly low I² values and ~50% REML boundary rate indicate effect sizes are broadly consistent across ancestry groups. There is no evidence of ancestry-specific effects being averaged away. The directional signal in the hypothesis-relevant phecodes appears consistent across ancestries rather than being driven by a single group.
CV_436.5 (Cerebral atherosclerosis) diverges from the otherwise protective pattern, showing nominally significant positive (risk-increasing) associations in the 33K and 122K variants. Cerebrovascular and coronary atherosclerosis have distinct pathophysiology and different genetic architectures in prior GWAS, making this divergence plausible. However, given the small case counts and lack of replication in the larger variants, this should be treated as hypothesis-generating.
Power. The Bonferroni threshold of p < 2.77e-05 is stringent for detecting modest protective effects (|β| ≈ 0.03–0.05 log-OR).
EHR phenotyping. Diagnostic codes underascertain disease and may misclassify treated patients, attenuating associations with conditions like hyperlipidemia.
Small k. Smaller variants had many phecode-variant pairs testable in only 2–3 ancestry groups, producing unstable heterogeneity estimates.
Multiple testing burden. Testing 14,424 total associations means the suggestive hits carry a non-trivial false positive rate and require replication before interpretation.
Report generated with R 4.4.0 · metafor 5.0.1 · tidyverse 2.0.0 · plotly 4.10.4