A Comprehensive Burden of Disease Study — Kenya, 2019
Author
Timothy Achala
Published
May 2, 2026
1 Introduction and Conceptual Framework
1.1 Background
The Global Burden of Disease (GBD) framework provides a systematic approach to quantifying health loss across populations. Two foundational metrics, YLL and YLD, sum to a third, the DALY:
| Metric | What it Measures | Components |
|---|---|---|
| YLL | Premature mortality burden | Deaths × remaining life expectancy |
| YLD | Non-fatal health burden | Incident cases × disability weight × duration |
| DALY | Total health burden | YLL + YLD |
One DALY = one lost year of healthy life. It represents the gap between current health status and an ideal situation where everyone lives in full health to an old age.
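The arithmetic behind the three metrics can be sketched in a few lines of base R. The input values below are hypothetical and chosen only to make the sums easy to follow; they are not taken from the Kenya dataset.

```r
# Hypothetical inputs for a single disease stratum (illustration only)
deaths       <- 100    # deaths in the stratum
remaining_le <- 40     # remaining life expectancy at age of death (years)
cases        <- 1000   # incident cases
dw           <- 0.2    # disability weight: 0 = full health, 1 = equivalent to death
duration     <- 5      # average duration of the condition (years)

yll  <- deaths * remaining_le   # years of life lost to premature death
yld  <- cases * dw * duration   # years lived with disability
daly <- yll + yld               # total healthy life-years lost

cat("YLL:", yll, " YLD:", yld, " DALY:", daly, "\n")
```

In this toy stratum, mortality accounts for four fifths of the burden (4,000 of 5,000 DALYs), which is the kind of split the later sections report per disease.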
1.2 Epidemiological Justification
1.2.1 Why YLL Matters
Years of Life Lost penalises deaths at younger ages more heavily than deaths at older ages. A child who dies at age 2 contributes far more YLL than an 80-year-old dying of the same disease. This is crucial for priority-setting because it focuses attention on preventable premature mortality.
1.2.2 Why DALY Is Superior to Simple Death Counts
Death counts ignore:
Non-fatal burden: Depression rarely kills but causes enormous suffering.
Age at death: A death at 25 is more tragic (in terms of life-years) than at 85.
Severity of disability: A 10-year episode of mild anaemia ≠ 10 years of severe HIV.
DALYs address all three limitations simultaneously.
2 Data Loading and Exploration
2.1 Load the Dataset
Code
# ─────────────────────────────────────────────────────────────────
# READ DATA
# The CSV contains 128 rows covering 8 diseases × 8 age groups × 2 sexes
# in Kenya, 2019 (GBD reference year).
# ─────────────────────────────────────────────────────────────────
# Packages used by the code in this report
library(tidyverse)   # readr, dplyr, tidyr, ggplot2, forcats
library(gt)          # formatted tables
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), row_spec()
library(scales)      # comma(), label_comma()
library(glue)        # glue()
library(ggrepel)     # geom_text_repel()

raw_data <- read_csv("burden_of_disease_dataset.csv", show_col_types = FALSE)

# Convert categorical variables to factors with meaningful ordering
data <- raw_data |>
  mutate(
    # Ordered age groups ensure correct axis ordering in plots
    age_group = factor(age_group, levels = c(
      "0-4", "5-14", "15-29", "30-44", "45-59", "60-69", "70-79", "80+"
    )),
    sex = factor(sex, levels = c("Male", "Female")),
    disease_category = factor(disease_category, levels = c(
      "Communicable", "Non-Communicable", "Cardiovascular", "Mental Health", "Injuries"
    )),
    # Flag: does this disease cause direct mortality?
    cause_of_death = cause_of_death == "Yes",
    # Coerce age_at_death to numeric; anything non-numeric becomes NA
    age_at_death = suppressWarnings(as.numeric(age_at_death))
  )

# Quick confirmation
cat("Dimensions:", nrow(data), "rows ×", ncol(data), "columns\n")
Age groups: 0-4, 5-14, 15-29, 30-44, 45-59, 60-69, 70-79, 80+
Interpretation: The dataset captures 8 major disease causes (Ischemic Heart Disease, Stroke, Lower Respiratory Infections, HIV/AIDS, Diabetes Mellitus, Malaria, Depressive Disorders, Road Injuries) across 8 age strata for both sexes in Kenya 2019. The total of 128 records (8 × 8 × 2) allows stratified analysis by age, sex, disease category, and cause type.
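As a quick sanity check on the record count, the expected stratification can be rebuilt in base R. This is only a sketch: it uses the disease and age-group labels listed above and assumes the CSV holds exactly one row per disease, age group, and sex.

```r
# Labels as listed in the text; one row per combination is an assumption
diseases <- c("Ischemic Heart Disease", "Stroke", "Lower Respiratory Infections",
              "HIV/AIDS", "Diabetes Mellitus", "Malaria",
              "Depressive Disorders", "Road Injuries")
age_groups <- c("0-4", "5-14", "15-29", "30-44", "45-59", "60-69", "70-79", "80+")
sexes <- c("Male", "Female")

# Full cross-product of the three stratification variables
grid <- expand.grid(disease_name = diseases,
                    age_group    = age_groups,
                    sex          = sexes,
                    stringsAsFactors = FALSE)

cat("Diseases:", length(diseases),
    " Age groups:", length(age_groups),
    " Sexes:", length(sexes),
    " Expected rows:", nrow(grid), "\n")
```

Comparing `nrow(grid)` with `nrow(data)` after loading would flag missing or duplicated strata.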
2.2 Data Quality Check
Code
# ─────────────────────────────────────────────────────────────────
# MISSING VALUE AUDIT
# age_at_death is legitimately NA for non-fatal diseases
# (e.g., Depressive Disorders where cause_of_death = FALSE)
# ─────────────────────────────────────────────────────────────────
missing_summary <- data |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_n") |>
  filter(Missing_n > 0) |>
  mutate(
    Missing_pct = round(Missing_n / nrow(data) * 100, 1),
    Explanation = case_when(
      Variable == "age_at_death" ~ "Expected: non-fatal diseases have no age at death",
      TRUE ~ "Investigate further"
    )
  )

missing_summary |>
  gt() |>
  tab_header(title = "Missing Data Audit") |>
  cols_label(
    Variable = "Column", Missing_n = "N Missing",
    Missing_pct = "% Missing", Explanation = "Reason"
  ) |>
  tab_style(
    style = cell_fill(color = "#FFF3CD"),
    locations = cells_body(rows = Missing_n > 0)
  )
Missing Data Audit
| Column | N Missing | % Missing | Reason |
|---|---|---|---|
| age_at_death | 2 | 1.6 | Expected: non-fatal diseases have no age at death |
Interpretation: The only missing values appear in age_at_death for Depressive Disorders, which is by design — depression is classified as primarily non-fatal in this dataset, so no age-at-death value applies. No unexpected missingness exists, confirming data integrity.
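YLL is the number of deaths in each age-sex stratum multiplied by the remaining life expectancy at the age of death, summed over strata:
\[\text{YLL} = \sum_{age,sex} D_{age,sex} \times L_{age,sex}\]
where \(D_{age,sex}\) is the number of deaths and \(L_{age,sex}\) is the remaining life expectancy at the age of death according to a standard life table. GBD 2010 adopted a single reference life table with a life expectancy at birth of 86.0 years for both sexes, while the WHO Global Health Estimates use a frontier of about 91.9 years; here we use the unified 86-year frontier.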
The remaining life expectancy for someone dying at age \(a\) is:
\[L(a) = \text{Standard Life Expectancy} - a\]
This means a 2-year-old dying contributes \((86 - 2) = 84\) YLL, while a 75-year-old dying contributes only \((86 - 75) = 11\) YLL.
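The linear rule above is easy to express as a small helper. The function name `remaining_le` is ours, introduced for illustration; the floor at zero mirrors the design decision described in the YLL computation that follows.

```r
STANDARD_LE <- 86  # unified reference frontier used in this report (years)

# Remaining life expectancy under the linear rule L(a) = standard LE - a,
# floored at zero for deaths beyond the frontier.
# `remaining_le` is an illustrative helper, not part of the report's code.
remaining_le <- function(age_at_death, standard_le = STANDARD_LE) {
  pmax(standard_le - age_at_death, 0)
}

ages <- c(2, 75, 90)
print(data.frame(age_at_death = ages, remaining_LE = remaining_le(ages)))
```

A death at age 2 yields 84 years, a death at 75 yields 11, and a death at 90 is floored to 0 rather than producing a negative value.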
3.2 YLL Computation in R
Code
# ─────────────────────────────────────────────────────────────────
# YLL CALCULATION
#
# Formula: YLL = Deaths × (Standard LE - Age at Death)
#
# Key design decisions:
# 1. We use the GBD standard LE of 86.0 years (reference frontier).
# 2. Mid-point of the age group is used as the proxy age at death
#    when age_at_death is missing (only for non-fatal diseases).
# 3. YLL is set to 0 for non-fatal diseases (cause_of_death = FALSE).
# 4. We add a floor: if (standard_LE - age_at_death) < 0, YLL = 0
#    to handle the rare case where age_at_death exceeds the standard.
# ─────────────────────────────────────────────────────────────────
STANDARD_LE <- 86.0  # GBD universal standard life expectancy (years)

yll_data <- data |>
  mutate(
    # ── Mid-point of age group for cases where age_at_death is NA ──
    age_midpoint = (age_group_lower + age_group_upper) / 2,
    # ── Use actual recorded age at death, else fall back to midpoint ──
    effective_age_at_death = coalesce(age_at_death, age_midpoint),
    # ── Remaining life expectancy at age of death ──
    remaining_LE = pmax(STANDARD_LE - effective_age_at_death, 0),
    # ── YLL = Deaths × Remaining LE ──
    # Non-fatal diseases (cause_of_death = FALSE) get YLL = 0
    YLL = if_else(cause_of_death, deaths * remaining_LE, 0),
    # ── YLL rate per 100,000 population (for comparability) ──
    YLL_rate_100k = (YLL / population) * 100000
  )

# Show a sample of the computation
yll_data |>
  filter(deaths > 0) |>
  select(disease_name, age_group, sex, deaths,
         effective_age_at_death, remaining_LE, YLL, YLL_rate_100k) |>
  slice_sample(n = 12) |>
  arrange(disease_name, age_group) |>
  kable(
    caption = "Sample YLL Computations (12 randomly selected rows)",
    col.names = c("Disease", "Age Group", "Sex", "Deaths",
                  "Age at Death", "Remaining LE", "YLL", "YLL/100k"),
    digits = c(0, 0, 0, 0, 1, 1, 0, 1),
    format.args = list(big.mark = ",")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
remaining_LE is the core multiplier. A child dying at age 2 carries 84 remaining years; a person dying at 75 carries only 11. This is the mathematical expression of the social preference for averting premature death.
pmax(..., 0) ensures we never obtain negative YLL, which could theoretically occur if someone dies beyond the 86-year frontier (edge case).
coalesce() gracefully handles diseases with no recorded death age by substituting the age-group midpoint — a standard epidemiological imputation.
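The three rules above (midpoint fallback, zero floor, and zeroing of non-fatal causes) can be traced on a toy table. The sketch below uses base R equivalents of `coalesce()` and `if_else()` so that it runs without dplyr; the rows are hypothetical, and the upper bound of 100 for the open-ended 80+ group is our assumption.

```r
STANDARD_LE <- 86

# Hypothetical strata (not from the dataset)
toy <- data.frame(
  deaths          = c(10, 5, 3, 8),
  age_at_death    = c(2, NA, 90, 40),
  age_group_lower = c(0, 30, 80, 30),
  age_group_upper = c(4, 44, 100, 44),   # 100 is an assumed cap for "80+"
  cause_of_death  = c(TRUE, TRUE, TRUE, FALSE)
)

# Midpoint fallback: base R stand-in for coalesce(age_at_death, age_midpoint)
age_midpoint  <- (toy$age_group_lower + toy$age_group_upper) / 2
effective_age <- ifelse(is.na(toy$age_at_death), age_midpoint, toy$age_at_death)

# Floor remaining life expectancy at zero
toy$remaining_LE <- pmax(STANDARD_LE - effective_age, 0)

# Non-fatal causes contribute no YLL
toy$YLL <- ifelse(toy$cause_of_death, toy$deaths * toy$remaining_LE, 0)

print(toy[, c("deaths", "age_at_death", "remaining_LE", "YLL")])
```

Row 2 falls back to the midpoint of 37, row 3 is floored to zero, and row 4 is zeroed by the non-fatal flag even though deaths are recorded.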
3.3 YLL by Disease and Age Group
Code
# ─────────────────────────────────────────────────────────────────
# AGGREGATE YLL BY DISEASE AND AGE GROUP
# This is the primary YLL summary table for interpretation
# ─────────────────────────────────────────────────────────────────
yll_disease_age <- yll_data |>
  group_by(disease_name, disease_category, age_group) |>
  summarise(
    total_deaths = sum(deaths),
    total_YLL = sum(YLL),
    YLL_rate = sum(YLL) / sum(population) * 100000,
    .groups = "drop"
  )

# Total YLL per disease (both sexes combined)
yll_disease_total <- yll_data |>
  group_by(disease_name, disease_category) |>
  summarise(
    Deaths = sum(deaths),
    YLL = sum(YLL),
    YLL_rate = sum(YLL) / sum(population) * 100000,
    .groups = "drop"
  ) |>
  arrange(desc(YLL))

yll_disease_total |>
  gt() |>
  tab_header(
    title = "Total YLL by Disease Cause",
    subtitle = "Kenya, 2019 — All ages, both sexes combined"
  ) |>
  fmt_number(columns = c(Deaths, YLL), use_seps = TRUE, decimals = 0) |>
  fmt_number(columns = YLL_rate, decimals = 1) |>
  cols_label(
    disease_name = "Disease",
    disease_category = "Category",
    Deaths = "Total Deaths",
    YLL = "Total YLL",
    YLL_rate = "YLL Rate (per 100k)"
  ) |>
  data_color(columns = YLL, palette = "Blues") |>
  tab_footnote(
    footnote = "YLL Rate standardised to population size per disease stratum.",
    locations = cells_column_labels(columns = YLL_rate)
  )
Total YLL by Disease Cause
Kenya, 2019 — All ages, both sexes combined
| Disease | Category | Total Deaths | Total YLL | YLL Rate (per 100k)¹ |
|---|---|---|---|---|
| HIV/AIDS | Communicable | 31,270 | 1,509,929 | 5,392.6 |
| Malaria | Communicable | 15,420 | 1,007,966 | 3,599.9 |
| Lower Respiratory Infections | Communicable | 20,890 | 709,141 | 2,532.6 |
| Stroke | Cardiovascular | 31,883 | 427,099 | 1,525.4 |
| Road Injuries | Injuries | 10,135 | 421,772 | 1,506.3 |
| Ischemic Heart Disease | Cardiovascular | 27,031 | 354,507 | 1,266.1 |
| Diabetes Mellitus | Non-Communicable | 23,942 | 352,611 | 1,259.3 |
| Depressive Disorders | Mental Health | 751 | 0 | 0.0 |

¹ YLL Rate standardised to population size per disease stratum.
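The rate column can be reproduced by hand from the totals in the table. The sketch below assumes the denominator is the 28,000,000 population reported later in the cause-deleted analysis; under that assumption the first three rows match the published rates.

```r
# Assumed denominator: total population reported in the cause-deleted table
total_pop <- 28000000

# Total YLL copied from the table above
yll <- c("HIV/AIDS"                     = 1509929,
         "Malaria"                      = 1007966,
         "Lower Respiratory Infections" = 709141)

# YLL per 100,000 population, rounded as in the table
yll_rate <- round(yll / total_pop * 1e5, 1)
print(yll_rate)
```

Because every disease shares the same denominator here, the ranking by rate is identical to the ranking by total YLL.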
3.4 Visualisation: YLL by Disease and Age
Code
# ─────────────────────────────────────────────────────────────────
# HEATMAP: YLL per disease × age group
# Log scale used because the range is very wide (child deaths create
# very high YLL despite fewer absolute numbers).
# theme_burden() is the report's custom ggplot2 theme; its definition
# is not shown in this section.
# ─────────────────────────────────────────────────────────────────
yll_heat <- yll_data |>
  group_by(disease_name, age_group) |>
  summarise(YLL = sum(YLL), .groups = "drop") |>
  filter(YLL > 0)  # log10 scale cannot display zeros

ggplot(yll_heat, aes(x = age_group,
                     y = fct_reorder(disease_name, YLL, sum),
                     fill = YLL)) +
  geom_tile(colour = "white", linewidth = 0.5) +
  geom_text(
    aes(label = if_else(YLL >= 1000,
                        paste0(round(YLL / 1000, 0), "k"),
                        as.character(round(YLL)))),
    size = 3, colour = "white", fontface = "bold"
  ) +
  scale_fill_viridis_c(
    name = "YLL",
    trans = "log10",
    labels = label_comma(),
    option = "plasma"
  ) +
  labs(
    title = "Years of Life Lost by Disease and Age Group",
    subtitle = "Kenya 2019 — Both sexes combined; colour on log₁₀ scale",
    x = "Age Group", y = NULL,
    caption = "Values ≥ 1,000 shown as 'Xk'. Source: Simulated GBD-style dataset."
  ) +
  theme_burden()
Figure 1: Heatmap of YLL by Disease and Age Group, Kenya 2019
Interpretation of heatmap:
HIV/AIDS in the 30–44 age group is among the highest-value cells (the brightest on the plasma scale), reflecting the catastrophic toll of HIV on working-age adults in sub-Saharan Africa. Deaths at this age carry approximately 48 remaining life years each.
Malaria in children aged 0–4 generates enormous YLL despite lower death counts than cardiovascular diseases, purely because each death removes ~84 years of potential life.
Cardiovascular diseases (IHD, Stroke) peak in the 60–79 age bands. While death counts are high here, YLL per death is low (between 7 and 26 years remaining against the 86-year frontier), explaining their moderate YLL despite high mortality.
Depressive Disorders does not appear in the heatmap: its YLL is set to 0 because it is flagged as non-fatal (cause_of_death = FALSE), and the filter YLL > 0 drops those rows. The few deaths recorded for it are suicide-related and contribute no YLL under this classification.
3.5 YLL by Sex — Population Pyramid Style
Code
# ─────────────────────────────────────────────────────────────────
# DIVERGING BAR CHART (POPULATION PYRAMID STYLE)
# Males plotted left (negative values), females right (positive)
# This makes sex disparities immediately visible
# ─────────────────────────────────────────────────────────────────
yll_sex <- yll_data |>
  group_by(age_group, sex) |>
  summarise(YLL = sum(YLL), .groups = "drop") |>
  mutate(YLL_plot = if_else(sex == "Male", -YLL, YLL))

ggplot(yll_sex, aes(x = YLL_plot, y = age_group, fill = sex)) +
  geom_col(alpha = 0.85, width = 0.75) +
  geom_vline(xintercept = 0, colour = "black", linewidth = 0.8) +
  scale_x_continuous(
    labels = function(x) paste0(comma(abs(x / 1e3)), "k"),
    breaks = pretty(c(-max(abs(yll_sex$YLL_plot)), max(abs(yll_sex$YLL_plot))), 6)
  ) +
  scale_fill_manual(values = c("Male" = "#2166AC", "Female" = "#D6604D")) +
  labs(
    title = "Years of Life Lost by Age Group and Sex",
    subtitle = "Males (left) vs Females (right) — Kenya 2019",
    x = "YLL (thousands)",
    y = "Age Group",
    fill = "Sex",
    caption = "Each bar represents total YLL summed across all 8 diseases."
  ) +
  theme_burden() +
  theme(legend.position = "top")
Figure 2: YLL by Age Group and Sex — Population Pyramid
Interpretation — Sex Disparities in YLL:
Males bear a heavier YLL burden in almost all age groups, particularly in the 15–44 range. This is driven by (a) higher road injury mortality in males, (b) greater cardiovascular event rates in males, and (c) higher occupational HIV exposure in some contexts.
Females aged 15–44 show relatively elevated YLL for their sex, largely due to HIV/AIDS (females account for the majority of new HIV infections in sub-Saharan Africa through heterosexual transmission).
The 60+ age groups show converging YLL between sexes, reflecting greater female longevity but higher female prevalence of stroke and diabetes at older ages.
4 Disability-Adjusted Life Years (DALY) Calculation
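DALYs combine the mortality and disability components:
\[\text{DALY} = \text{YLL} + \text{YLD}\]
The incidence-based YLD is:
\[\text{YLD} = I \times DW \times \text{Duration}\]
where \(I\) = incident cases, \(DW\) = disability weight (0 = full health, 1 = equivalent to death), and Duration = average duration of the condition in years.
Alternatively: \(\text{YLD} = P \times DW\) where \(P\) = prevalent cases (used when incidence data are unavailable). We use the incidence-based approach here. Note that GBD studies since GBD 2010 report prevalence-based YLD, so the figures below are not directly comparable with published GBD 2019 estimates.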
# ─────────────────────────────────────────────────────────────────
# STEP 1: YLD CALCULATION
#
# YLD (incidence-based) = Incident Cases × Disability Weight × Duration
#
# Why incidence-based?
# - Follows the original incidence-based GBD formulation
#   (GBD 2010 onwards reports prevalence-based YLD instead)
# - Avoids double-counting prevalent cases from previous years
# - More sensitive to new episodes and intervention effects
# ─────────────────────────────────────────────────────────────────
daly_data <- yll_data |>
  mutate(
    # YLD: multiply incident cases by DW and average duration
    YLD = incident_cases * disability_weight * duration_years,
    # DALY = YLL + YLD
    DALY = YLL + YLD,
    # Per-capita rates for population-adjusted comparisons
    YLD_rate_100k = (YLD / population) * 100000,
    DALY_rate_100k = (DALY / population) * 100000,
    # Fraction of DALY attributable to YLL vs YLD
    YLL_fraction = if_else(DALY > 0, YLL / DALY, 0),
    YLD_fraction = if_else(DALY > 0, YLD / DALY, 0)
  )

# Verification: print a readable summary for one disease
daly_data |>
  filter(disease_name == "HIV/AIDS", sex == "Female") |>
  select(age_group, deaths, incident_cases, disability_weight,
         duration_years, YLL, YLD, DALY) |>
  kable(
    caption = "DALY Computation Verification: HIV/AIDS — Female",
    col.names = c("Age Group", "Deaths", "Incid. Cases", "DW", "Duration (yr)",
                  "YLL", "YLD", "DALY"),
    digits = c(0, 0, 0, 3, 2, 0, 0, 0),
    format.args = list(big.mark = ",")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
DALY Computation Verification: HIV/AIDS — Female
| Age Group | Deaths | Incid. Cases | DW | Duration (yr) | YLL | YLD | DALY |
|---|---|---|---|---|---|---|---|
| 0-4 | 720 | 1,900 | 0.547 | 4.8 | 60,408 | 4,989 | 65,397 |
| 5-14 | 420 | 1,050 | 0.547 | 7.2 | 31,878 | 4,135 | 36,013 |
| 15-29 | 4,200 | 12,000 | 0.547 | 9.2 | 263,760 | 60,389 | 324,149 |
| 30-44 | 7,200 | 20,000 | 0.547 | 10.5 | 344,160 | 114,870 | 459,030 |
| 45-59 | 2,800 | 8,200 | 0.547 | 8.8 | 92,120 | 39,472 | 131,592 |
| 60-69 | 820 | 2,800 | 0.547 | 5.8 | 17,220 | 8,883 | 26,103 |
| 70-79 | 350 | 1,200 | 0.547 | 3.8 | 3,920 | 2,494 | 6,414 |
| 80+ | 150 | 520 | 0.547 | 2.5 | 285 | 711 | 996 |
Interpretation of HIV/AIDS DALY computation:
In females aged 30–44, HIV generates the highest DALYs because both YLL (high deaths × ~48 remaining years) and YLD (high incidence × DW=0.547 × ~10.5 years duration) are simultaneously maximised.
The long duration parameter for HIV (8–10+ years) substantially amplifies YLD. This is the mathematical signature of a chronic, disabling condition — even with effective ART reducing mortality, the ongoing disability burden remains substantial.
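The 30–44 row of the verification table can be recomputed directly from its inputs. This short check uses only values printed in the table; the variable names are ours.

```r
# Inputs copied from the HIV/AIDS, Female, 30-44 row of the table
deaths   <- 7200
cases    <- 20000
dw       <- 0.547
duration <- 10.5
yll      <- 344160

yld           <- cases * dw * duration   # incidence-based YLD
daly          <- yll + yld               # total burden for the stratum
yll_per_death <- yll / deaths            # implied remaining years per death

cat("YLD:", round(yld), " DALY:", round(daly),
    " YLL per death:", round(yll_per_death, 1), "\n")
```

The implied 47.8 remaining years per death corresponds to an average age at death of about 38 under the 86-year frontier, which sits inside the 30–44 band as expected.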
4.3 Total DALY Summary
Code
# ─────────────────────────────────────────────────────────────────
# AGGREGATE DALY TABLE: by disease, combining both sexes
# ─────────────────────────────────────────────────────────────────
daly_summary <- daly_data |>
  group_by(disease_name, disease_category) |>
  summarise(
    Deaths = sum(deaths),
    YLL = sum(YLL),
    YLD = sum(YLD),
    DALY = sum(DALY),
    DALY_rate = sum(DALY) / sum(population) * 100000,
    YLL_pct = round(sum(YLL) / sum(DALY) * 100, 1),
    YLD_pct = round(sum(YLD) / sum(DALY) * 100, 1),
    .groups = "drop"
  ) |>
  arrange(desc(DALY))

daly_summary |>
  gt() |>
  tab_header(
    title = "DALY Burden by Disease — Kenya 2019",
    subtitle = "All ages, both sexes; sorted by total DALY burden"
  ) |>
  fmt_number(columns = c(Deaths, YLL, YLD, DALY), use_seps = TRUE, decimals = 0) |>
  fmt_number(columns = DALY_rate, decimals = 1) |>
  fmt_number(columns = c(YLL_pct, YLD_pct), decimals = 1) |>
  cols_label(
    disease_name = "Disease",
    disease_category = "Category",
    Deaths = "Deaths",
    YLL = "YLL",
    YLD = "YLD",
    DALY = "Total DALY",
    DALY_rate = "DALY Rate/100k",
    YLL_pct = "YLL %",
    YLD_pct = "YLD %"
  ) |>
  data_color(columns = DALY, palette = "Reds") |>
  tab_footnote(
    "% of total DALY attributable to each component.",
    locations = cells_column_labels(columns = YLL_pct)
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(rows = 1)
  )
DALY Burden by Disease — Kenya 2019
All ages, both sexes; sorted by total DALY burden
| Disease | Category | Deaths | YLL | YLD | Total DALY | DALY Rate/100k | YLL %¹ | YLD % |
|---|---|---|---|---|---|---|---|---|
| HIV/AIDS | Communicable | 31,270 | 1,509,929 | 426,640 | 1,936,569 | 6,916.3 | 78.0 | 22.0 |
| Malaria | Communicable | 15,420 | 1,007,966 | 646 | 1,008,612 | 3,602.2 | 99.9 | 0.1 |
| Lower Respiratory Infections | Communicable | 20,890 | 709,141 | 2,392 | 711,533 | 2,541.2 | 99.7 | 0.3 |
| Stroke | Cardiovascular | 31,883 | 427,099 | 25,495 | 452,594 | 1,616.4 | 94.4 | 5.6 |
| Road Injuries | Injuries | 10,135 | 421,772 | 7,561 | 429,333 | 1,533.3 | 98.2 | 1.8 |
| Diabetes Mellitus | Non-Communicable | 23,942 | 352,611 | 32,542 | 385,153 | 1,375.5 | 91.6 | 8.4 |
| Ischemic Heart Disease | Cardiovascular | 27,031 | 354,507 | 16,794 | 371,301 | 1,326.1 | 95.5 | 4.5 |
| Depressive Disorders | Mental Health | 751 | 0 | 111,408 | 111,408 | 397.9 | 0.0 | 100.0 |

¹ % of total DALY attributable to each component.
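The percentage columns follow directly from the YLL and YLD totals. As a minimal check, the sketch below recomputes them for four of the eight diseases using the figures printed in the table; the data frame name `daly_check` is ours.

```r
# Totals copied from the DALY summary table
daly_check <- data.frame(
  disease = c("HIV/AIDS", "Malaria", "Diabetes Mellitus", "Depressive Disorders"),
  YLL     = c(1509929, 1007966, 352611, 0),
  YLD     = c(426640, 646, 32542, 111408)
)

daly_check$DALY    <- daly_check$YLL + daly_check$YLD
daly_check$YLL_pct <- round(daly_check$YLL / daly_check$DALY * 100, 1)
daly_check$YLD_pct <- round(daly_check$YLD / daly_check$DALY * 100, 1)

print(daly_check)
```

Every fatal condition in this subset is more than three-quarters YLL, and only Depressive Disorders is driven by YLD.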
5 Visualisation and Interpretation
5.1 DALY Composition — YLL vs YLD Stacked Chart
Code
# ─────────────────────────────────────────────────────────────────
# STACKED BAR CHART: Proportion of DALY from YLL vs YLD
# This is the most important diagnostic chart: it tells you
# whether a disease kills (high YLL fraction) or disables (high YLD fraction)
# ─────────────────────────────────────────────────────────────────
daly_long <- daly_summary |>
  select(disease_name, YLL, YLD, DALY) |>
  pivot_longer(cols = c(YLL, YLD), names_to = "Component", values_to = "Value") |>
  mutate(
    Proportion = Value / DALY,
    Component = factor(Component, levels = c("YLD", "YLL"))
  )

ggplot(daly_long,
       aes(x = Value / 1e3,
           y = fct_reorder(disease_name, DALY),
           fill = Component)) +
  geom_col(alpha = 0.88, width = 0.7) +
  geom_text(
    data = daly_summary,
    aes(x = DALY / 1e3 + 20, y = disease_name,
        label = paste0(comma(round(DALY / 1e3, 0)), "k")),
    inherit.aes = FALSE, size = 3.2, hjust = 0, fontface = "bold"
  ) +
  scale_fill_manual(
    values = c("YLL" = "#D73027", "YLD" = "#4575B4"),
    labels = c("YLL" = "YLL (premature death)", "YLD" = "YLD (disability)")
  ) +
  scale_x_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "DALY Burden by Disease: Premature Death vs Disability",
    subtitle = "Kenya 2019 — Sorted by total DALY; values in thousands",
    x = "DALYs (thousands)",
    y = NULL,
    fill = "DALY Component",
    caption = "YLL = Years of Life Lost (mortality); YLD = Years Lived with Disability"
  ) +
  theme_burden() +
  theme(legend.position = "top")
Figure 3: DALY Composition: YLL vs YLD contribution by disease
Interpretation — DALY Composition:
HIV/AIDS is dominated by YLL, meaning the primary mechanism of its burden is premature death. Even with ART, untreated or late-treated HIV kills people at working ages, generating enormous life-year losses. The YLD component reflects ongoing disability in people living with HIV.
Depressive Disorders are 100% YLD: the condition is flagged as non-fatal in this dataset, so its 751 recorded deaths generate no YLL. This illustrates how mental health conditions can impose a large burden (111,408 DALYs here) that is almost invisible to mortality statistics, which is the key argument for using DALYs over death counts in health planning.
Malaria is almost entirely YLL in this dataset (99.9%): deaths at young ages drive the burden, while non-fatal episodes contribute only 646 YLD. Sequelae such as chronic anaemia and cognitive impairment, which add to YLD in high-transmission settings, appear to be barely represented in these inputs.
Diabetes is also predominantly YLL (91.6%), but its YLD share (8.4%) is the largest of any fatal condition other than HIV/AIDS: the long duration of the condition (8–11 years average in this data), even with mild disability weights, accumulates 32,542 YLD.
5.2 DALY Rate by Age Group and Disease
Code
# ─────────────────────────────────────────────────────────────────
# FACETED LINE CHART: DALY rate across age groups, by sex
# Shows the age-pattern of burden for each disease independently
# ─────────────────────────────────────────────────────────────────
daly_age_sex <- daly_data |>
  group_by(disease_name, age_group, sex) |>
  summarise(
    DALY_rate = sum(DALY) / sum(population) * 100000,
    .groups = "drop"
  )

ggplot(daly_age_sex, aes(x = age_group, y = DALY_rate,
                         colour = sex, group = sex)) +
  geom_line(linewidth = 1.0, alpha = 0.9) +
  geom_point(size = 2.5, alpha = 0.9) +
  facet_wrap(~disease_name, scales = "free_y", ncol = 2) +
  scale_colour_manual(values = c("Male" = "#2166AC", "Female" = "#D6604D")) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Age-Specific DALY Rate by Disease and Sex",
    subtitle = "Kenya 2019 — Rate per 100,000 population (y-axis free scale)",
    x = "Age Group", y = "DALY Rate (per 100,000)",
    colour = "Sex",
    caption = "Note: y-axes differ across panels to show within-disease age patterns."
  ) +
  theme_burden() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    legend.position = "top"
  )
Figure 4: DALY Rate per 100,000 by Age Group — Faceted by Disease
Interpretation — Age-Specific DALY Patterns:
Malaria shows a characteristic U-shape or monotonically declining pattern — the highest DALY rates in children under 5, then falling as acquired immunity develops, with a slight uptick in the elderly due to immune senescence. This underscores why under-5 malaria prevention (bed nets, chemoprevention) is the highest-impact intervention.
HIV/AIDS peaks dramatically in the 15–44 age band, with females consistently higher than males in the 15–29 group. This sex reversal is a hallmark of the sub-Saharan African epidemic and reflects higher female biological susceptibility and social vulnerability to HIV.
Cardiovascular diseases (IHD, Stroke) show the expected J-shaped increase with age, confirming that cardiovascular risk accumulates with ageing. Male rates exceed female rates until the 70–80+ band, where female longevity results in more years of exposure.
Road Injuries exhibit a distinctive peak in the 15–44 male age group — the classic pattern of young male risk-taking driving transport fatalities. Female rates are substantially lower across all ages.
Depression shows peak DALY rates in the 15–44 age group, especially in females aged 15–29, aligning with established epidemiology of major depressive disorder peaking in early adulthood.
5.3 DALY by Disease Category
Code
# ─────────────────────────────────────────────────────────────────
# HORIZONTAL BAR CHART: Category-level DALY totals and shares
# Useful for national health system budget allocation discussions
# ─────────────────────────────────────────────────────────────────
category_daly <- daly_data |>
  group_by(disease_category) |>
  summarise(
    DALY = sum(DALY),
    YLL = sum(YLL),
    YLD = sum(YLD),
    .groups = "drop"
  ) |>
  mutate(
    pct = DALY / sum(DALY) * 100,
    # Combined label kept for reuse; the plot below shows the percentage only
    label = glue("{disease_category}\n{round(pct,1)}%\n({comma(round(DALY/1e3))}k)")
  ) |>
  arrange(desc(DALY))

ggplot(category_daly,
       aes(x = reorder(disease_category, DALY), y = DALY / 1e3,
           fill = disease_category)) +
  geom_col(width = 0.65, alpha = 0.88) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            hjust = -0.2, fontface = "bold", size = 4) +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.18))) +
  coord_flip() +
  labs(
    title = "Total DALY Burden by Disease Category",
    subtitle = "Kenya 2019 — Percentage of total DALY shown",
    x = NULL, y = "DALYs (thousands)",
    caption = "Categories: GBD disease classification groupings."
  ) +
  theme_burden() +
  theme(legend.position = "none")
Figure 5: Total DALY by Disease Category — Proportional Bar Chart
5.4 YLL vs YLD Scatter — Disease Positioning
Code
# ─────────────────────────────────────────────────────────────────
# SCATTER PLOT: YLL (x) vs YLD (y)
# Diseases in the upper-left are high disability, low mortality
# Diseases in the lower-right are high mortality, low disability
# Diseases in the upper-right impose dual burden
# ─────────────────────────────────────────────────────────────────
ggplot(daly_summary,
       aes(x = YLL / 1e3, y = YLD / 1e3,
           colour = disease_category, size = DALY / 1e3)) +
  geom_point(alpha = 0.80) +
  geom_text_repel(aes(label = disease_name), size = 3.5, fontface = "bold",
                  max.overlaps = 15, box.padding = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed",
              colour = "grey50", linewidth = 0.8) +
  annotate("text",
           x = max(daly_summary$YLL / 1e3) * 0.1,
           y = max(daly_summary$YLD / 1e3) * 0.9,
           label = "YLD > YLL\n(Disability-dominant)", colour = "grey40",
           size = 3.2, hjust = 0) +
  annotate("text",
           x = max(daly_summary$YLL / 1e3) * 0.75,
           y = max(daly_summary$YLD / 1e3) * 0.08,
           label = "YLL > YLD\n(Mortality-dominant)", colour = "grey40",
           size = 3.2, hjust = 0) +
  scale_size_continuous(name = "Total DALY (k)", range = c(3, 12)) +
  scale_colour_brewer(name = "Disease Category", palette = "Dark2") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Disease Characterisation: Mortality Burden vs Disability Burden",
    subtitle = "Position relative to 45° line indicates dominant burden component",
    x = "YLL (thousands) — Premature mortality component",
    y = "YLD (thousands) — Disability component",
    caption = "Dashed line = equal YLL and YLD. Point size = total DALY burden."
  ) +
  theme_burden()
Figure 6: YLL vs YLD Scatter Plot: Disease Characterisation
Interpretation — Disease Positioning:
Diseases above the 45° line (YLD > YLL) are primarily disabling rather than fatal. In this dataset, only Depressive Disorders falls above the line: it has zero YLL but 111,408 YLD.
Diseases below the 45° line (YLL > YLD) are primarily lethal. All seven fatal conditions fall in this zone, including HIV/AIDS and Road Injuries in working-age adults: the main pathway of their burden is premature death, not chronic disability. HIV/AIDS is the closest to a dual burden, with the largest YLD (426,640) as well as the largest YLL.
Malaria sits far below the line (about 1.0 million YLL against 646 YLD): in this dataset its burden comes almost entirely from deaths at young ages rather than from non-fatal illness.
This positioning diagram is a powerful policy communication tool: diseases in the upper-left require disability management services; diseases in the lower-right require mortality prevention interventions.
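The position of each disease relative to the 45° line reduces to a comparison of YLD with YLL. The sketch below applies that rule to the totals from the DALY summary table; the data frame `positions` and the labels in `dominant` are names we introduce for illustration.

```r
# YLL and YLD totals copied from the DALY summary table
positions <- data.frame(
  disease = c("HIV/AIDS", "Malaria", "Lower Respiratory Infections", "Stroke",
              "Road Injuries", "Diabetes Mellitus", "Ischemic Heart Disease",
              "Depressive Disorders"),
  YLL = c(1509929, 1007966, 709141, 427099, 421772, 352611, 354507, 0),
  YLD = c(426640, 646, 2392, 25495, 7561, 32542, 16794, 111408)
)

# Above the 45-degree line means YLD exceeds YLL
positions$dominant <- ifelse(positions$YLD > positions$YLL,
                             "Disability-dominant", "Mortality-dominant")

# YLD as a share of YLL shows how far below the line each disease sits
positions$YLD_to_YLL <- round(positions$YLD / positions$YLL, 3)

print(positions[, c("disease", "dominant", "YLD_to_YLL")])
```

The ratio is infinite for Depressive Disorders because its YLL is zero, and it stays below 0.3 for every other disease, so none of them is close to the line.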
6 Advanced Analyses
6.1 Age Distribution of YLL (Share of Total YLL by Age Group)
Code
# ─────────────────────────────────────────────────────────────────
# WHICH AGE GROUPS CONTRIBUTE MOST YLL?
# This guides intervention targeting:
# If children dominate → focus on paediatric prevention
# If working ages dominate → focus on adult screening and treatment
# ─────────────────────────────────────────────────────────────────
yll_age_paf <- yll_data |>
  group_by(age_group) |>
  summarise(
    YLL = sum(YLL),
    Deaths = sum(deaths),
    .groups = "drop"
  ) |>
  mutate(
    YLL_pct = YLL / sum(YLL) * 100,
    Deaths_pct = Deaths / sum(Deaths) * 100,
    # Ratio: how much of YLL does this age contribute relative to its share of deaths?
    # Ratio > 1 means YLL is disproportionately high (young deaths)
    YLL_Death_ratio = YLL_pct / Deaths_pct
  )

yll_age_paf |>
  kable(
    caption = "YLL and Death Attribution by Age Group — All diseases",
    col.names = c("Age Group", "YLL", "Deaths", "YLL %", "Deaths %",
                  "YLL/Death Ratio"),
    digits = c(0, 0, 0, 1, 1, 2),
    format.args = list(big.mark = ",")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  row_spec(which(yll_age_paf$YLL_Death_ratio > 1.5), background = "#FDECEA")
YLL and Death Attribution by Age Group — All diseases
| Age Group | YLL | Deaths | YLL % | Deaths % | YLL/Death Ratio |
|---|---|---|---|---|---|
| 0-4 | 1,244,810 | 14,770 | 26.0 | 9.2 | 2.84 |
| 5-14 | 314,223 | 4,103 | 6.6 | 2.5 | 2.58 |
| 15-29 | 718,230 | 11,466 | 15.0 | 7.1 | 2.11 |
| 30-44 | 941,126 | 19,715 | 19.7 | 12.2 | 1.61 |
| 45-59 | 655,825 | 19,925 | 13.7 | 12.4 | 1.11 |
| 60-69 | 481,253 | 22,720 | 10.1 | 14.1 | 0.71 |
| 70-79 | 347,622 | 30,923 | 7.3 | 19.2 | 0.38 |
| 80+ | 79,937 | 37,700 | 1.7 | 23.4 | 0.07 |
Interpretation — YLL/Death Ratio:
A YLL/Death ratio > 1 means that age group is “over-represented” in YLL relative to its share of deaths. Highlighted rows (ratio > 1.5) indicate age groups where each death removes many remaining life years.
Children under 5 have the highest ratio: each death removes ~84 years, so even modest numbers of childhood deaths dominate total YLL.
Ages 70 and over have a ratio < 1: they account for about 43% of deaths (19.2% + 23.4%), but each death removes only about 2–11 years on average, so their share of YLL (9.0%) is proportionally much smaller.
This is the mathematical justification for prioritising child health interventions in YLL-based burden analyses.
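The ratio column can be rebuilt from the YLL and death counts alone. The following base R sketch copies those two columns from the table and recomputes the shares, the ratio, and the average YLL per death; the data frame name `age_tbl` is ours.

```r
# YLL and deaths by age group, copied from the table above
age_tbl <- data.frame(
  age_group = c("0-4", "5-14", "15-29", "30-44", "45-59", "60-69", "70-79", "80+"),
  YLL    = c(1244810, 314223, 718230, 941126, 655825, 481253, 347622, 79937),
  Deaths = c(14770, 4103, 11466, 19715, 19925, 22720, 30923, 37700)
)

age_tbl$YLL_pct    <- age_tbl$YLL / sum(age_tbl$YLL) * 100
age_tbl$Deaths_pct <- age_tbl$Deaths / sum(age_tbl$Deaths) * 100

# Share of YLL relative to share of deaths; > 1 means deaths occur young
age_tbl$ratio <- age_tbl$YLL_pct / age_tbl$Deaths_pct

# Average life-years removed by each death in the age group
age_tbl$yll_per_death <- age_tbl$YLL / age_tbl$Deaths

print(transform(age_tbl,
                YLL_pct = round(YLL_pct, 1),
                Deaths_pct = round(Deaths_pct, 1),
                ratio = round(ratio, 2),
                yll_per_death = round(yll_per_death, 1)))

# Age groups that would be highlighted (ratio > 1.5)
highlighted <- age_tbl$age_group[age_tbl$ratio > 1.5]
print(highlighted)
```

The ratio is simply each age group's YLL per death divided by the all-age average, which is why it falls steadily with age.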
6.2 Sex-Stratified DALY Rate Comparison
Code
# ─────────────────────────────────────────────────────────────────
# DUMBBELL PLOT: Male vs Female DALY rates per disease
# Shows the magnitude and direction of sex inequality in burden
# ─────────────────────────────────────────────────────────────────
sex_daly <- daly_data |>
  group_by(disease_name, sex) |>
  summarise(
    DALY_rate = sum(DALY) / sum(population) * 100000,
    .groups = "drop"
  ) |>
  pivot_wider(names_from = sex, values_from = DALY_rate) |>
  mutate(
    gap = Male - Female,
    dominant = if_else(gap >= 0, "Male higher", "Female higher")
  )

ggplot(sex_daly) +
  geom_segment(aes(x = Female, xend = Male,
                   y = fct_reorder(disease_name, Male),
                   yend = fct_reorder(disease_name, Male),
                   colour = dominant),
               linewidth = 1.8, alpha = 0.7) +
  geom_point(aes(x = Female, y = disease_name), colour = "#D6604D", size = 4) +
  geom_point(aes(x = Male, y = disease_name), colour = "#2166AC", size = 4) +
  geom_text(aes(x = Male, y = disease_name,
                label = paste0("Δ=", round(abs(gap), 0))),
            nudge_y = 0.28, size = 3, fontface = "italic") +
  scale_colour_manual(
    values = c("Male higher" = "#2166AC", "Female higher" = "#D6604D")
  ) +
  scale_x_continuous(labels = comma) +
  labs(
    title = "DALY Rate Disparity Between Males and Females",
    subtitle = "Dumbbell plot — Blue = Male; Red = Female; Δ = absolute difference",
    x = "DALY Rate per 100,000 Population",
    y = NULL,
    colour = "Which sex has higher burden",
    caption = "Rates standardised within each sex's own population."
  ) +
  theme_burden() +
  theme(legend.position = "top")
Figure 7: Sex-Specific DALY Rate Comparison by Disease
Interpretation — Sex Inequalities:
Road Injuries show the largest male excess, consistent with global data showing 3:1 male-to-female road mortality ratios in low- and middle-income countries. Interventions targeting young male drivers (helmet laws, breathalysing, speed enforcement) would close this gap most effectively.
HIV/AIDS in this dataset shows female excess in DALY rates due to the disproportionate HIV burden on young women in sub-Saharan Africa. This is a policy-critical finding — female-focused PrEP programmes, social protection, and economic empowerment are the highest-impact responses.
Depression shows female excess, consistent with the well-established 2:1 female-to-male ratio in major depressive disorder globally.
Cardiovascular diseases show male excess, particularly IHD, reflecting earlier age of onset in males and higher hypertension prevalence.
6.3 Cause-Deleted Life Expectancy (Hypothetical Analysis)
Code
# ─────────────────────────────────────────────────────────────────
# CAUSE-DELETED LIFE EXPECTANCY
#
# Question: How much would life expectancy increase if we
# eliminated each disease?
#
# Approximation: ΔLE ≈ YLL / total_population
# This gives the average life-years gained per person if the
# disease were fully eliminated. It is a crude per-capita proxy,
# not a formal cause-deleted life table or Arriaga decomposition.
# ─────────────────────────────────────────────────────────────────
total_pop <- daly_data |>
  distinct(age_group, sex, disease_name, population) |>
  # Use one disease to avoid duplicate population rows
  filter(disease_name == "Ischemic Heart Disease") |>
  summarise(total_pop = sum(population)) |>
  pull(total_pop)

cause_deleted <- yll_data |>
  group_by(disease_name) |>
  summarise(total_YLL = sum(YLL), .groups = "drop") |>
  mutate(
    delta_LE_years = total_YLL / total_pop,
    delta_LE_days = delta_LE_years * 365.25,
    interpretation = case_when(
      delta_LE_years >= 1 ~ "Major impact (≥1 year gain)",
      delta_LE_years >= 0.25 ~ "Moderate impact (3–12 months gain)",
      TRUE ~ "Modest impact (<3 months gain)"
    )
  ) |>
  arrange(desc(delta_LE_years))

cause_deleted |>
  gt() |>
  tab_header(
    title = "Hypothetical Gain in Life Expectancy if Disease Eliminated",
    subtitle = glue("Estimated based on total YLL and population of {comma(total_pop)}")
  ) |>
  fmt_number(columns = total_YLL, use_seps = TRUE, decimals = 0) |>
  fmt_number(columns = delta_LE_years, decimals = 3) |>
  fmt_number(columns = delta_LE_days, decimals = 1) |>
  cols_label(
    disease_name = "Disease",
    total_YLL = "Total YLL",
    delta_LE_years = "ΔLE (years)",
    delta_LE_days = "ΔLE (days)",
    interpretation = "Impact Level"
  ) |>
  data_color(columns = delta_LE_years, palette = "YlOrRd") |>
  tab_footnote(
    footnote = "ΔLE ≈ YLL / population. Assumes independence of causes (not strictly valid for correlated risks).",
    locations = cells_column_labels(columns = delta_LE_years)
  )
Hypothetical Gain in Life Expectancy if Disease Eliminated
Estimated based on total YLL and population of 28,000,000

| Disease | Total YLL | ΔLE (years)¹ | ΔLE (days) | Impact Level |
|---------|-----------|--------------|------------|--------------|
| HIV/AIDS | 1,509,929 | 0.054 | 19.7 | Modest impact (<3 months gain) |
| Malaria | 1,007,966 | 0.036 | 13.1 | Modest impact (<3 months gain) |
| Lower Respiratory Infections | 709,141 | 0.025 | 9.3 | Modest impact (<3 months gain) |
| Stroke | 427,099 | 0.015 | 5.6 | Modest impact (<3 months gain) |
| Road Injuries | 421,772 | 0.015 | 5.5 | Modest impact (<3 months gain) |
| Ischemic Heart Disease | 354,507 | 0.013 | 4.6 | Modest impact (<3 months gain) |
| Diabetes Mellitus | 352,611 | 0.013 | 4.6 | Modest impact (<3 months gain) |
| Depressive Disorders | 0 | 0.000 | 0.0 | Modest impact (<3 months gain) |

¹ ΔLE ≈ YLL / population. Assumes independence of causes (not strictly valid for correlated risks).
Interpretation — Cause-Deleted Analysis:
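Eliminating HIV/AIDS would yield the largest life expectancy gain (0.054 years, about 20 days), reflecting its concentration of deaths in the 15–44 age group, where each death still carries more than 40 remaining life-years against the 86-year standard.
Cardiovascular diseases (IHD + Stroke combined) account for 781,606 YLL, or about 0.028 years (roughly 10 days), which places them third, behind HIV/AIDS and malaria. Although cardiovascular deaths are concentrated in older ages (reducing per-death YLL), the sheer volume of deaths creates a substantial aggregate impact.
Malaria elimination would provide the second-largest single-disease gain (0.036 years, about 13 days), driven by child deaths that each carry a very high YLL. The gain still looks small in absolute terms, as it does for every disease in the table, because total YLL is averaged over the whole population of 28 million.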
Depression contributes zero ΔLE in this model because it is classified as non-fatal — reinforcing that using LE alone would entirely miss the burden of mental illness.
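The arithmetic behind the table can be re-checked with a few lines of base R. This is a minimal sketch, not the report's own code: the YLL totals and the population are copied from the table, while the object names (`total_yll`, `grouped_yll`, `grouped_days`) and the combined cardiovascular group are assumptions introduced for illustration.

```r
# Total YLL per disease, copied from the cause-deleted table
total_yll <- c(
  "HIV/AIDS"                     = 1509929,
  "Malaria"                      = 1007966,
  "Lower Respiratory Infections" = 709141,
  "Stroke"                       = 427099,
  "Road Injuries"                = 421772,
  "Ischemic Heart Disease"       = 354507,
  "Diabetes Mellitus"            = 352611,
  "Depressive Disorders"         = 0
)
total_pop <- 28000000  # population reported in the table subtitle

# Same approximation as the table: delta LE = YLL / population
delta_le_years <- total_yll / total_pop
delta_le_days  <- delta_le_years * 365.25

# Illustrative grouping (an assumption): treat IHD and stroke as one cause
cardio <- c("Stroke", "Ischemic Heart Disease")
grouped_yll <- c(
  total_yll[!names(total_yll) %in% cardio],
  "Cardiovascular (IHD + Stroke)" = sum(total_yll[cardio])
)

# Gain in days per person, largest first
grouped_days <- sort(grouped_yll / total_pop * 365.25, decreasing = TRUE)
print(round(grouped_days, 1))
```

Sorting the grouped values puts HIV/AIDS first, malaria second, and the combined cardiovascular group third, at roughly 10 days per person.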
Diseases in the lower-right quadrant (high DALY burden, low cost per DALY averted) represent the strongest case for immediate investment. Malaria and LRI typically occupy this space because bed nets, vaccines, and antibiotics are highly cost-effective.
HIV/AIDS has high DALY but moderate cost-effectiveness — ART is effective but expensive at scale, placing it in the medium-priority tier in this simplified model. Prevention (PrEP, condom promotion) would shift it toward higher priority.
Cardiovascular and diabetes interventions are often more expensive per DALY averted due to the need for lifelong medication and monitoring, placing them in the upper portions of the plot.
This matrix is a starting point for investment cases, not a final answer. Equity considerations, political feasibility, and co-benefits must also inform resource allocation.
8 Summary Statistics and Final Table
Code
# ─────────────────────────────────────────────────────────────────
# COMPREHENSIVE FINAL SUMMARY TABLE
# Suitable for a national health report or policy brief appendix
# ─────────────────────────────────────────────────────────────────
final_table <- daly_data |>
  group_by(disease_name, disease_category, sex) |>
  summarise(
    Deaths = sum(deaths),
    YLL = sum(YLL),
    YLD = sum(YLD),
    DALY = sum(DALY),
    DALY_rate = sum(DALY) / sum(population) * 100000,
    YLL_frac = round(sum(YLL) / sum(DALY) * 100, 1),
    .groups = "drop"
  ) |>
  arrange(disease_category, disease_name, sex)

# Render as interactive DataTable for exploration
datatable(
  final_table |>
    mutate(
      across(c(Deaths, YLL, YLD, DALY), ~ comma(round(.))),
      DALY_rate = round(DALY_rate, 1),
      YLL_frac = paste0(YLL_frac, "%")
    ),
  colnames = c("Disease", "Category", "Sex", "Deaths", "YLL", "YLD",
               "DALY", "DALY Rate/100k", "YLL%"),
  filter = "top",
  extensions = "Buttons",
  options = list(
    pageLength = 16,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel"),
    scrollX = TRUE
  ),
  caption = "Full DALY Results Table — Kenya 2019 (Interactive: search, sort, export)"
)
9 Methodological Notes
9.1 Key Analytical Assumptions
Analytical Assumptions and Justifications

| Assumption | Value Used | Justification |
|------------|------------|---------------|
| Standard Life Expectancy | 86.0 years (GBD 2019 standard frontier) | Represents the ideal achievable life expectancy: an aspiration rather than current LE |
| YLD Formula | Incidence × Disability Weight × Duration | Preferred over prevalence-based when incidence data available |
| Age at Death Proxy | Mid-point of age group when not recorded | Standard epidemiological convention for grouped data |
| Population Denominator | Population size per age-sex stratum | Ensures rates are comparable across strata of different sizes |
| Discounting | No discounting applied (consistent with GBD 2010+) | GBD dropped discounting in 2010 to avoid undervaluing health losses that fall further in the future, which mainly affects deaths at young ages |
| Age-weighting | No age-weighting (GBD 2010+ methodology) | GBD dropped age-weighting in 2010 to avoid systematically undervaluing children/elderly |
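To show how these assumptions combine for a single stratum, here is a minimal base R sketch. The helper names `yll_stratum` and `yld_stratum` and all input counts are hypothetical. Only the 86-year standard, the age-group midpoint proxy, the incidence-based YLD formula, and the absence of discounting and age-weighting come from the table.

```r
STANDARD_LE <- 86.0  # standard life expectancy from the assumptions table

# YLL for one stratum: deaths x remaining life expectancy, floored at zero
yll_stratum <- function(deaths, age_at_death, standard_le = STANDARD_LE) {
  deaths * pmax(standard_le - age_at_death, 0)
}

# Incidence-based YLD: incident cases x disability weight x duration (years)
yld_stratum <- function(incident_cases, disability_weight, duration_years) {
  incident_cases * disability_weight * duration_years
}

# Hypothetical stratum aged 30-44 with no recorded age at death,
# so the age-group midpoint stands in as the proxy
age_midpoint <- (30 + 44) / 2  # 37

yll  <- yll_stratum(deaths = 100, age_at_death = age_midpoint)  # 100 * (86 - 37)
yld  <- yld_stratum(incident_cases = 1000,
                    disability_weight = 0.2,
                    duration_years = 2)                         # 1000 * 0.2 * 2
daly <- yll + yld  # no discounting or age-weighting applied

print(c(YLL = yll, YLD = yld, DALY = daly))
```

With these made-up inputs the stratum contributes 4,900 YLL and 400 YLD, for 5,300 DALYs. A death recorded above age 86 would contribute zero YLL because of the `pmax()` floor.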
9.2 Limitations
Simulated data: This dataset is designed for methodological illustration. Real GBD analyses use cause-of-death models, DisMod-MR for disease modelling, and systematic reviews — not point estimates.
Independence assumption: The cause-deleted LE analysis assumes diseases are independent, which is not true (e.g., diabetes increases cardiovascular and infectious disease risk simultaneously).
Comorbidity not modelled: Individuals with multiple conditions experience disability from each, but joint disability weights are not additive — the GBD uses a multiplicative correction.
No uncertainty intervals: Real GBD estimates include 95% uncertainty intervals derived from Monte Carlo simulation of input uncertainty.
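The comorbidity limitation can be made concrete with a short sketch. It assumes the commonly cited multiplicative form, combined DW = 1 − ∏(1 − DWᵢ), and applies it to the IHD and diabetes disability weights used in this analysis. The function name `combine_dw` is hypothetical, and the report itself does not apply this correction.

```r
# Combine disability weights multiplicatively: 1 - prod(1 - DW_i).
# Assumed form of the correction; not part of the analysis pipeline.
combine_dw <- function(dw) {
  stopifnot(all(dw >= 0 & dw <= 1))
  1 - prod(1 - dw)
}

dw_ihd      <- 0.432  # IHD disability weight used in this analysis
dw_diabetes <- 0.049  # diabetes disability weight used in this analysis

additive       <- dw_ihd + dw_diabetes               # simple sum
multiplicative <- combine_dw(c(dw_ihd, dw_diabetes)) # 1 - 0.568 * 0.951

print(round(c(additive = additive, multiplicative = multiplicative), 3))
```

The sum gives 0.481, while the multiplicative combination gives about 0.460. The combined weight can never exceed 1, which is the reason for avoiding simple addition when a person has several conditions.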
10 References and Further Reading
GBD 2019 Diseases and Injuries Collaborators (2020). Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019. The Lancet, 396(10258), 1204–1222.
Murray CJL, Lopez AD (1996). The Global Burden of Disease. Harvard School of Public Health / WHO.
Salomon JA et al. (2012). Disability weights for the Global Burden of Disease 2010 study. The Lancet, 380(9859), 2129–2143.
Institute for Health Metrics and Evaluation (IHME). GBD Compare tool. http://vizhub.healthdata.org/gbd-compare
Disease Control Priorities 3rd Edition (DCP3). http://dcp-3.org
---title: "Years of Life Lost (YLL) and Disability-Adjusted Life Years (DALY) Analysis"subtitle: "A Comprehensive Burden of Disease Study — Kenya, 2019"author: "Timothy Achala"date: todayformat: html: toc: true toc-depth: 4 toc-location: left toc-title: "Contents" code-fold: true code-summary: "Show Code" code-tools: true theme: cosmo highlight-style: github self-contained: true fig-width: 10 fig-height: 6 number-sections: true smooth-scroll: true df-print: pagedexecute: echo: true warning: false message: false cache: false---```{r}#| label: setup#| include: false# ─────────────────────────────────────────────────────────────────# PACKAGE LOADING# All required packages for this full analysis# ─────────────────────────────────────────────────────────────────library(tidyverse) # Data wrangling, ggplot2, dplyr, tidyr, readrlibrary(knitr) # Table rendering (kable)library(kableExtra) # Enhanced table stylinglibrary(scales) # Axis formatting (comma, percent)library(ggrepel) # Non-overlapping text labels in plotslibrary(patchwork) # Combine multiple ggplot2 plotslibrary(janitor) # Clean column names, tabulation toolslibrary(gt) # Grammar of Tables for publication-ready tableslibrary(viridis) # Colour-blind-friendly paletteslibrary(RColorBrewer) # Additional colour paletteslibrary(glue) # String interpolationlibrary(DT) # Interactive data tables# ─────────────────────────────────────────────────────────────────# GLOBAL THEME FOR ALL PLOTS# Sets a clean, publication-ready ggplot2 appearance# ─────────────────────────────────────────────────────────────────theme_burden <-function(base_size =12) {theme_minimal(base_size = base_size) +theme(plot.title =element_text(face ="bold", size = base_size +2, hjust =0),plot.subtitle =element_text(size = base_size, colour ="grey40"),plot.caption =element_text(size =9, colour ="grey55", hjust =1),axis.title =element_text(size = base_size, face ="bold"),axis.text =element_text(size = base_size -1),legend.title =element_text(face 
="bold"),panel.grid.minor =element_blank(),strip.text =element_text(face ="bold"),plot.margin =margin(10, 15, 10, 10) )}```# Introduction and Conceptual Framework {#sec-intro}## BackgroundThe **Global Burden of Disease (GBD)** framework provides a systematic approachto quantifying health loss across populations. Two foundational metrics are:| Metric | What it Measures | Components ||--------|-----------------|------------|| **YLL** | Premature mortality burden | Deaths × remaining life expectancy || **YLD** | Non-fatal health burden | Prevalence × disability weight × duration || **DALY** | Total health burden | YLL + YLD |> **One DALY = one lost year of healthy life.** It represents the gap between> current health status and an ideal situation where everyone lives in full> health to an old age.## Epidemiological Justification### Why YLL MattersYears of Life Lost penalises deaths at younger ages more heavily than deaths atolder ages. A child who dies at age 2 contributes far more YLL than an80-year-old dying of the same disease. This is crucial for priority-settingbecause it focuses attention on **preventable premature mortality**.### Why DALY Is Superior to Simple Death CountsDeath counts ignore:1. **Non-fatal burden**: Depression rarely kills but causes enormous suffering.2. **Age at death**: A death at 25 is more tragic (in terms of life-years) than at 85.3. 
**Severity of disability**: A 10-year episode of mild anaemia ≠ 10 years of severe HIV.DALYs address all three limitations simultaneously.---# Data Loading and Exploration {#sec-data}## Load the Dataset```{r}#| label: load-data# ─────────────────────────────────────────────────────────────────# READ DATA# The CSV contains 128 rows covering 6 diseases × 8 age groups × 2 sexes# in Kenya, 2019 (GBD reference year).# ─────────────────────────────────────────────────────────────────raw_data <-read_csv("burden_of_disease_dataset.csv",show_col_types =FALSE)# Convert categorical variables to factors with meaningful orderingdata <- raw_data |>mutate(# Ordered age groups ensure correct axis ordering in plotsage_group =factor(age_group, levels =c("0-4", "5-14", "15-29", "30-44", "45-59", "60-69", "70-79", "80+" )),sex =factor(sex, levels =c("Male", "Female")),disease_category =factor(disease_category, levels =c("Communicable", "Non-Communicable", "Cardiovascular", "Mental Health", "Injuries" )),# Flag: does this disease cause direct mortality?cause_of_death =as.logical(cause_of_death =="Yes"),# age_at_death is already numeric; NAs are read as NA automatically by read_csvage_at_death =suppressWarnings(as.numeric(age_at_death)) )# Quick confirmationcat("Dimensions:", nrow(data), "rows ×", ncol(data), "columns\n")cat("Diseases:", paste(unique(data$disease_name), collapse =", "), "\n")cat("Age groups:", paste(levels(data$age_group), collapse =", "), "\n")```**Interpretation:** The dataset captures **6 major disease causes** (IschemicHeart Disease, Stroke, Lower Respiratory Infections, HIV/AIDS, DiabetesMellitus, Malaria, Depressive Disorders, Road Injuries) across **8 age strata**for both sexes in Kenya 2019. 
The total of 128 records allows stratifiedanalysis by age, sex, disease category, and cause type.## Data Quality Check```{r}#| label: data-quality# ─────────────────────────────────────────────────────────────────# MISSING VALUE AUDIT# age_at_death is legitimately NA for non-fatal diseases# (e.g., Depressive Disorders where cause_of_death = FALSE)# ─────────────────────────────────────────────────────────────────missing_summary <- data |>summarise(across(everything(), ~sum(is.na(.)))) |>pivot_longer(everything(), names_to ="Variable", values_to ="Missing_n") |>filter(Missing_n >0) |>mutate(Missing_pct =round(Missing_n /nrow(data) *100, 1),Explanation =case_when( Variable =="age_at_death"~"Expected: non-fatal diseases have no age at death",TRUE~"Investigate further" ) )missing_summary |>gt() |>tab_header(title ="Missing Data Audit") |>cols_label(Variable ="Column", Missing_n ="N Missing",Missing_pct ="% Missing", Explanation ="Reason") |>tab_style(style =cell_fill(color ="#FFF3CD"),locations =cells_body(rows = Missing_n >0))```**Interpretation:** The only missing values appear in `age_at_death` forDepressive Disorders, which is **by design** — depression is classified asprimarily non-fatal in this dataset, so no age-at-death value applies. 
Nounexpected missingness exists, confirming data integrity.## Descriptive Summary Table```{r}#| label: descriptive-summary# ─────────────────────────────────────────────────────────────────# SUMMARY BY DISEASE AND SEX# Aggregates key burden indicators before YLL/DALY calculation# ─────────────────────────────────────────────────────────────────summary_tbl <- data |>group_by(disease_name, disease_category, sex) |>summarise(Total_Deaths =sum(deaths, na.rm =TRUE),Total_Incident =sum(incident_cases, na.rm =TRUE),Total_Prevalent =sum(prevalent_cases, na.rm =TRUE),Mean_Dis_Weight =round(mean(disability_weight), 3),.groups ="drop" ) |>arrange(disease_category, disease_name, sex)summary_tbl |>gt() |>tab_header(title ="Burden of Disease: Summary Statistics by Cause and Sex",subtitle ="Kenya, 2019" ) |>fmt_number(columns =c(Total_Deaths, Total_Incident, Total_Prevalent),use_seps =TRUE, decimals =0) |>cols_label(disease_name ="Disease",disease_category ="Category",sex ="Sex",Total_Deaths ="Total Deaths",Total_Incident ="Incident Cases",Total_Prevalent ="Prevalent Cases",Mean_Dis_Weight ="Mean DW" ) |>tab_row_group(label ="Cardiovascular", rows = disease_category =="Cardiovascular") |>tab_row_group(label ="Communicable", rows = disease_category =="Communicable") |>tab_row_group(label ="Mental Health", rows = disease_category =="Mental Health") |>tab_row_group(label ="Non-Communicable",rows = disease_category =="Non-Communicable") |>tab_row_group(label ="Injuries", rows = disease_category =="Injuries") |>tab_style(style =cell_fill(color ="#E8F4FD"),locations =cells_row_groups())```---# Years of Life Lost (YLL) Calculation {#sec-yll}## Theoretical FrameworkYLL measures the years of life a person would have lived had they not diedprematurely. 
The GBD uses the **standard expected years of life lost** formula:$$\text{YLL}_{age,sex} = N_{deaths} \times L_{age,sex}$$where $L_{age,sex}$ is the **remaining life expectancy** at the age of deathaccording to a **standard life table** (GBD uses 86.0 years for females, 91.9years for males — here we use a unified 86-year frontier).The remaining life expectancy for someone dying at age $a$ is:$$L(a) = \text{Standard Life Expectancy} - a$$This means a 2-year-old dying contributes $(86 - 2) = 84$ YLL, while a75-year-old dying contributes only $(86 - 75) = 11$ YLL.## YLL Computation in R```{r}#| label: yll-calculation# ─────────────────────────────────────────────────────────────────# YLL CALCULATION## Formula: YLL = Deaths × (Standard LE - Age at Death)## Key design decisions:# 1. We use the GBD standard LE of 86.0 years (reference frontier).# 2. Mid-point of the age group is used as the proxy age at death# when age_at_death is missing (only for non-fatal diseases).# 3. YLL is set to 0 for non-fatal diseases (cause_of_death = FALSE).# 4. 
We add a floor: if (standard_LE - age_at_death) < 0, YLL = 0# to handle the rare case where age_at_death exceeds the standard.# ─────────────────────────────────────────────────────────────────STANDARD_LE <-86.0# GBD universal standard life expectancy (years)yll_data <- data |>mutate(# ── Mid-point of age group for cases where age_at_death is NA ──age_midpoint = (age_group_lower + age_group_upper) /2,# ── Use actual recorded age at death, else fall back to midpoint ──effective_age_at_death =coalesce(age_at_death, age_midpoint),# ── Remaining life expectancy at age of death ──remaining_LE =pmax(STANDARD_LE - effective_age_at_death, 0),# ── YLL = Deaths × Remaining LE ──# Non-fatal diseases (cause_of_death = FALSE) get YLL = 0YLL =if_else(cause_of_death, deaths * remaining_LE, 0),# ── YLL rate per 100,000 population (for comparability) ──YLL_rate_100k = (YLL / population) *100000 )# Show a sample of the computationyll_data |>filter(deaths >0) |>select(disease_name, age_group, sex, deaths, effective_age_at_death, remaining_LE, YLL, YLL_rate_100k) |>slice_sample(n =12) |>arrange(disease_name, age_group) |>kable(caption ="Sample YLL Computations (12 randomly selected rows)",col.names =c("Disease", "Age Group", "Sex", "Deaths","Age at Death", "Remaining LE", "YLL", "YLL/100k"),digits =c(0, 0, 0, 0, 1, 1, 0, 1),format.args =list(big.mark =",") ) |>kable_styling(bootstrap_options =c("striped", "hover"), full_width =FALSE)```**Interpretation of YLL computation logic:**- `remaining_LE` is the core multiplier. A child dying at age 2 carries 84 remaining years; a person dying at 75 carries only 11. 
This is the mathematical expression of the social preference for averting premature death.- `pmax(..., 0)` ensures we never obtain negative YLL, which could theoretically occur if someone dies beyond the 86-year frontier (edge case).- `coalesce()` gracefully handles diseases with no recorded death age by substituting the age-group midpoint — a standard epidemiological imputation.## YLL by Disease and Age Group```{r}#| label: yll-by-disease-age# ─────────────────────────────────────────────────────────────────# AGGREGATE YLL BY DISEASE AND AGE GROUP# This is the primary YLL summary table for interpretation# ─────────────────────────────────────────────────────────────────yll_disease_age <- yll_data |>group_by(disease_name, disease_category, age_group) |>summarise(total_deaths =sum(deaths),total_YLL =sum(YLL),YLL_rate =sum(YLL) /sum(population) *100000,.groups ="drop" )# Total YLL per disease (both sexes combined)yll_disease_total <- yll_data |>group_by(disease_name, disease_category) |>summarise(Deaths =sum(deaths),YLL =sum(YLL),YLL_rate =sum(YLL) /sum(population) *100000,.groups ="drop" ) |>arrange(desc(YLL))yll_disease_total |>gt() |>tab_header(title ="Total YLL by Disease Cause",subtitle ="Kenya, 2019 — All ages, both sexes combined" ) |>fmt_number(columns =c(Deaths, YLL), use_seps =TRUE, decimals =0) |>fmt_number(columns = YLL_rate, decimals =1) |>cols_label(disease_name ="Disease",disease_category ="Category",Deaths ="Total Deaths",YLL ="Total YLL",YLL_rate ="YLL Rate (per 100k)" ) |>data_color(columns = YLL,palette ="Blues" ) |>tab_footnote(footnote ="YLL Rate standardised to population size per disease stratum.",locations =cells_column_labels(columns = YLL_rate) )```## Visualisation: YLL by Disease and Age```{r}#| label: fig-yll-age-heatmap#| fig-cap: "Heatmap of YLL by Disease and Age Group, Kenya 2019"#| fig-height: 5# ─────────────────────────────────────────────────────────────────# HEATMAP: YLL per disease × age group# Log scale used because the range is 
very wide (child deaths create# very high YLL despite fewer absolute numbers).# ─────────────────────────────────────────────────────────────────yll_heat <- yll_data |>group_by(disease_name, age_group) |>summarise(YLL =sum(YLL), .groups ="drop") |>filter(YLL >0)ggplot(yll_heat, aes(x = age_group, y =fct_reorder(disease_name, YLL, sum),fill = YLL)) +geom_tile(colour ="white", linewidth =0.5) +geom_text(aes(label =if_else(YLL >=1000,paste0(round(YLL /1000, 0), "k"),as.character(round(YLL)))),size =3, colour ="white", fontface ="bold") +scale_fill_viridis_c(name ="YLL",trans ="log10",labels =label_comma(),option ="plasma" ) +labs(title ="Years of Life Lost by Disease and Age Group",subtitle ="Kenya 2019 — Both sexes combined; colour on log₁₀ scale",x ="Age Group", y =NULL,caption ="Values > 1000 shown as 'Xk'. Source: Simulated GBD-style dataset." ) +theme_burden()```**Interpretation of heatmap:**- **HIV/AIDS in the 30–44 age group** shows the darkest cells, reflecting the catastrophic toll of HIV on working-age adults in sub-Saharan Africa. Deaths at this age carry approximately 48 remaining life years each.- **Malaria in children aged 0–4** generates enormous YLL despite lower death counts than cardiovascular diseases, purely because each death removes ~84 years of potential life.- **Cardiovascular diseases (IHD, Stroke)** peak in the 60–79 age bands. 
While death counts are highest here, YLL per death is low (6–16 years remaining), explaining their moderate YLL despite high mortality.- The near-empty cells for **Depressive Disorders** reflect its classification as primarily non-fatal; the few deaths are suicide-related.## YLL by Sex — Population Pyramid Style```{r}#| label: fig-yll-sex-pyramid#| fig-cap: "YLL by Age Group and Sex — Population Pyramid"#| fig-height: 6# ─────────────────────────────────────────────────────────────────# DIVERGING BAR CHART (POPULATION PYRAMID STYLE)# Males plotted left (negative values), females right (positive)# This makes sex disparities immediately visible# ─────────────────────────────────────────────────────────────────yll_sex <- yll_data |>group_by(age_group, sex) |>summarise(YLL =sum(YLL), .groups ="drop") |>mutate(YLL_plot =if_else(sex =="Male", -YLL, YLL))ggplot(yll_sex, aes(x = YLL_plot, y = age_group, fill = sex)) +geom_col(alpha =0.85, width =0.75) +geom_vline(xintercept =0, colour ="black", linewidth =0.8) +scale_x_continuous(labels =function(x) paste0(comma(abs(x /1e3)), "k"),breaks =pretty(c(-max(abs(yll_sex$YLL_plot)), max(abs(yll_sex$YLL_plot))), 6) ) +scale_fill_manual(values =c("Male"="#2166AC", "Female"="#D6604D")) +labs(title ="Years of Life Lost by Age Group and Sex",subtitle ="Males (left) vs Females (right) — Kenya 2019",x ="YLL (thousands)",y ="Age Group",fill ="Sex",caption ="Each bar represents total YLL summed across all 6 diseases." ) +theme_burden() +theme(legend.position ="top")```**Interpretation — Sex Disparities in YLL:**- **Males bear a heavier YLL burden** in almost all age groups, particularly in the 15–44 range. 
This is driven by (a) higher road injury mortality in males, (b) greater cardiovascular event rates in males, and (c) higher occupational HIV exposure in some contexts.- **Females aged 15–44** show relatively elevated YLL for their sex, largely due to HIV/AIDS (females account for the majority of new HIV infections in sub-Saharan Africa through heterosexual transmission).- **The 60+ age groups** show converging YLL between sexes, reflecting greater female longevity but higher female prevalence of stroke and diabetes at older ages.---# Disability-Adjusted Life Years (DALY) Calculation {#sec-daly}## Conceptual Components$$\text{DALY} = \text{YLL} + \text{YLD}$$### YLD (Years Lived with Disability)$$\text{YLD} = I \times DW \times L$$where:- $I$ = number of **incident cases** in the period- $DW$ = **disability weight** (0 = perfect health; 1 = death), drawn from GBD- $L$ = **average duration** of the episode (years)> Alternatively: $\text{YLD} = P \times DW$ where $P$ = prevalent cases> (used when incidence data is unavailable). 
We use the **incidence-based**> approach here as recommended by GBD 2019.### Disability Weights Used| Disease | DW | Interpretation ||---------|-----|---------------|| IHD | 0.432 | Moderate-severe: chest pain, fatigue, limitation || Stroke | 0.552 | Severe: paralysis, speech loss, dependency || LRI | 0.279 | Moderate: breathlessness, fever, activity limitation || HIV/AIDS | 0.547 | Severe: immunosuppression, opportunistic infections || Diabetes | 0.049 | Mild-moderate: managed disease with complications || Malaria | 0.186 | Moderate: acute fever, anaemia, prostration || Depression | 0.145 | Mild-moderate: low mood, anhedonia, functional impairment || Road Injuries | 0.370 | Moderate-severe: trauma, fractures, rehabilitation |## DALY Computation```{r}#| label: daly-calculation# ─────────────────────────────────────────────────────────────────# STEP 1: YLD CALCULATION## YLD (incidence-based) = Incident Cases × Disability Weight × Duration## Why incidence-based?# - Aligns with GBD 2019 methodology# - Avoids double-counting prevalent cases from previous years# - More sensitive to new episodes and intervention effects# ─────────────────────────────────────────────────────────────────daly_data <- yll_data |>mutate(# YLD: multiply incident cases by DW and average durationYLD = incident_cases * disability_weight * duration_years,# DALY = YLL + YLDDALY = YLL + YLD,# Per-capita rates for population-adjusted comparisonsYLD_rate_100k = (YLD / population) *100000,DALY_rate_100k = (DALY / population) *100000,# Fraction of DALY attributable to YLL vs YLDYLL_fraction =if_else(DALY >0, YLL / DALY, 0),YLD_fraction =if_else(DALY >0, YLD / DALY, 0) )# Verification: print a readable summary for one diseasedaly_data |>filter(disease_name =="HIV/AIDS", sex =="Female") |>select(age_group, deaths, incident_cases, disability_weight, duration_years, YLL, YLD, DALY) |>kable(caption ="DALY Computation Verification: HIV/AIDS — Female",col.names =c("Age Group", "Deaths", "Incid. 
Cases", "DW", "Duration (yr)","YLL", "YLD", "DALY"),digits =c(0, 0, 0, 3, 2, 0, 0, 0),format.args =list(big.mark =",") ) |>kable_styling(bootstrap_options =c("striped", "hover", "condensed"),full_width =FALSE)```**Interpretation of HIV/AIDS DALY computation:**- In females aged 30–44, HIV generates the highest DALYs because both YLL (high deaths × ~48 remaining years) and YLD (high incidence × DW=0.547 × ~10.5 years duration) are simultaneously maximised.- The long duration parameter for HIV (8–10+ years) substantially amplifies YLD. This is the mathematical signature of a **chronic, disabling condition** — even with effective ART reducing mortality, the ongoing disability burden remains substantial.## Total DALY Summary```{r}#| label: daly-summary-table# ─────────────────────────────────────────────────────────────────# AGGREGATE DALY TABLE — by disease, combining both sexes# ─────────────────────────────────────────────────────────────────daly_summary <- daly_data |>group_by(disease_name, disease_category) |>summarise(Deaths =sum(deaths),YLL =sum(YLL),YLD =sum(YLD),DALY =sum(DALY),DALY_rate =sum(DALY) /sum(population) *100000,YLL_pct =round(sum(YLL) /sum(DALY) *100, 1),YLD_pct =round(sum(YLD) /sum(DALY) *100, 1),.groups ="drop" ) |>arrange(desc(DALY))daly_summary |>gt() |>tab_header(title ="DALY Burden by Disease — Kenya 2019",subtitle ="All ages, both sexes; sorted by total DALY burden" ) |>fmt_number(columns =c(Deaths, YLL, YLD, DALY), use_seps =TRUE, decimals =0) |>fmt_number(columns = DALY_rate, decimals =1) |>fmt_number(columns =c(YLL_pct, YLD_pct), decimals =1) |>cols_label(disease_name ="Disease",disease_category ="Category",Deaths ="Deaths",YLL ="YLL",YLD ="YLD",DALY ="Total DALY",DALY_rate ="DALY Rate/100k",YLL_pct ="YLL %",YLD_pct ="YLD %" ) |>data_color(columns = DALY, palette ="Reds") |>tab_footnote("% of total DALY attributable to each component.",locations =cells_column_labels(columns = YLL_pct)) |>tab_style(style =cell_text(weight ="bold"),locations 
=cells_body(rows =1) )```---# Visualisation and Interpretation {#sec-viz}## DALY Composition — YLL vs YLD Stacked Chart```{r}#| label: fig-daly-composition#| fig-cap: "DALY Composition: YLL vs YLD contribution by disease"#| fig-height: 5# ─────────────────────────────────────────────────────────────────# STACKED BAR CHART: Proportion of DALY from YLL vs YLD# This is the most important diagnostic chart — it tells you# whether a disease kills (high YLL fraction) or disables (high YLD fraction)# ─────────────────────────────────────────────────────────────────daly_long <- daly_summary |>select(disease_name, YLL, YLD, DALY) |>pivot_longer(cols =c(YLL, YLD), names_to ="Component", values_to ="Value") |>mutate(Proportion = Value / DALY,Component =factor(Component, levels =c("YLD", "YLL")) )ggplot(daly_long,aes(x = Value /1e3,y =fct_reorder(disease_name, DALY),fill = Component)) +geom_col(alpha =0.88, width =0.7) +geom_text(data = daly_summary,aes(x = DALY /1e3+20, y = disease_name,label =paste0(comma(round(DALY /1e3, 0)), "k")),inherit.aes =FALSE, size =3.2, hjust =0, fontface ="bold" ) +scale_fill_manual(values =c("YLL"="#D73027", "YLD"="#4575B4"),labels =c("YLL"="YLL (premature death)", "YLD"="YLD (disability)") ) +scale_x_continuous(labels = comma, expand =expansion(mult =c(0, 0.15))) +labs(title ="DALY Burden by Disease: Premature Death vs Disability",subtitle ="Kenya 2019 — Sorted by total DALY; values in thousands",x ="DALYs (thousands)",y =NULL,fill ="DALY Component",caption ="YLL = Years of Life Lost (mortality); YLD = Years Lived with Disability" ) +theme_burden() +theme(legend.position ="top")```**Interpretation — DALY Composition:**- **HIV/AIDS is dominated by YLL**, meaning the primary mechanism of its burden is premature death. Even with ART, untreated or late-treated HIV kills people at working ages, generating enormous life-year losses. 
The YLD component reflects ongoing disability in people living with HIV.- **Depressive Disorders are 100% YLD** (by definition in this dataset — near zero deaths), illustrating how mental health conditions can impose massive burden that is entirely invisible to mortality statistics. This is the key argument for using DALYs over death counts in health planning.- **Malaria shows a mixed pattern**: childhood deaths generate YLL, while recurrent non-fatal episodes in older children and adults generate YLD. In high-transmission settings, YLD from chronic anaemia and cognitive impairment can exceed YLL.- **Diabetes is predominantly YLD**: the disease is well-managed enough to delay mortality, but the long duration of the condition (8–11 years average in this data) with even mild disability weights accumulates substantial YLD.## DALY Rate by Age Group and Disease```{r}#| label: fig-daly-age-facet#| fig-cap: "DALY Rate per 100,000 by Age Group — Faceted by Disease"#| fig-height: 8# ─────────────────────────────────────────────────────────────────# FACETED LINE CHART: DALY rate across age groups, by sex# Shows the age-pattern of burden for each disease independently# ─────────────────────────────────────────────────────────────────daly_age_sex <- daly_data |>group_by(disease_name, age_group, sex) |>summarise(DALY_rate =sum(DALY) /sum(population) *100000,.groups ="drop" )ggplot(daly_age_sex, aes(x = age_group, y = DALY_rate,colour = sex, group = sex)) +geom_line(linewidth =1.0, alpha =0.9) +geom_point(size =2.5, alpha =0.9) +facet_wrap(~disease_name, scales ="free_y", ncol =2) +scale_colour_manual(values =c("Male"="#2166AC", "Female"="#D6604D")) +scale_y_continuous(labels = comma) +labs(title ="Age-Specific DALY Rate by Disease and Sex",subtitle ="Kenya 2019 — Rate per 100,000 population (y-axis free scale)",x ="Age Group", y ="DALY Rate (per 100,000)",colour ="Sex",caption ="Note: y-axes differ across panels to show within-disease age patterns." 
) +theme_burden() +theme(axis.text.x =element_text(angle =45, hjust =1, size =8),legend.position ="top" )```**Interpretation — Age-Specific DALY Patterns:**- **Malaria** shows a characteristic U-shape or monotonically declining pattern — the highest DALY rates in children under 5, then falling as acquired immunity develops, with a slight uptick in the elderly due to immune senescence. This underscores why **under-5 malaria prevention** (bed nets, chemoprevention) is the highest-impact intervention.- **HIV/AIDS** peaks dramatically in the 15–44 age band, with females consistently higher than males in the 15–29 group. This sex reversal is a hallmark of the sub-Saharan African epidemic and reflects higher female biological susceptibility and social vulnerability to HIV.- **Cardiovascular diseases** (IHD, Stroke) show the expected J-shaped increase with age, confirming that cardiovascular risk accumulates with ageing. Male rates exceed female rates until the 70–80+ band, where female longevity results in more years of exposure.- **Road Injuries** exhibit a distinctive peak in the **15–44 male** age group — the classic pattern of young male risk-taking driving transport fatalities. 
Female rates are substantially lower across all ages.- **Depression** shows peak DALY rates in the **15–44 age group**, especially in females aged 15–29, aligning with established epidemiology of major depressive disorder peaking in early adulthood.## DALY Treemap by Category```{r}#| label: fig-daly-category-bar#| fig-cap: "Total DALY by Disease Category — Proportional Bar Chart"#| fig-height: 4.5# ─────────────────────────────────────────────────────────────────# PROPORTIONAL STACKED BAR: Category-level DALY composition# Useful for national health system budget allocation discussions# ─────────────────────────────────────────────────────────────────category_daly <- daly_data |>group_by(disease_category) |>summarise(DALY =sum(DALY),YLL =sum(YLL),YLD =sum(YLD),.groups ="drop" ) |>mutate(pct = DALY /sum(DALY) *100,label =glue("{disease_category}\n{round(pct,1)}%\n({comma(round(DALY/1e3))}k)") ) |>arrange(desc(DALY))ggplot(category_daly,aes(x =reorder(disease_category, DALY), y = DALY /1e3,fill = disease_category)) +geom_col(width =0.65, alpha =0.88) +geom_text(aes(label =paste0(round(pct, 1), "%")),hjust =-0.2, fontface ="bold", size =4) +scale_fill_brewer(palette ="Set2") +scale_y_continuous(labels = comma, expand =expansion(mult =c(0, 0.18))) +coord_flip() +labs(title ="Total DALY Burden by Disease Category",subtitle ="Kenya 2019 — Percentage of total DALY shown",x =NULL, y ="DALYs (thousands)",caption ="Categories: GBD disease classification groupings." 
  ) +
  theme_burden() +
  theme(legend.position = "none")
```

## YLL vs YLD Scatter — Disease Positioning

```{r}
#| label: fig-yll-yld-scatter
#| fig-cap: "YLL vs YLD Scatter Plot: Disease Characterisation"
#| fig-height: 5.5

# ─────────────────────────────────────────────────────────────────
# SCATTER PLOT: YLL (x) vs YLD (y)
# Diseases in the upper-left are high disability, low mortality
# Diseases in the lower-right are high mortality, low disability
# Diseases in the upper-right impose dual burden
# ─────────────────────────────────────────────────────────────────
ggplot(daly_summary,
       aes(x = YLL / 1e3, y = YLD / 1e3,
           colour = disease_category, size = DALY / 1e3)) +
  geom_point(alpha = 0.80) +
  geom_text_repel(aes(label = disease_name), size = 3.5, fontface = "bold",
                  max.overlaps = 15, box.padding = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed",
              colour = "grey50", linewidth = 0.8) +
  annotate("text", x = max(daly_summary$YLL / 1e3) * 0.1,
           y = max(daly_summary$YLD / 1e3) * 0.9,
           label = "YLD > YLL\n(Disability-dominant)", colour = "grey40",
           size = 3.2, hjust = 0) +
  annotate("text", x = max(daly_summary$YLL / 1e3) * 0.75,
           y = max(daly_summary$YLD / 1e3) * 0.08,
           label = "YLL > YLD\n(Mortality-dominant)", colour = "grey40",
           size = 3.2, hjust = 0) +
  scale_size_continuous(name = "Total DALY (k)", range = c(3, 12)) +
  scale_colour_brewer(name = "Disease Category", palette = "Dark2") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Disease Characterisation: Mortality Burden vs Disability Burden",
    subtitle = "Position relative to 45° line indicates dominant burden component",
    x = "YLL (thousands) — Premature mortality component",
    y = "YLD (thousands) — Disability component",
    caption = "Dashed line = equal YLL and YLD. Point size = total DALY burden."
  ) +
  theme_burden()
```

**Interpretation — Disease Positioning:**

- Diseases **above the 45° line** (YLD > YLL) are primarily disabling rather than fatal.
  In this dataset, **Depressive Disorders** falls decisively above the line: it has near-zero YLL but substantial YLD.
- Diseases **below the 45° line** (YLL > YLD) are primarily lethal. **HIV/AIDS** and **Road Injuries** in working-age adults fall in this zone: the main pathway of their burden is premature death, not chronic disability.
- **Malaria** sits near the line, reflecting its dual nature: a major killer of children (YLL) and a recurrent, debilitating illness in older groups (YLD).
- This positioning diagram is a powerful **policy communication tool**: diseases in the upper-left require disability management services; diseases in the lower-right require mortality prevention interventions.

---

# Advanced Analyses {#sec-advanced}

## Age-Group Contribution to Total YLL

```{r}
#| label: yll-age-attribution

# ─────────────────────────────────────────────────────────────────
# WHICH AGE GROUPS CONTRIBUTE MOST YLL?
# This guides intervention targeting:
#   If children dominate → focus on paediatric prevention
#   If working ages dominate → focus on adult screening and treatment
# ─────────────────────────────────────────────────────────────────
yll_age_paf <- yll_data |>
  group_by(age_group) |>
  summarise(
    YLL = sum(YLL),
    Deaths = sum(deaths),
    .groups = "drop"
  ) |>
  mutate(
    YLL_pct = YLL / sum(YLL) * 100,
    Deaths_pct = Deaths / sum(Deaths) * 100,
    # Ratio: how much of YLL does this age contribute relative to its share of deaths?
    # Ratio > 1 means YLL is disproportionately high (young deaths)
    YLL_Death_ratio = YLL_pct / Deaths_pct
  )

yll_age_paf |>
  kable(
    caption = "YLL and Death Attribution by Age Group — All diseases",
    col.names = c("Age Group", "YLL", "Deaths", "YLL %", "Deaths %",
                  "YLL/Death Ratio"),
    digits = c(0, 0, 0, 1, 1, 2),
    format.args = list(big.mark = ",")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  row_spec(which(yll_age_paf$YLL_Death_ratio > 1.5), background = "#FDECEA")
```

**Interpretation — YLL/Death Ratio:**

- A **YLL/Death ratio > 1** means that
  age group is "over-represented" in YLL relative to its share of deaths. Highlighted rows (ratio > 1.5) indicate age groups where each death removes many remaining life years.
- **Children under 5** have the highest ratio: each death removes ~84 years, so even modest numbers of childhood deaths dominate total YLL.
- **Ages 70–79 and 80+** have a ratio < 1: they account for the majority of deaths but each death removes only 6–16 years, so their share of YLL is proportionally much smaller.
- This is the mathematical justification for prioritising **child health interventions** in YLL-based burden analyses.

## Sex-Stratified DALY Rate Comparison

```{r}
#| label: fig-sex-daly-comparison
#| fig-cap: "Sex-Specific DALY Rate Comparison by Disease"
#| fig-height: 5

# ─────────────────────────────────────────────────────────────────
# DUMBBELL PLOT: Male vs Female DALY rates per disease
# Shows the magnitude and direction of sex inequality in burden
# ─────────────────────────────────────────────────────────────────
sex_daly <- daly_data |>
  group_by(disease_name, sex) |>
  summarise(
    DALY_rate = sum(DALY) / sum(population) * 100000,
    .groups = "drop"
  ) |>
  pivot_wider(names_from = sex, values_from = DALY_rate) |>
  mutate(
    gap = Male - Female,
    dominant = if_else(gap >= 0, "Male higher", "Female higher"),
    # Order diseases once so that every layer shares the same y-axis ordering
    disease_name = fct_reorder(disease_name, Male)
  )

ggplot(sex_daly) +
  geom_segment(aes(x = Female, xend = Male,
                   y = disease_name, yend = disease_name,
                   colour = dominant),
               linewidth = 1.8, alpha = 0.7) +
  geom_point(aes(x = Female, y = disease_name), colour = "#D6604D", size = 4) +
  geom_point(aes(x = Male, y = disease_name), colour = "#2166AC", size = 4) +
  geom_text(aes(x = Male, y = disease_name,
                label = paste0("Δ=", round(abs(gap), 0))),
            nudge_y = 0.28, size = 3, fontface = "italic") +
  scale_colour_manual(
    values = c("Male higher" = "#2166AC", "Female higher" = "#D6604D")
  ) +
  scale_x_continuous(labels = comma) +
  labs(
    title = "DALY Rate Disparity Between Males and Females",
    subtitle = "Dumbbell plot — Blue = Male; Red = Female; Δ = absolute difference",
    x = "DALY Rate per 100,000 Population",
    y = NULL,
    colour = "Which sex has higher burden",
    caption = "Crude rates; each sex's own population is the denominator."
  ) +
  theme_burden() +
  theme(legend.position = "top")
```

**Interpretation — Sex Inequalities:**

- **Road Injuries** show the largest male excess, consistent with global data showing roughly 3:1 male-to-female road mortality ratios in low- and middle-income countries. Interventions targeting young male road users (helmet laws, breath testing, speed enforcement) would close this gap most effectively.
- **HIV/AIDS in this dataset** shows a female excess in DALY rates, reflecting the disproportionate HIV burden on young women in sub-Saharan Africa. This is a policy-critical finding: female-focused PrEP programmes, social protection, and economic empowerment are among the highest-impact responses.
- **Depression** shows a female excess, consistent with the well-established ratio of approximately 2:1 female-to-male in major depressive disorder globally.
- **Cardiovascular diseases** show a male excess, particularly IHD, reflecting earlier age of onset in males and higher hypertension prevalence.

## Cause-Deleted Life Expectancy (Hypothetical Analysis)

```{r}
#| label: cause-deleted-le

# ─────────────────────────────────────────────────────────────────
# CAUSE-DELETED LIFE EXPECTANCY
#
# Question: How much would life expectancy increase if we
#           eliminated each disease?
#
# Approximation: ΔLE ≈ YLL / total_population
# This gives the average life-years gained per person if the
# disease were fully eliminated.
# It is only a rough approximation of a formal cause-deleted
# life table or Arriaga-style decomposition.
# ─────────────────────────────────────────────────────────────────
total_pop <- daly_data |>
  distinct(age_group, sex, disease_name, population) |>
  # Use one disease to avoid duplicate population rows
  filter(disease_name == "Ischemic Heart Disease") |>
  summarise(total_pop = sum(population)) |>
  pull(total_pop)

cause_deleted <- yll_data |>
  group_by(disease_name) |>
  summarise(total_YLL = sum(YLL), .groups = "drop") |>
  mutate(
    delta_LE_years = total_YLL / total_pop,
    delta_LE_days = delta_LE_years * 365.25,
    interpretation = case_when(
      delta_LE_years >= 1 ~ "Major impact (≥1 year gain)",
      delta_LE_years >= 0.25 ~ "Moderate impact (3–12 months gain)",
      TRUE ~ "Modest impact (<3 months gain)"
    )
  ) |>
  arrange(desc(delta_LE_years))

cause_deleted |>
  gt() |>
  tab_header(
    title = "Hypothetical Gain in Life Expectancy if Disease Eliminated",
    subtitle = glue("Estimated based on total YLL and population of {comma(total_pop)}")
  ) |>
  fmt_number(columns = total_YLL, use_seps = TRUE, decimals = 0) |>
  fmt_number(columns = delta_LE_years, decimals = 3) |>
  fmt_number(columns = delta_LE_days, decimals = 1) |>
  cols_label(
    disease_name = "Disease",
    total_YLL = "Total YLL",
    delta_LE_years = "ΔLE (years)",
    delta_LE_days = "ΔLE (days)",
    interpretation = "Impact Level"
  ) |>
  data_color(columns = delta_LE_years, palette = "YlOrRd") |>
  tab_footnote(
    footnote = "ΔLE ≈ YLL / population. Assumes independence of causes (not strictly valid for correlated risks).",
    locations = cells_column_labels(columns = delta_LE_years)
  )
```

**Interpretation — Cause-Deleted Analysis:**

- Eliminating **HIV/AIDS** would yield the largest life expectancy gain, reflecting its concentration of deaths in the 15–44 age group, where remaining life expectancy is still long.
- **Cardiovascular diseases** (IHD + Stroke combined) would provide the second largest gain.
  Even though cardiovascular deaths are concentrated in older ages (reducing per-death YLL), the sheer volume of deaths creates a substantial aggregate impact.
- **Malaria elimination** would provide meaningful but smaller gains in life expectancy: each child death carries a very high YLL, but in this approximation the gain depends on malaria's total YLL, which is spread over the same whole-population denominator as every other cause.
- **Depression** contributes zero ΔLE in this model because it is classified as non-fatal, reinforcing that using LE alone would entirely miss the burden of mental illness.

---

# Policy Implications and Priority Setting {#sec-policy}

## Cost-Effectiveness Framing

```{r}
#| label: priority-matrix

# ─────────────────────────────────────────────────────────────────
# PRIORITY MATRIX
# Combines DALY burden with intervention availability data
# (Notional cost-effectiveness values for illustration)
# ─────────────────────────────────────────────────────────────────
priority_df <- daly_summary |>
  select(disease_name, disease_category, DALY, DALY_rate) |>
  mutate(
    # Notional cost per DALY averted (USD); illustrative values based on DCP3.
    # Values are matched to diseases by position, so their order must
    # follow the row order of daly_summary.
    cost_per_DALY_averted = c(185, 220, 1850, 6200, 4500, 95, 280, 750),
    # Notional intervention strength score (divided by 5 below, so 5 = strongest)
    intervention_strength = c(3, 2, 4, 2, 3, 5, 2, 3),
    priority_score = (DALY / max(DALY)) * (1000 / cost_per_DALY_averted) * (intervention_strength / 5),
    priority_tier = cut(priority_score,
                        breaks = quantile(priority_score, c(0, 0.33, 0.67, 1)),
                        labels = c("Low Priority", "Medium Priority", "High Priority"),
                        include.lowest = TRUE)
  )

ggplot(priority_df,
       aes(x = DALY / 1e3,
           y = cost_per_DALY_averted,
           size = DALY_rate,
           colour = priority_tier)) +
  geom_point(alpha = 0.80) +
  geom_text_repel(aes(label = disease_name), size = 3.2,
                  box.padding = 0.5, max.overlaps = 15) +
  scale_y_log10(labels = dollar) +
  scale_size_continuous(name = "DALY Rate/100k", range = c(3, 14)) +
  scale_colour_manual(
    name = "Priority Tier",
    values = c("High Priority" = "#D73027",
               "Medium Priority" = "#F46D43",
               "Low Priority" = "#74ADD1")
  ) +
  labs(
    title = "Disease Priority Matrix: DALY Burden vs Cost-Effectiveness",
    subtitle = "High burden + low cost per DALY averted = highest priority (illustrative data)",
    x = "Total DALY (thousands)",
    y = "Cost per DALY Averted (USD, log scale)",
    caption = "Cost-per-DALY values are illustrative, based on DCP3 order-of-magnitude estimates."
  ) +
  theme_burden()
```

**Interpretation — Priority Matrix:**

- **Diseases in the lower-right quadrant** (high DALY burden, low cost per DALY averted) represent the strongest case for immediate investment. Malaria and lower respiratory infections (LRI) typically occupy this space because bed nets, vaccines, and antibiotics are highly cost-effective.
- **HIV/AIDS** has high DALY but moderate cost-effectiveness: ART is effective but expensive at scale, placing it in the medium-priority tier in this simplified model. Prevention (PrEP, condom promotion) would shift it toward higher priority.
- **Cardiovascular and diabetes interventions** are often more expensive per DALY averted due to the need for lifelong medication and monitoring, placing them in the upper portions of the plot.
- This matrix is a **starting point for investment cases**, not a final answer.
  Equity considerations, political feasibility, and co-benefits must also inform resource allocation.

---

# Summary Statistics and Final Table {#sec-summary}

```{r}
#| label: final-summary-table

# ─────────────────────────────────────────────────────────────────
# COMPREHENSIVE FINAL SUMMARY TABLE
# Suitable for a national health report or policy brief appendix
# ─────────────────────────────────────────────────────────────────
final_table <- daly_data |>
  group_by(disease_name, disease_category, sex) |>
  summarise(
    Deaths = sum(deaths),
    YLL = sum(YLL),
    YLD = sum(YLD),
    DALY = sum(DALY),
    DALY_rate = sum(DALY) / sum(population) * 100000,
    YLL_frac = round(sum(YLL) / sum(DALY) * 100, 1),
    .groups = "drop"
  ) |>
  arrange(disease_category, disease_name, sex)

# Render as interactive DataTable for exploration
datatable(
  final_table |>
    mutate(
      across(c(Deaths, YLL, YLD, DALY), ~comma(round(.))),
      DALY_rate = round(DALY_rate, 1),
      YLL_frac = paste0(YLL_frac, "%")
    ),
  colnames = c("Disease", "Category", "Sex", "Deaths", "YLL", "YLD",
               "DALY", "DALY Rate/100k", "YLL%"),
  filter = "top",
  extensions = "Buttons",
  options = list(
    pageLength = 16,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel"),
    scrollX = TRUE
  ),
  caption = "Full DALY Results Table — Kenya 2019 (Interactive: search, sort, export)"
)
```

---

# Methodological Notes {#sec-methods}

## Key Analytical Assumptions

```{r}
#| label: assumptions-table
#| echo: false

tibble(
  Assumption = c(
    "Standard Life Expectancy",
    "YLD Formula",
    "Age at Death Proxy",
    "Population Denominator",
    "Discounting",
    "Age-weighting"
  ),
  Value_Used = c(
    "86.0 years (GBD 2019 standard frontier)",
    "Incidence × Disability Weight × Duration",
    "Mid-point of age group when not recorded",
    "Population size per age-sex stratum",
    "No discounting applied (consistent with GBD 2010+)",
    "No age-weighting (GBD 2010+ methodology)"
  ),
  Justification = c(
    "Represents the ideal life expectancy achievable — aspiration rather than current LE",
    "Preferred over prevalence-based when incidence data available",
    "Standard epidemiological convention for grouped data",
    "Ensures rates are comparable across strata of different sizes",
    "GBD dropped discounting in 2010 so that future years of healthy life are valued equally with present years",
    "GBD dropped age-weighting in 2010 to avoid systematically undervaluing children/elderly"
  )
) |>
  kable(caption = "Analytical Assumptions and Justifications") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE)
```

## Limitations

1. **Simulated data**: This dataset is designed for methodological illustration. Real GBD analyses use cause-of-death models, DisMod-MR for disease modelling, and systematic reviews rather than single point estimates.
2. **Independence assumption**: The cause-deleted LE analysis assumes diseases are independent, which is not true (e.g., diabetes increases cardiovascular and infectious disease risk simultaneously).
3. **Comorbidity not modelled**: Individuals with multiple conditions experience disability from each, but joint disability weights are not additive; the GBD uses a multiplicative correction.
4. **No uncertainty intervals**: Real GBD estimates include 95% uncertainty intervals derived from Monte Carlo simulation of input uncertainty.

---

# References and Further Reading {#sec-refs}

- **GBD 2019 Diseases and Injuries Collaborators** (2020). Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019. *The Lancet*, 396(10258), 1204–1222.
- **Murray CJL, Lopez AD** (1996). *The Global Burden of Disease*. Harvard School of Public Health / WHO.
- **Salomon JA et al.** (2012). Disability weights for the Global Burden of Disease 2010 study. *The Lancet*, 380(9859), 2129–2143.
- **Institute for Health Metrics and Evaluation (IHME)**. GBD Compare tool. http://vizhub.healthdata.org/gbd-compare
- **Disease Control Priorities 3rd Edition (DCP3)**. http://dcp-3.org

```{r}
#| label: session-info
#| collapse: true

sessionInfo()
```
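
**Supplementary sketch: multiplicative comorbidity correction.** The third limitation listed under Limitations notes that joint disability weights are combined multiplicatively rather than added. A minimal sketch of that idea follows. It assumes the conditions act independently; the helper name `combine_dw()` and the two example weights are hypothetical, chosen only for illustration and not taken from this dataset or from the GBD disability-weight tables.

```{r}
#| label: comorbidity-sketch

# Hypothetical helper: combine disability weights (each in [0, 1])
# multiplicatively, i.e. DW_combined = 1 - prod(1 - DW_i)
combine_dw <- function(dw) {
  stopifnot(is.numeric(dw), all(dw >= 0 & dw <= 1))
  1 - prod(1 - dw)
}

# Illustrative (made-up) weights for two co-occurring conditions
dw_example <- c(0.10, 0.20)

combine_dw(dw_example)  # multiplicative: 1 - (0.90 * 0.80) = 0.28
sum(dw_example)         # naive additive total, shown for comparison
```

The multiplicative form keeps the combined weight at or below 1 however many conditions are stacked, which simple addition does not guarantee.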