ART Anova Analysis on the global AMR Burden Rate

Methodology

The metric measures that are used to describe the burden rate of AMR are:
- Attributable Death Rate
- Associated Death Rate
- Attributable Disability-Adjusted Life Years(DALY) rate
- Associated DALY rate
The meaning of terminology
- Attributable (to AMR): Refers to deaths or health outcomes directly caused by antimicrobial resistance, meaning they would not have occurred if the infection had been drug-susceptible.
- Associated (with AMR): Refers to deaths or health outcomes that occur in the presence of AMR, regardless of whether resistance was the direct cause.

The process of data extraction

Grouping WHO priority Pathogen

The data on these burden rates were extracted from the GBD Microbe website, which provides estimates of the global burden of antimicrobial resistance (AMR). This platform reports the mortality and DALY (Disability-Adjusted Life Years) rates caused by various drug-resistant pathogens. For this study, we used data categorized by the WHO Priority Pathogen List—which classifies pathogens into critical, high, and medium priority groups—across 204 countries and territories.

att_burden_rate <- read.csv("C:/Users/symou/Desktop/Research_AMR/Research_AMR/amr_burden/att_death_rate/microbe_download.csv")
att_burden_rate <- data.frame (head(att_burden_rate))
print(att_burden_rate[c("Location", "Age", "Metric", "Pathogen", "Antibiotic.class", "Counterfactual", "Year", "Value")])

##      Location              Age          Metric                Pathogen
## 1 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
## 2 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
## 3 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
## 4 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
## 5 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
## 6 Afghanistan Age-standardized Rate (per 100k) Acinetobacter baumannii
##   Antibiotic.class Counterfactual Year    Value
## 1      Carbapenems   Attributable 1990 2.274598
## 2      Carbapenems   Attributable 1991 2.296411
## 3      Carbapenems   Attributable 1992 2.470749
## 4      Carbapenems   Attributable 1993 2.471484
## 5      Carbapenems   Attributable 1994 2.619534
## 6      Carbapenems   Attributable 1995 2.539058

The downloaded data for each country includes not only the age-standardized death rate caused by each pathogen but also the total number of deaths attributed to each corresponding pathogen. The aim is to categorize these burden rates into three groups—critical, high, and medium priority pathogens—according to the WHO Priority Pathogen List.
To compute the overall burden rate for each group, a weighted average is used. The weight for each pathogen is based on the total number of deaths it caused. The formula for calculating the weighted average burden rate of the critical group is as follows:

\[ \text{Critical Group Burden Rate} = \frac{\sum_{i} \left( \text{BurdenRate}_i \times \text{TotalDeaths}_i \right)}{\sum_{i} \text{TotalDeaths}_i} \]

Where:
- i refers to each pathogen in the critical group
- Burden rate i, is the age-standardized death rate from pathogen i
- Total Deaths, is the number of deaths caused by pathogen i

library (dplyr)
library (tibble)

critical_priority_pairs <- tibble(
   Pathogen = c(
      "Klebsiella pneumoniae", "Escherichia coli", "Acinetobacter baumannii",
      "Mycobacterium tuberculosis", "Mycobacterium tuberculosis", "Mycobacterium tuberculosis",
      "Escherichia coli", "Klebsiella pneumoniae", "Enterobacter spp.", "Citrobacter spp.",
      "Proteus spp.", "Serratia spp.", "Morganella spp."
   ),
   Antibiotic.class = c(
      "Carbapenems", "Third-generation cephalosporins", "Carbapenems",
      "Resistance to one or more antibiotics", "Extensive drug resistance in TB",
      "Multi-drug resistance excluding extensive drug resistance in TB",
      "Carbapenems", "Third-generation cephalosporins", "Carbapenems", "Third-generation cephalosporins",
      "Third-generation cephalosporins", "Third-generation cephalosporins", "Third-generation cephalosporins"
   )
)

# High Priority bug-drug pairs
high_priority_pairs <- tibble(
   Pathogen = c(
      "Salmonella enterica serovar Typhi", "Shigella spp.", "Enterococcus faecium",
      "Pseudomonas aeruginosa", "Non-typhoidal Salmonella", "Neisseria gonorrhoeae",
      "Staphylococcus aureus", "Neisseria gonorrhoeae"
   ),
   Antibiotic.class = c(
      "Fluoroquinolones", "Fluoroquinolones", "Vancomycin", "Carbapenems",
      "Fluoroquinolones", "Fluoroquinolones", "Methicillin", "Third-generation cephalosporins"
   )
)

# Medium Priority bug-drug pairs
medium_priority_pairs <- tibble(
   Pathogen = c(
      "Group A Streptococcus", "Streptococcus pneumoniae",
      "Haemophilus influenzae", "Group B Streptococcus"
   ),
   Antibiotic.class = c(
      "Macrolides", "Macrolides", "Aminopenicillin", "Penicillin"
   )
)

Matching a SDI value and SDI category to each country

The Social Demographic Index (SDI) is a score made by the Global Burden of Disease study to show how developed a country is. It looks at things like income, education, and birth rates and puts countries into five groups: low, low-middle, middle, middle-high, and high.
Since SDI can change every year, I used the 2021 SDI value to keep things consistent and group countries for the whole period from 1990 to 2021.
Then, for each country, I averaged the burden rates of the WHO’s critical pathogens over the years. This gave me one number per country that sums up the burden of antimicrobial resistance, which I used for further analysis to answer the research objective

library(dplyr)

country_list <- read.csv("country_list.csv", fileEncoding = "Windows-1254", stringsAsFactors = FALSE, header = FALSE)
colnames(country_list) <- "location_name"

sdi_value <- read.csv("sdi_value.csv", fileEncoding = "Windows-1254", stringsAsFactors = FALSE)
quintile_sdi_value <- read.csv ("sdi_quintile_ref.csv")

df_sdi_val <- country_list %>% 
   inner_join(
      sdi_value %>% 
         filter (year_id == 2021) %>% 
       select (location_name, year_id, mean_value),
      by = "location_name"
   ) %>%
   mutate(
      sdi_category = case_when(
         mean_value >= 0 & mean_value < 0.46581580319161997 ~ "low",
         mean_value >= 0.46581580319161997 & mean_value < 0.6188294452454329 ~ "middle_low",
         mean_value >= 0.6188294452454329 & mean_value < 0.7119746219361235 ~ "middle",
         mean_value >= 0.7119746219361235 & mean_value < 0.8102959891918925 ~ "middle_high",
         mean_value >= 0.8102959891918925 ~ "high"
      )
   )

# JOIN THE SDI CATEGORY TO THE 4 METRIC OF AMR BURDEN DATAFRAME FOR FURTHER STATISTICAL ANALYSIS

## write a function that join the sdi category to each country. Then aggregate the data
agg_with_sdi <- function(metric_df, sdi_df,
                        location_col_metric = "Location",
                        location_col_sdi = "location_name",
                        rate_col = "weighted_avg_rate",
                        group_cols = c("Location", "priority_group", "sdi_category")) {
  
  # join SDI to metric dataframe by location columns
  joined_df <- metric_df %>%
     left_join(sdi_df, by = setNames(location_col_sdi, location_col_metric))
  
  # Aggregate mean rate by group
  aggregated_df <- joined_df %>%
     group_by(across(all_of(group_cols))) %>%
     summarise(mean_rate = mean(.data[[rate_col]], na.rm = TRUE),
               mean_sdi = mean(mean_value, na.rm = TRUE),
               .groups = "drop")
}

The final data should be looking like this:

final_df <- read.csv("C:/Users/symou/Desktop/Research_AMR/Research_AMR/agg_priority_group_att_death_rate.csv")
print(head(final_df))

##      Location priority_group sdi_category mean_rate  mean_sdi
## 1 Afghanistan       Critical          low 2.2511523 0.3372000
## 2 Afghanistan           High          low 2.0505373 0.3372000
## 3 Afghanistan         Medium          low 0.4975396 0.3372000
## 4     Albania       Critical       middle 0.4635214 0.7068498
## 5     Albania           High       middle 0.7081419 0.7068498
## 6     Albania         Medium       middle 0.0735871 0.7068498

Data Analysis

Since our main objective was to examine the effects of WHO priority group, SDI category, and their interaction on a continuous outcome variable (burden rate), we initially applied a two-way ANOVA. However, the assumptions of ANOVA were violated — specifically, the residuals were not normally distributed, and there was unequal variance across groups.

To address these issues while still testing the interaction between the two categorical factors, we used the Aligned Rank Transformation (ART) ANOVA. The ART method allows for nonparametric analysis of factorial designs by first aligning the data to isolate each main effect and interaction, then applying a rank transformation. This makes it possible to perform ANOVA-like tests without requiring normality or equal variance.

Additionally, because the data included repeated measures across multiple countries, and countries could differ in baseline burden levels, we used a mixed-effects model to account for the random effect of country. This helps control for variability between countries that is not explained by the fixed effects (priority group and SDI). For example, countries vary in healthcare access, antibiotic usage policies, disease surveillance systems, and public health infrastructure. These differences can influence the overall AMR burden, regardless of the WHO priority group or SDI category.

After fitting the ART ANOVA, we used a linear model on the rank-transformed data to estimate the group effects. From this model, we calculated estimated marginal means (EMMs), which represent the adjusted average rank for each combination of SDI category and WHO priority group. This step helps us interpret the group differences while accounting for the other variable and the random effect of country. The EMMs make it easier to compare groups even in the presence of non-normal data and unequal variance, as they are based on the fitted model of aligned ranks rather than raw values.

write a function for fitting mode

library (ARTool)
library (emmeans)
library (dplyr)

# Using mixed-effects model to account for fixed effects of priority group and SDI category,
# while modeling random variability across countries (Location) as random intercepts.
## How do WHO priority pathogen groups and SDI categories affect AMR attributable death rates across countries, 
## accounting for country-level variability


# create a function to run an ART anova
run_art_analysis <- function(data) {
   # ensure categorical variables are treated as factor
   data <- data %>%
      mutate(across(c(Location, priority_group, sdi_category), as.factor))
   
   # fit ART model with mixed effect for location by + (1|Location)
   art_model <- art(mean_rate ~ priority_group * sdi_category + (1|Location), data = data)
   print(anova(art_model))
   
   # post-hoc for priority_group variable by estimated marginal means of aligned from linear model
   emm_model_priority <- artlm(art_model, "priority_group")
   emm_priority <- emmeans (emm_model_priority, pairwise ~ priority_group, adjust = "bonferroni")
   
   # post-hoc for sdi_category
   emm_model_sdi <- artlm(art_model, "sdi_category")
   emm_sdi <- emmeans(emm_model_sdi, pairwise ~ sdi_category, adjust = "bonferroni")
   
   # Interaction of two variables
   emm_model_interaction <- artlm(art_model, "priority_group:sdi_category")
   emm_interaction <- emmeans (emm_model_interaction, ~ priority_group | sdi_category)
   interaction_pairs <- pairs (emm_interaction, adjust = "bonferroni")
   
   # return all results as a list
   return (list (
      priority_group = list (emmeans = emm_priority$emmeans, contrasts = emm_priority$contrasts),
      sdi_group = list (emmeans = emm_sdi$emmeans, contrasts = emm_sdi$contrasts),
      interaction = list (emmeans = emm_interaction, contrasts = interaction_pairs)
   ))
}

Applying a function to attributable death rate dataframe

att_asdr_art <- run_art_analysis(final_df)

## Analysis of Variance of Aligned Rank Transformed Data
## 
## Table Type: Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) 
## Model: Mixed Effects (lmer)
## Response: art(mean_rate)
## 
##                                     F Df Df.res     Pr(>F)    
## 1 priority_group              352.452  2    398 < 2.22e-16 ***
## 2 sdi_category                 27.498  4    199 < 2.22e-16 ***
## 3 priority_group:sdi_category  16.588  8    398 < 2.22e-16 ***
## ---
## Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

The result showed that there is a significant different of death rate among different pathogen priority group and sdi_category. There is also a significant interaction of the two variable affecting the death rate

# Post Hoc result of variable priority_group
list(
    Emms_path = att_asdr_art$priority_group$emmeans,
    Contrasts_path = att_asdr_art$priority_group$contrasts)

## $Emms_path
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          370 8.99 543      353      388
##  High              412 8.99 543      394      429
##  Medium            136 8.99 543      118      153
## 
## Results are averaged over the levels of: sdi_category 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $Contrasts_path
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High      -41.3 11.2 398  -3.689  0.0008
##  Critical - Medium    234.5 11.2 398  20.925  <.0001
##  High - Medium        275.9 11.2 398  24.615  <.0001
## 
## Results are averaged over the levels of: sdi_category 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 3 tests

# Post Hoc result of variable sdi_category
list(
    Emms_sdi = att_asdr_art$sdi_group$emmeans,
    Contrasts_sdi = att_asdr_art$sdi_group$contrasts)

## $Emms_sdi
##  sdi_category emmean   SE  df lower.CL upper.CL
##  high            158 16.9 199      124      191
##  low             380 18.8 199      343      417
##  middle          326 17.1 199      292      359
##  middle_high     310 16.3 199      278      342
##  middle_low      373 16.9 199      340      406
## 
## Results are averaged over the levels of: priority_group 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $Contrasts_sdi
##  contrast                 estimate   SE  df t.ratio p.value
##  high - low                -222.19 25.2 199  -8.807  <.0001
##  high - middle             -168.07 24.0 199  -7.000  <.0001
##  high - middle_high        -152.43 23.5 199  -6.497  <.0001
##  high - middle_low         -215.45 23.9 199  -9.028  <.0001
##  low - middle                54.11 25.4 199   2.133  0.3413
##  low - middle_high           69.75 24.9 199   2.807  0.0550
##  low - middle_low             6.73 25.2 199   0.267  1.0000
##  middle - middle_high        15.64 23.6 199   0.662  1.0000
##  middle - middle_low        -47.38 24.0 199  -1.973  0.4984
##  middle_high - middle_low   -63.02 23.5 199  -2.686  0.0785
## 
## Results are averaged over the levels of: priority_group 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 10 tests

# Post Hoc result of the interaction between two variables
list(
    Emms_int = att_asdr_art$interaction$emmeans,
    Contrasts_int = att_asdr_art$interaction$contrasts)

## $Emms_int
## sdi_category = high:
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          161 25.1 534      111      210
##  High              366 25.1 534      316      415
##  Medium            434 25.1 534      385      483
## 
## sdi_category = low:
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          346 27.9 534      291      401
##  High              179 27.9 534      125      234
##  Medium            306 27.9 534      251      361
## 
## sdi_category = middle:
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          285 25.4 534      236      335
##  High              360 25.4 534      310      410
##  Medium            322 25.4 534      272      372
## 
## sdi_category = middle_high:
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          255 24.3 534      208      303
##  High              393 24.3 534      346      441
##  Medium            319 24.3 534      271      367
## 
## sdi_category = middle_low:
##  priority_group emmean   SE  df lower.CL upper.CL
##  Critical          339 25.1 534      289      388
##  High              222 25.1 534      173      271
##  Medium            292 25.1 534      242      341
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $Contrasts_int
## sdi_category = high:
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High     -205.0 30.9 398  -6.632  <.0001
##  Critical - Medium   -273.2 30.9 398  -8.840  <.0001
##  High - Medium        -68.3 30.9 398  -2.209  0.0833
## 
## sdi_category = low:
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High      166.6 34.4 398   4.851  <.0001
##  Critical - Medium     40.2 34.4 398   1.171  0.7266
##  High - Medium       -126.4 34.4 398  -3.680  0.0008
## 
## sdi_category = middle:
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High      -74.3 31.3 398  -2.376  0.0539
##  Critical - Medium    -36.2 31.3 398  -1.159  0.7419
##  High - Medium         38.1 31.3 398   1.218  0.6720
## 
## sdi_category = middle_high:
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High     -138.2 29.9 398  -4.630  <.0001
##  Critical - Medium    -63.6 29.9 398  -2.131  0.1012
##  High - Medium         74.6 29.9 398   2.499  0.0386
## 
## sdi_category = middle_low:
##  contrast          estimate   SE  df t.ratio p.value
##  Critical - High      116.9 30.9 398   3.782  0.0005
##  Critical - Medium     47.0 30.9 398   1.521  0.3874
##  High - Medium        -69.9 30.9 398  -2.262  0.0728
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 3 tests

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.3

interaction_emm <- att_asdr_art$interaction$emmeans
interaction_df <- as.data.frame(interaction_emm)

ggplot(interaction_df, aes(x = priority_group, y = emmean, fill = sdi_category)) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
    geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                 position = position_dodge(width = 0.8), width = 0.2) + 
    scale_fill_brewer(palette = "Pastel1") +
    labs(title = "Interaction: Priority Group × SDI Category",
        x = "Priority Group", y = "Estimated Marginal Mean (Aligned)",
        fill = "SDI Category") +
    theme_minimal()

QUESTION

emm_raw_comparision - I am still learning the intuition of Aligned Rank Transformation ANOVA, so I am not sure if I do the analysis correctly or not.
- What is surprising me that how could the estimated marginal mean differ so much from the raw mean. I understand that this aligned ranked value of the data is used to account for the difference of base line for each country then fitting the linear model on that rank to have estimated mean. However, I am not sure if the process is 100 percent correct.
- That is why I would like to have your opinion, if the methodology I am using, the code that I wrote is seemingly correct.
- If you would have a time, please have a look and check for me please professor
- Thank you in advance!