Introduction

Measles is an extremely contagious viral respiratory infection. It is transmitted through airborne droplets, such as those produced by coughing and sneezing. Symptoms typically include fever, cough, runny nose, and a red rash. If its symptoms are left unmanaged, measles can lead to serious complications such as pneumonia and encephalitis. There is no specific antiviral treatment for measles, but its symptoms can be managed, which helps lower the risk of complications. Measles is highly preventable through the MMR (measles, mumps, and rubella) vaccine.

The coronavirus disease (COVID-19) pandemic disrupted public health in many ways. It interfered with healthcare access, routine vaccination programs, and lab testing. People were required to stay home, leading to school closures, limited travel, and reduced public activity, which likely limited the spread of infectious diseases such as measles. These combined changes may have played a significant role in lowering reported measles cases during the pandemic compared to previous years. This case study examines how measles cases in the Americas differed between a pre-pandemic year (2019) and a pandemic year (2022). Lockdown measures were generally strictest in 2020, so 2022 reflects a later stage of the pandemic rather than its first year.

Load Libraries & Data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.1     ✔ readr     2.1.6
## ✔ ggplot2   4.0.1     ✔ stringr   1.6.0
## ✔ lubridate 1.9.5     ✔ tibble    3.3.1
## ✔ purrr     1.2.1     ✔ tidyr     1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at /Users/anthonyrodriguez/Documents/Biostatitics Spring 2026/ABDLabs
library(readxl)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
library(ggplot2)
library(ggpubr)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
cases_month <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-24/cases_month.csv')
## Rows: 22780 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): region, country, iso3
## dbl (12): year, month, measles_suspect, measles_clinical, measles_epi_linked...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cases_year <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-24/cases_year.csv')
## Rows: 2382 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): region, country, iso3
## dbl (16): year, total_population, annualized_population_most_recent_year_onl...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The monthly data set has 22,780 rows, which makes it hard to read as a whole. We need to filter it down to the most important pieces we will be using so that we are not working with the entire data set at every step.
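One way to narrow the data down is sketched below. It uses a small made-up table in place of cases_month (the counts are illustrative, not real values) and assumes the column names region, country, year, month, and measles_total that appear in the code in this report.

```r
library(dplyr)

# Toy stand-in for cases_month; the counts are made up for illustration.
toy_cases <- data.frame(
  region        = c("AMR", "AMR", "EUR", "AMR", "AMR"),
  country       = c("Brazil", "Brazil", "France", "Canada", "Canada"),
  year          = c(2019, 2022, 2019, 2019, 2020),
  month         = c(1, 1, 1, 2, 2),
  measles_total = c(10, 0, 5, 3, 1)
)

# Keep only the Americas, the two study years, and the columns used later.
amr_cases <- toy_cases %>%
  filter(grepl("AMR", region), year %in% c(2019, 2022)) %>%
  select(country, year, month, measles_total)

amr_cases
```

Filtering once like this would leave a single, smaller table that both the 2019 and 2022 summaries could start from.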

Filter Data by Year Separately into Countries and Seasons

cases_2019 <- filter(cases_month, year == 2019) # year is stored as a number, so compare it to a number
measles_by_season_AMR_2019 <- cases_2019 %>% # this represents measles cases by season  in 2019  
  filter(grepl("AMR", region)) %>%
  mutate(season = case_when(
    month %in% c(12, 1, 2)  ~ "Winter",
    month %in% c(3, 4, 5)   ~ "Spring",
    month %in% c(6, 7, 8)   ~ "Summer",
    month %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  )) %>%
  group_by(season) %>%   
  summarise(
    mean_measles = mean(measles_total, na.rm = TRUE),
    sd_measles   = sd(measles_total, na.rm = TRUE),
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  )
measles_by_season_AMR_2019
## # A tibble: 4 × 5
##   season mean_measles sd_measles     n se_measles
##   <chr>         <dbl>      <dbl> <int>      <dbl>
## 1 Fall          143.       762.     77      86.9 
## 2 Spring         15.7       51.9    83       5.70
## 3 Summer        122.       685.     76      78.5 
## 4 Winter         22.4      122.     76      14.0
measles_by_country_AMR_2019 <- cases_2019 %>%
  filter(grepl("AMR", region)) %>%
  group_by(country) %>%          
  summarise(
    mean_measles = mean(log(measles_total + 1), na.rm = TRUE),
    sd_measles   = sd(log(measles_total + 1), na.rm = TRUE),    
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  )
measles_by_country_AMR_2019 
## # A tibble: 33 × 5
##    country                          mean_measles sd_measles     n se_measles
##    <chr>                                   <dbl>      <dbl> <int>      <dbl>
##  1 Antigua and Barbuda                     0          0         2      0    
##  2 Argentina                               1.51       1.50     12      0.433
##  3 Bahamas                                 0.358      0.511     5      0.229
##  4 Barbados                                0          0         8      0    
##  5 Belize                                  0          0         8      0    
##  6 Bolivia (Plurinational State of)        0          0        12      0    
##  7 Brazil                                  6.07       2.20     12      0.636
##  8 Canada                                  2.23       0.899    10      0.284
##  9 Chile                                   0.530      0.513    12      0.148
## 10 Colombia                                2.97       0.454    12      0.131
## # ℹ 23 more rows
cases_2022 <- filter(cases_month, year == 2022) # year is stored as a number, so compare it to a number
measles_by_season_AMR_2022 <- cases_2022 %>% # this represents measles cases by season for 2022 and not as a whole
  filter(grepl("AMR", region)) %>%
  mutate(season = case_when(
    month %in% c(12, 1, 2)  ~ "Winter",
    month %in% c(3, 4, 5)   ~ "Spring",
    month %in% c(6, 7, 8)   ~ "Summer",
    month %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_
  )) %>%
  group_by(season) %>%   # group by season only
  summarise(
    mean_measles = mean(measles_total, na.rm = TRUE),
    sd_measles   = sd(measles_total, na.rm = TRUE),
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  )
measles_by_season_AMR_2022
## # A tibble: 4 × 5
##   season mean_measles sd_measles     n se_measles
##   <chr>         <dbl>      <dbl> <int>      <dbl>
## 1 Fall          1.32       7.50     62      0.953
## 2 Spring        0.538      2.17     65      0.269
## 3 Summer        0.188      0.957    64      0.120
## 4 Winter        0.690      3.67     58      0.481
measles_by_country_AMR_2022 <- cases_2022 %>%
  filter(grepl("AMR", region)) %>%
  group_by(country) %>%          
  summarise(
    mean_measles = mean(log(measles_total + 1), na.rm = TRUE),
    sd_measles   = sd(log(measles_total + 1), na.rm = TRUE),
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  )
measles_by_country_AMR_2022
## # A tibble: 27 × 5
##    country                          mean_measles sd_measles     n se_measles
##    <chr>                                   <dbl>      <dbl> <int>      <dbl>
##  1 Argentina                               0.116      0.270    12     0.0779
##  2 Bahamas                                 0         NA         1    NA     
##  3 Barbados                                0          0         4     0     
##  4 Belize                                  0          0         5     0     
##  5 Bolivia (Plurinational State of)        0          0        12     0     
##  6 Brazil                                  0.963      1.10     12     0.319 
##  7 Canada                                  0.693      0         3     0     
##  8 Chile                                   0          0        12     0     
##  9 Colombia                                0          0        12     0     
## 10 Costa Rica                              0          0        11     0     
## # ℹ 17 more rows
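The four summaries above repeat the same season labels and the same mean, SD, n, and SE calculation. A minimal sketch of how that logic could be wrapped in two helper functions is shown below. The names add_season and summarise_measles are hypothetical (they are not part of this report or of any package), and the toy data is made up.

```r
library(dplyr)

# Hypothetical helper: label each month with a season (same mapping as above).
add_season <- function(df) {
  df %>%
    mutate(season = case_when(
      month %in% c(12, 1, 2)  ~ "Winter",
      month %in% c(3, 4, 5)   ~ "Spring",
      month %in% c(6, 7, 8)   ~ "Summer",
      month %in% c(9, 10, 11) ~ "Fall",
      TRUE ~ NA_character_
    ))
}

# Hypothetical helper: mean, SD, n, and SE of measles_total within groups.
summarise_measles <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarise(
      mean_measles = mean(measles_total, na.rm = TRUE),
      sd_measles   = sd(measles_total, na.rm = TRUE),
      n            = sum(!is.na(measles_total)),
      se_measles   = sd_measles / sqrt(n),
      .groups = "drop"
    )
}

# Made-up monthly counts, including one missing value.
toy <- data.frame(
  month         = c(1, 2, 4, 5, 7, 8, 10, 11),
  measles_total = c(2, 4, 0, 1, 10, NA, 6, 8)
)

season_summary <- toy %>% add_season() %>% summarise_measles(season)
season_summary
```

Two things are worth noticing. A group with only one non-missing value gets NA for its SD and SE, which is what happened to the Bahamas in the 2022 table. Also, because each year is summarised on its own, December is grouped with January and February of the same calendar year.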

Plot these summaries to see whether anything needs to be transformed or filtered further.

ggplot(measles_by_season_AMR_2019, aes(x = season, y = mean_measles, color = season)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_measles - se_measles, ymax = mean_measles + se_measles),
                width = 0.2) +
  labs(title = "Mean Measles Cases by Season in the Americas, 2019",
       x = "Season",
       y = "Mean Measles Cases") +
  theme_classic() 

Figure 1. Seasonal Measles Cases in the Americas in 2019

This figure displays the mean measles cases by season in the Americas in 2019. Each season is represented by a single point, which shows the average number of measles cases for that season, and the error bars show one standard error above and below the mean. Fall has the highest mean, followed by summer. Winter and spring had much lower means, with spring being the lowest. The error bars for fall and summer are much wider, which indicates greater variation in case counts in these seasons. The error bars for winter and spring are much narrower, which indicates lower and more stable measles case counts.
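To see how wide the Fall uncertainty really is, the standard error can be turned into an approximate 95% confidence interval. The sketch below copies the rounded Fall values from the 2019 summary table and assumes a t-based interval is a reasonable approximation, which is doubtful for counts this skewed.

```r
# Rounded values copied from the Fall row of the 2019 summary table.
mean_fall <- 143
se_fall   <- 86.9
n_fall    <- 77

# t-based 95% confidence interval: mean +/- t * SE, with n - 1 degrees of freedom.
t_crit   <- qt(0.975, df = n_fall - 1)
ci_lower <- mean_fall - t_crit * se_fall
ci_upper <- mean_fall + t_crit * se_fall

c(lower = ci_lower, upper = ci_upper)
```

The lower limit falls below zero, which is impossible for case counts. That is a sign that a few very large monthly values are driving the Fall mean, so the seasonal means should be read with caution.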

measles_by_country_AMR_2019 <- cases_2019 %>%
  filter(grepl("AMR", region)) %>%
  group_by(country) %>%
  summarise(
    mean_measles = mean(measles_total, na.rm = TRUE),
    sd_measles   = sd(measles_total, na.rm = TRUE),
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  ) %>%
  mutate(region_americas = case_when(
    country %in% c("Canada", "United States of America") ~ "North America",
    
    country %in% c("Mexico", "Guatemala", "Belize", "Honduras",
                   "El Salvador", "Nicaragua", "Costa Rica",
                   "Panama") ~ "Central America",
    
    country %in% c("Cuba", "Jamaica", "Haiti", "Dominican Republic",
                   "Puerto Rico", "Trinidad and Tobago", "Barbados",
                   "Bahamas", "Martinique", "Guadeloupe",
                   "Saint Lucia", "Grenada", "Aruba",
                   "Antigua and Barbuda", "Dominica",
                   "Saint Vincent and the Grenadines",
                   "Saint Kitts and Nevis") ~ "Caribbean",
    
    country %in% c("Colombia", "Venezuela (Bolivarian Republic of)", "Ecuador", "Peru",
                   "Brazil", "Bolivia (Plurinational State of)", "Chile", "Argentina",
                   "Uruguay", "Paraguay", "Guyana",
                   "Suriname") ~ "South America",
    
    TRUE ~ NA_character_
  ))
# Plot the country means, using facet_wrap to split them by region within the Americas
ggplot(measles_by_country_AMR_2019, aes(x = country, y = mean_measles, color = region_americas)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_measles - se_measles, ymax = mean_measles + se_measles),
                width = 0.2) +
  labs(title = "Measles Cases by Country in the Americas, 2019",
       x = "Country",
       y = "Mean Measles Cases") + # mean_measles was recomputed above without the log transform
  theme_classic() + 
  facet_wrap(~region_americas, scales = "free_x") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(measles_by_country_AMR_2019, aes(x = country, y = mean_measles, color = region_americas)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_measles - se_measles, ymax = mean_measles + se_measles),
                width = 0.2) +
  labs(title = "Average Number of Measles Cases by Country in the Americas, 2019",
       x = "Country",
       y = "Average Number of Measles Cases") +
  theme_classic() + 
  facet_wrap(~region_americas, scales = "free") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 2. Measles Cases by Country for the Americas in 2019

This figure shows the average number of measles cases by country in the Americas in 2019. The countries are grouped into the following regions: Caribbean, Central America, North America, and South America. Most countries, especially those in the Caribbean and Central America, report mean measles case values that are very low or near zero. This is not the case for North and South America. While most countries in South America have an average near 0, Brazil's average is about 1700, a clear spike in measles cases. In North America, only the United States shows an elevated average, about 120 cases. That is far lower than Brazil's but well above the near-zero averages of most other countries in the Americas.
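Because Brazil's average is so much larger than every other country's, a log(x + 1) scale, like the one used in the first country summary, can make the smaller values easier to compare. The sketch below uses illustrative numbers loosely based on the values read off the figure; the two lettered countries are hypothetical.

```r
# Illustrative country means; country_a and country_b are hypothetical.
mean_cases <- c(country_a = 0, country_b = 2, usa = 120, brazil = 1700)

# log1p(x) computes log(x + 1), so a mean of zero stays at zero instead of -Inf.
log_cases <- log1p(mean_cases)

round(log_cases, 2)
```

Note that this takes the log of a mean, while the earlier summary took the mean of logged monthly counts. The two are not the same quantity, so the axis label should say which one is plotted.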

ggplot(measles_by_season_AMR_2022, aes(x = season, y = mean_measles, color = season)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_measles - se_measles, ymax = mean_measles + se_measles),
                width = 0.2) +
  labs(title = "Mean Measles Cases by Season in the Americas, 2022",
       x = "Season",
       y = "Mean Measles Cases") +
  theme_classic()

Figure 3. Seasonal Measles Cases in the Americas in 2022

This figure displays the mean measles cases by season in the Americas in 2022. Each season is represented by a single point showing the average number of measles cases, with error bars showing the standard error. Fall has the highest mean value, followed by winter and then spring, and summer has the lowest mean of all seasons.

# We also need to rebuild the country summary for the countries that reported data in 2022, so let's do that now.

measles_by_country_AMR_2022 <- cases_2022 %>%
  filter(grepl("AMR", region)) %>%
  group_by(country) %>%
  summarise(
    mean_measles = mean(measles_total, na.rm = TRUE),
    sd_measles   = sd(measles_total, na.rm = TRUE),
    n            = sum(!is.na(measles_total)),
    se_measles   = sd_measles / sqrt(n)
  ) %>%
  mutate(region_americas = case_when(
    country %in% c("Canada", "United States of America") ~ "North America",
    
    country %in% c("Mexico", "Guatemala", "Belize", "Honduras",
                   "El Salvador", "Nicaragua", "Costa Rica",
                   "Panama") ~ "Central America",
    
    country %in% c("Cuba", "Jamaica", "Bahamas", "Haiti", "Dominican Republic",
                   "Barbados","Grenada",
                   "Antigua and Barbuda") ~ "Caribbean",
    
    country %in% c("Colombia", "Venezuela (Bolivarian Republic of)", "Peru",
                   "Brazil", "Bolivia (Plurinational State of)", "Chile", "Argentina",
                   "Uruguay", "Paraguay", "Guyana") ~ "South America",
    TRUE ~ NA_character_
  ))
ggplot(measles_by_country_AMR_2022, aes(x = country, y = mean_measles, color = region_americas)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_measles - se_measles, ymax = mean_measles + se_measles),
                width = 0.2) +
  labs(title = "Average Number of Measles Cases by Country in the Americas, 2022",
       x = "Country",
       y = "Average Number of Measles Cases") +
  theme_classic() + 
  facet_wrap(~ region_americas, scales = "free") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# This graph shows the Measles cases average in 2022 pertaining to the countries in the Americas 

Figure 4. Measles Cases by Country in the Americas in 2022

This figure shows the average number of measles cases across countries in the Americas in 2022. The countries are grouped by region as follows: Caribbean, Central America, North America, and South America. Most countries, especially those in the Caribbean and Central America, report mean measles cases that are very low or near zero. North and South America, on the other hand, each have one country with a slightly higher average than the others. In North America, the United States has an average of about 10 measles cases, with error bars that range from about 6 to 14. In South America, Brazil has an average of around 4 measles cases, with error bars that range from about 2 to 5.

Merging the Two Data Frames into a New One

merged_df <- merge(measles_by_country_AMR_2019, measles_by_country_AMR_2022, by = "country")
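By default, merge() keeps only the countries that appear in both tables, so countries that reported in 2019 but not in 2022 drop out of the comparison (33 countries in the 2019 summary versus 27 in 2022). The sketch below shows this behavior on made-up data and also uses the suffixes argument to give the year columns clearer names than the default .x and .y.

```r
# Toy stand-ins for the two country summaries; the values are made up.
summary_2019 <- data.frame(
  country      = c("Brazil", "Canada", "Ecuador"),
  mean_measles = c(50, 9, 0)
)
summary_2022 <- data.frame(
  country      = c("Brazil", "Canada"),
  mean_measles = c(4, 1)
)

# merge() keeps only countries present in both tables (an inner join) and
# appends the suffixes to columns that share a name.
merged_toy <- merge(summary_2019, summary_2022, by = "country",
                    suffixes = c("_2019", "_2022"))

merged_toy
```

Here Ecuador is dropped because it has no 2022 row, and the remaining columns are named mean_measles_2019 and mean_measles_2022.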

Statistical Test

Paired T-Test

t.test(merged_df$mean_measles.x, merged_df$mean_measles.y, paired = TRUE) # mean_measles.x refers to the data for 2019, mean_measles.y refers to the data for 2022
## 
##  Paired t-test
## 
## data:  merged_df$mean_measles.x and merged_df$mean_measles.y
## t = 1.1115, df = 26, p-value = 0.2765
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -60.6373 203.4120
## sample estimates:
## mean difference 
##        71.38737

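The p-value for this paired t-test is 0.2765, which is greater than our threshold of 0.05. That means that we FAIL TO REJECT the null hypothesis that the mean difference in country-level average measles cases between 2019 and 2022 is zero. Note that this test compares the two years for each country; it does not test for differences between seasons. Failing to reject the null hypothesis also does not prove that there is no difference, especially since the 95% confidence interval (-60.6 to 203.4) is very wide.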
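A paired t-test is the same as a one-sample t-test on the per-country differences, and its statistic is the mean difference divided by the standard error of the differences. In the output above, a mean difference of 71.39 with t = 1.1115 implies a standard error of roughly 64. The sketch below checks the equivalence on made-up values for five hypothetical countries.

```r
# Made-up paired means for five hypothetical countries.
mean_2019 <- c(12.0, 0.5, 3.0, 40.0, 1.0)
mean_2022 <- c(2.0, 0.0, 1.0, 5.0, 3.0)

# Paired t-test, as in the analysis above.
paired_result <- t.test(mean_2019, mean_2022, paired = TRUE)

# Equivalent one-sample t-test on the differences.
diffs <- mean_2019 - mean_2022
one_sample_result <- t.test(diffs, mu = 0)

# Same statistic by hand: mean difference divided by its standard error.
t_by_hand <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

c(paired     = unname(paired_result$statistic),
  one_sample = unname(one_sample_result$statistic),
  by_hand    = t_by_hand)
```

Because the test works on differences, one country with a very large difference (such as Brazil here) can inflate the standard error and widen the confidence interval. When the differences are strongly skewed, a rank-based test such as wilcox.test() with paired = TRUE is a commonly used alternative.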

Analysis

Across the four figures, measles cases in the Americas show differences by season and between 2019 and 2022. In 2019, the seasonal plot shows higher mean measles cases in fall and summer, with lower values in winter and spring. In 2022, mean measles cases are lower across all seasons, with fall still having the highest mean and summer having the lowest. This change suggests a shift both in the number of reported cases and in how cases are distributed across seasons.

When comparing countries, the 2019 data show that the United States in North America and Brazil in South America had higher mean measles cases than the other reporting countries in the Americas. They also had wider error bars, which is an indication of greater variation. In contrast, in 2022 most countries across all regions in the Americas had mean measles cases at or near zero. Brazil and the United States still had the highest means, but both were far below their 2019 values. This shows a large decrease in reported measles cases across the Americas in 2022 compared to 2019.

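Overall, measles cases appear higher and more variable across countries in 2019, while in 2022 cases are lower and more consistent across regions, although the paired t-test did not find the country-level difference to be statistically significant (p = 0.2765).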

Conclusion

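In this study, we set out to see whether measles cases in the Americas differed by season and between the pre-pandemic year 2019 and the pandemic year 2022. After filtering the data, plotting it, and running a paired t-test on country-level means, we did not find a statistically significant difference between 2019 and 2022 (p = 0.2765), even though the plots show much lower case counts in 2022. Seasonal differences were examined only descriptively, so this test alone does not allow us to conclude whether seasonal changes affect the spread of measles.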

After doing some research, we found reports that hotter climates may degrade the measles virus and reduce its ability to spread. We treat this as supporting context rather than as a result of our own analysis. The major outbreak recorded in Brazil in 2019 was partly a result of where it was centered: São Paulo. São Paulo is the largest city in Brazil, making it the most populated and one of the first Brazilian cities to be affected by measles. The decrease in reported cases in 2022 may be related to COVID-19 quarantine measures and the global demand for COVID-19 vaccinations. That demand may have reminded people how much they needed their measles vaccines, so they got them where possible. Reduced contact between people may also have given Brazil time to recover from the major outbreak.

References

Francis, Matthew R. “Just How Contagious Is Covid-19? This Chart Puts It in Perspective.” Popular Science, Popular Science, 21 Feb. 2020, www.popsci.com/story/health/how-diseases-spread/.

Makarenko, Cristina, et al. “Measles Resurgence in Brazil: Analysis of the 2019 Epidemic in the State of São Paulo.” Revista de Saude Publica, U.S. National Library of Medicine, 13 June 2022, pmc.ncbi.nlm.nih.gov/articles/PMC9239333/.

“Measles.” Mayo Clinic, Mayo Foundation for Medical Education and Research, www.mayoclinic.org/diseases-conditions/measles/symptoms-causes/syc-20374857. Accessed 29 Apr. 2026.

Rfordatascience. “Tidytuesday/Data/2025/2025-06-24/Readme.Md at Main · Rfordatascience/Tidytuesday.” GitHub, github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-06-24/readme.md. Accessed 29 Apr. 2026.