The data used in this analysis comes from the Pre-Approved TidyTuesday Datasets provided by Dr. Tothero and includes the annual number of confirmed measles cases reported across different regions of the world. This dataset allows for comparison of disease patterns on a global scale.
Measles is a highly contagious viral disease that continues to be a major global public health concern, especially in areas with low vaccination coverage. Although an effective vaccine exists, outbreaks still occur due to gaps in immunization, limited healthcare access, and population movement. Studying measles cases helps researchers and public health officials understand where the disease is most prevalent and where prevention efforts are needed.
Figure 1. Visualization of the measles virus (left) and common symptoms of measles (right)
The response variable in this study is the annual number of reported confirmed measles cases.
The explanatory variable is the geographic region, which groups countries into broader global areas.
This analysis aims to determine whether measles cases differ across regions, which can help identify areas that may need stronger vaccination and public health interventions.
Research Question: Is there a significant difference in the annual number of reported confirmed measles cases across different regions?
ggplot(data, aes(x = measles_lab_confirmed)) +
  geom_histogram(fill = "red", color = "black") +
  scale_x_log10() +
  xlab("Annual Confirmed Measles Cases (log10 scale)") +
  ylab("Frequency") +
  ggtitle("Distribution of Annual Confirmed Measles Cases")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 703 rows containing non-finite outside the scale range
## (`stat_bin()`).
data %>%
  group_by(region) %>%
  summarise(
    mean_cases = mean(measles_lab_confirmed, na.rm = TRUE),
    median_cases = median(measles_lab_confirmed, na.rm = TRUE),
    sd_cases = sd(measles_lab_confirmed, na.rm = TRUE),
    se_cases = sd(measles_lab_confirmed, na.rm = TRUE) / sqrt(n())
  )
## # A tibble: 6 × 5
## region mean_cases median_cases sd_cases se_cases
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AFRO 262. 53 523. 21.3
## 2 AMRO 176. 0 1410. 78.0
## 3 EMRO 660. 79 2014. 119.
## 4 EURO 410. 10 1886. 72.4
## 5 SEARO 958. 20 3507. 287.
## 6 WPRO 704. 1 4044. 219.
ggplot(data, aes(x = region, y = measles_lab_confirmed)) +
  geom_boxplot(fill = "red", color = "black") +
  scale_y_log10() +
  xlab("Region") +
  ylab("Annual Confirmed Measles Cases (log10 scale)") +
  ggtitle("Distribution of Confirmed Measles Cases by Region (Log Scale)") +
  theme_minimal()
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 703 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
The histogram above shows the distribution of annual confirmed measles cases on a log scale. I chose a log scale because it reduces the influence of extremely large values and gives a clearer view of the overall distribution, which remains right-skewed even after the transformation. The plot produces a warning because the dataset contains zero values for confirmed measles cases and log10(0) is undefined (R evaluates it as -Inf), so those observations are dropped from the visualization. According to the warning, 703 rows were removed as non-finite. This shows that many observations report zero confirmed cases, which is important to keep in mind when interpreting the overall distribution of the data.
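The effect of zeros on a log axis can be illustrated with a minimal sketch. The data frame `toy` and its values are hypothetical stand-ins for the real `data` object, and plotting log10(x + 1) is one possible workaround, not the approach used in this report.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real `data` object
toy <- data.frame(
  measles_lab_confirmed = c(0, 0, 3, 12, 150, 0, 4200, 57, NA)
)

# Rows that scale_x_log10() would drop: zeros become -Inf, NAs stay NA
n_zero <- sum(toy$measles_lab_confirmed == 0, na.rm = TRUE)
n_missing <- sum(is.na(toy$measles_lab_confirmed))

# log10(x + 1) maps 0 to 0, so zero-count rows stay in the histogram
toy$log_cases <- log10(toy$measles_lab_confirmed + 1)

# Only genuinely missing values are excluded before plotting
p <- ggplot(toy[!is.na(toy$log_cases), ], aes(x = log_cases)) +
  geom_histogram(bins = 10, fill = "red", color = "black") +
  xlab("log10(Annual Confirmed Measles Cases + 1)") +
  ylab("Frequency")
```

Counting zeros and missing values separately also shows how many of the removed rows are true zero-case reports rather than missing data.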
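The table above summarizes the mean, median, standard deviation, and standard error of annual confirmed measles cases for each region. In every region the mean is far larger than the median and the standard deviation is larger than the mean, which is consistent with the strong right skew seen in the histogram.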
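The standard error in the table is the standard deviation divided by the square root of the group size. Because `n()` counts every row in a group, including rows with missing values, a slightly more defensive version divides by the number of non-missing observations instead. The sketch below uses a hypothetical data frame `toy` with made-up values. It is an illustration only, not a correction to the table above.

```r
library(dplyr)

# Hypothetical toy data; region "A" contains one missing value
toy <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B"),
  cases  = c(2, 4, 6, NA, 10, 10, 10)
)

toy_summary <- toy %>%
  group_by(region) %>%
  summarise(
    n_obs      = sum(!is.na(cases)),         # non-missing observations only
    mean_cases = mean(cases, na.rm = TRUE),
    sd_cases   = sd(cases, na.rm = TRUE),
    se_cases   = sd_cases / sqrt(n_obs)      # SE = SD / sqrt(n)
  )
```

When a column has no missing values, this gives the same result as dividing by `sqrt(n())`.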
The boxplot above shows the distribution of annual confirmed measles cases by region on a log scale. The log scale is used again because it makes the regions easier to compare by reducing the influence of extremely large values. As with the histogram, the 703 rows that are non-finite after the transformation are excluded, so the medians in the boxplot describe non-zero counts only and differ from the medians in the table. There are noticeable differences in both the median and spread of measles cases between regions. For example, SEARO and EMRO appear to have higher median case counts, while AMRO and EURO show lower median values. Additionally, SEARO and WPRO display greater variability, as indicated by the wider spread of the boxes and whiskers. These differences provide visual evidence that the number of confirmed measles cases varies by region. These observations are tested formally in the next section.
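Since a log-scale boxplot leaves out zero counts, it can help to check how large the share of zeros is in each region and how the median changes once zeros are excluded. The following is a minimal sketch on a hypothetical data frame `toy`. The values are invented for illustration and do not come from the measles dataset.

```r
library(dplyr)

# Hypothetical toy data with a different share of zeros in each region
toy <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  measles_lab_confirmed = c(0, 0, 5, 10, 0, 3, 7, 20)
)

zero_share <- toy %>%
  group_by(region) %>%
  summarise(
    n_obs          = n(),
    n_zero         = sum(measles_lab_confirmed == 0),
    prop_zero      = mean(measles_lab_confirmed == 0),
    median_all     = median(measles_lab_confirmed),
    # Median of the rows a log-scale plot would actually display
    median_nonzero = median(measles_lab_confirmed[measles_lab_confirmed > 0])
  )
```

A region where at least half of the rows are zero has a median of 0 in a summary table but can still show a positive median on a log-scale boxplot.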
To determine whether the mean annual number of confirmed measles cases differs across regions, a one-way ANOVA was conducted. This test is appropriate because the response (confirmed measles cases) is numeric and the explanatory variable (region) defines more than two independent groups whose means are being compared.
The null hypothesis (H₀) states that there is no difference in the mean number of confirmed measles cases across regions.
The alternative hypothesis (Hₐ) states that at least one region has a different mean number of confirmed measles cases.
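Written in symbols, with one population mean per region in the summary table, the two hypotheses can be stated as follows. The notation $\mu_r$ for the mean annual number of confirmed cases in region $r$ is introduced here for illustration and assumes the `amsmath` package for `aligned` and `\text`.

```latex
\[
\begin{aligned}
H_0 &: \mu_{\text{AFRO}} = \mu_{\text{AMRO}} = \mu_{\text{EMRO}}
      = \mu_{\text{EURO}} = \mu_{\text{SEARO}} = \mu_{\text{WPRO}} \\
H_a &: \mu_i \neq \mu_j \ \text{for at least one pair of regions } i \neq j
\end{aligned}
\]
```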
anova_model <- aov(measles_lab_confirmed ~ region, data = data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## region 5 1.200e+08 24001288 4.85 0.000204 ***
## Residuals 2376 1.176e+10 4948409
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = measles_lab_confirmed ~ region, data = data)
##
## $region
## diff lwr upr p adj
## AMRO-AFRO -85.72483 -521.68907 350.2394 0.9934576
## EMRO-AFRO 398.25984 -57.49318 854.0129 0.1266990
## EURO-AFRO 148.45656 -206.98744 503.9006 0.8412463
## SEARO-AFRO 696.44499 115.82998 1277.0600 0.0083220
## WPRO-AFRO 442.34601 12.21671 872.4753 0.0396571
## EMRO-AMRO 483.98467 -29.65829 997.6276 0.0781057
## EURO-AMRO 234.18139 -192.97047 661.3332 0.6227628
## SEARO-AMRO 782.16982 155.08764 1409.2520 0.0051226
## WPRO-AMRO 528.07084 37.02147 1019.1202 0.0265586
## EURO-EMRO -249.80328 -697.13392 197.5274 0.6035014
## SEARO-EMRO 298.18515 -342.81255 939.1829 0.7702064
## WPRO-EMRO 44.08617 -464.61362 552.7860 0.9998749
## SEARO-EURO 547.98843 -26.03916 1122.0160 0.0711868
## WPRO-EURO 293.88945 -127.30540 715.0843 0.3482746
## WPRO-SEARO -254.09898 -877.13866 368.9407 0.8543087
The results of the one-way ANOVA test show a statistically significant difference in the mean number of confirmed measles cases across regions (F(5, 2376) = 4.85, p = 0.000204). Since the p-value is less than 0.05, we reject the null hypothesis.
This indicates that at least one region has a significantly different mean number of confirmed measles cases compared to the others.
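The reported F statistic can be checked by hand from the ANOVA table, since F is the ratio of the region mean square to the residual mean square. The sketch below copies the numbers from the output above. The eta-squared calculation at the end is an addition that was not part of the original analysis and is included only to give a rough sense of effect size.

```r
# Values copied from the ANOVA table in this report
ms_region <- 24001288
ms_resid  <- 4948409
df_region <- 5
df_resid  <- 2376

# F = Mean Sq (region) / Mean Sq (residuals), roughly 4.85
f_stat <- ms_region / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_region, df_resid, lower.tail = FALSE)

# Eta squared = SS_region / SS_total (not reported in the original analysis)
ss_region <- 1.200e8
ss_resid  <- 1.176e10
eta_sq <- ss_region / (ss_region + ss_resid)
```

With these sums of squares, eta squared is about 0.01, so region accounts for roughly 1% of the total variation in case counts even though the difference in means is statistically significant.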
Since the ANOVA test indicated a statistically significant difference among group means, a Tukey post-hoc test was conducted to determine which specific regions differed from each other. The results show that several regional pairs had statistically significant differences; specifically, SEARO had significantly higher mean measles cases compared to both AFRO (p = 0.0083) and AMRO (p = 0.0051). Additionally, WPRO showed significantly higher mean cases compared to AFRO (p = 0.0397) and AMRO (p = 0.0266).
No other pairwise comparisons were statistically significant (p > 0.05), indicating that not all regions differ from each other. These results suggest that the overall differences identified in the ANOVA are primarily driven by higher case counts in SEARO and WPRO compared to AFRO and AMRO.
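Picking out the significant pairs by eye from a long Tukey table is error-prone, so the comparison table can also be filtered in code. This is a minimal sketch on simulated data. The data frame `toy`, its region labels, and its values are hypothetical, and the same pattern would apply to `anova_model` from this report.

```r
set.seed(1)

# Hypothetical toy data: region "C" has a clearly higher mean than "A" and "B"
toy <- data.frame(
  region = rep(c("A", "B", "C"), each = 20),
  cases  = c(rnorm(20, mean = 10, sd = 2),
             rnorm(20, mean = 10, sd = 2),
             rnorm(20, mean = 20, sd = 2))
)

toy_model <- aov(cases ~ region, data = toy)

# TukeyHSD() returns a list with one matrix per factor; convert it to a data frame
tukey <- as.data.frame(TukeyHSD(toy_model)$region)

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- tukey[tukey[["p adj"]] < 0.05, ]
```

The row names of `significant` are the region pairs (for example `C-A`), and the `diff` column gives the direction and size of each difference.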
The results of this analysis show that there are statistically significant differences in the mean number of confirmed measles cases across global regions. Both the visualizations and the ANOVA test support the conclusion that measles case counts vary geographically.
The Tukey post-hoc test further revealed that SEARO and WPRO have significantly higher mean numbers of confirmed measles cases than AFRO and AMRO. These differences may be influenced by several factors, including variations in vaccination coverage, access to healthcare, population density, and differences in public health infrastructure across regions. Regions with lower vaccination rates or limited healthcare access may be more susceptible to higher measles case counts and outbreaks.
This analysis is important because it highlights how measles remains a global public health issue, particularly in certain regions. Understanding where higher case counts occur can help guide vaccination efforts and public health interventions to reduce the spread of the disease.
There are, however, a few limitations to this analysis. It does not account for all of the factors that influence measles cases, such as differences in reporting accuracy, population size, or vaccination rates within each region. Additionally, the use of aggregated regional data may mask important variations within individual countries. The presence of many zero values and highly skewed data may also affect the interpretation of the results.
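One way to gauge how much the skew and the zeros matter is to rerun the ANOVA on a log10(x + 1) scale, which mirrors the log scale used in the plots while keeping zero counts. The sketch below uses simulated data. The data frame `toy` and its values are hypothetical, and this sensitivity check was not part of the original analysis.

```r
set.seed(42)

# Hypothetical skewed count data with zeros, three made-up regions
toy <- data.frame(
  region = rep(c("A", "B", "C"), each = 30),
  cases  = c(rpois(30, lambda = 2),
             round(rlnorm(30, meanlog = 3, sdlog = 1.5)),
             round(rlnorm(30, meanlog = 5, sdlog = 1.5)))
)

# log10(x + 1) is defined at zero, so no rows are lost
toy$log_cases <- log10(toy$cases + 1)

raw_fit <- aov(cases ~ region, data = toy)
log_fit <- aov(log_cases ~ region, data = toy)

# Extract the p-value for region from each ANOVA table
raw_p <- summary(raw_fit)[[1]][["Pr(>F)"]][1]
log_p <- summary(log_fit)[[1]][["Pr(>F)"]][1]
```

If the two fits lead to the same conclusion, the result is less likely to be driven by a few extreme values. If they disagree, the raw-scale result should be interpreted with more caution.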
Overall, this analysis demonstrates that measles cases are not evenly distributed across regions and that geographic region is associated with the number of confirmed cases worldwide.
In conclusion, this analysis found a statistically significant difference in the mean number of confirmed measles cases across global regions. SEARO and WPRO have higher mean case counts than AFRO and AMRO, suggesting that geographic location plays an important role in measles distribution. These findings highlight the need for targeted public health efforts to reduce measles cases in the most affected regions.
Data Science Learning Community. (2025). Measles cases around the world [Data set]. TidyTuesday. https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-06-24/cases_year.csv
Data Science Learning Community. (2025). Measles cases around the world (README). TidyTuesday. https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-06-24/readme.md
KidsHealth New Zealand. (2025). Measles in children. https://www.kidshealth.org.nz/measles-in-children
Mayo Clinic. (2025). Measles: Symptoms and causes. https://www.mayoclinic.org/diseases-conditions/measles/symptoms-causes/syc-20374857
University of Virginia. (2025). Measles reported in 9 states: Here’s what you need to know. https://news.virginia.edu/content/measles-reported-9-states-heres-what-you-need-know