Tuberculosis (TB) is an infectious airborne disease caused by the bacterium Mycobacterium tuberculosis. The bacterium was first identified in the late 19th century, and the disease still affects the world’s population today. TB causes prolonged cough, chest pain, fatigue, and weight loss, among other symptoms. It can exist in an inactive form in which individuals are asymptomatic and not contagious, but in its active, contagious form its effects are worse among children and immunocompromised persons, especially people who have HIV.
HIV stands for human immunodeficiency virus. It infects only humans, weakens the immune system, and is the virus that causes AIDS. Because HIV damages the immune system and makes it easier to get sick, TB is twelve times more likely to affect a person with HIV than a person without it.
The tuberculosis dataset from the TidyTuesday site contains statistical data about the disease, including incidence, mortality, and population by year for each country. We decided to explore which region has made the most progress in reducing tuberculosis mortality in HIV patients. The independent variable is region, and the dependent variable is the estimated number of TB deaths among HIV patients (e_mort_tbhiv_num), averaged across the countries in each region for each year. These data are important biologically because they allow researchers to understand the scale of the disease worldwide and, from there, to direct additional research or solutions that reduce its impact.
Which region has made the most progress in reducing tuberculosis mortality in HIV patients?
To better understand the data, we calculated statistics for each region. Measures like the mean and median help show the levels of TB mortality, while the standard deviation and standard error show how much the data varies and how reliable the averages are. These statistics help support and explain the patterns we observed in the graphs.
## Mean, Median, Standard Deviation, and Standard Error
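library(dplyr)  # provides group_by(), summarize(), and filter()
# Average the estimated TB-HIV deaths across countries for each region and year
tb_region <- summarize(
  group_by(who_tb_data, g_whoregion, year),
  mortality = mean(e_mort_tbhiv_num, na.rm = TRUE)
)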
## `summarise()` has grouped output by 'g_whoregion'. You can override using the
## `.groups` argument.
tb_stats <- summarize(
  group_by(tb_region, g_whoregion),
  mean = mean(mortality),
  median = median(mortality),
  sd = sd(mortality),
  n = length(mortality),
  se = sd / sqrt(n)
)
tb_stats
## # A tibble: 6 × 6
##   g_whoregion              mean  median     sd     n      se
##   <fct>                   <dbl>   <dbl>  <dbl> <int>   <dbl>
## 1 Africa                 7675.   7909.  3363.     24  686.
## 2 Americas                165.    153.    33.9    24    6.93
## 3 Eastern Mediterranean   117.    108.    38.5    24    7.85
## 4 Europe                   91.3    84.1   30.1    24    6.14
## 5 South-East Asia       12342.  13252.  8257.     24 1685.
## 6 Western Pacific         427.    449.    81.2    24   16.6
The statistics show clear differences in TB mortality among HIV patients across regions. South-East Asia had the highest average mortality and the most variation, meaning its values were both high and changed substantially over time. Africa also had high mortality, but with less variation than South-East Asia. Europe, the Americas, and the Eastern Mediterranean had much lower mortality levels. The standard error was also largest for South-East Asia, meaning its average is estimated less precisely than those of the other regions. Overall, these results match what we saw in the graphs, where South-East Asia and Africa stood out the most.
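As a quick check on the table above, the standard error can be reproduced by hand from the reported standard deviation and the number of yearly values. The sketch below copies Africa's rounded figures from the output. The variable names and the approximate 95% interval are illustrative additions, and the interval assumes the 24 yearly values behave like independent observations, which yearly data may not satisfy.

```r
# Rounded values copied from the tb_stats output for Africa
mean_africa <- 7675
sd_africa   <- 3363
n_years     <- 24

# Standard error of the mean: sd / sqrt(n)
se_africa <- sd_africa / sqrt(n_years)
round(se_africa)   # 686, matching the se column

# Assumption: an approximate 95% t-interval around the regional average
t_crit   <- qt(0.975, df = n_years - 1)
ci_lower <- mean_africa - t_crit * se_africa
ci_upper <- mean_africa + t_crit * se_africa
c(lower = ci_lower, upper = ci_upper)
```

The same arithmetic applied to South-East Asia (sd = 8257) gives a much wider interval, which is what the larger standard error reflects.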
All regions were first compared in a single plot to show differences in tuberculosis mortality among HIV patients. Separate plots were then created for each region to show individual trends more clearly.
## Filtering years by region and HIV/TB mortalities
tb_region <- summarize(
  group_by(who_tb_data, g_whoregion, year),
  mortality = mean(e_mort_tbhiv_num, na.rm = TRUE)
)
## `summarise()` has grouped output by 'g_whoregion'. You can override using the
## `.groups` argument.
library(ggplot2)  # provides ggplot() and the geom_*/theme_* functions
## Plot of HIV/TB deaths from 2000-2023
ggplot(tb_region, aes(x = year, y = mortality, color = g_whoregion)) +
  geom_line() +
  geom_point() +
  xlab("Year") +
  ylab("TB-HIV Mortality") +
  theme_classic()
## Africa data
africa <- filter(tb_region, g_whoregion == "Africa")
ggplot(africa, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("Africa TB HIV Mortality Over Time") +
  theme_classic()
## Americas data
americas <- filter(tb_region, g_whoregion == "Americas")
ggplot(americas, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("Americas TB HIV Mortality Over Time") +
  theme_classic()
## Europe data
europe <- filter(tb_region, g_whoregion == "Europe")
ggplot(europe, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("Europe TB HIV Mortality Over Time") +
  theme_classic()
## Western Pacific data
westernpacific <- filter(tb_region, g_whoregion == "Western Pacific")
ggplot(westernpacific, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("Western Pacific TB HIV Mortality Over Time") +
  theme_classic()
## Eastern Mediterranean data
easternmediterranean <- filter(tb_region, g_whoregion == "Eastern Mediterranean")
ggplot(easternmediterranean, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("Eastern Mediterranean TB HIV Mortality Over Time") +
  theme_classic()
## South-East Asia data
southeastasia <- filter(tb_region, g_whoregion == "South-East Asia")
ggplot(southeastasia, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  ggtitle("South-East Asia TB HIV Mortality Over Time") +
  theme_classic()
## Comparing TB HIV deaths across all regions
ggplot(tb_region, aes(x = year, y = mortality, color = g_whoregion)) +
  geom_line() +
  geom_point() +
  ggtitle("TB-HIV Mortality Trends by Region") +
  theme_classic()
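The six per-region plots can also be drawn in a single call. The sketch below is an alternative layout, not the code used for this report: it relies on ggplot2's facet_wrap() with scales = "free_y" so that each region keeps its own y-axis. The toy_region data frame and its values are hypothetical and only mimic the columns of tb_region.

```r
library(ggplot2)

# Hypothetical stand-in for tb_region (same column names, made-up values)
toy_region <- data.frame(
  g_whoregion = rep(c("Region A", "Region B"), each = 4),
  year        = rep(c(2000, 2008, 2016, 2023), times = 2),
  mortality   = c(12000, 9000, 5000, 2500, 160, 150, 140, 130)
)

# One panel per region; free_y lets each panel choose its own y-axis range
p <- ggplot(toy_region, aes(x = year, y = mortality)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ g_whoregion, scales = "free_y") +
  xlab("Year") +
  ylab("TB-HIV Mortality") +
  theme_classic()

p
```

Free y-axes make low-mortality regions readable, at the cost of making panels harder to compare directly, so the axis labels need to be read carefully.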
The separate graphs show how TB mortality in HIV patients changed over time in each region. South-East Asia and Africa started with very high values that declined substantially over the years, with South-East Asia showing the largest drop. Note that the y-axis range is much larger for South-East Asia and Africa, while the other four regions stay in the hundreds. Europe, the Americas, and the Eastern Mediterranean stayed low the whole time and changed little. The Western Pacific declined slightly, but far less than the two high-mortality regions. Overall, this shows that some regions made a lot of progress, while others were already low and stayed that way.
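One way to put a number on the size of each region's drop is to compare its first and last yearly value. The sketch below is only one possible measure of progress and was not part of the original analysis. It uses hypothetical toy data with the same columns as tb_region, and the names toy_region, progress, abs_change, and pct_change are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for tb_region (made-up values)
toy_region <- data.frame(
  g_whoregion = rep(c("Region A", "Region B"), each = 3),
  year        = rep(c(2000, 2010, 2023), times = 2),
  mortality   = c(1000, 600, 250, 100, 90, 80)
)

# Sort each region by year so first() and last() pick the endpoints
ordered <- arrange(group_by(toy_region, g_whoregion), year, .by_group = TRUE)

progress <- summarize(
  ordered,
  first_value = first(mortality),
  last_value  = last(mortality),
  abs_change  = last_value - first_value,        # negative means a decline
  pct_change  = 100 * abs_change / first_value,  # change relative to the start
  .groups = "drop"
)
progress
```

Absolute change favors regions that started high, while percent change puts low-mortality regions on the same footing, so reporting both gives a fuller picture. A first-to-last comparison also ignores any peak in between.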
A one-way ANOVA was used to determine whether tuberculosis mortality among HIV patients differs across world regions. This test is appropriate because the independent variable, region, is categorical, while the dependent variable, mortality, is numerical. ANOVA lets us compare mean mortality values across all regions at once. The null hypothesis states that there is no difference in mean tuberculosis mortality among HIV patients across world regions. The alternative hypothesis states that at least one region has a different mean tuberculosis mortality among HIV patients compared to the others.
## Running ANOVA
mortality ~ g_whoregion
## mortality ~ g_whoregion
tb_ANOVA <- lm(mortality ~ g_whoregion, data = tb_region)
## Looking for normality
qqnorm(residuals(tb_ANOVA))
qqline(residuals(tb_ANOVA))
A Q-Q plot of the residuals showed some deviation from normality, mainly at the tails. However, given the large sample size (144 region-year values), this violation is not considered severe enough to impact the ANOVA results.
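The visual check can be paired with formal tests. The sketch below is an optional addition that was not run on the TB data: it uses base R's shapiro.test() for residual normality, bartlett.test() for equal variances across groups, and oneway.test() with var.equal = FALSE (Welch's test) as an alternative when spreads differ. The toy data frame is hypothetical, and the log transform is an assumption that only makes sense for strictly positive values.

```r
set.seed(1)  # reproducible toy data

# Hypothetical data: three groups with very different spreads,
# loosely mimicking low- and high-mortality regions
toy <- data.frame(
  region = factor(rep(c("low", "mid", "high"), each = 24)),
  deaths = c(
    rlnorm(24, meanlog = log(100),  sdlog = 0.3),
    rlnorm(24, meanlog = log(400),  sdlog = 0.3),
    rlnorm(24, meanlog = log(8000), sdlog = 0.5)
  )
)

fit <- lm(deaths ~ region, data = toy)

# Formal companions to the Q-Q plot
normality <- shapiro.test(residuals(fit))                # residual normality
variances <- bartlett.test(deaths ~ region, data = toy)  # equal variances

# Welch's one-way test does not assume equal variances
welch <- oneway.test(deaths ~ region, data = toy, var.equal = FALSE)

# Assumption: a log transform is reasonable because all values are positive
fit_log <- lm(log(deaths) ~ region, data = toy)

variances$p.value
welch$p.value
```

If the equal-variance test rejects, as it would be expected to when standard deviations differ by orders of magnitude, the Welch result or the log-scale model is the more cautious one to report alongside the standard ANOVA.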
anova(tb_ANOVA)
## Analysis of Variance Table
##
## Response: mortality
##              Df     Sum Sq   Mean Sq F value    Pr(>F)
## g_whoregion   5 3341845730 668369146  50.449 < 2.2e-16 ***
## Residuals   138 1828300342  13248553
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A one-way ANOVA was used to examine differences in tuberculosis mortality in HIV patients across world regions. The results showed a significant effect of region on mortality (F = 50.449, p < 2.2e-16), which means that mean mortality differs across regions. Because the p-value is far below 0.05, we can reject the null hypothesis that all regions have equal mean mortality, as expected from the summary statistics. The large F value indicates that variation between regions is much greater than variation within regions. The residual term captures the variation in mortality that is not explained by region. Overall, these results suggest that geographic region plays an important role in TB mortality in HIV patients.
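The F statistic can be reproduced from the printed table as the ratio of the two mean squares. The sketch below copies the rounded values from the ANOVA output, so the result agrees with the reported F only up to rounding. The eta-squared line is an illustrative addition, not part of the original analysis, showing the share of total variation attributable to region.

```r
# Values copied from the ANOVA table
ss_region   <- 3341845730
ss_residual <- 1828300342
df_region   <- 5
df_residual <- 138

# Mean squares are sums of squares divided by their degrees of freedom
ms_region   <- ss_region / df_region
ms_residual <- ss_residual / df_residual

# F is the between-region mean square over the within-region mean square
f_value <- ms_region / ms_residual
round(f_value, 2)   # 50.45

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_region, df_residual, lower.tail = FALSE)

# Assumption: eta squared as a simple effect-size summary
eta_sq <- ss_region / (ss_region + ss_residual)
round(eta_sq, 2)    # 0.65
```

An eta squared near 0.65 means that roughly two thirds of the variation in the region-year means lies between regions rather than within them.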
We decided to run a Tukey test to observe which specific regions are different from one another. The ANOVA shows that at least one group mean is different, but it does not specifically show where those differences are. The Tukey test lets us see comparisons between all regions.
TukeyHSD(aov(tb_ANOVA))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = tb_ANOVA)
##
## $g_whoregion
##                                               diff        lwr       upr     p adj
## Americas-Africa                        -7510.36071 -10547.167 -4473.554 0.0000000
## Eastern Mediterranean-Africa           -7558.03814 -10594.845 -4521.231 0.0000000
## Europe-Africa                          -7583.70046 -10620.507 -4546.894 0.0000000
## South-East Asia-Africa                  4667.23201   1630.425  7704.039 0.0002587
## Western Pacific-Africa                 -7248.21489 -10285.022 -4211.408 0.0000000
## Eastern Mediterranean-Americas           -47.67743  -3084.484  2989.129 1.0000000
## Europe-Americas                          -73.33975  -3110.146  2963.467 0.9999998
## South-East Asia-Americas               12177.59272   9140.786 15214.399 0.0000000
## Western Pacific-Americas                 262.14582  -2774.661  3298.952 0.9998658
## Europe-Eastern Mediterranean             -25.66232  -3062.469  3011.144 1.0000000
## South-East Asia-Eastern Mediterranean  12225.27015   9188.463 15262.077 0.0000000
## Western Pacific-Eastern Mediterranean    309.82325  -2726.983  3346.630 0.9996950
## South-East Asia-Europe                 12250.93247   9214.126 15287.739 0.0000000
## Western Pacific-Europe                   335.48557  -2701.321  3372.292 0.9995498
## Western Pacific-South-East Asia       -11915.44690 -14952.254 -8878.640 0.0000000
The Tukey test results show that many region comparisons were statistically significant. South-East Asia had significantly higher mortality than every other region, as shown by the very small adjusted p-values. Africa also had significantly higher mortality than the Americas, Europe, the Eastern Mediterranean, and the Western Pacific. However, none of the comparisons among the Americas, Europe, the Eastern Mediterranean, and the Western Pacific were statistically significant; their adjusted p-values were all close to 1. These results suggest that the major differences in TB mortality among HIV patients are driven by the two high-mortality regions, South-East Asia and Africa. Overall, the Tukey test helps identify which specific regions contribute most to the significant differences found in the ANOVA.
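With fifteen comparisons, it can help to pull out the significant pairs programmatically instead of reading them off the printout. The sketch below shows one way to do this with base R; it runs on hypothetical toy data, and the names toy, tukey_matrix, and significant are illustrative. TukeyHSD() returns a list with one matrix per factor, whose columns are diff, lwr, upr, and p adj.

```r
set.seed(42)  # reproducible toy data

# Hypothetical data: groups A and B are similar, group C is far higher
toy <- data.frame(
  region = factor(rep(c("A", "B", "C"), each = 24)),
  deaths = c(rnorm(24, 100, 10), rnorm(24, 105, 10), rnorm(24, 500, 10))
)

fit <- aov(deaths ~ region, data = toy)

# The element is named after the factor in the model formula
tukey_matrix <- TukeyHSD(fit)$region

# Keep only the comparisons with an adjusted p-value below 0.05
is_significant <- tukey_matrix[, "p adj"] < 0.05
significant    <- tukey_matrix[is_significant, , drop = FALSE]

rownames(significant)
```

For the TB model, the matrix would be found under the name g_whoregion, matching the `$g_whoregion` header in the output above.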
The results of the analysis show that tuberculosis mortality among HIV patients differs across world regions. The ANOVA was significant (F = 50.449, p < 2.2e-16), so we reject the null hypothesis that all regions have the same average mortality. The Tukey test showed that South-East Asia and Africa have much higher mortality than the other regions, while the remaining four regions did not differ significantly from each other. This analysis is important because it shows that some parts of the world are more affected than others and may need more healthcare support. However, there are limitations: the analysis does not include factors such as healthcare access or living conditions that could affect mortality. In addition, the residuals were not perfectly normal, although the large sample size limits the effect of this on the results.
In conclusion, this study found that tuberculosis mortality among HIV patients varies significantly across world regions. The ANOVA confirmed that these differences are statistically significant, and the Tukey test identified the specific regions responsible. South-East Asia and Africa showed the highest mortality, while the other regions had much lower levels. Based on trends over time, South-East Asia showed the greatest progress in reducing tuberculosis mortality among HIV patients: it had the largest overall decrease of any region, even though it did not have the lowest mortality at the end of the period. Although some progress has been made, no region has eliminated TB deaths among HIV patients. More research and sustained effort are needed to reduce these regional differences and improve global health outcomes.
HIV Care. (2025, December 4). HIV education is prevention – learn more on hivcare.org. https://hivcare.org/hiv-basics/?gad_source=1&gad_campaignid=1672329731
Navasardyan, I., Miwalian, R., Petrosyan, A., Yeganyan, S., & Venketaraman, V. (2024). HIV–TB Coinfection: Current Therapeutic Approaches and Drug Interactions. Viruses, 16(3), 321. https://doi.org/10.3390/v16030321
Nehal, D. (2025, November 11). WHO TB Burden Data: Incidence, Mortality, and Population. GitHub. https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-11-11/readme.md
World Health Organization. (2026, March 24). Tuberculosis (TB). World Health Organization. https://www.who.int/news-room/fact-sheets/detail/tuberculosis