Rows: 989 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): Region, Country, ISO_code, ADM1
dbl (9): Latitude, Longitude, Year_start, Year_end, Age_start, Age_end, Indi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 589 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): Region, Country, ISO_code, ADM1
dbl (9): Latitude, Longitude, Year_start, Year_end, Age_start, Age_end, Indi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1000 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): Region, Country, ISO_code, ADM1
dbl (9): Latitude, Longitude, Year_start, Year_end, Age_start, Age_end, Indi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The dataset comprises three tables, one for each of three helminth infections: ascariasis (Ascaris), hookworm infection, and schistosomiasis (Schistosoma). The tables share the same 13 columns, which cover geographic location, survey time, study population, and survey results (sample size, positives, prevalence).
1.2 Combine into a master table
To explore the epidemiology of the three helminth infections across different regions, I will combine the datasets into a single master table that integrates all three diseases with their corresponding variables.
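The code for this step is not shown in the output. A minimal sketch of one way to do it, assuming the three tables were read into objects named `ascaris`, `hookworm`, and `schisto` (hypothetical names; the toy rows are made up for illustration), is to stack them with `dplyr::bind_rows()` and record the source table in a new `disease` column:

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the three imported tables (values are illustrative only)
ascaris  <- tibble(Country = c("Angola", "China"), Prevalence = c(0.10, 0.30))
hookworm <- tibble(Country = c("Angola", "Nepal"), Prevalence = c(0.20, 0.12))
schisto  <- tibble(Country = "Uganda", Prevalence = 0.15)

# Stack the tables; `.id` stores each list name in a new first column
combined_sketch <- bind_rows(
  list(ascaris = ascaris, hookworm = hookworm, schisto = schisto),
  .id = "disease"
)

# One more column than the inputs, and as many rows as the inputs combined
dim(combined_sketch)
```

Applied to the real tables, the same logic would give 989 + 589 + 1000 = 2,578 rows and 13 + 1 = 14 columns.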
# A tibble: 2,578 × 14
disease Region Country ISO_code ADM1 Latitude Longitude Year_start Year_end
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ascaris Africa Angola AO Bengo -8.59 13.6 2010 2010
2 ascaris Africa Angola AO Bengo -8.63 13.7 2010 2010
3 ascaris Africa Angola AO Bengo -8.61 13.6 2010 2010
4 ascaris Africa Angola AO Bengo -8.62 14.2 2010 2010
5 ascaris Africa Angola AO Bengo -8.53 13.7 2010 2010
6 ascaris Africa Angola AO Bengo -8.60 13.6 2010 2010
7 ascaris Africa Angola AO Bengo -8.63 14.0 2010 2010
8 ascaris Africa Angola AO Bengo -8.62 13.8 2010 2010
9 ascaris Africa Angola AO Bengo -8.64 14.0 2010 2010
10 ascaris Africa Angola AO Bengo -8.60 13.6 2010 2010
# ℹ 2,568 more rows
# ℹ 5 more variables: Age_start <dbl>, Age_end <dbl>,
# Individuals_surveyed <dbl>, Number_Positives <dbl>, Prevalence <dbl>
The master table now contains 14 columns. From left to right these are the disease name, geographic location (the next 6 columns), study period (Year_start and Year_end), study population (Age_start and Age_end), and survey data (number of participants, number of positive cases, and disease prevalence).
2 Questions
2.1 Q1. Which region has the highest prevalence for each disease?
2.1.1 Rationale:
This question aims to identify regional hotspots of helminth infections and to provide an initial geographical overview of disease burden.
2.1.2 R code:
# Group by region and disease, then compute mean prevalence and sort from highest to lowest
q1.region.disease.tb <- combined.mastertable %>%
  group_by(Region, disease) %>%
  summarise(mean_prev = mean(Prevalence, na.rm = TRUE), n_sites = n()) %>%
  arrange(desc(mean_prev))
`summarise()` has grouped output by 'Region'. You can override using the
`.groups` argument.
q1.region.disease.tb
# A tibble: 6 × 4
# Groups: Region [3]
Region disease mean_prev n_sites
<chr> <chr> <dbl> <int>
1 East and Southeast Asia ascaris 0.288 133
2 Africa hookworm 0.233 880
3 Africa schisto 0.212 589
4 Central and South Asia hookworm 0.119 2
5 East and Southeast Asia hookworm 0.113 118
6 Africa ascaris 0.0736 856
# Draw a column chart comparing the mean prevalence of each disease in each region
ggplot(q1.region.disease.tb, aes(x = Region, y = mean_prev, fill = disease)) +
  geom_col(position = "dodge") +
  labs(title = "Mean prevalence by Region and Disease", y = "Mean prevalence") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Dark2")
2.1.3 Results:
The East and Southeast Asia region has the highest mean prevalence of ascariasis (0.288), whereas Africa has the highest mean prevalence of hookworm infection (0.233) and is the only region with schistosomiasis records (0.212). Hookworm is the most widespread of the three infections, appearing in all three regions, although the Central and South Asia estimate rests on only 2 sites.
2.2 Q2. Are there geographic hotspots at finer resolution (latitude and longitude)?
2.2.1 Rationale:
This question extends the analysis from broad regional patterns (Q1) to more granular geographic scales defined by latitude and longitude. Identifying fine-resolution hotspots allows for a clearer understanding of localized transmission dynamics and helps pinpoint environmental and socio-economic factors driving disease prevalence.
2.2.2 R code:
# Summarise mean prevalence for each combination of longitude, latitude and disease, keeping only sites with prevalence above zero
q2.longid.latid.tb <- combined.mastertable %>%
  group_by(Longitude, Latitude, disease) %>%
  summarise(mean_prev = mean(Prevalence, na.rm = TRUE)) %>%
  filter(mean_prev > 0)
`summarise()` has grouped output by 'Longitude', 'Latitude'. You can override
using the `.groups` argument.
# Draw a scatterplot of each disease on geographic coordinates
ggplot(q2.longid.latid.tb, aes(x = Longitude, y = Latitude, color = mean_prev, size = mean_prev)) +
  geom_point() +
  scale_color_gradient(low = "lightyellow", high = "red") +
  facet_wrap(~ disease) +
  labs(title = "Geographic distribution of helminth prevalence", x = "Longitude", y = "Latitude", color = "mean prevalence", size = "mean prevalence")
2.2.3 Results:
The figure shows that all three helminth infections are concentrated mainly within tropical latitudes (roughly -20° to +20°), with a distinct geographic pattern for each disease. Ascariasis hotspots occur mainly between -10° and 25° latitude and between 0° and 120° longitude, chiefly in East and Southeast Asia. Hookworm has a wider distribution across Africa and Southeast Asia, mostly between -30° and 20° latitude. Schistosomiasis is largely confined to Africa, between -30° and 20° latitude and between 0° and about 60° longitude.
Tropical latitudes of around -20° to +20° offer warm, humid climates that support parasite survival and transmission. Differences in longitude often reflect distinct climate zones, landscapes, and human population patterns, which affect host availability and parasite life cycles. For example, areas between 20° and 40° longitude may experience different rainfall, temperature, or land use from regions near 80°, leading to variation in infection prevalence.
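The latitude ranges above are read off the figure. A rough way to check them numerically, sketched here on made-up coordinates and with a hypothetical helper `in_band()`, is to compute for each disease the share of positive sites that fall inside a chosen latitude band:

```r
library(dplyr)
library(tibble)

# Illustrative site-level rows (not taken from the dataset)
sites <- tibble(
  disease   = c("ascaris", "ascaris", "hookworm", "hookworm", "schisto"),
  Latitude  = c(-8.6, 24.5, -29.0, 5.2, 0.3),
  mean_prev = c(0.10, 0.35, 0.05, 0.40, 0.20)
)

# Hypothetical helper: TRUE when a latitude lies inside [lower, upper]
in_band <- function(lat, lower = -20, upper = 20) {
  lat >= lower & lat <= upper
}

# Share of positive sites per disease that lie inside the band
band_share <- sites %>%
  filter(mean_prev > 0) %>%
  group_by(disease) %>%
  summarise(share_in_band = mean(in_band(Latitude)), .groups = "drop")

band_share
```

Changing `lower` and `upper` makes it possible to compare the ±20° band with the wider ranges quoted for individual diseases.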
2.3 Q3. How has the prevalence of hookworm infection changed over time across regions?
2.3.1 Rationale:
Building on results from Q1 and Q2, which show that hookworm is the only helminth infection present in all three regions, examining temporal trends is essential. Understanding how hookworm prevalence has evolved over time allows for evaluation of the effectiveness of control programs in diverse geographic contexts. Additionally, this analysis helps identify emerging or persistent hotspots that may require accelerated public health interventions.
2.3.2 R code:
# Filter hookworm data across all regions and compute mean prevalence by region and year
q3.hookworm <- combined.mastertable %>%
  filter(disease == "hookworm") %>%
  group_by(Region, Year_start) %>%
  summarise(mean_prev = mean(Prevalence, na.rm = TRUE))
`summarise()` has grouped output by 'Region'. You can override using the
`.groups` argument.
q3.hookworm
# A tibble: 26 × 3
# Groups: Region [3]
Region Year_start mean_prev
<chr> <dbl> <dbl>
1 Africa 1997 0.0992
2 Africa 1998 0.493
3 Africa 1999 0.648
4 Africa 2001 0.136
5 Africa 2002 0.176
6 Africa 2003 0.545
7 Africa 2004 0.712
8 Africa 2005 0.276
9 Africa 2006 0.0435
10 Africa 2007 0.220
# ℹ 16 more rows
# Plot the trend of prevalence over time by region
ggplot(q3.hookworm, aes(x = Year_start, y = mean_prev, colour = Region)) +
  geom_point() +
  geom_line() +
  labs(title = "Trend of Hookworm prevalence by region", x = "Year of study", y = "mean_prev") +
  scale_x_continuous(breaks = unique(q3.hookworm$Year_start)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).
2.3.3 Results:
Hookworm prevalence in Africa demonstrates considerable year-to-year variability, which may reflect changes in control efforts, surveillance quality, or underlying transmission dynamics. In contrast, East and Southeast Asia experienced a marked surge in prevalence around 2006, followed by a notable decline, suggesting the possible impact of accelerated interventions or improved public health initiatives in that region. The limited data available for Central and South Asia precludes meaningful assessment of temporal trends in that area.
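Each point in the trend plot is a mean over however many surveys exist for that region and year, so it can help to carry the survey count alongside the mean when judging how much weight a spike deserves. A minimal sketch on made-up rows (the column `n_surveys` is an addition for illustration, not part of the original summary):

```r
library(dplyr)
library(tibble)

# Illustrative hookworm rows (not taken from the dataset)
hookworm_toy <- tibble(
  Region     = c("Africa", "Africa", "Africa", "Central and South Asia"),
  Year_start = c(2004, 2004, 2005, 2006),
  Prevalence = c(0.70, 0.72, 0.28, NA)
)

trend_with_n <- hookworm_toy %>%
  group_by(Region, Year_start) %>%
  summarise(
    mean_prev = mean(Prevalence, na.rm = TRUE),
    n_surveys = sum(!is.na(Prevalence)),  # surveys that actually contribute to the mean
    .groups   = "drop"
  )

trend_with_n
```

A region-year whose surveys all lack a prevalence value yields `NaN` for the mean. That would be one possible source of the "Removed 1 row" warnings above, though the output does not confirm it.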
2.4 Q4. Which countries are affected by specific helminth infections, and to what extent do multiple helminth species co-occur within the same country?
2.4.1 Rationale:
In helminth epidemiology, understanding co-occurrence of infections within populations is crucial, as it informs whether integrated intervention strategies are necessary. Identifying countries where multiple helminth diseases overlap provides evidence to support the design of combined control programs, maximizing resource efficiency and enhancing the effectiveness of public health efforts.
2.4.2 R code:
# Build a table showing whether each disease has records in each country ("X" = yes, "O" = no)
q4.country.disease.tb <- combined.mastertable %>%
  group_by(Country, disease) %>%
  summarise(has_data = "X", .groups = "drop") %>%
  pivot_wider(names_from = disease, values_from = has_data, values_fill = "O")
knitr::kable(q4.country.disease.tb, caption = "Presence of helminth diseases by country")
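Presence of helminth diseases by country

| Country | ascaris | hookworm | schisto |
|:--------|:--------|:---------|:--------|
| Angola | X | X | O |
| Burundi | X | X | O |
| Cameroon | X | O | O |
| China | X | X | O |
| Cote D’Ivoire | X | X | O |
| Democratic Republic of the Congo | O | O | X |
| Eritrea | X | X | O |
| Ethiopia | O | X | X |
| Ghana | X | X | O |
| Malawi | X | X | O |
| Nepal | O | X | O |
| Nigeria | X | X | O |
| Philippines | X | X | O |
| Senegal | X | X | O |
| Sierra Leone | X | X | X |
| South Africa | O | X | O |
| Uganda | X | X | X |
| United Republic of Tanzania | O | X | X |
| Zambia | O | O | X |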
2.4.3 Results:
The table shows that helminth infections rarely occur in isolation: 14 of the 19 countries have records for two or all three diseases. This overlap underscores the need for integrated, multi-targeted intervention strategies rather than disease-specific approaches. Schistosomiasis, in contrast, is less widely distributed (6 countries) and more often occurs on its own, suggesting distinct transmission dynamics and a potential need for targeted interventions where it appears alone.
2.5 Q5. Is there a significant correlation between the presence of different helminth diseases (in pairs) across countries?
2.5.1 Rationale:
The Q4 data reveal potential co-occurrence of multiple helminth infections within countries. To gauge how strongly each pair of diseases overlaps, rather than relying on a visual reading of the table, this analysis quantifies the frequency of co-occurrence. The correlation ratio is calculated as:
correlation_ratio = (number of countries with both diseases) / (number of countries with at least one of the two diseases)
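The ratio defined above is the same quantity as the Jaccard similarity index between two sets of countries. A small worked sketch with made-up country sets (the helper name `co_occurrence_ratio()` is hypothetical):

```r
# Hypothetical helper: shared countries divided by countries with either disease
co_occurrence_ratio <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Illustrative sets, not the report's data
disease_a <- c("Angola", "Ghana", "Uganda")
disease_b <- c("Ghana", "Uganda", "Nepal", "Zambia")

# 2 shared countries out of 5 countries with either disease
co_occurrence_ratio(disease_a, disease_b)
```

A value of 1 means the two diseases are recorded in exactly the same countries, and 0 means they never share a country. The ratio describes overlap only; it is not a significance test.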
2.5.2 R code:
# Create a table summarising how often each pair of diseases co-occurs across countries
q5.co_occur <- q4.country.disease.tb %>%
  summarise(
    Ascaris_Hookworm = sum(ascaris == "X" & hookworm == "X") / sum(ascaris == "X" | hookworm == "X"),
    Hookworm_Schisto = sum(schisto == "X" & hookworm == "X") / sum(schisto == "X" | hookworm == "X"),
    Ascaris_Schisto  = sum(schisto == "X" & ascaris == "X") / sum(schisto == "X" | ascaris == "X")
  ) %>%
  pivot_longer(cols = c("Ascaris_Hookworm", "Hookworm_Schisto", "Ascaris_Schisto"), names_to = "Pair", values_to = "Correlation_ratio")
q5.co_occur
# Draw a bar chart of the co-occurrence ratio for each disease pair
ggplot(q5.co_occur) +
  geom_col(aes(x = reorder(Pair, Correlation_ratio), y = Correlation_ratio, fill = Pair)) +
  labs(title = "Co-occurrence rate between disease pairs", x = "Disease pairs", y = "Correlation ratio") +
  scale_fill_brewer(palette = 7)
2.5.3 Results:
The figure and table show that ascaris and hookworm co-occur strongly, with a correlation ratio of 71% (12 of 17 countries), indicating that these two infections frequently overlap within the same countries. By contrast, the correlation ratios for the two pairs involving schistosomiasis are much lower (both below 30%), reflecting a weaker association. These findings suggest that regions endemic for ascaris should be prioritized for concurrent hookworm surveillance and control efforts, and vice versa, while integrated strategies involving schistosomiasis may be less broadly applicable but still warranted in specific settings.
2.6 Q6. How does the prevalence of helminth infections change over time in countries where all three diseases are present?
2.6.1 Rationale:
Tracking the temporal dynamics of helminth prevalence in countries with data on ascaris, hookworm, and schistosomiasis can indicate how well control strategies such as deworming and sanitation programs are working. Restricting the analysis to countries with data for all three infections and more than one survey year allows like-for-like comparisons across diseases and over time.
2.6.2 R code:
# Keep countries that have all 3 diseases and more than 1 survey year
q6.countries.all3 <- combined.mastertable %>%
  group_by(Country) %>%
  filter(n_distinct(disease) == 3 & n_distinct(Year_start) > 1) %>%
  ungroup()
# Summarise mean prevalence by country, disease and year
q6.summarise <- q6.countries.all3 %>%
  group_by(Country, disease, Year_start) %>%
  summarise(mean_prev = mean(Prevalence)) %>%
  ungroup()
`summarise()` has grouped output by 'Country', 'disease'. You can override
using the `.groups` argument.
knitr::kable(q6.summarise, caption ="Temporal Trends of Helminth triple-infections in Endemic Countries")
Temporal Trends of Helminth triple-infections in Endemic Countries

| Country | disease | Year_start | mean_prev |
|:--------|:--------|-----------:|----------:|
| Uganda | ascaris | 1998 | 0.0452189 |
| Uganda | ascaris | 2002 | 0.0594700 |
| Uganda | ascaris | 2003 | 0.0830500 |
| Uganda | ascaris | 2005 | 0.2472018 |
| Uganda | ascaris | 2006 | 0.2187270 |
| Uganda | ascaris | 2008 | 0.0951707 |
| Uganda | ascaris | 2009 | 0.0061564 |
| Uganda | hookworm | 1998 | 0.4934728 |
| Uganda | hookworm | 2002 | 0.5752034 |
| Uganda | hookworm | 2003 | 0.5447246 |
| Uganda | hookworm | 2004 | 0.7124457 |
| Uganda | hookworm | 2005 | 0.2762282 |
| Uganda | hookworm | 2006 | 0.0435118 |
| Uganda | hookworm | 2008 | 0.0331454 |
| Uganda | hookworm | 2009 | 0.1027241 |
| Uganda | schisto | 1998 | 0.1496048 |
| Uganda | schisto | 2002 | 0.4415909 |
| Uganda | schisto | 2003 | 0.4899146 |
| Uganda | schisto | 2004 | 0.3674429 |
| Uganda | schisto | 2005 | 0.0659276 |
| Uganda | schisto | 2006 | 0.0361705 |
| Uganda | schisto | 2008 | 0.1424500 |
| Uganda | schisto | 2009 | 0.1000861 |
# Plot the trend
ggplot(q6.summarise, aes(x = Year_start, y = mean_prev, color = disease)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = q6.summarise$Year_start) +
  facet_wrap(~ Country) +
  labs(title = "Trend of helminth prevalence over time", x = "Year of study", y = "Mean prevalence") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
2.6.3 Results:
Q4 and Q5 findings suggest that ascaris and hookworm frequently co-occur, while schistosomiasis tends to be more independent. In Uganda, the only country that meets the selection criteria, the prevalence trends of hookworm and schistosomiasis nevertheless mirror each other closely: both peak in 2003-2004 and then fall sharply in 2005 and 2006. Notably, ascaris prevalence rose to its highest level in 2005 as the other two dropped, which might indicate a data anomaly, a reporting lag, or a distinct epidemiological event deserving further investigation. This underscores the complexity of helminth epidemiology and the need for longitudinal, multi-disease monitoring.
2.7 Q7. Which countries have sufficiently robust epidemiological studies to provide reliable evidence for helminth infection assessment?
2.7.1 Rationale:
Filtering studies based on sample size (>29 participants) and replication (>2 studies per country) ensures the elimination of noise and increases confidence in the epidemiological conclusions. This approach identifies countries where data quality is strong enough to support in-depth analysis, accurate prevalence estimation, and meaningful evaluation of control program effectiveness.
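The code that builds `q7.country.level` does not appear in this output. The following is only a sketch of how the two criteria could be applied, assuming each row of the master table counts as one study and reusing the column names that the plot expects (`n_reliable_study`, `participants_per_study`, `group_by_num.studies`). The toy data and the cut point of 10 studies used for `group_by_num.studies` are assumptions, not the original definitions:

```r
library(dplyr)
library(tibble)

# Toy master table (illustrative values only)
toy_master <- tibble(
  Country              = c(rep("A", 4), rep("B", 3), rep("C", 2)),
  Individuals_surveyed = c(50, 60, 10, 40, 100, 120, 80, 200, 300)
)

q7_sketch <- toy_master %>%
  filter(Individuals_surveyed > 29) %>%   # sample-size criterion (>29 participants)
  group_by(Country) %>%
  summarise(
    n_reliable_study       = n(),
    participants_per_study = mean(Individuals_surveyed),
    .groups = "drop"
  ) %>%
  filter(n_reliable_study > 2) %>%        # replication criterion (>2 studies per country)
  mutate(
    # Assumed binning, used only to give the plot a facet variable
    group_by_num.studies = if_else(n_reliable_study >= 10,
                                   "10 or more studies",
                                   "fewer than 10 studies")
  )

q7_sketch
```

In this toy example country C is dropped because it has only two studies, and the 10-participant survey in country A is excluded before studies are counted.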
2.7.2 R code:
# Plot the number of reliable studies per country, with point size and colour showing average participants per study
ggplot(q7.country.level, aes(x = Country, y = n_reliable_study, size = participants_per_study, color = participants_per_study)) +
  geom_point() +
  facet_wrap(~ group_by_num.studies, scales = "free") +
  labs(title = "Reliable studies vs participants per study", x = "Country", y = "Number of reliable studies", size = "Avg participants per study", color = "Avg participants per study") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8)) +
  scale_color_gradient(low = "lightblue", high = "darkblue")
2.7.3 Results:
After applying reliability criteria, only 14 countries qualified for detailed epidemiological investigation. In the figure, countries represented by large, dark dots have the most reliable data due to large sample sizes, while those with a moderate to high number of reliable studies are ideal candidates for evaluating control efforts. Nations with both substantial sample sizes and repeated studies form the best foundation for further research into disease patterns and transmission dynamics within their populations.