This report presents a comprehensive analysis of infectious disease indicators in Kenya, with particular focus on cholera trends and comparisons across multiple diseases including measles, poliomyelitis, and meningitis.
The dataset covers infectious disease indicators in Kenya over multiple years, including reported cases and deaths for various diseases such as cholera, measles, polio, tetanus, diphtheria, and others. It provides yearly data on specific health metrics like number of cases, deaths, and case fatality rates for these diseases, enabling analysis of disease trends and impacts over time.
The dataset “kenya-infectious-disease-indicators.csv” has 3 columns (Metric, Year, and Value) and 384 rows of data, capturing various infectious disease indicators reported over multiple years in Kenya.
Before we begin our analysis, we need to load the necessary R packages. These libraries provide functions for data manipulation, visualization, and reading CSV files.
library(knitr)
library(tidyverse)
library(kableExtra)
Now we’ll import the Kenya infectious disease dataset into R. This dataset contains yearly records of various disease metrics collected over multiple decades.
data <- read_csv("C:/Users/user/Documents/un-report/data/practice-datasets/kenya-infectious-disease-indicators.csv")
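The import step can be made a little more defensive by declaring the expected column types up front. The sketch below assumes the three columns described above (Metric, Year, Value) and uses a small temporary CSV with made-up rows in place of the real file, so it runs anywhere; the temporary path and the toy values are illustrative only.

```r
library(readr)

# Toy stand-in for the real CSV (values are made up for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Metric,Year,Value",
  "Number of reported cases of cholera,1971,100",
  "Number of reported deaths from cholera,1971,10"
), tmp)

# Declare the expected types so a malformed column is reported as a
# parsing problem instead of being silently read as character.
toy <- read_csv(
  tmp,
  col_types = cols(
    Metric = col_character(),
    Year   = col_integer(),
    Value  = col_double()
  )
)

# problems() returns one row per parsing issue; zero rows means a clean read.
nrow(problems(toy))
```

With the types fixed at import, a later `as.numeric()` conversion of Year would no longer be needed.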
Let’s begin by examining the structure of our dataset. We’ll create a frequency table to see which disease metrics are most commonly reported and understand the overall composition of our data.
frequency_table <- data %>%
count(Metric) %>%
arrange(desc(n))
kable(frequency_table, caption = "<span style='color:blue; font-weight:bold;'> Table 1: Frequency of Reported Disease Metrics in Kenya </span>", escape = FALSE)
| Metric | n |
|---|---|
| Measles - number of reported cases | 46 |
| Poliomyelitis - number of reported cases | 39 |
| Cholera case fatality rate | 38 |
| Number of reported cases of cholera | 38 |
| Number of reported deaths from cholera | 38 |
| Total tetanus - number of reported cases | 36 |
| Neonatal tetanus - number of reported cases | 28 |
| Pertussis - number of reported cases | 28 |
| Yellow fever - number of reported cases | 27 |
| Diphtheria - number of reported cases | 24 |
| Total rubella - number of reported cases | 18 |
| Number of suspected meningitis deaths reported | 13 |
| Mumps - number of reported cases | 7 |
| Congenital Rubella Syndrome - number of reported cases | 2 |
| Japanese encephalitis - number of reported cases | 2 |
To prepare our data for analysis, we need to clean and filter it appropriately. This involves converting the Year column to numeric, filtering specific disease metrics, and organizing the data into separate datasets for each disease of interest.
data$Year <- as.numeric(data$Year)
measles_cases <- filter(data, Metric == "Measles - number of reported cases")
polio_cases <- filter(data, Metric == "Poliomyelitis - number of reported cases")
cholera_cfr <- filter(data, Metric == "Cholera case fatality rate")
cholera_cases <- filter(data, Metric == "Number of reported cases of cholera")
cholera_deaths <- filter(data, Metric == "Number of reported deaths from cholera")
tetanus_total <- filter(data, Metric == "Total tetanus - number of reported cases")
tetanus_neonatal <- filter(data, Metric == "Neonatal tetanus - number of reported cases")
pertussis_cases <- filter(data, Metric == "Pertussis - number of reported cases")
yellow_fever_cases <- filter(data, Metric == "Yellow fever - number of reported cases")
diphtheria_cases <- filter(data, Metric == "Diphtheria - number of reported cases")
rubella_total <- filter(data, Metric == "Total rubella - number of reported cases")
meningitis_deaths <- filter(data, Metric == "Number of suspected meningitis deaths reported")
mumps_cases <- filter(data, Metric == "Mumps - number of reported cases")
congenital_rubella <- filter(data, Metric == "Congenital Rubella Syndrome - number of reported cases")
japanese_encephalitis <- filter(data, Metric == "Japanese encephalitis - number of reported cases")
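The fifteen filter() calls above could also be collapsed into a single split by metric. This is a sketch of an alternative, not what the report does: it uses base R's split() on a toy data frame with made-up values, and the name `by_metric` is hypothetical.

```r
# Toy data with the same three columns as the report's dataset
# (values are made up for illustration).
toy <- data.frame(
  Metric = c("Measles - number of reported cases",
             "Measles - number of reported cases",
             "Cholera case fatality rate"),
  Year   = c(1990, 1991, 1990),
  Value  = c(120, 95, 3.2)
)

# split() returns a named list with one data frame per distinct Metric.
by_metric <- split(toy, toy$Metric)

# Each element is looked up by its metric label.
names(by_metric)
by_metric[["Measles - number of reported cases"]]
```

The trade-off is that each subset is addressed by its full metric label rather than by a short variable name such as `measles_cases`.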
In this section, we address three key research questions about infectious diseases in Kenya. Each question is explored through detailed statistical analysis and data visualization to reveal important public health trends and patterns.
Cholera is a major public health concern in Kenya, causing periodic outbreaks with significant morbidity and mortality. In this section, we examine the historical trends in both cholera cases and deaths to understand the disease burden and identify patterns in outbreak cycles.
### Statistics
Before looking at year-by-year trends, we present summary statistics for the cholera data.
First, we’ll calculate the total number of cholera cases and deaths recorded in the dataset to understand the overall magnitude of the cholera burden in Kenya.
total_cholera_cases <- sum(cholera_cases$Value, na.rm = TRUE)
print(paste("Total Cholera Cases:", total_cholera_cases))
## [1] "Total Cholera Cases: 118214"
total_cholera_deaths <- sum(cholera_deaths$Value, na.rm = TRUE)
print(paste("Total Cholera Deaths:", total_cholera_deaths))
## [1] "Total Cholera Deaths: 3873"
Based on the calculated totals, we can now interpret what these numbers tell us about cholera’s impact in Kenya over time. A total of 118,214 cholera cases and 3,873 cholera deaths were reported in Kenya over the available years. The data reflect cholera as a significant and persistent public health challenge in Kenya, with periodic epidemics causing thousands of cases and fatalities. The totals alone do not show how this burden is distributed across years; viewed over time, the data would likely show peaks during epidemic years and lower counts in others, consistent with the known cyclical patterns of cholera outbreaks in the country.
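The two totals can also be combined into a pooled case fatality rate, deaths divided by cases times 100. The short sketch below hard-codes the totals printed in the output above rather than re-reading the dataset. This pooled figure is not the same as the average of the yearly CFR values, because years with many cases weigh more heavily.

```r
# Totals as printed in the report output above.
total_cholera_cases  <- 118214
total_cholera_deaths <- 3873

# Pooled CFR (%) = deaths / cases * 100
aggregate_cfr <- 100 * total_cholera_deaths / total_cholera_cases
round(aggregate_cfr, 2)
#> [1] 3.28
```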
To better visualize these trends, we’ll create a line plot showing both cholera cases and deaths over time. This dual-line graph will help us identify outbreak periods and assess the relationship between cases and mortality.
Line plots clearly show temporal trends and peaks. Different colors allow comparison between cases and deaths.
cholera_data <- data %>%
filter(Metric %in% c("Number of reported cases of cholera", "Number of reported deaths from cholera")) %>%
mutate(Year = as.numeric(Year)) %>%
filter(!is.na(Year))
ggplot(cholera_data, aes(x = Year, y = Value, color = Metric)) +
geom_line() +
geom_point() +
labs(title = "Trends in Reported Cholera Cases and Deaths Over Time in Kenya",
x = "Year",
y = "Count",
color = "Metric") +
theme_minimal()
The case fatality rate is a critical indicator of healthcare quality and disease severity. By examining CFR trends over 45 years, we can assess improvements in cholera treatment and identify periods of healthcare system stress.
In this section, we break down the CFR trends by decade to identify key patterns and turning points in cholera management and outcomes.
The plots in this section use geom_line() for the trend line and geom_point() to mark each yearly value.
In the 1970s, the CFR was extremely high in 1971 (15.9%), indicating a severe cholera outbreak or poor case management. The rate dropped to 0% in 1972 and 1977, likely due to fewer cases or improved response, though it fluctuated in other years.
Through the 1980s and 1990s, CFRs ranged mostly between 1.5% and 7.5%, with occasional spikes (e.g., 1985: 7.54%). This period shows a gradual improvement in cholera management, though outbreaks still occurred.
From 2000 onward, CFRs generally stabilized around 2–5%, but there were notable spikes in 2000 (6.74%) and 2014 (25.7%, the highest in the dataset, possibly due to a small number of cases with disproportionately high deaths). The CFR dropped sharply in 2015 (0.5%), suggesting improved treatment and reporting.
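A decade breakdown like the one above can be sketched in code by assigning each year to its decade and summarising within it. The example below uses only the seven CFR values quoted in the text, not the full 38-row series, so the per-decade results are illustrative; `cfr_quoted` and `cfr_by_decade` are hypothetical names.

```r
library(dplyr)

# Only the CFR values quoted in the text above, not the full series.
cfr_quoted <- tibble(
  Year  = c(1971, 1972, 1977, 1985, 2000, 2014, 2015),
  Value = c(15.9, 0, 0, 7.54, 6.74, 25.7, 0.5)
)

# Assign each year to its decade (1971 -> 1970, 2014 -> 2010, ...)
# and summarise the range of CFR values within each decade.
cfr_by_decade <- cfr_quoted %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  group_by(Decade) %>%
  summarise(
    n_years = n(),
    min_cfr = min(Value),
    max_cfr = max(Value),
    .groups = "drop"
  )

cfr_by_decade

# Year with the highest quoted CFR.
cfr_quoted %>% slice_max(Value, n = 1)
```

Applied to the full `cholera_cfr` data, the same pipeline would give one row per decade covered by the series.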
Drawing from the CFR data analysis, we can now make informed conclusions about the evolution of cholera treatment and public health response in Kenya.
The trend in cholera CFR from 1971 to 2016 in Kenya highlights periods of high mortality, often linked to limited healthcare infrastructure, delayed response, or data inconsistencies. However, the overall decline in recent years suggests progress in public health interventions and cholera control strategies.
We’ll now visualize the CFR trends using a line graph with red coloring to emphasize mortality rates. This visualization will clearly show the fluctuations and overall trend direction over the 45-year period.
# Filter for cholera CFR from 1971 to 2016
cfr_data <- data %>%
  mutate(Year = as.numeric(Year)) %>%  # make Year numeric before comparing
  filter(Metric == "Cholera case fatality rate") %>%
  filter(Year >= 1971 & Year <= 2016)
# Plot CFR trends
ggplot(cfr_data, aes(x = Year, y = Value)) +
geom_line(color = "red") +
geom_point(color = "red") +
labs(title = "Trends in Cholera Case Fatality Rate (1971-2016)",
x = "Year",
y = "CFR (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
### Alternative Visualization
For comparison, we present an alternative visualization of the same CFR data using a dark green color scheme, which may be preferred for certain presentation contexts or to reduce visual fatigue.
# Filter cholera CFR between 1971 and 2016
cfr_detailed <- data %>%
  mutate(Year = as.numeric(Year)) %>%  # make Year numeric before comparing
  filter(Metric == "Cholera case fatality rate") %>%
  filter(Year >= 1971 & Year <= 2016)
# Plot CFR trends with dark green
ggplot(cfr_detailed, aes(x = Year, y = Value)) +
geom_line(color = "darkgreen") +
geom_point(color = "darkgreen") +
labs(title = "Trend in Cholera Case Fatality Rate (1971-2016)",
x = "Year",
y = "Case Fatality Rate (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
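The two CFR plots above differ only in color, title, and axis label, so the shared steps could be wrapped in a small helper. This is a sketch, not part of the original analysis: `plot_metric_trend()` and its arguments are hypothetical names, and the toy data frame contains only CFR values quoted earlier in this report.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: filter one metric to a year range and draw its trend.
plot_metric_trend <- function(df, metric, colour, title, y_label,
                              from = 1971, to = 2016) {
  trend <- df %>%
    mutate(Year = as.numeric(Year)) %>%
    filter(Metric == metric, Year >= from, Year <= to)

  ggplot(trend, aes(x = Year, y = Value)) +
    geom_line(colour = colour) +
    geom_point(colour = colour) +
    labs(title = title, x = "Year", y = y_label) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
}

# Toy input: only CFR values quoted in the text, not the full series.
toy <- tibble(
  Metric = "Cholera case fatality rate",
  Year   = c(1971, 1985, 2000, 2014, 2015),
  Value  = c(15.9, 7.54, 6.74, 25.7, 0.5)
)

# The two variants now differ only in their arguments.
p_red <- plot_metric_trend(
  toy, "Cholera case fatality rate", "red",
  "Trends in Cholera Case Fatality Rate (1971-2016)", "CFR (%)"
)
p_green <- plot_metric_trend(
  toy, "Cholera case fatality rate", "darkgreen",
  "Trend in Cholera Case Fatality Rate (1971-2016)", "Case Fatality Rate (%)"
)
```

The same helper would also cover the cholera deaths plot by passing a different metric label and y-axis label.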
Comparing multiple diseases helps us understand the relative public health burden and the effectiveness of different intervention strategies. In this section, we examine measles, poliomyelitis, and meningitis to see how their patterns differ over time.
Before diving into the visualization, let’s examine the summary statistics for each disease to understand their individual characteristics and trends.
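One way such per-disease summary statistics could be computed is with a grouped summarise. The sketch below runs on toy rows with made-up values in the same long format as the report's data plus a Disease label; the column names in `disease_summary` are hypothetical.

```r
library(dplyr)

# Toy rows (made-up values) in long format with a Disease label.
toy <- tibble(
  Disease = c("Measles", "Measles", "Measles",
              "Poliomyelitis", "Poliomyelitis",
              "Meningitis (Deaths)"),
  Year    = c(1990, 1991, 1992, 1990, 1991, 2005),
  Value   = c(500, 300, NA, 40, 10, 25)
)

# One row per disease: coverage, total, and peak value.
disease_summary <- toy %>%
  group_by(Disease) %>%
  summarise(
    years_reported = sum(!is.na(Value)),  # years with a non-missing value
    first_year     = min(Year),
    last_year      = max(Year),
    total          = sum(Value, na.rm = TRUE),
    peak           = max(Value, na.rm = TRUE),
    .groups = "drop"
  )

disease_summary
```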
Reported measles and poliomyelitis cases and suspected meningitis deaths compare across years in Kenya as follows:
- Measles was the most widespread and consistently reported disease.
- Polio showed moderate incidence with successful control over time.
- Meningitis (tracked here as suspected deaths rather than cases) had fewer data points but notable peaks in specific years.
### Visualization 3: Comparison of Multiple Diseases
To compare these three diseases visually, we’ll create a multi-line graph with each disease represented by a different color. This allows for direct comparison of their temporal patterns and relative magnitudes.
# Adding disease labels
measles_cases$Disease <- "Measles"
polio_cases$Disease <- "Poliomyelitis"
meningitis_deaths$Disease <- "Meningitis (Deaths)"
# Combining data
combined_data <- bind_rows(measles_cases, polio_cases, meningitis_deaths)
combined_data$Year <- as.numeric(combined_data$Year)
# Plotting disease comparison
ggplot(combined_data, aes(x = Year, y = Value, color = Disease)) +
geom_line(linewidth = 1) +
geom_point() +
labs(title = "Comparison of Measles, Poliomyelitis, and Meningitis in Kenya",
x = "Year",
y = "Number of Cases / Deaths",
color = "Disease") +
theme_minimal()
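If the three series differ greatly in magnitude, a shared y-axis can flatten the smaller ones, and the plot also mixes case counts with death counts. A possible alternative, sketched here on toy rows with made-up values, is one panel per disease with independent y-axes via facet_wrap(); `p_facets` is a hypothetical name.

```r
library(dplyr)
library(ggplot2)

# Toy rows (made-up values) mimicking the combined disease data.
toy <- tibble(
  Disease = rep(c("Measles", "Poliomyelitis", "Meningitis (Deaths)"), each = 3),
  Year    = rep(c(2000, 2001, 2002), times = 3),
  Value   = c(9000, 4000, 6000, 30, 12, 5, 80, 20, 45)
)

# One panel per disease; scales = "free_y" gives each panel its own y-axis.
p_facets <- ggplot(toy, aes(x = Year, y = Value, color = Disease)) +
  geom_line(linewidth = 1) +
  geom_point() +
  facet_wrap(~ Disease, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = "Number of Cases / Deaths") +
  theme_minimal() +
  theme(legend.position = "none")  # panel titles already name each disease

p_facets
```

Independent axes make each trend readable but hide the difference in magnitude, so the panels should not be compared by height.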
### Additional Detailed Visualizations
To provide a more comprehensive view of the data, we present additional focused visualizations that examine specific aspects of cholera trends in greater detail.
#### Cholera Cases Trend (1971-2016)
This detailed visualization focuses exclusively on cholera case numbers, using contrasting colors (blue line with red points) to make individual data points stand out and help identify specific outbreak years.
# Filter cholera cases
cholera_cases_filtered <- filter(data, Metric == "Number of reported cases of cholera")
cholera_cases_filtered$Year <- as.numeric(cholera_cases_filtered$Year)
# Plot cholera cases
ggplot(cholera_cases_filtered, aes(x = Year, y = Value)) +
geom_line(color = "blue", linewidth = 1) +
geom_point(color = "red") +
labs(title = "Trend of Reported Cholera Cases in Kenya (1971–2016)",
x = "Year",
y = "Number of Cases") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Similarly, we create a focused visualization of cholera deaths using red coloring throughout to emphasize the mortality aspect and make patterns in death counts clearly visible.
# Filter reported cholera deaths from 1971 to 2016
cholera_deaths_filtered <- data %>%
  mutate(Year = as.numeric(Year)) %>%  # make Year numeric before comparing
  filter(Metric == "Number of reported deaths from cholera") %>%
  filter(Year >= 1971 & Year <= 2016)
# Plot cholera deaths
ggplot(cholera_deaths_filtered, aes(x = Year, y = Value)) +
geom_line(color = "red") +
geom_point(color = "red") +
labs(title = "Reported Cholera Deaths in Kenya (1971-2016)",
x = "Year",
y = "Number of Deaths") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Having completed our comprehensive analysis of infectious disease data in Kenya, we now synthesize the key findings and their implications for public health policy and practice.
This analysis examined infectious disease trends in Kenya, with particular attention to cholera patterns between 1971 and 2016. The key findings include:
Cholera remains a significant public health challenge in Kenya, with over 118,000 cases and nearly 4,000 deaths recorded over the study period.
Case fatality rates have generally declined over time, from over 15% in the early 1970s to lower rates in recent years, though occasional spikes (most notably 25.7% in 2014, the highest value in the dataset) indicate ongoing challenges.
- Measles had the highest burden but declined significantly after vaccination programs.
- Polio has been effectively controlled through immunization.
- Meningitis shows episodic outbreaks with sparse data.

Public health interventions appear effective, as evidenced by declining trends in most disease indicators, particularly for vaccine-preventable diseases.
These visualizations reveal important trends in disease incidence, mortality, and case fatality rates that can inform future public health interventions and resource allocation in Kenya.
### Session Information
For reproducibility purposes, we include the R session information showing the versions of R and all packages used in this analysis.
sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Africa/Johannesburg
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.4.0 lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
## [5] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5 tidyr_1.3.1
## [9] tibble_3.2.1 ggplot2_4.0.0 tidyverse_2.0.0 knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.3 xml2_1.3.8 stringi_1.8.7
## [5] hms_1.1.3 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.3
## [9] grid_4.5.0 timechange_0.3.0 RColorBrewer_1.1-3 fastmap_1.2.0
## [13] jsonlite_2.0.0 viridisLite_0.4.2 scales_1.4.0 textshaping_1.0.1
## [17] jquerylib_0.1.4 cli_3.6.5 rlang_1.1.6 crayon_1.5.3
## [21] bit64_4.6.0-1 withr_3.0.2 cachem_1.1.0 yaml_2.3.10
## [25] parallel_4.5.0 tools_4.5.0 tzdb_0.5.0 vctrs_0.6.5
## [29] R6_2.6.1 lifecycle_1.0.4 bit_4.6.0 vroom_1.6.5
## [33] pkgconfig_2.0.3 pillar_1.10.2 bslib_0.9.0 gtable_0.3.6
## [37] glue_1.8.0 systemfonts_1.3.1 xfun_0.52 tidyselect_1.2.1
## [41] rstudioapi_0.17.1 farver_2.1.2 htmltools_0.5.8.1 labeling_0.4.3
## [45] rmarkdown_2.29 svglite_2.2.2 compiler_4.5.0 S7_0.2.0