This report presents a comprehensive analysis of infectious disease indicators in Kenya, with particular focus on cholera trends and comparisons across multiple diseases including measles, poliomyelitis, and meningitis.
The dataset covers infectious disease indicators in Kenya over multiple years for cholera, measles, polio, tetanus, diphtheria, and other diseases. Most indicators are yearly counts of reported cases; for cholera the dataset also provides reported deaths and the case fatality rate, and for meningitis it provides suspected deaths. Together these series support analysis of disease trends and impacts over time.
The dataset “kenya-infectious-disease-indicators.csv” has 3 columns (Metric, Year, and Value) and 384 rows, one row per indicator and year.
Before we begin our analysis, we need to load the necessary R packages. These libraries provide functions for data manipulation, visualization, and reading CSV files.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.1
## Warning: package 'ggplot2' was built under R version 4.5.1
## Warning: package 'dplyr' was built under R version 4.5.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
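# ggplot2, dplyr, and readr are already attached by library(tidyverse);
# they are listed explicitly here only to document what the analysis uses.
library(ggplot2)
library(dplyr)
library(readr)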
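The startup banner and conflict report shown above can be silenced when they are not needed in the rendered report. The following is a minimal sketch, assuming the tidyverse is installed; `core_attached` is a name introduced here for illustration only.

```r
# Attach the tidyverse without the startup banner and conflict report.
# suppressPackageStartupMessages() is base R; "built under R version"
# warnings are warnings, not startup messages, so they still appear.
suppressPackageStartupMessages(library(tidyverse))

# The core packages are attached by the call above, so separate library()
# calls for ggplot2, dplyr, and readr are optional.
core_attached <- c("package:ggplot2", "package:dplyr", "package:readr") %in% search()
print(core_attached)
```

In an R Markdown document, the chunk options `message = FALSE` and `warning = FALSE` hide the same output for a single chunk.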
Now we’ll import the Kenya infectious disease dataset into R. This dataset contains yearly records of various disease metrics collected over multiple decades.
# Load the dataset
data <- read_csv("C:/Users/user/Desktop/UN-Report/un-report/data/practice-datasets/kenya-infectious-disease-indicators.csv")
## Rows: 384 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Metric
## dbl (2): Year, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
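The column-specification message above suggests declaring the column types. A minimal sketch of that approach follows; because the real file is not available here, it writes a three-row stand-in CSV with made-up values to a temporary file, and `csv_path` and `demo` are illustrative names.

```r
library(readr)

# Hypothetical stand-in for the real CSV (values are made up for illustration).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Metric,Year,Value",
  "Number of reported cases of cholera,1971,100",
  "Number of reported deaths from cholera,1971,10",
  "Cholera case fatality rate,1971,10"
), csv_path)

# Declaring the column types makes the expected schema explicit and
# suppresses the column-specification message.
demo <- read_csv(
  csv_path,
  col_types = cols(
    Metric = col_character(),
    Year   = col_double(),
    Value  = col_double()
  )
)
print(demo)
```

The same `col_types` argument could be passed when reading the real dataset. A path relative to the project directory would also make the script easier to run on another machine than an absolute path.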
Let’s begin by examining the structure of our dataset. We’ll create a frequency table to see which disease metrics are most commonly reported and understand the overall composition of our data.
# Create frequency table of metrics
frequency_table <- data %>%
count(Metric) %>%
arrange(desc(n))
print(frequency_table)
## # A tibble: 15 × 2
## Metric n
## <chr> <int>
## 1 Measles - number of reported cases 46
## 2 Poliomyelitis - number of reported cases 39
## 3 Cholera case fatality rate 38
## 4 Number of reported cases of cholera 38
## 5 Number of reported deaths from cholera 38
## 6 Total tetanus - number of reported cases 36
## 7 Neonatal tetanus - number of reported cases 28
## 8 Pertussis - number of reported cases 28
## 9 Yellow fever - number of reported cases 27
## 10 Diphtheria - number of reported cases 24
## 11 Total rubella - number of reported cases 18
## 12 Number of suspected meningitis deaths reported 13
## 13 Mumps - number of reported cases 7
## 14 Congenital Rubella Syndrome - number of reported cases 2
## 15 Japanese encephalitis - number of reported cases 2
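Row counts alone do not show which years each metric covers or whether there are gaps. As a sketch, the frequency table could be extended with the first and last reported year per metric. The data frame `toy` below holds made-up values, and the summary column names are assumptions chosen for this example.

```r
library(dplyr)

# Made-up example data with the same three columns as the real dataset.
toy <- tibble(
  Metric = c("A", "A", "A", "B", "B"),
  Year   = c(1980, 1981, 1985, 1990, 1991),
  Value  = c(5, 7, 3, 1, 0)
)

# For each metric: number of rows, first and last year, and how many years
# inside that range have no row.
coverage <- toy %>%
  group_by(Metric) %>%
  summarise(
    n = n(),
    first_year = min(Year),
    last_year = max(Year),
    years_missing = (last_year - first_year + 1) - n_distinct(Year),
    .groups = "drop"
  ) %>%
  arrange(desc(n))

print(coverage)
```

A non-zero `years_missing` flags metrics whose line plots will connect points across unreported years.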
To prepare our data for analysis, we need to clean and filter it appropriately. This involves making sure the Year column is numeric and filtering the data into a separate dataset for each disease metric.
# Convert Year column to numeric format
data$Year <- as.numeric(data$Year)
# Filter different disease metrics
measles_cases <- filter(data, Metric == "Measles - number of reported cases")
polio_cases <- filter(data, Metric == "Poliomyelitis - number of reported cases")
cholera_cfr <- filter(data, Metric == "Cholera case fatality rate")
cholera_cases <- filter(data, Metric == "Number of reported cases of cholera")
cholera_deaths <- filter(data, Metric == "Number of reported deaths from cholera")
tetanus_total <- filter(data, Metric == "Total tetanus - number of reported cases")
tetanus_neonatal <- filter(data, Metric == "Neonatal tetanus - number of reported cases")
pertussis_cases <- filter(data, Metric == "Pertussis - number of reported cases")
yellow_fever_cases <- filter(data, Metric == "Yellow fever - number of reported cases")
diphtheria_cases <- filter(data, Metric == "Diphtheria - number of reported cases")
rubella_total <- filter(data, Metric == "Total rubella - number of reported cases")
meningitis_deaths <- filter(data, Metric == "Number of suspected meningitis deaths reported")
mumps_cases <- filter(data, Metric == "Mumps - number of reported cases")
congenital_rubella <- filter(data, Metric == "Congenital Rubella Syndrome - number of reported cases")
japanese_encephalitis <- filter(data, Metric == "Japanese encephalitis - number of reported cases")
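The fifteen `filter()` calls above all follow one pattern, so an alternative is to split the data by metric in a single step. This is a sketch on made-up rows, not a replacement for the objects created above; `by_metric` and `measles_demo` are illustrative names.

```r
library(dplyr)

# Made-up rows using two of the real metric labels.
toy <- tibble(
  Metric = c("Measles - number of reported cases",
             "Measles - number of reported cases",
             "Cholera case fatality rate"),
  Year   = c(1980, 1981, 1980),
  Value  = c(1200, 900, 4.5)
)

# split() returns a named list with one data frame per distinct metric.
by_metric <- split(toy, toy$Metric)

# Individual series are then looked up by their metric label.
measles_demo <- by_metric[["Measles - number of reported cases"]]
print(names(by_metric))
print(nrow(measles_demo))
```

Looking series up by label avoids retyping each metric string, at the cost of longer names than the short variables used in the rest of this report.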
In this section, we address three key research questions about infectious diseases in Kenya. Each question is explored through detailed statistical analysis and data visualization to reveal important public health trends and patterns.
Cholera is a major public health concern in Kenya, causing periodic outbreaks with significant morbidity and mortality. In this section, we examine the historical trends in both cholera cases and deaths to understand the disease burden and identify patterns in outbreak cycles.
Summary statistics for cholera give a first sense of the overall burden before we look at the year-by-year trends.
First, we’ll calculate the total number of cholera cases and deaths recorded in the dataset to understand the overall magnitude of the cholera burden in Kenya.
# Calculate total cholera cases
total_cholera_cases <- sum(cholera_cases$Value, na.rm = TRUE)
print(paste("Total Cholera Cases:", total_cholera_cases))
## [1] "Total Cholera Cases: 118214"
# Calculate total cholera deaths
total_cholera_deaths <- sum(cholera_deaths$Value, na.rm = TRUE)
print(paste("Total Cholera Deaths:", total_cholera_deaths))
## [1] "Total Cholera Deaths: 3873"
Based on the calculated totals, we can now interpret what these numbers tell us about cholera’s impact in Kenya over time.
A total of 118,214 cholera cases and 3,873 cholera deaths were reported in Kenya over the available years. Totals alone do not show how this burden was distributed across years; the year-by-year series is needed to see how much of it comes from large outbreaks.
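The two totals also imply an aggregate case fatality rate, which can be worked out directly. The sketch below uses only the totals printed above; it assumes the case and death series cover the same years, which the equal row counts (38 each) suggest but do not guarantee.

```r
# Totals taken from the output above.
total_cases_reported  <- 118214
total_deaths_reported <- 3873

# Aggregate (crude) case fatality rate, in percent:
# deaths divided by cases, times 100.
crude_cfr <- 100 * total_deaths_reported / total_cases_reported
print(round(crude_cfr, 2))
```

This comes to roughly 3.3%. It is a pooled figure, weighted toward the largest outbreak years, so it will generally differ from a simple average of the yearly rates.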
The data reflect cholera as a significant and persistent public health challenge in Kenya, with periodic epidemics causing thousands of cases and fatalities. If viewed over time, these data would likely show peaks during epidemic years and lower counts in others, consistent with the known cyclical patterns of cholera outbreaks in the country.
To better visualize these trends, we’ll create a line plot showing both cholera cases and deaths over time. This dual-line graph will help us identify outbreak periods and assess the relationship between cases and mortality.
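Variables: Year on the x-axis, Value (the reported count) on the y-axis, and Metric mapped to color.
Aesthetic Choices: one color per metric (cases versus deaths) and theme_minimal() for an uncluttered background.
Geometry: geom_line() for trends over time; geom_point() to show exact data points.
Rationale: Line plots clearly show temporal trends and peaks. Different colors allow comparison between cases and deaths.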
# Filter relevant cholera case and death data
cholera_data <- data %>%
filter(Metric %in% c("Number of reported cases of cholera", "Number of reported deaths from cholera")) %>%
mutate(Year = as.numeric(Year)) %>%
filter(!is.na(Year))
# Plot cholera trends
ggplot(cholera_data, aes(x = Year, y = Value, color = Metric)) +
geom_line() +
geom_point() +
labs(title = "Trends in Reported Cholera Cases and Deaths Over Time in Kenya",
x = "Year",
y = "Count",
color = "Metric") +
theme_minimal()
Figure 1: Cholera cases and deaths over time in Kenya
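Because reported deaths are far fewer than reported cases (3,873 versus 118,214 in total), the deaths line can look nearly flat when both series share one y-axis. One option, sketched here on made-up numbers, is to give each metric its own panel with an independent y-axis. The data frame `toy` and the plot object `p` are illustrative.

```r
library(ggplot2)

# Made-up counts: deaths are much smaller than cases, as in the real data.
toy <- data.frame(
  Year   = rep(c(1997, 1998, 1999), times = 2),
  Metric = rep(c("Cases", "Deaths"), each = 3),
  Value  = c(1500, 9000, 4000, 60, 350, 120)
)

# One panel per metric; scales = "free_y" lets each panel choose its own range.
p <- ggplot(toy, aes(x = Year, y = Value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Metric, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = "Count") +
  theme_minimal()

print(p)
```

Stacked panels keep the years aligned, so outbreak peaks in cases and deaths can still be compared directly.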
The case fatality rate (CFR), the percentage of reported cases that end in death, is a critical indicator of healthcare quality and disease severity. By examining CFR trends over 45 years, we can assess improvements in cholera treatment and identify periods of healthcare system stress.
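Since the dataset holds cases, deaths, and CFR as separate metrics, the reported CFR can be cross-checked by recomputing it as deaths divided by cases for each year. The sketch below uses made-up yearly values; `cases_demo`, `deaths_demo`, and `cfr_check` are illustrative names.

```r
library(dplyr)

# Made-up yearly counts standing in for the cholera case and death series.
cases_demo  <- tibble(Year = c(1997, 1998, 1999), Cases  = c(1000, 2000, 500))
deaths_demo <- tibble(Year = c(1997, 1998, 1999), Deaths = c(50, 60, 10))

# Join the two series by year and compute CFR in percent.
# Years with zero reported cases get NA rather than a division by zero.
cfr_check <- inner_join(cases_demo, deaths_demo, by = "Year") %>%
  mutate(CFR = if_else(Cases > 0, 100 * Deaths / Cases, NA_real_))

print(cfr_check)
```

Large differences between a recomputed rate and the reported one would point to rounding, revised counts, or data-entry issues worth checking before interpreting the trend.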
In this section, we break down the CFR trends by decade to identify key patterns and turning points in cholera management and outcomes.
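A decade breakdown can be computed by rounding each year down to its decade and summarising within groups. This is a sketch on made-up CFR values; `cfr_demo`, `cfr_by_decade`, and the summary column names are assumptions for illustration.

```r
library(dplyr)

# Made-up CFR values (percent) for a handful of years.
cfr_demo <- tibble(
  Year  = c(1971, 1975, 1983, 1988, 1994, 2009),
  Value = c(12, 8, 6, 4, 3, 1)
)

# floor(Year / 10) * 10 maps 1971 and 1975 to 1970, 1983 to 1980, and so on.
cfr_by_decade <- cfr_demo %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  group_by(Decade) %>%
  summarise(
    years_reported = n(),
    mean_cfr = mean(Value, na.rm = TRUE),
    max_cfr  = max(Value, na.rm = TRUE),
    .groups = "drop"
  )

print(cfr_by_decade)
```

Reporting `years_reported` next to the mean matters here, because a decade with only one or two reported years gives a much less stable average.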
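Variables: Year on the x-axis and Value (the CFR, in percent) on the y-axis.
Aesthetic Choices: a single red color for the line and points, theme_minimal(), and x-axis labels rotated 90 degrees.
Geometry: geom_line() and geom_point() for the line and points.
Drawing from the CFR data analysis, we can now make informed conclusions about the evolution of cholera treatment and public health response in Kenya.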
The trend in cholera CFR from 1971 to 2016 in Kenya highlights periods of high mortality, often linked to limited healthcare infrastructure, delayed response, or data inconsistencies. However, the overall decline in recent years suggests progress in public health interventions and cholera control strategies.
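To tie the periods of high mortality to specific years, the years with the highest and lowest CFR can be pulled out directly. The following sketch uses made-up values, with `cfr_demo`, `highest`, and `lowest` as illustrative names.

```r
library(dplyr)

# Made-up CFR values (percent).
cfr_demo <- tibble(
  Year  = c(1971, 1980, 1995, 2010),
  Value = c(12, 7, 4, 1.5)
)

# slice_max() / slice_min() return the rows with the largest / smallest Value.
highest <- slice_max(cfr_demo, Value, n = 1)
lowest  <- slice_min(cfr_demo, Value, n = 1)

print(highest)
print(lowest)
```

By default both functions keep ties, so more than one row can come back if several years share the extreme value.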
We’ll now visualize the CFR trends using a line graph with red coloring to emphasize mortality rates. This visualization will clearly show the fluctuations and overall trend direction over the 45-year period.
# Filter for cholera CFR from 1971 to 2016
cfr_data <- data %>%
filter(Metric == "Cholera case fatality rate") %>%
filter(Year >= 1971 & Year <= 2016) %>%
mutate(Year = as.numeric(Year))
# Plot CFR trends
ggplot(cfr_data, aes(x = Year, y = Value)) +
geom_line(color = "red") +
geom_point(color = "red") +
labs(title = "Trends in Cholera Case Fatality Rate (1971-2016)",
x = "Year",
y = "CFR (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Figure 2: Cholera case fatality rate trends
For comparison, we present an alternative visualization of the same CFR data using a dark green color scheme, which may be preferred for certain presentation contexts or to reduce visual fatigue.
# Filter cholera CFR between 1971 and 2016
cfr_detailed <- data %>%
filter(Metric == "Cholera case fatality rate") %>%
filter(Year >= 1971 & Year <= 2016) %>%
mutate(Year = as.numeric(Year))
# Plot CFR trends with dark green
ggplot(cfr_detailed, aes(x = Year, y = Value)) +
geom_line(color = "darkgreen") +
geom_point(color = "darkgreen") +
labs(title = "Trend in Cholera Case Fatality Rate (1971-2016)",
x = "Year",
y = "Case Fatality Rate (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Figure 3: Trend in Cholera Case Fatality Rate (1971-2016) - Alternative
Comparing multiple diseases helps us understand the relative public health burden and the effectiveness of different intervention strategies. In this section, we examine measles, poliomyelitis, and meningitis to see how their patterns differ over time.
Before diving into the visualization, let’s examine the summary statistics for each disease to understand their individual characteristics and trends.
Measles, poliomyelitis, and meningitis compare across years in Kenya as follows. The measles and polio series count reported cases, while the meningitis series counts suspected deaths, so the meningitis figures are not directly comparable to the other two.
Measles trend: Measles had the highest counts among the three diseases, with massive outbreaks in the 1980s. The number of cases declined significantly after the 1990s, likely due to expanded vaccination programs.
Poliomyelitis trend: Polio cases were moderate in the 1980s and 1990s, with a clear decline toward eradication after 2000. Most recent years show zero reported cases, reflecting successful immunization efforts.
Meningitis trend: Meningitis data are sparse and irregular, but peaks in the late 1970s suggest episodic outbreaks. Reporting may have been inconsistent or limited to severe cases.
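The per-disease summary statistics described above can be produced with one grouped summary. This is a sketch on made-up counts; `combined_demo`, `disease_summary`, and the summary column names are assumptions for illustration.

```r
library(dplyr)

# Made-up yearly counts for two diseases.
combined_demo <- tibble(
  Disease = c("Measles", "Measles", "Measles", "Poliomyelitis", "Poliomyelitis"),
  Year    = c(1985, 1986, 2005, 1985, 2005),
  Value   = c(50000, 80000, 200, 300, 0)
)

# Per disease: number of reported years, total, median, and the peak with its year.
disease_summary <- combined_demo %>%
  group_by(Disease) %>%
  summarise(
    years_reported  = n(),
    total           = sum(Value, na.rm = TRUE),
    median_per_year = median(Value, na.rm = TRUE),
    peak_value      = max(Value, na.rm = TRUE),
    peak_year       = Year[which.max(Value)],
    .groups = "drop"
  )

print(disease_summary)
```

A table like this puts numbers behind statements such as "highest incidence" and "outbreaks in the 1980s" and shows how many years each series actually covers.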
To compare these three diseases visually, we’ll create a multi-line graph with each disease represented by a different color. This allows for direct comparison of their temporal patterns and relative magnitudes.
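Variables: Year on the x-axis, Value (reported cases or deaths) on the y-axis, and Disease mapped to color.
Aesthetic Choices: one color per disease and theme_minimal().
Geometry: geom_line() with a line width of 1; geom_point() for data points.
# Add disease labels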
measles_cases$Disease <- "Measles"
polio_cases$Disease <- "Poliomyelitis"
meningitis_deaths$Disease <- "Meningitis (Deaths)"
# Combine data
combined_data <- bind_rows(measles_cases, polio_cases, meningitis_deaths)
combined_data$Year <- as.numeric(combined_data$Year)
# Plot disease comparison
ggplot(combined_data, aes(x = Year, y = Value, color = Disease)) +
geom_line(linewidth = 1) +
geom_point() +
labs(title = "Comparison of Measles, Poliomyelitis, and Meningitis in Kenya",
x = "Year",
y = "Number of Cases / Deaths",
color = "Disease") +
theme_minimal()
Figure 4: Comparison of measles, polio, and meningitis trends
To provide a more comprehensive view of the data, we present additional focused visualizations that examine specific aspects of cholera trends in greater detail.
This detailed visualization focuses exclusively on cholera case numbers, using contrasting colors (blue line with red points) to make individual data points stand out and help identify specific outbreak years.
# Filter cholera cases
cholera_cases_filtered <- filter(data, Metric == "Number of reported cases of cholera")
cholera_cases_filtered$Year <- as.numeric(cholera_cases_filtered$Year)
# Plot cholera cases
ggplot(cholera_cases_filtered, aes(x = Year, y = Value)) +
geom_line(color = "blue", linewidth = 1) +
geom_point(color = "red") +
labs(title = "Trend of Reported Cholera Cases in Kenya (1971–2016)",
x = "Year",
y = "Number of Cases") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Figure 5: Detailed Trend of Cholera Cases in Kenya
Similarly, we create a focused visualization of cholera deaths using red coloring throughout to emphasize the mortality aspect and make patterns in death counts clearly visible.
# Filter reported cholera deaths from 1971 to 2016
cholera_deaths_filtered <- data %>%
filter(Metric == "Number of reported deaths from cholera") %>%
filter(Year >= 1971 & Year <= 2016) %>%
mutate(Year = as.numeric(Year))
# Plot cholera deaths
ggplot(cholera_deaths_filtered, aes(x = Year, y = Value)) +
geom_line(color = "red") +
geom_point(color = "red") +
labs(title = "Reported Cholera Deaths in Kenya (1971-2016)",
x = "Year",
y = "Number of Deaths") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Figure 6: Reported Cholera Deaths in Kenya
Having completed our comprehensive analysis of infectious disease data in Kenya, we now synthesize the key findings and their implications for public health policy and practice.
This analysis examined infectious disease trends in Kenya, with particular attention to cholera patterns between 1971 and 2016. The key findings include:
Cholera remains a significant public health challenge in Kenya, with over 118,000 cases and nearly 4,000 deaths recorded over the study period.
Case fatality rates have generally improved over time, declining from peaks of over 15% in the early 1970s to lower rates in recent years, though occasional spikes indicate ongoing challenges.
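Different diseases show distinct patterns: measles had the largest outbreaks, concentrated in the 1980s, and declined after the 1990s; polio declined toward zero reported cases after 2000; and meningitis data are sparse and irregular.
The declining trends in the indicators examined, particularly for the vaccine-preventable diseases measles and polio, are consistent with effective public health interventions, although reported counts also depend on how complete surveillance and reporting were in each year.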
These visualizations reveal important trends in disease incidence, mortality, and case fatality rates that can inform future public health interventions and resource allocation in Kenya.
For reproducibility purposes, we include the R session information showing the versions of R and all packages used in this analysis.
sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Africa/Johannesburg
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [5] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
## [9] ggplot2_4.0.0 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 gtable_0.3.6 jsonlite_2.0.0 crayon_1.5.3
## [5] compiler_4.5.0 tidyselect_1.2.1 parallel_4.5.0 jquerylib_0.1.4
## [9] scales_1.4.0 yaml_2.3.10 fastmap_1.2.0 R6_2.6.1
## [13] labeling_0.4.3 generics_0.1.3 knitr_1.50 bslib_0.9.0
## [17] pillar_1.10.2 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.1.6
## [21] utf8_1.2.5 stringi_1.8.7 cachem_1.1.0 xfun_0.52
## [25] sass_0.4.10 S7_0.2.0 bit64_4.6.0-1 timechange_0.3.0
## [29] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 digest_0.6.37
## [33] grid_4.5.0 vroom_1.6.5 rstudioapi_0.17.1 hms_1.1.3
## [37] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0
## [41] farver_2.1.2 rmarkdown_2.29 tools_4.5.0 pkgconfig_2.0.3
## [45] htmltools_0.5.8.1