Introduction

This report presents a comprehensive analysis of infectious disease indicators in Kenya, with particular focus on cholera trends and comparisons across multiple diseases including measles, poliomyelitis, and meningitis.

Dataset Description

The dataset covers infectious disease indicators in Kenya over multiple years, including reported cases and deaths for various diseases such as cholera, measles, polio, tetanus, diphtheria, and others. It provides yearly data on specific health metrics like number of cases, deaths, and case fatality rates for these diseases, enabling analysis of disease trends and impacts over time.

Dataset Size

The dataset “kenya-infectious-disease-indicators.csv” has 3 columns (Metric, Year, and Value) and 384 rows of data, capturing various infectious disease indicators reported over multiple years in Kenya.

Loading Libraries

Before we begin our analysis, we need to load the necessary R packages. These libraries provide functions for data manipulation, visualization, and reading CSV files.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.1
## Warning: package 'ggplot2' was built under R version 4.5.1
## Warning: package 'dplyr' was built under R version 4.5.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
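# ggplot2, dplyr, and readr are already attached by library(tidyverse);
# the explicit calls below are redundant but harmless
library(ggplot2)
library(dplyr)
library(readr)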

Loading Data

Now we’ll import the Kenya infectious disease dataset into R. This dataset contains yearly records of various disease metrics collected over multiple decades.

# Load the dataset
data <- read_csv("C:/Users/user/Desktop/UN-Report/un-report/data/practice-datasets/kenya-infectious-disease-indicators.csv")
## Rows: 384 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Metric
## dbl (2): Year, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
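
The column-specification message above can be silenced by declaring the column types when reading the file. The following is a minimal sketch, assuming the same three-column layout (Metric, Year, Value). The temporary file and its two rows are invented stand-ins for the real CSV, not values from the dataset.

```r
library(readr)

# Hypothetical stand-in for the real CSV: same three columns, invented rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("Metric,Year,Value",
             "Example metric,2000,10",
             "Example metric,2001,12"), tmp)

# Declaring column types up front suppresses the specification message
example_data <- read_csv(
  tmp,
  col_types = cols(
    Metric = col_character(),
    Year   = col_double(),
    Value  = col_double()
  )
)

str(example_data)
```

Declaring the types also makes the import fail loudly if a future version of the file changes a column's format.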

Data Exploration

Let’s begin by examining the structure of our dataset. We’ll create a frequency table to see which disease metrics are most commonly reported and understand the overall composition of our data.

# Create frequency table of metrics
frequency_table <- data %>%
  count(Metric) %>%
  arrange(desc(n))

print(frequency_table)
## # A tibble: 15 × 2
##    Metric                                                     n
##    <chr>                                                  <int>
##  1 Measles - number of reported cases                        46
##  2 Poliomyelitis - number of reported cases                  39
##  3 Cholera case fatality rate                                38
##  4 Number of reported cases of cholera                       38
##  5 Number of reported deaths from cholera                    38
##  6 Total tetanus - number of reported cases                  36
##  7 Neonatal tetanus - number of reported cases               28
##  8 Pertussis - number of reported cases                      28
##  9 Yellow fever - number of reported cases                   27
## 10 Diphtheria - number of reported cases                     24
## 11 Total rubella - number of reported cases                  18
## 12 Number of suspected meningitis deaths reported            13
## 13 Mumps - number of reported cases                           7
## 14 Congenital Rubella Syndrome - number of reported cases     2
## 15 Japanese encephalitis - number of reported cases           2
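
The row counts above say how many years each metric was reported, but not which years. One possible extension, sketched here on invented rows rather than the Kenya data, adds the first and last reported year per metric. The names `toy` and `coverage` are illustrative only.

```r
library(dplyr)

# Invented rows for illustration only; not values from the Kenya dataset
toy <- tibble(
  Metric = c("Disease A", "Disease A", "Disease A", "Disease B", "Disease B"),
  Year   = c(1990, 1991, 1995, 2001, 2002),
  Value  = c(5, 8, 2, 40, 35)
)

# For each metric: number of reported years and the span they cover
coverage <- toy %>%
  group_by(Metric) %>%
  summarise(
    n_years    = n(),
    first_year = min(Year),
    last_year  = max(Year),
    .groups    = "drop"
  ) %>%
  arrange(desc(n_years))

print(coverage)
```

A metric whose span is much longer than its count of reported years has gaps in reporting, which matters when reading trend lines.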

Data Processing

To prepare our data for analysis, we make sure the Year column is numeric and then split the dataset into a separate data frame for each disease metric of interest.

# Ensure Year is numeric (read_csv already parsed it as a double, so this is a safeguard)
data$Year <- as.numeric(data$Year)

# Filter different disease metrics
measles_cases <- filter(data, Metric == "Measles - number of reported cases")
polio_cases <- filter(data, Metric == "Poliomyelitis - number of reported cases")
cholera_cfr <- filter(data, Metric == "Cholera case fatality rate")
cholera_cases <- filter(data, Metric == "Number of reported cases of cholera")
cholera_deaths <- filter(data, Metric == "Number of reported deaths from cholera")
tetanus_total <- filter(data, Metric == "Total tetanus - number of reported cases")
tetanus_neonatal <- filter(data, Metric == "Neonatal tetanus - number of reported cases")
pertussis_cases <- filter(data, Metric == "Pertussis - number of reported cases")
yellow_fever_cases <- filter(data, Metric == "Yellow fever - number of reported cases")
diphtheria_cases <- filter(data, Metric == "Diphtheria - number of reported cases")
rubella_total <- filter(data, Metric == "Total rubella - number of reported cases")
meningitis_deaths <- filter(data, Metric == "Number of suspected meningitis deaths reported")
mumps_cases <- filter(data, Metric == "Mumps - number of reported cases")
congenital_rubella <- filter(data, Metric == "Congenital Rubella Syndrome - number of reported cases")
japanese_encephalitis <- filter(data, Metric == "Japanese encephalitis - number of reported cases")
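
As an alternative to writing one filter() call per metric, the same split can be sketched with base R's split(), which returns a named list with one data frame per metric. The rows below are invented for illustration, and `by_metric` is a hypothetical name.

```r
# Invented rows for illustration only; not values from the Kenya dataset
toy <- data.frame(
  Metric = c("Metric X", "Metric X", "Metric Y"),
  Year   = c(2000, 2001, 2000),
  Value  = c(3, 4, 9)
)

# One data frame per metric, held in a named list
by_metric <- split(toy, toy$Metric)

names(by_metric)
by_metric[["Metric X"]]
```

Separate variables, as used in this report, are easier to read in later code, while a list scales better if more metrics are added.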

Research Questions

In this section, we address three key research questions about infectious diseases in Kenya. Each question is explored through detailed statistical analysis and data visualization to reveal important public health trends and patterns.

Question 1: How Have Reported Cholera Cases and Deaths Varied Over Time in Kenya?

Cholera is a major public health concern in Kenya, causing periodic outbreaks with significant morbidity and mortality. In this section, we examine the historical trends in both cholera cases and deaths to understand the disease burden and identify patterns in outbreak cycles.

Statistics

First, we’ll calculate the total number of cholera cases and deaths recorded in the dataset to understand the overall magnitude of the cholera burden in Kenya.

# Calculate total cholera cases
total_cholera_cases <- sum(cholera_cases$Value, na.rm = TRUE)
print(paste("Total Cholera Cases:", total_cholera_cases))
## [1] "Total Cholera Cases: 118214"
# Calculate total cholera deaths
total_cholera_deaths <- sum(cholera_deaths$Value, na.rm = TRUE)
print(paste("Total Cholera Deaths:", total_cholera_deaths))
## [1] "Total Cholera Deaths: 3873"

Analysis

Based on the calculated totals, we can now interpret what these numbers tell us about cholera’s impact in Kenya over time.

A total of 118,214 cholera cases and 3,873 cholera deaths were reported in Kenya over the available years. These totals show the overall scale of the burden. On their own they do not show how cases and deaths varied from year to year, so they should be read alongside the yearly series.
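
The two totals also give an aggregate case fatality rate, which works out to roughly 3.3%. The sketch below uses only the totals printed in the Statistics output. Note that this pooled figure is not the same as the average of the yearly rates in the "Cholera case fatality rate" metric.

```r
# Totals printed in the Statistics output
total_cases  <- 118214
total_deaths <- 3873

# Aggregate case fatality rate over all reported years, in percent
overall_cfr <- total_deaths / total_cases * 100
round(overall_cfr, 2)
```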

The data reflect cholera as a significant and persistent public health challenge in Kenya, with periodic epidemics causing thousands of cases and fatalities. If viewed over time, these data would likely show peaks during epidemic years and lower counts in others, consistent with the known cyclical patterns of cholera outbreaks in the country.
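
One way to check for epidemic peaks is to pull out the years with the largest reported counts. The sketch below uses invented yearly counts, not the Kenya figures. Applying it to the real data would mean substituting the cholera_cases data frame for the hypothetical `toy_cases`.

```r
library(dplyr)

# Invented yearly counts for illustration only; not the Kenya figures
toy_cases <- tibble(
  Year  = 2000:2007,
  Value = c(120, 90, 3400, 150, 80, 2900, 60, 110)
)

# The three years with the largest reported counts, in chronological order
peak_years <- toy_cases %>%
  slice_max(Value, n = 3) %>%
  arrange(Year)

print(peak_years)
```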

Question 3: How Do the Incidences of Different Infectious Diseases Compare Across Years in Kenya?

Comparing multiple diseases helps us understand the relative public health burden and the effectiveness of different intervention strategies. In this section, we examine measles, poliomyelitis, and meningitis to see how their patterns differ over time.

Overview

Before diving into the visualization, let’s examine the summary statistics for each disease to understand their individual characteristics and trends.

The three diseases compare as follows. Note that the meningitis series counts suspected deaths, not cases, so it is not directly comparable with the measles and polio case counts.

a) Measles

  • Years Reported: 46
  • Peak Year: 1983
  • Peak Cases: 285,681
  • Average Annual Cases: ~41,989

Trend: Measles had the highest incidence among the three diseases, with massive outbreaks in the 1980s. The number of cases declined significantly after the 1990s, likely due to expanded vaccination programs.

b) Poliomyelitis

  • Years Reported: 39
  • Peak Year: 1988
  • Peak Cases: 1,688
  • Average Annual Cases: ~180

Trend: Polio cases were moderate in the 1980s and 1990s, with a clear decline toward eradication after 2000. Most recent years show zero reported cases, reflecting successful immunization efforts.

c) Meningitis (Suspected Deaths)

  • Years Reported: 13
  • Peak Year: 1977
  • Peak Deaths: 196
  • Average Annual Deaths: ~65

Trend: Meningitis data is sparse and irregular, but peaks in the late 1970s suggest episodic outbreaks. Reporting may have been inconsistent or limited to severe cases.
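
The report does not show the code behind the per-disease figures above (years reported, peak year, peak value, and annual average). A minimal sketch of one way to derive such figures with a grouped summary follows. It uses invented rows, and the column and object names are assumptions.

```r
library(dplyr)

# Invented rows for illustration only; not values from the Kenya dataset
toy <- tibble(
  Disease = c("A", "A", "A", "B", "B"),
  Year    = c(1980, 1981, 1982, 1990, 1991),
  Value   = c(100, 400, 250, 10, 30)
)

# Per disease: years reported, year and size of the peak, and annual mean
disease_summary <- toy %>%
  group_by(Disease) %>%
  summarise(
    years_reported = n(),
    peak_year      = Year[which.max(Value)],
    peak_value     = max(Value, na.rm = TRUE),
    mean_value     = mean(Value, na.rm = TRUE),
    .groups        = "drop"
  )

print(disease_summary)
```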

Summary

  • Measles was the most widespread and consistently reported disease.
  • Polio showed moderate incidence with successful control over time.
  • Meningitis had fewer data points but notable peaks in specific years.

Visualization 3: Comparison of Multiple Diseases

To compare these three diseases visually, we’ll create a multi-line graph with each disease represented by a different color. This allows for direct comparison of their temporal patterns and relative magnitudes.

Variables:

  • Year (continuous)
  • Number of Cases/Deaths (continuous)
  • Disease type (categorical)

Aesthetic Choices:

  • x-axis: Year
  • y-axis: Number of Cases/Deaths
  • Color: Disease type

Geometry:

  • geom_line() with linewidth = 1
  • geom_point() for data points

# Add disease labels
measles_cases$Disease <- "Measles"
polio_cases$Disease <- "Poliomyelitis"
meningitis_deaths$Disease <- "Meningitis (Deaths)"

# Combine data
combined_data <- bind_rows(measles_cases, polio_cases, meningitis_deaths)
combined_data$Year <- as.numeric(combined_data$Year)

# Plot disease comparison
# (linewidth replaces the size argument for lines, deprecated since ggplot2 3.4.0)
ggplot(combined_data, aes(x = Year, y = Value, color = Disease)) +
  geom_line(linewidth = 1) +
  geom_point() +
  labs(title = "Comparison of Measles, Poliomyelitis, and Meningitis in Kenya",
       x = "Year",
       y = "Number of Cases / Deaths",
       color = "Disease") +
  theme_minimal()
Comparison of measles, polio, and meningitis trends
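
Because measles counts peak in the hundreds of thousands while polio cases and meningitis deaths stay in the hundreds or low thousands, a shared linear y-axis flattens the two smaller series. One possible remedy, sketched here on invented data, is a logarithmic y-axis. A log scale cannot display zero counts, so faceting with facet_wrap(~ Disease, scales = "free_y") may suit series that contain zeros better.

```r
library(ggplot2)

# Invented series spanning several orders of magnitude; illustration only
toy <- data.frame(
  Year    = rep(2000:2004, times = 2),
  Value   = c(50000, 80000, 20000, 9000, 4000,  # large-count series
              120, 300, 90, 40, 15),            # small-count series
  Disease = rep(c("Large series", "Small series"), each = 5)
)

p <- ggplot(toy, aes(x = Year, y = Value, color = Disease)) +
  geom_line(linewidth = 1) +
  geom_point() +
  scale_y_log10() +   # compress the range so both series are readable
  labs(x = "Year", y = "Count (log scale)", color = "Disease") +
  theme_minimal()

print(p)
```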


Additional Detailed Visualizations

To provide a more comprehensive view of the data, we present additional focused visualizations that examine specific aspects of cholera trends in greater detail.

Cholera Cases Trend (1971-2016)

This detailed visualization focuses exclusively on cholera case numbers, using contrasting colors (blue line with red points) to make individual data points stand out and help identify specific outbreak years.

# Filter cholera cases
cholera_cases_filtered <- filter(data, Metric == "Number of reported cases of cholera")
cholera_cases_filtered$Year <- as.numeric(cholera_cases_filtered$Year)

# Plot cholera cases
ggplot(cholera_cases_filtered, aes(x = Year, y = Value)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "red") +
  labs(title = "Trend of Reported Cholera Cases in Kenya (1971–2016)",
       x = "Year",
       y = "Number of Cases") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Figure 5: Detailed Trend of Cholera Cases in Kenya

Cholera Deaths Trend (1971-2016)

Similarly, we create a focused visualization of cholera deaths, using red throughout to emphasize the mortality aspect and make patterns in the yearly death counts clearly visible.

# Keep reported cholera deaths from 1971 to 2016 (both years inclusive)
cholera_deaths_filtered <- data %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Metric == "Number of reported deaths from cholera",
         between(Year, 1971, 2016))

# Plot cholera deaths
ggplot(cholera_deaths_filtered, aes(x = Year, y = Value)) +
  geom_line(color = "red") +
  geom_point(color = "red") +
  labs(title = "Reported Cholera Deaths in Kenya (1971-2016)",
       x = "Year",
       y = "Number of Deaths") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Figure 6: Reported Cholera Deaths in Kenya


Overall Conclusions

Having completed our comprehensive analysis of infectious disease data in Kenya, we now synthesize the key findings and their implications for public health policy and practice.

This analysis examined infectious disease trends in Kenya, with particular attention to cholera patterns between 1971 and 2016. The key findings include:

  1. Cholera remains a significant public health challenge in Kenya, with 118,214 cases and 3,873 deaths recorded over the study period.

  2. Case fatality rates have generally improved over time, declining from peaks of over 15% in the early 1970s to lower rates in recent years, though occasional spikes indicate ongoing challenges.

  3. Different diseases show distinct patterns:

    • Measles had the highest burden but declined significantly after vaccination programs
    • Polio has been effectively controlled through immunization
    • Meningitis shows episodic outbreaks with sparse data

  4. Public health interventions appear effective, as evidenced by declining trends in most disease indicators, particularly for vaccine-preventable diseases.

These visualizations reveal important trends in disease incidence, mortality, and case fatality rates that can inform future public health interventions and resource allocation in Kenya.


Session Information

For reproducibility purposes, we include the R session information showing the versions of R and all packages used in this analysis.

sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: Africa/Johannesburg
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.4 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
##  [5] purrr_1.0.4     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
##  [9] ggplot2_4.0.0   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] bit_4.6.0          gtable_0.3.6       jsonlite_2.0.0     crayon_1.5.3      
##  [5] compiler_4.5.0     tidyselect_1.2.1   parallel_4.5.0     jquerylib_0.1.4   
##  [9] scales_1.4.0       yaml_2.3.10        fastmap_1.2.0      R6_2.6.1          
## [13] labeling_0.4.3     generics_0.1.3     knitr_1.50         bslib_0.9.0       
## [17] pillar_1.10.2      RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.1.6       
## [21] utf8_1.2.5         stringi_1.8.7      cachem_1.1.0       xfun_0.52         
## [25] sass_0.4.10        S7_0.2.0           bit64_4.6.0-1      timechange_0.3.0  
## [29] cli_3.6.5          withr_3.0.2        magrittr_2.0.3     digest_0.6.37     
## [33] grid_4.5.0         vroom_1.6.5        rstudioapi_0.17.1  hms_1.1.3         
## [37] lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.3     glue_1.8.0        
## [41] farver_2.1.2       rmarkdown_2.29     tools_4.5.0        pkgconfig_2.0.3   
## [45] htmltools_0.5.8.1