Exploratory Data Analysis - CEMA Internship Task 2023

In this R Markdown document, we perform exploratory data analysis on the “CEMA Internship Task 2023” dataset, which contains health indicators for children under 5 years across counties in Kenya.

Research Question

“How do the health indicators of children under 5 years vary across counties, and how do they change over the period covered by the data (January 2021 to June 2023)?”

Loading Necessary Libraries

# Load necessary libraries
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(magrittr)
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

Reading the Dataset

Let’s read the “CEMA Internship Task 2023” dataset and take a look at its structure.

# Read the dataset
cema_internship_task_2023<-read_csv("C:/Users/user/Downloads/cema_internship_task_2023.csv")
## Rows: 1410 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): period, county
## dbl (9): Total Dewormed, Acute Malnutrition, stunted 6-23 months, stunted 0-...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#View a few rows of the dataset
head(cema_internship_task_2023)
## # A tibble: 6 × 11
##   period county      `Total Dewormed` `Acute Malnutrition` `stunted 6-23 months`
##   <chr>  <chr>                  <dbl>                <dbl>                 <dbl>
## 1 Jan-23 Baringo Co…             3659                    8                   471
## 2 Jan-23 Bomet Coun…             1580                   NA                     1
## 3 Jan-23 Bungoma Co…             6590                   24                    98
## 4 Jan-23 Busia Coun…             7564                   NA                   396
## 5 Jan-23 Elgeyo Mar…             1407                   NA                    92
## 6 Jan-23 Embu County             3241                   72                   326
## # ℹ 6 more variables: `stunted 0-<6 months` <dbl>,
## #   `stunted 24-59 months` <dbl>, `diarrhoea cases` <dbl>,
## #   `Underweight 0-<6 months` <dbl>, `Underweight 6-23 months` <dbl>,
## #   `Underweight 24-59 Months` <dbl>

Data Preprocessing

Converting “period” to Usable Dates

We convert the “period” column from character format (for example “Jan-23”) to a date, taking the 15th as the middle of each month.

# Convert the "period" column to the middle of the month as a date
cema_internship_task_2023$period <- as.Date(paste0("15-", cema_internship_task_2023$period), format = "%d-%b-%y")

# View the first few rows of the converted dataset
head(cema_internship_task_2023)
## # A tibble: 6 × 11
##   period     county  `Total Dewormed` `Acute Malnutrition` `stunted 6-23 months`
##   <date>     <chr>              <dbl>                <dbl>                 <dbl>
## 1 2023-01-15 Baring…             3659                    8                   471
## 2 2023-01-15 Bomet …             1580                   NA                     1
## 3 2023-01-15 Bungom…             6590                   24                    98
## 4 2023-01-15 Busia …             7564                   NA                   396
## 5 2023-01-15 Elgeyo…             1407                   NA                    92
## 6 2023-01-15 Embu C…             3241                   72                   326
## # ℹ 6 more variables: `stunted 0-<6 months` <dbl>,
## #   `stunted 24-59 months` <dbl>, `diarrhoea cases` <dbl>,
## #   `Underweight 0-<6 months` <dbl>, `Underweight 6-23 months` <dbl>,
## #   `Underweight 24-59 Months` <dbl>
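
The conversion above relies on `%b`, which matches month abbreviations in the current locale, so it may return NA on a system set to a non-English locale. A minimal locale-independent sketch follows; the helper name `to_mid_month` is hypothetical, and it assumes every value looks like “Jan-23” with a two-digit year in the 2000s.

```r
# Hypothetical helper: convert "Mon-YY" strings (e.g. "Jan-23") to the 15th of that month.
# Assumes English three-letter month abbreviations and years 2000-2099.
to_mid_month <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  # month.abb is R's built-in vector of English month abbreviations
  mon <- match(vapply(parts, `[`, character(1), 1), month.abb)
  yr  <- 2000L + as.integer(vapply(parts, `[`, character(1), 2))
  # An unrecognised month puts "NA" into the string, which as.Date() returns as NA
  as.Date(sprintf("%04d-%02d-15", yr, mon), format = "%Y-%m-%d")
}

to_mid_month(c("Jan-23", "Jun-21"))
```

Checking `sum(is.na(...))` on the converted column is a quick way to confirm that no period failed to parse.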

Summary Statistics

Let’s calculate summary statistics for the dataset to get an overview of the data.

# Summary statistics of the dataset
summary(cema_internship_task_2023)
##      period              county          Total Dewormed   Acute Malnutrition
##  Min.   :2021-01-15   Length:1410        Min.   :    97   Min.   :   1.0    
##  1st Qu.:2021-08-15   Class :character   1st Qu.:  2454   1st Qu.:  15.0    
##  Median :2022-03-30   Mode  :character   Median :  4564   Median :  39.0    
##  Mean   :2022-03-31                      Mean   : 11458   Mean   : 125.4    
##  3rd Qu.:2022-11-15                      3rd Qu.:  8222   3rd Qu.: 143.5    
##  Max.   :2023-06-15                      Max.   :392800   Max.   :4123.0    
##                                                           NA's   :355       
##  stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
##  Min.   :   1.0      Min.   :   1.0      Min.   :   1.0       Min.   :  198  
##  1st Qu.:  69.5      1st Qu.:  36.5      1st Qu.:  22.0       1st Qu.: 1464  
##  Median : 159.0      Median :  84.0      Median :  50.0       Median : 2158  
##  Mean   : 280.2      Mean   : 139.8      Mean   : 110.8       Mean   : 2813  
##  3rd Qu.: 328.5      3rd Qu.: 157.0      3rd Qu.: 114.2       3rd Qu.: 3335  
##  Max.   :4398.0      Max.   :7900.0      Max.   :3169.0       Max.   :15795  
##  NA's   :11          NA's   :19          NA's   :14                          
##  Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
##  Min.   :   6.0          Min.   :  16.0          Min.   :   1.00         
##  1st Qu.:  87.0          1st Qu.: 249.0          1st Qu.:  51.25         
##  Median : 162.5          Median : 456.0          Median : 120.50         
##  Mean   : 223.5          Mean   : 652.3          Mean   : 305.74         
##  3rd Qu.: 272.8          3rd Qu.: 791.8          3rd Qu.: 311.00         
##  Max.   :1937.0          Max.   :5348.0          Max.   :4680.00         
## 
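
The summary shows strongly right-skewed indicators: “Total Dewormed” has a maximum of 392,800 against a third quartile of 8,222, and its mean (11,458) is well above its median (4,564). One way to list such extreme values for review is the 1.5 × IQR rule. This is a common convention rather than part of the original analysis, and the helper `flag_outliers` and the toy vector below are hypothetical.

```r
# Hypothetical helper: flag values beyond k * IQR from the quartiles (Tukey's rule).
flag_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  # Missing values are never flagged
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Toy vector (not the real data) with one extreme value and one NA
toy <- c(2454, 4564, 8222, 3000, 5000, 392800, NA)
which(flag_outliers(toy))
```

Flagged rows are candidates for checking against the source records, not values to drop automatically.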

Handling NA Values

We check for the presence of NA values in each column of the dataset.

# Check for NA values in each column
colSums(is.na(cema_internship_task_2023))
##                   period                   county           Total Dewormed 
##                        0                        0                        0 
##       Acute Malnutrition      stunted 6-23 months      stunted 0-<6 months 
##                      355                       11                       19 
##     stunted 24-59 months          diarrhoea cases  Underweight 0-<6 months 
##                       14                        0                        0 
##  Underweight 6-23 months Underweight 24-59 Months 
##                        0                        0
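
Raw counts are easier to judge as a share of rows: the 355 missing “Acute Malnutrition” values are about 25.2% of the 1,410 rows, while the stunting columns are each missing under 2%. The sketch below reports both figures per column; `na_summary` is a hypothetical helper and the data frame is a toy example.

```r
# Hypothetical helper: count and percentage of missing values per column,
# sorted from most to least missing.
na_summary <- function(df) {
  miss <- is.na(df)
  out <- data.frame(
    column      = colnames(miss),
    n_missing   = unname(colSums(miss)),
    pct_missing = round(100 * unname(colMeans(miss)), 1)
  )
  out[order(-out$n_missing), ]
}

# Toy data (not the real dataset)
toy <- data.frame(a = c(1, NA, 3, NA), b = c(1, 2, 3, 4), c = c(NA, 2, 3, 4))
na_summary(toy)
```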

Merging County Codes and Names

The dataset contains county names, but we also have a separate dataset with county codes and names. We merge the two datasets on the county name so that the main dataset also carries the county code.

# Read the dataset containing the county codes and names
county_code <- read_csv("C:/Users/user/Downloads/county code.csv")
## Rows: 47 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): county
## dbl (1): County Code
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(county_code)
## # A tibble: 6 × 2
##   `County Code` county             
##           <dbl> <chr>              
## 1             1 Mombasa County     
## 2             2 Kwale County       
## 3             3 Kilifi County      
## 4             4 Tana River County  
## 5             5 Lamu County        
## 6             6 Taita Taveta County
# Merge the two datasets based on the county column
cema_internship_task_2023 <- merge(cema_internship_task_2023, county_code, by.x = "county", by.y = "county", all.x = TRUE)
head(cema_internship_task_2023)
##           county     period Total Dewormed Acute Malnutrition
## 1 Baringo County 2023-01-15           3659                  8
## 2 Baringo County 2023-03-15           5113                 48
## 3 Baringo County 2022-04-15           3938                  3
## 4 Baringo County 2021-08-15           3153                  2
## 5 Baringo County 2021-02-15           4376                  1
## 6 Baringo County 2022-08-15           3517                  1
##   stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## 1                 471                  34                  380            2620
## 2                 751                 225                 1104            3023
## 3                 360                  48                   88            2822
## 4                 110                 113                   53            2465
## 5                 114                  34                   24            1599
## 6                 160                  86                  131            2045
##   Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## 1                      85                   739.0                      731
## 2                     129                  1192.0                     1538
## 3                      90                   658.3                      188
## 4                      83                   176.0                      144
## 5                      72                   212.0                      107
## 6                     241                   507.0                      368
##   County Code
## 1          30
## 2          30
## 3          30
## 4          30
## 5          30
## 6          30
# Summary statistics of the merged dataset
summary(cema_internship_task_2023)
##     county              period           Total Dewormed   Acute Malnutrition
##  Length:1410        Min.   :2021-01-15   Min.   :    97   Min.   :   1.0    
##  Class :character   1st Qu.:2021-08-15   1st Qu.:  2454   1st Qu.:  15.0    
##  Mode  :character   Median :2022-03-30   Median :  4564   Median :  39.0    
##                     Mean   :2022-03-31   Mean   : 11458   Mean   : 125.4    
##                     3rd Qu.:2022-11-15   3rd Qu.:  8222   3rd Qu.: 143.5    
##                     Max.   :2023-06-15   Max.   :392800   Max.   :4123.0    
##                                                           NA's   :355       
##  stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
##  Min.   :   1.0      Min.   :   1.0      Min.   :   1.0       Min.   :  198  
##  1st Qu.:  69.5      1st Qu.:  36.5      1st Qu.:  22.0       1st Qu.: 1464  
##  Median : 159.0      Median :  84.0      Median :  50.0       Median : 2158  
##  Mean   : 280.2      Mean   : 139.8      Mean   : 110.8       Mean   : 2813  
##  3rd Qu.: 328.5      3rd Qu.: 157.0      3rd Qu.: 114.2       3rd Qu.: 3335  
##  Max.   :4398.0      Max.   :7900.0      Max.   :3169.0       Max.   :15795  
##  NA's   :11          NA's   :19          NA's   :14                          
##  Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
##  Min.   :   6.0          Min.   :  16.0          Min.   :   1.00         
##  1st Qu.:  87.0          1st Qu.: 249.0          1st Qu.:  51.25         
##  Median : 162.5          Median : 456.0          Median : 120.50         
##  Mean   : 223.5          Mean   : 652.3          Mean   : 305.74         
##  3rd Qu.: 272.8          3rd Qu.: 791.8          3rd Qu.: 311.00         
##  Max.   :1937.0          Max.   :5348.0          Max.   :4680.00         
##                                                                          
##   County Code
##  Min.   : 1  
##  1st Qu.:12  
##  Median :24  
##  Mean   :24  
##  3rd Qu.:36  
##  Max.   :47  
## 
# Check for NA values in each column of the merged dataset
colSums(is.na(cema_internship_task_2023))
##                   county                   period           Total Dewormed 
##                        0                        0                        0 
##       Acute Malnutrition      stunted 6-23 months      stunted 0-<6 months 
##                      355                       11                       19 
##     stunted 24-59 months          diarrhoea cases  Underweight 0-<6 months 
##                       14                        0                        0 
##  Underweight 6-23 months Underweight 24-59 Months              County Code 
##                        0                        0                        0
# Check the exact column names in the dataset
names(cema_internship_task_2023)
##  [1] "county"                   "period"                  
##  [3] "Total Dewormed"           "Acute Malnutrition"      
##  [5] "stunted 6-23 months"      "stunted 0-<6 months"     
##  [7] "stunted 24-59 months"     "diarrhoea cases"         
##  [9] "Underweight 0-<6 months"  "Underweight 6-23 months" 
## [11] "Underweight 24-59 Months" "County Code"
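
A merge on names silently produces NA codes when spellings differ, so it can help to compare the keys first. Note also that `merge()` sorts the result by the key, which is why the rows above are now ordered by county rather than by period. The sketch below uses toy data frames (the names `main`, `codes`, and `county_code` are hypothetical) and `dplyr::left_join()`, which keeps the original row order.

```r
library(dplyr)

# Toy data (not the real files); "Embu county" is deliberately misspelled
main  <- data.frame(county = c("Baringo County", "Bomet County", "Embu county"),
                    cases  = c(8, 24, 72))
codes <- data.frame(county = c("Baringo County", "Bomet County", "Embu County"),
                    county_code = c(30, 36, 14))

# County names in the main data that have no match in the code table
unmatched <- setdiff(unique(main$county), codes$county)
unmatched

# Left join keeps every row of `main`; unmatched names get NA codes
merged <- left_join(main, codes, by = "county")
merged
```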

Rearranging Columns

To make “County Code” one of the first columns in the dataset, we rearrange the columns.

# Rearrange the columns so that County Code comes first
cema_internship_task_2023 <- cema_internship_task_2023 %>%
  select(`County Code`, county, period, `Total Dewormed`, `Acute Malnutrition`,
         `stunted 6-23 months`, `stunted 0-<6 months`, `stunted 24-59 months`,
         `diarrhoea cases`, `Underweight 0-<6 months`, `Underweight 6-23 months`,
         `Underweight 24-59 Months`)

head(cema_internship_task_2023)
##   County Code         county     period Total Dewormed Acute Malnutrition
## 1          30 Baringo County 2023-01-15           3659                  8
## 2          30 Baringo County 2023-03-15           5113                 48
## 3          30 Baringo County 2022-04-15           3938                  3
## 4          30 Baringo County 2021-08-15           3153                  2
## 5          30 Baringo County 2021-02-15           4376                  1
## 6          30 Baringo County 2022-08-15           3517                  1
##   stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## 1                 471                  34                  380            2620
## 2                 751                 225                 1104            3023
## 3                 360                  48                   88            2822
## 4                 110                 113                   53            2465
## 5                 114                  34                   24            1599
## 6                 160                  86                  131            2045
##   Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## 1                      85                   739.0                      731
## 2                     129                  1192.0                     1538
## 3                      90                   658.3                      188
## 4                      83                   176.0                      144
## 5                      72                   212.0                      107
## 6                     241                   507.0                      368

Data Visualization of the health indicators across counties

Distribution of Total Dewormed across Counties

#Visualize the distribution of "Total Dewormed" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Total Dewormed`)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Distribution of Total Dewormed across Counties",
       x = "County Code",
       y = "Total Dewormed") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Distribution of Acute Malnutrition across Counties

#Visualize the distribution of "Acute Malnutrition" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Acute Malnutrition`)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(title = "Distribution of Acute Malnutrition across Counties",
       x = "County Code",
       y = "Acute Malnutrition") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 355 rows containing missing values (`position_stack()`).

Distribution of stunted 6-23 months across Counties

#Visualize the distribution of "stunted 6-23 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 6-23 months`)) +
  geom_bar(stat = "identity", fill = "green") +
  labs(title = "Distribution of stunted 6-23 months across Counties",
       x = "County Code",
       y = "stunted 6-23 months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 11 rows containing missing values (`position_stack()`).

Distribution of stunted 0-<6 months across Counties

#Visualize the distribution of "stunted 0-<6 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 0-<6 months`)) +
  geom_bar(stat = "identity", fill = "purple") +
  labs(title = "Distribution of stunted 0-<6 months across Counties",
       x = "County Code",
       y = "stunted 0-<6 months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 19 rows containing missing values (`position_stack()`).

Distribution of stunted 24-59 months across Counties

#Visualize the distribution of "stunted 24-59 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 24-59 months`)) +
  geom_bar(stat = "identity", fill = "orange") +
  labs(title = "Distribution of stunted 24-59 months across Counties",
       x = "County Code",
       y = "stunted 24-59 months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 14 rows containing missing values (`position_stack()`).

Distribution of diarrhoea cases across counties

#Visualize the distribution of "diarrhoea cases" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `diarrhoea cases`)) +
  geom_bar(stat = "identity", fill = "brown") +
  labs(title = "Distribution of Diarrhoea Cases across Counties",
       x = "County Code",
       y = "Diarrhoea Cases") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Distribution of “Underweight 0-<6 months” across counties

#Visualize the distribution of "Underweight 0-<6 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 0-<6 months`)) +
  geom_bar(stat = "identity", fill = "pink") +
  labs(title = "Distribution of Underweight 0-<6 months across Counties",
       x = "County Code",
       y = "Underweight 0-<6 months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Distribution of “Underweight 6-23 months” across counties

#Visualize the distribution of "Underweight 6-23 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 6-23 months`)) +
  geom_bar(stat = "identity", fill = "gray") +
  labs(title = "Distribution of Underweight 6-23 months across Counties",
       x = "County Code",
       y = "Underweight 6-23 months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Distribution of “Underweight 24-59 Months” across counties

#Visualize the distribution of "Underweight 24-59 Months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 24-59 Months`)) +
  geom_bar(stat = "identity", fill = "cyan") +
  labs(title = "Distribution of Underweight 24-59 Months across Counties",
       x = "County Code",
       y = "Underweight 24-59 Months") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
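
Because the data hold one row per county per month, `geom_bar(stat = "identity")` stacks all of a county's monthly values, so each bar in the plots above is that county's total over the whole period (this is also why the warnings mention `position_stack()`). The sketch below makes that aggregation explicit and treats the county code as a discrete label. It uses toy data with hypothetical column names, not the real dataset.

```r
library(dplyr)
library(ggplot2)

# Toy data: two monthly rows per county, one missing value
toy <- data.frame(
  county_code = c(1, 1, 2, 2, 3, 3),
  dewormed    = c(100, 150, 80, NA, 200, 50)
)

# Sum over months explicitly, then convert the code to a factor
# so that every county gets its own axis tick
county_totals <- toy %>%
  group_by(county_code) %>%
  summarise(total_dewormed = sum(dewormed, na.rm = TRUE), .groups = "drop") %>%
  mutate(county_code = factor(county_code))

p <- ggplot(county_totals, aes(x = county_code, y = total_dewormed)) +
  geom_col(fill = "blue") +
  labs(x = "County Code", y = "Total Dewormed (all months)")
```

Replacing `sum()` with `mean()` would give a per-month average instead, which is less sensitive to counties that report in fewer months.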

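ANOVA of Health Indicators across Counties

# ANOVA on county-level means of Total Dewormed using rstatix (loaded above);
# invisible() suppresses the auto-printed value of the block
invisible({
  # Group by County Code and calculate mean Total Dewormed.
  # Note: `County Code` is numeric, so anova_test() fits it as a single
  # linear term (DFn = 1 in the output) rather than as 47 county groups.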
  anova_results <- cema_internship_task_2023 %>%
    group_by(`County Code`) %>%
    summarise(Total_Dewormed_Mean = mean(`Total Dewormed`, na.rm = TRUE)) %>%
    anova_test(Total_Dewormed_Mean ~ `County Code`)
  
  # View ANOVA results
  print(anova_results)
})
## ANOVA Table (type II tests)
## 
##          Effect DFn DFd     F     p p<.05   ges
## 1 `County Code`   1  45 1.364 0.249       0.029
# Perform ANOVA for each health indicator
anova_results_acute_malnutrition <- cema_internship_task_2023 %>%
  anova_test(`Acute Malnutrition` ~ `County Code`)
## Warning: NA detected in rows: 17,28,32,33,34,35,36,37,38,40,41,44,45,47,48,49,50,51,53,54,55,56,57,58,59,60,63,64,65,70,73,75,76,77,78,82,83,84,86,87,88,89,90,91,93,94,96,99,100,101,102,103,111,113,114,116,118,119,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,154,306,314,316,320,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,399,409,421,425,427,428,429,432,433,434,436,437,438,440,441,442,443,445,447,448,449,450,451,452,453,457,458,459,460,461,462,463,464,466,467,468,469,470,471,473,474,475,477,478,479,480,521,524,529,533,534,571,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,854,856,859,864,932,934,935,938,939,941,945,946,947,950,954,958,959,960,991,992,993,994,996,997,998,999,1000,1001,1002,1003,1005,1006,1007,1008,1009,1011,1013,1014,1016,1018,1019,1021,1022,1023,1024,1025,1026,1027,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1039,1040,1042,1043,1044,1046,1047,1048,1049,1050,1091,1093,1094,1095,1105,1108,1114,1115,1116,1119,1120,1121,1123,1124,1125,1127,1130,1131,1133,1134,1135,1136,1137,1138,1140,1142,1143,1144,1145,1147,1150,1152,1153,1154,1155,1156,1158,1160,1161,1164,1165,1166,1167,1168,1169,1203,1204,1221,1227,1237,1244,1245,1249,1252,1291,1292,1293,1294,1295,1296,1297,1299,1300,1301,1302,1303,1304,1305,1306,1307,1309,1310,1311,1312,1313,1314,1315,1316,1317,1318,1319,1320.
## Removing this rows before the analysis.
anova_results_stunted_6_23 <- cema_internship_task_2023 %>%
  anova_test(`stunted 6-23 months` ~ `County Code`)
## Warning: NA detected in rows: 32,34,35,44,45,47,54,57,59,357,1140.
## Removing this rows before the analysis.
anova_results_stunted_0_6 <- cema_internship_task_2023 %>%
  anova_test(`stunted 0-<6 months` ~ `County Code`)
## Warning: NA detected in rows: 32,33,34,36,39,44,50,57,59,200,204,260,342,357,606,614,621,1356,1360.
## Removing this rows before the analysis.
anova_results_stunted_24_59 <- cema_internship_task_2023 %>%
  anova_test(`stunted 24-59 months` ~ `County Code`)
## Warning: NA detected in rows: 41,56,59,159,338,339,606,613,614,617,621,627,630,1237.
## Removing this rows before the analysis.
anova_results_diarrhoea <- cema_internship_task_2023 %>%
  anova_test(`diarrhoea cases` ~ `County Code`)

anova_results_underweight_0_6 <- cema_internship_task_2023 %>%
  anova_test(`Underweight 0-<6 months` ~ `County Code`)

anova_results_underweight_6_23 <- cema_internship_task_2023 %>%
  anova_test(`Underweight 6-23 months` ~ `County Code`)

anova_results_underweight_24_59 <- cema_internship_task_2023 %>%
  anova_test(`Underweight 24-59 Months` ~ `County Code`)

# View ANOVA results
print(anova_results_acute_malnutrition)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd      F        p p<.05   ges
## 1 `County Code`   1 1053 48.337 6.29e-12     * 0.044
print(anova_results_stunted_6_23)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd     F     p p<.05      ges
## 1 `County Code`   1 1397 0.526 0.468       0.000376
print(anova_results_stunted_0_6)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd      F       p p<.05  ges
## 1 `County Code`   1 1389 28.736 9.7e-08     * 0.02
print(anova_results_stunted_24_59)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd      F        p p<.05   ges
## 1 `County Code`   1 1394 32.898 1.19e-08     * 0.023
print(anova_results_diarrhoea)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd     F     p p<.05   ges
## 1 `County Code`   1 1408 4.582 0.032     * 0.003
print(anova_results_underweight_0_6)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd      F        p p<.05   ges
## 1 `County Code`   1 1408 31.658 2.21e-08     * 0.022
print(anova_results_underweight_6_23)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd      F       p p<.05  ges
## 1 `County Code`   1 1408 13.559 0.00024     * 0.01
print(anova_results_underweight_24_59)
## ANOVA Table (type II tests)
## 
##          Effect DFn  DFd     F        p p<.05   ges
## 1 `County Code`   1 1408 76.72 5.55e-18     * 0.052

Interpretation: Summary of ANOVA Analysis

The ANOVA analysis was conducted to examine how the health indicators vary across counties in Kenya. The indicators investigated were “Total Dewormed,” “Acute Malnutrition,” “Stunted 6-23 months,” “Stunted 0-<6 months,” “Stunted 24-59 months,” “Diarrhoea cases,” “Underweight 0-<6 months,” “Underweight 6-23 months,” and “Underweight 24-59 Months.” Because “County Code” is stored as a number, every test has a single numerator degree of freedom (DFn = 1): it assesses a linear association between the indicator and the code number, not differences among the 47 county means. The results should be read with that limitation in mind.

For “Total Dewormed,” the test on county-level means was not significant (F(1, 45) = 1.364, p = 0.249).

For “Acute Malnutrition,” the association was significant (F(1, 1053) = 48.337, p = 6.29e-12) after the 355 rows with missing values were removed.

Regarding stunting, “Stunted 0-<6 months” (p = 9.7e-08) and “Stunted 24-59 months” (p = 1.19e-08) were significant, whereas “Stunted 6-23 months” was not (p = 0.468).

“Diarrhoea cases” was significant at the 5% level (p = 0.032), but with a very small effect size (ges = 0.003).

All three underweight indicators were significant: “Underweight 0-<6 months” (p = 2.21e-08), “Underweight 6-23 months” (p = 0.00024), and “Underweight 24-59 Months” (p = 5.55e-18). Effect sizes were small throughout (ges of 0.052 or less).

Overall, the ANOVA analysis provided insights into the variations in health indicators across counties in Kenya. These findings can help inform targeted interventions and policies to address specific health challenges faced by children under 5 years in different regions of the country.
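
To compare county means directly, the grouping variable needs to be a factor. The sketch below uses base R's `aov()` on simulated toy data (not the real dataset) to show the difference in degrees of freedom between a numeric and a factor code. Converting “County Code” with `factor()` before calling `anova_test()` should behave analogously, though that is an assumption rather than something run here.

```r
set.seed(1)

# Simulated toy data: 3 counties, 10 observations each
toy <- data.frame(
  county_code = rep(c(1, 2, 3), each = 10),
  value = c(rnorm(10, mean = 10), rnorm(10, mean = 12), rnorm(10, mean = 11))
)

# Numeric code: a single slope is fitted, so the term has 1 degree of freedom
fit_numeric <- aov(value ~ county_code, data = toy)

# Factor code: one mean per county, so the term has k - 1 = 2 degrees of freedom
fit_factor <- aov(value ~ factor(county_code), data = toy)

df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]
c(numeric = df_numeric, factor = df_factor)
```

With 47 counties, the factor version would have 46 numerator degrees of freedom, which is a simple check that the test is comparing county groups.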

Trend Analysis (Time Series)

# Select the relevant columns for trend analysis (e.g., "period" and "Total Dewormed")
trend_data <- cema_internship_task_2023 %>%
  select(period, `Total Dewormed`)

# Calculate the mean Total Dewormed for each period
mean_dewormed_by_period <- trend_data %>%
  group_by(period) %>%
  summarise(mean_dewormed = mean(`Total Dewormed`, na.rm = TRUE))

# Plot the trend of Total Dewormed over time
ggplot(mean_dewormed_by_period, aes(x = period, y = mean_dewormed)) +
  geom_line() +
  labs(title = "Trend Analysis of Total Dewormed in Kenya",
       x = "Period",
       y = "Mean Total Dewormed") +
  theme_minimal()

Trend Analysis for Acute Malnutrition

# Select the relevant columns for trend analysis (e.g., "period" and "Acute Malnutrition")
trend_data_acute_malnutrition <- cema_internship_task_2023 %>%
  select(period, `Acute Malnutrition`)

# Calculate the mean Acute Malnutrition for each period
mean_acute_malnutrition_by_period <- trend_data_acute_malnutrition %>%
  group_by(period) %>%
  summarise(mean_acute_malnutrition = mean(`Acute Malnutrition`, na.rm = TRUE))

# Plot the trend of Acute Malnutrition over time
ggplot(mean_acute_malnutrition_by_period, aes(x = period, y = mean_acute_malnutrition)) +
  geom_line() +
  labs(title = "Trend Analysis of Acute Malnutrition in Kenya",
       x = "Period",
       y = "Mean Acute Malnutrition") +
  theme_minimal()

Trend Analysis for Stunted 6-23 months

# Select the relevant columns for trend analysis (e.g., "period" and "stunted 6-23 months")
trend_data_stunted_6_23 <- cema_internship_task_2023 %>%
  select(period, `stunted 6-23 months`)

# Calculate the mean stunted 6-23 months for each period
mean_stunted_6_23_by_period <- trend_data_stunted_6_23 %>%
  group_by(period) %>%
  summarise(mean_stunted_6_23 = mean(`stunted 6-23 months`, na.rm = TRUE))

# Plot the trend of stunted 6-23 months over time
ggplot(mean_stunted_6_23_by_period, aes(x = period, y = mean_stunted_6_23)) +
  geom_line() +
  labs(title = "Trend Analysis of Stunted 6-23 months in Kenya",
       x = "Period",
       y = "Mean Stunted 6-23 months") +
  theme_minimal()

Trend Analysis for Stunted 0-<6 months

# Select the relevant columns for trend analysis (e.g., "period" and "stunted 0-<6 months")
trend_data_stunted_0_6 <- cema_internship_task_2023 %>%
  select(period, `stunted 0-<6 months`)

# Calculate the mean stunted 0-<6 months for each period
mean_stunted_0_6_by_period <- trend_data_stunted_0_6 %>%
  group_by(period) %>%
  summarise(mean_stunted_0_6 = mean(`stunted 0-<6 months`, na.rm = TRUE))

# Plot the trend of stunted 0-<6 months over time
ggplot(mean_stunted_0_6_by_period, aes(x = period, y = mean_stunted_0_6)) +
  geom_line() +
  labs(title = "Trend Analysis of Stunted 0-<6 months in Kenya",
       x = "Period",
       y = "Mean Stunted 0-<6 months") +
  theme_minimal()

Trend Analysis for Underweight 6-23 months

# Select the relevant columns for trend analysis (e.g., "period" and "Underweight 6-23 months")
trend_data_underweight_6_23 <- cema_internship_task_2023 %>%
  select(period, `Underweight 6-23 months`)

# Calculate the mean Underweight 6-23 months for each period
mean_underweight_6_23_by_period <- trend_data_underweight_6_23 %>%
  group_by(period) %>%
  summarise(mean_underweight_6_23 = mean(`Underweight 6-23 months`, na.rm = TRUE))

# Plot the trend of Underweight 6-23 months over time
ggplot(mean_underweight_6_23_by_period, aes(x = period, y = mean_underweight_6_23)) +
  geom_line() +
  labs(title = "Trend Analysis of Underweight 6-23 months in Kenya",
       x = "Period",
       y = "Mean Underweight 6-23 months") +
  theme_minimal()

Trend Analysis for Underweight 24-59 Months

# Select the relevant columns for trend analysis (e.g., "period" and "Underweight 24-59 Months")
trend_data_underweight_24_59 <- cema_internship_task_2023 %>%
  select(period, `Underweight 24-59 Months`)

# Calculate the mean Underweight 24-59 Months for each period
mean_underweight_24_59_by_period <- trend_data_underweight_24_59 %>%
  group_by(period) %>%
  summarise(mean_underweight_24_59 = mean(`Underweight 24-59 Months`, na.rm = TRUE))

# Plot the trend of Underweight 24-59 Months over time
ggplot(mean_underweight_24_59_by_period, aes(x = period, y = mean_underweight_24_59)) +
  geom_line() +
  labs(title = "Trend Analysis of Underweight 24-59 Months in Kenya",
       x = "Period",
       y = "Mean Underweight 24-59 Months") +
  theme_minimal()  
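
The trend blocks above repeat the same select, group, and summarise steps for each indicator. As a sketch of a more compact alternative, `across()` can compute all monthly means at once and `tidyr::pivot_longer()` can reshape them for a faceted plot. The toy data and column names below are hypothetical, and `tidyr` is an extra package not loaded in the original analysis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data (not the real dataset): two counties, two months, one missing value
toy <- data.frame(
  period   = as.Date(c("2021-01-15", "2021-01-15", "2021-02-15", "2021-02-15")),
  county   = c("A", "B", "A", "B"),
  dewormed = c(100, 200, 300, 500),
  stunted  = c(10, NA, 30, 50)
)

# Mean of every indicator per period, then one row per (period, indicator)
monthly_means <- toy %>%
  group_by(period) %>%
  summarise(across(c(dewormed, stunted), ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>%
  pivot_longer(-period, names_to = "indicator", values_to = "mean_value")

# One panel per indicator, each with its own y scale
p <- ggplot(monthly_means, aes(x = period, y = mean_value)) +
  geom_line() +
  facet_wrap(~ indicator, scales = "free_y") +
  theme_minimal()
```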

Interpretation

The trend analysis of monthly means across counties suggests that most indicators, such as “Total Dewormed,” “Acute Malnutrition,” “Stunted 6-23 months,” “Stunted 24-59 months,” “Diarrhoea cases,” “Underweight 0-<6 months,” “Underweight 6-23 months,” and “Underweight 24-59 Months,” rose between January 2021 and June 2023. Trend plots for “Stunted 24-59 months,” “Diarrhoea cases,” and “Underweight 0-<6 months” are not included above. For the malnutrition and diarrhoea indicators, a rising trend is a cause for concern, as it may indicate deteriorating health among young children across counties; since these are raw counts, it may also partly reflect changes in reporting. A rising “Total Dewormed,” by contrast, means more children were reached by deworming.

One exception is “Stunted 0-<6 months.” Unlike the other indicators, its monthly mean remained relatively steady until 2023, when it rose sharply. This sudden increase in stunting among infants under 6 months warrants further investigation to understand the underlying factors.

The rising trends in most health indicators highlight the importance of implementing targeted interventions and policy measures to address the health challenges faced by young children in Kenya. The findings from this exploratory data analysis can serve as valuable insights for policymakers, health practitioners, and stakeholders to design and implement effective strategies to improve the health and well-being of children under 5 years in the country.
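
Reading a trend from a line plot is subjective, so a fitted slope can help put a number on “upward.” The sketch below fits a simple linear regression of a monthly mean on a month index. The data are a made-up toy series, not results from the dataset, and a linear fit is only one assumption about the shape of the trend.

```r
# Toy monthly series (not real results): six months starting January 2021
monthly <- data.frame(
  period     = seq(as.Date("2021-01-15"), by = "month", length.out = 6),
  mean_value = c(10, 12, 14, 16, 18, 20)
)

# Month index: 1 for January 2021, 2 for February 2021, and so on
monthly$month_index <- (as.integer(format(monthly$period, "%Y")) - 2021L) * 12L +
  as.integer(format(monthly$period, "%m"))

# The slope is the average change in the monthly mean per month
fit   <- lm(mean_value ~ month_index, data = monthly)
slope <- unname(coef(fit)["month_index"])
slope
```

A positive slope with a confidence interval that excludes zero (see `confint(fit)`) would support describing an indicator as rising over the period.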