Exploratory Data Analysis - CEMA Internship Task 2023
In this R Markdown document, we perform exploratory data analysis on the “CEMA Internship Task 2023” dataset, which contains monthly health indicators for children under 5 years across the counties of Kenya.
Research Question
Based on the “CEMA Internship Task 2023” dataset, our research question is: “How do the health indicators of children under 5 years vary across counties, and how do they change over the period covered by the data (January 2021 to June 2023)?”
Loading Necessary Libraries
# Load necessary libraries
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(magrittr)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
Reading the Dataset
Let’s read the “CEMA Internship Task 2023” dataset and take a look at its structure.
# Read the dataset
cema_internship_task_2023<-read_csv("C:/Users/user/Downloads/cema_internship_task_2023.csv")
## Rows: 1410 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): period, county
## dbl (9): Total Dewormed, Acute Malnutrition, stunted 6-23 months, stunted 0-...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# View the first few rows of the dataset
head(cema_internship_task_2023)
## # A tibble: 6 × 11
## period county `Total Dewormed` `Acute Malnutrition` `stunted 6-23 months`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Jan-23 Baringo Co… 3659 8 471
## 2 Jan-23 Bomet Coun… 1580 NA 1
## 3 Jan-23 Bungoma Co… 6590 24 98
## 4 Jan-23 Busia Coun… 7564 NA 396
## 5 Jan-23 Elgeyo Mar… 1407 NA 92
## 6 Jan-23 Embu County 3241 72 326
## # ℹ 6 more variables: `stunted 0-<6 months` <dbl>,
## # `stunted 24-59 months` <dbl>, `diarrhoea cases` <dbl>,
## # `Underweight 0-<6 months` <dbl>, `Underweight 6-23 months` <dbl>,
## # `Underweight 24-59 Months` <dbl>
Data Preprocessing
Converting “period” to Usable Dates
The “period” column is stored as text such as “Jan-23”. We convert it to a date, taking the 15th of each month as a representative mid-month date.
# Convert the "period" column to the middle of the month as a date
cema_internship_task_2023$period <- as.Date(paste0("15-", cema_internship_task_2023$period), format = "%d-%b-%y")
# View the first few rows of the converted dataset
head(cema_internship_task_2023)
## # A tibble: 6 × 11
## period county `Total Dewormed` `Acute Malnutrition` `stunted 6-23 months`
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2023-01-15 Baring… 3659 8 471
## 2 2023-01-15 Bomet … 1580 NA 1
## 3 2023-01-15 Bungom… 6590 24 98
## 4 2023-01-15 Busia … 7564 NA 396
## 5 2023-01-15 Elgeyo… 1407 NA 92
## 6 2023-01-15 Embu C… 3241 72 326
## # ℹ 6 more variables: `stunted 0-<6 months` <dbl>,
## # `stunted 24-59 months` <dbl>, `diarrhoea cases` <dbl>,
## # `Underweight 0-<6 months` <dbl>, `Underweight 6-23 months` <dbl>,
## # `Underweight 24-59 Months` <dbl>
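The `%b` format used above matches month abbreviations in the current locale, so the conversion can return NA on a system whose locale is not English. The following is a minimal locale-independent sketch, assuming every value has the form “Mon-yy” with an English three-letter month and a year in the 2000s. `parse_period` is a hypothetical helper and not part of the original analysis.

```r
# Hypothetical helper: parse "Jan-23" style strings without relying on the locale.
parse_period <- function(x) {
  # month.abb is R's built-in vector of English month abbreviations
  month_num <- match(substr(x, 1, 3), month.abb)
  # Assumption: two-digit years all belong to the 2000s
  year_num <- 2000L + as.integer(substr(x, 5, 6))
  # Use the 15th as the mid-month date; unparseable input becomes NA
  as.Date(sprintf("%04d-%02d-15", year_num, month_num), format = "%Y-%m-%d")
}

print(parse_period(c("Jan-23", "Jun-21")))
```

Values that do not start with a recognised month abbreviation come back as NA, which makes parsing problems easy to spot with `sum(is.na(...))`.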
Summary Statistics
Let’s calculate summary statistics for the dataset to get an overview of the data.
# Summary statistics of the dataset
summary(cema_internship_task_2023)
## period county Total Dewormed Acute Malnutrition
## Min. :2021-01-15 Length:1410 Min. : 97 Min. : 1.0
## 1st Qu.:2021-08-15 Class :character 1st Qu.: 2454 1st Qu.: 15.0
## Median :2022-03-30 Mode :character Median : 4564 Median : 39.0
## Mean :2022-03-31 Mean : 11458 Mean : 125.4
## 3rd Qu.:2022-11-15 3rd Qu.: 8222 3rd Qu.: 143.5
## Max. :2023-06-15 Max. :392800 Max. :4123.0
## NA's :355
## stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## Min. : 1.0 Min. : 1.0 Min. : 1.0 Min. : 198
## 1st Qu.: 69.5 1st Qu.: 36.5 1st Qu.: 22.0 1st Qu.: 1464
## Median : 159.0 Median : 84.0 Median : 50.0 Median : 2158
## Mean : 280.2 Mean : 139.8 Mean : 110.8 Mean : 2813
## 3rd Qu.: 328.5 3rd Qu.: 157.0 3rd Qu.: 114.2 3rd Qu.: 3335
## Max. :4398.0 Max. :7900.0 Max. :3169.0 Max. :15795
## NA's :11 NA's :19 NA's :14
## Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## Min. : 6.0 Min. : 16.0 Min. : 1.00
## 1st Qu.: 87.0 1st Qu.: 249.0 1st Qu.: 51.25
## Median : 162.5 Median : 456.0 Median : 120.50
## Mean : 223.5 Mean : 652.3 Mean : 305.74
## 3rd Qu.: 272.8 3rd Qu.: 791.8 3rd Qu.: 311.00
## Max. :1937.0 Max. :5348.0 Max. :4680.00
##
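Handling NA Values
We check each column for missing (NA) values. “Acute Malnutrition” has the most, with 355 missing values out of 1,410 rows; the three stunting columns have between 11 and 19 each.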
# Check for NA values in each column
colSums(is.na(cema_internship_task_2023))
## period county Total Dewormed
## 0 0 0
## Acute Malnutrition stunted 6-23 months stunted 0-<6 months
## 355 11 19
## stunted 24-59 months diarrhoea cases Underweight 0-<6 months
## 14 0 0
## Underweight 6-23 months Underweight 24-59 Months
## 0 0
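Raw NA counts are easier to judge as shares, and it helps to know whether the gaps are concentrated in a few counties. The sketch below shows both on a small made-up data frame; the `toy` values are illustrative only and are not taken from the dataset.

```r
# Toy data standing in for the real dataset (values are made up)
toy <- data.frame(
  county = c("A County", "A County", "B County", "B County"),
  `Acute Malnutrition` = c(8, NA, NA, NA),
  `diarrhoea cases` = c(2620, 3023, 1599, 2045),
  check.names = FALSE
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))
print(na_share)

# Number of missing "Acute Malnutrition" values per county
na_by_county <- tapply(is.na(toy$`Acute Malnutrition`), toy$county, sum)
print(na_by_county)
```

Applied to the real data, the same two calls would show whether the 355 missing “Acute Malnutrition” values are spread evenly or come mostly from particular counties.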
Merging County Codes and Names
The main dataset identifies counties by name only. A separate dataset maps county names to county codes, so we merge the two on the county name to add the county code to the main dataset.
# Read the dataset containing the county codes and names
county_code <- read_csv("C:/Users/user/Downloads/county code.csv")
## Rows: 47 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): county
## dbl (1): County Code
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(county_code)
## # A tibble: 6 × 2
## `County Code` county
## <dbl> <chr>
## 1 1 Mombasa County
## 2 2 Kwale County
## 3 3 Kilifi County
## 4 4 Tana River County
## 5 5 Lamu County
## 6 6 Taita Taveta County
# Merge the two datasets based on the county column
cema_internship_task_2023 <- merge(cema_internship_task_2023, county_code, by.x = "county", by.y = "county", all.x = TRUE)
head(cema_internship_task_2023)
## county period Total Dewormed Acute Malnutrition
## 1 Baringo County 2023-01-15 3659 8
## 2 Baringo County 2023-03-15 5113 48
## 3 Baringo County 2022-04-15 3938 3
## 4 Baringo County 2021-08-15 3153 2
## 5 Baringo County 2021-02-15 4376 1
## 6 Baringo County 2022-08-15 3517 1
## stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## 1 471 34 380 2620
## 2 751 225 1104 3023
## 3 360 48 88 2822
## 4 110 113 53 2465
## 5 114 34 24 1599
## 6 160 86 131 2045
## Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## 1 85 739.0 731
## 2 129 1192.0 1538
## 3 90 658.3 188
## 4 83 176.0 144
## 5 72 212.0 107
## 6 241 507.0 368
## County Code
## 1 30
## 2 30
## 3 30
## 4 30
## 5 30
## 6 30
# Summary statistics of the merged dataset
summary(cema_internship_task_2023)
## county period Total Dewormed Acute Malnutrition
## Length:1410 Min. :2021-01-15 Min. : 97 Min. : 1.0
## Class :character 1st Qu.:2021-08-15 1st Qu.: 2454 1st Qu.: 15.0
## Mode :character Median :2022-03-30 Median : 4564 Median : 39.0
## Mean :2022-03-31 Mean : 11458 Mean : 125.4
## 3rd Qu.:2022-11-15 3rd Qu.: 8222 3rd Qu.: 143.5
## Max. :2023-06-15 Max. :392800 Max. :4123.0
## NA's :355
## stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## Min. : 1.0 Min. : 1.0 Min. : 1.0 Min. : 198
## 1st Qu.: 69.5 1st Qu.: 36.5 1st Qu.: 22.0 1st Qu.: 1464
## Median : 159.0 Median : 84.0 Median : 50.0 Median : 2158
## Mean : 280.2 Mean : 139.8 Mean : 110.8 Mean : 2813
## 3rd Qu.: 328.5 3rd Qu.: 157.0 3rd Qu.: 114.2 3rd Qu.: 3335
## Max. :4398.0 Max. :7900.0 Max. :3169.0 Max. :15795
## NA's :11 NA's :19 NA's :14
## Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## Min. : 6.0 Min. : 16.0 Min. : 1.00
## 1st Qu.: 87.0 1st Qu.: 249.0 1st Qu.: 51.25
## Median : 162.5 Median : 456.0 Median : 120.50
## Mean : 223.5 Mean : 652.3 Mean : 305.74
## 3rd Qu.: 272.8 3rd Qu.: 791.8 3rd Qu.: 311.00
## Max. :1937.0 Max. :5348.0 Max. :4680.00
##
## County Code
## Min. : 1
## 1st Qu.:12
## Median :24
## Mean :24
## 3rd Qu.:36
## Max. :47
##
# Check for NA values in each column of the merged dataset
colSums(is.na(cema_internship_task_2023))
## county period Total Dewormed
## 0 0 0
## Acute Malnutrition stunted 6-23 months stunted 0-<6 months
## 355 11 19
## stunted 24-59 months diarrhoea cases Underweight 0-<6 months
## 14 0 0
## Underweight 6-23 months Underweight 24-59 Months County Code
## 0 0 0
# Check the exact column names in the dataset
names(cema_internship_task_2023)
## [1] "county" "period"
## [3] "Total Dewormed" "Acute Malnutrition"
## [5] "stunted 6-23 months" "stunted 0-<6 months"
## [7] "stunted 24-59 months" "diarrhoea cases"
## [9] "Underweight 0-<6 months" "Underweight 6-23 months"
## [11] "Underweight 24-59 Months" "County Code"
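A merge on names silently produces NA codes when spellings differ between the two files. In the output above the merged “County Code” column has no NA values, so every name matched here, but a check before merging makes that explicit. A minimal sketch on made-up data, where `main_data` and `county_lookup` are hypothetical stand-ins for the two files:

```r
# Toy stand-ins for the main data and the lookup table (hypothetical values)
main_data <- data.frame(
  county = c("Mombasa County", "Kwale county", "Mombasa County"),
  cases = c(10, 20, 30)
)
county_lookup <- data.frame(
  county = c("Mombasa County", "Kwale County"),
  `County Code` = c(1, 2),
  check.names = FALSE
)

# County names in the main data with no exact match in the lookup
unmatched <- setdiff(unique(main_data$county), county_lookup$county)
print(unmatched)

# Left join, then count the rows that did not receive a code
merged <- merge(main_data, county_lookup, by = "county", all.x = TRUE)
n_missing_code <- sum(is.na(merged$`County Code`))
print(n_missing_code)
```

Note that `merge()` sorts the result by the key column by default, which is why the merged rows above appear ordered by county rather than by period.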
Rearranging Columns
We rearrange the columns so that “County Code” comes first.
# Rearrange the columns so that County Code comes first
cema_internship_task_2023 <- cema_internship_task_2023 %>%
  select(`County Code`, county, period, `Total Dewormed`, `Acute Malnutrition`,
         `stunted 6-23 months`, `stunted 0-<6 months`, `stunted 24-59 months`,
         `diarrhoea cases`, `Underweight 0-<6 months`, `Underweight 6-23 months`,
         `Underweight 24-59 Months`)
head(cema_internship_task_2023)
## County Code county period Total Dewormed Acute Malnutrition
## 1 30 Baringo County 2023-01-15 3659 8
## 2 30 Baringo County 2023-03-15 5113 48
## 3 30 Baringo County 2022-04-15 3938 3
## 4 30 Baringo County 2021-08-15 3153 2
## 5 30 Baringo County 2021-02-15 4376 1
## 6 30 Baringo County 2022-08-15 3517 1
## stunted 6-23 months stunted 0-<6 months stunted 24-59 months diarrhoea cases
## 1 471 34 380 2620
## 2 751 225 1104 3023
## 3 360 48 88 2822
## 4 110 113 53 2465
## 5 114 34 24 1599
## 6 160 86 131 2045
## Underweight 0-<6 months Underweight 6-23 months Underweight 24-59 Months
## 1 85 739.0 731
## 2 129 1192.0 1538
## 3 90 658.3 188
## 4 83 176.0 144
## 5 72 212.0 107
## 6 241 507.0 368
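Listing every column by name works, but it has to be updated whenever a column is added or renamed. A shorter alternative, sketched here in base R on a one-row toy data frame, moves one column to the front and keeps the rest in their existing order:

```r
# Toy data frame with the code column last (values mirror the first row above)
toy <- data.frame(
  county = "Baringo County",
  period = as.Date("2023-01-15"),
  `County Code` = 30,
  check.names = FALSE
)

# Put the chosen column first, followed by all remaining columns
front <- "County Code"
reordered <- toy[c(front, setdiff(names(toy), front))]
print(names(reordered))
```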
Data Visualization of the health indicators across counties
Distribution of Total Dewormed across Counties
#Visualize the distribution of "Total Dewormed" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Total Dewormed`)) +
geom_bar(stat = "identity", fill = "blue") +
labs(title = "Distribution of Total Dewormed across Counties",
x = "County Code",
y = "Total Dewormed") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Distribution of Acute Malnutrition across Counties
#Visualize the distribution of "Acute Malnutrition" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Acute Malnutrition`)) +
geom_bar(stat = "identity", fill = "red") +
labs(title = "Distribution of Acute Malnutrition across Counties",
x = "County Code",
y = "Acute Malnutrition") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 355 rows containing missing values (`position_stack()`).
Distribution of stunted 6-23 months across Counties
#Visualize the distribution of "stunted 6-23 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 6-23 months`)) +
geom_bar(stat = "identity", fill = "green") +
labs(title = "Distribution of stunted 6-23 months across Counties",
x = "County Code",
y = "stunted 6-23 months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 11 rows containing missing values (`position_stack()`).
Distribution of stunted 0-<6 months across Counties
#Visualize the distribution of "stunted 0-<6 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 0-<6 months`)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Distribution of stunted 0-<6 months across Counties",
x = "County Code",
y = "stunted 0-<6 months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 19 rows containing missing values (`position_stack()`).
Distribution of stunted 24-59 months across Counties
#Visualize the distribution of "stunted 24-59 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `stunted 24-59 months`)) +
geom_bar(stat = "identity", fill = "orange") +
labs(title = "Distribution of stunted 24-59 months across Counties",
x = "County Code",
y = "stunted 24-59 months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 14 rows containing missing values (`position_stack()`).
Distribution of diarrhoea cases across counties
#Visualize the distribution of "diarrhoea cases" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `diarrhoea cases`)) +
geom_bar(stat = "identity", fill = "brown") +
labs(title = "Distribution of Diarrhoea Cases across Counties",
x = "County Code",
y = "Diarrhoea Cases") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Distribution of “Underweight 0-<6 months” across counties
#Visualize the distribution of "Underweight 0-<6 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 0-<6 months`)) +
geom_bar(stat = "identity", fill = "pink") +
labs(title = "Distribution of Underweight 0-<6 months across Counties",
x = "County Code",
y = "Underweight 0-<6 months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Distribution of “Underweight 6-23 months” across counties
#Visualize the distribution of "Underweight 6-23 months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 6-23 months`)) +
geom_bar(stat = "identity", fill = "gray") +
labs(title = "Distribution of Underweight 6-23 months across Counties",
x = "County Code",
y = "Underweight 6-23 months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Distribution of “Underweight 24-59 Months” across counties
#Visualize the distribution of "Underweight 24-59 Months" across counties
ggplot(cema_internship_task_2023, aes(x = `County Code`, y = `Underweight 24-59 Months`)) +
geom_bar(stat = "identity", fill = "cyan") +
labs(title = "Distribution of Underweight 24-59 Months across Counties",
x = "County Code",
y = "Underweight 24-59 Months") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
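Because the data has one row per county per month, `geom_bar(stat = "identity")` stacks all monthly values for a county, so each bar in the charts above shows the total over the whole period. The sketch below makes that aggregation explicit and orders the bars by size; it uses made-up data, and the column name `total_dewormed` is a placeholder for any of the indicators.

```r
library(ggplot2)

# Toy monthly data for three counties (values are made up)
toy <- data.frame(
  county = rep(c("A County", "B County", "C County"), each = 2),
  total_dewormed = c(100, 150, 400, 300, 50, 70)
)

# Sum the monthly values per county before plotting
county_totals <- aggregate(total_dewormed ~ county, data = toy, FUN = sum)

# One bar per county, ordered by total, with horizontal bars for readable labels
p <- ggplot(county_totals,
            aes(x = reorder(county, total_dewormed), y = total_dewormed)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Total Dewormed by County (toy data)",
       x = "County",
       y = "Total Dewormed")
print(p)
```

Plotting county names instead of the numeric code also avoids a continuous x-axis, on which individual counties are hard to identify.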
# ANOVA for "Total Dewormed" on county-level means using rstatix
# (wrapped in invisible() so that only the printed table is shown)
invisible({
  # Group by County Code and calculate the mean Total Dewormed per county
  anova_results <- cema_internship_task_2023 %>%
    group_by(`County Code`) %>%
    summarise(Total_Dewormed_Mean = mean(`Total Dewormed`, na.rm = TRUE)) %>%
    anova_test(Total_Dewormed_Mean ~ `County Code`)
  # View ANOVA results
  print(anova_results)
})
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 45 1.364 0.249 0.029
# Perform ANOVA for each health indicator
anova_results_acute_malnutrition <- cema_internship_task_2023 %>%
anova_test(`Acute Malnutrition` ~ `County Code`)
## Warning: NA detected in rows: 17,28,32,33,34,35,36,37,38,40,41,44,45,47,48,49,50,51,53,54,55,56,57,58,59,60,63,64,65,70,73,75,76,77,78,82,83,84,86,87,88,89,90,91,93,94,96,99,100,101,102,103,111,113,114,116,118,119,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,154,306,314,316,320,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,399,409,421,425,427,428,429,432,433,434,436,437,438,440,441,442,443,445,447,448,449,450,451,452,453,457,458,459,460,461,462,463,464,466,467,468,469,470,471,473,474,475,477,478,479,480,521,524,529,533,534,571,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,854,856,859,864,932,934,935,938,939,941,945,946,947,950,954,958,959,960,991,992,993,994,996,997,998,999,1000,1001,1002,1003,1005,1006,1007,1008,1009,1011,1013,1014,1016,1018,1019,1021,1022,1023,1024,1025,1026,1027,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1039,1040,1042,1043,1044,1046,1047,1048,1049,1050,1091,1093,1094,1095,1105,1108,1114,1115,1116,1119,1120,1121,1123,1124,1125,1127,1130,1131,1133,1134,1135,1136,1137,1138,1140,1142,1143,1144,1145,1147,1150,1152,1153,1154,1155,1156,1158,1160,1161,1164,1165,1166,1167,1168,1169,1203,1204,1221,1227,1237,1244,1245,1249,1252,1291,1292,1293,1294,1295,1296,1297,1299,1300,1301,1302,1303,1304,1305,1306,1307,1309,1310,1311,1312,1313,1314,1315,1316,1317,1318,1319,1320.
## Removing this rows before the analysis.
anova_results_stunted_6_23 <- cema_internship_task_2023 %>%
anova_test(`stunted 6-23 months` ~ `County Code`)
## Warning: NA detected in rows: 32,34,35,44,45,47,54,57,59,357,1140.
## Removing this rows before the analysis.
anova_results_stunted_0_6 <- cema_internship_task_2023 %>%
anova_test(`stunted 0-<6 months` ~ `County Code`)
## Warning: NA detected in rows: 32,33,34,36,39,44,50,57,59,200,204,260,342,357,606,614,621,1356,1360.
## Removing this rows before the analysis.
anova_results_stunted_24_59 <- cema_internship_task_2023 %>%
anova_test(`stunted 24-59 months` ~ `County Code`)
## Warning: NA detected in rows: 41,56,59,159,338,339,606,613,614,617,621,627,630,1237.
## Removing this rows before the analysis.
anova_results_diarrhoea <- cema_internship_task_2023 %>%
anova_test(`diarrhoea cases` ~ `County Code`)
anova_results_underweight_0_6 <- cema_internship_task_2023 %>%
anova_test(`Underweight 0-<6 months` ~ `County Code`)
anova_results_underweight_6_23 <- cema_internship_task_2023 %>%
anova_test(`Underweight 6-23 months` ~ `County Code`)
anova_results_underweight_24_59 <- cema_internship_task_2023 %>%
anova_test(`Underweight 24-59 Months` ~ `County Code`)
# View ANOVA results
print(anova_results_acute_malnutrition)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1053 48.337 6.29e-12 * 0.044
print(anova_results_stunted_6_23)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1397 0.526 0.468 0.000376
print(anova_results_stunted_0_6)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1389 28.736 9.7e-08 * 0.02
print(anova_results_stunted_24_59)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1394 32.898 1.19e-08 * 0.023
print(anova_results_diarrhoea)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1408 4.582 0.032 * 0.003
print(anova_results_underweight_0_6)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1408 31.658 2.21e-08 * 0.022
print(anova_results_underweight_6_23)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1408 13.559 0.00024 * 0.01
print(anova_results_underweight_24_59)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 `County Code` 1 1408 76.72 5.55e-18 * 0.052
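Every table above reports DFn = 1, which is what a single numeric predictor produces; a grouping factor with k levels gives k - 1 numerator degrees of freedom. The sketch below illustrates the difference with base R's `aov()` on simulated data. The three groups and their values are made up, and the same idea should carry over to `anova_test()` by converting `County Code` to a factor first, although that has not been run here.

```r
set.seed(1)

# Simulated data: three "counties" with ten monthly observations each
toy <- data.frame(
  county_code = rep(c(1, 2, 3), each = 10),
  cases = c(rnorm(10, mean = 100, sd = 10),
            rnorm(10, mean = 120, sd = 10),
            rnorm(10, mean = 100, sd = 10))
)

# Numeric code: fits a straight line through the code numbers (1 df)
fit_numeric <- aov(cases ~ county_code, data = toy)

# Factor: compares the three group means (k - 1 = 2 df)
fit_factor <- aov(cases ~ factor(county_code), data = toy)

df_numeric <- summary(fit_numeric)[[1]]$Df
df_factor <- summary(fit_factor)[[1]]$Df
print(df_numeric)
print(df_factor)
```

With the 47 counties in this dataset, a one-way ANOVA on county as a factor would report 46 numerator degrees of freedom.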
Interpretation: Summary of ANOVA Analysis
The ANOVA analysis was conducted to examine the variations in health indicators across different counties in Kenya. The health indicators investigated were “Total Dewormed,” “Acute Malnutrition,” “Stunted 6-23 months,” “Stunted 0-<6 months,” “Stunted 24-59 months,” “Diarrhea cases,” “Underweight 0-<6 months,” “Underweight 6-23 months,” and “Underweight 24-59 Months.”
For “Total Dewormed,” the test was run on one mean value per county and did not find a significant effect of County Code (p = 0.249), so it provides no evidence that mean deworming counts differ systematically by county code.
For “Acute Malnutrition,” the effect of County Code was significant (p = 6.29e-12), indicating that acute malnutrition counts in children under 5 years differ between counties.
Regarding stunting, “Stunted 0-<6 months” also showed a significant effect (p = 9.7e-08), suggesting that stunting among infants under 6 months varies between counties.
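For “Stunted 6-23 months,” the analysis did not find a significant effect (p = 0.468). “Stunted 24-59 months” (p = 1.19e-08) and “Diarrhoea cases” (p = 0.032) were both significant at the 0.05 level, although the effect size for diarrhoea cases is very small (ges = 0.003).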
For underweight children, “Underweight 0-<6 months” and “Underweight 24-59 months” exhibited significant differences in mean values among counties (p < 0.05), implying variations in underweight prevalence in these age groups.
“Underweight 6-23 months” also showed a significant effect (p = 0.00024), though with a small effect size (ges = 0.01).
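Overall, the ANOVA output points to differences in most health indicators across counties, which can help inform targeted interventions and policies for children under 5 years in different regions of the country. Two caveats apply. Every table reports DFn = 1, which shows that County Code entered the model as a single numeric predictor rather than as a 47-level grouping factor, and the effect sizes are small (ges of at most 0.052). The results are therefore better read as a weak linear association with the county code number than as a comparison of county means.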
Trend Analysis (Time Series)
# Select the relevant columns for trend analysis (e.g., "period" and "Total Dewormed")
trend_data <- cema_internship_task_2023 %>%
select(period, `Total Dewormed`)
# Calculate the mean Total Dewormed for each period
mean_dewormed_by_period <- trend_data %>%
group_by(period) %>%
summarise(mean_dewormed = mean(`Total Dewormed`, na.rm = TRUE))
# Plot the trend of Total Dewormed over time
ggplot(mean_dewormed_by_period, aes(x = period, y = mean_dewormed)) +
geom_line() +
labs(title = "Trend Analysis of Total Dewormed in Kenya",
x = "Period",
y = "Mean Total Dewormed") +
theme_minimal()
Trend Analysis for Acute Malnutrition
# Select the relevant columns for trend analysis (e.g., "period" and "Acute Malnutrition")
trend_data_acute_malnutrition <- cema_internship_task_2023 %>%
select(period, `Acute Malnutrition`)
# Calculate the mean Acute Malnutrition for each period
mean_acute_malnutrition_by_period <- trend_data_acute_malnutrition %>%
group_by(period) %>%
summarise(mean_acute_malnutrition = mean(`Acute Malnutrition`, na.rm = TRUE))
# Plot the trend of Acute Malnutrition over time
ggplot(mean_acute_malnutrition_by_period, aes(x = period, y = mean_acute_malnutrition)) +
geom_line() +
labs(title = "Trend Analysis of Acute Malnutrition in Kenya",
x = "Period",
y = "Mean Acute Malnutrition") +
theme_minimal()
Trend Analysis for Stunted 6-23 months
# Select the relevant columns for trend analysis (e.g., "period" and "stunted 6-23 months")
trend_data_stunted_6_23 <- cema_internship_task_2023 %>%
select(period, `stunted 6-23 months`)
# Calculate the mean stunted 6-23 months for each period
mean_stunted_6_23_by_period <- trend_data_stunted_6_23 %>%
group_by(period) %>%
summarise(mean_stunted_6_23 = mean(`stunted 6-23 months`, na.rm = TRUE))
# Plot the trend of stunted 6-23 months over time
ggplot(mean_stunted_6_23_by_period, aes(x = period, y = mean_stunted_6_23)) +
geom_line() +
labs(title = "Trend Analysis of Stunted 6-23 months in Kenya",
x = "Period",
y = "Mean Stunted 6-23 months") +
theme_minimal()
Trend Analysis for Stunted 0-<6 months
# Select the relevant columns for trend analysis (e.g., "period" and "stunted 0-<6 months")
trend_data_stunted_0_6 <- cema_internship_task_2023 %>%
select(period, `stunted 0-<6 months`)
# Calculate the mean stunted 0-<6 months for each period
mean_stunted_0_6_by_period <- trend_data_stunted_0_6 %>%
group_by(period) %>%
summarise(mean_stunted_0_6 = mean(`stunted 0-<6 months`, na.rm = TRUE))
# Plot the trend of stunted 0-<6 months over time
ggplot(mean_stunted_0_6_by_period, aes(x = period, y = mean_stunted_0_6)) +
geom_line() +
labs(title = "Trend Analysis of Stunted 0-<6 months in Kenya",
x = "Period",
y = "Mean Stunted 0-<6 months") +
theme_minimal()
Trend Analysis for Underweight 6-23 months
# Select the relevant columns for trend analysis (e.g., "period" and "Underweight 6-23 months")
trend_data_underweight_6_23 <- cema_internship_task_2023 %>%
select(period, `Underweight 6-23 months`)
# Calculate the mean Underweight 6-23 months for each period
mean_underweight_6_23_by_period <- trend_data_underweight_6_23 %>%
group_by(period) %>%
summarise(mean_underweight_6_23 = mean(`Underweight 6-23 months`, na.rm = TRUE))
# Plot the trend of Underweight 6-23 months over time
ggplot(mean_underweight_6_23_by_period, aes(x = period, y = mean_underweight_6_23)) +
geom_line() +
labs(title = "Trend Analysis of Underweight 6-23 months in Kenya",
x = "Period",
y = "Mean Underweight 6-23 months") +
theme_minimal()
Trend Analysis for Underweight 24-59 Months
# Select the relevant columns for trend analysis (e.g., "period" and "Underweight 24-59 Months")
trend_data_underweight_24_59 <- cema_internship_task_2023 %>%
select(period, `Underweight 24-59 Months`)
# Calculate the mean Underweight 24-59 Months for each period
mean_underweight_24_59_by_period <- trend_data_underweight_24_59 %>%
group_by(period) %>%
summarise(mean_underweight_24_59 = mean(`Underweight 24-59 Months`, na.rm = TRUE))
# Plot the trend of Underweight 24-59 Months over time
ggplot(mean_underweight_24_59_by_period, aes(x = period, y = mean_underweight_24_59)) +
geom_line() +
labs(title = "Trend Analysis of Underweight 24-59 Months in Kenya",
x = "Period",
y = "Mean Underweight 24-59 Months") +
theme_minimal()
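The blocks above repeat the same select, group, and summarise steps for each indicator. A more compact sketch computes the monthly mean of every indicator in one call with base R's `aggregate()`. The `toy` data frame is made up, and only its column names follow the dataset.

```r
# Toy data: two periods, two indicators (values are made up)
toy <- data.frame(
  period = as.Date(c("2021-01-15", "2021-01-15", "2021-02-15", "2021-02-15")),
  `Total Dewormed` = c(100, 300, 200, 400),
  `Acute Malnutrition` = c(10, NA, 30, 50),
  check.names = FALSE
)

# Every column except the date is treated as an indicator
indicator_cols <- setdiff(names(toy), "period")

# Mean of each indicator per period, ignoring missing values
monthly_means <- aggregate(
  toy[indicator_cols],
  by = list(period = toy$period),
  FUN = mean,
  na.rm = TRUE
)
print(monthly_means)
```

The resulting table has one row per period and one column per indicator, so it can feed the same `ggplot()` line charts without a separate intermediate object for each indicator.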
Interpretation
The trend analysis of the health indicators for children under 5 years in Kenya reveals concerning findings. Most indicators, namely “Total Dewormed,” “Acute Malnutrition,” “Stunted 6-23 months,” “Stunted 24-59 months,” “Diarrhoea cases,” “Underweight 0-<6 months,” “Underweight 6-23 months,” and “Underweight 24-59 Months,” show an upward trend in their monthly mean from January 2021 to June 2023 (trend plots for “Stunted 24-59 months,” “Diarrhoea cases,” and “Underweight 0-<6 months” are not included above). For the malnutrition, stunting, underweight, and diarrhoea indicators, this rise is a cause for concern, as it may indicate a deterioration in the health status of young children across counties. A rise in “Total Dewormed,” by contrast, reflects more children receiving an intervention rather than worse health.
The “Stunted 0-<6 months” indicator behaves differently. Its monthly mean stayed relatively steady for most of the period and then spiked sharply in 2023. This sudden increase demands attention and further investigation to understand the underlying factors.
The rising trends in most health indicators highlight the importance of implementing targeted interventions and policy measures to address the health challenges faced by young children in Kenya. The findings from this exploratory data analysis can serve as valuable insights for policymakers, health practitioners, and stakeholders to design and implement effective strategies to improve the health and well-being of children under 5 years in the country.
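The trends above are read off the plots by eye. One simple way to put a number on them, sketched here under the assumption that a straight line is a reasonable first summary, is to regress the monthly mean on a month index and inspect the slope. The `monthly` data frame is made up and stands in for any of the per-period mean tables.

```r
# Toy monthly means for one indicator (values are made up)
monthly <- data.frame(
  period = seq(as.Date("2021-01-15"), by = "month", length.out = 6),
  mean_value = c(10, 12, 14, 16, 18, 20)
)

# Month index 1, 2, 3, ... so the slope reads as "change per month"
monthly$month_index <- seq_len(nrow(monthly))

# Least-squares line through the monthly means
fit <- lm(mean_value ~ month_index, data = monthly)
slope <- unname(coef(fit)["month_index"])
print(slope)
```

A positive slope indicates an average increase per month. Monthly series are usually autocorrelated, so the slope is a descriptive summary here and not a formal test.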