According to Britannica, the case fatality rate is the proportion of people who die from a specified disease among all individuals diagnosed with the disease over a certain period of time. In this study, the case fatality rate is used to measure the severity of diseases in teenagers and children resident in Nigeria.
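As a minimal sketch of the definition above, the case fatality rate can be expressed as a percentage of deaths among diagnosed cases. The helper name `case_fatality_rate` and the numbers used are illustrative assumptions, not part of the original analysis.

```r
# Case fatality rate (CFR) as a percentage: deaths among diagnosed cases.
# `case_fatality_rate` is a hypothetical helper introduced for illustration.
case_fatality_rate <- function(deaths, cases) {
  # Guard against impossible inputs before dividing
  stopifnot(all(cases > 0), all(deaths >= 0), all(deaths <= cases))
  100 * deaths / cases
}

# Illustrative numbers only: 45 deaths among 100 diagnosed cases
example_cfr <- case_fatality_rate(deaths = 45, cases = 100)
print(example_cfr)  # 45
```

The function is vectorised, so it could also take one pair of counts per disease.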
The objective is to determine the diseases that are most dangerous to teenagers and children.
The Disease Outbreak in Nigeria Datasets is a public dataset simulated and collated by Emmanuel Odelami. It is auto-generated, based on the most common and deadly disease outbreaks in Nigeria. The dataset contains disease reports from 2009 to 2018, covering all 36 states and both urban and rural settlements, and records the age and disease for each report.
The packages used in the analysis are loaded first.
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(readr)
## Warning: package 'readr' was built under R version 4.0.5
library(ggplot2)
disease_outbreak <- read.csv("meningitis_dataset.csv")
The data is explored to get a grasp of the structure of the dataset.
tibble(disease_outbreak)
## # A tibble: 284,484 x 40
## id surname firstn~1 middl~2 gender gende~3 gende~4 state settl~5 rural~6
## <int> <chr> <chr> <chr> <chr> <int> <int> <chr> <chr> <int>
## 1 1 Solade Grace Solape Female 0 1 Rive~ Rural 1
## 2 2 Eneche Kure Balogun Male 1 0 Ebon~ Rural 1
## 3 3 Sanusi Adaugo Kateri~ Female 0 1 Ogun Urban 0
## 4 4 Sowore Mooslem~ Ifedayo Female 0 1 Ondo Rural 1
## 5 5 Abdusalam Yusuf Okafor Male 1 0 Oyo Urban 0
## 6 6 Yakubu Janet Chioma Female 0 1 Kadu~ Rural 1
## 7 7 Razak Adaugo Adaobi Female 0 1 Tara~ Rural 1
## 8 8 Annakyi Danmbaz~ Osagie Male 1 0 Kats~ Rural 1
## 9 9 Adejoro Iyin Osatim~ Male 1 0 Kats~ Rural 1
## 10 10 Okorie Adaugo Chika Female 0 1 Osun Urban 0
## # ... with 284,474 more rows, 30 more variables: urban_settlement <int>,
## # report_date <chr>, report_year <int>, age <int>, age_str <chr>,
## # date_of_birth <chr>, child_group <int>, adult_group <int>, disease <chr>,
## # cholera <int>, diarrhoea <int>, measles <int>,
## # viral_haemmorrhaphic_fever <int>, meningitis <int>, ebola <int>,
## # marburg_virus <int>, yellow_fever <int>, rubella_mars <int>, malaria <int>,
## # serotype <chr>, NmA <int>, NmC <int>, NmW <int>, health_status <chr>, ...
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
colnames(disease_outbreak)
## [1] "id" "surname"
## [3] "firstname" "middlename"
## [5] "gender" "gender_male"
## [7] "gender_female" "state"
## [9] "settlement" "rural_settlement"
## [11] "urban_settlement" "report_date"
## [13] "report_year" "age"
## [15] "age_str" "date_of_birth"
## [17] "child_group" "adult_group"
## [19] "disease" "cholera"
## [21] "diarrhoea" "measles"
## [23] "viral_haemmorrhaphic_fever" "meningitis"
## [25] "ebola" "marburg_virus"
## [27] "yellow_fever" "rubella_mars"
## [29] "malaria" "serotype"
## [31] "NmA" "NmC"
## [33] "NmW" "health_status"
## [35] "alive" "dead"
## [37] "report_outcome" "unconfirmed"
## [39] "confirmed" "null_serotype"
disease_outbreak <- distinct(disease_outbreak)
tibble(disease_outbreak)
## # A tibble: 284,484 x 40
## id surname firstn~1 middl~2 gender gende~3 gende~4 state settl~5 rural~6
## <int> <chr> <chr> <chr> <chr> <int> <int> <chr> <chr> <int>
## 1 1 Solade Grace Solape Female 0 1 Rive~ Rural 1
## 2 2 Eneche Kure Balogun Male 1 0 Ebon~ Rural 1
## 3 3 Sanusi Adaugo Kateri~ Female 0 1 Ogun Urban 0
## 4 4 Sowore Mooslem~ Ifedayo Female 0 1 Ondo Rural 1
## 5 5 Abdusalam Yusuf Okafor Male 1 0 Oyo Urban 0
## 6 6 Yakubu Janet Chioma Female 0 1 Kadu~ Rural 1
## 7 7 Razak Adaugo Adaobi Female 0 1 Tara~ Rural 1
## 8 8 Annakyi Danmbaz~ Osagie Male 1 0 Kats~ Rural 1
## 9 9 Adejoro Iyin Osatim~ Male 1 0 Kats~ Rural 1
## 10 10 Okorie Adaugo Chika Female 0 1 Osun Urban 0
## # ... with 284,474 more rows, 30 more variables: urban_settlement <int>,
## # report_date <chr>, report_year <int>, age <int>, age_str <chr>,
## # date_of_birth <chr>, child_group <int>, adult_group <int>, disease <chr>,
## # cholera <int>, diarrhoea <int>, measles <int>,
## # viral_haemmorrhaphic_fever <int>, meningitis <int>, ebola <int>,
## # marburg_virus <int>, yellow_fever <int>, rubella_mars <int>, malaria <int>,
## # serotype <chr>, NmA <int>, NmC <int>, NmW <int>, health_status <chr>, ...
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
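The row count is the same (284,484) before and after distinct(), which suggests the dataset contains no fully duplicated rows. The check can be made explicit by comparing row counts and looking for repeated ids. The sketch below does this on a made-up toy data frame; the `toy` records and the variable names are assumptions for illustration only.

```r
library(dplyr)

# Toy records (made up for illustration); row 3 repeats row 1 exactly
toy <- data.frame(
  id = c(1, 2, 1, 3),
  disease = c("Cholera", "Malaria", "Cholera", "Measles"),
  stringsAsFactors = FALSE
)

# Compare row counts before and after removing exact duplicates
rows_before <- nrow(toy)
toy_unique <- distinct(toy)
rows_removed <- rows_before - nrow(toy_unique)

# Count ids that still appear more than once after deduplication
repeated_ids <- sum(duplicated(toy_unique$id))

print(rows_removed)
print(repeated_ids)
```

A non-zero `repeated_ids` would indicate reports that share an id but differ in some other column, which distinct() alone would not remove.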
The dataset contains a lot of redundant information, so the select() and filter() functions are used to reduce it: only the columns needed for the analysis are kept, and only confirmed reports for teenagers and children (under 20 years of age) are included.
disease_outbreak <- disease_outbreak %>%
  select(id, firstname, surname, gender, age, disease, settlement, state, report_year, health_status, report_outcome) %>%
  filter(age < 20) %>%
  filter(report_outcome == "Confirmed" | report_outcome == "confirmed")
tibble(disease_outbreak)
## # A tibble: 45,100 x 11
## id firstname surname gender age disease settl~1 state repor~2 healt~3
## <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> <chr>
## 1 5 Yusuf Abdusal~ Male 9 Rubell~ Urban Oyo 2017 Alive
## 2 10 Adaugo Okorie Female 15 Marbur~ Urban Osun 2014 Alive
## 3 13 Christopher Folawiyo Male 4 Measles Urban Adam~ 2012 Dead
## 4 22 Adaugo Eleojo Female 14 Malaria Urban Rive~ 2011 Alive
## 5 28 Jane Egar Female 7 Viral ~ Urban Kwara 2016 Dead
## 6 30 Caroline Isa Female 7 Marbur~ Urban Yobe 2014 Alive
## 7 33 Paulina Igbonubi Female 2 Viral ~ Urban Kogi 2017 Alive
## 8 42 Alexandria Quayum Female 17 Malaria Urban Oyo 2011 Alive
## 9 45 Christianah Chima Female 3 Yellow~ Rural Osun 2012 Dead
## 10 56 Alexandria Ileri Female 5 Rubell~ Urban Jiga~ 2017 Alive
## # ... with 45,090 more rows, 1 more variable: report_outcome <chr>, and
## # abbreviated variable names 1: settlement, 2: report_year, 3: health_status
## # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
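The filter above accepts two capitalisations of the confirmed outcome. One alternative, sketched here on made-up data, is to normalise the case with tolower() before comparing. Note that this variant also matches other capitalisations such as "CONFIRMED", so it is slightly more permissive than the original filter. The `toy` values, including "Not Confirmed", are hypothetical and may not match the labels in the real dataset.

```r
library(dplyr)

# Made-up reports for illustration; the outcome labels are assumptions
toy <- data.frame(
  age = c(9, 25, 15, 4, 19),
  report_outcome = c("Confirmed", "Confirmed", "confirmed",
                     "Not Confirmed", "CONFIRMED"),
  stringsAsFactors = FALSE
)

# Keep under-20s whose outcome is "confirmed" in any capitalisation
young_confirmed <- toy %>%
  filter(age < 20, tolower(report_outcome) == "confirmed")

print(young_confirmed)
```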
Here the dataset is split by settlement type, urban and rural.
#Urban Vs Rural
urban <- disease_outbreak %>%
  filter(settlement == "Urban")
rural <- disease_outbreak %>%
  filter(settlement == "Rural")
Finally, the dataset is split by gender.
#Gender
male <- disease_outbreak %>%
  filter(gender == "Male")
female <- disease_outbreak %>%
  filter(gender == "Female")
#Health Status
ggplot(disease_outbreak, aes(x = health_status, fill = health_status)) + geom_bar() + guides(fill = guide_legend(title = "Health Status")) + scale_fill_brewer(palette = "Dark2") + labs(title = "Health Status Case Spread", x = "Health Status", y = "Number of Cases", subtitle = "A bar chart indicating the total number of cases and their health status")
#Gender
ggplot(disease_outbreak, aes(x = gender, fill = gender)) + geom_bar() + scale_fill_brewer(palette = "Dark2") + labs(title = "Gender Case Spread", x = "Gender", y = "Cases", subtitle = "A bar chart depicting the cases separated by gender") + guides(fill = guide_legend(title = "Gender"))
#Settlements
ggplot(disease_outbreak, aes(x = settlement, fill = settlement)) + geom_bar() + scale_fill_brewer(palette = "Dark2") + labs(title = "Settlement Case Spread", x = "Settlement", y = "Cases", subtitle = "A bar chart depicting the cases and settlements") + guides(fill = guide_legend(title = "Settlement"))
#Plotting Yearly Reports
chart <- disease_outbreak %>%
  count(report_year)
ggplot(chart, aes(x = report_year, y = n)) + geom_point()+ geom_smooth() + labs(title = "Years Vs Cases I", x = "Years", y = "Number of Cases", subtitle = "A scatterplot depicting the Yearly Cases")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(disease_outbreak, aes(x = report_year, fill = health_status)) + geom_bar() + labs(title = "Years Vs Cases II", x = "Years", y = "Number of Cases", subtitle = "A stacked bar chart depicting the Yearly Cases and Health Status") + guides(fill = guide_legend(title = "Health Status")) + scale_fill_brewer(palette = "Dark2")
ggplot(disease_outbreak, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + labs(title = "Most Common Diseases", x = "Number of Cases", y = "Disease", subtitle = "A bar chart of the most common diseases for Teenagers and Children") + guides(fill = guide_legend(title = "Disease"))
ggplot(disease_outbreak, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + labs(title = "Diseases Vs Health Status", x = "Number of Cases", y = "Disease", subtitle = "A bar chart depicting the health status of each disease") + guides(fill = guide_legend(title = "Disease")) + facet_wrap(~health_status) + theme(axis.text.x = element_text(angle = 90))
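The charts above show case counts by health status. A case fatality rate per disease can be derived from the same two columns, `disease` and `health_status`. The following is a minimal sketch on made-up records, assuming deaths are labelled "Dead" as in the data preview. The `toy` data and the names `cases`, `deaths` and `cfr_percent` are illustrative, and this is not necessarily how the rates in this report were produced.

```r
library(dplyr)

# Made-up records for illustration; column names mirror the dataset
toy <- data.frame(
  disease = c("Cholera", "Cholera", "Cholera", "Cholera",
              "Malaria", "Malaria", "Malaria", "Malaria", "Malaria"),
  health_status = c("Dead", "Dead", "Alive", "Alive",
                    "Dead", "Alive", "Alive", "Alive", "Alive"),
  stringsAsFactors = FALSE
)

# CFR per disease: deaths divided by all cases, as a percentage
cfr_by_disease <- toy %>%
  group_by(disease) %>%
  summarise(
    cases = n(),
    deaths = sum(health_status == "Dead"),
    cfr_percent = 100 * deaths / cases
  ) %>%
  arrange(desc(cfr_percent))  # highest fatality rate first

print(cfr_by_disease)
```

Replacing `toy` with the filtered `disease_outbreak` data frame would give the ranking for the real data.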
Listed below are the diseases with the highest case fatality rates:
ggplot(male, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + guides(fill = guide_legend(title = "Disease")) + labs(x = "Number of Cases", y = "Disease", title = "Male Disease Reports", subtitle = "A bar chart depicting the reported cases for males", caption = "The cases are split by their health status") + facet_wrap(~health_status)
ggplot(female, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + guides(fill = guide_legend(title = "Disease")) + labs(x = "Number of Cases", y = "Disease", title = "Female Disease Reports", subtitle = "A bar chart depicting the reported cases for females", caption = "The cases are split by their health status") + facet_wrap(~health_status)
ggplot(urban, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + facet_wrap(~health_status) + guides(fill = guide_legend(title = "Disease")) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Urban Disease Reports", x = "Number of Cases", y = "Disease")
ggplot(rural, aes(y = disease, fill = disease)) + geom_bar() + scale_fill_brewer(palette = "RdYlGn") + facet_wrap(~health_status) + guides(fill = guide_legend(title = "Disease")) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Rural Disease Reports", x = "Number of Cases", y = "Disease")
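The gender and settlement charts compare counts visually. The same comparison could be expressed as a case fatality rate per subgroup without building separate data frames. This is a sketch on made-up records; the `toy` data and the summary column names are assumptions for illustration.

```r
library(dplyr)

# Made-up records for illustration; column names mirror the dataset
toy <- data.frame(
  gender = c("Male", "Male", "Male", "Male",
             "Female", "Female", "Female", "Female"),
  settlement = c("Urban", "Urban", "Rural", "Rural",
                 "Urban", "Urban", "Rural", "Rural"),
  health_status = c("Dead", "Alive", "Dead", "Dead",
                    "Alive", "Alive", "Dead", "Alive"),
  stringsAsFactors = FALSE
)

# CFR for every gender and settlement combination
cfr_by_group <- toy %>%
  group_by(gender, settlement) %>%
  summarise(
    cases = n(),
    deaths = sum(health_status == "Dead"),
    cfr_percent = 100 * deaths / cases,
    .groups = "drop"  # return an ungrouped tibble
  )

print(cfr_by_group)
```

Adding `disease` to the group_by() call would break the rates down by disease within each subgroup.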
Diarrhoea was both the most common disease and the most fatal one.
Other diseases also have a case fatality rate greater than 45%, which is far from ideal, so further analysis is required.
That analysis would help create a campaign and strategy to reduce case fatality rates to an acceptable threshold.
Thank you for your time.