The data set we are working with today is the road accident record by country from 1990 to 2019, published by the Global Burden of Disease Collaborative Network in 2019. It records the total number of deaths from road traffic incidents, including vehicle drivers and passengers, motorcyclists, cyclists and pedestrians. The data set has 6 variables:
* Entity: the name of the country
* Code: the code for each country
* Year: the year of the record
* Deaths: the number of deaths
* Sidedness
* Historical_Population: the total population
To analyse this data set, we are going to compare how road traffic deaths evolved over the years in five selected African countries: Cameroon, Nigeria, Kenya, Cote d'Ivoire and Senegal.
Load the packages and the data set
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 8010 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (4): Year, Deaths, Sidedness, Historical_Population
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
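The call that produced the column specification above is not shown. As a minimal sketch of how such a file could be read with the column types declared up front (which also quiets the message), assuming a CSV file: the path `csv_path`, the object name `road_deaths` and the two toy rows are hypothetical and only exist so that the snippet runs on its own.

```r
library(readr)

# Hypothetical stand-in for the real file: a tiny CSV with the same six columns.
# The values are made up for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Entity,Code,Year,Deaths,Sidedness,Historical_Population",
  "Kenya,KEN,1990,100,0,1000",
  "Kenya,KEN,1991,,0,1100"
), csv_path)

# Declaring the column types up front avoids the column specification message.
road_deaths <- read_csv(
  csv_path,
  col_types = cols(
    Entity = col_character(),
    Code = col_character(),
    Year = col_double(),
    Deaths = col_double(),
    Sidedness = col_double(),
    Historical_Population = col_double()
  )
)
```

Empty fields, such as the missing `Deaths` value in the second toy row, are read as `NA` by default.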
Let's clean our data set.
Let's remove all the NA values from the variables “Deaths” and “Historical_Population”.
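The cleaning code itself is not displayed. One way this step could look, as a sketch under assumptions: the object names `road_deaths` and `road_deaths_clean` are hypothetical, and the toy values are made up so the example is self-contained.

```r
library(tidyverse)

# Toy data with made-up values; the column names follow the column specification.
road_deaths <- tibble(
  Entity = c("Kenya", "Kenya", "Senegal"),
  Year = c(1990, 1991, 1990),
  Deaths = c(100, NA, 50),
  Historical_Population = c(1000, 1100, NA)
)

# Keep only the rows where both variables are present.
road_deaths_clean <- road_deaths %>%
  filter(!is.na(Deaths), !is.na(Historical_Population))
```

`tidyr::drop_na(Deaths, Historical_Population)` would be an equivalent alternative to the `is.na()` filter.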
Let's create a new table that keeps only those five countries (Cameroon, Nigeria, Kenya, Cote d'Ivoire, Senegal) and, grouped by country, holds the average deaths and the average historical population.
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'Entity'. You can override using the
`.groups` argument.
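The deprecation warning above is what dplyr emits when `summarise()` returns more or fewer than one row per group. A possible way to build the table so that each group yields exactly one row is sketched here. It assumes the grouping is by `Entity` and `Year` and that the summaries are means; the input name `road_deaths_clean`, the vector `countries` and the toy values are hypothetical.

```r
library(tidyverse)

# Toy data with made-up values, including one country outside the selection.
road_deaths_clean <- tibble(
  Entity = c("Kenya", "Kenya", "Senegal", "France"),
  Year = c(1990, 1990, 1990, 1990),
  Deaths = c(100, 200, 50, 10),
  Historical_Population = c(1000, 1000, 500, 900)
)

# The five countries of interest, spelled as in the plot's colour scale.
countries <- c("Cameroon", "Nigeria", "Kenya", "Cote d'Ivoire", "Senegal")

by_death <- road_deaths_clean %>%
  filter(Entity %in% countries) %>%
  group_by(Entity, Year) %>%
  summarise(
    Deaths = mean(Deaths),
    Historical_Population = mean(Historical_Population),
    .groups = "drop" # return an ungrouped table and silence the grouping message
  )
```

Because every summary expression returns a single value, neither the deprecation warning nor the grouping message appears. In this sketch the year column keeps the name `Year` from the column specification.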
ggplot(by_death, aes(year, Entity, colour = Entity)) +
  geom_point(aes(size = Deaths), alpha = 1) +
  scale_color_manual(values = c(
    "Senegal" = "#8a0707",
    "Nigeria" = "#7d1fcf",
    "Kenya" = "#d1135c",
    "Cote d'Ivoire" = "#ff5e00",
    "Cameroon" = "#327829"
  )) +
  scale_size_area() +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Countries",
    size = "Number of deaths",
    caption = "Global Burden of Disease Study 2019",
    title = "Road traffic death evolution per country from 1990 to 2019"
  )
To clean this data set, I used the is.na() function to remove all the NA values present in it. I also filtered the data set to keep only the variables I needed. This visualization represents the evolution of road traffic deaths for the five African countries we selected (Kenya, Cameroon, Cote d'Ivoire, Senegal and Nigeria) from 1990 to 2019. As the visualization shows, the number of road traffic deaths changes over the years in most countries, except in Senegal, where the average number of deaths is quite stable.
For this data set, I wish I could have shown the average deaths and the average historical population at the same time, to compare the evolution of both variables over the years, using a graph with curves such as a line chart.
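The combined view wished for here could be approached by reshaping the table to long format and drawing one line per country in two stacked panels. This is only a sketch: the column names of `by_death` are assumed to be `Entity`, `Year`, `Deaths` and `Historical_Population`, the names `by_death_long` and `trend_plot` are hypothetical, and the toy values are made up.

```r
library(tidyverse)

# Toy summary table with made-up values for two countries.
by_death <- tibble(
  Entity = rep(c("Kenya", "Senegal"), each = 3),
  Year = rep(c(1990, 1991, 1992), times = 2),
  Deaths = c(100, 120, 150, 50, 50, 51),
  Historical_Population = c(1000, 1100, 1200, 500, 520, 540)
)

# One row per country, year and variable, so both variables can share a plot.
by_death_long <- by_death %>%
  pivot_longer(
    cols = c(Deaths, Historical_Population),
    names_to = "variable",
    values_to = "value"
  )

# One panel per variable; free y scales because population is far larger than deaths.
trend_plot <- ggplot(by_death_long, aes(Year, value, colour = Entity)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1) +
  theme_minimal() +
  labs(x = "Year", y = NULL, colour = "Country")

trend_plot
```

Stacked panels with independent y axes are used instead of a second axis because deaths and population differ by several orders of magnitude, which would flatten the deaths curve on a shared scale.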