Inspired by this Tweet: https://twitter.com/hspter/status/1134098154167496704
I laughed, but I did not think that this was possible. I imagined it was just playing into my acknowledged stereotype that Germany and Switzerland are impressively rule following and have a special relationship with their vehicles.
Did a quick Google search: rates of seat belt use Germany.
Found this: https://www.who.int/violence_injury_prevention/road_safety_status/2013/country_profiles/germany.pdf?ua=1
So a quick scan of the datasheet says that 98% of people in the front seat use seat belts. Seems pretty great, but then I wanted to know how it all compares.
So I decided to do a #чистый_четвер! (That’s a clean Thursday in Russian as a play off of #TidyTuesday)
My libraries:
rr library(tidyverse) library(ggrepel) library(jsonlite)
Grab the JSON from the WHO site: (My first time getting a JSON independently!!!!)
Notes:
fromJSON(txt, simplifyVector = TRUE, simplifyDataFrame = simplifyVector, simplifyMatrix = simplifyVector, flatten = FALSE, …)
rr seatbelt_who <- fromJSON(://apps.who.int/gho/athena/data/GHO/RS_212.json?filter=COUNTRY:;SEATTYPE:)
Take a peak to make sure I imported correctly.
rr str(seatbelt_who)
List of 5
$ copyright: chr \(c) World Health Organization\
$ dataset :'data.frame': 1 obs. of 2 variables:
..$ label : chr \RSGSR\
..$ display: chr \ROAD_SAFETY_GLOBAL_STATUS_REPORT\
$ attribute:'data.frame': 24 obs. of 2 variables:
..$ label : chr [1:24] \DS\ \FIPS\ \IOC\ \ISO2\ ...
..$ display: chr [1:24] \DS\ \FIPS\ \IOC\ \ISO2\ ...
$ dimension:'data.frame': 6 obs. of 4 variables:
..$ label : chr [1:6] \GHO\ \PUBLISHSTATE\ \YEAR\ \REGION\ ...
..$ display : chr [1:6] \Indicator\ \PUBLISH STATES\ \Year\ \WHO region\ ...
..$ isMeasure: logi [1:6] TRUE FALSE FALSE FALSE FALSE FALSE
..$ code :List of 6
.. ..$ :'data.frame': 1 obs. of 5 variables:
.. .. ..$ label : chr \RS_212\
.. .. ..$ display : chr \Seat-belt wearing rate (%)\
.. .. ..$ display_sequence: int 73
.. .. ..$ url : chr \http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=212\
.. .. ..$ attr :List of 1
.. .. .. ..$ :'data.frame': 2 obs. of 2 variables:
.. .. .. .. ..$ category: chr [1:2] \CATEGORY\ \DEFINITION_XML\
.. .. .. .. ..$ value : chr [1:2] \Injuries and violence\ \http://apps.who.int/gho/indicatorregistryservice/publicapiservice.asmx/IndicatorGetAsXml?profileCode=WHO&applic\| __truncated__
.. ..$ :'data.frame': 1 obs. of 5 variables:
.. .. ..$ label : chr \PUBLISHED\
.. .. ..$ display : chr \Published\
.. .. ..$ display_sequence: int 0
.. .. ..$ url : chr \\
.. .. ..$ attr :List of 1
.. .. .. ..$ : list()
.. ..$ :'data.frame': 1 obs. of 5 variables:
.. .. ..$ label : chr \2013\
.. .. ..$ display : chr \2013\
.. .. ..$ display_sequence: int 79867987
.. .. ..$ url : chr \\
.. .. ..$ attr :List of 1
.. .. .. ..$ : list()
.. ..$ :'data.frame': 6 obs. of 5 variables:
.. .. ..$ label : chr [1:6] \AFR\ \AMR\ \SEAR\ \EUR\ ...
.. .. ..$ display : chr [1:6] \Africa\ \Americas\ \South-East Asia\ \Europe\ ...
.. .. ..$ display_sequence: int [1:6] 10 20 30 40 50 60
.. .. ..$ url : chr [1:6] \\ \\ \\ \\ ...
.. .. ..$ attr :List of 6
.. .. .. ..$ : list()
.. .. .. ..$ : list()
.. .. .. ..$ : list()
.. .. .. ..$ : list()
.. .. .. ..$ : list()
.. .. .. ..$ : list()
.. ..$ :'data.frame': 179 obs. of 5 variables:
.. .. ..$ label : chr [1:179] \AFG\ \ALB\ \DZA\ \AND\ ...
.. .. ..$ display : chr [1:179] \Afghanistan\ \Albania\ \Algeria\ \Andorra\ ...
.. .. ..$ display_sequence: int [1:179] 10 20 30 40 50 60 70 80 90 100 ...
.. .. ..$ url : chr [1:179] \\ \\ \\ \\ ...
.. .. ..$ attr :List of 179
.. .. .. ..$ :'data.frame': 22 obs. of 2 variables:
.. .. .. .. ..$ category: chr [1:22] \WORLD_BANK_INCOME_GROUP_GNI_REFERENCE_YEAR\ \WORLD_BANK_INCOME_GROUP_RELEASE_DATE\ \WHO_REGION\ \WORLD_BANK_INCOME_GROUP\ ...
.. .. .. .. ..$ value : chr [1:22] \2017\ \2018\ \Eastern Mediterranean\ \Low income\ ...
.. .. .. ..$ :'data.frame': 22 obs. of 2 variables:
.. .. .. .. ..$ category: chr [1:22] \WORLD_BANK_INCOME_GROUP_GNI_REFERENCE_YEAR\ \WORLD_BANK_INCOME_GROUP_RELEASE_DATE\ \WHO_REGION\ \WORLD_BANK_INCOME_GROUP\ ...
.. .. .. .. ..$ value : chr [1:22] \2017\ \2018\ \Europe\ \Upper middle income\ ...
.. .. .. ..$ :'data.frame': 22 obs. of 2 variables:
.. .. .. .. ..$ category: chr [1:22] \WORLD_BANK_INCOME_GROUP_GNI_REFERENCE_YEAR\ \WORLD_BANK_INCOME_GROUP_RELEASE_DATE\ \LAND_AREA_KMSQ_2012\ \LANGUAGES_EN_2012\ ...
.. .. .. .. ..$ value : chr [1:22] \2017\ \2018\ \2
Goodness that looks crazy. Seems like there are nested datasets. I’ll try the the simplifed Json to see if I can get started faster.
rr simple_sb <- fromJSON(://apps.who.int/gho/athena/data/GHO/RS_212.json?profile=simple&filter=COUNTRY:;SEATTYPE:)
Check that out.
rr str(simple_sb)
List of 2
$ dimension:'data.frame': 6 obs. of 2 variables:
..$ label : chr [1:6] \GHO\ \PUBLISHSTATE\ \YEAR\ \REGION\ ...
..$ display: chr [1:6] \Indicator\ \PUBLISH STATES\ \Year\ \WHO region\ ...
$ fact :'data.frame': 716 obs. of 3 variables:
..$ dim :'data.frame': 716 obs. of 6 variables:
.. ..$ PUBLISHSTATE: chr [1:716] \Published\ \Published\ \Published\ \Published\ ...
.. ..$ COUNTRY : chr [1:716] \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
.. ..$ YEAR : chr [1:716] \2013\ \2013\ \2013\ \2013\ ...
.. ..$ SEATTYPE : chr [1:716] \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
.. ..$ REGION : chr [1:716] \Africa\ \Europe\ \Americas\ \Africa\ ...
.. ..$ GHO : chr [1:716] \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
..$ Value : chr [1:716] \0.2\ \3.1\ \3.4\ \4.9\ ...
..$ Comments: chr [1:716] NA NA NA NA ...
AHHH! I just want to do some stuff.
rr str(simple_sb$fact)
'data.frame': 716 obs. of 3 variables:
$ dim :'data.frame': 716 obs. of 6 variables:
..$ PUBLISHSTATE: chr \Published\ \Published\ \Published\ \Published\ ...
..$ COUNTRY : chr \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
..$ YEAR : chr \2013\ \2013\ \2013\ \2013\ ...
..$ SEATTYPE : chr \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
..$ REGION : chr \Africa\ \Europe\ \Americas\ \Africa\ ...
..$ GHO : chr \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
$ Value : chr \0.2\ \3.1\ \3.4\ \4.9\ ...
$ Comments: chr NA NA NA NA ...
Omg that worked.
rr sb_facts <- simple_sb$fact str(sb_facts)
'data.frame': 716 obs. of 3 variables:
$ dim :'data.frame': 716 obs. of 6 variables:
..$ PUBLISHSTATE: chr \Published\ \Published\ \Published\ \Published\ ...
..$ COUNTRY : chr \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
..$ YEAR : chr \2013\ \2013\ \2013\ \2013\ ...
..$ SEATTYPE : chr \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
..$ REGION : chr \Africa\ \Europe\ \Americas\ \Africa\ ...
..$ GHO : chr \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
$ Value : chr \0.2\ \3.1\ \3.4\ \4.9\ ...
$ Comments: chr NA NA NA NA ...
Hyperventilating from excitment!!!
names(sb_facts)
[1] "dim" "Value" "Comments"
Very confused because when I look at the dataframe it seems to have many columns but only these list and then the data are described as having 3 variables. I think I am missing something about how the data are organized. Gonna try to clean some of it up with janitor.
rr library(janitor) clean_names(sb_facts) r names(sb_facts)
[1] \dim\ \Value\ \Comments\
Still not what I expected. I did see something about simplify and flatten. Maybe that means something I want.
rr seatbelt_flatten <- fromJSON(://apps.who.int/gho/athena/data/GHO/RS_212.json?profile=simple&filter=COUNTRY:;SEATTYPE:, flatten = TRUE)
str(seatbelt_flatten$fact)
'data.frame': 716 obs. of 8 variables:
$ Value : chr \0.2\ \3.1\ \3.4\ \4.9\ ...
$ Comments : chr NA NA NA NA ...
$ dim.PUBLISHSTATE: chr \Published\ \Published\ \Published\ \Published\ ...
$ dim.COUNTRY : chr \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
$ dim.YEAR : chr \2013\ \2013\ \2013\ \2013\ ...
$ dim.SEATTYPE : chr \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
$ dim.REGION : chr \Africa\ \Europe\ \Americas\ \Africa\ ...
$ dim.GHO : chr \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
Ok still a bit confused but seemingly having 2 diminesion but better. Seems like I have 8 variables now in $fact part. Going to rename to make easier and attempt to clean.
rr seatbelt_1 <- seatbelt_flatten$fact
str(seatbelt_1)
'data.frame': 716 obs. of 8 variables:
$ Value : chr \0.2\ \3.1\ \3.4\ \4.9\ ...
$ Comments : chr NA NA NA NA ...
$ dim.PUBLISHSTATE: chr \Published\ \Published\ \Published\ \Published\ ...
$ dim.COUNTRY : chr \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
$ dim.YEAR : chr \2013\ \2013\ \2013\ \2013\ ...
$ dim.SEATTYPE : chr \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
$ dim.REGION : chr \Africa\ \Europe\ \Americas\ \Africa\ ...
$ dim.GHO : chr \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
rr seatbelt_1 <- clean_names(seatbelt_1)
rr colnames(seatbelt_1)
[1] \value\ \comments\
[3] \dim_publishstate\ \dim_country\
[5] \dim_year\ \dim_seattype\
[7] \dim_region\ \dim_gho\
Great. Names look better. But everything looks like a chr. I am going to coerece some cols. … I fell down a rabbit hole. I read some stuff on stackoverflow https://stackoverflow.com/questions/22772279/converting-multiple-columns-from-character-to-numeric-format-in-r
But also as totally unrelated to this analysis commentary. I like stackoverflow but I get anxious reading the comments sometime. The comments can be sooo snippy: https://stackoverflow.com/a/53857448/11484875 . Doesn’t seem very welcoming and I don’t have the confidence to post there, yet. I am so thankful for everyone that does though!
Ok back to analysis
This looks promising: df %>% mutate_at(‘x1’,as.numeric) %>% str()
Also lots about lapply
data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE)) #change all vars to their best fitting data type
rr lapply(seatbelt_1, class)
$value
[1] \character\
$comments
[1] \character\
$dim_publishstate
[1] \character\
$dim_country
[1] \character\
$dim_year
[1] \character\
$dim_seattype
[1] \character\
$dim_region
[1] \character\
$dim_gho
[1] \character\
rr seatbelt_convert_1 <- lapply(seatbelt_1, function(x) type.convert(as.character(x), as.is = TRUE)) #change all vars to their best fitting data type str(seatbelt_convert_1)
List of 8
$ value : chr [1:716] \0.2\ \3.1\ \3.4\ \4.9\ ...
$ comments : chr [1:716] NA NA NA NA ...
$ dim_publishstate: chr [1:716] \Published\ \Published\ \Published\ \Published\ ...
$ dim_country : chr [1:716] \Mauritius\ \Serbia\ \Ecuador\ \Ghana\ ...
$ dim_year : int [1:716] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
$ dim_seattype : chr [1:716] \Rear seat\ \Rear seat\ \Rear seat\ \Front seat\ ...
$ dim_region : chr [1:716] \Africa\ \Europe\ \Americas\ \Africa\ ...
$ dim_gho : chr [1:716] \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ \Seat-belt wearing rate (%)\ ...
Nope
Try again # by specific columns: df %>% mutate_at(vars(x, y, z), ~as.numeric(as.character(.)))
seatbelt_convert_2 <- seatbelt_1 %>% mutate_at(vars(1,5), ~as.numeric(as.character(.))) %>% mutate_at(vars(3, 6, 7, 8), ~as.factor(as.character(.)))
NAs introduced by coercion
str(seatbelt_convert_2)
'data.frame': 716 obs. of 8 variables:
$ value : num 0.2 3.1 3.4 4.9 12.5 12.6 13.4 13.7 17.5 17.6 ...
$ comments : chr NA NA NA NA ...
$ dim_publishstate: Factor w/ 1 level "Published": 1 1 1 1 1 1 1 1 1 1 ...
$ dim_country : chr "Mauritius" "Serbia" "Ecuador" "Ghana" ...
$ dim_year : num 2013 2013 2013 2013 2013 ...
$ dim_seattype : Factor w/ 4 levels "All occupants",..: 4 4 4 3 4 4 4 2 4 2 ...
$ dim_region : Factor w/ 6 levels "Africa","Americas",..: 1 4 2 1 6 2 4 3 4 1 ...
$ dim_gho : Factor w/ 1 level "Seat-belt wearing rate (%)": 1 1 1 1 1 1 1 1 1 1 ...
Looks like that worked!!!
Okay these seems great. I don’t quite understand some of the piping behavior, but moving forward!!!
Time to plot!!! Lets look at counties in europe. still a lot, so just front seat rates and then exclude ones with NA
europe_seatbelt <- filter(seatbelt_convert_2, dim_region=="Europe")
europe_seatbelt_front <- filter(europe_seatbelt, dim_seattype =="Front seat")
europe_seatbelt_final <- filter(europe_seatbelt_front, !is.na(value))
europe_graph <- ggplot(europe_seatbelt_final) + aes(x = reorder(dim_country, value)) + aes(y = value) + geom_point()
europe_graph
I am so pleased!! I did not know about reorder until now!!!! I knew it must exist but hadn’t used it!
titles <- labs(title = "Who has their seatbelt on in Europe? (2013)", subtitle = "Exploring seatbelt use after falling into a Tweethole by @MaraAlexeev", x = "", y = "Front Seatbelt Use % -- Red line is 95%", caption = "Data from WHO")
europe_graph + titles
Ok to fix the country names and add color
europe_graph + titles + coord_flip() + scale_color_gradient() +
theme_classic() + geom_hline(yintercept=95, linetype="dashed", color = "red") + scale_y_continuous(position = "right") + scale_x_discrete(position = "top")
Omg. THe data looks like a buckled seat belt!!!!!!!!