In this assignment, we will practice collaborating on a code project as a class using GitHub.

Using several tidyverse packages and the bad-drivers dataset from fivethirtyeight.com, I’m going to create a programming sample “vignette” that demonstrates how to use the readr, dplyr, and ggplot2 packages with the bad-drivers data.

Load the tidyverse

library(tidyverse)
## -- Attaching packages ----------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0       v purrr   0.2.5  
## v tibble  2.1.1       v dplyr   0.8.0.1
## v tidyr   0.8.2       v stringr 1.3.1  
## v readr   1.3.1       v forcats 0.3.0
## -- Conflicts -------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
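The conflict messages above mean that, once the tidyverse is attached, a bare `filter()` or `lag()` call resolves to the dplyr version. The following minimal sketch shows both meanings side by side. It uses R's built-in `mtcars` data and a toy numeric vector rather than the bad-drivers data, and the object names `mtcars_subset` and `moving_avg` are illustrative only.

```r
library(dplyr)

# With dplyr attached, a bare filter() is dplyr::filter():
# keep the rows of a data frame that satisfy a condition.
mtcars_subset <- filter(mtcars, cyl == 4)

# stats::filter() (linear filtering, here a centred 3-point moving average)
# is still available through an explicit namespace prefix.
moving_avg <- stats::filter(1:10, rep(1/3, 3), sides = 2)
```

Writing `stats::filter()` or `dplyr::filter()` explicitly avoids any ambiguity when both packages are loaded.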

Using readr to read data from a CSV file

drivers <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   `Number of drivers involved in fatal collisions per billion miles` = col_double(),
##   `Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding` = col_double(),
##   `Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired` = col_double(),
##   `Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted` = col_double(),
##   `Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents` = col_double(),
##   `Car Insurance Premiums ($)` = col_double(),
##   `Losses incurred by insurance companies for collisions per insured driver ($)` = col_double()
## )
head(drivers)
## # A tibble: 6 x 8
##   State `Number of driv~ `Percentage Of ~ `Percentage Of ~ `Percentage Of ~
##   <chr>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 Alab~             18.8               39               30               96
## 2 Alas~             18.1               41               25               90
## 3 Ariz~             18.6               35               28               84
## 4 Arka~             22.4               18               26               94
## 5 Cali~             12                 35               28               91
## 6 Colo~             13.6               37               28               79
## # ... with 3 more variables: `Percentage Of Drivers Involved In Fatal
## #   Collisions Who Had Not Been Involved In Any Previous Accidents` <dbl>,
## #   `Car Insurance Premiums ($)` <dbl>, `Losses incurred by insurance
## #   companies for collisions per insured driver ($)` <dbl>
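Two readr/dplyr refinements can make this step tidier: passing `col_types` to `read_csv()` states the expected types up front (and suppresses the “Parsed with column specification” message), and `dplyr::rename()` replaces the long column names that get truncated in the printed tibble. The sketch below assumes a two-row stand-in file built from values shown by `head(drivers)`, so it runs offline; the same `col_types` argument could be passed to the `read_csv()` call on the GitHub URL. The short name `fatal_per_billion` is a hypothetical choice, not part of the dataset.

```r
library(readr)
library(dplyr)

# Offline stand-in for the remote CSV: a two-row excerpt of values printed by head(drivers)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "State,Number of drivers involved in fatal collisions per billion miles",
  "Alabama,18.8",
  "Alaska,18.1"
), tmp)

# State is text; every other column is read as a double.
# Because col_types is supplied, read_csv() does not print a guessed column specification.
driver_types <- cols(State = col_character(), .default = col_double())
excerpt <- read_csv(tmp, col_types = driver_types)

# Shorten the long column name (hypothetical name) so printed tibbles are not truncated
excerpt <- excerpt %>%
  rename(fatal_per_billion = `Number of drivers involved in fatal collisions per billion miles`)
```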

Using ggplot2 to visualize data, with the pipe operator %>% (re-exported by dplyr from magrittr)

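# Bar chart of car insurance premiums by state, sorted from highest to lowest
drivers %>%
  ggplot(aes(x = reorder(State, -`Car Insurance Premiums ($)`),
             y = `Car Insurance Premiums ($)`, fill = State)) +
  geom_col() +              # equivalent to geom_bar(stat = "identity")
  guides(fill = "none") +   # one colour per state, so a legend would be redundant
  labs(x = "State") +       # replace the default reorder(...) axis title
  theme(axis.text.x = element_text(angle = 60, hjust = 1))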
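The plot above sorts the bars with `reorder()` inside `aes()`. The data can also be ranked explicitly with `dplyr::arrange()` before plotting. The following is a minimal sketch under one assumption: it uses a hypothetical six-row tibble `speeding` holding the State and speeding-percentage values printed by `head(drivers)`, rather than the full dataset.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Excerpt of head(drivers): state and percentage of drivers in fatal collisions who were speeding
speeding <- tibble(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  pct_speeding = c(39, 41, 35, 18, 35, 37)
)

# dplyr: rank states from highest to lowest share of speeding drivers
ranked <- speeding %>% arrange(desc(pct_speeding))

# ggplot2: the same reorder() + bar pattern, here with no fill aesthetic
p <- ranked %>%
  ggplot(aes(x = reorder(State, -pct_speeding), y = pct_speeding)) +
  geom_col() +
  labs(x = "State",
       y = "Drivers in fatal collisions who were speeding (%)") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

On the full data, `drivers %>% arrange(desc(...))` with the original column name would work the same way. Printing `p` draws the chart.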