We are going to explore a dataset covering 1983 to 1997 that records seatbelt usage rates and traffic fatalities by state. My assumption is that as more states enforce seatbelt laws, fatalities decrease.

Import data from URL for BONUS

library(tidyverse)  # supplies %>%, transmute(), drop_na() and ggplot() used below
seatbelt_data <- read.csv('https://raw.githubusercontent.com/wilsonvetdev/R_Bridge/main/USSeatBelts.csv')
head(seatbelt_data)
##   X state year miles fatalities seatbelt speed65 speed70 drinkage alcohol
## 1 1    AK 1983  3358 0.04466945       NA      no      no      yes      no
## 2 2    AK 1984  3589 0.03733630       NA      no      no      yes      no
## 3 3    AK 1985  3840 0.03307291       NA      no      no      yes      no
## 4 4    AK 1986  4008 0.02519960       NA      no      no      yes      no
## 5 5    AK 1987  3900 0.01948718       NA      no      no      yes      no
## 6 6    AK 1988  3841 0.02525384       NA      no      no      yes      no
##   income      age enforce
## 1  17973 28.23497      no
## 2  18093 28.34354      no
## 3  18925 28.37282      no
## 4  18466 28.39665      no
## 5  18021 28.45325      no
## 6  18447 28.85142      no

A quick glance at the data set.

summary(seatbelt_data)
##        X          state                year          miles       
##  Min.   :  1   Length:765         Min.   :1983   Min.   :  3099  
##  1st Qu.:192   Class :character   1st Qu.:1986   1st Qu.: 11401  
##  Median :383   Mode  :character   Median :1990   Median : 30319  
##  Mean   :383                      Mean   :1990   Mean   : 41448  
##  3rd Qu.:574                      3rd Qu.:1994   3rd Qu.: 52312  
##  Max.   :765                      Max.   :1997   Max.   :285612  
##                                                                  
##    fatalities          seatbelt        speed65            speed70         
##  Min.   :0.008327   Min.   :0.0600   Length:765         Length:765        
##  1st Qu.:0.017341   1st Qu.:0.4200   Class :character   Class :character  
##  Median :0.021199   Median :0.5500   Mode  :character   Mode  :character  
##  Mean   :0.021490   Mean   :0.5289                                        
##  3rd Qu.:0.024774   3rd Qu.:0.6500                                        
##  Max.   :0.045470   Max.   :0.8700                                        
##                     NA's   :209                                           
##    drinkage           alcohol              income           age       
##  Length:765         Length:765         Min.   : 8372   Min.   :28.23  
##  Class :character   Class :character   1st Qu.:14266   1st Qu.:34.39  
##  Mode  :character   Mode  :character   Median :17624   Median :35.39  
##                                        Mean   :17993   Mean   :35.14  
##                                        3rd Qu.:21080   3rd Qu.:36.13  
##                                        Max.   :35863   Max.   :39.17  
##                                                                       
##    enforce         
##  Length:765        
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
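
summary() reports only the length and class of the character columns, so it says nothing about which values they hold. A minimal sketch of tabulating them instead, using four rows copied from the outputs in this document (the small `sample_rows` data frame is only a stand-in for `seatbelt_data`):

```r
# Four rows taken from the printed output (ids 1, 17, 20 and 28);
# sample_rows is a stand-in for the full seatbelt_data.
sample_rows <- data.frame(
  state    = c("AK", "AL", "AL", "AL"),
  year     = c(1983, 1984, 1987, 1995),
  speed65  = c("no", "no", "yes", "yes"),
  speed70  = c("no", "no", "no", "yes"),
  drinkage = c("yes", "no", "yes", "yes"),
  alcohol  = c("no", "no", "no", "yes"),
  enforce  = c("no", "no", "no", "secondary")
)

# Count the distinct values in each character column, keeping NAs visible.
flag_cols <- c("speed65", "speed70", "drinkage", "alcohol", "enforce")
value_counts <- lapply(sample_rows[flag_cols], table, useNA = "ifany")
value_counts
```

On the real data, replacing `sample_rows` with `seatbelt_data` should show at a glance which columns are plain yes/no flags and which hold other categories.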

Making sure we don’t have duplicate rows.

seatbelt_data %>% duplicated() %>% table
## .
## FALSE 
##   765
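
Because the `X` column numbers every row, no two complete rows can be identical, so the whole-row check above cannot fail. A sketch of a stricter check, assuming each state should appear at most once per year (the data frame below is built from rows shown earlier and is only a stand-in for `seatbelt_data`):

```r
# First three rows from head(seatbelt_data); stand-in for the full data set.
sample_rows <- data.frame(
  X     = 1:3,
  state = c("AK", "AK", "AK"),
  year  = c(1983, 1984, 1985)
)

# anyDuplicated() returns 0 when no (state, year) pair repeats,
# otherwise the index of the first repeated pair.
first_dup <- anyDuplicated(sample_rows[, c("state", "year")])
first_dup

# The key-based check flags a repeat even when the row id differs.
with_repeat <- rbind(sample_rows, data.frame(X = 4, state = "AK", year = 1985))
key_dup <- anyDuplicated(with_repeat[, c("state", "year")])
row_dup <- anyDuplicated(with_repeat)   # whole-row check misses it
c(key_dup = key_dup, row_dup = row_dup)
```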

Checking the data type of each column to see whether any columns need conversion.

str(seatbelt_data)
## 'data.frame':    765 obs. of  13 variables:
##  $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ state     : chr  "AK" "AK" "AK" "AK" ...
##  $ year      : int  1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 ...
##  $ miles     : int  3358 3589 3840 4008 3900 3841 3887 3979 4021 3841 ...
##  $ fatalities: num  0.0447 0.0373 0.0331 0.0252 0.0195 ...
##  $ seatbelt  : num  NA NA NA NA NA ...
##  $ speed65   : chr  "no" "no" "no" "no" ...
##  $ speed70   : chr  "no" "no" "no" "no" ...
##  $ drinkage  : chr  "yes" "yes" "yes" "yes" ...
##  $ alcohol   : chr  "no" "no" "no" "no" ...
##  $ income    : int  17973 18093 18925 18466 18021 18447 19970 21073 21496 22073 ...
##  $ age       : num  28.2 28.3 28.4 28.4 28.5 ...
##  $ enforce   : chr  "no" "no" "no" "no" ...

It seems like some columns should be boolean/logical data types. Let’s see if we are able to convert them and also rename some columns.

new_seatbelt_data <- seatbelt_data %>% transmute(
  id = X,
  state,
  year,
  miles,
  fatalities_million_miles = fatalities,
  seatbelt_use_rate = seatbelt,
  speed65,
  speed70,
  drinkage_21 = drinkage,
  alcohol,
  income, 
  mean_age = age,
  enforce_seatbelt = enforce
)

# Comparing with "yes" already yields TRUE/FALSE, so ifelse() is not needed
new_seatbelt_data$speed65 <- new_seatbelt_data$speed65 == "yes"
new_seatbelt_data$speed70 <- new_seatbelt_data$speed70 == "yes"
new_seatbelt_data$drinkage_21 <- new_seatbelt_data$drinkage_21 == "yes"
new_seatbelt_data$alcohol <- new_seatbelt_data$alcohol == "yes"
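
The four assignments above share one pattern, so they can also be written as a single pass over the column names. A minimal base R sketch, assuming the columns contain only "yes" and "no" (the `stopifnot()` guard and the `sample_rows` stand-in are additions for illustration, not part of the original analysis):

```r
# Stand-in for new_seatbelt_data, with values taken from rows printed in this document.
sample_rows <- data.frame(
  speed65     = c("no", "no", "yes", "yes"),
  speed70     = c("no", "no", "no", "yes"),
  drinkage_21 = c("yes", "no", "yes", "yes"),
  alcohol     = c("no", "no", "no", "yes")
)

flag_cols <- c("speed65", "speed70", "drinkage_21", "alcohol")

# Guard: stop if any flag column holds something other than yes/no/NA.
stopifnot(all(unlist(sample_rows[flag_cols]) %in% c("yes", "no", NA)))

# Comparing with "yes" yields TRUE/FALSE directly (NA stays NA).
sample_rows[flag_cols] <- lapply(sample_rows[flag_cols], function(x) x == "yes")
str(sample_rows)
```

The guard matters because any unexpected spelling such as "Yes" would otherwise be converted to FALSE without warning.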

Checking the data type of each column after conversion. The columns that held ‘yes’ and ‘no’ values are now logical. enforce_seatbelt stays a character column because its values are not limited to ‘yes’ and ‘no’ (for example, ‘secondary’).

str(new_seatbelt_data)
## 'data.frame':    765 obs. of  13 variables:
##  $ id                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ state                   : chr  "AK" "AK" "AK" "AK" ...
##  $ year                    : int  1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 ...
##  $ miles                   : int  3358 3589 3840 4008 3900 3841 3887 3979 4021 3841 ...
##  $ fatalities_million_miles: num  0.0447 0.0373 0.0331 0.0252 0.0195 ...
##  $ seatbelt_use_rate       : num  NA NA NA NA NA ...
##  $ speed65                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ speed70                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ drinkage_21             : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ alcohol                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ income                  : int  17973 18093 18925 18466 18021 18447 19970 21073 21496 22073 ...
##  $ mean_age                : num  28.2 28.3 28.4 28.4 28.5 ...
##  $ enforce_seatbelt        : chr  "no" "no" "no" "no" ...
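
One option for `enforce_seatbelt`, which is still a character column in the output above, is a factor with explicit levels. The sketch below assumes the three values are "no", "secondary" and "primary"; only the first two appear in the output shown in this document, so check with `unique()` before relying on this level list.

```r
# Stand-in values; "primary" is assumed and does not appear in the output above.
enforce_seatbelt <- c("no", "secondary", "secondary", "no", "primary")

# Any value missing from `levels` would silently become NA, so compare first.
expected_levels <- c("no", "secondary", "primary")
unexpected <- setdiff(unique(enforce_seatbelt), expected_levels)
unexpected

# Convert to a factor with the levels in a chosen order.
enforce_factor <- factor(enforce_seatbelt, levels = expected_levels)
table(enforce_factor)
```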

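Dropping rows that contain NA values. In the summary above only the seatbelt usage rate has missing values (209 of 765 rows), so 556 rows remain.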

new_seatbelt_data <- new_seatbelt_data %>% drop_na()
head(new_seatbelt_data, 20)
##    id state year miles fatalities_million_miles seatbelt_use_rate speed65
## 1   8    AK 1990  3979               0.02462930              0.45   FALSE
## 2   9    AK 1991  4021               0.02511813              0.66   FALSE
## 3  10    AK 1992  3841               0.02811768              0.66    TRUE
## 4  11    AK 1993  3918               0.03011741              0.69    TRUE
## 5  12    AK 1994  4150               0.02048193              0.69    TRUE
## 6  13    AK 1995  4123               0.02110114              0.69    TRUE
## 7  14    AK 1996  4115               0.01968408              0.69    TRUE
## 8  15    AK 1997  4387               0.01755186              0.69    TRUE
## 9  17    AL 1984 32961               0.02827584              0.13   FALSE
## 10 18    AL 1985 35091               0.02513465              0.17   FALSE
## 11 19    AL 1986 34003               0.03179131              0.29   FALSE
## 12 20    AL 1987 37426               0.02968525              0.21    TRUE
## 13 21    AL 1988 39684               0.02580385              0.29    TRUE
## 14 22    AL 1989 40765               0.02524224              0.38    TRUE
## 15 23    AL 1990 42347               0.02647177              0.44    TRUE
## 16 24    AL 1991 42924               0.02599944              0.47    TRUE
## 17 25    AL 1992 45762               0.02252961              0.58    TRUE
## 18 26    AL 1993 47337               0.02205463              0.55    TRUE
## 19 27    AL 1994 48956               0.02212191              0.55    TRUE
## 20 28    AL 1995 50628               0.02200363              0.52    TRUE
##    speed70 drinkage_21 alcohol income mean_age enforce_seatbelt
## 1    FALSE        TRUE   FALSE  21073 29.58628               no
## 2    FALSE        TRUE   FALSE  21496 29.82771        secondary
## 3    FALSE        TRUE   FALSE  22073 30.21070        secondary
## 4    FALSE        TRUE   FALSE  22711 30.46439        secondary
## 5    FALSE        TRUE   FALSE  23417 30.75657        secondary
## 6    FALSE        TRUE   FALSE  23971 31.17860        secondary
## 7    FALSE        TRUE   FALSE  24310 31.44535        secondary
## 8    FALSE        TRUE   FALSE  24969 31.60147        secondary
## 9    FALSE       FALSE   FALSE  10417 34.42163               no
## 10   FALSE        TRUE   FALSE  11133 34.60257               no
## 11   FALSE        TRUE   FALSE  11736 34.76067               no
## 12   FALSE        TRUE   FALSE  12394 34.95566               no
## 13   FALSE        TRUE   FALSE  13288 35.13696               no
## 14   FALSE        TRUE   FALSE  14266 35.32786               no
## 15   FALSE        TRUE   FALSE  15213 35.62292               no
## 16   FALSE        TRUE   FALSE  15895 35.68874               no
## 17   FALSE        TRUE   FALSE  16817 35.83262        secondary
## 18   FALSE        TRUE   FALSE  17398 35.94825        secondary
## 19   FALSE        TRUE   FALSE  18163 36.09201        secondary
## 20    TRUE        TRUE    TRUE  19041 36.20575        secondary
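
drop_na() removes a row if any column is missing, so it helps to confirm which columns contain the NAs before dropping. A base R sketch of that check and of an equivalent row filter, using four Alaska rows from the outputs above as a stand-in for the full data:

```r
# ids 5, 6, 8 and 9 from the printed output; the two earlier years lack a usage rate.
sample_rows <- data.frame(
  id    = c(5, 6, 8, 9),
  state = "AK",
  year  = c(1987, 1988, 1990, 1991),
  fatalities_million_miles = c(0.01948718, 0.02525384, 0.02462930, 0.02511813),
  seatbelt_use_rate        = c(NA, NA, 0.45, 0.66)
)

# Number of missing values per column.
na_per_column <- colSums(is.na(sample_rows))
na_per_column

# complete.cases() keeps the same rows that tidyr::drop_na() would keep.
complete_rows <- sample_rows[complete.cases(sample_rows), ]
nrow(complete_rows)
```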

Visualization.

viz <- ggplot(data = new_seatbelt_data, aes(x = seatbelt_use_rate, y = fatalities_million_miles)) +
       geom_point() +
       geom_smooth() +
       labs(title = "Seatbelt usage rate vs. fatalities per million traffic miles",
            subtitle = "Effects of Mandatory Seat Belt Laws in the US data (1983-1997)",
            x = "Seatbelt usage rate",
            y = "Fatalities per million traffic miles")
viz
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
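
The scatter plot suggests a direction but not a magnitude. A hedged sketch of putting a number on it with a Pearson correlation and a straight-line fit, run here on the twelve Alabama rows printed above (`al_rows` is a stand-in; the values it prints describe only that subset, not the full data set):

```r
# Alabama, 1984 to 1995, copied from head(new_seatbelt_data, 20).
al_rows <- data.frame(
  seatbelt_use_rate = c(0.13, 0.17, 0.29, 0.21, 0.29, 0.38,
                        0.44, 0.47, 0.58, 0.55, 0.55, 0.52),
  fatalities_million_miles = c(0.02827584, 0.02513465, 0.03179131, 0.02968525,
                               0.02580385, 0.02524224, 0.02647177, 0.02599944,
                               0.02252961, 0.02205463, 0.02212191, 0.02200363)
)

# Pearson correlation between usage rate and fatality rate.
r <- cor(al_rows$seatbelt_use_rate, al_rows$fatalities_million_miles)
r

# Linear fit: the slope is the change in fatality rate per unit of usage rate.
fit <- lm(fatalities_million_miles ~ seatbelt_use_rate, data = al_rows)
coef(fit)
```

To run it on the full data, replace `al_rows` with `new_seatbelt_data`. Rows from the same state are not independent of each other, so the result is best read as a description rather than a formal test.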

viz2 <- ggplot(data = new_seatbelt_data, aes(x = year, y = seatbelt_use_rate)) +
        geom_point() +
        geom_smooth() +
        labs(title = "Seatbelt usage rate VS Years", subtitle = "Effects of Mandatory Seat Belt Laws in the US (1983-1997)")
viz2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
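
With one point per state per year, the trend can also be read from yearly averages. A base R sketch using `aggregate()`, run on the Alaska and Alabama rows for 1990 to 1992 printed above (a stand-in for `new_seatbelt_data`):

```r
# Six rows copied from head(new_seatbelt_data, 20).
sample_rows <- data.frame(
  state = c("AK", "AK", "AK", "AL", "AL", "AL"),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992),
  seatbelt_use_rate = c(0.45, 0.66, 0.66, 0.44, 0.47, 0.58)
)

# Mean usage rate across states for each year.
yearly_mean <- aggregate(seatbelt_use_rate ~ year, data = sample_rows, FUN = mean)
yearly_mean
```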

viz3 <- ggplot(data = new_seatbelt_data, aes(x = year, y = fatalities_million_miles)) +
        geom_point() +
        geom_smooth()
viz3
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Conclusion: Fatalities per million traffic miles drop slightly as the seatbelt usage rate increases. The second graph shows that the seatbelt usage rate rose from 1983 to 1997.
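
The opening assumption concerns seatbelt laws rather than usage rates, which the plots do not examine directly. One possible starting point, sketched here on eight rows printed earlier (a stand-in for `new_seatbelt_data`, so its output says nothing about the full data), is to compare the mean fatality rate by `enforce_seatbelt`:

```r
# Eight rows copied from the printed output (AK and AL); stand-in data only.
sample_rows <- data.frame(
  state = c("AK", "AL", "AL", "AL", "AK", "AK", "AL", "AL"),
  year  = c(1990, 1989, 1990, 1991, 1991, 1992, 1992, 1993),
  fatalities_million_miles = c(0.02462930, 0.02524224, 0.02647177, 0.02599944,
                               0.02511813, 0.02811768, 0.02252961, 0.02205463),
  enforce_seatbelt = c("no", "no", "no", "no",
                       "secondary", "secondary", "secondary", "secondary")
)

# Mean fatality rate for each enforcement category.
by_enforcement <- aggregate(fatalities_million_miles ~ enforce_seatbelt,
                            data = sample_rows, FUN = mean)
by_enforcement
```

A comparison like this ignores differences between states and any general trend over time, so it is a first look rather than evidence of a causal effect.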