We are going to explore a dataset covering 1983-1997 that records
seat belt usage rates and traffic fatalities by US state. My assumption
is that as more states enforce seatbelt laws, fatalities decrease.
Import data from URL for BONUS
library(tidyverse) # provides %>%, transmute(), drop_na() and ggplot() used below
seatbelt_data <- read.csv('https://raw.githubusercontent.com/wilsonvetdev/R_Bridge/main/USSeatBelts.csv')
head(seatbelt_data)
## X state year miles fatalities seatbelt speed65 speed70 drinkage alcohol
## 1 1 AK 1983 3358 0.04466945 NA no no yes no
## 2 2 AK 1984 3589 0.03733630 NA no no yes no
## 3 3 AK 1985 3840 0.03307291 NA no no yes no
## 4 4 AK 1986 4008 0.02519960 NA no no yes no
## 5 5 AK 1987 3900 0.01948718 NA no no yes no
## 6 6 AK 1988 3841 0.02525384 NA no no yes no
## income age enforce
## 1 17973 28.23497 no
## 2 18093 28.34354 no
## 3 18925 28.37282 no
## 4 18466 28.39665 no
## 5 18021 28.45325 no
## 6 18447 28.85142 no
A quick glance at the data set.
summary(seatbelt_data)
## X state year miles
## Min. : 1 Length:765 Min. :1983 Min. : 3099
## 1st Qu.:192 Class :character 1st Qu.:1986 1st Qu.: 11401
## Median :383 Mode :character Median :1990 Median : 30319
## Mean :383 Mean :1990 Mean : 41448
## 3rd Qu.:574 3rd Qu.:1994 3rd Qu.: 52312
## Max. :765 Max. :1997 Max. :285612
##
## fatalities seatbelt speed65 speed70
## Min. :0.008327 Min. :0.0600 Length:765 Length:765
## 1st Qu.:0.017341 1st Qu.:0.4200 Class :character Class :character
## Median :0.021199 Median :0.5500 Mode :character Mode :character
## Mean :0.021490 Mean :0.5289
## 3rd Qu.:0.024774 3rd Qu.:0.6500
## Max. :0.045470 Max. :0.8700
## NA's :209
## drinkage alcohol income age
## Length:765 Length:765 Min. : 8372 Min. :28.23
## Class :character Class :character 1st Qu.:14266 1st Qu.:34.39
## Mode :character Mode :character Median :17624 Median :35.39
## Mean :17993 Mean :35.14
## 3rd Qu.:21080 3rd Qu.:36.13
## Max. :35863 Max. :39.17
##
## enforce
## Length:765
## Class :character
## Mode :character
##
##
##
##
Making sure we don’t have duplicate rows.
seatbelt_data %>% duplicated() %>% table
## .
## FALSE
## 765
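Because the X column is a unique row id, a whole-row duplicated() check cannot flag two records that describe the same state and year. A minimal sketch of a key-based check follows. It runs on a small toy data frame (hypothetical values, same column names as the data) and assumes that one row per state-year pair is the intended grain:

```r
library(dplyr)

# Toy stand-in for seatbelt_data; the last two rows share a state-year key on purpose.
toy <- data.frame(
  X     = 1:4,
  state = c("AK", "AK", "AL", "AL"),
  year  = c(1983, 1984, 1983, 1983),
  miles = c(3358, 3589, 32961, 32961)
)

# Whole-row check: X differs on every row, so nothing is flagged.
whole_row_dups <- any(duplicated(toy))

# Key-based check: count rows per state-year pair and keep pairs seen more than once.
dup_keys <- toy %>%
  count(state, year, name = "n_rows") %>%
  filter(n_rows > 1)

whole_row_dups
dup_keys
```

Running the same count() and filter() on seatbelt_data would return zero rows if every state-year pair appears only once.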
Checking each column’s data type to see whether any columns need
conversion.
str(seatbelt_data)
## 'data.frame': 765 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ state : chr "AK" "AK" "AK" "AK" ...
## $ year : int 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 ...
## $ miles : int 3358 3589 3840 4008 3900 3841 3887 3979 4021 3841 ...
## $ fatalities: num 0.0447 0.0373 0.0331 0.0252 0.0195 ...
## $ seatbelt : num NA NA NA NA NA ...
## $ speed65 : chr "no" "no" "no" "no" ...
## $ speed70 : chr "no" "no" "no" "no" ...
## $ drinkage : chr "yes" "yes" "yes" "yes" ...
## $ alcohol : chr "no" "no" "no" "no" ...
## $ income : int 17973 18093 18925 18466 18021 18447 19970 21073 21496 22073 ...
## $ age : num 28.2 28.3 28.4 28.4 28.5 ...
## $ enforce : chr "no" "no" "no" "no" ...
The yes/no columns (speed65, speed70, drinkage and alcohol) would be
better stored as logical values. Let’s convert them and also rename
some columns for clarity.
new_seatbelt_data <- seatbelt_data %>% transmute(
id = X,
state,
year,
miles,
fatalities_million_miles = fatalities,
seatbelt_use_rate = seatbelt,
speed65,
speed70,
drinkage_21 = drinkage,
alcohol,
income,
mean_age = age,
enforce_seatbelt = enforce
)
# A comparison already returns TRUE/FALSE, so ifelse() is not needed
new_seatbelt_data$speed65 <- new_seatbelt_data$speed65 == "yes"
new_seatbelt_data$speed70 <- new_seatbelt_data$speed70 == "yes"
new_seatbelt_data$drinkage_21 <- new_seatbelt_data$drinkage_21 == "yes"
new_seatbelt_data$alcohol <- new_seatbelt_data$alcohol == "yes"
Checking each column’s data type after conversion. The columns that
held only ‘yes’ and ‘no’ values are now logical. enforce_seatbelt stays
a character column because it holds more than two values (for example
‘no’ and ‘secondary’).
str(new_seatbelt_data)
## 'data.frame': 765 obs. of 13 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ state : chr "AK" "AK" "AK" "AK" ...
## $ year : int 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 ...
## $ miles : int 3358 3589 3840 4008 3900 3841 3887 3979 4021 3841 ...
## $ fatalities_million_miles: num 0.0447 0.0373 0.0331 0.0252 0.0195 ...
## $ seatbelt_use_rate : num NA NA NA NA NA ...
## $ speed65 : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ speed70 : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ drinkage_21 : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ alcohol : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ income : int 17973 18093 18925 18466 18021 18447 19970 21073 21496 22073 ...
## $ mean_age : num 28.2 28.3 28.4 28.4 28.5 ...
## $ enforce_seatbelt : chr "no" "no" "no" "no" ...
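The same conversion can be written in a single mutate() call with across(), which avoids repeating the column name on every line. The sketch below uses a small toy data frame with hypothetical values. Turning enforce_seatbelt into a factor is an optional extra step that the original code does not perform:

```r
library(dplyr)

# Toy stand-in with the same column names as new_seatbelt_data (hypothetical values).
toy <- data.frame(
  speed65          = c("no", "yes", "yes"),
  speed70          = c("no", "no", "yes"),
  enforce_seatbelt = c("no", "secondary", "secondary"),
  stringsAsFactors = FALSE
)

converted <- toy %>%
  mutate(
    # Compare each yes/no column against "yes" to get a logical vector.
    across(c(speed65, speed70), ~ .x == "yes"),
    # Optional: store the multi-valued law status as a factor.
    enforce_seatbelt = factor(enforce_seatbelt)
  )

str(converted)
```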
Dropping rows that contain NA values. According to the summary above,
the seatbelt use rate is missing in 209 rows.
new_seatbelt_data <- new_seatbelt_data %>% drop_na()
head(new_seatbelt_data, 20)
## id state year miles fatalities_million_miles seatbelt_use_rate speed65
## 1 8 AK 1990 3979 0.02462930 0.45 FALSE
## 2 9 AK 1991 4021 0.02511813 0.66 FALSE
## 3 10 AK 1992 3841 0.02811768 0.66 TRUE
## 4 11 AK 1993 3918 0.03011741 0.69 TRUE
## 5 12 AK 1994 4150 0.02048193 0.69 TRUE
## 6 13 AK 1995 4123 0.02110114 0.69 TRUE
## 7 14 AK 1996 4115 0.01968408 0.69 TRUE
## 8 15 AK 1997 4387 0.01755186 0.69 TRUE
## 9 17 AL 1984 32961 0.02827584 0.13 FALSE
## 10 18 AL 1985 35091 0.02513465 0.17 FALSE
## 11 19 AL 1986 34003 0.03179131 0.29 FALSE
## 12 20 AL 1987 37426 0.02968525 0.21 TRUE
## 13 21 AL 1988 39684 0.02580385 0.29 TRUE
## 14 22 AL 1989 40765 0.02524224 0.38 TRUE
## 15 23 AL 1990 42347 0.02647177 0.44 TRUE
## 16 24 AL 1991 42924 0.02599944 0.47 TRUE
## 17 25 AL 1992 45762 0.02252961 0.58 TRUE
## 18 26 AL 1993 47337 0.02205463 0.55 TRUE
## 19 27 AL 1994 48956 0.02212191 0.55 TRUE
## 20 28 AL 1995 50628 0.02200363 0.52 TRUE
## speed70 drinkage_21 alcohol income mean_age enforce_seatbelt
## 1 FALSE TRUE FALSE 21073 29.58628 no
## 2 FALSE TRUE FALSE 21496 29.82771 secondary
## 3 FALSE TRUE FALSE 22073 30.21070 secondary
## 4 FALSE TRUE FALSE 22711 30.46439 secondary
## 5 FALSE TRUE FALSE 23417 30.75657 secondary
## 6 FALSE TRUE FALSE 23971 31.17860 secondary
## 7 FALSE TRUE FALSE 24310 31.44535 secondary
## 8 FALSE TRUE FALSE 24969 31.60147 secondary
## 9 FALSE FALSE FALSE 10417 34.42163 no
## 10 FALSE TRUE FALSE 11133 34.60257 no
## 11 FALSE TRUE FALSE 11736 34.76067 no
## 12 FALSE TRUE FALSE 12394 34.95566 no
## 13 FALSE TRUE FALSE 13288 35.13696 no
## 14 FALSE TRUE FALSE 14266 35.32786 no
## 15 FALSE TRUE FALSE 15213 35.62292 no
## 16 FALSE TRUE FALSE 15895 35.68874 no
## 17 FALSE TRUE FALSE 16817 35.83262 secondary
## 18 FALSE TRUE FALSE 17398 35.94825 secondary
## 19 FALSE TRUE FALSE 18163 36.09201 secondary
## 20 TRUE TRUE TRUE 19041 36.20575 secondary
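Calling drop_na() with no arguments removes a row if any column is missing. When only one column matters for the analysis, naming it keeps rows that are incomplete elsewhere. A small sketch on toy data (hypothetical values) shows the difference and counts NAs per column first:

```r
library(tidyr)

# Toy data: one NA in seatbelt_use_rate and one NA in income.
toy <- data.frame(
  state             = c("AK", "AK", "AL"),
  seatbelt_use_rate = c(NA, 0.45, 0.13),
  income            = c(17973, NA, 10417)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))

# Drops every row that has an NA in any column.
all_complete <- drop_na(toy)

# Drops only rows where seatbelt_use_rate is missing.
rate_only <- drop_na(toy, seatbelt_use_rate)

na_counts
nrow(all_complete)
nrow(rate_only)
```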
Visualization.
viz <- ggplot(data = new_seatbelt_data, aes(x = seatbelt_use_rate, y = fatalities_million_miles)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Seatbelt usage rate vs. fatalities per million traffic miles",
       subtitle = "Effects of Mandatory Seat Belt Laws in the US (1983-1997)",
       x = "Seatbelt usage rate", y = "Fatalities per million traffic miles")
viz
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

viz2 <- ggplot(data = new_seatbelt_data, aes(x = year, y = seatbelt_use_rate)) +
geom_point() +
geom_smooth() +
labs(title = "Seatbelt usage rate VS Years", subtitle = "Effects of Mandatory Seat Belt Laws in the US (1983-1997)")
viz2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

viz3 <- ggplot(data = new_seatbelt_data, aes(x = year, y = fatalities_million_miles)) +
geom_point() +
geom_smooth()
viz3
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Conclusion: There is a slight drop in fatalities per million traffic
miles as the seatbelt usage rate increases. The second graph shows the
seatbelt usage rate rising from 1983 to 1997.
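The trend read off the plots can also be put into numbers. The sketch below is one possible way to do that, not part of the original analysis. It uses only the Alabama rows printed earlier (1984-1995), so the results describe that excerpt and not the full data set. The same calls could be run on new_seatbelt_data.

```r
library(dplyr)

# Alabama rows copied from the head() output above.
excerpt <- data.frame(
  seatbelt_use_rate = c(0.13, 0.17, 0.29, 0.21, 0.29, 0.38,
                        0.44, 0.47, 0.58, 0.55, 0.55, 0.52),
  fatalities_million_miles = c(0.02827584, 0.02513465, 0.03179131, 0.02968525,
                               0.02580385, 0.02524224, 0.02647177, 0.02599944,
                               0.02252961, 0.02205463, 0.02212191, 0.02200363),
  enforce_seatbelt = c(rep("no", 8), rep("secondary", 4))
)

# Pearson correlation between usage rate and fatality rate.
r <- cor(excerpt$seatbelt_use_rate, excerpt$fatalities_million_miles)

# Simple linear fit; the slope is the change in fatality rate per unit of usage rate.
fit <- lm(fatalities_million_miles ~ seatbelt_use_rate, data = excerpt)

# Mean fatality rate by seatbelt law status, which speaks to the opening assumption.
by_law <- excerpt %>%
  group_by(enforce_seatbelt) %>%
  summarise(mean_fatalities = mean(fatalities_million_miles), n = n())

r
coef(fit)
by_law
```

A negative correlation and slope would be consistent with the scatter plot, although neither shows that seatbelt laws caused the change, since the fatality rate also moves with year and other factors.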