The aim of this report is to formulate and answer questions about our dataset, which covers reported US traffic fatalities from 1982 to 1988. Using the statistical programming language R, we manipulate and analyse the data to draw our own conclusions and report our findings. The questions we address in this report are:
From these questions, we found that state-enforced punishment for driving under the influence does in fact reduce the number of alcohol-related vehicular fatalities and that, surprisingly, beer tax does not prevent fatalities on the roadway.
# LOAD DATA v1 - load data from a CSV file in the working directory
data <- read.csv("Fatalities.csv")
# LOAD DATA v2 - uncomment the line below to load data from another file location
#data <- read.csv("dataset file location")
# Quick look at the first 6 rows of data
head(data)
## X state year spirits unemp income emppop beertax baptist mormon
## 1 1 al 1982 1.37 14.4 10544.15 50.69204 1.539379 30.3557 0.32829
## 2 2 al 1983 1.36 13.7 10732.80 52.14703 1.788991 30.3336 0.34341
## 3 3 al 1984 1.32 11.1 11108.79 54.16809 1.714286 30.3115 0.35924
## 4 4 al 1985 1.28 8.9 11332.63 55.27114 1.652542 30.2895 0.37579
## 5 5 al 1986 1.23 9.8 11661.51 56.51450 1.609907 30.2674 0.39311
## 6 6 al 1987 1.18 7.8 11944.00 57.50988 1.560000 30.2453 0.41123
## drinkage dry youngdrivers miles breath jail service fatal nfatal
## 1 19.00 25.0063 0.211572 7233.887 no no no 839 146
## 2 19.00 22.9942 0.210768 7836.348 no no no 930 154
## 3 19.00 24.0426 0.211484 8262.990 no no no 932 165
## 4 19.67 23.6339 0.211140 8726.917 no no no 882 146
## 5 21.00 23.4647 0.213400 8952.854 no no no 1081 172
## 6 21.00 23.7924 0.215527 9166.302 no no no 1110 181
## sfatal fatal1517 nfatal1517 fatal1820 nfatal1820 fatal2124 nfatal2124 afatal
## 1 99 53 9 99 34 120 32 309.438
## 2 98 71 8 108 26 124 35 341.834
## 3 94 49 7 103 25 118 34 304.872
## 4 98 66 9 100 23 114 45 276.742
## 5 119 82 10 120 23 119 29 360.716
## 6 114 94 11 127 31 138 30 368.421
## pop pop1517 pop1820 pop2124 milestot unempus emppopus gsp
## 1 3942002 208999.6 221553.4 290000.1 28516 9.7 57.8 -0.02212476
## 2 3960008 202000.1 219125.5 290000.2 31032 9.6 57.9 0.04655825
## 3 3988992 197000.0 216724.1 288000.2 32961 7.5 59.5 0.06279784
## 4 4021008 194999.7 214349.0 284000.3 35091 7.2 60.1 0.02748997
## 5 4049994 203999.9 212000.0 263000.3 36259 7.0 60.7 0.03214295
## 6 4082999 204999.8 208998.5 258999.8 37426 6.2 61.5 0.04897637
## Size of data
dim(data)
## [1] 336 35
## R's classification of data
class(data)
## [1] "data.frame"
## R's classification of variables
str(data)
## 'data.frame': 336 obs. of 35 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ state : chr "al" "al" "al" "al" ...
## $ year : int 1982 1983 1984 1985 1986 1987 1988 1982 1983 1984 ...
## $ spirits : num 1.37 1.36 1.32 1.28 1.23 ...
## $ unemp : num 14.4 13.7 11.1 8.9 9.8 ...
## $ income : num 10544 10733 11109 11333 11662 ...
## $ emppop : num 50.7 52.1 54.2 55.3 56.5 ...
## $ beertax : num 1.54 1.79 1.71 1.65 1.61 ...
## $ baptist : num 30.4 30.3 30.3 30.3 30.3 ...
## $ mormon : num 0.328 0.343 0.359 0.376 0.393 ...
## $ drinkage : num 19 19 19 19.7 21 ...
## $ dry : num 25 23 24 23.6 23.5 ...
## $ youngdrivers: num 0.212 0.211 0.211 0.211 0.213 ...
## $ miles : num 7234 7836 8263 8727 8953 ...
## $ breath : chr "no" "no" "no" "no" ...
## $ jail : chr "no" "no" "no" "no" ...
## $ service : chr "no" "no" "no" "no" ...
## $ fatal : int 839 930 932 882 1081 1110 1023 724 675 869 ...
## $ nfatal : int 146 154 165 146 172 181 139 131 112 149 ...
## $ sfatal : int 99 98 94 98 119 114 89 76 60 81 ...
## $ fatal1517 : int 53 71 49 66 82 94 66 40 40 51 ...
## $ nfatal1517 : int 9 8 7 9 10 11 8 7 7 8 ...
## $ fatal1820 : int 99 108 103 100 120 127 105 81 83 118 ...
## $ nfatal1820 : int 34 26 25 23 23 31 24 16 19 34 ...
## $ fatal2124 : int 120 124 118 114 119 138 123 96 80 123 ...
## $ nfatal2124 : int 32 35 34 45 29 30 25 36 17 33 ...
## $ afatal : num 309 342 305 277 361 ...
## $ pop : num 3942002 3960008 3988992 4021008 4049994 ...
## $ pop1517 : num 209000 202000 197000 195000 204000 ...
## $ pop1820 : num 221553 219125 216724 214349 212000 ...
## $ pop2124 : num 290000 290000 288000 284000 263000 ...
## $ milestot : num 28516 31032 32961 35091 36259 ...
## $ unempus : num 9.7 9.6 7.5 7.2 7 ...
## $ emppopus : num 57.8 57.9 59.5 60.1 60.7 ...
## $ gsp : num -0.0221 0.0466 0.0628 0.0275 0.0321 ...
#sapply(data, class)
Within the data, there were no missing values and no duplicate rows. R read the csv file as a data frame, as the output above shows. The dataset did not contain any summaries or descriptions of the data, only quantitative and qualitative observations. R was able to accurately classify each of the variables in the dataset, so no changes had to be made to the R output.
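The checks for missing values and duplicates described above can be sketched as follows. This is a minimal illustration, not necessarily the exact commands used for this report: `sample_data` is a hypothetical stand-in built from a few of the rows and columns printed by `head(data)`, so that the example runs without the csv file. With the full file, the same calls would be applied to `data`.

```r
# Hypothetical stand-in: three rows and five columns copied from the
# head(data) output, so the example is self-contained.
sample_data <- data.frame(
  state   = c("al", "al", "al"),
  year    = c(1982L, 1983L, 1984L),
  beertax = c(1.539379, 1.788991, 1.714286),
  jail    = c("no", "no", "no"),
  fatal   = c(839L, 930L, 932L),
  stringsAsFactors = FALSE
)

# Total number of missing values across all columns
n_missing <- sum(is.na(sample_data))

# Missing values broken down by column
missing_by_column <- colSums(is.na(sample_data))

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(sample_data))

print(n_missing)
print(missing_by_column)
print(n_duplicates)
```

A count of zero for both `n_missing` and `n_duplicates` on the full data frame would support the statement above.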
The data was obtained from an online repository that hosts over 1300 unique data sets intended for teaching and statistical development. The data set selected was originally sourced from the online complement to the textbook ‘Introduction to Econometrics’ by Stock and Watson. The data used in this study is considered to have good validity and reliability, as it was compiled from multiple credible organisations, including the US Department of Transportation (Total Fatalities / Total Miles Travelled Annually), the US Bureau of Economic Analysis (Personal Income) and the US Bureau of Labor Statistics (Unemployment Rate). One possible issue is that values differ drastically between states, which makes it difficult to represent the relationships between variables on a single graph.
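One common way to ease comparisons between states of very different size is to express fatalities relative to population. The sketch below is an assumption about how that could be done and is not taken from the analysis in this report. `sample_data` is a hypothetical stand-in that uses the `fatal` and `pop` values printed by `head(data)`, and `fatal_rate` is an illustrative column name.

```r
# Hypothetical stand-in: fatal and pop for the first three rows of
# the head(data) output (Alabama, 1982 to 1984).
sample_data <- data.frame(
  state = c("al", "al", "al"),
  year  = 1982:1984,
  fatal = c(839, 930, 932),
  pop   = c(3942002, 3960008, 3988992)
)

# Fatalities per 10,000 residents (illustrative scaling, assumed here)
sample_data$fatal_rate <- sample_data$fatal / sample_data$pop * 10000

print(sample_data)
```

Because the rate is scaled by population, states with very different totals can be plotted on one axis.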
Within the data, there are 35 variables, each representing a different statistic or potential contributing factor regarding traffic fatalities in the lower 48 US states. Each row represents road fatalities in a particular state during one year between 1982 and 1988. Each column records a different statistic or factor that may have contributed to the fatalities, for example spirits consumption, beer tax, the population of specific age groups and the laws that were in place in that state.
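The one-row-per-state-per-year layout described above can be checked with a few base R calls. This is a minimal sketch under assumptions: `sample_data` is a hypothetical stand-in built from the six Alabama rows printed by `head(data)`. On the full file, the same calls on `data` would be expected to show 48 states with 7 years each, which matches the 336 rows reported by `dim(data)`.

```r
# Hypothetical stand-in: the six Alabama rows shown by head(data)
sample_data <- data.frame(
  state = rep("al", 6),
  year  = 1982:1987,
  fatal = c(839L, 930L, 932L, 882L, 1081L, 1110L)
)

# Number of observations recorded for each state
rows_per_state <- table(sample_data$state)

# TRUE when no state-year combination appears more than once
pair_is_unique <- !any(duplicated(sample_data[, c("state", "year")]))

# First and last year covered
year_range <- range(sample_data$year)

print(rows_per_state)
print(pair_is_unique)
print(year_range)
```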
In the United States, over 38,000 people die in roadway crashes every year. Additionally, around 4.4 million are injured seriously enough to require medical attention. The US experiences the highest rate of road fatalities of all high-income nations, approximately 50% higher than countries such as Australia, Canada or Japan (Evans, 2014). A multitude of factors contribute to the high death toll on the roads. These can include irresponsible conduct on the roads, a lack of preventative legislation, laws that are unenforced and poor road quality (“Road Safety Facts — Association for Safe International Road Travel”, 2020). The data set studied in this report was recorded in the 1980s, which is interesting because it was during this time that US traffic safety policies began to diverge from those put in place in other countries such as the United Kingdom and Australia.
Road Safety Facts — Association for Safe International Road Travel. (2020). Retrieved 25 September 2020, from https://www.asirt.org/safe-travel/road-safety-facts/
U.S. traffic-related injuries and fatalities | Statista. (2020). Retrieved 26 September 2020, from https://www.statista.com/statistics/191900/road-traffic-related-injuries-and-fatalities-in-the-us-since-1988/
Evans, L. (2014). Traffic Fatality Reductions: United States Compared With 25 Other Countries. Retrieved 25 September 2020, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103211/
Kabacoff, R. (2020). Data Visualization With R. Retrieved 1 October 2020, from https://rkabacoff.github.io/datavis/Bivariate.html#categorical-vs.quantitative
Arel-Bundock, V. (2020). Retrieved 18 September 2020, from https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fatalities.csv