1 Executive Summary

The aim of this report is to formulate and answer questions about a dataset of US traffic fatalities in the lower 48 states from 1982 to 1988. Using the statistical programming language R, we manipulate and analyse the data, draw our own conclusions and report our findings. The questions we address in this report are:

  • Does enforcing state punishment for driving under the influence effectively reduce the number of alcohol-related vehicular fatalities?
  • Does increasing the tax on a case of beer affect the number of alcohol-related fatalities on the roadways in the lower 48 states of the US?

From these questions, we found that enforcing state punishment for driving under the influence does reduce the number of alcohol-related vehicular fatalities, whereas, surprisingly, a higher beer tax does not prevent fatalities on the roadway.
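As a rough sketch of how the two questions can be set up in R (not necessarily the method used for the findings above), the function below computes an alcohol-involved fatality rate, compares it between state-years with and without a mandatory jail sentence, and measures its linear association with the beer tax. It assumes that `afatal` counts alcohol-involved fatalities, `pop` is state population, `jail` is "yes"/"no" and `beertax` is the tax on a case of beer; the function name `summarise_questions` is hypothetical.

```r
# Sketch only. Assumes a data frame with columns afatal (alcohol-involved
# fatalities), pop (population), jail ("yes"/"no") and beertax.
summarise_questions <- function(df) {
  # Alcohol-involved fatalities per 10,000 residents, so that states of
  # very different sizes can be compared.
  rate <- df$afatal / df$pop * 10000

  # Question 1: mean rate in state-years with and without a jail sentence.
  # Rows with a missing jail value are left out of both groups.
  by_jail <- tapply(rate, df$jail, mean)

  # Question 2: Pearson correlation between the beer tax and the rate.
  tax_cor <- cor(df$beertax, rate)

  list(rate_by_jail = by_jail, beertax_correlation = tax_cor)
}

# Usage with the data frame loaded in Section 2.1:
# summarise_questions(data)
```

Group means and a single correlation ignore differences between states and years, so this is only a first look at the questions rather than a test of them.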


2 Full Report

2.1 Initial Data Analysis (IDA)

# LOAD DATA v1: load data from a local copy of the file (active)
data <- read.csv("Fatalities.csv")

# LOAD DATA v2: uncomment the line below to load the data from another location
#data <- read.csv("dataset file location")

# Quick look at the first 6 rows of data (head() returns 6 rows by default)
head(data)
##   X state year spirits unemp   income   emppop  beertax baptist  mormon
## 1 1    al 1982    1.37  14.4 10544.15 50.69204 1.539379 30.3557 0.32829
## 2 2    al 1983    1.36  13.7 10732.80 52.14703 1.788991 30.3336 0.34341
## 3 3    al 1984    1.32  11.1 11108.79 54.16809 1.714286 30.3115 0.35924
## 4 4    al 1985    1.28   8.9 11332.63 55.27114 1.652542 30.2895 0.37579
## 5 5    al 1986    1.23   9.8 11661.51 56.51450 1.609907 30.2674 0.39311
## 6 6    al 1987    1.18   7.8 11944.00 57.50988 1.560000 30.2453 0.41123
##   drinkage     dry youngdrivers    miles breath jail service fatal nfatal
## 1    19.00 25.0063     0.211572 7233.887     no   no      no   839    146
## 2    19.00 22.9942     0.210768 7836.348     no   no      no   930    154
## 3    19.00 24.0426     0.211484 8262.990     no   no      no   932    165
## 4    19.67 23.6339     0.211140 8726.917     no   no      no   882    146
## 5    21.00 23.4647     0.213400 8952.854     no   no      no  1081    172
## 6    21.00 23.7924     0.215527 9166.302     no   no      no  1110    181
##   sfatal fatal1517 nfatal1517 fatal1820 nfatal1820 fatal2124 nfatal2124  afatal
## 1     99        53          9        99         34       120         32 309.438
## 2     98        71          8       108         26       124         35 341.834
## 3     94        49          7       103         25       118         34 304.872
## 4     98        66          9       100         23       114         45 276.742
## 5    119        82         10       120         23       119         29 360.716
## 6    114        94         11       127         31       138         30 368.421
##       pop  pop1517  pop1820  pop2124 milestot unempus emppopus         gsp
## 1 3942002 208999.6 221553.4 290000.1    28516     9.7     57.8 -0.02212476
## 2 3960008 202000.1 219125.5 290000.2    31032     9.6     57.9  0.04655825
## 3 3988992 197000.0 216724.1 288000.2    32961     7.5     59.5  0.06279784
## 4 4021008 194999.7 214349.0 284000.3    35091     7.2     60.1  0.02748997
## 5 4049994 203999.9 212000.0 263000.3    36259     7.0     60.7  0.03214295
## 6 4082999 204999.8 208998.5 258999.8    37426     6.2     61.5  0.04897637
# Size of data
dim(data)
## [1] 336  35
# R's classification of data
class(data)
## [1] "data.frame"
# R's classification of variables
str(data)
## 'data.frame':    336 obs. of  35 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ state       : chr  "al" "al" "al" "al" ...
##  $ year        : int  1982 1983 1984 1985 1986 1987 1988 1982 1983 1984 ...
##  $ spirits     : num  1.37 1.36 1.32 1.28 1.23 ...
##  $ unemp       : num  14.4 13.7 11.1 8.9 9.8 ...
##  $ income      : num  10544 10733 11109 11333 11662 ...
##  $ emppop      : num  50.7 52.1 54.2 55.3 56.5 ...
##  $ beertax     : num  1.54 1.79 1.71 1.65 1.61 ...
##  $ baptist     : num  30.4 30.3 30.3 30.3 30.3 ...
##  $ mormon      : num  0.328 0.343 0.359 0.376 0.393 ...
##  $ drinkage    : num  19 19 19 19.7 21 ...
##  $ dry         : num  25 23 24 23.6 23.5 ...
##  $ youngdrivers: num  0.212 0.211 0.211 0.211 0.213 ...
##  $ miles       : num  7234 7836 8263 8727 8953 ...
##  $ breath      : chr  "no" "no" "no" "no" ...
##  $ jail        : chr  "no" "no" "no" "no" ...
##  $ service     : chr  "no" "no" "no" "no" ...
##  $ fatal       : int  839 930 932 882 1081 1110 1023 724 675 869 ...
##  $ nfatal      : int  146 154 165 146 172 181 139 131 112 149 ...
##  $ sfatal      : int  99 98 94 98 119 114 89 76 60 81 ...
##  $ fatal1517   : int  53 71 49 66 82 94 66 40 40 51 ...
##  $ nfatal1517  : int  9 8 7 9 10 11 8 7 7 8 ...
##  $ fatal1820   : int  99 108 103 100 120 127 105 81 83 118 ...
##  $ nfatal1820  : int  34 26 25 23 23 31 24 16 19 34 ...
##  $ fatal2124   : int  120 124 118 114 119 138 123 96 80 123 ...
##  $ nfatal2124  : int  32 35 34 45 29 30 25 36 17 33 ...
##  $ afatal      : num  309 342 305 277 361 ...
##  $ pop         : num  3942002 3960008 3988992 4021008 4049994 ...
##  $ pop1517     : num  209000 202000 197000 195000 204000 ...
##  $ pop1820     : num  221553 219125 216724 214349 212000 ...
##  $ pop2124     : num  290000 290000 288000 284000 263000 ...
##  $ milestot    : num  28516 31032 32961 35091 36259 ...
##  $ unempus     : num  9.7 9.6 7.5 7.2 7 ...
##  $ emppopus    : num  57.8 57.9 59.5 60.1 60.7 ...
##  $ gsp         : num  -0.0221 0.0466 0.0628 0.0275 0.0321 ...
# sapply(data, class)  # alternative: list the class of each variable only

The data contained no missing values and no duplicated observations. R read the CSV file as a data frame of 336 observations of 35 variables, and the dataset contains only raw quantitative and qualitative observations, with no summary rows or descriptive text. R assigned a suitable type to every variable (integer or numeric for measurements, character for the categorical variables state, breath, jail and service), so no changes had to be made to the imported data.
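The missing-value and duplicate checks described above can be reproduced with a short helper. This is a sketch under two assumptions: that `X` is only a row index (it numbers the rows 1, 2, 3, ..., so it would make every row look unique), and that a `state`/`year` pair should identify a single observation. The name `check_data_quality` is hypothetical.

```r
# Sketch: count missing values and duplicated observations in a data frame.
# Assumes X is a row index and that state + year identify one observation.
check_data_quality <- function(df) {
  # Drop the row-index column, if present, before looking for duplicates.
  content <- df[, setdiff(names(df), "X"), drop = FALSE]
  list(
    # Total number of NA cells across all variables.
    missing_values = sum(is.na(content)),
    # Rows identical to an earlier row in every variable.
    duplicate_rows = sum(duplicated(content)),
    # Rows repeating an earlier state-year pair, even if other values differ.
    duplicate_keys = sum(duplicated(content[, c("state", "year")]))
  )
}

# Usage: check_data_quality(data)
```

Under the claims above, `missing_values` and `duplicate_rows` should both be zero; `duplicate_keys` adds a stricter check that no state appears twice in the same year.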

The data was obtained from an online repository hosting over 1300 datasets intended for teaching and statistical development. The selected dataset was originally sourced from the online companion to the textbook ‘Introduction to Econometrics’ by Stock and Watson. The data is considered to have good validity and reliability, as it was compiled from several credible agencies, including the US Department of Transportation (total fatalities and total miles travelled annually), the US Bureau of Economic Analysis (personal income) and the US Bureau of Labor Statistics (unemployment rate). One possible issue is that states differ greatly in size, so raw counts such as the number of fatalities vary widely between them, which makes relationships between variables difficult to represent on a single graph.
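One common way to reduce the scale problem (an assumption here, not a step taken in the original analysis) is to convert raw counts into rates before plotting. The sketch below adds fatalities per 10,000 residents and per 100 million vehicle miles. It assumes `fatal` is total fatalities, `pop` is population and `milestot` is total vehicle miles in millions; the units of `milestot` and the function name `add_fatality_rates` are assumptions.

```r
# Sketch: express fatalities as rates so large and small states can be
# compared on one graph. Assumes milestot is measured in millions of miles.
add_fatality_rates <- function(df) {
  # Fatalities per 10,000 residents.
  df$fatal_per_10k <- df$fatal / df$pop * 10000
  # Fatalities per 100 million vehicle miles (milestot / 100 = hundreds of millions).
  df$fatal_per_100m <- df$fatal / (df$milestot / 100)
  df
}

# Usage:
# rates <- add_fatality_rates(data)
# plot(fatal_per_10k ~ year, data = rates)
```

Per-capita rates put every state on a comparable scale; the per-mile rate additionally adjusts for how much driving takes place in each state.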

The data contains 35 variables, each representing a statistic or potential contributing factor relating to traffic fatalities in the lower 48 US states (the first column, X, appears to be a row index). Each row records one state in one year between 1982 and 1988, and each column records a different statistic or factor, for example spirits consumption, the tax on beer, the population of specific age groups, and the drink-driving laws in place in that state.
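Because each row is one state in one year, the dimensions reported by `dim(data)` can be checked against the panel structure: with 7 years (1982 to 1988) and 336 rows, a balanced panel implies 336 / 7 = 48 states. The sketch below performs this check; the name `check_panel` is hypothetical.

```r
# Sketch: confirm that each row is one state-year and that the panel is
# balanced, i.e. every state is observed exactly once in every year.
check_panel <- function(df) {
  n_states <- length(unique(df$state))
  n_years  <- length(unique(df$year))
  list(
    n_states = n_states,
    n_years  = n_years,
    # Balanced: one row per state-year combination and no repeated pairs.
    balanced = nrow(df) == n_states * n_years &&
      !any(duplicated(df[, c("state", "year")]))
  )
}

# Usage: check_panel(data)
```

If the dataset is balanced as described, the result should report 48 states and 7 years.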

2.2 Domain Knowledge

In the United States, over 38,000 people die in road crashes every year, and around 4.4 million more are injured seriously enough to require medical attention. The US has the highest rate of road fatalities of all high-income nations, approximately 50% higher than that of countries such as Australia, Canada and Japan (Evans, 2014). Many factors contribute to this high death toll, including irresponsible conduct on the roads, a lack of preventative legislation or poor enforcement of existing laws, and poor road quality (“Road Safety Facts — Association for Safe International Road Travel”, 2020). The dataset studied in this report was recorded in the 1980s, which is notable because it was during this period that US traffic safety policies began to diverge from those of other countries such as the United Kingdom and Australia.


2.3 Research Question 1

2.4 Research Question 2

3 References

Style: APA