Source Data

The following analysis uses data comprised of traffic accidents in the city of Tempe, Arizona. They constitute 28,470 observations and 32 variables. You can find more details in the data set documentation at the Tempe Open Data Portal.

Preview

We can explore the first six observations in the following table using function head().

head(dat)

Preprocessing

All data pre-processing are available in Appendix A: Data Preprocessing.

Part I: Summaries

The following provides brief exploratory summaries on key data points.

1. Mondays Accidents

Question: How many accidents happened on Mondays?

Note: This solution has been completed for you.

sum(dat$day == "Mon", 
    na.rm = TRUE)

## [1] 4094

Answer: 4,094 accidents occurred on Mondays.

2. Monday Accidents (%)

Question: What proportion of accidents each week occur on Monday??

table(dat$day) %>% pander()

Mon	Tue	Wed	Thu	Fri	Sat	Sun
4094	4656	4711	4814	5006	3044	2145

mean(dat$day =="Mon", na.rm=TRUE)

## [1] 0.1438005

Answer: 14.38% of all accidents occur on Mondays.

3. Harmful Monday Accidents (%)

Question: What proportion of accidents on Mondays result in harm??

Note: “Harm” is defined as any accident that causes at least one injury or fatality, a.k.a. a “casualty”.

#head(dat, n=20)
dat %>%
  group_by(day) %>%
  summarise(accidents=n(), injuries = sum(Totalinjuries), fatalities = sum(Totalfatalities)) %>% pander()

day	accidents	injuries	fatalities
Mon	4094	1644	13
Tue	4656	2056	8
Wed	4711	2144	9
Thu	4814	2204	10
Fri	5006	2103	12
Sat	3044	1192	11
Sun	2145	866	6

prop <- sum(dat$day == "Mon" & (dat$Totalinjuries > 0 | dat$Totalfatalities > 0))
prop/sum(dat$day == "Mon")

## [1] 0.2984856

Answer: 29.85% of all Monday accidents have at least one casualty.

4. Most Common Accidents

Question: What is the most common accident type on Mondays?

dat %>%
  filter(day == "Mon") %>%
  count(Collisionmanner) %>%
  arrange(desc(n))

Answer: Rear End collisions are the most common accident types.

5. Differences in Accidents

Question: Are there differences in the proportion of accidents that result in harm each day of the week?

dat %>%
  group_by(day) %>%
  mutate(harm.accidents = Totalinjuries > 0 | Totalfatalities > 0) %>%
  summarise(accidents=n(), 
            injuries = sum(Totalinjuries), 
            fatalities = sum(Totalfatalities),
            prop.harm = round(mean(harm.accidents), 2)) %>% 
  pander()

day	accidents	injuries	fatalities	prop.harm
Mon	4094	1644	13	0.3
Tue	4656	2056	8	0.32
Wed	4711	2144	9	0.31
Thu	4814	2204	10	0.33
Fri	5006	2103	12	0.3
Sat	3044	1192	11	0.26
Sun	2145	866	6	0.27

Answer: Significantly more accidents occur during the week (specifically Tuesday, Wednesday, and Thursday)..

6. Accident Stats by Weekday

Instructions: Create a table that reports the following for each day of the week.

Total accidents
Total injuries
Total fatalities
Proportion of harmful accidents (i.e. casualties)

dat %>%
  group_by(day) %>%
  mutate(harm_acc = Totalinjuries > 0 | Totalfatalities > 0) %>%
  summarize(n = n(),
            injuries = sum(Totalinjuries),
            fatalities = sum(Totalfatalities),
            harm.rate = round(mean(harm_acc),2)) %>% 
  pander()

day	n	injuries	fatalities	harm.rate
Mon	4094	1644	13	0.3
Tue	4656	2056	8	0.32
Wed	4711	2144	9	0.31
Thu	4814	2204	10	0.33
Fri	5006	2103	12	0.3
Sat	3044	1192	11	0.26
Sun	2145	866	6	0.27

Part II: Age Groups

The following provides summaries of accidents by age groups.

1. Accidents by Hour & Age

Instructions: Create a table of counts of accidents by time of day (hour12) and age of driver (age).

Question: Which age group has the largest number of accidents at 7 AM?

table(dat$hour, dat$age)

##     
##      Age 16-18 Age 18-25 Age 25-35 Age 35-45 Age 45-55 Age 55-65 Age 65-75
##   00        24       129        76        43        11         9         2
##   01        10       109        90        16        11         4         1
##   02         5       154        82        28         9         5         1
##   03         3        80        35        15         5         3         0
##   04         4        43        24        11        16         4         3
##   05         4        98        46        39        40        26         8
##   06        24       164       122        90        79        57        21
##   07        77       408       371       243       175       116        39
##   08        72       419       329       205       149       107        56
##   09        42       276       200       136       111        94        59
##   10        36       298       207       115       108        95        59
##   11        66       387       249       144       154       121        81
##   12        77       507       301       230       199       129        77
##   13        84       587       324       190       184       153        85
##   14       109       585       349       206       187       138        85
##   15       134       674       419       270       228       155        92
##   16       157       835       563       314       280       184        83
##   17       152       989       582       356       262       168        68
##   18       106       635       381       226       174       119        59
##   19        74       376       213       126       101        62        37
##   20        48       338       170        91        71        44        20
##   21        58       295       135        70        52        51        13
##   22        54       241       100        46        38        19        13
##   23        38       169        88        40        35        15         8
##     
##      Age 75-100
##   00          0
##   01          0
##   02          0
##   03          0
##   04          1
##   05          5
##   06          7
##   07         17
##   08         31
##   09         38
##   10         46
##   11         56
##   12         57
##   13         43
##   14         50
##   15         55
##   16         46
##   17         41
##   18         41
##   19         13
##   20         12
##   21          6
##   22          2
##   23          3

dat %>%
  filter(dat$hour =="07") %>%
  count(age, hour) %>%
  pander()

age	hour	n
Age 16-18	07	77
Age 18-25	07	408
Age 25-35	07	371
Age 35-45	07	243
Age 45-55	07	175
Age 55-65	07	116
Age 65-75	07	39
Age 75-100	07	17
NA	07	160

Answer: Drivers of Age 18-25 have the greatest number of accidents from 7:00 to 7:59 AM.

2. Accidents by Hour & Age (%)

Instructions: Create a new table of time of day and age group that reports the proportion of accidents at “7 AM” (hour12) for each age group (age). The proportions within each age group should sum to one.

Objective: The table should contain the following columns and initial values:

age	hour12	n	n.age	n.hour	p	p.age	p.hour
Age 16-18	7 AM	77	1458	1606	0.05	0.05	0.05
Age 18-25	7 AM	408	8796	1606	0.25	0.05	0.25
Age 25-35	7 AM	371	5456	1606	0.23	0.07	0.23

Solution: The solution has been partially provided. Use the hints provided to create the table.

dat %>%
  count(age, hour) %>%
  group_by(age)%>%
  mutate(n.age=sum(n), n.hour = sum(dat$hour == "07"))%>%
  mutate(p = round(n/n.hour, 2), p.age = round(n/n.age, 2), p.hour = round(n/n.hour,2)) %>%
  filter(hour == "07") %>%
pander()

age	hour	n	n.age	n.hour	p	p.age	p.hour
Age 16-18	07	77	1458	1606	0.05	0.05	0.05
Age 18-25	07	408	8796	1606	0.25	0.05	0.25
Age 25-35	07	371	5456	1606	0.23	0.07	0.23
Age 35-45	07	243	3250	1606	0.15	0.07	0.15
Age 45-55	07	175	2679	1606	0.11	0.07	0.11
Age 55-65	07	116	1878	1606	0.07	0.06	0.07
Age 65-75	07	39	970	1606	0.02	0.04	0.02
Age 75-100	07	17	570	1606	0.01	0.03	0.01
NA	07	160	3413	1606	0.1	0.05	0.1

Part III: Rates of Harm

The following reports the accidents, casualties, proportion, and average casualties per harmful accident.

1. Accidents by Hour

Instructions: Visualize total accidents by time of day (hour).

Note: This solution has been completed for you.

dat %>%
  group_by(hour) %>%
  summarize(n = n()) %>%
  plot(type = "b",
       bty = "n",
       pch = 19,
       cex = 2,
       xlim = c(0,25),       
       xlab = "Hour",
       ylab = "Total Accidents",
       main = "Total Accidents by Time of Day")

2. Total Casualties by Hour

Instructions: Visualize total injuries and fatalities by time of day (hour).

dat %>%
  group_by(hour) %>%
  summarize(n = sum(Totalfatalities | Totalinjuries)) %>%
  plot(type = "b",
       bty = "n",
       pch = 19,
       cex = 2,
       xlim = c(0,25),       
       xlab = "Hour",
       ylab = "Total Number of Passengers Hurt or Killed",
       main = "Total Injuries or Fatalities by Hour of the Day")

3. Accidents with Casualties (%)

Instructions: Visualize the proportion of harmful collisions our of all collisions by time of day (hour).

dat %>%
  group_by(hour) %>%
  mutate(harm.accidents = Totalinjuries > 0 | Totalfatalities > 0) %>%
  summarise(prop.harm = mean(harm.accidents)) %>% 
  plot(type = "b",
       bty = "n",
       pch = 19,
       cex = 2,
       ylim = c(0.25, 0.35),
       xlim = c(0,25),
       xlab = "Hour",
       ylab = "Proportion of Accidents Resulting in Harm",
       main = "Proportion of Crashes that Result in Injuries or Fatalities")

4. Avg. Harm by Harmful Accident

Instructions: Visualize average injuries or fatalities per harmful collision.

#summary(dat$casualty)

dat%>%
  group_by(hour) %>%
  mutate(n = sum(Totalinjuries > 0 | Totalfatalities > 0),
            Injuries = sum(Totalinjuries),
            Fatalities = sum(Totalfatalities),
            Casualties = (Injuries+Fatalities)) %>%
  summarise(aveinj = Casualties/n) %>%
  plot(type = "b",
       bty = "n",
       pch = 19,
       cex = 2,
       xlim = c(0,25),
       xlab = "Hour",
       ylab = "AVe Number of Passengers Hurt",
       main = "Average Injuries or Fatalities Per Harmful Crash")

Appendix

The following code is used for preprocessing tasks in the above solutions.

A: Data Preprocessing

These expressions transform, format, and create myriad new variables used in this report.

date.vec <- strptime(dat$DateTime, 
                     format = "%m/%d/%y %H:%M")       # Create date fields

dat$hour   <- format(date.vec, format = "%H")
dat$month  <- format(date.vec, format = "%b")
dat$day    <- format(date.vec, format = "%a")
dat$day365 <- format(date.vec, format = "%j")
dat$week   <- format(date.vec, format = "%V")

dat$day <- factor(dat$day, 
                  levels = c("Mon",
                             "Tue",
                             "Wed",
                             "Thu",
                             "Fri",
                             "Sat",
                             "Sun"))                  # Order weekdays

dat$hour12 <- format(date.vec, 
                     format="%l %p")                  # Create 12-hour format

time.levels <- c("12 AM", paste(1:11, "AM"), 
                 "12 PM", paste(1:11, "PM"))

dat$hour12 <- factor(dat$hour12, 
                     levels = time.levels)            # Order time intervals

age.labels <- paste0("Age ", 
                     c(16,18,25,35,45,55,65,75), "-", 
                     c(18,25,35,45,55,65,75,100) )

dat$age <- cut(dat$Age_Drv1, 
               breaks = c(16,18,25,
                          35,45,55,
                          65,75,100), 
               labels = age.labels)                   # Discretize age ranges

————

DELETE THIS LINE & ALL LINES BELOW BEFORE SUBMITTING

————

How to Submit

Use the following instructions to submit your assignment, which may vary depending on your course’s platform.

Knitting to HTML

When you have completed your assignment, click the “Knit” button to render your .RMD file into a .HTML report.

Special Instructions

Perform the following depending on your course’s platform:

Canvas: Upload both your .RMD and .HTML files to the appropriate link
Blackboard or iCollege: Compress your .RMD and .HTML files in a .ZIP file and upload to the appropriate link

.HTML files are preferred but not allowed by all platforms.

Before You Submit

Remember to ensure the following before submitting your assignment.

Name your files using this format: Lab-##-LastName.rmd and Lab-##-LastName.html
Show both the solution for your code and write out your answers in the body text
Do not show excessive output; truncate your output, e.g. with function head()
Follow appropriate styling conventions, e.g. spaces after commas, etc.
Above all, ensure that your conventions are consistent

See Google’s R Style Guide for examples of common conventions.

Common Knitting Issues

.RMD files are knit into .HTML and other formats procedural, or line-by-line.

An error in code when knitting will halt the process; error messages will tell you the specific line with the error
Certain functions like install.packages() or setwd() are bound to cause errors in knitting
Altering a dataset or variable in one chunk will affect their use in all later chunks
If an object is “not found”, make sure it was created or loaded with library() in a previous chunk

If All Else Fails: If you cannot determine and fix the errors in a code chunk that’s preventing you from knitting your document, add eval = FALSE inside the brackets of {r} at the beginning of a chunk to ensure that R does not attempt to evaluate it, that is: {r eval = FALSE}. This will prevent an erroneous chunk of code from halting the knitting process.

Lab 05 - Tempe Accident Analysis

Jacqui Anderson

June 16, 2022