Who is Drunk in Ames, Iowa

1. Breath alcohol tests in Ames, Iowa, USA

Ames, Iowa, USA is the home of Iowa State University, a land grant university with over 36,000 students. By comparison, the city of Ames, Iowa, itself only has about 65,000 residents. As with any other college town, Ames has had its fair share of alcohol-related incidents. (For example, Google ‘VEISHEA riots 2014’.) We will take a look at some breath alcohol test data from Ames that is published by the State of Iowa.

The data contains 1,556 readings from breath alcohol tests administered by the Ames and Iowa State University Police Departments from January 2013 to December 2017. The columns in this data set are year, month, day, hour, location, gender, Res1, Res2.

# Load the packages 
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)
library(ggplot2)

# Read the data into your workspace
ba_data <- read_csv('datasets/breath_alcohol_ames.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   year = col_double(),
##   month = col_double(),
##   day = col_double(),
##   hour = col_double(),
##   location = col_character(),
##   gender = col_character(),
##   Res1 = col_double(),
##   Res2 = col_double()
## )

# Quickly inspect the data
head(ba_data)

## # A tibble: 6 x 8
##    year month   day  hour location gender  Res1  Res2
##   <dbl> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl>
## 1  2017    12    17     1 Ames PD  M      0.046 0.046
## 2  2017    12    14     3 ISU PD   F      0.121 0.12 
## 3  2017    12    10     5 ISU PD   F      0.068 0.067
## 4  2017    12    10     3 ISU PD   F      0.077 0.077
## 5  2017    12     9     2 ISU PD   M      0.085 0.084
## 6  2017    12     9     1 Ames PD  M      0.16  0.161

# Obtain counts for each year 
ba_year <- ba_data %>%
    group_by(year) %>%
    count()

2. What is the busiest police department in Ames?

There are two police departments in the data set: the Iowa State University Police Department and the Ames Police Department. Which one administers more breathalyzer tests?

# Count the totals for each department
pds <- ba_data %>%
    group_by(location) %>%
    count()
pds

## # A tibble: 2 x 2
## # Groups:   location [2]
##   location     n
##   <chr>    <int>
## 1 Ames PD    616
## 2 ISU PD     940

3. Nothing Good Happens after 2am

We all know that “nothing good happens after 2am.” Thus, there are inevitably some times of the day when breath alcohol tests, especially in a college town like Ames, are most and least common. Which hours of the day have the most and least breathalyzer tests?

# Count by hour and arrange by descending frequency

# hourly <- ba_data %>%
#    group_by(hour) %>%
#    count() %>%
#    arrange(desc(n))

hourly <- ba_data %>%
    count(hour, sort = TRUE)


hourly

## # A tibble: 24 x 2
##     hour     n
##    <dbl> <int>
##  1     2   417
##  2     3   364
##  3     1   219
##  4     4   124
##  5     0    98
##  6    23    51
##  7     5    48
##  8    20    31
##  9    22    29
## 10    21    26
## # … with 14 more rows

# Use a geom_ to create the appropriate bar chart
ggplot(hourly, aes(x = hour,  weight = n)) +
    geom_bar()

4. Breathalyzer tests by month

Now that we have discovered which time of day is most common for breath alcohol tests, we will determine which time of the year has the most breathalyzer tests. Which month will have the most recorded tests?

# Count by month and arrange by descending frequency
monthly <- ba_data %>%
    count(month, sort = TRUE)

# Make month a factor
monthly$month <- as.factor(monthly$month)

# Use a geom_ to create the appropriate bar chart
ggplot(monthly, aes(x = month,weight = n)) + 
  geom_bar()

# ggplot(monthly, aes(x = month, y = n))+ 
#  geom_col()

5. COLLEGE

When we think of (binge) drinking in college towns in America, we usually think of something like this image at the left. And so, one might suspect that breath alcohol tests are given to men more often than women and that men drink more than women.

# Count by gender 
ba_data %>%
    count(gender)

## # A tibble: 3 x 2
##   gender     n
##   <chr>  <int>
## 1 F        425
## 2 M       1102
## 3 <NA>      29

# Create a dataset with no NAs in gender 
clean_gender <- ba_data %>%
    filter(!is.na(gender))
clean_gender

## # A tibble: 1,527 x 8
##     year month   day  hour location gender  Res1  Res2
##    <dbl> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl>
##  1  2017    12    17     1 Ames PD  M      0.046 0.046
##  2  2017    12    14     3 ISU PD   F      0.121 0.12 
##  3  2017    12    10     5 ISU PD   F      0.068 0.067
##  4  2017    12    10     3 ISU PD   F      0.077 0.077
##  5  2017    12     9     2 ISU PD   M      0.085 0.084
##  6  2017    12     9     1 Ames PD  M      0.16  0.161
##  7  2017    12     7     3 Ames PD  M      0.131 0.131
##  8  2017    12     4     1 ISU PD   M      0     0    
##  9  2017    12     3     1 Ames PD  M      0.091 0.09 
## 10  2017    12     3     0 Ames PD  M      0.095 0.095
## # … with 1,517 more rows

# Create a mean test result variable and save as mean_bas
mean_bas <- clean_gender %>%
    mutate(meanRes = (Res1+Res2)/2)

mean_bas

## # A tibble: 1,527 x 9
##     year month   day  hour location gender  Res1  Res2 meanRes
##    <dbl> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl>   <dbl>
##  1  2017    12    17     1 Ames PD  M      0.046 0.046  0.046 
##  2  2017    12    14     3 ISU PD   F      0.121 0.12   0.121 
##  3  2017    12    10     5 ISU PD   F      0.068 0.067  0.0675
##  4  2017    12    10     3 ISU PD   F      0.077 0.077  0.077 
##  5  2017    12     9     2 ISU PD   M      0.085 0.084  0.0845
##  6  2017    12     9     1 Ames PD  M      0.16  0.161  0.160 
##  7  2017    12     7     3 Ames PD  M      0.131 0.131  0.131 
##  8  2017    12     4     1 ISU PD   M      0     0      0     
##  9  2017    12     3     1 Ames PD  M      0.091 0.09   0.0905
## 10  2017    12     3     0 Ames PD  M      0.095 0.095  0.095 
## # … with 1,517 more rows

# Create side-by-side boxplots to compare the mean blood alcohol levels of men and women
ggplot(mean_bas, aes(x = gender, y = meanRes)) + 
  geom_boxplot()

6. Above the legal limit

In the USA, it is illegal to drive with a blood alcohol concentration (BAC) above 0.08%. This is the case for all 50 states. Assuming everyone tested in our data was driving (though we have no way of knowing this from the data), if either of the results (Res1, Res2) are above 0.08, the person would be charged with DUI (driving under the influence).

# Filter the data
duis <- ba_data %>%
    filter(Res1 > 0.08 | Res2 > 0.08 )
duis

## # A tibble: 1,159 x 8
##     year month   day  hour location gender  Res1  Res2
##    <dbl> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl>
##  1  2017    12    14     3 ISU PD   F      0.121 0.12 
##  2  2017    12     9     2 ISU PD   M      0.085 0.084
##  3  2017    12     9     1 Ames PD  M      0.16  0.161
##  4  2017    12     7     3 Ames PD  M      0.131 0.131
##  5  2017    12     3     1 Ames PD  M      0.091 0.09 
##  6  2017    12     3     0 Ames PD  M      0.095 0.095
##  7  2017    12     2     1 Ames PD  M      0.155 0.155
##  8  2017    11    29     3 ISU PD   F      0.171 0.171
##  9  2017    11    17    18 ISU PD   M      0.176 0.176
## 10  2017    11    11     2 ISU PD   M      0.094 0.093
## # … with 1,149 more rows

# Proportion of tests that would have resulted in a DUI
p_dui <- nrow(duis) / nrow(ba_data)
p_dui

## [1] 0.7448586

7. Breathalyzer tests: is there a pattern over time?

We previously saw that 2 a.m. is the most common time of day for breathalyzer tests to be administered, and August is the most common month of the year for breathalyzer tests. Now, we look at the weeks in the year over time. We briefly use the lubridate package for a bit of date-time manipulation.

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

# Create date variable using paste() and ymd()
ba_data <- ba_data %>% 
    mutate(date = ymd(paste(year, month, day, sep = "-")))

# Create a week variable using week()
ba_data <- ba_data %>% 
  mutate(week = week(date))

ba_data

## # A tibble: 1,556 x 10
##     year month   day  hour location gender  Res1  Res2 date        week
##    <dbl> <dbl> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl> <date>     <dbl>
##  1  2017    12    17     1 Ames PD  M      0.046 0.046 2017-12-17    51
##  2  2017    12    14     3 ISU PD   F      0.121 0.12  2017-12-14    50
##  3  2017    12    10     5 ISU PD   F      0.068 0.067 2017-12-10    50
##  4  2017    12    10     3 ISU PD   F      0.077 0.077 2017-12-10    50
##  5  2017    12     9     2 ISU PD   M      0.085 0.084 2017-12-09    49
##  6  2017    12     9     1 Ames PD  M      0.16  0.161 2017-12-09    49
##  7  2017    12     7     3 Ames PD  M      0.131 0.131 2017-12-07    49
##  8  2017    12     4     1 ISU PD   M      0     0     2017-12-04    49
##  9  2017    12     3     1 Ames PD  M      0.091 0.09  2017-12-03    49
## 10  2017    12     3     0 Ames PD  M      0.095 0.095 2017-12-03    49
## # … with 1,546 more rows

8. Looking at timelines

How do the weeks differ over time? One of the most common data visualizations is the time series, a line tracking the changes in a variable over time. We will use the new week variable to look at test frequency over time. We end with a time series plot showing frequency of breathalyzer tests by week in year, with one line for each year.

# Create the weekly data set 
# weekly <- ba_data %>%
#    count(week, year)
# weekly

weekly <- ba_data %>%
    group_by(week, year) %>%
    count()

# Make year a factor
weekly <- weekly %>% 
    mutate(year = as.factor(year))

# Create the time series plot with one line for each year
ggplot(weekly, aes(x = week, y = n, color = year)) + 
  geom_line() + 
  geom_point() +  
  scale_x_continuous(breaks = seq(0,52,2))

9. The end of VEISHEA

From Wikipedia: “VEISHEA was an annual week-long celebration held each spring on the campus of Iowa State University in Ames, Iowa. The celebration featured an annual parade and many open-house demonstrations of the university facilities and departments. Campus organizations exhibited products, technologies, and held fundraisers for various charity groups. In addition, VEISHEA brought speakers, lecturers, and entertainers to Iowa State. […] VEISHEA was the largest student-run festival in the nation, bringing in tens of thousands of visitors to the campus each year.” This over 90-year tradition in Ames was terminated permanently after riots in 2014, where drunk celebrators flipped over multiple vehicles and tore light poles down. This was not the first incidence of violence and severe property damage in VEISHEA’s history. Did former President Leath make the right decision?

# Run this code to create the plot 
ggplot() + 
  geom_point(data = weekly, aes(x = week, y = n, color = year)) + 
  geom_line(data = weekly, aes(x = week, y = n, color = year)) +  # included to make the plot more readable 
  geom_segment(data = NULL, arrow = arrow(angle = 20, length = unit(0.1, "inches"),
                                          ends = "last", type = "closed"), 
               aes(x = c(20,20), xend = c(15.5,16), y = c(21, 20), yend = c(21, 12.25))) + 
  geom_text(data = NULL, aes(x = 23, y = 20.5, label = "VEISHEA Weeks"), size = 3) + 
  scale_x_continuous(breaks = seq(0,52,2))

# Make a decision about VEISHEA. TRUE or FALSE?  
cancelling_VEISHEA_was_right <- TRUE