1. Breath alcohol tests in Ames, Iowa, USA

Ames, Iowa, USA is home to Iowa State University, a land-grant university with over 36,000 students. By comparison, the city of Ames itself has only about 65,000 residents. Like any other college town, Ames has had its share of alcohol-related incidents. (For example, Google ‘VEISHEA riots 2014’.) We will take a look at breath alcohol test data from Ames published by the State of Iowa.

# Load the packages 
library(dplyr)

library(readr)
library(ggplot2)

# Read the data into your workspace
ba_data <- read_csv("breath_alcohol_ames.csv")
Parsed with column specification:
cols(
  year = col_double(),
  month = col_double(),
  day = col_double(),
  hour = col_double(),
  location = col_character(),
  gender = col_character(),
  Res1 = col_double(),
  Res2 = col_double()
)
# Quickly inspect the data
head(ba_data)

# Obtain counts for each year 
ba_year <- ba_data %>%
  count(year)
ba_year
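
As a hedged illustration of what count() does here, the sketch below tallies rows per year on a small made-up tibble and compares the result with the longer group_by() and summarise() form. The `toy` data and its values are hypothetical and are not taken from the Ames file.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(year = c(2013, 2013, 2014, 2015, 2015, 2015))

# count() tallies the rows for each distinct value of year
by_count <- toy %>% count(year)

# The same tally written out with group_by() and summarise()
by_group <- toy %>%
  group_by(year) %>%
  summarise(n = n())

by_count
```

Both forms return one row per year with the tally in a column named n, which is why ba_year can be printed directly.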

2. What is the busiest police department in Ames?

There are two police departments in the data set: the Iowa State University Police Department and the Ames Police Department. Which one administers more breathalyzer tests?

# Count the totals for each department
pds <- ba_data %>%
  count(location)
pds
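
Raw totals can be easier to compare as shares. The following is a minimal sketch on made-up data; the department labels "Dept A" and "Dept B" are placeholders, not the labels used in the real location column.

```r
library(dplyr)

# Hypothetical toy data; the location labels are placeholders
toy <- tibble(location = c("Dept A", "Dept A", "Dept A", "Dept B"))

# Count tests per department, then express each count as a share of the total
toy_share <- toy %>%
  count(location) %>%
  mutate(share = n / sum(n))

toy_share
```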

3. Nothing Good Happens after 2am

We all know that “nothing good happens after 2am.” In a college town like Ames, we would therefore expect breath alcohol tests to cluster at certain times of day. Which hours of the day have the most and the fewest breathalyzer tests?

# Count by hour and arrange by descending frequency
hourly <- ba_data %>%
  count(hour, sort = TRUE)
# Use a geom_ to create the appropriate bar chart
ggplot(hourly, aes(x = hour, weight = n)) + geom_bar()
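
The bar chart answers the question visually; the busiest and quietest hours can also be pulled out of the counts directly. This is a sketch on hypothetical counts (the `toy_hourly` values are invented), using filter() so that ties are all returned.

```r
library(dplyr)

# Hypothetical hourly counts, for illustration only
toy_hourly <- tibble(hour = c(0, 1, 2, 3), n = c(40L, 55L, 70L, 12L))

# Keep the row(s) with the largest and smallest counts
busiest  <- toy_hourly %>% filter(n == max(n))
quietest <- toy_hourly %>% filter(n == min(n))

busiest
quietest
```

One caveat: count() only returns hours that occur in the data, so an hour with no tests at all would be absent rather than listed with n = 0.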

4. Breathalyzer tests by month

Now that we have discovered which time of day is most common for breath alcohol tests, we will determine which time of the year has the most breathalyzer tests. Which month will have the most recorded tests?

# Count by month and arrange by descending frequency
monthly <- ba_data %>%
  count(month, sort = TRUE)

# Make month a factor
monthly$month <- as.factor(monthly$month)

# Use a geom_ to create the appropriate bar chart
ggplot(monthly, aes(x = month, weight = n)) + geom_bar()
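
Converting month to a factor makes ggplot2 treat it as a discrete variable, so each month gets its own labelled bar. A possible variation, sketched here on invented counts, is to set the levels explicitly so the calendar order is fixed and to label them with base R's built-in month.abb abbreviations. The `toy_monthly` data frame is hypothetical.

```r
# Hypothetical monthly counts, ordered by frequency as count(sort = TRUE) would return them
toy_monthly <- data.frame(month = c(8, 1, 12), n = c(30L, 20L, 10L))

# Fix the level order to the calendar and attach readable labels
toy_monthly$month <- factor(toy_monthly$month, levels = 1:12, labels = month.abb)

levels(toy_monthly$month)        # all twelve abbreviations, Jan through Dec
as.character(toy_monthly$month)  # the labels for the three rows present
```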

5. COLLEGE

When we think of (binge) drinking in American college towns, a familiar stereotype comes to mind. One might therefore suspect that breath alcohol tests are given to men more often than to women, and that men drink more than women.

# Count by gender 
ba_data %>% count(gender)

# Create a dataset with no NAs in gender 
clean_gender <- ba_data %>%
  filter(!is.na(gender))

# Create a mean test result variable and save as mean_bas
mean_bas <- clean_gender %>%
  mutate(meanRes = (Res1 + Res2)/2)

# Create side-by-side boxplots to compare the mean breath alcohol test results of men and women
ggplot(mean_bas, aes(x = gender, y = meanRes)) + geom_boxplot()
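
The boxplot compares whole distributions; a per-group numeric summary can complement it. The sketch below repeats the same cleaning and averaging steps on a made-up tibble and then summarises by gender. The gender codes "M" and "F" and all result values are assumptions for illustration, not values read from the data.

```r
library(dplyr)

# Hypothetical toy data; gender codes and results are invented
toy <- tibble(
  gender = c("M", "F", NA, "M"),
  Res1   = c(0.10, 0.08, 0.05, 0.20),
  Res2   = c(0.12, 0.08, 0.05, 0.18)
)

toy_summary <- toy %>%
  filter(!is.na(gender)) %>%                  # drop rows with unknown gender
  mutate(meanRes = (Res1 + Res2) / 2) %>%     # average the two readings per test
  group_by(gender) %>%
  summarise(n = n(), median_res = median(meanRes))

toy_summary
```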

7. Breathalyzer tests: is there a pattern over time?

We previously saw that 2 a.m. is the most common time of day for breathalyzer tests to be administered, and August is the most common month of the year for breathalyzer tests. Now, we look at the weeks in the year over time. We briefly use the lubridate package for a bit of date-time manipulation.

library(lubridate) 

# Create date variable using paste() and ymd()
ba_data <- ba_data %>% mutate(date = ymd(paste(year, month, day, sep = "-")))

# Create a week variable using week()
ba_data <- ba_data %>% mutate(week = week(date))
head(ba_data)
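
To make the week variable concrete: lubridate's week() returns the number of complete seven-day periods between January 1 and the date, plus one, so weeks here are not aligned to calendar weekdays. The short sketch below checks this on a few arbitrarily chosen dates and also shows make_date() as an alternative to pasting strings.

```r
library(lubridate)

# Two equivalent ways to build a Date from separate year, month, and day values
d1 <- ymd(paste(2014, 4, 9, sep = "-"))
d2 <- make_date(year = 2014, month = 4, day = 9)
d1 == d2

# week() counts complete seven-day periods since January 1, plus one
week(ymd("2014-01-01"))  # first day of the year falls in week 1
week(ymd("2014-01-07"))  # still week 1
week(ymd("2014-01-08"))  # week 2 starts on the eighth day
week(ymd("2014-12-31"))  # day 365 falls in week 53
```

Because the last day or two of a year land in week 53, the week variable can take 53 distinct values rather than 52.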

8. Looking at timelines

How do the weeks differ over time? One of the most common data visualizations is the time series, a line tracking the changes in a variable over time. We will use the new week variable to look at test frequency over time. We end with a time series plot showing the frequency of breathalyzer tests by week of the year, with one line for each year.

# Create the weekly data set 
weekly <- ba_data %>%
  count(week, year)
# Make year a factor
weekly <- weekly %>% mutate(year = as.factor(year))

# Create the time series plot with one line for each year
ggplot(weekly, aes(x = week, y = n)) + 
  geom_line(aes(color = year)) + 
  geom_point(aes(color = year)) +  
  scale_x_continuous(breaks = seq(0,52,2))
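
One thing to keep in mind when reading this plot: count() returns no row for a year and week with zero tests, so geom_line() simply connects the neighbouring points across the gap. If explicit zeros are wanted, one option is tidyr's complete(), sketched below on invented data. This is an optional refinement under stated assumptions, not part of the original analysis; `toy_weekly` and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical weekly counts with some year/week combinations missing
toy_weekly <- tibble(
  year = factor(c(2013, 2013, 2014)),
  week = c(1, 3, 2),
  n    = c(4L, 2L, 5L)
)

# Add a row for every year/week combination, filling missing counts with 0
toy_filled <- toy_weekly %>%
  complete(year, week = c(1, 2, 3), fill = list(n = 0L))

toy_filled
```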

9. The end of VEISHEA

From Wikipedia: “VEISHEA was an annual week-long celebration held each spring on the campus of Iowa State University in Ames, Iowa. The celebration featured an annual parade and many open-house demonstrations of the university facilities and departments. Campus organizations exhibited products, technologies, and held fundraisers for various charity groups. In addition, VEISHEA brought speakers, lecturers, and entertainers to Iowa State. […] VEISHEA was the largest student-run festival in the nation, bringing in tens of thousands of visitors to the campus each year.”

This tradition of more than 90 years in Ames was terminated permanently after riots in 2014, in which drunk revelers flipped over multiple vehicles and tore down light poles. This was not the first incident of violence and severe property damage in VEISHEA’s history. Did former President Leath make the right decision?

# Run this code to create the plot 
ggplot() + 
  geom_point(data = weekly, aes(x = week, y = n, color = year)) + 
  geom_line(data = weekly, aes(x = week, y = n, color = year)) +  # included to make the plot more readable 
  # annotate() draws fixed annotations without mapping them to a data frame
  annotate("segment", x = c(20, 20), xend = c(15.5, 16), y = c(21, 20), yend = c(21, 12.25),
           arrow = arrow(angle = 20, length = unit(0.1, "inches"),
                         ends = "last", type = "closed")) + 
  annotate("text", x = 23, y = 20.5, label = "VEISHEA Weeks", size = 3) + 
  scale_x_continuous(breaks = seq(0,52,2)) 


# Make a decision about VEISHEA. TRUE or FALSE?  
cancelling_VEISHEA_was_right <- FALSE
