Instructions
Q1 What is the title of the screencast?
Q2 When was it published?
Q3 Describe the data
Q4-Q5 Describe how Dave approached the analysis each step.
- Below is the data Dave analyzes in the video
When in the day do we see bikers?
What directions do people commute by bike?

Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

Read the document posted in Moodle before answering the following questions.
Write in your own words. Multiple identical answers will get zero.
Elaborate your answer. One or two sentence answers won’t get credit.

library(tidyverse)
library(lubridate)
library(scales)
theme_set(theme_light())
bike_traffic_raw <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-02/bike_traffic.csv")
bike_traffic <- bike_traffic_raw %>%
  mutate(date = mdy_hms(date)) %>%
  filter(bike_count < 2000) %>%
  select(-ped_count)

bike_traffic %>%
  count(crossing, direction)
## # A tibble: 13 x 3
##    crossing                                 direction     n
##    <chr>                                    <chr>     <int>
##  1 39th Ave NE Greenway at NE 62nd St       North     38660
##  2 39th Ave NE Greenway at NE 62nd St       South     38660
##  3 Broadway Cycle Track North Of E Union St North     44565
##  4 Broadway Cycle Track North Of E Union St South     44565
##  5 Burke Gilman Trail                       North     42905
##  6 Burke Gilman Trail                       South     42902
##  7 Elliot Bay Trail                         North     45234
##  8 Elliot Bay Trail                         South     45234
##  9 MTS Trail                                East      45565
## 10 NW 58th St Greenway at 22nd Ave          East      44342
## 11 NW 58th St Greenway at 22nd Ave          West      44342
## 12 Sealth Trail                             North     16054
## 13 Sealth Trail                             South     16054

bike_traffic %>%
  ggplot(aes(date, fill = is.na(bike_count))) +
  geom_histogram() +
  facet_grid(crossing ~ direction)

Q1 What is the title of the screencast?

Bike Frequencies in Seattle

Q2 When was it published?

April 5th, 2019

Q3 Describe the data

Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?

Source of Data: hourly stats for bikes and pedestrians in Seattle

Row Representation: North, South, East, West

Number of observations: half a million

Variables and meaning:

crossing - the street crossing/intersection
direction - North/South/East/West - varies by crossing
data - date of data upload
bike_count - number of bikes each hour
ped_count - number of pdestrians each hour

Q4-Q5 Describe how Dave approached the analysis each step.

Hint: For example, importing data, understanding the data, data exploration, etc.

- Below is the data Dave analyzes in the video

When in the day do we see bikers?

bike_traffic %>%
  group_by(crossing,
           hour = hour(date)) %>%
  summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count)) %>%
  ggplot(aes(hour, pct_bike, color = crossing)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "When in the day do people bike through these Seattle crossings?",
       subtitle = "Based on crossings from 2014-February 2019",
       color = "Crossing",
       x = "Time of day (local time)",
       y = "% of bike crossings that happen in this hour")

bike_by_time_window <- bike_traffic %>%
  mutate(hour = hour(date)) %>%
  mutate(time_window = case_when(
    between(hour, 7, 10) ~ "Morning Commute",
    between(hour, 11, 15) ~ "Midday",
    between(hour, 16, 18) ~ "Evening Commute",
    TRUE ~ "Night"
  )) %>%
  group_by(crossing,
           time_window) %>%
  summarize(number_missing = sum(is.na(bike_count)),
            bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count))
bike_by_time_window %>%
  select(-number_missing, -bike_count) %>%
  spread(time_window, pct_bike) %>%
  mutate(TotalCommute = `Evening Commute` + `Morning Commute`) %>%
  arrange(desc(TotalCommute))
## # A tibble: 7 x 6
## # Groups:   crossing [7]
##   crossing           `Evening Commut… Midday `Morning Commut… Night TotalCommute
##   <chr>                         <dbl>  <dbl>            <dbl> <dbl>        <dbl>
## 1 Elliot Bay Trail              0.329  0.250            0.278 0.143        0.607
## 2 39th Ave NE Green…            0.294  0.248            0.288 0.171        0.581
## 3 Sealth Trail                  0.266  0.307            0.280 0.147        0.546
## 4 MTS Trail                     0.270  0.330            0.271 0.129        0.541
## 5 Burke Gilman Trail            0.271  0.370            0.241 0.117        0.513
## 6 Broadway Cycle Tr…            0.271  0.248            0.213 0.268        0.484
## 7 NW 58th St Greenw…            0.240  0.364            0.202 0.194        0.442
bike_by_time_window %>%
  ggplot(aes(time_window, pct_bike)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ crossing)

bike_by_time_window %>%
  group_by(crossing) %>%
  summarize(total_bikes = sum(bike_count),
            pct_commute = sum(bike_count[str_detect(time_window, "Commute")]) / total_bikes) %>%
  ggplot(aes(total_bikes, pct_commute)) +
  geom_point() +
  scale_x_log10()

bike_traffic %>%
  group_by(crossing,
           weekday = wday(date, label = TRUE),
           hour = hour(date)) %>%
  summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
  group_by(crossing) %>%
  mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
  ggplot(aes(hour, pct_bike, color = crossing)) +
  geom_line(show.legend = FALSE) +
  facet_grid(crossing ~ weekday) +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Time of week",
       y = "% of bike crossings happening in this hour",
       title = "When in the week do people in Seattle bike?",
       subtitle = "Based on crossings from 2014-February 2019")

bike_traffic %>%
  filter(date < "2018-01-01") %>%
  group_by(crossing,
           month = fct_relevel(month.name[month(date)], month.name)) %>%
  summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
  ggplot(aes(month, pct_bike, color = crossing, group = crossing)) +
  geom_line() +
  expand_limits(y = 0) +
  scale_y_continuous(labels = percent_format()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "What time of year do people bike?",
       subtitle = "Based on 2014-2017 bike crossings",
       y = "% of yearly trips in this month",
       x = "")

What directions do people commute by bike?

bike_by_direction_hour_crossing <- bike_traffic %>%
  filter(crossing != "MTS Trail",
         !wday(date, label = TRUE) %in% c("Sat", "Sun"),
         direction %in% c("North", "South")) %>%
  mutate(hour = hour(date)) %>%
  group_by(crossing,
           direction,
           hour) %>%
  summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count))
bike_by_direction_hour_crossing %>%
  group_by(crossing) %>%
  mutate(average_hour = sum((hour * pct_bike)[direction == "North"])) %>%
  ungroup() %>%
  mutate(crossing = fct_reorder(crossing, average_hour)) %>%
  ggplot(aes(hour, pct_bike, color = direction)) +
  geom_line() +
  facet_grid(crossing ~ .) +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Time of day",
       y = "% of bike crossings happening in this hour",
       title = "In which directions do people commute by bike?",
       subtitle = "Based on weekday crossings at six Seattle locations from 2014-February 2019",
       color = "Direction")

Importing Data:

Above is all of the data that Dave analyzes in the bike frequencies video. He talks about where peopl cross interesections, what time they bike on certain streets/trails, how many bikes per hour on certain streets/trails, when in the week people bike in seattle, what time of year do they bike, and what directions do they commute by.

Understanding the Data:

Understanding the data becomes a lot easier when we hit the knit button because the code that Dave comes up with shows the different patterns for bike crossings on certain streets and trails. The data also shows what time of day the bikes are there per hour. The data also shows when in the week is the most popular and what time of year is the most popular. Any of the data that is shown through a graph is a lot easier to understand and Dave does this with all of the data.

Data Exploration:

Q6 Did you see anything in the video that you learned in class? Describe.

The video had a lot of the same set up for data that we use in class such as ggplot, geom_line, and I also saw terms like “summarize” and “group by” which we used often in class to set up data for graphs like line plot, scatterplot, and bar graphs.

Q7 What is a major finding from the analysis.

There are a lot of really amazing findings in this data but what really jumps out at me is the amount of travel that happen only North and South. The travel for these two directions is the most prominent throughout this data set which is really interesting because it has to do with where people are coming from and which direction the heart of the city is, menaing where people work, go to school, go to the store, etc. That just makes the most sense to me on why people would be traveling the most in these two directions.

Q8 What is the most interesting thing you really liked about the analysis.

I really enjoyed watching Dave make the line plot for “When in the day do people bike through these Seattle crossings?” because he set it up so that there was a line plot for every single trail and was able to pull data with a couple different codes. I also enjoy the fact that he color coded every graph so it made it easier to understand. It’s really interesting to look at data on graphs like this because the “flow” of the data is different for each informational piece depending on what kind of graph, table, etc., is used to display the code.

Tidytuesday Screencast