Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

library(tidyverse)
library(lubridate)
library(scales)
theme_set(theme_light())
bike_traffic_raw <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-02/bike_traffic.csv")
bike_traffic <- bike_traffic_raw %>%
  mutate(date = mdy_hms(date)) %>%
  filter(bike_count < 2000) %>%
  select(-ped_count)
bike_traffic %>%
  count(crossing, direction)
## # A tibble: 13 x 3
##    crossing                                 direction     n
##    <chr>                                    <chr>     <int>
##  1 39th Ave NE Greenway at NE 62nd St       North     38660
##  2 39th Ave NE Greenway at NE 62nd St       South     38660
##  3 Broadway Cycle Track North Of E Union St North     44565
##  4 Broadway Cycle Track North Of E Union St South     44565
##  5 Burke Gilman Trail                       North     42905
##  6 Burke Gilman Trail                       South     42902
##  7 Elliot Bay Trail                         North     45234
##  8 Elliot Bay Trail                         South     45234
##  9 MTS Trail                                East      45565
## 10 NW 58th St Greenway at 22nd Ave          East      44342
## 11 NW 58th St Greenway at 22nd Ave          West      44342
## 12 Sealth Trail                             North     16054
## 13 Sealth Trail                             South     16054
bike_traffic %>%
  ggplot(aes(date, fill = is.na(bike_count))) +
  geom_histogram() +
  facet_grid(crossing ~ direction)

Q1 What is the title of the screencast?

Bike Frequencies in Seattle

Q2 When was it published?

April 5th, 2019

Q3 Describe the data

Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?

Source of Data: hourly stats for bikes and pedestrians in Seattle

Row Representation: North, South, East, West

Number of observations: half a million

Variables and meaning:

Q4-Q5 Describe how Dave approached the analysis each step.

Hint: For example, importing data, understanding the data, data exploration, etc.

- Below is the data Dave analyzes in the video

When in the day do we see bikers?

bike_traffic %>%
  group_by(crossing,
           hour = hour(date)) %>%
  summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count)) %>%
  ggplot(aes(hour, pct_bike, color = crossing)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "When in the day do people bike through these Seattle crossings?",
       subtitle = "Based on crossings from 2014-February 2019",
       color = "Crossing",
       x = "Time of day (local time)",
       y = "% of bike crossings that happen in this hour")

bike_by_time_window <- bike_traffic %>%
  mutate(hour = hour(date)) %>%
  mutate(time_window = case_when(
    between(hour, 7, 10) ~ "Morning Commute",
    between(hour, 11, 15) ~ "Midday",
    between(hour, 16, 18) ~ "Evening Commute",
    TRUE ~ "Night"
  )) %>%
  group_by(crossing,
           time_window) %>%
  summarize(number_missing = sum(is.na(bike_count)),
            bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count))
bike_by_time_window %>%
  select(-number_missing, -bike_count) %>%
  spread(time_window, pct_bike) %>%
  mutate(TotalCommute = `Evening Commute` + `Morning Commute`) %>%
  arrange(desc(TotalCommute))
## # A tibble: 7 x 6
## # Groups:   crossing [7]
##   crossing           `Evening Commut… Midday `Morning Commut… Night TotalCommute
##   <chr>                         <dbl>  <dbl>            <dbl> <dbl>        <dbl>
## 1 Elliot Bay Trail              0.329  0.250            0.278 0.143        0.607
## 2 39th Ave NE Green…            0.294  0.248            0.288 0.171        0.581
## 3 Sealth Trail                  0.266  0.307            0.280 0.147        0.546
## 4 MTS Trail                     0.270  0.330            0.271 0.129        0.541
## 5 Burke Gilman Trail            0.271  0.370            0.241 0.117        0.513
## 6 Broadway Cycle Tr…            0.271  0.248            0.213 0.268        0.484
## 7 NW 58th St Greenw…            0.240  0.364            0.202 0.194        0.442
bike_by_time_window %>%
  ggplot(aes(time_window, pct_bike)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ crossing)

bike_by_time_window %>%
  group_by(crossing) %>%
  summarize(total_bikes = sum(bike_count),
            pct_commute = sum(bike_count[str_detect(time_window, "Commute")]) / total_bikes) %>%
  ggplot(aes(total_bikes, pct_commute)) +
  geom_point() +
  scale_x_log10()

bike_traffic %>%
  group_by(crossing,
           weekday = wday(date, label = TRUE),
           hour = hour(date)) %>%
  summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
  group_by(crossing) %>%
  mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
  ggplot(aes(hour, pct_bike, color = crossing)) +
  geom_line(show.legend = FALSE) +
  facet_grid(crossing ~ weekday) +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Time of week",
       y = "% of bike crossings happening in this hour",
       title = "When in the week do people in Seattle bike?",
       subtitle = "Based on crossings from 2014-February 2019")

bike_traffic %>%
  filter(date < "2018-01-01") %>%
  group_by(crossing,
           month = fct_relevel(month.name[month(date)], month.name)) %>%
  summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
  ggplot(aes(month, pct_bike, color = crossing, group = crossing)) +
  geom_line() +
  expand_limits(y = 0) +
  scale_y_continuous(labels = percent_format()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "What time of year do people bike?",
       subtitle = "Based on 2014-2017 bike crossings",
       y = "% of yearly trips in this month",
       x = "")

What directions do people commute by bike?

bike_by_direction_hour_crossing <- bike_traffic %>%
  filter(crossing != "MTS Trail",
         !wday(date, label = TRUE) %in% c("Sat", "Sun"),
         direction %in% c("North", "South")) %>%
  mutate(hour = hour(date)) %>%
  group_by(crossing,
           direction,
           hour) %>%
  summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
  mutate(pct_bike = bike_count / sum(bike_count))
bike_by_direction_hour_crossing %>%
  group_by(crossing) %>%
  mutate(average_hour = sum((hour * pct_bike)[direction == "North"])) %>%
  ungroup() %>%
  mutate(crossing = fct_reorder(crossing, average_hour)) %>%
  ggplot(aes(hour, pct_bike, color = direction)) +
  geom_line() +
  facet_grid(crossing ~ .) +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Time of day",
       y = "% of bike crossings happening in this hour",
       title = "In which directions do people commute by bike?",
       subtitle = "Based on weekday crossings at six Seattle locations from 2014-February 2019",
       color = "Direction")

Importing Data:

  • Above is all of the data that Dave analyzes in the bike frequencies video. He talks about where peopl cross interesections, what time they bike on certain streets/trails, how many bikes per hour on certain streets/trails, when in the week people bike in seattle, what time of year do they bike, and what directions do they commute by.

Understanding the Data:

  • Understanding the data becomes a lot easier when we hit the knit button because the code that Dave comes up with shows the different patterns for bike crossings on certain streets and trails. The data also shows what time of day the bikes are there per hour. The data also shows when in the week is the most popular and what time of year is the most popular. Any of the data that is shown through a graph is a lot easier to understand and Dave does this with all of the data.

Data Exploration:

Most popular Seattle crossing and time:

  • The data shows that at roughly 8 am and 4 pm traffic is the highest on all trails, Elliot Bay Trail having the highest amount of traffic during these times.

Most popular reason people are traveling at certain times:

-The data also shows that most often people use the trails for a morning and evening commute from work to home during the week.

Most popular time of the week for bike travel:

  • Bike travel is most popular Monday through Friday which makes sense for people that are using their bikes for commuting to and from work.

Most popular time of year to bike:

  • The window of time of April to September is when travel time for the year is above 10% on bicycles in Seattle.

Most popular directions of travel on each trail:

  • North travel in the morning is most popular on Elliot Bay Trail and Sealth Trail whereas North travel is most popular on Track North and Greenway trails in the afternoon. South travel is most popular on Elliot Bay Trail and Sealth trail in the afternoon. Gilman Trail has the most travel South in the morning compared to North travel and Greenway trail has the most travel South in the morning.

Q6 Did you see anything in the video that you learned in class? Describe.

  • The video had a lot of the same set up for data that we use in class such as ggplot, geom_line, and I also saw terms like “summarize” and “group by” which we used often in class to set up data for graphs like line plot, scatterplot, and bar graphs.

Q7 What is a major finding from the analysis.

There are a lot of really amazing findings in this data but what really jumps out at me is the amount of travel that happen only North and South. The travel for these two directions is the most prominent throughout this data set which is really interesting because it has to do with where people are coming from and which direction the heart of the city is, menaing where people work, go to school, go to the store, etc. That just makes the most sense to me on why people would be traveling the most in these two directions.

Q8 What is the most interesting thing you really liked about the analysis.

I really enjoyed watching Dave make the line plot for “When in the day do people bike through these Seattle crossings?” because he set it up so that there was a line plot for every single trail and was able to pull data with a couple different codes. I also enjoy the fact that he color coded every graph so it made it easier to understand. It’s really interesting to look at data on graphs like this because the “flow” of the data is different for each informational piece depending on what kind of graph, table, etc., is used to display the code.

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.