Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
library(tidyverse)
library(lubridate)
library(scales)
theme_set(theme_light())
bike_traffic_raw <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-02/bike_traffic.csv")
bike_traffic <- bike_traffic_raw %>%
mutate(date = mdy_hms(date)) %>%
filter(bike_count < 2000) %>%
select(-ped_count)
bike_traffic %>%
count(crossing, direction)
## # A tibble: 13 x 3
## crossing direction n
## <chr> <chr> <int>
## 1 39th Ave NE Greenway at NE 62nd St North 38660
## 2 39th Ave NE Greenway at NE 62nd St South 38660
## 3 Broadway Cycle Track North Of E Union St North 44565
## 4 Broadway Cycle Track North Of E Union St South 44565
## 5 Burke Gilman Trail North 42905
## 6 Burke Gilman Trail South 42902
## 7 Elliot Bay Trail North 45234
## 8 Elliot Bay Trail South 45234
## 9 MTS Trail East 45565
## 10 NW 58th St Greenway at 22nd Ave East 44342
## 11 NW 58th St Greenway at 22nd Ave West 44342
## 12 Sealth Trail North 16054
## 13 Sealth Trail South 16054
bike_traffic %>%
ggplot(aes(date, fill = is.na(bike_count))) +
geom_histogram() +
facet_grid(crossing ~ direction)
Bike Frequencies in Seattle
April 5th, 2019
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
Source of Data: hourly stats for bikes and pedestrians in Seattle
Row Representation: North, South, East, West
Number of observations: half a million
Variables and meaning:
Hint: For example, importing data, understanding the data, data exploration, etc.
bike_traffic %>%
group_by(crossing,
hour = hour(date)) %>%
summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
mutate(pct_bike = bike_count / sum(bike_count)) %>%
ggplot(aes(hour, pct_bike, color = crossing)) +
geom_line() +
geom_point() +
scale_y_continuous(labels = percent_format()) +
labs(title = "When in the day do people bike through these Seattle crossings?",
subtitle = "Based on crossings from 2014-February 2019",
color = "Crossing",
x = "Time of day (local time)",
y = "% of bike crossings that happen in this hour")
bike_by_time_window <- bike_traffic %>%
mutate(hour = hour(date)) %>%
mutate(time_window = case_when(
between(hour, 7, 10) ~ "Morning Commute",
between(hour, 11, 15) ~ "Midday",
between(hour, 16, 18) ~ "Evening Commute",
TRUE ~ "Night"
)) %>%
group_by(crossing,
time_window) %>%
summarize(number_missing = sum(is.na(bike_count)),
bike_count = sum(bike_count, na.rm = TRUE)) %>%
mutate(pct_bike = bike_count / sum(bike_count))
bike_by_time_window %>%
select(-number_missing, -bike_count) %>%
spread(time_window, pct_bike) %>%
mutate(TotalCommute = `Evening Commute` + `Morning Commute`) %>%
arrange(desc(TotalCommute))
## # A tibble: 7 x 6
## # Groups: crossing [7]
## crossing `Evening Commut… Midday `Morning Commut… Night TotalCommute
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Elliot Bay Trail 0.329 0.250 0.278 0.143 0.607
## 2 39th Ave NE Green… 0.294 0.248 0.288 0.171 0.581
## 3 Sealth Trail 0.266 0.307 0.280 0.147 0.546
## 4 MTS Trail 0.270 0.330 0.271 0.129 0.541
## 5 Burke Gilman Trail 0.271 0.370 0.241 0.117 0.513
## 6 Broadway Cycle Tr… 0.271 0.248 0.213 0.268 0.484
## 7 NW 58th St Greenw… 0.240 0.364 0.202 0.194 0.442
bike_by_time_window %>%
ggplot(aes(time_window, pct_bike)) +
geom_col() +
coord_flip() +
facet_wrap(~ crossing)
bike_by_time_window %>%
group_by(crossing) %>%
summarize(total_bikes = sum(bike_count),
pct_commute = sum(bike_count[str_detect(time_window, "Commute")]) / total_bikes) %>%
ggplot(aes(total_bikes, pct_commute)) +
geom_point() +
scale_x_log10()
bike_traffic %>%
group_by(crossing,
weekday = wday(date, label = TRUE),
hour = hour(date)) %>%
summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
group_by(crossing) %>%
mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
ggplot(aes(hour, pct_bike, color = crossing)) +
geom_line(show.legend = FALSE) +
facet_grid(crossing ~ weekday) +
scale_y_continuous(labels = percent_format()) +
labs(x = "Time of week",
y = "% of bike crossings happening in this hour",
title = "When in the week do people in Seattle bike?",
subtitle = "Based on crossings from 2014-February 2019")
bike_traffic %>%
filter(date < "2018-01-01") %>%
group_by(crossing,
month = fct_relevel(month.name[month(date)], month.name)) %>%
summarize(total_bikes = sum(bike_count, na.rm = TRUE)) %>%
mutate(pct_bike = total_bikes / sum(total_bikes)) %>%
ggplot(aes(month, pct_bike, color = crossing, group = crossing)) +
geom_line() +
expand_limits(y = 0) +
scale_y_continuous(labels = percent_format()) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "What time of year do people bike?",
subtitle = "Based on 2014-2017 bike crossings",
y = "% of yearly trips in this month",
x = "")
bike_by_direction_hour_crossing <- bike_traffic %>%
filter(crossing != "MTS Trail",
!wday(date, label = TRUE) %in% c("Sat", "Sun"),
direction %in% c("North", "South")) %>%
mutate(hour = hour(date)) %>%
group_by(crossing,
direction,
hour) %>%
summarize(bike_count = sum(bike_count, na.rm = TRUE)) %>%
mutate(pct_bike = bike_count / sum(bike_count))
bike_by_direction_hour_crossing %>%
group_by(crossing) %>%
mutate(average_hour = sum((hour * pct_bike)[direction == "North"])) %>%
ungroup() %>%
mutate(crossing = fct_reorder(crossing, average_hour)) %>%
ggplot(aes(hour, pct_bike, color = direction)) +
geom_line() +
facet_grid(crossing ~ .) +
scale_y_continuous(labels = percent_format()) +
labs(x = "Time of day",
y = "% of bike crossings happening in this hour",
title = "In which directions do people commute by bike?",
subtitle = "Based on weekday crossings at six Seattle locations from 2014-February 2019",
color = "Direction")
Most popular Seattle crossing and time:
Most popular reason people are traveling at certain times:
-The data also shows that most often people use the trails for a morning and evening commute from work to home during the week.
Most popular time of the week for bike travel:
Most popular time of year to bike:
Most popular directions of travel on each trail:
There are a lot of really amazing findings in this data but what really jumps out at me is the amount of travel that happen only North and South. The travel for these two directions is the most prominent throughout this data set which is really interesting because it has to do with where people are coming from and which direction the heart of the city is, menaing where people work, go to school, go to the store, etc. That just makes the most sense to me on why people would be traveling the most in these two directions.
I really enjoyed watching Dave make the line plot for “When in the day do people bike through these Seattle crossings?” because he set it up so that there was a line plot for every single trail and was able to pull data with a couple different codes. I also enjoy the fact that he color coded every graph so it made it easier to understand. It’s really interesting to look at data on graphs like this because the “flow” of the data is different for each informational piece depending on what kind of graph, table, etc., is used to display the code.