Source file ⇒ Assignment_7.Rmd
In this assignment, you’ll examine some factors that may influence the use of bicycles in a bike-rental program. The data come from Washington, DC, and cover the last quarter of 2014.
Two data tables are available:
- Stations gives the locations of the bike-rental stations.
- Trips contains records of individual rentals.

You can access the data like this:
head(Stations)
## name lat long nbBikes
## 1 20th & Bell St 38.85610 -77.05120 7
## 2 18th & Eads St. 38.85725 -77.05332 8
## 3 20th & Crystal Dr 38.85640 -77.04920 8
## 4 15th & Crystal Dr 38.86017 -77.04959 9
## 5 Aurora Hills Community Ctr/18th & Hayes St 38.85787 -77.05949 7
## 6 Pentagon City Metro / 12th & S Hayes St 38.86230 -77.05994 7
## nbEmptyDocks
## 1 4
## 2 3
## 3 7
## 4 2
## 5 4
## 6 12
head(data_site)
## [1] "http://tiny.cc/dcf/2014-Q4-Trips-History-Data-Small.rds"
head(Trips)
## duration sdate sstation
## 344758 0h 9m 15s 2014-11-06 16:26:00 15th & L St NW
## 113251 0h 47m 21s 2014-10-12 11:30:00 3rd & D St SE
## 633756 2h 46m 22s 2014-12-27 14:24:00 10th & E St NW
## 466862 0h 15m 15s 2014-11-23 16:42:00 4th & M St SW
## 474332 0h 18m 33s 2014-11-24 17:29:00 1st & Washington Hospital Center NW
## 581597 0h 2m 36s 2014-12-15 13:11:00 11th & Kenyon St NW
## edate estation bikeno
## 344758 2014-11-06 16:35:00 15th & L St NW W00169
## 113251 2014-10-12 12:17:00 Jefferson Dr & 14th St SW W01482
## 633756 2014-12-27 17:10:00 10th & E St NW W21346
## 466862 2014-11-23 16:57:00 5th & K St NW W00647
## 474332 2014-11-24 17:47:00 Columbus Circle / Union Station W21580
## 581597 2014-12-15 13:14:00 Park Rd & Holmead Pl NW W21286
## client
## 344758 Registered
## 113251 Registered
## 633756 Casual
## 466862 Casual
## 474332 Registered
## 581597 Registered
The Trips data table is a random subset of 10,000 trips from the full quarterly data. Start with this small data table to develop your analysis commands. When they work well, you can access the full data set of more than 600,000 events by removing -Small from the file name stored in data_site.
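Switching to the full data only changes the file name. A minimal sketch, assuming data_site holds the URL printed by head(data_site) above; full_site is a hypothetical name:

```r
# URL of the 10,000-trip sample, as printed by head(data_site) above
data_site <- "http://tiny.cc/dcf/2014-Q4-Trips-History-Data-Small.rds"

# Remove the literal text "-Small" to point at the full quarterly file
full_site <- sub("-Small", "", data_site, fixed = TRUE)
print(full_site)
```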
It’s natural to expect that bikes are rented more at some times of day than others. The variable sdate gives the time (including the date) that the rental started.
Make these plots and interpret them:
library(dplyr)    # %>% and mutate()
library(ggplot2)
Trips %>%
  ggplot(aes(x = sdate)) +
  geom_density()
This density plot shows how frequently trips started on each date, relative to the quarter as a whole (the total area under the curve is 1).
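To see what the peaks and dips of the density correspond to, it can help to count the trips on each calendar day. A minimal base-R sketch; the three timestamps are toy values standing in for Trips$sdate (the UTC time zone is an assumption), and with the real data the same idea is table(as.Date(Trips$sdate)):

```r
# Toy start times standing in for Trips$sdate
sdate <- as.POSIXct(c("2014-11-06 16:26:00", "2014-11-06 08:10:00",
                      "2014-11-07 09:00:00"), tz = "UTC")

# as.Date() drops the clock time, so table() counts trips per calendar day
trips_per_day <- table(as.Date(sdate, tz = "UTC"))
print(trips_per_day)
```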
Plot the density of events versus time of day. You can use lubridate::hour() and lubridate::minute() to extract the hour of the day and the minute within the hour from sdate, e.g.
Trips2 <- Trips %>%
  # fractional hour, e.g. 16:26 becomes 16 + 26/60
  mutate(time_of_day = lubridate::hour(sdate) + lubridate::minute(sdate) / 60)
Trips2 %>%
  ggplot(aes(x = time_of_day)) +
  geom_density()
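The formula hour + minute / 60 turns a clock time into a fractional hour, which puts the x axis on a 0 to 24 scale. A small check with lubridate on two start times copied from head(Trips), assumed here to be in UTC:

```r
library(lubridate)

# Two start times copied from head(Trips); the time zone is an assumption
sdate <- as.POSIXct(c("2014-11-06 16:26:00", "2014-10-12 11:30:00"), tz = "UTC")

# 16:26 -> 16 + 26/60, 11:30 -> 11 + 30/60
time_of_day <- hour(sdate) + minute(sdate) / 60
print(round(time_of_day, 3))
```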
Facet the time-of-day plot by day of the week. (Use lubridate::wday() to generate the day of the week.)
Trips3 <- Trips %>%
  mutate(time_of_day = lubridate::hour(sdate) + lubridate::minute(sdate) / 60,
         day_of_week = lubridate::wday(sdate))   # 1 = Sunday, ..., 7 = Saturday
Trips3 %>%
  ggplot(aes(x = time_of_day)) +
  geom_density() +
  facet_wrap(~day_of_week)
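With its default week start, lubridate::wday() returns 1 for Sunday through 7 for Saturday, so the facets are labeled 1 to 7. Passing label = TRUE returns an ordered factor of day names instead, which gives more readable facet titles (the abbreviations depend on the locale). A check on three known dates:

```r
library(lubridate)

# 2014-10-12 was a Sunday, 2014-10-13 a Monday, 2014-10-18 a Saturday
d <- as.POSIXct(c("2014-10-12 11:30:00", "2014-10-13 09:00:00",
                  "2014-10-18 12:00:00"), tz = "UTC")

day_num <- wday(d)                 # numbers: Sunday = 1 ... Saturday = 7
day_lab <- wday(d, label = TRUE)   # ordered factor of day-name abbreviations
print(day_num)
print(day_lab)
```

In the plotting code, day_of_week = lubridate::wday(sdate, label = TRUE) would therefore give facet titles such as Sun and Mon in an English locale.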
Map the fill aesthetic of geom_density() to the client variable. You may also want to set alpha for transparency and color = NA to suppress the outline of the density curve.
Trips4 <- Trips %>%
  mutate(time_of_day = lubridate::hour(sdate) + lubridate::minute(sdate) / 60,
         day_of_week = lubridate::wday(sdate))
Trips4 %>%
  ggplot(aes(x = time_of_day)) +
  # alpha and color are set outside aes() so they act as fixed values, not mappings
  geom_density(aes(fill = client), alpha = 0.2, color = NA) +
  facet_wrap(~day_of_week)
NOTE: client describes whether the renter is a regular user (level Registered) or has not joined the bike-rental organization (level Casual).
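Writing alpha = 0.2 inside aes() maps a constant as if it were a data variable: ggplot2 then builds an alpha scale (and a legend) and rescales the value, so the fill is not drawn at 0.2. Setting alpha as an argument of the geom uses the value as given. A sketch on made-up data that compares the alpha values actually used, via ggplot2::layer_data():

```r
library(ggplot2)

# Hypothetical toy data: 200 start times (fractional hours) split by client
set.seed(42)
toy <- data.frame(
  time_of_day = runif(200, min = 6, max = 22),
  client = rep(c("Registered", "Casual"), each = 100)
)

# alpha mapped inside aes(): treated as a variable and passed through a scale
p_mapped <- ggplot(toy, aes(x = time_of_day, alpha = 0.2)) +
  geom_density(aes(fill = client))

# alpha set as a parameter: applied as a fixed value; color = NA drops the outline
p_set <- ggplot(toy, aes(x = time_of_day)) +
  geom_density(aes(fill = client), alpha = 0.2, color = NA)

alpha_mapped <- unique(layer_data(p_mapped)$alpha)
alpha_set <- unique(layer_data(p_set)$alpha)
print(alpha_mapped)
print(alpha_set)
```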
Repeat the previous plot, but stack the client densities by giving geom_density() the argument position = position_stack().
Trips5 <- Trips %>%
  mutate(time_of_day = lubridate::hour(sdate) + lubridate::minute(sdate) / 60,
         day_of_week = lubridate::wday(sdate))
Trips5 %>%
  ggplot(aes(x = time_of_day)) +
  geom_density(aes(fill = client), position = position_stack(), alpha = 0.5, color = NA) +
  facet_wrap(~day_of_week)
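With position_stack(), each client's curve is still a density that integrates to 1, so the stacked layers compare the shapes of the two distributions, not how many trips each group made. If the layer heights should reflect trip counts, ggplot2 (3.3.0 or later) can plot after_stat(count), which stat_density computes as the density multiplied by the number of rows in the group. A sketch on made-up data:

```r
library(ggplot2)

# Hypothetical toy data: 300 registered and 100 casual rentals with
# made-up start times (fractional hours)
set.seed(1)
toy <- data.frame(
  time_of_day = c(rnorm(300, mean = 8.5, sd = 1.5), rnorm(100, mean = 14, sd = 2)),
  client = rep(c("Registered", "Casual"), times = c(300, 100))
)

# y = after_stat(count) scales each group's density by its size,
# so the stacked layers are proportional to the number of trips
p <- ggplot(toy, aes(x = time_of_day, y = after_stat(count), fill = client)) +
  geom_density(position = "stack", alpha = 0.5, color = NA)

built <- layer_data(p)
print(head(built[, c("x", "density", "count")]))
```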
Trips6 <- Trips %>%
  mutate(time_of_day = lubridate::hour(sdate) + lubridate::minute(sdate) / 60) %>%
  # weekend = Sunday (1) or Saturday (7) in lubridate's default numbering
  mutate(wday = ifelse(lubridate::wday(sdate) %in% c(1, 7), "weekend", "weekday"))
Trips6 %>%
  ggplot(aes(x = time_of_day)) +
  geom_density(aes(fill = client), position = position_stack(), alpha = 0.1, color = NA) +
  facet_wrap(~wday)
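The weekend rule relies on lubridate's default numbering (1 = Sunday, 7 = Saturday). As an independent check, base R's POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday). A minimal sketch with a hypothetical helper day_type(), applied to three known dates:

```r
# Hypothetical helper: classify timestamps as weekday or weekend using base R only
day_type <- function(x) {
  wd <- as.POSIXlt(x)$wday                  # 0 = Sunday, ..., 6 = Saturday
  ifelse(wd %in% c(0, 6), "weekend", "weekday")
}

# 2014-10-11 was a Saturday, 2014-10-12 a Sunday, 2014-10-13 a Monday
d <- as.POSIXct(c("2014-10-11 10:00:00", "2014-10-12 10:00:00",
                  "2014-10-13 10:00:00"), tz = "UTC")
print(day_type(d))
```

On the real data, table(day_type(Trips6$sdate), Trips6$wday) could be used to cross-check the two classifications.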
How does the start-to-end trip distance depend on time of day, day of the week, and client?
To answer this, you first need to compute the distance covered by each trip. As a start, compute a table of the distance between every pair of stations (sstation, estation, dist) from the Stations data.
How to do this?
1. Make two copies of Stations, which we’ll call Left and Right. Left will have names sstation, lat, and long. Right will have names estation, lat2, and long2. The other variables, nbBikes and nbEmptyDocks, should be dropped. Use dplyr::rename() to rename name, lat, and long (e.g. dplyr::rename(sstation = name)).
2. Join Left and Right so that every case in Left is matched to every case in Right. Strictly, this is a cross join (Cartesian product) rather than a full outer join; because the two tables share no column names, left %>% merge(right, all = TRUE) produces it.
3. With the latitude and longitude of both stations in each row, you have enough information to calculate the distance between them. This calculation is provided by the haversine() function, which you can load with source("http://tiny.cc/dcf/haversine.R").
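Because the renamed copies share no column names, merge() has nothing to match on and returns every pairing of rows, so the result has nrow(Left) * nrow(Right) rows. A check with three stations whose names and coordinates come from head(Stations); left_toy and right_toy are hypothetical names:

```r
# Three stations taken from head(Stations)
left_toy <- data.frame(
  sstation = c("20th & Bell St", "18th & Eads St.", "20th & Crystal Dr"),
  lat = c(38.85610, 38.85725, 38.85640),
  long = c(-77.05120, -77.05332, -77.04920)
)
# Same stations with the renamed columns used for trip end points
right_toy <- setNames(left_toy, c("estation", "lat2", "long2"))

# No common columns, so merge() returns the Cartesian product
station_pairs <- merge(left_toy, right_toy, all = TRUE)
print(dim(station_pairs))
```

In dplyr 1.1.0 and later, dplyr::cross_join() expresses the same operation explicitly.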
left <- mosaic::read.file("http://tiny.cc/dcf/DC-Stations.csv")
## Reading data with read.csv()
right <- mosaic::read.file("http://tiny.cc/dcf/DC-Stations.csv")
## Reading data with read.csv()
left1 <- left %>%
  select(name, lat, long) %>%          # drop nbBikes and nbEmptyDocks
  dplyr::rename(sstation = name)
head(left1)
## sstation lat long
## 1 20th & Bell St 38.85610 -77.05120
## 2 18th & Eads St. 38.85725 -77.05332
## 3 20th & Crystal Dr 38.85640 -77.04920
## 4 15th & Crystal Dr 38.86017 -77.04959
## 5 Aurora Hills Community Ctr/18th & Hayes St 38.85787 -77.05949
## 6 Pentagon City Metro / 12th & S Hayes St 38.86230 -77.05994
right2 <- right %>%
  select(name, lat, long) %>%          # drop nbBikes and nbEmptyDocks
  dplyr::rename(estation = name, lat2 = lat, long2 = long)
head(right2)
## estation lat2 long2
## 1 20th & Bell St 38.85610 -77.05120
## 2 18th & Eads St. 38.85725 -77.05332
## 3 20th & Crystal Dr 38.85640 -77.04920
## 4 15th & Crystal Dr 38.86017 -77.04959
## 5 Aurora Hills Community Ctr/18th & Hayes St 38.85787 -77.05949
## 6 Pentagon City Metro / 12th & S Hayes St 38.86230 -77.05994
new <- left1 %>%
  merge(right2, all = TRUE)   # no shared column names, so this is a cross join
source("http://tiny.cc/dcf/haversine.R")   # defines haversine()
Stations2 <- new %>%
  mutate(dist = haversine(lat, long, lat2, long2)) %>%   # great-circle distance (km, judging by the values)
  select(sstation, estation, dist)
head(Stations2)
## sstation estation dist
## 1 20th & Bell St 20th & Bell St 0.0000000
## 2 18th & Eads St. 20th & Bell St 0.2237177
## 3 20th & Crystal Dr 20th & Bell St 0.1763635
## 4 15th & Crystal Dr 20th & Bell St 0.4734716
## 5 Aurora Hills Community Ctr/18th & Hayes St 20th & Bell St 0.7441989
## 6 Pentagon City Metro / 12th & S Hayes St 20th & Bell St 1.0236764
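The haversine formula gives the great-circle distance between two points on a sphere of radius R: d = 2R asin(sqrt(sin^2(dlat/2) + cos(lat1) cos(lat2) sin^2(dlong/2))), with angles in radians. The code of the DCF haversine() is not shown here, so the following is an independent sketch with a hypothetical function haversine_km(), assuming an Earth radius of 6371 km. With that radius it agrees with the dist values above to at least three decimal places, which suggests dist is in kilometers.

```r
# Hypothetical helper: great-circle distance via the haversine formula.
# Inputs are in decimal degrees; the default radius 6371 km is an assumption.
haversine_km <- function(lat1, long1, lat2, long2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  # pmin() guards against rounding pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# 18th & Eads St. and 20th & Crystal Dr, each to 20th & Bell St (head(Stations))
d <- haversine_km(c(38.85725, 38.85640), c(-77.05332, -77.04920),
                  38.85610, -77.05120)
print(round(d, 4))
```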
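Trips$hours <- lubridate::hour(Trips$sdate)
# all = TRUE keeps unmatched rows from both tables and fills the gaps with NA
finaltable <- Stations2 %>%
  merge(Trips, all = TRUE)
# keep only rows without NA: trips whose start and end stations both appear in Stations2
# (this also drops any trip with a missing value in another column)
ff <- finaltable[complete.cases(finaltable), ]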
head(ff)
## sstation estation dist duration
## 1 10th & E St NW 10th & E St NW 0.0000000 2h 46m 22s
## 2 10th & E St NW 10th & E St NW 0.0000000 0h 43m 10s
## 5 10th & E St NW 10th & U St NW 2.3669377 0h 14m 19s
## 6 10th & E St NW 10th St & Constitution Ave NW 0.3209389 1h 41m 53s
## 7 10th & E St NW 10th St & Constitution Ave NW 0.3209389 0h 3m 24s
## 8 10th & E St NW 10th St & Constitution Ave NW 0.3209389 0h 23m 46s
## sdate edate bikeno client hours
## 1 2014-12-27 14:24:00 2014-12-27 17:10:00 W21346 Casual 14
## 2 2014-10-31 18:57:00 2014-10-31 19:40:00 W01048 Casual 18
## 5 2014-10-12 13:57:00 2014-10-12 14:11:00 W20237 Registered 13
## 6 2014-10-19 09:34:00 2014-10-19 11:16:00 W01330 Casual 9
## 7 2014-10-04 15:49:00 2014-10-04 15:55:00 W21458 Casual 15
## 8 2014-11-23 11:34:00 2014-11-23 11:58:00 W21957 Casual 11
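Merging with all = TRUE and then dropping incomplete rows keeps the same trips as a plain inner join, provided the trip columns themselves have no missing values. A toy check with hypothetical stations A and B and one trip from a station C that has no distance entry:

```r
# Hypothetical distance table for two stations
stations2 <- data.frame(sstation = c("A", "A", "B", "B"),
                        estation = c("A", "B", "A", "B"),
                        dist = c(0, 0.22, 0.22, 0))
# Hypothetical trips; the last one starts at station C, which has no distance
trips <- data.frame(sstation = c("A", "B", "C"),
                    estation = c("B", "B", "A"),
                    client = c("Casual", "Registered", "Registered"))

# Full outer join, then drop rows containing any NA (as in the code above)
full <- merge(stations2, trips, all = TRUE)
via_complete <- full[complete.cases(full), ]

# Inner join: merge()'s default all = FALSE keeps only matched rows
inner <- merge(stations2, trips)

print(nrow(full))
print(inner)
```

For the real data, dplyr::inner_join(Trips, Stations2, by = c("sstation", "estation")) expresses the same step directly.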