The data was scraped from Citi Bike’s station map, available at https://member.citibikenyc.com/map/. I trimmed the data to include only the station name, available docks, and total docks, dropping the id, longitude, latitude, address, and other fields.
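library(jsonlite)  # fromJSON(); assumed here because the result is used as a data frame
library(dplyr)     # select(), slice_max(), slice_min()
citi_raw.data <- fromJSON("https://feeds.citibikenyc.com/stations/stations.json")
citi <- citi_raw.data[['stationBeanList']]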
The stations identified below have the highest number of bikes in use, measured as total docks minus available docks. Citi Bike management may want to consider adding bikes at these locations.
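# Keep only the three columns used in the analysis
citi_trim <- select(citi, stationName, availableDocks, totalDocks)
# Top five stations by bikes in use; ties are kept, so more than five rows can come back
citi_pop <- slice_max(citi_trim, totalDocks - availableDocks, n = 5)
knitr::kable(citi_pop, digits = 2)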
| stationName | availableDocks | totalDocks |
|---|---|---|
| Dean St & Hoyt St | 0 | 77 |
| W 15 St & 7 Ave | 4 | 79 |
| W Broadway & Spring St | 7 | 79 |
| E 15 St & 3 Ave | 6 | 77 |
| DeKalb Ave & Hudson Ave | 5 | 74 |
| W 41 St & 8 Ave | 2 | 71 |
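The table does not show the value the stations are ranked by, so a small sketch may help. It rebuilds the rows above by hand (rather than calling the live feed) and adds the ranking metric as its own column. The data frame name `top_stations` and the column name `bikesInUse` are made up for this illustration and are not part of the original analysis.

```r
library(dplyr)

# Rows copied from the table above; bikesInUse is a hypothetical column name.
top_stations <- data.frame(
  stationName = c("Dean St & Hoyt St", "W 15 St & 7 Ave", "W Broadway & Spring St",
                  "E 15 St & 3 Ave", "DeKalb Ave & Hudson Ave", "W 41 St & 8 Ave"),
  availableDocks = c(0, 4, 7, 6, 5, 2),
  totalDocks = c(77, 79, 79, 77, 74, 71),
  stringsAsFactors = FALSE
)

# Make the ranking metric explicit and sort by it
top_ranked <- top_stations %>%
  mutate(bikesInUse = totalDocks - availableDocks) %>%
  arrange(desc(bikesInUse))

# slice_max() keeps ties by default, which is why n = 5 can return six rows
top_five <- slice_max(top_stations, totalDocks - availableDocks, n = 5)
```

The last two stations tie on the metric, which would explain why the table has six rows even though `n = 5` was requested. Passing `with_ties = FALSE` would cut the result to exactly five rows.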
library(ggplot2)
# Bar chart of bikes in use at the busiest stations
ggplot(citi_pop, aes(y = totalDocks - availableDocks, x = stationName)) +
  geom_bar(position = "identity", stat = "identity", color = "red", fill = "darkblue") +
  theme(axis.text.x = element_text(angle = 45)) +
  ylab("Total bikes in use") +
  xlab("Station name")
The stations identified below have the lowest number of bikes in use and, among those, the highest total dock counts. Citi Bike management may want to consider driving demand at these locations, or removing bikes or stations from them, after reviewing data over a longer, representative period.
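# Keep every station tied for the fewest bikes in use (n = 1 keeps all ties)
citi_notpopall <- slice_min(citi_trim, totalDocks - availableDocks, n = 1)
# Of those, take the five with the most docks
citi_notpop <- slice_max(citi_notpopall, totalDocks, n = 5)
knitr::kable(citi_notpop, digits = 2)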
| stationName | availableDocks | totalDocks |
|---|---|---|
| W 44 St & 11 Ave | 79 | 79 |
| E 97 St & Madison Ave | 41 | 41 |
| Broadway & Hancock St | 35 | 35 |
| Riverside Dr & W 145 St | 35 | 35 |
| Wadsworth Ave & W 179 St | 35 | 35 |
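The selection above happens in two steps, and a small sketch may make that clearer. It uses a handful of rows copied from the two tables in this report, not the full feed, and the names `stations`, `idle`, and `largest_idle` are made up for this illustration.

```r
library(dplyr)

# Rows copied from the two tables in this report (not the full feed).
stations <- data.frame(
  stationName = c("Dean St & Hoyt St", "W 15 St & 7 Ave", "W 44 St & 11 Ave",
                  "E 97 St & Madison Ave", "Broadway & Hancock St"),
  availableDocks = c(0, 4, 79, 41, 35),
  totalDocks = c(77, 79, 79, 41, 35),
  stringsAsFactors = FALSE
)

# Step 1: n = 1 with the default with_ties = TRUE keeps every station
# tied for the minimum number of bikes in use
idle <- slice_min(stations, totalDocks - availableDocks, n = 1)

# Step 2: among those stations, keep the two with the most docks
largest_idle <- slice_max(idle, totalDocks, n = 2)
```

Here `idle` holds the three stations where every dock is available, and `largest_idle` narrows them to the two largest. The report does the same thing with `n = 5` on the full station list.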
# Bar chart of dock counts at the idle stations
ggplot(citi_notpop, aes(y = totalDocks, x = stationName)) +
  geom_bar(position = "identity", stat = "identity", color = "red", fill = "darkblue") +
  theme(axis.text.x = element_text(angle = 45)) +
  ylab("Total bikes not in use") +
  xlab("Station name")
This frequency distribution shows that a majority of bike stations were operating at low capacity at the time of this web scrape. Management will want to review long-term trends to identify ways to increase sales at low-performing stations.
# Histogram of bikes in use across all stations, in bins of 10
ggplot(data = citi_trim, aes(x = totalDocks - availableDocks)) +
  geom_histogram(binwidth = 10, fill = "darkred", colour = "darkblue") +
  xlab("Total bikes in use") +
  ylab("Frequency")
Finding data for the JSON file took a large chunk of the time on this project, and I switched data sets multiple times while putting this assignment together.