Citi Bike

The data was scraped from Citi Bike’s station map, available at https://member.citibikenyc.com/map/. I trimmed the data to include only the station name, available docks, and total docks, dropping id, longitude, latitude, address, and the other fields.

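library(jsonlite)  # assumed source of fromJSON(); the original does not show which package was loaded
citi_raw.data <- fromJSON("https://feeds.citibikenyc.com/stations/stations.json")
citi <- citi_raw.data[['stationBeanList']]
citi_trim <- citi[, c("stationName", "availableDocks", "totalDocks")]  # keep only the three columns described above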

Unpopular Citi Bike Stations

The stations identified below have the lowest number of bikes in use (computed as totalDocks - availableDocks) and, among those, the highest total docks. Citi Bike management will want to consider driving demand at these locations or removing bikes or stations from them, after reviewing data over an acceptable period of time.

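library(dplyr)  # provides slice_min() and slice_max()
# Stations tied for the fewest bikes in use (slice_min keeps ties by default)
citi_notpopall <- slice_min(citi_trim, totalDocks - availableDocks)
# Of those, the five with the most docks
citi_notpop <- slice_max(citi_notpopall, totalDocks, n = 5)
knitr::kable(citi_notpop, digits = 2)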

|stationName             | availableDocks| totalDocks|
|:-----------------------|--------------:|----------:|
|W 44 St & 11 Ave        |             79|         79|
|E 97 St & Madison Ave   |             41|         41|
|Broadway & Hancock St   |             35|         35|
|Riverside Dr & W 145 St |             35|         35|
|Wadsworth Ave & W 179 St|             35|         35|

library(ggplot2)
# Bar height is each station's total dock count
ggplot(citi_notpop, aes(x = stationName, y = totalDocks)) +
  geom_col(color = "red", fill = "darkblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # hjust keeps rotated labels clear of the bars
  ylab("Total bikes not in use") +
  xlab("Station name")
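
The two-step selection above (first the stations tied for the fewest bikes in use, then the largest of those) can be checked on a small example. This is a minimal sketch: the `toy` data frame and its values are made up for illustration and are not taken from the Citi Bike feed.

```r
library(dplyr)

# Hypothetical snapshot with the same three columns as citi_trim
toy <- data.frame(
  stationName    = c("A", "B", "C", "D"),
  availableDocks = c(30, 20, 12, 5),
  totalDocks     = c(30, 20, 25, 40)
)

# Step 1: keep every station tied for the lowest totalDocks - availableDocks.
# slice_min() defaults to n = 1 and with_ties = TRUE, so all ties are returned.
least_used <- slice_min(toy, totalDocks - availableDocks)

# Step 2: among those, keep the station with the most docks.
largest_idle <- slice_max(least_used, totalDocks, n = 1)

print(least_used)
print(largest_idle)
```

Because ties are kept in the first step, the size of the intermediate result depends on how many stations share the minimum, which is why the second ranking by `totalDocks` is needed to get a short list.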

Citi Bike Station Performance

This frequency distribution reveals that a majority of bike stations were operating at low capacity at the time of this web scrape. Management will want to review long-term trends to identify ways to increase sales at low-performing stations.

# Histogram of bikes in use per station, in bins of 10
ggplot(data = citi_trim, aes(x = totalDocks - availableDocks)) +
  geom_histogram(binwidth = 10, fill = "darkred", color = "darkblue") +
  xlab("Total bikes in use") +
  ylab("Frequency")
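
The histogram counts bikes in use in absolute terms, so a small station and a large one can land in the same bin. One way to put a number on "low capacity" is to divide by station size and count the share of stations under a cutoff. This is a rough sketch only: the `toy` data and the 25% cutoff (`low_cutoff`) are assumptions chosen for illustration, not values from the original analysis.

```r
library(dplyr)

# Hypothetical snapshot with the same three columns as citi_trim
toy <- data.frame(
  stationName    = c("A", "B", "C", "D"),
  availableDocks = c(30, 18, 12, 5),
  totalDocks     = c(30, 20, 25, 40)
)

low_cutoff <- 0.25  # assumed threshold for "low capacity"

usage <- toy %>%
  mutate(
    bikesInUse  = totalDocks - availableDocks,  # same measure as the histogram
    utilization = bikesInUse / totalDocks       # scaled by station size
  )

# Fraction of stations whose utilization falls below the cutoff
share_low <- mean(usage$utilization < low_cutoff)

print(usage)
print(share_low)
```

Swapping `toy` for `citi_trim` would apply the same calculation to the scraped data, provided no station reports zero total docks.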

Trials and Tribulations

Finding data in JSON format took a large chunk of the time on this project, and I switched data sets multiple times while putting this assignment together.