Data for this project was obtained from the following website:
Organization: The City of Calgary
Business Unit: Transportation Planning
https://data.calgary.ca/Transportation-Transit/Annual-Bicycle-Counts/ybd2-54bg
License URL: https://data.calgary.ca/d/Open-Data-Terms/u45n-7awa
The data consists of 290 rows and 23 columns (variables), including Year, Gender, Quadrant, Helmet Use, and Age.
My project looks at data collected by the City of Calgary as part of its Cycling Strategy. The City used the data to improve infrastructure and access for cyclists in an effort to increase the number of Calgarians choosing cycling as a primary means of transportation. Counts were taken at 290 data collection sites located throughout the city from May to September in 2013 through 2016. Calgary is a very large city at 825 km², but it also has one of the most extensive cycle track networks in North America. The City maintains approximately 1000 km of regional pathways and 96 km of trails, plus another 290 km of on-street bikeways and cycle tracks.
For my first figure, I display the location of the data collection sites on a map using the ggmap package and the variables “Latitude” and “Longitude” from the CSV file.
Calgarymap1
My second visualization uses column plots to show (a) an increase in the number of people cycling (both male and female riders) from 2013 to 2016 and (b) cyclist counts by month, to see which months are more popular for riding. (Note: Calgary is a very cold, snowy city in winter, and cyclist counts drop significantly.) The data clearly show that the number of male riders is almost double the number of female riders regardless of year or month.
grid.arrange(plot1, plot2, ncol=2)
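As a rough sketch of the aggregation behind these column plots, the base R snippet below sums counts by year and computes a male-to-female ratio. The data frame, its values, and the column names (Year, Month, Male, Female) are made up for illustration and are assumptions, not the actual schema of the City's CSV file.

```r
# Toy data with made-up counts; column names are assumptions,
# not the real schema of the Annual Bicycle Counts CSV.
counts <- data.frame(
  Year   = c(2013, 2013, 2014, 2014, 2013, 2014),
  Month  = c("May", "June", "May", "June", "May", "May"),
  Male   = c(120, 150, 140, 180, 60, 80),
  Female = c(60, 70, 75, 90, 35, 40),
  stringsAsFactors = FALSE
)

# Sum male and female counts within each year.
by_year <- aggregate(cbind(Male, Female) ~ Year, data = counts, FUN = sum)

# Male-to-female ratio per year; a value near 2 means
# roughly twice as many male riders as female riders.
by_year$Ratio <- by_year$Male / by_year$Female

print(by_year)
```

Swapping `Year` for `Month` in the formula would give the by-month totals in the same way.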
My third visualization uses a column chart divided into the four quadrants of Calgary. It shows cyclist counts (Male and Female) from 2013 to 2016 for each quadrant to demonstrate the significant differences in cyclist counts between the west side of the city and the east side. This is largely due to the lack of infrastructure in the eastern half of the city.
plot3
My fourth visualization uses boxplots to show the statistical distribution of cyclist counts throughout the city, separated into Male and Female plots using facet wrap. The NW and SW quadrants show significant count outliers compared to the SE and NE quadrants.
subplot(plot4a, plot4b, shareX = TRUE, shareY = FALSE, titleY = TRUE, titleX = TRUE)
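To make the idea of a count outlier concrete, here is a minimal base R sketch. It assumes the outliers in the boxplots are the points flagged by the usual 1.5 × IQR rule, which is what `boxplot.stats()` applies by default. The counts and the column names (Quadrant, Count) are invented for illustration.

```r
# Made-up site counts; Quadrant and Count are assumed column names.
sites <- data.frame(
  Quadrant = c(rep("NW", 6), rep("NE", 6)),
  Count    = c(40, 45, 50, 55, 60, 400, 20, 22, 25, 27, 30, 32),
  stringsAsFactors = FALSE
)

# For each quadrant, boxplot.stats() returns in $out the values lying
# more than 1.5 times the hinge spread beyond the lower or upper hinge.
outliers <- lapply(
  split(sites$Count, sites$Quadrant),
  function(x) boxplot.stats(x)$out
)

print(outliers)
```

In this toy data only the NW quadrant has a flagged value (the single site with a count of 400), which mirrors the kind of pattern described above.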
My fifth visualization uses a simple column plot to show helmet use over the four years of the study for all riders (Male and Female). From 2013 to 2016, helmet usage increased by almost 70%.
plot5
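A percentage increase like this can be computed as follows. This is only a sketch: the two totals are hypothetical placeholders, not values from the dataset, and it assumes the increase is measured on the number of helmeted riders counted rather than on the share of riders wearing helmets.

```r
# Percentage change from an old value to a new value.
pct_change <- function(old, new) {
  100 * (new - old) / old
}

# Hypothetical totals of helmeted riders (placeholders, not real data).
helmet_2013 <- 1000
helmet_2016 <- 1700

increase <- pct_change(helmet_2013, helmet_2016)
print(increase)
```

Because the number of collection sites and total riders also changed over the study, comparing the share of riders wearing helmets each year would be a useful cross-check on a count-based figure.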
My sixth visualization uses a simple column plot to show that the overwhelming majority of riders counted in this study were between the ages of 18 and 65. While this result is not surprising, it is a reason for caution when using age in statistical calculations. I would recommend that future studies break age down into smaller categories.
plot6
My seventh visualization creates an interactive correlation matrix to identify potential strong and weak correlations between a number of variables in the original dataset.
Interesting observations include:
The correlation between gender and helmet use. While both Male and Female riders show a high correlation with helmet use, Male riders have a slightly higher correlation than Female riders. Helmet use is not mandatory for riders over 16 in Calgary. It would be interesting to understand why fewer Female riders choose to wear a helmet.
A similar correlation plot of cyclists vs. pedestrians by Gender and Quadrant of the city showed interesting results. The quadrant of the city was a strong predictor of cyclist counts but not useful at all for pedestrian counts (since there is less cycling infrastructure in the NE and SE but no lack of pedestrian infrastructure).
fig
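The numbers behind a correlation matrix like this can be produced with base R's `cor()`. The sketch below uses invented per-site counts and assumed column names (Male, Female, Helmets); it is not the code or data used for the interactive figure.

```r
# Made-up per-site counts; all column names are assumptions.
obs <- data.frame(
  Male    = c(10, 20, 30, 40, 50),
  Female  = c(6, 9, 16, 19, 26),
  Helmets = c(12, 25, 40, 52, 70)
)

# Pairwise Pearson correlations between all numeric columns,
# rounded to two decimals for display.
corr <- round(cor(obs, method = "pearson"), 2)

print(corr)
```

The resulting matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be read when comparing, for example, Male vs. Helmets against Female vs. Helmets.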
For my eighth visualization, I created two maps to demonstrate the impact of data collection bias on the final data. Over the time frame of the study (2013 to 2016), the data collection sites were not in the same locations each year. In a properly controlled study, I would have kept the data collection sites the same from year to year. The map on the left shows the location of the data collection sites each year. The density map on the right shows the impact of moving locations on count data over the same time period.
a_mgif
b_mgif
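One way to reduce this bias after the fact would be to keep only the sites that were surveyed in every year. The base R sketch below illustrates the idea; the site identifiers and the column names (Year, SiteID) are hypothetical, and the real data may need sites matched on Latitude and Longitude instead.

```r
# Made-up visit records; Year and SiteID are assumed column names.
visits <- data.frame(
  Year   = c(2013, 2013, 2013, 2014, 2014, 2014),
  SiteID = c("A", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# List the sites surveyed in each year.
sites_by_year <- split(visits$SiteID, visits$Year)

# Sites present in every year: intersect the yearly lists pairwise.
common_sites <- Reduce(intersect, sites_by_year)

# Keep only the rows for sites that were surveyed every year.
consistent <- visits[visits$SiteID %in% common_sites, ]

print(common_sites)
print(consistent)
```

Year-over-year comparisons on the `consistent` subset would then reflect changes in ridership rather than changes in where the counts were taken, at the cost of discarding some data.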