1 Description of the Dataset

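library(knitr)     # provides kable()
library(eeptools)  # assumed source of decomma(); adjust if it is defined elsewhere
biking <- read.csv("C:/Users/qinfa/Desktop/school/STA 321/biking.csv")
# Strip thousands separators so the two count columns are numeric
biking$BrooklynBridge <- decomma(biking$BrooklynBridge)
biking$Total <- decomma(biking$Total)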

kable(head(biking), caption = "First few records in the data set") 
First few records in the data set
Date  Day        HighTemp  LowTemp  Precipitation  BrooklynBridge  Total
10/1  Sunday         66.9     50.0              0            2297  15975
10/2  Monday         72.0     52.0              0            3387  23784
10/3  Tuesday        70.0     57.0              0            3386  25280
10/4  Wednesday      75.0     55.9              0            3412  25477
10/5  Thursday       82.0     64.9              0            3312  23942
10/6  Friday         81.0     69.1              0            2982  22197

This dataset records the number of cyclists counted per 24-hour period in October, both on the Brooklyn Bridge and in New York City overall. Each row is one day.

Date (categorical) - the date in October on which the data was collected, will serve as the ID in the analysis

Day (categorical) - the day of the week

HighTemp (continuous) - the highest temperature of that day

LowTemp (continuous) - the lowest temperature of that day

Precipitation (continuous) - the amount of precipitation for that day

BrooklynBridge (integer) - the number of cyclists on the Brooklyn Bridge for that day

Total (integer) - the number of total cyclists for that day
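The two count columns are passed through decomma() when the file is read. Assuming that helper simply strips thousands separators from text and converts the result to numbers, a base-R equivalent might look like the sketch below. The function name strip_commas and the input strings are hypothetical and are not taken from biking.csv.

```r
# Hypothetical stand-in for decomma(): remove thousands separators, then convert.
strip_commas <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Illustrative strings only (not values from the data set)
raw_counts   <- c("1,234", "12,345", "980")
clean_counts <- strip_commas(raw_counts)
print(clean_counts)
```

Converting before modelling matters because glm() would otherwise treat a comma-formatted column as character or factor data rather than as a count.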

2 Research Question

The response variable is the daily number of cyclists on the Brooklyn Bridge. We ask how this count relates to the day of the week, the high and low temperatures, the precipitation, and the total number of cyclists that day.

3 Poisson Regression on Cyclist Count on the Brooklyn Bridge

We first build a Poisson regression model for the daily count on the Brooklyn Bridge, in which the total number of cyclists for that day enters as an ordinary predictor rather than as an offset.

model.freq <- glm(BrooklynBridge ~ Day + HighTemp + LowTemp + Precipitation + Total, family = poisson(link = "log"), data = biking)
pois.count.coef = summary(model.freq)$coef
kable(pois.count.coef, caption = "The Poisson regression model for the counts of cyclists on the Brooklyn Bridge versus the Day of the Week, High and Low Temperatures, the Precipitation, and the Total number of cyclists")
The Poisson regression model for the counts of cyclists on the Brooklyn Bridge versus the Day of the Week, High and Low Temperatures, the Precipitation, and the Total number of cyclists
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)      6.9179042   0.0382688  180.7715394  0.0000000
DayMonday       -0.0435639   0.0131007   -3.3253198  0.0008832
DaySaturday      0.1168925   0.0160482    7.2838351  0.0000000
DaySunday        0.1483905   0.0169500    8.7545901  0.0000000
DayThursday     -0.0543316   0.0132370   -4.1045401  0.0000405
DayTuesday      -0.0868645   0.0127485   -6.8136827  0.0000000
DayWednesday    -0.0624903   0.0129472   -4.8265492  0.0000014
HighTemp        -0.0065703   0.0010435   -6.2966593  0.0000000
LowTemp          0.0006489   0.0009593    0.6764132  0.4987784
Precipitation   -0.5555543   0.0269869  -20.5860452  0.0000000
Total            0.0000712   0.0000016   44.7939032  0.0000000

According to this model, LowTemp does not have a significant effect on the number of cyclists on the Brooklyn Bridge (p = 0.499). With Friday as the baseline, the day-of-week coefficients are negative for Monday through Thursday and positive for Saturday and Sunday, so the expected count is lower on those weekdays and higher on the weekend than on a Friday, holding the other predictors fixed. The coefficients for Precipitation and HighTemp are both negative, indicating that more precipitation and higher high temperatures are associated with fewer cyclists on the bridge. Finally, the coefficient for Total is positive: the more cyclists are counted overall, the more use the bridge, as expected. We refit this model without LowTemp.
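Because the model uses a log link, each coefficient is easier to read after exponentiation, when it becomes a multiplicative rate ratio. The sketch below does this for a few of the estimates printed in the table above. The values are typed in by hand so the snippet runs without the data file, and the names coefs, rate_ratio, and pct_change are illustrative.

```r
# Estimates copied by hand from the coefficient table of the first count model
coefs <- c(
  DayMonday     = -0.0435639,
  DaySaturday   =  0.1168925,
  DaySunday     =  0.1483905,
  HighTemp      = -0.0065703,
  Precipitation = -0.5555543
)

# exp(beta) is the multiplicative change in the expected count for a one-unit
# increase in the predictor (or relative to Friday for the day indicators)
rate_ratio <- exp(coefs)
pct_change <- 100 * (rate_ratio - 1)

print(round(cbind(rate_ratio, pct_change), 3))
```

With a fitted model object available, exp(coef(model.freq)) would give the same quantities for every term at once.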

model.freq <- glm(BrooklynBridge ~ Day + HighTemp + Precipitation + Total, family = poisson(link = "log"), data = biking)
pois.count.coef = summary(model.freq)$coef
kable(pois.count.coef, caption = "The Poisson regression model for the counts of cyclists on the Brooklyn Bridge versus the Day of the Week, High Temperature, the Precipitation, and the Total number of cyclists")
The Poisson regression model for the counts of cyclists on the Brooklyn Bridge versus the Day of the Week, High Temperature, the Precipitation, and the Total number of cyclists
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)      6.9237086   0.0372894   185.675079  0.0000000
DayMonday       -0.0433374   0.0130964    -3.309114  0.0009359
DaySaturday      0.1141852   0.0155416     7.347084  0.0000000
DaySunday        0.1456516   0.0164573     8.850258  0.0000000
DayThursday     -0.0528062   0.0130427    -4.048707  0.0000515
DayTuesday      -0.0860004   0.0126835    -6.780478  0.0000000
DayWednesday    -0.0621668   0.0129380    -4.804976  0.0000015
HighTemp        -0.0059700   0.0005490   -10.875199  0.0000000
Precipitation   -0.5557272   0.0269797   -20.597969  0.0000000
Total            0.0000707   0.0000014    50.844755  0.0000000

All of the remaining predictors are now significant at the 0.05 level.

We proceed by fitting a model that uses the total number of cyclists in the city as an exposure (offset) rather than as a predictor, since the number of cyclists on the Brooklyn Bridge should depend on the number of cyclists in the city in total.

4 Poisson Regression on Rates

The following model assesses the relationship between the proportion of total cyclists that were counted on the Brooklyn Bridge on each day in October and the day of the week, the temperature of each day, and the precipitation.
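It may help to write out what modelling a proportion with a Poisson regression means. The notation below is an assumption introduced for illustration: \mu_i is the expected Brooklyn Bridge count on day i, \mathrm{Total}_i is the total count on that day, and \mathbf{x}_i collects the day-of-week indicators and weather variables.

```latex
\log\!\left(\frac{\mu_i}{\mathrm{Total}_i}\right) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\log \mu_i = \log(\mathrm{Total}_i) + \mathbf{x}_i^{\top}\boldsymbol{\beta}
```

The term \log(\mathrm{Total}_i) enters with its coefficient fixed at 1, which is what an offset does in glm(); the remaining coefficients then describe changes in the log proportion rather than in the log count.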

# The response must not also appear as a predictor; R drops it with a warning, so it is removed here
model.rates <- glm(BrooklynBridge ~ Day + HighTemp + LowTemp + Precipitation, offset = log(Total), 
                   family = poisson(link = "log"), data = biking)
kable(summary(model.rates)$coef, caption = "Poisson regression on the proportion of bikers in NYC that used the Brooklyn Bridge for each day of the week adjusted by temperature and precipitation.")
Poisson regression on the proportion of bikers in NYC that used the Brooklyn Bridge for each day of the week adjusted by temperature and precipitation.
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)     -1.8782777   0.0353287  -53.1657605  0.0000000
DayMonday       -0.0290109   0.0131078   -2.2132605  0.0268797
DaySaturday      0.0101480   0.0138468    0.7328773  0.4636333
DaySunday        0.0534409   0.0143345    3.7281271  0.0001929
DayThursday     -0.0298994   0.0131118   -2.2803399  0.0225875
DayTuesday      -0.0270614   0.0125364   -2.1586255  0.0308792
DayWednesday    -0.0239583   0.0127221   -1.8831975  0.0596736
HighTemp         0.0019344   0.0008161    2.3701500  0.0177809
LowTemp         -0.0037133   0.0008384   -4.4288199  0.0000095
Precipitation   -0.3240410   0.0259998  -12.4632219  0.0000000

The model indicates that the log of the proportion of cyclists that used the Brooklyn Bridge per day is not identical across the days of the week and is affected by the weather conditions of each day. Precipitation and LowTemp have negative relationships with the proportion of bikers that used the Brooklyn Bridge, while HighTemp has a small positive one. The regression coefficients for the days of the week give the change in log rate between each day and the baseline day, Friday. The estimates are positive for Saturday and Sunday and negative for Monday through Thursday, but not every day differs significantly from Friday: at the 0.05 level, the Saturday (p = 0.464) and Wednesday (p = 0.060) coefficients are not significant.
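To make the rate interpretation concrete, the sketch below plugs the printed estimates into the linear predictor for one day, the Friday 10/6 row shown in the first table (high 81.0, low 69.1, no precipitation, 22197 cyclists in total). The coefficients are typed in by hand so the snippet runs without the data file, and all variable names are illustrative.

```r
# Estimates copied by hand from the rate-model coefficient table.
# Friday is the baseline day, so no day-of-week term is needed.
b0       <- -1.8782777
b_high   <-  0.0019344
b_low    <- -0.0037133
b_precip <- -0.3240410

# Covariates for 10/6 (a Friday), taken from the first table
high   <- 81.0
low    <- 69.1
precip <- 0
total  <- 22197

# Linear predictor on the log-proportion scale, then back-transform
eta        <- b0 + b_high * high + b_low * low + b_precip * precip
pred_prop  <- exp(eta)           # predicted share of cyclists on the bridge
pred_count <- total * pred_prop  # the offset turns the share back into a count

observed_prop <- 2982 / total    # observed share for the same day
print(c(predicted = pred_prop, observed = observed_prop))
print(pred_count)
```

With the fitted object available, predict(model.rates, type = "response") should return the corresponding expected counts for every day, with the offset applied automatically.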

5 Conclusions

There is only one categorical variable in the dataset (not including the ID), so no graphical comparison of two categorical variables was made to check for interaction.

The first model we created on the cyclist counts found that the day of the week, the high temperature, the precipitation, and the total number of cyclists were significant in predicting the count of cyclists that would use the Brooklyn Bridge over a 24-hour period. Relative to Friday, more cyclists used the Brooklyn Bridge on weekends and fewer on Monday through Thursday. The more cyclists there were in total in NYC that day, the more cyclists used the Brooklyn Bridge. The high temperature and precipitation, on the other hand, had a negative association with the number of cyclists that used the bridge.

A second model was created on the proportion of total cyclists that used the Brooklyn Bridge. This model found day of the week, high temperature, low temperature, and precipitation to be significant predictors, although the Saturday and Wednesday coefficients did not differ significantly from the Friday baseline at the 0.05 level. Sunday saw a significantly higher rate of cyclists using the Brooklyn Bridge than Friday, while Monday, Tuesday, and Thursday saw significantly lower rates. In this model, higher high temperatures corresponded to higher rates of cyclists using the Brooklyn Bridge, but higher recorded low temperatures and precipitation had a negative relationship with the proportion of cyclists that would use the bridge.