1 Description of the Dataset

biking <- read.csv("C:/Users/qinfa/Desktop/school/STA 321/biking.csv")
biking$BrooklynBridge <- decomma(biking$BrooklynBridge)
biking$Total <- decomma(biking$Total)

biking$AvgTemp <- (biking$HighTemp + biking$LowTemp)/2
biking <- biking %>% mutate(
  NewPrecip = case_when(
    Precipitation > 0 ~ 1,
    Precipitation == 0 ~ 0
  )
)

kable(head(biking), caption = "First few records in the data set") 
First few records in the data set
Date Day HighTemp LowTemp Precipitation BrooklynBridge Total AvgTemp NewPrecip
10/1 Sunday 66.9 50.0 0 2297 15975 58.45 0
10/2 Monday 72.0 52.0 0 3387 23784 62.00 0
10/3 Tuesday 70.0 57.0 0 3386 25280 63.50 0
10/4 Wednesday 75.0 55.9 0 3412 25477 65.45 0
10/5 Thursday 82.0 64.9 0 3312 23942 73.45 0
10/6 Friday 81.0 69.1 0 2982 22197 75.05 0

This dataset represents the total number of cyclists per 24 hour periods in October at Brooklyn Bridge and in New York City.

Date (categorical) - the date in October on which the data was collected, will serve as the ID in the analysis

Day (categorical) - the day of the week

HighTemp (continuous) - the highest temperature of that day

LowTemp (continuous) - the lowest temperature of that day

Precipitation (continuous) - the amount of precipitation for that day

BrooklynBridge (integer) - the number of cyclists on the Brooklyn Bridge for that day

Total (integer) - the number of total cyclists for that day

AvgTemp (continuous) - the average temperature of that day

NewPrecip (categorical) - indicates whether or not there was precipitation that day; 0 for no precipitation, 1 otherwise

2 Research Question

The goal is to create a dispersed poisson model for the number of cyclists on the Brooklyn Bridge. The predictor variables will be the day of the week, the average temperature of the day, and whether or not there was precipitation.

3 Poisson Model on Rates

We begin by constructing a poisson model without taking into account dispersion.

model.rates <- glm(BrooklynBridge ~ Day + AvgTemp + NewPrecip, offset = log(Total), 
                   family = poisson(link = "log"), data = biking)
pander(summary(model.rates)$coef, caption = "Poisson regression the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation.")
Poisson regression the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation.
  Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.854 0.03412 -54.33 0
DayMonday -0.008945 0.01363 -0.6564 0.5116
DaySaturday 0.02704 0.01393 1.941 0.05224
DaySunday 0.0487 0.01464 3.325 0.000883
DayThursday -0.02096 0.01313 -1.597 0.1104
DayTuesday -0.02645 0.01254 -2.109 0.03496
DayWednesday -0.004351 0.01282 -0.3394 0.7343
AvgTemp -0.00156 0.0005165 -3.021 0.002519
NewPrecip -0.09782 0.009332 -10.48 1.043e-25

4 Quasi-Poisson Rate Model on Cyclists on Brooklyn Bridge

The models we constructed last week assumed no dispersion issues in the model. We will examine the dispersion of the data below. We begin by doing a quasi-poisson regression.

quasimodel.rates <- glm(BrooklynBridge ~ Day + AvgTemp + NewPrecip, offset = log(Total), 
                   family = quasipoisson, data = biking)
pander(summary(quasimodel.rates)$coef, caption = "Quasi-Poisson regression on the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation.")
Quasi-Poisson regression on the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation. Right off the bat, we can see that the p-values of the quasi-poisson model are much larger than the original. We can find the dispersion index with the following code.
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.854 0.1343 -13.81 2.569e-12
DayMonday -0.008945 0.05362 -0.1668 0.869
DaySaturday 0.02704 0.05482 0.4933 0.6267
DaySunday 0.0487 0.05762 0.8451 0.4072
DayThursday -0.02096 0.05167 -0.4058 0.6888
DayTuesday -0.02645 0.04936 -0.5359 0.5974
DayWednesday -0.004351 0.05045 -0.08625 0.932
AvgTemp -0.00156 0.002032 -0.7678 0.4508
NewPrecip -0.09782 0.03672 -2.664 0.01418
ydif=biking$BrooklynBridge-exp(model.rates$linear.predictors)  # diff between y and yhat
prsd = ydif/sqrt(exp(model.rates$linear.predictors))   # Pearson residuals
phi = sum(prsd^2)/15          # Dispersion index: 24-9 = 15  
pander(cbind(Dispersion = phi))
Dispersion
22.71

This returns a dispersion index of 22.71, which is much larger than one. Therefore, we will use the quasipoisson model as our final model.

5 Visualizations

We will now create a graph that visualizes the relationship between the proportion of cyclists that took the Brooklyn Bridge and the day of the week for days with and without precipitation.

NoPrecipitation = c(exp(-1.854-0.008945), exp(-1.854-0.02645),   
               exp(-1.854-0.004351),exp(-1.854+0.02096),
               exp(-1.854),exp(-1.854+0.02704), exp(-1.854+0.0487))

Precipitation = c(exp(-1.854-0.008945-0.09782), exp(-1.854-0.02645-0.09782),   
               exp(-1.854-0.004351-0.09782),exp(-1.854+0.02096-0.09782),
               exp(-1.854-0.09782),exp(-1.854+0.02704-0.09782), exp(-1.854+0.0487-0.09782))

minmax = range(c(NoPrecipitation, Precipitation))

plot(1:7, NoPrecipitation, type="l", lty =1, col="red", xlab="", 
               ylab="Prop. of Brooklyn Cyclists", xlim=c(0,7), ylim=c(0.1, 0.2), axes=FALSE )
title("The Trend Line of Proportion of Brooklyn Bridge Cyclists")
axis(2)
axis(1, labels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday", "Sunday"), 
            at = 1:7)
points(1:7,NoPrecipitation, pch=19, col="red")
##
lines(1:7, Precipitation, lty =2, col="blue")
points(1:7, Precipitation, pch=20, col="blue")

##
legend("topleft", c("No Precipitation","Precipitation"),
                  pch=19:22, lty=1:2,  bty="n", 
        col=c("red", "blue"))

6 Final Model and Discussion

In the quasipoisson model, it does not appear that the average temperature or the day of the week is significant to the proportion of cyclists that used the Brooklyn Bridge for that day. Whether or not there was precicipation that day has a signficant effect on the proportion of cyclists on the Brooklyn Bridge according to the model, with \(log(R_{precipitation})-log(R_{no precipitation})=-0.09782\), which indicates that precipitation meant that about 10% fewer cyclists would use the bridge.

The visualization indicates that the proportion of cyclists that used the bridge remained relatively constant over the days of the week with a slight increase on the weekends, and that there was a smaller proportion of cyclists on the bridge on the days where there was measured precipitation. As the lines seem “parallel,” no interaction effect is observed as well.