biking <- read.csv("C:/Users/qinfa/Desktop/school/STA 321/biking.csv")
biking$BrooklynBridge <- decomma(biking$BrooklynBridge)
biking$Total <- decomma(biking$Total)
biking$AvgTemp <- (biking$HighTemp + biking$LowTemp)/2
biking <- biking %>% mutate(
NewPrecip = case_when(
Precipitation > 0 ~ 1,
Precipitation == 0 ~ 0
)
)
kable(head(biking), caption = "First few records in the data set")
Date | Day | HighTemp | LowTemp | Precipitation | BrooklynBridge | Total | AvgTemp | NewPrecip |
---|---|---|---|---|---|---|---|---|
10/1 | Sunday | 66.9 | 50.0 | 0 | 2297 | 15975 | 58.45 | 0 |
10/2 | Monday | 72.0 | 52.0 | 0 | 3387 | 23784 | 62.00 | 0 |
10/3 | Tuesday | 70.0 | 57.0 | 0 | 3386 | 25280 | 63.50 | 0 |
10/4 | Wednesday | 75.0 | 55.9 | 0 | 3412 | 25477 | 65.45 | 0 |
10/5 | Thursday | 82.0 | 64.9 | 0 | 3312 | 23942 | 73.45 | 0 |
10/6 | Friday | 81.0 | 69.1 | 0 | 2982 | 22197 | 75.05 | 0 |
This dataset represents the total number of cyclists per 24 hour periods in October at Brooklyn Bridge and in New York City.
Date (categorical) - the date in October on which the data was collected, will serve as the ID in the analysis
Day (categorical) - the day of the week
HighTemp (continuous) - the highest temperature of that day
LowTemp (continuous) - the lowest temperature of that day
Precipitation (continuous) - the amount of precipitation for that day
BrooklynBridge (integer) - the number of cyclists on the Brooklyn Bridge for that day
Total (integer) - the number of total cyclists for that day
AvgTemp (continuous) - the average temperature of that day
NewPrecip (categorical) - indicates whether or not there was precipitation that day; 0 for no precipitation, 1 otherwise
The goal is to create a dispersed poisson model for the number of cyclists on the Brooklyn Bridge. The predictor variables will be the day of the week, the average temperature of the day, and whether or not there was precipitation.
We begin by constructing a poisson model without taking into account dispersion.
model.rates <- glm(BrooklynBridge ~ Day + AvgTemp + NewPrecip, offset = log(Total),
family = poisson(link = "log"), data = biking)
pander(summary(model.rates)$coef, caption = "Poisson regression the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation.")
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -1.854 | 0.03412 | -54.33 | 0 |
DayMonday | -0.008945 | 0.01363 | -0.6564 | 0.5116 |
DaySaturday | 0.02704 | 0.01393 | 1.941 | 0.05224 |
DaySunday | 0.0487 | 0.01464 | 3.325 | 0.000883 |
DayThursday | -0.02096 | 0.01313 | -1.597 | 0.1104 |
DayTuesday | -0.02645 | 0.01254 | -2.109 | 0.03496 |
DayWednesday | -0.004351 | 0.01282 | -0.3394 | 0.7343 |
AvgTemp | -0.00156 | 0.0005165 | -3.021 | 0.002519 |
NewPrecip | -0.09782 | 0.009332 | -10.48 | 1.043e-25 |
The models we constructed last week assumed no dispersion issues in the model. We will examine the dispersion of the data below. We begin by doing a quasi-poisson regression.
quasimodel.rates <- glm(BrooklynBridge ~ Day + AvgTemp + NewPrecip, offset = log(Total),
family = quasipoisson, data = biking)
pander(summary(quasimodel.rates)$coef, caption = "Quasi-Poisson regression on the proportion of NYC cyclists who used the Brooklyn Bridge for each day of the week adjusted by average temperature and precipitation.")
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | -1.854 | 0.1343 | -13.81 | 2.569e-12 |
DayMonday | -0.008945 | 0.05362 | -0.1668 | 0.869 |
DaySaturday | 0.02704 | 0.05482 | 0.4933 | 0.6267 |
DaySunday | 0.0487 | 0.05762 | 0.8451 | 0.4072 |
DayThursday | -0.02096 | 0.05167 | -0.4058 | 0.6888 |
DayTuesday | -0.02645 | 0.04936 | -0.5359 | 0.5974 |
DayWednesday | -0.004351 | 0.05045 | -0.08625 | 0.932 |
AvgTemp | -0.00156 | 0.002032 | -0.7678 | 0.4508 |
NewPrecip | -0.09782 | 0.03672 | -2.664 | 0.01418 |
ydif=biking$BrooklynBridge-exp(model.rates$linear.predictors) # diff between y and yhat
prsd = ydif/sqrt(exp(model.rates$linear.predictors)) # Pearson residuals
phi = sum(prsd^2)/15 # Dispersion index: 24-9 = 15
pander(cbind(Dispersion = phi))
Dispersion |
---|
22.71 |
This returns a dispersion index of 22.71, which is much larger than one. Therefore, we will use the quasipoisson model as our final model.
We will now create a graph that visualizes the relationship between the proportion of cyclists that took the Brooklyn Bridge and the day of the week for days with and without precipitation.
NoPrecipitation = c(exp(-1.854-0.008945), exp(-1.854-0.02645),
exp(-1.854-0.004351),exp(-1.854+0.02096),
exp(-1.854),exp(-1.854+0.02704), exp(-1.854+0.0487))
Precipitation = c(exp(-1.854-0.008945-0.09782), exp(-1.854-0.02645-0.09782),
exp(-1.854-0.004351-0.09782),exp(-1.854+0.02096-0.09782),
exp(-1.854-0.09782),exp(-1.854+0.02704-0.09782), exp(-1.854+0.0487-0.09782))
minmax = range(c(NoPrecipitation, Precipitation))
plot(1:7, NoPrecipitation, type="l", lty =1, col="red", xlab="",
ylab="Prop. of Brooklyn Cyclists", xlim=c(0,7), ylim=c(0.1, 0.2), axes=FALSE )
title("The Trend Line of Proportion of Brooklyn Bridge Cyclists")
axis(2)
axis(1, labels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday", "Sunday"),
at = 1:7)
points(1:7,NoPrecipitation, pch=19, col="red")
##
lines(1:7, Precipitation, lty =2, col="blue")
points(1:7, Precipitation, pch=20, col="blue")
##
legend("topleft", c("No Precipitation","Precipitation"),
pch=19:22, lty=1:2, bty="n",
col=c("red", "blue"))
In the quasipoisson model, it does not appear that the average temperature or the day of the week is significant to the proportion of cyclists that used the Brooklyn Bridge for that day. Whether or not there was precicipation that day has a signficant effect on the proportion of cyclists on the Brooklyn Bridge according to the model, with \(log(R_{precipitation})-log(R_{no precipitation})=-0.09782\), which indicates that precipitation meant that about 10% fewer cyclists would use the bridge.
The visualization indicates that the proportion of cyclists that used the bridge remained relatively constant over the days of the week with a slight increase on the weekends, and that there was a smaller proportion of cyclists on the bridge on the days where there was measured precipitation. As the lines seem “parallel,” no interaction effect is observed as well.