Introduction

This project extends the previous Poisson report investigating NYC cyclist counts by addressing over-dispersion. The objective is to develop a quasi-Poisson model that produces reliable inference while explaining how day of the week, average temperature, and precipitation influence cyclist activity on the Queensboro bridge. The main questions being investigated are does daily cyclist volume depend significantly on weekday and weather conditions, and whether or not correcting for over-dispersion changes the conclusions drawn from the standard Poisson model.

Materials

The dataset contains daily observations of the day, high temperature, low temperature, precipitation, cyclist count across the bridge, and total traffic exposure. New variable average temperature was creating by averaging the high and low temp.The outcome variable is count = queensborobridge, representing the total cyclist count per day.

Methodology and Analysis

Two models were fit to the counts of cyclists: A regular Poisson GLM, and a Quasi-Poisson GLM. The estimated dispersion parameter phi was calculated from Pearson residuals. If phi is close to 1, Poisson suffices; if phi > 1, quasi-Poisson is more effective. Coefficients were exponentiation to IRRs, and visualization plots examined how counts vary with Day, average temperature, and new precipitation.

Results and Conclusions

Model Diagnostics

Estimated dispersion parameter phi = 109.12, meaning that the variance is over 100 times the mean, a clear sign of strong over-dispersion. Average temperature yielded a 0.6% increase per degree, new precipitation yielded a 27% decrease on rainy days, and week pattern loses significance after the quassi-Poisson implementation.

#Model Interpretation Rain is the only statistically robust predictor after applying the quassi-Poisson model to handle over-dispersion. Warmer days show slightly higher counts, but the variability is too large to claim formal significance. Weekday differences visible in raw data disappear.

Statistical Conclusion

Because Phi > 1, the quassi-Poisson Model yields better results. This is because the model preserves the mean fit but corrects the standard errors, yielding appropriate inference. Thus, the quasi-Poisson model is the final working model for this set of data. # General Discussion

This study shows how over-dispersion inflates apparent significance in count models. Using the quassi-Poisson model, we can apply robust methods without altering predicted means. The model confirms that precipitation strongly reduces cyclist count on the bridge, and temperature and weekday are not statistically reliable once daily variability is considered.

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(readxl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(ggplot2)
library(broom)
library(knitr)



data_path <- "C:/Users/rg03/Downloads/sta321/PoissonData.xlsx"
raw <- read_excel(data_path)


nm <- names(raw)
nm <- tolower(nm)
nm <- gsub("[[:space:]]+", "_", nm)     
nm <- gsub("[^a-z0-9_]", "", nm)        
nm <- gsub("_+", "_", nm)               
nm <- gsub("^_+|_+$", "", nm)          
names(raw) <- nm

dat <- raw %>%
  mutate(

    count    = as.numeric(queensborobridge),
    exposure = as.numeric(total),

    avgtemp   = (as.numeric(hightemp) + as.numeric(lowtemp)) / 2,
    newprecip = ifelse(as.numeric(precipitation) > 0, 1, 0),

    
    day = factor(
      day,
      levels = c("Monday","Tuesday","Wednesday","Thursday",
                 "Friday","Saturday","Sunday"),
      ordered = TRUE
    )
  ) %>%
  filter(!is.na(count), !is.na(avgtemp), !is.na(newprecip), !is.na(day))


kable(head(dat), caption = "Preview of transformed data")

Preview of transformed data
date	day	hightemp	lowtemp	precipitation	queensborobridge	total	count	exposure	avgtemp	newprecip
2025-07-01	Saturday	84.9	72.0	0.23	3216	11867	3216	11867	78.45	1
2025-07-02	Sunday	87.1	73.0	0.00	3579	13995	3579	13995	80.05	0
2025-07-03	Monday	87.1	71.1	0.45	4230	16067	4230	16067	79.10	1
2025-07-04	Tuesday	82.9	70.0	0.00	3861	13925	3861	13925	76.45	0
2025-07-05	Wednesday	84.9	71.1	0.00	5862	23110	5862	23110	78.00	0
2025-07-06	Thursday	75.0	71.1	0.00	5251	21861	5251	21861	73.05	0

m_pois <- glm(count ~ day + avgtemp + newprecip,
              family = poisson(link = "log"),
              data = dat)


m_quasi <- glm(count ~ day + avgtemp + newprecip,
               family = quasipoisson(link = "log"),
               data = dat)


m_quasi_rate <- glm(count ~ day + avgtemp + newprecip + offset(log(exposure)),
                    family = quasipoisson(link = "log"),
                    data = dat %>% filter(!is.na(exposure), exposure > 0))


phi_hat <- sum(residuals(m_pois, type = "pearson")^2) / m_pois$df.residual



pois_tab   <- tidy(m_pois, conf.int = TRUE, exponentiate = TRUE)
quasi_tab  <- tidy(m_quasi, conf.int = TRUE, exponentiate = TRUE)
quasi_rate <- tidy(m_quasi_rate, conf.int = TRUE, exponentiate = TRUE)

kable(pois_tab,  digits = 3, caption = "Poisson model (IRRs)")

Poisson model (IRRs)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	3067.571	0.042	189.020	0.000	2822.419	3333.737
day.L	0.762	0.007	-38.606	0.000	0.752	0.773
day.Q	0.888	0.007	-16.881	0.000	0.876	0.900
day.C	1.039	0.008	5.091	0.000	1.024	1.055
day^4	1.113	0.007	15.060	0.000	1.098	1.129
day^5	0.951	0.007	-6.953	0.000	0.937	0.964
day^6	0.985	0.007	-2.149	0.032	0.971	0.999
avgtemp	1.006	0.001	11.007	0.000	1.005	1.007
newprecip	0.727	0.007	-43.950	0.000	0.717	0.737

kable(quasi_tab, digits = 3, caption = "Quasi-Poisson model (IRRs)")

Quasi-Poisson model (IRRs)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	3067.571	0.444	18.095	0.000	1279.506	7287.428
day.L	0.762	0.073	-3.696	0.001	0.660	0.880
day.Q	0.888	0.074	-1.616	0.120	0.768	1.025
day.C	1.039	0.079	0.487	0.631	0.890	1.215
day^4	1.113	0.074	1.442	0.163	0.963	1.289
day^5	0.951	0.076	-0.666	0.513	0.818	1.103
day^6	0.985	0.075	-0.206	0.839	0.850	1.140
avgtemp	1.006	0.006	1.054	0.303	0.995	1.017
newprecip	0.727	0.076	-4.207	0.000	0.626	0.842

kable(data.frame(Dispersion_Estimate = round(phi_hat, 3)),
      caption = "Estimated dispersion parameter ϕ̂")

Estimated dispersion parameter ϕ̂
Dispersion_Estimate
109.115

if (!is.null(m_quasi_rate) && inherits(m_quasi_rate, "glm")) {
  kable(quasi_rate, digits = 3, caption = "Quasi-Poisson rate model (IRRs) with offset log(Total)")
}

Quasi-Poisson rate model (IRRs) with offset log(Total)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	0.272	0.165	-7.879	0.000	0.197	0.376
day.L	1.005	0.028	0.176	0.862	0.952	1.060
day.Q	0.949	0.028	-1.895	0.071	0.899	1.002
day.C	0.972	0.029	-0.964	0.345	0.918	1.030
day^4	0.991	0.028	-0.341	0.737	0.938	1.046
day^5	1.017	0.028	0.596	0.557	0.962	1.075
day^6	1.013	0.028	0.471	0.642	0.959	1.070
avgtemp	0.998	0.002	-0.755	0.458	0.994	1.003
newprecip	1.049	0.028	1.697	0.104	0.992	1.108

p1 <- ggplot(dat, aes(day, count)) +
  geom_boxplot() +
  labs(title = "Cyclist Counts by Day", y = "Count", x = "Day")


p2 <- ggplot(dat, aes(avgtemp, count)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(title = "Counts vs Average Temperature",
       y = "Count", x = "Average Temperature (°F)")


p3 <- ggplot(dat, aes(factor(newprecip), count)) +
  geom_boxplot() +
  labs(title = "Cyclist Counts by Precipitation",
       x = "NewPrecip (0 = Dry, 1 = Wet)", y = "Count")

print(p1)

print(p2)

## `geom_smooth()` using formula = 'y ~ x'

print(p3)

A quassi-Poisson Approach to Investigating Cyclist Count on the Queensboro Bridge

Ryan Gorman

2025-11-12