1 Introduction and Background

Here we have a dataset sourced from New York City’s Traffic Information Management System (TIMS). TIMS recorded the number of cyclists entering and leaving three of New York City’s five boroughs - Queens, Manhattan and Brooklyn - via a collection of bridges known as the East River Bridges (Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge). These recordings took place in 2017. April, July and October are the three months that are present in our available copy of the data.

For today’s analysis we are going to look at a randomly selected subset of the larger dataset (chosen using R’s runif function). The subset pertains to cyclists who entered and left our three boroughs of interest - Queens, Manhattan and Brooklyn - via the Manhattan Bridge throughout the entire month of July 2017. This data has 31 observations, one for each day, and no missing values. A breakdown of the original dataset’s variables, their practical meanings and their data types is below.

| Name | Meaning | Data_Type |
|---|---|---|
| Date | Date for that observation; YYYY-MM-DD form | Date |
| Day | Day of the week for that observation | character |
| HighTemp | That day’s highest recorded temperature | double |
| LowTemp | That day’s lowest recorded temperature | double |
| Precipitation | Measure of rain that day (inches) | double |
| Manhattan | Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge | double |
| Total | Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges | double |

1.1 Objective of Analysis

With the available data, my goal for this analysis is to examine how weather conditions and day of the week are associated with the amount of cyclist traffic that the Manhattan Bridge experiences. In order to do this, I created two new variables, MeanTemp and TempDiff, which are the average of that day’s low and high temperatures and the difference between those temperatures, respectively.

Using these temperature-related metrics, along with measures of precipitation and records of the day of the week, I will use Poisson and quasi-Poisson regression techniques to see which if any of these factors play a particular role in the overall amount or the relative rate of cyclist traffic that the Manhattan Bridge experiences.

2 Poisson Regression Modeling

To explore any potential associations, I created Poisson models of two different regression types, one being for counts and one being for rates.

Poisson counts regression models the total number of occurrences of a particular event (in this case cyclists on the Manhattan Bridge) and uses a log link to determine which, if any, of the explanatory variables have a significant effect on the mean of the response variable. With \(\mu\) denoting the expected count and \(x_1, \dots, x_p\) the predictors, the formula for this regression is below:

\[\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\]

  • \(\beta_0\) = the log of our response variable’s mean when every predictor is zero or at its baseline level; not very useful for practical interpretation

  • \(\beta_1\), \(\beta_2\), \(\beta_3\), … \(\beta_p\) = the change in our response variable’s log mean associated with a one unit increase in the corresponding predictor variable, holding the other predictors fixed


Additionally, Poisson rates regression models the expected rate of an event’s occurrence relative to the size of a larger “population.” In this dataset and analysis, our variable Total, which represents the total number of cyclists on all the East River Bridges, is the population of which the Manhattan Bridge cyclists are a proportion. The calculation for this type of Poisson regression is similar to counts regression, but the logarithm of the population variable enters the model as an offset, a term whose coefficient is fixed at 1. With \(\mu\) denoting the expected Manhattan Bridge count and \(t\) denoting Total, this can be expressed in both of the following ways.

\[\log\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\]

\[\log(\mu) = \log(t) + \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\]

  • In Poisson rates regression, the parameters \(\beta_0\), … \(\beta_p\) are interpreted in the same manner as in the Poisson counts model, except that they describe the log of the expected rate rather than the log of the expected count.
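To make the offset idea concrete, here is a minimal sketch on simulated data (not the TIMS data). All names and values in it (`Sim_Total`, `Sim_Bridge`, `Sim_Rain`, the rate of exp(-1.25)) are hypothetical and chosen only for illustration. It shows that supplying `log(Total)` through the `offset` argument and writing `offset()` inside the formula are two spellings of the same rates model.

```r
# Simulated data only; every name and value here is hypothetical
set.seed(321)
N_Days     = 200
Sim_Rain   = rbinom(N_Days, size = 1, prob = 0.3)    # 1 = some rain, 0 = dry
Sim_Total  = rpois(N_Days, lambda = 20000)           # "population": cyclists on all bridges
True_Rate  = exp(-1.25 - 0.03 * Sim_Rain)            # share of cyclists using one bridge
Sim_Bridge = rpois(N_Days, lambda = Sim_Total * True_Rate)

# Form 1: offset supplied through the offset argument
Fit_Argument = glm(Sim_Bridge ~ Sim_Rain, offset = log(Sim_Total),
                   family = poisson(link = "log"))

# Form 2: offset written inside the model formula
Fit_Formula = glm(Sim_Bridge ~ Sim_Rain + offset(log(Sim_Total)),
                  family = poisson(link = "log"))

# Both forms fix the coefficient of log(Sim_Total) at 1, so the estimates agree
print(all.equal(coef(Fit_Argument), coef(Fit_Formula)))

# exp(intercept) estimates the baseline rate (the bridge's share on dry days)
print(exp(coef(Fit_Argument)[["(Intercept)"]]))
```

Because the offset's coefficient is fixed rather than estimated, the remaining coefficients describe changes in the log rate, which is why they can be read the same way as in the counts model.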

2.1 Poisson Regression (Counts)

Below is a summary of the Poisson counts regression model I created, with measures of temperature range and averages, precipitation amount and day of the week all functioning as predictors of how many cyclists crossed the Manhattan Bridge in or out of our three boroughs of interest.

# Counts Model:
  # Response = Manhattan
  # Predictors = Day, MeanTemp, TempDiff, Precipitation
    # Day is stored as a Factor

Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)

Counts_Model_Sum = summary(Counts_Model)
Counts_Model_Coef = Counts_Model_Sum$coefficients

invisible(Counts_Model_Coef)
kable(Counts_Model_Coef, caption = "<b><center> Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")
Table:
Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.5013371 0.0421490 201.697286 0.0000000
DayMonday 0.3199236 0.0089935 35.572866 0.0000000
DayTuesday 0.3357894 0.0093242 36.012843 0.0000000
DayWednesday 0.4023102 0.0090796 44.309388 0.0000000
DayThursday 0.2807381 0.0096557 29.074878 0.0000000
DayFriday 0.1331873 0.0107366 12.405032 0.0000000
DaySaturday -0.0859127 0.0097279 -8.831613 0.0000000
MeanTemp -0.0029260 0.0006129 -4.774076 0.0000018
TempDiff 0.0143243 0.0008575 16.703807 0.0000000
Precipitation -0.4307477 0.0104214 -41.332836 0.0000000
# All predictor variables are significant

In the model, we can see that every predictor variable is statistically significant as per p values well below the standard of 0.05, so no stepwise regression or model simplification is necessary.

As for the practical implications of our model summary, we can say that although every predictor variable is statistically significant, the magnitudes of most estimated effects on the log scale are relatively small. Precipitation’s estimated negative effect on the log mean of Manhattan Bridge cyclists has an absolute value of about 0.4307, which is the highest of all our predictors.

It appears that the day’s average temperature and the difference between the daily high and low had very little practical association with the log mean of that day’s cyclists. When we look at the difference in log means from a day-of-the-week perspective, we do see a somewhat larger effect. With Sunday coded as the baseline, Wednesday has the greatest estimated cyclist traffic and Saturday has the least. This higher count of cyclists during the workweek could be due to the Manhattan Bridge serving as a commuting route for many riders.

All in all, our Poisson counts model yields some interesting and statistically significant findings, most notably that cyclist counts are far more strongly associated with precipitation than with temperature fluctuation, and that cyclist traffic rises from Monday to a Wednesday peak before falling off toward the weekend. However, the relatively small magnitude of most of the estimated effects is a downside regarding the model’s utility.
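One way to gauge practical size is to move the coefficients off the log scale. The sketch below copies a few estimates from the Poisson counts table above and exponentiates them; `Counts_Estimates`, `Rate_Ratios` and `Percent_Change` are names introduced here for illustration and are not part of the original analysis.

```r
# Selected estimates copied from the Poisson counts coefficient table
Counts_Estimates = c(
  DayWednesday  = 0.4023102,
  DaySaturday   = -0.0859127,
  MeanTemp      = -0.0029260,
  TempDiff      = 0.0143243,
  Precipitation = -0.4307477
)

# exp(beta) is the multiplicative change in the expected count for a
# one unit increase in the predictor (or relative to Sunday for a Day term)
Rate_Ratios = exp(Counts_Estimates)

# The same effects expressed as a percent change in the expected count
Percent_Change = round(100 * (Rate_Ratios - 1), 1)
print(Percent_Change)
```

Reading effects as percent changes puts the day-of-week and weather terms on a common footing. Keep in mind that the units differ: the Precipitation figure is per inch of rain, while the temperature figures are per degree.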

2.2 Poisson Regression (Rates)

After Poisson counts regression, I then performed Poisson rates regression, with the total number of cyclists entering and exiting our three boroughs of interest across all the East River Bridges serving as the “population” of which the Manhattan Bridge cyclists are a part.

This process consisted of me creating two different Poisson rates models. The first one I created listed both temperature variables as statistically insignificant. Given their status as statistically insignificant in this model, and their minute practical significance in the previous counts model, I chose to remove them and create a second Poisson rates model which did not factor in the day’s average or range of temperature.

### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients

invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "<b><center> Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")
Table:
Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.1844325 0.0418215 -28.3211719 0.0000000
DayMonday 0.0418134 0.0088829 4.7071774 0.0000025
DayTuesday 0.0549949 0.0094416 5.8247706 0.0000000
DayWednesday 0.0316272 0.0090743 3.4853702 0.0004915
DayThursday 0.0048565 0.0096974 0.5008067 0.6165072
DayFriday -0.0167479 0.0108925 -1.5375635 0.1241554
DaySaturday -0.0667274 0.0097414 -6.8498669 0.0000000
MeanTemp -0.0010004 0.0006053 -1.6527512 0.0983815
TempDiff 0.0008449 0.0008628 0.9792330 0.3274649
Precipitation -0.0306511 0.0095235 -3.2184824 0.0012887
### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients

invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "<b><center> Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")
Table:
Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists

Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2497005 0.0065309 -191.3530685 0.0000000
DayMonday 0.0417392 0.0088231 4.7306551 0.0000022
DayTuesday 0.0522471 0.0090521 5.7718313 0.0000000
DayWednesday 0.0285783 0.0088706 3.2216687 0.0012745
DayThursday 0.0006398 0.0091826 0.0696739 0.9444532
DayFriday -0.0205402 0.0106430 -1.9299220 0.0536165
DaySaturday -0.0684652 0.0096858 -7.0685797 0.0000000
Precipitation -0.0288171 0.0093266 -3.0897749 0.0020031

Looking at the findings of our second Poisson rates regression model, we see a pattern similar to that of our Poisson counts regression model: several predictors are statistically significant, but the regression coefficients are small in magnitude, so practical significance is limited.

Once again treating Sunday as our baseline, the rate of Manhattan Bridge cyclists in proportion to all East River Bridge cyclists is at its highest early in the week (Monday through Wednesday), with that rate declining going into the weekend. That being said, the Thursday coefficient (p ≈ 0.94) and, to a lesser extent, the Friday coefficient (p ≈ 0.054) are not statistically significant at the 0.05 level. Any difference between those days and Sunday could therefore be due to random chance rather than a particular characteristic of the bridge that affects the experience of its cyclists only on those days.
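Dropping two predictors at once can also be checked with a single test rather than two separate z tests. The following is a sketch of a drop-in-deviance (likelihood ratio) comparison of nested Poisson rates models, run on simulated data under the assumption that temperature has no effect. The data frame `Sim` and all of its values are hypothetical. Only the model formulas mirror the ones used above.

```r
# Simulated data only; names and values are hypothetical
set.seed(2017)
N = 120
Sim = data.frame(
  Day = factor(rep(c("Sunday", "Monday", "Tuesday", "Wednesday",
                     "Thursday", "Friday", "Saturday"), length.out = N)),
  MeanTemp      = runif(N, min = 70, max = 85),
  TempDiff      = runif(N, min = 8, max = 20),
  Precipitation = rexp(N, rate = 5) * rbinom(N, size = 1, prob = 0.3),
  Total         = rpois(N, lambda = 19000)
)

# In this simulation the temperature variables have no effect on the rate
Sim$Manhattan = rpois(N, lambda = Sim$Total * exp(-1.25 - 0.03 * Sim$Precipitation))

Full_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation,
                 offset = log(Total), family = poisson(link = "log"), data = Sim)

Reduced_Model = glm(Manhattan ~ Day + Precipitation,
                    offset = log(Total), family = poisson(link = "log"), data = Sim)

# Tests whether MeanTemp and TempDiff jointly improve the fit (2 degrees of freedom)
Deviance_Test = anova(Reduced_Model, Full_Model, test = "Chisq")
print(Deviance_Test)
```

A large p value in the second row would support the reduced model. Note that this test, like the z tests, assumes the Poisson variance is correct, so it is only a rough guide when the data are overdispersed.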

2.3 Day of the Week Averages

Since both our counts and rates models suggested that the day of the week has the greatest association with the log mean of the Manhattan Bridge’s cyclists, I decided to calculate the average counts and rates per day to compare them to each other and the mean across all days considered. The table with this information is below.

Count_Averages = c(
  round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$Day == "Saturday"]))
)

AllDays_Rates_Avg = sum(Data$Manhattan)/sum(Data$Total)

Sun_Rates_Avg = sum(Data$Manhattan[Data$Day == "Sunday"])/sum(Data$Total[Data$Day == "Sunday"])

Mon_Rates_Avg = sum(Data$Manhattan[Data$Day == "Monday"])/sum(Data$Total[Data$Day == "Monday"])

Tues_Rates_Avg = sum(Data$Manhattan[Data$Day == "Tuesday"])/sum(Data$Total[Data$Day == "Tuesday"])

Wed_Rates_Avg = sum(Data$Manhattan[Data$Day == "Wednesday"])/sum(Data$Total[Data$Day == "Wednesday"])

Thur_Rates_Avg = sum(Data$Manhattan[Data$Day == "Thursday"])/sum(Data$Total[Data$Day == "Thursday"])

Fri_Rates_Avg = sum(Data$Manhattan[Data$Day == "Friday"])/sum(Data$Total[Data$Day == "Friday"])

Sat_Rates_Avg = sum(Data$Manhattan[Data$Day == "Saturday"])/sum(Data$Total[Data$Day == "Saturday"])

Day_Rates_Averages = c(AllDays_Rates_Avg, Sun_Rates_Avg, Mon_Rates_Avg, Tues_Rates_Avg, Wed_Rates_Avg, Thur_Rates_Avg, Fri_Rates_Avg, Sat_Rates_Avg)

Rate_Averages = round(Day_Rates_Averages, digits = 4)

Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

Counts_Difference = c(
  0, # Difference between the average count of all days and itself
  round(mean(Data$Manhattan[Data$Day == "Sunday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Monday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Tuesday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Wednesday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Thursday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Friday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Saturday"])) - round(mean(Data$Manhattan))
)

Rates_DifferenceB = c(
  0,
  Sun_Rates_Avg - AllDays_Rates_Avg,
  Mon_Rates_Avg - AllDays_Rates_Avg,
  Tues_Rates_Avg - AllDays_Rates_Avg,
  Wed_Rates_Avg - AllDays_Rates_Avg,
  Thur_Rates_Avg - AllDays_Rates_Avg,
  Fri_Rates_Avg - AllDays_Rates_Avg,
  Sat_Rates_Avg - AllDays_Rates_Avg
)

Rates_Difference = round(Rates_DifferenceB, digits = 4)

Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)

kable(Table, caption = "<b><center><span style='color:#000000;'>Distribution of Manhattan Bridge Cyclist Count and Rates July 2017</span></center></b>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center"
  )
Distribution of Manhattan Bridge Cyclist Count and Rates July 2017
Days Count_Averages Counts_Difference Rate_Averages Rates_Difference
All Days 5425 0 0.2885 0
Sunday 4690 -735 0.2865 -0.002
Monday 6001 576 0.2975 0.009
Tuesday 6363 938 0.302 0.0135
Wednesday 6938 1513 0.2949 0.0064
Thursday 5999 574 0.2868 -0.0017
Friday 4338 -1087 0.2775 -0.0109
Saturday 4031 -1394 0.2665 -0.022

The table provides greater detail on the implications of our Poisson count and rate models. Specifically, the counts of Manhattan Bridge cyclists from Monday through Thursday far outweigh the counts from Friday through Sunday: the average number of cyclists from Monday through Thursday is about 6,325, while the average from Friday through Sunday is about 4,353.

As for the rate of Manhattan Bridge cyclists relative to cyclists on all East River Bridges, we see that the Manhattan Bridge’s cyclist rate is slightly above average Monday through Wednesday, but below average Thursday through Sunday.
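The per-day averages above can also be computed without repeating one line per day. Here is a compact sketch using `tapply` on a small made-up data frame. `Toy` and its values are hypothetical, though the column names match the ones used in this report.

```r
# Toy data with the report's column names; the values are made up
Toy = data.frame(
  Day       = c("Sunday", "Monday", "Monday", "Sunday", "Tuesday"),
  Manhattan = c(4000, 6000, 6200, 4400, 6300),
  Total     = c(14000, 20000, 21000, 15000, 21000)
)

# Mean daily count for each day of the week
Count_By_Day = tapply(Toy$Manhattan, Toy$Day, mean)

# Pooled rate for each day: sum of bridge counts divided by sum of totals
Rate_By_Day = tapply(Toy$Manhattan, Toy$Day, sum) / tapply(Toy$Total, Toy$Day, sum)

# Overall benchmarks that each day is compared against
Overall_Count = round(mean(Toy$Manhattan))
Overall_Rate  = sum(Toy$Manhattan) / sum(Toy$Total)

Summary_By_Day = data.frame(
  Day               = names(Count_By_Day),
  Count_Averages    = round(as.numeric(Count_By_Day)),
  Counts_Difference = round(as.numeric(Count_By_Day)) - Overall_Count,
  Rate_Averages     = round(as.numeric(Rate_By_Day), 4),
  Rates_Difference  = round(as.numeric(Rate_By_Day) - Overall_Rate, 4)
)
print(Summary_By_Day)
```

`tapply` orders its groups alphabetically, so a factor with explicit levels would be needed to print the rows in Sunday-to-Saturday order.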

2.4 Poisson Modeling Takeaways

To conclude, any actions taken in response to our Poisson models’ findings should be taken with some degree of caution due to the low practical significance found in both our count and rate models. That being said, there are still valuable takeaways that we can draw from our analysis.

First, the Manhattan Bridge is clearly busier in raw volume from Monday through Thursday than it is from Friday through Sunday, and it carries a slightly above-average share of the overall East River Bridge traffic from Monday through Wednesday. Second, the daily average temperature and the difference between that day’s high and low played very little, if any, practical role in the count or rate of cyclists on any given day, whereas precipitation does appear to have a relatively noticeable negative association with the number of that day’s cyclists on the Manhattan Bridge.

3 Quasi-Poisson Regression Modeling

In addition to analyzing our data via Poisson regression, I decided to also create a quasi-Poisson model of the data. Quasi-Poisson modeling is an alternative to Poisson modeling, and it is particularly valuable when the variance of the model’s response variable (number of cyclists on the Manhattan Bridge in this case) is not approximately equal to its mean, a condition known as overdispersion (variance greater than the mean) or underdispersion (variance less than the mean).

For my quasi-Poisson model, I included that day’s average temperature, day of the week and precipitation amount as the relevant factors. Day of the week obviously played the biggest role in our previous Poisson models, with precipitation consistently being cited as statistically significant despite relatively low practical significance. For this model, I chose to discretize precipitation, with days of no recorded rain being marked as “0” and days with any amount of rain being marked as “1.”

# NewPrecip: 0 = no recorded rain that day, 1 = any amount of rain
Data$NewPrecip = as.integer(Data$Precipitation > 0)

Data = data.frame(Data$Date, Data$Day, Data$Day_Num, Data$HighTemp, Data$LowTemp, Data$MeanTemp, Data$TempDiff, Data$Precipitation, Data$NewPrecip, Data$Manhattan, Data$Total)
colnames(Data) = c("Date", "Day", "Day_Num", "HighTemp", "LowTemp", "MeanTemp","TempDiff", "Precipitation", "NewPrecip","Manhattan", "Total")


# 1.) Below is the quasi-Poisson regression model
  # As instructed, only includes Day, MeanTemp and NewPrecip

Quasi_Counts_Model = glm(Manhattan ~ Day + MeanTemp + NewPrecip, family = quasipoisson, data = Data)
  
Quasi_Counts_Model_Sum = summary(Quasi_Counts_Model)
Quasi_Counts_Model_Coef = Quasi_Counts_Model_Sum$coefficients

invisible(Quasi_Counts_Model_Coef)
kable(Quasi_Counts_Model_Coef, caption = "<b><center> Quasi-Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")
Table:
Quasi-Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.0428119 0.5287239 15.2117416 0.0000000
DayMonday 0.3201598 0.1180432 2.7122270 0.0127244
DayTuesday 0.2356884 0.1223379 1.9265366 0.0670535
DayWednesday 0.3064451 0.1211249 2.5299920 0.0190731
DayThursday 0.2542985 0.1240203 2.0504581 0.0524206
DayFriday 0.0302193 0.1365853 0.2212482 0.8269398
DaySaturday -0.0805066 0.1301538 -0.6185500 0.5425648
MeanTemp 0.0062851 0.0068223 0.9212583 0.3669084
NewPrecip -0.4057049 0.0921307 -4.4035809 0.0002251

A summary of the quasi-Poisson counts model can be seen above. The estimates of this model are broadly similar to those of our original Poisson counts model, although its standard errors are much larger. However, before we can determine which one is superior for interpretive use, we must calculate this quasi-Poisson model’s dispersion parameter, “phi hat” (\(\hat{\phi}\)).

3.1 Dispersion and Counts Model Selection

\(\hat{\phi}\) is used in quasi-Poisson regression to determine whether our data’s response variable is overdispersed or underdispersed. Generally, a phi hat value of around 1 indicates an approximately equal mean and variance of the response. If a quasi-Poisson model’s dispersion value is substantially different from 1, then that model should be used for associative analysis rather than its traditional Poisson counterpart, because the quasi-Poisson model scales the standard errors by \(\sqrt{\hat{\phi}}\), whereas the Poisson standard errors would be too small under overdispersion. However, if \(\hat{\phi}\) ~ 1, then the traditional Poisson model is adequate and simpler to work with. The formula for \(\hat{\phi}\)’s calculation can be seen below, where \(y_i\) is the observed count, \(\hat{\mu}_i\) is the fitted mean, \(n\) is the number of observations and \(p\) is the number of model parameters.

\[\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}\]

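n = nrow(Data)
# NOTE: p = 3 counts the predictor variables (Day, MeanTemp, NewPrecip). The fitted
# model estimates 9 coefficients (intercept, 6 Day indicators, MeanTemp, NewPrecip),
# so its residual degrees of freedom are n - 9. summary(Quasi_Counts_Model)$dispersion
# divides by n - 9 and is therefore somewhat larger than the estimate computed here.
p = 3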
Pearson_Residuals = residuals(Quasi_Counts_Model, type = "pearson")
Sq_Pearson_Residuals = Pearson_Residuals^2
Dispersion_Parameter = (sum(Sq_Pearson_Residuals))/(n-p)

#### Double checked phi's value using Prof's coding method; got same result
  ydif=Data$Manhattan-exp(Quasi_Counts_Model$linear.predictors)  # diff between y and yhat
  prsd = ydif/sqrt(exp(Quasi_Counts_Model$linear.predictors))   # Pearson residuals
  phi_check = sum(prsd^2)/(n-p)
#### 
  
invisible(Dispersion_Parameter)
invisible(phi_check)

Our model yielded a value of \(\hat{\phi}\) ~ 142, which is far greater than 1 and indicates severe overdispersion in the response variable. For this reason, the quasi-Poisson counts model is more appropriate for associative analysis than the Poisson counts model, and we will use the quasi-Poisson model for our ultimate interpretations.
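As a cross-check on the dispersion calculation, here is a sketch on simulated overdispersed counts (negative binomial draws, not the cyclist data). Every name in it is hypothetical. It computes the Pearson-based estimate using the model's residual degrees of freedom, which is the denominator R's `summary()` uses, and shows how the estimate rescales the Poisson standard errors.

```r
# Simulated overdispersed counts; names and values are hypothetical
set.seed(142)
N = 300
X = rnorm(N)
# Negative binomial: variance = mu + mu^2 / size, which exceeds the mean
Y = rnbinom(N, mu = exp(3 + 0.5 * X), size = 2)

Fit_Quasi   = glm(Y ~ X, family = quasipoisson)
Fit_Poisson = glm(Y ~ X, family = poisson(link = "log"))

# Pearson chi-square divided by residual degrees of freedom
# (n minus the number of estimated coefficients)
Pearson_Chi_Sq = sum(residuals(Fit_Quasi, type = "pearson")^2)
Phi_Hat = Pearson_Chi_Sq / df.residual(Fit_Quasi)

# Compare the hand calculation with the value reported by summary()
print(c(By_Hand = Phi_Hat, From_Summary = summary(Fit_Quasi)$dispersion))

# Quasi-Poisson standard errors are the Poisson ones inflated by sqrt(phi hat)
SE_Ratio = summary(Fit_Quasi)$coefficients[, "Std. Error"] /
  summary(Fit_Poisson)$coefficients[, "Std. Error"]
print(SE_Ratio)
print(sqrt(summary(Fit_Quasi)$dispersion))
```

The coefficient estimates are identical in the two fits. Only the standard errors, and therefore the test statistics and p values, change.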

3.2 Visual Aids

Referring to our quasi-Poisson model summary above, we see that the day’s average temperature does not appear to have a significant statistical or practical association with the Manhattan Bridge’s number of cyclists. However, there does appear to be a significant difference between the number of cyclists on a totally clear day and on a day with at least some precipitation (recorded via the variable NewPrecip). And, as consistently seen in our original Poisson regression models, there is a large difference in the typical number of cyclists depending on the day of the week.

Knowing this, I created two visuals below to show how the day of the week and the presence of precipitation relate to each other and to the typical number of cyclists on the Manhattan Bridge throughout July 2017. Unfortunately, there were no Tuesdays or Wednesdays with precipitation in this study, resulting in missing values (NaN) in our table and gaps in the bar chart below.

#### Table
Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

No_Precipitation = c(
  round(mean(Data$Manhattan[Data$NewPrecip == 0])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Saturday"]))
)

Some_Precipitation = c(
  round(mean(Data$Manhattan[Data$NewPrecip == 1])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Saturday"]))
)

Vis_Table = data.frame(Days, No_Precipitation, Some_Precipitation)

kable(Vis_Table, caption = "<b><center><span style='color:#000000;'>Average Cyclist Counts on the Manhattan Bridge July 2017</span></center></b>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")
Average Cyclist Counts on the Manhattan Bridge July 2017
Days No_Precipitation Some_Precipitation
All Days 6008 3746
Sunday 4924 3756
Monday 7408 3892
Tuesday 6363 NaN
Wednesday 6938 NaN
Thursday 6006 5980
Friday 5802 2874
Saturday 4484 3352
#### Barchart
Vis_long =
  Vis_Table %>%
  pivot_longer(
    cols = c(No_Precipitation, Some_Precipitation),
    names_to = "Precipitation",
    values_to = "Manhattan"
  )

Vis_long$Days = factor(
  Vis_long$Days,
  levels = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)
ggplot(Vis_long,
       aes(x = Days, y = Manhattan, fill = Precipitation)) +
  geom_bar(stat = "identity",
           position = position_dodge(width = 0.9),
           na.rm = TRUE) +
  labs(
    title = "Average Cyclist Counts on the Manhattan Bridge July 2017",
    x = "Day of the Week",
    y = "Number of Cyclists",
    fill = "Precipitation"
  ) +
  scale_fill_manual(
    values = c("No_Precipitation" = "darkred",
               "Some_Precipitation"    = "lightblue")
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

The depictions above add clarity to the summary statistics from our quasi-Poisson model. Sunday was the baseline for our regression model’s calculations, and all remaining days other than Saturday had positive coefficients. We can see in both our chart and table that the count of cyclists is higher during the week than it is on the weekends. As for the role of precipitation, our discretized variable NewPrecip had a regression coefficient of about -0.41 (the largest absolute value of any predictor in the model) and a significant p value of about 0.0002. That finding can be intuitively confirmed by taking a glance at our visuals. Other than Thursday, every day with data points for both precipitation and no precipitation shows a noticeable decrease in cyclists when precipitation is present.

4 Conclusion

Across both the Poisson and quasi-Poisson modeling techniques, our findings were relatively consistent. They are as follows:

  • The Manhattan Bridge is far busier during the week than it is on the weekend.
  • At least in the month of July, the temperature does not play any sort of significant role in the raw number of the bridge’s cyclists nor its share of the totality of East River Bridge cyclists.
  • Although temperature is not significantly associated with cyclist traffic, precipitation is. Any amount of precipitation has a negative and statistically significant relationship with Manhattan Bridge’s cyclist traffic.

Regarding the choice between our Poisson and quasi-Poisson models for estimating the association between all these factors, the quasi-Poisson model is more appropriate due to this dataset’s extremely high dispersion parameter (\(\hat{\phi}\) ~ 142).

If we were to continue or expand on this analysis in the future, it would be valuable to expand the scope of our data outside of the month of July and into months that border on seasonal changes such as March, April or October. Intuitively, one might guess that the day’s temperatures play a much larger role in a time of the year like that.

---
title: "Poisson and Quasi-Poisson Analysis of Relationship between Weather and Day of Week with Cyclist Traffic on Manhattan Bridge"
author: "Chris Bahm"
date: "2025-11-10"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: true
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:no-repeat;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

# Load each package used in this report, installing it first if it is missing.
# ggplot2 and dplyr are attached as part of tidyverse.
Packages = c("knitr", "tidyverse", "GGally", "kableExtra", "car", "pander", "scales")

for (Pkg in Packages) {
  if (!requireNamespace(Pkg, quietly = TRUE)) {
    install.packages(Pkg)
  }
  library(Pkg, character.only = TRUE)
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE
)

```

# Introduction and Background 
Here we have a dataset sourced from New York City's Traffic Information Management System (TIMS). TIMS recorded the number of cyclists entering and leaving three of New York City's five boroughs - Queens, Manhattan and Brooklyn - via a collection of bridges known as the East River Bridges (Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge). These recordings took place in 2017. April, July and October are the three months that are present in our available copy of the data.

For today's analysis we are going to look at a randomly selected subset of the larger dataset (chosen using R's runif function). The subset pertains to cyclists who entered and left our three boroughs of interest - Queens, Manhattan and Brooklyn - via the **Manhattan** Bridge throughout the entire month of July 2017. This data has 31 observations, one for each day, and no missing values. A breakdown of the original dataset's variables, their practical meanings and their data types is below.

```{r Variable Table, echo=FALSE}
library(knitr)

Var_Table = data.frame(
  Name = c("Date",
           "Day", 
           "HighTemp", 
           "LowTemp",
           "Precipitation", 
           "Manhattan", 
           "Total"),
  
  Meaning = c("Date for that observation; YYYY-MM-DD form", 
              "Day of the week for that observation",
              "That day's highest recorded temperature", 
              "That day's lowest recorded temperature", 
              "Measure of rain that day (inches)",  
              "Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge",
              "Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges"), 
              
  
  Data_Type = c("Date", "character", "double", "double", "double", "double", "double"))

kable(Var_Table) %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

```
## Objective of Analysis
With the available data, my goal for this analysis is to examine the association between weather conditions and day of the week with the amount of cyclist traffic that the Manhattan Bridge experiences. In order to do this, I created two new variables - MeanTemp and TempDiff - which were calculated by averaging that particular day's low and high temperatures and finding the difference between those temperatures respectively. 

Using these temperature-related metrics, along with measures of precipitation and records of the day of the week, I will use Poisson and quasi-Poisson regression techniques to see which if any of these factors play a particular role in the overall amount *or* the relative rate of cyclist traffic that the Manhattan Bridge experiences.

```{r Data Loading and Cleaning, include=FALSE}
library(openxlsx)
library(dplyr) # glimpse() is used below
options(scipen = 999)

# round(runif(1, min = 1, max = 10))
  # Used line above to randomly select which subset of the data to do my analysis on. The fifth tab on the original data Excel spreadsheet was for observations on the Manhattan Bridge from 7/1 to 7/31

Data = read.xlsx("https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/NYC_Cyclists_Data.xlsx", sheet = "Manhattan 2")

glimpse(Data)

# Converting variable Date to a date object.
# Origin = 1899-12-30 since Excel stores data values as the number of days since then.
Data$Date = as.Date(Data$Date, origin = "1899-12-30")

# Convert Day to a Date object first, then extract the weekday name with weekdays().
Data$Day = as.Date(Data$Day, origin = "1899-12-30")
Data$Day = weekdays(Data$Day)

  # Creation of Day_Num variable
Data$Day_Num[Data$Day == "Sunday"] = 1
Data$Day_Num[Data$Day == "Monday"] = 2
Data$Day_Num[Data$Day == "Tuesday"] = 3
Data$Day_Num[Data$Day == "Wednesday"] = 4
Data$Day_Num[Data$Day == "Thursday"] = 5
Data$Day_Num[Data$Day == "Friday"] = 6
Data$Day_Num[Data$Day == "Saturday"] = 7

  Data$MeanTemp = (Data$HighTemp + Data$LowTemp)/2
  Data$TempDiff = Data$HighTemp - Data$LowTemp
  Data$Day = factor(Data$Day,
                  levels = c("Sunday", "Monday", "Tuesday", "Wednesday", 
                             "Thursday", "Friday", "Saturday"))
            # Since Sunday is the first level of the factor listed here, it will be recognized as the baseline by R

# Reorder columns for ideal visual perception
Data = data.frame(Data$Date, Data$Day, Data$Day_Num, Data$HighTemp, Data$LowTemp, Data$MeanTemp, Data$TempDiff, Data$Precipitation, Data$Manhattan, Data$Total)
colnames(Data) = c("Date", "Day", "Day_Num", "HighTemp", "LowTemp", "MeanTemp","TempDiff", "Precipitation", "Manhattan", "Total")

glimpse(Data)
```
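
The date handling in the loading chunk above can be illustrated on its own. The following is a minimal sketch rather than the code used for the analysis: the serial number 42917 is an illustrative value (it corresponds to July 1, 2017 under the 1899-12-30 origin), and `as.POSIXlt()` is used in place of `weekdays()` because its `wday` field does not depend on the system locale.

```r
# Illustrative Excel serial number (an assumption for this sketch, not read from the data)
excel_serial <- 42917

# Excel serial dates count days from 1899-12-30
d <- as.Date(excel_serial, origin = "1899-12-30")

# wday runs from 0 (Sunday) to 6 (Saturday), so adding 1 gives a Sunday-first
# numbering like the Day_Num variable created above
day_num <- as.POSIXlt(d)$wday + 1

format(d)   # "2017-07-01"
day_num     # 7, i.e. Saturday
```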

# Poisson Regression Modeling
To explore these potential associations, I fit two types of Poisson regression models: one for counts and one for rates.

Poisson counts regression models the total number of occurrences of a particular event (in this case, cyclists on the Manhattan Bridge). It uses a log link to relate the mean of the response to the explanatory variables, which lets us determine which, if any, of them have a significant association with that mean. The formula for this regression is below:

```{r, echo=FALSE}
include_graphics("Poisson_Model_Form.png")
```

- $\beta$~0~ = the log of our response variable's mean when every predictor is zero (or at its baseline level); not very useful for practical interpretation 

- $\beta$~1~, $\beta$~2~, $\beta$~3~, ... $\beta$~p~ = the change in our response variable's log mean associated with a one-unit increase in the corresponding predictor variable, holding the other predictors constant

 <br> 

Additionally, Poisson rates regression models the expected rate at which an event occurs relative to the size of a larger "population." In this dataset and analysis, the variable Total, which represents the **total** number of cyclists on all the East River Bridges, is the quantity of which the number of cyclists on the Manhattan Bridge is treated as a proportion. This type of Poisson regression is calculated much like counts regression, but the logarithm of the population variable enters the model as an offset, that is, a term whose coefficient is fixed at 1 rather than estimated. This can be expressed in both of the following ways.
<br>
```{r, echo=FALSE}
include_graphics("Poisson_Form_Rates1.png")
```
 <br> 
```{r, echo=FALSE}
include_graphics("Poisson_Model_Form_Rates.png")
```
 <br> 
 
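 - In Poisson rates regression, the parameters $\beta$~0~, .... $\beta$~p~ are interpreted in the same manner as in the Poisson counts model, except that they describe the log of the rate (Manhattan Bridge cyclists per East River Bridge cyclist) rather than the log of the mean count.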
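
For readers without access to the formula images, the two model forms can be written out under the standard GLM definitions. This is a sketch in assumed notation, which may differ from the images: $\mu_i$ is the expected Manhattan Bridge count on day $i$, $t_i$ is that day's Total, and $x_{i1}, \dots, x_{ip}$ are the predictors.

$$
\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
$$

$$
\log\left(\frac{\mu_i}{t_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\quad\Longleftrightarrow\quad
\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
$$

The first equation is the counts model. The second shows the rates model in its two equivalent forms, with $\log(t_i)$ acting as the offset.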

## Poisson Regression (Counts)
Below is a summary of the Poisson counts regression model I created, with measures of temperature range and averages, precipitation amount and day of the week all functioning as predictors of how many cyclists crossed the Manhattan Bridge in or out of our three boroughs of interest.
```{r}
# Counts Model:
  # Response = Manhattan
  # Predictors = Day, MeanTemp, TempDiff, Precipitation
    # Day is stored as a Factor

Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)

Counts_Model_Sum = summary(Counts_Model)
Counts_Model_Coef = Counts_Model_Sum$coefficients

invisible(Counts_Model_Coef)
kable(Counts_Model_Coef, caption = "<b><center> Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")

# All predictor variables are significant
```
In the model, we can see that *every* predictor variable is statistically significant, with p-values well below the standard threshold of 0.05, so no stepwise regression or model simplification is necessary. 

As for the practical implications of our model summary, although every predictor variable is statistically significant, the magnitude of their estimated effects is relatively small. Precipitation's estimated negative effect on the log mean of Manhattan Bridge cyclists has an absolute value of about 0.4307, which is the highest of all our predictors.
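
Because the model uses a log link, one way to gauge practical size is to exponentiate a coefficient, which turns it into a multiplicative change in the expected count. The following is a minimal sketch using the precipitation estimate quoted above (taken as -0.4307, since the effect is described as negative); the variable names are illustrative.

```r
# Precipitation coefficient on the log scale, as quoted in the text above
beta_precip <- -0.4307

# exp(beta) is the multiplicative change in the expected count per
# one-unit (one-inch) increase in precipitation, other predictors held fixed
rate_ratio <- exp(beta_precip)

# The same quantity expressed as a percent change
percent_change <- (rate_ratio - 1) * 100

round(rate_ratio, 2)      # about 0.65
round(percent_change, 0)  # about -35
```

The same transformation can be applied to every coefficient at once with `exp(coef(Counts_Model))`.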

It appears that the day's average temperature and the difference between the daily high and low had very little practical significance for the log mean of that day's cyclists. When we look at the difference in log means from a day-of-the-week perspective, we see a slightly larger effect. With Sunday coded as the baseline, Wednesday has the greatest amount of cyclist traffic and Saturday has the least. This higher count of cyclists during the workweek could be due to the Manhattan Bridge serving as a commuting route for many riders.

All in all, our Poisson counts model yields some interesting and statistically significant findings, most notably that cyclists care far more about precipitation than they do about temperature fluctuation, and that cyclist traffic appears to tick upwards throughout the workweek before dying down for the weekend. However, the relatively small magnitude of each variable's estimated effect limits the model's practical utility. 

## Poisson Regression (Rates)
After the Poisson counts regression, I performed a Poisson rates regression, using the total number of cyclists entering and exiting our three boroughs of interest across *all* the East River Bridges as the "population" of which the Manhattan Bridge cyclists are a part. 

I created two different Poisson rates models. In the first, both temperature variables were statistically insignificant. Given that result, and their minute practical significance in the previous counts model, I removed them and fit a second Poisson rates model that does not include the day's average temperature or temperature range.
```{r}
### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients

invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "<b><center> Poisson Rates Regression (1): Weather and Schedule Relationship with Rate of Manhattan Bridge Cyclists </center></b>")


### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients

invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "<b><center> Poisson Rates Regression (2): Precipitation and Schedule Relationship with Rate of Manhattan Bridge Cyclists </center></b>")

```
Looking at our second Poisson rates regression model, we see a trend similar to that of our Poisson counts regression model: statistical significance is common, but the magnitudes of the regression coefficients indicate little practical significance.

Once again treating Sunday as our baseline, the rate of Manhattan Bridge cyclists as a proportion of all East River Bridge cyclists is highest early in the week and declines going into the weekend. That said, the statistical significance of this pattern weakens considerably for Thursday and, to a much lesser but still noticeable extent, Friday. This suggests that the decline in the Manhattan Bridge's cyclist rate at the tail end of the workweek could be due to random chance rather than a characteristic of the bridge that affects its cyclists only on those particular days.
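
The role of the offset in the rates models above can be illustrated on simulated data. This is a minimal sketch under stated assumptions: the data frame `sim`, its columns and the true coefficients (-1.2 and 0.1) are all made up for the example and are unrelated to the cyclist data. It shows that passing `offset = log(total)` as an argument and writing `offset(log(total))` in the formula specify the same model.

```r
set.seed(321)

# Simulated exposure and predictor (hypothetical values, not the cyclist data)
n <- 200
total <- rpois(n, lambda = 20000)
x <- rnorm(n)

# True share of the exposure depends on x through a log link
share <- exp(-1.2 + 0.1 * x)
y <- rpois(n, lambda = total * share)
sim <- data.frame(y, x, total)

# Offset supplied as an argument, as in the models above
fit_arg <- glm(y ~ x, offset = log(total), family = poisson, data = sim)

# Offset supplied as a formula term
fit_term <- glm(y ~ x + offset(log(total)), family = poisson, data = sim)

all.equal(coef(fit_arg), coef(fit_term))  # TRUE: both forms fit the same model

# Exponentiating the intercept gives the estimated share when x = 0,
# which should land close to exp(-1.2)
exp(coef(fit_arg)[["(Intercept)"]])
```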

## Day of the Week Averages
Since both our counts and rates models suggested that the day of the week has the greatest association with the log mean of the Manhattan Bridge's cyclists, I decided to calculate the average counts and rates per day to compare them to each other and the mean across all days considered. The table with this information is below.
```{r}
Count_Averages = c(
  round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$Day == "Saturday"]))
)

AllDays_Rates_Avg = sum(Data$Manhattan)/sum(Data$Total)

Sun_Rates_Avg = sum(Data$Manhattan[Data$Day == "Sunday"])/sum(Data$Total[Data$Day == "Sunday"])

Mon_Rates_Avg = sum(Data$Manhattan[Data$Day == "Monday"])/sum(Data$Total[Data$Day == "Monday"])

Tues_Rates_Avg = sum(Data$Manhattan[Data$Day == "Tuesday"])/sum(Data$Total[Data$Day == "Tuesday"])

Wed_Rates_Avg = sum(Data$Manhattan[Data$Day == "Wednesday"])/sum(Data$Total[Data$Day == "Wednesday"])

Thur_Rates_Avg = sum(Data$Manhattan[Data$Day == "Thursday"])/sum(Data$Total[Data$Day == "Thursday"])

Fri_Rates_Avg = sum(Data$Manhattan[Data$Day == "Friday"])/sum(Data$Total[Data$Day == "Friday"])

Sat_Rates_Avg = sum(Data$Manhattan[Data$Day == "Saturday"])/sum(Data$Total[Data$Day == "Saturday"])

Day_Rates_Averages = c(AllDays_Rates_Avg, Sun_Rates_Avg, Mon_Rates_Avg, Tues_Rates_Avg, Wed_Rates_Avg, Thur_Rates_Avg, Fri_Rates_Avg, Sat_Rates_Avg)

Rate_Averages = round(Day_Rates_Averages, digits = 4)

Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

Counts_Difference = c(
  0, # Difference between the average count of all days and itself
  round(mean(Data$Manhattan[Data$Day == "Sunday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Monday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Tuesday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Wednesday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Thursday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Friday"])) - round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Saturday"])) - round(mean(Data$Manhattan))
)

Rates_DifferenceB = c(
  0,
  Sun_Rates_Avg - AllDays_Rates_Avg,
  Mon_Rates_Avg - AllDays_Rates_Avg,
  Tues_Rates_Avg - AllDays_Rates_Avg,
  Wed_Rates_Avg - AllDays_Rates_Avg,
  Thur_Rates_Avg - AllDays_Rates_Avg,
  Fri_Rates_Avg - AllDays_Rates_Avg,
  Sat_Rates_Avg - AllDays_Rates_Avg
)

Rates_Difference = round(Rates_DifferenceB, digits = 4)

Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)

kable(Table, caption = "<b><center><span style='color:#000000;'>Distribution of Manhattan Bridge Cyclist Count and Rates July 2017</span></center></b>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center"
  )

```

The table provides greater detail on the implications of our Poisson count and rate models: weekday counts of Manhattan Bridge cyclists (specifically Monday - Thursday) far outweigh the counts from Friday to Sunday. The average number of cyclists from Monday - Thursday is about 6,325, while the average from Friday - Sunday is about 4,353.

As for the rate of Manhattan Bridge cyclists relative to cyclists on all East River Bridges, we see that the Manhattan Bridge's cyclist rate is slightly above average Monday - Wednesday, but then below average Thursday through Sunday.
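
The per-day averages above are computed one day at a time. A more compact way to get the same kind of summary is `tapply()`, sketched here on a small made-up data frame (`toy` and its values are assumptions for illustration only). Note that the rate is the ratio of summed counts, matching the approach in the chunk above, rather than the mean of daily ratios.

```r
# Tiny illustrative data set (hypothetical values, not the cyclist data)
toy <- data.frame(
  Day = factor(c("Sunday", "Sunday", "Monday", "Monday"),
               levels = c("Sunday", "Monday")),
  Manhattan = c(100, 300, 500, 700),
  Total = c(1000, 1000, 2000, 2000)
)

# Mean count per day of the week
count_avg <- tapply(toy$Manhattan, toy$Day, mean)

# Rate per day of the week: summed Manhattan counts over summed Totals
rate_avg <- tapply(toy$Manhattan, toy$Day, sum) /
  tapply(toy$Total, toy$Day, sum)

count_avg  # Sunday 200, Monday 600
rate_avg   # Sunday 0.2, Monday 0.3
```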


## Poisson Modeling Takeaways
To conclude, any actions taken in response to our Poisson models' findings should be taken with some degree of caution due to the low practical significance found in both our count and rate models. That being said, there are still valuable takeaways that we can draw from our analysis. 

First, the Manhattan Bridge is clearly busier, both in the sense of raw volume and as a proportion of the overall East River Bridge network, early and throughout the standard workweek than it is during the weekend. Second, the daily average temperature as well as the difference between that day's high and low played very little if any role in the count or rate of cyclists on any given day, but the measure of precipitation does appear to have a relatively noticeable and negative association with the number of that day's cyclists on the Manhattan Bridge.

# Quasi-Poisson Regression Modeling
In addition to analyzing our data via Poisson regression, I also created a quasi-Poisson model of the data. Quasi-Poisson modeling is an alternative to Poisson modeling, and it is particularly valuable when the mean and variance of the model's response variable (the number of cyclists on the Manhattan Bridge in this case) are not approximately equal to one another, a condition known as overdispersion when the variance exceeds the mean and underdispersion when it falls below it.

For my quasi-Poisson model, I included that day's average temperature, day of the week and precipitation amount as the relevant factors. Day of the week obviously played the biggest role in our previous Poisson models, with precipitation consistently being cited as statistically significant despite relatively low practical significance. For this model, I chose to discretize precipitation, with days of no recorded rain being marked as "0" and days with *any* amount of rain being marked as "1." 
```{r, Quasi-Poisson Model}
Data$NewPrecip = Data$Precipitation
Data$NewPrecip[Data$Precipitation == 0] = 0
Data$NewPrecip[Data$Precipitation > 0] = 1

Data = data.frame(Data$Date, Data$Day, Data$Day_Num, Data$HighTemp, Data$LowTemp, Data$MeanTemp, Data$TempDiff, Data$Precipitation, Data$NewPrecip, Data$Manhattan, Data$Total)
colnames(Data) = c("Date", "Day", "Day_Num", "HighTemp", "LowTemp", "MeanTemp","TempDiff", "Precipitation", "NewPrecip","Manhattan", "Total")


# 1.) Below is the quasi-Poisson regression model
  # As instructed, only includes Day, MeanTemp and NewPrecip

Quasi_Counts_Model = glm(Manhattan ~ Day + MeanTemp + NewPrecip, family = quasipoisson, data = Data)
  
Quasi_Counts_Model_Sum = summary(Quasi_Counts_Model)
Quasi_Counts_Model_Coef = Quasi_Counts_Model_Sum$coefficients

invisible(Quasi_Counts_Model_Coef)
kable(Quasi_Counts_Model_Coef, caption = "<b><center> Quasi-Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists </center></b>")
  
```

A summary of the quasi-Poisson counts model can be seen above. There is great similarity between the findings of this model and our original Poisson counts model. However, before we can determine which one is better suited for interpretation, we must calculate this quasi-Poisson model's dispersion parameter, "phi hat" ($\hat{\phi}$).
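
Some similarity between the two kinds of model is expected by construction. When a Poisson and a quasi-Poisson model are fit with the *same* formula, the coefficient estimates are identical and only the standard errors differ, by a factor of $\sqrt{\hat{\phi}}$. The sketch below demonstrates this on simulated overdispersed data; the data and the names `fit_pois` and `fit_quasi` are assumptions for illustration and are not the models fit in this report.

```r
set.seed(42)

# Simulated overdispersed counts (negative binomial draws; not the cyclist data)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, mu = exp(3 + 0.3 * x), size = 2)

# Same formula, two families
fit_pois <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Estimated dispersion from the quasi-Poisson fit
phi_hat <- summary(fit_quasi)$dispersion

se_pois <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]

all.equal(coef(fit_pois), coef(fit_quasi))    # TRUE: identical point estimates
all.equal(se_quasi, se_pois * sqrt(phi_hat))  # TRUE: SEs scaled by sqrt(phi_hat)
```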

## Dispersion and Counts Model Selection

$\hat{\phi}$ is used in quasi-Poisson regression to determine whether our response variable is over- or underdispersed. Generally, a $\hat{\phi}$ value of around 1 indicates an approximately equal mean and variance of the response. If a quasi-Poisson model's dispersion value is substantially different from 1, then that model should be used for associative analysis rather than its traditional Poisson counterpart, because the quasi-Poisson model scales the standard errors by $\sqrt{\hat{\phi}}$ to account for the extra (or reduced) variability. However, if $\hat{\phi} \approx 1$, then the traditional Poisson model should be used, as it is simpler and the adjustment would change very little. The formula for $\hat{\phi}$'s calculation can be seen below.

```{r,  fig.align="center", echo=FALSE}
include_graphics("Dispersion_Parameter.png")
```

```{r Dispersion Parameter}
n = nrow(Data)
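p = 3 # counts the three predictor variables (Day, MeanTemp, NewPrecip); note that summary() instead divides by the residual degrees of freedom (n minus the number of estimated coefficients)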
Pearson_Residuals = residuals(Quasi_Counts_Model, type = "pearson")
Sq_Pearson_Residuals = Pearson_Residuals^2
Dispersion_Parameter = (sum(Sq_Pearson_Residuals))/(n-p)

#### Double checked phi's value using Prof's coding method; got same result
  ydif=Data$Manhattan-exp(Quasi_Counts_Model$linear.predictors)  # diff between y and yhat
  prsd = ydif/sqrt(exp(Quasi_Counts_Model$linear.predictors))   # Pearson residuals
  phi_check = sum(prsd^2)/(n-p)
#### 
  
invisible(Dispersion_Parameter)
invisible(phi_check)
```

Our model yielded $\hat{\phi} \approx 142$, which is **far** above the value of 1 expected for a properly dispersed Poisson response variable. For this reason, the quasi-Poisson counts model is more appropriate for associative analysis than the Poisson counts model, and we will use it for our ultimate interpretations.
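
As a cross-check on the calculation above, the Pearson-based dispersion estimate can be compared with the value that `summary()` reports for a quasi-Poisson fit. The sketch below uses simulated data (an assumption for illustration, not the cyclist data) and divides by `df.residual()`, which is the number of observations minus the number of estimated coefficients.

```r
set.seed(2017)

# Simulated overdispersed counts (negative binomial draws; not the cyclist data)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, mu = exp(3 + 0.3 * x), size = 2)

fit <- glm(y ~ x, family = quasipoisson)

# Sum of squared Pearson residuals over the residual degrees of freedom
pearson <- residuals(fit, type = "pearson")
phi_manual <- sum(pearson^2) / df.residual(fit)

# Dispersion as reported by summary() for a quasi family
phi_summary <- summary(fit)$dispersion

all.equal(phi_manual, phi_summary)  # TRUE: the two calculations agree
```

For the model in this report, which has an intercept, six Day indicators, MeanTemp and NewPrecip, the residual degrees of freedom would be 31 - 9 = 22 rather than the 28 implied by `p = 3`. If the reported value of about 142 is rescaled accordingly, it becomes roughly 142 × 28 / 22 ≈ 181, which leaves the conclusion of heavy overdispersion unchanged.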

## Visual Aids
Referring to our quasi-Poisson model summary above, we see that the day's average temperature does not appear to have a significant statistical or practical association with the Manhattan Bridge's number of cyclists. However, there does appear to be a meaningful difference between the number of cyclists on a totally clear day and on a day with at least *some* level of precipitation (recorded via the variable NewPrecip). And, as consistently seen in our original Poisson regression models, there is certainly a large difference in the typical number of cyclists depending on the day of the week.

Knowing this, I created two visuals below to enhance our grasp of the relationship that both the day of the week and the presence of precipitation have with each other as well as the standard number of cyclists that were on the Manhattan Bridge throughout July 2017. Unfortunately, there were no instances of Tuesdays or Wednesdays with precipitation in this study, resulting in a blank in both our table and bar chart below.

```{r, fig.align='center', fig.width=10, fig.height=6}
#### Table
Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

No_Precipitation = c(
  round(mean(Data$Manhattan[Data$NewPrecip == 0])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 0 & Data$Day == "Saturday"]))
)

Some_Precipitation = c(
  round(mean(Data$Manhattan[Data$NewPrecip == 1])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$NewPrecip == 1 & Data$Day == "Saturday"]))
)

Vis_Table = data.frame(Days, No_Precipitation, Some_Precipitation)

kable(Vis_Table, caption = "<b><center><span style='color:#000000;'>Average Cyclist Counts on the Manhattan Bridge July 2017</span></center></b>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

#### Barchart
Vis_long =
  Vis_Table %>%
  pivot_longer(
    cols = c(No_Precipitation, Some_Precipitation),
    names_to = "Precipitation",
    values_to = "Manhattan"
  )

Vis_long$Days = factor(
  Vis_long$Days,
  levels = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)
ggplot(Vis_long,
       aes(x = Days, y = Manhattan, fill = Precipitation)) +
  geom_bar(stat = "identity",
           position = position_dodge(width = 0.9),
           na.rm = TRUE) +
  labs(
    title = "Average Cyclist Counts on the Manhattan Bridge July 2017",
    x = "Day of the Week",
    y = "Number of Cyclists",
    fill = "Precipitation"
  ) +
  scale_fill_manual(
    values = c("No_Precipitation" = "darkred",
               "Some_Precipitation"    = "lightblue")
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )
```

The depictions above add clarity to the summary statistics from our quasi-Poisson model. Sunday was the baseline for our regression model's calculations, and all remaining days other than Saturday had positive coefficients. We can see in both our chart and table that the count of cyclists is certainly higher throughout the week than it is on the weekends. As for the role of precipitation, our discretized variable NewPrecip had a regression coefficient with an absolute value of about 0.4 (the highest of any factor in the model) and a significant p-value of about .0002. That finding can be intuitively confirmed by a glance at our visuals: other than Thursday, every day with data points for both precipitation and no precipitation shows a *noticeable* decrease in cyclists when precipitation is present.

# Conclusion
Across both the Poisson and quasi-Poisson modeling techniques, our findings were relatively consistent:

- The Manhattan Bridge is far busier during the week than it is on the weekend.
- At least in the month of July, the temperature does not play any sort of significant role in the raw number of the bridge's cyclists nor its share of the totality of East River Bridge cyclists.
- Although temperature is not significantly associated with cyclist traffic, precipitation is. Any amount of precipitation has a negative and statistically significant relationship with Manhattan Bridge's cyclist traffic.

Of the Poisson and quasi-Poisson models used to estimate these associations, the quasi-Poisson model is preferable because of this dataset's *extremely* high dispersion parameter ($\hat{\phi} \approx 142$). 

If we were to continue or expand on this analysis in the future, it would be valuable to extend the scope of our data beyond the month of July and into months that border on seasonal changes, such as March, April or October. Intuitively, one might guess that the day's temperatures play a much larger role at those times of the year.

# References:

Original Dataset Source:

- https://pengdsci.github.io/STA321/ww09/w09-AssignDataSet.xlsx

Dataset Download Links via Github:

- https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/NYC_Cyclists_Data.xlsx