---
title: "Poisson Regression of Manhattan Bridge"
author: 'Tyler Battaglini'
date: "2024-11-3"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    fig_width: 4
    fig_caption: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options: 
  chunk_output_type: inline
always_allow_html: true
---

```{=html}

<style type="text/css">

/* Cascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in HTML or XML. It is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents. */

h1.title {  /* Title - font specifications of the report title */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}
h4.author { /* Header 4 - font specifications for authors  */
  font-size: 20px;
  font-weight: bold;
  font-family: system-ui;
  color: DarkRed;
  text-align: center;
}
h4.date { /* Header 4 - font specifications for the date  */
  font-size: 18px;
  font-weight: bold;
  font-family: system-ui;
  color: DarkBlue;
  text-align: center;
}
h1 { /* Header 1 - font specifications for level 1 section title  */
    font-size: 22px;
    font-weight: bold;
    font-family: system-ui;
    color: navy;
    text-align: left;
}
h2 { /* Header 2 - font specifications for level 2 section title */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - font specifications of level 3 section title  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - font specifications of level 4 section title  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

</style>
```

```{r setup, include=FALSE}
# Detect, install and load packages if needed.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("MASS")) {
   install.packages("MASS")
   library(MASS)
}
if (!require("nleqslv")) {
   install.packages("nleqslv")
   library(nleqslv)
}
#
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}

if (!require("psych")) {   
  install.packages("psych")
   library(psych)
}
if (!require("ggplot2")) {   
  install.packages("ggplot2")
   library(ggplot2)
}
if (!require("GGally")) {   
  install.packages("GGally")
   library(GGally)
}
if (!require("car")) {   
  install.packages("car")
   library(car)
}
if (!require("dplyr")) {   
  install.packages("dplyr")
   library(dplyr)
}
if (!require("caret")) {   
  install.packages("caret")
   library(caret)
}
if (!require("readxl")) {   
  install.packages("readxl")
   library(readxl)
}
if (!require("openxlsx")) {   
  install.packages("openxlsx")
   library(openxlsx)
}
# specifications of outputs of code in code chunks
knitr::opts_chunk$set(echo = TRUE,        # include code chunks in the output file
                      warning = FALSE,    # do not include warning messages in the output file
                      message = FALSE,    # do not include messages (e.g., package start-up notes)
                      results = "markup", # show text output in the usual way (the default)
                      comment = NA        # print output without the leading "##"
                      )   
```

# Introduction

## Description

The daily number of bicycles crossing the Manhattan Bridge was recorded for one month (31 days). The counts of cyclists entering and leaving the bridge are collected by the Traffic Information Management System (TIMS). Each record is the total number of cyclists crossing the Manhattan Bridge in a 24-hour period.

- Date - The date on which the number of bikes was recorded; this is our observation ID.
- Day - The day of the week (Monday through Sunday). In the CSV used here, Day holds the same numeric value as Date, so the models below treat it as a numeric day index.
- HighTemp - The high temperature on a given day (°F)
- LowTemp - The low temperature on a given day (°F)
- Precipitation - The amount of precipitation recorded that day, most likely in inches
- ManhattanBridge - The total number of cyclists on the Manhattan Bridge that day
- Total - The total number of bikes recorded on all the bridges that day

## Research Question
How do various predictors affect the number of cyclists on the city's bridges? This analysis aims to identify significant predictors of daily cyclist counts on the Manhattan Bridge using Poisson regression, focusing on temperature, day, and precipitation. Our response variable is the number of cyclists on the Manhattan Bridge (ManhattanBridge).

# Data Analysis

We first summarize the data to get a sense of its scope. 

```{r}
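# Read the daily cyclist counts for the Manhattan Bridge
Cyclists <- read.csv("https://raw.githubusercontent.com/TylerBattaglini/STA-321/refs/heads/main/ManhattanBridge.csv", header = TRUE)
summary(Cyclists)  # minimum, quartiles, mean and maximum of each column
var(Cyclists)      # variance-covariance matrix of all (numeric) columns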
```

## Poisson Regression on Cyclists Count

We first fit a Poisson regression with the Manhattan Bridge count as the response, leaving out Total. Poisson regression suits this model because the response is count data: the number of occurrences of an event (here, cyclists crossing the bridge) within a fixed interval of time (one day). The model assumes that the daily counts are independent of one another and that the variance of each count equals its mean.
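The equal mean and variance assumption can be checked informally by comparing the variance of a set of counts to their mean. The sketch below uses simulated data only (the mean of 5000 and the negative binomial `size` of 10 are arbitrary illustrative choices, not estimates from the bridge data): Poisson draws give a variance-to-mean ratio near 1, while overdispersed negative binomial draws with the same mean give a ratio far above 1.

```r
# Simulated counts only; mu = 5000 and size = 10 are illustrative assumptions
set.seed(2024)
n  <- 5000
mu <- 5000

pois_counts <- rpois(n, lambda = mu)            # Poisson: variance equals the mean
nb_counts   <- rnbinom(n, mu = mu, size = 10)   # negative binomial: variance = mu + mu^2 / size

# Variance-to-mean ratio; about 1 when the Poisson assumption holds
vm_ratio <- function(y) var(y) / mean(y)
vm_ratio(pois_counts)
vm_ratio(nb_counts)
```

For comparison, the `var(Cyclists)` output reports a variance of about 2.92 million for ManhattanBridge against a mean of about 5,298, a ratio above 500, although that marginal ratio ignores the predictors.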

```{r}
# Poisson regression of the daily Manhattan Bridge count on weather and day
model.freq <- glm(ManhattanBridge ~ HighTemp + Precipitation + LowTemp + Day,
                  family = poisson(link = "log"), data = Cyclists)
pois.count.coef = summary(model.freq)$coef
kable(pois.count.coef, caption = "The Poisson regression model for the counts of cyclists entering the bridge vs the temperature, precipitation, and day.")

```

The output above shows that all four predictors are significant at α = 0.05. These p-values, however, rely on the Poisson assumption that the variance equals the mean, which we check in the Dispersion Index section. Each coefficient is the change in the log of the expected count for a one-unit increase in that predictor, holding the other predictors constant. For example, when precipitation increases by one unit (presumably one inch), the log expected count decreases by 0.959; equivalently, the expected count is multiplied by exp(-0.959) ≈ 0.38. Precipitation has the largest coefficient in absolute value, which suggests it has the strongest influence on the count, although coefficients measured in different units (inches versus degrees) are not directly comparable. The high and low temperatures share the same units, so they can be compared directly: the low temperature's coefficient is larger in absolute value (0.0207 versus 0.0164), so a one-degree change in the low temperature has a larger effect on the count than a one-degree change in the high temperature. 
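Because inches of precipitation and degrees Fahrenheit are on different scales, one rough way to compare the coefficients is to multiply each by its predictor's standard deviation, giving the change in the log expected count for a one-standard-deviation increase. The sketch below does this with the count-model estimates and the predictor variances printed by `var(Cyclists)`; the per-SD scaling is an illustrative choice, not part of the original analysis.

```r
# Poisson count-model estimates, copied from the model.freq coefficient table
beta <- c(HighTemp = 0.0163560, Precipitation = -0.9591493,
          LowTemp = -0.0206914, Day = -0.0048775)
# Predictor variances, from the diagonal of the var(Cyclists) output
pred_var <- c(HighTemp = 61.4503656, Precipitation = 0.2946570,
              LowTemp = 62.9634000, Day = 82.666667)

# Change in the log expected count for a one-standard-deviation increase
per_sd <- beta * sqrt(pred_var)
round(per_sd, 3)

# The same effects as multiplicative changes in the expected count
round(exp(per_sd), 3)
```

On this scale precipitation still stands out: about -0.52 on the log scale per standard deviation, versus roughly 0.13 for HighTemp, -0.16 for LowTemp and -0.04 for Day.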

## Poisson Regression on Rates

The following model uses the same predictors but adds the logarithm of Total as an offset. This adjusts the Manhattan Bridge counts for the total number of cyclists across all bridges, so the model describes the Manhattan Bridge's share of all counted cyclists rather than its raw count. If the total number of cyclists varies substantially from day to day, the model accounts for that variability.
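Written out, the offset enters the linear predictor with its coefficient fixed at 1, where $\mu_i$ is the expected ManhattanBridge count on day $i$:

$$
\begin{aligned}
\log \mu_i &= \log(\text{Total}_i) + \beta_0 + \beta_1\,\text{HighTemp}_i + \beta_2\,\text{Precipitation}_i + \beta_3\,\text{LowTemp}_i + \beta_4\,\text{Day}_i,\\
\frac{\mu_i}{\text{Total}_i} &= \exp\left(\beta_0 + \beta_1\,\text{HighTemp}_i + \beta_2\,\text{Precipitation}_i + \beta_3\,\text{LowTemp}_i + \beta_4\,\text{Day}_i\right).
\end{aligned}
$$

Subtracting $\log(\text{Total}_i)$ from both sides shows that the coefficients describe changes in the Manhattan Bridge's share of all counted cyclists, $\mu_i / \text{Total}_i$, rather than changes in the raw count.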

```{r}
# Same predictors, with log(Total) as an offset: models the rate ManhattanBridge / Total
model.rates <- glm(ManhattanBridge ~ HighTemp + Precipitation + LowTemp + Day, offset = log(Total), 
                   family = poisson(link = "log"), data = Cyclists)
kable(summary(model.rates)$coef, caption = "Poisson regression on the rate of cyclists entering and leaving the Manhattan Bridge.")
```

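The output above shows that all p-values are below a significance level of 0.01 except for HighTemp (p ≈ 0.97). All of the coefficients are also smaller in absolute value than in the count model. The most striking change is HighTemp: its coefficient drops from about 0.016 to about 0.00002, more than any other coefficient, so it contributes essentially nothing to the rate model. A likely explanation is that the offset turns the response into the Manhattan Bridge's share of total traffic, so any factor that raises or lowers cycling on every bridge by the same proportion drops out of the model.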
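The idea that a predictor can matter for the raw counts but not for the rate can be checked with a small simulation. In the hypothetical setup below, temperature raises traffic on every bridge, while the Manhattan Bridge always carries a fixed share of 0.28 (chosen to be close to the ratio of the mean ManhattanBridge count to the mean Total in this data). All values are simulated assumptions, not the bridge data.

```r
# Simulated data only: temperature scales traffic on every bridge, while the
# Manhattan Bridge keeps a fixed share of that traffic
set.seed(321)
n     <- 200
temp  <- runif(n, 50, 80)                          # hypothetical daily temperatures
total <- rpois(n, exp(8 + 0.02 * temp))            # all-bridge totals rise with temperature
manhattan <- rbinom(n, size = total, prob = 0.28)  # constant share of the total

count_fit <- glm(manhattan ~ temp, family = poisson(link = "log"))
rate_fit  <- glm(manhattan ~ temp, offset = log(total),
                 family = poisson(link = "log"))

coef(count_fit)["temp"]  # close to the 0.02 built into the totals
coef(rate_fit)["temp"]   # close to 0: the share does not depend on temperature
```

The count model picks up the temperature effect on overall traffic, while the offset model, which describes the share, does not.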

## EDA and Feature Engineering

Since the low and high temperatures are closely related, we combine them into a single average temperature, AvgTemp, by adding the two and dividing by two. This describes cycling on the bridge in terms of the day's average temperature rather than its high and low separately. We also recode precipitation as a binary variable, NewPrecip, to see whether any precipitation at all during the day changes the counts: it is 0 if there was no precipitation that day and 1 if there was any.
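How closely the two temperature variables are related can be quantified with their correlation. The sketch below converts the HighTemp and LowTemp entries of the `var(Cyclists)` covariance matrix into a correlation with base R's `cov2cor()`:

```r
# Covariance entries for HighTemp and LowTemp, copied from the var(Cyclists) output
temp_cov <- matrix(c(61.4503656, 48.4430753,
                     48.4430753, 62.9634000),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("HighTemp", "LowTemp"), c("HighTemp", "LowTemp")))

# Rescale the covariance matrix into a correlation matrix
temp_cor <- cov2cor(temp_cov)
round(temp_cor["HighTemp", "LowTemp"], 2)
```

The correlation is about 0.78, which supports replacing the two columns with a single average. Computed directly on the data, `cor(Cyclists$HighTemp, Cyclists$LowTemp)` gives the same value.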


```{r}
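# Average of the daily high and low temperatures
Cyclists$AvgTemp <- (Cyclists$HighTemp + Cyclists$LowTemp) / 2

# 1 if any precipitation was recorded that day, 0 otherwise, stored as a factor
Cyclists$NewPrecip <- ifelse(Cyclists$Precipitation > 0, 1, 0)
Cyclists$NewPrecip <- as.factor(Cyclists$NewPrecip)

# Drop the original columns that the new variables replace
Cyclists <- Cyclists[, !(names(Cyclists) %in% c("HighTemp", "LowTemp", "Precipitation"))]

head(Cyclists)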

```

## New Variables

- Date - The date on which the number of bikes was recorded; this is our observation ID.
- Day - The day, stored as a number (numeric)
- AvgTemp - The average of the day's high and low temperatures (numeric)
- NewPrecip - 1 if there was any precipitation that day and 0 if there was none (factor)
- ManhattanBridge - The total number of cyclists on the Manhattan Bridge that day (numeric)
- Total - The total number of bikes recorded on all the bridges that day (numeric)

## Dispersion Index

Next, we compute approximate dispersion indices from both the Pearson and the deviance residuals.
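For reference, the two dispersion estimates computed in the chunk that follows can be written as below, where $y_i$ is the observed count, $\hat\mu_i$ the fitted count, $D$ the residual deviance, $n = 31$ the number of days and $p = 4$ the number of estimated coefficients (so $n - p = 27$ is `df.residual`):

$$
\hat\phi_{P} = \frac{1}{n-p}\sum_{i=1}^{n}\frac{(y_i-\hat\mu_i)^2}{\hat\mu_i},
\qquad
\hat\phi_{D} = \frac{D}{n-p}.
$$

Both are close to 1 when the Poisson assumption holds.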

```{r}
# Poisson model with the engineered predictors
pois.model = glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip, 
                 family = poisson(link = "log"), data = Cyclists)  
yhat = pois.model$fitted.values
# Pearson residuals: (observed - fitted) / sqrt(fitted)
pearson.resid = (Cyclists$ManhattanBridge - yhat)/sqrt(yhat)
# Each dispersion estimate divides its statistic by the residual degrees of freedom
Pearson.disp = sum(pearson.resid^2)/pois.model$df.residual

Deviance.disp = (pois.model$deviance)/pois.model$df.residual

disp = cbind(Pearson.disp = Pearson.disp, Deviance.disp = Deviance.disp)
kable(disp, caption="Dispersion parameter", align = 'c')

```

The output above shows that the Poisson assumption is seriously violated: both dispersion estimates (about 324 from the Pearson residuals and about 358 from the deviance) are far from the value of 1 that the Poisson model assumes. We therefore use a quasi-Poisson regression model, which accounts for a variance that is much larger than the mean.
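A quasi-Poisson fit keeps the Poisson point estimates and multiplies every standard error by the square root of the estimated dispersion parameter. The sketch below checks this on simulated overdispersed counts; the coefficients 5 and 0.5 and the negative binomial `size` of 5 are illustrative assumptions, not values from the bridge data.

```r
# Simulated overdispersed counts; the coefficients and size are illustrative
set.seed(42)
n <- 100
x <- runif(n)
y <- rnbinom(n, mu = exp(5 + 0.5 * x), size = 5)

pois_fit  <- glm(y ~ x, family = poisson(link = "log"))
quasi_fit <- glm(y ~ x, family = quasipoisson(link = "log"))

phi      <- summary(quasi_fit)$dispersion                # estimated dispersion parameter
se_pois  <- coef(summary(pois_fit))[, "Std. Error"]
se_quasi <- coef(summary(quasi_fit))[, "Std. Error"]

all.equal(coef(pois_fit), coef(quasi_fit))  # identical point estimates
se_quasi / se_pois                          # every ratio equals sqrt(phi)
sqrt(phi)
```

Applied to this data, a Pearson dispersion of about 324 means the quasi-Poisson standard errors are roughly sqrt(324) ≈ 18 times the Poisson ones, which is why some predictors lose significance.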

## Quasi Model

We now fit the quasi-Poisson model and summarize its inferential statistics.

```{r}
quasi.model = glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip, 
                  family = quasipoisson, data = Cyclists)  
summary(quasi.model)

```
The output above shows that AvgTemp is not significant (p ≈ 0.43). This is a concern, but because temperature plausibly affects whether people cycle across the bridge, we keep it in the model and interpret its estimate with caution. 

```{r}
SE.quasi.pois = summary(quasi.model)$coef
kable(SE.quasi.pois, caption = "Summary statistics of quasi-poisson regression model")

```

This is our final model. Because the model is on the log scale, we need to transform the estimates before drawing practical conclusions. For example, the coefficient for NewPrecip1 is -0.504. This is the estimated quasi-Poisson regression coefficient comparing days with precipitation to days without, holding the other variables constant: the log of the expected cyclist count is 0.504 lower on days with precipitation. Exponentiating gives exp(-0.504) ≈ 0.60, so the model predicts about 40% fewer cyclists on days with any precipitation. Visual aids make these effects easier to see.
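The same back-transformation can be applied to every predictor, together with an interval. The sketch below exponentiates the quasi-Poisson estimates and standard errors from the table above and forms Wald-type intervals using the t distribution with the model's 27 residual degrees of freedom, matching the t tests in the summary; the choice of Wald intervals is an assumption, not part of the original analysis.

```r
# Quasi-Poisson estimates and standard errors, copied from the summary table
est <- c(Day = -0.0115796, AvgTemp = -0.0056231, NewPrecip1 = -0.5039760)
se  <- c(Day =  0.0054821, AvgTemp =  0.0069633, NewPrecip1 =  0.1093600)
df_resid <- 27  # residual degrees of freedom of the quasi-Poisson model

# Multiplicative effects on the expected count, with 95% Wald-type t intervals
t_crit <- qt(0.975, df = df_resid)
rate_ratio <- cbind(RR    = exp(est),
                    lower = exp(est - t_crit * se),
                    upper = exp(est + t_crit * se))
round(rate_ratio, 3)
```

On days with any precipitation the model predicts about 0.60 times as many cyclists, with an interval of roughly 0.48 to 0.76, while the AvgTemp interval includes 1, consistent with its non-significant p-value.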

## Visual Aid

Next, we visualize how the explanatory variables in the final model affect the predicted number of cyclists. 
We exponentiate the linear predictor (the log of the expected count) and make two graphs, each showing separate curves for days with and without precipitation. The first graph holds average temperature at its mean and shows how the predicted count changes as the days progress. The second graph holds the day constant and shows how the predicted count changes with average temperature.



```{r}
# Grid for the first graph: Day varies, AvgTemp is held at its mean, and every
# Day value is paired with both precipitation levels
day_range <- seq(min(Cyclists$Day), max(Cyclists$Day), length.out = 100)
precip_levels <- factor(c(0, 1), levels = levels(Cyclists$NewPrecip))

pred_day <- expand.grid(Day = day_range,
                        AvgTemp = mean(Cyclists$AvgTemp),
                        NewPrecip = precip_levels)

# predict() with type = "response" exponentiates the linear predictor of the
# fitted quasi-Poisson model, giving predicted cyclist counts
pred_day$Cyclists_Predicted <- predict(quasi.model, newdata = pred_day, type = "response")

# Plot the effect of Day on predicted cyclist counts for both precipitation levels
ggplot(pred_day, aes(x = Day, y = Cyclists_Predicted, color = NewPrecip)) +
  geom_line() +
  scale_color_manual(values = c("red", "blue"), labels = c("No Precipitation", "Precipitation")) +
  labs(title = "Effect of Day on Predicted Cyclist Counts",
       x = "Day",
       y = "Predicted Cyclist Counts",
       color = "Precipitation") +
  theme_minimal()

# Grid for the second graph: AvgTemp varies over its observed range and Day is
# held at its mean (avoids extrapolating beyond the data)
temp_range <- seq(min(Cyclists$AvgTemp), max(Cyclists$AvgTemp), length.out = 100)

pred_temp <- expand.grid(Day = mean(Cyclists$Day),
                         AvgTemp = temp_range,
                         NewPrecip = precip_levels)
pred_temp$Cyclists_Predicted <- predict(quasi.model, newdata = pred_temp, type = "response")

# Plot the effect of AvgTemp on predicted cyclist counts for both precipitation levels
ggplot(pred_temp, aes(x = AvgTemp, y = Cyclists_Predicted, color = NewPrecip)) +
  geom_line() +
  scale_color_manual(values = c("red", "blue"), labels = c("No Precipitation", "Precipitation")) +
  labs(title = "Effect of AvgTemp on Predicted Cyclist Counts",
       x = "Average Temperature (\u00b0F)",  # Unicode escape for the degree symbol
       y = "Predicted Cyclist Counts",
       color = "Precipitation") +
  theme_minimal()
```

The graphs above show that precipitation has a major effect in both: the red line (no precipitation) lies well above the blue line (precipitation), so the predicted counts drop when there is precipitation. The first graph shows that the predicted count decreases as the days progress. The second graph shows that the predicted count also decreases slightly as the average temperature rises, although AvgTemp was not statistically significant, so this trend should be interpreted with caution.




# Conclusion

We do not use the original Poisson regression because its dispersion is very large, meaning that the variance of the counts is much larger than their mean. Instead, we use a quasi-Poisson model, which accounts for this overdispersion by introducing a dispersion parameter. This gives more realistic (larger) standard errors and therefore more trustworthy p-values. Judging from the output and graphs above, precipitation has the biggest influence: it has the largest coefficient in absolute value and shows the biggest difference between its two levels, with about 40% fewer cyclists predicted on days with precipitation. Both graphs also show a gradual decrease: the predicted count of cyclists goes down as the days progress and as the temperature goes up, although the temperature effect is not statistically significant. 