1. Reading Data

Reading the data and performing minor adjustments to remove inappropriate outliers and make the data easy to work with.

library(readr)
library(ggplot2)
library(patchwork)
library(dplyr)
library(lubridate)
library(GGally)
library(corrplot)
library(car)
library(ggthemes)
library(ggrepel)
library(boot)
library(broom)
library(lindia)
week2=read_csv("C:/Users/rajas/OneDrive/Desktop/Desktop/Applied Data Science/INFOH510/R Jupyter/Metro_Interstate_Traffic_Volume.csv")
week2=week2[week2$temp>0,]
week2=week2[week2$rain_1h< 60,]
week2<- week2|>
  mutate(temp=(((temp-273)*9/5))+32)
week2$hour<- as.integer(format(as.POSIXct(week2$date_time),"%H")) #converting the date_time information into hours,month,year, weekdays to get relevant insights.
week2$month<- month(as.integer(format(as.POSIXct(week2$date_time),"%m")),label = TRUE) #using lubridate library to get the month labels
week2$year<- as.integer(format(as.POSIXct(week2$date_time),"%y"))
week2$day<- as.integer(format(as.POSIXct(week2$date_time),"%d"))
week2$weekday<-weekdays(as.Date(week2$date_time))
week2$weekday<-factor(week2$weekday,levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) #sorting the weekdays
data_df<-week2

df_selected <- data_df |>
  select(traffic_volume, weather_main, temp, hour, holiday,weekday)
df_selected
## # A tibble: 48,193 × 6
##    traffic_volume weather_main  temp  hour holiday weekday
##             <dbl> <chr>        <dbl> <int> <chr>   <fct>  
##  1           5545 Clouds        59.5     9 None    Tuesday
##  2           4516 Clouds        61.4    10 None    Tuesday
##  3           4767 Clouds        61.8    11 None    Tuesday
##  4           5026 Clouds        62.8    12 None    Tuesday
##  5           4918 Clouds        64.7    13 None    Tuesday
##  6           5181 Clear         65.7    14 None    Tuesday
##  7           5584 Clear         68.3    15 None    Tuesday
##  8           6015 Clear         69.5    16 None    Tuesday
##  9           5791 Clouds        70.1    17 None    Tuesday
## 10           4770 Clouds        68.2    18 None    Tuesday
## # ℹ 48,183 more rows

2. Previous data dive insights

Below are the insights we gained from the previous data dive where we chose the response and explanatory variable and performed various tests. One of the test was an anova test where we disproved the null hypothesis and concluded that weather effects have impact on mean traffic volume.

table(df_selected$weather_main)
## 
##        Clear       Clouds      Drizzle          Fog         Haze         Mist 
##        13381        15164         1821          912         1360         5950 
##         Rain        Smoke         Snow       Squall Thunderstorm 
##         5671           20         2876            4         1034
df_selected <- df_selected |>
  mutate(weather_main = dplyr::recode(weather_main, "Squall" = "Extreme", "Thunderstorm" = "Extreme"))
table(df_selected$weather_main)
## 
##   Clear  Clouds Drizzle Extreme     Fog    Haze    Mist    Rain   Smoke    Snow 
##   13381   15164    1821    1038     912    1360    5950    5671      20    2876

ANOVA Test on weather_main and traffic_volume

anova_result <- aov(traffic_volume ~ weather_main, data = df_selected)
summary(anova_result)
##                 Df    Sum Sq   Mean Sq F value Pr(>F)    
## weather_main     9 3.759e+09 417640535   107.9 <2e-16 ***
## Residuals    48183 1.865e+11   3869919                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value is much smaller than 0.05, we reject the null hypothesis. This means that weather conditions significantly affect traffic volume. The large F-statistic suggests that the differences in mean traffic volume across weather categories are meaningful. Weather plays a statistically significant role in determining traffic patterns. City planners and transportation authorities should consider weather conditions when designing traffic management systems. The above box plot shows us mean traffic is lower during Fog and Mist. These are the specific conditions that will be needed to be catered to while addressing concerns or creating designs.

ggplot(df_selected, aes(x = weather_main, y = traffic_volume)) +
  geom_boxplot() +
  labs(title = "Traffic Volume vs Weather Conditions",
       x = "Weather Condition",
       y = "Traffic Volume")+
  theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'lightblue'),
          panel.grid.major = element_line(color = "white")) 

We also built a linear regression model to check the dependency of variables. We chose one dependent variable and concluded that Temperature has a statistically significant but weak effect on traffic volume. The effect size is small, meaning other variables (like time of day, day of the week, holidays, or weather conditions) likely have a much stronger influence on traffic volume. We have observed this previously where time of the day and week of the day have significant impact on traffic volume. So if we want a better model, we should add more predictors (e.g., holiday, hour of the day) to improve the model’s predictive power.

lm_model <- lm(traffic_volume ~ temp, data = df_selected)
summary(lm_model)
## 
## Call:
## lm(formula = traffic_volume ~ temp, data = df_selected)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3626  -1984     93   1662   4341 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2721.7694    20.4533  133.07   <2e-16 ***
## temp          11.4860     0.3921   29.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1969 on 48191 degrees of freedom
## Multiple R-squared:  0.01749,    Adjusted R-squared:  0.01747 
## F-statistic:   858 on 1 and 48191 DF,  p-value: < 2.2e-16

We will further delve deeper in this assignment to include more variables and diagnose them.

3. Building an Improved Regression Model with Interaction and Binary Terms

3.1 Adding New Variables

We’ll extend our model by adding:

  1. Hour of the Day (hour) – Traffic volume is likely to vary depending on the time of day.

  2. Weather Condition (weather_main) – Certain weather conditions may impact traffic volume.

  3. Interaction Term (temp * hour) – To check if temperature influences traffic differently depending on the time of day.

  4. Binary Variable (holiday) – To capture differences in traffic on holidays.

3.2 Why These Variables?

  • Hour (hour): We expect rush hour (7-9 AM, 4-7 PM) to have increased traffic.

  • Weather (weather_main): Certain conditions (rain, snow) might reduce traffic. We have already seen in the above anova analysis that Haze and Fog do effect weather similar to clouds, etc.

  • Interaction (temp * hour): Traffic may be more sensitive to temperature changes at certain times.

  • Holiday (holiday): Weekends/holidays may see reduced commuter traffic.

3.3 Potential Multicollinearity Issues

Before proceeding, let’s check Variance Inflation Factor (VIF) to detect multicollinearity:

vif(lm(traffic_volume ~ temp + hour + weather_main + temp:hour + holiday, data = df_selected))
##                  GVIF Df GVIF^(1/(2*Df))
## temp         4.310860  1        2.076261
## hour         5.304140  1        2.303072
## weather_main 1.167735  9        1.008652
## holiday      1.009854 11        1.000446
## temp:hour    9.271777  1        3.044959

Any VIF over 5 is a concern as it indicates high multicollinearity. Here we observe both hour and temp:hour having VIF greater. the interaction pair has a large multicolinearity. We should ideally remove these parameter. But we will run the VIF check removing only the pair to see if that reduces the colinearity of hour.

vif(lm(traffic_volume ~ temp + hour + weather_main + holiday, data = df_selected))
##                  GVIF Df GVIF^(1/(2*Df))
## temp         1.147373  1        1.071155
## hour         1.038489  1        1.019063
## weather_main 1.159077  9        1.008235
## holiday      1.006483 11        1.000294

From the results above we see that removing the interaction pair significantly reduced the multicolinearity of other variables. We will keep the variables hour and temp separately and instead choose weekday for interaction pair.

vif(lm(traffic_volume ~ temp + hour + weather_main + weekday:hour + holiday, data = df_selected))
##                  GVIF Df GVIF^(1/(2*Df))
## temp         1.147471  1        1.071201
## hour         2.581052  1        1.606565
## weather_main 1.166707  9        1.008603
## holiday      1.006485 11        1.000294
## hour:weekday 2.559308  6        1.081459

With every variable VIF within acceptable levels the new variable list changes as below.

  1. Hour of the Day (hour) – Traffic volume is likely to vary depending on the time of day.

  2. Weather Condition (weather_main) – Certain weather conditions may impact traffic volume.

  3. Interaction Term (weekday * hour) – To check if weekday influences traffic differently depending on the time of day. (Traffic might be different on weekdays and weekends)

  4. Binary Variable (holiday) – To capture differences in traffic on holidays.

4. Evaluating the Model

Lets fit and diagnose the above model to gain further insights.

4.1 Fit the extended model

lm_extended <- lm(traffic_volume ~ temp + hour + weather_main + weekday:hour + holiday, data = df_selected)
summary(lm_extended)
## 
## Call:
## lm(formula = traffic_volume ~ temp + hour + weather_main + weekday:hour + 
##     holiday, data = df_selected)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5397.7 -1646.8    68.3  1485.2  4645.6 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       476.8176   736.9499   0.647 0.517625    
## temp                                8.8236     0.3849  22.924  < 2e-16 ***
## hour                               89.0123     1.9028  46.779  < 2e-16 ***
## weather_mainClouds                443.1876    21.4701  20.642  < 2e-16 ***
## weather_mainDrizzle               165.9657    45.1162   3.679 0.000235 ***
## weather_mainExtreme              -210.7459    58.6663  -3.592 0.000328 ***
## weather_mainFog                    12.3467    61.9070   0.199 0.841920    
## weather_mainHaze                  501.0958    51.5611   9.718  < 2e-16 ***
## weather_mainMist                   76.7633    28.2479   2.717 0.006580 ** 
## weather_mainRain                  133.5057    28.8533   4.627 3.72e-06 ***
## weather_mainSmoke                 143.3855   403.9934   0.355 0.722651    
## weather_mainSnow                  166.6649    38.3272   4.348 1.37e-05 ***
## holidayColumbus Day              -473.8311  1092.8421  -0.434 0.664598    
## holidayIndependence Day            -0.8928  1092.9233  -0.001 0.999348    
## holidayLabor Day                 -144.9491  1004.1190  -0.144 0.885221    
## holidayMartin Luther King Jr Day -317.8631  1042.0217  -0.305 0.760334    
## holidayMemorial Day              -305.5051  1092.9102  -0.280 0.779837    
## holidayNew Years Day              501.9674  1041.9403   0.482 0.629977    
## holidayNone                      1105.6598   736.9498   1.500 0.133538    
## holidayState Fair                -525.3388  1092.8549  -0.481 0.630730    
## holidayThanksgiving Day           -46.5805  1041.9750  -0.045 0.964343    
## holidayVeterans Day              -326.2052  1092.8027  -0.299 0.765320    
## holidayWashingtons Birthday      -438.4094  1092.7909  -0.401 0.688287    
## hour:weekdayTuesday                17.2176     2.2925   7.510 6.00e-14 ***
## hour:weekdayWednesday              21.9347     2.2796   9.622  < 2e-16 ***
## hour:weekdayThursday               27.6850     2.2857  12.113  < 2e-16 ***
## hour:weekdayFriday                 33.5054     2.2895  14.634  < 2e-16 ***
## hour:weekdaySaturday              -14.9533     2.2923  -6.523 6.94e-11 ***
## hour:weekdaySunday                -48.2795     2.2887 -21.094  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1805 on 48164 degrees of freedom
## Multiple R-squared:  0.1755, Adjusted R-squared:  0.175 
## F-statistic: 366.1 on 28 and 48164 DF,  p-value: < 2.2e-16

The Multiple R-squared shows that the model explains 17.55% of the variance in traffic_volume which is pretty much a weak fit. The Adjust R-squared value of 0.175 does not differ from the adjusted signifying the new variables we added have not over fitted. The higher F-statistic value and very low p value shows us that the overall model is highly significant and one or more of the predictors are useful. The model is statistically significant, but it explains a relatively small portion of the variability in traffic volume. This suggests there are other important factors influencing traffic that are not accounted for in the model. Lets analyse individual variables and then further discuss if we can improve the model.

4.2 Interpretations of Variables:

4.2.1 hour

Estimate: 89.01 (p < 2e-16): As each hour of the day passes, the traffic increases by 89 vehicles on average. This assumes a linear effect, meaning traffic increases steadily throughout the day. We will verify this by plotting the residuals under diagnostics to see if this conclusion hold true.

4.2.2 temp

Estimate: 8.82 (p < 2e-16): Traffic volume increases by 8.82 vehicles per degree Celsius. This suggests a small but statistically significant relationship between temperature and traffic volume. Warmer temperatures might encourage more travel. However, the effect size is relatively small compared to time-of-day and weather.

ggplot(data = df_selected) +
  geom_point(mapping = aes(x=temp, y=traffic_volume))+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

4.2.3 weather_main

Some weather conditions show a strong impact on traffic:

Significant Increases in Traffic:

  • Clouds: +443.19 (p < 2e-16) → Substantial increase.

  • Haze: +501.10 (p < 2e-16) → Large increase.

  • Mist: +76.76 (p = 0.0066) → Small increase.

  • Rain: +133.51 (p = 3.72e-06) → Moderate increase.

  • Snow: +166.66 (p = 1.37e-05) → Moderate increase.

Unexpected finding:

  • Snow and haze increase traffic volume, which contradicts the assumption that bad weather decreases traffic. This might indicate that more people drive instead of using alternative transport in bad weather.

Significant Decreases in Traffic:

  • Extreme weather (Thunderstorms, Squalls): -210.75 (p = 0.0003)

  • Traffic volume drops in extreme weather, possibly due to hazardous conditions.

Weather clearly plays a role, but the specific effects should be explored further, especially the increase during snow events.

4.2.4 holiday

Most holidays are not statistically significant, meaning they do not have a major impact on traffic. No Holiday (None): +1105.66 (p = 0.133). This suggests traffic is slightly higher on non-holidays. Some holidays (e.g., Thanksgiving, Veterans Day) may not affect traffic significantly.A deeper look at pre-holiday and post-holiday effects might be useful.

4.2.5 weekday:hour

The interaction term shows how traffic changes depending on the day of the week:

  • Tuesday-Friday: Traffic increases significantly as the day progresses.

  • Saturday: Less traffic later in the day.

  • Sunday: Strongest negative effect. Traffic drops throughout the day.

This confirms that weekday traffic follows work-hour patterns, while weekends behave differently. Sunday consistently shows the largest decrease in traffic, likely due to fewer work commutes.

5. Extended model diagnostics

gg_reshist(lm_extended)+
    theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

When we look at the above histogram of residuals, we see that there is a slight concern, as the tails are not equally distributed. There is a “shorter” tail on the left, and a longer one on the right. This is not severe, but it indicates that some aspect of the phenomenon is missing from the model (i.e., we may need more/different explanatory variables).

gg_resfitted(lm_extended) +
  geom_smooth(se=FALSE)+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

For this particular plot, we can already see that our assumption is being violated, in that the variance in errors increases and decreases as the day progresses. Traffic volume typically follows a peak pattern (e.g., rush hours). A linear trend may not fully capture variations in morning/evening peaks.

gg_qqplot(lm_extended)+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

The residuals deviate significantly from the red diagonal line, indicating non-normality. The tails (both lower and upper extremes) curve away from the line, suggesting heavy tails (presence of outliers or skewness). The left tail (lower values) is much lower than expected. The right tail (higher values) is much higher than expected. This is characteristic of a skewed distribution or potential unequal variance. The pattern suggests outliers or non-linear relationships that the model may not be capturing well. We can use boxplots or leverage Cook’s distance to identify high-leverage points. Based on these details normalizing the residuals like Log transformation or polynomial transformation might help us improve the model.

gg_cooksd(lm_extended, threshold = 'matlab')+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

Here, we can see a few rows of data have a high influence on the model. Each number represents the row in the data. Lets select Rows 16334, 16337, 23088, 23086, 36808, 38793, 46802 , so we can investigate them by plotting various variables against the response variable.

ggplot(data = slice(df_selected, 
                    c(16334,16337, 23088,23086, 36808,38793,46802))) +
  geom_point(data = df_selected,
             mapping = aes(x = hour, y = traffic_volume)) +
  geom_point(mapping = aes(x = hour, y = traffic_volume),
             color = 'darkred') +
  geom_text_repel(mapping = aes(x = hour, 
                                y = traffic_volume,
                                label = weather_main),
             color = 'darkred') +
  labs(title="Investigating High Influence Points",
       subtitle="Label = Weather Condition")+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

The above plot shows us that the majority of the outliers that are present are on days when the weather is “Smoke”. But the phenomenon is not clear as it creates both low traffic conditions and very high traffic contions. We need to figure out if any other variable explains this behavior.

ggplot(data = slice(df_selected, 
                    c(16334,16337, 23088,23086, 36808,38793,46802))) +
  geom_point(data = df_selected,
             mapping = aes(x = hour, y = traffic_volume)) +
  geom_point(mapping = aes(x = hour, y = traffic_volume),
             color = 'darkred') +
  geom_text_repel(mapping = aes(x = hour, 
                                y = traffic_volume,
                                label = weekday),
             color = 'darkred') +
  labs(title="Investigating High Influence Points",
       subtitle="Label = Day of the Week")+
      theme(axis.text=element_text(size=25),
          axis.title.x = element_text(size = 20),
          axis.title.y = element_text(size = 20),
          plot.title = element_text(size = 20),
          legend.key.size = unit(2,"cm"),
          legend.text = element_text(size = 18),
          legend.title = element_text(size = 14),
          panel.background = element_rect(fill = 'white'),
          panel.grid.major = element_line(color = "grey")) 

Its interesting to note that most influenced days by Smoky weather seem to be Monday, Friday and Saturday. Even though the weather was not ideal, people seemed to travel more or were unaffected on Monday and Friday. This could either be due to the fact that these are peak work travel days or that people prefer to take their own vehicles instead of public transport during days where there impaired vision.

5.Conclusion

6. Additional Insights: Refinement of the model

lm_poly <- lm(traffic_volume ~ temp + poly(hour, 2) + weather_main + holiday, data = df_selected)
summary(lm_poly)
## 
## Call:
## lm(formula = traffic_volume ~ temp + poly(hour, 2) + weather_main + 
##     holiday, data = df_selected)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5082.4  -679.2  -116.7   644.1  3820.0 
## 
## Coefficients:
##                                    Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                       4.697e+03  4.804e+02    9.777  < 2e-16 ***
## temp                              4.670e+00  2.513e-01   18.586  < 2e-16 ***
## poly(hour, 2)1                    1.499e+05  1.198e+03  125.118  < 2e-16 ***
## poly(hour, 2)2                   -3.138e+05  1.188e+03 -264.032  < 2e-16 ***
## weather_mainClouds                6.181e+01  1.406e+01    4.396 1.10e-05 ***
## weather_mainDrizzle               2.777e+01  2.939e+01    0.945 0.344646    
## weather_mainExtreme              -2.141e+02  3.821e+01   -5.603 2.12e-08 ***
## weather_mainFog                  -1.811e+02  4.033e+01   -4.490 7.13e-06 ***
## weather_mainHaze                 -5.300e+01  3.365e+01   -1.575 0.115219    
## weather_mainMist                 -8.703e+01  1.840e+01   -4.729 2.27e-06 ***
## weather_mainRain                 -3.311e+01  1.879e+01   -1.762 0.078108 .  
## weather_mainSmoke                -4.974e+02  2.631e+02   -1.890 0.058745 .  
## weather_mainSnow                 -1.512e+02  2.498e+01   -6.052 1.44e-09 ***
## holidayColumbus Day              -4.858e+02  7.120e+02   -0.682 0.495051    
## holidayIndependence Day           1.748e+01  7.121e+02    0.025 0.980414    
## holidayLabor Day                 -2.667e+01  6.542e+02   -0.041 0.967487    
## holidayMartin Luther King Jr Day -1.766e+02  6.789e+02   -0.260 0.794766    
## holidayMemorial Day               9.338e+00  7.121e+02    0.013 0.989537    
## holidayNew Years Day              5.803e+02  6.789e+02    0.855 0.392646    
## holidayNone                      -1.644e+03  4.803e+02   -3.424 0.000618 ***
## holidayState Fair                -4.380e+02  7.120e+02   -0.615 0.538465    
## holidayThanksgiving Day           4.634e+01  6.789e+02    0.068 0.945578    
## holidayVeterans Day              -2.922e+02  7.120e+02   -0.410 0.681551    
## holidayWashingtons Birthday      -2.834e+02  7.120e+02   -0.398 0.690624    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1176 on 48169 degrees of freedom
## Multiple R-squared:   0.65,  Adjusted R-squared:  0.6498 
## F-statistic:  3889 on 23 and 48169 DF,  p-value: < 2.2e-16

The Multiple R-squared shows that the model explains 65% of the variance in traffic_volume which is a moderate to good fit much better than the previews model. The Adjust R-squared value of 0.6498 does not differ from the adjusted signifying the new variables we added have not over fitted. The higher F-statistic value and very low p value shows us that the overall model is highly significant and one or more of the predictors are useful. the polynomial function added to the model shows us +ve poly 1 which signifies that the hour of the day significantly impact traffic volumes. the negative coefficients for ploy 2 variations for hour confirms a U-shaped or inverted-U pattern for traffic volume throughout the day. This suggests rush hour peaks and lower volume in early morning & late night. We have seen that in the Residual Vs Fitted graph we plotted for the previous model.

The new model is fits well (R² = 65%) and captures rush hour peaks, weather effects, and holiday impacts. Temperature has a small but significant effect. Bad weather (Extreme, Fog, Snow) lowers traffic significantly. Holidays have mixed effects, but traffic is lower on regular days.