We are going to explore the Metro Interstate Traffic Volume dataset, which contains traffic volume data for a section of Interstate 94 in Minneapolis, Minnesota. It includes features such as weather conditions, holidays, and time-based information. We will create several visualizations to understand traffic patterns on this section of the interstate, and answer the following questions at the end:

  1. How do traffic patterns change with respect to Time?

  2. Do factors like weather and temperature have an impact on traffic?

  3. What are the traffic patterns during holidays?

1. Reading Data

library(readr)
library(ggplot2)
library(patchwork)
library(dplyr)
library(lubridate)
library(GGally)
week2=read_csv("C:/Users/rajas/OneDrive/Desktop/Desktop/Applied Data Science/INFOH510/R Jupyter/Metro_Interstate_Traffic_Volume.csv")

2. Summary of the data

2.1 Numerical Data Summary and Visualization

# Print summary statistics for the numeric columns of the data frame passed in
custom_summary <- function(data){
  print("Summary of Temperature")
  print(summary(data$temp))
  print("Summary of Cloud Percentage")
  print(summary(data$clouds_all))
  print("Summary of Traffic Metro Volume")
  print(summary(data$traffic_volume))
}
custom_summary(week2)
## [1] "Summary of Temperature"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   272.2   282.4   281.2   291.8   310.1 
## [1] "Summary of Cloud Percentage"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00   64.00   49.36   90.00  100.00 
## [1] "Summary of Traffic Metro Volume"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    1193    3380    3260    4933    7280

Notice that the minimum value in the ‘temp’ column is 0.0. The temperatures are recorded in Kelvin, so a reading of 0.0 would mean absolute zero, which is not physically possible in an everyday outdoor environment. All such values must be excluded before drawing any conclusions. The code below removes these invalid temperature readings, drops rows where ‘rain_1h’ is 60 or more, and converts the temperature to Fahrenheit so that the results are easier to interpret.

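# Drop physically impossible temperature readings (0 Kelvin)
week2=week2[week2$temp>0,]
# Drop rows with extreme hourly rainfall values
week2=week2[week2$rain_1h< 60,]
# Convert Kelvin to Fahrenheit (273 is a rounded form of the exact offset 273.15)
week2<- week2|>
  mutate(temp=(((temp-273)*9/5))+32)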

# Draw a boxplot of one column of `data`, with a title and a y-axis label
plot_d=function(data,column,titl,scal){
  data|>
    ggplot()+
    geom_boxplot(mapping=aes(y=.data[[column]]), fill="steelblue")+
    labs(title=titl,y= scal)
  }
plt1=plot_d(week2,"temp", "Temperature Distribution","Temperature(F)")
plt2=plot_d(week2,"clouds_all","Cloud Cover", "Cover%")
plt3=plot_d(week2,"traffic_volume","Traffic Distribution", "Volume")
patchwork::wrap_plots(plt1,plt2,plt3, guides="collect")
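
The Kelvin-to-Fahrenheit conversion used in the code above can also be wrapped in a small helper. This is only a sketch: `kelvin_to_fahrenheit` is a hypothetical name that does not appear in the original analysis, and it uses the exact offset of 273.15 rather than the rounded 273, so its results would differ from the values reported in this document by about 0.27°F.

```r
# Hypothetical helper (not part of the original analysis):
# convert Kelvin to degrees Fahrenheit using the exact offset 273.15.
kelvin_to_fahrenheit <- function(kelvin) {
  (kelvin - 273.15) * 9 / 5 + 32
}

# Sanity check with the freezing and boiling points of water
kelvin_to_fahrenheit(c(273.15, 373.15))
```

Keeping the conversion in one function makes it easy to check against known reference points before applying it to the whole column.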

2.2 Categorical data summary

df=function(d1,d2){
  a=data.frame(table(d1))
  colnames(a)<-c(d2,'Count')
  a
}
weather_main=df(week2$weather_main,"Types Of Weather")
weather_description=df(week2$weather_description,"Weather Description")
holidays=df(week2$holiday,"Holidays")
holidays
##                     Holidays Count
## 1              Christmas Day     6
## 2               Columbus Day     5
## 3           Independence Day     5
## 4                  Labor Day     7
## 5  Martin Luther King Jr Day     6
## 6               Memorial Day     5
## 7              New Years Day     6
## 8                       None 48132
## 9                 State Fair     5
## 10          Thanksgiving Day     6
## 11              Veterans Day     5
## 12      Washingtons Birthday     5
weather_main
##    Types Of Weather Count
## 1             Clear 13381
## 2            Clouds 15164
## 3           Drizzle  1821
## 4               Fog   912
## 5              Haze  1360
## 6              Mist  5950
## 7              Rain  5671
## 8             Smoke    20
## 9              Snow  2876
## 10           Squall     4
## 11     Thunderstorm  1034
weather_description
##                    Weather Description Count
## 1                        broken clouds  4666
## 2                              drizzle   651
## 3                           few clouds  1956
## 4                                  fog   912
## 5                        freezing rain     2
## 6                                 haze  1360
## 7              heavy intensity drizzle    64
## 8                 heavy intensity rain   467
## 9                           heavy snow   616
## 10             light intensity drizzle  1100
## 11         light intensity shower rain    13
## 12                          light rain  3372
## 13                 light rain and snow     6
## 14                   light shower snow    11
## 15                          light snow  1946
## 16                                mist  5950
## 17                       moderate rain  1664
## 18                     overcast clouds  5081
## 19               proximity shower rain   136
## 20              proximity thunderstorm   673
## 21 proximity thunderstorm with drizzle    13
## 22    proximity thunderstorm with rain    52
## 23                    scattered clouds  3461
## 24                      shower drizzle     6
## 25                         shower snow     1
## 26                        sky is clear 11655
## 27                        Sky is Clear  1726
## 28                               sleet     3
## 29                               smoke    20
## 30                                snow   293
## 31                             SQUALLS     4
## 32                        thunderstorm   125
## 33           thunderstorm with drizzle     2
## 34        thunderstorm with heavy rain    63
## 35     thunderstorm with light drizzle    15
## 36        thunderstorm with light rain    54
## 37              thunderstorm with rain    37
## 38                     very heavy rain    17

3. Trend Analysis

4. Data Analysis

4.1 Correlation analysis

Let's analyze how weather variables such as temperature, rain, snow and cloud cover, along with the hour of the day, correlate with traffic volume.

corr_data=week2|>
  select(traffic_volume,temp, rain_1h,snow_1h,clouds_all, hour)
ggpairs(corr_data,title="Correlation Matrix of Traffic Volume and Weather Factors")

Let us try to understand the visualizations shown above.

The variable summary and distribution charts (2.1) show that the temperature in this part of Minnesota mostly stays between roughly 30°F and 65°F, with a maximum of 98.3°F and a minimum of -21.3°F. The region seems to be covered by clouds on most days, and the cloud cover distribution is fairly uniform. Minnesota seems to have 10 holidays plus a local one unique to the state, the Minnesota State Fair, for 11 holiday labels in total (2.2) (https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume).

Peak traffic (3.1) seems to occur between 7:00 AM and 5:00 PM. This suggests that most of the traffic consists of people commuting to and from work. Traffic volume (3.1) dips significantly on weekends across the years. Overall traffic increases as the years progress, but not steadily from year to year: there are large fluctuations, most noticeably between 2012 and 2013. Overall traffic seems to have increased after 2014/2015 and has held at around 15-20 million since.
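
The hourly and weekend patterns described above could be summarized numerically along the following lines. This is a sketch under assumptions, not the code behind the original charts: `hourly_profile` and the toy data are hypothetical, and it assumes the data has a parsed date-time column named `date_time` alongside `traffic_volume`.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: mean traffic volume per hour of day,
# split into weekdays and weekends.
hourly_profile <- function(data) {
  data |>
    mutate(
      hour = hour(date_time),
      # with week_start = 1, wday() numbers Monday as 1 and Sunday as 7
      day_type = if_else(wday(date_time, week_start = 1) >= 6,
                         "weekend", "weekday")
    ) |>
    group_by(day_type, hour) |>
    summarise(mean_volume = mean(traffic_volume), .groups = "drop")
}

# Toy data for illustration only (not taken from the dataset):
# a Monday, a Tuesday and a Saturday, all at 8 AM
toy <- tibble(
  date_time = ymd_hms(c("2018-01-01 08:00:00", "2018-01-02 08:00:00",
                        "2018-01-06 08:00:00")),
  traffic_volume = c(5000, 6000, 2000)
)
hourly_profile(toy)
```

Calling `hourly_profile(week2)` would give one row per hour and day type, which can be plotted or compared directly.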

Among the holidays, traffic volume is highest on New Year’s Day (3.2). Labor Day and Independence Day are almost always part of a long weekend, which encourages people to travel more and is the likely cause of the higher numbers we see in the chart.
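
One thing to watch when comparing holidays: the counts in section 2.2 (5 to 7 rows per holiday) suggest that each holiday is labelled on only one row per occurrence rather than on every hour of that day. The sketch below shows one possible way to spread the label across the whole calendar date before averaging. The helper `holiday_volume` and the toy data are hypothetical, and it assumes a parsed `date_time` column.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: copy each holiday label to every row of the same
# calendar date, then average traffic volume per holiday.
holiday_volume <- function(data) {
  data |>
    mutate(date = as_date(date_time)) |>
    group_by(date) |>
    mutate(day_holiday = if (any(holiday != "None", na.rm = TRUE)) {
      first(holiday[!is.na(holiday) & holiday != "None"])
    } else {
      "None"
    }) |>
    ungroup() |>
    group_by(day_holiday) |>
    summarise(mean_volume = mean(traffic_volume), .groups = "drop") |>
    arrange(desc(mean_volume))
}

# Toy data for illustration only (not taken from the dataset)
toy <- tibble(
  date_time = ymd_hms(c("2018-07-04 00:00:00", "2018-07-04 12:00:00",
                        "2018-07-05 12:00:00")),
  holiday = c("Independence Day", "None", "None"),
  traffic_volume = c(1000, 4000, 5000)
)
holiday_volume(toy)
```

In the toy data, the noon row on July 4 is counted as Independence Day even though only the midnight row carries the label.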

Traffic volume shows a multimodal distribution with peaks at different levels (4.2), whereas temperature shows a bimodal distribution. There is a moderate positive correlation (0.352) between traffic volume and time of day (4.2). We observe a weak positive correlation (0.132) between temperature and traffic volume, indicating that warmer days see a slight increase in traffic. Weather does not seem to have a large impact on traffic volume, as there is little to no correlation between the weather variables and traffic volume.
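
The correlation values quoted above can also be computed directly with base R's `cor()`, which returns the Pearson coefficient by default. The following is a minimal sketch: `volume_correlations` and the toy data frame are hypothetical and are used only to show the call.

```r
# Hypothetical helper: Pearson correlation of each named column
# with traffic_volume, using only complete rows.
volume_correlations <- function(data, predictors) {
  sapply(predictors, function(col) {
    cor(data[[col]], data$traffic_volume, use = "complete.obs")
  })
}

# Toy data for illustration only (not taken from the dataset)
toy <- data.frame(
  traffic_volume = c(1000, 2000, 3000, 4000),
  temp           = c(40, 50, 60, 70),  # rises with volume
  clouds_all     = c(90, 60, 30, 0)    # falls with volume
)
volume_correlations(toy, c("temp", "clouds_all"))
```

On the real data, a call such as `volume_correlations(week2, c("temp", "rain_1h", "snow_1h", "clouds_all"))` would return one coefficient per weather variable.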

5. Conclusion

Understanding traffic patterns is nuanced and requires data analysis at multiple levels. From the high-level analysis we have completed, we can draw a few surface-level conclusions:

  1. How do traffic patterns change with respect to Time?

    1. Peak traffic is between 7:00 AM and 5:00 PM, which is typically commute traffic as people travel to and from their workplaces. This is further supported by the significant decrease in weekend volume. The scatter plots of traffic volume against hour in the correlation analysis suggest the same trend.
  2. What are the traffic patterns during holidays?

    1. Overall traffic is generally expected to be lower on holidays. But on which holidays do people travel the most? Is it Christmas and Thanksgiving, to visit family? Do people travel to different parts of the city during Halloween than they do during the end-of-year holidays? Looking at the data, people travel the most on New Year’s Day. This is likely people returning home from their hometowns after spending the holidays with their families. Holidays that form long weekends, like Labor Day and Independence Day, also have significantly higher traffic volume.
  3. Do factors like weather and temperature have an impact on traffic?

    1. Surprisingly, weather does not seem to have a large impact on traffic volume. There seems to be no correlation between the rain, snow and cloud cover variables and the volume, as seen in the correlation plots. This also supports the inference that most traffic on this part of the interstate is commute traffic.

6. Future Scope

There is still a lot more insight that can be generated with a deeper dive into the data.

  1. Monthly traffic patterns, and whether seasonal temperature changes cause monthly traffic to fluctuate

  2. External events that could have caused the overall traffic drop in the early years (2012-2015), such as policy changes, private taxi services, etc.

  3. Predictive forecasting to see if traffic volume would increase over the years

  4. Anomaly detection to provide opportunities for optimization of traffic
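
As a possible starting point for the first item in the list above, monthly averages of traffic volume and temperature could be computed side by side. This is only a sketch under assumptions: `monthly_summary` and the toy data are hypothetical, and it assumes a parsed `date_time` column along with `traffic_volume` and `temp`.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: average traffic volume and temperature
# per calendar month (1 = January, ..., 12 = December).
monthly_summary <- function(data) {
  data |>
    mutate(month = month(date_time)) |>
    group_by(month) |>
    summarise(
      mean_volume = mean(traffic_volume),
      mean_temp   = mean(temp),
      .groups     = "drop"
    )
}

# Toy data for illustration only (not taken from the dataset)
toy <- tibble(
  date_time = ymd_hms(c("2017-01-10 08:00:00", "2017-01-20 08:00:00",
                        "2017-07-10 08:00:00")),
  traffic_volume = c(4000, 5000, 6000),
  temp = c(10, 20, 80)
)
monthly_summary(toy)
```

Plotting `mean_volume` against `mean_temp` from this summary would be one simple way to see whether the two move together across the seasons.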