How do traffic patterns change with respect to Time?
Do factors like weather and temperature have an impact on traffic?
What are the traffic patterns during holidays?
library(readr)
library(ggplot2)
library(patchwork)
library(dplyr)
library(lubridate)
library(GGally)
week2=read_csv("C:/Users/rajas/OneDrive/Desktop/Desktop/Applied Data Science/INFOH510/R Jupyter/Metro_Interstate_Traffic_Volume.csv")
# Print summaries of the key numeric columns of the data frame passed in
custom_summary <- function(data){
  print("Summary of Temperature")
  print(summary(data$temp))
  print("Summary of Cloud Percentage")
  print(summary(data$clouds_all))
  print("Summary of Traffic Metro Volume")
  print(summary(data$traffic_volume))
}
custom_summary(week2)
## [1] "Summary of Temperature"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 272.2 282.4 281.2 291.8 310.1
## [1] "Summary of Cloud Percentage"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 64.00 49.36 90.00 100.00
## [1] "Summary of Traffic Metro Volume"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 1193 3380 3260 4933 7280
Notice that the minimum value in the ‘temp’ column is 0.0. The other summary values (a median of about 282) indicate that temperature is recorded in Kelvin, and 0 K is not physically possible in an everyday outdoor environment, so all such rows must be excluded before any sound inference can be made. The code below removes these rows, also drops rows where ‘rain_1h’ is 60 or more, and converts the temperature to Fahrenheit so that the values are easier to interpret.
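# Drop physically impossible temperatures (0 K) and extreme rainfall readings
week2 <- week2[week2$temp > 0, ]
week2 <- week2[week2$rain_1h < 60, ]
# Convert Kelvin to Fahrenheit (273 is a rounded offset; the exact value is 273.15)
week2 <- week2 |>
  mutate(temp = (temp - 273) * 9/5 + 32)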
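The conversion step can also be wrapped in a small helper. This is a minimal sketch: `kelvin_to_fahrenheit` is a hypothetical name that does not appear in the original analysis, and it uses the exact offset of 273.15 rather than the rounded 273 used above, so its results differ from the ones reported here by about 0.27F.

```r
# Hypothetical helper (not part of the original analysis): Kelvin -> Fahrenheit.
# Uses the exact offset 273.15 instead of the rounded 273.
kelvin_to_fahrenheit <- function(kelvin) {
  (kelvin - 273.15) * 9 / 5 + 32
}

# Freezing and boiling points of water as a sanity check
kelvin_to_fahrenheit(c(273.15, 373.15))
```

Keeping the formula in one function makes it easier to switch the offset in a single place if exact values are needed.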
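# Boxplot of a single numeric vector, with a custom title and y-axis label
plot_d <- function(data, titl, scal){
  ggplot(data.frame(value = data)) +
    geom_boxplot(mapping = aes(y = value), fill = "steelblue") +
    labs(title = titl, y = scal)
}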
plt1=plot_d(week2$temp, "Temperature Distribution","Temperature(F)")
plt2=plot_d(week2$clouds_all,"Cloud Cover", "Cover%")
plt3=plot_d(week2$traffic_volume,"Traffic Distribution", "Volume")
patchwork::wrap_plots(plt1,plt2,plt3, guides="collect")
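# Frequency table for a categorical column; d2 is the label for the category column.
# Named count_table rather than df to avoid masking stats::df.
count_table <- function(d1, d2){
  a <- data.frame(table(d1))
  colnames(a) <- c(d2, 'Count')
  a
}
weather_main <- count_table(week2$weather_main, "Types Of Weather")
weather_description <- count_table(week2$weather_description, "Weather Description")
holidays <- count_table(week2$holiday, "Holidays")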
holidays
## Holidays Count
## 1 Christmas Day 6
## 2 Columbus Day 5
## 3 Independence Day 5
## 4 Labor Day 7
## 5 Martin Luther King Jr Day 6
## 6 Memorial Day 5
## 7 New Years Day 6
## 8 None 48132
## 9 State Fair 5
## 10 Thanksgiving Day 6
## 11 Veterans Day 5
## 12 Washingtons Birthday 5
weather_main
## Types Of Weather Count
## 1 Clear 13381
## 2 Clouds 15164
## 3 Drizzle 1821
## 4 Fog 912
## 5 Haze 1360
## 6 Mist 5950
## 7 Rain 5671
## 8 Smoke 20
## 9 Snow 2876
## 10 Squall 4
## 11 Thunderstorm 1034
weather_description
## Weather Description Count
## 1 broken clouds 4666
## 2 drizzle 651
## 3 few clouds 1956
## 4 fog 912
## 5 freezing rain 2
## 6 haze 1360
## 7 heavy intensity drizzle 64
## 8 heavy intensity rain 467
## 9 heavy snow 616
## 10 light intensity drizzle 1100
## 11 light intensity shower rain 13
## 12 light rain 3372
## 13 light rain and snow 6
## 14 light shower snow 11
## 15 light snow 1946
## 16 mist 5950
## 17 moderate rain 1664
## 18 overcast clouds 5081
## 19 proximity shower rain 136
## 20 proximity thunderstorm 673
## 21 proximity thunderstorm with drizzle 13
## 22 proximity thunderstorm with rain 52
## 23 scattered clouds 3461
## 24 shower drizzle 6
## 25 shower snow 1
## 26 sky is clear 11655
## 27 Sky is Clear 1726
## 28 sleet 3
## 29 smoke 20
## 30 snow 293
## 31 SQUALLS 4
## 32 thunderstorm 125
## 33 thunderstorm with drizzle 2
## 34 thunderstorm with heavy rain 63
## 35 thunderstorm with light drizzle 15
## 36 thunderstorm with light rain 54
## 37 thunderstorm with rain 37
## 38 very heavy rain 17
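The table lists both `sky is clear` and `Sky is Clear`, as well as the upper-case `SQUALLS`, so the same condition can be split across rows that differ only in letter case. A minimal sketch of folding such labels together with base R's `tolower()`, shown on a small made-up vector rather than the real column:

```r
# Toy values standing in for week2$weather_description (illustrative only)
descriptions <- c("sky is clear", "Sky is Clear", "mist", "SQUALLS", "mist")

# Lower-case everything so labels that differ only in case collapse together
normalized <- tolower(descriptions)

counts <- table(normalized)
counts
```

Applying the same idea to the real column before counting would merge the two clear-sky rows into one.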
Grouping the data by time components and summarizing traffic volume gives meaningful insights, as shown below.
# Derive hour, month, year and weekday from date_time to get relevant insights
week2$hour <- as.integer(format(as.POSIXct(week2$date_time), "%H"))
week2$month <- month(as.POSIXct(week2$date_time), label = TRUE) # lubridate returns labelled months
week2$year <- as.integer(format(as.POSIXct(week2$date_time), "%Y")) # %Y gives the four-digit year (%y would give 12, 13, ...)
week2$weekday <- weekdays(as.Date(week2$date_time))
week2$weekday <- factor(week2$weekday, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) # order the weekdays Monday to Sunday
# Group the data by hour and year to get total traffic volume
hour_data <- week2 |>
  group_by(hour, year) |>
  summarise(Total_Volume = sum(traffic_volume), .groups = 'drop') |>
  ggplot() +
  # linewidth replaces size for line geoms in ggplot2 3.4.0 and later
  geom_line(mapping = aes(x = hour, y = Total_Volume, color = as.factor(year), group = year), linewidth = 1) +
  geom_point(mapping = aes(x = hour, y = Total_Volume, color = as.factor(year), group = year)) +
  labs(title = "Traffic Volume by Hour", x = "Hour of the Day", y = "Traffic Volume", color = "Year") +
  theme(axis.text = element_text(size = 15))
# Same summary, grouped by weekday and year
seasonal <- week2 |>
  group_by(weekday, year) |>
  summarise(Total_Volume = sum(traffic_volume), .groups = 'drop') |>
  ggplot() +
  geom_line(mapping = aes(x = weekday, y = Total_Volume, color = as.factor(year), group = year), linewidth = 1) +
  geom_point(mapping = aes(x = weekday, y = Total_Volume, color = as.factor(year), group = year)) +
  labs(title = "Weekly Traffic Volume", x = "Weekday", y = "Traffic Volume", color = "Year") +
  theme(axis.text = element_text(size = 15))
# Show both plots side by side with a shared legend
hour_data + seasonal + plot_layout(guides = "collect")
# Keep only the rows flagged with a holiday, then total their traffic volume
holiday_data <- week2[week2$holiday != "None", ]
holiday_data |>
  group_by(holiday) |>
  summarise(Total_traffic = sum(traffic_volume), .groups = 'drop') |>
  ggplot() +
  geom_col(aes(x = holiday, y = Total_traffic, fill = holiday)) + # geom_col is equivalent to geom_bar(stat = "identity")
  labs(title = "Holiday Traffic Volume", x = "Holiday", y = "Traffic Volume") +
  theme(axis.text = element_text(size = 12))
Let's analyze the correlation between traffic volume and variables such as weather, temperature and hour of the day.
corr_data=week2|>
select(traffic_volume,temp, rain_1h,snow_1h,clouds_all, hour)
ggpairs(corr_data,title="Correlation Matrix of Traffic Volume and Weather Factors")
Let us try to understand the different visualizations portrayed above.
The variable summary and distribution charts (2.1) show that the temperature in this part of Minnesota mostly stays between about 30F and 65F (roughly the middle half of the readings), with a high of 98.3F and a low of -21.3F. Cloud cover is common, with a median of about 64% and values spread across the whole 0-100% range. The data records 11 holidays (2.2), including one that is local to Minnesota, the Minnesota State Fair (https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume).
Peak traffic (3.1) falls between 7:00 am and 5:00 pm, which suggests that most of it comes from people commuting to and from work. Traffic volume dips significantly on weekends in every year (3.1). Looking at the overall figures, traffic increases as the years progress, but not steadily year over year: there are large fluctuations, most noticeably between 2012 and 2013. Overall traffic seems to have increased after 2014/2015 and has held at around 15-20 million since.
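Summed volume depends on how many records each group contains, so a year with fewer records will look quieter even if typical traffic did not change. One way to check this, sketched here on made-up numbers, is to compare the sum with the per-record mean and the record count. The toy data frame and its values are assumptions for illustration only.

```r
library(dplyr)

# Made-up records: 2014 has fewer rows than 2013 but similar traffic per record
toy <- data.frame(
  year = c(2013, 2013, 2013, 2013, 2014, 2014),
  traffic_volume = c(3000, 4000, 5000, 4000, 4000, 4200)
)

yearly <- toy |>
  group_by(year) |>
  summarise(
    Total_Volume = sum(traffic_volume),   # sensitive to the number of rows
    Mean_Volume  = mean(traffic_volume),  # comparable across uneven years
    Records      = n(),
    .groups = "drop"
  )

yearly
```

In this toy case the total drops sharply in 2014 while the mean barely moves, which points to missing records rather than a real fall in traffic.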
It is worth noting that, among the holidays, traffic volume is highest on New Year’s Day (3.2). Labor Day and Independence Day are almost always part of a long weekend, which would entice people to travel more and is the likely cause of the higher numbers we see in the chart. Note that each holiday is flagged on only 5 to 7 rows of the data (2.2), so these totals cover only the flagged records rather than full days.
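Because each holiday label appears on only a handful of rows (5 to 7 in the holiday count table), one option is to spread the label to every record that shares the flagged row's calendar date before summing. The sketch below does this on a toy data frame. The column names mirror the dataset, but the values are invented, and it assumes the flagged row falls on the holiday's own date.

```r
library(dplyr)

# Invented hourly records: only the first row of 2016-01-01 carries the label
toy <- data.frame(
  date_time = as.POSIXct(c("2016-01-01 00:00:00", "2016-01-01 08:00:00",
                           "2016-01-01 17:00:00", "2016-01-02 08:00:00"),
                         tz = "UTC"),
  holiday = c("New Years Day", "None", "None", "None"),
  traffic_volume = c(1500, 2000, 3000, 4000)
)

full_day <- toy |>
  mutate(day = as.Date(date_time)) |>
  group_by(day) |>
  # Use the day's holiday label if any row has one, otherwise keep "None"
  mutate(holiday_day = if (any(holiday != "None")) holiday[holiday != "None"][1] else "None") |>
  ungroup()

# Total traffic over every record on a holiday date, not just the flagged row
holiday_totals <- full_day |>
  filter(holiday_day != "None") |>
  group_by(holiday_day) |>
  summarise(Total_traffic = sum(traffic_volume), .groups = "drop")

holiday_totals
```

Totals computed this way describe whole holiday dates, which makes holidays easier to compare with ordinary days.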
We see a multimodal distribution with peaks at different levels (4.2) for traffic volume, whereas temperature has a bimodal distribution. There is a moderate positive correlation (0.352) between traffic volume and hour of the day (4.2). We observe a weak positive correlation (0.132) between temperature and traffic volume, indicating that warmer days see a slight increase in traffic. The other weather variables do not seem to have a large impact on traffic volume, as they show little to no correlation with it.
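The coefficients that `ggpairs` shows for numeric pairs are, by default, Pearson correlations. The same quantity can be computed directly with base R's `cor()`. The sketch below uses invented values rather than the real columns, and checks the built-in result against the definition.

```r
# Invented values standing in for week2$temp and week2$traffic_volume
temp_f <- c(20, 35, 50, 65, 80)
volume <- c(3000, 3100, 3600, 3400, 3900)

# Pearson correlation: covariance scaled by both standard deviations
r_builtin <- cor(temp_f, volume, method = "pearson")
r_manual  <- cov(temp_f, volume) / (sd(temp_f) * sd(volume))

c(builtin = r_builtin, manual = r_manual)
```

Pearson correlation measures only linear association, so a variable such as hour of the day, where traffic rises and then falls, can matter more than its coefficient suggests.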
Understanding traffic patterns is nuanced and requires data analysis at multiple levels. With the high-level analysis completed here, we can form a few surface-level conclusions:
How do traffic patterns change with respect to Time?
What are the traffic patterns during holidays?
Do factors like weather and temperature have an impact on traffic?
There is still a lot more insight that a deeper dive into the data could generate:
Monthly traffic patterns, and whether seasonal temperature changes cause monthly traffic to fluctuate
External events that could have caused the overall traffic drop in the early years 2012-2015, such as policy changes or private taxi services
Predictive forecasting to see if traffic volume would increase over the years
Anomaly detection to provide opportunities for optimization of traffic
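As a starting point for the first item in this list, monthly averages of temperature and traffic can be placed side by side. This is a minimal sketch on invented records: the column names follow the dataset, but the values and the resulting pattern are purely illustrative.

```r
library(dplyr)
library(lubridate)

# Invented records for one winter month and one summer month
toy <- data.frame(
  date_time = as.POSIXct(c("2016-01-10 08:00:00", "2016-01-20 08:00:00",
                           "2016-07-10 08:00:00", "2016-07-20 08:00:00"),
                         tz = "UTC"),
  temp = c(10, 20, 80, 90),
  traffic_volume = c(3000, 3200, 4000, 4400)
)

# Average temperature and traffic per calendar month
monthly <- toy |>
  mutate(month = month(date_time)) |>
  group_by(month) |>
  summarise(
    Mean_Temp   = mean(temp),
    Mean_Volume = mean(traffic_volume),
    .groups = "drop"
  )

monthly
```

Using means rather than sums keeps months with different numbers of records comparable.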