library(lubridate)
Attaching package: 㤼㸱lubridate㤼㸲
The following object is masked from 㤼㸱package:base㤼㸲:
date
library(ggplot2)
library(ggthemes)
library(dplyr)
Attaching package: 㤼㸱dplyr㤼㸲
The following objects are masked from 㤼㸱package:lubridate㤼㸲:
intersect, setdiff, union
The following objects are masked from 㤼㸱package:stats㤼㸲:
filter, lag
The following objects are masked from 㤼㸱package:base㤼㸲:
intersect, setdiff, setequal, union
library(tidyverse)
[37m-- [1mAttaching packages[22m --------------------------------------- tidyverse 1.2.1 --[39m
[37m[32mv[37m [34mtibble [37m 2.1.3 [32mv[37m [34mpurrr [37m 0.3.3
[32mv[37m [34mtidyr [37m 1.0.0 [32mv[37m [34mstringr[37m 1.4.0
[32mv[37m [34mreadr [37m 1.3.1 [32mv[37m [34mforcats[37m 0.4.0[39m
[37m-- [1mConflicts[22m ------------------------------------------ tidyverse_conflicts() --
[31mx[37m [34mlubridate[37m::[32mas.difftime()[37m masks [34mbase[37m::as.difftime()
[31mx[37m [34mlubridate[37m::[32mdate()[37m masks [34mbase[37m::date()
[31mx[37m [34mdplyr[37m::[32mfilter()[37m masks [34mstats[37m::filter()
[31mx[37m [34mlubridate[37m::[32mintersect()[37m masks [34mbase[37m::intersect()
[31mx[37m [34mdplyr[37m::[32mlag()[37m masks [34mstats[37m::lag()
[31mx[37m [34mlubridate[37m::[32msetdiff()[37m masks [34mbase[37m::setdiff()
[31mx[37m [34mlubridate[37m::[32munion()[37m masks [34mbase[37m::union()[39m
traffic=Timeseries_traffic
head(traffic)
As in the time coloumn we need time only but date is also shown so we have to strip time only.
traffic$time=format(as.POSIXct(traffic$Time,format="%m/%d/%Y"),format = "%H:%M:%S")
head(traffic)
Now we have the time column for time only now we have to delete original Time column,as it is not required now
traffic1=traffic[,-2]
head(traffic1)
For ggplot we need to convert each time format individually as a factor.
traffic1$Day=ymd(traffic1$Day)
Now we can pullout individual time and covert it into factor.
traffic1$day=factor(day(traffic1$Day))
head(traffic1)
Making time as a factor in seperate coloumn
traffic1$hour=factor(hour(hms(traffic1$time)))
traffic1$minute=factor(minute(hms(traffic1$time)))
head(traffic1)
NA
NA
ggplot for understanding the traffic density.
ggplot(traffic1,aes(hour,`Traffic Density`,fill=day))+geom_bar(stat = "identity")+ggtitle("Traffic density vs Hour")
Now we can clearly see that pattern of traffic density for different days in different hours of the day.
ggplot(traffic1,aes(hour,`Traffic Density`,fill=minute))+geom_bar(stat = "identity")+ggtitle("Traffic density vs hour",subtitle ="In quarter hour interval")
Above graph shows the traffic density at each quarter hour of a day.
ggplot(traffic1,aes(hour,`Two wheelers`,fill=day ,group=day))+geom_bar(stat = "identity")+scale_color_viridis_d()+ggtitle("Two Wheelers density of each hour of a day")
The above graph shows the reltation of two wheelers density in each hour of a week and from the graph we can conclude that the all days of a week are following the same pattern and two wheelers density is maximum at 9am and 6pm to 7pm.
Now to see the effect of two wheeler density in traffic density we can plot a graph for any of the day for traffic density.
traffic_day=filter(traffic1,day==19)
view(traffic_day)
ggplot(traffic_day,aes(hour,`Two wheelers`,col=`Traffic Density`,size=`Traffic Density` ))+geom_point(alpha=0.7)+scale_color_viridis_c()+ggtitle("Two wheelers vs hour plot",subtitle = "Showing traffic density on 19th day")
The above graph is showing the relationship between two wheelers density in each hour of 19th day and traffic density at that time.
Thus we can clearly see that two weelers density is at its peak at 9am and 6 to 7 pm and also the traffic density is maximum at htese hours which indicates that two wheelers and traffic density are having almost a linear relationship.
Now from the above graphs we can see that these independent variables are having almost linear relationship with the traffic density as if the number of vehicles are more on the road the traffic density will b more,so we can plot the graph for the density of vehicle and directly relate it to the traffic density.
ggplot(traffic1,aes(hour,`Car/Jeep/Van`))+geom_point(col="lightgreen",size=2,alpha=0.5)+geom_line(size=1,col="yellow")+theme_dark()+ggtitle("Car/Jeep/Van density graph")
The above graph shows the density of Car/Jeep/Van at each hour and it also indicates that the density of Car/Jeep/Van is following a trend.
df2 <- data.frame(t(data_traffic[-1]))
colnames(df2) <- data_traffic[, 1]
df2$vehicle=row.names(df2)
df2
ggplot(df2,aes(`19`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,fill=alpha("white",0.5),shape=21,stroke=2,show.legend = FALSE)+geom_segment(aes(x=`19`, xend=`19`, y=0, yend=vehicle),size=0.5,show.legend = FALSE)+ggtitle("Vehicle density for 19th day")+theme_dark()+scale_color_viridis_d()
From the above graph the can see that on 19th day the Car/jeep/van are the most in number.
Similarly we can see the density of vehicles on all 6 days.
library(gridExtra)
g1=ggplot(df2,aes(`20`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
g2=ggplot(df2,aes(`21`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
g3=ggplot(df2,aes(`22`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
g4=ggplot(df2,aes(`23`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
g5=ggplot(df2,aes(`24`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
g6=ggplot(df2,aes(`25`,vehicle,col=vehicle))+geom_point(size=5,alpha=0.7,shape=21,stroke=2,show.legend = FALSE)+theme_dark()
grid.arrange(g1,g2,g3,g4,g5,g6)
From the above plots we can conclude that the car/jeep/van are contributing the most in the traffic density on each day and two wheeler comes after that and so on.
So the above graph tell us a lot about the relations and effect of vehicles on traffic densitty in accordance to time.
Now moving over the modeling
str(traffic3)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 864 obs. of 20 variables:
$ Two wheelers : num 6 7 3 0 2 5 3 2 0 2 ...
$ Autorickshaw (3w) : num 0 0 0 0 0 0 0 0 0 1 ...
$ 4 Wheel Auto : num 1 0 0 0 0 1 0 0 0 0 ...
$ Taxi : num 0 0 0 0 0 0 0 0 0 0 ...
$ Car/Jeep/Van : num 27 21 24 15 20 12 9 12 6 9 ...
$ Mini Bus : num 1 1 1 5 2 1 0 0 0 0 ...
$ Standard Bus : num 10 20 6 9 3 8 5 5 7 1 ...
$ Mini LCV : num 9 6 4 6 9 5 6 4 4 4 ...
$ LCV : num 10 9 3 3 6 6 7 9 6 9 ...
$ 2-AxleTruck : num 8 6 6 8 3 5 9 7 6 6 ...
$ 3-AxleTruck : num 2 9 7 7 11 17 7 8 5 12 ...
$ MAV (4 Axle and above): num 6 5 13 9 14 15 6 9 7 12 ...
$ Tractor : num 1 0 1 0 1 2 0 0 0 0 ...
$ Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ Cycle Rickshaw : num 0 0 0 0 0 0 0 0 0 0 ...
$ Carts : num 0 0 0 0 0 0 0 0 0 0 ...
$ Traffic Density : num 145 173 152 146 158 ...
$ day : Factor w/ 7 levels "19","20","21",..: 1 1 1 1 1 1 1 1 1 1 ...
$ hour : Factor w/ 24 levels "0","1","2","3",..: 1 1 1 2 2 2 2 3 3 3 ...
$ minute : Factor w/ 4 levels "0","15","30",..: 2 3 4 1 2 3 4 1 2 3 ...
traffic3$day=as.numeric(traffic3$day)
traffic3$hour=as.numeric(traffic3$hour)
traffic3$minute=as.numeric(traffic3$minute)
str(traffic3)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 864 obs. of 20 variables:
$ Two wheelers : num 6 7 3 0 2 5 3 2 0 2 ...
$ Autorickshaw (3w) : num 0 0 0 0 0 0 0 0 0 1 ...
$ 4 Wheel Auto : num 1 0 0 0 0 1 0 0 0 0 ...
$ Taxi : num 0 0 0 0 0 0 0 0 0 0 ...
$ Car/Jeep/Van : num 27 21 24 15 20 12 9 12 6 9 ...
$ Mini Bus : num 1 1 1 5 2 1 0 0 0 0 ...
$ Standard Bus : num 10 20 6 9 3 8 5 5 7 1 ...
$ Mini LCV : num 9 6 4 6 9 5 6 4 4 4 ...
$ LCV : num 10 9 3 3 6 6 7 9 6 9 ...
$ 2-AxleTruck : num 8 6 6 8 3 5 9 7 6 6 ...
$ 3-AxleTruck : num 2 9 7 7 11 17 7 8 5 12 ...
$ MAV (4 Axle and above): num 6 5 13 9 14 15 6 9 7 12 ...
$ Tractor : num 1 0 1 0 1 2 0 0 0 0 ...
$ Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ Cycle Rickshaw : num 0 0 0 0 0 0 0 0 0 0 ...
$ Carts : num 0 0 0 0 0 0 0 0 0 0 ...
$ Traffic Density : num 145 173 152 146 158 ...
$ day : num 1 1 1 1 1 1 1 1 1 1 ...
$ hour : num 1 1 1 2 2 2 2 3 3 3 ...
$ minute : num 2 3 4 1 2 3 4 1 2 3 ...
Now for the analysis we converted the factor column into numeric
library(GGally)
ggcorr(traffic3,method = c("everything","pearson"))+ggtitle("Correlation plot ")
the standard deviation is zero
Above plot shows the corelation of the variables using pearson method.
Spliting the data
set.seed(123)
tr_traffic=TRAIN(traffic3)
Length of logical index must be 1 or 864, not 20
ts_traffic=TEST(traffic3)
Length of logical index must be 1 or 864, not 20
Now training the model using train data (i.e tr_traffic).
model_tr=lm(`Traffic Density`~.,data = traffic3)
summary(model_tr)
essentially perfect fit: summary may be unreliable
Call:
lm(formula = `Traffic Density` ~ ., data = traffic3)
Residuals:
Min 1Q Median 3Q Max
-1.424e-12 -7.330e-15 8.000e-16 9.100e-15 7.075e-13
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.481e-14 1.353e-14 -2.572e+00 0.0103 *
`Two wheelers` 5.000e-01 2.782e-16 1.797e+15 <2e-16 ***
`Autorickshaw (3w)` 1.000e+00 1.150e-15 8.699e+14 <2e-16 ***
`4 Wheel Auto` 1.000e+00 4.541e-15 2.202e+14 <2e-16 ***
Taxi 1.000e+00 1.890e-15 5.291e+14 <2e-16 ***
`Car/Jeep/Van` 1.000e+00 2.890e-16 3.461e+15 <2e-16 ***
`Mini Bus` 1.500e+00 2.290e-15 6.549e+14 <2e-16 ***
`Standard Bus` 3.000e+00 7.419e-16 4.044e+15 <2e-16 ***
`Mini LCV` 1.000e+00 9.177e-16 1.090e+15 <2e-16 ***
LCV 1.500e+00 8.967e-16 1.673e+15 <2e-16 ***
`2-AxleTruck` 3.000e+00 8.798e-16 3.410e+15 <2e-16 ***
`3-AxleTruck` 3.000e+00 6.885e-16 4.357e+15 <2e-16 ***
`MAV (4 Axle and above)` 4.500e+00 5.153e-16 8.732e+15 <2e-16 ***
Tractor 1.500e+00 1.860e-15 8.066e+14 <2e-16 ***
Cycle 5.000e-01 1.325e-14 3.772e+13 <2e-16 ***
`Cycle Rickshaw` 2.000e+00 2.361e-14 8.469e+13 <2e-16 ***
Carts NA NA NA NA
day 7.503e-16 1.105e-15 6.790e-01 0.4975
hour 9.565e-16 4.017e-16 2.381e+00 0.0175 *
minute -2.590e-16 1.882e-15 -1.380e-01 0.8905
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.11e-14 on 845 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.157e+31 on 18 and 845 DF, p-value: < 2.2e-16
Now as we see carts column is showing 0 values only which is taken as NA so we can remove carts from the model.
Making predictions using test data:
prediction_tr=predict(model_tr,ts_traffic)
prediction from a rank-deficient fit may be misleading
head(prediction_tr)
1 2 3 4 5 6
145.0 145.5 113.5 46.0 118.5 100.5
Checking the validity of the data:
RMSE(ts_traffic$`Traffic Density`,prediction_tr)
[1] "rmse of predicted and orignal data is = 5.5429767098568e-14"
As the value of rmse is very low thus it signifies that the validity of model is very good.