This project aims to optimize traffic light timings dynamically, resulting in a more efficient and streamlined traffic flow that significantly reduces congestion and enhances the overall commuting experience for all road users by:
Developing a robust classification system categorizing traffic situations into distinct levels (low, medium, high, and heavy).
Employing predictive modeling to accurately forecast the number of vehicles on the roads at specific times.
library(dplyr)
traffic_df <- read.csv("Dataset/Traffic.csv")
glimpse(traffic_df)
## Rows: 2,976
## Columns: 9
## $ Time <chr> "12:00:00 AM", "12:15:00 AM", "12:30:00 AM", "12:45:…
## $ Date <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ Day.of.the.week <chr> "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesday…
## $ CarCount <int> 31, 49, 46, 51, 57, 44, 37, 42, 51, 34, 45, 45, 50, …
## $ BikeCount <int> 0, 0, 0, 0, 6, 0, 0, 4, 0, 0, 0, 0, 0, 0, 22, 16, 28…
## $ BusCount <int> 4, 3, 3, 2, 15, 5, 1, 4, 9, 4, 1, 1, 3, 4, 42, 49, 2…
## $ TruckCount <int> 4, 3, 6, 5, 16, 4, 4, 5, 7, 7, 1, 3, 0, 4, 1, 0, 3, …
## $ Total <int> 39, 55, 55, 58, 94, 53, 42, 55, 67, 45, 47, 49, 53, …
## $ Traffic.Situation <chr> "low", "low", "low", "low", "normal", "low", "low", …
head(traffic_df)
## Time Date Day.of.the.week CarCount BikeCount BusCount TruckCount Total
## 1 12:00:00 AM 10 Tuesday 31 0 4 4 39
## 2 12:15:00 AM 10 Tuesday 49 0 3 3 55
## 3 12:30:00 AM 10 Tuesday 46 0 3 6 55
## 4 12:45:00 AM 10 Tuesday 51 0 2 5 58
## 5 1:00:00 AM 10 Tuesday 57 6 15 16 94
## 6 1:15:00 AM 10 Tuesday 44 0 5 4 53
## Traffic.Situation
## 1 low
## 2 low
## 3 low
## 4 low
## 5 normal
## 6 low
Replace any ” ” to NA
cnames<-names(traffic_df)
for (i in cnames){
print(paste(i,sum(traffic_df[i]=="" )))
traffic_df[which(traffic_df[i]==""),i]<-NA
}
## [1] "Time 0"
## [1] "Date 0"
## [1] "Day.of.the.week 0"
## [1] "CarCount 0"
## [1] "BikeCount 0"
## [1] "BusCount 0"
## [1] "TruckCount 0"
## [1] "Total 0"
## [1] "Traffic.Situation 0"
Check for missing values and sum them up for each column
colSums(is.na(traffic_df))
## Time Date Day.of.the.week CarCount
## 0 0 0 0
## BikeCount BusCount TruckCount Total
## 0 0 0 0
## Traffic.Situation
## 0
Create new column for hour , minute, AM/PM
library(lubridate)
traffic_df$Time_hms<- hms(traffic_df$Time)
traffic_df$Time <- as.POSIXct(strptime(traffic_df$Time, format = "%I:%M:%S %p"))
# Create columns for hour, minute, and AM/PM
traffic_df <- traffic_df %>%
mutate(
hour = hour(Time), # Hour in separate column
minute = minute(Time), # Minute in separate column
am_pm = ifelse(hour(Time) < 12, 1, 2) # 1 for AM, 2 for PM
)
head(traffic_df)
## Time Date Day.of.the.week CarCount BikeCount BusCount
## 1 2024-01-13 00:00:00 10 Tuesday 31 0 4
## 2 2024-01-13 00:15:00 10 Tuesday 49 0 3
## 3 2024-01-13 00:30:00 10 Tuesday 46 0 3
## 4 2024-01-13 00:45:00 10 Tuesday 51 0 2
## 5 2024-01-13 01:00:00 10 Tuesday 57 6 15
## 6 2024-01-13 01:15:00 10 Tuesday 44 0 5
## TruckCount Total Traffic.Situation Time_hms hour minute am_pm
## 1 4 39 low 12H 0M 0S 0 0 1
## 2 3 55 low 12H 15M 0S 0 15 1
## 3 6 55 low 12H 30M 0S 0 30 1
## 4 5 58 low 12H 45M 0S 0 45 1
## 5 16 94 normal 1H 0M 0S 1 0 1
## 6 4 53 low 1H 15M 0S 1 15 1
To create factor on numeric, and sort the ordinal variable
traffic_df %>%
count(`Day.of.the.week`)
## Day.of.the.week n
## 1 Friday 384
## 2 Monday 384
## 3 Saturday 384
## 4 Sunday 384
## 5 Thursday 480
## 6 Tuesday 480
## 7 Wednesday 480
traffic_df$'Day.of.the.week' = factor(traffic_df$'Day.of.the.week', levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
traffic_df$DayofWeek_Numeric = as.numeric(traffic_df$'Day.of.the.week')
traffic_df$'Traffic.Situation' = factor(traffic_df$'Traffic.Situation', levels = c("low", "normal", "high", "heavy"))
traffic_df$Traffic_Situation_Numeric = as.numeric(traffic_df$'Traffic.Situation')
head(traffic_df)
## Time Date Day.of.the.week CarCount BikeCount BusCount
## 1 2024-01-13 00:00:00 10 Tuesday 31 0 4
## 2 2024-01-13 00:15:00 10 Tuesday 49 0 3
## 3 2024-01-13 00:30:00 10 Tuesday 46 0 3
## 4 2024-01-13 00:45:00 10 Tuesday 51 0 2
## 5 2024-01-13 01:00:00 10 Tuesday 57 6 15
## 6 2024-01-13 01:15:00 10 Tuesday 44 0 5
## TruckCount Total Traffic.Situation Time_hms hour minute am_pm
## 1 4 39 low 12H 0M 0S 0 0 1
## 2 3 55 low 12H 15M 0S 0 15 1
## 3 6 55 low 12H 30M 0S 0 30 1
## 4 5 58 low 12H 45M 0S 0 45 1
## 5 16 94 normal 1H 0M 0S 1 0 1
## 6 4 53 low 1H 15M 0S 1 15 1
## DayofWeek_Numeric Traffic_Situation_Numeric
## 1 2 1
## 2 2 1
## 3 2 1
## 4 2 1
## 5 2 2
## 6 2 1
To check on summary statistic of each column
summary(traffic_df)
## Time Date Day.of.the.week CarCount
## Min. :2024-01-13 00:00:00 Min. : 1 Monday :384 Min. : 6.0
## 1st Qu.:2024-01-13 05:56:15 1st Qu.: 8 Tuesday :480 1st Qu.: 19.0
## Median :2024-01-13 11:52:30 Median :16 Wednesday:480 Median : 64.0
## Mean :2024-01-13 11:52:30 Mean :16 Thursday :480 Mean : 68.7
## 3rd Qu.:2024-01-13 17:48:45 3rd Qu.:24 Friday :384 3rd Qu.:107.0
## Max. :2024-01-13 23:45:00 Max. :31 Saturday :384 Max. :180.0
## Sunday :384
## BikeCount BusCount TruckCount Total
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 21.0
## 1st Qu.: 5.00 1st Qu.: 1.00 1st Qu.: 6.00 1st Qu.: 55.0
## Median :12.00 Median :12.00 Median :14.00 Median :109.0
## Mean :14.92 Mean :15.28 Mean :15.32 Mean :114.2
## 3rd Qu.:22.00 3rd Qu.:25.00 3rd Qu.:23.00 3rd Qu.:164.0
## Max. :70.00 Max. :50.00 Max. :40.00 Max. :279.0
##
## Traffic.Situation Time_hms hour minute
## low : 304 Min. :1H 0M 0S Min. : 0.00 Min. : 0.00
## normal:1669 1st Qu.:3H 56M 15S 1st Qu.: 5.75 1st Qu.:11.25
## high : 321 Median :6H 52M 30S Median :11.50 Median :22.50
## heavy : 682 Mean :6H 52M 30S Mean :11.50 Mean :22.50
## 3rd Qu.:9H 48M 45S 3rd Qu.:17.25 3rd Qu.:33.75
## Max. :12H 45M 0S Max. :23.00 Max. :45.00
##
## am_pm DayofWeek_Numeric Traffic_Situation_Numeric
## Min. :1.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.0 1st Qu.:2.000 1st Qu.:2.000
## Median :1.5 Median :4.000 Median :2.000
## Mean :1.5 Mean :3.903 Mean :2.464
## 3rd Qu.:2.0 3rd Qu.:6.000 3rd Qu.:3.000
## Max. :2.0 Max. :7.000 Max. :4.000
##
The total vehicles are categorized and ranked as follows, with mean values for each category: Low (65), Normal (83), High(142) and Heavy (199).
library(ggplot2)
mean_values <- traffic_df %>%
group_by(`Traffic.Situation`) %>%
summarize(mean_total = mean(Total))
ggplot(traffic_df, aes(x = `Traffic.Situation`, y = Total, fill = `Traffic.Situation`)) +
geom_boxplot() +
geom_text(data = mean_values, aes(x = `Traffic.Situation`, y = mean_total, label = round(mean_total, 0)),
vjust = -0.5, color = "black", size = 3, position = position_dodge(width = 0.75)) +
labs(title = "Relationship between Total and Traffic Situation",
x = "Traffic Situation",
y = "Total") +
scale_fill_manual(values = c("low" = "blue", "normal" = "green", "high" = "red","heavy"="grey")) +
theme_minimal()
The overall number of vehicles remains relatively constant throughout the month.
ggplot(traffic_df, aes(x = factor(Date), y = Total)) +
geom_bar(stat = "identity") +
labs(title = "Number of Vehicles by Date",
x = "Date",
y = "Total") +
theme_minimal()
Cars constitute a larger portion of the total vehicles, indicating their prevalence compared to other vehicle types. BikeCount is high on every few days.
library(tidyr)
# Calculate proportions
EDA_VehiclebyDate <- traffic_df %>%
group_by(Date) %>%
summarize(
percent_car = sum(CarCount) / sum(Total) * 100,
percent_bike = sum(BikeCount) / sum(Total) * 100,
percent_bus = sum(BusCount) / sum(Total) * 100,
percent_truck = sum(TruckCount) / sum(Total) * 100
)
# Reshape data for plotting using gather
EDA_VehiclebyDate_Gather <- EDA_VehiclebyDate %>%
gather(key = "Vehicle_Type", value = "Percentage", -Date)
# Plotting
ggplot(EDA_VehiclebyDate_Gather, aes(x = factor(Date), y = Percentage, fill = Vehicle_Type)) +
geom_bar(stat = "identity") +
labs(title = "Proportion of Vehicle Types Over Date",
x = "Date",
y = "Percentage") +
scale_fill_manual(values = c("percent_car" = "blue", "percent_bike" = "green", "percent_bus" = "red", "percent_truck" = "purple")) +
theme_minimal()
On Fridays, BikeCount noticeably surges, accompanied by a diminished proportion of larger vehicles such as BusCount and TruckCount.
# Calculate proportions
EDA_VehiclebyDay <- traffic_df %>%
group_by(Day.of.the.week) %>%
summarize(
percent_car = sum(CarCount) / sum(Total) * 100,
percent_bike = sum(BikeCount) / sum(Total) * 100,
percent_bus = sum(BusCount) / sum(Total) * 100,
percent_truck = sum(TruckCount) / sum(Total) * 100
)
# Reshape data for plotting using gather
EDA_VehiclebyDay_Gather <- EDA_VehiclebyDay %>%
gather(key = "Vehicle_Type", value = "Percentage", -Day.of.the.week)
# Plotting
ggplot(EDA_VehiclebyDay_Gather, aes(x = factor(Day.of.the.week), y = Percentage, fill = Vehicle_Type)) +
geom_bar(stat = "identity") +
labs(title = "Proportion of Vehicle Types Over Day of Week",
x = "Day of the Week",
y = "Percentage") +
scale_fill_manual(values = c("percent_car" = "blue", "percent_bike" = "green", "percent_bus" = "red", "percent_truck" = "purple")) +
theme_minimal()
The traffic situation pattern is generally similar on both weekdays and weekends. However, there is a notable difference in the total number of vehicles, with more vehicles observed on weekdays compared to weekends.
ggplot(traffic_df, aes(x = Day.of.the.week, fill = `Traffic.Situation`)) +
geom_bar(position = "dodge") +
labs(x = "Day of the week") +
ggtitle("Histogram of Traffic Situation by Day of the week") +
theme_minimal()
On an hourly basis, the traffic situation is generally normal during most hours (0000 to 2300). The period from 0600 to 1800 experiences heavier traffic, likely due to working hours.The number of vehicles peaked around 5-8am and 4-6pm
ggplot(traffic_df, aes(x = factor(hour), y = Total,fill = `Traffic.Situation`)) +
geom_bar(stat = "identity") +
labs(title = "Number of Vehicles by Hour",
x = "Time",
y = "Total") +
theme_minimal()
The correlation coefficients indicate strong, positive relationships between traffic situation and CarCount (0.69), BikeCount (0.56), BusCount (0.69), Total (0.78).
selected_cols <- traffic_df[ c('DayofWeek_Numeric', 'Date','CarCount', 'BikeCount', 'BusCount','TruckCount', 'Total', 'hour', 'minute', 'am_pm','Traffic_Situation_Numeric')]
# Computing the correlation matrix
correlation_matrix <- cor(selected_cols,method = "spearman")
correlation_matrix
## DayofWeek_Numeric Date CarCount
## DayofWeek_Numeric 1.000000000 -0.024441186 -0.004679699
## Date -0.024441186 1.000000000 -0.011382575
## CarCount -0.004679699 -0.011382575 1.000000000
## BikeCount 0.035196332 -0.005270197 0.749585264
## BusCount -0.044073940 -0.015112213 0.754586842
## TruckCount -0.029521139 0.022323732 -0.645307877
## Total -0.011219205 -0.008937869 0.960325256
## hour 0.000000000 0.000000000 0.161930232
## minute 0.000000000 0.000000000 -0.002307269
## am_pm 0.000000000 0.000000000 0.114641356
## Traffic_Situation_Numeric -0.025435751 -0.004718380 0.691167541
## BikeCount BusCount TruckCount Total
## DayofWeek_Numeric 0.035196332 -0.0440739400 -0.029521139 -0.011219205
## Date -0.005270197 -0.0151122132 0.022323732 -0.008937869
## CarCount 0.749585264 0.7545868423 -0.645307877 0.960325256
## BikeCount 1.000000000 0.7269200607 -0.608864711 0.818856416
## BusCount 0.726920061 1.0000000000 -0.568699752 0.837282379
## TruckCount -0.608864711 -0.5686997518 1.000000000 -0.552649980
## Total 0.818856416 0.8372823788 -0.552649980 1.000000000
## hour 0.198004911 0.2226425469 -0.004864151 0.193874934
## minute -0.001020467 0.0009379625 0.004154811 -0.006117425
## am_pm 0.191860372 0.1757892994 0.017641702 0.149989606
## Traffic_Situation_Numeric 0.560930908 0.6887718293 -0.296704073 0.776969464
## hour minute am_pm
## DayofWeek_Numeric 0.000000000 0.0000000000 0.00000000
## Date 0.000000000 0.0000000000 0.00000000
## CarCount 0.161930232 -0.0023072694 0.11464136
## BikeCount 0.198004911 -0.0010204673 0.19186037
## BusCount 0.222642547 0.0009379625 0.17578930
## TruckCount -0.004864151 0.0041548113 0.01764170
## Total 0.193874934 -0.0061174248 0.14998961
## hour 1.000000000 0.0000000000 0.86677814
## minute 0.000000000 1.0000000000 0.00000000
## am_pm 0.866778142 0.0000000000 1.00000000
## Traffic_Situation_Numeric 0.113419565 -0.0211591243 0.07481169
## Traffic_Situation_Numeric
## DayofWeek_Numeric -0.02543575
## Date -0.00471838
## CarCount 0.69116754
## BikeCount 0.56093091
## BusCount 0.68877183
## TruckCount -0.29670407
## Total 0.77696946
## hour 0.11341957
## minute -0.02115912
## am_pm 0.07481169
## Traffic_Situation_Numeric 1.00000000
library(reshape2) # For data manipulation
# Convert correlation matrix to a long format for ggplot
cor_df <- melt(correlation_matrix)
# Plot heatmap using ggplot with rotated x-axis labels
ggplot(cor_df, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "#D73027", mid = "#FFFFBF", high = "#4575B4", midpoint = 0,
limits = c(-1, 1), name = "Correlation") +
geom_text(aes(label = round(value, 4)), color = "black", size = 2.45) +
labs(title = "Correlation Heatmap", x = "", y = "") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5))
library(caret)
library(randomForest)
set.seed(50)
# Create train-test split using createDataPartition
trainIndex <- createDataPartition(traffic_df$Traffic.Situation, p = 0.7, list = FALSE)
train_data <- traffic_df[trainIndex, ]
test_data <- traffic_df[-trainIndex, ]
table(traffic_df$Traffic.Situation)
##
## low normal high heavy
## 304 1669 321 682
train_data_down <- downSample(x = train_data[,-ncol(train_data)],
y = train_data$Traffic.Situation)
table(train_data_down$Traffic.Situation)
##
## low normal high heavy
## 213 213 213 213
#Build the classfication model using randomForest
train_data_down <- subset(train_data_down, select = -c(Class))
# Check the column names in train_data
names(train_data)
## [1] "Time" "Date"
## [3] "Day.of.the.week" "CarCount"
## [5] "BikeCount" "BusCount"
## [7] "TruckCount" "Total"
## [9] "Traffic.Situation" "Time_hms"
## [11] "hour" "minute"
## [13] "am_pm" "DayofWeek_Numeric"
## [15] "Traffic_Situation_Numeric"
# Check the column names in train_data_down
names(train_data_down)
## [1] "Time" "Date" "Day.of.the.week"
## [4] "CarCount" "BikeCount" "BusCount"
## [7] "TruckCount" "Total" "Traffic.Situation"
## [10] "Time_hms" "hour" "minute"
## [13] "am_pm" "DayofWeek_Numeric"
#For train data
model.rf = randomForest(Traffic.Situation ~ ., data = train_data,
keep.forest = TRUE, ntree = 150)
#For train data down
model_down.rf = randomForest(Traffic.Situation ~ ., data = train_data_down,
keep.forest = TRUE, ntree = 150)
print(model.rf)
##
## Call:
## randomForest(formula = Traffic.Situation ~ ., data = train_data, keep.forest = TRUE, ntree = 150)
## Type of random forest: classification
## Number of trees: 150
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 0%
## Confusion matrix:
## low normal high heavy class.error
## low 213 0 0 0 0
## normal 0 1169 0 0 0
## high 0 0 225 0 0
## heavy 0 0 0 478 0
print(model_down.rf)
##
## Call:
## randomForest(formula = Traffic.Situation ~ ., data = train_data_down, keep.forest = TRUE, ntree = 150)
## Type of random forest: classification
## Number of trees: 150
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 1.06%
## Confusion matrix:
## low normal high heavy class.error
## low 213 0 0 0 0.000000000
## normal 1 205 5 2 0.037558685
## high 0 0 213 0 0.000000000
## heavy 0 0 1 212 0.004694836
plot(model.rf, main = "Error rate of random forest")
# Add legend with class labels
legend("topright", legend = levels(train_data$Traffic.Situation), col = 1:4, lty = 1)
plot(model_down.rf, main = "Error rate of random forest")
# Add legend with class labels
legend("topright", legend = levels(train_data$Traffic.Situation), col = 1:4, lty = 1)
varImpPlot(model.rf, pch = 20, main = "Importance of Variables")
varImpPlot(model_down.rf, pch = 20, main = "Importance of Variables")
#For train data
pred_Test_rd = predict(model.rf, test_data[,setdiff(names(test_data),"Traffic.Situation")],
type="response",
norm.votes=TRUE
)
#For train data down
pred_Test_rd_down = predict(model_down.rf, test_data[,setdiff(names(test_data),"is_canceled")],
type="response",
norm.votes=TRUE
)
tab_Test = table(actual=test_data$Traffic.Situation, predicted=pred_Test_rd);
confusionMatrix(tab_Test, mode = "everything")
## Confusion Matrix and Statistics
##
## predicted
## actual low normal high heavy
## low 91 0 0 0
## normal 0 500 0 0
## high 0 0 96 0
## heavy 0 0 0 204
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.9959, 1)
## No Information Rate : 0.5612
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: low Class: normal Class: high Class: heavy
## Sensitivity 1.0000 1.0000 1.0000 1.000
## Specificity 1.0000 1.0000 1.0000 1.000
## Pos Pred Value 1.0000 1.0000 1.0000 1.000
## Neg Pred Value 1.0000 1.0000 1.0000 1.000
## Precision 1.0000 1.0000 1.0000 1.000
## Recall 1.0000 1.0000 1.0000 1.000
## F1 1.0000 1.0000 1.0000 1.000
## Prevalence 0.1021 0.5612 0.1077 0.229
## Detection Rate 0.1021 0.5612 0.1077 0.229
## Detection Prevalence 0.1021 0.5612 0.1077 0.229
## Balanced Accuracy 1.0000 1.0000 1.0000 1.000
tab_Test_down = table(actual=test_data$Traffic.Situation, predicted=pred_Test_rd_down);
confusionMatrix(tab_Test_down,mode="everything")
## Confusion Matrix and Statistics
##
## predicted
## actual low normal high heavy
## low 91 0 0 0
## normal 2 486 8 4
## high 0 0 96 0
## heavy 0 0 1 203
##
## Overall Statistics
##
## Accuracy : 0.9832
## 95% CI : (0.9724, 0.9905)
## No Information Rate : 0.5455
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9727
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: low Class: normal Class: high Class: heavy
## Sensitivity 0.9785 1.0000 0.9143 0.9807
## Specificity 1.0000 0.9654 1.0000 0.9985
## Pos Pred Value 1.0000 0.9720 1.0000 0.9951
## Neg Pred Value 0.9975 1.0000 0.9887 0.9942
## Precision 1.0000 0.9720 1.0000 0.9951
## Recall 0.9785 1.0000 0.9143 0.9807
## F1 0.9891 0.9858 0.9552 0.9878
## Prevalence 0.1044 0.5455 0.1178 0.2323
## Detection Rate 0.1021 0.5455 0.1077 0.2278
## Detection Prevalence 0.1021 0.5612 0.1077 0.2290
## Balanced Accuracy 0.9892 0.9827 0.9571 0.9896
Random Forest Model with Train Data only has higher precision which means it can predict the traffic situation more accurately most of the time while Random Forest Model with Train Data Down has slightly lower precision.
Random Forest Model with Train Data Down has lower recall compare to Random Forest Model with Train Data, which means it has lower percentage of traffic situation that were classified correctly/accurately.
Random Forest Model with Train Data only has higher accuracy and F1 score compared to Random Forest with Train Data Down. Overall,
Random Forest Model with Train Data has better performance compared to Random Forest Model with Train Data Down. In conclusion, Downsampling has not really improved the performance of the model.
trainIndex <- createDataPartition(traffic_df$Total, p = 0.7, list = FALSE)
train_data <- traffic_df[trainIndex, ]
test_data <- traffic_df[-trainIndex, ]
head(train_data)
## Time Date Day.of.the.week CarCount BikeCount BusCount
## 2 2024-01-13 00:15:00 10 Tuesday 49 0 3
## 3 2024-01-13 00:30:00 10 Tuesday 46 0 3
## 4 2024-01-13 00:45:00 10 Tuesday 51 0 2
## 5 2024-01-13 01:00:00 10 Tuesday 57 6 15
## 7 2024-01-13 01:30:00 10 Tuesday 37 0 1
## 8 2024-01-13 01:45:00 10 Tuesday 42 4 4
## TruckCount Total Traffic.Situation Time_hms hour minute am_pm
## 2 3 55 low 12H 15M 0S 0 15 1
## 3 6 55 low 12H 30M 0S 0 30 1
## 4 5 58 low 12H 45M 0S 0 45 1
## 5 16 94 normal 1H 0M 0S 1 0 1
## 7 4 42 low 1H 30M 0S 1 30 1
## 8 5 55 low 1H 45M 0S 1 45 1
## DayofWeek_Numeric Traffic_Situation_Numeric
## 2 2 1
## 3 2 1
## 4 2 1
## 5 2 2
## 7 2 1
## 8 2 1
modelA <- lm(Total ~Date * DayofWeek_Numeric * hour * minute *
am_pm * Traffic_Situation_Numeric,data = selected_cols)
summary(modelA)
##
## Call:
## lm(formula = Total ~ Date * DayofWeek_Numeric * hour * minute *
## am_pm * Traffic_Situation_Numeric, data = selected_cols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -71.414 -22.457 -4.413 19.729 125.775
##
## Coefficients:
## Estimate
## (Intercept) -1.523e+02
## Date 2.450e+00
## DayofWeek_Numeric 1.773e+01
## hour 1.067e+01
## minute 2.490e+00
## am_pm 1.385e+02
## Traffic_Situation_Numeric 6.141e+01
## Date:DayofWeek_Numeric -3.221e-01
## Date:hour -1.860e-01
## DayofWeek_Numeric:hour -1.337e+00
## Date:minute -1.604e-01
## DayofWeek_Numeric:minute -7.059e-01
## hour:minute -2.627e-01
## Date:am_pm -2.955e+00
## DayofWeek_Numeric:am_pm -1.743e+01
## hour:am_pm -9.316e+00
## minute:am_pm -2.133e+00
## Date:Traffic_Situation_Numeric -1.249e+00
## DayofWeek_Numeric:Traffic_Situation_Numeric -1.157e+01
## hour:Traffic_Situation_Numeric 4.214e-01
## minute:Traffic_Situation_Numeric -5.297e-01
## am_pm:Traffic_Situation_Numeric -2.550e+01
## Date:DayofWeek_Numeric:hour 1.922e-02
## Date:DayofWeek_Numeric:minute 3.782e-02
## Date:hour:minute 1.345e-02
## DayofWeek_Numeric:hour:minute 5.314e-02
## Date:DayofWeek_Numeric:am_pm 4.929e-01
## Date:hour:am_pm 2.250e-01
## DayofWeek_Numeric:hour:am_pm 1.260e+00
## Date:minute:am_pm 1.762e-01
## DayofWeek_Numeric:minute:am_pm 7.420e-01
## hour:minute:am_pm 2.089e-01
## Date:DayofWeek_Numeric:Traffic_Situation_Numeric 1.766e-01
## Date:hour:Traffic_Situation_Numeric 1.083e-01
## DayofWeek_Numeric:hour:Traffic_Situation_Numeric 9.232e-01
## Date:minute:Traffic_Situation_Numeric 4.936e-02
## DayofWeek_Numeric:minute:Traffic_Situation_Numeric 1.954e-01
## hour:minute:Traffic_Situation_Numeric 5.866e-02
## Date:am_pm:Traffic_Situation_Numeric 1.422e+00
## DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 1.104e+01
## hour:am_pm:Traffic_Situation_Numeric 1.052e+00
## minute:am_pm:Traffic_Situation_Numeric 5.611e-01
## Date:DayofWeek_Numeric:hour:minute -2.552e-03
## Date:DayofWeek_Numeric:hour:am_pm -3.401e-02
## Date:DayofWeek_Numeric:minute:am_pm -4.257e-02
## Date:hour:minute:am_pm -1.351e-02
## DayofWeek_Numeric:hour:minute:am_pm -5.291e-02
## Date:DayofWeek_Numeric:hour:Traffic_Situation_Numeric -8.847e-03
## Date:DayofWeek_Numeric:minute:Traffic_Situation_Numeric -1.242e-02
## Date:hour:minute:Traffic_Situation_Numeric -3.546e-03
## DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric -1.014e-02
## Date:DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric -2.642e-01
## Date:hour:am_pm:Traffic_Situation_Numeric -1.134e-01
## DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric -7.996e-01
## Date:minute:am_pm:Traffic_Situation_Numeric -6.162e-02
## DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric -2.442e-01
## hour:minute:am_pm:Traffic_Situation_Numeric -5.762e-02
## Date:DayofWeek_Numeric:hour:minute:am_pm 2.832e-03
## Date:DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 6.047e-04
## Date:DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 1.657e-02
## Date:DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 1.600e-02
## Date:hour:minute:am_pm:Traffic_Situation_Numeric 4.411e-03
## DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 1.526e-02
## Date:DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric -9.487e-04
## Std. Error
## (Intercept) 1.769e+02
## Date 8.621e+00
## DayofWeek_Numeric 3.881e+01
## hour 1.802e+01
## minute 6.189e+00
## am_pm 1.514e+02
## Traffic_Situation_Numeric 8.327e+01
## Date:DayofWeek_Numeric 1.917e+00
## Date:hour 8.680e-01
## DayofWeek_Numeric:hour 3.970e+00
## Date:minute 3.135e-01
## DayofWeek_Numeric:minute 1.369e+00
## hour:minute 6.416e-01
## Date:am_pm 7.399e+00
## DayofWeek_Numeric:am_pm 3.304e+01
## hour:am_pm 1.138e+01
## minute:am_pm 5.288e+00
## Date:Traffic_Situation_Numeric 4.105e+00
## DayofWeek_Numeric:Traffic_Situation_Numeric 1.808e+01
## hour:Traffic_Situation_Numeric 8.308e+00
## minute:Traffic_Situation_Numeric 2.896e+00
## am_pm:Traffic_Situation_Numeric 6.961e+01
## Date:DayofWeek_Numeric:hour 1.925e-01
## Date:DayofWeek_Numeric:minute 6.916e-02
## Date:hour:minute 3.242e-02
## DayofWeek_Numeric:hour:minute 1.445e-01
## Date:DayofWeek_Numeric:am_pm 1.638e+00
## Date:hour:am_pm 5.485e-01
## DayofWeek_Numeric:hour:am_pm 2.497e+00
## Date:minute:am_pm 2.672e-01
## DayofWeek_Numeric:minute:am_pm 1.148e+00
## hour:minute:am_pm 4.044e-01
## Date:DayofWeek_Numeric:Traffic_Situation_Numeric 9.000e-01
## Date:hour:Traffic_Situation_Numeric 4.068e-01
## DayofWeek_Numeric:hour:Traffic_Situation_Numeric 1.825e+00
## Date:minute:Traffic_Situation_Numeric 1.474e-01
## DayofWeek_Numeric:minute:Traffic_Situation_Numeric 6.366e-01
## hour:minute:Traffic_Situation_Numeric 3.008e-01
## Date:am_pm:Traffic_Situation_Numeric 3.432e+00
## DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 1.490e+01
## hour:am_pm:Traffic_Situation_Numeric 5.245e+00
## minute:am_pm:Traffic_Situation_Numeric 2.395e+00
## Date:DayofWeek_Numeric:hour:minute 7.225e-03
## Date:DayofWeek_Numeric:hour:am_pm 1.216e-01
## Date:DayofWeek_Numeric:minute:am_pm 5.810e-02
## Date:hour:minute:am_pm 2.033e-02
## DayofWeek_Numeric:hour:minute:am_pm 8.957e-02
## Date:DayofWeek_Numeric:hour:Traffic_Situation_Numeric 8.999e-02
## Date:DayofWeek_Numeric:minute:Traffic_Situation_Numeric 3.223e-02
## Date:hour:minute:Traffic_Situation_Numeric 1.538e-02
## DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 6.744e-02
## Date:DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 7.432e-01
## Date:hour:am_pm:Traffic_Situation_Numeric 2.564e-01
## DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 1.140e+00
## Date:minute:am_pm:Traffic_Situation_Numeric 1.211e-01
## DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 5.123e-01
## hour:minute:am_pm:Traffic_Situation_Numeric 1.872e-01
## Date:DayofWeek_Numeric:hour:minute:am_pm 4.480e-03
## Date:DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 3.404e-03
## Date:DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 5.626e-02
## Date:DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 2.592e-02
## Date:hour:minute:am_pm:Traffic_Situation_Numeric 9.480e-03
## DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 4.113e-02
## Date:DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 2.069e-03
## t value
## (Intercept) -0.861
## Date 0.284
## DayofWeek_Numeric 0.457
## hour 0.592
## minute 0.402
## am_pm 0.915
## Traffic_Situation_Numeric 0.737
## Date:DayofWeek_Numeric -0.168
## Date:hour -0.214
## DayofWeek_Numeric:hour -0.337
## Date:minute -0.512
## DayofWeek_Numeric:minute -0.516
## hour:minute -0.409
## Date:am_pm -0.399
## DayofWeek_Numeric:am_pm -0.527
## hour:am_pm -0.818
## minute:am_pm -0.403
## Date:Traffic_Situation_Numeric -0.304
## DayofWeek_Numeric:Traffic_Situation_Numeric -0.640
## hour:Traffic_Situation_Numeric 0.051
## minute:Traffic_Situation_Numeric -0.183
## am_pm:Traffic_Situation_Numeric -0.366
## Date:DayofWeek_Numeric:hour 0.100
## Date:DayofWeek_Numeric:minute 0.547
## Date:hour:minute 0.415
## DayofWeek_Numeric:hour:minute 0.368
## Date:DayofWeek_Numeric:am_pm 0.301
## Date:hour:am_pm 0.410
## DayofWeek_Numeric:hour:am_pm 0.505
## Date:minute:am_pm 0.659
## DayofWeek_Numeric:minute:am_pm 0.647
## hour:minute:am_pm 0.516
## Date:DayofWeek_Numeric:Traffic_Situation_Numeric 0.196
## Date:hour:Traffic_Situation_Numeric 0.266
## DayofWeek_Numeric:hour:Traffic_Situation_Numeric 0.506
## Date:minute:Traffic_Situation_Numeric 0.335
## DayofWeek_Numeric:minute:Traffic_Situation_Numeric 0.307
## hour:minute:Traffic_Situation_Numeric 0.195
## Date:am_pm:Traffic_Situation_Numeric 0.414
## DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 0.741
## hour:am_pm:Traffic_Situation_Numeric 0.201
## minute:am_pm:Traffic_Situation_Numeric 0.234
## Date:DayofWeek_Numeric:hour:minute -0.353
## Date:DayofWeek_Numeric:hour:am_pm -0.280
## Date:DayofWeek_Numeric:minute:am_pm -0.733
## Date:hour:minute:am_pm -0.664
## DayofWeek_Numeric:hour:minute:am_pm -0.591
## Date:DayofWeek_Numeric:hour:Traffic_Situation_Numeric -0.098
## Date:DayofWeek_Numeric:minute:Traffic_Situation_Numeric -0.385
## Date:hour:minute:Traffic_Situation_Numeric -0.231
## DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric -0.150
## Date:DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric -0.356
## Date:hour:am_pm:Traffic_Situation_Numeric -0.442
## DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric -0.701
## Date:minute:am_pm:Traffic_Situation_Numeric -0.509
## DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric -0.477
## hour:minute:am_pm:Traffic_Situation_Numeric -0.308
## Date:DayofWeek_Numeric:hour:minute:am_pm 0.632
## Date:DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 0.178
## Date:DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 0.295
## Date:DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 0.617
## Date:hour:minute:am_pm:Traffic_Situation_Numeric 0.465
## DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 0.371
## Date:DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric -0.459
## Pr(>|t|)
## (Intercept) 0.389
## Date 0.776
## DayofWeek_Numeric 0.648
## hour 0.554
## minute 0.687
## am_pm 0.360
## Traffic_Situation_Numeric 0.461
## Date:DayofWeek_Numeric 0.867
## Date:hour 0.830
## DayofWeek_Numeric:hour 0.736
## Date:minute 0.609
## DayofWeek_Numeric:minute 0.606
## hour:minute 0.682
## Date:am_pm 0.690
## DayofWeek_Numeric:am_pm 0.598
## hour:am_pm 0.413
## minute:am_pm 0.687
## Date:Traffic_Situation_Numeric 0.761
## DayofWeek_Numeric:Traffic_Situation_Numeric 0.522
## hour:Traffic_Situation_Numeric 0.960
## minute:Traffic_Situation_Numeric 0.855
## am_pm:Traffic_Situation_Numeric 0.714
## Date:DayofWeek_Numeric:hour 0.920
## Date:DayofWeek_Numeric:minute 0.585
## Date:hour:minute 0.678
## DayofWeek_Numeric:hour:minute 0.713
## Date:DayofWeek_Numeric:am_pm 0.764
## Date:hour:am_pm 0.682
## DayofWeek_Numeric:hour:am_pm 0.614
## Date:minute:am_pm 0.510
## DayofWeek_Numeric:minute:am_pm 0.518
## hour:minute:am_pm 0.606
## Date:DayofWeek_Numeric:Traffic_Situation_Numeric 0.844
## Date:hour:Traffic_Situation_Numeric 0.790
## DayofWeek_Numeric:hour:Traffic_Situation_Numeric 0.613
## Date:minute:Traffic_Situation_Numeric 0.738
## DayofWeek_Numeric:minute:Traffic_Situation_Numeric 0.759
## hour:minute:Traffic_Situation_Numeric 0.845
## Date:am_pm:Traffic_Situation_Numeric 0.679
## DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 0.459
## hour:am_pm:Traffic_Situation_Numeric 0.841
## minute:am_pm:Traffic_Situation_Numeric 0.815
## Date:DayofWeek_Numeric:hour:minute 0.724
## Date:DayofWeek_Numeric:hour:am_pm 0.780
## Date:DayofWeek_Numeric:minute:am_pm 0.464
## Date:hour:minute:am_pm 0.506
## DayofWeek_Numeric:hour:minute:am_pm 0.555
## Date:DayofWeek_Numeric:hour:Traffic_Situation_Numeric 0.922
## Date:DayofWeek_Numeric:minute:Traffic_Situation_Numeric 0.700
## Date:hour:minute:Traffic_Situation_Numeric 0.818
## DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 0.880
## Date:DayofWeek_Numeric:am_pm:Traffic_Situation_Numeric 0.722
## Date:hour:am_pm:Traffic_Situation_Numeric 0.658
## DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 0.483
## Date:minute:am_pm:Traffic_Situation_Numeric 0.611
## DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 0.634
## hour:minute:am_pm:Traffic_Situation_Numeric 0.758
## Date:DayofWeek_Numeric:hour:minute:am_pm 0.527
## Date:DayofWeek_Numeric:hour:minute:Traffic_Situation_Numeric 0.859
## Date:DayofWeek_Numeric:hour:am_pm:Traffic_Situation_Numeric 0.768
## Date:DayofWeek_Numeric:minute:am_pm:Traffic_Situation_Numeric 0.537
## Date:hour:minute:am_pm:Traffic_Situation_Numeric 0.642
## DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 0.711
## Date:DayofWeek_Numeric:hour:minute:am_pm:Traffic_Situation_Numeric 0.647
##
## Residual standard error: 30.81 on 2912 degrees of freedom
## Multiple R-squared: 0.7436, Adjusted R-squared: 0.738
## F-statistic: 134 on 63 and 2912 DF, p-value: < 2.2e-16
modelB <- lm(Total ~Date * DayofWeek_Numeric * hour * minute *
am_pm + Traffic_Situation_Numeric,data = selected_cols)
summary(modelB)
##
## Call:
## lm(formula = Total ~ Date * DayofWeek_Numeric * hour * minute *
## am_pm + Traffic_Situation_Numeric, data = selected_cols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.330 -23.388 -3.542 20.039 125.363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.232e+02 4.206e+01 -2.929 0.00342
## Date 1.762e-01 2.147e+00 0.082 0.93462
## DayofWeek_Numeric -6.703e+00 9.780e+00 -0.685 0.49316
## hour 1.185e+01 4.264e+00 2.779 0.00549
## minute 1.841e+00 1.499e+00 1.228 0.21937
## am_pm 8.938e+01 3.618e+01 2.471 0.01354
## Traffic_Situation_Numeric 4.590e+01 6.277e-01 73.129 < 2e-16
## Date:DayofWeek_Numeric 4.399e-02 4.961e-01 0.089 0.92934
## Date:hour 4.904e-02 2.176e-01 0.225 0.82169
## DayofWeek_Numeric:hour 7.227e-01 9.910e-01 0.729 0.46587
## Date:minute -6.862e-02 7.653e-02 -0.897 0.36994
## DayofWeek_Numeric:minute -2.919e-01 3.485e-01 -0.838 0.40234
## hour:minute -1.734e-01 1.518e-01 -1.142 0.25356
## Date:am_pm -3.060e-01 1.847e+00 -0.166 0.86846
## DayofWeek_Numeric:am_pm 5.962e+00 8.414e+00 0.709 0.47869
## hour:am_pm -7.333e+00 2.696e+00 -2.720 0.00657
## minute:am_pm -1.354e+00 1.289e+00 -1.050 0.29386
## Date:DayofWeek_Numeric:hour -6.672e-03 5.027e-02 -0.133 0.89442
## Date:DayofWeek_Numeric:minute 1.068e-02 1.768e-02 0.604 0.54561
## Date:hour:minute 5.942e-03 7.754e-03 0.766 0.44361
## DayofWeek_Numeric:hour:minute 3.115e-02 3.531e-02 0.882 0.37784
## Date:DayofWeek_Numeric:am_pm -5.214e-02 4.268e-01 -0.122 0.90276
## Date:hour:am_pm -1.178e-02 1.376e-01 -0.086 0.93180
## DayofWeek_Numeric:hour:am_pm -4.946e-01 6.268e-01 -0.789 0.43009
## Date:minute:am_pm 5.735e-02 6.584e-02 0.871 0.38378
## DayofWeek_Numeric:minute:am_pm 2.254e-01 2.998e-01 0.752 0.45219
## hour:minute:am_pm 1.096e-01 9.604e-02 1.142 0.25370
## Date:DayofWeek_Numeric:hour:minute -1.057e-03 1.791e-03 -0.590 0.55532
## Date:DayofWeek_Numeric:hour:am_pm 4.553e-03 3.179e-02 0.143 0.88613
## Date:DayofWeek_Numeric:minute:am_pm -8.063e-03 1.521e-02 -0.530 0.59603
## Date:hour:minute:am_pm -4.299e-03 4.905e-03 -0.876 0.38084
## DayofWeek_Numeric:hour:minute:am_pm -1.992e-02 2.233e-02 -0.892 0.37261
## Date:DayofWeek_Numeric:hour:minute:am_pm 6.841e-04 1.133e-03 0.604 0.54599
##
## (Intercept) **
## Date
## DayofWeek_Numeric
## hour **
## minute
## am_pm *
## Traffic_Situation_Numeric ***
## Date:DayofWeek_Numeric
## Date:hour
## DayofWeek_Numeric:hour
## Date:minute
## DayofWeek_Numeric:minute
## hour:minute
## Date:am_pm
## DayofWeek_Numeric:am_pm
## hour:am_pm **
## minute:am_pm
## Date:DayofWeek_Numeric:hour
## Date:DayofWeek_Numeric:minute
## Date:hour:minute
## DayofWeek_Numeric:hour:minute
## Date:DayofWeek_Numeric:am_pm
## Date:hour:am_pm
## DayofWeek_Numeric:hour:am_pm
## Date:minute:am_pm
## DayofWeek_Numeric:minute:am_pm
## hour:minute:am_pm
## Date:DayofWeek_Numeric:hour:minute
## Date:DayofWeek_Numeric:hour:am_pm
## Date:DayofWeek_Numeric:minute:am_pm
## Date:hour:minute:am_pm
## DayofWeek_Numeric:hour:minute:am_pm
## Date:DayofWeek_Numeric:hour:minute:am_pm
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.93 on 2943 degrees of freedom
## Multiple R-squared: 0.7389, Adjusted R-squared: 0.736
## F-statistic: 260.2 on 32 and 2943 DF, p-value: < 2.2e-16
modelC <- lm(Total ~DayofWeek_Numeric * hour * minute * am_pm
+ Date + Traffic_Situation_Numeric,data = selected_cols)
summary(modelC)
##
## Call:
## lm(formula = Total ~ DayofWeek_Numeric * hour * minute * am_pm +
## Date + Traffic_Situation_Numeric, data = selected_cols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.784 -23.576 -4.029 20.053 125.602
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.200e+02 1.926e+01 -6.229 5.36e-10 ***
## DayofWeek_Numeric -5.964e+00 4.412e+00 -1.352 0.177
## hour 1.269e+01 1.957e+00 6.487 1.02e-10 ***
## minute 6.655e-01 6.852e-01 0.971 0.332
## am_pm 8.471e+01 1.656e+01 5.117 3.31e-07 ***
## Date -3.418e-02 6.328e-02 -0.540 0.589
## Traffic_Situation_Numeric 4.587e+01 6.262e-01 73.262 < 2e-16 ***
## DayofWeek_Numeric:hour 6.037e-01 4.471e-01 1.350 0.177
## DayofWeek_Numeric:minute -1.019e-01 1.572e-01 -0.648 0.517
## hour:minute -7.100e-02 6.945e-02 -1.022 0.307
## DayofWeek_Numeric:am_pm 5.088e+00 3.796e+00 1.340 0.180
## hour:am_pm -7.554e+00 1.236e+00 -6.111 1.12e-09 ***
## minute:am_pm -3.757e-01 5.895e-01 -0.637 0.524
## DayofWeek_Numeric:hour:minute 1.243e-02 1.593e-02 0.780 0.435
## DayofWeek_Numeric:hour:am_pm -4.152e-01 2.828e-01 -1.468 0.142
## DayofWeek_Numeric:minute:am_pm 8.158e-02 1.352e-01 0.603 0.546
## hour:minute:am_pm 3.596e-02 4.392e-02 0.819 0.413
## DayofWeek_Numeric:hour:minute:am_pm -7.762e-03 1.008e-02 -0.770 0.441
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.87 on 2958 degrees of freedom
## Multiple R-squared: 0.7385, Adjusted R-squared: 0.737
## F-statistic: 491.3 on 17 and 2958 DF, p-value: < 2.2e-16
modelD <- lm(Total ~DayofWeek_Numeric * hour * am_pm
+ minute + Date + Traffic_Situation_Numeric,data = selected_cols)
summary(modelD)
##
## Call:
## lm(formula = Total ~ DayofWeek_Numeric * hour * am_pm + minute +
## Date + Traffic_Situation_Numeric, data = selected_cols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.405 -23.725 -4.058 20.084 123.557
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -105.87194 11.57309 -9.148 < 2e-16 ***
## DayofWeek_Numeric -8.25516 2.63826 -3.129 0.001771 **
## hour 11.09512 1.17446 9.447 < 2e-16 ***
## am_pm 76.24396 9.91906 7.687 2.04e-14 ***
## minute 0.03801 0.03376 1.126 0.260331
## Date -0.03417 0.06329 -0.540 0.589286
## Traffic_Situation_Numeric 45.87978 0.62564 73.333 < 2e-16 ***
## DayofWeek_Numeric:hour 0.88331 0.26728 3.305 0.000961 ***
## DayofWeek_Numeric:am_pm 6.92323 2.26978 3.050 0.002307 **
## hour:am_pm -6.74449 0.74164 -9.094 < 2e-16 ***
## DayofWeek_Numeric:hour:am_pm -0.58975 0.16907 -3.488 0.000493 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.88 on 2965 degrees of freedom
## Multiple R-squared: 0.7377, Adjusted R-squared: 0.7368
## F-statistic: 833.9 on 10 and 2965 DF, p-value: < 2.2e-16
modelE <- lm(Total ~DayofWeek_Numeric * hour * am_pm
+ Traffic_Situation_Numeric,data = selected_cols)
summary(modelE)
##
## Call:
## lm(formula = Total ~ DayofWeek_Numeric * hour * am_pm + Traffic_Situation_Numeric,
## data = selected_cols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -71.557 -23.517 -4.038 20.169 123.895
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -105.5631 11.5001 -9.179 < 2e-16 ***
## DayofWeek_Numeric -8.2548 2.6381 -3.129 0.001770 **
## hour 11.0991 1.1744 9.451 < 2e-16 ***
## am_pm 76.2645 9.9183 7.689 2e-14 ***
## Traffic_Situation_Numeric 45.8633 0.6253 73.342 < 2e-16 ***
## DayofWeek_Numeric:hour 0.8834 0.2673 3.306 0.000959 ***
## DayofWeek_Numeric:am_pm 6.9252 2.2696 3.051 0.002299 **
## hour:am_pm -6.7467 0.7416 -9.098 < 2e-16 ***
## DayofWeek_Numeric:hour:am_pm -0.5899 0.1691 -3.489 0.000492 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.88 on 2967 degrees of freedom
## Multiple R-squared: 0.7376, Adjusted R-squared: 0.7369
## F-statistic: 1042 on 8 and 2967 DF, p-value: < 2.2e-16
model_names <- c("Model A", "Model B", "Model C", "Model D", "Model E")
multiple_r_squared <- c(0.7436, 0.7389, 0.7385, 0.7377, 0.7376)
adjusted_r_squared <- c(0.7416, 0.736, 0.737, 0.7368, 0.7369)
f_statistic <- c(274.2, 260.2, 491.3, 833.9, 1042)
residual_standard_error <- c(30.91, 30.93, 30.87, 30.88, 30.88)
model_comparison <- data.frame(Model = model_names,
Multiple_R_squared = multiple_r_squared,
Adjusted_R_squared = adjusted_r_squared,
F_statistic = f_statistic,
Residual_Standard_Error = residual_standard_error)
# Displaying the model comparison dataframe
print(model_comparison)
## Model Multiple_R_squared Adjusted_R_squared F_statistic
## 1 Model A 0.7436 0.7416 274.2
## 2 Model B 0.7389 0.7360 260.2
## 3 Model C 0.7385 0.7370 491.3
## 4 Model D 0.7377 0.7368 833.9
## 5 Model E 0.7376 0.7369 1042.0
## Residual_Standard_Error
## 1 30.91
## 2 30.93
## 3 30.87
## 4 30.88
## 5 30.88
predictions <- predict(modelE, newdata = test_data)
mae <- mean(abs(predictions - test_data$Total))
rsquared <- 1 - sum((predictions - test_data$Total)^2) / sum((mean(test_data$Total) - test_data$Total)^2)
print(paste("Mean Absolute Error (MAE):", mae))
## [1] "Mean Absolute Error (MAE): 25.4947560591296"
print(paste("R-squared:", rsquared))
## [1] "R-squared: 0.727816654081597"
plot(modelE)
Model E exhibits strong performance, with an R-squared value of 0.7376, indicating that it explains a significant portion of the variance in the target variable. Model E is slightly lower in R-squared but has a similar adjusted R-squared, suggesting it may be a more parsimonious model. Model E has the highest F-statistic, indicating a significant improvement over the null model. Models A, D and E have similar and lower residual standard errors, suggesting better precision. In summary, model E is preferred as it balances good fit (R-squared), simplicity (adjusted R-squared) and predictive power (F-statistic). The small decrease in R-squared compared to Model A is justifiable due to the more parsimonious nature of Model E. However, the differences between models are relatively small and further refinement such as feature engineering, exploring additional variables or trying different algorithms may lead to improved results.
• The traffic data suggests a consistent monthly pattern, indicating a steady flow of vehicles throughout the month.
• Weekdays exhibit higher vehicle volumes compared to weekends, suggesting potential variations in commuter behavior or activities.
• The heavy traffic during working hours (0600 to 1800) highlights the impact of regular working schedules on traffic patterns.
• Random forest is successfully developed, which can be used to predict and classify traffic situations into distinct levels (low, medium, high, heavy).
• Based on the confusion matrix for both train data model and downsampling data model, both models show strong predictive capabilities for classifying traffic situations. The train data model achieving perfect accuracy and the downsampling data model maintaining high accuracy while addressing class imbalances. These models can serve as valuable tools for predicting and managing traffic situations, but further validation and testing on diverse datasets are recommended for comprehensive assessment and deployment.
• Besides, linear regression is developed, which can be used to accurately forecast the number of vehicles on the roads at specific times.
• In overall, Model E stands as a robust tool for forecasting traffic, offering valuable insights into temporal and situational factors influencing vehicular flow. It proves highly effective in predicting road traffic and showcasing strong explanatory power with an R-squared of 0.7376. Key features like day, hour, and traffic situation are critical and their interactions significantly contribute to accurate predictions. Compared to other models, Model E demonstrates competitive performance. While effective on the evaluated data, ongoing validation and adaptation to diverse datasets are crucial for real-world applicability.