Loading “dplyr” in order to use the “select” function
library(dplyr)
Importing the clean_traffic_data set from “k_means_clustering_Aarhus_Traffic” and removing duplicate rows
df <- clean_traffic_data
df_unique <- unique(df)
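unique() on a data frame drops rows that are identical in every column and keeps the first occurrence. A row that differs in even one column is kept. A minimal sketch on a hypothetical three-row frame (the values are invented for illustration):

```r
# Hypothetical toy data: rows 1 and 3 are identical in every column
toy <- data.frame(
  Hour = c(7, 8, 7),
  Minute = c(0, 5, 0),
  vehicleCount = c(12, 30, 12)
)
# Keep only the first of each group of fully identical rows
toy_unique <- unique(toy)
print(nrow(toy_unique))   # 2
```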
First we keep only the time columns and the measured variables (Hour, Minute, avgMeasuredTime, vehicleCount), dropping the date columns. We then create a sequence from 0 to 55 in steps of 5 to mimic the five-minute intervals of the original dataset.
whole_set_selected <- select(df_unique, Hour, Minute, avgMeasuredTime, vehicleCount)
Minute <- data.frame(Minute = seq(0,55,5))
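‘Minute_padded’ adds a leading zero to single-digit values so that every minute has the same width (0 becomes 00, 5 becomes 05). We then merge the unique hour values with the padded minute vector. Because the two share no column names, merge() returns every hour-minute combination. Finally we rename the columns.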
Minute_padded <- sprintf("%02d",Minute$Minute)
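The minute sequence and the zero padding can be checked in isolation. This sketch only repeats the seq() and sprintf() calls used above on their own:

```r
# Five-minute marks within an hour, as in the original dataset
minutes <- seq(0, 55, 5)
length(minutes)   # 12 values: 0, 5, ..., 55
# "%02d" formats a whole number to width 2, padding with a leading zero
padded <- sprintf("%02d", minutes)
print(padded)     # "00" "05" "10" ... "55"
```

Because sprintf() returns character strings, it is worth checking that the Minute column in clean_traffic_data uses the same zero-padded text form; otherwise values such as 0 and 5 may not match "00" and "05" in the later merge.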
whole_merged_set <- merge(unique(df_unique$Hour), as.data.frame(Minute_padded), all=TRUE)
colnames(whole_merged_set) <- c("Hour","Minute")
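The hour-minute grid relies on merge() falling back to a Cartesian product when its inputs share no column names. A sketch with hypothetical hours and minute marks; expand.grid() is shown as an alternative that builds the same combinations more directly:

```r
# Hypothetical inputs: three distinct hours and four padded minute marks
hours <- c(6, 7, 8)
mins <- data.frame(Minute_padded = c("00", "05", "10", "15"))
# No shared column names, so merge() pairs every hour with every minute
grid <- merge(hours, mins, all = TRUE)
colnames(grid) <- c("Hour", "Minute")
print(nrow(grid))   # 12 = 3 hours x 4 minutes
# The same combinations built with expand.grid()
grid2 <- expand.grid(Hour = hours, Minute = c("00", "05", "10", "15"),
                     stringsAsFactors = FALSE)
```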
Merging the hour-minute grid with the selected data. all.x = TRUE keeps every grid row and fills avgMeasuredTime and vehicleCount with NA where a time slot has no matching observation.
merge_whole_set <- merge(whole_merged_set, whole_set_selected, all.x = TRUE)
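A small left join shows what all.x = TRUE does. The grid and observations below are hypothetical; the 07:05 slot has no observation, so it survives the merge with an NA count:

```r
# Hypothetical grid of time slots and sparse observations
grid <- data.frame(Hour = c(7, 7, 7), Minute = c("00", "05", "10"))
obs <- data.frame(Hour = c(7, 7), Minute = c("00", "10"),
                  vehicleCount = c(15, 22))
# all.x = TRUE keeps every grid row; slots with no observation get NA
filled <- merge(grid, obs, all.x = TRUE)
print(filled)
```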
Selecting the necessary columns, calculating five k-means clusters and producing a 3D plot by hour
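library(rgl)
output_1 <- select(merge_whole_set, Hour, avgMeasuredTime, vehicleCount)
# kmeans() cannot handle NA, so drop incomplete rows once and plot that same
# data frame, keeping cluster labels aligned with the points they belong to
output_clean <- na.omit(output_1)
set.seed(42)  # k-means uses random starting centres; fix the seed for reproducibility
output_k <- kmeans(output_clean, centers = 5)
plot3d(output_clean, col = output_k$cluster, size = 3,
       xlab = "Hour", ylab = "Average Time", zlab = "Vehicle Count")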
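kmeans() cannot handle missing values, and na.omit() shortens the data, so the cluster vector holds one label per complete row only. Colours passed to plot3d() should therefore come from the same NA-free frame the clusters were fitted on. A small sketch on invented values (including one incomplete row, like those produced by all.x) shows the length difference:

```r
set.seed(1)
# Hypothetical data with one incomplete row
pts <- data.frame(
  Hour = c(6, 7, 8, 15, 16, 3),
  avgMeasuredTime = c(80, 85, 82, 90, 88, NA),
  vehicleCount = c(30, 35, 33, 40, 38, 2)
)
complete <- na.omit(pts)               # drop rows containing NA
fit <- kmeans(complete, centers = 2)
# One label per complete row, not per original row
length(fit$cluster) == nrow(complete)  # TRUE
length(fit$cluster) == nrow(pts)       # FALSE: colouring pts would misalign
```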
Because Hour is a whole number, the points now fall into discrete columns, one per hour. The plot suggests rush hours of roughly 6-8am and 3-4pm.
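The rush-hour reading comes from the 3D plot. As a hedged numeric cross-check, one could average vehicleCount per hour and rank the hours. The sketch below uses invented values; in the real analysis the input would be merge_whole_set:

```r
# Hypothetical observations, including one NA slot
obs <- data.frame(
  Hour = c(6, 6, 7, 12, 12, 15, 15),
  vehicleCount = c(40, 44, 50, 20, NA, 45, 47)
)
# Mean vehicle count per hour; the formula interface drops NA rows by default
hourly <- aggregate(vehicleCount ~ Hour, data = obs, FUN = mean)
# Hours ordered from busiest to quietest
hourly[order(-hourly$vehicleCount), ]
```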