Loading “dplyr” in order to use the “select” function

library(dplyr)

Importing the clean_traffic_data set from “k_means_clustering_Aarhus_Traffic” and removing duplicate rows

df <- clean_traffic_data
df_unique <- unique(df)
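
As a small illustration of what unique() does to a data frame, here is a sketch on a toy data frame. The values are made up for the example and are not taken from the Aarhus data; only rows that are identical in every column are dropped.

```r
# Toy data frame (hypothetical values) with one exact duplicate row
toy <- data.frame(
  Hour = c(7, 7, 8),
  Minute = c(0, 0, 5),
  vehicleCount = c(12, 12, 9)
)

# unique() on a data frame keeps the first copy of each distinct row
toy_unique <- unique(toy)

nrow(toy)         # number of rows before
nrow(toy_unique)  # number of rows after removing the duplicate
```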

First we remove the date columns, leaving only the time values (Hour, Minute) and the measured variables (avgMeasuredTime, vehicleCount). We then create a sequence from 0 to 55 in increments of 5 to imitate the five-minute time intervals of the original dataset.

whole_set_selected <- select(df_unique, Hour, Minute, avgMeasuredTime, vehicleCount)
Minute <- data.frame(Minute = seq(0,55,5))

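‘Minute_padded’ adds a leading zero to single-digit values to give every value a common width (0 becomes 00, 5 becomes 05). We then merge the unique hour values with the padded minute sequence. Because the two share no column, the merge returns every Hour/Minute combination. Finally we rename the columns.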

Minute_padded <- sprintf("%02d",Minute$Minute)
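
A minimal sketch of the padding step on its own. Note that sprintf() returns character strings, not numbers. Whether the padded values match the Minute column of the dataset in a later merge depends on how that column is stored, which is not shown here, so the comparison at the end is only there to illustrate the difference.

```r
# Same sequence as above: 0, 5, ..., 55
minutes <- seq(0, 55, 5)

# "%02d" pads whole numbers with a leading zero to a width of 2
padded <- sprintf("%02d", minutes)

head(padded, 3)   # first three padded values
class(padded)     # the result is a character vector

# A padded string is not equal to the plain number it came from:
# R compares them as strings, "05" against "5"
padded[2] == minutes[2]
```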

whole_merged_set <- merge(unique(df_unique$Hour), as.data.frame(Minute_padded), all=TRUE)
colnames(whole_merged_set) <- c("Hour","Minute")
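
To see why this merge produces a full grid, here is a sketch with two hours and three minute values (toy values chosen for the example). When the inputs have no column name in common, merge() returns every combination of their rows.

```r
# Toy inputs: two hours and three padded minute values
hours <- c(6, 7)
minutes <- data.frame(Minute_padded = c("00", "05", "10"))

# No shared column names, so merge() builds the Cartesian product
grid <- merge(hours, minutes, all = TRUE)
colnames(grid) <- c("Hour", "Minute")

nrow(grid)  # 2 hours x 3 minute values
grid
```

The same grid could also be built with expand.grid(); merge() is used here only to mirror the step above.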

Merging the Hour/Minute grid with the selected data. all.x = TRUE keeps every row of the grid, so any time slot with no matching observation gets NA in the measurement columns.

merge_whole_set <- merge(whole_merged_set, whole_set_selected, all.x = TRUE)
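
A small sketch of the effect of all.x = TRUE, using toy data frames in place of the real grid and observations (names and values are made up for the example). The 06:05 slot has no observation, so it is kept with NA.

```r
# Toy grid of time slots and toy observations with one slot missing
grid <- data.frame(Hour = c(6, 6, 6), Minute = c("00", "05", "10"))
obs <- data.frame(
  Hour = c(6, 6),
  Minute = c("00", "10"),
  vehicleCount = c(4, 7)
)

# Join on the shared columns (Hour, Minute), keeping all rows of grid
joined <- merge(grid, obs, all.x = TRUE)

joined
sum(is.na(joined$vehicleCount))  # number of slots without an observation
```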

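Selecting the necessary columns, calculating the k-means clusters (k = 5) and producing a 3D plot by hour. Rows containing NAs are dropped once, before both clustering and plotting, so that each cluster label lines up with the point it belongs to.

library(rgl)
output_1 <- select(merge_whole_set, Hour, avgMeasuredTime, vehicleCount)
# kmeans() cannot handle NAs; keep only complete rows and plot the same rows
output_complete <- na.omit(output_1)
output_k <- kmeans(output_complete, 5)
plot3d(output_complete, col = output_k$cluster, size = 3, xlab = "Hour",
       ylab = "Average Time", zlab = "Vehicle Count")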
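
The cluster vector returned by kmeans() has one entry per row that was actually clustered, which matters when NAs are removed first. The sketch below uses a toy data frame (hypothetical values) and base R only. The seed is an addition for repeatability, since kmeans() starts from randomly chosen centres.

```r
set.seed(42)  # assumption: fixed seed so the toy run is repeatable

# Toy data with some missing measurements
toy <- data.frame(
  Hour = c(6, 7, 8, 15, 16, 17),
  avgMeasuredTime = c(60, NA, 65, 120, 118, NA),
  vehicleCount = c(5, 6, NA, 20, 22, 21)
)

# Only complete rows can be clustered
complete <- na.omit(toy)
fit <- kmeans(complete, centers = 2)

# One label per complete row, not per original row
length(fit$cluster) == nrow(complete)
length(fit$cluster) == nrow(toy)

# Attach the labels to the rows they belong to
complete$cluster <- fit$cluster
complete
```

Colouring a plot of the full data frame with the shorter cluster vector would recycle the labels and assign colours to the wrong points, so the labels should be used with the complete rows only.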

Each data instance now sits at a discrete hour value. In the plot, rush hour can be seen at roughly 6-8am and 3-4pm.