The markdown document here shows how data preparation was done in R towards the application here.

To derive the new coordinates, a map was needed to be super-imposed on the locations, which was done using Tableau. The detailed steps of the preliminary data processing can be accessed through https://tinyurl.com/y9azq63n

The steps of data preparation in R begin from below, and the pre-processed data set can be accessed from https://tinyurl.com/y7sjec96

Set the working directory to your preferred location.

A set of packages are needed to carry out to create the nodes and edges table. First is tidyr and tidyverse, which gives a comprehensive suite of packages to handle messy data.

The splitstackshape library is used to break down the concatanted string into two other columns, which form the basis of the “From” and “To” columns.

## Warning: package 'tidyr' was built under R version 3.3.3
## Warning: package 'tidyverse' was built under R version 3.3.3
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Warning: package 'ggplot2' was built under R version 3.3.3
## Warning: package 'tibble' was built under R version 3.3.3
## Warning: package 'readr' was built under R version 3.3.3
## Warning: package 'purrr' was built under R version 3.3.3
## Warning: package 'dplyr' was built under R version 3.3.3
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
## Warning: package 'splitstackshape' was built under R version 3.3.3
## Loading required package: data.table
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose

Read in the data using read_csv which gives a neat way to extract information from csv files.

## Parsed with column specification:
## cols(
##   Timestamp = col_character(),
##   `car-id` = col_character(),
##   `car-type` = col_character(),
##   `gate-name` = col_character(),
##   `X Coord` = col_double(),
##   `Y Coord` = col_double(),
##   `Path ID` = col_integer(),
##   `Car#` = col_integer(),
##   `Time Spent` = col_character(),
##   Distance = col_integer(),
##   `Distance in Miles` = col_double(),
##   `Time Spent in Hours` = col_double(),
##   SPEEDS = col_double(),
##   `Corridor Name` = col_character(),
##   DistanceNew = col_double()
## )
d2<-concat.split.multiple(data=d1,split.cols = c("Corridor Name"), seps = "+")
## This function is deprecated. Use `cSplit` instead.

This step will create some nulls in the case of the first instance of the timestamp for each car-ID as the location of the timestamp is not a corridor. For e.g. for the first timestamp, the corridor is just ‘entrance3’ and cannot be further expanded. So, the nulls are removed by the below code.

The two columns are renamed to From and To.

d3<-d2[-which(is.na(d2$`Corridor Name_2`)),]

names(d3)[names(d3)=="Corridor Name_1"]<-"From"
names(d3)[names(d3)=="Corridor Name_2"]<-"To"

There are some unnecessary columns, which can be dropped.

d4<-d3[,-c(7:14)]

d4<-d4[,c(7,8,1,2,3,4,5,6)]

The data table created now can be split into nodes and edges tables.

A new transformation to create gate groups is done as below.

nodes$gate_categories[(grepl('gate',nodes$`gate-name`))]<-"Gates"
nodes$gate_categories[(grepl('general',nodes$`gate-name`))]<-"General Gates"
nodes$gate_categories[(grepl('ranger',nodes$`gate-name`))]<-"Ranger Stops"
nodes$gate_categories[(grepl('entrance',nodes$`gate-name`))]<-"Entrances"
nodes$gate_categories[(grepl('camping',nodes$`gate-name`))]<-"Camps"

nodes$x<-nodes$X
nodes$y<-nodes$Y
nodes<-unique(nodes)
write_csv(nodes,path="C:/Users/bluec/Desktop/nodes1.csv")

For the edges table, another of columns need to be extracted.

edges<-d4[,c(1,2,3,4,5)]

The timestamp information is converted into new set of Day time and night time through the following code.

strptime is used as the base package to extract hour information by parsing the timestamp.

A categorisation is made and a count column is appended to show the traffic network density to be incremented by 1 whenever the particular edge is present.

edges$Timestamp = as.character(edges$Timestamp)
edges$hour = hour(strptime(edges$Timestamp,"%d/%m/%Y %H:%M"))
edges$tod[edges$hour<=6|edges$hour>18]<-"Night time"
edges$tod[edges$hour>6&edges$hour<=18]<-"Day time"

edges$Count<-1

edges<-edges[,-c(6)]
write_csv(edges,path="C:/Users/bluec/Desktop/edges1.csv")