The purpose of our project stems from the fact that we are interested to find out connectivity between buses in Singapore and to find out which are the more dominant or those bus stops that are dominated. These information could help LTA or other agencies plan for better connectivity measures such as increasing frequency of buses to certain areas, or creating more bus stops in others.
I came out with 5 dominant maps to address different interests. (Refer to the Figure labels in Sketch) Figure 1 will address the concerns of which flows are more dominant. We will remove flows that are less dominant. Remaining flow paths are those that are potentially carrying more passengers. Thickness of the graph indicates the frequency of passengers between any 2 busstops in Singapore.
Figure 2 will address connections between busstops in Singapore. This is an animated graph which will show all the connections for busstops. This uses ‘gganimate’ to allow the graph to iterate through bus connections with frequency >1, >1000, >5000 and so on.
Figure 3 is an interactive node graph classified by Planning Area. Each of the coloured node represents a planning area. I have filtered to some PA because there are too many within the picture it will look too cluttered. When user hover over the node, they can see which PA it belongs to. The edges represents the connection while the thickness of the edge represents the frequency. Frequency is transformed such that the edges does not look thicker than the nodes.
Figure 4 is an interactive Sankey diagram also classified by Planning Area. If there are buses travelling within its own planning area, it can also be reflected in the Sankey diagram. Similarly, thickness of the flow represents the frequency of passengers. Frequency is transformed such that the edges look reasonable sized, but still retains proportion. Similarly, i also filtered out some PA for viewing or it will be too cluttered.
Figure 5 is a animated time series facet plot, showing only some selected PA. The facets are facetted by origin PA, while the coloured lines in each box are classified by Destination PA.
packages = c('flows','sp','tidyverse','st','sf','maptools','networkD3','gganimate','leaflet','RColorBrewer')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
data<- read.csv("data/origin_subset_10000.csv")
Show first 10 row of data
head(data,10)
## X YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE
## 1 0 2020-01 WEEKENDS/HOLIDAY 16 BUS 4168
## 2 1 2020-01 WEEKDAY 16 BUS 4168
## 3 2 2020-01 WEEKENDS/HOLIDAY 14 BUS 80119
## 4 3 2020-01 WEEKDAY 14 BUS 80119
## 5 4 2020-01 WEEKDAY 17 BUS 20281
## 6 5 2020-01 WEEKENDS/HOLIDAY 17 BUS 20281
## 7 6 2020-01 WEEKDAY 7 BUS 19051
## 8 7 2020-01 WEEKENDS/HOLIDAY 17 BUS 11169
## 9 8 2020-01 WEEKDAY 14 BUS 81049
## 10 9 2020-01 WEEKENDS/HOLIDAY 14 BUS 81049
## DESTINATION_PT_CODE TOTAL_TRIPS BusStopCode_x RoadName_Origin
## 1 10051 80 4168 Nth Bridge Rd
## 2 10051 60 4168 Nth Bridge Rd
## 3 90079 80 80119 Sims Way
## 4 90079 240 80119 Sims Way
## 5 20141 260 20281 West Coast Highway
## 6 20141 40 20281 West Coast Highway
## 7 10017 20 19051 Dover Rd
## 8 4219 18 11169 C'wealth Ave
## 9 70161 40 81049 Geylang Rd
## 10 70161 20 81049 Geylang Rd
## Description_Origin Planning_Area_Origin Latitude_Origin
## 1 Aft City Hall Stn Exit B DOWNTOWN CORE 1.292082
## 2 Aft City Hall Stn Exit B DOWNTOWN CORE 1.292082
## 3 Aft Geylang Rd GEYLANG 1.309921
## 4 Aft Geylang Rd GEYLANG 1.309921
## 5 Opp The Japanese Sec Sch CLEMENTI 1.312342
## 6 Opp The Japanese Sec Sch CLEMENTI 1.312342
## 7 New Town Sec Sch QUEENSTOWN 1.308964
## 8 C'Wealth Stn Exit B/C QUEENSTOWN 1.302705
## 9 Aft Lor 34 Geylang GEYLANG 1.314076
## 10 Aft Lor 34 Geylang GEYLANG 1.314076
## Longitude_Origin BusStopCode_y RoadName_Destination
## 1 103.8513 10051 Jln Bt Merah
## 2 103.8513 10051 Jln Bt Merah
## 3 103.8760 90079 Rhu Cross
## 4 103.8760 90079 Rhu Cross
## 5 103.7548 20141 West Coast Rd
## 6 103.7548 20141 West Coast Rd
## 7 103.7739 10017 Eu Tong Sen St
## 8 103.7984 4219 River Valley Rd
## 9 103.8874 70161 Merpati Rd
## 10 103.8874 70161 Merpati Rd
## Description_Destination Planning_Area_Destination Latitude_Destination
## 1 Blk 149 BUKIT MERAH 1.277412
## 2 Blk 149 BUKIT MERAH 1.277412
## 3 Opp Costa Rhu Condo KALLANG 1.297607
## 4 Opp Costa Rhu Condo KALLANG 1.297607
## 5 Opp Blk 408 JURONG EAST 1.317720
## 6 Opp Blk 408 JURONG EAST 1.317720
## 7 Aft Hosp Dr BUKIT MERAH 1.278320
## 8 Opp Clarke Quay MUSEUM 1.290968
## 9 OPP MATTAR STN EXIT A GEYLANG 1.327586
## 10 OPP MATTAR STN EXIT A GEYLANG 1.327586
## Longitude_Destination
## 1 103.8321
## 2 103.8321
## 3 103.8673
## 4 103.8673
## 5 103.7488
## 6 103.7488
## 7 103.8376
## 8 103.8469
## 9 103.8832
## 10 103.8832
In the field of spatial analysis, working on flows supposes to focus on the relationships between places rather than on their characteristics. Analysis and flow representation often assume a selection to ease the interpretation.
One of the first method developed was the so-called dominant flows (or nodal regions) proposed by Nystuen and Dacey in 1961 (Dacey (1961)). According to this method, a place i is dominated by a place j if two conditions are met:
the most important flow from i is emitted towards j;
the sum of the flows received by j is greater than the sum of the flows received by i.
This method creates what is called in graph theory a tree (acyclic graph) or a forest (a set of unconnected trees) with three types of nodes: dominant, dominated and intermediate. Various methods have subsequently been proposed to better reflect this intensity.Analysing commuters data between cities, one may choose to select:
all flows greater than 100;
the 50 first flows (global criterion);
the 10 first flows emitted by each city (local criterion).
These criteria can also be expressed in relative form:
flows that represent more than 10% of the active population of each city (local criterion);
flows that take into account 80% of all commuters (global criterion).
Show only relevant columns for this project
flow <- data %>%
select('DAY_TYPE','TIME_PER_HOUR','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS') %>%
unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
group_by(from_to) %>%
summarise(Frequency = sum(TOTAL_TRIPS))%>%
separate(from_to, c("from", "to"))
To show the flow in terms of count between 2 busstop codes:
# Prepare data
myflows <- prepflows(mat = flow, i = "from", j = "to", fij = "Frequency")
myflows[1:4,1:4]
## 10009 10011 10017 10018
## 10009 0 0 0 0
## 10011 0 0 0 0
## 10017 0 0 0 0
## 10018 0 0 0 0
The statmat function provides various indicators and graphical outputs on a flow matrix to allow statistically relevant selections. Measures provided are density (number of present flows divided by the number of possible flows); number, size and composition of connected components; sum, quartiles and average intensity of flows. In addition, four graphics can be plotted: degree distribution curve (by default, outdegree), weighted degree distribution curve, Lorenz curve and boxplot on flow intensities.
# Get statistics about the matrix
statmat(mat = myflows, output = "none", verbose = TRUE)
## matrix dimension: 3372 X 3372
## nb. links: 5951
## density: 0.0005235325
## nb. of components (weak) 130
## nb. of components (weak, size > 1) 130
## sum of flows: 3382029
## min: 2
## Q1: 20
## median: 80
## Q3: 320
## max: 143840
## mean: 568.3127
## sd: 3119.456
# Plot Lorenz curve only
statmat(mat = myflows, output = "lorenz", verbose = FALSE)
# Graphics only
statmat(mat = myflows, output = "all", verbose = FALSE)
# Statistics only
mystats <- statmat(mat = myflows, output = "none", verbose = FALSE)
str(mystats)
## List of 16
## $ matdim : int [1:2] 3372 3372
## $ nblinks : num 5951
## $ density : num 0.000524
## $ connectcomp : int 130
## $ connectcompx: int 130
## $ sizecomp :'data.frame': 130 obs. of 3 variables:
## ..$ idcomp : int [1:130] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ sizecomp: num [1:130] 3062 2 2 2 2 ...
## ..$ wcomp : num [1:130] 3140906 60 20 100 160 ...
## $ compocomp :'data.frame': 3372 obs. of 2 variables:
## ..$ id : chr [1:3372] "10009" "10011" "10017" "10018" ...
## ..$ idcomp: num [1:3372] 1 1 1 1 1 1 1 1 1 1 ...
## $ degrees :'data.frame': 3372 obs. of 3 variables:
## ..$ id : chr [1:3372] "10009" "10011" "10017" "10018" ...
## ..$ degree : num [1:3372] 12 4 10 3 2 3 4 1 3 2 ...
## ..$ wdegree: num [1:3372] 5382 1640 592 350 60 ...
## $ sumflows : num 3382029
## $ min : num 2
## $ Q1 : num 20
## $ median : num 80
## $ Q3 : num 320
## $ max : num 143840
## $ mean : num 568
## $ sd : num 3119
# Sum of flows
mystats$sumflows
## [1] 3382029
To ease comparisons, the compmat function outputs a data.frame that provides statistics on differences between two matrices (for example a matrix and selection of this matrix).
Visualisation helps analysis, plotDomFlows function produces a graph where sizes and colors of vertices depend on their position in the graph (dominant, intermediate or dominated) and thicknesses of links depend on flow intensites.
The plotMapDomFlows function maps the selected flows according to the same principles. Both functions only apply to a dominant flows selection1.
We compare two different thresholds (5000 and 10000) on the total volume of flows.
# Remove the matrix diagonal
diag(myflows) <- 0
# Selection of flows > 500
flowSel1 <- firstflowsg(mat = myflows, method = "xfirst", k = 5000)
# Selection of flows > 1000
flowSel2 <- firstflowsg(mat = myflows, method = "xfirst", k = 10000)
# Compare initial matrix and selected matrices
compmat(mat1 = myflows, mat2 = myflows * flowSel1, digits = 1)
## mat1 mat2 absdiff reldiff
## nblinks 5951.0 104.0 5847 98.3
## sumflows 3382029.0 1516940.0 1865089 55.1
## connectcompx 130.0 74.0 56 NA
## min 2.0 5060.0 NA NA
## Q1 20.0 7155.0 NA NA
## median 80.0 9290.0 NA NA
## Q3 320.0 13260.0 NA NA
## max 143840.0 143840.0 NA NA
## mean 568.3 14586.0 NA NA
## sd 3119.5 18400.7 NA NA
If we select flows greater than 500 commuters, we loose 98.3% of all links but only 55.1% of the volume of flows.
compmat(mat1 = myflows, mat2 = myflows * flowSel2, digits = 1)
## mat1 mat2 absdiff reldiff
## nblinks 5951.0 49.0 5902 99.2
## sumflows 3382029.0 1125976.0 2256053 66.7
## connectcompx 130.0 38.0 92 NA
## min 2.0 10060.0 NA NA
## Q1 20.0 11100.0 NA NA
## median 80.0 13680.0 NA NA
## Q3 320.0 19720.0 NA NA
## max 143840.0 143840.0 NA NA
## mean 568.3 22979.1 NA NA
## sd 3119.5 24256.7 NA NA
With a threshold of 10000 commuters, 99.2% of links are lost but only 66.7% of the volume of flows.
Notes: * nblinks: number of cells with values > 0 * density: number of links divided by number of possible links (also called gamma index by geographers), loops excluded * sumflows: sum of flows * A data.frame that provides statistics on differences between mat1 and mat2: absdiff are the absolute differences and reldiff are the relative differences (in percent).
# Remove the matrix diagonal
diag(myflows) <- 0
# Select flows that represent at least 20% of the sum of outgoing flows for
# each urban area. ( can select other methods )
flowSel1 <- firstflows(mat = myflows/rowSums(myflows)*100, method = "xfirst",
k = 20)
# Select the dominant flows (incoming flows criterion)
flowSel2 <- domflows(mat = myflows, w = colSums(myflows), k = 1)
# Combine selections
flowSel <- myflows * flowSel1 * flowSel2
# Node weights
inflows <- data.frame(id = colnames(myflows), w = colSums(myflows))
This is for us to plot the Singapore map shape
# mpsz <- st_read(dsn = "data/geospatial",
# layer = "MP14_SUBZONE_WEB_PL")
mpsz <- readShapeSpatial("data/geospatial/MP14_SUBZONE_WEB_PL.shp")
We are only filtering first 10000 records out of a few million because it takes infinitely long to knit
mpbus <- readShapeSpatial("data/BusStopLocation_Jan2020/BusStop.shp")
## Warning: readShapeSpatial is deprecated; use rgdal::readOGR or sf::st_read
## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read
#class(mpsz)
#mpsz
#library(maptools)
#mpsz <- readShapeSpatial("data/geospatial/MP14_SUBZONE_WEB_PL.shp")
#class(mpsz)
# plot the shape of Singapore
sp::plot(mpsz, col = "#cceae7", border = NA)
# plot the locations of the busstops in Singapore
sp::plot(mpbus, col = "#cceaf7")
# Plot dominant flows map
opar <- par(mar = c(0,0,2,0))
sp::plot(mpsz, col = "#cceae7", border = NA)
#mpsz_trans <- SpatialPointsDataFrame(coords = xy, data = mydf,
# proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
plotMapDomFlows(mat = flowSel, spdf = mpbus, spdfid = "BUS_STOP_N", w = inflows, wid = "id",
wvar = "w", wcex = 0.05, add = TRUE,
legend.flows.pos = "topright",
legend.flows.title = "Nb. of commuters")
title("Dominant Flows of Commuters")
mtext(text = "singapore bus,2020", side = 4, line = -1, adj = 0.01, cex = 0.8)
We have seen above that we can plot all the flows according to the requirements:“xfirst”, “nfirst”,“xsumfirst”. In our final model, we decided that outflows we go with xfirst, k=20. There is no right or wrong to our selection, is depending on our preference. Typically we would like to keep flows that constitude to a significant outflow of 20% from our nodes rather than smaller insignificant flows. Based on the map, we can see a quite a few dominant flows residing in North and central part of Singapore, and much less dominance in east and west side.
Show only relevant columns for this project We are retrieving the To and From BusStop Code, as well as the total number of trips between these 2 busstops across the whole month of Feb 2020
flow <- data %>%
select('DAY_TYPE','TIME_PER_HOUR','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
group_by(from_to) %>%
summarise(Frequency = sum(TOTAL_TRIPS))%>%
separate(from_to, c("from", "to"))
We are retrieving the origin busstop longitude and latitudes
location_ori <- data %>%
select('BusStopCode_x','Latitude_Origin','Longitude_Origin') %>%
distinct()
We are retrieving the destination busstop longitude and latitudes
location_dest <- data %>%
select('BusStopCode_y','Latitude_Destination','Longitude_Destination') %>%
distinct()
Merge the location data with the busstop frequency details for origin
origin_flow <- merge(x= flow, y=location_ori, by.x=c("from"),
by.y=c("BusStopCode_x"), all.x= TRUE)
Merge the location data with the busstop frequency details for origin
ori_dest_flow <- merge(x= origin_flow, y=location_dest, by.x=c("to"),
by.y=c("BusStopCode_y"), all.x= TRUE)
ggplot and gganimatexquiet<- scale_x_continuous("", breaks=NULL)
yquiet<-scale_y_continuous("", breaks=NULL)
quiet<-list(xquiet, yquiet)
p <- ggplot(ori_dest_flow, aes(Longitude_Origin, Latitude_Origin))+
ggtitle("Animated Connections of Bus Filtered by Frequency of Passengers")+
#The next line tells ggplot that we wish to plot line segments. The "alpha=" is line transparency and used below
geom_segment(aes(x=Longitude_Origin, y=Latitude_Origin,xend=Longitude_Destination, yend=Latitude_Destination, alpha=Frequency), col="white")+
#Here is the magic bit that sets line transparency - essential to make the plot readable
scale_alpha_continuous(range = c(0.2,1))+ labs(size = 'Frequency') +
#Set black background, ditch axes and fix aspect ratio
theme(panel.background = element_rect(fill='black',colour='black'))+quiet+coord_equal()
# ori_dest_flow[which(ori_dest_flow$Frequency>5000),]
p
gganimate to animate the graph via these filters as shown:#p + transition_time(Frequency) +
# labs(title = "Frequency: {frame_time}")
#anim <- p + transition_states(Frequency, transition_length = 1000, state_length=2) +
# ggtitle('Freq: {closest_state}', subtitle='Frame {frame} of {nframes}')
anim <- p + transition_filter(
"Freq >= 1000" = Frequency >= 1,
"Freq >= 1000" = Frequency >= 1000,
"Freq >= 5000" = Frequency >= 5000,
"Freq >= 10000" = Frequency >= 10000,
"Freq >= 50000" = Frequency >= 50000,
"Freq >= 100000" = Frequency >= 100000,
) +
ggtitle('Freq: {closest_filter}', subtitle='Frame {frame} of {nframes}')
anim
We can see majority of the network flow is from the town area. It is quite sparse in the west and Lim Chu Kang areas. At frequency >5000, we see that the dominating paths are from North to Central and Central to East.
NetworkD3We want to filter out Planning Area from Origin and Planning Area at Destination. In this following we wish to find out the flow chart between the planning areas
find out how many unique nodes are there in both the Origin and Destination
mpgeo <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\jia yi\Desktop\MITB\Sem2\ISSS608 Visual Analytics and its applications\DataVizMakeover\DataViz9\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
byPA <- mpgeo %>%
group_by(PLN_AREA_N) %>% summarize(count=n())
numPA <- n_distinct(byPA)
numPA
## [1] 55
flow2 <- data %>%
select('DAY_TYPE','TIME_PER_HOUR','Planning_Area_Origin','Planning_Area_Destination','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
unite(from_toPA, Planning_Area_Origin,Planning_Area_Destination, sep = ":", remove=FALSE) %>%
group_by(from_toPA)%>%
summarise(Frequency = sum(TOTAL_TRIPS))%>%
separate(from_toPA, c("from_PA", "to_PA"),sep = ":")
#flow3 <- flow2 %>%
common <- c()
flow2 <- flow2[flow2$from_PA != flow2$to_PA,]
flow2 <- flow2[flow2$from_PA == c("ANG MO KIO","PUNGGOL","HOUGANG", "WOODLANDS","BEDOK"),]
flow2$Frequency_sqrt <- sapply(flow2$Frequency, function(x) sqrt(x))
print(flow2)
## # A tibble: 25 x 4
## from_PA to_PA Frequency Frequency_sqrt
## <chr> <chr> <int> <dbl>
## 1 ANG MO KIO BEDOK 1800 42.4
## 2 ANG MO KIO CLEMENTI 40 6.32
## 3 ANG MO KIO MANDAI 40 6.32
## 4 ANG MO KIO OUTRAM 148 12.2
## 5 ANG MO KIO ROCHOR 476 21.8
## 6 ANG MO KIO TANGLIN 218 14.8
## 7 BEDOK ANG MO KIO 660 25.7
## 8 BEDOK CHANGI 30520 175.
## 9 BEDOK KALLANG 2658 51.6
## 10 BEDOK OUTRAM 2408 49.1
## # ... with 15 more rows
# flow2$from_PA <- sapply(flow2$from_PA, function(x) paste('from_', x))
# flow2$to_PA <- sapply(flow2$to_PA, function(x) paste('to_', x))
NetworkD3name_vec <- c(unique(flow2$from_PA), unique(flow2$to_PA))
#name_vec_from <- lapply(name_vec, function(x) paste('to_', x))
#name_vec <- c(unique(name_vec))
#print(name_vec)
nodes <- data.frame(name = name_vec, id = 0:22)
#nodes_from$name <- sapply(nodes$name, function(x) paste('from_', x))
#nodes_to$name <- sapply(nodes$name, function(x) paste('from_', x))
#nodes_from_to <- rbind(nodes_from, nodes_to)
#
class(flow2)
## [1] "tbl_df" "tbl" "data.frame"
print(flow2)
## # A tibble: 25 x 4
## from_PA to_PA Frequency Frequency_sqrt
## <chr> <chr> <int> <dbl>
## 1 ANG MO KIO BEDOK 1800 42.4
## 2 ANG MO KIO CLEMENTI 40 6.32
## 3 ANG MO KIO MANDAI 40 6.32
## 4 ANG MO KIO OUTRAM 148 12.2
## 5 ANG MO KIO ROCHOR 476 21.8
## 6 ANG MO KIO TANGLIN 218 14.8
## 7 BEDOK ANG MO KIO 660 25.7
## 8 BEDOK CHANGI 30520 175.
## 9 BEDOK KALLANG 2658 51.6
## 10 BEDOK OUTRAM 2408 49.1
## # ... with 15 more rows
#print(nodes_from_to)
print(class(flow2))
## [1] "tbl_df" "tbl" "data.frame"
# links <- flow2 %>%
# left_join(nodes, by = c('from_PA' = 'name')) %>%
# rename(from_PA = id) %>%
# left_join(nodes, by = c('to_PA' = 'name')) %>%
# rename(to_PA = id)
links <- flow2 %>%
left_join(nodes,flow2, by=c('from_PA'='name')) %>%
rename(from_PA_id = id) %>%
left_join(nodes,flow2, by=c('to_PA'='name')) %>%
rename(to_PA_id = id)
forcenetwork <- forceNetwork(Links = links, Nodes = nodes, Source = 'from_PA_id', Target = 'to_PA_id',
Value = 'Frequency_sqrt', NodeID = 'name', Group = 'id', zoom = TRUE)
forcenetwork <- htmlwidgets::prependContent(forcenetwork,htmltools::tags$h1("Interactive Node flow Origin to Destination by PA"))
forcenetwork
The thickest edge (out of the points i selected) shows the frequency is highest between Changi and Bedok, while the thinest edge seems to be between Bedok and Woodlands. Since this not in R shiny, i can only hard code the nodes i want to visualise. We can however, select the nodes in R Shiny.
NetworkD3sankey <- sankeyNetwork(Links = links, Nodes = nodes, Source = 'from_PA_id', Target = 'to_PA_id',
Value = 'Frequency_sqrt', NodeID = 'name', fontSize = 16)
sankey <- htmlwidgets::prependContent(sankey, htmltools::tags$h1("Interactive Sankey flow Origin to Destination by PA"))
sankey
timeSeries <- data %>%
select('DAY_TYPE','TIME_PER_HOUR','Planning_Area_Origin','Planning_Area_Destination','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
filter(Planning_Area_Origin %in% c('BEDOK','ANG MO KIO','HOUGANG','WOODLANDS')) %>%
filter(Planning_Area_Destination %in% c('YISHUN','ORCHARD','BUKIT TIMAH','CHANGI','SENGKANG')) %>%
#unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
group_by(Planning_Area_Origin,Planning_Area_Destination,TIME_PER_HOUR) %>%
summarise(Frequency = sum(TOTAL_TRIPS))%>%
arrange(TIME_PER_HOUR)
#separate(from_to, c("from", "to"))
As expected, thickest line is between Changi to Bedok, while thinest line is between Queenstown to Hougang, where line is proportional to frequency of passengers.
#timeSeries= timeSeries[timeSeries$Planning_Area_Origin == "ANG MO KIO",]
p <- ggplot(
timeSeries,
aes(TIME_PER_HOUR, Frequency, group = Planning_Area_Destination, color = factor(Planning_Area_Destination))
) +
geom_line() +
scale_y_continuous(trans='log10')+
scale_color_viridis_d() +
labs(x = "Time of the day (hr)", y = "Average Frequency across the month") +
theme(legend.position = "top")
p
## Analysis for Static plot for Frequency against time of day This is a plot of frequency of passengers alighting. ( did not plot from where) Out of the PA we selected, the most number of passengers in Singapore is at Sengkang, and at 11 am the least number of passengers is at Bukit Timah.
gganimatep +
facet_wrap(~Planning_Area_Origin) +
geom_point() +
transition_reveal(TIME_PER_HOUR)
## Analysis on the animated time series analysis We see that each facet is based on the origin location, and each line is the destination location. For the 10000 records ( we filtered first 10000 records in the Jan 2020 data), we see that thoroughout the day the travel from Hougang to Sengkang is the highest. This is not surprising as Sengkang is an interchange point. The travel from Orchard to Bedok increases sharply from 7am and stays quite stagnant throughout the day.
Animated graph allows for users to see trend more easily as all the items are shown only at every time step. It allows users to focus at each time frame/ each filter, reduce the problem of inconsistency. One example of inconsistency is when we take average, it may seem surprisingly high when in fact it could be due to anomaly rather than being a really high value. Anomalies could happen and time series data can allow users to see at which time point there is anomaly. For instance, going to work hours may reflect surprisingly high volume at interchange, while very low volume off peak hours.
Interactive charts allow users to select and highlight area of interest and allow users to focus on parts of the graph rather than having too cluttered graph which makes it hard to read.
It increases user experience. Interactive graph like the ones from NetworkD3 hides away details when not hovered over, and only displays when we hover over, showing a cleaner display yet containing much more information
Interactive graph allows users to pose more questions that are answered by much less visual – plotting one interactive plot can answer many questions by switching selections/filters and mixing selections/filters based on user’s interest.
https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group https://tidyr.tidyverse.org/reference/separate.html https://d4tagirl.com/2017/05/how-to-plot-animated-maps-with-gganimate https://spatial.ly/2015/03/mapping-flows/ http://personal.tcu.edu/kylewalker/interactive-flow-visualization-in-r.html https://medium.com/@mueller.johannes.j/use-r-and-gganimate-to-make-an-animated-map-of-european-students-and-their-year-abroad-517ad75dca06