1 Purpose and components + Ways to add animation / interactivity

The purpose of our project stems from the fact that we are interested to find out connectivity between buses in Singapore and to find out which are the more dominant or those bus stops that are dominated. These information could help LTA or other agencies plan for better connectivity measures such as increasing frequency of buses to certain areas, or creating more bus stops in others.

I came out with 5 dominant maps to address different interests. (Refer to the Figure labels in Sketch) Figure 1 will address the concerns of which flows are more dominant. We will remove flows that are less dominant. Remaining flow paths are those that are potentially carrying more passengers. Thickness of the graph indicates the frequency of passengers between any 2 busstops in Singapore.

Figure 2 will address connections between busstops in Singapore. This is an animated graph which will show all the connections for busstops. This uses ‘gganimate’ to allow the graph to iterate through bus connections with frequency >1, >1000, >5000 and so on.

Figure 3 is an interactive node graph classified by Planning Area. Each of the coloured node represents a planning area. I have filtered to some PA because there are too many within the picture it will look too cluttered. When user hover over the node, they can see which PA it belongs to. The edges represents the connection while the thickness of the edge represents the frequency. Frequency is transformed such that the edges does not look thicker than the nodes.

Figure 4 is an interactive Sankey diagram also classified by Planning Area. If there are buses travelling within its own planning area, it can also be reflected in the Sankey diagram. Similarly, thickness of the flow represents the frequency of passengers. Frequency is transformed such that the edges look reasonable sized, but still retains proportion. Similarly, i also filtered out some PA for viewing or it will be too cluttered.

Figure 5 is a animated time series facet plot, showing only some selected PA. The facets are facetted by origin PA, while the coloured lines in each box are classified by Destination PA.

Sketch 1-4 Sketch 5

2 Creating Visuals

Overview: Importing packages

packages = c('flows','sp','tidyverse','st','sf','maptools','networkD3','gganimate','leaflet','RColorBrewer')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

Import my dataset

data<- read.csv("data/origin_subset_10000.csv")

Show first 10 row of data

head(data,10)

##    X YEAR_MONTH         DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE
## 1  0    2020-01 WEEKENDS/HOLIDAY            16     BUS           4168
## 2  1    2020-01          WEEKDAY            16     BUS           4168
## 3  2    2020-01 WEEKENDS/HOLIDAY            14     BUS          80119
## 4  3    2020-01          WEEKDAY            14     BUS          80119
## 5  4    2020-01          WEEKDAY            17     BUS          20281
## 6  5    2020-01 WEEKENDS/HOLIDAY            17     BUS          20281
## 7  6    2020-01          WEEKDAY             7     BUS          19051
## 8  7    2020-01 WEEKENDS/HOLIDAY            17     BUS          11169
## 9  8    2020-01          WEEKDAY            14     BUS          81049
## 10 9    2020-01 WEEKENDS/HOLIDAY            14     BUS          81049
##    DESTINATION_PT_CODE TOTAL_TRIPS BusStopCode_x    RoadName_Origin
## 1                10051          80          4168      Nth Bridge Rd
## 2                10051          60          4168      Nth Bridge Rd
## 3                90079          80         80119           Sims Way
## 4                90079         240         80119           Sims Way
## 5                20141         260         20281 West Coast Highway
## 6                20141          40         20281 West Coast Highway
## 7                10017          20         19051           Dover Rd
## 8                 4219          18         11169       C'wealth Ave
## 9                70161          40         81049         Geylang Rd
## 10               70161          20         81049         Geylang Rd
##          Description_Origin Planning_Area_Origin Latitude_Origin
## 1  Aft City Hall Stn Exit B        DOWNTOWN CORE        1.292082
## 2  Aft City Hall Stn Exit B        DOWNTOWN CORE        1.292082
## 3            Aft Geylang Rd              GEYLANG        1.309921
## 4            Aft Geylang Rd              GEYLANG        1.309921
## 5  Opp The Japanese Sec Sch             CLEMENTI        1.312342
## 6  Opp The Japanese Sec Sch             CLEMENTI        1.312342
## 7          New Town Sec Sch           QUEENSTOWN        1.308964
## 8     C'Wealth Stn Exit B/C           QUEENSTOWN        1.302705
## 9        Aft Lor 34 Geylang              GEYLANG        1.314076
## 10       Aft Lor 34 Geylang              GEYLANG        1.314076
##    Longitude_Origin BusStopCode_y RoadName_Destination
## 1          103.8513         10051         Jln Bt Merah
## 2          103.8513         10051         Jln Bt Merah
## 3          103.8760         90079            Rhu Cross
## 4          103.8760         90079            Rhu Cross
## 5          103.7548         20141        West Coast Rd
## 6          103.7548         20141        West Coast Rd
## 7          103.7739         10017       Eu Tong Sen St
## 8          103.7984          4219      River Valley Rd
## 9          103.8874         70161           Merpati Rd
## 10         103.8874         70161           Merpati Rd
##    Description_Destination Planning_Area_Destination Latitude_Destination
## 1                  Blk 149               BUKIT MERAH             1.277412
## 2                  Blk 149               BUKIT MERAH             1.277412
## 3      Opp Costa Rhu Condo                   KALLANG             1.297607
## 4      Opp Costa Rhu Condo                   KALLANG             1.297607
## 5              Opp Blk 408               JURONG EAST             1.317720
## 6              Opp Blk 408               JURONG EAST             1.317720
## 7              Aft Hosp Dr               BUKIT MERAH             1.278320
## 8          Opp Clarke Quay                    MUSEUM             1.290968
## 9    OPP MATTAR STN EXIT A                   GEYLANG             1.327586
## 10   OPP MATTAR STN EXIT A                   GEYLANG             1.327586
##    Longitude_Destination
## 1               103.8321
## 2               103.8321
## 3               103.8673
## 4               103.8673
## 5               103.7488
## 6               103.7488
## 7               103.8376
## 8               103.8469
## 9               103.8832
## 10              103.8832

Plot Static Flow map by Dominance using ‘FLOWS’ package

Explanation of geographic flows: issues and methods

In the field of spatial analysis, working on flows supposes to focus on the relationships between places rather than on their characteristics. Analysis and flow representation often assume a selection to ease the interpretation.

One of the first method developed was the so-called dominant flows (or nodal regions) proposed by Nystuen and Dacey in 1961 (Dacey (1961)). According to this method, a place i is dominated by a place j if two conditions are met:

the most important flow from i is emitted towards j;
the sum of the flows received by j is greater than the sum of the flows received by i.

This method creates what is called in graph theory a tree (acyclic graph) or a forest (a set of unconnected trees) with three types of nodes: dominant, dominated and intermediate. Various methods have subsequently been proposed to better reflect this intensity.Analysing commuters data between cities, one may choose to select:

all flows greater than 100;

the 50 first flows (global criterion);

the 10 first flows emitted by each city (local criterion).

These criteria can also be expressed in relative form:

flows that represent more than 10% of the active population of each city (local criterion);
flows that take into account 80% of all commuters (global criterion).

Data Preparation for my FLOWS graph

Show only relevant columns for this project

flow <- data %>%
  select('DAY_TYPE','TIME_PER_HOUR','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS') %>%
  unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
  group_by(from_to) %>% 
  summarise(Frequency = sum(TOTAL_TRIPS))%>%
  separate(from_to, c("from", "to"))

To show the flow in terms of count between 2 busstop codes:

# Prepare data
myflows <- prepflows(mat = flow, i = "from", j = "to", fij = "Frequency")
myflows[1:4,1:4]

##       10009 10011 10017 10018
## 10009     0     0     0     0
## 10011     0     0     0     0
## 10017     0     0     0     0
## 10018     0     0     0     0

The statmat function provides various indicators and graphical outputs on a flow matrix to allow statistically relevant selections. Measures provided are density (number of present flows divided by the number of possible flows); number, size and composition of connected components; sum, quartiles and average intensity of flows. In addition, four graphics can be plotted: degree distribution curve (by default, outdegree), weighted degree distribution curve, Lorenz curve and boxplot on flow intensities.

# Get statistics about the matrix
statmat(mat = myflows, output = "none", verbose = TRUE)

## matrix dimension: 3372 X 3372 
## nb. links: 5951 
## density: 0.0005235325 
## nb. of components (weak) 130 
## nb. of components (weak, size > 1) 130 
## sum of flows: 3382029 
## min: 2 
## Q1: 20 
## median: 80 
## Q3: 320 
## max: 143840 
## mean: 568.3127 
## sd: 3119.456

# Plot Lorenz curve only
statmat(mat = myflows, output = "lorenz", verbose = FALSE)

# Graphics only
statmat(mat = myflows, output = "all", verbose = FALSE)

# Statistics only
mystats <- statmat(mat = myflows, output = "none", verbose = FALSE)
str(mystats)

## List of 16
##  $ matdim      : int [1:2] 3372 3372
##  $ nblinks     : num 5951
##  $ density     : num 0.000524
##  $ connectcomp : int 130
##  $ connectcompx: int 130
##  $ sizecomp    :'data.frame':    130 obs. of  3 variables:
##   ..$ idcomp  : int [1:130] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ sizecomp: num [1:130] 3062 2 2 2 2 ...
##   ..$ wcomp   : num [1:130] 3140906 60 20 100 160 ...
##  $ compocomp   :'data.frame':    3372 obs. of  2 variables:
##   ..$ id    : chr [1:3372] "10009" "10011" "10017" "10018" ...
##   ..$ idcomp: num [1:3372] 1 1 1 1 1 1 1 1 1 1 ...
##  $ degrees     :'data.frame':    3372 obs. of  3 variables:
##   ..$ id     : chr [1:3372] "10009" "10011" "10017" "10018" ...
##   ..$ degree : num [1:3372] 12 4 10 3 2 3 4 1 3 2 ...
##   ..$ wdegree: num [1:3372] 5382 1640 592 350 60 ...
##  $ sumflows    : num 3382029
##  $ min         : num 2
##  $ Q1          : num 20
##  $ median      : num 80
##  $ Q3          : num 320
##  $ max         : num 143840
##  $ mean        : num 568
##  $ sd          : num 3119

# Sum of flows
mystats$sumflows

## [1] 3382029

To ease comparisons, the compmat function outputs a data.frame that provides statistics on differences between two matrices (for example a matrix and selection of this matrix).

Visualisation helps analysis, plotDomFlows function produces a graph where sizes and colors of vertices depend on their position in the graph (dominant, intermediate or dominated) and thicknesses of links depend on flow intensites.

The plotMapDomFlows function maps the selected flows according to the same principles. Both functions only apply to a dominant flows selection1.

Number of commuters in Jan 2020 between any 2 busstops

We compare two different thresholds (5000 and 10000) on the total volume of flows.

# Remove the matrix diagonal
diag(myflows) <- 0

# Selection of flows > 500
flowSel1 <- firstflowsg(mat = myflows, method = "xfirst", k = 5000)
# Selection of flows > 1000
flowSel2 <- firstflowsg(mat = myflows, method = "xfirst", k = 10000)

Find the links where flow = 5000

# Compare initial matrix and selected matrices
compmat(mat1 = myflows, mat2 = myflows * flowSel1, digits = 1)

##                   mat1      mat2 absdiff reldiff
## nblinks         5951.0     104.0    5847    98.3
## sumflows     3382029.0 1516940.0 1865089    55.1
## connectcompx     130.0      74.0      56      NA
## min                2.0    5060.0      NA      NA
## Q1                20.0    7155.0      NA      NA
## median            80.0    9290.0      NA      NA
## Q3               320.0   13260.0      NA      NA
## max           143840.0  143840.0      NA      NA
## mean             568.3   14586.0      NA      NA
## sd              3119.5   18400.7      NA      NA

If we select flows greater than 500 commuters, we loose 98.3% of all links but only 55.1% of the volume of flows.

Find the links where flow = 10000

compmat(mat1 = myflows, mat2 = myflows * flowSel2, digits = 1)

##                   mat1      mat2 absdiff reldiff
## nblinks         5951.0      49.0    5902    99.2
## sumflows     3382029.0 1125976.0 2256053    66.7
## connectcompx     130.0      38.0      92      NA
## min                2.0   10060.0      NA      NA
## Q1                20.0   11100.0      NA      NA
## median            80.0   13680.0      NA      NA
## Q3               320.0   19720.0      NA      NA
## max           143840.0  143840.0      NA      NA
## mean             568.3   22979.1      NA      NA
## sd              3119.5   24256.7      NA      NA

With a threshold of 10000 commuters, 99.2% of links are lost but only 66.7% of the volume of flows.

Notes: * nblinks: number of cells with values > 0 * density: number of links divided by number of possible links (also called gamma index by geographers), loops excluded * sumflows: sum of flows * A data.frame that provides statistics on differences between mat1 and mat2: absdiff are the absolute differences and reldiff are the relative differences (in percent).

# Remove the matrix diagonal
diag(myflows) <- 0

# Select flows that represent at least 20% of the sum of outgoing flows for 
# each urban area. ( can select other methods )
flowSel1 <- firstflows(mat = myflows/rowSums(myflows)*100, method = "xfirst", 
                       k = 20)


# Select the dominant flows (incoming flows criterion)
flowSel2 <- domflows(mat = myflows, w = colSums(myflows), k = 1)

# Combine selections
flowSel <- myflows * flowSel1 * flowSel2

# Node weights
inflows <- data.frame(id = colnames(myflows), w = colSums(myflows))

Get Geospatial data of Singapore

This is for us to plot the Singapore map shape

# mpsz <- st_read(dsn = "data/geospatial", 
#                 layer = "MP14_SUBZONE_WEB_PL")
mpsz <- readShapeSpatial("data/geospatial/MP14_SUBZONE_WEB_PL.shp")

We are only filtering first 10000 records out of a few million because it takes infinitely long to knit

mpbus <- readShapeSpatial("data/BusStopLocation_Jan2020/BusStop.shp")

## Warning: readShapeSpatial is deprecated; use rgdal::readOGR or sf::st_read

## Warning: readShapePoints is deprecated; use rgdal::readOGR or sf::st_read

#class(mpsz)
#mpsz

 #library(maptools)
 #mpsz <- readShapeSpatial("data/geospatial/MP14_SUBZONE_WEB_PL.shp")
 #class(mpsz)
 
# plot the shape of Singapore
 sp::plot(mpsz, col = "#cceae7", border = NA)

# plot the locations of the busstops in Singapore
 sp::plot(mpbus, col = "#cceaf7")

Flow for DOMINANT, intermediate and Dominated busstops

# Plot dominant flows map
opar <- par(mar = c(0,0,2,0))
sp::plot(mpsz, col = "#cceae7", border = NA)

#mpsz_trans <- SpatialPointsDataFrame(coords = xy, data = mydf,
#                               proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))


plotMapDomFlows(mat = flowSel, spdf = mpbus, spdfid = "BUS_STOP_N", w = inflows, wid = "id",
                wvar = "w", wcex = 0.05, add = TRUE,
                legend.flows.pos = "topright",
                legend.flows.title = "Nb. of commuters")
title("Dominant Flows of Commuters")
mtext(text = "singapore bus,2020", side = 4, line = -1, adj = 0.01, cex = 0.8)

Analysis of DOM flow chart

We have seen above that we can plot all the flows according to the requirements:“xfirst”, “nfirst”,“xsumfirst”. In our final model, we decided that outflows we go with xfirst, k=20. There is no right or wrong to our selection, is depending on our preference. Typically we would like to keep flows that constitude to a significant outflow of 20% from our nodes rather than smaller insignificant flows. Based on the map, we can see a quite a few dominant flows residing in North and central part of Singapore, and much less dominance in east and west side.

Data Preparation for Figures 2 to 5

Show only relevant columns for this project We are retrieving the To and From BusStop Code, as well as the total number of trips between these 2 busstops across the whole month of Feb 2020

flow <- data %>%
  select('DAY_TYPE','TIME_PER_HOUR','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
  unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
  group_by(from_to) %>% 
  summarise(Frequency = sum(TOTAL_TRIPS))%>%
  separate(from_to, c("from", "to"))

We are retrieving the origin busstop longitude and latitudes

location_ori <- data %>%
  select('BusStopCode_x','Latitude_Origin','Longitude_Origin') %>%
  distinct()

We are retrieving the destination busstop longitude and latitudes

location_dest <- data %>%
  select('BusStopCode_y','Latitude_Destination','Longitude_Destination') %>%
  distinct()

Merge the location data with the busstop frequency details for origin

origin_flow <- merge(x= flow, y=location_ori, by.x=c("from"),
                     by.y=c("BusStopCode_x"), all.x= TRUE)

Merge the location data with the busstop frequency details for origin

ori_dest_flow <- merge(x= origin_flow, y=location_dest, by.x=c("to"),
                     by.y=c("BusStopCode_y"), all.x= TRUE)

Plot Animated network graph for busstops using `ggplot` and `gganimate`

Now for plotting with ggplot2.This first step removes the axes in the resulting plot.

xquiet<- scale_x_continuous("", breaks=NULL)
yquiet<-scale_y_continuous("", breaks=NULL)
quiet<-list(xquiet, yquiet)

Creating the geometric layers

p <- ggplot(ori_dest_flow, aes(Longitude_Origin, Latitude_Origin))+
  ggtitle("Animated Connections of Bus Filtered by Frequency of Passengers")+
#The next line tells ggplot that we wish to plot line segments. The "alpha=" is line transparency and used below 
geom_segment(aes(x=Longitude_Origin, y=Latitude_Origin,xend=Longitude_Destination, yend=Latitude_Destination, alpha=Frequency), col="white")+
#Here is the magic bit that sets line transparency - essential to make the plot readable
scale_alpha_continuous(range = c(0.2,1))+ labs(size = 'Frequency') +
#Set black background, ditch axes and fix aspect ratio 
theme(panel.background = element_rect(fill='black',colour='black'))+quiet+coord_equal()
# ori_dest_flow[which(ori_dest_flow$Frequency>5000),]
p

using `gganimate` to animate the graph via these filters as shown:

#p + transition_time(Frequency) +
#  labs(title = "Frequency: {frame_time}")

#anim <- p + transition_states(Frequency, transition_length = 1000, state_length=2) + 
#        ggtitle('Freq: {closest_state}', subtitle='Frame {frame} of {nframes}')

anim <- p + transition_filter(
                    "Freq >=  1000" = Frequency >= 1,
                    "Freq >=  1000" = Frequency >= 1000, 
                    "Freq >=  5000" = Frequency >= 5000, 
                    "Freq >=  10000" = Frequency >= 10000, 
                    "Freq >=  50000" = Frequency >= 50000, 
                    "Freq >=  100000" = Frequency >= 100000, 
                    ) + 
        ggtitle('Freq: {closest_filter}', subtitle='Frame {frame} of {nframes}')



anim

Analysis on the Animated Network chart

We can see majority of the network flow is from the town area. It is quite sparse in the west and Lim Chu Kang areas. At frequency >5000, we see that the dominating paths are from North to Central and Central to East.

Plot interactive maps using `NetworkD3`

Process the data

We want to filter out Planning Area from Origin and Planning Area at Destination. In this following we wish to find out the flow chart between the planning areas

find out how many unique nodes are there in both the Origin and Destination

mpgeo <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\jia yi\Desktop\MITB\Sem2\ISSS608 Visual Analytics and its applications\DataVizMakeover\DataViz9\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs

To get number of nodes, we need to know number of distinct PA

byPA <- mpgeo %>%
  group_by(PLN_AREA_N) %>% summarize(count=n())
numPA <- n_distinct(byPA)
numPA

## [1] 55

Group by Planning Area

flow2 <- data %>%
  select('DAY_TYPE','TIME_PER_HOUR','Planning_Area_Origin','Planning_Area_Destination','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
  unite(from_toPA, Planning_Area_Origin,Planning_Area_Destination, sep = ":", remove=FALSE) %>%
  group_by(from_toPA)%>%
  summarise(Frequency = sum(TOTAL_TRIPS))%>%
  separate(from_toPA, c("from_PA", "to_PA"),sep = ":")
  
#flow3 <- flow2 %>%
common <- c()

flow2 <- flow2[flow2$from_PA != flow2$to_PA,]

flow2 <- flow2[flow2$from_PA == c("ANG MO KIO","PUNGGOL","HOUGANG", "WOODLANDS","BEDOK"),]
flow2$Frequency_sqrt <- sapply(flow2$Frequency, function(x) sqrt(x))
print(flow2)

## # A tibble: 25 x 4
##    from_PA    to_PA      Frequency Frequency_sqrt
##    <chr>      <chr>          <int>          <dbl>
##  1 ANG MO KIO BEDOK           1800          42.4 
##  2 ANG MO KIO CLEMENTI          40           6.32
##  3 ANG MO KIO MANDAI            40           6.32
##  4 ANG MO KIO OUTRAM           148          12.2 
##  5 ANG MO KIO ROCHOR           476          21.8 
##  6 ANG MO KIO TANGLIN          218          14.8 
##  7 BEDOK      ANG MO KIO       660          25.7 
##  8 BEDOK      CHANGI         30520         175.  
##  9 BEDOK      KALLANG         2658          51.6 
## 10 BEDOK      OUTRAM          2408          49.1 
## # ... with 15 more rows

# flow2$from_PA <- sapply(flow2$from_PA, function(x) paste('from_', x))
# flow2$to_PA <- sapply(flow2$to_PA, function(x) paste('to_', x))

Plot Interactive Node map using `NetworkD3`

name_vec <- c(unique(flow2$from_PA), unique(flow2$to_PA))
#name_vec_from <-  lapply(name_vec, function(x) paste('to_', x))
#name_vec <- c(unique(name_vec))
#print(name_vec)

nodes <- data.frame(name = name_vec, id = 0:22)
#nodes_from$name <-  sapply(nodes$name, function(x) paste('from_', x))
#nodes_to$name  <-  sapply(nodes$name, function(x) paste('from_', x))
#nodes_from_to <- rbind(nodes_from, nodes_to)
#
class(flow2)

## [1] "tbl_df"     "tbl"        "data.frame"

print(flow2)

## # A tibble: 25 x 4
##    from_PA    to_PA      Frequency Frequency_sqrt
##    <chr>      <chr>          <int>          <dbl>
##  1 ANG MO KIO BEDOK           1800          42.4 
##  2 ANG MO KIO CLEMENTI          40           6.32
##  3 ANG MO KIO MANDAI            40           6.32
##  4 ANG MO KIO OUTRAM           148          12.2 
##  5 ANG MO KIO ROCHOR           476          21.8 
##  6 ANG MO KIO TANGLIN          218          14.8 
##  7 BEDOK      ANG MO KIO       660          25.7 
##  8 BEDOK      CHANGI         30520         175.  
##  9 BEDOK      KALLANG         2658          51.6 
## 10 BEDOK      OUTRAM          2408          49.1 
## # ... with 15 more rows

#print(nodes_from_to)


print(class(flow2))

## [1] "tbl_df"     "tbl"        "data.frame"

# links <- flow2 %>%
#   left_join(nodes, by = c('from_PA' = 'name'))  %>%
#   rename(from_PA = id) %>%
#   left_join(nodes, by = c('to_PA' = 'name')) %>%
#   rename(to_PA = id)

links <- flow2 %>%
  left_join(nodes,flow2, by=c('from_PA'='name')) %>%
  rename(from_PA_id = id) %>%
  left_join(nodes,flow2, by=c('to_PA'='name')) %>%
  rename(to_PA_id = id)


forcenetwork <- forceNetwork(Links = links, Nodes = nodes, Source = 'from_PA_id', Target = 'to_PA_id', 
             Value = 'Frequency_sqrt', NodeID = 'name', Group = 'id', zoom = TRUE)
forcenetwork <- htmlwidgets::prependContent(forcenetwork,htmltools::tags$h1("Interactive Node flow Origin to Destination by PA"))
forcenetwork

Interactive Node flow Origin to Destination by PA

Analysis on the Node Interactive Chart

The thickest edge (out of the points i selected) shows the frequency is highest between Changi and Bedok, while the thinest edge seems to be between Bedok and Woodlands. Since this not in R shiny, i can only hard code the nodes i want to visualise. We can however, select the nodes in R Shiny.

Plot Interactive Sankey map using `NetworkD3`

sankey <- sankeyNetwork(Links = links, Nodes = nodes, Source = 'from_PA_id', Target = 'to_PA_id', 
              Value = 'Frequency_sqrt', NodeID = 'name', fontSize = 16)
sankey <- htmlwidgets::prependContent(sankey, htmltools::tags$h1("Interactive Sankey flow Origin to Destination by PA"))
sankey

Interactive Sankey flow Origin to Destination by PA

timeSeries <- data %>%
  select('DAY_TYPE','TIME_PER_HOUR','Planning_Area_Origin','Planning_Area_Destination','BusStopCode_x','BusStopCode_y','TOTAL_TRIPS','Latitude_Origin','Longitude_Origin','Latitude_Destination','Longitude_Destination') %>%
  filter(Planning_Area_Origin %in% c('BEDOK','ANG MO KIO','HOUGANG','WOODLANDS')) %>%
  filter(Planning_Area_Destination %in% c('YISHUN','ORCHARD','BUKIT TIMAH','CHANGI','SENGKANG')) %>%
  #unite(from_to, BusStopCode_x,BusStopCode_y, sep = "_", remove=FALSE) %>%
  group_by(Planning_Area_Origin,Planning_Area_Destination,TIME_PER_HOUR) %>% 
  summarise(Frequency = sum(TOTAL_TRIPS))%>%
  arrange(TIME_PER_HOUR)
  #separate(from_to, c("from", "to"))

Analysis on the Sankey Interactive plot

As expected, thickest line is between Changi to Bedok, while thinest line is between Queenstown to Hougang, where line is proportional to frequency of passengers.

Static plot for timeseries

#timeSeries= timeSeries[timeSeries$Planning_Area_Origin == "ANG MO KIO",]
p <- ggplot(
  timeSeries,
  aes(TIME_PER_HOUR, Frequency, group = Planning_Area_Destination, color = factor(Planning_Area_Destination))
  ) +
  geom_line() +
  scale_y_continuous(trans='log10')+
  scale_color_viridis_d() +
  labs(x = "Time of the day (hr)", y = "Average Frequency across the month") +
  theme(legend.position = "top")
p

## Analysis for Static plot for Frequency against time of day This is a plot of frequency of passengers alighting. ( did not plot from where) Out of the PA we selected, the most number of passengers in Singapore is at Sengkang, and at 11 am the least number of passengers is at Bukit Timah.

Animated plot for timeseries using `gganimate`

p + 
  facet_wrap(~Planning_Area_Origin) +
  geom_point() +
  transition_reveal(TIME_PER_HOUR)

## Analysis on the animated time series analysis We see that each facet is based on the origin location, and each line is the destination location. For the 10000 records ( we filtered first 10000 records in the Jan 2020 data), we see that thoroughout the day the travel from Hougang to Sengkang is the highest. This is not surprising as Sengkang is an interchange point. The travel from Orchard to Bedok increases sharply from 7am and stays quite stagnant throughout the day.

Reflection

Animated graph allows for users to see trend more easily as all the items are shown only at every time step. It allows users to focus at each time frame/ each filter, reduce the problem of inconsistency. One example of inconsistency is when we take average, it may seem surprisingly high when in fact it could be due to anomaly rather than being a really high value. Anomalies could happen and time series data can allow users to see at which time point there is anomaly. For instance, going to work hours may reflect surprisingly high volume at interchange, while very low volume off peak hours.
Interactive charts allow users to select and highlight area of interest and allow users to focus on parts of the graph rather than having too cluttered graph which makes it hard to read.
It increases user experience. Interactive graph like the ones from NetworkD3 hides away details when not hovered over, and only displays when we hover over, showing a cleaner display yet containing much more information
Interactive graph allows users to pose more questions that are answered by much less visual – plotting one interactive plot can answer many questions by switching selections/filters and mixing selections/filters based on user’s interest.

References

https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group https://tidyr.tidyverse.org/reference/separate.html https://d4tagirl.com/2017/05/how-to-plot-animated-maps-with-gganimate https://spatial.ly/2015/03/mapping-flows/ http://personal.tcu.edu/kylewalker/interactive-flow-visualization-in-r.html https://medium.com/@mueller.johannes.j/use-r-and-gganimate-to-make-an-animated-map-of-european-students-and-their-year-abroad-517ad75dca06

DataVizMO9

Chan Jia Yi

3/27/2020 (updated : 2020-03-29)

1 Purpose and components + Ways to add animation / interactivity

2 Creating Visuals

Overview: Importing packages

Import my dataset

Plot Static Flow map by Dominance using ‘FLOWS’ package

Explanation of geographic flows: issues and methods

Data Preparation for my FLOWS graph

Number of commuters in Jan 2020 between any 2 busstops

Find the links where flow = 5000

Find the links where flow = 10000

Get Geospatial data of Singapore

Flow for DOMINANT, intermediate and Dominated busstops

Analysis of DOM flow chart

Data Preparation for Figures 2 to 5

Plot Animated network graph for busstops using `ggplot` and `gganimate`

Now for plotting with ggplot2.This first step removes the axes in the resulting plot.

Creating the geometric layers

using `gganimate` to animate the graph via these filters as shown:

Analysis on the Animated Network chart

Plot interactive maps using `NetworkD3`

Process the data

To get number of nodes, we need to know number of distinct PA

Group by Planning Area

Plot Interactive Node map using `NetworkD3`

Interactive Node flow Origin to Destination by PA

Analysis on the Node Interactive Chart

Plot Interactive Sankey map using `NetworkD3`

Interactive Sankey flow Origin to Destination by PA

Analysis on the Sankey Interactive plot

Static plot for timeseries

Animated plot for timeseries using `gganimate`

Reflection

References

DataVizMO9

Chan Jia Yi

3/27/2020 (updated : 2020-03-29)

1 Purpose and components + Ways to add animation / interactivity

2 Creating Visuals

Overview: Importing packages

Import my dataset

Plot Static Flow map by Dominance using ‘FLOWS’ package

Explanation of geographic flows: issues and methods

Data Preparation for my FLOWS graph

Number of commuters in Jan 2020 between any 2 busstops

Find the links where flow = 5000

Find the links where flow = 10000

Get Geospatial data of Singapore

Flow for DOMINANT, intermediate and Dominated busstops

Analysis of DOM flow chart

Data Preparation for Figures 2 to 5

Plot Animated network graph for busstops using ggplot and gganimate

Now for plotting with ggplot2.This first step removes the axes in the resulting plot.

Creating the geometric layers

using gganimate to animate the graph via these filters as shown:

Analysis on the Animated Network chart

Plot interactive maps using NetworkD3

Process the data

To get number of nodes, we need to know number of distinct PA

Group by Planning Area

Plot Interactive Node map using NetworkD3

Interactive Node flow Origin to Destination by PA

Analysis on the Node Interactive Chart

Plot Interactive Sankey map using NetworkD3

Interactive Sankey flow Origin to Destination by PA

Analysis on the Sankey Interactive plot

Static plot for timeseries

Animated plot for timeseries using gganimate

Reflection

References

Plot Animated network graph for busstops using `ggplot` and `gganimate`

using `gganimate` to animate the graph via these filters as shown:

Plot interactive maps using `NetworkD3`

Plot Interactive Node map using `NetworkD3`

Plot Interactive Sankey map using `NetworkD3`

Animated plot for timeseries using `gganimate`