RPubs Link: https://rpubs.com/HarpreetK/612095

Overview

We are going to conduct a use-case to demonstrate the potential contribution of geospatial analytics in R to integrate, analyse and communicate the analysis results by using open data provided by different government agencies. The specific objectives of the study are as follow:

  1. Calibrating a simple linear regression to reveal the relation between public bus commuters’ flows (i.e. tap-in or tap-out) data and residential population at the planning sub-zone level.

  2. Performing spatial autocorrelation analysis on the residual of the regression model to test if the model conforms to the randomization assumption.

  3. Performing localized geospatial statistics analysis by using commuters’ tap-in and tap-out data to identify geographical clustering.

The data used in this assignment are as follow:

  1. Passenger Volume by BusStops (taken from LTA)

  2. Residental Population Data by Subzones (taken from Singstat)

  3. Location of Bus Stops (taken from LTA)

  4. Planning Subzone Levels (taken from URA)

Objective 1: Calibrating a simple linear regression to reveal the relation between public bus commuters’ flows (i.e. tap-in or tap-out) data and residential population at theplanning sub-zone level.

Getting Started

Installing & Loading Packages

packages = c('rgdal', 'sf', 'spdep', 'tmap', 'tidyverse', 'rgeos')

for (p in packages) {
  if (!require(p, character.only = T)) {
    install.packages(p)
  }
}

library(p, character.only = T)

Importing Data

passengervol <- read_csv("data/aspatial/passenger volume by busstop.csv")
## Parsed with column specification:
## cols(
##   YEAR_MONTH = col_character(),
##   DAY_TYPE = col_character(),
##   TIME_PER_HOUR = col_double(),
##   PT_TYPE = col_character(),
##   PT_CODE = col_character(),
##   TOTAL_TAP_IN_VOLUME = col_double(),
##   TOTAL_TAP_OUT_VOLUME = col_double()
## )
popdata <- read_csv("data/aspatial/respopagesextod2011to2019.csv")
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )
subzones <- readOGR(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## OGR data source with driver: ESRI Shapefile 
## Source: "F:\IS415\IS415_Take-home_Ex01\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields
busstops <- readOGR(dsn = "data/geospatial", 
                    layer = "BusStop")
## OGR data source with driver: ESRI Shapefile 
## Source: "F:\IS415\IS415_Take-home_Ex01\data\geospatial", layer: "BusStop"
## with 5040 features
## It has 3 fields

Cleaning up and Adjusting Data

Getting population by Subzone in 2019

(Filtering for 2019 population data in each subzone.)

Pop_By_SZ_2019 <- popdata %>%
  filter(Time == 2019) %>%
  mutate_at(.vars = vars(PA, SZ), .funs = funs(toupper)) %>%
  select(`PA`, `SZ`,`Pop`) %>%
  group_by(`SZ`) %>%
  summarise(TOTALPOP = sum(`Pop`))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

Converting busstops and subzones to sf dataframes

subzones_sf <- st_as_sf(subzones, coords = c("XCOORD", "YCOORD"),
                       crs= 3414) 

busstops_sf <- st_as_sf(busstops, coords = c("XCOORD", "YCOORD"),
                       crs= 3414) 

Linking Busstops to Subzones

(Finding out the busstops for each subzone.)

Bus_Stops_At_SZ <- st_join(subzones_sf, busstops_sf)

Linking Tap-In and Tap-Out Data to Subzones

(Finding out the tapin/out volume at busstops and the subzones for the busstops.)

Tap_Vol_At_SZ <- inner_join(Bus_Stops_At_SZ, passengervol, by = c("BUS_STOP_N" = "PT_CODE"))
## Warning: Column `BUS_STOP_N`/`PT_CODE` joining factor and character vector,
## coercing into character vector

Grouping the tapin and tapout data by Subzones

(Finding the total vol of tapin/tapout for each subzone.)

Total_Tap_Per_SZ <- Tap_Vol_At_SZ %>%
  group_by(`SUBZONE_N`) %>%
  summarise(TOTAL_TAP_IN = sum(`TOTAL_TAP_IN_VOLUME`),
            TOTAL_TAP_OUT = sum(`TOTAL_TAP_OUT_VOLUME`))

Grouping Pop and TapVolData by Subzones

(Finding the population and vol of tapin/out for each subzone.)

TapVol_PopCount_SZ <- inner_join(Pop_By_SZ_2019, Total_Tap_Per_SZ, by = c("SZ"="SUBZONE_N"))
## Warning: Column `SZ`/`SUBZONE_N` joining character vector and factor, coercing
## into character vector

Simple Regression Model for Tap-In Data and Residential Population at the Subzone Level

LM_Tap_In <- lm(TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
summary(LM_Tap_In)
## 
## Call:
## lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -863762 -123199  -63290   34580 1950650 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.245e+05  2.107e+04   5.907 9.34e-09 ***
## TOTALPOP    1.924e+01  9.370e-01  20.535  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 297800 on 303 degrees of freedom
## Multiple R-squared:  0.5819, Adjusted R-squared:  0.5805 
## F-statistic: 421.7 on 1 and 303 DF,  p-value: < 2.2e-16
LM_Tap_In
## 
## Call:
## lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
## 
## Coefficients:
## (Intercept)     TOTALPOP  
##   124455.17        19.24

From the output above:

–> The estimated regression line equation can be written as follow: VOL_OF_TAP_IN = (19.24*POPULATION) + 124455.17

–> The intercept is 124455.17. It can be interpreted as the predicted number of tap-ins for a subzone with 0 people.

–> The regression coefficient for the variable population, also known as the slope, is 19.24. For every 1000 people, the VOL_OF_TAP_IN is estimated to be (19.24*1000 + 124455.17) = 12474.41

Visualising Regression Line For LM Model (Tap-In)

ggplot(TapVol_PopCount_SZ, aes(TOTALPOP, TOTAL_TAP_IN)) +
  geom_point() +
  stat_smooth(method = lm)
## `geom_smooth()` using formula 'y ~ x'

There appears to be a positive linear relationship between Tap-In Data and Residential Population at the Subzone Level.

Simple Regression Model for Tap-Out Data and Residential Population at the Subzone Level

LM_Tap_Out <- lm(TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
summary(LM_Tap_Out)
## 
## Call:
## lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -823156 -122534  -58009   31081 1644315 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.249e+05  2.040e+04   6.125 2.81e-09 ***
## TOTALPOP    1.916e+01  9.072e-01  21.114  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 288300 on 303 degrees of freedom
## Multiple R-squared:  0.5954, Adjusted R-squared:  0.594 
## F-statistic: 445.8 on 1 and 303 DF,  p-value: < 2.2e-16
LM_Tap_Out
## 
## Call:
## lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
## 
## Coefficients:
## (Intercept)     TOTALPOP  
##   124936.62        19.16

From the output above:

–> The estimated regression line equation can be written as follow: vOL_OF_TAP_OUT = (19.16*POPULATION) + 124936.62

–> The intercept is 124936.2. It can be interpreted as the predicted number of tap-outs for a subzone with 0 people.

–> The regression coefficient for the variable population, also known as the slope, is 19.16. For every 1000 people, the VOL_OF_TAP_IN is estimated to be (19.16*1000 + 124936.62) = 144096.62

Visualising Regression Line For LM Model (Tap-Out)

ggplot(TapVol_PopCount_SZ, aes(TOTALPOP, TOTAL_TAP_OUT)) +
  geom_point() +
  stat_smooth(method = lm)
## `geom_smooth()` using formula 'y ~ x'

There appears to be a positive linear relationship between Tap-Out Data and Residential Population at the Subzone Level.

Converting tibble to sf

DF_TapVol_PopCount_SZ <- as.data.frame(TapVol_PopCount_SZ)

SF_TapVol_PopCount_SZ <- st_as_sf(DF_TapVol_PopCount_SZ)
class(SF_TapVol_PopCount_SZ)
## [1] "sf"         "data.frame"

Objective 2: Performing spatial autocorrelation analysis on the residual of the regression model to test if the model conforms to the randomization assumption.

Obtaining Residuals

(Obtaining residuals for tapin/out and adding them to the dataframe)

#TAP IN RESIDUALS
SF_TapVol_PopCount_SZ$Res_Tap_In <- residuals(LM_Tap_In)

#TAP OUT RESIDUALS
SF_TapVol_PopCount_SZ$Res_Tap_Out <- residuals(LM_Tap_Out)

Modelling Residuals

Residual Map for Tap In Residuals

qtm(SF_TapVol_PopCount_SZ, "Res_Tap_In")
## Warning: The shape SF_TapVol_PopCount_SZ is invalid. See sf::st_is_valid
## Variable(s) "Res_Tap_In" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Null Hypothesis: The residuals for the simple regression between tap-in data and residential population are randomly distributed by subzones.

Alternative Hypothesis: The residuals for the simple regression between tap-in data and residential population is randomly distributed by subzones.

Confidence Interval: 95%

Residual Map for Tap Out Residuals

qtm(SF_TapVol_PopCount_SZ, "Res_Tap_Out")
## Warning: The shape SF_TapVol_PopCount_SZ is invalid. See sf::st_is_valid
## Variable(s) "Res_Tap_Out" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Null Hypothesis: The residuals for the simple regression between tap-out data and residential population are randomly distributed by subzones.

Alternative Hypothesis: The residuals for the simple regression between tap-out data and residential population are randomly distributed by subzones.

Confidence Interval: 95%

Splitting Data Sets and Converting them to SpatialPolygonDataFrame

(Each SpatialPolygonDataFrame will show subzone,geometry and tapin/out data only.)

SF_TAP_IN <- SF_TapVol_PopCount_SZ %>%
  select(`SZ`, `Res_Tap_In`, `geometry`)

SP_TAP_IN <- as_Spatial(SF_TAP_IN)
  
SF_TAP_OUT <- SF_TapVol_PopCount_SZ %>%
  select(`SZ`, `Res_Tap_Out`, `geometry`)

SP_TAP_OUT <- as_Spatial(SF_TAP_OUT)

Modelling Spatial Neighbours (Residuals for Tap In Data)

Creating QUEENS Contiguity Based Neighbours

wm_q_tapin <- poly2nb(SP_TAP_IN, queen = TRUE)
summary(wm_q_tapin)
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links

Plotting QUEENS Contiguity Based Neighbours

plot(SP_TAP_IN, border = "lightgrey")
plot(wm_q_tapin, coordinates(SP_TAP_IN), pch = 19, cex = 0.6, add = TRUE, col= "red")

Computing Distanced Based Neighbours

coord_tapin <- coordinates(SP_TAP_IN)
k1_tapin <- knn2nb(knearneigh(coord_tapin))
k1dists_tapin <- unlist(nbdists(k1_tapin, coord_tapin))
summary(k1dists_tapin)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   182.5   616.7   891.7   941.4  1169.9  5403.9

Computing Fixed Distance Weight Matrix

wm_tapin_d5500 <- dnearneigh(coord_tapin, 0, 5500)
wm_tapin_d5500
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574

Plotting Fixed Distance Weight Matrix

par(mfrow=c(1,2))
plot(SP_TAP_IN, border="lightgrey")
plot(k1_tapin, coord_tapin, add=TRUE, col="red", length=0.08, main="1st nearest neighbours")
plot(SP_TAP_IN, border="lightgrey")
plot(wm_tapin_d5500, coord_tapin, add=TRUE, pch = 19, cex = 0.5, main="Distance link")

Row-standaradised Weights Matrix

Queens

rswm_q_tapin <- nb2listw(wm_q_tapin, style="W", zero.policy = TRUE)
rswm_q_tapin
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0      S1       S2
## W 305 93025 305 106.114 1253.543

Fixed Distance Weight Matrix

dist_tapin <- nbdists(wm_tapin_d5500, coord_tapin)
ids_tapin <- lapply(dist_tapin, function(x) 1/(x))

rswm_tapin <- nb2listw(wm_tapin_d5500, glist = ids_tapin, style = "B", zero.policy = TRUE)
rswm_tapin
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574 
## 
## Weights style: B 
## Weights constants summary:
##     n    nn       S0          S1        S2
## B 305 93025 6.576281 0.008236356 0.7998781

Modelling Spatial Neighbours (Residuals for Tap Out Data)

Creating QUEENS Contiguity Based Neighbours

wm_q_tapout <- poly2nb(SP_TAP_OUT, queen = TRUE)
summary(wm_q_tapout)
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links

Plotting QUEENS Contiguity Based Neighbours

plot(SP_TAP_OUT, border = "lightgrey")
plot(wm_q_tapout, coordinates(SP_TAP_OUT), pch = 19, cex = 0.6, add = TRUE, col= "red")

Computing Distanced Based Neighbours

coord_tapout <- coordinates(SP_TAP_OUT)
k1_tapout <- knn2nb(knearneigh(coord_tapout))
k1dists_tapout <- unlist(nbdists(k1_tapout, coord_tapout))
summary(k1dists_tapout)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   182.5   616.7   891.7   941.4  1169.9  5403.9

Computing Fixed Distance Weight Matrix

wm_tapout_d5500 <- dnearneigh(coord_tapout, 0, 5500)
wm_tapout_d5500
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574

Plotting Fixed Distance Weight Matrix

par(mfrow=c(1,2))
plot(SP_TAP_OUT, border="lightgrey")
plot(k1_tapout, coord_tapout, add=TRUE, col="red", length=0.08, main="1st nearest neighbours")
plot(SP_TAP_OUT, border="lightgrey")
plot(wm_tapout_d5500, coord_tapout, add=TRUE, pch = 19, cex = 0.5, main="Distance link")

Row-standardised Weights Matrix

Queens

rswm_q_tapout <- nb2listw(wm_q_tapout, style="W", zero.policy = TRUE)
rswm_q_tapout
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0      S1       S2
## W 305 93025 305 106.114 1253.543

Fixed Distance Weight Matrix

dist_tapout <- nbdists(wm_tapout_d5500, coord_tapout)
ids_tapout <- lapply(dist_tapout, function(x) 1/(x))

rswm_tapout <- nb2listw(wm_tapout_d5500, glist = ids_tapout, style = "B", zero.policy = TRUE)
rswm_tapout
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574 
## 
## Weights style: B 
## Weights constants summary:
##     n    nn       S0          S1        S2
## B 305 93025 6.576281 0.008236356 0.7998781

Row-Standardised Fixed Distance Weight Matrix VS Row-Standardised Queens Weight Matrix

–> The Row-Standardised Fixed Distance Weight Matrix will be used for the subsequent calculations in the spatial autocorrelation analysis of the residual.

–> The polygons vary largely in terms of size hence the fixed distance matrix will be better to ensure a consistent scale of analysis.

Computing Global Moran’s I (Tap In Residuals)

lm.morantest(LM_Tap_In, rswm_tapin, alternative = "two.sided")
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
## weights: rswm_tapin
## 
## Moran I statistic standard deviate = 0.14602, p-value = 0.8839
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##    -0.0021633370    -0.0039081224     0.0001427711

Conclusion:

–> p-value = 0.8839 > 0.05 (alpha value at 95% confidence interval)

–> We do not reject the null hypothesis at 95% confidence interval

–> The residual for the simple regression between tap-in data and residential population is randomly distributed by subzones.

–> There is negative autocorrelation therefore there are no signs of clustering.

Computing Monte Carlo Moran’s I (Tap In Residuals)

set.seed(1234)

mc_tapin <- moran.mc(SP_TAP_IN$Res_Tap_In, listw = rswm_tapin, nsim = 999, zero.policy = TRUE, na.action=na.omit)

mc_tapin
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  SP_TAP_IN$Res_Tap_In 
## weights: rswm_tapin  
## number of simulations + 1: 1000 
## 
## statistic = -0.0021633, observed rank = 594, p-value = 0.406
## alternative hypothesis: greater

Conclusion:

–> p-value = 0.406 > 0.05 (alpha value at 95% confidence interval)

–> We do not reject the null hypothesis at 95% confidence interval

–> The residual for the simple regression between tap-in data and residential population is randomly distributed by subzones.

–> The results obtained from Monte Carlo Moran’s I are consistent with the results from Moran’s I done above.

Computing Global Moran’s I (Tap Out Residuals)

lm.morantest(LM_Tap_Out, rswm_tapout, alternative = "two.sided")
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data =
## TapVol_PopCount_SZ)
## weights: rswm_tapout
## 
## Moran I statistic standard deviate = 0.34595, p-value = 0.7294
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     0.0002255333    -0.0039081224     0.0001427711

Conclusion:

–> p-value = 0.7294 > 0.05 (alpha value at 95% confidence interval)

–> We do not reject the null hypothesis at 95% confidence interval

–> The residual for the simple regression between tap-out data and residential population is randomly distributed by subzones.

–> There is positive autocorrelation therefore there are signs of clustering.

Computing Monte Carlo Moran’s I (Tap Out Residuals)

set.seed(1234)

mc_tapout <- moran.mc(SP_TAP_OUT$Res_Tap_Out, listw = rswm_tapout, nsim = 999, zero.policy = TRUE, na.action=na.omit)

mc_tapout
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  SP_TAP_OUT$Res_Tap_Out 
## weights: rswm_tapout  
## number of simulations + 1: 1000 
## 
## statistic = 0.00022553, observed rank = 670, p-value = 0.33
## alternative hypothesis: greater

Conclusion:

–> p-value = 0.33 > 0.05 (alpha value at 95% confidence interval)

–> We do not reject the null hypothesis at 95% confidence interval

–> The residual for the simple regression between tap-out data and residential population is randomly distributed by subzones.

–> The results obtained from Monte Carlo Moran’s I are consistent with the results from Moran’s I done above.

Spatial Correlogram

Compute Moran’s I correlogram and plot (tap In Residuals)

MI_corr_tapin <- sp.correlogram(wm_tapin_d5500, SP_TAP_IN$Res_Tap_In, order=6, method="I", style="B", zero.policy = TRUE)
plot(MI_corr_tapin)

Compute Moran’s I correlogram and plot (Tap Out Residuals)

MI_corr_tapout <- sp.correlogram(wm_tapout_d5500, SP_TAP_OUT$Res_Tap_Out, order=6, method="I", style="B", zero.policy = TRUE)
plot(MI_corr_tapout)

Objective 3: Performing localized geospatial statistics analysis by using commuters’ tap-in and tap-out data to identify geographical clustering.

Splitting Data Sets and Converting them to SpatialPolygonDataFrame

SF_LA_TAPIN <- SF_TapVol_PopCount_SZ %>%
  select(`SZ`, `TOTAL_TAP_IN`, `geometry`)

SP_LA_TAPIN <- as_Spatial(SF_LA_TAPIN)
  
SF_LA_TAPOUT <- SF_TapVol_PopCount_SZ %>%
  select(`SZ`, `TOTAL_TAP_OUT`, `geometry`)

SP_LA_TAPOUT <- as_Spatial(SF_LA_TAPOUT)

Cluster and Outlier Analysis on Tap In/ Tap Out Vol

Deriving Spatial Weight Matrix (Tap In Vol)

Creating QUEEN Contiguity Based Neighbours

tapin_wm_q <- poly2nb(SP_LA_TAPIN, queen = TRUE)
summary(tapin_wm_q)
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links

Row Standardised QUEENS Weight Matrix

tapin_rswm_q <- nb2listw(tapin_wm_q, zero.policy = TRUE)
summary(tapin_rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0      S1       S2
## W 305 93025 305 106.114 1253.543

Deriving Spatial Weight Matrix (Tap Out Vol)

Creating QUEEN Contiguity Based Neighbours

tapout_wm_q <- poly2nb(SP_LA_TAPOUT, queen = TRUE)
summary(tapout_wm_q)
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links

Row Standardised QUEENS Weight Matrix

tapout_rswm_q <- nb2listw(tapout_wm_q, zero.policy = TRUE)
summary(tapout_rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 1844 
## Percentage nonzero weights: 1.982263 
## Average number of links: 6.045902 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  6 11 27 79 76 50 36 13  2  2  1  1  1 
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0      S1       S2
## W 305 93025 305 106.114 1253.543

Computing Local Moran’s I (TapIn)

Fips_TapIn <- order(SP_LA_TAPIN$SZ)
LocalMI_TapIn <- localmoran(SP_LA_TAPIN$TOTAL_TAP_IN, tapin_rswm_q)
head(LocalMI_TapIn)
##           Ii         E.Ii    Var.Ii       Z.Ii  Pr(z > 0)
## 1  0.1002384 -0.003289474 0.1555725  0.2624769 0.39647691
## 2  0.4810975 -0.003289474 0.4726349  0.7045787 0.24053622
## 3 -0.1003840 -0.003289474 0.1159397 -0.2851534 0.61223668
## 4  0.0873004 -0.003289474 0.1555725  0.2296747 0.40917227
## 5  0.4765620 -0.003289474 0.1027288  1.4971342 0.06717917
## 6 -0.2819368 -0.003289474 0.1555725 -0.7064613 0.76004934

Computing Local Moran’s I (TapOut)

Fips_TapOut <- order(SP_LA_TAPOUT$SZ)
LocalMI_TapOut <- localmoran(SP_LA_TAPOUT$TOTAL_TAP_OUT, tapout_rswm_q)
head(LocalMI_TapOut)
##            Ii         E.Ii    Var.Ii       Z.Ii  Pr(z > 0)
## 1  0.07971463 -0.003289474 0.1557926  0.2102936 0.41671926
## 2  0.45881805 -0.003289474 0.4733132  0.6716898 0.25089061
## 3 -0.19489338 -0.003289474 0.1161026 -0.5623199 0.71305096
## 4  0.10136942 -0.003289474 0.1557926  0.2651568 0.39544436
## 5  0.42402974 -0.003289474 0.1028725  1.3323021 0.09138047
## 6 -0.22286917 -0.003289474 0.1557926 -0.5563124 0.71100134

Mapping both local Moran’s I values and p-values (TapIn)

SP_LA_TAPIN.localMI_TapIn <- cbind(SP_LA_TAPIN, LocalMI_TapIn)

SP_LA_TAPOUT.localMI_TapOut <- cbind(SP_LA_TAPOUT, LocalMI_TapOut)
LocalMI_TapIn.map <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
  tm_fill(col = "Ii", 
          style = "pretty", 
          title = "Local Moran Stats (TapIn) ") +
  tm_borders(alpha = 0.5)

pvalue_TapIn.map <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
  tm_fill(col = "Pr.z...0.", 
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette="-Blues", 
          title = "local Moran's I p-values (TapIn) ") +
  tm_borders(alpha = 0.5)

tmap_arrange(LocalMI_TapIn.map, pvalue_TapIn.map, asp=1, ncol=2)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid

From the maps:

–> The areas with high local Moran’s I statistics indicate that the subzone has neighboring subzones with similarly high or low attribute values; this subzone is part of a cluster.

–> The areas with negative local Moran’s I statistics indicate that the subzones have neighboring subzones with dissimilar values; the subzone is an outlier

–> There appears to be clustering towards the east of Singapore.

–> The cluster towards the east is statistically significant as the p-value for those subzones are very small.

Mapping both local Moran’s I values and p-values (TapOut)

LocalMI_TapOut.map <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
  tm_fill(col = "Ii", 
          style = "pretty", 
          title = "Local Moran Stats (TapOut) ") +
  tm_borders(alpha = 0.5)

pvalue_TapOut.map <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
  tm_fill(col = "Pr.z...0.", 
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette="-Blues", 
          title = "local Moran's I p-values (TapOut) ") +
  tm_borders(alpha = 0.5)

tmap_arrange(LocalMI_TapOut.map, pvalue_TapOut.map, asp=1, ncol=2)
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid

From the maps:

–> The areas with high local Moran’s I statistics indicate that the subzone has neighboring subzones with similarly high or low attribute values; this subzone is part of a cluster.

–> The areas with negative local Moran’s I statistics indicate that the subzones have neighboring subzones with dissimilar values; the subzone is an outlier

–> There appears to be clustering towards the east of Singapore.

–> The cluster towards the east is statistically significant as the p-value for those subzones are very small.

Creating LISA Cluster Maps

Plotting Moran Scatterplot With Standardised Variable (Tap In)

SP_LA_TAPIN$Z.TAPIN <- scale(SP_LA_TAPIN$TOTAL_TAP_IN) %>% as.vector

MPlot_TapIn <- moran.plot(SP_LA_TAPIN$Z.TAPIN, tapin_rswm_q, labels = as.character(SP_LA_TAPIN$SZ), xlab = "z-TAP_IN_VOLUME" , ylab = "Spatially Lag z-TAP_IN_VOLUME")

From the Moran Scatterplot:

–> Tap In Volume for subzones such as Xilin and Toa Payoh Central represent outliers. This is because they are located in the first and fourth quadrant respectively.

–> Tap In Volume for subzones such as Pasir Ris central, Bedok Reservoir and Kembangan represent clusters. This is because they are located in the second quadrant.

Plotting Moran Scatterplot With Standardised Variable (Tap Out)

SP_LA_TAPOUT$Z.TAPOUT <- scale(SP_LA_TAPOUT$TOTAL_TAP_OUT) %>% as.vector

MPlot_TapOut <- moran.plot(SP_LA_TAPOUT$Z.TAPOUT, tapout_rswm_q, labels = as.character(SP_LA_TAPOUT$SZ), xlab = "z-TAP_OUT_VOLUME" , ylab = "Spatially Lag z-TAP_OUT_VOLUME")

From the Moran Scatterplot:

–> Tap In Volume for subzones such as Xilin and Toa Payoh Central represent outliers. This is because they are located in the first and fourth quadrant respectively.

–> Tap In Volume for subzones such as Bedok South, Pasir Ris central, Bedok Reservoir and Kembangan represent clusters. This is because they are located in the second quadrant.

Preparing and Building LISA Cluster Map (TapIn)

–> Preparing

#Step 1: 
quadrant_tapin <- vector(mode="numeric",length=nrow(LocalMI_TapIn))
#Step 2: Variable of interest around its mean
DV_tapin <- SP_LA_TAPIN$TOTAL_TAP_IN - mean(SP_LA_TAPIN$TOTAL_TAP_IN)     
#step 3: Centering Local Moran's around mean
C_mI_tapin <- LocalMI_TapIn[,1] - mean(LocalMI_TapIn[,1])    
#Step 4: Setting Statistical Intelligence Level for local Moran
signif_tapin <- 0.05 
#Step 5: Defining high-high, low-low, low-high and high-low categories
quadrant_tapin[DV_tapin >0 & C_mI_tapin>0] <- 4      
quadrant_tapin[DV_tapin <0 & C_mI_tapin<0] <- 1      
quadrant_tapin[DV_tapin <0 & C_mI_tapin>0] <- 2
quadrant_tapin[DV_tapin >0 & C_mI_tapin<0] <- 3
quadrant_tapin[LocalMI_TapIn[,5]>signif_tapin] <- 0

–>Building

SP_LA_TAPIN.localMI_TapIn$quadrant <- quadrant_tapin
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

tm_shape(SP_LA_TAPIN.localMI_TapIn) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapin)))+1], labels = clusters[c(sort(unique(quadrant_tapin)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid

From the LISA Map:

–> The strongly colored regions are therefore those that contribute significantly to a positive local spatial autocorrelation outcome, while the other regions do not contribute significantly to the local spatial autocorrelation outcome.

–> There is a statistically significant cluster of high-high values towards the east.

Preparing and Building LISA Cluster Map (TapOut)

–> Preparing

#Step 1: 
quadrant_tapout <- vector(mode="numeric",length=nrow(LocalMI_TapOut))
#Step 2: Variable of interest around its mean
DV_tapout <- SP_LA_TAPOUT$TOTAL_TAP_OUT - mean(SP_LA_TAPOUT$TOTAL_TAP_OUT)     
#step 3: Centering Local Moran's around mean
C_mI_tapout <- LocalMI_TapOut[,1] - mean(LocalMI_TapOut[,1])    
#Step 4: Setting Statistical Intelligence Level for local Moran
signif_tapout <- 0.05 
#Step 5: Defining high-high, low-low, low-high and high-low categories
quadrant_tapout[DV_tapout >0 & C_mI_tapout>0] <- 4      
quadrant_tapout[DV_tapout <0 & C_mI_tapout<0] <- 1      
quadrant_tapout[DV_tapout <0 & C_mI_tapout>0] <- 2
quadrant_tapout[DV_tapout >0 & C_mI_tapout<0] <- 3
quadrant_tapout[LocalMI_TapOut[,5]>signif_tapout] <- 0

–>Building

SP_LA_TAPOUT.localMI_TapOut$quadrant <- quadrant_tapout
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapout)))+1], labels = clusters[c(sort(unique(quadrant_tapout)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5)
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid

From the LISA Map:

–> The strongly colored regions are therefore those that contribute significantly to a positive local spatial autocorrelation outcome, while the other regions do not contribute significantly to the local spatial autocorrelation outcome.

–> There is a statistically significant cluster of high-high values towards the east.

Comparing LISA maps for Tap In and Tap Out

lisa_tapin <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapin)))+1], labels = clusters[c(sort(unique(quadrant_tapin)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5) +tm_layout("LISA (Tap In)")

lisa_tapout <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapout)))+1], labels = clusters[c(sort(unique(quadrant_tapout)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5) +tm_layout("LISA (Tap Out)")

tmap_arrange(lisa_tapin, lisa_tapout, asp=1,ncol=2)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid

Comparing the two LISA Maps:

–> There are differences in the high-high clusters for Tap In Volume and Tap Out Volume.

–> Tap Out Volume appears to have slightly more high-high areas as compared to Tap In Volume.

–> Tap Out Volume has more clusters in the East as compared to Tap Out Volume.

Hot and Cold Spot Area Analysis

Deriving Distance-Based Weight Matrix (Tap In Vol)

Creating fixed distance proximity matrix

dnb_tapin <- dnearneigh(coordinates(SP_LA_TAPIN), 0, 5500)
dnb_tapin
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574
plot(SP_LA_TAPIN, border = "lightgrey")
plot(dnb_tapin, coordinates(SP_LA_TAPIN), add = TRUE, col = "red")

dnb_lw_tapin <- nb2listw(dnb_tapin, style = "B")
summary(dnb_lw_tapin)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574 
## Link number distribution:
## 
##   1   6   8   9  10  11  13  14  16  18  19  20  21  22  23  24  25  26  27  28 
##   1   1   3   1   1   1   3   1   4   3   2   4   3   5   3   4   3   6   6   4 
##  29  30  31  32  33  34  35  36  37  38  39  41  42  43  44  45  46  47  48  49 
##   7   7   8   5   3   5   8   3   5   1   4   3   5   2   5   3   4   7   3   5 
##  50  51  52  53  54  55  56  58  59  60  61  62  63  64  65  66  67  69  70  71 
##   6   3   3   5   4   2   3   3   4   2   1   3   1   1   2   1   2   1   1   3 
##  72  73  74  75  76  78  79  80  82  83  84  85  86  87  88  89  90  91  92  93 
##   1   2   4   1   1   2   4   3   3   2   2   5   1   7   6   4   1   3   2   3 
##  94  95  96  97  98  99 101 102 103 104 105 107 108 109 110 
##   2   3   2   3   4   4   3   6   3   2   5   2   1   4   1 
## 1 least connected region:
## 275 with 1 link
## 1 most connected region:
## 160 with 110 links
## 
## Weights style: B 
## Weights constants summary:
##     n    nn    S0    S1      S2
## B 305 93025 17036 34072 4827632

Deriving Distance-Based Weight Matrix (Tap Out Vol)

Creating fixed distance proximity matrix

dnb_tapout <- dnearneigh(coordinates(SP_LA_TAPOUT), 0, 5500)
dnb_tapout
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574
plot(SP_LA_TAPOUT, border = "lightgrey")
plot(dnb_tapout, coordinates(SP_LA_TAPOUT), add = TRUE, col = "red")

dnb_lw_tapout <- nb2listw(dnb_tapout, style = "B")
summary(dnb_lw_tapout)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305 
## Number of nonzero links: 17036 
## Percentage nonzero weights: 18.31336 
## Average number of links: 55.85574 
## Link number distribution:
## 
##   1   6   8   9  10  11  13  14  16  18  19  20  21  22  23  24  25  26  27  28 
##   1   1   3   1   1   1   3   1   4   3   2   4   3   5   3   4   3   6   6   4 
##  29  30  31  32  33  34  35  36  37  38  39  41  42  43  44  45  46  47  48  49 
##   7   7   8   5   3   5   8   3   5   1   4   3   5   2   5   3   4   7   3   5 
##  50  51  52  53  54  55  56  58  59  60  61  62  63  64  65  66  67  69  70  71 
##   6   3   3   5   4   2   3   3   4   2   1   3   1   1   2   1   2   1   1   3 
##  72  73  74  75  76  78  79  80  82  83  84  85  86  87  88  89  90  91  92  93 
##   1   2   4   1   1   2   4   3   3   2   2   5   1   7   6   4   1   3   2   3 
##  94  95  96  97  98  99 101 102 103 104 105 107 108 109 110 
##   2   3   2   3   4   4   3   6   3   2   5   2   1   4   1 
## 1 least connected region:
## 275 with 1 link
## 1 most connected region:
## 160 with 110 links
## 
## Weights style: B 
## Weights constants summary:
##     n    nn    S0    S1      S2
## B 305 93025 17036 34072 4827632

###Computing Gi Statistics

Gi Statistics using Fixed Distance (Tap In)

tapin_fips <- order(SP_LA_TAPIN$SZ)
gi.fixed_tapin <- localG(SP_LA_TAPIN$TOTAL_TAP_IN, dnb_lw_tapin)

SP_LA_TAPIN.gi_tapin <- cbind(SP_LA_TAPIN, as.matrix(gi.fixed_tapin))
names(SP_LA_TAPIN.gi_tapin)[4] <- "gstat"

GI Statistics using Fixed Distance (Tap Out)

tapout_fips <- order(SP_LA_TAPOUT$SZ)
gi.fixed_tapout <- localG(SP_LA_TAPOUT$TOTAL_TAP_OUT, dnb_lw_tapout)

SP_LA_TAPOUT.gi_tapout <- cbind(SP_LA_TAPOUT, as.matrix(gi.fixed_tapout))
names(SP_LA_TAPOUT.gi_tapout)[4] <- "gstat"

Visualing Local Gi

Mapping Gi values with fixed distance weights (Tap In)

tm_shape(SP_LA_TAPIN.gi_tapin) +
  tm_fill(col = "gstat", 
          style = "pretty",
          palette="-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5)
## Warning: The shape SP_LA_TAPIN.gi_tapin is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

From the choropleth map above:

–> Subzones shaded in red are hot spot areas and subzones shaded in blue are the cold spot areas. The darkness of the colours represent the intensity of the Gi values.

–> The hot spot areas are on the north, east and west regions of Singapore. The cold spot areas, on the other hand, mainly comprise of subzones located on the southern part of the country.

–> The most intense positive Gi values are found towards the east while the most intense negative Gi values are found towards the south.

Mapping Gi values with fixed distance weights (Tap Out)

tm_shape(SP_LA_TAPOUT.gi_tapout) +
  tm_fill(col = "gstat", 
          style = "pretty",
          palette="-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5)
## Warning: The shape SP_LA_TAPOUT.gi_tapout is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

From the choropleth map above:

–> Subzones shaded in red are hot spot areas and subzones shaded in blue are the cold spot areas. The darkness of the colours represent the intensity of the Gi values.

–> The hot spot areas are on the north, east and west regions of Singapore. The cold spot areas, on the other hand, mainly comprise of subzones located on the southern part of the country.

–> The most intense positive Gi values are found towards the east while the most intense negative Gi values are found towards the south.

Comparing Gi Values for Tap in and Tap Out Volume

Gi_tapin <- tm_shape(SP_LA_TAPIN.gi_tapin) +
  tm_fill(col = "gstat", 
          style = "pretty",
          palette="-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5) +
  tm_layout("Gi Values (Tap In)")

Gi_tapout <- tm_shape(SP_LA_TAPOUT.gi_tapout) +
  tm_fill(col = "gstat", 
          style = "pretty",
          palette="-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5) +
  tm_layout("Gi Values (Tap Out)")

tmap_arrange(Gi_tapin, Gi_tapout, asp=1,ncol=2)
## Warning: The shape SP_LA_TAPIN.gi_tapin is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPOUT.gi_tapout is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Comparing the two maps:

–> There are minor differences between the two maps. For example, Tap Out has more subzones with high positive local Gi values in the East as compared to Tap In. It also has more subzones with high negative local Gi values in the South as compared to Tap In.