RPubs Link: https://rpubs.com/HarpreetK/612095
We are going to conduct a use-case to demonstrate the potential contribution of geospatial analytics in R to integrate, analyse and communicate the analysis results by using open data provided by different government agencies. The specific objectives of the study are as follow:
Calibrating a simple linear regression to reveal the relation between public bus commuters’ flows (i.e. tap-in or tap-out) data and residential population at the planning sub-zone level.
Performing spatial autocorrelation analysis on the residual of the regression model to test if the model conforms to the randomization assumption.
Performing localized geospatial statistics analysis by using commuters’ tap-in and tap-out data to identify geographical clustering.
The data used in this assignment are as follow:
Passenger Volume by BusStops (taken from LTA)
Residental Population Data by Subzones (taken from Singstat)
Location of Bus Stops (taken from LTA)
Planning Subzone Levels (taken from URA)
packages = c('rgdal', 'sf', 'spdep', 'tmap', 'tidyverse', 'rgeos')
for (p in packages) {
if (!require(p, character.only = T)) {
install.packages(p)
}
}
library(p, character.only = T)
passengervol <- read_csv("data/aspatial/passenger volume by busstop.csv")
## Parsed with column specification:
## cols(
## YEAR_MONTH = col_character(),
## DAY_TYPE = col_character(),
## TIME_PER_HOUR = col_double(),
## PT_TYPE = col_character(),
## PT_CODE = col_character(),
## TOTAL_TAP_IN_VOLUME = col_double(),
## TOTAL_TAP_OUT_VOLUME = col_double()
## )
popdata <- read_csv("data/aspatial/respopagesextod2011to2019.csv")
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
subzones <- readOGR(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## OGR data source with driver: ESRI Shapefile
## Source: "F:\IS415\IS415_Take-home_Ex01\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields
busstops <- readOGR(dsn = "data/geospatial",
layer = "BusStop")
## OGR data source with driver: ESRI Shapefile
## Source: "F:\IS415\IS415_Take-home_Ex01\data\geospatial", layer: "BusStop"
## with 5040 features
## It has 3 fields
(Filtering for 2019 population data in each subzone.)
Pop_By_SZ_2019 <- popdata %>%
filter(Time == 2019) %>%
mutate_at(.vars = vars(PA, SZ), .funs = funs(toupper)) %>%
select(`PA`, `SZ`,`Pop`) %>%
group_by(`SZ`) %>%
summarise(TOTALPOP = sum(`Pop`))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
subzones_sf <- st_as_sf(subzones, coords = c("XCOORD", "YCOORD"),
crs= 3414)
busstops_sf <- st_as_sf(busstops, coords = c("XCOORD", "YCOORD"),
crs= 3414)
(Finding out the busstops for each subzone.)
Bus_Stops_At_SZ <- st_join(subzones_sf, busstops_sf)
(Finding out the tapin/out volume at busstops and the subzones for the busstops.)
Tap_Vol_At_SZ <- inner_join(Bus_Stops_At_SZ, passengervol, by = c("BUS_STOP_N" = "PT_CODE"))
## Warning: Column `BUS_STOP_N`/`PT_CODE` joining factor and character vector,
## coercing into character vector
(Finding the total vol of tapin/tapout for each subzone.)
Total_Tap_Per_SZ <- Tap_Vol_At_SZ %>%
group_by(`SUBZONE_N`) %>%
summarise(TOTAL_TAP_IN = sum(`TOTAL_TAP_IN_VOLUME`),
TOTAL_TAP_OUT = sum(`TOTAL_TAP_OUT_VOLUME`))
(Finding the population and vol of tapin/out for each subzone.)
TapVol_PopCount_SZ <- inner_join(Pop_By_SZ_2019, Total_Tap_Per_SZ, by = c("SZ"="SUBZONE_N"))
## Warning: Column `SZ`/`SUBZONE_N` joining character vector and factor, coercing
## into character vector
LM_Tap_In <- lm(TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
summary(LM_Tap_In)
##
## Call:
## lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -863762 -123199 -63290 34580 1950650
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.245e+05 2.107e+04 5.907 9.34e-09 ***
## TOTALPOP 1.924e+01 9.370e-01 20.535 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 297800 on 303 degrees of freedom
## Multiple R-squared: 0.5819, Adjusted R-squared: 0.5805
## F-statistic: 421.7 on 1 and 303 DF, p-value: < 2.2e-16
LM_Tap_In
##
## Call:
## lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
##
## Coefficients:
## (Intercept) TOTALPOP
## 124455.17 19.24
From the output above:
–> The estimated regression line equation can be written as follow: VOL_OF_TAP_IN = (19.24*POPULATION) + 124455.17
–> The intercept is 124455.17. It can be interpreted as the predicted number of tap-ins for a subzone with 0 people.
–> The regression coefficient for the variable population, also known as the slope, is 19.24. For every 1000 people, the VOL_OF_TAP_IN is estimated to be (19.24*1000 + 124455.17) = 12474.41
ggplot(TapVol_PopCount_SZ, aes(TOTALPOP, TOTAL_TAP_IN)) +
geom_point() +
stat_smooth(method = lm)
## `geom_smooth()` using formula 'y ~ x'
There appears to be a positive linear relationship between Tap-In Data and Residential Population at the Subzone Level.
LM_Tap_Out <- lm(TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
summary(LM_Tap_Out)
##
## Call:
## lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -823156 -122534 -58009 31081 1644315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.249e+05 2.040e+04 6.125 2.81e-09 ***
## TOTALPOP 1.916e+01 9.072e-01 21.114 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 288300 on 303 degrees of freedom
## Multiple R-squared: 0.5954, Adjusted R-squared: 0.594
## F-statistic: 445.8 on 1 and 303 DF, p-value: < 2.2e-16
LM_Tap_Out
##
## Call:
## lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data = TapVol_PopCount_SZ)
##
## Coefficients:
## (Intercept) TOTALPOP
## 124936.62 19.16
From the output above:
–> The estimated regression line equation can be written as follow: vOL_OF_TAP_OUT = (19.16*POPULATION) + 124936.62
–> The intercept is 124936.2. It can be interpreted as the predicted number of tap-outs for a subzone with 0 people.
–> The regression coefficient for the variable population, also known as the slope, is 19.16. For every 1000 people, the VOL_OF_TAP_IN is estimated to be (19.16*1000 + 124936.62) = 144096.62
ggplot(TapVol_PopCount_SZ, aes(TOTALPOP, TOTAL_TAP_OUT)) +
geom_point() +
stat_smooth(method = lm)
## `geom_smooth()` using formula 'y ~ x'
There appears to be a positive linear relationship between Tap-Out Data and Residential Population at the Subzone Level.
DF_TapVol_PopCount_SZ <- as.data.frame(TapVol_PopCount_SZ)
SF_TapVol_PopCount_SZ <- st_as_sf(DF_TapVol_PopCount_SZ)
class(SF_TapVol_PopCount_SZ)
## [1] "sf" "data.frame"
(Obtaining residuals for tapin/out and adding them to the dataframe)
#TAP IN RESIDUALS
SF_TapVol_PopCount_SZ$Res_Tap_In <- residuals(LM_Tap_In)
#TAP OUT RESIDUALS
SF_TapVol_PopCount_SZ$Res_Tap_Out <- residuals(LM_Tap_Out)
qtm(SF_TapVol_PopCount_SZ, "Res_Tap_In")
## Warning: The shape SF_TapVol_PopCount_SZ is invalid. See sf::st_is_valid
## Variable(s) "Res_Tap_In" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Null Hypothesis: The residuals for the simple regression between tap-in data and residential population are randomly distributed by subzones.
Alternative Hypothesis: The residuals for the simple regression between tap-in data and residential population is randomly distributed by subzones.
Confidence Interval: 95%
qtm(SF_TapVol_PopCount_SZ, "Res_Tap_Out")
## Warning: The shape SF_TapVol_PopCount_SZ is invalid. See sf::st_is_valid
## Variable(s) "Res_Tap_Out" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Null Hypothesis: The residuals for the simple regression between tap-out data and residential population are randomly distributed by subzones.
Alternative Hypothesis: The residuals for the simple regression between tap-out data and residential population are randomly distributed by subzones.
Confidence Interval: 95%
(Each SpatialPolygonDataFrame will show subzone,geometry and tapin/out data only.)
SF_TAP_IN <- SF_TapVol_PopCount_SZ %>%
select(`SZ`, `Res_Tap_In`, `geometry`)
SP_TAP_IN <- as_Spatial(SF_TAP_IN)
SF_TAP_OUT <- SF_TapVol_PopCount_SZ %>%
select(`SZ`, `Res_Tap_Out`, `geometry`)
SP_TAP_OUT <- as_Spatial(SF_TAP_OUT)
wm_q_tapin <- poly2nb(SP_TAP_IN, queen = TRUE)
summary(wm_q_tapin)
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
plot(SP_TAP_IN, border = "lightgrey")
plot(wm_q_tapin, coordinates(SP_TAP_IN), pch = 19, cex = 0.6, add = TRUE, col= "red")
coord_tapin <- coordinates(SP_TAP_IN)
k1_tapin <- knn2nb(knearneigh(coord_tapin))
k1dists_tapin <- unlist(nbdists(k1_tapin, coord_tapin))
summary(k1dists_tapin)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 182.5 616.7 891.7 941.4 1169.9 5403.9
wm_tapin_d5500 <- dnearneigh(coord_tapin, 0, 5500)
wm_tapin_d5500
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
par(mfrow=c(1,2))
plot(SP_TAP_IN, border="lightgrey")
plot(k1_tapin, coord_tapin, add=TRUE, col="red", length=0.08, main="1st nearest neighbours")
plot(SP_TAP_IN, border="lightgrey")
plot(wm_tapin_d5500, coord_tapin, add=TRUE, pch = 19, cex = 0.5, main="Distance link")
rswm_q_tapin <- nb2listw(wm_q_tapin, style="W", zero.policy = TRUE)
rswm_q_tapin
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 305 93025 305 106.114 1253.543
dist_tapin <- nbdists(wm_tapin_d5500, coord_tapin)
ids_tapin <- lapply(dist_tapin, function(x) 1/(x))
rswm_tapin <- nb2listw(wm_tapin_d5500, glist = ids_tapin, style = "B", zero.policy = TRUE)
rswm_tapin
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 305 93025 6.576281 0.008236356 0.7998781
wm_q_tapout <- poly2nb(SP_TAP_OUT, queen = TRUE)
summary(wm_q_tapout)
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
plot(SP_TAP_OUT, border = "lightgrey")
plot(wm_q_tapout, coordinates(SP_TAP_OUT), pch = 19, cex = 0.6, add = TRUE, col= "red")
coord_tapout <- coordinates(SP_TAP_OUT)
k1_tapout <- knn2nb(knearneigh(coord_tapout))
k1dists_tapout <- unlist(nbdists(k1_tapout, coord_tapout))
summary(k1dists_tapout)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 182.5 616.7 891.7 941.4 1169.9 5403.9
wm_tapout_d5500 <- dnearneigh(coord_tapout, 0, 5500)
wm_tapout_d5500
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
par(mfrow=c(1,2))
plot(SP_TAP_OUT, border="lightgrey")
plot(k1_tapout, coord_tapout, add=TRUE, col="red", length=0.08, main="1st nearest neighbours")
plot(SP_TAP_OUT, border="lightgrey")
plot(wm_tapout_d5500, coord_tapout, add=TRUE, pch = 19, cex = 0.5, main="Distance link")
rswm_q_tapout <- nb2listw(wm_q_tapout, style="W", zero.policy = TRUE)
rswm_q_tapout
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 305 93025 305 106.114 1253.543
dist_tapout <- nbdists(wm_tapout_d5500, coord_tapout)
ids_tapout <- lapply(dist_tapout, function(x) 1/(x))
rswm_tapout <- nb2listw(wm_tapout_d5500, glist = ids_tapout, style = "B", zero.policy = TRUE)
rswm_tapout
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 305 93025 6.576281 0.008236356 0.7998781
–> The Row-Standardised Fixed Distance Weight Matrix will be used for the subsequent calculations in the spatial autocorrelation analysis of the residual.
–> The polygons vary largely in terms of size hence the fixed distance matrix will be better to ensure a consistent scale of analysis.
lm.morantest(LM_Tap_In, rswm_tapin, alternative = "two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = TOTAL_TAP_IN ~ TOTALPOP, data = TapVol_PopCount_SZ)
## weights: rswm_tapin
##
## Moran I statistic standard deviate = 0.14602, p-value = 0.8839
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## -0.0021633370 -0.0039081224 0.0001427711
Conclusion:
–> p-value = 0.8839 > 0.05 (alpha value at 95% confidence interval)
–> We do not reject the null hypothesis at 95% confidence interval
–> The residual for the simple regression between tap-in data and residential population is randomly distributed by subzones.
–> There is negative autocorrelation therefore there are no signs of clustering.
set.seed(1234)
mc_tapin <- moran.mc(SP_TAP_IN$Res_Tap_In, listw = rswm_tapin, nsim = 999, zero.policy = TRUE, na.action=na.omit)
mc_tapin
##
## Monte-Carlo simulation of Moran I
##
## data: SP_TAP_IN$Res_Tap_In
## weights: rswm_tapin
## number of simulations + 1: 1000
##
## statistic = -0.0021633, observed rank = 594, p-value = 0.406
## alternative hypothesis: greater
Conclusion:
–> p-value = 0.406 > 0.05 (alpha value at 95% confidence interval)
–> We do not reject the null hypothesis at 95% confidence interval
–> The residual for the simple regression between tap-in data and residential population is randomly distributed by subzones.
–> The results obtained from Monte Carlo Moran’s I are consistent with the results from Moran’s I done above.
lm.morantest(LM_Tap_Out, rswm_tapout, alternative = "two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = TOTAL_TAP_OUT ~ TOTALPOP, data =
## TapVol_PopCount_SZ)
## weights: rswm_tapout
##
## Moran I statistic standard deviate = 0.34595, p-value = 0.7294
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## 0.0002255333 -0.0039081224 0.0001427711
Conclusion:
–> p-value = 0.7294 > 0.05 (alpha value at 95% confidence interval)
–> We do not reject the null hypothesis at 95% confidence interval
–> The residual for the simple regression between tap-out data and residential population is randomly distributed by subzones.
–> There is positive autocorrelation therefore there are signs of clustering.
set.seed(1234)
mc_tapout <- moran.mc(SP_TAP_OUT$Res_Tap_Out, listw = rswm_tapout, nsim = 999, zero.policy = TRUE, na.action=na.omit)
mc_tapout
##
## Monte-Carlo simulation of Moran I
##
## data: SP_TAP_OUT$Res_Tap_Out
## weights: rswm_tapout
## number of simulations + 1: 1000
##
## statistic = 0.00022553, observed rank = 670, p-value = 0.33
## alternative hypothesis: greater
Conclusion:
–> p-value = 0.33 > 0.05 (alpha value at 95% confidence interval)
–> We do not reject the null hypothesis at 95% confidence interval
–> The residual for the simple regression between tap-out data and residential population is randomly distributed by subzones.
–> The results obtained from Monte Carlo Moran’s I are consistent with the results from Moran’s I done above.
MI_corr_tapin <- sp.correlogram(wm_tapin_d5500, SP_TAP_IN$Res_Tap_In, order=6, method="I", style="B", zero.policy = TRUE)
plot(MI_corr_tapin)
MI_corr_tapout <- sp.correlogram(wm_tapout_d5500, SP_TAP_OUT$Res_Tap_Out, order=6, method="I", style="B", zero.policy = TRUE)
plot(MI_corr_tapout)
SF_LA_TAPIN <- SF_TapVol_PopCount_SZ %>%
select(`SZ`, `TOTAL_TAP_IN`, `geometry`)
SP_LA_TAPIN <- as_Spatial(SF_LA_TAPIN)
SF_LA_TAPOUT <- SF_TapVol_PopCount_SZ %>%
select(`SZ`, `TOTAL_TAP_OUT`, `geometry`)
SP_LA_TAPOUT <- as_Spatial(SF_LA_TAPOUT)
tapin_wm_q <- poly2nb(SP_LA_TAPIN, queen = TRUE)
summary(tapin_wm_q)
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
tapin_rswm_q <- nb2listw(tapin_wm_q, zero.policy = TRUE)
summary(tapin_rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 305 93025 305 106.114 1253.543
tapout_wm_q <- poly2nb(SP_LA_TAPOUT, queen = TRUE)
summary(tapout_wm_q)
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
tapout_rswm_q <- nb2listw(tapout_wm_q, zero.policy = TRUE)
summary(tapout_rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 1844
## Percentage nonzero weights: 1.982263
## Average number of links: 6.045902
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11 12 14 17
## 6 11 27 79 76 50 36 13 2 2 1 1 1
## 6 least connected regions:
## 2 41 132 170 228 275 with 2 links
## 1 most connected region:
## 39 with 17 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 305 93025 305 106.114 1253.543
Fips_TapIn <- order(SP_LA_TAPIN$SZ)
LocalMI_TapIn <- localmoran(SP_LA_TAPIN$TOTAL_TAP_IN, tapin_rswm_q)
head(LocalMI_TapIn)
## Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
## 1 0.1002384 -0.003289474 0.1555725 0.2624769 0.39647691
## 2 0.4810975 -0.003289474 0.4726349 0.7045787 0.24053622
## 3 -0.1003840 -0.003289474 0.1159397 -0.2851534 0.61223668
## 4 0.0873004 -0.003289474 0.1555725 0.2296747 0.40917227
## 5 0.4765620 -0.003289474 0.1027288 1.4971342 0.06717917
## 6 -0.2819368 -0.003289474 0.1555725 -0.7064613 0.76004934
Fips_TapOut <- order(SP_LA_TAPOUT$SZ)
LocalMI_TapOut <- localmoran(SP_LA_TAPOUT$TOTAL_TAP_OUT, tapout_rswm_q)
head(LocalMI_TapOut)
## Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
## 1 0.07971463 -0.003289474 0.1557926 0.2102936 0.41671926
## 2 0.45881805 -0.003289474 0.4733132 0.6716898 0.25089061
## 3 -0.19489338 -0.003289474 0.1161026 -0.5623199 0.71305096
## 4 0.10136942 -0.003289474 0.1557926 0.2651568 0.39544436
## 5 0.42402974 -0.003289474 0.1028725 1.3323021 0.09138047
## 6 -0.22286917 -0.003289474 0.1557926 -0.5563124 0.71100134
SP_LA_TAPIN.localMI_TapIn <- cbind(SP_LA_TAPIN, LocalMI_TapIn)
SP_LA_TAPOUT.localMI_TapOut <- cbind(SP_LA_TAPOUT, LocalMI_TapOut)
LocalMI_TapIn.map <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran Stats (TapIn) ") +
tm_borders(alpha = 0.5)
pvalue_TapIn.map <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values (TapIn) ") +
tm_borders(alpha = 0.5)
tmap_arrange(LocalMI_TapIn.map, pvalue_TapIn.map, asp=1, ncol=2)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
From the maps:
–> The areas with high local Moran’s I statistics indicate that the subzone has neighboring subzones with similarly high or low attribute values; this subzone is part of a cluster.
–> The areas with negative local Moran’s I statistics indicate that the subzones have neighboring subzones with dissimilar values; the subzone is an outlier
–> There appears to be clustering towards the east of Singapore.
–> The cluster towards the east is statistically significant as the p-value for those subzones are very small.
LocalMI_TapOut.map <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran Stats (TapOut) ") +
tm_borders(alpha = 0.5)
pvalue_TapOut.map <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values (TapOut) ") +
tm_borders(alpha = 0.5)
tmap_arrange(LocalMI_TapOut.map, pvalue_TapOut.map, asp=1, ncol=2)
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid
From the maps:
–> The areas with high local Moran’s I statistics indicate that the subzone has neighboring subzones with similarly high or low attribute values; this subzone is part of a cluster.
–> The areas with negative local Moran’s I statistics indicate that the subzones have neighboring subzones with dissimilar values; the subzone is an outlier
–> There appears to be clustering towards the east of Singapore.
–> The cluster towards the east is statistically significant as the p-value for those subzones are very small.
SP_LA_TAPIN$Z.TAPIN <- scale(SP_LA_TAPIN$TOTAL_TAP_IN) %>% as.vector
MPlot_TapIn <- moran.plot(SP_LA_TAPIN$Z.TAPIN, tapin_rswm_q, labels = as.character(SP_LA_TAPIN$SZ), xlab = "z-TAP_IN_VOLUME" , ylab = "Spatially Lag z-TAP_IN_VOLUME")
From the Moran Scatterplot:
–> Tap In Volume for subzones such as Xilin and Toa Payoh Central represent outliers. This is because they are located in the first and fourth quadrant respectively.
–> Tap In Volume for subzones such as Pasir Ris central, Bedok Reservoir and Kembangan represent clusters. This is because they are located in the second quadrant.
SP_LA_TAPOUT$Z.TAPOUT <- scale(SP_LA_TAPOUT$TOTAL_TAP_OUT) %>% as.vector
MPlot_TapOut <- moran.plot(SP_LA_TAPOUT$Z.TAPOUT, tapout_rswm_q, labels = as.character(SP_LA_TAPOUT$SZ), xlab = "z-TAP_OUT_VOLUME" , ylab = "Spatially Lag z-TAP_OUT_VOLUME")
From the Moran Scatterplot:
–> Tap In Volume for subzones such as Xilin and Toa Payoh Central represent outliers. This is because they are located in the first and fourth quadrant respectively.
–> Tap In Volume for subzones such as Bedok South, Pasir Ris central, Bedok Reservoir and Kembangan represent clusters. This is because they are located in the second quadrant.
–> Preparing
#Step 1:
quadrant_tapin <- vector(mode="numeric",length=nrow(LocalMI_TapIn))
#Step 2: Variable of interest around its mean
DV_tapin <- SP_LA_TAPIN$TOTAL_TAP_IN - mean(SP_LA_TAPIN$TOTAL_TAP_IN)
#step 3: Centering Local Moran's around mean
C_mI_tapin <- LocalMI_TapIn[,1] - mean(LocalMI_TapIn[,1])
#Step 4: Setting Statistical Intelligence Level for local Moran
signif_tapin <- 0.05
#Step 5: Defining high-high, low-low, low-high and high-low categories
quadrant_tapin[DV_tapin >0 & C_mI_tapin>0] <- 4
quadrant_tapin[DV_tapin <0 & C_mI_tapin<0] <- 1
quadrant_tapin[DV_tapin <0 & C_mI_tapin>0] <- 2
quadrant_tapin[DV_tapin >0 & C_mI_tapin<0] <- 3
quadrant_tapin[LocalMI_TapIn[,5]>signif_tapin] <- 0
–>Building
SP_LA_TAPIN.localMI_TapIn$quadrant <- quadrant_tapin
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
tm_shape(SP_LA_TAPIN.localMI_TapIn) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapin)))+1], labels = clusters[c(sort(unique(quadrant_tapin)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
From the LISA Map:
–> The strongly colored regions are therefore those that contribute significantly to a positive local spatial autocorrelation outcome, while the other regions do not contribute significantly to the local spatial autocorrelation outcome.
–> There is a statistically significant cluster of high-high values towards the east.
–> Preparing
#Step 1:
quadrant_tapout <- vector(mode="numeric",length=nrow(LocalMI_TapOut))
#Step 2: Variable of interest around its mean
DV_tapout <- SP_LA_TAPOUT$TOTAL_TAP_OUT - mean(SP_LA_TAPOUT$TOTAL_TAP_OUT)
#step 3: Centering Local Moran's around mean
C_mI_tapout <- LocalMI_TapOut[,1] - mean(LocalMI_TapOut[,1])
#Step 4: Setting Statistical Intelligence Level for local Moran
signif_tapout <- 0.05
#Step 5: Defining high-high, low-low, low-high and high-low categories
quadrant_tapout[DV_tapout >0 & C_mI_tapout>0] <- 4
quadrant_tapout[DV_tapout <0 & C_mI_tapout<0] <- 1
quadrant_tapout[DV_tapout <0 & C_mI_tapout>0] <- 2
quadrant_tapout[DV_tapout >0 & C_mI_tapout<0] <- 3
quadrant_tapout[LocalMI_TapOut[,5]>signif_tapout] <- 0
–>Building
SP_LA_TAPOUT.localMI_TapOut$quadrant <- quadrant_tapout
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapout)))+1], labels = clusters[c(sort(unique(quadrant_tapout)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid
From the LISA Map:
–> The strongly colored regions are therefore those that contribute significantly to a positive local spatial autocorrelation outcome, while the other regions do not contribute significantly to the local spatial autocorrelation outcome.
–> There is a statistically significant cluster of high-high values towards the east.
lisa_tapin <- tm_shape(SP_LA_TAPIN.localMI_TapIn) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapin)))+1], labels = clusters[c(sort(unique(quadrant_tapin)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5) +tm_layout("LISA (Tap In)")
lisa_tapout <- tm_shape(SP_LA_TAPOUT.localMI_TapOut) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant_tapout)))+1], labels = clusters[c(sort(unique(quadrant_tapout)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5) +tm_layout("LISA (Tap Out)")
tmap_arrange(lisa_tapin, lisa_tapout, asp=1,ncol=2)
## Warning: The shape SP_LA_TAPIN.localMI_TapIn is invalid. See sf::st_is_valid
## Warning: The shape SP_LA_TAPOUT.localMI_TapOut is invalid. See sf::st_is_valid
Comparing the two LISA Maps:
–> There are differences in the high-high clusters for Tap In Volume and Tap Out Volume.
–> Tap Out Volume appears to have slightly more high-high areas as compared to Tap In Volume.
–> Tap Out Volume has more clusters in the East as compared to Tap Out Volume.
dnb_tapin <- dnearneigh(coordinates(SP_LA_TAPIN), 0, 5500)
dnb_tapin
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
plot(SP_LA_TAPIN, border = "lightgrey")
plot(dnb_tapin, coordinates(SP_LA_TAPIN), add = TRUE, col = "red")
dnb_lw_tapin <- nb2listw(dnb_tapin, style = "B")
summary(dnb_lw_tapin)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
## Link number distribution:
##
## 1 6 8 9 10 11 13 14 16 18 19 20 21 22 23 24 25 26 27 28
## 1 1 3 1 1 1 3 1 4 3 2 4 3 5 3 4 3 6 6 4
## 29 30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46 47 48 49
## 7 7 8 5 3 5 8 3 5 1 4 3 5 2 5 3 4 7 3 5
## 50 51 52 53 54 55 56 58 59 60 61 62 63 64 65 66 67 69 70 71
## 6 3 3 5 4 2 3 3 4 2 1 3 1 1 2 1 2 1 1 3
## 72 73 74 75 76 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93
## 1 2 4 1 1 2 4 3 3 2 2 5 1 7 6 4 1 3 2 3
## 94 95 96 97 98 99 101 102 103 104 105 107 108 109 110
## 2 3 2 3 4 4 3 6 3 2 5 2 1 4 1
## 1 least connected region:
## 275 with 1 link
## 1 most connected region:
## 160 with 110 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 305 93025 17036 34072 4827632
dnb_tapout <- dnearneigh(coordinates(SP_LA_TAPOUT), 0, 5500)
dnb_tapout
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
plot(SP_LA_TAPOUT, border = "lightgrey")
plot(dnb_tapout, coordinates(SP_LA_TAPOUT), add = TRUE, col = "red")
dnb_lw_tapout <- nb2listw(dnb_tapout, style = "B")
summary(dnb_lw_tapout)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 305
## Number of nonzero links: 17036
## Percentage nonzero weights: 18.31336
## Average number of links: 55.85574
## Link number distribution:
##
## 1 6 8 9 10 11 13 14 16 18 19 20 21 22 23 24 25 26 27 28
## 1 1 3 1 1 1 3 1 4 3 2 4 3 5 3 4 3 6 6 4
## 29 30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46 47 48 49
## 7 7 8 5 3 5 8 3 5 1 4 3 5 2 5 3 4 7 3 5
## 50 51 52 53 54 55 56 58 59 60 61 62 63 64 65 66 67 69 70 71
## 6 3 3 5 4 2 3 3 4 2 1 3 1 1 2 1 2 1 1 3
## 72 73 74 75 76 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93
## 1 2 4 1 1 2 4 3 3 2 2 5 1 7 6 4 1 3 2 3
## 94 95 96 97 98 99 101 102 103 104 105 107 108 109 110
## 2 3 2 3 4 4 3 6 3 2 5 2 1 4 1
## 1 least connected region:
## 275 with 1 link
## 1 most connected region:
## 160 with 110 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 305 93025 17036 34072 4827632
###Computing Gi Statistics
tapin_fips <- order(SP_LA_TAPIN$SZ)
gi.fixed_tapin <- localG(SP_LA_TAPIN$TOTAL_TAP_IN, dnb_lw_tapin)
SP_LA_TAPIN.gi_tapin <- cbind(SP_LA_TAPIN, as.matrix(gi.fixed_tapin))
names(SP_LA_TAPIN.gi_tapin)[4] <- "gstat"
tapout_fips <- order(SP_LA_TAPOUT$SZ)
gi.fixed_tapout <- localG(SP_LA_TAPOUT$TOTAL_TAP_OUT, dnb_lw_tapout)
SP_LA_TAPOUT.gi_tapout <- cbind(SP_LA_TAPOUT, as.matrix(gi.fixed_tapout))
names(SP_LA_TAPOUT.gi_tapout)[4] <- "gstat"
tm_shape(SP_LA_TAPIN.gi_tapin) +
tm_fill(col = "gstat",
style = "pretty",
palette="-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5)
## Warning: The shape SP_LA_TAPIN.gi_tapin is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
From the choropleth map above:
–> Subzones shaded in red are hot spot areas and subzones shaded in blue are the cold spot areas. The darkness of the colours represent the intensity of the Gi values.
–> The hot spot areas are on the north, east and west regions of Singapore. The cold spot areas, on the other hand, mainly comprise of subzones located on the southern part of the country.
–> The most intense positive Gi values are found towards the east while the most intense negative Gi values are found towards the south.
tm_shape(SP_LA_TAPOUT.gi_tapout) +
tm_fill(col = "gstat",
style = "pretty",
palette="-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5)
## Warning: The shape SP_LA_TAPOUT.gi_tapout is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
From the choropleth map above:
–> Subzones shaded in red are hot spot areas and subzones shaded in blue are the cold spot areas. The darkness of the colours represent the intensity of the Gi values.
–> The hot spot areas are on the north, east and west regions of Singapore. The cold spot areas, on the other hand, mainly comprise of subzones located on the southern part of the country.
–> The most intense positive Gi values are found towards the east while the most intense negative Gi values are found towards the south.
Gi_tapin <- tm_shape(SP_LA_TAPIN.gi_tapin) +
tm_fill(col = "gstat",
style = "pretty",
palette="-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5) +
tm_layout("Gi Values (Tap In)")
Gi_tapout <- tm_shape(SP_LA_TAPOUT.gi_tapout) +
tm_fill(col = "gstat",
style = "pretty",
palette="-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5) +
tm_layout("Gi Values (Tap Out)")
tmap_arrange(Gi_tapin, Gi_tapout, asp=1,ncol=2)
## Warning: The shape SP_LA_TAPIN.gi_tapin is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning: The shape SP_LA_TAPOUT.gi_tapout is invalid. See sf::st_is_valid
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Comparing the two maps:
–> There are minor differences between the two maps. For example, Tap Out has more subzones with high positive local Gi values in the East as compared to Tap In. It also has more subzones with high negative local Gi values in the South as compared to Tap In.