Student Name: Zhao Tianyun
Student ID: 01368190
Link to Rpub:
SGPools_svy21: a dataset displaying the number of wins by Singapore Pools’ outlets, also containing the coordinates locations of these outlets
respopagsex2000to2018: a database with population info(age, sex etc.) by Sinagpore subzones
MP14_SUBZONE_WEB_PL: geospatial shapefile of Singapore subzones
To map the outlets by location, type and number of wins to get a intuitive understanding of the big picture. And to relate the number of wins with the local population by planning subzones to see which subzones have potentially higher average wins per capita.
One challenge is the cleaning of aspatial datasets in order to transform them to clean datasets that are easy to serve my purpose. Another challenge is wrangle the aspatial datasets with geospatail datasets. A major challenge encountered is to restrict the spatial points(sgpools outlets) into the respective spatial polygons(planning subzones) in order to find which outlets are within the repective subzone and to calculate total winnings within the subzone. This is done with the assistance of spatialEco package.
A caption
Now, we will go through the data wrangling and visualization steps ## Loading required packages
packages = c('tidyverse', 'sf', 'tmap', 'spatialEco')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
}
sgpools <- read_csv("data/aspatial/SGPools_svy21.csv")
## Parsed with column specification:
## cols(
## NAME = col_character(),
## ADDRESS = col_character(),
## POSTCODE = col_double(),
## XCOORD = col_double(),
## YCOORD = col_double(),
## `OUTLET TYPE` = col_character(),
## `Gp1Gp2 Winnings` = col_double()
## )
sgpools_sf <- st_as_sf(sgpools,
coords = c("XCOORD", "YCOORD"),
crs= 3414)
Map the sgpools outlets with symbol size proportional to winnings and symbol color by outlet type.
map1<- tm_shape(sgpools_sf)+
tm_bubbles(col = "OUTLET TYPE",
id = "NAME",
size = "Gp1Gp2 Winnings",
border.col = "black",
border.lwd = 1)
map2<- tm_shape(sgpools_sf)+
tm_bubbles(col = "OUTLET TYPE",
id = "Gp1Gp2 Winnings",
size = "Gp1Gp2 Winnings",
border.col = "black",
border.lwd = 1)
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\SMU\IS428\Ass 5\Assignment-5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
pop <- read_csv("data/aspatial/respopagsex2000to2018.csv")
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
Get total population of all ages and sex within each subzone
pop2 <- pop %>%
filter(Time == 2018) %>%
spread(AG, Pop) %>%
mutate(`TOTAL` = rowSums(.[5:22])) %>%
select(`PA`, `SZ`, `Sex`, `TOTAL`)
popsz <- pop2 %>%
group_by(SZ) %>%
summarise(sum = sum(TOTAL))
Check the output
head(popsz)
## # A tibble: 6 x 2
## SZ sum
## <chr> <dbl>
## 1 Admiralty 14180
## 2 Airport Road 0
## 3 Alexandra Hill 14260
## 4 Alexandra North 1430
## 5 Aljunied 40040
## 6 Anak Bukit 22100
mpsz_popsz <- left_join(mpsz, popsz,
by = c("SUBZONE_N" = "SZ"))
Check the output
mpsz_popsz
## Simple feature collection with 323 features and 16 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## First 10 features:
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
## 1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
## 2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
## 3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
## 4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
## 5 5 3 REDHILL BMSZ03 N BUKIT MERAH
## 6 6 7 ALEXANDRA HILL BMSZ07 N BUKIT MERAH
## 7 7 9 BUKIT HO SWEE BMSZ09 N BUKIT MERAH
## 8 8 2 CLARKE QUAY SRSZ02 Y SINGAPORE RIVER
## 9 9 13 PASIR PANJANG 1 QTSZ13 N QUEENSTOWN
## 10 10 7 QUEENSWAY QTSZ07 N QUEENSTOWN
## PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
## 1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05 31595.84
## 2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05 28679.06
## 3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05 29654.96
## 4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05 26782.83
## 5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05 26201.96
## 6 BM CENTRAL REGION CR 9D286521EF5E3B59 2014-12-05 25358.82
## 7 BM CENTRAL REGION CR 7839A8577144EFE2 2014-12-05 27680.06
## 8 SR CENTRAL REGION CR 48661DC0FBA09F7A 2014-12-05 29253.21
## 9 QT CENTRAL REGION CR 1F721290C421BFAB 2014-12-05 22077.34
## 10 QT CENTRAL REGION CR 3580D2AFFBEE914C 2014-12-05 24168.31
## Y_ADDR SHAPE_Leng SHAPE_Area sum geometry
## 1 29220.19 5267.381 1630379.3 NA MULTIPOLYGON (((31495.56 30...
## 2 29782.05 3506.107 559816.2 NA MULTIPOLYGON (((29092.28 30...
## 3 29974.66 1740.926 160807.5 NA MULTIPOLYGON (((29932.33 29...
## 4 29933.77 3313.625 595428.9 NA MULTIPOLYGON (((27131.28 30...
## 5 30005.70 2825.594 387429.4 NA MULTIPOLYGON (((26451.03 30...
## 6 29991.38 4428.913 1030378.8 NA MULTIPOLYGON (((25899.7 297...
## 7 30230.86 3275.312 551732.0 NA MULTIPOLYGON (((27746.95 30...
## 8 30222.86 2208.619 290184.7 NA MULTIPOLYGON (((29351.26 29...
## 9 29893.78 6571.323 1084792.3 NA MULTIPOLYGON (((20996.49 30...
## 10 30104.18 3454.239 631644.3 NA MULTIPOLYGON (((24472.11 29...
Frame the sgpool spatial data points within its corresponding subzone spatial polygons using point.in.poly from spatialGEO package.
st_crs(mpsz) = 3414
mpsz_sgpools <- point.in.poly(sgpools_sf, mpsz)
Check its contents
mpsz_sgpools_data <- mpsz_sgpools@data
head(mpsz_sgpools_data)
## NAME
## 1 Livewire (Marina Bay Sands)
## 2 Livewire (Resorts World Sentosa)
## 3 SportsBuzz (Kranji)
## 4 SportsBuzz (PoMo)
## 5 Prime Serangoon North
## 6 Singapore Pools Woodlands Centre
## ADDRESS
## 1 2 Bayfront Avenue, #01-01 The Shoppes at Marina Bay Sands Singapore 018972
## 2 26 Sentosa Gateway #B1-208 Singapore 098138
## 3 Lotus Lounge, Level 2, 1 Turf Club Avenue Singapore Racecourse Singapore 738078
## 4 1 Selegie Rd PoMo #01-01 Singapore 188306
## 5 Blk 542B Serangoon North Ave 3 #02-01 Singapore 552542
## 6 1A Woodlands Centre Road #01-26 Singapore 731001
## POSTCODE OUTLET.TYPE Gp1Gp2.Winnings OBJECTID SUBZONE_NO SUBZONE_N
## 1 18972 Branch 5 41 12 BAYFRONT SUBZONE
## 2 98138 Branch 11 20 1 SENTOSA
## 3 738078 Branch 0 281 2 TURF CLUB
## 4 188306 Branch 44 97 8 SELEGIE
## 5 552542 Branch 0 253 3 SERANGOON NORTH
## 6 731001 Branch 3 320 9 NORTH COAST
## SUBZONE_C CA_IND PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## 1 DTSZ12 Y DOWNTOWN CORE DT CENTRAL REGION CR
## 2 SISZ01 N SOUTHERN ISLANDS SI CENTRAL REGION CR
## 3 SKSZ02 N SUNGEI KADUT SK NORTH REGION NR
## 4 RCSZ08 Y ROCHOR RC CENTRAL REGION CR
## 5 SGSZ03 N SERANGOON SG NORTH-EAST REGION NER
## 6 WDSZ09 N WOODLANDS WD NORTH REGION NR
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR SHAPE_Leng SHAPE_Area
## 1 197F5E664DA4D5E1 2014-12-05 30806.24 29530.17 2897.126 521200.52
## 2 A6FCDC9C447952CB 2014-12-05 27593.94 25813.35 17496.194 4919132.37
## 3 17459E60C3EEAF30 2014-12-05 20234.60 44507.19 7665.676 3291405.51
## 4 E8FA232C5BEEA956 2014-12-05 29817.33 31611.00 1478.050 49626.02
## 5 C685042EC58E5C55 2014-12-05 32458.80 39597.64 3610.732 684704.30
## 6 898B2436858382A1 2014-12-05 22147.04 48031.55 10847.882 2450783.61
mpsz_sgpools_data_summary <- mpsz_sgpools_data %>%
group_by(SUBZONE_N) %>%
summarise(total_winnings = sum(Gp1Gp2.Winnings))
Check the result
mpsz_sgpools_data_summary
## # A tibble: 159 x 2
## SUBZONE_N total_winnings
## <chr> <dbl>
## 1 ADMIRALTY 20
## 2 ALEXANDRA HILL 62
## 3 ALJUNIED 178
## 4 ANAK BUKIT 62
## 5 ANCHORVALE 21
## 6 ANG MO KIO TOWN CENTRE 77
## 7 BALESTIER 69
## 8 BANGKIT 42
## 9 BAYFRONT SUBZONE 5
## 10 BEDOK NORTH 181
## # ... with 149 more rows
popsz$SZ = toupper(popsz$SZ)
winning_pop <- full_join(mpsz_sgpools_data_summary, popsz, by = c("SUBZONE_N" = "SZ"))
winning_pop <- winning_pop %>%
mutate(`winning_per_capita` = `total_winnings`/`sum`) %>%
filter(sum > 0)
winning_pop
## # A tibble: 234 x 4
## SUBZONE_N total_winnings sum winning_per_capita
## <chr> <dbl> <dbl> <dbl>
## 1 ADMIRALTY 20 14180 0.00141
## 2 ALEXANDRA HILL 62 14260 0.00435
## 3 ALJUNIED 178 40040 0.00445
## 4 ANAK BUKIT 62 22100 0.00281
## 5 ANCHORVALE 21 43750 0.00048
## 6 ANG MO KIO TOWN CENTRE 77 4830 0.0159
## 7 BALESTIER 69 32360 0.00213
## 8 BANGKIT 42 22930 0.00183
## 9 BEDOK NORTH 181 82630 0.00219
## 10 BEDOK RESERVOIR 27 25860 0.00104
## # ... with 224 more rows
As there are no sgpools outlets in the respective subzones
winning_pop[is.na(winning_pop)] = 0
ggplot(winning_pop, aes(x=winning_per_capita)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
winning_pop_sf <- left_join(mpsz, winning_pop,
by = ("SUBZONE_N"))
Define breaks to represent the values better as the values are very skewed. The tmap is designed to show subzone names when hovered over with mouse.
tmap_mode("view")
## tmap mode set to interactive viewing
map0 <- tm_shape(winning_pop_sf)+
tm_fill("winning_per_capita",
id = c("SUBZONE_N"),
n = 5,
breaks=c(0, 0.001, 0.005, 0.01, 0.1, 0.3)
) +
tm_borders(alpha = 0.5)
map0
## Plot the previous sgPools outlets map using tmap
tmap_arrange(map1, map2, nrow = 1, sync = TRUE)
## Legend for symbol sizes not available in view mode.
## Legend for symbol sizes not available in view mode.
There two visualizations produced. They are both interactive maps using tmap package.
The first map(map0) is a thematic map showing the number of winning per capita by planning subzones. Breaks are mannually defined to represent the values better as they are very skewed. The tmap is designed to show subzone names when hovered over with mouse.
The second map(map1 and map2) shows the locations of sgpools outlets represented in bubbles. The color of the bubbles show the outlet type. The size of the bubbles shows the relative number of winnings. Two small maps are produced side-by-side in this case to show both name of the outlet and number of winnings when mouse pointer hovers over the outlets.
The location of the outlets is not evenly distributed. They are distributed around major residential areas and business areas. The number of winnings differs probably due to the number of sales for each outlet. The average winning from branch and outlets do not differ much.
As for winning per capita, we can see that some subzones has very high winning per capita. Among the top are Jurong Gateway, Boulevard, People’s park, Boat quay, Selegie, Somerset, Yishun Centre, Hougang Central, Maritime Square, Ang Mo Kio Town Centre. These numbers do not likely indicate that you have a better chance of winning buying from outlets in these areas. Instead, this may indicate that these areas have a very high human traffic and are major hubs or town core areas.