1 Overview

Student Name: Zhao Tianyun

Student ID: 01368190

Link to Rpub:

1.1 Understanding datasets

1.1.1 Aspatial

SGPools_svy21: a dataset displaying the number of wins by Singapore Pools’ outlets, also containing the coordinates locations of these outlets

respopagsex2000to2018: a database with population info(age, sex etc.) by Sinagpore subzones

1.1.2 Geospatial

MP14_SUBZONE_WEB_PL: geospatial shapefile of Singapore subzones

1.2 Purpose of Visualisation

To map the outlets by location, type and number of wins to get a intuitive understanding of the big picture. And to relate the number of wins with the local population by planning subzones to see which subzones have potentially higher average wins per capita.

1.3 Challenges

One challenge is the cleaning of aspatial datasets in order to transform them to clean datasets that are easy to serve my purpose. Another challenge is wrangle the aspatial datasets with geospatail datasets. A major challenge encountered is to restrict the spatial points(sgpools outlets) into the respective spatial polygons(planning subzones) in order to find which outlets are within the repective subzone and to calculate total winnings within the subzone. This is done with the assistance of spatialEco package.

1.4 Sketch of Proposed DataViz Design

A caption

A caption

2 DataViz Step-by-Step

Now, we will go through the data wrangling and visualization steps ## Loading required packages

packages = c('tidyverse', 'sf', 'tmap', 'spatialEco')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
}

2.1 Loading datasets

sgpools <- read_csv("data/aspatial/SGPools_svy21.csv")
## Parsed with column specification:
## cols(
##   NAME = col_character(),
##   ADDRESS = col_character(),
##   POSTCODE = col_double(),
##   XCOORD = col_double(),
##   YCOORD = col_double(),
##   `OUTLET TYPE` = col_character(),
##   `Gp1Gp2 Winnings` = col_double()
## )

2.2 Creating a sf data frame from sgpools data frame

sgpools_sf <- st_as_sf(sgpools, 
                       coords = c("XCOORD", "YCOORD"),
                       crs= 3414)

2.3 Mapping the outlets using interactive tmap

Map the sgpools outlets with symbol size proportional to winnings and symbol color by outlet type.

map1<- tm_shape(sgpools_sf)+
        tm_bubbles(col = "OUTLET TYPE", 
          id = "NAME",
          size = "Gp1Gp2 Winnings",
          border.col = "black",
          border.lwd = 1)
map2<- tm_shape(sgpools_sf)+
        tm_bubbles(col = "OUTLET TYPE", 
          id = "Gp1Gp2 Winnings",
          size = "Gp1Gp2 Winnings",
          border.col = "black",
          border.lwd = 1)

2.4 Import singapore subzone spatial data

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\SMU\IS428\Ass 5\Assignment-5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

2.5 Import population data

pop <- read_csv("data/aspatial/respopagsex2000to2018.csv")
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

2.6 Dataframe tranformation to get total population by subzones

Get total population of all ages and sex within each subzone

pop2 <- pop %>%
  filter(Time == 2018) %>%
  spread(AG, Pop) %>%
  mutate(`TOTAL` = rowSums(.[5:22])) %>%
  select(`PA`, `SZ`, `Sex`, `TOTAL`)
popsz <- pop2 %>%
  group_by(SZ) %>%
  summarise(sum = sum(TOTAL))

Check the output

head(popsz)
## # A tibble: 6 x 2
##   SZ                sum
##   <chr>           <dbl>
## 1 Admiralty       14180
## 2 Airport Road        0
## 3 Alexandra Hill  14260
## 4 Alexandra North  1430
## 5 Aljunied        40040
## 6 Anak Bukit      22100

2.7 Join population data with subzone spatial dataframe

mpsz_popsz <- left_join(mpsz, popsz, 
                              by = c("SUBZONE_N" = "SZ"))

Check the output

mpsz_popsz
## Simple feature collection with 323 features and 16 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## First 10 features:
##    OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
## 1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
## 2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
## 3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
## 4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
## 5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
## 6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
## 7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
## 8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
## 9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
## 10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
##    PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
## 1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
## 2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
## 3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
## 4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
## 5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
## 6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
## 7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
## 8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
## 9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
## 10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
##      Y_ADDR SHAPE_Leng SHAPE_Area sum                       geometry
## 1  29220.19   5267.381  1630379.3  NA MULTIPOLYGON (((31495.56 30...
## 2  29782.05   3506.107   559816.2  NA MULTIPOLYGON (((29092.28 30...
## 3  29974.66   1740.926   160807.5  NA MULTIPOLYGON (((29932.33 29...
## 4  29933.77   3313.625   595428.9  NA MULTIPOLYGON (((27131.28 30...
## 5  30005.70   2825.594   387429.4  NA MULTIPOLYGON (((26451.03 30...
## 6  29991.38   4428.913  1030378.8  NA MULTIPOLYGON (((25899.7 297...
## 7  30230.86   3275.312   551732.0  NA MULTIPOLYGON (((27746.95 30...
## 8  30222.86   2208.619   290184.7  NA MULTIPOLYGON (((29351.26 29...
## 9  29893.78   6571.323  1084792.3  NA MULTIPOLYGON (((20996.49 30...
## 10 30104.18   3454.239   631644.3  NA MULTIPOLYGON (((24472.11 29...

2.8 Wrangle sgpools spatial points with subzones spatial polygons

Frame the sgpool spatial data points within its corresponding subzone spatial polygons using point.in.poly from spatialGEO package.

st_crs(mpsz) = 3414
mpsz_sgpools <- point.in.poly(sgpools_sf, mpsz)

2.9 Extract the wrangled data frame from spatial dataframe

Check its contents

mpsz_sgpools_data <- mpsz_sgpools@data
head(mpsz_sgpools_data)
##                               NAME
## 1      Livewire (Marina Bay Sands)
## 2 Livewire (Resorts World Sentosa)
## 3              SportsBuzz (Kranji)
## 4                SportsBuzz (PoMo)
## 5            Prime Serangoon North
## 6 Singapore Pools Woodlands Centre
##                                                                             ADDRESS
## 1      2 Bayfront Avenue, #01-01 The Shoppes at Marina Bay Sands   Singapore 018972
## 2                                     26 Sentosa Gateway #B1-208   Singapore 098138
## 3 Lotus Lounge, Level 2, 1 Turf Club Avenue Singapore Racecourse   Singapore 738078
## 4                                       1 Selegie Rd PoMo #01-01   Singapore 188306
## 5                          Blk 542B Serangoon North Ave 3 #02-01   Singapore 552542
## 6                                1A Woodlands Centre Road #01-26   Singapore 731001
##   POSTCODE OUTLET.TYPE Gp1Gp2.Winnings OBJECTID SUBZONE_NO        SUBZONE_N
## 1    18972      Branch               5       41         12 BAYFRONT SUBZONE
## 2    98138      Branch              11       20          1          SENTOSA
## 3   738078      Branch               0      281          2        TURF CLUB
## 4   188306      Branch              44       97          8          SELEGIE
## 5   552542      Branch               0      253          3  SERANGOON NORTH
## 6   731001      Branch               3      320          9      NORTH COAST
##   SUBZONE_C CA_IND       PLN_AREA_N PLN_AREA_C          REGION_N REGION_C
## 1    DTSZ12      Y    DOWNTOWN CORE         DT    CENTRAL REGION       CR
## 2    SISZ01      N SOUTHERN ISLANDS         SI    CENTRAL REGION       CR
## 3    SKSZ02      N     SUNGEI KADUT         SK      NORTH REGION       NR
## 4    RCSZ08      Y           ROCHOR         RC    CENTRAL REGION       CR
## 5    SGSZ03      N        SERANGOON         SG NORTH-EAST REGION      NER
## 6    WDSZ09      N        WOODLANDS         WD      NORTH REGION       NR
##            INC_CRC FMEL_UPD_D   X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
## 1 197F5E664DA4D5E1 2014-12-05 30806.24 29530.17   2897.126  521200.52
## 2 A6FCDC9C447952CB 2014-12-05 27593.94 25813.35  17496.194 4919132.37
## 3 17459E60C3EEAF30 2014-12-05 20234.60 44507.19   7665.676 3291405.51
## 4 E8FA232C5BEEA956 2014-12-05 29817.33 31611.00   1478.050   49626.02
## 5 C685042EC58E5C55 2014-12-05 32458.80 39597.64   3610.732  684704.30
## 6 898B2436858382A1 2014-12-05 22147.04 48031.55  10847.882 2450783.61

2.10 Get total winnings by subzone

mpsz_sgpools_data_summary <- mpsz_sgpools_data %>%
  group_by(SUBZONE_N) %>%
  summarise(total_winnings = sum(Gp1Gp2.Winnings))

Check the result

mpsz_sgpools_data_summary
## # A tibble: 159 x 2
##    SUBZONE_N              total_winnings
##    <chr>                           <dbl>
##  1 ADMIRALTY                          20
##  2 ALEXANDRA HILL                     62
##  3 ALJUNIED                          178
##  4 ANAK BUKIT                         62
##  5 ANCHORVALE                         21
##  6 ANG MO KIO TOWN CENTRE             77
##  7 BALESTIER                          69
##  8 BANGKIT                            42
##  9 BAYFRONT SUBZONE                    5
## 10 BEDOK NORTH                       181
## # ... with 149 more rows

2.11 Join together total winnings and total population by subzones

popsz$SZ = toupper(popsz$SZ)

winning_pop <- full_join(mpsz_sgpools_data_summary, popsz, by = c("SUBZONE_N" = "SZ"))

2.12 Calculate winnings per capita by subzone

winning_pop <- winning_pop %>%
  mutate(`winning_per_capita` = `total_winnings`/`sum`) %>%
  filter(sum > 0)
winning_pop
## # A tibble: 234 x 4
##    SUBZONE_N              total_winnings   sum winning_per_capita
##    <chr>                           <dbl> <dbl>              <dbl>
##  1 ADMIRALTY                          20 14180            0.00141
##  2 ALEXANDRA HILL                     62 14260            0.00435
##  3 ALJUNIED                          178 40040            0.00445
##  4 ANAK BUKIT                         62 22100            0.00281
##  5 ANCHORVALE                         21 43750            0.00048
##  6 ANG MO KIO TOWN CENTRE             77  4830            0.0159 
##  7 BALESTIER                          69 32360            0.00213
##  8 BANGKIT                            42 22930            0.00183
##  9 BEDOK NORTH                       181 82630            0.00219
## 10 BEDOK RESERVOIR                    27 25860            0.00104
## # ... with 224 more rows

2.13 Replace NAs with 0

As there are no sgpools outlets in the respective subzones

winning_pop[is.na(winning_pop)] = 0

2.14 Check the distribution of winning per capita

ggplot(winning_pop, aes(x=winning_per_capita)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

2.15 Join the dataframe with subzone spatial polygons dataframe

winning_pop_sf <- left_join(mpsz, winning_pop, 
                              by = ("SUBZONE_N"))

2.16 Plot the winnings per capita using tmap

Define breaks to represent the values better as the values are very skewed. The tmap is designed to show subzone names when hovered over with mouse.

tmap_mode("view")
## tmap mode set to interactive viewing
map0 <- tm_shape(winning_pop_sf)+
          tm_fill("winning_per_capita",
                id = c("SUBZONE_N"),
                n = 5,
                breaks=c(0, 0.001, 0.005, 0.01, 0.1, 0.3)
                ) +
          tm_borders(alpha = 0.5)
map0

## Plot the previous sgPools outlets map using tmap

tmap_arrange(map1, map2, nrow = 1, sync = TRUE)
## Legend for symbol sizes not available in view mode.
## Legend for symbol sizes not available in view mode.

3 Descriptions

3.1 Description of visualization

There two visualizations produced. They are both interactive maps using tmap package.

The first map(map0) is a thematic map showing the number of winning per capita by planning subzones. Breaks are mannually defined to represent the values better as they are very skewed. The tmap is designed to show subzone names when hovered over with mouse.

The second map(map1 and map2) shows the locations of sgpools outlets represented in bubbles. The color of the bubbles show the outlet type. The size of the bubbles shows the relative number of winnings. Two small maps are produced side-by-side in this case to show both name of the outlet and number of winnings when mouse pointer hovers over the outlets.

3.2 Findings

  1. The location of the outlets is not evenly distributed. They are distributed around major residential areas and business areas. The number of winnings differs probably due to the number of sales for each outlet. The average winning from branch and outlets do not differ much.

  2. As for winning per capita, we can see that some subzones has very high winning per capita. Among the top are Jurong Gateway, Boulevard, People’s park, Boat quay, Selegie, Somerset, Yishun Centre, Hougang Central, Maritime Square, Ang Mo Kio Town Centre. These numbers do not likely indicate that you have a better chance of winning buying from outlets in these areas. Instead, this may indicate that these areas have a very high human traffic and are major hubs or town core areas.