Maps Lab 9 - Hawker centres in Singapore

First, load in the required libraries:

library(tidyverse)

## -- Attaching packages ---------------------------------------------------------------------------- tidyverse 1.2.1 --

## <U+221A> ggplot2 2.2.1     <U+221A> purrr   0.2.4
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.4
## <U+221A> tidyr   0.8.0     <U+221A> stringr 1.3.0
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0

## -- Conflicts ------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(sf)

## Warning: package 'sf' was built under R version 3.4.4

## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3

library(sp)
library(rgdal)

## Warning: package 'rgdal' was built under R version 3.4.4

## rgdal: version: 1.2-18, (SVN revision 718)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-7

library(spatstat)

## Loading required package: spatstat.data

## Loading required package: nlme

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:dplyr':
## 
##     collapse

## Loading required package: rpart

## 
## spatstat 1.55-0       (nickname: 'Stunned Mullet') 
## For an introduction to spatstat, type 'beginner'

library(ggmap)
library(maptools)

## Checking rgeos availability: FALSE
##      Note: when rgeos is not available, polygon geometry     computations in maptools depend on gpclib,
##      which has a restricted licence. It is disabled by default;
##      to enable gpclib, type gpclibPermit()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

1. Data loading and cleaning

Load the data from a csv file and take a look at it:

tenders <- read_csv("tabula-tender-bids-from-mar-2012-to-jan-2017.csv", col_names = c("centre","stall","area","trade","bid","month"))

## Parsed with column specification:
## cols(
##   centre = col_character(),
##   stall = col_character(),
##   area = col_double(),
##   trade = col_character(),
##   bid = col_character(),
##   month = col_character()
## )

## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)

## Warning: 23 parsing failures.
## row # A tibble: 5 x 5 col     row col   expected               actual    file                        expected   <int> <chr> <chr>                  <chr>     <chr>                       actual 1  2269 <NA>  6 columns              4 columns 'tabula-tender-bids-from-m~ file 2  3499 <NA>  6 columns              4 columns 'tabula-tender-bids-from-m~ row 3  3945 area  no trailing characters " 38 ."   'tabula-tender-bids-from-m~ col 4  3947 area  no trailing characters " 27 ."   'tabula-tender-bids-from-m~ expected 5  3999 area  no trailing characters " 02 ."   'tabula-tender-bids-from-m~
## ... ................. ... .......................................................................... ........ .......................................................................... ...... .......................................................................... .... .......................................................................... ... .......................................................................... ... .......................................................................... ........ ..........................................................................
## See problems(...) for more details.

head(tenders)

summary(tenders)

##     centre             stall                area           trade          
##  Length:5563        Length:5563        Min.   : 2.080   Length:5563       
##  Class :character   Class :character   1st Qu.: 6.000   Class :character  
##  Mode  :character   Mode  :character   Median : 6.750   Mode  :character  
##                                        Mean   : 7.091                     
##                                        3rd Qu.: 8.490                     
##                                        Max.   :22.000                     
##                                        NA's   :2804                       
##      bid               month          
##  Length:5563        Length:5563       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
##

There are some issues with the data that need to be cleaned: - Every alternate row is empty - The stall type are marked using header rows - Price and month should be parsed as numeric and date formats respectively

# Fix alternate empty rows
tenders <- tenders %>% drop_na(centre)

# Convert stall type to column
tenders <- tenders %>%
    filter(row_number() != 1135) %>%
    mutate(type = if_else(row_number() < 1135, "cooked", "lockup")) %>%
    filter(row_number() != 1749) %>%
    mutate(type = if_else(row_number() >= 1749, "market", type))

# Parse price and month as numeric and date
tenders <- tenders %>%
    mutate(bidNum = as.numeric(str_replace_all(bid, pattern="\\$|,", replacement=""))) %>%
    mutate(date = dmy(paste0("01-",month))) %>%
    mutate(priceM2 = bidNum/area)

(At this point, I realised that there are 49 rows (588 to 636) where the data is shifted, because of a missing comma in each row. I edited the csv file to add the missing commas, then redid all the earlier steps.)

2. Exploring some questions

Does the average price per m2 change over time?

avg_price_date = tenders %>%
    filter(is.na(priceM2) == FALSE) %>%
    group_by(date) %>%
    summarise(avg_price = mean(priceM2))

ggplot(avg_price_date, aes(x=date, y=avg_price)) + geom_line() + labs(title="Hawker stalls price per m2 over time", x="Month", y="Price per m2")

It looks like the average price increased over the years, but because the price fluctuates significantly, it’s a bit hard to tell for certain. What if we take a running 3 month average of the price instead?

avg_price_date$avg_3mth = 0
n = nrow(avg_price_date)
for (i in 2:(nrow(avg_price_date)-1)) {
    avg_price_date$avg_3mth[i] = (avg_price_date$avg_price[i-1] 
                                  +avg_price_date$avg_price[i]
                                  +avg_price_date$avg_price[i+1]
                                  )/3
}
avg_price_date$avg_3mth[1] = (avg_price_date$avg_price[1]+avg_price_date$avg_price[2])/2
avg_price_date$avg_3mth[n] = (avg_price_date$avg_price[n]+avg_price_date$avg_price[n-1])/2

ggplot(avg_price_date, aes(x=date, y=avg_3mth)) + geom_line() + labs(title="Hawker stalls price per m2, 3-month average", x="Month", y="Price per m2")

We can see that the price started very high in 2012 but crashed in the later half. It was on an upward trend from 2014-16, but dipped towards the end of 2016.

How many bids are there for each trade?

nbids_trade <- tenders %>%
    group_by(trade) %>%
    summarise(count = n())

# sort data and plot
nbids_trade <- nbids_trade[order(nbids_trade$count), ]
nbids_trade %>% 
  ggplot(aes(x=factor(trade, trade), y=count)) + geom_col() + coord_flip() + labs(title="Number of bids for each trade", x="Trade", y="Number of bids")

Cooked food and sundry goods top the list, while more specialized trades such as beef, mutton and halal meat have the least bids.

3. Looking at hawker centres in space

Next, we use geocoding to get the specific geographic location of each hawker centre and do some analysis.

# geocoding and joining
centres = read.csv('centres-geocoded.csv')
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centre"))

## Warning: Column `centre` joining character vector and factor, coercing into
## character vector

Running a check, we find that there is one row in tenders that has not been joined correctly. This is because the centre name is repeated in the original data, so just clean that up and join again.

tenders.sp %>% filter(is.na(lat))

tenders$centre[597] = "BLK 51 OLD AIRPORT ROAD"
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centre"))

## Warning: Column `centre` joining character vector and factor, coercing into
## character vector

Plotting a few different aspects of the data:

# convenience variables for theming
theme_noaxis = theme(axis.ticks.y=element_blank(), axis.text.y=element_blank(), axis.ticks.x=element_blank(), axis.text.x=element_blank())
theme_nolabs = labs(x="", y="")

ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_density2d() + coord_fixed() + labs(title="Density of hawker stall bids") + theme_noaxis + theme_nolabs

The highest density of bids is in the south, especially around the city area. This is surprising as we don’t normally associate the city with hawker centres, but more of shopping centres and restaurants. However, since the data is on bids rather than the number of stalls, it could be that hawker stalls in the city are more competitive.

ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point(alpha=0.3) + coord_fixed() + facet_grid(type ~ year(date)) + labs(title="Distribution of hawker stall bids by year and type", x="", y="") + theme_noaxis

Cooked food and lockup stalls seem to have become more widely distributed over the years.

ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point(alpha=0.3) + coord_fixed() + facet_wrap(~trade) + labs(title="Distribution of hawker stall bids by trade", x="", y="") + theme_noaxis

There does not seem to be any clear pattern of hawker stall locations based on trade.

4. Spatial Point Patterns

The spatstat package allows us to analyse spatial data.

First, we need to convert the dataset to a format that spatstat can work with:

centres.sp <- tenders.sp %>%
  filter(lat > 0) %>%
  group_by(centre, lon, lat) %>%
  summarise(price = mean(priceM2, na.rm = T)) %>%
  replace_na(list(price = 0))

coordinates(centres.sp) <- c('lon', 'lat')
centres.ppp <- unmark(as.ppp(centres.sp))

Load in a Singapore map to use as the boundary:

sg <- readOGR(".", "sg-all")

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/siyan/Documents/2018/T7 making maps/Lab9/lab9rproj", layer: "sg-all"
## with 1 features
## It has 13 fields

sg.window <- as.owin(sg)
centres.ppp <- centres.ppp[sg.window]

Now, we can plot some data!

plot(centres.ppp)

plot(Kest(centres.ppp))

plot(density(centres.ppp, 0.01))

contour(density(centres.ppp, 0.015))

The smoothing parameter adjusts the “fine-ness” of density calculations. When it is very small, individual hawker centres can be seen. When it is big, the density is aggregated over almost the whole of Singapore.

Does the clustering of hawker centres correspond to population densities?

pop <- as.im(readGDAL("sg-pop.tif"))

## sg-pop.tif has GDAL driver GTiff 
## and has 37 rows and 58 columns

plot(rhohat(centres.ppp, pop))

plot(pop)
plot(centres.ppp, add=T)

The locations of hawker centres mostly follows the population density in the south and east of Singapore. Surprisingly, the two areas with highest population density (around Chua Chu Kang and Yio Chu Kang) are not represented in the tenders data. There are also hardly any centres in the north.

A point to note is that our initial data is on bids for hawker centre stalls. Hence, it could be that the hawker centres in these areas did not have any turnover in 2012-2017.