This webpage documents the process of discovering key factors that affect hawker centres in Singapore. The key data source is the list of successful tenders for stalls at hawker centres, obtained from http://www.nea.gov.sg/services-forms/tender-notice. A second data source is the list of locations of all hawker centres in Singapore, obtained from https://data.gov.sg/dataset/hawker-centres.
The following code snippet shows how the original csv file storing the dataset has to be processed before commencing any analysis:
library(data.table)
tenders <- fread("tabula-list-of-successful-tenderers-from-march-2012.csv",header = F)
colnames(tenders) <- c("centre","stall","area","trade","bid","month") #Give names to columns
#First get rid of the blank row, then row 822 too:
tenders <- tenders[centre!="",]
tenders <- tenders[!822,]
#Now to do some tidying up of data
tenders[1:821,stalltype:="cooked"]
tenders[822:nrow(tenders),stalltype:="lockup"]
tenders[,bidNum:=as.numeric(gsub(bid,pattern="\\$|,",replacement = "")),]
tenders[,date:=as.Date(paste0("01-",month),"%d-%b-%Y"),]
tenders[,area:=as.numeric(area)] #Convert area to numeric before using it
tenders[,priceM2:= bidNum/area] #Bid price per square metre of stall area
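To make the tidying steps above concrete, here is a minimal sketch on a toy table. It uses base R only so that it runs without data.table, and the rows, including the "Mar-2012" month format, are assumptions made up for illustration rather than values from the real csv file.

```r
# Toy rows mimicking the raw columns; all values are invented for illustration.
raw <- data.frame(
  bid   = c("$1,234.00", "$850", "$12,000.50"),
  month = c("Mar-2012", "Apr-2013", "Dec-2015"),  # assumed month format
  area  = c("5.5", "10", "6.25"),
  stringsAsFactors = FALSE
)

# Strip dollar signs and thousands separators, then convert to numeric.
raw$bidNum <- as.numeric(gsub("\\$|,", "", raw$bid))

# Prefix a day so as.Date() can parse the month.
# %b relies on the locale using English month abbreviations.
raw$date <- as.Date(paste0("01-", raw$month), "%d-%b-%Y")

# Convert area first, then derive the price per square metre from it.
raw$area <- as.numeric(raw$area)
raw$priceM2 <- raw$bidNum / raw$area

print(raw[, c("bidNum", "date", "priceM2")])
```

If the date column comes back as NA, the likely cause is a non-English locale rather than the data itself.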
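Here I include some plots that aim to answer the following questions: how does the bid price relate to the area of a stall, how does the price per square metre vary by trade, and how has the number of bids changed with time?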
library(ggplot2)
ggplot(tenders,aes(x=area,y=bidNum))+geom_point()+ggtitle("Plot 1: Scatter plot of Bid Price against Area of stall")+xlab("Area of stall")+ylab("Bid Price in dollars")+facet_wrap(~stalltype)
ggplot(tenders[stalltype=="cooked"&trade!="COOKED FOOD(NON-MUSLIM)",],aes(trade,priceM2))+geom_boxplot()+xlab("Trade")+ylab("Price per square metre of stall area")+ggtitle("Plot 2: Distribution of prices per unit area for each type of cooked food stall")
#Note that I have excluded the trade "COOKED FOOD(NON-MUSLIM)" since there is only one such stall
ggplot(tenders,aes(date,fill=stalltype))+geom_bar(position = "stack")+ylab("Number of stalls")+xlab("Date")+ggtitle("Plot 3A: How has the number of bids changed with time?\n(Stacked plot of all bids)")
ggplot(tenders,aes(date,fill=stalltype))+geom_bar()+ylab("Number of stalls")+xlab("Date")+ggtitle("Plot 3B: How has the number of bids changed with time?\n(Plots separated by stall type)")+facet_wrap(~stalltype)
library(rgdal)
h <- readOGR("hawker-centres.kml","HAWKERCENTRE")
## OGR data source with driver: KML
## Source: "hawker-centres.kml", layer: "HAWKERCENTRE"
## with 110 features
## It has 2 fields
plot(h)
#Above shows a quick lat-lon plot of all hawker centres in Singapore
h.t<- data.frame(toupper(h$Name),h@coords[,1:2]) #Store name+lat/lon of hawker centres in a data frame
colnames(h.t) <- c("name","lon",'lat') #Give names to the h.t data frame
tenders.sp <- merge(tenders,h.t,by.x = "centre",by.y = "name",all.x=T)
#Note that tenders.sp above has missing lat/lon values for several entries
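Before reaching for a geocoder, it helps to see exactly which centre names failed to match. The following is a small sketch of that check using base R's merge on invented names; the tables and the spelling mismatch are hypothetical and only illustrate why a left join leaves NA coordinates.

```r
# Invented tender rows; "B FOOD CENTRE" appears twice on purpose.
tenders_toy <- data.frame(
  centre = c("A MARKET", "B FOOD CENTRE", "C MARKET", "B FOOD CENTRE"),
  bid    = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Invented coordinates; note the differently spelled "CENTER" and the missing "C MARKET".
coords_toy <- data.frame(
  name = c("A MARKET", "B FOOD CENTER"),
  lon  = c(103.8, 103.9),
  lat  = c(1.30, 1.35),
  stringsAsFactors = FALSE
)

# Left join: keep every tender row, fill lon/lat with NA where the name is not found.
merged <- merge(tenders_toy, coords_toy, by.x = "centre", by.y = "name", all.x = TRUE)

# Centre names that did not pick up any coordinates.
unmatched <- unique(merged$centre[is.na(merged$lat)])
print(unmatched)
```

Listing the unmatched names this way shows whether the gaps come from spelling differences or from centres that are absent from the location file.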
#Now we try and correct the merged table by using Google's help:
library(ggmap)
centres <- tenders[,list(COUNT=.N),by=centre]
centres[,location:=paste0(centre,", Singapore"),]
#g <- geocode(centres[,location,],output = "latlon",source="google",sensor="F")
#Due to limitations on campus, use the prepared csv file instead
g <- read.csv("centres-geocoded.csv")
centres <- cbind(centres,g)
tenders.sp <- merge(tenders,centres,by.x = "centre",by.y = "centre",all.x=T)
#Now a quick test to see if everything's all right:
ggplot(tenders.sp,aes(x=lon,y=lat,size=priceM2,color=stalltype))+geom_point(alpha=0.3)+coord_fixed()+ggtitle("Basic plot of all stalls, coloured by stall type, sized by price per square metre")
#Using a density plot instead: 2 variants here
ggplot(tenders.sp,aes(x=lon,y=lat))+geom_point()+geom_density2d()+coord_fixed()+ggtitle("Contour plots of bids by location")
ggplot(tenders.sp,aes(x=lon,y=lat))+geom_point()+geom_hex()+coord_fixed()+ggtitle("Using hex bins for density plots")
ggplot(tenders.sp,aes(x=lon,y=lat))+geom_point()+geom_density2d()+coord_fixed()+ggtitle("Contour plots of bids by location, segregated by type of stall")+facet_wrap(~stalltype)
tenders.sp$year <- substr(tenders.sp$date,1,4)
ggplot(tenders.sp[!1749,],aes(x=lon,y=lat))+geom_point()+geom_density2d()+coord_fixed()+ggtitle("Contour plots of bids by location, segregated by year")+facet_wrap(~year) #Note that row 1749 is an invalid row and ought to be discarded ...
The plot segregated by stall type shows pronounced clustering of bids for cooked food stalls at a hawker centre somewhere in the central part of Singapore (Ang Mo Kio?), while the clustering is much weaker (though not absent) for lockup stalls.
The plot segregated by year shows that the spatial distribution of bids remains about the same from 2012 to 2016, with the ‘peaks’ at approximately the same places and a fairly even distribution of bids elsewhere in Singapore.
library(spatstat)
library(maptools)
#The following code creates a list of unique hawker centres:
centres.sp <- tenders.sp[lat>0,list(lon=lon[1],lat=lat[1],price=mean(priceM2),count=.N),by=centre]
centres.sp[is.na(price),price:=0,] #Replace all NA prices with 0
coordinates(centres.sp) <- c("lon","lat")
centres.ppp <- unmark(as.ppp(centres.sp))
Here’s a quick look at the first scatter plot of points representing the hawker centres of Singapore.
plot(centres.ppp, main="Quick scatter plot of all hawker centres, round 1")
sg <- readOGR(".","sg-all")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "sg-all"
## with 1 features
## It has 13 fields
sg.window <- as.owin(sg)
centres.ppp <- centres.ppp[sg.window]
plot(centres.ppp, main="Quick scatter plot of all hawker centres, round 2")
Note that the outline of Singapore is now present and will be used as the observation window for subsequent analysis.
plot(Kest(centres.ppp),main="Ripley's K function: estimates lying above the theoretical \n(Poisson) curve suggest clustering.")
plot(density(centres.ppp, 0.02), main = "Kernel density of hawker centres (bandwidth 0.02 degrees): \nbrighter colours denote a higher estimated density of points in that area")
contour(density(centres.ppp, 0.02), main="This plot shows a 'contour' map that represents the density plot above, \nwith a greater height representing greater clustering of points.")
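For readers unfamiliar with the K function plotted above: for a completely random (Poisson) pattern, K(r) equals pi*r^2, and larger values suggest clustering at distance r. The sketch below is a naive base R estimator on simulated points in a unit square. It is not what Kest does internally (Kest applies edge corrections), and the function name naive_k and all simulated data are my own assumptions for illustration.

```r
set.seed(1)

# Naive K estimate without edge correction:
# area / (n * (n - 1)) * number of ordered pairs of points no more than r apart.
naive_k <- function(x, y, r, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf  # exclude each point's zero distance to itself
  area * sum(d <= r) / (n * (n - 1))
}

n <- 200

# Pattern 1: points placed uniformly at random in the unit square.
rx <- runif(n)
ry <- runif(n)

# Pattern 2: points scattered tightly around 10 random "parent" locations.
# A few points may fall just outside the unit square; area = 1 is an approximation.
px <- runif(10)
py <- runif(10)
idx <- sample(10, n, replace = TRUE)
cx <- px[idx] + rnorm(n, sd = 0.02)
cy <- py[idx] + rnorm(n, sd = 0.02)

r <- 0.05
k_csr       <- pi * r^2                       # theoretical value under randomness
k_random    <- naive_k(rx, ry, r, area = 1)
k_clustered <- naive_k(cx, cy, r, area = 1)

print(c(theoretical = k_csr, random = k_random, clustered = k_clustered))
```

The clustered pattern should give a K value well above the theoretical one, which is the same reading applied to the Kest plot for the hawker centres.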
pop <- as.im(readGDAL("sg-pop.tif"))
## sg-pop.tif has GDAL driver GTiff
## and has 37 rows and 58 columns
plot(rhohat(centres.ppp, pop),main="Plot of relationship between \n'intensity' of hawker centres vs neighbouring populations")
plot(rhohat(centres.ppp, pop, weights=centres.sp$price),main="Plot of relationship between 'intensity' of hawker centres vs neighbouring \npopulations, with intensity weighted by average price of each stall")
It can be seen that in both plots above, an increase in population is generally accompanied by an increase in ‘intensity’ of hawker centres. However, beyond a population of about 10000, both plots show a sharp decrease in ‘intensity’ of hawker centres. The following code investigates the possible reasons by locating these outliers.
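As a rough illustration of what "locating these outliers" means, the sketch below picks out grid cells whose population exceeds 10000 and reports how many hawker centres fall in each. It works on plain matrices with invented numbers; in the analysis here pop is a spatstat image object rather than a matrix, so this is only an analogy and not the method used on this page.

```r
# Invented 3 x 4 population grid (people per cell).
pop_toy <- matrix(c( 2000,  5000, 12000,   800,
                     9000, 15000,  3000, 11000,
                      500,  7000, 10000,  4000),
                  nrow = 3, byrow = TRUE)

# Invented number of hawker centres in each cell.
centres_toy <- matrix(c(1, 2, 0, 0,
                        3, 1, 1, 0,
                        0, 2, 2, 1),
                      nrow = 3, byrow = TRUE)

threshold <- 10000

# Row/column indices of the cells above the population threshold.
dense <- which(pop_toy > threshold, arr.ind = TRUE)

result <- data.frame(
  row        = dense[, "row"],
  col        = dense[, "col"],
  population = pop_toy[dense],
  centres    = centres_toy[dense]
)
print(result)

# Average number of centres per cell, dense cells versus the rest.
mean_dense <- mean(centres_toy[pop_toy > threshold])
mean_other <- mean(centres_toy[pop_toy <= threshold])
print(c(dense = mean_dense, other = mean_other))
```

In this toy grid the densest cells hold fewer centres than the rest, which mirrors the pattern the population heat map is used to check.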
plot(pop,main="Heat map of population in Singapore")
plot(centres.ppp, add=T)
The underlying heat map shows that there is a high population density in the towns of Jurong West and Woodlands, and in the north-eastern towns of Hougang and Sengkang. The overlay of hawker centres shows that these same towns do not have many hawker centres: about 2 for Jurong West, 1 for Woodlands and none in Hougang and Sengkang.
There is a very good writeup on why this is the case by Azhar Ghani from the Institute of Policy Studies, in his report titled A Recipe for Success: How Singapore Hawker Centres Came to Be. A short quote from the then-Prime Minister, Goh Chok Tong, sums it up nicely:
“The reason why we stopped was very simple: we were not in the business of providing food stalls. The only reasons we built hawker centres was because we needed to get hawkers off the streets.” - Goh Chok Tong
As a result, when all the street hawkers had been relocated to hawker centres, the government stopped building new hawker centres. Yet at the same time, new towns that were being built were progressively getting denser in terms of population, which resulted in this situation where the areas with the highest population density have some of the lowest ‘intensity’ of hawker centres.
As a side note, this also supports why I like coming to SUTD - there’s good and cheap food to be found in the vicinity of the campus, which is unfortunately not the case back home …