This week, we explored the characteristics and spatial distribution of hawker stalls in Singapore, using data on tender bids for hawker stalls from March 2012 to February 2016 from the NEA website.
First, data clean-up was in order, following instructions from the lab:
library(data.table)
tenders <- fread("tabula-list-of-successful-tenderers-from-march-2012.csv", header = F)
colnames(tenders) <- c("centre", "stall", "area", "trade", "bid", "month") # add headers to table
# clean up data
tenders <- tenders[centre != ""] # remove empty rows
tenders <- tenders[!822] # remove 'lock-up stalls' row
tenders <- tenders[!1255] # remove 'market stalls' row
tenders[1:821, type:="cooked food"] # set type to cooked
tenders[822:1254, type:="lock-up"] # set type to lock-up
tenders[1255:nrow(tenders), type:="market"] # set type to market
tenders[,bidNum:=as.numeric(gsub(bid,pattern="\\$|,",replacement = ""))] # populate new column "bidNum" with bid prices with dollar signs removed
tenders[,date:=as.Date(paste0("01-", month), "%d-%b-%Y"),] # create new column "date" with properly formatted date
tenders[,priceM2:=bidNum/as.numeric(area)] # create new column "priceM2", price per area
tenders[,area:=as.numeric(area)] # format area
head(tenders) # everything ok?
## centre stall area trade bid month
## 1: AMOY STREET FOOD CENTRE 01-68 5.65 HALAL COOKED FOOD $1,800.00 Jun-2013
## 2: AMOY STREET FOOD CENTRE 01-68 5.65 HALAL COOKED FOOD $1,188.88 Aug-2014
## 3: AMOY STREET FOOD CENTRE 01-68 5.65 HALAL COOKED FOOD $2,240.00 Sep-2015
## 4: AMOY STREET FOOD CENTRE 01-69 5.70 HALAL COOKED FOOD $300.00 Jun-2013
## 5: AMOY STREET FOOD CENTRE 01-69 5.70 HALAL COOKED FOOD $1,100.00 Oct-2013
## 6: AMOY STREET FOOD CENTRE 01-69 5.70 HALAL COOKED FOOD $1,488.88 Aug-2014
## type bidNum date priceM2
## 1: cooked food 1800.00 2013-06-01 318.58407
## 2: cooked food 1188.88 2014-08-01 210.42124
## 3: cooked food 2240.00 2015-09-01 396.46018
## 4: cooked food 300.00 2013-06-01 52.63158
## 5: cooked food 1100.00 2013-10-01 192.98246
## 6: cooked food 1488.88 2014-08-01 261.20702
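As a quick sanity check of the parsing steps above, here is a minimal base-R sketch on two made-up rows. The data frame `raw` and the use of `month.abb` are my own illustration, not part of the lab instructions; matching against `month.abb` is one way to avoid relying on the locale-dependent `%b` format.

```r
# Toy rows mimicking the raw CSV columns (values chosen for illustration)
raw <- data.frame(
  area  = c("5.65", "5.70"),
  bid   = c("$1,800.00", "$300.00"),
  month = c("Jun-2013", "Oct-2013"),
  stringsAsFactors = FALSE
)

# Strip dollar signs and thousands separators, then convert to numeric
raw$bidNum <- as.numeric(gsub("\\$|,", "", raw$bid))

# Build the date from month.abb so the result does not depend on the locale
mon <- match(substr(raw$month, 1, 3), month.abb)
yr  <- as.integer(substr(raw$month, 5, 8))
raw$date <- as.Date(sprintf("%04d-%02d-01", yr, mon))

# Price per square metre
raw$priceM2 <- raw$bidNum / as.numeric(raw$area)

print(raw)
```

If any `bidNum` or `date` comes out as `NA`, the raw string did not match the expected pattern and is worth inspecting before going further.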
Now, I wanted to see whether the type of stall had any correlation with bid prices. From NEA’s hawker stall registration form, I gathered that Cooked Food stalls sell prepared, ready-to-eat dishes such as “chicken rice”, “fishball noodles”, “nasi padang”, “drinks”, “desserts”, etc. Market/Lock-up stalls sell raw ingredients such as “fresh seafood”, “vegetables”, “eggs”, etc., with Lock-up stalls (I’m guessing) being stalls which can be closed off with pull-down shutters, and Market stalls being open-area stalls situated in a large market space.
I hypothesized that Cooked Food stalls would be more expensive than other kinds due to necessary utilities such as a kitchen and stoves, running water, and perhaps a slightly higher profit margin on prepared food as opposed to food sold by Market/Lock-up stalls. As the plot below shows, Market and Lock-up stalls are indeed cheaper than Cooked Food stalls, with Lock-up stalls being the more expensive of the two, likely due partly to the additional security provided by Lock-up stalls.
library(ggplot2)
# Function for calculating 25th and 75th percentiles
median.quartile <- function(x){
  out <- quantile(x, probs = c(0.25,0.75))
  names(out) <- c("ylower","yupper") # 25th percentile is the lower marker, 75th the upper
  return(out)
}
# Violin plot of bid price against type
vp <- ggplot(tenders, aes(type, bidNum, fill=type)) + geom_violin()
# Add median and interquartile range markers
vp <- vp + stat_summary(fun.y="median", geom="point", color="black", shape=18, size=3) + stat_summary(fun.y=median.quartile, geom="point", color="black", shape=18, size=2)
# Styling
vp <- vp + scale_fill_discrete(h = c(200, 360)) + ggtitle("Bid price according to type of stall") + scale_x_discrete("Type of Stall") + scale_y_continuous("Bid price/SGD") + theme(legend.position = "none")
vp
I chose to display the median and interquartile range rather than the mean and standard deviation (stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1))) because the distributions are skewed, especially that of Market stalls. The median bid prices of Cooked Food, Lock-up, and Market stalls are $1254.52, $617.98, and $195.07 respectively. The plot and these values might be useful to someone wanting to bid for a stall: they could look at the median successful bid price for the stall type they want and bid somewhere between the 25th and 50th percentile for a decent chance of success while saving money, or bid above the 50th percentile if they are willing to pay more for a higher chance of success.
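To show why the median is the safer summary for skewed bids, here is a small sketch on made-up numbers (the `bids` data frame is hypothetical, not the NEA data). It computes the quartiles per stall type in base R and compares the mean with the median.

```r
# Made-up bids: each type has one large outlier, mimicking a right-skewed distribution
bids <- data.frame(
  type   = rep(c("cooked food", "market"), each = 5),
  bidNum = c(800, 1000, 1200, 1500, 4000,
             50, 100, 200, 300, 900)
)

# 25th, 50th and 75th percentiles per type (rows = percentiles, columns = types)
q <- sapply(split(bids$bidNum, bids$type), quantile, probs = c(0.25, 0.5, 0.75))
print(q)

# The outlier pulls the mean well above the median
cooked_mean   <- mean(bids$bidNum[bids$type == "cooked food"])
cooked_median <- median(bids$bidNum[bids$type == "cooked food"])

# With the real data.table, the equivalent would be something like:
# tenders[, as.list(quantile(bidNum, c(0.25, 0.5, 0.75))), by = type]
```

A bidder reading the table would look down the column for their stall type and pick a figure between the first two rows.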
I was curious whether bid prices scaled with stall area, so I made a scatterplot of area against price:
ggplot(tenders, aes(bidNum, area, color=type)) + geom_jitter() + ggtitle("Stall area against bid prices") + scale_x_continuous("Bid price/SGD") + scale_y_continuous("Stall area/m2") + scale_color_discrete("Stall type", h = c(200, 360))
To my surprise, the stalls with the largest areas were not the most expensive: there was no strong correlation between area and price. Stall type clearly correlated with price, which we had already established in the previous plot, though this was a nice alternative visualisation. The plot also showed an additional dimension, area, and revealed that Market stalls have the smallest areas (which we could not discern from the previous plot alone), while Lock-up and Cooked Food stalls span roughly the same range of areas, though Cooked Food stalls are more expensive than Lock-up stalls on average.
The lack of correlation is likely because other factors, such as location, influence bid prices more than area does (case in point: there is a successful bid of $4000 for a 6 sqm stall versus $500 for a 12 sqm stall), especially since the total range of stall areas is small, at around 16 square metres. Hawker stalls can only get so big, after all.
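The "no strong correlation" impression from the scatterplot could be backed up with a number. Below is a minimal sketch using a rank (Spearman) correlation, which I chose here because bid prices are skewed; the six stalls are invented, with the first two echoing the $4000 and $500 bids mentioned above.

```r
# Hypothetical stalls: area in square metres, winning bid in SGD
stalls <- data.frame(
  area   = c(6, 12, 8, 10, 7, 11),
  bidNum = c(4000, 500, 1200, 1500, 900, 2000)
)

# Spearman correlation works on ranks, so a single huge bid cannot dominate it
rho <- cor(stalls$area, stalls$bidNum, method = "spearman")
print(rho)
```

On the real data, something like `tenders[, cor(area, bidNum, method = "spearman", use = "complete.obs"), by = type]` would give one coefficient per stall type, which separates the effect of area from the effect of type.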
Finally, to find out how bid prices varied according to time, I plotted the mean bid price of each month against the date with a line graph.
# Create column "year"
tenders[,year:=substr(date, 1, 4)]
# Plot mean bid prices by date
lprice <- ggplot(tenders[,list(price=mean(bidNum)),by=date], aes(date, price, group=1)) + geom_line()
#Styling
lprice <- lprice + ggtitle("Mean bid prices by month") + scale_y_continuous("Bid price/SGD")
lprice
The resultant line graph had some noticeable fluctuations, with the highest mean bid prices in March 2012, mid-2013, mid-2015, and late 2016. However, I also wanted to separate the line graph into one line for each stall type, to see if the spikes in prices were distributed across all stall types or if a particular stall type had any specific trends:
library(plyr)
# Aggregate and get mean of bid prices by date, preserving stall type
tenders.meanprices <- ddply(tenders,.(type,date),
summarize, value = mean(bidNum))
# Plot mean bid price by date, colored by type
lpricetyped <- ggplot(tenders.meanprices, aes(x=date, y=value, group=type, color=factor(type))) + geom_line()
# Styling
lpricetyped <- lpricetyped + scale_color_discrete("Stall type", h = c(200, 360)) + ggtitle("Mean bid prices by month and type") + scale_y_continuous("Bid price/SGD")
lpricetyped
The plot shows that prices spiked uniformly across all types in March 2012, but the spikes in mid-2013 and late 2016 were driven by Cooked Food and Lock-up stalls only, and the spike in mid-2015 was due to Cooked Food alone.
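The group-wise mean behind this plot does not strictly need plyr. As a rough sketch, here is the same aggregation in base R on a made-up `bids` table; the column names follow the ones used above, but the values are invented.

```r
# Made-up bids across two months and two stall types
bids <- data.frame(
  type   = c("cooked food", "cooked food", "market", "market", "market"),
  date   = as.Date(c("2013-06-01", "2013-06-01", "2013-06-01",
                     "2013-07-01", "2013-07-01")),
  bidNum = c(1000, 2000, 100, 200, 400)
)

# One row per (type, date) combination that actually occurs in the data
mean_by_type <- aggregate(bidNum ~ type + date, data = bids, FUN = mean)
print(mean_by_type)

# With the real data.table, the equivalent would be something like:
# tenders[, .(value = mean(bidNum)), by = .(type, date)]
```

Note that months with no bids for a given type simply have no row, so `geom_line()` will draw a straight segment across the gap rather than dropping to zero.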
Next, we need to geocode the centres and prepare them for spatial analysis:
library(ggmap)
# First replace ampersands in centre names with AND, or you'll get something like
# Warning message:
# geocode failed with status ZERO_RESULTS, location = "TAMAN JURONG MARKET & FOOD CENTRE, Singapore"
tenders[,centre:=gsub(centre, pattern="&", replacement = "AND"),]
# List unique centre names
centres <- tenders[,list(count=.N),by=centre]
# Append ", Singapore"
centres[,location:=paste0(centre, ", Singapore"),]
# Geocode
g <- geocode(centres[,location,], output ="latlon", source = "google", sensor = F)
# Merge back with rest of data into tenders.sp
centres <- cbind(centres, g)
tenders.sp <- merge(tenders, centres, by.x="centre", by.y="centre", all.x = T)
# Test geocoded coordinates with a plot of prices and types
ggplot(tenders.sp, aes(x=lon, y=lat, size=priceM2, color=type)) + geom_point(alpha=0.3) + coord_fixed()
# To circumvent overplotting, plot a density map instead:
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_density2d() + coord_fixed()
# Or a hex plot:
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_hex() + coord_fixed()
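Since a failed geocode only shows up as a warning, it can be worth listing the centres that came back without coordinates before merging. Here is a minimal sketch of that check; the data frame `g` stands in for the geocoder output and its coordinates are made up, so no API call is involved.

```r
# Two centre names, one of them containing an ampersand
centre <- c("TAMAN JURONG MARKET & FOOD CENTRE", "AMOY STREET FOOD CENTRE")

# Same clean-up as above: replace "&" with "AND", then append the country
location <- paste0(gsub("&", "AND", centre, fixed = TRUE), ", Singapore")

# Pretend geocoder output (made-up values; NA marks a failed lookup)
g <- data.frame(lon = c(NA, 103.8), lat = c(NA, 1.28))

# Query strings that still need attention
failed <- location[is.na(g$lon) | is.na(g$lat)]
print(failed)
```

Any names listed in `failed` can then be corrected by hand or retried, rather than silently dropping out of the maps.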
Now that all centres have been geocoded, we can visualize the spatial distributions of bids, and facet them by several attributes to investigate whether any patterns will emerge.
# wrap trade names
library(stringr)
tenders.sp[, wraptrade:= str_wrap(trade, width = 16)]
# plot bids faceted by trade coloured by type
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_jitter(aes(group=type, color=type)) + coord_fixed() + facet_wrap(~wraptrade) + scale_color_discrete("Stall type", h = c(200, 360)) + ggtitle("Bids for Hawker Stalls in Singapore by Trade and Type") + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), legend.position = "bottom")
The number of bids per trade shows distinct spatial patterns. For example, Piece and Sundry Goods and Preserved and Dried Goods stalls are found in the same areas of Singapore, probably the larger hawker centres that offer not just Cooked Food stalls but also Lock-up stalls (or, less frequently, Market stalls) selling these two similar kinds of goods. Compared with perishables, these goods have lower daily demand, which is perhaps why all Lock-up stalls are concentrated towards central Singapore, with none located in more remote areas. Or perhaps hawker centres outside the central region are simply not furnished with Lock-up stalls.
The more niche/specialty market stalls such as Halal Mutton, Halal Beef, Fresh Seafood, and Bean Cakes & Noodles can be found not just in the central regions but in the remote regions as well, perhaps because people are willing to travel further for these specialty food items.
Cooked Food and Halal Cooked Food stalls, like all the other Cooked Food trades coloured in blue, are evenly spread through the residential areas of Singapore, since most public housing estates and residential areas have a neighbourhood hawker centre or two where residents dine.
On a side note, because the points are coloured by type, one can also see which goods are sold only at particular types of stall.
# plot bids faceted by type
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_jitter(aes(group=type, color=type)) + coord_fixed() + facet_wrap(~type) + scale_color_discrete("Type", h = c(200, 360)) + ggtitle("Bids for Hawker Stalls in Singapore by Type") + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), legend.position = "bottom")
As for the spatial distribution of bids per type, there isn’t much else to glean apart from the fact that Cooked Food and Market stalls are spread throughout the island, Market stalls more so than Cooked Food stalls, while Lock-up stalls are not found in remote regions.
# plot bids faceted by year, coloured by type
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_density2d(aes(group=type, color=type)) + coord_fixed() + facet_wrap(~year) + scale_color_discrete("Stall type", h = c(200, 360)) + ggtitle("Bids for Hawker Stalls in Singapore by Year and Type") + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), legend.position = "bottom")
To mix things up a bit, I used a density plot this time. I initially plotted the individual points, but my eye couldn’t discern meaningful patterns from the dots alone, so I switched to a density plot to bring them out.
While 2016 and 2012 are not complete sets, and hence on the whole are less dense than the rest of the years, one can see that there is a subtle decrease in number of stall bids from 2013 to 2014 across all stall types.
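A decrease this subtle is hard to judge from density contours alone, so a plain count per year and type is a useful cross-check. The sketch below does this on a made-up `bids` table; the column names match the ones used above, but the rows are invented.

```r
# Made-up bids, one row per successful tender
bids <- data.frame(
  year = c("2013", "2013", "2013", "2014", "2014", "2013", "2014"),
  type = c("cooked food", "cooked food", "market", "cooked food",
           "market", "lock-up", "lock-up")
)

# Cross-tabulate: rows are years, columns are stall types
counts <- table(bids$year, bids$type)
print(counts)

# Year-on-year change per stall type (negative means fewer bids in 2014)
change <- counts["2014", ] - counts["2013", ]
print(change)
```

With the real data, `tenders.sp[, .N, by = .(year, type)]` should give the same counts in long form.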
Format the data for spatial analysis.
library(sp)
library(spatstat)
library(maptools)
# create a table with one row for each centre, including the location but also the average price and the number of bids.
centres.sp <- tenders.sp[lat > 0,list(lon=lon[1], lat=lat[1], price=mean(priceM2), count=.N),by=centre]
centres.sp[is.na(price),price:=0,] #NA to 0
# tell R which columns in our data table are spatial coordinates
coordinates(centres.sp) <- c('lon', 'lat')
# convert our spatial dataset to a format that spatstat can understand (‘ppp’)
centres.ppp <- unmark(as.ppp(centres.sp))
plot(centres.ppp)
By default, the bounding box of all points is used as the plotting extent. But we want the outline of Singapore to be the bounds:
# define shape of Singapore as 'window' for spatial point pattern
library(rgdal)
sg <- readOGR(".", "sg-all")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "sg-all"
## with 1 features
## It has 13 fields
sg.window <- as.owin(sg)
centres.ppp <- centres.ppp[sg.window]
plot(centres.ppp)
There we go!
Now we can compare the density of points in centres.ppp object with a random Poisson process to see how clustered, dispersed or randomly distributed the points are.
plot(Kest(centres.ppp))
The points are quite clustered: the estimated K functions for the observed pattern (the top three lines, each using a different edge correction) lie above the theoretical curve for a Poisson process.
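To make the comparison concrete, here is a naive sketch of what the K function measures. It is not what `Kest` computes internally, since it ignores edge corrections, and the six points are made up. For a completely random (Poisson) pattern the expected value is pi * r^2, so an estimate well above that suggests clustering at distance r.

```r
# Two tight, made-up clusters inside a unit square (area A = 1)
x <- c(0.10, 0.11, 0.12, 0.90, 0.91, 0.92)
y <- c(0.10, 0.11, 0.12, 0.90, 0.91, 0.92)
n <- length(x)
A <- 1

# Naive estimate: scaled count of ordered pairs of distinct points within distance r
naive_K <- function(r) {
  d <- as.matrix(dist(cbind(x, y)))
  close_pairs <- sum(d <= r) - n  # subtract the n zero distances on the diagonal
  A * close_pairs / (n * (n - 1))
}

r <- 0.05
K_obs     <- naive_K(r)
K_poisson <- pi * r^2  # expected value under complete spatial randomness

print(c(observed = K_obs, poisson = K_poisson))
```

Here the observed value is far above the Poisson benchmark, which is the same "curve above the theoretical line" signal seen in the `Kest` plot.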
Hawker centres are clustered where the people are, or at least that’s what you’d naturally assume. To test this assumption, we can look at the population distribution of Singapore and compare it with the spatial distribution of hawker centres.
# raster image of Singapore population
pop <- as.im(readGDAL("sg-pop.tif"))
## sg-pop.tif has GDAL driver GTiff
## and has 37 rows and 58 columns
# computes a smoothing estimate of the intensity of a point process (*hawker centres point pattern*), as a function of a (continuous) spatial covariate (*population*).
plot(rhohat(centres.ppp, pop))
plot(rhohat(centres.ppp, pop, weights=centres.sp$price))
The graph goes as expected up till around pop=11000, with the intensity of hawker centres increasing as the population of an area increases.
However, a dip then occurs, up until the very end of the population axis, where there are two hawker centres off to the side. To understand why, we can try overlaying the hawker centres distribution over the population raster image:
plot(pop)
plot(centres.ppp, add=T)
Because I’m not too familiar with Singapore’s neighbourhoods, I took the government’s planning area data and overlaid it on top of the above plot:
I’ve surmised that those two lonely hawker centres off to the high end of the population axis are the hawker centres situated in Woodlands and Sengkang, both of which are among the most densely populated neighbourhoods in Singapore, and both of which boast only one hawker centre in their vicinities.
Now the entire length of the unexpected dip in intensity from pop=11000 up until the Woodlands and Sengkang hawker centres on the rhohat plot, can be attributed to a select subset of neighbourhoods in Singapore that are relatively more densely populated than other neighbourhoods, and yet have few to no hawker centres. For example, the cells around Tampines and Jurong West would have made up some of the last few hawker centres along the rhohat plot right before the Woodlands and Sengkang hawker centres. Cells around Pasir Ris, Bukit Panjang, Bukit Batok, Sembawang, and Yishun would have made up the part of the dip where there are no hawker centres at all from around pop=13000 to pop=17500 – these neighbourhoods are relatively densely populated, though less so than Woodlands and Sengkang, but have no hawker centres at all.
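The reasoning above can be checked crudely without any smoothing: bin the raster cells by population and compute the average number of hawker centres per cell in each bin. This is only a rough stand-in for what `rhohat` estimates, and the cell values, counts, and bin breaks below are all made up for illustration.

```r
# Made-up raster cells: population of each cell and number of hawker centres in it
pop_cells     <- c(2000, 5000, 8000, 11000, 14000, 16000, 19000, 9000)
centres_cells <- c(0,    1,    2,    3,     0,     0,     1,     2)

# Group cells into population bands (breaks chosen arbitrarily)
band <- cut(pop_cells, breaks = c(0, 6000, 12000, 18000, 24000))

# Average number of centres per cell in each band
rate <- tapply(centres_cells, band, mean)
print(rate)
```

In this toy example the rate rises, drops to zero in the third band, and recovers in the last one, which is the same shape as the dip described above: populated cells with no hawker centres pull the estimate down.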
The rhohat plot also omits some important areas, since it only extends as far as the last hawker centre in the most populated area (the hawker centre in either Woodlands or Sengkang). There are still neighbourhoods even more densely populated than Woodlands and Sengkang that have no hawker centres in their vicinities: Choa Chu Kang (CCK) and Sengkang. Poor CCK and Sengkang residents. The good news is that the government is working on giving these underserved neighbourhoods some hawker centres. Just a few months ago, Bukit Panjang’s first ever hawker centre opened, and others on the list of areas identified for these new hawker centres include Punggol, Yishun, Pasir Ris, Bukit Batok, and even Choa Chu Kang.
However, there is still no talk of Sengkang getting a hawker centre anytime soon. Perhaps it is because Sengkang residents are happy with the neighbourhood private food centres (such as Kopitiam food centres). Perhaps someone doesn’t like Sengkang. To this day, it remains a mystery.