The main data is from NEA (see under “List of successful Tenderers”). The library data.table was used to handle the data as there are a lot of entry. How it works is that it is a subset of data.frame and easier to manage larger quantitativeness.
library(data.table)
tenders<-fread("tabula-tender-bids-from-mar-2012-to-jan-2017.csv",header = F,fill=TRUE) #fill is true because ther is one random row that has one column instead of 6 columns because of error out. when fill = TRUE, if there are cols less than expect, fill up the empty cols with empty
colnames(tenders)<-c("centre","stall","area","trade","bid","month") #add in headers to the cols
Tabula was used to convert data from .pdf to .csv. However, the conversion is not perfect and as such, cleaning up is needed. I manually cleaned up the csv first by using a text editor (such as missing commas that results in wrong text splitting) before using code to finish up:
tenders <- tenders[centre!="",] #update tenders without the rows that have no centre names
tenders <- tenders[!1135,]
tenders[1:1134,type:="cooked",] #subset row 1 - 1134 as cooked
tenders[1135:nrow(tenders),type:="lockup",] #subset 1135 to last row of tenders as lockup
tenders[,bidNum:=as.numeric(gsub(bid,pattern = "\\$|,", replacement = "")),] #create new col bidNum without $ and ,
tenders[,date:=as.Date(paste0("01-",month),"%d-%b-%Y"),] #create new date col formatted as YYYY-MM-01
tenders[,priceM2:=bidNum/as.numeric(area),] # create new price per m^2 priceM2 col
tenders<-tenders[!1749]
tenders[1749:nrow(tenders),type:="market",]
The relationship bewtween area and price can be found by plotting with ggplot:
library(ggplot2)
ggplot(tenders,aes(x=area,y=priceM2,group=1)) + geom_line() + xlab("Area of Stall (per square meters") + ylab("Price per square meters") + ggtitle("Graph of Price against Area of Store")
It can be seen that there seems to be a distribution. The graph shows that the most bids were made for stall areas between \(5m^{2}\) to \(10m^{2}\). The most popular area size seems to be at \(6m^{2}\) to \(7m^{2}\) as the price bidded is extremely high. The size might be the optimal size that hawkers need - a just right space to help them prepare food while maximising out the profits they gained. Another possible reason for the popularity could be that NEA mainly builds stalls with those areas.
The average price per \(m^{2}\) per hawker centre is shown below, printed out. The table data is also arranged in descending order of price too!
library(plotly)
print(tenders[,list(Sign='$',Price = mean(priceM2)),by=centre][order(-Price)],nrows = 115, row.names = F)
## centre Sign Price
## BLK 22A HAVELOCK ROAD $ 430.376344
## BLK 14 HAIG ROAD $ 410.028249
## BLK 85 REDHILL LANE $ 409.151140
## BLK 32 NEW MARKET ROAD $ 401.874813
## AMOY STREET FOOD CENTRE $ 357.689442
## BLK 503 WEST COAST DRIVE $ 345.788772
## BLK 50A MARINE TERRACE $ 335.563568
## BLK 79 TELOK BLANGAH DRIVE $ 306.367925
## MAXWELL FOOD CENTRE $ 305.692193
## MARKET STREET FOOD CENTRE $ 297.632945
## BLK 51 OLD AIRPORT ROAD OLD AIRPORT ROA $ 285.984848
## NEWTON FOOD CENTRE $ 280.927536
## EAST COAST LAGOON FOOD VILLAGE $ 278.058148
## BLK 6 JALAN BUKIT MERAH $ 249.930673
## GEYLANG SERAI MARKET $ 217.624649
## BLK 665 BUFFALO ROAD $ 217.512881
## BLK 79A CIRCUIT ROAD $ 210.805318
## ZION RIVERSIDE FOOD CENTRE $ 196.549890
## BLK 226H ANG MO KIO STREET 22 $ 183.225806
## BLK 162 ANG MO KIO AVENUE 4 $ 180.995475
## BLK 120 BUKIT MERAH LANE 1 $ 177.649573
## BLK 75 TOA PAYOH LORONG 5 $ 173.611111
## BLK 37A TEBAN GARDENS ROAD $ 173.512175
## BLK 254 JURONG EAST STREET 24 $ 170.808935
## BLK 335SMITH STREET $ 167.017544
## BLK 51 OLD AIRPORT ROAD $ 165.707887
## BLK 20 GHIM MOH ROAD $ 163.473193
## BLK 79 CIRCUIT ROAD $ 161.330986
## BLK 216 BEDOK NORTH STREET $ 152.328932
## BLK 17 UPPER BOON KENG ROAD $ 146.890328
## HOLLAND VILLAGE MARKET $ 144.160303
## BLK 453A ANG MO KIO AVENUE 10 $ 141.796515
## BLK 91 WHAMPOA DRIVE $ 141.730769
## BLK 16 BEDOK SOUTH ROAD S(460016) $ 132.831001
## BLK 409 ANG MO KIO AVENUE $ 131.978022
## BLK 117 ALJUNIED AVENUE 2 $ 126.383741
## BLK 49 SIMS PLACE $ 125.022218
## BLK 628 ANG MO KIO AVENUE 4 $ 119.166593
## SERANGOON GARDEN MARKET $ 116.932199
## BLK 3 CHANGI VILLAGE ROAD $ 116.729403
## BLK 38A BEO CRESCENT S(169982) $ 115.107914
## BLK 105 HOUGANG AVENUE 1 $ 114.449472
## BLK 726 CLEMENTI WEST STREET 2 $ 112.682440
## BLK 7 EMPRESS ROAD $ 111.915587
## BLK 58 NEW UPPER CHANGI ROAD $ 99.978956
## BLK 210TOA PAYOH LORONG 8 $ 99.672131
## BLK 209 HOUGANG STREET 21 $ 96.656217
## BLK 44 HOLLAND DRIVE $ 95.646219
## TIONG BAHRU MARKET $ 94.296341
## GOLDEN MILE FOOD CENTRE $ 92.097703
## BERSEH FOOD CENTRE $ 90.110381
## HOLLAND ROAD MARKET $ 90.070086
## BLK 335 SMITH STREET $ 89.584482
## BLK 90 WHAMPOA DRIVE $ 88.942126
## BLK 347 JURONG EAST AVENUE 1 $ 82.808113
## BLK 527 ANG MO KIO AVENUE 10 $ 82.301221
## BLK 29 BENDEMEER ROAD $ 80.970900
## BLK 226D ANG MO KIO AVENUE1 $ 80.288462
## BLK 11 TELOK BLANGAH CRESCENT $ 79.591577
## BLK 531A UPPER CROSS STREET $ 78.125384
## BUKIT TIMAH MARKET $ 74.977919
## DUNMAN FOOD CENTRE $ 72.002692
## BLK 216 BEDOK NORTH STREET 1 $ 71.406815
## BLK 92 WHAMPOA DRIVE $ 71.147316
## TAMAN JURONG MARKET & FOOD CENTRE $ 71.130515
## BLK 115 BUKIT MERAH VIEW $ 70.416149
## BLK 21 MARSILING LANE $ 70.111643
## BLK 85 BEDOK NORTH STREET 4 $ 67.264522
## BLK 4A EUNOS CRESCENT $ 66.669064
## BLK 22 TOA PAYOH LORONG 7 $ 66.296008
## BLK 127 TOA PAYOH LORONG 1 $ 66.121504
## BLK 16 BEDOK SOUTH ROAD $ 65.583825
## BLK 210 TOA PAYOH LORONG 8 $ 63.823765
## BLK 22 LORONG 7 TOA PAYOH $ 63.636364
## BLK 93 TOA PAYOH LORONG 4 $ 63.258510
## BLK 448 CLEMENTI AVENUE 4 $ 61.295419
## TANGLIN HALT MARKET $ 61.200437
## BLK 448 CLEMENTI AVENUE 3 $ 59.241706
## BLK 79 REDHILL LANE $ 57.593005
## BLK 226D ANG MO KIO AVENUE 1 $ 56.164234
## BLK 6 TANJONG PAGAR PLAZA $ 55.828696
## BLK 409 ANG MO KIO AVENUE 10 $ 50.691349
## COMMONWEALTH CRESCENT MARKET $ 49.826763
## BLK 341 ANG MO KIO AVENUE 1 $ 48.484305
## BLK 137 TAMPINES STREET 11 $ 47.611929
## BLK 22B HAVELOCK ROAD $ 47.009409
## BLK 270 QUEEN STREET $ 44.769141
## BLK 630 BEDOK RESERVOIR ROAD $ 44.637161
## BLK 159 MEI CHIN ROAD $ 43.167211
## BLK 112 JALAN BUKIT MERAH $ 42.836548
## BLK 538 BEDOK NORTH STREET 3 $ 42.619429
## BLK 36 TELOK BLANGAH RISE $ 42.512523
## BLK 1 JALAN KUKOH $ 41.656051
## BLK 353 CLEMENTI AVENUE 2 $ 41.055931
## BLK 320 SHUNFU ROAD $ 37.531264
## BLK 226D ANG MO KIO AVENUE $ 36.593786
## BLK 89 CIRCUIT ROAD $ 35.944270
## BLK 724 ANG MO KIO AVENUE 6 $ 35.923545
## BLK 4A JALAN BATU $ 34.671561
## NORTH BRIDGE ROAD MARKET $ 33.653757
## BLK 29 BENDEMEER ROAD S(330029) $ 33.333333
## BLK 82 TELOK BLANGAH DRIVE $ 30.050420
## BLK 160 ANG MO KIO AVENUE 4 $ 28.730171
## BLK 80 CIRCUIT ROAD $ 27.542904
## BEO CRESCENT MARKET $ 26.646905
## BLK 74 TOA PAYOH LORONG 4 $ 26.336867
## BLK 502 WEST COAST DRIVE $ 25.692013
## BLK 41A CAMBRIDGE ROAD $ 23.743778
## BLK 22 LORONG 7 TOA PAYOH LORONG 7 $ 18.181818
## BLK 221A BOON LAY PLACE $ 18.093023
## BLK 511 BEDOK NORTH STREET 3 $ 16.666667
## BLK 13 HAIG ROAD $ 15.750728
## KALLANG ESTATE MARKET $ 13.886486
## BLK 724 ANG MO KIO AVENUE $ 10.526316
## BLK 221A BOON LAY $ 5.607477
## centre Sign Price
Or if you prefer visualisation:
plot_ly(tenders[,list(Price = mean(priceM2)),by=centre], x=~centre,y=~Price,type = 'bar'
)%>%
layout(margin = list(b = 300,width = 900, height = 500,autosize = F),title = "Avg Price Per Centre", xaxis=list(title="Centre Name" ,tickangle = -90), yaxis=list(title="Price (S$)") )
For the average price per \(m^{2}\) per type and trade, plot.ly was used to create interactive so that the reader will know exactly the average price per type and trade.
library(plotly)
byTrade<-tenders[,list(Sign='$',Price = mean(priceM2)),by=trade][order(trade)]
byTrade
## trade Sign Price
## 1: ASSORTED SPICES $ 24.19882
## 2: BEANCAKES & NOODLES $ 22.92776
## 3: BEEF $ 28.98285
## 4: COOKED FOOD $ 223.80557
## 5: CUT FRUITS $ 260.27704
## 6: DRINKS $ 214.41455
## 7: EGGS $ 30.33516
## 8: FLOWERS $ 37.14950
## 9: FRESH SEA FOOD $ 42.72850
## 10: FRESH SEAFOOD $ 42.08804
## 11: FROZEN GOODS $ 25.20198
## 12: HALAL BEEF $ 11.80581
## 13: HALAL COOKED FOOD $ 132.15969
## 14: HALAL FROZEN GOODS $ 33.45014
## 15: HALAL MUTTON $ 20.59963
## 16: HALAL POULTRY $ 21.49762
## 17: INDIAN CUISINE $ 93.28537
## 18: MUTTON $ 23.80194
## 19: PIECE & SUNDRY GOODS $ 81.99614
## 20: PORK $ 26.68522
## 21: POULTRY $ 29.43061
## 22: PRESERVED & DRIED GOODS $ 56.74716
## 23: PROCESSED RAW SEAFOOD $ 36.19170
## 24: VEGETABLES $ 50.38332
## 25: WHOLE FRUITS $ 53.46454
## trade Sign Price
gg<-ggplot(byTrade,aes(trade,Price)) + geom_point() + theme(axis.text.x=element_text(angle=90, hjust=1)) + ggtitle("Average Price per m2 per Trade") + xlab("Trade Name") + ylab("Price (S$)")
ggplotly(gg)
byType<-tenders[,list(Sign='$',Price = mean(priceM2)),by=type][order(type)]
byType
## type Sign Price
## 1: cooked $ 180.71627
## 2: lockup $ 78.45532
## 3: market $ 37.02828
gg2 <-ggplot(byType,aes(type,Price)) + geom_point() + theme(axis.text.x=element_text(angle=0, hjust=1)) + ggtitle("Average Price per m2 per Type of Stall") + xlab("Type of Stall") + ylab("Price (S$)")
ggplotly(gg2)
Overall, bidding is highest for cooked food (whether type or trade). It could be due to that one can make the most profit from running such a store as there will always be a constant demand. Or, that there are many current/ potential cooked food owners bidding to retain / get a stall.
In order to plot the data on a spatial area, there was need to geocode the address. The data is from data.gov.sg.
library(ggmap)
library(rgdal)
h<- readOGR("HAWKERCENTRE.kml","HAWKERCENTRE")
## OGR data source with driver: KML
## Source: "HAWKERCENTRE.kml", layer: "HAWKERCENTRE"
## with 116 features
## It has 2 fields
plot(h, main="Places with Hawker Centres")
h.t<-data.frame(toupper(h$Name),h@coords[,1:2])#extracts coordinates, creates t from h
colnames(h.t)<-c("name","lon","lat")
tenders.sp <-merge(tenders,h.t,by.x="centre",by.y="name",all.x=T) #join spatial by using common field name & centre
However, due to the limit of library(ggmap) of maximum amount of queries of 2500 per day, Prof Ate has provided a .csv file to enable a local cache of geocodes.
centres <- read.csv("centres-geocoded.csv")
#spatial join
tenders.sp <- merge(tenders,centres,by.x="centre",by.y="centre",all.x=T)
#check if all is good
tenders.sp[is.na(lon)]
## centre stall area trade bid
## 1: BLK 51 OLD AIRPORT ROAD OLD AIRPORT ROA 01-144 5.28 DRINKS $1,510.00
## month type bidNum date priceM2 count location lon lat
## 1: Jan-2017 cooked 1510 2017-01-01 285.9848 NA NA NA NA
This data might have missing coordinates due to error while cbinding the locations and the centre names. But this can be ignored as this is only 1 data error out of 2780.
#the graph is good but overplotting
ggplot(tenders.sp,aes(x=lon,y=lat,size=priceM2,color=type))+geom_point(alpha=0.3) + coord_fixed() + ggtitle("Locations of stalls with their price per m^2 and type coloured") + xlab("Longitude") + ylab("Latitude")
Due to the overplotting, we can use density graphs instead:
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_density2d() + coord_fixed() + ggtitle("Density of Stalls - Contours") + xlab("Longitude") + ylab("Latitude")
Where we get a density graph using contours. The contours shows the area of concentration - the closer the contours are, the more dense the area is with stalls.
However, it is difficult to read and thus could be better visualised with a hexagonal graph:
ggplot(tenders.sp, aes(x=lon, y=lat)) + geom_point() + geom_hex() + coord_fixed() + xlab("Longitude") + ylab("Latitude") + ggtitle("Density of Stalls - Hexagonal Density")
Q: Do you think there is a specific spatial distribution with regard to the number of bids per ‘trade’ or ‘type’? Or different patterns based on the year? Create a (series of) faceted plot(s) to help you answer that question and include them (and the code leading up to it) in your markdown document.
I created faceted graphs using both facet_grid and facet_wrap, depending whichever creates a more appealing graph set for readers.
byTradeDensity <-ggplot(tenders.sp,aes(x=lon,y=lat)) + coord_fixed() + facet_wrap(~trade) + xlab("Longtitude") + ylab("Latitude") + ggtitle("Spatial Distribution of Stalls Bidding by Stall Trade (2012-2017)") + geom_hex() + scale_fill_gradient(low="orange", high="red", name="Bids") + theme(axis.text.x = element_text(angle=90),strip.text.x = element_text(size = 4.5))
byTradeDensity
Observations:
Halal-related goods are mainly located at Eastern and South Easten of Singapore.
Fresh Seafood seem to have outliers at the edges
But at the coastal areas are where seafood is more popular.
Additionally, some of the shops may have gotten supplies from local fisheries(kelongs), hence the stalls are built near the sea
Vegetables-selling stores are scattered everywhere around Singapore
byTypeGraph<-ggplot(tenders.sp,aes(x=lon,y=lat)) + coord_fixed() + facet_grid(.~type) + xlab("Longtitude") + ylab("Latitude") + ggtitle("Spatial Distribution of Stalls Bidding by Stall Type (2012-2017)") + geom_hex() + scale_fill_gradient(low="orange", high="red", name="Bids") + theme(axis.text.x = element_text(angle=45,hjust=1))
byTypeGraph
Overall, the different types of stalls are scattered around Singapore, mainly concentrated in the lower half of Singapore. Towards the Central Business District of Singapore (est 1.25,103.85), the concentration of stalls are at its peak. This could be due to that that is where demand is highest - where tourists and working Singaporeans will be at. In particular, cooked food typed of stall has its highest density at the area due to the crowd.
There are outliers at for markets, one in the north-west (around Woodlands) and another at the east (Pasir Ris/ Changi). This may be due to the distribution of residential areas in Singapore. One must note that the database is based on stalls bidding from areas under NEA. That is, there might be other wet markets stalls, but not under NEA.
#take the year out from the data
tenders.sp$year = format(as.Date(tenders.sp$date, format="%Y-%m-%d"),"%Y")
tradeAgainstYear<-ggplot(tenders.sp,aes(x=lon,y=lat,color=trade)) +geom_point() + coord_fixed() + facet_grid(year~.) + xlab("Longtitude") + ylab("Latitude") + ggtitle("Spatial Distribution of Stalls Bidding by Year (2012-2017)")+ scale_fill_gradient(low="orange", high="red", name="Bids") + theme(strip.text.y =element_text(angle = 0),axis.text.x = element_text(angle=90))
tradeAgainstYear
In terms of type of stall bidded per year is roughly the same. This may be the result of stalls often having their tenders expiring every other year. 2017 may looked like there was a dip in the number of biddings but this is owing to that we have only been into 3 months of 2017 when NEA released the data.
Overall, there is a general spatial distribution of biddings. This is because of the locations where land parcels are being used by NEA to build stalls.
To discover if there are any spatial patterns in regards to population, we needed to first load the centres’ coordinates into a boundary rectangle.
library(spatstat)
library(maptools)
centres.sp <- tenders.sp[lat>0,list(lon=lon[1],lat=lat[1],price=mean(priceM2),count=.N),by=centre]
#if NA, put price to be 0
centres.sp[is.na(price),price:=0,]
#mark the columns to have longitude and latitude
coordinates(centres.sp) <- c('lon','lat')
centres.ppp<-unmark(as.ppp(centres.sp))
# check if it's alright
plot(centres.ppp)
The plotting turns out fine, with the furthest points forming the boundaries of the rectangles.
#load shp of Singapore
sg<- readOGR(".","sg-all")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "sg-all"
## with 1 features
## It has 13 fields
#a polygonal boundary rectangle , convert data to obsevation window for a point pattern
sg.window <- as.owin(sg)
centres.ppp<-centres.ppp[sg.window]
plot(centres.ppp , main="centres.ppp with shp of Singapore")
#all of the points are clustered
plot(Kest(centres.ppp))
The Kest graph shows that all the points are clustered as they are all above the blue dotted line.
#heat map of density
plot(density(centres.ppp,0.02))
Most of the population is concentrated around the centre of Singapore
#create a contour map of density instead and smoothen it with a 0.02 factor
contour(density(centres.ppp,0.02))
#we read in the population from a raster file
pop <- as.im(readGDAL("sg-pop.tif"))
## sg-pop.tif has GDAL driver GTiff
## and has 37 rows and 58 columns
plot(rhohat(centres.ppp,pop))
The function rhohat helps to plot a covariance graph. Here, we plotted a covariance graph between the population density of Singapore and areas where the centres are located at.
It is observed that there is a dip as the population increases. Next, we average out the data using weights=centres.sp$price:
plot(rhohat(centres.ppp,pop,weights=centres.sp$price))
The data trend remains the same! The graphs show that there are two outliers.
Q: You notice there are outliers in areas with a high population. We would have expected a much higher intensity of hawker centers there based on the rest of the pattern. Try to find out where these outliers are located and reflect on what makes these centres different and why we could expect a ‘dip’ towards the upper end of the population density
To answer this question, we can lean back to our Kest and rhostat graphs. A graph of the density of population and locations of hawker centres comes in handy too:
plot(pop,main="Population density of Singapore")
plot(centres.ppp,add=T)
If the above graph is too blurry, we can refer to the map below on the different population density :
Created in QGIS - Some dense areas are Choa Chu Kang, Bedok, Tampines, Yishun, Woodlands (credits: my lab 2 submission)
From the rhostat and Kest graphs above, it can concluded that areas around higher population density, stalls are absented from the areas! By comparing the Population density of Singapore graph and the QGIS, I conclude that the outliers are in Woodlands and Jurong areas. Woodlands, an extremely densed area, only has 1 centre while Jurong, another highly densed area, has only 1-2 centres.
As mentioned earlier, the volume of demand is mainly concentrated in the Central Business District area as tourists and office workers are mainly located there. Thus, refelcting to this demand, NEA might have built more stalls within the area. However, the majority do not live within the areas as housing prices are much higher. Hence, this creates the relation of lower population density yet higher the number of centres.
The dip represents a drop in the number of stalls while population increased. Areas with higher population tend to be older neighbourhoods. They could have been supplemented with alternatives, such as coffeeshops, home-grown shops, and thus may not have a demand for NEA stalls. They are able to self-sustain despite a high population.
For Woodlands, it is also the area where the checkpoint is to & fro Malaysia. Thus, the centre might have been built to serve the commuters using the checkpoint. For Jurong, multiple, modern shopping malls (such as Jurong Point, Jem) have been built to meet the demand for amenities thus not needing as many stalls as they serve the same needs.
In newer neighbourhoods, such as Pasir Ris and Punggol/Sengkang, centres are almost absent. This might be due to master plans for the area not including the building of centres, waiting for estates to mature in order to create sustainability for the centres and population. In fact, for Pasir Ris, NEA will be opening the first hawker centre in 2017 after the estate has been around since the 1980s. Moreover, it could be that in newer neighbourhoods, shopping malls were built first in order to attract new residents. Shopping malls contain modernised centres - food courts (cooked food), supermarkets(market equivalent) and shops (like dry goods), lowering the demand for hawker centres.
I feel that this lab has given me another opportunity to explore R, ggplot and plotly even more. I wished I had more time to do this lab as I had just discovered this amazing gitbook on plotly that demostrates what plotly can do beyond what is presented on plotly’s official website.