In this lab, we aim to perform spatial analyses on hawker centre rental prices in Singapore. First we proceed by reading in the .csv file.

Now with base data, we can answer several questions. For instance, is there a relation between area and price?

tenders %>%
  drop_na(bidNum) %>%
  drop_na(area) %>%
  ggplot(aes(x=area, y=bidNum)) + 
  geom_point() +
  ggtitle("Scatter plot of bid price against area") + 
  ylab("Bid Price") + 
  xlab("Area")

From the plot above, it seems there there isn’t really a linear correlation between area and bid price. Perhaps the bid price is more strongly related to where the hawker centre is located. Hawker centres located in places with more human traffic could see higher bid prices.

Next we could ask, does the number of bids change over time?

tenders %>%
  drop_na(date) %>%
  group_by(date) %>%
  summarise(count = n()) %>%
  ggplot(aes(x=date, y=count)) +
  geom_col() +
  scale_x_date(date_breaks = "1 year", labels = date_format(format = "%Y")) + 
  xlab("Time") + 
  ylab("Number of bids") + 
  ggtitle("Number of bids over time")

The above chart actually surprised me. There is a clear increase in number of bids over time. This suggests a higher demand for hawker centers in Singapore (more people want to have their own stalls). Considering that Singaporeans are getting more educated, I expected less people in Singapore to be willing to work as hawkers. Perhaps, the increase in number of bids is due to immigrants rather than Singaporeans.

Building charts from this data may seem boring. Why not we add spatial data and do a spatial analysis.

library(sf)
centres <- st_read("hawker-centres/hawker-centres-kml.kml")
Reading layer `HAWKERCENTRE' from data source `/Users/arroyo/Desktop/stuff/Term 5/02221/Week 10/02221-Lab9/hawker-centres/hawker-centres-kml.kml' using driver `KML'
Simple feature collection with 116 features and 2 fields
geometry type:  POINT
dimension:      XYZ
bbox:           xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.443882
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
plot(centres)

centres <- centres %>% 
  mutate(centreName = str_extract(Description, "(?<=<th>NAME<\\/th> <td>).*?(?=<\\/td>)")) %>%
  mutate(centreUpper = toupper(centreName)) %>%
  mutate(lon = st_coordinates(centres)[,1]) %>%
  mutate(lat = st_coordinates(centres)[,2])
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centreUpper"))

This kinda worked. But after looking through tenders.sp, it is obvious that the joining didn’t go quite as well as we hoped. This is due to inconsistent spelling of the hawker centres’ names. We shall fix that using Google’s geocoding service.

centres.unique <- tenders %>% 
  group_by(centre) %>%
  summarise(count = n()) %>%
  mutate(location = paste0(centre, ", Singapore"))
# OVER_QUERY_LIMIT, boohoo
# g <- geocode(centres.unique$location, output = "latlon", source = "google", sensor = F) 
g <- read_csv("cleaned_centres-geocoded.csv")
g <- unique(g[c("centre","lat","lon")])
g <- g[-c(43),]
centres.unique <- centres.unique[centres.unique$centre %in% g$centre,]
centres <- bind_cols(centres.unique, g)
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centre"))
ggplot(tenders.sp %>% drop_na(priceM2), aes(x=lon, y=lat, size=priceM2, color=type)) +
  geom_point() +
  geom_density2d() + 
  coord_fixed() +
  ggtitle("Number of bids in each area") +
  xlab("Longtitude") + 
  ylab("Latitude")

  
ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point() +
  geom_hex() + 
  coord_fixed() +
  ggtitle("Number of bids in each area") +
  xlab("Longitude") + 
  ylab("Latitude")

Now that we have the spatial distribution of hawker centres in Singapore, we could do further analysis on it.

ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point() +
  geom_hex() + 
  facet_wrap(~type, labeller = label_wrap_gen(width = 10, multi_line = TRUE)) +
  scale_x_continuous(breaks = pretty(tenders.sp$lon, n=5)) +
  theme(axis.text.x = element_text(angle = 90), strip.text = element_text(size = 6)) +
  xlab("Longitude") +
  ylab("Latitude") + 
  ggtitle("Spatial distribution of bids for different types")

The above chart shows the spatial distribution of bids for different types of stores. The spread of bids seem to be mostly the same except for an anomaly in the case of lockup stores in the southern tip of Singapore.

ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point(na.rm = T) +
  geom_hex(bins=20, na.rm = T) + 
  facet_wrap(~trade, labeller = label_wrap_gen(width = 10, multi_line = FALSE)) +
  scale_x_continuous(breaks = pretty(tenders.sp$lon, n=2)) +
  theme(axis.text.x = element_text(angle = 90), strip.text = element_text(size = 6)) +
  xlab("Longitude") +
  ylab("Latitude") + 
  ggtitle("Spatial distribution of bids for different trades")

The above plot shows the spatial distribution of bids for different trades. Again, the general distribution of the number of bids look around the same for most types of trades. It is interesting though, that bids for stalls selling mutton only show up on the south of Singapore.

centres.sp <- tenders.sp %>% 
  filter(lat > 0) %>%
  group_by(centre, lon, lat) %>%
  summarise(price = mean(priceM2, na.rm = T)) %>%
  replace_na(list(price=0))
coordinates(centres.sp) <- c('lon', 'lat')
centres.ppp <- unmark(as.ppp(centres.sp))
sg <- readOGR(".", "sg-all")
OGR data source with driver: ESRI Shapefile 
Source: "/Users/arroyo/Desktop/stuff/Term 5/02221/Week 10/02221-Lab9", layer: "sg-all"
with 1 features
It has 13 fields
sg.window <- as.owin(sg)
centres.ppp <- centres.ppp[sg.window]
plot(Kest(centres.ppp))

plot(density(centres.ppp, 0.02))

contour(density(centres.ppp, 0.02))

The Kest plot shows strong evidence for clustering of hawker centers. This agrees with the density plot, where we can see that hawker centers tend to cluster towards the south of Singapore. This makes sense, since that is where the city center is.

pop <- as.im(readGDAL("sg-pop.tif"))
sg-pop.tif has GDAL driver GTiff 
and has 37 rows and 58 columns
plot(rhohat(centres.ppp, pop))

plot(rhohat(centres.ppp, pop, weights=centres.sp$price))

The above rho plots show that there is strong correlation between the population size in the region and number of hawker centers there. However, there is a sudden dip at where the population equal 15000. This is a strange anomaly.

plot(pop)
plot(centres.ppp, add=T)

Looking at the plot of where the hawker centers are, we see that the places causing the anomaly, are in Choa Chu Kang, Seng Kang and Woodlands area. These places have high population, but seemingly low number of hawker centres. After doing a quick check on Google maps, it seems that there are actually many hawker centers in these area, much higher than the numbers recorded in the data given by NEA. Checking the NEA website, we can see that there are only 114 NEA managed hawker centres in Singapore, and this data does not include many of the hawker centers in Choa Chu Kang, Seng Kang and Woodlands. Hence, we can conclude that the dip in the rho plots, is due to incomplete data. There are hawker centers there, but just not recorded.

---
title: "Making Maps Lab 9"
author: "Rayson Lim"
date: "29 March 2018"
output: 
  html_notebook:
    toc: TRUE
---

In this lab, we aim to perform spatial analyses on hawker centre rental prices in Singapore. First we proceed by reading in the .csv file.

```{r, echo = FALSE, results = 'hide'}
library(tidyverse)
library(lubridate)
library(scales)
library(sf)
library(sp)
library(rgdal)
library(spatstat)
library(ggmap)
library(maptools)
tenders <- read_csv("tabula-tender-bids-from-mar-2012-to-jan-2017.csv", col_names = c("centre", "stall", "area", "trade", "bid", "month"))
tenders <- tenders %>% 
  drop_na(centre) %>%  # drop empty rows
  filter(row_number() != 1135) %>% # remove 'lockup' row
  mutate(type = if_else(row_number() < 1135, "cooked", "lockup")) %>% # set type to 'cooked' for anything pre-lockup
  mutate(bidNum = as.numeric(str_replace_all(bid, pattern = "\\$|,", replacement = ""))) %>% # convert price to numeric
  mutate(date = dmy(paste0("01-", month))) %>% # parse date
  mutate(priceM2 = bidNum/area) # price per m2
```

Now with base data, we can answer several questions. For instance, is there a relation between area and price?

```{r}
tenders %>%
  drop_na(bidNum) %>%
  drop_na(area) %>%
  ggplot(aes(x=area, y=bidNum)) + 
  geom_point() +
  ggtitle("Scatter plot of bid price against area") + 
  ylab("Bid Price") + 
  xlab("Area")
```

From the plot above, it seems there there isn't really a linear correlation between area and bid price. Perhaps the bid price is more strongly related to where the hawker centre is located. Hawker centres located in places with more human traffic could see higher bid prices. 

Next we could ask, does the number of bids change over time?

```{r}
tenders %>%
  drop_na(date) %>%
  group_by(date) %>%
  summarise(count = n()) %>%
  ggplot(aes(x=date, y=count)) +
  geom_col() +
  scale_x_date(date_breaks = "1 year", labels = date_format(format = "%Y")) + 
  xlab("Time") + 
  ylab("Number of bids") + 
  ggtitle("Number of bids over time")
```
The above chart actually surprised me. There is a clear increase in number of bids over time. This suggests a higher demand for hawker centers in Singapore (more people want to have their own stalls). Considering that Singaporeans are getting more educated, I expected less people in Singapore to be willing to work as hawkers. Perhaps, the increase in number of bids is due to immigrants rather than Singaporeans. 

Building charts from this data may seem boring. Why not we add spatial data and do a spatial analysis. 

```{r}
library(sf)
centres <- st_read("hawker-centres/hawker-centres-kml.kml")
plot(centres)
centres <- centres %>% 
  mutate(centreName = str_extract(Description, "(?<=<th>NAME<\\/th> <td>).*?(?=<\\/td>)")) %>%
  mutate(centreUpper = toupper(centreName)) %>%
  mutate(lon = st_coordinates(centres)[,1]) %>%
  mutate(lat = st_coordinates(centres)[,2])

tenders.sp <- left_join(tenders, centres, by = c("centre" = "centreUpper"))
```
This kinda worked. But after looking through tenders.sp, it is obvious that the joining didn't go quite as well as we hoped. This is due to inconsistent spelling of the hawker centres' names. We shall fix that using Google's geocoding service. 

```{r, results = 'hide'}
centres.unique <- tenders %>% 
  group_by(centre) %>%
  summarise(count = n()) %>%
  mutate(location = paste0(centre, ", Singapore"))

# OVER_QUERY_LIMIT, boohoo
# g <- geocode(centres.unique$location, output = "latlon", source = "google", sensor = F) 
g <- read_csv("cleaned_centres-geocoded.csv")
g <- unique(g[c("centre","lat","lon")])
g <- g[-c(43),]
centres.unique <- centres.unique[centres.unique$centre %in% g$centre,]
centres <- bind_cols(centres.unique, g)
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centre"))

ggplot(tenders.sp %>% drop_na(priceM2), aes(x=lon, y=lat, size=priceM2, color=type)) +
  geom_point() +
  geom_density2d() + 
  coord_fixed() +
  ggtitle("Number of bids in each area") +
  xlab("Longtitude") + 
  ylab("Latitude")
  
ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point() +
  geom_hex() + 
  coord_fixed() +
  ggtitle("Number of bids in each area") +
  xlab("Longitude") + 
  ylab("Latitude")
```

Now that we have the spatial distribution of hawker centres in Singapore, we could do further analysis on it. 

```{r}
ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point() +
  geom_hex() + 
  facet_wrap(~type, labeller = label_wrap_gen(width = 10, multi_line = TRUE)) +
  scale_x_continuous(breaks = pretty(tenders.sp$lon, n=5)) +
  theme(axis.text.x = element_text(angle = 90), strip.text = element_text(size = 6)) +
  xlab("Longitude") +
  ylab("Latitude") + 
  ggtitle("Spatial distribution of bids for different types")

```
The above chart shows the spatial distribution of bids for different types of stores. The spread of bids seem to be mostly the same except for an anomaly in the case of lockup stores in the southern tip of Singapore. 

```{r, fig.asp=1}
ggplot(tenders.sp, aes(x=lon, y=lat)) +
  geom_point(na.rm = T) +
  geom_hex(bins=20, na.rm = T) + 
  facet_wrap(~trade, labeller = label_wrap_gen(width = 10, multi_line = FALSE)) +
  scale_x_continuous(breaks = pretty(tenders.sp$lon, n=2)) +
  theme(axis.text.x = element_text(angle = 90), strip.text = element_text(size = 6)) +
  xlab("Longitude") +
  ylab("Latitude") + 
  ggtitle("Spatial distribution of bids for different trades")
```
The above plot shows the spatial distribution of bids for different trades. Again, the general distribution of the number of bids look around the same for most types of trades. It is interesting though, that bids for stalls selling mutton only show up on the south of Singapore. 

```{r}
centres.sp <- tenders.sp %>% 
  filter(lat > 0) %>%
  group_by(centre, lon, lat) %>%
  summarise(price = mean(priceM2, na.rm = T)) %>%
  replace_na(list(price=0))

coordinates(centres.sp) <- c('lon', 'lat')
centres.ppp <- unmark(as.ppp(centres.sp))

sg <- readOGR(".", "sg-all")
sg.window <- as.owin(sg)
centres.ppp <- centres.ppp[sg.window]

plot(Kest(centres.ppp))

plot(density(centres.ppp, 0.02))
contour(density(centres.ppp, 0.02))
```

The Kest plot shows strong evidence for clustering of hawker centers. This agrees with the density plot, where we can see that hawker centers tend to cluster towards the south of Singapore. This makes sense, since that is where the city center is. 

```{r}
pop <- as.im(readGDAL("sg-pop.tif"))
plot(rhohat(centres.ppp, pop))
```
```{r}
plot(rhohat(centres.ppp, pop, weights=centres.sp$price))
```

The above rho plots show that there is strong correlation between the population size in the region and number of hawker centers there. However, there is a sudden dip at where the population equal 15000. This is a strange anomaly.

```{r}
plot(pop)
plot(centres.ppp, add=T)
```
Looking at the plot of where the hawker centers are, we see that the places causing the anomaly, are in Choa Chu Kang, Seng Kang and Woodlands area. These places have high population, but seemingly low number of hawker centres. After doing a quick check on Google maps, it seems that there are actually many hawker centers in these area, much higher than the numbers recorded in the data given by NEA. Checking the NEA [website](http://www.nea.gov.sg/docs/default-source/public-health/Hawker-Centres-Division---Tenders/list-of-markets-and-hawker-centres.pdf), we can see that there are only 114 NEA managed hawker centres in Singapore, and this data does not include many of the hawker centers in Choa Chu Kang, Seng Kang and Woodlands. Hence, we can conclude that the dip in the rho plots, is due to incomplete data. There are hawker centers there, but just not recorded. 


