Extracting geocode from kml

centres <- st_read("./hawker-centres/hawker-centres-kml.kml")
plot(centres)

Now we want to extract the lat/lon for each centre and join them to the tenders data.frame.
But first we need to extract the name of each centre from the long, complex Description field.

# Preview the extracted centre names before overwriting `centres`
centres %>%
  mutate(centreName = str_extract(Description, "(?<=<th>NAME<\\/th> <td>).*?(?=<\\/td>)"))

# Keep the name, an upper-case join key and the point coordinates
coords <- st_coordinates(centres)
centres <- centres %>%
  mutate(centreName = str_extract(Description, "(?<=<th>NAME<\\/th> <td>).*?(?=<\\/td>)"),
         centreUpper = toupper(centreName),
         lon = coords[, 1],
         lat = coords[, 2])

# Join the coordinates to the tenders on the upper-case centre name
tenders.sp <- left_join(tenders, centres, by = c("centre" = "centreUpper"))
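
To make the look-around pattern easier to follow, here is a minimal base-R sketch of the same extraction on a single string. The `desc` value is made up to resemble the HTML table in the Description field (it is not copied from the KML file), and `regexpr()`/`regmatches()` with `perl = TRUE` stand in for `str_extract()` so the snippet runs without extra packages.

```r
# Hypothetical Description value, shaped like the HTML table in the KML file
desc <- paste0(
  "<table><tr><th>NAME</th> <td>Adam Road Food Centre</td></tr>",
  "<tr><th>STATUS</th> <td>Existing</td></tr></table>"
)

# Look-behind anchors the match right after the NAME header cell;
# the lazy .*? stops at the first closing </td>
pattern <- "(?<=<th>NAME</th> <td>).*?(?=</td>)"

m <- regexpr(pattern, desc, perl = TRUE)
centre_name <- regmatches(desc, m)

# Upper-case version, used as the join key
centre_upper <- toupper(centre_name)
print(centre_upper)
```

Note that `regmatches()` drops elements without a match, so on a whole column `str_extract()` (which returns `NA` instead) is the safer choice.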

The dataset wasn't joined properly, because the same centre is often spelled in different ways or listed under an alternate name.
Another solution would be to manually geocode the data.
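
A quick way to see how badly the names disagree is to compare the two sets of keys directly. The sketch below uses made-up centre names (both vectors are hypothetical); with the real data the inputs would be something like `unique(tenders$centre)` and `centres$centreUpper`.

```r
# Hypothetical join keys from the two sources
tender_names <- c("ADAM ROAD FOOD CENTRE", "BLK 20 GHIM MOH ROAD", "MAXWELL FOOD CENTRE")
kml_names    <- c("ADAM ROAD FOOD CENTRE", "GHIM MOH MARKET AND FOOD CENTRE", "MAXWELL FOOD CENTRE")

# Names in the tenders that have no exact counterpart in the KML names
unmatched <- setdiff(tender_names, kml_names)

# Share of tender names that would find a partner in an exact-match join
match_rate <- mean(tender_names %in% kml_names)

print(unmatched)
print(match_rate)
```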

Manually Geocoding the Data

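# One row per centre, with the number of tenders and a query string for the geocoder
centres.unique <- tenders %>%
  group_by(centre) %>%
  summarise(count = n()) %>%
  mutate(location = paste0(centre, ", Singapore"))

# Look up lon/lat for each query string via the Google geocoder
g <- geocode(centres.unique$location, output = "latlon", source = "google", sensor = FALSE)

# Attach the coordinates to the centres, then join them onto the tenders
centres <- bind_cols(centres.unique, g)

tenders.sp <- left_join(tenders, centres, by = c("centre" = "centre"))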
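
Before relying on geocoder output, a check along the following lines can flag centres that came back empty or landed outside Singapore. The data frame `g_demo` is made up for illustration, and the bounding box is a rough assumption rather than an official extent.

```r
# Made-up geocoder output: one plausible hit, one miss (NA), one hit on another continent
g_demo <- data.frame(
  centre = c("CENTRE A", "CENTRE B", "CENTRE C"),
  lon    = c(103.82, NA, -73.99),
  lat    = c(1.33, NA, 40.73)
)

# Rough, assumed bounding box around Singapore
in_box <- !is.na(g_demo$lon) & !is.na(g_demo$lat) &
  g_demo$lon >= 103.6 & g_demo$lon <= 104.1 &
  g_demo$lat >= 1.15  & g_demo$lat <= 1.50

# Centres whose coordinates are missing or implausible
suspect <- g_demo$centre[!in_box]
print(suspect)
```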

There are still some errors, as not all centres were geocoded properly.
Fortunately, Prof. Ate has kindly given us the extracted lat/lon in CSV format.
I manually edited the CSV file so that its centre names match the cleaned-up data.
Here we read in the CSV file, “centres_geocoded_clean.csv”

centres <- read_csv("centres_geocoded_clean.csv")

Now we join the two data frames together

tenders.sp <- left_join(tenders,centres,by=c("centre"="centre"))

We double-check with a quick plot

There is lots of overplotting since many stalls are in the same hawker centre.
Let’s try to do a density plot instead.
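
The amount of overplotting can also be put into numbers by counting how many rows share a coordinate with an earlier row. This is a small base-R sketch on made-up points; `pts` is hypothetical and stands in for the `lon`/`lat` columns of `tenders.sp`.

```r
# Made-up stall locations: three stalls at one centre, two at another, one alone
pts <- data.frame(
  lon = c(103.81, 103.81, 103.81, 103.85, 103.85, 103.90),
  lat = c(1.32, 1.32, 1.32, 1.28, 1.28, 1.35)
)

# Rows that repeat an earlier coordinate are drawn on top of an existing point
hidden <- sum(duplicated(pts[c("lon", "lat")]))

# Number of distinct locations that are actually visible in a scatter plot
visible <- nrow(unique(pts[c("lon", "lat")]))

# Share of rows that cannot be seen as separate points
hidden_share <- hidden / nrow(pts)
print(c(hidden = hidden, visible = visible, hidden_share = hidden_share))
```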


Do you think there is a specific spatial distribution with regard to the number of bids per ‘trade’ or ‘type’?
Or different patterns based on the year?
Create a (series of) faceted plot(s) to help you answer that question.

First, to see the spatial distribution of the number of bids per trade, we plot the bids faceted by trade. Bids for most trades appear to be spread across the country, but there are a few exceptions. Cut Fruits, for example, has high bid counts in the south and the west and is not spread throughout the country. Halal food items see both a low number of bids and bidding in very few centres.

# Bids by location, one panel per trade (colour scale reversed)
tenders.sp %>%
  group_by(centre, trade) %>%
  ggplot(aes(x = lon, y = lat, colour = count)) +
  geom_point() +
  coord_fixed() +
  scale_colour_continuous(trans = 'reverse') +
  facet_wrap(~ trade, ncol = 4) +
  labs(x = "Longitude", y = "Latitude", title = "Spatial Distribution of the number of bids", subtitle = "Faceted by trade\n") +
  theme(panel.spacing = unit(2, "lines"), strip.text.x = element_text(size = 7))
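
Note that ggplot2 ignores dplyr grouping, so `group_by()` without a summarising step does not change what is drawn. If the colour is meant to reflect the number of bids per centre and trade, the counts have to be computed first. The following base-R sketch shows one way to do that on made-up data; `bids` and its values are hypothetical, and the column names mirror those used in `tenders.sp`.

```r
# Made-up tenders: centre A has two Drinks bids and one Cut Fruits bid, centre B one of each
bids <- data.frame(
  centre = c("A", "A", "A", "B", "B"),
  trade  = c("Drinks", "Drinks", "Cut Fruits", "Drinks", "Cut Fruits"),
  lon    = c(103.81, 103.81, 103.81, 103.85, 103.85),
  lat    = c(1.32, 1.32, 1.32, 1.28, 1.28)
)

# One row per centre and trade, with the number of bids in column n
bids$n <- 1L
per_trade <- aggregate(n ~ centre + trade + lon + lat, data = bids, FUN = sum)
print(per_trade)
```

In a dplyr pipeline the same table could be produced with `count(centre, trade, lon, lat)` and then passed to `ggplot()` with `colour = n`.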

Next, to see the spatial distribution of the number of bids per type, we plot the bids faceted by type.
At first glance the types look similar to one another. On closer examination, however, the cooked and market types receive bids in more centres, with the market type covering the most. The market type also attracts somewhat more bids per centre.

# Bids by location, one panel per type (colour scale reversed)
tenders.sp %>%
  group_by(centre, trade) %>%
  ggplot(aes(x = lon, y = lat, colour = count)) +
  geom_point() +
  coord_fixed() +
  scale_colour_continuous(trans = 'reverse') +
  facet_wrap(~ type, ncol = 3) +
  labs(x = "Longitude", y = "Latitude", title = "Spatial Distribution of the number of bids", subtitle = "Faceted by type\n") +
  theme(panel.spacing = unit(2, "lines"), strip.text.x = element_text(size = 7))

Finally, to see different patterns based on the year, I did a density plot of the price per square metre (priceM2) distribution faceted by year.
The pattern of the price distribution is roughly the same throughout the years, with the exception of 2017.

# Add a year column (assumes `date` starts with a four-digit year)
tenders.sp$year <- as.integer(substring(tenders.sp$date, 1, 4))
# Hexbin density plot faceted by year; geom_hex() fills each hexagon by the number of tenders in it
ggplot(tenders.sp, aes(x = lon, y = lat)) +
  geom_point() +
  geom_hex() +
  coord_fixed() +
  scale_fill_continuous(trans = 'reverse') +
  labs(x = "Longitude", y = "Latitude", title = "Density plot of priceM2 distribution around Singapore", subtitle = "Faceted by year\n") +
  facet_wrap(~ year, ncol = 2) +
  theme(panel.spacing = unit(2, "lines"))
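
Since `geom_hex()` colours each hexagon by the number of observations, the plot above shows where tenders are concentrated rather than how high the prices are. One possible way to bring the price into the picture is `stat_summary_hex()`, which summarises a `z` aesthetic per hexagon. The sketch below runs on made-up data; `demo` and its values are hypothetical, the column name `priceM2` is assumed from the plot title, and the mean is just one choice of summary.

```r
library(ggplot2)

# Made-up tenders with a date string, coordinates and a price per square metre
demo <- data.frame(
  date    = c("2016-03-01", "2016-07-15", "2017-01-20", "2017-09-05"),
  lon     = c(103.81, 103.85, 103.81, 103.90),
  lat     = c(1.32, 1.28, 1.32, 1.35),
  priceM2 = c(120, 250, 90, 310)
)

# Year taken from the first four characters of the date string
demo$year <- as.integer(substring(demo$date, 1, 4))

# Fill each hexagon with the mean priceM2 of the tenders inside it
# (drawing the plot needs the hexbin package)
p <- ggplot(demo, aes(x = lon, y = lat)) +
  stat_summary_hex(aes(z = priceM2), fun = mean) +
  coord_fixed() +
  labs(x = "Longitude", y = "Latitude", fill = "Mean priceM2") +
  facet_wrap(~ year, ncol = 2)
```

Calling `print(p)` draws the plot.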

