Take-Home Exercise 02: Examination of Airbnb listings in Singapore

Section A: Nation-wide analysis

1) Exploratory Spatial Data Analysis:

Using appropriate tmap function, display the locations of the Airbnb listing by room type with Openstreemap of Singapore as the background. Describe the spatial patterns observed.

2) With reference to the spatial point patterns observed in (1):

Formulate the null hypothesis and alternative hypothesis and select the confidence level.
Perform the test by using appropriate 1st order spatial point patterns analysis technique.
With reference to the analysis results, draw statistical conclusions.

3) With reference to the results derived in (1) and (2):

Derive kernel density maps of Airbnb listing by room type.
Using appropriate tmap functions, display the kernel density maps on openstreetmap of Singapore. Describe the spatial patterns revealed by the kernel density maps. Highlight the advantage of kernel density map over point map.

Section B: By planning subzones

4) Exploratory Spatial Data Analysis:

Extract Airbnb listing by room type within Aljunied, Balestier, Lavender and Tanjong Pagar planning subzones.
Display their distribution as point maps. Describe the spatial patterns reveal by their respective distribution. #### 5) With reference to the spatial point patterns observed in (4):
Formulate the null hypothesis and alternative hypothesis and select the confidence level.
Perform the test by using appropriate 2ndorder spatial point patterns analysis technique.
With reference to the analysis results, draw statistical conclusions. #### 6) With reference to the results derived in (4) and (5):
Derive kernel density maps of Airbnb listing by room type.
Using appropriate tmap functions, display the kernel density maps on openstreetmap of Singapore. Deduce what are the possible factors determine the spatial patterns observed.

Part 0: Loading in Packages

For this assignment, we will be loading in the necessary packages to conduct both 1st and 2nd order spatial point pattern analysis. We will also be using the ‘osmdata’ package to load in the OpenStreetMap database.

packages = c('rgdal', 'maptools', 'raster','spatstat', 'tmap', 'tmaptools', 'tidyverse','sf')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}

## Loading required package: rgdal

## Loading required package: sp

## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/Isaac Lee/Documents/R/win-library/3.6/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/Isaac Lee/Documents/R/win-library/3.6/rgdal/proj
##  Linking to sp version: 1.4-1

## Loading required package: maptools

## Checking rgeos availability: TRUE

## Loading required package: raster

## Loading required package: spatstat

## Loading required package: spatstat.data

## Loading required package: nlme

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:raster':
## 
##     getData

## Loading required package: rpart

## 
## spatstat 1.64-1       (nickname: 'Help you I can, yes!') 
## For an introduction to spatstat, type 'beginner'

## 
## Attaching package: 'spatstat'

## The following objects are masked from 'package:raster':
## 
##     area, rotate, shift

## Loading required package: tmap

## Loading required package: tmaptools

## Loading required package: tidyverse

## Registered S3 method overwritten by 'cli':
##   method     from    
##   print.boxx spatstat

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.0     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.0.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::collapse() masks nlme::collapse()
## x tidyr::extract()  masks raster::extract()
## x dplyr::filter()   masks stats::filter()
## x dplyr::lag()      masks stats::lag()
## x dplyr::select()   masks raster::select()

## Loading required package: sf

## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

Section A: Nation-wide analysis

Part 1) Exploratory Spatial Data Analysis:

Step 1) Loading and formating in Airbnb listings

Loadng in Airbnb listings as a simple dataframe

listings <- read_csv("data/aspatial/listings.csv")

## Parsed with column specification:
## cols(
##   id = col_double(),
##   name = col_character(),
##   host_id = col_double(),
##   host_name = col_character(),
##   neighbourhood_group = col_character(),
##   neighbourhood = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   room_type = col_character(),
##   price = col_double(),
##   minimum_nights = col_double(),
##   number_of_reviews = col_double(),
##   last_review = col_date(format = ""),
##   reviews_per_month = col_double(),
##   calculated_host_listings_count = col_double(),
##   availability_365 = col_double()
## )

Converting aspatial data into sf dataframe

For the purpose of data wrangling, we would usually check for NA values prior to this, however we will perform data cleaning right after this.

listings_sf <- st_as_sf(listings, 
                       coords = c("longitude", "latitude"), crs=4326) #In this case we are setting the CRS in accordance to the World Geodetic System as the data provided to us are in terms of GPS coordinates


listings_sf <- st_transform(listings_sf, crs = 3414) #Reprojecting
st_crs(listings_sf) #Checking CRS

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

NA/Validity Check 1

listings_sf[rowSums(is.na(listings_sf))!=0,]

## Simple feature collection with 2826 features and 14 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 10737.87 ymin: 25166.35 xmax: 43401.32 ymax: 48466.72
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 2,826 x 15
##        id name  host_id host_name neighbourhood_g~ neighbourhood room_type price
##     <dbl> <chr>   <dbl> <chr>     <chr>            <chr>         <chr>     <dbl>
##  1 1.18e5 Priv~  448620 Lynnity   North-East Regi~ Sengkang      Private ~    67
##  2 3.56e5 Doub~ 1759905 Aresha    Central Region   Geylang       Private ~    80
##  3 4.82e5 Mast~ 2386154 Susan     East Region      Tampines      Private ~    38
##  4 6.43e5 BEST~ 3212572 Roger     Central Region   Rochor        Private ~   174
##  5 7.20e5 HDB ~ 3648224 Audrey    Central Region   Queenstown    Shared r~   217
##  6 7.34e5 Home~ 3824517 Shirlnet  North-East Regi~ Serangoon     Private ~    26
##  7 7.56e5 Bedr~ 3981177 Sharon    East Region      Pasir Ris     Private ~   206
##  8 7.68e5 Comm~ 4053150 Immellym~ North-East Regi~ Punggol       Private ~   174
##  9 8.24e5 Apar~ 4177147 Tania     East Region      Bedok         Private ~   290
## 10 1.56e6 Delu~ 8270362 Domus     Central Region   Novena        Private ~    30
## # ... with 2,816 more rows, and 7 more variables: minimum_nights <dbl>,
## #   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
## #   calculated_host_listings_count <dbl>, availability_365 <dbl>,
## #   geometry <POINT [m]>

test <- st_is_valid(listings_sf)
length(which(test==FALSE))

## [1] 0

All points are valid, replacing - in reviews_per_month from to “0” values
- in last_review from NA to “2013-10-21” which is the earliest date in our dataset - in host_name from NA to “Unknown” - in name from NA to “Unknown” We will be making these changes although we wont be using these columns for any analysis..

listings_sf$last_review[is.na(listings_sf$last_review)] <- "2013-10-21"
listings_sf$reviews_per_month[is.na(listings_sf$reviews_per_month)] <- 0
listings_sf$host_name[is.na(listings_sf$host_name)] <- "Unknown"
listings_sf$name[is.na(listings_sf$name)] <- "Unknown"

NA/Validity Check 2

listings_sf[rowSums(is.na(listings_sf))!=0,]

## Simple feature collection with 0 features and 14 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 0 x 15
## # ... with 15 variables: id <dbl>, name <chr>, host_id <dbl>, host_name <chr>,
## #   neighbourhood_group <chr>, neighbourhood <chr>, room_type <chr>,
## #   price <dbl>, minimum_nights <dbl>, number_of_reviews <dbl>,
## #   last_review <date>, reviews_per_month <dbl>,
## #   calculated_host_listings_count <dbl>, availability_365 <dbl>,
## #   geometry <GEOMETRY [m]>

Cleared of NA values

Step 2) Plotting on t-maps

tmap_mode("view")

tm_basemap(leaflet::providers$OpenStreetMap)+
  tm_shape(listings_sf)+
  tm_dots("room_type", size = 0.01, title = "Room Type")

Step 3) Interpretation of Dot-plot

Based on the points we can see on the map, there seems to be signficiantly more points towards the south-central part of Singapore within the city areas. However, due to the large density of the plot, it is hard to determne the distribution of room types. Specifically, this isdue to the majorty of listings visible on the map to be mostly Entire Homes/apartments and Private Rooms. Although we could filter the data to reveal the distribution, it would be best to take a more objective approach to determine the pattern of the distribution of points.

It would be interesting to map the plot against public transportation routes. As often short-stays within Singapore do not require a visitor to rent a car to move around the country, there may be an interaction between the location of major public transportation routes such MRT routes and major Bus routes to these listing locations.

Part 2) Spatial Point Analysis

Step 1) Define Null Hypothesis, Alternative Hypothesis and Confidence Interval

For the purposes of Spatial Point Analysis, we are attempting to determine a pattern in terms of either clustering or standardization of point spread. However, in order to do so, we must first reject the null hypothesis that they are randomly distributed to arrive at any conclusion. Thus:

The test hypotheses are:

Ho = The distribution of Airbnb services are randomly distributed.
H 1= The distribution of Airbnb services are not randomly distributed.

We will set a 95% confidence interval for the purpose of this study.

Step 1) Convert to SpatialPointsDataFrame format

listings_spdf <- as_Spatial(listings_sf)

Step 2) Importing required Singapore spatial data

To make it geospatial data, we need to match with the according geospatial subzone data, according to the website, this information is still map to the Master Plan 2014 subzones with no sea. For this exercise, we will use the shapefile instead of the KML files also provided

Source: https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-no-sea

Loading in MP14 Planning Area Subzone Data

mpsz <- readOGR(dsn = "data/geospatial/MP14_SUBZONE", layer="MP14_SUBZONE_NO_SEA_PL")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\data\GSA\IS415_Take-home_Ex02\data\geospatial\MP14_SUBZONE", layer: "MP14_SUBZONE_NO_SEA_PL"
## with 323 features
## It has 15 fields

mpsz <- st_as_sf(mpsz) # Converts to Simple Feature DataFrame

NA/Validity Check 1

mpsz[rowSums(is.na(mpsz))!=0,]

## Simple feature collection with 0 features and 15 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## CRS:            +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
##  [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
##  [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
## [13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
## <0 rows> (or 0-length row.names)

test <- st_is_valid(mpsz)
length(which(test==FALSE))

## [1] 9

Rowas are cleared of NA, 9 polygons invalid.

Make valid polygons

mpsz <- st_transform(mpsz, crs= 3414) #Set the CRS to conform to SVY21 projection
mpsz <- st_make_valid(mpsz) #Makes All Polygons Valid

NA/Validity Check 2

mpsz[rowSums(is.na(mpsz))!=0,]

## Simple feature collection with 0 features and 15 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21 / Singapore TM
##  [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
##  [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
## [13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
## <0 rows> (or 0-length row.names)

test <- st_is_valid(mpsz)
length(which(test==FALSE))

## [1] 0

All cleared

Convert to SpatialPolygon Dataframe

mpsz_spdf <- as_Spatial(mpsz)

Merging polygons to create an outline map for SG.

This will be used later to create the owin file necessary for kde analysis.

mpsz$area <- st_area(mpsz)
sg_outline <-
  mpsz %>%
  summarise(area = sum(area))

tmap_mode("plot")

## tmap mode set to plotting

tm_shape(sg_outline)+
  tm_fill("area")

Step 3) Converting files into ppp format

Converting to Spatial Polygon Dataframe

sg_outline_spdf <- as_Spatial(sg_outline)

Converting Spatial Dataframes to Spatial Points

listings_sp <- as(listings_spdf, "SpatialPoints")
mpsz_sp <- as(mpsz_spdf, "SpatialPolygons")
sg_outline_sp <- as(sg_outline_spdf, "SpatialPolygons")

Converting from Spatial Points and Spatial Polygons to ppp format

listings_ppp <- as(listings_sp, "ppp")

Checking for duplicates

any(duplicated(listings_ppp))

## [1] TRUE

Seems we have some duplicates

sum(multiplicity(listings_ppp) > 1)

## [1] 243

A lot of duplicates

Removing duplicates using a jitter

set.seed(1234)
listings_ppp_jit <- rjitter(listings_ppp, retry=TRUE, nsim=1, drop=TRUE)

Reverification of duplicates

any(duplicated(listings_ppp_jit))

## [1] FALSE

sum(multiplicity(listings_ppp_jit) > 1)

## [1] 0

Cleared of duplicates

Creating Owin file using outline made earlier

sg_owin <- as(sg_outline_sp, "owin")

plot(sg_owin)

summary(sg_owin)

## Window: polygonal boundary
## 80 separate polygons (35 holes)
##                   vertices         area relative.area
## polygon 1            14663  6.97996e+08      8.93e-01
## polygon 2 (hole)         3 -3.99521e-02     -5.11e-11
## polygon 3 (hole)         3 -4.95057e-02     -6.33e-11
## polygon 4 (hole)         3 -3.65501e-03     -4.67e-12
## polygon 5 (hole)         4 -2.05611e-02     -2.63e-11
## polygon 6 (hole)         3 -2.05920e-03     -2.63e-12
## polygon 7 (hole)         3 -2.89050e-05     -3.70e-14
## polygon 8 (hole)         3 -2.83151e-01     -3.62e-10
## polygon 9 (hole)        40 -6.00381e+03     -7.68e-06
## polygon 10 (hole)        7 -1.40545e-01     -1.80e-10
## polygon 11 (hole)       28 -1.99862e+01     -2.56e-08
## polygon 12 (hole)       48 -1.38338e+02     -1.77e-07
## polygon 13 (hole)       12 -8.36709e+01     -1.07e-07
## polygon 14 (hole)       20 -4.39069e+00     -5.62e-09
## polygon 15 (hole)      351 -1.21433e+03     -1.55e-06
## polygon 16 (hole)       26 -1.25665e+03     -1.61e-06
## polygon 17 (hole)        3 -2.18000e-06     -2.79e-15
## polygon 18 (hole)        3 -6.62377e-01     -8.47e-10
## polygon 19 (hole)        3 -2.09065e-03     -2.67e-12
## polygon 20 (hole)       36 -7.79904e+03     -9.97e-06
## polygon 21 (hole)        3 -1.68316e-04     -2.15e-13
## polygon 22 (hole)       36 -4.01660e+04     -5.14e-05
## polygon 23 (hole)      317 -5.11280e+04     -6.54e-05
## polygon 24 (hole)        3 -8.83647e-03     -1.13e-11
## polygon 25 (hole)        3 -2.21090e+00     -2.83e-09
## polygon 26             711  1.28815e+07      1.65e-02
## polygon 27 (hole)        3 -3.41405e-01     -4.37e-10
## polygon 28               4  9.47108e+01      1.21e-07
## polygon 29              37  1.29481e+04      1.66e-05
## polygon 30              30  4.28933e+03      5.49e-06
## polygon 31             145  9.61782e+05      1.23e-03
## polygon 32              15  4.03300e+04      5.16e-05
## polygon 33             227  1.10308e+06      1.41e-03
## polygon 34              19  3.09221e+04      3.95e-05
## polygon 35              10  6.60195e+03      8.44e-06
## polygon 36             234  4.72886e+05      6.05e-04
## polygon 37               4  2.69313e+02      3.44e-07
## polygon 38             132  9.53357e+04      1.22e-04
## polygon 39               6  4.50259e+02      5.76e-07
## polygon 40              71  5.63061e+03      7.20e-06
## polygon 41              10  1.99717e+02      2.55e-07
## polygon 42              75  1.73526e+04      2.22e-05
## polygon 43             211  4.70521e+05      6.02e-04
## polygon 44              83  5.28920e+03      6.76e-06
## polygon 45             106  3.04104e+03      3.89e-06
## polygon 46            1027  1.27782e+06      1.63e-03
## polygon 47 (hole)        3 -3.23310e-04     -4.13e-13
## polygon 48 (hole)        3 -1.16959e-03     -1.50e-12
## polygon 49 (hole)        3 -1.46474e-03     -1.87e-12
## polygon 50             155  2.67502e+05      3.42e-04
## polygon 51              47  3.82087e+04      4.89e-05
## polygon 52              65  8.42861e+04      1.08e-04
## polygon 53             478  2.06120e+06      2.64e-03
## polygon 54             266  1.50631e+06      1.93e-03
## polygon 55             234  2.08755e+06      2.67e-03
## polygon 56              10  4.90942e+02      6.28e-07
## polygon 57              22  6.74651e+03      8.63e-06
## polygon 58              95  5.96187e+04      7.62e-05
## polygon 59 (hole)        4 -1.86410e-02     -2.38e-11
## polygon 60              14  5.86546e+03      7.50e-06
## polygon 61              64  3.43149e+04      4.39e-05
## polygon 62 (hole)        3 -5.12482e-03     -6.55e-12
## polygon 63 (hole)        3 -1.96410e-03     -2.51e-12
## polygon 64 (hole)        3 -1.98390e-03     -2.54e-12
## polygon 65 (hole)        3 -5.55856e-03     -7.11e-12
## polygon 66 (hole)        4 -1.13774e-02     -1.46e-11
## polygon 67            1045  4.44510e+06      5.68e-03
## polygon 68 (hole)       13 -3.91907e+02     -5.01e-07
## polygon 69             148  3.10395e+03      3.97e-06
## polygon 70             142  3.22293e+03      4.12e-06
## polygon 71              45  2.51218e+03      3.21e-06
## polygon 72              40  1.38607e+04      1.77e-05
## polygon 73              91  1.49663e+04      1.91e-05
## polygon 74              71  8.18750e+03      1.05e-05
## polygon 75             668  5.40368e+07      6.91e-02
## polygon 76              30  2.80002e+04      3.58e-05
## polygon 77              77  3.29939e+05      4.22e-04
## polygon 78              27  1.50315e+04      1.92e-05
## polygon 79              44  2.26577e+03      2.90e-06
## polygon 80             285  1.61128e+06      2.06e-03
## enclosing rectangle: [2667.54, 56396.44] x [15748.72, 50256.33] units
##                      (53730 x 34510 units)
## Window area = 781945000 square units
## Fraction of frame area: 0.422

Combining ppp files

listings_SG_ppp = listings_ppp_jit[sg_owin]

plot(listings_SG_ppp)

Here we can see the plots are clearly concentrated towards the center.

Step 4) Perform Chi-square test

quadrat_t <- quadrat.test(listings_SG_ppp, 
                   nx = 20, ny = 15)

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

quadrat_t

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  listings_SG_ppp
## X2 = 76113, df = 192, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)

plot(listings_SG_ppp)
plot(quadrat_t, add = TRUE, cex =.1)

Step 5) Interpretation of Chi-Square test

Based on our Hypothesis and condifence level specified in Step 1. We can safely reject the null hypothesis at a 95% confidence level. This is due to our extremely low p-value of 2.2e-16. Additionally, with the high X2-value above 1, there seems to be strong indication in clustering of spatial data points for Airbnb listings. Based on the observations on the map, it is clearly due to the high concentration of spatial points towards the southern of Singapore and concentrated in small number of sections of the grid.

Step 6) Conditional Monte Carlo test of CSR using quadrat counts

quadrat.test(listings_SG_ppp, 
             nx = 20, ny = 15,
             method="M",
             nsim=999) #1000 simulations

## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  listings_SG_ppp
## X2 = 76113, p-value = 0.002
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)

After utilizing the conditional Monte Carlo test of CSR through 1000 simulaitons, our results remain consistent at rejecting the null hypothesis that the distribution of points are random. The high x2-value also indicates strong clustering remains.

Step 7) Perfomring Nearest Neighbour Analysis using the Clark and Evans Test

The null hypotheis, alternative hypothesis and confidence interval will remain unchanged from Step 1 for the purpose of this analysis.

clarkevans.test(listings_SG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("two.sided"),
                nsim=99)

## 
##  Clark-Evans test
##  No edge correction
##  Monte Carlo test based on 99 simulations of CSR with fixed n
## 
## data:  listings_SG_ppp
## R = 0.33359, p-value = 0.02
## alternative hypothesis: two-sided

Step 8) Interpretation of the Clark-Evans test

Based on the clark evans test with 99 simulations, we are able to reject the null hypothesis at a 95% confidence level with a p-value of 0.02. Although we need to take into account that the p-value is significantly higher compared to our previous test. Additionally, with an index of 0.33359 which is less than 1, there is indication of clustering.

Part 3) Kernal Density Estimation

Step 1) Computing kernel density estimation using adaptive bandwidth selection method

listings_SG_ppp.km <- rescale(listings_SG_ppp, 1000, "km") # converting to KM
kde_AirBnBSG_bw <- density(listings_SG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")

Step 2) Plotting distribution

plot(kde_AirBnBSG_bw)

Step 3) Interpretation of KDE distribution for adaptive bandwidth.

Unfortunately, due to the high concentration of points in certain locations, utilizing the adaptive bandwidth results in a siginficant concentration in a number of rasters while other rasters have no values in them. This results in a large range which is hard to interpret points not on the extremes. Although there is clear signs of large densities within the central region.

Step 4) Computing kernel density estimation using defining bandwidth manually

kde_AirBnBSG_600 <- density(listings_SG_ppp.km, sigma=0.6, edge=TRUE, kernel="gaussian")
plot(kde_AirBnBSG_600)

Step 5) Interpretation of KDE distribution using fixed widths

The plot above us provides us with clear indications of which zones are clustered with the added benefit of having a density value which is understandable and at a reasonable range. However the draw-backs is often the determination for sigma-value or the bandwidth are arbitrary. Additionally there is a requirement to change the value of the bandwidth in relation to the resolution of the raster may result in diffcult implementation.

Step 6) Converting KDE output into grid object.

kde_AirBnBSG_600_grid <- as.SpatialGridDataFrame.im(kde_AirBnBSG_600)
spplot(kde_AirBnBSG_600_grid)

Utiliing grid object plot actually provides us with additional insight, specifically of areas in which the KDE measurement is of value 0. Although this can likely be set manually for other object classes.

Step 7) Converting gridded output into raster

kde_AirBnBSG_600_raster <- raster(kde_AirBnBSG_600_grid)

Assigning projection systems

projection(kde_AirBnBSG_600_raster) <- CRS("+init=EPSG:3414")

Step 8) Visualising the output in tmap

tmap_mode("plot")

## tmap mode set to plotting

tm_shape(kde_AirBnBSG_600_raster) + 
  tm_raster("v", n=8, title= "Listings") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Step 6) Interpretation of KDE Estimate

By utilizing KDE to map, we are able to get more objective measures of distribution of spatial points. However, the clear disadvantage is that these value ranges often result in a loss in ability to distinguish the distribution of smaller values in the range of densities.

Step 7) Overall conclusion of 1st order analysis

Based on the 3 types of Spatial Point Analysis we’ve conducted, its clear there are areas towards the south-central part of Singapore with a significantly large clustered number of Airbnb listings. When comparing and contrasting the methods, it is clear that for visualization purposes, KDE provides a more objective view of the distribution. However, one should not discount the ability for dot-plots to show us how the points spread throughout the country which may be in relation to other factors such as methods of transportation to the city center.

The clustering within the south-central area which is near the central business districts in Singapore is likely related. It could suggest that the people listing on Airbnb are targetting expatriates working office jobs. However, this could also be due to the hotel rooms being in the center. We will need to examine the data furthur by room type to make a certain judgement.

Part 4) Kernel density maps of Airbnb listing by room type

Step 1) Setting up data for plotting

Entire_home_sf <- listings_sf %>%
  filter(room_type == "Entire home/apt")
Hotel_room_sf <- listings_sf %>%
  filter(room_type == "Hotel room")
Private_room_sf <- listings_sf %>%
  filter(room_type == "Private room")
Shared_room_sf <- listings_sf %>%
  filter(room_type == "Shared room")

Entire_home_spdf <- as_Spatial(Entire_home_sf)
Hotel_room_spdf <- as_Spatial(Hotel_room_sf)
Private_room_spdf <- as_Spatial(Private_room_sf)
Shared_room_spdf <- as_Spatial(Shared_room_sf)

Entire_home_sp <- as(Entire_home_spdf, "SpatialPoints")
Hotel_room_sp <- as(Hotel_room_spdf, "SpatialPoints")
Private_room_sp <- as(Private_room_spdf, "SpatialPoints")
Shared_room_sp <- as(Shared_room_spdf, "SpatialPoints")

Entire_home_ppp <- as(Entire_home_sp, "ppp")
Hotel_room_ppp <- as(Hotel_room_sp, "ppp")
Private_room_ppp <- as(Private_room_sp, "ppp")
Shared_room_ppp <- as(Shared_room_sp, "ppp")

Step 2) Testing for duplicates

any(duplicated(Entire_home_ppp))

## [1] TRUE

any(duplicated(Hotel_room_ppp))

## [1] TRUE

any(duplicated(Private_room_ppp))

## [1] TRUE

any(duplicated(Shared_room_ppp))

## [1] TRUE

Step 3) Removing duplicates for all using jitter

set.seed(1234)
Entire_home_ppp_jit <- rjitter(Entire_home_ppp, retry=TRUE, nsim=1, drop=TRUE)
Hotel_room_ppp_jit <- rjitter(Hotel_room_ppp, retry=TRUE, nsim=1, drop=TRUE)
Private_room_ppp_jit <- rjitter(Private_room_ppp, retry=TRUE, nsim=1, drop=TRUE)
Shared_room_ppp_jit <- rjitter(Shared_room_ppp, retry=TRUE, nsim=1, drop=TRUE)

Step 4) Rechecking for duplicates

any(duplicated(Entire_home_ppp_jit))

## [1] FALSE

any(duplicated(Hotel_room_ppp_jit))

## [1] FALSE

any(duplicated(Private_room_ppp_jit))

## [1] FALSE

any(duplicated(Shared_room_ppp_jit))

## [1] FALSE

Cleared of duplicates

Step 5) Appending PPP files to owin file of Singapore borders

Entire_home_SG = Entire_home_ppp_jit[sg_owin]
Hotel_room_SG = Hotel_room_ppp_jit[sg_owin]
Private_room_SG = Private_room_ppp_jit[sg_owin]
Shared_room_SG = Shared_room_ppp_jit[sg_owin]

Step 6) Rescaling

Entire_home_SG.km <- rescale(Entire_home_SG, 1000, "km")
Hotel_room_SG.km <- rescale(Hotel_room_SG, 1000, "km")
Private_room_SG.km <- rescale(Private_room_SG, 1000, "km")
Shared_room_SG.km <- rescale(Shared_room_SG, 1000, "km")

Step 7) Computing kernel density estimation using defining bandwidth manually

For the purpose of consistency, we will be utilizing the same fixed-bandwidth method

kde_Entire_home_SG_600 <- density(Entire_home_SG.km, sigma=0.6, edge=TRUE, kernel="gaussian")
kde_Hotel_room_SG_600 <- density(Hotel_room_SG.km, sigma=0.6, edge=TRUE, kernel="gaussian")
kde_Private_room_SG_600 <- density(Private_room_SG.km, sigma=0.6, edge=TRUE, kernel="gaussian")
kde_Shared_room_SG_600 <- density(Shared_room_SG.km, sigma=0.6, edge=TRUE, kernel="gaussian")

Step 8) Plotting results

plot(kde_Entire_home_SG_600)

plot(kde_Hotel_room_SG_600)

plot(kde_Private_room_SG_600)

plot(kde_Shared_room_SG_600)

Based on the inital plots, although the range of values or intervals were not aligned between the plots, it is clear that our initial assumption that hotel rooms could possibly be driving the high concentration of listings near the CBD to be minimal. In fact, it would seem that Entire homes and Private rooms are commanding higher kernal densities. These values particuarly near the south-central part of Singapore are nearly double the hotel room kernal densities. This could largely be due to the high number of overal Entire home /apartment and Private room poins throughout Singapore relative to Hotel rooms or Shared rooms.

However, with the larger ranges, the dot plot map in the earlier parts had also showed us that the distributions of Entire home.apartment and Private rooms actually more likely span outwards throughout Singapore. Despite this, the concentration still remains high within the areas of south-central Singapore relative to the rest of the map in all KDE plots.

Step 9) Formatting for OpenStreetMap plots

kde_Entire_home_SG_600_grid <- as.SpatialGridDataFrame.im(kde_Entire_home_SG_600)
kde_Hotel_room_SG_600_grid <- as.SpatialGridDataFrame.im(kde_Hotel_room_SG_600)
kde_Private_room_SG_600_grid <- as.SpatialGridDataFrame.im(kde_Private_room_SG_600)
kde_Shared_room_SG_600_grid <- as.SpatialGridDataFrame.im(kde_Shared_room_SG_600)


kde_Entire_home_SG_600_raster <- raster(kde_Entire_home_SG_600_grid)
kde_Hotel_room_SG_600_raster <- raster(kde_Hotel_room_SG_600_grid)
kde_Private_room_SG_600_raster <- raster(kde_Private_room_SG_600_grid)
kde_Shared_room_SG_600_raster <- raster(kde_Shared_room_SG_600_grid)

#kde_Entire_home_SG_600_raster[kde_Entire_home_SG_600_raster < 0.0001 ] <- NA
#kde_Hotel_room_SG_600_raster[kde_Hotel_room_SG_600_raster < 0.0001 ] <- NA
#kde_Private_room_SG_600_raster[kde_Private_room_SG_600_raster < 0.0001 ] <- NA
#kde_Shared_room_SG_600_raster[kde_Shared_room_SG_600_raster < 0.0001 ] <- NA




projection(kde_Entire_home_SG_600_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_Hotel_room_SG_600_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_Private_room_SG_600_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_Shared_room_SG_600_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")

Checking Resolution of data

res(kde_Entire_home_SG_600_raster)

## [1] 0.4197570 0.2695907

res(kde_Hotel_room_SG_600_raster)

## [1] 0.4197570 0.2695907

res(kde_Private_room_SG_600_raster)

## [1] 0.4197570 0.2695907

res(kde_Shared_room_SG_600_raster)

## [1] 0.4197570 0.2695907

Seems our data is not in the proper resolution.

Step 10) Rescaling

extent(kde_Entire_home_SG_600_raster) <- extent(c(xmin(kde_Entire_home_SG_600_raster), xmax(kde_Entire_home_SG_600_raster), ymin(kde_Entire_home_SG_600_raster), ymax(kde_Entire_home_SG_600_raster))*1000)
projection(kde_Entire_home_SG_600_raster) <- gsub("units=km", "units=m", projection(kde_Entire_home_SG_600_raster))


extent(kde_Hotel_room_SG_600_raster) <- extent(c(xmin(kde_Hotel_room_SG_600_raster), xmax(kde_Hotel_room_SG_600_raster), ymin(kde_Hotel_room_SG_600_raster), ymax(kde_Hotel_room_SG_600_raster))*1000)
projection(kde_Hotel_room_SG_600_raster) <- gsub("units=km", "units=m", projection(kde_Hotel_room_SG_600_raster))


extent(kde_Private_room_SG_600_raster) <- extent(c(xmin(kde_Private_room_SG_600_raster), xmax(kde_Private_room_SG_600_raster), ymin(kde_Private_room_SG_600_raster), ymax(kde_Private_room_SG_600_raster))*1000)
projection(kde_Private_room_SG_600_raster) <- gsub("units=km", "units=m", projection(kde_Private_room_SG_600_raster))


extent(kde_Shared_room_SG_600_raster) <- extent(c(xmin(kde_Shared_room_SG_600_raster), xmax(kde_Shared_room_SG_600_raster), ymin(kde_Shared_room_SG_600_raster), ymax(kde_Shared_room_SG_600_raster))*1000)
projection(kde_Shared_room_SG_600_raster) <- gsub("units=km", "units=m", projection(kde_Shared_room_SG_600_raster))

Re-Checking Resolution of data

res(kde_Entire_home_SG_600_raster)

## [1] 419.7570 269.5907

res(kde_Hotel_room_SG_600_raster)

## [1] 419.7570 269.5907

res(kde_Private_room_SG_600_raster)

## [1] 419.7570 269.5907

res(kde_Shared_room_SG_600_raster)

## [1] 419.7570 269.5907

Now our rasterlayer should be in the proper resolution

Step 11) Reprojecting CRS to EPSG 3414

projection(kde_Entire_home_SG_600_raster) <- CRS("+init=EPSG:3414")
projection(kde_Hotel_room_SG_600_raster) <- CRS("+init=EPSG:3414")
projection(kde_Private_room_SG_600_raster) <- CRS("+init=EPSG:3414")
projection(kde_Shared_room_SG_600_raster) <- CRS("+init=EPSG:3414")

kde_Entire_home_SG_600_raster <- crop(kde_Entire_home_SG_600_raster, extent(sg_outline_sp))
kde_Hotel_room_SG_600_raster <- crop(kde_Hotel_room_SG_600_raster, extent(sg_outline_sp))
kde_Private_room_SG_600_raster <- crop(kde_Private_room_SG_600_raster, extent(sg_outline_sp))
kde_Shared_room_SG_600_raster <- crop(kde_Shared_room_SG_600_raster, extent(sg_outline_sp))

Step 12) Plotting layers onto an OpenStreetMap

tmap_mode("view")

tm_basemap(leaflet::providers$OpenStreetMap)+ ### Need to set intervals
tm_shape(kde_Entire_home_SG_600_raster) + 
  tm_raster("v", n=5, title= "Entire Home", alpha=0.4,breaks = c(0,30, 60, 90, 100, Inf) ,drop.levels = TRUE, palette = "Oranges")+ 
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
tm_shape(kde_Hotel_room_SG_600_raster) + 
  tm_raster("v", n=5, title= "Hotel Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, palette = "Oranges") +
  tm_layout(legend.position = c("right", "top"), frame = FALSE) +
tm_shape(kde_Private_room_SG_600_raster) + 
  tm_raster("v", n=5, title= "Private Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, palette = "Oranges")+
  tm_layout(legend.position = c("left", "top"), frame = FALSE) +
tm_shape(kde_Shared_room_SG_600_raster) + 
  tm_raster("v", n=5, title= "Shared Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, palette = "Oranges")+
  tm_layout(legend.position = c("left", "bottom"), frame = FALSE)

Step 13) Interpretation of raster plot on interactive map.

Unfortunately, doing such plots has resulted in sigfincantly longer run-times which are not ideal for proper interaction with the widget. Additionally, even when standardizing the breaks and assigning and alpha of 0.2, it is hard to distinguish raster layer from raster layer.

Overall the ability to filter between raster layers actually allows us to see how that room types actually cluster differently from one-another. Entire rooms type seems to show the most obvious clustering where as Hotel rooms and Shared rooms are harder to distringuish from the background of the interactive map even after manipulating the palettes. This is likely simply due to the fact that there are very little spatial points for Hotel room and Shared room types relative to Entire homes or Private rooms while KDE analysis over the same dimensions for the rasters.

Overall the data seems to suggest that the location of clusters could be differentated between room types. This could be due to how the relate to one another or by external factors such as the concetration of specific residential housing blocks or commerical hotel areas as zoned. Thus, it would be prudent for us to examine subzones of high clustering to see if they indeed have different types of housing concentrations within them.

Section B: Planning Subzone

Part 4) Exploratory Spatial Data Analysis: Extracting Data by subzone

Step 1) Extracting the Study Areas

Extract Airbnb listing by room type within Aljunied, Balestier, Lavender and Tanjong Pagar planning subzones.

AJND <- mpsz_spdf[mpsz_spdf@data$SUBZONE_N == "ALJUNIED",]
BLTR <- mpsz_spdf[mpsz_spdf@data$SUBZONE_N == "BALESTIER",]
LVDR <- mpsz_spdf[mpsz_spdf@data$SUBZONE_N == "LAVENDER",]
TJPR <- mpsz_spdf[mpsz_spdf@data$SUBZONE_N == "TANJONG PAGAR",]

Plotting to check output

tmap_mode("plot")

## tmap mode set to plotting

plot(AJND)

plot(BLTR)

plot(LVDR)

plot(TJPR)

It is important for us to consider the difference in shapes of these polygons during our analysis.

Step 2) Converting the spatial point data frame into generic sp format

AJND_sp = as(AJND, "SpatialPolygons")
BLTR_sp = as(BLTR, "SpatialPolygons")
LVDR_sp = as(LVDR, "SpatialPolygons")
TJPR_sp = as(TJPR, "SpatialPolygons")

Step 3) converting to OWIN Objects

AJND_owin = as(AJND_sp, "owin")
BLTR_owin = as(BLTR_sp, "owin")
LVDR_owin = as(LVDR_sp, "owin")
TJPR_owin = as(TJPR_sp, "owin")

Step 4) Combining Airbnb by room type points and the study area

AJND_ppp_EH <- Entire_home_ppp_jit[AJND_owin]
BLTR_ppp_EH <- Entire_home_ppp_jit[BLTR_owin]
LVDR_ppp_EH <- Entire_home_ppp_jit[LVDR_owin]
TJPR_ppp_EH <- Entire_home_ppp_jit[TJPR_owin]


AJND_ppp_HR <- Hotel_room_ppp_jit[AJND_owin]
BLTR_ppp_HR <- Hotel_room_ppp_jit[BLTR_owin]
LVDR_ppp_HR <- Hotel_room_ppp_jit[LVDR_owin]
TJPR_ppp_HR <- Hotel_room_ppp_jit[TJPR_owin]

AJND_ppp_PR <- Private_room_ppp_jit[AJND_owin]
BLTR_ppp_PR <- Private_room_ppp_jit[BLTR_owin]
LVDR_ppp_PR <- Private_room_ppp_jit[LVDR_owin]
TJPR_ppp_PR <- Private_room_ppp_jit[TJPR_owin]

AJND_ppp_SR <- Shared_room_ppp_jit[AJND_owin]
BLTR_ppp_SR <- Shared_room_ppp_jit[BLTR_owin]
LVDR_ppp_SR <- Shared_room_ppp_jit[LVDR_owin]
TJPR_ppp_SR <- Shared_room_ppp_jit[TJPR_owin]

Step 5) Plotting distributions

Aljunied_ppp_map <- superimpose(Entire_House=AJND_ppp_EH, Hotel_Room=AJND_ppp_HR, Private_Room= AJND_ppp_PR, Shared_Room=AJND_ppp_SR)

spec = levels(marks(Aljunied_ppp_map))
scol <- c("red", "blue", "green", "black")
smap <- symbolmap(inputs=spec, col=scol)


Balestier_ppp_map <- superimpose(Entire_House=BLTR_ppp_EH, Hotel_Room=BLTR_ppp_HR, Private_Room= BLTR_ppp_PR, Shared_Room=BLTR_ppp_SR)


Lavender_ppp_map <- superimpose(Entire_House=LVDR_ppp_EH, Hotel_Room=LVDR_ppp_HR, Private_Room= LVDR_ppp_PR, Shared_Room=LVDR_ppp_SR)

TanjongPagar_ppp_map <- superimpose(Entire_House=TJPR_ppp_EH, Hotel_Room=TJPR_ppp_HR, Private_Room= TJPR_ppp_PR, Shared_Room=TJPR_ppp_SR)


par(mfrow=c(2,2))
plot(Aljunied_ppp_map, symap=smap, use.marks = TRUE, main = "Aljunied Airbnb listings")
plot(Balestier_ppp_map, symap=smap, use.marks = TRUE, main = "Balestier Airbnb listings")
plot(Lavender_ppp_map, symap=smap, use.marks = TRUE, main = "Lavender Airbnb listings")
plot(TanjongPagar_ppp_map, symap=smap, use.marks = TRUE, main = "Tanjong Pagar Airbnb listings")

Step 6) Interpretation of distribution

As mentioned in the previous Part, there appears upon first glance to be a distinct difference between the four zones in terms of room rtype distributions where Aljunied and Lavender appear to be primarily dominated by Private room listings where as Balestier and Tanjong Pagar appear to be dominated by Entire House listings though we do not have an accurate assessment of their distributions yet.

Part 5) 2nd order spatial point patterns analysis

Step 1) Defining null and alternative hypothesis as well as confidence level

Ho = The distribution of <> at <> are randomly distributed.
H1= The distribution of <> services at <> are not randomly distributed

For the purpose of this study, we will be using a confidence level of 95% with an alpha value of 0.05

Step 2) Determine which function to use

Aljunied_nnd <- nndist.ppp(Aljunied_ppp_map)
Balestier_nnd <- nndist.ppp(Balestier_ppp_map)
Lavender_nnd <- nndist.ppp(Lavender_ppp_map)
TanjongPagar_nnd <- nndist.ppp(TanjongPagar_ppp_map)

range(Aljunied_nnd)

## [1]   0.9219135 299.1901108

range(Balestier_nnd)

## [1]   1.13529 545.15750

range(Lavender_nnd)

## [1]  0.9719433 79.4579332

range(TanjongPagar_nnd)

## [1]  0.8170848 90.6607155

Based on our study areas and initial assessment of the points distribution, we can see that due to the polygon shape and geographican ature of our study areas,we cannot assume that they are isotrophic in nature. Thus applying the Ripley’s K function would not be ideal. Addiitonally, the range of values for Aljunied and Balestier vary much more compared to Lavender and Tanjong Pagar. This could cause complications in the application of the G-function due to its cumulative nature. Because of the varied nature of these study areas and point distributions, in order to form an objective view on the point pattern analysis, we will be using the F-function.

Step 3) Apply Function using Monte-carlo and interpret plots

Aljunied Study:

F_AJND.csr <- envelope(Aljunied_ppp_map, Fest, nsim = 999)
F_AJND_EH.csr <- envelope(AJND_ppp_EH, Fest, nsim = 999)
F_AJND_HR.csr <- envelope(AJND_ppp_HR, Fest, nsim = 999)
F_AJND_PR.csr <- envelope(AJND_ppp_PR, Fest, nsim = 999)
F_AJND_SR.csr <- envelope(AJND_ppp_SR, Fest, nsim = 999)

plot(F_AJND.csr)

par(mfrow=c(2,2))
plot(F_AJND_EH.csr)
plot(F_AJND_HR.csr)
plot(F_AJND_PR.csr)
plot(F_AJND_SR.csr)

Interpretation: Overall clustered, primarily due to Entire House and Private Room listings. With reference to the map, it seems that Entire House nad Private Rooms listings are often very close to each other which may be due to renters listing their property twice for both categories. We will also reject the null hypothesis for overall distribution at a 95% confidence interval at a distance of around 10 meters.

Balestier Study:

F_BLTR.csr <- envelope(Balestier_ppp_map, Fest, nsim = 999)
F_BLTR_EH.csr <- envelope(BLTR_ppp_EH, Fest, nsim = 999)
F_BLTR_HR.csr <- envelope(BLTR_ppp_HR, Fest, nsim = 999)
F_BLTR_PR.csr <- envelope(BLTR_ppp_PR, Fest, nsim = 999)
F_BLTR_SR.csr <- envelope(BLTR_ppp_SR, Fest, nsim = 999)

plot(F_BLTR.csr)

par(mfrow=c(2,2))
plot(F_BLTR_EH.csr)
plot(F_BLTR_HR.csr)
plot(F_BLTR_PR.csr)
plot(F_BLTR_SR.csr)

Interpretation: Overall there appears to be some clustering. However, it should be noted that there eappears to be a lack of datapoints for Shared room type resulting in the graph above for Balestier. Therefore we can not draw any conclusions from it. Similar to what was concluded in Aljunied, it appears we are only able to reject the null hypothesis for Entire House and Private room types while we are unable to do so for the other two. This may be due to a lack of datapoints. Private room type also requires a large distance of around 100 meters in order to reject the null hypothesis consistently.

Lavender Study:

F_LVDR.csr <- envelope(Lavender_ppp_map, Fest, nsim = 999)
F_LVDR_EH.csr <- envelope(LVDR_ppp_EH, Fest, nsim = 999)
F_LVDR_HR.csr <- envelope(LVDR_ppp_HR, Fest, nsim = 999)
F_LVDR_PR.csr <- envelope(LVDR_ppp_PR, Fest, nsim = 999)
F_LVDR_SR.csr <- envelope(LVDR_ppp_SR, Fest, nsim = 999)

plot(F_LVDR.csr)

par(mfrow=c(2,2))
plot(F_LVDR_EH.csr)
plot(F_LVDR_HR.csr)
plot(F_LVDR_PR.csr)
plot(F_LVDR_SR.csr)

Interpretation: Overall we are able to reject the null hypothesis at 20 meters with some signs of clustering. We can see that this is driven primary from the clustering of the Private room type with a similar cutoff distance for rejecting the null hypothesis. Additionally we are able to reject the null hypothesis for both Entire House and Hotel Rooms but we are unable to do so for Shared rooms.

Tanjong Pagar Study:

F_TJPR.csr <- envelope(TanjongPagar_ppp_map, Fest, nsim = 999)
F_TJPR_EH.csr <- envelope(TJPR_ppp_EH, Fest, nsim = 999)
F_TJPR_HR.csr <- envelope(TJPR_ppp_HR, Fest, nsim = 999)
F_TJPR_PR.csr <- envelope(TJPR_ppp_PR, Fest, nsim = 999)
F_TJPR_SR.csr <- envelope(TJPR_ppp_SR, Fest, nsim = 999)

plot(F_TJPR.csr)

par(mfrow=c(2,2))
plot(F_TJPR_EH.csr)
plot(F_TJPR_HR.csr)
plot(F_TJPR_PR.csr)
plot(F_TJPR_SR.csr)

## Error in yp[whole] <- yp[whole] + epsilon : 
##   NAs are not allowed in subscripted assignments

Interpretation:

Overall we are able to reject the null hypothesis at 12 meters with some signs of clustering. We can see that this is driven primary from the clustering of the Entire House room type as it is the only room type that we are able to reject the null hypothesis for this subzone. The rest of the distributios we are unable to. It should be noted there do not appear to be ably spatial data points for the Shared room types in this subzone, meaning we are unable to perform analysis on it here.

Part 6) kernel density maps of Airbnb listing by room type within Subzones

Step 1) Rescaling

Aljunied_ppp_map.km <- rescale(Aljunied_ppp_map, 1000, "km")
Balestier_ppp_map.km <- rescale(Balestier_ppp_map, 1000, "km")
Lavender_ppp_map.km <- rescale(Lavender_ppp_map, 1000, "km")
TanjongPagar_ppp_map.km <- rescale(TanjongPagar_ppp_map, 1000, "km")

Step 2) Performing kernel density calculation

kde_AJND_all <- density(Aljunied_ppp_map.km, sigma=0.6, edge=TRUE, kernel="gaussian") 
kde_BLTR_all <- density(Balestier_ppp_map.km, sigma=0.6, edge=TRUE, kernel="gaussian") 
kde_LVDR_all <- density(Lavender_ppp_map.km, sigma=0.6, edge=TRUE, kernel="gaussian") 
kde_TJPR_all <- density(TanjongPagar_ppp_map.km, sigma=0.6, edge=TRUE, kernel="gaussian")

Step 3) Checking Plots

par(mfrow=c(1,4))
plot(kde_AJND_all)
plot(kde_BLTR_all)
plot(kde_LVDR_all)
plot(kde_TJPR_all)

Step 4) Formatting for OpenStreetMap plots

kde_AJND_all_grid <- as.SpatialGridDataFrame.im(kde_AJND_all)
kde_BLTR_all_grid <- as.SpatialGridDataFrame.im(kde_BLTR_all)
kde_LVDR_all_grid <- as.SpatialGridDataFrame.im(kde_LVDR_all)
kde_TJPR_all_grid <- as.SpatialGridDataFrame.im(kde_TJPR_all)


kde_AJND_all_raster <- raster(kde_AJND_all_grid)
kde_BLTR_all_raster <- raster(kde_BLTR_all_grid)
kde_LVDR_all_raster <- raster(kde_LVDR_all_grid)
kde_TJPR_all_raster <- raster(kde_TJPR_all_grid)

#kde_Entire_home_SG_600_raster[kde_Entire_home_SG_600_raster < 0.0001 ] <- NA
#kde_Hotel_room_SG_600_raster[kde_Hotel_room_SG_600_raster < 0.0001 ] <- NA
#kde_Private_room_SG_600_raster[kde_Private_room_SG_600_raster < 0.0001 ] <- NA
#kde_Shared_room_SG_600_raster[kde_Shared_room_SG_600_raster < 0.0001 ] <- NA


projection(kde_AJND_all_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_BLTR_all_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_LVDR_all_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")
projection(kde_TJPR_all_raster) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ")

Checking Resolution of data

res(kde_AJND_all_raster)

## [1] 0.01534011 0.01585319

res(kde_BLTR_all_raster)

## [1] 0.01893197 0.01039581

res(kde_LVDR_all_raster)

## [1] 0.008682154 0.009355564

res(kde_TJPR_all_raster)

## [1] 0.003758669 0.004986962

Seems our data is not in the proper resolution.

Step 5) Rescaling

extent(kde_AJND_all_raster) <- extent(c(xmin(kde_AJND_all_raster), xmax(kde_AJND_all_raster), ymin(kde_AJND_all_raster), ymax(kde_AJND_all_raster))*1000)
projection(kde_AJND_all_raster) <- gsub("units=km", "units=m", projection(kde_AJND_all_raster))


extent(kde_BLTR_all_raster) <- extent(c(xmin(kde_BLTR_all_raster), xmax(kde_BLTR_all_raster), ymin(kde_BLTR_all_raster), ymax(kde_BLTR_all_raster))*1000)
projection(kde_BLTR_all_raster) <- gsub("units=km", "units=m", projection(kde_BLTR_all_raster))


extent(kde_LVDR_all_raster) <- extent(c(xmin(kde_LVDR_all_raster), xmax(kde_LVDR_all_raster), ymin(kde_LVDR_all_raster), ymax(kde_LVDR_all_raster))*1000)
projection(kde_LVDR_all_raster) <- gsub("units=km", "units=m", projection(kde_LVDR_all_raster))


extent(kde_TJPR_all_raster) <- extent(c(xmin(kde_TJPR_all_raster), xmax(kde_TJPR_all_raster), ymin(kde_TJPR_all_raster), ymax(kde_TJPR_all_raster))*1000)
projection(kde_TJPR_all_raster) <- gsub("units=km", "units=m", projection(kde_TJPR_all_raster))

Re-Checking Resolution of data

res(kde_AJND_all_raster)

## [1] 15.34011 15.85319

res(kde_BLTR_all_raster)

## [1] 18.93197 10.39581

res(kde_LVDR_all_raster)

## [1] 8.682154 9.355564

res(kde_TJPR_all_raster)

## [1] 3.758669 4.986962

Now our rasterlayer should be in the proper resolution

Step 6) Reprojecting CRS to EPSG 3414

projection(kde_AJND_all_raster) <- CRS("+init=EPSG:3414")
projection(kde_BLTR_all_raster) <- CRS("+init=EPSG:3414")
projection(kde_LVDR_all_raster) <- CRS("+init=EPSG:3414")
projection(kde_TJPR_all_raster) <- CRS("+init=EPSG:3414")

Step 7) Plotting layers onto an OpenStreetMap

tmap_mode("view")

tm_basemap(leaflet::providers$OpenStreetMap)+
tm_shape(kde_AJND_all_raster) + 
  tm_raster("v", n=5, title= "Entire Home", alpha=0.4, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, colorNA= NULL)+ 
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
tm_shape(kde_BLTR_all_raster) + 
  tm_raster("v", n=5, title= "Hotel Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, colorNA= NULL) +
  tm_layout(legend.position = c("right", "top"), frame = FALSE) +
tm_shape(kde_LVDR_all_raster) + 
  tm_raster("v", n=5, title= "Private Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, colorNA= NULL)+
  tm_layout(legend.position = c("left", "top"), frame = FALSE) +
tm_shape(kde_TJPR_all_raster) + 
  tm_raster("v", n=5, title= "Shared Rm", alpha=0.2, breaks = c(0,30, 60, 90, 100, Inf), drop.levels = TRUE, colorNA= NULL)+
  tm_layout(legend.position = c("left", "bottom"), frame = FALSE)

Step 8) Interpretation of overall distributions

Due to the computationally intensive requirements to plot 4 raster layers onto interactive maps. This makes it difficult to draw proper conclusions regarding the distributions within subzones effectively.

We attempted to standardize the distributions for all, however due to the colorful nature of the interative base-map, it became hard to draw conclusions about the distributions. Overall however, it does seem to provide more nuanced distribution of the clustering withi n the subzones as we project the rasters onto the map.

While we could proceed to map all 16 remaining raster layers, it would be hard to do so without overloading the r-markdown file and presenting issues in publishing. After numerous attempts, the r-markdown file is just too large for upload onto r-pubs

Take-Home_Ex02

Lee Yi De

17 May 2020

Take-Home Exercise 02: Examination of Airbnb listings in Singapore

Section A: Nation-wide analysis

1) Exploratory Spatial Data Analysis:

2) With reference to the spatial point patterns observed in (1):

3) With reference to the results derived in (1) and (2):

Section B: By planning subzones

4) Exploratory Spatial Data Analysis:

Part 0: Loading in Packages

Section A: Nation-wide analysis

Part 1) Exploratory Spatial Data Analysis:

Step 1) Loading and formating in Airbnb listings

Loadng in Airbnb listings as a simple dataframe

Converting aspatial data into sf dataframe

NA/Validity Check 1

NA/Validity Check 2

Step 2) Plotting on t-maps

Step 3) Interpretation of Dot-plot

Part 2) Spatial Point Analysis

Step 1) Define Null Hypothesis, Alternative Hypothesis and Confidence Interval

Step 1) Convert to SpatialPointsDataFrame format

Step 2) Importing required Singapore spatial data

Loading in MP14 Planning Area Subzone Data

NA/Validity Check 1

Make valid polygons

NA/Validity Check 2

Convert to SpatialPolygon Dataframe

Merging polygons to create an outline map for SG.

Step 3) Converting files into ppp format

Converting to Spatial Polygon Dataframe

Converting Spatial Dataframes to Spatial Points

Converting from Spatial Points and Spatial Polygons to ppp format

Checking for duplicates

Removing duplicates using a jitter

Reverification of duplicates

Creating Owin file using outline made earlier

Combining ppp files

Step 4) Perform Chi-square test

Step 5) Interpretation of Chi-Square test

Step 6) Conditional Monte Carlo test of CSR using quadrat counts

Step 7) Perfomring Nearest Neighbour Analysis using the Clark and Evans Test

Step 8) Interpretation of the Clark-Evans test

Part 3) Kernal Density Estimation

Step 1) Computing kernel density estimation using adaptive bandwidth selection method

Step 2) Plotting distribution

Step 3) Interpretation of KDE distribution for adaptive bandwidth.

Step 4) Computing kernel density estimation using defining bandwidth manually

Step 5) Interpretation of KDE distribution using fixed widths

Step 6) Converting KDE output into grid object.

Step 7) Converting gridded output into raster

Assigning projection systems

Step 8) Visualising the output in tmap

Step 6) Interpretation of KDE Estimate

Step 7) Overall conclusion of 1st order analysis

Part 4) Kernel density maps of Airbnb listing by room type

Step 1) Setting up data for plotting

Step 2) Testing for duplicates

Step 3) Removing duplicates for all using jitter

Step 4) Rechecking for duplicates

Step 5) Appending PPP files to owin file of Singapore borders

Step 6) Rescaling

Step 7) Computing kernel density estimation using defining bandwidth manually

Step 8) Plotting results

Step 9) Formatting for OpenStreetMap plots

Checking Resolution of data

Step 10) Rescaling

Re-Checking Resolution of data

Step 11) Reprojecting CRS to EPSG 3414

Step 12) Plotting layers onto an OpenStreetMap

Step 13) Interpretation of raster plot on interactive map.

Section B: Planning Subzone

Part 4) Exploratory Spatial Data Analysis: Extracting Data by subzone

Step 1) Extracting the Study Areas

Plotting to check output

Step 2) Converting the spatial point data frame into generic sp format

Step 3) converting to OWIN Objects

Step 4) Combining Airbnb by room type points and the study area

Step 5) Plotting distributions