Loading of R packages
packages <- c('tidyverse', 'dplyr', 'ggpubr', 'knitr', 'sf', 'tmap')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Load spatial data and CSV
sg <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\Hao Jun\School\[IS428] Visual Analytics for Business Intelligence-G1\Assignment\5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
listings <- read_csv("data/aspatial/listings.csv")
In this assignment, we are not given any datasets and may work on any case we like. However, we are required to incorporate interactivity and/or maps into our data visualisation design.
The dataset I use for this assignment is the Singapore Airbnb Listing 2019 dataset.
Challenge 1: The id, host id, room type and neighbourhood group are not in the correct data type
Using summary(), I realised that id, host id, room type and neighbourhood group were not stored as the correct data types, which could hinder later analysis. I therefore convert id and host id to character, and room type, neighbourhood and neighbourhood group to factor.
summary(listings)
## id name host_id host_name
## Min. : 49091 Length:7907 Min. : 23666 Length:7907
## 1st Qu.:15821800 Class :character 1st Qu.: 23058075 Class :character
## Median :24706270 Mode :character Median : 63448912 Mode :character
## Mean :23388625 Mean : 91144807
## 3rd Qu.:32348500 3rd Qu.:155381142
## Max. :38112762 Max. :288567551
##
## neighbourhood_group neighbourhood latitude longitude
## Length:7907 Length:7907 Min. :1.244 Min. :103.6
## Class :character Class :character 1st Qu.:1.296 1st Qu.:103.8
## Mode :character Mode :character Median :1.311 Median :103.8
## Mean :1.314 Mean :103.8
## 3rd Qu.:1.322 3rd Qu.:103.9
## Max. :1.455 Max. :104.0
##
## room_type price minimum_nights number_of_reviews
## Length:7907 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 65.0 1st Qu.: 1.00 1st Qu.: 0.00
## Mode :character Median : 124.0 Median : 3.00 Median : 2.00
## Mean : 169.3 Mean : 17.51 Mean : 12.81
## 3rd Qu.: 199.0 3rd Qu.: 10.00 3rd Qu.: 10.00
## Max. :10000.0 Max. :1000.00 Max. :323.00
##
## last_review reviews_per_month calculated_host_listings_count
## Min. :2013-10-21 Min. : 0.010 Min. : 1.00
## 1st Qu.:2018-11-21 1st Qu.: 0.180 1st Qu.: 2.00
## Median :2019-06-27 Median : 0.550 Median : 9.00
## Mean :2019-01-11 Mean : 1.044 Mean : 40.61
## 3rd Qu.:2019-08-07 3rd Qu.: 1.370 3rd Qu.: 48.00
## Max. :2019-08-27 Max. :13.000 Max. :274.00
## NA's :2758 NA's :2758
## availability_365
## Min. : 0.0
## 1st Qu.: 54.0
## Median :260.0
## Mean :208.7
## 3rd Qu.:355.0
## Max. :365.0
##
listings$id <- as.character(listings$id)
listings$host_id <- as.character(listings$host_id)
listings$neighbourhood_group <- factor(listings$neighbourhood_group)
listings$neighbourhood <- factor(listings$neighbourhood)
listings$room_type <- factor(listings$room_type)
summary(listings)
## id name host_id host_name
## Length:7907 Length:7907 Length:7907 Length:7907
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## neighbourhood_group neighbourhood latitude longitude
## Central Region :6309 Kallang :1043 Min. :1.244 Min. :103.6
## East Region : 508 Geylang : 994 1st Qu.:1.296 1st Qu.:103.8
## North-East Region: 346 Novena : 537 Median :1.311 Median :103.8
## North Region : 204 Rochor : 536 Mean :1.314 Mean :103.8
## West Region : 540 Outram : 477 3rd Qu.:1.322 3rd Qu.:103.9
## Bukit Merah: 470 Max. :1.455 Max. :104.0
## (Other) :3850
## room_type price minimum_nights number_of_reviews
## Entire home/apt:4132 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Private room :3381 1st Qu.: 65.0 1st Qu.: 1.00 1st Qu.: 0.00
## Shared room : 394 Median : 124.0 Median : 3.00 Median : 2.00
## Mean : 169.3 Mean : 17.51 Mean : 12.81
## 3rd Qu.: 199.0 3rd Qu.: 10.00 3rd Qu.: 10.00
## Max. :10000.0 Max. :1000.00 Max. :323.00
##
## last_review reviews_per_month calculated_host_listings_count
## Min. :2013-10-21 Min. : 0.010 Min. : 1.00
## 1st Qu.:2018-11-21 1st Qu.: 0.180 1st Qu.: 2.00
## Median :2019-06-27 Median : 0.550 Median : 9.00
## Mean :2019-01-11 Mean : 1.044 Mean : 40.61
## 3rd Qu.:2019-08-07 3rd Qu.: 1.370 3rd Qu.: 48.00
## Max. :2019-08-27 Max. :13.000 Max. :274.00
## NA's :2758 NA's :2758
## availability_365
## Min. : 0.0
## 1st Qu.: 54.0
## Median :260.0
## Mean :208.7
## 3rd Qu.:355.0
## Max. :365.0
##
After setting the correct data types, I could see that most of the listings are in the Central Region and that the most common room type is Entire home/apt.
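The same conversion can also be written as a single dplyr step. The following is a minimal sketch rather than the code used for this report: it assumes dplyr 1.0.0 or later (for across()), and the toy table and its values are made up purely for illustration.

```r
library(dplyr)

# Toy stand-in for the listings table (values are invented for illustration)
toy <- tibble(
  id = c(49091, 50646, 56334),
  host_id = c(266763, 227796, 266763),
  neighbourhood_group = c("North Region", "Central Region", "North Region"),
  room_type = c("Private room", "Private room", "Entire home/apt")
)

toy_typed <- toy %>%
  mutate(
    # Identifiers are labels, not quantities, so store them as text
    across(c(id, host_id), as.character),
    # Categorical columns become factors so summary() reports counts per level
    across(c(neighbourhood_group, room_type), factor)
  )

summary(toy_typed)
```

Grouping the columns by target type keeps the intent visible in one place, which may be easier to maintain than one assignment per column.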
Challenge 2: Last review and reviews per month have NA values
A total of 2,758 rows have no value for last review and reviews per month. This means either that nobody has left a review for the listing or that the values were missing when the data was gathered. To keep the dataset relevant, I have decided to exclude all rows with these NA values.
listings <- listings %>%
drop_na(last_review)
summary(listings)
## id name host_id host_name
## Length:5149 Length:5149 Length:5149 Length:5149
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## neighbourhood_group neighbourhood latitude
## Central Region :4144 Kallang : 697 Min. :1.244
## East Region : 345 Geylang : 648 1st Qu.:1.296
## North-East Region: 215 Rochor : 362 Median :1.311
## North Region : 108 Outram : 345 Mean :1.313
## West Region : 337 Novena : 313 3rd Qu.:1.320
## Downtown Core: 280 Max. :1.452
## (Other) :2504
## longitude room_type price minimum_nights
## Min. :103.6 Entire home/apt:2643 Min. : 0.0 Min. : 1.00
## 1st Qu.:103.8 Private room :2229 1st Qu.: 62.0 1st Qu.: 1.00
## Median :103.8 Shared room : 277 Median : 115.0 Median : 3.00
## Mean :103.8 Mean : 151.3 Mean : 12.46
## 3rd Qu.:103.9 3rd Qu.: 187.0 3rd Qu.: 6.00
## Max. :104.0 Max. :10000.0 Max. :700.00
##
## number_of_reviews last_review reviews_per_month
## Min. : 1.00 Min. :2013-10-21 Min. : 0.010
## 1st Qu.: 2.00 1st Qu.:2018-11-21 1st Qu.: 0.180
## Median : 6.00 Median :2019-06-27 Median : 0.550
## Mean : 19.67 Mean :2019-01-11 Mean : 1.044
## 3rd Qu.: 21.00 3rd Qu.:2019-08-07 3rd Qu.: 1.370
## Max. :323.00 Max. :2019-08-27 Max. :13.000
##
## calculated_host_listings_count availability_365
## Min. : 1.00 Min. : 0.0
## 1st Qu.: 2.00 1st Qu.: 55.0
## Median : 8.00 Median :239.0
## Mean : 35.22 Mean :201.1
## 3rd Qu.: 32.00 3rd Qu.:346.0
## Max. :274.00 Max. :365.0
##
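In the output above, the minimum of number_of_reviews rises from 0 to 1 after the drop, which suggests that the NA rows are listings without any reviews rather than gaps in data collection. A minimal sketch of how this could be checked before dropping rows is shown here; the toy table and its values are invented for illustration, and only count() from dplyr and drop_na() from tidyr are used.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the listings table (values are invented for illustration)
toy <- tibble(
  id = c("a", "b", "c", "d"),
  number_of_reviews = c(0, 12, 0, 3),
  last_review = as.Date(c(NA, "2019-08-01", NA, "2019-06-15")),
  reviews_per_month = c(NA, 1.2, NA, 0.4)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))

# Cross-check: do missing review dates coincide with zero reviews?
na_vs_zero <- toy %>%
  count(no_reviews = number_of_reviews == 0,
        missing_date = is.na(last_review))

# Drop only the rows where last_review is missing
toy_reviewed <- toy %>% drop_na(last_review)
```

If every row with a missing date also has zero reviews, the cross-table contains only the two matching combinations, and dropping those rows amounts to restricting the analysis to listings that have been reviewed at least once.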
Challenge 3: Joining the listings with the spatial data
Unlike in our previous assignment, there is no subzone variable that I can use to join the spatial and aspatial data. Looking at the variables in both datasets, however, there is another pair that can serve as the join key: neighbourhood in listings and planning area (PLN_AREA_N) in the sg spatial data. The neighbourhood values in listings are not capitalised, so they must be converted to upper case before joining.
summary(sg)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 ADMIRALTY : 1 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 AIRPORT ROAD : 1 AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 ALEXANDRA HILL : 1 AMSZ03 : 1
## Mean :162.0 Mean : 4.625 ALEXANDRA NORTH: 1 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 ALJUNIED : 1 AMSZ05 : 1
## Max. :323.0 Max. :17.000 ANAK BUKIT : 1 AMSZ06 : 1
## (Other) :317 (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area geometry
## Min. : 871.5 Min. : 39438 MULTIPOLYGON :323
## 1st Qu.: 3709.6 1st Qu.: 628261 epsg:NA : 0
## Median : 5211.9 Median : 1229894 +proj=tmer...: 0
## Mean : 6524.4 Mean : 2420882
## 3rd Qu.: 6942.6 3rd Qu.: 2106483
## Max. :68083.9 Max. :69748299
##
listings <- listings %>%
  # across() replaces the deprecated mutate_at() + funs() pattern (dplyr >= 1.0.0)
  mutate(across(c(neighbourhood, neighbourhood_group), toupper))
listings$neighbourhood <- factor(listings$neighbourhood)
listings$neighbourhood_group <- factor(listings$neighbourhood_group)
summary(listings)
## id name host_id host_name
## Length:5149 Length:5149 Length:5149 Length:5149
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## neighbourhood_group neighbourhood latitude
## CENTRAL REGION :4144 KALLANG : 697 Min. :1.244
## EAST REGION : 345 GEYLANG : 648 1st Qu.:1.296
## NORTH-EAST REGION: 215 ROCHOR : 362 Median :1.311
## NORTH REGION : 108 OUTRAM : 345 Mean :1.313
## WEST REGION : 337 NOVENA : 313 3rd Qu.:1.320
## DOWNTOWN CORE: 280 Max. :1.452
## (Other) :2504
## longitude room_type price minimum_nights
## Min. :103.6 Entire home/apt:2643 Min. : 0.0 Min. : 1.00
## 1st Qu.:103.8 Private room :2229 1st Qu.: 62.0 1st Qu.: 1.00
## Median :103.8 Shared room : 277 Median : 115.0 Median : 3.00
## Mean :103.8 Mean : 151.3 Mean : 12.46
## 3rd Qu.:103.9 3rd Qu.: 187.0 3rd Qu.: 6.00
## Max. :104.0 Max. :10000.0 Max. :700.00
##
## number_of_reviews last_review reviews_per_month
## Min. : 1.00 Min. :2013-10-21 Min. : 0.010
## 1st Qu.: 2.00 1st Qu.:2018-11-21 1st Qu.: 0.180
## Median : 6.00 Median :2019-06-27 Median : 0.550
## Mean : 19.67 Mean :2019-01-11 Mean : 1.044
## 3rd Qu.: 21.00 3rd Qu.:2019-08-07 3rd Qu.: 1.370
## Max. :323.00 Max. :2019-08-27 Max. :13.000
##
## calculated_host_listings_count availability_365
## Min. : 1.00 Min. : 0.0
## 1st Qu.: 2.00 1st Qu.: 55.0
## Median : 8.00 Median :239.0
## Mean : 35.22 Mean :201.1
## 3rd Qu.: 32.00 3rd Qu.:346.0
## Max. :274.00 Max. :365.0
##
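Before joining, it can be useful to confirm that the keys on both sides actually match. The sketch below illustrates one way to do this with anti_join() and setdiff(). It uses plain tibbles with invented values in place of the real sg and listings objects, so the table contents are assumptions.

```r
library(dplyr)

# Toy stand-ins: a polygon attribute table and a listings table
areas <- tibble(PLN_AREA_N = c("KALLANG", "GEYLANG", "TUAS"))
toy_listings <- tibble(
  id = c("1", "2", "3"),
  neighbourhood = c("Kallang", "Geylang", "Kallang")
)

# Without normalising case, none of the listing keys match
unmatched_raw <- anti_join(toy_listings, areas,
                           by = c("neighbourhood" = "PLN_AREA_N"))

# After upper-casing, every listing finds a planning area
toy_upper <- toy_listings %>% mutate(neighbourhood = toupper(neighbourhood))
unmatched_upper <- anti_join(toy_upper, areas,
                             by = c("neighbourhood" = "PLN_AREA_N"))

# Planning areas that have no listing at all
areas_without_listings <- setdiff(areas$PLN_AREA_N, toy_upper$neighbourhood)
```

An empty anti_join() result indicates that no listing will be lost in the join, while the setdiff() result shows which planning areas will end up with NA values on the listings side.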
sg_listings <- left_join(sg, listings,
by = c("PLN_AREA_N" = "neighbourhood"))
summary(sg_listings)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1 Min. : 1.000 BENDEMEER : 697 KLSZ01 : 697 N:28917
## 1st Qu.: 49 1st Qu.: 2.000 BOON KENG : 697 KLSZ02 : 697 Y:11028
## Median :100 Median : 4.000 CRAWFORD : 697 KLSZ03 : 697
## Mean :106 Mean : 5.214 GEYLANG BAHRU: 697 KLSZ04 : 697
## 3rd Qu.:148 3rd Qu.: 7.000 KALLANG BAHRU: 697 KLSZ05 : 697
## Max. :323 Max. :17.000 KAMPONG BUGIS: 697 KLSZ06 : 697
## (Other) :35763 (Other):35763
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## KALLANG : 6273 KL : 6273 CENTRAL REGION :31693 CR :31693
## BUKIT MERAH : 4539 BM : 4539 EAST REGION : 2628 ER : 2628
## ROCHOR : 3620 RC : 3620 NORTH-EAST REGION: 1850 NER: 1850
## DOWNTOWN CORE: 3360 DT : 3360 NORTH REGION : 859 NR : 859
## GEYLANG : 3240 GL : 3240 WEST REGION : 2915 WR : 2915
## QUEENSTOWN : 2265 QT : 2265
## (Other) :16648 (Other):16648
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 0524461C92F35D94: 697 Min. :2014-12-05 Min. : 5093 Min. :19579
## 0D1D1759D7BC6D6C: 697 1st Qu.:2014-12-05 1st Qu.:27077 1st Qu.:30537
## 69C9F7CD6F08EA3A: 697 Median :2014-12-05 Median :29817 Median :32276
## 928DCE8E44F904C8: 697 Mean :2014-12-05 Mean :29296 Mean :32862
## 97A1E6DEEC6C442D: 697 3rd Qu.:2014-12-05 3rd Qu.:32138 3rd Qu.:34230
## A7A07F328A38B6EF: 697 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :35763
## SHAPE_Leng SHAPE_Area id name
## Min. : 871.5 Min. : 39438 Length:39945 Length:39945
## 1st Qu.: 2897.1 1st Qu.: 387429 Class :character Class :character
## Median : 4055.2 Median : 839489 Mode :character Mode :character
## Mean : 4626.8 Mean : 1162087
## 3rd Qu.: 5637.7 3rd Qu.: 1524551
## Max. :68083.9 Max. :69748299
##
## host_id host_name neighbourhood_group
## Length:39945 Length:39945 CENTRAL REGION :31691
## Class :character Class :character EAST REGION : 2619
## Mode :character Mode :character NORTH-EAST REGION: 1845
## NORTH REGION : 854
## WEST REGION : 2902
## NA's : 34
##
## latitude longitude room_type price
## Min. :1.244 Min. :103.6 Entire home/apt:20042 Min. : 0.0
## 1st Qu.:1.291 1st Qu.:103.8 Private room :17748 1st Qu.: 60.0
## Median :1.309 Median :103.8 Shared room : 2121 Median : 112.0
## Mean :1.312 Mean :103.8 NA's : 34 Mean : 147.5
## 3rd Qu.:1.320 3rd Qu.:103.9 3rd Qu.: 187.0
## Max. :1.452 Max. :104.0 Max. :10000.0
## NA's :34 NA's :34 NA's :34
## minimum_nights number_of_reviews last_review reviews_per_month
## Min. : 1.00 Min. : 1.00 Min. :2013-10-21 Min. : 0.010
## 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.:2018-11-01 1st Qu.: 0.170
## Median : 2.00 Median : 5.00 Median :2019-06-22 Median : 0.520
## Mean : 12.77 Mean : 18.48 Mean :2019-01-03 Mean : 1.018
## 3rd Qu.: 7.00 3rd Qu.: 19.00 3rd Qu.:2019-08-06 3rd Qu.: 1.290
## Max. :700.00 Max. :323.00 Max. :2019-08-27 Max. :13.000
## NA's :34 NA's :34 NA's :34 NA's :34
## calculated_host_listings_count availability_365 geometry
## Min. : 1.00 Min. : 0.0 MULTIPOLYGON :39945
## 1st Qu.: 2.00 1st Qu.: 54.0 epsg:NA : 0
## Median : 8.00 Median :241.0 +proj=tmer...: 0
## Mean : 33.88 Mean :201.3
## 3rd Qu.: 27.00 3rd Qu.:347.0
## Max. :274.00 Max. :365.0
## NA's :34 NA's :34
After joining, I realised that there are 34 NA values under neighbourhood group, room type, price and the other listing variables. I believe these rows are areas in the spatial data for which no listing could be matched. Therefore, I drop those 34 rows to keep the dataset relevant.
sg_listings <- sg_listings %>%
drop_na(neighbourhood_group)
summary(sg_listings)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C
## Min. : 1.0 Min. : 1.000 BENDEMEER : 697 KLSZ01 : 697
## 1st Qu.: 49.0 1st Qu.: 2.000 BOON KENG : 697 KLSZ02 : 697
## Median :100.0 Median : 4.000 CRAWFORD : 697 KLSZ03 : 697
## Mean :105.9 Mean : 5.216 GEYLANG BAHRU: 697 KLSZ04 : 697
## 3rd Qu.:148.0 3rd Qu.: 7.000 KALLANG BAHRU: 697 KLSZ05 : 697
## Max. :323.0 Max. :17.000 KAMPONG BUGIS: 697 KLSZ06 : 697
## (Other) :35729 (Other):35729
## CA_IND PLN_AREA_N PLN_AREA_C REGION_N
## N:28885 KALLANG : 6273 KL : 6273 CENTRAL REGION :31691
## Y:11026 BUKIT MERAH : 4539 BM : 4539 EAST REGION : 2619
## ROCHOR : 3620 RC : 3620 NORTH-EAST REGION: 1845
## DOWNTOWN CORE: 3360 DT : 3360 NORTH REGION : 854
## GEYLANG : 3240 GL : 3240 WEST REGION : 2902
## QUEENSTOWN : 2265 QT : 2265
## (Other) :16614 (Other):16614
## REGION_C INC_CRC FMEL_UPD_D X_ADDR
## CR :31691 0524461C92F35D94: 697 Min. :2014-12-05 Min. : 5093
## ER : 2619 0D1D1759D7BC6D6C: 697 1st Qu.:2014-12-05 1st Qu.:27077
## NER: 1845 69C9F7CD6F08EA3A: 697 Median :2014-12-05 Median :29817
## NR : 854 928DCE8E44F904C8: 697 Mean :2014-12-05 Mean :29298
## WR : 2902 97A1E6DEEC6C442D: 697 3rd Qu.:2014-12-05 3rd Qu.:32138
## A7A07F328A38B6EF: 697 Max. :2014-12-05 Max. :43592
## (Other) :35729
## Y_ADDR SHAPE_Leng SHAPE_Area id
## Min. :23413 Min. : 871.5 Min. : 39438 Length:39911
## 1st Qu.:30537 1st Qu.: 2897.1 1st Qu.: 387429 Class :character
## Median :32276 Median : 4055.2 Median : 839489 Mode :character
## Mean :32858 Mean : 4619.7 Mean : 1157188
## 3rd Qu.:34230 3rd Qu.: 5637.7 3rd Qu.: 1517767
## Max. :49553 Max. :54928.1 Max. :69748299
##
## name host_id host_name
## Length:39911 Length:39911 Length:39911
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## neighbourhood_group latitude longitude
## CENTRAL REGION :31691 Min. :1.244 Min. :103.6
## EAST REGION : 2619 1st Qu.:1.291 1st Qu.:103.8
## NORTH-EAST REGION: 1845 Median :1.309 Median :103.8
## NORTH REGION : 854 Mean :1.312 Mean :103.8
## WEST REGION : 2902 3rd Qu.:1.320 3rd Qu.:103.9
## Max. :1.452 Max. :104.0
##
## room_type price minimum_nights number_of_reviews
## Entire home/apt:20042 Min. : 0.0 Min. : 1.00 Min. : 1.00
## Private room :17748 1st Qu.: 60.0 1st Qu.: 1.00 1st Qu.: 2.00
## Shared room : 2121 Median : 112.0 Median : 2.00 Median : 5.00
## Mean : 147.5 Mean : 12.77 Mean : 18.48
## 3rd Qu.: 187.0 3rd Qu.: 7.00 3rd Qu.: 19.00
## Max. :10000.0 Max. :700.00 Max. :323.00
##
## last_review reviews_per_month calculated_host_listings_count
## Min. :2013-10-21 Min. : 0.010 Min. : 1.00
## 1st Qu.:2018-11-01 1st Qu.: 0.170 1st Qu.: 2.00
## Median :2019-06-22 Median : 0.520 Median : 8.00
## Mean :2019-01-03 Mean : 1.018 Mean : 33.88
## 3rd Qu.:2019-08-06 3rd Qu.: 1.290 3rd Qu.: 27.00
## Max. :2019-08-27 Max. :13.000 Max. :274.00
##
## availability_365 geometry
## Min. : 0.0 MULTIPOLYGON :39911
## 1st Qu.: 54.0 epsg:NA : 0
## Median :241.0 +proj=tmer...: 0
## Mean :201.3
## 3rd Qu.:347.0
## Max. :365.0
##
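Note that the joined table has 39,911 rows although listings has only 5,149. The spatial data is at subzone level while the join key is the planning area, so each listing is repeated once for every subzone in its planning area. KALLANG, for example, appears 6,273 times, which is consistent with its 697 listings being repeated across 9 subzones. The sketch below reproduces the effect on toy data and shows one possible alternative, which is to summarise listings per planning area before joining. The tables, values and summary columns (n_listings, median_price) are hypothetical, and plain tibbles stand in for the sf object.

```r
library(dplyr)

# Toy stand-ins: two subzones share the planning area KALLANG
subzones <- tibble(
  SUBZONE_N  = c("BENDEMEER", "BOON KENG", "ALJUNIED"),
  PLN_AREA_N = c("KALLANG", "KALLANG", "GEYLANG")
)
toy_listings <- tibble(
  id = c("1", "2", "3"),
  neighbourhood = c("KALLANG", "KALLANG", "GEYLANG"),
  price = c(100, 150, 80)
)

# Direct join: each KALLANG listing is repeated once per KALLANG subzone
# (recent dplyr versions may warn about a many-to-many relationship here)
direct <- left_join(subzones, toy_listings,
                    by = c("PLN_AREA_N" = "neighbourhood"))
nrow(direct)  # 5 rows from 3 listings

# Alternative: summarise listings per planning area first, then join
per_area <- toy_listings %>%
  group_by(neighbourhood) %>%
  summarise(n_listings = n(), median_price = median(price), .groups = "drop")

aggregated <- left_join(subzones, per_area,
                        by = c("PLN_AREA_N" = "neighbourhood"))
nrow(aggregated)  # 3 rows, one per subzone
```

With the aggregated version, each polygon keeps exactly one row, so counts and averages computed from the joined table are not inflated by the number of subzones in a planning area.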
In conclusion, shared rooms are concentrated in the Central region, while private rooms are spread more widely across Singapore than Entire home/apt listings.
Looking in more detail, the Central region has the most listings in Singapore: 2,396 Entire home/apt, 1,490 Private room and 258 Shared room. The most popular listing, however, is a private room in the East region with a total of 276 reviews and 12.6 reviews per month. I believe this listing is near the airport, so it most likely caters to frequent overseas travellers or to visitors on a short stopover in Singapore.
Looking closer at the Central region, the most popular Entire home/apt listing is "Prime Central Apartment For Five", with 272 reviews and 6.95 reviews per month. The most popular Private room is "Quiet and spacious room near Woodleigh MRT(R2)", with 114 reviews and 7.81 reviews per month. Lastly, the most popular Shared room is "Single Capsule for 1 (Free Breakfast)", with 134 reviews and 3.35 reviews per month.
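Figures of this kind can be reproduced with a grouped count and a per-group maximum. The following is a minimal sketch on invented data, not the code behind the numbers above. It assumes dplyr 1.0.0 or later for slice_max(), and it is meant to be run on a table with one row per listing (such as listings) rather than on sg_listings, where each listing appears once per subzone.

```r
library(dplyr)

# Toy stand-in with one row per listing (values are invented for illustration)
toy <- tibble(
  name = c("A", "B", "C", "D", "E"),
  neighbourhood_group = c("CENTRAL REGION", "CENTRAL REGION", "EAST REGION",
                          "CENTRAL REGION", "EAST REGION"),
  room_type = c("Entire home/apt", "Private room", "Private room",
                "Entire home/apt", "Shared room"),
  number_of_reviews = c(40, 12, 90, 75, 5)
)

# Number of listings per region and room type
room_counts <- toy %>% count(neighbourhood_group, room_type)

# Most reviewed listing of each room type within the Central region
top_by_type <- toy %>%
  filter(neighbourhood_group == "CENTRAL REGION") %>%
  group_by(room_type) %>%
  slice_max(number_of_reviews, n = 1) %>%
  ungroup()
```

Here "popular" is taken to mean the highest number_of_reviews, which is an assumption; ranking by reviews_per_month instead would only require changing the column passed to slice_max().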