The data visualization below aims at visualizing Singapore’s average resale flats prices by flat type, region in 2021.Map and heatmap will be used to find insights on the HDB Resale Flat Price dataset.
data set can be found from :
1. HDB Resale flat price data: https://data.gov.sg/dataset/resale-flat-prices
2. URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL)
Caption for the picture.
Caption for the picture.
The first challenge faced is in the data preparation stage where we need to extract the exact year attribute from ‘month’ column. The value in ‘month’ column has a format of ‘YYYY-MM’.
The second challenge faced is that the origin data set don’t provide summary data which is hard to compare the resale price among different regions or planning areas with the flat type.
We can use the substr() function to extract the first 4 digits which is the year from ‘YYYY-MM’ under the ‘month’ column. By using the mutate() function, we can create a new column named ‘year’.
Aggregate and summarize the data to find the average resale price by planning area and flat type. Thus, we can compare the differences and find some insights.
packages = c('sf', 'tmap', 'tidyverse', 'treemap', 'plotly')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
## Loading required package: sf
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
## Loading required package: tmap
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: treemap
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Use the read_csv() function of readr package to import Singapore resale flat prices csv file into R.
raw_dataset <- read_csv("data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## month = col_character(),
## town = col_character(),
## flat_type = col_character(),
## block = col_character(),
## street_name = col_character(),
## storey_range = col_character(),
## floor_area_sqm = col_double(),
## flat_model = col_character(),
## lease_commence_date = col_double(),
## remaining_lease = col_character(),
## resale_price = col_double()
## )
Use the st_read() function of sf package to import MP14_PLNG_AREA_WEB_PL shapefile into R as a simple feature data frame called mpsz.
mpsz <- st_read(dsn = "data/geospatial",layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/MIYA/Desktop/IS 428/assignment/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
unique(mpsz$PLN_AREA_N)
## [1] "MARINA SOUTH" "OUTRAM"
## [3] "SINGAPORE RIVER" "BUKIT MERAH"
## [5] "QUEENSTOWN" "MARINA EAST"
## [7] "RIVER VALLEY" "WESTERN ISLANDS"
## [9] "SOUTHERN ISLANDS" "DOWNTOWN CORE"
## [11] "STRAITS VIEW" "MARINE PARADE"
## [13] "MUSEUM" "ORCHARD"
## [15] "ROCHOR" "KALLANG"
## [17] "TANGLIN" "NEWTON"
## [19] "CLEMENTI" "TUAS"
## [21] "BEDOK" "PIONEER"
## [23] "JURONG EAST" "BUKIT TIMAH"
## [25] "NOVENA" "GEYLANG"
## [27] "BOON LAY" "TOA PAYOH"
## [29] "JURONG WEST" "BUKIT BATOK"
## [31] "SERANGOON" "PAYA LEBAR"
## [33] "BISHAN" "TAMPINES"
## [35] "HOUGANG" "BUKIT PANJANG"
## [37] "ANG MO KIO" "CHOA CHU KANG"
## [39] "PASIR RIS" "CHANGI"
## [41] "SENGKANG" "CHANGI BAY"
## [43] "TENGAH" "SUNGEI KADUT"
## [45] "PUNGGOL" "YISHUN"
## [47] "MANDAI" "SELETAR"
## [49] "WOODLANDS" "WESTERN WATER CATCHMENT"
## [51] "SEMBAWANG" "SIMPANG"
## [53] "LIM CHU KANG" "NORTH-EASTERN ISLANDS"
## [55] "CENTRAL WATER CATCHMENT"
In order to get the data set only for year 2021, we need to extract the year data from the ‘month’ column. So we use the mutate() function to create a new column named ‘year’ with the first 4 digit of the value in ‘month’ column.
raw_dataset <- raw_dataset %>%
mutate(year = substr(month, start = 1, stop = 4))
head(raw_dataset)
## # A tibble: 6 x 12
## month town flat_type block street_name storey_range floor_area_sqm flat_model
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 2017… ANG … 2 ROOM 406 ANG MO KIO… 10 TO 12 44 Improved
## 2 2017… ANG … 3 ROOM 108 ANG MO KIO… 01 TO 03 67 New Gener…
## 3 2017… ANG … 3 ROOM 602 ANG MO KIO… 01 TO 03 67 New Gener…
## 4 2017… ANG … 3 ROOM 465 ANG MO KIO… 04 TO 06 68 New Gener…
## 5 2017… ANG … 3 ROOM 601 ANG MO KIO… 01 TO 03 67 New Gener…
## 6 2017… ANG … 3 ROOM 150 ANG MO KIO… 01 TO 03 68 New Gener…
## # … with 4 more variables: lease_commence_date <dbl>, remaining_lease <chr>,
## # resale_price <dbl>, year <chr>
After generating the ‘year’ attribute, we are able to filter the data set only for 2021.
data2021 <- raw_dataset %>%
filter(year == "2021") %>%
select(c("year", "town", "flat_type","resale_price","flat_model","storey_range"))
head(data2021)
## # A tibble: 6 x 6
## year town flat_type resale_price flat_model storey_range
## <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 2021 ANG MO KIO 2 ROOM 225000 Improved 07 TO 09
## 2 2021 ANG MO KIO 2 ROOM 211000 Improved 01 TO 03
## 3 2021 ANG MO KIO 3 ROOM 275888 New Generation 04 TO 06
## 4 2021 ANG MO KIO 3 ROOM 316800 New Generation 07 TO 09
## 5 2021 ANG MO KIO 3 ROOM 305000 New Generation 10 TO 12
## 6 2021 ANG MO KIO 3 ROOM 260000 New Generation 04 TO 06
[Visualizations for region]
Calculate the average price of each town by flat type. First, use the group_by() function to group the data by town. We don’t include the year because now the data set only contains 2021’s data. Then use the mean() function to calculate the average price.
avg_data <- data2021 %>%
group_by(town,flat_type) %>%
summarize("average_price" = mean(resale_price), "volume" = n())
## `summarise()` has grouped output by 'town'. You can override using the `.groups` argument.
avg_data
## # A tibble: 125 x 4
## # Groups: town [26]
## town flat_type average_price volume
## <chr> <chr> <dbl> <int>
## 1 ANG MO KIO 2 ROOM 218000 2
## 2 ANG MO KIO 3 ROOM 312095. 144
## 3 ANG MO KIO 4 ROOM 468997. 71
## 4 ANG MO KIO 5 ROOM 687892. 39
## 5 ANG MO KIO EXECUTIVE 825000 5
## 6 BEDOK 2 ROOM 251667. 3
## 7 BEDOK 3 ROOM 318013. 130
## 8 BEDOK 4 ROOM 480810. 135
## 9 BEDOK 5 ROOM 602225. 88
## 10 BEDOK EXECUTIVE 762419. 20
## # … with 115 more rows
Calculate the average price of each town regardless of flat type.
full_avg <- data2021 %>%
group_by(town) %>%
summarize("average_price" = mean(resale_price), "volume" = n())
full_avg
## # A tibble: 26 x 3
## town average_price volume
## * <chr> <dbl> <int>
## 1 ANG MO KIO 420035. 261
## 2 BEDOK 466091. 376
## 3 BISHAN 641006. 151
## 4 BUKIT BATOK 446781. 171
## 5 BUKIT MERAH 592222. 259
## 6 BUKIT PANJANG 471567. 284
## 7 BUKIT TIMAH 791167. 28
## 8 CENTRAL AREA 661550. 66
## 9 CHOA CHU KANG 449445. 324
## 10 CLEMENTI 528775. 172
## # … with 16 more rows
Calculate the average price based on the planning area and storey level range.
storey_avgprice <- data2021 %>%
group_by(town,storey_range) %>%
summarize("average_price" = mean(resale_price), "volume" = n())
## `summarise()` has grouped output by 'town'. You can override using the `.groups` argument.
storey_avgprice
## # A tibble: 244 x 4
## # Groups: town [26]
## town storey_range average_price volume
## <chr> <chr> <dbl> <int>
## 1 ANG MO KIO 01 TO 03 368890. 52
## 2 ANG MO KIO 04 TO 06 388659. 69
## 3 ANG MO KIO 07 TO 09 377724. 64
## 4 ANG MO KIO 10 TO 12 383792 42
## 5 ANG MO KIO 13 TO 15 613606. 16
## 6 ANG MO KIO 16 TO 18 780286. 7
## 7 ANG MO KIO 19 TO 21 749650 2
## 8 ANG MO KIO 22 TO 24 741148 6
## 9 ANG MO KIO 25 TO 27 850000 1
## 10 ANG MO KIO 28 TO 30 630000 2
## # … with 234 more rows
Standardized with data with title case for better visualization.
avg_data$town <- str_to_title(avg_data$town)
full_avg$town <- str_to_title(full_avg$town)
mpsz$PLN_AREA_N <- str_to_title(mpsz$PLN_AREA_N)
mpsz$REGION_N <- str_to_title(mpsz$REGION_N)
Use left_join() from dplyr to join the geographical data and resale data using PLN_AREA_N and town as the common identifier.
avg_data_with_PA <- left_join(mpsz,avg_data,by = c("PLN_AREA_N" = "town"))
Prepare the data set for the interactive map regardless of flat type.
full_avg <- left_join(mpsz,full_avg,by = c("PLN_AREA_N" = "town"))
[Create Static Map ]
Use the tmap package with plot mode to create the static visualization. The region will be filled in the map. Bubble’s size refers to the number of flats resaled in the town. Bubble’s color refers to the average prices range.
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(avg_data_with_PA)+
tm_fill("REGION_N",
n=6,
style="pretty",
palette="Blues") +
tm_facets(by="flat_type",
free.coords=TRUE) +
tm_borders(alpha=0.5) +
tm_bubbles (size = "volume",col="average_price") +
tm_layout(main.title = "The Spatial Distribution of HDB Resale Flat Price in Singapore",
main.title.position="center",
main.title.size=1.2,
legend.outside=TRUE,
legend.position=c("right","bottom"),
frame=FALSE)+
tm_compass(type="4star",size=2) +
tm_scale_bar(width=0.2)
[Create Interactive Map regardless of flat type ]
Use the tmap package with view mode to create the static visualization.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(full_avg)+
tm_fill("REGION_N",
n=6,
style="pretty",
palette="Blues") +
tm_borders(alpha=0.5) +
tm_bubbles (size = "volume",col="average_price") +
tm_layout(main.title = "The Spatial Distribution of HDB Resale Flat Price in Singapore",
main.title.position="center",
main.title.size=1.2,
legend.outside=TRUE,
legend.position=c("right","bottom"),
frame=FALSE)+
tm_compass(type="4star",size=2) +
tm_scale_bar(width=0.2)
## Compass not supported in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## Scale bar width set to 100 pixels
## Legend for symbol sizes not available in view mode.
Use ggplot to pot a heatmap.
heatmap <- ggplot(data = storey_avgprice,
aes(x = town,
y = storey_range,
fill = average_price,
text = paste("PLanning Area: ", town,
"<br>Storey Range: ", storey_range,
"<br>Average Resale Price: S$", average_price)
)
) +
geom_tile() +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Planning Area",
y = "Storey") +
scale_fill_gradient(name = "Average Resale Price (S$)",
low = "#B0E2FF",
high = "#003F87")
heatmap_interactive <- ggplotly(heatmap, tooltip = "text")
heatmap_interactive
Insight 1: Hot-selling of 4-room and 5-room flats
The spatial distribution shows that 4-room and 5-room not only have the highest volume of resales but also have the highest average resale price.
Insight 2: Higher Storey Level has Higher Average Resale Price
The heatmap shows that the average resale price increases along with the increase of the storey. The color becomes darker from bottom to top. Moreover, Bukit Timah area is the area with highest average resale price regardless with the storey level.