In this assignment, I am using the HDB dataset realis2019.csv, which contains the transation data fron Jan to Dec 2019, to reveal the housing transaction patterns. Possible findings will be the (1) peak season for transactions, (2) the average sale price for each planning arae and (3) a dumbbell plot showing the price ranges. Finding one will be presented by a bar chart, and finding two by a map.
Description: realis2019.csv doesn’t have subzone level features, so it cannot match with the subzone shapefile and create maps. Solution: Download the plannign area level shapefile from data.gov.sg
Description: To reflect the unit price range of each planning area, the current data we have is not suitable for creating the dumbbell chart. Solution: The data structure consists of three columns, the first column is the y axis labels, the second and third columns are the x values with one as starting point and the other the end point. To find the minimum ans maximum unit price of each planning area and create such a struture, a feasible solution is to use for loop to iterate through each planning area, and use min() and max() function to get the price, and finally put all data in a new data frame.
Description: To visualize the transaction pattern and average sale price in each planning area, additional calculation is needed. Solution: Use group_by() function for each planning area together with summarize() function to create the summed up number of transations and total price. Divide the total price by number of transaction to get the average price.
packages = c('sf', 'tmap', 'tidyverse', 'plotly', 'ggplot2')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
map_pa <- st_read(dsn = "./planning area boundary", layer = "MP14_PLNG_AREA_WEB_PL")
hdb <- read.csv("realis2019.csv", check.names=FALSE)
}
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `D:\xuexi\visualization\assignment - 5\planning area boundary' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `D:\xuexi\visualization\assignment - 5\planning area boundary' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `D:\xuexi\visualization\assignment - 5\planning area boundary' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `D:\xuexi\visualization\assignment - 5\planning area boundary' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `D:\xuexi\visualization\assignment - 5\planning area boundary' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
hdb <- hdb %>% select('Area (sqm)', 'Transacted Price ($)', 'Unit Price ($ psm)', 'Property Type', 'Sale Date', 'Planning Region', 'Planning Area') %>%
mutate_at(.vars = vars('Planning Region', 'Planning Area'), toupper) %>%
mutate('month' = str_sub(hdb$`Sale Date`, -6, -4)) %>%
mutate('number of transaction' = 1)# %>%
names(hdb)[1] <- "area"
names(hdb)[2] <- "total price"
names(hdb)[3] <- "unit price"
hdb <- subset(hdb, select = -c(5))
summary(hdb)
## area total price unit price Property Type
## Min. : 30 Min. : 330000 Min. : 1072 Length:19515
## 1st Qu.: 63 1st Qu.: 1000000 1st Qu.:12522 Class :character
## Median : 89 Median : 1319000 Median :15751 Mode :character
## Mean : 112 Mean : 1769165 Mean :16619
## 3rd Qu.: 121 3rd Qu.: 1819000 3rd Qu.:19041
## Max. :32931 Max. :163333158 Max. :55160
## Planning Region Planning Area month number of transaction
## Length:19515 Length:19515 Length:19515 Min. :1
## Class :character Class :character Class :character 1st Qu.:1
## Mode :character Mode :character Mode :character Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
hdb1 <- hdb %>% group_by(`Property Type`, `month`)
months = factor(hdb1$month, level = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))
plot_1 <- ggplot(data = hdb1, aes(x = months, y = `number of transaction`, fill = `Property Type`))+
geom_histogram(stat = "identity")
plot_1
#### 2.3 create the data frame for average price and plot in the interactive map
hdb2 <- subset(hdb, select = c(2, 6, 8)) %>%
group_by(`Planning Area`) %>%
summarise(`total price` = sum(`total price`), `number of transaction` = sum(`number of transaction`)) %>%
mutate(average=`total price`/`number of transaction`)
hdb_map <- left_join(map_pa, hdb2, by = c("PLN_AREA_N" = "Planning Area"))
tmap_mode("view")
tm_shape(hdb_map)+
tm_fill("average",
style = "quantile",
palette = "Blues") +
tm_borders(alpha = 0.5)
areas = unique(hdb$`Planning Area`)
PA = c()
min_unit = c()
max_unit = c()
for(area in areas){
hdb3 <- hdb[hdb$`Planning Area` == area,]
PA = c(PA, area)
min_unit = c(min_unit, min(hdb3$`unit price`))
max_unit = c(max_unit, max(hdb3$`unit price`))
}
hdb3 <- data.frame("Planning Area" = PA, "min unit price" = min_unit, "max unit price" = max_unit, check.names=FALSE)
fig <- plot_ly(hdb3) %>%
add_segments(x = ~`min unit price`, xend = ~`max unit price`, y = ~`Planning Area`, yend = ~`Planning Area`, showlegend = FALSE) %>%
add_markers(x = ~`min unit price`, y = ~`Planning Area`, name = "min", color = I("green")) %>%
add_markers(x = ~`max unit price`, y = ~`Planning Area`, name = "max", color = I("blue")) %>%
layout(
title = "Unit Price Difference Across Planning Areas",
xaxis = list(title = "Unit Prices"),
margin = list(l = 65)
)
fig
The histogram shows us the transaction distribution of several property types across time. This chart has displayed clear off-seasons of Dec, Jan and Feb for housing transactions, while the peak seasons are Jul, Aug and Sep. Among all property types, apartment and condo are the most popular, while others are not so commonly seen.
This map shows the average house transaction price of each planning area. It shows that houses sold along the coast line are in very different price ranges, which stays true with both northern and southern coast line. Therefore, for people who want to buy a house near the sea and with a limited budget, one may find a good price in a neighboring planning area if houses in the current list are too expensive.
This dumbbell chart is complementary to the map. For people who would like a house in the town center, information from the map may be misleading that every house is quite expensive. But the dumbbell chart shows that houses in planning areas such as Tanglin, River Valley and Newton can also be affordable.