Name: Shan Huijie

1. Overview

The data visualization below aims at visualizing Singapore’s average resale flats prices by flat type, region in 2021.Map and heatmap will be used to find insights on the HDB Resale Flat Price dataset.

data set can be found from :
1. HDB Resale flat price data: https://data.gov.sg/dataset/resale-flat-prices
2. URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL)

1.1 Proposed Sketch of Design

Caption for the picture.

Caption for the picture.

1.2 Data and Design Challenges

  1. The first challenge faced is in the data preparation stage where we need to extract the exact year attribute from ‘month’ column. The value in ‘month’ column has a format of ‘YYYY-MM’.

  2. The second challenge faced is that the origin data set don’t provide summary data which is hard to compare the resale price among different regions or planning areas with the flat type.

1.3 How to Overcome Challenges

  1. We can use the substr() function to extract the first 4 digits which is the year from ‘YYYY-MM’ under the ‘month’ column. By using the mutate() function, we can create a new column named ‘year’.

  2. Aggregate and summarize the data to find the average resale price by planning area and flat type. Thus, we can compare the differences and find some insights.

2. Step-by-step Description

2.1 Install and load packages

packages = c('sf', 'tmap', 'tidyverse', 'treemap', 'plotly')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}
## Loading required package: sf
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
## Loading required package: tmap
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: treemap
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

2.2 import data set

Use the read_csv() function of readr package to import Singapore resale flat prices csv file into R.

raw_dataset <- read_csv("data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   month = col_character(),
##   town = col_character(),
##   flat_type = col_character(),
##   block = col_character(),
##   street_name = col_character(),
##   storey_range = col_character(),
##   floor_area_sqm = col_double(),
##   flat_model = col_character(),
##   lease_commence_date = col_double(),
##   remaining_lease = col_character(),
##   resale_price = col_double()
## )

Use the st_read() function of sf package to import MP14_PLNG_AREA_WEB_PL shapefile into R as a simple feature data frame called mpsz.

mpsz <- st_read(dsn = "data/geospatial",layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/MIYA/Desktop/IS 428/assignment/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
unique(mpsz$PLN_AREA_N)
##  [1] "MARINA SOUTH"            "OUTRAM"                 
##  [3] "SINGAPORE RIVER"         "BUKIT MERAH"            
##  [5] "QUEENSTOWN"              "MARINA EAST"            
##  [7] "RIVER VALLEY"            "WESTERN ISLANDS"        
##  [9] "SOUTHERN ISLANDS"        "DOWNTOWN CORE"          
## [11] "STRAITS VIEW"            "MARINE PARADE"          
## [13] "MUSEUM"                  "ORCHARD"                
## [15] "ROCHOR"                  "KALLANG"                
## [17] "TANGLIN"                 "NEWTON"                 
## [19] "CLEMENTI"                "TUAS"                   
## [21] "BEDOK"                   "PIONEER"                
## [23] "JURONG EAST"             "BUKIT TIMAH"            
## [25] "NOVENA"                  "GEYLANG"                
## [27] "BOON LAY"                "TOA PAYOH"              
## [29] "JURONG WEST"             "BUKIT BATOK"            
## [31] "SERANGOON"               "PAYA LEBAR"             
## [33] "BISHAN"                  "TAMPINES"               
## [35] "HOUGANG"                 "BUKIT PANJANG"          
## [37] "ANG MO KIO"              "CHOA CHU KANG"          
## [39] "PASIR RIS"               "CHANGI"                 
## [41] "SENGKANG"                "CHANGI BAY"             
## [43] "TENGAH"                  "SUNGEI KADUT"           
## [45] "PUNGGOL"                 "YISHUN"                 
## [47] "MANDAI"                  "SELETAR"                
## [49] "WOODLANDS"               "WESTERN WATER CATCHMENT"
## [51] "SEMBAWANG"               "SIMPANG"                
## [53] "LIM CHU KANG"            "NORTH-EASTERN ISLANDS"  
## [55] "CENTRAL WATER CATCHMENT"

2.3 Data Preparation for Visualizations

2.3.1 Manipulate the data set

In order to get the data set only for year 2021, we need to extract the year data from the ‘month’ column. So we use the mutate() function to create a new column named ‘year’ with the first 4 digit of the value in ‘month’ column.

raw_dataset <- raw_dataset %>% 
                mutate(year = substr(month, start = 1, stop = 4))
head(raw_dataset)
## # A tibble: 6 x 12
##   month town  flat_type block street_name storey_range floor_area_sqm flat_model
##   <chr> <chr> <chr>     <chr> <chr>       <chr>                 <dbl> <chr>     
## 1 2017… ANG … 2 ROOM    406   ANG MO KIO… 10 TO 12                 44 Improved  
## 2 2017… ANG … 3 ROOM    108   ANG MO KIO… 01 TO 03                 67 New Gener…
## 3 2017… ANG … 3 ROOM    602   ANG MO KIO… 01 TO 03                 67 New Gener…
## 4 2017… ANG … 3 ROOM    465   ANG MO KIO… 04 TO 06                 68 New Gener…
## 5 2017… ANG … 3 ROOM    601   ANG MO KIO… 01 TO 03                 67 New Gener…
## 6 2017… ANG … 3 ROOM    150   ANG MO KIO… 01 TO 03                 68 New Gener…
## # … with 4 more variables: lease_commence_date <dbl>, remaining_lease <chr>,
## #   resale_price <dbl>, year <chr>

After generating the ‘year’ attribute, we are able to filter the data set only for 2021.

data2021 <- raw_dataset %>% 
      filter(year == "2021") %>% 
      select(c("year", "town", "flat_type","resale_price","flat_model","storey_range")) 

head(data2021)
## # A tibble: 6 x 6
##   year  town       flat_type resale_price flat_model     storey_range
##   <chr> <chr>      <chr>            <dbl> <chr>          <chr>       
## 1 2021  ANG MO KIO 2 ROOM          225000 Improved       07 TO 09    
## 2 2021  ANG MO KIO 2 ROOM          211000 Improved       01 TO 03    
## 3 2021  ANG MO KIO 3 ROOM          275888 New Generation 04 TO 06    
## 4 2021  ANG MO KIO 3 ROOM          316800 New Generation 07 TO 09    
## 5 2021  ANG MO KIO 3 ROOM          305000 New Generation 10 TO 12    
## 6 2021  ANG MO KIO 3 ROOM          260000 New Generation 04 TO 06

[Visualizations for region]

Calculate the average price of each town by flat type. First, use the group_by() function to group the data by town. We don’t include the year because now the data set only contains 2021’s data. Then use the mean() function to calculate the average price.

avg_data <- data2021 %>%
                group_by(town,flat_type)  %>% 
                summarize("average_price" = mean(resale_price), "volume" = n())
## `summarise()` has grouped output by 'town'. You can override using the `.groups` argument.
avg_data
## # A tibble: 125 x 4
## # Groups:   town [26]
##    town       flat_type average_price volume
##    <chr>      <chr>             <dbl>  <int>
##  1 ANG MO KIO 2 ROOM          218000       2
##  2 ANG MO KIO 3 ROOM          312095.    144
##  3 ANG MO KIO 4 ROOM          468997.     71
##  4 ANG MO KIO 5 ROOM          687892.     39
##  5 ANG MO KIO EXECUTIVE       825000       5
##  6 BEDOK      2 ROOM          251667.      3
##  7 BEDOK      3 ROOM          318013.    130
##  8 BEDOK      4 ROOM          480810.    135
##  9 BEDOK      5 ROOM          602225.     88
## 10 BEDOK      EXECUTIVE       762419.     20
## # … with 115 more rows

Calculate the average price of each town regardless of flat type.

full_avg <- data2021 %>%
                group_by(town)  %>% 
                summarize("average_price" = mean(resale_price), "volume" = n())
full_avg
## # A tibble: 26 x 3
##    town          average_price volume
##  * <chr>                 <dbl>  <int>
##  1 ANG MO KIO          420035.    261
##  2 BEDOK               466091.    376
##  3 BISHAN              641006.    151
##  4 BUKIT BATOK         446781.    171
##  5 BUKIT MERAH         592222.    259
##  6 BUKIT PANJANG       471567.    284
##  7 BUKIT TIMAH         791167.     28
##  8 CENTRAL AREA        661550.     66
##  9 CHOA CHU KANG       449445.    324
## 10 CLEMENTI            528775.    172
## # … with 16 more rows

Calculate the average price based on the planning area and storey level range.

storey_avgprice <- data2021 %>%
                group_by(town,storey_range)  %>% 
                summarize("average_price" = mean(resale_price), "volume" = n())
## `summarise()` has grouped output by 'town'. You can override using the `.groups` argument.
storey_avgprice
## # A tibble: 244 x 4
## # Groups:   town [26]
##    town       storey_range average_price volume
##    <chr>      <chr>                <dbl>  <int>
##  1 ANG MO KIO 01 TO 03           368890.     52
##  2 ANG MO KIO 04 TO 06           388659.     69
##  3 ANG MO KIO 07 TO 09           377724.     64
##  4 ANG MO KIO 10 TO 12           383792      42
##  5 ANG MO KIO 13 TO 15           613606.     16
##  6 ANG MO KIO 16 TO 18           780286.      7
##  7 ANG MO KIO 19 TO 21           749650       2
##  8 ANG MO KIO 22 TO 24           741148       6
##  9 ANG MO KIO 25 TO 27           850000       1
## 10 ANG MO KIO 28 TO 30           630000       2
## # … with 234 more rows
2.3.2 Join the data set with the planning area data

Standardized with data with title case for better visualization.

avg_data$town <- str_to_title(avg_data$town)
full_avg$town <- str_to_title(full_avg$town)
mpsz$PLN_AREA_N <- str_to_title(mpsz$PLN_AREA_N)
mpsz$REGION_N <- str_to_title(mpsz$REGION_N)

Use left_join() from dplyr to join the geographical data and resale data using PLN_AREA_N and town as the common identifier.

avg_data_with_PA <- left_join(mpsz,avg_data,by = c("PLN_AREA_N" = "town"))

Prepare the data set for the interactive map regardless of flat type.

full_avg <- left_join(mpsz,full_avg,by = c("PLN_AREA_N" = "town"))

2.4 Visualization Stage

2.4.1 Visualization for Singapore HDB Resale Flat average prices by town and Region

[Create Static Map ]

Use the tmap package with plot mode to create the static visualization. The region will be filled in the map. Bubble’s size refers to the number of flats resaled in the town. Bubble’s color refers to the average prices range.

tmap_mode("plot")
## tmap mode set to plotting
tm_shape(avg_data_with_PA)+
  tm_fill("REGION_N",
          n=6,
          style="pretty",
          palette="Blues") +
  tm_facets(by="flat_type", 
            free.coords=TRUE) +
  tm_borders(alpha=0.5) +
  tm_bubbles (size = "volume",col="average_price") +
  tm_layout(main.title = "The Spatial Distribution of HDB Resale Flat Price in Singapore",
            main.title.position="center",
            main.title.size=1.2,
            legend.outside=TRUE,
            legend.position=c("right","bottom"),
            frame=FALSE)+
  tm_compass(type="4star",size=2) +
  tm_scale_bar(width=0.2)

[Create Interactive Map regardless of flat type ]

Use the tmap package with view mode to create the static visualization.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(full_avg)+
  tm_fill("REGION_N",
          n=6,
          style="pretty",
          palette="Blues") +
  tm_borders(alpha=0.5) +
  tm_bubbles (size = "volume",col="average_price") +
  tm_layout(main.title = "The Spatial Distribution of HDB Resale Flat Price in Singapore",
            main.title.position="center",
            main.title.size=1.2,
            legend.outside=TRUE,
            legend.position=c("right","bottom"),
            frame=FALSE)+
  tm_compass(type="4star",size=2) +
  tm_scale_bar(width=0.2)
## Compass not supported in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## Scale bar width set to 100 pixels
## Legend for symbol sizes not available in view mode.
2.4.2 Heatmap for the average resale price by planning areas and storey level range

Use ggplot to pot a heatmap.

heatmap <- ggplot(data = storey_avgprice, 
                  aes(x = town, 
                      y = storey_range, 
                      fill = average_price,
                      text = paste("PLanning Area: ", town,
                             "<br>Storey Range: ", storey_range,
                             "<br>Average Resale Price: S$", average_price)
                     )
                  ) +
            geom_tile() +
            theme(axis.text.x = element_text(angle = 90)) +
            labs(x = "Planning Area", 
                 y = "Storey") +
            scale_fill_gradient(name = "Average Resale Price (S$)",
                                low = "#B0E2FF",
                                high = "#003F87") 

heatmap_interactive <- ggplotly(heatmap, tooltip = "text")
heatmap_interactive

3. Insights from final visualization

Insight 1: Hot-selling of 4-room and 5-room flats

The spatial distribution shows that 4-room and 5-room not only have the highest volume of resales but also have the highest average resale price.

Insight 2: Higher Storey Level has Higher Average Resale Price

The heatmap shows that the average resale price increases along with the increase of the storey. The color becomes darker from bottom to top. Moreover, Bukit Timah area is the area with highest average resale price regardless with the storey level.