#DATA SETS

Data used for this multidimensional analysis:

Categorised report volumes for each block (block level panel data 2012-2016 for 292 blocks, 164,682 reports total) with block address and complaint description,
HDB housing transaction prices (panel data 2011-2017, 4,600 transactions) with flat characteristics,
HDB resale price index 1Q 2009 collected from HDB website,
HDB block characteristics and location attributes collected from onemap.sg,
Coordinates of blocks and location attributes collected from Google Maps.

#METHOD

Following our previous analysis of social interactions, we attempt to understand the practical applicability of the TC residential report data. Although the previous findings once again highlight a strong positive relation between the quality of public space and social interactions in a high-density residential neighborhood, this by itself has limited practical implications beyond giving community planners comparative information on the level of social interactions in various neighborhoods.

In the current analysis we show that the TC residential report data could also be used to predict neighborhood change, which may have more important practical implications for urban analytics. Since urban change involves a transformation of the quality of the living environment and the social composition of a neighborhood, we look at the indicators of the migration process and the consequent decline in the value of the housing stock, which can cause an influx of lower-income families. When this process starts, owners may begin to take less care of their public spaces, or even abandon them, so the neighborhood becomes less attractive to higher-income families.

We expect a slower change in TC service request volumes to reflect the abandonment process in public space. Since we don’t have data on the socio-economic composition of blocks, we use housing price as an indicator of a block’s socio-economic conditions, with a higher price reflecting an influx of wealthier households. We also use transaction volumes as an indicator of a block’s social relationship dynamics, with a higher number of sales reflecting a greater change in social relationships. Housing prices are a lagging indicator of neighborhood economic strength, since recorded sales occur from two to more than six months after a contract is signed. Nevertheless, we use HDB housing price data (2011-2017), which record HDB housing transactions by a block’s unique postal code, street address and floor range. We also collect data on proximity to a range of location attributes from Google Maps to build a hedonic price model.

Then, we compute a Block’s Price Performance Score, which predicts changes over time relative to the town’s mean by comparing the prices estimated by the model with the real housing prices indexed to Q1 2009 according to the HDB resale index.
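The indexing step can be sketched as follows. This is a minimal illustration that assumes a price is deflated by the ratio of the resale price index in the quarter of sale to the index in the base quarter (Q1 2009). The function name and the index values are hypothetical and are not taken from the HDB series.

```python
def index_price(price, index_at_sale, index_at_base):
    """Express a resale price in base-quarter terms.

    Assumption: indexed price = price * (base index / index at sale).
    """
    if index_at_sale <= 0 or index_at_base <= 0:
        raise ValueError("index values must be positive")
    return price * index_at_base / index_at_sale


# Hypothetical index values: 100.0 in the base quarter, 125.0 at sale.
indexed = index_price(400_000, index_at_sale=125.0, index_at_base=100.0)
print(indexed)  # 320000.0
```

A sale made in the base quarter itself is left unchanged, since both index values coincide.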

For each transaction, we compute by how many per cent the actual price of the unit is above (below) the price predicted by the hedonic model: \[ Y=100\cdot \frac{P_{actual}-P_{hedonic}}{P_{hedonic}} \] This is called the excess price. Since our spatial unit of analysis is a block, we further compute the mean excess price over all transactions in each block \(z\) to get the Block’s Price Performance Score over three time frames \(T\): 2011-2017, 2011-2014, and 2015-2017. This lets us test whether a change in public space quality occurs in a shorter time frame and whether that change can be associated with a change in the Block’s Price Performance or with a change in the social relationship dynamics: \[ Y_T(z)=\mathrm{mean}_{t\in T} Y_t(z), \] where \(Y_t(z)\) is the excess price of a transaction in block \(z\) at a moment \(t\) and \(T\) is one of the three periods.
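The two formulas above can be sketched in a few lines. The tuple layout, the function names and the toy transactions are hypothetical, and a time frame is represented simply by its first and last year.

```python
from collections import defaultdict
from statistics import mean


def excess_price(actual, hedonic):
    """Per cent by which the actual price exceeds the hedonic price."""
    return 100.0 * (actual - hedonic) / hedonic


def block_scores(transactions, start_year, end_year):
    """Mean excess price per block over the years start_year..end_year.

    transactions: iterable of (block, year, actual_price, hedonic_price).
    """
    by_block = defaultdict(list)
    for block, year, actual, hedonic in transactions:
        if start_year <= year <= end_year:
            by_block[block].append(excess_price(actual, hedonic))
    return {block: mean(values) for block, values in by_block.items()}


# Toy transactions (hypothetical numbers).
toy = [
    ("A", 2012, 330_000, 300_000),  # 10 per cent above the hedonic price
    ("A", 2016, 285_000, 300_000),  # 5 per cent below
    ("B", 2013, 408_000, 400_000),  # 2 per cent above
]
scores_full = block_scores(toy, 2011, 2017)
scores_early = block_scores(toy, 2011, 2014)
print(scores_full, scores_early)
```

A block with no transactions in a time frame gets no score in this sketch, rather than a score of zero.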

We further use the Block’s Price Performance Score for a correlation analysis with the various types of TC residential reports to find associations between the price and the perceived social and physical qualities of public space.

Data

Below is the overall market trend of HDB resale prices. Source:

https://www.hdb.gov.sg/cs/infoweb/residential/buying-a-flat/resale/getting-started/resale-statistics

Here are transactions:

## Variables in transactions data:  year_qtr flat_type storey_range floor_area_sqm flat_model lease_sd resale_price yearmon address index indexed_price price_per_sqm flat_age

## Number of transactions: 17732

## Earliest transaction: Jan 2000

## Latest transaction: Sep 2020

Distribution of indexed price per square metre:

Periods

We will be interested in three periods:

pre-period: before we have the data on residential complaints
main period: when we have the data on residential complaints
post-period: after we have the data on residential complaints

The pre-period starts on

## [1] "Jan 2009"

This is because the newest block in Bukit Panjang was constructed in 2003. We wanted to make sure that even the newest block would be on the market.

The main period starts on

## [1] "Jan 2012"

This is simply when our data on residential complaints begin.

The post-period starts on

## [1] "Jan 2017"

and our latest transaction was made in

## [1] "Sep 2020"
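Labelling a transaction with its period can be sketched as below. The boundaries are the dates printed above. The assumptions are that each period ends where the next one starts, and that the latest transaction month (Sep 2020) is included up to its last day. The function name is hypothetical.

```python
from datetime import date

PRE_START = date(2009, 1, 1)    # pre-period starts Jan 2009
MAIN_START = date(2012, 1, 1)   # main period starts Jan 2012
POST_START = date(2017, 1, 1)   # post-period starts Jan 2017
LAST_SALE = date(2020, 9, 30)   # latest transaction month is Sep 2020


def period_of(sale_date):
    """Return 'pre', 'main' or 'post', or None outside the studied window."""
    if sale_date < PRE_START or sale_date > LAST_SALE:
        return None
    if sale_date < MAIN_START:
        return "pre"
    if sale_date < POST_START:
        return "main"
    return "post"


print(period_of(date(2015, 6, 1)))  # main
```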

Hedonic regression

The general form of a hedonic regression is \[ Y=f(X), \] where \(Y\) is the unit price and \(X\) is the vector of all known characteristics of a flat. In our case, the entries of \(X\) are flat area, age at the moment of transaction, flat type dummies (4-room, 5-room, executive, premium, with 3-room as the baseline), level dummies (7-12, 13-18 and 19 or above, with 1-6 as the baseline), walking distance to a station, a dummy for whether the nearest station is MRT as opposed to LRT, direct distance to a park, and a top-school dummy.

The specific form of the function \(f(X)\) is not known and we are going to try three approaches:

Log-linear, as in Eddie D.W. Sue and Wei-Kang Wong. In a log-linear regression, adding 1 square metre increases the flat price by a certain percentage, adding one year to its age decreases the price by a certain percentage, and acquiring each of the properties specified by dummies increases the price by a certain percentage.
Linear. In a linear regression, adding 1 square metre increases the price by a certain amount (the price of one square metre), adding one year decreases the price by a certain amount (depreciation), and acquiring each of the properties specified by dummies increases the price by a certain amount.
Nonlinear regression that is a mix of linear and log-linear. In such a regression, adding 1 square metre increases the price by a certain amount (the price of one square metre), adding one year decreases the price by a certain amount (depreciation), and acquiring each of the properties specified by dummies increases the price by a certain percentage.

Note that nonlinear regression makes more sense but it is harder to implement. For instance, it’s not very clear how to calculate confidence intervals for the nonlinear regression.
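For reference, the linear and mixed forms described above can be written out explicitly. This is a sketch of one reading of the verbal descriptions, not a specification taken from the analysis: the symbols \(a_0\), \(b_1\), \(b_2\), \(c_i\) and the dummies \(\mathrm{H}_i\) are assumed notation, and the mixed form in particular is an assumption about how the additive and percentage effects combine. The linear form would be \[ Y = a_0+b_1\times \mathrm{Area}+b_2\times \mathrm{Age} + \sum_{i=1}^{k}c_i\times\mathrm{H}_i, \] and one possible mixed form is \[ Y = \left(a_0+b_1\times \mathrm{Area}+b_2\times \mathrm{Age}\right)\cdot \prod_{i=1}^{k}e^{c_i\times\mathrm{H}_i}. \] Under this mixed form, for a fixed set of dummies an extra square metre adds the fixed amount \(b_1\cdot\prod_{i}e^{c_i\times\mathrm{H}_i}\), while switching on dummy \(i\) multiplies the price by \(e^{c_i}\).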

Variables

Flat types / models

Below is the table containing the number of transactions by flat types and flat models.

##                    
##                     3 ROOM 4 ROOM 5 ROOM EXECUTIVE
##   Adjoined flat          0      0      1         0
##   Apartment              0      0      0       831
##   Improved               0      0   4014         0
##   Maisonette             0      0      0       651
##   Model A             2071   5672      0         0
##   Model A2               0    677      0         0
##   Premium Apartment      0   1121   1263       358
##   Simplified           228    845      0         0

The baseline is a 3-room non-premium flat. We will create dummies for 4-room, 5-room, executive, and premium flats. We are not given a flat number or even the floor - instead, we have the storey range:

## 
## 01 TO 03 01 TO 05 04 TO 06 06 TO 10 07 TO 09 10 TO 12 11 TO 15 13 TO 15 
##     3286       73     3948       63     3344     3172       50     1573 
## 16 TO 18 16 TO 20 19 TO 21 21 TO 25 22 TO 24 25 TO 27 26 TO 30 28 TO 30 
##      886       24      588       10      357      202        2      154

We will convert this information to numeric by replacing each range with the average of its lowest and highest storeys. Finally, we attach
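The storey-range conversion and the dummy coding can be sketched as follows. The dictionary keys mirror the variable names that appear in the regression output, but the function names are hypothetical, and treating the "Premium Apartment" model as the premium flag is an assumption.

```python
def storey_midpoint(storey_range):
    """Convert a range such as '07 TO 09' to the average of its two ends."""
    low, high = (int(part) for part in storey_range.split(" TO "))
    return (low + high) / 2


def flat_dummies(flat_type, flat_model):
    """0/1 dummies relative to the baseline 3-room non-premium flat.

    Assumption: 'premium' means flat_model == 'Premium Apartment'.
    """
    return {
        "flat_type_4 ROOM": int(flat_type == "4 ROOM"),
        "flat_type_5 ROOM": int(flat_type == "5 ROOM"),
        "flat_type_EXECUTIVE": int(flat_type == "EXECUTIVE"),
        "flat_premium": int(flat_model == "Premium Apartment"),
    }


print(storey_midpoint("07 TO 09"))               # 8.0
print(flat_dummies("4 ROOM", "Premium Apartment"))
```

Both the three-storey ranges (such as 01 TO 03) and the five-storey ranges (such as 01 TO 05) in the table are handled the same way.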

Period 1

Log-linear regression

According to Eddie D.W. Sue and Wei-Kang Wong, the functional form of hedonic regression is \[ \ln Y = a_0+b_1\times \mathrm{Area}+b_2\times \mathrm{Age} + \sum_{i=1}^{k}c_i\times\mathrm{H}_i, \] where the sum is taken over all available hedonic variables. In other words, we get \[ Y = \tilde{a}_0\cdot e^{b_1\times \mathrm{Area}}\cdot e^{b_2\times \mathrm{Age}} \cdot \prod_{i=1}^{k}e^{c_i\times\mathrm{H}_i}, \] where \(\tilde{a}_0=e^{a_0}\). This means that adding one square metre adds a certain percentage to the flat price rather than a fixed sum. Every extra minute of walking from a station changes the flat price by a certain percentage rather than by a fixed sum. Being within 1 km of a good school increases the price by a certain percentage rather than by a fixed sum, and so on.
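To make the log-linear form concrete, here is a minimal sketch that fits it by ordinary least squares on toy data and converts a coefficient into a percentage effect. It uses plain normal equations in standard-library Python rather than the `lm` call shown in the output further down. The data, the function names and the generating coefficients are all hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_log_linear(rows, prices):
    """OLS of ln(price) on an intercept plus the columns of rows."""
    x = [[1.0] + list(row) for row in rows]
    y = [math.log(price) for price in prices]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def percent_effect(coef):
    """Percentage price change implied by a one-unit increase."""
    return 100.0 * (math.exp(coef) - 1.0)


# Toy data generated exactly from ln Y = 12 + 0.005*Area - 0.003*Age.
rows = [(70, 5), (90, 10), (110, 8), (130, 15), (100, 20)]
prices = [math.exp(12 + 0.005 * area - 0.003 * age) for area, age in rows]
a0, b_area, b_age = fit_log_linear(rows, prices)
print(a0, b_area, b_age, percent_effect(b_area))
```

For small coefficients the percentage effect is close to 100 times the coefficient. For example, the floor_area_sqm estimate of 4.963e-03 in the full log-linear model reported below corresponds to roughly 0.5 per cent per square metre.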

For the analysis that follows, we will add groups of variables one by one and see how the model changes.

We will compare different models based on mean absolute error. The mean absolute error is \[ \frac{1}{N}\sum_{i=1}^{N}\left|Y^i-f(X^i)\right|, \] where \(N\) is the number of transactions, i.e., it shows by how much, on average, our model overpredicts or underpredicts the resale price.
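A minimal sketch of this error measure is given below. The second function assumes that, for a model fitted on log(price), predictions are transformed back to the price scale before the error is taken. That assumption is consistent with the log-linear "Model error" values being reported on the same scale as the linear ones, but it is not confirmed by the source. Function names and numbers are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_from_log_predictions(actual, predicted_log):
    """MAE on the price scale for a model that predicts ln(price)."""
    return mean_absolute_error(actual, [math.exp(v) for v in predicted_log])


# Toy prices (hypothetical): errors of 10,000 and 20,000.
print(mean_absolute_error([300_000, 400_000], [310_000, 380_000]))  # 15000.0
```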

Here is how we add variables one by one:

## [[1]]
## [1] "flat_age"       "floor_area_sqm"
## 
## [[2]]
## [1] "flat_age"            "floor_area_sqm"      "floor"              
## [4] "flat_type_4 ROOM"    "flat_type_5 ROOM"    "flat_type_EXECUTIVE"
## [7] "flat_premium"       
## 
## [[3]]
## [1] "flat_age"            "floor_area_sqm"      "floor"              
## [4] "flat_type_4 ROOM"    "flat_type_5 ROOM"    "flat_type_EXECUTIVE"
## [7] "flat_premium"        "max_floor_lvl"       "flats_total"        
## 
## [[4]]
##  [1] "flat_age"            "floor_area_sqm"      "floor"              
##  [4] "flat_type_4 ROOM"    "flat_type_5 ROOM"    "flat_type_EXECUTIVE"
##  [7] "flat_premium"        "max_floor_lvl"       "flats_total"        
## [10] "walk_to_station"     "direct_to_park"      "walk_to_market"     
## [13] "to_BP_walk"          "good_school_1km"
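The four nested variable sets listed above can be rebuilt by accumulating groups of predictors. The grouping into "flat", "flat type and floor", "block" and "location" is our reading of the printed lists, and the comments naming the groups are hypothetical labels.

```python
from itertools import accumulate

GROUPS = [
    # basic flat characteristics
    ["flat_age", "floor_area_sqm"],
    # floor and flat type
    ["floor", "flat_type_4 ROOM", "flat_type_5 ROOM",
     "flat_type_EXECUTIVE", "flat_premium"],
    # block characteristics
    ["max_floor_lvl", "flats_total"],
    # location attributes
    ["walk_to_station", "direct_to_park", "walk_to_market",
     "to_BP_walk", "good_school_1km"],
]

# accumulate() concatenates the lists step by step, so each set
# contains every variable of the previous set plus one new group.
nested_sets = list(accumulate(GROUPS))
for variables in nested_sets:
    print(len(variables), variables)
```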

And here are linear models:

## raw price is predicted
## 
## 
## Call:
## lm(formula = indexed_price ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -121748  -30859   -4982   21932  239580 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    82145.36    3030.55   27.11   <2e-16 ***
## flat_age       -2883.28      65.22  -44.21   <2e-16 ***
## floor_area_sqm  2734.58      24.41  112.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 43470 on 8763 degrees of freedom
## Multiple R-squared:  0.648,  Adjusted R-squared:  0.6479 
## F-statistic:  8065 on 2 and 8763 DF,  p-value: < 2.2e-16
## 
## Model error = 32961.53 
## raw price is predicted
## 
## 
## Call:
## lm(formula = indexed_price ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -130520  -20744   -4431   14190  171155 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         157634.17    5168.47  30.499  < 2e-16 ***
## flat_age             -1759.08      69.39 -25.349  < 2e-16 ***
## floor_area_sqm        1196.35      71.26  16.788  < 2e-16 ***
## floor                 3753.87      64.48  58.217  < 2e-16 ***
## `flat_type_4 ROOM`   11578.61    2402.81   4.819 1.47e-06 ***
## `flat_type_5 ROOM`   48033.37    3772.03  12.734  < 2e-16 ***
## flat_type_EXECUTIVE 107397.98    5265.52  20.396  < 2e-16 ***
## flat_premium         -7776.96    1164.46  -6.679 2.56e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34820 on 8758 degrees of freedom
## Multiple R-squared:  0.7743, Adjusted R-squared:  0.7741 
## F-statistic:  4293 on 7 and 8758 DF,  p-value: < 2.2e-16
## 
## Model error = 25201.98 
## raw price is predicted
## 
## 
## Call:
## lm(formula = indexed_price ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -108214  -18047   -1159   15196  176789 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         83331.42    4393.35  18.968  < 2e-16 ***
## flat_age              262.64      63.65   4.127 3.72e-05 ***
## floor_area_sqm       1085.71      57.75  18.800  < 2e-16 ***
## floor                2253.14      57.19  39.399  < 2e-16 ***
## `flat_type_4 ROOM`   2287.48    1948.99   1.174    0.241    
## `flat_type_5 ROOM`  34616.38    3060.46  11.311  < 2e-16 ***
## flat_type_EXECUTIVE 98743.77    4272.72  23.110  < 2e-16 ***
## flat_premium        -9943.41     957.41 -10.386  < 2e-16 ***
## max_floor_lvl        5569.86      91.64  60.777  < 2e-16 ***
## flats_total          -204.19      11.10 -18.397  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28170 on 8756 degrees of freedom
## Multiple R-squared:  0.8523, Adjusted R-squared:  0.8522 
## F-statistic:  5615 on 9 and 8756 DF,  p-value: < 2.2e-16
## 
## Model error = 21108.06 
## raw price is predicted
## 
## 
## Call:
## lm(formula = indexed_price ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -127802  -14917    -969   13835  155926 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         169246.677   4321.550  39.163  < 2e-16 ***
## flat_age              -589.323     60.718  -9.706  < 2e-16 ***
## floor_area_sqm        1287.748     50.981  25.260  < 2e-16 ***
## floor                 2226.180     49.496  44.977  < 2e-16 ***
## `flat_type_4 ROOM`    6475.845   1738.550   3.725 0.000197 ***
## `flat_type_5 ROOM`   36753.239   2710.541  13.559  < 2e-16 ***
## flat_type_EXECUTIVE  95458.639   3769.620  25.323  < 2e-16 ***
## flat_premium          8349.689    985.967   8.469  < 2e-16 ***
## max_floor_lvl         2773.580    105.191  26.367  < 2e-16 ***
## flats_total           -171.593     10.470 -16.389  < 2e-16 ***
## walk_to_station       2160.940    171.968  12.566  < 2e-16 ***
## direct_to_park          -1.811      2.363  -0.766 0.443514    
## walk_to_market       -1654.857     94.023 -17.601  < 2e-16 ***
## to_BP_walk           -2615.607     71.364 -36.652  < 2e-16 ***
## good_school_1km       9453.352    829.022  11.403  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24350 on 8751 degrees of freedom
## Multiple R-squared:  0.8897, Adjusted R-squared:  0.8895 
## F-statistic:  5042 on 14 and 8751 DF,  p-value: < 2.2e-16
## 
## Model error = 18222.45

And log-linear models:

## log(price) is predicted
## 
## 
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46144 -0.08501 -0.01053  0.06551  0.48499 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.193e+01  8.400e-03 1420.31   <2e-16 ***
## flat_age       -1.018e-02  1.808e-04  -56.29   <2e-16 ***
## floor_area_sqm  8.534e-03  6.767e-05  126.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1205 on 8763 degrees of freedom
## Multiple R-squared:  0.7093, Adjusted R-squared:  0.7092 
## F-statistic: 1.069e+04 on 2 and 8763 DF,  p-value: < 2.2e-16
## 
## Model error = 31016.24 
## log(price) is predicted
## 
## 
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43179 -0.06354 -0.01123  0.04936  0.40934 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         12.0709542  0.0146165 825.842  < 2e-16 ***
## flat_age            -0.0064594  0.0001962 -32.915  < 2e-16 ***
## floor_area_sqm       0.0046527  0.0002015  23.087  < 2e-16 ***
## floor                0.0108051  0.0001824  59.253  < 2e-16 ***
## `flat_type_4 ROOM`   0.0545024  0.0067952   8.021 1.19e-15 ***
## `flat_type_5 ROOM`   0.1493680  0.0106674  14.002  < 2e-16 ***
## flat_type_EXECUTIVE  0.2730982  0.0148910  18.340  < 2e-16 ***
## flat_premium        -0.0165201  0.0032931  -5.017 5.36e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09846 on 8758 degrees of freedom
## Multiple R-squared:  0.806,  Adjusted R-squared:  0.8058 
## F-statistic:  5197 on 7 and 8758 DF,  p-value: < 2.2e-16
## 
## Model error = 24880.18 
## log(price) is predicted
## 
## 
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.36860 -0.05410 -0.00124  0.04899  0.43151 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.186e+01  1.242e-02 955.118  < 2e-16 ***
## flat_age            -7.418e-04  1.799e-04  -4.124 3.76e-05 ***
## floor_area_sqm       4.344e-03  1.632e-04  26.618  < 2e-16 ***
## floor                6.543e-03  1.616e-04  40.481  < 2e-16 ***
## `flat_type_4 ROOM`   2.825e-02  5.508e-03   5.129 2.97e-07 ***
## `flat_type_5 ROOM`   1.116e-01  8.650e-03  12.901  < 2e-16 ***
## flat_type_EXECUTIVE  2.491e-01  1.208e-02  20.629  < 2e-16 ***
## flat_premium        -2.294e-02  2.706e-03  -8.476  < 2e-16 ***
## max_floor_lvl        1.570e-02  2.590e-04  60.621  < 2e-16 ***
## flats_total         -5.602e-04  3.137e-05 -17.859  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07961 on 8756 degrees of freedom
## Multiple R-squared:  0.8732, Adjusted R-squared:  0.8731 
## F-statistic:  6700 on 9 and 8756 DF,  p-value: < 2.2e-16
## 
## Model error = 20360.73 
## log(price) is predicted
## 
## 
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42694 -0.04460 -0.00126  0.04418  0.36957 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.210e+01  1.220e-02 991.854   <2e-16 ***
## flat_age            -3.148e-03  1.714e-04 -18.370   <2e-16 ***
## floor_area_sqm       4.963e-03  1.439e-04  34.496   <2e-16 ***
## floor                6.456e-03  1.397e-04  46.221   <2e-16 ***
## `flat_type_4 ROOM`   4.099e-02  4.906e-03   8.354   <2e-16 ***
## `flat_type_5 ROOM`   1.179e-01  7.649e-03  15.419   <2e-16 ***
## flat_type_EXECUTIVE  2.390e-01  1.064e-02  22.462   <2e-16 ***
## flat_premium         2.957e-02  2.783e-03  10.627   <2e-16 ***
## max_floor_lvl        7.548e-03  2.969e-04  25.426   <2e-16 ***
## flats_total         -4.578e-04  2.955e-05 -15.495   <2e-16 ***
## walk_to_station      5.283e-03  4.853e-04  10.885   <2e-16 ***
## direct_to_park       5.972e-06  6.668e-06   0.896     0.37    
## walk_to_market      -4.048e-03  2.653e-04 -15.256   <2e-16 ***
## to_BP_walk          -7.536e-03  2.014e-04 -37.417   <2e-16 ***
## good_school_1km      2.963e-02  2.340e-03  12.664   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06872 on 8751 degrees of freedom
## Multiple R-squared:  0.9056, Adjusted R-squared:  0.9054 
## F-statistic:  5993 on 14 and 8751 DF,  p-value: < 2.2e-16
## 
## Model error = 17557.86

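The most accurate model is the log-linear model that uses all the available variables as predictors: its mean absolute error is 17,557.86, against 18,222.45 for the linear model with the same variables.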
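The comparison can be checked directly from the "Model error" lines printed above. The sketch below only re-reads those eight reported values; the dictionary layout and the variable names are hypothetical.

```python
# (functional form, variable set) -> reported "Model error"
MODEL_ERRORS = {
    ("linear", 1): 32961.53,
    ("linear", 2): 25201.98,
    ("linear", 3): 21108.06,
    ("linear", 4): 18222.45,
    ("log-linear", 1): 31016.24,
    ("log-linear", 2): 24880.18,
    ("log-linear", 3): 20360.73,
    ("log-linear", 4): 17557.86,
}

# Model with the smallest reported error.
best = min(MODEL_ERRORS, key=MODEL_ERRORS.get)

# Does the log-linear form beat the linear one on every variable set?
log_wins_everywhere = all(
    MODEL_ERRORS[("log-linear", s)] < MODEL_ERRORS[("linear", s)]
    for s in range(1, 5)
)
print(best, log_wins_everywhere)
```

On these reported values the log-linear form has the lower error for each of the four variable sets, not only for the full one.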

Indices

The price performance index shows how far the actual price is from the hedonic price. Its definition is \[ \mbox{price_performance}= 100\times \mathrm{Mean}\left( \frac{\mbox{actual price} - \mbox{predicted price}}{ \mbox{predicted price}} \right) \] where the mean is taken over all transactions in a given block during a given period.

The annual mobility index is the number of transactions per year per 100 units in a given block during a given period.
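A minimal sketch of the annual mobility index follows. The function name, the argument names and the toy numbers are hypothetical. The period length is passed in years, so a period of other than whole years can be expressed as a fraction.

```python
def annual_mobility_index(n_transactions, n_units, years):
    """Transactions per year per 100 units in a block over a period."""
    if n_units <= 0 or years <= 0:
        raise ValueError("n_units and years must be positive")
    return 100.0 * n_transactions / (n_units * years)


# Toy block (hypothetical): 12 sales in a 120-unit block over 5 years.
print(annual_mobility_index(12, n_units=120, years=5))  # 2.0
```

Dividing by the number of units keeps large and small blocks comparable, and dividing by the period length keeps periods of different duration comparable.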

MRT experiment

Here, we define treatment and control groups

## [1] 30

## [1] 25

## [1] 30

## [1] 29

Here we are relabelling our transaction data

Transaction price per square metre indexed according to market trend

Difference-in-difference MRT (Dec 2015)

QUESTIONS: How do urban amenities (MRT, NRP) interact with physical and social qualities of a neighborhood (social issues, conservancy, defects) in high-density residential context?

How does the quality of residential public space interact with housing value and mobility?

First, we find out how much residents value urban amenities, such as MRT and NRP. The MRT opening date December 2015 is used as the exogenous shock in the DID experiment.

We test for differences in housing prices between the blocks located within a 12 min walking distance of the MRT (“treatment” area) and the blocks located in the 3 areas outside it: a buffer (within 13-15 min of the MRT) and controls 1 and 3 (within 6 min of markets in mature and new neighbourhoods).
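The grouping by walking time and the basic difference-in-differences contrast can be sketched as follows. This is the textbook two-group, two-period estimator on group means, not necessarily the regression specification used in the analysis. The function names, the "outside" label and the toy prices are hypothetical, and the control areas, which are defined by distance to markets, are not modelled here.

```python
from statistics import mean


def assign_group(walk_minutes_to_mrt):
    """Treatment within 12 min of the MRT, buffer at 13-15 min."""
    if walk_minutes_to_mrt <= 12:
        return "treatment"
    if walk_minutes_to_mrt <= 15:
        return "buffer"
    return "outside"  # hypothetical label; controls are defined separately


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """(change in treatment mean) minus (change in control mean)."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Toy indexed prices per square metre before and after Dec 2015.
effect = did_estimate(
    treat_pre=[3000.0, 3100.0],
    treat_post=[3300.0, 3400.0],
    control_pre=[2900.0, 3000.0],
    control_post=[3000.0, 3100.0],
)
print(effect)  # 200.0
```

In this toy case the treatment mean rises by 300 and the control mean by 100, so the contrast attributes 200 to the opening.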

Hedonic model of HDB transactions

Data

Periods

Hedonic regression

Variables

Flat types / models

Period 1

Log-linear regression

Indices

MRT experiment

Transaction price per square metre indexed according to market trend

Difference-in-difference MRT (Dec 2015)

DID effect on price

Mobility

Social issues

Conservancy