#DATA SETS
Data used for this multidimensional analysis:
#METHOD
Following our previous analysis of social interactions, we attempt to understand the practical applicability of the TC residential report data. Although the previous findings once again highlight a strong positive relation between the quality of public space and social interactions in a high-density residential neighborhood, this by itself has limited practical implications beyond giving community planners comparative information on the level of social interactions in various neighborhoods.
In the current analysis we show that the TC residential report data could also be used to predict neighborhood change, which may have more important practical implications for urban analytics. Since urban change involves a transformation of the quality of the living environment and the social composition of a neighborhood, we look at the indicators of the migration process and the consequent decline of the value of the housing stock, which can cause an influx of lower-income families. When this process starts, owners may begin to take less care of their public spaces or even abandon them, so the neighborhood becomes less attractive to higher-income families.
We expect a slower change in TC service request volumes to reflect the abandonment process in public space. Since we don’t have socio-economic composition data at the block level, we use housing price as an indicator of a block’s socio-economic conditions, with a higher price reflecting an influx of wealthier households. We also use transaction volumes as an indicator of the dynamics of a block’s social relationships, with a higher number of sales reflecting a greater change in social relationships. Housing prices are a lagging indicator of neighborhood economic strength, since recorded sales occur from two to more than six months after a contract is signed. Nevertheless, we use HDB housing price data (2011-2017), which record HDB housing transactions by a block’s unique postal code, street address and floor range. We also collect data on proximity to a range of location attributes from Google Maps to build a hedonic price model.
Then, we compute a Block’s Price Performance Score, which predicts changes over time relative to the town’s mean by comparing the prices estimated by the model with the real housing prices, indexed to Q1 2009 according to the HDB resale index.
For each transaction, we compute by how many per cent the price of the given unit is higher (lower) than its hedonic price, i.e. the price predicted by the hedonic model: \[ Y=100\cdot \frac{P_{\mathrm{actual}}-P_{\mathrm{hedonic}}}{P_{\mathrm{hedonic}}} \] We call this the excess price. Since our spatial unit of analysis is a block, we further compute the mean excess price over all transactions in each block \(z\) to get the Block’s Price Performance Score over three time frames \(T\): 2011-2017, 2011-2014, and 2015-2017. This lets us test whether a change in public space quality occurs in a shorter time frame and whether that change can be associated with a change in the Block’s Price Performance or in the social relationships dynamics: \[ Y_T(z)=\mathrm{mean}_{t\in T} Y_t(z), \] where \(Y_t(z)\) is the excess price of a transaction in block \(z\) at moment \(t\) and \(T\) is one of the three periods.
We further use the Block’s Price Performance Score for a correlation analysis with the various types of TC residential reports to find associations between the price and the perceived social and physical qualities of public space.
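The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the field names (`block`, `year`, `actual`, `hedonic`) and the three transactions are made up, and periods are matched on calendar year for simplicity.

```python
from collections import defaultdict
from statistics import mean


def excess_price(actual, hedonic):
    """Per cent by which the actual price exceeds (or falls short of) the hedonic price."""
    return 100.0 * (actual - hedonic) / hedonic


def block_scores(transactions, start_year, end_year):
    """Mean excess price per block over transactions with start_year <= year <= end_year."""
    by_block = defaultdict(list)
    for t in transactions:
        if start_year <= t["year"] <= end_year:
            by_block[t["block"]].append(excess_price(t["actual"], t["hedonic"]))
    return {block: mean(values) for block, values in by_block.items()}


# Made-up transactions, for illustration only.
transactions = [
    {"block": "A", "year": 2012, "actual": 330000.0, "hedonic": 300000.0},
    {"block": "A", "year": 2016, "actual": 285000.0, "hedonic": 300000.0},
    {"block": "B", "year": 2013, "actual": 400000.0, "hedonic": 400000.0},
]

# The three time frames used in the text.
for period in [(2011, 2017), (2011, 2014), (2015, 2017)]:
    print(period, block_scores(transactions, *period))
```

A block with no transactions in a period simply has no score for that period, which is one reason the shorter time frames may cover fewer blocks.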
Below is the overall market trend of HDB resale prices. Source:
Here are transactions:
## Variables in transactions data: year_qtr flat_type storey_range floor_area_sqm flat_model lease_sd resale_price yearmon address index indexed_price price_per_sqm flat_age
## Number of transactions: 17732
## Earliest transaction: Jan 2000
## Latest transaction: Sep 2020
Distribution of indexed price per square metre:
We will be interested in three periods:
pre-period: before we have the data on residential complaints
main period: when we have the data on residential complaints
post-period: after we have the data on residential complaints
The pre-period starts on
## [1] "Jan 2009"
This is because the newest block in Bukit Panjang was constructed in 2003. We wanted to make sure that even the newest block would be on the market.
The main period starts on
## [1] "Jan 2012"
This is simply the data that we have.
The post-period starts on
## [1] "Jan 2017"
and our latest transaction was made in
## [1] "Sep 2020"
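As a sketch of how each transaction could be assigned to one of the three periods, the snippet below uses the start dates printed above. The function name `period_of` is hypothetical, and returning `None` for transactions before January 2009 is an assumption about how earlier sales are handled.

```python
from datetime import date

# Period start dates as reported in the text.
PRE_START = date(2009, 1, 1)
MAIN_START = date(2012, 1, 1)
POST_START = date(2017, 1, 1)


def period_of(transaction_date):
    """Label a transaction as 'pre', 'main' or 'post'; None if it predates the pre-period."""
    if transaction_date < PRE_START:
        return None  # assumption: earlier transactions are left out of the analysis
    if transaction_date < MAIN_START:
        return "pre"
    if transaction_date < POST_START:
        return "main"
    return "post"


for d in [date(2005, 6, 1), date(2010, 3, 1), date(2015, 12, 1), date(2020, 9, 1)]:
    print(d.isoformat(), period_of(d))
```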
The general form of a hedonic regression is \[ Y=f(X), \] where \(Y\) is the unit price and \(X\) is the vector of all known characteristics of a flat. In our case, the entries of \(X\) are flat area, age at the moment of transaction, flat type dummies (4-room, 5-room, executive, premium, with 3-room as the baseline), level dummies (7-12, 13-18 and 19 or above, with 1-6 as the baseline), walking distance to a station, a dummy for whether the nearest station is MRT as opposed to LRT, direct distance to a park, and a top-school dummy.
The specific form of the function \(f(X)\) is not known, so we are going to try the following approaches:
Log-linear, as in Eddie D.W. Sue and Wei-Kang Wong. In a log-linear regression, adding 1 square metre increases the flat price by a certain percentage, adding one year to its age decreases the price by a certain percentage, and acquiring each of the properties specified by dummies increases the price by a certain percentage.
Linear. In a linear regression, adding 1 square metre increases the price by a certain amount (the price of one square metre), adding one year decreases the price by a certain amount (depreciation), and acquiring each of the properties specified by dummies increases the price by a certain amount.
Nonlinear regression that is a mix of linear and log-linear. In such a regression, adding 1 square metre increases the price by a certain amount (the price of one square metre), adding one year decreases the price by a certain amount (depreciation), and acquiring each of the properties specified by dummies increases the price by a certain percentage.
Note that nonlinear regression makes more sense but it is harder to implement. For instance, it’s not very clear how to calculate confidence intervals for the nonlinear regression.
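To make the difference between the three functional forms concrete, here is a small Python sketch. The multiplicative form chosen for the mixed model, \((a_0 + b_1\,\mathrm{Area} + b_2\,\mathrm{Age})\cdot e^{\sum_i c_i H_i}\), is only one possible reading of the description above, and all coefficient values in the usage lines are made up.

```python
import math


def _dummy_sum(c, dummies):
    """Sum of dummy coefficients for the dummies that are switched on."""
    return sum(ci * hi for ci, hi in zip(c, dummies))


def linear_price(a0, b_area, b_age, c, area, age, dummies):
    """Every characteristic adds a fixed amount to the price."""
    return a0 + b_area * area + b_age * age + _dummy_sum(c, dummies)


def log_linear_price(a0, b_area, b_age, c, area, age, dummies):
    """Every characteristic changes the price by a fixed percentage."""
    return math.exp(a0 + b_area * area + b_age * age + _dummy_sum(c, dummies))


def mixed_price(a0, b_area, b_age, c, area, age, dummies):
    """Area and age enter additively; dummies scale the price (assumed form)."""
    return (a0 + b_area * area + b_age * age) * math.exp(_dummy_sum(c, dummies))


# Made-up coefficients: one extra square metre, with the single dummy switched off.
print(linear_price(100000, 2000, -1000, [5000], 91, 10, [0])
      - linear_price(100000, 2000, -1000, [5000], 90, 10, [0]))      # fixed amount
print(log_linear_price(12.0, 0.005, -0.003, [0.03], 91, 10, [0])
      / log_linear_price(12.0, 0.005, -0.003, [0.03], 90, 10, [0]))  # fixed ratio
```

In the mixed form the dummy effect is a ratio, so switching a dummy on multiplies the price by \(e^{c_i}\) regardless of the flat's area or age.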
Below is the table containing the number of transactions by flat types and flat models.
##
##                      3 ROOM 4 ROOM 5 ROOM EXECUTIVE
##   Adjoined flat           0      0      1         0
##   Apartment               0      0      0       831
##   Improved                0      0   4014         0
##   Maisonette              0      0      0       651
##   Model A              2071   5672      0         0
##   Model A2                0    677      0         0
##   Premium Apartment       0   1121   1263       358
##   Simplified            228    845      0         0
The baseline is a 3-room non-premium flat. We will create dummies for 4-room, 5-room, executive, and premium flats. We are not given the flat number or even the floor; instead, we have the storey range:
##
## 01 TO 03 01 TO 05 04 TO 06 06 TO 10 07 TO 09 10 TO 12 11 TO 15 13 TO 15
##     3286       73     3948       63     3344     3172       50     1573
## 16 TO 18 16 TO 20 19 TO 21 21 TO 25 22 TO 24 25 TO 27 26 TO 30 28 TO 30
##      886       24      588       10      357      202        2      154
We will convert this information to a number by replacing each range with the average of its two end storeys. Finally, we attach
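The conversion of a storey range to a single number can be sketched as follows. The function name `storey_midpoint` is hypothetical; the input strings follow the `"01 TO 03"` pattern shown in the table above.

```python
def storey_midpoint(storey_range):
    """Turn a range such as '01 TO 03' into the average of its two end storeys."""
    low, high = storey_range.split(" TO ")
    return (int(low) + int(high)) / 2


for label in ["01 TO 03", "06 TO 10", "16 TO 18"]:
    print(label, "->", storey_midpoint(label))
```

Note that overlapping ranges from the two labelling schemes (for example `04 TO 06` and `01 TO 05`) map to different but comparable numeric values.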
According to Eddie D.W. Sue and Wei-Kang Wong, the functional form of the hedonic regression is \[ \ln Y = a_0+b_1\times \mathrm{Area}+b_2\times \mathrm{Age} + \sum_{i=1}^{k}c_i\times\mathrm{H}_i, \] where the sum is taken over all available hedonic variables. In other words, we get \[ Y = \tilde{a}_0\cdot e^{b_1\times \mathrm{Area}}\cdot e^{b_2\times \mathrm{Age}} \cdot \prod_{i=1}^{k}e^{c_i\times\mathrm{H}_i}, \] where \(\tilde{a}_0=e^{a_0}\). This means that adding one square metre adds a certain percentage to the flat price rather than a fixed sum. Every extra minute of walking from a station reduces the flat price by a certain percentage rather than by a fixed sum. Being within 1 km of a good school increases the price by a certain percentage rather than by a fixed sum, etc.
In the analysis below, we add groups of variables one by one and see how the model changes.
We will compare different models based on mean absolute error. The mean absolute error is \[ \frac{1}{N}\sum_{i=1}^{N}\left|Y^i-f(X^i)\right|, \] where \(N\) is the number of transactions, i.e., it shows by how much, on average, our model overpredicts or underpredicts the resale price.
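As a worked illustration of the percentage interpretation, a coefficient \(b\) in the log-linear form corresponds to a price change of \(100\cdot(e^{b}-1)\) per cent for a one-unit increase in the variable. The Python sketch below applies this to two coefficients from the full log-linear model reported later in this document; the helper name `percent_effect` is hypothetical.

```python
import math


def percent_effect(coefficient, change=1.0):
    """Percentage change in price implied by a log-linear coefficient."""
    return 100.0 * (math.exp(coefficient * change) - 1.0)


# Coefficients taken from the full log-linear model output in this document.
floor_area_coef = 4.963e-03   # floor_area_sqm
good_school_coef = 2.963e-02  # good_school_1km

print("one extra square metre: %.3f%%" % percent_effect(floor_area_coef))
print("good school within 1 km: %.3f%%" % percent_effect(good_school_coef))
```

For small coefficients the percentage is close to \(100\cdot b\), which is why the coefficients can be read almost directly as percentages.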
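A minimal Python sketch of the mean absolute error follows. The prices and predictions are made up. For a log-linear model, the sketch assumes the predicted logarithm is exponentiated back to a price before the error is computed, so that the errors of linear and log-linear models are in the same units.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / len(actual)


# Made-up prices and predictions, for illustration only.
actual = [300000.0, 420000.0, 510000.0]
linear_predictions = [310000.0, 400000.0, 500000.0]
log_predictions = [12.60, 12.95, 13.15]  # predictions of log(price)

print(mean_absolute_error(actual, linear_predictions))
# Assumption: back-transform log predictions to prices before comparing.
print(mean_absolute_error(actual, [math.exp(v) for v in log_predictions]))
```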
Here is how we add variables one by one:
## [[1]]
## [1] "flat_age" "floor_area_sqm"
##
## [[2]]
## [1] "flat_age" "floor_area_sqm" "floor"
## [4] "flat_type_4 ROOM" "flat_type_5 ROOM" "flat_type_EXECUTIVE"
## [7] "flat_premium"
##
## [[3]]
## [1] "flat_age" "floor_area_sqm" "floor"
## [4] "flat_type_4 ROOM" "flat_type_5 ROOM" "flat_type_EXECUTIVE"
## [7] "flat_premium" "max_floor_lvl" "flats_total"
##
## [[4]]
## [1] "flat_age" "floor_area_sqm" "floor"
## [4] "flat_type_4 ROOM" "flat_type_5 ROOM" "flat_type_EXECUTIVE"
## [7] "flat_premium" "max_floor_lvl" "flats_total"
## [10] "walk_to_station" "direct_to_park" "walk_to_market"
## [13] "to_BP_walk" "good_school_1km"
And here are linear models:
## raw price is predicted
##
##
## Call:
## lm(formula = indexed_price ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -121748 -30859 -4982 21932 239580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82145.36 3030.55 27.11 <2e-16 ***
## flat_age -2883.28 65.22 -44.21 <2e-16 ***
## floor_area_sqm 2734.58 24.41 112.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 43470 on 8763 degrees of freedom
## Multiple R-squared: 0.648, Adjusted R-squared: 0.6479
## F-statistic: 8065 on 2 and 8763 DF, p-value: < 2.2e-16
##
## Model error = 32961.53
## raw price is predicted
##
##
## Call:
## lm(formula = indexed_price ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -130520 -20744 -4431 14190 171155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 157634.17 5168.47 30.499 < 2e-16 ***
## flat_age -1759.08 69.39 -25.349 < 2e-16 ***
## floor_area_sqm 1196.35 71.26 16.788 < 2e-16 ***
## floor 3753.87 64.48 58.217 < 2e-16 ***
## `flat_type_4 ROOM` 11578.61 2402.81 4.819 1.47e-06 ***
## `flat_type_5 ROOM` 48033.37 3772.03 12.734 < 2e-16 ***
## flat_type_EXECUTIVE 107397.98 5265.52 20.396 < 2e-16 ***
## flat_premium -7776.96 1164.46 -6.679 2.56e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34820 on 8758 degrees of freedom
## Multiple R-squared: 0.7743, Adjusted R-squared: 0.7741
## F-statistic: 4293 on 7 and 8758 DF, p-value: < 2.2e-16
##
## Model error = 25201.98
## raw price is predicted
##
##
## Call:
## lm(formula = indexed_price ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108214 -18047 -1159 15196 176789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83331.42 4393.35 18.968 < 2e-16 ***
## flat_age 262.64 63.65 4.127 3.72e-05 ***
## floor_area_sqm 1085.71 57.75 18.800 < 2e-16 ***
## floor 2253.14 57.19 39.399 < 2e-16 ***
## `flat_type_4 ROOM` 2287.48 1948.99 1.174 0.241
## `flat_type_5 ROOM` 34616.38 3060.46 11.311 < 2e-16 ***
## flat_type_EXECUTIVE 98743.77 4272.72 23.110 < 2e-16 ***
## flat_premium -9943.41 957.41 -10.386 < 2e-16 ***
## max_floor_lvl 5569.86 91.64 60.777 < 2e-16 ***
## flats_total -204.19 11.10 -18.397 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28170 on 8756 degrees of freedom
## Multiple R-squared: 0.8523, Adjusted R-squared: 0.8522
## F-statistic: 5615 on 9 and 8756 DF, p-value: < 2.2e-16
##
## Model error = 21108.06
## raw price is predicted
##
##
## Call:
## lm(formula = indexed_price ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -127802 -14917 -969 13835 155926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 169246.677 4321.550 39.163 < 2e-16 ***
## flat_age -589.323 60.718 -9.706 < 2e-16 ***
## floor_area_sqm 1287.748 50.981 25.260 < 2e-16 ***
## floor 2226.180 49.496 44.977 < 2e-16 ***
## `flat_type_4 ROOM` 6475.845 1738.550 3.725 0.000197 ***
## `flat_type_5 ROOM` 36753.239 2710.541 13.559 < 2e-16 ***
## flat_type_EXECUTIVE 95458.639 3769.620 25.323 < 2e-16 ***
## flat_premium 8349.689 985.967 8.469 < 2e-16 ***
## max_floor_lvl 2773.580 105.191 26.367 < 2e-16 ***
## flats_total -171.593 10.470 -16.389 < 2e-16 ***
## walk_to_station 2160.940 171.968 12.566 < 2e-16 ***
## direct_to_park -1.811 2.363 -0.766 0.443514
## walk_to_market -1654.857 94.023 -17.601 < 2e-16 ***
## to_BP_walk -2615.607 71.364 -36.652 < 2e-16 ***
## good_school_1km 9453.352 829.022 11.403 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24350 on 8751 degrees of freedom
## Multiple R-squared: 0.8897, Adjusted R-squared: 0.8895
## F-statistic: 5042 on 14 and 8751 DF, p-value: < 2.2e-16
##
## Model error = 18222.45
And log-linear models:
## log(price) is predicted
##
##
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46144 -0.08501 -0.01053 0.06551 0.48499
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.193e+01 8.400e-03 1420.31 <2e-16 ***
## flat_age -1.018e-02 1.808e-04 -56.29 <2e-16 ***
## floor_area_sqm 8.534e-03 6.767e-05 126.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1205 on 8763 degrees of freedom
## Multiple R-squared: 0.7093, Adjusted R-squared: 0.7092
## F-statistic: 1.069e+04 on 2 and 8763 DF, p-value: < 2.2e-16
##
## Model error = 31016.24
## log(price) is predicted
##
##
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.43179 -0.06354 -0.01123 0.04936 0.40934
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.0709542 0.0146165 825.842 < 2e-16 ***
## flat_age -0.0064594 0.0001962 -32.915 < 2e-16 ***
## floor_area_sqm 0.0046527 0.0002015 23.087 < 2e-16 ***
## floor 0.0108051 0.0001824 59.253 < 2e-16 ***
## `flat_type_4 ROOM` 0.0545024 0.0067952 8.021 1.19e-15 ***
## `flat_type_5 ROOM` 0.1493680 0.0106674 14.002 < 2e-16 ***
## flat_type_EXECUTIVE 0.2730982 0.0148910 18.340 < 2e-16 ***
## flat_premium -0.0165201 0.0032931 -5.017 5.36e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09846 on 8758 degrees of freedom
## Multiple R-squared: 0.806, Adjusted R-squared: 0.8058
## F-statistic: 5197 on 7 and 8758 DF, p-value: < 2.2e-16
##
## Model error = 24880.18
## log(price) is predicted
##
##
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36860 -0.05410 -0.00124 0.04899 0.43151
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.186e+01 1.242e-02 955.118 < 2e-16 ***
## flat_age -7.418e-04 1.799e-04 -4.124 3.76e-05 ***
## floor_area_sqm 4.344e-03 1.632e-04 26.618 < 2e-16 ***
## floor 6.543e-03 1.616e-04 40.481 < 2e-16 ***
## `flat_type_4 ROOM` 2.825e-02 5.508e-03 5.129 2.97e-07 ***
## `flat_type_5 ROOM` 1.116e-01 8.650e-03 12.901 < 2e-16 ***
## flat_type_EXECUTIVE 2.491e-01 1.208e-02 20.629 < 2e-16 ***
## flat_premium -2.294e-02 2.706e-03 -8.476 < 2e-16 ***
## max_floor_lvl 1.570e-02 2.590e-04 60.621 < 2e-16 ***
## flats_total -5.602e-04 3.137e-05 -17.859 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07961 on 8756 degrees of freedom
## Multiple R-squared: 0.8732, Adjusted R-squared: 0.8731
## F-statistic: 6700 on 9 and 8756 DF, p-value: < 2.2e-16
##
## Model error = 20360.73
## log(price) is predicted
##
##
## Call:
## lm(formula = log(indexed_price) ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42694 -0.04460 -0.00126 0.04418 0.36957
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.210e+01 1.220e-02 991.854 <2e-16 ***
## flat_age -3.148e-03 1.714e-04 -18.370 <2e-16 ***
## floor_area_sqm 4.963e-03 1.439e-04 34.496 <2e-16 ***
## floor 6.456e-03 1.397e-04 46.221 <2e-16 ***
## `flat_type_4 ROOM` 4.099e-02 4.906e-03 8.354 <2e-16 ***
## `flat_type_5 ROOM` 1.179e-01 7.649e-03 15.419 <2e-16 ***
## flat_type_EXECUTIVE 2.390e-01 1.064e-02 22.462 <2e-16 ***
## flat_premium 2.957e-02 2.783e-03 10.627 <2e-16 ***
## max_floor_lvl 7.548e-03 2.969e-04 25.426 <2e-16 ***
## flats_total -4.578e-04 2.955e-05 -15.495 <2e-16 ***
## walk_to_station 5.283e-03 4.853e-04 10.885 <2e-16 ***
## direct_to_park 5.972e-06 6.668e-06 0.896 0.37
## walk_to_market -4.048e-03 2.653e-04 -15.256 <2e-16 ***
## to_BP_walk -7.536e-03 2.014e-04 -37.417 <2e-16 ***
## good_school_1km 2.963e-02 2.340e-03 12.664 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06872 on 8751 degrees of freedom
## Multiple R-squared: 0.9056, Adjusted R-squared: 0.9054
## F-statistic: 5993 on 14 and 8751 DF, p-value: < 2.2e-16
##
## Model error = 17557.86
The most accurate model is the log-linear model that uses all the available variables as predictors: its mean absolute error is 17557.86, compared with 18222.45 for the linear model with the same variables.
The price performance index shows how much the actual price differs from the hedonic price. It is defined as \[ \mbox{price\_performance}= 100\times \mathrm{Mean}\left( \frac{\mbox{actual price} - \mbox{predicted price}}{ \mbox{predicted price}} \right) \] where the mean is taken over all transactions in a given block during a given period.
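To illustrate how a fitted log-linear model turns into a price prediction, the Python sketch below uses the coefficients of the seven-variable log-linear model printed above. That model is chosen only because its coefficients are printed with enough digits. The flat being priced is hypothetical, and the result is an indexed price, since the models are fitted on `indexed_price`.

```python
import math

# Coefficients of the seven-variable log-linear model, copied from the output above.
COEFFICIENTS = {
    "(Intercept)": 12.0709542,
    "flat_age": -0.0064594,
    "floor_area_sqm": 0.0046527,
    "floor": 0.0108051,
    "flat_type_4 ROOM": 0.0545024,
    "flat_type_5 ROOM": 0.1493680,
    "flat_type_EXECUTIVE": 0.2730982,
    "flat_premium": -0.0165201,
}


def predict_log_price(flat):
    """Linear predictor: intercept plus coefficient times value for each variable given."""
    return COEFFICIENTS["(Intercept)"] + sum(
        COEFFICIENTS[name] * value for name, value in flat.items()
    )


def predict_price(flat):
    """Back-transform the predicted log price to a price."""
    return math.exp(predict_log_price(flat))


# Hypothetical flat: 4-room, non-premium, 95 square metres, 15 years old, 8th floor.
flat = {"flat_age": 15, "floor_area_sqm": 95, "floor": 8, "flat_type_4 ROOM": 1}
print(round(predict_price(flat)))
```

Variables omitted from the dictionary are treated as zero, which corresponds to the baseline category (a 3-room non-premium flat) for the dummies.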
The annual mobility index is the number of transactions per year per 100 units in a given block during a given period.
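The annual mobility index can be sketched in Python as below. The function and argument names are hypothetical, and the numbers in the usage line are made up.

```python
def annual_mobility_index(n_transactions, n_units, n_years):
    """Transactions per year per 100 units in a block over a period."""
    if n_units <= 0 or n_years <= 0:
        raise ValueError("n_units and n_years must be positive")
    return 100.0 * n_transactions / (n_units * n_years)


# Made-up example: 18 sales over 3 years in a block of 120 flats.
print(annual_mobility_index(18, 120, 3))
```

Normalising by both the number of units and the length of the period makes blocks of different sizes and periods of different lengths comparable.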
Here, we define treatment and control groups
## [1] 30
## [1] 25
## [1] 30
## [1] 29
## numeric(0)
## numeric(0)
## numeric(0)
## numeric(0)
Here we are relabelling our transaction data
QUESTIONS: How do urban amenities (MRT, NRP) interact with the physical and social qualities of a neighborhood (social issues, conservancy, defects) in a high-density residential context?
How does the quality of residential public space interact with housing value and mobility?
First, we find out how much residents value urban amenities, such as MRT and NRP. The MRT opening date, December 2015, is used as the exogenous shock in the difference-in-differences (DID) experiment.
We test for differences in housing prices between the blocks located within 12 min walking distance from the MRT (the “treatment” area) and the blocks located in the 3 areas outside it: the buffer (within 13-15 min from the MRT) and control 1 and 3 (within 6 min from markets in mature and new neighbourhoods).
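The basic difference-in-differences comparison can be sketched in Python as below. This is the textbook two-group, two-period estimator, not necessarily the specification used in this analysis, and the outcome values are made up. The outcome could be, for example, a price measure per transaction before and after the MRT opening.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    treatment_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treatment_change - control_change


# Made-up outcome values, for illustration only.
treat_pre = [100.0, 102.0]
treat_post = [110.0, 112.0]
control_pre = [95.0, 97.0]
control_post = [98.0, 100.0]

print(did_estimate(treat_pre, treat_post, control_pre, control_post))
```

The estimate attributes to the shock only the part of the change in the treatment area that exceeds the change in the control area, which relies on the two areas following parallel trends absent the shock.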
This is the actual price divided by the predicted price
Social issues