House Prices and Access to Parks in London

Spatial Data Analysis on Green Spaces and House Prices in London Boroughs

Author
Affiliations

John Karuitha

Karatina University, School of Business

University of the Witwatersrand, School of Construction Economics & Management

Published

February 11, 2024

Modified

February 11, 2024

Abstract
In this analysis, we examine the connection between access to parks and house prices in London wards. The correlation and ANOVA results reveal a significant association between house prices and access to local parks and metropolitan parks. District parks and open spaces are negatively correlated with house prices, although the correlation is not statistically significant. Notably, Westminster consistently stands out as the priciest area. The regression models show that access to parks plays a limited role in house prices; the borough is of primary importance in pricing. Once borough is controlled for, access to open spaces, local parks, and district parks (marginally) retains a significant relationship with house prices across all of London, whereas metropolitan parks and regional parks have no significant relationship with prices.
Keywords

R, Quarto, Spatial Data, sf, Housing, London, Parks

1 BACKGROUND

In the urban context of London, the relationship between access to green spaces and housing prices has become a significant focal point for investigation. As urbanization intensifies, the availability of green spaces in a city has profound implications for the quality of life and the economic landscape. Access to parks, gardens, and recreational areas not only contributes to the overall well-being of residents but also serves as a potential factor influencing housing market dynamics. Understanding this relationship is crucial for urban planners, policymakers, and residents alike, as it can inform strategic decisions regarding green space preservation, urban development, and housing affordability in a metropolis as dynamic and diverse as London.

Against this backdrop, the main objective of the forthcoming analysis is to systematically explore and establish the intricate relationship between access to green spaces and the pricing of residential properties in London. This investigation aims to employ quantitative methods to discern patterns, correlations, and potential causation between the proximity and quality of green spaces and the fluctuating prices of houses across different neighborhoods in the city. By unraveling these connections, the analysis seeks to contribute valuable insights that can guide urban planning initiatives, inform housing policies, and enhance our understanding of the complex interplay between urban ecology and real estate dynamics in the context of London.

Code
## Load the package manager ----
if(!require(pacman)){
  install.packages('pacman')
  library(pacman)
}

## Load required packages ----
p_load(tidyverse, janitor, 
       skimr, mice, ggthemes, 
       rmarkdown, readxl, 
       conflicted, naniar, GGally,
       modelsummary, sf, tmap, gt,
       patchwork, spdep, kableExtra,
       sp)
## Load from github ----
p_load_gh("datarootsio/artyfarty")

## Set the options ----
options(digits = 3)
options(scipen = 999)

## Set a nice theme for plots ----
theme_set(theme_bain())

2 OBJECTIVE

The main objective of this analysis is to establish the relationship between access to green spaces and the price of houses in London.

3 DATA

The data set encompasses house prices across London wards. To facilitate mapping, I enhance the data by joining it to a shapefile of London. Additionally, I categorize each borough as either Inner London or Outer London. The analysis first covers the whole of London and then examines Inner London specifically.

Code
inner_london <- c('Camden', 'Greenwich',
'Hackney', 'Hammersmith and Fulham', 'Islington',
'Royal Borough of Kensington and Chelsea', 'Lambeth', 'Lewisham', 'Southwark',
'Tower Hamlets', 'Wandsworth', 'Westminster', 'City of London')

outer_london <- c('Barking and Dagenham', 'Barnet', 'Bexley', 'Brent', 'Bromley', 
'Croydon', 'Ealing', 'Enfield', 'Haringey',
'Harrow', 'Havering', 'Hillingdon', 
'Hounslow', 'Kingston upon Thames', 'Merton',
'Newham', 'Redbridge', 'Richmond upon Thames', 'Sutton', 'Waltham Forest')
Code
## Maps ----
parks <- read_sf("data/London-wards-2018/London-wards-2018_ESRI/London_Ward_CityMerged.shp") %>% 
  ## Join access-to-parks data by ward code ----
  left_join(
    read_csv("data/access.csv", 
             skip = 1,
             na = "") %>% 
      clean_names() %>% 
      mutate(area = case_when(
        borough_name %in% inner_london ~ "inner_london",
        .default = "outer_london"
      )), 
    by = join_by(GSS_CODE == wd13cd)
  ) %>% 
  ## Join house prices by ward code and ward name ----
  left_join(
    read_csv('data/prices.csv', 
             skip = 5,
             na = ":") %>% 
      janitor::remove_empty() %>% 
      clean_names() %>% 
      select(ward_code, ward_name, starts_with("year")) %>% 
      ## After clean_names() the price columns start with "year_ending" ----
      mutate(
        across(
          .cols = c(starts_with("year_ending") & 
                      where(is.character)),
          .fns = ~ parse_number(.x)
        )
      ),
    by = join_by(
      GSS_CODE == ward_code,
      NAME == ward_name
    )
  ) %>% 
  ## Reshape to one row per ward and year ----
  pivot_longer(cols = starts_with("year_ending"),
               names_to = "year",
               values_to = "prices") %>%
  mutate(
    year = str_extract(year, "\\d{4}$"),
    year = as.numeric(year)
  ) %>% 
  ## Keep 2017 prices only ----
  dplyr::filter(year == 2017, !is.na(prices))

The data contains 17 variables and 525 observations. I join the data to a shapefile of London wards, read with the sf package, to permit data visualization using maps.

We examine the missing values in the data. Eight (8) variables in the parks data have 56 missing values each. The table below shows the variables and their associated missing values.

Code
parks %>% 
  sapply(is.na) %>% 
  colSums() %>% 
  tibble(variables = names(parks),
         missing = .) %>% 
  arrange(desc(missing)) %>% 
  dplyr::filter(missing > 0) %>% 
  gt(caption = "Missing Values")
Missing Values
variables           missing
ward_name                56
borough_name             56
open_space               56
local_parks              56
district_parks           56
metropolitan_parks       56
regional_parks           56
area                     56
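
The identical count of 56 across all eight variables is what one would expect if those wards simply had no match in the access table during the left join. The sketch below illustrates that mechanism with toy data in base R; the data frames, ward codes, and values are hypothetical, and `merge(..., all.x = TRUE)` stands in for `left_join()`.

```r
# Toy stand-ins for the shapefile attribute table and the access table.
# All codes, names, and values here are hypothetical.
wards <- data.frame(
  GSS_CODE = c("E05000001", "E05000002", "E05000003", "E05000004"),
  NAME = c("Ward A", "Ward B", "Ward C", "Ward D")
)
access <- data.frame(
  wd13cd = c("E05000001", "E05000003"),
  open_space = c(42.0, 77.5)
)

# Base-R analogue of a left join: keep every ward, matched or not.
joined <- merge(wards, access, by.x = "GSS_CODE", by.y = "wd13cd", all.x = TRUE)

# Ward codes that have no partner in the access table.
unmatched <- setdiff(wards$GSS_CODE, access$wd13cd)

# The number of NA values in a joined column equals the number of unmatched keys.
n_missing <- sum(is.na(joined$open_space))
print(unmatched)
print(n_missing)
```

Running the same kind of key comparison on the real ward codes would show which wards account for the 56 missing rows.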

4 PART A: ALL LONDON WARDS

In this section, I examine all the wards in London. There are 525 London wards in this set of data. I start with data exploration and then run a series of statistical tests.

4.1 Data Exploration and Visualization

The map shows house prices clustering by neighborhood. Central London has a high concentration of high-priced houses, and prices decline with distance from central London. Figure 2 shows the relationship between house prices and access to four kinds of green space: open space, local parks, district parks, and metropolitan parks. None of the scatter plots shows a clear pattern. However, a pattern may emerge once we account for the spatial structure of the data.

Code
## Draw a map ----
colors <- RColorBrewer::brewer.pal(7, "RdPu")
tmap::tm_shape(parks) +
  tm_fill(col = 'prices',
          style = "pretty",
          palette = colors) + 
  tm_legend(outside = TRUE, text.size = 0.8) + 
  tm_layout(frame = FALSE)

House Price in Greater London Boroughs
Code
(parks %>% 
  ggplot(mapping = aes(x = open_space, y = prices)) + 
  geom_point()  + 


parks %>% 
  ggplot(mapping = aes(x = local_parks, y = prices)) + 
  geom_point()) / 


(parks %>% 
  ggplot(mapping = aes(x = district_parks, y = prices)) + 
  geom_point() + 

parks %>% 
  ggplot(mapping = aes(x = metropolitan_parks, 
                       y = prices)) + 
  geom_point())

Access to Parks and House Prices
Code
#parks %>% 
  #ggplot(mapping = aes(x = regional_parks, y = prices)) + 
  #geom_point()
Code
## Summary stats ----
parks %>% 
  as.data.frame() %>% 
  dplyr::select(where(is.numeric),
                -year, -HECTARES, -NONLD_AREA) %>%
  GGally::ggpairs()

Pairs Plot for Greater London Area
Code
parks %>% 
  as.data.frame() %>% 
  dplyr::select(where(is.numeric),
                -year, -HECTARES, -NONLD_AREA) %>%
  modelsummary::datasummary_skim()
Variable            Unique (#)  Missing (%)      Mean        SD       Min    Median        Max
open_space                 358           11      50.0      26.9       0.0      51.2      100.0
local_parks                355           11      36.5      22.1       0.0      34.3       96.5
district_parks             270           11      33.0      34.5       0.0      20.0      100.0
metropolitan_parks         215           11      55.3      42.0       0.0      66.4      100.0
regional_parks              59           11      24.9      41.2       0.0       0.0      100.0
prices                     336            0  535963.1  268089.3  230000.0  476500.0  2550000.0

Figure 3 displays pairs plots illustrating the relationships between house prices and access to parks in London. The focus is on the variable distributions and the correlation coefficients. The distribution of prices is right-skewed: most wards have lower prices and a few have much higher prices. A noteworthy finding is the modest but statistically significant positive correlation (0.15) between house prices and access to metropolitan parks. Access to open spaces and district parks is negatively correlated with house prices, although this correlation is not statistically significant.

5 Statistical Tests

In this section, I run a range of statistical tests to check the relationship between access to green spaces and the prices of housing in London. To run these tests, I start by dropping data that has missing values.

5.0.1 Analysis of spatial autocorrelation (e.g. Moran’s I or LISA mapping).

In this section, I test for spatial autocorrelation.

Spatial autocorrelation is a special case of correlation, which is the global concept that two attribute variables X and Y have some average degree of alignment between the relative magnitudes of their respective values (Griffith and Chun 2018).

We test the following hypotheses:

H0: Moran's I is not significantly different from zero (there is no spatial autocorrelation in house prices). Specifically, prices are distributed across wards by a completely random process.

H1: Moran's I is significantly different from zero (there is spatial autocorrelation in house prices). Specifically, the distribution of house prices across wards is not random.
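
For reference, the statistic behind these hypotheses can be written out. The notation is my own rather than the source's: \(x_i\) is the (log) price in polygon \(i\), \(\bar{x}\) is the mean, \(w_{ij}\) is the spatial weight between polygons \(i\) and \(j\), and \(n\) is the number of polygons.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij},
\qquad
\mathbb{E}[I] = -\frac{1}{n-1} \ \text{under spatial randomness.}
```

With several hundred wards, the expected value under the null is very close to zero, which is why the hypotheses are stated relative to zero.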

The Moran’s I test is sensitive to outliers, prompting an initial exploration of the distribution of the price variable. Observing a pronounced right skewness in the original price distribution, a logarithmic transformation is applied to bring the distribution closer to normal. Consequently, the analysis proceeds with the logarithm of prices, a more suitable representation for robust and accurate assessments (Chen 2021).
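
The effect of the log transformation on skewness can be checked numerically. The sketch below uses simulated log-normal values rather than the London data, and the `skewness()` helper is a hypothetical convenience function defined here, not part of any package used in this analysis.

```r
set.seed(42)
# Simulated right-skewed "prices" (log-normal); not the London data.
sim_prices <- rlnorm(500, meanlog = 13, sdlog = 0.4)

# Sample skewness: the third standardized moment.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

skew_raw <- skewness(sim_prices)
skew_log <- skewness(log(sim_prices))
print(c(raw = skew_raw, logged = skew_log))
```

A skewness near zero after logging indicates a roughly symmetric distribution, which limits the influence of a few very expensive wards on the statistic.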

Code
(parks %>%
  ggplot(mapping = aes(x = prices)) + 
  geom_density() + 
  labs(x = "Prices", y = "Density",
       title = "Distribution of Prices in London",
       subtitle = "House prices in London are heavily skewed to the right.") |

## Distribution after logging prices ----
parks %>%
  ggplot(mapping = aes(x = prices)) + 
  geom_density() + 
  labs(x = "Prices (Log Scale)", y = "Density",
       title = "Distribution of Prices in London",
       subtitle = "House prices in London are relatively normal after taking logs.") + 
  scale_x_log10())

Distribution of house prices in greater London

I additionally investigate the variation in house prices across London boroughs. The graph below highlights a considerable disparity, with Westminster having notably higher house prices, followed by Camden as a distant second. In contrast, Barking and Dagenham has the lowest median house prices. The objective of this analysis is to discern whether this price differential is associated with the accessibility of green spaces.

Code
# names(parks)
parks |>
  dplyr::filter(!is.na(borough_name)) |>
  mutate(borough_name = fct_reorder(borough_name, prices, median)) |>
  ggplot(mapping = aes(y = prices, x = borough_name)) + 
  geom_boxplot(aes(fill = borough_name),
               show.legend = FALSE) + 
  geom_jitter(shape = ".") +
  labs(x = "", 
       y = "House Price",
       title = "House Prices by Borough",
       subtitle = "Westminster has the highest priced houses; Barking and Dagenham is the cheapest.") + 
  coord_flip()

House prices by region

I start by defining the neighboring polygons. Below, we see that polygon 2 has 3 neighbors; 4, 7, and 8.

Code
## Define neighboring polygons ----
nb <- poly2nb(parks, queen=TRUE)

## Neighbors in second slot 
nb[[2]]
[1] 4 7 8

Next, I assign weights to each neighboring polygon. In this case, each neighboring polygon is given the weight \(1/(\text{number of neighbors})\) (style="W"; note the uppercase "W"), so that the weights for each polygon sum to 1. If a binary weight is desired (i.e. one where each neighboring polygon is assigned a weight of 1, regardless of the number of neighbors), we set style="B". Setting zero.policy=TRUE allows polygons that have no neighbors.
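
To make the two weighting schemes concrete, the following base-R sketch builds both weight matrices by hand for a hypothetical four-polygon neighbor list. It is only an illustration of what the "W" and "B" styles mean, not a reproduction of how spdep stores weights internally.

```r
# Hypothetical neighbour list for four polygons (entries are polygon indices).
neighbours <- list(c(2L, 3L), c(1L, 3L, 4L), c(1L, 2L), 2L)
n <- length(neighbours)

# Binary weights (the analogue of style = "B"): 1 for each neighbour.
W_binary <- matrix(0, n, n)
for (i in seq_len(n)) W_binary[i, neighbours[[i]]] <- 1

# Row-standardised weights (the analogue of style = "W"):
# each neighbour of polygon i gets weight 1 / (number of neighbours of i).
W_row <- W_binary / rowSums(W_binary)

print(W_row)
print(rowSums(W_row))
```

Row standardization makes the spatial lag a plain average of neighboring values, so polygons with many neighbors do not automatically get larger lagged values.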

Code
lw <- nb2listw(nb, style="W", zero.policy=TRUE)

To relate the price in each polygon to the prices around it, we compute the spatial lag of prices: for each polygon, the weighted average of the prices in its neighboring polygons.

Code
parks$lag <- lag.listw(lw, parks$prices)

Next, I specify and run a regression model.

Code
# Regress the log of lagged prices on the log of prices
M <- lm(log(lag) ~ log(prices), data = parks)
# Extract the intercept and slope
coef(M)
(Intercept) log(prices) 
       2.91        0.78 

The fitted slope is 0.78. A positive slope means that wards with higher house prices tend to be surrounded by neighboring wards with higher prices as well.

Code
# Plot the data
plot(log(lag) ~ log(prices), parks, pch=21, asp=1, las=1, col = "grey40", bg="grey80")
# Add the regression line from model M
abline(M, col="blue") 
# Add reference lines at the mean of each axis variable
abline(v = mean(log(parks$prices)), lty=3, col = "grey80")
abline(h = mean(log(parks$lag)), lty=3, col = "grey80")

Plotting the Regression Model

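The slope of this regression approximates the Moran's I coefficient. The two coincide exactly only when the weights are row-standardized and the lag is computed from the same variable that appears on the x-axis; here the lag is taken on raw prices and then logged, which is why the slope (0.78) differs slightly from the statistic computed next (0.749). The next step computes the statistic without computing lagged values or fitting a regression model.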
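
The link between the regression slope and Moran's I can be verified on a small example. The sketch below is base R only and uses a hypothetical four-polygon neighbor structure with simulated values; it assumes row-standardized weights and a lag computed from the same variable that is used as the predictor.

```r
set.seed(1)
# Hypothetical 4-polygon neighbour structure with row-standardised weights.
neighbours <- list(c(2L, 3L), c(1L, 3L, 4L), c(1L, 2L), 2L)
n <- length(neighbours)
W <- matrix(0, n, n)
for (i in seq_len(n)) W[i, neighbours[[i]]] <- 1 / length(neighbours[[i]])

x <- rnorm(n, mean = 13, sd = 0.5)   # simulated stand-in for log prices
x_lag <- as.vector(W %*% x)          # spatial lag of the same variable

# Moran's I from its definition (sum(W) equals n for row-standardised weights).
z <- x - mean(x)
moran_i <- (n / sum(W)) * sum(z * (W %*% z)) / sum(z^2)

# Slope of the spatial lag regressed on the variable itself.
slope <- unname(coef(lm(x_lag ~ x))[2])

print(c(moran_i = moran_i, slope = slope))
```

The two numbers agree up to floating-point error. If the lag is instead computed on one scale and transformed afterwards, the slope is only an approximation of the statistic.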

We can compute the Moran's I statistic directly, as below.

Code
moran(log(parks$prices), listw = lw, n = length(nb), S0 = Szero(lw))
$I
[1] 0.749

$K
[1] 5.04

To test the hypothesis that the Moran’s coefficient is statistically different from zero, we get the p-values, as follows.

Code
MC <- moran.mc(log(parks$prices), lw, nsim = 999)
# View results (including pseudo p-value)
MC

    Monte-Carlo simulation of Moran I

data:  log(parks$prices) 
weights: lw  
number of simulations + 1: 1000 

statistic = 0.7, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

We visualize the simulation below. The curve represents the distribution of Moran's I values expected if house prices were randomly distributed among the wards. The observed value, represented by the vertical line in the graph, lies far above the simulated values. This discrepancy suggests that house prices are spatially autocorrelated, that is, they follow a non-random pattern across the wards.
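
The logic of the Monte Carlo test can be sketched by hand: shuffle the values across polygons many times, recompute Moran's I each time, and see where the observed value falls. The example below is a simplified base-R illustration on a hypothetical chain of 30 polygons with simulated values; the `moran_i()` helper is defined here for the purpose and is not the spdep implementation.

```r
set.seed(123)
# Hypothetical chain of 30 polygons: each one neighbours the previous and next.
n <- 30
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nbrs <- intersect(c(i - 1, i + 1), seq_len(n))
  W[i, nbrs] <- 1 / length(nbrs)
}

# Moran's I computed from its definition.
moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

# Simulated values with a smooth trend, so neighbours look alike.
x <- seq_len(n) / n + rnorm(n, sd = 0.1)
observed <- moran_i(x, W)

# Null distribution: shuffle the values across polygons and recompute I.
nsim <- 999
simulated <- replicate(nsim, moran_i(sample(x), W))

# Pseudo p-value for the one-sided "greater" alternative.
p_value <- (sum(simulated >= observed) + 1) / (nsim + 1)
print(c(observed = observed, p_value = p_value))
```

With 999 simulations the smallest attainable pseudo p-value is 1/1000 = 0.001, which is the value reported when the observed statistic exceeds every simulated one.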

Code
# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC, main="", las = 1)

Margin Plots

5.0.2 ANOVA

I run an analysis of variance (ANOVA) of house prices on the five park-access measures and the borough (Potvin 2020). Access to local parks (p = 0.014), metropolitan parks (p < 0.001), and regional parks (p = 0.002) has a statistically significant relationship with house prices, whereas open space (p = 0.41) and district parks (p = 0.81) do not. Borough accounts for the largest share of the explained sum of squares. Because summary() of an aov fit reports sequential sums of squares, these results depend on the order in which the terms enter the formula.

Code
aov(prices ~ open_space + local_parks + district_parks + metropolitan_parks + regional_parks + borough_name, 
    data = parks) |> summary()
                    Df         Sum Sq      Mean Sq F value               Pr(>F)
open_space           1    20026113893  20026113893    0.69               0.4054
local_parks          1   174625041071 174625041071    6.05               0.0143
district_parks       1     1594639734   1594639734    0.06               0.8143
metropolitan_parks   1   561858578790 561858578790   19.46             0.000013
regional_parks       1   273514175675 273514175675    9.47               0.0022
borough_name        24 11884678030747 495194917948   17.15 < 0.0000000000000002
Residuals          439 12676478957046  28875806280                             
                      
open_space            
local_parks        *  
district_parks        
metropolitan_parks ***
regional_parks     ** 
borough_name       ***
Residuals             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
56 observations deleted due to missingness
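
The ANOVA table above uses sequential sums of squares, so each term is assessed after the terms listed before it. The sketch below illustrates this order dependence using R's built-in `mtcars` data purely as a stand-in with correlated predictors; `anova()` on an `lm` fit reports the same sequential table as `summary()` on the corresponding `aov` fit.

```r
# Built-in mtcars data, used only as a stand-in with correlated predictors.
tab_wt_first <- anova(lm(mpg ~ wt + hp, data = mtcars))
tab_hp_first <- anova(lm(mpg ~ hp + wt, data = mtcars))

# Sequential sum of squares attributed to `wt` under each ordering.
ss_wt_first <- tab_wt_first["wt", "Sum Sq"]
ss_wt_second <- tab_hp_first["wt", "Sum Sq"]

print(c(wt_entered_first = ss_wt_first, wt_entered_second = ss_wt_second))
```

A term that enters first is credited with variation it shares with later terms, so a check of this kind on the park-access variables would show how sensitive their significance is to the ordering.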

5.0.3 Linear Regression Models

The regression analysis uses the same predictors as the ANOVA, but each coefficient is now estimated holding all other terms, including borough, constant. The access measures range from 0 to 100, so each coefficient is the change in price associated with a one-unit increase in access. On this basis, access to open space (−£910 per unit, p = 0.025) and local parks (−£861, p = 0.048) is negatively associated with prices, while district parks (+£513, p = 0.050) and regional parks (+£589, p = 0.058) show positive associations that are only marginally significant. Access to metropolitan parks (+£106, p = 0.64) has no significant relationship with prices once borough is controlled for. The model itself is statistically significant at the 1% level and accounts for about 50% of the variability in house prices (adjusted R-squared of 0.472), but most of this explanatory power comes from the borough terms: prices in Westminster, for example, are on average about £825,809 higher than in the baseline borough, Barking and Dagenham. This suggests that, while park access contributes to pricing, location plays the pivotal role, alongside factors not modeled here such as house type.

Code
lm(prices ~ open_space + local_parks + district_parks + metropolitan_parks + regional_parks + borough_name, 
   data = parks) |> summary()

Call:
lm(formula = prices ~ open_space + local_parks + district_parks + 
    metropolitan_parks + regional_parks + borough_name, data = parks)

Residuals:
    Min      1Q  Median      3Q     Max 
-573086  -76445  -19350   54350 1397757 

Coefficients:
                                   Estimate Std. Error t value
(Intercept)                          344210      53809    6.40
open_space                             -910        404   -2.25
local_parks                            -861        433   -1.99
district_parks                          513        261    1.97
metropolitan_parks                      106        227    0.47
regional_parks                          589        310    1.90
borough_nameBarnet                   214962      57108    3.76
borough_nameBrent                    173580      56976    3.05
borough_nameBromley                  137679      55171    2.50
borough_nameCamden                   489717      59377    8.25
borough_nameEaling                   191694      55544    3.45
borough_nameEnfield                   65442      59762    1.10
borough_nameGreenwich                123654      59262    2.09
borough_nameHammersmith and Fulham   408569      62947    6.49
borough_nameHaringey                 229032      61122    3.75
borough_nameHarrow                   214019      57507    3.72
borough_nameHavering                  78506      59146    1.33
borough_nameHillingdon               104460      56398    1.85
borough_nameHounslow                  91065      57952    1.57
borough_nameIslington                358784      66975    5.36
borough_nameKingston upon Thames     180087      65979    2.73
borough_nameLambeth                  278536      57619    4.83
borough_nameLewisham                 125622      58090    2.16
borough_nameMerton                   231260      60944    3.79
borough_nameNewham                    21957      59033    0.37
borough_nameRichmond upon Thames     336676      61742    5.45
borough_nameSutton                    74195      59530    1.25
borough_nameWaltham Forest           100113      67722    1.48
borough_nameWandsworth               266596      61422    4.34
borough_nameWestminster              825809      58950   14.01
                                               Pr(>|t|)    
(Intercept)                          0.0000000004069779 ***
open_space                                      0.02470 *  
local_parks                                     0.04765 *  
district_parks                                  0.05002 .  
metropolitan_parks                              0.64032    
regional_parks                                  0.05761 .  
borough_nameBarnet                              0.00019 ***
borough_nameBrent                               0.00245 ** 
borough_nameBromley                             0.01294 *  
borough_nameCamden                   0.0000000000000019 ***
borough_nameEaling                              0.00061 ***
borough_nameEnfield                             0.27409    
borough_nameGreenwich                           0.03751 *  
borough_nameHammersmith and Fulham   0.0000000002313356 ***
borough_nameHaringey                            0.00020 ***
borough_nameHarrow                              0.00022 ***
borough_nameHavering                            0.18509    
borough_nameHillingdon                          0.06467 .  
borough_nameHounslow                            0.11681    
borough_nameIslington                0.0000001368527817 ***
borough_nameKingston upon Thames                0.00660 ** 
borough_nameLambeth                  0.0000018523939757 ***
borough_nameLewisham                            0.03112 *  
borough_nameMerton                              0.00017 ***
borough_nameNewham                              0.71012    
borough_nameRichmond upon Thames     0.0000000828324851 ***
borough_nameSutton                              0.21330    
borough_nameWaltham Forest                      0.14005    
borough_nameWandsworth               0.0000176679369815 ***
borough_nameWestminster            < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 170000 on 439 degrees of freedom
  (56 observations deleted due to missingness)
Multiple R-squared:  0.505, Adjusted R-squared:  0.472 
F-statistic: 15.4 on 29 and 439 DF,  p-value: <0.0000000000000002
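
The borough coefficients in the output above are differences from a baseline level, which by default is the first factor level alphabetically. The sketch below, with simulated data and hypothetical borough labels, shows how the baseline affects the coefficients without changing the fitted model.

```r
set.seed(7)
# Hypothetical data: three "boroughs" with different mean prices.
toy <- data.frame(
  borough = factor(rep(c("A", "B", "C"), each = 20)),
  price = c(rnorm(20, 300), rnorm(20, 450), rnorm(20, 800))
)

# Default baseline is the first level ("A"); coefficients are gaps relative to A.
fit_a <- lm(price ~ borough, data = toy)

# Changing the baseline to "C" re-expresses the same model relative to C.
toy$borough_c <- relevel(toy$borough, ref = "C")
fit_c <- lm(price ~ borough_c, data = toy)

print(coef(fit_a))
print(coef(fit_c))
```

The gap between A and C has the same size and opposite sign in the two fits, and the fitted values are identical, so the choice of baseline affects interpretation only.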

5.0.4 Correlations or chi-square tests of association

In this section, I rerun the correlation tests. There is a significant correlation between house prices and access to metropolitan parks. District parks and open spaces have a negative correlation with prices, although this correlation is not statistically significant. There is also a notable degree of skewness in prices and access to parks.

Code
## Pairs plots ----
parks %>% 
  as.data.frame() %>% 
  dplyr::select(where(is.numeric),
                -HECTARES, -NONLD_AREA, 
                -year) %>%
  GGally::ggpairs()

Pairs Plot for all London

6 PART B: INNER LONDON WARDS

Code
inner_parks <- parks %>% 
  dplyr::filter(area == "inner_london")

We repeat the same analysis for inner London. The inner London subset contains 139 wards. I start with the visualization and summary of the data.

6.1 Data Exploration and Visualization

For inner London, we have a set of data with 139 observations of 18 variables. I start by mapping the wards of inner London and the corresponding house prices. High-priced houses are concentrated in the north west of inner London, while the south east has the bulk of the lower-priced houses.

Code
## Draw a map ----

tmap::tm_shape(inner_parks) +
  tm_fill(col = 'prices',
          style = "pretty",
          palette = colors) + 
  tm_legend(outside = TRUE, text.size = 0.8) + 
  tm_layout(frame = FALSE)

House Price in Inner London Boroughs
Code
inner_parks %>%
  as.data.frame() %>%
  dplyr::select(prices, open_space, 
         local_parks, district_parks, 
         metropolitan_parks, regional_parks) %>%
  modelsummary::datasummary_skim()
Variable            Unique (#)  Missing (%)      Mean        SD       Min    Median        Max
prices                     120            0  646120.5  320290.5  259500.0  557500.0  2550000.0
open_space                 127            0      61.7      29.2       0.4      67.7      100.0
local_parks                123            0      41.3      25.5       0.0      42.5       96.5
district_parks              82            0      26.9      32.7       0.0       5.7      100.0
metropolitan_parks          66            0      71.3      36.8       0.0      93.6      100.0
regional_parks              20            0      18.6      37.2       0.0       0.0      100.0
Code
inner_parks %>%
  as.data.frame() %>%
  dplyr::select(prices, open_space, 
         local_parks, district_parks, 
         metropolitan_parks, regional_parks) %>%
  GGally::ggpairs()

Pairs Plot of house prices in Inner London

6.2 Statistical Tests

In this section, I run a range of statistical tests to check the relationship between access to green spaces and the prices of housing in Inner London.

6.2.1 Analysis of spatial autocorrelation (e.g. Moran’s I or LISA mapping).

In this analysis of spatial autocorrelation (Moran's I test), I start by defining the neighboring polygons.

In this analysis, we test the following hypotheses:

H0: Moran's I is not significantly different from zero (there is no spatial autocorrelation in house prices).

H1: Moran's I is significantly different from zero (there is spatial autocorrelation in house prices).

We see that polygon 2 has 3 neighbors; 1, 3, and 5.

Code
## Define neighboring polygons ----
nb1 <- poly2nb(inner_parks, queen=TRUE)

## Neighbors in second slot 
nb1[[2]]
[1] 1 3 5

I start by examining the distribution of house prices in inner London. As in the case of the whole of London, the prices are right-skewed. Because Moran's I test is sensitive to outliers, we work with the logarithm of prices. Westminster is still the most expensive area of inner London, as was the case for the whole of London. Greenwich is the cheapest area of inner London.

Code
(inner_parks %>%
  ggplot(mapping = aes(x = prices)) + 
  geom_density() + 
  labs(x = "Prices", y = "Density",
       title = "Distribution of Prices in London",
       subtitle = "House prices in London are heavily skewed to the right.") |

## Distribution after logging prices ----
inner_parks %>%
  ggplot(mapping = aes(x = prices)) + 
  geom_density() + 
  labs(x = "Prices (Log Scale)", y = "Density",
       title = "Distribution of Prices in London",
       subtitle = "House prices in London are Relatively Normal after Taking Logs.") + 
  scale_x_log10())

Distribution of house prices in Inner London
Code
# names(parks)
inner_parks |>
  dplyr::filter(!is.na(borough_name)) |>
  mutate(borough_name = fct_reorder(borough_name, prices, median)) |>
  ggplot(mapping = aes(y = prices, x = borough_name)) + 
  geom_boxplot(aes(fill = borough_name),
               show.legend = FALSE) + 
  geom_jitter(shape = ".") +
  labs(x = "", 
       y = "House Price",
       title = "House Prices by Borough: Inner London",
       subtitle = "In inner London, Westminster has the highest priced houses; Greenwich is the cheapest.")

House prices by region in inner London

As previously, we assign weights to each of the polygons.

Code
lw1 <- nb2listw(nb1, style="W", zero.policy=TRUE)

To get the relationship between the prices in each polygon (house prices in this case), we specify the lag of prices as follows.

Code
inner_parks$lag <- lag.listw(lw1, inner_parks$prices)

Plotting the prices against the lags shows strong positive spatial autocorrelation in prices. Positive spatial autocorrelation means that geographically nearby values of a variable tend to be similar on a map: high values tend to be located near high values, medium values near medium values, and low values near low values. The map corroborates this, as we see houses of roughly equal prices clustered together.

After running the regression, the slope coefficient is 0.739. A positive slope means that wards with higher prices tend to have neighboring wards with higher prices as well.

Code
# Regress the log of lagged prices on the log of prices
M1 <- lm(log(lag) ~ log(prices), data = inner_parks)
# Extract the intercept and slope
coef(M1)
(Intercept) log(prices) 
      3.507       0.739 

I plot the lagged prices against prices, together with the fitted regression line.

Code
# Plot the data
plot(log(lag) ~ log(prices), inner_parks, pch=21, asp=1, las=1, col = "grey40", bg="grey80")
# Add the regression line from the inner London model M1
abline(M1, col="blue") 
# Add reference lines at the mean of each axis variable
abline(v = mean(log(inner_parks$prices)), lty=3, col = "grey80")
abline(h = mean(log(inner_parks$lag)), lty=3, col = "grey80")

Spatial Correlation Plots

The slope of the regression model approximates the Moran's I coefficient (Moraga 2023). Next, we compute this statistic without computing the lagged values or fitting a regression model.

We can compute the Moran's I statistic directly, as below.

Code
moran(log(inner_parks$prices), listw = lw1, n = length(nb1), S0 = Szero(lw1))
$I
[1] 0.696

$K
[1] 4.5

We see that the log prices are spatially autocorrelated, with a Moran's I of 0.696. Thus, there is a high degree of correspondence between prices of houses in neighboring wards. The Monte Carlo test below confirms that the spatial autocorrelation is significantly different from zero (pseudo p-value of 0.001). Note that this test is run on raw prices rather than log prices, which is why its statistic (0.5) differs from 0.696.

Code
MC1 <- moran.mc(inner_parks$prices, lw1, nsim = 999)
# View results (including pseudo p-value)
MC1

    Monte-Carlo simulation of Moran I

data:  inner_parks$prices 
weights: lw1  
number of simulations + 1: 1000 

statistic = 0.5, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Running the simulation yields figure 14 below. The curve shows the distribution of Moran's I values we could expect had the house prices been randomly distributed across the wards. Our observed value (vertical line in the graph below) is far above the expected value, which implies that the house prices are spatially autocorrelated.

Code
# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC1, main="", las = 1)

Margin Plots

6.2.2 ANOVA

The ANOVA results for Inner London partly mirror the results for all of London. House prices differ significantly with access to open spaces and metropolitan parks, and across boroughs. Access to local parks, district parks, and regional parks is not statistically significant.

Code
aov(prices ~ open_space + local_parks + district_parks + metropolitan_parks + regional_parks + borough_name, 
   data = inner_parks) |> summary()
                    Df        Sum Sq       Mean Sq F value          Pr(>F)    
open_space           1 1297589138345 1297589138345   23.27 0.0000039810008 ***
local_parks          1  151426292011  151426292011    2.72         0.10183    
district_parks       1   15039883504   15039883504    0.27         0.60440    
metropolitan_parks   1  645989180350  645989180350   11.59         0.00089 ***
regional_parks       1  134464944819  134464944819    2.41         0.12293    
borough_name         7 4887568257811  698224036830   12.52 0.0000000000041 ***
Residuals          126 7024789694248   55752299161                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.2.3 Regression Model

As was the case for the whole of London, borough dominates the regression model for inner London. Among the park measures, only access to district parks has a statistically significant relationship with prices: all else remaining the same, a one-unit rise in access to district parks corresponds to a rise of about £1,375 in prices (p = 0.0495). Access to metropolitan parks (+£411) and regional parks (+£275) also has a positive relationship with prices, and access to open spaces (−£587) and local parks (−£1,240) a negative one, although none of these coefficients is statistically significant. The borough coefficients are measured relative to the baseline borough, Camden: prices in Westminster are about £343,738 higher on average, while prices in Greenwich and Lewisham are about £403,990 and £360,047 lower. The model explains about 50% of the variation in prices (adjusted R-squared of 0.457).

Code
lm(prices ~ open_space + local_parks + district_parks + metropolitan_parks + regional_parks + borough_name, 
   data = inner_parks) |> summary()

Call:
lm(formula = prices ~ open_space + local_parks + district_parks + 
    metropolitan_parks + regional_parks + borough_name, data = inner_parks)

Residuals:
    Min      1Q  Median      3Q     Max 
-564404  -91204  -26835   56426 1354164 

Coefficients:
                                   Estimate Std. Error t value      Pr(>|t|)
(Intercept)                          793770     112640    7.05 0.00000000011
open_space                             -587        976   -0.60        0.5486
local_parks                           -1240       1006   -1.23        0.2202
district_parks                         1375        693    1.98        0.0495
metropolitan_parks                      411        622    0.66        0.5101
regional_parks                          275        797    0.35        0.7306
borough_nameGreenwich               -403990      91925   -4.39 0.00002328967
borough_nameHammersmith and Fulham   -68713      98686   -0.70        0.4875
borough_nameIslington               -113860      99996   -1.14        0.2570
borough_nameLambeth                 -224096      83938   -2.67        0.0086
borough_nameLewisham                -360047      83191   -4.33 0.00003036697
borough_nameWandsworth              -232048      99844   -2.32        0.0217
borough_nameWestminster              343738      80764    4.26 0.00004027530
                                      
(Intercept)                        ***
open_space                            
local_parks                           
district_parks                     *  
metropolitan_parks                    
regional_parks                        
borough_nameGreenwich              ***
borough_nameHammersmith and Fulham    
borough_nameIslington                 
borough_nameLambeth                ** 
borough_nameLewisham               ***
borough_nameWandsworth             *  
borough_nameWestminster            ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 236000 on 126 degrees of freedom
Multiple R-squared:  0.504, Adjusted R-squared:  0.457 
F-statistic: 10.7 on 12 and 126 DF,  p-value: 0.0000000000000246
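
One way to quantify how much the park variables add beyond borough is to compare nested models with and without them. The sketch below does this on simulated data in which group membership drives the outcome; the variable names, effect sizes, and data are all hypothetical, and the approach is offered as a possible check rather than a step taken in this analysis.

```r
set.seed(99)
# Simulated data in which group membership drives the outcome and the
# continuous predictor has only a small effect. All names are hypothetical.
n <- 200
toy <- data.frame(
  borough = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  park_access = runif(n, 0, 100)
)
borough_effect <- c(A = 300, B = 450, C = 600, D = 900)
toy$price <- unname(borough_effect[as.character(toy$borough)]) +
  0.2 * toy$park_access + rnorm(n, sd = 50)

fit_borough <- lm(price ~ borough, data = toy)
fit_full <- lm(price ~ borough + park_access, data = toy)

# Share of variance explained with and without the park variable.
r2 <- c(borough_only = summary(fit_borough)$r.squared,
        full = summary(fit_full)$r.squared)
print(r2)

# Partial F-test: does adding park_access improve on the borough-only model?
print(anova(fit_borough, fit_full))
```

A small gain in R-squared between the two fits would support the reading that borough, rather than park access, carries most of the explanatory power.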

6.2.4 Correlations or chi-square tests of association

In this section, we examine the correlation coefficients between house prices and access to green spaces in the inner London boroughs. Again, we see a significant correlation between house prices and access to parks. District parks and open spaces have a negative correlation with house prices. However, the correlation is not significant.

Code
inner_parks %>%
  as.data.frame() %>%
  dplyr::select(where(is.numeric), 
                -HECTARES, -NONLD_AREA, - year) %>%
  GGally::ggpairs()

Pairs Plot for Inner London

7 Conclusion

In this analysis, we have examined the relationship between house prices and access to parks and open spaces. We find that house prices are spatially autocorrelated, with the highest priced houses in Westminster and the lowest priced in Barking and Dagenham. Secondly, there is a statistically significant correlation between house prices and access to metropolitan parks. In addition, the ANOVA shows that access to parks has a significant relationship with house prices, especially access to local parks, regional parks, and metropolitan parks. In the regression models, the park-access variables add little explanatory power once borough is accounted for, probably because factors beyond access to parks determine the prices of houses.

References

Chen, Yanguang. 2021. “An Analytical Process of Spatial Autocorrelation Functions Based on Moran’s Index.” PLoS One 16 (4): e0249589.
Griffith, Daniel A., and Yongwan Chun. 2018. “GIS and Spatial Statistics/Econometrics: An Overview.” In, 1–26. Elsevier. https://doi.org/10.1016/b978-0-12-409548-9.09680-9.
Moraga, Paula. 2023. Spatial Statistics for Data Science: Theory and Practice with R. CRC Press.
Potvin, Catherine. 2020. “ANOVA: Experiments in Controlled Environments.” In Design and Analysis of Ecological Experiments, 46–68. Chapman and Hall/CRC.