The relationship between property prices and proximity to public transport is a well-documented phenomenon in urban studies. Properties located near public transport hubs often command higher prices due to the convenience and accessibility they offer. Understanding this relationship can provide valuable insights for real estate investors, urban planners, and policymakers.
This project aims to explore the correlation between property prices and public transport proximity in Warsaw, Poland. By leveraging spatial data analysis techniques, we will:
Measure Proximity to Public Transport: Calculate the distance from each property to the nearest public transport point.
Evaluate the Impact on Property Prices: Analyze how proximity to public transport influences property prices.
Data Collection and Preparation: Gather and preprocess data on property prices and public transport locations in Warsaw.
Proximity Analysis: Calculate the distance from each property to the nearest public transport point.
Clustering: Extend the proximity analysis by grouping public transport points into clusters and weighting each cluster by its number of points.
Correlation Analysis: Examine the relationship between property prices and public transport proximity.
Modeling: Develop a regression model to examine the relation between the two main factors.
Visualization: Visualize the results to identify patterns and trends.
This study uses the following data:
# sf: For handling spatial data and performing geometric operations
library(sf)
# jsonlite: For reading JSON files (e.g., public transport data)
library(jsonlite)
# dplyr: For data manipulation and transformation
library(dplyr)
# ggplot2: For creating visualizations and plots
library(ggplot2)
# tmap: For thematic mapping and spatial data visualization
library(tmap)
# tidyr: For reshaping and tidying data
library(tidyr)
# geosphere: For calculating geographic distances (e.g., Haversine distance)
library(geosphere)
# corrplot: For visualizing correlation matrices
library(corrplot)
# gridExtra: For arranging multiple plots in a grid
library(gridExtra)
# GGally: For creating advanced correlation and scatterplot matrices
library(GGally)
# lmtest: For statistical hypothesis testing in regression models
library(lmtest)
# sandwich: For computing robust standard errors in regression analysis
library(sandwich)
# stargazer: For generating formatted regression tables
library(stargazer)
# spdep: For spatial econometric modeling and Moran's I test
library(spdep)
# spatialreg: For estimating spatial regression models (SAR, SEM)
library(spatialreg)
warsaw_districts <- st_read("dzielnice_Warszawy.shp") # district boundaries, used for visualization
## Reading layer `dzielnice_Warszawy' from data source
## `C:\Users\yayec\Documents\ARUE_Kubara\project\ARUE\dzielnice_Warszawy.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 626505.9 ymin: 472229.5 xmax: 655260.3 ymax: 502172.4
## Projected CRS: ETRS89_Poland_CS92
tbus_data <- fromJSON("bus_tram_stops.customization") # bus and tram stops
metro_data <- fromJSON("metro_stops.customization") # metro stations
properties1 <- read.csv('apartments_pl_2024_06.csv') # apartment prices with geolocation
The first step involves loading and cleaning the public transport data, including bus/tram stops and metro stations. The data is transformed into a consistent format for further analysis.
extract_row <- function(row) {
row %>%
pivot_wider(names_from = key, values_from = value)
}
tbus_values <- tbus_data$result$values
tbus_clean <- tbus_values %>%
lapply(extract_row) %>%
bind_rows()
tbus_clean <- tbus_clean %>%
mutate(
szer_geo = as.numeric(szer_geo),
dlug_geo = as.numeric(dlug_geo)
)
feature_list <- metro_data$result$featureMemberList
coordinates <- feature_list$geometry$coordinates %>%
bind_rows() %>%
rename(latitude = latitude, longitude = longitude)
properties <- feature_list$properties %>%
bind_rows() %>%
rename(OBJECTID = value)
metro_df <- cbind(coordinates, properties)
metro_df <- metro_df %>%
mutate(transport_type = "M")
tbus_clean <- tbus_clean %>%
mutate(transport_type = "T/B") %>%
mutate(szer_geo = as.numeric(szer_geo),
dlug_geo = as.numeric(dlug_geo))
metro_df <- metro_df %>%
mutate(latitude = as.numeric(latitude),
longitude = as.numeric(longitude))
public_transport <- bind_rows(
tbus_clean %>% select(latitude = szer_geo, longitude = dlug_geo, transport_type),
metro_df %>% select(latitude, longitude, transport_type)
)
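The reshaping step above can be illustrated on toy data. This is a minimal base-R sketch, assuming each record is a small data frame with `key` and `value` columns (as `extract_row` implies); the field `stop_id` and all values are invented for illustration, and base R is used here in place of `pivot_wider` so the example has no dependencies.

```r
# Toy records in the assumed key/value layout (hypothetical values)
toy_values <- list(
  data.frame(key = c("stop_id", "szer_geo", "dlug_geo"),
             value = c("1001", "52.2488", "21.0442")),
  data.frame(key = c("stop_id", "szer_geo", "dlug_geo"),
             value = c("1002", "52.2499", "21.0416"))
)

# Turn one key/value record into a one-row data frame
record_to_row <- function(record) {
  as.data.frame(as.list(setNames(record$value, record$key)),
                stringsAsFactors = FALSE)
}

toy_stops <- do.call(rbind, lapply(toy_values, record_to_row))

# Coordinates arrive as text, so convert them before any distance calculation
toy_stops$szer_geo <- as.numeric(toy_stops$szer_geo)
toy_stops$dlug_geo <- as.numeric(toy_stops$dlug_geo)

str(toy_stops)
```

Each record becomes one row, with one column per key, which is the shape the later distance calculations expect.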
To analyze the spatial distribution of public transport points, clustering is performed. For each point, all points within a 500-meter radius are grouped into a cluster, duplicate clusters are dropped, and a centroid is calculated for each remaining cluster. The weight of each centroid is the number of points in its cluster.
public_transport_sf <- st_as_sf(public_transport, coords = c("longitude", "latitude"), crs = 4326)
public_transport_sf <- st_transform(public_transport_sf, crs = 32633)
distance_matrix <- st_distance(public_transport_sf)
radius <- 500
clusters <- list()
for (i in 1:nrow(public_transport_sf)) {
nearby_points <- which(as.numeric(distance_matrix[i, ]) <= radius)
clusters[[i]] <- nearby_points
}
clusters <- unique(clusters)
centroids <- lapply(clusters, function(cluster_indices) {
cluster_points <- public_transport_sf[cluster_indices, ]
centroid <- st_centroid(st_union(cluster_points))
return(centroid)
})
centroids <- do.call(c, centroids)
centroids <- st_as_sf(centroids)
cluster_counts <- sapply(clusters, length)
centroids$weight <- cluster_counts
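To see how this radius-based grouping behaves, here is a small sketch on five hypothetical stops with planar coordinates in meters. It uses base R `dist()` as a stand-in for `st_distance()` on projected data; the coordinates are invented. It suggests that `unique()` only removes identical neighborhoods, so a single stop can contribute to several overlapping clusters.

```r
# Five hypothetical stops on a plane, coordinates in meters
pts <- rbind(c(0, 0), c(100, 0), c(300, 0), c(700, 0), c(5000, 5000))
radius <- 500

# Pairwise Euclidean distances (planar analogue of st_distance)
d <- as.matrix(dist(pts))

# One neighborhood per point: every point within the radius, including itself
neighbourhoods <- lapply(seq_len(nrow(pts)),
                         function(i) unname(which(d[i, ] <= radius)))

# unique() drops identical index sets only; overlapping sets are kept
clusters <- unique(neighbourhoods)

# Centroid = mean coordinate of the members, weight = number of members
centroids <- t(sapply(clusters, function(idx) colMeans(pts[idx, , drop = FALSE])))
weights <- lengths(clusters)

clusters
weights
```

In this toy case the third stop belongs to three of the four clusters, so dense areas produce many nearby centroids with high weights.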
The property data is filtered to include only listings in Warsaw. A function is created to calculate the minimum distance from each property to the nearest public transport point and identify the type of transport.
properties_warsaw <- properties1 %>%
filter(city == "warszawa")
calculate_min_distance_and_type <- function(prop_lat, prop_lon, transport_df) {
distances <- distHaversine(
c(prop_lon, prop_lat),
transport_df %>% select(longitude, latitude)
)
min_index <- which.min(distances)
list(
min_distance = min(distances),
transport_type = transport_df$transport_type[min_index]
)
}
properties_warsaw <- properties_warsaw %>%
rowwise() %>%
mutate(
min_distance = calculate_min_distance_and_type(latitude, longitude, public_transport)$min_distance,
nearest_transport_type = calculate_min_distance_and_type(latitude, longitude, public_transport)$transport_type
) %>%
ungroup()
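The nearest-stop lookup above relies on the Haversine great-circle distance. The following is a hand-rolled sketch of that formula in base R, using the same Earth radius as the `geosphere` default (6378137 m); the stop coordinates and the helper names `haversine_m` and `nearest_stop` are hypothetical. It also computes the nearest index once and reuses it for both outputs.

```r
# Haversine great-circle distance in meters; r matches geosphere's default radius
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical stops (coordinates invented for illustration)
stops <- data.frame(
  longitude = c(21.00, 21.05, 20.95),
  latitude = c(52.23, 52.25, 52.20),
  transport_type = c("M", "T/B", "T/B")
)

nearest_stop <- function(prop_lat, prop_lon, transport_df) {
  d <- haversine_m(prop_lon, prop_lat,
                   transport_df$longitude, transport_df$latitude)
  i <- which.min(d)  # index of the closest stop, used for both outputs
  list(min_distance = d[i], transport_type = transport_df$transport_type[i])
}

res <- nearest_stop(52.231, 21.001, stops)
res
```

Returning both values from a single call avoids computing all distances twice per property.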
To better capture the accessibility of properties to public transport, centroid-based scores are calculated. For each property, a score is computed against every centroid as the centroid weight divided by the distance from the property plus 1, that is weight / (distance + 1), and the highest score is kept together with the distance and weight of that centroid. The score therefore reflects both proximity and the density of transport points in the area. Adding 1 to the distance keeps the score from exploding, and thereby biasing the results, when a property lies almost on top of a centroid.
centroids_sf <- st_as_sf(centroids, coords = c("longitude", "latitude"), crs = 32633)
centroids_sf <- st_transform(centroids_sf, crs = 4326)
centroids_coords <- st_coordinates(centroids_sf)
centroids <- centroids %>%
mutate(
longitude = centroids_coords[, "X"],
latitude = centroids_coords[, "Y"]
) %>%
st_drop_geometry()
calculate_centroid_score <- function(prop_lat, prop_lon, centroids_df) {
distances <- distHaversine(
c(prop_lon, prop_lat),
centroids_df %>% select(longitude, latitude)
)
scores <- centroids_df$weight / (distances + 1)
max_score_index <- which.max(scores)
list(
centroid_score = max(scores),
centroid_distance = distances[max_score_index],
centroid_weight = centroids_df$weight[max_score_index]
)
}
properties_warsaw <- properties_warsaw %>%
rowwise() %>%
mutate(
centroid_score = calculate_centroid_score(latitude, longitude, centroids)$centroid_score,
centroid_distance = calculate_centroid_score(latitude, longitude, centroids)$centroid_distance,
centroid_weight = calculate_centroid_score(latitude, longitude, centroids)$centroid_weight
) %>%
ungroup()
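A short worked example may help in reading the score. The sketch below applies weight / (distance + 1) to two hypothetical centroids seen from one property; the weights and distances are invented.

```r
centroid_score <- function(weight, distance_m) weight / (distance_m + 1)

# Two hypothetical centroids seen from one property
weights <- c(10, 2)     # number of stops in each cluster
distances <- c(499, 0)  # meters from the property

scores <- centroid_score(weights, distances)
best <- which.max(scores)

scores  # 10 / 500 = 0.02 and 2 / 1 = 2
best    # the small cluster at distance 0 wins
```

Because the score is inversely proportional to distance, it decays steeply over the first few meters: a small cluster very close to a property can outscore a much larger one a few hundred meters away.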
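warsaw_bbox <- st_bbox(warsaw_districts)
zoom_out_factor <- 1.05
# Pad the box around its center; multiplying projected coordinates directly would shift it
pad_x <- (warsaw_bbox[["xmax"]] - warsaw_bbox[["xmin"]]) * (zoom_out_factor - 1) / 2
pad_y <- (warsaw_bbox[["ymax"]] - warsaw_bbox[["ymin"]]) * (zoom_out_factor - 1) / 2
expanded_bbox <- warsaw_bbox + c(-pad_x, -pad_y, pad_x, pad_y)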
tm_shape(warsaw_districts, bbox = expanded_bbox) +
tm_polygons(col = "lightgray", border.col = "white") +
tm_shape(public_transport_sf) +
tm_dots(col = "transport_type", palette = "Set1", size = 0.06, legend.show = FALSE) +
tm_layout(main.title = "Public Transport Points in Warsaw")
## This function is deprecated; please use cols4all::c4a() instead
centroids_sf <- st_as_sf(centroids, coords = c("longitude", "latitude"), crs = 4326)
centroids_sf <- st_transform(centroids_sf, crs = st_crs(warsaw_districts))
ggplot() +
geom_sf(data = warsaw_districts, fill = "lightgray", color = "black") +
geom_sf(data = centroids_sf, aes(size = weight), color = "red", alpha = 0.6) + # Plot centroids
scale_size_continuous(range = c(0.05, 3)) + # Adjust the size of the centroids
labs(title = "Cluster Centroids in Warsaw",
size = "Centroid Weight") +
theme_minimal()
properties_warsaw_sf <- st_as_sf(properties_warsaw, coords = c("longitude", "latitude"), crs = 4326)
properties_warsaw_sf <- st_transform(properties_warsaw_sf, crs = st_crs(warsaw_districts))
ggplot() +
geom_sf(data = warsaw_districts, fill = "lightgray", color = "white") +
geom_sf(data = properties_warsaw_sf, aes(color = price), size = 1, alpha = 0.6) +
scale_color_viridis_c(
option = "plasma",
name = "Price",
breaks = seq(0, 3000000, by = 1000000)
) +
labs(title = "Properties in Warsaw Colored by Price",
subtitle = "Overlay on Warsaw Districts",
caption = "Data: properties_warsaw") +
theme_minimal() +
theme(legend.position = "bottom")
A visual examination of property prices shows that they are related to proximity to the city center. The Wawer district also shows some high prices, likely because its low building density allows investors to construct larger properties.
ggplot() +
geom_sf(data = warsaw_districts, fill = "lightgray", color = "white") + # Warsaw districts
geom_sf(
data = properties_warsaw_sf,
aes(
color = centroid_score,
size = centroid_score,
alpha = centroid_score
)
) +
scale_color_viridis_c(
option = "plasma",
name = "Centroid Score",
rescaler = ~ scales::rescale_mid(.x, mid = median(properties_warsaw_sf$centroid_score)) # Emphasize higher values
) +
scale_size_continuous(
range = c(0.5, 3), # Size range for points (smaller to larger)
guide = "none" # Hide size legend to avoid clutter
) +
scale_alpha_continuous(
range = c(0.3, 1), # Alpha range (more transparent to more opaque)
guide = "none" # Hide alpha legend to avoid clutter
) +
labs(
title = "Properties in Warsaw Colored by Centroid Score",
subtitle = "Higher scores are more visible",
caption = "Data: properties_warsaw"
) +
theme_minimal() +
theme(legend.position = "bottom")
properties_with_districts <- st_join(properties_warsaw_sf, warsaw_districts)
# Calculate average centroid score by district
avg_centroid_by_district <- properties_with_districts %>%
group_by(nazwa_dzie) %>% # nazwa_dzie holds the district name
summarise(avg_centroid_score = mean(centroid_score, na.rm = TRUE)) %>%
ungroup()
# Join average scores back to the districts shapefile by district name
warsaw_districts_with_avg <- warsaw_districts %>%
left_join(st_drop_geometry(avg_centroid_by_district), by = "nazwa_dzie")
ggplot() +
geom_sf(
data = warsaw_districts_with_avg,
aes(fill = avg_centroid_score), # Fill districts by average centroid score
color = "white", # District boundaries
size = 0.2
) +
scale_fill_viridis_c(
option = "plasma", # Use a high-contrast color scale
name = "Avg. Centroid Score",
na.value = "lightgray" # Color for districts with no data
) +
labs(
title = "Average Centroid Score by District in Warsaw",
subtitle = "Higher scores indicate better access to public transport",
caption = "Data: properties_warsaw"
) +
theme_minimal() +
theme(legend.position = "bottom")
Visual examination of the average centroid score by district reveals some surprises. Scores generally fall with distance from the city center, with the Ursus district as the exception. Anecdotally, this district is well served by buses, many of which connect Warsaw with Piastów and Pruszków. Ursus also has some of the highest point counts for both centroids and properties, which might inflate the score. The high property count may stem from the large housing projects in the Szamoty subarea of Ursus (one of the authors grew up in Ursus, hence the insider knowledge).
# Count properties per district
properties_per_district <- properties_warsaw_sf %>%
st_join(warsaw_districts) %>%
group_by(nazwa_dzie) %>%
summarise(property_count = n()) %>%
st_drop_geometry()
# Calculate area of each district in square kilometers
districts_with_area <- warsaw_districts %>%
mutate(area_km2 = as.numeric(st_area(geometry)) / 1000000)
# Join counts with districts and calculate density
districts_with_density <- districts_with_area %>%
left_join(properties_per_district, by = "nazwa_dzie") %>%
mutate(density = property_count / area_km2)
# Create the map
ggplot(districts_with_density) +
geom_sf(aes(fill = density), color = "white") +
scale_fill_viridis_c(
name = "Properties per km²",
option = "plasma",
direction = -1
) +
theme_minimal() +
labs(
title = "Property Density in Warsaw Districts",
subtitle = "Number of properties per square kilometer"
) +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12),
axis.text = element_text(size = 8)
)
The map illustrates the spatial distribution of property density across Warsaw’s districts. Central districts exhibit the highest property density, as indicated by the dark purple shades, while peripheral areas display significantly lower density. This pattern aligns with urban development trends, where housing supply is concentrated near the city center, reflecting higher demand and accessibility to key amenities.
Firstly, we remove NA values for the accessibility variables.
properties_warsaw <- properties_warsaw %>%
mutate(across(where(is.character), ~ ifelse(. == "", NA, .)))
# Remove rows where any of the accessibility columns is NA
properties_warsaw <- properties_warsaw %>%
drop_na(clinicDistance, postOfficeDistance, kindergartenDistance,
restaurantDistance, collegeDistance, pharmacyDistance)
# Check if missing values still exist
colSums(is.na(properties_warsaw))
## id city type
## 0 0 1555
## squareMeters rooms floor
## 0 0 978
## floorCount buildYear latitude
## 63 637 0
## longitude centreDistance poiCount
## 0 0 0
## schoolDistance clinicDistance postOfficeDistance
## 0 0 0
## kindergartenDistance restaurantDistance collegeDistance
## 0 0 0
## pharmacyDistance ownership buildingMaterial
## 0 0 2928
## condition hasParkingSpace hasBalcony
## 5069 0 0
## hasElevator hasSecurity hasStorageRoom
## 260 0 0
## price min_distance nearest_transport_type
## 0 0 0
## centroid_score centroid_distance centroid_weight
## 0 0 0
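The cleaning logic can be checked on a toy table. This base-R sketch, with invented values, mirrors the two steps: empty strings become NA, then rows are dropped only when a required accessibility column is missing. It illustrates why other columns can still contain NA values afterwards, as `condition` does in the output above.

```r
toy <- data.frame(
  id = 1:4,
  condition = c("premium", "", "low", ""),
  clinicDistance = c(0.4, NA, 1.2, 0.8),
  pharmacyDistance = c(0.2, 0.3, NA, 0.5),
  stringsAsFactors = FALSE
)

# Step 1: empty strings in character columns become NA
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], function(x) ifelse(x == "", NA, x))

# Step 2: drop rows with NA in the required columns only
required <- c("clinicDistance", "pharmacyDistance")
toy_clean <- toy[complete.cases(toy[, required]), ]

colSums(is.na(toy_clean))
```

Rows 2 and 3 are removed, while row 4 is kept even though its `condition` is missing.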
Let’s explore the effect of the minimum distance to the nearest public transport point (in meters) on property price.
ggplot(properties_warsaw, aes(x = min_distance, y = price/squareMeters)) +
geom_point(alpha = 0.5) +
labs(title = "Property Prices vs. Distance to Public Transport",
subtitle = "Effect of Transport Accessibility on Price per Square Meter",
x = "Distance to Nearest Public Transport (meters)",
y = "Price per Square Meter") +
theme_minimal()
correlation <- cor(properties_warsaw$min_distance, as.numeric(properties_warsaw$price)/properties_warsaw$squareMeters, method = "pearson")
print(paste("Correlation between distance and price:", correlation))
## [1] "Correlation between distance and price: -0.0960819284388735"
The correlation is negative but weak (r ≈ -0.10): properties closer to public transport tend to have slightly higher prices per square meter. The point cloud is dense and widely scattered, indicating that other variables influence pricing far more than stop proximity alone. Some outliers remain, and high-priced properties that deviate from the general trend merit further investigation.
Let’s explore the effect of center distance on Property Price.
ggplot(properties_warsaw, aes(x = centreDistance, y = price/squareMeters)) +
geom_point(alpha = 0.5) +
labs(title = "Property Prices vs. Distance to Center",
subtitle = "Effect of Center Distance on Price per Square Meter",
x = "Center Distance",
y = "Price per Square Meter") +
theme_minimal()
correlation <- cor(properties_warsaw$centreDistance, as.numeric(properties_warsaw$price)/properties_warsaw$squareMeters, method = "pearson")
print(paste("Correlation between Centre Distance and price:", correlation))
## [1] "Correlation between Centre Distance and price: -0.46652755693178"
The scatter plot shows a moderate negative relationship (r ≈ -0.47) between price per square meter and distance from the city center. Properties located closer to the center tend to have higher prices per square meter, while those farther away show a decline in price. This pattern aligns with the expectation that central locations offer greater accessibility and amenities, making them more desirable and expensive.
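To put the two correlations on a common footing, the sketch below squares the coefficients printed above. For a simple linear fit with one predictor, the squared Pearson correlation equals the share of variance explained. The variable names are illustrative.

```r
r_transport <- -0.0960819284388735  # distance to nearest stop vs. price per m2
r_centre <- -0.46652755693178       # distance to center vs. price per m2

# Squared Pearson r = share of variance explained by a one-predictor linear fit
r2 <- c(transport = r_transport^2, centre = r_centre^2)
round(100 * r2, 1)
```

On its own, distance to the nearest stop accounts for roughly 1% of the variation in price per square meter, against roughly 22% for distance to the center.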
Correlation Matrix for Property Prices and Accessibility
# select() cannot compute new columns, so transmute() derives price_per_m2 and keeps the rest
selected_vars <- properties_warsaw %>%
transmute(price_per_m2 = price / squareMeters, centreDistance, min_distance,
centroid_score, restaurantDistance, clinicDistance,
postOfficeDistance, kindergartenDistance, pharmacyDistance) %>%
drop_na()
cor_matrix <- cor(selected_vars, use = "pairwise.complete.obs", method = "pearson")
# Correlation matrix (heatmap)
corrplot(cor_matrix, method = "color", type = "upper",
order = "hclust", addCoef.col = "black", tl.col = "black",
tl.srt = 45, number.cex = 0.8, tl.cex = 0.8)
Scatter plots with correlation matrix for Property Prices and Accessibility
selected_vars <- selected_vars %>%
mutate(across(everything(), as.numeric))
set.seed(123)
selected_sample <- selected_vars %>% sample_n(min(500, nrow(selected_vars)))
# Correlation graphs
ggpairs(selected_sample,
lower = list(continuous = wrap("smooth", method = "lm", color = "red")),
upper = list(continuous = wrap("cor", method = "pearson", size = 4)),
diag = list(continuous = wrap("densityDiag", alpha = 0.5))) +
theme_minimal()
The correlation matrix and scatter plots above explore the relationships between property prices (price_per_m2) and key accessibility factors such as distance to the city center, public transport, and urban amenities.
lm_linear_raw <- lm(price/squareMeters ~ centreDistance + min_distance + centroid_score +
restaurantDistance + clinicDistance + postOfficeDistance +
kindergartenDistance + pharmacyDistance, data = properties_warsaw)
lm_linear <- coeftest(lm_linear_raw, vcov = vcovHC(lm_linear_raw, type = "HC1"))
lm_linear
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21331.99071 123.48413 172.7509 < 2.2e-16 ***
## centreDistance -566.70278 16.17771 -35.0299 < 2.2e-16 ***
## min_distance 3.12439 0.37842 8.2565 < 2.2e-16 ***
## centroid_score 234.21914 118.79576 1.9716 0.048695 *
## restaurantDistance -3896.56209 239.02551 -16.3019 < 2.2e-16 ***
## clinicDistance -191.10631 61.40101 -3.1124 0.001863 **
## postOfficeDistance 1072.06220 160.65101 6.6732 2.700e-11 ***
## kindergartenDistance 1542.23617 252.35174 6.1115 1.042e-09 ***
## pharmacyDistance 606.48502 231.82430 2.6161 0.008913 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
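The robust standard errors reported above come from `vcovHC(..., type = "HC1")`. As a rough illustration of what that estimator computes, the sketch below builds the HC1 sandwich by hand in base R on simulated heteroskedastic data. HC1 is the White (HC0) estimator scaled by n / (n - k). The data and variable names are invented and unrelated to the Warsaw sample.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 5 + 2 * x + rnorm(n, sd = 0.5 + 0.5 * x)  # error variance grows with x

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^-1
meat <- crossprod(X * e)      # X' diag(e^2) X: each row of X scaled by its residual
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_hc1 <- sqrt(diag(vcov_hc1))
cbind(se_classical, se_hc1)
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t values and p-values, differ from the classical output.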
properties_warsaw <- properties_warsaw %>%
mutate(centreDistance_2 = centreDistance^2)
lm_quad_raw <- lm(price/squareMeters ~ centreDistance + centreDistance_2 + min_distance + centroid_score+
restaurantDistance + clinicDistance + postOfficeDistance +
kindergartenDistance + pharmacyDistance, data = properties_warsaw)
lm_quad <- coeftest(lm_quad_raw, vcov = vcovHC(lm_quad_raw, type = "HC1"))
lm_quad
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23187.09253 177.67383 130.5037 < 2.2e-16 ***
## centreDistance -1276.31661 50.83611 -25.1065 < 2.2e-16 ***
## centreDistance_2 55.72631 3.66795 15.1928 < 2.2e-16 ***
## min_distance 2.96481 0.36885 8.0380 1.070e-15 ***
## centroid_score 13.67036 110.14706 0.1241 0.90123
## restaurantDistance -3871.50673 236.78853 -16.3501 < 2.2e-16 ***
## clinicDistance -266.14226 60.97064 -4.3651 1.290e-05 ***
## postOfficeDistance 1152.27071 159.88397 7.2069 6.344e-13 ***
## kindergartenDistance 1461.41900 244.74229 5.9713 2.474e-09 ***
## pharmacyDistance 565.77052 229.15726 2.4689 0.01358 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because centroid_score is not statistically significant once the quadratic term is included (p = 0.90), we removed it from the subsequent models.
lm_coord_raw <- lm(price/squareMeters ~ centreDistance + centreDistance_2 + min_distance +
restaurantDistance + clinicDistance + postOfficeDistance +
kindergartenDistance + pharmacyDistance +
longitude + latitude,
data = properties_warsaw)
lm_coord <- coeftest(lm_coord_raw, vcov = vcovHC(lm_coord_raw, type = "HC1"))
lm_coord
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.5554e+05 4.6063e+04 18.5733 < 2.2e-16 ***
## centreDistance -1.3456e+03 4.9521e+01 -27.1725 < 2.2e-16 ***
## centreDistance_2 5.9170e+01 3.5680e+00 16.5835 < 2.2e-16 ***
## min_distance 1.9847e+00 3.5888e-01 5.5303 3.316e-08 ***
## restaurantDistance -2.9200e+03 2.3906e+02 -12.2143 < 2.2e-16 ***
## clinicDistance -1.7445e+02 5.9276e+01 -2.9431 0.003261 **
## postOfficeDistance 9.8358e+02 1.5670e+02 6.2767 3.672e-10 ***
## kindergartenDistance 1.0492e+03 2.3381e+02 4.4872 7.337e-06 ***
## pharmacyDistance 7.3966e+02 2.2483e+02 3.2898 0.001008 **
## longitude -8.8337e+02 6.1819e+02 -1.4290 0.153058
## latitude -1.5577e+04 7.9253e+02 -19.6544 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lm_lat_raw <- lm(price/squareMeters ~ centreDistance + centreDistance_2 + min_distance +
restaurantDistance + clinicDistance + postOfficeDistance +
kindergartenDistance + pharmacyDistance + latitude,
data = properties_warsaw)
lm_lat <- coeftest(lm_lat_raw, vcov = vcovHC(lm_lat_raw, type = "HC1"))
lm_lat
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2180e+05 4.0397e+04 20.3431 < 2.2e-16 ***
## centreDistance -1.3408e+03 4.9762e+01 -26.9442 < 2.2e-16 ***
## centreDistance_2 5.8612e+01 3.6034e+00 16.2658 < 2.2e-16 ***
## min_distance 2.0001e+00 3.5894e-01 5.5721 2.614e-08 ***
## restaurantDistance -2.9299e+03 2.3887e+02 -12.2653 < 2.2e-16 ***
## clinicDistance -1.7653e+02 5.8933e+01 -2.9955 0.002750 **
## postOfficeDistance 1.0003e+03 1.5647e+02 6.3932 1.733e-10 ***
## kindergartenDistance 1.0094e+03 2.3209e+02 4.3490 1.388e-05 ***
## pharmacyDistance 7.4043e+02 2.2502e+02 3.2905 0.001005 **
## latitude -1.5286e+04 7.7324e+02 -19.7689 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparison of all models
results <- stargazer(
lm_linear_raw, lm_quad_raw, lm_coord_raw, lm_lat_raw,
se = list(
sqrt(diag(vcovHC(lm_linear_raw, type="HC1"))),
sqrt(diag(vcovHC(lm_quad_raw, type="HC1"))),
sqrt(diag(vcovHC(lm_coord_raw, type="HC1"))),
sqrt(diag(vcovHC(lm_lat_raw, type="HC1")))
),
title = "Regression Output: Impact of Various Distances on Price per m²",
type = "text",
star.cutoffs = c(0.10, 0.05, 0.01)
)
##
## Regression Output: Impact of Various Distances on Price per m²
## =============================================================================================================================
## Dependent variable:
## --------------------------------------------------------------------------------------------------------
## price/squareMeters
## (1) (2) (3) (4)
## -----------------------------------------------------------------------------------------------------------------------------
## centreDistance -566.703*** -1,276.317*** -1,345.617*** -1,340.787***
## (16.178) (50.836) (49.521) (49.762)
##
## centreDistance_2 55.726*** 59.170*** 58.612***
## (3.668) (3.568) (3.603)
##
## min_distance 3.124*** 2.965*** 1.985*** 2.000***
## (0.378) (0.369) (0.359) (0.359)
##
## centroid_score 234.219** 13.670
## (118.796) (110.147)
##
## restaurantDistance -3,896.562*** -3,871.507*** -2,919.990*** -2,929.856***
## (239.026) (236.789) (239.064) (238.874)
##
## clinicDistance -191.106*** -266.142*** -174.452*** -176.532***
## (61.401) (60.971) (59.276) (58.933)
##
## postOfficeDistance 1,072.062*** 1,152.271*** 983.579*** 1,000.321***
## (160.651) (159.884) (156.703) (156.466)
##
## kindergartenDistance 1,542.236*** 1,461.419*** 1,049.154*** 1,009.367***
## (252.352) (244.742) (233.813) (232.094)
##
## pharmacyDistance 606.485*** 565.771** 739.658*** 740.430***
## (231.824) (229.157) (224.835) (225.019)
##
## longitude -883.373
## (618.187)
##
## latitude -15,576.680*** -15,286.000***
## (792.529) (773.236)
##
## Constant 21,331.990*** 23,187.090*** 855,537.400*** 821,798.700***
## (123.484) (177.674) (46,062.710) (40,397.020)
##
## -----------------------------------------------------------------------------------------------------------------------------
## Observations 6,778 6,778 6,778 6,778
## R2 0.248 0.273 0.303 0.303
## Adjusted R2 0.247 0.272 0.302 0.302
## Residual Std. Error 3,401.490 (df = 6769) 3,344.075 (df = 6768) 3,274.731 (df = 6767) 3,274.870 (df = 6768)
## F Statistic 279.229*** (df = 8; 6769) 282.959*** (df = 9; 6768) 294.628*** (df = 10; 6767) 327.162*** (df = 9; 6768)
## =============================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
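Distance to the city center is the strongest driver of price variation. Distance to the nearest public transport point is also significant, but its coefficient is positive in all four models (about 2 to 3 per additional meter), so once the other distances are controlled for, properties farther from a stop are slightly more expensive per square meter. Proximity to restaurants and clinics is associated with higher prices (negative distance coefficients). Proximity to kindergartens, pharmacies, and post offices is not: their distance coefficients are positive, possibly because higher-end residential areas have fewer of these amenities.
Therefore, amenities play a crucial role, and the nonlinear city center effect is essential for price modeling: adding the quadratic term raises R² from 0.248 to 0.273.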
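The nonlinear city center effect can be read directly off the quadratic coefficients. The sketch below uses the estimates from model (2) to compute the marginal effect of distance and the distance at which the fitted price stops falling. Distances are in the units of `centreDistance`, which the source does not state.

```r
b1 <- -1276.31661  # centreDistance, model (2)
b2 <- 55.72631     # centreDistance_2, model (2)

# Marginal effect of distance d on price per m2: b1 + 2 * b2 * d
marginal_effect <- function(d) b1 + 2 * b2 * d

# Distance at which the marginal effect reaches zero
turning_point <- -b1 / (2 * b2)

marginal_effect(c(1, 5, 10))
turning_point
```

The price penalty per unit of distance shrinks as distance grows and vanishes at roughly 11.45 distance units, so the fitted decline is steepest close to the center.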
# Before running spatial models, we check if property prices cluster geographically.
properties_warsaw <- properties_warsaw %>%
distinct(longitude, latitude, .keep_all = TRUE)
# Create spatial coordinates
coords <- cbind(properties_warsaw$longitude, properties_warsaw$latitude)
# Define spatial neighbors based on the 15 nearest properties
neighbors <- knn2nb(knearneigh(coords, k = 15))
# Convert neighbors into spatial weights
weights <- nb2listw(neighbors, style = "W")
# Moran's I test for spatial autocorrelation
moran_test <- moran.test(properties_warsaw$price / properties_warsaw$squareMeters, listw = weights)
print(moran_test)
##
## Moran I test under randomisation
##
## data: properties_warsaw$price/properties_warsaw$squareMeters
## weights: weights
##
## Moran I statistic standard deviate = 104.16, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 4.917721e-01 -1.870907e-04 2.230603e-05
Moran's I of 0.49 (p < 2.2e-16) indicates strong positive spatial autocorrelation in price per square meter, which motivates the use of spatial regression models (e.g., SEM).
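For readers unfamiliar with the statistic, here is a minimal base-R sketch of global Moran's I with row-standardized weights (the analogue of `style = "W"`), applied to six invented values on a line. The helper name `morans_i` is hypothetical. The last lines recompute the expectation and the standard deviate from the values printed in the test output above.

```r
morans_i <- function(x, W) {
  z <- x - mean(x)
  n <- length(x)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Toy example: six locations on a line, neighbors are adjacent locations
x <- c(1, 2, 3, 10, 11, 12)
A <- matrix(0, 6, 6)
A[cbind(1:5, 2:6)] <- 1
A <- A + t(A)
W <- A / rowSums(A)  # row-standardized weights

toy_i <- morans_i(x, W)
toy_i

# Consistency check on the reported test output
n_obs <- 5346
expected_i <- -1 / (n_obs - 1)
z_score <- (4.917721e-01 - expected_i) / sqrt(2.230603e-05)
c(expected_i = expected_i, z_score = z_score)
```

Similar values sit next to each other in the toy series, so I is clearly positive. The recomputed expectation and standard deviate agree with the printed -1.870907e-04 and 104.16.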
spatial_model_sem <- errorsarlm(price/squareMeters ~ min_distance +centroid_score+ centreDistance +
restaurantDistance + clinicDistance +
postOfficeDistance+ kindergartenDistance+ pharmacyDistance,
data = properties_warsaw, listw = weights)
summary(spatial_model_sem)
##
## Call:errorsarlm(formula = price/squareMeters ~ min_distance + centroid_score +
## centreDistance + restaurantDistance + clinicDistance + postOfficeDistance +
## kindergartenDistance + pharmacyDistance, data = properties_warsaw,
## listw = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9245.44 -1925.46 -315.26 1530.37 12304.27
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 22192.79950 328.60428 67.5366 < 2.2e-16
## min_distance -0.43860 0.52597 -0.8339 0.404342
## centroid_score -97.68410 106.25118 -0.9194 0.357902
## centreDistance -578.41777 53.28382 -10.8554 < 2.2e-16
## restaurantDistance -1620.03038 373.14626 -4.3415 1.415e-05
## clinicDistance -378.72859 185.84497 -2.0379 0.041563
## postOfficeDistance 763.14683 271.98239 2.8059 0.005018
## kindergartenDistance 150.47209 362.14658 0.4155 0.677776
## pharmacyDistance 472.88425 332.68398 1.4214 0.155194
##
## Lambda: 0.73449, LR test value: 1495.4, p-value: < 2.22e-16
## Asymptotic standard error: 0.015046
## z-value: 48.816, p-value: < 2.22e-16
## Wald statistic: 2383, p-value: < 2.22e-16
##
## Log likelihood: -50280.69 for error model
## ML residual variance (sigma squared): 8237600, (sigma: 2870.1)
## Number of observations: 5346
## Number of parameters estimated: 11
## AIC: 100580, (AIC for lm: 102080)
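The Spatial Error Model (SEM) accounts for spatial dependence in the error term, which OLS ignores. A high Lambda (0.734, p < 2.2e-16) confirms strong spatial autocorrelation, and the lower AIC (100580 vs. 102080 for the linear model) indicates that SEM fits better than a standard linear model.
SEM agrees with OLS on the main drivers: centreDistance, restaurantDistance, clinicDistance, and postOfficeDistance remain significant with the same signs. However, min_distance, centroid_score, kindergartenDistance, and pharmacyDistance are no longer significant once spatial effects are accounted for.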
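The fit comparison can be reproduced from the summary output with a few lines of arithmetic. The sketch below uses AIC = 2k - 2 logLik and assumes the linear benchmark has one parameter fewer than the SEM (no lambda), with its log likelihood recovered from the reported LR statistic.

```r
loglik_sem <- -50280.69
k_sem <- 11        # number of parameters estimated, as reported
lr_stat <- 1495.4  # LR test of lambda = 0

aic_sem <- 2 * k_sem - 2 * loglik_sem

# The linear benchmark drops lambda, so it has one parameter fewer
loglik_lm <- loglik_sem - lr_stat / 2
aic_lm <- 2 * (k_sem - 1) - 2 * loglik_lm

c(aic_sem = aic_sem, aic_lm = aic_lm, difference = aic_lm - aic_sem)
```

The results, about 100583 and 102077, agree with the rounded values in the summary (100580 and 102080); the gap of roughly 1,490 AIC points favors the SEM.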
This study examined the relationship between property prices and proximity to public transport and urban amenities in Warsaw, using spatial and econometric analysis.
Understanding spatial dependencies is essential for accurate property valuation and urban planning.
Lecture Materials by Professor Andrea Caragliu: Applied Regional and Urban Economics
ArcGIS Pro Documentation: “How Spatial Autocorrelation (Global Moran’s I) works.” ArcGIS Pro - Moran’s I
Crime Mapping in R: “Chapter 9: Spatial regression models.” Crime Mapping Textbook