library(tidyverse)
library(GWmodel)
library(cluster)
library(factoextra)
library(leaflet)
library(sf)
library(sp)
library(spatialreg)
library(spdep)
library(spgwr)
Spatial Econometrics Project
INTRODUCTION
Spatial Analysis on London Airbnb Data
This dataset provides a comprehensive look at Airbnb prices in London on weekends. Each listing is described by attributes such as room type, cleanliness and guest satisfaction ratings, number of bedrooms, and distance from the city centre, which together help explain Airbnb prices.
Data Description
| Variable | Description (Type) |
|---|---|
| realSum | The total price of the Airbnb listing. (Numeric) |
| room_type | The type of room being offered (e.g. private, shared, etc.). (Categorical) |
| room_shared | Whether the room is shared or not. (Boolean) |
| room_private | Whether the room is private or not. (Boolean) |
| person_capacity | The maximum number of people that can stay in the room. (Numeric) |
| host_is_superhost | Whether the host is a superhost or not. (Boolean) |
| multi | Whether the listing is for multiple rooms or not. (Binary, coded 0/1) |
| biz | Whether the listing is for business purposes or not. (Binary, coded 0/1) |
| cleanliness_rating | The cleanliness rating of the listing. (Numeric) |
| guest_satisfaction_overall | The overall guest satisfaction rating of the listing. (Numeric) |
| bedrooms | The number of bedrooms in the listing. (Numeric) |
| dist | The distance from the city centre. (Numeric) |
| metro_dist | The distance from the nearest metro station. (Numeric) |
| lng | The longitude of the listing. (Numeric) |
| lat | The latitude of the listing. (Numeric) |
Research question
How do spatial and non-spatial factors influence Airbnb prices in a given city, and what are the implications?
DATA PREPARATION
Loading libraries
Loading Data (Shapefile and csv file)
Both the shapefile of London geographical data and the economic variables about Airbnb prices are uploaded.
london <- st_read("London_Ward_CityMerged.shp")
Reading layer `London_Ward_CityMerged' from data source
`/Users/paky/Desktop/Spatial Econometrics/Spatial Econometrics Project/London_Ward_CityMerged.shp'
using driver `ESRI Shapefile'
Simple feature collection with 625 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
Projected CRS: OSGB36 / British National Grid
airbnb <- read_csv("london_weekends.csv")
EDA
In this section the structure of the economic data is explored.
Data Structure
dim(airbnb)
[1] 5379 20
summary(airbnb)
...1 realSum room_type room_shared
Min. : 0 Min. : 54.33 Length:5379 Mode :logical
1st Qu.:1344 1st Qu.: 174.51 Class :character FALSE:5352
Median :2689 Median : 268.12 Mode :character TRUE :27
Mean :2689 Mean : 364.39
3rd Qu.:4034 3rd Qu.: 438.27
Max. :5378 Max. :12937.27
room_private person_capacity host_is_superhost multi
Mode :logical Min. :2.000 Mode :logical Min. :0.0000
FALSE:2445 1st Qu.:2.000 FALSE:4484 1st Qu.:0.0000
TRUE :2934 Median :2.000 TRUE :895 Median :0.0000
Mean :2.858 Mean :0.2798
3rd Qu.:4.000 3rd Qu.:1.0000
Max. :6.000 Max. :1.0000
biz cleanliness_rating guest_satisfaction_overall bedrooms
Min. :0.0000 Min. : 2.000 Min. : 20.00 Min. :0.000
1st Qu.:0.0000 1st Qu.: 9.000 1st Qu.: 87.00 1st Qu.:1.000
Median :0.0000 Median :10.000 Median : 94.00 Median :1.000
Mean :0.3579 Mean : 9.194 Mean : 90.92 Mean :1.133
3rd Qu.:1.0000 3rd Qu.:10.000 3rd Qu.: 99.00 3rd Qu.:1.000
Max. :1.0000 Max. :10.000 Max. :100.00 Max. :8.000
dist metro_dist attr_index attr_index_norm
Min. : 0.04056 Min. :0.01388 Min. : 68.74 Min. : 4.778
1st Qu.: 3.54568 1st Qu.:0.32404 1st Qu.: 177.22 1st Qu.: 12.320
Median : 4.93914 Median :0.53613 Median : 247.65 Median : 17.215
Mean : 5.32762 Mean :1.01653 Mean : 294.58 Mean : 20.477
3rd Qu.: 6.83807 3rd Qu.:1.09076 3rd Qu.: 361.07 3rd Qu.: 25.099
Max. :17.32120 Max. :9.17409 Max. :1438.56 Max. :100.000
rest_index rest_index_norm lng lat
Min. : 140.5 Min. : 2.515 Min. :-0.25170 Min. :51.41
1st Qu.: 382.1 1st Qu.: 6.839 1st Qu.:-0.16996 1st Qu.:51.49
Median : 527.3 Median : 9.439 Median :-0.11813 Median :51.51
Mean : 625.6 Mean : 11.197 Mean :-0.11478 Mean :51.50
3rd Qu.: 764.2 3rd Qu.: 13.678 3rd Qu.:-0.06772 3rd Qu.:51.53
Max. :5587.1 Max. :100.000 Max. : 0.12018 Max. :51.58
head(airbnb)
# A tibble: 6 × 20
...1 realSum room_type room_shared room_private person_capacity
<dbl> <dbl> <chr> <lgl> <lgl> <dbl>
1 0 121. Private room FALSE TRUE 2
2 1 196. Private room FALSE TRUE 2
3 2 193. Private room FALSE TRUE 3
4 3 180. Private room FALSE TRUE 2
5 4 406. Entire home/apt FALSE FALSE 3
6 5 354. Entire home/apt FALSE FALSE 2
# ℹ 14 more variables: host_is_superhost <lgl>, multi <dbl>, biz <dbl>,
# cleanliness_rating <dbl>, guest_satisfaction_overall <dbl>, bedrooms <dbl>,
# dist <dbl>, metro_dist <dbl>, attr_index <dbl>, attr_index_norm <dbl>,
# rest_index <dbl>, rest_index_norm <dbl>, lng <dbl>, lat <dbl>
The dataset contains 5379 listings and 20 columns, one of which (...1) is only a row index. A selection of the variables is made afterwards.
Listings on the map
ggplot(airbnb, aes(x = lng, y = lat)) +
geom_point(alpha = 0.5) +
theme_minimal()
This plot shows where Airbnb listings are concentrated and where they are sparse.
Data Manipulation
Firstly, we ensure that both datasets have the same coordinate reference system (CRS).
# Ensure both datasets have the same coordinate reference system (CRS)
london <- st_transform(london, 4326)
airbnb_spatial <- st_as_sf(airbnb, coords = c("lng", "lat"), crs=4326)
joined_data <- st_join(airbnb_spatial, london, join = st_within)
Then, a subset of variables is selected and their values are aggregated by polygon (i.e., for each area of London, the average value of each variable over the listings located in that area is taken).
# List of variables
vars <- c("realSum","person_capacity", "bedrooms", "dist", "guest_satisfaction_overall", "cleanliness_rating")
# Aggregate point data by polygon (area) to compute mean values
polygon_summary <- joined_data %>%
group_by(POLY_ID) %>%
summarise(
across(all_of(vars), ~ mean(.x, na.rm = TRUE))
) %>%
st_drop_geometry()
# Retrieve polygon geometries from the original London dataset
polygon_geometries <- london %>%
select(POLY_ID, geometry)
# Merge polygon geometries with polygon summary using left_join
final_summary <- left_join(polygon_summary, polygon_geometries, by = "POLY_ID")
# Convert final_summary to an sf object
final_summary <- st_as_sf(final_summary)
We now have a new dataset containing 223 areas of London and 6 aggregated variables for each one.
MODELING
Spatial weights
cont.sf <- poly2nb(final_summary)
spatial_weights <- nb2listw(cont.sf, style="W")
A row-standardised spatial weights matrix (style = "W") is computed from the contiguity relations between the areas.
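To make the effect of style = "W" concrete, here is a minimal base-R sketch of row standardisation. The contiguity matrix `adj` and the values in `x` are invented for illustration and are not taken from the London data.

```r
# Hypothetical binary contiguity matrix for four areas (1 = neighbours)
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 1,
                1, 1, 0, 0,
                0, 1, 0, 0),
              nrow = 4, byrow = TRUE)

# Row standardisation: divide each row by its number of neighbours,
# so the weights of every area sum to 1
w <- adj / rowSums(adj)

# Spatial lag of a toy variable = average value among each area's neighbours
x <- c(100, 200, 300, 400)
lag_x <- as.vector(w %*% x)

print(rowSums(w))
print(lag_x)
```

With row-standardised weights, the spatial lag of a variable is simply the mean of that variable over an area's neighbours, which is what makes the Moran statistic and the Rho and Lambda parameters further on easy to interpret.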
Spatial autocorrelation
Moran’s test for spatial autocorrelation is then performed:
moran.test(final_summary$realSum, spatial_weights)
Moran I test under randomisation
data: final_summary$realSum
weights: spatial_weights
Moran I statistic standard deviate = 13.104, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.521592954 -0.004504505 0.001611774
The Moran’s I test results indicate that there is a significant positive spatial autocorrelation in the price variable among the spatial areas (polygons) represented in the dataset.
This positive autocorrelation suggests that values of price tend to be similar among neighboring polygons, implying spatial clustering or patterns in the distribution of this variable across the study area.
The strong statistical significance (very low p-value) reinforces the conclusion that the observed spatial autocorrelation is unlikely to occur by random chance alone.
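As a cross-check, the standard deviate and p-value in the output can be recomputed by hand from the three sample estimates. The sketch below only reuses the numbers printed above; the value n = 223 is the number of areas in the aggregated dataset.

```r
# Values copied from the moran.test() output above
moran_i     <- 0.521592954
expectation <- -0.004504505
variance    <- 0.001611774

# Under the null hypothesis, E[I] = -1 / (n - 1); with 223 areas this is -1/222
n <- 223
expected_from_n <- -1 / (n - 1)

# Standard deviate = (observed - expected) / standard deviation
z <- (moran_i - expectation) / sqrt(variance)

# One-sided p-value for the alternative "greater"
p_value <- pnorm(z, lower.tail = FALSE)

print(c(expected_from_n = expected_from_n, z = z, p_value = p_value))
```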
Spatial Lag Model
# Define the formula for the spatial lag model
formula_lag <- realSum ~ person_capacity + bedrooms + dist + guest_satisfaction_overall + cleanliness_rating
# Fit the spatial lag model
model_lag <- lagsarlm(formula_lag, data = final_summary, listw = spatial_weights)
# View summary of the spatial lag model
summary(model_lag)
Call:
lagsarlm(formula = formula_lag, data = final_summary, listw = spatial_weights)
Residuals:
Min 1Q Median 3Q Max
-170.117 -51.598 -16.053 26.518 915.766
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 134.6892 200.8917 0.6705 0.5026
person_capacity 123.5157 20.2611 6.0962 1.086e-09
bedrooms 37.5831 31.9576 1.1760 0.2396
dist -15.4734 3.2313 -4.7886 1.680e-06
guest_satisfaction_overall -3.8703 3.1624 -1.2238 0.2210
cleanliness_rating 16.8386 30.6019 0.5502 0.5821
Rho: 0.2998, LR test value: 13.459, p-value: 0.00024381
Asymptotic standard error: 0.083576
z-value: 3.5872, p-value: 0.00033427
Wald statistic: 12.868, p-value: 0.00033427
Log likelihood: -1348.753 for lag model
ML residual variance (sigma squared): 10309, (sigma: 101.53)
Number of observations: 223
Number of parameters estimated: 8
AIC: 2713.5, (AIC for lm: 2725)
LM test for residual autocorrelation
test value: 2.1082, p-value: 0.14652
Interpretation:
Coefficients:
Intercept: The estimated intercept is 134.6892, which represents the expected value of price when all other predictor variables are zero.
person_capacity: The coefficient is 123.5157 and highly significant (p-value < 0.001): areas whose listings accommodate more guests have higher prices, holding other variables constant. Because the model contains a spatial lag of price, this coefficient is the effect before feedback through neighboring areas, so the total effect of a unit increase is larger than 123.5157.
bedrooms: The coefficient for bedrooms is 37.5831, suggesting that an increase in the number of bedrooms is associated with an increase in price, although the p-value (0.2396) indicates that this relationship is not statistically significant at conventional levels.
dist: The coefficient is -15.4734 and statistically significant (p-value < 0.001), indicating that areas farther from the city centre tend to have lower prices. As with person_capacity, the total effect including spatial feedback is larger in magnitude than the coefficient itself.
guest_satisfaction_overall and cleanliness_rating: These coefficients are not statistically significant (p-values > 0.05), so there is insufficient evidence to conclude that these variables have a linear relationship with price.
Spatial Autocorrelation:
- Rho: The spatial autoregressive parameter (rho) is estimated to be 0.2998 and is statistically significant (LR test p-value 0.00024). This indicates positive spatial dependence: realSum in an area tends to be higher when realSum in neighboring areas is higher.
Model Fit:
- AIC: The Akaike Information Criterion (AIC) for the lag model is 2713.5, which is lower than the AIC for a standard linear regression (lm), indicating that the spatial lag model provides a better fit.
Residual Autocorrelation Test:
- LM Test for Residual Autocorrelation: The LM test statistic (2.1082) with a p-value of 0.14652 tests for autocorrelation remaining in the residuals. Since the p-value is above 0.05, there is no significant evidence of residual autocorrelation, which suggests that the spatial lag term absorbs most of the spatial dependence in price.
Conclusions:
The spatial lag model reveals significant relationships between the price and the variables person_capacity and dist, while also detecting positive spatial autocorrelation, which suggests that nearby observations are more similar than those farther apart. This model provides valuable insights into the spatial dependency of the price variable and the role of different predictors in explaining variations in this variable.
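In a spatial lag model the coefficients are not the full marginal effects, because a change in one area feeds back through its neighbours. With row-standardised weights, the average total impact of a predictor is its coefficient divided by (1 - rho). The sketch below applies this to the estimates reported above; it is a hand calculation, not output from the fitted model.

```r
# Estimates copied from the spatial lag model summary above
rho  <- 0.2998
beta <- c(person_capacity = 123.5157, dist = -15.4734)

# With row-standardised weights, the average total impact of a predictor
# is its coefficient scaled by the spatial multiplier 1 / (1 - rho)
multiplier   <- 1 / (1 - rho)
total_impact <- beta * multiplier

print(round(multiplier, 3))
print(round(total_impact, 2))
```

When the fitted object is available, `impacts(model_lag, listw = spatial_weights)` from spatialreg reports the direct, indirect and total impacts, and should be preferred to this approximation.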
Spatial Error Model
Since we now know that there is spatial autocorrelation, a spatial error model is used to account for it.
# Define the formula for the spatial error model
formula_error <- realSum ~ person_capacity + bedrooms + dist + guest_satisfaction_overall + cleanliness_rating
# Fit the spatial error model using errorsarlm
model_error <- errorsarlm(formula_error, data = final_summary, listw = spatial_weights)
# Summarize the model results
summary(model_error)
Call:
errorsarlm(formula = formula_error, data = final_summary, listw = spatial_weights)
Residuals:
Min 1Q Median 3Q Max
-161.435 -50.154 -16.422 27.628 909.328
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 250.3207 207.3326 1.2073 0.2273
person_capacity 124.4784 21.2092 5.8691 4.383e-09
bedrooms 46.4639 31.7415 1.4638 0.1432
dist -23.8709 3.4822 -6.8551 7.127e-12
guest_satisfaction_overall -2.5608 3.2461 -0.7889 0.4302
cleanliness_rating 6.0222 31.0584 0.1939 0.8463
Lambda: 0.32482, LR test value: 10.347, p-value: 0.0012967
Asymptotic standard error: 0.091854
z-value: 3.5363, p-value: 0.00040582
Wald statistic: 12.505, p-value: 0.00040582
Log likelihood: -1350.309 for error model
ML residual variance (sigma squared): 10419, (sigma: 102.07)
Number of observations: 223
Number of parameters estimated: 8
AIC: 2716.6, (AIC for lm: 2725)
The spatial error model gives a similar picture: person_capacity and dist are again the only statistically significant predictors, and Lambda (0.32482) is significant (LR test p-value 0.0013), confirming spatial dependence in the error term. Its AIC (2716.6) is lower than that of the linear model (2725) but slightly higher than that of the spatial lag model (2713.5).
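The AIC values of the lag and error models can be reproduced from the reported log-likelihoods and parameter counts, which makes the comparison between the two explicit. The sketch below only reuses figures printed in the two summaries.

```r
# Log-likelihoods and parameter counts copied from the two model summaries
log_lik  <- c(lag = -1348.753, error = -1350.309)
n_params <- c(lag = 8, error = 8)

# AIC = 2k - 2 * logLik
aic <- 2 * n_params - 2 * log_lik
print(round(aic, 1))

# Difference in AIC; a positive value favours the lag model
delta <- aic[["error"]] - aic[["lag"]]
print(round(delta, 2))
```

Both models estimate the same number of parameters, so the AIC gap comes entirely from the difference in log-likelihood.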
Geographically Weighted Regression
Data preparation
Firstly, centroids of each London area are created.
crds.sf<-st_centroid(final_summary$geometry)
crds<-st_coordinates(crds.sf)
Then, the formula for the regression is created:
formula_gwr <- realSum ~ person_capacity + bedrooms + dist + guest_satisfaction_overall + cleanliness_rating
Lastly, the optimal bandwidth for the kernel is computed:
bw<-ggwr.sel(formula_gwr, data=final_summary, coords=crds, family=poisson(), longlat=TRUE)
Bandwidth: 11.53587 CV score: 2384555
Bandwidth: 18.64679 CV score: 2394005
Bandwidth: 7.141076 CV score: 2360428
Bandwidth: 4.424945 CV score: 2321810
Bandwidth: 2.746284 CV score: 2277530
Bandwidth: 1.708814 CV score: 2310278
Bandwidth: 2.967365 CV score: 2285530
Bandwidth: 2.520723 CV score: 2270120
Bandwidth: 2.210601 CV score: 2265873
Bandwidth: 2.174159 CV score: 2266271
Bandwidth: 2.269214 CV score: 2265744
Bandwidth: 2.251943 CV score: 2265722
Bandwidth: 2.253018 CV score: 2265722
Bandwidth: 2.252651 CV score: 2265722
Bandwidth: 2.252611 CV score: 2265722
Bandwidth: 2.252692 CV score: 2265722
Bandwidth: 2.252651 CV score: 2265722
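To give a sense of what the selected bandwidth means, the sketch below evaluates a Gaussian kernel at a few distances. It assumes the usual Gaussian form exp(-0.5 (d / bandwidth)^2), which is my understanding of what gwr.Gauss computes; the function `gauss_weight` is a hypothetical helper written for this illustration.

```r
# Bandwidth selected above, in kilometres (longlat = TRUE uses great-circle km)
bw <- 2.252651

# Gaussian kernel: the weight decays smoothly with distance d
# from the regression point
gauss_weight <- function(d, bandwidth) exp(-0.5 * (d / bandwidth)^2)

# Weights of observations at 0 km, one bandwidth, two bandwidths and 10 km
d <- c(0, bw, 2 * bw, 10)
w <- gauss_weight(d, bw)

print(round(w, 4))
```

Under this kernel an area about 2.25 km away still receives roughly 60% of the weight of the regression point itself, while areas 10 km away contribute almost nothing to the local fit.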
Model
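Finally, a Generalized Geographically Weighted Regression is run with the selected bandwidth. Note that the bandwidth was selected with family = poisson(), whereas the call below does not set family and therefore uses the ggwr default (gaussian):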
# Compute GGWR model with bandwidth selection
ggwr_model <- ggwr(formula_gwr, data = final_summary, longlat = TRUE, coords = crds, bandwidth = bw)
# Summary of GGWR model
ggwr_model
Call:
ggwr(formula = formula_gwr, data = final_summary, coords = crds,
bandwidth = bw, longlat = TRUE)
Kernel function: gwr.Gauss
Fixed bandwidth: 2.252651
Summary of GWR coefficient estimates at data points:
Min. 1st Qu. Median 3rd Qu.
X.Intercept. -807.27858 -20.86557 280.29804 484.31534
person_capacity 17.98834 81.95564 115.96462 158.79685
bedrooms -118.96642 -6.56119 38.20485 130.28437
dist -58.32765 -45.52561 -32.67451 -22.02527
guest_satisfaction_overall -26.45223 -6.34695 -3.20808 0.50117
cleanliness_rating -235.73410 -26.11439 17.71770 71.31600
Max. Global
X.Intercept. 976.63962 343.6122
person_capacity 225.52532 147.3500
bedrooms 310.90045 23.0610
dist -6.03762 -22.5523
guest_satisfaction_overall 14.05380 -5.7395
cleanliness_rating 267.55642 22.8392
GWR coefficient estimates help to understand how the relationships between variables differ across space, providing insights into local variations that may not be captured by a traditional global regression model. They highlight the spatial heterogeneity in the studied relationships and can be used as a guide to more targeted and context-specific interpretations.
For instance, person_capacity has a median coefficient estimate of about 116, an interquartile range from 82 to 159, and an overall range from 18 to 226 across locations. The effect of person_capacity on price therefore varies substantially with the geographic context: it is positive everywhere, but much stronger in some areas than in others.
Visualization
We can see this graphically:
plots_data <- final_summary
# One map per predictor, filled by its local GWR coefficient
plots_data$GWR.person_capacity<-ggwr_model$SDF$person_capacity
ggplot()+geom_sf(data=plots_data, aes(fill=GWR.person_capacity))
plots_data$GWR.bedrooms<-ggwr_model$SDF$bedrooms
ggplot()+geom_sf(data=plots_data, aes(fill=GWR.bedrooms))
plots_data$GWR.dist<-ggwr_model$SDF$dist
ggplot()+geom_sf(data=plots_data, aes(fill=GWR.dist))
plots_data$GWR.guest_satisfaction_overall<-ggwr_model$SDF$guest_satisfaction_overall
ggplot()+geom_sf(data=plots_data, aes(fill=GWR.guest_satisfaction_overall))
plots_data$GWR.cleanliness_rating<-ggwr_model$SDF$cleanliness_rating
ggplot()+geom_sf(data=plots_data, aes(fill=GWR.cleanliness_rating))
In each map, the fill colour shows the local coefficient of one predictor. With the default ggplot2 continuous scale, darker colours correspond to lower coefficient values and brighter colours to higher ones, so for a predictor with negative coefficients such as dist the darkest areas are those where the effect is strongest.
Areas where a coefficient is far from zero are those where the predictor is most strongly associated with price, and these could be candidates for further investigation. Areas where it is close to zero are those where the predictor matters less or where other unmodeled factors might be more influential.
Boxplot
boxplot(as.data.frame(ggwr_model$SDF)[,3:7])
abline(h=0, lty=3, lwd=2, col="red")
This boxplot shows the distribution and variability of the GWR coefficient estimates for each predictor variable, with the dotted red line marking zero. It reveals how consistently each predictor affects the response across space.
For example, if a predictor shows coefficients of the same sign across all or most locations, the direction of its relationship with price is stable over space even though its strength varies. This is the case for person_capacity (positive everywhere) and dist (negative everywhere), whereas bedrooms is positive in most areas but negative in at least a quarter of them.
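The sign stability that the boxplot shows visually can also be summarised numerically as the share of areas with a positive coefficient. The sketch below uses a small invented data frame, `local_coefs`, in place of the real GWR output.

```r
# Hypothetical local coefficients for five areas (illustrative values only)
local_coefs <- data.frame(
  person_capacity = c(120, 95, 160, 40, 210),
  bedrooms        = c(35, -10, 130, -60, 5),
  dist            = c(-30, -45, -22, -8, -55)
)

# Share of areas where each coefficient is positive
share_positive <- colMeans(local_coefs > 0)
print(share_positive)

# A predictor has a stable sign if that share is exactly 0 or 1
stable_sign <- share_positive %in% c(0, 1)
names(stable_sign) <- names(share_positive)
print(stable_sign)
```

On the fitted model, the same two lines could be applied to `as.data.frame(ggwr_model$SDF)[, 3:7]`, the columns already used for the boxplot.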
CLUSTERING
We are now going to perform clustering of the different areas:
Optimal number of clusters
fviz_nbclust(as.data.frame(ggwr_model$SDF[,3:7]), FUNcluster=kmeans)
- As we can see from the graph, the best number of clusters according to the Silhouette index is 3, although 2 or 4 would also be acceptable.
K-Means
K-Means clustering algorithm is then performed by selecting 3 clusters:
km3c <- eclust(as.data.frame(ggwr_model$SDF[,3:7]), "kmeans", k=3)
# Store the cluster labels as a factor so the map uses a discrete colour scale
plots_data$clust3 <- factor(km3c$cluster)
ggplot() + geom_sf(data=plots_data, aes(fill=clust3))
- The clustering analysis applied to GWR coefficients helps in identifying spatial groupings of locations with similar predictor-response relationships. It facilitates the exploration of spatial patterns, differentiation between areas, and identification of localized trends and variations in the study area.
- As we can see from the cluster plot, around 60% of the total variability is explained, which is acceptable for an exploratory grouping.
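Two practical points are worth keeping in mind with k-means on local coefficients: the coefficients sit on very different scales, and the algorithm starts from random centres, so cluster labels can change between runs. The sketch below shows one possible way to handle both, using invented data; standardising with scale() and fixing a seed are suggestions of this sketch, not steps taken in the analysis above.

```r
# Hypothetical local coefficients on very different scales (illustrative only)
set.seed(42)  # k-means starts from random centres; fix the seed for reproducibility
coefs <- data.frame(
  dist               = c(rnorm(10, -50, 3), rnorm(10, -30, 3), rnorm(10, -10, 3)),
  cleanliness_rating = c(rnorm(10, -200, 30), rnorm(10, 0, 30), rnorm(10, 200, 30))
)

# Standardise each column so no single coefficient dominates the distances
coefs_scaled <- scale(coefs)

# k-means with several random starts; the best solution is kept
km <- kmeans(coefs_scaled, centers = 3, nstart = 25)

print(table(km$cluster))
```

Because the cluster numbers themselves are arbitrary, any dummy variables built from them should be interpreted relative to whichever cluster is left out as the reference.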
MODELING II
Let’s now create dummy variables representing clusters and add them to the model:
# One 0/1 dummy per cluster; only clust1 and clust2 enter the model,
# so cluster 3 acts as the reference category
final_summary$clust1 <- as.numeric(km3c$cluster == 1)
final_summary$clust2 <- as.numeric(km3c$cluster == 2)
final_summary$clust3 <- as.numeric(km3c$cluster == 3)
Spatial Error Model
By adding the dummy variables to the model, we are controlling for spatial drift.
new_eq <- realSum ~ person_capacity + bedrooms + dist + guest_satisfaction_overall + cleanliness_rating + clust1 + clust2
model.sem<-errorsarlm(new_eq, data=final_summary, listw=spatial_weights)
summary(model.sem)
Call:
errorsarlm(formula = new_eq, data = final_summary, listw = spatial_weights)
Residuals:
Min 1Q Median 3Q Max
-165.469 -50.351 -16.813 24.876 921.326
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 251.5349 207.0747 1.2147 0.2245
person_capacity 122.9442 21.3243 5.7655 8.143e-09
bedrooms 45.6809 31.7169 1.4403 0.1498
dist -24.3186 3.8481 -6.3197 2.621e-10
guest_satisfaction_overall -2.7588 3.2421 -0.8509 0.3948
cleanliness_rating 6.6765 31.0178 0.2152 0.8296
clust1 24.0430 24.5375 0.9798 0.3272
clust2 20.4912 28.8552 0.7101 0.4776
Lambda: 0.31442, LR test value: 9.6431, p-value: 0.0019007
Asymptotic standard error: 0.092549
z-value: 3.3973, p-value: 0.00068054
Wald statistic: 11.542, p-value: 0.00068054
Log likelihood: -1349.822 for error model
ML residual variance (sigma squared): 10388, (sigma: 101.92)
Number of observations: 223
Number of parameters estimated: 10
AIC: 2719.6, (AIC for lm: 2727.3)
The significant Lambda (0.31442, LR test p-value 0.0019) indicates that spatial autocorrelation remains in the error term even after accounting for the specified predictors and the cluster dummies (clust1, clust2).
The estimated coefficients show how each variable influences price in the presence of spatial effects. person_capacity and dist again have significant effects on price, whereas the coefficients of the cluster dummies are not statistically significant (p-values 0.3272 and 0.4776), so the clusters capture little additional spatial variability in price.
In summary, adding the cluster dummies to the spatial error model does not improve it: the AIC rises from 2716.6 to 2719.6. The comparison with the corresponding linear model (AIC 2727.3) nevertheless confirms the importance of modeling spatial effects when analyzing geographic data.
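Whether the two cluster dummies jointly add anything to the spatial error model can be checked with a likelihood-ratio test between the two nested fits. The sketch below is a hand calculation from the log-likelihoods reported in the two spatial error model summaries.

```r
# Log-likelihoods copied from the two spatial error model summaries
log_lik_base    <- -1350.309  # without cluster dummies (8 parameters)
log_lik_dummies <- -1349.822  # with clust1 and clust2 (10 parameters)

# Likelihood-ratio statistic and its chi-squared p-value (2 extra parameters)
lr_stat <- 2 * (log_lik_dummies - log_lik_base)
p_value <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

print(round(c(lr_stat = lr_stat, p_value = p_value), 4))
```

A p-value well above 0.05 means the dummies do not significantly improve the likelihood, in line with their individual z-tests.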
OLS Model with dummies
Lastly, a linear model that includes the cluster dummies is fitted for comparison:
ols_model<-lm(new_eq, data=final_summary)
summary(ols_model)
Call:
lm(formula = new_eq, data = final_summary)
Residuals:
Min 1Q Median 3Q Max
-130.31 -56.64 -24.54 30.81 906.01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 351.482 207.493 1.694 0.0917 .
person_capacity 142.240 21.266 6.689 1.92e-10 ***
bedrooms 23.112 33.732 0.685 0.4940
dist -23.125 3.059 -7.559 1.16e-12 ***
guest_satisfaction_overall -6.038 3.344 -1.806 0.0724 .
cleanliness_rating 24.680 32.309 0.764 0.4458
clust1 25.397 20.360 1.247 0.2136
clust2 22.569 24.500 0.921 0.3580
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 107.1 on 215 degrees of freedom
Multiple R-squared: 0.5243, Adjusted R-squared: 0.5088
F-statistic: 33.85 on 7 and 215 DF, p-value: < 2.2e-16
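The adjusted R-squared and the F-statistic in the output above both follow from the multiple R-squared, the number of observations and the number of predictors. The sketch below recomputes them from the printed (rounded) values, so small rounding differences are expected.

```r
# Values copied from the lm() summary above
r2 <- 0.5243
n  <- 223   # observations
k  <- 7     # predictors, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic on k and n - k - 1 degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
```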
CONCLUSIONS
Modeling
According to the R-squared (0.5243; adjusted 0.5088), the linear model explains only about half of the variability in the data.
Among the models with a reported AIC, the spatial lag model fits best (2713.5), followed by the spatial error model (2716.6), and both improve on the corresponding linear model (2725). Adding the cluster dummies did not improve the fit: the spatial error model with dummies has an AIC of 2719.6. No AIC is reported for the Geographically Weighted Regression, so it cannot be ranked on this criterion, but it reveals substantial local variation in the coefficients.
In short, accounting for spatial information is essential for sound analysis and conclusions on this data.
Answers to the research question
The research conducted in this project reveals that both spatial and non-spatial factors significantly influence Airbnb prices. The key findings can be summarized as follows:
Influence of Location (Spatial Factors):
Distance from the City Centre: Areas located closer to the city centre generally command higher prices. This is evidenced by the significant negative coefficient for the distance variable (dist) in the spatial lag and spatial error models, and by the negative local coefficients throughout the geographically weighted regression (GWR).
Spatial Autocorrelation: The presence of positive spatial autocorrelation indicates that Airbnb prices are not randomly distributed but are spatially clustered. This means that high-priced properties are often located near other high-priced properties, and the same holds for low-priced properties. This was confirmed by the Moran’s I statistic and the results of spatial lag and error models.
Property Characteristics (Non-Spatial Factors):
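Capacity and Size: Areas whose listings have a higher guest capacity have higher prices; the coefficient for person_capacity is positive and significant in all models. The coefficient for bedrooms is also positive, but it is not statistically significant once capacity is controlled for.
Quality Ratings: Neither guest satisfaction nor cleanliness ratings have a statistically significant effect on prices in any model. The estimated coefficient is negative for guest satisfaction and positive for cleanliness.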
Local Variations and Clustering
GWR Analysis: The GWR model highlights that the influence of these factors varies across different areas of the city. For instance, the impact of distance from the city centre on price is more pronounced in some neighborhoods than in others, with local coefficients ranging from about -58 to -6. This local variation can be crucial for hosts aiming to optimize pricing based on their specific location.
Cluster Analysis: The clustering analysis revealed distinct groups of areas with similar local relationships between the predictors and price. These clusters can help hosts understand the market segment of their area, although the cluster dummies did not significantly improve the price models.