#In this spatial regression assignment, you will design a regression model based on either the Chicago Airbnb or Southern US datasets that are posted. Your regression model must have at least three variables (one dependent variable, and two independent variables). In your submission, please include:
#1) The research question you answer with your model (e.g. What is the impact of education on inequality?)
#2) A correctly interpreted Moran's I test on your original linear model (not the spatial version) - is there evidence of significant spatial autocorrelation in the residuals?
#3) A spatial regression model (error or lag, don't worry about GW regression!), and justification for why you chose it (I'd love to see you run the Lagrange Multiplier test! No need to get too technical, just tell me how you used p-values to determine the right test to use))
#4) A brief discussion of how you would interpret the results of your model - what variables are statistically significant? (e.g. Education had a negative and statistically significant impact on inequality)
#Submit everything as an Rpubs file! If you are unable to do so, let me know. No need to get too technical with your explanations. My goal is to make sure that you have a general conceptual understanding of the unit.
#1) Research Question: How does the conjunction of lack of high school education and unemployment affect poverty rates?
#Load in Packages
library(sf)
## Warning: package 'sf' was built under R version 4.3.2
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(spData)
## Warning: package 'spData' was built under R version 4.3.2
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(ggplot2)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.3.2
library(spdep)
## Warning: package 'spdep' was built under R version 4.3.3
library(spatialreg)
## Warning: package 'spatialreg' was built under R version 4.3.3
## Loading required package: Matrix
##
## Attaching package: 'spatialreg'
## The following objects are masked from 'package:spdep':
##
## get.ClusterOption, get.coresOption, get.mcOption,
## get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
## set.coresOption, set.mcOption, set.VerboseOption,
## set.ZeroPolicyOption
library(GWmodel)
## Warning: package 'GWmodel' was built under R version 4.3.3
## Loading required package: robustbase
## Warning: package 'robustbase' was built under R version 4.3.3
## Loading required package: sp
## Warning: package 'sp' was built under R version 4.3.2
## Loading required package: Rcpp
## Welcome to GWmodel version 2.3-2.
library(tidyr)
##
## Attaching package: 'tidyr'
## The following objects are masked from 'package:Matrix':
##
## expand, pack, unpack
#Load in Data
setwd("C:/Users/kajsa/OneDrive/Documents/Binghamton/Senior - Spring/DIDA 370/DIDA 370 - Topic 11/chicago_airbnb")
chi <- st_read("airbnb_Chicago 2015.shp")
## Reading layer `airbnb_Chicago 2015' from data source
## `C:\Users\kajsa\OneDrive\Documents\Binghamton\Senior - Spring\DIDA 370\DIDA 370 - Topic 11\chicago_airbnb\airbnb_Chicago 2015.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 77 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## Geodetic CRS: WGS 84
chi
## Simple feature collection with 77 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## Geodetic CRS: WGS 84
## First 10 features:
## community shape_area shape_len AREAID response_r accept_r
## 1 DOUGLAS 46004621.1581 31027.0545098 35 98.77143 94.51429
## 2 OAKLAND 16913961.0408 19565.5061533 36 99.20000 90.10526
## 3 FULLER PARK 19916704.8692 <NA> 37 68.00000 NA
## 4 GRAND BOULEVARD 48492503.1554 28196.8371573 38 94.03704 83.61539
## 5 KENWOOD 29071741.9283 23325.1679062 39 92.54286 88.14286
## 6 LINCOLN SQUARE 71352328.2399 36624.6030848 4 90.00990 87.89691
## 7 WASHINGTON PARK 42373881.4842 28175.3160866 40 100.00000 97.33333
## 8 HYDE PARK 45105380.1732 29746.7082016 41 92.49505 89.93000
## 9 WOODLAWN 57815179.512 46936.9592443 42 91.09091 96.90000
## 10 ROGERS PARK 51259902.4506 34052.3975757 1 95.82500 88.81034
## rev_rating price_pp room_type num_spots poverty crowded dependency
## 1 87.77778 78.15789 1.789474 38 29.6 1.8 30.7
## 2 88.81250 53.77500 1.850000 20 39.7 1.3 40.4
## 3 91.75000 84.00000 1.833333 6 51.2 3.2 44.9
## 4 92.75000 119.53333 1.533333 30 29.3 3.3 39.5
## 5 90.65625 77.99145 1.615385 39 21.7 2.4 35.4
## 6 95.24176 80.18826 1.436364 110 10.9 3.4 25.5
## 7 94.33333 115.75000 1.750000 4 42.1 5.6 42.8
## 8 90.51685 79.31968 1.587156 109 18.4 1.5 26.2
## 9 93.38889 71.45833 1.958333 24 30.7 2.9 36.1
## 10 93.99020 58.67388 1.645669 127 23.6 7.7 27.5
## without_hs unemployed income_pc harship_in num_crimes num_theft population
## 1 14.3 18.2 23791 47 5013 1241 18238
## 2 18.4 28.7 19252 78 1306 311 5918
## 3 26.6 33.9 10432 97 1764 383 2876
## 4 15.9 24.3 23472 57 6416 1428 21929
## 5 11.3 15.7 35911 26 2713 654 17841
## 6 13.4 8.2 37524 17 3670 1014 39493
## 7 25.4 28.6 13785 88 5433 823 11717
## 8 4.3 8.4 39056 14 3031 996 25681
## 9 16.5 23.4 18672 58 7585 1175 25983
## 10 18.2 8.7 23939 39 7181 1729 54991
## geometry
## 1 MULTIPOLYGON (((-87.60914 4...
## 2 MULTIPOLYGON (((-87.59215 4...
## 3 MULTIPOLYGON (((-87.6288 41...
## 4 MULTIPOLYGON (((-87.60671 4...
## 5 MULTIPOLYGON (((-87.59215 4...
## 6 MULTIPOLYGON (((-87.67441 4...
## 7 MULTIPOLYGON (((-87.60604 4...
## 8 MULTIPOLYGON (((-87.58038 4...
## 9 MULTIPOLYGON (((-87.57714 4...
## 10 MULTIPOLYGON (((-87.65456 4...
#Plot the Data
ggplot()+
geom_sf(data = chi, aes(fill=as.numeric(poverty)))+
scale_fill_steps(
name = "Poverty",
low = "lightsteelblue1",
high = "tomato1",
n.breaks = 5,
show.limits = T)+
theme_void()

#create our variables for the model:
model_2 <- chi %>%
select(c(poverty, unemployed, without_hs, AREAID, geometry)) %>%
filter(!poverty %in% "NA")
#Run a basic model
summary(lm2 <- lm(poverty ~ unemployed + without_hs, data = model_2))
##
## Call:
## lm(formula = poverty ~ unemployed + without_hs, data = model_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.0642 -5.6940 -0.2384 4.8156 15.6825
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.12939 1.95078 0.579 0.5644
## unemployed 1.13589 0.11044 10.285 6.51e-16 ***
## without_hs 0.15609 0.07047 2.215 0.0298 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.789 on 74 degrees of freedom
## Multiple R-squared: 0.6625, Adjusted R-squared: 0.6533
## F-statistic: 72.62 on 2 and 74 DF, p-value: < 2.2e-16
#2) Moran's I test on your original linear model - Is there evidence of significant spatial autocorrelation in the residuals?
#Let's test for spatial autocorrelation in the residuals - create the weights
chi_list <- model_2 %>%
poly2nb(st_geometry(model_2)) %>%
nb2listw(zero.policy = TRUE)
#Run the Moran's I test for regression residuals
lm.morantest(lm2, chi_list)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = poverty ~ unemployed + without_hs, data = model_2)
## weights: chi_list
##
## Moran I statistic standard deviate = 5.692, p-value = 6.279e-09
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.371038481 -0.028342012 0.004923201
#Because our Moran's I value is positive, and the p-value is less than 0.05, we can conclude that there is
#significant positive spatial autocorrelation in the residuals.
#3) A spatial regression model (error or lag, don't worry about GW regression!), and justification for why you chose it (I'd love to see you run the Lagrange Multiplier test! No need to get too technical, just tell me how you used p-values to determine the right test to use))
# Because there was significant dependency in the residuals based on the Moran's I test from the previous code chunk,
# I have decided to move forward with a Spatial Error Regression Model
lm2_error <- errorsarlm(poverty ~ unemployed + without_hs,
data = model_2,
listw = chi_list,
zero.policy = TRUE,
na.action = na.omit)
summary(lm2_error)
##
## Call:errorsarlm(formula = poverty ~ unemployed + without_hs, data = model_2,
## listw = chi_list, na.action = na.omit, zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.27860 -4.23530 -0.57197 2.76407 12.89043
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.217800 2.658946 -0.0819 0.9347
## unemployed 1.042240 0.125797 8.2851 2.220e-16
## without_hs 0.307945 0.077923 3.9519 7.752e-05
##
## Lambda: 0.65353, LR test value: 22.55, p-value: 2.0476e-06
## Asymptotic standard error: 0.10078
## z-value: 6.4845, p-value: 8.9038e-11
## Wald statistic: 42.048, p-value: 8.9038e-11
##
## Log likelihood: -243.9278 for error model
## ML residual variance (sigma squared): 29.57, (sigma: 5.4378)
## Number of observations: 77
## Number of parameters estimated: 5
## AIC: 497.86, (AIC for lm: 518.41)
# The error's model AIC is lower than the orginal linear model's AIC which suggests an improvement in the model!
seResiduals2 <- rep(0, length(model_2$poverty))
resIndex2 <- lm2_error$residuals %>% names() %>% as.integer();
seResiduals2[resIndex2] <- lm2_error$residuals
#perform the Moran Test
chi_list %>%
moran.test(seResiduals2, ., zero.policy = TRUE)
##
## Moran I test under randomisation
##
## data: seResiduals2
## weights: .
##
## Moran I statistic standard deviate = -0.3115, p-value = 0.6223
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.035587486 -0.013157895 0.005184602
#Now, we can see that our Moran's I test is insignificant as indicated by the p-value greater than 0.05 - this tells us that we effectively controlled for the spatial variation in the error term.
#4) Through our statistical testing, we can see that an increase of unemployment and individuals without hs degrees we see an increase in poverty rates.