Spatial-Regression---Assignment-4.knit

Name: Lyndsay Cuff DIDA 370 - Assignment 4

#load in all my packages
library(sf)

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(spData)

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

library(ggplot2)
library(ggthemes)
library(spdep)
library(spatialreg)

## Loading required package: Matrix

## 
## Attaching package: 'spatialreg'

## The following objects are masked from 'package:spdep':
## 
##     get.ClusterOption, get.coresOption, get.mcOption,
##     get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
##     set.coresOption, set.mcOption, set.VerboseOption,
##     set.ZeroPolicyOption

library(GWmodel)

## Loading required package: robustbase

## Loading required package: sp

## Loading required package: Rcpp

## Welcome to GWmodel version 2.3-2.

library(tidyr)

## 
## Attaching package: 'tidyr'

## The following objects are masked from 'package:Matrix':
## 
##     expand, pack, unpack

#RESEARCH QUESTION: Does the hardship index, number of crimes, and percentage of individuals without high school education, significantly affect the change in unemployment within Chicago AirBnB data?

#load in the data
setwd("/Users/lyndsaycuff/DIDA 370/airbnb")
chicago <- read_sf("airbnb_Chicago 2015.shp")

#simple regression model to start

summary(lm1 <- lm( unemployed ~ unemployed + harship_in + num_crimes + without_hs, data = chicago))

## Warning in model.matrix.default(mt, mf, contrasts): the response appeared on
## the right-hand side and was dropped

## Warning in model.matrix.default(mt, mf, contrasts): problem with term 1 in
## model.matrix: no columns are assigned

## 
## Call:
## lm(formula = unemployed ~ unemployed + harship_in + num_crimes + 
##     without_hs, data = chicago)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2956 -1.8252  0.0558  2.1518  8.7022 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.024e+00  7.921e-01   8.867 3.31e-13 ***
## harship_in   3.741e-01  2.071e-02  18.060  < 2e-16 ***
## num_crimes   5.310e-06  6.032e-05   0.088     0.93    
## without_hs  -5.018e-01  4.958e-02 -10.121 1.53e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.995 on 73 degrees of freedom
## Multiple R-squared:  0.8486, Adjusted R-squared:  0.8424 
## F-statistic: 136.4 on 3 and 73 DF,  p-value: < 2.2e-16

#Let's test for spatial autocorrelation in the residuals!
#Create the weights first
chicago_list <- chicago %>% 
  poly2nb(st_geometry(chicago)) %>% 
  nb2listw(zero.policy = TRUE) 

#run the Moran's I test for regression residuals
lm.morantest(lm1, chicago_list)

## Warning in model.matrix.default(terms(model), model.frame(model)): the response
## appeared on the right-hand side and was dropped

## Warning in model.matrix.default(terms(model), model.frame(model)): problem with
## term 1 in model.matrix: no columns are assigned

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## weights: chicago_list
## 
## Moran I statistic standard deviate = 1.4754, p-value = 0.07005
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.068772397     -0.033672615      0.004821243

#GENERAL MORAN'S I TEST NOTES
#calculating how clustered the variable is across space

#checked the spatial error model - lm_error
#Moran's I on the error - specifically testing in the error term 
#is the error term correlated across space?
#want the error to be random ideally
lm_error <- errorsarlm(unemployed ~ unemployed + harship_in + num_crimes + without_hs,
              data = chicago,
              listw = chicago_list,
              zero.policy = TRUE, 
              na.action = na.omit)

## Warning in model.matrix.default(mt, mf): the response appeared on the
## right-hand side and was dropped

## Warning in model.matrix.default(mt, mf): problem with term 1 in model.matrix:
## no columns are assigned

summary(lm_error)

## 
## Call:errorsarlm(formula = unemployed ~ unemployed + harship_in + num_crimes + 
##     without_hs, data = chicago, listw = chicago_list, na.action = na.omit, 
##     zero.policy = TRUE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8.29031 -1.95061  0.12897  1.98196  8.72179 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value Pr(>|z|)
## (Intercept)  7.1103e+00  8.5477e-01  8.3184   <2e-16
## harship_in   3.6582e-01  2.2205e-02 16.4743   <2e-16
## num_crimes   2.0412e-05  6.1780e-05  0.3304   0.7411
## without_hs  -4.9218e-01  5.3272e-02 -9.2389   <2e-16
## 
## Lambda: 0.18754, LR test value: 1.0091, p-value: 0.31511
## Asymptotic standard error: 0.16167
##     z-value: 1.1601, p-value: 0.24602
## Wald statistic: 1.3458, p-value: 0.24602
## 
## Log likelihood: -191.1616 for error model
## ML residual variance (sigma squared): 8.3313, (sigma: 2.8864)
## Number of observations: 77 
## Number of parameters estimated: 6 
## AIC: 394.32, (AIC for lm: 393.33)

#not significant...weemp womp

#spatial lag
lm_lag <- lagsarlm(unemployed ~ unemployed + harship_in + num_crimes + without_hs,
              data = chicago,
              listw = chicago_list,
              zero.policy = TRUE, 
              na.action = na.omit)

## Warning in model.matrix.default(mt, mf): the response appeared on the
## right-hand side and was dropped

## Warning in model.matrix.default(mt, mf): problem with term 1 in model.matrix:
## no columns are assigned

summary(lm_lag)

## 
## Call:lagsarlm(formula = unemployed ~ unemployed + harship_in + num_crimes + 
##     without_hs, data = chicago, listw = chicago_list, na.action = na.omit, 
##     zero.policy = TRUE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7.77334 -1.76733  0.40202  1.91087  8.59803 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)  4.5771e+00  1.1495e+00  3.9817 6.843e-05
## harship_in   3.2293e-01  2.6957e-02 11.9795 < 2.2e-16
## num_crimes   1.6666e-05  5.6040e-05  0.2974    0.7662
## without_hs  -4.3007e-01  5.3134e-02 -8.0941 6.661e-16
## 
## Rho: 0.22034, LR test value: 6.5236, p-value: 0.010645
## Asymptotic standard error: 0.081665
##     z-value: 2.6981, p-value: 0.0069731
## Wald statistic: 7.2799, p-value: 0.0069731
## 
## Log likelihood: -188.4044 for lag model
## ML residual variance (sigma squared): 7.7331, (sigma: 2.7809)
## Number of observations: 77 
## Number of parameters estimated: 6 
## AIC: 388.81, (AIC for lm: 393.33)
## LM test for residual autocorrelation
## test value: 0.16343, p-value: 0.68602

#this is significant

#i am going to test for spatial autocorrelation in the residuals of that lag test
seResiduals <- rep(0, length(chicago$unemployed))
resIndex <- lm_lag$residuals %>% names() %>% as.integer();
seResiduals[resIndex] <- lm_lag$residuals

#perform the Moran Test
chicago_list %>%
  moran.test(seResiduals, ., zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  seResiduals  
## weights: .    
## 
## Moran I statistic standard deviate = -0.17724, p-value = 0.5703
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       -0.02583888       -0.01315789        0.00511869

#not significant tho..this model controls for autocorrelation in the residuals

#lagrange multiplier test 
LM <- lm.LMtests(lm1, chicago_list, test = "all")

## Please update scripts to use lm.RStests in place of lm.LMtests

## Warning in model.matrix.default(terms(model), model.frame(model)): the response
## appeared on the right-hand side and was dropped

## Warning in model.matrix.default(terms(model), model.frame(model)): problem with
## term 1 in model.matrix: no columns are assigned

LM

## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## test weights: listw
## 
## RSerr = 0.8545, df = 1, p-value = 0.3553
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## test weights: listw
## 
## RSlag = 6.3731, df = 1, p-value = 0.01159
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## test weights: listw
## 
## adjRSerr = 0.18143, df = 1, p-value = 0.6701
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## test weights: listw
## 
## adjRSlag = 5.7, df = 1, p-value = 0.01696
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = unemployed ~ unemployed + harship_in + num_crimes +
## without_hs, data = chicago)
## test weights: listw
## 
## SARMA = 6.5545, df = 2, p-value = 0.03773

#this just says that the lag model works better

DISCUSSION OF RESULTS:

Initially, from running the linear regression model, it was apparent that the hardship index and percentage of individuals without high school education were significant in predicting unemployment, with p-values < 2e-16 and 1.53e-15, respectively.

From there, I conducted a Moran’s I test on the original linear model, which displayed that there was weakly significant spatial autocorrelation within the residuals; p-value = 0.07005. This means that there is correlation within the residuals, and requires further analyzing.

After that, I conducted a LM-Error test in order to determine if the error in my model was correlated across space. The Moran’s I Error test outputted Lambda: 0.18754, which is insignificant. Because this Lambda is so low, that means this model does not control for autocorrelation in the residuals. Therefore, the spatial error model is not an appropriate fit for my data at hand, and that another model might fit better.

Because the LM-Error test didn’t produce meaningful outputs, I conducted the Spatial Lag test to test for more spatial autocorrelation within the dataset and check for spatial dependencies within the analysis. The spatial lag model produced a p-value of 0.010645, which is significant. That means that this a spatial lag model might work on our data.

I used the Spatial Lag Model since the Spatial Leg test was significant. This spatial lag model gave a p-value of 0.5703, which is insignificant. This means that spatial autocorrelation is controlled within the spatial lag model, making it a good fit for our data.

Finally, I conducted a Lagrange multiplier test to test for spatial dependence in the error and dependent variables. From this the only significant models were the RSLag and the RSLag adjusted models, while the RSerr and RSerr adjusted models were insignificant. This test reinforces the idea that the Spatial Lag Model is a better fit for our data, as compared to the Spatial Error model.