Set working directory and read used cars data

setwd("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab09")
car = st_read("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab09/usedcars/usa48_usedcars.shp",
               quiet = TRUE)

Set coordinate system to NAD83, define geometry, and centroid

car = st_set_crs(car, 5070)
car_geom = st_geometry(car)
car_crs = st_centroid(car_geom)

Build neighborhood function linking the contiguous US

car_nb = knn2nb(knearneigh(car_crs, k = 4))
dists = nbdists(car_nb, car_crs)
dists = unlist(dists)
max_lnn = max(dists)
cars_dist = dnearneigh(car_crs, d1 = 0, d2 = .75 * max_lnn)

Plot

These steps are a little out of order from the instructions, but this is because the plot was having issues being in separate chunks. After much deliberation, I chose the distance function because I felt it best described how much cars would move around the east vs west coast.
Build a spatial weight matrix. It wouldn’t take glist = gl like in the example, so I tried using NULL and it ran. I tried reading documentation on it, but I for sure don’t understand the difference.

car_nb2 = tri2nb(car_crs)
inverse_distance = function(x) {1/(x/1000)}
idw = lapply(dists, inverse_distance)
car_lw_idw = nb2listw(car_nb2, glist = NULL, style ="B")
summary(car_lw_idw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48 
## Number of nonzero links: 264 
## Percentage nonzero weights: 11.45833 
## Average number of links: 5.5 
## Link number distribution:
## 
##  3  4  5  6  7  8  9 
##  1  8 16 16  4  2  1 
## 1 least connected region:
## 4 with 3 links
## 1 most connected region:
## 20 with 9 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0  S1   S2
## B 48 2304 264 528 6072

Reporting Moran’s I and z-score

car_listw = nb2listw(cars_dist)
moran.test(car$price_1960, listw = car_listw, alternative = "two.sided", randomisation = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  car$price_1960  
## weights: car_listw    
## 
## Moran I statistic standard deviate = 11.178, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.813250656      -0.021276596       0.005573973

We reject the null because p < 2.2 x 10-16. We also have z = 11.178, and Moran’s I = 0.8132.

Build a simple linear model

car_lm = lm(price_1960 ~ tax_charge, data = car)
summary(car_lm)
## 
## Call:
## lm(formula = price_1960 ~ tax_charge, data = car)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.701  -45.053   -1.461   43.400  107.807 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1435.7506    27.5796  52.058  < 2e-16 ***
## tax_charge     0.6872     0.1754   3.918 0.000294 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared:  0.2503, Adjusted R-squared:  0.234 
## F-statistic: 15.35 on 1 and 46 DF,  p-value: 0.0002939
moran.mc(car_lm$residuals, listw = car_listw, nsim = 999, alternative = "greater")
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  car_lm$residuals 
## weights: car_listw  
## number of simulations + 1: 1000 
## 
## statistic = 0.56181, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

Intercept from the summary is 1435.75 with a p < 2 x 10-16. The coefficient for tax charge is 0.687 with p = 0.000294. The R2 value is 0.234. There is autocorrelation because p = 0.001.
Lagrange Multiplier

lmt = lm.LMtests(car_lm, car_listw, test = c("LMerr", "LMlag"))
summary(lmt)
##  Lagrange multiplier diagnostics for spatial dependence
## data:  
## model: lm(formula = price_1960 ~ tax_charge, data = car)
## weights: car_listw
##  
##       statistic parameter   p.value    
## LMerr    49.596         1 1.889e-12 ***
## LMlag    62.759         1 2.331e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both were significant so a robust test should be used

lmt_robust = lm.LMtests(car_lm, car_listw, test = c("RLMerr", "RLMlag"))
summary(lmt_robust)
##  Lagrange multiplier diagnostics for spatial dependence
## data:  
## model: lm(formula = price_1960 ~ tax_charge, data = car)
## weights: car_listw
##  
##        statistic parameter   p.value    
## RLMerr  0.087019         1 0.7680019    
## RLMlag 13.250522         1 0.0002725 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the robust Lagrange Multiplier, a spatial lag model is necessary. This is because the RLMlag is 13.25 and RLMerr is 0.087019. With a p-value of 0.0002725 and 0.7680019, respectively.

Build a spatial model linking car prices and delivery cost

car_fit = lagsarlm(price_1960 ~ tax_charge, data = car, car_listw)
coef(car_fit)
##          rho  (Intercept)   tax_charge 
##   0.85700076 212.14323329   0.05408073
summary(car_fit)
## 
## Call:lagsarlm(formula = price_1960 ~ tax_charge, data = car, listw = car_listw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -66.9958 -24.1392   1.1423  24.1301  61.9087 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) 212.143233 106.863666  1.9852  0.04712
## tax_charge    0.054081   0.092196  0.5866  0.55748
## 
## Rho: 0.857, LR test value: 51.337, p-value: 7.7793e-13
## Asymptotic standard error: 0.069772
##     z-value: 12.283, p-value: < 2.22e-16
## Wald statistic: 150.87, p-value: < 2.22e-16
## 
## Log likelihood: -235.4973 for lag model
## ML residual variance (sigma squared): 888.07, (sigma: 29.8)
## Number of observations: 48 
## Number of parameters estimated: 4 
## AIC: 478.99, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 0.76534, p-value: 0.38166
  1. Intercept = 212.14, p-value = 0.047, tax_charge = 0.054, p-value = 0.557
  2. ρ = 0.857 p-value = 7.779x10-13
  3. AIC = 478.99 AIC(lm) = 528.33

    Testing for remaining autocorrelation in the residuals
moran.mc(car_fit$residuals, listw = car_listw, nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  car_fit$residuals 
## weights: car_listw  
## number of simulations + 1: 1000 
## 
## statistic = -0.039873, observed rank = 441, p-value = 0.559
## alternative hypothesis: greater
moran.test(car_fit$residuals, listw = car_listw, alternative = "two.sided", randomisation = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  car_fit$residuals  
## weights: car_listw    
## 
## Moran I statistic standard deviate = -0.25005, p-value = 0.8026
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.039872921      -0.021276596       0.005531098

Both Moran statistics are negative which means that it is negative autocorrelation saying that it is dispersed. I’m not sure it adequately accounts for autocorrelation because the p-values are high. My gut is telling me that my model got worse because it’s suddenly negative versus the positive Moran’s I we tested earlier. So I am not very confident that I could say tax charges are significant.