Set working directory and read used cars data
setwd("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab09")
car = st_read("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab09/usedcars/usa48_usedcars.shp",
quiet = TRUE)
Set coordinate system to NAD83, define geometry, and centroid
car = st_set_crs(car, 5070)
car_geom = st_geometry(car)
car_crs = st_centroid(car_geom)
Build neighborhood function linking the contiguous US
car_nb = knn2nb(knearneigh(car_crs, k = 4))
dists = nbdists(car_nb, car_crs)
dists = unlist(dists)
max_lnn = max(dists)
cars_dist = dnearneigh(car_crs, d1 = 0, d2 = .75 * max_lnn)
Plot
These steps are a little out of order from the instructions, but this is
because the plot was having issues being in separate chunks. After much
deliberation, I chose the distance function because I felt it best
described how much cars would move around the east vs west coast.
Build a spatial weight matrix. It wouldn’t take glist = gl like in the
example, so I tried using NULL and it ran. I tried reading documentation
on it, but I for sure don’t understand the difference.
car_nb2 = tri2nb(car_crs)
inverse_distance = function(x) {1/(x/1000)}
idw = lapply(dists, inverse_distance)
car_lw_idw = nb2listw(car_nb2, glist = NULL, style ="B")
summary(car_lw_idw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 264
## Percentage nonzero weights: 11.45833
## Average number of links: 5.5
## Link number distribution:
##
## 3 4 5 6 7 8 9
## 1 8 16 16 4 2 1
## 1 least connected region:
## 4 with 3 links
## 1 most connected region:
## 20 with 9 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 48 2304 264 528 6072
Reporting Moran’s I and z-score
car_listw = nb2listw(cars_dist)
moran.test(car$price_1960, listw = car_listw, alternative = "two.sided", randomisation = TRUE)
##
## Moran I test under randomisation
##
## data: car$price_1960
## weights: car_listw
##
## Moran I statistic standard deviate = 11.178, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.813250656 -0.021276596 0.005573973
We reject the null because p < 2.2 x
10-16. We also have z = 11.178, and
Moran’s I = 0.8132.
Build a simple linear model
car_lm = lm(price_1960 ~ tax_charge, data = car)
summary(car_lm)
##
## Call:
## lm(formula = price_1960 ~ tax_charge, data = car)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## tax_charge 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
moran.mc(car_lm$residuals, listw = car_listw, nsim = 999, alternative = "greater")
##
## Monte-Carlo simulation of Moran I
##
## data: car_lm$residuals
## weights: car_listw
## number of simulations + 1: 1000
##
## statistic = 0.56181, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
Intercept from the summary is 1435.75 with a
p < 2 x 10-16. The coefficient for tax
charge is 0.687 with p = 0.000294. The
R2 value is 0.234. There is autocorrelation
because p = 0.001.
Lagrange Multiplier
lmt = lm.LMtests(car_lm, car_listw, test = c("LMerr", "LMlag"))
summary(lmt)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = price_1960 ~ tax_charge, data = car)
## weights: car_listw
##
## statistic parameter p.value
## LMerr 49.596 1 1.889e-12 ***
## LMlag 62.759 1 2.331e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both were significant so a robust test should be used
lmt_robust = lm.LMtests(car_lm, car_listw, test = c("RLMerr", "RLMlag"))
summary(lmt_robust)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = price_1960 ~ tax_charge, data = car)
## weights: car_listw
##
## statistic parameter p.value
## RLMerr 0.087019 1 0.7680019
## RLMlag 13.250522 1 0.0002725 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the robust Lagrange Multiplier, a spatial lag model is
necessary. This is because the RLMlag is 13.25 and
RLMerr is 0.087019. With a p-value of 0.0002725
and 0.7680019, respectively.
Build a spatial model linking car prices and delivery cost
car_fit = lagsarlm(price_1960 ~ tax_charge, data = car, car_listw)
coef(car_fit)
## rho (Intercept) tax_charge
## 0.85700076 212.14323329 0.05408073
summary(car_fit)
##
## Call:lagsarlm(formula = price_1960 ~ tax_charge, data = car, listw = car_listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.9958 -24.1392 1.1423 24.1301 61.9087
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 212.143233 106.863666 1.9852 0.04712
## tax_charge 0.054081 0.092196 0.5866 0.55748
##
## Rho: 0.857, LR test value: 51.337, p-value: 7.7793e-13
## Asymptotic standard error: 0.069772
## z-value: 12.283, p-value: < 2.22e-16
## Wald statistic: 150.87, p-value: < 2.22e-16
##
## Log likelihood: -235.4973 for lag model
## ML residual variance (sigma squared): 888.07, (sigma: 29.8)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 478.99, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 0.76534, p-value: 0.38166
moran.mc(car_fit$residuals, listw = car_listw, nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: car_fit$residuals
## weights: car_listw
## number of simulations + 1: 1000
##
## statistic = -0.039873, observed rank = 441, p-value = 0.559
## alternative hypothesis: greater
moran.test(car_fit$residuals, listw = car_listw, alternative = "two.sided", randomisation = TRUE)
##
## Moran I test under randomisation
##
## data: car_fit$residuals
## weights: car_listw
##
## Moran I statistic standard deviate = -0.25005, p-value = 0.8026
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## -0.039872921 -0.021276596 0.005531098
Both Moran statistics are negative which means that it is negative autocorrelation saying that it is dispersed. I’m not sure it adequately accounts for autocorrelation because the p-values are high. My gut is telling me that my model got worse because it’s suddenly negative versus the positive Moran’s I we tested earlier. So I am not very confident that I could say tax charges are significant.