The file usa48_usedcars.shp in directory ‘usedcars’ contains information on tax rates and delivery charges for new cars (tax_charge) in the 48 conterminous U.S. states for the period 1955-1959, as well as the average used car price for 1960 (price_1960). Use this dataset to build a spatial model linking used car prices to the tax and delivery costs. You will need the spdep library.
Build a neighborhood function linking the 48 states. You will need to choose one of the set of neighborhood functions available in R. Explain your choice of neighborhood function
library(spdep)
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(spatialreg)
## Loading required package: Matrix
##
## Attaching package: 'spatialreg'
## The following objects are masked from 'package:spdep':
##
## as.spam.listw, as_dgRMatrix_listw, as_dsCMatrix_I,
## as_dsCMatrix_IrW, as_dsTMatrix_listw, can.be.simmed, cheb_setup,
## create_WX, do_ldet, eigen_pre_setup, eigen_setup, eigenw,
## errorsarlm, get.ClusterOption, get.coresOption, get.mcOption,
## get.VerboseOption, get.ZeroPolicyOption, GMargminImage, GMerrorsar,
## griffith_sone, gstsls, Hausman.test, impacts, intImpacts,
## Jacobian_W, jacobianSetup, l_max, lagmess, lagsarlm, lextrB,
## lextrS, lextrW, lmSLX, LU_prepermutate_setup, LU_setup,
## Matrix_J_setup, Matrix_setup, mcdet_setup, MCMCsamp, ME, mom_calc,
## mom_calc_int2, moments_setup, powerWeights, sacsarlm,
## SE_classic_setup, SE_interp_setup, SE_whichMin_setup,
## set.ClusterOption, set.coresOption, set.mcOption,
## set.VerboseOption, set.ZeroPolicyOption, similar.listw, spam_setup,
## spam_update_setup, SpatialFiltering, spautolm, spBreg_err,
## spBreg_lag, spBreg_sac, stsls, subgraph_eigenw, trW
usedcars = st_read("../datafiles/usedcars/usedcars/usa48_usedcars.shp")
## Reading layer `usa48_usedcars' from data source
## `N:\Projects\geog6000\datafiles\usedcars\usedcars\usa48_usedcars.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 48 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.6813 ymin: 25.12993 xmax: -67.00742 ymax: 49.38323
## CRS: NA
usedcars.geom = st_geometry(usedcars)
usedcars.coords = st_centroid(usedcars.geom)
car1_nb = poly2nb(usedcars, queen = FALSE)
car2_nb = poly2nb(usedcars, queen = TRUE)
plot(usedcars.geom, reset = FALSE)
plot(car1_nb, usedcars.coords, add = TRUE, col = 2, lwd = 2)
plot(usedcars.geom, reset = FALSE)
plot(car2_nb, usedcars.coords, add = TRUE, col = 2, lwd = 2)
I’m going to go with the Queen’s Case method because I want to retain the connections between Utah and New Mexico, and Arizona and Colorado. I think that the most important neighborhood structure in the case of used cars might be based on political and economic climate, which I think might have more of a regional effect than an effect that’s based on shared border length. For this reason I am also not worried about Michigan and Wisconsin (both, I would say, belong to the Midwest manufacturing region).
Build a spatial weight matrix. Use the summary() function to get information about the distribution of links per state and report this
car2_lw_B = nb2listw(car2_nb, style = 'B')
summary(car2_lw_B)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 214
## Percentage nonzero weights: 9.288194
## Average number of links: 4.458333
## Link number distribution:
##
## 1 2 3 4 5 6 7 8
## 1 4 9 11 10 9 2 2
## 1 least connected region:
## 17 with 1 link
## 2 most connected regions:
## 23 40 with 8 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 48 2304 214 428 4296
Use the moran.test() or moran.mc() function to test for spatial autocorrelation in the prices of used cars. Report the Moran’s I statistic, the z-score and whether or not you can reject the null hypothesis that there is no spatial autocorrelation
moran.test(usedcars$price_1960,
listw = car2_lw_B,
alternative = "two.sided",
randomisation = TRUE)
##
## Moran I test under randomisation
##
## data: usedcars$price_1960
## weights: car2_lw_B
##
## Moran I statistic standard deviate = 9.6492, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.865236386 -0.021276596 0.008440857
Global autocorrelation exists as evidenced by the p-value of < 2.2e-16.The z-score is 9.6492. What about local autocorrelation?
Build a simple linear model using the lm() function between the used car prices (dependent or Y variable) and the tax and delivery cost (independent or X variable). Report the coefficients of the model and the R2.
usedcars.fit1 = lm(price_1960 ~ tax_charge, data = usedcars)
summary(usedcars.fit1)
##
## Call:
## lm(formula = price_1960 ~ tax_charge, data = usedcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## tax_charge 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
The linear model fits decently, with an increase in tax and delivery cost positively and significantly (p < .05) associated with an increase in car price, and a significant F-statistic. The R2 is 0.2503.
Check for autocorrelation in the residuals of the model using the moran.mc() function, and state whether or not this is present
moran.mc(residuals(usedcars.fit1),
listw = car2_lw_B,
nsim = 9999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(usedcars.fit1)
## weights: car2_lw_B
## number of simulations + 1: 10000
##
## statistic = 0.64935, observed rank = 10000, p-value = 1e-04
## alternative hypothesis: greater
Observed I is greater than all other simulated I’s. There is definitely spatial autocorrelation in the residuals.
Use the Lagrange Multiplier test to identify whether to use a spatial error or spatial lag model. Remember that you may need to use the robust version of this test if non-robust results are both significant. Report the p-value of each test and give the model choice
usedcars.lmt = lm.LMtests(usedcars.fit1, car2_lw_B, test = c("LMerr", "LMlag"))
## Warning in lm.LMtests(usedcars.fit1, car2_lw_B, test = c("LMerr", "LMlag")):
## Spatial weights matrix not row standardized
summary(usedcars.lmt)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = price_1960 ~ tax_charge, data = usedcars)
## weights: car2_lw_B
##
## statistic parameter p.value
## LMerr 45.117555 1 1.856e-11 ***
## LMlag 0.064983 1 0.7988
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Lagrange Multiplier test gave clear results that the autocorrelation of our model is in the model errors (statistic = 45.12, p-value = 1.856e-11) and not in the dependent variable (statistic = 0.065, p-value = 0.799). Thus we should use a spatial error model.
Now build a spatial model linking the car prices and the tax/delivery cost, using the model you chose in the previous section (either use the lagsarlm() or errorsarlm() function).
usedcars.fit2 = errorsarlm(price_1960 ~ tax_charge,
data = usedcars,
car2_lw_B)
summary(usedcars.fit2)
##
## Call:errorsarlm(formula = price_1960 ~ tax_charge, data = usedcars,
## listw = car2_lw_B)
##
## Residuals:
## Min 1Q Median 3Q Max
## -74.9758 -14.0692 1.0081 22.2472 64.8380
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1515.53110 20.34911 74.4765 <2e-16
## tax_charge 0.15942 0.10848 1.4696 0.1417
##
## Lambda: 0.17108, LR test value: 48.482, p-value: 3.334e-12
## Asymptotic standard error: 0.0091693
## z-value: 18.658, p-value: < 2.22e-16
## Wald statistic: 348.12, p-value: < 2.22e-16
##
## Log likelihood: -236.925 for error model
## ML residual variance (sigma squared): 903.69, (sigma: 30.061)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 481.85, (AIC for lm: 528.33)
By using the spatial error test it now appears that the tax and delivery cost variable has both reduced in effect as well as become insignificant at a p-value of 0.142 (given the neighborhood and matrix choices I made). Lambda shows us the strength of the relationship of errors and their neighbors. In this case, the significance test on lambda shows that the spatial error is significant (p = 3.334e-12). The AIC value has decreased to 481.85 in the spatial error model from 528.33 in the linear model, indicating better goodness-of-fit.
Test for remaining autocorrelation in the residuals of the model using the moran.test() or moran.mc() function. Given the value you obtain, state whether you think that the model adequately accounts for autocorrelation?
moran.mc(residuals(usedcars.fit2),
listw = car2_lw_B,
nsim = 9999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(usedcars.fit2)
## weights: car2_lw_B
## number of simulations + 1: 10000
##
## statistic = -0.076852, observed rank = 2837, p-value = 0.7163
## alternative hypothesis: greater
The Moran I Monte Carlo test on the residuals of our spatial error model indicated that the remaining residuals are not significantly autocorrelated, given a p-value of 0.7175. Thus we can say that the model does indeed account for autocorrelation.
Is the coefficient relating tax and delivery to car price significant? If not, give one reason why this may not be the case
No, it is no longer significant. It could be that the spatial similarity in car prices is more of a predictor than tax and delivery price. i.e., accounting for regional similarity removes any significance of tax and delivery price.