The file usa48_usedcars.shp in directory ‘usedcars’ contains
information on tax rates and delivery charges for new cars
(tax_charge) in the 48 conterminous U.S. states for the
period 1955-1959, as well as the average used car price for 1960
(price_1960). Use this dataset to build a spatial model
linking used car prices to the tax and delivery costs. You will need the
spdep library.
usedcars <- st_read("../../datafiles/usedcars/usa48_usedcars.shp")
## Reading layer `usa48_usedcars' from data source
## `C:\Users\sherl\OneDrive\Documents\GRAD SCHOOL II\Geog 6000 - SpatialStats\datafiles\usedcars\usa48_usedcars.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 48 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.6813 ymin: 25.12993 xmax: -67.00742 ymax: 49.38323
## CRS: NA
plot(usedcars)
It is also good to look at the numerical distribution of the data
ggplot(usedcars) +
geom_histogram(aes(x = tax_charge), binwidth = 25) +
geom_vline(xintercept = mean(usedcars$tax_charge, rm.na = T),
col = "darkblue") +
labs(x = "Tax Charge ($)")
ggplot(usedcars) +
geom_histogram(aes(x = price_1960), binwidth = 25) +
geom_vline(xintercept = mean(usedcars$price_1960, rm.na = T),
col = "darkblue") +
labs(x = "Car Price ($)")
Data looks somewhat normally distributed.
Build a neighborhood function linking the 48 states. You will need to choose one of the set of neighborhood functions available in R. Explain your choice of neighborhood function.
# calc centroid
usa_geom <- st_geometry(usedcars)
usa_cent <- st_centroid(usa_geom)
plot(usa_geom, reset = F)
plot(usa_cent, pch = 16, col = 2, add = T)
Quene’s case:
nb_qc <- poly2nb(usedcars)
plot(usa_geom, reset = F)
plot(nb_qc, usa_cent, lwd = 1.5, col = 2, add = T)
This looks pretty good. Neighbors are all surrounding states.
Rook’s Case
nb_rk <- poly2nb(usedcars, queen = F)
plot(usa_geom, reset = F)
plot(nb_rk, usa_cent, lwd = 1.5, col = 2, add = T)
This also looks good. It is slightly less complicated than the Queen’s case. For instance Utah and New Mexico are no longer neighbors, but that seems like a valid relationship.
Delauney Triangulation
nb_dtri <- tri2nb(usa_cent)
plot(usa_geom, reset = F)
plot(nb_dtri, usa_cent, lwd = 1.5, col = 2, add = T)
This is too complicated. In some ways, its good. I think there is probably a neighborhood relationship between Texas and Florida, at least economically. However I doubt there is much of a relationship between Main and North Dakota.
Sphere of Influence
nb_sphere <- graph2nb(soi.graph(nb_dtri, usa_cent))
plot(usa_geom, reset = F)
plot(nb_sphere, usa_cent, lwd = 1.5, col = 2, add = T)
I like this representation. It regionalized the neighborhoods. There is a clear Western USA, Mid-West, and East Coast relationship happening here. I am going to use this neighborhood for the remaining analysis.
Build a spatial weight matrix. Use the summary()
function to get information about the distribution of links per state
and report this
I will use the row standarized method. I feel that it is best to not make assumptions about neighborhood relationships unless there is direct evidence that one relationship is stronger than another.
spw_mtrx <- nb2listw(nb_sphere, style = "W")
summary(spw_mtrx)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 184
## Percentage nonzero weights: 7.986111
## Average number of links: 3.833333
## Link number distribution:
##
## 2 3 4 5 6
## 7 10 18 10 3
## 7 least connected regions:
## 7 17 24 32 37 41 45 with 2 links
## 3 most connected regions:
## 10 13 42 with 6 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 48 2304 48 26.50444 195.9239
Use the moran.test() or moran.mc()
function to test for spatial autocorrelation in the prices of used cars.
Report the Moran’s \(I\) statistic, the
z-score and whether or not you can reject the null hypothesis that there
is no spatial autocorrelation
moran.test(usedcars$price_1960,
listw = spw_mtrx,
alternative = "two.sided",
randomisation = T)
##
## Moran I test under randomisation
##
## data: usedcars$price_1960
## weights: spw_mtrx
##
## Moran I statistic standard deviate = 8.1079, p-value = 5.149e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.82305501 -0.02127660 0.01084445
The Moran’s \(I\) statistic is 0.8230, with a z-score of 8.11 and a \(p\)-value ~ 0. The \(p\)-value is extremely low, thus were are able to reject the null hypothesis that there is no spatial autocorrelation. These values are evidence that there is some spatial clustering in the data.
moran.mc(usedcars$price_1960,
listw = spw_mtrx,
nsim = 999,
alternative = "greater")
##
## Monte-Carlo simulation of Moran I
##
## data: usedcars$price_1960
## weights: spw_mtrx
## number of simulations + 1: 1000
##
## statistic = 0.82306, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
Running a Monte Carlo simulation returns similar results. \(I\) statistic is 0.823, and a \(p\)-value of 0.001. The \(p\)-value is slightly higher in this scenario, but remains well below our level of significance.
moran.plot(usedcars$price_1960,
spw_mtrx,
xlab = "1960 Price ($)",
ylab = "Lagged Price ($)")
lm_cluster <- localmoran(usedcars$price_1960,
listw = spw_mtrx,
alternative = "two.sided")
usedcars$zscore <- lm_cluster[ ,4]
usedcars$pval <- lm_cluster[ , 5]
ggplot(usedcars) +
geom_sf(aes(fill = zscore)) +
labs(fill = "Z-score",
title = "Used Car 1960 Price")
usedcars$pval.bin <- as.factor(ifelse(usedcars$pval < 0.05, "Significant", "Not-significant"))
ggplot(usedcars) +
geom_sf(aes(fill = pval.bin)) +
labs(fill = "Pval",
title = "Used Car 1960 Price")
The Western states and states along the Mississippi River show high autocorrelation.
Build a simple linear model using the lm() function
between the used car prices (dependent or Y variable) and the tax and
delivery cost (independent or X variable). Report the coefficients of
the model and the \(R^2\). Check for
autocorrelation in the residuals of the model using the
moran.mc() function, and state whether or not this is
present
lm1 <- lm(data = usedcars, formula = price_1960 ~ tax_charge)
summary(lm1)
##
## Call:
## lm(formula = price_1960 ~ tax_charge, data = usedcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## tax_charge 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
The model has an intercept of 1435.75 and a slope of 0.6872 (tax $ per dollar $). The adjusted \(R^2\)$ value is 0.234.
moran.mc(residuals(lm1),
listw = spw_mtrx,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(lm1)
## weights: spw_mtrx
## number of simulations + 1: 1000
##
## statistic = 0.59707, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
Using the Monte Carlo method, the Moran’s \(I\) is 0.597, with a \(p\)-value of 0.001 meaning that there is autocorrelation happening in the dataset.
Use the Lagrange Multiplier test to identify whether to use a spatial error or spatial lag model. Remember that you may need to use the robust version of this test if non-robust results are both significant. Report the \(p\)-value of each test and give the model choice
lm2 <- lm.RStests(lm1, spw_mtrx, test = c("LMerr", "LMlag"))
summary(lm2)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = price_1960 ~ tax_charge, data = usedcars)
## test weights: spw_mtrx
##
## statistic parameter p.value
## RSerr 30.989 1 2.595e-08 ***
## RSlag 40.817 1 1.672e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Spatial Errors are 31.0, the Spatial Lag is 40.8. Both tests have very low p-vals (~0).
Because both results are significant, we need to redo using the robust LaGrange Multiplier Test.
lm3 <- lm.RStests(lm1, spw_mtrx, test = c("RLMerr", "RLMlag"))
summary(lm3)
## Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
## dependence
## data:
## model: lm(formula = price_1960 ~ tax_charge, data = usedcars)
## test weights: spw_mtrx
##
## statistic parameter p.value
## adjRSerr 0.11465 1 0.734911
## adjRSlag 9.94206 1 0.001615 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The robust test tellls us that spatial lag is more likely the source of correlation.
Now build a spatial model linking the car prices and the
tax/delivery cost, using the model you chose in the previous section
(either use the lagsarlm() or errorsarlm()
function).
slm <- lagsarlm(price_1960 ~ tax_charge, data = usedcars, spw_mtrx)
summary(slm)
##
## Call:lagsarlm(formula = price_1960 ~ tax_charge, data = usedcars,
## listw = spw_mtrx)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.30197 -18.81993 0.38465 23.76249 51.56415
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 307.518016 114.156220 2.6938 0.007064
## tax_charge 0.150429 0.096269 1.5626 0.118147
##
## Rho: 0.78495, LR test value: 47.026, p-value: 7.0068e-12
## Asymptotic standard error: 0.07618
## z-value: 10.304, p-value: < 2.22e-16
## Wald statistic: 106.17, p-value: < 2.22e-16
##
## Log likelihood: -237.6531 for lag model
## ML residual variance (sigma squared): 917.43, (sigma: 30.289)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 483.31, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 1.7967, p-value: 0.18011
Report the following information:
If using a spatial lag model: a) coefficients (and their significance); b) the value of Rho (the spatial autoregressive coefficient); c) the AIC value and the AIC value of the linear model without the spatial weight matrix
The coefficients are an intercept of $307 and a slope of $0.15 per tax dollar. However the tax_charge value is no longer significant.
Rho is 0.785, with a p-value of ~0
AIC is 483.31 (528.33 without the spatial weight matrix)
Test for remaining autocorrelation in the residuals of the model
using the moran.test() or moran.mc() function.
Given the value you obtain, state whether you think that the model
adequately accounts for autocorrelation?
moran.mc(residuals(slm),
listw = spw_mtrx,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(slm)
## weights: spw_mtrx
## number of simulations + 1: 1000
##
## statistic = -0.074992, observed rank = 306, p-value = 0.694
## alternative hypothesis: greater
The Moran’s I for the residuals is -0.075, which is slightly dispersed, but the p-value is 0.681, meaning we can’t reject the null hypothesis that the residuals are now random.
Is the coefficient relating tax and delivery to car price significant? If not, give one reason why this may not be the case
The coefficient relating tax and delivery to car price is no longer significant. One reason may be that the delivery charge is a flat rate and the tax is not a constant percentage. Different locations have different tax rates.
The End