Read in data
cars = st_read("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/usedcars 2/usa48_usedcars.shp")
## Reading layer `usa48_usedcars' from data source
## `/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/usedcars 2/usa48_usedcars.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 48 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.6813 ymin: 25.12993 xmax: -67.00742 ymax: 49.38323
## CRS: NA
Define cars geometry and centroids
cars.geom = st_geometry(cars)
cars.coords = st_centroid(cars.geom)
plot (cars.geom)
plot(cars.coords, pch = 19, col = 2, add = TRUE)
Build a neighborhood function linking the 48 states
cars2_nb <- poly2nb(cars, queen = FALSE)
plot(cars.geom, reset = FALSE)
plot(cars2_nb, cars.coords, add = TRUE, col = 2, lwd = 1.5)
I used the Rook’s Case method for my neighborhood function. I chose this function because, after running all of the functions, the Rooks looked the best for linking the 48 states based on it nicely linking states that were touching unlike functions such as the nearest neighbor or distance functions. Those functions linked states that weren’t close together or overly connected the east coast states versus the west coast. Another good option is the Queen’s Case Method.
Build a spatial weight matrix
cars2_lw_W <- nb2listw(cars2_nb, style = 'W')
summary(cars2_lw_W)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 210
## Percentage nonzero weights: 9.114583
## Average number of links: 4.375
## Link number distribution:
##
## 1 2 3 4 5 6 7 8
## 1 4 9 13 9 9 1 2
## 1 least connected region:
## 17 with 1 link
## 2 most connected regions:
## 23 40 with 8 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 48 2304 48 24.26831 197.366
The average number of links is 4.375. Link number distribution: 1:1, 2:4, 3:9, 4:13, 5:9, 6:9, 7:1, 8:2
Use the moran.test() or moran.mc() function to test for spatial autocorrelation in the prices of used cars.
cars.listw = nb2listw(cars2_nb)
moran.test(cars$price_1960,
listw = cars.listw,
alternative = "two.sided",
randomisation = TRUE)
##
## Moran I test under randomisation
##
## data: cars$price_1960
## weights: cars.listw
##
## Moran I statistic standard deviate = 8.0425, p-value = 8.805e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.776348813 -0.021276596 0.009836021
Moran’s I statistic: 0.776348813, z-score = 8.0425 I can reject the null hypothesis that there is no spatial autocorrelation because the p value is less than 0.05.
Build a simple linear model between the used car prices (dependent or Y variable) and the tax and delivery cost (independent or X variable).
cars.lm = lm(cars$price_1960 ~ cars$tax_charge)
coef(cars.lm)
## (Intercept) cars$tax_charge
## 1435.7506037 0.6871577
summary(cars.lm)
##
## Call:
## lm(formula = cars$price_1960 ~ cars$tax_charge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## cars$tax_charge 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
Coefficients of the model: Intercept = 1435.7506037, cars$tax_charge = 0.6871577. The Multiple R-Squared is 0.2503 and the Adjusted R-Squares is 0.234.
Check for autocorrelation in the residuals of the model using the moran.mc() function.
moran.mc(residuals(cars.lm),
listw = cars.listw,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(cars.lm)
## weights: cars.listw
## number of simulations + 1: 1000
##
## statistic = 0.57103, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
The result of the Morans shows autocorrelation in the residuals with the p-value of 0.001.
Use the Lagrange Multiplier test to identify whether to use a spatial error or spatial lag model.
cars.lmt_robust <- lm.LMtests(cars.lm, cars.listw, test = c("RLMerr","RLMlag"))
summary(cars.lmt_robust)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = cars$price_1960 ~ cars$tax_charge)
## weights: cars.listw
##
## statistic parameter p.value
## RLMerr 0.045696 1 0.83073
## RLMlag 8.460121 1 0.00363 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A spatial lag model is preferred according to the robust Lagrange Multiplier test because of the greater value of 8.460121 for the RLMlag compared to the RLMerr of 0.045696. The p value of the RLMerr is 0.83073 and the p value for the RLMlag is 0.00363.
Build a spatial model linking the car prices and the tax/delivery cost.
cars.fit2 <- lagsarlm(cars$price_1960 ~ cars$tax_charge,
data = cars,
cars.listw)
coef(cars.fit2)
## rho (Intercept) cars$tax_charge
## 0.7671330 332.1737801 0.1800127
summary(cars.fit2)
##
## Call:lagsarlm(formula = cars$price_1960 ~ cars$tax_charge, data = cars,
## listw = cars.listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.1308 -18.2568 4.3749 20.2427 74.1526
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 332.17378 128.35233 2.5880 0.009654
## cars$tax_charge 0.18001 0.10523 1.7106 0.087158
##
## Rho: 0.76713, LR test value: 40.465, p-value: 2.0016e-10
## Asymptotic standard error: 0.085211
## z-value: 9.0028, p-value: < 2.22e-16
## Wald statistic: 81.05, p-value: < 2.22e-16
##
## Log likelihood: -240.9333 for lag model
## ML residual variance (sigma squared): 1096, (sigma: 33.106)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 489.87, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.3988, p-value: 0.12143
Test for remaining autocorrelation in the residuals of the model using the moran.test() or moran.mc() function.
moran.mc(residuals(cars.fit2),
listw = cars.listw,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(cars.fit2)
## weights: cars.listw
## number of simulations + 1: 1000
##
## statistic = -0.081869, observed rank = 286, p-value = 0.714
## alternative hypothesis: greater
The spatial lag model adequately accounts for autocorrelation with of the p-value of 0.703. The coefficient relating tax and delivery to car price is not significant. This could possibly be due to people going out of state to purchase their cars due to limited availability within their own state.