Read in data

cars = st_read("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/usedcars 2/usa48_usedcars.shp")
## Reading layer `usa48_usedcars' from data source 
##   `/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/usedcars 2/usa48_usedcars.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 48 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.6813 ymin: 25.12993 xmax: -67.00742 ymax: 49.38323
## CRS:           NA

Define cars geometry and centroids

cars.geom = st_geometry(cars)
cars.coords = st_centroid(cars.geom)
plot (cars.geom)
plot(cars.coords, pch = 19, col = 2, add = TRUE)

Build a neighborhood function linking the 48 states

cars2_nb <- poly2nb(cars, queen = FALSE)
plot(cars.geom, reset = FALSE)
plot(cars2_nb, cars.coords, add = TRUE, col = 2, lwd = 1.5)

I used the Rook’s Case method for my neighborhood function. I chose this function because, after running all of the functions, the Rooks looked the best for linking the 48 states based on it nicely linking states that were touching unlike functions such as the nearest neighbor or distance functions. Those functions linked states that weren’t close together or overly connected the east coast states versus the west coast. Another good option is the Queen’s Case Method.

Build a spatial weight matrix

cars2_lw_W <- nb2listw(cars2_nb, style = 'W')
summary(cars2_lw_W)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48 
## Number of nonzero links: 210 
## Percentage nonzero weights: 9.114583 
## Average number of links: 4.375 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8 
##  1  4  9 13  9  9  1  2 
## 1 least connected region:
## 17 with 1 link
## 2 most connected regions:
## 23 40 with 8 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1      S2
## W 48 2304 48 24.26831 197.366

The average number of links is 4.375. Link number distribution: 1:1, 2:4, 3:9, 4:13, 5:9, 6:9, 7:1, 8:2

Use the moran.test() or moran.mc() function to test for spatial autocorrelation in the prices of used cars.

cars.listw = nb2listw(cars2_nb)
moran.test(cars$price_1960, 
           listw = cars.listw, 
           alternative = "two.sided", 
           randomisation = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  cars$price_1960  
## weights: cars.listw    
## 
## Moran I statistic standard deviate = 8.0425, p-value = 8.805e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.776348813      -0.021276596       0.009836021

Moran’s I statistic: 0.776348813, z-score = 8.0425 I can reject the null hypothesis that there is no spatial autocorrelation because the p value is less than 0.05.

Build a simple linear model between the used car prices (dependent or Y variable) and the tax and delivery cost (independent or X variable).

cars.lm = lm(cars$price_1960 ~ cars$tax_charge)
coef(cars.lm)
##     (Intercept) cars$tax_charge 
##    1435.7506037       0.6871577
summary(cars.lm)
## 
## Call:
## lm(formula = cars$price_1960 ~ cars$tax_charge)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.701  -45.053   -1.461   43.400  107.807 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1435.7506    27.5796  52.058  < 2e-16 ***
## cars$tax_charge    0.6872     0.1754   3.918 0.000294 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared:  0.2503, Adjusted R-squared:  0.234 
## F-statistic: 15.35 on 1 and 46 DF,  p-value: 0.0002939

Coefficients of the model: Intercept = 1435.7506037, cars$tax_charge = 0.6871577. The Multiple R-Squared is 0.2503 and the Adjusted R-Squares is 0.234.

Check for autocorrelation in the residuals of the model using the moran.mc() function.

moran.mc(residuals(cars.lm), 
         listw = cars.listw, 
         nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  residuals(cars.lm) 
## weights: cars.listw  
## number of simulations + 1: 1000 
## 
## statistic = 0.57103, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

The result of the Morans shows autocorrelation in the residuals with the p-value of 0.001.

Use the Lagrange Multiplier test to identify whether to use a spatial error or spatial lag model.

cars.lmt_robust <- lm.LMtests(cars.lm, cars.listw, test = c("RLMerr","RLMlag"))
summary(cars.lmt_robust)
##  Lagrange multiplier diagnostics for spatial dependence
## data:  
## model: lm(formula = cars$price_1960 ~ cars$tax_charge)
## weights: cars.listw
##  
##        statistic parameter p.value   
## RLMerr  0.045696         1 0.83073   
## RLMlag  8.460121         1 0.00363 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A spatial lag model is preferred according to the robust Lagrange Multiplier test because of the greater value of 8.460121 for the RLMlag compared to the RLMerr of 0.045696. The p value of the RLMerr is 0.83073 and the p value for the RLMlag is 0.00363.

Build a spatial model linking the car prices and the tax/delivery cost.

cars.fit2 <- lagsarlm(cars$price_1960 ~ cars$tax_charge, 
                     data = cars, 
                     cars.listw)
coef(cars.fit2)
##             rho     (Intercept) cars$tax_charge 
##       0.7671330     332.1737801       0.1800127
summary(cars.fit2)
## 
## Call:lagsarlm(formula = cars$price_1960 ~ cars$tax_charge, data = cars, 
##     listw = cars.listw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -78.1308 -18.2568   4.3749  20.2427  74.1526 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)     332.17378  128.35233  2.5880 0.009654
## cars$tax_charge   0.18001    0.10523  1.7106 0.087158
## 
## Rho: 0.76713, LR test value: 40.465, p-value: 2.0016e-10
## Asymptotic standard error: 0.085211
##     z-value: 9.0028, p-value: < 2.22e-16
## Wald statistic: 81.05, p-value: < 2.22e-16
## 
## Log likelihood: -240.9333 for lag model
## ML residual variance (sigma squared): 1096, (sigma: 33.106)
## Number of observations: 48 
## Number of parameters estimated: 4 
## AIC: 489.87, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.3988, p-value: 0.12143
  1. (Intercept) = 332.1737801, p value = 0.009654, cars$tax_charge = 0.1800127, p value = 0.087158
  2. rho = 0.7671330, p value = 2.0016e-10
  3. AIC = 489.87, AIC for lm = 528.33

Test for remaining autocorrelation in the residuals of the model using the moran.test() or moran.mc() function.

moran.mc(residuals(cars.fit2), 
         listw = cars.listw, 
         nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  residuals(cars.fit2) 
## weights: cars.listw  
## number of simulations + 1: 1000 
## 
## statistic = -0.081869, observed rank = 286, p-value = 0.714
## alternative hypothesis: greater

The spatial lag model adequately accounts for autocorrelation with of the p-value of 0.703. The coefficient relating tax and delivery to car price is not significant. This could possibly be due to people going out of state to purchase their cars due to limited availability within their own state.