Advanced Geographical Statistics

Assignment 8

GEOG 6000

Date: 11/25/2015

Sulochan Dhungel

\(Answer 1\)


Reading in files, and performing simple diagnostics to check quality of data

setwd("C:/Users/Sulochan/Copy/Fall 2015/Advance Geo/Assignment 8")
cars = readShapeSpatial("usa48_usedcars.shp")
head(cars@data)
##   SP_ID tax_charge price_1960
## 0    AL        129       1461
## 1    AZ        218       1601
## 2    AR        176       1469
## 3    CA        252       1611
## 4    CO        186       1606
## 5    CT        154       1491
plot(cars, axes = T)

#legend("upperleft")
taxes = cars@data$tax_charge
prices = cars@data$price_1960
boxplot(taxes)

boxplot(prices)

length(prices)
## [1] 48


The data seems to have no outliers and has data for 48 states.

#install.packages("classInt",repos="http://cran.us.r-project.org")
library(classInt)
#install.packages("RColorBrewer",repos="http://cran.us.r-project.org")
library(RColorBrewer)
nclr = 9
plotvar = taxes
class = classIntervals(plotvar, nclr, style = "pretty")
plotclr = brewer.pal(nclr, "YlOrRd")
colcode = findColours(class, plotclr, digits = 3)
plot(cars, axes = T, pch=1, col=colcode)
title(main = "Taxes and Delivery Charges for new cars")
legend("bottomleft", legend = names(attr(colcode,"table")),
       fill = attr(colcode, "palette"),ncol=4,cex=0.6)

plotvar = prices
class = classIntervals(plotvar, nclr, style = "pretty")
plotclr = brewer.pal(nclr, "YlOrRd")
colcode = findColours(class, plotclr, digits = 3)
plot(cars, axes = T, pch=1, col=colcode)
title(main = "Avg Used Car Price")
legend("bottomleft", legend = names(attr(colcode,"table")),
       fill = attr(colcode, "palette"),ncol=4,cex=0.6)


These two plots indicate that there is a high spatial autocorrelation in Average used car prices as compared to taxes and delivery prices.
BUILDING NEIGHBOURHOOD FUNCTIONS

nbhf = poly2nb(cars,queen=F)
plot(cars)
coords = coordinates(cars)
plot(nbhf, coords, add=T, col=2, lwd=2)


A rook’s case of boundary method for building neighbourhood function was used. Since the economies of two adjacent states are very likely to be the same, the prices and taxes would be relative between the two adjacent states.

sw_W = nb2listw(nbhf, style='W')
plot(cars)
plot(sw_W, coords, add=T, col=2, lwd=unlist(sw_W$weights[1:48])*10)


The above figure shows the neighborhood function with the line weight representing the standardized weight. Thicker lines indicate higher spatial weights.
The reason for choosing standardized weight:
If a state is connected to only one state, the economy of that state would be more related to the connected state while, if it is connected to many states around it, the state might have relatively different economy at different regions of state.

summary(sw_W)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48 
## Number of nonzero links: 210 
## Percentage nonzero weights: 9.114583 
## Average number of links: 4.375 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8 
##  1  4  9 13  9  9  1  2 
## 1 least connected region:
## 16 with 1 link
## 2 most connected regions:
## 22 39 with 8 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1      S2
## W 48 2304 48 24.26831 197.366


There are 210 total links with an average of 4.375 links per state.

link_no_dist = c(1,4,9,13,9,9,1,2)
barplot(link_no_dist, main = "Link Distribution", xlab = "Number of Links", ylab = "Number of states" ,names.arg=c(1:8))
box()


The above figure shows how many links are most common. There are many states with 3-6 links.

cars.listw = nb2listw(nbhf, style='W')
moran.test(prices, cars.listw, alternative = "two.sided", randomisation = TRUE)
## 
##  Moran's I test under randomisation
## 
## data:  prices  
## weights: cars.listw  
## 
## Moran I statistic standard deviate = 8.0425, p-value = 8.805e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.776348813      -0.021276596       0.009836021


Since the value of Moran’s I = 0.78 with z-score of 8.04 and a p-value of 8.8 e-16, it gives a strong evidence of autocorrelation. So, we can reject the null hypothesis that the data is spatially random.

cars.fit1 = lm(prices ~ taxes, data = cars)
plot(taxes, prices)
abline(cars.fit1)

summary(cars.fit1)
## 
## Call:
## lm(formula = prices ~ taxes, data = cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.701  -45.053   -1.461   43.400  107.807 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1435.7506    27.5796  52.058  < 2e-16 ***
## taxes          0.6872     0.1754   3.918 0.000294 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared:  0.2503, Adjusted R-squared:  0.234 
## F-statistic: 15.35 on 1 and 46 DF,  p-value: 0.0002939


The Coefficients for Intercept is 1435.75 and for taxes is 0.6872. Both the coefficients are highly significant. But the R-squared value is only about 0.25.

lm.morantest(cars.fit1, cars.listw)
## 
##  Global Moran's I for regression residuals
## 
## data:  
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
## 
## Moran I statistic standard deviate = 6.2968, p-value = 1.519e-10
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I        Expectation           Variance 
##        0.571027830       -0.030313582        0.009120029


This test for autocorrelation of residuals. With high I, z-score and low p-value, it indicates that the residuals have high autocorrelation.

#LAGRANGE MULTIPLIER TEST#

summary(lm.LMtests(cars.fit1, cars.listw, test=c("LMerr","LMlag")))
##  Lagrange multiplier diagnostics for spatial dependence
## data:  
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
##  
##       statistic parameter   p.value    
## LMerr    30.957         1 2.638e-08 ***
## LMlag    39.371         1 3.504e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The non-robust version shows significance for both error model and lag model. So, we use the robust version of the test.

summary(lm.LMtests(cars.fit1, cars.listw, test=c("RLMerr","RLMlag")))
##  Lagrange multiplier diagnostics for spatial dependence
## data:  
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
##  
##        statistic parameter p.value   
## RLMerr  0.045696         1 0.83073   
## RLMlag  8.460121         1 0.00363 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The robust version of test shows that the test is significant in RMlag test, suggesting use of a spatial lag model.
The p-values for RLMerror is 0.83 and for RLMlag is 0.00363.

cars.fit2 = lagsarlm(prices~taxes, data=cars, cars.listw)
summary(cars.fit2)
## 
## Call:lagsarlm(formula = prices ~ taxes, data = cars, listw = cars.listw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -78.1308 -18.2568   4.3749  20.2427  74.1526 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 332.17378  128.35233  2.5880 0.009654
## taxes         0.18001    0.10523  1.7106 0.087158
## 
## Rho: 0.76713, LR test value: 40.465, p-value: 2.0016e-10
## Asymptotic standard error: 0.085211
##     z-value: 9.0028, p-value: < 2.22e-16
## Wald statistic: 81.05, p-value: < 2.22e-16
## 
## Log likelihood: -240.9333 for lag model
## ML residual variance (sigma squared): 1096, (sigma: 33.106)
## Number of observations: 48 
## Number of parameters estimated: 4 
## AIC: 489.87, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.3988, p-value: 0.12143


The coefficients for intercept is 332.1 with a pvalue of 0.009654, which is highly significant and for taxes is 0.18 with p-value of 0.087 which is less significant.
Value of Rho (spatial autoregressive coefficient) is 0.767 which shows the influence of neighbours.
AIC value of the spatial lag model is 489 while AIC value for linear model was 528. So, the reduction in this value suggests that spatial lag model explains variability better than a single linear model.
The LM test for residual autocorrelation gives a value of 2.4 with a pvalue of 0.12, which can be considered insignificant. This indicates that there is no significant residual autocorrelation in the model.
We can say that the model has adequately accounted for autocorrelation.
I think that the coefficient relating tax to car prices is not too significant and depends mostly on the prices of neighbouring states.