GEOG 6000
Date: 11/25/2015
Sulochan Dhungel
Reading in files and performing simple diagnostics to check data quality
setwd("C:/Users/Sulochan/Copy/Fall 2015/Advance Geo/Assignment 8")
cars = readShapeSpatial("usa48_usedcars.shp")
head(cars@data)
## SP_ID tax_charge price_1960
## 0 AL 129 1461
## 1 AZ 218 1601
## 2 AR 176 1469
## 3 CA 252 1611
## 4 CO 186 1606
## 5 CT 154 1491
plot(cars, axes = T)
#legend("upperleft")
taxes = cars@data$tax_charge
prices = cars@data$price_1960
boxplot(taxes)
boxplot(prices)
length(prices)
## [1] 48
The boxplots show no outliers, and there are values for all 48 contiguous states.
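A quick numeric check of this impression (just a sketch, using base R's boxplot.stats(), which flags values more than 1.5 times the IQR beyond the quartiles):
summary(taxes)
summary(prices)
boxplot.stats(taxes)$out    # potential outliers in taxes (empty if none)
boxplot.stats(prices)$out   # potential outliers in prices (empty if none)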
#install.packages("classInt",repos="http://cran.us.r-project.org")
library(classInt)
#install.packages("RColorBrewer",repos="http://cran.us.r-project.org")
library(RColorBrewer)
nclr = 9
plotvar = taxes
class = classIntervals(plotvar, nclr, style = "pretty")
plotclr = brewer.pal(nclr, "YlOrRd")
colcode = findColours(class, plotclr, digits = 3)
plot(cars, axes = T, pch=1, col=colcode)
title(main = "Taxes and Delivery Charges for new cars")
legend("bottomleft", legend = names(attr(colcode,"table")),
fill = attr(colcode, "palette"),ncol=4,cex=0.6)
plotvar = prices
class = classIntervals(plotvar, nclr, style = "pretty")
plotclr = brewer.pal(nclr, "YlOrRd")
colcode = findColours(class, plotclr, digits = 3)
plot(cars, axes = T, pch=1, col=colcode)
title(main = "Avg Used Car Price")
legend("bottomleft", legend = names(attr(colcode,"table")),
fill = attr(colcode, "palette"),ncol=4,cex=0.6)
These two maps suggest much stronger spatial autocorrelation in average used-car prices than in taxes and delivery charges.
BUILDING NEIGHBOURHOOD FUNCTIONS
nbhf = poly2nb(cars,queen=F)
plot(cars)
coords = coordinates(cars)
plot(nbhf, coords, add=T, col=2, lwd=2)
A rook's case contiguity rule (queen = F, shared boundaries only) was used to build the neighbourhood structure. Since the economies of two adjacent states are likely to be similar, prices and taxes should be related between neighbouring states.
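For comparison, a queen's case neighbourhood (which also counts shared corners) could be built in the same way. The sketch below is only an illustration of how many extra links the queen criterion would add and is not part of the analysis:
nbhf_q = poly2nb(cars, queen=T)      # queen's case: shared edges or corners
table(card(nbhf_q) - card(nbhf))     # extra neighbours per state under queen's case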
sw_W = nb2listw(nbhf, style='W')
plot(cars)
plot(sw_W, coords, add=T, col=2, lwd=unlist(sw_W$weights[1:48])*10)
The above figure shows the neighbourhood structure, with line width proportional to the row-standardised weight: thicker lines indicate higher spatial weights.
The reason for choosing row-standardised ('W') weights: if a state is connected to only one other state, its economy is likely to be closely tied to that single neighbour, whereas a state connected to many states spreads its dependence across them, so each individual neighbour should receive a smaller weight.
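To illustrate the difference, a binary ('B') weights object can be built alongside the row-standardised one; this sketch just prints the weights for the first two states under each style:
sw_B = nb2listw(nbhf, style='B')   # binary weights: every neighbour gets weight 1
sw_W$weights[1:2]                  # row-standardised: each state's weights sum to 1
sw_B$weights[1:2]                  # binary: weight does not depend on the number of neighbours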
summary(sw_W)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 210
## Percentage nonzero weights: 9.114583
## Average number of links: 4.375
## Link number distribution:
##
## 1 2 3 4 5 6 7 8
## 1 4 9 13 9 9 1 2
## 1 least connected region:
## 16 with 1 link
## 2 most connected regions:
## 22 39 with 8 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 48 2304 48 24.26831 197.366
There are 210 total links with an average of 4.375 links per state.
link_no_dist = c(1,4,9,13,9,9,1,2)
barplot(link_no_dist, main = "Link Distribution", xlab = "Number of Links", ylab = "Number of states" ,names.arg=c(1:8))
box()
The above figure shows the distribution of neighbour counts: most states have between 3 and 6 links.
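The same distribution can also be computed directly from the neighbour object instead of typing it by hand (a sketch using spdep's card(), which returns the number of links for each region):
link_counts = card(nbhf)    # number of neighbours for each state
table(link_counts)          # reproduces the link number distribution above
mean(link_counts)           # 210/48 = 4.375 links per state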
cars.listw = nb2listw(nbhf, style='W')   # same row-standardised weights as sw_W above
moran.test(prices, cars.listw, alternative = "two.sided", randomisation = TRUE)
##
## Moran's I test under randomisation
##
## data: prices
## weights: cars.listw
##
## Moran I statistic standard deviate = 8.0425, p-value = 8.805e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.776348813 -0.021276596 0.009836021
With Moran's I = 0.78, a z-score of 8.04, and a p-value of 8.8e-16, there is strong evidence of positive spatial autocorrelation, so we reject the null hypothesis that the prices are spatially random.
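The reported standard deviate can be recovered from the sample estimates, and the same test can be run on taxes as a check on the earlier visual comparison of the two maps (taxes output not shown; a lower Moran's I would be expected):
(0.776348813 - (-0.021276596)) / sqrt(0.009836021)   # z = (I - E[I]) / sqrt(Var[I]) ~= 8.04
moran.test(taxes, cars.listw, alternative = "two.sided", randomisation = TRUE)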
cars.fit1 = lm(prices ~ taxes, data = cars)
plot(taxes, prices)
abline(cars.fit1)
summary(cars.fit1)
##
## Call:
## lm(formula = prices ~ taxes, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## taxes 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
The coefficient estimates are 1435.75 for the intercept and 0.6872 for taxes; both are highly significant, but the R-squared is only about 0.25.
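As a worked example of the fitted line, using Colorado's values from the head() output above (tax_charge = 186, observed price = 1606):
1435.7506 + 0.6872*186   # predicted price ~= 1564, about 42 below the observed 1606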
lm.morantest(cars.fit1, cars.listw)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
##
## Moran I statistic standard deviate = 6.2968, p-value = 1.519e-10
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.571027830 -0.030313582 0.009120029
This tests for spatial autocorrelation in the regression residuals. The high Moran's I (0.57), large z-score, and very small p-value indicate that the residuals are strongly autocorrelated.
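One way to see this autocorrelation is to map the OLS residuals with the same classIntervals/RColorBrewer approach used for the choropleths above. The sketch below assumes a diverging palette and a quantile classification, which are choices of mine rather than part of the assignment:
res = residuals(cars.fit1)
class_r = classIntervals(res, 5, style = "quantile")
col_r = findColours(class_r, brewer.pal(5, "RdBu"))
plot(cars, axes = T, col = col_r)
title(main = "OLS residuals")
legend("bottomleft", legend = names(attr(col_r,"table")),
fill = attr(col_r, "palette"), ncol = 3, cex = 0.6)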
#LAGRANGE MULTIPLIER TEST#
summary(lm.LMtests(cars.fit1, cars.listw, test=c("LMerr","LMlag")))
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
##
## statistic parameter p.value
## LMerr 30.957 1 2.638e-08 ***
## LMlag 39.371 1 3.504e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The non-robust versions are significant for both the error model and the lag model, so we turn to the robust versions of the tests.
summary(lm.LMtests(cars.fit1, cars.listw, test=c("RLMerr","RLMlag")))
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = prices ~ taxes, data = cars)
## weights: cars.listw
##
## statistic parameter p.value
## RLMerr 0.045696 1 0.83073
## RLMlag 8.460121 1 0.00363 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The robust tests show that only RLMlag is significant (p-value 0.00363, versus 0.83 for RLMerr), suggesting the use of a spatial lag model.
cars.fit2 = lagsarlm(prices~taxes, data=cars, cars.listw)
summary(cars.fit2)
##
## Call:lagsarlm(formula = prices ~ taxes, data = cars, listw = cars.listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.1308 -18.2568 4.3749 20.2427 74.1526
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 332.17378 128.35233 2.5880 0.009654
## taxes 0.18001 0.10523 1.7106 0.087158
##
## Rho: 0.76713, LR test value: 40.465, p-value: 2.0016e-10
## Asymptotic standard error: 0.085211
## z-value: 9.0028, p-value: < 2.22e-16
## Wald statistic: 81.05, p-value: < 2.22e-16
##
## Log likelihood: -240.9333 for lag model
## ML residual variance (sigma squared): 1096, (sigma: 33.106)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 489.87, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.3988, p-value: 0.12143
The intercept estimate is 332.17 with a p-value of 0.0097, significant at the 1% level, while the taxes coefficient is 0.18 with a p-value of 0.087, which is only marginally significant.
The value of rho (the spatial autoregressive coefficient) is 0.767, indicating a strong influence of neighbouring states' prices.
The AIC of the spatial lag model is 489.87, compared with 528.33 for the linear model; this reduction indicates that the spatial lag model explains the variability in prices better than the ordinary linear model.
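Two quick follow-up calculations (a sketch): the AIC difference can be computed directly from the two fitted models, and 1/(1 - rho) gives the usual average spatial multiplier implied by the lag term.
AIC(cars.fit1) - AIC(cars.fit2)   # ~= 528.33 - 489.87 = 38.46 in favour of the lag model
1/(1 - 0.76713)                   # ~= 4.3: average spatial multiplier implied by rho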
The LM test for residual autocorrelation gives a statistic of 2.4 with a p-value of 0.12, which is not significant. This indicates there is no remaining residual autocorrelation, so the model has adequately accounted for the spatial dependence.
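As an informal cross-check (moran.test is not strictly designed for model residuals, and its output is not shown here), Moran's test can also be applied to the lag-model residuals; given the LM result above it should not be significant:
moran.test(residuals(cars.fit2), cars.listw, alternative = "two.sided")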
I think the coefficient relating taxes to used-car prices is only marginally significant once spatial dependence is modelled; a state's used-car prices depend mostly on the prices in neighbouring states.