En este laboratorio, las/los estudiantes deberán utilizar sus conocimientos adquiridos en análisis exploratorio de datos espaciales, para implementar modelos de regresión lineales espaciales. De esta forma, los elementos que consideraremos para la realización de este laboratorio son:
Antes de comenzar, debemos incluir los paquetes estadísticos necesarios para llevar a cabo el análisis exploratorio espacial, la construcción de matrices de peso espacial y medidas de autocorrelación. Preliminarmente, estos paquetes son: (Estos paquetes se incluyen de acuerdo al supuesto de que ya están instalados en R. En el caso de que no estén instalados, usar el comando install.package(“nombre del paquete”). También, es posible instalar los paquetes en el menú Herramientas (tools) –> instalar paquetes)
library(spatialreg)
library(spdep)
library(maptools)
library(RColorBrewer)
library(leaflet)
library(dplyr)
library(ggplot2)
library(tmap)
library(tmaptools)
library(lmtest)
library(splm)
library(data.table)
Por favor, fijar el directorio de trabajo en un lugar de su computador que sea de fácil acceso.
En esta sección, usaremos la siguiente base de datos que contiene información acerca de los precios de autos usados y el pago de impuestos por autos nuevos por Estado.
Esta base de datos corresponde al paquete spdep.
data("used.cars")
ls()
## [1] "usa48.nb" "used.cars"
str("used.cars")
## chr "used.cars"
data1<-used.cars
setDT(data1, keep.rownames = "ID_STATE")[]
## ID_STATE tax.charges price.1960
## 1: AL 129 1461
## 2: AZ 218 1601
## 3: AR 176 1469
## 4: CA 252 1611
## 5: CO 186 1606
## 6: CT 154 1491
## 7: DE 92 1536
## 8: FL 150 1517
## 9: GA 149 1481
## 10: ID 168 1659
## 11: IL 138 1515
## 12: IN 52 1460
## 13: IA 195 1592
## 14: KS 141 1574
## 15: KY 144 1418
## 16: LA 165 1509
## 17: ME 159 1547
## 18: MD 139 1510
## 19: MA 96 1572
## 20: MI 133 1509
## 21: MN 82 1586
## 22: MS 159 1460
## 23: MO 136 1468
## 24: MT 196 1631
## 25: NE 97 1584
## 26: NV 220 1636
## 27: NH 96 1539
## 28: NJ 89 1520
## 29: NM 185 1626
## 30: NY 115 1544
## 31: NC 122 1477
## 32: ND 153 1609
## 33: OH 135 1466
## 34: OK 171 1468
## 35: OR 164 1627
## 36: PA 161 1502
## 37: RI 174 1555
## 38: SC 153 1465
## 39: SD 172 1601
## 40: TN 133 1463
## 41: TX 178 1511
## 42: UT 257 1647
## 43: VT 112 1559
## 44: VA 93 1495
## 45: WA 265 1592
## 46: WV 105 1470
## 47: WI 58 1473
## 48: WY 188 1655
## ID_STATE tax.charges price.1960
attach(data1)
Para realizar el análisis exploratorio de datos, usaremos el siguiente archivo shape:
US.shape<-readShapePoly('USA_STATES.shp')
class(US.shape)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
plot(US.shape)
attach(US.shape@data)
Con el fin de explorar la variable dependiente correspondiente a los precios de los autos usados, uniremos al shape la información contenida en la base de datos que hemos obtenido en el paquete.
US.shape@data<-merge(US.shape@data, data1, by.x="STATE_ABBR", by.y="ID_STATE", all.x=TRUE, sort=FALSE)
Graficamos el comportamiento de la variable dependiente del modelo que estimaremos a continuación.
qpal<-colorQuantile("OrRd", US.shape@data$price.1960, n=9)
leaflet(US.shape) %>%
addPolygons(stroke = FALSE, fillOpacity = .8, smoothFactor = 0.2, color = ~qpal(price.1960)
) %>%
addTiles()
## rgeos version: 0.6-4, (SVN revision 699)
## GEOS runtime version: 3.11.0-CAPI-1.17.0
## Please note that rgeos will be retired during October 2023,
## plan transition to sf or terra functions using GEOS at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html for details.
## GEOS using OverlayNG
## Linking to sp version: 2.0-0
## Polygon checking: TRUE
Para comenzar con el análisis, primero calcularemos un modelo de regresión lineal a través de OLS. De esta forma, el modelo que construiremos estará determinado por:
\[ \begin{align*} y=\beta_{0}+\beta_{1}X_{1}+\epsilon \end{align*} \] Donde la variable dependiente \(y\) es el precio de los autos usados en 1960 y la variable explicativa \(X_{1}\) son los impuestos.
Asuminos los supuestos usuales de regresión lineal, los cuales revisaremos a continuación. Estimamos el modelo a través de OLS.
modelUS_1<-lm(price.1960 ~ tax.charges, data=data1)
summary(modelUS_1)
##
## Call:
## lm(formula = price.1960 ~ tax.charges, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 2e-16 ***
## tax.charges 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
Para verificar que la relación a estudiar no genera residuos que se encuentren correlacionados espacialmente, extraemos los errores del modelo OLS y aplicamos el contraste de hipótesis usual, usando el estadístico de Moran (ó Moran’s I):
Antes de aplicar el contraste de hipótesis, construimos la matriz de peso espacial \(W\). Usamos la estructura de vecindario definida por los autores de la base de datos.
rook.US<-nb2listw(usa48.nb, style = "W")
summary(rook.US)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48
## Number of nonzero links: 214
## Percentage nonzero weights: 9.288194
## Average number of links: 4.458333
## Link number distribution:
##
## 1 2 3 4 5 6 7 8
## 1 4 9 11 10 9 2 2
## 1 least connected region:
## ME with 1 link
## 2 most connected regions:
## MO TN with 8 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 48 2304 48 23.94514 197.3707
lm.morantest(modelUS_1, listw = rook.US)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## Moran I statistic standard deviate = 6.3869, p-value = 8.466e-11
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.574817771 -0.030300549 0.008976437
lm.LMtests(modelUS_1, listw = rook.US, test = "all" )
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## LMerr = 31.793, df = 1, p-value = 1.715e-08
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## LMlag = 40.664, df = 1, p-value = 1.808e-10
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## RLMerr = 0.051751, df = 1, p-value = 0.82
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## RLMlag = 8.9229, df = 1, p-value = 0.002816
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = data1)
## weights: rook.US
##
## SARMA = 40.716, df = 2, p-value = 1.441e-09
A pesar de que nuestro modelo goza de las características ideales asociadas a un modelo de regresión líneal estándar, hemos comprobado que los residuos de la regresión presentan autocorrelación espacial. Por tanto, para reconocer la existencia de esta estructura espacial en los errores, usamos el Modelo del Error Espacial. Este modelo se define como:
\[ \begin{align*} y&=\beta_{0}+X_{1}\beta_{1}+\mu \\ \mu&=\lambda W \mu + \epsilon \end{align*} \]
Donde la variable \(X_1\) es el impuesto, mientras que \(W\) es la matriz de peso espacial. Asumimos que la matriz W está estándarizada por filas. El parámetro \(\lambda\) es el indicador que mide la fuerza espacial de los residuos y los residuos de los vecinos. En R, estimamos el modelo vía Maximum Likelihood (máxima verosimilitud) de la siguiente forma:
US.sem1<-errorsarlm(price.1960 ~ tax.charges, listw=rook.US, data=data1)
summary(US.sem1)
##
## Call:errorsarlm(formula = price.1960 ~ tax.charges, data = data1,
## listw = rook.US)
##
## Residuals:
## Min 1Q Median 3Q Max
## -74.824 -17.459 2.406 21.278 64.597
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.5283e+03 3.1963e+01 47.8167 <2e-16
## tax.charges 8.8309e-02 1.1923e-01 0.7406 0.4589
##
## Lambda: 0.819, LR test value: 40.899, p-value: 1.603e-10
## Asymptotic standard error: 0.074051
## z-value: 11.06, p-value: < 2.22e-16
## Wald statistic: 122.32, p-value: < 2.22e-16
##
## Log likelihood: -240.7163 for error model
## ML residual variance (sigma squared): 1043.9, (sigma: 32.309)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: NA (not available for weighted model), (AIC for lm: 528.33)
En este modelo, la variable \(X_1\) deja de ser significativa. Mientras que \(\lambda\) nos indica que el modelo tiene una fuerte correlación espacial en los residuos.
Podemos seguir explorando el comportamiento de los residuos, para cercionarnos de que SEM podría eventualmente ser un modelo adecuado para expresar la relación entre ambas variables.
US.sem2<-GMerrorsar(price.1960 ~ tax.charges, listw=rook.US, data=data1)
summary(US.sem2)
##
## Call:GMerrorsar(formula = price.1960 ~ tax.charges, data = data1,
## listw = rook.US)
##
## Residuals:
## Min 1Q Median 3Q Max
## -120.6190 -57.9041 -1.5585 53.3029 116.1085
##
## Type: GM SAR estimator
## Coefficients: (GM standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1512.98359 28.69940 52.7183 <2e-16
## tax.charges 0.17802 0.15018 1.1854 0.2359
##
## Lambda: 0.65398 (standard error): 0.21084 (z-value): 3.1017
## Residual variance (sigma squared): 1672.5, (sigma: 40.896)
## GM argmin sigma squared: 1692.6
## Number of observations: 48
## Number of parameters estimated: 4
US.sar1<-lagsarlm(price.1960 ~ tax.charges, listw=rook.US, data=data1)
summary(US.sar1)
##
## Call:
## lagsarlm(formula = price.1960 ~ tax.charges, data = data1, listw = rook.US)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.6781 -16.9504 4.2497 19.5484 58.9811
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 309.42546 123.03167 2.5150 0.0119
## tax.charges 0.16711 0.10212 1.6364 0.1018
##
## Rho: 0.78302, LR test value: 42.681, p-value: 6.4426e-11
## Asymptotic standard error: 0.081636
## z-value: 9.5916, p-value: < 2.22e-16
## Wald statistic: 91.998, p-value: < 2.22e-16
##
## Log likelihood: -239.8252 for lag model
## ML residual variance (sigma squared): 1036.7, (sigma: 32.197)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 487.65, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.1141, p-value: 0.14594
En esta sección, usaremos la siguiente base de datos.
data(boston)
ls()
## [1] "boston.c" "boston.soi" "boston.utm" "data1" "modelUS_1"
## [6] "qpal" "rook.US" "US.sar1" "US.sem1" "US.sem2"
## [11] "US.shape" "usa48.nb" "used.cars"
data2<-boston.c
attach(data2)
Esta base de datos está representada por centroides. Son 506 observaciones de los precios de vivienda de Boston. Con esta información, exploraremos los posibles determinantes de los precios de vivienda en Boston, usando como proxy de los precios de vivienda el precio promedio de 506 census tracts (blocks) y otras variables explicativas medidas al mismo nivel de datos.
map<-SpatialPointsDataFrame(data=data2, coords = cbind(LON,LAT))
colours=c("dark blue", "blue", "red", "dark red")
spplot(map, "MEDV", cuts=quantile(MEDV), col.regions=colours, cex=0.7,
contour=TRUE)
Housing Prices by Census Tracts in Boston
leaflet(data=data2) %>%
addTiles() %>%
addMarkers(clusterOptions = markerClusterOptions())
## Assuming "LON" and "LAT" are longitude and latitude, respectively
Para comenzar con el análisis, primero calcularemos un modelo de regresión lineal a través de OLS. De esta forma, el modelo que construiremos estará determinado por:
\[ y=X\beta +\epsilon \] Donde la variable dependiente \(y\) corresponde al precio de las viviendas, mientras que la matriz \(X\) contiene todos los posibles determinantes de los precios de vivienda.
model_SAR_1<-lm(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, data=data2)
summary(model_SAR_1)
##
## Call:
## lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.7429 -2.8887 -0.7514 1.8144 26.8277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.308337 5.199690 7.175 2.66e-12 ***
## CRIM -0.103402 0.033339 -3.102 0.002035 **
## RM 4.074379 0.420639 9.686 < 2e-16 ***
## INDUS 0.018212 0.062015 0.294 0.769138
## NOX -17.829176 3.889690 -4.584 5.79e-06 ***
## AGE -0.002647 0.013353 -0.198 0.842957
## DIS -1.210182 0.186123 -6.502 1.94e-10 ***
## RAD 0.304603 0.066878 4.555 6.62e-06 ***
## PTRATIO -1.131146 0.126079 -8.972 < 2e-16 ***
## B 0.009853 0.002735 3.603 0.000346 ***
## LSTAT -0.525072 0.051543 -10.187 < 2e-16 ***
## TAX -0.010901 0.003710 -2.939 0.003452 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.838 on 494 degrees of freedom
## Multiple R-squared: 0.7293, Adjusted R-squared: 0.7233
## F-statistic: 121 on 11 and 494 DF, p-value: < 2.2e-16
También, analizamos si el modelo estimado cumple con los supuestos básicos de regresión: normalidad y homogeneidad.
De acuerdo a la tipología de los datos, usaremos una matriz de peso espacial basada en distancias. Por tanto, primero obtenemos las coordenadas de esta base de datos, para calcular la matriz W a través del siguiente procedimiento:
Para obtener la matriz W, basada en distancias usamos la siguiente función:
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 506
## Number of nonzero links: 255530
## Percentage nonzero weights: 99.80237
## Average number of links: 505
## Link number distribution:
##
## 505
## 506
## 506 least connected regions:
## 2011 2021 2022 2031 2032 2033 2041 2042 2043 2044 2045 2046 2047 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2081 2082 2083 2084 2091 2092 2101 2102 2103 2104 2105 2106 2107 2108 2109 2111 2112 2113 2114 2121 2141 2151 2161 2171 2172 2173 2174 2175 2176 2181 3301 3302 3311 3312 3313 3321 3322 3323 3324 3331 3332 3333 3334 3335 3336 3341 3342 3343 3344 3351 3352 3353 3354 3361 3362 3363 3364 3371 3372 3373 3381 3382 3383 3384 3385 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3411 3412 3413 3414 3415 3416 3417 3418 3419 3421 3422 3423 3424 3425 3426 3427 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3561 3562 3563 3564 3565 3566 3567 3571 3572 3573 3574 3575 3576 3577 3578 3581 3583 3584 3585 3586 3587 3591 3593 3602 3611 3612 3613 3651 3652 3661 3662 3671 3672 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3701 3702 3703 3704 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3821 3822 3823 3824 3825 3826 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3851 3852 3861 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4021 4022 4023 4024 4025 4031 4032 4033 4034 4035 4041 4042 4043 4044 4051 4061 4071 4091 4111 4112 4113 4121 4122 4123 4131 4132 4133 4134 4135 4141 4142 4143 4151 4152 4153 4161 4162 4163 4164 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4191 4192 4193 4194 4195 4196 4197 4198 4201 4202 4203 4211 4212 4221 4222 4223 4224 4225 4226 4227 4228 4231 5001 5011 5012 5021 5022 5031 5041 5051 5052 5061 5062 5071 5081 5082 1 2 3 4 5 6 7 8 101 102 104 105 107 108 201 202 203 301 302 401 402 403 404 405 406 407 501 502 503 504 506 507 508 509 510 511 512 601 602 603 604 605 606 607 608 609 610 612 613 614 702 703 705 706 707 708 709 710 801 802 803 805 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 901 902 903 904 905 906 907 908 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1101 1102 1103 1104 1105 1106 1201 1202 1203 1204 1205 1206 1207 1301 1302 1303 1304 1401 1402 1403 1404 1601 1602 1604 1605 1606 1701 1702 1703 1704 1705 1706 1707 1708 1801 1802 1803 1804 1805 with 505 links
## 506 most connected regions:
## 2011 2021 2022 2031 2032 2033 2041 2042 2043 2044 2045 2046 2047 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2081 2082 2083 2084 2091 2092 2101 2102 2103 2104 2105 2106 2107 2108 2109 2111 2112 2113 2114 2121 2141 2151 2161 2171 2172 2173 2174 2175 2176 2181 3301 3302 3311 3312 3313 3321 3322 3323 3324 3331 3332 3333 3334 3335 3336 3341 3342 3343 3344 3351 3352 3353 3354 3361 3362 3363 3364 3371 3372 3373 3381 3382 3383 3384 3385 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3411 3412 3413 3414 3415 3416 3417 3418 3419 3421 3422 3423 3424 3425 3426 3427 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3561 3562 3563 3564 3565 3566 3567 3571 3572 3573 3574 3575 3576 3577 3578 3581 3583 3584 3585 3586 3587 3591 3593 3602 3611 3612 3613 3651 3652 3661 3662 3671 3672 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3701 3702 3703 3704 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3821 3822 3823 3824 3825 3826 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3851 3852 3861 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4021 4022 4023 4024 4025 4031 4032 4033 4034 4035 4041 4042 4043 4044 4051 4061 4071 4091 4111 4112 4113 4121 4122 4123 4131 4132 4133 4134 4135 4141 4142 4143 4151 4152 4153 4161 4162 4163 4164 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4191 4192 4193 4194 4195 4196 4197 4198 4201 4202 4203 4211 4212 4221 4222 4223 4224 4225 4226 4227 4228 4231 5001 5011 5012 5021 5022 5031 5041 5051 5052 5061 5062 5071 5081 5082 1 2 3 4 5 6 7 8 101 102 104 105 107 108 201 202 203 301 302 401 402 403 404 405 406 407 501 502 503 504 506 507 508 509 510 511 512 601 602 603 604 605 606 607 608 609 610 612 613 614 702 703 705 706 707 708 709 710 801 802 803 805 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 901 902 903 904 905 906 907 908 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1101 1102 1103 1104 1105 1106 1201 1202 1203 1204 1205 1206 1207 1301 1302 1303 1304 1401 1402 1403 1404 1601 1602 1604 1605 1606 1701 1702 1703 1704 1705 1706 1707 1708 1801 1802 1803 1804 1805 with 505 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 506 256036 506 4.736809 2065.127
También, podemos usar la matriz de peso espacial que viene por defecto con la base de datos:
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 506
## Number of nonzero links: 2152
## Percentage nonzero weights: 0.8405068
## Average number of links: 4.252964
## Link number distribution:
##
## 1 2 3 4 5 6 7 8
## 10 39 93 143 143 52 22 4
## 10 least connected regions:
## 2033 2081 2181 3592 0001 0101 0108 0201 1301 1805 with 1 link
## 4 most connected regions:
## 3564 3744 4196 4223 with 8 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 506 256036 506 261.3891 2071.255
lm.morantest(model_SAR_1, listw = dist.W1)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## Moran I statistic standard deviate = 20.425, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 5.107215e-02 -4.346662e-03 7.361799e-06
residuals_model_SAR_1<-residuals(model_SAR_1)
moran.test(residuals_model_SAR_1, listw = dist.W2, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_model_SAR_1
## weights: dist.W2
##
## Moran I statistic standard deviate = 15.17, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.480765673 -0.001980198 0.001012720
lm.morantest(model_SAR_1, listw = dist.W2)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W2
##
## Moran I statistic standard deviate = 15.841, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.4807656728 -0.0152903082 0.0009806596
mc2<-moran.mc(residuals_model_SAR_1, dist.W2, nsim=599)
plot(mc2)
moran.test(residuals_model_SAR_1, listw = dist.W2, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_model_SAR_1
## weights: dist.W2
##
## Moran I statistic standard deviate = 15.17, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.480765673 -0.001980198 0.001012720
mc2<-moran.mc(residuals_model_SAR_1, dist.W2, nsim=599)
plot(mc2)
lm.LMtests(model_SAR_1, listw = dist.W1, test = c("LMerr", "LMlag","RLMerr", "RLMlag", "SARMA"))
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## LMerr = 140.99, df = 1, p-value < 2.2e-16
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## LMlag = 37.785, df = 1, p-value = 7.899e-10
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## RLMerr = 105.32, df = 1, p-value < 2.2e-16
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## RLMlag = 2.1177, df = 1, p-value = 0.1456
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS + RAD +
## PTRATIO + B + LSTAT + TAX, data = data2)
## weights: dist.W1
##
## SARMA = 143.11, df = 2, p-value < 2.2e-16
Este modelo está definido como:
\[ y=\rho W y +X\beta+\epsilon \] Donde la variable dependiente es una combinación lineal de las variables explicativas \(X\) y el rezago espacial de la variable dependiente. En este caso, \(\rho\) mide la fuerza de la relación líneal entre la variable dependiente y su rezago espacial.
boston.sar1<-lagsarlm(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W1, data=data2)
summary(boston.sar1)
##
## Call:lagsarlm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.96036 -2.66180 -0.85651 1.53334 27.37593
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.2873055 5.7725789 1.9553 0.0505440
## CRIM -0.0853643 0.0320798 -2.6610 0.0077909
## RM 4.0709460 0.4027779 10.1072 < 2.2e-16
## INDUS -0.0133614 0.0594272 -0.2248 0.8221062
## NOX -12.8163015 3.7757534 -3.3944 0.0006879
## AGE -0.0020591 0.0127871 -0.1610 0.8720704
## DIS -1.1553758 0.1782001 -6.4836 8.957e-11
## RAD 0.2812676 0.0641476 4.3847 1.161e-05
## PTRATIO -0.8771772 0.1236958 -7.0914 1.328e-12
## B 0.0084516 0.0026200 3.2258 0.0012561
## LSTAT -0.4581267 0.0498422 -9.1915 < 2.2e-16
## TAX -0.0092039 0.0035579 -2.5869 0.0096844
##
## Rho: 0.82056, LR test value: 29.813, p-value: 4.7582e-08
## Asymptotic standard error: 0.096046
## z-value: 8.5434, p-value: < 2.22e-16
## Wald statistic: 72.99, p-value: < 2.22e-16
##
## Log likelihood: -1494.707 for lag model
## ML residual variance (sigma squared): 21.425, (sigma: 4.6287)
## Number of observations: 506
## Number of parameters estimated: 14
## AIC: 3017.4, (AIC for lm: 3045.2)
## LM test for residual autocorrelation
## test value: 155.28, p-value: < 2.22e-16
boston.sar2<-lagsarlm(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W2, data=data2)
summary(boston.sar2)
##
## Call:lagsarlm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5758 -2.2207 -0.6105 1.5095 25.8126
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.9494723 4.6246838 2.1514 0.0314459
## CRIM -0.0563840 0.0278782 -2.0225 0.0431238
## RM 3.8170563 0.3563235 10.7123 < 2.2e-16
## INDUS 0.0327685 0.0514609 0.6368 0.5242783
## NOX -7.4193206 3.3049403 -2.2449 0.0247734
## AGE -0.0129553 0.0110875 -1.1685 0.2426207
## DIS -0.8367538 0.1593477 -5.2511 1.512e-07
## RAD 0.2116813 0.0560577 3.7761 0.0001593
## PTRATIO -0.6005236 0.1105190 -5.4337 5.521e-08
## B 0.0075613 0.0022839 3.3106 0.0009309
## LSTAT -0.2885109 0.0447018 -6.4541 1.088e-10
## TAX -0.0091681 0.0030888 -2.9682 0.0029959
##
## Rho: 0.45861, LR test value: 146.81, p-value: < 2.22e-16
## Asymptotic standard error: 0.033612
## z-value: 13.644, p-value: < 2.22e-16
## Wald statistic: 186.17, p-value: < 2.22e-16
##
## Log likelihood: -1436.21 for lag model
## ML residual variance (sigma squared): 16.108, (sigma: 4.0135)
## Number of observations: 506
## Number of parameters estimated: 14
## AIC: 2900.4, (AIC for lm: 3045.2)
## LM test for residual autocorrelation
## test value: 36.098, p-value: 1.8763e-09
residuals_sar_2<-residuals(boston.sar2)
moran.test(residuals_sar_2, listw = dist.W2, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_sar_2
## weights: dist.W2
##
## Moran I statistic standard deviate = 4.4062, p-value = 5.26e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.138239408 -0.001980198 0.001012720
residuals_sar_1<-residuals(boston.sar1)
moran.test(residuals_sar_1, listw = dist.W1, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_sar_1
## weights: dist.W1
##
## Moran I statistic standard deviate = 13.145, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 4.032091e-02 -1.980198e-03 1.035628e-05
boston.sar3<-stsls(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W1, data=data2)
summary(boston.sar3)
##
## Call:stsls(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.92516 -2.67995 -0.83073 1.54768 27.28718
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rho 0.6877218 0.1735907 3.9617 7.440e-05
## (Intercept) 15.4998294 7.4708461 2.0747 0.0380135
## CRIM -0.0882844 0.0326087 -2.7074 0.0067815
## RM 4.0715017 0.4085982 9.9646 < 2.2e-16
## INDUS -0.0082501 0.0606092 -0.1361 0.8917269
## NOX -13.6278318 3.9243451 -3.4726 0.0005154
## AGE -0.0021542 0.0129713 -0.1661 0.8680965
## DIS -1.1642484 0.1811661 -6.4264 1.306e-10
## RAD 0.2850454 0.0651511 4.3751 1.214e-05
## PTRATIO -0.9182920 0.1337369 -6.8664 6.584e-12
## B 0.0086785 0.0026728 3.2470 0.0011664
## LSTAT -0.4689645 0.0520321 -9.0130 < 2.2e-16
## TAX -0.0094787 0.0036214 -2.6174 0.0088608
##
## Residual variance (sigma squared): 22.085, (sigma: 4.6994)
boston.sar4<-stsls(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W2, data=data2)
summary(boston.sar4)
##
## Call:stsls(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.41206 -2.34302 -0.66736 1.39880 25.95890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Rho 0.3684463 0.0555233 6.6359 3.226e-11
## (Intercept) 15.3282846 5.5476952 2.7630 0.0057273
## CRIM -0.0656279 0.0290967 -2.2555 0.0241015
## RM 3.8676465 0.3613648 10.7029 < 2.2e-16
## INDUS 0.0299066 0.0531074 0.5631 0.5733437
## NOX -9.4659209 3.5597068 -2.6592 0.0078330
## AGE -0.0109286 0.0114966 -0.9506 0.3418086
## DIS -0.9101707 0.1655913 -5.4965 3.874e-08
## RAD 0.2299499 0.0583353 3.9419 8.085e-05
## PTRATIO -0.7048451 0.1255846 -5.6125 1.994e-08
## B 0.0080119 0.0023569 3.3994 0.0006754
## LSTAT -0.3350193 0.0525966 -6.3696 1.895e-10
## TAX -0.0095089 0.0031821 -2.9883 0.0028057
##
## Residual variance (sigma squared): 17.146, (sigma: 4.1407)
residuals_stsls_3<-residuals(boston.sar3)
moran.test(residuals_stsls_3, listw = dist.W1, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_stsls_3
## weights: dist.W1
##
## Moran I statistic standard deviate = 13.383, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 4.108651e-02 -1.980198e-03 1.035628e-05
residuals_stsls_2<-residuals(boston.sar4)
moran.test(residuals_stsls_2, listw = dist.W2, randomisation = FALSE)
##
## Moran I test under normality
##
## data: residuals_stsls_2
## weights: dist.W2
##
## Moran I statistic standard deviate = 6.9757, p-value = 1.521e-12
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.220010789 -0.001980198 0.001012720
boston.sarar1<-sacsarlm(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W1, data=data2, Durbin = FALSE)
summary(boston.sarar1)
##
## Call:sacsarlm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W1,
## Durbin = FALSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.61193 -2.50095 -0.71766 1.59942 26.15880
##
## Type: sac
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.7151466 7.3569857 1.1846 0.2361723
## CRIM -0.1056816 0.0317374 -3.3299 0.0008689
## RM 4.3246443 0.3949666 10.9494 < 2.2e-16
## INDUS -0.0196950 0.0607681 -0.3241 0.7458619
## NOX -14.1141423 4.0996687 -3.4428 0.0005758
## AGE -0.0099845 0.0129829 -0.7690 0.4418644
## DIS -1.1836532 0.1942527 -6.0934 1.106e-09
## RAD 0.3064301 0.0685436 4.4706 7.800e-06
## PTRATIO -0.8407105 0.1319319 -6.3723 1.862e-10
## B 0.0083987 0.0027897 3.0106 0.0026077
## LSTAT -0.4470199 0.0507824 -8.8027 < 2.2e-16
## TAX -0.0099111 0.0035690 -2.7770 0.0054869
##
## Rho: 0.79351
## Asymptotic standard error: 0.18993
## z-value: 4.178, p-value: 2.941e-05
## Lambda: 0.95047
## Asymptotic standard error: 0.056897
## z-value: 16.705, p-value: < 2.22e-16
##
## LR test value: 63.451, p-value: 1.6653e-14
##
## Log likelihood: -1477.888 for sac model
## ML residual variance (sigma squared): 19.843, (sigma: 4.4545)
## Number of observations: 506
## Number of parameters estimated: 15
## AIC: 2985.8, (AIC for lm: 3045.2)
boston.sarar2<-sacsarlm(MEDV ~ CRIM+RM+INDUS+NOX+AGE+DIS+RAD+PTRATIO+B+LSTAT+TAX, listw=dist.W2, data=data2, Durbin = FALSE)
summary(boston.sarar2)
##
## Call:sacsarlm(formula = MEDV ~ CRIM + RM + INDUS + NOX + AGE + DIS +
## RAD + PTRATIO + B + LSTAT + TAX, data = data2, listw = dist.W2,
## Durbin = FALSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.1904 -2.0443 -0.5744 1.3632 22.9109
##
## Type: sac
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 25.1787143 5.8728488 4.2873 1.809e-05
## CRIM -0.0832015 0.0276981 -3.0039 0.0026657
## RM 4.3310149 0.3585701 12.0786 < 2.2e-16
## INDUS -0.0331919 0.0710612 -0.4671 0.6404359
## NOX -13.2333265 5.0787924 -2.6056 0.0091712
## AGE -0.0342447 0.0130598 -2.6221 0.0087378
## DIS -1.3151723 0.2492327 -5.2769 1.314e-07
## RAD 0.2661552 0.0732950 3.6313 0.0002820
## PTRATIO -0.7619121 0.1433341 -5.3156 1.063e-07
## B 0.0112946 0.0029121 3.8786 0.0001051
## LSTAT -0.3560926 0.0516072 -6.9001 5.198e-12
## TAX -0.0120994 0.0035789 -3.3808 0.0007229
##
## Rho: 0.10485
## Asymptotic standard error: 0.058441
## z-value: 1.7941, p-value: 0.072796
## Lambda: 0.54902
## Asymptotic standard error: 0.058509
## z-value: 9.3834, p-value: < 2.22e-16
##
## LR test value: 193.67, p-value: < 2.22e-16
##
## Log likelihood: -1412.778 for sac model
## ML residual variance (sigma squared): 14.211, (sigma: 3.7698)
## Number of observations: 506
## Number of parameters estimated: 15
## AIC: 2855.6, (AIC for lm: 3045.2)