Areal pattern analysis

Introduction

This document illustrates areal pattern analysis using R. Much of the text included here is borrowed from Oyana & Margai (2016).

Spatial Autocorrelation

A statistical test is applied to determine whether there is a match between locational and attribute similarity. The effects of spatial autocorrelation are commonly quantified using Moran’s I index and Geary’s C ratio, both of which are statistical in nature. Both methods use a measure known as the spatial autocorrelation coefficient, which statistically tests how clustered/dispersed features lie in space with respect to their attribute values. The measure examines whether an observed attribute of a variable at one location is independent of values of that variable at neighboring locations. If these values are similar and statistically significant, then we can conclude that positive spatial autocorrelation is evident in the spatial distribution. However, in a case when values in the neighboring location exhibit different characteristics (are dissimilar), then we can conclude that spatial autocorrelation is weak or nonexistent in the spatial distribution.

Joint Count Statistics

This is a basic method that quantitatively determines the degree of cluster- ing or dispersion among a set of spatially adjacent polygons. It is used for binary nominal data such as 1/0, yes/ no, arable/nonarable lands, and urban/rural counties. The method measures the spatial relationships between similar or dissimilar attributes in adjacent areas. The binary variable is denoted by two colors, black (B) and white (W).

If a given attribute of 1 occurs in an area, then the area will be assigned B. If it does not and has an attribute of 0, then it will be assigned W. If two neighboring areas share a common boundary, they are conceptualized as joined.

There are three possible types of joins: black-black (BB), two B neighboring areas; white-white (WW), two W neighboring areas; and black-white (BW), B and W neighboring areas. Join counts tally the numbers of black-black, white-white, and black-white joins in the study area. Observed join counts are derived as follows:

where \(x_i\) is the observer value for variant \(X_i\) , \(x_i\) = 1 when the ith area is B, \(x_i\) = 0 when the ith area is \(W\), and \(w_{ij}\) is weight for each pair of objects \(i\) and \(j\). We use the observed patterns of join counts to compare whether it is ij different from a random/expected pattern under the null hypothesis of no spatial autocorrelation.

Each of the null hypotheses for the three types of joins determines whether the compared differences are statistically significant at p-value < .05. This is done by calculating the Z-test for each join and deciding whether the null hypothesis is true. The Z-test is calculated as

A z-score for each of the joins in the areal pattern A is calculated using the code below.

# load libraries
library(raster)

## Warning: package 'raster' was built under R version 3.2.5

## Loading required package: sp

## Warning: package 'sp' was built under R version 3.2.5

library(sp)
library(spdep)

## Warning: package 'spdep' was built under R version 3.2.5

## Loading required package: Matrix

# create a matrix representing areal pattern A
pri <- rep(1,12)
seg <- rep(0,4)
ter <- rep(1,2)
cua <- rep(0,4)
qui <- rep(1,2)
sex <-  rep(0,12)

A <- matrix(c(pri, seg, ter, cua ,qui, sex), nrow=6, byrow=FALSE)

A

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    0    0    0    0
## [2,]    1    1    0    0    0    0
## [3,]    1    1    0    0    0    0
## [4,]    1    1    0    0    0    0
## [5,]    1    1    1    1    0    0
## [6,]    1    1    1    1    0    0

# convert matrix to raster
rA <- raster(A)

rA

## class       : RasterLayer 
## dimensions  : 6, 6, 36  (nrow, ncol, ncell)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## data source : in memory
## names       : layer 
## values      : 0, 1  (min, max)

Let’s plot the raster plus corresponding cell values:

plot(rA)
text(coordinates(rA), labels=rA[], cex=1.5)

Let’s convert raster data into vector data to be able to use the \(spdep\) library to determine which polygons are “near”, and how to quantify that. Here we’ll use adjacency as criterion. To find adjacent polygons, we can use package ‘spdep’ (Bivand and Piras, 2015).

pA <- rasterToPolygons(rA, dissolve=FALSE)

pA

## class       : SpatialPolygonsDataFrame 
## features    : 36 
## extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## variables   : 1
## names       : layer 
## min values  :     0 
## max values  :     1

spA <- SpatialPolygons(pA@polygons)


# create list of neighborhoods
# queen case

nb1 <- poly2nb(spA, queen = T)

nb1

## Neighbour list object:
## Number of regions: 36 
## Number of nonzero links: 220 
## Percentage nonzero weights: 16.97531 
## Average number of links: 6.111111

Let’s plot adjacent polygons using the Queen criteria:

par(mai=c(0,0,0,0))
plot(spA, col='gray', border='blue')
xy <- coordinates(spA)
plot(nb1, xy, col='red', lwd=2, add=TRUE)

Now, let’s create and visualize adjacent polygons using the Rook criteria:

# create list of neighborhoods
# rook case 

nb2 <- poly2nb(spA, queen = F)

nb2

## Neighbour list object:
## Number of regions: 36 
## Number of nonzero links: 120 
## Percentage nonzero weights: 9.259259 
## Average number of links: 3.333333

par(mai=c(0,0,0,0))
plot(spA, col='gray', border='blue')
xy <- coordinates(spA)
plot(nb2, xy, col='green', lwd=2, add=TRUE)

We need to transform nb objects into spatial weights lists. A spatial weights list reflects the intensity of the geographic relationship between observations. In this case we use a binary (B) criteria, i.e. there is adjacency (1) or there is no adjacency (0). Then, we apply the Join Count test to evaluate presence or absence or spatial autocorrelation:

wl1 <- nb2listw(nb1, style='B')

wl2 <- nb2listw(nb2, style='B')

jc_test1 <- joincount.test(as.factor(pA$layer), wl1)

jc_test1

## 
##  Join count test under nonfree sampling
## 
## data:  as.factor(pA$layer) 
## weights: wl1 
## 
## Std. deviate for 0 = 5.1529, p-value = 1.282e-07
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##              53.00000              33.17460              14.80263 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  as.factor(pA$layer) 
## weights: wl1 
## 
## Std. deviate for 1 = 4.7634, p-value = 9.52e-07
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##              37.00000              20.95238              11.34999

jc_test2 <- joincount.test(as.factor(pA$layer), wl2)

jc_test2

## 
##  Join count test under nonfree sampling
## 
## data:  as.factor(pA$layer) 
## weights: wl2 
## 
## Std. deviate for 0 = 5.4677, p-value = 2.28e-08
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             30.000000             18.095238              4.740611 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  as.factor(pA$layer) 
## weights: wl2 
## 
## Std. deviate for 1 = 5.1203, p-value = 1.525e-07
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             22.000000             11.428571              4.262554

Statistical results for the join counts were presented based on Queen’s case and Rook’s case for areal pattern A. It shows more arable/arable land joins (Rook’s case observed = 30, Queen’s case observed = 53) than would be expected under Rook’s case (18.1), and Queen’s case (33.2), implying the presence of positive spatial autocorrelation in land use patterns. A similar observation is evident for the nonarable/nonarable land joins. There are far fewer arable/ nonarable land joins (Rook’s case observed = 8, Queen’s case observed = 20) than would be expected under Rook’s case (30.5), and Queen’s case (55.9), implying the presence of positive spatial autocorrelation in land use patterns.

Now your turn

Repeat the code above to replicate presented results. Then, apply the Join Count Test for areal pattern B and areal pattern C. Are your results similar to those reported by Oyana & Margai?

Are are you brave enough to try Bishop adjacency to apply the Join Count Test?

WARNING: It must be emphasized that Join Count Statistics offer an easy way to represent spatial distribution. However, it can only be applied to nominal data and does not provide a simple summary index that is similar to Geary’s C or Moran’s I. Caution is therefore required when classifying continuous variables into binary variables because the aggregation of the data could lead to loss of information and biased estimates.

Analyzing Global Moran’s I

Moran’s I measures the degree of spatial autocorrelation in ordinal- and interval-measured data. It is one of the widely used indices that evaluates the extent of spatial autocorrelation between a set of n cells = {x } located in neighboring areas, where x is either the rank of the ith cell (ordinal data) or the value of X in the ith cell (interval data). The computation of Moran’s I is achieved by dividing the spatial covariation by the total variation. The resultant values range from approximately -1 (perfect dispersion) to 1 (perfect correlation). The positive sign represents positive spatial autocorrelation while the converse is true for the negative sign, and a zero result represents no spatial autocorrelation (Figure 7.5). Suppose, we have a study region, R, which is subdivided into n cells, where each cell is identified with a spatial feature. Moran’s I is calculated as follows:

where w = 1 if cells i and j are neighbors, w = 0 otherwise; and \(c_{ij} = (Xi - \overline{X}) (Xj - \overline{X})\) are variables at a particular and another location, respectively.

The average of all the \(n\) cells is the mean (\(\overline{X}\)), which is used to compute (\(s^2\)) based on the differences that each X value has from the \(\overline{X}\).

To get the Moran’s I value, we use the \(moran.test\) function to evaluate spatial autocorrelation considering both adjacency cases (i.e Queen and Rook9).

I1 <- moran.test(pA$layer,wl1)

I1

## 
##  Moran I test under randomisation
## 
## data:  pA$layer  
## weights: wl1  
## 
## Moran I statistic standard deviate = 7.4654, p-value = 4.151e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.624090909      -0.028571429       0.007643049

I2 <- moran.test(pA$layer,wl2)

I2

## 
##  Moran I test under randomisation
## 
## data:  pA$layer  
## weights: wl2  
## 
## Moran I statistic standard deviate = 5.9917, p-value = 1.038e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.72500000       -0.02857143        0.01581790

Note that the p-value computed from the moran.test function is not computed from an MC simulation but analytically instead. This may not always prove to be the most accurate measure of significance. To test for significance using the MC simulation method instead, use the moran.mc function.

MC<- moran.mc(pA$layer, wl1, nsim=599)

# View results (including p-value)
MC

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  pA$layer 
## weights: wl1  
## number of simulations + 1: 600 
## 
## statistic = 0.62409, observed rank = 600, p-value = 0.001667
## alternative hypothesis: greater

# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC, main=NULL)

Analyzing Global Geary’s C

Gear y’s C is an alternative measure of spatial autocorrelation. It determines the degree of spatial association using the sum of squared differences between pairs of data values as its measure of covariation. The computation of Geary’s C results in a value within the range of 0 to +2. When we obtain a zero value, it is interpreted as a strong positive spatial autocorrelation (perfect correlation), a value of 1 indicates a random spatial pattern (no autocorrelation), and a value between 1 and 2 represents a negative spatial autocorrelation (2 is a perfect dispersion). Suppose we have a study region, R, that is subdivided into n cells, where each cell is identified with a spatial feature. Geary’s C can be computed by

where \(w_{ij} = 1\) if cells i and j are neighbors, \(w_{ij} = 0\) otherwise; and

To get the Geary’s C value, we use the \(geary.test\) function to evaluate spatial autocorrelation considering both adjacency cases (i.e Queen and Rook9).

C1 <- geary.test(pA$layer,wl1)

C1

## 
##  Geary C test under randomisation
## 
## data:  pA$layer 
## weights: wl1 
## 
## Geary C statistic standard deviate = 7.4869, p-value = 3.526e-14
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##       0.357954545       1.000000000       0.007354046

C2 <- geary.test(pA$layer,wl2)

C2

## 
##  Geary C test under randomisation
## 
## data:  pA$layer 
## weights: wl2 
## 
## Geary C statistic standard deviate = 6.0193, p-value = 8.758e-10
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##         0.2625000         1.0000000         0.0150117

Again, that the p-value computed from the geary.test function is not computed from an MC simulation but analytically instead. This may not always prove to be the most accurate measure of significance. To test for significance using the MC simulation method instead, use the geary.mc function.

GS1 <- geary.mc(pA$layer, wl1, nsim=599)

# View results (including p-value)
GS1

## 
##  Monte-Carlo simulation of Geary C
## 
## data:  pA$layer 
## weights: wl1 
## number of simulations + 1: 600 
## 
## statistic = 0.35795, observed rank = 1, p-value = 0.001667
## alternative hypothesis: greater

# Plot the distribution (note that this is a density plot instead of a histogram)
plot(GS1, main=NULL)

WARNING: Both Moran’s I and Geary’s C only detect spatial patterns (clusters) of an entire region and are unable to distinguish local patterns. Geary’s C is less arranged, and therefore the extremes are less likely to correspond to the positive or negative correlation. Although their calculations are quite similar, Moran’s I is based on the cross product of deviations from the mean for variables at a particular cell and another neighboring cell (location), while Gear y’s C is a cross product of actual values of a variable at a particular location and another neighboring cell.

Now your turn

Repeat the code above to replicate Global Moran and Geary results for areal pattern A. Then, obtain Global I and Global C for areal pattern B and areal pattern C. ¿Do your results confirm your intuition on spatial autocorrelation?

Local Moran’s I

The Local Moran’s I determines the degree of spatial association at the location-specific level. It belongs to a family of Local Indicators for Spatial Association (LISA) that is used to identify clusters among individual spatial units. LISA statistics measure the degree to which one areal unit is autocorrelated relative to its neighbors. Following the analysis, Moran’s scatterplot can be used to identify the leverage points and spatial outliers. The plot has four quadrants: high-high, high- low, low-high, and low-low. High-high denotes the presence of spatial clustering of neighbors with high values surrounded by those with similar values, low-low denotes spatial clustering of neighbors with low values surrounded by those with similar values, and high-low or low-high represents spatial outliers or neighbors with values that are statistically insignificant.

In line with Anselin’s (1995) suggestions, there are two notable aspects of these statistics: (1) the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation and (2) the sum of LISAs for all observations is in proportion to a global statistic of spatial association. There are several local versions of global statistics such as Moran’s I, Geary’s C, and Getis-Ord’s G. These measures serve four principal aims: (1) provide a finer-grained analysis at the local level, (2) identify spatial patterns at the local level or hotspots, (3) measure spatial autocorrelation at the local level, and (4) detect spatial clusters or spatial outliers at the local level.

A Local Moran’s I for an observation i is defined as

where \(w_{ij}\) are the spatial weights matrix, the observations \(z _i\) , \(z_j\) are the deviations from the mean, and \(L_i\) is the summation of the spatial weights matrix multiplied by \(z_j\) , \(z_i\) . Deriving the mean deviations for each of the observations is similar to how we calculate a Z-score.

To compute LISA indicators we can use \(localmoran\) function in the \(spdep\) library:

oid <- order(pA$layer)
resI <- localmoran(pA$layer, wl1)
resI

##       Ii        E.Ii   Var.Ii       Z.Ii   Pr(z > 0)
## 1   3.75 -0.08571429 3.045342  2.1980026 0.013974460
## 2   1.75 -0.14285714 5.086399  0.8392912 0.200652974
## 3   0.40 -0.14285714 5.086399  0.2407024 0.404892902
## 4   4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 5   4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 6   2.40 -0.08571429 3.045342  1.4244039 0.077164789
## 7   6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 8   3.25 -0.22857143 8.135738  1.2195579 0.111316248
## 9   1.00 -0.22857143 8.135738  0.4307268 0.333333503
## 10  6.40 -0.22857143 8.135738  2.3239215 0.010064851
## 11  6.40 -0.22857143 8.135738  2.3239215 0.010064851
## 12  4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 13  6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 14  3.25 -0.22857143 8.135738  1.2195579 0.111316248
## 15  1.00 -0.22857143 8.135738  0.4307268 0.333333503
## 16  6.40 -0.22857143 8.135738  2.3239215 0.010064851
## 17  6.40 -0.22857143 8.135738  2.3239215 0.010064851
## 18  4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 19  6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 20  5.50 -0.22857143 8.135738  2.0083891 0.022300982
## 21 -2.60 -0.22857143 8.135738 -0.8314030 0.797126984
## 22  2.80 -0.22857143 8.135738  1.0617917 0.144165124
## 23  4.60 -0.22857143 8.135738  1.6928566 0.045241381
## 24  4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 25  6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 26  7.75 -0.22857143 8.135738  2.7972202 0.002577220
## 27  5.50 -0.22857143 8.135738  2.0083891 0.022300982
## 28 -1.25 -0.22857143 8.135738 -0.3581043 0.639867363
## 29  2.80 -0.22857143 8.135738  1.0617917 0.144165124
## 30  4.00 -0.14285714 5.086399  1.8369391 0.033109441
## 31  3.75 -0.08571429 3.045342  2.1980026 0.013974460
## 32  6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 33  6.25 -0.14285714 5.086399  2.8345871 0.002294247
## 34  1.75 -0.14285714 5.086399  0.8392912 0.200652974
## 35  0.40 -0.14285714 5.086399  0.2407024 0.404892902
## 36  2.40 -0.08571429 3.045342  1.4244039 0.077164789
## attr(,"call")
## localmoran(x = pA$layer, listw = wl1)
## attr(,"class")
## [1] "localmoran" "matrix"

#
pA$z.li <- resI[,4]
pA$pvalue <- resI[,5]
lm.palette <- colorRampPalette(c("white","orange", "red"), space = "rgb")
spplot(pA, zcol="z.li", col.regions=lm.palette(20), main="Local Moran")

#
#
nci <- moran.plot (pA$layer, wl1)

#nci <- moran.plot (pA, wl1, labels=layer , xlab="values", ylab="SL values")
#
L1 <- factor(pA$layer < mean(pA$layer), labels=c("H", "L")) 
lw <- lag(wl1, pA$layer) 
L2 <- factor(lw < mean(lw), labels=c("H", "L")) 
result <- paste(L1, L2)
#rA$quad <- result

Let’s visualize pattern A in another way:

plot(rA)
text(coordinates(rA), labels=result, cex=1.5)

Are you eager to try Local Gery statistics for areal pattern A?

Would you mind to examine local Moran statistics for areal pattern B and areal pattern C? It is advisable to do it before next evaluation.

Now, it’s your turn to code

Do not forget to replicate this exercise. Furthermore, adapt this code to study your own areal pattern data.

Remember: Do not copy & paste. If you are serious about learning R, get your hands dirty and write your code.

REFERENCES

Anselin, L., 1995. Local Indicators of Spatial Association - LISA, Geographical Analysis 27(2), 93-115.

Bivand, R., and Piras, G., 2015. Comparing Implementations of Estimation Methods for Spatial Econometrics. Journal of Statistical Software, 63(18), 1-36. URL http://www.jstatsoft.org/v63/i18/.

Oyana, T.J., and Margai, F.M., 2016. Spatial Analysis: Statistics, Visualization, and Computational Methods. CRC Press. 294 pp.

Areal pattern analysis

Ivan Lizarazo

October 31, 2016

Introduction

Spatial Autocorrelation

Joint Count Statistics

Now your turn

Analyzing Global Moran’s I

Analyzing Global Geary’s C

Now your turn

Local Moran’s I

Now, it’s your turn to code

REFERENCES