Global and local incidences of clustering in Ethiopia malaria prevalence data

In the following, I will discern whether Professor Sturrock’s (UCSF) malaria prevalence data out of Ethiopia (mal_data_eth_2009_no_dups.csv) contains instances of global and local clustering. The data can be found here: https://github.com/HughSt/HughSt.github.io/tree/master/course_materials/week1/Lab_files/Data.

I will focus on the Moran’s I test for global spatial autocorrelation (primarily used on point-referenced data like this) using four different methods:

Creating an inverse-distance weight matrix to calculate Moran’s I
Creating a correlogram to plot Moran’s I statistics at different distance classes
Using k-nearest neighbors and a distance-based approach in tandem to calculate Moran’s I
Using a Monte Carlo simulation to examine Moran’s I

I will then turn to tests for local spatial autocorrelation using LISAs.

Components of the Moran’s I test

The Moran’s I test for global spatial autocorrelation is a key tool for global cluster identiifcation in point referenced or continous data. In this case, “clustering” refers not to the distribution of points, which in and of themselves only refer to survey locations. Clustering here refers to the values of malaria prevalence taken at those points.

The Moran’s I test requires:

A set of continuous data. Let’s say n = 203 where each point has a continuous value.
A defined community of neighbors for each of the 203 points. This requires us to define what’s considered a “neighbor,” and each methodology has its own definition.
A spatial weight for each neighbor linkage.

Map of Ethiopia Prevalence Data

Here is a map of what the point-process data on malaria prevalence in Ethiopia currently looks like.

# load data
url_ethmalaria <- "https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv"

ETH_malaria_data <- read.csv(url_ethmalaria, header=T)

# layer map
pal = colorNumeric("Oranges", ETH_malaria_data$pf_pr)

ETH_malaria_data %>%
  leaflet() %>% addProviderTiles("CartoDB.DarkMatter") %>%
  addCircleMarkers(~longitude, ~latitude, fillOpacity=1,
                   fillColor= ~pal(pf_pr),
                   radius=3,
                   weight=0.1,
                   stroke=TRUE) %>%
  addLegend(pal = pal, values = ~pf_pr, title = "Malaria Prevalence")

Because we care about clustering in our values, what we are looking for on this map is if there are clusters of colors (that is to say, we don’t care if we see clusters of points). There do seem to be clusters of lower prevalence values here and there, and one or two clusters of higher prevalence values. We can now confirm this observation statistically using the Global Moran’s I test.

Before we jump into the Moran’s I test, we first examine our data to see if it is normally distributed.

# see if malaria prevalence variable is normally distributed
ETH_malaria_data %>% ggplot(aes(x=pf_pr)) + geom_histogram() +
  xlab("Ethiopia malaria prevalence") +
  ylab("Frequency") +
  ggtitle("Histogram of Ethiopia malaria prevalence values")

I saw that the malaria prevalence variable was not normally distributed, so I used a logit transformation and added a minuscule value to it to see if that would correct it.

# see if malaria prevalence variable is normally distributed
ETH_malaria_data <- ETH_malaria_data %>% mutate(log_odds = logit(pf_pr) + 0.0001)

ETH_malaria_data %>% ggplot(aes(x=log_odds)) + geom_histogram() +
  xlab("Log odds of Ethiopia malaria prevalence") +
  ylab("Frequency") +
  ggtitle("Histogram of Ethiopia malaria prevalence values")

This did not happen even with a logit transformation. This is not ideal because the null hypothesis of the Moran’s I test is a normal distribution. The lack of normally distributed values will limit the robustness of the Moran’s I test, as noted in the conclusion.

Q1 Create inverse distance matrix

An inverse distance approach defines “neighbors” as essentially every point being a neighbor to every other point in the dataset. This is why we are left with a matrix of n by n values.

In this case, I first take every point of malaria prevalence (n = 203) and calculated the distance to every other point of malaria prevalence. We are left with a matrix of 203 x 203 values where the values represent the distance between each pair of points. These pairs of points are our neighbor linkages.

I then take the inverse of every distance in the matrix. This essentially weights shorter neighbor linkages higher as 1 divided by a small number results in a large number, and vice-versa.

Invariably, the original 203x203 distance matrix includes 203 neighbor linkages of value 0, representing the 203 points’ distances to themselves. When take the inverse of 0, we are left with Infinity, thus leaving with 203 infinity values on our inverse distance matrix. They appear as a diagonal on our inverse distance matrix. We replace these 203 infinity values with 0 to finish our inverse distance-based weighting process, simply as a data cleaning step.

# calculate inverse distance matrix
ETH.dists <- ETH_malaria_data %>% select(latitude, longitude) %>% bind_cols() %>% dist() %>% as.matrix()

# Take the inverse of the matrix values so that closer values have a larger weight and vice versa
ETH.dists.inv <- 1/ETH.dists

# replace the diagonal values with zero
diag(ETH.dists.inv) <- 0

# From the ape package, computes Moran's I where weights = inverse distance matrix
ETH_malaria_data$log_odds %>% Moran.I(weight=ETH.dists.inv)

## $observed
## [1] 0.1847263
## 
## $expected
## [1] -0.004950495
## 
## $sd
## [1] 0.02282913
## 
## $p.value
## [1] 0

The key component of the Moran’s I is defining the spatial weights. Using an inverse distance based approach to define the spatial weights, we derive a Moran’s I statistic of 0.1847263. The associated p-value is lower than 0.01, indicating statistically significant global spatial autocorrelation overall.

Q1 Calculate the maximum distance between points

We can take a quick look at the maximum distance between any two points on our matrix.

maxDist <- ETH_malaria_data %>% select(latitude, longitude) %>% bind_cols() %>% dist() %>% max()
maxDist

## [1] 7.957449

The max distance is 7.957449 decimal degrees.

Q1 Plot data in correlogram

Now, we can turn to correlograms which illustrate correlation between distance classes and global spatial autocorrelation. The correlograms essentially tell us what is the maximum distance at which we can compute statistically significant Moran’s I values.

xy <- ETH_malaria_data %>% select(longitude, latitude) %>% bind_cols() %>% as.matrix()

pgi.cor <- correlog(coords=xy, z=ETH_malaria_data$log_odds, method="Moran", nbclass=10)

pgi.cor %>% plot(xlab = "Distance Classes", ylab = "Moran's I Statistic",
                 main = "Correlogram: 10 Distance Classes v. Moran's I")

pgi.cor

## Moran I statistic 
##       dist.class        coef      p.value    n
##  [1,]  0.3979675  0.22708898 3.307121e-20 3904
##  [2,]  1.1937029  0.02187041 7.781197e-02 5192
##  [3,]  1.9894379 -0.07975471 9.999763e-01 5680
##  [4,]  2.7851728  0.01896800 1.192675e-01 4976
##  [5,]  3.5809077 -0.06374242 9.983910e-01 5078
##  [6,]  4.3766427 -0.07540964 9.999799e-01 5896
##  [7,]  5.1723776 -0.05037439 9.869326e-01 4890
##  [8,]  5.9681126 -0.01292578 6.005353e-01 3694
##  [9,]  6.7638475 -0.01687454 5.884774e-01 1412
## [10,]  7.5595825 -0.01881338 4.486059e-01  284

pgi.cor <- correlog(coords=xy, z=ETH_malaria_data$log_odds, method="Moran", nbclass=15)

pgi.cor %>% plot(xlab = "Distance Classes", ylab = "Moran's I Statistic",
                 main = "Correlogram: 15 Distance Classes v. Moran's I")

pgi.cor

## Moran I statistic 
##       dist.class         coef      p.value    n
##  [1,]  0.2653450  0.281994590 7.904867e-15 2244
##  [2,]  0.7958354  0.120872646 2.050363e-07 3368
##  [3,]  1.3263254 -0.033025904 8.699130e-01 3484
##  [4,]  1.8568154 -0.079725359 9.990843e-01 3668
##  [5,]  2.3873053  0.001291430 3.990732e-01 3878
##  [6,]  2.9177953 -0.008774468 5.540559e-01 3110
##  [7,]  3.4482852 -0.041319537 9.114110e-01 3148
##  [8,]  3.9787752 -0.072158367 9.987303e-01 3884
##  [9,]  4.5092652 -0.079384496 9.995993e-01 3942
## [10,]  5.0397551 -0.047703955 9.606401e-01 3468
## [11,]  5.5702451 -0.035678256 8.566965e-01 2880
## [12,]  6.1007351 -0.015665009 6.047760e-01 2236
## [13,]  6.6312250 -0.019352744 6.086768e-01 1124
## [14,]  7.1617150  0.006508598 3.545727e-01  452
## [15,]  7.6922049 -0.019890375 3.809977e-01  120

pgi.cor <- correlog(coords=xy, z=ETH_malaria_data$log_odds, method="Moran", nbclass=20)

pgi.cor %>% plot(xlab = "Distance Classes", ylab = "Moran's I Statistic",
                 main = "Correlogram: 20 Distance Classes v. Moran's I")

pgi.cor

## Moran I statistic 
##       dist.class         coef      p.value    n
##  [1,]  0.1990337  0.361857358 2.374513e-16 1456
##  [2,]  0.5969017  0.053397800 3.601513e-02 2448
##  [3,]  0.9947692  0.109894328 2.944527e-05 2510
##  [4,]  1.3926366 -0.050663182 9.348842e-01 2682
##  [5,]  1.7905041 -0.067369458 9.813637e-01 2664
##  [6,]  2.1883716 -0.069884071 9.926675e-01 3016
##  [7,]  2.5862391  0.026372246 1.382082e-01 2662
##  [8,]  2.9841065  0.001101876 4.238518e-01 2314
##  [9,]  3.3819740 -0.034663551 8.311488e-01 2282
## [10,]  3.7798415 -0.083070025 9.979919e-01 2796
## [11,]  4.1777089 -0.063152299 9.885816e-01 2994
## [12,]  4.5755764 -0.081461314 9.974368e-01 2902
## [13,]  4.9734439 -0.061362142 9.775390e-01 2652
## [14,]  5.3713114 -0.036867003 8.431153e-01 2238
## [15,]  5.7691788 -0.021614885 6.835567e-01 2144
## [16,]  6.1670463 -0.014011714 5.696890e-01 1550
## [17,]  6.5649138 -0.023713105 6.361859e-01  908
## [18,]  6.9627812  0.009109198 3.402514e-01  504
## [19,]  7.3606487 -0.017917515 4.529364e-01  200
## [20,]  7.7585162 -0.031944444 2.445083e-01   84

Q1 What the correlogram reveals

The correlograms reveal global spatial autocorrelation and instances of positive and negative clustering at different distance classes (where all there is a red dot on the correlograms). I compared correlation when 10 vs. 15 vs. 20 categories were used. With 10 categories, we have statistically significant positive clustering at a maximum distance of 0.3979675 decimal degrees. With 15, we have statistically significant positive clustering at a max distance of 0.7958354 decimal degrees, and with 20, we have it at a max distance of 0.9947692 decimal degrees. This is all at a statistical significance level of 0.05. At further distances, we seem to have primarily negative spatial autocorrelation but none of it is statistically significant.

Q2 Create different neighbor structures and assign weights to them

An inverse-distance based approach to calculating Moran’s I involves defining “neighbors” as every point to every other point, no matter how far apart they are. This means each of our 203 points of malaria prevalence has 202 neighbors, and thus 202 neighbor linkages. These linkages are then each assigned a weight using an inverse distance approach, which greatly decreases the importance of many of the weights. However, it remains to be the case that the inverse distance based approach still uses the greatest number of linkages that are possible.

We can set a limit to this, where not every point is a neighbor to every other point, by defining a threshold distance. This is the crux of the concept of k-nearest neighbors and distance-based neighbors. k-nearest neighbors has to do with limiting the number of neighbor linkages. When k = 1, that means only immediately adjacent points are called “neighbors.” We will use k=1 or immediate neighbors to set neighbor limits:

# define k-nearest neighbors
coords <- xy %>% coordinates()
IDs <- coords %>% as.data.frame() %>% row.names()
Neigh_nb <- coords %>% knearneigh(k=1, longlat = TRUE) %>% knn2nb(row.names=IDs)

dsts <- Neigh_nb %>% nbdists(coords) %>% unlist()
dsts %>% summary()

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00010 0.06127 0.10095 0.12588 0.17375 0.65811

The five-number summary above illustrates the distribution of distances used to provide at least one neighbor (k = 1) to each malaria prevalence point. Now is where the distance-based approach comes in. We take the max distance used to provide at least one neighbor and use that as an input to a distance-based neighbor approach.

# find maximum distance used to provide at least one neighbor to each point
max_1nn <- dsts %>% max()
max_1nn

## [1] 0.6581074

The max distance to provide at least one neighbor to each point is 0.6581074 units. We can input this into our distance-based approach for defining neighbors to see how many neighbor linkages it gives us. For comparison sake, we can also increase the limit of what counts as a “neighbor” by computing twice the max distance. We can see how many more neighbor linkages twice the max distance gives us too.

# define distance-based neighbors within maximum distance:
Neigh_kd1 <- coords %>% dnearneigh(d1=0, d2=max_1nn, row.names=IDs) # when k=1 (immediate neighbors)
# define distance-based neighbors within 2 times the maximum distance:
Neigh_kd2 <- coords %>% dnearneigh(d1=0, d2=2*max_1nn, row.names=IDs) # when k=1

# list of neighbor structures
nb_1 <- list(d1=Neigh_kd1, d2=Neigh_kd2) #when k=1

# Check for symmetry
nb_1 %>% sapply(function(x) is.symmetric.nb(x, verbose=F, force=T))

##   d1   d2 
## TRUE TRUE

nb_1 %>% sapply(function(x) n.comp.nb(x)$nc)

## d1 d2 
##  5  1

As an aside here, the outputs of “5” and “1” indicate that there does not appear to be symmetry between neighbors (i.e. if i is a neighbor of j, then j is a neighbor of i). This should be investigated.

Now to see how many neighbor linkages each distance parameter leaves us with:

# plot neighbor links with different definitions of "neighbors" (max distance vs. 2 times max distance)
par(mfrow=c(2,1), mar= c(1, 0, 1, 0))
xy %>% plot(pch=16)
Neigh_kd1 %>% plot(coords, col="green",add=T, main = "Neighbor Definition: Maximum Distance")
Neigh_kd1

## Neighbour list object:
## Number of regions: 203 
## Number of nonzero links: 2956 
## Percentage nonzero weights: 7.17319 
## Average number of links: 14.56158

xy %>% plot(pch=16)
Neigh_kd2 %>% plot(coords,col="green", add=T, main = "Neighbor Definition: 2 times Maximum Distance")

Neigh_kd2

## Neighbour list object:
## Number of regions: 203 
## Number of nonzero links: 7264 
## Percentage nonzero weights: 17.62722 
## Average number of links: 35.78325

Clearly, looking at the “number of nonzero links” we see that twice the max distance gives us more than double the number of neighbor linkages than the max distance. The number of neighbor linkages would only increase as we increase the distance parameter used. This might tap out at n-squared (in our case 203 * 203 = 41,209) in which case every point is seen as a “neighbor” to every other point.

Moving forward, let’s use the number of neighbor linkages as set by the maximum distance parameter, or 2956 neighbor linkages. Now that we have our number of neighbor linkages, we need to assign weights to these linkages.

weights <- Neigh_kd1 %>% nb2listw(style="W")
weights

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 203 
## Number of nonzero links: 2956 
## Percentage nonzero weights: 7.17319 
## Average number of links: 14.56158 
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0       S1       S2
## W 203 41209 203 46.03848 823.6546

Here, n = 203 indicates that we have 203 malaria prevalence points. By definition per the k-nearest neighbor method, every point has at least one neighbor. Now the number of neighbors that each point has can range quite widely. Here is the total number of neighbors, and corresponding spatial weights, for our 1st malaria prevalence point:

# list of neighbors (each integer represents a unique neighbor/neighbor linkage to point 1) 
weights[["neighbours"]][[1]]

## [1]  9 12 17 20 26 34 35 38

# list of weights (one weight per neighbor linkage)
weights[["weights"]][[1]]

## [1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

The above output shows that point 1 has 8 neighbors (ie. there are 8 integers in that vector), and each of those 8 linkages were assigned a weight of 0.125. This is what’s called row standardized weights; the value of each weight is the output of simply dividing 1 over the number of neighbors (1/8 = 0.125). Row standardized weights are the default option in R when assigning spatial weights.

Q2 Run the Moran’s I test

Now that we have our defined list of neighbor linkages and corresponding row-standardized weights matrix, we can calculate Moran’s I.

# spdep package, Moran's I using k=1 and row-standardized weights
ETH_malaria_data$log_odds %>% moran.test(listw=weights)

## 
##  Moran I test under randomisation
## 
## data:  .  
## weights: weights    
## 
## Moran I statistic standard deviate = 8.8841, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.2709168871     -0.0049504950      0.0009642047

Given the parameters of k=1 and row-standardized weights, we find a statistically significant Moran’s I statistic of 0.2709168871, indicating that our data has positive global clustering.

Q3 Compare results from inverse distance matrix and distance-based neighbor approaches, and draw conclusions

Both the inverse distance based and k-nearest/distance-based neighbor approaches show evidence of statistically significant global spatial autocorrelation (ie positive global clustering). The p-value associated with the Moran’s I statistic using the k-nearest/distance based approach (where k = 1, and row-standardized weights are used) is less than 2.2e-16. This is much less than even the 0.01 threshold for p-values. The Moran’s I statistic with this approach takes on a value of 0.2709168871, which means positive clustering.

The inverse distance matrix and correlograms tell a slightly more granular story of global spatial autocorrelation than the distance-based approach. The correlograms that I generated for 10, 15, and 20 bins indicate that we have statistically significant positive clustering when the max distance is 0.3979675 to 0.9947692 decimal degrees.

Overall, the distance-based neighbor approach shows higher spatial autocorrelation (0.2709168871) than the inverse-distance based approach (0.1847263). This difference makes sense because we approached the neighbor definition and spatial weights questions in different ways with each approach. In the former, we used all neighbor linkages possible and simply took the inverse of the distance as weights. In the latter, we restricted the definition of what counts as a “neighbor” to the max distance needed to give every point at least one neighbor (k=1), and also, we used row-standardized weights instead of weights based on inverse distances.

Perhaps because the k-nearest neighbor method is more popular/common, we can stick with that method and use its parameters for running a Monte Carlo simulation. This might confirm what the value of the Moran’s I statistic is (again, using row-standardized weights and k=1):

set.seed(1234)
bperm <- ETH_malaria_data$log_odds %>% moran.mc(listw=weights,nsim=999)
bperm

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  . 
## weights: weights  
## number of simulations + 1: 1000 
## 
## statistic = 0.27092, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

# Plot simulated test statistics
par(mfrow=c(1,1), mar= c(5, 4, 4, 2))
hist(bperm$res, freq=T, breaks=20, xlab="Moran's I statistics",
     main = "Distribution of Moran's I statistics over 999 simulations")
abline(v=0.27092, col="red")

The Monte Carlo simulation results indicate that with a high degree of statistical significance, the Moran’s I statistic is 0.27092.

In conclusion, we could tentatively say there is global spatial autocorrelation - specifically positive clustering, as the Moran’s I statistic is positive. This is a tentative conclusion only because the data was not normally distributed to begin with. This reduces the robustness of the Moran’s I test because the null hypothesis of the Moran’s I test is that the data is normally distributed.

Local Spatial Autocorrelation using LISAs

Clustering in the underlying spatial process (in this case being values taken at each point) can vary from location to location. We can decompose the Global Moran’s I into its localized components at each point because the Moran’s I is ultimately a sum of its local parts.

With LISA, our “localized areas” are simply 203 windows, one window drawn around each prevalence point in our dataset.

# First calculate the local Moran's I around each point based on row standardized weights where k = 1
I <- ETH_malaria_data$log_odds %>% localmoran(weights) # "spdep" package

# Print 'LISA' for each point
Coef <- I %>% as.data.frame(row.names=row.names(coords), check.names=FALSE) %>% printCoefmat()

##             Ii       E.Ii     Var.Ii    Z.Ii Pr(z > 0)    
## 1    0.0173397 -0.0049505  0.1085058  0.0677   0.47302    
## 2    0.0258857 -0.0049505  0.2959095  0.0567   0.47740    
## 3    0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## 4    0.0924392 -0.0049505  0.2959095  0.1790   0.42896    
## 5   -0.8178867 -0.0049505  0.2959095 -1.4944   0.93247    
## 6   -0.0490740 -0.0049505  0.2209480 -0.0939   0.53739    
## 7    0.0478376 -0.0049505  0.0960122  0.1704   0.43236    
## 8    0.0510836 -0.0049505  0.1759711  0.1336   0.44687    
## 9   -0.0204004 -0.0049505  0.1085058 -0.0469   0.51870    
## 10   0.0628995 -0.0049505  0.1245689  0.1922   0.42378    
## 11  -0.1038433 -0.0049505  0.8956013 -0.1045   0.54161    
## 12   0.0066112 -0.0049505  0.1245689  0.0328   0.48693    
## 13  -0.1038433 -0.0049505  0.8956013 -0.1045   0.54161    
## 14  -0.1143389 -0.0049505  0.1759711 -0.2608   0.60286    
## 15   0.0924392 -0.0049505  0.2959095  0.1790   0.42896    
## 16   0.0599998 -0.0049505  0.1459865  0.1700   0.43251    
## 17  -0.0084349 -0.0049505  0.0960122 -0.0112   0.50449    
## 18   0.0478376 -0.0049505  0.0960122  0.1704   0.43236    
## 19  -0.1288262 -0.0049505  0.1759711 -0.2953   0.61612    
## 20   0.0066112 -0.0049505  0.1245689  0.0328   0.48693    
## 21   0.0235131 -0.0049505  0.2959095  0.0523   0.47913    
## 22   0.2207016 -0.0049505  0.0860173  0.7694   0.22083    
## 23  -0.0906693 -0.0049505  0.0778397 -0.3072   0.62067    
## 24  -0.0181249 -0.0049505  0.0860173 -0.0449   0.51791    
## 25  -0.1089802 -0.0049505  0.0860173 -0.3547   0.63859    
## 26  -0.0751900 -0.0049505  0.0960122 -0.2267   0.58966    
## 27   0.0681097 -0.0049505  0.1085058  0.2218   0.41224    
## 28   4.8901264 -0.0049505  0.2959095  8.9987   < 2e-16 ***
## 29  -0.0919487 -0.0049505  0.1459865 -0.2277   0.59006    
## 30   0.0135799 -0.0049505  0.1759711  0.0442   0.48238    
## 31   0.0924392 -0.0049505  0.4458324  0.1459   0.44202    
## 32  -0.8178867 -0.0049505  0.2959095 -1.4944   0.93247    
## 33  -0.1021970 -0.0049505  0.1459865 -0.2545   0.60045    
## 34  -0.0076179 -0.0049505  0.1085058 -0.0081   0.50323    
## 35  -0.0426795 -0.0049505  0.1245689 -0.1069   0.54257    
## 36   0.2647264 -0.0049505  0.0860173  0.9195   0.17892    
## 37   0.0599998 -0.0049505  0.1459865  0.1700   0.43251    
## 38  -0.0393738 -0.0049505  0.1245689 -0.0975   0.53885    
## 39   0.0708118 -0.0049505  0.0489775  0.3423   0.36605    
## 40   0.0708118 -0.0049505  0.0489775  0.3423   0.36605    
## 41  -0.1976531 -0.0049505  0.0603162 -0.7846   0.78367    
## 42   0.0924392 -0.0049505  0.0460379  0.4539   0.32495    
## 43   0.0740559 -0.0049505  0.0410404  0.3900   0.34827    
## 44   0.0749313 -0.0049505  0.0388987  0.4050   0.34273    
## 45   0.0110585 -0.0049505  0.0960122  0.0517   0.47940    
## 46   0.0924392 -0.0049505  0.0335443  0.5317   0.29745    
## 47   0.0771198 -0.0049505  0.0335443  0.4481   0.32704    
## 48   0.0924392 -0.0049505  0.0351739  0.5193   0.30178    
## 49   0.0924392 -0.0049505  0.0652588  0.3812   0.35151    
## 50   0.0924392 -0.0049505  0.0489775  0.4401   0.32995    
## 51   0.0777326 -0.0049505  0.0320451  0.4619   0.32208    
## 52   0.0924392 -0.0049505  0.0351739  0.5193   0.30178    
## 53   0.0360987 -0.0049505  0.0652588  0.1607   0.43617    
## 54   0.0191966 -0.0049505  0.0860173  0.0823   0.46719    
## 55   0.0924392 -0.0049505  0.0281899  0.5801   0.28094    
## 56  -0.1175889 -0.0049505  0.1245689 -0.3191   0.62519    
## 57   0.0924392 -0.0049505  0.0270821  0.5918   0.27699    
## 58   0.0401231 -0.0049505  0.0603162  0.1835   0.42719    
## 59   0.0924392 -0.0049505  0.0560327  0.4114   0.34038    
## 60   0.0924392 -0.0049505  0.0306612  0.5562   0.28904    
## 61   0.0924392 -0.0049505  0.0293798  0.5682   0.28496    
## 62   0.0924392 -0.0049505  0.0260481  0.6034   0.27311    
## 63   0.0924392 -0.0049505  0.0260481  0.6034   0.27311    
## 64   0.0519103 -0.0049505  0.0960122  0.1835   0.42720    
## 65   0.0924392 -0.0049505  0.0270821  0.5918   0.27699    
## 66   0.0519103 -0.0049505  0.0960122  0.1835   0.42720    
## 67   0.0924392 -0.0049505  0.0410404  0.4807   0.31535    
## 68   0.0924392 -0.0049505  0.0369516  0.5066   0.30621    
## 69   0.0924392 -0.0049505  0.0260481  0.6034   0.27311    
## 70   0.0924392 -0.0049505  0.0260481  0.6034   0.27311    
## 71   0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 72   0.0924392 -0.0049505  0.0351739  0.5193   0.30178    
## 73   0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 74   0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 75   0.0924392 -0.0049505  0.0434076  0.4674   0.32009    
## 76   0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 77   0.0924392 -0.0049505  0.0434076  0.4674   0.32009    
## 78   0.0924392 -0.0049505  0.0320451  0.5440   0.29321    
## 79   0.0924392 -0.0049505  0.0320451  0.5440   0.29321    
## 80   0.0924392 -0.0049505  0.0560327  0.4114   0.34038    
## 81   0.0924392 -0.0049505  0.0351739  0.5193   0.30178    
## 82   0.0924392 -0.0049505  0.0410404  0.4807   0.31535    
## 83   0.0924392 -0.0049505  0.0652588  0.3812   0.35151    
## 84   0.0924392 -0.0049505  0.0603162  0.3965   0.34585    
## 85  -0.2358937 -0.0049505  0.0778397 -0.8278   0.79610    
## 86   0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## 87  -0.0093255 -0.0049505  0.0860173 -0.0149   0.50595    
## 88  -0.0206326 -0.0049505  0.0960122 -0.0506   0.52018    
## 89   0.0924392 -0.0049505  0.0860173  0.3321   0.36992    
## 90   0.0924392 -0.0049505  0.0652588  0.3812   0.35151    
## 91  -0.3886916 -0.0049505  0.0460379 -1.7885   0.96315    
## 92   0.0924392 -0.0049505  0.1085058  0.2957   0.38375    
## 93   0.0924392 -0.0049505  0.1459865  0.2549   0.39940    
## 94  -0.5162596 -0.0049505  0.0522847 -2.2361   0.98733    
## 95   0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 96  -0.5162596 -0.0049505  0.0522847 -2.2361   0.98733    
## 97  -0.5162596 -0.0049505  0.0522847 -2.2361   0.98733    
## 98   0.0694601 -0.0049505  0.0522847  0.3254   0.37243    
## 99   0.0590150 -0.0049505  0.0778397  0.2293   0.40933    
## 100  0.0805713 -0.0049505  0.0241741  0.5500   0.29114    
## 101  0.0778326 -0.0049505  0.0306612  0.4728   0.31819    
## 102  0.0805713 -0.0049505  0.0241741  0.5500   0.29114    
## 103  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 104  0.0924392 -0.0049505  0.0710250  0.3654   0.35739    
## 105  0.0924392 -0.0049505  0.0351739  0.5193   0.30178    
## 106  0.0679281 -0.0049505  0.0560327  0.3079   0.37909    
## 107  0.0363223 -0.0049505  0.0652588  0.1616   0.43582    
## 108 -0.4486264 -0.0049505  0.0460379 -2.0678   0.98067    
## 109 -0.4127578 -0.0049505  0.0410404 -2.0130   0.97794    
## 110 -0.4804538 -0.0049505  0.0489775 -2.1486   0.98417    
## 111 -0.0138998 -0.0049505  0.0652588 -0.0350   0.51397    
## 112 -0.0138998 -0.0049505  0.0652588 -0.0350   0.51397    
## 113  0.0316459 -0.0049505  0.0710250  0.1373   0.44539    
## 114  4.4495910 -0.0049505  0.0489775 20.1282   < 2e-16 ***
## 115 -0.4804538 -0.0049505  0.0489775 -2.1486   0.98417    
## 116  0.0592792 -0.0049505  0.0778397  0.2302   0.40896    
## 117  5.1556793 -0.0049505  0.0489775 23.3187   < 2e-16 ***
## 118  0.0316459 -0.0049505  0.0710250  0.1373   0.44539    
## 119  2.1446462 -0.0049505  0.0369516 11.1825   < 2e-16 ***
## 120  5.1556793 -0.0049505  0.0489775 23.3187   < 2e-16 ***
## 121  0.1051839 -0.0049505  0.0652588  0.4311   0.33319    
## 122  0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## 123  3.9338386 -0.0049505  0.0522847 17.2256   < 2e-16 ***
## 124  5.4923424 -0.0049505  0.0522847 24.0415   < 2e-16 ***
## 125  8.8022934 -0.0049505  0.0522847 38.5170   < 2e-16 ***
## 126  5.1743879 -0.0049505  0.0560327 21.8803   < 2e-16 ***
## 127 -0.0606434 -0.0049505  0.0960122 -0.1797   0.57132    
## 128  0.2330509 -0.0049505  0.0710250  0.8930   0.18592    
## 129 -0.6847870 -0.0049505  0.0652588 -2.6612   0.99611    
## 130  4.5079085 -0.0049505  0.2959095  8.2961   < 2e-16 ***
## 131  0.0924392 -0.0049505  0.8956013  0.1029   0.45902    
## 132  0.7936805 -0.0049505  0.1759711  1.9038   0.02847 *  
## 133 -0.2589839 -0.0049505  0.2209480 -0.5404   0.70555    
## 134  0.7965532 -0.0049505  0.2959095  1.4734   0.07032 .  
## 135 -0.1313601 -0.0049505  0.0960122 -0.4080   0.65835    
## 136 -0.1089802 -0.0049505  0.0860173 -0.3547   0.63859    
## 137 -0.0906693 -0.0049505  0.0778397 -0.3072   0.62067    
## 138 -0.1089802 -0.0049505  0.0860173 -0.3547   0.63859    
## 139  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 140  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 141  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 142  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 143  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 144  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 145  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 146  0.0924392 -0.0049505  0.1085058  0.2957   0.38375    
## 147  0.0924392 -0.0049505  0.0778397  0.3491   0.36352    
## 148  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 149  0.0924392 -0.0049505  0.0860173  0.3321   0.36992    
## 150  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 151  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 152  0.0924392 -0.0049505  0.0860173  0.3321   0.36992    
## 153  0.0924392 -0.0049505  0.0860173  0.3321   0.36992    
## 154  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 155  0.0924392 -0.0049505  0.0778397  0.3491   0.36352    
## 156  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 157  0.0924392 -0.0049505  0.8956013  0.1029   0.45902    
## 158  0.0924392 -0.0049505  0.1245689  0.2759   0.39130    
## 159  0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## 160  0.0924392 -0.0049505  0.2959095  0.1790   0.42896    
## 161  0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## 162  0.0924392 -0.0049505  0.0960122  0.3143   0.37665    
## 163  0.0924392 -0.0049505  0.0560327  0.4114   0.34038    
## 164  0.0924392 -0.0049505  0.0860173  0.3321   0.36992    
## 165  0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 166  0.0924392 -0.0049505  0.0489775  0.4401   0.32995    
## 167  0.0924392 -0.0049505  0.0489775  0.4401   0.32995    
## 168  0.0924392 -0.0049505  0.0434076  0.4674   0.32009    
## 169  0.0759274 -0.0049505  0.0351739  0.4312   0.33315    
## 170  0.0778326 -0.0049505  0.0306612  0.4728   0.31819    
## 171  0.0778326 -0.0049505  0.0306612  0.4728   0.31819    
## 172  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 173  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 174  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 175  0.0788759 -0.0049505  0.0281899  0.4993   0.30880    
## 176  0.0793436 -0.0049505  0.0270821  0.5122   0.30425    
## 177  0.0801885 -0.0049505  0.0250809  0.5376   0.29543    
## 178 -0.2873323 -0.0049505  0.0335443 -1.5418   0.93844    
## 179  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 180  0.0797801 -0.0049505  0.0260481  0.5250   0.29979    
## 181  0.0801885 -0.0049505  0.0250809  0.5376   0.29543    
## 182  0.0809310 -0.0049505  0.0233223  0.5624   0.28694    
## 183  0.0809310 -0.0049505  0.0233223  0.5624   0.28694    
## 184  0.0772483 -0.0049505  0.0320451  0.4592   0.32305    
## 185  0.0801885 -0.0049505  0.0250809  0.5376   0.29543    
## 186  0.0805713 -0.0049505  0.0241741  0.5500   0.29114    
## 187  0.0805713 -0.0049505  0.0241741  0.5500   0.29114    
## 188  0.0759274 -0.0049505  0.0351739  0.4312   0.33315    
## 189  0.0924392 -0.0049505  0.0306612  0.5562   0.28904    
## 190  0.0924392 -0.0049505  0.0460379  0.4539   0.32495    
## 191  0.0924392 -0.0049505  0.0335443  0.5317   0.29745    
## 192  0.0924392 -0.0049505  0.0320451  0.5440   0.29321    
## 193  0.0783736 -0.0049505  0.0293798  0.4861   0.31344    
## 194  0.0924392 -0.0049505  0.0434076  0.4674   0.32009    
## 195  0.0924392 -0.0049505  0.0335443  0.5317   0.29745    
## 196  0.0924392 -0.0049505  0.0388987  0.4938   0.31073    
## 197  0.0924392 -0.0049505  0.0522847  0.4259   0.33508    
## 198  0.0924392 -0.0049505  0.0489775  0.4401   0.32995    
## 199  0.0924392 -0.0049505  0.0778397  0.3491   0.36352    
## 200  0.0924392 -0.0049505  0.1085058  0.2957   0.38375    
## 201  0.0924392 -0.0049505  0.1459865  0.2549   0.39940    
## 202  0.0924392 -0.0049505  0.1459865  0.2549   0.39940    
## 203  0.0924392 -0.0049505  0.2209480  0.2072   0.41793    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The LISA output provided here includes:

Observed local Moran’s I (Ii)
Expected local Moran’s I (E.Ii) (!! based on the Global Moran’s I value? or assumption of stationarity?)
Variance of the local Moran’s I (V.Ii)
Z-value which can be used to determine statistical significance of the clustering (Z.Ii)

We can map the observed vs. expected local moran’s I values in a 4-quadrant grid to see where deviations are the greatest in terms of the observed vs. expected Moran’s I. Areas of higher deviation would indicate that the underlying spatial process is not stationary, as it deviates at those prevalence points.

# Plot the spatial data against its spatially lagged values (the weighted mean of its neighbors)                         
nci <- ETH_malaria_data$log_odds %>% moran.plot(listw=weights,
                                                main = "Observed vs. Expected Log prevalence",
                                               xlab="Log prevalence (log_odds variable)",
                                               ylab="Spatially lagged log prev",
                                               labels=T, pch=16, col="grey")

This plot illustrates all 203 of our malaria point prevalence points in terms of observed vs. expected log prevalence. “Expected” prevalence refers to the weighted mean of each point’s neighbors. The gray spots are used to represent those expected values.

Reading the quadrants can be done as follows:

Bottom right: High-Low (HL) meaning abnormally high: high observed value of log prevalence, but low expected value, where expectation is based on the mean in the local region. The area around the prevalence point has low values, so we expect the observed prevalence point we have to be low too. However, the observed prevalence point we have is actually a high value.
Top left: Low-High (LH) meaning abnormally low: high expected value, but low observed value. The area around the point have high values, but the observed point itself has a low value.
Bottom left: Low-Low (LL): observed value matches the mean in the localized region, which is low
Top right: High-High (HH): observed value matches the mean in the localized region, which is high

In the case of this malaria point prevalence data, most of our data sits in the LH and HH quadrants. This means that most of our deviances from the stationarity assumption are in the LH region: we have abnormally low observed prevalence in regions where mean prevalence is higher.

Now we can identify the number of outlier points in our data:

# find which points are statistically significant outliers
infl <- apply(nci$is.inf,1,any) 
sum(infl %in% T)

## [1] 22

It seems that 22 of our 203 points are outliers, which is more than one might expect is due to chance.