Spatial Statistics in R

# The number of points to create
n <- 200

# Set the range
xmin <- 0
xmax <- 1
ymin <- 0
ymax <- 2

# Sample from a Uniform distribution
x <- runif(n, xmin, xmax)
y <- runif(n, ymin, ymax)

# See pre-defined variables
ls.str()

## n :  num 200
## x :  num [1:200] 0.00169 0.53447 0.6071 0.26857 0.42907 ...
## xmax :  num 1
## xmin :  num 0
## y :  num [1:200] 0.24 0.571 0.356 1.057 0.912 ...
## ymax :  num 2
## ymin :  num 0

# Plot points and a rectangle

mapxy <- function(a=NA){
  plot(x, y, asp = a)
  rect(xmin, ymin, xmax, ymax)
}

mapxy(a=1)

# Load the spatstat package
library(spatstat)

## Loading required package: spatstat.data

## Loading required package: spatstat.geom

## spatstat.geom 2.1-0

## Loading required package: spatstat.core

## Loading required package: nlme

## Loading required package: rpart

## spatstat.core 2.1-2

## Loading required package: spatstat.linnet

## spatstat.linnet 2.1-1

## 
## spatstat 2.1-0       (nickname: 'Comedic violence') 
## For an introduction to spatstat, type 'beginner'

# Create this many points, in a circle of this radius
n_points <- 300
radius <- 10

# Generate uniform random numbers up to radius-squared
r_squared <- runif(n_points, 0, radius^2)
angle <- runif(n_points, 0, 2*pi)

# Take the square root of the values to get a uniform spatial distribution
x <- sqrt(r_squared) * cos(angle)
y <- sqrt(r_squared) * sin(angle)

plot(disc(radius),disc(radius)); points(x, y)

# Set coordinates and window
ppxy <- ppp(x = x, y = y, window = disc(radius))

# Test the point pattern
qt <- quadrat.test(ppxy)

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

# Inspect the results
plot(qt)

print(qt)

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  ppxy
## X2 = 22.004, df = 24, p-value = 0.842
## alternative hypothesis: two.sided
## 
## Quadrats: 25 tiles (irregular windows)

The p-value of the quadrat test is much bigger than 0.05, so you can not reject the hypothesis that the points are completely spatially random

A Poisson point process creates events according to a Poisson distribution with an intensity parameter specifying the expected events per unit area. The total number of events generated is a single number from a Poisson distribution, so multiple realizations of the same process can easily have different numbers of events.

# Create a disc of radius 10
disc10 <- disc(10)

# Compute the rate as count divided by area
lambda <- 500 / area(disc10)

# Create a point pattern object
ppois <- rpoispp(lambda = lambda, win = disc10)

# Plot the Poisson point pattern
plot(ppois)

The following part is about Simulating clustered and inhibitory patterns. We will generate point patterns from other process models from spatstat package. These generally fall into one of two classes: clustered processes, where points occur together more than under a uniform Poisson process, and regular (aka inhibitory) processes where points are more spaced apart than under a uniform intensity Poisson process. Some process models can generate patterns on a continuum from clustered through uniform to regular depending on their parameters.

The quadrat.test() function can test against clustered or regular alternative hypotheses. By default it tests against either of those, but this can be changed with the alternative parameter to create a one-sided test.

A Thomas process is a clustered pattern where a number of “parent” points, uniformly distributed, create a number of “child” points in their neighborhood. The child points themselves form the pattern. This is an attractive point pattern, and makes sense for modeling things like trees, since new trees will grow near the original tree. Random Thomas point patterns can be generated using rThomas(). This takes three numbers that determine the intensity and clustering of the points, and a window object.

Conversely the points of a Strauss process cause a lowering in the probability of finding another point nearby. The parameters of a Strauss process can be such that it is a “hard-core” process, where no two points can be closer than a set threshold. Creating points from this process involves some clever simulation algorithms. This is a repulsive point pattern, and makes sense for modeling things like territorial animals, since the other animals of that species will avoid the territory of a given animal. Random Strauss point patterns can be generated using rStrauss(). This takes three numbers that determine the intensity and “territory” of the points, and a window object. Points generated by a Strauss process are sometimes called regularly spaced.

# Generate clustered points from a Thomas process
set.seed(123)
p_cluster <- rThomas(kappa = 0.35, scale = 1, mu = 3, win = disc10)
plot(p_cluster)

# Run a quadrat test
quadrat.test(p_cluster, alternative = "clustered")

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  p_cluster
## X2 = 53.387, df = 24, p-value = 0.0005142
## alternative hypothesis: clustered
## 
## Quadrats: 25 tiles (irregular windows)

# Regular points from a Strauss process
set.seed(123)
p_regular <- rStrauss(beta = 2.9, gamma = 0.025, R = .5, W = disc10)

## Warning: Simulation will be performed in the containing rectangle and clipped to
## the original window.

plot(p_regular)

# Run a quadrat test
quadrat.test(p_regular,alternative = "regular")

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  p_regular
## X2 = 8.57, df = 24, p-value = 0.001619
## alternative hypothesis: regular
## 
## Quadrats: 25 tiles (irregular windows)

Tree location pattern A ppp object called redoak has been loaded and contains the locations of trees in a woodland. Plot the tree locations and then test the hypothesis that the points are clustered against a null hypothesis of a random uniform distribution.

I can reject the null hypothesis with p-value less than 0.01

Case Study

The dataset we shall use for this example consists of crimes in a 4km radius of the center of Preston, a town in north-west England. We want to look for hotspots of violent crime in the area.

library(spatstat)
library(raster)

## Loading required package: sp

## 
## Attaching package: 'raster'

## The following object is masked from 'package:nlme':
## 
##     getData

## The following objects are masked from 'package:spatstat.geom':
## 
##     area, rotate, shift

setwd("~/Desktop/Spatial Analysis")
preston_crime <- readRDS("pcrime_spatstat.rds")
preston_osm <- readRDS("osm_preston.rds")
# Get some summary information on the dataset
summary(preston_crime)

## Marked planar point pattern:  2036 points
## Average intensity 4.053214e-05 points per square unit
## 
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 units
## 
## Multitype:
##                   frequency proportion    intensity
## Non-violent crime      1812  0.8899804 3.607281e-05
## Violent crime           224  0.1100196 4.459332e-06
## 
## Window: polygonal boundary
## single connected closed polygon with 99 vertices
## enclosing rectangle: [349773, 357771] x [425706.5, 433705.5] units
##                      (7998 x 7999 units)
## Window area = 50231700 square units
## Fraction of frame area: 0.785

# Get a table of marks
table(marks(preston_crime))

## 
## Non-violent crime     Violent crime 
##              1812               224

# Define a function to create a map
preston_map <- function(cols = c("green","red"), cex = c(1, 1), pch = c(1, 1)) {
  plotRGB(preston_osm) # from the raster package
  plot(preston_crime, cols = cols, pch = pch, cex = cex, add = TRUE, show.window = TRUE)
}

# Draw the map with colors, sizes and plot character
preston_map(
  cols = c("black", "red"), 
  cex = c(0.5, 1), 
  pch = c(19,19))

# Use the split function to show the two point patterns
crime_splits <- split(preston_crime)

# Plot the split crime
plot(crime_splits)

# Compute the densities of both sets of points
crime_densities <- density(crime_splits)

# Calc the violent density divided by the sum of both
frac_violent_crime_density <- crime_densities[["Violent crime"]] / 
  (crime_densities[["Non-violent crime"]] + crime_densities[["Violent crime"]])

# Plot the density of the fraction of violent crime
plot(frac_violent_crime_density)

##' Run the spatial segregation analysis
##'
##' This is the details of the S3 generic method
##' @title spseg
##' @return spseg results
##' @author Barry Rowlingson
##' @rdname spseg
##' @export
spseg <- function(pts, ...){
    UseMethod("spseg")
}

##' spseg for spatstat objects
##'
##' Does spseg for marked ppp objects
##' @title spatial segregation for ppp objects
##' @return an spseg object
##' @author Barry Rowlingson
##' @rdname spseg
##' @export
spseg.ppp = function(pts, h, opt, ...){
    pw = as.data.frame(spatstat::as.polygonal(pts$win))
    if(ncol(pw)>2){
        stop("spseg can only handle simple ring windows")
    }
    
    xypoly = as.matrix(pw)
    xypts = cbind(pts$x, pts$y)
    m = spatstat::marks(pts)
    # test if only one mark
    m = as.character(m)
    spseg.matrix(xypts, m, h=h, opt=opt, poly=xypoly, ...)
}

sasq <- readRDS("sasquatch.rds")

# Get a quick summary of the dataset
summary(sasq)

## Marked planar point pattern:  423 points
## Average intensity 2.097156e-09 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 units
## 
## Mark variables: date, year, month
## Summary:
##       date                 year         month    
##  Min.   :1990-05-03   Y2004  : 41   Sep    : 59  
##  1st Qu.:2000-04-30   Y2000  : 36   Oct    : 56  
##  Median :2003-11-05   Y2002  : 30   Aug    : 54  
##  Mean   :2003-08-11   Y2005  : 30   Jul    : 50  
##  3rd Qu.:2007-11-02   Y2001  : 26   Nov    : 43  
##  Max.   :2016-04-05   Y2008  : 26   Jun    : 41  
##                       (Other):234   (Other):120  
## 
## Window: polygonal boundary
## single connected closed polygon with 64 vertices
## enclosing rectangle: [368187.8, 764535.6] x [4644873, 5434933] units
##                      (396300 x 790100 units)
## Window area = 2.01702e+11 square units
## Fraction of frame area: 0.644

# Plot unmarked points
plot(unmark(sasq))

# Plot the points using a circle sized by date
plot(sasq, which.marks = "date")

There’s a cluster in north part.

The following part, we go through the temporal behaviour, to explore are number of sightings increasing or decreasing, dose rate vary over seasonality, does the spatial pattern change much over the course of a year.

# Let's see available marks
names(marks(sasq))

## [1] "date"  "year"  "month"

# Histogram the dates of the sightings, grouped by year
hist(marks(sasq)$date, "years", freq = TRUE)

# Plot and tabulate the calendar month of all the sightings
plot(table(marks(sasq)$month))

# Split on the month mark
sasq_by_month <- split(sasq, "month", un = TRUE)

# Plot monthly maps
plot(sasq_by_month)

# Plot smoothed versions of the above split maps
plot(density(sasq_by_month))

According to the plot, we could say annual sightings peaked around 2004, peak month is September.

TO set up:

event locations event times region polygon time limits the time and space ranges for analysis.

# Get a matrix of event coordinates
sasq_xy <- as.matrix(coords(sasq))

# Check the matrix has two columns
dim(sasq_xy)

## [1] 423   2

# Get a vector of event times
sasq_t <- marks(sasq)$date

# Extract a two-column matrix from the ppp object
sasq_poly <- as.matrix(as.data.frame(Window(sasq)))
dim(sasq_poly)

## [1] 64  2

# Set the time limit to 1 day before and 1 day after the range of times
tlimits <- range(sasq_t) + c(-1, 1)

# Scan over 400m intervals from 100m to 20km
s <- seq(100, 20000, by = 400)

# Scan over 14 day intervals from one week to 31 weeks
tm <- seq(7, 217, by = 14)

Monte-carlo test of space-time clustering

Everything is now ready for you to run the space-time clustering test function. You can then plot the results and compute a p-value for rejecting the null hypothesis of no space-time clustering.

Any space-time clustering in a data set will be removed if you randomly rearrange the dates of the data points. The stmctest() function computes a clustering test statistic for your data based on the space-time K-function - how many points are within a spatial and temporal window of a point of the data. It then does a number of random rearrangements of the dates among the points and computes the clustering statistic. After doing this a large number of times, you can compare the test statistic for your data with the values from the random data. If the test statistic for your data is sufficiently large or small, you can reject the null hypothesis of no space-time clustering.

The output from stmctest() is a list with a single t0 which is the test statistic for your data, and a vector of t from the simulations. By converting to data frame you can feed this to ggplot functions.

Because the window area is a large number of square meters, and we have about 400 events, the numerical value of the intensity is a very small number. This makes values of the various K-functions very large numbers, since they are proportional to the inverse of the intensity. Don’t worry if you see 10^10 or higher!

The p-value of a Monte-Carlo test like this is just the proportion of test statistics that are larger than the value from the data. You can compute this from the t and t0 elements of the output.

library(splancs)

## 
## Spatial Point Pattern Analysis Code in S-Plus
##  
##  Version 2 - Spatial and Space-Time analysis

## 
## Attaching package: 'splancs'

## The following object is masked from 'package:raster':
## 
##     zoom

library(ggplot2)
# Run 999 simulations 
sasq_mc <- stmctest(sasq_xy, sasq_t, sasq_poly, tlimits, s, tm, nsim = 999, quiet = TRUE)
names(sasq_mc)

## [1] "t0" "t"

# Histogram the simulated statistics and add a line at the data value
ggplot(data.frame(sasq_mc), aes(x = t)) +
  geom_histogram(binwidth = 1e13) +
  geom_vline(aes(xintercept = t0))

# Compute the p-value as the proportion of tests greater than the data
sum(sasq_mc$t > sasq_mc$t0) / 1000

## [1] 0.051

In 2016 the UK held a public vote on whether to remain in the European Union. The results of the referendum, where people voted either “Remain” or “Leave”, are available online. The data set london_ref contains the results for the 32 boroughs of London, and includes the number and percentage of votes in each category as well as the count of spoilt votes, the population size and the electorate size.

london_ref <- readRDS("london_eu.rds")

# See what information we have for each borough
summary(london_ref)

## Warning in wkt(obj): CRS object has no comment

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##        min      max
## x 503574.2 561956.7
## y 155850.8 200933.6
## Is projected: TRUE 
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy
## +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894]
## Data attributes:
##      NAME             TOTAL_POP        Electorate       Votes_Cast    
##  Length:32          Min.   :157711   Min.   : 83042   Min.   : 54801  
##  Class :character   1st Qu.:237717   1st Qu.:143458   1st Qu.:104079  
##  Mode  :character   Median :272017   Median :168394   Median :116280  
##                     Mean   :270780   Mean   :169337   Mean   :118025  
##                     3rd Qu.:316911   3rd Qu.:196285   3rd Qu.:134142  
##                     Max.   :379691   Max.   :245349   Max.   :182570  
##      Remain           Leave       Rejected_Ballots   Pct_Remain   
##  Min.   : 27750   Min.   :17138   Min.   : 60.0    Min.   :30.34  
##  1st Qu.: 55973   1st Qu.:32138   1st Qu.:105.0    1st Qu.:53.69  
##  Median : 70254   Median :45262   Median :138.0    Median :61.01  
##  Mean   : 70631   Mean   :47255   Mean   :139.0    Mean   :60.46  
##  3rd Qu.: 84287   3rd Qu.:59018   3rd Qu.:164.2    3rd Qu.:69.90  
##  Max.   :118463   Max.   :96885   Max.   :267.0    Max.   :78.62  
##    Pct_Leave      Pct_Rejected      Assembly        
##  Min.   :21.38   Min.   :0.0600   Length:32         
##  1st Qu.:30.10   1st Qu.:0.0875   Class :character  
##  Median :38.99   Median :0.1100   Mode  :character  
##  Mean   :39.54   Mean   :0.1187                     
##  3rd Qu.:46.31   3rd Qu.:0.1500                     
##  Max.   :69.66   Max.   :0.2200

# Which boroughs voted to "Leave"?
london_ref$NAME[london_ref$Leave > london_ref$Remain]

## [1] "Sutton"               "Barking and Dagenham" "Bexley"              
## [4] "Havering"             "Hillingdon"

# Plot a map of the percentage that voted "Remain"
spplot(london_ref, zcol = "Pct_Remain")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

Spatial autocorrelation test

The map of “Remain” votes seems to have spatial correlation. Pick any two boroughs that are neighbors - with a shared border - and the chances are they’ll be more similar than any two random boroughs. This can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.

The spdep package has functions for measures of spatial correlation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.

In this example you’ll use the Moran “I” statistic to test the spatial correlation of the population and the percentage “Remain” vote.

# Use the spdep package
library(spdep)

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: sf

## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1

# Make neighbor list
borough_nb <- poly2nb(london_ref)

# Get center points of each borough
borough_centers <- coordinates(london_ref)

# Show the connections
plot(london_ref); plot(borough_nb, borough_centers, add = TRUE)

## Warning in wkt(obj): CRS object has no comment

# Map the total pop'n
spplot(london_ref, zcol = "TOTAL_POP")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

# Run a Moran I test on total pop'n
moran.test(
  london_ref$TOTAL_POP, 
  nb2listw(borough_nb)
)

## 
##  Moran I test under randomisation
## 
## data:  london_ref$TOTAL_POP  
## weights: nb2listw(borough_nb)    
## 
## Moran I statistic standard deviate = 1.2124, p-value = 0.1127
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.11549264       -0.03225806        0.01485190

# Map % Remain
spplot(london_ref, zcol = "Pct_Remain")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

# Run a Moran I MC test on % Remain
moran.mc(
  london_ref$Pct_Remain, 
  nb2listw(borough_nb), 
  nsim = 999
)

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  london_ref$Pct_Remain 
## weights: nb2listw(borough_nb)  
## number of simulations + 1: 1000 
## 
## statistic = 0.42841, observed rank = 999, p-value = 0.001
## alternative hypothesis: greater

The p-value was around 0.1 in the first case, thus there’s no any significant spatial correlation. In the second case, the p-value was around 0.001, so there’s significant spatial correlation.

Canadian geochemical survey data

Your job is to study the acidity (pH) of some Canadian survey data. The survey measurements are loaded into a spatial data object called ca_geo.

ca_geo <- readRDS("ca_geo.rds")

# ca_geo has been pre-defined
str(ca_geo, 1)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots

# See what measurements are at each location
names(ca_geo)

##  [1] "ID"   "Elev" "pH"   "Zn"   "Cu"   "Pb"   "Ni"   "Co"   "Ag"   "Mn"  
## [11] "Fe"   "Mo"   "U"    "W"    "Sn"   "Hg"   "As"   "Sb"   "Ba"   "Cd"  
## [21] "V"    "Bi"   "Cr"   "LoI"  "F"    "Au"

# Get a summary of the acidity (pH) values
summary(ca_geo$pH)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   3.900   6.100   6.600   6.531   7.000   8.700      33

# Look at the distribution
hist(ca_geo$pH)

# Make a vector that is TRUE for the missing data
miss <- is.na(ca_geo$pH)
table(miss)

## miss
## FALSE  TRUE 
##  1107    33

# Plot a map of acidity
spplot(ca_geo[!miss, ], "pH")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

Fitting a trend surface The acidity data shows pH broadly increasing from north-east to south-west. Fitting a linear model with the coordinates as covariates will interpolate a flat plane through the values.

str(ca_geo, 1)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots

# Are they called lat-long, up-down, or what?
coordnames(ca_geo)

## [1] "x" "y"

# Complete the formula
m_trend <- lm(pH ~ x + y, as.data.frame(ca_geo))

# Check the coefficients
summary(m_trend)

## 
## Call:
## lm(formula = pH ~ x + y, data = as.data.frame(ca_geo))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.83561 -0.32091 -0.00761  0.33188  2.06249 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.358e+01  3.002e+00   27.84   <2e-16 ***
## x           -5.691e-06  3.483e-07  -16.34   <2e-16 ***
## y           -1.313e-05  5.319e-07  -24.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5299 on 1104 degrees of freedom
##   (33 observations deleted due to missingness)
## Multiple R-squared:  0.4237, Adjusted R-squared:  0.4227 
## F-statistic: 405.9 on 2 and 1104 DF,  p-value: < 2.2e-16

Computing the pH at the locations that have missing data in the source.

# Make a vector that is TRUE for the missing data
miss <- is.na(ca_geo$pH)

# Create a data frame of missing data
ca_geo_miss <- as.data.frame(ca_geo)[miss, ]

# Predict pH for the missing data
predictions <- predict(m_trend, newdata = ca_geo_miss, se.fit = TRUE)

# Compute the exceedance probability
pAlkaline <- 1 - pnorm(7, mean = predictions$fit, sd = predictions$se.fit)
hist(pAlkaline)

Variogram estimation

library(gstat)

## 
## Attaching package: 'gstat'

## The following object is masked from 'package:spatstat.core':
## 
##     idw

# Make a cloud from the non-missing data up to 10km
plot(variogram(pH ~ 1, ca_geo[!miss, ], cloud = TRUE, cutoff = 10000))

## Warning in wkt(obj): CRS object has no comment

# Make a variogram of the non-missing data
plot(variogram(pH ~ 1, ca_geo[!miss, ]))

## Warning in wkt(obj): CRS object has no comment

Variogram with spatial trend

# See what coordinates are called
coordnames(ca_geo)

## [1] "x" "y"

# The pH depends on the coordinates
ph_vgm <- variogram(pH ~ x + y, ca_geo[!miss, ])

## Warning in wkt(obj): CRS object has no comment

plot(ph_vgm)

The plot levels off after around 25000m, indicating that there appears to be little spatial correlation beyond that distance.

# Eyeball the variogram and estimate the initial parameters
nugget <- 0.18
psill <- 0.12
range <- 15000

# Fit the variogram
v_model <- fit.variogram(
  ph_vgm, 
  model = vgm(
    model = "Ste",
    nugget = nugget,
    psill = psill,
    range = range,
    kappa = 0.5
  )
)

# Show the fitted variogram on top of the binned variogram
plot(ph_vgm, model = v_model)

print(v_model)

##   model     psill    range kappa
## 1   Nug 0.1545113     0.00   0.0
## 2   Ste 0.1442008 14379.16   0.5

The model prediction helps you ignore measurement errors to more easily see the distance at which spatial correlation no longer occurs.

Filling in the gaps

Kriging itself is the application of the variogram along with the sample data points to produce estimates and uncertainties at new locations.

The computation of estimates and uncertainties, together with the assumption of a normal (Gaussian) response means we can compute any function of the estimates - for example the probability of a new location having alkaline soil.

# Set the trend formula and the new data
km <- krige(pH ~ x + y, ca_geo[!miss, ], newdata = ca_geo[miss, ], model = v_model)

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

## [using universal kriging]

names(km)

## [1] "var1.pred" "var1.var"

#predicted values
spplot(km, "var1.pred")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment

# Compute the probability of alkaline samples, and map
km$pAlkaline <- 1 - pnorm(7, mean = km$var1.pred, sd = sqrt(km$var1.var))
spplot(km, "pAlkaline")

## Warning in wkt(obj): CRS object has no comment

## Warning in wkt(obj): CRS object has no comment