##Set Working Directory

setwd("~/Desktop/University of Utah PhD /Course Work/Fall 2022 Semester/GEOG6000_Data Analysis/lab14")

library(sf) #simple features
library(spatstat) #used for point pattern analysis

Useful Notes:

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

Point Pattern Data

Note: rior to all analysis, the locations must be read in and stored in a point pattern object (ppp). These objects contain, at minimum, the locations of the objects and a description of the window delimiting the study area (either as a box or a polygon). To create this for the BEI dataset, we:

  1. read in the data;
  2. check the range of coordinates (with the summary() function);
  3. create a bounding window using the owin() function;
  4. create the ppp object with the X and Y coordinates and the window
##Read in the csv file
bei <- read.csv("../datafiles/bei.csv")

##Show the range of coordinates using summary
summary(bei)
##        x               y        
##  Min.   :  0.1   Min.   :  0.1  
##  1st Qu.:151.9   1st Qu.:103.5  
##  Median :342.3   Median :282.6  
##  Mean   :433.8   Mean   :263.5  
##  3rd Qu.:719.0   3rd Qu.:416.1  
##  Max.   :998.9   Max.   :499.9
##Create bounding window based on the range of coordinates from the previous step

bei.owin = owin(xrange = c(0,1000), yrange = c(0,500))

##Create the point pattern object 

bei.ppp = ppp(bei$x, bei$y, window = bei.owin)
##Visualize
##Note the use of cols argument for color
plot(bei.ppp, pch = 16, cols = "goldenrod")

##Summary of the ppp object
summary(bei.ppp)
## Planar point pattern:  3604 points
## Average intensity 0.007208 points per square unit
## 
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 units
## 
## Window: rectangle = [0, 1000] x [0, 500] units
## Window area = 5e+05 square units

Quadrat Counts and Tests

Note: As the distribution of the trees is clearly non-uniform or inhomogenous, we require different methods to assess how this distribution varies over space. The simplest method is to divide the area into quadrats and count the number of trees per quadrat. The parameters nx and ny define the number of quadrats along each coordinate.

##Quadrat count set up 
bei.qc <- quadratcount(bei.ppp, nx = 6, ny = 3)

##Visualize
plot(bei.qc)

Note: We can test for a significant departure from a uniform or homogenous distribution by using the function quadrat.test(). This compares the observed number against an expected number if the objects were distributed uniformly (number of trees / number of quadrats). The differences are used in a Chi-squared test, with the null hypothesis of homogeneity or equal distribution among the transects.

##Test
bei.qt <- quadrat.test(bei.ppp, nx = 6, ny = 3)

##View Results
bei.qt
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  bei.ppp
## X2 = 2207.8, df = 17, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 6 by 3 grid of tiles
##Visualize the result of the test by quadrat

plot(bei.qt, cex=0.8)

Note: - The left top value is the observed number of points. - The right top value is the expected number of points. - The value in the center is the normalized?/standardized? difference. - Beware of the Modifiable Unit Areal Problem!!

Using Coviariates to Check for Correlation

Note: As an alternative to the simple quadrat approach, we can use a covariate to see if there is an association between values of the covariate and the object distribution. Here we use gridded slope values for the study area, contained in the file beislope.csv, read into R as a pixel image object. In order for this to be used, we:

  1. read in the gridded dataset;
  2. create a new window for the gridded data (on a 10m grid);
  3. create the pixel object using the as.im() function. Note that we define the window size by hand by specifying minimum and maximum x and y coordinates of the region.
##Read in the slope data
bei.slope <- read.csv("../datafiles/beislope.csv", header = FALSE)

##Define the window for the slope
bei.slope.owin <- owin(xrange = c(-2.5, 1002.5), yrange = c(-2.5, 502.5))

##Create the pixel object
bei.slope <- as.im(as.matrix(bei.slope), W = bei.slope.owin)


##Visualize
plot(bei.slope, reset = FALSE)

plot(bei.ppp, pch = 16, cex = 0.7, add = TRUE)

Note: To use the quadrat method with this data, we need to convert the slope into slope classes, then check for association with each class. We start be creating the classes:

  1. calculate quartiles of slope values for the class boundaries;
  2. use the cut() function to assign slope values to one of the four classes;
  3. create a tessellation based on the classes, which will be used to identify which class a object belongs to (using the tess() function):
##Break it into quantiles
b <- quantile(bei.slope, probs = seq(0,1, by = 0.25))

##Assign the values to one of the quantiles
bei.slope.cut <- cut(bei.slope, breaks = b, labels = 1:4)

##Create a tesseliation (A tessellation is created when a shape is repeated over and over again covering a plane without any gaps or overlaps)
bei.slope.tess <- tess(image = bei.slope.cut)


##Visualize
plot(bei.slope.tess, valuesAreColours=FALSE)

plot(bei.ppp, add = TRUE, pch = "+")

Note: Now we can use the quadratcount() function, but use the tessellation, rather than a set number of grid boxes.

##Count by the tessellation
qb <- quadratcount(bei.ppp, tess = bei.slope.tess)

##Visualize
plot(qb, valuesAreColours=FALSE)

##Chi-squared test by slope class
bei.qt <- quadrat.test(bei.ppp, tess = bei.slope.tess)

##Output
bei.qt
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  bei.ppp
## X2 = 661.84, df = 3, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 4 tiles (levels of a pixel image)
##Visualize
plot(bei.qt, cex = 0.8, valuesAreColours = FALSE)

Note: The low p-value again suggests that we can reject the null hypothesis and state that the trees are not uniformly distributed across the slope classes.

Kernel Density Functions

Note:Variations in the intensity of a spatial point process may also be investigated using a kernel density method. This fits two-dimensional Gaussian kernels (or windows) to the point objects, and effectively sums them across the area. Areas with higher densities of objects will therefore have a higher sum. These density functions provide a useful summary of variations in intensity and a good visualization method to examine a dataset for random or non-random distributions.

The densities are calculated using the density() function, which adapts to a ppp object. The most important parameter is sigma, which controls the bandwidth or size of the window fit to each point.

##Visualize
##sigma controls the bandwidth size

plot(density(bei.ppp, sigma = 60))

##Visualize
##sigma controls the bandwidth size

plot(density(bei.ppp, sigma = 25))

Note: The bandwidth can be selected using cross validation. This can be done in a two step process by (1) selecting the bandwidth using bw.diggle, then using this in the density function. Note that this tends to give very conservative estimates of bandwidth:

##Estimate the bandwidth
bei.bw <- bw.diggle(bei.ppp)

##Visualize
plot(density(bei.ppp, sigma = bei.bw))

Distance Functions

Note: Distance functions can be used to investigate the interaction between points in space. Various methods exist, all based on the idea of calculating distances between points and other points, or fixed points in the study region. The most well-known of these is Ripley’s K function, which describes the distribution as the set of all pairwise distances between points.

We will run this using a different point data set, the redwoods data: redwood.shp. Read this into R, and create a point process object. As the data is in a shape file, we will need to use the st_read() function from the sf package. This package also includes a helper function as.ppp() to convert directly to a ppp object.

##Read in the shapefile
redwood.sf <- st_read("../datafiles/redwood/redwood.shp", quiet = TRUE)

##Convert it to a point pattern
redwood.ppp <- as.ppp(redwood.sf)

##Check the data
redwood.ppp
## Marked planar point pattern: 62 points
## marks are numeric, of storage type  'double'
## window: rectangle = [0.1, 0.999] x [0.04, 0.92] units

Note: The function has created a ppp object, but by default it uses the first column in the sf data frame as a mark, a label on each point. We’ll look at this later in the lab, but for now we want to ignore this by setting it to a NULL value. The other thing we’ll correct is the window size, setting the minimum and maximum limits to 0 and 1 for both the x and y axes.

##Set the marks the null
marks(redwood.ppp) <- NULL

##adjust the window size
Window(redwood.ppp) <- owin(x = c(0, 1), y = c(0, 1))

##Visualize
plot(redwood.ppp)

Ripley’s K

Note: Ripley’s K function is calculated using the Kest() function and similar functions exist for the F and G functions. Once calculated, we can plot out the results, including the observed values of Ripley’s K and a theoretical curve based on an homogenous poisson process with an intensity equal to our point process object.

##Execute the Ripley's K test
redwood.kest <- Kest(redwood.ppp)

##Visualize
plot(redwood.kest)

Interpretation Note: If the point process data is effectively random (i.e. follows a poisson distribution), we would expect the observed line (black) to fall on top of or close to the theoretical line (blue). Above the theoretical line indicates clustering; below indicates a regular or ordered distribution. The green and red line represent K values calculated with different corrections for the border effect.

Next Potential Step: The redwood data appear to be clustered. To test if these are significantly different from a random distribution, we can run a Monte Carlo series of random simulations of homogenous poisson processes, using the function envelope(). This gives us an envelope of possible values of Ripley’s K, which account for simple stochastic differences in random distributions. If the data are really clustered, we expect the observed Ripley’s K to fall outside this envelope. These random simulations are performed using the envelope() function, which requires:

  1. A point process object
  2. The function to be used (here Ripley’s K; Kest)
  3. The number of random simulations to be performed (99)
##Ripleys K monte carlo
redwood.kest.mc <- envelope(redwood.ppp, 
                            fun = 'Kest', 
                            nsim = 99, 
                            verbose = FALSE)

##Visualize
plot(redwood.kest.mc, shade = c("hi", "lo"))

Note: Note that this uses point-wise estimates of uncertainty, and cannot be used as a post-hoc test. A better approach is to calculate the global uncertainty as the largest deviation between the randomly generated values of K and the theoretical value:

##Re-run the monte carlo using the global argument
redwood.kest.mc <- envelope(redwood.ppp, 
                            fun = 'Kest', 
                            nsim = 99, 
                            verbose = FALSE, 
                            global = TRUE)

##Visualize

plot(redwood.kest.mc, shade = c("hi", "lo"))

Besag’s L

Note: The L-function was proposed by Besag as a way to stabilize the variance of Ripley’s K and improve the interpretation. We can calculate this by simply replacing the function name in the envelope() function, as follows:

redwood.lest.mc <- envelope(redwood.ppp, 
                            fun = 'Lest', 
                            nsim = 99, 
                            verbose = FALSE, 
                            global = TRUE)

plot(redwood.lest.mc, shade = c("hi", "lo"))

Pair Correlation Function

Note: The final function we will look at here is the pair correlation function. Instead of using cumulative pairs of distances, this is based on the number of pairs of points seperated by a band of distances. This has the advantage of providing a clearer idea of the range of interactions - as Ripley’s K is based on the cumulative set of distances, this can make it seem as though interactions are present over greater distances than they really are. Again, we can use the envelope() function, but this time we remove the global option by setting it to false, as this is no longer needed.

redwood.pcf.mc <- envelope(redwood.ppp, 
                           fun = 'pcf', 
                           nsim = 99, 
                           verbose = FALSE)

plot(redwood.pcf.mc, shade=c("hi", "lo"), ylim = c(0, 5))

## Error in rebound.owin(X[[i]], ...) : 
##   The new rectangle 'rect' does not contain the window 'win'
#legend(x = 0.20, y = 2.5)

Interpretation: This shows positive interactions up to a range of about 0.07 map units, much less than in the corresponding K-function.

Marked Point Processes

Note: In the previous sections, the point processes have been considered as single objects. Marked point process data include some information that distinguish the objects into different classes, allowing study of the co-occurrence (either positive or negative) between different classes of object. The Lansing forest data set contains the location of a set of trees in a forest in Michigan. Read this file in and take a look at the structure of the data, and you will see there is a column defining the species name of each tree.

lansing <- read.csv("../datafiles/lansing/lansing.csv")

str(lansing)
## 'data.frame':    2250 obs. of  3 variables:
##  $ x      : num  0.078 0.076 0.051 0.015 0.03 0.102 0.135 0.121 0.04 0.065 ...
##  $ y      : num  0.091 0.266 0.225 0.366 0.426 0.474 0.498 0.489 0.596 0.608 ...
##  $ species: chr  "blackoak" "blackoak" "blackoak" "blackoak" ...

Note: We’ll now create a ppp object using the species to defined the marks or the labels of each point. Some differences from before: (1) the window describing the study area is in a shape file (lansing.shp and will need to be read in and converted to an owin object; (2) we need to specify the class information (the species names) when creating the ppp object, using the marks parameter. Note that we first need to convert this column to a factor so that R will recognize it as labels.

##Convert to factor
lansing$species <- as.factor(lansing$species)

##set window size
lansing.win <- st_read("../datafiles/lansing/lansing.shp", quiet = TRUE)

##Set as ppp
lansing.ppp <- ppp(x = lansing$x, 
                   y = lansing$y, 
                   win = lansing.win, 
                   marks = lansing$species)

##visualize
plot(lansing.ppp)

Note: The plot shows all the different marks (species) plotted together. We can access different marks, using the split() function, and can use this to analyze the distribution of any single species:

##Plot by species
plot(split(lansing.ppp), main = "All marks")

##Maple Only Plot
plot(split(lansing.ppp)$maple, "Maple trees")

#Plot by species
plot(density(split(lansing.ppp)), main = "Lansing density surfaces")

Note: To examine the co-occurrence of two marks or species, we can again use Ripley’s K function. We use the cross version: Kcross(), which examines pairwise differences between objects from the two classes. Again, we use the envelope() function to simulate random distributions, and specify the two species (marks) as i and j in the function:

##Using Ripley's K to test the cooccurence of black oak and hickory
lansing.kc <- envelope(lansing.ppp, 
                       Kcross, 
                       i = "blackoak", 
                       j = "hickory",
                       nsim = 99, 
                       verbose = FALSE)

##Visualize
plot(lansing.kc)

Note: As before, we can plot the output as observed curves and the envelope of simulated random distributions. If the observed curve is above the envelope, this is evidence that the two species co-occur; if below then the two species tend to occur in different areas, suggesting some competitive interaction. If the observed curve is within the envelope, then the combined distribution is random.

Point Process Models

Note: Point process models can be fit to any ppp object using the ppm() function. This uses the set of observed points to model the variations in intensity of a point process, usually based on some set of covariates. This function takes as its first argument, a ppp object, and a set of covariates as the second argument. Note that the second argument is the same syntax as the right hand side of a linear model in R. We start by building a simple model of a homogeneous Poisson process (i.e. with no covariates):

##model build 
##Poisson process/distribution (counts/intensity per unit area)
fit0 <- ppm(bei.ppp ~ 1)


##output
fit0
## Stationary Poisson process
## Intensity: 0.007208
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest      Zval
## log(lambda) -4.932564 0.01665742 -4.965212 -4.899916   *** -296.1182
##Convert back to the original scale
exp(coef(fit0))
## log(lambda) 
##    0.007208

Interpretation: Telling us there is about 0.007th of a tree in each square meter. To check this is right, let’s get the intensity directly from the bei.ppp object:

##Check the intensity against the summary of the original ppp object
summary(bei.ppp)
## Planar point pattern:  3604 points
## Average intensity 0.007208 points per square unit
## 
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 units
## 
## Window: rectangle = [0, 1000] x [0, 500] units
## Window area = 5e+05 square units

Note: The following example models intensity as a second order polynomial function of the x and y coordinates of the objects. The polynom() function expands a set of variables into their second (or nth) order form (i.e. x+y+x2+y2+x∗y for second order coordinates).

##Build the model
fit1 <- ppm(bei.ppp ~ polynom(x, y, 2))

##View the model
fit1
## Nonstationary Poisson process
## 
## Log intensity:  ~x + y + I(x^2) + I(x * y) + I(y^2)
## 
## Fitted trend coefficients:
##   (Intercept)             x             y        I(x^2)      I(x * y) 
## -4.275762e+00 -1.609187e-03 -4.895166e-03  1.625968e-06 -2.836387e-06 
##        I(y^2) 
##  1.331331e-05 
## 
##                  Estimate         S.E.       CI95.lo       CI95.hi Ztest
## (Intercept) -4.275762e+00 7.811138e-02 -4.428857e+00 -4.122666e+00   ***
## x           -1.609187e-03 2.440907e-04 -2.087596e-03 -1.130778e-03   ***
## y           -4.895166e-03 4.838993e-04 -5.843591e-03 -3.946741e-03   ***
## I(x^2)       1.625968e-06 2.197200e-07  1.195325e-06  2.056611e-06   ***
## I(x * y)    -2.836387e-06 3.511163e-07 -3.524562e-06 -2.148212e-06   ***
## I(y^2)       1.331331e-05 8.487506e-07  1.164979e-05  1.497683e-05   ***
##                   Zval
## (Intercept) -54.739290
## x            -6.592577
## y           -10.116084
## I(x^2)        7.400185
## I(x * y)     -8.078197
## I(y^2)       15.685769
##Visualize
plot(fit1, 
     how = 'image', 
     se = FALSE, 
     pause = FALSE)

Note:

Earlier, we saw a relationship between tree location and slope. We can use the same function (ppm()) to model the intensity of the distribution using slope values. For a point process model, it is important to have values of the covariate at locations away from the points, as well as at the, Here, we use the slope image (bei.slope), specified using the usual R model syntax.

##Build the model
fit2 <- ppm(bei.ppp ~ bei.slope)

##Output
fit2
## Nonstationary Poisson process
## 
## Log intensity:  ~bei.slope
## 
## Fitted trend coefficients:
## (Intercept)   bei.slope 
##   -5.391053    5.026710 
## 
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest      Zval
## (Intercept) -5.391053 0.03001787 -5.449887 -5.332219   *** -179.5948
## bei.slope    5.026710 0.24534296  4.545847  5.507573   ***   20.4885

Note: The coefficient for the slope is about 5. Remember for regression models based on the log of the response variable, this is a multiplier, and reflects the increase in the intensity with each unit increase in slope. We can again plot the fitted trend surface:

##Visualize
plot(fit2, 
     how = 'image', 
     se = FALSE, 
     pause = FALSE)

Exercise

Initial Urkiola Data Read

# Tree locations
urkiola.sf <- st_read("../datafiles/urkiola/urkiola.shp")
## Reading layer `urkiola' from data source 
##   `/Users/davidleydet/Desktop/University of Utah PhD /Course Work/Fall 2022 Semester/GEOG6000_Data Analysis/datafiles/urkiola/urkiola.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1245 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 2.1 ymin: 0.1 xmax: 219.1 ymax: 149.9
## CRS:           NA
# Park boundary
urkiola.win <- st_read("../datafiles/urkiola/urkiolaWindow.shp")
## Reading layer `urkiolaWindow' from data source 
##   `/Users/davidleydet/Desktop/University of Utah PhD /Course Work/Fall 2022 Semester/GEOG6000_Data Analysis/datafiles/urkiola/urkiolaWindow.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 0.05 ymin: 0.05 xmax: 219.95 ymax: 149.95
## CRS:           NA
## Window
urkiola.win <- as.owin(urkiola.win)

# First extract coordinates
urkiola.crds <- st_coordinates(urkiola.sf)

## Convert to ppp, using coordinates from previous step, marks as the tree species, and the window from step two
urkiola.ppp <- ppp(x = urkiola.crds[, 1], y = urkiola.crds[,2], 
                   marks = as.factor(urkiola.sf$tree), window = urkiola.win)

##Visualize
plot(urkiola.ppp)

1.1 Intensity

##Intensity
urkiola.sum = summary(urkiola.ppp)

urkiola.sum
## Marked planar point pattern:  1245 points
## Average intensity 0.06564029 points per square unit
## 
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 units
## 
## Multitype:
##       frequency proportion  intensity
## birch       886  0.7116466 0.04671269
## oak         359  0.2883534 0.01892760
## 
## Window: polygonal boundary
## single connected closed polygon with 44 vertices
## enclosing rectangle: [0.05, 219.95] x [0.05, 149.95] units
##                      (219.9 x 149.9 units)
## Window area = 18967 square units
## Fraction of frame area: 0.575
  • The intensity for the combined set of two tree species is 0.0656403

1.2 Distribution Plot

##Plot by species
plot(split(urkiola.ppp), main = " Tree Species")

1.3 Kernel Density Plot

#Density plot by species
plot(density(split(urkiola.ppp)), main = " Tree Species")

Interpretation:

  • Based on these density plots it appears that these two species co-occur in the southeastern corner and northern portion of the park. The intensity for oak trees is high in the center of the park compared to a low intensity for birch trees, which indicates they do not co-occur in this area.

1.4 Ripley’s K

##Ripleys K monte carlo
urkiola.kest.mc <- envelope(urkiola.ppp, 
                            fun = 'Kest', 
                            global = TRUE,
                            nsim = 99, 
                            verbose = FALSE)

##Visualize
plot(urkiola.kest.mc)

##Summary Function to obtain significance level of Ripleys K
urkiola.kest.sum = summary(urkiola.kest.mc)
## Simultaneous critical envelopes for K(r)
## and observed value for 'urkiola.ppp'
## Obtained from 99 simulations of CSR
## Alternative: two.sided
## Envelopes computed as theoretical curve plus/minus maximum simulated value of 
## maximum absolute deviation
## Significance level of Monte Carlo test: 1/100 = 0.01
## Data: urkiola.ppp
urkiola.kest.sum
## NULL

Interpretation: - Our Ripley’s K plot indicates that the tree species are spatially random at distances less than approximately 20 units. The tree species appear to co-occur at distances larger than 20 units although the observed curve is barely outside of our confidence interval. This curve was calculated using a global envelope, which estimates the confidence interval using the maximum distance between the envelope and the theoretical k-function.

  • The significance value of this Monte Carlo test is 0.01, which leads us to accept the results of this test as the probability of this curve being generated by chance is 0.01.

1.5 Spatial Dependence Test

##Using Ripley's K to test the cooccurence of birch and oak
urkiola.kc <- envelope(urkiola.ppp, 
                       fun = Kcross, 
                       i = "birch", 
                       j = "oak",
                       nsim = 99, 
                       verbose = FALSE)

##Visualize
plot(urkiola.kc)

summary(urkiola.kc)
## Pointwise critical envelopes for "K"["birch", "oak"](r)
## and observed value for 'urkiola.ppp'
## Obtained from 99 simulations of CSR
## Alternative: two.sided
## Upper envelope: pointwise maximum of simulated curves
## Lower envelope: pointwise minimum of simulated curves
## Significance level of Monte Carlo test: 2/100 = 0.02
## Data: urkiola.ppp

Interpretation: - This Ripley’s K suggests that these two species are spatially random as most of the observations fall within the confidence interval of our theoretical curve. In other words, these species are not correlated with each other.

  • The significance value of this Monte Carlo test is 0.02, which leads us to accept the results of this test as the probability of this curve being generated by chance is 0.02.