An introduction to the Analysis of Forest characteristics with Spatial Autocorrelation in R

This webpage introduces the procedures and outcomes for the analysis of forest characteristics by using spatial autocorrelation method in an open-source language, R. The purpose of this analysis is to analyze data on forest height and tree age in the Greater Victoria Water Supply Area (GVWSA) and provide visual output so that a wider public can understand the spatial distribution of forests in this area.

Why using R for the analysis?

The very reason that R is utilized for the analysis is because R is a free, open-source software; that means anyone on the internet can access this tool to analyze spatial distribution of events with data, like the one done in this analysis. For more information about software, please reference to the R-webpage. Another reason is R holds a wide range of packages and functions to analyze data. You can easily look for a function from “help” or even create a new one on your own. If you cannot find the answer on R, you can ask worldwide R users by Googling or looking at R-blogger. The analysis was done in Rstudio, and it was useful as we can write codes and see the output plots simultaneously on the same screen.

Spatial analysis in R

The use of “Big Data” becomes more important and expected rather than just observing the geographical events these days. Spatial analysis of crime data in the city or household income distribution can be done from a huge stack of data in R. In this blog, the forests in the GVWSA will be analyzed in R with three methods in different sections: weights matrix, global Moran’s I, and local Moran’s I. Each section would have sub-topics for describing the purpose of the method, how to implement the method in R, as well as output and findings. A variables caled Vegetation Resource Inventory (VRI) will be used to identify forest characteristics such as if certain types of tree species are distributed in a specific region in the area of interest.

How to format and map VIR data with R

Data loading and clean-up

The first thing to do before any analysis of data in R is to read the file that is going to be used in the analysis and clean the dataset. The data can be downloaded from the page here. The R code used for uploading procedure is here. At first, we are going to install some packages and call them into a library.Here’s an example of installation and storing in the library.

install.packages("spdep")
library(spdep)

Then set the repository and working directory. Use readOGR function to upload the file into Rstudio. Change the epsg into NAD83 UTM Zone 10N, which is 26910.

dir <- "/Users/goshunomacbookair/418Lab/Lab3"
setwd(dir)
VRI <- readOGR(dsn ="./WatershedVRI", layer = "WatershedVRI")
VRI <- spTransform(VRI, CRS("+init=epsg:26910")) 

This loaded file has a lot of information and some of them are not relevant of the analysis or even carry a number of NA values, so the dataset needs to be cleaned up. The irrelevant columns will be got rid of from the dataset, and new column names were given for the resulting columns so that people can easily identify what each column means.

vriCleanCols <- c("FID_VEG_CO", "POLYGON_ID", "PROJ_AGE_1",
                    "SITE_INDEX", "SPECIES__4", "SPECIES__5",
                    "PROJ_HEI_1", "SPECIES_PC", "SPECIES__6",
                    "VRI_LIVE_S", "BASAL_AREA", "WHOLE_STEM",
                    "CROWN_CL_1")
vriClean <- VRI[,vriCleanCols]
newNames <- c("FID", "PolyID", "Stand_Age", "Site_Index",
                "CoDom_Sp", "Dom_Sp", "Stand_HT", "DomSP_Perc", 
                "CDomSP_Perc", "Stand_Dens", "Stand_BA", "Stand_StemBio", "Stand_CrownCl")
colnames(vriClean@data) <- newNames

Finally, the data with NA values in the chosen column of interest, the stand height of the forest in polygons this time, will be taken out from the dataset. Then, the data clean-up process is over.

vriClean <- vriClean[!is.na(vriClean@data$Stand_HT), ]

VRI data mapping

The mapping of the stand height data can be done with the code here. The mapping can be actually done quite easily in Rstudio.

map_StdHT <- tm_shape(vriClean) + tm_polygons(col = "Stand_HT", 
                title = "Stand_HT", style = "jenks", palette = "Greens", n = 6)
map_StdHT

You can change the color graduation of the map by using a palette with the code below. A coution here is that you cannot do anything on Rstudio while you open this palette, so make sure to close the palette page when you are done with it. In the code shown above, the palette called “Greens” was used to illustrate the forest stand height with six classes (where n = 6).

tmaptools::palette_explorer()

When openning the palette, it should look like the figure below (figure ###).

There are two modes in Rstudio when it comes to working in “plots” section. “View” mode and “plot” mode. View mode enalbles us to interact with plots, meaning that we can retrieve the values in output plots. While we cannot do so in the plot mode. The chunck of codes below shows how to switch between the two.

tmap_mode("view") #to work as View mode
tmap_mode("plot") #to work as plot mode

The resulting plot would look like this.

Weights Matrix analysis

Purpose of weights matrix analysis

A spatial auto-correlation is the measurement of how similar or different the target is from the neibours. However, the definition of neighbours has a few possibilities. In this analysis, the methods called queen weights and rook weights are introduced to define surrouding polygons as neighbours. The names are derived from the ways queen and rook can move in the chessboard. These are actually quite similar but one difference is that rook weights do not count the surroundings that do not share the boundary as lines. That is, even if the surrounding polygons attach to the target polygon at corner, they will not be incorporated as neighbours.

Rcode to implement weights matrix

Below is the Rcode for implementing Queen weights on stand height data.

vri.nb <- poly2nb(vriClean)
vri.net <- nb2lines(vri.nb, coords=coordinates(vriClean))
crs(vri.net) <- crs(vriClean)

The code below would map the queen and rook weights by connecting the center of polygons to the neighbours in a color for queen and a yellow for rook.

tm_shape(vriClean) + tm_borders(col='lightgrey') + 
  tm_shape(vri.net) + tm_lines(col='blue', lwd = 2) +
  tm_shape(vri.net2) + tm_lines(col='yellow', lwd = 2)

The section below gives us the characteristics of weights list objects and neighbour list objects. Here again, zero.policy = TRUE means we would incorporate the data with zero value.

vri.lw <- nb2listw(vri.nb, zero.policy = TRUE, style = "W")
print.listw(vri.lw, zero.policy = TRUE)

Output of weights matrix and description

The resulting map with queen and rook weights would look like this. In this map, the most yellow lines representing rook weight neighbourhood definition overlap the blue lines showing the queen weight neighbourhood. However, when using the interactive mode with tmap_mode("view") and closely looking at the map, there are some blue lines that are not overlapped by yellow lines. These lines represent the neighbours that are not included when using rook weights but are included with queen weights. From this comparison, we can see that the result might slightly differ with the choice of neighbour definition. The figure visually shows that the queen weight take more neighbours into account. For the following analysis, the queen weight was selected for the neibourhood definition.

Global Moran’s I

Purpose of Global Moran’s I analysis

The Global Moran’s I is designed to measure the spatial autocorrelation with a single output for the entire dataset (ArcGIS pro). It requires feature locations and values for the analysis and gives the result as a single value of a z-score that represents whether the dataset is clustered, dispersed or random (ArcGIS pro). What it essentially does is three things below; examining the difference between the value for the target and the mean value: multiply that by the difference between the neighbors: and multiply this by the weight (w) that is given to our neighbors. It is a useful tool to grasp a big picture view of the overall tendency of data in terms of spatial autocorrelation. For more information about Global Moran’s I analysis will be found in the ArcGIS pro webpage.

Rcode to implement Global Moran’s I

The Rcode for the Global Moran’s I test is shown here and below.zero.policy = TRUE indicates that the data with the value of zero will also be taken into calculation.

mi <- moran.test(vriClean$Stand_HT, vri.lw, zero.policy = TRUE)
mi

There is a set of code for determine the possible range of the Global Moran’s I value.

moran.range <- function(lw) {
  wmat <- listw2mat(lw)
  return(range(eigen((wmat + t(wmat))/2)$Stand_HT))
  }
moran.range(vri.lw)

From the result of the Global Moran’s I calculation, a z-score can be driven by the code as well.

mI <- mi$estimate[[1]]
eI <- mi$estimate[[2]]
var <- mi$estimate[[3]]
z <- (eI - mI)/(var)^(1/2)

Output of Global Moran’s I and description

The {echo=FALSE} z <- (eI - mI)/(var)^(1/2) After the range calibration, the code returns the negative and positive infinity for the minimum and maximum point respectively. That means the potential range for the Global Moran’s value, represented as eI, from the negative infinity to positive infinity. In other words, the Global Moran’s value could take any real number. A z-score will be calculated with the code as well. The calculated z-score resulted in - , which means the analyzed dataset was significantly dispersed. The Z-score has a close relationship with p-value on the probability. We can use the normal table (figure ###) to see the probability of a given Z-score but the resulting Z-score was way too small to be on the table.

Local Moran’s I

Purpose of Local Moran’s I analysis

The Local Moran’s I has a similar concept from the Global Moran’s I. The difference here is even though the Global Moran’s I provides the single calculated number to represent the spatial autocorrelation of the dataset, the Local Moran’s I will show the a number of calculated results for each features that illustrates the spatial autocorrelation compared with the neighbors. For this analysis, the queen weights are utilized again.

Rcode to implement Local Moran’s I

The Rcode for the Local Moran’s I test is shown here and below.

########################  
lisa.test <- localmoran(vriClean$Stand_HT, vri.lw)

vriClean$Ii <- lisa.test[,1]
vriClean$E.Ii<- lisa.test[,2]
vriClean$Var.Ii<- lisa.test[,3]
vriClean$Z.Ii<- lisa.test[,4]
vriClean$P<- lisa.test[,5]
########################
map_LISA <- tm_shape(vriClean) + 
tm_polygons(col = "Ii", 
              title = "Local Moran's I", 
              style = "jenks", 
              palette = "PiYG", n = 10)
map_LISA

A chunk of code below would run to produce the visualization of a scatter plot that shows the relationship between locations and means.

moran.plot(vriClean$Stand_HT, vri.lw, zero.policy=TRUE, spChk=NULL, labels=NULL, xlab="Stand Height", 
             ylab="Spatially Lagged Stand Height", quiet=NULL)

Output of Local Moran’s I and description

The image for the resulting scatter plot should look like the one below. INSERT PICTURE HERE HERE HERE. The x-axis shows the values at each location and y-axis shows the averaged neighbouring values of each location. That means that if the point is shown up around the right top corner. The values for the location and the neighbour are both relatively higher. While if the point is popped up at the left bottom, that means the value for the location is relatively smaller and so as that of neighbours of that location. Because the dataset holds a lot of locations to consider, it seems challenging to find the absolute nature for the dataset.