Mexico’s Bird Richness

In this section I will explore the SpatialPointsDataFrame called birdRichnessMexico.rds, which contains points indicating bird richness (# of spp) throughout Mexico.

Step 1: read in data and become familiar with rds datafile and structure.

# Set working directory
setwd("~/Documents/ESCI Time Series/Homework 7")

# Load all packages we will use in this homework assignment
library(gstat)

## Warning: package 'gstat' was built under R version 3.3.2

library(sp)

## Warning: package 'sp' was built under R version 3.3.2

library(spatstat)

## Warning: package 'spatstat' was built under R version 3.3.2

## Loading required package: nlme

## Warning: package 'nlme' was built under R version 3.3.2

## Loading required package: rpart

## Warning: package 'rpart' was built under R version 3.3.2

## 
## spatstat 1.51-0       (nickname: 'Poetic Licence') 
## For an introduction to spatstat, type 'beginner'

## 
## Note: R version 3.3.1 (2016-06-21) is more than 9 months old; we strongly recommend upgrading to the latest version

## 
## Attaching package: 'spatstat'

## The following object is masked from 'package:gstat':
## 
##     idw

library(ncf)
library(spdep)

## Warning: package 'spdep' was built under R version 3.3.2

## Loading required package: Matrix

## Warning: package 'Matrix' was built under R version 3.3.2

library(raster)

## 
## Attaching package: 'raster'

## The following objects are masked from 'package:spatstat':
## 
##     area, rotate, shift

## The following object is masked from 'package:nlme':
## 
##     getData

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.3.2

library(ggmap)

## Google Maps API Terms of Service: http://developers.google.com/maps/terms.

## Please cite ggmap if you use it: see citation("ggmap") for details.

# Read in the rds file
bird <- readRDS("birdRichnessMexico.rds")
str(bird)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  200 obs. of  3 variables:
##   .. ..$ nSpecies: int [1:200] 286 250 385 153 190 236 252 315 370 319 ...
##   .. ..$ lat     : num [1:200] 19.6 24.8 18.7 27.4 25.7 ...
##   .. ..$ long    : num [1:200] -101.1 -107.2 -96.9 -114.1 -104.6 ...
##   ..@ coords.nrs : num(0) 
##   ..@ coords     : num [1:200, 1:2] 409124 -216076 842162 -890506 41190 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "x" "y"
##   ..@ bbox       : num [1:2, 1:2] -1064612 -930031 1743412 894915
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr "+proj=aea +lat_1=14.5 +lat_2=32.5 +lat_0=24 +lon_0=-105 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs +towgs84=0,0,"| __truncated__

Notes from investigating datafile and structure:

The column nSpecies indicates the number of bird species present within one kilometer of the corresponding location. There are 200 observations for nSpecies
A dataframe holds nSpecies and the corresponding lat and long coordinates for each observation, so we will not need to reproject the data.
These data are projected onto a plane using the North America Albers Equal Area Conic and are centered on Mexico (SR-ORG:38).

Step 2: Plot map of data.

For some reason the function get_googlemap is not working this morning. Gives warning “cannot open compressed file”. I am troubleshooting to get this to work again, but for now I blocked it from running the code.

# Use the get_googlemap function to pull in a google map centered on Mexico by lat,long and store this map in an object
mexmap <- get_googlemap(center= c(lon = -102, lat = 23), zoom= 5, maptype= "hybrid")

## Source : https://maps.googleapis.com/maps/api/staticmap?center=23,-102&zoom=5&size=640x640&scale=2&maptype=hybrid

# Plot map using ggmap
map1 <- ggmap(mexmap)
# Combine map with data
map1 <- map1 + geom_point(data=data.frame(bird), aes(x=long, y=lat, color=nSpecies), size=2)
# Combine map with labels
map1 <- map1 + labs(x="Longitude",
                    y="Latitude",
                    title="Bird Species Richness")

# Plot this map
map1

## Warning: Removed 1 rows containing missing values (geom_point).

Step 3: Discuss general patterns of the data visualized in the map.

There appears to be lower species richness in Baja California and along the inland running down the middle of Mexico. There are relatively higher species richness along the western coast beginning in southern Sonora and increasing in species richness in Jalisco. There is also relatively higher species richness along the southern east coast around Veracruz and Oaxaca. Based solely on observations of terrain, the patterns of bird species richness appears to be correlated habitat type and vegetation cover. There appears to be lower species richness in drier mountainous regions and higher species richness in wetter, forested regions.

Step 4: Investigate spatial autocorrelation in bird data.

Lagged Scatter Plots

In the lagged scatterplots, the values of bird species richness (x-axis) plotted against bird species richness values that are a certain spatial distance away. I specify what distance each plot is comparing by choosing what values to set the breaks argument in the hscat function.

# Use the hscat function to plot lagged scatterplots of the bird data. We are looking at a large area and the distance unit is in meters, so we need to set very large values for our breaks.

# The breaks here indicate to go from 0 to 2e6 meters in increments of 2e5 meteres. Note that 2e6 meters is about 1244 miles, which is about the distance from the northern border in Sonora down to Veracruz (almost the entire country, which gives us a good measure of study area). 2e5 meters is about 124 miles.
hscat(nSpecies~1, data=bird, breaks= seq(0, 2e6, by=2e5))

# From 0 to about 500 miles by about 62 miles.
hscat(nSpecies~1, data=bird, breaks= seq(0, 8e5, by=100000))

The first lagged scatterplots included almost the entire study area and indicated that correlation becomes very weakly positively correlated around 8e5 meters. In the second plot, I took this into account and decided to get a higher resolution of distances closer than 8e5 meters. This plot indicates strong positive correlations until about 2e5 (~124 miles) and becomes weakly positively correlated around 7-8e5 meters (435-500 miles).

Variogram with different widths

Using a variogram, we can visualize the variation of the species richness as a function of distance. In a semivariogram, we quantify spatial autocorrelation by measuring the distance between all pairs of points and plotting the squared difference between the values (y-axis) at that distance (x-axis). If a pair of points are spatially close together and have a smaller squared difference, they are spatially autocorrelted.

A variogram is a “measure of environmental distance with respect to Euclidean distance”.

# First, we will plot the variogram cloud.
# Create object to hold variogram function output.
birdVarCloud <- variogram(nSpecies~1, bird, cloud= TRUE)
# Plot the variogram cloud
plot(birdVarCloud, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness")

# Second, we will plot the variogram at various widths and cutoffs

# Default cuttoff and width
birdVar1 <- variogram(nSpecies~1, bird, cloud= FALSE)
plot(birdVar1, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness - Default")

summary(birdVar1)

##        np              dist             gamma           dir.hor 
##  Min.   : 155.0   Min.   :  50455   Min.   : 237.5   Min.   :0  
##  1st Qu.: 757.5   1st Qu.: 298844   1st Qu.:1908.6   1st Qu.:0  
##  Median : 893.0   Median : 558141   Median :2440.4   Median :0  
##  Mean   : 810.0   Mean   : 559756   Mean   :2433.7   Mean   :0  
##  3rd Qu.: 956.0   3rd Qu.: 818467   3rd Qu.:3248.7   3rd Qu.:0  
##  Max.   :1083.0   Max.   :1078585   Max.   :4111.7   Max.   :0  
##     dir.ver     id    
##  Min.   :0   var1:15  
##  1st Qu.:0            
##  Median :0            
##  Mean   :0            
##  3rd Qu.:0            
##  Max.   :0

# The default cutoff is 10,78585 meters based on summary of birdVar1
# The default np (related to width) "number of point pairs used for this estimate" is 1083 based on summary of birdVar1

# Default cuttoff, decrease width to include more points in semivariance calculation
birdVar2 <- variogram(nSpecies~1, bird, cloud= FALSE, width=20000)
plot(birdVar2, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness - width=20,000")

# Default cutoff, further decrease width to include even more points
birdVar3 <- variogram(nSpecies~1, bird, cloud= FALSE, width=10000)
plot(birdVar3, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness - width=10,000")

# Increase cutoff, default width
birdVar4 <- variogram(nSpecies~1, bird, cloud= FALSE, cutoff=3200000)
plot(birdVar4, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness, cutoff=3,200,000")

# Increase cutoff, decrease width
birdVar5 <- variogram(nSpecies~1, bird, cloud= FALSE, width=35000, cutoff=3200000)
plot(birdVar5, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness - width=35,000; cutoff=3,200,000")

## becomes nonsensical at a large cutoff, because you begin to see edge affects.

# Note about the ~1 in variogram function code: Do you want to modify z at all before you analyze it? Do you want to remove a trend? The term ~1 says no trend. Replace 1 with a factor that you want to take into account if you would like to detrend data before analyzing.

The variogram with default width and cutoff values suggests that bird richness is autocorrelated until a distance of about 700,000 - 800,000 meters (~435 - 500 miles) when the semivariance appears to level off. When I increase the cutoff distance, the semivariance appears to increase until a distance of about 2,500,000 meters (1,553 miles) and then begins to decrease again. It looks like values begin to spread and become more variable beyond 1e6, which may be due to edge effects at great distances. It seems like maintaining the default cutoff distance may give a more reasonable visualization of the data.

Moran’s I

Moran’s I gives us a degree of spatial autocorrelation as a function of distance. It is a unitless value ranging from [-1,1] from strongly negatively to strongly positively correlated.

Within the Moran’s I equation, we can specify how we would like points weighted based on distance. If we make the w=1, then all points are equally weighted. If we don’t want all points to have equal weights in the equation, we can adjust the weighting matrix.

The values can be transformed to Z-scores for hypthesis testing. Moran’s Z-scores greater than 1.96 and less than -1.96 indicate significant spatial autocorrelation at an alpha 0.05 level.

First, we calculate weights matrix and run a Moran test using all the datapoing.

# Calculate the inverse distance matrix stored in the object "w" to weight near neighboring points greater than farther points. This line of code is coercing the distance matrix of the birds data, which is taken from the coordinates, and then taking the inverse of all of these values to get the inverse distance matrix.
w <- 1/as.matrix(dist(coordinates(bird)))

# Now, we need to set the diagnol values of the matrix back to zero values because the (1/0=Inf).
diag(w) <- 0

# Run the Moran's I test for the species richness in the bird dataset. The moran.test function needs the w data in a listw format to run the calculation, so we first need to coerce the inverse distance matrix into a listw.
moran.test(bird$nSpecies, mat2listw(w))

## 
##  Moran I test under randomisation
## 
## data:  bird$nSpecies  
## weights: mat2listw(w)  
## 
## Moran I statistic standard deviate = 28.39, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.2940827655     -0.0050251256      0.0001109976

The results indicate that for all points in the bird dataset, the bird species richness is positively autocorrelated, so observations of high bird species richness that are geographically close will be associated with each other.

Now, we will calculate the weights matrix using the correlation between the eight closest neighbors, because this is generally the rule of thumb for running this test.

# Adjust the weights matrix to the 8 closest neighbors.
# k is the number of neighbors you would like to specify
# knearneigh tells you the k nearest neighbors for each point
# knn2nb takes the k nearest neighbors and turns it into a neighbor object nb
w2 <- knn2nb(knearneigh(bird, k=8))

# Run another moran's test with this new weights matrix
moran.test(bird$nSpecies, nb2listw(w2))

## 
##  Moran I test under randomisation
## 
## data:  bird$nSpecies  
## weights: nb2listw(w2)  
## 
## Moran I statistic standard deviate = 21.79, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.707838803      -0.005025126       0.001070321

These test results indicate that the eight nearest neighbors are must more strongly postively correlated (I= 0.71) than all the points (I= 0.29). This means that close locations in Mexico are likely to have similar bird species richness.

Now, let’s make some plots to see these results more easily.

# Create a loop to calculate Moran's I for several distances
# First, make a dataframe for loop values to occupy
n <- 7
res <- data.frame(k=2^(1:n), I=rep(NA,n))
# Second, run loop
for(i in 1:n){
  w <- knn2nb(knearneigh(bird, k=2^i))
  res$I[i] <- moran.test(bird$nSpecies, nb2listw(w))$estimate[1]
} # End of loop

# Now, plot the dataframe, which will show number of neighbors (k) as a function of the calculated moran's estimate (I).
plot(res, type="b",
     main="Moran's I by number of neighbors",
     pch= 20,
     cex= 1.5)

This Moran’s I plot indicates that there is strong postive autocorrelation for neighbors out to about 20. The more neighbors we increase, the lower the Moran’s I value becomes.

Now, I will plot a spline correlogram to look at Moran’s I at increasing distances as a function of distance including 95% confidence intervals calculated by bootstrapping (resampling the data a set number of times).

# Very plain plot using the spline.correlog function
birdI <- spline.correlog(x= coordinates(bird)[,1],
                         y= coordinates(bird)[,2],
                         z= bird$nSpecies,
                         resamp= 100,
                         xmax= 1100000,
                         quiet= TRUE)

plot(birdI, text= TRUE)

I will make this plot look much more pleasant by following Andy’s code tutorial from the lecture notes.

#
x <- birdI$real$predicted$x
y <- birdI$real$predicted$y
lCI <- birdI$boot$boot.summary$predicted$y["0.025",]
uCI <- birdI$boot$boot.summary$predicted$y["0.975",]
xx <- c(x,rev(x))
yy <- c(lCI,rev(uCI))

# Create plot
plot(x, y, 
     xlim= c(0,1100000),
     ylim= c(-1, 1), type= "n", 
     xlab = "Distance (m)", 
     ylab = "Moran's I")
# Add polygon for confidence interval
polygon(xx,yy,col="grey",border=NA)
# Add a line at a Moran's I value of zero
abline(h=0,lty="dashed")
# Add 
lines(x,y)

This plot indicates that bird species richness has strong positive spatial autocorrelation out to distances of about 100,000 meters (62 miles).The Moran’s I values steadily decrease until they touch the 0 line at about 700,000 meters (435 miles), indicating that distances beyond this are no longer positively correlated. The variogram indicated semivariance rapidly increasing until about 400,000 meters, which is about the distance on the spline.cor plot that values become weakly positively autocorrelated. Additionally, the variogram indicates that between 7e5 and 8e5 meters is when the semivariance levels off and this distance corresponds to when the Moran’s I value drop within the zero range.

All three methods (lagged scatterplots, variograms, and Moran’s I) give a similar picture for spatial autocorrelation of the bird data. It was a bit difficult to judge the overall distance and increments to set the lagged scatterplots without looking at the variogram and Moran’s I plots in combination. I found the spline correlogram of Moran’s I as a function of distance to be the most helpful in getting an overall picture of the data in meaninful units. I think it will be helpful to begin with this plot and then utilize the general shape of this plot to investigate increments of the data using the other methods, such plotting lagged scatterplots at meaningful increments and setting the variogram with an interesting width value.

Directional Variograms

Anisotropy is when data are directionally dependent, which means the data have different properties in different directions. This also means that the spatial pattern within the data is not constant in all directions.

# Default cuttoff and width
birdVar1 <- variogram(nSpecies~1, bird, cloud= FALSE)
plot(birdVar1, 
     pch=20, cex=1.5, col="black",
     ylab= expression("Semivariance (" *gamma* ")" ),
     xlab= "Distance (m)",
     main= "Mexico's Bird Species Richness - Default")

summary(birdVar1)

##        np              dist             gamma           dir.hor 
##  Min.   : 155.0   Min.   :  50455   Min.   : 237.5   Min.   :0  
##  1st Qu.: 757.5   1st Qu.: 298844   1st Qu.:1908.6   1st Qu.:0  
##  Median : 893.0   Median : 558141   Median :2440.4   Median :0  
##  Mean   : 810.0   Mean   : 559756   Mean   :2433.7   Mean   :0  
##  3rd Qu.: 956.0   3rd Qu.: 818467   3rd Qu.:3248.7   3rd Qu.:0  
##  Max.   :1083.0   Max.   :1078585   Max.   :4111.7   Max.   :0  
##     dir.ver     id    
##  Min.   :0   var1:15  
##  1st Qu.:0            
##  Median :0            
##  Mean   :0            
##  3rd Qu.:0            
##  Max.   :0

# The default cutoff is 10,78585 meters based on summary of birdVar1

# Try fitting a variogram to the bird data with two different model types and look at fit
fitv1 <- fit.variogram(birdVar1, vgm(psill=2500, model="Sph", range=400000))
plot(birdVar1, fitv1,
     main="Model=Sph")

fitv2 <- fit.variogram(birdVar1, vgm(psill=2500, model="Exp", range=400000))
plot(birdVar1, fitv2,
     main="Model=Exp")

# Add anisotropy into the variogram model and plot
v.dir <- variogram(nSpecies~1, bird, cloud= FALSE, alpha=(0:3) * 45)
vmodel1 <- vgm(psill=2500, model="Exp", range=400000, anis=c(45,0.3))
plot(v.dir, vmodel1)

Homework 7 - Spatial Autocorrelation

Brooke McIntyre

5/18/2017