Threats, context, and factors needed to protect African biodiversity.

We’ll use long-term data from Kruger National Park to explore R. This will lead into the endangered species act.

Resources

Resources/data/knpHerbivores.rdata

Objectives

Introduce background issues on biodiversity threats from global change
Gain an exposure to concepts in modeling and computation, including data sources and quality, sources of uncertainty

For next time

Read The future of sub-Saharan Africa’s biodiversity in the face of climate and societal change.

We continue to look at long-term population data for megaherbivores at Kruger National Park. Climate change is but one of several threats to African biodiversity. Select one the challenges included in Figure 1 of this paper and summarize how funding from developed countries could be targeted to have the most impact, directly or indirectly, on conservation efforts.

Post no more than two paragraphs to Sakai for your group.

Post your answers to questions on the analysis from last time to Sakai. Do this in groups.

Substrates at KNP, with acidic, infertile rocks in the west, fertile volcanic rocks in the east.

Continue today

Background reading: Clark, J. S., C. L. Scher, and M. Swift. 2020. The emergent interactions that govern biodiversity change. Proceedings of the National Academy of Sciences, 117, 17074-17083.

Here is a function to read:

columnSplit <- function(vec, sep='_', ASFACTOR = F, ASNUMERIC=F,
                        LASTONLY=F){
  vec <- as.character(vec)
  nc  <- length( strsplit(vec[1], sep)[[1]] )
  mat <- matrix( unlist( strsplit(vec, sep) ), ncol=nc, byrow=T )
  if(LASTONLY & ncol(mat) > 2){
    rnn <- mat[,1]
    for(k in 2:(ncol(mat)-1)){
      rnn <- columnPaste(rnn,mat[,k])
    }
    mat <- cbind(rnn,mat[,ncol(mat)])
  }
  if(ASNUMERIC){
    mat <- matrix( as.numeric(mat), ncol=nc )
  }
  if(ASFACTOR){
    mat <- data.frame(mat)
  }
  mat
}

Load data

There are three objects to load here:

load( 'data/knpHerbivores.rdata', verbose = T )

## Loading objects:
##   xdata
##   ydata
##   edata

All three of these objects are organized as site-year (rows) by variables (columns). The data.frame ydata holds counts from a fixed-wing aircraft. Sample effort is edata in km$^2$. Predictors are in data.frame xdata.

The first goal is to look at the relationship between rainfall and species abundances. I use the function tapply to obtain precipitation means by site and year. In this example the site is the column shape. I then determine the anomalies from the mean.

prec     <- xdata$rainfall
precYear <- tapply( prec, xdata$year, mean, na.rm=T)   # aggregate to mean for year
precSite <- tapply( prec, xdata$shape, mean, na.rm=T)  # aggregate to mean for site

Exercise 1: Look at the objects xdata, precYear, and precSite.

How are they related?

What are the dimensions of each object?
What are the names (or rownames/colnames) of the elements in each object?
How do their dimensions relate to one another?
How many sites are there?
How many years are there?

Year plot

The function plot( x, y ) needs the numeric vectors x and y. They must have the same length. I use this function to plot annual precipitation:

year <- as.numeric( names(precYear) )
plot( jitter( xdata$year ), prec, cex=.1 )
lines( year, precYear, lwd = 2 )

Exercise 2: Look at the objects xdata$year and prec.

How are they related?

What are the lengths of each vector?
What are the names of elements in each object, and what do these names represent?
What is the object created by names(precYear)?
What is the object created by as.numeric( names(precYear) ), and why is it used here?

Exercise 3:For a normally distributed random variable, 95% of points lie within 1.96 standard deviations of the mean. Add lines for the mean $\pm 95\%$ of observations to the graph. To get the standard deviation use the function tapply, replacing mean with sd (for standard deviation).

Map plot

I now want to see how rainfall is distributed across the map. The locations of sites are held the xdata columns lon and lat. The mean precipitation for each site is held in the object I created precSite. So I need to get the location of each site in precSite from xdata. I do this by matching the names of precSite with the rows in xdata that have the matching shape column. Once I have these matching rows, I can get lon and lat for those rows. Here’s the algorithm:

Step 1. match the sites between names(precSite) and xdata$shape:

mm    <- match( names( precSite ), xdata$shape )    

# check that they match:
all.equal( names(precSite), xdata$shape[mm] )    # are they equivalent?

## [1] TRUE

# show me:
rbind( names(precSite), xdata$shape[mm] )[,1:5] # they are the same

##      [,1]    [,2]       [,3]       [,4]       [,5]      
## [1,] "shp_1" "shp_10.1" "shp_10.2" "shp_10.3" "shp_10.4"
## [2,] "shp_1" "shp_10.1" "shp_10.2" "shp_10.3" "shp_10.4"

Exercise 5: Which rows in xdata match the names of precSite?

Step 2. Generate a color palette for dry to wet:

Here’s a source I often use for colors. I select a diverging color scheme and export. From the javaScript option I copy the hex codes. I use the function colorRampPalette with this color vector to generate a function cfun that will return a new vector of interpolated colors of length n:

library( scales )
co    <- c('#8c510a','#bf812d','#dfc27d','#80cdc1','#35978f','#01665e') # color ramp
cfun  <- colorRampPalette( co )    
n     <- 12
show_col( cfun( n ), borders = NA )

# show more colors:
show_col( cfun( 36 ), borders = NA, cex = .7 )

Step 3. Assign a color to each precSite value, ranging from low (dry) to high (wet):

bin    <- cut( precSite, n )        # assign each value to a color bin
bnames <- names( table( bin ) )     # find the unique bins
cc     <- match( bin, bnames )      # replace bin name with the interval number
col    <- cfun( n )[ cc ]           # assign the color for that bin

Step 4. Assign a symbol size to each precSite value, ranging from low (snakk) to high (large):

Symbol size in the plot function is a scaling factor, the default being cex = 1. I decide to select a range of scaling factors from near zero (0.2) to 1.2:

cex   <- .4 + (precSite - min(precSite))/(max(precSite) - min(precSite))

Exercise 6: What is the object cfun, and what arguments does it take? How long are the vectors col and cex, and what do they hold?

Here is a map of mean precipitation for each site, based on color and size:

plot( xdata$lon[mm], xdata$lat[mm], cex = cex, col = col, asp = 1, pch = 16 )

Exercise 7: What is the range of rainfall across this map? What why asp used in the call to plot?

Anomalies

I can express variation over years or over sites as anomolies. An anomaly is often taken as the difference between a point and its mean value divided by its standard deviation. Here are anomalies from the site means and the year means.

site  <- as.character( xdata$shape )  # character mode needed for names
yname <- as.character( xdata$year )

precYearSd   <- tapply( prec, yname, sd, na.rm=T)               # aggregate to mean for year
precSiteSd   <- tapply( prec, site, sd, na.rm=T)                # aggregate to mean for site
precSiteAnom <- (prec - precSite[ site ])/precSiteSd[ site ]    # yr anom for site
precYearAnom <- (prec - precYear[ yname ])/precYearSd[ yname ]  # site anom for year

par( mfrow = c(2,1), bty = 'n', mar = c(4,4,1,1) )
plot( jitter( xdata$year ), prec, cex=.1, xlab = '')            # the absolute value
lines( as.numeric(names(precYear)), precYear )

plot( jitter( xdata$year ), precYearAnom, cex=.1, xlab = 'Year') # distribution of anomalies by year
abline(h=0, lty=2)

Exercise 8: What are the objects called site and yname? Why are they characters? What purpose do they serve? What are the units for anomalies?

Megaherbivore trends

Now I want to compare the abundances of herbivores with the annual variation in rainfall, again using anomalies. First note that I am interested in counts per effort (CPE), or ydata/edata. In the block that follows I aggregate CPE as mean values by year. I then plot them (red) with precipitation anomalies (blue):

species <- colnames( ydata )                   # a vector of species names
S       <- length( species )                   # number of species in ydata
i       <- rep( species, each = nrow(ydata))   # index for species
j       <- rep( xdata$year, S )
yByYr   <- tapply( as.matrix(ydata/edata), list(species = i , year = j ), mean, na.rm=T)
yByYr[ !is.finite( yByYr ) ] <- NA

Here I put this algorithm into a function so I can use it again:

aggregateMatrix <- function( mat, icol, jrow ){
  mat <- as.matrix( mat )
  i   <- rep( icol, each = nrow(mat))   # index for species
  j   <- rep( jrow, length(icol) )
  z   <- tapply( as.vector( mat ), list( i, j ), mean, na.rm=T)
  wz  <- which( !is.finite(z) )
  if( length(wz) > 0)z[ wz ] <- NA
  z
}

# use it this way:
yByYr <- aggregateMatrix( ydata/edata, species, xdata$year )

Exercise 9: The objects i and j are used in the function tapply. How do i and j determine the size and elements of object yByYr?

Here is a loop to plot each species superimposed on the annual rainfall:

precAnom <- (tapply( prec, xdata$year, mean ) - mean( prec, na.rm = T ))/
             tapply( prec, xdata$year, sd )
corPrec <- rep(NA, S)
names( corPrec) <- species

par(mfrow=c(4,5), bty='n', mar=c(3,3,1,1))
for(j in 1:S){
  yanom <- (yByYr[j,] - mean(yByYr[j,], na.rm=T))/sd(yByYr[j,], na.rm=T)
  yanom[ !is.finite( yanom ) ] <- NA
  plot(year, yanom, type='l', xlab = '', ylab = '', col='brown', ylim=c(-4, 4))
  lines(year, precAnom, col='blue')
  title( rownames(yByYr)[j] )
  abline(h=0, lty=2, col='grey')
  
  corPrec[j] <- round( cor( yanom, precAnom, use = 'complete.obs' ), 3 )
}

Exercise 10: Which species tend to increase in high rainfall years, and vice versa? How usefull are annual correlations for determining the importance of rainfall?

I also decide to look at histograms, confirming that data contain many zeros:

par(mfrow=c(4,5), bty='n', mar=c(3,3,1,1))
for(j in 1:S){
  hist( ydata[,j]/(edata[,j]), nclass = 50,
        main = colnames(ydata)[j])
}
# number of zeros by species:
zeros <- ydata
zeros[ zeros > 0 ] <- 1
zeros[ is.na(zeros) ] <- 0
zeros <- colSums( 1 - zeros, na.rm = T )/nrow(zeros)
print( round(zeros, 2) )

##        elephant   elephant bull         buffalo    buffalo bull     white rhino 
##            0.50            0.54            0.67            0.60            0.59 
##     black rhino         giraffe          impala            kudu       waterbuck 
##            0.91            0.57            0.45            0.55            0.73 
##           zebra blue wildebeest   roan antelope        tsessebe  sable antelope 
##            0.49            0.68            0.94            0.91            0.78 
##           eland    hippopotamus 
##            0.89            0.90

Because hippos are counted using a different method, I omit them:

omitSpec <- c('hippopotamus')
ydata <- ydata[,!colnames(ydata) %in% omitSpec]
edata <- edata[,!colnames(edata) %in% omitSpec]

I now evaluate the association between megaherbivores and predictors. First I aggregate yearly counts into site totals. I do this by summing counts and effort:

tmp <- columnSplit( rownames(ydata), '_' )    # rownames parsed to year and site
iyr <- as.numeric( tmp[,3] )
ist <- paste( tmp[,1], tmp[,2], sep ='-' )

i <- rep( ist, ncol(ydata) )
j <- rep( colnames( ydata), each = nrow(ydata) )

ysite <- tapply( as.vector( as.matrix(ydata) ), list( i, j ), sum, na.rm = T )
esite <- tapply( as.vector( as.matrix(edata) ), list( i, j ), sum, na.rm = T )

For predictors, I want to average continuous predictors (covariates) and simply extract the fixed factors for individual sites (e.g., landcover). First I generate a new data.frame having just enough rows for each site:

xsite <- data.frame( xdata[1:nrow(ysite), ] )  # new data.frame for aggregate
rownames(xsite) <- rownames(ysite)

Exercise 11: What are the values in xsite, and why are they there?

To fill the factors in xsite, I can simply find a matching row in xdata and plug them in. I need to repeat this for each factor column:

# factors/characters
xname <- c( 'shape', 'year.type', 'is.river.near', 'landcover', 'substrate',
            'before.intervention', 'recent.fire' )
mm <- match( rownames(xsite), ist )
for( k in xname )xsite[,k] <- xdata[mm,k]      # assign site value

I cannot do the same thing for variables that change every year. For these continuous variables I want to use the mean value (over all years) for each site and variable. Here are the continuous variables:

xname <- colnames(xsite)[ !colnames(xsite) %in% xname ] # remaining continuous predictors
for( k in xname ){                                      # assign site mean
  kmu <- tapply( xdata[,k], ist, mean, na.rm = T )
  xsite[names(kmu),k] <- kmu
}

Now I fit the model:

library( gjam )
formula <- as.formula( ~ river + years.since.fire + substrate  )
S    <- ncol(ysite)
eff  <- list( columns = 1:S, values = esite )  # ha to km2
ml   <- list(ng = 5000, burnin = 1000, typeNames = 'DA', effort = eff)
out  <- gjam( formula, xsite, ysite, modelList = ml)

We’ll talk about these plots:

gjamPlot( out, list( PLOTALLY = T, GRIDPLOTS = T ) )

Exercise 12: Change the formula to fit a different model. Explain why you did it and what you got from the fitted model.

9. Megaherbivores at Kruger National Park
background

Duke University ENV89S: Environmental change in the big-data era

James S Clark

2023-02-13

Resources

Objectives

For next time

Continue today

Load data

Year plot

Map plot

Anomalies

Megaherbivore trends

9. Megaherbivores at Kruger National Park background

Duke University ENV89S: Environmental change in the big-data era

James S Clark

2023-02-13

Resources

Objectives

For next time

Continue today

Load data

Year plot

Map plot

Anomalies

Megaherbivore trends

9. Megaherbivores at Kruger National Park
background