Threats, context, and factors needed to protect African biodiversity.
We’ll use long-term data from Kruger National Park to explore R. This will lead into the endangered species act.
Resources
Resources/data/knpHerbivores.rdata
Objectives
Introduce background issues on biodiversity threats from global change
Gain an exposure to concepts in modeling and computation, including data sources and quality, sources of uncertainty
For next time
Read The future of sub-Saharan Africa’s biodiversity in the face of climate and societal change.
- We continue to look at long-term population data for megaherbivores at Kruger National Park. Climate change is but one of several threats to African biodiversity. Select one the challenges included in Figure 1 of this paper and summarize how funding from developed countries could be targeted to have the most impact, directly or indirectly, on conservation efforts.
Post no more than two paragraphs to Sakai for your group.
- Post your answers to questions on the analysis from last time to Sakai. Do this in groups.
Substrates at KNP, with acidic, infertile rocks in the west, fertile
volcanic rocks in the east.
Continue today
Background reading: Clark, J. S., C. L. Scher, and M. Swift. 2020. The emergent interactions that govern biodiversity change. Proceedings of the National Academy of Sciences, 117, 17074-17083.
Here is a function to read:
columnSplit <- function(vec, sep='_', ASFACTOR = F, ASNUMERIC=F,
LASTONLY=F){
vec <- as.character(vec)
nc <- length( strsplit(vec[1], sep)[[1]] )
mat <- matrix( unlist( strsplit(vec, sep) ), ncol=nc, byrow=T )
if(LASTONLY & ncol(mat) > 2){
rnn <- mat[,1]
for(k in 2:(ncol(mat)-1)){
rnn <- columnPaste(rnn,mat[,k])
}
mat <- cbind(rnn,mat[,ncol(mat)])
}
if(ASNUMERIC){
mat <- matrix( as.numeric(mat), ncol=nc )
}
if(ASFACTOR){
mat <- data.frame(mat)
}
mat
}Load data
There are three objects to load here:
load( 'data/knpHerbivores.rdata', verbose = T )## Loading objects:
## xdata
## ydata
## edata
All three of these objects are organized as site-year (rows) by
variables (columns). The data.frame ydata holds counts from
a fixed-wing aircraft. Sample effort is edata in km\(^2\). Predictors are in
data.frame xdata.
The first goal is to look at the relationship between rainfall and
species abundances. I use the function tapply to obtain
precipitation means by site and year. In this example the site is the
column shape. I then determine the anomalies from the
mean.
prec <- xdata$rainfall
precYear <- tapply( prec, xdata$year, mean, na.rm=T) # aggregate to mean for year
precSite <- tapply( prec, xdata$shape, mean, na.rm=T) # aggregate to mean for site
Exercise 1: Look at the objects xdata,
precYear, and precSite.
- What are the dimensions of each object?
- What are the names (or rownames/colnames) of the elements in each object?
- How do their dimensions relate to one another?
- How many sites are there?
- How many years are there?
Year plot
The function plot( x, y ) needs the
numeric vectors x and y. They
must have the same length. I use this function
to plot annual precipitation:
year <- as.numeric( names(precYear) )
plot( jitter( xdata$year ), prec, cex=.1 )
lines( year, precYear, lwd = 2 )
Exercise 2: Look at the objects xdata$year
and prec.
- What are the lengths of each vector?
- What are the names of elements in each object, and what do these names represent?
- What is the object created by names(precYear)?
- What is the object created by as.numeric( names(precYear) ), and why is it used here?
Exercise 3:For a normally distributed random variable,
95% of points lie within 1.96 standard deviations of the mean. Add
lines for the mean \(\pm
95\%\) of observations to the graph. To get the standard
deviation use the function tapply, replacing
mean with sd (for standard deviation).
Map plot
I now want to see how rainfall is distributed across the map. The
locations of sites are held the xdata columns
lon and lat. The mean precipitation for each
site is held in the object I created precSite. So I need to
get the location of each site in precSite from
xdata. I do this by matching the names of
precSite with the rows in xdata that have the
matching shape column. Once I have these matching rows, I
can get lon and lat for those rows. Here’s the
algorithm:
Step 1. match the sites between
names(precSite) and xdata$shape:
mm <- match( names( precSite ), xdata$shape )
# check that they match:
all.equal( names(precSite), xdata$shape[mm] ) # are they equivalent?## [1] TRUE
# show me:
rbind( names(precSite), xdata$shape[mm] )[,1:5] # they are the same## [,1] [,2] [,3] [,4] [,5]
## [1,] "shp_1" "shp_10.1" "shp_10.2" "shp_10.3" "shp_10.4"
## [2,] "shp_1" "shp_10.1" "shp_10.2" "shp_10.3" "shp_10.4"
Exercise 5: Which rows in xdata match the
names of precSite?
Step 2. Generate a color palette for dry to wet:
Here’s
a source I often use for colors. I select a diverging color scheme
and export. From the javaScript option I copy the hex
codes. I use the function colorRampPalette with this color
vector to generate a function cfun that will
return a new vector of interpolated colors of length n:
library( scales )
co <- c('#8c510a','#bf812d','#dfc27d','#80cdc1','#35978f','#01665e') # color ramp
cfun <- colorRampPalette( co )
n <- 12
show_col( cfun( n ), borders = NA )# show more colors:
show_col( cfun( 36 ), borders = NA, cex = .7 )Step 3. Assign a color to each precSite
value, ranging from low (dry) to high (wet):
bin <- cut( precSite, n ) # assign each value to a color bin
bnames <- names( table( bin ) ) # find the unique bins
cc <- match( bin, bnames ) # replace bin name with the interval number
col <- cfun( n )[ cc ] # assign the color for that binStep 4. Assign a symbol size to each
precSite value, ranging from low (snakk) to high
(large):
Symbol size in the plot function is a scaling factor,
the default being cex = 1. I decide to select a range of
scaling factors from near zero (0.2) to 1.2:
cex <- .4 + (precSite - min(precSite))/(max(precSite) - min(precSite))
Exercise 6: What is the object cfun, and
what arguments does it take? How long are the vectors col
and cex, and what do they hold?
Here is a map of mean precipitation for each site, based on color and size:
plot( xdata$lon[mm], xdata$lat[mm], cex = cex, col = col, asp = 1, pch = 16 )
Exercise 7: What is the range of rainfall across this
map? What why asp used in the call to plot?
Anomalies
I can express variation over years or over sites as anomolies. An anomaly is often taken as the difference between a point and its mean value divided by its standard deviation. Here are anomalies from the site means and the year means.
site <- as.character( xdata$shape ) # character mode needed for names
yname <- as.character( xdata$year )
precYearSd <- tapply( prec, yname, sd, na.rm=T) # aggregate to mean for year
precSiteSd <- tapply( prec, site, sd, na.rm=T) # aggregate to mean for site
precSiteAnom <- (prec - precSite[ site ])/precSiteSd[ site ] # yr anom for site
precYearAnom <- (prec - precYear[ yname ])/precYearSd[ yname ] # site anom for year
par( mfrow = c(2,1), bty = 'n', mar = c(4,4,1,1) )
plot( jitter( xdata$year ), prec, cex=.1, xlab = '') # the absolute value
lines( as.numeric(names(precYear)), precYear )
plot( jitter( xdata$year ), precYearAnom, cex=.1, xlab = 'Year') # distribution of anomalies by year
abline(h=0, lty=2)
Exercise 8: What are the objects called
site and yname? Why are they characters? What
purpose do they serve? What are the units for anomalies?
Megaherbivore trends
Now I want to compare the abundances of herbivores with the annual
variation in rainfall, again using anomalies. First note that I am
interested in counts per effort (CPE), or ydata/edata. In
the block that follows I aggregate CPE as mean values by year. I then
plot them (red) with precipitation anomalies (blue):
species <- colnames( ydata ) # a vector of species names
S <- length( species ) # number of species in ydata
i <- rep( species, each = nrow(ydata)) # index for species
j <- rep( xdata$year, S )
yByYr <- tapply( as.matrix(ydata/edata), list(species = i , year = j ), mean, na.rm=T)
yByYr[ !is.finite( yByYr ) ] <- NAHere I put this algorithm into a function so I can use it again:
aggregateMatrix <- function( mat, icol, jrow ){
mat <- as.matrix( mat )
i <- rep( icol, each = nrow(mat)) # index for species
j <- rep( jrow, length(icol) )
z <- tapply( as.vector( mat ), list( i, j ), mean, na.rm=T)
wz <- which( !is.finite(z) )
if( length(wz) > 0)z[ wz ] <- NA
z
}
# use it this way:
yByYr <- aggregateMatrix( ydata/edata, species, xdata$year )
Exercise 9: The objects i and
j are used in the function tapply. How do
i and j determine the size and elements of
object yByYr?
Here is a loop to plot each species superimposed on the annual rainfall:
precAnom <- (tapply( prec, xdata$year, mean ) - mean( prec, na.rm = T ))/
tapply( prec, xdata$year, sd )
corPrec <- rep(NA, S)
names( corPrec) <- species
par(mfrow=c(4,5), bty='n', mar=c(3,3,1,1))
for(j in 1:S){
yanom <- (yByYr[j,] - mean(yByYr[j,], na.rm=T))/sd(yByYr[j,], na.rm=T)
yanom[ !is.finite( yanom ) ] <- NA
plot(year, yanom, type='l', xlab = '', ylab = '', col='brown', ylim=c(-4, 4))
lines(year, precAnom, col='blue')
title( rownames(yByYr)[j] )
abline(h=0, lty=2, col='grey')
corPrec[j] <- round( cor( yanom, precAnom, use = 'complete.obs' ), 3 )
}Exercise 10: Which species tend to increase in high rainfall years, and vice versa? How usefull are annual correlations for determining the importance of rainfall?
I also decide to look at histograms, confirming that data contain many zeros:
par(mfrow=c(4,5), bty='n', mar=c(3,3,1,1))
for(j in 1:S){
hist( ydata[,j]/(edata[,j]), nclass = 50,
main = colnames(ydata)[j])
}
# number of zeros by species:
zeros <- ydata
zeros[ zeros > 0 ] <- 1
zeros[ is.na(zeros) ] <- 0
zeros <- colSums( 1 - zeros, na.rm = T )/nrow(zeros)
print( round(zeros, 2) )## elephant elephant bull buffalo buffalo bull white rhino
## 0.50 0.54 0.67 0.60 0.59
## black rhino giraffe impala kudu waterbuck
## 0.91 0.57 0.45 0.55 0.73
## zebra blue wildebeest roan antelope tsessebe sable antelope
## 0.49 0.68 0.94 0.91 0.78
## eland hippopotamus
## 0.89 0.90
Because hippos are counted using a different method, I omit them:
omitSpec <- c('hippopotamus')
ydata <- ydata[,!colnames(ydata) %in% omitSpec]
edata <- edata[,!colnames(edata) %in% omitSpec]I now evaluate the association between megaherbivores and predictors. First I aggregate yearly counts into site totals. I do this by summing counts and effort:
tmp <- columnSplit( rownames(ydata), '_' ) # rownames parsed to year and site
iyr <- as.numeric( tmp[,3] )
ist <- paste( tmp[,1], tmp[,2], sep ='-' )
i <- rep( ist, ncol(ydata) )
j <- rep( colnames( ydata), each = nrow(ydata) )
ysite <- tapply( as.vector( as.matrix(ydata) ), list( i, j ), sum, na.rm = T )
esite <- tapply( as.vector( as.matrix(edata) ), list( i, j ), sum, na.rm = T )For predictors, I want to average continuous predictors (covariates)
and simply extract the fixed factors for individual sites (e.g.,
landcover). First I generate a new data.frame
having just enough rows for each site:
xsite <- data.frame( xdata[1:nrow(ysite), ] ) # new data.frame for aggregate
rownames(xsite) <- rownames(ysite)
Exercise 11: What are the values in xsite,
and why are they there?
To fill the factors in xsite, I can simply find a
matching row in xdata and plug them in. I need to repeat
this for each factor column:
# factors/characters
xname <- c( 'shape', 'year.type', 'is.river.near', 'landcover', 'substrate',
'before.intervention', 'recent.fire' )
mm <- match( rownames(xsite), ist )
for( k in xname )xsite[,k] <- xdata[mm,k] # assign site valueI cannot do the same thing for variables that change every year. For these continuous variables I want to use the mean value (over all years) for each site and variable. Here are the continuous variables:
xname <- colnames(xsite)[ !colnames(xsite) %in% xname ] # remaining continuous predictors
for( k in xname ){ # assign site mean
kmu <- tapply( xdata[,k], ist, mean, na.rm = T )
xsite[names(kmu),k] <- kmu
}Now I fit the model:
library( gjam )
formula <- as.formula( ~ river + years.since.fire + substrate )
S <- ncol(ysite)
eff <- list( columns = 1:S, values = esite ) # ha to km2
ml <- list(ng = 5000, burnin = 1000, typeNames = 'DA', effort = eff)
out <- gjam( formula, xsite, ysite, modelList = ml)We’ll talk about these plots:
gjamPlot( out, list( PLOTALLY = T, GRIDPLOTS = T ) )
Exercise 12: Change the formula to fit a
different model. Explain why you did it and what you got from the fitted
model.