Computational Biology: Determining Extinction Risk

Introduction

The goal of PVA is to get an idea about how likely a population is to go extinct in the near future.

Preliminaries

Load data

Data were originally compiled from previously published sources by Dennis et al (1991). Additional data from Morris & Doak (2002).

The following code builds a dataframe of counts of females with young from 1959 to 1997.

census <- 1:39
year.t   <- 1959:1997
females.N <- c(44,47,46,44,46,
               45,46,40,39,39,
               42,39,41,40,33,
               36,34,39,35,34,
               38,36,37,41,39,
               51,47,57,48,60,
               65,74,69,65,57,
               70,81,99,99)
lambda.i <- females.N[-1]/females.N[-length(females.N)]
lambda.i <- c(lambda.i,NA)
lambda_log <- log(lambda.i)
bear_N <- data.frame(census,
                year.t,
                females.N,
                lambda.i, 
                lambda_log)

20th-Century Yellowstone Grizzly Bear Population Dynamics

par(mfrow = c(1,1))
plot(females.N ~ year.t, data = bear_N, 
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year")
abline(v = 1970)
abline(v = 1987, col = "red")

Model stochastic population dynamics

We can model how a population of bears will change in the future by randomly drawing lambdas values from set of observed lambdas.

# vector of lambdas
hat_of_lambdas <- bear_N$lambda.i

# remove NAs
hat_of_lambdas <- na.omit(hat_of_lambdas)

A for() loop can be used to randomly draw a lambda each year and calculate the population size the following year.

Note, using t as a variable in the for() loops is a bad idea because t() is a function. This overwrites the t() function with a number and I can’t use the t() function until I restart R.

## Preliminaries
### Number of years to simulate
num.years <- 50

### Empty vector to hold data
N.vector <- rep(NA, num.years)

N.1997 <- 99

### Initialize starting population size with last
###  year of available data
N.vector[1] <- N.1997

## for() loop

for(i in 1:num.years){

  # Draw a random lambda
  # Save to a vector
  lambda_rand.i <- sample(x = hat_of_lambdas, size = 1,replace = TRUE)
  
  # Calculate population size the next year
  N.i <- N.vector[i]*lambda_rand.i
  
  # Store the population size
  N.vector[i+1] <- N.i
  
  # R automatically increases i from the current
  
}

Plot the output of a single population simulation.

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)
plot(N.vector ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year")
abline(v = 1970)
abline(v = 1987, col = "red")

Predicting extinction

To get a sense of how at risk of extinction a population is we need to run many simulations - 1000s of them. There are many ways to approach this computational task. One would be to create what is called a nested for() loop.

Nested for() loops can be a bit mind bending the first time you have to think about them. What I do below is I make a function called simulate_bears() that does what we’ve been doing so far: simulate a single population 50 years into the future.

After that, I’ll make a for() loop which will run simulate_bears() many times to create many simulated bear populations.

This function below is not very well written - I’ve emphasized clarity about what is going on rather than code that would be reusable.

Functions are created with the function called function(). The have the general form

function.name <- function(argumenets){
  #do something cool
  ...
  return(cool.result)
}

In the function below I make a function that has no arguments - everything is hard coded. Not what I’d normally do, but keeps it simple. So, the structure is more like this

function.name <- function(){
  #do something cool
  ...
  return(cool.result)
}

Because its function() with nothing in the parentheses there are no arguments for the function.

simulate_bears <- function(){

## Preliminaries
### Number of years to simulate
num.years <- 50

### starting number of  bears
N.1997 <- 99
  

### Empty vector to hold data
N.vector <- rep(NA, num.years)

### Initialize starting population size with last
###  year of available data
N.vector <- c(N.1997, N.vector)

## the for() loop
for(i in 1:num.years){

  # Draw a random lambda
  # Save to a vector
  lambda_rand.i <- sample(x = hat_of_lambdas, size = 1,replace = TRUE)
  
  # Calculate population size the next year
  N.i <- N.vector[i]*lambda_rand.i
  
  # Store the population size
  N.vector[i+1] <- N.i
  
  # R automatically increases i from the current
}

N.vector <- round(N.vector)

# return the vector containing the 
## population time series
return(N.vector)

}

Run the function 1 time

simulate_bears()

##  [1]  99  94  87  74  90  86  90  74  72  70  63  62  57  61  58  64  63  52  63
## [20]  60  64  74  83  77  95 106  95 110  91 111 137 134 146 142 186 157 180 218
## [39] 213 187 230 249 302 349 390 322 336 440 386 338 322

Now, I want to run this many times. I’ll start with 10 iterations. I’ll save the data in a matrix, made with the function matrix()

many_bears <- matrix(data = NA, 
                     nrow = 50+1,  # year new years, 
                                   #  plus 1 for 1997
                     ncol = 10,
                     byrow = TRUE)

I always check to make sure what I made is what I think it is.

dim(many_bears)

## [1] 51 10

head(many_bears)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

I’m going to give the matrix some row names and columns names. Note that for some reasons we have row.names() with a period and colnames() without a period.

row.names(many_bears) <- paste("year",seq(1998, 1998+50))
colnames(many_bears)  <- paste("sim",seq(1, ncol(many_bears)))

Let me see what this looks like

many_bears[1:10,1:10]

##           sim 1 sim 2 sim 3 sim 4 sim 5 sim 6 sim 7 sim 8 sim 9 sim 10
## year 1998    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 1999    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2000    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2001    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2002    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2003    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2004    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2005    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2006    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2007    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA

Now I will simulate just 1 time series of bears. The code many_bears[,1] access the first column of the matrix many_bears.

many_bears[,1] <- simulate_bears()

Look again at the data

many_bears[1:10,1:10]

##           sim 1 sim 2 sim 3 sim 4 sim 5 sim 6 sim 7 sim 8 sim 9 sim 10
## year 1998    99    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 1999   113    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2000   105    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2001   117    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2002   112    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2003    95    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2004    93    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2005    91    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2006    98    NA    NA    NA    NA    NA    NA    NA    NA     NA
## year 2007    98    NA    NA    NA    NA    NA    NA    NA    NA     NA

Now I"ll plot the first time series. Not that in my plotting code I access the column I want to plot like this: many_bears[,1]

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)
plot(many_bears[,1] ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year")

Now I will simulate a second time series of bears and store it in the second column of my big matrix.

many_bears[,2] <- simulate_bears()

Check the data again

many_bears[1:10,1:10]

##           sim 1 sim 2 sim 3 sim 4 sim 5 sim 6 sim 7 sim 8 sim 9 sim 10
## year 1998    99    99    NA    NA    NA    NA    NA    NA    NA     NA
## year 1999   113   108    NA    NA    NA    NA    NA    NA    NA     NA
## year 2000   105   105    NA    NA    NA    NA    NA    NA    NA     NA
## year 2001   117   129    NA    NA    NA    NA    NA    NA    NA     NA
## year 2002   112   106    NA    NA    NA    NA    NA    NA    NA     NA
## year 2003    95   139    NA    NA    NA    NA    NA    NA    NA     NA
## year 2004    93   122    NA    NA    NA    NA    NA    NA    NA     NA
## year 2005    91   148    NA    NA    NA    NA    NA    NA    NA     NA
## year 2006    98   169    NA    NA    NA    NA    NA    NA    NA     NA
## year 2007    98   173    NA    NA    NA    NA    NA    NA    NA     NA

This code lets me plot the first and the second time series. The first time series is dong using plot(many_bears[,1] ~ … and the second timer series is done using points(many_bears[,2] ~ …

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)
plot(many_bears[,1] ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year")
points(many_bears[,2] ~ year.t, col = 2, type = "b")

Run simulation for multiple iterations

many_bears has 10 columns. I can call the up with ncol(). This for loop is short and just runs simulate_bears() 10 times, and saves each time series in the matrix.

In R, we usually use i to represent rows and j to represent columns

for(j in 1:ncol(many_bears)){
  
  # 
  simulation.j <- simulate_bears()
  
  #
  many_bears[ ,j] <- simulation.j
  
}

Always check your output

many_bears[1:10,1:10]

##           sim 1 sim 2 sim 3 sim 4 sim 5 sim 6 sim 7 sim 8 sim 9 sim 10
## year 1998    99    99    99    99    99    99    99    99    99     99
## year 1999    97   102   104   121    97    97    83    89    97     92
## year 2000    93   105   111   151   121   118    81   102   103    106
## year 2001    86   128   127   162   100   121    71   102   101    116
## year 2002    84   131   144   155    87   119    66   110   124    110
## year 2003    91   131   151   202    82   125    70   119   137    134
## year 2004    91   127   151   198    72   109    68   130   115    134
## year 2005    91   136   151   242    61    91    66   159   151    155
## year 2006    99   154   189   302    53   102    76   131   197    159
## year 2007    99   130   176   296    52   107    72   128   211    150

To plot all 10 time series we can use a for() loop with points() in it.

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)

plot(many_bears[,1] ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year")



for(j in 2:ncol(many_bears)){

points(many_bears[,j] ~ year.t, 
       col = 2, 
       type = "l")  
  
  
}

Some time series get chopped off, so I can write some more complex code to determine the maximum and minimums that make sense for the y axis. (Don’t worry about the details)

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)


#determine largest and smallest 
## pop sizes
N_max <- max(many_bears)
N_min <- min(many_bears)

# main plot
plot(many_bears[,1] ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year",
     ylim = c(N_min, 
              N_max))
abline(h = 33, col = 2)
#abline(h = 33, col =2)

#for loop wiht points()
for(j in 2:ncol(many_bears)){

points(many_bears[,j] ~ year.t, 
       col = 2, 
       type = "l")  
  
  
}

Run a big simulation

Now let’s do this for real: 1000 simulations

mega_bears <- matrix(data = NA, 
                     nrow = 50+1,  # year new years, plus 1 for 1997
                     ncol = 1000,
                     byrow = TRUE)

Check my creation

dim(mega_bears)

## [1]   51 1000

Name things

row.names(mega_bears) <- paste("year",seq(1998, 1998+50))
colnames(mega_bears) <- paste("sim",seq(1, ncol(mega_bears)))

Run the simulation 1000 times. In R, we usually use i to represent rows and j to represent columns

for(j in 1:ncol(mega_bears)){
  
  # 
  simulation.j <- simulate_bears()
  
  #
  mega_bears[ ,j] <- simulation.j
  
}

It will look like a mess if I plot everything, so this code, which is a bit dense, just plots the first time series, the time series that reached the largest size, and the time series which reached the smallest size.

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)

N_max <- max(mega_bears)
N_min <- min(mega_bears)

global.max <- which(mega_bears == max(mega_bears),
                    arr.ind = TRUE)

global.min <- which(mega_bears == min(mega_bears), 
                    arr.ind = TRUE)

g.max <- global.max[,2]
g.min <- global.min[,2]


plot(mega_bears[,1] ~ year.t,
     type = "b",
     ylab = "Population index (females + cubs)",
     xlab = "Year",
     ylim = c(N_min, N_max))
abline(h = 33)
points(mega_bears[,g.max] ~ year.t, type = "l")
points(mega_bears[,g.min] ~ year.t, type = "l")

Determine how many populations become “pseudo-extinct”

Simulating absolute extinction is problematic for several reasons. Our model doesn’t include genetics, but we know that small populations are inbred, and inbreeding is bad. Also, small populations are liable to be wiped out by random environmental catastrophes or disease. So often we run simulations and count how many populations drop below a “pseudo extinction threshold.”

This code is a bit dense. See if you can spot the main aspects of what is going on. In it I have added some additional code to the simulate_bears() so that whenever a population drops below my pseudo extinction threshold (set to 50) I overwrite their actual population size to 0, making them go extinct. This will allow me to keep track of all the populations which have dropped below the pseudo extinction threshold.

This is done with the ifelse() function. Which has the logic of “if you are below 50, I turn you into 0, otherwise, I leave you alone.”

simulate_bears_with_pe <- function(){

## Preliminaries
### Number of years to simulate
num.years <- 50

### Empty vector to hold data
N.vector <- rep(NA, num.years)

### Initialize starting population size with last
###  year of available data
N.vector <- c(N.1997, N.vector)

## for() loop

for(i in 1:num.years){

  # Draw a random lambda
  # Save to a vector
  lambda_rand.i <- sample(x = hat_of_lambdas, 
                          size = 1,
                          replace = TRUE)
  
  # Calculate population size the next year
  N.i <- N.vector[i]*lambda_rand.i
  
  # Store the population size
  N.vector[i+1] <- N.i
  
  N.vector[i+1] <- ifelse(N.vector[i+1] < 50, 
                          0, 
                          N.vector[i+1] )
  
  # R automatically increases i from the current
}

N.vector <- round(N.vector)
return(N.vector)
}

Let’s make another big matrix to hold out data.

mega_bears_pe <- matrix(data = NA, 
                     nrow = 50+1,  # year new years, plus 1 for 1997
                     ncol = 1000,
                     byrow = TRUE)

Run our simulation 1000 times.

for(j in 1:ncol(mega_bears_pe)){
  
  # 
  simulation.j <- simulate_bears_with_pe()
  
  #
  mega_bears_pe[ ,j] <- simulation.j
  
}

Glance at the results

mega_bears_pe[,1:2]

##       [,1] [,2]
##  [1,]   99   99
##  [2,]   92   94
##  [3,]   88   94
##  [4,]   73  122
##  [5,]   91  160
##  [6,]   89  194
##  [7,]   86  203
##  [8,]   82  217
##  [9,]   78  217
## [10,]   89  221
## [11,]   95  268
## [12,]   93  287
## [13,]   88  318
## [14,]   81  327
## [15,]   76  349
## [16,]   87  329
## [17,]   89  430
## [18,]   96  374
## [19,]  120  457
## [20,]  139  445
## [21,]  170  436
## [22,]  223  405
## [23,]  238  376
## [24,]  207  367
## [25,]  216  343
## [26,]  211  334
## [27,]  233  281
## [28,]  205  320
## [29,]  180  299
## [30,]  188  391
## [31,]  188  433
## [32,]  200  443
## [33,]  196  433
## [34,]  256  532
## [35,]  241  547
## [36,]  279  475
## [37,]  250  464
## [38,]  211  451
## [39,]  206  438
## [40,]  174  535
## [41,]  194  499
## [42,]  217  553
## [43,]  263  541
## [44,]  248  626
## [45,]  236  683
## [46,]  273  683
## [47,]  357  668
## [48,]  313  760
## [49,]  263  994
## [50,]  285 1206
## [51,]  357 1136

See if anywhere in the dataframe a zero occurs, which means a population dropped below the pseudo extinction threshold and was changed to 0.

any(mega_bears_pe == 0)

## [1] TRUE

#which(mega_bears_pe ==0,arr.ind = T)

This code creates a running tally of how many of our 1000 simulated populations went extinct. To make this work well I added a little bit of code using an if() statement. Don’t worry about this right now.

pops.pe <- rep(NA,  nrow(mega_bears_pe))

for(i in 1:nrow(mega_bears_pe)){
  
  
  #if statement
  if(any(mega_bears_pe[i,] == 0) == FALSE){
    next
  }
  
  pe <- which(mega_bears_pe[i,] == 0)
  
  N.pe <- length(pe)
  
  pops.pe[i] <- N.pe
}

Plot the cumulative extinction risk

This plot shows how as time passes, more of our simulated populations went extinct. The risk never gets too high (~0.06), which means that in 50 year’s we’d guess that the extinction risk is only about 6%.

par(mfrow = c(1,1))
year.t <- seq(1997, 1997+50)

dim(mega_bears_pe)

## [1]   51 1000

N.sims <- ncol(mega_bears_pe)

plot(pops.pe/N.sims ~ year.t,
     xlab = "Time",
     ylab = "Cumulative risk of pseudo-extinction")