Today, scientists must be trained to “get the most out of” different sources of information. Part of the work involves processing databases, summarizing, plotting, and fitting models to data. Working with Excel-type tools is not only limiting (and inelegant) but also goes against the “reproducibility” of research since we normally do not have detailed documentation of how a result or a particular graph is reached (there is a lot mouse movement and clicks involved). Reproducibility is (or should be) a minimum standard for scientific work. Although, the fact that something is reproducible does not mean that it is correct, at least it makes it easier to evaluate, revise, and reuse. An important step to achieve reproducibility is to do everything with written instructions (scripts) instead of clicking with the mouse.
In short, to work in science you have to know a minimum of
programming. If we have to choose which language to program in,
R or Python seem like the best options to
date, but R is currently the most popular among ecologists
and is the one we are going to use in this course. R is a
public domain implementation of the statistical language S
that has gained popularity for several reasons. It can be downloaded
from https://cran.r-project.org/ and works on PC, Mac and
Linux. In addition to being used for statistical analysis,
R can be programmed to do almost anything. It has
object-based programming and functional programming. There is a growing
community of users and it is easy to get help to solve problems.
R is distributed with a rudimentary graphical interface,
but there are several reasons to work with R through RStudio. Among other things, RStudio
makes it possible to connect to online repositories like GitHUb or Bitbucket for version control and to
create documents (like this one that was written in RStudio) that
include R code, figures, equations, etc. By learning to use
R and document our work with the tools available through
RStudio we will be able to work better and more efficiently. It doesn’t
seem like it at first, but it is much better to work
with command logs (scripts) and make our research reproducible for our
future self and others.
R BasicsR works like a calculator:
## [1] 4
## [1] 6
## [1] 9
## [1] 0.6666667
## [1] 2.718282
## [1] 2.302585
## [1] 1
R solves logical operations:
## [1] TRUE
## [1] FALSE
## [1] TRUE
A fundamental aspect that we have to master is that R
works with vectors, matrices, lists and “dataframes”.
Let’s start by generating a vector:
The c command concatenates the values between commas.
The <- arrow maps what is on the right to the ‘object’
on the left. Instead of <- we can use = but
R purists prefer <-.
To see the values of the vector in the command window we write its name in the command window and press enter:
## [1] 2 6 18 54 162 486
A special kind of vector are sequences. To generate a sequence of values we can do, for example:
Let’s pretend that the vectors n and yr
correspond to surveys of an invasive species. If we want to see the
population size in the year \(2010\) we
do:
## [1] 6
or
## [1] 6
or we can use the function which:
## [1] 6
If we want to see population size from \(2011\) to \(2014\)
## [1] 18 54 162 486
or
## [1] 18 54 162 486
using which:
## [1] 18 54 162 486
Something interesting to see in a data set like this is the
“reproductive factor” or “finite rate of population increase”, which is
nothing other than \(n(t+1) / n(t)\).
For example, for \(2010\) the growth
rate was: n[2]/n[1] \(=\)
3.
In order not to repeat the counts year by year we can write a “loop” but first we need to generate a vector where we are going to save the results:
Here we combine two R functions: numeric
and length. To see what numeric does we can do
?numeric.
By putting a question mark and then the name of an R
function we access the documentation for that function (which is
sometimes not very helpful…). A better way to find help and even
information about features we don’t know about is to visit http://rseek.org/.
Returning to our loop, what we want is to repeat an operation (or several operations) and save the results:
The for command in this case repeats in values of
i the calculation \(n(t+1)/n(t)\) which in R
language is n[i]/[i-1] because i is a sequence
defined from \(2\) to the length of the
vector n. We use the value of i for each
iteration to find which values of n to use in the
calculation and to indicate in which position of the rate
vector we store the result.
In this case we could obtain the same values by doing:
It is important to note that 1:(length(n)-1) is not the
same as 1:length(n)-1. Check it out…
To see what came out of these calculations we wrote
## [1] 3 3 3 3 3
To plot the rate of increase in each year we can use
library(ggplot2)
data = data.frame(
yr = yr[2:length(n)],
rate = rate
)
ggplot(data, aes(x = yr, y = rate)) +
geom_point() +
labs(x = "year",
y = "rate of population increase") +
ylim(2,4) +
theme_minimal()Now that we learned how to use a for loop, we can see that
in this case we didn’t need it! In fact, a very nice feature of
R is that it allows you to do operations on vectors, lists,
etc. as we will see in due course.
In any case, it is important to learn how to handle loops since they are a basic programming tool.
We can see that the rate of population increase is constant and equal to 3. We can use that value to project the population size in the future:
lambda <- 3
yr <- c(yr, 2015:2020)
n <- c(n, numeric(6))
for(i in 7:length(n)){
n[i] <- n[i-1] * lambda
}
plot(yr,n)We can also try defining initial conditions and using the formula \(n(t) = n(0) \lambda^{t}\).
# we can plot both trajectories
plot(yr,n)
lines(t + yr[1], nt, col = 2)
lines(t + yr[1], nt, type = "s") It is not very plausible that the population continues and continues to grow like \(n(t) = n(0) \lambda^{t}\) (it is also not very likely that there is no variability between years but that is another topic that we will discuss in due course). One option is to think that the rate of population increase decreases with population size (logistic model).
lambda <- 1.8
N <- 0:100
K <- 80 # carrying capacity
rd <- lambda - 1 # per capita rate of increase
plot(N, lambda - rd/K * N, type="l", ylab="Lambda(N)", ylim=c(0,3), lwd=2)
abline(h=1, lty=2, lwd=2)
abline(v=K, lty=2, lwd=2)# Simulate
n0 <- 2
n <- numeric(20)
n[1] <- n0
for(t in 1:(length(n)-1)){
n[t+1] <- n[t] + rd * n[t] * (1 - n[t]/K)
}
plot(1:length(n), n, type = "l", lwd = 2, xlab = "time")A key aspect of programming is writing functions. A function is a set of instructions that takes inputs or arguments, uses those inputs to calculate other values, and returns as a result. For example, we can write a function for the logistic model of population growth:
logistic <- function(n0,rd,K,nyr=10){
n <- numeric(nyr)
n[1] <- n0
for(t in 1:(length(n) - 1)){
n[t+1] <- n[t] + rd * n[t] * (1 - n[t]/K)
}
return(n)
}This function has as arguments n0, rd,
K and the number of years (nyr) that we want
to simulate.
Unless we tell it otherwise, R respects the order in
which we define the function inputs. So, if we want to simulate the
trajectory of a population that follows a logistic model with
n0 = 1, rd = 1.5, and K = 100 for
10 generations, we can do:
In this case, we didn’t tell the function that we wanted
nyr = 10 because it is defined by default. It’s actually
not a very good idea to pass values in this way to functions. It is
better for us to be more explicit:
We can even change the order of the arguments if we call them by name:
Even better is to define the input values separately:
Write a for loop to calculate the cumulative sum along a vector. For example, the cumulative sum of a vector from 1 to 10 is 1, 3, 6, 10, 15, 21, 28, 36, 45, 55.
Write a double loop to build a 10 by 10 multiplication
table. Hint: first you have to build a 10 by 10 matrix using the
matrix function.
Bill Ricker (1954) changed the linear relationship between \(\lambda\) and \(n\) of the logistic model for an exponential decay \(exp(-a n)\). Write a function to simulate population trajectories with the Ricker model and compare it with the logistic model. ***
Juan Manuel Morales: 2023-12-09