R Notebook files (*.Rmd) allow you to combine text elements with snippets of code and the code output. There are three main parts to an R Notebook file: the header, the code snippets, and the text in between. In addition, each R Notebook automatically generates an HTML file that presents you with the formatted output, which you will ultimately submit for your homework.
The header, which you can see at the beginning of the document, is delineated with three dashes (---) at the beginning and the end. It includes some code that is important for the formatting of output files, so I would recommend not altering that section. In general, there should be no reason for you to change the header for any exercises in this course. However, if you would like to learn more about the different header options, you can find a good tutorial here.
Code snippets are delineated with three backticks (the ` character, typed three times) at the beginning and the end, and {r} after the first set of backticks lets your computer know that you will be using the R programming language. You can always add a code chunk by clicking “Insert > R” above, although we have usually already created all the snippets you will need. Any text within a snippet, if written correctly, represents executable code, which the computer can interpret as commands to execute certain tasks. You can make your computer execute the code in a snippet by pressing the small, green play arrow in the top right corner of each snippet, or you can just highlight the code and press command+enter (control+enter on PC). When you execute the code, the output appears directly below the snippet. Sometimes you will find us using hashtags (#) within code snippets. Hashtags “silence” the code that follows on the same line, such that the computer jumps over that section when executing the code. That is useful for code annotation, and you will frequently see us using hashtags to add further descriptions or explanations.
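As a small illustration of how hashtags behave, here is a sketch of what the inside of a code snippet might look like. The variable name body_length is made up for this example and does not come from the course files.

```r
# This whole line is a comment, so R skips it
body_length <- 90        # text after the # is ignored; the assignment still runs
# body_length <- 120     # this assignment is "silenced" and never executes
print(body_length)       # prints 90, because the silenced line did not run
```

Removing the hashtag at the start of the third line would "un-silence" it, and the printed value would then be 120.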
The text in between code snippets is just that: text. We will use these sections to provide you with background information and discussion prompts, and you will use these sections to respond to questions and offer your interpretations of data. Sections where you need to write something are always highlighted in italics (designated with asterisks). You can use a variety of prompts to format your text (see here for a cheat sheet).
As already mentioned, your R Notebook (including text, code snippets, and the outputs from your code) can be automatically knitted into an HTML file. You can click “Preview > Preview Notebook” to see the live HTML file as you are working on your R Notebook (just make sure to save to update), and you can find the sharable HTML file in the same folder as your *.Rmd-file.
Having a well-organized file structure is critical to avoid issues with coding, because you will frequently read in data files, and you need to make sure that your computer knows where to look for those files. To facilitate this process, we will provide you with all the necessary files in a zipped folder. We recommend that you move that *.zip file to the location where you want it before unzipping. After unzipping, do not move files out of that folder, unless you want to manually tell the computer where to look for readable files. The folder with your files is called a “Working Directory”. At the beginning of any work with an R Notebook, you need to tell your computer where your working directory is. In order to do that, simply execute the following snippet (remember, press the green play button).
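#The following lines of code automatically check what folder your *.Rmd-file is in and set that folder as your working directory
set_wd <- function() {
  library(rstudioapi) #provides getActiveDocumentContext(); only works inside RStudio
  current_path <- getActiveDocumentContext()$path
  setwd(dirname(current_path))
  print(getwd())
}
#Run the function defined above; without this call, the working directory is not changed
set_wd()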
#If you want to manually set your working directory, you can use the setwd command with your specific path as seen below
#setwd("Path")
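If you are unsure whether the working directory was set correctly, a quick check along the following lines can help. This is only a sketch using base R functions; the variable names csv_files and has_test_data are made up for this example, and the file name "test_data.csv" is the test dataset used later in this notebook.

```r
# Show the folder R currently treats as the working directory
print(getwd())

# List the *.csv files R can see in that folder
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# TRUE if the test dataset is in the working directory, FALSE otherwise
has_test_data <- file.exists("test_data.csv")
print(has_test_data)
```

If the last line prints FALSE, R is probably looking in a different folder than the one you unzipped.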
When you install R, your computer can understand and execute a number of commands. This is what is known as “Base R”. The power of R, however, is that you can expand the number of commands your computer understands by downloading and loading additional R packages (also called libraries). There are R packages specialized for pretty much any area of biology, providing the capability to analyze data from the level of genes and genomes to ecosystem-level processes. We will frequently use a package called ggplot2, which allows for plotting data. Depending on the module, we may need to install additional libraries. To download and install new R packages, go to “Tools > Install Packages…” and type in the name of the package you want to install. Try it with ggplot2 now! You only need to do this once for every package. To make use of installed packages, you also need to load the packages every time you use R (i.e., every time you restart the program). You can do this with the library() command, and you will find a code snippet prompting you to load all needed libraries at the beginning of each R Notebook. You can try it here by executing the code snippet below to load ggplot2.
#Note that loading a library does not lead to an output
library(ggplot2)
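Packages can also be checked and installed from code rather than through the menu. The following is a minimal sketch; is_installed is a hypothetical helper written for this example, while requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: TRUE if a package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("ggplot2")) {
  library(ggplot2)  # load the package for this session
} else {
  # Installing is a one-time step; uncomment the next line to do it from code
  # install.packages("ggplot2")
  message("ggplot2 is not installed yet.")
}
```

The install line is left commented out on purpose, so that running the snippet never downloads anything unless you decide to.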
One of the reasons we’re working through the coding basics here is of course that you will work with actual data. In order to do that, you will need to import data into R. With every exercise, we will provide you with one or more datasets. These datasets will come as *.csv files (which stands for comma-separated values). They are essentially text files containing data tables, and you can also open these up in Excel or other programs. In order to import data, we will use the read.csv() function. In the code snippet below, you can import a simple test dataset (“test_data.csv”) that includes the sex, length, and mass of an animal.
#The line of code simply prompts the computer to read the "test_data.csv" file and generate a data.frame called test.data
test.data <- read.csv("test_data.csv", fileEncoding = 'UTF-8-BOM')
test.data
## sex length mass
## 1 female 90 3.997350
## 2 female 85 3.424650
## 3 female 90 4.252500
## 4 female 94 4.683080
## 5 female 111 6.443883
## 6 female 105 5.154187
## 7 female 103 5.649292
## 8 female 101 4.886279
## 9 female 87 4.091044
## 10 female 82 3.664580
## 11 female 98 5.224576
## 12 female 113 5.975892
## 13 female 94 4.696334
## 14 female 81 3.188646
## 15 female 89 3.932777
## 16 female 81 3.595428
## 17 female 114 6.647454
## 18 female 108 6.292728
## 19 female 118 7.003772
## 20 female 104 5.088928
## 21 female 110 6.552150
## 22 female 116 7.044216
## 23 female 104 5.088928
## 24 female 119 6.896407
## 25 female 96 4.700160
## 26 female 120 7.502400
## 27 female 119 6.400772
## 28 female 106 5.168560
## 29 female 98 4.773188
## 30 male 98 5.286234
## 31 male 136 9.697823
## 32 male 94 5.468247
## 33 male 121 9.154139
## 34 male 115 8.245788
## 35 male 117 8.050775
## 36 male 131 9.515431
## 37 male 96 5.837046
## 38 male 105 6.324160
## 39 male 131 10.112634
## 40 male 96 5.254410
## 41 male 141 11.484856
## 42 male 133 9.438850
## 43 male 118 8.398957
## 44 male 112 7.486510
## 45 male 101 6.342574
## 46 male 123 8.344854
## 47 male 145 13.133477
## 48 male 112 6.555244
## 49 male 130 10.556754
## 50 male 127 8.887079
## 51 male 99 6.201877
## 52 male 110 6.786406
## 53 male 101 5.508336
## 54 male 116 8.225922
## 55 male 121 8.814468
## 56 male 127 8.952563
## 57 male 112 6.919020
## 58 male 141 11.923033
## 59 male 136 11.639533
## 60 male 133 9.951831
If this worked correctly, you should now see this dataset in your global environment (top right panel). You can double click it to view it. There should be three columns: sex, length, and mass.
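To see what a *.csv file looks like on the inside, and a few ways to inspect a data frame after importing it, consider the sketch below. It writes four rows copied from the table above (rows 1, 2, 30, and 31) to a temporary file and reads them back. The names csv_path and demo.data are made up for this example, so the snippet does not touch test.data.

```r
# Write a tiny CSV to a temporary file: one header line, then one line per row
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sex,length,mass",
  "female,90,3.997350",
  "female,85,3.424650",
  "male,98,5.286234",
  "male,136,9.697823"
), csv_path)

# Read it back with the same function used for the full dataset
demo.data <- read.csv(csv_path)

str(demo.data)          # column names and data types
head(demo.data, 2)      # first two rows
nrow(demo.data)         # number of rows
table(demo.data$sex)    # number of rows per sex
```

The same four inspection commands work on test.data and are a quick way to confirm that an import produced the columns you expect.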
A key learning objective of this course is that you learn to visualize data in different ways to facilitate interpretation in the context of different evolutionary hypotheses. In the following sections, I will explain step by step (that is, one line of code at a time) how to make a simple graph with our sample dataset. Let’s aim to make a scatter plot showing the relationship between length and mass in our species. The process is not much different from sketching a graph by hand and layering different parts of the graph on top of each other, except that you use words (code) to make the computer draw.
The first step of making any graph is to define the axes and establish the coordinate grid that allows for the plotting of the data. This is accomplished with the following line of code:
#This line of code calls the ggplot function (a plotting function) and makes a grid based on the test.data data frame, using length as the x axis and mass as the y axis
ggplot(test.data, aes(x=length, y=mass))
The second step is to draw the data into the established coordinate system. To do so, you just need to tell the program what kind of graph you want to draw. Different graph types in ggplot are referred to as geom_, and a scatter plot is designated as geom_point. You can literally add that to your existing code with a plus sign. For an overview of some of the graph types (geoms) ggplot offers, check here.
ggplot(test.data, aes(x=length, y=mass)) +
geom_point()
Whenever we look at the relationship between two variables, we may want to add a trendline. You can add a trendline by adding the geom_smooth command to your existing code.
#The code within the brackets of the geom_smooth command specifies some additional options, namely that we want to draw a straight line (method="lm") and that we do not want to show the confidence interval (se=FALSE). Set se=TRUE and see what happens.
ggplot(test.data, aes(x=length, y=mass)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
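The straight line drawn with method="lm" comes from a linear model, which can also be fitted directly with the base R function lm(). The sketch below uses only six rows copied from the printed table (rows 1 to 3 and 30 to 32), so its numbers will differ from a fit to the full dataset. The names demo.data, fit, and slope are made up for this example.

```r
# Six rows copied from the printed table (rows 1-3 and 30-32)
demo.data <- data.frame(
  length = c(90, 85, 90, 98, 136, 94),
  mass   = c(3.997350, 3.424650, 4.252500, 5.286234, 9.697823, 5.468247)
)

# Fit mass as a linear function of length: mass = intercept + slope * length
fit <- lm(mass ~ length, data = demo.data)

coef(fit)                              # intercept and slope of the line
slope <- unname(coef(fit)["length"])   # change in mass per unit of length
slope
```

Replacing demo.data with test.data in the lm() call should give the intercept and slope of the trendline shown in the graph.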
The variable names in the dataset do not always provide the clearest description of what a variable means. We can modify the x and y axis labels using the xlab and ylab commands, respectively.
#Simply add the new label text in quotation marks
ggplot(test.data, aes(x=length, y=mass)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
xlab("Body length in cm") +
ylab("Body mass in kg")
I honestly hate the default theme of ggplot with its gray background. But you can quickly alter the look of the graph by switching to a number of other possible themes. I personally like the theme_classic, but you can customize the look of your graph with themes listed here.
ggplot(test.data, aes(x=length, y=mass)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
xlab("Body length in cm") +
ylab("Body mass in kg") +
theme_classic()
One of the most iconic study systems in evolutionary biology is Darwin’s finches on the Galapagos Islands. Rosemary and Peter Grant spent much of their lives studying these birds, examining how their traits change in response to major ecological perturbations. To do so, they collected a massive, long-term data set on different traits of the medium ground finch (Geospiza fortis) population on the island of Daphne Major. For this exercise, we will take a look at their beak size data from 1972-1994.
The beak size data can be found in a file called “finches.csv”. The file includes three variables: year, the average relative beak size (rel.beak.size), and the standard error (st.err) of that average, which describes how precisely it is estimated in any given year.
finch <- read.csv("finches.csv", fileEncoding = 'UTF-8-BOM')
The following code snippet provides the base code to make a scatter plot as above. You will only have to specify the x and y variables and label the axes correctly.
ggplot(finch, aes(x=year, y=rel.beak.size)) +
geom_point() +
xlab("year") +
ylab("beak size") +
theme_classic()
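There are two graphical elements that we can add to facilitate the interpretation of the data: a line connecting the yearly averages (geom_line) and error bars spanning the average plus and minus the standard error (geom_errorbar):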
ggplot(finch, aes(x=year, y=rel.beak.size)) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin=rel.beak.size-st.err, ymax=rel.beak.size+st.err)) +
xlab("Year") +
ylab("Beak Size") +
theme_classic()
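The error bars in the plot above run from rel.beak.size-st.err to rel.beak.size+st.err. To make that arithmetic concrete, the sketch below computes a mean, a standard error, and the two ends of an error bar for the masses of the first five females in the test dataset printed earlier. It assumes the common definition of the standard error of the mean (standard deviation divided by the square root of the sample size). How the st.err values in finches.csv were calculated is not stated here, so treat this as an illustration only.

```r
# Masses of the first five females in the printed test dataset
mass <- c(3.997350, 3.424650, 4.252500, 4.683080, 6.443883)

mean_mass <- mean(mass)                      # average
se_mass   <- sd(mass) / sqrt(length(mass))   # standard error of the mean (assumed definition)

# Lower and upper ends of an error bar drawn as mean +/- one standard error
c(lower = mean_mass - se_mass, upper = mean_mass + se_mass)
```

A small standard error gives short error bars, meaning the average for that group is estimated precisely.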
What do you observe? How do you interpret the data if I told you that 1977 was a massive drought year?
It seems as though beak size trended towards being smaller over time. Knowing that 1977 was a massive drought year suggests that beaks become larger during and immediately after dry years. This would make sense if food were harder to obtain, thus requiring larger beaks either to reach food or to eat food that previously did not need to be eaten.
Do you think these data reflect evolutionary change through time? What is a potential alternative explanation? What additional information would you need to either accept or reject the hypothesis that these patterns reflect evolutionary change?
I think that the data do indicate evolution over time. A possible alternative explanation is that a massive die-off of small-beaked finches occurred, or that the researchers simply excluded them from their data. I would like to know the population sizes of the finches in relation to beak size, as well as exact environmental data.
Grant, PR & BR Grant. 2002. Unpredictable evolution in a 30-year study of Darwin’s finches. Science 296: 707-711.
Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.
I used the lecture from class.