Nathan Stewart 811847789

1. The General Structure of an R Notebook

R Notebook files (*.Rmd) allow you to combine text elements with snippets of code and the code output. There are three main parts to an R Notebook file. In addition, each R Notebook automatically generates an HTML file that presents you with the formatted output, which you will ultimately submit for your homework.

1.1. The Header

The header, which you can see at the beginning of the document, is delineated with three dashes (---) at the beginning and the end. It includes some code that is important for the formatting of output files, so I would recommend not altering that section. In general, there should be no reason for you to change the header for any exercises in this course. However, if you would like to learn more about the different header options, you can find a good tutorial here.

1.2. Code Snippets

Code snippets are delineated with three backticks (the ` character, typed three times) at the beginning and the end, and {r} after the first set of backticks lets your computer know that you will be using the R programming language. You can always add a code chunk by clicking “Insert > R” above, although we have usually already created all the snippets you will need. Any text within a snippet, if written correctly, represents executable code, which the computer can interpret as commands to execute certain tasks. You can make your computer execute the code in a snippet by pressing the small, green play arrow in the top right corner of each snippet, or you can just highlight the code and press command+enter (control+enter on PC). When you execute the code, the output appears directly below the snippet. Sometimes you will find us using hashtags (#) within code snippets. Hashtags “silence” the code that follows on the same line, such that the computer jumps over that section when executing the code. That is useful for code annotation, and you will frequently see us using hashtags to add further descriptions or explanations.
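As a small illustration of how hashtags behave, the sketch below mixes executable code with comments. The variable name total is made up for this example and does not appear elsewhere in the notebook.

```r
# A line that starts with a hashtag is skipped entirely.
total <- 2 + 3   # Text after a hashtag on the same line is skipped too.
# total <- 100   # This assignment is "silenced", so total keeps its earlier value.
print(total)
```

Removing the hashtag at the start of the third line would turn it back into executable code.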

1.3. Text

The text in between code snippets is just that: text. We will use these sections to provide you with background information and discussion prompts, and you will use these sections to respond to questions and offer your interpretations of data. Sections where you need to write something are always highlighted in italics (designated with asterisks). You can use a variety of prompts to format your text (see here for a cheat sheet).

1.4. HTML Preview and Output

As already mentioned, your R Notebook (including text, code snippets, and the outputs from your code) can be automatically knitted into an HTML file. You can click “Preview > Preview Notebook” to see the live HTML file as you are working on your R Notebook (just make sure to save to update), and you can find the sharable HTML file in the same folder as your *.Rmd file.

2. Getting Started

2.1. Setting Your Working Directory

Having a well-organized file structure is critical to avoid issues with coding, because you will frequently read in data files, and you need to make sure that your computer knows where to look for those files. To facilitate this process, we will provide you with all the necessary files in a zipped folder. We recommend that you move that *.zip file to the location where you want it before unzipping. After unzipping, do not move files out of that folder, unless you want to manually tell the computer where to look for readable files. The folder with your files is called a “Working Directory”. At the beginning of any work with an R Notebook, you need to tell your computer where your working directory is. In order to do that, simply execute the following snippet (remember, press the green play button).

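#The following lines of code automatically check which folder your *.Rmd file is in and set that folder as your working directory

set_wd <- function() {
  library(rstudioapi)
  current_path <- getActiveDocumentContext()$path
  setwd(dirname(current_path))
  print(getwd())
}

#Call the function; defining it alone does not change the working directory
#The check skips the call outside RStudio (e.g., when knitting), where rstudioapi cannot find the active document
if (rstudioapi::isAvailable()) set_wd()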

#If you want to manually set your working directory, you can use the setwd command with your specific path as seen below
#setwd("Path")

2.2. Loading Your Libraries

When you install R, your computer can understand and execute a number of commands. This is what is known as “Base R”. The power of R, however, is that you can expand the number of commands your computer understands by downloading and loading additional R packages (also called libraries). There are R packages specialized for pretty much any area of biology, providing a capability to analyze data from the level of genes and genomes to ecosystem-level processes. We will frequently use a package called ggplot2, which allows for plotting data. Depending on the module, we may need to install additional libraries. To download and install new R packages, go to “Tools > Install Packages…” and type in the name of the package you want to install. Try it with ggplot2 now! You only need to do this once for every package. To make use of installed packages, you also need to load the packages every time you use R (i.e., every time you restart the program). You can do this with the library() command, and you will find a code snippet prompting you to load all needed libraries at the beginning of each R Notebook. You can try it here by executing the code snippet below to load ggplot2.

#Note that loading a library does not lead to an output
library(ggplot2)
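The menu route for installing packages also has a code equivalent, the base R function install.packages(). The sketch below is one possible way to install a package only when it is missing. The helper name ensure_package is hypothetical and not part of the course materials, and the usage lines are commented out because installing requires an internet connection.

```r
# Hypothetical helper (not from the course materials): install a package
# only if it is missing, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (install once per computer, then load in every new R session):
# ensure_package("ggplot2")
# library(ggplot2)
```

Checking first avoids downloading a package again each time the notebook is run.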

2.3. Importing Data

One of the reasons we’re working through the coding basics here is of course that you will work with actual data. In order to do that, you will need to import data into R. With every exercise, we will provide you with one or more datasets. These datasets will come as *.csv files (which stands for comma-separated values). They are essentially text files containing data tables, and you can also open these up in Excel or other programs. In order to import data, we will use the read.csv() function. In the code snippet below, you can import a simple test dataset (“test_data.csv”) that includes the sex, length, and mass of an animal.

#The line of code simply prompts the computer to read the "test_data.csv" file and generate a data.frame called test.data
test.data <- read.csv("test_data.csv", fileEncoding = 'UTF-8-BOM')
test.data

##       sex length      mass
## 1  female     90  3.997350
## 2  female     85  3.424650
## 3  female     90  4.252500
## 4  female     94  4.683080
## 5  female    111  6.443883
## 6  female    105  5.154187
## 7  female    103  5.649292
## 8  female    101  4.886279
## 9  female     87  4.091044
## 10 female     82  3.664580
## 11 female     98  5.224576
## 12 female    113  5.975892
## 13 female     94  4.696334
## 14 female     81  3.188646
## 15 female     89  3.932777
## 16 female     81  3.595428
## 17 female    114  6.647454
## 18 female    108  6.292728
## 19 female    118  7.003772
## 20 female    104  5.088928
## 21 female    110  6.552150
## 22 female    116  7.044216
## 23 female    104  5.088928
## 24 female    119  6.896407
## 25 female     96  4.700160
## 26 female    120  7.502400
## 27 female    119  6.400772
## 28 female    106  5.168560
## 29 female     98  4.773188
## 30   male     98  5.286234
## 31   male    136  9.697823
## 32   male     94  5.468247
## 33   male    121  9.154139
## 34   male    115  8.245788
## 35   male    117  8.050775
## 36   male    131  9.515431
## 37   male     96  5.837046
## 38   male    105  6.324160
## 39   male    131 10.112634
## 40   male     96  5.254410
## 41   male    141 11.484856
## 42   male    133  9.438850
## 43   male    118  8.398957
## 44   male    112  7.486510
## 45   male    101  6.342574
## 46   male    123  8.344854
## 47   male    145 13.133477
## 48   male    112  6.555244
## 49   male    130 10.556754
## 50   male    127  8.887079
## 51   male     99  6.201877
## 52   male    110  6.786406
## 53   male    101  5.508336
## 54   male    116  8.225922
## 55   male    121  8.814468
## 56   male    127  8.952563
## 57   male    112  6.919020
## 58   male    141 11.923033
## 59   male    136 11.639533
## 60   male    133  9.951831

If this worked correctly, you should now see this dataset in your global environment (top right panel). You can double click it to view it. There should be three columns: sex, length, and mass.
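Besides double-clicking the dataset, you can also inspect it with a few base R commands. The sketch below reads three rows copied from the output above out of inline text, so that it runs without the file. demo.data is a hypothetical name, and with the real file you would apply the same commands to test.data.

```r
# Three rows copied from the printed dataset above, read from inline text.
demo.data <- read.csv(text = "sex,length,mass
female,90,3.997350
female,85,3.424650
male,98,5.286234")

names(demo.data)    # column names
nrow(demo.data)     # number of rows
str(demo.data)      # data type of each column
head(demo.data, 2)  # first two rows
summary(demo.data)  # quick summary of each column
```

If names() does not return sex, length, and mass, the import likely went wrong (for example, because the working directory was not set).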

3. Making Figures

A key learning objective of this course is that you learn to visualize data in different ways to facilitate interpretation in the context of different evolutionary hypotheses. In the following sections, I will explain step by step (that is, one line of code at a time) how to make a simple graph with our sample dataset. Let’s aim to make a scatter plot showing the relationship between length and mass in our species. The process is not much different from sketching a graph by hand and layering different parts of the graph on top of each other, except that you use words (code) to make the computer draw.

3.1. Define the Axes and Coordinate System

The first step of making any graph is to define the axes and establish the coordinate grid that allows for the plotting of the data. This is accomplished with the following line of code:

#This line of code calls the ggplot function (a plotting function) and makes a grid based on the test.data data frame, using length as the x axis and mass as the y axis
ggplot(test.data, aes(x=length, y=mass))

3.2. Adding a Layer with Data Points

The second step is to draw the data into the established coordinate system. To do so, you just need to tell the program what kind of graph you want to draw. Different graph types in ggplot are referred to as geom_, and a scatter plot is designated as geom_point. You can literally add that to your existing code with a plus sign. For an overview of some of the graph types (geoms) ggplot offers, check here.

ggplot(test.data, aes(x=length, y=mass)) +
  geom_point()

3.3. Adding a Trendline

Whenever we look at the relationship between two variables, we may want to add a trendline. You can add a trendline by adding the geom_smooth command to your existing code.

#The code within the brackets of the geom_smooth command specifies some additional options, namely that we want to draw a straight line (method="lm") and that we do not want to show the confidence interval (se=FALSE). Set se=TRUE and see what happens.
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
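The straight line drawn with method="lm" is a linear model fit, the same kind of fit that the base R function lm() produces. As a rough sketch of what happens behind the scenes, the snippet below fits the line directly and prints its intercept and slope. demo.data is a hypothetical name and holds only eight rows copied from the output in section 2.3, so the numbers will differ from a fit on the full test.data.

```r
# Eight rows copied from the printed test dataset (not the full dataset).
demo.data <- data.frame(
  length = c(90, 85, 94, 111, 98, 136, 121, 145),
  mass   = c(3.997350, 3.424650, 4.683080, 6.443883,
             5.286234, 9.697823, 9.154139, 13.133477)
)

# Fit mass as a linear function of length.
fit <- lm(mass ~ length, data = demo.data)

coef(fit)  # intercept and slope of the fitted line

# Mass predicted by the line for an animal of length 100
predict(fit, newdata = data.frame(length = 100))
```

A positive slope means that mass tends to increase with length, which is what the upward-sloping trendline shows in the plot.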

3.4. Changing the Axes Labels

The variable names in the dataset do not always provide the clearest description of what a variable means. We can modify the x and y axis labels using the xlab and ylab commands, respectively.

#Simply add the new label text in quotation marks
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg")

3.5. Change the Theme

I honestly hate the default theme of ggplot with its gray background. But you can quickly alter the look of the graph by switching to a number of other possible themes. I personally like the theme_classic, but you can customize the look of your graph with themes listed here.

ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg") +
  theme_classic()

4. Your First Dataset: Darwin’s Finches

One of the most iconic study systems in evolutionary biology is Darwin’s finches on the Galapagos Islands. Rosemary and Peter Grant spent much of their lives studying these birds, examining how their traits change in response to major ecological perturbations. To do so, they collected a massive, long-term data set on different traits of the medium ground finch (Geospiza fortis) population on the island of Daphne Major. For this exercise, we will take a look at their beak size data from 1972-1994.

4.1. Import data

The beak size data can be found in a file called “finches.csv”. The file includes three variables: year, the average relative beak size (rel.beak.size), and the standard error (st.err) that describes the variability of beak size in any given year.

finch <- read.csv("finches.csv", fileEncoding = 'UTF-8-BOM')

4.2. Plotting the Data

The following code snippet provides the base code to make a scatter plot as above. You will only have to specify the x and y variables and label the axes correctly.

ggplot(finch, aes(x=year, y=rel.beak.size)) +
  geom_point() +
  xlab("year") +
  ylab("beak size") +
  theme_classic()

4.3. Adding Additional Graphical Elements

There are two graphical elements that we can add to facilitate the interpretation of the data:

Since this is a time series, it makes sense to connect the dots representing the means from year to year. You can do this by simply adding another geom (geom_line).
We want to know how much the average beak size changes relative to the variability in the population. If variability is high, year-to-year variation in the average may be negligible. But if variability is low, changes across years may actually be substantial. You can show the variability by adding another geom (geom_errorbar), as you can see below.

ggplot(finch, aes(x=year, y=rel.beak.size)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin=rel.beak.size-st.err, ymax=rel.beak.size+st.err))  +
  xlab("Year") +
  ylab("Beak Size") +
  theme_classic()
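The error bars above run from the mean minus st.err to the mean plus st.err. The notebook does not say how st.err was calculated, so the sketch below assumes the usual standard error of the mean (standard deviation divided by the square root of the sample size). The beak measurements are made-up numbers for a single year, not values from finches.csv.

```r
# Hypothetical beak measurements for one year (made-up numbers).
beak <- c(9.2, 9.8, 10.1, 9.5, 10.4, 9.9)

mean.beak <- mean(beak)
st.err <- sd(beak) / sqrt(length(beak))  # standard error of the mean

# Lower and upper limits that an error bar would span for this year
c(lower = mean.beak - st.err, mean = mean.beak, upper = mean.beak + st.err)
```

Because the sample size is in the denominator, years with more measured birds get shorter error bars for the same spread of beak sizes.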

4.4. Interpretation

4.4.1. General patterns

What do you observe? How do you interpret the data if I told you that 1977 was a massive drought year?

It seems as though beak size trended towards being smaller over time. Knowing that 1977 was a massive drought year suggests that beaks become larger during and immediately after dry years. This would make sense if food were harder to obtain, thus requiring larger beaks either to reach food or to eat food that was previously not necessary to eat.

4.4.2. Evolution… or Not?

Do you think these data reflect evolutionary change through time? What is a potential alternative explanation? What additional information would you need to either accept or reject the hypothesis that these patterns reflect evolutionary change?

I think that the data does indicate evolution over time. A possible alternative explanation is that a massive die-off of small-beaked finches occurred, or that the researchers simply excluded them from their data. I would like to know the population sizes of the finches in relation to beak size, as well as exact environmental data.

5. Resources

5.1. Data References

Grant, PR & BR Grant. 2002. Unpredictable evolution in a 30-year study of Darwin’s finches. Science 296: 707-711.

5.2. Resources You Consulted

Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.

I used the lecture from class.

An Introduction to R Notebooks and Evidence for Evolution