Sydney Tobis, 831694209
1. The General Structure of an R Notebook
R Notebook files (*.Rmd) allow you to combine text elements with
snippets of code and the outputs generated from said code. There are
three main parts to an R Notebook file, which are introduced in this
section. In addition, each R Notebook automatically generates an *.html
file that provides you with the formatted document including all
components, which you will ultimately submit for your homework.
1.2. Code Chunks
R code chunks are delineated with three backticks (```) at
the beginning and the end, and {r} after the first set of
backticks lets your computer know that you will be using the R programming
language. You can always add a code chunk by clicking “Insert > Code
Chunk > R” above or by clicking the “+C” icon, although we have usually
already created all the chunks you will need in the template. Any text
within a code chunk, if written correctly, represents executable code,
which the computer can interpret as a command to perform certain tasks.
You can make your computer execute the code in a chunk by pressing the
small, green play arrow on the top right corner of each chunk, or you
can just highlight the code and press command+enter (control+enter on
PC). When you execute the code, the output will automatically appear
below the chunk. Sometimes you will find us using hash tags
(#) within code chunks. Hash tags “silence” the text that
follows on the same line, such that the computer jumps over that section
when executing the code. That is useful for code annotation, and you
will frequently see us using the hash tags to add further descriptions
or explanations within code chunks.
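As a small illustration of how hash tags behave, here is a sketch of what the inside of a code chunk might look like. The variable name total is made up for this example and does not come from the template:

```r
# Everything after a hash tag on the same line is ignored by R.
total <- 2 + 3   # add two numbers; this trailing note is not executed
# total <- 100   # this whole line is silenced, so total is not overwritten
print(total)
```

Because the second assignment is silenced, total keeps the value from the first line.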
Pro tip: If you want to execute all code chunks in a document
automatically, you can click “Run > Run All” in the RStudio menu.
1.3. Text
The text in between code snippets is just that: text. We will use
these sections to provide you with background information and discussion
prompts, and you will use these sections to respond to questions and
offer your interpretations of data. Sections where you need to write
something are always highlighted in italics. You can use a
variety of prompts to format your text if you are working with basic
Markdown (see here
for a cheat sheet). Most of you, however, will prefer the text editor
that is implemented in RStudio to format text with the click of a
button.
Pro tip: You can toggle back and forth between source code (with
Markdown formatting) and the WYSIWYG editor (with text formatting
through clicking) by using the Source/Visual buttons in the RStudio
menu.
1.4. HTML Preview and Output
As already mentioned, your R Notebook (including text, code chunks,
and the outputs from your code) can be automatically knitted into an
*.html file. You can click “Preview > Preview Notebook” or “Preview
> Knit to HTML” to see the live html version as you are working on
your R Notebook (just make sure to save to update), and you can find the
shareable *.html file in the same folder as your *.Rmd file (same base
name, ending in .nb.html).
Note: Sometimes R will prompt you to update some packages in the
Console before you can knit the html file. If it is not working on the
first try, make sure to check for prompts in the Console.
2. Getting Started
2.1. Setting Your Working Directory
Having a well-organized file structure is critical to avoid issues
with coding, because you will frequently read in data files, and you
need to make sure that R knows where to look for those files. To
facilitate this process, we will provide you with all the necessary
files in a zipped folder (if you are working through this, you have
already found the first file). We recommend that you move that *.zip
file to the location where you want it (e.g., your folder for this
course) before unzipping.
The folder containing the files for a particular exercise is called a
“Working Directory”, and opening an *.Rmd file automatically sets the
working directory to the directory of that R Notebook file. So after
unzipping, it is important not to move any files out of the folder we
provide you with, unless you want to manually tell R where to look for
readable files. If so, you can use the setwd() command to
point R toward the location of your files (see
textbook for details).
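A minimal sketch of checking and changing the working directory is shown here. The target folder is only a stand-in (R's temporary directory), chosen so the example runs on any computer. In practice you would replace it with the path to your own course folder:

```r
# Remember where R is currently looking for files.
old_dir <- getwd()

# Stand-in target folder (an assumption for this example);
# replace with the path to your own unzipped course folder.
target_dir <- tempdir()

setwd(target_dir)   # point R toward the new location
print(getwd())      # confirm the change

setwd(old_dir)      # switch back to the original working directory
```

Calling getwd() before and after setwd() is an easy way to confirm that R is looking where you expect.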
2.2. Loading Your Libraries
When you install R, your computer can understand and execute a number
of commands. This is what is known as “Base R”. The power of R, however,
is that you can expand the number of commands your computer understands
by installing and loading additional R packages (also called libraries).
There are R packages specialized for pretty much any area of biology,
providing the capability to analyze data from the level of genes and
genomes to ecosystem level processes. We will frequently use a package
called ggplot2, which allows for plotting data. Depending
on the module, you will need to install additional libraries. To
download and install new R packages, go to “Tools > Install
Packages…” and type in the name of the package you want to install.
Alternatively, you can use the install.packages() function.
For example, execute the following code chunk to install
ggplot2:
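A minimal sketch of such an installation chunk, based on the install.packages("ggplot2") call named in this section. The requireNamespace() guard is an addition (an assumption, not part of the template), so that the package is only downloaded when it is missing:

```r
# Install ggplot2 only if it is not already available on this computer.
# The plain call named in the text is install.packages("ggplot2").
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
```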
Note that you only need to install each package once (unless you
reinstall R). I recommend deleting the code chunk above after you run it
successfully, or you can silence it by placing a hash tag at the beginning of
install.packages("ggplot2"). Failure to do so can cause
problems during the export (knitting) of your R Notebook as an *.html
file.
To make use of installed packages, you also need to load the packages
every time you use R (i.e., every time you restart the
program). You can do this with the library() command, and
you will find a code snippet prompting you to load all needed libraries
at the beginning of each R Notebook (in a section that is typically
called dependencies). You can try it here by executing the code chunk
below to load ggplot2:
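A minimal sketch of the loading step, assuming ggplot2 has already been installed:

```r
# Load ggplot2 so its plotting functions are available in this session.
library(ggplot2)
```

If R reports that there is no package called 'ggplot2', the package still needs to be installed first.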
2.3. Importing Data
One of the reasons we’re working through the coding basics here is of
course that you will work with actual data. To do that, you will need to
import data into R. With every exercise, we will provide you with one or
more data sets. These data sets will mostly come as *.csv files (which
stands for comma-separated values). They are essentially text files
containing data tables, and you can also open these files in Excel or
other programs. To import data, we will use the read.csv()
function. In the code chunk below, you can import a simple test data set
(“test_data.csv”) that includes the variables sex, length, and mass for
a population of an animal. Note that the fileEncoding
argument simply indicates that I generated the input files on a Mac,
which will prevent some import issues for those of you who use a
PC.
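A self-contained sketch of such an import. Since the real test_data.csv is not reproduced here, the example first writes a tiny stand-in file with made-up rows to a temporary folder. The encoding value "UTF-8" is also an assumption, because the value used in the template is not shown:

```r
# Write a small stand-in file (made-up values, for illustration only).
csv_path <- file.path(tempdir(), "test_data.csv")
writeLines(
  c("sex,length,mass",
    "F,10.2,3.1",
    "M,11.5,3.8"),
  csv_path
)

# Import the file; fileEncoding tells R how the text file was encoded.
test.data <- read.csv(csv_path, fileEncoding = "UTF-8")

# Inspect the column names and types.
str(test.data)
```

With the files from the course folder, the path would simply be "test_data.csv", because the working directory already points to that folder.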
If this worked correctly, you should now see this data set as
test.data in your global environment (top right panel). You
can double click it to view it. There should be three columns: sex,
length, and mass.
4. Your First Data Set: Darwin’s Finches
One of the most iconic study systems in evolutionary biology is
Darwin’s finches on the Galapagos Islands. Rosemary and Peter Grant
spent much of their lives devoted to the study of these birds, examining
how their traits change in response to major ecological perturbations.
To do so, they collected a massive, long-term data set on different
traits of the medium ground finch (Geospiza fortis) population
on Daphne Major Island. For this exercise, we will take a look at their
beak size data from 1972-1994.

4.1. Import data
The beak size data can be found in a file called “finches.csv”. The
file includes three variables: year, the average relative beak size
(rel.beak.size), and the standard error (st.err) that describes the
variability of beak size in any given year.
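A hedged sketch of the import step. The object name finches is an assumption, and the file.exists() check is added only so the example does not fail when it is run outside the course folder:

```r
# Path to the data file, relative to the working directory.
finches_path <- "finches.csv"

if (file.exists(finches_path)) {
  # Read the table and show the first few rows.
  finches <- read.csv(finches_path)
  print(head(finches))
} else {
  message("finches.csv was not found in ", getwd())
}
```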
4.2. Plotting the Data
The following code chunk provides the base code to make a scatter
plot as above. You will only have to specify the x and y variables and
label the axes correctly.
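One way the completed scatter plot code could look is sketched here. The data frame finches_demo contains made-up placeholder numbers (not the Grants' measurements), so that the example runs without the data file. Only the column names year, rel.beak.size, and st.err follow the description above:

```r
library(ggplot2)

# Placeholder data with the same column names as finches.csv.
# The numbers are invented for illustration only.
finches_demo <- data.frame(
  year          = c(1972, 1973, 1974, 1975),
  rel.beak.size = c(0.10, 0.30, 0.20, 0.40),
  st.err        = c(0.05, 0.04, 0.06, 0.05)
)

# Scatter plot: year on the x axis, mean relative beak size on the y axis.
scatter_plot <- ggplot(finches_demo, aes(x = year, y = rel.beak.size)) +
  geom_point() +
  labs(x = "Year", y = "Relative beak size")

print(scatter_plot)
```

With the real data, you would swap finches_demo for the object you created when importing finches.csv.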
4.3. Adding Additional Graphical Elements
There are two graphical elements that we can add to facilitate the
interpretation of the data:
- Since this is a time series, it makes sense to connect the dots
representing the means from year to year. You can do this by simply
adding another geom:
geom_line().
- We want to know how much the average beak size changes relative to
the variability in the population. If variability is high, year-to-year
variation in the mean may be negligible. But if variability is low, changes
across years may actually be substantial. You can show this by adding
another geom:
geom_errorbar(). Make sure to specify the x
and y axes variables as above.
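Both additions can be sketched together as follows. The data are again made-up placeholders, and drawing the error bars as the mean plus and minus one standard error is an assumption about how st.err is meant to be used:

```r
library(ggplot2)

# Placeholder data (invented numbers) with the columns described for finches.csv.
finches_demo <- data.frame(
  year          = c(1972, 1973, 1974, 1975),
  rel.beak.size = c(0.10, 0.30, 0.20, 0.40),
  st.err        = c(0.05, 0.04, 0.06, 0.05)
)

time_series_plot <- ggplot(finches_demo, aes(x = year, y = rel.beak.size)) +
  geom_point() +
  # Connect the yearly means to show the time series.
  geom_line() +
  # Error bars from mean - st.err to mean + st.err (an assumed convention).
  geom_errorbar(
    aes(ymin = rel.beak.size - st.err, ymax = rel.beak.size + st.err),
    width = 0.2
  ) +
  labs(x = "Year", y = "Relative beak size")

print(time_series_plot)
```

Note that geom_errorbar() needs ymin and ymax in addition to x, which is why the lower and upper limits are computed inside aes().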
4.4. Interpretation
4.4.1. General patterns
Based on the graphs you just made, what do you observe? How do you
interpret the data if I told you that 1977 was a massive drought
year?
The beaks were the smallest in 1975, then after 1977 the beak sizes
stayed pretty consistent. If 1977 was a massive drought year, the
average relative beak size must have increased because they were eating
things that would provide them with water. Because this was outside of
their normal diet, it could have increased their beak size.
4.4.2. Evolution… or Not?
Do you think these data reflect evolutionary change through time?
What is a potential alternative explanation? What additional information
would you need to either accept or reject the hypothesis that these
patterns reflect evolutionary change?
The data could also reflect a parent generation passing down their
beak size that they generated during their lifetime. The relative beak
size change seems rapid in the graph, so that could be an
explanation. To prove that it is evolutionary change, I would want to
see the relative beak size of different generations.
5. Resources
5.1. Data References
Data on beak size variation in Darwin’s finches came from the
following publication:
5.2. Resources You Consulted
Consulting additional resources to solve this assignment is
absolutely allowed, but failure to disclose those resources is
plagiarism. Please list any collaborators you worked with and resources
you used below or state that you have not used any.
I have not used any additional resources.
