---
title: "An Introduction to R Notebooks and Evidence for Evolution"
output:
  html_notebook:
    fig_caption: yes
    toc: yes
    toc_depth: 3
    toc_float: yes
  pdf_document:
    toc: yes
    toc_depth: '3'
  html_document:
    keep_md: TRUE
---

## Sydney Tobis, 831694209

------------------------------------------------------------------------

# 1. The General Structure of an R Notebook

R Notebook files (\*.Rmd) allow you to combine text elements with snippets of code and the outputs generated from said code. There are three main parts to an R Notebook file, which are introduced in this section. In addition, each R Notebook automatically generates an \*.html file that provides you with the formatted document including all components, which you will ultimately submit for your homework.

------------------------------------------------------------------------

## 1.1. The Header

The header, which you can see at the beginning of this document, is delineated with three dashes (`---`) at the beginning and the end. It includes some code that is important for the formatting of output files, so I would recommend not altering that section. In general, there should be no reason for you to change the header for any exercises in this course. However, if you would like to learn more about the different header options, you can find a good tutorial [here](https://bookdown.org/yihui/rmarkdown/html-document.html#table-of-contents).

------------------------------------------------------------------------

## 1.2. Code Chunks

R code chunks are delineated with three backticks (```` ``` ````) at the beginning and the end, and `{r}` after the first set of backticks tells your computer that the chunk contains code in the R programming language. You can always add a code chunk by clicking "Insert \> Code Chunk \> R" above or by clicking the "+C" icon, although we have usually already created all the chunks you will need in the template. Any text within a code chunk, if written correctly, is executable code that the computer interprets as commands to perform certain tasks.

You can make your computer execute the code in a chunk by pressing the small, green play arrow in the top right corner of each chunk, or you can highlight the code and press Command+Enter (Control+Enter on a PC). When you execute the code, the output automatically appears below the chunk.

Sometimes you will find us using hash tags (`#`) within code chunks. A hash tag "silences" the text that follows it on the same line, so the computer skips that part when executing the code. This is useful for annotating code, and you will frequently see us using hash tags to add further descriptions or explanations within code chunks.
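Here is a minimal illustration of how hash tags work. The vector `x` is just a made-up example. R skips everything after a `#`, whether the hash tag starts the line or follows some code on the same line:

```{r}
# This entire line is a comment, so R skips it
x <- c(2, 4, 6)  # the code before the hash tag runs, the note after it is skipped
# mean(x)        # silenced: this line would run if the leading hash tag were removed
sum(x)           # returns 12
```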

Pro tip: If you want to execute all code chunks in a document automatically, you can click "Run \> Run All" in the RStudio menu.

------------------------------------------------------------------------

## 1.3. Text

The text in between code snippets is just that: text. We will use these sections to provide you with background information and discussion prompts, and you will use these sections to respond to questions and offer your interpretations of data. Sections where you need to write something are always highlighted in *italics*. If you are working with basic Markdown, you can use a variety of simple formatting commands to format your text (see [here](https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) for a cheat sheet). Most of you, however, will prefer the text editor that is built into RStudio, which formats text with the click of a button.

Pro tip: You can toggle back and forth between source code (with Markdown formatting) and the WYSIWYG editor (with text formatting through clicking) by using the Source/Visual buttons in the RStudio menu.

------------------------------------------------------------------------

## 1.4. HTML Preview and Output

As already mentioned, your R Notebook (including text, code chunks, and the outputs from your code) can be automatically knitted into an \*.html file. You can click "Preview \> Preview Notebook" or "Preview \> Knit to HTML" to see the live html version as you are working on your R Notebook (save the file to update the preview). You can find the shareable \*.html file in the same folder as your \*.Rmd file; it has the same file name, ending in .nb.html.

Note: Sometimes R will prompt you to update some packages in the Console before you can knit the html file. If knitting does not work on the first try, check the Console for prompts.

------------------------------------------------------------------------

# 2. Getting Started

------------------------------------------------------------------------

## 2.1. Setting Your Working Directory

Having a well-organized file structure is critical to avoid issues with coding, because you will frequently read in data files, and you need to make sure that R knows where to look for those files. To facilitate this process, we will provide you with all the necessary files in a zipped folder (if you are working through this, you have already found the first file). We recommend that you move that \*.zip file to the location where you want it (e.g., your folder for this course) before unzipping.

The folder containing the files for a particular exercise is called a "Working Directory", and opening an \*.Rmd file automatically sets the working directory to the directory of that R Notebook file. So after unzipping, it is important not to move any files out of the folder we provide you with, unless you want to manually tell R where to look for readable files. If so, you can use the `setwd()` command to point R toward the location of your files ([see textbook for details](https://www.k-state.edu/biology/p2e/evidence-for-evolution.html#import-data)).
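If you are unsure where R is currently looking for files, a quick check like the one below can help. This is a sketch. `getwd()`, `list.files()`, and `file.exists()` are base R functions. The folder in the commented `setwd()` line is a placeholder that you would replace with your own path.

```{r}
# Print the folder R is currently using as its working directory
getwd()
# List the files in that folder; the exercise files should show up here
list.files()
# TRUE if R can find the test data set in the working directory
file.exists("test_data.csv")
# Only if needed: point R to another folder (placeholder path, replace with yours)
# setwd("path/to/your/exercise/folder")
```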

------------------------------------------------------------------------

## 2.2. Loading Your Libraries

When you install R, your computer can understand and execute a number of commands. This is what is known as "Base R". The power of R, however, is that you can expand the number of commands your computer understands by installing and loading additional R packages (also called libraries). There are R packages specialized for pretty much any area of biology, providing the capability to analyze data from the level of genes and genomes to ecosystem-level processes. We will frequently use a package called `ggplot2`, which allows for plotting data. Depending on the module, you will need to install additional libraries. To download and install new R packages, go to "Tools \> Install Packages..." and type in the name of the package you want to install. Alternatively, you can use the `install.packages()` function. For example, execute the following code chunk to install `ggplot2`:

```{r}
#Install ggplot2
#install.packages("ggplot2")
```

Note that you only need to install each package once (unless you reinstall R). I recommend deleting the code chunk above after you run it successfully. Alternatively, you can silence it with a hash tag at the beginning of `install.packages("ggplot2")`. Failure to do so can cause problems during the export (knitting) of your R Notebook as an \*.html file.
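Instead of deleting or silencing the chunk, you can install a package only when it is missing. The sketch below uses base R's `requireNamespace()`, which returns `FALSE` without raising an error when a package is not installed. The `repos` URL points to the main CRAN mirror. That choice is an assumption, and you can swap in any other mirror.

```{r}
# Install ggplot2 only if it cannot be found on this computer
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
```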

To make use of installed packages, you also need to load the packages *every time* you use R (*i.e.*, every time you restart the program). You can do this with the `library()` command, and you will find a code snippet prompting you to load all needed libraries at the beginning of each R Notebook (in a section that is typically called dependencies). You can try it here by executing the code chunk below to load `ggplot2`:

```{r}
#Note that loading a library does not lead to an output
library(ggplot2)
```
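Because `library()` prints nothing when it succeeds, you may want to confirm that a package is actually loaded. Here is a small sketch using base R functions:

```{r}
library(ggplot2)
# TRUE if ggplot2 is attached to the current R session
"package:ggplot2" %in% search()
# Which version of ggplot2 is installed
packageVersion("ggplot2")
```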

------------------------------------------------------------------------

## 2.3. Importing Data

One of the reasons we're working through the coding basics here is, of course, that you will work with actual data. To do that, you will need to import data into R. With every exercise, we will provide you with one or more data sets. These data sets will mostly come as \*.csv files (CSV stands for comma-separated values). They are essentially text files containing data tables, and you can also open these files in Excel or other programs. To import data, we will use the `read.csv()` function. In the code chunk below, you can import a simple test data set ("test_data.csv") that includes the variables sex, length, and mass for a population of an animal. Note that I generated the input files on a Mac. The `fileEncoding = 'UTF-8-BOM'` argument tells R to read the file as UTF-8 text and to drop the invisible byte-order mark at the start of the file, if there is one. This prevents some import issues for those of you who use a PC.

```{r}
#This line of code prompts the computer to read the "test_data.csv" file from the working directory and generate a data.frame called test.data
test.data <- read.csv("test_data.csv", fileEncoding = 'UTF-8-BOM')
```

If this worked correctly, you should now see this data set as `test.data` in your global environment (top right panel). You can double click it to view it. There should be three columns: sex, length, and mass.
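You can also check an imported data set directly in code. The sketch below writes the first rows of R's built-in `iris` data to a temporary \*.csv file and reads it back, so it runs regardless of where your exercise files are. With the exercise data, you would simply call `str(test.data)` or `head(test.data)`.

```{r}
# Write a small example table to a temporary *.csv file
tmp <- tempfile(fileext = ".csv")
write.csv(head(iris, 5), tmp, row.names = FALSE)
# Read it back in the same way as test_data.csv
demo <- read.csv(tmp)
str(demo)    # column names, column types, and number of rows
head(demo)   # first six rows (here all five)
names(demo)  # column names only
nrow(demo)   # number of rows
```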

------------------------------------------------------------------------

# 3. Making Figures

A key learning objective of this course is that you learn to visualize data in different ways to facilitate data interpretation in the context of different evolutionary hypotheses. In the following sections, I will explain step by step (that is, code line by code line) how to make a simple graph with our sample data set. Let's aim to make a scatter plot showing the relationship between length and mass in our species. The process is not much different from sketching a graph by hand and layering different parts of the graph on top of each other, except that you use words (code) to make the computer draw.

------------------------------------------------------------------------

## 3.1. Define the Axes and Coordinate System

The first step of making any graph is to define the axes and establish the coordinate grid that allows for the plotting of the data. You can do this by calling the `ggplot()` function. Within that function, you first need to specify the data source (in our case the data frame we just created, called `test.data`). You then specify the so-called aesthetics (`aes()`), which contain information about which variables define the x and y axes. In practice, this is accomplished with the following line of code:

```{r}
#This line of code calls the ggplot function (a plotting function) and makes a grid based on the test.data data frame, using length as the x axis and mass as the y axis
ggplot(test.data, aes(x=length, y=mass))
```

------------------------------------------------------------------------

## 3.2. Adding a Layer with Data Points

The second step is to draw the data into the established coordinate system. To do so, you just need to tell the program what kind of graph you want to draw. Different graph types in `ggplot2` are referred to as geoms (geometries), and a scatter plot is designated as `geom_point()`. You can add that to your existing code with a plus sign. For an overview of some of the graph types (geoms) `ggplot2` offers, check the [appendix](https://www.k-state.edu/biology/p2e/graph-library.html) of our textbook.

```{r}
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point()
```
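Because each `+` adds a layer, you can also store the coordinate system in an object and add layers to it later. This sketch uses R's built-in `mtcars` data (car weight versus fuel efficiency) rather than the test data, so that it runs on its own. The object names `base_plot` and `scatter` are arbitrary.

```{r}
library(ggplot2)
# Step 1: the empty coordinate system, stored in an object
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg))
# Step 2: add the points layer to the stored object
scatter <- base_plot + geom_point()
# Printing the object draws the graph
scatter
```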

------------------------------------------------------------------------

## 3.3. Adding a Trendline

Whenever we look at the relationship between two variables, we may want to add a trendline. You can add a trendline by adding the `geom_smooth()` function to your existing code, and `method="lm"` designates that your trendline should be linear. The `se` argument designates whether or not you want to draw an error estimate around your trendline.

```{r message=FALSE}
#The code within the parentheses of the geom_smooth command specifies some additional options, namely that we want to draw a straight line (method="lm") and that we do not want to show the confidence interval (se=FALSE). Set se=TRUE and see what happens.
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
```
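With `method="lm"`, `geom_smooth()` draws the fitted line from a linear model of y on x. If you want the numbers behind that line (intercept and slope), you can fit the same model yourself with base R's `lm()`. The sketch below uses the built-in `mtcars` data. With the test data, the equivalent call would presumably be `lm(mass ~ length, data = test.data)`.

```{r}
# Linear model of fuel efficiency (mpg) as a function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
# Intercept and slope of the straight line that geom_smooth(method = "lm") would draw
coef(fit)
```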

------------------------------------------------------------------------

## 3.4. Changing the Axes Labels

The variable names in the data set do not always provide the clearest description of what a variable means. We can modify the x and y axis labels using the `xlab()` and `ylab()` functions, respectively. The actual titles need to be written within quotation marks:

```{r}
#Simply add the new label text in quotation marks
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg")
```
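`ggplot2` also offers `labs()`, which sets several labels in one call, including an optional title. Here is a sketch with the built-in `mtcars` data. The label text is only an example.

```{r}
library(ggplot2)
labelled <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # x, y, and title are all set in a single labs() call
  labs(x = "Weight in 1000 lbs",
       y = "Fuel efficiency in miles per gallon",
       title = "Weight versus fuel efficiency")
labelled
```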

------------------------------------------------------------------------

## 3.5. Change the Theme

I honestly hate the default theme of `ggplot2` with its gray background. But you can quickly alter the look of the graph by switching to one of several other themes. I personally like `theme_classic()`, but you can customize the look of your graph with the themes listed [here](https://www.datanovia.com/en/blog/ggplot-themes-gallery/#basic-ggplot).

```{r}
ggplot(test.data, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg") +
  theme_linedraw()
```
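If you settle on one theme, `theme_set()` makes it the default for every later graph in the session, so you do not need to add it to each plot. The sketch below restores the previous default at the end, so later chunks are unaffected. `base_size` (the base font size) is a standard argument of the built-in themes. The value 14 is an arbitrary choice.

```{r}
library(ggplot2)
# Make theme_classic() the default and keep the old default in old_theme
old_theme <- theme_set(theme_classic(base_size = 14))
# This plot is drawn with theme_classic() without adding it explicitly
themed <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(themed)
# Restore the previous default theme
theme_set(old_theme)
```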

Et voilà! You got yourself a perfectly good graph! As you practice building graphs throughout the semester, make sure to check the "Practical Skills" sections of individual chapters and refer to the appendix of the book as needed.

To get additional advice on how to work with different color schemes in `ggplot2`, including the use of colorblind-friendly palettes, please check the [corresponding textbook section](https://www.k-state.edu/biology/p2e/evidence-for-evolution.html#graphing-data).
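One colorblind-friendly option is the viridis palettes that ship with `ggplot2`. `scale_color_viridis_d()` applies them to a discrete (categorical) variable. The sketch below uses the built-in `mtcars` data and colors points by number of cylinders. With the test data, you could presumably map `color = sex` instead.

```{r}
library(ggplot2)
colored <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_viridis_d(name = "Cylinders") +  # colorblind-friendly discrete palette
  xlab("Weight in 1000 lbs") +
  ylab("Miles per gallon") +
  theme_classic()
colored
```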

------------------------------------------------------------------------

# 4. Your First Data Set: Darwin's Finches

One of the most iconic study systems in evolutionary biology is Darwin's finches on the Galapagos Islands. Rosemary and Peter Grant devoted much of their lives to studying these birds, examining how their traits change in response to major ecological perturbations. To do so, they collected a massive, long-term data set on different traits of the medium ground finch (*Geospiza fortis*) population on Daphne Major Island. For this exercise, we will take a look at their beak size data from 1972-1994.

![](finch.jpg)

------------------------------------------------------------------------

## 4.1. Import data

The beak size data can be found in a file called "finches.csv". The file includes three variables: year, the average relative beak size (rel.beak.size), and the standard error (st.err) that describes the variability of beak size in any given year.

```{r}
finches <- read.csv("finches.csv", fileEncoding = 'UTF-8-BOM')
```

------------------------------------------------------------------------

## 4.2. Plotting the Data

The following code chunk provides the base code to make a scatter plot as above. You will only have to specify the x and y variables and label the axes correctly.

```{r}
ggplot(finches, aes(x=year, y=rel.beak.size)) +
  geom_point() +
  xlab("Year") +
  ylab("Relative beak size") +
  theme_classic()
```

------------------------------------------------------------------------

## 4.3. Adding Additional Graphical Elements

There are two graphical elements that we can add to facilitate the interpretation of the data:

1.  Since this is a time series, it makes sense to connect the dots representing the means from year to year. You can do this by simply adding another geom: `geom_line()`.
2.  We want to know how much the average beak size changes relative to the variability in the population. If variability is high, year-to-year variation in the average may be negligible. But if variability is low, changes across years may actually be substantial. You can do this by adding another geom: `geom_errorbar()`. Make sure to specify the x and y axes variables as above.

```{r}
ggplot(finches, aes(x=year, y=rel.beak.size)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin=rel.beak.size-st.err, ymax=rel.beak.size+st.err))  +
  xlab("Year") +
  ylab("Relative beak size") +
  theme_classic()
```
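To see how error bars of this kind are built, here is a self-contained sketch with R's built-in `airquality` data. It computes a monthly mean and its standard error, then plots them the same way as the finch data. It uses the usual standard error of a mean (standard deviation divided by the square root of the sample size). The finches.csv file does not say exactly how st.err was computed, so treat the link to the finch data as an assumption. The `width` argument, which narrows the error-bar caps, is optional.

```{r}
library(ggplot2)
# Built-in airquality data: daily maximum temperature (Temp, degrees F) in New York, May to September 1973
temp_mean <- tapply(airquality$Temp, airquality$Month, mean)
# Standard error of the mean: standard deviation divided by the square root of n
temp_se <- tapply(airquality$Temp, airquality$Month, function(x) sd(x) / sqrt(length(x)))
monthly <- data.frame(month = as.numeric(names(temp_mean)),
                      mean.temp = as.numeric(temp_mean),
                      st.err = as.numeric(temp_se))
ggplot(monthly, aes(x = month, y = mean.temp)) +
  geom_point() +
  geom_line() +
  # width sets how wide the horizontal caps of the error bars are
  geom_errorbar(aes(ymin = mean.temp - st.err, ymax = mean.temp + st.err), width = 0.2) +
  xlab("Month") +
  ylab("Mean daily maximum temperature in degrees F") +
  theme_classic()
```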

------------------------------------------------------------------------

## 4.4. Interpretation

------------------------------------------------------------------------

### 4.4.1. General patterns

Based on the graphs you just made, what do you observe? How do you interpret the data if I told you that 1977 was a massive drought year?

The beaks were smallest in 1975, and after 1977 beak size stayed fairly consistent. If 1977 was a massive drought year, the average relative beak size must have increased because the finches were eating foods that provided them with water. Because these foods were outside of their normal diet, eating them could have increased their beak size.

------------------------------------------------------------------------

### 4.4.2. Evolution... or Not?

Do you think these data reflect evolutionary change through time? What is a potential alternative explanation? What additional information would you need to either accept or reject the hypothesis that these patterns reflect evolutionary change?

The data could also reflect parents passing down a beak size that they acquired during their lifetime. The change in relative beak size seems rapid in the graph, so that could be an explanation. To show that it is evolutionary change, I would want to see the relative beak size of different generations.

------------------------------------------------------------------------

# 5. Resources

------------------------------------------------------------------------

## 5.1. Data References

Data on beak size variation in Darwin's finches came from the following publication:

-   Grant, PR & BR Grant. 2002. [Unpredictable evolution in a 30-year study of Darwin's finches](https://science.sciencemag.org/content/296/5568/707). *Science* 296, 707-711.

------------------------------------------------------------------------

## 5.2. Resources You Consulted

Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.

I have not used any additional resources.
