BIOL 204L Spring 2026
Homework 2: R Packages and Introduction to Data Frames
There is an R script associated with this homework in your section’s shared Posit Cloud Workspace. Please open this R script and complete it as you work through the material below. The background information and examples include code that you can copy into your own R script. These are followed by an exercise for you to complete by writing your own code. Make sure to #comment all of your code.
What are Packages?
Last week, we went through some R basics on how to create objects and use functions. Because R is open-source software, anyone can create their own functions and publish them in “packages.” There are a TON of excellent packages that can be used for very specific research purposes. For this class, we will focus on those that help with organizing/analyzing data (dplyr) and creating publication-quality figures (ggplot2).
Installing and Loading Packages: Step 1
Before you can use the functions stored in a package, the package needs to be installed in your Posit Cloud account workspace. This can be done by either using the install.packages() function or manually installing. Let’s start by installing the ggplot2 package into your workspace.
OR
install.packages("ggplot2") #install ggplot2 package (the package name must be in quotes)
Once you have a package installed, you need to load it in order to access the functions within it. This is done with the following code:
library(ggplot2) #load ggplot2 package
You will want to load the appropriate packages at the very start of your R script, after the assignment title and your name, so that all of the code below can use the functions in that package. Order is very important in R.
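As a rough sketch, the top of a script might be laid out like this. The title and name comments are placeholders to replace with your own, and the sketch assumes ggplot2 has already been installed.

```r
#Homework 2: R Packages and Introduction to Data Frames
#Your Name (placeholder)

library(ggplot2) #load packages first, before any code that uses them

#everything below this line can now use ggplot2 functions
exists("ggplot") #checks that the ggplot() function is available after loading
```

If a function from a package is called above the library() line, R will report that it cannot find the function.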
What are Data Frames?
A data frame is a two-dimensional, tabular data structure that is very similar to a spreadsheet in Google Sheets. All of the data that we upload into Posit Cloud from lab will be stored in a data frame. For this assignment, we will be using one of the data sets already built into R. During lab, we will be learning to upload data we collect as a data frame.
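To make the spreadsheet comparison concrete, here is a small sketch that builds a data frame by hand with the base R data.frame() function. The object name plants and all of its values are made up for illustration and are not part of the homework data.

```r
#hypothetical example data, invented for illustration only
plants = data.frame(
  species = c("A", "B", "C"), #one column of text
  height_cm = c(10.5, 12.0, 9.8) #one column of numbers
)

nrow(plants) #number of rows
ncol(plants) #number of columns
plants #print the data frame to the console
```

Each column holds one type of data and each row holds one observation, just like a tidy spreadsheet.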
Loading and Exploring the Iris Data Set: Step 2
The iris data set includes measurements of 50 flowers from each of three species of iris, for 150 flowers in total (Anderson, 1935; Fisher, 1936). Create a data frame to store the iris data set with the following code.
mydata = as.data.frame(iris) #store iris data set in a data frame (no quotes around iris)
You should notice that your data frame (mydata in this example) appears in your environment window. You can see this data set by clicking on the data frame itself
OR
using the View() function (note the capital V).
View(mydata) #view data frame storing iris data set
We worked with vectors last week in lab. Remember that a vector is simply a list of data that is all the same type. We can essentially treat each column of a data frame in R as a vector. To check what type of data is stored in each column, you can use the str() function.
str(mydata) #view structure of data frame storing iris data set
Notice when you view the structure of the data frame that there is a “$” symbol in front of each of the column names. That symbol indicates to R that you are looking at a specific column of a data frame. We can extract or manipulate specific columns by using this “$” symbol. This can be useful if, for example, we want to calculate the mean of just sepal length.
mean(mydata$Sepal.Length) #calculate the mean sepal length
Creating a Basic Plot in ggplot2: Step 3
ggplot2 is a widely used package to generate publication worthy figures. You can make bar charts, scatter plots, box plots and more in ggplot2. There are also many ways to customize your figures. A general overview of this package with links to resources can be found here.
The general format for using the ggplot() function is as follows:
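In rough terms, a ggplot() call names a data frame, maps columns to the x and y axes inside aes(), and then adds a geom function that decides how the data are drawn. The sketch below shows this usual pattern; it is not the only way to call the function. It uses the built-in iris data directly so that it runs on its own, and the object name p is just for illustration.

```r
library(ggplot2) #load ggplot2 package

#general pattern: ggplot(DATA, aes(x = X_COLUMN, y = Y_COLUMN)) + GEOM_FUNCTION()
p = ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) + #data frame and axis mapping
  geom_point() #layer that draws one point per row

p #typing the object name draws the plot
```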
For this assignment, we will provide you some code for generating simple plots. This is the most simple version of a plot.
ggplot(mydata, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point() #create scatter plot of sepal length v. sepal width
Then we can also add more elements to customize the plot. You always need to include a “+” symbol between each element that you add or change. Below we are changing the x and y axis labels and setting the overall theme of the graph to theme_classic(), a classic-looking theme with x and y axis lines and no gridlines.
ggplot(mydata, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point() + #create scatter plot of sepal length v. sepal width
  theme_classic() + #format figure to remove background and gridlines
  xlab("Sepal Length (cm)") + #change x axis label
  ylab("Sepal Width (cm)") #change y axis label
There are MANY ways to customize figures made with ggplot2 functions. For example, you can set the color based on a categorical variable, like species in the iris data set, by including code within the aes() function.
ggplot(mydata, aes(x=Sepal.Length, y=Sepal.Width, colour = Species)) +
  geom_point() + #create scatter plot of sepal length v. sepal width
  theme_classic() + #format figure to remove background and gridlines
  xlab("Sepal Length (cm)") + #change x axis label
  ylab("Sepal Width (cm)") #change y axis label
We will learn more about figure customization in later labs.
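Because every element is joined with “+”, a plot can also be stored in an object and built up in steps. The sketch below is one way this might look. The object names base_plot and final_plot are made up for illustration, and the built-in iris data is used directly so the code runs on its own.

```r
library(ggplot2) #load ggplot2 package

base_plot = ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() #scatter plot stored in an object (nothing is drawn yet)

final_plot = base_plot +
  theme_classic() + #add the classic theme to the stored plot
  xlab("Sepal Length (cm)") + #change x axis label
  ylab("Sepal Width (cm)") #change y axis label

final_plot #typing the object name draws the plot
```

Note that each “+” sits at the end of a line. If a line ends without one, R treats the plot as finished and the next element is not added.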
Before Lab Assignment
Make sure you have executed all of the above code in your R script (do NOT delete this), and then create two more scatter plots using data from the iris data set, which should already be stored in a data frame in your environment. For example, you could make a scatter plot of sepal length vs. petal length. Make sure your x and y axes are labeled appropriately and that you use #comments! All units are in centimeters. Hint: Once you have code, you can copy it and change the data it references.
Extra Credit: One extra credit point will be awarded if you explore ggplot2 resources and add one more customization to your plot. You need to list the source of where you learned the code and explain in a comment what it does.