One-Way ANOVA in R Tutorial

Today we will be conducting an ANOVA in R. R is a free statistical programming language widely used by biologists. This tutorial assumes no prior knowledge of R.

ANOVA Background: ANOVA stands for Analysis of Variance. This is a statistical test that compares the means of different groups; its null hypothesis is that all of the group means are equal. We will be conducting a one-way ANOVA, which means we have one factor. A factor is the categorical variable whose effect we’ll be evaluating. Essentially, a one-way ANOVA extends the two-sample t-test to more than 2 groups.
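A quick way to see the t-test connection: with exactly two groups, a one-way ANOVA’s F value equals the square of the equal-variance (Student’s) t statistic, and the two p values are the same. The sketch below checks this using only the Fertilizer A and B heights from the practice dataset, typed in by hand; the object names are our own choices.

```r
# Heights for Fertilizer A and B, typed in from the practice dataset
heights <- c(15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
             18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7)  # B
group <- factor(rep(c("A", "B"), each = 10))

# Student's t-test (equal variances assumed, just like ANOVA)
t_res <- t.test(heights ~ group, var.equal = TRUE)

# One-way ANOVA on the same two groups
aov_table <- summary(aov(heights ~ group))[[1]]

t_squared <- unname(t_res$statistic)^2
F_value   <- aov_table[["F value"]][1]

c(t_squared = t_squared, F_value = F_value)                 # the two numbers agree
c(t_p = t_res$p.value, anova_p = aov_table[["Pr(>F)"]][1])  # so do the p values
```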

Tutorial dataset: This practice dataset evaluates the effect of fertilizer type on plant growth.

Independent Variable: Fertilizer Type (A, B, C, Control)
Dependent Variable: Plant Height (in cm)
Sample Size: 10 plants per fertilizer treatment
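Behind this description is a “long” layout: one row per plant, with one column naming the group and one column holding the measurement. As a sketch, here is the practice dataset built directly in R (values copied from the dataset printed later in this tutorial) so you can see that layout and confirm the sample sizes:

```r
# One row per plant: a grouping column and a measurement column
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

table(File$Fertilizer)  # 10 plants in each fertilizer treatment
nrow(File)              # 40 plants in total
```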

You should run through the tutorial with the practice dataset first. Make sure all of your outputs match the ones shown here. Once you’re successful, modify the code for your own dataset!

Find the practice dataset on D2L.

Downloading R:

Step 1: The first thing we need to do is download the R software.

Windows: On your personal computer, visit this link https://cran.r-project.org/bin/windows/base/ to download the latest version of R.

Mac: On your personal computer, visit this link https://cran.r-project.org/bin/macosx/ to download the latest version of R.

If you open R, it will look like this. However, this is only a console.

R Console

Step 2: Unless you’re familiar with command lines, R is going to be difficult to work in, which is why you will also download RStudio. RStudio is an integrated development environment (IDE) for R. Follow this link to locate the download links for Windows and Mac: https://posit.co/download/rstudio-desktop/

When you open RStudio, it should look like this (though your color theme may differ).

RStudio

You’ll notice that the program opens with 4 panes.

Source is where you write, edit and save R scripts
Console is where R runs your commands and prints the results
Environment is where the R objects you create during your session are stored
Output is where you’ll see plots and tables

Getting the data:

Now that you have RStudio installed, you need to import your data from Excel into R.

Step 1: Create a new R script. File –> New File –> R Script

R Script

Step 2: Set your working directory. This tells R which folder on your computer you’re working out of. Navigate to Session –> Set Working Directory –> Choose Directory, then navigate to the folder your csv file is in.

setwd
Now, look down at the console. setwd means set working directory. The text in quotes is the path to the folder. Yours will look different!

setwd

Tips and Hints: You can always check your current working directory by typing getwd() into your console! This is helpful if you move between directories and get lost.

setwd("C:/Users/rache/OneDrive - University of North Georgia/Course/BotanyAnova")

Alternatively, if you are comfortable with paths, you can set your working directory manually by typing setwd() with the absolute path to the folder you will work out of, in quotes. On Windows, use forward slashes (/) in the path, as in the example above.
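You can also try these commands from code. In this sketch, R’s temporary folder (tempdir()) stands in for your own course folder so the example runs on any computer; swap in your own path. It relies on setwd() invisibly returning the previous working directory, which lets you switch back afterwards. The object name old_dir is our own.

```r
# Where is R looking right now?
getwd()

# Switch folders; tempdir() is only a stand-in for your own project folder
old_dir <- setwd(tempdir())
getwd()

# Is the csv visible from here? (FALSE unless practicedata.csv is in this folder)
file.exists("practicedata.csv")
list.files(pattern = "\\.csv$")  # every csv file in the working directory

# Return to the folder we started in
setwd(old_dir)
```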

Step 3: Import the csv file containing your data. You need 3 things to import the csv file:

The name of the file
The function read.csv()
A name for the dataframe you’ll store the data in. You get to choose this. I’ll name mine “File”

File <- read.csv("practicedata.csv")

This piece of code tells R to read the csv file named practicedata.csv and store it as a dataframe named File. The name you give your dataframe can be anything, but it’s best practice to make it something meaningful.

Hit Run

Run

Once we hit Run, that line of code is sent to the console and R does its thing: it performs the action we specified. How do we know it worked? Take a look at the Environment pane; there’s something new there.

Under Data there is a row that says “File 40 obs. of 2 variables”. What does that mean?

Well, File is what we named our dataframe. If you look at our csv file, we have 40 observations (one row per plant). There are 2 columns (variables) in the data: the fertilizer type (A, B, C, or Control) and the plant height.
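If you want to practice this step without the csv file, this sketch writes the practice data (typed in from the printed dataset) to a temporary csv file, reads it back with read.csv(), and checks the same “40 obs. of 2 variables” summary from the console. The temporary file stands in for practicedata.csv, and the object names practice and csv_path are our own; row.names = FALSE keeps write.csv() from adding an extra column of row numbers.

```r
# Practice data, typed in from the printed dataset
practice <- data.frame(
  Fertilizer = rep(c("A", "B", "C", "Control"), each = 10),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

# Save to a temporary csv file (a stand-in for practicedata.csv)
csv_path <- tempfile(fileext = ".csv")
write.csv(practice, csv_path, row.names = FALSE)

# Import it just like in Step 3
File <- read.csv(csv_path)

dim(File)    # 40 rows (observations) and 2 columns (variables)
names(File)  # "Fertilizer" "Plant_Height_cm"
str(File)    # the same kind of summary the Environment pane shows
```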

Let’s make sure our dataframe imported correctly. There are a few ways we can do this!

Option 1: Double click on the data in the environment. When we do that, a tab opens in our source box. Look! It’s our data!

Option 1
Option 2: Type the name of your dataframe in your console and hit Enter.

File

##    Fertilizer Plant_Height_cm
## 1           A            15.2
## 2           A            16.8
## 3           A            14.5
## 4           A            17.0
## 5           A            15.5
## 6           A            16.2
## 7           A            15.9
## 8           A            14.8
## 9           A            16.1
## 10          A            15.7
## 11          B            18.1
## 12          B            19.3
## 13          B            17.8
## 14          B            18.5
## 15          B            19.0
## 16          B            18.6
## 17          B            17.9
## 18          B            18.2
## 19          B            19.4
## 20          B            18.7
## 21          C            10.5
## 22          C            11.0
## 23          C            10.0
## 24          C            11.3
## 25          C            10.8
## 26          C            10.7
## 27          C            11.1
## 28          C            10.4
## 29          C            11.0
## 30          C            10.6
## 31    Control            10.5
## 32    Control            11.0
## 33    Control            10.2
## 34    Control            11.3
## 35    Control            10.8
## 36    Control            10.9
## 37    Control            11.1
## 38    Control            10.4
## 39    Control            11.2
## 40    Control            10.7

Both Option 1 and Option 2 give you all the contents of your dataframe. This works for small datasets, but if you have a large dataset, you likely don’t want to use these options.

Instead, you can look at the first few lines of data with Option 3.

Option 3: Use the head() function. I can write head(File). This returns the first few rows of your dataset (six by default). Remember, R is case sensitive, so type head() in lowercase, and put the name of YOUR dataframe within the ().

head(File)

##   Fertilizer Plant_Height_cm
## 1          A            15.2
## 2          A            16.8
## 3          A            14.5
## 4          A            17.0
## 5          A            15.5
## 6          A            16.2
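head() returns six rows unless you tell it otherwise: its n argument sets how many rows to show, and tail() shows the last rows instead. A short sketch, with the practice data typed in so it runs on its own:

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

head(File, n = 3)  # first three rows (plants 1 to 3, all Fertilizer A)
tail(File)         # last six rows (plants 35 to 40, all Control)
```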

Performing the ANOVA

Step 1: The code below runs an ANOVA in R. Let’s break it down.

aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)

aov() This is the function for running an ANOVA in R.
Plant_Height_cm ~ Fertilizer This formula tells R to run the ANOVA using the columns named “Plant_Height_cm” and “Fertilizer”. The dependent variable (plant height) goes to the left of the ~ and the factor (fertilizer type) goes to the right. Remember, we’re looking at the effect of fertilizer type on plant height! How would you set this up for your dataset?
Note that you’ll have to type the column names EXACTLY the way they appear in your csv file. Don’t use spaces in column names; use _ instead.
data = File This tells the aov() function which dataframe to pull the columns from!
aov.output This is the name we are storing the results of the ANOVA under.
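Because the formula has to match your column names exactly, a quick check before fitting can save some head-scratching. This sketch (practice data typed in so it runs on its own; the object name needed is our own) confirms that both columns exist, fits the model, and looks at two parts of the result.

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

# Do the names in the formula exist in the dataframe?
needed <- c("Plant_Height_cm", "Fertilizer")
needed %in% names(File)  # TRUE TRUE: the formula will find both columns

# Response (dependent variable) on the left of ~, factor on the right
aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)

formula(aov.output)      # the model R actually fitted
aov.output$df.residual   # 36, that is, 40 plants minus 4 groups
```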

aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)

Okay... where are the results of our ANOVA?!

Step 2: To view the results of our ANOVA, we need to call our ANOVA table up. We can do this with this line of code.

summary(aov.output)

This table pops up in our console.

##             Df Sum Sq Mean Sq F value Pr(>F)    
## Fertilizer   3  446.3  148.78     480 <2e-16 ***
## Residuals   36   11.2    0.31                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
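Where do the numbers in this table come from? This sketch recomputes them by hand from the practice data (typed in so it runs on its own; object names are our own). Sum Sq splits the variation in plant height into a between-group part (the Fertilizer row) and a within-group part (the Residuals row), Mean Sq divides each sum of squares by its Df, and the F value is the ratio of the two mean squares.

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

y <- File$Plant_Height_cm
g <- File$Fertilizer
k <- nlevels(g)   # number of groups (4)
N <- length(y)    # number of plants (40)

group_means <- tapply(y, g, mean)    # mean height per fertilizer
group_n     <- tapply(y, g, length)  # plants per fertilizer

# Fertilizer row: spread of the group means around the overall mean
ss_between <- sum(group_n * (group_means - mean(y))^2)
# Residuals row: spread of each plant around its own group mean
ss_within  <- sum((y - ave(y, g))^2)

ms_between <- ss_between / (k - 1)   # Df = 3
ms_within  <- ss_within / (N - k)    # Df = 36
F_value    <- ms_between / ms_within

round(c(SS_between = ss_between, SS_within = ss_within,
        MS_between = ms_between, MS_within = ms_within, F = F_value), 2)
```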

What information do we need? Do you see the column labeled Pr(>F)? That’s our p value! The asterisks beside it are significance codes, which are explained in the legend at the bottom of the table.

For Fertilizer, our p value is less than 2e-16 (that is, 2 × 10^-16); in other words, our p value is very close to 0. Do you remember what a small p value means?
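R prints <2e-16 once a p value falls below that display limit. If you want the actual number, you can pull it out of the summary table, or recompute it from the F value and its two Df with pf(), which gives the area under the F distribution to the right of F. A sketch with the practice data typed in (object names are our own):

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)
aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)

# summary() of an aov fit is a list holding one ANOVA table (a data frame)
anova_table <- summary(aov.output)[[1]]
F_value <- anova_table[["F value"]][1]  # first row = Fertilizer
p_value <- anova_table[["Pr(>F)"]][1]

# The same p value from the F distribution with 3 and 36 Df
p_check <- pf(F_value, df1 = 3, df2 = 36, lower.tail = FALSE)

p_value         # the exact (tiny) p value
p_value < 0.05  # TRUE, so we reject the null hypothesis at alpha = 0.05
```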

That’s right! A p value below our significance level (typically 0.05) means we reject the null hypothesis that all the group means are equal. Fertilizer does have an effect on plant height!

But wait... we had a control and 3 fertilizer treatments. Do all of the treatments have an effect?

We don’t know. The ANOVA can only tell us whether fertilizer has an effect, not which groups differ from each other.
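Before running a post hoc test, it can help to look at the group means themselves. This sketch (practice data typed in so it runs on its own) uses tapply() to average plant height within each fertilizer group. The means hint at which groups might differ, but on their own they can’t tell us whether those differences are significant.

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)

# Mean plant height within each fertilizer group
group_means <- tapply(File$Plant_Height_cm, File$Fertilizer, mean)
round(group_means, 2)  # A 15.77, B 18.55, C 10.74, Control 10.81
```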

We need another test!

Tukey Honest Significant Difference (HSD) Test

If our ANOVA p value is significant, we can run a Tukey test to determine where the differences are. A Tukey test is a type of post hoc analysis (meaning you run it after your ANOVA) that compares every pair of group means while keeping the overall (family-wise) error rate at the chosen level, 95% confidence by default.
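To demystify the Tukey output a little, this sketch rebuilds one row of it (B vs. A) by hand. It assumes equal group sizes (10 plants each, as in the practice data), in which case each 95% interval is the difference in means plus or minus q × sqrt(MSE / n), where q is the critical value of the studentized range distribution from qtukey() and MSE is the residual mean square from the ANOVA. Object names are our own.

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)
fit <- aov(Plant_Height_cm ~ Fertilizer, data = File)

mse <- sum(residuals(fit)^2) / fit$df.residual  # residual mean square (about 0.31)
n   <- 10                                        # plants per group (assumed equal)
k   <- 4                                         # number of groups

# Critical value of the studentized range for 95% family-wise confidence
q <- qtukey(0.95, nmeans = k, df = fit$df.residual)
half_width <- q * sqrt(mse / n)

means   <- tapply(File$Plant_Height_cm, File$Fertilizer, mean)
diff_BA <- unname(means["B"] - means["A"])
c(diff = diff_BA, lwr = diff_BA - half_width, upr = diff_BA + half_width)

# Should match the B-A row from TukeyHSD()
TukeyHSD(fit)$Fertilizer["B-A", c("diff", "lwr", "upr")]
```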

TukeyHSD(aov.output)

TukeyHSD This is the function for a Tukey test
(aov.output) This is the name you stored your ANOVA results under in the previous steps.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Plant_Height_cm ~ Fertilizer, data = File)
## 
## $Fertilizer
##            diff        lwr        upr     p adj
## B-A        2.78  2.1094219  3.4505781 0.0000000
## C-A       -5.03 -5.7005781 -4.3594219 0.0000000
## Control-A -4.96 -5.6305781 -4.2894219 0.0000000
## C-B       -7.81 -8.4805781 -7.1394219 0.0000000
## Control-B -7.74 -8.4105781 -7.0694219 0.0000000
## Control-C  0.07 -0.6005781  0.7405781 0.9921101
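With more groups, the Tukey table gets long. TukeyHSD() returns a list with one matrix per factor (here $Fertilizer, with columns diff, lwr, upr and p adj), so you can filter it for the pairs whose adjusted p value is below 0.05. A sketch with the practice data typed in (object names are our own):

```r
# Practice data, typed in from the printed dataset
File <- data.frame(
  Fertilizer = factor(rep(c("A", "B", "C", "Control"), each = 10)),
  Plant_Height_cm = c(
    15.2, 16.8, 14.5, 17.0, 15.5, 16.2, 15.9, 14.8, 16.1, 15.7,  # A
    18.1, 19.3, 17.8, 18.5, 19.0, 18.6, 17.9, 18.2, 19.4, 18.7,  # B
    10.5, 11.0, 10.0, 11.3, 10.8, 10.7, 11.1, 10.4, 11.0, 10.6,  # C
    10.5, 11.0, 10.2, 11.3, 10.8, 10.9, 11.1, 10.4, 11.2, 10.7   # Control
  )
)
aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)
tukey <- TukeyHSD(aov.output)

# One row per pair of groups
pairs <- tukey$Fertilizer

# Keep only the pairs that differ significantly at the 0.05 level
significant <- pairs[pairs[, "p adj"] < 0.05, ]
rownames(significant)  # every pair except Control-C
```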

And just like that, we’ve completed an ANOVA with only a few lines of code! Here is the whole script:

File <- read.csv("practicedata.csv")
aov.output <- aov(Plant_Height_cm ~ Fertilizer, data = File)
summary(aov.output)
TukeyHSD(aov.output)

Your Turn!

Now, use your Photosynthesis data to conduct an ANOVA and, if needed, a Tukey test in R. Use the practice csv sheet to help you set up your data before importing it into R. Remember to save the Excel spreadsheet as a .csv file, not a .xlsx file (the standard Excel format). In addition, avoid typing in “10”, “20”, “30” for temperature. R will read these as numbers (integers) rather than as categories (factors), so aov() will treat temperature as a single continuous trend instead of comparing three groups, and TukeyHSD() will not be able to compare the temperatures. We can code the numbers as factors, but that’s an additional level of complication. Instead, let’s write “Ten”, “Twenty” and “Thirty” as a workaround.
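To see the difference for yourself, compare the degrees of freedom R gives temperature in each case. The sketch below uses placeholder numbers generated with rnorm() (not real photosynthesis data) and hypothetical Temperature and Photosynthesis variables, purely to show the structure. Entered as numbers, temperature gets 1 Df because R fits a straight-line trend; as a factor it gets 2 Df, one fewer than the three temperature groups.

```r
set.seed(1)  # placeholder values only; these are NOT real measurements

# Hypothetical design: three temperatures, five samples each
Temperature    <- rep(c(10, 20, 30), each = 5)
Photosynthesis <- rnorm(15, mean = 5)  # hypothetical response values

numeric_fit <- aov(Photosynthesis ~ Temperature)             # numbers
factor_fit  <- aov(Photosynthesis ~ as.factor(Temperature))  # categories

summary(numeric_fit)[[1]][["Df"]][1]  # 1: treated as one continuous trend
summary(factor_fit)[[1]][["Df"]][1]   # 2: treated as three groups
```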

For a challenge, you can try using as.factor() after importing your data (if you use ‘10’, ‘20’, ‘30’) to convert your temperature variable to a factor from an integer.

It would look something like this:

File$Temperature <- as.factor(File$Temperature)

This line of code selects the column called “Temperature” within “File” and converts it to a factor.
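Here is that conversion in a self-contained sketch, using a small hypothetical File with temperatures typed as whole numbers (read.csv() would store those as integers). It also shows an optional variation: factor() with the labels argument turns 10, 20, 30 into the Ten/Twenty/Thirty labels suggested above. The TempLabel column name is our own.

```r
# Hypothetical dataframe with temperature typed as whole numbers
File <- data.frame(Temperature = rep(c(10L, 20L, 30L), each = 2))
class(File$Temperature)   # "integer"

# Convert the column to a factor (a categorical variable)
File$Temperature <- as.factor(File$Temperature)
class(File$Temperature)   # "factor"
levels(File$Temperature)  # "10" "20" "30"

# Optional: relabel the levels with words in a new column
File$TempLabel <- factor(File$Temperature, levels = c("10", "20", "30"),
                         labels = c("Ten", "Twenty", "Thirty"))
levels(File$TempLabel)    # "Ten" "Twenty" "Thirty"
```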

Botany ANOVA

Dr. Perez-Udell

2025-03-07