Introduction

Using this cookbook

Change this setting: Tools > Global Options > Code Editing > Soft-wrap R source files

The R code for specific tasks will appear in a box like this that mimics an R script.

The scripts are annotated in plain text to help you along. The following scripts refer to object names that you can or need to edit in CAPS (e.g. INDEPENDENT.VARIABLE, DEPENDENT.VARIABLE, etc.) and these are indicated after each section. All other script text affects functionality of what is performed and can of course be edited if you know what you are doing.
Note that two-parted names including a “.” (e.g. my.name) need not be two-parted and you can edit them as desired (e.g. “myname”). Object names are case sensitive.

In addition to showing the R code, some sections in this cookbook contain examples.

## [1] "outputs will be shown like this, as it would appear in the R console when run."

Example data

The examples shown use ‘iris’ dataset that is provided by R. If you wish to follow along with the examples, everything can be reproduced by copy/pasting & running the provided code. To access the “iris” dataset in R, simply run:

data(iris)

This dataset

How to run code in R:

Open new R script (File > New File > R Script). This will appear in your ‘source’ panel.
Type code into R script.
Highlight code & press ‘command’ + ‘return’ to run
Code will appear in R ‘console’ with outputs if appropriate
Note: Code can be typed directly into the R console and run by pressing ‘return’. However, for reproducibility and editing purposes, it is important to document your work using an R script file. This will save you a lot of time and make things easier down the line.

To test this, type any equation in your script and run to see the solution.
For example:

7*9/(3+4)^2-8

## [1] -6.714286

Getting Started

Set your working directory

define where R will look for and put files.

setwd("C:/Users/YOURNETID/Desktop/")

Replace “YOURNETID” with your case-sensitive net id
PC vs Mac: “C:” Assumes you are on a PC. For Mac, delete “C:”.

To determine your current working directory

getwd()

## [1] "/Users/colleennell/Dropbox/Projects/R_ed"

Importing data

csv:

DF <- read.csv(file="YOURDATA.csv", header=TRUE, sep=",", na.strings = "NA")

txt:

DF <- read.table(file="YOURDATA.txt", header=TRUE, sep=",", na.strings = "NA")

“YOURDATA” must be located in your working directory. You can also provide the absolute path (working directory + filename) as the file location.
**Note that ‘<-’ is used to assign your data file to ‘DF’. Now, ‘DF’ contains your data and will need to be called on to work with it.

View your data

Print in console:

DF

Open dataframe in new window:

View(DF)

Print the first few rows (head) or last few rows (tail) of your data. For example, using the ‘iris’ dataset:

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

tail(iris)

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

Look at the structure (str) of your data:

str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

‘num’ after Sepal.Length indicates that the variable is numeric. Other types include ‘int’ (integer), ‘factor’ (categorical factor), ‘ord’ (ordered) and ‘chr’ (character). This output also tells us that the dataframe contains 150 observations (rows) of 5 variables (columns).

Installing Packages

R provides many built-in functions that allow you to explore & plot your data. The majority of what is covered in this cookbook utilizes these base R tools. In addition, there is a diversity of R packages that can be downloaded and implemented in R to increase its’ power and functionality.

Download packages to your computer:

install.packages("PACKAGE.NAME")

Load them for use:

library(PACKAGE.NAME)

To use a package, it only needs to be loaded once per session.

Two very useful packages for data manipulation and plotting:

install.packages("dplyr") #data manipulation
install.packages("ggplot2") #plotting

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.2.5

library(ggplot2)

Working with your data

Referencing your data

To be able to work with your data you need to be able to refer to your variables. There are several ways to call on a variable within your dataframe.

Using $ to select columns

DF$VARIABLE

Example:

head(iris$Sepal.Width)

## [1] 3.5 3.0 3.2 3.1 3.6 3.9

Selects the ‘Sepal.Width’ variable from the ‘iris’ dataframe.

Selecting columns and/or rows by location in the dataframe

DF[ROW,COL]

Replace ROW and COL with numbers indicating the row and column of interest within the dataframe

Column example:

head(iris[,2])

## [1] 3.5 3.0 3.2 3.1 3.6 3.9

As shown by the output, this is the same as iris$Sepal.Width

Row example:

iris[2,]

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 2          4.9           3          1.4         0.2  setosa

This prints the second row of the iris dataframe.

Providing values for ROW and COL returns the corresponding cell value:

iris[2,2]

## [1] 3

This is the second value in the 2nd row.

You can reference multiple rows and/or columns simultaneously using the ‘:’ operator. ‘:’ = ‘to’, such that

DF[,1:4]

selects columns 1 through 4 from the dataframe. Or listed using ‘c()’

Example:

iris[1:5,c(2,4,5)]

##   Sepal.Width Petal.Width Species
## 1         3.5         0.2  setosa
## 2         3.0         0.2  setosa
## 3         3.2         0.2  setosa
## 4         3.1         0.2  setosa
## 5         3.6         0.2  setosa

This outputs rows 1 through 5 from columns 2, 4, and 5 of the iris dataframe.

Using variable names

Rather than providing the column number, column names can also be used in place of COLNAME (when using names they must be in " “).

DF[,"COLNAME"]

or for multiple columns list the column names in ‘c()’

DF[,c("COLNAME1","COLNAME2")]

Example:

head(iris[,"Petal.Width"])

## [1] 0.2 0.2 0.2 0.2 0.2 0.4

names can be combined with row numbers as well:

iris[1:5,c("Sepal.Width","Species")]

##   Sepal.Width Species
## 1         3.5  setosa
## 2         3.0  setosa
## 3         3.2  setosa
## 4         3.1  setosa
## 5         3.6  setosa

Summarizing Data

Describe your data

Calculate summary statistics on your data frame for each level of a factor.
The resulting data frame calculates statistics for each treatment.

Calculate the mean, standard deviation, and n

In base R there are built in functions to calculate the mean, standard deviation, and n

DATA.SUMMARY <- data.frame(
  treatment=levels(rdata$location),
  mean=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, mean),
  n=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, length),
  sd=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, sd)
)

For example, using the ‘iris dataset where our treatment is ’Species’ and our dependent variable of interest is ‘Sepal.Length’:

iris.summary <- data.frame(
  treatment=levels(iris$Species),
  mean=tapply(iris$Sepal.Width, iris$Species, mean),
  n=tapply(iris$Sepal.Width,iris$Species, length),
  sd=tapply(iris$Sepal.Width, iris$Species, sd)
)

Calculate standard error

There is not a built in function for standard error in base R.
To add standard error of the mean (SEM) & view summary statistics:

DATA.SUMMARY$sem <- DATA.SUMMARY$sd/sqrt(DATA.SUMMARY$n)
DATA.SUMMARY

Using ‘iris’ data:

iris.summary$sem<-iris.summary$sd/sqrt(iris.summary$n)
iris.summary

##             treatment  mean  n        sd        sem
## setosa         setosa 3.428 50 0.3790644 0.05360780
## versicolor versicolor 2.770 50 0.3137983 0.04437778
## virginica   virginica 2.974 50 0.3224966 0.04560791

Standard error can also be written as a function and included in the original code for the same result:

se<-function(x) sqrt(var(x)/length(x))

DATA.SUMMARY <- data.frame(
  treatment=levels(rdata$location),
  mean=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, mean),
  n=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, length),
  sd=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, sd),
  se=tapply(rdata$DEPENDENT.VARIABLE, rdata$location, se)
)

Plot means with error bars

Use summary statistics to make figure with mean and SE error bars Load the ggplot2 library if you have not already:

library(ggplot2)

ggplot(DATA.SUMMARY, aes(x = treatment, y = mean)) +
  geom_bar(position = position_dodge(), stat="identity", fill="grey") +
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.3) +
  labs("TITLE (MOST PLOTS DONT NEED ONE)",x="X AXIS", y="Y AXIS")+
  theme_bw() + theme(panel.grid.major = element_blank())

Make this plot using summary statistics for ‘iris$Sepal.Length’:

ggplot(iris.summary, aes(x = treatment, y = mean)) +
  geom_bar(position = position_dodge(), stat="identity", fill="grey") +
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),width=.3) +
  labs(" ",x="Species", y="Sepal length")+
  theme_bw() + theme(panel.grid.major = element_blank())

Exporting an object

data frame

write.csv(DATA.SUMMARY,"DATA.SUMMARY.csv")

Saves the R data frame named “DATA.SUMMARY” as a comma delimited file called “SUMMARY.STATS.csv” to wherever you have set your working directory.

ANOVAS

One-factor ANOVA

One discontinuous independent variable and a continuous dependent variable, also called one-way ANOVA.

RDATA$INDEPENDENT.VARIABLE<-as.factor(RDATA$INDEPENDENT.VARIABLE)

This defines your independent variable as being discontinuous i.e.categorical.

str(RDATA$INDEPENDENT.VARIABLE)

This displays the structure of the variable INDEPENDENT.VARIABLE to check that it is now a factor variable.

TEST.RESULTS<-aov(DEPENDENT.VARIABLE~INDEPENDENT.VARIABLE, data=RDATA)
TEST.RESULTS

Performs an ANOVA on data frame called “RDATA” with the defined dependent and independent variable and saves the output into the file called “TEST.RESULTS” and displays that object.

summary(TEST.RESULTS)

Provides summary of the ANOVA results which is slightly different from displaying the full results but you could do this too.

Example: One way Anova using ‘iris’ data
Does petal width differ by Species?

iris$Species<-as.factor(iris$Species)
str(iris$Species)

##  Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

test.results<-aov(Petal.Width~Species, data=iris)
test.results

## Call:
##    aov(formula = Petal.Width ~ Species, data = iris)
## 
## Terms:
##                  Species Residuals
## Sum of Squares  80.41333   6.15660
## Deg. of Freedom        2       147
## 
## Residual standard error: 0.20465
## Estimated effects may be unbalanced

summary(test.results)

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Species       2  80.41   40.21     960 <2e-16 ***
## Residuals   147   6.16    0.04                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Boxplot

boxplot(DEPENDENT.VARIABLE~INDEPENDENT.VARIABLE, data=RDATA, main="TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LABEL")

Make a boxplot of the results for each level of the independent variable. These plots have the median (center line), the 25th and 75th quartiles (outside of box), the extreem values (the ends of the wiskers) and outliers (individual circles). These figures are not appropriate in most cases because we want mean and standard error, but as a quick and dirty way of visaulizing your data these plots are nevertheless useful.

Tukey HSD test

TukeyHSD(TEST.RESULTS)

Conducts Tukey Honest Significant Difference test, a post-hoc tests that provides P value for each pairwise comparisons among levels of the independent variable. If your independent variable has 3 levels (e.g. small, medium and large), the P value for the for the ANOVA tells you at least two of these levels differ from each other, but you don’t know which two. This test gives you separate P values for every pair-wise comparison (e.g. small vs. medium, small vs. large, medium vs. large).

Tukey example using ‘iris’ data, testing for pairwise differences between treatment groups (Species), for petal width.

TukeyHSD(test.results)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Petal.Width ~ Species, data = iris)
## 
## $Species
##                      diff       lwr       upr p adj
## versicolor-setosa    1.08 0.9830903 1.1769097     0
## virginica-setosa     1.78 1.6830903 1.8769097     0
## virginica-versicolor 0.70 0.6030903 0.7969097     0

Two-factor ANOVA

Two discontinuous independent variables and a continuous dependent variable, also called two-way ANOVA.

RDATA$INDEPENDENT.VARIABLE1<-as.factor(RDATA$INDEPENDENT.VARIABLE1)
RDATA$INDEPENDENT.VARIABLE2<-as.factor(RDATA$INDEPENDENT.VARIABLE2)

This defines your independent variable as being discontinuous i.e.categorical.

str(RDATA$INDEPENDENT.VARIABLE1)
str(RDATA$INDEPENDENT.VARIABLE2)

This displays the structure of the variables INDEPENDENT.VARIABLE1 and INDEPENDENT.VARIABLE2 to check that they are now factor variables.

TEST.RESULTS<-aov(DEPENDENT.VARIABLE~INDEPENDENT.VARIABLE1+INDEPENDENT.VARIABLE2+INDEPENDENT.VARIABLE1*INDEPENDENT.VARIABLE2,data=RDATA)

Performs a two-factor ANOVA on data frame called “RDATA” with the defined dependent and two independent variables. This analysis includes the main effects of each independent variables as well as the interaction between them. It then saves the output into the file called " TEST.RESULTS“.

summary(TEST.RESULTS)

Provides summary of the two-factor ANOVA results which is slightly different from displaying the full results but you could do this too.

Boxplot

boxplot(DEPENDENT.VARIABLE ~ INDEPENDENT.VARIABLE1, data=RDATA, main="TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LABEL")

Make a boxplot of the results for each level of the independent variable1. These plots have the median (center line), the 25th and 75th quartiles (outside of box), the extreem values (the ends of the wiskers) and outliers (individual circles). These figures are not appropriate in most cases because we want mean and standard error, but as a quick and dirty way of visaulizing your data these plots are nevertheless useful.

boxplot(DEPENDENT.VARIABLE ~ INDEPENDENT.VARIABLE2, data=RDATA, main="TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LABEL")

Makes a boxplot of the results with means and errors for each level of independent variable2.These plots have the median (center line), the 25th and 75th quartiles (outside of box), the extreem values (the ends of the wiskers) and outliers (individual circles). These figures are not appropriate in most cases because we want mean and standard error, but as a quick and dirty way of visaulizing your data these plots are nevertheless useful.

boxplot(DEPENDENT.VARIABLE ~ INDEPENDENT.VARIABLE1*INDEPENDENT.VARIABLE2, data=RDATA, main="TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LABEL")

Makes a boxplot of the results with means and errors for each level of independent variable1 and 2 in combination. These plots have the median (center line), the 25th and 75th quartiles (outside of box), the extreem values (the ends of the wiskers) and outliers (individual circles). These figures are not appropriate in most cases because we want mean and standard error, but as a quick and dirty way of visaulizing your data these plots are nevertheless useful.

Tukey HSD

TukeyHSD(TEST.RESULTS)

Conducts Tukey Honest Significant Difference test, a post-hoc tests that provides P value for each pairwise comparisons among all combinations of the levels of the two independent variable. If one of your independent variable has 3 levels (e.g. small, medium and large) and the other has 3 levels (e.g. slow, fast, extrafast), the P values for these factors for the for the ANOVA tells you at least two of these levels differ from each other, but you don’t know which two. This test gives you separate P values for every pair-wise comparison within each level of each factor (e.g. small vs. medium, small vs. large, medium vs. large) but also for pairwise combinations among factors (e.g. small-slow vs. small-fast, etc.).

ANCOVA

One discontinuous and one continuous independent variables and a continuous dependent variable.

RDATA$INDEPENDENT.VARIABLE.FACTOR<-as.factor(RDATA$INDEPENDENT.VARIABLE.FACTOR)

This defines INDEPENENT.VARIABLE.FACTOR as being discontinuous i.e. a factor variable.

str(RDATA$INDEPENDENT.VARIABLE.FACTOR)

This displays the structure of the variable INDEPENDENT.VARIABLE.FACTOR to check that it is now a factor variable.

RDATA$INDEPENDENT.VARIABLE.NUMERIC<-as.numeric(RDATA$INDEPENDENT.VARIABLE.NUMERIC)

This defines INDEPENDENT.VARIABLE.NUMERIC as being continuous i.e. numeric.

str(RDATA$INDEPENDENT.VARIABLE.NUMERIC)

This displays the structure of the variable INDEPENDENT.VARIABLE.NUMERIC to check that it is now a numeric variable.

TEST.RESULTS<-lm(DEPENDENT.VARIABLE ~ INDEPENDENT.VARIABLE.FACTOR + INDEPENDENT.VARIABLE.NUMERIC + INDEPENDENT.VARIABLE.FACTOR*INDEPENDENT.VARIABLE.NUMERIC, data=RDATA)

Performs an ANCOVA on data frame called “RDATA” with the defined dependent and two independent variables, one continuous and one discontinuous. This analysis includes the main effects of each independent variables as well as the interaction between them. It then saves the output into the object called " TEST.RESULTS“.

summary(TEST.RESULTS)

Provides summary of the ANCOVA results which is slightly different from displaying the full results but you could do this too.

Make a plot showing the two regression lines from the ANCOVA

plot(INDEPENDENT.VARIABLE.NUMERIC,DEPENDENT.VARIABLE,as.numeric(INDEPENDENT.VARIABLE.FACTOR),col=c("blue","red")[as.numeric(INDEPENDENT.VARIABLE.FACTOR)], data=RDATA, main="TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LABEL")

This plots the levels of your discrete independent variable blue for the first alphabetically and red for the second alphabetically.

abline(lm(DEPENDENT.VARIABLE[INDEPENDENT.VARIABLE.FACTOR=="LEVEL1.INDEPENDENT.VARIABLE.FACTOR"]~INDEPENDENT.VARIABLE.NUMERIC[INDEPENDENT.VARIABLE.FACTOR=="LEVEL1.INDEPENDENT.VARIABLE.FACTOR"]),lty=2,col="blue")

Enter the actual value for the level of your discrete independent variable that is fist alphabetically (twice)

abline(lm(DEPENDENT.VARIABLE[INDEPENDENT.VARIABLE.FACTOR=="LEVEL2.INDEPENDENT.VARIABLE.FACTOR"]~INDEPENDENT.VARIABLE.NUMERIC[INDEPENDENT.VARIABLE.FACTOR=="LEVEL2.INDEPENDENT.VARIABLE.FACTOR"]),lty=2,col="red")

Enter the actual value for the level of your discrete independent variable that is second alphabetically (twice)

Simple regression analysis

One continuous independent variables and one continuous dependent variable.

RDATA$INDEPENDENT.VARIABLE<-as.numeric(RDATA$INDEPENDENT.VARIABLE)

This defines your independent variable as being discontinuous i.e.categorical.

str(RDATA$INDEPENDENT.VARIABLE)

This displays the structure of the variable INDEPENDENT.VARIABLE to check that it is now a factor variable.

TEST.RESULTS<-lm(DEPENDENT.VARIABLE ~ INDEPENDENT.VARIABLE, data=RDATA)
TEST.RESULTS

Correlation analysis

Two continuous variables where cause and effect are unclear.

Very similar to a simple regression.

RDATA$INDEPENDENT.VARIABLE1<-as.numeric(RDATA$INDEPENDENT.VARIABLE1)
str(RDATA$INDEPENDENT.VARIABLE1)

This defines your independent variable as being discontinuous i.e.categorical, and displays the structure of the variable INDEPENDENT.VARIABLE1 to check that it is now a factor variable.

RDATA$INDEPENDENT.VARIABLE2<-as.numeric(RDATA$INDEPENDENT.VARIABLE2)
str(RDATA$INDEPENDENT.VARIABLE2)

This defines your independent variable as being discontinuous i.e.categorical, and displays the structure of the variable INDEPENDENT.VARIABLE2 to check that it is now a factor variable.

TEST.RESULTS<-cor.test(RDATA$INDEPENDENT.VARIABLE1, RDATA$INDEPENDENT.VARIABLE2, method = c("pearson")) 
TEST.RESULTS

Plotting

plot(INDEPENDENT.VARIABLE1~INDEPENDENT.VARIABLE2, data=RDATA, main="YOUR TITLE", xlab="X AXIS LABEL", ylab="Y AXIS LAGEL")

To add the correlation line to the plot:

abline(lm(INDEPENDENT.VARIABLE1~INDEPENDENT.VARIABLE2, data=RDATA), col="red", lty=2)

ANOVA+ formulas

The types of variables you have and your experiemntal design are reflected in the formula written to model your data.

ANOVA

As shown above an ANOVA compares the fixed effects of 1 or more dependent variables (X1, X2, …), on your independent variable (Y).
For data frame ‘DF’:

aov(Y~ X1 + X2, data = DF)

This models the additive effects of X1 and X2. If X1 and X2 are crossed,
to test for an interaction between X1 & X1:

aov(Y~X1*X2, data= DF)

aov(Y~X1+X2+X1:X2, data= DF)

is the same, where we indidate we want the additive effects (X1 + X2) AND the interaction (X1:X2). Or simply (X1*X2) to include all combinations.

Nested factors

Random Effects

[Learn more about Formulae in R for ANOVA & mixed effects models] (http://conjugateprior.org/2013/01/formulae-in-r-anova/)

Plotting in R

To create more beautiful custom plots using R try the ‘ggplot2’ package.

Online R resources

www.rseek.org Google search for R only
http://www.statmethods.net –> Quick R, Basic stats for good help outside of R help
http://www.ats.ucla.edu/stat/r/
http://grad.bio.uci.edu/ecoevo/petryw/Introduction_to_R.html
Google is your friend. Searching “how to do ‘X’ in r” will almost always return helpful sites.

Data manipulation/wrangling

include dplyrcheatsheet dplyr tutorial

Writing formulas for in R:

Formulae for ANOVAs, mixed models, nested models
http://conjugateprior.org/2013/01/formulae-in-r-anova/

GGLPOT2

http://www.cookbook-r.com/Graphs/ –> Cookbook For R for GGPLOT2
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf –> GGPLOT2 cheatsheet

Using dplyr

Calculate mean, sd, n, & se using dplyr

The ‘dplyr’ package is design to make data manipulation easy. ‘%>%’ is a pipe. This basically means apply the step following the pipe to the dataframe preceeding the pipe. Using this and some of the functions in dplyr data summarization and manipulation can be very easy.

DATA.SUMMARY<-iris%>%
  group_by(Species)%>%
  summarize_each(funs(mean,sd,se,length))
DATA.SUMMARY

## # A tibble: 3 × 17
##      Species Sepal.Length_mean Sepal.Width_mean Petal.Length_mean
##       <fctr>             <dbl>            <dbl>             <dbl>
## 1     setosa             5.006            3.428             1.462
## 2 versicolor             5.936            2.770             4.260
## 3  virginica             6.588            2.974             5.552
## # ... with 13 more variables: Petal.Width_mean <dbl>,
## #   Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
## #   Petal.Width_sd <dbl>, Sepal.Length_se <dbl>, Sepal.Width_se <dbl>,
## #   Petal.Length_se <dbl>, Petal.Width_se <dbl>,
## #   Sepal.Length_length <int>, Sepal.Width_length <int>,
## #   Petal.Length_length <int>, Petal.Width_length <int>

E166 Field Biology R cookbook

Written by Colleen Nell & Kailen Mooney

last updated 2016-09-26

Introduction

Using this cookbook

Example data

How to run code in R:

Getting Started

Set your working directory

Importing data

View your data

Installing Packages

Working with your data

Referencing your data

Using $ to select columns

Selecting columns and/or rows by location in the dataframe

Using variable names

Summarizing Data

Describe your data

Calculate the mean, standard deviation, and n

Calculate standard error

Plot means with error bars

Exporting an object

data frame

ANOVAS

One-factor ANOVA

One discontinuous independent variable and a continuous dependent variable, also called one-way ANOVA.

Boxplot

Tukey HSD test

Two-factor ANOVA

Two discontinuous independent variables and a continuous dependent variable, also called two-way ANOVA.

Boxplot

Tukey HSD

ANCOVA

One discontinuous and one continuous independent variables and a continuous dependent variable.

Make a plot showing the two regression lines from the ANCOVA

Simple regression analysis

One continuous independent variables and one continuous dependent variable.

Correlation analysis

Two continuous variables where cause and effect are unclear.

Plotting

ANOVA+ formulas

ANOVA

Nested factors

Random Effects

Plotting in R

Online R resources

Data manipulation/wrangling

Writing formulas for in R:

GGLPOT2

Using dplyr

Calculate mean, sd, n, & se using dplyr