Learning Objectives:

In this session, students will learn the basics of the R programming environment and how to work with data sets through an experimental design case study.

  • How to install a package: install.packages()
  • How to call a library: library()
  • How to use a data set built into R: data()
  • Where to find help with R functions and objects: help()
  • How to look at the data: head(), tail(), View()
  • What is the structure of the data: str()
  • How to create and assign variables: <-
  • Which basic built-in R functions can be applied to vectors: sum(), mean(), length()

1. Installing Packages

Since R is open source, you have access to packages written by other scholars. Packages often contain functions and even data sets. You should only need to install a package once.

## STEP 1: Install Packages
#install.packages("tidyverse")

Note the use of parentheses in the install.packages() command.

Observe that the pound sign # is used to “comment out” the code following it. This effectively deactivates the code so that it does not run. When using R Markdown, comment out the install.packages() command before knitting; otherwise knitting will typically stop with an error, usually because no CRAN mirror is set in the non-interactive session.
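
As an alternative to commenting the line out, the install step can be guarded so that it only runs when the package is missing. The sketch below is one way to do that; install_if_missing() is a hypothetical helper name, not a base R function, and the demonstration uses "stats" (which ships with R) so that nothing is downloaded.

```r
# Hypothetical helper (not part of base R): install a package only if it is missing.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is always installed with R, so this call never triggers a download.
install_if_missing("stats")
```

Replacing "stats" with "tidyverse" would install the tidyverse only on the first run, assuming a CRAN mirror is configured.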

2. Calling Libraries

To use the functions in a package, you must load it with library() in each new R session. Installing happens once; loading happens every session.

## STEP 2: Calling the library
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Comment: The tidyverse is a collection of packages that work together and have been built specifically for data analysis and visualization.
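
The Conflicts lines in the output above mean that two loaded packages define a function with the same name, and the most recently loaded one wins. A minimal sketch of how to ask for a specific version with the package::function() notation is shown here; it uses only the stats package, and the vector x is made up for illustration.

```r
# A small made-up vector for illustration
x <- c(1, 2, 3, 4, 5)

# Explicitly call the stats version of filter(): a centered 3-point moving average.
# The first and last values are NA because they lack a neighbor on one side.
moving_avg <- stats::filter(x, rep(1 / 3, 3))
moving_avg
```

With the tidyverse loaded, a bare filter() call refers to dplyr::filter(), so the stats:: prefix is what keeps this example working in either case.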

3. Calling A Built In Data Set

## STEP 3: Data sets in the Environment
data("OrchardSprays")
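
A quick way to confirm that the data set was loaded is to ask what kind of object it is and how large it is. This is a small sketch using base R functions only.

```r
data("OrchardSprays")

# Check that the object now exists in the session
exists("OrchardSprays")

# What kind of object is it, and how many rows and columns does it have?
class(OrchardSprays)
dim(OrchardSprays)
```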

4. Learn About the Data Set

## STEP 4: Learning about the data
help("OrchardSprays")
?OrchardSprays

What does the experimental design look like?

### What does the experiment look like?
ggplot(data=OrchardSprays, aes(x=colpos, y=rowpos, fill=treatment))+
  geom_tile(color="black")  #black outlines
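
The same layout can be cross-checked in base R without any plotting. The sketch below rebuilds the grid as a matrix of treatment letters and then checks the property the help page describes (a Latin square, where every treatment appears once in each row and each column). The name layout is chosen here for illustration.

```r
data("OrchardSprays")

# Fill an 8 x 8 character matrix: layout[row, col] holds the treatment letter
layout <- matrix(NA_character_, nrow = 8, ncol = 8)
layout[cbind(OrchardSprays$rowpos, OrchardSprays$colpos)] <-
  as.character(OrchardSprays$treatment)
layout

# Count how often each treatment occurs in each row and in each column;
# in a Latin square every count equals 1
once_per_row <- all(table(OrchardSprays$treatment, OrchardSprays$rowpos) == 1)
once_per_col <- all(table(OrchardSprays$treatment, OrchardSprays$colpos) == 1)
once_per_row
once_per_col
```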

5. Looking at the Data

A. head

Shows the first six rows of a data frame.

head(OrchardSprays)
##   decrease rowpos colpos treatment
## 1       57      1      1         D
## 2       95      2      1         E
## 3        8      3      1         B
## 4       69      4      1         H
## 5       92      5      1         G
## 6       90      6      1         F

B. tail

Shows the last six rows of a data frame.

tail(OrchardSprays)
##    decrease rowpos colpos treatment
## 59       39      3      8         D
## 60       14      4      8         B
## 61       86      5      8         H
## 62       55      6      8         E
## 63        3      7      8         A
## 64       19      8      8         C
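
Both head() and tail() accept an n argument when six rows is not the number you want. A short sketch follows; the object names first_three, last_two, and all_but_last_60 are chosen here for illustration.

```r
data("OrchardSprays")

# Ask for a specific number of rows
first_three <- head(OrchardSprays, n = 3)
last_two <- tail(OrchardSprays, n = 2)
first_three
last_two

# A negative n in head() drops that many rows from the end instead
all_but_last_60 <- head(OrchardSprays, n = -60)
nrow(all_but_last_60)
```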

C. View

Creates a new window to view the data frame.

#View(OrchardSprays)

6. Structure of an R Object

## STEP 6: Data structure
str(OrchardSprays)
## 'data.frame':    64 obs. of  4 variables:
##  $ decrease : num  57 95 8 69 92 90 15 2 84 6 ...
##  $ rowpos   : num  1 2 3 4 5 6 7 8 1 2 ...
##  $ colpos   : num  1 1 1 1 1 1 1 1 2 2 ...
##  $ treatment: Factor w/ 8 levels "A","B","C","D",..: 4 5 2 8 7 6 3 1 3 2 ...
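
The information that str() prints can also be pulled out piece by piece, which is handy when you only need one part of it. This sketch assumes a freshly loaded copy of the data and uses base R only.

```r
data("OrchardSprays")

# The class of every column
column_classes <- sapply(OrchardSprays, class)
column_classes

# The distinct categories (levels) of the treatment factor, and how many there are
levels(OrchardSprays$treatment)
nlevels(OrchardSprays$treatment)

# A numeric summary of the response column
summary(OrchardSprays$decrease)
```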

7. Base R Graphics

We can create graphics in base R to visualize distributions of data across the treatment groups. What kinds of patterns do you see?

For now, don’t worry about all the mathematical details. We will talk about them later.

## STEP 7: Graphics in base R
boxplot(decrease~treatment, data = OrchardSprays)
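
The thick line inside each box is the group median. To read those values as numbers rather than off the plot, one option is tapply(), which applies a function to each group; group_medians is a name chosen here for illustration.

```r
data("OrchardSprays")

# Median decrease within each treatment group
group_medians <- tapply(OrchardSprays$decrease, OrchardSprays$treatment, median)
group_medians

# The same medians, sorted from smallest to largest
sort(group_medians)
```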

BONUS: Re-ordering Factors

By default, R orders factor levels alphabetically, but we can specify any order we’d like. In this case, we might want to order the treatments by the amount of chemical used (from least to most).

## BONUS: Reorder factors
### Is this the order we want?
OrchardSprays$treatment <- factor(OrchardSprays$treatment, 
                                  levels=c('H', 'G', 'F', 'E', 
                                           'D', 'C', 'B', 'A'))

## Plot again
boxplot(decrease~treatment, data = OrchardSprays)
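
Because the desired order here is exactly the reverse of the alphabetical default, the levels do not have to be typed out by hand. The sketch below is an alternative under that assumption. It reloads the data first so that it starts from the default A to H order; running the reversal twice would flip the order back.

```r
# Start from a fresh copy so the levels are in the default A..H order
data("OrchardSprays")
original_values <- as.character(OrchardSprays$treatment)

# Reverse the existing level order instead of listing the letters manually
OrchardSprays$treatment <- factor(OrchardSprays$treatment,
                                  levels = rev(levels(OrchardSprays$treatment)))
levels(OrchardSprays$treatment)

# Re-leveling changes the ordering only, not which treatment each row received
identical(as.character(OrchardSprays$treatment), original_values)
```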

8. Variable Assignment

We might want to save a column from our data set to work with later. Define the variable response in the Global Environment. We can use either <- or = to store these values, although <- is the usual convention in R. In addition, we use the $ operator to access the decrease column within the OrchardSprays data frame.

## STEP 8: Variable Assignment and $ Operator
response<-OrchardSprays$decrease
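
To see that these choices really are interchangeable, the sketch below extracts the same column three ways and compares the results. The names response_eq and response_br are chosen here for illustration.

```r
data("OrchardSprays")

# Assignment with <- and the $ operator
response <- OrchardSprays$decrease

# Assignment with = gives the same result
response_eq = OrchardSprays$decrease

# Double square brackets with the column name are another way to pull out a column
response_br <- OrchardSprays[["decrease"]]

identical(response, response_eq)
identical(response, response_br)
```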

9. Vectors

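A vector is a one-dimensional sequence of values that all share the same type. A single column of a data frame, such as response, is a vector.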

## STEP 9: Vectors
n<-length(response)
n
## [1] 64
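
Individual elements of a vector are picked out with square brackets, and arithmetic is applied to every element at once. A brief sketch follows; the vector small is made up for illustration.

```r
data("OrchardSprays")
response <- OrchardSprays$decrease

# Confirm that the column is a plain vector
is.vector(response)

# Indexing starts at 1 in R
response[1]                  # first element
response[1:3]                # first three elements
response[length(response)]   # last element

# Build a vector by hand with c(); arithmetic works element by element
small <- c(2, 4, 6)
small * 10
```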

10. Common Functions

There are several functions that are already built into R, such as sum, mean, sd, etc.

## STEP 10: Common functions
## how much solution was consumed in the experiment?
s<-sum(response)
s
## [1] 2907

11. Operations on Variables

## STEP 11: Using variables in operations
## what is the average amount of solution consumed?
s/n
## [1] 45.42188
# verify
mean(response)
## [1] 45.42188
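
The same "compute by hand, then verify" pattern works for other built-in functions. As a sketch, the sample standard deviation can be assembled from sum(), mean(), and length() and compared with sd(); the names deviations and manual_sd are chosen here for illustration.

```r
data("OrchardSprays")
response <- OrchardSprays$decrease
n <- length(response)

# Distance of each observation from the mean
deviations <- response - mean(response)

# Sample standard deviation: square root of the sum of squared deviations over n - 1
manual_sd <- sqrt(sum(deviations^2) / (n - 1))
manual_sd

# verify
sd(response)
all.equal(manual_sd, sd(response))
```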