In this session students will learn the basics of the R programming environment and how to work with data sets through an experimental design case study.
install.packages()
library()
data()
help()
head(), tail(), View()
str()
<-
sum(), mean(), length()
Since R is open source, you have access to packages previously written by other scholars. Packages often contain functions and even data sets. You should only need to install a package once on a given computer.
## STEP 1: Install Packages
#install.packages("tidyverse")
Note the use of parentheses in the install.packages() command.
Observe that the pound sign # is used to “comment out” the code following it. This effectively deactivates the code so that it does not run. When using R Markdown, you must comment out the install.packages() command before knitting or else it will throw an error.
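As an alternative to commenting the line out, the installation can be made conditional, so that it only runs when the package is missing. The sketch below assumes a helper named install_if_missing (a hypothetical name, not part of R) and the CRAN cloud mirror as the repository. It is demonstrated on stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg, repos = repos)  # assumed mirror; swap in your own
  invisible(TRUE)
}

# "stats" ships with R, so this call performs no installation.
needed_install <- install_if_missing("stats")
needed_install

# For this session one would call:
# install_if_missing("tidyverse")
```

Because the helper does nothing when the package is already present, a chunk like this can usually stay active when knitting.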
If you wish to use the functions in a particular package, you must load that package with library() in each new R session.
## STEP 2: Calling the library
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Comment: The tidyverse is a package of packages that work together and have been specially built for data analysis and visualization.
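The Conflicts lines in the start-up message mean that, once the tidyverse is loaded, filter() and lag() refer to the dplyr versions. As a small sketch, the masked base versions remain reachable through the package::function form. The vector x below is made up for illustration.

```r
# Made-up data for illustration.
x <- c(2, 4, 6, 8, 10)

# stats::filter() is the base R linear filter, used here as a centred
# moving average over three values. Calling it with its package prefix
# works whether or not dplyr::filter() is masking it.
moving_avg <- stats::filter(x, rep(1 / 3, 3))
as.numeric(moving_avg)  # the two end points are NA: no full window there
```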
## STEP 3: Data sets in the Environment
data("OrchardSprays")
## STEP 4: Learning about the data
help("OrchardSprays")
?OrchardSprays
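The help page opens in a separate pane, so it can be handy to also check a few basic facts about the data directly in the console. A minimal sketch using only base R functions (dim() and names() are not introduced in the text above):

```r
data("OrchardSprays")        # load the data set into the environment

dims <- dim(OrchardSprays)   # number of rows, then number of columns
dims

col_names <- names(OrchardSprays)  # the column names
col_names
```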
### What does the experiment look like?
ggplot(data = OrchardSprays, aes(x = colpos, y = rowpos, fill = treatment)) +
  geom_tile(color = "black") # black outlines
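The tile plot suggests that each treatment appears exactly once in every row and every column of the grid, which is what the help page describes as a Latin square design. A quick tabular check of that reading, sketched with base R's table():

```r
# Cross-tabulate treatment against row position and column position.
by_row <- table(OrchardSprays$rowpos, OrchardSprays$treatment)
by_col <- table(OrchardSprays$colpos, OrchardSprays$treatment)

# TRUE when every treatment occurs exactly once in each row / column.
once_per_row <- all(by_row == 1)
once_per_col <- all(by_col == 1)
once_per_row
once_per_col
```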
head
Shows the first six rows of a data frame.
head(OrchardSprays)
##   decrease rowpos colpos treatment
## 1       57      1      1         D
## 2       95      2      1         E
## 3        8      3      1         B
## 4       69      4      1         H
## 5       92      5      1         G
## 6       90      6      1         F
tail
Shows the last six rows of a data frame.
tail(OrchardSprays)
##    decrease rowpos colpos treatment
## 59       39      3      8         D
## 60       14      4      8         B
## 61       86      5      8         H
## 62       55      6      8         E
## 63        3      7      8         A
## 64       19      8      8         C
View
Creates a new window to view the data frame.
#View(OrchardSprays)
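Six rows is only the default. Both head() and tail() accept an n argument to control how many rows are returned, as in this short sketch:

```r
first_three <- head(OrchardSprays, n = 3)  # first three rows
last_three  <- tail(OrchardSprays, n = 3)  # last three rows

first_three
last_three
```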
## STEP 6: Data structure
str(OrchardSprays)
## 'data.frame': 64 obs. of 4 variables:
## $ decrease : num 57 95 8 69 92 90 15 2 84 6 ...
## $ rowpos : num 1 2 3 4 5 6 7 8 1 2 ...
## $ colpos : num 1 1 1 1 1 1 1 1 2 2 ...
## $ treatment: Factor w/ 8 levels "A","B","C","D",..: 4 5 2 8 7 6 3 1 3 2 ...
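The same information that str() prints can also be pulled out piece by piece, which is useful when you want to use it in later code. A minimal sketch (sapply(), class() and nlevels() are base R functions not covered in the text above):

```r
# Class of each column, returned as a named character vector.
col_classes <- sapply(OrchardSprays, class)
col_classes

# Number of treatment groups stored in the factor.
n_treatments <- nlevels(OrchardSprays$treatment)
n_treatments
```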
We can create graphics in base R to visualize distributions of data across the treatment groups. What kinds of patterns do you see?
For now, don’t worry about all the mathematical details. We will talk about them later.
## STEP 7: Graphics in base R
boxplot(decrease~treatment, data = OrchardSprays)
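The thick line inside each box is the group median, so one way to back up what you see in the plot is to compute those medians directly. The sketch below uses tapply(), a base R function not introduced above, to apply median() within each treatment group.

```r
# Median decrease within each treatment group.
group_medians <- tapply(OrchardSprays$decrease,
                        OrchardSprays$treatment,
                        median)
group_medians

# Number of observations behind each box.
group_sizes <- table(OrchardSprays$treatment)
group_sizes
```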
By default, R will order factor levels alphabetically, but we can specify any order we’d like. In this case, we might want to order them by the amount of chemical used (from least to most).
## BONUS: Reorder factors
### Is this the order we want?
OrchardSprays$treatment <- factor(OrchardSprays$treatment,
                                  levels = c('H', 'G', 'F', 'E',
                                             'D', 'C', 'B', 'A'))
## Plot again
boxplot(decrease~treatment, data = OrchardSprays)
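Reordering the levels changes how the groups are arranged in plots and tables, but it should not change which treatment each observation received. The sketch below checks this on a copy of the data (the name sprays is made up here) taken from datasets::OrchardSprays, so your own data frame is left untouched.

```r
sprays <- datasets::OrchardSprays   # fresh copy with the default level order
levels(sprays$treatment)            # alphabetical: "A" through "H"

# Same reordering as above, written with rev() on the built-in LETTERS vector.
sprays$treatment <- factor(sprays$treatment, levels = rev(LETTERS[1:8]))
new_levels <- levels(sprays$treatment)
new_levels

# The label attached to each row is the same before and after.
labels_unchanged <- identical(as.character(sprays$treatment),
                              as.character(datasets::OrchardSprays$treatment))
labels_unchanged
```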
We might want to save a column from our data set to work with later.
Define the variable response in the Global Environment. We can use either <- or = to store these values. In addition, we use the $ operator to access the decrease column within the OrchardSprays data frame.
## STEP 8: Variable Assignment and $ Operator
response<-OrchardSprays$decrease
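To illustrate that the two assignment operators store the same values, the sketch below saves the column twice and compares the results. The name response_eq and the [[ ]] form of column access are additions for illustration only.

```r
response <- OrchardSprays$decrease          # arrow assignment, $ access
response_eq = OrchardSprays[["decrease"]]   # equals assignment, [[ ]] access

# TRUE when both objects hold exactly the same values.
same_values <- identical(response, response_eq)
same_values
```

Most R style guides prefer <- for assignment, which is the form used in the rest of this session.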
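Vectors are one-dimensional collections of values of the same type. The response object is a numeric vector, and length() returns the number of values it holds.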
## STEP 9: Vectors
n<-length(response)
n
## [1] 64
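Because a vector is one-dimensional, a single index inside square brackets is enough to pick out elements, and arithmetic is applied to every element at once. A short sketch (the names first_three and doubled are made up for illustration):

```r
response <- OrchardSprays$decrease

is.vector(response)            # checks that response is a plain vector

first_three <- response[1:3]   # elements 1 to 3, matching the head() output
first_three

doubled <- response * 2        # multiplies every element by 2
length(doubled)                # same length as the original vector
```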
There are several functions that are already built into R, such as sum, mean, sd, etc.
## STEP 10: Common functions
## how much solution was consumed in the experiment?
s<-sum(response)
s
## [1] 2907
## STEP 11: Using variables in operations
## what is the average amount of solution consumed?
s/n
## [1] 45.42188
# verify
mean(response)
## [1] 45.42188
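The same "compute by hand, then verify" pattern works for sd(), which returns the sample standard deviation. The sketch below assumes the usual n - 1 denominator, which is what sd() uses; the names avg and manual_sd are made up for illustration.

```r
response <- OrchardSprays$decrease
n <- length(response)

avg <- sum(response) / n                               # mean by hand
manual_sd <- sqrt(sum((response - avg)^2) / (n - 1))   # sample sd by hand
manual_sd

# verify
sd(response)
```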