First, import your data using read.csv. In this case, we decided to call your data ‘sdata’.
sdata <- read.csv("/Users/Joey/Documents/Nutrient_Analysis/data/Book1.csv")
Then, load packages.
(remember that to install these packages we use install.packages(“packagename”), but you only need to do that once, once they are installed, you just need to load them from your library using the library() command.)
library(plyr)
library(ggplot2)
library(visreg)
suppressPackageStartupMessages(library(dplyr))
The next step is to look at your data…A nice first step is to use the str() function. Str stands for structure, and gives you a nice summary of the data structure.
str(sdata)
## 'data.frame': 40 obs. of 6 variables:
## $ Temp : num 38.3 38.5 38.8 41.9 39.3 ...
## $ dO2 : num -5.5 -4.79 -5.84 -6.21 -5.78 ...
## $ treatment: Factor w/ 4 levels "B2","D2","E2",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ T.ev : int 16 16 16 16 16 16 16 16 16 16 ...
## $ T.acc : int 16 16 16 16 16 16 16 16 16 16 ...
## $ Diff : logi NA NA NA NA NA NA ...
You should notice that the variables ‘Temp’ and ‘dO2’ are numeric, and your treatment is a factor. We’ll talk more about the different types of variables next week. But for now, this looks good.
The glimpse() function is another way to glance at your data.
glimpse(sdata)
## Observations: 40
## Variables: 6
## $ Temp (dbl) 38.28689, 38.54129, 38.79909, 41.88061, 39.32518, 39...
## $ dO2 (dbl) -5.496768, -4.791500, -5.843045, -6.214608, -5.77635...
## $ treatment (fctr) G2, G2, G2, G2, G2, G2, G2, G2, G2, G2, D2, D2, D2,...
## $ T.ev (int) 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 28, 28, 28, ...
## $ T.acc (int) 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 28, 28, 28, ...
## $ Diff (lgl) NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
What if you wanted to know a bit more about the data…say the column names, or number of rows etc? Try these to smell test the data and make sure everything looks right. To find out more about any of these functions, just type ?head, ?tail etc.
head(sdata)
## Temp dO2 treatment T.ev T.acc Diff
## 1 38.28689 -5.496768 G2 16 16 NA
## 2 38.54129 -4.791500 G2 16 16 NA
## 3 38.79909 -5.843045 G2 16 16 NA
## 4 41.88061 -6.214608 G2 16 16 NA
## 5 39.32518 -5.776353 G2 16 16 NA
## 6 39.59361 -5.626821 G2 16 16 NA
tail(sdata)
## Temp dO2 treatment T.ev T.acc Diff
## 35 39.32518 -4.779524 E2 16 28 NA
## 36 39.59361 -4.767689 E2 16 28 NA
## 37 39.86573 -4.828314 E2 16 28 NA
## 38 40.14162 -4.853632 E2 16 28 NA
## 39 40.42135 -5.298317 E2 16 28 NA
## 40 40.70501 -5.426151 E2 16 28 NA
ncol(sdata)
## [1] 6
dim(sdata)
## [1] 40 6
nrow(sdata)
## [1] 40
Using the $ helps you pull out just a single column in the dataset. The hist function creates a basic histogram.
head(sdata$Temp)
## [1] 38.28689 38.54129 38.79909 41.88061 39.32518 39.59361
hist(sdata$Temp)
The next step was to rename your treatments to give them more common-sense names…but you can change them any way you like. Here we are using the ‘revalue’ function from plyr, and the ‘mutate’ function from dplyr. These are just handy ways of playing around with your data right here in R, vs going back to excel to change things like variable names. Don’t worry about the syntax using the %>%, which is called a ‘pipe’, we’ll get to that next week.
sdata <- sdata %>%
mutate(treatment = revalue(treatment,
c("G2" = "evol16Acl16",
"B2" = "evol28Acl16",
"E2" = "evol16Acl28",
"D2" = "evol28Acl28")))
First, we visualized the data using ggplot:
ggplot(sdata, aes(x= Temp, y = dO2, color = treatment)) + geom_point()
Then we used the handy visreg function to create some quick and dirty plots of the relationship between O2 consumption and temperature, by treatment. Visreg is a nice way of visualizing simple linear models like the ones we are working with. Here mod1 is your oxygen consumption as a function of Temp, treatment and a temp*treatment interaction. We use overlay = FALSE to display each treatment separately.
mod1 <- lm(dO2 ~ Temp*treatment, data = sdata)
visreg(mod1, xvar = "Temp", by = "treatment", overlay = "FALSE")
Next, I wrote a function which will do two things: First, it will run a basic linear model of DO2 as function of temperature. It will then pull out the parameter estimates we want: slope and confidence intervals. I called the function temp.fit.
temp.fit <- function(df) {
model <- lm(dO2 ~ Temp, data = df) ## this is the linear model
y = coef(model)[2] ## this is your slope estimate, using coef()
ylo = confint(model)[2] ## this gives your lower confidence interval
yhi = confint(model)[4] ## this gives the upper confidence interval
setNames(data.frame(t(c(y, ylo, yhi))), c("beta", "ylo", "yhi")) ## this just renames those estimates
}
Here we run function on your sdata. This shows the slope and confidence intervals for the effect of temp on all of your treatments lumped together.
temp.fit(sdata)
## beta ylo yhi
## 1 -0.2741507 -0.4417532 -0.1065482
Now we apply the function to sdata, but we run separate models for each treatment, using the group_by function.
test <- sdata %>%
group_by(treatment) %>%
do(temp.fit(.)) %>%
ungroup()
Now we use ggplot to plot the slope estimates and confidence intervals (this is the Ea you want, I think.) You can change the axis labels etc. by changing the xlab and ylab lines etc. We use the command ggsave to save the figure as a png.
test$treatment <- factor(test$treatment, levels=unique(test$treatment))
ggplot(test, aes(x=treatment, y=beta, ymin=ylo, ymax=yhi)) +
geom_pointrange() +
# coord_flip() +
geom_hline(aes(x=0), lty=4) +
xlab('Evolution treatment') +
ylab('Slope, Ea') + theme(legend.position="none")
ggsave("figure2.png")
## Saving 7 x 5 in image