Basic geochemistry data that looks like a set of incubations at different sites and multiple sites are grouped into regions. This is the same basic data with columns for Dose, abs.400, Fe.DOC, site, Fe, DOC, Region. See the output of head:
dat <- read.csv('Jenn_TE_Spreadsheet_Fe2_table.csv')
head(dat)
## Dose abs.400 Fe.DOC site Fe DOC Region
## 1 0 0.02100000 0.010000000 BCA 0.04 6.70 Grand
## 2 1 0.01726574 0.005136986 BCA 0.03 6.42 Grand
## 3 2 0.01688548 0.004292554 BCA 0.03 6.06 Grand
## 4 3 0.01660477 0.004825091 BCA 0.03 5.80 Grand
## 5 4 0.01431910 0.004892966 BCA 0.02 4.91 Grand
## 6 0 0.01780000 0.000000000 BELWOOD 0.03 6.35 Grand
The goal here is to get the coefficients and R2 values for the DOC~Dose relationship for each site. This can be done the long-way round by running lm(y~x) on each set of data. Each set of data can be pulled out of dat by using subset. Subset allows you to subset from a large data.frame given some logical conditions such as is equal to (==), is not equal to (!=), is greater than (>), etc. Here we subset on the fly for one site at a time and send the results to lm. For example:
lm(DOC ~ Dose, data = subset(dat, site == "BCA"))
##
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BCA"))
##
## Coefficients:
## (Intercept) Dose
## 6.818 -0.420
lm(DOC ~ Dose, data = subset(dat, site == "BELWOOD"))
##
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BELWOOD"))
##
## Coefficients:
## (Intercept) Dose
## 6.350 -0.213
lm(DOC ~ Dose, data = subset(dat, site == "DE10"))
##
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "DE10"))
##
## Coefficients:
## (Intercept) Dose
## 6.068 0.173
Note the default output gives the intercept and slope coefficients. If the results of the lm are saved as a variable, then other things can be extracted from it, such as:
mod.bca <- lm(DOC ~ Dose, data = subset(dat, site == "BCA"))
coef(mod.bca)
## (Intercept) Dose
## 6.818 -0.420
summary(mod.bca)
##
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BCA"))
##
## Residuals:
## 1 2 3 4 5
## -0.118 0.022 0.082 0.242 -0.228
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.81800 0.16228 42.013 2.97e-05 ***
## Dose -0.42000 0.06625 -6.339 0.00794 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2095 on 3 degrees of freedom
## Multiple R-squared: 0.9305, Adjusted R-squared: 0.9074
## F-statistic: 40.19 on 1 and 3 DF, p-value: 0.007938
This is fine if there are only a few groups of data but can be tedious for a lot of groups. When there area a lot of groups, it may be easier to use plyr or dplyr to do split–apply–combine to the data, where:
split: split the data into the groupings you want (site, in this case)
apply: apply the function you want (lm or coef(lm), in this case)
combine: put the results back together in a way that matching the original splittig characteristics
See http://plyr.had.co.nz/ for details
Idea here is to use dlply to split the data by site, run lm, using the formula, and save all the models back together. Use dlply since the data coming in is a data.frame (d) and the results coming back are a list (l). Hence dlply. (Note above that the results of summary(mod.bca) were a combination of text and numbers that did not look like a regular data.frame so the results have to be stored in a list.) Then to get the info out of the list, use ldply to extract the coef. To get the R2 values in a similar fashion, ddply can be used.
require(plyr)
## Loading required package: plyr
models <- dlply(dat, .(site), lm, formula = DOC ~ Dose)
ldply(models, coef)
## site (Intercept) Dose
## 1 BCA 6.818 -0.420
## 2 BELWOOD 6.350 -0.213
## 3 DE10 6.068 0.173
## 4 DE10-ND 18.360 -0.670
## 5 DI 0.388 0.022
## 6 H4 1.0 6.796 -0.100
## 7 H4 2.0 6.618 0.002
## 8 H4-21 3.296 -0.048
## 9 IHSS 5.762 0.124
## 10 NEIF 6.612 0.022
## 11 NWIF 6.252 0.205
## 12 P1-08 2.514 -0.111
## 13 P2 6.872 -0.229
## 14 POND 6.858 -0.084
## 15 U8 6.512 0.022
ddply(dat, .(site), summarise, r2 = cor(DOC, Dose)^2)
## site r2
## 1 BCA 0.9305367995
## 2 BELWOOD 0.9951087910
## 3 DE10 0.2148651753
## 4 DE10-ND 0.3457864736
## 5 DI 0.0408507765
## 6 H4 1.0 0.1822423094
## 7 H4 2.0 0.0001121453
## 8 H4-21 0.5619512195
## 9 IHSS 0.6767605634
## 10 NEIF 0.0950510605
## 11 NWIF 0.5611713492
## 12 P1-08 0.9175603217
## 13 P2 0.2904803581
## 14 POND 0.0361957525
## 15 U8 0.0276066621
See https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for dplyr details (it’s great, you’ll find it useful)
See https://cran.r-project.org/web/packages/broom/vignettes/broom.html for broom details (it’s great too, since it tidies up messy things into useful formats)
See https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html for examples like this next one
The language (or syntax) is similar to above but not exactly the same. Here the pipe (%>%) is used to take the results from one command and send them to another. You could read it as ‘and send it to …’ or ‘and then…’. The data are grouped by site and then sent to lm(DOC ~ Dose) but the results of lm are wrapped in tidy so the results we get back are nice and tidy. Then do the same thing, take the dat, group them by site, and summarise these groupings with the cor(Dose, DOC) command and all the results R2.
library(broom)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dat %>% group_by(site) %>% do(tidy(lm(DOC ~ Dose, data=.)))
## Source: local data frame [30 x 6]
## Groups: site [15]
##
## site term estimate std.error statistic p.value
## (fctr) (chr) (dbl) (dbl) (dbl) (dbl)
## 1 BCA (Intercept) 6.818 0.162283702 42.0128448 2.967833e-05
## 2 BCA Dose -0.420 0.066252044 -6.3394271 7.938230e-03
## 3 BELWOOD (Intercept) 6.350 0.021118712 300.6812146 8.112124e-08
## 4 BELWOOD Dose -0.213 0.008621678 -24.7051673 1.453957e-04
## 5 DE10 (Intercept) 6.068 0.467681516 12.9746415 9.884947e-04
## 6 DE10 Dose 0.173 0.190930179 0.9060904 4.316832e-01
## 7 DE10-ND (Intercept) 18.360 1.303303495 14.0872790 7.747591e-04
## 8 DE10-ND Dose -0.670 0.532071424 -1.2592294 2.970109e-01
## 9 DI (Intercept) 0.388 0.150758084 2.5736597 8.223119e-02
## 10 DI Dose 0.022 0.061546730 0.3574520 7.444212e-01
## .. ... ... ... ... ... ...
dat %>% group_by(site) %>% summarize(R2 = cor(Dose, DOC)^2)
## Source: local data frame [15 x 2]
##
## site R2
## (fctr) (dbl)
## 1 BCA 0.9305367995
## 2 BELWOOD 0.9951087910
## 3 DE10 0.2148651753
## 4 DE10-ND 0.3457864736
## 5 DI 0.0408507765
## 6 H4 1.0 0.1822423094
## 7 H4 2.0 0.0001121453
## 8 H4-21 0.5619512195
## 9 IHSS 0.6767605634
## 10 NEIF 0.0950510605
## 11 NWIF 0.5611713492
## 12 P1-08 0.9175603217
## 13 P2 0.2904803581
## 14 POND 0.0361957525
## 15 U8 0.0276066621
That should do it.