Load data from csv file I made

Basic geochemistry data that looks like a set of incubations at different sites and multiple sites are grouped into regions. This is the same basic data with columns for Dose, abs.400, Fe.DOC, site, Fe, DOC, Region. See the output of head:

dat <- read.csv('Jenn_TE_Spreadsheet_Fe2_table.csv')
head(dat)

##   Dose    abs.400      Fe.DOC    site   Fe  DOC Region
## 1    0 0.02100000 0.010000000     BCA 0.04 6.70  Grand
## 2    1 0.01726574 0.005136986     BCA 0.03 6.42  Grand
## 3    2 0.01688548 0.004292554     BCA 0.03 6.06  Grand
## 4    3 0.01660477 0.004825091     BCA 0.03 5.80  Grand
## 5    4 0.01431910 0.004892966     BCA 0.02 4.91  Grand
## 6    0 0.01780000 0.000000000 BELWOOD 0.03 6.35  Grand

Straight-up linear models

The goal here is to get the coefficients and R2 values for the DOC~Dose relationship for each site. This can be done the long-way round by running lm(y~x) on each set of data. Each set of data can be pulled out of dat by using subset. Subset allows you to subset from a large data.frame given some logical conditions such as is equal to (==), is not equal to (!=), is greater than (>), etc. Here we subset on the fly for one site at a time and send the results to lm. For example:

lm(DOC ~ Dose, data = subset(dat, site == "BCA"))

## 
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BCA"))
## 
## Coefficients:
## (Intercept)         Dose  
##       6.818       -0.420

lm(DOC ~ Dose, data = subset(dat, site == "BELWOOD"))

## 
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BELWOOD"))
## 
## Coefficients:
## (Intercept)         Dose  
##       6.350       -0.213

lm(DOC ~ Dose, data = subset(dat, site == "DE10"))

## 
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "DE10"))
## 
## Coefficients:
## (Intercept)         Dose  
##       6.068        0.173

Note the default output gives the intercept and slope coefficients. If the results of the lm are saved as a variable, then other things can be extracted from it, such as:

mod.bca <- lm(DOC ~ Dose, data = subset(dat, site == "BCA"))
coef(mod.bca)

## (Intercept)        Dose 
##       6.818      -0.420

summary(mod.bca)

## 
## Call:
## lm(formula = DOC ~ Dose, data = subset(dat, site == "BCA"))
## 
## Residuals:
##      1      2      3      4      5 
## -0.118  0.022  0.082  0.242 -0.228 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.81800    0.16228  42.013 2.97e-05 ***
## Dose        -0.42000    0.06625  -6.339  0.00794 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2095 on 3 degrees of freedom
## Multiple R-squared:  0.9305, Adjusted R-squared:  0.9074 
## F-statistic: 40.19 on 1 and 3 DF,  p-value: 0.007938

This is fine if there are only a few groups of data but can be tedious for a lot of groups. When there area a lot of groups, it may be easier to use plyr or dplyr to do split–apply–combine to the data, where:

split: split the data into the groupings you want (site, in this case)

apply: apply the function you want (lm or coef(lm), in this case)

combine: put the results back together in a way that matching the original splittig characteristics

Older method with plyr

See http://plyr.had.co.nz/ for details

Idea here is to use dlply to split the data by site, run lm, using the formula, and save all the models back together. Use dlply since the data coming in is a data.frame (d) and the results coming back are a list (l). Hence dlply. (Note above that the results of summary(mod.bca) were a combination of text and numbers that did not look like a regular data.frame so the results have to be stored in a list.) Then to get the info out of the list, use ldply to extract the coef. To get the R2 values in a similar fashion, ddply can be used.

require(plyr)

## Loading required package: plyr

models <- dlply(dat, .(site), lm, formula = DOC ~ Dose)
ldply(models, coef)

##       site (Intercept)   Dose
## 1      BCA       6.818 -0.420
## 2  BELWOOD       6.350 -0.213
## 3     DE10       6.068  0.173
## 4  DE10-ND      18.360 -0.670
## 5       DI       0.388  0.022
## 6   H4 1.0       6.796 -0.100
## 7   H4 2.0       6.618  0.002
## 8    H4-21       3.296 -0.048
## 9     IHSS       5.762  0.124
## 10    NEIF       6.612  0.022
## 11    NWIF       6.252  0.205
## 12   P1-08       2.514 -0.111
## 13      P2       6.872 -0.229
## 14    POND       6.858 -0.084
## 15      U8       6.512  0.022

ddply(dat, .(site), summarise, r2 = cor(DOC, Dose)^2)

##       site           r2
## 1      BCA 0.9305367995
## 2  BELWOOD 0.9951087910
## 3     DE10 0.2148651753
## 4  DE10-ND 0.3457864736
## 5       DI 0.0408507765
## 6   H4 1.0 0.1822423094
## 7   H4 2.0 0.0001121453
## 8    H4-21 0.5619512195
## 9     IHSS 0.6767605634
## 10    NEIF 0.0950510605
## 11    NWIF 0.5611713492
## 12   P1-08 0.9175603217
## 13      P2 0.2904803581
## 14    POND 0.0361957525
## 15      U8 0.0276066621

Newer method with dplyr and broom

See https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for dplyr details (it’s great, you’ll find it useful)

See https://cran.r-project.org/web/packages/broom/vignettes/broom.html for broom details (it’s great too, since it tidies up messy things into useful formats)

See https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html for examples like this next one

The language (or syntax) is similar to above but not exactly the same. Here the pipe (%>%) is used to take the results from one command and send them to another. You could read it as ‘and send it to …’ or ‘and then…’. The data are grouped by site and then sent to lm(DOC ~ Dose) but the results of lm are wrapped in tidy so the results we get back are nice and tidy. Then do the same thing, take the dat, group them by site, and summarise these groupings with the cor(Dose, DOC) command and all the results R2.

library(broom)
library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dat %>% group_by(site) %>% do(tidy(lm(DOC ~ Dose, data=.)))

## Source: local data frame [30 x 6]
## Groups: site [15]
## 
##       site        term estimate   std.error   statistic      p.value
##     (fctr)       (chr)    (dbl)       (dbl)       (dbl)        (dbl)
## 1      BCA (Intercept)    6.818 0.162283702  42.0128448 2.967833e-05
## 2      BCA        Dose   -0.420 0.066252044  -6.3394271 7.938230e-03
## 3  BELWOOD (Intercept)    6.350 0.021118712 300.6812146 8.112124e-08
## 4  BELWOOD        Dose   -0.213 0.008621678 -24.7051673 1.453957e-04
## 5     DE10 (Intercept)    6.068 0.467681516  12.9746415 9.884947e-04
## 6     DE10        Dose    0.173 0.190930179   0.9060904 4.316832e-01
## 7  DE10-ND (Intercept)   18.360 1.303303495  14.0872790 7.747591e-04
## 8  DE10-ND        Dose   -0.670 0.532071424  -1.2592294 2.970109e-01
## 9       DI (Intercept)    0.388 0.150758084   2.5736597 8.223119e-02
## 10      DI        Dose    0.022 0.061546730   0.3574520 7.444212e-01
## ..     ...         ...      ...         ...         ...          ...

dat %>% group_by(site) %>% summarize(R2 = cor(Dose, DOC)^2)

## Source: local data frame [15 x 2]
## 
##       site           R2
##     (fctr)        (dbl)
## 1      BCA 0.9305367995
## 2  BELWOOD 0.9951087910
## 3     DE10 0.2148651753
## 4  DE10-ND 0.3457864736
## 5       DI 0.0408507765
## 6   H4 1.0 0.1822423094
## 7   H4 2.0 0.0001121453
## 8    H4-21 0.5619512195
## 9     IHSS 0.6767605634
## 10    NEIF 0.0950510605
## 11    NWIF 0.5611713492
## 12   P1-08 0.9175603217
## 13      P2 0.2904803581
## 14    POND 0.0361957525
## 15      U8 0.0276066621

That should do it.

Linear Models Summaries for JLM

jvenkiteswaran

18 décembre 2015

Load data from csv file I made

Straight-up linear models

Older method with plyr

Newer method with dplyr and broom