Aim

In this tutorial I will show how to apply the package functions in batch to all streams (and time points), mostly through the use of:

The results, tables, and figures, should already be useable for the associated manuscript (Frauendorf et al. in revision).

Data Processing

The nice thing about tibbles is that it allows you to organize dataframes where each entry need not be single elements only, but can be any type of object: a data.frame, the results of a model fit, etc. This makes it easy to batch-apply functions using package purr. We can load these functions with:

library(tidyverse)

One of the advantages of the tidyverse family of packages is the use of the pipe operator %>%. It essentially allows you to first indicate what dataset you want to do a bunch of modifications or operations, and then list the series of operations one after the other followed by %>%.

For example, imagine we want to do the following changes to the excretion data:

load('../data/data_excretion.rda')

new_data <- data_excretion %>%
  filter(sex !='M') %>%
  mutate(N_rate = N_rate * 1000) %>%
  group_by(stream, year) %>%
  summarize(n_inds = length(size), 
            size_mean = mean(size, na.rm = TRUE),
            N_rate = mean(N_rate, na.rm = TRUE))
## `summarise()` regrouping output by 'stream' (override with `.groups` argument)
knitr::kable(new_data)
stream year n_inds size_mean N_rate
CA 2009 18 0.1272222 5494.0073
CA 2010 22 0.1418182 1870.1445
CA 2011 26 0.1726923 9785.8555
CA 2014 6 0.1666667 6908.4324
LL 2009 28 0.1357143 4138.9871
LL 2010 28 0.0996429 5542.1272
LL 2011 28 0.1196429 2791.9596
LL 2014 9 0.2177778 7298.9697
TY 2009 19 0.1811111 7270.3752
TY 2010 25 0.2652000 7917.7528
TY 2011 19 0.1015789 102.3317
UL 2009 33 0.2003030 9225.8923
UL 2010 27 0.1018519 6629.6677
UL 2011 25 0.1420000 8202.7205

TADAH! Now let’s forget about this example and use this magic with some real questions. Note that we made the table look prettier than the usual R format using function kable from package knitr

Population Descriptors

Figure 1. Population density

Figure 2. Size structure

Individual Excretion.

First of all, we want to group the excretion data by stream and year.

edata <- data_excretion %>%
  group_by(stream, year) %>%
  nest()

edata
## # A tibble: 14 x 3
## # Groups:   stream, year [14]
##    stream  year data             
##    <fct>  <int> <list>           
##  1 UL      2009 <tibble [42 × 8]>
##  2 LL      2009 <tibble [37 × 8]>
##  3 TY      2009 <tibble [24 × 8]>
##  4 CA      2009 <tibble [25 × 8]>
##  5 UL      2010 <tibble [36 × 8]>
##  6 LL      2010 <tibble [36 × 8]>
##  7 TY      2010 <tibble [36 × 8]>
##  8 CA      2010 <tibble [30 × 8]>
##  9 UL      2011 <tibble [36 × 8]>
## 10 LL      2011 <tibble [37 × 8]>
## 11 TY      2011 <tibble [24 × 8]>
## 12 CA      2011 <tibble [36 × 8]>
## 13 CA      2014 <tibble [15 × 8]>
## 14 LL      2014 <tibble [15 × 8]>

What this did is to create a dataframe whith one entry per stream and year only, and as a third column, the whole data for that stream and year. For example, to access the UL 2009 data:

subset(edata, stream == 'UL' & year == 2009)$data
## [[1]]
## # A tibble: 42 x 8
##     pair canopy year_intro month_cal month_intro sex    size N_rate
##    <int>  <int>      <dbl>     <int>       <int> <fct> <dbl>  <dbl>
##  1     1      1          1         5          14 F      0.55  18.2 
##  2     1      1          1         5          14 F      0.13   5.37
##  3     1      1          1         5          14 M      0.13   9.22
##  4     1      1          1         5          14 M      0.15   3.58
##  5     1      1          1         5          14 F      0.55   8.48
##  6     1      1          1         5          14 F      0.49  27.7 
##  7     1      1          1         5          14 F      0.16  10.5 
##  8     1      1          1         5          14 F      0.12   5.85
##  9     1      1          1         5          14 M      0.16   8.00
## 10     1      1          1         5          14 F      0.04   2.8 
## # … with 32 more rows

Now we can use a very powerful function called map from package purrr, for example, to apply the same function to all the nested dataframes at once.

edata <- edata %>%
  mutate(fit = map(.f = allo_fit, .x = data, N_rate, size))

edata
## # A tibble: 14 x 4
## # Groups:   stream, year [14]
##    stream  year data              fit             
##    <fct>  <int> <list>            <list>          
##  1 UL      2009 <tibble [42 × 8]> <named list [3]>
##  2 LL      2009 <tibble [37 × 8]> <named list [3]>
##  3 TY      2009 <tibble [24 × 8]> <named list [3]>
##  4 CA      2009 <tibble [25 × 8]> <named list [3]>
##  5 UL      2010 <tibble [36 × 8]> <named list [3]>
##  6 LL      2010 <tibble [36 × 8]> <named list [3]>
##  7 TY      2010 <tibble [36 × 8]> <named list [3]>
##  8 CA      2010 <tibble [30 × 8]> <named list [3]>
##  9 UL      2011 <tibble [36 × 8]> <named list [3]>
## 10 LL      2011 <tibble [37 × 8]> <named list [3]>
## 11 TY      2011 <tibble [24 × 8]> <named list [3]>
## 12 CA      2011 <tibble [36 × 8]> <named list [3]>
## 13 CA      2014 <tibble [15 × 8]> <named list [3]>
## 14 LL      2014 <tibble [15 × 8]> <named list [3]>

As you see, there is a new column coefs that stores a list of three elements, the three elements returned by our function allo_fit: the coefficients, and the variance covariance matrix. Let’s check out again, for example the UL 2009 data.

subset(edata, stream == 'UL' & year == 2009)$fit
## [[1]]
## [[1]]$coef
## (Intercept)   log(size) 
##   3.2254379   0.6202035 
## 
## [[1]]$vcov
##             (Intercept)   log(size)
## (Intercept) 0.015757805 0.006071351
## log(size)   0.006071351 0.002952926
## 
## [[1]]$fit
## 
## Call:
## lm(formula = log(response) ~ log(size))
## 
## Coefficients:
## (Intercept)    log(size)  
##      3.2254       0.6202

Therey you go! All models fit with one stroke! Now let’s carry on implementing allo_mc.

edata <- edata %>%
  mutate(mc = map(.f = allo_mc, .x = fit, predict = TRUE))

edata
## # A tibble: 14 x 5
## # Groups:   stream, year [14]
##    stream  year data              fit              mc              
##    <fct>  <int> <list>            <list>           <list>          
##  1 UL      2009 <tibble [42 × 8]> <named list [3]> <named list [3]>
##  2 LL      2009 <tibble [37 × 8]> <named list [3]> <named list [3]>
##  3 TY      2009 <tibble [24 × 8]> <named list [3]> <named list [3]>
##  4 CA      2009 <tibble [25 × 8]> <named list [3]> <named list [3]>
##  5 UL      2010 <tibble [36 × 8]> <named list [3]> <named list [3]>
##  6 LL      2010 <tibble [36 × 8]> <named list [3]> <named list [3]>
##  7 TY      2010 <tibble [36 × 8]> <named list [3]> <named list [3]>
##  8 CA      2010 <tibble [30 × 8]> <named list [3]> <named list [3]>
##  9 UL      2011 <tibble [36 × 8]> <named list [3]> <named list [3]>
## 10 LL      2011 <tibble [37 × 8]> <named list [3]> <named list [3]>
## 11 TY      2011 <tibble [24 × 8]> <named list [3]> <named list [3]>
## 12 CA      2011 <tibble [36 × 8]> <named list [3]> <named list [3]>
## 13 CA      2014 <tibble [15 × 8]> <named list [3]> <named list [3]>
## 14 LL      2014 <tibble [15 × 8]> <named list [3]> <named list [3]>

Let’s check again for UL 2009

str(subset(edata, stream == 'UL' & year == 2009)$mc)
## List of 1
##  $ :List of 3
##   ..$ coefs  : num [1:1000, 1:2] 3.12 3.2 3.24 3.21 3.16 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "(Intercept)" "log(size)"
##   ..$ xpred  : num [1:201] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
##   ..$ pred_ci: num [1:3, 1:201] 0 0 0 1.07 1.44 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:3] "2.5%" "50%" "97.5%"
##   .. .. ..$ : NULL

In reality, it can all be done at the same time.

edata <- data_excretion %>%
  group_by(stream, year) %>%
  nest() %>%
  arrange(stream) %>%
  mutate(fit = map(.f = allo_fit, .x = data, N_rate, size))

edata
## # A tibble: 14 x 4
## # Groups:   stream, year [14]
##    stream  year data              fit             
##    <fct>  <int> <list>            <list>          
##  1 CA      2009 <tibble [25 × 8]> <named list [3]>
##  2 CA      2010 <tibble [30 × 8]> <named list [3]>
##  3 CA      2011 <tibble [36 × 8]> <named list [3]>
##  4 CA      2014 <tibble [15 × 8]> <named list [3]>
##  5 LL      2009 <tibble [37 × 8]> <named list [3]>
##  6 LL      2010 <tibble [36 × 8]> <named list [3]>
##  7 LL      2011 <tibble [37 × 8]> <named list [3]>
##  8 LL      2014 <tibble [15 × 8]> <named list [3]>
##  9 TY      2009 <tibble [24 × 8]> <named list [3]>
## 10 TY      2010 <tibble [36 × 8]> <named list [3]>
## 11 TY      2011 <tibble [24 × 8]> <named list [3]>
## 12 UL      2009 <tibble [42 × 8]> <named list [3]>
## 13 UL      2010 <tibble [36 × 8]> <named list [3]>
## 14 UL      2011 <tibble [36 × 8]> <named list [3]>

Let’s plot the results, which corresponds to Figure 3

## `geom_smooth()` using formula 'y ~ x'

Using package stargazer, we can also automate the construction of a nice summary table for the model results of different streams and years. It may look like a lot of code, but anything beyond the first line is formating and customizing labels. Even if you left that out, the default would look quite decent.

I notice we don’t have this in the paper, and it should be at least a Supplementary Appendix Table S3

my_model_labels <- ifelse(edata$year == min(edata$year), paste(edata$stream, edata$year, sep = '<br>'), paste('<br>',edata$year, sep = ''))

stargazer::stargazer(lapply(edata$fit, with, fit), type = 'html',
                     column.labels = my_model_labels,
                     covariate.labels = c('<em>a</em>','<em>b</em>'),
                     dep.var.caption = "log(E) = <em>a</em> +  <em>b</em> log(size)",
                     dep.var.labels.include = FALSE,
                     model.numbers = FALSE, 
                     df = FALSE, omit.stat = 'adj.rsq')
log(E) = a + b log(size)
CA
2009

2010

2011

2014
LL
2009

2010

2011

2014
TY
2009

2010

2011
UL
2009

2010

2011
a 0.547*** 0.620* 0.494*** 0.762** 1.003*** 0.552*** 0.874*** 0.710*** 0.541*** 0.619*** 0.780 0.620*** 0.550*** 0.182
(0.066) (0.358) (0.087) (0.298) (0.170) (0.115) (0.185) (0.104) (0.091) (0.142) (0.534) (0.054) (0.057) (0.129)
b 2.946*** 0.673 3.173*** 3.203*** 3.557*** 3.026*** 2.479*** 3.017*** 3.011*** 2.701*** 1.169 3.225*** 3.292*** 2.294***
(0.174) (0.957) (0.212) (0.696) (0.385) (0.313) (0.497) (0.210) (0.223) (0.328) (0.954) (0.126) (0.172) (0.338)
Observations 23 22 31 15 31 29 29 15 19 32 14 39 33 32
R2 0.764 0.131 0.525 0.335 0.546 0.459 0.453 0.782 0.675 0.388 0.151 0.779 0.748 0.062
Residual Std. Error 0.300 1.966 0.517 0.559 0.731 0.589 1.066 0.260 0.460 0.910 1.205 0.357 0.371 0.782
F Statistic 67.917*** 3.010* 32.020*** 6.561** 34.872*** 22.872*** 22.372*** 46.510*** 35.342*** 19.019*** 2.137 130.261*** 91.821*** 1.981
Note: p<0.1; p<0.05; p<0.01

Population Parameters

load('../data/data_population.rda')

pdata <- data_population %>%
  group_by(stream, month_intro) %>%
  arrange(month_intro, .by_group = TRUE) %>%
  nest()

pdata
## # A tibble: 14 x 3
## # Groups:   stream, month_intro [14]
##    stream month_intro data                
##    <fct>        <dbl> <list>              
##  1 CA               2 <tibble [73 × 5]>   
##  2 CA              14 <tibble [352 × 5]>  
##  3 CA              25 <tibble [369 × 5]>  
##  4 CA              60 <tibble [351 × 5]>  
##  5 LL              14 <tibble [371 × 5]>  
##  6 LL              26 <tibble [890 × 5]>  
##  7 LL              37 <tibble [781 × 5]>  
##  8 LL              73 <tibble [822 × 5]>  
##  9 TY               2 <tibble [65 × 5]>   
## 10 TY              14 <tibble [732 × 5]>  
## 11 TY              25 <tibble [166 × 5]>  
## 12 UL              14 <tibble [748 × 5]>  
## 13 UL              26 <tibble [2,492 × 5]>
## 14 UL              37 <tibble [984 × 5]>

NOTE: Because these figures are already fine in the MS, I will leave this for the end, but I can re-draw them so that they all look consistent

##Population Recycling through Time

In order to scale up from individual to population, we need to use ind2pop for all the combinations of stream and month since introduction. These are essentially given in pdata, so we can simply use som purrr magic to apply the function to all those combinations.

N = 5 # Define nr of MC simulations throughout the vignette

pdata <- pdata %>%
  mutate(pop_params = purrr::map2(stream, month_intro, pop_bundle, nsims = N),
         pop_excretion = purrr::map(pop_params, ind2pop)) %>%
  mutate(pop_exc_mean = purrr::map_dbl(pop_excretion, mean),
         pop_exc_upp = purrr::map_dbl(pop_excretion, quantile, 0.975),
         pop_exc_low = purrr::map_dbl(pop_excretion, quantile, 0.025))
pdata
## # A tibble: 14 x 8
## # Groups:   stream, month_intro [14]
##    stream month_intro data  pop_params pop_excretion pop_exc_mean pop_exc_upp
##    <fct>        <dbl> <lis> <list>     <list>               <dbl>       <dbl>
##  1 CA               2 <tib… <named li… <dbl [5]>             3.24        3.84
##  2 CA              14 <tib… <named li… <dbl [5]>             1.23        1.81
##  3 CA              25 <tib… <named li… <dbl [5]>            25.8        32.6 
##  4 CA              60 <tib… <named li… <dbl [5]>            12.0        13.4 
##  5 LL              14 <tib… <named li… <dbl [5]>             6.13        8.89
##  6 LL              26 <tib… <named li… <dbl [5]>            13.8        16.5 
##  7 LL              37 <tib… <named li… <dbl [5]>             5.20        6.09
##  8 LL              73 <tib… <named li… <dbl [5]>            10.3        10.8 
##  9 TY               2 <tib… <named li… <dbl [5]>             4.42        4.97
## 10 TY              14 <tib… <named li… <dbl [5]>            28.4        35.2 
## 11 TY              25 <tib… <named li… <dbl [5]>             1.42        1.95
## 12 UL              14 <tib… <named li… <dbl [5]>            13.7        15.1 
## 13 UL              26 <tib… <named li… <dbl [5]>            48.4        54.7 
## 14 UL              37 <tib… <named li… <dbl [5]>            16.6        21.0 
## # … with 1 more variable: pop_exc_low <dbl>

Now that we have the 5 MC simulations of population-level excretion for each stream/month combination, we can plot its evolution through time to reconstruct Figure 4

# natural stream references
natural_ref <- rbind(c(3.174946,3.824626,4.480129),
                     c(12.33031,14.735918,17.10158))
rownames(natural_ref) <- c('HP','LP')

# range of x
max_x <- max(pdata$month_intro) + 2

 # plot
plot_popexc <- pdata %>%
  ggplot(aes(y = pop_exc_mean, x = month_intro, group = stream)) +
    geom_rect(aes(xmin =0, xmax = max_x,
                  ymin = natural_ref['HP',1],
                  ymax = natural_ref['HP',3]), 
              fill = 'gray90') +
    annotate('text', max_x-2, natural_ref['HP',2], 
             label = 'HP', colour = 'gray50') +
    geom_rect(aes(xmin =0, xmax = max_x,
                  ymin = natural_ref['LP',1],
                  ymax = natural_ref['LP',3]), 
              fill = 'gray90') + 
    annotate('text', max_x-2, natural_ref['LP',2], 
            label = 'LP', colour = 'gray50') +
    geom_line(aes(color = factor(stream))) +
    scale_linetype_manual(values = c('solid', 'solid', 
                                     'dashed', 'dashed')) +
    geom_point(aes(color = factor(stream), 
                   shape = factor(stream)),
               size = 2.5) +
    geom_errorbar(aes(ymax = pop_exc_upp, 
                      ymin = pdata$pop_exc_low,
                      color = factor(stream))) +
    scale_color_manual(values = c('black', 'black',
                                  'gray50', 'gray50')) +
    scale_shape_manual(values = c(16, 17, 16, 17)) +
    scale_x_continuous(breaks = seq(0, max_x, 12),
                     expand = c(0, 0)) +
    theme_bw() + theme(text = element_text(size=13),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.text.x = element_text(size = 12),
      axis.text.y = element_text(size = 12),
      legend.title=element_blank(),
      legend.text = element_text(size = 12)) + 
    xlab('\n Months Since Introduction') +
    ylab(expression(Population ~ Excretion ~ Rate ~ (gN/h/m^2)))
  
plot_popexc

Contributions to Change in Recycling

Now finally, the decomposition into contributions to change. This is a bit trickier because it cannot be done independently for all stream-month combinations. Rather, for each stream, we need to compare one time step with the next. To do this, we need to devise a little function that finds what are the relevant comparisons: i.e. that finds the next time step for each month-stream sampling. We have implemented this in function next_sampling. For example:

next_sampling('UL', 14)
## [1] 26

We can check that this is true:

data_ul <- subset(data_excretion, stream == 'UL')

samplings_ul <- sort(unique(data_ul$month_intro))
samplings_ul
## [1] 14 26 37

Yes, the sampling after month 14 is at month 26.

Let’s apply this to all our stream samplings and get our decomposition. This will require three steps:

decomp_data <- pdata %>%
  mutate(next_month = purrr::map2_dbl(stream, month_intro, next_sampling),
         pop_params_next = purrr::map2(stream, next_month, pop_bundle, nsims = N),
         decomposition = purrr::map2(pop_params, pop_params_next, change_decomp))

decomp_data[, c('stream', 'month_intro', 'decomposition')]
## # A tibble: 14 x 3
## # Groups:   stream, month_intro [14]
##    stream month_intro decomposition   
##    <fct>        <dbl> <list>          
##  1 CA               2 <df[,5] [5 × 5]>
##  2 CA              14 <df[,5] [5 × 5]>
##  3 CA              25 <df[,5] [5 × 5]>
##  4 CA              60 <lgl [1]>       
##  5 LL              14 <df[,5] [5 × 5]>
##  6 LL              26 <df[,5] [5 × 5]>
##  7 LL              37 <df[,5] [5 × 5]>
##  8 LL              73 <lgl [1]>       
##  9 TY               2 <df[,5] [5 × 5]>
## 10 TY              14 <df[,5] [5 × 5]>
## 11 TY              25 <lgl [1]>       
## 12 UL              14 <df[,5] [5 × 5]>
## 13 UL              26 <df[,5] [5 × 5]>
## 14 UL              37 <lgl [1]>

DONE! Let’s have a look at one of them

head(decomp_data$decomposition[[1]])
##      change          A         S        D         I
## 1 0.5198320 0.12381828 0.9443387 4.821918 0.9219993
## 2 0.7194713 0.18209928 0.8600252 4.821918 0.9527397
## 3 0.3750426 0.09160362 0.9492725 4.821918 0.8944525
## 4 0.2842011 0.06716755 0.9549846 4.821918 0.9188616
## 5 0.6932695 0.16373332 0.7967160 4.821918 1.1021525

We just need now to make the plot. But first let’s define a function that will calculate mean, upper, and lower CIs for all columns of a data frame.

MC_summary <- function(data){
  
  temp <- data %>%
    #mutate_all(funs(log)) %>%
    summarise_all(funs (mean = mean, 
                        lo = quantile(.,probs = 0.025),
                        up = quantile(.,probs = 0.975))) 
  return(temp)
}
summary_decomp <- decomp_data %>%
  filter(!is.na(decomposition)) %>%
  mutate(MC_summary = purrr::map(decomposition, MC_summary)) %>%
  select(stream, month_intro, MC_summary) %>%
  unnest()

mean_decomp <- summary_decomp %>%
  gather(key, mean, A_mean, S_mean, D_mean, I_mean, factor_key =  TRUE) %>%
  mutate(key = forcats::fct_recode(key, A = "A_mean",
                                        S = "S_mean",
                                        D = "D_mean",
                                        I = "I_mean"))
up_decomp <- summary_decomp  %>%
  gather(key, up, A_up, S_up, D_up, I_up, factor_key =  TRUE) %>%
  mutate(key = forcats::fct_recode(key, A = "A_up",
                                        S = "S_up",
                                        D = "D_up",
                                        I = "I_up"))
low_decomp <- summary_decomp  %>%
  gather(key, low, A_lo, S_lo, D_lo, I_lo, factor_key =  TRUE) %>%
  mutate(key = forcats::fct_recode(key, A = "A_lo",
                                        S = "S_lo",
                                        D = "D_lo",
                                        I = "I_lo"))

merge_decomp <- left_join(up_decomp, low_decomp, by = c('stream', 'month_intro', 'key')) %>%
    left_join(., mean_decomp, by = c('stream', 'month_intro', 'key')) %>%
    select(stream, month_intro, key, mean, up, low)  
  
plot_decomp <- merge_decomp %>%  
  ggplot(aes(y = mean, x = month_intro)) +
    facet_grid(stream ~ key) +
    geom_ribbon(aes(ymin = low, ymax = up), fill = 'grey70') +
    geom_line() +
    scale_x_continuous(breaks = seq(0, max_x, 12)) +
    scale_y_log10() +
    theme_bw() + theme(text = element_text(size=13),
      axis.text.x = element_text(size = 12),
      axis.text.y = element_text(size = 12),
      legend.title=element_blank(),
      legend.text = element_text(size = 12),
      strip.background = element_rect(fill = 'white', colour = 'white'),
      strip.text.y = element_text(angle = 0, size = 14, vjust = .5),
      strip.text.x = element_text(size = 14)) +
    xlab('\n Months Since Introduction') +
    ylab('\n Contribution to Population Recycling')
  

plot_decomp