Data screening

For a first data screening want to do the following:

Get data from eCO2 experiments
Data availability per experiment
Data availability per variable
Long versus wide issue

Let’s check first if the data is ready to use for these steps.

But before we start:

Some variable names end with an underscore (ANPP_, Asat_, SLA_, coarse_root_biomass_, LMA_, soil_NH4+_, N_uptake_). Needed to delete that manually.
Replaced lots of superscripted numbers (actually appeared as superscript when opening the CSV file in my text editor) with plain number (e.g., m2)
Replaced units given like g_m-2 with g/m2, or kg_ha-1 with kg/ha, or g/m2*yr with g/m2/yr (assuming that’s correct if that’s the way it would be entered in R), _g/m2 with g/m2, cm2/m2*d with cm2/m2/d, … I’ll leave it with that for now.
Replaced Vcmax_normalized_to_25°C with Vcmax25, and Jmax_normalized_to_25°C with Jmax25 (problematic degree sign)
Removed weird symbol in units, e.g., ‘mol_CO2/m2/s to mol_CO2/m2/s
Removed line break in date string

All that is saved as a new file NewData_wide_CORRECTED2.csv

Load the corrected file, which is based on the one Kevin sent on 1.3.2019.

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

df <- read_csv("~/Dropbox/data/gcme/data_received_190325/NewData_wide_CORRECTED2.csv") %>%
  
  # something is wrong in columns ambient_Sd, ambient_Se, elevated...
  mutate( ambient_Sd  = as.numeric(ambient_Sd),  ambient_Se  = as.numeric(ambient_Se), 
          elevated_Sd = as.numeric(elevated_Sd), elevated_Se = as.numeric(elevated_Se) )

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   exp_nam = col_character(),
##   prev_name = col_character(),
##   factors = col_character(),
##   treatment = col_character(),
##   TT_Nut_Detail = col_character(),
##   Data_type = col_character(),
##   Unit = col_character(),
##   Sampling_date = col_character(),
##   ambient_Se = col_character(),
##   elevated_Se = col_character()
## )

## See spec(...) for full column specifications.

  # XXX testing
  # filter(exp_nam == "SwissFACE_lolium2" & Data_type == "aboveground_biomass")
  # filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass")

# save experiments names
df_experiments <- df %>% select(exp_nam, prev_name) %>% distinct()

0. Issues

Sampling date

This is not read in correctly. Either continue doing manual edits (open CSV in editor and check whether it’s in a consistent format, at least within experiments), or do something clever if sampling date formats are at least consistent within experiments: e.g., reading the table experiment per experiment (making use of automatic interpretation of the sampling date format by read_csv()) and then stack all experiments’ data together again.

Duplicate rows

Some rows are simply duplicates. This is dangerous.

keyvars <- c("exp_nam", "factors", "treatment", "Data_type", "Unit", "Sampling_date", "Start_Year", "Year", "n_plots")
valuevars <-  c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")

df %>% 
  filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass") %>% 
  filter(treatment=="cf") %>% 
  select(one_of(keyvars, valuevars)) %>% 
  print()

## # A tibble: 2 x 15
##   exp_nam factors treatment Data_type Unit  Sampling_date Start_Year  Year
##   <chr>   <chr>   <chr>     <chr>     <chr> <chr>              <dbl> <dbl>
## 1 SwissF… cf      cf        abovegro… g/m2… <NA>                1993    NA
## 2 SwissF… cf      cf        abovegro… g/m2… <NA>                1993    NA
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## #   ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>

This may be remediated automatically by using only columns that are distinct w.r.t. valuevars and keyvars (basically all columns in original data, except ALIAS).

print(nrow(df))

## [1] 36974

df <- df %>% 
  distinct_at(vars(one_of(keyvars, valuevars))) %>%
  mutate( id=1:n() )   # create new ID key (before this was 'ALIAS')
print(nrow(df))

## [1] 36347

df %>% 
  filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass") %>% 
  filter(treatment=="cf") %>% 
  select(one_of(keyvars, valuevars)) %>% 
  print()

## # A tibble: 1 x 15
##   exp_nam factors treatment Data_type Unit  Sampling_date Start_Year  Year
##   <chr>   <chr>   <chr>     <chr>     <chr> <chr>              <dbl> <dbl>
## 1 SwissF… cf      cf        abovegro… g/m2… <NA>                1993    NA
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## #   ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>

Still problems?

df %>% 
  filter(exp_nam == "SwissFACE_lolium2" & Data_type == "belowground_biomass") %>% 
  filter(treatment=="cf", Year==6) %>% 
  select(one_of(keyvars, valuevars)) %>% 
  print()

## # A tibble: 2 x 15
##   exp_nam factors treatment Data_type Unit  Sampling_date Start_Year  Year
##   <chr>   <chr>   <chr>     <chr>     <chr> <chr>              <dbl> <dbl>
## 1 SwissF… cf      cf        belowgro… g/m2  11/1/1998           1993     6
## 2 SwissF… cf      cf        belowgro… g/m2  1-Nov-98            1993     6
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## #   ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>

Yes. In this case, the problem has to do with Sampling_date. I suspect that the problem with Sampling_date is that it was manually entered and consistency may be respected within experiments (and variables?), but not across variables. When read into R with readr::read_csv(), different formats are interpreted always with the same function. A possible solution to this is to read data separately for each experiment and variable and stack them together again (bind_rows()).

TODO:

First, do some manual cleaning of Sampling_date as far as possible and reasonable. Then, develop a separate read-in by experiments and variables, stack data again, and write the combined data to file.

Distinct ambients

The way the wide format is designed, the ambient values should be the same for different rows corresponding to different treatments (for a given experiment, variable, and sampling year). This is not always the case. It might be that these are actually different data from multiple samplings. However, this is not identifiable, but should be. An example:

df %>% 
  filter(exp_nam == "SwissFACE_lolium2" & Data_type == "Rh" ) %>% 
  print()

## # A tibble: 2 x 16
##   exp_nam factors treatment Data_type Unit  Sampling_date Start_Year  Year
##   <chr>   <chr>   <chr>     <chr>     <chr> <chr>              <dbl> <dbl>
## 1 SwissF… cf      c         Rh        g_SO… 2002                1993    10
## 2 SwissF… cf      cf        Rh        g_SO… 2002                1993    10
## # … with 8 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## #   ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>,
## #   elevated_Sd <dbl>, id <int>

TODO:

I haven’t figured out how to do a bulk check on this and resolve it. It doesn’t affect too many entries. Hence, it might have to be cleaned by hand.

4. Long versus wide

An example conversion

How to perform a conversion of our current wide format to a long format? Let’s do it for one example. An example for how the wide format is implemented in the current dataset:

df_wide <- df %>% 
  filter(exp_nam=="SwissFACE_lolium2", Data_type=="aboveground_biomass") %>%
  select(exp_nam, factors, treatment, Data_type) %>% 
  distinct() %>%
  mutate( id=1:nrow(.),
          ambient=rep("A", 3), elevated=c("B", "C", "D"),
          ambient_Sd=rep("sdA", 3), elevated_Sd=c("sdB", "sdC", "sdD"),
          ambient_Se=rep("seA", 3), elevated_Se=c("seB", "seC", "seD")) %>%
  print()

## # A tibble: 3 x 11
##   exp_nam factors treatment Data_type    id ambient elevated ambient_Sd
##   <chr>   <chr>   <chr>     <chr>     <int> <chr>   <chr>    <chr>     
## 1 SwissF… cf      c         abovegro…     1 A       B        sdA       
## 2 SwissF… cf      cf        abovegro…     2 A       C        sdA       
## 3 SwissF… cf      f         abovegro…     3 A       D        sdA       
## # … with 3 more variables: elevated_Sd <chr>, ambient_Se <chr>,
## #   elevated_Se <chr>

To convert this to a long format, first get all available factors and create a vector. The available factors are:

df %>% 
  filter(!(factors %in% c("x","x99"))) %>% 
  select(factors) %>% 
  unique() %>% 
  unlist() %>% 
  unname() %>% 
  print()

##  [1] "cwd"  "c"    "cf"   "cfd"  "cfw"  "ci"   "cfi"  "cw"   "cdi"  "cfwi"
## [11] "cd"   "i"    "cdf"  "df"   "f"    "fi"   "co"   "o"    "cwi"  "w"   
## [21] "d"    "wd"   "fd"   "fw"   "wi"   "wdi"  "fdi"

Create a vector with all single factors

factors_avl <- df %>%
  filter(!(factors %in% c("x","x99"))) %>%      # Something weird with this
  select(factors) %>% 
  unique() %>% 
  mutate( factors_sep = strsplit(factors, NULL) ) %>% 
  select(factors_sep) %>% 
  unlist() %>% 
  unname() %>% 
  unique() %>% 
  print()

## [1] "c" "w" "d" "f" "i" "o"

Convert from wide to long, adding columns for all factors for which we have ambient and elevated levels’ data.

wide_to_long_gcme <- function( df_wide, keyvars ){
  
  require(stringr)
  
  ## if level is 'elevated', turn all factor levels that are part of 'treatment' to TRUE
  switch_factor <- function(df){
    if (df$level=="elevated"){
      for (ifactor in strsplit(df$treatment, NULL) %>% unlist()){
        df[[ifactor]] <- TRUE
      }
    }
    return(df)
  }
  
  ## First get available factors
  factors_avl <- df_wide %>%
    filter(!(factors %in% c("x","x99"))) %>%      # Something weird with this
    select(factors) %>% 
    unique() %>% 
    mutate( factors_sep = strsplit(factors, NULL) ) %>% 
    select(factors_sep) %>% 
    unlist() %>% 
    unname() %>% 
    unique()  
  
  df_long <- df_wide %>%
    
    ## gather 'mean' based on columns: 'ambient', 'elevated'
    select( -ambient_Sd, -elevated_Sd, -ambient_Se, -elevated_Se ) %>%
    tidyr::gather(level, mean, c(ambient, elevated)) %>%
    
    ## gather 'sd' based on columns: 'ambient_Sd', 'elevated_Sd'
    left_join( 
      select( df_wide, -ambient, -elevated, -ambient_Se, -elevated_Se ) %>%
        tidyr::gather(level, sd, c(ambient_Sd, elevated_Sd)) %>%
        mutate( level = stringr::str_split(.$level, "_") %>% purrr::map(., 1) %>% unlist() ),
      by = c(keyvars) ) %>% 
    
    ## gather 'se' based on columns: 'ambient_Se', 'elevated_Se'
    left_join( 
      select( df_wide, -ambient, -elevated, -ambient_Sd, -elevated_Sd ) %>%
        tidyr::gather(level, se, c(ambient_Se, elevated_Se)) %>%
        mutate( level = stringr::str_split(.$level, "_") %>% purrr::map(., 1) %>% unlist() ),
      by = c(keyvars) ) %>% 
    
    ## magically add columns corresponding to all available levels of 'factors'
    `is.na<-`(factors_avl) %>% 
    
    ## fill all new factor columns with FALSE
    dplyr::mutate_at( factors_avl, ~FALSE ) 
  
  ## Determine new factor columns based on information in 'treatment' and 'level'
  df_long <- purrr::map_dfr(as.list(1:nrow(df_long)), ~switch_factor(df_long[.,]))
  
  ## Now we have duplicate ambients, treat them separately and stack ambient and elevated together again
  df_long <- df_long %>% filter(level=="ambient") %>% 
    
    # WARNING: This assumes that if values are identical, they are repetitions, ignoring differences w.r.t Sampling_Date
    distinct( exp_nam, factors, Data_type, Unit, Year, mean, sd, se, .keep_all=TRUE ) %>%
    
    # bind them together again. also cleaning replicated rows (note treatment added as column!)
    bind_rows( ., 
               filter(df_long, level=="elevated") %>% 
                 distinct( exp_nam, factors, treatment, Data_type, Unit, Year, mean, sd, se, .keep_all=TRUE )
                 ) %>% 
    
    ## abandon columns, now obsolete with boolean columns for factors 
    select(-treatment, -level, -id)
  
  return(df_long)
}

# run it
keyvars <- c("id", "exp_nam", "factors", "level", "treatment", "Data_type")
valuevars <-  c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")

df_long_example <- df_wide %>%
  select(one_of(keyvars, valuevars)) %>%
  wide_to_long_gcme( keyvars ) %>%
  print()

## Loading required package: stringr

## # A tibble: 4 x 8
##   exp_nam          factors Data_type          mean  sd    se    c     f    
##   <chr>            <chr>   <chr>              <chr> <chr> <chr> <lgl> <lgl>
## 1 SwissFACE_loliu… cf      aboveground_bioma… A     sdA   seA   FALSE FALSE
## 2 SwissFACE_loliu… cf      aboveground_bioma… B     sdB   seB   TRUE  FALSE
## 3 SwissFACE_loliu… cf      aboveground_bioma… C     sdC   seC   TRUE  TRUE 
## 4 SwissFACE_loliu… cf      aboveground_bioma… D     sdD   seD   FALSE TRUE

Converting the entire table

Let’s apply this to the full data frame. First, keep only some of the columns:

keyvars <- c("id", "exp_nam", "factors", "treatment", "level", "Data_type", "Unit", "Start_Year", "Year",  "n_plots")
valuevars <-  c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")

df_long <- df %>% 
  select(one_of(keyvars, valuevars)) %>%
  # filter(exp_nam=="BioCON" & Data_type=="leaf_N" & Year==4) %>%  # xxx debug
  wide_to_long_gcme(keyvars) %>% 
  write_csv(path="~/Dropbox/data/gcme/data_received_190325/NewData_LONG.csv")

Thus, we have a long-format data table, that has less than twice as many rows as the original table, because the “absolute ambient” is no longer repeated:

print(nrow(df)*2)
nrow(df_long)

Test

Test, whether the conversion was doing alright.

check_cf <- function(df_wide, df_long, verbose=FALSE){

  ## check cf experiments
  df_cf <- df_wide %>% filter(treatment %in% c("f","c","cf","fc"))

  out <- tibble()

  all_exp <- df_cf$exp_nam %>% unique

  for (iexp in all_exp){

    df_sub <- df_cf %>% filter(exp_nam==iexp)

    all_var <- df_sub$Data_type %>% unique

    for (ivar in all_var){

      df_subsub <- df_sub %>% filter(Data_type==ivar)

      all_var_unit <- df_subsub$Unit %>% unique

      for (iunit in all_var_unit){

        df_subsubsub <- df_subsub %>% filter(Unit==iunit)

        # elevated-f treatment
        correct_f <- df_subsubsub %>% 
          filter(treatment=="f") %>% 
          select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
          all_equal( ., df_long %>% filter(f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )

        # elevated-c treatment
        correct_c <- df_subsubsub %>% 
          filter(treatment=="c") %>% 
          select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
          all_equal( ., df_long %>% filter(!f & c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )

        # elevated-cf treatment
        correct_cf <- df_subsubsub %>% 
          filter(treatment=="cf") %>% 
          select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
          all_equal( ., df_long %>% filter(f & c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )

        # (absolute) control: WARNING: DUPLICATED ROWS
        correct_0 <- df_subsubsub %>% 
          distinct(exp_nam, Data_type, Unit, Year, ambient, ambient_Se, ambient_Sd, .keep_all=TRUE) %>% 
          select(exp_nam, Data_type, Unit, Year, mean=ambient, se=ambient_Se, sd=ambient_Sd) %>%
          all_equal( ., df_long %>% filter(!f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )
        
        if (verbose){
        if (!identical(correct_0, TRUE)){
          rlang::warn(correct_0)
          print("WIDE:")
          df_subsubsub %>%
            distinct(exp_nam, Data_type, Unit, Year, ambient, ambient_Se, ambient_Sd, .keep_all=TRUE) %>%
            select(exp_nam, Data_type, Unit, Year, mean=ambient, se=ambient_Se, sd=ambient_Sd) %>%
            print()
          print("LONG:")
          df_long %>%
            filter(!f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>%
            select(exp_nam, Data_type, Unit, Year, mean, se, sd) %>%
            print()
          print("---------")
        }
        }

        out <- bind_rows(out, c(factors="cf", exp_nam=iexp, Data_type=ivar, Unit=iunit, correct_0=correct_0, correct_c=correct_c, correct_f=correct_f, correct_cf=correct_cf))

      }
    }
  }

  return(out)
}
#out <- check_cf( df, df_long) %>% head()

Issue:

There are still lots of problems with identifying the ambients (see column correct_0 in out). See for example:

## note row 18 is missing in wide
out <- check_cf( filter(df, exp_nam=="BioCON", Data_type=="leaf_N", Year==4), df_long, verbose = TRUE)

TODO:

Make sure the wide-to-long conversion works fine. That is, out should have TRUE in all columns.

Converting back with a twist

In the original data, ambient was the “absolute ambient”. For example in the cf treatment, ambient was no treatment at all (control), and elevated was c \(\times\) f. We may re-design a wide version of the table and allow flexibility in what treatments are compared to each other, i.e., what ambient means. I would argue that this depends whether we are interested in a single effect or in interactions. In our case, we are interested primarily in the CO2 effect. Hence, in a factorial experiment cf, we may want to compare c to 0 (an “absolute ambient”), and cf to f and thus have more data available to evaluate the CO2-only effect.

Let’s implement this, by converting our example back to a CO2-wide format:

long_to_wide_gcme <- function(df_long, keyvar){

  joinvars <- names(df_long)[-which(names(df_long) %in% c("mean", "sd", "se", keyvar))]
  factors_all <- df_long %>%
    filter(!(factors %in% c("x","x99"))) %>%      # Something weird with this
    select(factors) %>% 
    unique() %>% 
    mutate( factors_sep = strsplit(factors, NULL) ) %>% 
    select(factors_sep) %>% 
    unlist() %>% 
    unname() %>% 
    unique()
  factorvars <- factors_all[-which(factors_all==keyvar)]
    
  # take all data where factor 'c' is TRUE
  df_wide <- dplyr::filter(df_long, eval(parse(text=keyvar)) ) %>% 
    
    # call this 'elevated'
    dplyr::rename(elevated=mean, elevated_sd=sd, elevated_se=se) %>% 
    
    # remove column 'c', is no longer used
    dplyr::select(-keyvar) %>% 
    
    # merge this with the corresponding row where all other factors are the same, while 'c' is FALSE
    left_join( 
      
      # take all data where factor 'c' is FALSE
      dplyr::filter(df_long, !(eval(parse(text=keyvar)))) %>% 

        # call this 'ambient'
        dplyr::rename(ambient=mean, ambient_sd=sd, ambient_se=se) %>% 

        # remove column 'c', is no longer used
        dplyr::select(-keyvar),
      
      ## merge by all other columns
      by = joinvars ) %>% 
    
    # order columns
    select( c(joinvars, ambient, elevated, ambient_se, elevated_se, ambient_sd, elevated_sd)  )

  return(df_wide)
}

Run it

df_long_example %>%
  long_to_wide_gcme("c") %>%
  print()

## # A tibble: 2 x 10
##   exp_nam factors Data_type f     ambient elevated ambient_se elevated_se
##   <chr>   <chr>   <chr>     <lgl> <chr>   <chr>    <chr>      <chr>      
## 1 SwissF… cf      abovegro… FALSE A       B        seA        seB        
## 2 SwissF… cf      abovegro… TRUE  D       C        seD        seC        
## # … with 2 more variables: ambient_sd <chr>, elevated_sd <chr>

The same can be done w.r.t. the fertilisation treatment, with two rows corresponding now to whether or not CO2 was elevated.

df_long_example %>%
  long_to_wide_gcme("f") %>%
  print()

## # A tibble: 2 x 10
##   exp_nam factors Data_type c     ambient elevated ambient_se elevated_se
##   <chr>   <chr>   <chr>     <lgl> <chr>   <chr>    <chr>      <chr>      
## 1 SwissF… cf      abovegro… TRUE  B       C        seB        seC        
## 2 SwissF… cf      abovegro… FALSE A       D        seA        seD        
## # … with 2 more variables: ambient_sd <chr>, elevated_sd <chr>

Let’s apply this to the full data frame.

keyvars <- c("ALIAS", "exp_nam", "factors", "Data_type", "Unit", "Sampling_date", "n_plots")
valuevars <-  c("mean", "sd", "se")

df_wide_c <- df_long %>% 
  # select(one_of(keyvars, valuevars, factors_avl)) %>%
  # select(-prev_name, -TT_Nut_Detail) %>% 
  long_to_wide_gcme("c") %>% 
  write_csv(path="~/Dropbox/data/gcme/data_received_190325/created/NewData_WIDE_c.csv")

TODO:

Develop a test to check the new-wide format against the long or against the original-wide table.

Data selection

The initial draft of the figures we want to draw looks like this:

The following is a list of variables we need according to this draft, and the corresponding (possibly) relevant variables (Data_type) in our dataset.

Add two columns to the data. One for the new standard variable name (varnam), taken from CSV files. And one for this specific study where multiple variables (based on varnames) are pooled because they contain relevant and comparable information when evaluating response ratios (named my_varnam).

General filters

We filter the data based on the following criteria:

Only data from experiments is used where CO2 was the only manipulated factor (data from treatment = "c")
Only data from experiments where data from at least two years is available, based on column Year in the data table.
Only data from Open Top Chambers or Free Air CO2 Enrichment CO2 experiments: fumigation type is either "OTC" or "FACE".

## Determine all experiments that have more than 1 year data
list_exp_gt1yr <- df %>% 
  filter(!is.na(Year)) %>% 
  group_by(exp_nam) %>% 
  summarise(nyears=max(Year)) %>% 
  filter(nyears>1) %>% 
  select(exp_nam) %>% 
  unlist() %>% 
  unname()

df_c <- df %>%
  
  # ## Take this info from experiments below
  # select(-Fumigation_type, -Vegetation_type) %>% 
  
  ## Add prev_name back
  left_join( df_experiments, by = "exp_nam") %>% 
  
  ## filter experiments with only manipulated CO2
  ## (no other factors manipulated, strong reduction of data)
  filter(treatment=="c") %>% 
  
  ## More than 1 year data
  filter(exp_nam %in% list_exp_gt1yr) %>% 
  
  ## Combine with experiments meta info
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Vegetation_type),
    by = c("prev_name")
  ) %>% 
  
  ## Filter only Fumigation_type OTC or FACE
  filter( Fumigation_type %in% c("OTC", "FACE") ) %>%
  
  {.}

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

ANPP

ANPP: my_varnam = my_anpp, and varnam %in% c("anpp", "anpp_bm").

Collect information about experiments providing data for any of varnam %in% c("anpp", "anpp_bm"). See procedure here. Add additional information from the experiments table (table_var_exp_names_experiments.csv) and from the data frame df_experiments, which is used to associate new and old experiments names.

df_anpp <- read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_anpp.csv") %>% 
  select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
  distinct() %>% 
  filter(varnam %in% c("anpp", "anpp_bm") ) %>% 
  
  ## add 'exp_nam' (new experiment name) from data table
  left_join( df_experiments, by="prev_name" ) %>% 
  
  ## filter only experiments selected by general filters above
  filter( exp_nam %in% unique(df_c$exp_nam) ) %>% 
  
  ## add experiments meta information from experiments table
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
    by = c("prev_name")
    )

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ALIAS = col_double(),
##   Data_Number = col_double(),
##   ambient = col_double(),
##   ambient_Se = col_double(),
##   ambient_Sd = col_double(),
##   elevated = col_double(),
##   elevated_Se = col_double(),
##   elevated_Sd = col_double()
## )

## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

Add my_varnam to the data table, defined by my_varnam = ifelse( varnam %in% c("anpp", "anpp_bm"), "my_anpp", my_varnam ). WARNING: This assumes that variable names ‘Data_type’ are identical in old and new dataset (manually replaced white spaces with underscores in table_var_exp_names_anpp.csv, first)

df_c <- df_c %>% 
  mutate( varnam=NA ) %>% 
  left_join( select( df_anpp, -prev_name ), by=c("exp_nam", "Data_type") ) %>% 
  distinct(id, .keep_all=TRUE) %>% 
  rename( varnam = varnam.x ) %>% 
  rowwise() %>% 
  mutate( varnam = ifelse(!is.na(varnam.y), varnam.y, varnam ) ) %>% 
  select( -varnam.y ) %>% 
  mutate( my_varnam = NA ) %>% 
  mutate( my_varnam = ifelse( varnam %in% c("anpp", "anpp_bm"), "my_anpp", my_varnam ) )

And some information about the data used for my_anpp (only top 5 rows of a total of 110 available rows, full table at the bottom of this document)

df_anpp %>% 
  select(-prev_name, -Cquantity_Control) %>% 
  distinct(varnam, Data_type, exp_nam, .keep_all=TRUE) %>% 
  mutate( Source_Reference = str_replace_all(Source_Reference, "_", " ") ) %>% 
  knitr::kable()

Leaf N

This analysis is done by mass, area, and C:N separately.

my_varnam = "my_nmass_leaf", and varnam = "nmass_leaf".
my_varnam = "my_narea_leaf", and varnam = "narea_leaf".
my_varnam = "my_rcton_leaf", and varnam = "rcton_leaf".

Collect information about experiments providing data for any of varnam %in% c("nmass_leaf", "narea_leaf", "rcton_leaf") from file "table_var_exp_names_leafn.csv". See procedure here. Add additional information from the experiments table (table_var_exp_names_experiments.csv) and from the data frame df_experiments, which is used to associate new and old experiments names.

df_leafn <- read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_leafn.csv") %>% 
  select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
  distinct() %>% 
  filter(varnam %in% c("nmass_leaf", "narea_leaf", "rcton_leaf") ) %>% 
  
  ## add 'exp_nam' (new experiment name) from data table
  left_join( df_experiments, by="prev_name" ) %>% 
  
  ## filter only experiments selected by general filters above
  filter( exp_nam %in% unique(df_c$exp_nam) ) %>% 
  
  ## add experiments meta information from experiments table
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
    by = c("prev_name")
    )

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   Data_Number = col_double(),
##   ambient = col_double(),
##   ambient_Se = col_double(),
##   ambient_Sd = col_double(),
##   elevated = col_double(),
##   elevated_Se = col_double(),
##   elevated_Sd = col_double(),
##   Remarks = col_logical(),
##   X21 = col_logical()
## )

## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

Add my_varnam to the data table, defined by my_varnam = ifelse( varnam == "nmass_leaf", "my_nmass_leaf", my_varnam ).

df_c <- df_c %>%
  left_join( select( df_leafn, exp_nam, Data_type, varnam_unified=varnam ), by=c("exp_nam", "Data_type") ) %>%
  mutate( varnam = ifelse( is.na(varnam), varnam_unified, varnam ) ) %>%
  mutate( my_varnam = ifelse( varnam == "nmass_leaf", "my_nmass_leaf", my_varnam ),
          my_varnam = ifelse( varnam == "narea_leaf", "my_narea_leaf", my_varnam ),
          my_varnam = ifelse( varnam == "rcton_leaf", "my_rcton_leaf", my_varnam )) %>%
  select(-varnam_unified)

Asat

This analysis is done by Asat and Anet, that is at saturating and ambient PAR, respectively, and always at treatment CO2 levels.

my_varnam = "my_asat", and varnam = "asat".
my_varnam = "my_anet", and varnam = "anet".

Collect information about experiments providing data for any of varnam %in% c("asat", "anet") from file "table_var_exp_names_asat.csv". See procedure here. Add additional information from the experiments table (table_var_exp_names_experiments.csv) and from the data frame df_experiments, which is used to associate new and old experiments names.

df_asat <- read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_asat.csv") %>% 
  select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
  distinct() %>% 
  filter(varnam %in% c("asat", "anet") ) %>% 
  
  ## add 'exp_nam' (new experiment name) from data table
  left_join( df_experiments, by="prev_name" ) %>% 
  
  ## filter only experiments selected by general filters above
  filter( exp_nam %in% unique(df_c$exp_nam) ) %>% 
  
  ## add experiments meta information from experiments table
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
    by = c("prev_name")
    )

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ALIAS = col_double(),
##   Data_Number = col_double(),
##   ambient = col_double(),
##   ambient_Se = col_double(),
##   ambient_Sd = col_double(),
##   elevated = col_double(),
##   elevated_Se = col_double(),
##   elevated_Sd = col_double(),
##   Measurement_replicates = col_double(),
##   X21 = col_logical()
## )

## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

Add my_varnam to the data table, defined by my_varnam = ifelse( varnam == "nmass_leaf", "my_nmass_leaf", my_varnam ).

df_c <- df_c %>%
  left_join( select( df_asat, exp_nam, Data_type, varnam_unified=varnam ), by=c("exp_nam", "Data_type") ) %>%
  mutate( varnam = ifelse( is.na(varnam), varnam_unified, varnam ) ) %>%
  mutate( my_varnam = ifelse( varnam == "anet", "my_anet", my_varnam ),
          my_varnam = ifelse( varnam == "asat", "my_asat", my_varnam ) ) %>%
  select(-varnam_unified)

BNPP

Let’s pool various types of data to analyse response ratios of BNPP, interpreted here as a proxy for total belowground C allocation.

ANPP: my_varnam = my_bnpp, and varnam %in% c("bnpp", "npp_fineroot", "npp_fineroot_length", "bm_fineroot", "bm_fineroot_mass", "bm_fineroot_vol", "bm_fineroot_length").

df_bnpp <- read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_bnpp.csv") %>% 
  select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
  distinct() %>% 
  filter(varnam %in% c("bnpp", "npp_fineroot", "npp_fineroot_length", "bm_fineroot", "bm_fineroot_mass", "bm_fineroot_vol", "bm_fineroot_length") ) %>% 
  
  ## add 'exp_nam' (new experiment name) from data table
  left_join( df_experiments, by="prev_name" ) %>% 
  
  ## filter only experiments selected by general filters above
  filter( exp_nam %in% unique(df_c$exp_nam) ) %>% 
  
  ## add experiments meta information from experiments table
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
    by = c("prev_name")
    ) %>% 
  
  ## String replace in CSV file
  mutate( Data_type = str_replace_all(Data_type, " ", "_") )

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ALIAS = col_double(),
##   Data_Number = col_double(),
##   ambient = col_double(),
##   ambient_Se = col_double(),
##   ambient_Sd = col_double(),
##   elevated = col_double(),
##   elevated_Se = col_double(),
##   elevated_Sd = col_double()
## )

## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

Add my_varnam to the data table, defined by my_varnam = ifelse( varnam %in% c("bnpp", "npp_fineroot", "npp_fineroot_length", "bm_fineroot", "bm_fineroot_mass", "bm_fineroot_vol", "bm_fineroot_length").

df_c <- df_c %>%
  left_join( select( df_bnpp, exp_nam, Data_type, varnam_unified=varnam ), by=c("exp_nam", "Data_type") ) %>%
  mutate( varnam = ifelse( is.na(varnam), varnam_unified, varnam ) ) %>%
  mutate( my_varnam = ifelse( varnam %in% c("bnpp", "npp_fineroot", "npp_fineroot_length", "bm_fineroot", "bm_fineroot_mass", "bm_fineroot_vol", "bm_fineroot_length"), "my_bnpp", my_varnam ) ) %>%
  select(-varnam_unified)

N availability

Let’s pool various types of data to analyse response ratios of N availability.

navl: my_varnam = my_navl, and varnam %in% c("nmin_mass", "nmin_area", "nh4_mass", "no3_mass", "nmin_solution", "nh4_solution", "no3_solution", "nmin_resin", "nh4_resin", "no3_resin").

See procedure here. Add additional information from the experiments table (table_var_exp_names_experiments.csv) and from the data frame df_experiments, which is used to associate new and old experiments names.

df_navl <- read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_navl.csv") %>% 
  select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
  distinct() %>% 
  filter(varnam %in% c("nmin_mass", "nmin_area", "nh4_mass", "no3_mass", "nmin_solution", "nh4_solution", "no3_solution", "nmin_resin", "nh4_resin", "no3_resin") ) %>% 
  
  ## add 'exp_nam' (new experiment name) from data table
  left_join( df_experiments, by="prev_name" ) %>% 
  
  ## filter only experiments selected by general filters above
  filter( exp_nam %in% unique(df_c$exp_nam) ) %>% 
  
  ## add experiments meta information from experiments table
  left_join( 
    read_csv("~/Dropbox/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>% 
      select(prev_name, Fumigation_type=my_fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
    by = c("prev_name")
    ) %>% 
  
  ## String replace in CSV file
  mutate( Data_type = str_replace_all(Data_type, " ", "_") )

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ALIAS = col_double(),
##   Data_Number = col_double(),
##   ambient = col_double(),
##   ambient_Sd = col_double(),
##   elevated = col_double(),
##   elevated_Sd = col_double()
## )

## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Experiment Number` = col_double(),
##   `N Quantity control` = col_double(),
##   `P Quantity control` = col_double(),
##   `K Quantity control` = col_double(),
##   `K Distribution` = col_logical()
## )

## See spec(...) for full column specifications.

Add my_varnam to the data table, defined by my_varnam = ifelse( varnam %in% c("nmin_mass", "nmin_area", "nh4_mass", "no3_mass", "nmin_solution", "nh4_solution", "no3_solution", "nmin_resin", "nh4_resin", "no3_resin").

df_c <- df_c %>%
  left_join( select( df_navl, exp_nam, Data_type, varnam_unified=varnam ), by=c("exp_nam", "Data_type") ) %>%
  mutate( varnam = ifelse( is.na(varnam), varnam_unified, varnam ) ) %>%
  mutate( my_varnam = ifelse( varnam %in% c("nmin_mass", "nmin_area", "nh4_mass", "no3_mass", "nmin_solution", "nh4_solution", "no3_solution", "nmin_resin", "nh4_resin", "no3_resin"), "my_navl", my_varnam ) ) %>%
  select(-varnam_unified)

Root:shoot ratio

Collect experiments info directly from data table.

df_c %>% 
  filter(Data_type=="root-shoot_ratio") %>% 
  select(exp_nam, prev_name, Fumigation_type.x, Vegetation_type.x ) %>% 
  distinct() %>% 
  knitr::kable()

Add my_varnam to the data table.

df_c <- df_c %>%
  mutate( my_varnam = ifelse( Data_type == "root-shoot_ratio", "my_rootshootratio", my_varnam ) )

N uptake

Collect experiments info directly from data table.

df_c %>% 
  filter(Data_type %in% c("N_uptake", "NH4+_uptake", "NO3-_uptake") ) %>% 
  select(exp_nam, prev_name, Fumigation_type.x, Vegetation_type.x ) %>% 
  distinct() %>% 
  knitr::kable()

Add my_varnam to the data table.

df_c <- df_c %>%
  mutate( my_varnam = ifelse( Data_type %in% c("N_uptake", "NH4+_uptake", "NO3-_uptake") , "my_nup", my_varnam ) )

N leaching

Collect experiments info directly from data table.

df_c %>% 
  filter(Data_type %in% c("N_leaching") ) %>% 
  select(exp_nam, prev_name, Fumigation_type.x, Vegetation_type.x ) %>% 
  distinct() %>% 
  View()

Add my_varnam to the data table.

df_c <- df_c %>%
  mutate( my_varnam = ifelse( Data_type %in% c("N_leaching") , "my_nleach", my_varnam ) )

SO FAR ONLY AVAILABLE FOR ONE EXPERIMENT. THEREFORE OMITTED IN META-ANALYSIS.

N2O emissions

Collect experiments info directly from data table.

df_c %>% 
  filter(Data_type == "soil_N2O_efflux" ) %>% 
  select(exp_nam, prev_name, Fumigation_type.x, Vegetation_type.x ) %>% 
  distinct() %>% 
  View()

Add my_varnam to the data table.

df_c <- df_c %>%
  mutate( my_varnam = ifelse( Data_type %in% c("soil_N2O_efflux") , "my_n2o", my_varnam ) )

SO FAR ONLY AVAILABLE FOR ONE EXPERIMENT. THEREFORE OMITTED IN META-ANALYSIS.

Soil C decomposition

Take data directly from Van Groenigen et al. (2014).

df_soilc_c <- read_csv(file="~/Dropbox/data/gcme/data_received_190325/soilc_decomp_vangroenigen14.csv") %>% 
  filter(treatment=="c") %>%
  
  ## Back-calculate SD of k from column wVk = 1/(SD_ambient^2/k_ambient^2 + SD_elevated^2/k_elevated^2)
  ## assuming that ambient_Sd = elevated_Sd
  rowwise() %>% 
  mutate( n_plots = n_experiments * n_ambient ) %>% 
  mutate( ambient_Sd = k_ambient * k_elevated / sqrt(wVk * (k_ambient^2 + k_elevated^2)) ) %>%
  mutate( elevated_Sd = ambient_Sd,
          varnam = "kdecay_soil",
          my_varnam = "my_kdecay_soil" ) %>% 
  left_join(df_experiments, by="prev_name")

## Warning: Duplicated column names deduplicated: 'tau' => 'tau_1' [11]

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   exp_nam_vangroenigen = col_character(),
##   prev_name = col_character(),
##   treatment = col_character(),
##   Ecosystem = col_character()
## )

## See spec(...) for full column specifications.

df_c <- df_c %>% 
  full_join( 
    select(df_soilc_c, exp_nam, prev_name, varnam, my_varnam, exp_nam_vangroenigen, ambient=k_ambient, elevated=k_elevated, ambient_Sd, elevated_Sd, n_plots ),
    by = c("prev_name", "exp_nam", "ambient", "elevated", "ambient_Sd", "elevated_Sd", "varnam", "my_varnam", "n_plots"))

ISSUE: No invormation on SD of decomposition rate estimates in ambient and elevated available.

For labeling

Prepare variable name association for labelling plots

df_varnams <- tibble(
  my_varnam = c("my_anpp", "my_nmass_leaf", "my_narea_leaf", "my_asat", "my_anet", "my_bnpp", "my_navl",
                "my_rootshootratio", "my_nup",   "my_nleach",  "my_n2o",        "my_kdecay_soil"),
  my_lab    = c("ANPP",    "Leaf Nmass",    "Leaf Narea",    "Asat",    "Anet",    "BNPP",    "N availability",
                "root:shoot ratio",  "N uptake", "N leaching", "N2O emissions", "Soil decomposition rate")
)

Data analysis

Test analysis and plot of ANPP data. Calculate the response ratio of ANPP (mean and variance) for each experiment. To get that, we first need to calcuate the means and standard deviation for the ambient and elevated levels, pooling multiple measurements (years, sampling dates), each given with mean \(\mu_i\), number \(N_i\) (replicates/plots), and standard deviation \(\sigma_i\) or standard error. For the function metafor::escalc(), we need standard deviations (\(SD\)). Calculate them for those rows where only standard errors \(SE\) are given as: \[ SD = SE \sqrt{N} \]

Now, do the meta-analysis and plot results.

library(metafor)   # see ?dat.curtis1998 for an example with CO2 manipulation data

## Loading required package: Matrix

## Loading 'metafor' package (version 2.1-0). For an overview 
## and introduction to the package please type: help(metafor).

## aggregate by variable and experiment, pooling multiple years, sampling dates, and plots/replicates and calculate log response ratio
df_c_sub <- df_c %>%         
  
  ## Here only for my variables, selected as described above 
  ## (leaving out "my_anet"; omitting "my_nleach" and "my_n2o" because only one experiment available)
  filter(my_varnam %in% c("my_anpp", "my_nmass_leaf", "my_narea_leaf", "my_asat", "my_bnpp", "my_navl", "my_rootshootratio", "my_nup", "my_kdecay_soil")) %>%
  
  # get standard deviation for all data
  mutate( my_ambient_sd = ambient_Sd, my_elevated_sd = elevated_Sd ) %>%
  rowwise() %>% 
  mutate( my_ambient_sd   = ifelse( is.na(my_ambient_sd),  ambient_Se  * sqrt(n_plots), my_ambient_sd ),
          my_elevated_sd  = ifelse( is.na(my_elevated_sd), elevated_Se * sqrt(n_plots), my_elevated_sd )) %>%

  ## Get logarithm of response ratio and its variance
  metafor::escalc( 
    measure = "ROM", 
    m1i = elevated, sd1i = my_elevated_sd, n1i = n_plots, 
    m2i = ambient,  sd2i = my_ambient_sd,  n2i = n_plots, 
    data=., 
    append = TRUE, var.names = c("logr", "logr_var") ) %>% 
  as_tibble() %>% 
  mutate( logr_se = sqrt(logr_var)/sqrt(n_plots) )

## pool all measurements (multiple years, sampling dates and plots) by variable and experiment for meta analysis
df_c_agg <- df_c_sub %>% 
  filter(!is.na(logr_var) & !is.na(logr)) %>% 
  select(-id) %>% # this is necessary because the dumb function agg() uses my 'id' otherwise
  mutate( id = paste(exp_nam, my_varnam, sep="_XXX_")) %>% 
  MAd::agg( id = id, es = logr, var = logr_var, n.1 = n_plots, n.2 = n_plots, cor = 1.0, method = "BHHR", data = . ) %>% 
  as_tibble() %>% 
  mutate( id = str_split(id, "_XXX_") ) %>% 
  mutate( exp_nam = purrr::map_chr(id, 1),
          my_varnam = purrr::map_chr(id, 2) ) %>% 
  select(exp_nam, my_varnam, es, var) %>% 

  ## add number of plots column and my_varnam
  left_join( df_c_sub %>% 
               group_by( exp_nam, my_varnam ) %>%
               summarise( n_plots = sum(n_plots) ) %>% 
               select( exp_nam, my_varnam, n_plots ),
             by = c("exp_nam", "my_varnam") ) %>% 
  rename( logr = es, logr_var = var ) %>% 
  mutate( logr_se = sqrt(logr_var)/sqrt(n_plots) ) %>% 
  left_join( df_varnams, by = "my_varnam" ) %>% 
  
  ## filter NA for exp_nam due to unidentified experiment name in soil decomposition dataset
  filter(exp_nam!="NA" & !is.na(exp_nam))

## aggregate by variable, doing a meta-analysis of the log response ratios, with experiment as random factor (method="DL")
agg_meta <- function(df, groupvar){
  
  out_meta <- df %>% dplyr::filter(my_varnam==eval(parse_character(groupvar))) %>% 
    
    # main meta analysis function call, adjusted step size (see http://www.metafor-project.org/doku.php/tips:convergence_problems_rma)
    # metafor::rma( logr, logr_var, method = "REML", slab = exp_nam, control = list(stepadj=0.3), data = . )
    metafor::rma.mv( logr, logr_var, method = "REML", random = ~ 1 | exp_nam, slab = exp_nam, control = list(stepadj=0.3), data = . )
  
  # transform back
  out_meta_scaled <- predict( out_meta, transf=exp )
  
  df_box <- tibble(
    my_varnam=groupvar, 
    middle = out_meta$b[1,1], 
    ymin   = out_meta$ci.lb, 
    ymax   = out_meta$ci.ub,
    
    middle_scaled = out_meta_scaled$pred, 
    ymin_scaled   = out_meta_scaled$ci.lb, 
    ymax_scaled   = out_meta_scaled$ci.ub
  )
  return(list(df_box=df_box, out_meta=out_meta))
}

# do meta-analysis on all variables
varlist <- unique(df_c_agg$my_varnam)
list_meta  <- purrr::map(as.list(varlist), ~agg_meta(df_c_agg, .))
df_metabox <- purrr::map_dfr(list_meta, "df_box") %>% left_join( df_varnams, by = "my_varnam" )
names(list_meta) <- varlist

Plot dots and my box

library(ggplot2)
df_c_agg %>%
  # arrange(logr) %>% 
  mutate( my_lab = factor(my_lab, levels=rev(c("Asat", "Leaf Nmass", "Leaf Narea", "ANPP", "N uptake", "N availability", "root:shoot ratio", "BNPP", "Soil decomposition rate")))) %>% 
  ggplot( aes(x=my_lab, y=logr)) +
  geom_jitter( color = rgb(0,0,0,0.3), aes( size = 1/logr_se ), position = position_jitter(w = 0.2, h = 0) ) +
  geom_crossbar( data = df_metabox, aes(x=my_lab, y=middle, ymin=ymin, ymax=ymax), fill = "grey80", alpha = 0.6, width = 0.5 ) +
  geom_hline( yintercept=0.0, size=0.5 ) +
  labs(x="", y="Log Response Ratio", size="1/SE") +
  coord_flip() +
  ylim(-1,1)

## Warning: Removed 5 rows containing missing values (geom_point).

Forest plots

And the forest plots:

forest_byvar <- function(list_meta, varnam_lab){
  par(mar=c(4,4,1,2))
  out_forest <- metafor::forest(list_meta$out_meta, xlab="Log Response Ratio", mlab="", xlim=c(-1,1), cex=0.5)
  text(out_forest$xlim[1], out_forest$ylim[2], varnam_lab,  pos=4, cex=0.7, font=2 )
}

df_varnams_sub <- df_varnams %>% filter(!(my_varnam %in% c("my_anet", "my_n2o", "my_nleach")))
purrr::map(
  as.list(1:nrow(df_varnams_sub)), 
  ~forest_byvar(list_meta[[df_varnams_sub$my_varnam[.]]], df_varnams_sub$my_lab[.])
  )

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
## 
## [[6]]
## NULL
## 
## [[7]]
## NULL
## 
## [[8]]
## NULL
## 
## [[9]]
## NULL

This shows the log-response ratio of ANPP, each dot for one experiment, multiple years, sampling dates, and plots pooled. The size of the dots represents the standard error of the response ratio (the smaller, the bigger). The box represents the meta-analysis mean and confidence interval of the log response ratio.

Couplings

Plot effect sizes for two different variables against each other, given that data is available for both variables for a given experiment.

ANPP-BNPP

df_c_anpp_bnpp <- df_c_agg %>% 
  filter(my_lab %in% c("ANPP", "BNPP")) %>% 
  select(exp_nam, my_lab, logr) %>%
  tidyr::spread( my_lab, logr )

df_c_anpp_bnpp <- df_c_agg %>% 
  filter(my_lab %in% c("ANPP", "BNPP")) %>% 
  select(exp_nam, my_lab, logr_se) %>%
  tidyr::spread( my_lab, logr_se ) %>% 
  rename(seANPP=ANPP, seBNPP=BNPP) %>%
  right_join(df_c_anpp_bnpp, by="exp_nam") %>% 
  mutate(se = seANPP*seBNPP)

df_c_anpp_bnpp %>% 
  ggplot(aes(x=ANPP, y=BNPP, label=exp_nam)) +
  geom_point(aes(size=1/se)) +
  xlim(0, 0.8) + ylim(0, 0.8) +
  geom_abline() + 
  ggrepel::geom_text_repel(size=3, point.padding = 0.5, segment.alpha = 0) +
  labs(x="Log Response Ratio of ANPP", y="Log Response Ratio of BNPP", size="1/SE")

## Warning: Removed 34 rows containing missing values (geom_point).

## Warning: Removed 34 rows containing missing values (geom_text_repel).

  # geom_errorbar(aes(ymin=BNPP-seBNPP, ymax=BNPP+seBNPP)) +
  # geom_errorbarh(aes(xmin=ANPP-seANPP, xmax=ANPP+seANPP))

CO2 experimental data analysis

Beni Stocker

2019-04-03

Data screening

0. Issues

Sampling date

Duplicate rows

Distinct ambients

4. Long versus wide

An example conversion

Converting the entire table

Test

Converting back with a twist

Data selection

General filters

ANPP

Leaf N

Asat

BNPP

N availability

Root:shoot ratio

N uptake

N leaching

N2O emissions

Soil C decomposition

For labeling

Data analysis

Forest plots

Couplings

ANPP-BNPP