For a first data screening want to do the following:
Let’s check first if the data is ready to use for these steps.
But before we start:
ANPP_
, Asat_
, SLA_
, coarse_root_biomass_
, LMA_
, soil_NH4+_
, N_uptake_
). Needed to delete that manually.m2
)g_m-2
with g/m2
, or kg_ha-1
with kg/ha
, or g/m2*yr
with g/m2/yr
(assuming that’s correct if that’s the way it would be entered in R), _g/m2
with g/m2
, cm2/m2*d
with cm2/m2/d
, … I’ll leave it with that for now.Vcmax_normalized_to_25°C
with Vcmax25
, and Jmax_normalized_to_25°C
with Jmax25
(problematic degree sign)‘mol_CO2/m2/s
to mol_CO2/m2/s
All that is saved as a new file NewData_wide_CORRECTED2.csv
Load the corrected file, which is based on the one Kevin sent on 1.3.2019.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df <- read_csv("~/data/gcme/data_received_190325/NewData_wide_CORRECTED2.csv") %>%
# something is wrong in columns ambient_Sd, ambient_Se, elevated...
mutate( ambient_Sd = as.numeric(ambient_Sd), ambient_Se = as.numeric(ambient_Se),
elevated_Sd = as.numeric(elevated_Sd), elevated_Se = as.numeric(elevated_Se) )
## Parsed with column specification:
## cols(
## .default = col_double(),
## exp_nam = col_character(),
## prev_name = col_character(),
## factors = col_character(),
## treatment = col_character(),
## TT_Nut_Detail = col_character(),
## Data_type = col_character(),
## Unit = col_character(),
## Sampling_date = col_character(),
## ambient_Se = col_character(),
## elevated_Se = col_character()
## )
## See spec(...) for full column specifications.
# XXX testing
# filter(exp_nam == "SwissFACE_lolium2" & Data_type == "aboveground_biomass")
# filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass")
# save experiments names
df_experiments <- df %>% select(exp_nam, prev_name) %>% distinct()
This is not read in correctly. Either continue doing manual edits (open CSV in editor and check whether it’s in a consistent format, at least within experiments), or do something clever if sampling date formats are at least consistent within experiments: e.g., reading the table experiment per experiment (making use of automatic interpretation of the sampling date format by read_csv()) and then stack all experiments’ data together again.
Some rows are simply duplicates. This is dangerous.
keyvars <- c("exp_nam", "factors", "treatment", "Data_type", "Unit", "Sampling_date", "Start_Year", "Year", "n_plots")
valuevars <- c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")
df %>%
filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass") %>%
filter(treatment=="cf") %>%
select(one_of(keyvars, valuevars)) %>%
print()
## # A tibble: 2 x 15
## exp_nam factors treatment Data_type Unit Sampling_date Start_Year Year
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 SwissF… cf cf abovegro… g/m2… <NA> 1993 NA
## 2 SwissF… cf cf abovegro… g/m2… <NA> 1993 NA
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## # ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>
This may be remediated automatically by using only columns that are distinct w.r.t. valuevars
and keyvars
(basically all columns in original data, except ALIAS
).
print(nrow(df))
## [1] 36974
df <- df %>%
distinct_at(vars(one_of(keyvars, valuevars))) %>%
mutate( id=1:n() ) # create new ID key (before this was 'ALIAS')
print(nrow(df))
## [1] 36347
df %>%
filter(exp_nam == "SwissFACE_trifolium2" & Data_type == "aboveground_biomass") %>%
filter(treatment=="cf") %>%
select(one_of(keyvars, valuevars)) %>%
print()
## # A tibble: 1 x 15
## exp_nam factors treatment Data_type Unit Sampling_date Start_Year Year
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 SwissF… cf cf abovegro… g/m2… <NA> 1993 NA
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## # ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>
Still problems?
df %>%
filter(exp_nam == "SwissFACE_lolium2" & Data_type == "belowground_biomass") %>%
filter(treatment=="cf", Year==6) %>%
select(one_of(keyvars, valuevars)) %>%
print()
## # A tibble: 2 x 15
## exp_nam factors treatment Data_type Unit Sampling_date Start_Year Year
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 SwissF… cf cf belowgro… g/m2 11/1/1998 1993 6
## 2 SwissF… cf cf belowgro… g/m2 1-Nov-98 1993 6
## # … with 7 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## # ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>, elevated_Sd <dbl>
Yes. In this case, the problem has to do with Sampling_date
. I suspect that the problem with Sampling_date
is that it was manually entered and consistency may be respected within experiments (and variables?), but not across variables. When read into R with readr::read_csv()
, different formats are interpreted always with the same function. A possible solution to this is to read data separately for each experiment and variable and stack them together again (bind_rows()
).
TODO:
Sampling_date
as far as possible and reasonable. Then, develop a separate read-in by experiments and variables, stack data again, and write the combined data to file.The way the wide format is designed, the ambient values should be the same for different rows corresponding to different treatments (for a given experiment, variable, and sampling year). This is not always the case. It might be that these are actually different data from multiple samplings. However, this is not identifiable, but should be. An example:
df %>%
filter(exp_nam == "SwissFACE_lolium2" & Data_type == "Rh" ) %>%
print()
## # A tibble: 2 x 16
## exp_nam factors treatment Data_type Unit Sampling_date Start_Year Year
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 SwissF… cf c Rh g_SO… 2002 1993 10
## 2 SwissF… cf cf Rh g_SO… 2002 1993 10
## # … with 8 more variables: n_plots <dbl>, ambient <dbl>, ambient_Se <dbl>,
## # ambient_Sd <dbl>, elevated <dbl>, elevated_Se <dbl>,
## # elevated_Sd <dbl>, id <int>
TODO:
How to perform a conversion of our current wide format to a long format? Let’s do it for one example. An example for how the wide format is implemented in the current dataset:
df_wide <- df %>%
filter(exp_nam=="SwissFACE_lolium2", Data_type=="aboveground_biomass") %>%
select(exp_nam, factors, treatment, Data_type) %>%
distinct() %>%
mutate( id=1:nrow(.),
ambient=rep("A", 3), elevated=c("B", "C", "D"),
ambient_Sd=rep("sdA", 3), elevated_Sd=c("sdB", "sdC", "sdD"),
ambient_Se=rep("seA", 3), elevated_Se=c("seB", "seC", "seD")) %>%
print()
## # A tibble: 3 x 11
## exp_nam factors treatment Data_type id ambient elevated ambient_Sd
## <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 SwissF… cf c abovegro… 1 A B sdA
## 2 SwissF… cf cf abovegro… 2 A C sdA
## 3 SwissF… cf f abovegro… 3 A D sdA
## # … with 3 more variables: elevated_Sd <chr>, ambient_Se <chr>,
## # elevated_Se <chr>
To convert this to a long format, first get all available factors and create a vector. The available factors are:
df %>%
filter(!(factors %in% c("x","x99"))) %>%
select(factors) %>%
unique() %>%
unlist() %>%
unname() %>%
print()
## [1] "cwd" "c" "cf" "cfd" "cfw" "ci" "cfi" "cw" "cdi" "cfwi"
## [11] "cd" "i" "cdf" "df" "f" "fi" "co" "o" "cwi" "w"
## [21] "d" "wd" "fd" "fw" "wi" "wdi" "fdi"
Create a vector with all single factors
factors_avl <- df %>%
filter(!(factors %in% c("x","x99"))) %>% # Something weird with this
select(factors) %>%
unique() %>%
mutate( factors_sep = strsplit(factors, NULL) ) %>%
select(factors_sep) %>%
unlist() %>%
unname() %>%
unique() %>%
print()
## [1] "c" "w" "d" "f" "i" "o"
Convert from wide to long, adding columns for all factors for which we have ambient and elevated levels’ data.
wide_to_long_gcme <- function( df_wide, keyvars ){
require(stringr)
## if level is 'elevated', turn all factor levels that are part of 'treatment' to TRUE
switch_factor <- function(df){
if (df$level=="elevated"){
for (ifactor in strsplit(df$treatment, NULL) %>% unlist()){
df[[ifactor]] <- TRUE
}
}
return(df)
}
## First get available factors
factors_avl <- df_wide %>%
filter(!(factors %in% c("x","x99"))) %>% # Something weird with this
select(factors) %>%
unique() %>%
mutate( factors_sep = strsplit(factors, NULL) ) %>%
select(factors_sep) %>%
unlist() %>%
unname() %>%
unique()
df_long <- df_wide %>%
## gather 'mean' based on columns: 'ambient', 'elevated'
select( -ambient_Sd, -elevated_Sd, -ambient_Se, -elevated_Se ) %>%
tidyr::gather(level, mean, c(ambient, elevated)) %>%
## gather 'sd' based on columns: 'ambient_Sd', 'elevated_Sd'
left_join(
select( df_wide, -ambient, -elevated, -ambient_Se, -elevated_Se ) %>%
tidyr::gather(level, sd, c(ambient_Sd, elevated_Sd)) %>%
mutate( level = stringr::str_split(.$level, "_") %>% purrr::map(., 1) %>% unlist() ),
by = c(keyvars) ) %>%
## gather 'se' based on columns: 'ambient_Se', 'elevated_Se'
left_join(
select( df_wide, -ambient, -elevated, -ambient_Sd, -elevated_Sd ) %>%
tidyr::gather(level, se, c(ambient_Se, elevated_Se)) %>%
mutate( level = stringr::str_split(.$level, "_") %>% purrr::map(., 1) %>% unlist() ),
by = c(keyvars) ) %>%
## magically add columns corresponding to all available levels of 'factors'
`is.na<-`(factors_avl) %>%
## fill all new factor columns with FALSE
dplyr::mutate_at( factors_avl, ~FALSE )
## Determine new factor columns based on information in 'treatment' and 'level'
df_long <- purrr::map_dfr(as.list(1:nrow(df_long)), ~switch_factor(df_long[.,]))
## Now we have duplicate ambients, treat them separately and stack ambient and elevated together again
df_long <- df_long %>% filter(level=="ambient") %>%
# WARNING: This assumes that if values are identical, they are repetitions, ignoring differences w.r.t Sampling_Date
distinct( exp_nam, factors, Data_type, Unit, Year, mean, sd, se, .keep_all=TRUE ) %>%
# bind them together again. also cleaning replicated rows (note treatment added as column!)
bind_rows( .,
filter(df_long, level=="elevated") %>%
distinct( exp_nam, factors, treatment, Data_type, Unit, Year, mean, sd, se, .keep_all=TRUE )
) %>%
## abandon columns, now obsolete with boolean columns for factors
select(-treatment, -level, -id)
return(df_long)
}
# run it
keyvars <- c("id", "exp_nam", "factors", "level", "treatment", "Data_type")
valuevars <- c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")
df_long_example <- df_wide %>%
select(one_of(keyvars, valuevars)) %>%
wide_to_long_gcme( keyvars ) %>%
print()
## Loading required package: stringr
## # A tibble: 4 x 8
## exp_nam factors Data_type mean sd se c f
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl>
## 1 SwissFACE_loliu… cf aboveground_bioma… A sdA seA FALSE FALSE
## 2 SwissFACE_loliu… cf aboveground_bioma… B sdB seB TRUE FALSE
## 3 SwissFACE_loliu… cf aboveground_bioma… C sdC seC TRUE TRUE
## 4 SwissFACE_loliu… cf aboveground_bioma… D sdD seD FALSE TRUE
Let’s apply this to the full data frame. First, keep only some of the columns:
keyvars <- c("id", "exp_nam", "factors", "treatment", "level", "Data_type", "Unit", "Start_Year", "Year", "n_plots")
valuevars <- c("ambient", "ambient_Se", "ambient_Sd", "elevated", "elevated_Se", "elevated_Sd")
df_long <- df %>%
select(one_of(keyvars, valuevars)) %>%
# filter(exp_nam=="BioCON" & Data_type=="leaf_N" & Year==4) %>% # xxx debug
wide_to_long_gcme(keyvars) %>%
write_csv(path="~/data/gcme/data_received_190325/NewData_LONG.csv")
Thus, we have a long-format data table, that has less than twice as many rows as the original table, because the “absolute ambient” is no longer repeated:
print(nrow(df)*2)
nrow(df_long)
Test, whether the conversion was doing alright.
check_cf <- function(df_wide, df_long, verbose=FALSE){
## check cf experiments
df_cf <- df_wide %>% filter(treatment %in% c("f","c","cf","fc"))
out <- tibble()
all_exp <- df_cf$exp_nam %>% unique
for (iexp in all_exp){
df_sub <- df_cf %>% filter(exp_nam==iexp)
all_var <- df_sub$Data_type %>% unique
for (ivar in all_var){
df_subsub <- df_sub %>% filter(Data_type==ivar)
all_var_unit <- df_subsub$Unit %>% unique
for (iunit in all_var_unit){
df_subsubsub <- df_subsub %>% filter(Unit==iunit)
# elevated-f treatment
correct_f <- df_subsubsub %>%
filter(treatment=="f") %>%
select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
all_equal( ., df_long %>% filter(f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )
# elevated-c treatment
correct_c <- df_subsubsub %>%
filter(treatment=="c") %>%
select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
all_equal( ., df_long %>% filter(!f & c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )
# elevated-cf treatment
correct_cf <- df_subsubsub %>%
filter(treatment=="cf") %>%
select(exp_nam, Data_type, Unit, Year, mean=elevated, se=elevated_Se, sd=elevated_Sd) %>%
all_equal( ., df_long %>% filter(f & c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )
# (absolute) control: WARNING: DUPLICATED ROWS
correct_0 <- df_subsubsub %>%
distinct(exp_nam, Data_type, Unit, Year, ambient, ambient_Se, ambient_Sd, .keep_all=TRUE) %>%
select(exp_nam, Data_type, Unit, Year, mean=ambient, se=ambient_Se, sd=ambient_Sd) %>%
all_equal( ., df_long %>% filter(!f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>% select(exp_nam, Data_type, Unit, Year, mean, se, sd) )
if (verbose){
if (!identical(correct_0, TRUE)){
rlang::warn(correct_0)
print("WIDE:")
df_subsubsub %>%
distinct(exp_nam, Data_type, Unit, Year, ambient, ambient_Se, ambient_Sd, .keep_all=TRUE) %>%
select(exp_nam, Data_type, Unit, Year, mean=ambient, se=ambient_Se, sd=ambient_Sd) %>%
print()
print("LONG:")
df_long %>%
filter(!f & !c & exp_nam==iexp & Data_type==ivar & Unit==iunit) %>%
select(exp_nam, Data_type, Unit, Year, mean, se, sd) %>%
print()
print("---------")
}
}
out <- bind_rows(out, c(factors="cf", exp_nam=iexp, Data_type=ivar, Unit=iunit, correct_0=correct_0, correct_c=correct_c, correct_f=correct_f, correct_cf=correct_cf))
}
}
}
return(out)
}
#out <- check_cf( df, df_long) %>% head()
Issue:
correct_0
in out
). See for example:## note row 18 is missing in wide
out <- check_cf( filter(df, exp_nam=="BioCON", Data_type=="leaf_N", Year==4), df_long, verbose = TRUE)
TODO:
out
should have TRUE
in all columns.In the original data, ambient
was the “absolute ambient”. For example in the cf
treatment, ambient
was no treatment at all (control), and elevated
was c
\(\times\) f
. We may re-design a wide version of the table and allow flexibility in what treatments are compared to each other, i.e., what ambient
means. I would argue that this depends whether we are interested in a single effect or in interactions. In our case, we are interested primarily in the CO2 effect. Hence, in a factorial experiment cf
, we may want to compare c
to 0
(an “absolute ambient”), and cf
to f
and thus have more data available to evaluate the CO2-only effect.
Let’s implement this, by converting our example back to a CO2-wide format:
long_to_wide_gcme <- function(df_long, keyvar){
joinvars <- names(df_long)[-which(names(df_long) %in% c("mean", "sd", "se", keyvar))]
factors_all <- df_long %>%
filter(!(factors %in% c("x","x99"))) %>% # Something weird with this
select(factors) %>%
unique() %>%
mutate( factors_sep = strsplit(factors, NULL) ) %>%
select(factors_sep) %>%
unlist() %>%
unname() %>%
unique()
factorvars <- factors_all[-which(factors_all==keyvar)]
# take all data where factor 'c' is TRUE
df_wide <- dplyr::filter(df_long, eval(parse(text=keyvar)) ) %>%
# call this 'elevated'
dplyr::rename(elevated=mean, elevated_sd=sd, elevated_se=se) %>%
# remove column 'c', is no longer used
dplyr::select(-keyvar) %>%
# merge this with the corresponding row where all other factors are the same, while 'c' is FALSE
left_join(
# take all data where factor 'c' is FALSE
dplyr::filter(df_long, !(eval(parse(text=keyvar)))) %>%
# call this 'ambient'
dplyr::rename(ambient=mean, ambient_sd=sd, ambient_se=se) %>%
# remove column 'c', is no longer used
dplyr::select(-keyvar),
## merge by all other columns
by = joinvars ) %>%
# order columns
select( c(joinvars, ambient, elevated, ambient_se, elevated_se, ambient_sd, elevated_sd) )
return(df_wide)
}
Run it
df_long_example %>%
long_to_wide_gcme("c") %>%
print()
## # A tibble: 2 x 10
## exp_nam factors Data_type f ambient elevated ambient_se elevated_se
## <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr>
## 1 SwissF… cf abovegro… FALSE A B seA seB
## 2 SwissF… cf abovegro… TRUE D C seD seC
## # … with 2 more variables: ambient_sd <chr>, elevated_sd <chr>
The same can be done w.r.t. the fertilisation treatment, with two rows corresponding now to whether or not CO2 was elevated.
df_long_example %>%
long_to_wide_gcme("f") %>%
print()
## # A tibble: 2 x 10
## exp_nam factors Data_type c ambient elevated ambient_se elevated_se
## <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr>
## 1 SwissF… cf abovegro… TRUE B C seB seC
## 2 SwissF… cf abovegro… FALSE A D seA seD
## # … with 2 more variables: ambient_sd <chr>, elevated_sd <chr>
Let’s apply this to the full data frame.
keyvars <- c("ALIAS", "exp_nam", "factors", "Data_type", "Unit", "Sampling_date", "n_plots")
valuevars <- c("mean", "sd", "se")
df_wide_c <- df_long %>%
# select(one_of(keyvars, valuevars, factors_avl)) %>%
# select(-prev_name, -TT_Nut_Detail) %>%
long_to_wide_gcme("c") %>%
write_csv(path="~/data/gcme/data_received_190325/NewData_WIDE_c.csv")
TODO:
The initial draft of the figures we want to draw looks like this:
The following is a list of variables we need according to this draft, and the corresponding (possibly) relevant variables (Data_type
) in our dataset.
Add two columns to the data. One for the new standard variable name (varnam
), taken from CSV files. And one for this specific study where multiple variables (based on varnames
) are pooled because they contain relevant and comparable information when evaluating response ratios (named my_varnam
).
ANPP: my_varnam = my_anpp
, and varnam %in% c("anpp", "anpp_bm")
.
Collect information about experiments providing data for any of c("anpp", "anpp_bm")
. Use file table_var_exp_names_anpp.csv
and manually add a new row varnam
there, translating available variable names (Data_type
) into new, unified variable names. Add additional information from the experiments table (table_var_exp_names_experiments.csv
) and from the data frame df_experiments
, which is used to associate new and old experiments names.
df_anpp <- read_csv("~/data/gcme/data_received_190325/table_var_exp_names_anpp.csv") %>%
select(prev_name=expnam, varnam, Data_type, Source_Reference) %>%
distinct() %>%
filter(varnam %in% c("anpp", "anpp_bm")) %>%
## add 'exp_nam' (new experiment name) from data table
left_join( df_experiments, by="prev_name" ) %>%
## add experiments meta information from experiments table
left_join(
read_csv("~/data/gcme/data_received_190325/table_var_exp_names_experiments.csv") %>%
select(prev_name, Fumigation_type, Cquantity_Control, Cquantity_Treatment, Vegetation_type, Plants_specs, Start_Year, End_Year, Remarks),
by = c("prev_name")
)
## Parsed with column specification:
## cols(
## .default = col_character(),
## ALIAS = col_double(),
## Data_Number = col_double(),
## ambient = col_double(),
## ambient_Se = col_double(),
## ambient_Sd = col_double(),
## elevated = col_double(),
## elevated_Se = col_double(),
## elevated_Sd = col_double()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_character(),
## `Experiment Number` = col_double(),
## `N Quantity control` = col_double(),
## `P Quantity control` = col_double(),
## `K Quantity control` = col_double(),
## `K Distribution` = col_logical()
## )
## See spec(...) for full column specifications.
Add my_varnam
to the data table, defined by my_varnam = ifelse( varnam %in% c("anpp", "anpp_bm"), "my_anpp", my_varnam )
. WARNING: This assumes that variable names ‘Data_type’ are identical in old and new dataset (manually replaced white spaces with underscores in table_var_exp_names_anpp.csv, first)
df <- df %>%
mutate( varnam=NA ) %>%
left_join( select( df_anpp, -prev_name ), by=c("exp_nam", "Data_type") ) %>%
distinct(id, .keep_all=TRUE) %>%
rename( varnam = varnam.x ) %>%
rowwise() %>%
mutate( varnam = ifelse(!is.na(varnam.y), varnam.y, varnam ) ) %>%
select( -varnam.y ) %>%
mutate( my_varnam = NA ) %>%
mutate( my_varnam = ifelse( varnam %in% c("anpp", "anpp_bm"), "my_anpp", my_varnam ) )
Test analysis and plot of ANPP data. Calculate the response ratio of ANPP (mean and variance) for each experiment. To get that, we first need to calcuate the means and standard deviation for the ambient and elevated levels, pooling multiple measurements (years, sampling dates), each given with mean \(\mu_i\), number \(N_i\) (replicates/plots), and standard deviation \(\sigma_i\) or standard error. For the function metafor::escalc()
, we need standard deviations (\(SD\)). Calculate them for those rows where only standard errors \(SE\) are given as: \[
SD = SE \sqrt{N}
\] Then, the pooled means are: \[
\mu = \frac{ \sum_i N_i \mu_i }{\sum_i N_i}
\] and standard deviations: \[
\sigma = \sqrt{ \frac{\sum_i (N_i-1)\sigma_i^2}{\sum_i (N_i-1)} }
\]
QUESTION TO CESAR:
library(metafor) # see ?dat.curtis1998 for an example with CO2 manipulation data
## Loading required package: Matrix
## Loading 'metafor' package (version 2.0-0). For an overview
## and introduction to the package please type: help(metafor).
mean_pooled <- function( x, n ){
sum(x * n)/sum(n)
}
sd_pooled <- function( sd, n ){
sqrt( sum( (n-1)*sd^2 ) / sum(n-1) )
}
## aggregate by variable and experiment, pooling multiple years, sampling dates, and plots/replicates and calculate log response ratio
df_c_anpp_agg <- df %>%
filter(treatment=="c") %>% # filter experiments with only manipulated CO2 (no other factors manipulated, strong reduction of data)
filter(my_varnam=="my_anpp") %>%
mutate( my_ambient_sd = ambient_Sd, my_elevated_sd = elevated_Sd ) %>% # get standard deviation for all data
rowwise() %>%
mutate( my_ambient_sd = ifelse( is.na(my_ambient_sd), ambient_Se * sqrt(n_plots), my_ambient_sd ),
my_elevated_sd = ifelse( is.na(my_elevated_sd), elevated_Se * sqrt(n_plots), my_elevated_sd )) %>%
## pool all measurements (multiple years, sampling dates and plots) by variable and experiment for meta analysis
group_by( exp_nam, treatment, my_varnam ) %>% # prepare for aggregation
summarise( ambient = mean_pooled( ambient, n_plots ),
elevated = mean_pooled( elevated, n_plots ),
ambient_sd = sd_pooled( my_ambient_sd, n_plots ),
elevated_sd = sd_pooled( my_elevated_sd, n_plots ),
n_plots = sum(n_plots) ) %>%
## Get logarithm of response ratio and its variance
metafor::escalc(
measure = "ROM",
m1i = elevated, sd1i = elevated_sd, n1i = n_plots,
m2i = ambient, sd2i = ambient_sd, n2i = n_plots,
data=.,
append = TRUE, var.names = c("logr", "logr_var") ) %>%
as_tibble() %>%
mutate( logr_se = sqrt(logr_var)/sqrt(n_plots) )
## aggregate by variable, doing a meta-analysis of the log response ratios, with experiment as random factor (method="DL")
agg_meta <- function(df, groupvar){
out_meta <- df %>% dplyr::filter(my_varnam==eval(parse_character(groupvar))) %>%
metafor::rma( logr, logr_var, method = "DL", data = . )
# transform back
out_meta_scaled <- predict( out_meta, transf=exp )
tibble(
my_varnam=groupvar,
middle = out_meta$b[1,1],
ymin = out_meta$ci.lb,
ymax = out_meta$ci.ub,
middle_scaled = out_meta_scaled$pred,
ymin_scaled = out_meta_scaled$ci.lb,
ymax_scaled = out_meta_scaled$ci.ub
)
}
# do meta-analysis all variables
df_meta <- purrr::map_dfr(as.list(unique(df_c_anpp_agg$my_varnam)), ~agg_meta(df_c_anpp_agg, .))
# Plot
library(ggplot2)
df_c_anpp_agg %>%
ggplot( aes(x=my_varnam, y=logr)) +
geom_jitter( color = rgb(0,0,0,0.3), aes( size = 1/logr_se ), position = position_jitter(w = 0.2, h = 0) ) +
geom_crossbar( data = df_meta, aes(x=my_varnam, y=middle, ymin=ymin, ymax=ymax), fill = "grey80", alpha = 0.6, width = 0.5 ) +
coord_flip() +
ylim(-1,1)
This shows the log-response ratio of ANPP, each dot for one experiment, multiple years, sampling dates, and plots pooled. The size of the dots represents the standard error of the response ratio (the smaller, the bigger). The box represents the meta-analysis mean and confidence interval of the log response ratio.
And some information about the data used for my_anpp
(only top 20 rows of a total of 110 available rows)
df_anpp %>% select(-prev_name, -Cquantity_Control) %>% distinct(varnam, Data_type, exp_nam, .keep_all=TRUE) %>% slice(1:10) %>% knitr::kable()
varnam | Data_type | Source_Reference | exp_nam | Fumigation_type | Cquantity_Treatment | Vegetation_type | Plants_specs | Start_Year | End_Year | Remarks |
---|---|---|---|---|---|---|---|---|---|---|
anpp | ANPP | Adair_et_al.,_2009 | BioCON | FACE | ambient, 560 (+180) | perennial grassland | Plots were seeded initially with different biodiversity ratios between C3, C4, perennial grasses, forbs and legumes. | 1998 | NA | NA |
anpp | ANPP | Schlesinger_et_al.,_2006 | DUKE | FACE | ambient, +200 | Pinus taeda | Pinus taeda forest (98%), other 2% is Acer rubrum, Ulmus alata, Liquidambar styraciflua, Liriodendron tulipifera and Cercis canadensis | 1996 | NA | NA |
anpp | ANPP | Liberloo_et_al.,_2005 | EUROFACE4_pa | FACE | ambient, 550 | Populus alba | hardwood cuttings (3 year SRC) | 1999 | NA | NA |
anpp | ANPP | Liberloo_et_al.,_2005 | EUROFACE4_pe | FACE | ambient, 550 | Populus x euramericana | hardwood cuttings (3 year SRC) | 1999 | NA | NA |
anpp | ANPP | Liberloo_et_al.,_2005 | EUROFACE4_pn | FACE | ambient, 550 | Populus nigra | hardwood cuttings (3 year SRC) | 1999 | NA | NA |
anpp | ANPP | Hochstrasser_et_al._2012 | GiFACE | FACE | 450–>ambient+20% | temperate, mixed-species pasture | 12 grass sp, 2 legumes, 15 non-leguminous herbs; characterized as an Arrhenatheretum elatioris Br.Bl. Filipendula ulmaria sub-community | 1998 | NA | NA |
anpp | ANPP | Dukes_et_al.,_2005 | JRBP_FACE1 | FACE | ambient, +300 | California annual grassland | Avena barbata and A. fatua, Bromus hordeaceus and Lolium multiflorum dominate the site | 1998 | NA | NA |
anpp | ANPP | Dukes_et_al.,_2005 | JRBP_FACE2 | FACE | ambient, 680 | California annual grassland | Avena barbata and A. fatua, Bromus hordeaceus and Lolium multiflorum dominate the site | 1998 | NA | NA |
anpp | ANPP | Henry_et_al.,_2006 | JRBP_FACE3 | FACE | ambient, +300 | California annual grassland | Avena barbata and A. fatua, Bromus hordeaceus and Lolium multiflorum dominate the site | 1998 | NA | NA |
anpp | ANPP | Dijkstra_et_al.,_2002 | MI | OTC | 410, 760 | Shrub Oak system | Quercus myrtifolia (76%), Q. Geminata (15%), Q. Chapmanii (7%), Serenoa repense and Lyonia ferreginea | 1996 | NA | NA |
Issue:
exp_nam
and prev_name
in the new data table. For the following old experiment names, no new names could be found:df_anpp %>% filter(is.na(exp_nam)) %>% select(prev_name) %>% distinct() %>% unlist() %>% unname() %>% print()
## [1] "N\x8cntuna_c"