My main fear with the escalc()
function is that the raw correlations must first be transformed before using measure='UCOR'
or measure='ZCOR'
. However, shown below, the necessary transformations are conducted within the escalc()
function and therefore the user does not have to do this by hand. In addition, at least for the relationship between total P and the chlorophyll a, the results of the full model are all very significant (pvals\(<0.001\)) and the over all effect size does not vary much when using any of the three specifications for the measure
argurment.
We report initial findings from multiple three-level meta-analyses on the nutrient systematic review data. The R code used to generate the figures and results below are embedded in the HTML document itself. Code and output are visible in the shaded boxes. We focus our efforts on just the differences in the overall estimated effect for total phosphorus on chlorophyll a when using three different specifications when using the escalc()
function in R.
These data are aggregated by hand from 295 individual studies involving the effects of nitrogen and phosphorus on chlorophyll, diatoms, and various macroinvertebrates. The result of this data aggregation process is a dataset with \(n=7,424\) observations and \(p=413\) columns. For congruence with Micah’s previous analyses, the 13 review studies are removed, reducing the dataset to \(n=7,411\) observations. A subset of this dataset can be viewed below. Only the first five rows are currently displayed, but this can be changed if necessary. Also, Only the first fifteen columns are displayed here as the HTML file compiling time increases significantly when all column variables are included. Columns can be sorted or subsetted directly in the document.
As is very common in any meta-analytic study, there are a non-trivial number of missing data points (Lajeunesse 2013). A distribution of the missing data by column can be seen in Figure 2.1 below. The maximum number of missing observations for each column is 7,411, which corresponds to the number of rows in the dataframe. The columns range from no missing data to completely missing (SED.LOAD.MEDIAN is only column that is completely missing).
## Number of missing observations by column
Miss_Col <- apply(Data,2,function(x) sum(is.na(x)))
ggplot(data.frame(Miss_Col),aes(x=Miss_Col)) +
geom_histogram(aes(y=..density..),color="blue",
fill="white",alpha=0.75,bins=20) +
geom_density(alpha=0.5,fill='black') +
xlab('Number of Missing Values by Column (max=7,411)') +
ylab('Density')
Figure 2.1: The density of missing observations by column (maximum of 7,411 missing observations corresponding to the number of rows in the dataset).
In Figure 2.2, we display the number of missing values for each observation (row). here, the maximum number of missing observations for each row is 413, which corresponds to the number of columns in the dataframe. The rows range from 168 missing data points to 364 missing data points.
## Number of missing observations by column
Miss_Row <- apply(Data,1,function(x) sum(is.na(x)))
ggplot(data.frame(Miss_Row),aes(x=Miss_Row)) +
geom_histogram(aes(y=..density..),color="blue",
fill="white",alpha=0.75,bins=20) +
geom_density(alpha=0.5,fill='black') +
xlab('Number of Missing Values by Row (max=413)') +
ylab('Density')
Figure 2.2: The density of missing observations by row (maximum of 413 missing observations corresponding to the number of columns in the dataset).
RESPONSE.MEASURE.VALUE2 is the main response variable in this dataframe. The values in this column represent the raw Pearson correlation coefficients and converted Pearson correlation coefficients using Spearman correlation, Kendall’s \(\tau\), standardized slope coefficients, and R2. These standardized response measurements were gleaned from each of the 295 studies. After reducing the data to only those studies which have a measureable RESPONSE.MEASURE.VALUE2, there are 5,248 observations from 228 unique studies. Figures 2.3 and 2.4 display separate boxplots of these correlation values as a function of year and a function of study type, respectively.
## Response Measures boxplots as a function of study year
ggplot(Data, aes(x=factor(YEAR),y=RESPONSE.MEASURE.VALUE2)) +
geom_boxplot(fill='#A4A4A4', color="black")+
theme_classic() + xlab('Year')
## Warning: Removed 2173 rows containing non-finite values (stat_boxplot).
Figure 2.3: Boxplots of the Pearson correlation coefficient as a function of study year.
## Response Measures boxplots as a function of study type
ggplot(Data, aes(x=factor(STUDY.TYPE),y=RESPONSE.MEASURE.VALUE2)) +
geom_boxplot(fill='#A4A4A4', color="black")+
theme_classic() + xlab('Study Type')
## Warning: Removed 2173 rows containing non-finite values (stat_boxplot).
Figure 2.4: Boxplots of the Pearson correlation coefficient as a function of study type.
Notice the corresponding R code that creates Figures 2.3 and 2.4 includes the following warning message: ## Warning: Removed 2173
rows containing non-finite values (stat_boxplot).
. This warning message is a result of the 2,173 NA values for RESPONSE.MEASURE.VALUE2. Presumably, these studies did not report Pearson correlation values, nor any other measure that may reasonably estimate a Pearson correlation.
In this section, we will explore a series of various three-level meta-analytic models as documented in (Assink 2016). Most notably, we will explore with three different measure
types in the escalc()
package: measure='COR'
, measure='UCOR'
, and measure='ZCOR'
.
Because these Pearson correlation coefficients (raw or transformed) are collected from 295 studies, it may be the case that multiple Pearson correlations are reported in the same study. A consequence of this, is that three different types of variation are being introduced into this problem:
Hence the reason for fitting a three-level meta-analytic model. The term “three-level” refers to the fact that three separate sources of variation need to be quantified and accounted for in an appropriate model.
According to the R documentation page, the escalc()
function (from the metafor
package) is used to calculate effect sizes or outcome measures that are used in meta-analyses. Put differently, this function is used to aggregate relevant study results into a form (or outcome measure) that can later be used to calculate a test statistic and examined more thoroughly. The measure
argument specifies the form of the raw data that are to be transformed via escalc()
.
measure='COR'
When using measure='COR'
, the original variable should be expressed as a raw correlation coefficient. Because the response vaiable represents variables which themselves are raw correlations (or close approximations), I believe that this is likely a correct specification of the measure
argument. Only total phosphorus (TP) is examined in this preliminary study.
########### Using measure='COR' ###########
ses.tp.yv <- escalc(measure="COR",ri=RESPONSE.MEASURE.VALUE2,
ni=IMPACT.SAMPLES, data=ses.tp.all)
## Warning in escalc(measure = "COR", ri = RESPONSE.MEASURE.VALUE2, ni =
## IMPACT.SAMPLES, : Cannot estimate the sampling variance when ni <= 4.
## Remove NAs from rows with impact samples <=4
ses.tp.yv <- ses.tp.yv[!is.na(ses.tp.yv$yi) & !is.na(ses.tp.yv$vi),]
## Put yi and vi as first and second columns
ses.tp.yv <- ses.tp.yv[,c(414, 415, 1:413)]
## Reducing the dataframe
Reduced = cbind.data.frame(RESPONSE.MEASURE.VALUE2=ses.tp.yv$RESPONSE.MEASURE.VALUE2,
ni = ses.tp.yv$IMPACT.SAMPLES,yi = ses.tp.yv$yi,
vi = ses.tp.yv$vi)
## This line adds an interactive viewer to the html file
datatable(Reduced,rownames=T,filter="top",options=list(pageLength=5,scrollX=T))
Observe that the yi column is exactly the same as RESPONSE.MEASURE.VALUE2, implying that the raw correlations remain untouched in escalc()
when measure='COR'
is specified.
########### ASSINK PAPER CODE ###########
## Estimate the overall effect by fitting an intercept-only model.
overall <- rma.mv(yi, vi, random = list(~ 1 | uniquecepair, ~ 1 | CITATION.ID),
tdist=TRUE, data=ses.tp.yv)
## The results rounded to 3 decimals
summary(overall,digits=3)
##
## Multivariate Meta-Analysis Model (k = 149; method: REML)
##
## logLik Deviance AIC BIC AICc
## -45.082 90.163 96.163 105.155 96.330
##
## Variance Components:
##
## estim sqrt nlvls fixed factor
## sigma^2.1 0.036 0.189 148 no uniquecepair
## sigma^2.2 0.095 0.309 72 no CITATION.ID
##
## Test for Heterogeneity:
## Q(df = 148) = 3139.461, p-val < .001
##
## Model Results:
##
## estimate se tval pval ci.lb ci.ub
## 0.414 0.043 9.703 <.001 0.329 0.498 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output above, we observe 148 effect sizes from 72 separate studies. The estimated variance between effect sizes within studies is 0.036 and the estimated variance between studies is 0.095. The overall estimated effect size is 0.414, which is highly significant.
measure='UCOR'
When using measure='UCOR'
, the original variable should be expressed as a raw correlation coefficient that has been corrected for its slight negative bias (based on Equation 2.3 in Olkin and Pratt (1958), see equation below). I am not sure how we would determine if the Pearson correlations have a slight negative bias without the original data. I would recommend avoiding measure='UCOR'
.
\[ G(r) = r\frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{n-2}{2})}\int^1_0\frac{t^{-1/2}(1-t)^{((n-2)/2)-1}}{[1-t(1-r^2)]^{1/2}}dt \]
Here, \(r\) represents correlation, \(\Gamma\) indicate the gamma function, and \(n>2\) represents sample size. For the purpose of observing how results differ for TP, we will continue with the analysis.
########### Using measure='UCOR' ###########
ses.tp.yv <- escalc(measure="UCOR",ri=RESPONSE.MEASURE.VALUE2,
ni=IMPACT.SAMPLES, data=ses.tp.all)
## Warning in escalc(measure = "UCOR", ri = RESPONSE.MEASURE.VALUE2, ni =
## IMPACT.SAMPLES, : Cannot compute the bias-corrected correlation coefficient
## when ni <= 4.
## Remove NAs from rows with impact samples <=4
ses.tp.yv <- ses.tp.yv[!is.na(ses.tp.yv$yi) & !is.na(ses.tp.yv$vi),]
## Put yi and vi as first and second columns
ses.tp.yv <- ses.tp.yv[,c(414, 415, 1:413)]
########### Using measure='UCOR' ###########
########### ASSINK PAPER CODE ###########
## Estimate the overall effect by fitting an intercept-only model.
overall <- rma.mv(yi, vi, random = list(~ 1 | uniquecepair, ~ 1 | CITATION.ID),
tdist=TRUE, data=ses.tp.yv)
## The results rounded to 3 decimals
summary(overall,digits=3)
##
## Multivariate Meta-Analysis Model (k = 149; method: REML)
##
## logLik Deviance AIC BIC AICc
## -49.090 98.180 104.180 113.171 104.346
##
## Variance Components:
##
## estim sqrt nlvls fixed factor
## sigma^2.1 0.043 0.208 148 no uniquecepair
## sigma^2.2 0.093 0.305 72 no CITATION.ID
##
## Test for Heterogeneity:
## Q(df = 148) = 3541.655, p-val < .001
##
## Model Results:
##
## estimate se tval pval ci.lb ci.ub
## 0.420 0.043 9.789 <.001 0.335 0.505 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output above, we observe 148 effect sizes from 72 separate studies. The estimated variance between effect sizes within studies is 0.043 and the estimated variance between studies is 0.093. The overall estimated effect size is 0.420, which is highly significant. Overall, the results do not differ that much from measure='COR'
.
measure='ZCOR'
When using measure='ZCOR'
, the original variable should be expressed as the Fisher’s \(r\)-to-\(z\) transformed correlation coefficient as in Fisher (1921). The transformation is displayed in the equation below; the symbol \(z'\) is used to denote an approximate normal transformation, and \(r\) denotes the original correlation.
\[ z'=0.5[ln(1+r)-ln(1-r)] \]
escalc()
Conducts the Necessary Transformation########### Using measure='ZCOR' ###########
ses.tp.yv <- escalc(measure="ZCOR",ri=RESPONSE.MEASURE.VALUE2,
ni=IMPACT.SAMPLES, data=ses.tp.all)
## Remove NAs from rows with impact samples <=4
ses.tp.yv <- ses.tp.yv[!is.na(ses.tp.yv$yi) & !is.na(ses.tp.yv$vi),]
## Put yi and vi as first and second columns
ses.tp.yv <- ses.tp.yv[,c(414, 415, 1:413)]
## CONDUCTING TRANSFORMATION BY HAND
ses.tp.yv$TransformedR = 0.5*(log(1+ses.tp.yv$RESPONSE.MEASURE.VALUE2) -
log(1-ses.tp.yv$RESPONSE.MEASURE.VALUE2))
## Reducing the dataframe
Reduced = cbind.data.frame(RESPONSE.MEASURE.VALUE2 = ses.tp.yv$RESPONSE.MEASURE.VALUE2,
ni = ses.tp.yv$IMPACT.SAMPLES,
yi = ses.tp.yv$yi,
vi = ses.tp.yv$vi,
r2Z_transformed = ses.tp.yv$TransformedR)
## This line adds an interactive viewer to the html file
datatable(Reduced,rownames=T,filter="top",options=list(pageLength=5,scrollX=T))
Originally, I did not believe that escalc()
with measure='ZCOR'
was conducting the necessary transformation, but from the dataset above, we are able to verify that it does. The yi column is created from escalc()
, while the r2z column represents the data transformed by hand. The fact that these are identical confirms that using measure='ZCOR'
is also appropriate.
measure='ZCOR'
########### ASSINK PAPER CODE ###########
## Estimate the overall effect by fitting an intercept-only model.
overall <- rma.mv(yi, vi, random = list(~ 1 | uniquecepair, ~ 1 | CITATION.ID),
tdist=TRUE, data=ses.tp.yv)
## The results rounded to 3 decimals
summary(overall,digits=3)
##
## Multivariate Meta-Analysis Model (k = 149; method: REML)
##
## logLik Deviance AIC BIC AICc
## -78.795 157.591 163.591 172.582 163.757
##
## Variance Components:
##
## estim sqrt nlvls fixed factor
## sigma^2.1 0.032 0.180 148 no uniquecepair
## sigma^2.2 0.135 0.368 72 no CITATION.ID
##
## Test for Heterogeneity:
## Q(df = 148) = 1722.053, p-val < .001
##
## Model Results:
##
## estimate se tval pval ci.lb ci.ub
## 0.479 0.051 9.393 <.001 0.378 0.580 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output above, we observe 148 effect sizes from 72 separate studies. The estimated variance between effect sizes within studies is 0.032 and the estimated variance between studies is 0.135. The overall estimated effect size is 0.479, which is also highly significant.
Assink, Mark. 2016. “Fitting Three-Level Meta-Analytic Models in R: A Step-by-Step Tutorial.” The Quantitative Methods for Psychology 12 (3). Crossmark: 154–74. https://doi.org/10.20982/tqmp.12.3.p154.
Fisher, Ronald A. 1921. “On the ‘Probable Error’ of a Coefficient of Correlation Deduced from a Small Sample.” Fisher: Collected Papers Relating to Statistical and Mathematical Theory and Applications. Metron. https://doi.org/10.1214/aoms/1177706717.
Lajeunesse, Marc J. 2013. “Recovering Missing or Partial Data from Studies: A Survey of Conversions and Imputations for Meta-Analysis.” Handbook of Meta-Analysis in Ecology and Evolution. Princeton University Press, 195–206.
Olkin, Ingram, and John W. Pratt. 1958. “Unbiased Estimation of Certain Correlation Coefficients.” The Annals of Mathematical Statistics 29 (1). Project Euclid: 201–11. https://doi.org/10.1214/aoms/1177706717.