Crosswalking is the process of taking systematically biased data points and estimating their unbiased values. For example, when estimating overweight prevalence (BMI >= 25), some height and weight observations might be measured physically while others are self-reported. Because people tend to overreport their height and underreport their weight, self-reported BMI is systematically biased downward. Crosswalking involves estimating the systematic difference between the alternative and reference measurement methods from matched pairs of observations, then using that estimate to adjust the biased observations toward their reference-equivalent values.
To appropriately downweight the adjusted data point, we add uncertainty that comes from the crosswalking process itself (the standard error of the prediction and the between-group heterogeneity) to the standard error of the result.
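In other words, on the log or logit scale the standard error of an adjusted data point is \( \sqrt{a^2 + b^2 + \gamma} \), where \(a\) is the standard error of the original data point, \(b\) is the standard error of the predicted adjustment, and \(\gamma\) is the estimated between-group heterogeneity variance (so \(\sqrt{\gamma}\) is its standard deviation); a worked illustration appears at the end of Example 1 below.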
Sometimes a crosswalk needs to adjust multiple alternative definitions to the gold standard reference. For example, prevalence of schistosomiasis might be measured according to a gold standard diagnostic method A, or three alternatives B, C or D. The process is essentially the same as described above, with the added benefit that network meta-analysis can use information from the indirect comparisons B:C, C:D and B:D. This vignette demonstrates how to use the crosswalk
package to run a meta-regression (optionally network meta-regression) to predict adjustment factors for biased observations.
First, we create some simulated data for a crosswalk with one reference and one alternative. In df_matched
, the variable logit_diff
represents logit(prev_alt) - logit(prev_ref)
, or the difference in logit-transformed prevalence values. By convention, we always subtract the reference from the alternative value, alt minus ref. logit_diff_se
is the standard error of logit_diff
. The package includes the functions crosswalk::delta_transform()
and crosswalk::calculate_diff()
to facilitate calculating these quantities from your matched data. See https://rpubs.com/rsoren/572599 for a full example.
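For orientation, here is a rough sketch of how those helpers are typically used on matched pairs that start in linear space. The data frame and column names (df_pairs, prev_alt, prev_ref, and their standard errors) are made up for illustration, and the exact argument names should be checked against help(delta_transform), help(calculate_diff) and the linked example.
# sketch only: matched alternative/reference prevalence pairs in linear space
df_pairs <- data.frame(
  prev_alt = c(0.30, 0.45), prev_alt_se = c(0.03, 0.04),
  prev_ref = c(0.35, 0.50), prev_ref_se = c(0.03, 0.05)
)
# delta-method transformation of each side to logit space
df_pairs[, c("logit_alt", "logit_alt_se")] <- crosswalk::delta_transform(
  mean = df_pairs$prev_alt, sd = df_pairs$prev_alt_se, transformation = "linear_to_logit")
df_pairs[, c("logit_ref", "logit_ref_se")] <- crosswalk::delta_transform(
  mean = df_pairs$prev_ref, sd = df_pairs$prev_ref_se, transformation = "linear_to_logit")
# alt-minus-ref difference and its standard error, used as the dependent variable
df_pairs[, c("logit_diff", "logit_diff_se")] <- crosswalk::calculate_diff(
  df = df_pairs, alt_mean = "logit_alt", alt_sd = "logit_alt_se",
  ref_mean = "logit_ref", ref_sd = "logit_ref_se")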
We also simulate the original dataset containing observations to be adjusted (df_orig
). The covariates used in the meta-regression model must exist in the original data, with the same column names (x1
and x2
in this case). We also include a column that indicates the definition/method used to obtain the observation: obs_method
with values “selfreported” and “measured” in this case. For df_orig
, we leave the dependent variable in linear space; the conversion to log or logit space is handled automatically by the adjust_orig_vals()
function later.
library(crosswalk, lib.loc = "/ihme/code/mscm/R/packages/")
library(dplyr)
set.seed(123)
# data for the meta-regression
# -- in a real analysis, you'd get this dataset by
# creating matched pairs of alternative/reference observations,
# then using delta_transform() and calculate_diff() to get
# log(alt)-log(ref) or logit(alt)-logit(ref) as your dependent variable
beta0_true <- -3
beta1_true <- 1
beta2_true <- 2
df_matched <- data.frame(
x1 = runif(n = 200, min = 0, max = 4),
x2 = rbinom(n = 200, prob = 0.5, size = 1) ) %>%
mutate(
logit_diff = beta0_true + x1*beta1_true + x2*beta2_true + rnorm(n = nrow(.)),
logit_diff_se = runif(200, min = 0.9, max = 1.1),
altvar = "selfreported",
refvar = "measured",
group_id = rep(1:20, each = 10)
)
head(df_matched)
#> x1 x2 logit_diff logit_diff_se altvar refvar group_id
#> 1 1.150310 0 0.3491204 0.9941364 selfreported measured 1
#> 2 3.153221 1 3.4656335 0.9731691 selfreported measured 1
#> 3 1.635908 1 0.3707626 0.9242544 selfreported measured 1
#> 4 3.532070 1 3.0752637 0.9093987 selfreported measured 1
#> 5 3.761869 0 0.3475292 0.9525593 selfreported measured 1
#> 6 0.182226 1 -1.2940209 1.0937282 selfreported measured 1
# original dataset with alternative observations to be adjusted
df_orig <- data.frame(stringsAsFactors = FALSE,
meanvar = runif(400, min = 0.2, max = 0.8), # original prevalence values; between 0 and 1
sdvar = runif(400, min = 0.1, max = 0.5), # standard errors of the original prevalence values; >0
x1 = runif(400, min = 0, max = 4),
x2 = rbinom(400, prob = 0.5, size = 1),
obs_method = sample(c("selfreported", "measured"), size = 400, replace = TRUE)
)
df_orig$row_id <- paste0("row", 1:nrow(df_orig))
head(df_orig)
#> meanvar sdvar x1 x2 obs_method row_id
#> 1 0.3641736 0.1932959 0.6210565 1 measured row1
#> 2 0.5563202 0.1922614 3.3834040 1 measured row2
#> 3 0.2961109 0.1246905 0.8575217 1 selfreported row3
#> 4 0.7120581 0.2988474 2.6794930 0 measured row4
#> 5 0.7086435 0.1976500 2.4710258 0 selfreported row5
#> 6 0.4867321 0.4032742 0.1999991 1 selfreported row6
Next we use the crosswalk
functions. CWData()
prepares the data for meta-regression, and CWModel()
fits the model. Inputs to the function parameters are described in the comments below, and more information is available in the R package help documentation. Some additional tips…
- We specify the definition/method of both sides of each matched pair with alt_dorms and ref_dorms (e.g. “selfreported” and “measured”, respectively). This may seem like an unnecessary detail now, but it allows for simpler specification of a network meta-analysis within the same framework. More on that in Example 2.
- The cov_models parameter takes a list of CovModel() function calls. These specify the functional form and/or constraints for each predictor in the model; see help(CovModel) for more information.
- To include an intercept in the model, add CovModel(cov_name = "intercept") as an element of cov_models = list(...). Note that for network meta-analysis, we almost always do want to include an “intercept”, which seems at odds with the guidance from GBD 2019. What changed is the definition of intercept, not the model. See the crosswalk training slides (https://rpubs.com/rsoren/572599) for a mathematical explanation. Briefly, “intercepts” refer to what in GBD 2019 were the \(\{-1, 0, 1\}\) variables that encode the network structure. You no longer have to create these variables yourself, and instead only need to specify CovModel("intercept") in a network meta-analysis like the one in Example 2.
- Splines on a continuous covariate can be specified with the XSpline() function. It takes arguments about polynomial degree, linearity in the tails and knot location (the knots must include external knots at <= min and >= max values of the covariate). Relatedly, constraints on spline direction and shape may be specified in the CovModel() function with the spline_monotonicity and spline_convexity arguments. An example is commented out below.

df1 <- CWData(
df = df_matched, # dataset for metaregression
obs = "logit_diff", # column name for the observation mean
obs_se = "logit_diff_se", # column name for the observation standard error
alt_dorms = "altvar", # column name of the variable indicating the alternative method
ref_dorms = "refvar", # column name of the variable indicating the reference method
covs = list("x1", "x2"), # names of columns to be used as covariates later
study_id = "group_id", # name of the column indicating group membership, usually the matching groups
add_intercept = TRUE # adds a column called "intercept" that may be used in CWModel()
)
fit1 <- CWModel(
cwdata = df1, # object returned by `CWData()`
obs_type = "diff_logit", # "diff_log" or "diff_logit" depending on whether bounds are [0, Inf) or [0, 1]
cov_models = list( # specifying predictors in the model; see help(CovModel)
CovModel(cov_name = "intercept"),
CovModel(cov_name = "x1"),
# CovModel(cov_name = "x1", spline = XSpline(knots = c(0,1,2,3,4), degree = 3L, l_linear = TRUE, r_linear = TRUE), spline_monotonicity = "increasing"),
CovModel(cov_name = "x2") ),
gold_dorm = "measured" # the value of the definition/method variable that is the gold standard
# (this becomes more relevant in network meta-analysis, where multiple definitions can act as a "reference")
)
print(data.frame( # checking that the model recovers the true betas
beta_true = c(beta0_true, beta1_true, beta2_true),
beta_mean = fit1$beta[4:6],
beta_se = fit1$beta_sd[4:6]
))
#> beta_true beta_mean beta_se
#> 1 -3 -3.164455 0.1628918
#> 2 1 1.081816 0.0640169
#> 3 2 2.076966 0.1412373
To create a data frame with estimated coefficients, use create_result_df()
. This is useful for saving coefficient summaries to CSV. To save a model object for future use, use py_save_object()
; a saved model can be read back in with py_load_object(), as shown below.
df_result <- fit1$create_result_df()
write.csv(df_result, "/home/j/temp/reed/misc/df_result_crosswalk.csv")
py_save_object(object = fit1, filename = "/home/j/temp/reed/misc/fit1.pkl", pickle = "dill")
fit1 <- py_load_object(filename = "/home/j/temp/reed/misc/fit1.pkl", pickle = "dill")
Finally, adjust_orig_vals()
adjusts biased observations in the original dataset using the meta-regression model to predict the degree of bias. The adjust_orig_vals()
function has a few requirements:
- In the data frame passed to adjust_orig_vals(), we leave the parameter of interest in linear space.
- adjust_orig_vals() cannot make adjustments for observations where the parameter value is zero. This happens because log(0) = -Inf. The same happens in logit-scale crosswalks with parameter values of 0 or 1, because logit(0) = -Inf and logit(1) = Inf. For this reason, adjust_orig_vals() will throw an error if the data frame includes observations exactly at the bound(s).
- The covariates used in the meta-regression model must exist in the data frame passed to adjust_orig_vals(), with the same column names.
- CWModel() estimates random effects according to the study_id variable. Whether you want to predict out on these random effects is a modeling decision that depends on whether you believe the random component captures true variation in the crosswalk adjustment, as opposed to noise in the data. By default we leave the study_id parameter of adjust_orig_vals() unspecified (i.e. study_id = NULL) and do not predict out on the random effects.

preds1 <- adjust_orig_vals(
fit_object = fit1, # object returned by `CWModel()`
df = df_orig,
orig_dorms = "obs_method",
orig_vals_mean = "meanvar",
orig_vals_se = "sdvar",
data_id = "row_id" # optional argument to add a user-defined ID to the predictions;
# name of the column with the IDs
)
# the result of adjust_orig_vals() is a five-element list,
# vectors containing:
# -- the adjusted mean and SE of the adjusted mean in linear space
# -- the adjustment factor and SE of the adjustment factor in transformed space;
# note that the adjustment factor is the alt-ref prediction,
# so we *subtract* this value to make the adjustment
# -- an identifier for the row of the prediction frame that corresponds to the prediction
lapply(preds1, head)
#> $ref_vals_mean
#> [1] 0.3641736 0.5563202 0.3304659 0.7120581 0.7990013 0.6938207
#>
#> $ref_vals_sd
#> [1] 0.1932959 0.1922614 0.1412219 0.2988474 0.1580049 0.3459730
#>
#> $pred_diff_mean
#> [1] 0.0000000 0.0000000 -0.1598083 0.0000000 -0.4912590 -0.8711269
#>
#> $pred_diff_sd
#> [1] 0.0000000 0.0000000 0.2224753 0.0000000 0.2270617 0.2159760
#>
#> $data_id
#> [1] "row1" "row2" "row3" "row4" "row5" "row6"
# now we add the adjusted values back to the original dataset
df_orig[,
c("meanvar_adjusted", "sdvar_adjusted",
"pred_logit", "pred_se_logit", "data_id")] <- preds1
# note that the gold standard observations remain untouched
head(df_orig)
#> meanvar sdvar x1 x2 obs_method row_id meanvar_adjusted
#> 1 0.3641736 0.1932959 0.6210565 1 measured row1 0.3641736
#> 2 0.5563202 0.1922614 3.3834040 1 measured row2 0.5563202
#> 3 0.2961109 0.1246905 0.8575217 1 selfreported row3 0.3304659
#> 4 0.7120581 0.2988474 2.6794930 0 measured row4 0.7120581
#> 5 0.7086435 0.1976500 2.4710258 0 selfreported row5 0.7990013
#> 6 0.4867321 0.4032742 0.1999991 1 selfreported row6 0.6938207
#> sdvar_adjusted pred_logit pred_se_logit data_id
#> 1 0.1932959 0.0000000 0.0000000 row1
#> 2 0.1922614 0.0000000 0.0000000 row2
#> 3 0.1412219 -0.1598083 0.2224753 row3
#> 4 0.2988474 0.0000000 0.0000000 row4
#> 5 0.1580049 -0.4912590 0.2270617 row5
#> 6 0.3459730 -0.8711269 0.2159760 row6
To make sure we understand what’s going on, let’s manually calculate the adjustment for a particular row of the original dataset.
print(df_orig[3,]) # row 3 is a self-reported observation with prevalence 0.296
#> meanvar sdvar x1 x2 obs_method row_id meanvar_adjusted
#> 3 0.2961109 0.1246905 0.8575217 1 selfreported row3 0.3304659
#> sdvar_adjusted pred_logit pred_se_logit data_id
#> 3 0.1412219 -0.1598083 0.2224753 row3
fit1$fixed_vars # estimated betas
#> $measured
#> [1] 0 0 0
#>
#> $selfreported
#> [1] -3.164455 1.081816 2.076966
# the predicted adjustment for an observation with x1=0.8575217 and x2=1 should be...
(pred <- -3.164455 + 1.081816*0.8575217 + 2.076966*1)
#> [1] -0.1598083
# the prediction is defined as logit(alt) - logit(ref), so the final adjusted value should be
# logit(mean_adjusted) = logit(mean_alt) - prediction
logit <- function(p) log(p/(1-p))
inv_logit <- function(x) exp(x)/(1+exp(x))
logit_mean_adjusted <- logit(df_orig[3, "meanvar"]) - pred
inv_logit(logit_mean_adjusted)
#> [1] 0.3304659
# check that it's the same
round(inv_logit(logit_mean_adjusted), digits = 5) == round(df_orig[3, "meanvar_adjusted"], digits = 5)
#> [1] TRUE
# SE of the adjusted data point is calculated as sqrt(a^2 + b^2 + c^2), where
# a is the (log or logit) standard error of the original data point,
# b is the standard error of the predicted adjustment
# c is the standard deviation of between-group heterogeneity, a.k.a. sqrt(gamma)
#
# note that a, b, and c are all in transformed (log or logit) space
#
# this method increases an adjusted observation's uncertainty and effectively downweights it in subsequent analyses, like an ST-GPR model to estimate prevalence globally
#
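To connect this to the numbers above, here is a rough sketch of that calculation for row 3. The logit-space standard error of the original point can be obtained with delta_transform(); whether the fitted between-group heterogeneity is exposed as fit1$gamma may depend on the package version, so treat that accessor as an assumption.
a <- delta_transform(
  mean = df_orig[3, "meanvar"], sd = df_orig[3, "sdvar"],
  transformation = "linear_to_logit")[, 2]  # logit-space SE of the original data point
b <- df_orig[3, "pred_se_logit"]            # SE of the predicted adjustment
g <- fit1$gamma                             # between-group heterogeneity variance (assumed accessor)
sqrt(a^2 + b^2 + g)  # adjusted SE in logit space; `sdvar_adjusted` reports the linear-space result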
Network meta-analysis is a method that allows the model to incorporate information from indirect comparisons. For example, in the case below, we have a gold standard diagnostic method A and two alternatives B and C. To adjust B and C observations to their predicted A-equivalents, the ideal input data for the meta-regression would be many direct comparisons of B:A and C:A. However, B:C comparisons also help to inform the beta estimates for B:A and C:A, even if B:C is not the quantity of interest. In the example, we specify that the adjustment factor should vary by the level of a continuous covariate x1
. By excluding intercepts, we ensure that the adjustment factor is 0 when the value of x1
is 0.
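Written out, the model in this example has one slope per definition, with the gold standard fixed at zero: for a matched pair comparing alternative definition \(a\) to reference definition \(r\), the expected value of logit_diff is \((\beta_a - \beta_r)\,x_1\), with \(\beta_A = 0\). A direct comparison such as B:A therefore informs \(\beta_B\) on its own, while an indirect comparison B:C informs the difference \(\beta_B - \beta_C\).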
Note that most network meta-analyses will want to include an intercept term in the list passed to cov_models. The model then returns, for each definition, a coefficient indicating how much a reference observation must be adjusted to make it equivalent to that alternative definition. As described above, this differs from the GBD 2019 guidance because the word “intercept” now refers to the \(\{-1, 0, 1\}\) variables that encode the network.
First, we create some simulated data. The column indicating which diagnostic is the reference (refvar
) can include A, B and C. The column indicating the alternative diagnostic (altvar
) includes only the alternatives, B and C. This means that both B and C can act as a “reference” for an indirect comparison, i.e. the order B:C or C:B doesn’t matter.
set.seed(123)
true_beta1 <- 1 # define the true coefficient value for x1
case_defs <- c(B = 2, C = 3) # define case definitions and true coefficient values
df_matched2 <- data.frame(id = 1:400) %>%
mutate(
altvar = sample(c("B", "C"), nrow(.), TRUE),
refvar = sample(c("A", "B", "C"), nrow(.), TRUE),
x1 = runif(n = nrow(.), min = 0, max = 10),
logit_diff = rnorm(n = nrow(.)),
logit_diff = logit_diff + (altvar == "B")*case_defs["B"] * x1*true_beta1,
logit_diff = logit_diff + (altvar == "C")*case_defs["C"] * x1*true_beta1,
logit_diff = logit_diff - (refvar == "B")*case_defs["B"] * x1*true_beta1,
logit_diff = logit_diff - (refvar == "C")*case_defs["C"] * x1*true_beta1,
logit_diff_se = runif(n = nrow(.), min = 0.4, max = 0.6),
group_id = sample(1:20, size = nrow(.), replace = TRUE)) %>%
arrange(group_id) %>%
filter(altvar != refvar) %>%
select(altvar, refvar, logit_diff, logit_diff_se, x1, group_id)
head(df_matched2)
#> altvar refvar logit_diff logit_diff_se x1 group_id
#> 1 C B 4.824921 0.5524550 5.394107 1
#> 2 B C -3.309311 0.4719643 3.923313 1
#> 3 C B 4.035348 0.5238145 2.886901 1
#> 4 B A 18.119626 0.5445521 8.666537 1
#> 5 C B 10.743544 0.5142872 9.724320 1
#> 6 C A 24.837169 0.5588720 7.884719 1
Next we specify the model the same way as in Example 1, with one important addition. We can give CWModel()
a prior indicating whether one alternative should give a smaller adjustment than another, using the order_prior
parameter. It takes a list of two-element vectors, with the first element indicating which diagnostic should have the lower crosswalk adjustment. order_prior
is useful when one alternative definition is a subset of another. For example, if C is prevalence of current smokers and B is prevalence of daily smokers, B is a subset of C by definition. The default is not to include order priors.
df2 <- CWData(
df = df_matched2, # dataset for metaregression
obs = "logit_diff", # column name for the observation mean
obs_se = "logit_diff_se", # column name for the observation standard error
alt_dorms = "altvar", # column name of the variable indicating the alternative method
ref_dorms = "refvar", # column name of the variable indicating the reference method
covs = list("x1"), # names of columns to be used as covariates later
# covs = list(), # names of columns to be used as covariates later
study_id = "group_id" # name of the column indicating group membership, usually the matching groups
)
fit2 <- CWModel(
cwdata = df2, # object returned by `CWData()`
obs_type = "diff_logit", # "diff_log" or "diff_logit" depending on whether bounds are [0, Inf) or [0, 1]
cov_models = list( # specifying predictors in the model; see help(CovModel)
# CovModel(cov_name = "intercept") ),
CovModel(cov_name = "x1") ),
gold_dorm = "A", # the level of `ref_dorms` that indicates it's the gold standard
order_prior = list(c("B", "C")) # tells the model that the coefficient estimated for B should be <= the coefficient for C
)
# an alternative specification of the same model, using a spline on x1 (for illustration)
fit2_tmp <- CWModel(
cwdata = df2, # object returned by `CWData()`
obs_type = "diff_logit", # "diff_log" or "diff_logit" depending on whether bounds are [0, Inf) or [0, 1]
cov_models = list( # specifying predictors in the model; see help(CovModel)
# CovModel(cov_name = "intercept") ),
CovModel(cov_name = "x1", spline = XSpline(knots = c(0, 3.3, 6.6, 10), degree = 2L )) ),
gold_dorm = "A", # the level of `ref_dorms` that indicates it's the gold standard
order_prior = list(c("B", "C")) # tells the model that the coefficient estimated for B should be <= the coefficient for C
)
df_tmp <- fit2$create_result_df()
df_tmp2 <- fit2_tmp$create_result_df()
print(data.frame( # checking that the model recovers the true betas
beta_true = case_defs,
beta_mean = fit2$beta[2:3],
beta_se = fit2$beta_sd[2:3]
))
#> beta_true beta_mean beta_se
#> B 2 2.000413 0.009301158
#> C 3 3.002995 0.009994177
Finally, as before, we use the model to adjust biased observations in the original dataset.
df_orig2 <- data.frame(id = 1:600) %>%
mutate(
meanvar = runif(n = nrow(.), min = 0.2, max = 0.8),
sdvar = runif(n = nrow(.), min = 0.4, max = 0.6),
x1 = runif(n = nrow(.), min = 0, max = 10),
group_id = sample(1:20, size = nrow(.), replace = TRUE),
intercept = 0,
obs_method = sample(c("A", "B", "C"), nrow(.), TRUE) )
head(df_orig2)
#> id meanvar sdvar x1 group_id intercept obs_method
#> 1 1 0.5135624 0.5610527 8.169894 11 0 A
#> 2 2 0.6794278 0.5537111 7.596990 11 0 C
#> 3 3 0.3244733 0.4681815 5.718239 20 0 C
#> 4 4 0.4317691 0.5875787 9.887544 7 0 A
#> 5 5 0.5254297 0.5007663 9.951715 14 0 C
#> 6 6 0.6465518 0.4648555 3.214374 19 0 C
preds2 <- adjust_orig_vals(
fit_object = fit2, # object returned by `CWModel()`
df = df_orig2,
orig_dorms = "obs_method",
orig_vals_mean = "meanvar",
orig_vals_se = "sdvar"
)
# now we add the adjusted values back to the original dataset
df_orig2[, c(
"meanvar_adjusted", "sdvar_adjusted", "pred_logit",
"pred_se_logit", "data_id")] <- preds2
# note that the gold standard observations remain untouched
head(df_orig2)
#> id meanvar sdvar x1 group_id intercept obs_method
#> 1 1 0.5135624 0.5610527 8.169894 11 0 A
#> 2 2 0.6794278 0.5537111 7.596990 11 0 C
#> 3 3 0.3244733 0.4681815 5.718239 20 0 C
#> 4 4 0.4317691 0.5875787 9.887544 7 0 A
#> 5 5 0.5254297 0.5007663 9.951715 14 0 C
#> 6 6 0.6465518 0.4648555 3.214374 19 0 C
#> meanvar_adjusted sdvar_adjusted pred_logit pred_se_logit data_id
#> 1 5.135624e-01 5.611872e-01 0.000000 0.00000000 0
#> 2 2.620254e-10 6.665493e-10 22.813723 0.07592566 1
#> 3 1.674553e-08 3.579003e-08 17.171845 0.05714910 2
#> 4 4.317691e-01 5.877026e-01 0.000000 0.00000000 3
#> 5 1.162370e-13 2.337901e-13 29.884952 0.09945921 4
#> 6 1.175142e-04 2.391163e-04 9.652751 0.03212503 5
Sometimes alternative definitions are not mutually exclusive. For example, maybe the gold standard way to measure prevalence is with diagnostic tool A and among the general population. However, some observations measured prevalence with diagnostic tool B and/or only in urban populations (C). In addition to using the non-standard diagnostic tool in the general population (denoted as “B”), alternative observations can take on values of “C” or “B_C” if the prevalence was measured in an urban population. We account for this in a crosswalk adjustment by assuming additivity of the sub-definitions, i.e. the logit-space adjustment due to taking prevalence in an urban population is the same regardless of which diagnostic tool was used. As in Example 2, we relax the requirement that prevalence comparisons must include the gold-standard reference. Network meta-analysis allows the model to take advantage of comparisons between two alternative definitions, e.g. “B” and “B_C”.
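Under this additivity assumption, the total logit-space adjustment for a composite definition is simply the sum of its components' coefficients, e.g. \(\beta_{B\_C} = \beta_B + \beta_C\), with the gold standard A fixed at zero.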
set.seed(1)
df3 <- data.frame(id = 1:400)
#####
# create the simulated dataset
# define case definitions and coefficients for simulation
case_defs <- c(B = 2, C = 3)
n_sample_defs <- 2
# randomly sample from case definition components to make combined case definitions
df3$alt <- sapply(1:nrow(df3), function(i) {
paste(sort(sample(x = names(case_defs), size = sample(1:n_sample_defs))), collapse = '_')
})
df3$ref <- sapply(1:nrow(df3), function(i) {
paste(sort(sample(x = names(case_defs), size = sample(1:n_sample_defs))), collapse = '_')
})
# make half of the reference case definitions 'A' to signify the gold standard
df3$ref <- ifelse(df3$id %% 2 == 0, "A", df3$ref)
df4 <- filter(df3, ref != alt) # drop rows where the alternative and reference definitions are the same
# subtract coefficient if component is in the reference def, and
# add coefficient if component is in the alternative def
df4$logit_prev_diff <- 0
for (i in names(case_defs)) df4$logit_prev_diff <- df4$logit_prev_diff - (sapply(i, grepl, df4$ref) * case_defs[i])
for (i in names(case_defs)) df4$logit_prev_diff <- df4$logit_prev_diff + (sapply(i, grepl, df4$alt) * case_defs[i])
df4$logit_prev_diff <- as.numeric(df4$logit_prev_diff + rnorm(n = nrow(df4), mean = 0, sd = 1))
df4$logit_prev_diff_se <- 0.5
head(df4)
#> id alt ref logit_prev_diff logit_prev_diff_se
#> 1 1 B B_C -1.095623 0.5
#> 2 2 B A 1.651368 0.5
#> 3 3 C B 1.218831 0.5
#> 4 4 B_C A 6.144955 0.5
#> 5 5 C B_C -2.123546 0.5
#> 6 6 B_C A 5.147393 0.5
To use composite definitions – alternative definitions with non-mutually exclusive sub-definitions – we include a delimiter in the values we pass to the alt_dorms
and ref_dorms
columns. If the delimiter is "_" as in this example, we pass it as an argument to the dorm_separator
parameter of CWData()
.
#####
dat4 <- CWData(
df = df4,
obs = "logit_prev_diff",
obs_se = "logit_prev_diff_se",
alt_dorms = "alt",
ref_dorms = "ref",
dorm_separator = "_",
covs = list(),
study_id = "id",
add_intercept = TRUE
)
fit4 <- CWModel(
cwdata = dat4,
obs_type = "diff_logit",
cov_models = list(CovModel("intercept")),
# # we can put priors on the beta for a specific alternative definition:
# cov_models = list(CovModel("intercept", prior_beta_uniform = list(B = array(c(0,0))))),
gold_dorm = "A"
# prior_gamma_uniform = array(c(0, 0.4))
)
fit4$fixed_vars
#> $A
#> [1] 0
#>
#> $B
#> [1] 2.002939
#>
#> $C
#> [1] 3.033783
case_defs
#> B C
#> 2 3
df_orig4 <- df4 %>%
mutate(
prev_orig = runif(n = nrow(.), 0.1, 0.9),
prev_orig_se = 0.2,
meas_method = ref) %>%
select(-ref, -alt, -logit_prev_diff, -logit_prev_diff_se)
preds4 <- adjust_orig_vals(
fit_object = fit4, # object returned by `CWModel()`
df = df_orig4,
orig_dorms = "meas_method",
orig_vals_mean = "prev_orig",
orig_vals_se = "prev_orig_se"
)
df_orig4[, c("prev_adjusted", "prev_se_adjusted", "prediction_logit",
"prediction_se_logit", "data_id")] <- preds4
head(df_orig4)
#> id prev_orig prev_orig_se meas_method prev_adjusted prev_se_adjusted
#> 1 1 0.6294998 0.2 B_C 0.01091491 0.01325531
#> 2 2 0.1380010 0.2 A 0.13800096 0.22532907
#> 3 3 0.8047976 0.2 B 0.35746479 0.35488945
#> 4 4 0.8455840 0.2 A 0.84558400 0.23017319
#> 5 5 0.8639310 0.2 B_C 0.03960492 0.07283514
#> 6 6 0.7042963 0.2 A 0.70429629 0.27022408
#> prediction_logit prediction_se_logit data_id
#> 1 5.036722 0.10433740 0
#> 2 0.000000 0.00000000 1
#> 3 2.002939 0.07302781 2
#> 4 0.000000 0.00000000 3
#> 5 5.036722 0.10433740 4
#> 6 0.000000 0.00000000 5
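As a quick check of the additivity noted above, the logit-space adjustment predicted for the “B_C” rows should equal the sum of the B and C coefficients (assuming fixed_vars can be indexed as in the printout above):
# additivity check: the composite "B_C" adjustment is beta_B + beta_C
# 2.002939 + 3.033783 = 5.036722, matching prediction_logit for the "B_C" rows above
fit4$fixed_vars$B + fit4$fixed_vars$C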
To assess the fit of a crosswalk model, we use a funnel plot to see how well a single crosswalk coefficient describes the data. When the crosswalk adjustment varies by levels of a continuous variable, we use a dose-response plot. First, to enable the use of the matplotlib
library in the underlying Python code, we run repl_python()
to open an interactive Python interpreter, then type exit
in the console to return to the R interpreter. (We’re still figuring out why this is necessary and will fix it when we do!)
If any covariates were included in the meta-regression model, the plots$funnel_plot()
and plots$dose_response_curve()
functions assume a covariate value of zero unless otherwise specified by the continuous_variables
parameter. The prediction will be based on the median value observed in the input data for all variable names passed to continuous_variables
. obs_method
indicates which alternative definition/method (dorm) should be visualized, compared to the gold standard reference. This parameter should be specified regardless of whether or not there are multiple alternative definitions.
PDFs of the plots are saved in the location passed to plots_dir
.
df_matched5 <- data.frame(id = 1:200) %>%
mutate(
logit_diff_se = runif(nrow(.), 0.5, 10),
logit_diff = 5 + rnorm(nrow(.), 0, logit_diff_se),
altvar = ifelse(row_number() %% 8 == 0, "other", "selfreported"), # a second, arbitrary alternative definition
refvar = "measured",
group_id = rep(1:10, each = 20)
)
dat5 <- CWData(
df = df_matched5,
obs = "logit_diff",
obs_se = "logit_diff_se",
alt_dorms = "altvar",
ref_dorms = "refvar",
covs = list(),
study_id = "group_id",
add_intercept = TRUE
)
fit5 <- CWModel(
cwdata = dat5,
obs_type = "diff_logit",
cov_models = list(
CovModel(cov_name = "intercept") ),
gold_dorm = "measured",
inlier_pct = 0.9
)
##### don't forget to run repl_python() !
# ... then type 'exit' to get back to the R interpreter
repl_python()
plots <- import("crosswalk.plots")
plots$funnel_plot(
cwmodel = fit5,
cwdata = dat5,
continuous_variables = list(),
obs_method = 'selfreported',
plot_note = 'Funnel plot example',
plots_dir = '/home/j/temp/reed/misc/',
file_name = "funny_plot_example_v10",
write_file = TRUE
)
For the dose-response plot, the variable given to dose_variable
will be on the x-axis.
plots$dose_response_curve(
dose_variable = 'x1',
obs_method = 'B',
continuous_variables = list(),
cwdata = df2,
cwmodel = fit2,
plot_note = "Example dose-response plot",
plots_dir = '/home/j/temp/reed/misc/',
file_name = "doseresponse_plot_example_v7",
write_file = TRUE)