Summary

A simulation study was performed to identify the best-performing estimator for a longitudinal TMLE analysis of the effect of second-line diabetes drugs on dementia risk in the Danish National Registry data. The data have four key features: many timepoints (10), a rare outcome (dementia prevalence: 1.9%), competing risks from death, and a high degree of administrative censoring. Three simulations were completed: 1) a simple simulation without positivity violations, rare outcomes, long-term follow-up, or competing risks, as a sanity check that the estimators were implemented correctly, especially because we modified the LTMLE package code; 2) a realistic simulation in terms of dementia prevalence and diabetes drug patterns, but with scrambled outcomes and competing risks, to check estimator performance under a known null association; and 3) a realistic simulation with a protective effect of GLP1 usage on dementia and death, with the truth calculated as the counterfactual 5-year risk of dementia prior to death when continuously on GLP1 versus not, and with the effect of GLP1 on death removed to remove the competing risk.
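
As a rough illustration of how a "truth" of this kind can be defined, the sketch below computes the cumulative risk of dementia before death over ten 6-month intervals under two regimes that share the same death hazard. The hazards, the hazard ratio, and the function name `risk_before_death` are hypothetical placeholders, not the values or code used in the study. Within each interval the sketch assumes dementia is assessed before death.

```python
def risk_before_death(h_dementia, h_death, n_intervals):
    """Cumulative probability of dementia occurring before death.

    Assumes constant per-interval hazards and that, within an interval,
    dementia is assessed before death (an illustrative convention).
    """
    at_risk = 1.0  # probability of being alive and dementia-free so far
    risk = 0.0
    for _ in range(n_intervals):
        risk += at_risk * h_dementia                     # new dementia cases
        at_risk *= (1.0 - h_dementia) * (1.0 - h_death)  # still alive and dementia-free
    return risk


# Hypothetical per-6-month hazards (NOT taken from the study).
H_DEMENTIA_UNTREATED = 0.002
HAZARD_RATIO_GLP1 = 0.5  # assumed protective effect on dementia
H_DEATH = 0.01           # identical in both arms: effect of GLP1 on death removed
N_INTERVALS = 10         # ten 6-month intervals = 5 years

risk_glp1 = risk_before_death(H_DEMENTIA_UNTREATED * HAZARD_RATIO_GLP1, H_DEATH, N_INTERVALS)
risk_no_glp1 = risk_before_death(H_DEMENTIA_UNTREATED, H_DEATH, N_INTERVALS)

true_rd = risk_glp1 - risk_no_glp1
true_rr = risk_glp1 / risk_no_glp1
print(f"true RD = {true_rd:.5f}, true RR = {true_rr:.3f}")
```

Because the death hazard is identical in both arms, any difference between the two risks in this sketch comes only from the effect on dementia.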

Scenario 1: Simple simulation

True RR: 0.57

True RD: -0.3

Relative risk performance

estimator bias variance mse bias_se_ratio oracle.coverage coverage_ic coverage_tmle coverage_cv_boot coverage_cv_boot_1000iter coverage_iptw coverage_iptw_boot
GLM -0.001 0.007 0.007 -0.015 94.7 95.0 95.0 93.7 NA 95.7 93.5
LASSO -0.001 0.007 0.007 -0.014 94.5 94.9 94.9 93.7 94.5 95.7 93.5

Risk difference performance

estimator bias variance mse bias_se_ratio oracle.coverage coverage_ic coverage_tmle coverage_cv_boot coverage_cv_boot_1000iter coverage_iptw coverage_iptw_boot_1000iter
GLM 0.00063 0.00159 0.00159 0.01567 95.5 95.3 95.3 94.8 NA 95.3 NA
LASSO 0.00066 0.00159 0.00159 0.01661 95.5 95.3 95.3 94.8 95.2 95.3 95.2
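
The columns in these tables can be reproduced from the per-replicate point estimates. The sketch below is one plausible set of definitions, not the study's code. It assumes `bias_se_ratio` is the bias divided by the empirical standard deviation of the estimates (consistent with the reported values, e.g. 0.00063 / sqrt(0.00159) ≈ 0.0158) and that oracle coverage uses a Wald interval built from that same empirical standard deviation. The function name `performance` and the example numbers are hypothetical.

```python
import math
import statistics


def performance(estimates, truth, z=1.96):
    """Summarize simulation performance of a point estimator."""
    estimates = list(estimates)
    bias = statistics.fmean(estimates) - truth
    # Population variance, so that mse == bias**2 + variance exactly.
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    sd = math.sqrt(variance)
    # Assumed definition: Wald interval using the empirical SD across replicates.
    covered = [abs(e - truth) <= z * sd for e in estimates]
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "bias_se_ratio": bias / sd,
        "oracle.coverage": 100.0 * sum(covered) / len(covered),
    }


# Hypothetical estimates from four simulation replicates.
result = performance([0.9, 1.0, 1.1, 1.2], truth=1.0)
print(result)
```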

Scenario 2: Realistic simulation, null outcome

True RR: 1

True RD: 0

RD oracle coverage of different estimators

estimator Qint DetQ bias variance mse bias_se_ratio oracle.coverage
LASSO No No -0.00006 2e-05 2e-05 -0.01181 96.0
LASSO Yes No -0.00005 2e-05 2e-05 -0.01101 96.5
GLM Yes No -0.00004 2e-05 2e-05 -0.00754 96.5
GLM No Yes 0.00061 3e-05 3e-05 0.10875 97.0
GLM No No 0.00103 9e-05 9e-05 0.10823 98.0

RR oracle coverage of different estimators

estimator Qint DetQ bias variance mse bias_se_ratio oracle.coverage
GLM Yes No -0.053 0.116 0.119 -0.155 95.5
LASSO Yes No -0.053 0.114 0.117 -0.156 96.0
LASSO No No -0.053 0.114 0.117 -0.156 96.5
GLM No Yes -0.013 0.126 0.126 -0.038 97.0
GLM No No -0.023 0.168 0.169 -0.057 98.0

Performance of different variance estimators on null data

Notes:

  • Only LASSO estimator results are shown; the performance of all estimators is assessed in the realistic simulated data below.
  • This is a sanity check on estimation performance in data with a known null association between GLP1 and dementia.
  • The IC variance estimator is anti-conservative (about 51% coverage) and the TMLE variance estimator is conservative (100% coverage).
  • The bootstrap is anti-conservative (about 91% coverage) but less so than the IC variance estimator.
  • The TMLE variance estimator is very conservative, with mean CI widths roughly 7 to 9 times those of the bootstrap.
  • The IPTW estimator is uniformly biased with overly-wide confidence intervals in all simulations (not shown).

Risk difference performance

variance_estimator coverage mean_ci_width
ic 51.00000 0.00722
tmle 100.00000 0.11535
bootstrap 90.85366 0.01300

Relative risk performance

variance_estimator coverage mean_ci_width
ic 51.50000 0.50639
tmle 100.00000 8.38962
bootstrap 90.85366 1.14126

Note: CI widths for relative risks are on the log scale.
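
To put the widths in the two tables above side by side, the short calculation below takes the mean CI widths as reported and computes the TMLE-to-bootstrap and bootstrap-to-IC ratios. Because the RR widths are on the log scale, it also exponentiates them to show the implied ratio of upper to lower confidence limit. The dictionary layout and variable names are illustrative only.

```python
import math

# Mean CI widths copied from the two tables above (RR widths are on the log scale).
mean_ci_width = {
    "RD": {"ic": 0.00722, "tmle": 0.11535, "bootstrap": 0.01300},
    "RR": {"ic": 0.50639, "tmle": 8.38962, "bootstrap": 1.14126},
}

width_ratios = {}
for measure, w in mean_ci_width.items():
    width_ratios[measure] = {
        "tmle_vs_bootstrap": w["tmle"] / w["bootstrap"],
        "bootstrap_vs_ic": w["bootstrap"] / w["ic"],
    }
    print(measure, {k: round(v, 2) for k, v in width_ratios[measure].items()})

# A log-scale width w means the upper limit is exp(w) times the lower limit.
rr_fold = {name: math.exp(w) for name, w in mean_ci_width["RR"].items()}
print({k: round(v, 1) for k, v in rr_fold.items()})
```

On these numbers the TMLE-based intervals are roughly 9 times (RD) and 7 times (RR) as wide as the bootstrap intervals, and a log-scale width above 8 corresponds to an upper limit several thousand times the lower limit.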

Scenario 3: Realistic simulation, protective effect of GLP1 on dementia

True RD: -0.009683665

True RR: 0.5148661

Comparison of different estimators’ performance

Notes:

  • Based on these results, we chose the LASSO estimator with Q-prediction and no deterministic Q function.
  • Several of the estimators have comparable performance, but the chosen estimator performs best in both RR and RD estimation.
  • Ridge regressions have lower MSE, but their oracle coverage does not consistently reach the nominal 95%.
  • Including the deterministic Q function marginally decreases bias and variance, so we should use it in the bootstrap estimator.

Risk difference

estimator                        bias       variance  mse      oracle.coverage
LASSO, Det-Q, AUC fit            -0.002080  6.0e-06   1.0e-05  84.50000
LASSO, Lambda: 1se               -0.001631  1.0e-05   1.3e-05  91.50000
Elastic Net, Lambda: 1se         -0.001450  9.0e-06   1.2e-05  92.00000
GLM, LASSO prescreen              0.002793  4.9e-05   5.7e-05  92.78351
LASSO, Q-intercept               -0.001583  1.1e-05   1.3e-05  93.00000
LASSO, Det-Q, Lambda: 1se        -0.001109  8.0e-06   9.0e-06  93.50000
GLM                               0.002819  5.6e-05   6.4e-05  93.50000
GLM, LASSO prescreen, Det-Q       0.002795  5.1e-05   5.9e-05  93.87755
Ridge, Det-Q                      0.000446  1.1e-05   1.1e-05  94.00000
Elastic Net, Det-Q, Lambda: 1se  -0.000899  8.0e-06   8.0e-06  94.50000
LASSO, Det-Q                      0.000267  1.4e-05   1.4e-05  94.50000
Ridge, Lambda: 1se               -0.000978  8.0e-06   9.0e-06  94.50000
Ridge                            -0.000118  1.3e-05   1.3e-05  94.50000
LASSO, AUC fit                   -0.001365  1.2e-05   1.4e-05  95.00000
LASSO                            -0.000265  1.7e-05   1.7e-05  95.00000
Ridge, Det-Q, Lambda: 1se        -0.000536  6.0e-06   7.0e-06  95.50000

Relative Risk

estimator                        bias    variance  mse    oracle.coverage
LASSO, Det-Q, AUC fit            -0.762  0.209     0.790  34.000
Ridge, Det-Q, Lambda: 1se        -0.574  0.228     0.558  65.500
LASSO, Det-Q, Lambda: 1se        -0.594  0.250     0.603  69.000
Ridge, Lambda: 1se               -0.558  0.239     0.550  69.500
Elastic Net, Det-Q, Lambda: 1se  -0.580  0.250     0.585  71.000
LASSO, Lambda: 1se               -0.569  0.268     0.592  74.000
Elastic Net, Lambda: 1se         -0.561  0.265     0.579  75.000
LASSO, Q-intercept               -0.577  0.287     0.619  78.500
LASSO, AUC fit                   -0.465  0.282     0.498  84.500
Ridge                            -0.337  0.278     0.392  93.000
GLM, LASSO prescreen             -0.022  0.459     0.459  93.299
Ridge, Det-Q                     -0.315  0.263     0.362  93.500
GLM                              -0.005  0.469     0.469  93.500
LASSO, Det-Q                     -0.326  0.308     0.414  95.000
LASSO                            -0.341  0.328     0.445  95.000
GLM, LASSO prescreen, Det-Q      -0.025  0.443     0.443  95.408

Comparison of different variance estimators

Notes:

  • Results are shown for the LASSO estimator with modeled Q (rather than intercept-only).
  • The IC variance estimator is anti-conservative and the TMLE variance estimator is conservative.
  • The bootstrap is anti-conservative but less so than the IC variance estimator.
  • The IPTW estimator is uniformly biased with overly-wide confidence intervals in all simulations (not shown).

Risk difference coverage

variance_estimator                         coverage  mean_ci_width  power  bias_se_ratio_emp
ic, Det-Q                                  67.0      0.00736        92.0   0.14223
tmle                                       99.5      0.02129        49.0   -0.05020
ic                                         62.0      0.00737        91.0   -0.14089
Bootstrap, Det Q function                  87.0      0.01346        68.5   NA
Bootstrap, Det Q function, 500 iterations  89.0      0.01338        69.5   NA
Bootstrap                                  85.5      0.01454        68.5   NA
Bootstrap-Ridge                            87.5      0.01289        72.0   NA

Relative risk coverage

variance_estimator                         coverage  mean_ci_width  power  bias_se_ratio_emp
ic, Det-Q                                  55.0      0.866          92.0   -1.475
ic                                         48.5      0.841          90.5   -1.591
tmle                                       100.0     3.579          0.5    -0.374
Bootstrap, Det Q function                  76.5      1.952          68.5   NA
Bootstrap, Det Q function, 500 iterations  77.5      1.955          69.5   NA
Bootstrap                                  75.5      1.988          68.5   NA
Bootstrap-IPTW                             100.0     17.888         0.0    NA
Bootstrap-Ridge                            76.5      1.870          72.0   NA
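
The coverage, mean CI width, and power columns can all be derived from the per-replicate confidence intervals. The sketch below assumes that power is the percentage of intervals that exclude the null value (0 for RD, 0 for log RR). The source does not define it, so treat this as one reasonable reading. It also adds the Monte Carlo standard error of a coverage estimate, using a hypothetical replicate count of 200. All names and the toy intervals are illustrative.

```python
import math


def summarize_intervals(intervals, truth, null_value=0.0):
    """Coverage (%), power (%), and mean width for a list of (lower, upper) CIs."""
    n = len(intervals)
    covered = sum(lo <= truth <= hi for lo, hi in intervals)
    # Assumed definition of power: the interval excludes the null value.
    rejected = sum(not (lo <= null_value <= hi) for lo, hi in intervals)
    mean_width = sum(hi - lo for lo, hi in intervals) / n
    return {
        "coverage": 100.0 * covered / n,
        "power": 100.0 * rejected / n,
        "mean_ci_width": mean_width,
    }


def coverage_mc_se(coverage_pct, n_replicates):
    """Monte Carlo standard error (percentage points) of a coverage estimate."""
    p = coverage_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_replicates)


# Toy risk-difference intervals from four hypothetical replicates; truth = -0.01.
toy = [(-0.015, -0.005), (-0.020, 0.001), (-0.008, -0.002), (-0.012, -0.009)]
summary = summarize_intervals(toy, truth=-0.01)
print(summary)
print(round(coverage_mc_se(95.0, 200), 2))
```

With 200 replicates, the Monte Carlo error of a coverage estimate near 95% is about 1.5 percentage points, so small differences in coverage between variance estimators should be read with that in mind.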

Comparison of variance estimator performance over time

The primary analysis examined the effect of continuous GLP1 usage on dementia risk after 5 years, with longitudinal data discretized into 6-month time nodes. The imperfect performance of estimators in simulations may arise from the rare outcome (~2% prevalence after 5 years), positivity issues in long-term follow-up (with an increasingly small number of individuals continuously on GLP1), or high degrees of administrative censoring (~50% after 5 years). We ran simulations for every length of follow-up from 6 months (time=1) to 5 years (time=10). Oracle coverage is good at all times (93.5% to 98%), while IC coverage becomes increasingly anti-conservative and TMLE coverage increasingly conservative over time. Interestingly, variance in RD estimates increases more over time, while bias increases more in RR estimates.

Risk difference

time bias variance mse bias_se_ratio bias_se_ratio_emp oracle.coverage IC_coverage TMLE_coverage IC_mean_ci_width TMLE_mean_ci_width
1 -0.00021 0e+00 0e+00 -0.24209 -0.35093 96.0 71.0 85.0 0.00232 0.00285
2 -0.00021 0e+00 0e+00 -0.17300 -0.23397 95.5 77.5 77.5 0.00357 0.00357
3 0.00019 0e+00 0e+00 0.11499 0.17473 96.5 76.5 96.5 0.00426 0.00655
4 0.00070 0e+00 0e+00 0.37575 0.56369 95.5 78.5 99.0 0.00487 0.00838
5 0.00027 1e-05 1e-05 0.11295 0.18745 96.0 78.5 98.5 0.00569 0.01069
6 0.00064 1e-05 1e-05 0.25730 0.41259 95.0 78.0 99.5 0.00607 0.01228
7 -0.00019 1e-05 1e-05 -0.06014 -0.11447 95.5 72.5 97.0 0.00656 0.01434
8 -0.00114 1e-05 1e-05 -0.33070 -0.65342 93.5 64.5 97.5 0.00685 0.01649
9 -0.00072 1e-05 1e-05 -0.19364 -0.39991 94.0 64.0 98.0 0.00710 0.01844
10 -0.00045 2e-05 2e-05 -0.11002 -0.24138 94.5 61.5 99.5 0.00737 0.02129

Relative risk

time bias variance mse bias_se_ratio bias_se_ratio_emp oracle.coverage IC_coverage TMLE_coverage IC_mean_ci_width TMLE_mean_ci_width
1 -0.289 0.383 0.467 -0.467 -0.717 96.0 75.5 96.0 1.581 2.199
2 -0.207 0.244 0.287 -0.419 -0.616 96.5 78.5 78.5 1.319 1.319
3 -0.151 0.284 0.306 -0.283 -0.475 98.0 73.0 97.5 1.243 2.351
4 -0.076 0.274 0.280 -0.144 -0.246 97.0 74.0 98.0 1.205 2.720
5 -0.194 0.269 0.307 -0.374 -0.740 96.0 71.0 98.5 1.029 2.517
6 -0.161 0.259 0.285 -0.317 -0.621 95.0 72.0 99.5 1.019 2.852
7 -0.312 0.304 0.401 -0.566 -1.287 95.0 62.5 98.5 0.950 2.892
8 -0.407 0.317 0.482 -0.724 -1.791 93.5 52.5 99.0 0.891 3.030
9 -0.369 0.321 0.458 -0.651 -1.659 94.0 53.0 99.0 0.873 3.300
10 -0.352 0.328 0.452 -0.614 -1.639 94.5 48.0 100.0 0.841 3.579
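
One way to read the widths in the relative-risk table is to compare them with the width an oracle Wald interval would have, 2 × 1.96 × sqrt(variance), using the empirical variance reported at each time. The sketch below does this with the values copied from the table. The use of a Wald interval with z = 1.96 is an assumption, and the variable names are illustrative. The RD table is not used because its variances are rounded to zero at early times.

```python
import math

# Values copied from the relative-risk table above (log scale), times 1..10.
variance = [0.383, 0.244, 0.284, 0.274, 0.269, 0.259, 0.304, 0.317, 0.321, 0.328]
ic_width = [1.581, 1.319, 1.243, 1.205, 1.029, 1.019, 0.950, 0.891, 0.873, 0.841]
tmle_width = [2.199, 1.319, 2.351, 2.720, 2.517, 2.852, 2.892, 3.030, 3.300, 3.579]

Z = 1.96  # assumed normal quantile for a 95% Wald interval

relative_widths = []
for t, (var, ic, tmle) in enumerate(zip(variance, ic_width, tmle_width), start=1):
    oracle_width = 2 * Z * math.sqrt(var)
    relative_widths.append((t, ic / oracle_width, tmle / oracle_width))

for t, ic_rel, tmle_rel in relative_widths:
    print(f"time={t:2d}  IC/oracle={ic_rel:.2f}  TMLE/oracle={tmle_rel:.2f}")
```

A ratio below 1 means the interval is narrower than the oracle interval (anti-conservative); a ratio above 1 means it is wider (conservative). At time 10 the IC ratio is below 0.4 while the TMLE ratio is about 1.6.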