Reference

Selection Bias

Selection bias occurs when the population selected into the study does not represent the target population (for example, when people refuse to answer or drop out of the study).

No selection bias

  • If the response (selection) rate is 100% in all four exposure-outcome groups.

bias_parms takes four selection probabilities, in this order: (1) among exposed cases, (2) among unexposed cases, (3) among exposed noncases, and (4) among unexposed noncases.
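The correction behind the four probabilities can be sketched in base R: divide each observed cell by its selection probability to estimate the source-population count, then recompute the RR and OR. This is a minimal sketch, assuming the layout used in the examples below (row 1 = cases UM+, row 2 = noncases UM-, column 1 = exposed, column 2 = unexposed); `correct_selection()` is a hypothetical helper, not an episensr function.

```r
# Hypothetical helper (not an episensr function): inverse-probability
# correction of a 2x2 table for selection bias.
# Layout: row 1 = cases, row 2 = noncases; col 1 = exposed, col 2 = unexposed.
# s = selection probabilities: exposed cases, unexposed cases,
#     exposed noncases, unexposed noncases.
correct_selection <- function(tab, s) {
  case_exp      <- tab[1, 1] / s[1]
  case_unexp    <- tab[1, 2] / s[2]
  noncase_exp   <- tab[2, 1] / s[3]
  noncase_unexp <- tab[2, 2] / s[4]
  c(RR = (case_exp / (case_exp + noncase_exp)) /
         (case_unexp / (case_unexp + noncase_unexp)),
    OR = (case_exp * noncase_unexp) / (case_unexp * noncase_exp))
}

stang_tab <- matrix(c(136, 107, 297, 165), nrow = 2, byrow = TRUE)
correct_selection(stang_tab, c(1, 1, 1, 1))    # no correction: observed RR and OR
correct_selection(stang_tab, c(1, 1, 0.8, 1))  # 80% of exposed noncases selected
```

With c(1, 1, 0.8, 1) this should give the same corrected RR (0.6816) and OR (0.5649) that episensr prints further down.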

library(episensr)
## Loading required package: ggplot2
## Thank you for using episensr!
## This is version 1.3.0 of episensr
## Type 'citation("episensr")' for citing this R package in publications.
stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(1, 1, 1, 1))
stang
## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.7984287
##    Selection Bias Corrected Odds Ratio: 0.7061267

or, equivalently, a single selection-bias factor of 1:

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = 1)
stang
## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.7984287
##    Selection Bias Corrected Odds Ratio: 0.7061267

Correcting selection bias

Specify either four selection probabilities: (1) among exposed cases, (2) among unexposed cases, (3) among exposed noncases, and (4) among unexposed noncases;

or a single positive selection-bias factor: the ratio of the exposed-to-unexposed selection probabilities among cases divided by the same ratio among noncases, (1/2)/(3/4) = (1*4)/(2*3). For the probabilities (1, 1, 0.8, 1) used below, the factor is 1/0.8 = 1.25.
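The single factor can be checked by hand. This sketch (plain arithmetic, no episensr) computes the factor from the four probabilities and divides it out of the observed estimates for the UM x Mobile table below. Dividing the OR by the factor matches the four-probability correction exactly, because the selection probabilities enter the OR as a cross-product; dividing the RR by the factor does not match the cell-by-cell correction.

```r
# Selection probabilities: exposed cases, unexposed cases,
# exposed noncases, unexposed noncases
s <- c(1, 1, 0.8, 1)

# Single selection-bias factor: (s1 / s2) / (s3 / s4)
bias_factor <- (s[1] / s[2]) / (s[3] / s[4])

# Observed estimates from the UM x Mobile table
obs_or <- (136 * 165) / (107 * 297)
obs_rr <- (136 / (136 + 297)) / (107 / (107 + 165))

bias_factor           # 1.25
obs_or / bias_factor  # same as the OR corrected with all four probabilities
obs_rr / bias_factor  # differs from the RR corrected cell by cell
```

This is why the two episensr calls below agree on the corrected OR (0.5649) but report different corrected RRs (0.6387 vs 0.6816).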

  • If only 80% of exposed noncases respond, the noncase (control) group under-represents exposed people and is not representative, which is a selection bias. Here the observed OR (0.71) is biased toward the null compared with the corrected OR (0.56).
library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = 1/0.8)
stang
## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.6387430
##    Selection Bias Corrected Odds Ratio: 0.5649013

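or, with the four selection probabilities (same corrected OR; the corrected RR differs because the single factor is simply divided out of the RR):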

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(1, 1, 0.8, 1))
stang
## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.6815567
##    Selection Bias Corrected Odds Ratio: 0.5649013
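  • If response rates range from 0.45 to 0.9 among cases (and from 0.5 to 1 among noncases), this approach may not reveal the selection bias: here the selection-bias factor is (0.45*1)/(0.9*0.5) = 1, so the corrected OR equals the observed OR even though response is far from complete. In that situation, compare the characteristics of responders with those of nonresponders.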
library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(0.45, 0.9, 0.5, 1))
stang
## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.8052260
##    Selection Bias Corrected Odds Ratio: 0.7061267
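A quick check of why the corrected OR does not move here: with these probabilities the selection-bias factor is exactly 1, so the OR is unchanged, while the RR still shifts slightly (0.798 to 0.805). A base R sketch reusing the counts from the table above.

```r
# Selection probabilities used above: exposed cases, unexposed cases,
# exposed noncases, unexposed noncases
s   <- c(0.45, 0.9, 0.5, 1)
obs <- c(136, 107, 297, 165)   # observed counts in the same cell order

# Selection-bias factor for the OR: (s1 * s4) / (s2 * s3)
bias_factor <- (s[1] * s[4]) / (s[2] * s[3])

# Cell-by-cell correction
corr    <- obs / s
corr_or <- (corr[1] * corr[4]) / (corr[2] * corr[3])
corr_rr <- (corr[1] / (corr[1] + corr[3])) / (corr[2] / (corr[2] + corr[4]))

c(factor = bias_factor, OR = corr_or, RR = corr_rr)
```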

Information Bias (Misclassification)

Highly sensitive tests (>90%) yield few false negatives, so a negative result is strong evidence to rule out the target disease. Conversely, highly specific tests (>90%) yield very few false positives, so a positive result strongly implies that the target disease or condition is present. Prevalence also matters: when it is low, even a highly specific test produces a substantial share of false positives among all positive results. In the example below (sensitivity and specificity 0.90 in both groups), the observed association is biased toward the null: the observed OR of 4.32 rises to 6.96 after correction.
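The effect of prevalence can be illustrated with the positive predictive value (PPV) from Bayes' theorem. A minimal sketch; `ppv()` is a hypothetical helper and the three prevalences are arbitrary illustration values.

```r
# Hypothetical helper: positive predictive value from Bayes' theorem
ppv <- function(se, sp, prev) {
  true_pos  <- se * prev              # expected true positives
  false_pos <- (1 - sp) * (1 - prev)  # expected false positives
  true_pos / (true_pos + false_pos)
}

# Same test (sensitivity = specificity = 0.90) at three prevalences
round(ppv(0.90, 0.90, c(0.01, 0.10, 0.50)), 3)
# [1] 0.083 0.500 0.900
```

At 1% prevalence, fewer than one in ten positive results is a true positive even though the test is 90% specific.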

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms =c(0.90, 0.90, 0.90, 0.90))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  2.696653                    
##    Misclassification Bias Corrected Odds Ratio:  6.956681  4.048179 11.954860

Nondifferential misclassification

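For a binary exposure, nondifferential misclassification (with errors independent of errors in other variables) biases the expected estimate toward the null. This holds on average rather than in every study, and need not hold for exposures with more than two categories.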
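A forward calculation makes the "toward the null" behaviour concrete. This sketch assumes a hypothetical true table (not from the source: 100 exposed and 100 unexposed cases, 50 exposed and 150 unexposed controls, so the true OR is 3) and applies the same sensitivity and specificity to cases and controls to get the expected classified counts.

```r
# Hypothetical true counts (illustration only, not from the source data)
true_cases    <- c(exp = 100, unexp = 100)
true_controls <- c(exp = 50,  unexp = 150)

# Expected classified counts when exposure is recorded with the given Se and Sp
misclassify <- function(counts, se, sp) {
  reported_exp <- se * counts[["exp"]] + (1 - sp) * counts[["unexp"]]
  c(exp = reported_exp, unexp = sum(counts) - reported_exp)
}

odds_ratio <- function(cases, controls) {
  (cases[["exp"]] * controls[["unexp"]]) / (cases[["unexp"]] * controls[["exp"]])
}

se <- 0.94; sp <- 0.97  # nondifferential: same values for cases and controls
true_or <- odds_ratio(true_cases, true_controls)
obs_or  <- odds_ratio(misclassify(true_cases, se, sp),
                      misclassify(true_controls, se, sp))
c(true = true_or, observed = obs_or)  # observed OR lies between 1 and the true OR
```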

  • Suppose self-reported smoking has a sensitivity of 94% and a specificity of 97% in both cases and controls.

The bias_parms argument has four components, in this order: (1) sensitivity of classification among those with the outcome, (2) sensitivity among those without the outcome, (3) specificity among those with the outcome, and (4) specificity among those without the outcome.
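The standard back-calculation for exposure misclassification, which reproduces the corrected estimates printed in this section, is: expected true exposed = (observed exposed - (1 - Sp) * N) / (Se + Sp - 1), applied separately to cases and controls. A base R sketch; `correct_exposure()` is a hypothetical helper, not an episensr function.

```r
# Hypothetical helper: back-correct a case-control table for exposure
# misclassification. Rows = cases, controls; columns = exposed, unexposed.
# bias_parms order as above: Se(cases), Se(controls), Sp(cases), Sp(controls).
correct_exposure <- function(tab, bias_parms) {
  se <- bias_parms[1:2]
  sp <- bias_parms[3:4]
  n  <- rowSums(tab)
  # Expected true exposed: (observed exposed - (1 - Sp) * N) / (Se + Sp - 1)
  exposed   <- (tab[, 1] - (1 - sp) * n) / (se + sp - 1)
  unexposed <- n - exposed
  list(
    exposed   = exposed,
    unexposed = unexposed,
    RR = (exposed[1] / sum(exposed)) / (unexposed[1] / sum(unexposed)),
    OR = (exposed[1] * unexposed[2]) / (unexposed[1] * exposed[2])
  )
}

smoking <- matrix(c(126, 92, 71, 224), nrow = 2, byrow = TRUE)
res <- correct_exposure(smoking, c(0.94, 0.94, 0.97, 0.97))
c(RR = res$RR, OR = res$OR)
```

If the observed exposed count falls below (1 - Sp) * N, the corrected count turns negative, a sign that the bias parameters are incompatible with the data.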

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.94, 0.94, 0.97, 0.97))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                              2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 2.377254                  
##    Misclassification Bias Corrected Odds Ratio: 5.024508 3.282534 7.690912

Differential misclassification

The direction of bias from differential misclassification (as with selection bias) depends on the details: it can be toward or away from the null.

Suppose the sensitivity and specificity of self-reported smoking differ between cases and controls.

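  • Misclassification among cases only (sensitivity 0.95, specificity 0.80; controls classified perfectly). The low specificity among cases produces false-positive smoking reports, and the observed OR (4.32) is biased away from the null relative to the corrected OR (3.21).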
misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.95, 1, 0.80, 1))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                              2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 1.865779                  
##    Misclassification Bias Corrected Odds Ratio: 3.205502 2.064626 4.976806
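  • Misclassification among controls only (sensitivity 0.95, specificity 0.80; cases classified perfectly). False-positive smoking reports now occur among controls, and the observed OR is biased toward the null (corrected OR 23.88).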
misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(1, 0.95, 1, 0.80))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  3.578230                    
##    Misclassification Bias Corrected Odds Ratio: 23.881793  6.533357 87.296636
  • Higher sensitivity among cases (0.95) than among controls (0.80), with specificity 0.80 in both groups. The observed OR is biased toward the null (corrected OR 13.97).
misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.95, 0.80, 0.80, 0.80))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  2.997496                    
##    Misclassification Bias Corrected Odds Ratio: 13.970407  3.683261 52.988991
  • Higher sensitivity among controls (0.95) than among cases (0.80), with specificity 0.80 in both groups. The observed OR is biased toward the null (corrected OR 29.69).
misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.80, 0.95, 0.80, 0.80))
## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                  2.5%
## Misclassification Bias Corrected Relative Risk:   3.993424           
##    Misclassification Bias Corrected Odds Ratio:  29.686983   7.681550
##                                                      97.5%
## Misclassification Bias Corrected Relative Risk:           
##    Misclassification Bias Corrected Odds Ratio: 114.731667
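The four differential scenarios can be compared side by side. This sketch applies the standard back-calculation, true exposed = (observed exposed - (1 - Sp) * N) / (Se + Sp - 1), to each set of bias parameters and labels the direction of bias by comparing the observed and corrected ORs on the log scale; `corrected_or()` is a hypothetical helper.

```r
# Differential scenarios from above: Se(cases), Se(controls), Sp(cases), Sp(controls)
scenarios <- list(
  c(0.95, 1, 0.80, 1),
  c(1, 0.95, 1, 0.80),
  c(0.95, 0.80, 0.80, 0.80),
  c(0.80, 0.95, 0.80, 0.80)
)

smoking <- matrix(c(126, 92, 71, 224), nrow = 2, byrow = TRUE)
obs_or  <- (smoking[1, 1] * smoking[2, 2]) / (smoking[1, 2] * smoking[2, 1])

# Hypothetical helper: corrected OR after back-calculating true exposed counts
corrected_or <- function(p, tab) {
  n         <- rowSums(tab)
  exposed   <- (tab[, 1] - (1 - p[3:4]) * n) / (p[1:2] + p[3:4] - 1)
  unexposed <- n - exposed
  (exposed[1] * unexposed[2]) / (unexposed[1] * exposed[2])
}

corr_or <- vapply(scenarios, corrected_or, numeric(1), tab = smoking)

# Observed OR farther from 1 (log scale) than the corrected OR means bias away from null
direction <- ifelse(abs(log(obs_or)) > abs(log(corr_or)),
                    "away from null", "toward null")
data.frame(corrected_or = round(corr_or, 2), direction)
```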

Covariate misclassification

  • No covariate misclassification
misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = NULL)
## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                       Observed Corrected
##                       SMR RR adjusted for confounder: 2.261738  2.261738
##    RR due to confounding by misclassified confounder: 1.079661  1.079661
##           Mantel-Haenszel RR adjusted for confounder: 2.228816  2.228816
## MH RR due to confounding by misclassified confounder: 1.095608  1.095608
##                       SMR OR adjusted for confounder: 2.337898  2.337898
##    OR due to confounding by misclassified confounder: 1.065867  1.065867
##           Mantel-Haenszel OR adjusted for confounder: 2.290469  2.290469
## MH OR due to confounding by misclassified confounder: 1.087938  1.087938
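The SMR and Mantel-Haenszel RRs in the "Observed" column can be reproduced by hand from the IVF+ and IVF- strata, and the "RR due to confounding" rows are the crude RR divided by the adjusted RR. A base R sketch with hypothetical helpers `smr_rr()` and `mh_rr()`; the SMR uses the folic acid+ group as the standard population.

```r
# IVF+ and IVF- strata (rows: Twins+, Twins-; columns: Folic acid+, Folic acid-)
strata <- list(
  ivf_pos = matrix(c(565, 781, 3583, 21958), nrow = 2, byrow = TRUE),
  ivf_neg = matrix(c(754, 4860, 34471, 383588), nrow = 2, byrow = TRUE)
)
total <- Reduce(`+`, strata)   # collapsed (crude) table

crude_rr <- (total[1, 1] / sum(total[, 1])) / (total[1, 2] / sum(total[, 2]))

# SMR: observed exposed cases / cases expected if the exposed had the
# stratum-specific risk of the unexposed
smr_rr <- function(strata) {
  observed <- sum(vapply(strata, function(tab) tab[1, 1], numeric(1)))
  expected <- sum(vapply(strata,
                         function(tab) sum(tab[, 1]) * tab[1, 2] / sum(tab[, 2]),
                         numeric(1)))
  observed / expected
}

# Mantel-Haenszel RR
mh_rr <- function(strata) {
  num <- sum(vapply(strata, function(tab) tab[1, 1] * sum(tab[, 2]) / sum(tab), numeric(1)))
  den <- sum(vapply(strata, function(tab) tab[1, 2] * sum(tab[, 1]) / sum(tab), numeric(1)))
  num / den
}

adj <- c(SMR = smr_rr(strata), MH = mh_rr(strata))
adj             # adjusted RRs
crude_rr / adj  # "RR due to confounding" = crude RR / adjusted RR
```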
  • Nondifferential misclassification

Data on IVF were not available, so a proxy was used: a period of involuntary childlessness. This was a poor proxy for IVF, with a sensitivity of 60% and a specificity of 95% (the example below uses a sensitivity of 0.9 instead). The bias parameters were assumed to be nondifferential.
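The correction for a misclassified confounder can be sketched with the usual back-calculation: within each outcome-by-exposure cell, true IVF+ = (proxy IVF+ - (1 - Sp) * N) / (Se + Sp - 1), after which the Mantel-Haenszel RR is recomputed over the corrected strata. This base R sketch (hypothetical helpers `correct_confounder()` and `mh_rr()`) should reproduce the corrected MH RRs printed below for both the nondifferential and the differential parameters.

```r
# Cells in the order: Twins+/FA+, Twins+/FA-, Twins-/FA+, Twins-/FA-
total       <- c(1319, 5641, 38054, 405546)  # all births in each cell
proxy_ivf   <- c(565, 781, 3583, 21958)      # classified IVF+ by the proxy
has_outcome <- c(TRUE, TRUE, FALSE, FALSE)

# Hypothetical helper: back-calculate the true IVF+ count in each cell.
# bias_parms: Se(outcome+), Se(outcome-), Sp(outcome+), Sp(outcome-)
correct_confounder <- function(total, proxy_pos, has_outcome, bias_parms) {
  se <- ifelse(has_outcome, bias_parms[1], bias_parms[2])
  sp <- ifelse(has_outcome, bias_parms[3], bias_parms[4])
  true_pos <- (proxy_pos - (1 - sp) * total) / (se + sp - 1)
  list(ivf_pos = true_pos, ivf_neg = total - true_pos)
}

# Mantel-Haenszel RR over strata given as cell vectors in the order above
mh_rr <- function(strata) {
  num <- 0
  den <- 0
  for (cells in strata) {
    n_exp   <- cells[1] + cells[3]  # folic acid+
    n_unexp <- cells[2] + cells[4]  # folic acid-
    n       <- n_exp + n_unexp
    num <- num + cells[1] * n_unexp / n
    den <- den + cells[2] * n_exp / n
  }
  num / den
}

observed     <- mh_rr(list(proxy_ivf, total - proxy_ivf))
nondiff      <- mh_rr(correct_confounder(total, proxy_ivf, has_outcome,
                                         c(0.9, 0.9, 0.95, 0.95)))
differential <- mh_rr(correct_confounder(total, proxy_ivf, has_outcome,
                                         c(0.6, 0.95, 0.95, 0.95)))
c(observed = observed, nondifferential = nondiff, differential = differential)
```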

misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = c(.9, .9, .95, .95))
## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                       Observed Corrected
##                       SMR RR adjusted for confounder: 2.261738  1.265607
##    RR due to confounding by misclassified confounder: 1.079661  1.929437
##           Mantel-Haenszel RR adjusted for confounder: 2.228816  1.356976
## MH RR due to confounding by misclassified confounder: 1.095608  1.799523
##                       SMR OR adjusted for confounder: 2.337898  1.269900
##    OR due to confounding by misclassified confounder: 1.065867  1.962271
##           Mantel-Haenszel OR adjusted for confounder: 2.290469  1.399383
## MH OR due to confounding by misclassified confounder: 1.087938  1.780705
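  • Differential misclassification (sensitivity 0.60 among those with the outcome and 0.95 among those without; specificity 0.95 in both)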
misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = c(0.6, 0.95, 0.95, 0.95))
## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                        Observed Corrected
##                       SMR RR adjusted for confounder: 2.2617379 0.9907589
##    RR due to confounding by misclassified confounder: 1.0796607 2.4646859
##           Mantel-Haenszel RR adjusted for confounder: 2.2288163 0.9864718
## MH RR due to confounding by misclassified confounder: 1.0956082 2.4753971
##                       SMR OR adjusted for confounder: 2.3378982 0.9907460
##    OR due to confounding by misclassified confounder: 1.0658668 2.5151633
##           Mantel-Haenszel OR adjusted for confounder: 2.2904691 0.9834783
## MH OR due to confounding by misclassified confounder: 1.0879378 2.5337497

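The Mantel-Haenszel estimate adjusted for the misclassified confounder is biased away from the null: after correction, the MH RR falls from 2.23 to 1.36 (nondifferential) or 0.99 (differential).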

Confounding bias

Unmeasured or unknown confounders

Example: the frequency of drug-attributed rash in relation to allopurinol exposure, with sex treated as a potential confounder (stratification).

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
RR_no=0.5249886 # RR between confounder and outcome among non-exposed

p1=0.5671642  # prevalence of confounder among exposed

p2=0.5417661  # prevalence of confounder among unexposed 

rash <- matrix(c(15, 94, 52, 1163),
               dimnames = list(c("Rash +", "Rash -"),
                               c("Allopurinol +", "Allopurinol -")),
               nrow = 2, byrow = TRUE)

rash %>% confounders(., type = "RR", bias_parms = c(RR_no, p1, p2))
## --Observed data-- 
##          Outcome: Rash + 
##        Comparing: Allopurinol + vs. Allopurinol - 
## 
##        Allopurinol + Allopurinol -
## Rash +            15            94
## Rash -            52          1163
## 
##                                           2.5%    97.5%
##         Crude Relative Risk: 2.993808 1.840724 4.869218
## Relative Risk, Confounder +: 3.043245                  
## Relative Risk, Confounder -: 3.043245                  
## ---
##                                           RR_conf
## Standardized Morbidity Ratio: 3.0432449 0.9837551
##              Mantel-Haenszel: 3.0432449 0.9837551
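The RR_conf column matches the classic external-adjustment bias factor, [p1(RR_no - 1) + 1] / [p2(RR_no - 1) + 1], where RR_no is the confounder-outcome RR among the unexposed; the adjusted RR is the crude RR divided by this factor. A base R sketch using the three bias parameters defined above (this formula reproduces the printed values; it is not taken from the episensr source).

```r
# Bias parameters defined above
RR_no <- 0.5249886  # confounder-outcome RR among the unexposed
p1    <- 0.5671642  # prevalence of the confounder among the exposed
p2    <- 0.5417661  # prevalence of the confounder among the unexposed

crude_rr <- (15 / (15 + 52)) / (94 / (94 + 1163))

# Bias factor due to the confounder, then the externally adjusted RR
rr_conf     <- (p1 * (RR_no - 1) + 1) / (p2 * (RR_no - 1) + 1)
adjusted_rr <- crude_rr / rr_conf
c(crude = crude_rr, RR_conf = rr_conf, adjusted = adjusted_rr)
```

Because the prevalence of the confounder is similar in exposed and unexposed (0.57 vs 0.54), the bias factor is close to 1 and the adjustment barely moves the RR.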
  • The three bias parameters can be derived from tables stratified by the confounder, as below, or taken from other studies.
## Outcome by confounder (sex) among the unexposed
rash_conf <- matrix(c(36, 58, 645, 518),
                    dimnames = list(c("Rash +", "Rash -"),
                                    c("Males", "Females")),
                    nrow = 2, byrow = TRUE)
rash_conf
##        Males Females
## Rash +    36      58
## Rash -   645     518
## By confounders: among males
rash_males <- matrix(c(5, 36, 33, 645),
                     dimnames = list(c("Rash +", "Rash -"),
                                     c("Allopurinol +", "Allopurinol -")),
                     nrow = 2, byrow = TRUE)
rash_males
##        Allopurinol + Allopurinol -
## Rash +             5            36
## Rash -            33           645
## By confounders: among females
rash_females <- matrix(c(10, 58, 19, 518),
                       dimnames = list(c("Rash +", "Rash -"),
                                       c("Allopurinol +", "Allopurinol -")),
                       nrow = 2, byrow = TRUE)
rash_females
##        Allopurinol + Allopurinol -
## Rash +            10            58
## Rash -            19           518
(RR_no <- (36/(36+645))/(58/(58+518))) # RR between confounder and outcome among non-exposed
## [1] 0.5249886
(p1 <- (5+33)/(15+52)) # prevalence of confounder among exposed
## [1] 0.5671642
(p2 <- (36+645)/(94+1163)) # prevalence of confounder among unexposed
## [1] 0.5417661
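  • Adjust an observed RR directly with external information on the confounder (bias_parms: confounder-disease RR, exposure-confounder OR, prevalence of the confounder, prevalence of the exposure)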
confounders.ext(RR = 2, bias_parms = c(0.1, 0.9, 0.1, 0.4))
## --Input bias parameters--
##                                      
## RR(Confounder-Disease):           0.1
## OR(Exposure category-Confounder): 0.9
## p(Confounder):                    0.1
## p(Exposure):                      0.4
## ---
## 
##                        
## Crude RR       1.009328
## Percent bias -49.533590

Summary