Bias analysis and control

Selection Bias

Selection bias occurs when the population selected into the study does not represent the target population (for example, because people refuse to answer or leave the study).

No selection bias

There is no selection bias if the response rate is 100% in every group.

bias_parms takes four selection probabilities: (1) among exposed cases, (2) among unexposed cases, (3) among exposed noncases, and (4) among unexposed noncases.

library(episensr)

## Loading required package: ggplot2

## Thank you for using episensr!

## This is version 1.3.0 of episensr

## Type 'citation("episensr")' for citing this R package in publications.

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(1, 1, 1, 1))
stang

## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.7984287
##    Selection Bias Corrected Odds Ratio: 0.7061267

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = 1)
stang

## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.7984287
##    Selection Bias Corrected Odds Ratio: 0.7061267

Correcting selection bias

Selection bias can be corrected with four selection probabilities: (1) among exposed cases, (2) among unexposed cases, (3) among exposed noncases, and (4) among unexposed noncases.

Alternatively, supply a single positive selection-bias factor: the ratio of the exposed versus unexposed selection probabilities, comparing cases and noncases, (1/2)/(3/4). With probabilities of 1, 1, 0.8, and 1, the factor is (1/1)/(0.8/1) = 1/0.8 = 1.25.

If the response rate among exposed noncases is 0.8, the control group is not representative of the source population, so there is selection bias. The observed association is biased toward the null.
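
The arithmetic behind the single factor can be sketched in base R. The helper name `selection_bias_factor` is hypothetical and not part of episensr; it only assumes the probability order described above.

```r
# Hypothetical helper: selection-bias factor from four selection probabilities,
# ordered as in bias_parms:
# (1) exposed cases, (2) unexposed cases, (3) exposed noncases, (4) unexposed noncases
selection_bias_factor <- function(p) {
  stopifnot(length(p) == 4, all(p > 0), all(p <= 1))
  (p[1] / p[2]) / (p[3] / p[4])
}

# Observed 2x2 table (rows: UM+, UM-; columns: Mobile+, Mobile-)
obs <- matrix(c(136, 107, 297, 165), nrow = 2, byrow = TRUE)
or_obs <- (obs[1, 1] * obs[2, 2]) / (obs[1, 2] * obs[2, 1])

sbf <- selection_bias_factor(c(1, 1, 0.8, 1))  # (1/1) / (0.8/1)
or_corrected <- or_obs / sbf                   # observed OR divided by the factor
c(factor = sbf, observed_OR = or_obs, corrected_OR = or_corrected)
```

A factor above 1 pulls the corrected odds ratio below the observed one, which is why an observed protective association looks weaker than it is when exposed noncases respond less often.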

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = 1/0.8)
stang

## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.6387430
##    Selection Bias Corrected Odds Ratio: 0.5649013

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(1, 1, 0.8, 1))
stang

## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.6815567
##    Selection Bias Corrected Odds Ratio: 0.5649013

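If the response rates are 0.45 and 0.9 among exposed and unexposed cases, and 0.5 and 1 among exposed and unexposed noncases, the selection-bias factor is (0.45/0.9)/(0.5/1) = 1. This approach then leaves the odds ratio unchanged and cannot correct the selection bias. In that situation you have to compare the characteristics of respondents with those of non-respondents.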

library(episensr)

stang <- selection(matrix(c(136, 107, 297, 165),
                          dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                          nrow = 2, byrow = TRUE),
                   bias_parms = c(0.45, 0.9, 0.5, 1))
stang

## --Observed data-- 
##          Outcome: UM+ 
##        Comparing: Mobile+ vs. Mobile- 
## 
##     Mobile+ Mobile-
## UM+     136     107
## UM-     297     165
## 
##                                        2.5%     97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
##    Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##                                                  
## Selection Bias Corrected Relative Risk: 0.8052260
##    Selection Bias Corrected Odds Ratio: 0.7061267
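
As a rough check on the output above, the correction can be sketched by hand, assuming each observed cell is simply divided by its selection probability before the measures are recomputed. This is an illustrative reconstruction in base R, not episensr's source code.

```r
# Sketch: divide each observed cell by its assumed selection probability,
# then recompute the measures of association.
obs <- matrix(c(136, 107, 297, 165), nrow = 2, byrow = TRUE,
              dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")))

# Selection probabilities laid out like the table:
# row 1 = cases (exposed, unexposed), row 2 = noncases (exposed, unexposed)
sel <- matrix(c(0.45, 0.9, 0.5, 1), nrow = 2, byrow = TRUE)

corr <- obs / sel                       # corrected cell counts
risk <- corr[1, ] / colSums(corr)       # risk among exposed and unexposed
rr_corr <- unname(risk[1] / risk[2])
or_corr <- (corr[1, 1] * corr[2, 2]) / (corr[1, 2] * corr[2, 1])
round(c(RR = rr_corr, OR = or_corr), 4)
```

Because the exposed-to-unexposed ratio of selection probabilities is 0.5 in both cases and noncases, the cross-product is unchanged and only the relative risk moves.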

Information Bias (Misclassification)

Highly sensitive tests (>90%) yield few false negatives, so a negative result is strong evidence for ruling out the target disease. Conversely, highly specific tests (>90%) generate very few false positives, so a positive result strongly implies that the target disease or condition is present. Note that prevalence, especially when it is low, also affects diagnostic test results. In the example below, with sensitivity and specificity of 90% in both groups, the observed association is biased toward the null.

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms =c(0.90, 0.90, 0.90, 0.90))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  2.696653                    
##    Misclassification Bias Corrected Odds Ratio:  6.956681  4.048179 11.954860
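
The remark about prevalence can be made concrete with predictive values from Bayes' rule. The sketch below uses a hypothetical helper `predictive_values` and two illustrative prevalences (1% and 50%) that are assumptions, not values from the example data.

```r
# Predictive values from sensitivity, specificity and prevalence (Bayes' rule)
predictive_values <- function(se, sp, prev) {
  ppv <- se * prev / (se * prev + (1 - sp) * (1 - prev))
  npv <- sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)
  c(PPV = ppv, NPV = npv)
}

# Same test (Se = Sp = 0.90) at a low and a high prevalence (illustrative values)
low  <- predictive_values(0.90, 0.90, prev = 0.01)
high <- predictive_values(0.90, 0.90, prev = 0.50)
round(rbind(low, high), 3)
```

With the same 90% sensitivity and specificity, most positive results are false positives when the condition is rare, which is why a test that looks accurate can still misclassify heavily in a low-prevalence group.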

Non-differential misclassification

Non-differential misclassification of a binary exposure generally biases the association toward the null.

Let’s say the sensitivity of self-reported smoking is 94% and specificity is 97%, for both the case and control groups.

The argument bias_parms should be made of the following components: (1) Sensitivity of classification among those with the outcome, (2) Sensitivity of classification among those without the outcome, (3) Specificity of classification among those with the outcome, and (4) Specificity of classification among those without the outcome.

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.94, 0.94, 0.97, 0.97))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                              2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 2.377254                  
##    Misclassification Bias Corrected Odds Ratio: 5.024508 3.282534 7.690912
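
The corrected estimates above can be approximated by hand, assuming the usual back-calculation of the true exposed count from the observed count, sensitivity, and specificity. The helper `correct_exposed` is hypothetical and only illustrates that formula in base R.

```r
# Sketch: back-calculate the "true" exposed count in one outcome group from
# the observed exposed count, given sensitivity (se) and specificity (sp).
correct_exposed <- function(n_exposed_obs, n_total, se, sp) {
  stopifnot(se + sp > 1)  # otherwise classification is no better than chance
  (n_exposed_obs - (1 - sp) * n_total) / (se + sp - 1)
}

# Observed data: cases 126 exposed / 92 unexposed; controls 71 / 224
se <- 0.94
sp <- 0.97  # same values in cases and controls (non-differential)
A <- correct_exposed(126, 126 + 92, se, sp)
B <- (126 + 92) - A
C <- correct_exposed(71, 71 + 224, se, sp)
D <- (71 + 224) - C

or_corrected <- (A * D) / (B * C)
rr_corrected <- (A / (A + C)) / (B / (B + D))
round(c(OR = or_corrected, RR = rr_corrected), 4)
```

Passing different `se` and `sp` values for cases and controls turns the same sketch into a differential misclassification correction.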

Differential misclassification

The effects of differential misclassification and selection bias depend on the details.

Let's say the sensitivity and specificity of self-reported smoking differ between the case and control groups.

Misclassification occurs only in the case group (sensitivity 0.95, specificity 0.80, so cases have more false positives), while controls are classified perfectly. The observed association is biased away from the null.

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.95, 1, 0.80, 1))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                              2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 1.865779                  
##    Misclassification Bias Corrected Odds Ratio: 3.205502 2.064626 4.976806

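Misclassification occurs only in the control group (sensitivity 0.95, specificity 0.80, so controls have more false positives), while cases are classified perfectly. The observed association is biased toward the null.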

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(1, 0.95, 1, 0.80))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  3.578230                    
##    Misclassification Bias Corrected Odds Ratio: 23.881793  6.533357 87.296636

Sensitivity is higher in the case group (0.95) than in the control group (0.80), and specificity is 0.80 in both groups. The observed association is biased toward the null.

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.95, 0.80, 0.80, 0.80))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                2.5%     97.5%
## Misclassification Bias Corrected Relative Risk:  2.997496                    
##    Misclassification Bias Corrected Odds Ratio: 13.970407  3.683261 52.988991

Sensitivity is higher in the control group (0.95) than in the case group (0.80), and specificity is 0.80 in both groups. The observed association is biased toward the null.

misclassification(matrix(c(126, 92, 71, 224),
                         dimnames = list(c("Case", "Control"),
                                         c("Smoking +", "Smoking - ")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(0.80, 0.95, 0.80, 0.80))

## --Observed data-- 
##          Outcome: Case 
##        Comparing: Smoking + vs. Smoking -  
## 
##         Smoking + Smoking - 
## Case          126         92
## Control        71        224
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                                 2.5%      97.5%
## Misclassification Bias Corrected Relative Risk:   3.993424                     
##    Misclassification Bias Corrected Odds Ratio:  29.686983   7.681550 114.731667

Covariate misclassification

No covariate misclassification

misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = NULL)

## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                       Observed Corrected
##                       SMR RR adjusted for confounder: 2.261738  2.261738
##    RR due to confounding by misclassified confounder: 1.079661  1.079661
##           Mantel-Haenszel RR adjusted for confounder: 2.228816  2.228816
## MH RR due to confounding by misclassified confounder: 1.095608  1.095608
##                       SMR OR adjusted for confounder: 2.337898  2.337898
##    OR due to confounding by misclassified confounder: 1.065867  1.065867
##           Mantel-Haenszel OR adjusted for confounder: 2.290469  2.290469
## MH OR due to confounding by misclassified confounder: 1.087938  1.087938
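
To show where the Mantel-Haenszel RR in this output comes from, the stratified estimate can be sketched by hand. The function `mh_rr` is a hypothetical helper using the standard cohort-style Mantel-Haenszel risk ratio; it is an illustration, not episensr's implementation.

```r
# Stratum-specific 2x2 tables
# (rows: Twins+, Twins-; columns: Folic acid+, Folic acid-), filled column-wise
ivf_pos <- matrix(c(565, 3583, 781, 21958), nrow = 2)
ivf_neg <- matrix(c(754, 34471, 4860, 383588), nrow = 2)

# Cohort-style Mantel-Haenszel risk ratio over a list of strata
mh_rr <- function(strata) {
  num <- sum(sapply(strata, function(s) s[1, 1] * sum(s[, 2]) / sum(s)))
  den <- sum(sapply(strata, function(s) s[1, 2] * sum(s[, 1]) / sum(s)))
  num / den
}

total <- ivf_pos + ivf_neg  # collapsed (crude) table
crude_rr <- (total[1, 1] / sum(total[, 1])) / (total[1, 2] / sum(total[, 2]))
adj_rr <- mh_rr(list(ivf_pos, ivf_neg))
round(c(crude = crude_rr, MH = adj_rr, crude_over_MH = crude_rr / adj_rr), 4)
```

The ratio of the crude to the adjusted estimate corresponds to the "RR due to confounding" rows in the output.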

Non-differential misclassification

Data on IVF were not available, so a proxy was used: period of involuntary childlessness. However, it was a poor proxy for IVF, with a sensitivity of 60% (0.9 is used here) and a specificity of 95%. These bias parameters were assumed to be non-differential.

misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = c(.9, .9, .95, .95))

## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                       Observed Corrected
##                       SMR RR adjusted for confounder: 2.261738  1.265607
##    RR due to confounding by misclassified confounder: 1.079661  1.929437
##           Mantel-Haenszel RR adjusted for confounder: 2.228816  1.356976
## MH RR due to confounding by misclassified confounder: 1.095608  1.799523
##                       SMR OR adjusted for confounder: 2.337898  1.269900
##    OR due to confounding by misclassified confounder: 1.065867  1.962271
##           Mantel-Haenszel OR adjusted for confounder: 2.290469  1.399383
## MH OR due to confounding by misclassified confounder: 1.087938  1.780705

Differential misclassification

misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
                              781, 21958, 754, 34471, 4860, 383588),
                            dimnames = list(c("Twins+", "Twins-"),
                                            c("Folic acid+", "Folic acid-"),
                                            c("Total", "IVF+", "IVF-")),
                            dim = c(2, 2, 3)),
                      bias_parms = c(0.6, 0.95, 0.95, 0.95))

## --Observed data-- 
##          Outcome: Twins+ 
##        Comparing: Folic acid+ vs. Folic acid- 
## 
## , , Total
## 
##        Folic acid+ Folic acid-
## Twins+        1319        5641
## Twins-       38054      405546
## 
## , , IVF+
## 
##        Folic acid+ Folic acid-
## Twins+         565         781
## Twins-        3583       21958
## 
## , , IVF-
## 
##        Folic acid+ Folic acid-
## Twins+         754        4860
## Twins-       34471      383588
## 
## 
##                                      2.5%    97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
##    Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
##                                                        Observed Corrected
##                       SMR RR adjusted for confounder: 2.2617379 0.9907589
##    RR due to confounding by misclassified confounder: 1.0796607 2.4646859
##           Mantel-Haenszel RR adjusted for confounder: 2.2288163 0.9864718
## MH RR due to confounding by misclassified confounder: 1.0956082 2.4753971
##                       SMR OR adjusted for confounder: 2.3378982 0.9907460
##    OR due to confounding by misclassified confounder: 1.0658668 2.5151633
##           Mantel-Haenszel OR adjusted for confounder: 2.2904691 0.9834783
## MH OR due to confounding by misclassified confounder: 1.0879378 2.5337497

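The Mantel-Haenszel association estimated with the misclassified confounder is biased away from the null: the observed adjusted RR is 2.23, while the corrected RR is 0.99.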

Confounding bias

Unmeasured or unknown confounders

Example: the frequency of drug-attributed rash in relation to allopurinol exposure, with sex treated as a potential confounder (stratified analysis).

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

RR_no=0.5249886 # RR between confounder and outcome among non-exposed

p1=0.5671642  # prevalence of confounder among exposed

p2=0.5417661  # prevalence of confounder among unexposed 

rash <- matrix(c(15, 94, 52, 1163),
               dimnames = list(c("Rash +", "Rash -"),
                               c("Allopurinol +", "Allopurinol -")),
               nrow = 2, byrow = TRUE)

rash %>% confounders(., type = "RR", bias_parms = c(RR_no, p1, p2))

## --Observed data-- 
##          Outcome: Rash + 
##        Comparing: Allopurinol + vs. Allopurinol - 
## 
##        Allopurinol + Allopurinol -
## Rash +            15            94
## Rash -            52          1163
## 
##                                           2.5%    97.5%
##         Crude Relative Risk: 2.993808 1.840724 4.869218
## Relative Risk, Confounder +: 3.043245                  
## Relative Risk, Confounder -: 3.043245                  
## ---
##                                           RR_conf
## Standardized Morbidity Ratio: 3.0432449 0.9837551
##              Mantel-Haenszel: 3.0432449 0.9837551

Generate three matrices stratified by the confounder, or obtain these parameters from other studies.

## Outcome by confounder, among the unexposed
rash_conf <- matrix(c(36, 58, 645, 518),
                    dimnames = list(c("Rash +", "Rash -"),
                                    c("Males", "Females")),
                    nrow = 2, byrow = TRUE)
rash_conf

##        Males Females
## Rash +    36      58
## Rash -   645     518

## By confounders: among males
rash_males <- matrix(c(5, 36, 33, 645),
                     dimnames = list(c("Rash +", "Rash -"),
                                     c("Allopurinol +", "Allopurinol -")),
                     nrow = 2, byrow = TRUE)
rash_males

##        Allopurinol + Allopurinol -
## Rash +             5            36
## Rash -            33           645

## By confounders: among females
rash_females <- matrix(c(10, 58, 19, 518),
                       dimnames = list(c("Rash +", "Rash -"),
                                       c("Allopurinol +", "Allopurinol -")),
                       nrow = 2, byrow = TRUE)
rash_females

##        Allopurinol + Allopurinol -
## Rash +            10            58
## Rash -            19           518

(RR_no <- (36/(36+645))/(58/(58+518))) # RR between confounder and outcome among non-exposed

## [1] 0.5249886

(p1 <- (5+33)/(15+52)) # prevalence of confounder among exposed

## [1] 0.5671642

(p2 <- (36+645)/(94+1163)) # prevalence of confounder among unexposed

## [1] 0.5417661


Adjust RR directly

confounders.ext(RR = 2, bias_parms = c(0.1, 0.9, 0.1, 0.4))

## --Input bias parameters--
##                                      
## RR(Confounder-Disease):           0.1
## OR(Exposure category-Confounder): 0.9
## p(Confounder):                    0.1
## p(Exposure):                      0.4
## ---
## 
##                        
## Crude RR       1.009328
## Percent bias -49.533590

Summary

There are several mechanisms that can produce selection bias:
- Selection of a comparison group (“controls”) that is not representative of the population that produced the cases in a case-control study. (Control selection bias)
- Differential loss to follow up in a cohort study, such that the likelihood of being lost to follow up is related to outcome status and exposure status. (Loss to follow-up bias)
- Refusal, non-response, or agreement to participate that is related to the exposure and disease (Self-selection bias)
- Using the general population as a comparison group for an occupational cohort study (“Healthy worker effect”)
- Differential referral or diagnosis of subjects
There are several mechanisms that can produce misclassification bias:
- Recall bias
- Interviewer Bias
- Differences in the Quality of Information
Unlike selection and information bias, which can be introduced by the investigator or by the subjects, confounding is a type of bias that can be adjusted for in the analysis.
Confounding bias can be analyzed by regression, and the association can be adjusted using data from other studies (instrumental variables, Bayesian methods).
Selection bias and misclassification bias can also be corrected. Selection bias can be identified from the non-response rate (or by comparing the characteristics of respondents and non-respondents), and misclassification bias can be identified by validating the diagnostic test (its sensitivity and specificity).