Selection bias occurs when the population selected into the study does
not represent the target population (for example, because people refuse
to answer or leave the study).
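bias_parms takes four selection probabilities: (1) among exposed cases, (2) among unexposed cases, (3) among exposed noncases, and (4) among unexposed noncases. Setting all four to 1 means no selection bias, so the corrected estimates equal the observed ones.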
library(episensr)
## Loading required package: ggplot2
## Thank you for using episensr!
## This is version 1.3.0 of episensr
## Type 'citation("episensr")' for citing this R package in publications.
stang <- selection(matrix(c(136, 107, 297, 165),
dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
nrow = 2, byrow = TRUE),
bias_parms = c(1, 1, 1, 1))
stang
## --Observed data--
## Outcome: UM+
## Comparing: Mobile+ vs. Mobile-
##
## Mobile+ Mobile-
## UM+ 136 107
## UM- 297 165
##
## 2.5% 97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
## Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##
## Selection Bias Corrected Relative Risk: 0.7984287
## Selection Bias Corrected Odds Ratio: 0.7061267
or, equivalently, with a single selection-bias factor of 1:
library(episensr)
stang <- selection(matrix(c(136, 107, 297, 165),
dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
nrow = 2, byrow = TRUE),
bias_parms = 1)
stang
## --Observed data--
## Outcome: UM+
## Comparing: Mobile+ vs. Mobile-
##
## Mobile+ Mobile-
## UM+ 136 107
## UM- 297 165
##
## 2.5% 97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
## Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##
## Selection Bias Corrected Relative Risk: 0.7984287
## Selection Bias Corrected Odds Ratio: 0.7061267
The bias parameters can be given as the four selection probabilities listed above, or as a single positive selection-bias factor: the ratio of the exposed versus unexposed selection probabilities comparing cases and noncases, [(1)/(2)]/[(3)/(4)]. For example, with selection probabilities of 1, 1, 0.8, and 1, the factor is (1/1)/(0.8/1) = 1/0.8 = 1.25.
library(episensr)
stang <- selection(matrix(c(136, 107, 297, 165),
dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
nrow = 2, byrow = TRUE),
bias_parms = 1/0.8)
stang
## --Observed data--
## Outcome: UM+
## Comparing: Mobile+ vs. Mobile-
##
## Mobile+ Mobile-
## UM+ 136 107
## UM- 297 165
##
## 2.5% 97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
## Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##
## Selection Bias Corrected Relative Risk: 0.6387430
## Selection Bias Corrected Odds Ratio: 0.5649013
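or, with the four selection probabilities (the corrected odds ratio is the same, but the corrected relative risk differs):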
library(episensr)
stang <- selection(matrix(c(136, 107, 297, 165),
dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
nrow = 2, byrow = TRUE),
bias_parms = c(1, 1, 0.8, 1))
stang
## --Observed data--
## Outcome: UM+
## Comparing: Mobile+ vs. Mobile-
##
## Mobile+ Mobile-
## UM+ 136 107
## UM- 297 165
##
## 2.5% 97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
## Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##
## Selection Bias Corrected Relative Risk: 0.6815567
## Selection Bias Corrected Odds Ratio: 0.5649013
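The single-factor form can be checked by hand. The following is a minimal sketch in base R, assuming (as the two outputs above suggest) that the factor is (S1/S2)/(S3/S4) and that the corrected odds ratio is the observed odds ratio divided by that factor. The variable names are illustrative and are not part of episensr.

```r
# Observed 2x2 table (rows: UM+/UM-, columns: Mobile+/Mobile-)
obs <- matrix(c(136, 107, 297, 165), nrow = 2, byrow = TRUE)

# Selection probabilities: exposed cases, unexposed cases,
# exposed noncases, unexposed noncases
s <- c(1, 1, 0.8, 1)

# Selection-bias factor: (S1 / S2) / (S3 / S4)
bias_factor <- (s[1] / s[2]) / (s[3] / s[4])

# Observed odds ratio and its corrected value
or_obs <- (obs[1, 1] * obs[2, 2]) / (obs[1, 2] * obs[2, 1])
or_corr <- or_obs / bias_factor

print(c(bias_factor = bias_factor, or_obs = or_obs, or_corr = or_corr))
```

The factor comes out as 1.25, which is the value passed as `1/0.8` in the single-factor call.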
library(episensr)
stang <- selection(matrix(c(136, 107, 297, 165),
dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
nrow = 2, byrow = TRUE),
bias_parms = c(0.45, 0.9, 0.5, 1))
stang
## --Observed data--
## Outcome: UM+
## Comparing: Mobile+ vs. Mobile-
##
## Mobile+ Mobile-
## UM+ 136 107
## UM- 297 165
##
## 2.5% 97.5%
## Observed Relative Risk: 0.7984287 0.6518303 0.9779975
## Observed Odds Ratio: 0.7061267 0.5143958 0.9693215
## ---
##
## Selection Bias Corrected Relative Risk: 0.8052260
## Selection Bias Corrected Odds Ratio: 0.7061267
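When all four probabilities are given, the corrected estimates can be reproduced by dividing each observed cell by its selection probability. The sketch below works under that assumption; `correct_selection` is a hypothetical helper written for illustration, not an episensr function.

```r
correct_selection <- function(tab, s) {
  # tab: 2x2 matrix, rows = outcome (+, -), columns = exposure (+, -)
  # s:   selection probabilities for exposed cases, unexposed cases,
  #      exposed noncases, unexposed noncases
  a  <- tab[1, 1] / s[1]
  b  <- tab[1, 2] / s[2]
  c0 <- tab[2, 1] / s[3]
  d  <- tab[2, 2] / s[4]
  c(rr = (a / (a + c0)) / (b / (b + d)),
    or = (a * d) / (b * c0))
}

obs <- matrix(c(136, 107, 297, 165), nrow = 2, byrow = TRUE)
res <- correct_selection(obs, c(0.45, 0.9, 0.5, 1))
print(round(res, 7))
```

Here the selection-bias factor is (0.45/0.9)/(0.5/1) = 1, so the odds ratio is unchanged, while the relative risk still shifts slightly because the reconstructed cell totals differ from the observed ones.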
Highly sensitive tests (>90%) yield few false negatives, so a
negative result is strong evidence to rule out the target disease.
Conversely, highly specific tests (>90%) generate very few false
positives, so a positive result strongly implies that the target
disease or condition is present. Note that prevalence (especially when
low) also affects how a diagnostic test result should be interpreted.
Imperfect sensitivity and specificity also bias the measured
association: in the misclassification() call below, where both are 0.90
for cases and controls alike, the observed association is biased toward
the null.
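To see how prevalence enters, predictive values can be computed from sensitivity, specificity, and prevalence with Bayes' rule. The sketch below uses illustrative numbers (sensitivity and specificity of 0.90, prevalences of 1% and 30%) that are assumptions for the example, not values from this section's data. `predictive_values` is a hypothetical helper.

```r
predictive_values <- function(se, sp, prev) {
  # Positive predictive value: P(disease | test positive)
  ppv <- (se * prev) / (se * prev + (1 - sp) * (1 - prev))
  # Negative predictive value: P(no disease | test negative)
  npv <- (sp * (1 - prev)) / (sp * (1 - prev) + (1 - se) * prev)
  c(ppv = ppv, npv = npv)
}

low  <- predictive_values(se = 0.90, sp = 0.90, prev = 0.01)
high <- predictive_values(se = 0.90, sp = 0.90, prev = 0.30)
print(round(rbind(low, high), 3))
```

At a prevalence of 1%, the positive predictive value is only about 8% even though sensitivity and specificity are both 90%.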
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms =c(0.90, 0.90, 0.90, 0.90))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 2.696653
## Misclassification Bias Corrected Odds Ratio: 6.956681 4.048179 11.954860
Non-differential misclassification of a binary exposure is expected to bias the association towards the null.
The argument bias_parms should be made of the following components: (1) Sensitivity of classification among those with the outcome, (2) Sensitivity of classification among those without the outcome, (3) Specificity of classification among those with the outcome, and (4) Specificity of classification among those without the outcome.
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(0.94, 0.94, 0.97, 0.97))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 2.377254
## Misclassification Bias Corrected Odds Ratio: 5.024508 3.282534 7.690912
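The corrected point estimates in these two outputs can be reproduced with the standard back-calculation from sensitivity and specificity. This is a minimal sketch, assuming exposure misclassification and the parameter order listed above; `correct_exposure` is a hypothetical helper, not part of episensr, and it does not compute confidence intervals.

```r
correct_exposure <- function(tab, bias_parms) {
  # tab:        rows = (cases, controls), columns = (exposed, unexposed)
  # bias_parms: Se cases, Se controls, Sp cases, Sp controls
  se <- bias_parms[1:2]
  sp <- bias_parms[3:4]
  totals <- rowSums(tab)
  # Corrected exposed = (observed exposed - (1 - Sp) * row total) / (Se + Sp - 1)
  exposed <- (tab[, 1] - (1 - sp) * totals) / (se + sp - 1)
  unexposed <- totals - exposed
  a <- exposed[1]; b <- unexposed[1]
  c0 <- exposed[2]; d <- unexposed[2]
  c(rr = unname((a / (a + c0)) / (b / (b + d))),
    or = unname((a * d) / (b * c0)))
}

smoking <- matrix(c(126, 92, 71, 224), nrow = 2, byrow = TRUE)
res_90 <- correct_exposure(smoking, c(0.90, 0.90, 0.90, 0.90))
res_94 <- correct_exposure(smoking, c(0.94, 0.94, 0.97, 0.97))
print(round(rbind(res_90, res_94), 6))
```

In both nondifferential settings the corrected odds ratio is larger than the observed 4.32, consistent with bias toward the null.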
The effects of differential misclassification and of selection bias depend on the details.
Suppose the sensitivity and specificity of self-reported smoking differ between the case and control groups.
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(0.95, 1, 0.80, 1))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 1.865779
## Misclassification Bias Corrected Odds Ratio: 3.205502 2.064626 4.976806
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(1, 0.95, 1, 0.80))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 3.578230
## Misclassification Bias Corrected Odds Ratio: 23.881793 6.533357 87.296636
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(0.95, 0.80, 0.80, 0.80))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 2.997496
## Misclassification Bias Corrected Odds Ratio: 13.970407 3.683261 52.988991
misclassification(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"),
c("Smoking +", "Smoking - ")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(0.80, 0.95, 0.80, 0.80))
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
## Smoking + Smoking -
## Case 126 92
## Control 71 224
##
## 2.5% 97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
## Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
## 2.5% 97.5%
## Misclassification Bias Corrected Relative Risk: 3.993424
## Misclassification Bias Corrected Odds Ratio: 29.686983 7.681550 114.731667
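The four differential scenarios above can be compared side by side by applying the same back-calculation to each parameter set. This is a sketch, assuming the parameter order is sensitivity in cases, sensitivity in controls, specificity in cases, specificity in controls; `corrected_or` is a hypothetical helper and returns point estimates only.

```r
corrected_or <- function(tab, p) {
  # p: Se cases, Se controls, Sp cases, Sp controls
  totals <- rowSums(tab)
  exposed <- (tab[, 1] - (1 - p[3:4]) * totals) / (p[1:2] + p[3:4] - 1)
  unexposed <- totals - exposed
  unname((exposed[1] * unexposed[2]) / (unexposed[1] * exposed[2]))
}

smoking <- matrix(c(126, 92, 71, 224), nrow = 2, byrow = TRUE)
scenarios <- list(c(0.95, 1, 0.80, 1),
                  c(1, 0.95, 1, 0.80),
                  c(0.95, 0.80, 0.80, 0.80),
                  c(0.80, 0.95, 0.80, 0.80))

# One corrected odds ratio per scenario, in the order listed above
or_by_scenario <- sapply(scenarios, corrected_or, tab = smoking)
print(round(or_by_scenario, 3))
```

The corrected odds ratio ranges from about 3.2 to about 29.7 around the observed 4.32, which is why the direction of bias under differential misclassification cannot be stated in general.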
misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
781, 21958, 754, 34471, 4860, 383588),
dimnames = list(c("Twins+", "Twins-"),
c("Folic acid+", "Folic acid-"),
c("Total", "IVF+", "IVF-")),
dim = c(2, 2, 3)),
bias_parms = NULL)
## --Observed data--
## Outcome: Twins+
## Comparing: Folic acid+ vs. Folic acid-
##
## , , Total
##
## Folic acid+ Folic acid-
## Twins+ 1319 5641
## Twins- 38054 405546
##
## , , IVF+
##
## Folic acid+ Folic acid-
## Twins+ 565 781
## Twins- 3583 21958
##
## , , IVF-
##
## Folic acid+ Folic acid-
## Twins+ 754 4860
## Twins- 34471 383588
##
##
## 2.5% 97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
## Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
## Observed Corrected
## SMR RR adjusted for confounder: 2.261738 2.261738
## RR due to confounding by misclassified confounder: 1.079661 1.079661
## Mantel-Haenszel RR adjusted for confounder: 2.228816 2.228816
## MH RR due to confounding by misclassified confounder: 1.095608 1.095608
## SMR OR adjusted for confounder: 2.337898 2.337898
## OR due to confounding by misclassified confounder: 1.065867 1.065867
## Mantel-Haenszel OR adjusted for confounder: 2.290469 2.290469
## MH OR due to confounding by misclassified confounder: 1.087938 1.087938
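The observed adjusted risk ratios in this output can be reproduced from the two IVF strata. The sketch below is in base R and assumes the usual SMR (standardised to the exposed group) and Mantel-Haenszel formulas for risk ratios; the variable names are illustrative.

```r
# Stratum-specific cell counts, in the order IVF+, IVF-
a  <- c(565, 754)        # Folic acid+, Twins+
b  <- c(781, 4860)       # Folic acid-, Twins+
c0 <- c(3583, 34471)     # Folic acid+, Twins-
d  <- c(21958, 383588)   # Folic acid-, Twins-

n1 <- a + c0             # exposed total per stratum
n0 <- b + d              # unexposed total per stratum
n  <- n1 + n0            # stratum total

# Crude RR from the collapsed table
rr_crude <- (sum(a) / sum(n1)) / (sum(b) / sum(n0))

# SMR: observed exposed cases over cases expected at the unexposed risks
rr_smr <- sum(a) / sum(n1 * b / n0)

# Mantel-Haenszel pooled risk ratio
rr_mh <- sum(a * n0 / n) / sum(b * n1 / n)

print(round(c(crude = rr_crude, smr = rr_smr, mh = rr_mh,
              conf_smr = rr_crude / rr_smr,
              conf_mh = rr_crude / rr_mh), 6))
```

The last two values are the crude RR divided by each adjusted RR, which corresponds to the "RR due to confounding" rows in the output.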
Data on IVF were not available, so a proxy was used: period of involuntary childlessness. It was a poor proxy for IVF, with a sensitivity of 60% and a specificity of 95%. These bias parameters were assumed to be nondifferential. The first call below uses a sensitivity of 0.9 in both groups for comparison; the second passes 0.6 and 0.95 as the two sensitivity values.
misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
781, 21958, 754, 34471, 4860, 383588),
dimnames = list(c("Twins+", "Twins-"),
c("Folic acid+", "Folic acid-"),
c("Total", "IVF+", "IVF-")),
dim = c(2, 2, 3)),
bias_parms = c(.9, .9, .95, .95))
## --Observed data--
## Outcome: Twins+
## Comparing: Folic acid+ vs. Folic acid-
##
## , , Total
##
## Folic acid+ Folic acid-
## Twins+ 1319 5641
## Twins- 38054 405546
##
## , , IVF+
##
## Folic acid+ Folic acid-
## Twins+ 565 781
## Twins- 3583 21958
##
## , , IVF-
##
## Folic acid+ Folic acid-
## Twins+ 754 4860
## Twins- 34471 383588
##
##
## 2.5% 97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
## Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
## Observed Corrected
## SMR RR adjusted for confounder: 2.261738 1.265607
## RR due to confounding by misclassified confounder: 1.079661 1.929437
## Mantel-Haenszel RR adjusted for confounder: 2.228816 1.356976
## MH RR due to confounding by misclassified confounder: 1.095608 1.799523
## SMR OR adjusted for confounder: 2.337898 1.269900
## OR due to confounding by misclassified confounder: 1.065867 1.962271
## Mantel-Haenszel OR adjusted for confounder: 2.290469 1.399383
## MH OR due to confounding by misclassified confounder: 1.087938 1.780705
misclassification.cov(array(c(1319, 38054, 5641, 405546, 565, 3583,
781, 21958, 754, 34471, 4860, 383588),
dimnames = list(c("Twins+", "Twins-"),
c("Folic acid+", "Folic acid-"),
c("Total", "IVF+", "IVF-")),
dim = c(2, 2, 3)),
bias_parms = c(0.6, 0.95, 0.95, 0.95))
## --Observed data--
## Outcome: Twins+
## Comparing: Folic acid+ vs. Folic acid-
##
## , , Total
##
## Folic acid+ Folic acid-
## Twins+ 1319 5641
## Twins- 38054 405546
##
## , , IVF+
##
## Folic acid+ Folic acid-
## Twins+ 565 781
## Twins- 3583 21958
##
## , , IVF-
##
## Folic acid+ Folic acid-
## Twins+ 754 4860
## Twins- 34471 383588
##
##
## 2.5% 97.5%
## Observed Relative Risk: 2.441910 2.301898 2.590437
## Observed Odds Ratio: 2.491888 2.344757 2.648251
## ---
## Observed Corrected
## SMR RR adjusted for confounder: 2.2617379 0.9907589
## RR due to confounding by misclassified confounder: 1.0796607 2.4646859
## Mantel-Haenszel RR adjusted for confounder: 2.2288163 0.9864718
## MH RR due to confounding by misclassified confounder: 1.0956082 2.4753971
## SMR OR adjusted for confounder: 2.3378982 0.9907460
## OR due to confounding by misclassified confounder: 1.0658668 2.5151633
## Mantel-Haenszel OR adjusted for confounder: 2.2904691 0.9834783
## MH OR due to confounding by misclassified confounder: 1.0879378 2.5337497
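With the confounder misclassified, the Mantel-Haenszel adjusted association is biased away from the null: the observed adjusted RR is 2.23, whereas the corrected RR is 0.99.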
The frequency of drug-attributed rash in relation to allopurinol
exposure, with sex treated as a potential confounding factor
(stratification).
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
RR_no=0.5249886 # RR between confounder and outcome among non-exposed
p1=0.5671642 # prevalence of confounder among exposed
p2=0.5417661 # prevalence of confounder among unexposed
rash <- matrix(c(15, 94, 52, 1163),
dimnames = list(c("Rash +", "Rash -"),
c("Allopurinol +", "Allopurinol -")),
nrow = 2, byrow = TRUE)
rash %>% confounders(., type = "RR", bias_parms = c(RR_no, p1, p2))
## --Observed data--
## Outcome: Rash +
## Comparing: Allopurinol + vs. Allopurinol -
##
## Allopurinol + Allopurinol -
## Rash + 15 94
## Rash - 52 1163
##
## 2.5% 97.5%
## Crude Relative Risk: 2.993808 1.840724 4.869218
## Relative Risk, Confounder +: 3.043245
## Relative Risk, Confounder -: 3.043245
## ---
## RR_conf
## Standardized Morbidity Ratio: 3.0432449 0.9837551
## Mantel-Haenszel: 3.0432449 0.9837551
## Outcome by confounders
rash_conf <- matrix(c(36, 58, 645, 518),
dimnames = list(c("Rash +", "Rash -"),
c("Males", "Females")),
nrow = 2, byrow = TRUE)
rash_conf
## Males Females
## Rash + 36 58
## Rash - 645 518
## By confounders: among males
rash_males <- matrix(c(5, 36, 33, 645),
dimnames = list(c("Rash +", "Rash -"),
c("Allopurinol +", "Allopurinol -")),
nrow = 2, byrow = TRUE)
rash_males
## Allopurinol + Allopurinol -
## Rash + 5 36
## Rash - 33 645
## By confounders: among females
rash_females <- matrix(c(10, 58, 19, 518),
dimnames = list(c("Rash +", "Rash -"),
c("Allopurinol +", "Allopurinol -")),
nrow = 2, byrow = TRUE)
rash_females
## Allopurinol + Allopurinol -
## Rash + 10 58
## Rash - 19 518
(RR_no <- (36/(36+645))/(58/(58+518))) # RR between confounder and outcome among non-exposed
## [1] 0.5249886
(p1 <- (5+33)/(15+52)) # prevalence of confounder among exposed
## [1] 0.5671642
(p2 <- (36+645)/(94+1163)) # prevalence of confounder among unexposed
## [1] 0.5417661
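With these three bias parameters, the confounding adjustment reported by confounders() can be reproduced by hand. This is a sketch assuming the usual bias-factor formula for a binary confounder with no effect modification; the variable names are illustrative, and males are treated as the confounder-positive level.

```r
# Bias parameters derived from the sex-stratified tables
rr_cd <- (36 / (36 + 645)) / (58 / (58 + 518))  # confounder-outcome RR among unexposed
p1 <- (5 + 33) / (15 + 52)                      # confounder prevalence among exposed
p0 <- (36 + 645) / (94 + 1163)                  # confounder prevalence among unexposed

# Crude RR from the rash by allopurinol table
rr_crude <- (15 / (15 + 52)) / (94 / (94 + 1163))

# RR attributable to confounding, then the adjusted RR
rr_conf <- (rr_cd * p1 + (1 - p1)) / (rr_cd * p0 + (1 - p0))
rr_adj <- rr_crude / rr_conf

print(round(c(crude = rr_crude, rr_conf = rr_conf, adjusted = rr_adj), 7))
```

Because the prevalence of males is similar among exposed and unexposed subjects, the confounding RR is close to 1 and the adjusted RR barely differs from the crude one.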
confounders.ext(RR = 2, bias_parms = c(0.1, 0.9, 0.1, 0.4))
## --Input bias parameters--
##
## RR(Confounder-Disease): 0.1
## OR(Exposure category-Confounder): 0.9
## p(Confounder): 0.1
## p(Exposure): 0.4
## ---
##
##
## Crude RR 1.009328
## Percent bias -49.533590
There are several mechanisms that can produce selection bias:
There are several mechanisms that can produce misclassification bias:
Unlike selection and information bias, which can be introduced by the investigator or by the subjects, confounding is a type of bias that can be adjusted for in the analysis.
Confounding bias can be analyzed by regression, and the association can be adjusted using data from other studies (instrumental variables, Bayesian methods).
Selection bias and misclassification bias can also be adjusted for. Selection bias can be identified from the nonresponse rate (or by comparing the characteristics of respondents and nonrespondents), and misclassification bias can be identified by assessing the validity of the diagnostic test.