An observational study is a scientific investigation in which neither the subjects under study nor any of the variables of interest are manipulated in any way. The simplest form of observational study is one in which there are only two variables of interest. One of the variables is called the risk factor, or independent variable, and the other variable is referred to as the outcome, or dependent variable.
DEFINITION
The term risk factor is used to designate a variable that is thought to be related to some outcome variable. The risk factor may be a suspected cause of some specific state of the outcome variable. For example, the outcome variable might be subjects’ status relative to cancer, and the risk factor might be their status with respect to cigarette smoking. The model is further simplified if the variables are categorical with only two categories per variable. For the outcome variable, the categories might be cancer present and cancer absent. With respect to the risk factor, subjects might be categorized as smokers and nonsmokers.
Types of Observational Studies
There are two basic types of observational studies: prospective studies and retrospective studies.
A prospective study is an observational study in which two random samples of subjects are selected. One sample consists of subjects who possess the risk factor, and the other sample consists of subjects who do not possess the risk factor. The subjects are followed into the future (that is, they are followed prospectively), and a record is kept on the number of subjects in each sample who, at some point in time, are classifiable into each of the categories of the outcome variable.
The data resulting from a prospective study involving two dichotomous variables can be displayed in a 2 × 2 contingency table. The table usually shows the number of subjects with and without the risk factor, the number who did and did not succumb to the disease of interest, and the frequencies for each combination of categories of the two variables.
                      Disease Status
Risk Factor     Present     Absent     Total at risk
Present            a           b           a + b
Absent             c           d           c + d
Total            a + c       b + d           n
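As a rough illustration of how such a table is filled in, the sketch below tallies per-subject records into the four cells. The records are made-up values and the variable names are assumptions for this example only. Cell a is taken to be the count of subjects who have both the risk factor and the disease, which matches the risk a/(a+b) used for subjects with the risk factor.

```python
from collections import Counter

# Hypothetical per-subject records: (has_risk_factor, has_disease).
# These values are invented purely to illustrate the tallying step.
records = [
    (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]

counts = Counter(records)
a = counts[(True, True)]    # risk factor present, disease present
b = counts[(True, False)]   # risk factor present, disease absent
c = counts[(False, True)]   # risk factor absent, disease present
d = counts[(False, False)]  # risk factor absent, disease absent
n = a + b + c + d           # total number of subjects

print(f"Risk factor present: a={a}, b={b}, total at risk={a + b}")
print(f"Risk factor absent:  c={c}, d={d}, total at risk={c + d}")
print(f"Column totals: {a + c}, {b + d}; n={n}")
```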
A retrospective study is the reverse of a prospective study. The samples are selected from those falling into the categories of the outcome variable. The investigator then looks back (that is, takes a retrospective look) at the subjects and determines which ones have (or had) and which ones do not have (or did not have) the risk factor.
From the data of a retrospective study we may also construct a contingency table with frequencies similar to those that are possible for the data of a prospective study.
Relative Risk
The risk of developing the disease among subjects with the risk factor is a/(a+b). The risk of developing the disease among subjects without the risk factor is c/(c+d).
Definition: Relative risk is the ratio of the risk of developing a disease among subjects with the risk factor to the risk of developing the disease among subjects without the risk factor. It can be calculated as the incidence proportion in the exposed (for example, smokers) divided by the incidence proportion in the unexposed (nonsmokers): RR = [a/(a+b)] / [c/(c+d)], using the cells of the 2 × 2 table above.
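The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: the function name relative_risk is hypothetical, the counts are the ones from the example table in this section (64, 342, 68, 3496), and the confidence interval uses the common log-scale Wald approximation, which is an assumption about how such limits are usually obtained.

```python
import math

def relative_risk(a, b, c, d, z=1.959964):
    """Return (RR, lower, upper) for a 2x2 table with cells a, b, c, d.

    a, b: exposed subjects with / without the disease
    c, d: unexposed subjects with / without the disease
    z:    normal quantile for the interval (1.959964 gives 95%)
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) under the log-scale Wald approximation.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, rr_lower, rr_upper = relative_risk(64, 342, 68, 3496)
print(f"RR = {rr:.2f} (95% CI {rr_lower:.2f} to {rr_upper:.2f})")
```

With these counts the sketch gives a relative risk of about 8.26 with limits of roughly 5.97 and 11.44, which agrees with the "Inc risk ratio" line in the output shown in this section.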
Odds Ratio
The odds for success are the ratio of the probability of success to the probability of failure.
The odds ratio is a measure of association used in study designs that deal with prevalent cases of disease (case-control, cross-sectional). It is calculated as ad/(bc) from the standard 2 × 2 table shown above.
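To connect the definition of odds with the ad/(bc) shortcut, the sketch below computes the odds of disease in each exposure group and then takes their ratio. The function name odds_ratio is hypothetical, the counts are those of the example table in this section, and the interval is the usual log-scale Wald approximation, assumed here for illustration.

```python
import math

def odds_ratio(a, b, c, d, z=1.959964):
    """Return (OR, lower, upper) for a 2x2 table with cells a, b, c, d."""
    odds_exposed = a / b      # odds of disease among the exposed
    odds_unexposed = c / d    # odds of disease among the unexposed
    # The ratio of the two odds is algebraically equal to (a*d)/(b*c).
    estimate = odds_exposed / odds_unexposed
    # Standard error of ln(OR) under the log-scale Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper

or_est, or_lower, or_upper = odds_ratio(64, 342, 68, 3496)
print(f"OR = {or_est:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```

For these counts the result is about 9.62 with limits of roughly 6.72 and 13.78, matching the "Inc odds ratio" line in the output shown in this section. Because the disease is fairly uncommon in both groups, the odds ratio is close to, but somewhat larger than, the relative risk.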
             Outcome+    Outcome-      Total                Inc risk *
Exposure+          64         342        406    15.76 (12.36 to 19.68)
Exposure-          68        3496       3564       1.91 (1.48 to 2.41)
Total             132        3838       3970       3.32 (2.79 to 3.93)

Point estimates and 95% CIs:
-------------------------------------------------------------------
Inc risk ratio                                 8.26 (5.97, 11.44)
Inc odds ratio                                 9.62 (6.72, 13.78)
Attrib risk in the exposed *                   13.86 (10.28, 17.43)
Attrib fraction in the exposed (%)             87.90 (83.23, 91.23)
Attrib risk in the population *                1.42 (0.70, 2.13)
Attrib fraction in the population (%)          42.62 (38.62, 46.77)
-------------------------------------------------------------------
Uncorrected chi2 test that OR = 1: chi2(1) = 217.683 Pr>chi2 = <0.001
Fisher exact test that OR = 1: Pr>chi2 = <0.001
 Wald confidence limits
 CI: confidence interval
 * Outcomes per 100 population units
Screening and Diagnostic Testing
Screening versus Diagnostic Testing
Screening refers to testing an asymptomatic population for a particular condition in order to identify those who have the condition so that they can be treated early.
Diagnostic testing, on the other hand, is performed on a patient who is symptomatic in order to determine what condition they have.
Accuracy of Screening and Diagnostic Tests
Four test characteristics are used to quantify how accurate a particular test is: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
Sensitivity and Specificity
Sensitivity (Sn) is the probability that a patient tests positive given that they have the disease. Specificity (Sp) is the probability that a patient tests negative given that they do not have the disease.
Positive and Negative Predictive Values
The positive predictive value is the probability that a patient actually has the disease given that they tested positive. The negative predictive value is the probability that a patient does not have the disease given that they tested negative.
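All four characteristics can be read off a 2 × 2 table of test result against true disease status. The sketch below is a minimal illustration: the function name diagnostic_accuracy and the argument names tp, fp, fn, tn (true positives, false positives, false negatives, true negatives) are assumptions introduced here, and the counts are those of the epi.tests table in this section.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp: test positive, disease present    fp: test positive, disease absent
    fn: test negative, disease present    tn: test negative, disease absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # P(test + | disease)
        "specificity": tn / (tn + fp),  # P(test - | no disease)
        "ppv": tp / (tp + fp),          # P(disease | test +)
        "npv": tn / (tn + fn),          # P(no disease | test -)
    }

accuracy = diagnostic_accuracy(tp=64, fp=342, fn=68, tn=3496)
for name, value in accuracy.items():
    print(f"{name}: {value:.2f}")
```

With these counts the sketch gives a sensitivity of about 0.48, a specificity of about 0.91, a PPV of about 0.16, and an NPV of about 0.98, in line with the epi.tests output shown later in this section.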
PPV and NPV are used to interpret test results once those results are known.
For example, if a patient tests positive for tuberculosis (TB) and you know that the prevalence of TB in the population from which the patient comes is 10%, and the PPV given a 10% prevalence is 52.6%, then the interpretation of that test result is: there is a 52.6% chance that this patient has TB.
In general, for a test with fixed sensitivity and specificity, PPV will decrease and NPV will increase as prevalence decreases.
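The dependence on prevalence follows from Bayes' theorem and can be sketched as below. The sensitivity of 0.90 and specificity of 0.91 are assumed values chosen only for illustration; they are not stated in the text, although with them the PPV at 10% prevalence works out to about 52.6%. The function name predictive_values is also hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Assumed test characteristics, for illustration only.
SENSITIVITY = 0.90
SPECIFICITY = 0.91

results = {}
for prevalence in (0.50, 0.10, 0.01):
    results[prevalence] = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the PPV falls from about 0.91 at 50% prevalence to about 0.09 at 1% prevalence, while the NPV rises from about 0.90 to above 0.99.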
epi.tests(Table, conf.level = 0.95)

             Outcome +    Outcome -      Total
Test +              64          342        406
Test -              68         3496       3564
Total              132         3838       3970

Point estimates and 95% CIs:
--------------------------------------------------------------
Apparent prevalence *                  0.10 (0.09, 0.11)
True prevalence *                      0.03 (0.03, 0.04)
Sensitivity *                          0.48 (0.40, 0.57)
Specificity *                          0.91 (0.90, 0.92)
Positive predictive value *            0.16 (0.12, 0.20)
Negative predictive value *            0.98 (0.98, 0.99)
Positive likelihood ratio              5.44 (4.44, 6.66)
Negative likelihood ratio              0.57 (0.48, 0.67)
False T+ proportion for true D- *      0.09 (0.08, 0.10)
False T- proportion for true D+ *      0.52 (0.43, 0.60)
False T+ proportion for T+ *           0.84 (0.80, 0.88)
False T- proportion for T- *           0.02 (0.01, 0.02)
Correctly classified proportion *      0.90 (0.89, 0.91)
--------------------------------------------------------------
* Exact CIs