Homework 1 for EPI 208

Question 1.1:
The following table displays the relationship between BPMEDS2 and PREVCHD2 at the 1956 exam.

##                      prevchd2
## bpmeds2               1 Prevalent CHD 2 Free of CHD  Sum
##   1 Current use                    20           124  144
##   2 Not currently used            166          4063 4229
##   Sum                             186          4187 4373

Let $ p_1 $ be the prevalence in the current anti-hypertensive users, and $ p_2 $ be the prevalence in the non-users.

What is the prevalence of CHD at the 1956 exam among participants taking anti-hypertension medication?

$p_1 = \frac {20} {144} = 0.1389$.

What is the prevalence of CHD at the 1956 exam among participants not taking anti-hypertension medication?

$p_2 = \frac {166} {4229} = 0.0393$.

What is the value for the Prevalence Odds Ratio?

$\frac {\frac {p_1} {1 - p_1}} {\frac {p_2} {1 - p_2}} = \frac {0.1613} {0.0409} = 3.9477$.
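
The arithmetic above can be checked with a short script. This is only a sketch: the counts are copied from the table, and the helper names `prevalence` and `odds` are made up for illustration.

```python
# Counts copied from the bpmeds2 by prevchd2 table.
users_chd, users_total = 20, 144
nonusers_chd, nonusers_total = 166, 4229


def prevalence(cases, total):
    """Proportion of participants who have the condition."""
    return cases / total


def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


p1 = prevalence(users_chd, users_total)
p2 = prevalence(nonusers_chd, nonusers_total)
prevalence_odds_ratio = odds(p1) / odds(p2)

# Equivalent cross-product form of the odds ratio: (a * d) / (b * c).
cross_product = (20 * 4063) / (124 * 166)

print(round(p1, 4), round(p2, 4), round(prevalence_odds_ratio, 4))
```

Both forms give the same prevalence odds ratio, because the odds in each group equal the ratio of diseased to non-diseased counts.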

In general, what are the reasons why one group of participants might have a higher prevalence of disease than another group in a cross-sectional study?

The association can stem from:

selection bias: If the study is designed in such a way that “unexposed” people WITHOUT disease are preferentially included, the exposed people falsely appear to have a higher prevalence of disease, because the “control” group has a prevalence lower than that of the general population. This is a spurious association (Gordis, Epidemiology, 2010).
confounding: If a factor other than the exposure or disease status is associated with both the exposure and the disease, the exposure and disease appear to be associated even without a causal link between them.
causality: If the exposure does cause the disease, the association is a causal relationship. However, in a cross-sectional study, one cannot distinguish causality from reverse causality. For example, the use of antihypertensives (exposure) is associated with hypertension (disease), but this is because the presence of hypertension (disease) is the reason for the antihypertensive prescription (exposure).

Which reason is the most likely reason for the association in these data?

Here, confounding by the presence of hypertension, which is both the reason for antihypertensive prescription and a risk factor for CHD, is the most likely reason for the association observed in these data.

Question 1.2:
The following table displays the relationship between DIABETES2 at the 1956 exam and mortality (DEATH2) during the 24 years of follow-up. There were no losses to follow-up for the outcome of death (mortality status is obtainable for all subjects through a search of the National Death Index or through the Social Security Death Index).

Table of diabetes2 by death2

##                 death2
## diabetes2        1 Yes 2 No  Sum
##   1 Diabetic        94   27  121
##   2 Not diabetic  1456 2857 4313
##   Sum             1550 2884 4434

The following table displays the total amount of follow-up time until death (for those who died) or until the end of the study (for those who survived).

Analysis Variable:TIMEDTH

##   diabetes2 N.Obs      Sum   Minimum    Maximum
## 1         1   121  1752.48 0.1451061 24.0000000
## 2         2  4313 89363.10 0.0711841 24.0000000

Suppose you wanted to estimate the effect of diabetes on the incidence of DEATH. Which of the following measures of association could you calculate validly from the data given above? Calculate the value for those that you can calculate. State in words the meaning of the value for each of these values.

Let $ p_1 $ be the incidence proportion (risk) and $ r_1 $ be the incidence rate in the diabetics, and $ p_2 $ be the incidence proportion (risk) and $ r_2 $ be the incidence rate in the non-diabetics.

In this case, every participant has a complete observation: vital status at the end of the 24-year follow-up period is known for all participants, with no losses to follow-up. There are also no competing risks, because the outcome of interest is death, i.e., nothing can remove people from being at risk of death. When all participants have the same potential follow-up duration and none are lost, the risk-based measures (risk ratio and odds ratio) and the rate ratio are all valid measures of association.

$p_1 = \frac {94} {121} = 0.7769$, $r_1 = \frac {94} {1752.48} = 0.0536$ cases / person-year, $p_2 = \frac {1456} {4313} = 0.3376$, and $r_2 = \frac {1456} {89363.10} = 0.0163$ cases / person-year.

Risk Ratio:
$\frac {p_1} {p_2} = 2.3012$.
During the 24 years of follow-up, the diabetic participants had 2.3 times the risk of death (130% higher) compared to the non-diabetic participants.
Odds Ratio:
${\frac {p_1} {1 - p_1}} / {\frac {p_2} {1 - p_2}} = 6.8315$.
During the 24 years of follow-up, the diabetic participants had 6.8 times the odds of death compared to the non-diabetic participants.
Rate Ratio:
$\frac {r_1} {r_2} = 3.2921$.
During the 24 years of follow-up, the diabetic participants had 3.3 times the mortality rate (about 230% higher) compared to the non-diabetic participants.
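
As a cross-check, the three measures can be recomputed from the raw counts and person-years. This is a minimal sketch: the inputs are copied from the two tables for this question, and the function names `risk` and `rate` are illustrative.

```python
def risk(events, n):
    """Incidence proportion: events divided by people at risk at baseline."""
    return events / n


def rate(events, person_years):
    """Incidence rate: events divided by total person-time at risk."""
    return events / person_years


# (deaths, participants, person-years) for each group.
diabetic = (94, 121, 1752.48)
non_diabetic = (1456, 4313, 89363.10)

p1, r1 = risk(diabetic[0], diabetic[1]), rate(diabetic[0], diabetic[2])
p2, r2 = risk(non_diabetic[0], non_diabetic[1]), rate(non_diabetic[0], non_diabetic[2])

risk_ratio = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
rate_ratio = r1 / r2

print(round(risk_ratio, 4), round(odds_ratio, 4), round(rate_ratio, 4))
```

The odds ratio is furthest from 1 because death is a common outcome here; it approximates the risk ratio only when the outcome is rare.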

Question 1.3:
The following table displays the relationship between DIABETES2 at the 1956 exam and the incidence of coronary heart disease (CHD2) during the 24 years of follow-up. The table excludes the 194 participants who had a history of Coronary Heart Disease (PRECVED2) at the 1956 exam.

Table of diabetes2 by anychd2

##                 anychd2
## diabetes2        1 Yes 2 No  Sum
##   1 Diabetic        55   54  109
##   2 Not diabetic   991 3140 4131
##   Sum             1046 3194 4240

The following table displays the total amount of follow-up time until the development of Coronary Heart Disease (for those who developed CHD) or until last contact (for those who died for other causes or were lost-to-follow-up) or until the end of the study (for others).

Analysis Variable:TIMECHD

##   diabetes2 N.Obs      Sum   Minimum    Maximum
## 1         1   109  1443.30 0.1451061 24.0000000
## 2         2  4131 79481.85 0.0711841 24.0000000

Suppose you wanted to investigate the effect of diabetes on the incidence of CHD. Which of the following measures of association could you calculate validly from the data given above? Calculate the value for those that you can calculate. State in words the meaning of the value for each of these values.

Let $ p_1 $ be the incidence proportion and $ r_1 $ be the incidence rate in the diabetics, and $ p_2 $ be the incidence proportion and $ r_2 $ be the incidence rate in the non-diabetics.

Risk is not reliable in this case because there are incomplete observations, i.e., participants lost to follow-up. “The measure of risk requires that all of the N people are followed for the entire time during which the risk is being measured” (page 39, Rothman 2012). Also, death from other causes removes a participant from the at-risk population, so it is a competing risk event. Here only the rate measure is valid.
$p_1 = \frac {55} {109} = 0.5046$, $r_1 = \frac {55} {1443.30} = 0.0381$ cases / person-year, $p_2 = \frac {991} {4131} = 0.2399$, and $r_2 = \frac {991} {79481.85} = 0.0125$ cases / person-year.

Risk Ratio:
$\frac {p_1} {p_2} = 2.1034$.
Not valid.

Odds Ratio:
${\frac {p_1} {1 - p_1}} / {\frac {p_2} {1 - p_2}} = 3.2272$.
Not valid. Odds also do not account for varying lengths of follow-up.

Rate Ratio:
$\frac {r_1} {r_2} = 3.0563$.
During follow-up, the diabetic participants had 3.1 times the rate of developing coronary heart disease (about 206% higher) compared to the non-diabetic participants.

Question 2.1:
The following tables display the number of deaths (DEATHS2) and follow-up time (FUTIME1) by treated groups. Which outcome measure, cumulative incidence or incidence rate, would you use to measure incidence of DEATH in the treatment and placebo groups? Why? Based on your answer, calculate the Risk Ratio, Odds Ratio, or the Incidence Rate Ratio from these data, whichever is more appropriate, and describe its meaning in a sentence.

Table of trtmt2 by death2

##              death2
## trtmt2        1 Died 2 Otherwise  Sum
##   1 Treatment   1181        2216 3397
##   2 Placebo     1194        2209 3403
##   Sum           2375        4425 6800

Analysis Variable:futime1

##   trtmt2 N.Obs     Sum   Minimum   Maximum
## 1      1  3397 9898.82 0.0027379 4.8761123
## 2      2  3403 9904.45 0.0027379 4.8459959

In the question, it is stated that “there is a large variability in the potential follow-up times”. This renders risk and odds unreliable, as “the measure of risk requires that all of the N people are followed for the entire time during which the risk is being measured” (page 39, Rothman 2012). Instead, the incidence rate, which can accommodate varying lengths of follow-up, should be used.

Let $ r_1 $ be the incidence rate in the treatment arm, and $ r_2 $ be that in the placebo arm.

$r_1 = 1181 / 9898.82 = 0.1193$ cases / person-year.
$r_2 = 1194 / 9904.45 = 0.1206$ cases / person-year.
$\frac {r_1} {r_2} = 0.9897$ is the incidence rate ratio.

Compared to the placebo arm, the mortality rate in the treatment arm was about 1% lower.

Question 3:
Use the internet to collect data and perform a retrospective cohort study to investigate the relationship between political party (Democrat vs. Republican) and the mortality rate (incidence rate of death) after starting office (until death or 2012 if still alive) for American Presidents who served after 1900. Create and display the data set by completing the following table for these Presidents. Two of the entries are inserted but you will need to fill in the others. Calculate the mortality rate for each party and an appropriate measure of association.

Dataset
Data from Wikipedia.org.

##        President Party YearStarted Death LastFollowup FollowUp
## 1    T.Roosevelt     R        1901     1         1919       18
## 2        WH.Taft     R        1909     1         1930       21
## 3       W.Wilson     D        1913     1         1924       11
## 4     WG.Harding     R        1921     1         1923        2
## 5     C.Coolidge     R        1923     1         1933       10
## 6       H.Hoover     R        1929     1         1964       35
## 7   FD.Roosevelt     D        1933     1         1945       12
## 8      HS.Truman     D        1945     1         1972       27
## 9  DD.Eisenhower     R        1953     1         1969       16
## 10    JF.Kennedy     D        1961     1         1963        2
## 11    LB.Johnson     D        1963     1         1973       10
## 12       R.Nixon     R        1969     1         1994       25
## 13        G.Ford     R        1974     1         2006       32
## 14      J.Carter     D        1977     0         2012       35
## 15      R.Reagan     R        1981     1         2004       23
## 16      GHW.Bush     R        1989     0         2012       23
## 17     B.Clinton     D        1993     0         2012       19
## 18       GW.Bush     R        2001     0         2012       11
## 19       B.Obama     D        2009     0         2012        3

Summary statistics

##   Party number.of.president number.of.death total.futime mortality.rate
## 1     D                   8               5          119        0.04202
## 2     R                  11               9          216        0.04167

The mortality rate ratio (Democrats / Republicans) is 1.0084. In these data, being a Democrat is associated with an approximately 1% higher mortality rate after starting office.
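
The per-party summary can be rebuilt from the row-level data. This is a sketch only: each tuple is (Party, Death, FollowUp) copied from the data set in row order, and the function name `mortality_rate` is illustrative.

```python
# (party, died, follow-up years) for presidents 1 through 19, in table order.
presidents = [
    ("R", 1, 18), ("R", 1, 21), ("D", 1, 11), ("R", 1, 2), ("R", 1, 10),
    ("R", 1, 35), ("D", 1, 12), ("D", 1, 27), ("R", 1, 16), ("D", 1, 2),
    ("D", 1, 10), ("R", 1, 25), ("R", 1, 32), ("D", 0, 35), ("R", 1, 23),
    ("R", 0, 23), ("D", 0, 19), ("R", 0, 11), ("D", 0, 3),
]


def mortality_rate(rows, party):
    """Deaths divided by total person-years of follow-up for one party."""
    deaths = sum(died for p, died, _ in rows if p == party)
    person_years = sum(years for p, _, years in rows if p == party)
    return deaths / person_years


rate_d = mortality_rate(presidents, "D")
rate_r = mortality_rate(presidents, "R")
rate_ratio = rate_d / rate_r

print(round(rate_d, 5), round(rate_r, 5), round(rate_ratio, 4))
```

With only 14 deaths in total, a ratio this close to 1 carries very little information about any real difference between the parties.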

Question 4:

Construct a table showing the relationship between a binary exposure and the development (incidence) of a binary outcome from a prospective cohort study, where you observe the association between these variables. Describe the study in a few sentences. Calculate the appropriate measure of association from these data and explain its meaning in a sentence.

Context: Facebook is becoming a widespread infrastructure. However, it was not clear how people react to my posts. The hypothesis was that women may be more concerned about posts related to security concerns.

Objective: To clarify gender difference in the reaction to my post about security concerns.

Design, Setting, and Patients: Prospective observational cohort study of my Facebook friends.

Exposure: Gender.

Main Outcome Measures: Incidence rate ratio of reaction to my post.

Results: At 1 AM on Friday the 13th, I wrote on Facebook about my concerns regarding the gunshot-like sounds I heard around my apartment in Roxbury. Over the next 24 hours, my Facebook friend cohort (n = 280, female n = 73, male n = 207, mostly of Japanese ethnicity) was observed for reactions to the post, defined as “Likes” or “Comments”. The cohort is an ongoing open cohort; however, there was no loss to follow-up or new enrollment during the study period, enabling risk calculation. Overall, 6720 person-hours of total observation time were obtained (1752 p-h among female, 4968 p-h among male). There were 19 first reactions in total (5 among female and 14 among male).

2 x 2 Table

##         reaction
## gender   Present Absent Sum
##   Female       5     68  73
##   Male        14    193 207
##   Sum         19    261 280

The risks of reaction were 0.0685, 0.0676, and 0.0679 for female, male, and overall participants, respectively. The risk ratio for reactions (female / male) was 1.0127.

Incidence Table

##   Gender N.event total.obs.time incidence.rate
## 1 Female       5           1752       0.002854
## 2   Male      14           4968       0.002818

Converted from the per person-hour rates in the table, the incidence rates were 25.0171 and 24.7029 reactions per person-year for female and male participants, respectively.
The incidence rate ratio for reactions (female / male) was 1.0127.
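
The per person-year figures come from rescaling the per person-hour rates in the incidence table. A minimal sketch of that conversion follows. The 365.25-day year is an assumption, chosen because it reproduces the reported values, and the function name `per_person_year` is illustrative.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: 365.25-day year


def per_person_year(events, person_hours):
    """Incidence rate per person-year from events and person-hours."""
    return events / person_hours * HOURS_PER_YEAR


female_rate = per_person_year(5, 1752)
male_rate = per_person_year(14, 4968)

# The rate ratio is unit-free, so rescaling the time unit leaves it unchanged.
rate_ratio = female_rate / male_rate

print(round(female_rate, 4), round(male_rate, 4), round(rate_ratio, 4))
```

Because every participant contributes the same 24 hours of observation, the rate ratio equals the risk ratio here.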

Some notable Comments were:
“You should buy one, too!”
“My friend was a victim of a lobster robbery there!”
“Much safer than Los Angeles.”

Conclusions: There was a slightly (1.3%) higher rate of reactions to a post about security concerns among female participants. Further studies in different settings are required to clarify the clinical significance of this result. (The abstract format was borrowed from JAMA.)

Appendix

Shown below is my original plan for question 4, which I abandoned after noticing that it was really a cross-sectional study.

Context: Daytime drowsiness is considered a serious epidemic among graduate students.

Objective: To clarify the association between classes taken and daytime drowsiness.

Design, Setting, and Patients: Cross-sectional study of graduate students who happened to be in the BIO 206 class on Friday, 7/13.

Exposure: Type of afternoon class.

Main Outcome Measures: Prevalence of current sleepiness, or feeling of sleep deprivation.

Results: Data were collected from 89 participants among 174 potential participants. How many of the other 85 participants were actually present in the classroom was unknown.

Number of students in each category.

##  GHP532  HPM276  HPM277  HPM530   ID251  Others  RDS286 SHDH201 
##       7      20      23       1       7       7      12      12

Proportion (%) of students sleepy in each category.

##  GHP532  HPM276  HPM277  HPM530   ID251  Others  RDS286 SHDH201 
##   57.14   65.00   39.13    0.00   57.14   42.86   75.00   66.67

Conclusions: As expected, the decision analysis class (RDS 286) was associated with a higher prevalence of sleep deprivation and daytime sleepiness. Whether this reflects a causal effect of the class itself or the enrollment of hard-working participants who were already sleep-deprived must be clarified in further studies.