Question 1.1:
The following table displays the relationship between BPMEDS2 and PREVCHD2 at the 1956 exam.
##                        prevchd2
## bpmeds2                1 Prevalent CHD 2 Free of CHD  Sum
##   1 Current use                     20           124  144
##   2 Not currently used             166          4063 4229
##   Sum                              186          4187 4373
Let \( p_1 \) be the prevalence in the current anti-hypertensive users, and \( p_2 \) be the prevalence in the non-users.
What is the prevalence of CHD at the 1956 exam among participants taking anti-hypertension medication?
$p_1 = \frac {20} {144} = 0.1389$.
What is the prevalence of CHD at the 1956 exam among participants not taking anti-hypertension medication?
$p_2 = \frac {166} {4229} = 0.0393$.
What is the value for the Prevalence Odds Ratio?
$\frac {\frac {p_1} {1 - p_1}} {\frac {p_2} {1 - p_2}} = \frac {0.1613} {0.0409} = 3.9477$.
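The arithmetic above can be checked with a short script. This is only a minimal sketch in Python (the report shows output rather than code, so the language and the helper names `prevalence` and `odds` are illustrative assumptions); the counts are taken from the table.

```python
# Counts from the bpmeds2 x prevchd2 table (1956 exam).
users_chd, users_free = 20, 124          # currently using anti-hypertensives
nonusers_chd, nonusers_free = 166, 4063  # not currently using them


def prevalence(cases, non_cases):
    """Proportion of a group with prevalent disease."""
    return cases / (cases + non_cases)


def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


p1 = prevalence(users_chd, users_free)
p2 = prevalence(nonusers_chd, nonusers_free)
prevalence_odds_ratio = odds(p1) / odds(p2)

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, POR = {prevalence_odds_ratio:.4f}")
```

The odds ratio equals the cross-product of the table cells, (20 × 4063) / (124 × 166), which is a convenient way to confirm the result by hand.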
In general, what are the reasons why one group of participants might have a higher prevalence of disease than another group in a cross-sectional study?
The association can stem from:
Which reason is the most likely reason for the association in these data?
Here, confounding by hypertension is the most likely reason for the association observed in these data: hypertension is both the indication for antihypertensive prescription and a risk factor for CHD.
Question 1.2:
The following table displays the relationship between DIABETES2 at the 1956 exam and mortality (DEATH2) during the 24 years of follow-up. There were no losses to follow-up for the outcome of death (mortality status is obtainable for all subjects through a search of the National Death Index or the Social Security Death Index).
Table of diabetes2 by death2
##                  death2
## diabetes2        1 Yes 2 No  Sum
##   1 Diabetic        94   27  121
##   2 Not diabetic  1456 2857 4313
##   Sum             1550 2884 4434
The following table displays the total amount of follow-up time until death (for those who died) or until the end of the study (for those who survived).
Analysis Variable:TIMEDTH
## diabetes2 N.Obs Sum Minimum Maximum
## 1 1 121 1752.48 0.1451061 24.0000000
## 2 2 4313 89363.10 0.0711841 24.0000000
Suppose you wanted to estimate the effect of diabetes on the incidence of DEATH? Which of the following measures of association could you calculate validly from the data given above? Calculate the value for those that you can calculate. State in words the meaning of the value for each of these values.
Let \( p_1 \) be the incidence proportion (risk) and \( r_1 \) be the incidence rate in the diabetics, and \( p_2 \) be the incidence proportion (risk) and \( r_2 \) be the incidence rate in the non-diabetics.
In this case, every observation is complete: there were no losses to follow-up, so we know whether each participant was dead or alive at the end of the 24-year follow-up period. There are also no competing risks, because the outcome of interest is death itself; nothing can remove a person from being at risk of death. When every participant is followed for the full period or until the event, both risks and rates, and the ratios built from them, are valid measures.
$p_1 = \frac {94} {121} = 0.7769$, $r_1 = \frac {94} {1752.48} = 0.0536$ cases / person-year, $p_2 = \frac {1456} {4313} = 0.3376$, and $r_2 = \frac {1456} {89363.10} = 0.0163$ cases / person-year.
Risk Ratio:
$\frac {p_1} {p_2} = 2.3012$.
During the 24 years of follow-up, the diabetic participants had 2.3 times the risk of death (130% higher) compared with the non-diabetic participants.
Odds Ratio:
${\frac {p_1} {1 - p_1}} / {\frac {p_2} {1 - p_2}} = 6.8315$.
During the 24 years of follow-up, the diabetic participants had 6.8 times the odds of death compared with the non-diabetic participants.
Rate Ratio:
$\frac {r_1} {r_2} = 3.2921$.
During the 24 years of follow-up, the diabetic participants had 3.3 times the mortality rate (230% higher) compared with the non-diabetic participants.
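The three measures for this question can be reproduced from the counts and person-time in the two tables. The following is a hedged Python sketch rather than the code behind the report; the dictionary layout and the function names `risk`, `odds`, and `rate` are assumptions made for illustration.

```python
# Deaths, group sizes, and person-years from the diabetes2 x death2 tables.
groups = {
    "diabetic":     {"deaths": 94,   "n": 121,  "person_years": 1752.48},
    "non_diabetic": {"deaths": 1456, "n": 4313, "person_years": 89363.10},
}


def risk(g):
    """Incidence proportion: deaths divided by people at risk at baseline."""
    return g["deaths"] / g["n"]


def odds(g):
    """Deaths divided by survivors."""
    return g["deaths"] / (g["n"] - g["deaths"])


def rate(g):
    """Incidence rate: deaths divided by accumulated person-years."""
    return g["deaths"] / g["person_years"]


exposed, unexposed = groups["diabetic"], groups["non_diabetic"]
risk_ratio = risk(exposed) / risk(unexposed)
odds_ratio = odds(exposed) / odds(unexposed)
rate_ratio = rate(exposed) / rate(unexposed)

print(f"RR = {risk_ratio:.4f}, OR = {odds_ratio:.4f}, IRR = {rate_ratio:.4f}")
```

Because death is common in both groups over 24 years, the odds ratio lies much further from 1 than the risk ratio, which is why the two should not be read interchangeably here.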
Question 1.3:
The following table displays the relationship between DIABETES2 at the 1956 exam and the incidence of coronary heart disease (CHD2) during the 24 years of follow-up. The table excludes the 194 participants who had a history of coronary heart disease (PREVCHD2) at the 1956 exam.
Table of diabetes2 by anychd2
##                  anychd2
## diabetes2        1 Yes 2 No  Sum
##   1 Diabetic        55   54  109
##   2 Not diabetic   991 3140 4131
##   Sum             1046 3194 4240
The following table displays the total amount of follow-up time until the development of coronary heart disease (for those who developed CHD), until last contact (for those who died from other causes or were lost to follow-up), or until the end of the study (for all others).
Analysis Variable:TIMECHD
## diabetes2 N.Obs Sum Minimum Maximum
## 1 1 109 1443.30 0.1451061 24.0000000
## 2 2 4131 79481.85 0.0711841 24.0000000
Suppose you wanted to investigate the effect of diabetes on the incidence of CHD. Which of the following measures of association could you calculate validly from the data given above? Calculate the value for those that you can calculate. State in words the meaning of the value for each of these values.
Let \( p_1 \) be the incidence proportion and \( r_1 \) be the incidence rate in the diabetics, and \( p_2 \) be the incidence proportion and \( r_2 \) be the incidence rate in the non-diabetics.
Risk is not reliable in this case because there are incomplete observations, i.e., patients lost to follow-up. “The measure of risk requires that all of the N people are followed for the entire time during which the risk is being measured” (page 39, Rothman 2012). In addition, death from other causes removes a patient from the at-risk population, so it is a competing risk. Here only the rate measure is valid.
$p_1 = \frac {55} {109} = 0.5046$, $r_1 = \frac {55} {1443.30} = 0.0381$ cases / person-year, $p_2 = \frac {991} {4131} = 0.2399$, and $r_2 = \frac {991} {79481.85} = 0.0125$ cases / person-year.
Risk Ratio:
$\frac {p_1} {p_2} = 2.1034$.
Not valid.
Odds Ratio:
${\frac {p_1} {1 - p_1}} / {\frac {p_2} {1 - p_2}} = 3.2272$.
Not valid. Odds also do not account for varying lengths of follow-up.
Rate Ratio:
$\frac {r_1} {r_2} = 3.0563$.
The diabetic participants had 3.1 times the rate of developing coronary heart disease (about 206% higher) compared with the non-diabetic participants.
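As a check on the arithmetic, the rate ratio can be written directly in terms of the case counts and person-years from the two tables. Working from the unrounded rates matters: dividing the rounded values 0.0381 by 0.0125 would give about 3.05 instead.

$\frac {r_1} {r_2} = \frac {55 / 1443.30} {991 / 79481.85} = \frac {55 \times 79481.85} {991 \times 1443.30} \approx 3.0563$.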
Question 2.1:
The following tables display the number of deaths (DEATHS2) and follow-up time (FUTIME1) by treatment group. Which outcome measure, cumulative incidence or incidence rate, would you use to measure incidence of DEATH in the treatment and placebo groups? Why? Based on your answer, calculate the Risk Ratio, Odds Ratio, or the Incidence Rate Ratio from these data, whichever is more appropriate, and describe its meaning in a sentence.
Table of trtmt2 by death2
##               death2
## trtmt2        1 Died 2 Otherwise  Sum
##   1 Treatment   1181        2216 3397
##   2 Placebo     1194        2209 3403
##   Sum           2375        4425 6800
Analysis Variable:futime1
## trtmt2 N.Obs Sum Minimum Maximum
## 1 1 3397 9898.82 0.0027379 4.8761123
## 2 2 3403 9904.45 0.0027379 4.8459959
In the question, it is stated that “there is a large variability in the potential follow-up times”. This renders risk and odds unreliable, as “the measure of risk requires that all of the N people are followed for the entire time during which the risk is being measured” (page 39, Rothman 2012). Instead, the incidence rate, which can accommodate varying lengths of follow-up, should be used.
Let \( r_1 \) be the incidence rate in the treatment arm, and \( r_2 \) be that in the placebo arm.
$r_1 = 1181 / 9898.82 = 0.1193$ cases / person-year.
$r_2 = 1194 / 9904.45 = 0.1206$ cases / person-year.
$\frac {r_1} {r_2} = 0.9897$ is the incidence rate ratio.
Compared with the placebo arm, the mortality rate in the treatment arm was about 1% lower.
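The step from the incidence rate ratio to the "about 1%" statement is a simple transformation, (IRR − 1) × 100. A small Python sketch of it follows; the variable names are illustrative assumptions, and the inputs are the deaths and person-years from the tables above.

```python
# Deaths and person-years by trial arm, from the trtmt2 tables.
deaths = {"treatment": 1181, "placebo": 1194}
person_years = {"treatment": 9898.82, "placebo": 9904.45}

# Incidence rate in each arm (deaths per person-year).
rates = {arm: deaths[arm] / person_years[arm] for arm in deaths}

# Incidence rate ratio, treatment relative to placebo.
irr = rates["treatment"] / rates["placebo"]

# Relative difference in percent; a negative value means a lower rate.
percent_change = (irr - 1) * 100

print(f"IRR = {irr:.4f}, relative change = {percent_change:.1f}%")
```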
Question 3:
Use the internet to collect data and perform a retrospective cohort study to investigate the relationship between political party (Democrat vs. Republican) and the mortality rate (incidence rate of death) after starting office (until death or 2012 if still alive) for American Presidents who served after 1900. Create and display the data set by completing the following table for these Presidents. Two of the entries are inserted but you will need to fill in the others. Calculate the mortality rate for each party and an appropriate measure of association.
Dataset
Data from Wikipedia.org.
## President Party YearStarted Death LastFollowup FollowUp
## 1 T.Roosevelt R 1901 1 1919 18
## 2 WH.Taft R 1909 1 1930 21
## 3 W.Wilson D 1913 1 1924 11
## 4 WG.Harding R 1921 1 1923 2
## 5 C.Coolidge R 1923 1 1933 10
## 6 H.Hoover R 1929 1 1964 35
## 7 FD.Roosevelt D 1933 1 1945 12
## 8 HS.Truman D 1945 1 1972 27
## 9 DD.Eisenhower R 1953 1 1969 16
## 10 JF.Kennedy D 1961 1 1963 2
## 11 LB.Johnson D 1963 1 1973 10
## 12 R.Nixon R 1969 1 1994 25
## 13 G.Ford R 1974 1 2006 32
## 14 J.Carter D 1977 0 2012 35
## 15 R.Reagan R 1981 1 2004 23
## 16 GHW.Bush R 1989 0 2012 23
## 17 B.Clinton D 1993 0 2012 19
## 18 GW.Bush R 2001 0 2012 11
## 19 B.Obama D 2009 0 2012 3
Summary statistics
## Party number.of.president number.of.death total.futime mortality.rate
## 1 D 8 5 119 0.04202
## 2 R 11 9 216 0.04167
The mortality rate ratio (Democrats / Republicans) is 1.0084. In these data, being a Democrat is associated with an approximately 1% higher mortality rate after starting office.
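The summary statistics can be rebuilt from the data set by summing deaths and follow-up years within each party. The sketch below is in Python and is not the code that produced the tables above; the tuple layout and variable names are assumptions, while the values are copied from the data set.

```python
from collections import defaultdict

# (president, party, year_started, died, last_followup), copied from the table.
presidents = [
    ("T.Roosevelt", "R", 1901, 1, 1919), ("WH.Taft", "R", 1909, 1, 1930),
    ("W.Wilson", "D", 1913, 1, 1924), ("WG.Harding", "R", 1921, 1, 1923),
    ("C.Coolidge", "R", 1923, 1, 1933), ("H.Hoover", "R", 1929, 1, 1964),
    ("FD.Roosevelt", "D", 1933, 1, 1945), ("HS.Truman", "D", 1945, 1, 1972),
    ("DD.Eisenhower", "R", 1953, 1, 1969), ("JF.Kennedy", "D", 1961, 1, 1963),
    ("LB.Johnson", "D", 1963, 1, 1973), ("R.Nixon", "R", 1969, 1, 1994),
    ("G.Ford", "R", 1974, 1, 2006), ("J.Carter", "D", 1977, 0, 2012),
    ("R.Reagan", "R", 1981, 1, 2004), ("GHW.Bush", "R", 1989, 0, 2012),
    ("B.Clinton", "D", 1993, 0, 2012), ("GW.Bush", "R", 2001, 0, 2012),
    ("B.Obama", "D", 2009, 0, 2012),
]

# Accumulate counts, deaths, and person-years of follow-up per party.
totals = defaultdict(lambda: {"n": 0, "deaths": 0, "person_years": 0})
for name, party, started, died, last_followup in presidents:
    totals[party]["n"] += 1
    totals[party]["deaths"] += died
    totals[party]["person_years"] += last_followup - started

# Mortality rate per party (deaths per person-year) and their ratio.
rates = {party: t["deaths"] / t["person_years"] for party, t in totals.items()}
mortality_rate_ratio = rates["D"] / rates["R"]

for party in sorted(totals):
    print(party, totals[party], round(rates[party], 5))
print(f"Rate ratio (D / R) = {mortality_rate_ratio:.4f}")
```

With only 14 deaths in total, a ratio this close to 1 is best read as showing no evident difference between the parties.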
Question 4:
Construct a table showing the relationship between a binary exposure and the development (incidence) of a binary outcome from a prospective cohort study, where you observe the association between these variables. Describe the study in a few sentences. Calculate the appropriate measure of association from these data and explain its meaning in a sentence.
Context: Facebook is becoming a widespread infrastructure. However, it was not clear how people react to my posts. The hypothesis was that women may be more concerned about posts related to security.
Objective: To clarify gender difference in the reaction to my post about security concerns.
Design, Setting, and Patients: Prospective observational cohort study of my Facebook friends.
Exposure: Gender.
Main Outcome Measures: Incidence rate ratio of reaction to my post.
Results: At 1 AM on Friday the 13th, I wrote on Facebook about my concerns regarding the gunshot-like sounds I heard around my apartment in Roxbury. Over the next 24 hours, my Facebook friend cohort (n = 280, female n = 73, male n = 207, mostly Japanese ethnicity) was observed for reactions to the post, defined as “Likes” or “Comments”. The cohort is an ongoing open cohort; however, there was no loss to follow-up or new enrollment during the study period, enabling risk calculation. Overall, 6720 person-hours of observation time were obtained (1752 p-h among female, 4968 p-h among male). There were 19 first reactions in total (5 among female and 14 among male).
2 x 2 Table
##          reaction
## gender   Present Absent Sum
##   Female       5     68  73
##   Male        14    193 207
##   Sum         19    261 280
The risks for reactions were 0.0685, 0.0676, 0.0679 for female, male, and overall participants, respectively. Risk ratio for reactions was: 1.0127.
Incidence Table
## Gender N.event total.obs.time incidence.rate
## 1 Female 5 1752 0.002854
## 2 Male 14 4968 0.002818
The incidence rates in the table are per person-hour. Converted at 8766 hours per year, they correspond to 25.0171 and 24.7029 reactions per person-year for female and male participants, respectively.
Incidence rate ratio for reactions was: 1.0127.
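The risk and rate calculations for this small cohort, including the conversion from person-hours to person-years, can be sketched as follows. This is an illustrative Python example, not the original analysis code; the variable names are assumptions, and the factor of 365.25 × 24 = 8766 hours per year is inferred from the reported per-year figures rather than stated in the text.

```python
# Assumed conversion factor: 365.25 days x 24 hours (matches the reported rates).
HOURS_PER_YEAR = 365.25 * 24
FOLLOW_UP_HOURS = 24  # every friend was observed for the same 24-hour window

events = {"female": 5, "male": 14}
n = {"female": 73, "male": 207}

# Person-time, risk, and rate for each group.
person_hours = {g: n[g] * FOLLOW_UP_HOURS for g in n}
risk = {g: events[g] / n[g] for g in n}
rate_per_hour = {g: events[g] / person_hours[g] for g in n}
rate_per_year = {g: rate_per_hour[g] * HOURS_PER_YEAR for g in n}

risk_ratio = risk["female"] / risk["male"]
rate_ratio = rate_per_hour["female"] / rate_per_hour["male"]

print(person_hours)
print({g: round(r, 4) for g, r in rate_per_year.items()})
print(f"risk ratio = {risk_ratio:.4f}, rate ratio = {rate_ratio:.4f}")
```

The risk ratio and the rate ratio coincide here because each participant was credited with the same 24 hours of observation, so the person-time denominators are proportional to the group sizes.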
Some notable Comments were:
“You should buy one, too!”
“My friend was a victim of a lobster robbery there!”
“Much safer than Los Angeles.”
Conclusions: There was a slightly (1.3%) higher rate of reactions to a post about security concerns among female participants. Further studies in different settings are required to clarify the clinical significance of this result. (The abstract format was borrowed from JAMA.)
Appendix
Shown below was my original plan for question 4, until I noticed it really was a cross sectional study, and abandoned the plan.
Context: Daytime drowsiness is considered a serious epidemic among graduate students.
Objective: To clarify the association between classes taken and daytime drowsiness.
Design, Setting, and Patients: Cross sectional study of graduate students who happen to be in the BIO 206 class on 7/13 Friday.
Exposure: Type of afternoon class.
Main Outcome Measures: Prevalence of current sleepiness, or feeling of sleep deprivation.
Results: Data were collected from 89 participants among 174 potential participants. How many of the other 85 participants were actually present in the classroom was unknown.
Number of students in each category.
## GHP532 HPM276 HPM277 HPM530 ID251 Others RDS286 SHDH201
## 7 20 23 1 7 7 12 12
Proportion (%) of students sleepy in each category.
## GHP532 HPM276 HPM277 HPM530 ID251 Others RDS286 SHDH201
## 57.14 65.00 39.13 0.00 57.14 42.86 75.00 66.67
Conclusions: As expected, the decision analysis class (RDS 286) was associated with a higher prevalence of sleep deprivation and daytime sleepiness. Whether this reflects a causal effect of the class itself, or the enrollment of hard-working participants who were already sleep-deprived, must be clarified in further studies.