Epidemiology 208: Homework #2

Due: Friday, July 27, 2012

Question #1:

exp.dis.tab <- function(a,b,c,d, dimnames = list(exposure = c("Exposed", "Non-Exposed"),
                                                 disease = c("Disease+", "Disease-"))) {
    mat <- matrix(c(a,b,c,d), ncol = 2, byrow = TRUE, dimnames = dimnames)
    mat
}

Question 1.1: What is the value for the crude Odds Ratio for death, comparing the risk of death for diabetic subjects (exposed group) to non-diabetics? (Hint: You did this in Homework #1)

Table 1.2

tab1.2 <-
exp.dis.tab(94,27,1456,2857,
            list(diabetes2 = c("1 Diabetic", "2 Not diabetic"),
                 death2    = c("1 Death","2 No Death")
                )
           )
addmargins(tab1.2)

                death2
diabetes2        1 Death 2 No Death  Sum
  1 Diabetic          94         27  121
  2 Not diabetic    1456       2857 4313
  Sum               1550       2884 4434

The crude odds ratio can be calculated from Table 1.2. By cross product method, it is:

(94 * 2857) / (27 * 1456) = 6.8315.

Question 1.2: Using the Mantel-Haenszel weighting formula, what is the value for the age-adjusted Odds Ratio for death, comparing the risk of death for diabetics (exposed group) to non-diabetics?

## Data in array-format
tab1.3.array <- array(c( 2, 90, 2, 660,
                        16,360,13,1250,
                        41,557, 7, 774,
                        35,449, 5, 173),
                      dim = c(2,2,4),
                      list(diabetes2 = c("1 Diabetic", "2 Not diabetic"),
                           death2    = c("1 Yes", "2 No"),
                           agecat    = c(1:4)))
## Matrix element vectors
ai <- tab1.3.array[1,1,]
bi <- tab1.3.array[1,2,]
ci <- tab1.3.array[2,1,]
di <- tab1.3.array[2,2,]

## Create list for displaying
tab1.3.list <- alply(.data = tab1.3.array, .margins = 3, .fun = invisible)
names(tab1.3.list) <- c("agecat1","agecat2","agecat3","agecat4")
lapply(tab1.3.list, addmargins)

$agecat1
                death2
diabetes2        1 Yes 2 No Sum
  1 Diabetic         2    2   4
  2 Not diabetic    90  660 750
  Sum               92  662 754

$agecat2
                death2
diabetes2        1 Yes 2 No  Sum
  1 Diabetic        16   13   29
  2 Not diabetic   360 1250 1610
  Sum              376 1263 1639

$agecat3
                death2
diabetes2        1 Yes 2 No  Sum
  1 Diabetic        41    7   48
  2 Not diabetic   557  774 1331
  Sum              598  781 1379

$agecat4
                death2
diabetes2        1 Yes 2 No Sum
  1 Diabetic        35    5  40
  2 Not diabetic   449  173 622
  Sum              484  178 662

The odds ratios for the age category strata are: 7.3333, 4.2735, 8.139, 2.6971 (agecat = 1, 2, 3, 4, respectively).

From page 187 of Rothman's Epidemiology: An Introduction.

\[ OR_{MH} = \frac {\sum_i \frac {a_i d_i} {T_i}} {\sum_i \frac {b_i c_i} {T_i}} \]

Calculation for the odds ratio with Mantel-Haenszel weighting using R.

ai <- c(  2,   16,  41,  35)            # Exposed, Disease+ cells
bi <- c(  2,   13,   7,   5)            # Exposed, Disease- cells
ci <- c( 90,  360, 557, 449)            # Non-Exposed, Disease+ cells
di <- c(660, 1250, 774, 173)            # Non-Exposed, Disease- cells

Ti <- (ai + bi + ci + di)               # 1. Total for each level separately
numerator   <- sum((ai * di) / Ti)      # 2. (ai * di) / Ti for each level, then sum
denominator <- sum((bi * ci) / Ti)      # 3. (bi * ci) / Ti for each level, then sum
OR.mh <- numerator / denominator        # 4. Divide 2 by 3

\( OR_{MH} \) = 4.9515 is the answer.

Visualization with a meta-analysis package.

epi.mh(ev.trt = ai, n.trt = ai + bi, ev.ctrl = ci, n.ctrl = ci + di, names, method = "odds.ratio")

meta.res <-
        meta::metabin(event.e = ai, n.e = ai + bi,
                      event.c = ci, n.c = ci + di,
                      sm = "OR", studlab = paste("agecat", c(1:4), sep = ""),
                      method="MH",
                      label.e="Exposed", label.c="Non-Exposed", outclab = "Death",
                      ## label.left="LEFT", label.right="RIGHT",
                      comb.fixed = TRUE, comb.random = FALSE)
meta.res
meta::forest.meta(meta.res)

plot of chunk unnamed-chunk-6

Question 1.3: Is age a confounder in these data? Why or why not

It is a counfounder.

As age can be associated with both diabetes (exposure) and death (outcome), it is a potential confounder. Also the crude OR and \( OR_{MH} \) are different indicating presence of confounding. In fact, the age composition of the diabetics (exposed) is different from that of the non-diabetics because there are more older people in the diabetic population, at least partly contributing to the higher mortality in the diabetic group.

Question 1.4: Is age an effect-modifier in these data? Why or why not

It is an effect measure modifier.

The ORs are different by more than 20% across the age strata, indicating different effect of exposure (being diabetic) on the outcome (death) in these strata, at least in the ratio scale. Thus, the effect measure (odds ratio here) is modified by the agecat variable, thus age is an effect measure modifier.

Question # 2

Question 2.1: Complete the following crude and sex-specific tables showing data that are consistent with the results given above:

From finding 1, there are 500 exposed and 500 non-exposed.
From finding 2, there are 400 exposed men and 100 non-exposed men. Thus, there are 100 exposed women and 500 non-exposed women.
From finding 3, Risk(women non- exposed) = 0.1. Thus, there are 10 diseased and 90 non- diseased among the non- exposed women.
From finding 4, Risk(men exposed) = 3 * Risk(women exposed).
From finding 5, Risk(men non- exposed) = 3 * Risk(women non- exposed) = 3 * 0.1 (finding 3) = 0.3. Thus, there are 30 diseased and 70 non- diseased among the non- exposed men.
From finding 6, Risk(men exposed) = 2 * Risk(men non- exposed) = 2 * 0.3 = 0.6. Thus, there are 240 diseased and 160 non- diseased among the exposed men.
From finding 7, Risk(women exposed) = 2 * Risk(women non- exposed) = 2 * 0.1 = 0.2. Thus, there are 20 diseased and 80 non-diseased among the exposed women.
From these calculations, the tables below were constructed.

tab.q2 <- function(a,b,c,d) {
    mat <- matrix(c(a,b,c,d), ncol = 2, byrow = TRUE,
                  dimnames = list(Exposure = c("Exposed", "Non-Exposed"),
                                  Disease  = c("Disease+", "Disease-")))
    mat
}

Crude Table

addmargins(tab.q2(260,240,70,430))

             Disease
Exposure      Disease+ Disease-  Sum
  Exposed          260      240  500
  Non-Exposed       70      430  500
  Sum              330      670 1000

Sex-specific tables:
Males

addmargins(tab.q2(240,160,30,70))

             Disease
Exposure      Disease+ Disease- Sum
  Exposed          240      160 400
  Non-Exposed       30       70 100
  Sum              270      230 500

Females

addmargins(tab.q2(20,80,40,360))

             Disease
Exposure      Disease+ Disease- Sum
  Exposed           20       80 100
  Non-Exposed       40      360 400
  Sum               60      440 500

Question 2.2: Is sex a confounder? Why or why not?

It is a confounder.

The crude risk ratio is (260 / 500) / (70 / 500) = 3.7143.
The adjusted risk ratio is 2.0 as shown below.
The difference between these suggest there is a confounding by the stratification variable, sex.

The adjusted risk ratio by the Mantel-Haenszel method is defined as:

\[ RR_{MH} = \frac {\sum_i \frac {a_i (c_i + d_i)} {T_i}} {\sum_i \frac {c_i (a_i + b_i)} {T_i}} \]

Calculation for the \( RR_{MH} \).

ai <- c(240,  20)       # Exposed, Disease+ cells
bi <- c(160,  80)       # Exposed, Disease- cells
ci <- c( 30,  40)       # Non-Exposed, Disease+ cells
di <- c( 70, 360)       # Non-Exposed, Disease- cells
Ti <- ai + bi + ci + di # Total cells

numerator   <- sum(ai * (ci + di) / Ti)
denominator <- sum(ci * (ai + bi) / Ti)

RR.mh <- numerator / denominator
cat("RR.mh =", RR.mh, "\n")

RR.mh = 2

Visualization of \( RR_{male}, RR_{female}, RR_{MH} \).

epi.mh(ev.trt = ai, n.trt = ai + bi, ev.ctrl = ci, n.ctrl = ci + di, names, method = "risk.ratio")

meta.res <-
        meta::metabin(event.e = ai, n.e = ai + bi,
                      event.c = ci, n.c = ci + di,
                      sm = "RR", studlab = c("Male","Female"),
                      method="MH",
                      label.e="Exposed", label.c="Non-Exposed", outclab = "Death",
                      comb.fixed = TRUE, comb.random = FALSE)
meta.res
meta::forest.meta(meta.res)

plot of chunk unnamed-chunk-12

There are more exposed males than exposed females, and also the baseline risk of disease among non-exposed is higher among the males. Therefore, sex is associated with both exposure status and outcome, and it is a definition of a confounder. By having more high-risk (mainly male) exposed people and low-risk (mainly female) non-exposed people, the risk ratio between these two groups became larger in the overall sample.

Question 2.3: Is sex an effect modifier of the Risk Ratio? Why or not?

It is not an effect (measure) modifier on the risk ratio scale.

Effect (measure) modification is present in a situation in which a measure of effect changes over value of some other variable (page 199, Rothman's Epidemiology: An Introduction, 2012). In this case, risk ratios are same across the gender strata (2.0 in both groups as shown in the diagram above), meaning sex does not modify the effect of exposure on the risk ratio scale.
In men, it is 0.6 / 0.3 = 2.
In women, it is 0.2 / 0.1 = 2.
However, the risk difference in the men (0.6 - 0.3 = 0.3) is greater than that of the women (0.2 - 0.1 = 0.1), indicating effect modification in the risk difference scale.

Visualization of \( RD_{male}, RD_{female}, RD_{MH} \).

meta.res <-
        meta::metabin(event.e = ai, n.e = ai + bi,
                      event.c = ci, n.c = ci + di,
                      sm = "RD", studlab = c("Male","Female"),
                      method="MH",
                      label.e="Exposed", label.c="Non-Exposed", outclab = "Death",
                      ## label.left="LEFT", label.right="RIGHT",
                      comb.fixed = TRUE, comb.random = FALSE)
meta.res
meta::forest.meta(meta.res)

plot of chunk unnamed-chunk-13

Question 2.4: Change the statements for one or more of the seven “findings” listed in the original table to reverse your answer to Question 2.2. (You do not need to change the answers to Question 2.1 but only write the new “finding(s)” in the space that follows).

Finding 2:
80% of exposed subjects were males \( \dashrightarrow \) 50% of the exposed subjects were males
20% of the non-exposed subjects were males \( \dashrightarrow \) 50% of the non-exposed subjects were males

In this case, the overall risk ratio is (200/500) / (100/500) = 0.4 / 0.2 = 2, and it is also 2 in each stratum.
Males: (150/250) / (75/250) = 0.6 / 0.3 = 2.0
Females: (50/250) / (25/250) = 0.2 / 0.1 = 2.0
\( RR_{MH} \) = 2.0

Visualization of \( RR_{male}, RR_{female}, RR_{MH} \).

ai <- c(150,  50)       # Exposed, Disease+ cells
bi <- c(100, 200)       # Exposed, Disease- cells
ci <- c( 75,  25)       # Non-Exposed, Disease+ cells
di <- c(175, 225)       # Non-Exposed, Disease- cells
Ti <- ai + bi + ci + di # Total cells

numerator   <- sum(ai * (ci + di) / Ti)
denominator <- sum(ci * (ai + bi) / Ti)

RR.mh <- numerator / denominator
cat("RR.mh =", RR.mh, "\n")

meta.res <-
        meta::metabin(event.e = ai, n.e = ai + bi,
                      event.c = ci, n.c = ci + di,
                      sm = "RR", studlab = c("Male","Female"),
                      method="MH",
                      label.e="Exposed", label.c="Non-Exposed", outclab = "Death",
                      ## label.left="LEFT", label.right="RIGHT",
                      comb.fixed = TRUE, comb.random = FALSE)
meta.res
meta::forest.meta(meta.res)

plot of chunk unnamed-chunk-14

Although being male is still associated with higher baseline risk of disease (0.3), now there is no association between sex and exposure, i.e., male:female ratios are the same between the exposed and non-exposed. Without having association with both exposure and outcome, the sex variable is no longer a confounder.

Crude Table

men <- addmargins(tab.q2(150,100,75,175))
women <- addmargins(tab.q2(50,200,25,225))
total <- men + women
total

             Disease
Exposure      Disease+ Disease-  Sum
  Exposed          200      300  500
  Non-Exposed      100      400  500
  Sum              300      700 1000

Sex-specific tables:
Males

men

             Disease
Exposure      Disease+ Disease- Sum
  Exposed          150      100 250
  Non-Exposed       75      175 250
  Sum              225      275 500

Females

women

             Disease
Exposure      Disease+ Disease- Sum
  Exposed           50      200 250
  Non-Exposed       25      225 250
  Sum               75      425 500

Question 2.5: Change the statements for one or more of the seven “findings” listed in the original table to reverse your answer to Question 2.3. (You do not need to change the answers to Question 2.1 but only write the new “finding(s)” in the space that follows).

Changed:
Finding 4: three times \( \dashrightarrow \) two times
Finding 6: 2.0 \( \dashrightarrow \) 1.333333

In this case, the risk ratio are different across the sex strata.
In the males, it is (160 / 400) / (30 / 100) = 0.4 / 0.3 = 1.333333.
In the females, it is (20 / 100) / (40 / 400) = 0.2 / 0.1 = 2.
The risk ratio shows effect modification by sex. However, the risk difference is constant across sex strata, i.e., 0.4 - 0.3 = 0.2 - 0.1 = 0.1.

Crude Table

men <- addmargins(tab.q2(160,240,30,70))
women <- addmargins(tab.q2(20,80,40,360))
total <- men + women
total

             Disease
Exposure      Disease+ Disease-  Sum
  Exposed          180      320  500
  Non-Exposed       70      430  500
  Sum              250      750 1000

Sex-specific tables:
Males

men

             Disease
Exposure      Disease+ Disease- Sum
  Exposed          160      240 400
  Non-Exposed       30       70 100
  Sum              190      310 500

Females

women

             Disease
Exposure      Disease+ Disease- Sum
  Exposed           20       80 100
  Non-Exposed       40      360 400
  Sum               60      440 500

Question #3:
After reading this manuscript your colleague says that the study design is biased because of the role of random error in any single blood pressure measurement impacting the selection of subjects for this study. Do you agree with your colleague? Why or why not? If the study design is biased, how would you modify the study design to avoid this problem?

I agree.

There are more people with the true blood pressure below 140 mmHg than above 140 mmHg, since by definition there should be more normal people than abnormal people. Thus, if a single blood pressure measurement is used for enrollment, there will be more false positives (normotensive people enrolled) than false negatives (hypertensive people missed). These people are expected to show regression to their mean at the final measurement, thus, the previously normotensive people are expected to have more normal measurements, causing the sample mean to be naturally lower regardless of the intervention.

To avoid this issue with random error, the mean of several blood pressure measurement should be calculated for each candidate for inclusion assessment, as random errors are expected to cancel each other out if the mean is used. The investigators should be able to include hypertensive subjects more accurately with this method.

Also, a randomized controlled trial is ideal, as comparison to a placebo arm is ultimately necessary to rule out an effect of simple secular trend.

Question #4:

Question 4.1 Comparing Groups A to Group B
Different: new vs standard treatment
Same: both in trial and randomized, blinded to treatment, cancer patients survived to 1 year

After randomization, both groups are expected to have similar baseline characteristics and are comparable except for the treatment allocation. Therefore, any between-group difference in the outcome is explained solely by the treatment allocation without confounding by other factors. Since both groups were in the trial, it is not possible to measure the “care effect” from this comparison.

Question 4.2 Comparing Groups A to Group C
Different: new vs standard treatment, in or outside trial, with or without randomization and blinding
Same: cancer patients survived to 1 year

There are many differences between these groups. Thus, the difference in outcome can be explained by either one or combinations of benefit or harm from the new treatment, the “care effect” by being in the trial, high expectation from being in the trial, and baseline differences (one arm is not randomized). Therefore, it is hard to measure the independent effect of the “care effect” by simply comparing these groups because of confounding by other between-group differences.

Question 4.3 Comparing Groups B to Group C
Different: in or outside trial, with or without randomization and blinding
Same: standard care, cancer patients survived to 1 year

These groups both receive standard of care, and the largest difference is whether they were enrolled in the trial or not, making this comparison most appropriate for measurement of the “care effect”. However, as the patients outside the trial have not been randomized, there may be baseline differences, for example, age, sex, comorbidities, and stage of cancer. Therefore, meticulous adjustment for the baseline differences to remove confounding is required for a valid comparison. Even after the adjustment, unmeasured differences such as the high expectation from being in the trial may partly explain differences in the outcome.

Question #5.

tab.q5 <- function(a,b,c,d, exposure = c("Exposed", "Non-Exposed"), disease = c("Disease+", "Disease-")) {
    mat <- matrix(c(a,b,c,d), ncol = 2, byrow = TRUE,
                  dimnames = list(Exposure = exposure,
                                  Disease  = disease)
                 )
    mat
}

Question 5.1 Complete the following 2x2 table showing the physical activity status among cases and controls. Calculate an appropriate measure of association between physical activity and cancer from these data.

addmargins(tab.q5(40,50,60,50, c("Physically Active","Not Physically Active"), c("cases","controls")))

                       Disease
Exposure                cases controls Sum
  Physically Active        40       50  90
  Not Physically Active    60       50 110
  Sum                     100      100 200

Odds ratio = (40 * 50) / (50 * 60) = 0.6667

This shows a negative association with physical activity and cancer.

Question 5.2 Suppose that the next-of-kin of the cases and the controls misreported the true physical activity status in both groups with the same error probabilities. Would the result of such misreporting among cases and controls make the value for the measure of association in the previous question be:

2. Biased towards the null (i.e. the reported value in Question 5.1. underestimates the true effect of physical activity on cancer)

As the errors occur randomly, both types of misclassification (active to not active, not active to active) occur similarly in both groups. Therefore, it is a nondifferential misclassification. This type of misclassification tends to bias the result towards null.

Question 5.3 Suppose that the next-of-kin of the cases over-reported the actual number of physically active cases at the start of the PHS but that the controls correctly reported their true physical activity at the start of the PHS. Would the result of such over-reporting among cases make the value for the measure of association in the previous question be:

2. Biased towards the null (i.e. the reported value in Question 5.1. underestimates the true effect of physical activity on cancer)

The over-report means there were fewer physically active people among the cases than actually shown in the data. As it occurred exclusively in the cases, it is a differential misclassification. For example, if there were only 20 physically active people among the cases, the odds ratio would have been:

\( (20 * 50) / (50 * 80) = 0.25 \).

This means that the effect of physical activity was actually more prominent. Thus, the unidirectional over-reporting of the number of physically active people among the cases only, but not in the controls, caused a bias towards null.

Question 5.4 Concerned that the next-of-kin of diseased subjects might recall past physical activity differently than alive subjects, suppose you enrolled deceased subjects as controls and obtain information of their level of physical activity at the start of the PHS from their next-of-kin (as done for the 100 cases). Suppose that you enroll 100 controls who died from coronary heart disease as controls and that next of kin reported that 20 of these controls were physically active at the start of the PHS.

Complete the following 2x2 table showing the physical activity status among cases and controls. Calculate an appropriate measure of association between physical activity and cancer from these data.

addmargins(tab.q5(40,20,60,80, c("Physically Active","Not Physically Active"), c("cases","controls")))

                       Disease
Exposure                cases controls Sum
  Physically Active        40       20  60
  Not Physically Active    60       80 140
  Sum                     100      100 200

Odds ratio = (40 * 80) / (20 * 60) = 2.6667

This shows a positive association with physical activity and cancer.

Do you prefer to use the deceased controls in this table or the alive controls in the table in Question 5.1? Why?

Deceased controls are preferable in terms of information bias, however, those who died of coronary heart disease are not appropriate.

Controls in a case-control study should represent the population at risk for the disease of interest, in this case cancer death. The deceased control in this example is solely comprised of patients died of a single cause, coronary heart disease. If the reason of death is associated with the exposure in question, the result will be biased, as the exposure status is expected to be different form the population at risk. In this particular example, there were fewer physically active people in the controls, probably fewer than the general at-risk population, which may be the cause or result of their heart disease. As the controls did not correctly represent the physical activity status of the at-risk population, the resulting odds ratio was probably biased upward.

Alive controls can recall the exercise status themselves, thus it is expected to be more accurate than information from their family. This can cause information bias.

The most appropriate kind of patients are the deceased patients at the same hospital whose cause of death was not associated with exercise habits. These patients would be from the same source population both geographically and socioeconomically as they were at the same hospital at death. If the cause of death was not associated with exercise, the exposure status is expected to represent that of the source population.

Question #6.

Question 6.1 Construct a 2x2 table showing the relationship between time of bleeding and mortality, using the Late Bleeders as the “exposed” group. Calculate the mortality risk ratio form these data.

addmargins(tab.q5(9, 36-9, 18, 41-18, c("Late Bleeders","Early Bleeders"), c("Died","Survived")))

                Disease
Exposure         Died Survived Sum
  Late Bleeders     9       27  36
  Early Bleeders   18       23  41
  Sum              27       50  77

Risk ratio = (9/36) / (18/41) = 0.5694 (lower risk among late bleeders)

Question 6.2 Louis claimed that the two comparison groups were comparable with respect to diet, age, and treatments other than bloodletting. However, if the early bleeders had severe symptoms that caused them to see a physician early in the course of disease, but the late bleeders were less severely ill and saw a physician later in the course of disease, how might this effect the interpretation of the answer to Question 6.1 and any conclusion that you might reach. (Note: In reality, Louis claimed that the two comparison groups had similar severity of symptoms).

If there had been more severely ill patients in the Early Bleeder group, the observed difference in mortality ratio (more in the Early Bleeder) might be explained by the severity of their illness, rather than the exposure status. If that was the case, the severity of illness was associated with both the exposure (blood letting initiation time) and outcome (death), serving as a counfounder.

Question 6.3 The reported mean duration of disease was 17.8 days for the 41 early bleeders and 20.8 days for the 36 late bleeders. Do these data on duration of disease support a benefit from bloodletting? Why or why not?

These data do not give enough information, thus it is inconclusive.

These data are really time-to-death data with censoring by cure. Thus, short duration may mean a severe disease that killed a patient or a mild disease in which the patient was cured early. Without combining the time variable with the outcome variable and doing survival analysis, it is not possible to fully utilize these data. However, in this particular case, the shorter mean duration of the 41 early bleeders probably mean early death among 18 people who ultimately died.

As a matter of fact, if survival analysis is performed on the original data (manually reproduced from J Clin Epidemiol. 1996 Dec;49(12):1327-33.), The late bleeders show better survival, indicating the shorter mean disease duration among the early bleeders was indeed due to early death.

Data also available at :http://www.epidemiology.ch/history/louis.htm

blood.letting <- read.table(header = TRUE,
text = "start.letting duration n.times.letting death age
1 10 3 0 NA
1 12 2 0 NA
1 14 2 0 NA
2  7 3 0 NA
2 10 2 0 NA
2 12 2 0 NA
3 19 3 0 NA
3 29 3 0 NA
3 20 2 0 NA
3 20 NA 0 NA
3 16 3 0 NA
3 17 4 0 NA
4 19 3 0 NA
4 12 2 0 NA
4 15 2 0 NA
4 22 4 0 NA
4 12 4 0 NA
4 21 2 0 NA
4 25 3 0 NA
4 28 4 0 NA
4 40 2 0 NA
4 16 2 0 NA
4 12 4 0 NA
5 28 2 0 NA
5 17 3 0 NA
5 40 2 0 NA
5 13 2 0 NA
5 21 2 0 NA
5 13 2 0 NA
6 13 1 0 NA
6 16 2 0 NA
6 23 3 0 NA
6 35 5 0 NA
6 17 2 0 NA
7 24 2 0 NA
7 12 4 0 NA
7 19 2 0 NA
7 18 2 0 NA
7 15 2 0 NA
7 27 2 0 NA
8 19 2 0 NA
8 12 1 0 NA
8 18 1 0 NA
8 20 3 0 NA
8 13 2 0 NA
8 21 2 0 NA
9 35 1 0 NA
9 11 2 0 NA
9 17 2 0 NA
9 30 3 0 NA
1 6 5 1 18
2 53 5 1 65
2 12 3 1 69
2  8 2 1 65
2 12 1 1 55
2 17 7 1 75
3  4 1 1 57
3 16 2 1 54
3  6 3 1 30
3  6 4 1 47
3 47 2 1 75
3 11 4 1 45
4 29 2 1 19
4 29 4 1 46
4 12 1 1 85
4 15 3 1 37
4 17 1 1 67
4 20 3 1 22
5 16 4 1 58
5  8 2 1 63
5  9 4 1 24
6 62 4 1 20
6 10 2 1 40
6 29 3 1 24
7 20 2 1 68
8 25 1 1 40
9 22 1 1 50")

blood.letting <- within(blood.letting, {
        late.bleeder <- start.letting >= 5
})

## layout(matrix(c(1:2), ncol = 2))

## Time to death
survplot(survfit(Surv(duration, death == 1) ~ late.bleeder,
         data = blood.letting), label.curves=list(keys = 1:2),
         n.risk = TRUE, time.inc = 10, conf = "none")

plot of chunk unnamed-chunk-25


## In fact, if the graph is plotted for time to cure (death as censoring), these two groups show similar curves.
## Time to cure
## survplot(survfit(Surv(duration, death == 0) ~ late.bleeder,
##          data = blood.letting), label.curves=list(keys = 1:2),
##          n.risk = TRUE, time.inc = 10, conf = "none",
##          fun = function(x) 1 - x, ylab = "Cumulative Cure Probability")

All computations were carried out with R 2.15.1 (optional R packages used: plyr, rms, meta, epiR).