RM2 2025 S2 Assignment 2

Author

Beth Firipis

Published

October 7, 2025

Submission Instructions

Due 11:59pm 7th October 2025. This assignment is worth 30% of the overall mark.

Please submit your work as a single Word or PDF document. Include your name, the course title and the assignment number in the file name (e.g. RM2 Assignment2 JimmyBarnes.docx) and in the document itself (on the title page or in the header/footer). Include the relevant statistical code in an appendix, showing how you answered each question. Set the code in a different font from the body text (Courier 10pt is recommended) and format it neatly, so that your answers are easy to read and understand. Submit your assignment online according to the instructions in the BCA Turnitin Guide (available on Canvas).

Question 1. [5 Marks]

A fictitious study is performed to determine the effect of a genetic marker on survival from birth among children born with a life-threatening condition.

Study data for genetic marker and survival
ID  Marker  Time (years)  Event
1   0       1             0
2   0       3             0
3   0       7             1
4   0       9             0
5   1       2             0
6   1       4             1
7   1       8             0
8   1       10            0

a) All subjects enter at time=0

i. Risk sets at each failure time
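
As a starting point, the risk sets can be listed directly from the study table. The sketch below is one possible approach in base R, assuming all subjects enter at time 0 and that a subject is still at risk at its own event or censoring time; the object names `failure_times` and `risk_sets` are illustrative only.

```r
# Question 1 data, copied from the study table
q1_data <- data.frame(
  id     = 1:8,
  marker = c(0, 0, 0, 0, 1, 1, 1, 1),
  time   = c(1, 3, 7, 9, 2, 4, 8, 10),
  event  = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# Distinct times at which an event (failure) was observed
failure_times <- sort(unique(q1_data$time[q1_data$event == 1]))

# Risk set at time t: IDs of subjects whose follow-up time is at least t
risk_sets <- lapply(failure_times, function(t) q1_data$id[q1_data$time >= t])
names(risk_sets) <- paste0("t=", failure_times)

print(risk_sets)
```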

ii. Partial likelihood
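
To check a hand-derived expression, the partial likelihood can be evaluated numerically for any value of the marker coefficient. This is a minimal sketch, assuming a Cox model with the binary marker as the only covariate and no tied event times; `partial_lik` is a hypothetical helper name, not a function from any package.

```r
# Question 1 data, copied from the study table
q1_data <- data.frame(
  id     = 1:8,
  marker = c(0, 0, 0, 0, 1, 1, 1, 1),
  time   = c(1, 3, 7, 9, 2, 4, 8, 10),
  event  = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# Cox partial likelihood at a given beta: the product, over subjects with an
# event, of that subject's hazard score divided by the sum of the hazard
# scores of everyone still at risk at that event time
partial_lik <- function(beta, d) {
  event_rows <- which(d$event == 1)
  terms <- sapply(event_rows, function(i) {
    in_risk_set <- d$time >= d$time[i]
    exp(beta * d$marker[i]) / sum(exp(beta * d$marker[in_risk_set]))
  })
  prod(terms)
}

# Evaluate at a couple of trial values of beta
partial_lik(0, q1_data)
partial_lik(log(2), q1_data)
```

Substituting the same trial values of beta into the hand-derived formula should reproduce these numbers.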

iii. Kaplan-Meier estimate
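
The product-limit calculation can be laid out as a small table in base R. The sketch below assumes the estimate is wanted for all eight subjects pooled (not separately by marker group); the column names in `km` are illustrative.

```r
# Question 1 data, copied from the study table
q1_data <- data.frame(
  id     = 1:8,
  marker = c(0, 0, 0, 0, 1, 1, 1, 1),
  time   = c(1, 3, 7, 9, 2, 4, 8, 10),
  event  = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# Distinct event times
failure_times <- sort(unique(q1_data$time[q1_data$event == 1]))

# Number at risk just before each event time, and number of events at that time
n_at_risk <- sapply(failure_times, function(t) sum(q1_data$time >= t))
n_events  <- sapply(failure_times,
                    function(t) sum(q1_data$time == t & q1_data$event == 1))

# Kaplan-Meier estimate: cumulative product of (1 - events / at risk)
km <- data.frame(
  time      = failure_times,
  n_at_risk = n_at_risk,
  n_events  = n_events,
  survival  = cumprod(1 - n_events / n_at_risk)
)

print(km)
```

The same table can be cross-checked against `summary(survfit(Surv(time, event) ~ 1, data = q1_data))` from the survival package.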

b) Subject 7 enters at time 5

i. Characteristic of subject 7

ii. Partial likelihood with delayed entry
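
With delayed entry (left truncation), a subject only contributes to risk sets at times after entry. The sketch below is one way to reflect this, assuming the usual convention that a subject is at risk at time t when entry < t <= exit; the `entry` column and the helper names `at_risk` and `partial_lik_delayed` are hypothetical additions for illustration.

```r
# Question 1 data with an added entry time: subject 7 enters at time 5,
# all other subjects enter at time 0
q1_delayed <- data.frame(
  id     = 1:8,
  marker = c(0, 0, 0, 0, 1, 1, 1, 1),
  entry  = c(0, 0, 0, 0, 0, 0, 5, 0),
  time   = c(1, 3, 7, 9, 2, 4, 8, 10),
  event  = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# A subject is at risk at time t if already entered and not yet exited
at_risk <- function(t, d) d$entry < t & d$time >= t

# Partial likelihood using the entry-adjusted risk sets
partial_lik_delayed <- function(beta, d) {
  event_rows <- which(d$event == 1)
  terms <- sapply(event_rows, function(i) {
    in_risk_set <- at_risk(d$time[i], d)
    exp(beta * d$marker[i]) / sum(exp(beta * d$marker[in_risk_set]))
  })
  prod(terms)
}

# Risk sets at the two event times, then the likelihood at a trial beta
q1_delayed$id[at_risk(4, q1_delayed)]
q1_delayed$id[at_risk(7, q1_delayed)]
partial_lik_delayed(0, q1_delayed)
```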

Question 2. [21 Marks]

These data are a subset from a randomised controlled trial comparing placebo and Digitalis in patients with congestive heart failure.

Variable  Description
id        Patient ID
trtmt     Treatment (0=Placebo, 1=Digitalis)
age       Age in years
sex       Sex (1=Male, 2=Female)
ejfper    Ejection Fraction (percent)
functcls  Current NYHA Functional Class (1=I, 2=II, 3=III, 4=IV)
cvd       CVD occurred (0=censored, 1=yes)
cvddays   Time until CVD or censoring (days)

i. Graphical displays of continuous predictors

🌸 Distribution of continuous variables 🌸

Comments

Your comments here

ii. Examine categorical predictors

🌸 Categorical predictor examination 🌸

Comments

Your comments here

iii. Demographics table

Table 1. Baseline Characteristics by Treatment Group

iv. Survival plots by predictor

🌸 Kaplan-Meier curves by predictor 🌸

Comments

Your comments here

v. Comparison table for time to CVD

Table 2. Time to CVD by Predictor

vi. Cox model for treatment only

Table 3. Cox Regression Results - Treatment Only
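
A treatment-only Cox model and its hazard ratio table might be produced along the following lines. This is a sketch only: the `toy` data frame is a hypothetical stand-in that borrows the column names from the codebook, and its values are made up purely so the code runs; the real analysis would use the rows of assn2q2data.csv instead.

```r
library(survival)

# Hypothetical toy rows (NOT the trial data); only the column names
# trtmt, cvddays and cvd follow the Question 2 codebook
toy <- data.frame(
  trtmt   = c(0, 0, 0, 0, 1, 1, 1, 1),
  cvddays = c(100, 200, 300, 400, 150, 250, 350, 450),
  cvd     = c(1, 1, 0, 1, 1, 0, 1, 0)
)

# Cox proportional hazards model with treatment as the only predictor
fit_trt <- coxph(Surv(cvddays, cvd) ~ trtmt, data = toy)

# Hazard ratio with 95% confidence interval
hr_table <- cbind(HR = exp(coef(fit_trt)), exp(confint(fit_trt)))
print(round(hr_table, 2))
```

Replacing `toy` with `q2_data` gives the corresponding fit for the assignment data; `summary(fit_trt)` also reports the Wald and likelihood ratio tests.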

Comments

Your comments here

vii. Cox model with backward elimination

🌸 Backward elimination process 🌸

Table 4. Final Cox Regression Model

Comments

Your comments here

Question 3. [4 Marks]

This question uses data from the Framingham study.

Variable   Description
randid     Patient ID
timeevent  Time to event or censoring (days)
event      Event (0=censored, 1=stroke, 2=cvd, 3=death)

i. Survival curve for time to CVD

🌸 Kaplan-Meier curve for CVD (other events censored) 🌸

ii. Survival curve for time to any event

🌸 Kaplan-Meier curve for any event 🌸
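
The two curves in this question differ only in how the multi-level `event` column is collapsed into a 0/1 status indicator. The sketch below shows one way to do that recoding in base R; `toy_q3`, `cvd_status` and `any_status` are hypothetical names, and the toy rows are invented for illustration and are not Framingham data.

```r
# Hypothetical toy rows (NOT the Framingham data); column names follow
# the Question 3 codebook
toy_q3 <- data.frame(
  randid    = 1:6,
  timeevent = c(120, 340, 560, 780, 910, 1000),
  event     = c(0, 1, 2, 3, 2, 0)
)

# Time to CVD: only event == 2 counts as an event;
# stroke and death are treated as censored
toy_q3$cvd_status <- as.integer(toy_q3$event == 2)

# Time to any event: stroke, CVD or death all count as an event
toy_q3$any_status <- as.integer(toy_q3$event != 0)

print(toy_q3)
```

Each status column can then be passed to the survival package, for example `survfit(Surv(timeevent, cvd_status) ~ 1, data = toy_q3)`, with `q3_data` in place of the toy rows.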

Appendix: Statistical Code

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(knitr)
library(tidyverse)
library(survival)
library(survminer)
library(gridExtra)
library(kableExtra)

theme_set(theme_minimal())

q2_data <- read.csv(file = "assn2q2data.csv")
q3_data <- read.csv(file = "assn2q3data.csv")
# Create the dataset
q1_data <- data.frame(
  id = 1:8,
  marker = c(0, 0, 0, 0, 1, 1, 1, 1),
  time = c(1, 3, 7, 9, 2, 4, 8, 10),
  event = c(0, 0, 1, 0, 0, 1, 0, 0)
)

kable(q1_data, 
      caption = "Study data for genetic marker and survival",
      col.names = c("ID", "Marker", "Time (years)", "Event"))
# Your code here to determine risk sets

# Your code here to determine partial likelihood

# Your code here for KM estimate calculation

# Your analysis here

# Your code here

# Load data
q2_data <- read.csv("assn2q2data.csv")

# Display codebook
codebook_q2 <- data.frame(
  Variable = c("id", "trtmt", "age", "sex", "ejfper", "functcls", "cvd", "cvddays"),
  Description = c(
    "Patient ID",
    "Treatment (0=Placebo, 1=Digitalis)",
    "Age in years",
    "Sex (1=Male, 2=Female)",
    "Ejection Fraction (percent)",
    "Current NYHA Functional Class (1=I, 2=II, 3=III, 4=IV)",
    "CVD occurred (0=censored, 1=yes)",
    "Time until CVD or censoring (days)"
  )
)

kable(codebook_q2, 
      col.names = c("Variable", "Description"),
      align = c("l", "l"))
# Your plots here

# Your analysis here

# Your demographics table here

# Your KM plots here

# Your comparison table here

# Your Cox model here

# Your backward elimination code here

# Your final model table here

# Load data
q3_data <- read.csv("assn2q3data.csv")

# Display codebook
codebook_q3 <- data.frame(
  Variable = c("randid", "timeevent", "event"),
  Description = c(
    "Patient ID",
    "Time to event or censoring (days)",
    "Event (0=censored, 1=stroke, 2=cvd, 3=death)"
  )
)

kable(codebook_q3, 
      col.names = c("Variable", "Description"),
      align = c("l", "l"))
# Your code here

# Your code here