RM2 2025 S2 Assignment 2
Submission Instructions
Due 11:59pm 7th October 2025. This assignment is worth 30% of the overall mark.
Please submit your work as a single Word or PDF document, with your name, the course title and the assignment number in the file name (e.g. RM2 Assignment2 JimmyBarnes.docx) and in the document itself (either on the title page or in the header/footer). Include the relevant statistical code in an appendix, showing how you answered each question. Set the code in a different font from the main text (Courier 10pt is recommended) and format it neatly so your answers are easy to read and understand. Submit your assignment online according to the instructions in the BCA Turnitin Guide (available on Canvas).
Question 1. [5 Marks]
A fictitious study is performed to determine the effect of a genetic marker on survival from birth in children born with a life-threatening condition.
| ID | Marker | Time (years) | Event |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 2 | 0 | 3 | 0 |
| 3 | 0 | 7 | 1 |
| 4 | 0 | 9 | 0 |
| 5 | 1 | 2 | 0 |
| 6 | 1 | 4 | 1 |
| 7 | 1 | 8 | 0 |
| 8 | 1 | 10 | 0 |
a) All subjects enter at time=0
i. Risk sets at each failure time
ii. Partial likelihood
iii. Kaplan-Meier estimate
b) Subject 7 enters at time 5
i. Characteristic of subject 7
ii. Partial likelihood with delayed entry
Question 2. [21 Marks]
These data are a subset of a randomised controlled trial comparing placebo with Digitalis in patients with congestive heart failure.
| Variable | Description |
|---|---|
| id | Patient ID |
| trtmt | Treatment (0=Placebo, 1=Digitalis) |
| age | Age in years |
| sex | Sex (1=Male, 2=Female) |
| ejfper | Ejection Fraction (percent) |
| functcls | Current NYHA Functional Class (1=I, 2=II, 3=III, 4=IV) |
| cvd | CVD occurred (0=censored, 1=yes) |
| cvddays | Time until CVD or censoring (days) |
i. Graphical displays of continuous predictors
🌸 Distribution of continuous variables 🌸
ii. Examine categorical predictors
🌸 Categorical predictor examination 🌸
Comments
Your comments here
iii. Demographics table
Table 1. Baseline Characteristics by Treatment Group
iv. Survival plots by predictor
🌸 Kaplan-Meier curves by predictor 🌸
Comments
Your comments here
v. Comparison table for time to CVD
Table 2. Time to CVD by Predictor
vi. Cox model for treatment only
Table 3. Cox Regression Results - Treatment Only
Comments
Your comments here
vii. Cox model with backward elimination
🌸 Backward elimination process 🌸
Table 4. Final Cox Regression Model
Comments
Your comments here
Question 3. [4 Marks]
This question uses data from the Framingham study.
| Variable | Description |
|---|---|
| randid | Patient ID |
| timeevent | Time to event or censoring (days) |
| event | Event (0=censored, 1=stroke, 2=cvd, 3=death) |
i. Survival curve for time to CVD
🌸 Kaplan-Meier curve for CVD (other events censored) 🌸
ii. Survival curve for time to any event
🌸 Kaplan-Meier curve for any event 🌸
Appendix: Statistical Code
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(knitr)
library(tidyverse)
library(survival)
library(survminer)
library(gridExtra)
library(kableExtra)
theme_set(theme_minimal())
q2_data <- read.csv(file = "assn2q2data.csv")
q3_data <- read.csv(file = "assn2q3data.csv")
# Create the dataset
q1_data <- data.frame(
id = 1:8,
marker = c(0, 0, 0, 0, 1, 1, 1, 1),
time = c(1, 3, 7, 9, 2, 4, 8, 10),
event = c(0, 0, 1, 0, 0, 1, 0, 0)
)
kable(q1_data,
caption = "Study data for genetic marker and survival",
col.names = c("ID", "Marker", "Time (years)", "Event"))
# Your code here to determine risk sets
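A minimal sketch for part (a)(i), assuming all eight subjects enter at time 0 and that `event = 1` marks a death: a subject belongs to the risk set at failure time t if their recorded time is at least t. The data frame is rebuilt so the chunk runs on its own.

```r
# Q1 data, repeated so this chunk is self-contained
q1_data <- data.frame(id = 1:8,
                      marker = c(0, 0, 0, 0, 1, 1, 1, 1),
                      time = c(1, 3, 7, 9, 2, 4, 8, 10),
                      event = c(0, 0, 1, 0, 0, 1, 0, 0))

# Distinct times at which a failure was observed
failure_times <- sort(unique(q1_data$time[q1_data$event == 1]))

# Risk set R(t): IDs still under observation at t (recorded time >= t)
risk_sets <- lapply(failure_times, function(t) q1_data$id[q1_data$time >= t])
names(risk_sets) <- paste0("t=", failure_times)
risk_sets
```

For these data this gives IDs {3, 4, 6, 7, 8} at t = 4 and {3, 4, 7, 8} at t = 7.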
# Your code here to determine partial likelihood
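A sketch of the Cox partial likelihood for the single covariate `marker`, assuming no tied failure times (true here). Each failure contributes exp(x_i * beta) divided by the sum of exp(x_j * beta) over its risk set; for these data that gives L(beta) = e^beta / (2 + 3e^beta) * 1 / (2 + 2e^beta). The hand-coded `log_pl()` (a name chosen here) is maximised with `optimize()` and checked against `coxph()`.

```r
library(survival)

q1_data <- data.frame(id = 1:8,
                      marker = c(0, 0, 0, 0, 1, 1, 1, 1),
                      time = c(1, 3, 7, 9, 2, 4, 8, 10),
                      event = c(0, 0, 1, 0, 0, 1, 0, 0))

# Log partial likelihood: sum over failures i of
#   x_i * beta - log( sum over j in R(t_i) of exp(x_j * beta) )
log_pl <- function(beta, time, event, x) {
  ll <- 0
  for (i in which(event == 1)) {
    at_risk <- time >= time[i]   # risk set R(t_i)
    ll <- ll + x[i] * beta - log(sum(exp(x[at_risk] * beta)))
  }
  ll
}

# Numerical maximum of the hand-coded function
opt <- optimize(log_pl, interval = c(-5, 5), maximum = TRUE,
                time = q1_data$time, event = q1_data$event, x = q1_data$marker)

# Same model fitted by coxph() for comparison
fit <- coxph(Surv(time, event) ~ marker, data = q1_data)
c(manual = opt$maximum, coxph = unname(coef(fit)))
```

Setting the score to zero gives exp(beta)^2 = 2/3, so both estimates should be close to 0.5 * log(2/3), about -0.203.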
# Your code here for KM estimate calculation
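A sketch of the product-limit calculation, assuming the pooled estimate (both marker groups combined) is wanted; replacing `~ 1` with `~ marker` in `survfit()` gives one curve per group. S(t) multiplies (1 - d_j / n_j) over the failure times t_j <= t.

```r
library(survival)

q1_data <- data.frame(id = 1:8,
                      marker = c(0, 0, 0, 0, 1, 1, 1, 1),
                      time = c(1, 3, 7, 9, 2, 4, 8, 10),
                      event = c(0, 0, 1, 0, 0, 1, 0, 0))

# Number at risk (n_j) and number of failures (d_j) at each failure time
failure_times <- sort(unique(q1_data$time[q1_data$event == 1]))
n_risk  <- sapply(failure_times, function(t) sum(q1_data$time >= t))
n_event <- sapply(failure_times, function(t) sum(q1_data$time == t & q1_data$event == 1))

# Product-limit estimate: S(t) = product over t_j <= t of (1 - d_j / n_j)
km_manual <- data.frame(time = failure_times, n_risk = n_risk, n_event = n_event,
                        surv = cumprod(1 - n_event / n_risk))
km_manual

# Check against survfit(); summary() lists only the failure times by default
km_fit <- survfit(Surv(time, event) ~ 1, data = q1_data)
summary(km_fit)
```

This should give S = 0.8 from t = 4 and S = 0.6 from t = 7, staying at 0.6 to the end of follow-up (t = 10) because the remaining subjects are censored.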
# Your analysis here
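Subject 7 is a delayed (left-truncated) entry: they are only observed from year 5 and would not have entered the study had they died earlier, so they can only appear in risk sets for failure times after their entry time. The sketch below adds an `entry` column (a name chosen here for illustration) and recomputes the risk sets with the condition entry < t <= time.

```r
q1_data <- data.frame(id = 1:8,
                      marker = c(0, 0, 0, 0, 1, 1, 1, 1),
                      time = c(1, 3, 7, 9, 2, 4, 8, 10),
                      event = c(0, 0, 1, 0, 0, 1, 0, 0))

# Start of observation: subject 7 enters at year 5, everyone else at time 0
q1_data$entry <- ifelse(q1_data$id == 7, 5, 0)

failure_times <- sort(unique(q1_data$time[q1_data$event == 1]))

# With delayed entry, subject j is at risk at t only if entry_j < t <= time_j
risk_sets_lt <- lapply(failure_times, function(t)
  q1_data$id[q1_data$entry < t & q1_data$time >= t])
names(risk_sets_lt) <- paste0("t=", failure_times)
risk_sets_lt
```

Subject 7 leaves the risk set at t = 4 but remains in the one at t = 7.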
# Your code here
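A sketch of the partial likelihood with delayed entry. In `survival`, left truncation is expressed with the counting-process form `Surv(entry, time, event)`, which applies the same risk-set rule as the hand-coded `log_pl_lt()` below (a name chosen here). With subject 7 removed from the t = 4 risk set, L(beta) = e^beta / (2 + 2e^beta) * 1 / (2 + 2e^beta).

```r
library(survival)

q1_data <- data.frame(id = 1:8,
                      marker = c(0, 0, 0, 0, 1, 1, 1, 1),
                      time = c(1, 3, 7, 9, 2, 4, 8, 10),
                      event = c(0, 0, 1, 0, 0, 1, 0, 0))
q1_data$entry <- ifelse(q1_data$id == 7, 5, 0)   # subject 7 enters at year 5

# Log partial likelihood with left truncation: R(t_i) = {j : entry_j < t_i <= time_j}
log_pl_lt <- function(beta, entry, time, event, x) {
  ll <- 0
  for (i in which(event == 1)) {
    at_risk <- entry < time[i] & time >= time[i]
    ll <- ll + x[i] * beta - log(sum(exp(x[at_risk] * beta)))
  }
  ll
}

# Counting-process (start, stop] form handles the delayed entry in coxph()
fit_lt <- coxph(Surv(entry, time, event) ~ marker, data = q1_data)
coef(fit_lt)
fit_lt$loglik   # log partial likelihood at beta = 0 and at the estimate
log_pl_lt(0, q1_data$entry, q1_data$time, q1_data$event, q1_data$marker)
```

Both risk sets now hold two subjects from each marker group, so the score is zero at beta = 0 and the estimated hazard ratio is 1.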
# Load data
q2_data <- read.csv("assn2q2data.csv")
# Display codebook
codebook_q2 <- data.frame(
Variable = c("id", "trtmt", "age", "sex", "ejfper", "functcls", "cvd", "cvddays"),
Description = c(
"Patient ID",
"Treatment (0=Placebo, 1=Digitalis)",
"Age in years",
"Sex (1=Male, 2=Female)",
"Ejection Fraction (percent)",
"Current NYHA Functional Class (1=I, 2=II, 3=III, 4=IV)",
"CVD occurred (0=censored, 1=yes)",
"Time until CVD or censoring (days)"
)
)
kable(codebook_q2,
col.names = c("Variable", "Description"),
align = c("l", "l"))
# Your plots here
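A possible approach for the continuous predictors, assuming `age` and `ejfper` are the continuous baseline variables in `q2_data` (`cvddays` is the outcome). The helper `plot_continuous()` is a hypothetical name; it reshapes the columns to long format and draws one histogram panel per variable so that skewness, outliers and implausible values are easy to spot.

```r
library(ggplot2)
library(tidyr)

# Histogram of each continuous predictor in its own panel (free axis scales)
plot_continuous <- function(dat, vars = c("age", "ejfper")) {
  long <- pivot_longer(dat[, vars, drop = FALSE], cols = everything(),
                       names_to = "variable", values_to = "value")
  ggplot(long, aes(x = value)) +
    geom_histogram(bins = 30, colour = "white") +
    facet_wrap(~ variable, scales = "free") +
    labs(x = NULL, y = "Count")
}

# With the assignment data:
# plot_continuous(q2_data)
```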
# Your analysis here
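One way to examine the categorical predictors, assuming `sex` and `functcls` keep the codes in the codebook: tabulate each against treatment arm with column percentages, to check balance between the randomised groups and to spot sparse levels that may need combining before modelling. `tabulate_categorical()` is a hypothetical helper name.

```r
# Counts and column percentages of each categorical predictor by treatment arm
tabulate_categorical <- function(dat, vars = c("sex", "functcls"), by = "trtmt") {
  lapply(setNames(vars, vars), function(v) {
    counts <- table(dat[[v]], dat[[by]], useNA = "ifany", dnn = c(v, by))
    list(n = counts,
         col_pct = round(100 * prop.table(counts, margin = 2), 1))
  })
}

# With the assignment data:
# tabulate_categorical(q2_data)
```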
# Your demographics table here
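A base-R sketch of a baseline table by treatment arm, summarising continuous variables as mean (SD) and categorical variables as n (%). The helper `baseline_table()` and its layout are assumptions rather than a required format; it returns a character matrix that can be passed to `kable()`.

```r
# Mean (SD) for continuous variables and n (%) for each level of categorical ones
baseline_table <- function(dat, cont = c("age", "ejfper"),
                           categ = c("sex", "functcls"), by = "trtmt") {
  groups <- sort(unique(dat[[by]]))
  fmt_cont <- function(x) sprintf("%.1f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
  rows <- list()
  for (v in cont) {
    rows[[v]] <- sapply(groups, function(g) fmt_cont(dat[[v]][dat[[by]] == g]))
  }
  for (v in categ) {
    for (lev in sort(unique(dat[[v]]))) {
      rows[[paste0(v, " = ", lev)]] <- sapply(groups, function(g) {
        x <- dat[[v]][dat[[by]] == g]
        sprintf("%d (%.1f%%)", sum(x == lev, na.rm = TRUE), 100 * mean(x == lev, na.rm = TRUE))
      })
    }
  }
  out <- do.call(rbind, rows)
  colnames(out) <- paste0(by, " = ", groups)
  out
}

# With the assignment data:
# knitr::kable(baseline_table(q2_data), caption = "Baseline characteristics by treatment group")
```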
# Your KM plots here
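A sketch of Kaplan-Meier curves for time to CVD by one categorical predictor at a time, with the log-rank p-value in the title. It uses base `survival` plotting (`survminer::ggsurvplot()` is a ggplot-based alternative), and `km_by()` is a hypothetical helper name. Continuous predictors such as age would first need grouping, for example at the median, which is an analysis choice rather than a rule.

```r
library(survival)

# KM curves and log-rank test of time to CVD by one categorical predictor
km_by <- function(dat, var) {
  f <- as.formula(paste("Surv(cvddays, cvd) ~", var))
  fit <- survfit(f, data = dat)
  lr <- survdiff(f, data = dat)
  p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
  plot(fit, col = seq_along(fit$strata), xlab = "Days", ylab = "Proportion free of CVD",
       main = sprintf("%s (log-rank p = %.3g)", var, p))
  legend("bottomleft", legend = names(fit$strata), col = seq_along(fit$strata), lty = 1)
  invisible(list(fit = fit, logrank_p = p))
}

# With the assignment data:
# for (v in c("trtmt", "sex", "functcls")) km_by(q2_data, v)
```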
# Your comparison table here
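For a table comparing time to CVD across predictors, one option is a univariable Cox model per predictor, reporting the hazard ratio, its 95% confidence interval and the Wald p-value. `univariable_cox()` is a hypothetical helper; converting `trtmt`, `sex` and `functcls` to factors first gives each non-reference level its own row.

```r
library(survival)

# One Cox model per predictor: HR, 95% CI and Wald p-value for each coefficient
univariable_cox <- function(dat, predictors) {
  rows <- lapply(predictors, function(v) {
    fit <- coxph(as.formula(paste("Surv(cvddays, cvd) ~", v)), data = dat)
    s <- summary(fit)
    data.frame(predictor = v, term = rownames(s$conf.int),
               HR = s$conf.int[, "exp(coef)"],
               lower95 = s$conf.int[, "lower .95"],
               upper95 = s$conf.int[, "upper .95"],
               p = s$coefficients[, "Pr(>|z|)"],
               row.names = NULL)
  })
  do.call(rbind, rows)
}

# With the assignment data:
# q2 <- transform(q2_data, trtmt = factor(trtmt), sex = factor(sex), functcls = factor(functcls))
# univariable_cox(q2, c("trtmt", "age", "sex", "ejfper", "functcls"))
```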
# Your Cox model here
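A sketch of the treatment-only model, assuming placebo (`trtmt = 0`) is the reference group. `cox.zph()` tests the proportional hazards assumption using scaled Schoenfeld residuals; a small p-value suggests the treatment effect changes over follow-up. The helper name `cox_treatment()` is hypothetical.

```r
library(survival)

# Cox model with treatment only, its hazard ratio with 95% CI, and a PH check
cox_treatment <- function(dat) {
  dat$trtmt <- factor(dat$trtmt, levels = c(0, 1), labels = c("Placebo", "Digitalis"))
  fit <- coxph(Surv(cvddays, cvd) ~ trtmt, data = dat)
  list(model = fit,
       hr_ci = summary(fit)$conf.int[, c("exp(coef)", "lower .95", "upper .95"), drop = FALSE],
       ph_test = cox.zph(fit))
}

# With the assignment data:
# res <- cox_treatment(q2_data); res$hr_ci; res$ph_test
```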
# Your backward elimination code here
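A sketch of backward elimination driven by likelihood-ratio tests from `drop1()`: fit the full model, remove the term with the largest p-value, refit, and stop once every remaining term has p < `alpha`. The helper name, the 0.05 threshold, the optional `keep` argument (for example to force treatment into the model) and the use of LRTs rather than Wald tests or AIC-based `step()` are all assumptions to adjust to the course's rules. Rows with missing values are removed first so every candidate model is fitted to the same patients.

```r
library(survival)

# Backward elimination on LRT p-values; terms listed in `keep` are never dropped
backward_cox <- function(dat, predictors, alpha = 0.05, keep = character(0)) {
  vars <- c("cvddays", "cvd", predictors)
  dat <- dat[complete.cases(dat[, vars]), ]        # same rows for every candidate model
  current <- predictors
  repeat {
    f <- as.formula(paste("Surv(cvddays, cvd) ~", paste(current, collapse = " + ")))
    fit <- coxph(f, data = dat)
    if (length(current) == 1) break                # keep at least one term
    lrt <- drop1(fit, test = "Chisq")              # LRT for removing each term
    p <- setNames(lrt[["Pr(>Chi)"]][-1], rownames(lrt)[-1])
    p <- p[setdiff(names(p), keep)]                # forced-in terms are not candidates
    if (length(p) == 0 || max(p) < alpha) break
    worst <- names(which.max(p))
    message("Dropping ", worst, " (LRT p = ", signif(p[[worst]], 3), ")")
    current <- setdiff(current, worst)
  }
  fit
}

# With the assignment data:
# q2 <- transform(q2_data, trtmt = factor(trtmt), sex = factor(sex), functcls = factor(functcls))
# final_fit <- backward_cox(q2, c("trtmt", "age", "sex", "ejfper", "functcls"), keep = "trtmt")
# summary(final_fit)
```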
# Your final model table here
# Load data
q3_data <- read.csv("assn2q3data.csv")
# Display codebook
codebook_q3 <- data.frame(
Variable = c("randid", "timeevent", "event"),
Description = c(
"Patient ID",
"Time to event or censoring (days)",
"Event (0=censored, 1=stroke, 2=cvd, 3=death)"
)
)
kable(codebook_q3,
col.names = c("Variable", "Description"),
align = c("l", "l"))
# Your code here
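A sketch for the CVD-specific curve, assuming the codes in the Q3 codebook: only `event == 2` counts as a failure, while stroke, death and true censoring are all treated as censored at `timeevent`. Because the other events compete with CVD, one minus this curve is at least as large as the observed cumulative incidence of CVD and tends to overstate it. `km_cvd()` is a hypothetical helper name.

```r
library(survival)

# Time to CVD: event code 2 is the failure; codes 0, 1 and 3 are treated as censored
km_cvd <- function(dat) survfit(Surv(timeevent, event == 2) ~ 1, data = dat)

# With the assignment data:
# fit_cvd <- km_cvd(q3_data)
# plot(fit_cvd, xlab = "Days", ylab = "Proportion free of CVD")
```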
# Your code here
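For time to the first event of any type, every non-zero code counts as a failure. The risk sets are the same as when the other events are censored, but each time point can only have as many or more failures, so this curve lies on or below the CVD-only curve throughout. `km_any()` is a hypothetical helper name.

```r
library(survival)

# Time to first event of any type: codes 1 (stroke), 2 (CVD) and 3 (death) all count
km_any <- function(dat) survfit(Surv(timeevent, event != 0) ~ 1, data = dat)

# With the assignment data, overlaid on the CVD-only curve:
# fit_any <- km_any(q3_data)
# fit_cvd <- survfit(Surv(timeevent, event == 2) ~ 1, data = q3_data)
# plot(fit_any, conf.int = FALSE, xlab = "Days", ylab = "Proportion event-free")
# lines(fit_cvd, conf.int = FALSE, lty = 2)
```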
Comments
Your comments here