Context

There is a new version of the survival package arriving to CRAN in the next couple of weeks. The new version changed the way censored data are stored internally for multi-state models (the most common one being competing risks). I think the censored data is now a matrix, where previously it was a vector (though, I haven’t looked into the details). Below is an example of broom::tidy() now giving incorrect results for competing risks estimates using the dev version of the {survival} package. I don’t think CRAN checks reverse deps for Suggested packages, so the broom team may not be aware of the coming change in a very common package.

Example

First get the most recent version of broom.

pak::pak("tidymodels/broom")
packageVersion("broom")
## [1] '1.0.5.9000'

Set temp dirs for installation for the current CRAN release of survival and the dev version from GitHub.

path <- fs::path_temp()
path_cran <- fs::path(path, "cran")
path_dev <- fs::path(path, "dev")

fs::dir_create(path_cran)
fs::dir_create(path_dev)

Install both versions of packages

install.packages("survival", lib = path_cran)
## 
## The downloaded binary packages are in
##  /var/folders/6f/gdjf_vxj2wl3jhmxdkd1hd_w0000gn/T//Rtmprl9cre/downloaded_packages
pak::pak("therneau/survival", lib = path_dev)

# print CRAN and DEV versions of the package
desc::desc_get_version(fs::path(path_cran, "survival", "DESCRIPTION"))
## [1] '3.5.8'
desc::desc_get_version(fs::path(path_dev, "survival", "DESCRIPTION"))
## [1] '3.6.1'

Simulate a competing risks data set

set.seed(1234)
n <- 100
df <- 
  dplyr::tibble(
    death_cr = 
      sample.int(n = 3, size = n, replace = TRUE) |> 
      factor(labels = c("censor", "cancer death", "other cause death")),
    ttdeath = rgamma(n = n, shape = 1)
  )

Illustrate the differences in the results of broom::tidy() from the prod to dev versions of the survival package.

First begin with the version of survival on CRAN.

# print broom pkg version
packageVersion("broom")
## [1] '1.0.5.9000'
# now create a tidy data frame of the results from CRAN and DEV and compare
library("survival", lib.loc = path_cran)
tidy_cran <-
  survfit(Surv(ttdeath, death_cr) ~ 1, data = df) |> 
  broom::tidy()

detach("package:survival", unload=TRUE)

Now, we will perform the same calculation using the dev version of the package.

library("survival", lib.loc = path_dev)
tidy_dev <-
  survfit(Surv(ttdeath, death_cr) ~ 1, data = df) |> 
  broom::tidy()

Print censoring results that do not match

dplyr::full_join(
  tidy_cran |> dplyr::select(state, time, n.censor),
  tidy_dev |> dplyr::select(state, time, n.censor),
  by = c("state", "time"),
  suffix = c(".cran", ".dev")
) |> 
  dplyr::filter(n.censor.cran != n.censor.dev)
## # A tibble: 58 × 4
##    state         time n.censor.cran n.censor.dev
##    <chr>        <dbl>         <int>        <dbl>
##  1 cancer death 0.130             1            0
##  2 cancer death 0.153             1            0
##  3 cancer death 0.192             1            0
##  4 cancer death 0.192             1            0
##  5 cancer death 0.202             1            0
##  6 cancer death 0.211             1            0
##  7 cancer death 0.269             1            0
##  8 cancer death 0.288             1            0
##  9 cancer death 0.292             1            0
## 10 cancer death 0.343             1            0
## # ℹ 48 more rows