This is a reminder to myself (and to whoever finds this usefull) on the calculation of:
What you can learn in this reminder
rowwise
functionlist.files
function and the set_names
from purrr
to make it easy to map the files into functions and keep a record of what comes fromThe SMR is calculated as the ratio of observed hospital mortality over predicted hospital mortality.
I have data on ~1000 patients of the iCU and the following info:
The data are in sepate excel workbooks named 2012.xlsx, 2013.xlsx etc.. until 2017, in the folder “ECDC data” in my working directory (my project)
library(tidyverse)
library(readxl)
# create a vector of the file names
list_of_files =
list.files(path = "ECDC data", pattern = ".xlsx$", full.names = TRUE) %>%
# name the vectors. It will be usefull to map the names to the read_xlsx function to get all data at once (purrr magic..)
set_names()
# Map the file names to the readxl:: read_xlsx function to read all excel files and once.
# Combine all the workbooks into a dataframe.
# Also, capture the filename as an `id` variable in the dataframe
all_data =
map_df( list_of_files,
~ readxl::read_excel(path = .,
# get some columns only (i have a lot)
range = readxl::cell_cols("A:M")),
.id = "dataset") %>%
#get the year from the dataset name - use some regular expressions
mutate(year = str_extract(dataset, pattern ="[0-9]+" )) %>%
# Do some cleaning
janitor::clean_names() %>% # the column names
# remove no saps score and no info on death
filter(!is.na(saps_ii),
!is.na(outcome_at_icu_discharge)) %>%
mutate(death = if_else(outcome_at_icu_discharge == "Alive", 0, 1)) %>%
# Select relevant variables only
select(year,death, saps_ii, type_of_admision)
Here is a bit of an output
all_data %>%
head(10) %>%
knitr::kable()
year | death | saps_ii | type_of_admision |
---|---|---|---|
2012 | 0 | 49 | medical |
2012 | 0 | 24 | Unschedule surgical |
2012 | 0 | 84 | medical |
2012 | 0 | 50 | medical |
2012 | 0 | 12 | medical |
2012 | 1 | 64 | medical |
2012 | 0 | 31 | Unschedule surgical |
2012 | 0 | 35 | medical |
2012 | 0 | 63 | medical |
2012 | 0 | 26 | medical |
Lets aggregate by year, the observed mortality, and the median SAPS II score
all_data_aggr =
all_data %>%
group_by(year) %>%
summarise(n = n(),
obs_mortality = sum(death),
mortality_rate = mean(death),
# I get the median SAPS score
med_saps = median(saps_ii)
)
all_data_aggr %>% knitr::kable()
year | n | obs_mortality | mortality_rate | med_saps |
---|---|---|---|---|
2012 | 144 | 26 | 0.1805556 | 48 |
2013 | 165 | 33 | 0.2000000 | 47 |
2014 | 159 | 28 | 0.1761006 | 48 |
2015 | 198 | 41 | 0.2070707 | 49 |
2016 | 184 | 32 | 0.1739130 | 60 |
2017 | 216 | 47 | 0.2175926 | 57 |
Now I need to calculate the predicted mortality by year.
The predicted mortality is derived from the SAPS II score as follows:
\(logit = -7.7631 + 0.0737*Score + 0.9971*ln(Score+1)\)
and then
\(Mortality = \frac{e^{logit}}{1+e^{logit}}\)
Here I need a function that has the saps II score as an input and returns the predicted mortality
predict_mortality = function(score){
logit = -7.7631 + 0.0737 * score + 0.9971 * log(score+1)
mortality = exp(logit)/ (1+exp(logit))
return(mortality)
}
Now lets use our function predict_mortality
to calculate the predicted mortality out of the SAPS II score
smr_table =
all_data_aggr %>%
mutate(pred_mortality_rate = predict_mortality(med_saps),
# get expecte counts of deaths
pred_mortality = pred_mortality_rate * n,
# The SMR
smr = obs_mortality/ pred_mortality)
#Have a look
smr_table %>% knitr::kable(digits = 2)
year | n | obs_mortality | mortality_rate | med_saps | pred_mortality_rate | pred_mortality | smr |
---|---|---|---|---|---|---|---|
2012 | 144 | 26 | 0.18 | 48 | 0.41 | 59.70 | 0.44 |
2013 | 165 | 33 | 0.20 | 47 | 0.39 | 64.67 | 0.51 |
2014 | 159 | 28 | 0.18 | 48 | 0.41 | 65.92 | 0.42 |
2015 | 198 | 41 | 0.21 | 49 | 0.44 | 86.63 | 0.47 |
2016 | 184 | 32 | 0.17 | 60 | 0.68 | 125.28 | 0.26 |
2017 | 216 | 47 | 0.22 | 57 | 0.62 | 133.76 | 0.35 |
An approximate 95% confidence interval (CI) for the SMR was calculated by using the method proposed by Vandenbroucke JP. A shortcut method for calculating the 95 percent confidence interval of the standardized mortality ratio. (Letter). Am J Epidemiol 1982; 115:303-4.
I found the formulas of all possible CI calculations for the SMR here. It is the documentation of the this and this online calculators
I tried out a few formulas in the documentation, specifically the ones that are called approximations
I haven’t tried the Exact Tests calculations - I think they need more programming + I couldn’t figure out how I am supposed to do the iterative process. Perhaps its easy and don’t have time to think about it :)
Well i tried this approximation by Vanderbroucke
Beware that the \(\sqrt(α)\) refers to the observed mortality and the \(λ\) to the predicted mortality. DO NOT confuse with the \(α\) of the Z score (i.e. the 1.96)
So I created this function, that reads the observed and predicted mortality (actual counts) and returns a vector of 2 values. The 1st is the lower limit, and the 2nd the upper limit
smr_conf = function(observed, predicted){
lower = ((sqrt(observed) - 1.96*0.5)^2)/ predicted
upper = ((sqrt(observed) + 1.96*0.5)^2)/ predicted
return(c(lower, upper))
}
Now lets use the smr_table
we calculated earlier
NOTE HERE the use of rowwise
function of dplyr
The rowwise
is needed since the inputs to the the function are taken rowwise
smr_table %>%
rowwise() %>%
mutate( lower_95 = smr_conf(obs_mortality, pred_mortality)[1],
upper_95 = smr_conf(obs_mortality, pred_mortality)[2]) %>%
knitr::kable(digits = 2)
year | n | obs_mortality | mortality_rate | med_saps | pred_mortality_rate | pred_mortality | smr | lower_95 | upper_95 |
---|---|---|---|---|---|---|---|---|---|
2012 | 144 | 26 | 0.18 | 48 | 0.41 | 59.70 | 0.44 | 0.28 | 0.62 |
2013 | 165 | 33 | 0.20 | 47 | 0.39 | 64.67 | 0.51 | 0.35 | 0.70 |
2014 | 159 | 28 | 0.18 | 48 | 0.41 | 65.92 | 0.42 | 0.28 | 0.60 |
2015 | 198 | 41 | 0.21 | 49 | 0.44 | 86.63 | 0.47 | 0.34 | 0.63 |
2016 | 184 | 32 | 0.17 | 60 | 0.68 | 125.28 | 0.26 | 0.17 | 0.35 |
2017 | 216 | 47 | 0.22 | 57 | 0.62 | 133.76 | 0.35 | 0.26 | 0.46 |