Sample-size and Power calculations: Respiratory syncytial virus (RSV) and incidence of heart failure

Author

Kwabena Asare

Published

October 18, 2024

1 Introduction

PURPOSE: To conduct statistical power calculations for our study on the effects of respiratory syncytial virus (RSV) in triggering heart failure

APPROACH: As discussed yesterday, it would be great if you could prepare some sample size calculations for a cohort and a self-controlled case series study. These are for a grant application I am putting in to the British Heart Foundation looking at the role of respiratory syncytial virus (RSV) in triggering heart failure.

One challenge with RSV is that it may be difficult to identify in primary care data, as symptoms are not specific and similar to many other respiratory virus infections. People with an acute respiratory illness who present to the GP are not routinely tested with PCR in the UK. Therefore, the coding used by GPs may be non-specific i.e. an RSV infection may be coded as ‘acute respiratory infection’. Once people go to hospital, they are more likely to be tested, so we will pick up more specific episodes in linked hospital data.

We will be doing the study in CPRD Aurum, linked to Hospital Episode Statistics data from 01/01/2015 to 31/12/2023. At present (without an approved study protocol), we only have access to very limited Aurum data for feasibility counts. Alex Lyons has generated some counts to assess feasibility - numbers are shown in red below. As there is the issue of diagnostic difficulty, we have used two codelists for RSV. One is specific, containing 55 RSV codes, and the other is a broad, sensitive definition, including 192 codes.

The study questions for which we need power calculations are:

Does having an episode of RSV result in an increase in the risk of heart failure? Study design = cohort. Analysis = Cox regression?
Does relative incidence of heart failure increase in a 28 day risk window after RSV compared to other time periods? Study design = self-controlled case series. Analysis = conditional Poisson regression?

Could you please:

Use the feasibility data provided by Alex, located on drive Z, to conduct two separate power calculations for a cohort study, using first the specific RSV definition, then the sensitive definition. To do this, you will need to merge the specific RSV data with the heart failure data to remove any individuals who had a heart failure code before their RSV date. (Note the key variables needed are patid, the medcode and the date of event.) I have attached a guide to sample size calculations for Cox regression in case helpful. You can also look at "sample size calculation in R" and "Sample Size Calculation for Cox Proportional Hazards Regression". Please keep the data on the Z drive in the same folder and delete it when you are done. Can you let me know when you have deleted the data?
For the self-controlled case series calculations, you can just use the counts below from Alex, i.e. no need to use individual level data. There is a Stata package sampsi_sccs that allows calculation of sample size for self-controlled case series. Relevant parameters are a risk period of 28 days, a proportion exposed of 1, and a rho (or relative incidence) of between 2 and 3. You can also use the R package ‘SCCS’. I hope this all makes sense. Let me know if you have any queries.
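The merge-and-exclude step described above can be sketched as follows. This is a minimal illustration in Python on toy records, not the code used for the study: the field name `eventdate`, the toy medcodes and both helper functions are hypothetical, and keeping people whose first heart failure code falls on the same day as their first RSV code is an assumption (the brief only says to remove those with a heart failure code before the RSV date).

```python
from datetime import date

# Toy records standing in for the real extracts (patid, medcode, event date).
rsv_events = [
    {"patid": 1, "medcode": "r1", "eventdate": date(2016, 3, 1)},
    {"patid": 2, "medcode": "r2", "eventdate": date(2018, 11, 5)},
    {"patid": 2, "medcode": "r1", "eventdate": date(2019, 1, 9)},
    {"patid": 3, "medcode": "r1", "eventdate": date(2020, 2, 2)},
]
hf_events = [
    {"patid": 1, "medcode": "h1", "eventdate": date(2015, 6, 30)},
    {"patid": 2, "medcode": "h2", "eventdate": date(2019, 6, 1)},
]


def first_event_dates(events):
    """Return {patid: earliest event date} for a list of event records."""
    first = {}
    for row in events:
        pid = row["patid"]
        if pid not in first or row["eventdate"] < first[pid]:
            first[pid] = row["eventdate"]
    return first


def build_cohort(rsv, hf):
    """Keep people with RSV whose first HF code is not before their first RSV code."""
    first_rsv = first_event_dates(rsv)
    first_hf = first_event_dates(hf)
    cohort = {}
    for pid, rsv_date in first_rsv.items():
        hf_date = first_hf.get(pid)
        if hf_date is not None and hf_date < rsv_date:
            continue  # heart failure recorded before RSV: exclude
        cohort[pid] = {"rsv_date": rsv_date, "hf_after_rsv": hf_date is not None}
    return cohort


cohort = build_cohort(rsv_events, hf_events)
n_exposed = len(cohort)
n_hf = sum(person["hf_after_rsv"] for person in cohort.values())
print(n_exposed, n_hf)
```

The two printed counts correspond to the corrected denominator and numerator for the failure probability among the exposed.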

Information from Alex:

Query 1) How many people in CPRD Aurum have been diagnosed with RSV defined using the specific_RSV.dta_define.txt codelist at aged 40 years or over from 01 Jan 2015 until 31 Dec 2023? Response: You provided 55 RSV codes and 30 HF codes (please check that is correct). There were 6,809 with at least one ‘specific’ RSV code meeting the criteria.
Query 2) How many of those also have a heart failure code at aged 40 years or more during the same time period using HF.dta_define.txt? Response: Of those with a ‘specific’ code, 615 also had at least one heart failure code that met the criteria.
Query 3) How many people in CPRD Aurum have been diagnosed with RSV defined using the sensitive_RSV.dta_define.txt codelist at aged 40 years or over from 01 Jan 2015 until 31 Dec 2023? Response: You provided 192 RSV codes and the same 30 HF codes. There were 2,311,092 with at least one ‘sensitive’ RSV code meeting the criteria.
Query 4) How many of those also have a heart failure code at aged 40 years or more during the same time period using HF.dta_define.txt? Response: Of those with a ‘sensitive’ code, 143,040 also had at least one heart failure code that met the criteria.

2 Power/Sample-size calculation, Cox PH regression

Information needed:

Sample size: Feasibility counts show:

(a) 6,809 with at least one RSV code meeting the specific criteria (55 codes).
(b) 2,311,092 with at least one RSV code meeting the non-specific criteria (192 codes).
The final sample size in each scenario depends on the matching ratio; ratios from 1:1 up to 1:5 will be considered.

Statistical power: 80% for the sample-size calculations. For the power calculations, power is the quantity being estimated.
Alpha: 5% (two-sided).
Postulated Heart Failure event probability in whole sample: A 2023-CPRD AURUM study in England (people aged ≥ 18 yrs) estimated cumulative Heart Failure incidence (incidence proportion) of 0.0156808 (175790 new cases from 11,210,522 people).
Postulated Hazard ratio (Heart Failure): Will need failure probability in exposed and unexposed.

(a) The failure probability (FP) will be 0.09032163 (615/6,809) in the exposed sample with the specific RSV criteria.
(b) The FP will be 0.06189282 (143,040/2,311,092) in the exposed sample with the non-specific RSV criteria.
- But these are likely overestimates, since the denominators include people with heart failure before RSV.
(c) For the unexposed, the failure probability is expected to be lower than or equal to the population value of 0.0156808 from the 2023 CPRD Aurum study in England (people aged ≥ 18 yrs). The HRs implied by these estimates (crude ratios of failure probabilities) will be (a) 5.760014 and (b) 3.947045, assuming the unexposed have the same failure probability as the general population. But let's do the calculations for HRs of 1.1 to 1.5 (minimum).
- It is important to use a realistic HR in terms of a plausible direction towards/away from the null.
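The arithmetic behind the failure probabilities and the implied ratios quoted above can be checked with a few lines of Python. This is only a restatement of the feasibility counts already given; the variable names are illustrative, and the ratios are crude ratios of cumulative probabilities rather than hazard ratios estimated from follow-up time.

```python
# Feasibility counts and the published population figures quoted in the text.
hf_specific, rsv_specific = 615, 6_809
hf_sensitive, rsv_sensitive = 143_040, 2_311_092
hf_population, n_population = 175_790, 11_210_522

# Crude failure probabilities (heart failure cases / people at risk).
fp_specific = hf_specific / rsv_specific
fp_sensitive = hf_sensitive / rsv_sensitive
fp_population = hf_population / n_population

# Crude ratios, taking the general population as the unexposed group.
ratio_specific = fp_specific / fp_population
ratio_sensitive = fp_sensitive / fp_population

print(f"FP specific   = {fp_specific:.8f}")
print(f"FP sensitive  = {fp_sensitive:.8f}")
print(f"FP population = {fp_population:.7f}")
print(f"ratios        = {ratio_specific:.3f}, {ratio_sensitive:.3f}")
```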

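Standard deviation: The standard deviation of the binary exposure indicator, which depends on the ratio of exposed to unexposed individuals. It is calculated as sqrt[(1/n) * ((n - 1)/n)], where n = k + 1 for 1:k matching, so that 1/n is the proportion exposed.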
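As a quick check of the standard deviation used for each matching ratio, the sketch below computes sqrt(p * (1 - p)) with p = 1/(k + 1), the proportion exposed under 1:k matching. The function names are illustrative and not taken from the original analysis code.

```python
import math


def exposure_sd(k):
    """SD of a binary exposure indicator under 1:k matching (exposed:unexposed)."""
    p_exposed = 1 / (k + 1)
    return math.sqrt(p_exposed * (1 - p_exposed))


def matched_design(n_exposed, k):
    """Total sample size and exposure SD for a 1:k matched cohort."""
    n_unexposed = n_exposed * k
    return {
        "matching_ratio": k,
        "N_exposed": n_exposed,
        "N_unexposed": n_unexposed,
        "N": n_exposed + n_unexposed,
        "sd": exposure_sd(k),
    }


for k in range(1, 6):
    row = matched_design(6809, k)
    print(row["matching_ratio"], row["N_unexposed"], row["N"], f"{row['sd']:.7f}")
```

For the specific RSV definition (6,809 exposed) this gives SDs of 0.5, 0.4714045, 0.4330127, 0.4 and 0.372678 for ratios 1:1 to 1:5.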

3 Cox power calculation

Using feasibility counts supplemented with evidence/literature

3.1 Sample with specific RSV definition (55 RSV codes, 6,809 exposed)

  matching_ratio N_exposed N_unexposed     N        sd
1              1      6809        6809 13618 0.5000000
2              2      6809       13618 20427 0.4714045
3              3      6809       20427 27236 0.4330127
4              4      6809       27236 34045 0.4000000
5              5      6809       34045 40854 0.3726780

# 1:1 matching
stata("stpower cox, n(13618) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")

. stpower cox, n(13618) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5
> ) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  | .103191    13618      214      1.1       .5      .05  .015681 |
  | .265059    13618      214      1.2       .5      .05  .015681 |
  | .482853    13618      214      1.3       .5      .05  .015681 |
  | .690927    13618      214      1.4       .5      .05  .015681 |
  | .841967    13618      214      1.5       .5      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided
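The power figures in the table above can be cross-checked outside Stata. The sketch below assumes the usual normal approximation for a Wald test of a single Cox coefficient (expected events E = N x Pr(E), power = Phi(sqrt(E) x SD x |ln HR| - z for alpha/2)); it reproduces the 1:1 figures closely, but it is an independent approximation and not Stata's own code. The function name is illustrative.

```python
from math import log, sqrt
from statistics import NormalDist


def cox_power(n, hr, failp, sd, alpha=0.05):
    """Approximate power of a two-sided Wald test for one Cox coefficient."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    expected_events = n * failp  # unrounded expected number of events
    return NormalDist().cdf(sqrt(expected_events) * sd * abs(log(hr)) - z_alpha)


# 1:1 matching with the specific RSV definition (N = 13,618, SD = 0.5).
for hr in (1.1, 1.2, 1.3, 1.4, 1.5):
    print(hr, round(cox_power(13618, hr, 0.0156808, 0.5), 4))
```

Because power depends on the expected number of events rather than on N itself, the low event probability (about 1.6%) is what limits power at the smaller hazard ratios.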

# 1:2 matching
stata("stpower cox, n(20427) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")

. stpower cox, n(20427) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> 714045) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  | .123872    20427      321      1.1  .471405      .05  .015681 |
  | .336606    20427      321      1.2  .471405      .05  .015681 |
  | .600083    20427      321      1.3  .471405      .05  .015681 |
  | .810245    20427      321      1.4  .471405      .05  .015681 |
  | .927976    20427      321      1.5  .471405      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:3 matching
stata("stpower cox, n(27236) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")

. stpower cox, n(27236) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> 330127) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  | .134132    27236      428      1.1  .433013      .05  .015681 |
  | .371291    27236      428      1.2  .433013      .05  .015681 |
  | .650931    27236      428      1.3  .433013      .05  .015681 |
  | .853371    27236      428      1.4  .433013      .05  .015681 |
  | .952381    27236      428      1.5  .433013      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:4 matching
stata("stpower cox, n(34045) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")

. stpower cox, n(34045) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> ) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  | .140272    34045      534      1.1       .4      .05  .015681 |
  | .391685    34045      534      1.2       .4      .05  .015681 |
  | .678975    34045      534      1.3       .4      .05  .015681 |
  | .874876    34045      534      1.4       .4      .05  .015681 |
  | .963062    34045      534      1.5       .4      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:5 matching
stata("stpower cox, n(40854) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")

. stpower cox, n(40854) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.3
> 72678) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |  .14436    40854      641      1.1  .372678      .05  .015681 |
  | .405094    40854      641      1.2  .372678      .05  .015681 |
  | .696666    40854      641      1.3  .372678      .05  .015681 |
  | .887602    40854      641      1.4  .372678      .05  .015681 |
  | .968885    40854      641      1.5  .372678      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

3.2 Sample with non-specific RSV definition (192 RSV codes, 2,311,092 exposed)

  matching_ratio N_exposed N_unexposed        N        sd
1              1   2311092     2311092  4622184 0.5000000
2              2   2311092     4622184  6933276 0.4714045
3              3   2311092     6933276  9244368 0.4330127
4              4   2311092     9244368 11555460 0.4000000
5              5   2311092    11555460 13866552 0.3726780

# 1:1 matching
stata("stpower cox, n(4622184) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")

. stpower cox, n(4622184) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .5) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |       1  4.6e+06    72480      1.1       .5      .05  .015681 |
  |       1  4.6e+06    72480      1.2       .5      .05  .015681 |
  |       1  4.6e+06    72480      1.3       .5      .05  .015681 |
  |       1  4.6e+06    72480      1.4       .5      .05  .015681 |
  |       1  4.6e+06    72480      1.5       .5      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:2 matching
stata("stpower cox, n(6933276) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")

. stpower cox, n(6933276) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4714045) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |       1  6.9e+06   108720      1.1  .471405      .05  .015681 |
  |       1  6.9e+06   108720      1.2  .471405      .05  .015681 |
  |       1  6.9e+06   108720      1.3  .471405      .05  .015681 |
  |       1  6.9e+06   108720      1.4  .471405      .05  .015681 |
  |       1  6.9e+06   108720      1.5  .471405      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:3 matching
stata("stpower cox, n(9244368) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")

. stpower cox, n(9244368) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4330127) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |       1  9.2e+06   144960      1.1  .433013      .05  .015681 |
  |       1  9.2e+06   144960      1.2  .433013      .05  .015681 |
  |       1  9.2e+06   144960      1.3  .433013      .05  .015681 |
  |       1  9.2e+06   144960      1.4  .433013      .05  .015681 |
  |       1  9.2e+06   144960      1.5  .433013      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:4 matching
stata("stpower cox, n(11555460) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")

. stpower cox, n(11555460) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(
> 0.4) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |       1  1.2e+07   181199      1.1       .4      .05  .015681 |
  |       1  1.2e+07   181199      1.2       .4      .05  .015681 |
  |       1  1.2e+07   181199      1.3       .4      .05  .015681 |
  |       1  1.2e+07   181199      1.4       .4      .05  .015681 |
  |       1  1.2e+07   181199      1.5       .4      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:5 matching
stata("stpower cox, n(13866552) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")

. stpower cox, n(13866552) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(
> 0.372678) hr table

Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |       1  1.4e+07   217439      1.1  .372678      .05  .015681 |
  |       1  1.4e+07   217439      1.2  .372678      .05  .015681 |
  |       1  1.4e+07   217439      1.3  .372678      .05  .015681 |
  |       1  1.4e+07   217439      1.4  .372678      .05  .015681 |
  |       1  1.4e+07   217439      1.5  .372678      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

4 Cox sample-size calculation

Assumes:

Hazard ratios: 1.1 - 1.5
Power: 80%
Gives N and number of events
Note: the expected number of total heart failure events from the feasibility counts is 299,017
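The required number of events and total sample size under these assumptions can be sketched with the same normal approximation: E = (z for alpha/2 + z for power)^2 / (SD x ln HR)^2 and N = E / Pr(E), both rounded up. This is an illustrative cross-check in Python, assuming that approximation; the function name is hypothetical and the rounding convention is a guess at what the Stata output displays.

```python
from math import ceil, log
from statistics import NormalDist


def cox_sample_size(hr, failp, sd, power=0.8, alpha=0.05):
    """Approximate (N, events) needed to detect `hr` with a two-sided Wald test."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    events = z_total ** 2 / (sd * log(hr)) ** 2  # required number of events
    n_total = ceil(events / failp)               # inflate by the event probability
    return n_total, ceil(events)


# 1:1 matching (SD = 0.5), population failure probability 0.0156808.
for hr in (1.1, 1.2, 1.3, 1.4, 1.5):
    print(hr, cox_sample_size(hr, 0.0156808, 0.5))
```

Comparing the required N with the available sample (13,618 to 40,854 for the specific definition) shows which hazard ratios are detectable at each matching ratio.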

# 1:1 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")

. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .5) hr table

Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |      .8   220405     3457      1.1       .5      .05  .015681 |
  |      .8    60232      945      1.2       .5      .05  .015681 |
  |      .8    29087      457      1.3       .5      .05  .015681 |
  |      .8    17685      278      1.4       .5      .05  .015681 |
  |      .8    12179      191      1.5       .5      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:2 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")

. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4714045) hr table

Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |      .8   247956     3889      1.1  .471405      .05  .015681 |
  |      .8    67761     1063      1.2  .471405      .05  .015681 |
  |      .8    32723      514      1.3  .471405      .05  .015681 |
  |      .8    19896      312      1.4  .471405      .05  .015681 |
  |      .8    13701      215      1.5  .471405      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:3 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")

. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4330127) hr table

Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |      .8   293873     4609      1.1  .433013      .05  .015681 |
  |      .8    80309     1260      1.2  .433013      .05  .015681 |
  |      .8    38782      609      1.3  .433013      .05  .015681 |
  |      .8    23580      370      1.4  .433013      .05  .015681 |
  |      .8    16238      255      1.5  .433013      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:4 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")

. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4) hr table

Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |      .8   344383     5401      1.1       .4      .05  .015681 |
  |      .8    94112     1476      1.2       .4      .05  .015681 |
  |      .8    45448      713      1.3       .4      .05  .015681 |
  |      .8    27633      434      1.4       .4      .05  .015681 |
  |      .8    19029      299      1.5       .4      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

# 1:5 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")

. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .372678) hr table

Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]

  +---------------------------------------------------------------+
  |   Power        N        E       HR       SD    Alpha*   Pr(E) |
  |---------------------------------------------------------------|
  |      .8   396729     6222      1.1  .372678      .05  .015681 |
  |      .8   108417     1701      1.2  .372678      .05  .015681 |
  |      .8    52356      821      1.3  .372678      .05  .015681 |
  |      .8    31833      500      1.4  .372678      .05  .015681 |
  |      .8    21922      344      1.5  .372678      .05  .015681 |
  +---------------------------------------------------------------+
  * two sided

5 Sample size calculation, Self-Controlled Case-Series

Information needed:

Duration of post-exposure risk period (to use 28, 90 and 180 days)
rho(#) relative incidence rate between exposure periods (to use 1.5,2.0,2.5,3.0)
Duration of entire observation period (01 Jan 2015 until 31 Dec 2023)
Assumes 80% power, 0.05 alpha, all subjects are exposed, binomial method.
command = sampsi_sccs , power(#) rho(#); the rest of the information/parameters are entered sequentially.

Sample sizes (number of heart failure events) for varied parameters are shown below, assuming:

80% power
0.05 alpha
all subjects are exposed
binomial method and
total observation period of 3286 days.

risk_period_days	relative_incidence	events_N
28	1.5	4618
28	2.0	1365
28	2.5	696
28	3.0	441
90	1.5	1478
90	2.0	441
90	2.5	227
90	3.0	145
180	1.5	771
180	2.0	233
180	2.5	121
180	3.0	78
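The event counts in the table above can be cross-checked with a binomial sample-size formula using the arcsine transformation, with all cases exposed and a two-sided alpha of 0.05 (5%). With r = risk period / observation period, the probability that an event falls in the risk window is r under the null and rho x r / (rho x r + 1 - r) under the alternative. That this matches the "binomial method" of sampsi_sccs is an inference from the numbers agreeing, not something confirmed from the package source; the function name is illustrative.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def sccs_events(risk_days, obs_days, rho, power=0.8, alpha=0.05):
    """Approximate number of events (cases) needed for an SCCS, all cases exposed."""
    r = risk_days / obs_days                # fraction of observation time at risk
    p_null = r                              # P(event in risk window) if rho = 1
    p_alt = rho * r / (rho * r + 1 - r)     # same probability under relative incidence rho
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    effect = asin(sqrt(p_alt)) - asin(sqrt(p_null))
    return ceil(z_total ** 2 / (4 * effect ** 2))


for risk_days in (28, 90, 180):
    for rho in (1.5, 2.0, 2.5, 3.0):
        print(risk_days, rho, sccs_events(risk_days, 3286, rho))
```

For example, `sccs_events(28, 3286, 2.0)` gives 1365, in line with the corresponding row of the table.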