matching_ratio N_exposed N_unexposed N sd
1 1 6809 6809 13618 0.5000000
2 2 6809 13618 20427 0.4714045
3 3 6809 20427 27236 0.4330127
4 4 6809 27236 34045 0.4000000
5 5 6809 34045 40854 0.3726780
Sample-size and Power calculations: Respiratory syncytial virus (RSV) and incidence of heart failure
1 Introduction
PURPOSE: To conduct statistical power calculations for our study on the effects of respiratory syncytial virus (RSV) in triggering heart failure
APPROACH: As discussed yesterday, it would be great if you could prepare some sample size calculations for a cohort and a self-controlled case series study. These are for a grant application I am putting in to the British Heart Foundation looking at the role of respiratory syncytial virus (RSV) in triggering heart failure.
One challenge with RSV is that it may be difficult to identify in primary care data, as symptoms are not specific and similar to many other respiratory virus infections. People with an acute respiratory illness who present to the GP are not routinely tested with PCR in the UK. Therefore, the coding used by GPs may be non-specific i.e. an RSV infection may be coded as ‘acute respiratory infection’. Once people go to hospital, they are more likely to be tested, so we will pick up more specific episodes in linked hospital data.
We will be doing the study in CPRD Aurum, linked to Hospital Episode Statistics data from 01/01/2015 to 31/12/2023. At present (without an approved study protocol), we only have access to very limited Aurum data for feasibility counts. Alex Lyons has generated some counts to assess feasibility - numbers are shown in red below. As there is the issue of diagnostic difficulty, we have used two codelists for RSV. One is specific, containing 55 RSV codes, and the other is a broad, sensitive definition, including 192 codes.
The study questions for which we need power calculations are:
- Does having an episode of RSV result in an increase in the risk of heart failure? Study design = cohort. Analysis = Cox regression?
- Does relative incidence of heart failure increase in a 28 day risk window after RSV compared to other time periods? Study design = self-controlled case series. Analysis = conditional Poisson regression?
Could you please:
Use the feasibility data provided by Alex, located on drive Z, to conduct two separate power calculations for a cohort study, using first the specific RSV definition, then the sensitive definition. To do this, you will need to merge the specific RSV data with the heart failure data to remove any individuals who had a heart failure code before their RSV date. (Note the key variables needed are patid, the medcode and the date of event.) I have attached a guide to sample size calculations for Cox regression in case it is helpful. You can also look at ‘sample size calculation in R’ and ‘Sample Size Calculation for Cox Proportional Hazards Regression’. Please keep the data on the Z drive in the same folder and delete it when you are done. Can you let me know when you have deleted the data?
For the self-controlled case series calculations, you can just use the counts below from Alex, i.e. no need to use individual-level data. There is a Stata package, sampsi_sccs, that allows calculation of sample size for self-controlled case series. Relevant parameters are a risk period of 28 days, a proportion exposed of 1, and a rho (or relative incidence) of between 2 and 3. You can also use the R package ‘SCCS’. I hope this all makes sense. Let me know if you have any queries.
Information from Alex:
Query 1) How many people in CPRD Aurum have been diagnosed with RSV defined using the specific_RSV.dta_define.txt codelist at aged 40 years or over from 01 Jan 2015 until 31 Dec 2023? Response: You provided 55 RSV codes and 30 HF codes (please check that is correct). There were 6,809 with at least one ‘specific’ RSV code meeting the criteria.
Query 2) How many of those also have a heart failure code at aged 40 years or more during the same time period using HF.dta_define.txt? Response: Of those with a ‘specific’ code, 615 also had at least one heart failure code that met the criteria.
Query 3) How many people in CPRD Aurum have been diagnosed with RSV defined using the sensitive_RSV.dta_define.txt codelist at aged 40 years or over from 01 Jan 2015 until 31 Dec 2023? Response: You provided 192 RSV codes and the same 30 HF codes. There were 2,311,092 with at least one ‘sensitive’ RSV code meeting the criteria.
Query 4) How many of those also have a heart failure code at aged 40 years or more during the same time period using HF.dta_define.txt? Response: Of those with a ‘sensitive’ code, 143,040 also had at least one heart failure code that met the criteria.
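The merge-and-exclude step requested above (dropping anyone whose first heart failure code precedes their first RSV code) can be sketched in base R as follows. The data frames, the column name eventdate and the toy values are hypothetical stand-ins for the feasibility extracts; only patid, medcode and an event date are named in the request.

```r
# Hypothetical toy extracts: one row per coded event (patid, medcode, eventdate).
rsv <- data.frame(
  patid     = c(1, 2, 3, 4),
  medcode   = c("r1", "r2", "r1", "r3"),
  eventdate = as.Date(c("2016-02-01", "2017-11-15", "2019-01-10", "2021-12-05"))
)
hf <- data.frame(
  patid     = c(2, 3, 3, 5),
  medcode   = c("h1", "h1", "h2", "h1"),
  eventdate = as.Date(c("2018-03-01", "2018-06-01", "2019-05-01", "2020-01-01"))
)

# Keep each person's earliest event and rename the date column.
first_event <- function(df, date_name) {
  df <- df[order(df$patid, df$eventdate), ]
  df <- df[!duplicated(df$patid), c("patid", "eventdate")]
  names(df)[2] <- date_name
  df
}

# Left join: everyone with an RSV code, plus their first HF date if any.
cohort <- merge(first_event(rsv, "rsv_date"),
                first_event(hf, "hf_date"),
                by = "patid", all.x = TRUE)

# Exclude people whose first HF code is dated before their first RSV code.
prior_hf <- !is.na(cohort$hf_date) & cohort$hf_date < cohort$rsv_date
cohort   <- cohort[!prior_hf, ]

# Remaining HF codes are incident events after RSV.
cohort$hf_event <- !is.na(cohort$hf_date)
crude_fp <- mean(cohort$hf_event)  # crude failure proportion among the exposed
print(cohort)
```

In the toy data, patient 3 is removed (heart failure before RSV) and patient 5 never enters the cohort (no RSV code), which is the behaviour the exclusion is meant to achieve.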
2 Power/Sample-size calculation, COX PH Regression
Information needed:
- Sample size: Feasibility counts show
- (a) 6,809 with at least one RSV code meeting the specific criteria (55 codes).
- (b) 2,311,092 with at least one RSV code meeting the non-specific criteria (192 codes).
- The final sample size in each scenario will be based on the matching ratio; we will look at 1:1 up to 1:5.
- Statistical power: 80% for the sample-size calculations. In the power calculations, power is the quantity being estimated.
- Alpha: To use 5%.
- Postulated Heart Failure event probability in whole sample: A 2023-CPRD AURUM study in England (people aged ≥ 18 yrs) estimated cumulative Heart Failure incidence (incidence proportion) of 0.0156808 (175790 new cases from 11,210,522 people).
- Postulated Hazard ratio (Heart Failure): Will need failure probability in exposed and unexposed.
- (a) The failure probability (FP) will be 0.09032163 (615/6,809) in the exposed sample with the specific RSV criteria.
- (b) The FP will be 0.06189282 (143,040/2,311,092) in the exposed sample with the non-specific RSV criteria.
- But these are likely overestimates, since the denominators include people with heart failure before RSV.
- (c) For the unexposed, the failure rate is expected to be lower than or equal to the population rate of 0.0156808 in the 2023-CPRD AURUM study in England (people aged ≥ 18 yrs). The HRs based on these estimates would be (a) 5.760014 and (b) 3.947045, assuming the unexposed have the same failure rate as the general population. But let's use HRs of 1.1 to 1.5 (minimum).
- important to use a realistic HR in terms of a plausible direction towards/away from null.
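- Standard deviation: SD of the binary exposure indicator, which depends on the ratio of exposed to unexposed individuals. With 1:k matching and n = k + 1, the proportion exposed is 1/n and SD = sqroot of [(1/n)*((n-1)/n)], e.g. 0.5 for 1:1 and 0.4714045 for 1:2.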
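As a minimal sketch, the inputs listed above can be derived in a few lines of R: the total sample size and exposure SD for each matching ratio, and the crude failure probabilities relative to the population estimate. The function and variable names are illustrative, not taken from the original analysis code.

```r
# Total N and SD of the binary exposure indicator for 1:k matching.
matching_table <- function(n_exposed, ratios = 1:5) {
  p_exposed <- 1 / (ratios + 1)              # proportion exposed under 1:k matching
  data.frame(
    matching_ratio = ratios,
    N_exposed      = n_exposed,
    N_unexposed    = n_exposed * ratios,
    N              = n_exposed * (ratios + 1),
    sd             = sqrt(p_exposed * (1 - p_exposed))
  )
}

specific  <- matching_table(6809)      # specific RSV definition
sensitive <- matching_table(2311092)   # non-specific (sensitive) definition
print(specific)

# Crude failure probabilities and their ratio to the population estimate.
pop_fp       <- 175790 / 11210522
fp_specific  <- 615 / 6809
fp_sensitive <- 143040 / 2311092
crude_ratio  <- c(specific = fp_specific, sensitive = fp_sensitive) / pop_fp
print(round(crude_ratio, 3))
```

The crude ratios are risk ratios computed from cumulative proportions, so they only approximate hazard ratios.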
3 Cox power calculation
Using feasibility counts supplemented with evidence/literature
3.1 Sample with specific RSV definition (55 rsv codes,6809 exposed)
# 1:1 matching
stata("stpower cox, n(13618) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")
. stpower cox, n(13618) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5
> ) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .103191 13618 214 1.1 .5 .05 .015681 |
| .265059 13618 214 1.2 .5 .05 .015681 |
| .482853 13618 214 1.3 .5 .05 .015681 |
| .690927 13618 214 1.4 .5 .05 .015681 |
| .841967 13618 214 1.5 .5 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:2 matching
stata("stpower cox, n(20427) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")
. stpower cox, n(20427) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> 714045) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .123872 20427 321 1.1 .471405 .05 .015681 |
| .336606 20427 321 1.2 .471405 .05 .015681 |
| .600083 20427 321 1.3 .471405 .05 .015681 |
| .810245 20427 321 1.4 .471405 .05 .015681 |
| .927976 20427 321 1.5 .471405 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:3 matching
stata("stpower cox, n(27236) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")
. stpower cox, n(27236) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> 330127) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .134132 27236 428 1.1 .433013 .05 .015681 |
| .371291 27236 428 1.2 .433013 .05 .015681 |
| .650931 27236 428 1.3 .433013 .05 .015681 |
| .853371 27236 428 1.4 .433013 .05 .015681 |
| .952381 27236 428 1.5 .433013 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:4 matching
stata("stpower cox, n(34045) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")
. stpower cox, n(34045) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4
> ) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .140272 34045 534 1.1 .4 .05 .015681 |
| .391685 34045 534 1.2 .4 .05 .015681 |
| .678975 34045 534 1.3 .4 .05 .015681 |
| .874876 34045 534 1.4 .4 .05 .015681 |
| .963062 34045 534 1.5 .4 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:5 matching
stata("stpower cox, n(40854) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")
. stpower cox, n(40854) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.3
> 72678) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .14436 40854 641 1.1 .372678 .05 .015681 |
| .405094 40854 641 1.2 .372678 .05 .015681 |
| .696666 40854 641 1.3 .372678 .05 .015681 |
| .887602 40854 641 1.4 .372678 .05 .015681 |
| .968885 40854 641 1.5 .372678 .05 .015681 |
+---------------------------------------------------------------+
* two sided
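As a cross-check on the Stata output in this section, the power values can be approximated in R with a Schoenfeld/Hsieh-Lavori type normal approximation: expected events E = N × Pr(E), and power = Φ(|log HR| × SD × √E − z for alpha/2). This is a sketch under that assumption; the function name cox_power is illustrative, and the formula is inferred from agreement with the printed values rather than from the stpower source.

```r
# Approximate power of the Wald test for one binary covariate in a Cox model.
cox_power <- function(n, hr, failp, sd, alpha = 0.05) {
  events <- n * failp                          # expected number of events (unrounded)
  pnorm(abs(log(hr)) * sd * sqrt(events) - qnorm(1 - alpha / 2))
}

hr        <- seq(1.1, 1.5, by = 0.1)
ratios    <- 1:5
n_exposed <- 6809                               # specific RSV definition

# Rows: hazard ratios; columns: matching ratios 1:1 to 1:5.
power_grid <- sapply(ratios, function(k) {
  p <- 1 / (k + 1)
  cox_power(n_exposed * (k + 1), hr, failp = 0.0156808, sd = sqrt(p * (1 - p)))
})
dimnames(power_grid) <- list(paste0("HR=", hr), paste0("1:", ratios))
print(round(power_grid, 3))
```

Read against the tables above, 80% power for HR = 1.5 is reached at 1:1 matching, and for HR = 1.4 from 1:2 matching onwards.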
3.2 Sample with non-specific RSV definition (192 rsv codes, 2,311,092 exposed)
matching_ratio N_exposed N_unexposed N sd
1 1 2311092 2311092 4622184 0.5000000
2 2 2311092 4622184 6933276 0.4714045
3 3 2311092 6933276 9244368 0.4330127
4 4 2311092 9244368 11555460 0.4000000
5 5 2311092 11555460 13866552 0.3726780
# 1:1 matching
stata("stpower cox, n(4622184) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")
. stpower cox, n(4622184) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .5) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| 1 4.6e+06 72480 1.1 .5 .05 .015681 |
| 1 4.6e+06 72480 1.2 .5 .05 .015681 |
| 1 4.6e+06 72480 1.3 .5 .05 .015681 |
| 1 4.6e+06 72480 1.4 .5 .05 .015681 |
| 1 4.6e+06 72480 1.5 .5 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:2 matching
stata("stpower cox, n(6933276) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")
. stpower cox, n(6933276) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4714045) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| 1 6.9e+06 108720 1.1 .471405 .05 .015681 |
| 1 6.9e+06 108720 1.2 .471405 .05 .015681 |
| 1 6.9e+06 108720 1.3 .471405 .05 .015681 |
| 1 6.9e+06 108720 1.4 .471405 .05 .015681 |
| 1 6.9e+06 108720 1.5 .471405 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:3 matching
stata("stpower cox, n(9244368) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")
. stpower cox, n(9244368) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4330127) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| 1 9.2e+06 144960 1.1 .433013 .05 .015681 |
| 1 9.2e+06 144960 1.2 .433013 .05 .015681 |
| 1 9.2e+06 144960 1.3 .433013 .05 .015681 |
| 1 9.2e+06 144960 1.4 .433013 .05 .015681 |
| 1 9.2e+06 144960 1.5 .433013 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:4 matching
stata("stpower cox, n(11555460) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")
. stpower cox, n(11555460) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(
> 0.4) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| 1 1.2e+07 181199 1.1 .4 .05 .015681 |
| 1 1.2e+07 181199 1.2 .4 .05 .015681 |
| 1 1.2e+07 181199 1.3 .4 .05 .015681 |
| 1 1.2e+07 181199 1.4 .4 .05 .015681 |
| 1 1.2e+07 181199 1.5 .4 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:5 matching
stata("stpower cox, n(13866552) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")
. stpower cox, n(13866552) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(
> 0.372678) hr table
Estimated power for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| 1 1.4e+07 217439 1.1 .372678 .05 .015681 |
| 1 1.4e+07 217439 1.2 .372678 .05 .015681 |
| 1 1.4e+07 217439 1.3 .372678 .05 .015681 |
| 1 1.4e+07 217439 1.4 .372678 .05 .015681 |
| 1 1.4e+07 217439 1.5 .372678 .05 .015681 |
+---------------------------------------------------------------+
* two sided
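Power is reported as 1 for every scenario in this section because the expected number of events is so large. One way to make that concrete is to invert the same kind of normal approximation for the smallest hazard ratio detectable with 80% power. The sketch below assumes E = N × Pr(E) and a Wald test on log HR; the function name is illustrative.

```r
# Smallest HR (> 1) detectable at the given power under the normal approximation.
min_detectable_hr <- function(n, failp, sd, alpha = 0.05, power = 0.80) {
  events <- n * failp
  exp((qnorm(1 - alpha / 2) + qnorm(power)) / (sd * sqrt(events)))
}

ratios    <- 1:5
n_exposed <- 2311092                    # non-specific (sensitive) RSV definition
sd_k      <- sqrt((1 / (ratios + 1)) * (ratios / (ratios + 1)))

mdhr <- min_detectable_hr(n_exposed * (ratios + 1), failp = 0.0156808, sd = sd_k)
names(mdhr) <- paste0("1:", ratios)
print(round(mdhr, 4))
```

Under these assumptions the detectable hazard ratios sit close to 1, far below the 1.1 to 1.5 range examined, which is why the tables show no variation in power.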
4 Cox sample-size calculation
Assumes:
- Hazard ratios: 1.1 - 1.5
- Power: 80%
- Gives N and number of events
- Note: the expected number of total heart failure events from the feasibility counts is 299,017
# 1:1 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.5) hr table")
. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .5) hr table
Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .8 220405 3457 1.1 .5 .05 .015681 |
| .8 60232 945 1.2 .5 .05 .015681 |
| .8 29087 457 1.3 .5 .05 .015681 |
| .8 17685 278 1.4 .5 .05 .015681 |
| .8 12179 191 1.5 .5 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:2 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4714045) hr table")
. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4714045) hr table
Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .8 247956 3889 1.1 .471405 .05 .015681 |
| .8 67761 1063 1.2 .471405 .05 .015681 |
| .8 32723 514 1.3 .471405 .05 .015681 |
| .8 19896 312 1.4 .471405 .05 .015681 |
| .8 13701 215 1.5 .471405 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:3 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4330127) hr table")
. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4330127) hr table
Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .8 293873 4609 1.1 .433013 .05 .015681 |
| .8 80309 1260 1.2 .433013 .05 .015681 |
| .8 38782 609 1.3 .433013 .05 .015681 |
| .8 23580 370 1.4 .433013 .05 .015681 |
| .8 16238 255 1.5 .433013 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:4 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.4) hr table")
. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .4) hr table
Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .8 344383 5401 1.1 .4 .05 .015681 |
| .8 94112 1476 1.2 .4 .05 .015681 |
| .8 45448 713 1.3 .4 .05 .015681 |
| .8 27633 434 1.4 .4 .05 .015681 |
| .8 19029 299 1.5 .4 .05 .015681 |
+---------------------------------------------------------------+
* two sided
# 1:5 matching
stata("stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0.372678) hr table")
. stpower cox, power(0.8) alpha(0.05) hratio(1.1(0.1)1.5) failp(0.0156808) sd(0
> .372678) hr table
Estimated sample size for Cox PH regression
Wald test, hazard metric
H0: [b1, b2, ..., bp] = [0, b2, ..., bp]
+---------------------------------------------------------------+
| Power N E HR SD Alpha* Pr(E) |
|---------------------------------------------------------------|
| .8 396729 6222 1.1 .372678 .05 .015681 |
| .8 108417 1701 1.2 .372678 .05 .015681 |
| .8 52356 821 1.3 .372678 .05 .015681 |
| .8 31833 500 1.4 .372678 .05 .015681 |
| .8 21922 344 1.5 .372678 .05 .015681 |
+---------------------------------------------------------------+
* two sided
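The required numbers of events and participants above can be approximated with the corresponding sample-size formula, E = (z for alpha/2 + z for power)² / (SD² × (log HR)²) and N = E / Pr(E), each rounded up. This is a sketch assuming that formula; the function name cox_sample_size is illustrative, and agreement with stpower is checked only against the printed tables.

```r
# Required events (E) and total sample size (N) for a Cox model Wald test.
cox_sample_size <- function(hr, failp, sd, alpha = 0.05, power = 0.80) {
  z      <- qnorm(1 - alpha / 2) + qnorm(power)
  events <- z^2 / (sd^2 * log(hr)^2)          # unrounded required events
  data.frame(HR = hr,
             N  = ceiling(events / failp),
             E  = ceiling(events))
}

hr  <- seq(1.1, 1.5, by = 0.1)
one_to_one <- cox_sample_size(hr, failp = 0.0156808, sd = 0.5)
print(one_to_one)

# Compare the required N with what 1:1 matching on 6,809 exposed provides.
one_to_one$feasible <- one_to_one$N <= 13618
```

With 1:1 matching on the specific definition (N = 13,618), only HR = 1.5 is feasible at 80% power, consistent with the power table in section 3.1.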
5 Sample size calculation, Self-Controlled Case-Series
Information needed:
- Duration of post-exposure risk period (to use 28, 90 and 180 days)
- rho(#) relative incidence rate between exposure periods (to use 1.5,2.0,2.5,3.0)
- Duration of entire observation period (01 Jan 2015 until 31 Dec 2023)
- Assumes 80% power, 0.05 alpha, all subjects are exposed, binomial method.
- command = sampsi_sccs, power(#) rho(#); the rest of the information/parameters are entered sequentially.
Sample sizes (number of heart failure events) for varied parameters are shown below, assuming:
- 80% power
- 0.05 alpha
- all subjects are exposed
- binomial method and
- total observation period of 3286 days.
risk_period_days | relative_incidence | events_N |
---|---|---|
28 | 1.5 | 4618 |
28 | 2.0 | 1365 |
28 | 2.5 | 696 |
28 | 3.0 | 441 |
90 | 1.5 | 1478 |
90 | 2.0 | 441 |
90 | 2.5 | 227 |
90 | 3.0 | 145 |
180 | 1.5 | 771 |
180 | 2.0 | 233 |
180 | 2.5 | 121 |
180 | 3.0 | 78 |
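The event counts in this table can be approximated in R with an arcsine-transformed binomial formula. With all cases exposed, the chance that an event falls in the risk period is p0 = r under the null and p1 = ρr / (ρr + 1 − r) under the alternative, where r is the risk period divided by the observation period. The sketch assumes a two-sided alpha of 0.05 and 80% power. The function name sccs_events is illustrative, and it is an assumption that the binomial method of sampsi_sccs corresponds to this formula.

```r
# Number of events needed for an SCCS analysis, arcsine binomial approximation.
sccs_events <- function(risk_days, rho, obs_days = 3286,
                        alpha = 0.05, power = 0.80) {
  r  <- risk_days / obs_days                 # share of observation time at risk
  p0 <- r                                    # P(event in risk period) under H0
  p1 <- rho * r / (rho * r + 1 - r)          # same probability under H1
  z  <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(z^2 / (4 * (asin(sqrt(p1)) - asin(sqrt(p0)))^2))
}

# Observation period: 01 Jan 2015 to 31 Dec 2023.
obs_days <- as.numeric(as.Date("2023-12-31") - as.Date("2015-01-01"))

grid <- expand.grid(relative_incidence = c(1.5, 2.0, 2.5, 3.0),
                    risk_period_days   = c(28, 90, 180))
grid$events_N <- mapply(sccs_events,
                        grid$risk_period_days, grid$relative_incidence,
                        MoreArgs = list(obs_days = obs_days))
print(grid[, c("risk_period_days", "relative_incidence", "events_N")])
```

Longer risk periods and larger relative incidences both reduce the number of events required, which matches the pattern in the table.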