Study Design - Sample Size Estimation

1. Choose an appropriate sample size for estimating the 16-month mortality rate for children younger than 3 years of age.

Based on the Nepal data set, I make the following assumptions to calculate the sample size for estimating the 16-month mortality rate for children younger than 3 years of age: p = perc_died for the placebo group = 0.0294, q = 1 – p = 0.9706, d = 0.05, and α = 0.05 (so z = 1.96). With these values, the sample size for estimating mortality to within ±5 percentage points (d = 0.05) is 44 (rounded up from 43.85).

If there is no prior information about the proportion, I assume p = q = 0.5. This maximizes pq and therefore gives the most conservative sample size. The result is 385 (rounded up from 384.16), roughly nine times larger than the sample size based on the Nepal estimate.

# import dataset
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nepalA = read_csv("nepal621.csv")
## Rows: 27121 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): sex, age, trt, status
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(nepalA)
##      sex                age                trt               status         
##  Length:27121       Length:27121       Length:27121       Length:27121      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
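# filtering for the children younger than 3 years of age
# note: age is read as character (see the column specification above), so
# `age <= 3` is a string comparison, not a numeric one; check table(nepalA$age)
# to confirm that only the intended age groups are kept
nepalbelow3 = nepalA %>%
  filter(age <= 3)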

# calculating p and q for each treatment group
nepalbelow3 %>%
  group_by(trt) %>%
  summarize(n_alive = sum(status == "Alive"),
            perc_alive = n_alive / n(),
            n_died = sum(status == "Died"),
            perc_died = n_died / n(),
            obs = n())
## # A tibble: 2 × 6
##   trt     n_alive perc_alive n_died perc_died   obs
##   <chr>     <int>      <dbl>  <int>     <dbl> <int>
## 1 Placebo    7880      0.971    239    0.0294  8119
## 2 Vit A      8218      0.976    206    0.0245  8424
# sample size to estimate one proportion: n = z^2 * p * q / d^2
n1 <- (1.96)^2 * 0.0294 * (1 - 0.0294) / (0.05)^2
n1
## [1] 43.84901
n2 <- (1.96)^2 * 0.5 * 0.5 / (0.05)^2
n2
## [1] 384.16
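The one-sample formula used above, n = z²pq/d², can be wrapped in a small helper that also rounds up to a whole child. This is a minimal sketch: `ci_sample_size()` is a hypothetical name, not a function from the original analysis or from any package, and it assumes a normal-approximation confidence interval.

```r
# Hypothetical helper (not from the original analysis): sample size needed to
# estimate a single proportion p to within +/- d using a normal-approximation CI.
ci_sample_size <- function(p, d, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)        # 1.959964 for a 95% interval
  ceiling(z^2 * p * (1 - p) / d^2)      # round up to a whole child
}

ci_sample_size(p = 0.0294, d = 0.05)    # placebo-group mortality, +/- 5 points
ci_sample_size(p = 0.5,    d = 0.05)    # most conservative choice of p
ci_sample_size(p = 0.0294, d = 0.005)   # +/- 0.5 percentage points
```

The three calls should return 44, 385 and 4,385. Because n scales with 1/d², shrinking the margin from ±5 to ±0.5 percentage points multiplies the requirement by about 100.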

2.

Now suppose you have a chance to investigate the effect of vitamin A supplementation on the mortality of children under 3 years of age. Confirm from the data set that the 16-month mortality in the placebo group is 0.0294 and the 16-month mortality in the Vitamin A group is 0.0245 for the Nepal study. The estimated relative risk of death in the placebo group as compared to the Vitamin A group is 0.0294/0.0245 = 1.2. Assuming a significance level of 0.05 and power of 80%, the sample size needed in the new study to detect a relative risk of 1.2 is 17,144 children per group according to the results below. A total sample size of 34,288 children would be required.

The data set confirms that the 16-month mortality is 0.0294 in the placebo group and 0.0245 in the Vitamin A group of the Nepal study (the code is in Question 1). With a significance level of 0.05 and 80% power, the new study needs 17,144 children per group (17,143.9 rounded up) to detect a relative risk of 1.2, according to the output of the following command. This means 34,288 children in total.

# power.prop.test() can be used with the results of the Nepal trial to choose the size of the vitamin A and control groups (assuming equal sizes for both groups) for the new study.
power.prop.test(n=NULL, p1=0.0294, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 17143.9
##              p1 = 0.0294
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
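# Z values behind the calculation
# Z(alpha/2) for a two-sided test at alpha = 0.05
qnorm(0.975)
## [1] 1.959964
# Z(beta) for 80% power
qnorm(0.8)
## [1] 0.8416212
# Z(beta) for 90% power
qnorm(0.9)
## [1] 1.281552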
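The relative risk and the total sample size quoted in the text can be reproduced from the `n` component that `power.prop.test()` returns. This is a minimal sketch. It assumes equal allocation to the two arms and rounds the per-group n up to a whole child. The variable names are illustrative.

```r
# Values taken from the Nepal summary table in Question 1
p_placebo <- 0.0294   # 16-month mortality, placebo group
p_vita    <- 0.0245   # 16-month mortality, vitamin A group

rr <- p_placebo / p_vita                 # relative risk, about 1.2

fit <- power.prop.test(p1 = p_placebo, p2 = p_vita,
                       sig.level = 0.05, power = 0.80,
                       alternative = "two.sided")
n_per_group <- ceiling(fit$n)            # 17143.9 rounded up
n_total     <- 2 * n_per_group           # two equal arms
c(rr = rr, n_per_group = n_per_group, n_total = n_total)
```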

Construct a table that displays the total sample sizes required under various assumptions about the mortality rate in the control group and the relative risk of interest. Assume a significance level of 0.05 and 80% power. Vary the assumptions by:

  1. Assuming that the control group mortality rate (risk) is:
     1. the same as that observed in the Nepal placebo group of children < 3 years of age,
     2. or 0.5% lower,
     3. or 0.5% higher.
  2. Assuming that the relative risk of death for children in the control group as compared to children receiving vitamin A is hypothesized to be:
     1. 1.2 (the same as the relative risk that was estimated for Nepali children in this age group),
     2. or 1.5,
     3. or 1.75.
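The prompt describes a grid of three control-group rates by three relative risks. The calculations below instead vary one assumption at a time and hold the vitamin A group risk at 0.0245. As an alternative reading of the prompt (an assumption, not the approach used below), both factors can vary jointly, with the vitamin A group risk derived as p_control / RR. This sketch follows the relative ±0.5% interpretation used in scenarios a-2 and a-3 (×0.995 and ×1.005). An absolute reading (±0.005) would be a one-line change. `grid` and `total_n()` are hypothetical names.

```r
# Alternative 3 x 3 grid (assumption): control risk and RR vary jointly,
# and the vitamin A group risk is derived as p_control / RR.
p_nepal <- 0.0294
grid <- expand.grid(p_control = p_nepal * c(0.995, 1, 1.005),  # 0.5% lower, same, 0.5% higher
                    rr        = c(1.2, 1.5, 1.75))
grid$p_vita <- grid$p_control / grid$rr

# Hypothetical helper: total for two equal arms, each rounded up
total_n <- function(p_control, p_vita, power) {
  n <- power.prop.test(p1 = p_control, p2 = p_vita, sig.level = 0.05,
                       power = power, alternative = "two.sided")$n
  2 * ceiling(n)
}

grid$total_80 <- mapply(total_n, grid$p_control, grid$p_vita,
                        MoreArgs = list(power = 0.80))
grid$total_90 <- mapply(total_n, grid$p_control, grid$p_vita,
                        MoreArgs = list(power = 0.90))

# Control-risk rows by relative-risk columns, 80% power
xtabs(total_80 ~ p_control + rr, data = grid)
```

Under this reading, the cell with the Nepal control rate and RR = 1.2 matches the a-1 result. The other cells differ from the one-at-a-time scenarios because the vitamin A group risk changes with RR.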
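#### 4a-1 : p1 = 0.0294 (by hand)
z1 = 1.96   # Z(alpha/2), alpha = 0.05 two-sided
z2 = 0.84   # Z(beta), 80% power

p1 = 0.0294
q1 = 1 - p1
p2 = 0.0245
q2 = 1 - p2
p = (p1 + p2)/2
q = 1 - p
delta = p1 - p2
# note: higher power -> larger z2 -> larger sample size

numerator <- z1*(2*p*q)^(1/2) + z2*(p1*q1 + p2*q2)^(1/2)
sample_size = numerator^2/delta^2   # per group; the numerator must be squared
sample_size*2                       # total for two equal groups
# about 34,249 in total (about 17,124.5 per group); the small gap from
# power.prop.test()'s 17,143.9 per group comes from rounding z1 and z2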
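The hand calculation above uses the normal-approximation sample-size formula for comparing two proportions with equal group sizes. As far as I can tell, this matches the closed form that `power.prop.test()` solves numerically. The formula, with the Nepal values plugged in (square roots shown to six decimals), is:

$$
n_{\text{per group}} = \frac{\left[\, z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} \;+\; z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}\,\right]^2}{(p_1 - p_2)^2},
\qquad \bar{p} = \frac{p_1 + p_2}{2} = 0.02695,\quad \bar{q} = 1 - \bar{p}
$$

$$
n_{\text{per group}} \approx \frac{\left[\,1.96\,(0.229014) + 0.84\,(0.228988)\,\right]^2}{(0.0294 - 0.0245)^2}
\approx \frac{(0.64122)^2}{0.00002401} \approx 17{,}125
$$

Using the unrounded values z = 1.959964 and 0.841621 instead gives about 17,143.9 per group, the same value `power.prop.test()` reports.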
#### a-1 
power.prop.test(n=NULL, p1=0.0294, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 17143.9
##              p1 = 0.0294
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
#### a-2 
power.prop.test(n=NULL, p1=0.029253, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 18172.49
##              p1 = 0.029253
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
#### a-3
power.prop.test(n=NULL, p1=0.029547, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 16202.55
##              p1 = 0.029547
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
#### b-1 
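rr <- c(1.2, 1.5, 1.75)
# control-group risks implied by each relative risk, holding the vitamin A
# group risk p2 at 0.0245 (the calls below use these values directly)
p1 = rr*p2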

power.prop.test(n=NULL, p1 = 1.2*0.0245, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 17143.9
##              p1 = 0.0294
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
#### b-2 
power.prop.test(n=NULL, p1 = 1.5*0.0245, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 3104.334
##              p1 = 0.03675
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
#### b-3 
power.prop.test(n=NULL, p1 = 1.75*0.0245, p2=0.0245, sig.level=0.05, power=0.8,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1512.275
##              p1 = 0.042875
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
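The six results above can be collected into the requested table of total sample sizes at 80% power. This sketch reuses the same inputs as the calls above, with the vitamin A group risk fixed at 0.0245. It doubles the rounded-up per-group n to get each total. `table_80` is an illustrative name.

```r
# Six scenarios from the calls above; the vitamin A group risk p2 stays at 0.0245
p1 <- c(a1 = 0.0294, a2 = 0.029253, a3 = 0.029547,
        b1 = 1.2 * 0.0245, b2 = 1.5 * 0.0245, b3 = 1.75 * 0.0245)

n_each <- sapply(p1, function(p) {
  power.prop.test(p1 = p, p2 = 0.0245, sig.level = 0.05, power = 0.80,
                  alternative = "two.sided")$n
})

table_80 <- data.frame(p1          = p1,
                       n_per_group = ceiling(n_each),      # round each arm up
                       total       = 2 * ceiling(n_each))  # two equal arms
table_80
```

The totals should come out as 34,288, 36,346, 32,406, 34,288, 6,210 and 3,026 for scenarios a-1 through b-3.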

Construct another table that displays the total sample sizes required under the same varying assumptions of the mortality rate in the control group and the relative risk of interest. This time, assume a significance level of 0.05 and 90% power. Comment on what you observe.

# a-1 
power.prop.test(n=NULL, p1=0.0294, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 22950.32
##              p1 = 0.0294
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=22951*2
n
## [1] 45902
# a-2 
power.prop.test(n=NULL, p1=0.029253, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 24327.31
##              p1 = 0.029253
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=24328*2
n
## [1] 48656
# a-3
power.prop.test(n=NULL, p1=0.029547, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 21690.12
##              p1 = 0.029547
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=21691*2
n
## [1] 43382
# b-1
power.prop.test(n=NULL, p1 = 1.2*0.0245, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 22950.32
##              p1 = 0.0294
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=22951*2
n
## [1] 45902
# b-2 
power.prop.test(n=NULL, p1 = 1.5*0.0245, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 4155.324
##              p1 = 0.03675
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=4156*2
n
## [1] 8312
# b-3 
power.prop.test(n=NULL, p1 = 1.75*0.0245, p2=0.0245, sig.level=0.05, power=0.9,
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 2024.008
##              p1 = 0.042875
##              p2 = 0.0245
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
n=2025*2
n
## [1] 4050
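To compare the two tables directly, the same six scenarios can be computed at 80% and 90% power side by side. This is a minimal sketch. It assumes equal arms and rounds each arm up, as in the calls above. `total_n()` and `comparison` are hypothetical names.

```r
# Same six scenarios as above; the vitamin A group risk p2 stays at 0.0245
p1 <- c(a1 = 0.0294, a2 = 0.029253, a3 = 0.029547,
        b1 = 1.2 * 0.0245, b2 = 1.5 * 0.0245, b3 = 1.75 * 0.0245)

# Hypothetical helper: total sample size for two equal arms at a given power
total_n <- function(p, power) {
  n <- power.prop.test(p1 = p, p2 = 0.0245, sig.level = 0.05,
                       power = power, alternative = "two.sided")$n
  2 * ceiling(n)  # round each arm up, then double
}

comparison <- data.frame(
  p1       = p1,
  total_80 = sapply(p1, total_n, power = 0.80),
  total_90 = sapply(p1, total_n, power = 0.90)
)
comparison$ratio <- round(comparison$total_90 / comparison$total_80, 3)
comparison
```

Every ratio lands near 1.34, so moving from 80% to 90% power raises each requirement by roughly a third, whatever the scenario.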

To choose the overall sample size for a new trial of vitamin A and 16-month mortality in children younger than 3 years of age, I varied the assumed control-group mortality rate and the relative risk of interest, using a significance level of 0.05 and 90% power. Assuming a control-group mortality rate of 0.0294, as observed in the Nepal placebo group, the trial requires 45,902 participants in total. If the control-group mortality rate is 0.5% lower or higher, the required total ranges from 43,382 (higher rate) to 48,656 (lower rate). Assuming a relative risk of 1.2, as estimated from the Nepal data, the total is again 45,902. It falls sharply to 8,312 for a relative risk of 1.5 and to 4,050 for a relative risk of 1.75. This sensitivity analysis shows that the required sample size depends far more on the hypothesized relative risk than on a small change in the control-group mortality rate. Compared with 80% power, 90% power increases every sample size by about 34% (for example, from 34,288 to 45,902 for a relative risk of 1.2). Taking a conservative approach, I would choose the largest sample size among these scenarios, 48,656, for our study.
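The roughly constant 34% increase follows from the sample-size formula: n is proportional to the square of a weighted sum of the two z values. When the two square-root terms in the formula are nearly equal, as they are here because p1 and p2 are close, the 90%/80% ratio is approximately (z₀.₉₇₅ + z₀.₉₀)² / (z₀.₉₇₅ + z₀.₈₀)². The sketch below compares this approximation with the exact ratio for the Nepal scenario. The variable names are illustrative.

```r
# Approximate inflation factor when moving from 80% to 90% power at a two-sided
# alpha of 0.05; it treats the two square-root terms of the formula as equal.
z_alpha   <- qnorm(0.975)
inflation <- (z_alpha + qnorm(0.90))^2 / (z_alpha + qnorm(0.80))^2
inflation

# Exact ratio for the Nepal scenario (p1 = 0.0294, p2 = 0.0245)
n80 <- power.prop.test(p1 = 0.0294, p2 = 0.0245, power = 0.80)$n
n90 <- power.prop.test(p1 = 0.0294, p2 = 0.0245, power = 0.90)$n
n90 / n80
```

Both values come out near 1.339, consistent with the tables above.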