Include totals across all strata and percentages so that the reader of your table can see the evidence for whether or not vitamin A is beneficial. Ignoring age and sex for the moment, estimate the proportion of children who died in the vitamin A group and in the control group and estimate the difference in mortality rates between the two groups.
# import data
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nepal621 = read_csv("nepal621_v2.csv")
## Rows: 27121 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): sex, age, trt, status
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create a table that displays the numbers of deaths and numbers of survivors for the vitamin A and control groups separately for the six age-by-sex strata.
nepal621 %>%
group_by(trt, sex, age) %>%
summarize(N_Alive = sum(status=="Alive"),
Perc_Alive = N_Alive/n(),
N_Died = sum(status=="Died"),
Perc_Died = N_Died/n(),
Total=n())
## `summarise()` has grouped output by 'trt', 'sex'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 8
## # Groups: trt, sex [4]
## trt sex age N_Alive Perc_Alive N_Died Perc_Died Total
## <chr> <chr> <chr> <int> <dbl> <int> <dbl> <int>
## 1 Placebo Female 1 to 2 2615 0.973 72 0.0268 2687
## 2 Placebo Female 3 to 4 2542 0.990 25 0.00974 2567
## 3 Placebo Female <1 1219 0.946 69 0.0536 1288
## 4 Placebo Male 1 to 2 2770 0.983 47 0.0167 2817
## 5 Placebo Male 3 to 4 2677 0.990 26 0.00962 2703
## 6 Placebo Male <1 1276 0.962 51 0.0384 1327
## 7 Vit A Female 1 to 2 2724 0.981 52 0.0187 2776
## 8 Vit A Female 3 to 4 2529 0.994 15 0.00590 2544
## 9 Vit A Female <1 1291 0.960 54 0.0401 1345
## 10 Vit A Male 1 to 2 2837 0.986 40 0.0139 2877
## 11 Vit A Male 3 to 4 2752 0.996 12 0.00434 2764
## 12 Vit A Male <1 1366 0.958 60 0.0421 1426
## adj % (Make it a little prettier!)
nepal621 %>%
group_by(trt, sex, age) %>%
summarize(N_Alive = sum(status=="Alive"),
Perc_Alive = round(N_Alive/n(),4)*100,
N_Died = sum(status=="Died"),
Perc_Died = round(N_Died/n(),4)*100,
Total=n())
## `summarise()` has grouped output by 'trt', 'sex'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 8
## # Groups: trt, sex [4]
## trt sex age N_Alive Perc_Alive N_Died Perc_Died Total
## <chr> <chr> <chr> <int> <dbl> <int> <dbl> <int>
## 1 Placebo Female 1 to 2 2615 97.3 72 2.68 2687
## 2 Placebo Female 3 to 4 2542 99.0 25 0.97 2567
## 3 Placebo Female <1 1219 94.6 69 5.36 1288
## 4 Placebo Male 1 to 2 2770 98.3 47 1.67 2817
## 5 Placebo Male 3 to 4 2677 99.0 26 0.96 2703
## 6 Placebo Male <1 1276 96.2 51 3.84 1327
## 7 Vit A Female 1 to 2 2724 98.1 52 1.87 2776
## 8 Vit A Female 3 to 4 2529 99.4 15 0.59 2544
## 9 Vit A Female <1 1291 96.0 54 4.01 1345
## 10 Vit A Male 1 to 2 2837 98.6 40 1.39 2877
## 11 Vit A Male 3 to 4 2752 99.6 12 0.43 2764
## 12 Vit A Male <1 1366 95.8 60 4.21 1426
# (Ignoring age and sex for the moment) estimate the proportion of children who died in the vitamin A group and in the control group and estimate the difference in mortality rates between the two groups.
nepal621 %>%
group_by(trt) %>%
summarize(N_Alive = sum(status=="Alive"),
Perc_Alive = round(N_Alive/n(),4)*100,
N_Died = sum(status=="Died"),
Perc_Died = round(N_Died/n(),4)*100,
Total=n())
## # A tibble: 2 × 6
## trt N_Alive Perc_Alive N_Died Perc_Died Total
## <chr> <int> <dbl> <int> <dbl> <int>
## 1 Placebo 13099 97.8 290 2.17 13389
## 2 Vit A 13499 98.3 233 1.7 13732
# Calculate a 95% confidence interval for each true mortality rate.
nepal621 %>%
group_by(trt) %>%
summarize(N_Alive = sum(status=="Alive"),
p_Alive = N_Alive/n(),
N_Died = sum(status=="Died"),
p_Died = N_Died/n(),
Total=n(),
se_Died = sqrt(p_Died *(1-p_Died)/Total),
CI_L = p_Died - 1.96*se_Died,
CI_U = p_Died + 1.96*se_Died)
## # A tibble: 2 × 9
## trt N_Alive p_Alive N_Died p_Died Total se_Died CI_L CI_U
## <chr> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 Placebo 13099 0.978 290 0.0217 13389 0.00126 0.0192 0.0241
## 2 Vit A 13499 0.983 233 0.0170 13732 0.00110 0.0148 0.0191
# By hand, calculate a 95% confidence interval for the true difference in mortality rates by treatment. Confirm using R.
p.1 = 290/13389 # fill in sample proportion for first sample
n.1 = 13389 # fill in sample size for first sample
p.2 = 233/13732 # fill in sample proportion for second sample
n.2 = 13732 # fill in sample size for second sample
diff = p.1 - p.2
se = sqrt(p.1*(1-p.1)/n.1 + p.2*(1-p.2)/n.2) # standard error
diff - 1.96*se; diff + 1.96*se # confidence interval
## [1] 0.001413756
## [1] 0.007970053
I displayed the confidence intervals for the six strata and for the overall groups (Table A) on the following graph. We can find the expected difference in mortality rate for the vitamin A and control groups, and it’s confidence interval for each stratum. We can find larger confidence intervals for those aged 3-4 year, which was due to smaller sample size of the age 3-4 group compared with the otehrs.
# 95% CI for the difference in mortality rates for the vitamin A and control groups separately for each age-sex stratum.
nepal621 %>%
group_by(sex, age, trt) %>%
summarize(N_Died = sum(status=="Died"),
p_Died = N_Died/n(),
Total = n())
## `summarise()` has grouped output by 'sex', 'age'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 6
## # Groups: sex, age [6]
## sex age trt N_Died p_Died Total
## <chr> <chr> <chr> <int> <dbl> <int>
## 1 Female 1 to 2 Placebo 72 0.0268 2687
## 2 Female 1 to 2 Vit A 52 0.0187 2776
## 3 Female 3 to 4 Placebo 25 0.00974 2567
## 4 Female 3 to 4 Vit A 15 0.00590 2544
## 5 Female <1 Placebo 69 0.0536 1288
## 6 Female <1 Vit A 54 0.0401 1345
## 7 Male 1 to 2 Placebo 47 0.0167 2817
## 8 Male 1 to 2 Vit A 40 0.0139 2877
## 9 Male 3 to 4 Placebo 26 0.00962 2703
## 10 Male 3 to 4 Vit A 12 0.00434 2764
## 11 Male <1 Placebo 51 0.0384 1327
## 12 Male <1 Vit A 60 0.0421 1426
## alternatively, calculate the CIs directly within each age/sex strata
nepal621 %>%
group_by(sex, age) %>%
summarize(N_Plac = sum(trt=="Placebo"),
p_Plac = sum(status=="Died" & trt=="Placebo")/N_Plac,
N_VitA = sum(trt=="Vit A"),
p_VitA = sum(status=="Died" & trt=="Vit A")/N_VitA,
diff = p_Plac - p_VitA,
se = sqrt(p_Plac*(1 - p_Plac)/N_Plac + p_VitA*(1 - p_VitA)/N_VitA),
CI_L = diff - 1.96*se,
CI_U = diff + 1.96*se)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups`
## argument.
## # A tibble: 6 × 10
## # Groups: sex [2]
## sex age N_Plac p_Plac N_VitA p_VitA diff se CI_L CI_U
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Female 1 to 2 2687 0.0268 2776 0.0187 0.00806 0.00404 0.000144 0.0160
## 2 Female 3 to 4 2567 0.00974 2544 0.00590 0.00384 0.00246 -0.000983 0.00867
## 3 Female <1 1288 0.0536 1345 0.0401 0.0134 0.00825 -0.00274 0.0296
## 4 Male 1 to 2 2817 0.0167 2877 0.0139 0.00278 0.00325 -0.00360 0.00916
## 5 Male 3 to 4 2703 0.00962 2764 0.00434 0.00528 0.00226 0.000856 0.00970
## 6 Male <1 1327 0.0384 1426 0.0421 -0.00364 0.00749 -0.0183 0.0110
# draw the graph for the above 95% ci
#utils::install.packages("Hmisc")
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(dplyr)
dataForCIplot = nepal621 %>%
group_by(sex, age) %>%
dplyr::summarize(N_Plac = sum(trt=="Placebo"),
p_Plac = sum(status=="Died" & trt=="Placebo")/N_Plac,
N_VitA = sum(trt=="Vit A"),
p_VitA = sum(status=="Died" & trt=="Vit A")/N_VitA,
diff = p_Plac - p_VitA,
se = sqrt(p_Plac*(1 - p_Plac)/N_Plac + p_VitA*(1 - p_VitA)/N_VitA),
CI_L = diff - 1.96*se,
CI_U = diff + 1.96*se)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups`
## argument.
agestrata = c(1,2,3,4,5,6,7)
agestrata_labels = c("F < 1", "F 1-2", "F 3-4", "M < 1", "M 1-2", "M 3-4", "Overall")
diff = c(dataForCIplot$diff, 0.0047)
LL = c(dataForCIplot$CI_L, 0.00142)
UL = c(dataForCIplot$CI_U, 0.00798)
## Add labels to the axes
errbar(x = agestrata,
y = diff,
yplus = LL,
yminus = UL,
xaxt = "n", #xaxt removes the numberic lables
xlab = "Age/Gender Group", #label for x-axis
ylab = "Difference in Mortality Rates (Placebo - VitA)") #label for y-axis()
## Add a title
title(main="95% Confidence Intervals for Difference in Mortality Rates")
## Add group labels for the age-gender groups
axis(side=1, #1 = the bottom of graph
at=agestrata, #where on x-axis; same as "x" in errbar
labels=agestrata_labels) #what the labels are
# Add horizontal line at zero
abline(h=0, col="red")
Vitamin A supplementation is associated with large reductions in mortality in children. Our results indicate statistically significant decrease in the mortality rates for the vitamin A group compared with that of the control group in overall children (risk difference = 0.0047, 95% CI = (0.00798, 0.00142)). When comparing with each age-gender strata, we can find vitamine A supplementation significantly decreases the mortality rate in only girls aged under 1 year and boys aged 1-2 year. Several factors such as risk of bias in study design, small sample size, and risk of measurement error can contribute to increase uncertainty in our estimated values.
# linear regression
model1 = glm(as.factor(status) ~ trt, data=nepal621, family=binomial(link="identity"))
summary(model1)
##
## Call:
## glm(formula = as.factor(status) ~ trt, family = binomial(link = "identity"),
## data = nepal621)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.021660 0.001258 17.217 < 2e-16 ***
## trtVit A -0.004692 0.001673 -2.805 0.00503 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5166.0 on 27120 degrees of freedom
## Residual deviance: 5158.1 on 27119 degrees of freedom
## AIC: 5162.1
##
## Number of Fisher Scoring iterations: 2
confint(model1)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.019284720 0.024217501
## trtVit A -0.007988047 -0.001420949
Based on the following logistic regression, there are statistically significant difference in the log odds between vitamin A and placedo group in both age strata.
Among children aged more than 3 years, I can estimate the odds of death is statistically significantly reduced by 48% in children who have vitamin A (OR = 0.523, 95%CI = (0.328, 0.835)). With the confidence intervals excluding the null value, I can reject the null hypothesis that vitamin A is not effective for the age group. Among children aged less than 3 years, I can estimate the odds of death is reduced by 18% in children who have vitamin A (OR = 0.826, 95%CI = (0.684, 0.998)). I can also reject the null hypothesis with the confidence intervals.
Among children aged more than 3 years, the odds of death is statistically significantly reduced by 48% in children who have vitamin A compared with those without vitamin A (OR = 0.523, 95%CI = (0.328, 0.835)). Among children aged less than 3 years, the odds of death statistically significantly decrease by 18% in children who have vitamin A compared with the control group (OR = 0.826, 95%CI = (0.684, 0.998)). As a result, vitamin A supplementation is associated with reductions in mortality in children.
# import dataset
library(tidyverse)
nepal621 = read_csv("nepal621.csv")
## Rows: 27121 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): sex, age, trt, status
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create two age groups (< 3 years, ≥ 3 years)
nepal621 = nepal621 %>%
mutate(agegp = ifelse(age == "3-4", "3+ years", "<3 years"))
# Calculates the odds by age group and trt:
nepal621 %>%
group_by(agegp, trt) %>%
dplyr::summarize(N_Alive = sum(status=="Alive"),
N_Died = sum(status=="Died"),
Odds = N_Died/N_Alive)
## `summarise()` has grouped output by 'agegp'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 5
## # Groups: agegp [2]
## agegp trt N_Alive N_Died Odds
## <chr> <chr> <int> <int> <dbl>
## 1 3+ years Placebo 5219 51 0.00977
## 2 3+ years Vit A 5281 27 0.00511
## 3 <3 years Placebo 7880 239 0.0303
## 4 <3 years Vit A 8218 206 0.0251
nepal621 %>%
group_by(agegp) %>%
dplyr::summarize(N_Alive_P = sum(status=="Alive" & trt=="Placebo"),
N_Died_P = sum(status=="Died" & trt=="Placebo"),
N_Alive_V = sum(status=="Alive" & trt=="Vit A"),
N_Died_V = sum(status=="Died" & trt=="Vit A"),
OR = (N_Died_V/N_Alive_V)/(N_Died_P/N_Alive_P),
se = sqrt(1/N_Alive_P + 1/N_Died_P + 1/N_Alive_V + 1/N_Died_V),
CI_L = exp(log(OR)-1.96*se),
CI_U = exp(log(OR)+1.96*se))
## # A tibble: 2 × 9
## agegp N_Alive_P N_Died_P N_Alive_V N_Died_V OR se CI_L CI_U
## <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 3+ years 5219 51 5281 27 0.523 0.239 0.328 0.835
## 2 <3 years 7880 239 8218 206 0.826 0.0964 0.684 0.998
Now separately for each age stratum, estimate the odds ratio by using a logistic regression of the binary survival indicator on vitamin A.
# seperately for each age startum, estimate the odds ratio by using a logistic regression of the binary survival indicator
nepal621.lowage = nepal621 %>% filter(agegp == "<3 years")
model2 = glm(as.factor(status) ~ trt, data=nepal621.lowage,
family=binomial(link="logit"))
summary(model2) # This summary is on the logOR scale
##
## Call:
## glm(formula = as.factor(status) ~ trt, family = binomial(link = "logit"),
## data = nepal621.lowage)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.49562 0.06566 -53.240 <2e-16 ***
## trtVit A -0.19059 0.09637 -1.978 0.048 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4095.8 on 16542 degrees of freedom
## Residual deviance: 4091.9 on 16541 degrees of freedom
## AIC: 4095.9
##
## Number of Fisher Scoring iterations: 6
exp(model2$coefficients) # We exponentiate to get on the OR scale
## (Intercept) trtVit A
## 0.03032995 0.82647439
exp(confint(model2)) # We can also exponentiate the CI to the OR scale
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.02659672 0.03440764
## trtVit A 0.68385047 0.99800770
nepal621.highage = nepal621 %>% filter(agegp == "3+ years")
model3 = glm(as.factor(status) ~ trt, data=nepal621.highage,
family=binomial(link="logit"))
summary(model3)
##
## Call:
## glm(formula = as.factor(status) ~ trt, family = binomial(link = "logit"),
## data = nepal621.highage)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.6282 0.1407 -32.892 < 2e-16 ***
## trtVit A -0.6478 0.2388 -2.713 0.00667 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 921.36 on 10577 degrees of freedom
## Residual deviance: 913.62 on 10576 degrees of freedom
## AIC: 917.62
##
## Number of Fisher Scoring iterations: 8
exp(model3$coefficients)
## (Intercept) trtVit A
## 0.009771987 0.523196364
exp(confint(model3))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.007320666 0.01272323
## trtVit A 0.323301346 0.82799851