Quantitative Research Methodology Workshops - SoN

Welcome to our 2022 quantitative methods workshop.

The program is to run over three sessions.

Workshop 1 : Wednesday 22 September 2021, 9:00am – 12:00pm - Research design – planning a study and determining an appropriate sample size: What is the appropriate study design for your project? What data will you need to collect, and how many study participants do you need?
Workshop 2: Wednesday 29 September 2021 - Preparing data for analysis and descriptive statistics : How to prepare data for analysis and presenting the characteristics of your study population. Preparing tables and figures for publication.
Workshop 3 :Wednesday 29 September 2021 Inferential statistics and combining data from multiple studies in a meta-analysis: Comparing groups and adjusting for potential confounding factors. How do you combine results from multiple studies in a meta-analysis to summarize current evidence?

For the practical sessions we will be using R language for statistical computing.

For our practical sessions we will be using the R Statistical language (https://www.r-project.org/) via the R Studio editor (https://www.rstudio.com/products/rstudio/download/#download).

The R language is a object orientated statistical program, that can be used for all stages of the research process, including:

the planning of a study, meta-analysis of current literature and sample size;
the cleaning and management of study data;
the analysis; and,
the preparation of manuscripts for publication - including all tables and figures.

Workshop 1 : Research design – planning a study and determining an appropriate sample size:

Research design: What is the most appropriate study design for your research aim?

What is the appropriate study design for your project?
What data will you need to collect, and
How many study participants do you need?

What type of study are you and your team planning to undertake? Is your project aimed at evaluating the effectiveness of a new interventions? Identifying risk factors for a specific outcome of interest? Are you interested in describing the burden of a problem among a specific population and at a specific time? Or, do you want to evaluate the accuracy of a screening or diagnostic test. Depending on the specific aim of your project, consideration needs to be given to the best study design to address your aims.

Table 1. Research aim, suitable study design, and outcome measure.

Aim of study	Best study design	Outcomes measures
Effectiveness of a new intervention	Randomised controlled trial	Relative Risk Reduction, Absolute Risk Reduction, Risk Difference, and confidence intervals
Risk factor for disease	Cohort or Case-control study (rare outcome)	Relative Risk, Odds Ratio for exposure, and confidence intervals
Burden of a disease among a specific population	Cohort for incidence ( short event, may results in death), cross-sectional survey (disease for long period of time)	Incidence, prevalence, and confidence intervals
Development of diagnostic test	Case-control	Sensitivity, Specificity, Area Under the ROC Curve, and confidence intervals

Reading data into R.

Data sets can be loaded into R various ways. The simplist, is via a spreadsheet, such as a comma separated volume (csv). Using the ‘read.csv’ function. However, a working directory needs to be set-up.

library(epiR)
options(width = 800, fig.width = 8, fig.height = 6)
# set working directly
setwd('~/OneDrive - University of Wollongong/Data/')

# read dataset into R, naming the data set dta.
dta <- read.csv(file = 'framingham.csv', header = T, sep = ',')

# check the names of the columns
names(dta)

##  [1] "male"            "age"             "education"       "currentSmoker"   "cigsPerDay"      "BPMeds"          "prevalentStroke" "prevalentHyp"    "diabetes"        "totChol"         "sysBP"           "diaBP"           "BMI"             "heartRate"       "glucose"         "TenYearCHD"

Loading and using R packages

The first time you use a R package it will be need to be loaded into R using the “install.packages” function. Once installed into R, the package is loaded a the start of your R session, using “library(package.name)”. A common packages to create summary tables is Frank Harrel’s Hmisc package, and Bendix Carsten’s Epi package.

# Frank Harrels's Hmisc package
## install.packages('Hmisc')

# Bendix Carsten's Epi package
## install.packages('Epi')

Describing data

# examples of describing data 
library(Hmisc)
library(Epi)

# creat an ID variable from 1 to the end of the last row
dta$id <- 1:length(dta[,1])

# describe a single variable
describe(dta$age)

## dta$age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0       39    0.999    49.58     9.85       37       39       42       49       56       62       64 
## 
## lowest : 32 33 34 35 36, highest: 66 67 68 69 70

# Mean and SD age
mean(dta$age)

## [1] 49.58019

sd(dta$age)

## [1] 8.572942

# median and interquartile range (IQR)
quantile(dta$age, probs = c(0.5,0.25,0.75))

## 50% 25% 75% 
##  49  42  56

# mean and sd of age based on sex
tapply(dta$age, dta$male, mean)

##        0        1 
## 49.79587 49.29341

tapply(dta$age, dta$male, sd)

##        0        1 
## 8.597883 8.533584

# describe the whole data file
## describe(dta)

# create a variable equal 1 for counting purposes
dta$n <- 1

# Create a new variable called sex and label numeric values (0 = female, 1 = male)
dta$sex <- factor(dta$male, levels = 0:1, labels = c('female','male'))
table(dta$sex)

## 
## female   male 
##   2420   1820

# look at first 10 records
head(dta, 10)

##    male age education currentSmoker cigsPerDay BPMeds prevalentStroke prevalentHyp diabetes totChol sysBP diaBP   BMI heartRate glucose TenYearCHD id n    sex
## 1     1  39         4             0          0      0               0            0        0     195 106.0    70 26.97        80      77          0  1 1   male
## 2     0  46         2             0          0      0               0            0        0     250 121.0    81 28.73        95      76          0  2 1 female
## 3     1  48         1             1         20      0               0            0        0     245 127.5    80 25.34        75      70          0  3 1   male
## 4     0  61         3             1         30      0               0            1        0     225 150.0    95 28.58        65     103          1  4 1 female
## 5     0  46         3             1         23      0               0            0        0     285 130.0    84 23.10        85      85          0  5 1 female
## 6     0  43         2             0          0      0               0            1        0     228 180.0   110 30.30        77      99          0  6 1 female
## 7     0  63         1             0          0      0               0            0        0     205 138.0    71 33.11        60      85          1  7 1 female
## 8     0  45         2             1         20      0               0            0        0     313 100.0    71 21.68        79      78          0  8 1 female
## 9     1  52         1             0          0      0               0            1        0     260 141.5    89 26.36        76      79          0  9 1   male
## 10    1  43         1             1         30      0               0            1        0     225 162.0   107 23.61        93      88          0 10 1   male

# looks at the last 10 records
tail(dta, 10)

##      male age education currentSmoker cigsPerDay BPMeds prevalentStroke prevalentHyp diabetes totChol sysBP diaBP   BMI heartRate glucose TenYearCHD   id n    sex
## 4231    0  56         1             1          3      0               0            1        0     268 170.0   102 22.89        57      NA          0 4231 1 female
## 4232    1  58         3             0          0      0               0            1        0     187 141.0    81 24.96        80      81          0 4232 1   male
## 4233    1  68         1             0          0      0               0            1        0     176 168.0    97 23.14        60      79          1 4233 1   male
## 4234    1  50         1             1          1      0               0            1        0     313 179.0    92 25.97        66      86          1 4234 1   male
## 4235    1  51         3             1         43      0               0            0        0     207 126.5    80 19.71        65      68          0 4235 1   male
## 4236    0  48         2             1         20     NA               0            0        0     248 131.0    72 22.00        84      86          0 4236 1 female
## 4237    0  44         1             1         15      0               0            0        0     210 126.5    87 19.16        86      NA          0 4237 1 female
## 4238    0  52         2             0          0      0               0            0        0     269 133.5    83 21.47        80     107          0 4238 1 female
## 4239    1  40         3             0          0      0               0            1        0     185 141.0    98 25.60        67      72          0 4239 1   male
## 4240    0  39         3             1         30      0               0            0        0     196 133.0    86 20.91        85      80          0 4240 1 female

Creating summary tables using the Epi package.

# Using the Epi package stat.table create summary  of rates 
# various factors
stat.table(sex, contents = list('Age (mean)' = mean(age),
                                'Age (SD)' = sd(age),
                                'HR (mean)' = mean(heartRate),
                                'HR (SD)' = sd(heartRate),
                                'BMI (mean)' = mean(BMI)),
                                 margin = T,
                                 data = dta)

##  ------------------------------------------------- 
##  sex          Age     Age      HR HR (SD)     BMI  
##            (mean)    (SD)  (mean)          (mean)  
##  ------------------------------------------------- 
##  female     49.80    8.60   77.10   12.09   25.51  
##  male       49.29    8.53   74.26   11.74   26.19  
##                                                    
##  Total      49.58    8.57   75.88   12.03   25.80  
##  -------------------------------------------------

stat.table(sex, contents = list('BMI (SD)' = sd(BMI),
                                'BSL (mean)' = mean(glucose),
                                'BSL (SD)' = sd(glucose),
                                'SBP (mean)' = mean(sysBP),
                                'SBP (sd)' = sd(sysBP)),
                                 margin = T,
                                 data = dta)

##  ------------------------------------------------- 
##  sex          BMI     BSL     BSL     SBP     SBP  
##              (SD)  (mean)    (SD)  (mean)    (sd)  
##  ------------------------------------------------- 
##  female      4.50   81.84   23.24  133.04   23.79  
##  male        3.42   82.12   24.83  131.44   19.42  
##                                                    
##  Total       4.08   81.96   23.95  132.35   22.03  
##  -------------------------------------------------

stat.table(sex, contents = list('smk (%)' = ratio(currentSmoker,n),
                                'HT (%)' = ratio(prevalentHyp,n),
                                'DM (%)' = ratio(diabetes,n),
                                'CHD (%)' = ratio(TenYearCHD,n)),
                                 margin = T,
                                 data = dta)

##  ----------------------------------------- 
##  sex      smk (%)  HT (%)  DM (%) CHD (%)  
##  ----------------------------------------- 
##  female      0.41    0.31    0.02    0.12  
##  male        0.61    0.31    0.03    0.19  
##                                            
##  Total       0.49    0.31    0.03    0.15  
##  -----------------------------------------

Using Frank Harrell’s Hmisc package can be easier, as continuous and frequencey data can easily be combined into a single table. This approach can create Table 1 for a manuscript.

# Using Hmsic summary function

print(summary(sex ~ age + 
                    BMI +
                    heartRate +
                    glucose +
                    sysBP +
                    currentSmoker +
                    prevalentHyp +
                    diabetes, method = 'reverse',
                    overall = TRUE,
                    test = T,
                    data = dta), prmsd = T, 
                                 digits = 2,
                                 prtest = 'P')

## 
## 
## Descriptive Statistics by sex
## 
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |             |N   |female                     |male                       |Combined                   |P-value          |
## |             |    |(N=2420)                   |(N=1820)                   |(N=4240)                   |                 |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |age          |4240|42.0/49.0/57.0  49.8+/- 8.6|42.0/48.0/56.0  49.3+/- 8.5|42.0/49.0/56.0  49.6+/- 8.6|           0.054 |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |BMI          |4221|22.5/24.7/27.7  25.5+/- 4.5|24.0/26.1/28.3  26.2+/- 3.4|23.1/25.4/28.0  25.8+/- 4.1|           <0.001|
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |heartRate    |4239|     69/75/85  77+/-12     |     66/75/80  74+/-12     |     68/75/83  76+/-12     |           <0.001|
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |glucose      |3852|     72/78/86  82+/-23     |     70/78/87  82+/-25     |     71/78/87  82+/-24     |            0.56 |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |sysBP        |4240|   116/128/145  133+/- 24  |   118/128/141  131+/- 19  |   117/128/144  132+/- 22  |            0.72 |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |currentSmoker|4240|        41%  ( 989)        |        61%  (1106)        |        49%  (2095)        |           <0.001|
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |prevalentHyp |4240|        31%  ( 746)        |        31%  ( 571)        |        31%  (1317)        |            0.7  |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |diabetes     |4240|         2%  (  57)        |         3%  (  52)        |         3%  ( 109)        |            0.31 |
## +-------------+----+---------------------------+---------------------------+---------------------------+-----------------+

Subset the data to include study particpants aged 70 years for more.

# check the range of ages of the original dataset.
min(dta$age)

## [1] 32

max(dta$age)

## [1] 70

# check if any ages are missing
describe(dta$age)

## dta$age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0       39    0.999    49.58     9.85       37       39       42       49       56       62       64 
## 
## lowest : 32 33 34 35 36, highest: 66 67 68 69 70

table(is.na(dta$age))

## 
## FALSE 
##  4240

Using the subset function to limit the data to study particpants aged 50 year or older.

# subset the data to focus on study participants aged 50 years or older.
d <- subset(dta, age > 49)

# Create 10-year age groups
d$agegrp <- cut(d$age, breaks = c(49,54,59,64,70), labels = c('50-54','55-59','60-64','65-70'))

# check the results
tapply(d$age, d$agegrp, min)

## 50-54 55-59 60-64 65-70 
##    50    55    60    65

tapply(d$age, d$agegrp, max)

## 50-54 55-59 60-64 65-70 
##    54    59    64    70

Workshop 2: Wednesday 29 September 2021 Preparing data for analysis and descriptive statistic.

How to prepare data for analysis and presenting the characteristics of your study population. Preparing tables and figures for publication.

Week 2 Practical sessions r-code

## Framingham data
## install.packages('epiR')
## install.packages('Hmisc')
## install.packages('Epi')

library(epiR)
library(Hmisc)
library(Epi)
library(epiDisplay)

## set working directory
## setwd('c:/data/') ## Windows
setwd('~/OneDrive - University of Wollongong/Data/')
dta <- read.csv(file = 'framingham.csv', header = T, sep = ',')

# look at names
names(dta)

##  [1] "male"            "age"             "education"       "currentSmoker"   "cigsPerDay"      "BPMeds"          "prevalentStroke" "prevalentHyp"    "diabetes"        "totChol"         "sysBP"           "diaBP"           "BMI"             "heartRate"       "glucose"         "TenYearCHD"

# check distribution of variables and creating categories
# Age
min(dta$age)

## [1] 32

max(dta$age)

## [1] 70

mean(dta$age)

## [1] 49.58019

sd(dta$age)

## [1] 8.572942

quantile(dta$age, probs = c(0.5,0.25, 0.75))

## 50% 25% 75% 
##  49  42  56

describe(dta)

## dta 
## 
##  16  Variables      4240  Observations
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## male 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2    0.735     1820   0.4292   0.4901 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0       39    0.999    49.58     9.85       37       39       42       49       56       62       64 
## 
## lowest : 32 33 34 35 36, highest: 66 67 68 69 70
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## education 
##        n  missing distinct     Info     Mean      Gmd 
##     4135      105        4    0.894    1.979    1.093 
##                                   
## Value          1     2     3     4
## Frequency   1720  1253   689   473
## Proportion 0.416 0.303 0.167 0.114
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## currentSmoker 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2     0.75     2095   0.4941      0.5 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## cigsPerDay 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4211       29       33    0.862    9.006       12        0        0        0        0       20       25       30 
## 
## lowest :  0  1  2  3  4, highest: 43 45 50 60 70
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## BPMeds 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4187       53        2    0.086      124  0.02962  0.05749 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## prevalentStroke 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2    0.018       25 0.005896  0.01173 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## prevalentHyp 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2    0.642     1317   0.3106   0.4284 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## diabetes 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2    0.075      109  0.02571  0.05011 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## totChol 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4190       50      248        1    236.7    49.07      170      183      206      234      263      292      312 
## 
## lowest : 107 113 119 124 126, highest: 439 453 464 600 696
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## sysBP 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0      234        1    132.4    23.78      104      109      117      128      144      162      175 
## 
## lowest :  83.5  85.0  85.5  90.0  92.0, highest: 235.0 243.0 244.0 248.0 295.0
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## diaBP 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0      146    0.999     82.9     13.1     66.0     69.0     75.0     82.0     90.0     98.0    104.5 
## 
## lowest :  48.0  50.0  51.0  52.0  53.0, highest: 133.0 135.0 136.0 140.0 142.5
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## BMI 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4221       19     1364        1     25.8    4.428    20.06    21.08    23.07    25.40    28.04    30.77    32.78 
## 
## lowest : 15.54 15.96 16.48 16.59 16.61, highest: 44.71 45.79 45.80 51.28 56.80
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## heartRate 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4239        1       73    0.996    75.88     13.3       60       60       68       75       83       92       98 
## 
## lowest :  44  45  46  47  48, highest: 122 125 130 140 143
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## glucose 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     3852      388      143    0.999    81.96    18.27     62.0     65.0     71.0     78.0     87.0     98.0    108.4 
## 
## lowest :  40  43  44  45  47, highest: 348 368 370 386 394
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## TenYearCHD 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     4240        0        2    0.386      644   0.1519   0.2577 
## 
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

describe(dta$age)

## dta$age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4240        0       39    0.999    49.58     9.85       37       39       42       49       56       62       64 
## 
## lowest : 32 33 34 35 36, highest: 66 67 68 69 70

describe(dta$BMI)

## dta$BMI 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4221       19     1364        1     25.8    4.428    20.06    21.08    23.07    25.40    28.04    30.77    32.78 
## 
## lowest : 15.54 15.96 16.48 16.59 16.61, highest: 44.71 45.79 45.80 51.28 56.80

hist(dta$age)

# Sex - male yes(1)/no(0)
table(dta$male)

## 
##    0    1 
## 2420 1820

stat.table(male, contents = list(N = count(),
                                 P = percent(male)),
                                 margin = T,
                                 data = dta)

##  ------------------------ 
##  male          N       P  
##  ------------------------ 
##  0          2420    57.1  
##  1          1820    42.9  
##                           
##  Total      4240   100.0  
##  ------------------------

mean(dta$male)

## [1] 0.4292453

# creating new variables sex = Women and Men, based on male 1/0
dta$sex <- factor(dta$male, levels = 0:1, labels = c('Women','Men'))

# Check results
table(dta$male, dta$sex)

##    
##     Women  Men
##   0  2420    0
##   1     0 1820

# recode diabetes yes/no
head(dta$diabetes)

## [1] 0 0 0 0 0 0

table(dta$diabetes)

## 
##    0    1 
## 4131  109

dta$diabetes <- factor(dta$diabetes, levels = 0:1, labels = c('No','Yes'))
table(dta$diabetes)

## 
##   No  Yes 
## 4131  109

# HT
head(dta$prevalentHyp)

## [1] 0 0 0 1 0 1

table(dta$prevalentHyp)

## 
##    0    1 
## 2923 1317

mean(dta$prevalentHyp)

## [1] 0.3106132

dta$prevalentHyp <- factor(dta$prevalentHyp, levels = 0:1, labels = c('No','Yes'))
table(dta$prevalentHyp)

## 
##   No  Yes 
## 2923 1317

# recode currentSmoker yes/no
head(dta$currentSmoker)

## [1] 0 0 1 1 1 0

table(dta$currentSmoker)

## 
##    0    1 
## 2145 2095

mean(dta$currentSmoker)

## [1] 0.4941038

dta$currentSmoker <- factor(dta$currentSmoker, levels = 0:1, labels = c('No','Yes'))
table(dta$currentSmoker)

## 
##   No  Yes 
## 2145 2095

# education
table(dta$education)

## 
##    1    2    3    4 
## 1720 1253  689  473

dta$education <- factor(dta$education, levels = 1:4,
                                       labels = c('Some high school', 
                                                  'high school/GED', 
                                                  'some college/vocational school', 
                                                  'college'))
table(dta$education)

## 
##               Some high school                high school/GED some college/vocational school                        college 
##                           1720                           1253                            689                            473

# the units for totChol and glucose are mg/dL (18 mg/dL = 1 mmol/L)
# create new variables in mmol/L
head(dta$totChol)

## [1] 195 250 245 225 285 228

dta$totChol.mmols <- dta$totChol / 18 
dta$glucose.mmols <- dta$glucose / 18 

d <- subset(dta, age > 49)
d$n <- 1
# Create skewed follow-up in mths
set.seed(12345)
d$follow.up <- runif(n = length(d$age), min = 6, max = 15*24)
hist(d$follow.up)

mean(d$follow.up)

## [1] 187.8905

sd(d$follow.up)

## [1] 101.5397

median(d$follow.up)

## [1] 188.2808

Creating Table 1. Framingham data.

# Table 1.
tab.1 <- print(summary(sex ~ age +
                             education +
                             BMI +
                             sysBP +
                             prevalentHyp +
                             currentSmoker +
                             diabetes +
                             totChol.mmols +
                             glucose.mmols +
                             follow.up,
                             method = 'reverse',
                             overall = T,
                             data = d,
                             test = T), prmsd = T, 
                                        digits = 3,
                                        prtest = 'P')

## 
## 
## Descriptive Statistics by sex
## 
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |                                  |N   |Women                           |Men                             |Combined                        |P-value            |
## |                                  |    |(N=1177)                        |(N=846)                         |(N=2023)                        |                   |
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |age                               |2023|53.00/57.00/61.00  57.36+/- 4.90|53.00/57.00/61.00  57.12+/- 4.94|53.00/57.00/61.00  57.26+/- 4.92|              0.247|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |education : Some high school      |1967|           49%  ( 564)          |           56%  ( 459)          |           52%  (1023)          |             <0.001|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |    high school/GED               |    |           27%  ( 310)          |           18%  ( 146)          |           23%  ( 456)          |                   |
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |    some college/vocational school|    |           17%  ( 195)          |           12%  ( 101)          |           15%  ( 296)          |                   |
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |    college                       |    |            7%  (  78)          |           14%  ( 114)          |           10%  ( 192)          |                   |
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |BMI                               |2013|23.42/25.76/28.70  26.48+/- 4.64|24.05/26.05/28.39  26.16+/- 3.32|23.66/25.91/28.55  26.35+/- 4.14|              0.536|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |sysBP                             |2023|126.0/140.0/157.5  143.3+/- 24.9|120.0/132.0/148.0  135.8+/- 22.1|123.0/136.0/154.0  140.2+/- 24.0|             <0.001|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |prevalentHyp : Yes                |2023|           48%  ( 570)          |           38%  ( 322)          |           44%  ( 892)          |             <0.001|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |currentSmoker : Yes               |2023|           29%  ( 336)          |           53%  ( 448)          |           39%  ( 784)          |             <0.001|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |diabetes : Yes                    |2023|            4%  (  44)          |            4%  (  37)          |            4%  (  81)          |              0.472|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |totChol.mmols                     |1994|12.72/14.11/15.67  14.30+/- 2.47|11.58/13.00/14.39  13.08+/- 2.29|12.17/13.56/15.17  13.79+/- 2.47|             <0.001|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |glucose.mmols                     |1844|   4.06/4.44/4.93  4.71+/-1.52  |   3.94/4.39/4.94  4.70+/-1.67  |   4.00/4.39/4.94  4.70+/-1.58  |              0.081|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+
## |follow.up                         |2023|     102/185/278  186+/-102     |     102/198/275  190+/-101     |     102/188/276  188+/-102     |              0.428|
## +----------------------------------+----+--------------------------------+--------------------------------+--------------------------------+-------------------+

tab.1

##                                    [,1]   [,2]                               [,3]                               [,4]                               [,5]                 
##                                    "N"    "Women \n(N=1177)"                 "Men \n(N=846)"                    "Combined \n(N=2023)"              "P-value"            
## age                                "2023" "53.00/57.00/61.00  57.36+/- 4.90" "53.00/57.00/61.00  57.12+/- 4.94" "53.00/57.00/61.00  57.26+/- 4.92" "              0.247"
## education : Some high school       "1967" "           49%  ( 564)"           "           56%  ( 459)"           "           52%  (1023)"           "             <0.001"
##     high school/GED                ""     "           27%  ( 310)"           "           18%  ( 146)"           "           23%  ( 456)"           "                "   
##     some college/vocational school ""     "           17%  ( 195)"           "           12%  ( 101)"           "           15%  ( 296)"           "                "   
##     college                        ""     "            7%  (  78)"           "           14%  ( 114)"           "           10%  ( 192)"           "                "   
## BMI                                "2013" "23.42/25.76/28.70  26.48+/- 4.64" "24.05/26.05/28.39  26.16+/- 3.32" "23.66/25.91/28.55  26.35+/- 4.14" "              0.536"
## sysBP                              "2023" "126.0/140.0/157.5  143.3+/- 24.9" "120.0/132.0/148.0  135.8+/- 22.1" "123.0/136.0/154.0  140.2+/- 24.0" "             <0.001"
## prevalentHyp : Yes                 "2023" "           48%  ( 570)"           "           38%  ( 322)"           "           44%  ( 892)"           "             <0.001"
## currentSmoker : Yes                "2023" "           29%  ( 336)"           "           53%  ( 448)"           "           39%  ( 784)"           "             <0.001"
## diabetes : Yes                     "2023" "            4%  (  44)"           "            4%  (  37)"           "            4%  (  81)"           "              0.472"
## totChol.mmols                      "1994" "12.72/14.11/15.67  14.30+/- 2.47" "11.58/13.00/14.39  13.08+/- 2.29" "12.17/13.56/15.17  13.79+/- 2.47" "             <0.001"
## glucose.mmols                      "1844" "   4.06/4.44/4.93  4.71+/-1.52"   "   3.94/4.39/4.94  4.70+/-1.67"   "   4.00/4.39/4.94  4.70+/-1.58"   "              0.081"
## follow.up                          "2023" "     102/185/278  186+/-102"      "     102/198/275  190+/-101"      "     102/188/276  188+/-102"      "              0.428"

# create csv for your working directory.

write.csv(tab.1, file = 'TableOne.csv', row.names = T)

Table 1. Characteristics of study participants.

	Women (n=1,117)	Men (n=846)	Combined (N=2,023)	p-value
Age (years), mean (SD)	57 (4.9)	57 (4.9)	57 (4.9)	0.247
Education, n (%)				<0.001
- Some high school	564 (49)	459 (56)	1,023 (52)
- high school/GED	310 (27)	146 (18)	456 (23)
- some college/vocational school	195 (17)	101 (12)	295 (15)
- college	78 (7)	114 (14)	192 (10)
BMI (kg/m^2), mean (SD)	26.5 (4.6)	26.2 (3.3)	26.4 (4.1)	0.536
Systolic BP (mmHg), mean (SD)	143.3 (24.9)	135.8 (22.1)	140.2 (24.0)	<0.001
Hypertension, n (%)	570 (48)	322 (38)	892 (44)	<0.001
Current smoker, n (%)	336 (29)	448 (53)	784 (39)	<0.001
Diabetes, n (%)	44 (4)	37 (4)	81 (4)	0.472
Total chol (mmol/L), mean (SD)	14.3 (2.5)	13.1 (2.3)	13.8 (2.5)	<0.001
Glucose (mmol/L), mean (SD)	4.7 (1.5)	4.7 (1.7)	4.7 (1.6)	0.081
Follow-up (mths), median (IQR)	185 (102, 278)	198 (102, 275)	188 (102, 276)	0.428

Using the Hmisc ‘summary’ function we have created the above Table, we have included test = TRUE, so the function will compare the continuous variables using (un-paired T-Test), and the frequency data using a Pearson’s chi-squared test. The mean age of women and men are both essentially 57 years and 4.9 years, respectively, so even before comparing these mean and SD, it is obvious there is now differences between the mean values. To assess the role of change when comparing the mean values of two groups, we often use a t-test, formally known as Student’s t-test, as William S. Gosset ¹ at the time of publishing this method was employee of the Head Brewer of Guinness, his employer’s policy in publishing research results at that time prevented him from using his real name. Student’s t-test essentially is the ration of the difference between the two group mean weighted by an estimate of error. If we let the two group means be expressed as \(\bar{x_1}\) and \(\bar{x_2}\), repestively, the test statistic is estimates as:

\(t = \frac{\bar{x_1} - \bar{x_2}}{SE(\bar{x_1} - \bar{x_2})}\)

Where, \(SE(\bar{x_1} - \bar{x_2})\) is a summary standard error of both the means, using the sample size and SD of the groups. However, there is two version, one based on assuming and equal variance (\(SD^2\) or \(\sigma^2\)), and the other unequal variance. The former being:

\(SE(\bar{x_1} - \bar{x_2}) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\)

When using the R t-test function we can choose between the two by using the subcommand ‘var.equal’ T/F. Given the equal SD and therefore variance of age between women and men, we will include T.

# Comparing the mean age of women and men using a unpaired t-test.
t.test(age ~ sex, data = d, equal.var = T)

## 
##  Welch Two Sample t-test
## 
## data:  age by sex
## t = 1.0718, df = 1810.5, p-value = 0.2839
## alternative hypothesis: true difference in means between group Women and group Men is not equal to 0
## 95 percent confidence interval:
##  -0.1974703  0.6734129
## sample estimates:
## mean in group Women   mean in group Men 
##            57.35854            57.12057

# calculating the t statistic
t <- (57.35854 - 57.12057) / ((4.9^2/(1177-1)) + (4.94^2/(846-1)))^0.5
t

## [1] 1.071799

# and the associated p-value
pt(t, df = 1810.5, lower.tail = F)*2

## [1] 0.2839531

Essentially the p-value is saying that given a true no-difference in the mean age between women and men, if we repeated the sampling from the population 100 times, we would observe 28 of these occasions a mean difference between the groups of 0.238 year (chance variation).

So let’s look at the case where in fact there appears to be a difference in the mean values of a continuous variable. From Table 1 above we can see that at baseline there appears to be, on average a higher systolic blood pressure (sysBP) between women and men. Women, on average, where observed to have a mean sysBP of 143.3 mmHg, and men 135.8 mmHg (a mean difference of 7.5).

# Comparing the mean age of women and men using a unpaired t-test.
t.test(sysBP ~ sex, data = d, equal.var = F)

## 
##  Welch Two Sample t-test
## 
## data:  sysBP by sex
## t = 7.0613, df = 1934.4, p-value = 2.291e-12
## alternative hypothesis: true difference in means between group Women and group Men is not equal to 0
## 95 percent confidence interval:
##  5.355001 9.473401
## sample estimates:
## mean in group Women   mean in group Men 
##            143.2540            135.8398

The estimated confidence interval around the difference between women and men, 5.4 to 9.5 mmHg does not contain the null (0), and the t statistic (7.1), with the df of 1934.4 is beyond the 0.001 upper tail. Indicating that on less than 1 in 1000 occasions of repeat sampling from a null population, in which women and men on average have a similar mean sysBP, would you observe a difference of 7.5 mmHg. An example of how we would repoprt this in a manuscript could be:

Among study participants women were observed to have, on average, a higher systolic blood pressure (mean difference 7.5 mmHg, 95% CI 5.4 to 9.5 mmHg, p < 0.001), when compared to men.

Comparing the frequency of diabetes between women and men.

# changing the reference group of diabetes from no/yes to yes/no
d$diabetes <- relevel(d$diabetes, ref = 'Yes')

# simply frequency table
table(d$sex, d$diabetes)

##        
##          Yes   No
##   Women   44 1133
##   Men     37  809

# more complex table using Epi packages stat.table
stat.table(list(sex,diabetes), contents = list(N = count(),
                                P = percent(diabetes)),
                                margin = 1, # add totals
                                data = d)

##  -------------------------------- 
##         --------diabetes--------- 
##  sex         Yes      No   Total  
##  -------------------------------- 
##  Women        44    1133    1177  
##              3.7    96.3   100.0  
##                                   
##  Men          37     809     846  
##              4.4    95.6   100.0  
##                                   
##                                   
##  Total        81    1942    2023  
##              4.0    96.0   100.0  
##  --------------------------------

Pearson’s chi-squared test is base on comparing the difference between the observed (O) table cell frequencies and what would be expected (E) by chance. The specific statistic being the sum of these differences:

\(\chi^2 = \sum{\frac{(O-E)^2}{E}}\)

Commonly the approach of squaring an subtraction of two values is seen in statistics, this is a trick statisticians use to remove negative values! And, the expected values of each given cell is estimated using the row and column totals:

\(E = \frac{(total_{row} \times total_{column})}{Total_N}\)

So the expected for the women and diabetes yes is:

(81*1177)/2023

## [1] 47.12654

And this would be repeated for all cells:

tab.O <- table(d$sex, d$diabetes)
tab.O

##        
##          Yes   No
##   Women   44 1133
##   Men     37  809

tab.E <- matrix(c(81*1177/2023, 1942*1177/2023,81*846/2023,1942*846/2023), byrow = T, nrow = 2)
tab.E

##          [,1]      [,2]
## [1,] 47.12654 1129.8735
## [2,] 33.87346  812.1265

chi2 <- sum((tab.O - tab.E)^2/tab.E)
chi2

## [1] 0.5166969

# associated p-value
pchisq(chi2, df = 1, lower.tail = F)

## [1] 0.4722541

chisq.test(d$diabetes, d$sex, correct = F) # no Yates correction (0.5 added to each cell)

## 
##  Pearson's Chi-squared test
## 
## data:  d$diabetes and d$sex
## X-squared = 0.5167, df = 1, p-value = 0.4723

In the case which one or more cells have an expected values < 5 the Fisher’s exact test is recommended.

fisher.test(d$diabetes, d$sex)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  d$diabetes and d$sex
## p-value = 0.4916
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.530572 1.365729
## sample estimates:
## odds ratio 
##  0.8491938

Workshop 3. Inferential statistics and combining data from multiple studies in a meta-analysis.

Research question - The association between baseline diabetes and ten-year risk of CHD.

Outcome table - the outcome of interest is 10-year incidence of CHD.

# Now - what of our research questions to explore the association between
# diabetes and ten-year risk of CHD?
# summary function from Hmisc package.


# create new variable CHD yes/no
d$TenYearCHD.yes <- factor(d$TenYearCHD, levels = 0:1, labels = c('No','Yes'))
d$TenYearCHD.yes <- relevel(d$TenYearCHD.yes, ref = 'Yes')
d$diabetes <- relevel(d$diabetes, ref = 'No')

print(summary(TenYearCHD.yes ~ diabetes +
                               age +
                               sex +
                               education +
                               BMI +
                               prevalentHyp +
                               currentSmoker +
                               totChol.mmols +
                               follow.up,
                               method = 'reverse',
                               overall = T,
                               data = d,
                               test = T,
                                      long = T), prmsd = T, 
                                      digits = 2,
                                      prtest = 'P')

## 
## 
## Descriptive Statistics by TenYearCHD.yes
## 
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |                                  |N   |Yes                        |No                         |Combined                   |P-value          |
## |                                  |    |(N=454)                    |(N=1569)                   |(N=2023)                   |                 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |diabetes : Yes                    |2023|         7%  (  32)        |         3%  (  49)        |         4%  (  81)        |           <0.001|
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |age                               |2023|54.0/58.0/63.0  58.3+/- 5.1|53.0/56.0/61.0  57.0+/- 4.8|53.0/57.0/61.0  57.3+/- 4.9|           <0.001|
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |sex : Men                         |2023|        51%  ( 233)        |        39%  ( 613)        |        42%  ( 846)        |           <0.001|
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |education : Some high school      |1967|        57%  ( 254)        |        50%  ( 769)        |        52%  (1023)        |           0.021 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |    high school/GED               |    |        18%  (  81)        |        25%  ( 375)        |        23%  ( 456)        |                 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |    some college/vocational school|    |        14%  (  62)        |        15%  ( 234)        |        15%  ( 296)        |                 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |    college                       |    |        10%  (  46)        |        10%  ( 146)        |        10%  ( 192)        |                 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |BMI                               |2013|23.9/26.3/29.1  26.7+/- 4.5|23.6/25.8/28.4  26.2+/- 4.0|23.7/25.9/28.6  26.3+/- 4.1|           0.016 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |prevalentHyp : Yes                |2023|        58%  ( 263)        |        40%  ( 629)        |        44%  ( 892)        |           <0.001|
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |currentSmoker : Yes               |2023|        43%  ( 195)        |        38%  ( 589)        |        39%  ( 784)        |           0.037 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |totChol.mmols                     |1994|12.1/13.5/15.2  13.8+/- 2.6|12.2/13.6/15.2  13.8+/- 2.4|12.2/13.6/15.2  13.8+/- 2.5|            0.71 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+
## |follow.up                         |2023|   117/198/280  195+/- 99  |    99/186/275  186+/-102  |   102/188/276  188+/-102  |           0.077 |
## +----------------------------------+----+---------------------------+---------------------------+---------------------------+-----------------+

Table 2. Risk factors for incidence of CHD.

	CHD=yes (n=454)	CHD=no (n=1,569)	Combined (N=2,023)	p-value
Diabetes, n (%)	32 (4)	49 (3)	81 (4)	<0.001
Age (years), mean (SD)	58.3 (5.1)	57.0 (4.8)	57.3 (4.9)	<0.001
Males, n (%)	233 (51)	613 (39)	846 (42)	<0.001
Education, n (%)				0.021
- Some high school	254 (57)	769 (50)	1,023 (52)
- high school/GED	81 (18)	375 (25)	456 (23)
- some college/vocational school	62 (14)	234 (15)	295 (15)
- college	46 (10)	146 (10)	192 (10)
BMI (\(kg/m^2\)), mean (SD)	26.7 (4.5)	26.2 (4.0)	26.4 (4.1)	0.016
Hypertension, n (%)	263 (58)	629 (40)	892 (44)	<0.001
Current smoker, n (%)	195 (43)	589 (38)	784 (39)	0.037
Total chol (mmol/L), mean (SD)	14.3 (2.5)	13.1 (2.3)	13.8 (2.5)	<0.001
Follow-up (mths), median (IQR)	198 (117, 280)	186 (99, 275)	188 (102, 276)	0.077

Initially let us explore the unadjusted assocation between diabetes at baseline and risk of CHD.

d$diabetes <- relevel(d$diabetes, ref = 'Yes')
stat.table(diabetes, contents = list(N = sum(n),
                                    CHD = sum(TenYearCHD),
                                    P = ratio(TenYearCHD,n),
                                    pyrs = sum(follow.up),
                                    Rate.1000pyrs = ratio(TenYearCHD, follow.up,1000)),
                                                         margin = 1, # add totals
                                                         data = d)

##  ----------------------------------------------------------- 
##  diabetes         N     CHD       P      pyrs Rate.1000pyrs  
##  ----------------------------------------------------------- 
##  Yes          81.00   32.00    0.40  15133.09          2.11  
##  No         1942.00  422.00    0.22 364969.41          1.16  
##                                                              
##  Total      2023.00  454.00    0.22 380102.50          1.19  
##  -----------------------------------------------------------

tab.O <- table(d$diabetes, d$TenYearCHD.yes)
tab.O

##      
##        Yes   No
##   Yes   32   49
##   No   422 1520

tab.E <- matrix(c(81*454/2023, 81*1569/2023,1942*454/2023,1942*1569/2023), byrow = T, nrow = 2)
tab.E

##           [,1]       [,2]
## [1,]  18.17795   62.82205
## [2,] 435.82205 1506.17795

chi2 <- sum((tab.O - tab.E)^2/tab.E)
chi2

## [1] 14.11625

# associated p-value
pchisq(chi2, df = 1, lower.tail = F)

## [1] 0.0001718528

chisq.test(d$diabetes, d$TenYearCHD.yes, correct = F) # no Yates correction (0.5 added to each cell)

## 
##  Pearson's Chi-squared test
## 
## data:  d$diabetes and d$TenYearCHD.yes
## X-squared = 14.116, df = 1, p-value = 0.0001719

# Rate Ratio
epi.2by2(dat = c(32,81-32,422,1942-422), method = 'cohort.count', units = 100)

##              Outcome +    Outcome -      Total        Inc risk *        Odds
## Exposed +           32           49         81              39.5       0.653
## Exposed -          422         1520       1942              21.7       0.278
## Total              454         1569       2023              22.4       0.289
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio                                 1.82 (1.37, 2.41)
## Odds ratio                                     2.35 (1.49, 3.72)
## Attrib risk in the exposed *                   17.78 (6.97, 28.58)
## Attrib fraction in the exposed (%)            45.00 (27.05, 58.53)
## Attrib risk in the population *                0.71 (-1.87, 3.29)
## Attrib fraction in the population (%)         3.17 (1.11, 5.19)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 14.116 Pr>chi2 = <0.001
## Fisher exact test that OR = 1: Pr>chi2 = <0.001
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units

# Incidence Rate Ratio
epi.2by2(dat = c(32,15133,422,364969), method = 'cohort.time', units = 1000)

##              Outcome +    Time at risk        Inc rate *
## Exposed +           32           15133              2.11
## Exposed -          422          364969              1.16
## Total              454          380102              1.19
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc rate ratio                                 1.83 (1.23, 2.62)
## Attrib rate in the exposed *                   0.96 (0.22, 1.70)
## Attrib fraction in the exposed (%)            45.32 (19.02, 61.86)
## Attrib rate in the population *                0.04 (-0.12, 0.19)
## Attrib fraction in the population (%)         3.19 (2.86, 3.54)
## -------------------------------------------------------------------
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 1000 units of population time at risk

Adjusting for the effect of age and sex

# Create 10-year age groups
d$agegrp <- cut(d$age, breaks = c(49,54,59,64,70), labels = c('50-54','55-59','60-64','65-70'))

d$diabetes.n <- ifelse(d$diabetes == 'Yes',1,0)
stat.table(agegrp, contents = list(N = sum(n),
                                    CHD = sum(TenYearCHD),
                                    P = ratio(TenYearCHD,n),
                                    pyrs = sum(follow.up),
                                    Rate.1000pyrs = ratio(TenYearCHD, follow.up,1000)),
                                                         margin = 1, # add totals
                                                         data = d)

##  --------------------------------------------------------- 
##  agegrp         N     CHD       P      pyrs Rate.1000pyrs  
##  --------------------------------------------------------- 
##  50-54     706.00  125.00    0.18 129037.33          0.97  
##  55-59     627.00  138.00    0.22 118161.33          1.17  
##  60-64     523.00  129.00    0.25  99498.12          1.30  
##  65-70     167.00   62.00    0.37  33405.71          1.86  
##                                                            
##  Total    2023.00  454.00    0.22 380102.50          1.19  
##  ---------------------------------------------------------

stat.table(agegrp, contents = list(N = sum(n),
                                    DM = sum(diabetes.n),
                                    P = ratio(diabetes.n,n)),
                                                         margin = 1, # add totals
                                                         data = d)

##  --------------------------------- 
##  agegrp         N      DM       P  
##  --------------------------------- 
##  50-54     706.00   23.00    0.03  
##  55-59     627.00   21.00    0.03  
##  60-64     523.00   26.00    0.05  
##  65-70     167.00   11.00    0.07  
##                                    
##  Total    2023.00   81.00    0.04  
##  ---------------------------------

stat.table(sex, contents = list(N = sum(n),
                                    CHD = sum(TenYearCHD),
                                    P = ratio(TenYearCHD,n),
                                    pyrs = sum(follow.up),
                                    Rate.1000pyrs = ratio(TenYearCHD, follow.up,1000)),
                                                         margin = 1, # add totals
                                                         data = d)

##  -------------------------------------------------------- 
##  sex           N     CHD       P      pyrs Rate.1000pyrs  
##  -------------------------------------------------------- 
##  Women   1177.00  221.00    0.19 219324.06          1.01  
##  Men      846.00  233.00    0.28 160778.44          1.45  
##                                                           
##  Total   2023.00  454.00    0.22 380102.50          1.19  
##  --------------------------------------------------------

stat.table(currentSmoker, contents = list(N = sum(n),
                                    CHD = sum(TenYearCHD),
                                    P = ratio(TenYearCHD,n),
                                    pyrs = sum(follow.up),
                                    Rate.1000pyrs = ratio(TenYearCHD, follow.up,1000)),
                                                         margin = 1, # add totals
                                                         data = d)

##  ---------------------------------------------------------------- 
##  currentSmoker         N     CHD       P      pyrs Rate.1000pyrs  
##  ---------------------------------------------------------------- 
##  No              1239.00  259.00    0.21 234304.02          1.11  
##  Yes              784.00  195.00    0.25 145798.47          1.34  
##                                                                   
##  Total           2023.00  454.00    0.22 380102.50          1.19  
##  ----------------------------------------------------------------

stat.table(currentSmoker, contents = list(N = sum(n),
                                    DM = sum(diabetes.n),
                                    P = ratio(diabetes.n,n)),
                                                         margin = 1, # add totals
                                                         data = d)

##  ---------------------------------------- 
##  currentSmoker         N      DM       P  
##  ---------------------------------------- 
##  No              1239.00   60.00    0.05  
##  Yes              784.00   21.00    0.03  
##                                           
##  Total           2023.00   81.00    0.04  
##  ----------------------------------------

stat.table(prevalentHyp, contents = list(N = sum(n),
                                    CHD = sum(TenYearCHD),
                                    P = ratio(TenYearCHD,n),
                                    pyrs = sum(follow.up),
                                    Rate.1000pyrs = ratio(TenYearCHD, follow.up,1000)),
                                                         margin = 1, # add totals
                                                         data = d)

##  --------------------------------------------------------------- 
##  prevalentHyp         N     CHD       P      pyrs Rate.1000pyrs  
##  --------------------------------------------------------------- 
##  No             1131.00  191.00    0.17 212405.59          0.90  
##  Yes             892.00  263.00    0.29 167696.91          1.57  
##                                                                  
##  Total          2023.00  454.00    0.22 380102.50          1.19  
##  ---------------------------------------------------------------

stat.table(prevalentHyp, contents = list(N = sum(n),
                                    DM = sum(diabetes.n),
                                    P = ratio(diabetes.n,n)),
                                                         margin = 1, # add totals
                                                         data = d)

##  --------------------------------------- 
##  prevalentHyp         N      DM       P  
##  --------------------------------------- 
##  No             1131.00   33.00    0.03  
##  Yes             892.00   48.00    0.05  
##                                          
##  Total          2023.00   81.00    0.04  
##  ---------------------------------------

# Adjusting for both agegrp and sex using logistic regression
# rate ratio
d$diabetes <- relevel(d$diabetes, ref = 'No')

lrm.adj <- glm(cbind(TenYearCHD,n) ~ diabetes +
                                     agegrp +
                                     sex, family = binomial(link=log),
                                     data = d)

logistic.display(lrm.adj)

## 
## Logistic regression predicting cbind(TenYearCHD, n) 
##  
##                     crude OR(95%CI)   adj. OR(95%CI)    P(Wald's test) P(LR-test)
## diabetes: Yes vs No 1.82 (1.19,2.77)  1.48 (1.09,2)     0.011          < 0.001   
##                                                                                  
## agegrp: ref.=50-54                                                     < 0.001   
##    55-59            1.24 (0.95,1.62)  1.21 (0.97,1.51)  0.092                    
##    60-64            1.39 (1.06,1.83)  1.31 (1.05,1.64)  0.017                    
##    65-70            2.1 (1.48,2.97)   1.77 (1.36,2.31)  < 0.001                  
##                                                                                  
## sex: Men vs Women   1.47 (1.2,1.8)    1.36 (1.16,1.61)  < 0.001        < 0.001   
##                                                                                  
## Log-likelihood = -846.623
## No. of observations = 2023
## AIC value = 1705.2459

Overall model fit and potential confounder.

lrm.adj.RR <- glm(cbind(TenYearCHD,n) ~ diabetes +
                                        agegrp +
                                        sex +
                                        currentSmoker +
                                        prevalentHyp, family = binomial(link=log),
                                        data = d)

logistic.display(lrm.adj.RR)

## 
## Logistic regression predicting cbind(TenYearCHD, n) 
##  
##                          crude OR(95%CI)   adj. OR(95%CI)    P(Wald's test) P(LR-test)
## diabetes: Yes vs No      1.82 (1.19,2.77)  1.44 (1.06,1.95)  0.019          < 0.001   
##                                                                                       
## agegrp: ref.=50-54                                                          < 0.001   
##    55-59                 1.24 (0.95,1.62)  1.19 (0.95,1.48)  0.122                    
##    60-64                 1.39 (1.06,1.83)  1.27 (1.01,1.58)  0.038                    
##    65-70                 2.1 (1.48,2.97)   1.64 (1.26,2.15)  < 0.001                  
##                                                                                       
## sex: Men vs Women        1.47 (1.2,1.8)    1.38 (1.16,1.64)  < 0.001        < 0.001   
##                                                                                       
## currentSmoker: Yes vs No 1.19 (0.97,1.46)  1.15 (0.96,1.36)  0.124          < 0.001   
##                                                                                       
## prevalentHyp: Yes vs No  1.75 (1.42,2.15)  1.55 (1.31,1.84)  < 0.001        < 0.001   
##                                                                                       
## Log-likelihood = -832.435
## No. of observations = 2023
## AIC value = 1680.8701

adj.rr <- logistic.display(lrm.adj.RR)
adj.rr

## 
## Logistic regression predicting cbind(TenYearCHD, n) 
##  
##                          crude OR(95%CI)   adj. OR(95%CI)    P(Wald's test) P(LR-test)
## diabetes: Yes vs No      1.82 (1.19,2.77)  1.44 (1.06,1.95)  0.019          < 0.001   
##                                                                                       
## agegrp: ref.=50-54                                                          < 0.001   
##    55-59                 1.24 (0.95,1.62)  1.19 (0.95,1.48)  0.122                    
##    60-64                 1.39 (1.06,1.83)  1.27 (1.01,1.58)  0.038                    
##    65-70                 2.1 (1.48,2.97)   1.64 (1.26,2.15)  < 0.001                  
##                                                                                       
## sex: Men vs Women        1.47 (1.2,1.8)    1.38 (1.16,1.64)  < 0.001        < 0.001   
##                                                                                       
## currentSmoker: Yes vs No 1.19 (0.97,1.46)  1.15 (0.96,1.36)  0.124          < 0.001   
##                                                                                       
## prevalentHyp: Yes vs No  1.75 (1.42,2.15)  1.55 (1.31,1.84)  < 0.001        < 0.001   
##                                                                                       
## Log-likelihood = -832.435
## No. of observations = 2023
## AIC value = 1680.8701

print(adj.rr$table)

##                          crude OR(95%CI)     adj. OR(95%CI)      P(Wald's test) P(LR-test)
## diabetes: Yes vs No      "1.82 (1.19,2.77) " "1.44 (1.06,1.95) " "0.019"        "< 0.001" 
##                          ""                  ""                  ""             ""        
## agegrp: ref.=50-54       ""                  ""                  ""             "< 0.001" 
##    55-59                 "1.24 (0.95,1.62) " "1.19 (0.95,1.48) " "0.122"        ""        
##    60-64                 "1.39 (1.06,1.83) " "1.27 (1.01,1.58) " "0.038"        ""        
##    65-70                 "2.1 (1.48,2.97) "  "1.64 (1.26,2.15) " "< 0.001"      ""        
##                          ""                  ""                  ""             ""        
## sex: Men vs Women        "1.47 (1.2,1.8) "   "1.38 (1.16,1.64) " "< 0.001"      "< 0.001" 
##                          ""                  ""                  ""             ""        
## currentSmoker: Yes vs No "1.19 (0.97,1.46) " "1.15 (0.96,1.36) " "0.124"        "< 0.001" 
##                          ""                  ""                  ""             ""        
## prevalentHyp: Yes vs No  "1.75 (1.42,2.15) " "1.55 (1.31,1.84) " "< 0.001"      "< 0.001" 
##                          ""                  ""                  ""             ""

write.csv(adj.rr$table, file = 'TableTwo.csv')

	CHD/N	crude RR (95% CI)	adj RR (95% CI)	\(p-value^1\)
Diabetes
No	422/1942 (22%)	1.0 (ref)	1.0 (ref)
Yes	32/81 (40%)	1.82 (1.19,2.77)	1.48 (1.09,2)	0.011
Age group
50-54	125/706 (18%)	1.0 (ref)	1.0 (ref)
55-59	138/627 (22%)	1.24 (0.95,1.62)	1.21 (0.97,1.51)	0.092
60-64	129/523 (25%)	1.39 (1.06,1.83)	1.31 (1.05,1.64)	0.017
65-70	62/167 (37%)	2.1 (1.48,2.97)	1.77 (1.36,2.31)	< 0.001
Sex
Women	621/1177 (19%)	1.0 (ref)	1.0 (ref)
Men	233/846 (28%)	1.47 (1.2,1.8)	1.36 (1.16,1.61)	< 0.001
Current smoker
No	259/1239 (21%)	1.0 (ref)	1.0 (ref)
Yes	195/784 (25%)	1.19 (0.97,1.46)	1.15 (0.96,1.36)	0.124
Hypertension
No	191/1131 (17%)	1.0 (ref)	1.0 (ref)
Yes	1263/892 (29%)	1.75 (1.42,2.15)	1.55 (1.31,1.84)	< 0.001

1^adjusted p-value

Meta-analysis - combining the results of a number of trials.

Using trials involving the application of targeted therapeutic hypothermia to out-of-hospital cardiac arrest patients, we can present the approach of combining trials into a meta-analysis to assess the current state of published literature.

library(meta)

d.3 <- data.frame(study = c('Study 1','Study 2','Study 3'),
                  event.e = c(5,20,15),
                  n.e = c(100,2000,300),
                  event.c = c(10,40,30),
                  n.c = c(100,2000,300))

meta.3 <- metabin(event.e, n.e,
                    event.c, n.c,
                    sm = 'RR',
                    data = d.3,
                    studlab = study)

meta.3

## Number of studies combined: k = 3
## Number of observations: o = 4800
## Number of events: e = 120
## 
##                          RR           95%-CI     z p-value
## Common effect model  0.5000 [0.3447; 0.7252] -3.65  0.0003
## Random effects model 0.5000 [0.3448; 0.7251] -3.65  0.0003
## 
## Quantifying heterogeneity:
##  tau^2 = 0; tau = 0; I^2 = 0.0% [0.0%; 89.6%]; H = 1.00 [1.00; 3.10]
## 
## Test of heterogeneity:
##     Q d.f. p-value
##  0.00    2  1.0000
## 
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - Restricted maximum-likelihood estimator for tau^2

forest(meta.3)

# meta-analysis - outcome of interest mortality
dta <- data.frame(study = c('Look (2018)',  
                            'Nielsen (2013)',  
                            'Laurent (2005)', 
                            'Hachimi-Idrissi (2005)',
                            'Bernard  (2002)',  
                            'HACA (2002)',
                            'Hachimi-Idrissi (2001)',
                            'Tiainen (2007)',
                            'Bernard (1997)',
                            'Dankiewicz (2021)',
                            'Lascarrou (2019)',
                            'Dankiewicz (2021)',
                            'Lascarrou (2019)',
                            'Nielsen (2013)'),
                  location = rep('Out of hospital',14),
                  year = c(2018,2013,2005,2005,2002,2002,2001,2007,1997,2021,2019,2021,2019,2013),
                  n = c(87,939,42,28,77,275,30,70,44,1850,581,939,1850,581),
                  event.e = c(27,226,15,6,22,56,13,8,10,465,231,465,231,226),
                  n.e = c(45,473,22,14,43,137,16,36,22,925,284,925,284,473),
                  event.c = c(33,220,11,8,23,76,13,12,17,446,247,446,247,220),
                  n.c = c(42,466,20,14,34,138,14,34,22,925,297,925,297,466),
                  target.temp = c(rep(33,5),32,34,33,33,33,33,33,33,33),
                  comparison = c('No-TTM','TTM-36','TTM-37','TTM-37',
                                 'No-TTM','No-TTM','Tx>38','Tx>38',
                                 'No-TTM','Tx>37.8','TTM-37','TTM-36','Tx>37.8','TTM-37'),
                  design = c('RCT','RCT','RCT','RCT','RCT',
                             'RCT','RCT','RCT','Non-R','RCT','RCT','RCT','RCT','RCT'),
                  control.target = c('Trials without explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance'))
dta <- dta[ order(dta$year),]
dta$id <- 1:14
dta$ratio.e <- paste(dta$event.e,'/',dta$n.e, sep = '')
dta$p.e <- paste('(', round(dta$event.e/dta$n.e,2)*100,'%)', sep = '') 
dta$ratio.c <- paste(dta$event.c,'/',dta$n.c, sep = '')
dta$p.c <- paste('(', round(dta$event.c/dta$n.c,2)*100,'%)', sep = '') 
dta$tx <- paste(dta$ratio.e,dta$p.e, sep = ' ') 
dta$c <- paste(dta$ratio.c,dta$p.c, sep = ' ')   

meta.all <- metabin(event.e, n.e,
                    event.c, n.c,
                    sm = 'RR',
                    data = dta,
                    studlab = study,
                    byvar = control.target,
                    subset = id %in% c(1:8,11,13))

# ---------------------------- Forest plot ---------------------------------------------

par(cex = 1.15) 

forest(meta.all, leftcols = c('studlab',
                              'tx','c'),
                 leftlabs = c('Study (Year)','Hypothermia\n (n/N)(%)',
                                             'Control\n (n/N)(%)'),
                 bylab = '',
                 label.right = '',
                 label.left = '',
                 col.square = 'blue',
                 col.diamond = 'blue',
                 overall = F,
                 col.by = 'black')

Student. The Probable error of a mean. Biometrika vol 6, no. 1, 1908 pp1-25↩︎