Introduction to Biostatistics using R: Summer edition, 2023

Steve Frost UoW

Summer, 2023.

Introduction

Welcome to an “Introduction to biostatistics using R: summer edition 2023”.

The six modules will be delivered over three weekly Wednesday morning’s seminars, between 9am and 12 midday, on the the 1st, and the 8th of February 2023. Venue: Building 41, room 153.

Week 1. Wednesday February 1st, 2023 (9am to 12 midday).

Module 1. Data good care of your data, reproducible research.

This module will introduce you to good data management practices, and the concept of reproducible research. Concepts such as data security, archiving, data cleaning, and documenting a clear audit trail of these processes will be covered.

Module 2. Preparing data for analysis and presenting descriptive statistics.

This module provides you the skills to prepare data for analysis, including various formats of data, labelling discrete levels of variables, and creating new variables. This module will also explore the presentation of descriptive statistics. When is the median and interquartile range preferred over the mean and standard deviation? And, the presentation of descriptive statistics in a manuscript being prepared for peer review.

Reading data into R.

Data sets can be loaded into R various ways. The simplist, is via a spreadsheet, such as a comma separated volume (csv). Using the ‘read.csv’ function. However, a working directory needs to be set-up.

# set working directly
setwd('~/OneDrive - University of Wollongong/Data/')
## setwd("C:/Users/25105517/OneDrive - University of Wollongong/Data/")

# read dataset into R, naming the data set dta.
framingham <- read.csv(file = 'framingham.csv', header = T, sep = ',')
# check the names of variable.
names(framingham)

##  [1] "randid"      "sex"         "totchol"     "age"         "sysbp"      
##  [6] "diabp"       "cursmoke"    "cigpday"     "bmi"         "diabetes"   
## [11] "bpmeds"      "heartrte"    "glucose"     "educ"        "prevchd"    
## [16] "prevap"      "prevmi"      "prevstrk"    "prevhyp"     "time"       
## [21] "period"      "hdlc"        "ldlc"        "death"       "angina"     
## [26] "hospmi"      "mi_fchd"     "anychd"      "stroke"      "cvd"        
## [31] "hyperten"    "timeap"      "timemi"      "timemifc"    "timechd"    
## [36] "timestrk"    "timecvd"     "timedth"     "timehyp"     "start.date" 
## [41] "stroke.date" "death.date"

# Look at the first ten rows.
# head(framingham)

# Look at the last ten rows.
# tail(framingham)

# View help file for dataset
# install riskCommunicator package which has the Framingham dataset.
## install.packages('riskCommunicator')
library(riskCommunicator)
## help(framingham)

# Create age groups with labels
framingham$age.group <- cut(framingham$age, breaks = c(30,39,49,59,70),
                                            labels = c('30-39','40-49',
                                                        '50-59','60-69'))
# Check lower and upper values of groups
tapply(framingham$age, framingham$age.group, min)

## 30-39 40-49 50-59 60-69 
##    32    40    50    60

tapply(framingham$age, framingham$age.group, max)

## 30-39 40-49 50-59 60-69 
##    39    49    59    70

# Convert md/dL to mmols (1 mg/dL = 18 mmols/L)
framingham$glucose.mmols <- framingham$glucose/18
framingham$totchol.mmols <- framingham$totchol/18

# Label factors with numeric values
table(framingham$sex)

## 
##    1    2 
## 5022 6605

framingham$sex <- factor(framingham$sex, level = c(1,2), 
                                         labels = c('Male','Female'))
table(framingham$sex)

## 
##   Male Female 
##   5022   6605

table(framingham$diabetes)

## 
##     0     1 
## 11097   530

framingham$diabetes <- factor(framingham$diabetes, level = c(0,1), 
                         labels = c('No','Yes'))
table(framingham$diabetes)

## 
##    No   Yes 
## 11097   530

Our specific hypothesis is related to confirming the relationship between blood pressure and the risk of stroke. Therefore, we will subset the data set.

# remove participants with a prior stroke or tx 
# for BP and only use first visit data.
names(framingham)

##  [1] "randid"        "sex"           "totchol"       "age"          
##  [5] "sysbp"         "diabp"         "cursmoke"      "cigpday"      
##  [9] "bmi"           "diabetes"      "bpmeds"        "heartrte"     
## [13] "glucose"       "educ"          "prevchd"       "prevap"       
## [17] "prevmi"        "prevstrk"      "prevhyp"       "time"         
## [21] "period"        "hdlc"          "ldlc"          "death"        
## [25] "angina"        "hospmi"        "mi_fchd"       "anychd"       
## [29] "stroke"        "cvd"           "hyperten"      "timeap"       
## [33] "timemi"        "timemifc"      "timechd"       "timestrk"     
## [37] "timecvd"       "timedth"       "timehyp"       "start.date"   
## [41] "stroke.date"   "death.date"    "age.group"     "glucose.mmols"
## [45] "totchol.mmols"

d <- subset(framingham,  bpmeds == 0 &
                         prevstrk == 0 &
                         period == 1, select = c(randid,age,totchol.mmols,
                                                 diabp,glucose.mmols,bmi,
                                                 sex,sysbp,cursmoke,educ,
                                                 cigpday,diabetes,stroke,
                                                 timestrk,death,timedth,
                                                 prevhyp,age.group))
# add n = 1, for counting
d$n <- 1
# create follow-up in years
d$timestrk.yrs <- d$timestrk/365.25
d$stroke.yes <- factor(d$stroke, levels= 0:1, labels = c('No','Yes'))
table(d$stroke.yes,d$stroke)

##      
##          0    1
##   No  3863    0
##   Yes    0  343

d$HT.yes <- factor(d$prevhyp, levels= 0:1, labels = c('No','Yes'))
table(d$HT.yes,d$prevhyp)

##      
##          0    1
##   No  2963    0
##   Yes    0 1243

# variables labels and unit
library(Hmisc)
d <- upData(d, labels = c(age = 'Age',
                          totchol.mmols = 'Total cholesterol',
                          diabp = 'Diasolic BP',
                          glucose.mmols = 'BGL',
                          sex = 'Sex',
                          sysbp = 'Systolic BP',
                          cursmoke = 'Current smoker',
                          educ = 'Education level',
                          stroke = 'Stroke',
                          timestrk = 'Follow-up',
                          timestrk.yrs = 'Follow-up'),
             units = c(age = 'yrs',
                       totchol.mmols = 'mmol/L',
                       glucose.mmols = 'mmol/L',
                       diabp = 'mmHg',
                       sysbp = 'mmHg',        
                       timestrk = 'days',
                       timestrk.yrs = 'yrs'))

## Input object size:    510712 bytes;   22 variables    4206 observations
## New object size: 501024 bytes;   22 variables    4206 observations

d$educ <- factor(d$educ, levels = 1:4, labels = c('4th grade or less',
                                                  '5th, 6th, or 7th grades',
                                                  'grade school graduate no high school',
                                                  'high school, did not graduate'))


## The Hmisc package will create Table 1. Including continuous and discrete variables. Frequencies are presented as "% (n)" and ## continuous variables as "p25/p50/p75 mean +/- SD".  

# Summary table using Hmisc package
# saving summary table a csv file

table.1 <- print(summary(sex ~ age +
                               sysbp +
                               bmi +
                               glucose.mmols +
                               diabetes +
                               HT.yes +
                               cursmoke +
                               educ +
                               timestrk.yrs,
                               method = 'reverse',
                               overall = F,
                               test = T,
                               data = d), prmsd = T, 
                                          digits = 3,
                                          prtest = 'P')

## 
## 
## Descriptive Statistics by Sex
## 
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |                                        |N   |Male                            |Female                          |P-value            |
## |                                        |    |(N=1868)                        |(N=2338)                        |                   |
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |Age [yrs]                               |4206|42.00/49.00/57.00  49.61+/- 8.70|42.00/49.00/56.00  49.62+/- 8.53|              0.808|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |Systolic BP [mmHg]                      |4206|118.0/128.0/141.0  130.9+/- 18.8|115.5/127.5/144.0  132.3+/- 23.2|              0.894|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |bmi                                     |4190|23.97/26.05/28.30  26.16+/- 3.40|22.53/24.71/27.64  25.47+/- 4.47|             <0.001|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |BGL [mmol/L]                            |3822|   3.94/4.33/4.83  4.58+/-1.39  |   4.00/4.33/4.78  4.53+/-1.23  |              0.817|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |diabetes : Yes                          |4206|            3%  (  56)          |            2%  (  51)          |              0.095|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |HT.yes : Yes                            |4206|           31%  ( 570)          |           29%  ( 673)          |              0.222|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |Current smoker                          |4206|           61%  (1135)          |           41%  ( 957)          |             <0.001|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |educ : 4th grade or less                |4096|           44%  (805)           |           40%  (909)           |             <0.001|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |    5th, 6th, or 7th grades             |    |           27%  (497)           |           32%  (724)           |                   |
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |    grade school graduate no high school|    |           13%  (228)           |           20%  (453)           |                   |
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |    high school, did not graduate       |    |           16%  (282)           |            9%  (198)           |                   |
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+
## |Follow-up [yrs]                         |4206|15.99/24.00/24.00  19.49+/- 6.59|21.09/24.00/24.00  21.02+/- 5.77|             <0.001|
## +----------------------------------------+----+--------------------------------+--------------------------------+-------------------+

write.csv(table.1, file = 'TableOne.csv', row.names = F)

Module 3. Inferential statistics 1: Comparing two or more groups, continuous and categorical outcomes.

This module will introduce the concepts of estimation and null-hypothesis statistical significance testing. In addition sample size and power calculations will be covered, and Type I and Type II error.

# SysBP and stroke
hist(d$sysbp, main = '',
              xlab = 'Systolic BP (mmHg) \n First visit')

describe(d$sysbp)

## d$sysbp : Systolic BP [mmHg] 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     4206        0      226        1    131.7    23.06    104.0    108.5 
##      .25      .50      .75      .90      .95 
##    116.6    128.0    142.5    160.0    173.0 
## 
## lowest :  83.5  85.0  85.5  90.0  92.0, highest: 230.0 232.0 235.0 243.0 295.0

boxplot(sysbp ~ stroke.yes, data = d,
        col = c('blue','red'),
        xlab = 'Stroke',
        ylab = 'SBP (mmHg)',
        las = 1)

tapply(d$sysbp, d$stroke, mean)

##        0        1 
## 130.5967 143.7114

tapply(d$sysbp, d$stroke, sd)

##        0        1 
## 20.56035 25.91658

# save a jpeg version of SBP and stroke status
## library(grDevices)
## setwd("C:/Users/25105517/OneDrive - University of Wollongong/Data/")

jpeg(file = 'FigureOne.jpg',
             width = 600, height = 600,
             units = 'px', 
             pointsize = 12,
     quality = 100)

par(family = 'sans')
boxplot(sysbp ~ stroke.yes, data = d,
        col = c('lightgray','lightgray'),
        xlab = 'Stroke',
        ylab = 'SBP (mmHg)',
        las = 1,
        ylim = c(0,300),
        frame = F,
        yaxt = 'n')
axis(2, at = seq(0,300,25), label = seq(0,300,25), las = 1)
tapply(d$sysbp, d$stroke, mean)

##        0        1 
## 130.5967 143.7114

tapply(d$sysbp, d$stroke, sd)

##        0        1 
## 20.56035 25.91658

TTest <- t.test(sysbp ~ stroke, data = d)
TTest

## 
##  Welch Two Sample t-test
## 
## data:  sysbp by stroke
## t = -9.1205, df = 381.19, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -15.94196 -10.28741
## sample estimates:
## mean in group 0 mean in group 1 
##        130.5967        143.7114

library(epiR)
d$stroke.yes <- relevel(d$stroke.yes, ref = 'Yes')
sbp.stroke <- epi.conf(data.frame(group = d$stroke.yes, val = d$sysbp), 
                                          ctype = "mean.unpaired", conf.level = 0.95 )
sbp.stroke

##        est       se    lower    upper
## 1 13.11468 1.185815 10.78986 15.43951

str(sbp.stroke)

## 'data.frame':    1 obs. of  4 variables:
##  $ est  : num 13.1
##  $ se   : num 1.19
##  $ lower: num 10.8
##  $ upper: num 15.4

text(1.5, 40, paste('mean difference (mmHg) = ', round(sbp.stroke$est,1),
                    ' (95%CI ', round(sbp.stroke$lower,1),' to ',
                                round(sbp.stroke$upper,1),', p < 0.001)', sep = ''), cex = 0.75)
     

dev.off()

## quartz_off_screen 
##                 2

Null Hypothesis Significance Testing.

\(H_0\) Mean systolic BP among non-stroke \(=\) Mean systolic BP among stroke.

\(H_i\) Mean systolic BP among non-stroke \(\neq\) Mean systolic BP among stroke.

Then a critical value is set, say from a z-statistic (i.e 1.96, < 2.5% tail)

# SHADE TAILS OF n-DISTRIBUTED PLOTS
z <- pretty(c(-3,3), 100)
ht <- dnorm(z)
data <- data.frame(z = z, ht = ht)
zc <- 1.96
t <- subset(data, z > zc)
plot(data, type = "l", 
           axes = T,
           xlab = '',
           ylab = '')

lines(data)
polygon(c(rev(t$z), t$z),c(rep(0, nrow(t)), t$ht), col = 'grey', 
                                                      border = NA)
polygon(c(rev(t$z*-1), t$z*-1), c(rep(0, nrow(t)), t$ht), 
                                        col = 'grey', border = NA)
                                        

abline(v = 0, lty = 2)
abline(v = c(-1,1), lty = 2, col = 'grey') 
text(-0.75, 0.02, '-1 SD')
text(0.75, 0.02, '+1 SD')

text(-1.75, 0.02, '-1.96 SD')
text(1.75, 0.02, '+1.96 SD')

mean(d$sysbp[d$stroke == 1]) - mean(d$sysbp[d$stroke == 0])

## [1] 13.11468

t.test(sysbp ~ stroke, data = d)

## 
##  Welch Two Sample t-test
## 
## data:  sysbp by stroke
## t = -9.1205, df = 381.19, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -15.94196 -10.28741
## sample estimates:
## mean in group 0 mean in group 1 
##        130.5967        143.7114

# EXAMPLE OF T-TEST
set.seed(12345)
n <- 3000 # large number
a <- rnorm(n, mean = 61.8, sd = 10)
b <- rnorm(n, mean = 62.7, sd = 9.5)


plot(density(a), lty = 1, col = 'blue',
     frame = F,
     main = '',
     xlab = '',
     ylab = '',
     ylim = c(0,0.075),
     lwd = 2,
     las = 1)
lines(density(b), lty = 1, col = 'black', lwd = 2)

legend(20,0.065, c('PCI - mean age 61.8 yrs (SD 10)',
                   'CABG - mean age 62.7 yrs (SD 9.5)'), 
       cex = 0.75,
       bty = 'n',
       lty = 1,
       lwd = 2,
       col = c('blue','black'))

TTest <- t.test(a,b)
TTest

## 
##  Welch Two Sample t-test
## 
## data:  a and b
## t = -3.4118, df = 5986.2, p-value = 0.0006496
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3417023 -0.3625029
## sample estimates:
## mean of x mean of y 
##  61.76762  62.61972

str(TTest)

## List of 10
##  $ statistic  : Named num -3.41
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named num 5986
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 0.00065
##  $ conf.int   : num [1:2] -1.342 -0.363
##   ..- attr(*, "conf.level")= num 0.95
##  $ estimate   : Named num [1:2] 61.8 62.6
##   ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "difference in means"
##  $ stderr     : num 0.25
##  $ alternative: chr "two.sided"
##  $ method     : chr "Welch Two Sample t-test"
##  $ data.name  : chr "a and b"
##  - attr(*, "class")= chr "htest"

text(90, 0.04, paste('p-value = ', round(TTest$p.value,3), sep = ''))

n <- 30 # small number
a <- rnorm(n, mean = 61.8, sd = 10)
b <- rnorm(n, mean = 62.7, sd = 9.5)


plot(density(a), lty = 1, col = 'blue',
     frame = F,
     main = '',
     xlab = '',
     ylab = '',
     ylim = c(0,0.075),
     lwd = 2,
     las = 1)
lines(density(b), lty = 1, col = 'black', lwd = 2)

legend(50,0.065, c('PCI - mean age 61.8 yrs (SD 10)',
                   'CABG - mean age 62.7 yrs (SD 9.5)'), 
       cex = 0.75,
       bty = 'n',
       lty = 1,
       lwd = 2,
       col = c('blue','black'))

TTest <- t.test(a,b)
TTest

## 
##  Welch Two Sample t-test
## 
## data:  a and b
## t = -0.28393, df = 54.736, p-value = 0.7775
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.455870  4.101878
## sample estimates:
## mean of x mean of y 
##  61.79375  62.47075

str(TTest)

## List of 10
##  $ statistic  : Named num -0.284
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named num 54.7
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 0.778
##  $ conf.int   : num [1:2] -5.46 4.1
##   ..- attr(*, "conf.level")= num 0.95
##  $ estimate   : Named num [1:2] 61.8 62.5
##   ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "difference in means"
##  $ stderr     : num 2.38
##  $ alternative: chr "two.sided"
##  $ method     : chr "Welch Two Sample t-test"
##  $ data.name  : chr "a and b"
##  - attr(*, "class")= chr "htest"

text(80, 0.04, paste('p-value = ', round(TTest$p.value,3), sep = ''))

## Type I and Type II error

NEJM 2000 ARDSNet lower tidal volume (6 mL/kg) versus traditional 12mL/kg. The primary outome death before discharge and breathing without assitance. Results were tx group 134/432 (31.0%) versus control 171/429 (39.8%).

## setwd('~/OneDrive - University of Wollongong/Data/')
## setwd("C:/Users/25105517/OneDrive - University of Wollongong/Data/")
source("~/OneDrive - University of Wollongong/Data/epiTablesV3.R")

## 
## ------------------------------------------------------------------------------
## Epi Tables
## R epiTab Function
## Input values epiTab(a,b,c,d)
## a,b,c,d values from two-by-two table (rows exposure and columns outcome)
## -------------------------------------------------------------------------------
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention  Yes   No Total
##        Yes    482 1622  2104
##        No    1110 3211  4321
##        Total 1592 4833  6425
## 
## Rate of outcome among intervention group (p1) = 0.2291 (95% CI: 21.11, 24.7)
## 
## Rate of outcome among control group (p2) = 0.2569 (95% CI: 24.39, 26.99)
## 
## Risk Difference = -2.78 (95% CI: -5, -0.56)
## 
## Hazard Risk = 0.876 95% CI = 0.787 to 0.975
## Rate Ratio = 0.892 95% CI = 0.812 to 0.979
## Odds Ratio = 0.86 95% CI = 0.761 to 0.972
## chi2 p-value = 0.0154
## Fisher's exact p-value = 0.0163
## ------------------------------------------------------------------------
## 
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention Yes  No Total
##        Yes    95 229   324
##        No    283 400   683
##        Total 378 629  1007
## 
## Rate of outcome among intervention group (p1) = 0.2932 (95% CI: 24.34, 34.3)
## 
## Rate of outcome among control group (p2) = 0.4143 (95% CI: 37.73, 45.14)
## 
## Risk Difference = -12.11 (95% CI: -18.3, -5.92)
## 
## Hazard Risk = 0.649 95% CI = 0.514 to 0.819
## Rate Ratio = 0.708 95% CI = 0.585 to 0.857
## Odds Ratio = 0.586 95% CI = 0.441 to 0.778
## chi2 p-value = 2e-04
## Fisher's exact p-value = 2e-04
## ------------------------------------------------------------------------

epiTab(134,298,171,258)

## 
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention Yes  No Total
##        Yes   134 298   432
##        No    171 258   429
##        Total 305 556   861
## 
## Rate of outcome among intervention group (p1) = 0.3102 (95% CI: 26.64, 35.39)
## 
## Rate of outcome among control group (p2) = 0.3986 (95% CI: 35.21, 44.51)
## 
## Risk Difference = -8.84 (95% CI: -15.21, -2.47)
## 
## Hazard Risk = 0.73 95% CI = 0.582 to 0.915
## Rate Ratio = 0.778 95% CI = 0.648 to 0.934
## Odds Ratio = 0.678 95% CI = 0.512 to 0.898
## chi2 p-value = 0.0067
## Fisher's exact p-value = 0.0069
## ------------------------------------------------------------------------

An example of potentially Type II error - NEJM 2018 EOLIA ECMO for ARDS primary outome 60-day mortality tx 44/124 (35%) control 57/125 (46%)

epiTab(44,80,57,68)

## 
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention Yes  No Total
##        Yes    44  80   124
##        No     57  68   125
##        Total 101 148   249
## 
## Rate of outcome among intervention group (p1) = 0.3548 (95% CI: 26.98, 43.99)
## 
## Rate of outcome among control group (p2) = 0.456 (95% CI: 36.78, 54.42)
## 
## Risk Difference = -10.12 (95% CI: -22.31, 2.07)
## 
## Hazard Risk = 0.72 95% CI = 0.486 to 1.067
## Rate Ratio = 0.778 95% CI = 0.574 to 1.055
## Odds Ratio = 0.656 95% CI = 0.394 to 1.091
## chi2 p-value = 0.1041
## Fisher's exact p-value = 0.1217
## ------------------------------------------------------------------------

# original sample size calculation was based on 40% vs 60%
# 60-day mortality
power.prop.test(p1 = 0.4, p2 = 0.6, sig.level = 0.05, power = 0.8)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 96.92364
##              p1 = 0.4
##              p2 = 0.6
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

p1 <- 0.6
q1 <- 1 - p1
p2 <- 0.4
q2 <- 1 - p2
p.bar <- (p1 + p2)/2
q.bar <- 1 - p.bar

# Sample size
n <- 2*((p.bar * q.bar) * (1.96 + 0.84)^2)  / (p1 - p2)^2
n

## [1] 98

power.prop.test(p1 = 0.36, p2 = 0.46, sig.level = 0.05, power = 0.8)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 378.5477
##              p1 = 0.36
##              p2 = 0.46
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

power.prop.test(p1 = 0.36, p2 = 0.46, sig.level = 0.05, n = 125) # power

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 125
##              p1 = 0.36
##              p2 = 0.46
##       sig.level = 0.05
##           power = 0.3615169
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

# Age and stroke
boxplot(age ~ stroke.yes, data = d,
        col = c('blue','red'),
        xlab = 'Stroke',
        ylab = 'Age (years)',
        las = 1)

tapply(d$age, d$stroke, mean)

##        0        1 
## 49.13461 55.07289

tapply(d$age, d$stroke, sd)

##        0        1 
## 8.504941 7.768804

# Age and Systolic BP
plot(d$age, d$sysbp, pch = 16,
                     xlab = 'Age (yrs)',
                     ylab = 'Systolic BP (mmHg)',
                     main = '')

plot(d$age[d$stroke == 0], d$sysbp[d$stroke == 0], pch = 16,
                     col = 'blue',
                     xlab = 'Age (yrs)',
                     ylab = 'Systolic BP (mmHg)',
                     main = '',
                     ylim = c(80,300),
                     las = 1)
points(d$age[d$stroke == 1], d$sysbp[d$stroke == 1], pch = 16,
                             col = 'red')
abline(lsfit(d$age,d$sysbp), lty = 2, lwd = 2)

# Cigs per day and stroke
boxplot(cigpday ~ stroke.yes, data = d,
        col = c('blue','red'),
        xlab = 'Stroke',
        ylab = 'Age (years)',
        las = 1)

tapply(d$cigpday, d$stroke, mean, na.rm = T)

##        0        1 
## 9.105085 9.170088

tapply(d$cigpday, d$stroke, sd, na.rm = T)

##        0        1 
## 12.00131 12.06747

# table of rates of stroke based 
# systolic HT
library(Epi)
d$sysHT <- ifelse(d$sysbp > 140, 1, 0)
d$sysHT <- factor(d$sysHT, levels = 0:1, labels = c('No','Yes'))
print(stat.table(list(sysHT), contents = list(N = count(),
                                        'Stroke (n)' = sum(stroke),
                                        'Stroke (%)' = ratio(stroke,n,100),
                                        'Stroke (rate)' = ratio(stroke,timestrk,365.25),
                                        'follow-up (Total years)' = sum(timestrk.yrs)),
                                         data = d,
                                         margin = T),
                                         digits = 3)

##  --------------------------------------------------- 
##  sysHT          N  Stroke  Stroke  Stroke follow-up  
##                       (n)     (%)  (rate)    (Total  
##                                              years)  
##  --------------------------------------------------- 
##  No      3035.000 178.000   5.865   0.003 64518.680  
##  Yes     1171.000 165.000  14.091   0.008 21014.623  
##                                                      
##  Total   4206.000 343.000   8.155   0.004 85533.303  
##  ---------------------------------------------------

tapply(d$timestrk.yrs, d$sysHT, sum)

##       No      Yes 
## 64518.68 21014.62

# rate with 95% CI
library(epiDisplay)
stroke.tab <- data.frame(cbind(tapply(d$stroke, d$sysHT, sum), tapply(d$n, d$sysHT, sum)))
names(stroke.tab) <- c('Strokes','N')
head(stroke.tab)

##     Strokes    N
## No      178 3035
## Yes     165 1171

ci.binomial(stroke.tab$Strokes, stroke.tab$N)

##   events total probability          se exact.lower95ci exact.upper95ci
## 1    178  3035  0.05864909 0.004265079      0.05054965      0.06761068
## 2    165  1171  0.14090521 0.010167300      0.12146029      0.16216781

# incidence based on person years
stroke.tab <- data.frame(cbind(tapply(d$stroke, d$sysHT, sum), tapply(d$timestrk.yrs, d$sysHT, sum)))
names(stroke.tab) <- c('Strokes','pyrs')
head(stroke.tab)

##     Strokes     pyrs
## No      178 64518.68
## Yes     165 21014.62

ci.poisson(stroke.tab$Strokes, stroke.tab$pyrs)

##   events person.time   incidence           se exact.lower95ci exact.upper95ci
## 1    178    64518.68 0.002758891 0.0002067876     0.002367128     0.003197555
## 2    165    21014.62 0.007851676 0.0006112521     0.006697479     0.009147202

# risk of stroke
# Rate Ratio
## setwd('~/OneDrive - University of Wollongong/Data/')
## setwd("C:/Users/25105517/OneDrive - University of Wollongong/Data/")
source('~/OneDrive - University of Wollongong/Data/epiTablesV3.R')

## 
## ------------------------------------------------------------------------------
## Epi Tables
## R epiTab Function
## Input values epiTab(a,b,c,d)
## a,b,c,d values from two-by-two table (rows exposure and columns outcome)
## -------------------------------------------------------------------------------
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention  Yes   No Total
##        Yes    482 1622  2104
##        No    1110 3211  4321
##        Total 1592 4833  6425
## 
## Rate of outcome among intervention group (p1) = 0.2291 (95% CI: 21.11, 24.7)
## 
## Rate of outcome among control group (p2) = 0.2569 (95% CI: 24.39, 26.99)
## 
## Risk Difference = -2.78 (95% CI: -5, -0.56)
## 
## Hazard Risk = 0.876 95% CI = 0.787 to 0.975
## Rate Ratio = 0.892 95% CI = 0.812 to 0.979
## Odds Ratio = 0.86 95% CI = 0.761 to 0.972
## chi2 p-value = 0.0154
## Fisher's exact p-value = 0.0163
## ------------------------------------------------------------------------
## 
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention Yes  No Total
##        Yes    95 229   324
##        No    283 400   683
##        Total 378 629  1007
## 
## Rate of outcome among intervention group (p1) = 0.2932 (95% CI: 24.34, 34.3)
## 
## Rate of outcome among control group (p2) = 0.4143 (95% CI: 37.73, 45.14)
## 
## Risk Difference = -12.11 (95% CI: -18.3, -5.92)
## 
## Hazard Risk = 0.649 95% CI = 0.514 to 0.819
## Rate Ratio = 0.708 95% CI = 0.585 to 0.857
## Odds Ratio = 0.586 95% CI = 0.441 to 0.778
## chi2 p-value = 2e-04
## Fisher's exact p-value = 2e-04
## ------------------------------------------------------------------------

epiTab(165,1171-165,178,3035-178)

## 
## ------------------------------------------------------------------------
## Trial Data:
##             Outcome
## Intervention Yes   No Total
##        Yes   165 1006  1171
##        No    178 2857  3035
##        Total 343 3863  4206
## 
## Rate of outcome among intervention group (p1) = 0.1409 (95% CI: 12.1, 16.09)
## 
## Rate of outcome among control group (p2) = 0.0586 (95% CI: 5.03, 6.7)
## 
## Risk Difference = 8.23 (95% CI: 6.06, 10.39)
## 
## Hazard Risk = 2.513 95% CI = 2.033 to 3.106
## Rate Ratio = 2.403 95% CI = 1.966 to 2.937
## Odds Ratio = 2.633 95% CI = 2.105 to 3.293
## chi2 p-value = 0
## Fisher's exact p-value = 0
## ------------------------------------------------------------------------

(sum(d$stroke[d$sysHT == 'Yes']) / sum(d$n[d$sysHT == 'Yes'])) / 
(sum(d$stroke[d$sysHT == 'No']) / sum(d$n[d$sysHT == 'No']))

## [1] 2.402513

Confidence intervals

Confidence intervals can be found all over statistics. They provide an interval likely to include the true population parameter we’re trying to estimate, allowing us to express estimated values from sample data with some confidence.

Depending on the situation, there are numerous methods for calculating them.The following formula is used to compute it:

    $$Confidence Interval = (point estimate)+/-(critical value)*(standard error)$$

WEEK 2. Wednesday February 8th, 2023 (9am to 12 midday).

Module 4. Inferential statistics 2: Correlation and regression, adjusting for confounders.

This module will initially explore correlation and simple linear regression. We will then extend these concepts to multi-linear regression, and generalised-linear-models that are use to analysis various outcomes, such as continuous, binomial, and rates.

# Correlation and linear regression

set.seed(12345)
d.samp <-subset(d, randid %in% sample(d$randid, 150))

par(mfrow = c(1,1))
plot(sysbp ~ age, data = d.samp,
                  las = 1,
                  xlab = 'Age (yrs)',
                  ylab = 'Systolic BP (mmHg)')

summary(lm(sysbp ~ age, data = d.samp))

## 
## Call:
## lm(formula = sysbp ~ age, data = d.samp)
## 
## Residuals:
## Systolic BP [mmHg] 
##     Min      1Q  Median      3Q     Max 
## -49.120 -14.068  -1.351   9.512  91.466 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  87.9392     9.7463   9.023 8.91e-16 ***
## age           0.8622     0.1925   4.479 1.49e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.21 on 148 degrees of freedom
## Multiple R-squared:  0.1194, Adjusted R-squared:  0.1134 
## F-statistic: 20.06 on 1 and 148 DF,  p-value: 1.489e-05

abline(a = 87.9392, b = 0.8622, lty = 2)

summary(lm(sysbp ~ I(age-35), data = d.samp))

## 
## Call:
## lm(formula = sysbp ~ I(age - 35), data = d.samp)
## 
## Residuals:
## Systolic BP [mmHg] 
##     Min      1Q  Median      3Q     Max 
## -49.120 -14.068  -1.351   9.512  91.466 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 118.1164     3.3385  35.380  < 2e-16 ***
## I(age - 35)   0.8622     0.1925   4.479 1.49e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.21 on 148 degrees of freedom
## Multiple R-squared:  0.1194, Adjusted R-squared:  0.1134 
## F-statistic: 20.06 on 1 and 148 DF,  p-value: 1.489e-05

cor(d.samp$age, d.samp$sysbp)

## [1] 0.3455124

cov(d.samp$age, d.samp$sysbp) / (var(d.samp$age) * var(d.samp$sysbp))^0.5

## [1] 0.3455124

tapply(d$sysbp, d$stroke, mean)

##        0        1 
## 130.5967 143.7114

summary(lm(sysbp ~ stroke, data = d))

## 
## Call:
## lm(formula = sysbp ~ stroke, data = d)
## 
## Residuals:
## Systolic BP [mmHg] 
##     Min      1Q  Median      3Q     Max 
## -49.711 -14.597  -3.597  10.903 151.289 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 130.5967     0.3386  385.66   <2e-16 ***
## stroke       13.1147     1.1858   11.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.05 on 4204 degrees of freedom
## Multiple R-squared:  0.02827,    Adjusted R-squared:  0.02804 
## F-statistic: 122.3 on 1 and 4204 DF,  p-value: < 2.2e-16

summary(lm(sysbp ~ stroke - 1, data = d))

## 
## Call:
## lm(formula = sysbp ~ stroke - 1, data = d)
## 
## Residuals:
## Systolic BP [mmHg] 
##    Min     1Q Median     3Q    Max 
## -49.71 113.00 125.00 139.88 243.00 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)    
## stroke  143.711      6.854   20.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 126.9 on 4205 degrees of freedom
## Multiple R-squared:  0.09467,    Adjusted R-squared:  0.09445 
## F-statistic: 439.7 on 1 and 4205 DF,  p-value: < 2.2e-16

regress.display(lm(sysbp ~ stroke, data = d))

## Linear regression predicting sysbp
##  
##                 Coeff.(95%CI)        P(t-test) P(F-test)
## stroke: 1 vs 0  13.11 (10.79,15.44)  < 0.001   < 0.001  
##                                                         
## No. of observations = 4206

# crude models 
plot(stroke ~ sysbp, data = d, las = 1)

plot(sysbp ~ stroke.yes, data = d, las = 1)

stat.table(HT.yes, contents = list(D = sum(stroke),
                                   N = sum(n),
                                   R = ratio(stroke,n)),
                                   margin = T,
                                   data = d)

##  --------------------------------- 
##  HT.yes         D       N       R  
##  --------------------------------- 
##  No        157.00 2963.00    0.05  
##  Yes       186.00 1243.00    0.15  
##                                    
##  Total     343.00 4206.00    0.08  
##  ---------------------------------

# logit model
summary(glm(stroke ~ HT.yes, data = d, family = binomial(link='logit')))

## 
## Call:
## glm(formula = stroke ~ HT.yes, family = binomial(link = "logit"), 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5694  -0.5694  -0.3300  -0.3300   2.4239  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.88327    0.08201  -35.16   <2e-16 ***
## HT.yesYes    1.14583    0.11423   10.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2376.7  on 4205  degrees of freedom
## Residual deviance: 2277.3  on 4204  degrees of freedom
## AIC: 2281.3
## 
## Number of Fisher Scoring iterations: 5

# predicted rate exp(a+bx)/(a+bx)+1
exp(-2.88327 + (1*1.1458)) / (exp(-2.88327 + (1*1.1458)) + 1)

## [1] 0.1496346

# log odds
# observed odds ratio
(0.15/0.85)/(0.05/0.95)

## [1] 3.352941

# predicted OR
exp(1.14583)

## [1] 3.145051

# Rate Ratio
tapply(d$stroke, d$HT.yes, mean)

##         No        Yes 
## 0.05298684 0.14963797

0.14963797/0.05298684

## [1] 2.824059

glm.stroke <- glm(cbind(stroke,n) ~ sysHT, data = d, family = binomial)
print(logistic.display(glm.stroke), digits = 3)

## 
## Logistic regression predicting cbind(stroke, n) 
##  
##                  OR(95%CI)     P(Wald's test) P(LR-test)
## sysHT: Yes vs No 2.4 (1.92,3)  < 0.001        < 0.001   
##                                                         
## Log-likelihood = -949.6711
## No. of observations = 4206
## AIC value = 1903.3422

glm.stroke <- glm(stroke ~ sysHT + offset(n), data = d, family = poisson)
glm.stroke <- glm(stroke ~ sysHT, offset(n), data = d, family = poisson)
print(idr.display(glm.stroke), digits = 3)

## 
## 
## Poisson regression predicting stroke 
##  
##                  IDR(95%CI)       P(Wald's test) P(LR-test)
## sysHT: Yes vs No 2.4 (1.94,2.97)  < 0.001        < 0.001   
##                                                            
## Log-likelihood = -1171.1858
## No. of observations = 4206
## AIC value = 2346.3716

cph <- coxph(Surv(timestrk, stroke) ~ I(sysbp/10), data = d)
summary(cph)

## Call:
## coxph(formula = Surv(timestrk, stroke) ~ I(sysbp/10), data = d)
## 
##   n= 4206, number of events= 343 
## 
##                coef exp(coef) se(coef)     z Pr(>|z|)    
## I(sysbp/10) 0.28278   1.32681  0.02013 14.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##             exp(coef) exp(-coef) lower .95 upper .95
## I(sysbp/10)     1.327     0.7537     1.275      1.38
## 
## Concordance= 0.684  (se = 0.015 )
## Likelihood ratio test= 157  on 1 df,   p=<2e-16
## Wald test            = 197.3  on 1 df,   p=<2e-16
## Score (logrank) test = 196.6  on 1 df,   p=<2e-16

# adjusted models 
# freq data
stroke.tab <- data.frame(cbind(tapply(d$stroke, d$sysHT, sum), tapply(d$n, d$sysHT, sum)))
names(stroke.tab) <- c('Strokes','N')
ci.binomial(stroke.tab$Strokes, stroke.tab$N)

##   events total probability          se exact.lower95ci exact.upper95ci
## 1    178  3035  0.05864909 0.004265079      0.05054965      0.06761068
## 2    165  1171  0.14090521 0.010167300      0.12146029      0.16216781

stroke.tab <- data.frame(cbind(tapply(d$stroke, d$sex, sum), tapply(d$n, d$sex, sum)))
names(stroke.tab) <- c('Strokes','N')
ci.binomial(stroke.tab$Strokes, stroke.tab$N)

##   events total probability          se exact.lower95ci exact.upper95ci
## 1    167  1868  0.08940043 0.006601537      0.07683967       0.1032664
## 2    176  2338  0.07527802 0.005456541      0.06490470       0.0867278

stroke.tab <- data.frame(cbind(tapply(d$stroke, d$age.group, sum), tapply(d$n, d$age.group, sum)))
names(stroke.tab) <- c('Strokes','N')
ci.binomial(stroke.tab$Strokes, stroke.tab$N)

##   events total probability          se exact.lower95ci exact.upper95ci
## 1     11   553  0.01989150 0.005937564      0.00996962      0.03531338
## 2     68  1642  0.04141291 0.004916965      0.03229793      0.05220926
## 3    148  1307  0.11323642 0.008765147      0.09655669      0.13168263
## 4    116   704  0.16477273 0.013981658      0.13809602      0.19428352

table(d$age.group, d$stroke.yes)

##        
##          Yes   No
##   30-39   11  542
##   40-49   68 1574
##   50-59  148 1159
##   60-69  116  588

round(epi.conf(dat = table(d$age.group, d$stroke.yes), ctype = "prop.single"), digits = 3)

##         est    se lower upper
## 30-39 0.020 0.006 0.011 0.035
## 40-49 0.041 0.005 0.033 0.052
## 50-59 0.113 0.009 0.097 0.132
## 60-69 0.165 0.014 0.139 0.194

d$sex <- relevel(d$sex, ref = 'Female')
glm.stroke <- glm(cbind(stroke,n) ~ sysHT + sex + age.group, data = d, family = binomial)
print(logistic.display(glm.stroke), digits = 3)

## 
## Logistic regression predicting cbind(stroke, n) 
##  
##                       crude OR(95%CI)    adj. OR(95%CI)     P(Wald's test)
## sysHT: Yes vs No      2.4 (1.92,3)       1.72 (1.36,2.17)   < 0.001       
##                                                                           
## sex: Male vs Female   1.19 (0.95,1.48)   1.27 (1.01,1.59)   0.038         
##                                                                           
## age.group: ref.=30-39                                                     
##    40-49              2.08 (1.09,3.96)   1.96 (1.03,3.74)   0.041         
##    50-59              5.69 (3.06,10.59)  4.84 (2.59,9.05)   < 0.001       
##    60-69              8.28 (4.42,15.53)  6.49 (3.42,12.28)  < 0.001       
##                                                                           
##                       P(LR-test)
## sysHT: Yes vs No      < 0.001   
##                                 
## sex: Male vs Female   < 0.001   
##                                 
## age.group: ref.=30-39 < 0.001   
##    40-49                        
##    50-59                        
##    60-69                        
##                                 
## Log-likelihood = -903.8435
## No. of observations = 4206
## AIC value = 1819.687

# Using follow-up time
# Kaplan-Meier estimates
km.0 <- survfit(Surv(timestrk.yrs, stroke) ~ sysHT, data = d)
summary(km.0, times = c(5,10,20))

## Call: survfit(formula = Surv(timestrk.yrs, stroke) ~ sysHT, data = d)
## 
##                 sysHT=No 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     5   2945      13    0.996 0.00120        0.993        0.998
##    10   2819      20    0.989 0.00194        0.985        0.993
##    20   2374      91    0.955 0.00398        0.947        0.963
## 
##                 sysHT=Yes 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     5   1082      24    0.979 0.00427        0.971        0.987
##    10    950      36    0.944 0.00702        0.931        0.958
##    20    639      74    0.858 0.01154        0.836        0.881

par(mfrow = c(1,2))
plot(km.0, lty = c(1,2),
           las = 1,
           col = c('blue','red'),
           xlab = 'Follow-up (yrs)',
           ylab = 'Probability of avoiding stroke')

legend('bottomleft', c('no-HT','HT'),
                     lty = c(1,2),
                     col = c('blue','red'),
                     bty = 'n')

plot(km.0, fun = 'event',
           lty = c(1,2),
           col = c('blue','red'),
           las = 1,
           xlab = 'Follow-up (yrs)',
           ylab = 'Probability of stroke')

legend('topleft', c('no-HT','HT'),
                  lty = c(1,2),
                  col = c('blue','red'),
                  bty = 'n')

cph.0 <- coxph(Surv(timestrk.yrs, stroke) ~ sysHT, data = d)
summary(cph.0)

## Call:
## coxph(formula = Surv(timestrk.yrs, stroke) ~ sysHT, data = d)
## 
##   n= 4206, number of events= 343 
## 
##            coef exp(coef) se(coef)     z Pr(>|z|)    
## sysHTYes 1.1139    3.0464   0.1083 10.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## sysHTYes     3.046     0.3283     2.464     3.767
## 
## Concordance= 0.628  (se = 0.013 )
## Likelihood ratio test= 99.52  on 1 df,   p=<2e-16
## Wald test            = 105.8  on 1 df,   p=<2e-16
## Score (logrank) test = 117.1  on 1 df,   p=<2e-16

newData <- data.frame(expand.grid(stroke = 0:1,
                                  sysHT = c('Yes','No'),
                                  timestrk.yrs = c(5,10,20)))

newData

##    stroke sysHT timestrk.yrs
## 1       0   Yes            5
## 2       1   Yes            5
## 3       0    No            5
## 4       1    No            5
## 5       0   Yes           10
## 6       1   Yes           10
## 7       0    No           10
## 8       1    No           10
## 9       0   Yes           20
## 10      1   Yes           20
## 11      0    No           20
## 12      1    No           20

yhat <- predict(cph.0, newdata = newData, type = 'expected')
pred.stroke <- cbind(newData, yhat)
subset(pred.stroke, stroke == 1 & timestrk.yrs %in% c(10,20))

##    stroke sysHT timestrk.yrs       yhat
## 6       1   Yes           10 0.04631465
## 8       1    No           10 0.01520324
## 10      1   Yes           20 0.14772021
## 12      1    No           20 0.04849063

par(mfrow = c(1,1),
    mar = c(8,4,2,8),
    cex = 0.9)
plot(km.0, fun = 'event',
     lty = c(2,2),
     col = c('blue','red'),
     las = 1,
     lwd = 2,
     xlab = 'Follow-up (yrs)',
     ylab = 'Probability of stroke',
     ylim = c(0,0.15),
     las = 1,
     frame = F,
     axes = F,
     xmax = 20)
axis(1, at = seq(0,20,1), label = seq(0,20,1))
axis(2, at = seq(0,0.15,0.01), label = seq(0,0.15,0.01), las = 1)

summary(km.0, times = c(5,10,20))

## Call: survfit(formula = Surv(timestrk.yrs, stroke) ~ sysHT, data = d)
## 
##                 sysHT=No 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     5   2945      13    0.996 0.00120        0.993        0.998
##    10   2819      20    0.989 0.00194        0.985        0.993
##    20   2374      91    0.955 0.00398        0.947        0.963
## 
##                 sysHT=Yes 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     5   1082      24    0.979 0.00427        0.971        0.987
##    10    950      36    0.944 0.00702        0.931        0.958
##    20    639      74    0.858 0.01154        0.836        0.881

mtext(side = 4,at = 1-0.858,  col = "red", text = 'Hypertension', las = 1)
mtext(side = 4,at = 1-0.955,  col = "blue", text = 'No-hypertension', las = 1)

# Add no. at risk
n.surv <- summary(km.0, times = seq(0,20,1))
## n.surv
n.risk <- matrix(n.surv$n.risk, byrow = T, nrow = 2)
## n.risk

mtext(side = 1, at = -0.1, line = 4, text = 'no. at risk \nHT group', outer = F, cex = 0.7, font = 1, adj = 0)
mtext(side = 1, at = c(-1, seq(0,20,1)), line = 5, text = c('HT', n.risk[1,]), outer = F, cex = 0.7, adj = 0, col = 1)
mtext(side = 1, at = c(-1, seq(0,20,1)), line = 6, text = c('No HT', n.risk[2,]), outer = F, cex = 0.7, adj = 0, col = 2)

Module 5. An introduction to meta-analysis: approaches to combining the results of a number of studies.

This module provides an introduction to the meta-analysis of the results a number of studies, including prevalence, continuous outcomes, and rates. This model will also cover approaches to heterogeneity between studies, and the concepts of fixed-effects and random-effects models. We will also cover Trial Sequential Analysis approaches to meta-analysis, essentially do we need more studies? ## Meta-analysis of various estimates (Rate Ratio, Prevalence etc)

library(meta)
# Meta analysis of Rate Ratios
d.3 <- data.frame(study = c('Study 1','Study 2','Study 3'),
                  event.e = c(5,20,15),
                  n.e = c(100,2000,300),
                  event.c = c(10,40,30),
                  n.c = c(100,2000,300))

meta.3 <- metabin(event.e, n.e,
                    event.c, n.c,
                    sm = 'RR',
                    data = d.3,
                    studlab = study)

meta.3

##             RR           95%-CI %W(fixed) %W(random)
## Study 1 0.5000 [0.1772; 1.4105]      12.5       12.8
## Study 2 0.5000 [0.2934; 0.8522]      50.0       48.6
## Study 3 0.5000 [0.2747; 0.9099]      37.5       38.5
## 
## Number of studies combined: k = 3
## 
##                          RR           95%-CI     z p-value
## Fixed effect model   0.5000 [0.3447; 0.7252] -3.65  0.0003
## Random effects model 0.5000 [0.3448; 0.7251] -3.65  0.0003
## 
## Quantifying heterogeneity:
##  tau^2 = 0; tau = 0; I^2 = 0.0% [0.0%; 89.6%]; H = 1.00 [1.00; 3.10]
## 
## Test of heterogeneity:
##     Q d.f. p-value
##  0.00    2  1.0000
## 
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - DerSimonian-Laird estimator for tau^2
## - Mantel-Haenszel estimator used in calculation of Q and tau^2 (like RevMan 5)

par(cex = 0.9)
forest(meta.3)

# Clinical variation among surgical patients
# meta analysis of a proportion
d <- data.frame(study = c('Chen (2015)',
                          'Hewitt (2015)',
                          'Joseph (2014)',
                          'Makary (2010)',
                          'Reisinger (2015)',
                          'Robinson (2013)',
                          'Tegels (2014)',
                          'Richardson (2019)'),
                          n = c(379,325,220,594,159,72,127,160),
                          events = c(100,88,82,62,39,15,30,47),
                          year = c(2015,2015,2014,2010,2015,2013,
                                   2014,2019))
d <- d[ order(d$year),]
d

##               study   n events year
## 4     Makary (2010) 594     62 2010
## 6   Robinson (2013)  72     15 2013
## 3     Joseph (2014) 220     82 2014
## 7     Tegels (2014) 127     30 2014
## 1       Chen (2015) 379    100 2015
## 2     Hewitt (2015) 325     88 2015
## 5  Reisinger (2015) 159     39 2015
## 8 Richardson (2019) 160     47 2019

meta.prop <- metaprop(events, n, studlab = study, 
                                 data = d)
meta.prop

##                   proportion           95%-CI
## Makary (2010)         0.1044 [0.0810; 0.1318]
## Robinson (2013)       0.2083 [0.1216; 0.3202]
## Joseph (2014)         0.3727 [0.3087; 0.4403]
## Tegels (2014)         0.2362 [0.1654; 0.3197]
## Chen (2015)           0.2639 [0.2202; 0.3113]
## Hewitt (2015)         0.2708 [0.2232; 0.3226]
## Reisinger (2015)      0.2453 [0.1806; 0.3197]
## Richardson (2019)     0.2938 [0.2245; 0.3708]
## 
## Number of studies combined: k = 8
## 
##                      proportion           95%-CI
## Fixed effect model       0.2274 [0.2097; 0.2461]
## Random effects model     0.2401 [0.1863; 0.3035]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1852; tau = 0.4303; I^2 = 91.4% [85.5%; 94.9%]; H = 3.41 [2.62; 4.44]
## 
## Test of heterogeneity:
##      Q d.f.  p-value             Test
##  81.49    7 < 0.0001        Wald-type
##  94.55    7 < 0.0001 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Logit transformation
## - Clopper-Pearson confidence interval for individual studies

forest(meta.prop)

# sample size based on this MA of prevalence abd 5% margin of error
n.for.survey(p = 0.25, delta = 0.05)

## 
## Sample size for survey. 
## Assumptions: 
##   Proportion       = 0.25 
##   Confidence limit = 95 % 
##   Delta            = 0.05 from the estimate. 
## 
##   Sample size      = 288

dta <- read.csv('~/OneDrive - University of Wollongong/Data/DataCaVitDFx.csv', header = T, sep = ',')
## dta <- read.csv('c:/data/DataCaVitDFx.csv', header = T, sep = ',')
dta

##                   study   n.e event.e   n.c event.c fUp.mth year
## 1         Chapuy (1992)  1387      80  1403     110      18 1992
## 2  Dawson-Hughes (1997)   187      11   202      26      36 1997
## 3         Chapuy (2002)   393      27   190      21      24 2002
## 4        Harwood (2004)    75       6    37       5      12 2004
## 5        Avenell (2004)    35       3    35       5      12 2004
## 6         Larsen (2004)  4957     318  2116     167      36 2004
## 7      Porthouse (2005)  1321      58  1993      91      24 2005
## 8          Grant (2005)  1306     165  1332     179      62 2005
## 9        Jackson (2006) 18176    2012 18106    2158      84 2006
## 10  Bolton-Smith (2007)    62       2    61       2      24 2007
## 11     Salovaara (2010)  1586      78  1609      94      36 2010

# annualsed rates of fracture
dta$yr.rate.e <- (dta$event.e/dta$n.e)/ (dta$fUp.mth/12)
dta$yr.rate.c <- (dta$event.c/dta$n.c)/ (dta$fUp.mth/12)
dta$rate.e <- (dta$event.e/dta$n.e)
dta$rate.c <- (dta$event.c/dta$n.c)
dta$RD <-  round(dta$rate.e - dta$rate.c,4)
dta$RR <- round(dta$rate.e/dta$rate.c,2)
dta

##                   study   n.e event.e   n.c event.c fUp.mth year  yr.rate.e
## 1         Chapuy (1992)  1387      80  1403     110      18 1992 0.03845230
## 2  Dawson-Hughes (1997)   187      11   202      26      36 1997 0.01960784
## 3         Chapuy (2002)   393      27   190      21      24 2002 0.03435115
## 4        Harwood (2004)    75       6    37       5      12 2004 0.08000000
## 5        Avenell (2004)    35       3    35       5      12 2004 0.08571429
## 6         Larsen (2004)  4957     318  2116     167      36 2004 0.02138390
## 7      Porthouse (2005)  1321      58  1993      91      24 2005 0.02195307
## 8          Grant (2005)  1306     165  1332     179      62 2005 0.02445290
## 9        Jackson (2006) 18176    2012 18106    2158      84 2006 0.01581363
## 10  Bolton-Smith (2007)    62       2    61       2      24 2007 0.01612903
## 11     Salovaara (2010)  1586      78  1609      94      36 2010 0.01639344
##     yr.rate.c     rate.e     rate.c      RD   RR
## 1  0.05226895 0.05767844 0.07840342 -0.0207 0.74
## 2  0.04290429 0.05882353 0.12871287 -0.0699 0.46
## 3  0.05526316 0.06870229 0.11052632 -0.0418 0.62
## 4  0.13513514 0.08000000 0.13513514 -0.0551 0.59
## 5  0.14285714 0.08571429 0.14285714 -0.0571 0.60
## 6  0.02630750 0.06415170 0.07892250 -0.0148 0.81
## 7  0.02282990 0.04390613 0.04565981 -0.0018 0.96
## 8  0.02600988 0.12633997 0.13438438 -0.0080 0.94
## 9  0.01702672 0.11069542 0.11918701 -0.0085 0.93
## 10 0.01639344 0.03225806 0.03278689 -0.0005 0.98
## 11 0.01947379 0.04918033 0.05842138 -0.0092 0.84

d <- dta
# TRADITIONAL META-ANALYSIS

meta.all <- metabin(event.e,n.e,event.c,n.c,
                 data = d,
                 sm = 'RR',
                 studlab = study,
                 method = 'MH')

meta.all

##                          RR           95%-CI %W(fixed) %W(random)
## Chapuy (1992)        0.7357 [0.5570; 0.9717]       3.8        7.0
## Dawson-Hughes (1997) 0.4570 [0.2324; 0.8988]       0.9        1.3
## Chapuy (2002)        0.6216 [0.3610; 1.0702]       1.0        2.0
## Harwood (2004)       0.5920 [0.1932; 1.8137]       0.2        0.5
## Avenell (2004)       0.6000 [0.1552; 2.3203]       0.2        0.3
## Larsen (2004)        0.8128 [0.6788; 0.9734]       8.0       14.6
## Porthouse (2005)     0.9616 [0.6969; 1.3267]       2.5        5.3
## Grant (2005)         0.9401 [0.7718; 1.1452]       6.1       12.6
## Jackson (2006)       0.9288 [0.8772; 0.9834]      74.2       49.9
## Bolton-Smith (2007)  0.9839 [0.1431; 6.7637]       0.1        0.2
## Salovaara (2010)     0.8418 [0.6286; 1.1274]       3.2        6.4
## 
## Number of studies combined: k = 11
## 
##                          RR           95%-CI     z  p-value
## Fixed effect model   0.9026 [0.8588; 0.9486] -4.04 < 0.0001
## Random effects model 0.8755 [0.8102; 0.9460] -3.36   0.0008
## 
## Quantifying heterogeneity:
##  tau^2 = 0.0023; tau = 0.0478; I^2 = 12.8% [0.0%; 53.3%]; H = 1.07 [1.00; 1.46]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  11.47   10  0.3224
## 
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - DerSimonian-Laird estimator for tau^2
## - Mantel-Haenszel estimator used in calculation of Q and tau^2 (like RevMan 5)

forest(meta.all)

# meta-regression
mreg.all <- metareg(meta.all, ~ I(yr.rate.c * 1000), method.tau = 'REML', hakn = F)
mreg.all

## 
## Mixed-Effects Model (k = 11; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0 (SE = 0.0050)
## tau (square root of estimated tau^2 value):             0
## I^2 (residual heterogeneity / unaccounted variability): 0.00%
## H^2 (unaccounted variability / sampling variability):   1.00
## R^2 (amount of heterogeneity accounted for):            100.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 9) = 4.9581, p-val = 0.8379
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 6.5063, p-val = 0.0107
## 
## Model Results:
## 
##                      estimate      se     zval    pval    ci.lb    ci.ub 
## intrcpt                0.0281  0.0568   0.4947  0.6208  -0.0832   0.1393    
## I(yr.rate.c * 1000)   -0.0063  0.0025  -2.5507  0.0107  -0.0112  -0.0015  * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Predicted RR for various baseline risk
pred <- data.frame(annualRisk.1000 = seq(5,60,5),
           pred.RR = round(exp(predict(mreg.all, newmods = seq(5,60,5))$pred),2),
           lcl = round(exp(predict(mreg.all, newmods = seq(5,60,5))$ci.lb),2),
           ucl = round(exp(predict(mreg.all, newmods = seq(5,60,5))$ci.ub),2))
pred

##    annualRisk.1000 pred.RR  lcl  ucl
## 1                5    1.00 0.91 1.09
## 2               10    0.97 0.90 1.04
## 3               15    0.94 0.88 0.99
## 4               20    0.91 0.86 0.95
## 5               25    0.88 0.83 0.93
## 6               30    0.85 0.80 0.91
## 7               35    0.82 0.76 0.90
## 8               40    0.80 0.72 0.89
## 9               45    0.77 0.68 0.88
## 10              50    0.75 0.64 0.87
## 11              55    0.73 0.61 0.87
## 12              60    0.70 0.58 0.86

# NNT with annual risk ranging from 5 to 60 fractures per 1000
risk.1000 <- seq(5,60,5)
RR <- pred$pred.RR
Exp <- risk.1000 * RR                                                               
ARR <- risk.1000 - Exp
NNT <- ceiling(1/(ARR/1000))
cbind(risk.1000, RR, Exp, ARR, NNT)

##       risk.1000   RR   Exp   ARR  NNT
##  [1,]         5 1.00  5.00  0.00  Inf
##  [2,]        10 0.97  9.70  0.30 3334
##  [3,]        15 0.94 14.10  0.90 1112
##  [4,]        20 0.91 18.20  1.80  556
##  [5,]        25 0.88 22.00  3.00  334
##  [6,]        30 0.85 25.50  4.50  223
##  [7,]        35 0.82 28.70  6.30  159
##  [8,]        40 0.80 32.00  8.00  125
##  [9,]        45 0.77 34.65 10.35   97
## [10,]        50 0.75 37.50 12.50   80
## [11,]        55 0.73 40.15 14.85   68
## [12,]        60 0.70 42.00 18.00   56

# meta-gression plot
par(mfrow = c(1,1), cex = 1, 
                    family = 'sans', 
                    bg = 'white', 
                    col = 'black',
                    mar = c(5,5,3,3))

plot(d$yr.rate.c * 1000, meta.all$TE, 
                    cex = d$n.c^0.5/7,
                    pch = c(rep(16,8),16,rep(16,8)),
                    las = 1,
                    xlab = 'Annual rate (per 1000) of fracture among control group',
                    xlim = c(0,150),
                    ylab = 'log(Relative Risk)',
                    ylim = c(-1.5,0.75),
                    axes = F,
                    col = 'lightgrey',
                    col.axis = 'black',
                    col.lab = 'black')
points(d$yr.rate.c * 1000, meta.all$TE, pch = 1, col = 'black', 
                                                 cex = d$n.c^0.5/7)                    
axis(1, at = seq(0,150,10), labels = seq(0,150,10), cex.axis = 0.85, col = 'black', col.axis = 'black') 
axis(2, at = seq(-1.5,0.75, 0.25), labels = seq(-1.5,0.75, 0.25), cex.axis = 0.85, las = 1, col = 'black', col.axis = 'black')
                           
lines(seq(0,150,5), predict(mreg.all, newmods = seq(0,150,5))$pred, lty = 1, lwd = 2.5, col = 'black')
lines(seq(0,150,5), predict(mreg.all, newmods = seq(0,150,5))$cr.lb, lty = 3, lwd = 2, col = 'black') 
lines(seq(0,150,5), predict(mreg.all, newmods = seq(0,150,5))$cr.ub, lty = 3, lwd = 2, col = 'black')  
abline(h = 0, lty = 2, lwd = 1.25)
abline(h = seq(-1.5,0.5, 0.25), v = seq(0,150,5),lty = 3, lwd = 1, col = 'grey')

text(d$yr.rate.c[1]  * 1000 + 16.5, meta.all$TE[1]         , d$study[1], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[2]  * 1000 + 5   , meta.all$TE[2]  - 0.125 , d$study[2], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[3]  * 1000       , meta.all$TE[3]  - 0.095 , d$study[3], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[4]  * 1000 - 7.5 , meta.all$TE[4]  - 0.095 , d$study[4], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[5]  * 1000       , meta.all$TE[5]  + 0.08  , d$study[5], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[6]  * 1000 + 10   , meta.all$TE[6]  - 0.165 , d$study[6], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[7]  * 1000       , meta.all$TE[7]  + 0.175 , d$study[7], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[8]  * 1000 + 14  , meta.all$TE[8]          , d$study[8], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[9]  * 1000       , meta.all$TE[9]  + 0.425 , d$study[9], cex = 0.7, family = 'sans', col = 'black')
text(d$yr.rate.c[10] * 1000 - 9   , meta.all$TE[10] + 0.075 , d$study[10], cex = 0.5, family = 'sans', col = 'black')
text(d$yr.rate.c[11] * 1000 - 12   , meta.all$TE[11] - 0.15 , d$study[11], cex = 0.7, family = 'sans', col = 'black')

# meta-analysis - outcome of interest mortality
dta <- data.frame(study = c('Look (2018)',  
                            'Nielsen (2013)',  
                            'Laurent (2005)', 
                            'Hachimi-Idrissi (2005)',
                            'Bernard  (2002)',  
                            'HACA (2002)',
                            'Hachimi-Idrissi (2001)',
                            'Tiainen (2007)',
                            'Bernard (1997)',
                            'Dankiewicz (2021)',
                            'Lascarrou (2019)',
                            'Dankiewicz (2021)',
                            'Lascarrou (2019)',
                            'Nielsen (2013)'),
                  location = rep('Out of hospital',14),
                  year = c(2018,2013,2005,2005,2002,2002,2001,2007,1997,2021,2019,2021,2019,2013),
                  n = c(87,939,42,28,77,275,30,70,44,1850,581,939,1850,581),
                  event.e = c(27,226,15,6,22,56,13,8,10,465,231,465,231,226),
                  n.e = c(45,473,22,14,43,137,16,36,22,925,284,925,284,473),
                  event.c = c(33,220,11,8,23,76,13,12,17,446,247,446,247,220),
                  n.c = c(42,466,20,14,34,138,14,34,22,925,297,925,297,466),
                  target.temp = c(rep(33,5),32,34,33,33,33,33,33,33,33),
                  comparison = c('No-TTM','TTM-36','TTM-37','TTM-37',
                                 'No-TTM','No-TTM','Tx>38','Tx>38',
                                 'No-TTM','Tx>37.8','TTM-37','TTM-36','Tx>37.8','TTM-37'),
                  design = c('RCT','RCT','RCT','RCT','RCT',
                             'RCT','RCT','RCT','Non-R','RCT','RCT','RCT','RCT','RCT'),
                  control.target = c('Trials without explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials with explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance',
                                     'Trials without explicit normothermia / fever avoidance'))
dta <- dta[ order(dta$year),]
dta$id <- 1:14
dta$ratio.e <- paste(dta$event.e,'/',dta$n.e, sep = '')
dta$p.e <- paste('(', round(dta$event.e/dta$n.e,2)*100,'%)', sep = '') 
dta$ratio.c <- paste(dta$event.c,'/',dta$n.c, sep = '')
dta$p.c <- paste('(', round(dta$event.c/dta$n.c,2)*100,'%)', sep = '') 
dta$tx <- paste(dta$ratio.e,dta$p.e, sep = ' ') 
dta$c <- paste(dta$ratio.c,dta$p.c, sep = ' ')   

meta.all <- metabin(event.e, n.e,
                    event.c, n.c,
                    sm = 'RR',
                    data = dta,
                    studlab = study,
                    byvar = control.target,
                    subset = id %in% c(1:8,11,13))

# ---------------------------- Forest plot ---------------------------------------------

par(cex = 1.15) 

forest(meta.all, leftcols = c('studlab',
                              'tx','c'),
                 leftlabs = c('Study (Year)','Hypothermia\n (n/N)(%)',
                                             'Control\n (n/N)(%)'),
                 bylab = '',
                 label.right = '',
                 label.left = '',
                 col.square = 'blue',
                 col.diamond = 'blue',
                 overall = F,
                 col.by = 'black')

Module 6. Writing for publication and presenting results at conferences.

This module will explore approaches to interpreting and presenting results of your research. We will focus on the presentation of results in manuscripts, abstract submissions, and reports. The presentation of results in the context of conference presentations will also be covered.

Summary table using Hmisc package

Table 1. Characteristics of study participants.

	Males (n=1,868)	Females (n=2,338)	p-value
Age (years), mean (SD)	49.6 (8.70)	49.6 (8.53)	0.808
Systolic BP (mmHg), mean (SD)	130.9 (18.80)	132.3 (23.20)	0.808
BMI (kg/m²), mean (SD)	26.2 (3.4)	25.5 (3.3)	<0.001
Hypertension, n (%)	570 (31)	673 (29)	0.222
Current smoker, n (%)	1,135 (61)	957 (41)	<0.001
Diabetes, n (%)	56 (3)	51 (2)	0.095
BGL, (mmol/L), mean (SD)	4.6 (1.39)	4.5 (1.23)	0.817
Education, n (%)			<0.001
- 4th grade or less	805 (44)	909 (40)
- 5th, 6th, or 7th grades	497 (27)	724 (32)
- grade school graduate no high school	228 (13)	453 (20)
- high school, did not graduate	282 (16)	198 (9)
Follow-up (yrs), mean (SD)	19.5 (6.6)	21 (5.8)	<0.001

Using the Hmisc ‘summary’ function we have created the above Table, we have included test = TRUE, so the function will compare the continuous variables using (un-paired T-Test), and the frequency data using a Pearson’s chi-squared test.

Table 2. Hypertension and risk of stroke.

	events/n (%)	crude OR(95%CI)	adj. OR(95%CI)	p-value^*
No-hypertension	178/3,035 (5.9%)	1.0 (ref)	1.0 (ref)
Hypertension	165/1,171 (14.1%)	2.40 (1.92, 3.00)	1.72 (1.36,2.17)	<0.001
Females	176/2,338 (7.5%)	1.0 (ref)	1.0 (ref)
Males	167/1,868 (14.1%)	1.19 (0.95, 1.48)	1.27 (1.01,1.59)	0.038
Age Group
30-39	11/553 (2.0%)	1.0 (ref)	1.0 (ref)
40-49	68/1,642 (4.1%)	2.08 (1.09,3.96)	1.96 (1.03,3.74)	0.041
50-59	148/1,307 (11.3%)	5.69 (3.06,10.59)	4.84 (2.59,9.05)	< 0.001
60-69	116/704 (16.4%)	8.28 (4.42,15.53)	6.49 (3.42,12.28)	< 0.001

adjusted \(p\)-values.

Our practical sessions we will be using the R Statistical Language (https://www.r-project.org/) via the R Studio editor (https://www.rstudio.com/products/rstudio/download/#download). And, we suggest you load this onto your laptop prior to Week 1.

The R language is an object orientated statistical program, that can be used for all stages of the research process, including:

The planning of a study, meta-analysis of current literature, power and sample size;
All data cleaning and management;
All analysis; and,
Most important, presentation of results for publication in manuscripts (including all tables and figures).

Some useful links:

BMJ Statistics at square one.

https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one

Bland and Altman Statistical notes in the BMJ.

https://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm