knitr::opts_chunk$set(
    echo = TRUE,
    message = TRUE,
    warning = FALSE
)
### Create datatable() with Filter:
options(DT.options = list(pageLength = 5, 
                          lengthMenu = c(5, 10, 20, 50, 100), # Adjust option in Show entries
                          autoWidth = TRUE,
                          language = list(search = 'Filter:')))

Note: These notes are for my own study. In this section, I summarize some basic commands for building tables, drawing on several online sources.

R Packages used in this practice:

# Load the required library
library(tidyverse)    # Data Wrangling

## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6      v purrr   0.3.4 
## v tibble  3.1.8      v dplyr   1.0.10
## v tidyr   1.2.0      v stringr 1.4.1 
## v readr   2.1.2      v forcats 0.5.2 
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(conflicted)   # Resolve function-name conflicts between packages
library(readxl)       # Read Excel files
library(DT)           # For using datatable()
library(table1)       # For building a Table 1 with table1()
library(ggpubr)       # For combining plots with ggarrange()
library(compareGroups)  # For building a Table 1 with createTable()
library(gmodels)      # For the chi-square test

Dealing with Conflicts
Many packages are loaded here, and some of their functions share the same name. R’s default conflict resolution gives precedence to the most recently loaded package, which can make conflicts hard to detect, particularly when an update to an existing package introduces them.
The code below declares which package's version of a function to use, so that these notes run properly. You may or may not need the conflicted package in your own work.

conflict_prefer("filter", "dplyr")

## [conflicted] Will prefer dplyr::filter over any other package

conflict_prefer("select", "dplyr")

## [conflicted] Will prefer dplyr::select over any other package

conflict_prefer("Predict", "rms")

## [conflicted] Will prefer rms::Predict over any other package

conflict_prefer("impute_median", "simputation")

## [conflicted] Will prefer simputation::impute_median over any other package

conflict_prefer("summarize", "dplyr")

## [conflicted] Will prefer dplyr::summarize over any other package
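To see what "masking" means without loading any package, here is a minimal base-R sketch. The global `filter()` defined below is a hypothetical stand-in for a package function such as dplyr::filter(); it is not part of any library.

```r
# A user-defined filter() (hypothetical stand-in for a package function)
# now shadows stats::filter() for unqualified calls.
filter <- function(x, keep) x[keep]

# The unqualified call resolves to the most recently defined function ...
own <- filter(1:5, c(TRUE, FALSE, TRUE, FALSE, TRUE))

# ... while the namespace prefix still reaches the original one
# (stats::filter() applies a linear filter, here a 3-point moving average).
ma <- stats::filter(1:5, rep(1 / 3, 3))

print(own)
print(as.numeric(ma))
```

Writing `package::function()` is always unambiguous; conflict_prefer() simply saves typing the prefix every time.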

Data used in these notes
fakestroke.csv
(Source: https://github.com/THOMASELOVE/432-data/blob/master/data/fakestroke.csv)
Load the data and set the type of each column (the code reads a local copy saved as an Excel file, fakestroke.xlsx):

# Load the data and set the type of each column
df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx", 
                 col_types = c("text", "text", "numeric",
                               "text", "numeric", "text", "text", 
                               "numeric", "numeric", "text", "numeric", 
                               "text", "numeric", "numeric", "text", 
                               "numeric", "numeric", "numeric"))

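Since the original source is a CSV file, the same idea (declaring column types at import) can be sketched with base R's read.csv() and its colClasses argument. The inline text below is a two-row toy extract with only four of the 18 columns, not the real file, and `toy` is a hypothetical name.

```r
# Two toy rows standing in for the real file (assumption: same column names)
csv_text <- "studyid,trt,age,afib
z001,Control,53,0
z002,Intervention,51,1"

# colClasses plays the role of col_types in read_excel():
# one type per column, in column order
toy <- read.csv(
  text = csv_text,
  colClasses = c("character", "character", "numeric", "numeric")
)

str(toy)
```

To read an actual file, replace `text = csv_text` with the path to the downloaded CSV.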
1 Example from Journal Article

1.1 Example 1 - New England Journal of Medicine

A typical Table 1 involves a group comparison, for example in this excerpt from Roy et al. (2008). This Table 1 describes a multi-center randomized clinical trial comparing two different approaches to caring for patients with heart failure and atrial fibrillation:

# To add the image to R markdown: 
## ![Name of the caption](Link to the image){width=50%}: 

### ![](/Statistics/R/R data/Model practise/Table pratise/Table 1 Baseline characteristic NEJ.jpeg){height=30%}

The article provides percentages, means and standard deviations across groups, but note that it does not provide p values for the comparison of baseline characteristics. This is a common feature of NEJM reports on randomized clinical trials, where we anticipate that the two groups will be well matched at baseline.

1.2 Example 2: The MR CLEAN trial

Berkhemer et al. (2015) reported on the MR CLEAN trial, involving 500 patients with acute ischemic stroke caused by a proximal intracranial arterial occlusion. The trial was conducted at 16 medical centers in the Netherlands; 233 patients were randomly assigned to the intervention (intraarterial treatment plus usual care) and 267 to control (usual care alone).
Here’s the Table 1 from Berkhemer et al. (2015).

### ![](/Statistics/R/R data/Model practise/Table pratise/Table 1 Baseline characteristic 500 patients.jpeg)

2 Simulated fakestroke data

The fakestroke.csv file contains the following 18 variables for 500 patients:

# To make a table in R markdown: 1st row: header; 2nd row: alignment; remaining rows: content
## |Variable |  Description |
## |:------- | :----------  |
## |studyid  | Study ID  # (z001 through z500) |

| Variable | Description |
|:-------- |:----------- |
| studyid | Study ID # (z001 through z500) |
| trt | Treatment group (Intervention or Control) |
| age | Age in years |
| sex | Male or Female |
| nihss | NIH Stroke Scale Score (can range from 0-42; higher scores indicate more severe neurological deficits) |
| location | Stroke Location - Left or Right Hemisphere |
| hx.isch | History of Ischemic Stroke (Yes/No) |
| afib | Atrial Fibrillation (1 = Yes, 0 = No) |
| dm | Diabetes Mellitus (1 = Yes, 0 = No) |
| mrankin | Pre-stroke modified Rankin scale score (0, 1, 2 or > 2) indicating functional disability - complete range is 0 (no symptoms) to 6 (death) |
| sbp | Systolic blood pressure, in mm Hg |
| iv.altep | Treatment with IV alteplase (Yes/No) |
| time.iv | Time from stroke onset to start of IV alteplase (minutes) if iv.altep=Yes |
| aspects | Alberta Stroke Program Early Computed Tomography score, which measures extent of stroke from 0 - 10; higher scores indicate fewer early ischemic changes |
| ia.occlus | Intracranial arterial occlusion, based on vessel imaging - five categories |
| extra.ica | Extracranial ICA occlusion (1 = Yes, 0 = No) |
| time.rand | Time from stroke onset to study randomization, in minutes |
| time.punc | Time from stroke onset to groin puncture, in minutes (only if Intervention) |

A quick look at the simulated data in fakestroke

df

## # A tibble: 500 x 18
##    studyid trt         age sex   nihss locat~1 hx.isch  afib    dm mrankin   sbp
##    <chr>   <chr>     <dbl> <chr> <dbl> <chr>   <chr>   <dbl> <dbl> <chr>   <dbl>
##  1 z001    Control      53 Male     21 Right   No          0     0 2         127
##  2 z002    Interven~    51 Male     23 Left    No          1     0 0         137
##  3 z003    Control      68 Fema~    11 Right   No          0     0 0         138
##  4 z004    Control      28 Male     22 Left    No          0     0 0         122
##  5 z005    Control      91 Male     24 Right   No          0     0 0         162
##  6 z006    Control      34 Fema~    18 Left    No          0     0 2         166
##  7 z007    Interven~    75 Male     25 Right   No          0     0 0         140
##  8 z008    Control      89 Fema~    18 Right   No          0     0 0         157
##  9 z009    Control      75 Male     25 Left    No          1     0 2         129
## 10 z010    Interven~    26 Fema~    27 Right   No          0     0 0         143
## # ... with 490 more rows, 7 more variables: iv.altep <chr>, time.iv <dbl>,
## #   aspects <dbl>, ia.occlus <chr>, extra.ica <dbl>, time.rand <dbl>,
## #   time.punc <dbl>, and abbreviated variable name 1: location

str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : chr [1:500] "z001" "z002" "z003" "z004" ...
##  $ trt      : chr [1:500] "Control" "Intervention" "Control" "Control" ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : chr [1:500] "Male" "Male" "Female" "Male" ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : chr [1:500] "Right" "Left" "Right" "Left" ...
##  $ hx.isch  : chr [1:500] "No" "No" "No" "No" ...
##  $ afib     : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
##  $ dm       : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mrankin  : chr [1:500] "2" "0" "0" "0" ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : chr [1:500] "Yes" "Yes" "No" "Yes" ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: chr [1:500] "M1" "M1" "ICA with M1" "ICA with M1" ...
##  $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

summary(df)

##    studyid              trt                 age            sex           
##  Length:500         Length:500         Min.   :23.00   Length:500        
##  Class :character   Class :character   1st Qu.:55.00   Class :character  
##  Mode  :character   Mode  :character   Median :65.75   Mode  :character  
##                                        Mean   :64.71                     
##                                        3rd Qu.:76.00                     
##                                        Max.   :96.00                     
##                                                                          
##      nihss         location           hx.isch               afib     
##  Min.   :10.00   Length:500         Length:500         Min.   :0.00  
##  1st Qu.:14.00   Class :character   Class :character   1st Qu.:0.00  
##  Median :18.00   Mode  :character   Mode  :character   Median :0.00  
##  Mean   :18.03                                         Mean   :0.27  
##  3rd Qu.:22.00                                         3rd Qu.:1.00  
##  Max.   :28.00                                         Max.   :1.00  
##                                                                      
##        dm          mrankin               sbp          iv.altep        
##  Min.   :0.000   Length:500         Min.   : 78.0   Length:500        
##  1st Qu.:0.000   Class :character   1st Qu.:128.5   Class :character  
##  Median :0.000   Mode  :character   Median :145.0   Mode  :character  
##  Mean   :0.126                      Mean   :145.5                     
##  3rd Qu.:0.000                      3rd Qu.:162.5                     
##  Max.   :1.000                      Max.   :231.0                     
##                                     NA's   :1                         
##     time.iv          aspects        ia.occlus           extra.ica     
##  Min.   : 42.00   Min.   : 5.000   Length:500         Min.   :0.0000  
##  1st Qu.: 67.00   1st Qu.: 7.000   Class :character   1st Qu.:0.0000  
##  Median : 86.00   Median : 9.000   Mode  :character   Median :0.0000  
##  Mean   : 92.64   Mean   : 8.506                      Mean   :0.2906  
##  3rd Qu.:115.00   3rd Qu.:10.000                      3rd Qu.:1.0000  
##  Max.   :218.00   Max.   :10.000                      Max.   :1.0000  
##  NA's   :55       NA's   :4                           NA's   :1       
##    time.rand       time.punc  
##  Min.   :100.0   Min.   :180  
##  1st Qu.:151.2   1st Qu.:212  
##  Median :201.5   Median :260  
##  Mean   :208.6   Mean   :263  
##  3rd Qu.:257.8   3rd Qu.:313  
##  Max.   :360.0   Max.   :360  
##  NA's   :2       NA's   :267
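summary() reports missing values column by column; a more compact view is to count the NAs per column directly. A minimal sketch on a four-row toy data frame (the values mirror the first entries of time.iv and time.punc shown by str() above; `toy` is a hypothetical name):

```r
# Toy extract: first four values of two columns that contain NAs
toy <- data.frame(
  time.iv   = c(63, 68, NA, 78),
  time.punc = c(NA, 281, NA, NA)
)

# Number of missing values per column
na_counts <- colSums(is.na(toy))

# Share of missing values per column, in percent
na_share <- round(100 * colMeans(is.na(toy)), 1)

print(na_counts)
print(na_share)
```

On the full data, the same two lines applied to df would show at a glance that time.punc is missing for every Control patient.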

3 Building the table: compareGroups() function (preferred)

Load the fakestroke data again:

# Load the data and set the type of each column
df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx", 
                 col_types = c("text", "text", "numeric",
                               "text", "numeric", "text", "text", 
                               "numeric", "numeric", "text", "numeric", 
                               "text", "numeric", "numeric", "text", 
                               "numeric", "numeric", "numeric"))

3.1 Step 1: Identify categorical or numeric variables

Before constructing a table, I must check which variables are numeric and which are categorical.
In our data:

Numeric variables: age; nihss; sbp; time.iv; aspects; time.rand; time.punc.
Categorical variables: sex; location; hx.isch; afib; dm; mrankin; iv.altep; ia.occlus; extra.ica.

In the Excel file, afib, dm, and extra.ica are coded as 1 and 0, so they are read in as numeric.
Therefore, I must convert them to factors labelled “No” and “Yes”:

# Recode the 0/1 variables as No/Yes factors:
df$afib       <- factor(df$afib , levels=0:1, labels=c("No", "Yes"))
df$dm         <- factor(df$dm , levels=0:1, labels=c("No", "Yes"))
df$extra.ica  <- factor(df$extra.ica, levels=0:1, labels=c("No", "Yes"))
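A quick way to confirm that such a recoding did what was intended is to cross-tabulate the raw codes against the new factor. This is a minimal sketch on a made-up vector (`afib_raw` is a hypothetical name, not a column of the data):

```r
# Made-up 0/1 codes
afib_raw <- c(0, 1, 0, 0, 1)

# levels = the codes found in the data; labels = what each code should display as
afib <- factor(afib_raw, levels = 0:1, labels = c("No", "Yes"))

# Every 0 should land in "No" and every 1 in "Yes"
check <- table(raw = afib_raw, recoded = afib)
print(check)
```

Off-diagonal counts in this table would signal a mismatch between levels and labels.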

Reorder the levels of some categorical variables:

trt: Intervention (baseline), Control
mrankin: 0, 1, 2, > 2.
ia.occlus: rearrange the ia.occlus variable to the order presented in Berkhemer et al. (2015).

# Relevel factors:
df$trt = factor(df$trt, levels=c("Intervention", "Control"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1", 
                                             "M1", "M2", "A1 or A2"))

# Check order of factor after relevel: 
df$trt[1:2]

## [1] Control      Intervention
## Levels: Intervention Control

df$mrankin[1:2]

## [1] 2 0
## Levels: 0 1 2 > 2

df$ia.occlus[1:3]

## [1] M1          M1          ICA with M1
## Levels: Intracranial ICA ICA with M1 M1 M2 A1 or A2
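One pitfall when setting levels by hand: any value that does not exactly match one of the listed levels silently becomes NA. A minimal sketch with made-up values, where ">2" (no space) is a deliberate misspelling of "> 2":

```r
# Made-up raw values; the third one is misspelled on purpose
raw <- c("0", "> 2", ">2", "1")

f <- factor(raw, levels = c("0", "1", "2", "> 2"))
print(f)

# Values lost to NA by the conversion (NAs after minus NAs before)
n_lost <- sum(is.na(f)) - sum(is.na(raw))
print(n_lost)
```

Comparing the NA count before and after releveling is a cheap safeguard on the real data as well.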

3.2 Step 2: First draft: compareGroups()

Our goal, then, is to take the data in fakestroke.csv and use it to generate a Table for the study that compares the 233 patients in the Intervention group to the 267 patients in the Control group, on all of the other variables (except study ID #) available.

I’ll use the compareGroups package of functions available in R to help me complete this task.

The command is as follows:
createTable( compareGroups( group ~ var1 + var2 + …., data = … ))

var1; var2: numeric or categorical variables
** For numeric variables: Mean (SD) or Median [Q1; Q3]
** For categorical variables: count (%) for each level, listed in level order
group: name of the grouping variable you want to summarize by; its levels become the column headers.

createTable( compareGroups( trt ~ age + sex + nihss + location + hx.isch + afib + dm +
                              mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus +
                              extra.ica + time.rand + time.punc, 
                            data = df ))

## 
## --------Summary descriptives table by 'trt'---------
## 
## _______________________________________________________ 
##                      Intervention   Control   p.overall 
##                         N=233        N=267              
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  63.9 (18.1)  65.4 (16.1)   0.347   
## sex:                                            0.917   
##     Female            98 (42.1%)  110 (41.2%)           
##     Male             135 (57.9%)  157 (58.8%)           
## nihss                18.0 (5.04)  18.1 (4.32)   0.790   
## location:                                       0.111   
##     Left             116 (49.8%)  153 (57.3%)           
##     Right            117 (50.2%)  114 (42.7%)           
## hx.isch:                                        0.335   
##     No               204 (87.6%)  242 (90.6%)           
##     Yes               29 (12.4%)  25 (9.36%)            
## afib:                                           0.601   
##     No               167 (71.7%)  198 (74.2%)           
##     Yes               66 (28.3%)  69 (25.8%)            
## dm:                                             1.000   
##     No               204 (87.6%)  233 (87.3%)           
##     Yes               29 (12.4%)  34 (12.7%)            
## mrankin:                                        0.922   
##     0                190 (81.5%)  214 (80.1%)           
##     1                 21 (9.01%)  29 (10.9%)            
##     2                 12 (5.15%)  13 (4.87%)            
##     > 2               10 (4.29%)  11 (4.12%)            
## sbp                   146 (26.0)  145 (24.4)    0.649   
## iv.altep:                                       0.267   
##     No                30 (12.9%)  25 (9.36%)            
##     Yes              203 (87.1%)  242 (90.6%)           
## time.iv              98.2 (45.5)  88.0 (26.0)   0.005   
## aspects              8.35 (1.64)  8.65 (1.47)   0.034   
## ia.occlus:                                      0.819   
##     Intracranial ICA  1 (0.43%)    3 (1.13%)            
##     ICA with M1       59 (25.3%)  75 (28.2%)            
##     M1               154 (66.1%)  165 (62.0%)           
##     M2                18 (7.73%)  21 (7.89%)            
##     A1 or A2          1 (0.43%)    2 (0.75%)            
## extra.ica:                                      0.179   
##     No               158 (67.8%)  196 (73.7%)           
##     Yes               75 (32.2%)  70 (26.3%)            
## time.rand             203 (57.3)  214 (70.3)    0.048   
## time.punc             263 (54.2)     . (.)        .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Discussion

By default, compareGroups() treats every continuous variable as normally distributed, so the table above reports Mean (SD) and a p-value from a test that assumes normality. That p-value is not appropriate for non-normal variables. Therefore, we must check the distribution of each continuous variable (Step 3) or go directly to Step 4.
For numeric variables: Mean (SD) or Median [Q1; Q3]?
** Normal distribution: Mean (SD)
** Non-normal: Median [Q1; Q3]
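To make the two choices concrete, here is a base-R sketch on simulated, right-skewed data (not the fakestroke data). As I understand the compareGroups documentation, normal-distributed variables are compared with a t-test or ANOVA and non-normal ones with a Kruskal-Wallis test; treat that mapping, and the Welch variant of the t-test used here, as assumptions.

```r
set.seed(42)

# Two simulated groups of 100 patients each
grp <- factor(rep(c("Intervention", "Control"), each = 100),
              levels = c("Intervention", "Control"))

# Right-skewed toy "time" variable (simulated)
time_toy <- rexp(200, rate = 1 / 90) + 40

# Summary under the normal assumption: mean and SD per group
mean_sd <- sapply(split(time_toy, grp),
                  function(v) c(mean = mean(v), sd = sd(v)))

# Summary for skewed data: median, Q1 and Q3 per group
med_iqr <- sapply(split(time_toy, grp),
                  quantile, probs = c(0.5, 0.25, 0.75))

# Parametric and rank-based comparisons of the two groups
p_param    <- t.test(time_toy ~ grp)$p.value        # Welch t-test
p_nonparam <- kruskal.test(time_toy ~ grp)$p.value  # Kruskal-Wallis test

print(round(mean_sd, 1))
print(round(med_iqr, 1))
print(c(t_test = p_param, kruskal = p_nonparam))
```

With skewed data the mean sits well above the median, which is why the choice of summary matters for the table.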

3.3 Step 3: Testing normality - Shapiro–Wilk test (can skip this step)

3.3.1 Shapiro-Wilk test

Mechanism: the test compares the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. compareGroups() can run this check for us directly (see Step 4), so this step can be skipped.

Null hypothesis: the data are normally distributed (p > 0.05: no evidence against normality).
Alternative hypothesis: the data are not normally distributed (p < 0.05: reject normality).

Limitations: with large sample sizes it is very easy to get a significant result from a small deviation from normality, so a significant test does not necessarily tell us whether the deviation is large enough to bias the statistical procedures applied to the data.
=> Plot the data (histogram / Q-Q plot) as well and make an informed decision about the extent of non-normality (also considering the values of skew and kurtosis).
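Skew and kurtosis can be computed in a few lines of base R. The two helpers below are hypothetical (they are not from any package used in these notes) and use the simple moment-based definitions; dedicated packages may apply small-sample corrections, so their values can differ slightly.

```r
# Moment-based sample skewness: 0 for a symmetric sample,
# positive when the right tail is longer
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: 0 for a normal distribution,
# negative for flatter-than-normal samples
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

sym   <- c(1, 2, 3, 4, 5)    # symmetric toy sample
right <- c(1, 1, 2, 2, 10)   # toy sample with one large value

print(c(skew = skewness(sym), kurtosis = excess_kurtosis(sym)))
print(c(skew = skewness(right), kurtosis = excess_kurtosis(right)))
```

Values far from 0 point to a deviation from normality that is worth inspecting on a histogram or Q-Q plot.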

For one variable in the whole sample (no grouping):
shapiro.test(variable)
For one variable tested separately within each group:
by(numeric variable, group/categorical variable, name of test)
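Both forms can be tried on simulated data. In this sketch `toy` and its columns are made-up stand-ins; tapply() is used next to by() only because it returns the p-values as a plain named vector, which is easier to reuse.

```r
set.seed(1)

# Simulated data: one grouping factor and one roughly normal variable
toy <- data.frame(
  trt = factor(rep(c("Intervention", "Control"), each = 60),
               levels = c("Intervention", "Control")),
  sbp = rnorm(120, mean = 145, sd = 25)
)

# Whole sample, no grouping
overall <- shapiro.test(toy$sbp)

# One test per group, printed in full
per_group <- by(toy$sbp, toy$trt, shapiro.test)
print(per_group)

# Only the p-values, as a named numeric vector
p_values <- tapply(toy$sbp, toy$trt,
                   function(v) shapiro.test(v)$p.value)
print(round(p_values, 3))
```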

3.3.2 Drawing Q-Q plot

There are two ways to draw it:

Package: ggplot2

ggplot(data = , aes(sample = )) + geom_qq() + geom_qq_line() + labs( x = "Theoretical", y = "Sample Quantiles -")

Package: ggpubr

ggqqplot(variable, ylab = " ")
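It can help to see what a Q-Q plot actually plots. The base-R sketch below builds the two coordinates by hand on simulated data: the sorted sample against the matching quantiles of a standard normal distribution. The plotting positions from ppoints() are the ones base R's qqnorm() uses; I assume, without having checked, that geom_qq() behaves similarly.

```r
set.seed(7)

# Simulated, roughly normal "age" values
x <- rnorm(200, mean = 65, sd = 17)

# x axis: theoretical quantiles of the standard normal distribution
theoretical <- qnorm(ppoints(length(x)))

# y axis: the observed values in increasing order
sample_q <- sort(x)

# For normal data the points fall close to a straight line,
# so the correlation between the two axes is close to 1
qq_cor <- cor(theoretical, sample_q)

print(head(data.frame(theoretical, sample_q)))
print(qq_cor)
```

Curvature or heavy tails in the real data show up as systematic departures from that straight line.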

3.3.3 Practice

Let's check the numeric variables: age; nihss; sbp; time.iv; aspects; time.rand; time.punc.

age

# Calculate p-value
by(df$age, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.97333, p-value = 0.0002228
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.96996, p-value = 2.132e-05

# Draw Q-Q plot for age
a = ggplot(data = df, aes(sample = age)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - Age")

# Draw Q-Q plot for age - Intervention vs Control
b = ggplot(data = df, aes(sample = age)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - Age")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for Age", 
               color = "red", face = "bold", size = 12))

A quicker way to draw a Q-Q plot:

ggqqplot(df$age, ylab = "age")

nihss

# Calculate p-value
by(df$nihss, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.94435, p-value = 9.121e-08
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.93951, p-value = 5.145e-09

# Draw Q-Q plot for nihss
a = ggplot(data = df, aes(sample = nihss)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - nihss")

# Draw Q-Q plot for nihss - Intervention vs Control
b = ggplot(data = df, aes(sample = nihss)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - nihss")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for nihss", 
               color = "red", face = "bold", size = 12))

sbp

# Calculate p-value
by(df$sbp, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.99584, p-value = 0.7869
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.99676, p-value = 0.8671

# Draw Q-Q plot for sbp
a = ggplot(data = df, aes(sample = sbp)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - sbp")

# Draw Q-Q plot for sbp - Intervention vs Control
b = ggplot(data = df, aes(sample = sbp)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - sbp")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for sbp", 
               color = "red", face = "bold", size = 12))

time.iv

# Calculate p-value
by(df$time.iv, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.88032, p-value = 1.31e-11
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.93338, p-value = 5.276e-09

# Draw Q-Q plot for time.iv
a = ggplot(data = df, aes(sample = time.iv)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - time.iv")

# Draw Q-Q plot for time.iv - Intervention vs Control
b = ggplot(data = df, aes(sample = time.iv)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - time.iv")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.iv", 
               color = "red", face = "bold", size = 12))

aspects

# Calculate p-value
by(df$aspects, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.85145, p-value = 3.364e-14
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.82188, p-value < 2.2e-16

# Draw Q-Q plot for aspects
a = ggplot(data = df, aes(sample = aspects)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - aspects")

# Draw Q-Q plot for aspects - Intervention vs Control
b = ggplot(data = df, aes(sample = aspects)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - aspects")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for aspects", 
               color = "red", face = "bold", size = 12))

time.rand

# Calculate p-value
by(df$time.rand, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.95684, p-value = 2.032e-06
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.92567, p-value = 2.703e-10

# Draw Q-Q plot for time.rand
a = ggplot(data = df, aes(sample = time.rand)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - time.rand")

# Draw Q-Q plot for time.rand - Intervention vs Control
b = ggplot(data = df, aes(sample = time.rand)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - time.rand")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.rand", 
               color = "red", face = "bold", size = 12))

time.punc

# time.punc has values only in the Intervention group, so test it without grouping
shapiro.test(df$time.punc)

## 
##  Shapiro-Wilk normality test
## 
## data:  df$time.punc
## W = 0.93577, p-value = 1.435e-08

# Draw Q-Q plot for time.punc
plot = ggplot(data = df, aes(sample = time.punc)) + 
        geom_qq() +
        geom_qq_line() +
        labs( x = "Theoretical", y = "Sample Quantiles - time.punc")

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.punc", 
               color = "red", face = "bold", size = 12))

Discussion
Based on the p-values and Q-Q plots for the Intervention and Control groups:

sbp has p-value > 0.05 in both groups => consistent with a normal distribution => use Mean (SD).
The other variables have p-value < 0.05 => not normally distributed => use Median [Q1; Q3].
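The per-variable checks above repeat the same call many times. A compact alternative, sketched here on simulated data, loops over the variable names and collects the Shapiro-Wilk p-values in one table. The data frame `toy`, its two columns, and the 0.05 cut-off in the last step are assumptions for illustration.

```r
set.seed(123)
n <- 100

# Simulated stand-in for the real data
toy <- data.frame(
  trt     = factor(rep(c("Intervention", "Control"), each = n / 2),
                   levels = c("Intervention", "Control")),
  sbp     = rnorm(n, mean = 145, sd = 25),   # roughly normal
  time.iv = rexp(n, rate = 1 / 50) + 40      # right-skewed
)

vars <- c("sbp", "time.iv")

# One column per variable, one row per treatment group
p_table <- sapply(vars, function(v) {
  tapply(toy[[v]], toy$trt, function(x) shapiro.test(x)$p.value)
})
print(signif(p_table, 3))

# TRUE where no group rejects normality at the 0.05 level
use_mean_sd <- apply(p_table, 2, function(p) all(p > 0.05))
print(use_mean_sd)
```

The p-values are still only a guide; the Q-Q plots remain the better basis for the final decision.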

3.4 Step 4: Displaying different statistics for different variables:

By default, continuous variables are analyzed as normally distributed, so when a table is built a variable such as sbp is described with its mean and standard deviation. To change the default, e.g., to treat age, nihss, time.iv, aspects, time.rand, and time.punc as non-normally distributed:

compareGroups(group variable ~ numeric/categorical variable, data = , method = c(numeric variable = 2))

Possible values of the method argument are:

1: forces analysis as normal-distributed
2: forces analysis as continuous non-normal
3: forces analysis as categorical
NA: performs a Shapiro-Wilk test to decide between normal and non-normal (preferable).
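The NA option can be pictured as the following decision rule. This is only a sketch of the idea in base R: `choose_method()` is a hypothetical helper, not compareGroups' own code, and details such as the threshold (0.05 here) or whether the package tests the pooled sample or each group are assumptions I have not verified.

```r
# Hypothetical helper mimicking the rule described for method = NA:
# run a Shapiro-Wilk test and map the result to method code 1 or 2
choose_method <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p < alpha) 2L else 1L   # 2 = continuous non-normal, 1 = normal
}

set.seed(2024)
skewed   <- rexp(300, rate = 1 / 90)        # strongly right-skewed toy data
bell     <- rnorm(300, mean = 145, sd = 25) # roughly normal toy data

m_skewed <- choose_method(skewed)
m_bell   <- choose_method(bell)

print(c(skewed = m_skewed, bell = m_bell))
```

A strongly skewed sample of this size ends up with code 2; a roughly normal sample usually gets code 1, although by chance it can be flagged as non-normal about 5% of the time.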

If I already know the distribution of each variable from Step 3 (normal or non-normal):

createTable( compareGroups( trt ~ age + sex + nihss + location + hx.isch + afib + dm +
                              mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus +
                              extra.ica + time.rand + time.punc, 
                            data = df,
                            method = c(age=2, nihss=2, sbp = 1, time.iv=2, aspects=2, 
                                       time.rand =2, time.punc =2)))

## 
## --------Summary descriptives table by 'trt'---------
## 
## ________________________________________________________________ 
##                        Intervention       Control      p.overall 
##                           N=233            N=267                 
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2]   0.579   
## sex:                                                     0.917   
##     Female              98 (42.1%)      110 (41.2%)              
##     Male               135 (57.9%)      157 (58.8%)              
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0]   0.453   
## location:                                                0.111   
##     Left               116 (49.8%)      153 (57.3%)              
##     Right              117 (50.2%)      114 (42.7%)              
## hx.isch:                                                 0.335   
##     No                 204 (87.6%)      242 (90.6%)              
##     Yes                 29 (12.4%)       25 (9.36%)              
## afib:                                                    0.601   
##     No                 167 (71.7%)      198 (74.2%)              
##     Yes                 66 (28.3%)       69 (25.8%)              
## dm:                                                      1.000   
##     No                 204 (87.6%)      233 (87.3%)              
##     Yes                 29 (12.4%)       34 (12.7%)              
## mrankin:                                                 0.922   
##     0                  190 (81.5%)      214 (80.1%)              
##     1                   21 (9.01%)       29 (10.9%)              
##     2                   12 (5.15%)       13 (4.87%)              
##     > 2                 10 (4.29%)       11 (4.12%)              
## sbp                     146 (26.0)       145 (24.4)      0.649   
## iv.altep:                                                0.267   
##     No                  30 (12.9%)       25 (9.36%)              
##     Yes                203 (87.1%)      242 (90.6%)              
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0]   0.075   
## ia.occlus:                                               0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)               
##     ICA with M1         59 (25.3%)       75 (28.2%)              
##     M1                 154 (66.1%)      165 (62.0%)              
##     M2                  18 (7.73%)       21 (7.89%)              
##     A1 or A2            1 (0.43%)        2 (0.75%)               
## extra.ica:                                               0.179   
##     No                 158 (67.8%)      196 (73.7%)              
##     Yes                 75 (32.2%)       70 (26.3%)              
## time.rand             204 [152;250]    196 [149;266]     0.251   
## time.punc             260 [212;313]       . [.;.]          .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

If I skip Step 3 (normality checking), I can set method = c(numeric variable = NA) and let compareGroups() decide:

createTable( compareGroups( trt ~ age + sex + nihss + location + hx.isch + afib + dm +
                              mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus +
                              extra.ica + time.rand + time.punc, 
                            data = df,
                            method = c(age=NA, nihss=NA, sbp = NA, time.iv=NA, aspects=NA, 
                                       time.rand =NA, time.punc =NA)))

## 
## --------Summary descriptives table by 'trt'---------
## 
## ________________________________________________________________ 
##                        Intervention       Control      p.overall 
##                           N=233            N=267                 
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2]   0.579   
## sex:                                                     0.917   
##     Female              98 (42.1%)      110 (41.2%)              
##     Male               135 (57.9%)      157 (58.8%)              
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0]   0.453   
## location:                                                0.111   
##     Left               116 (49.8%)      153 (57.3%)              
##     Right              117 (50.2%)      114 (42.7%)              
## hx.isch:                                                 0.335   
##     No                 204 (87.6%)      242 (90.6%)              
##     Yes                 29 (12.4%)       25 (9.36%)              
## afib:                                                    0.601   
##     No                 167 (71.7%)      198 (74.2%)              
##     Yes                 66 (28.3%)       69 (25.8%)              
## dm:                                                      1.000   
##     No                 204 (87.6%)      233 (87.3%)              
##     Yes                 29 (12.4%)       34 (12.7%)              
## mrankin:                                                 0.922   
##     0                  190 (81.5%)      214 (80.1%)              
##     1                   21 (9.01%)       29 (10.9%)              
##     2                   12 (5.15%)       13 (4.87%)              
##     > 2                 10 (4.29%)       11 (4.12%)              
## sbp                     146 (26.0)       145 (24.4)      0.649   
## iv.altep:                                                0.267   
##     No                  30 (12.9%)       25 (9.36%)              
##     Yes                203 (87.1%)      242 (90.6%)              
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0]   0.075   
## ia.occlus:                                               0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)               
##     ICA with M1         59 (25.3%)       75 (28.2%)              
##     M1                 154 (66.1%)      165 (62.0%)              
##     M2                  18 (7.73%)       21 (7.89%)              
##     A1 or A2            1 (0.43%)        2 (0.75%)               
## extra.ica:                                               0.179   
##     No                 158 (67.8%)      196 (73.7%)              
##     Yes                 75 (32.2%)       70 (26.3%)              
## time.rand             204 [152;250]    196 [149;266]     0.251   
## time.punc             260 [212;313]       . [.;.]          .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Discuss:

The two tables in Step 4 are identical. Therefore, we can skip Step 3 and set method = c(variable = NA) directly.
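To make the idea behind method = NA more concrete, here is a minimal base R sketch of the decision it automates. The helper choose_method() is a hypothetical name of my own, not part of compareGroups, and the 0.05 threshold is an assumption; the package's internal code may differ in detail.

```r
# Hypothetical helper (not part of compareGroups): pick a summary type
# from a Shapiro-Wilk test, mimicking the idea behind method = NA.
choose_method <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                 # shapiro.test() needs complete values
  p <- shapiro.test(x)$p.value
  # 1 = treat as normal: mean (SD); 2 = treat as non-normal: median [Q1; Q3]
  if (p >= alpha) 1L else 2L
}

set.seed(42)
skewed <- rexp(300, rate = 1 / 90)  # simulated, strongly right-skewed values
choose_method(skewed)
```

For data as skewed as this simulated vector, the helper picks the non-normal summary, which is the same choice the table above makes for variables such as time.iv.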

3.5 Step 5: Odds Ratio

When the response variable is binary, the Odds Ratio (OR) can be printed in the final table. If the response variable is time-to-event, the Hazard Ratio (HR) can be printed instead.

ref: sets the reference category of a categorical row variable.
ref.y: sets the reference category of the response (column) variable used when the OR or HR is calculated. By default (ref.y = 1), it is the first category.
https://cran.r-project.org/web/packages/compareGroups/vignettes/compareGroups_vignette.html

Note: This output shows the OR for Control, with Intervention as the reference category (default ref.y = 1). This code should be used for visualization only.

OR <- compareGroups(trt ~ age + sex + nihss + location + hx.isch + afib + dm + 
                      mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus + 
                      extra.ica + time.rand + time.punc,
                    data = df,
                    method = c(age = NA, nihss = NA, sbp = NA, time.iv = NA,
                               aspects = NA, time.rand = NA, time.punc = NA),
                    ref = 1,
                    ref.y = 1)

createTable(OR, show.ratio = TRUE)

## 
## --------Summary descriptives table by 'trt'---------
## 
## _________________________________________________________________________________________ 
##                        Intervention       Control             OR        p.ratio p.overall 
##                           N=233            N=267                                          
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2] 1.01 [0.99;1.02]  0.342    0.579   
## sex:                                                                              0.917   
##     Female              98 (42.1%)      110 (41.2%)          Ref.        Ref.             
##     Male               135 (57.9%)      157 (58.8%)    1.04 [0.72;1.48]  0.846            
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0] 1.01 [0.97;1.04]  0.787    0.453   
## location:                                                                         0.111   
##     Left               116 (49.8%)      153 (57.3%)          Ref.        Ref.             
##     Right              117 (50.2%)      114 (42.7%)    0.74 [0.52;1.05]  0.094            
## hx.isch:                                                                          0.335   
##     No                 204 (87.6%)      242 (90.6%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       25 (9.36%)    0.73 [0.41;1.28]  0.273            
## afib:                                                                             0.601   
##     No                 167 (71.7%)      198 (74.2%)          Ref.        Ref.             
##     Yes                 66 (28.3%)       69 (25.8%)    0.88 [0.59;1.31]  0.534            
## dm:                                                                               1.000   
##     No                 204 (87.6%)      233 (87.3%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       34 (12.7%)    1.03 [0.60;1.75]  0.925            
## mrankin:                                                                          0.922   
##     0                  190 (81.5%)      214 (80.1%)          Ref.        Ref.             
##     1                   21 (9.01%)       29 (10.9%)    1.22 [0.68;2.25]  0.507            
##     2                   12 (5.15%)       13 (4.87%)    0.96 [0.42;2.20]  0.924            
##     > 2                 10 (4.29%)       11 (4.12%)    0.98 [0.40;2.42]  0.956            
## sbp                     146 (26.0)       145 (24.4)    1.00 [0.99;1.01]  0.646    0.649   
## iv.altep:                                                                         0.267   
##     No                  30 (12.9%)       25 (9.36%)          Ref.        Ref.             
##     Yes                203 (87.1%)      242 (90.6%)    1.43 [0.81;2.53]  0.215            
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]  0.99 [0.99;1.00]  0.004    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0] 1.13 [1.01;1.27]  0.033    0.075   
## ia.occlus:                                                                        0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)           Ref.        Ref.             
##     ICA with M1         59 (25.3%)       75 (28.2%)    0.46 [0.02;4.10]  0.513            
##     M1                 154 (66.1%)      165 (62.0%)    0.39 [0.01;3.40]  0.414            
##     M2                  18 (7.73%)       21 (7.89%)    0.43 [0.01;4.06]  0.484            
##     A1 or A2            1 (0.43%)        2 (0.75%)     0.71 [0.01;38.5]  0.857            
## extra.ica:                                                                        0.179   
##     No                 158 (67.8%)      196 (73.7%)          Ref.        Ref.             
##     Yes                 75 (32.2%)       70 (26.3%)    0.75 [0.51;1.11]  0.152            
## time.rand             204 [152;250]    196 [149;266]   1.00 [1.00;1.01]  0.051    0.251   
## time.punc             260 [212;313]       . [.;.]          . [.;.]         .        .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

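Note: To show the OR for Intervention (with Control as the reference), set ref.y = 2. In the code below, ref also changes the reference category of sex and location to their second level: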

OR <- compareGroups(trt ~ age + sex + nihss + location + hx.isch + afib + dm + 
                      mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus + 
                      extra.ica + time.rand + time.punc,
                    data = df, 
                    method = c(age=NA, nihss=NA, sbp = NA, time.iv=NA, 
                               aspects=NA, time.rand =NA, time.punc =NA),
                    ref = c(sex = 2, location = 2),
                    ref.y = 2
                    )
createTable(OR, show.ratio = TRUE)

## 
## --------Summary descriptives table by 'trt'---------
## 
## _________________________________________________________________________________________ 
##                        Intervention       Control             OR        p.ratio p.overall 
##                           N=233            N=267                                          
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2] 1.00 [1.01;0.98]  0.342    0.579   
## sex:                                                                              0.917   
##     Female              98 (42.1%)      110 (41.2%)    1.04 [0.72;1.48]  0.846            
##     Male               135 (57.9%)      157 (58.8%)          Ref.        Ref.             
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0] 0.99 [1.03;0.96]  0.787    0.453   
## location:                                                                         0.111   
##     Left               116 (49.8%)      153 (57.3%)    0.74 [0.52;1.05]  0.094            
##     Right              117 (50.2%)      114 (42.7%)          Ref.        Ref.             
## hx.isch:                                                                          0.335   
##     No                 204 (87.6%)      242 (90.6%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       25 (9.36%)    1.37 [0.78;2.44]  0.273            
## afib:                                                                             0.601   
##     No                 167 (71.7%)      198 (74.2%)          Ref.        Ref.             
##     Yes                 66 (28.3%)       69 (25.8%)    1.13 [0.76;1.69]  0.534            
## dm:                                                                               1.000   
##     No                 204 (87.6%)      233 (87.3%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       34 (12.7%)    0.97 [0.57;1.66]  0.925            
## mrankin:                                                                          0.922   
##     0                  190 (81.5%)      214 (80.1%)          Ref.        Ref.             
##     1                   21 (9.01%)       29 (10.9%)    0.82 [0.44;1.48]  0.507            
##     2                   12 (5.15%)       13 (4.87%)    1.04 [0.45;2.37]  0.924            
##     > 2                 10 (4.29%)       11 (4.12%)    1.03 [0.41;2.51]  0.956            
## sbp                     146 (26.0)       145 (24.4)    1.00 [1.01;0.99]  0.646    0.649   
## iv.altep:                                                                         0.267   
##     No                  30 (12.9%)       25 (9.36%)          Ref.        Ref.             
##     Yes                203 (87.1%)      242 (90.6%)    0.70 [0.40;1.23]  0.215            
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]  1.01 [1.01;1.00]  0.004    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0] 0.88 [0.99;0.79]  0.033    0.075   
## ia.occlus:                                                                        0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)           Ref.        Ref.             
##     ICA with M1         59 (25.3%)       75 (28.2%)    2.16 [0.24;63.1]  0.513            
##     M1                 154 (66.1%)      165 (62.0%)    2.57 [0.29;74.2]  0.414            
##     M2                  18 (7.73%)       21 (7.89%)    2.33 [0.25;71.2]  0.484            
##     A1 or A2            1 (0.43%)        2 (0.75%)     1.41 [0.03;76.8]  0.857            
## extra.ica:                                                                        0.179   
##     No                 158 (67.8%)      196 (73.7%)          Ref.        Ref.             
##     Yes                 75 (32.2%)       70 (26.3%)    1.33 [0.90;1.96]  0.152            
## time.rand             204 [152;250]    196 [149;266]   1.00 [1.00;0.99]  0.051    0.251   
## time.punc             260 [212;313]       . [.;.]          . [.;.]         .        .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
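The effect of ref and ref.y can be checked by hand from the 2x2 counts of sex by trt shown in the tables above. The sketch below uses a plain cross-product odds ratio; cross_or() is a hypothetical helper written for this note, and compareGroups() may use a different estimator, so small differences from the printed OR are possible.

```r
# Counts copied from the tables above (rows: sex, columns: trt)
sex <- matrix(c(98, 110,
                135, 157),
              nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("Female", "Male"),
                              trt = c("Intervention", "Control")))

# Hypothetical helper: cross-product odds ratio of a 2x2 table.
# ref   = row used as the reference category
# ref.y = column used as the reference category of the response
cross_or <- function(tab, ref = 1, ref.y = 1) {
  other   <- setdiff(1:2, ref)
  other.y <- setdiff(1:2, ref.y)
  (tab[other, other.y] / tab[other, ref.y]) /
    (tab[ref, other.y] / tab[ref, ref.y])
}

or_default   <- cross_or(sex, ref = 1, ref.y = 1)  # Male vs Female, odds of Control
or_flip_y    <- cross_or(sex, ref = 1, ref.y = 2)  # Male vs Female, odds of Intervention
or_flip_both <- cross_or(sex, ref = 2, ref.y = 2)  # Female vs Male, odds of Intervention

round(c(or_default, or_flip_y, or_flip_both), 2)
```

Changing only ref.y gives the reciprocal of the default OR, while changing both ref and ref.y returns the original value. This agrees with the two tables: sex shows 1.04 in both, whereas hx.isch (where only ref.y changed) moves from 0.73 to 1.37.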

Checking the OR with the CrossTable() function (gmodels package):

sex

CrossTable(df$sex, df$trt, fisher = TRUE, chisq = TRUE, expected = TRUE, sresid = TRUE, format = "SPSS")

## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |         Expected Values |
## | Chi-square contribution |
## |             Row Percent |
## |          Column Percent |
## |           Total Percent |
## |            Std Residual |
## |-------------------------|
## 
## Total Observations in Table:  500 
## 
##              | df$trt 
##       df$sex | Intervention  |      Control  |    Row Total | 
## -------------|--------------|--------------|--------------|
##       Female |          98  |         110  |         208  | 
##              |      96.928  |     111.072  |              | 
##              |       0.012  |       0.010  |              | 
##              |      47.115% |      52.885% |      41.600% | 
##              |      42.060% |      41.199% |              | 
##              |      19.600% |      22.000% |              | 
##              |       0.109  |      -0.102  |              | 
## -------------|--------------|--------------|--------------|
##         Male |         135  |         157  |         292  | 
##              |     136.072  |     155.928  |              | 
##              |       0.008  |       0.007  |              | 
##              |      46.233% |      53.767% |      58.400% | 
##              |      57.940% |      58.801% |              | 
##              |      27.000% |      31.400% |              | 
##              |      -0.092  |       0.086  |              | 
## -------------|--------------|--------------|--------------|
## Column Total |         233  |         267  |         500  | 
##              |      46.600% |      53.400% |              | 
## -------------|--------------|--------------|--------------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  0.03801773     d.f. =  1     p =  0.8454075 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  0.01082402     d.f. =  1     p =  0.9171387 
## 
##  
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Sample estimate odds ratio:  1.03604 
## 
## Alternative hypothesis: true odds ratio is not equal to 1
## p =  0.8561188 
## 95% confidence interval:  0.7138417 1.50323 
## 
## Alternative hypothesis: true odds ratio is less than 1
## p =  0.6126348 
## 95% confidence interval:  0 1.419597 
## 
## Alternative hypothesis: true odds ratio is greater than 1
## p =  0.4584408 
## 95% confidence interval:  0.7559892 Inf 
## 
## 
##  
##        Minimum expected frequency: 96.928

location

CrossTable(df$location, df$trt, fisher = TRUE, chisq = TRUE, expected = TRUE, sresid = TRUE, format = "SPSS")

## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |         Expected Values |
## | Chi-square contribution |
## |             Row Percent |
## |          Column Percent |
## |           Total Percent |
## |            Std Residual |
## |-------------------------|
## 
## Total Observations in Table:  500 
## 
##              | df$trt 
##  df$location | Intervention  |      Control  |    Row Total | 
## -------------|--------------|--------------|--------------|
##         Left |         116  |         153  |         269  | 
##              |     125.354  |     143.646  |              | 
##              |       0.698  |       0.609  |              | 
##              |      43.123% |      56.877% |      53.800% | 
##              |      49.785% |      57.303% |              | 
##              |      23.200% |      30.600% |              | 
##              |      -0.835  |       0.780  |              | 
## -------------|--------------|--------------|--------------|
##        Right |         117  |         114  |         231  | 
##              |     107.646  |     123.354  |              | 
##              |       0.813  |       0.709  |              | 
##              |      50.649% |      49.351% |      46.200% | 
##              |      50.215% |      42.697% |              | 
##              |      23.400% |      22.800% |              | 
##              |       0.902  |      -0.842  |              | 
## -------------|--------------|--------------|--------------|
## Column Total |         233  |         267  |         500  | 
##              |      46.600% |      53.400% |              | 
## -------------|--------------|--------------|--------------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  2.829263     d.f. =  1     p =  0.09256083 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  2.534881     d.f. =  1     p =  0.1113553 
## 
##  
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Sample estimate odds ratio:  0.7391921 
## 
## Alternative hypothesis: true odds ratio is not equal to 1
## p =  0.105602 
## 95% confidence interval:  0.5108763 1.068214 
## 
## Alternative hypothesis: true odds ratio is less than 1
## p =  0.05565712 
## 95% confidence interval:  0 1.00947 
## 
## Alternative hypothesis: true odds ratio is greater than 1
## p =  0.9618255 
## 95% confidence interval:  0.5408447 Inf 
## 
## 
##  
##        Minimum expected frequency: 107.646

Discuss

p.ratio: the p-value attached to each OR; we can ignore this parameter.
** For numeric variables: it reflects the p-value of a parametric test.
** For categorical variables: it is close to the p-value of the Chi-squared test without continuity correction (for sex: 0.846 vs 0.845).
p.overall:
** For numeric variables: it reflects the p-value of a parametric or non-parametric test, depending on the method chosen through method = c(variable = NA).
** For categorical variables: it reflects the p-value of the Chi-squared test with Yates’ continuity correction (for sex: 0.917 in both outputs).
OR: the OR from compareGroups() is similar to the estimate of the Fisher exact test (for sex: 1.04 vs 1.036).
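The three numbers compared above can also be reproduced with base R alone, without gmodels. This is a small sketch that re-enters the sex by trt counts from the CrossTable() output as a matrix; it only uses chisq.test() and fisher.test() from the stats package.

```r
# Counts copied from the CrossTable() output above (rows: sex, columns: trt)
sex <- matrix(c(98, 110, 135, 157), nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("Female", "Male"),
                              trt = c("Intervention", "Control")))

p_pearson <- chisq.test(sex, correct = FALSE)$p.value  # Pearson, no correction
p_yates   <- chisq.test(sex, correct = TRUE)$p.value   # with Yates' correction
fisher    <- fisher.test(sex)                          # conditional estimate of the OR

round(c(pearson = p_pearson, yates = p_yates), 3)
round(unname(fisher$estimate), 3)
```

The uncorrected p-value is the one to compare with p.ratio, the Yates-corrected one with p.overall, and the Fisher estimate with the OR column.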

3.6 Step 6: Exporting the table (for compareGroups only)

Tables can be exported to CSV, HTML, LaTeX, PDF, Markdown, Word or Excel:

export2csv(restab, file = "table1.csv"): exports to CSV format
export2html(restab, file = "table1.html"): exports to HTML format
export2pdf(restab, file = "table1.pdf"): exports to PDF format
export2word(restab, file = "table1.docx"): exports to Word format
export2xls(restab, file = "table1.xlsx"): exports to Excel format

restab = createTable(OR, show.ratio = TRUE)

export2csv(restab, file='table1.csv')

3.7 Extra: Dealing with missing value

Many times, it is important to be aware of the missingness contained in each variable, possibly by groups. Although the “available” table shows the number of non-missing values for each row-variable in each group, it would be desirable to test whether the frequency of non-available data differs between groups. For this purpose, the compareGroups package provides a function called missingTable(). This function applies to both compareGroups and createTable class objects; the latter option is useful when the table has already been created.

Using missingTable() as in the following code is preferable, as it shows missing values in both numeric and categorical variables.

# Identify both numeric and categorical variables:
OR <- compareGroups(trt ~ age + sex + nihss + location + hx.isch + afib + dm + 
                      mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus + 
                      extra.ica + time.rand + time.punc,
                    data = df, 
                    method = c(age=NA, nihss=NA, sbp = NA, time.iv=NA, 
                               aspects=NA, time.rand =NA, time.punc =NA),
                    ref = c(sex = 2, location = 2),
                    ref.y = 2)
missingTable(OR)

## 
## --------Missingness table by 'trt'---------
## 
## ___________________________________________ 
##           Intervention  Control   p.overall 
##              N=233       N=267              
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age        0 (0.00%)   0 (0.00%)      .     
## sex        0 (0.00%)   0 (0.00%)      .     
## nihss      0 (0.00%)   0 (0.00%)      .     
## location   0 (0.00%)   0 (0.00%)      .     
## hx.isch    0 (0.00%)   0 (0.00%)      .     
## afib       0 (0.00%)   0 (0.00%)      .     
## dm         0 (0.00%)   0 (0.00%)      .     
## mrankin    0 (0.00%)   0 (0.00%)      .     
## sbp        0 (0.00%)   1 (0.37%)    1.000   
## iv.altep   0 (0.00%)   0 (0.00%)      .     
## time.iv    30 (12.9%)  25 (9.36%)   0.267   
## aspects    0 (0.00%)   4 (1.50%)    0.127   
## ia.occlus  0 (0.00%)   1 (0.37%)    1.000   
## extra.ica  0 (0.00%)   1 (0.37%)    1.000   
## time.rand  2 (0.86%)   0 (0.00%)    0.217   
## time.punc  0 (0.00%)   267 (100%)  <0.001   
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Discuss

p.overall: shows whether the proportion of missing values differs significantly between Intervention and Control.
** All other variables with missing values have p > 0.05: the amount of missing data does not differ significantly between the 2 groups.
** time.punc: p < 0.05, because it was measured only in the Intervention group and does not apply to the Control group.
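To see what a missingness table summarizes, the counts and a between-group test can be sketched in base R on toy data. The values below are made up for illustration (they are not the fakestroke data), and using fisher.test() here is my own assumption; missingTable() may apply a different test.

```r
# Toy data (hypothetical values, not the fakestroke data)
toy <- data.frame(
  trt = factor(rep(c("Intervention", "Control"), each = 5),
               levels = c("Intervention", "Control")),
  sbp = c(140, NA, 150, 135, 160, 145, 155, NA, NA, 130)
)

# Number and percentage of missing values per group
miss_n   <- tapply(is.na(toy$sbp), toy$trt, sum)
miss_pct <- 100 * tapply(is.na(toy$sbp), toy$trt, mean)

# 2x2 table of group by missing status, then a test of association
miss_tab  <- table(trt = toy$trt, missing = is.na(toy$sbp))
p_missing <- fisher.test(miss_tab)$p.value

miss_n
miss_pct
p_missing
```

A large p-value here would be read the same way as p.overall above: no evidence that missingness differs between the groups.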

4 Building table: table1() function

Loading the data: fakestroke data

# Load the data and set the type of each column
df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx", 
                 col_types = c("text", "text", "numeric",
                               "text", "numeric", "text", "text", 
                               "numeric", "numeric", "text", "numeric", 
                               "text", "numeric", "numeric", "text", 
                               "numeric", "numeric", "numeric"))

4.1 Step 1: Identify categorical or numeric variables

Before constructing a table, I must check which variables are numeric and which are categorical.
For example, in our data:

Numeric variables: age; nihss; sbp; time.iv; aspects; time.rand; time.punc.
Character variables: sex; location; hx.isch; afib; dm; mrankin; iv.altep; ia.occlus; extra.ica.

In the Excel file, afib, dm and extra.ica are coded as “1” and “0”.
Therefore, I must convert them to “Yes”/“No” categorical variables:

# Convert the 0/1 codes to categorical (factor) variables:
df$afib       <- factor(df$afib, levels = 0:1, labels = c("No", "Yes"))
df$dm         <- factor(df$dm, levels = 0:1, labels = c("No", "Yes"))
df$extra.ica  <- factor(df$extra.ica, levels = 0:1, labels = c("No", "Yes"))

Reorder the levels of some character variables:

trt: Intervention (baseline) , Control
mrankin: 0, 1, 2, > 2.
ia.occlus: rearrange the ia.occlus variable to the order presented in Berkhemer et al. (2015).

# Relevel factors:
df$trt = factor(df$trt, levels=c("Intervention", "Control"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1", 
                                             "M1", "M2", "A1 or A2"))

# Check order of factor after relevel: 
df$trt[1:2]

## [1] Control      Intervention
## Levels: Intervention Control

df$mrankin[1:2]

## [1] 2 0
## Levels: 0 1 2 > 2

df$ia.occlus[1:3]

## [1] M1          M1          ICA with M1
## Levels: Intracranial ICA ICA with M1 M1 M2 A1 or A2
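The recoding and releveling steps can be tried on small toy vectors before touching the real data. The values below are hypothetical stand-ins for columns read from the Excel file; only base R's factor(), levels() and table() are used.

```r
# Toy vectors standing in for columns read from the Excel file (hypothetical values)
afib <- c(0, 1, 1, 0, NA)
trt  <- c("Control", "Intervention", "Control")

# Recode 0/1 to labelled categories; NA stays NA
afib_f <- factor(afib, levels = 0:1, labels = c("No", "Yes"))

# Fix the level order so that Intervention comes first (baseline)
trt_f <- factor(trt, levels = c("Intervention", "Control"))

levels(trt_f)                    # level order, independent of the data order
table(afib_f, useNA = "ifany")   # counts per category, including missing
```

Checking levels() rather than the first few values avoids being misled by the order in which rows happen to appear.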

4.2 Step 2: First draft: table1()

I’ll use the table1 package to complete this task.

The command is as follows:
table1( ~ var1 + var2 + … | group, data = …)
table1( ~ . | group, data = …)

var1; var2: numeric or categorical variables
** For numeric var: Mean (SD) and Median (Q1, Q3)
** For categorical var: arranged in order: count (%)
group: name of the main group(s) you want to summarize and put in the header.

Using built-in styles for table1():

zebra: alternating shaded and unshaded rows (zebra stripes)
grid: show all grid lines
shade: shade the header row(s) in gray
times: use a serif font
center: center all columns, including the first which contains the row labels

Adjust the components of table:

render.continuous = c(.="Mean (SD)", .="Median [Q1, Q3]")
render.continuous = c(.="Mean (SD)", .="Median [Min, Max]")

More adjustments can be found here: https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html

table1( ~ age + sex + nihss + location + hx.isch + afib + dm + mrankin +
          sbp + iv.altep + time.iv + aspects + ia.occlus + extra.ica + time.rand + 
          time.punc | trt, data = df, topclass="Rtable1-zebra",
          render.continuous = c(.="Mean (SD)", .="Median [Q1, Q3]"),
          overall="Total")

	Intervention (N=233)	Control (N=267)	Total (N=500)
age
Mean (SD)	63.9 (18.1)	65.4 (16.1)	64.7 (17.1)
Median [Q1, Q3]	65.8 [54.5, 76.0]	65.7 [55.8, 76.2]	65.8 [55.0, 76.0]
sex
Female	98 (42.1%)	110 (41.2%)	208 (41.6%)
Male	135 (57.9%)	157 (58.8%)	292 (58.4%)
nihss
Mean (SD)	18.0 (5.04)	18.1 (4.32)	18.0 (4.67)
Median [Q1, Q3]	17.0 [14.0, 21.0]	18.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	116 (49.8%)	153 (57.3%)	269 (53.8%)
Right	117 (50.2%)	114 (42.7%)	231 (46.2%)
hx.isch
No	204 (87.6%)	242 (90.6%)	446 (89.2%)
Yes	29 (12.4%)	25 (9.4%)	54 (10.8%)
afib
No	167 (71.7%)	198 (74.2%)	365 (73.0%)
Yes	66 (28.3%)	69 (25.8%)	135 (27.0%)
dm
No	204 (87.6%)	233 (87.3%)	437 (87.4%)
Yes	29 (12.4%)	34 (12.7%)	63 (12.6%)
mrankin
0	190 (81.5%)	214 (80.1%)	404 (80.8%)
1	21 (9.0%)	29 (10.9%)	50 (10.0%)
2	12 (5.2%)	13 (4.9%)	25 (5.0%)
> 2	10 (4.3%)	11 (4.1%)	21 (4.2%)
sbp
Mean (SD)	146 (26.0)	145 (24.4)	145 (25.1)
Median [Q1, Q3]	146 [129, 164]	145 [128, 161]	145 [129, 163]
Missing	0 (0%)	1 (0.4%)	1 (0.2%)
iv.altep
No	30 (12.9%)	25 (9.4%)	55 (11.0%)
Yes	203 (87.1%)	242 (90.6%)	445 (89.0%)
time.iv
Mean (SD)	98.2 (45.5)	88.0 (26.0)	92.6 (36.5)
Median [Q1, Q3]	85.0 [67.0, 110]	87.0 [65.0, 116]	86.0 [67.0, 115]
Missing	30 (12.9%)	25 (9.4%)	55 (11.0%)
aspects
Mean (SD)	8.35 (1.64)	8.65 (1.47)	8.51 (1.56)
Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
Missing	0 (0%)	4 (1.5%)	4 (0.8%)
ia.occlus
Intracranial ICA	1 (0.4%)	3 (1.1%)	4 (0.8%)
ICA with M1	59 (25.3%)	75 (28.1%)	134 (26.8%)
M1	154 (66.1%)	165 (61.8%)	319 (63.8%)
M2	18 (7.7%)	21 (7.9%)	39 (7.8%)
A1 or A2	1 (0.4%)	2 (0.7%)	3 (0.6%)
Missing	0 (0%)	1 (0.4%)	1 (0.2%)
extra.ica
No	158 (67.8%)	196 (73.4%)	354 (70.8%)
Yes	75 (32.2%)	70 (26.2%)	145 (29.0%)
Missing	0 (0%)	1 (0.4%)	1 (0.2%)
time.rand
Mean (SD)	203 (57.3)	214 (70.3)	209 (64.8)
Median [Q1, Q3]	204 [152, 250]	196 [149, 266]	202 [151, 258]
Missing	2 (0.9%)	0 (0%)	2 (0.4%)
time.punc
Mean (SD)	263 (54.2)	NA	263 (54.2)
Median [Q1, Q3]	260 [212, 313]	NA	260 [212, 313]
Missing	0 (0%)	267 (100%)	267 (53.4%)

Discuss:

This table gives the full detail: Mean and Median for numeric variables, count and % for character variables, and missing values.
For numeric variables: Mean (SD) or Median (Q1-Q3)?
** Normal distribution: Mean (SD)
** Non-normal: Median (Q1-Q3)
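The two summaries can be computed by hand in base R to see why the choice matters. The toy vector below is made up and contains one extreme value; the sprintf() formatting is my own and does not reproduce the rounding rules used by table1.

```r
x <- c(1, 2, 3, 4, 100)   # toy values with one extreme observation

# Mean (SD): sensitive to the extreme value
mean_sd <- sprintf("%.1f (%.1f)", mean(x), sd(x))

# Median [Q1, Q3]: robust to the extreme value
q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
median_iqr <- sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])

mean_sd
median_iqr
```

With a single outlier the mean is pulled far from the bulk of the data while the median stays put, which is why the median is preferred for non-normal variables.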

4.3 Step 3: Testing normality - Shapiro–Wilk test

Mechanism: it compares the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.

If the test is non-significant (p > .05): there is no evidence against normality, so the variable is treated as normally distributed.
If the test is significant (p < .05): the distribution differs from normal.

For 1 variable as a whole (ignoring groups):
shapiro.test(variable)
For each group within 1 variable:
by(numeric variable, group/categorical variable, name of test)
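Both call patterns can be tried on simulated data first. In this sketch the data frame toy and its columns are hypothetical; group A is drawn from a normal distribution and group B from a right-skewed exponential one, so the per-group results should differ.

```r
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 200),
  value = c(rnorm(200, mean = 50, sd = 10),   # group A: normal
            rexp(200, rate = 1 / 50))         # group B: right-skewed
)

# One test on the whole variable, ignoring groups
whole <- shapiro.test(toy$value)

# One test per group, then collect the p-values in a vector
per_group <- by(toy$value, toy$group, shapiro.test)
p_values  <- sapply(per_group, function(res) res$p.value)

whole$p.value
p_values
```

Collecting the p-values with sapply() gives a compact vector that is easier to scan than the full printed output when many variables are checked.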

Let's check the numeric variables: age; nihss; sbp; time.iv; aspects; time.rand; time.punc.

age

# Calculate p-value
by(df$age, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.97333, p-value = 0.0002228
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.96996, p-value = 2.132e-05

# Draw Q-Q plot for age
a = ggplot(data = df, aes(sample = age)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - Age")

# Draw Q-Q plot for age - Intervention vs Control
b = ggplot(data = df, aes(sample = age)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - Age")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for Age", 
               color = "red", face = "bold", size = 12))
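The same Q-Q plotting code is repeated for every numeric variable, so one option is to wrap it in a small helper. The function qq_plot() below is a hypothetical name of my own, written as a sketch that assumes ggplot2 is installed; the usage part runs on simulated data only, not on the fakestroke data.

```r
# Hypothetical helper: Q-Q plot of one numeric column, optionally split by group.
# `var` and `group` are column names given as strings.
qq_plot <- function(data, var, group = NULL) {
  p <- ggplot2::ggplot(data, ggplot2::aes(sample = .data[[var]])) +
    ggplot2::geom_qq() +
    ggplot2::geom_qq_line() +
    ggplot2::labs(x = "Theoretical", y = paste("Sample Quantiles -", var))
  if (!is.null(group)) {
    # One panel per level of the grouping column
    p <- p + ggplot2::facet_wrap(ggplot2::vars(.data[[group]]))
  }
  p
}

# Usage on simulated data (only runs if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(1)
  toy <- data.frame(age = rnorm(100, mean = 65, sd = 15),
                    trt = rep(c("Intervention", "Control"), times = 50))
  a <- qq_plot(toy, "age")                 # all observations
  b <- qq_plot(toy, "age", group = "trt")  # one panel per treatment group
}
```

The two returned plots can then be passed to ggarrange() and annotate_figure() exactly as in the code above.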

nihss

# Calculate p-value
by(df$nihss, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.94435, p-value = 9.121e-08
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.93951, p-value = 5.145e-09

# Draw Q-Q plot for nihss
a = ggplot(data = df, aes(sample = nihss)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - nihss")

# Draw Q-Q plot for nihss - Intervention vs Control
b = ggplot(data = df, aes(sample = nihss)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - nihss")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for nihss", 
               color = "red", face = "bold", size = 12))

sbp

# Calculate p-value
by(df$sbp, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.99584, p-value = 0.7869
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.99676, p-value = 0.8671

# Draw Q-Q plot for sbp
a = ggplot(data = df, aes(sample = sbp)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - sbp")

# Draw Q-Q plot for sbp - Intervention vs Control
b = ggplot(data = df, aes(sample = sbp)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - sbp")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for sbp", 
               color = "red", face = "bold", size = 12))

time.iv

# Calculate p-value
by(df$time.iv, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.88032, p-value = 1.31e-11
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.93338, p-value = 5.276e-09

# Draw Q-Q plot for time.iv
a = ggplot(data = df, aes(sample = time.iv)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - time.iv")

# Draw Q-Q plot for time.iv - Intervention vs Control
b = ggplot(data = df, aes(sample = time.iv)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - time.iv")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.iv", 
               color = "red", face = "bold", size = 12))

aspects

# Calculate p-value
by(df$aspects, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.85145, p-value = 3.364e-14
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.82188, p-value < 2.2e-16

# Draw Q-Q plot for aspects
a = ggplot(data = df, aes(sample = aspects)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - aspects")

# Draw Q-Q plot for aspects - Intervention vs Control
b = ggplot(data = df, aes(sample = aspects)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - aspects")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for aspects", 
               color = "red", face = "bold", size = 12))

time.rand

# Calculate p-value
by(df$time.rand, df$trt, shapiro.test)

## df$trt: Intervention
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.95684, p-value = 2.032e-06
## 
## ------------------------------------------------------------ 
## df$trt: Control
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.92567, p-value = 2.703e-10

# Draw Q-Q plot for time.rand
a = ggplot(data = df, aes(sample = time.rand)) + 
      geom_qq() +
      geom_qq_line() +
      labs( x = "Theoretical", y = "Sample Quantiles - time.rand")

# Draw Q-Q plot for time.rand - Intervention vs Control
b = ggplot(data = df, aes(sample = time.rand)) + 
      geom_qq() +
      geom_qq_line() +
      facet_wrap(~trt) +
      labs( x = "Theoretical", y = "Sample Quantiles - time.rand")
      

# Combine 2 plots:
plot = ggarrange(a,b, 
                 ncol=2, nrow=1, 
                 common.legend = FALSE,
                 legend="right", 
                 labels = c("A","B"))

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.rand", 
               color = "red", face = "bold", size = 12))

time.punc

# Because time.punc has values in only one group - Intervention
shapiro.test(df$time.punc)

## 
##  Shapiro-Wilk normality test
## 
## data:  df$time.punc
## W = 0.93577, p-value = 1.435e-08

# Draw Q-Q plot for time.punc
plot = ggplot(data = df, aes(sample = time.punc)) + 
        geom_qq() +
        geom_qq_line() +
        labs( x = "Theoretical", y = "Sample Quantiles - time.punc")

annotate_figure(plot, bottom = text_grob("The Q-Q plot for time.punc", 
               color = "red", face = "bold", size = 12))

Discuss
Based on the Shapiro-Wilk p-values and the Q-Q plots for the Intervention and Control groups:

sbp has p-value > 0.05 in both groups => no evidence against a normal distribution => use Mean (SD).
All other variables have p-value < 0.05 => not normally distributed => use Median [Q1, Q3].
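The same decision can be tabulated for several variables at once. The block below is a minimal sketch on simulated data (the data frame d, the helper normality_by_group(), and the 0.05 cut-off are assumptions for illustration, not part of the fakestroke analysis). It runs the Shapiro-Wilk test per group and suggests a summary statistic.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the real data: one symmetric and one skewed variable
d <- data.frame(
  trt     = factor(rep(c("Intervention", "Control"), each = n / 2)),
  sbp     = rnorm(n, mean = 145, sd = 25),   # roughly symmetric
  time.iv = rexp(n, rate = 1 / 90)           # right-skewed
)

# Hypothetical helper: Shapiro-Wilk p-value per group, then a suggested summary
normality_by_group <- function(data, vars, group, alpha = 0.05) {
  rows <- lapply(vars, function(v) {
    p <- tapply(data[[v]], data[[group]],
                function(x) shapiro.test(x)$p.value)
    data.frame(
      variable = v,
      min_p    = min(p),   # smallest p-value across the groups
      summary  = if (all(p > alpha)) "Mean (SD)" else "Median [Q1, Q3]"
    )
  })
  do.call(rbind, rows)
}

res <- normality_by_group(d, c("sbp", "time.iv"), "trt")
print(res)
```

A variable is only labelled "Mean (SD)" when every group passes the test, which mirrors the rule used in the discussion above. The Q-Q plots should still be checked by eye, because with large samples the test flags even small departures from normality.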

4.4 Step 4: Displaying different statistics for different variables: table1()

# Function for numeric variable
rndr <- function(x, name, ...) {
    if (!is.numeric(x)) return(render.categorical.default(x))
    what <- switch(name,
        age = "Median [Q1, Q3]",
        nihss = "Median [Q1, Q3]",
        sbp = "Mean (SD)",
        time.iv = "Median [Q1, Q3]",
        aspects = "Median [Q1, Q3]",
        time.rand = "Median [Q1, Q3]",
        time.punc = "Median [Q1, Q3]")
    parse.abbrev.render.code(c("", what))(x)
}

table1( ~ age + sex + nihss + location + hx.isch + afib + dm + mrankin +
          sbp + iv.altep + time.iv + aspects + ia.occlus + extra.ica + time.rand + 
          time.punc | trt, data = df, 
                            render=rndr,
                            overall="Total")

	Intervention (N=233)	Control (N=267)	Total (N=500)
age

Median [Q1, Q3]	65.8 [54.5, 76.0]	65.7 [55.8, 76.2]	65.8 [55.0, 76.0]
sex
Female	98 (42.1%)	110 (41.2%)	208 (41.6%)
Male	135 (57.9%)	157 (58.8%)	292 (58.4%)
nihss

Median [Q1, Q3]	17.0 [14.0, 21.0]	18.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	116 (49.8%)	153 (57.3%)	269 (53.8%)
Right	117 (50.2%)	114 (42.7%)	231 (46.2%)
hx.isch
No	204 (87.6%)	242 (90.6%)	446 (89.2%)
Yes	29 (12.4%)	25 (9.4%)	54 (10.8%)
afib
No	167 (71.7%)	198 (74.2%)	365 (73.0%)
Yes	66 (28.3%)	69 (25.8%)	135 (27.0%)
dm
No	204 (87.6%)	233 (87.3%)	437 (87.4%)
Yes	29 (12.4%)	34 (12.7%)	63 (12.6%)
mrankin
0	190 (81.5%)	214 (80.1%)	404 (80.8%)
1	21 (9.0%)	29 (10.9%)	50 (10.0%)
2	12 (5.2%)	13 (4.9%)	25 (5.0%)
> 2	10 (4.3%)	11 (4.1%)	21 (4.2%)
sbp

Mean (SD)	146 (26.0)	145 (24.4)	145 (25.1)
iv.altep
No	30 (12.9%)	25 (9.4%)	55 (11.0%)
Yes	203 (87.1%)	242 (90.6%)	445 (89.0%)
time.iv

Median [Q1, Q3]	85.0 [67.0, 110]	87.0 [65.0, 116]	86.0 [67.0, 115]
aspects

Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
ia.occlus
Intracranial ICA	1 (0.4%)	3 (1.1%)	4 (0.8%)
ICA with M1	59 (25.3%)	75 (28.1%)	134 (26.8%)
M1	154 (66.1%)	165 (61.8%)	319 (63.8%)
M2	18 (7.7%)	21 (7.9%)	39 (7.8%)
A1 or A2	1 (0.4%)	2 (0.7%)	3 (0.6%)
extra.ica
No	158 (67.8%)	196 (73.4%)	354 (70.8%)
Yes	75 (32.2%)	70 (26.2%)	145 (29.0%)
time.rand

Median [Q1, Q3]	204 [152, 250]	196 [149, 266]	202 [151, 258]
time.punc

Median [Q1, Q3]	260 [212, 313]	NA	260 [212, 313]
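The rndr() function chooses the statistic by variable name with switch(). One detail worth knowing: switch() returns NULL when no branch matches, so a numeric variable that is missing from the list would get no statistic label. The block below is a base R sketch of the same lookup with a fallback branch. The helpers stat_for() and fmt_stat() are hypothetical, and the one-decimal formatting is a simplification, not the rounding that table1() applies.

```r
# Hypothetical lookup: sbp gets Mean (SD), everything else falls back to the median
stat_for <- function(name) {
  switch(name,
         sbp = "Mean (SD)",
         "Median [Q1, Q3]")   # unnamed last argument acts as the default
}

# Hypothetical formatter that applies the chosen statistic to a numeric vector
fmt_stat <- function(x, name) {
  x <- x[!is.na(x)]
  if (stat_for(name) == "Mean (SD)") {
    sprintf("%.1f (%.1f)", mean(x), sd(x))
  } else {
    q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
    sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])
  }
}

fmt_stat(c(10, 20, 30), "sbp")        # mean and standard deviation
fmt_stat(c(1, 2, 3, 4, 5), "age")     # median and quartiles
fmt_stat(c(1, 2, 3, 4, 5), "new.var") # not listed, so the fallback applies
```

With a fallback in place, adding a new numeric variable to the formula cannot silently produce an empty row.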

Discuss:

The new table does not display the missing values of extra.ica and ia.occlus in the Control group, so the counts for these variables add up to 266 observations there, not 267.
Compared with the table in Berkhemer et al. 2015 (shown below), our table built from the fakestroke data displays all the essential elements.

4.5 Step 5: Exporting the table (for table1)

a <- table1( ~ age + sex + nihss + location + hx.isch + afib + dm + mrankin +
          sbp + iv.altep + time.iv + aspects + ia.occlus + extra.ica + time.rand + 
          time.punc | trt, data = df, 
                            render=rndr,
                            overall="Total")
# append = FALSE overwrites the file on each run; appending would repeat the header row
write.table(a, "test_file.csv", col.names = TRUE, row.names = FALSE, append = FALSE, sep = ",")
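To check that an exported file reads back correctly, a quick round trip helps. The block below is a sketch that uses a small hand-made data frame (two rows copied from the table above) and a temporary file instead of test_file.csv. The object names tab, out_file and back are assumptions for illustration.

```r
# Two rows copied from the table above, stored as plain text cells
tab <- data.frame(
  variable     = c("age", "sbp"),
  Intervention = c("65.8 [54.5, 76.0]", "146 (26.0)"),
  Control      = c("65.7 [55.8, 76.2]", "145 (24.4)"),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")

# Character cells are quoted by default, so the commas inside "[Q1, Q3]" are safe
write.table(tab, out_file, sep = ",",
            col.names = TRUE, row.names = FALSE, append = FALSE)

# Read the file back and compare with what was written
back <- read.csv(out_file, stringsAsFactors = FALSE)
print(back)
identical(back$Intervention, tab$Intervention)
```

Writing with append = FALSE means that re-running the chunk replaces the file rather than stacking a second copy of the table, header included, underneath the first.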

4.6 Extra: Adding extra column: p-value (not recommended)

A user asked if it was possible to add a column to the table showing the p-value associated with a univariate test for differences in each variable across strata. This can be accomplished using the extra.col feature.
However, I personally prefer the compareGroups() function, because it can also handle variables that are not normally distributed.

Next, I create a function to compute the p-value for continuous or categorical variables.

t-test: continuous variables
Chi-squared: categorical variables

#Writing function for p-value column (numeric - t-test; categorical - Chi-squared test)
pvalue <- function(x, ...) {
    x <- x[-length(x)]  # Remove "overall" group
    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- factor(rep(1:length(x), times=sapply(x, length)))
    if (is.numeric(y)) {
        # For numeric variables, perform a standard 2-sample t-test
        p <- t.test(y ~ g)$p.value
    } else {
        # For categorical variables, perform a chi-squared test of independence
        p <- chisq.test(table(y, g))$p.value
    }
    # Format the p-value, using an HTML entity for the less-than sign.
    # The initial empty string places the output on the line below the variable label.
    c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

# Discard "time.punc" because it has value in "Intervention" only - cannot calculate:
table1( ~ age + sex + nihss + location + hx.isch + afib + 
          dm + mrankin + sbp + iv.altep + time.iv + aspects + 
          ia.occlus + extra.ica + time.rand | trt, 
        data = df, 
        render=rndr,
        overall="Total",
        extra.col=list(`Valor-p`=pvalue))

	Intervention (N=233)	Control (N=267)	Total (N=500)	Valor-p
age
				0.347
Median [Q1, Q3]	65.8 [54.5, 76.0]	65.7 [55.8, 76.2]	65.8 [55.0, 76.0]
sex
Female	98 (42.1%)	110 (41.2%)	208 (41.6%)	0.917
Male	135 (57.9%)	157 (58.8%)	292 (58.4%)
nihss
				0.79
Median [Q1, Q3]	17.0 [14.0, 21.0]	18.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	116 (49.8%)	153 (57.3%)	269 (53.8%)	0.111
Right	117 (50.2%)	114 (42.7%)	231 (46.2%)
hx.isch
No	204 (87.6%)	242 (90.6%)	446 (89.2%)	0.335
Yes	29 (12.4%)	25 (9.4%)	54 (10.8%)
afib
No	167 (71.7%)	198 (74.2%)	365 (73.0%)	0.601
Yes	66 (28.3%)	69 (25.8%)	135 (27.0%)
dm
No	204 (87.6%)	233 (87.3%)	437 (87.4%)	1
Yes	29 (12.4%)	34 (12.7%)	63 (12.6%)
mrankin
0	190 (81.5%)	214 (80.1%)	404 (80.8%)	0.922
1	21 (9.0%)	29 (10.9%)	50 (10.0%)
2	12 (5.2%)	13 (4.9%)	25 (5.0%)
> 2	10 (4.3%)	11 (4.1%)	21 (4.2%)
sbp
				0.649
Mean (SD)	146 (26.0)	145 (24.4)	145 (25.1)
iv.altep
No	30 (12.9%)	25 (9.4%)	55 (11.0%)	0.267
Yes	203 (87.1%)	242 (90.6%)	445 (89.0%)
time.iv
				0.00471
Median [Q1, Q3]	85.0 [67.0, 110]	87.0 [65.0, 116]	86.0 [67.0, 115]
aspects
				0.0338
Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
ia.occlus
Intracranial ICA	1 (0.4%)	3 (1.1%)	4 (0.8%)	0.795
ICA with M1	59 (25.3%)	75 (28.1%)	134 (26.8%)
M1	154 (66.1%)	165 (61.8%)	319 (63.8%)
M2	18 (7.7%)	21 (7.9%)	39 (7.8%)
A1 or A2	1 (0.4%)	2 (0.7%)	3 (0.6%)
extra.ica
No	158 (67.8%)	196 (73.4%)	354 (70.8%)	0.179
Yes	75 (32.2%)	70 (26.2%)	145 (29.0%)
time.rand
				0.0475
Median [Q1, Q3]	204 [152, 250]	196 [149, 266]	202 [151, 258]
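Judging from the comments inside pvalue(), the function receives one variable as a list of vectors, one per stratum, with the "overall" group last. The block below is a small sketch with made-up numbers (not values from fakestroke) that shows how that list becomes a data vector y and a grouping factor g before the test is run.

```r
# Made-up values standing in for one numeric variable split by stratum;
# the last element plays the role of the pooled "overall" group
x <- list(
  Intervention = c(146, 150, 139, 161),
  Control      = c(145, 141, 152, 148, 137),
  overall      = c(146, 150, 139, 161, 145, 141, 152, 148, 137)
)

x <- x[-length(x)]                                  # drop the "overall" group
y <- unlist(x)                                      # all observations in one vector
g <- factor(rep(seq_along(x), times = lengths(x)))  # stratum index per observation

table(g)                  # group sizes: 4 and 5
p <- t.test(y ~ g)$p.value
format.pval(p, digits = 3, eps = 0.001)
```

Because g is rebuilt from the list lengths, the helper works for any number of strata, but t.test() itself only accepts two groups.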

Or I can use ANOVA instead of the t-test (useful when the outcome variable has more than 2 groups):

ANOVA: continuous variables
Chi-squared: categorical variables
https://github.com/benjaminrich/table1/issues/52

#Writing function for p-value column (numeric - ANOVA; categorical - Chi-squared test)
pvalue <- function(x, ...) {
  x <- x[-length(x)]  # Remove "overall" group
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(1:length(x), times=sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform an ANOVA
    p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
  } else {
    # For categorical variables, perform a chi-squared test of independence
    p <- chisq.test(table(y, g))$p.value
  }
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

# Discard "time.punc" because it has value in "Intervention" only - cannot calculate:
table1( ~ age + sex + nihss + location + hx.isch + afib + 
          dm + mrankin + sbp + iv.altep + time.iv + aspects + 
          ia.occlus + extra.ica + time.rand | trt, 
        data = df,
        render=rndr,
        overall="Total", 
        extra.col=list(`Valor-p`=pvalue))

	Intervention (N=233)	Control (N=267)	Total (N=500)	Valor-p
age
				0.343
Median [Q1, Q3]	65.8 [54.5, 76.0]	65.7 [55.8, 76.2]	65.8 [55.0, 76.0]
sex
Female	98 (42.1%)	110 (41.2%)	208 (41.6%)	0.917
Male	135 (57.9%)	157 (58.8%)	292 (58.4%)
nihss
				0.787
Median [Q1, Q3]	17.0 [14.0, 21.0]	18.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	116 (49.8%)	153 (57.3%)	269 (53.8%)	0.111
Right	117 (50.2%)	114 (42.7%)	231 (46.2%)
hx.isch
No	204 (87.6%)	242 (90.6%)	446 (89.2%)	0.335
Yes	29 (12.4%)	25 (9.4%)	54 (10.8%)
afib
No	167 (71.7%)	198 (74.2%)	365 (73.0%)	0.601
Yes	66 (28.3%)	69 (25.8%)	135 (27.0%)
dm
No	204 (87.6%)	233 (87.3%)	437 (87.4%)	1
Yes	29 (12.4%)	34 (12.7%)	63 (12.6%)
mrankin
0	190 (81.5%)	214 (80.1%)	404 (80.8%)	0.922
1	21 (9.0%)	29 (10.9%)	50 (10.0%)
2	12 (5.2%)	13 (4.9%)	25 (5.0%)
> 2	10 (4.3%)	11 (4.1%)	21 (4.2%)
sbp
				0.647
Mean (SD)	146 (26.0)	145 (24.4)	145 (25.1)
iv.altep
No	30 (12.9%)	25 (9.4%)	55 (11.0%)	0.267
Yes	203 (87.1%)	242 (90.6%)	445 (89.0%)
time.iv
				0.00307
Median [Q1, Q3]	85.0 [67.0, 110]	87.0 [65.0, 116]	86.0 [67.0, 115]
aspects
				0.0327
Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
ia.occlus
Intracranial ICA	1 (0.4%)	3 (1.1%)	4 (0.8%)	0.795
ICA with M1	59 (25.3%)	75 (28.1%)	134 (26.8%)
M1	154 (66.1%)	165 (61.8%)	319 (63.8%)
M2	18 (7.7%)	21 (7.9%)	39 (7.8%)
A1 or A2	1 (0.4%)	2 (0.7%)	3 (0.6%)
extra.ica
No	158 (67.8%)	196 (73.4%)	354 (70.8%)	0.179
Yes	75 (32.2%)	70 (26.2%)	145 (29.0%)
time.rand
				0.0508
Median [Q1, Q3]	204 [152, 250]	196 [149, 266]	202 [151, 258]
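The t-test and ANOVA columns above differ slightly (for example 0.347 vs 0.343 for age). One likely reason is that t.test() runs the Welch test by default, while a one-way ANOVA with two groups is equivalent to the pooled-variance t-test. The block below is a sketch on simulated data (y and g are made up for illustration) that compares the three p-values.

```r
set.seed(1)

# Simulated two-group data with unequal group sizes
g <- factor(rep(c("Intervention", "Control"), times = c(40, 60)))
y <- rnorm(100, mean = 145, sd = 25)

p_welch  <- t.test(y ~ g)$p.value                    # default: unequal variances
p_pooled <- t.test(y ~ g, var.equal = TRUE)$p.value  # classic Student t-test
p_aov    <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]  # one-way ANOVA

c(welch = p_welch, pooled = p_pooled, anova = p_aov)

# With two groups the ANOVA F statistic is the square of the pooled t statistic
all.equal(p_pooled, p_aov)
```

So with two groups ANOVA adds nothing over the t-test, and the small gap between the two columns comes from the variance assumption rather than from the method.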

Discuss

I must exclude time.punc because it has values only in the Intervention group => a p-value cannot be calculated.
For categorical variables: Chi-squared or Fisher?
** All expected frequencies >= 5: Chi-squared test
** Any expected frequency < 5: Fisher's exact test
For numeric variables: t-test/ANOVA or Wilcoxon rank-sum test?
** The t-test and ANOVA give nearly identical results. The t-test is the natural choice for sbp because the outcome variable (trt) has 2 groups.
** The Wilcoxon rank-sum test should be applied to all other numeric variables because they are not normally distributed (calculate it outside table1() or use compareGroups()).
** Let's compare the t-test and the Wilcoxon rank-sum test for sbp, age and nihss:
sbp - normal distribution
# For sbp variable (Welch two-sample t-test; unpaired is the default)
t.test(sbp ~ trt, data = df, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  sbp by trt
## t = 0.45607, df = 478.62, p-value = 0.6485
## alternative hypothesis: true difference in means between group Intervention and group Control is not equal to 0
## 95 percent confidence interval:
##  -3.420249  5.487854
## sample estimates:
## mean in group Intervention      mean in group Control 
##                   146.0300                   144.9962

wilcox.test(sbp ~ trt, data = df)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  sbp by trt
## W = 32039, p-value = 0.5137
## alternative hypothesis: true location shift is not equal to 0

age - non-normal distribution

# For age variable
t.test(age ~ trt, data = df, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  age by trt
## t = -0.94205, df = 468.4, p-value = 0.3467
## alternative hypothesis: true difference in means between group Intervention and group Control is not equal to 0
## 95 percent confidence interval:
##  -4.480512  1.576662
## sample estimates:
## mean in group Intervention      mean in group Control 
##                   63.93047                   65.38240

wilcox.test(age ~ trt, data = df)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  age by trt
## W = 30211, p-value = 0.5788
## alternative hypothesis: true location shift is not equal to 0

nihss - non-normal distribution

# For nihss variable
t.test(nihss ~ trt, data = df, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  nihss by trt
## t = -0.26692, df = 459.95, p-value = 0.7896
## alternative hypothesis: true difference in means between group Intervention and group Control is not equal to 0
## 95 percent confidence interval:
##  -0.9448105  0.7188376
## sample estimates:
## mean in group Intervention      mean in group Control 
##                   17.96567                   18.07865

wilcox.test(nihss ~ trt, data = df)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  nihss by trt
## W = 29900, p-value = 0.4533
## alternative hypothesis: true location shift is not equal to 0

Discuss:

The parametric test (t-test) and the non-parametric test (Wilcoxon rank-sum test) give different p-values.
=> We must check the normality assumption and the homogeneity of variance to decide which test to apply.
If we want p-values in the table, we should use the compareGroups() function, which is better suited to this than table1().
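For the categorical variables, the rule of thumb from this section (Chi-squared when the expected frequencies are large enough, Fisher's exact test otherwise) can be sketched as below. The counts are copied from the sex and ia.occlus rows of the tables in this document. The helpers pick_cat_test() and cat_pvalue() and the larger workspace value are assumptions for illustration.

```r
# Counts copied from the tables above (rows = levels, columns = trt)
sex_tab <- matrix(c(98, 135, 110, 157), nrow = 2,
                  dimnames = list(c("Female", "Male"),
                                  c("Intervention", "Control")))
occlus_tab <- matrix(c(1, 59, 154, 18, 1, 3, 75, 165, 21, 2), nrow = 5,
                     dimnames = list(c("Intracranial ICA", "ICA with M1",
                                       "M1", "M2", "A1 or A2"),
                                     c("Intervention", "Control")))

# Hypothetical helper: choose the test from the smallest expected count
pick_cat_test <- function(tab, min_expected = 5) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) "fisher" else "chisq"
}

# Hypothetical helper: run the chosen test and return its p-value
cat_pvalue <- function(tab) {
  switch(pick_cat_test(tab),
         fisher = fisher.test(tab, workspace = 2e6)$p.value,
         chisq  = chisq.test(tab)$p.value)
}

pick_cat_test(sex_tab)      # large expected counts
pick_cat_test(occlus_tab)   # some levels have only a handful of patients
cat_pvalue(sex_tab)
cat_pvalue(occlus_tab)
```

For ia.occlus the table1() helper reports 0.795 from the Chi-squared test, while compareGroups() reports 0.819. That gap would be consistent with an exact test being used there, although this is an assumption and not something checked in these notes.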

4.7 More comparisons: table with trt*sex variable

More format style can be found here: https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html

In this section, I attempt to display a table stratified by trt x sex. The analysis follows the same steps shown previously:

Step 1: Identify categorical variables / numeric variables.
Step 2: First draft: table1()
Step 3: Testing normality
Step 4: Displaying different statistics for different variables: table1().

table1( ~ age + nihss + location + hx.isch + afib + dm + mrankin +
          sbp + iv.altep + time.iv + aspects + ia.occlus + extra.ica + time.rand + 
          time.punc | trt*sex, data = df, 
                            render.continuous = c(.="Mean (SD)", .="Median [Q1, Q3]"),
                            overall="Total")

	Intervention		Control		Total
	Female (N=98)	Male (N=135)	Female (N=110)	Male (N=157)	Female (N=208)	Male (N=292)
age
Mean (SD)	65.0 (18.9)	63.2 (17.5)	66.2 (17.2)	64.8 (15.3)	65.6 (18.0)	64.0 (16.4)
Median [Q1, Q3]	66.0 [54.3, 80.5]	65.0 [54.8, 75.0]	66.5 [56.0, 80.8]	65.0 [55.5, 75.0]	66.0 [55.0, 81.0]	65.0 [55.0, 75.0]
nihss
Mean (SD)	18.3 (5.44)	17.7 (4.74)	17.6 (4.16)	18.4 (4.41)	17.9 (4.81)	18.1 (4.57)
Median [Q1, Q3]	17.0 [14.0, 23.5]	17.0 [14.0, 21.0]	18.0 [14.0, 21.8]	18.0 [14.0, 22.0]	17.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	46 (46.9%)	70 (51.9%)	64 (58.2%)	89 (56.7%)	110 (52.9%)	159 (54.5%)
Right	52 (53.1%)	65 (48.1%)	46 (41.8%)	68 (43.3%)	98 (47.1%)	133 (45.5%)
hx.isch
No	80 (81.6%)	124 (91.9%)	106 (96.4%)	136 (86.6%)	186 (89.4%)	260 (89.0%)
Yes	18 (18.4%)	11 (8.1%)	4 (3.6%)	21 (13.4%)	22 (10.6%)	32 (11.0%)
afib
No	69 (70.4%)	98 (72.6%)	84 (76.4%)	114 (72.6%)	153 (73.6%)	212 (72.6%)
Yes	29 (29.6%)	37 (27.4%)	26 (23.6%)	43 (27.4%)	55 (26.4%)	80 (27.4%)
dm
No	84 (85.7%)	120 (88.9%)	99 (90.0%)	134 (85.4%)	183 (88.0%)	254 (87.0%)
Yes	14 (14.3%)	15 (11.1%)	11 (10.0%)	23 (14.6%)	25 (12.0%)	38 (13.0%)
mrankin
0	80 (81.6%)	110 (81.5%)	91 (82.7%)	123 (78.3%)	171 (82.2%)	233 (79.8%)
1	12 (12.2%)	9 (6.7%)	9 (8.2%)	20 (12.7%)	21 (10.1%)	29 (9.9%)
2	4 (4.1%)	8 (5.9%)	8 (7.3%)	5 (3.2%)	12 (5.8%)	13 (4.5%)
> 2	2 (2.0%)	8 (5.9%)	2 (1.8%)	9 (5.7%)	4 (1.9%)	17 (5.8%)
sbp
Mean (SD)	148 (26.0)	144 (26.0)	141 (25.9)	148 (23.0)	145 (26.1)	146 (24.4)
Median [Q1, Q3]	150 [133, 163]	143 [127, 164]	141 [123, 157]	146 [131, 162]	145 [126, 162]	146 [129, 163]
Missing	0 (0%)	0 (0%)	1 (0.9%)	0 (0%)	1 (0.5%)	0 (0%)
iv.altep
No	13 (13.3%)	17 (12.6%)	11 (10.0%)	14 (8.9%)	24 (11.5%)	31 (10.6%)
Yes	85 (86.7%)	118 (87.4%)	99 (90.0%)	143 (91.1%)	184 (88.5%)	261 (89.4%)
time.iv
Mean (SD)	102 (44.2)	95.5 (46.4)	90.2 (26.3)	86.4 (25.8)	95.6 (36.1)	90.5 (36.8)
Median [Q1, Q3]	90.0 [68.0, 128]	83.0 [64.8, 108]	89.0 [67.5, 117]	85.0 [63.0, 113]	90.0 [68.0, 118]	83.0 [64.0, 110]
Missing	13 (13.3%)	17 (12.6%)	11 (10.0%)	14 (8.9%)	24 (11.5%)	31 (10.6%)
aspects
Mean (SD)	8.33 (1.67)	8.36 (1.62)	8.67 (1.38)	8.63 (1.53)	8.51 (1.53)	8.50 (1.58)
Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [8.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
Missing	0 (0%)	0 (0%)	0 (0%)	4 (2.5%)	0 (0%)	4 (1.4%)
ia.occlus
Intracranial ICA	0 (0%)	1 (0.7%)	1 (0.9%)	2 (1.3%)	1 (0.5%)	3 (1.0%)
ICA with M1	21 (21.4%)	38 (28.1%)	30 (27.3%)	45 (28.7%)	51 (24.5%)	83 (28.4%)
M1	68 (69.4%)	86 (63.7%)	68 (61.8%)	97 (61.8%)	136 (65.4%)	183 (62.7%)
M2	8 (8.2%)	10 (7.4%)	9 (8.2%)	12 (7.6%)	17 (8.2%)	22 (7.5%)
A1 or A2	1 (1.0%)	0 (0%)	1 (0.9%)	1 (0.6%)	2 (1.0%)	1 (0.3%)
Missing	0 (0%)	0 (0%)	1 (0.9%)	0 (0%)	1 (0.5%)	0 (0%)
extra.ica
No	70 (71.4%)	88 (65.2%)	77 (70.0%)	119 (75.8%)	147 (70.7%)	207 (70.9%)
Yes	28 (28.6%)	47 (34.8%)	32 (29.1%)	38 (24.2%)	60 (28.8%)	85 (29.1%)
Missing	0 (0%)	0 (0%)	1 (0.9%)	0 (0%)	1 (0.5%)	0 (0%)
time.rand
Mean (SD)	201 (55.2)	203 (59.0)	206 (62.6)	219 (75.0)	204 (59.2)	212 (68.4)
Median [Q1, Q3]	200 [158, 244]	205 [148, 255]	192 [152, 252]	208 [149, 274]	196 [155, 250]	206 [149, 268]
Missing	2 (2.0%)	0 (0%)	0 (0%)	0 (0%)	2 (1.0%)	0 (0%)
time.punc
Mean (SD)	261 (55.4)	264 (53.6)	NA	NA	261 (55.4)	264 (53.6)
Median [Q1, Q3]	253 [208, 319]	265 [217, 312]	NA	NA	253 [208, 319]	265 [217, 312]
Missing	0 (0%)	0 (0%)	110 (100%)	157 (100%)	110 (52.9%)	157 (53.8%)
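Stratifying by trt*sex amounts to summarising each variable within every combination of the two factors. The block below is a base R sketch on simulated data (the data frame d and the helper mean_sd() are made up for illustration) that produces the "Mean (SD)" cells for one variable without table1().

```r
set.seed(7)

# Balanced simulated design: 30 observations per trt x sex combination
d <- data.frame(
  trt = factor(rep(c("Intervention", "Control"), each = 60)),
  sex = factor(rep(c("Female", "Male"), times = 60)),
  sbp = rnorm(120, mean = 145, sd = 25)
)

# Hypothetical formatter for one cell of the table
mean_sd <- function(x) {
  sprintf("%.0f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# One formatted cell per combination of trt (rows) and sex (columns)
cells <- tapply(d$sbp, list(trt = d$trt, sex = d$sex), mean_sd)
print(cells)

# Group sizes, as shown in the column headers of the table1() output
table(d$trt, d$sex)
```

The number of columns grows with the product of the factor levels, so this layout is only readable when both stratification variables have few levels.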

5 Summary

For table1():

table1( ~ age + sex + nihss + location + hx.isch + afib + dm + mrankin +
          sbp + iv.altep + time.iv + aspects + ia.occlus + extra.ica + time.rand + 
          time.punc | trt, data = df, 
                            render=rndr,
                            overall="Total")

	Intervention (N=233)	Control (N=267)	Total (N=500)
age

Median [Q1, Q3]	65.8 [54.5, 76.0]	65.7 [55.8, 76.2]	65.8 [55.0, 76.0]
sex
Female	98 (42.1%)	110 (41.2%)	208 (41.6%)
Male	135 (57.9%)	157 (58.8%)	292 (58.4%)
nihss

Median [Q1, Q3]	17.0 [14.0, 21.0]	18.0 [14.0, 22.0]	18.0 [14.0, 22.0]
location
Left	116 (49.8%)	153 (57.3%)	269 (53.8%)
Right	117 (50.2%)	114 (42.7%)	231 (46.2%)
hx.isch
No	204 (87.6%)	242 (90.6%)	446 (89.2%)
Yes	29 (12.4%)	25 (9.4%)	54 (10.8%)
afib
No	167 (71.7%)	198 (74.2%)	365 (73.0%)
Yes	66 (28.3%)	69 (25.8%)	135 (27.0%)
dm
No	204 (87.6%)	233 (87.3%)	437 (87.4%)
Yes	29 (12.4%)	34 (12.7%)	63 (12.6%)
mrankin
0	190 (81.5%)	214 (80.1%)	404 (80.8%)
1	21 (9.0%)	29 (10.9%)	50 (10.0%)
2	12 (5.2%)	13 (4.9%)	25 (5.0%)
> 2	10 (4.3%)	11 (4.1%)	21 (4.2%)
sbp

Mean (SD)	146 (26.0)	145 (24.4)	145 (25.1)
iv.altep
No	30 (12.9%)	25 (9.4%)	55 (11.0%)
Yes	203 (87.1%)	242 (90.6%)	445 (89.0%)
time.iv

Median [Q1, Q3]	85.0 [67.0, 110]	87.0 [65.0, 116]	86.0 [67.0, 115]
aspects

Median [Q1, Q3]	9.00 [7.00, 10.0]	9.00 [8.00, 10.0]	9.00 [7.00, 10.0]
ia.occlus
Intracranial ICA	1 (0.4%)	3 (1.1%)	4 (0.8%)
ICA with M1	59 (25.3%)	75 (28.1%)	134 (26.8%)
M1	154 (66.1%)	165 (61.8%)	319 (63.8%)
M2	18 (7.7%)	21 (7.9%)	39 (7.8%)
A1 or A2	1 (0.4%)	2 (0.7%)	3 (0.6%)
extra.ica
No	158 (67.8%)	196 (73.4%)	354 (70.8%)
Yes	75 (32.2%)	70 (26.2%)	145 (29.0%)
time.rand

Median [Q1, Q3]	204 [152, 250]	196 [149, 266]	202 [151, 258]
time.punc

Median [Q1, Q3]	260 [212, 313]	NA	260 [212, 313]

For compareGroups():

createTable(compareGroups( trt ~ age + sex + nihss + location + hx.isch + afib + dm +
                              mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus +
                              extra.ica + time.rand + time.punc, data = df,
                            method = c(age=NA, nihss=NA, sbp = NA, time.iv=NA, aspects=NA, 
                                       time.rand =NA, time.punc =NA)))

## 
## --------Summary descriptives table by 'trt'---------
## 
## ________________________________________________________________ 
##                        Intervention       Control      p.overall 
##                           N=233            N=267                 
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2]   0.579   
## sex:                                                     0.917   
##     Female              98 (42.1%)      110 (41.2%)              
##     Male               135 (57.9%)      157 (58.8%)              
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0]   0.453   
## location:                                                0.111   
##     Left               116 (49.8%)      153 (57.3%)              
##     Right              117 (50.2%)      114 (42.7%)              
## hx.isch:                                                 0.335   
##     No                 204 (87.6%)      242 (90.6%)              
##     Yes                 29 (12.4%)       25 (9.36%)              
## afib:                                                    0.601   
##     No                 167 (71.7%)      198 (74.2%)              
##     Yes                 66 (28.3%)       69 (25.8%)              
## dm:                                                      1.000   
##     No                 204 (87.6%)      233 (87.3%)              
##     Yes                 29 (12.4%)       34 (12.7%)              
## mrankin:                                                 0.922   
##     0                  190 (81.5%)      214 (80.1%)              
##     1                   21 (9.01%)       29 (10.9%)              
##     2                   12 (5.15%)       13 (4.87%)              
##     > 2                 10 (4.29%)       11 (4.12%)              
## sbp                     146 (26.0)       145 (24.4)      0.649   
## iv.altep:                                                0.267   
##     No                  30 (12.9%)       25 (9.36%)              
##     Yes                203 (87.1%)      242 (90.6%)              
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0]   0.075   
## ia.occlus:                                               0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)               
##     ICA with M1         59 (25.3%)       75 (28.2%)              
##     M1                 154 (66.1%)      165 (62.0%)              
##     M2                  18 (7.73%)       21 (7.89%)              
##     A1 or A2            1 (0.43%)        2 (0.75%)               
## extra.ica:                                               0.179   
##     No                 158 (67.8%)      196 (73.7%)              
##     Yes                 75 (32.2%)       70 (26.3%)              
## time.rand             204 [152;250]    196 [149;266]     0.251   
## time.punc             260 [212;313]       . [.;.]          .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

OR <- compareGroups(trt ~ age + sex + nihss + location + hx.isch + afib + dm + 
                      mrankin + sbp + iv.altep + time.iv + aspects + ia.occlus + 
                      extra.ica + time.rand + time.punc,
                    data = df, 
                    method = c(age=NA, nihss=NA, sbp = NA, time.iv=NA, 
                               aspects=NA, time.rand =NA, time.punc =NA),
                    ref = c(sex = 2, location = 2),
                    ref.y = 2
                    )
createTable(OR, show.ratio = TRUE)

## 
## --------Summary descriptives table by 'trt'---------
## 
## _________________________________________________________________________________________ 
##                        Intervention       Control             OR        p.ratio p.overall 
##                           N=233            N=267                                          
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age                  65.8 [54.5;76.0] 65.7 [55.8;76.2] 1.00 [1.01;0.98]  0.342    0.579   
## sex:                                                                              0.917   
##     Female              98 (42.1%)      110 (41.2%)    1.04 [0.72;1.48]  0.846            
##     Male               135 (57.9%)      157 (58.8%)          Ref.        Ref.             
## nihss                17.0 [14.0;21.0] 18.0 [14.0;22.0] 0.99 [1.03;0.96]  0.787    0.453   
## location:                                                                         0.111   
##     Left               116 (49.8%)      153 (57.3%)    0.74 [0.52;1.05]  0.094            
##     Right              117 (50.2%)      114 (42.7%)          Ref.        Ref.             
## hx.isch:                                                                          0.335   
##     No                 204 (87.6%)      242 (90.6%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       25 (9.36%)    1.37 [0.78;2.44]  0.273            
## afib:                                                                             0.601   
##     No                 167 (71.7%)      198 (74.2%)          Ref.        Ref.             
##     Yes                 66 (28.3%)       69 (25.8%)    1.13 [0.76;1.69]  0.534            
## dm:                                                                               1.000   
##     No                 204 (87.6%)      233 (87.3%)          Ref.        Ref.             
##     Yes                 29 (12.4%)       34 (12.7%)    0.97 [0.57;1.66]  0.925            
## mrankin:                                                                          0.922   
##     0                  190 (81.5%)      214 (80.1%)          Ref.        Ref.             
##     1                   21 (9.01%)       29 (10.9%)    0.82 [0.44;1.48]  0.507            
##     2                   12 (5.15%)       13 (4.87%)    1.04 [0.45;2.37]  0.924            
##     > 2                 10 (4.29%)       11 (4.12%)    1.03 [0.41;2.51]  0.956            
## sbp                     146 (26.0)       145 (24.4)    1.00 [1.01;0.99]  0.646    0.649   
## iv.altep:                                                                         0.267   
##     No                  30 (12.9%)       25 (9.36%)          Ref.        Ref.             
##     Yes                203 (87.1%)      242 (90.6%)    0.70 [0.40;1.23]  0.215            
## time.iv              85.0 [67.0;110]  87.0 [65.0;116]  1.01 [1.01;1.00]  0.004    0.596   
## aspects              9.00 [7.00;10.0] 9.00 [8.00;10.0] 0.88 [0.99;0.79]  0.033    0.075   
## ia.occlus:                                                                        0.819   
##     Intracranial ICA    1 (0.43%)        3 (1.13%)           Ref.        Ref.             
##     ICA with M1         59 (25.3%)       75 (28.2%)    2.16 [0.24;63.1]  0.513            
##     M1                 154 (66.1%)      165 (62.0%)    2.57 [0.29;74.2]  0.414            
##     M2                  18 (7.73%)       21 (7.89%)    2.33 [0.25;71.2]  0.484            
##     A1 or A2            1 (0.43%)        2 (0.75%)     1.41 [0.03;76.8]  0.857            
## extra.ica:                                                                        0.179   
##     No                 158 (67.8%)      196 (73.7%)          Ref.        Ref.             
##     Yes                 75 (32.2%)       70 (26.3%)    1.33 [0.90;1.96]  0.152            
## time.rand             204 [152;250]    196 [149;266]   1.00 [1.00;0.99]  0.051    0.251   
## time.punc             260 [212;313]       . [.;.]          . [.;.]         .        .     
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
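To see where an entry in the OR column comes from, the odds ratio for sex can be recomputed by hand from the counts in the table (Female vs Male, odds of being in the Intervention group). The Wald confidence interval below is a textbook sketch and an assumption on my part. compareGroups() may use a different estimator and interval method, so small differences from the printed values are expected.

```r
# Counts for sex copied from the table above
tab <- matrix(c(98, 135, 110, 157), nrow = 2,
              dimnames = list(sex = c("Female", "Male"),
                              trt = c("Intervention", "Control")))

# Odds of Intervention among Female divided by odds of Intervention among Male
or_hat <- (tab["Female", "Intervention"] / tab["Female", "Control"]) /
          (tab["Male", "Intervention"]   / tab["Male", "Control"])

# Wald interval on the log scale (textbook formula, not necessarily
# the method used by compareGroups)
se_log <- sqrt(sum(1 / tab))
ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

round(c(OR = or_hat, lower = ci[1], upper = ci[2]), 2)
```

The point estimate agrees with the 1.04 shown for Female in the table, and the interval contains 1, which matches the non-significant p.ratio.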

I should use both commands:
** table1(): shows the “Total” column.
** compareGroups() (preferable): shows the “p.overall” column.

6 References

thomaselove.github.io/432-notes
https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html
https://github.com/benjaminrich/table1/issues/52
Berkhemer, Olvert A., Puck S. S. Fransen, Debbie Beumer, et al. 2015. “A Randomized Trial of Intraarterial Treatment for Acute Ischemic Stroke.” New England Journal of Medicine 372: 11–20. http://www.nejm.org/doi/full/10.1056/NEJMoa1411587.
Roy, Denis, Mario Talajic, Stanley Nattel, et al. 2008. “Rhythm Control Versus Rate Control for Atrial Fibrillation and Heart Failure.” New England Journal of Medicine 358: 2667–77. http://www.nejm.org/doi/full/10.1056/NEJMoa0708789.

Table - Descriptive analysis

Minh Tri

2022-07-30
