This tutorial is located on RPubs. The entire R Markdown code is located on my GitHub page. For this tutorial, you will need the following packages: survival, dplyr, psych, survminer, gmodels, and gtsummary.
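If you haven't installed these packages yet, a minimal setup sketch (assuming you install from CRAN) looks like this:

### Install (if needed) and load the packages used in this tutorial
# install.packages(c("survival", "dplyr", "psych", "survminer", "gmodels", "gtsummary"))
library(survival)   ## survfit, survdiff, coxph, and the veteran data
library(dplyr)      ## data manipulation and the %>% pipe
library(psych)      ## describeBy for summary statistics
library(survminer)  ## ggsurvplot for Kaplan-Meier curves
library(gmodels)    ## CrossTable for frequency tables
library(gtsummary)  ## tbl_summary and tbl_regression tables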
A dichotomous variable has two outcomes (Yes or No, Survives or Dies, Cured or Uncured) and is coded as 0 for \(No\) and 1 for \(Yes\). We can compare the difference in the probability or risk of having the event (outcome = 1) between two groups using a relative risk ratio or odds ratio. This gives us information about the size of the effect, the direction of the effect, and the uncertainty surrounding the effect (e.g., 95% confidence interval). However, we do not have information on how long the subjects were alive or at risk, which is critical information from a survival point of view. Hence, a different kind of analysis is needed to handle this issue: survival analysis.
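As a quick illustration, here is a toy 2x2 example of how the relative risk and odds ratio are computed. The counts are hypothetical, not from any dataset in this tutorial:

### Toy 2x2 example: relative risk and odds ratio (hypothetical counts)
events <- c(group1 = 30, group0 = 15)   ## subjects with outcome = 1
totals <- c(group1 = 100, group0 = 100) ## subjects at risk in each group
risk <- events / totals                 ## risk of the event in each group
risk["group1"] / risk["group0"]         ## relative risk = 2.0
odds <- risk / (1 - risk)               ## odds = risk / (1 - risk)
odds["group1"] / odds["group0"]         ## odds ratio ~ 2.43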
Survival analysis is also known as a time-to-event analysis. This is a special type of analysis that takes into consideration when the event occurred rather than if the event occurred. In other words, we are focused on estimating the rate, which is the number of events per unit time.
In survival analysis, we are interested in the hazard, which is the instantaneous event (e.g., death) rate at a particular time \(t\).
By combining these two elements (survival and hazard), we will be able to estimate the hazard ratio of an event occurring between two groups.
In survival analysis, the two main features are the survivor and hazard functions.
Survivor function \(S(t)\) is a probability and hazard function \(h(t)\) is a rate.
The survivor function is described as:

\(\large S(t) = Pr(T > t)\),

where \(S(t)\) is the survival probability, \(0 \le S(t) \le 1\), and \(Pr(T > t)\) is the probability that the time of the event (\(T\)) is greater than some time \(t\).
The hazard function \(h(t)\) is the instantaneous potential per unit time for the event to occur, given that the subject is alive (a rate):
\(\large h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T \le t + \Delta t | T \ge t)}{\Delta t}\),
where the hazard rate at an instantaneous point in time (\(\lim_{\Delta t \to 0}\)) is the probability of the event occurring in the interval (\(P(t \le T \le t + \Delta t)\)), conditioned on the subject being alive (\(T \ge t\)), divided by the width of the interval \(\Delta t\) (per unit time).
The survivor and hazard functions are related. For example, in the special case of a constant hazard,

\(h(t) = \lambda\) if and only if \(S(t) = e^{-\lambda t}\).

This means that you can derive the hazard function \(h(t)\) from the survivor function \(S(t)\) and vice versa.
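We can check this relationship numerically. The sketch below uses an assumed constant hazard rate (the value 0.1 is arbitrary, not from any dataset in this tutorial) and compares the theoretical \(S(t)\) against the empirical survival of simulated exponential times:

### Numerical check: with constant hazard lambda, S(t) = exp(-lambda * t)
lambda <- 0.1    ## assumed constant hazard rate (arbitrary)
t <- c(1, 5, 10) ## example time points
exp(-lambda * t) ## theoretical S(t): 0.905, 0.607, 0.368

set.seed(42)
times <- rexp(100000, rate = lambda)   ## simulate exponential survival times
sapply(t, function(x) mean(times > x)) ## empirical S(t); close to theoretical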
When comparing the survival between two groups, it’s good practice to plot out the survival curves. The Kaplan-Meier (KM) curve provides a non-parametric visualization of the survivor function. You can determine the survival probability at specific time points. The KM curve has a unique “step-like” appearance that represents when patients either experience the event (e.g., death) or are censored.
The data to plot a KM curve require at minimum the subject identifier, survival time (time spent at risk), status (event), and group indicator. I generated these data to illustrate a basic KM curve dataset. There are 10 subjects in each group. If a subject died, they are coded as event = 1, but if they withdrew or were lost to follow-up, they are censored (event = 0). The survivaltime is the time the subjects are at risk or contributing to the survival time.
You can download the data used in this tutorial from my GitHub site.
kmplot <- read.csv("kmcurve.csv")
knitr::kable(
(kmplot), caption = "Table 1. Variables for constructing KM curves"
)
Table 1. Variables for constructing KM curves
subject | survivaltime | event | group |
---|---|---|---|
1 | 10 | 1 | 1 |
2 | 20 | 1 | 1 |
3 | 30 | 1 | 1 |
4 | 35 | 0 | 1 |
5 | 40 | 1 | 1 |
6 | 50 | 1 | 1 |
7 | 60 | 0 | 1 |
8 | 70 | 1 | 1 |
9 | 80 | 1 | 1 |
10 | 90 | 1 | 1 |
11 | 5 | 1 | 0 |
12 | 8 | 1 | 0 |
13 | 14 | 1 | 0 |
14 | 20 | 1 | 0 |
15 | 22 | 0 | 0 |
16 | 23 | 0 | 0 |
17 | 24 | 0 | 0 |
18 | 25 | 1 | 0 |
19 | 45 | 1 | 0 |
20 | 60 | 1 | 0 |
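If you prefer not to download the CSV, you can build the same data frame directly in R. This sketch simply reproduces the values shown in Table 1:

### Recreate the Table 1 data without the CSV (values copied from the table)
kmplot <- data.frame(
  subject = 1:20,
  survivaltime = c(10, 20, 30, 35, 40, 50, 60, 70, 80, 90,
                   5, 8, 14, 20, 22, 23, 24, 25, 45, 60),
  event = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1,
            1, 1, 1, 1, 0, 0, 0, 1, 1, 1),
  group = rep(c(1, 0), each = 10) ## 10 subjects per group
)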
Let’s take a look at a typical KM curve. The KM curve has a noticeable “step-like” plot that decreases (or increases) based on when the event of interest occurs at some specific time point. For example, at time = 0, there are 10 subjects in each group. But at time = 25, only 3 subjects in Treatment 0 and 8 subjects in Treatment 1 remain at risk. At time = 75, there are 0 subjects in Treatment 0 and 2 subjects in Treatment 1 remaining at risk. The hash marks on the KM curves indicate that a subject was censored, which means that they either were lost to follow-up or withdrew from the study. There are three hash marks in Treatment 0 and two hash marks in Treatment 1.
The Kaplan-Meier curve has a unique “step-like” plot that distinguishes it from other survival plots.
kmplot <- read.csv("kmcurve.csv")
km.plot <- survfit(Surv(survivaltime, event) ~ group, data = kmplot)
ggsurvplot(km.plot,
risk.table = TRUE,
legend.labs = c("treatment 0", "treatment 1"),
title = "Kaplan-Meier curves")
The log rank test is used to compare the survival between groups. It is a test of significance, which means that it doesn’t tell us anything about the size of the effect, its direction, nor its uncertainty. However, it does let us know whether the survival between the groups are significantly different.
Statistical hypotheses for log rank test:
\(H_{0}\): There is no difference between the populations in the probability of an event at any time point
\(H_{a}\): There is a difference between the populations in the probability of an event at any time point
The log rank test is based on the chi-square test. It estimates the expected number of deaths at each time point and compares that to the observed number of deaths. The degrees of freedom are estimated as the number of groups minus 1 (n - 1). Fortunately, R can perform this statistical comparison using the survdiff function.
If the p-value is < 0.05, we reject the null that there is no difference in survival between the populations. Otherwise, we fail to reject the null, meaning we do not have evidence that the survival between the populations differs (assuming our statistical threshold is \(\alpha\) = 0.05).
To perform the log rank test on our data, we need to use the survdiff function from the survival package.
### log rank test
survdiff(Surv(survivaltime, event) ~ group, data = kmplot)
## Call:
## survdiff(formula = Surv(survivaltime, event) ~ group, data = kmplot)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## group=0 10 7 4.14 1.978 3.27
## group=1 10 8 10.86 0.754 3.27
##
## Chisq= 3.3 on 1 degrees of freedom, p= 0.07
The log rank test yields a chi-square statistic because the underlying foundation is based on this framework. Like the chi-square test, the log rank test estimates the expected number of subjects who died and compares that to the observed number of subjects who died. This yields a p-value = 0.07, which means that we fail to reject the null; the survival between Treatment 0 and Treatment 1 is not statistically significantly different.
Since the log rank test is a test of significance, it doesn’t give us information about the treatment effect size, direction, and uncertainty. For that, we will need to construct a Cox proportional hazard model.
The Cox proportional hazards regression model is a type of survival regression. It assumes that the hazards between two groups are proportional (proportional hazards regression). This means that the effects of the predictor variables upon the outcome (survival) are constant over time and are additive on a linear scale. For example, if something doubles the risk of an end point at time = 1, it will also double the risk at time = 2, time = 3, etc. One way to determine if the proportional hazards assumption is violated is to look and see if the KM curves cross at multiple points. It’s not a great way to do this, but it’s an easy visual; a more formal test is sketched below.
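Beyond the visual inspection, the survival package includes the cox.zph function, which tests the proportional hazards assumption using scaled Schoenfeld residuals. A minimal sketch using the same model we fit below:

### Formal check of the proportional hazards assumption (Schoenfeld residuals)
fit <- coxph(Surv(survivaltime, event) ~ group, data = kmplot)
cox.zph(fit)       ## a small p-value (< 0.05) suggests the assumption is violated
plot(cox.zph(fit)) ## a flat residual trend over time supports the assumption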
The Cox proportional hazards model does not have a constant (intercept) term; rather, it has a baseline hazard \(h_{0}(t)\).
The hazard rate can be described as:
\(\large h(t, X) = h_{0}(t) e^{\sum_{i=1}^{p} \beta_{i} X_{i}}\)
The hazard ratio is the ratio of the two hazards for the groups and can be described as:
\(\large HR = \frac{h(t, X^{*})}{h(t, X)} = \frac{h_{0}(t) e^{\sum_{i=1}^{p} \beta_{i} X_{i}^{*}}}{h_{0}(t) e^{\sum_{i=1}^{p} \beta_{i} X_{i}}}\)
\(\large HR = e^{\sum_{i=1}^{p} \beta_{i} (X_{i}^{*} - X_{i})}\)
We can estimate the Cox proportional hazards model using the coxph function from the survival package. We can rewrite our \(HR\) formula to reflect the main predictor of interest, which is group (Treatment 1 versus Treatment 0):
\(\large HR = e^{\beta_{1} [(Group = 1) - (Group = 0)]}\)

We are interested in the \(\beta_{1}\) coefficient.
coxmodel <- coxph(Surv(survivaltime, event) ~ group, data = kmplot)
coxmodel ## Generates the log hazard ratio
## Call:
## coxph(formula = Surv(survivaltime, event) ~ group, data = kmplot)
##
## coef exp(coef) se(coef) z p
## group -1.0554 0.3481 0.6084 -1.735 0.0828
##
## Likelihood ratio test=3.06 on 1 df, p=0.08047
## n= 20, number of events= 15
We have to exponentiate the log hazard ratio to get the hazard ratio. We also use the confint function to generate the 95% confidence interval for the hazard ratio.
exp(coef(coxmodel)) ## Generates exp coefficients (or hazard ratio)
## group
## 0.34805
exp(confint(coxmodel))
## 2.5 % 97.5 %
## group 0.1056181 1.146952
We are interested in the group coefficient, which is the log hazard ratio. When we exponentiate this, we get the hazard ratio, which is 0.35. The 95% confidence interval is 0.11, 1.15.
We conclude that the hazard of having the event is 65% lower among subjects in Treatment 1 compared to Treatment 0, with a 95% confidence interval of 0.11, 1.15. This result is not statistically significant since the 95% confidence interval crosses the null (\(HR = 1\)).
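As a quick sanity check, we can reproduce these numbers by hand from the coxph output:

### Sanity check: exponentiate the printed log hazard ratio
exp(-1.0554)     ## ~0.35, the hazard ratio
1 - exp(-1.0554) ## ~0.65, i.e., a 65% lower hazard in Treatment 1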
We can apply what we’ve learned using data from the Veterans Lung Cancer Study.
The Veteran Lung Cancer Study is part of the survival package. You can load it by assigning it to an object. For instance, veteran.data <- veteran. Once done, you can use veteran.data for your analyses.
We will use the Veterans Administration Lung Cancer Trial from Prentice, RL. Exponential survivals with censoring and explanatory variables. Biometrika. 1973;60:279–88. In this study, 137 patients with advanced, inoperable lung cancer were randomized into two groups: chemotherapy + new drug treatment (experimental group) and chemotherapy (control treatment). You can download the data for the variable list from my GitHub site.
The variables in the veteran dataset:
### Load the "variables.csv" data from my GitHub site; this is a list of variables in the "veteran" dataset
variables <- read.csv("variables.csv")
knitr::kable(
(variables), caption = "Table 2. Variables in the Veterans Affairs Lung Cancer Trial"
)
Table 2. Variables in the Veterans Affairs Lung Cancer Trial
variable | description |
---|---|
trt | treatment assignment (1 = control, 2 = experimental) |
celltype | lung cancer cell type (1 = squamous, 2 = smallcell, 3 = adeno, 4 = large) |
time | survival time in days |
status | censoring status (1 = death occurred, 0 = censored) |
karno | Karnofsky performance score (100 = good) |
diagtime | months from diagnosis to randomization |
age | in years |
prior | prior therapy (0 = no, 10 = yes) |
Let’s take a look at the first six rows of the data.
### Load "veteran" data from the "survival" package
veteran.data <- veteran
knitr::kable(
head(veteran.data), caption = "Table 3. First six rows of the Veterans Affairs Lung Cancer Trial"
)
Table 3. First six rows of the Veterans Affairs Lung Cancer Trial
trt | celltype | time | status | karno | diagtime | age | prior |
---|---|---|---|---|---|---|---|
1 | squamous | 72 | 1 | 60 | 7 | 69 | 0 |
1 | squamous | 411 | 1 | 70 | 5 | 64 | 10 |
1 | squamous | 228 | 1 | 60 | 3 | 38 | 0 |
1 | squamous | 126 | 1 | 60 | 9 | 63 | 10 |
1 | squamous | 118 | 1 | 70 | 11 | 65 | 10 |
1 | squamous | 10 | 1 | 20 | 5 | 49 | 0 |
We can look at some summary statistics of the veteran dataset. There are 137 subjects in the study. The average survival time is 121.63 days and the average age of the sample is 58.3 years. The other variables are discrete, so the means do not provide a good description of them. We will need to use the CrossTable command, which is part of the gmodels package.
knitr::kable(
describeBy(veteran.data) %>% round(2)
)
variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
trt | 1 | 137 | 1.50 | 0.50 | 1 | 1.50 | 0.00 | 1 | 2 | 1 | 0.01 | -2.01 | 0.04 |
celltype* | 2 | 137 | 2.34 | 1.07 | 2 | 2.30 | 1.48 | 1 | 4 | 3 | 0.28 | -1.17 | 0.09 |
time | 3 | 137 | 121.63 | 157.82 | 80 | 90.33 | 87.47 | 1 | 999 | 998 | 3.06 | 12.33 | 13.48 |
status | 4 | 137 | 0.93 | 0.25 | 1 | 1.00 | 0.00 | 0 | 1 | 1 | -3.47 | 10.10 | 0.02 |
karno | 5 | 137 | 58.57 | 20.04 | 60 | 59.37 | 29.65 | 10 | 99 | 89 | -0.34 | -0.83 | 1.71 |
diagtime | 6 | 137 | 8.77 | 10.61 | 5 | 6.73 | 4.45 | 1 | 87 | 86 | 4.06 | 23.00 | 0.91 |
age | 7 | 137 | 58.31 | 10.54 | 62 | 59.13 | 8.90 | 34 | 81 | 47 | -0.63 | -0.49 | 0.90 |
prior | 8 | 137 | 2.92 | 4.56 | 0 | 2.43 | 0.00 | 0 | 10 | 10 | 0.91 | -1.19 | 0.39 |
CrossTable(veteran.data$celltype)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 137
##
##
## | squamous | smallcell | adeno | large |
## |-----------|-----------|-----------|-----------|
## | 35 | 48 | 27 | 27 |
## | 0.255 | 0.350 | 0.197 | 0.197 |
## |-----------|-----------|-----------|-----------|
##
##
##
##
Using the CrossTable function, we report that there were 35 (25.5%) subjects with squamous lung cancer, 48 (35.0%) with small cell lung cancer, 27 (19.7%) with adenocarcinoma, and 27 (19.7%) with large cell lung cancer.
We can create a demographics table using the gtsummary package and the tbl_summary function.
veteran.data %>%
select(trt, celltype, time, status, karno, diagtime, age, prior) %>%
tbl_summary(by = trt,
missing = "no") %>%
add_p(pvalue_fun = ~style_pvalue(.x, digits = 2)) %>%
add_overall() %>%
modify_header(label = "**Variable**") %>%
bold_labels() %>%
modify_caption("**Table 4. Patient characteristics from the Veterans Lung Cancer Trial**") %>%
modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Received**")
Table 4. Patient characteristics from the Veterans Lung Cancer Trial

Variable | Overall, N = 137¹ | Treatment 1, N = 69¹ | Treatment 2, N = 68¹ | p-value² |
---|---|---|---|---|
celltype |  |  |  | 0.071 |
squamous | 35 (26%) | 15 (22%) | 20 (29%) |  |
smallcell | 48 (35%) | 30 (43%) | 18 (26%) |  |
adeno | 27 (20%) | 9 (13%) | 18 (26%) |  |
large | 27 (20%) | 15 (22%) | 12 (18%) |  |
time | 80 (25, 144) | 97 (25, 153) | 52 (25, 117) | 0.35 |
status | 128 (93%) | 64 (93%) | 64 (94%) | >0.99 |
karno | 60 (40, 75) | 60 (40, 80) | 60 (40, 70) | 0.75 |
diagtime | 5 (3, 11) | 5 (4, 11) | 4 (3, 11) | 0.35 |
age | 62 (51, 66) | 62 (49, 66) | 62 (52, 66) | 0.46 |
prior |  |  |  | 0.75 |
0 | 97 (71%) | 48 (70%) | 49 (72%) |  |
10 | 40 (29%) | 21 (30%) | 19 (28%) |  |
¹ n (%); Median (IQR)
² Pearson's Chi-squared test; Wilcoxon rank sum test; Fisher's exact test
You can plot the Kaplan-Meier (KM) curves for the two groups and estimate the log rank test.
In the basic KM curve plot, notice how there are no hash marks for the censored data. The basic plot function will not include them by default, unfortunately. However, ggsurvplot has that ability along with other options to improve your KM curves.

We first use the basic plot function to visualize the KM curve without any of the color or 95% confidence interval.
kmcurve <- survfit(Surv(time, status) ~ trt, data = veteran.data)
plot(kmcurve)
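That said, the base plot method can display censoring marks if you ask for them. A small sketch using the mark.time argument of plot.survfit, along with basic color and axis labels:

### Base R plot with censoring tick marks and basic styling
plot(kmcurve,
     mark.time = TRUE,           ## add tick marks at censored times
     col = c("black", "gray50"), ## one color per treatment group
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("treatment 1", "treatment 2"),
       col = c("black", "gray50"), lty = 1)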
We can also install the survminer package to use the ggsurvplot function to plot the KM curves with additional features such as the 95% confidence interval (CI), log-rank test, and risk tables. We can estimate the log rank test by including the option pval = TRUE in the ggsurvplot function. But we can also estimate the log rank test separately using the survdiff function.
ggsurvplot(kmcurve,
conf.int = TRUE,
pval = TRUE,
risk.table = TRUE,
legend.labs = c("treatment 1", "treatment 2"),
title = "Kaplan-Meier curves")
survdiff(Surv(time, status) ~ trt, data = veteran.data)
## Call:
## survdiff(formula = Surv(time, status) ~ trt, data = veteran.data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## trt=1 69 64 64.5 0.00388 0.00823
## trt=2 68 64 63.5 0.00394 0.00823
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
Both the ggsurvplot and survdiff yield the same p-value for the log rank test, which is 0.93 (the survdiff output rounds this to 0.9). Therefore, we fail to reject the null hypothesis that there is no difference in survival between Treatment 1 and Treatment 2.
We can estimate the hazard ratio between Treatment 1 and Treatment 2 using the Cox proportional hazards model.
\(\large HR = e^{\beta_{1}(Treatment)}\)
coxmodel2 <- coxph(Surv(time, status) ~ trt, data = veteran.data)
exp(coef(coxmodel2))
## trt
## 1.017901
exp(confint(coxmodel2))
## 2.5 % 97.5 %
## trt 0.7143755 1.450389
### Create nice regression output
model1 <- tbl_regression(coxmodel2, intercept = TRUE)
as_gt(model1) %>%
gt::tab_header("Cox proportional hazards regression model") %>%
gt::tab_options(table.align='left')
Cox proportional hazards regression model

Characteristic | log(HR)¹ | 95% CI¹ | p-value |
---|---|---|---|
trt | 0.02 | -0.34, 0.37 | >0.9 |
¹ HR = Hazard Ratio, CI = Confidence Interval
The Cox proportional hazards model yields the log hazard ratio. To get the hazard ratio, exponentiate the log hazard ratio.

The hazard ratio was 1.02 with a 95% confidence interval of 0.71, 1.45. Since the 95% confidence interval crosses \(HR = 1\), we fail to reject the null hypothesis that there is no difference in hazards between subjects in Treatment 1 and Treatment 2.
We can control for confounders such as the age, prior history, cell type, diagnosis time, and Karnofsky score.
\(\large HR = e^{\beta_{1}(Treatment) + \beta_{2}(age) + \beta_{3}(prior) + \beta_{4}(celltype) + \beta_{5}(diagtime) + \beta_{6}(karno)}\)
coxmodel3 <- coxph(Surv(time, status) ~ trt + celltype + age + prior + diagtime + karno, data = veteran.data)
exp(coef(coxmodel3))
## trt celltypesmallcell celltypeadeno celltypelarge
## 1.3425930 2.3668512 3.3070825 1.4937529
## age prior diagtime karno
## 0.9913313 1.0071850 1.0000813 0.9677173
exp(confint(coxmodel3))
## 2.5 % 97.5 %
## trt 0.8938772 2.0165590
## celltypesmallcell 1.3799024 4.0596961
## celltypeadeno 1.8335975 5.9646647
## celltypelarge 0.8583289 2.5995834
## age 0.9734248 1.0095673
## prior 0.9623552 1.0541032
## diagtime 0.9823329 1.0181504
## karno 0.9573269 0.9782204
### Create nice regression output
model2 <- tbl_regression(coxmodel3, intercept = TRUE)
as_gt(model2) %>%
gt::tab_header("Cox proportional hazards regression model with confounders") %>%
gt::tab_options(table.align='left')
Cox proportional hazards regression model with confounders

Characteristic | log(HR)¹ | 95% CI¹ | p-value |
---|---|---|---|
trt | 0.29 | -0.11, 0.70 | 0.2 |
celltype | |||
squamous | — | — | |
smallcell | 0.86 | 0.32, 1.4 | 0.002 |
adeno | 1.2 | 0.61, 1.8 | <0.001 |
large | 0.40 | -0.15, 0.96 | 0.2 |
age | -0.01 | -0.03, 0.01 | 0.3 |
prior | 0.01 | -0.04, 0.05 | 0.8 |
diagtime | 0.00 | -0.02, 0.02 | >0.9 |
karno | -0.03 | -0.04, -0.02 | <0.001 |
¹ HR = Hazard Ratio, CI = Confidence Interval
In the adjusted Cox proportional hazards model, subjects in Treatment 2 had 1.34 times the hazard of the event compared to subjects in Treatment 1; the 95% confidence interval is between 0.89 and 2.02. This is not statistically significant because the 95% confidence interval includes the null (\(HR = 1\)).
We can check to see if the adjustment model reduced confounding by estimating the magnitude of confounding. This is a rule of thumb: if the magnitude of confounding is greater than 10%, we should present the adjusted \(HR\).
magnitude of confounding = \(\large \frac{HR_{crude} - HR_{adjusted}}{HR_{adjusted}}\)
confounding <- (1.017901 - 1.3425930) / 1.3425930
confounding
## [1] -0.2418395
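Rather than hard-coding the estimates, you can pull the hazard ratios directly from the fitted model objects. An equivalent sketch:

### Equivalent computation pulling the HRs from the fitted models
hr_crude <- exp(coef(coxmodel2))["trt"]    ## crude HR = 1.02
hr_adjusted <- exp(coef(coxmodel3))["trt"] ## adjusted HR = 1.34
(hr_crude - hr_adjusted) / hr_adjusted     ## magnitude of confounding = -0.24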
Since the magnitude of confounding is 24% (in absolute value) and greater than 10%, we should present the adjusted \(HR\).
We can also compare the models side by side.
#### Merge the two Cox regression models' outputs
table1 <- tbl_merge(tbls = list(model1, model2),
tab_spanner = c("**Model 1**", "**Model 2**"))
as_gt(table1) %>%
gt::tab_header("Comparison between models [Model 1 (crude) v. Model 2 (adjusted)]") %>%
gt::tab_options(table.align = 'left')
Comparison between models [Model 1 (crude) v. Model 2 (adjusted)]

Characteristic | Model 1: log(HR)¹ | Model 1: 95% CI¹ | Model 1: p-value | Model 2: log(HR)¹ | Model 2: 95% CI¹ | Model 2: p-value |
---|---|---|---|---|---|---|
trt | 0.02 | -0.34, 0.37 | >0.9 | 0.29 | -0.11, 0.70 | 0.2 |
celltype | ||||||
squamous | — | — | ||||
smallcell | 0.86 | 0.32, 1.4 | 0.002 | |||
adeno | 1.2 | 0.61, 1.8 | <0.001 | |||
large | 0.40 | -0.15, 0.96 | 0.2 | |||
age | -0.01 | -0.03, 0.01 | 0.3 | |||
prior | 0.01 | -0.04, 0.05 | 0.8 | |||
diagtime | 0.00 | -0.02, 0.02 | >0.9 | |||
karno | -0.03 | -0.04, -0.02 | <0.001 | |||
¹ HR = Hazard Ratio, CI = Confidence Interval
Survival analysis takes into account the amount of time that the subjects are at risk; this is known as survival time. Even if subjects do not experience the event during the study period, they are censored because they may still experience the event at a later date. Kaplan-Meier curves provide a convenient non-parametric visual of the survival probability across time, and the log rank test allows us to test whether the survival curves are different between groups. However, the log rank test does not provide information about the size of the effect, its direction, or its uncertainty. The Cox proportional hazards model provides us with the hazard ratio and 95% confidence interval, which describe the relative hazards between two groups.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–81.

Bland JM, Altman DG. The logrank test. BMJ. 2004;328(7447):1073.
The gtsummary package creates nice tables for demographic summaries. The gtsummary package was developed by Daniel D. Sjoberg.
##
## To cite gtsummary in publications use:
##
## Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible
## summary tables with the gtsummary package. The R Journal
## 2021;13:570–80. https://doi.org/10.32614/RJ-2021-053.
##
## A BibTeX entry for LaTeX users is
##
## @Article{gtsummary,
## author = {Daniel D. Sjoberg and Karissa Whiting and Michael Curry and Jessica A. Lavery and Joseph Larmarange},
## title = {Reproducible Summary Tables with the gtsummary Package},
## journal = {{The R Journal}},
## year = {2021},
## url = {https://doi.org/10.32614/RJ-2021-053},
## doi = {10.32614/RJ-2021-053},
## volume = {13},
## issue = {1},
## pages = {570-580},
## }
These tutorials are a work in progress, so they may contain errors. I highly recommend you seek out statistical consult if you plan on running any analysis meant for publication. Any comments and suggestions can be sent to: internal.validity.blog@gmail.com