It is all very well to be able to do stats in R, you also need to be able to write about them. I saw this intriguing tweet (have your joined twitter for the #rstats content yet?) today and thought I would try out the report() package.

Often one needs to report statistical analysis in a publication/report, and formatting them manually can be tedious and error-prone 📄

The {report} function from the eponymous 📦 automates this process to follow best practices ✍️https://t.co/gAaRN4Qqr1 #rstats #DataScience pic.twitter.com/59QS89HiLE
— R Function A Day (@rfunctionaday) April 17, 2021

It is from a family of packages called easystats- lets see if it lives up to its name!!

library(tidyverse)
library(palmerpenguins)
library(report)

penguins <- penguins

The report() function works on lots of different data objects (from dataframes through model outputs) and depending on the kind of input you feed it, it will give you a “report”. In some cases, it almost writes your results section for you!!

using the `report`package

First install the package (just once) and load it.

# remotes::install_github("easystats/report")

library(report)

dataframes

Feed report() a dataframe and it will produce a descriptives summary for each variable. Not in the most readable format though…

report(penguins)

## The data contains 344 observations of the following 8 variables:
##   - species: 3 levels, namely Adelie (n = 152, 44.19%), Chinstrap (n = 68, 19.77%) and Gentoo (n = 124, 36.05%)
##   - island: 3 levels, namely Biscoe (n = 168, 48.84%), Dream (n = 124, 36.05%) and Torgersen (n = 52, 15.12%)
##   - bill_length_mm: n = 344, Mean = 43.92, SD = 5.46, Median = , MAD = 7.04, range: [32.10, 59.60], Skewness = 0.05, Kurtosis = -0.88, 0.58% missing
##   - bill_depth_mm: n = 344, Mean = 17.15, SD = 1.97, Median = , MAD = 2.22, range: [13.10, 21.50], Skewness = -0.14, Kurtosis = -0.91, 0.58% missing
##   - flipper_length_mm: n = 344, Mean = 200.92, SD = 14.06, Median = , MAD = 16.31, range: [172, 231], Skewness = 0.35, Kurtosis = -0.98, 0.58% missing
##   - body_mass_g: n = 344, Mean = 4201.75, SD = 801.95, Median = , MAD = 889.56, range: [2700, 6300], Skewness = 0.47, Kurtosis = -0.72, 0.58% missing
##   - sex: 2 levels, namely female (n = 165, 47.97%), male (n = 168, 48.84%) and missing (n = 11, 3.20%)
##   - year: n = 344, Mean = 2008.03, SD = 0.82, Median = 2008.00, MAD = 1.48, range: [2007, 2009], Skewness = -0.05, Kurtosis = -1.50, 0% missing

But it really designed for making it easy to report outcomes of statistical analysis.

t-test

Lets test whether there are differences in penguin weight by sex.

I think that male penguins are probably heavier than female penguins. Following JennyS recommendations, I am going to get descriptives, plot the data and then do a t-test.

# descriptives

penguins %>%
  group_by(sex) %>%
  summarise(mean = mean(body_mass_g))

## # A tibble: 3 x 2
##   sex     mean
##   <fct>  <dbl>
## 1 female 3862.
## 2 male   4546.
## 3 <NA>     NA

# plot

penguins %>%
  ggplot(aes(x = sex, y = body_mass_g, fill = sex)) +
  geom_boxplot(alpha = 0.5) +
  stat_summary(fun = "mean")

Let test whether that difference is significant. Here I am test whether there is a difference in body_mass_g by sex, so use the formula body_mass_g ~ sex. Here is what the standard t-test output looks like.

t.test(formula = body_mass_g ~ sex, data = penguins)

## 
##  Welch Two Sample t-test
## 
## data:  body_mass_g by sex
## t = -8.5545, df = 323.9, p-value = 4.794e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -840.5783 -526.2453
## sample estimates:
## mean in group female   mean in group male 
##             3862.273             4545.685

Now I am assigning the output of the t-test to an object, and then running report() on the my-ttest object.

my_ttest <- t.test(formula = body_mass_g ~ sex, data = penguins)

report(my_ttest)

## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Welch Two Sample t-test testing the difference of body_mass_g by sex (mean in group female = 3862.27, mean in group male = 4545.68) suggests that the effect is positive, statistically significant, and large (difference = 683.41, 95% CI [-840.58, -526.25], t(323.90) = -8.55, p < .001; Cohen's d = -0.95, 95% CI [-1.18, -0.72])

MAGIC! that is cool. I love how it reports the stats, but also means AND effect sizes automatically.

correlation

OK what about correlations: are penguins with longer flippers generally heavier?

For correlations, descriptives aren’t that helpful so I am going to just plot a scatter and then test for a correlation.

# plot

penguins %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth()

# statistics

my_correlation <- cor.test(penguins$body_mass_g, penguins$flipper_length_mm)

report(my_correlation)

## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Pearson's product-moment correlation between penguins$body_mass_g and penguins$flipper_length_mm is positive, statistically significant, and very large (r = 0.87, 95% CI [0.84, 0.89], t(340) = 32.72, p < .001)

linear regression

And regression, can we predict the weight of a penguin from its bill length?

# plot

penguins %>%
  ggplot(aes(x = body_mass_g, y =  bill_length_mm)) +
  geom_point() +
  geom_smooth()

# statistics
my_model <- lm(body_mass_g ~ bill_length_mm, data = penguins)


report(my_model)

## We fitted a linear model (estimated using OLS) to predict body_mass_g with bill_length_mm (formula: body_mass_g ~ bill_length_mm). The model explains a statistically significant and substantial proportion of variance (R2 = 0.35, F(1, 340) = 186.44, p < .001, adj. R2 = 0.35). The model's intercept, corresponding to bill_length_mm = 0, is at 362.31 (95% CI [-195.02, 919.64], t(340) = 1.28, p = 0.202). Within this model:
## 
##   - The effect of bill_length_mm is statistically significant and positive (beta = 87.42, 95% CI [74.82, 100.01], t(340) = 13.65, p < .001; Std. beta = 0.60, 95% CI [0.51, 0.68])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the dataset.

OK we are account for 35% of the variance in body mass with just bill length. What if we add species to the model.

# plot

penguins %>%
  ggplot(aes(x = body_mass_g, y =  bill_length_mm, colour = species)) +
  geom_point() +
  geom_smooth()

From the plot, it seems that species is important information, lets test whether it improves our prediction.

# statistics
my_model2 <- lm(body_mass_g ~ bill_length_mm + species, data = penguins)


report(my_model2)

## We fitted a linear model (estimated using OLS) to predict body_mass_g with bill_length_mm and species (formula: body_mass_g ~ bill_length_mm + species). The model explains a statistically significant and substantial proportion of variance (R2 = 0.78, F(3, 338) = 406.27, p < .001, adj. R2 = 0.78). The model's intercept, corresponding to bill_length_mm = 0 and species = Adelie, is at 153.74 (95% CI [-375.19, 682.67], t(338) = 0.57, p = 0.568). Within this model:
## 
##   - The effect of bill_length_mm is statistically significant and positive (beta = 91.44, 95% CI [77.89, 104.98], t(338) = 13.28, p < .001; Std. beta = 0.62, 95% CI [0.53, 0.71])
##   - The effect of species [Chinstrap] is statistically significant and negative (beta = -885.81, 95% CI [-1059.40, -712.22], t(338) = -10.04, p < .001; Std. beta = -1.10, 95% CI [-1.32, -0.89])
##   - The effect of species [Gentoo] is statistically significant and positive (beta = 578.63, 95% CI [430.39, 726.87], t(338) = 7.68, p < .001; Std. beta = 0.72, 95% CI [0.54, 0.91])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the dataset.

WOOT, 78% of the variance! Penguin stats is so fun- why do I study babies again?? I should study baby penguins.

via GIPHY

anova

And last, but not least: anova. We showed above with a t-test that male penguins are generally heavier than female penguins, but does that effect differ by species?

# descriptives

penguins %>%
  na.omit() %>%
  group_by(sex, species) %>%
  summarise(mean = mean(body_mass_g)) %>%
  arrange(species)

## # A tibble: 6 x 3
## # Groups:   sex [2]
##   sex    species    mean
##   <fct>  <fct>     <dbl>
## 1 female Adelie    3369.
## 2 male   Adelie    4043.
## 3 female Chinstrap 3527.
## 4 male   Chinstrap 3939.
## 5 female Gentoo    4680.
## 6 male   Gentoo    5485.

Hard to say from descriptives, lets plot….

# plot

penguins %>%
  na.omit() %>%
  ggplot(aes(x = sex, y = body_mass_g, fill = species)) +
  geom_boxplot(alpha = 0.5) +
  stat_summary(fun = mean) +
  facet_wrap(~ species)

Maybe the sex differences for Gentoo penguins are larger than for Adelie and Chinstrap? Lets test that. Here I am looking for differences in body_mass_g as a function of sex and species. By using the formula body_mass_g ~ sex * species, I am telling R that I want it to look for main effects of sex and species AND the sex x species interaction (i.e. sex + species + sex*species).

my_anova <- aov(body_mass_g ~ sex * species, data = penguins)

report(my_anova)

## The ANOVA (formula: body_mass_g ~ sex * species) suggests that:
## 
##   - The main effect of sex is statistically significant and large (F(1, 327) = 406.14, p < .001; Eta2 (partial) = 0.55, 90% CI [0.50, 0.60])
##   - The main effect of species is statistically significant and large (F(2, 327) = 749.02, p < .001; Eta2 (partial) = 0.82, 90% CI [0.80, 0.84])
##   - The interaction between sex and species is statistically significant and small (F(2, 327) = 8.76, p < .001; Eta2 (partial) = 0.05, 90% CI [0.02, 0.09])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

The interaction between sex and species is significant, but small.

making reports pretty

The report() output is super helpful but it is a bit ugly. You could copy and paste the output of report() into your word .doc results section, but we are going to all this trouble in the service of reproducibility. I want to do my analysis and write my paper in the SAME RMarkdown document.

I think this is where inline code comes in.

You can put snippets of R code in the text of your Markdown document using backticks like this r - the tricky bit is working out how to refer to pieces of your statistical output. I’ve seen inline code used to report statistics in both the papaja package and the gtsummary package. I wonder if report works similarly?

papaja inline_text

The papaja package uses the apa_print() function to generate a list of outputs from a statistical test that you can then refer to in the text using inline code. Here I am “wrapping” my t-test output in apa_print() and assigning it to an object called out. See relevant documentation here.

library(papaja)

out <- apa_print(t.test(formula = body_mass_g ~ sex, data = penguins))

print(out)

## $estimate
## [1] "$\\Delta M = -683.41$, 95\\% CI $[-840.58$, $-526.25]$"
## 
## $statistic
## [1] "$t(323.90) = -8.55$, $p < .001$"
## 
## $full_result
## [1] "$\\Delta M = -683.41$, 95\\% CI $[-840.58$, $-526.25]$, $t(323.90) = -8.55$, $p < .001$"
## 
## $table
## NULL
## 
## attr(,"class")
## [1] "apa_results" "list"

When you print that list called out, you can see it contains details of the estimate, statistics, and full_results.

Then you can refer to these list components when you are writing your results in RMarkdown.

In your RMarkdown document the inline code looks like this…

And when you knit the test renders like this… which is pretty cool. I wonder how easy it is to customise… I dont love the triangle for M change…

Male penguins are significantly heavier than female penguins, \(\Delta M = -683.41\), 95% CI \([-840.58\), \(-526.25]\), \(t(323.90) = -8.55\), \(p < .001\).

gtsummary inline_text

The gtsummary() package makes inline text pretty easy too- see documentation here.

gtsummary is designed to be an extension of the gt package. It is pretty easy to get a good looking descriptives table using the tbl_summary() function and once you have assigned that to an object, you can refer to parts of it using the inline_text() function.

library(gtsummary)

tab1 <- tbl_summary(penguins, by = sex)

tab1

Characteristic	female, N = 165¹	male, N = 168¹
species
Adelie	73 (44%)	73 (43%)
Chinstrap	34 (21%)	34 (20%)
Gentoo	58 (35%)	61 (36%)
island
Biscoe	80 (48%)	83 (49%)
Dream	61 (37%)	62 (37%)
Torgersen	24 (15%)	23 (14%)
bill_length_mm	42.8 (37.6, 46.2)	46.8 (41.0, 50.3)
bill_depth_mm	17.00 (14.50, 17.80)	18.45 (16.08, 19.25)
flipper_length_mm	193 (187, 210)	200 (193, 219)
body_mass_g	3,650 (3,350, 4,550)	4,300 (3,900, 5,312)
year
2007	51 (31%)	52 (31%)
2008	56 (34%)	57 (34%)
2009	58 (35%)	59 (35%)
¹ n (%); Median (IQR)

Here I want to report the mean body mass by sex. My text in my Rmd looks like this…

And when it renders into my knitted document it looks like this…Hmmm I wonder how easy that is to customise, I don’t love the values in brackets and it would be nice to include units. sigh…

Male penguins were on average heavier 4,300 (3,900, 5,312) than female penguins 3,650 (3,350, 4,550).

report inline_text?

I didn’t have much success in my googling re whether it is possible to use inline text in RMarkdown with the report() package. I found lots of discussion on the package issues thread about integration with RMarkdown. They are talking about making APA format a output option in this github issue. And this github issue seems to suggest that the developers are considering adding format functions to allow the report output to just appear in the Markdown document, but haven’t yet. Of course with open source software, no one is being paid to do any of these things. The developers are making updates in their spare time and when there is a problem to solve that makes the package better for them.

The best solution I came up with was using the code chunk option results = asis. This tells RMarkdown to write the output as raw Markdown (i.e. “as is”) rather than as code output.

Lets try it on the t-test output.

This is what is looks like with standard chunk settings.

my_ttest <- t.test(formula = body_mass_g ~ sex, data = penguins)

report(my_ttest)

## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Welch Two Sample t-test testing the difference of body_mass_g by sex (mean in group female = 3862.27, mean in group male = 4545.68) suggests that the effect is positive, statistically significant, and large (difference = 683.41, 95% CI [-840.58, -526.25], t(323.90) = -8.55, p < .001; Cohen's d = -0.95, 95% CI [-1.18, -0.72])

And this is what it looks like when I specify results = ‘asis’ in the chunk options.

my_ttest <- t.test(formula = body_mass_g ~ sex, data = penguins)

report(my_ttest)

Effect sizes were labelled following Cohen’s (1988) recommendations.

The Welch Two Sample t-test testing the difference of body_mass_g by sex (mean in group female = 3862.27, mean in group male = 4545.68) suggests that the effect is positive, statistically significant, and large (difference = 683.41, 95% CI [-840.58, -526.25], t(323.90) = -8.55, p < .001; Cohen’s d = -0.95, 95% CI [-1.18, -0.72])

via GIPHY

That isn’t terrible. As with all the other inline reporting options, I would like to have a little more flexibility. While automagically generating your results section is fun (and makes it WAY LESS error prone), your writing is going to sound super clunky if you describe each t-test in exactly the same way. But at least it is giving you a good idea of what you need to be reporting.

hope this is useful!

easystats report

Jen Richmond

20/04/2021

using the `report`package

dataframes

t-test

correlation

linear regression

anova

making reports pretty

papaja inline_text

gtsummary inline_text

report inline_text?

easystats report

Jen Richmond

20/04/2021

using the reportpackage

dataframes

t-test

correlation

linear regression

anova

making reports pretty

papaja inline_text

gtsummary inline_text

report inline_text?

using the `report`package