When writing a report in R Markdown (Rmd), it is generally a good idea to limit the raw R output that you display. Raw R output can leave your report looking messy and unprofessional. Luckily, there are several packages that you can use to convert your R output into clean, readable tables. Below we will look at the kable() function from the knitr package; another common option is the gt package.

To start off, suppose you fit a multiple linear regression model using the built-in mtcars dataset (as this dataset ships with R, you should be able to run the code below without reading in any data):

model = lm(mpg ~ wt + hp + qsec, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt + hp + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8591 -1.6418 -0.4636  1.1940  5.6092 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.61053    8.41993   3.279  0.00278 ** 
## wt          -4.35880    0.75270  -5.791 3.22e-06 ***
## hp          -0.01782    0.01498  -1.190  0.24418    
## qsec         0.51083    0.43922   1.163  0.25463    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared:  0.8348, Adjusted R-squared:  0.8171 
## F-statistic: 47.15 on 3 and 28 DF,  p-value: 4.506e-11

As you can see, summary() prints a lot of output, and none of it is arranged into a neat table! One of the most important pieces of this output is the coefficient table, which provides the estimate, standard error, t-value and p-value for each of the model’s coefficients. We can pull that table out as a data frame with tidy() from the broom package and then format it with kable():

# Load required packages
library(knitr)
library(broom)  # for tidy()

# Fit a multiple linear regression model
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Get tidy summary of model
model_tidy <- tidy(model)

# Create a nicely formatted table
kable(model_tidy, digits = 3, caption = "Multiple Linear Regression Coefficients")

Table 1: Multiple Linear Regression Coefficients

| term        | estimate | std.error | statistic | p.value |
|:------------|---------:|----------:|----------:|--------:|
| (Intercept) |   27.611 |     8.420 |     3.279 |   0.003 |
| wt          |   -4.359 |     0.753 |    -5.791 |   0.000 |
| hp          |   -0.018 |     0.015 |    -1.190 |   0.244 |
| qsec        |    0.511 |     0.439 |     1.163 |   0.255 |
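
If you would rather not depend on broom, one possible alternative (a sketch, not part of the workflow above) is to extract the coefficient matrix from the model summary with base R's coef() and pass it straight to kable(). The name coef_matrix is just an illustrative choice.

```r
library(knitr)

# Refit the same model so that this block runs on its own
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# coef() on a model summary returns the coefficient table as a numeric matrix
coef_matrix <- coef(summary(model))

# kable() also accepts a matrix; the row names become the first column
kable(coef_matrix, digits = 3, caption = "Multiple Linear Regression Coefficients")
```

The column headers in this version are the ones printed by summary() (Estimate, Std. Error, t value, Pr(>|t|)) rather than broom's term, estimate, std.error, statistic and p.value.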

What if we only wanted to focus on a few columns? We can use select() from dplyr (loaded as part of the tidyverse) to keep just the columns that we want.

library(tidyverse)
model_tidy %>% select(term, estimate, p.value) %>% kable()

| term        |   estimate |   p.value |
|:------------|-----------:|----------:|
| (Intercept) | 27.6105269 | 0.0027846 |
| wt          | -4.3587972 | 0.0000032 |
| hp          | -0.0178223 | 0.2441762 |
| qsec        |  0.5108337 | 0.2546284 |
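
Because no digits argument was supplied this time, kable() falls back on getOption("digits") (7 by default), which is why the numbers are so long. As a rough sketch of how the same table could be tidied up, the block below rounds each column separately and supplies friendlier headers through kable()'s digits and col.names arguments. The header text, the caption and the name coef_subset are illustrative choices, not anything required by knitr.

```r
library(knitr)
library(dplyr)
library(broom)

# Refit the model so that this block runs on its own
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Keep only the columns of interest
coef_subset <- tidy(model) %>%
  select(term, estimate, p.value)

# digits takes one value per column; the value for the text column is ignored
kable(
  coef_subset,
  digits = c(0, 3, 4),
  col.names = c("Term", "Estimate", "p-value"),
  caption = "Estimates and p-values"
)
```

Very small p-values still round to zero at this precision, so you may prefer to describe them in the text instead (for example, as less than 0.001).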

Earlier in the report, we might also want to include a table describing what each variable in the multiple linear regression stands for:

# Create a dataframe for selected variables
regression_vars <- data.frame(
  Variable = c("mpg", "wt", "hp", "qsec"),
  Description = c(
    "Miles/(US) gallon (response)",
    "Weight (1000 lbs)",
    "Gross horsepower",
    "1/4 mile time"
  ),
  stringsAsFactors = FALSE
)

# Display table
kable(regression_vars, caption = "Variables Used in the Regression Model")

Table 2: Variables Used in the Regression Model

| Variable | Description                  |
|:---------|:-----------------------------|
| mpg      | Miles/(US) gallon (response) |
| wt       | Weight (1000 lbs)            |
| hp       | Gross horsepower             |
| qsec     | 1/4 mile time                |

Warning - More advanced! Maybe we want to make clear which variable is the dependent variable and which are the independent variables? We can do this by grouping rows with pack_rows() from the kableExtra package.

library(kableExtra)

# Group the rows under "Dependent" and "Independent" headings
kbl(regression_vars, caption = "Variables Used in the Regression Model") %>%
  kable_paper("striped", full_width = FALSE) %>%
  pack_rows("Dependent Variable", 1, 1) %>%
  pack_rows("Independent Variables", 2, 4) %>%
  kable_styling(position = "center")

Table 3: Variables Used in the Regression Model

| Variable                  | Description                  |
|:--------------------------|:-----------------------------|
| **Dependent Variable**    |                              |
| mpg                       | Miles/(US) gallon (response) |
| **Independent Variables** |                              |
| wt                        | Weight (1000 lbs)            |
| hp                        | Gross horsepower             |
| qsec                      | 1/4 mile time                |
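
If kableExtra feels like too much for now, a simpler way to convey the same information (a minimal sketch, not the approach used above) is to add a column that states each variable's role and print it with plain kable(). The column name Role and its values are illustrative.

```r
library(knitr)

# Rebuild the data dictionary so that this block runs on its own
regression_vars <- data.frame(
  Variable = c("mpg", "wt", "hp", "qsec"),
  Description = c(
    "Miles/(US) gallon (response)",
    "Weight (1000 lbs)",
    "Gross horsepower",
    "1/4 mile time"
  ),
  stringsAsFactors = FALSE
)

# The first row is the response; the remaining rows are predictors
regression_vars$Role <- c("Dependent", rep("Independent", nrow(regression_vars) - 1))

kable(regression_vars, caption = "Variables Used in the Regression Model")
```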

Finally, whenever you include a figure or a table in a report, it is important that you reference it in the body of your text. If you don’t reference it, what was the point of including it in the first place?

To reference a table, you first need to label the R code chunk that creates it. You do this by adding a chunk name after the r inside the curly brackets {} that open the chunk. For example, for the table I just created to display the descriptions of the dependent and independent variables, I might change the chunk header to {r dataDictionaryTable}. The table also needs a caption (the caption argument of kable() or kbl()), because only captioned tables are numbered.

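Then, in the main body of my report, I refer to the table by writing \@ref(tab:dataDictionaryTable). When the report is knitted, this is replaced by the table's number and automatically links back to the table. For example, I might write “A data dictionary of the variables used in the regression model of this report is presented in Table \@ref(tab:dataDictionaryTable).”, which appears in the knitted report as “A data dictionary of the variables used in the regression model of this report is presented in Table 3.”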

For knitting to html:

In order for \@ref(tab:dataDictionaryTable) to link to the table, you need to update the YAML header (the lines between the two --- markers) at the top of the Rmd file. Replace output: html_document with output: bookdown::html_document2 (you will need to have installed the bookdown package beforehand).

For knitting to pdf:

In order for \@ref(tab:dataDictionaryTable) to link to the table, you need to update the YAML header (the lines between the two --- markers) at the top of the Rmd file. Replace output: pdf_document with output: bookdown::pdf_document2 (you will need to have installed the bookdown package beforehand).
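
To see all of these pieces in one place, the sketch below writes a minimal Rmd file from R and knits it to HTML. It is only an illustration: the file name, title, sentence wording and shortened data dictionary are made up for this example, and the render step is skipped unless rmarkdown, bookdown and pandoc are available.

```r
# Build the chunk fence from single backticks so it does not clash with this block
fence <- strrep("`", 3)

# Lines of a minimal Rmd file: YAML header, a reference, and a labelled chunk
rmd_lines <- c(
  "---",
  "title: \"Cross-reference example\"",
  "output: bookdown::html_document2",
  "---",
  "",
  "The variables are described in Table \\@ref(tab:dataDictionaryTable).",
  "",
  paste0(fence, "{r dataDictionaryTable, echo=FALSE}"),
  "regression_vars <- data.frame(",
  "  Variable = c(\"mpg\", \"wt\"),",
  "  Description = c(\"Miles/(US) gallon (response)\", \"Weight (1000 lbs)\")",
  ")",
  "knitr::kable(regression_vars, caption = \"Variables Used in the Regression Model\")",
  fence
)

# Write the file to a temporary directory (hypothetical file name)
rmd_path <- file.path(tempdir(), "cross_ref_example.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit only if everything needed is installed
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  requireNamespace("bookdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

For a PDF report, the only change needed in this sketch would be bookdown::pdf_document2 on the output line.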


There are lots of other things you can do! Feel free to search online for more examples! One resource that might help can be found here: https://andrewirwin.github.io/data-visualization/format-tables.html

AI Acknowledgment: ChatGPT was used to generate code in this document.