Question: How to add a line of best fit to a scatterplot in ggpubr?

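A line of best fit is a straight line drawn through a scatter plot so that it lies as close as possible to the points overall. The usual choice, and the one used here, is the least-squares regression line, which minimizes the sum of squared vertical distances between the points and the line. The direction of the line shows the sign of the correlation between the two variables: an upward slope indicates a positive correlation and a downward slope a negative one.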
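
To make the idea concrete, here is a minimal base-R sketch of a least-squares line and its link to the correlation. It assumes the built-in cars dataset purely for illustration (it is not the data used in the rest of this example), and the names fit and r are introduced here.

```r
# Illustrative only: `cars` ships with base R and is not the tutorial's dataset.
fit <- lm(dist ~ speed, data = cars)   # ordinary least-squares fit
coef(fit)                              # intercept and slope of the fitted line

r <- cor(cars$speed, cars$dist)        # Pearson correlation of the two variables

# The slope of the least-squares line has the same sign as the correlation
sign(coef(fit)[["speed"]]) == sign(r)
```

The slope equals the correlation multiplied by sd(y) / sd(x), which is why the two always agree in sign.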

We’ll use the penguins dataset from the “palmerpenguins” package for this example.

Data Preparation

library(ggpubr)
## Loading required package: ggplot2
library(palmerpenguins)

data(penguins)

Making a Scatterplot

We make scatterplots in ggpubr using the ggscatter() function. It takes the data frame through the data argument, and the x and y variables are given as quoted column names.

ggscatter(y = "flipper_length_mm",
          x = "bill_length_mm", 
          data = penguins)
## Warning: Removed 2 rows containing missing values (geom_point).
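
The warning above means that two rows have a missing value in at least one of the plotted columns, so ggplot2 drops them silently. If you would rather filter them out yourself, the following is one possible sketch. The names plot_cols and penguins_complete are hypothetical and introduced only for this illustration.

```r
library(ggpubr)
library(palmerpenguins)

data(penguins)

# Columns used in the plot
plot_cols <- c("bill_length_mm", "flipper_length_mm")

# Count rows with a missing value in either plotted column
sum(!complete.cases(penguins[, plot_cols]))

# Keep only the complete rows (`penguins_complete` is a hypothetical name)
penguins_complete <- penguins[complete.cases(penguins[, plot_cols]), ]

# Same scatterplot as before, now built from the filtered data
ggscatter(y = "flipper_length_mm",
          x = "bill_length_mm",
          data = penguins_complete)
```

Filtering first keeps the plot unchanged but makes it explicit which observations were excluded.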

Adding the Line of Best Fit

Setting add = "reg.line" tells ggscatter() to fit a linear regression of y on x and draw the fitted line over the points.

ggscatter(y = "flipper_length_mm",
          x = "bill_length_mm", 
          add = "reg.line",      # line of best fit (linear regression)
          data = penguins)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
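
Beyond the bare line, ggscatter() has arguments for styling it and annotating the plot. The sketch below is one possible combination, not the only way to do it: add.params styles the line, conf.int adds a confidence band, and cor.coef prints the correlation coefficient. The colours and the object name p are arbitrary choices made here.

```r
library(ggpubr)
library(palmerpenguins)

data(penguins)

p <- ggscatter(data = penguins,
               x = "bill_length_mm",
               y = "flipper_length_mm",
               add = "reg.line",                  # fitted linear regression line
               add.params = list(color = "blue",  # colour of the line (arbitrary choice)
                                 fill = "lightgray"),  # fill of the confidence band
               conf.int = TRUE,                   # shade the confidence interval around the line
               cor.coef = TRUE,                   # annotate with the correlation coefficient and p-value
               cor.method = "pearson")            # correlation method used for the annotation
p
```

Storing the plot in an object before printing makes it easy to add further ggplot2 layers or save it later.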

Additional Reading

https://www.statmethods.net/graphs/scatterplot.html

Keywords

  1. ggscatter()
  2. scatter plot
  3. line of best fit
  4. linear regression
  5. slope
  6. correlation