What are the key differences between making a scatterplot with the regular plot() function and the ggscatter() function from ggpubr?

Install (if needed) and load the necessary packages.

# install.packages("palmerpenguins")
# data(penguins)
library(ggpubr)
## Loading required package: ggplot2
## Loading required package: magrittr

Use the “palmerpenguins” data to create a data frame with four variables: bill length, bill depth, body mass, and sex. (I needed to create the vectors manually due to issues loading this package.)

bill_length_mm <- c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0)
bill_depth_mm <- c(18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2)
body_mass_g <- c(3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250)
# Note: "NA" below is a character string, not a true missing value (NA)
sex <- c("male","female","female","NA","female","male","female","male","NA","NA")
df <- data.frame(bill_length_mm, bill_depth_mm, body_mass_g, sex)
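If the package does load on your machine, the same kind of data frame could probably be pulled from it directly instead of typing the vectors by hand. This is only a sketch: it assumes palmerpenguins is installed, that the vectors above correspond to the first ten rows of its penguins data, and the name df_pkg is made up for illustration. Note that sex in the package data is a factor with true missing values rather than "NA" strings.

```r
# Hypothetical alternative to the manual vectors; df_pkg is an illustrative name.
df_pkg <- NULL  # stays NULL if the package is not installed

if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  cols <- c("bill_length_mm", "bill_depth_mm", "body_mass_g", "sex")
  # Keep the four columns used in this post and only the first ten rows
  df_pkg <- head(as.data.frame(palmerpenguins::penguins)[, cols], 10)
}
```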

First, we will use the regular plot() function to create a scatter plot, with bill length along the x-axis and bill depth along the y-axis.

x <- bill_length_mm
y <- bill_depth_mm

plot(y~x,
     main = "bill length vs bill depth, plot() function",
     xlab = "bill length (mm)",
     ylab = "bill depth (mm)")

Now, we will use the ggpubr function ggscatter().

ggscatter(x = "bill_length_mm",
          y = "bill_depth_mm",
          data = df,
          xlab = "bill length (mm)",
          ylab = "bill depth (mm)",
          title = "bill length vs bill depth, ggscatter() function")
## Warning: Removed 1 rows containing missing values (geom_point).
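The warning is most likely due to the one penguin whose measurements are missing. A quick way to check this, sketched here with base R only, is to count the rows where either plotted variable is NA. The names incomplete, n_incomplete and rows_incomplete are made up for this example.

```r
# Rebuild the two plotted columns so this check runs on its own
bill_length_mm <- c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0)
bill_depth_mm <- c(18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2)
df <- data.frame(bill_length_mm, bill_depth_mm)

# TRUE for rows where at least one of the two variables is missing
incomplete <- !complete.cases(df)

n_incomplete <- sum(incomplete)       # how many rows cannot be plotted
rows_incomplete <- which(incomplete)  # which rows they are

n_incomplete     # 1
rows_incomplete  # 4
```

The plot() function skips the same row, but it does so without a warning.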

The ggscatter() function from ggpubr allows for some interesting multi-variable comparisons. This is part of what differentiates it from the standard R plot() function. For example, we will size the points based on the body mass. We can also label them by sex.

ggscatter(x = "bill_length_mm",
          y = "bill_depth_mm",
          data = df,
          xlab = "bill length (mm)",
          ylab = "bill depth (mm)",
          size = "body_mass_g",
          label = "sex",
          title = "bill length vs bill depth, ggscatter() function")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_text).

The outputs show key differences. The ggscatter() function takes column names as strings together with a data argument, and its size and label arguments can scale points by another variable and label individual points. The general plot() function has no equivalent single arguments; similar effects take extra steps, such as passing a vector to cex and adding labels with a separate text() call.
By default, points are filled in when using ggscatter() but are drawn as open circles with the plot() function.
The ggscatter() function uses the argument “title =” to name the graph, while plot() uses “main =” for the name.
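
For comparison, here is a rough sketch of how the sized and labelled plot might be approximated in base R. It is not what ggscatter() does internally. The 1 to 3 size range and the names ok, d, mass_range and point_cex are arbitrary choices made for this example.

```r
# Same ten penguins as above, rebuilt so the sketch runs on its own
bill_length_mm <- c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0)
bill_depth_mm <- c(18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2)
body_mass_g <- c(3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250)
sex <- c("male","female","female","NA","female","male","female","male","NA","NA")
df <- data.frame(bill_length_mm, bill_depth_mm, body_mass_g, sex)

# Drop the row with missing measurements before computing point sizes
ok <- complete.cases(df[, c("bill_length_mm", "bill_depth_mm", "body_mass_g")])
d <- df[ok, ]

# Rescale body mass to a size multiplier between 1 and 3 (arbitrary range)
mass_range <- range(d$body_mass_g)
point_cex <- 1 + 2 * (d$body_mass_g - mass_range[1]) / diff(mass_range)

# pch = 19 gives filled points; cex takes one size per point
plot(d$bill_length_mm, d$bill_depth_mm,
     pch = 19, cex = point_cex,
     main = "bill length vs bill depth, plot() function",
     xlab = "bill length (mm)",
     ylab = "bill depth (mm)")

# Labels are added in a second step; pos = 3 places them above each point
text(d$bill_length_mm, d$bill_depth_mm, labels = d$sex, pos = 3)
```

This takes noticeably more code than the ggscatter() call, which is the practical difference described above.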

Key words:

  • scatter plot
  • plot()
  • ggpubr
  • ggscatter()
  • dataframe