QUESTION: How do I add a correlation coefficient to my scatterplot when I use ggpubr/ggscatter?

To do this, we simply add one argument, cor.coef = TRUE, to the ggscatter() call.

Data

We’ll use the “palmerpenguins” package (https://allisonhorst.github.io/palmerpenguins/) to address this question. If you have not done so before, install the package with install.packages("palmerpenguins"). Then call library(palmerpenguins) and load the data with data(penguins).

#install.packages("palmerpenguins")
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.1.2
#install.packages("ggpubr")
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.1.2
## Loading required package: ggplot2

Adding a Correlation Coefficient to a Scatterplot Using ggpubr

This chunk just gets some information on the data so we know how to deal with it: printing penguins shows the first rows and column types, and is() lists the classes of the object.

data(penguins)

penguins
## # A tibble: 344 x 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ... with 334 more rows, and 2 more variables: sex <fct>, year <int>
is(penguins)
## [1] "tbl_df"     "tbl"        "data.frame" "list"       "oldClass"  
## [6] "vector"

First, use names() to check the columns in the penguins data set. For this example I will use bill_length_mm and bill_depth_mm, but you can use any two numeric columns.

names(penguins)
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
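If you are not sure which columns are numeric, a quick check like the one below can help. This is only a small sketch using base R; the variable name numeric_cols is made up for illustration. Note that year is stored as an integer, so it is flagged as numeric too, even though it is not a measurement you would usually correlate.

```r
library(palmerpenguins)

# Test each column with is.numeric() and keep the names that pass
# (integer columns such as year count as numeric here)
numeric_cols <- names(penguins)[sapply(penguins, is.numeric)]
numeric_cols
```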

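Before this step, make sure you have ggpubr installed and loaded! This sets up the basic scatterplot. Note that ggscatter() takes the column names as quoted strings. The warning appears because two rows have missing bill measurements and are left out of the plot.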

ggscatter(y = "bill_length_mm",
          x = "bill_depth_mm",
          data = penguins)
## Warning: Removed 2 rows containing missing values (geom_point).
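The warning above is harmless, but if you would rather not see it, one option is to drop the incomplete rows yourself before plotting. The sketch below assumes you only care about the two bill columns; bill_vars and penguins_complete are names invented for this example.

```r
library(palmerpenguins)
library(ggpubr)

# The two columns we are plotting
bill_vars <- c("bill_length_mm", "bill_depth_mm")

# Keep only the rows where both bill measurements are present
penguins_complete <- penguins[complete.cases(penguins[, bill_vars]), ]

# How many rows were dropped?
n_dropped <- nrow(penguins) - nrow(penguins_complete)
n_dropped

# Same basic scatterplot, now without the missing-value warning
ggscatter(y = "bill_length_mm",
          x = "bill_depth_mm",
          data = penguins_complete)
```

Filtering on just the plotted columns keeps rows that are only missing something else (for example sex), which na.omit() on the whole data set would throw away.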

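Now that the basic scatterplot is set up, we can add to it. To add a correlation coefficient, we insert cor.coef = TRUE. By default, ggscatter() reports the Pearson correlation (cor.method = "pearson") together with its p-value.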

ggscatter(y = "bill_length_mm",
          x = "bill_depth_mm",
          data = penguins,
          cor.coef = TRUE)
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
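If you want to double-check the number printed on the plot, or use a different type of correlation, something like the following should work. This is a sketch rather than the only way to do it: cor.test() is from base R, cor.method and cor.coeff.args are ggscatter() arguments, and the object names ct and p_spearman are made up here. The choice of Spearman is just an example, and R may warn about ties when computing its p-value.

```r
library(palmerpenguins)
library(ggpubr)

# Cross-check: compute the Pearson correlation directly.
# cor.test() uses only the pairs where both values are present.
ct <- cor.test(penguins$bill_length_mm,
               penguins$bill_depth_mm,
               method = "pearson")
ct$estimate  # the correlation coefficient
ct$p.value   # the p-value

# Same plot, but with a Spearman correlation, and with the
# coefficient and p-value printed on separate lines
p_spearman <- ggscatter(y = "bill_length_mm",
                        x = "bill_depth_mm",
                        data = penguins,
                        cor.coef = TRUE,
                        cor.method = "spearman",
                        cor.coeff.args = list(label.sep = "\n"))
p_spearman
```

Rounded to two decimal places, ct$estimate should match the R value shown on the Pearson version of the plot.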

Additional Reading

For more information on this topic, see

https://r-charts.com/correlation/scatter-plot-regression-line/

Keywords

palmerpenguins, ggpubr, ggscatter(), names(), is(), Building a scatter plot, Adding a correlation coefficient, scatter plot, correlation coefficient