Here I will demonstrate how to color code a scatterplot using either a categorical or a continuous variable.
We’ll use the palmerpenguins package (https://allisonhorst.github.io/palmerpenguins/) for the examples. If you have not installed the package before, install it with install.packages("palmerpenguins"). Then call library(palmerpenguins) and load the data with data(penguins).
#install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)
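Before plotting, it can help to confirm which columns are categorical and which are continuous. The following is a minimal sketch, assuming palmerpenguins is installed; the choice of columns to inspect is mine and simply matches the ones used in this tutorial.

```r
library(palmerpenguins)
data(penguins)

# sex is stored as a factor (categorical); body_mass_g is numeric (continuous)
class(penguins$sex)
is.numeric(penguins$body_mass_g)

# Count the missing values in each column used in this tutorial
tutorial_cols <- c("bill_length_mm", "bill_depth_mm", "sex", "body_mass_g")
colSums(is.na(penguins[, tutorial_cols]))
```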
Next, you need to install and load the ggpubr package. I commented out the first line in this code chunk since I’ve already installed ggpubr.
#install.packages("ggpubr")
library(ggpubr)
## Loading required package: ggplot2
The main way to color code a scatterplot with ggpubr is through the color argument of the ggscatter() function.
A categorical variable is a qualitative variable that takes one of a limited number of distinct categories. Here, I am color coding by sex, where the possible values are male, female, or NA (missing). The x and y arguments of ggscatter() name the columns to plot on the horizontal and vertical axes. The color argument names the variable that determines each point's color. Lastly, the data argument names the data frame that contains all of these columns.
ggscatter(y = "bill_depth_mm",
          x = "bill_length_mm",
          color = "sex",
          data = penguins)
## Warning: Removed 2 rows containing missing values (geom_point).
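The warning appears because two penguins have no bill measurements, so they cannot be drawn. Penguins with no recorded sex are still plotted, in their own NA color. If you would rather show only males and females, one option is to drop the rows with a missing sex first and pick the two colors yourself with the palette argument of ggscatter(). This is a sketch, not the only approach; the name penguins_sexed and the two colors are my own arbitrary choices.

```r
library(palmerpenguins)
library(ggpubr)
data(penguins)

# Keep only the penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

# palette assigns one color per category, in factor-level order (female, male)
p_sex <- ggscatter(data = penguins_sexed,
                   x = "bill_length_mm",
                   y = "bill_depth_mm",
                   color = "sex",
                   palette = c("darkorange", "purple"))
p_sex
```

In this dataset, the rows without bill measurements also lack a recorded sex, so filtering on sex should remove the warning as well.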
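This code chunk looks the same as the previous one, except that the variable used for color is body_mass_g instead of sex. body_mass_g is a continuous variable, meaning it can take any numeric value within a range rather than one of a few fixed categories. Because of this, the points are colored along a gradient instead of with one distinct color per category.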
ggscatter(y = "bill_depth_mm",
          x = "bill_length_mm",
          color = "body_mass_g",
          data = penguins)
## Warning: Removed 2 rows containing missing values (geom_point).
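If the default gradient is hard to read, ggpubr provides gradient_color(), which can be added to the plot to set the colors of a continuous scale. The sketch below assumes you want a light-to-dark blue scale; the two color names and the object name p_mass are my own choices, not something the examples above require.

```r
library(palmerpenguins)
library(ggpubr)
data(penguins)

# Same scatterplot as above, stored in an object so it can be modified
p_mass <- ggscatter(data = penguins,
                    x = "bill_length_mm",
                    y = "bill_depth_mm",
                    color = "body_mass_g")

# Replace the default gradient: low body mass is light blue, high is dark blue
p_mass_blue <- p_mass + gradient_color(c("lightblue", "darkblue"))
p_mass_blue
```

Printing p_mass_blue will still give the missing-values warning, since the two penguins without bill measurements remain in the data.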
For more information on ggscatter() and its arguments, see https://rpkgs.datanovia.com/ggpubr/reference/ggscatter.html