How do I plot a line of best fit on a scatterplot when the data are in separate vectors?

If the data are stored in separate vectors, we can plot one against the other as a scatterplot, fit a linear model with lm(), and then draw the fitted line on the plot with abline().

Data

We’ll use the palmerpenguins package (https://allisonhorst.github.io/palmerpenguins/) to address this question. You’ll need to install the package with install.packages("palmerpenguins") if you have not done so before, load it with library(palmerpenguins), and load the data with data(penguins).

#install.packages("palmerpenguins")
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.1.2
data(penguins)

To set things up, we first pull two columns out of the data frame so that each one is stored as its own vector.

X <- penguins$bill_length_mm
Y <- penguins$bill_depth_mm

How to add a line of best fit when making a scatterplot of two separate vectors

Here we store the two penguins columns under descriptive names. X and Y are already numeric vectors, so c() leaves them unchanged; is() confirms the type.

#Bill Length
billlength_vector<- c(X)

is(billlength_vector)
## [1] "numeric" "vector"
#Bill depth
billdepth_vector<- c(Y)

is(billdepth_vector)
## [1] "numeric" "vector"
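Before plotting, it can be worth confirming that the two vectors line up: they should have the same length, and it helps to know how many values are missing, since lm() drops incomplete pairs under its default settings. The following is a minimal sketch of such a check; the names bill_len, bill_dep, and n_complete are illustrative and not part of the original walkthrough.

```r
library(palmerpenguins)

# Illustrative names; these are the same columns as X and Y above
bill_len <- penguins$bill_length_mm
bill_dep <- penguins$bill_depth_mm

# Paired vectors must have the same length to be plotted against each other
length(bill_len) == length(bill_dep)

# Count the missing values in each vector
sum(is.na(bill_len))
sum(is.na(bill_dep))

# Number of complete (x, y) pairs available for the plot and the model
n_complete <- sum(complete.cases(bill_len, bill_dep))
n_complete
```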

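Plot the vectors using the plot() function. In the formula y ~ x, the variable to the left of the tilde goes on the y axis and the variable to the right goes on the x axis, so bill length is on the y axis here. Add axis labels using xlab = and ylab =

#Plot data
plot(billlength_vector ~ billdepth_vector, 
     xlab = "Bill Depth (mm)", 
     ylab = "Bill Length (mm)")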

The lm() function fits a linear model and prints the intercept and slope of the line of best fit. You can then pass these values to abline() as a (the intercept) and b (the slope) to draw the line on the plot.

#Fit the line of best fit (the vectors are in the workspace, so no data = argument is needed)
lm(billlength_vector ~ billdepth_vector)
## 
## Call:
## lm(formula = billlength_vector ~ billdepth_vector)
## 
## Coefficients:
##      (Intercept)  billdepth_vector  
##          55.0674           -0.6498
#Redraw the scatterplot, then add the line using the intercept (a) and slope (b)
plot(billlength_vector ~ billdepth_vector, 
     xlab = "Bill Depth (mm)", 
     ylab = "Bill Length (mm)")

abline(a = 55.0674, b = -0.6498)
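Typing the coefficients by hand works, but the printed values are rounded and must be retyped whenever the data change. As an alternative sketch, the fitted model can be saved and handed straight to abline(), which reads the intercept and slope from it. The object name bill_fit is an assumption made for this example.

```r
library(palmerpenguins)

billlength_vector <- penguins$bill_length_mm
billdepth_vector  <- penguins$bill_depth_mm

# Save the fitted model instead of only printing it
bill_fit <- lm(billlength_vector ~ billdepth_vector)

# coef() returns the intercept and slope as a named numeric vector
coef(bill_fit)

# Bill depth is the predictor (x axis), bill length the response (y axis)
plot(billlength_vector ~ billdepth_vector,
     xlab = "Bill Depth (mm)",
     ylab = "Bill Length (mm)")

# abline() accepts a fitted lm object directly
abline(bill_fit)
```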

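We can also do this with the ggpubr package, which is built on ggplot2. Note that ggscatter() takes a data frame and column names rather than separate vectors, so we pass penguins directly.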

#First load ggpubr and ggplot2
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)

#Use ggscatter to plot
ggscatter(data = penguins, 
          y = "bill_length_mm",
          x = "bill_depth_mm",
          xlab = "bill depth (mm)",
          ylab = "bill length (mm)"
          )
## Warning: Removed 2 rows containing missing values (geom_point).

#We can then add in the line of best fit as well as display the correlation coefficient 
ggscatter(data = penguins, 
          y = "bill_length_mm",
          x = "bill_depth_mm",
          xlab = "bill depth (mm)",
          ylab = "bill length (mm)",
          add = "reg.line", 
          cor.coef = TRUE
          )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).
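ggscatter() works from a data frame, so if the data really are in separate vectors they first need to be combined. One possible approach, sketched here with plain ggplot2, is to bind the vectors into a data frame and let geom_smooth(method = "lm") draw the fitted line. The names bill_df, depth, and length are made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

billlength_vector <- penguins$bill_length_mm
billdepth_vector  <- penguins$bill_depth_mm

# Combine the separate vectors into one data frame and drop incomplete rows
bill_df <- data.frame(depth = billdepth_vector, length = billlength_vector)
bill_df <- na.omit(bill_df)

# Scatterplot with a least-squares line; se = FALSE hides the confidence band
p <- ggplot(bill_df, aes(x = depth, y = length)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)")

print(p)
```

Dropping the incomplete rows up front avoids the missing-value warnings; removing se = FALSE shows the confidence band around the line.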

Additional Reading

For more information on this topic, see https://rpubs.com/lowbrowR/palmerpenguinsplot

Keywords

palmerpenguins, data(), plot(), xlab, ylab, lm(), abline(), ggpubr, ggplot2, ggscatter(), add = "reg.line"