DSLabs

Author

Zijin Wang

Load necessary libraries

library(dslabs)
library(tidyverse)
library(ggthemes)
library(ggrepel)

Load the olive dataset

data(olive)
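Before plotting, a quick check can confirm that the columns used in the scatterplot are present and show how many regions the color scale has to cover. This is a minimal sketch and not part of the original analysis. `plot_vars` is just a local name chosen for illustration.

```r
library(dslabs)

data(olive)

# Columns mapped in the scatterplot: color, x, y and point size
plot_vars <- c("region", "oleic", "linoleic", "eicosenoic")
str(olive[, plot_vars])

# Samples per region; scale_color_brewer() assigns one color to each region
table(olive$region)
```

The "Paired" Brewer palette supports at most 12 colors, so the number of regions should stay within that limit.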

Create the scatterplot

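ggplot(data = olive, aes(x = oleic, y = linoleic, color = region)) +
  # One point per oil sample; point size encodes the eicosenoic acid level
  geom_point(aes(size = eicosenoic), alpha = 0.6) +
  # Fixing color here overrides the region mapping, so a single overall
  # least-squares line is drawn; formula = y ~ x silences the fitting message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "darkgrey", linetype = "dashed") +
  labs(
    x = "Oleic Acid Level",
    y = "Linoleic Acid Level",
    title = "Relationship between Oleic and Linoleic Acid in Italian Olive Oils",
    subtitle = "Data from the olive dataset in the dslabs package",
    caption = "Source: dslabs"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Paired", name = "Italian Region") +
  # Place the legends below the plot and give the size legend a readable title
  theme(legend.position = "bottom") +
  guides(size = guide_legend(title = "Eicosenoic Acid Level"))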

For this graph, I chose the olive dataset from the dslabs package, which records the fatty acid composition of olive oil samples from Italy. The plot shows how the level of oleic acid (x-axis) relates to the level of linoleic acid (y-axis). Each point is one olive oil sample. Its color indicates the Italian region the sample comes from, and its size reflects the sample's eicosenoic acid level. The dashed line is a single linear regression fitted to all samples together, not separately by region, and it summarizes the overall direction of the relationship between oleic and linoleic acid.
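The dashed line can also be expressed numerically. The sketch below assumes, as in the plot code, that the line is one ordinary least-squares fit of linoleic on oleic over every sample. It refits that model with `lm()` to recover the line's intercept and slope, and it reports the Pearson correlation as a one-number summary of the same trend. The names `fit` and `r` are illustrative.

```r
library(dslabs)

data(olive)

# Same model geom_smooth(method = "lm", formula = y ~ x) fits when the
# line is not split by region: linoleic regressed on oleic, all samples
fit <- lm(linoleic ~ oleic, data = olive)
coef(fit)  # intercept and slope of the dashed line

# Pearson correlation between the two fatty acids
r <- cor(olive$oleic, olive$linoleic)
r
```

The slope and the correlation always share the same sign. The slope is in the units of the data, while the correlation is unitless and bounded between -1 and 1.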