dslabs homework

Author

Bryce Williams

library(dslabs)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data(olive)
olive_plot <- olive |>
  # Start the ggplot object and map the aesthetics
  ggplot(aes(x = oleic, y = linoleic, color = region)) +
  # Scatterplot; partial transparency keeps overlapping points visible
  geom_point(size = 3, alpha = 0.7) +
  # Add a title, axis labels, and a color legend title
  labs(
    title = "Oleic vs. Linoleic Acid Composition in Italian Olive Oil by Region",
    x = "Oleic Acid Percentage (Monounsaturated)",
    y = "Linoleic Acid Percentage (Polyunsaturated)",
    color = "Geographical Region"
  ) +
  # Switch to a minimal base theme
  theme_minimal() +
  # Customize individual theme elements for readability
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), # Center and bold the title
    axis.title = element_text(size = 12),
    legend.position = "right",
    panel.grid.minor = element_blank() # Remove minor gridlines for a cleaner look
  ) +
  # Color the regions with a ColorBrewer qualitative palette
  scale_color_brewer(palette = "Pastel2")

# Print the plot
olive_plot

I used the olive dataset from the dslabs package, which contains the percentage composition of eight fatty acids in 572 Italian olive oils, along with each oil's region and specific area of origin. My goal was to create a visualization showing what I learned from the dataset, since I knew nothing about olives before I started. I created a scatterplot to explore the relationship between two major fatty acids that are largely inversely correlated, oleic acid and linoleic acid, with the data points colored by geographical region (Northern Italy, Sardinia, Southern Italy).
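The dataset facts mentioned above (sample count, regions, and the inverse correlation) can be confirmed with a quick look at the data. This is a minimal sketch, assuming dslabs and dplyr are installed; the names `region_counts` and `oleic_linoleic_cor` are only illustrative and are not part of the original analysis.

```r
library(dslabs)
library(dplyr)

data(olive)

# Dimensions: one row per oil sample; columns are region, area, and the fatty acids
dim(olive)

# Number of samples in each region
region_counts <- olive |>
  count(region)
region_counts

# Pearson correlation between the two plotted fatty acids;
# a negative value indicates an inverse relationship
oleic_linoleic_cor <- cor(olive$oleic, olive$linoleic)
oleic_linoleic_cor
```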

Graph Description and Insights: The plot shows the inverse relationship between oleic acid (x-axis) and linoleic acid (y-axis), with points colored by the region variable. With these axes, an inverse relationship runs from the top-left (low oleic, high linoleic) to the bottom-right (high oleic, low linoleic). The Northern Italy points (green) cluster toward the bottom-right, with high oleic and low linoleic values. The Sardinia points (orange) form a compact cluster at higher linoleic and more moderate oleic values. The Southern Italy points (blue) span the widest range along the trend. Together, these patterns suggest that the fatty acid profile of an olive oil reflects its geographical origin, although a plot of only these two acids does not fully separate all three regions.
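The cluster positions described above can be checked numerically rather than by eye. The following is a rough sketch, assuming dslabs and dplyr are installed; `region_summary` and its column names are hypothetical names chosen for this example. It summarizes the center and range of each region on both axes.

```r
library(dslabs)
library(dplyr)

data(olive)

# Per-region center and range for the two plotted fatty acids
region_summary <- olive |>
  group_by(region) |>
  summarise(
    n = n(),
    mean_oleic = mean(oleic),
    min_oleic = min(oleic),
    max_oleic = max(oleic),
    mean_linoleic = mean(linoleic),
    min_linoleic = min(linoleic),
    max_linoleic = max(linoleic)
  )

region_summary
```

Comparing the mean columns shows where each cluster is centered, and overlapping min/max ranges indicate regions that these two acids alone cannot separate.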