Part 1: Scatter Plots (Using diamonds)

The diamonds dataset contains information about ~54,000 diamonds, including price, carat, cut, clarity, and color.

Task 1: Basic Scatter Plot

Create a scatter plot with:

  • x-axis: carat

  • y-axis: price

library(ggplot2)
library(tidyr)
data(diamonds)

ggplot(data = diamonds, aes(x = carat, y = price))+
  geom_point(color = "blue", size = 1)+
  labs(title = "Scatter Plot of Diamond Carat vs. Price",
       y= "Price",
       x= "Carat")

Question: What type of relationship appears between carat and price?

There is a strong positive relationship: price increases as carat increases, and the spread of prices widens for larger diamonds.
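To put a number on the relationship seen in the plot, a quick sketch using base R's cor() is shown below. The names r_pearson and r_spearman are illustrative and not part of the assignment.

```r
library(ggplot2)  # provides the diamonds dataset

# Pearson correlation measures linear association
r_pearson <- cor(diamonds$carat, diamonds$price)

# Spearman correlation is rank-based, so it measures monotonic association
# and is less sensitive to curvature in the relationship
r_spearman <- cor(diamonds$carat, diamonds$price, method = "spearman")

round(c(pearson = r_pearson, spearman = r_spearman), 2)
```

Values close to 1 would support the reading of a strong positive relationship.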

Task 2: Add Aesthetic Mappings

Modify your plot:

  • Color points by cut

  • Add a meaningful title and axis labels

  • Apply theme_minimal()

ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
  geom_point(size = 1)+
  labs(title = "Scatter Plot of Diamond Carat vs. Price",
       y= "Price",
       x= "Carat")+
  theme_minimal()

Question: Which cut appears to have higher prices at similar carat values?

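Ideal. At similar carat values, the Ideal-cut points tend to sit higher on the price axis than the other cuts.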
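Colors are hard to compare by eye when points overlap, so one possible check is to compare median prices within a narrow carat window. This is only a sketch: the 0.95 to 1.05 carat window and the names near_one and median_by_cut are arbitrary choices made for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Keep only diamonds close to 1 carat so cuts are compared at similar size.
# The window below is an illustrative assumption, not part of the assignment.
near_one <- subset(diamonds, carat >= 0.95 & carat <= 1.05)

# Median price within each cut level
median_by_cut <- tapply(near_one$price, near_one$cut, median)
median_by_cut
```

Repeating this for other carat windows would show whether the ranking of cuts is stable.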

Task 3: Add a Trend Line

Add a regression line:

  • geom_smooth(method = "lm")

ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
  geom_point(size = 1)+
  geom_smooth(method="lm")+
  labs(title = "Scatter Plot of Diamond Carat vs. Price",
       y= "Price",
       x= "Carat")+
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Question: Does the relationship between carat and price appear linear?

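Only approximately. Price rises with carat, but the points curve upward faster than a straight line at low carat values, and the fitted lines overshoot the observed prices for the largest diamonds.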
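One way to probe linearity beyond the plot is to compare a straight-line fit on the original scale with one on the log-log scale. The sketch below uses base R's lm(); the object names are illustrative, and the two R-squared values are computed on different response scales, so the comparison is indicative rather than formal.

```r
library(ggplot2)  # provides the diamonds dataset

# Straight-line fit on the original scale
fit_linear <- lm(price ~ carat, data = diamonds)

# Straight-line fit after log-transforming both variables
fit_loglog <- lm(log(price) ~ log(carat), data = diamonds)

# R-squared of each fit (each on its own response scale)
r2_linear <- summary(fit_linear)$r.squared
r2_loglog <- summary(fit_loglog)$r.squared
round(c(linear = r2_linear, loglog = r2_loglog), 3)
```

A noticeably higher value for the log-log fit would suggest that the raw relationship is curved rather than strictly linear.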

Question: What does the “lm” option do in the geom_smooth command? What are the other options and what do they do?

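It fits a linear regression model and draws the fitted line, with a confidence band by default. Other methods include "loess" (local polynomial regression, the default for fewer than 1,000 observations), "gam" (generalized additive model, the default for larger datasets), "glm" (generalized linear model), and MASS::rlm (robust linear model, which requires the MASS package).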
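To see how two of these methods differ in practice, the sketch below overlays an "lm" line and a "loess" curve on the same points. It assumes a random sample of 1,000 rows (an arbitrary size) because loess is slow on the full dataset; diamonds_sample and p_methods are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

set.seed(1)  # make the random sample reproducible
# Draw 1,000 rows without replacement (sample size is an assumption)
diamonds_sample <- diamonds[sample(nrow(diamonds), 1000), ]

p_methods <- ggplot(diamonds_sample, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, size = 1) +
  # straight line from ordinary least squares
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = FALSE) +
  # flexible local regression curve
  geom_smooth(method = "loess", formula = y ~ x, color = "darkgreen", se = FALSE) +
  labs(title = "Linear vs. Loess Fit on a Sample of Diamonds",
       x = "Carat",
       y = "Price") +
  theme_minimal()
p_methods
```

Where the green curve bends away from the red line, a straight line is a poor summary of the data.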

Task 4: Improve Visualization

Because the dataset is large, reduce overplotting by:

  • Adjusting alpha

  • Changing point size

  • Trying geom_jitter()

ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
  geom_jitter(alpha = 0.5, size = 1)+
  geom_smooth(method="lm", color ="red")+
  labs(title = "Scatter Plot of Diamond Carat vs. Price",
       y= "Price",
       x= "Carat")+
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Question: Why is overplotting a concern with large datasets?

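When many points are drawn on top of each other, they merge into solid blobs. This hides how many observations are in each region, so the density and distribution of the data become hard to read.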
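A rough way to gauge how much overlap to expect is to count points that share exactly the same coordinates. The sketch below does this with base R; the object names are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

n_points <- nrow(diamonds)

# Rows whose (carat, price) pair already appeared earlier in the data;
# each of these lands exactly on top of another point in the scatter plot
n_exact_overlaps <- sum(duplicated(diamonds[, c("carat", "price")]))

# Carat is recorded to two decimals, so there are few distinct x positions
n_distinct_carat <- length(unique(diamonds$carat))

c(points = n_points,
  exact_overlaps = n_exact_overlaps,
  distinct_carat = n_distinct_carat)
```

Exact duplicates are only a lower bound on overplotting, since points also overlap whenever they are closer together than the drawn point size.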

Question: What does the alpha command do and how does it help with overplotting?

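Alpha sets the transparency of the points (0 is fully transparent, 1 is fully opaque). With partly transparent points, regions where many points overlap appear darker, so the density of the data becomes visible.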

Question: Based on what you see, what are the risks and benefits of using geom_jitter?

Jitter adds a small random offset to each point so that overlapping points become visible. The risk is that the plotted positions no longer match the exact data values, which makes individual values harder to read.
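The amount of distortion from jittering can be limited by setting the jitter size explicitly. The sketch below uses position_jitter() with a small horizontal shift, no vertical shift, and a fixed seed so the plot is reproducible. The specific values (0.005 carat, seed 42) are assumptions chosen for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

p_jitter <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  # width and height cap the random shift in data units;
  # height = 0 leaves the prices exactly where they are
  geom_point(position = position_jitter(width = 0.005, height = 0, seed = 42),
             alpha = 0.3, size = 0.8) +
  labs(title = "Diamond Carat vs. Price with Controlled Jitter",
       x = "Carat",
       y = "Price") +
  theme_minimal()
p_jitter
```

Keeping the shift smaller than the spacing between recorded carat values limits how far a point can drift toward a neighboring value.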

Task 5: Challenge Scatter Plot

Create a scatter plot:

  • table vs price

  • Points colored by clarity

  • Facet by cut (we learn a lot more about this later, but just give it a try!)

ggplot(data = diamonds, aes(x = table, y = price, color = clarity))+
  geom_point(size = 1)+
  facet_grid(~cut)+
  labs(title = "Scatter Plot of Diamond Table vs. Price",
       y= "Price",
       x= "Table")+
  theme_minimal()

Question: Does the relationship differ by cut?

The Fair cut panel shows lower prices, but the relationship does not differ much across the other cuts.
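To complement the visual comparison across facets, the table-price correlation can be computed separately for each cut. This is a minimal sketch with base R; cor_by_cut is an illustrative name.

```r
library(ggplot2)  # provides the diamonds dataset

# Split the data by cut, then compute the table-price correlation per group
cor_by_cut <- sapply(split(diamonds, diamonds$cut),
                     function(d) cor(d$table, d$price))
round(cor_by_cut, 2)
```

Similar values across cuts would agree with the impression that the relationship changes little from panel to panel.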

Part 2: Line Plots (Using economics Dataset)

The economics dataset contains monthly US economic data over time.

Task 6: Basic Line Plot

Create a line plot:

  • x-axis: date

  • y-axis: unemploy

data("economics")

ggplot(data = economics, aes(x = date, y = unemploy))+
  geom_line()+
  theme_minimal()

Question: Describe the overall trend over time.

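The number of unemployed people (unemploy, recorded in thousands) generally increases over time, with sharp cyclical peaks along the way. Note that unemploy is a count, not a rate.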
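Because unemploy is a count, part of its growth may simply reflect population growth. The sketch below scales it by pop (also in thousands) to get the unemployed share of the total population. This is not the official unemployment rate, which divides by the labor force; econ_share and unemploy_share are illustrative names.

```r
library(ggplot2)  # provides the economics dataset

# Work on a copy so the original dataset is left untouched
econ_share <- economics

# Both columns are in thousands, so the ratio is a share of the population.
# Assumption: total population stands in for the labor force here.
econ_share$unemploy_share <- 100 * econ_share$unemploy / econ_share$pop

p_share <- ggplot(econ_share, aes(x = date, y = unemploy_share)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (% of total population)") +
  theme_minimal()
p_share
```

Comparing this plot with the raw count shows how much of the upward trend remains after accounting for population size.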

Task 7: Multiple Lines on One Plot

Reshape the data using pivot_longer() to plot:

  • uempmed

  • psavert

Then create a multi-line plot with:

  • color = variable

# Reshape to long format: one row per date and variable
econ_long <- pivot_longer(
  data = economics,
  cols = c(uempmed, psavert),
  values_to = "uem_psa",
  names_to = "up_names"
)
print(econ_long)
## # A tibble: 1,148 × 6
##    date         pce    pop unemploy up_names uem_psa
##    <date>     <dbl>  <dbl>    <dbl> <chr>      <dbl>
##  1 1967-07-01  507. 198712     2944 uempmed      4.5
##  2 1967-07-01  507. 198712     2944 psavert     12.6
##  3 1967-08-01  510. 198911     2945 uempmed      4.7
##  4 1967-08-01  510. 198911     2945 psavert     12.6
##  5 1967-09-01  516. 199113     2958 uempmed      4.6
##  6 1967-09-01  516. 199113     2958 psavert     11.9
##  7 1967-10-01  512. 199311     3143 uempmed      4.9
##  8 1967-10-01  512. 199311     3143 psavert     12.9
##  9 1967-11-01  517. 199498     3066 uempmed      4.7
## 10 1967-11-01  517. 199498     3066 psavert     12.8
## # ℹ 1,138 more rows
ggplot(data = econ_long, aes(x = date, y = uem_psa, color = up_names))+
  geom_line()+
  theme_minimal()

Question: Do these variables appear to move together over time?

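Not closely. Over the full period the personal savings rate (psavert) drifts downward while the median duration of unemployment (uempmed) drifts upward, with a sharp rise near the end of the series.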
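Whether two series "move together" can also be checked numerically. The sketch below computes the correlation of the two columns in levels and of their month-to-month changes, using the original wide economics data. The object names are illustrative.

```r
library(ggplot2)  # provides the economics dataset

# Correlation of the raw levels (dominated by long-run trends)
r_levels <- cor(economics$uempmed, economics$psavert)

# Correlation of month-to-month changes (short-run co-movement)
r_changes <- cor(diff(economics$uempmed), diff(economics$psavert))

round(c(levels = r_levels, changes = r_changes), 2)
```

A positive value indicates the series tend to rise and fall together, a negative value indicates they tend to move in opposite directions, and a value near zero indicates little linear co-movement.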

Task 8: Customize Your Line Plot

Enhance your plot by:

  • Changing line width

  • Customizing colors

  • Formatting the date axis

  • Adding title, subtitle, and caption

  • Applying a theme (theme_bw() or theme_classic())

ggplot(data = econ_long, aes(x = date, y = uem_psa, color = up_names))+
  # linewidth replaces the size argument for lines (ggplot2 3.4.0 and later)
  geom_line(linewidth = 0.4)+
  scale_color_manual(
    values = c("maroon", "navy"),
    labels = c("Personal Savings Rate", "Median Duration of Unemployment"))+
  labs(title = "Personal Savings and Duration of Unemployment by Year",
       subtitle = "A graph showing the relationship between personal savings rates and unemployment duration from the 1960s to the 2010s",
       caption = "Source: Economics Data",
       x = "Year",
       y = "Values",
       color = "")+
  theme_bw()+
  theme(axis.text.x = element_text(size = 10, face = "bold"),
        text = element_text(family="Times New Roman"))
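
The task list above also asks for a formatted date axis. One possible approach, sketched here on a single series for brevity, is scale_x_date() with date_breaks and date_labels. The 10-year spacing and the "%Y" label format are illustrative choices, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)  # provides the economics dataset

p_dates <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line(linewidth = 0.4, color = "navy") +
  # one tick every 10 years, labelled with the four-digit year
  scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
  labs(title = "Median Duration of Unemployment",
       x = "Year",
       y = "Weeks") +
  theme_bw()
p_dates
```

The same scale_x_date() call could be added to the multi-line plot without changing its other layers.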