Part 1: Scatter Plots (Using diamonds)

The diamonds dataset contains information about ~54,000 diamonds, including price, carat, cut, clarity, and color.

Task 1: Basic Scatter Plot

Create a scatter plot with:

  • x-axis: carat

  • y-axis: price

library(ggplot2)
ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

Question: What type of relationship appears between carat and price?

There is a positive relationship between carat and price: as carat increases, price tends to increase as well.
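As a rough numeric check on what the plot suggests, the Pearson correlation between the two variables can be computed. This is only a sketch: the variable name carat_price_cor is made up for illustration, and correlation measures linear association only, so it does not capture any curvature in the relationship.

```r
library(ggplot2)  # provides the diamonds dataset

# Pearson correlation between carat and price.
# Values close to 1 indicate a strong positive linear association.
carat_price_cor <- cor(diamonds$carat, diamonds$price)
carat_price_cor
```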

Task 2: Add Aesthetic Mappings

Modify your plot:

  • Color points by cut

  • Add a meaningful title and axis labels

  • Apply theme_minimal()

ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = "Diamond Price vs Carat by Cut",
       x = "Carat",
       y = "Price") +
  theme_minimal()

Question: Which cut appears to have higher prices at similar carat values?

The Ideal and Premium cuts appear to have higher prices at similar carat values.

Task 3: Add a Trend Line

Add a regression line:

  • geom_smooth(method = "lm")

ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Diamond Price vs Carat by Cut",
       x = "Carat",
       y = "Price") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Question: Does the relationship between carat and price appear linear?

The relationship looks mostly positive but not perfectly linear.

Question: What does the “lm” option do in the geom_smooth command? What are the other options and what do they do?

The “lm” option tells ggplot to fit a linear model, which draws a straight best-fit regression line through the data. Other options include “loess”, which draws a smooth curve that follows the local pattern of the data; “gam”, which fits a generalized additive model and allows more flexible curves; and “glm”, which fits a generalized linear model for other types of response data.
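To see the difference between two of these options, a straight "lm" fit and a "loess" curve can be overlaid on the same points. The sketch below makes a few assumptions that are not part of the assignment: it uses a random sample of 2,000 rows (loess is slow and memory-hungry on the full dataset), an arbitrary seed, and illustrative line colors. The names diamonds_sample and smoother_plot are made up for this example.

```r
library(ggplot2)

set.seed(1)  # assumption: arbitrary seed so the sample is reproducible
# Assumption: 2,000 rows is enough to show the shape while keeping loess fast
diamonds_sample <- diamonds[sample(nrow(diamonds), 2000), ]

smoother_plot <- ggplot(diamonds_sample, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, size = 1) +
  # Straight regression line from a linear model
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  # Locally weighted smooth curve
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "red") +
  theme_minimal()
smoother_plot
```

Where the red curve bends away from the blue line, a straight line is a poor description of the data in that range.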

Task 4: Improve Visualization

Because the dataset is large, reduce overplotting by:

  • Adjusting alpha

  • Changing point size

  • Trying geom_jitter()

ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_jitter(alpha = 0.3, size = 1) +
  geom_smooth(method = "lm") +
  labs(title = "Diamond Price vs Carat by Cut",
       x = "Carat",
       y = "Price") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Question: Why is overplotting a concern with large datasets?

Many datapoints overlap, making it hard to see the true pattern.
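One rough way to gauge the overlap is to count rows that share exactly the same carat and price as an earlier row. This is only a sketch, and the name n_exact_overlaps is made up for illustration. It ignores points that are close but not identical, so it understates the overplotting seen on screen.

```r
library(ggplot2)  # provides the diamonds dataset

# Rows whose (carat, price) pair already appeared earlier in the data;
# each of these is drawn exactly on top of another point.
n_exact_overlaps <- sum(duplicated(diamonds[, c("carat", "price")]))
n_exact_overlaps
```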

Question: What does the alpha command do and how does it help with overplotting?

The alpha argument makes the points more transparent, so areas where many points overlap appear darker and dense regions become easier to see.

Question: Based on what you see, what are the risks and benefits of using geom_jitter?

The benefit is that it spreads points out so overlapping points are easier to see, but the risk is that it slightly moves the points and may make the data look a little less exact.
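The amount of displacement can be limited through the width and height arguments of geom_jitter(). The values below are illustrative assumptions, not part of the assignment: carat appears to be recorded to two decimal places, so width = 0.005 keeps each point within half a recording step, and height = 0 leaves price unchanged. The name jitter_plot is made up for this example.

```r
library(ggplot2)

jitter_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  # width: maximum horizontal shift (in carat units) in each direction
  # height = 0: no vertical shift, so plotted prices stay exact
  geom_jitter(alpha = 0.3, size = 1, width = 0.005, height = 0) +
  theme_minimal()
jitter_plot
```

Setting these explicitly makes the trade-off visible: larger values separate points more but move them further from their true positions.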

Task 5: Challenge Scatter Plot

Create a scatter plot:

  • table vs price

  • Points colored by clarity

  • Facet by cut (we learn a lot more about this later, but just give it a try!)

ggplot(diamonds, aes(x = table, y = price, color = clarity)) +
  geom_point(alpha = 0.3, size = 1) +
  facet_wrap(~cut) +
  theme_minimal()

Question: Does the relationship differ by cut?

Yes, the relationship differs by cut. Each category shows a slightly different pattern and spread of prices.

Part 2: Line Plots (Using economics Dataset)

The economics dataset contains monthly US economic data over time.

Task 6: Basic Line Plot

Create a line plot:

  • x-axis: date

  • y-axis: unemploy

ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

Question: Describe the overall trend over time.

The overall trend shows that unemployment changes over time, with several increases and decreases.

Task 7: Multiple Lines on One Plot

Reshape the data using pivot_longer() to plot:

  • uempmed

  • psavert

Then create a multi-line plot with:

  • color = variable

library(tidyr)

economics_long <- pivot_longer(economics, cols = c(uempmed, psavert),
                               names_to = "variable", values_to = "value")

ggplot(economics_long, aes(x = date, y = value, color = variable)) +
  geom_line()

Question: Do these variables appear to move together over time?

No, the variables do not move together consistently over time. Sometimes they increase or decrease at the same time, but they often follow different trends.
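A quick numeric companion to the visual impression is the correlation between the two series over the whole period. This is a sketch with a made-up variable name (series_cor). A single correlation summarizes the full time span, so it can hide periods in which the series briefly move together.

```r
library(ggplot2)  # provides the economics dataset

# Pearson correlation between median unemployment duration (uempmed)
# and the personal savings rate (psavert).
# Values near 0 suggest little linear co-movement; negative values
# suggest the series tend to move in opposite directions.
series_cor <- cor(economics$uempmed, economics$psavert)
series_cor
```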

Task 8: Customize Your Line Plot

Enhance your plot by:

  • Changing line width

  • Customizing colors

  • Formatting the date axis

  • Adding title, subtitle, and caption

  • Applying a theme (theme_bw() or theme_classic())

ggplot(economics_long, aes(x = date, y = value, color = variable)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = c("pink", "yellow")) +
  scale_x_date(date_labels = "%Y", date_breaks = "5 years") +
  labs(
    title = "Economic Trends Over Time",
    subtitle = "U.S. Unemployment Duration vs Personal Savings Rate",
    caption = "Source: ggplot2 economics dataset",
    x = "Year",
    y = "Value"
  ) +
  theme_classic()
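When colors are passed to scale_color_manual() as an unnamed vector, they are assigned to the series in order. A named vector ties each color to its series explicitly. The sketch below shows that variant. The color choices and the names series_colors and named_color_plot are assumptions for illustration, and the reshaping step is repeated so the example runs on its own.

```r
library(ggplot2)
library(tidyr)

# Same reshaping as in Task 7, repeated so this example is self-contained
economics_long <- pivot_longer(economics, cols = c(uempmed, psavert),
                               names_to = "variable", values_to = "value")

# Named vector: each series is matched to its color by name, not position.
# Assumption: these two colors are illustrative, chosen for contrast on white.
series_colors <- c(uempmed = "steelblue", psavert = "darkorange")

named_color_plot <- ggplot(economics_long,
                           aes(x = date, y = value, color = variable)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = series_colors) +
  theme_classic()
named_color_plot
```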