### Practice from the board: creating scatter and line plots with ggplot2

library(ggplot2)

data("USArrests")

# Scatter plot of assault vs. murder rates with a linear trend line
ggplot(data = USArrests, aes(x = Murder, y = Assault)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Scatter Plot of Assault vs. Murder Rates",
    x = "Murder Rate",
    y = "Assault Rate"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

USArrests$State <- rownames(USArrests)
USArrests$AverageCrimeRate <- rowMeans(USArrests[ , c("Murder", "Assault", "Rape")])

# Line plot of the average crime rate per state; group = 1 joins all states into one line
ggplot(data = USArrests, aes(x = State, y = AverageCrimeRate, group = 1)) +
  geom_line(color = "darkgreen", linewidth = 1) +
  geom_point(color = "orange", size = 3) +
  labs(
    title = "Line Plot of Average Crime Rate by State",
    x = "State",
    y = "Average Crime Rate"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Part 1: Scatter Plots (Using diamonds)

The diamonds dataset contains information about ~54,000 diamonds, including price, carat, cut, clarity, and color.

## Task 1: Basic Scatter Plot

Create a scatter plot with carat on the x-axis and price on the y-axis:

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(color = "pink", size = 3)

Question: What type of relationship appears between carat and price? The relationship is positive: as the carat (weight) of a diamond increases, its price increases as well.

## Task 2: Add Aesthetic Mappings

Modify your plot:

ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(
    title = "The price of diamonds compared to the quality of the cut",
    x = "Carat",
    y = "Price",
    color = "Cut quality"
  ) +
  theme_minimal()

Question: Which cut appears to have higher prices at similar carat values? The Ideal and Premium cuts usually have higher prices at similar carat values.

## Task 3: Add a Trend Line

Add a regression line:

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(color = "yellow", size = 5) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(
    title = "Diamond price vs. carat with a linear trend line",
    x = "carat",
    y = "price"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Question: Does the relationship between carat and price appear linear? It appears roughly linear, but not perfectly so. As carat size increases, price also increases. The points also cluster at certain carat values and there are outliers, so the relationship is not completely linear.

Question: What does the “lm” option do in the geom_smooth command? What are the other options and what do they do? The "lm" option fits a linear model and draws a straight regression line. Other options include "glm", which fits a generalized linear model, and "gam", which fits a generalized additive model. "loess" fits a local regression and draws a smooth curve instead of a straight line.
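To compare the smoothing methods side by side, one possible sketch overlays a linear fit and a loess fit on the same points. The object names `diamonds_small` and `p_methods`, the seed, and the sample size of 2000 are assumptions made for illustration; the subsample is only there because loess is slow on all ~54,000 rows.

```r
library(ggplot2)

# Hypothetical subsample (size and seed are arbitrary choices) so loess stays fast
set.seed(1)
diamonds_small <- diamonds[sample(nrow(diamonds), 2000), ]

# Mapping color to a constant string gives each smoother its own legend entry
p_methods <- ggplot(diamonds_small, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  geom_smooth(aes(color = "lm"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(color = "loess"), method = "loess", formula = y ~ x, se = FALSE) +
  labs(color = "method") +
  theme_minimal()

p_methods
```

If the loess curve bends away from the straight line, that is a hint that a purely linear fit may not describe the relationship well.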

## Task 4: Improve Visualization

Because the dataset is large, reduce overplotting by:

ggplot(diamonds, aes(carat, price, colour = carat)) +
  geom_jitter(size = 0.7, alpha = 0.3) +
  labs(
    title = "Diamond price vs. carat with jitter and transparency",
    x = "carat size",
    y = "price"
  ) +
  theme_minimal()

Question: Why is overplotting a concern with large datasets? With many observations, points overlap heavily, which makes it hard to see how the data are distributed.

Question: What does the alpha command do and how does it help with overplotting? The alpha argument controls the transparency of the points. Overlapping points remain visible, and darker areas reveal where the points are concentrated.

Question: Based on what you see, what are the risks and benefits of using geom_jitter? One risk is that geom_jitter moves points slightly away from their true values, which can make the plot misleading. One benefit is that it spreads overlapping points apart and makes dense areas easier to see.
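One way to limit the risk described above is to control how far points are moved. The sketch below uses `position_jitter()` with a small horizontal offset, no vertical offset, and a fixed seed so the plot is reproducible. The specific values (`width = 0.01`, `seed = 42`) and the name `p_jitter` are assumptions chosen for illustration, not recommended settings.

```r
library(ggplot2)

# Jitter only along the x-axis by a small amount; prices (y) stay at their true values.
# The seed makes the random offsets identical each time the plot is drawn.
p_jitter <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(
    size = 0.7,
    alpha = 0.3,
    position = position_jitter(width = 0.01, height = 0, seed = 42)
  ) +
  theme_minimal()

p_jitter
```

Setting `height = 0` keeps the price axis exact, which matters if readers are likely to read prices off the plot.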

## Task 5: Challenge Scatter Plot

Create a scatter plot:

ggplot(diamonds, aes(table, price, colour = clarity)) +
  geom_point(size = 0.7, alpha = 0.5) +
  labs(
    title = "The price of diamonds compared to the quality of the cut",
    x = "table",
    y = "price"
  ) +
  theme_minimal() +
  facet_wrap(~cut)

Question: Does the relationship differ by cut? The relationship differs somewhat by cut, but the overall pattern is similar across the groups.

# Part 2: Line Plots (Using economics Dataset)

The economics dataset contains monthly US economic data over time.

## Task 6: Basic Line Plot

Create a line plot:

  • x-axis: date

  • y-axis: unemploy

ggplot(economics, aes(date, unemploy))+
  geom_line()

Question: Describe the overall trend over time. The line rises and falls repeatedly over time. The highest peak is around 2010, followed by a major decrease.

## Task 7: Multiple Lines on One Plot

Reshape the data using pivot_longer() to plot:

  • uempmed

  • psavert

Then create a multi-line plot with:

  • color = variable

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ lubridate 1.9.5     ✔ tibble    3.3.1
## ✔ purrr     1.2.1     ✔ tidyr     1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
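# Reshape to long format: one row per date and variable
economics_pivot <- economics %>%
  pivot_longer(
    cols = c(uempmed, psavert),
    names_to = "variable",
    values_to = "value"
  )

# One line per variable, distinguished by color
ggplot(economics_pivot, aes(date, value, color = variable)) +
  geom_line()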
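To see what the reshape does, it can help to run `pivot_longer()` on a tiny table. The sketch below uses a made-up two-row tibble (the values are invented for illustration and are not taken from the economics dataset); the names `wide` and `long` are likewise hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data, invented for illustration only
wide <- tibble(
  date    = as.Date(c("2020-01-01", "2020-02-01")),
  uempmed = c(9.1, 9.3),
  psavert = c(7.6, 8.2)
)

# Each wide row becomes two long rows: one per pivoted column
long <- pivot_longer(
  wide,
  cols = c(uempmed, psavert),
  names_to = "variable",
  values_to = "value"
)

long
```

The long table has one row per date and variable, which is the shape ggplot2 needs in order to map `variable` to color.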

Question: Do these variables appear to move together over time? Yes, they tend to move up and down at around the same times.

## Task 8: Customize Your Line Plot

Enhance your plot by:

  • Changing line width

  • Customizing colors

  • Formatting the date axis

  • Adding title, subtitle, and caption

  • Applying a theme (theme_bw() or theme_classic())

ggplot(economics_pivot, aes(date, value, color = variable)) +
  geom_line(linewidth = 1.0) +
  scale_color_manual(values = c("pink", "purple")) +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years") +
  labs(
    title = "Economic Trends Over Time",
    subtitle = "Comparison of Median Unemployment Duration and Personal Savings Rate",
    x = "Date",
    y = "Value",
    caption = "Source: ggplot2 economics dataset",
    color = "Variable"
  ) +
  theme_bw()
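An unnamed vector in `scale_color_manual()` is matched to the variables in order, so it is easy to lose track of which color belongs to which line. A possible variation, sketched below, names each color and adds readable legend labels. The object names `economics_long` and `p_named` are assumptions for illustration; the label text follows the ggplot2 documentation for the economics dataset (uempmed is the median duration of unemployment in weeks, psavert is the personal savings rate).

```r
library(ggplot2)
library(tidyr)

# Rebuild the long table so this sketch runs on its own
economics_long <- pivot_longer(
  economics,
  cols = c(uempmed, psavert),
  names_to = "variable",
  values_to = "value"
)

# Named vectors tie each color and label to a specific variable
p_named <- ggplot(economics_long, aes(date, value, color = variable)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = c(psavert = "pink", uempmed = "purple"),
    labels = c(
      psavert = "Personal savings rate (%)",
      uempmed = "Median unemployment duration (weeks)"
    )
  ) +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years") +
  labs(x = "Date", y = "Value", color = "Variable") +
  theme_bw()

p_named
```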