The diamonds dataset contains information about ~54,000 diamonds, including price, carat, cut, clarity, and color.
Create a scatter plot with:
x-axis: carat
y-axis: price
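library(ggplot2)  # plotting; also provides the diamonds and economics datasets
library(tidyr)    # pivot_longer(), used later for the economics data
data(diamonds)
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(color = "blue", size = 1) +
  labs(title = "Scatter Plot of Diamond Carat vs. Price",
       y = "Price",
       x = "Carat")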
Question: What type of relationship appears between carat and price?
Positive: price rises as carat increases, and the spread of prices widens at larger carat values.
Modify your plot:
Color points by cut
Add a meaningful title and axis labels
Apply theme_minimal()
ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
geom_point(size = 1)+
labs(title = "Scatter Plot of Diamond Carat vs. Price",
y= "Price",
x= "Carat")+
theme_minimal()
Question: Which cut appears to have higher prices at similar carat values?
Ideal
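One way to check this answer numerically is to compare prices within a narrow carat range. The sketch below is a rough check, not part of the assignment: the 0.95 to 1.05 carat window and the names near_one and median_by_cut are arbitrary choices made for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Keep only diamonds close to 1 carat (the window is an arbitrary choice)
near_one <- subset(diamonds, carat >= 0.95 & carat <= 1.05)

# Median price within that window, one row per cut
median_by_cut <- aggregate(price ~ cut, data = near_one, FUN = median)
print(median_by_cut)
```

Cuts with a higher median in this table are the ones that tend to cost more at a similar carat value.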
Add a regression line:
ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
geom_point(size = 1)+
geom_smooth(method="lm")+
labs(title = "Scatter Plot of Diamond Carat vs. Price",
y= "Price",
x= "Carat")+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Question: Does the relationship between carat and price appear linear?
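Only approximately. Price rises with carat, but the point cloud curves upward and fans out at larger carat values, so a straight line is a rough summary.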
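A rough way to probe linearity is to compare a straight-line fit on the raw scale with one on the log-log scale. This is only a sketch: the object names are made up for illustration, and R-squared values from models with different response scales are not strictly comparable, so treat the comparison as a hint rather than a test.

```r
library(ggplot2)  # provides the diamonds dataset

# Straight-line fit on the raw scale and on the log-log scale
fit_raw <- lm(price ~ carat, data = diamonds)
fit_log <- lm(log(price) ~ log(carat), data = diamonds)

r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
print(c(raw = r2_raw, log_log = r2_log))

# Same scatter plot with both axes on a log10 scale
p_log <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, size = 1) +
  scale_x_log10() +
  scale_y_log10() +
  theme_minimal()
print(p_log)
```

If the points line up more tightly along a straight band on the log-log plot, that suggests the raw relationship is curved rather than linear.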
Question: What does the “lm” option do in the geom_smooth command? What are the other options and what do they do?
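It fits a linear model with lm() and draws the fitted straight line with a confidence band. Other options include "loess", "gam", and "glm", or a function such as MASS::rlm. loess does local polynomial regression fitting, gam fits a generalized additive model, glm fits a generalized linear model, and rlm fits a robust linear model.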
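The smoothing methods can be compared side by side on a subsample. The sketch below assumes a random sample of 2,000 rows (loess is slow and memory-hungry on the full dataset) and assumes the mgcv package is installed for the gam fit; the object names are illustrative.

```r
library(ggplot2)

# Work on a random subsample so loess stays fast (2,000 is an arbitrary size)
set.seed(1)
small <- diamonds[sample(nrow(diamonds), 2000), ]

base <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, size = 1) +
  theme_minimal()

# Straight line from a linear model
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)

# Local polynomial regression: follows curvature in the data
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)

# Generalized additive model with a smooth term (needs the mgcv package)
p_gam <- base + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
```

Printing p_lm, p_loess, or p_gam draws the corresponding plot.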
Because the dataset is large, reduce overplotting by:
Adjusting alpha
Changing point size
Trying geom_jitter()
ggplot(data = diamonds, aes(x = carat, y = price, color = cut))+
geom_jitter(alpha = 0.5,size = 1)+
geom_smooth(method="lm", color ="red")+
labs(title = "Scatter Plot of Diamond Carat vs. Price",
y= "Price",
x= "Carat")+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Question: Why is overplotting a concern with large datasets?
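When many points are drawn on top of each other, dense regions turn into solid blobs, so you cannot tell how many observations sit in an area or see patterns inside it.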
Question: What does the alpha command do and how does it help with overplotting?
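alpha sets the opacity of the points (0 is fully transparent, 1 is fully opaque). With a low alpha, areas where many points overlap look darker, so the density of the data becomes visible.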
Question: Based on what you see, what are the risks and benefits of using geom_jitter?
Jitter offsets the points by a small random amount so that overlapping points become visible, but the plotted positions no longer match the exact data values.
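Two other ways to handle overplotting are sketched here for comparison: a much lower alpha, and binning the points into a 2D grid of counts with geom_bin2d(). The alpha value and the number of bins are arbitrary choices, not values from the assignment.

```r
library(ggplot2)

# Very transparent, small points: darker areas mean more observations
p_alpha <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05, size = 0.5) +
  theme_minimal()

# Count observations in a 60 x 60 grid and map the count to fill color
p_bins <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 60) +
  labs(fill = "Count") +
  theme_minimal()

print(p_alpha)
print(p_bins)
```

Unlike jitter, neither approach moves the data, so positions on the axes stay exact.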
Create a scatter plot:
table vs price
Points colored by clarity
Facet by cut (we learn a lot more about this later, but just give it a try!)
ggplot(data = diamonds, aes(x = table, y = price, color = clarity))+
geom_point(size = 1)+
facet_grid(~cut)+
labs(title = "Scatter Plot of Diamond Table vs. Price",
y= "Price",
x= "Table")+
theme_minimal()
Question: Does the relationship differ by cut?
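Not much. Price shows little visible relationship with table within any cut. The Fair panel looks sparser and has fewer high-priced points, but Fair also has far fewer diamonds than the other cuts.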
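To back up a visual impression from the facets, the correlation between table and price can be computed separately for each cut, along with the number of diamonds per cut. This is a minimal sketch; cor_by_cut and n_by_cut are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

# Pearson correlation between table and price within each cut
cor_by_cut <- sapply(split(diamonds, diamonds$cut),
                     function(d) cor(d$table, d$price))
print(round(cor_by_cut, 2))

# Number of diamonds per cut: panels with few points look sparser
n_by_cut <- table(diamonds$cut)
print(n_by_cut)
```

Similar correlations across cuts would suggest the relationship does not differ much by cut.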
The economics dataset contains monthly US economic data over time.
Create a line plot:
x-axis: date
y-axis: unemploy
data("economics")
ggplot(data = economics, aes(x = date, y = unemploy))+
geom_line()+
theme_minimal()
Question: Describe the overall trend over time.
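The number of unemployed people (unemploy is a count in thousands, not a rate) generally increases over time, with repeated peaks and dips along the way.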
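Because unemploy is a count, part of its rise may simply reflect population growth. A rough adjustment is to divide by pop, which is also recorded in thousands. Note that this gives the share of the total population that is unemployed, not the official unemployment rate (which is based on the labor force); econ_share and unemploy_share are names made up for this sketch.

```r
library(ggplot2)  # provides the economics dataset

# Copy the data and add unemployed people as a percentage of total population
econ_share <- economics
econ_share$unemploy_share <- 100 * econ_share$unemploy / econ_share$pop

p_share <- ggplot(econ_share, aes(x = date, y = unemploy_share)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (% of population)") +
  theme_minimal()
print(p_share)
```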
Reshape the data using pivot_longer() to plot:
uempmed
psavert
Then create a multi-line plot:
# Stack uempmed and psavert into one value column, keyed by variable name
econ_long <- pivot_longer(
  data = economics,
  cols = c(uempmed, psavert),
  values_to = "uem_psa",
  names_to = "up_names"
)
print(econ_long)
## # A tibble: 1,148 × 6
##    date         pce    pop unemploy up_names uem_psa
##    <date>     <dbl>  <dbl>    <dbl> <chr>      <dbl>
##  1 1967-07-01  507. 198712     2944 uempmed      4.5
##  2 1967-07-01  507. 198712     2944 psavert     12.6
##  3 1967-08-01  510. 198911     2945 uempmed      4.7
##  4 1967-08-01  510. 198911     2945 psavert     12.6
##  5 1967-09-01  516. 199113     2958 uempmed      4.6
##  6 1967-09-01  516. 199113     2958 psavert     11.9
##  7 1967-10-01  512. 199311     3143 uempmed      4.9
##  8 1967-10-01  512. 199311     3143 psavert     12.9
##  9 1967-11-01  517. 199498     3066 uempmed      4.7
## 10 1967-11-01  517. 199498     3066 psavert     12.8
## # ℹ 1,138 more rows
ggplot(data = econ_long, aes(x = date, y = uem_psa, color = up_names))+
geom_line()+
theme_minimal()
Question: Do these variables appear to move together over time?
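Not closely. The personal savings rate (psavert) drifts downward over most of the period, while the median duration of unemployment (uempmed) rises and falls with the business cycle and climbs sharply after 2008.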
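Since the two series are measured in different units (percent and weeks), a shared y-axis can hide or exaggerate co-movement. The sketch below, using illustrative object names, computes their correlation and plots each series in its own panel with a free y-axis.

```r
library(ggplot2)  # provides the economics dataset

# Pearson correlation between the two monthly series
r <- cor(economics$uempmed, economics$psavert)
print(r)

# Long format built by hand so the sketch does not depend on earlier objects
econ_two <- rbind(
  data.frame(date = economics$date, series = "uempmed", value = economics$uempmed),
  data.frame(date = economics$date, series = "psavert", value = economics$psavert)
)

# One panel per series, each with its own y-axis range
p_panels <- ggplot(econ_two, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~series, ncol = 1, scales = "free_y") +
  theme_minimal()
print(p_panels)
```

A correlation near zero or below zero would indicate that the series do not move together.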
Enhance your plot by:
Changing line width
Customizing colors
Formatting the date axis
Adding title, subtitle, and caption
Applying a theme (theme_bw() or theme_classic())
ggplot(data = econ_long, aes(x = date, y = uem_psa, color = up_names)) +
  geom_line(linewidth = 0.4) +  # linewidth replaces size for lines since ggplot2 3.4.0
  scale_color_manual(
    # Named vectors tie each color and label to its variable explicitly
    values = c(psavert = "maroon", uempmed = "navy"),
    labels = c(psavert = "Personal Savings Rate",
               uempmed = "Median Duration of Unemployment")) +
  scale_x_date(date_breaks = "10 years", date_labels = "%Y") +  # one labeled tick per decade
  labs(title = "Personal Savings and Duration of Unemployment by Year",
       subtitle = "A graph showing the relationship between personal savings rates and unemployment duration over a time range from the 1960s to 2010s",
       caption = "Source: Economics Data",
       x = "Year",
       y = "Value (percent or weeks)",
       color = "") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 10, face = "bold"),
        text = element_text(family = "Times New Roman"))