### Practice from the board: creating scatter and line plots with ggplot2
library(ggplot2)
data("USArrests")
ggplot(data = USArrests, aes(x = Murder, y = Assault)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  # labs() arguments are case-sensitive: use lowercase x, not X
  labs(title = "Scatter Plot of Assault vs. Murder Rates",
       x = "Murder Rate",
       y = "Assault Rate") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
USArrests$State <- rownames(USArrests)
USArrests$AverageCrimeRate <- rowMeans(USArrests[ , c("Murder", "Assault", "Rape")])
ggplot(data = USArrests, aes(x = State, y = AverageCrimeRate, group = 1)) +
  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_line(color = "darkgreen", linewidth = 1) +
  geom_point(color = "orange", size = 3) +
  labs(title = "Line Plot of Average Crime Rate by State",
       x = "State",
       y = "Average Crime Rate") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
# Part 1: Scatter Plots (Using diamonds)
The diamonds dataset contains information about ~54,000 diamonds, including price, carat, cut, clarity, and color.
Create a scatter plot with:
x-axis: carat
y-axis: price
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(color = "pink", size = 3)
Question: What type of relationship appears between carat and price? As carat weight increases, price increases as well, so the relationship is positive.
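One way to put a number on the pattern seen in the plot is the Pearson correlation between the two columns. This is only a rough sketch: correlation measures linear association only, and the name carat_price_cor is chosen here for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Pearson correlation between carat and price; values near 1 indicate a
# strong positive linear association.
carat_price_cor <- cor(diamonds$carat, diamonds$price)
carat_price_cor
```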
Modify your plot:
Color points by cut
Add a meaningful title and axis labels
Apply theme_minimal()
ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(
    title = "The price of diamonds compared to the quality of the cut",
    x = "Carat",
    y = "Price",
    color = "Cut quality") +
  theme_minimal()
Question: Which cut appears to have higher prices at similar carat values? The Ideal and Premium cuts usually have higher prices.
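Reading prices "at similar carat values" off a crowded scatter plot is hard, so a numeric check may help. The sketch below compares median price by cut for diamonds close to 1 carat. The 0.9 to 1.1 carat band and the names near_one_carat and median_by_cut are assumptions made for this example, not part of the assignment.

```r
library(ggplot2)  # provides the diamonds dataset

# Keep only diamonds close to 1 carat (the 0.9-1.1 band is an arbitrary choice).
near_one_carat <- subset(diamonds, carat >= 0.9 & carat <= 1.1)

# Median price for each cut within that band.
median_by_cut <- aggregate(price ~ cut, data = near_one_carat, FUN = median)
median_by_cut
```

A different carat band may give a different ranking, so it is worth trying more than one.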
Add a regression line:
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(color = "yellow", size = 5) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  # Title describes what is plotted: cut is not used in this plot
  labs(title = "Diamond price vs. carat with a linear regression line",
       x = "Carat",
       y = "Price") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Question: Does the relationship between carat and price appear linear? It appears roughly linear, but it is not perfect. As carat size increases, price also increases. The points also cluster on the graph and there are outliers, so the relationship is not completely linear.
Question: What does the “lm” option do in the geom_smooth command? What are the other options and what do they do? The “lm” method fits a linear model and draws a straight regression line. Other options include “glm”, which fits a generalized linear model, and “gam”, which fits a generalized additive model. “loess” fits a local regression and draws a smooth curve instead of a straight line.
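To see the difference between a straight-line fit and a smooth curve, the two smoothers can be overlaid on the same points. This is a sketch under a few assumptions: it uses a random subset of 2,000 rows because loess is slow and memory-hungry on the full dataset, and the seed, the subset size, and the names diamonds_small and p_smooth are illustrative choices.

```r
library(ggplot2)

# Work on a random subset: loess scales poorly to ~54,000 rows.
set.seed(42)  # arbitrary seed, only for reproducibility
diamonds_small <- diamonds[sample(nrow(diamonds), 2000), ]

p_smooth <- ggplot(diamonds_small, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, size = 1) +
  # Straight line from a linear model
  geom_smooth(aes(color = "lm"), method = "lm", formula = y ~ x, se = FALSE) +
  # Smooth curve from local regression
  geom_smooth(aes(color = "loess"), method = "loess", formula = y ~ x, se = FALSE) +
  labs(color = "Smoother") +
  theme_minimal()
p_smooth
```

Mapping a constant string to color inside aes() is used here only to get a legend entry for each smoother.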
Because the dataset is large, reduce overplotting by:
Adjusting alpha
Changing point size
Trying geom_jitter()
ggplot(diamonds, aes(carat, price, colour = carat)) +
  geom_jitter(size = 0.7, alpha = 0.3) +
  # Title describes what is plotted: cut is not used in this plot
  labs(
    title = "Diamond price vs. carat (jittered, semi-transparent points)",
    x = "Carat size",
    y = "Price") +
  theme_minimal()
Question: Why is overplotting a concern with large datasets? Overplotting is a problem because many points overlap, which makes the distribution hard to see.
Question: What does the alpha command do and how does it help with overplotting? The alpha argument controls the transparency of the points. Overlapping points remain visible, and areas where many points stack up appear darker, which reveals where the points are concentrated.
Question: Based on what you see, what are the risks and benefits of using geom_jitter? A risk of geom_jitter is that it moves the points slightly away from their true values, which can make the plot misleading. A benefit is that it spreads overlapping points apart and makes dense areas easier to see.
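The trade-off described above can be checked by drawing the same data twice: once with transparency only, and once with jitter whose amount is set explicitly. In this sketch the values alpha = 0.05 and width = 0.01 are illustrative assumptions, and p_alpha and p_jitter are names made up for the example. Setting height = 0 keeps the prices at their true values, which limits the risk of a misleading plot.

```r
library(ggplot2)

# Low alpha: single points are faint, stacked points add up to darker areas.
p_alpha <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05, size = 0.7) +
  theme_minimal()

# Jitter only horizontally by a small fixed amount and leave price untouched.
# width = 0.01 is an illustrative value, not a recommendation.
p_jitter <- ggplot(diamonds, aes(carat, price)) +
  geom_jitter(width = 0.01, height = 0, alpha = 0.05, size = 0.7) +
  theme_minimal()

p_alpha
p_jitter
```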
Create a scatter plot:
table vs price
Points colored by clarity
Facet by cut (we learn a lot more about this later, but just give it a try!)
ggplot(diamonds, aes(table, price, colour = clarity)) +
  geom_point(size = 0.7, alpha = 0.5) +
  # Title describes what is plotted: table on x, one panel per cut
  labs(
    title = "Diamond price vs. table, colored by clarity and faceted by cut",
    x = "Table",
    y = "Price") +
  theme_minimal() +
  facet_wrap(~cut)
Question: Does the relationship differ by cut? The relationship differs somewhat by cut, but the overall pattern is similar across groups.
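A per-group summary can back up what the facets show. The sketch below computes the correlation between table and price separately for each cut. The name cor_by_cut is made up for this example, and correlation only captures linear association, so it is a rough companion to the plot rather than a full answer.

```r
library(ggplot2)  # provides the diamonds dataset

# Split the data by cut and compute the table-price correlation in each group.
cor_by_cut <- sapply(
  split(diamonds, diamonds$cut),
  function(d) cor(d$table, d$price)
)
round(cor_by_cut, 2)
```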
The economics dataset contains monthly US economic data over time.
Create a line plot:
x-axis: date
y-axis: unemploy
ggplot(economics, aes(date, unemploy))+
geom_line()
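Alongside the plot, the date of the highest point can be looked up directly in the data. This is a small sketch; peak_row is a name chosen for the example. According to the ggplot2 documentation, unemploy is the number of unemployed persons in thousands.

```r
library(ggplot2)  # provides the economics dataset

# Row with the largest number of unemployed persons.
peak_row <- economics[which.max(economics$unemploy), c("date", "unemploy")]
peak_row
```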
Question: Describe the overall trend over time. The line rises and falls repeatedly over the period. The highest peak on the graph is around 2010, and there is a major decrease after that.
## Task 7: Multiple Lines on One Plot
Reshape the data using pivot_longer() to plot:
uempmed
psavert
Then create a multi-line plot with:
library(tidyverse)
economics_pivot <- economics %>%
  pivot_longer(
    cols = c(uempmed, psavert),
    names_to = "variable",
    values_to = "value")
ggplot(economics_pivot, aes(date, value, color = variable)) +
  geom_line()
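To see what the reshape does, it can help to run pivot_longer() on a tiny table first. The values below are made up purely for illustration, and wide and long are names chosen for this sketch. Each row of the wide table becomes one row per selected column in the long table.

```r
library(tidyr)

# Made-up values, only to illustrate the wide-to-long reshape.
wide <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01")),
  uempmed = c(5.8, 6.1),
  psavert = c(5.4, 4.8)
)

# Two columns are stacked into a name column and a value column.
long <- pivot_longer(
  wide,
  cols = c(uempmed, psavert),
  names_to = "variable",
  values_to = "value"
)
long
```

The long table has one column for color mapping (variable) and one for the y-axis (value), which is the shape ggplot2 needs to draw several lines from a single geom_line() call.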
Question: Do these variables appear to move together over time? Yes, they move up and down around the same times.
## Task 8: Customize Your Line Plot
Enhance your plot by:
Changing line width
Customizing colors
Formatting the date axis
Adding title, subtitle, and caption
Applying a theme (theme_bw() or theme_classic())
ggplot(economics_pivot, aes(date, value, color = variable)) +
  geom_line(linewidth = 1.0) +
  scale_color_manual(values = c("pink", "purple")) +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years") +
  labs(
    title = "Economic Trends Over Time",
    subtitle = "Comparison of Median Unemployment Duration and Personal Savings Rate",
    x = "Date",
    y = "Value",
    caption = "Source: ggplot2 economics dataset",
    color = "Variable"
  ) +
  theme_bw()
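When scale_color_manual() receives an unnamed vector, the colors are assigned in the order of the variable's values, which for a character column is alphabetical (psavert first, then uempmed). A named vector makes the pairing explicit. The sketch below is one possible variant, not the required solution: economics_long, series_colors, series_labels, and p_named are names made up for the example, and the legend labels follow the variable descriptions in the ggplot2 documentation.

```r
library(ggplot2)
library(tidyr)

# Rebuild the long table so this block runs on its own.
economics_long <- pivot_longer(
  economics,
  cols = c(uempmed, psavert),
  names_to = "variable",
  values_to = "value"
)

# Named vectors tie each color and label to a specific variable.
series_colors <- c(psavert = "pink", uempmed = "purple")
series_labels <- c(
  psavert = "Personal savings rate (%)",
  uempmed = "Median unemployment duration (weeks)"
)

p_named <- ggplot(economics_long, aes(date, value, color = variable)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = series_colors, labels = series_labels) +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years") +
  labs(x = "Date", y = "Value", color = "Variable") +
  theme_bw()
p_named
```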