The data set olympics.csv (found at https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/olympics.csv) has data on about 6000 Olympic athletes who competed in the 2024 Olympic Games in one of 10 sports:
Athletics, Swimming, Rowing, Judo, Shooting, Sailing, Volleyball, Equestrian, Fencing, Boxing, Cycling Road, Gymnastics
(Athletics is a catchall for track-and-field events.)
The two relevant columns are:
sport: Which of the 10 sports the athlete participated in
age: The athlete's age at the start of the 2024 Olympic Games
Using the data set, create the side-by-side box plots seen in Brightspace. The hex codes for the colors are #0081c8 and #FCB131.
To reorder the sports to match what is in Brightspace, use fct_reorder() (to see how it works, the help menu is your friend!).
# Packages used below (skip if already loaded, e.g. through library(tidyverse))
library(ggplot2)
library(forcats)
ggplot(
  # Reading the data directly into ggplot (a bit of a shortcut)
  data = read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/olympics.csv"),
  # Mapping age to x and sport, reordered by median age, to y
  mapping = aes(
    x = age,
    y = fct_reorder(sport, age)
  )
) +
  # Creating the box plots of age by sport
  geom_boxplot(
    fill = "#0081c8",
    color = "#FCB131"
  ) +
  # Changing the labels and adding a title and subtitle
  labs(
    x = NULL, # Removing the space for an x-axis label
    y = "Sport",
    title = "Age of Olympians at the start of the Olympics by sport",
    subtitle = "10 of the most common sports only"
  ) +
  # Changing the default theme and centering the title and subtitle
  theme_bw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12)
  )
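To see what fct_reorder() is doing in the mapping above, here is a minimal sketch on a toy data frame. The ages are made up for illustration and are not taken from olympics.csv. By default the levels are sorted by the median of the second argument within each level.

```r
library(forcats)

# Toy data (hypothetical values, not from olympics.csv)
toy <- data.frame(
  sport = c("Judo", "Judo", "Sailing", "Sailing", "Swimming", "Swimming"),
  age   = c(25, 27, 30, 34, 20, 22)
)

# Without reordering, the levels are alphabetical
default_levels <- levels(factor(toy$sport))
default_levels
#> [1] "Judo"     "Sailing"  "Swimming"

# Reordered by median age within each sport (medians: 26, 32, 21)
by_age <- levels(fct_reorder(toy$sport, toy$age))
by_age
#> [1] "Swimming" "Judo"     "Sailing"

# .desc = TRUE flips the order
by_age_desc <- levels(fct_reorder(toy$sport, toy$age, .desc = TRUE))
by_age_desc
#> [1] "Sailing"  "Judo"     "Swimming"
```

On a box plot, the first level sits closest to the origin of the axis, so the sport with the lowest median age ends up at the bottom of the y-axis.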
The used cars.csv file has info about 400 cars listed on Craigslist in 2023 (GitHub link: https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/used%20cars.csv).
The columns are:
Create the first graph for question 2 seen in Brightspace. Save it as gg_q2a and make sure to display it in the knitted document
gg_q2a <-
  ggplot(
    # Reading the data directly into ggplot()
    data = read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/used%20cars.csv"),
    # Mapping the columns to the corresponding aesthetics
    mapping = aes(
      x = odometer,
      y = price,
      color = manufacturer
    )
  ) +
  # Adding the points and a trend line
  geom_point() +
  geom_smooth(
    method = "lm",      # Straight line
    se = FALSE,         # No shaded confidence region
    formula = y ~ x,    # Default formula
    show.legend = FALSE, # No line in the color guide
    color = "black"     # Black line (overrides color = manufacturer)
  )
gg_q2a
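Setting color = "black" inside geom_smooth() does more than change the line color: because the constant replaces the inherited color mapping in that layer, the layer no longer groups by manufacturer and a single overall line is fitted. A small sketch of that behavior, using the built-in mtcars data as a stand-in for the used cars file (an assumption made only so the example runs without a download):

```r
library(ggplot2)

# mtcars stands in for the used cars data; cyl plays the role of manufacturer
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Color inherited from ggplot(): one fitted line per cyl group
by_group <- p +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# Constant color set in the layer: one fitted line for all points
overall <- p +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "black")

# Count the groups that the smoothing layer (layer 2) was fitted to
n_groups_by_group <- length(unique(layer_data(by_group, 2)$group))
n_groups_overall  <- length(unique(layer_data(overall, 2)$group))
n_groups_by_group
#> [1] 3
n_groups_overall
#> [1] 1
```

If separate black lines per manufacturer were wanted instead, adding aes(group = manufacturer) to the geom_smooth() layer would be one way to get them.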
Using gg_q2a, add the title, subtitle, and caption and change/remove the labels for x, y, and color as seen in Brightspace. Change the legend to match Brightspace and move the legend to the top right corner of the plot.
Save it as gg_q2b and display it in the knitted document.
gg_q2b <-
  gg_q2a +
  # Changing the labels and adding the title, subtitle, and caption
  labs(
    x = "Mileage",
    y = NULL,
    color = NULL,
    title = "Used Cars for Sale",
    subtitle = "Listed on Craigslist in 2023",
    caption = "Data: kaggle.com"
  ) +
  # Changing the theme and moving the legend to inside the plot
  theme_classic() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.9)
  )
gg_q2b
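One caveat on the legend placement: legend.position.inside was introduced in ggplot2 3.5.0, and earlier releases take the coordinates directly in legend.position. A hedged sketch of a version-tolerant theme, again using mtcars as a stand-in data set (the object name inside_legend is just for illustration):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  theme_classic()

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer syntax: say "inside", then give the coordinates separately
  inside_legend <- theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.9)
  )
} else {
  # Older syntax: coordinates go straight into legend.position
  inside_legend <- theme(legend.position = c(0.9, 0.9))
}

p_inside <- p + inside_legend
p_inside
```

The coordinates are fractions of the plot panel, so c(0.9, 0.9) places the legend near the top right corner. Apply theme_classic() before theme(), as in the solution, since a complete theme added afterward would reset the legend position.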
Make the final changes to the graph in gg_q2b that can be seen in Brightspace. Make sure to pay close attention to the color guide!
The colors used are:
Ford: #47a8e5, Chevrolet: #D1AD57, Honda: #CC0000, Jeep: #485F2B
gg_q2b +
  # Changing the y-axis labels to have $ and be every $10k
  scale_y_continuous(
    labels = scales::label_dollar(),
    breaks = seq(from = 0, to = 6e4, by = 1e4)
  ) +
  # x-axis not in scientific notation
  scale_x_continuous(
    labels = scales::label_comma()
    # Optional: breaks = seq(from = 0, to = 2e5, by = 25e3)
  ) +
  # Changing the colors used and labels for the manufacturer
  scale_color_manual(
    labels = c(chevrolet = "Chevrolet", ford = "Ford",
               honda = "Honda", jeep = "Jeep"),
    values = c(chevrolet = "#D1AD57", ford = "#47a8e5",
               honda = "#CC0000", jeep = "#485F2B")
  )
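The label_*() helpers from scales return formatting functions, which is why they are called with empty parentheses inside the scale. A quick way to check what the axis labels will look like is to apply them to a few numbers directly (the input values below are arbitrary examples):

```r
# label_dollar() and label_comma() each return a function
dollar_fmt <- scales::label_dollar()
comma_fmt  <- scales::label_comma()

dollar_labels <- dollar_fmt(c(0, 10000, 60000))
dollar_labels
#> [1] "$0"      "$10,000" "$60,000"

comma_labels <- comma_fmt(c(25000, 200000))
comma_labels
#> [1] "25,000"  "200,000"

# The y-axis breaks used above: 0 to 60,000 in steps of 10,000
y_breaks <- seq(from = 0, to = 6e4, by = 1e4)
length(y_breaks)
#> [1] 7
```

Naming the elements of labels and values in scale_color_manual() by the raw data values (chevrolet, ford, ...) keeps each color tied to the right manufacturer regardless of the order in which they are written.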
Create a set of 4 scatter plots with a fitted line in the same overall graph - 1 for each manufacturer.
Each individual plot should have odometer on the x-axis, price on the y-axis, and age of the car represented by color.
ggplot(
  data = read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/used%20cars.csv"),
  mapping = aes(
    x = odometer,
    y = price,
    color = 2023 - year # age = 2023 - year
  )
) +
  # Adding points and a trend line
  geom_point() +
  geom_smooth(
    method = "lm",
    se = FALSE,
    formula = y ~ x,
    show.legend = FALSE,
    color = "black"
  ) +
  # Adding context
  labs(
    x = "Mileage",
    y = NULL,
    color = "Age",
    title = "Used Cars for Sale",
    subtitle = "Listed on Craigslist in 2023",
    caption = "Data: kaggle.com"
  ) +
  # Changing the theme
  theme_bw() +
  # Creating 4 plots (small multiples), one for each manufacturer
  facet_wrap(
    facets = vars(manufacturer)
  )
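An alternative to computing the age inside aes() is to add it as a column first, which keeps the mapping easy to read. The sketch below uses a small made-up data frame (toy_cars, with hypothetical values) that only borrows the column names used above, and confirms that facet_wrap() produces one panel per manufacturer:

```r
library(ggplot2)

# Toy listings (hypothetical values; only the column names match the real file)
toy_cars <- data.frame(
  manufacturer = rep(c("chevrolet", "ford", "honda", "jeep"), each = 3),
  year     = c(2010, 2015, 2020, 2008, 2014, 2019, 2012, 2016, 2021, 2011, 2017, 2022),
  odometer = c(150000, 90000, 30000, 180000, 100000, 40000,
               120000, 80000, 20000, 140000, 70000, 10000),
  price    = c(6000, 12000, 25000, 5000, 11000, 27000,
               8000, 14000, 24000, 9000, 20000, 38000)
)

# Age of the car at the time of listing
toy_cars$age <- 2023 - toy_cars$year

p_facets <- ggplot(toy_cars, aes(x = odometer, y = price, color = age)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "black") +
  facet_wrap(facets = vars(manufacturer))

# Each manufacturer gets its own panel, and the line is fitted within each panel
n_panels <- length(unique(layer_data(p_facets, 1)$PANEL))
n_panels
#> [1] 4
```

Because age is numeric, the color guide is a continuous color bar rather than a discrete legend.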