The code chunk below creates two data sets, strava_full and by_day.
You’ll need both data sets for the first question.
strava_full <-
  #read_csv("C:/Users/Jacob/OneDrive - University of Vermont/Data/strava full.csv") |>
  read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/strava%20full.csv") |>
  janitor::clean_names() |>
  dplyr::select(date = activity_date,
                distance)
# Creating a data set that has a single column:
# one row per day from 2023-06-25, the first date in the data set, through today's date (Sys.Date())
by_day <-
  data.frame(
    date = seq(from = as.Date("2023-06-25"), to = Sys.Date(), by = "day")
  )
To create the graph seen in Brightspace, you’ll first need to clean and wrangle the data into a form you can use to plot the cumulative distance per day for May, June, and July 2024.
A cumulative sum adds the current value to all of the values before it. If we have a vector of
\[[5, 3, 10, 2]\]
The cumulative vector would be:
\[[5, 5 + 3, 5 + 3 + 10, 5 + 3 + 10 + 2]\]
The function to calculate the cumulative sum is cumsum(). You’ll need to use it along with the appropriate dplyr verbs to get the data into the form you’ll need to make the graph.
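As a quick check of the definition above, here is a minimal sketch that applies cumsum() to the example vector and then to a small grouped data frame. The names x, running_total, toy, and toy_cum are illustrative only, and the toy values are made up; the grouped version assumes dplyr 1.1.0 or later for the .by argument.

```r
library(dplyr)

# Example vector from the text
x <- c(5, 3, 10, 2)

# cumsum() returns the running total at each position
running_total <- cumsum(x)
running_total
# [1]  5  8 18 20

# Hypothetical toy data: the running total restarts within each month
toy <- data.frame(
  month    = c("May", "May", "June", "June"),
  distance = c(5, 3, 10, 2)
)

toy_cum <- toy |>
  mutate(
    .by = month,
    dist_tot = cumsum(distance)
  )
toy_cum$dist_tot
# [1]  5  8 10 12
```

Grouping before calling cumsum() is what makes each month's line start from zero rather than continuing the previous month's total.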
Additionally, the conversion from km to mi is 1 km = 0.621371 mi.
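A small sketch of the conversion, using made-up distances; km_to_mi, distance_km, and distance_mi are illustrative names, not columns in the data.

```r
# Conversion factor given in the text: 1 km = 0.621371 mi
km_to_mi <- 0.621371

# Hypothetical distances in kilometers
distance_km <- c(10, 42.195)

# Multiplying by the factor converts the whole vector at once
distance_mi <- distance_km * km_to_mi
round(distance_mi, 2)
# [1]  6.21 26.22
```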
This question is broken into two code chunks. In the first chunk below, you’ll wrangle the data, and in the second chunk, you’ll create the graph. Make sure to display both the data set created and the graph!
bike <-
  # Merging the two data sets together by date
  left_join(
    x = by_day,
    y = strava_full |>
      # Converting the date column in strava to a date type column
      # and converting distance from km to mi
      mutate(
        date = lubridate::mdy_hms(date) |> as.Date(),
        distance = distance * 0.621371
      ),
    by = "date"
  ) |>
  # Combining the rows of the same date together by adding the distances
  summarize(
    .by = date,
    distance = sum(distance)
  ) |>
  # Creating the day, month, and year columns
  mutate(
    year = year(date),
    month = month(date, label = TRUE, abbr = FALSE),
    day = mday(date),
    # Replacing the NA values with 0
    distance = if_else(is.na(distance), 0, distance)
  ) |>
  # Calculating the cumulative distance for each month
  mutate(
    .by = c(year, month),
    dist_tot = cumsum(distance)
  ) |>
  # Keeping only May, June, July in 2024
  filter(
    year == 2024,
    month %in% c("May", "June", "July")
  )
tibble(bike)
## # A tibble: 92 × 6
## date distance year month day dist_tot
## <date> <dbl> <dbl> <ord> <int> <dbl>
## 1 2024-05-01 0 2024 May 1 0
## 2 2024-05-02 8.60 2024 May 2 8.60
## 3 2024-05-03 0 2024 May 3 8.60
## 4 2024-05-04 22.6 2024 May 4 31.2
## 5 2024-05-05 0 2024 May 5 31.2
## 6 2024-05-06 0 2024 May 6 31.2
## 7 2024-05-07 11.0 2024 May 7 42.2
## 8 2024-05-08 0 2024 May 8 42.2
## 9 2024-05-09 0 2024 May 9 42.2
## 10 2024-05-10 0 2024 May 10 42.2
## # ℹ 82 more rows
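The zeros in the output above are days with no recorded activity. A minimal sketch of why joining onto a complete calendar matters, using made-up toy data (calendar, rides, and filled are hypothetical names, not objects from the assignment):

```r
library(dplyr)

# Hypothetical toy data: a full 4-day calendar and rides on only two of those days
calendar <- data.frame(
  date = seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-04"), by = "day")
)
rides <- data.frame(
  date     = as.Date(c("2024-05-02", "2024-05-04")),
  distance = c(8.6, 22.6)
)

filled <-
  # Keeping every calendar day; days without a ride get NA for distance
  left_join(x = calendar, y = rides, by = "date") |>
  mutate(
    # Replacing the NA values with 0 so cumsum() does not propagate NA
    distance = if_else(is.na(distance), 0, distance),
    dist_tot = cumsum(distance)
  )

filled$dist_tot
# [1]  0.0  8.6  8.6 31.2
```

Without the calendar table, days with no ride would be missing rows, and without the NA replacement, cumsum() would return NA from the first missing day onward.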
Using the data frame created above, form the graph seen in Brightspace.
gg_bike <-
  # Mapping the 3 aesthetics to the 3 columns
  ggplot(
    data = bike,
    mapping = aes(
      x = day,
      y = dist_tot,
      color = month
    )
  ) +
  # Creating the lines in the graph
  geom_line(
    linewidth = 1
  ) +
  # Changing the theme to theme_bw()
  theme_bw() +
  # Centering the title and subtitle and italicizing the caption
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic")
  ) +
  # Adding the appropriate labels and titles
  labs(
    title = "Cumulative distance travelled by bike",
    subtitle = "May, June, July 2024",
    caption = "Data: strava.com",
    x = "Day of the month",
    y = "Total Distance (mi)",
    color = NULL
  )
gg_bike
For line graphs, it’s often better to include the group at the end of the line than in a legend so the viewer of the graph doesn’t have to look from the graph to the legend and back to the graph. Create the graph seen in Brightspace!
gg_bike +
  # Adding the month and rounded mileage
  ggrepel::geom_text_repel(
    # The data for the text should only have the last day of each month
    data = bike |>
      filter(
        .by = month,
        day == max(day)
      ),
    # Creating 1 character value per row with month and distance
    mapping = aes(
      label = paste0(month, "\n", round(dist_tot), " mi")
    ),
    # Moving the text a little to the right
    nudge_x = 1.5
  ) +
  # Removing the legend
  theme(
    legend.position = "none"
  )
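To see what the label layer receives, here is a small sketch of the same per-group filter on made-up data; toy and line_ends are hypothetical names, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: cumulative distance for a few days in two months
toy <- data.frame(
  month    = c("May", "May", "May", "June", "June"),
  day      = c(1, 2, 3, 1, 2),
  dist_tot = c(0, 8.6, 8.6, 4, 10.4)
)

line_ends <- toy |>
  # Keeping only the last day within each month
  filter(
    .by = month,
    day == max(day)
  ) |>
  # Building the two-line label: month on top, rounded mileage below
  mutate(
    label = paste0(month, "\n", round(dist_tot), " mi")
  )

line_ends$label
# [1] "May\n9 mi"   "June\n10 mi"
```

The filter leaves one row per month, so each line in the graph gets exactly one label at its right-hand end.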