Reading in the data

The code chunk below will create two data sets:

  1. strava_full: A data set on 149 recorded bike activities with two columns:
     1. date: The date of the activity. A date can appear multiple times if there were multiple activities on the same day.
     2. distance: The total distance of the trip in kilometers (km).
  2. by_day: A data set with one column: each day from the earliest date in strava_full through today's date.

You’ll need both data sets for the first question.

# Packages used in the chunks below (janitor and ggrepel are called with ::)
library(dplyr)
library(lubridate)
library(ggplot2)

strava_full <- 
  #read_csv("C:/Users/Jacob/OneDrive - University of Vermont/Data/strava full.csv") |> 
  read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/strava%20full.csv") |> 
  janitor::clean_names() |> 
  dplyr::select(date = activity_date,
                distance)

# Creating a data set that has a single column:
# every day from 2023-06-25 (the first date in strava_full) through today (Sys.Date())
by_day <- 
  data.frame(
    date = seq(from = as.Date("2023-06-25"), to = Sys.Date(), by = "day")
  )
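The reason for by_day is that days without a ride never appear in strava_full. A left join from a complete daily calendar keeps those days as rows with a missing distance, which can then be set to 0. Below is a minimal sketch on made-up toy data (the rides, calendar, joined, and daily objects are hypothetical and not part of the assignment data):

```r
library(dplyr)

# Hypothetical toy rides (not Strava data): two rides on May 2, none on May 3-4
rides <- data.frame(
  date = as.Date(c("2024-05-01", "2024-05-02", "2024-05-02")),
  distance = c(5, 3, 10)
)

# Complete calendar: one row per day, built the same way as by_day
calendar <- data.frame(
  date = seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-04"), by = "day")
)

# left_join() keeps every calendar day; days without rides get NA distance
joined <- left_join(x = calendar, y = rides, by = "date")

# One row per day: add same-day rides together, then turn NA into 0
daily <- joined |>
  summarize(.by = date, distance = sum(distance)) |>
  mutate(distance = coalesce(distance, 0))

daily$distance
# [1]  5 13  0  0
```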

Question 1: Total distance travelled by day in May, June, and July 2024

To create the graph shown in Brightspace, you'll first need to clean and wrangle the data into a form you can use to plot the cumulative distance by day for May, June, and July 2024.

A cumulative sum replaces each value with the sum of that value and all the values before it. If we have the vector

\[ [5, 3, 10, 2] \]

the cumulative sum vector is:

\[ [5,\ 5 + 3,\ 5 + 3 + 10,\ 5 + 3 + 10 + 2] = [5, 8, 18, 20] \]

The function to calculate the cumulative sum is cumsum(). You’ll need to use it along with the appropriate dplyr verbs to get the data into the form you’ll need to make the graph.
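A quick sketch of cumsum() on the vector above, and of restarting the running total within each group using mutate() with .by (this assumes dplyr 1.1.0 or later). The toy data frame is hypothetical:

```r
library(dplyr)

# Base R: running total of a numeric vector
cumsum(c(5, 3, 10, 2))
# [1]  5  8 18 20

# Hypothetical toy data: the running total restarts within each month
toy <- data.frame(
  month = c("May", "May", "June", "June"),
  distance = c(5, 3, 10, 2)
)

toy <- mutate(toy, .by = month, dist_tot = cumsum(distance))
toy$dist_tot
# [1]  5  8 10 12
```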

Additionally, the conversion from km to mi is 1 km = 0.621371 mi.
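As a small illustration, the conversion is a single multiplication. km_to_mi() is a hypothetical helper for this sketch; the assignment code simply multiplies the distance column inline:

```r
# Hypothetical helper: convert kilometers to miles (1 km = 0.621371 mi)
km_to_mi <- function(km) {
  km * 0.621371
}

# Works on single values or whole vectors/columns
km_to_mi(c(10, 20))
```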

This question is broken into two code chunks. In the first chunk below, you'll wrangle the data; in the second, you'll create the graph. Make sure to display both the data set you create and the graph!

bike <- 
  # Merging the two data sets together by date
  left_join(
    x = by_day,
    y = strava_full |>
      # Converting the date column in strava to a date type column
      # and converting distance from km to mi
      mutate(
        date = lubridate::mdy_hms(date) |> as.Date(),
        distance = distance * 0.621371
      ),
    by = "date"
  ) |> 
  # Combining the rows of the same date together by adding the distances
  summarize(
    .by = date,
    distance = sum(distance)
  ) |>
  # Creating the day, month, and year columns
  mutate(
    year = year(date),
    month = month(date, label = TRUE, abbr = FALSE),
    day = mday(date),
    # Replacing the NA values (days without a ride) with 0
    distance = if_else(is.na(distance), 0, distance)
  ) |> 
  # Calculating the cumulative distance within each month
  mutate(
    .by = c(year, month),
    dist_tot = cumsum(distance)
  ) |> 
  # Keeping only May, June, and July 2024 (month names assume an English locale)
  filter(
    year == 2024,
    month %in% c("May", "June", "July")
  )

as_tibble(bike)
## # A tibble: 92 × 6
##    date       distance  year month   day dist_tot
##    <date>        <dbl> <dbl> <ord> <int>    <dbl>
##  1 2024-05-01     0     2024 May       1     0   
##  2 2024-05-02     8.60  2024 May       2     8.60
##  3 2024-05-03     0     2024 May       3     8.60
##  4 2024-05-04    22.6   2024 May       4    31.2 
##  5 2024-05-05     0     2024 May       5    31.2 
##  6 2024-05-06     0     2024 May       6    31.2 
##  7 2024-05-07    11.0   2024 May       7    42.2 
##  8 2024-05-08     0     2024 May       8    42.2 
##  9 2024-05-09     0     2024 May       9    42.2 
## 10 2024-05-10     0     2024 May      10    42.2 
## # ℹ 82 more rows
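One way to sanity-check the wrangled data: dist_tot is a running total within each month (as in the May 4 row above, where 8.60 + 22.6 = 31.2), so the last dist_tot of each month should equal the sum of that month's daily distances. The helper check_month_totals() and the toy data below are hypothetical; on the real data you could call check_month_totals(bike) and expect every ok value to be TRUE.

```r
library(dplyr)

# Hypothetical helper: for each month, the last cumulative total should equal
# the sum of that month's daily distances
check_month_totals <- function(df) {
  df |>
    summarize(
      .by = c(year, month),
      last_tot = last(dist_tot, order_by = day),
      month_sum = sum(distance)
    ) |>
    mutate(ok = abs(last_tot - month_sum) < 1e-9)
}

# Hypothetical toy data with the same columns as bike
toy_bike <- data.frame(
  year = 2024,
  month = c("May", "May", "June", "June"),
  day = c(1, 2, 1, 2),
  distance = c(5, 0, 3, 10)
) |>
  mutate(.by = c(year, month), dist_tot = cumsum(distance))

check_month_totals(toy_bike)
```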

Using the data frame created above, create the graph shown in Brightspace.

gg_bike <- 
  # Mapping the 3 aesthetics to the 3 columns
  ggplot(
    data = bike,
    mapping = aes(
      x = day,
      y = dist_tot,
      color = month
    )
  ) + 
  # Creating the lines in the graph
  geom_line(
    linewidth = 1
  ) + 
  # Changing the theme to theme_bw()
  theme_bw() +
  # Centering and enlarging the title, centering the subtitle, and italicizing the caption
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic")
  ) + 
  # Adding the appropriate labels and titles
  labs(
    title = "Cumulative distance travelled by bike",
    subtitle = "May, June, July 2024",
    caption = "Data: strava.com",
    x = "Day of the month",
    y = "Total Distance (mi)",
    color = NULL
  ) 

gg_bike

Question 2: Better line graph

For line graphs, it's often better to label each group at the end of its line rather than in a legend, so the viewer doesn't have to look back and forth between the graph and the legend. Create the graph shown in Brightspace!

gg_bike +
  # Adding the month and rounded mileage
  ggrepel::geom_text_repel(
    # The data for the text should only have the last day of each month
    data = bike |> 
      filter(
        .by = month,
        day == max(day)
      ),
    # Creating 1 character value per row with month and distance
    mapping = aes(
      label = paste0(month, "\n", round(dist_tot), " mi")
    ),
    # Moving the text a little to the right
    nudge_x = 1.5
  ) + 
  # Removing the legend
  theme(
    legend.position = "none"
  )
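The label data in the chunk above keeps one row per month: the last day, where each line ends. A minimal sketch of that step on hypothetical toy data with the same columns as bike (toy_lines and end_labels are illustrative names, not part of the assignment):

```r
library(dplyr)

# Hypothetical toy data shaped like bike
toy_lines <- data.frame(
  month = c("May", "May", "May", "June", "June"),
  day = c(1, 2, 3, 1, 2),
  dist_tot = c(5, 8.4, 12.6, 2, 7.4)
)

# Keep only the last day of each month (one label per line),
# then build the "Month\nNN mi" label text
end_labels <- toy_lines |>
  filter(.by = month, day == max(day)) |>
  mutate(label = paste0(month, "\n", round(dist_tot), " mi"))

end_labels$label
# [1] "May\n13 mi" "June\n7 mi"
```

dplyr::slice_max(day, by = month) would select the same rows here; filter() with .by is used to match the chunk above.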