Load in any necessary package

library(ggplot2)
library(tidyverse)
library(readxl)
library(showtext)
library(viridis)
library(ggExtra)
library(plotly)
library(dslabs)
library(rjson)


Note: on data sets

You may use any data of your choosing in the following problems, but I would suggest you choose a data set you find interesting or would give an interesting graph (so, don’t use something like the old iris data set). You will get more out of the project in the end, and it will look better to those in the future you are showing it to. If the data set comes from an R package then reference this. If the data set is from elsewhere, then upload a copy to blackboard (.csv format).


Problem 1 [20 points]

Create a plotly graph of your choosing that represents at least two variables, one of which must be a categorical variable.

This plot can be a scatter plot, overlayed density plots (graphing variable is continuous, separate densities grouped by categorical variable), etc. choropleth maps could also be on the list…you have to admit they look kinda cool.

The graph must include:

1. customized hover text that is informative to the graphing elements in the plot

2. separate color to represent groups

3. labeled axes and appropriate title

Include at least a 1-paragraph discussion about the graph. Discuss what is being plotted and what information is being displayed in the graph. Discuss any information that the reader may gain from hovering the cursor over graphing elements. Discuss any issues/chalenges you had (if any) while making the plot, and you you dealt with or overcame them.

Read in the dataset and data clean (if necessary)

#Read in the data
nascar <- read_xlsx("nascar2.xlsx", sheet = 1)

Make a scatterplot based on Finish Pos VS Wavg Speed, grouped by team

nascar %>%
  filter(year == 2024, race_name == "Pocono") %>%
  group_by(driver) %>%
  plot_ly(x = ~fin_pos, y = ~wavg_speed, color = ~manufacturer, 
          colors = c("#EDB53C", "#1d91c0", "#081d58"),
          hoverinfo = "text",
          text = ~paste("Driver:", driver,
                        "<br>Team:", team,
                        "<br>Finish Position: ", fin_pos,
                        "<br>Wavg Position: ", wavg_speed,
                        "<br>Diff: ", finVspeed)) %>%
  add_markers(showlegend = T, size = 3) %>%
  add_annotations(x = 12.5, y = 32.5, font = list(color = 'darkgreen',size = 14),
                  text = "Finished Above<br> Weighted Position",showarrow = F) %>%
  add_annotations(x = 32.5, y = 12.5,  font = list(color = 'darkred',size = 14),
                  text = "Finished Below<br> Weighted Position",showarrow = F) %>%
  layout(yaxis = list(title = "Weighted Average Running Position"),
         xaxis = list(title = "Finish Position",
                      range = list(0,40), dtick = 5, tick0 = 0, tickmode = "linear"),
         title = "Pocono 2024 Finish Position VS Wavg Position",
         shapes = list(type = "line", x0 = 0, y0 = 0, x1 = 40, y1 = 40,
                       line = list(color = "black", width = 1)))

Discussion

When looking at a general race in NASCAR, one thing that is evident is drivers finish positions. It is most noticed when a driver who has not raced near the front all day steals a top 5 late through strategy. One thing that I have an interest in is looking at a drivers weighted running position versus how they finish. This is what I decided to plot. I chose Pocono in 2024 as my race to look into this, as this was a race that I attended. This scatterplot plots each driver, with their finish position being on the x-axis and the weighted average running position for the race on the y-axis. I added a y=x line as well, which represents a driver’s finish position and weighted running position being equal. I decided to color code the points by manufacturer.

When you hover over a point, it shows you the driver’s name, the team that the driver is on, their finish position, their weighted average position, and the difference between the two. This plot shows us drivers who finished better than their weighted position (Points that are above the line) and drivers who finished worse than their weighted position (drivers who were below the line). Two toyotas stand out to me. Bubba Wallace finished 10 positions better than his weighted position, which means that they made the most out of a rather mediocre day. On the flip side, Ty Gibbs finished 9 positions worse than his weighted position, meaning he spoiled a day where he had a good car.

Overall, I did not have many issues when making this plot. The biggest issue that I did have was learning how to add the text to the plot. I tried adding it in the plot_ly command first, which only worked for the “finished above weighted position” text and not the other one. I also tried add_text, and the same problem arose.It wasn’t until I used the add_annotations command and called it twice that both features worked.

Problem 2 [20 points]

Create an animated plotly graph with a data set of your choosing. This can be, but does not have to be a scatter plot. Also, the animation does not have to take place over time. As mentioned in the notes, the frame can be set to a categorical variable. However, the categories the frames cycle through should be organized (if needs be) such that the progression through them shows some pattern.

This graph should include:

1. Aside from the graphing variable, a separate categorical variable. For example, in our animated scatter plot we color grouped the points by continent.

2. Appropriate axis labels and a title

3. Augment the frame label to make it more visible. This can include changing the font size and color to make it stand out more, and/or moving the frame label to a new location in the plotting region. Note, if you do this, make sure it is till clearly visible and does not obstruct the view of your plot.

Load in the Pocono 2024 All Laps dataset and clean it up

Poco24 <- read.csv("Pocono 24 Laps.csv")

#Perform some data cleaning
Poco24 <- Poco24 %>%
  arrange(lap_number, running_position) %>%
  group_by(lap_number) %>%  
  mutate(lap_rank = rank(lap_time))

#Make the lap variable into a factor
Poco24$lap_number <- factor(Poco24$lap_number, levels = sort(unique(Poco24$lap_number)))

#identify the caution laps
cautions <- unique(Poco24$lap_number[Poco24$statuses == "caution"])

#Place NA's in where the cautions are since the cars are not being scored "live"
Poco24$lap_rank <- ifelse(Poco24$lap_number %in% cautions, NA, Poco24$lap_rank)

#Add the image vectors into the dataset for the plot
Poco24$image <- paste0("PocoLogo/", Poco24$driver_name, ".png")

Create a top10 dataframe and add colors

Poco24_top10 <- Poco24 %>%
  group_by(lap_number) %>%
  filter(running_position <= 10) %>%
  ungroup()

drivers <- unique(Poco24$driver_name)
colors <- c("black", "#cb313e", "black", "#71cf42", "white", "#0f9297", "#083760",
            "#d60227", "gold", "dodgerblue", "grey20", "darkorange", "blue", "white",
            "steelblue", "turquoise", "palegreen", "black", "orange", "yellowgreen",
            "orange", "white", "red", "white", "orange", "white", "forestgreen",
            "purple", "tan", "white", "navy", "green", "black", "dodgerblue", 
            "grey50", "royalblue", "gold")

borders <- c("#83c435", "black", "#19888a", "white", "purple", "black", "yellow",
             "white", "red", "red", "red", "royalblue", "white", "hotpink",
             "turquoise", "white", "black", "red", "white", "black", 
             "steelblue", "red", "orange", "maroon", "black", "royalblue", "white",
             "yellowgreen", "turquoise", "#0f9297", "orange", "navy", "yellow", "darkorange",
             "red", "red", "red")

df <- data.frame(drivers, colors, borders)

Poco24$color <- df$colors[match(Poco24$driver_name, df$drivers)]
Poco24$border <- df$borders[match(Poco24$driver_name, df$drivers)]

Create and Generate the Point Plot

Poco24 %>%
  group_by(driver_name) %>%
  plot_ly(x = ~as.numeric(lap_number), y = ~running_position, 
          hoverinfo = "text", mode = "markers", showlegend = FALSE,
          text = ~paste("Driver:", driver_name,
                        "<br>Team:", team,
                        "<br>Lap Rank: ", lap_rank)) %>%
  add_text(x = 80, y = 5.5, text = ~lap_number, frame = ~lap_number,
           textfont = list(size = 120, showlegend = FALSE, color = toRGB("white"))) %>%
  add_markers(frame = ~lap_number, ids = ~driver_name, showlegend = FALSE, 
              marker = list(color = ~color, size = 25,
                            line = list(color = ~border, width = 3))) %>%
  layout(yaxis = list(range=list(10.5,0),dtick = 1, tick0 = 0, tickmode = "linear",
                      title = "Running Position"),
         xaxis = list(title = "Lap Number", 
                      range = list(0,161), dtick = 10, tick0 = 0, tickmode = "linear"),
         title = "Pocono Top 10 Point Chart",
         paper_bgcolor="peachpuff", plot_bgcolor="peachpuff") %>%
  animation_slider(hide = FALSE, currentvalue = list(font = list(color="peachpuff"))) %>%
  animation_opts(frame = 2000, transition = 1000, redraw = T)

Discussion

Using a similar theme from above, I wanted to stay with the 2024 Pocono race. When I saw that animated plotly charts were a thing, I thought of a cool line chart that I saw NASCAR post once regarding the standings. In this plot, they moved across the x-axis by race, with line plots showing the top 10 in points, moving up and down as the drivers changed positions. I thought it would be neat to do this with the top 10 in running order during that Pocono Race.

For this plot, the points (colored by driver name) move across the x-axis, showing the jokceying of position from the drivers in the top 10. I elected to hide the legend due to the sheer size of it, having 37 drivers. They come in and out as the laps go on as the running order changes. When you hover over the points, it displays information of driver name, team name, and the rank amongst other drivers during that lap. In this plot, you can see certain drivers stay in the top 10 for most of the day. Common names include Ryan Blaney, Brad Keselowski, and Denny Hamlin. Additionally, the running order was changing dramatically at times. This was due to cautions being called, which usually shake up the running order as some drivers will pit while others stay out.

I had a lot of issues making this plot if I’m being completely honest. First and foremost, I wanted to make a lineplot. I could not get the lines to work, as they would show up ahead of time and then be redrawn overtop, looking really messy and bad. Next, as the points plotted moving along the laps, some frames would leave behind “ghost points (that is what I have decided to call them)”. It seemed really random, only happening with William Byron and occasionally Tyler Reddick. I tried a lot of different commands and additions to the plot_ly commands, such as setting ids in the add_markers, adding new colors, changing the colors variable, and others changes. This would not stop the issue. I think I came to the conclusion that this may be a plotting issue with a certain shade of grey, as it only happened with a certain shade. I did run out of time to set custom colors, which I would like to go back and do at some point soon. My ultimate solution was to limit my dataframe from all laps to only the laps of the drivers in the top 10 at each point. For some reason, this solved the problem with the “ghost points”. It did introduce some minor “blipping”, however, I believe that this is more reasonable. Drivers would re-enter the top 10 and their point would travel from the last time they were in the top 10.