Problem 1 [20 points]
Create a plotly graph of your choosing that represents at least two
variables, one of which must be a categorical variable.
This plot can be a scatter plot, overlayed density plots (graphing
variable is continuous, separate densities grouped by categorical
variable), etc. choropleth maps could also be on the list…you have to
admit they look kinda cool.
The graph must include:
1. customized hover text that is informative to the graphing
elements in the plot
2. separate color to represent groups
3. labeled axes and appropriate title
Include at least a 1-paragraph discussion about the graph. Discuss
what is being plotted and what information is being displayed in the
graph. Discuss any information that the reader may gain from hovering
the cursor over graphing elements. Discuss any issues/chalenges you had
(if any) while making the plot, and you you dealt with or overcame
them.
Read in the dataset and data clean (if necessary)
#Read in the data
nascar <- read_xlsx("nascar2.xlsx", sheet = 1)
Make a scatterplot based on Finish Pos VS Wavg Speed, grouped by
team
nascar %>%
filter(year == 2024, race_name == "Pocono") %>%
group_by(driver) %>%
plot_ly(x = ~fin_pos, y = ~wavg_speed, color = ~manufacturer,
colors = c("#EDB53C", "#1d91c0", "#081d58"),
hoverinfo = "text",
text = ~paste("Driver:", driver,
"<br>Team:", team,
"<br>Finish Position: ", fin_pos,
"<br>Wavg Position: ", wavg_speed,
"<br>Diff: ", finVspeed)) %>%
add_markers(showlegend = T, size = 3) %>%
add_annotations(x = 12.5, y = 32.5, font = list(color = 'darkgreen',size = 14),
text = "Finished Above<br> Weighted Position",showarrow = F) %>%
add_annotations(x = 32.5, y = 12.5, font = list(color = 'darkred',size = 14),
text = "Finished Below<br> Weighted Position",showarrow = F) %>%
layout(yaxis = list(title = "Weighted Average Running Position"),
xaxis = list(title = "Finish Position",
range = list(0,40), dtick = 5, tick0 = 0, tickmode = "linear"),
title = "Pocono 2024 Finish Position VS Wavg Position",
shapes = list(type = "line", x0 = 0, y0 = 0, x1 = 40, y1 = 40,
line = list(color = "black", width = 1)))
Discussion
When looking at a general race in NASCAR, one thing that is
evident is drivers finish positions. It is most noticed when a driver
who has not raced near the front all day steals a top 5 late through
strategy. One thing that I have an interest in is looking at a drivers
weighted running position versus how they finish. This is what I decided
to plot. I chose Pocono in 2024 as my race to look into this, as this
was a race that I attended. This scatterplot plots each driver, with
their finish position being on the x-axis and the weighted average
running position for the race on the y-axis. I added a y=x line as well,
which represents a driver’s finish position and weighted running
position being equal. I decided to color code the points by
manufacturer.
When you hover over a point, it shows you the driver’s name,
the team that the driver is on, their finish position, their weighted
average position, and the difference between the two. This plot shows us
drivers who finished better than their weighted position (Points that
are above the line) and drivers who finished worse than their weighted
position (drivers who were below the line). Two toyotas stand out to me.
Bubba Wallace finished 10 positions better than his weighted position,
which means that they made the most out of a rather mediocre day. On the
flip side, Ty Gibbs finished 9 positions worse than his weighted
position, meaning he spoiled a day where he had a good
car.
Overall, I did not have many issues when making this plot.
The biggest issue that I did have was learning how to add the text to
the plot. I tried adding it in the plot_ly command first, which only
worked for the “finished above weighted position” text and not the other
one. I also tried add_text, and the same problem arose.It wasn’t until I
used the add_annotations command and called it twice that both features
worked.
Problem 2 [20 points]
Create an animated plotly graph with a data set of your choosing.
This can be, but does not have to be a scatter plot. Also, the animation
does not have to take place over time. As mentioned in the notes, the
frame can be set to a categorical variable. However, the categories the
frames cycle through should be organized (if needs be) such that the
progression through them shows some pattern.
This graph should include:
1. Aside from the graphing variable, a separate categorical
variable. For example, in our animated scatter plot we color grouped the
points by continent.
2. Appropriate axis labels and a title
3. Augment the frame label to make it more visible. This can include
changing the font size and color to make it stand out more, and/or
moving the frame label to a new location in the plotting region. Note,
if you do this, make sure it is till clearly visible and does not
obstruct the view of your plot.
Include at least a 1-paragraph discussion about the plot. Discuss
what you are plotting and what trends can be seen throughout the
animation. Discuss any issues, if any, you ran into in making the plot
and how you overcame them.
Load in the Pocono 2024 All Laps dataset and clean it up
Poco24 <- read.csv("Pocono 24 Laps.csv")
#Perform some data cleaning
Poco24 <- Poco24 %>%
arrange(lap_number, running_position) %>%
group_by(lap_number) %>%
mutate(lap_rank = rank(lap_time))
#Make the lap variable into a factor
Poco24$lap_number <- factor(Poco24$lap_number, levels = sort(unique(Poco24$lap_number)))
#identify the caution laps
cautions <- unique(Poco24$lap_number[Poco24$statuses == "caution"])
#Place NA's in where the cautions are since the cars are not being scored "live"
Poco24$lap_rank <- ifelse(Poco24$lap_number %in% cautions, NA, Poco24$lap_rank)
#Add the image vectors into the dataset for the plot
Poco24$image <- paste0("PocoLogo/", Poco24$driver_name, ".png")
Create a top10 dataframe and add colors
Poco24_top10 <- Poco24 %>%
group_by(lap_number) %>%
filter(running_position <= 10) %>%
ungroup()
Create and Generate the Point Plot
Poco24_top10 %>%
group_by(driver_name) %>%
plot_ly(x = ~as.numeric(lap_number), y = ~running_position, color = ~as.factor(driver_id),
hoverinfo = "text", mode = "markers", showlegend = FALSE,
text = ~paste("Driver:", driver_name,
"<br>Team:", team,
"<br>Lap Rank: ", lap_rank)) %>%
add_text(x = 80, y = 5.5, text = ~lap_number, frame = ~lap_number,
textfont = list(size = 120, showlegend = FALSE, color = toRGB("grey93"))) %>%
add_markers(frame = ~lap_number, ids = ~driver_name,
size = 3, showlegend = FALSE) %>%
layout(yaxis = list(range=list(10.5,0),dtick = 1, tick0 = 0, tickmode = "linear",
title = "Running Position"),
xaxis = list(title = "Lap Number",
range = list(0,161), dtick = 10, tick0 = 0, tickmode = "linear"),
title = "Pocono Top 10 Point Chart") %>%
animation_slider(hide = TRUE) %>%
animation_opts(frame = 2000, transition = 1000, redraw = T)
Discussion
Using a similar theme from above, I wanted to stay with the
2024 Pocono race. When I saw that animated plotly charts were a thing, I
thought of a cool line chart that I saw NASCAR post once regarding the
standings. In this plot, they moved across the x-axis by race, with line
plots showing the top 10 in points, moving up and down as the drivers
changed positions. I thought it would be neat to do this with the top 10
in running order during that Pocono Race.
For this plot, the points (colored by driver name) move
across the x-axis, showing the jokceying of position from the drivers in
the top 10. I elected to hide the legend due to the sheer size of it,
having 37 drivers. They come in and out as the laps go on as the running
order changes. When you hover over the points, it displays information
of driver name, team name, and the rank amongst other drivers during
that lap. In this plot, you can see certain drivers stay in the top 10
for most of the day. Common names include Ryan Blaney, Brad Keselowski,
and Denny Hamlin. Additionally, the running order was changing
dramatically at times. This was due to cautions being called, which
usually shake up the running order as some drivers will pit while others
stay out.
I had a lot of issues making this plot if I’m being
completely honest. First and foremost, I wanted to make a lineplot. I
could not get the lines to work, as they would show up ahead of time and
then be redrawn overtop, looking really messy and bad. Next, as the
points plotted moving along the laps, some frames would leave behind
“ghost points (that is what I have decided to call them)”. It seemed
really random, only happening with William Byron and occasionally Tyler
Reddick. I tried a lot of different commands and additions to the
plot_ly commands, such as setting ids in the add_markers, adding new
colors, changing the colors variable, and others changes. This would not
stop the issue. I think I came to the conclusion that this may be a
plotting issue with a certain shade of grey, as it only happened with a
certain shade. I did run out of time to set custom colors, which I would
like to go back and do at some point soon. My ultimate solution was to
limit my dataframe from all laps to only the laps of the drivers in the
top 10 at each point. For some reason, this solved the problem with the
“ghost points”. It did introduce some minor “blipping”, however, I
believe that this is more reasonable. Drivers would re-enter the top 10
and their point would travel from the last time they were in the top
10.