library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(dslabs)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks plotly::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Note: on data sets

You may use any data of your choosing in the following problems, but I would suggest you choose a data set you find interesting or that would give an interesting graph (so, don’t use something like the old iris data set). You will get more out of the project in the end, and it will look better to those you show it to in the future. If the data set comes from an R package, then reference the package. If the data set is from elsewhere, then upload a copy to Blackboard (.csv format).


Problem 1 [20 points]

Create a plotly graph of your choosing that represents at least two variables, one of which must be a categorical variable.

This plot can be a scatter plot, overlaid density plots (the graphing variable is continuous, with separate densities grouped by a categorical variable), etc. Choropleth maps could also be on the list; you have to admit they look kinda cool.

The graph must include:

  1. customized hover text that is informative about the graphing elements in the plot

  2. separate color to represent groups

  3. labeled axes and appropriate title

weather <- read.csv("StateCollege_Weather.csv")

# converting months into seasons to use as categorical variable
weather$Date <- as.Date(weather$Date, format = "%m/%d/%Y")
weather$Season <- factor(
  format(weather$Date, "%m"),  # two-digit month string, e.g. "01"
  levels = c("12","01","02","03","04","05","06","07","08","09","10","11"),
  labels = c("Winter","Winter","Winter","Spring","Spring","Spring",
             "Summer","Summer","Summer","Fall","Fall","Fall"))

plot_ly(
  data = weather,
  x = ~Avg.Temp.,
  y = ~Avg.RH,
  color = ~Season,
  colors = c("blue", "green", "orange", "brown"),
  type = "scatter",
  mode = "markers",
  text = ~paste("Date:", Date,
                "<br>Temp:", Avg.Temp.,
                "<br>Rel. Humidity:", Avg.RH,
                "<br>Season:", Season),
  hoverinfo = "text") %>%
  layout(
    title = "Temperature vs Relative Humidity in State College",
    xaxis = list(title = "Average Temperature"),
    yaxis = list(title = "Average Relative Humidity"),
    legend = list(title = list(text = "Season")))

Include at least a 1-paragraph discussion about the graph. Discuss what is being plotted and what information is being displayed in the graph. Discuss any information that the reader may gain from hovering the cursor over graphing elements. Discuss any issues/challenges you had (if any) while making the plot, and how you dealt with or overcame them.

Discussion:

This plot shows average daily temperature against average daily relative humidity in State College for January through October 2025. The points are colored by season. Summer temperatures are the highest and winter temperatures the lowest, as expected. Relative humidity, however, does not appear to follow any seasonal trend. Hovering over a point shows its date, season, average temperature, and average relative humidity. Initially, I grouped the data by each of the ten months. This looked a bit messy, which I fixed by converting the months into seasons instead. Grouping by season makes it easier for the reader to see trends across different parts of the year.
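The month-to-season conversion mentioned above can also be written with a lookup vector indexed by month number. The following is only a minimal sketch in base R: the helper name `season_of`, the vector `season_by_month`, and the example dates are hypothetical and are not part of the submitted code.

```r
# Lookup vector indexed by month number (1 = January, ..., 12 = December).
# Same grouping as the factor() call in the solution: Dec-Feb are Winter, and so on.
season_by_month <- c("Winter", "Winter", "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
                     "Winter")

# Hypothetical helper: map Date values to a season factor with a fixed level order.
season_of <- function(dates) {
  month_num <- as.integer(format(dates, "%m"))
  factor(season_by_month[month_num],
         levels = c("Winter", "Spring", "Summer", "Fall"))
}

# Made-up example dates, one per season.
example_dates <- as.Date(c("2025-01-15", "2025-04-02", "2025-07-20", "2025-10-05"))
example_seasons <- season_of(example_dates)
print(example_seasons)
```

Fixing the level order in one place keeps the mapping of seasons to the `colors` vector predictable, since the colors are assigned in level order.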


Problem 2 [20 points]

Create an animated plotly graph with a data set of your choosing. This can be, but does not have to be, a scatter plot. Also, the animation does not have to take place over time. As mentioned in the notes, the frame can be set to a categorical variable. However, the categories the frames cycle through should be organized (if need be) such that the progression through them shows some pattern or trend.

This graph should include:

  1. Aside from the graphing variable, a separate categorical variable. For example, in our animated scatter plot we color grouped the points by continent.

  2. Appropriate axis labels and a title

  3. Augment the frame label to make it more visible. This can include changing the font size and color to make it stand out more, and/or moving the frame label to a new location in the plotting region. Note, if you do this, make sure it is still clearly visible and does not obstruct the view of your plot.

# changing precipitation NAs to 0
weather$Total.Precip[is.na(weather$Total.Precip)] <- 0

plot_ly(
  data = weather,
  x = ~Avg.Temp.,
  y = ~Avg.Press,
  frame = ~Season,
  color = ~Season,
  colors = c("blue", "green", "orange", "brown"),
  text = ~paste("Date:", Date,
                "<br>Temp:", Avg.Temp.,
                "<br>Pressure:", Avg.Press,
                "<br>Season:", Season),
  hoverinfo = "text",
  type = 'scatter',
  mode = 'markers') %>%
  layout(
    title = "Temperature vs Pressure in State College",
    xaxis = list(title = "Average Temperature"),
    yaxis = list(title = "Average Pressure")) %>%
  animation_opts(frame = 2000, transition = 500, redraw = TRUE) %>%
  animation_slider(currentvalue = list(prefix = "Season: ", font = list(size = 24, color = "red")))
## Warning in p$x$data[firstFrame] <- p$x$frames[[1]]$data: number of items to
## replace is not a multiple of replacement length

Include at least a 1-paragraph discussion about the plot. Discuss what you are plotting and what trends can be seen throughout the animation. Discuss any issues, if any, you ran into in making the plot and how you overcame them.

Discussion:

This plot shows average daily temperature against average daily pressure in State College for January through October 2025. The animation steps through the seasons one frame at a time, and the points are colored by season. As before, summer temperatures are the highest and winter temperatures the lowest, as expected. Pressure appears more spread out in winter and spring and less spread out in summer and fall. Hovering over a point shows its date, season, average temperature, and average pressure. Again, I initially grouped the data by each of the ten months, and I fixed the resulting clutter by using seasons instead.
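The visual impression of spread could be checked numerically by computing the standard deviation of pressure within each season. The sketch below uses a small made-up data frame (`toy`) as a stand-in, so its numbers say nothing about the actual State College data; only the column names `Season` and `Avg.Press` follow the code above.

```r
# Made-up stand-in data; the values are illustrative only.
season_levels <- c("Winter", "Spring", "Summer", "Fall")
toy <- data.frame(
  Season = factor(rep(season_levels, each = 3), levels = season_levels),
  Avg.Press = c(1005, 1015, 1025,   # Winter
                1008, 1016, 1024,   # Spring
                1014, 1016, 1018,   # Summer
                1013, 1016, 1019)   # Fall
)

# Standard deviation of pressure within each season, in factor-level order.
press_sd <- tapply(toy$Avg.Press, toy$Season, sd)
print(press_sd)
```

Replacing `toy` with the real `weather` data frame would give one number per season to compare against what the animation suggests.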

What to turn in: