Note: on data sets

You may use any data of your choosing in the following problems, but I suggest choosing a data set that you find interesting or that would give an interesting graph (so, don’t use something like the old iris data set). You will get more out of the project in the end, and it will look better to anyone you show it to in the future. If the data set comes from an R package, reference the package. If the data set is from elsewhere, upload a copy to Blackboard (.csv format).


Problem 1 [20 points]

Create a plotly graph of your choosing that represents at least two variables, one of which must be a categorical variable.

This plot can be a scatter plot, overlaid density plots (graphing variable is continuous, separate densities grouped by categorical variable), etc. Choropleth maps could also be on the list…you have to admit they look kinda cool.
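As a rough illustration of the overlaid-density option, the sketch below estimates one density per group with base R's `density()` and draws each as a filled line trace. The data frame `df` and its columns `value` and `group` are made up for the example and do not come from any data set used in this report.

```r
library(plotly)

# Hypothetical data: one continuous variable and a two-level grouping variable.
set.seed(1)
df <- data.frame(
  value = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  group = rep(c("A", "B"), each = 100)
)

# Estimate a separate kernel density for each group.
dens <- lapply(split(df$value, df$group), density)

# Add one filled line trace per group so the curves overlap on shared axes.
fig <- plot_ly()
for (g in names(dens)) {
  fig <- add_trace(fig,
                   x = dens[[g]]$x, y = dens[[g]]$y,
                   type = "scatter", mode = "lines",
                   fill = "tozeroy", name = g)
}
fig <- layout(fig,
              title = "Density of value by group",
              xaxis = list(title = "Value"),
              yaxis = list(title = "Density"))
```

Printing `fig` displays the plot; each group gets its own legend entry because each trace is named after its group.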

The graph must include:

  1. customized hover text that is informative to the graphing elements in the plot

  2. separate color to represent groups

  3. labeled axes and appropriate title

library(readr)
realistic_ocean_climate_dataset <- read_csv("realistic_ocean_climate_dataset.csv")
## Rows: 299 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Location, Bleaching Severity
## dbl (5): Latitude, Longitude, SST (°C), pH Level, Species Observed
## lgl (1): Marine Heatwave
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plt <- plot_ly(data = realistic_ocean_climate_dataset,
               x = ~`pH Level`,                # x-axis: pH level of the water
               y = ~`Species Observed`,        # y-axis: number of species observed
               color = ~`Bleaching Severity`,  # color: severity of the coral bleaching
               colors = "Set1",
               type = "scatter",               # set explicitly so plotly does not have to guess
               mode = "markers",
               # Custom hover text
               text = ~paste0("pH Level: ", `pH Level`,
                              "<br>Species Observed: ", `Species Observed`),
               hoverinfo = "text") %>%         # show only the custom text on hover
  # Add the title, legend title, and axis labels
  layout(title = "Ocean pH Level vs. Number of Species Observed",
         legend = list(title = list(text = "Bleaching Severity")),
         xaxis = list(title = "pH Level"),
         yaxis = list(title = "Number of Species Observed"))

plt

According to Kaggle, “This dataset compiles synthetic-yet-realistic measurements of sea surface temperature (SST), pH levels, coral bleaching severity, and species observations from ecologically critical marine zones. It spans from 2015 to 2023 and simulates how marine environments are responding to global warming, acidification, and heatwaves.

“The goal of this dataset is to support machine learning, climate analysis, and ecological modeling”.

This plot shows the relationship between the pH level of the water and the number of distinct species observed there. The color of each point represents the severity of the coral bleaching. Hovering over a point gives a quick summary of the pH level and the number of species observed at that location. The main issue I had with this graph was deciding which categorical variable to use. Whether the site was under a marine heatwave, or the general location of the measurement, could also have been good categories to look at.
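For comparison, here is a minimal sketch of the same kind of scatter plot grouped by the marine heatwave flag instead. The rows in `ocean` are invented for illustration and only reuse column names from the real data set; the derived `Heatwave` column and the two colors are assumptions of this sketch.

```r
library(plotly)

# Made-up rows that only mimic the column names of the ocean data set.
ocean <- data.frame(
  `pH Level` = c(8.05, 8.01, 7.98, 8.10),
  `Species Observed` = c(120, 95, 80, 130),
  `Marine Heatwave` = c(FALSE, TRUE, TRUE, FALSE),
  check.names = FALSE
)

# Turn the logical flag into a labeled factor so the legend reads clearly.
ocean$Heatwave <- factor(ocean$`Marine Heatwave`,
                         levels = c(FALSE, TRUE),
                         labels = c("No heatwave", "Heatwave"))

fig <- plot_ly(ocean,
               x = ~`pH Level`,
               y = ~`Species Observed`,
               color = ~Heatwave,
               colors = c("steelblue", "firebrick"),
               type = "scatter", mode = "markers",
               text = ~paste0("pH Level: ", `pH Level`,
                              "<br>Species Observed: ", `Species Observed`,
                              "<br>", Heatwave),
               hoverinfo = "text") %>%
  layout(title = "pH Level vs. Species Observed by Heatwave Status",
         xaxis = list(title = "pH Level"),
         yaxis = list(title = "Number of Species Observed"))
```

Swapping `color = ~Heatwave` for a location column would give the other grouping mentioned above, though a variable with many levels may need a larger palette.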


Problem 2 [20 points]

Create an animated plotly graph with a data set of your choosing. This can be, but does not have to be, a scatter plot. Also, the animation does not have to take place over time. As mentioned in the notes, the frame can be set to a categorical variable. However, the categories the frames cycle through should be organized (if need be) such that the progression through them shows some pattern.
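One way to control the order of categorical frames is to convert the variable to a factor with explicitly ordered levels before passing it to `frame`. The sketch below assumes plotly steps through frames in factor-level order; the data frame `df` and its severity categories are hypothetical.

```r
library(plotly)

# Hypothetical data: a severity category stored as plain text, in no useful order.
df <- data.frame(
  severity = rep(c("Low", "High", "None", "Medium"), each = 3),
  x = rep(1:3, times = 4),
  y = c(5, 6, 7, 1, 2, 3, 8, 9, 10, 3, 4, 5)
)

# Set the levels explicitly so the frames run from least to most severe
# rather than alphabetically.
df$severity <- factor(df$severity,
                      levels = c("None", "Low", "Medium", "High"))

fig <- plot_ly(df, x = ~x, y = ~y, frame = ~severity,
               type = "scatter", mode = "markers")
```

Without the `factor()` step, a character column would typically be cycled alphabetically (High, Low, Medium, None), which hides the progression.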

This graph should include:

  1. Aside from the graphing variable, a separate categorical variable. For example, in our animated scatter plot we color grouped the points by continent.

  2. Appropriate axis labels and a title

  3. Augment the frame label to make it more visible. This can include changing the font size and color to make it stand out more, and/or moving the frame label to a new location in the plotting region. Note, if you do this, make sure it is still clearly visible and does not obstruct the view of your plot.

library(readr)
Global_Cybersecurity_Threats_2015_2024 <- read_csv("Global_Cybersecurity_Threats_2015-2024.csv")
## Rows: 249 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Country, Attack Type, Target Industry, Attack Source, Security Vuln...
## dbl (4): Year, Financial Loss (in Million $), Number of Affected Users, Inci...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create the plot
fig <- Global_Cybersecurity_Threats_2015_2024 %>%
  plot_ly(
    x = ~`Financial Loss (in Million $)`,
    y = ~`Incident Resolution Time (in Hours)`,
    marker = list(size = 10),  # fixed marker size (a constant, not a data mapping)
    frame = ~Year,             # each year is one animation frame
    color = ~`Attack Source`,  # point color shows the attack source
    type = 'scatter',
    mode = 'markers',
    showlegend = TRUE
  ) %>%
  # Add the title, legend title, and axis labels
  layout(title = "Financial Loss vs. Time to Resolve Incident Over Time",
         legend = list(title = list(text = "Attack Source")),
         xaxis = list(title = "Financial Loss (Million $)"),
         yaxis = list(title = "Time to Resolve Incident (Hours)"))

fig <- fig %>%
  # Set the time per frame (in milliseconds) and the transition easing
  animation_opts(
    frame = 1500, easing = "linear", redraw = FALSE
  )
fig <- fig %>% # Make the frame label more visible
  animation_slider(
    currentvalue = list(prefix = "YEAR ", font = list(color = "blue"))
  )

fig

This data set is a breakdown of cyber attacks from 2015 to 2024.

This plot shows the relationship between the financial loss from cyber attacks (in millions of dollars) and the time it took to resolve each incident (in hours). The color represents the source of the attack, and each animation frame is a different year. The two biggest issues I ran into were the errors generated and the general lack of correlation between the variables I picked.
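A quick way to check how weak a relationship is before plotting is to compute the correlation overall and within each frame. The sketch below uses randomly generated stand-in data with hypothetical column names (`loss`, `hours`), not the actual cybersecurity data set, so its values say nothing about the real data.

```r
# Stand-in data: random values, so little correlation is expected by construction.
set.seed(42)
threats <- data.frame(
  Year  = rep(2015:2017, each = 20),
  loss  = runif(60, min = 0, max = 100),  # hypothetical financial loss
  hours = runif(60, min = 1, max = 72)    # hypothetical resolution time
)

# Pearson correlation across all rows.
overall <- cor(threats$loss, threats$hours)

# Pearson correlation within each year (one value per animation frame).
by_year <- sapply(split(threats, threats$Year),
                  function(d) cor(d$loss, d$hours))

print(round(overall, 2))
print(round(by_year, 2))
```

Values near zero in every year would suggest that the animation shows mostly noise, in which case a different pair of variables might make a clearer plot.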