DATA 110 First Solo Visualization

Load libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(RColorBrewer)
library(streamgraph)
data(flights)
data(weather)

My Edited Visualization

Find the total monthly precipitation at each airport

I struggled to merge these data frames. At one point I even wrote a function to do the indexing for me so that I could mutate one data frame using input from the other. It turned out I had forgotten that the measurements come from several airports, each with its own precipitation readings. I was trying to merge the data frames after I had already dropped the column identifying the airport, so of course the computer spat out a data frame with about 1 million observations: it had no way of knowing how I wanted the rows matched up. I sank maybe 4 hours into this, but it was a learning experience. I think this is the epitome of debugging.

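# flights and weather share the columns year, month, day, hour, origin and
# time_hour; with no `by` argument, merge() joins on all of them
rainports <- merge(flights, weather)
# Keep the timestamp, rainfall, airport and departure delay
rainports <- select(rainports, time_hour, precip, origin, dep_delay)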
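The row explosion described above can be reproduced on a small scale. This is a minimal sketch using made-up toy tables (`flights_toy` and `weather_toy` are hypothetical stand-ins, not the real nycflights13 data) and base R's `merge()`, which joins on whatever columns the two inputs have in common.

```r
# Toy stand-ins for the real tables (hypothetical values, not from nycflights13)
flights_toy <- data.frame(
  origin    = c("EWR", "JFK", "EWR", "JFK"),
  time_hour = c(1, 1, 2, 2),
  dep_delay = c(5, 0, 12, 3)
)
weather_toy <- data.frame(
  origin    = c("EWR", "JFK", "EWR", "JFK"),
  time_hour = c(1, 1, 2, 2),
  precip    = c(0.0, 0.1, 0.2, 0.0)
)

# With both keys present, merge() joins on origin and time_hour:
# each flight matches exactly one weather reading.
good <- merge(flights_toy, weather_toy)
nrow(good)  # 4

# Drop the airport column first and merge() can only match on time_hour,
# so every flight pairs with the readings from every airport in that hour.
bad <- merge(flights_toy[, c("time_hour", "dep_delay")],
             weather_toy[, c("time_hour", "precip")])
nrow(bad)   # 8
```

With only two airports and two hours the row count doubles; with three airports and hundreds of thousands of flights the same effect is how a merge can balloon to around a million rows.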

Take the sum of the rainfall over the months for each airport

rainports_sums <- rainports |>
  # time_hour is already a date-time, so month() from lubridate (attached with
  # tidyverse 2.0.0) returns the month number directly
  mutate(month = month(time_hour)) |>
  group_by(month, origin) |>
  summarise(total_precip = sum(precip))
`summarise()` has grouped output by 'month'. You can override using the
`.groups` argument.
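One thing worth double-checking here: rainports has one row per flight, so sum(precip) adds each hourly reading once for every flight that departed in that hour, and the totals end up weighted by flight volume. If the goal is rainfall per airport regardless of how many flights left, a possible alternative is to sum the hourly weather table directly. This is only a sketch under that assumption; the name `weather_sums` is hypothetical and is not used elsewhere in this document.

```r
library(dplyr)
library(nycflights13)

# weather has its own month column and one row per hourly reading, so summing
# it directly counts each precip value (in inches) exactly once.
weather_sums <- weather |>
  group_by(month, origin) |>
  summarise(total_precip = sum(precip, na.rm = TRUE), .groups = "drop")

weather_sums
```

Setting `.groups = "drop"` returns an ungrouped tibble and avoids the grouping message from summarise().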

Stream Graph

Here we graph the rainfall at each airport over 2013 as a stream graph. A couple of the package's features did not seem to work for me, such as the axis labels and the annotation functions. The legend is displayed as a dropdown menu rather than a color key. That could be useful when there are many categories, as in the babynames dataset, but it is redundant here because hovering over a band already shows its category. I found the units for the y-axis label (inches) in the nycflights13 manual.

lluvia <- streamgraph(rainports_sums, key=origin, value=total_precip, 
                      date=month, scale="continuous", interpolate = "step",
                      height = 400, width = 800) |>
  sg_fill_brewer("Paired") |>
  sg_legend(show = TRUE, label = "Airport: ") |>
  sg_title(title = "Rainfall at New York Airports in 2013")

lluvia
Warning in widget_html(name, package, id = x$id, style = css(width =
validateCssUnit(sizeInfo$width), : streamgraph_html returned an object of class
`list` instead of a `shiny.tag`.
Warning: `bindFillRole()` only works on htmltools::tag() objects (e.g., div(),
p(), etc.), not objects of type 'list'.
[Streamgraph: Rainfall at New York Airports in 2013]

The above is my visualization of the total monthly rainfall at each NYC airport in 2013. I changed the interpolation from smoothed to stepwise because I thought it looked better for this data, but then I realized that this basically turns the graph into a more confusing stacked bar chart. Either way, the experiment taught me more about what makes a visualization meaningful.

One notable, if perhaps obvious, feature of the graph is that total rainfall spiked in the summer compared with the other months. There also appears to have been less rainfall at JFK than at EWR. Perhaps this is because of a smog moisture-absorption effect, or perhaps there was simply less rain in the JFK area that year. More data and careful statistical analysis are required to determine that.

While playing around with the data, I tried graphing daily rather than monthly rainfall, but the spikes were too large for the graph to be readable. I considered a log plot for the daily rainfall, but decided it was best to stick to units that are easily understood. If I were to do this again, I would try a ggplot2-based streamgraph to see whether the process is easier (especially for the axis labels and legend), and I would also try summing the rainfall by week.
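As a rough sketch of those two follow-up ideas, the code below sums rainfall by week and draws it with ggplot2. It makes some assumptions: it uses a plain stacked area chart, geom_area(), rather than a true streamgraph; it sums the hourly weather table directly instead of the merged rainports table; and the names `weekly_rain` and `weekly_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(nycflights13)

# Weekly rainfall per airport, summed from the hourly weather table.
weekly_rain <- weather |>
  mutate(week = week(time_hour)) |>   # week of the year, 1 to 53
  group_by(week, origin) |>
  summarise(total_precip = sum(precip, na.rm = TRUE), .groups = "drop")

# Stacked area chart: ggplot2 handles the axis labels and color legend.
weekly_plot <- ggplot(weekly_rain,
                      aes(x = week, y = total_precip, fill = origin)) +
  geom_area() +
  scale_fill_brewer(palette = "Paired") +
  labs(
    title = "Weekly Rainfall at New York Airports in 2013",
    x = "Week of 2013",
    y = "Total precipitation (inches)",
    fill = "Airport"
  )

weekly_plot
```

Here labs() supplies the axis titles and legend title that were hard to set on the streamgraph widget, at the cost of losing the interactive hover.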