Data Description

The temps data contains the high (TMAX) and low (TMIN) temperature for each day (DATE), recorded at the South Burlington Airport between December 1940 and September 2023. Snow (SNOW) and rain (PRCP) are also included in the data but won’t be used (neither will TAVG, since it is missing for almost 80% of the days).

You will be creating a graph that shows the year-to-year difference of the maximum temperature across the twelve months of the year.

Question 1: Data Cleaning

Before we create the graph, we need to do a little data cleaning and create two data sets:

Part 1A) Temperature per month for each year

Create a data set named temps_month by making the following changes:

  1. Change the column names to lower case

  2. Using the functions in the lubridate package, convert the date column from a character column to a date-type column, then create new columns for:

    1. year = Year
    2. month = Month as a number (January = 1, December = 12)
    3. month_label = Abbreviation for Month (January = Jan, December = Dec)
    4. day = Day of the year (Jan 1st = 1, Dec 31 = 365 (or 366))
  3. Remove any days with the daily high temp missing or during 1940

  4. Calculate the average high temperature for each month-by-year combination (e.g., the average high temp for January 1945 was …)
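
As a quick illustration of the lubridate functions named in step 2, here is a minimal sketch on a single made-up date. The value "12/31/2020" is an assumption chosen only to show the leap-year case; it is not taken from the temps data, and the month abbreviation printed depends on the session locale.

```r
library(lubridate)

# Hypothetical example date written as month/day/year text, the format mdy() parses
example_date <- mdy("12/31/2020")

year(example_date)                 # year as a number
month(example_date)                # month as a number (December = 12)
month(example_date, label = TRUE)  # abbreviated month as an ordered factor
yday(example_date)                 # day of the year (366 here, since 2020 is a leap year)
```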

Display the first 10 rows using tibble(temps_month) and make sure that it is not a grouped data frame!

# Packages assumed to be loaded for this solution (the setup chunk is not shown)
library(tidyverse)
library(lubridate)

temps |> 
  # Making all the names lowercase:
  janitor::clean_names() |> 
  
  # Converting the date column to a date type and extracting its parts
  mutate(
    # Need to convert date column to a date type column with mdy()
    date = mdy(date),
    # Find the day of the year with yday()
    day = yday(date),
    # Find the week of the year with week() (not required; dropped by summarize() below)
    week = week(date),
    # Find the month of the year as a number
    month = month(date), 
    # Find the month of the year using the month abbreviation
    month_label = month(date, label = TRUE),
    # Find the year with year()
    year = year(date)
  ) |> 
  
  # Only keeping the days with a non-missing tmax and after 1940
  filter(
    !is.na(tmax),
    year > 1940
  ) |> 
  
  # Averaging the max temp across the month and year
  summarize(
    .by = c(month, month_label, year),
    tmax_avg = mean(tmax)
  ) ->
  temps_month

tibble(temps_month)
## # A tibble: 993 × 4
##    month month_label  year tmax_avg
##    <dbl> <ord>       <dbl>    <dbl>
##  1     1 Jan          1941     22.6
##  2     2 Feb          1941     28.5
##  3     3 Mar          1941     32.8
##  4     4 Apr          1941     61.2
##  5     5 May          1941     69.1
##  6     6 Jun          1941     81.7
##  7     7 Jul          1941     83.5
##  8     8 Aug          1941     77.5
##  9     9 Sep          1941     73.5
## 10    10 Oct          1941     57.0
## # ℹ 983 more rows
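
One way to confirm that a summary is not a grouped data frame is dplyr's is_grouped_df(). The sketch below uses a small made-up data set (toy is hypothetical, not the temps data) and assumes dplyr 1.1.0 or later, where summarize() accepts .by. It contrasts .by with a group_by() call that leaves one level of grouping behind.

```r
library(dplyr)

# Hypothetical toy data standing in for the daily temperatures
toy <- tibble(
  year  = c(1941, 1941, 1942, 1942),
  month = c(1, 1, 1, 1),
  tmax  = c(20, 30, 25, 35)
)

# .by groups only for this one summarize() call, so the result is ungrouped
by_result <- toy |>
  summarize(.by = c(month, year), tmax_avg = mean(tmax))

# group_by() on two columns keeps the first grouping level when the last is dropped
grouped_result <- toy |>
  group_by(month, year) |>
  summarize(tmax_avg = mean(tmax), .groups = "drop_last")

is_grouped_df(by_result)       # FALSE
is_grouped_df(grouped_result)  # TRUE
```

If a result does come back grouped, adding ungroup() at the end of the pipeline removes the grouping.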

Part 1B) Maximum and Minimum Average per month

Using the temps_month data set from part 1A, find the highest and lowest average temperature for each month and save the results in a data frame named temps_min_max. Display all 12 rows in the knitted document.

temps_month |> 
  # Finding the minimum and maximum average high temp for each month (keeping the label)
  summarize(
    .by = c(month, month_label),
    min_tmax = min(tmax_avg),
    max_tmax = max(tmax_avg)
  ) ->
  temps_min_max

head(temps_min_max, n = 12)
##    month month_label min_tmax max_tmax
## 1      1         Jan 14.64516 37.58065
## 2      2         Feb 16.10714 40.32143
## 3      3         Mar 30.58065 52.77419
## 4      4         Apr 45.30000 61.16667
## 5      5         May 57.90323 76.19355
## 6      6         Jun 68.93333 83.43333
## 7      7         Jul 75.96774 87.41935
## 8      8         Aug 73.19355 84.67742
## 9      9         Sep 64.93333 79.00000
## 10    10         Oct 51.22581 69.03226
## 11    11         Nov 37.70000 52.13333
## 12    12         Dec 17.45161 45.70968

Question 2: Graph for Temperature

Part 2A) First Graph

Create the graph seen in the solutions in Brightspace. You’ll need to use geom_ribbon() to create the grey region in the graph (it should also be the only geom_ for part 2A) and the temps_min_max data set. The help menu can be helpful! Save it as gg_temps_ribbon and make sure to display the results in the knitted document.
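
For orientation, geom_ribbon() fills the area between a ymin and a ymax value at each x. The following is a minimal sketch on made-up numbers; toy_range and its low and high columns are hypothetical and are not the temps_min_max data.

```r
library(ggplot2)

# Hypothetical monthly lows and highs for four months
toy_range <- data.frame(
  month = 1:4,
  low   = c(15, 16, 30, 45),
  high  = c(38, 40, 53, 61)
)

# The ribbon needs x plus the lower (ymin) and upper (ymax) edges
toy_ribbon <- ggplot(toy_range, aes(x = month)) +
  geom_ribbon(aes(ymin = low, ymax = high), fill = "grey70", alpha = 0.55)

toy_ribbon
```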

ggplot(
  # x is set once here so that later layers (Part 2B) inherit it
  mapping = aes(x = month)
) + 
  
  # Adding the shaded region using the temps_min_max data set
  geom_ribbon(
    data = temps_min_max,
    mapping = aes(
      ymin = min_tmax,   # Bottom edge of the ribbon
      ymax = max_tmax    # Top edge of the ribbon
    ),
    fill = "grey70",
    alpha = 0.55
  ) + 
  
  # Making the graph look better
  labs(
    #x = NULL, #"Month of the year",
    y = "Temperature",
    title = "Average Monthly High Temperatures",
    subtitle = "Burlington, VT: 1940 - 2023",
    caption = "Data: ncdc.noaa.gov"
  ) ->
  
  gg_temps_ribbon

gg_temps_ribbon

Part 2B) Adding lines for years 1942, 1982, 2022

Add lines for the month-to-month change for the years 1942, 1982, and 2022. See the graph on Brightspace for what the results should look like. Save the graph as gg_tempsB and make sure that it appears in the knitted document.

gg_temps_ribbon +
  
  # Adding one line per year for the 3 selected years
  geom_line(
    data = temps_month |> filter(year %in% c(1942, 1982, 2022)),
    mapping = aes(y = tmax_avg,
                  color = factor(year)),
    linewidth = 1,
    show.legend = FALSE
  ) +
  
  # Adding the year at the end of the 3 lines
  geom_text(
    data = temps_month |> filter(year %in% c(1942, 1982, 2022), month == 12),
    mapping = aes(y = tmax_avg,
                  color = factor(year),
                  label = year),
    show.legend = FALSE,
    nudge_x = 0.35,
    fontface = "bold"
  ) ->
  
  gg_tempsB

gg_tempsB

Part 2C) Changing the Data Ink

Use gg_tempsB and the appropriate functions to make x, y, and color match the graph in Brightspace, making sure to remove the labels for the x- and y-axes. To get the colors to match, you’ll need to add scale_color_tq() from the tidyquant package and choose the right value for the theme argument in that function. Save the results as gg_tempsC.

To get the degree F symbol to appear, use "\u00b0F"
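
A small sketch of how the escape can be pasted onto axis break values. The name temp_breaks and its values are only illustrative.

```r
# Hypothetical break values for a temperature axis
temp_breaks <- seq(from = 20, to = 80, by = 20)

# paste0() joins each number to the degree symbol and "F" with no separator
temp_labels <- paste0(temp_breaks, "\u00b0F")

temp_labels
```

The resulting character vector can be passed to the labels argument of scale_y_continuous() alongside the same breaks.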

# scale_color_tq() and theme_tq() come from tidyquant (assumed to be installed)
library(tidyquant)

gg_tempsB + 
  
  # Changing the colors
  scale_color_tq(
    theme = "dark"
  ) +
  
  # Changing the x-axis
  scale_x_continuous(
    expand = c(0, 0, 0.05, 0),
    breaks = seq(from = 1, to = 12, by = 1),
    labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", 
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
    minor_breaks = NULL,
    position = "top"
  ) + 
  
  # Changing the y-axis
  scale_y_continuous(
    breaks = seq(from = 20, to = 80, by = 20),
    labels = paste0(seq(from = 20, to = 80, by = 20), "\u00b0F")
  ) +
  
  # Removing the x and y-axis labels
  labs(
    x = NULL,
    y = NULL
  ) ->
  
  gg_tempsC

gg_tempsC

Part 2D) Changing the non-data ink

Finally, make the needed changes to have the graph match what is in Brightspace. You’ll need to add theme_tq() from the tidyquant package then make the additional needed changes. All of the lines and text use white, even if they look a little gray in the graph!

The title is size 16 and the subtitle is size 12; all other labels have the default size.

gg_tempsC + 
  
  # Changing the pre-packaged theme
  theme_tq() + 
  
  # Changing the other theme options
  theme(
    # Changing the title, subtitle and caption
    plot.title = element_text(hjust = 0.5,
                              size = 16),
    plot.subtitle = element_text(hjust = 0.5,
                                 size = 12),
    plot.caption = element_text(hjust = 0,
                                face = "italic"),
    
    # Changing the caption location to be on the "plot" instead of "panel"
    plot.caption.position = "plot",
    
    # Changing the background color of the plot
    plot.background = element_rect(fill = "black",
                                   color = "black"),
    
    # Changing all the text to white
    text = element_text(color = "white",
                        face = "bold"),
    
    # Changing the panel background to black
    panel.background = element_rect(fill = "black"),
    #axis.line = element_line(color = "white"),
    
    # Changing the panel lines to be white
    panel.border = element_rect(color = "white"),
    panel.grid.major = element_line(color = "white"),
    panel.grid.minor = element_line(color = "white")
  )