Assignment 3B: Temperature Window Functions

Author

Emily El Mouaquite

Approach

I will begin by asking an LLM to generate a data set containing the daily high and low temperatures for two cities over the past eighteen months. I will then use dplyr window functions, grouped by city and ordered by date, to calculate the year-to-date (YTD) average and the six-day moving average for each daily weather record. One challenge I anticipate is deciding how to analyze the high and low temperatures separately while making sure that each window function is applied the way I intend.

Code Base

Introduction

An LLM generated a data set with the daily high and low temperatures in Phoenix, Arizona and Seattle, Washington over the past 18 months (August 1, 2024 to February 15, 2026). The data is stored in the file temperature.csv.

# import csv
temperatures <- read.csv("temperature.csv")

Body

# load dplyr
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# convert date column data type
temperatures$Date <- as.Date(temperatures$Date)

Year to Date Averages

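# keep only the 2026 records so the running mean is a 2026 year-to-date average
# note: this overwrites temperatures with the 2026 subset
temperatures <- temperatures %>%
  filter(format(Date, "%Y") == "2026") %>%
  group_by(City) %>%
  # arrange() sorts the whole table by date; rows within each city stay chronological
  arrange(Date) %>%
  mutate(
    YTD_high = cummean(High_Temp_F),
    YTD_low  = cummean(Low_Temp_F)
  ) %>%
  ungroup()
# mean of all 2026 records per city, which equals the YTD average on the final date
temperatures %>%
  group_by(City) %>%
  summarise(
    overall_avg_high = mean(High_Temp_F),
    overall_avg_low  = mean(Low_Temp_F)
  )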
# A tibble: 2 × 3
  City       overall_avg_high overall_avg_low
  <chr>                 <dbl>           <dbl>
1 Phoenix_AZ             70.7            50.2
2 Seattle_WA             43.5            31.9

As of February 15, 2026, the YTD average high temperature is 70.7 °F in Phoenix and 43.5 °F in Seattle. The YTD average low temperatures are 50.2 °F in Phoenix and 31.9 °F in Seattle.
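The code above restricts the data to 2026 before taking the running mean. If a YTD average is wanted for every daily record in the full date range, one possible approach is to group by both city and calendar year so that the running mean restarts each January 1. The following is a minimal sketch on made-up toy data; the `toy` data frame and the `Year` column are hypothetical and are not part of temperature.csv.

```r
library(dplyr)

# Hypothetical toy data: two cities, dates spanning a year boundary
toy <- data.frame(
  City = rep(c("A", "B"), each = 4),
  Date = as.Date(rep(c("2025-12-30", "2025-12-31", "2026-01-01", "2026-01-02"), 2)),
  High_Temp_F = c(60, 62, 70, 74, 40, 42, 50, 46)
)

toy_ytd <- toy %>%
  mutate(Year = format(Date, "%Y")) %>%    # helper column (assumed name)
  arrange(City, Date) %>%                  # chronological order within each city
  group_by(City, Year) %>%                 # running mean restarts per city and year
  mutate(YTD_high = cummean(High_Temp_F)) %>%
  ungroup()

toy_ytd
```

With this grouping, the December rows keep their own 2025 running mean and the January rows start a fresh one, so no filtering by year is needed.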

Six Day Moving Averages

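# trailing six-day moving average per city
# the first five rows of each city are NA because the window needs six observations
temperatures <- temperatures %>%
  filter(format(Date, "%Y") == "2026") %>%
  group_by(City) %>%
  arrange(Date) %>%
  mutate(
    # sides = 1 uses only the current and previous days;
    # as.numeric() drops the time-series class returned by stats::filter()
    MA_high = as.numeric(stats::filter(High_Temp_F, rep(1/6, 6), sides = 1)),
    MA_low  = as.numeric(stats::filter(Low_Temp_F,  rep(1/6, 6), sides = 1))
  ) %>%
  ungroup()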
temperatures %>%
  arrange(Date) %>%
  slice_tail(n = 6) %>%
  select(City, Date, MA_high, MA_low)
# A tibble: 6 × 4
  City       Date       MA_high MA_low
  <chr>      <date>       <dbl>  <dbl>
1 Phoenix_AZ 2026-02-13    76.4   54.9
2 Seattle_WA 2026-02-13    47.1   34.8
3 Phoenix_AZ 2026-02-14    76.0   54.6
4 Seattle_WA 2026-02-14    47.1   35.2
5 Phoenix_AZ 2026-02-15    75.1   54.4
6 Seattle_WA 2026-02-15    47.6   35.3

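Above are the six-day moving averages for the last three days (February 13 to 15, 2026) in each city. Because slice_tail(n = 6) is applied to the combined table, it returns six rows in total, three per city.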
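To make the window behavior easier to check by hand, here is a small sketch on hypothetical toy data (the `toy` data frame and its values are invented for illustration). It applies the same trailing six-day filter per city and then uses a grouped slice_tail() to pull the last records for each city separately.

```r
library(dplyr)

# Hypothetical toy data: 8 consecutive days for two cities
toy <- data.frame(
  City = rep(c("A", "B"), each = 8),
  Date = rep(seq(as.Date("2026-01-01"), by = "day", length.out = 8), 2),
  High_Temp_F = as.numeric(c(1:8, 11:18))
)

toy_ma <- toy %>%
  arrange(City, Date) %>%
  group_by(City) %>%
  mutate(
    # trailing window: mean of the current day and the five days before it
    MA_high = as.numeric(stats::filter(High_Temp_F, rep(1/6, 6), sides = 1))
  ) %>%
  ungroup()

# Last two records per city (slice_tail() respects the grouping)
last_two <- toy_ma %>%
  group_by(City) %>%
  slice_tail(n = 2) %>%
  ungroup()

last_two
```

For city A the sixth day averages the values 1 through 6, giving 3.5, and the first five days are NA because a full six-day window is not yet available. Grouping before slice_tail() is what returns the last n rows per city rather than the last n rows overall.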

Conclusion

This assignment was very helpful for practicing dplyr window functions. To expand on it, the temperatures in the two cities could be compared directly, the standard deviation could be calculated, or future temperatures could be forecast.
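As a starting point for the standard deviation idea, a per-city summary might look like the sketch below. It uses hypothetical toy values rather than the assignment data, and the `mean_high` and `sd_high` column names are only illustrative.

```r
library(dplyr)

# Hypothetical toy data: three daily highs per city
toy <- data.frame(
  City = rep(c("A", "B"), each = 3),
  High_Temp_F = c(60, 62, 64, 40, 44, 48)
)

# Mean and sample standard deviation of the daily highs, per city
city_spread <- toy %>%
  group_by(City) %>%
  summarise(
    mean_high = mean(High_Temp_F),
    sd_high   = sd(High_Temp_F)
  )

city_spread
```

Placing the mean and the standard deviation side by side gives a simple way to compare both the typical temperature and its variability between the two cities.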