Week3B

Author

Sinem K Moschos

Objective

For this assignment, I worked with a time series dataset that has data recorded over time for more than one item.

The reason I needed this type of data is because window functions only make sense when time is involved. Each item needs to have values changing over days so that averages over time can be calculated.

Dataset Selection

I chose a dataset that includes: • a date column • an item name • a numeric value that changes over time

There are multiple items in the same dataset, so each item has its own time series. This is important because the calculations should be done separately for each item, not all together.

Why Window Functions Are Needed

Window functions let me calculate things like averages without collapsing the data.

Instead of grouping and losing rows, window functions: • keep every original row • look at nearby rows related to the current row • calculate values based on time order

This is especially useful for moving averages and year-to-date calculations.

Analytical Approach

For each item in the dataset, I calculated two things: 1. Year-to-date average This shows the average value for an item from the beginning of the year up to the current date. 2. Six-day moving average This shows the average of the current day and the previous five days. Both calculations are done within each item, so one item’s values do not affect another item.

How I approached the problem

First, I loaded and cleaned the data so the date and values were usable. Then, I sorted the data by item and date. This step is important because window functions depend on the correct time order. After that, I used dplyr window functions to calculate the year-to-date average and the six-day moving average while keeping all original rows.

Expected Outcome

The final dataset will include: • Original time series data • A YTD average column for each item • A six-day moving average column for each item

This output demonstrates how window functions can be used to analyze trends over time without aggregating away the original data.

Reproducibility

The analysis will be: • Fully reproducible • Implemented using R and dplyr • Structured so that rendering the Quarto file reproduces the same results

Code Base

Step 1: Load libraries

First, I loaded the libraries I needed. I used tidyverse for data manipulation and slider for moving averages.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Step 2: Create a time series dataset (reproducible)

For this assignment, I created a small time series dataset directly in R. This makes the analysis fully reproducible without relying on external files. The dataset contains: • a date column • an item name • a numeric value

set.seed(123)

data <- tibble(
  date = rep(seq.Date(from = as.Date("2023-01-01"),
                      by = "day",
                      length.out = 30), 2),
  item = rep(c("Item_A", "Item_B"), each = 30),
  value = c(
    round(runif(30, 10, 20), 1),
    round(runif(30, 30, 40), 1)
  )
)

data

# A tibble: 60 × 3
   date       item   value
   <date>     <chr>  <dbl>
 1 2023-01-01 Item_A  12.9
 2 2023-01-02 Item_A  17.9
 3 2023-01-03 Item_A  14.1
 4 2023-01-04 Item_A  18.8
 5 2023-01-05 Item_A  19.4
 6 2023-01-06 Item_A  10.5
 7 2023-01-07 Item_A  15.3
 8 2023-01-08 Item_A  18.9
 9 2023-01-09 Item_A  15.5
10 2023-01-10 Item_A  14.6
# ℹ 50 more rows

Step 3: Arrange data for window functions

Before using window functions, the data must be sorted by item and date.

data <- data %>%
  arrange(item, date)

Step 4: Calculate Year-to-Date (YTD) average

Next, I calculated the year-to-date average for each item. This shows the average value from the beginning of the year up to the current date, calculated separately for each item.

data <- data %>%
  group_by(item) %>%
  mutate(
    ytd_average = cummean(value)
  ) %>%
  ungroup()

Step 5: Calculate six-day moving average

After that, I calculated the six-day moving average. This average includes the current day and the previous five days. The calculation resets for each item.

data <- data %>%
  group_by(item) %>%
  mutate(
    six_day_moving_avg =
      (
        value +
        lag(value, 1) +
        lag(value, 2) +
        lag(value, 3) +
        lag(value, 4) +
        lag(value, 5)
      ) / 6
  ) %>%
  ungroup()

Step 6: Final output

The final dataset includes: • the original data • the year-to-date average • the six-day moving average

data

# A tibble: 60 × 5
   date       item   value ytd_average six_day_moving_avg
   <date>     <chr>  <dbl>       <dbl>              <dbl>
 1 2023-01-01 Item_A  12.9        12.9               NA  
 2 2023-01-02 Item_A  17.9        15.4               NA  
 3 2023-01-03 Item_A  14.1        15.0               NA  
 4 2023-01-04 Item_A  18.8        15.9               NA  
 5 2023-01-05 Item_A  19.4        16.6               NA  
 6 2023-01-06 Item_A  10.5        15.6               15.6
 7 2023-01-07 Item_A  15.3        15.6               16  
 8 2023-01-08 Item_A  18.9        16.0               16.2
 9 2023-01-09 Item_A  15.5        15.9               16.4
10 2023-01-10 Item_A  14.6        15.8               15.7
# ℹ 50 more rows

Step 7:View results for one item

This step makes it easier to see how the window functions work for a single item.

data %>%
  filter(item == "Item_A")

# A tibble: 30 × 5
   date       item   value ytd_average six_day_moving_avg
   <date>     <chr>  <dbl>       <dbl>              <dbl>
 1 2023-01-01 Item_A  12.9        12.9               NA  
 2 2023-01-02 Item_A  17.9        15.4               NA  
 3 2023-01-03 Item_A  14.1        15.0               NA  
 4 2023-01-04 Item_A  18.8        15.9               NA  
 5 2023-01-05 Item_A  19.4        16.6               NA  
 6 2023-01-06 Item_A  10.5        15.6               15.6
 7 2023-01-07 Item_A  15.3        15.6               16  
 8 2023-01-08 Item_A  18.9        16.0               16.2
 9 2023-01-09 Item_A  15.5        15.9               16.4
10 2023-01-10 Item_A  14.6        15.8               15.7
# ℹ 20 more rows