Assignment 3b

Approach

I will create a synthetic dataset with three columns Date, Item, and Price. I’ll sort the items by Date in ascending order. We’ll use the window function to group items by year to date average and Six-Day Moving Average without ruining the original dataset.

Code base

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(zoo)

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
synthetic_dataset <- read.csv("synthetic_dataset.csv")

processed_data <- synthetic_dataset %>%
  group_by(item) %>%
  arrange(date) %>%
  mutate(
    ytd_average_sales = cummean(sales),
    six_day_moving_avg = rollmean(sales, k = 6, fill = NA, align = "right")
  ) %>%
  ungroup()

processed_data %>%
  filter(date >= as.Date("2024-01-06")) %>%
  head(10)
# A tibble: 10 × 9
   date    item  category region sales quantity price_per_unit ytd_average_sales
   <chr>   <chr> <chr>    <chr>  <dbl>    <int>          <dbl>             <dbl>
 1 2024-0… Prod… Electro… North  101.         2           57.1             105. 
 2 2024-0… Prod… Clothing South  137.         3           37.4             153. 
 3 2024-0… Prod… Electro… East   318.         6           47.8             327. 
 4 2024-0… Prod… Home     West    53.7        1           19.3              83.9
 5 2024-0… Prod… Clothing Centr… 156.        10           67.9             161. 
 6 2024-0… Prod… Electro… North  120.         3           72.6             107. 
 7 2024-0… Prod… Clothing South  165.        12           46.3             155. 
 8 2024-0… Prod… Electro… East   335.         8           52.9             328. 
 9 2024-0… Prod… Home     West    70.4        4           98.1              82.0
10 2024-0… Prod… Clothing Centr… 219.        12           18.8             169. 
# ℹ 1 more variable: six_day_moving_avg <dbl>