Approach

In this assignment, I am using the data set of Coco-Cola (KO) and PepsiCo(PEP) price trends since January 1, 2022. I find this data set interesting as they are both each others main competitors. I will be using the dpylr and zoo R packages to better understand how to use window function for time series for 2 or more separate items , which is a variation of an aggregation function to THE 2 types of window function that will be required for this analysis will be a rolling window for the 6 day moving average and an expanding window for the Year to Data (YTD) average.

Process proposal steps:

  1. Data Preparation and Sorting: First, i will sort the data by Ticker and Date which ensure calculations will follow the correct chronological order so that there is no blunder to accurate time-series analysis

  2. Rolling Window –> 6 Days Moving Average: I will group by Ticket only, use rollmean() from zoo package, k=6 for window size, align = “right”, and setting any incomplete parts with NA (fill = NA) using values from date and the first 5 days is used for the average.

  3. Expanding Window –> YTD Average:
    Group by Ticker and Year, use cummean () - cumulative mean in dyplr package to calculate YTD average.

Source: (1) Gemini Pro (2) R for Data Science 2e (3)https://dplyr.tidyverse.org/articles/window-functions.html (4)https://www.r-bloggers.com/2017/07/tidy-time-series-analysis-part-2-rolling-functions/

library (tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Data set generated

I am interested in the two time series values of Coca Cola vs Pepsi stocks and want to see how they compare to each other price wise overtime in terms of 6 days and YTD overall. I had Gemini Pro LLM generate the data set representing the daily closing prices for Coca-Cola and PepsiCo starting January 2022

#Dataset genereate by LLM for Coco-Cola and PepsiCo

stock_data <- data.frame(
  Date = as.Date(c(
    # Coca-Cola (KO) - Jan 2022
    "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06", 
    "2022-01-07", "2022-01-10", "2022-01-11",
    # PepsiCo (PEP) - Jan 2022
    "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06", 
    "2022-01-07", "2022-01-10", "2022-01-11"
  )),
  Ticker = c(rep("KO", 7), rep("PEP", 7)),
  Price = c(
    59.0, 59.5, 60.0, 59.8, 60.2, 60.5, 60.1,  # KO Prices
    173.0, 174.5, 175.0, 174.0, 175.5, 176.0, 175.2 # PEP Prices
  )
)

print(stock_data)
##          Date Ticker Price
## 1  2022-01-03     KO  59.0
## 2  2022-01-04     KO  59.5
## 3  2022-01-05     KO  60.0
## 4  2022-01-06     KO  59.8
## 5  2022-01-07     KO  60.2
## 6  2022-01-10     KO  60.5
## 7  2022-01-11     KO  60.1
## 8  2022-01-03    PEP 173.0
## 9  2022-01-04    PEP 174.5
## 10 2022-01-05    PEP 175.0
## 11 2022-01-06    PEP 174.0
## 12 2022-01-07    PEP 175.5
## 13 2022-01-10    PEP 176.0
## 14 2022-01-11    PEP 175.2

Window functions to calculate required average

Using information found on the dplyr.tidyverse and R-blogger website, I combine that with other previously used function to manipulate the data set table.

#————————————————————- # 1. Calculate 6 Day Moving Average

calculated_data <- stock_data |> 
  group_by(Ticker) |>  #group only by ticker so it slides 
  mutate(Moving_Avg_6D = rollmean(Price, k = 6, fill= NA, align = "right"))  # window function (rollmean() for looking at the current 5 row preceding metrics)

print(calculated_data)
## # A tibble: 14 × 4
## # Groups:   Ticker [2]
##    Date       Ticker Price Moving_Avg_6D
##    <date>     <chr>  <dbl>         <dbl>
##  1 2022-01-03 KO      59            NA  
##  2 2022-01-04 KO      59.5          NA  
##  3 2022-01-05 KO      60            NA  
##  4 2022-01-06 KO      59.8          NA  
##  5 2022-01-07 KO      60.2          NA  
##  6 2022-01-10 KO      60.5          59.8
##  7 2022-01-11 KO      60.1          60.0
##  8 2022-01-03 PEP    173            NA  
##  9 2022-01-04 PEP    174.           NA  
## 10 2022-01-05 PEP    175            NA  
## 11 2022-01-06 PEP    174            NA  
## 12 2022-01-07 PEP    176.           NA  
## 13 2022-01-10 PEP    176           175. 
## 14 2022-01-11 PEP    175.          175.

#—————————————————————– # 2. Year to Date Average

calculated_YTD_data <- stock_data |> 
mutate(Year = year(Date)) |> # Create a Year column
  group_by(Ticker, Year) |>   #Group by Ticker and Year to annual reset
  mutate(YTD_avg = cummean(Price)) # Window Function (cummean()) to gather average from Jan 1st

print(calculated_YTD_data)  #view completed result
## # A tibble: 14 × 5
## # Groups:   Ticker, Year [2]
##    Date       Ticker Price  Year YTD_avg
##    <date>     <chr>  <dbl> <dbl>   <dbl>
##  1 2022-01-03 KO      59    2022    59  
##  2 2022-01-04 KO      59.5  2022    59.2
##  3 2022-01-05 KO      60    2022    59.5
##  4 2022-01-06 KO      59.8  2022    59.6
##  5 2022-01-07 KO      60.2  2022    59.7
##  6 2022-01-10 KO      60.5  2022    59.8
##  7 2022-01-11 KO      60.1  2022    59.9
##  8 2022-01-03 PEP    173    2022   173  
##  9 2022-01-04 PEP    174.   2022   174. 
## 10 2022-01-05 PEP    175    2022   174. 
## 11 2022-01-06 PEP    174    2022   174. 
## 12 2022-01-07 PEP    176.   2022   174. 
## 13 2022-01-10 PEP    176    2022   175. 
## 14 2022-01-11 PEP    175.   2022   175.

Conclusion

This assignment I used the LLM generated KO and PEP stock result to demonstrate how to apply window functions to analyze time-series financial data. I was able to track the daily closing price of both stock since January 1,2022 and run both rolling and expanding window calculation with dplyr and zoo package in R. This analysis relied on the window functions rollmean() and cummean() to satisfied this and allows me to observe the both the short-term trend and expected price of the stocks for the year which helps with seeing if the stocks are overperforming/underperforming for that year.

Challenge: In-completed Data: The first few days of the data was not calculable with the 6 day moving average due to it not being the full 6 day. Hence, filling that space with fill=NA in the window function prevents R from returning partial averages and focus on calculating what we need. Another issue I found while trying data sort/organize, was trying to use solely just the year but with lubricate inside the tidyverse package, I was able to do that and add it to group_by() to correct the calculation. Finally, a unexpected error also occurred with naming a column, initially “6_Day_Moving_Avg” but R syntax rules made it so that variable names cannot start with a number so I adjusted to a Moving_Avg_6D name column.