week_3B_window_functions

Author

Brandon Chanderban

Introduction/Approach

The objective of this week 3B assignment is to apply window functions to time series data in order to compute both rolling and cumulative summary statistics. More specifically, this task requires the calculation of:

  • The year-to-date (YTD ) average, and

  • The six-day moving average

for two or more separate time-series items.

For this assignment, I plan to use end-of-day Bitcoin and Ethereum prices (beginning January 1st, 2022 to present day, February 14th, 2026) in order to showcase grouped window functionality across multiple instruments.

All calculations will be executed within R using the dplyr package, utilizing grouped window operations here rather than via SQL.

Data Source

The dataset to be analyzed will likely consist of three principal components:

  1. The date variable,

  2. The instrument (in this case, BTC and ETH), and

  3. The variable of closing price.

The data may be obtained from a public finance API (such as Yahoo Finance, or some other cryptocurrency data provider) and then imported within RStudio as a dataframe.

In order to properly calculate the desired summary statistics, the data must:

  • Be sorted into chronological order,

  • Contain no missing values (or, if present, be handled accordingly), and

  • Include more than one instrument so as to demonstrate and make use of grouped window calculations.

Anticipated Challenges

  • Ensuring that the data is arranged in chronological order prior to applying rolling average calculations.

  • Making sure that the YTD average resets at the start of each new calendar year within the dataset.

  • Ensuring that grouped window functions operate independently across the different examined instruments (BTC and ETH).

Optional Endeavor

In this assignment, I may generate plots to visually depict:

  • The closing prices (per instrument) over the examined time period,

  • The YTD average, and/or

  • The six-day moving average on a time-series graph.

Such visualizations as above would serve to reinforce the interpretations garnered from the window calculations.

Code Base (Body)

Upon some further investigation, a package by the name of “tidyquant” was discovered and installed.

#install.packages("tidyquant")

This package enables the retrieval of financial time-series data (such as the desired cryptocurrency prices, and even stock prices) directly into RStudio, while maintaining compatibility with tidyverse workflows.

Subsequently, the libraries to be utilized within the scope of this analysis were loaded.

library(tidyquant)
Warning: package 'tidyquant' was built under R version 4.5.2
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Warning: package 'xts' was built under R version 4.5.2
Warning: package 'zoo' was built under R version 4.5.2
Warning: package 'quantmod' was built under R version 4.5.2
Warning: package 'TTR' was built under R version 4.5.2
Warning: package 'PerformanceAnalytics' was built under R version 4.5.2
── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
✔ PerformanceAnalytics 2.0.8      ✔ TTR                  0.24.4
✔ quantmod             0.4.28     ✔ xts                  0.14.1
── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
✖ zoo::as.Date()                 masks base::as.Date()
✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
✖ PerformanceAnalytics::legend() masks graphics::legend()
✖ quantmod::summary()            masks base::summary()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

######################### Warning from 'xts' package ##########################
#                                                                             #
# The dplyr lag() function breaks how base R's lag() function is supposed to  #
# work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
# source() into this session won't work correctly.                            #
#                                                                             #
# Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
# conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
# dplyr from breaking base R's lag() function.                                #
#                                                                             #
# Code in packages is not affected. It's protected by R's namespace mechanism #
# Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
#                                                                             #
###############################################################################

Attaching package: 'dplyr'

The following objects are masked from 'package:xts':

    first, last

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Now, the financial cryptocurrency data (end-of-day prices for BTC and ETH) will be called forth using the tq_get() functionality.

crypto_prices <- tq_get(
  c("BTC-USD","ETH-USD"),
  from = "2022-01-01",
  to = "2026-02-13"
)

glimpse(crypto_prices)
Rows: 3,010
Columns: 8
$ symbol   <chr> "BTC-USD", "BTC-USD", "BTC-USD", "BTC-USD", "BTC-USD", "BTC-U…
$ date     <date> 2022-01-01, 2022-01-02, 2022-01-03, 2022-01-04, 2022-01-05, …
$ open     <dbl> 46311.75, 47680.93, 47343.54, 46458.85, 45899.36, 43565.51, 4…
$ high     <dbl> 47827.31, 47881.41, 47510.73, 47406.55, 46929.05, 43748.72, 4…
$ low      <dbl> 46288.48, 46856.94, 45835.96, 45752.46, 42798.22, 42645.54, 4…
$ close    <dbl> 47686.81, 47345.22, 46458.12, 45897.57, 43569.00, 43160.93, 4…
$ volume   <dbl> 24582667004, 27951569547, 33071628362, 42494677905, 368510848…
$ adjusted <dbl> 47686.81, 47345.22, 46458.12, 45897.57, 43569.00, 43160.93, 4…
head(crypto_prices)
# A tibble: 6 × 8
  symbol  date         open   high    low  close      volume adjusted
  <chr>   <date>      <dbl>  <dbl>  <dbl>  <dbl>       <dbl>    <dbl>
1 BTC-USD 2022-01-01 46312. 47827. 46288. 47687. 24582667004   47687.
2 BTC-USD 2022-01-02 47681. 47881. 46857. 47345. 27951569547   47345.
3 BTC-USD 2022-01-03 47344. 47511. 45836. 46458. 33071628362   46458.
4 BTC-USD 2022-01-04 46459. 47407. 45752. 45898. 42494677905   45898.
5 BTC-USD 2022-01-05 45899. 46929. 42798. 43569. 36851084859   43569.
6 BTC-USD 2022-01-06 43566. 43749. 42646. 43161. 30208048289   43161.

Successfully calling forth the data and assigning it into a dataframe(crypto_prices), we can observe that there are approximately 3,010 rows within the dataframe, as well as 8 columns. Of those 8, the 3 that we will focus on are those of: symbol, date, and close. The dataframe will thus be subsetted to concentrate upon these variables.

crypto_data <- crypto_prices %>%
  select(symbol, date, close)

glimpse(crypto_data)
Rows: 3,010
Columns: 3
$ symbol <chr> "BTC-USD", "BTC-USD", "BTC-USD", "BTC-USD", "BTC-USD", "BTC-USD…
$ date   <date> 2022-01-01, 2022-01-02, 2022-01-03, 2022-01-04, 2022-01-05, 20…
$ close  <dbl> 47686.81, 47345.22, 46458.12, 45897.57, 43569.00, 43160.93, 415…
head(crypto_data)
# A tibble: 6 × 3
  symbol  date        close
  <chr>   <date>      <dbl>
1 BTC-USD 2022-01-01 47687.
2 BTC-USD 2022-01-02 47345.
3 BTC-USD 2022-01-03 46458.
4 BTC-USD 2022-01-04 45898.
5 BTC-USD 2022-01-05 43569.
6 BTC-USD 2022-01-06 43161.
unique(crypto_data$symbol)
[1] "BTC-USD" "ETH-USD"

Now that we have isolated the variables of interest, the next vital step is to ensure that the data is properly arranged in chronological order for each of the instruments. This is necessary since the window functions are dependent upon the correct ordering of observations.

crypto_data <- crypto_data %>%
  arrange(symbol, date)

head(crypto_data)
# A tibble: 6 × 3
  symbol  date        close
  <chr>   <date>      <dbl>
1 BTC-USD 2022-01-01 47687.
2 BTC-USD 2022-01-02 47345.
3 BTC-USD 2022-01-03 46458.
4 BTC-USD 2022-01-04 45898.
5 BTC-USD 2022-01-05 43569.
6 BTC-USD 2022-01-06 43161.

Creating a Year Variable

Since one of the principal goals of this analysis involves computing a Year-To-Date (YTD) average, it makes logical sense that we must possess a year variable to compute that statistic correctly. Therefore, we extract the calendar year from the date variable, allowing the YTD calculation to reset at the start of each new year.

library(lubridate)

Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
crypto_data <- crypto_data %>%
  mutate(year = lubridate::year(date))

head(crypto_data)
# A tibble: 6 × 4
  symbol  date        close  year
  <chr>   <date>      <dbl> <dbl>
1 BTC-USD 2022-01-01 47687.  2022
2 BTC-USD 2022-01-02 47345.  2022
3 BTC-USD 2022-01-03 46458.  2022
4 BTC-USD 2022-01-04 45898.  2022
5 BTC-USD 2022-01-05 43569.  2022
6 BTC-USD 2022-01-06 43161.  2022
unique(crypto_data$year)
[1] 2022 2023 2024 2025 2026

Calculating the YTD Average

The YTD average is a cumulative average computed within the time window of a given year, and for each specific examined instrument (BTC-USD and ETH-USD).

This average, grouped by symbol and year, is computed below.

crypto_data <- crypto_data %>%
  group_by(symbol, year) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    ytd_avg = cumsum(close) / row_number()
  ) %>%
  ungroup()

head(crypto_data)
# A tibble: 6 × 5
  symbol  date        close  year ytd_avg
  <chr>   <date>      <dbl> <dbl>   <dbl>
1 BTC-USD 2022-01-01 47687.  2022  47687.
2 BTC-USD 2022-01-02 47345.  2022  47516.
3 BTC-USD 2022-01-03 46458.  2022  47163.
4 BTC-USD 2022-01-04 45898.  2022  46847.
5 BTC-USD 2022-01-05 43569.  2022  46191.
6 BTC-USD 2022-01-06 43161.  2022  45686.

In the calculations above, grouping by both symbol and year ensures that BTC and ETH are handled independently, and also allows the YTD average to reset at the start of each new calendar year. The cumsum() function calculates the cumulative sum of closing prices, and row_number() tracks how many observations have occurred so far within that year, which allows the running average to be computed at each row.

Calculating the Six-Day Moving Average

A primary difference between the YTD and the six-day moving average is that the six-day average (unlike YTD, which accumulates from the start of the given year) only takes the most recent six days into account.

For this calculation, the slider package will be installed and loaded. This package is particularly useful rolling statistics (like moving averages), since it allows control over the window size and whether partial windows are permitted.

#install.packages("slider")
library(slider)
Warning: package 'slider' was built under R version 4.5.2

Next, we calculate the six-day moving average for each instrument in the crypto_data dataset. One thing to note pertains to the first five observations. Since there are not yet six days of prior values available, we set .complete = TRUE, which causes those early rows to return NA (a common and reasonable approach). The .before argument in turn defines how many observations preceding the current element should be included within the sliding window. Since we want to focus on the previous six days (inclusive of the current day), we shall set this argument as being equal to 5.

crypto_data <- crypto_data %>%
  group_by(symbol) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    mov_avg_6day = slide_dbl(
     close,
     mean,
     .before = 5,
     .complete = TRUE
      
    )
  ) %>%
  ungroup()

head(crypto_data)
# A tibble: 6 × 6
  symbol  date        close  year ytd_avg mov_avg_6day
  <chr>   <date>      <dbl> <dbl>   <dbl>        <dbl>
1 BTC-USD 2022-01-01 47687.  2022  47687.          NA 
2 BTC-USD 2022-01-02 47345.  2022  47516.          NA 
3 BTC-USD 2022-01-03 46458.  2022  47163.          NA 
4 BTC-USD 2022-01-04 45898.  2022  46847.          NA 
5 BTC-USD 2022-01-05 43569.  2022  46191.          NA 
6 BTC-USD 2022-01-06 43161.  2022  45686.       45686.
tail(crypto_data)
# A tibble: 6 × 6
  symbol  date       close  year ytd_avg mov_avg_6day
  <chr>   <date>     <dbl> <dbl>   <dbl>        <dbl>
1 ETH-USD 2026-02-08 2089.  2026   2876.        2073.
2 ETH-USD 2026-02-09 2104.  2026   2857.        2052.
3 ETH-USD 2026-02-10 2019.  2026   2837.        2031.
4 ETH-USD 2026-02-11 1941.  2026   2815.        2051.
5 ETH-USD 2026-02-12 1947.  2026   2795.        2032.
6 ETH-USD 2026-02-13 2049.  2026   2778.        2025.

As shown in the output, the first five observations are encoded as NA for the six-day moving average, while observations from day six onward contain valid moving-average values. For example, on 2022-01-06, the sum of the previous 6 days closing values (from 2022-01-01 through 2022-01-06) is that of 274,117.65. The prior figure, once divided by 6, results in 45,686.275, which corresponds to the figure above.

Optional Endeavor: Visualization

As mentioned in the approach, one pathway to further examine the data is visualization, which allows for easier discernment of trends over time. In the depiction below, a comparison between the closing price versus he six-day moving average is plotted for both cryptocurrencies. These comparisons are placed side by side for easier assessment.

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.2
ggplot(crypto_data, aes(x = date)) +
  geom_line(aes(y = close, color = "Closing Price")) +
  geom_line(aes(y = mov_avg_6day, color = "Six-Day Moving Avg.")) +
  facet_wrap(~ symbol, scales = "free_y") +
  labs(
    title = "BTC and ETH Closing Prices with 6-Day Moving Average",
    y = "Price",
    x = "Date"
  ) +
  theme_minimal()
Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_line()`).

Both Bitcoin and Ethereum display clear inclines and declines throughout the years, with prices notably dropping in 2022, rising strongly into 2024 and 2025, then becoming more unstable yet again toward current times.

Another comparison that may be drawn relates to the relationship between the closing prices and the YTD averages. This relation will be depicted in the graph below.

ggplot(crypto_data, aes(x = date)) +
  geom_line(aes(y = close, color = "Closing Price"), alpha = 0.6) +
  geom_line(aes(y = ytd_avg, color = "YTD Average"), size = 1) +
  facet_wrap(~ symbol, scales = "free_y") +
  labs(
    title = "Closing Price vs Year-to-Date (YTD) Average",
    x = "Date",
    y = "Price (USD)",
    color = "Legend"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

In looking at the above graph, the YTD average appears to be smoother than the six-day moving average since it is calculated using all observations from the beginning of the year up to each given date, which in turn dampens the short-term fluctuations. In contrast, the six-day moving average only considers a small recent window of data, making it more responsive to daily price swings and therefore more variable.

Conclusion

In fulfilling the requirements of this assignment, it became quite lucid that window functions provide a highly effective means for calculating cumulative and rolling statistics within time-series data. In this scenario, I used grouped operations in the dplyr library to compute both the six-day moving average and the year-to-date (YTD) average for Bitcoin and Ethereum. Each instrument was handled in its own grouping context, and the exercise in and of itself compounded the importance of proper chronological sequencing and grouping when conducting time-series analysis.

LLM Used

  • OpenAI. (2026). ChatGPT (Version 4o) [Large language model]. https://chat.openai.com . Accessed February 14, 2026.