Assignment 3B Window Functions Code Base

Author

Long Lin

Window Functions Overview

For this assignment, I will use a dataset that I got originally from kaggle and uploaded to github. The dataset includes daily bitcoin historical data from 2018 to 2026. I will use window functions to calculate the year-to-date average and the six-day moving averages for each observation in the year 2026.

kaggle source: https://www.kaggle.com/datasets/novandraanugrah/bitcoin-historical-datasets-2018-2024?select=btc_1d_data_2018_to_2025.csv

github source: https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Assignment%203B/btc_1d_data_2018_to_2025.csv

Calculating year-to-date average

In order to calculate the year to date average, I will start by importing the data into a dataframe.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
url <- "https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Assignment%203B/btc_1d_data_2018_to_2025.csv"

btc_df <- read_csv(
  file = url,
  show_col_types = FALSE,
  progress = FALSE
)

head(btc_df, 10)
# A tibble: 10 × 12
   `Open time`                     Open   High    Low  Close Volume `Close time`
   <chr>                          <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>       
 1 2018-01-01 00:00:00.000000 U… 13716. 13819. 12750  13380   8610. 2018-01-01 …
 2 2018-01-02 00:00:00.000000 U… 13382. 15473. 12890. 14675. 20078. 2018-01-02 …
 3 2018-01-03 00:00:00.000000 U… 14690  15308. 14150  14920. 15906. 2018-01-03 …
 4 2018-01-04 00:00:00.000000 U… 14920. 15280  13918. 15060. 21330. 2018-01-04 …
 5 2018-01-05 00:00:00.000000 U… 15060. 17176. 14600  16960. 23251. 2018-01-05 …
 6 2018-01-06 00:00:00.000000 U… 16960. 17143. 16011. 17070. 18571. 2018-01-06 …
 7 2018-01-07 00:00:00.000000 U… 17070. 17100. 15610  16150. 12493. 2018-01-07 …
 8 2018-01-08 00:00:00.000000 U… 16219. 16322. 12812  14903. 26601. 2018-01-08 …
 9 2018-01-09 00:00:00.000000 U… 14903. 15500  14011. 14400  14315. 2018-01-09 …
10 2018-01-10 00:00:00.000000 U… 14401  14956. 13131. 14907. 17411. 2018-01-10 …
# ℹ 5 more variables: `Quote asset volume` <dbl>, `Number of trades` <dbl>,
#   `Taker buy base asset volume` <dbl>, `Taker buy quote asset volume` <dbl>,
#   Ignore <dbl>

Next I will take a subset of the data for only items in 2026 and include only the “Close Time” and “Close” columns. These columns represent the date and closing price for the specific observation. I also extracted the date out of the “Close Time” data point because the close time was not needed since it was always 23:59 UTC time.

btc_2026_df <- btc_df |>
  filter(
    year(btc_df$`Close time`) == "2026"
  ) 

btc_2026_df <- btc_2026_df[c("Close time", "Close")]

btc_2026_df <- btc_2026_df |>
  mutate(
    Date = as.Date(`Close time`)
  )

btc_2026_df <- btc_2026_df[c("Date", "Close")]

head(btc_2026_df, 10)
# A tibble: 10 × 2
   Date        Close
   <date>      <dbl>
 1 2026-01-01 87875.
 2 2026-01-02 88732.
 3 2026-01-03 90186.
 4 2026-01-04 91104 
 5 2026-01-05 92948.
 6 2026-01-06 93956.
 7 2026-01-07 92531.
 8 2026-01-08 91429.
 9 2026-01-09 91179.
10 2026-01-10 90641.

With the subset of observations from 2026, I’ll create another column that shows the YTD average. In order to get the year to date average, I first have to arrange the data in order by the close time and then calculate the cumulative mean on the close price.

btc_2026_df <- btc_2026_df |>
  arrange(btc_2026_df$`Close time`) |>
    mutate("YTD Average" = cummean(btc_2026_df$Close))
Warning: There was 1 warning in `arrange()`.
ℹ In argument: `..1 = btc_2026_df$`Close time``.
Caused by warning:
! Unknown or uninitialised column: `Close time`.
head(btc_2026_df, 10)
# A tibble: 10 × 3
   Date        Close `YTD Average`
   <date>      <dbl>         <dbl>
 1 2026-01-01 87875.        87875.
 2 2026-01-02 88732.        88304.
 3 2026-01-03 90186.        88931.
 4 2026-01-04 91104         89474.
 5 2026-01-05 92948.        90169.
 6 2026-01-06 93956.        90800.
 7 2026-01-07 92531.        91047.
 8 2026-01-08 91429.        91095.
 9 2026-01-09 91179.        91105.
10 2026-01-10 90641.        91058.

Next I will create a column for the six day moving averages for the 2026 bitcoin data. In order to get the six day moving average, I’ll use the rollmean() function from the “zoo” package.

library(zoo) # Required for rollmean()

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
btc_2026_df <- btc_2026_df |>
  arrange(btc_2026_df$`Close time`) |>
    mutate(
      # Rolling (6-Day) - Window moves with the date
      # k=6: The window size
      # fill=NA: Fills the first 5 days with NA (since they don't have enough history)
      # align="right": Calculates using the current day + previous 5 days
      "6 Day Moving Average" = rollmean(btc_2026_df$Close, k = 6, fill = NA, align = "right")
    )
Warning: There was 1 warning in `arrange()`.
ℹ In argument: `..1 = btc_2026_df$`Close time``.
Caused by warning:
! Unknown or uninitialised column: `Close time`.
head(btc_2026_df, 10)
# A tibble: 10 × 4
   Date        Close `YTD Average` `6 Day Moving Average`
   <date>      <dbl>         <dbl>                  <dbl>
 1 2026-01-01 87875.        87875.                    NA 
 2 2026-01-02 88732.        88304.                    NA 
 3 2026-01-03 90186.        88931.                    NA 
 4 2026-01-04 91104         89474.                    NA 
 5 2026-01-05 92948.        90169.                    NA 
 6 2026-01-06 93956.        90800.                 90800.
 7 2026-01-07 92531.        91047.                 91576.
 8 2026-01-08 91429.        91095.                 92026.
 9 2026-01-09 91179.        91105.                 92191.
10 2026-01-10 90641.        91058.                 92114.

Finally, to improve readability, I’ll change the column name for “Close” to “Price”, and add currency formatting for the price, YTD average, and 6 day moving average. I also removed the decimal values that represent the cent values on the prices because the numbers are large and the cent values are just insignificant in this context.

library(gt)
btc_2026_df |>
  gt() |>
  cols_label(
    Close = "Price"
  ) |>
  fmt_currency(
    columns = c(`Close`, `YTD Average`, `6 Day Moving Average`),
    currency = "USD",
    decimals = 0
  )
Date Price YTD Average 6 Day Moving Average
2026-01-01 $87,875 $87,875 NA
2026-01-02 $88,732 $88,304 NA
2026-01-03 $90,186 $88,931 NA
2026-01-04 $91,104 $89,474 NA
2026-01-05 $92,948 $90,169 NA
2026-01-06 $93,956 $90,800 $90,800
2026-01-07 $92,531 $91,047 $91,576
2026-01-08 $91,429 $91,095 $92,026
2026-01-09 $91,179 $91,105 $92,191
2026-01-10 $90,641 $91,058 $92,114
2026-01-11 $90,581 $91,015 $91,720
2026-01-12 $91,459 $91,052 $91,303
2026-01-13 $91,386 $91,078 $91,113
2026-01-14 $95,501 $91,394 $91,791
2026-01-15 $96,522 $91,735 $92,682
2026-01-16 $95,548 $91,974 $93,500
2026-01-17 $95,478 $92,180 $94,316
2026-01-18 $95,026 $92,338 $94,910
2026-01-19 $92,571 $92,350 $95,108
2026-01-20 $92,807 $92,373 $94,659
2026-01-21 $89,004 $92,213 $93,406
2026-01-22 $90,157 $92,119 $92,507
2026-01-23 $89,666 $92,013 $91,538
2026-01-24 $89,573 $91,911 $90,630
2026-01-25 $89,187 $91,802 $90,066
2026-01-26 $87,607 $91,641 $89,199
2026-01-27 $88,408 $91,521 $89,100
2026-01-28 $89,309 $91,442 $88,958
2026-01-29 $88,846 $91,352 $88,821
2026-01-30 $82,422 $91,055 $87,630
2026-01-31 $84,010 $90,827 $86,767
2026-02-01 $78,589 $90,445 $85,264
2026-02-02 $77,654 $90,057 $83,471
2026-02-03 $78,948 $89,731 $81,745
2026-02-04 $76,597 $89,355 $79,703
2026-02-05 $72,549 $88,889 $78,058
2026-02-06 $65,245 $88,249 $74,930
2026-02-07 $70,264 $87,776 $73,543
2026-02-08 $69,421 $87,306 $72,171
2026-02-09 $70,380 $86,882 $70,743
2026-02-10 $70,426 $86,481 $69,714
2026-02-11 $69,159 $86,069 $69,149
2026-02-12 $67,635 $85,640 $69,547
2026-02-13 $66,395 $85,203 $68,903
2026-02-14 $69,015 $84,843 $68,835

Final Conclusions

Using the table above, I am able to see that even though the price of bitcoin dipped below $70,000 in February 2026, the YTD average up to that point was still above $80,000 while the 6 day moving average was a lot closer to the actual price at around $70,000. I think in this example, the 6 day moving average is a much better average because it gives a value closer to the current price.