Assignment3BFinal

Approach

I used an LLM, specifically Claude, to generate a data set of the three largest crypto coins on the market, going back to 2021 (the loaded file begins on 2022-01-01), with the date and closing price for each coin. Claude generated the data for each coin and produced a CSV file, which is attached and loaded below. I will use this data set to calculate the year-to-date average and the six-day moving average for each of the three coins.

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(readr)
library(lubridate)


Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

Loading Data

url <- "https://raw.githubusercontent.com/AslamF/DATA607-Assignment-3B/refs/heads/main/cryptoData-Set.csv"

cryptoData <- read.csv(url)

glimpse(cryptoData)

Rows: 4,512
Columns: 3
$ date        <chr> "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2…
$ crypto      <chr> "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "B…
$ close_price <dbl> 47607.14, 47466.38, 48258.70, 50120.32, 49851.98, 49585.10…

unique(cryptoData$crypto)

[1] "BTC"  "ETH"  "USDT"

range(cryptoData$date)

[1] "2022-01-01" "2026-02-12"
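
The six-day moving average computed in the next section treats the previous row as the previous day, so it is worth confirming that each coin has one row per calendar day. The sketch below is one possible check, shown on made-up toy data so it runs without the download. The helper name count_date_gaps and the coins "AAA" and "BBB" are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper (not in the original report): for each coin, count the
# rows and the places where the next date is not exactly one day later.
count_date_gaps <- function(df) {
  df %>%
    mutate(date = as.Date(date)) %>%
    arrange(crypto, date) %>%
    group_by(crypto) %>%
    summarise(
      n_days = n(),
      n_gaps = sum(as.numeric(diff(date)) != 1),
      .groups = "drop"
    )
}

# Toy data: coin "AAA" has consecutive days, coin "BBB" skips 2022-01-03
toy <- tibble(
  date = c("2022-01-01", "2022-01-02", "2022-01-03",
           "2022-01-01", "2022-01-02", "2022-01-04"),
  crypto = rep(c("AAA", "BBB"), each = 3),
  close_price = c(1, 2, 3, 10, 20, 30)
)

gaps <- count_date_gaps(toy)
print(gaps)
```

Calling count_date_gaps(cryptoData) would apply the same check to the real file. The 4,512 rows split evenly into 1,504 per coin, which equals the number of calendar days from 2022-01-01 through 2026-02-12. That is consistent with full daily coverage but does not prove it on its own.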

Calculations

The goal of this assignment is to calculate the year-to-date average and the six-day moving average for each cryptocurrency.

We pipe the data through several operations and save the final output as results, which is used to display the tables in the Results section. First, the data is sorted by crypto and arranged from oldest to newest. This is an extra step, since the LLM already organized the data this way, but it is good practice. year() comes from the lubridate library. lag(), from dplyr, is critical for the six-day moving average because it lets us select a value from a previous row; in our data set, a previous row represents the previous day. The first five days of each coin therefore have no six-day average (NA), since fewer than six prices are available. cummean() is the critical function for the year-to-date average because it provides the running average. Because the data is grouped by coin and year, for each day it averages every closing price from January 1 of that year through that day, and it resets at the start of each year.

results <- cryptoData %>%
  arrange(crypto, date) %>%   # oldest to newest within each coin
  group_by(crypto) %>%
  mutate(
    year = year(date),
    moving_avg_6day = (
      close_price +           # Current day
      lag(close_price, 1) +   # 1 day ago
      lag(close_price, 2) +   # 2 days ago
      lag(close_price, 3) +   # 3 days ago
      lag(close_price, 4) +   # 4 days ago
      lag(close_price, 5)     # 5 days ago
    ) / 6
  ) %>%
  group_by(crypto, year) %>%  # restart the running mean each year
  mutate(
    ytd_average = cummean(close_price)
  ) %>%
  ungroup()
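
To make the behavior of lag() and cummean() easier to verify by hand, here is a minimal sketch that runs the same steps on eight made-up prices spanning a year boundary. The coin "AAA" and its prices are hypothetical. The cross-check with stats::filter() is an extra, not part of the original pipeline; it is called with the stats:: prefix because dplyr masks filter().

```r
library(dplyr)
library(lubridate)

# Hypothetical toy prices: 5 days at the end of 2021, 3 days at the start of 2022
toy <- tibble(
  date = as.Date("2021-12-27") + 0:7,
  crypto = "AAA",
  close_price = c(10, 12, 14, 16, 18, 20, 22, 24)
)

toy_results <- toy %>%
  arrange(crypto, date) %>%
  group_by(crypto) %>%
  mutate(
    year = year(date),
    # Trailing 6-day mean built from lagged values; NA until 6 prices exist
    moving_avg_6day = (
      close_price + lag(close_price, 1) + lag(close_price, 2) +
        lag(close_price, 3) + lag(close_price, 4) + lag(close_price, 5)
    ) / 6
  ) %>%
  group_by(crypto, year) %>%
  # Running mean that restarts on the first row of each year
  mutate(ytd_average = cummean(close_price)) %>%
  ungroup()

# Same trailing mean via a one-sided convolution filter from base R
ma_check <- as.numeric(stats::filter(toy$close_price, rep(1 / 6, 6), sides = 1))

print(toy_results)
```

The moving average is NA for the first five rows and then 15, 17, and 19, and it carries across the year boundary because it is grouped by coin only. The year-to-date average runs 10, 11, 12, 13, 14 for 2021 and then restarts at 20, 21, 22 for 2022.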

Results

results %>%
  filter(crypto == "BTC", date >= "2022-01-01", date <= "2022-01-10") %>%
  select(date, crypto, close_price, moving_avg_6day, ytd_average)

# A tibble: 10 × 5
   date       crypto close_price moving_avg_6day ytd_average
   <chr>      <chr>        <dbl>           <dbl>       <dbl>
 1 2022-01-01 BTC         47607.             NA       47607.
 2 2022-01-02 BTC         47466.             NA       47537.
 3 2022-01-03 BTC         48259.             NA       47777.
 4 2022-01-04 BTC         50120.             NA       48363.
 5 2022-01-05 BTC         49852.             NA       48661.
 6 2022-01-06 BTC         49585.          48815.      48815.
 7 2022-01-07 BTC         51568.          49475.      49208.
 8 2022-01-08 BTC         52583.          50328.      49630.
 9 2022-01-09 BTC         51992.          50950.      49892.
10 2022-01-10 BTC         52723.          51384.      50175.

results %>%
  filter(crypto == "ETH", date >= "2022-01-01", date <= "2022-01-10") %>%
  select(date, crypto, close_price, moving_avg_6day, ytd_average)

# A tibble: 10 × 5
   date       crypto close_price moving_avg_6day ytd_average
   <chr>      <chr>        <dbl>           <dbl>       <dbl>
 1 2022-01-01 ETH          3683.             NA        3683.
 2 2022-01-02 ETH          3635.             NA        3659.
 3 2022-01-03 ETH          3712.             NA        3677.
 4 2022-01-04 ETH          3821.             NA        3713.
 5 2022-01-05 ETH          3833.             NA        3737.
 6 2022-01-06 ETH          4005.           3781.       3781.
 7 2022-01-07 ETH          3869.           3812.       3794.
 8 2022-01-08 ETH          3849.           3848.       3801.
 9 2022-01-09 ETH          3768.           3857.       3797.
10 2022-01-10 ETH          3559.           3814.       3773.

results %>%
  filter(crypto == "USDT", date >= "2022-01-01", date <= "2022-01-10") %>%
  select(date, crypto, close_price, moving_avg_6day, ytd_average)

# A tibble: 10 × 5
   date       crypto close_price moving_avg_6day ytd_average
   <chr>      <chr>        <dbl>           <dbl>       <dbl>
 1 2022-01-01 USDT             1              NA           1
 2 2022-01-02 USDT             1              NA           1
 3 2022-01-03 USDT             1              NA           1
 4 2022-01-04 USDT             1              NA           1
 5 2022-01-05 USDT             1              NA           1
 6 2022-01-06 USDT             1               1           1
 7 2022-01-07 USDT             1               1           1
 8 2022-01-08 USDT             1               1           1
 9 2022-01-09 USDT             1               1           1
10 2022-01-10 USDT             1               1           1

Conclusion

The first thing I notice is how “useless” the data for USDT appears. It hovers around 1.0, shifts very slowly, and barely reaches 2.0 over four years. The data is not very entertaining, and it is hard to extract information about the coin itself from it. This shows me a lot about the importance of data quality. In the future, to improve what is shown, I could report yearly changes for each cryptocurrency and monthly changes within each year, which would be more impactful than seeing every day's change. As an investor, a six-day average is more important than something like a monthly change, but large changes over longer periods make the data more impactful to look at. It really tells me that what you want to use the data for affects what type of analysis you do on it.
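
As a rough sketch of the monthly view mentioned above, the following averages the closing price per coin per month and computes the percent change from the previous month. It is one possible approach rather than part of the submitted analysis. The helper name monthly_change, the coin "AAA", and the toy prices are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: average close per coin per month, plus the percent
# change in that average relative to the previous month.
monthly_change <- function(df) {
  df %>%
    mutate(month = floor_date(as.Date(date), unit = "month")) %>%
    group_by(crypto, month) %>%
    summarise(avg_close = mean(close_price), .groups = "drop") %>%
    arrange(crypto, month) %>%
    group_by(crypto) %>%
    mutate(pct_change = (avg_close / dplyr::lag(avg_close) - 1) * 100) %>%
    ungroup()
}

# Toy data: two prices in January 2022 and two in February 2022
toy <- tibble(
  date = c("2022-01-15", "2022-01-20", "2022-02-10", "2022-02-25"),
  crypto = "AAA",
  close_price = c(100, 120, 130, 134)
)

monthly <- monthly_change(toy)
print(monthly)
```

On the toy data the January average is 110 and the February average is 132, a 20 percent increase; the first month has no previous month, so its change is NA. Calling monthly_change(cryptoData) would produce the same summary for the three coins.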