if (!require("ISwR")) {
install.packages("ISwR")
library(ISwR)
}
## Loading required package: ISwR
## Warning: package 'ISwR' was built under R version 4.4.3
if (!require("MASS")) {
install.packages("MASS")
library(MASS)
}
## Loading required package: MASS
if (!require("knitr")) {
install.packages("knitr")
library(knitr)
}
## Loading required package: knitr
if (!require("forecast")) {
install.packages("forecast")
library(forecast)
}
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 4.4.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
if (!require("TTR")) {
install.packages("TTR")
library(TTR)
}
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 4.4.3
if (!require("dplyr")) {
install.packages("dplyr")
library(dplyr)
}
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require("ggplot2")) {
install.packages("ggplot2")
library(ggplot2)
}
## Loading required package: ggplot2
if (!require("tseries")) {
install.packages("tseries")
library(tseries)
}
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 4.4.3
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE,
fig.align = "center",
fig.pos = "ht"
)
Time series decomposition is a process of splitting a time series into basic components: trend, seasonality, and random error. In this report, we apply the decomposition and forecasting methods discussed in class to a financial time series: weekly Bitcoin (BTC-USD) prices.
The goals of this analysis are:
The data used for this project is weekly BTC pricing data from 2016 to 2024. The goal is to forecast seasonality within BTC pricing.
## 2.1 Variable Definitions
To make the analysis transparent, we summarize below the variables used in the dataset along with their data types and descriptions:
Date
Type: Date
Description: The weekly observation date for the BTC price.
BTC.USD
Type: Numeric
Description: The closing price of Bitcoin (BTC) in U.S. dollars for each observation week.
These variables form the structure of the time series used throughout the decomposition and forecasting procedures.
data_path <- "C:/Users/rg03/Downloads/btc_pricing.csv"
btc_raw <- read.csv(data_path, stringsAsFactors = FALSE)
str(btc_raw)
## 'data.frame': 417 obs. of 2 variables:
## $ Date : chr "12/29/2016" "1/5/2017" "1/12/2017" "1/19/2017" ...
## $ BTC.USD: num 911 822 925 919 1027 ...
head(btc_raw)
names(btc_raw)
## [1] "Date" "BTC.USD"
# Convert Date column to Date class (MM/DD/YYYY format)
btc_raw$Date <- as.Date(btc_raw$Date, format = "%m/%d/%Y")
# Sort by Date
btc_raw <- btc_raw[order(btc_raw$Date), ]
# Keep up to the last 450 observations (the file has 417 rows, so all are retained)
data.btc <- tail(btc_raw, 450)
nrow(data.btc)
## [1] 417
head(data.btc)
tail(data.btc)
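The original dataset contains weekly Bitcoin prices over several years. For this assignment, we follow the guideline from the class notes and keep at most the 450 most recent observations as our working time series. Because the file contains only 417 weekly observations, tail() returns all of them, so the analysis uses the full series. The guideline is intended to balance having enough historical data with focusing on the more relevant recent behavior.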
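The behavior of `tail()` is worth checking directly: it returns at most the rows that exist, so requesting 450 rows from a shorter data frame does not raise an error. A minimal sketch, where `toy` is a made-up stand-in for `btc_raw` with placeholder prices:

```r
# Hypothetical stand-in for btc_raw: 417 weekly rows with placeholder prices
toy <- data.frame(
  Date = seq(as.Date("2016-12-29"), by = "week", length.out = 417),
  BTC.USD = seq_len(417)
)

# Asking for more rows than exist simply returns every row
kept <- tail(toy, 450)
n_kept <- nrow(kept)

# A guard that makes the effective sample size explicit
n_requested <- 450
n_effective <- min(n_requested, nrow(toy))
```

Reporting `n_effective` alongside the requested size keeps plot titles and prose consistent with the data actually used.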
We now convert the cleaned Bitcoin price data into a time series
object. Time series decomposition requires data to be stored in an R
ts object with an appropriate frequency. Since our data
consists of weekly observations, we assign a frequency of 52,
representing the number of weeks in a year. Defining the data as a time
series allows us to apply classical decomposition, STL decomposition,
and forecasting techniques in a structured and consistent framework.
btc_price <- data.btc[, 2]
btc.ts <- ts(btc_price, frequency = 52)
par(mar = c(3, 3, 2, 1))
plot(
btc.ts,
main = "Weekly Bitcoin Prices (417 Weeks)",
ylab = "BTC Price (USD)",
xlab = "Time",
col = "darkred"
)
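Because no `start` is supplied, `ts()` indexes the series from time 1 in units of 52-week cycles rather than calendar dates. The sketch below uses a placeholder vector of the same length as the data to show how that index is laid out. It also suggests why the forecast rows later in the report carry labels such as 9.019231.

```r
# Placeholder series with the same length as the BTC data (values are arbitrary)
toy.ts <- ts(seq_len(417), frequency = 52)

# tsp() returns start time, end time, and frequency
idx <- tsp(toy.ts)
start_time <- idx[1]   # first observation sits at time 1
end_time <- idx[2]     # 1 + 416 / 52

# The first out-of-sample step is one week (1/52 of a cycle) past the end
next_time <- end_time + 1 / 52

# cycle() gives each observation's position within its 52-week block
pos <- as.numeric(cycle(toy.ts))
```

Passing a `start` argument to `ts()` would put calendar years on the axis instead. That is a presentation choice and does not change the decomposition.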
Classical decomposition splits a time series into three structural components—trend, seasonality, and remainder—using moving averages and seasonal averaging. This method mirrors the step-by-step approach demonstrated in the lecture notes. In this section, we apply classical decomposition to the weekly Bitcoin series by extracting the long-term trend using a centered moving average, computing the seasonal component through weekly averaging, and identifying the random fluctuations in the remainder. These components allow us to understand the underlying structure of Bitcoin price dynamics.
To estimate the long-term behavior of Bitcoin prices, we compute a centered moving average with a window equal to the seasonal period (52 weeks). This smooths out short-term volatility and highlights the general direction of the series.
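For an even window such as 52, a centered moving average is usually computed as a 2×52 MA: the two end weights are halved so the window is symmetric around each week. The sketch below applies that weighting with base R's `stats::filter()` on a synthetic series. It illustrates the idea and should agree with `ma(x, order = 52, centre = TRUE)`, though that equivalence is an assumption here. The `stats::` prefix matters because `dplyr` masks `filter`, and `toy` and `t.idx` are made-up names.

```r
m <- 52

# Synthetic weekly series: linear trend plus a 52-week cycle
t.idx <- seq_len(260)
toy <- ts(100 + 2 * t.idx + 10 * sin(2 * pi * t.idx / m), frequency = m)

# 2 x m weights: half weight on the two end points, full weight in between
w <- c(0.5, rep(1, m - 1), 0.5) / m

# Two-sided (centered) filter; the first and last m/2 values are NA
trend.toy <- stats::filter(toy, filter = w, sides = 2)

# Count the undefined values at the start of the smoothed series
n.na.head <- sum(is.na(trend.toy[1:(m / 2)]))
```

Since the weights span exactly one full cycle, the sinusoid averages out and only the linear trend remains. This is also why the first and last 26 trend values of the BTC series are `NA`.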
Once the trend is removed, the detrended series is arranged into a matrix with one row per 52-week block and one column per week position, so that averaging down each column gives the mean deviation for that week of the cycle. This step reveals whether Bitcoin exhibits a systematic pattern that repeats every 52 weeks.
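The averaging step can be sketched on a small example. One practical caveat is that `matrix()` recycles its input when the length is not a multiple of the number of rows, so a series that does not fill a whole number of cycles should be padded with `NA` first. The example uses a made-up period of 4 and hypothetical names (`x`, `period`) to keep the arithmetic easy to follow.

```r
period <- 4

# Detrended toy values: 9 observations, i.e. two full cycles plus one extra
x <- c(1, -2, 0, 1,
       3, -4, 0, 1,
       5)

# Pad to a whole number of cycles so no value is recycled into the wrong slot
n.pad <- (period - length(x) %% period) %% period
x.pad <- c(x, rep(NA, n.pad))

# One row per cycle, one column per position within the cycle
mtrx <- t(matrix(x.pad, nrow = period))

# Average each position across cycles, ignoring the padding
seasonal <- colMeans(mtrx, na.rm = TRUE)

# Repeat the pattern to the original length
seasonal.full <- rep(seasonal, length.out = length(x))
```

The same padding applies to the BTC series, whose 417 observations are one more than eight full 52-week blocks.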
# Extract trend using a centered moving average with window = 52 (weekly seasonality)
trend.btc <- ma(btc.ts, order = 52, centre = TRUE)
par(mar = c(3, 3, 2, 1))
plot(
btc.ts,
xlab = "",
ylab = "BTC Price (USD)",
main = "Extract Trend from Weekly Bitcoin Prices",
col = "darkred",
lwd = 2
)
lines(trend.btc, col = "blue", lwd = 2)
legend(
"topleft",
c("Original series", "Trend curve"),
lwd = c(2, 2),
col = c("darkred", "blue"),
bty = "n"
)
# Detrend (additive model)
detrend.btc <- btc.ts - trend.btc
par(mar = c(3, 3, 2, 1))
plot(
detrend.btc,
xlab = "",
ylab = "Detrended BTC Price",
main = "Detrended Weekly BTC Prices",
col = "darkred"
)
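# Pad with NA to a whole number of 52-week blocks; otherwise matrix() recycles
# values into the wrong week positions (417 is not a multiple of 52)
n.pad <- (52 - length(detrend.btc) %% 52) %% 52
mtrx.btc <- t(matrix(data = c(as.numeric(detrend.btc), rep(NA, n.pad)), nrow = 52))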
# Average over each week position across all years/blocks
seasonal.btc <- colMeans(mtrx.btc, na.rm = TRUE)
# Repeat seasonal pattern over the length of the series
seasonal.btc.ts <- ts(
rep(seasonal.btc, length.out = length(btc.ts)),
frequency = 52
)
par(mar = c(3, 3, 2, 1))
plot(
seasonal.btc.ts,
xlab = "",
ylab = "Seasonal Component",
main = "Seasonal Series of Weekly BTC Prices",
col = "darkred"
)
# Random error in an additive model
random.btc <- btc.ts - trend.btc - seasonal.btc.ts
par(mfrow = c(3, 1), mar = c(3, 3, 2, 1))
plot(btc.ts,
main = "Original Weekly BTC Prices",
xlab = "", ylab = "BTC Price (USD)", col = "darkred")
plot(trend.btc,
main = "Trend Component",
xlab = "", ylab = "Trend", col = "blue")
plot(random.btc,
main = "Random Errors (Remainder)",
xlab = "", ylab = "Error", col = "darkgreen")
par(mfrow = c(1, 1))
recomposed.btc <- trend.btc + seasonal.btc.ts + random.btc
par(mar = c(3, 3, 2, 1))
plot(
btc.ts,
col = "darkred",
lty = 1,
main = "Original vs Reconstructed BTC Series",
xlab = "",
ylab = "BTC Price (USD)"
)
lines(recomposed.btc, col = "blue", lty = 2, lwd = 2)
legend(
"topleft",
c("Original series", "Reconstructed series"),
col = c("darkred", "blue"),
lty = 1:2,
lwd = 1:2,
cex = 0.8,
bty = "n"
)
decomp.btc <- decompose(btc.ts, type = "additive")
par(mar = c(2, 3, 2, 1))
plot(decomp.btc, col = "darkred", xlab = "")
seasonal.btc.dec <- decomp.btc$seasonal
trend.btc.dec <- decomp.btc$trend
error.btc.dec <- decomp.btc$random
head(seasonal.btc.dec)
## Time Series:
## Start = c(1, 1)
## End = c(1, 6)
## Frequency = 52
## [1] 405.4466 225.4120 -118.4234 -1529.4473 -1547.4751 -158.9906
head(trend.btc.dec)
## Time Series:
## Start = c(1, 1)
## End = c(1, 6)
## Frequency = 52
## [1] NA NA NA NA NA NA
head(error.btc.dec)
## Time Series:
## Start = c(1, 1)
## End = c(1, 6)
## Frequency = 52
## [1] NA NA NA NA NA NA
STL is a more flexible and robust decomposition technique than the classical approach. It applies locally weighted regression to produce smooth trend estimates and stable seasonal components. In this section, we use STL to decompose the weekly Bitcoin series and then generate 7-week forecasts using a random walk with drift. This forecasting approach reflects the assumption that future prices evolve similarly to recent trends, consistent with the methodology presented in the lecture notes.
stl.btc <- stl(btc.ts, s.window = "periodic")
par(mar = c(2, 3, 2, 1))
plot(stl.btc)
fit_stl <- stl(btc.ts, s.window = "periodic")
par(mar = c(3, 3, 2, 1))
fcst.btc <- forecast(fit_stl, h = 7, method = "rwdrift")
plot(fcst.btc,
main = "Forecasts from STL + Random Walk with Drift (BTC)",
ylab = "BTC Price (USD)",
xlab = "Time")
fcst.btc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 9.019231 92963.28 89585.05 96341.51 87796.72 98129.84
## 9.038462 92834.55 88051.28 97617.82 85519.17 100149.93
## 9.057692 91772.31 85907.03 97637.60 82802.13 100742.50
## 9.076923 91951.47 85170.75 98732.20 81581.25 102321.70
## 9.096154 93355.77 85765.67 100945.87 81747.71 104963.83
## 9.115385 95457.24 87132.83 103781.65 82726.16 108188.33
## 9.134615 96854.69 87852.65 105856.73 83087.26 110622.12
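To make the mechanics of this kind of forecast concrete, the sketch below rebuilds an STL plus drift forecast by hand on R's built-in `co2` series rather than the BTC data. It assumes the usual recipe: forecast the seasonally adjusted series with a random walk with drift, then add back the most recent cycle of the seasonal component. It produces point forecasts only, and the variable names are illustrative.

```r
# Built-in monthly series (frequency = 12), used only for illustration
y <- co2
fit <- stl(y, s.window = "periodic")

seas <- fit$time.series[, "seasonal"]
sa <- y - seas                      # seasonally adjusted series
n <- length(sa)
h <- 7
freq <- frequency(y)

# Random walk with drift: average historical change per step
drift <- (sa[n] - sa[1]) / (n - 1)
sa.fc <- sa[n] + drift * seq_len(h)

# Re-seasonalize with the last observed cycle of the seasonal component
last.cycle <- seas[(n - freq + 1):n]
seas.fc <- last.cycle[((seq_len(h) - 1) %% freq) + 1]

point.fc <- as.numeric(sa.fc + seas.fc)
```

The drift term depends only on the first and last seasonally adjusted values, so a single large move at either end of the sample shifts every forecast step.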
To evaluate how training sample size impacts forecasting accuracy, we follow the case study from the lecture notes by holding out the last seven observations as a test set. We then construct four training sets of decreasing length (410, 390, 380, and 371 observations) by moving the starting point forward, and apply STL decomposition and naive forecasting to each. Comparing their forecast performance allows us to examine whether more historical data necessarily improves accuracy in a volatile financial series like Bitcoin.
ini.data <- as.numeric(btc.ts)
n0 <- length(ini.data)
# Last 7 observations used as the test set
test.data <- ini.data[(n0 - 6):n0]
# Four training sets with different starting points
train.data01 <- ini.data[1:(n0 - 7)]
train.data02 <- ini.data[21:(n0 - 7)]
train.data03 <- ini.data[31:(n0 - 7)]
train.data04 <- ini.data[40:(n0 - 7)]
length(train.data01)
## [1] 410
length(train.data02)
## [1] 390
length(train.data03)
## [1] 380
length(train.data04)
## [1] 371
train01.ts <- ts(train.data01, frequency = 52)
train02.ts <- ts(train.data02, frequency = 52)
train03.ts <- ts(train.data03, frequency = 52)
train04.ts <- ts(train.data04, frequency = 52)
stl01 <- stl(train01.ts, s.window = "periodic")
stl02 <- stl(train02.ts, s.window = "periodic")
stl03 <- stl(train03.ts, s.window = "periodic")
stl04 <- stl(train04.ts, s.window = "periodic")
fcst01 <- forecast(stl01, h = 7, method = "naive")
fcst02 <- forecast(stl02, h = 7, method = "naive")
fcst03 <- forecast(stl03, h = 7, method = "naive")
fcst04 <- forecast(stl04, h = 7, method = "naive")
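We compute prediction errors for each model using mean squared error (MSE) and mean absolute percentage error (MAPE). These metrics quantify how well each training-size model predicts the held-out test data. In the code below, percentage errors are scaled by the forecast value rather than the observed value, and MAPE is reported as a proportion, so 0.23 corresponds to 23%. Following the template in the lecture notes, we summarize the results in a table to identify the most accurate model.
# Percentage errors (relative to the forecast value, as proportions)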
PE01 <- (test.data - fcst01$mean) / fcst01$mean
PE02 <- (test.data - fcst02$mean) / fcst02$mean
PE03 <- (test.data - fcst03$mean) / fcst03$mean
PE04 <- (test.data - fcst04$mean) / fcst04$mean
# MAPE
MAPE1 <- mean(abs(PE01))
MAPE2 <- mean(abs(PE02))
MAPE3 <- mean(abs(PE03))
MAPE4 <- mean(abs(PE04))
# Raw errors
E1 <- test.data - fcst01$mean
E2 <- test.data - fcst02$mean
E3 <- test.data - fcst03$mean
E4 <- test.data - fcst04$mean
# MSE
MSE1 <- mean(E1^2)
MSE2 <- mean(E2^2)
MSE3 <- mean(E3^2)
MSE4 <- mean(E4^2)
# Store into vectors
MSE <- c(MSE1, MSE2, MSE3, MSE4)
MAPE <- c(MAPE1, MAPE2, MAPE3, MAPE4)
# Combine into accuracy table
accuracy <- cbind(MSE = MSE, MAPE = MAPE)
row.names(accuracy) <- c("Train_1", "Train_2", "Train_3", "Train_4")
kable(accuracy, caption = "Error Comparison for BTC Forecasts with Different Training Sizes")
| | MSE | MAPE |
|---|---|---|
| Train_1 | 365814692 | 0.2364654 |
| Train_2 | 357076541 | 0.2328612 |
| Train_3 | 357932527 | 0.2332444 |
| Train_4 | 361652248 | 0.2348110 |
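As a small worked example of the two metrics, the code below computes MSE and MAPE on a two-point toy forecast with made-up numbers. It also shows that scaling the percentage error by the forecast, as the chunk above does, gives a slightly different value from the more common convention of scaling by the observed value. The helper names `mse` and `mape` are hypothetical and not taken from any package.

```r
# Hypothetical helpers, not taken from any package
mse <- function(actual, forecast) mean((actual - forecast)^2)

mape <- function(actual, forecast, scale.by = c("actual", "forecast")) {
  scale.by <- match.arg(scale.by)
  denom <- if (scale.by == "actual") actual else forecast
  mean(abs((actual - forecast) / denom))
}

actual <- c(100, 200)
fc <- c(110, 180)

mse.toy <- mse(actual, fc)                            # (10^2 + 20^2) / 2
mape.actual <- mape(actual, fc, scale.by = "actual")   # (10/100 + 20/200) / 2
mape.fc <- mape(actual, fc, scale.by = "forecast")     # (10/110 + 20/180) / 2
```

Either convention can be used to rank the four training sets, provided the same one is applied to every model being compared.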
To visually compare forecasting accuracy across the four training models, we plot both MSE and MAPE as functions of training sample size. This visual representation allows us to assess trends in model performance and determine whether increasing the amount of training data significantly improves forecasting accuracy for Bitcoin prices.
par(mar = c(4, 4, 3, 1))
plot(
1:4, MSE,
type = "b",
col = "darkred",
ylab = "Error",
xlab = "",
ylim = c(min(c(MSE, MAPE)) * 0.9, max(c(MSE, MAPE)) * 1.1),
xlim = c(0.5, 4.5),
main = "Error Curves (BTC Weekly Forecasts)",
axes = FALSE
)
axis(2)
labs <- c("Train_1", "Train_2", "Train_3", "Train_4")
axis(1, at = 1:4, labels = labs, pos = min(c(MSE, MAPE)) * 0.9)
lines(1:4, MAPE, type = "b", col = "blue")
text(1:4, MAPE + 0.02 * max(MAPE), round(MAPE, 4), col = "blue", cex = 0.7)
text(1:4, MSE - 0.02 * max(MSE), round(MSE, 4), col = "darkred", cex = 0.7)
legend(
"topright",
c("MSE", "MAPE"),
col = c("darkred", "blue"),
lty = 1,
bty = "n",
cex = 0.8
)
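MSE here is on the order of 10^8 and MAPE on the order of 10^-1, so drawing both against one axis presses the MAPE curve flat against zero. One option, sketched below with the values from the accuracy table, is to give each metric its own panel. This is an alternative layout and not the lecture-note template.

```r
# Values copied from the accuracy table above
MSE <- c(365814692, 357076541, 357932527, 361652248)
MAPE <- c(0.2364654, 0.2328612, 0.2332444, 0.2348110)
labs <- c("Train_1", "Train_2", "Train_3", "Train_4")

# Two side-by-side panels, each with its own y-axis scale
old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

plot(1:4, MSE, type = "b", col = "darkred", xaxt = "n",
     xlab = "", ylab = "MSE", main = "MSE by Training Set")
axis(1, at = 1:4, labels = labs, cex.axis = 0.8)

plot(1:4, MAPE, type = "b", col = "blue", xaxt = "n",
     xlab = "", ylab = "MAPE", main = "MAPE by Training Set")
axis(1, at = 1:4, labels = labs, cex.axis = 0.8)

# Restore the previous graphics settings
par(old.par)

# Index of the best-performing training set under each metric
best.mse <- which.min(MSE)
best.mape <- which.min(MAPE)
```

Both metrics select the same training set, Train_2, although the differences across the four sets are small relative to the overall error level.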
This section summarizes the findings from the decomposition and forecasting procedures applied to the weekly Bitcoin price series.
The STL trend component captures the rapid upward movement of BTC's price, including major bull runs. Both the classical and STL seasonal components are relatively flat. This aligns with the underlying structure of the asset: BTC trades 24/7 without market closures, which makes stable seasonal patterns harder to find.
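Using STL with a random walk with drift, the 7-week point forecasts extend the current trend, ending near 96,855 USD at week 7. The prediction intervals are wide (the 95% interval at week 7 runs from roughly 83,087 to 110,622 USD), which is to be expected for a highly volatile asset like BTC.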
The classical decomposition results for BTC show that the seasonal component fluctuates more than expected and contains structural artifacts due to the large moving average window and the noisy nature of the series. This is consistent with the limitations discussed in the lecture notes: classical decomposition is highly sensitive to outliers and performs poorly when the time series has weak seasonal structure. In contrast, the STL method provides a much smoother and more realistic trend extraction, with a stable and nearly flat seasonal component. STL is therefore the more appropriate choice for financial assets such as Bitcoin, which exhibit nonlinear growth and minimal true seasonality.
BTC weekly pricing demonstrates a strong trend, high volatility, and minimal seasonality. Classical decomposition works but produces limited seasonal insight, while STL provides a more accurate depiction of the underlying trend. Forecasting BTC is difficult due to high volatility and noise, as the wide prediction intervals show. Ultimately, BTC is a challenging asset to model with seasonal methods. A natural extension to improve accuracy would be to apply more advanced time series forecasting methods, such as ARIMA or GARCH models.