Load necessary libraries

library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(stats)
library(quantmod)
## Warning: package 'quantmod' was built under R version 4.3.3
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first()  masks xts::first()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::last()   masks xts::last()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsbox)
## Warning: package 'tsbox' was built under R version 4.3.3
library(tidyquant)
## Loading required package: PerformanceAnalytics
## Warning: package 'PerformanceAnalytics' was built under R version 4.3.3
## 
## Attaching package: 'PerformanceAnalytics'
## 
## The following object is masked from 'package:graphics':
## 
##     legend
 getSymbols("QQQ", from = "2023-01-01")
## [1] "QQQ"
 price <- to.daily(QQQ, indexAt = "lastof", OHLC = FALSE)
 
 price <- ts(price$QQQ.Close)

Fit a StructTS model

# Fit the Structural Time Series model
fit <- StructTS(price, type = "trend")

# Check the structure of the fitted model
str(fit)
## List of 12
##  $ coef     : Named num [1:3] 16.8 0 0
##   ..- attr(*, "names")= chr [1:3] "level" "slope" "epsilon"
##  $ loglik   : num -1121
##  $ loglik0  : num -37.9
##  $ data     : Time-Series [1:393] from 1 to 393: 264 266 262 269 271 ...
##   ..- attr(*, "src")= chr "yahoo"
##   ..- attr(*, "updated")= POSIXct[1:1], format: "2024-07-29 12:01:30"
##   ..- attr(*, "index")= num [1:393] 1.68e+09 1.68e+09 1.68e+09 1.68e+09 1.68e+09 ...
##   .. ..- attr(*, "tclass")= chr "Date"
##   .. ..- attr(*, "tzone")= chr "UTC"
##  $ residuals: Time-Series [1:393] from 1 to 393: 0 0.275 -0.984 1.742 0.257 ...
##  $ fitted   : Time-Series [1:393, 1:2] from 1 to 393: 264 266 262 269 271 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "level" "slope"
##  $ call     : language StructTS(x = price, type = "trend")
##  $ series   : chr "price"
##  $ code     : int 0
##  $ model    :List of 7
##   ..$ Z : num [1:2] 1 0
##   ..$ a : num [1:2] 462.97 0.501
##   ..$ P : num [1:2, 1:2] 0 0 0 0.0423
##   ..$ T : num [1:2, 1:2] 1 0 1 1
##   ..$ V : num [1:2, 1:2] 16.8 0 0 0
##   ..$ h : num 0
##   ..$ Pn: num [1:2, 1:2] 16.8037 0.0424 0.0424 0.0424
##  $ model0   :List of 7
##   ..$ Z : num [1:2] 1 0
##   ..$ a : num [1:2] 264 0
##   ..$ P : num [1:2, 1:2] 33869094 33869094 33869094 33869094
##   ..$ T : num [1:2, 1:2] 1 0 1 1
##   ..$ V : num [1:2, 1:2] 16.8 0 0 0
##   ..$ h : num 0
##   ..$ Pn: num [1:2, 1:2] 0 0 0 0
##  $ xtsp     : num [1:3] 1 393 1
##  - attr(*, "class")= chr "StructTS"
fit
## 
## Call:
## StructTS(x = price, type = "trend")
## 
## Variances:
##   level    slope  epsilon  
##   16.76     0.00     0.00
# Kalman-smoothed estimates of the level and slope states
smoothed <- tsSmooth(fit)
str(smoothed)
##  Time-Series [1:393, 1:2] from 1 to 393: 264 266 262 269 271 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "level" "slope"
summary(fit)
##           Length Class  Mode     
## coef        3    -none- numeric  
## loglik      1    -none- numeric  
## loglik0     1    -none- numeric  
## data      393    ts     numeric  
## residuals 393    ts     numeric  
## fitted    786    mts    numeric  
## call        3    -none- call     
## series      1    -none- character
## code        1    -none- numeric  
## model       7    -none- list     
## model0      7    -none- list     
## xtsp        3    -none- numeric
# Diagnostic plots: standardized residuals, residual ACF, and Ljung-Box p-values
tsdiag(fit)

cat("Transitional variance:", fit$coef["level"],
    "\n", "Slope variance:", fit$coef["slope"],
    "\n", "Observational variance:", fit$coef["epsilon"],
    "\n", "Initial level of mu:", fit$model0$a[1],
    "\n", "Initial level of lambda:", fit$model0$a[2],
    "\n")
## Transitional variance: 16.76127 
##  Slope variance: 0 
##  Observational variance: 0 
##  Initial level of mu: 264.48 
##  Initial level of lambda: 0
# The estimated slope variance is zero, so the trend model adds no real value
# here; a level-only model would fit essentially as well.
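Since the slope variance is estimated at zero, a level-only model should fit essentially as well. A minimal sketch of that comparison, assuming price and fit from the chunks above:

# Sketch: refit with a local level model (no slope) and compare log-likelihoods.
# Assumes `price` and `fit` from the chunks above.
fit_level <- StructTS(price, type = "level")

# With a zero slope variance, the two log-likelihoods should be very close.
c(trend = fit$loglik, level = fit_level$loglik)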

The diagnostic chart produced by tsdiag() for the QQQ price model consists of three panels:

  1. Standardized Residuals
  2. ACF (Autocorrelation Function) of Residuals
  3. p-values for Ljung-Box Statistic

Standardized Residuals

  • This top panel shows the standardized residuals over time.
  • Ideally, these residuals should resemble white noise, meaning they should be randomly scattered around zero without any obvious patterns or trends.
  • In your chart, the residuals appear to fluctuate around zero, which is a good sign. There are no clear patterns, suggesting the model captures the data’s dynamics reasonably well.

ACF of Residuals

  • The middle panel shows the autocorrelation function of the residuals.
  • The ACF plot helps determine if there is any correlation between residuals at different lags.
  • The blue dashed lines represent the confidence intervals. For a well-fitted model, the autocorrelations should mostly lie within these bounds.
  • In your chart, most of the autocorrelations are within the confidence intervals, indicating that the residuals are not significantly autocorrelated, which is another good sign.

p-values for Ljung-Box Statistic

  • The bottom panel shows the p-values of the Ljung-Box test at different lags.
  • The Ljung-Box test checks whether groups of autocorrelations of the residuals are different from zero. It tests the null hypothesis that the data are independently distributed (i.e., residuals are white noise).
  • The blue dashed line represents the significance level (usually 0.05). If the p-values are above this line, we fail to reject the null hypothesis at that lag.
  • In your chart, the p-values at various lags are all above the significance level, suggesting that the residuals are independently distributed and there is no significant autocorrelation present.

Overall Interpretation

  • The standardized residuals appear to be randomly scattered around zero, indicating no obvious patterns.
  • The ACF plot shows that the residuals are mostly within the confidence intervals, indicating no significant autocorrelation.
  • The p-values for the Ljung-Box test are above the significance level, suggesting that the residuals are independently distributed.

Conclusion

The diagnostics indicate that the state space model (StructTS) for the QQQ stock prices fits the data well. The residuals behave like white noise, and there is no significant autocorrelation, implying that the model captures the essential dynamics of the data.
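The same checks can be run directly rather than through tsdiag(). A minimal sketch, assuming fit from the chunk above; the choice of 10 lags for the Ljung-Box test is arbitrary:

# Sketch: residual diagnostics run directly (assumes `fit` from above).
res <- residuals(fit)

# ACF of residuals; spikes outside the bands would suggest leftover autocorrelation.
acf(res, main = "ACF of StructTS residuals")

# Ljung-Box test at 10 lags (an arbitrary choice);
# a large p-value is consistent with white-noise residuals.
Box.test(res, lag = 10, type = "Ljung-Box")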

# Adjust plot margins to avoid the "figure margins too large" error
par(mar=c(5, 4, 4, 2) + 0.1)

# Plot the original time series
plot(price, type = "l", lwd = 1, main = "Original and Fitted Time Series")

# Extract the level component from the fitted values
fitted_values <- fitted(fit)[, "level"]

# Check the lengths of the original and fitted values
length(price)
## [1] 393
length(fitted_values)
## [1] 393
# Plot the fitted values (level component)
lines(fitted_values, lty = "dashed", lwd = 2, col = "green")

# Extract the level component from the smoothed values
smoothed_values <- tsSmooth(fit)[, "level"]

# Check the lengths of the original and smoothed values
length(price)
## [1] 393
length(smoothed_values)
## [1] 393
# Plot the smoothed values (level component)
lines(smoothed_values, lty = "dotted", lwd = 2, col = "blue")

# Forecast and plot the forecast
forecast_values <- forecast(fit, h = 20)
plot(forecast_values)

# Daily log returns of the QQQ closing prices
price_return <- diff(log(price))

library(pracma)
## Warning: package 'pracma' was built under R version 4.3.3
## 
## Attaching package: 'pracma'
## The following object is masked from 'package:purrr':
## 
##     cross
hurst <- hurstexp(price_return)
## Simple R/S Hurst estimation:         0.5014949 
## Corrected R over S Hurst exponent:   0.5308905 
## Empirical Hurst exponent:            0.4881369 
## Corrected empirical Hurst exponent:  0.4481049 
## Theoretical Hurst exponent:          0.5464941

The hurstexp function in the pracma library in R computes the Hurst exponent using several different methods. Here’s a breakdown of each method and what it represents:

  1. Simple R/S Hurst estimation: This method is based on the rescaled range (R/S) analysis proposed by Hurst. The rescaled range is calculated by dividing the range of cumulative deviations from the mean by the standard deviation. The Hurst exponent is then estimated from the scaling behavior of the rescaled range with respect to the time period.

  2. Corrected R over S Hurst exponent: This method corrects the bias present in the simple R/S estimation. The correction involves adjusting for small sample sizes and other statistical artifacts that might affect the accuracy of the Hurst exponent estimation.

  3. Empirical Hurst exponent: This method estimates the Hurst exponent using empirical data directly without relying on theoretical assumptions. It typically involves dividing the time series into non-overlapping segments and analyzing the scaling behavior of the variance of the time series.

  4. Corrected empirical Hurst exponent: Similar to the corrected R/S Hurst exponent, this method adjusts the empirical Hurst exponent for biases due to sample size and statistical artifacts.

  5. Theoretical Hurst exponent: This is based on the theoretical properties of the time series under certain assumptions, such as it being a self-similar process with stationary increments. Theoretical methods often involve fitting a model to the data and deriving the Hurst exponent from the model parameters.

In summary, the hurstexp function provides multiple estimates of the Hurst exponent using different approaches to give a comprehensive understanding of the time series’ long-term dependence or persistence. Each method has its strengths and weaknesses, and comparing them can help ensure the robustness of the Hurst exponent estimation.
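To make the simple R/S idea concrete, the rescaled range can be computed by hand over a few window lengths and regressed on a log-log scale. A minimal sketch, assuming price_return from above; the window lengths here are arbitrary, and hurstexp() remains the authoritative implementation:

# Sketch: hand-rolled simple R/S Hurst estimate (assumes `price_return` from above).
x <- as.numeric(price_return)

rs_stat <- function(z) {
  y <- cumsum(z - mean(z))          # cumulative deviations from the mean
  (max(y) - min(y)) / sd(z)         # rescaled range R/S
}

ns <- c(16, 32, 64, 128)            # arbitrary window lengths
rs <- sapply(ns, function(n) {
  m <- (length(x) %/% n) * n        # trim so the series splits evenly
  mean(sapply(split(x[1:m], rep(seq_len(m / n), each = n)), rs_stat))
})

# The Hurst exponent is the slope of log(R/S) against log(n)
coef(lm(log(rs) ~ log(ns)))[2]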

library(fBasics)
## Warning: package 'fBasics' was built under R version 4.3.3
## 
## Attaching package: 'fBasics'
## The following objects are masked from 'package:pracma':
## 
##     akimaInterp, inv, kron, pascal
## The following objects are masked from 'package:PerformanceAnalytics':
## 
##     kurtosis, skewness
## The following object is masked from 'package:TTR':
## 
##     volatility
library(quantmod)

library(tseries)
## Warning: package 'tseries' was built under R version 4.3.3
getSymbols("QQQ")
## [1] "QQQ"
# Daily log returns over the full QQQ history
q <- dailyReturn(QQQ, type = "log")

# Plot the daily log returns
plot(q, type = "l", col = "steelblue", main = "NASDAQ")
abline(h = 0, col = "grey")

# Scaling law effect:
scalinglawPlot(q)
## 
## Scaling Law:          daily.returns
##   Plot Intercept      3.755912
##   Plot Slope          0.4792611
##   Plot Inverse Slope  2.086545
# Scaling factor k = exp(intercept from the scaling law plot)
outExp <- exp(3.76)
outExp
## [1] 42.94843
# Expected y at x = 42, using the plot slope as the exponent
out2 <- outExp*42^(0.48)
out2
## [1] 258.2896
tail(q)
##            daily.returns
## 2024-07-19  -0.008903100
## 2024-07-22   0.014787890
## 2024-07-23  -0.003530882
## 2024-07-24  -0.036529465
## 2024-07-25  -0.011088956
## 2024-07-26   0.010203754
head(q)
##            daily.returns
## 2007-01-03 -0.0050749222
## 2007-01-04  0.0187863410
## 2007-01-05 -0.0047776885
## 2007-01-08  0.0006839757
## 2007-01-09  0.0050010881
## 2007-01-10  0.0117224172

Given the output from the scalinglawPlot() command, you have the following parameters:

  • Plot Intercept: \(\log(k) = 3.755912\)
  • Plot Slope: \(a = 0.4792611\)
  • Plot Inverse Slope: Not needed directly for forming the power law equation.

To form the power law equation \(y = k \cdot x^a\):

  1. Calculate the scaling factor \(k\): The intercept provided is the logarithm of \(k\). Therefore, \(k\) can be found by exponentiating the intercept.

\[ k = e^{3.755912} \]

  2. Write the power law equation: Using the slope \(a\) directly as the exponent.

So, the power law equation is:

\[ y = k \cdot x^a \]

Let’s calculate \(k\):

\[ k = e^{3.755912} \approx 42.7732 \]

Thus, the power law equation is:

\[ y = 42.7732 \cdot x^{0.4792611} \]

This equation represents the scaling relationship derived from your data.
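The same calculation can be done in R. A short sketch, plugging in the intercept and slope reported by scalinglawPlot() above:

# Sketch: build the power law y = k * x^a from the scalinglawPlot() output above.
intercept <- 3.755912            # "Plot Intercept", the log of k
a <- 0.4792611                   # "Plot Slope", the exponent

k <- exp(intercept)              # scaling factor, about 42.77
power_law <- function(x) k * x^a

power_law(42)                    # expected y at x = 42, roughly matching out2 above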

Interpretation of the Scaling Law Equation

Rate of Growth in the Power Law Equation

In the context of the power law equation \(y = k \cdot x^a\), the rate of growth is indicated by the exponent \(a\). Here’s how it works:

The exponent \(a\) determines how \(y\) changes as \(x\) changes. It essentially describes the rate of growth or decay:

  • If \(a > 1\), \(y\) grows faster than \(x\) (super-linear growth).
  • If \(0 < a < 1\), \(y\) grows, but at a decreasing rate compared to \(x\) (sub-linear growth).
  • If \(a = 1\), \(y\) grows linearly with \(x\).
  • If \(a < 0\), \(y\) decreases as \(x\) increases (decay).

For your specific equation \(y = 42.7732 \cdot x^{0.4792611}\):

  • The slope \(a = 0.4792611\) indicates sub-linear growth. This means that \(y\) increases with \(x\), but at a decreasing rate. The growth rate is slower compared to a linear relationship (where \(a = 1\)).

The actual value of \(y\) at a specific point depends on both the scaling factor \(k\) and the slope \(a\), but the slope \(a\) is what fundamentally indicates the rate at which \(y\) grows with respect to \(x\).
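
A quick numerical check of the sub-linear behaviour, using the fitted values from above: doubling \(x\) multiplies \(y\) by \(2^a \approx 1.39\) rather than by 2.

# Sketch: doubling x multiplies y by 2^a, which is well below 2 when a < 1.
k <- exp(3.755912)
a <- 0.4792611

(k * 20^a) / (k * 10^a)          # equals 2^a, about 1.39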