References:
This text makes reference to the following sources:
R for Data Science by H. Wickham and G. Grolemund (free online textbook): https://r4ds.had.co.nz/
Investment Science by D.G. Luenberger
R is widely used in quantitative finance due to its extensive statistical capabilities, data manipulation tools, and its active community that develops and maintains specialized packages for finance. It offers a wide range of functionalities that enable professionals in the finance industry to analyze data, build models, conduct simulations, and perform risk management tasks.
Here are some key features and applications of R in quantitative finance:
Data Analysis and Visualization: R provides powerful tools for data analysis and visualization, allowing you to explore, clean, and transform financial data. It offers numerous packages for importing and manipulating financial data from various sources, such as CSV files, databases, and web APIs. R’s graphics capabilities help create informative charts, plots, and interactive visualizations to gain insights from the data.
Statistical Analysis: R’s statistical functionalities are highly valuable in finance. You can use R to perform descriptive statistics, hypothesis testing, regression analysis, time series analysis, and advanced modeling techniques. R’s extensive collection of statistical packages, including “stats,” “quantmod,” “rugarch,” and “fGarch,” provide specialized functions for finance-related analysis.
Quantitative Modeling: R is well-suited for building quantitative models in finance. It offers libraries for option pricing, portfolio optimization, risk assessment, and asset allocation. Packages like “quantmod,” “PerformanceAnalytics,” and “PortfolioAnalytics” provide tools for backtesting trading strategies, calculating performance metrics, and constructing efficient portfolios.
Time Series Analysis: R has robust capabilities for analyzing and modeling time series data, which is prevalent in finance. It offers packages like “xts” and “zoo” for handling time series objects, and packages like “forecast” and “TTR” for time series forecasting, volatility modeling, and technical analysis.
Risk Management: R provides tools for risk management in finance, such as Value-at-Risk (VaR) calculations, Monte Carlo simulations, and stress testing. Packages like “RiskMetrics” and “fPortfolio” offer functions for measuring and managing financial risk.
Financial Econometrics: R is widely used for econometric analysis in finance. It offers packages like “Econometrics” and “plm” for estimating econometric models, performing panel data analysis, and conducting event studies.
Algorithmic Trading: R can be used for developing and backtesting algorithmic trading strategies. Packages like “quantstrat” and “blotter” enable you to implement and evaluate trading strategies using historical data.
In addition to these specific applications, R’s open-source nature and active community allow users to access a vast collection of finance-related packages and resources. The flexibility of the R language and its integration with other programming languages, such as C++, also make it suitable for implementing complex financial models and high-performance computations.
To install R and RStudio, follow the steps below. Note that to use RStudio, you must install both the R and RStudio applications:
Installing R:
Visit the Comprehensive R Archive Network (CRAN) website: https://cran.r-project.org/
Choose a CRAN mirror location near your geographical location.
On the mirror’s website, select your operating system (Windows, macOS, or Linux).
Download the appropriate installation package for your operating system.
Once the download is complete, run the installer and follow the instructions provided.
During the installation process, you may be asked to choose customization options. You can typically leave the default settings unless you have specific preferences.
Complete the installation process.
Installing RStudio:
Visit the RStudio website: https://www.rstudio.com/
Go to the “Products” menu and click on “RStudio Desktop.”
Scroll down to the “Installers for Supported Platforms” section.
Download the appropriate installer for your operating system.
Once the download is complete, run the installer and follow the instructions provided.
During the installation process, you may be asked to choose customization options. You can typically leave the default settings unless you have specific preferences.
Complete the installation process.
After following these steps, you should have R and RStudio installed on your computer. You can then launch RStudio, and it should automatically detect your R installation and configure itself accordingly.
RStudio is an integrated development environment (IDE) specifically designed for R, a popular programming language for statistical computing and data analysis. RStudio provides a user-friendly interface that enhances the workflow of R users, making it easier to write, run, and debug R code.
Here are the main components of the RStudio environment:
Console: The RStudio Console is where you can directly interact with the R interpreter. You can type and execute R commands here, and the results will be displayed immediately.
Script Editor: The Script Editor is where you write your R code. It provides features like syntax highlighting, code completion, and code formatting to make writing code easier and more efficient. You can save your code as scripts for future use.
Environment and History: The Environment tab shows the objects (variables, functions, etc.) currently in memory. You can view and manipulate these objects. The History tab displays the commands you have executed in the Console, making it easy to review and rerun previous commands.
Files, Plots, Packages, Help: These tabs provide access to various tools and resources. The Files tab allows you to navigate your computer’s file system and manage files. The Plots tab displays graphical outputs generated by your R code. The Packages tab lets you manage installed packages and install new ones. The Help tab provides access to R documentation and other resources.
Viewer: The Viewer pane allows you to view HTML content generated by R, such as interactive plots or reports.
Version Control Integration: RStudio has built-in support for version control systems like Git. You can manage your code repositories, commit changes, and collaborate with others directly from the IDE.
Customizable Layout: RStudio’s layout is highly customizable. You can arrange the different panes according to your preferences, detach them into separate windows, or create different layouts for different projects.
These are the main components and features of the RStudio environment. RStudio aims to provide a user-friendly and efficient interface for R programming, making it easier to work with R and analyze data.
Here are some basic syntax and data types in R:
Variables and Assignment: Variables are used to store values in R. You can assign a value to a variable using the assignment operator <- or =. For example:
x <- 10
y = 5
Data Types: R has several basic data types:
Numeric: numbers with or without decimal values, for example x <- 10 or y <- 3.14
Character: text strings, for example name <- "John"
Logical: the values TRUE or FALSE, for example is_valid <- TRUE
Integer: whole numbers, written with an L suffix, for example z <- 5L
Complex: numbers with real and imaginary parts, for example cplx <- 2 + 3i
Vectors: A vector is a basic data structure in R that can hold multiple values of the same data type. You can create a vector using the c() function. For example:
numbers <- c(1, 2, 3, 4, 5)
fruits <- c("apple", "banana", "orange")
Matrices: A matrix is a two-dimensional data structure in R with rows and columns. You can create a matrix using the matrix() function. For example:
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
Data Frames: A data frame is a tabular data structure in R similar to a table in a database or a spreadsheet. It can hold different data types. You can create a data frame using the data.frame() function. For example:
my_data <- data.frame(name = c("John", "Jane", "Adam"), age = c(25, 30, 35))
Lists: A list is a versatile data structure in R that can hold different types of objects, including vectors, matrices, data frames, and even other lists. You can create a list using the list() function. For example:
my_list <- list(numbers = c(1, 2, 3), fruits = c("apple", "banana", "orange"))
These are just some of the basic syntax and data types in R. R offers many more advanced features and data structures, making it a powerful language for data analysis and statistical computing.
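You can inspect any of the objects created above with helper functions such as class(), dim(), and str(). For example:
class(numbers)   # "numeric"
class(fruits)    # "character"
dim(my_matrix)   # 2 rows, 3 columns
str(my_data)     # column names and types of the data frame
str(my_list)     # structure of the list and its elements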
Managing R packages involves various tasks such as installing, updating, and removing packages. R packages are collections of functions, data, and documentation that extend the functionality of the R programming language. Here's a guide on how to manage R packages effectively:
Installing Packages: Use the install.packages() function. For example, to install the "dplyr" package, you would run: install.packages("dplyr")
Loading Packages: Use the library() or require() functions to load a package. For example, to load the "dplyr" package, you would run: library(dplyr)
Updating Packages: Use the update.packages() function. For example, to update all installed packages, run: update.packages(). You can also selectively update specific packages by providing their names as arguments.
Listing Installed Packages: Use the installed.packages() function. Running this function will return a data frame with information about each installed package, including the package name, version, and other details.
Removing Packages: Use the remove.packages() function. For example, to remove the "dplyr" package, run: remove.packages("dplyr"). This will remove the package and its associated files from your R environment.
Installing a Specific Version: Base R's install.packages() installs the latest version from CRAN; to install a particular version, you can use the install_version() function from the remotes package. For example: remotes::install_version("dplyr", version = "0.8.5")
Remember to periodically update your packages to stay up-to-date with the latest improvements and bug fixes. It's good practice to document the packages you use in your projects to ensure reproducibility.
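As a consolidated sketch of the steps above (using dplyr purely as an illustrative package name), a typical workflow installs a package only when it is missing, loads it, and checks what is installed:
# Install dplyr only if it is not already available, then load it
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
# List the name and version of every installed package
head(installed.packages()[, c("Package", "Version")])
# Update all installed packages without prompting (uncomment to run)
# update.packages(ask = FALSE)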
In R, there are several methods available for importing and exporting data. Here are some common approaches:
Importing CSV files: Use the read.csv() function. For example: data <- read.csv("data.csv")
Importing Excel files: Use the readxl package. First, install the package using install.packages("readxl"), and then load it using library(readxl). You can then use the read_excel() function to read the data. For example:
library(readxl)
data <- read_excel("data.xlsx")
Importing other formats: Use the jsonlite package for JSON, the xml2 package for XML, and the haven package for SPSS files.
Exporting CSV files: Use the write.csv() function. For example: write.csv(data, "output.csv", row.names = FALSE)
Exporting Excel files: Use the write.xlsx() function from the openxlsx package. First, install the package using install.packages("openxlsx"), and then load it using library(openxlsx). You can then use the write.xlsx() function to write the data. For example:
library(openxlsx)
write.xlsx(data, "output.xlsx", rowNames = FALSE)
Exporting other formats: Use the jsonlite package for JSON, the xml2 package for XML, and the haven package for SPSS files.
These are just a few examples of importing and exporting data in R. Depending on your specific needs and the format of your data, you may need to explore additional packages or functions.
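For instance, here is a brief sketch of JSON import and export with the jsonlite package mentioned above; the file name "data.json" and the example data frame are purely illustrative:
# install.packages("jsonlite")
library(jsonlite)
# Export a data frame to a JSON file
data <- data.frame(name = c("John", "Jane"), age = c(25, 30))
write_json(data, "data.json")
# Import it back into a data frame
data_from_json <- fromJSON("data.json")
str(data_from_json)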
Data cleaning and preprocessing are essential steps in data analysis. R provides a variety of packages and functions to perform these tasks efficiently. Here are some common techniques for data cleaning and preprocessing in R:
Handling missing values: Use functions like is.na() or complete.cases() to identify missing values in your dataset. You can remove incomplete rows with na.omit() or fill missing values using functions like na.fill() or tidyr::replace_na(). For more sophisticated approaches, the mice package provides multiple imputation methods.
Scaling and transformation: Use functions like scale() or caret::preProcess() to scale numeric variables to a common range (e.g., 0 to 1) or standardize them. Apply transformations (e.g., log()) to reduce the skewness of highly skewed variables.
Encoding categorical variables: Create dummy variables using model.matrix() or the dummy_cols() function from the fastDummies package.
String manipulation: Standardize text with tolower() or toupper() to convert to lowercase or uppercase, and use stringr::str_replace() for pattern-based replacements. More generally, gsub() and regular expressions (regex) handle pattern matching and replacement.
Dates and times: Use as.Date() or lubridate::ymd() to convert date/time strings into proper date/time objects, and extract components with month(), year(), day(), etc.
These are just a few examples of common data cleaning and preprocessing techniques in R. The choice of techniques will depend on the specific characteristics of your dataset and the goals of your analysis. R packages like tidyverse, dplyr, tidyr, and stringr provide a rich set of functions that can greatly simplify these tasks. A short sketch of a few of these steps follows the reference links below.
For an in-depth discussion on this topic, refer to Chapter 5 of the R for Data Science textbook https://r4ds.had.co.nz/transform.html
For an in-depth discussion on this topic, refer to Section 7.4 of the R for Data Science textbook https://r4ds.had.co.nz/exploratory-data-analysis.html#missing-values-2
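To make these techniques concrete, here is a minimal sketch applied to a small, made-up data frame; the column names and values are purely illustrative:
library(dplyr)
library(lubridate)

raw <- data.frame(
  date   = c("2023-01-03", "2023-01-04", NA),
  price  = c(100.5, NA, 102.3),
  ticker = c("aapl", "MSFT", "goog")
)

clean <- raw %>%
  mutate(
    date   = ymd(date),                        # convert strings to Date objects
    ticker = toupper(ticker),                  # standardize text case
    price  = ifelse(is.na(price),
                    mean(price, na.rm = TRUE), # simple mean imputation
                    price)
  ) %>%
  filter(!is.na(date))                         # drop rows with missing dates

clean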
Exploratory Data Analysis (EDA) is of paramount importance in quantitative finance. It serves as a crucial initial step in the data analysis process and provides valuable insights and understanding of the financial data. Here are some reasons why EDA is significant in quantitative finance:
Data Quality Assessment: EDA helps in assessing the quality and integrity of the financial data. It involves checking for missing values, outliers, data inconsistencies, and potential errors. By identifying and addressing these issues, EDA ensures that the subsequent analysis is based on reliable and accurate data.
Pattern Identification: EDA techniques help in identifying patterns and relationships within the financial data. By visualizing the data through plots, charts, and graphs, analysts can observe trends, seasonality, cyclicality, and other patterns. These insights are vital for making informed decisions in quantitative finance, such as identifying trading opportunities or estimating risk.
Feature Selection: In quantitative finance, it is essential to identify relevant features or variables that impact financial outcomes. EDA facilitates the exploration of different variables and their relationships with the target variable. By analyzing correlations, distributions, and statistical measures, analysts can select the most significant features for further analysis and modeling.
Risk Assessment: EDA aids in understanding and quantifying risks associated with financial instruments or portfolios. It allows analysts to examine the distribution and volatility of returns, identify extreme events, and measure risk metrics such as value-at-risk (VaR) and conditional value-at-risk (CVaR). By exploring historical data and conducting scenario analyses, EDA helps in evaluating and managing risks effectively.
Model Assumptions: Quantitative finance often involves developing mathematical models to predict asset prices, portfolio performance, or other financial indicators. EDA helps in validating the assumptions underlying these models. Analysts can assess whether the data adheres to the model assumptions, such as normality, stationarity, or independence. If the assumptions are violated, appropriate adjustments or alternative models can be considered.
Strategy Development: EDA plays a crucial role in developing trading strategies and investment approaches. By thoroughly analyzing historical price data, trading volumes, and other relevant factors, analysts can uncover patterns, market inefficiencies, and opportunities for alpha generation. EDA allows for backtesting and fine-tuning of strategies based on historical performance, helping traders and investors make more informed decisions.
In summary, EDA is essential in quantitative finance as it enables data quality assessment, pattern identification, feature selection, risk assessment, validation of model assumptions, and strategy development. It provides a solid foundation for subsequent analysis, modeling, and decision-making processes in the field of finance.
Visualizing financial data using R and ggplot2 can be a powerful way to explore and communicate insights from your data. ggplot2 is a popular data visualization package in R that provides a flexible and intuitive grammar of graphics. Here’s a step-by-step guide to getting started with visualizing financial data using R and ggplot2:
Step 1: Install and load the necessary packages: First, you need to make sure you have the ggplot2 package installed. If you haven’t installed it yet, you can do so by running the following command:
#install.packages("ggplot2")
#install.packages("quantmod")
Once installed, load the ggplot2 package into your R session using the following command:
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
options(xts.warn_dplyr_breaks_lag = FALSE)
Step 2: Prepare your financial data: Before visualizing your financial data, you need to prepare it in a suitable format. Let's assume you have a dataset called financial_data with columns such as "Date," "Price," and "Volume." Make sure your data is loaded into your R session.
Step 3: Create basic plots: You can start by creating basic plots to get an overview of your financial data. Here’s an example of how to create a line plot of stock prices over time:
getSymbols("AAPL")
## [1] "AAPL"
financial_data = data.frame(Date=index(AAPL), coredata(AAPL))
ggplot(data = financial_data, aes(x = Date, y = AAPL.Close)) +
geom_line() +
labs(title = "Stock Prices Over Time", x = "Date", y = "Price")
This code specifies the data and mapping aesthetics (x and y) in the ggplot() function. The geom_line() layer is used to draw the line plot. The labs() function is used to provide a title and labels for the x and y axes.
Step 4: Customize your plot: You can customize your plot further by adding additional layers, changing colors, adding annotations, and modifying the axis scales. Here’s an example that adds a moving average line and customizes the appearance:
ggplot(data = financial_data, aes(x = Date, y = AAPL.Close)) +
geom_line(color = "steelblue") +
geom_smooth(aes(y = AAPL.Close), method = "loess", se = TRUE, color = "red") +
labs(title = "Stock Prices Over Time", x = "Date", y = "Price") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
In this code, the geom_smooth() layer is used to add a smooth line (in this case, a loess regression line) to the plot. The color argument is used to specify the colors of the lines. The theme_minimal() function changes the theme of the plot to a minimalistic style.
Step 5: Explore other plot types: ggplot2 offers a wide range of plot types that can be useful for visualizing financial data. Some common plot types include bar charts, scatter plots, and candlestick charts. Explore the ggplot2 documentation and examples to learn about different plot types and how to create them.
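For example, here is a hedged sketch of two such plots using the financial_data data frame created in Step 3: a bar chart of recent trading volume and a scatter plot of closing price against volume.
recent <- tail(financial_data, 60)  # last 60 trading days

# Bar chart of daily trading volume
ggplot(recent, aes(x = Date, y = AAPL.Volume)) +
  geom_col(fill = "steelblue") +
  labs(title = "Daily Trading Volume (last 60 days)", x = "Date", y = "Volume")

# Scatter plot of closing price against volume
ggplot(recent, aes(x = AAPL.Volume, y = AAPL.Close)) +
  geom_point(alpha = 0.6) +
  labs(title = "Closing Price vs. Volume", x = "Volume", y = "Close")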
Step 6: Save and export your plot: Once you're satisfied with your plot, you can save it as an image file using the ggsave() function. Here's an example of how to save your plot as a PNG file:
ggsave(filename = "financial_plot.png", plot = last_plot(), width = 10, height = 6)
## `geom_smooth()` using formula = 'y ~ x'
This code saves the last plot created in your R session as a PNG file named “financial_plot.png” with a width of 10 inches and a height of 6 inches. Adjust the filename and dimensions as needed.
That’s it! You now have a basic guide to visualizing financial data using R and ggplot2. Feel free to explore the various customization options and plot types offered by ggplot2 to create informative and visually appealing visualizations of your financial data.
In R, you can use various functions and packages to perform descriptive statistics and data summaries for quantitative finance. Let’s go through some common tasks step-by-step:
list.of.packages <- c("quantmod", "PerformanceAnalytics","dplyr","psych", "TSPred")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
library(psych, warn.conflicts = FALSE, quietly = TRUE)
library(TSPred, warn.conflicts = FALSE, quietly = TRUE)
Data retrieval and exploration: Assuming you have financial data available, you can retrieve it using quantmod. For this example, we'll continue to use the AAPL data table, which has up-to-date Apple financial information.
Data summary and descriptive statistics: Now, let’s calculate the summary statistics for the financial data:
summary(financial_data)
## Date AAPL.Open AAPL.High AAPL.Low
## Min. :2007-01-03 Min. : 2.835 Min. : 2.929 Min. : 2.793
## 1st Qu.:2011-03-24 1st Qu.: 12.186 1st Qu.: 12.282 1st Qu.: 12.041
## Median :2015-06-17 Median : 26.848 Median : 27.075 Median : 26.573
## Mean :2015-06-16 Mean : 49.470 Mean : 50.012 Mean : 48.951
## 3rd Qu.:2019-09-09 3rd Qu.: 55.812 3rd Qu.: 56.343 3rd Qu.: 55.093
## Max. :2023-11-29 Max. :196.240 Max. :198.230 Max. :195.280
## AAPL.Close AAPL.Volume AAPL.Adjusted
## Min. : 2.793 Min. :2.405e+07 Min. : 2.367
## 1st Qu.: 12.161 1st Qu.:1.036e+08 1st Qu.: 10.308
## Median : 26.832 Median :2.044e+08 Median : 24.424
## Mean : 49.504 Mean :3.590e+08 Mean : 47.710
## 3rd Qu.: 55.898 3rd Qu.:4.845e+08 3rd Qu.: 53.655
## Max. :196.450 Max. :3.373e+09 Max. :195.927
# If you want more specific statistics, you can use the psych package:
describe(financial_data)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
## vars n mean sd median trimmed
## Date 1 4257 NaN NA NA NaN
## AAPL.Open 2 4257 49.47 53.63 26.85 40.37
## AAPL.High 3 4257 50.01 54.25 27.08 40.81
## AAPL.Low 4 4257 48.95 53.08 26.57 39.93
## AAPL.Close 5 4257 49.50 53.69 26.83 40.39
## AAPL.Volume 6 4257 359041903.62 370722582.53 204420400.00 288190477.63
## AAPL.Adjusted 7 4257 47.71 53.83 24.42 38.35
## mad min max range skew kurtosis
## Date NA Inf -Inf -Inf NA NA
## AAPL.Open 27.87 2.84 1.96240e+02 1.934000e+02 1.31 0.29
## AAPL.High 28.10 2.93 1.98230e+02 1.953000e+02 1.31 0.28
## AAPL.Low 27.57 2.79 1.95280e+02 1.924900e+02 1.31 0.31
## AAPL.Close 27.78 2.79 1.96450e+02 1.936600e+02 1.31 0.29
## AAPL.Volume 189248256.12 24048300.00 3.37297e+09 3.348921e+09 2.17 6.64
## AAPL.Adjusted 26.71 2.37 1.95930e+02 1.935600e+02 1.33 0.33
## se
## Date NA
## AAPL.Open 0.82
## AAPL.High 0.83
## AAPL.Low 0.81
## AAPL.Close 0.82
## AAPL.Volume 5681947.50
## AAPL.Adjusted 0.83
# Determine the log returns
AAPL$AAPL.Returns = diff(log(AAPL$AAPL.Adjusted))
AAPL = na.omit(AAPL)
# Common performance metrics
annualized_returns <- Return.annualized(AAPL$AAPL.Returns, scale = 252) # annualize daily log returns over 252 trading days
sharpe_ratio <- SharpeRatio(AAPL$AAPL.Returns)
maximum_drawdown <- maxDrawdown(AAPL$AAPL.Returns, invert = FALSE)
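# Print the performance metrics computed above to inspect them
annualized_returns
sharpe_ratio       # SharpeRatio() reports StdDev-, VaR- and ES-based variants
maximum_drawdown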
#Microsoft financial information
getSymbols("MSFT")
## [1] "MSFT"
MSFT$MSFT.Returns = diff(log(MSFT$MSFT.Adjusted))
MSFT = na.omit(MSFT)
#Google financial information
getSymbols("GOOG")
## [1] "GOOG"
GOOG$GOOG.Returns = diff(log(GOOG$GOOG.Adjusted))
GOOG = na.omit(GOOG)
returns = cbind(AAPL$AAPL.Returns, GOOG$GOOG.Returns, MSFT$MSFT.Returns)
correlation_matrix <- cor(returns)
covariance_matrix <- cov(returns)
covariance_matrix
## AAPL.Returns GOOG.Returns MSFT.Returns
## AAPL.Returns 0.0004034716 0.0002203686 0.000205222
## GOOG.Returns 0.0002203686 0.0003487534 0.000209437
## MSFT.Returns 0.0002052220 0.0002094370 0.000319617
correlation_matrix
## AAPL.Returns GOOG.Returns MSFT.Returns
## AAPL.Returns 1.0000000 0.5874677 0.5714820
## GOOG.Returns 0.5874677 1.0000000 0.6273058
## MSFT.Returns 0.5714820 0.6273058 1.0000000
returns = data.frame(Date=index(returns), coredata(returns))
hist(returns$AAPL.Returns, breaks = 30, main = "Histogram of Returns", xlab = "Returns") #do we see a normal distribution?
plot(density(returns$AAPL.Returns), main = "Density Plot of Returns", xlab = "Returns")
Remember that these are just basic examples to get you started. In practice, you might need to customize your analysis and summaries based on your specific data and research questions in quantitative finance.
Time series analysis and visualization are essential components of quantitative analysis in R. R offers various libraries and functions that make it easy to handle time series data, perform statistical analysis, and create insightful visualizations. In this response, I’ll provide a brief overview of the steps involved in time series analysis and visualization in R.
Load the required libraries: commonly used tools include ts objects (base R) and the xts, zoo, and forecast packages, with ggplot2 for visualization.
# Install and load required libraries
#list.of.packages <- c("quantmod", "PerformanceAnalytics","dplyr","psych", "TSPred","xts","zoo","forecast","ggplot2","plotly")
#new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
#if(length(new.packages)) install.packages(new.packages)
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(psych, warn.conflicts = FALSE, quietly = TRUE)
library(TSPred, warn.conflicts = FALSE, quietly = TRUE)
library(forecast, warn.conflicts = FALSE, quietly = TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
library(plotly, warn.conflicts = FALSE, quietly = TRUE)
# Candlestick plot
getSymbols("AAPL")
## [1] "AAPL"
df <- data.frame(Date=index(AAPL),coredata(AAPL))
df <- tail(df, 90)
fig <- df %>% plot_ly(x = ~Date, type="candlestick",
open = ~AAPL.Open, close = ~AAPL.Close,
high = ~AAPL.High, low = ~AAPL.Low)
fig <- fig %>% layout(title = "Basic Candlestick Chart")
fig
# Decompose the time series
x <- c(-50, 175, 149, 214, 247, 237, 225, 329, 729, 809,
530, 489, 540, 457, 195, 176, 337, 239, 128, 102, 232, 429, 3,
98, 43, -141, -77, -13, 125, 361, -45, 184)
x <- ts(x, start = c(1951, 1), end = c(1958, 4), frequency = 4)
decomposed <- decompose(x)
# Plot decomposition components
plot(decomposed)
Fit a forecasting model: the forecast library provides many useful functions for this purpose.
# Fit a forecasting model (e.g., ARIMA)
model <- auto.arima(x)
# Forecast future values
forecast_values <- forecast(model, h = 12) # Forecasting 12 steps ahead
# Plot the forecasted values
plot(forecast_values, main = "Forecasted Values", xlab = "Time", ylab = "Value", col = "red")
# Combine historical data and forecast
combined_data <- c(x, forecast_values$mean)
plot(combined_data, main = "Historical Data and Forecast", xlab = "Time", ylab = "Value", col = c(rep("blue", length(x)), rep("red", length(forecast_values$mean))))
These are the basic steps for time series analysis and visualization in R. Depending on your specific analysis goals, you may need to explore additional techniques and models. R provides a wide range of tools and packages to handle more advanced time series tasks as well.
For this week, we'll be working on the fundamental knowledge of finance and understanding market performance. To best understand the market, we'll build a series of financial models and observe which models provide the best fit.
The time-value-of-money is a financial principle that states that the value of money changes over time due to factors like inflation, interest rates, and the potential to earn returns on investment. It suggests that a sum of money available today is worth more than the same amount in the future because of its potential earning capacity or investment opportunity.
This concept is crucial in various financial decisions, like evaluating investments, loans, and understanding the significance of receiving money today versus receiving the same amount in the future. It’s the backbone of many financial calculations, including present value, future value, annuities, and determining rates of return.
Present Value (PV) represents the current worth of a future sum of money or a series of cash flows, discounted at a specified rate to reflect their value in today’s terms.
The basic formula for present value is:
\[ PV = \frac{FV}{(1 + r)^n} \]
Future Value represents the value of an investment at a specific date in the future, given a specified rate of return.
The basic formula for future value is:
\[ FV = PV \times (1 + r)^n \]
When interest is compounded more frequently than annually, such as monthly, it affects the calculation of both future value and present value (PV). Compounding more frequently leads to a higher effective interest rate and influences the growth or discounting of the investment or cash flow. This process is known as compound interest.
The formula for FV with monthly compounding is:
\[ FV = PV \times \left(1 + \frac{r}{n}\right)^{nt} \]
With monthly compounding, the nominal annual interest rate (\(r\)) needs to be divided by the number of compounding periods per year (\(n\)). The total number of compounding periods (\(nt\)) is calculated by multiplying the number of years (\(t\)) by the number of compounding periods per year (\(n\)).
The formula for PV with monthly compounding involves discounting future cash flows back to the present:
\[ PV = \frac{FV}{\left(1 + \frac{r}{n}\right)^{nt}} \]
The key difference is incorporating the adjusted interest rate (\(r/n\)) and the total number of compounding periods (\(nt\)).
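These formulas are straightforward to check in R. The sketch below uses illustrative numbers only: a principal of 1,000, a 5% nominal annual rate, and a 10-year horizon.
pv <- 1000; r <- 0.05; t <- 10; n <- 12

fv_annual  <- pv * (1 + r)^t           # annual compounding
fv_monthly <- pv * (1 + r / n)^(n * t) # monthly compounding

# Discounting the monthly-compounded FV recovers the present value
pv_check <- fv_monthly / (1 + r / n)^(n * t)

c(FV_annual = fv_annual, FV_monthly = fv_monthly, PV_check = pv_check)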
The natural constant, denoted by the letter ‘e’, is a fundamental mathematical constant approximately equal to 2.71828. ‘e’ is a critical element when discussing exponential growth or decay, and it plays a significant role in compound interest calculations, particularly when compounding occurs continuously.
In finance, continuous compounding assumes that interest is compounded an infinite number of times over a continuous period. The formula for calculating future value with continuous compounding is:
\[ FV = PV \times e^{rt} \]
Now, consider the following: \[FV=PV \times e^{rt} \] \[ \begin{align} &\therefore \frac {FV}{PV}= e^{rt}\\ &\therefore ln(\frac {FV}{PV}) = ln(e^{rt})\\ &\therefore ln(\frac {FV}{PV}) = rt\\ &\therefore rt = ln(FV) - ln(PV)\\ \end{align} \]
It is conventional to consider the time-factor in the following manner: \[ ln(\frac {FV_t}{PV}) = rt, \] or \[ R(t)= ln(FV_t) - ln(PV) \]
You might recognize this formula from earlier texts. Moving forward, we will take the rate of return, \(R(t)\), to be the difference between the log prices of the same stock at two different times.
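A quick sketch with illustrative numbers shows continuous compounding and the resulting log-return identity:
pv <- 1000; r <- 0.05; t <- 2

fv <- pv * exp(r * t)   # continuous compounding
log(fv) - log(pv)       # equals r * t = 0.10

# Applied to two observed prices of the same stock, the identity gives the log return
prices <- c(100, 103)
diff(log(prices))       # log return between the two observations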
Time-series modeling in R for financial data is a powerful way to analyze and forecast trends, patterns, and behaviors in stock prices, market indices, or other financial metrics over time.
Time-series models are diverse, offering various approaches to capture and predict patterns within sequential data. The common types include:
Each model has its strengths and is suited to different types of time-series data. Choosing the appropriate model involves understanding the data’s characteristics, such as trend, seasonality, stationarity, and the nature of dependencies within the series. Model selection may involve experimentation and testing to determine the best fit for a particular dataset or forecasting task.
In the interest of time, we will consider the following models: 1. Exponential smoothing (Simple Exponential Smoothing) 2. ARIMA model 3. Various GARCH models
library(rugarch, warn.conflicts = FALSE, quietly = TRUE)
library(forecast, warn.conflicts = FALSE, quietly = TRUE)
library(tseries, warn.conflicts = FALSE, quietly = TRUE)
library(tidyquant, warn.conflicts = FALSE, quietly = TRUE)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(tidyverse, warn.conflicts = FALSE, quietly = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ purrr 1.0.1 ✔ tibble 3.2.1
## ✔ readr 2.1.4 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%() masks ggplot2::%+%()
## ✖ psych::alpha() masks ggplot2::alpha()
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::first() masks xts::first()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::last() masks xts::last()
## ✖ purrr::reduce() masks rugarch::reduce()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(tseries, warn.conflicts = FALSE, quietly = TRUE)
For our models, we’ll be working with the Johannesburg-listed NASPERS data from 2008 to 2019.
Exponential smoothing methods are a class of forecasting techniques used in time-series analysis to predict future values based on weighted averages of past observations, with more recent observations receiving greater weight. These methods are particularly useful for data with trend and seasonality. For our scenarios, we will consider the Simple Exponential Smoothing (SES), but for further reading, I’ve also included Double Exponential Smoothing, and Triple Exponential Smoothing (Holt-Winters).
Consider the below modelling and forecast plot of the SES model.
getSymbols("NPN.JO", from = "2008-01-01", to = "2019-12-31")
## [1] "NPN.JO"
Naspers = ts(NPN.JO$NPN.JO.Adjusted, frequency = 21) # group the daily data into months
train.data = window(Naspers, end = c(120,19)) #training data for 10 years
test.data = window(Naspers, start = c(120,20)) #test data for 1 year
brown = ses(train.data, h = 566)
plot(brown, main = "Brown Exponential smoothing", ylab = "Price", xlab = "Month")
lines(test.data, lty = 3)
Exponential smoothing methods are relatively simple yet effective for forecasting time-series data, especially when the data exhibit trend, seasonality, or both. It is worth exploring whether the more complex exponential smoothing models provide a better fit (a comparison against Holt's method is sketched below), but the plot above already shows that the SES forecast fails to track the actual prices over the test period.
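As a hedged sketch of such a comparison, the code below fits Holt's linear-trend method (holt(), also from the forecast package) on the same training window and compares test-set accuracy against the SES fit:
holt_fit <- holt(train.data, h = 566)

accuracy(brown, test.data)     # SES errors on the test window
accuracy(holt_fit, test.data)  # Holt errors on the test window

plot(holt_fit, main = "Holt's Linear Trend Forecast", ylab = "Price", xlab = "Month")
lines(test.data, lty = 3)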
The Autoregressive Integrated Moving Average (ARIMA) model is a widely used time-series forecasting method that combines autoregression (AR), differencing (I for integrated), and moving average (MA) components to handle different types of time-series data. It’s a versatile model suitable for both stationary and non-stationary data exhibiting trends and seasonality.
getSymbols("NPN.JO", from = "2010-01-01", to = "2019-12-31")
## [1] "NPN.JO"
returns = RETURN(NPN.JO$NPN.JO.Adjusted)
returns = na.omit(returns)
adf.test(returns)
## Warning in adf.test(returns): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: returns
## Dickey-Fuller = -14.104, Lag order = 13, p-value = 0.01
## alternative hypothesis: stationary
#If the p-value is less than 0.05, we reject the null hypothesis of a unit root (non-stationarity), i.e., we treat the returns as stationary
auto.arima(returns) #determines the optimal number of parameters for returns, in this case ARIMA(1,0,1)
## Series: returns
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.8451 -0.8837 0.0012
## s.e. 0.1281 0.1135 0.0003
##
## sigma^2 = 0.0003879: log likelihood = 6470.9
## AIC=-12933.79 AICc=-12933.78 BIC=-12910.37
ARIMAModel = arima(returns["2008::2018"], c(1,0,1)) # based on the auto.arima function
ARIMAModel
##
## Call:
## arima(x = returns["2008::2018"], order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.7840 -0.8321 0.0012
## s.e. 0.0954 0.0856 0.0003
##
## sigma^2 estimated as 0.000397: log likelihood = 5820.22, aic = -11632.44
#Fitted ARIMA(1,0,1): returns(t) = 0.0012 + 0.7840*returns(t-1) - 0.8321*e(t-1)
############### Diagnostic Checks
#Residual checks:
#There should be no correlation in the residuals
#Residuals should have zero mean
#Residuals should be approximately normally distributed
et = residuals(ARIMAModel)
acf(et) ##is there correlation between the residuals?
plot.ts(et) #Test for zero mean
chart.Histogram(et,
methods = c("add.density", "add.normal"),
colorset = c("blue", "green", "red"))
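A more formal check of the first two conditions can be added with standard tests; the sketch below applies a Ljung-Box test for residual autocorrelation and a t-test for a zero mean:
Box.test(et, lag = 10, type = "Ljung-Box")  # H0: no autocorrelation in the residuals
t.test(as.numeric(et))                      # H0: the residual mean is zero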
There are three diagnostic considerations when assessing an ARIMA model: the residuals should be uncorrelated, have zero mean, and be approximately normally distributed. With these checks in mind, we forecast and compare against the actual returns:
forecast_ARIMA = forecast(ARIMAModel,300)
returns.df = as.data.frame(returns)
plot(forecast_ARIMA, xlab = "Time", ylab = "Returns",
main = "ARIMA Forecast vs Actual Data")
lines(returns.df, col = "blue")
Although ARIMA models are effective for modeling and forecasting time-series data with trend and seasonality, they might require careful tuning of parameters and diagnostic checks to ensure model adequacy and accuracy in capturing the underlying patterns in the data. For our demonstration, the model does not appear to satisfy the assumptions required of an ARIMA model. This is most evident when comparing the forecasted values against the actual values: the forecast fails to capture 95% of the actual returns within the given year.
Let’s consider the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. The GARCH model is a statistical model used to describe the variance in a time series, particularly in financial markets. It’s employed to capture the volatility clustering often observed in financial data, where periods of high volatility tend to be followed by more high-volatility periods and vice versa.
The GARCH model is composed of the following:
Autoregressive (AR) Component: The AR component in a GARCH model signifies that the variance or volatility at a particular time point is dependent on previous variance values. It captures the persistence of volatility changes.
Conditional Heteroskedasticity: The “heteroskedasticity” part refers to the changing variance over time. In financial markets, volatility isn’t constant; it fluctuates. GARCH models accommodate this by allowing volatility to vary over time based on past observations.
Generalized Structure: The “generalized” aspect means it’s an extension of ARCH (Autoregressive Conditional Heteroskedasticity) models. GARCH models incorporate not only past squared errors (like in ARCH) but also past variance values to forecast future variance.
One way to demonstrate that financial data indeed exhibits heteroskedasticity is to plot the rolling volatility of the asset's price over time. Using the JSE-listed Naspers Limited (NPN.JO), we'll demonstrate that volatility in financial instruments is unlikely to be homoskedastic.
getSymbols("NPN.JO", from = "2008-01-01", to = "2020-12-31")
## [1] "NPN.JO"
Returns = RETURN(NPN.JO$NPN.JO.Adjusted)
Returns = na.omit(Returns)
chart.RollingPerformance(R = Returns["2008::2019"],
width = 252,
FUN = "sd.annualized",
scale = 252, main = "Naspers' Yearly Rolling Volatility")
The above, in addition to a demonstrated non-constant mean, suggests that Naspers' returns cannot be optimally modeled using the usual ARIMA model. Now, consider the Standard GARCH (sGARCH) model with a constant mean, invariant to time. Using the same Naspers data, we'll specify the mean model, the variance model, and the error distribution.
#Considering an sGARCH model for NASPERS
spec <- ugarchspec(variance.model = list(model = "sGARCH"),
mean.model = list(armaOrder = c(0, 0)), #
distribution.model = "norm")
fit <- ugarchfit(spec, data = Returns)
From theory, we know that returns and volatility are given as follows:
\[ R_t = \mu + \epsilon_t \] and, \[\sigma^2_t = \omega +\alpha_1\epsilon^2_{t-1} + \beta_1\sigma^2_{t - 1}\]
Hence the Model equation is as follows:
fit
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001577 0.000326 4.8445 1e-06
## omega 0.000014 0.000001 15.1329 0e+00
## alpha1 0.070523 0.004961 14.2150 0e+00
## beta1 0.899076 0.007176 125.2843 0e+00
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001577 0.000299 5.2805 0
## omega 0.000014 0.000001 10.4756 0
## alpha1 0.070523 0.006065 11.6278 0
## beta1 0.899076 0.008665 103.7652 0
##
## LogLikelihood : 8289.683
##
## Information Criteria
## ------------------------------------
##
## Akaike -4.9689
## Bayes -4.9616
## Shibata -4.9689
## Hannan-Quinn -4.9663
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.849 0.17395
## Lag[2*(p+q)+(p+q)-1][2] 5.451 0.03083
## Lag[4*(p+q)+(p+q)-1][5] 8.423 0.02311
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 6.235 0.01253
## Lag[2*(p+q)+(p+q)-1][5] 6.626 0.06383
## Lag[4*(p+q)+(p+q)-1][9] 11.276 0.02640
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.06551 0.500 2.000 0.7980
## ARCH Lag[5] 0.44958 1.440 1.667 0.8984
## ARCH Lag[7] 5.93486 2.315 1.543 0.1457
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 35.496
## Individual Statistics:
## mu 0.02616
## omega 3.15912
## alpha1 0.09933
## beta1 0.10680
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.07 1.24 1.6
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.7248 0.468653
## Negative Sign Bias 2.9495 0.003205 ***
## Positive Sign Bias 1.0163 0.309560
## Joint Effect 15.8782 0.001201 ***
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 148.4 4.411e-22
## 2 30 144.0 3.385e-17
## 3 40 233.2 1.716e-29
## 4 50 280.9 2.766e-34
##
##
## Elapsed time : 1.208936
For this reason, we have:
\[ R_t = \mu + \epsilon_t \] \[\implies R_t = 0.001577 + \epsilon_t \]
whilst, \[\sigma^2_t = \omega +\alpha_1\epsilon^2_{t-1} + \beta_1\sigma^2_{t - 1}\] \[\implies \sigma^2_t = 0.000014 + 0.070523\epsilon^2_{t-1} + 0.899076\sigma^2_{t-1}\]
Note that the parameter estimates of our model have p-values less than 0.05, so the parameters are statistically significant at the 5% level. To compare this model against others, we also use the following tests:
plot(fit, which = "all")
##
## please wait...calculating quantiles...
plot(fit, which = 2)
##
## please wait...calculating quantiles...
You can call the forecasted mean- and volatility-models in the following manner:
forecasts = ugarchforecast(fitORspec = fit,
n.ahead = 20) #generate 20 data points in the future
plot(fitted(forecasts)) #plot the mean, remember we assumed constant mean
plot(sigma(forecasts)) #plot the forecasted volatility
In this section, we will consider tweaking a few assumptions in our GARCH model and determine whether the added complexity is justified. In other words, we will test whether the adjustments improve our model or merely complicate it.
Each adjustment is considered in turn below.
Note that the code below has only altered the distribution.model from "norm" to "sstd".
spec <- ugarchspec(variance.model = list(model = "sGARCH"),
mean.model = list(armaOrder = c(0, 0)),
distribution.model = "sstd")
fitSSTD <- ugarchfit(spec, data = Returns)
fitSSTD
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001533 0.000325 4.7148 2e-06
## omega 0.000014 0.000002 7.5490 0e+00
## alpha1 0.082643 0.004134 19.9912 0e+00
## beta1 0.889553 0.010003 88.9249 0e+00
## skew 0.986921 0.022308 44.2401 0e+00
## shape 6.901228 0.713409 9.6736 0e+00
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001533 0.000302 5.0792 0
## omega 0.000014 0.000003 5.3097 0
## alpha1 0.082643 0.008711 9.4867 0
## beta1 0.889553 0.009885 89.9922 0
## skew 0.986921 0.021205 46.5414 0
## shape 6.901228 0.966134 7.1431 0
##
## LogLikelihood : 8344.153
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.0004
## Bayes -4.9894
## Shibata -5.0004
## Hannan-Quinn -4.9965
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.697 0.19267
## Lag[2*(p+q)+(p+q)-1][2] 5.231 0.03527
## Lag[4*(p+q)+(p+q)-1][5] 8.084 0.02808
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 4.102 0.04283
## Lag[2*(p+q)+(p+q)-1][5] 4.417 0.20645
## Lag[4*(p+q)+(p+q)-1][9] 9.737 0.05706
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.0124 0.500 2.000 0.91134
## ARCH Lag[5] 0.6273 1.440 1.667 0.84551
## ARCH Lag[7] 6.9377 2.315 1.543 0.08957
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 24.657
## Individual Statistics:
## mu 0.02368
## omega 3.04717
## alpha1 0.31264
## beta1 0.24096
## skew 0.13179
## shape 0.91072
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.8848 0.376332
## Negative Sign Bias 2.3442 0.019128 **
## Positive Sign Bias 0.6619 0.508084
## Joint Effect 12.4548 0.005977 ***
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 86.26 1.515e-10
## 2 30 115.34 2.994e-12
## 3 40 186.69 3.622e-21
## 4 50 159.29 1.370e-13
##
##
## Elapsed time : 0.8552871
plot(fitSSTD, which = "all")
##
## please wait...calculating quantiles...
Note that there are now three notable differences: (1) the individual statistics now also include skew and shape factors, (2) the Adjusted Pearson Goodness-of-Fit statistics have decreased noticeably (although their p-values remain below 0.05), and (3) the QQ-plot's empirical quantiles align more closely with the theoretical ones. Based on these three points, we can conclude that this adjustment represents a better model for the data. An information-criteria comparison is sketched below.
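As a supplementary check, rugarch's infocriteria() can be used to compare the two fits directly; lower values indicate a better fit-complexity trade-off. This is a hedged sketch using the fit objects created above:
ic <- cbind(infocriteria(fit), infocriteria(fitSSTD))
colnames(ic) <- c("sGARCH-norm", "sGARCH-sstd")
ic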
The GJR-GARCH model, named after Glosten-Jagannathan-Runkle, also known as the Threshold GARCH model, is an extension of the basic GARCH model that incorporates asymmetry in volatility responses to shocks.
In a GJR-GARCH model:
Threshold Effect: It introduces an additional parameter to account for asymmetry. This parameter captures the idea that volatility responses might be different for positive and negative shocks or that the impact of shocks might vary depending on certain conditions or thresholds.
Asymmetric Volatility: The model allows for different responses to positive and negative shocks. For example, it might imply that negative shocks have a larger impact on volatility than positive shocks, capturing the observed tendency of volatility to increase more after negative events.
The GJR-GARCH model is expressed as:
\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 I_{t-1<0} + \beta \sigma_{t-1}^2 \]
Where: - \(\sigma_t^2\) is the conditional variance at time \(t\). - \(\omega\) is the intercept. - \(\alpha\), \(\beta\), and \(\gamma\) are parameters. - \(\varepsilon_{t-1}^2\) is the squared error term at time \(t-1\). - \(I_{t-1<0}\) is an indicator function that takes the value 1 when \(\varepsilon_{t-1}\) is negative and 0 otherwise.
The GJR-GARCH model allows for more nuanced modeling of volatility dynamics by considering asymmetry in how shocks affect volatility. This asymmetry can be particularly relevant in financial markets, where negative events may have different effects on volatility compared to positive events.
Let's get into the code; note that the only adjustment to Model 1 was updating the variance.model from sGARCH to gjrGARCH.
spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
mean.model = list(armaOrder = c(0, 0)), #
distribution.model = "sstd")
fitGJR <- ugarchfit(spec, data = Returns)
fitGJR
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : gjrGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001131 0.000321 3.5203 0.000431
## omega 0.000015 0.000002 9.4008 0.000000
## alpha1 0.026258 0.004619 5.6844 0.000000
## beta1 0.891690 0.009269 96.1988 0.000000
## gamma1 0.104451 0.018004 5.8016 0.000000
## skew 0.970917 0.022169 43.7955 0.000000
## shape 7.318456 0.811349 9.0201 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001131 0.000307 3.6883 0.000226
## omega 0.000015 0.000002 7.2755 0.000000
## alpha1 0.026258 0.008971 2.9270 0.003422
## beta1 0.891690 0.008689 102.6194 0.000000
## gamma1 0.104451 0.020464 5.1041 0.000000
## skew 0.970917 0.020781 46.7216 0.000000
## shape 7.318456 0.977729 7.4852 0.000000
##
## LogLikelihood : 8360.958
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.0099
## Bayes -4.9970
## Shibata -5.0099
## Hannan-Quinn -5.0053
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.153 0.28285
## Lag[2*(p+q)+(p+q)-1][2] 5.503 0.02985
## Lag[4*(p+q)+(p+q)-1][5] 8.933 0.01721
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 4.381 0.03633
## Lag[2*(p+q)+(p+q)-1][5] 4.804 0.16943
## Lag[4*(p+q)+(p+q)-1][9] 10.278 0.04372
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.0000222 0.500 2.000 0.99624
## ARCH Lag[5] 0.6628101 1.440 1.667 0.83475
## ARCH Lag[7] 7.0973268 2.315 1.543 0.08275
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 25.4221
## Individual Statistics:
## mu 0.04578
## omega 3.37121
## alpha1 0.36338
## beta1 0.30802
## gamma1 0.41537
## skew 0.13935
## shape 0.92961
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.69 1.9 2.35
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.5172 0.12932
## Negative Sign Bias 0.8067 0.41989
## Positive Sign Bias 1.9815 0.04762 **
## Joint Effect 6.1938 0.10255
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 83.21 5.163e-10
## 2 30 70.45 2.627e-05
## 3 40 161.46 7.735e-17
## 4 50 227.74 7.425e-25
##
##
## Elapsed time : 1.561519
plot(fitGJR, which = "all")
##
## please wait...calculating quantiles...
Note that the notable difference now lies in the twelfth plot. The "News Impact Curve", which was previously symmetrical and weighed negative and positive news on the asset equally, now weighs negative news more heavily than positive news. Note also that the model has additional parameters, all of which are statistically significant.
An AR(1) GJR-GARCH model expands on the above by combining an autoregressive AR(1) mean model with an asymmetric GJR-GARCH variance model.
Here’s a breakdown:
AR(1): The AR(1) model is an Autoregressive model of order 1. It means the current value of a time series is linearly dependent on its previous value with a lag of 1. Mathematically, it can be represented as \(y_t = c + \phi_1 y_{t-1} + \varepsilon_t\), where \(y_t\) is the value at time \(t\), \(\phi_1\) is the autoregressive coefficient, \(c\) is a constant term, and \(\varepsilon_t\) is the error term.
GJR-GARCH: As explained earlier, the GJR-GARCH model introduces asymmetric responses to volatility shocks, allowing for different effects of positive and negative shocks on volatility.
An AR(1) GJR-GARCH model combines the autoregressive nature of AR(1) with the volatility dynamics of the GJR-GARCH model. The model expresses both the conditional mean (through the AR(1) process) and the conditional variance (through the GJR-GARCH process) of a time series, making it suitable for capturing both the trend and the volatility clustering observed in financial time series data.
The mathematical representation of an AR(1) GJR-GARCH model combines the AR(1) equation with the GJR-GARCH equation, allowing for the joint modeling of the mean and volatility of a time series. The specifics of the equations will involve parameters related to both the AR(1) process and the GJR-GARCH process, incorporating autoregressive dynamics and asymmetry in volatility responses simultaneously.
To compute this model in R, the only adjustment lies in changing the armaOrder in the following manner:
spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
mean.model = list(armaOrder = c(1, 0)), #
distribution.model = "sstd")
fitAR.GJR <- ugarchfit(spec, data = Returns)
fitAR.GJR
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : gjrGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001161 0.000314 3.6921 0.000222
## ar1 -0.027169 0.017633 -1.5408 0.123374
## omega 0.000014 0.000001 10.4157 0.000000
## alpha1 0.026678 0.004905 5.4386 0.000000
## beta1 0.893328 0.009066 98.5374 0.000000
## gamma1 0.100962 0.017702 5.7034 0.000000
## skew 0.971005 0.022146 43.8450 0.000000
## shape 7.285811 0.813388 8.9574 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001161 0.000304 3.8234 0.000132
## ar1 -0.027169 0.017135 -1.5856 0.112835
## omega 0.000014 0.000002 8.3855 0.000000
## alpha1 0.026678 0.008338 3.1995 0.001377
## beta1 0.893328 0.008340 107.1094 0.000000
## gamma1 0.100962 0.020003 5.0474 0.000000
## skew 0.971005 0.020914 46.4291 0.000000
## shape 7.285811 0.954831 7.6305 0.000000
##
## LogLikelihood : 8362.142
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.0100
## Bayes -4.9953
## Shibata -5.0100
## Hannan-Quinn -5.0047
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.08841 0.766213
## Lag[2*(p+q)+(p+q)-1][2] 4.61768 0.001068
## Lag[4*(p+q)+(p+q)-1][5] 8.28579 0.005842
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 4.025 0.04482
## Lag[2*(p+q)+(p+q)-1][5] 4.411 0.20709
## Lag[4*(p+q)+(p+q)-1][9] 9.797 0.05543
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.005517 0.500 2.000 0.9408
## ARCH Lag[5] 0.646406 1.440 1.667 0.8397
## ARCH Lag[7] 6.980286 2.315 1.543 0.0877
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 27.7792
## Individual Statistics:
## mu 0.0457
## ar1 0.1590
## omega 3.7331
## alpha1 0.3638
## beta1 0.3084
## gamma1 0.4103
## skew 0.1353
## shape 0.9244
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.89 2.11 2.59
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.3676 0.1715
## Negative Sign Bias 0.8703 0.3842
## Positive Sign Bias 1.8521 0.0641 *
## Joint Effect 5.6012 0.1327
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 111.19 4.799e-15
## 2 30 88.79 5.589e-08
## 3 40 159.13 1.901e-16
## 4 50 143.70 3.092e-11
##
##
## Elapsed time : 1.902537
plot(fitAR.GJR, which = "all")
##
## please wait...calculating quantiles...
We do not see anything unusual in the plots. Further, note that the p-value for the AR(1) parameter estimate is 0.1234 > 0.05. For this reason, the AR(1) estimate is statistically insignificant and adds no additional value. In conclusion, this model is not better than the GJR-GARCH model.
In our earlier models, we assumed a constant mean, invariant of time. This adjustment adds a GARCH-in-mean term (archm = TRUE with archpow = 2), allowing the conditional variance to enter the mean equation, and we test whether this addition is statistically significant.
spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
mean.model = list(armaOrder = c(1, 0),
archm = T,
archpow = 2),
distribution.model = "sstd")
fit.Mean.GJR <- ugarchfit(spec, data = Returns)
fit.Mean.GJR
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : gjrGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000418 0.000432 0.96832 0.332887
## ar1 -0.024653 0.017741 -1.38962 0.164644
## archm 2.011310 1.049917 1.91568 0.055405
## omega 0.000016 0.000002 9.09135 0.000000
## alpha1 0.025841 0.005470 4.72413 0.000002
## beta1 0.887169 0.009795 90.57805 0.000000
## gamma1 0.104874 0.017419 6.02051 0.000000
## skew 0.970993 0.022120 43.89717 0.000000
## shape 7.325487 0.848489 8.63356 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000418 0.000640 0.65344 0.513473
## ar1 -0.024653 0.017367 -1.41957 0.155732
## archm 2.011310 1.539225 1.30670 0.191314
## omega 0.000016 0.000002 6.61439 0.000000
## alpha1 0.025841 0.008677 2.97802 0.002901
## beta1 0.887169 0.010053 88.25275 0.000000
## gamma1 0.104874 0.019181 5.46746 0.000000
## skew 0.970993 0.020629 47.06906 0.000000
## shape 7.325487 0.974813 7.51476 0.000000
##
## LogLikelihood : 8362.779
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.0098
## Bayes -4.9933
## Shibata -5.0098
## Hannan-Quinn -5.0039
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.1079 0.742538
## Lag[2*(p+q)+(p+q)-1][2] 4.1203 0.003295
## Lag[4*(p+q)+(p+q)-1][5] 7.1630 0.016638
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 3.993 0.04568
## Lag[2*(p+q)+(p+q)-1][5] 4.387 0.20954
## Lag[4*(p+q)+(p+q)-1][9] 9.720 0.05755
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.006982 0.500 2.000 0.93341
## ARCH Lag[5] 0.620405 1.440 1.667 0.84759
## ARCH Lag[7] 6.901323 2.315 1.543 0.09119
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 18.734
## Individual Statistics:
## mu 0.04589
## ar1 0.16893
## archm 0.07821
## omega 2.10323
## alpha1 0.35716
## beta1 0.31667
## gamma1 0.41967
## skew 0.13016
## shape 0.91074
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 2.1 2.32 2.82
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.7438 0.08129 *
## Negative Sign Bias 0.5438 0.58660
## Positive Sign Bias 2.1004 0.03577 **
## Joint Effect 6.3934 0.09396 *
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 87.79 8.137e-11
## 2 30 83.17 3.937e-07
## 3 40 132.81 3.685e-12
## 4 50 154.64 7.061e-13
##
##
## Elapsed time : 4.204651
plot(fit.Mean.GJR, which = "all")
##
## please wait...calculating quantiles...
Once again, the additional parameter introduced by this method (the archm term) results in an estimate that is statistically insignificant.
In conclusion, the best GARCH model for our financial data is the GJR-GARCH model, assuming a skewed Student's t-distribution for the residuals. An information-criteria comparison of all the fitted models is sketched below.
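As a hedged sketch supporting this conclusion, the information criteria of all the fitted models can be collected side by side (lower values are better):
models <- list(sGARCH.norm      = fit,
               sGARCH.sstd      = fitSSTD,
               gjrGARCH.sstd    = fitGJR,
               AR1.gjrGARCH     = fitAR.GJR,
               gjrGARCH.in.mean = fit.Mean.GJR)
sapply(models, function(m) infocriteria(m)[, 1])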
Let’s prepare our data for the simulation.
sFinal = ugarchspec(variance.model = list(model = "gjrGARCH"),
mean.model = list(armaOrder = c(0, 0)), #
distribution.model = "sstd")
setfixed(sFinal) = as.list(coef(fitGJR))
f2008 = ugarchforecast(data = Returns["/2008-12"],
fitORspec = sFinal,
n.ahead = 252)
f2019 = ugarchforecast(data = Returns["/2019-12"],
fitORspec = sFinal,
n.ahead = 252)
par(mfrow = c(2,1))
plot(sigma(f2008))
plot(sigma(f2019))
You will find that the volatility profile of this asset is consistent with many listed shares across the world. That is, forecast volatility tended to decline after 2008, whilst it tended to increase in the latter part of 2019.
par(mfrow = c(1,1))
simulations = ugarchpath(spec = sFinal,
m.sim = 5, #simulate five lines
n.sim = 1*252, #over one year, if more than one multiply 252 by the desired number of years
rseed = 123)
p = 231695.3*apply(fitted(simulations),2, 'cumsum') + 231695.3 #The number 231695.3 was the adjusted closing price for 2019-12-31
matplot(p, type = "l", lwd = 3)