References:

This text makes reference to the following sources:

  1. R for Data Science by H. Wickham and G. Grolemund (a free, online textbook): https://r4ds.had.co.nz/

  2. Investment Science by D.G. Luenberger

1 Week 1: Introduction to R and RStudio

1.1 Introduction to RStudio

R is widely used in quantitative finance due to its extensive statistical capabilities, data manipulation tools, and its active community that develops and maintains specialized packages for finance. It offers a wide range of functionalities that enable professionals in the finance industry to analyze data, build models, conduct simulations, and perform risk management tasks.

Here are some key features and applications of R in quantitative finance:

  1. Data Analysis and Visualization: R provides powerful tools for data analysis and visualization, allowing you to explore, clean, and transform financial data. It offers numerous packages for importing and manipulating financial data from various sources, such as CSV files, databases, and web APIs. R’s graphics capabilities help create informative charts, plots, and interactive visualizations to gain insights from the data.

  2. Statistical Analysis: R’s statistical functionalities are highly valuable in finance. You can use R to perform descriptive statistics, hypothesis testing, regression analysis, time series analysis, and advanced modeling techniques. R’s extensive collection of statistical packages, including “stats,” “quantmod,” “rugarch,” and “fGarch,” provide specialized functions for finance-related analysis.

  3. Quantitative Modeling: R is well-suited for building quantitative models in finance. It offers libraries for option pricing, portfolio optimization, risk assessment, and asset allocation. Packages like “quantmod,” “PerformanceAnalytics,” and “PortfolioAnalytics” provide tools for backtesting trading strategies, calculating performance metrics, and constructing efficient portfolios.

  4. Time Series Analysis: R has robust capabilities for analyzing and modeling time series data, which is prevalent in finance. It offers packages like “xts” and “zoo” for handling time series objects, and packages like “forecast” and “TTR” for time series forecasting, volatility modeling, and technical analysis.

  5. Risk Management: R provides tools for risk management in finance, such as Value-at-Risk (VaR) calculations, Monte Carlo simulations, and stress testing. Packages like “RiskMetrics” and “fPortfolio” offer functions for measuring and managing financial risk.

  6. Financial Econometrics: R is widely used for econometric analysis in finance. It offers packages like “AER” and “plm” for estimating econometric models, performing panel data analysis, and conducting event studies.

  7. Algorithmic Trading: R can be used for developing and backtesting algorithmic trading strategies. Packages like “quantstrat” and “blotter” enable you to implement and evaluate trading strategies using historical data.

In addition to these specific applications, R’s open-source nature and active community allow users to access a vast collection of finance-related packages and resources. The flexibility of the R language and its integration with other programming languages, such as C++, also make it suitable for implementing complex financial models and high-performance computations.


1.2 Installing R and RStudio

To install R and RStudio, follow the steps below. Please note that to use RStudio, you must install both the R and RStudio applications:

Installing R:

  1. Visit the Comprehensive R Archive Network (CRAN) website: https://cran.r-project.org/

  2. Choose a CRAN mirror location near your geographical location.

  3. On the mirror’s website, select your operating system (Windows, macOS, or Linux).

  4. Download the appropriate installation package for your operating system.

  5. Once the download is complete, run the installer and follow the instructions provided.

  6. During the installation process, you may be asked to choose customization options. You can typically leave the default settings unless you have specific preferences.

  7. Complete the installation process.

Installing RStudio:

  1. Visit the RStudio website: https://www.rstudio.com/

  2. Go to the “Products” menu and click on “RStudio Desktop.”

  3. Scroll down to the “Installers for Supported Platforms” section.

  4. Download the appropriate installer for your operating system.

  5. Once the download is complete, run the installer and follow the instructions provided.

  6. During the installation process, you may be asked to choose customization options. You can typically leave the default settings unless you have specific preferences.

  7. Complete the installation process.

After following these steps, you should have R and RStudio installed on your computer. You can then launch RStudio, and it should automatically detect your R installation and configure itself accordingly.


1.3 Understanding the RStudio environment

RStudio is an integrated development environment (IDE) specifically designed for R, a popular programming language for statistical computing and data analysis. RStudio provides a user-friendly interface that enhances the workflow of R users, making it easier to write, run, and debug R code.

Here are the main components of the RStudio environment:

  1. Console: The RStudio Console is where you can directly interact with the R interpreter. You can type and execute R commands here, and the results will be displayed immediately.

  2. Script Editor: The Script Editor is where you write your R code. It provides features like syntax highlighting, code completion, and code formatting to make writing code easier and more efficient. You can save your code as scripts for future use.

  3. Environment and History: The Environment tab shows the objects (variables, functions, etc.) currently in memory. You can view and manipulate these objects. The History tab displays the commands you have executed in the Console, making it easy to review and rerun previous commands.

  4. Files, Plots, Packages, Help: These tabs provide access to various tools and resources. The Files tab allows you to navigate your computer’s file system and manage files. The Plots tab displays graphical outputs generated by your R code. The Packages tab lets you manage installed packages and install new ones. The Help tab provides access to R documentation and other resources.

  5. Viewer: The Viewer pane allows you to view HTML content generated by R, such as interactive plots or reports.

  6. Version Control Integration: RStudio has built-in support for version control systems like Git. You can manage your code repositories, commit changes, and collaborate with others directly from the IDE.

  7. Customizable Layout: RStudio’s layout is highly customizable. You can arrange the different panes according to your preferences, detach them into separate windows, or create different layouts for different projects.

These are the main components and features of the RStudio environment. RStudio aims to provide a user-friendly and efficient interface for R programming, making it easier to work with R and analyze data.


1.4 Basic R syntax and data types

Here are some basic syntax and data types in R:

  1. Variables and Assignment: Variables are used to store values in R. You can assign a value to a variable using the assignment operator <- or =. For example:

    x <- 10
    y = 5
  2. Data Types: R has several basic data types:

    • Numeric: Used to represent numbers (integers or decimals). Example: x <- 10, y <- 3.14
    • Character: Used to represent text. Text should be enclosed in quotes. Example: name <- "John"
    • Logical: Used to represent boolean values (TRUE or FALSE). Example: is_valid <- TRUE
    • Integer: A special type of numeric data type used to represent whole numbers. Example: z <- 5L
    • Complex: Used to represent complex numbers. Example: cplx <- 2 + 3i
  3. Vectors: A vector is a basic data structure in R that can hold multiple values of the same data type. You can create a vector using the c() function. For example:

    numbers <- c(1, 2, 3, 4, 5)
    fruits <- c("apple", "banana", "orange")
  4. Matrices: A matrix is a two-dimensional data structure in R with rows and columns. You can create a matrix using the matrix() function. For example:

    my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
  5. Data Frames: A data frame is a tabular data structure in R similar to a table in a database or a spreadsheet. It can hold different data types. You can create a data frame using the data.frame() function. For example:

    my_data <- data.frame(name = c("John", "Jane", "Adam"), age = c(25, 30, 35))
  6. Lists: A list is a versatile data structure in R that can hold different types of objects, including vectors, matrices, data frames, and even other lists. You can create a list using the list() function. For example:

    my_list <- list(numbers = c(1, 2, 3), fruits = c("apple", "banana", "orange"))
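
As a quick, minimal sketch (reusing the objects defined above), here is how elements of these structures can be accessed:

# Vector indexing (R is 1-based)
numbers[2]             # second element: 2
fruits[c(1, 3)]        # "apple" and "orange"

# Matrix indexing: [row, column]
my_matrix[2, 3]        # element in row 2, column 3

# Data frame columns by name
my_data$name           # the name column
mean(my_data$age)      # average age: 30

# List elements by name or position
my_list$numbers        # the numeric vector
my_list[["fruits"]]    # the character vector

# Inspect any object's structure
str(my_data)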

These are just some of the basic syntax and data types in R. R offers many more advanced features and data structures, making it a powerful language for data analysis and statistical computing.


1.5 Managing R packages

Managing R packages involves various tasks such as installing, updating, and removing packages. R packages are collections of functions, data, and documentation that extend the functionality of the R programming language. Here’s a guide on how to manage R packages effectively:

  1. Installing Packages:
    • To install a package, use the install.packages() function. For example, to install the “dplyr” package, you would run: install.packages("dplyr").
    • By default, packages are installed from the Comprehensive R Archive Network (CRAN). However, you can also install packages from other sources like GitHub or Bioconductor using specific functions or package managers.
  2. Loading Packages:
    • Once a package is installed, you need to load it into your R session to use its functions. Use the library() or require() functions to load a package. For example, to load the “dplyr” package, you would run: library(dplyr).
  3. Updating Packages:
    • Regularly updating packages ensures that you have the latest features and bug fixes. To update packages, use the update.packages() function. For example, to update all installed packages, run: update.packages(). You can also selectively update specific packages by providing their names as arguments.
  4. Checking Installed Packages:
    • To see a list of installed packages in your R environment, use the installed.packages() function. Running this function will return a matrix with information about each installed package, including the package name, version, and other details.
  5. Removing Packages:
    • If you no longer need a package, you can remove it using the remove.packages() function. For example, to remove the “dplyr” package, run: remove.packages("dplyr"). This will remove the package and its associated files from your R environment.
  6. Managing Package Dependencies:
    • Some packages depend on other packages to function correctly. R will automatically install required dependencies when you install a package from CRAN, and they are loaded as needed when you load the package. However, if you install packages from other sources, you might need to manage their dependencies manually.
  7. Managing Package Versions:
    • Sometimes you may need to work with specific package versions, especially when reproducing code or dealing with package conflicts. install.packages() always installs the latest CRAN release; to install a specific version, use the install_version() function from the remotes package. For example: remotes::install_version("dplyr", version = "0.8.5").
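
As a consolidated, minimal sketch of the tasks above (the package name "dplyr" is just an example), the pieces can be combined into a short reusable pattern:

pkg <- "dplyr"

# Install the package only if it is not already installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load it for the current session
library(pkg, character.only = TRUE)

# Check which version is installed
packageVersion(pkg)

# List everything currently installed (returns a matrix)
head(installed.packages()[, c("Package", "Version")])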

Remember to periodically update your packages to stay up-to-date with the latest improvements and bug fixes. It’s good practice to document the packages you use in your projects to ensure reproducibility.


2 Week 2: Data Manipulation and Importing

2.1 Importing and Exporting data in R

In R, there are several methods available for importing and exporting data. Here are some common approaches:

2.1.1 Importing Data:

  1. CSV Files: To import data from a CSV file, you can use the read.csv() function. For example:
data <- read.csv("data.csv")
  2. Excel Files: For importing data from Excel files, you can use the readxl package. First, install the package using install.packages("readxl"), and then load it using library(readxl). You can then use the read_excel() function to read the data. For example:
library(readxl)
data <- read_excel("data.xlsx")
  3. Other Formats: R supports various other data formats such as JSON, XML, SPSS, and more. There are specific packages available for each format. For instance, jsonlite package for JSON, xml2 package for XML, and haven package for SPSS files.
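
For the other formats mentioned above, the calls look similar. The following is a minimal sketch; the file names are placeholders, and each package must be installed first:

# JSON
library(jsonlite)
json_data <- fromJSON("data.json")

# XML
library(xml2)
xml_data <- read_xml("data.xml")

# SPSS
library(haven)
spss_data <- read_sav("data.sav")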

2.1.2 Exporting Data:

  1. CSV Files: To export data to a CSV file, you can use the write.csv() function. For example:
write.csv(data, "output.csv", row.names = FALSE)
  2. Excel Files: To export data to an Excel file, you can use the write.xlsx() function from the openxlsx package. First, install the package using install.packages("openxlsx"), and then load it using library(openxlsx). You can then use the write.xlsx() function to write the data. For example:
library(openxlsx)
write.xlsx(data, "output.xlsx", rowNames = FALSE)
  3. Other Formats: Similar to importing, R provides packages for exporting data in various formats. For instance, jsonlite package for JSON, xml2 package for XML, and haven package for SPSS files.
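
Likewise for exporting; again a minimal sketch with placeholder file names:

# JSON
library(jsonlite)
write_json(data, "output.json")

# SPSS
library(haven)
write_sav(data, "output.sav")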

These are just a few examples of importing and exporting data in R. Depending on your specific needs and the format of your data, you may need to explore additional packages or functions.


2.2 Data Cleaning and Preprocessing in R

Data cleaning and preprocessing are essential steps in data analysis. R provides a variety of packages and functions to perform these tasks efficiently. Here are some common techniques for data cleaning and preprocessing in R:

  1. Handling Missing Data:
    • Identify missing values: Use functions like is.na() or complete.cases() to identify missing values in your dataset.
    • Dealing with missing values: You can remove incomplete rows using na.omit() or fill missing values using functions like zoo::na.fill() or tidyr::replace_na().
    • Imputation: Use techniques like mean, median, or regression imputation to replace missing values. The mice package provides multiple imputation methods.
  2. Handling Outliers:
    • Identify outliers: Visualize your data using box plots, histograms, or scatter plots and use statistical measures such as z-scores or the interquartile range (IQR) to identify outliers.
    • Dealing with outliers: Remove outliers or transform them using methods like Winsorization, where extreme values are replaced with values at a certain percentile.
  3. Data Transformation:
    • Scaling/Normalization: Use scale() to standardize numeric variables (mean 0, standard deviation 1), or caret::preProcess() to rescale them to a common range (e.g., 0 to 1).
    • Logarithmic Transformation: Apply a logarithmic function (log()) to reduce the skewness of highly skewed variables.
    • Dummy Coding: Convert categorical variables into binary dummy variables using functions like model.matrix() or the dummy_cols() function from the fastDummies package.
  4. Handling Inconsistent Data:
    • Standardizing Text: Convert text to a consistent format using functions like tolower() or toupper() to lowercase or uppercase, and stringr::str_replace() for pattern-based replacements.
    • Correcting Data Entry Errors: Identify and correct common data entry errors using functions like gsub() or regular expressions (regex) for pattern matching and replacement.
  5. Removing Redundant Variables:
    • Identify correlated variables: Use correlation matrices or techniques like principal component analysis (PCA) to identify highly correlated variables.
    • Remove redundant variables: Based on the analysis, remove variables that do not contribute significantly to the analysis.
  6. Handling Date and Time:
    • Parse Date and Time: Use functions like as.Date() or lubridate::ymd() to convert date/time strings into proper date/time objects.
    • Extract Information: Extract specific information from date/time objects using functions like month(), year(), day(), etc.
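
The following minimal sketch ties a few of these techniques together on a hypothetical data frame df with columns price, sector, and trade_date (all of these names are assumptions chosen for illustration):

library(dplyr)
library(tidyr)
library(lubridate)

# df is a hypothetical data frame with columns price, sector, and trade_date
df_clean <- df %>%
  # Fill missing prices with the column median (simple imputation)
  mutate(price = replace_na(price, median(price, na.rm = TRUE))) %>%
  # Winsorize extreme prices at the 1st and 99th percentiles
  mutate(price = pmin(pmax(price, quantile(price, 0.01)), quantile(price, 0.99))) %>%
  # Standardize text and parse dates
  mutate(sector = tolower(sector),
         trade_date = ymd(trade_date)) %>%
  # Log-transform the (positive) prices to reduce skewness
  mutate(log_price = log(price))

# Dummy-code the categorical variable with base R
dummies <- model.matrix(~ sector - 1, data = df_clean)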

These are just a few examples of common data cleaning and preprocessing techniques in R. The choice of techniques will depend on the specific characteristics of your dataset and the goals of your analysis. R packages like tidyverse, dplyr, tidyr, and stringr provide a rich set of functions that can greatly simplify these tasks.


2.3 Data Transformation and Aggregation

For an in-depth discussion on this topic, refer to Chapter 5 of the R for Data Science textbook: https://r4ds.had.co.nz/transform.html


2.4 Handling Missing Data and Outliers

For an in-depth discussion on this topic, refer to Section 7.4 of the R for Data Science textbook: https://r4ds.had.co.nz/exploratory-data-analysis.html#missing-values-2


3 Week 3: Exploratory Data Analysis (EDA)

3.1 Understanding the Importance of EDA in Quantitative Finance

Exploratory Data Analysis (EDA) is of paramount importance in quantitative finance. It serves as a crucial initial step in the data analysis process and provides valuable insights and understanding of the financial data. Here are some reasons why EDA is significant in quantitative finance:

  1. Data Quality Assessment: EDA helps in assessing the quality and integrity of the financial data. It involves checking for missing values, outliers, data inconsistencies, and potential errors. By identifying and addressing these issues, EDA ensures that the subsequent analysis is based on reliable and accurate data.

  2. Pattern Identification: EDA techniques help in identifying patterns and relationships within the financial data. By visualizing the data through plots, charts, and graphs, analysts can observe trends, seasonality, cyclicality, and other patterns. These insights are vital for making informed decisions in quantitative finance, such as identifying trading opportunities or estimating risk.

  3. Feature Selection: In quantitative finance, it is essential to identify relevant features or variables that impact financial outcomes. EDA facilitates the exploration of different variables and their relationships with the target variable. By analyzing correlations, distributions, and statistical measures, analysts can select the most significant features for further analysis and modeling.

  4. Risk Assessment: EDA aids in understanding and quantifying risks associated with financial instruments or portfolios. It allows analysts to examine the distribution and volatility of returns, identify extreme events, and measure risk metrics such as value-at-risk (VaR) and conditional value-at-risk (CVaR). By exploring historical data and conducting scenario analyses, EDA helps in evaluating and managing risks effectively.

  5. Model Assumptions: Quantitative finance often involves developing mathematical models to predict asset prices, portfolio performance, or other financial indicators. EDA helps in validating the assumptions underlying these models. Analysts can assess whether the data adheres to the model assumptions, such as normality, stationarity, or independence. If the assumptions are violated, appropriate adjustments or alternative models can be considered.

  6. Strategy Development: EDA plays a crucial role in developing trading strategies and investment approaches. By thoroughly analyzing historical price data, trading volumes, and other relevant factors, analysts can uncover patterns, market inefficiencies, and opportunities for alpha generation. EDA allows for backtesting and fine-tuning of strategies based on historical performance, helping traders and investors make more informed decisions.

In summary, EDA is essential in quantitative finance as it enables data quality assessment, pattern identification, feature selection, risk assessment, validation of model assumptions, and strategy development. It provides a solid foundation for subsequent analysis, modeling, and decision-making processes in the field of finance.


3.2 Visualizing Financial Data Using R and ggplot2

Visualizing financial data using R and ggplot2 can be a powerful way to explore and communicate insights from your data. ggplot2 is a popular data visualization package in R that provides a flexible and intuitive grammar of graphics. Here’s a step-by-step guide to getting started with visualizing financial data using R and ggplot2:

Step 1: Install and load the necessary packages: First, make sure you have the ggplot2 and quantmod packages installed. If you haven’t installed them yet, you can do so by running the following commands:

#install.packages("ggplot2")
#install.packages("quantmod")

Once installed, load the packages into your R session using the following commands:

library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
options(xts.warn_dplyr_breaks_lag = FALSE)

Step 2: Prepare your financial data: Before visualizing your financial data, you need to prepare it in a suitable format. Let’s assume you have a dataset called financial_data with columns such as “Date,” “Price,” and “Volume.” Make sure your data is loaded into your R session.

Step 3: Create basic plots: You can start by creating basic plots to get an overview of your financial data. Here’s an example of how to create a line plot of stock prices over time:

getSymbols("AAPL")
## [1] "AAPL"
financial_data = data.frame(Date=index(AAPL), coredata(AAPL))

ggplot(data = financial_data, aes(x = Date, y = AAPL.Close)) +
  geom_line() +
  labs(title = "Stock Prices Over Time", x = "Date", y = "Price")

This code specifies the data and mapping aesthetics (x and y) in the ggplot() function. The geom_line() layer is used to draw the line plot. The labs() function is used to provide a title and labels for the x and y axes.

Step 4: Customize your plot: You can customize your plot further by adding additional layers, changing colors, adding annotations, and modifying the axis scales. Here’s an example that adds a smoothed trend line (a loess fit) and customizes the appearance:

ggplot(data = financial_data, aes(x = Date, y = AAPL.Close)) +
  geom_line(color = "steelblue") +
  geom_smooth(aes(y = AAPL.Close), method = "loess", se = TRUE, color = "red") +
  labs(title = "Stock Prices Over Time", x = "Date", y = "Price") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

In this code, the geom_smooth() layer is used to add a smooth line (in this case, a loess regression line) to the plot. The color argument is used to specify the colors of the lines. The theme_minimal() function changes the theme of the plot to a minimalistic style.

Step 5: Explore other plot types: ggplot2 offers a wide range of plot types that can be useful for visualizing financial data. Some common plot types include bar charts, scatter plots, and candlestick charts. Explore the ggplot2 documentation and examples to learn about different plot types and how to create them.
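
As one example, a bar chart of daily trading volume can be built from the same financial_data object created earlier (a minimal sketch):

ggplot(data = financial_data, aes(x = Date, y = AAPL.Volume)) +
  geom_col(fill = "grey40") +
  labs(title = "Daily Trading Volume", x = "Date", y = "Volume") +
  theme_minimal()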

Step 6: Save and export your plot: Once you’re satisfied with your plot, you can save it as an image file using the ggsave() function. Here’s an example of how to save your plot as a PNG file:

ggsave(filename = "financial_plot.png", plot = last_plot(), width = 10, height = 6)
## `geom_smooth()` using formula = 'y ~ x'

This code saves the last plot created in your R session as a PNG file named “financial_plot.png” with a width of 10 inches and a height of 6 inches. Adjust the filename and dimensions as needed.

That’s it! You now have a basic guide to visualizing financial data using R and ggplot2. Feel free to explore the various customization options and plot types offered by ggplot2 to create informative and visually appealing visualizations of your financial data.


3.3 Descriptive Statistics and Data Summary

In R, you can use various functions and packages to perform descriptive statistics and data summaries for quantitative finance. Let’s go through some common tasks step-by-step:

  1. Install and load necessary packages: First, make sure you have the required packages installed and loaded:
list.of.packages <- c("quantmod", "PerformanceAnalytics","dplyr","psych", "TSPred")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
library(psych, warn.conflicts = FALSE, quietly = TRUE)
library(TSPred, warn.conflicts = FALSE, quietly = TRUE)
  2. Data retrieval and exploration: Assuming you have financial data available, you can retrieve it using quantmod. For this example, we’ll continue to use the AAPL object downloaded earlier, which holds up-to-date Apple price and volume data.

  3. Data summary and descriptive statistics: Now, let’s calculate the summary statistics for the financial data:

summary(financial_data)
##       Date              AAPL.Open         AAPL.High          AAPL.Low      
##  Min.   :2007-01-03   Min.   :  2.835   Min.   :  2.929   Min.   :  2.793  
##  1st Qu.:2011-03-24   1st Qu.: 12.186   1st Qu.: 12.282   1st Qu.: 12.041  
##  Median :2015-06-17   Median : 26.848   Median : 27.075   Median : 26.573  
##  Mean   :2015-06-16   Mean   : 49.470   Mean   : 50.012   Mean   : 48.951  
##  3rd Qu.:2019-09-09   3rd Qu.: 55.812   3rd Qu.: 56.343   3rd Qu.: 55.093  
##  Max.   :2023-11-29   Max.   :196.240   Max.   :198.230   Max.   :195.280  
##    AAPL.Close       AAPL.Volume        AAPL.Adjusted    
##  Min.   :  2.793   Min.   :2.405e+07   Min.   :  2.367  
##  1st Qu.: 12.161   1st Qu.:1.036e+08   1st Qu.: 10.308  
##  Median : 26.832   Median :2.044e+08   Median : 24.424  
##  Mean   : 49.504   Mean   :3.590e+08   Mean   : 47.710  
##  3rd Qu.: 55.898   3rd Qu.:4.845e+08   3rd Qu.: 53.655  
##  Max.   :196.450   Max.   :3.373e+09   Max.   :195.927
# If you want more specific statistics, you can use the psych package:
describe(financial_data)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
##               vars    n         mean           sd       median      trimmed
## Date             1 4257          NaN           NA           NA          NaN
## AAPL.Open        2 4257        49.47        53.63        26.85        40.37
## AAPL.High        3 4257        50.01        54.25        27.08        40.81
## AAPL.Low         4 4257        48.95        53.08        26.57        39.93
## AAPL.Close       5 4257        49.50        53.69        26.83        40.39
## AAPL.Volume      6 4257 359041903.62 370722582.53 204420400.00 288190477.63
## AAPL.Adjusted    7 4257        47.71        53.83        24.42        38.35
##                        mad         min         max        range skew kurtosis
## Date                    NA         Inf        -Inf         -Inf   NA       NA
## AAPL.Open            27.87        2.84 1.96240e+02 1.934000e+02 1.31     0.29
## AAPL.High            28.10        2.93 1.98230e+02 1.953000e+02 1.31     0.28
## AAPL.Low             27.57        2.79 1.95280e+02 1.924900e+02 1.31     0.31
## AAPL.Close           27.78        2.79 1.96450e+02 1.936600e+02 1.31     0.29
## AAPL.Volume   189248256.12 24048300.00 3.37297e+09 3.348921e+09 2.17     6.64
## AAPL.Adjusted        26.71        2.37 1.95930e+02 1.935600e+02 1.33     0.33
##                       se
## Date                  NA
## AAPL.Open           0.82
## AAPL.High           0.83
## AAPL.Low            0.81
## AAPL.Close          0.82
## AAPL.Volume   5681947.50
## AAPL.Adjusted       0.83
  4. Calculating returns and performance metrics: It is often useful to work with returns for financial analysis. Let’s calculate the daily returns and some common performance metrics:
# Determine the log returns 
AAPL$AAPL.Returns = diff(log(AAPL$AAPL.Adjusted))
AAPL = na.omit(AAPL)
# Common performance metrics
annualized_returns <- Return.annualized(AAPL$AAPL.Returns, scale = 252) # assumes 252 trading days per year
sharpe_ratio <- SharpeRatio(AAPL$AAPL.Returns)
maximum_drawdown <- maxDrawdown(AAPL$AAPL.Returns, invert = FALSE)
  5. Correlation and covariance analysis: Analyzing the correlations and covariances of asset returns can be important for portfolio construction:
#Microsoft financial information
getSymbols("MSFT")
## [1] "MSFT"
MSFT$MSFT.Returns = diff(log(MSFT$MSFT.Adjusted))
MSFT = na.omit(MSFT)

#Google financial information
getSymbols("GOOG")
## [1] "GOOG"
GOOG$GOOG.Returns = diff(log(GOOG$GOOG.Adjusted))
GOOG = na.omit(GOOG)

returns = cbind(AAPL$AAPL.Returns, GOOG$GOOG.Returns, MSFT$MSFT.Returns)
correlation_matrix <- cor(returns)
covariance_matrix <- cov(returns)

covariance_matrix
##              AAPL.Returns GOOG.Returns MSFT.Returns
## AAPL.Returns 0.0004034716 0.0002203686  0.000205222
## GOOG.Returns 0.0002203686 0.0003487534  0.000209437
## MSFT.Returns 0.0002052220 0.0002094370  0.000319617
correlation_matrix
##              AAPL.Returns GOOG.Returns MSFT.Returns
## AAPL.Returns    1.0000000    0.5874677    0.5714820
## GOOG.Returns    0.5874677    1.0000000    0.6273058
## MSFT.Returns    0.5714820    0.6273058    1.0000000
  6. Histogram and density plot: Visualizing the distribution of returns can provide insights into the data:
returns = data.frame(Date=index(returns), coredata(returns))

hist(returns$AAPL.Returns, breaks = 30, main = "Histogram of Returns", xlab = "Returns") #do we see a normal distribution?

plot(density(returns$AAPL.Returns), main = "Density Plot of Returns", xlab = "Returns")

Remember that these are just basic examples to get you started. In practice, you might need to customize your analysis and summaries based on your specific data and research questions in quantitative finance.


3.4 Time-Series Analysis and Visualization

Time series analysis and visualization are essential components of quantitative analysis in R. R offers various libraries and functions that make it easy to handle time series data, perform statistical analysis, and create insightful visualizations. Below is a brief overview of the steps involved in time series analysis and visualization in R.

  1. Load Required Libraries: Before you start, make sure you have the necessary libraries installed. Commonly used packages for time series analysis in R include xts, zoo, and forecast (the ts class itself ships with base R), along with ggplot2 and plotly for visualization.
# Install and load required libraries

#list.of.packages <- c("quantmod", "PerformanceAnalytics","dplyr","psych", "TSPred","xts","zoo","forecast","ggplot2","plotly")
#new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
#if(length(new.packages)) install.packages(new.packages)

library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(psych, warn.conflicts = FALSE, quietly = TRUE)
library(TSPred, warn.conflicts = FALSE, quietly = TRUE)

library(forecast, warn.conflicts = FALSE, quietly = TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
library(plotly, warn.conflicts = FALSE, quietly = TRUE)
  2. Time Series Visualization: Visualize the time series to get a better understanding of the data using different plotting techniques.
# Candlestick plot
getSymbols("AAPL")
## [1] "AAPL"
df <- data.frame(Date=index(AAPL),coredata(AAPL))
df <- tail(df, 90)

fig <- df %>% plot_ly(x = ~Date, type="candlestick",
          open = ~AAPL.Open, close = ~AAPL.Close,
          high = ~AAPL.High, low = ~AAPL.Low) 
fig <- fig %>% layout(title = "Basic Candlestick Chart")

fig
  3. Time Series Decomposition: Time series data often exhibit trend, seasonality, and random fluctuations. Decomposing the time series helps identify these components.
# Decompose the time series
x <- c(-50, 175, 149, 214, 247, 237, 225, 329, 729, 809,
       530, 489, 540, 457, 195, 176, 337, 239, 128, 102, 232, 429, 3,
       98, 43, -141, -77, -13, 125, 361, -45, 184)
x <- ts(x, start = c(1951, 1), end = c(1958, 4), frequency = 4)
decomposed <- decompose(x)

# Plot decomposition components
plot(decomposed)

  4. Time Series Forecasting: You can use forecasting models to predict future values based on historical data. The forecast library provides many useful functions for this purpose.
# Fit a forecasting model (e.g., ARIMA)
model <- auto.arima(x)

# Forecast future values
forecast_values <- forecast(model, h = 12) # Forecasting 12 steps ahead
  5. Visualizing Forecasts: Visualize the forecasted values along with the historical data to understand the model’s predictions.
# Plot the forecasted values
plot(forecast_values, main = "Forecasted Values", xlab = "Time", ylab = "Value", col = "red")

# Combine historical data and forecast
combined_data <- c(x, forecast_values$mean)
plot(combined_data, main = "Historical Data and Forecast", xlab = "Time", ylab = "Value", col = c(rep("blue", length(x)), rep("red", length(forecast_values$mean))))

These are the basic steps for time series analysis and visualization in R. Depending on your specific analysis goals, you may need to explore additional techniques and models. R provides a wide range of tools and packages to handle more advanced time series tasks as well.


4 Week 4: Financial Modeling

This week, we’ll work on the fundamentals of finance and on understanding market performance. To best understand the market, we’ll build a series of financial models and observe which models fit the data best.

4.1 Time-Value of Money

The time value of money is a financial principle stating that the value of money changes over time due to factors like inflation, interest rates, and the potential to earn returns on investment. It suggests that a sum of money available today is worth more than the same amount in the future because of its potential earning capacity or investment opportunity.

This concept is crucial in various financial decisions, like evaluating investments, loans, and understanding the significance of receiving money today versus receiving the same amount in the future. It’s the backbone of many financial calculations, including present value, future value, annuities, and determining rates of return.

4.2 Present- and Future-Value

4.2.1 Present Value

Present Value (PV) represents the current worth of a future sum of money or a series of cash flows, discounted at a specified rate to reflect their value in today’s terms.

The basic formula for present value is:

\[ PV = \frac{FV}{(1 + r)^n} \]

  • \(PV\) = Present Value
  • \(FV\) = Future Value
  • \(r\) = Discount rate (interest rate)
  • \(n\) = Number of time periods

4.2.1.1 Principles:

  • Discounting: PV calculation involves discounting future cash flows back to the present at a specified discount rate. The rationale is that money available today is worth more than the same amount in the future due to its potential earning capacity.
  • Time Factor: The longer the time until receipt of the future amount, the lower its present value due to the effects of compounding or discounting over time.
  • Inverse Relationship: There’s an inverse relationship between discount rate and present value; as the discount rate increases, the present value decreases.

4.2.2 Future Value:

Future Value represents the value of an investment at a specific date in the future, given a specified rate of return.

The basic formula for future value is:

\[ FV = PV \times (1 + r)^n \]

  • \(FV\) = Future Value
  • \(PV\) = Present Value
  • \(r\) = Interest rate or rate of return
  • \(n\) = Number of time periods
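
These two formulas translate directly into R; the numbers below are assumed example values:

r <- 0.08   # annual discount/interest rate (assumed)
n <- 5      # number of years (assumed)
FV <- 1000  # a cash flow of 1000 received in 5 years (assumed)

# Present value of the future cash flow
PV <- FV / (1 + r)^n
PV

# Future value of investing that PV today at the same rate
PV * (1 + r)^n   # recovers 1000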

4.3 Compounding Frequency

When interest is compounded more frequently than annually, such as monthly, it affects the calculation of both future value and present value (PV). Compounding more frequently leads to a higher effective interest rate and influences the growth or discounting of the investment or cash flow. This process is known as compound interest.

4.3.1 Monthly Compounding Example

4.3.1.1 Future Value Calculation

The formula for FV with monthly compounding is:

\[ FV = PV \times \left(1 + \frac{r}{n}\right)^{nt} \]

  • \(FV\) = Future Value
  • \(PV\) = Present Value
  • \(r\) = Nominal annual interest rate
  • \(n\) = Number of compounding periods per year (monthly compounding = 12)
  • \(t\) = Number of years

With monthly compounding, the nominal annual interest rate (\(r\)) needs to be divided by the number of compounding periods per year (\(n\)). The total number of compounding periods (\(nt\)) is calculated by multiplying the number of years (\(t\)) by the number of compounding periods per year (\(n\)).

4.3.1.2 Present Value (PV) Calculation:

The formula for PV with monthly compounding involves discounting future cash flows back to the present:

\[ PV = \frac{FV}{\left(1 + \frac{r}{n}\right)^{nt}} \]

The key difference is incorporating the adjusted interest rate (\(r/n\)) and the total number of compounding periods (\(nt\)).
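
A short R sketch (with assumed example values) shows the effect of monthly compounding and the resulting effective annual rate:

PV <- 1000   # initial investment (assumed)
r  <- 0.08   # nominal annual rate (assumed)
n  <- 12     # compounding periods per year (monthly)
t  <- 5      # years

# Future value with monthly compounding
FV_monthly <- PV * (1 + r / n)^(n * t)

# Future value with annual compounding, for comparison
FV_annual <- PV * (1 + r)^t

# Effective annual rate implied by monthly compounding
EAR <- (1 + r / n)^n - 1

c(FV_monthly = FV_monthly, FV_annual = FV_annual, EAR = EAR)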

4.3.2 Impact of Compounding

  • Higher Effective Interest Rate: More frequent compounding increases the effective interest rate. For the same nominal annual rate, the effective rate will be higher when compounded monthly compared to annually.
  • Accelerated Growth: Investments or savings grow faster due to more frequent compounding, as the interest is added more often.
  • Greater Accumulation: More compounding periods mean a higher future value and a lower present value for the same cash flows or investments.

4.4 The Power of \(e\)

The natural constant, denoted by the letter ‘e’, is a fundamental mathematical constant approximately equal to 2.71828. ‘e’ is a critical element when discussing exponential growth or decay, and it plays a significant role in compound interest calculations, particularly when compounding occurs continuously.

4.4.1 Continuous Compounding using ‘e’:

In finance, continuous compounding assumes that interest is compounded an infinite number of times over a continuous period. The formula for calculating future value with continuous compounding is:

\[ FV = PV \times e^{rt} \]

  • \(FV\) = Future Value
  • \(PV\) = Present Value
  • \(r\) = Annual interest rate (expressed as a decimal)
  • \(t\) = Time in years

4.4.2 Key Points about ‘e’ in Continuous Compounding

  • Exponential Growth: The constant ‘e’ is the base for exponential growth, reflecting the continuous growth of an investment over time.
  • Instantaneous Compounding: Continuous compounding assumes that interest is added instantaneously, resulting in the highest effective interest rate.
  • Limit of Compounding: As the number of compounding periods per year approaches infinity, the discrete formula \(FV = PV \times \left(1 + \frac{r}{n}\right)^{nt}\) converges to the continuous-compounding value \(FV = PV \times e^{rt}\).

4.4.3 Relationship with Compound Interest Formulas

  • ‘e’ appears in the exponent of the continuous compounding formula, \(FV = PV \times e^{rt}\), showcasing the natural growth of investments over time.
  • For practical purposes, when compounding happens frequently but not continuously (e.g., monthly or quarterly), the formula involving ‘e’ serves as a theoretical limit of growth.

Now, consider the following: \[FV=PV \times e^{rt} \] \[ \begin{align} &\therefore \frac {FV}{PV}= e^{rt}\\ &\therefore ln(\frac {FV}{PV}) = ln(e^{rt})\\ &\therefore ln(\frac {FV}{PV}) = rt\\ &\therefore rt = ln(FV) - ln(PV)\\ \end{align} \]

It is conventional to consider the time factor in the following manner: \[ ln(\frac {FV_t}{PV}) = rt, \] or \[ R(t) = ln(FV_t) - ln(PV) \]

You might recognize this formula from earlier texts. Moving forward, we will take the rate of return, \(R(t)\), to be the difference of the log prices of the same stock at two points in time (equivalently, the log of the price ratio).
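
Both the convergence to continuous compounding and the log-return identity above can be checked numerically; the values below are assumed for illustration:

PV <- 100
r  <- 0.10
t  <- 1

# Discrete compounding approaches the continuous limit as n grows
for (n in c(1, 12, 365, 1e6)) {
  print(PV * (1 + r / n)^(n * t))
}
PV * exp(r * t)   # continuous compounding: the limiting value

# Log-return identity: R(t) = log(FV_t) - log(PV)
FV_t <- PV * exp(r * t)
log(FV_t) - log(PV)   # equals r * t = 0.10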

4.5 Time-Series Modeling

Time-series modeling in R for financial data is a powerful way to analyze and forecast trends, patterns, and behaviors in stock prices, market indices, or other financial metrics over time.

4.5.1 Types of Time-Series Models

Time-series models are diverse, offering various approaches to capture and predict patterns within sequential data. The common types include:

4.5.1.1 Autoregressive (AR) Model:

  • AR(p): Uses past observations from the same time series to predict future values. The model assumes the future value depends linearly on its own previous values.
  • Assumption: The future value of the series is a linear combination of past values.
  • Stationarity: Assumes the series is stationary (constant mean and variance).

4.5.1.2 Moving Average (MA) Model:

  • MA(q): Utilizes the past forecast errors to predict future values. It focuses on the relationship between the observed series and a lagged forecast error.
  • Assumption: Future values are a linear combination of past forecast errors.
  • Stationarity: Assumes the series is stationary.

4.5.1.3 Autoregressive Integrated Moving Average (ARIMA) Model:

  • ARIMA(p, d, q): Combines AR and MA components with differencing to handle non-stationary data.
    • p: Autoregressive order.
    • d: Degree of differencing.
    • q: Moving average order.
  • Assumption: Combines autoregression, differencing, and moving average components to handle non-stationary data.
  • Stationarity: Requires differencing to achieve stationarity.

4.5.1.4 Seasonal ARIMA (SARIMA) Model:

  • Incorporates seasonality into the ARIMA model by adding seasonal terms (P, D, Q) with seasonal period \(s\) to account for periodic fluctuations in the data.
  • Assumption: Extends ARIMA to incorporate seasonality.
  • Stationarity: Handles seasonality through differencing at seasonal intervals.

4.5.1.5 Vector Autoregression (VAR) Model:

  • Handles multiple time series simultaneously by modeling each variable as a linear combination of its past values and past values of other variables.
  • Assumption: Each variable depends linearly on its own past values and on the past values of the other variables in the system.
  • Stationarity: Similar to VAR, assumes stationarity of involved variables.

4.5.1.6 Vector Autoregression Moving-Average (VARMA) Model:

  • Extends VAR by incorporating a moving average component.
  • Assumption: Extends VAR by incorporating moving average components.
  • Stationarity: Similar to VAR, assumes stationarity of involved variables.

4.5.1.7 Seasonal VAR (SVAR) Model:

  • Accounts for seasonal patterns in multivariate time series data.
  • Assumption: Accounts for seasonality in a multivariate system.
  • Stationarity: Addresses seasonality within the multivariate framework.

4.5.1.8 Exponential Smoothing Methods:

  • Simple Exponential Smoothing: Gives more weight to recent observations while forecasting.
  • Double Exponential Smoothing (Holt’s method): Considers trends along with levels.
  • Triple Exponential Smoothing (Holt-Winters method): Accounts for seasonality in addition to trend and level.
  • Assumption: Assigns different weights to observations, with more recent observations having more influence.
  • Stationarity: Less reliant on stationarity assumptions compared to ARIMA models.

4.5.1.9 GARCH (Generalized Autoregressive Conditional Heteroskedasticity) Model:

  • Models volatility clustering and time-varying volatility in financial time series, useful for forecasting volatility.
  • Assumption: Captures time-varying volatility in financial data.
  • Stationarity: Focuses on modeling volatility rather than the entire time series.

4.5.1.10 State Space Models:

  • Allows for more complex dynamics and is used when the underlying process is not explicitly known but can be represented by a set of unobserved states.
  • Assumption: Represents the underlying process through unobserved states and observed outputs.
  • Stationarity: The model structure may accommodate non-stationarity through state evolution equations.

4.5.1.11 Machine Learning-Based Models:

  • LSTM (Long Short-Term Memory): A type of recurrent neural network often used for sequence prediction.
  • Random Forests, Gradient Boosting, etc.: Traditional machine learning algorithms adapted to handle time-series data.
  • Assumption: Depends on the specific algorithm but often assumes patterns and dependencies in data.
  • Stationarity: Some ML models can handle non-stationary data without explicit stationarity assumptions.

Each model has its strengths and is suited to different types of time-series data. Choosing the appropriate model involves understanding the data’s characteristics, such as trend, seasonality, stationarity, and the nature of dependencies within the series. Model selection may involve experimentation and testing to determine the best fit for a particular dataset or forecasting task.

In the interest of time, we will consider the following models: 1. exponential smoothing (Simple Exponential Smoothing), 2. the ARIMA model, and 3. various GARCH models.

library(rugarch, warn.conflicts = FALSE, quietly = TRUE)
library(forecast, warn.conflicts = FALSE, quietly = TRUE)
library(tseries, warn.conflicts = FALSE, quietly = TRUE)
library(tidyquant, warn.conflicts = FALSE, quietly = TRUE)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(quantmod, warn.conflicts = FALSE, quietly = TRUE)
library(tidyverse, warn.conflicts = FALSE, quietly = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.0
## ✔ purrr   1.0.1     ✔ tibble  3.2.1
## ✔ readr   2.1.4     ✔ tidyr   1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%()     masks ggplot2::%+%()
## ✖ psych::alpha()   masks ggplot2::alpha()
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::first()   masks xts::first()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::last()    masks xts::last()
## ✖ purrr::reduce()  masks rugarch::reduce()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(PerformanceAnalytics, warn.conflicts = FALSE, quietly = TRUE)
library(tseries, warn.conflicts = FALSE, quietly = TRUE)

For our models, we’ll be working with the Johannesburg-listed Naspers (NPN.JO) data from 2008 to 2019.

4.5.2 Exponential Smoothing

Exponential smoothing methods are a class of forecasting techniques used in time-series analysis to predict future values based on weighted averages of past observations, with more recent observations receiving greater weight. These methods are particularly useful for data with trend and seasonality. For our scenarios, we will consider Simple Exponential Smoothing (SES); for further reading, I’ve also included Double Exponential Smoothing and Triple Exponential Smoothing (Holt-Winters).

4.5.2.1 Types of Exponential Smoothing

  1. Simple Exponential Smoothing (SES):
    • Idea: Assigns exponentially decreasing weights to past observations.
    • Calculation: Forecasts are based on a weighted average of past observations, with the weights exponentially decreasing as observations get older.
    • Formula for Forecast: \[ F_{t+1} = \alpha \times Y_t + (1 - \alpha) \times F_t \] where \(F_{t+1}\) is the forecast for the next period, \(Y_t\) is the actual observation at time \(t\), \(F_t\) is the forecast for the current period, and \(\alpha\) is the smoothing parameter (0 < \(\alpha\) < 1).
  2. Double Exponential Smoothing (Holt’s Method):
    • Idea: Incorporates trend by considering both level and trend components.
    • Calculation: In addition to SES, it estimates the trend by considering the change between successive observations.
    • Formulas for Level (\(L_t\)) and Trend (\(T_t\)) Components: \[ L_t = \alpha \times Y_t + (1 - \alpha) \times (L_{t-1} + T_{t-1}) \] \[ T_t = \beta \times (L_t - L_{t-1}) + (1 - \beta) \times T_{t-1} \] where \(L_t\) represents the level component, \(T_t\) represents the trend component, \(Y_t\) is the actual observation, and \(\alpha\) and \(\beta\) are smoothing parameters (0 < \(\alpha, \beta\) < 1).
  3. Triple Exponential Smoothing (Holt-Winters Method):
    • Idea: Considers seasonality along with level and trend components.
    • Calculation: Adds a seasonal component to Holt’s method to handle data with seasonality.
    • Formulas for Level (\(L_t\)), Trend (\(T_t\)), and Seasonal (\(S_t\)) Components: \[ L_t = \alpha \times \left( \frac{Y_t}{S_{t-m}} \right) + (1 - \alpha) \times (L_{t-1} + T_{t-1}) \] \[ T_t = \beta \times (L_t - L_{t-1}) + (1 - \beta) \times T_{t-1} \] \[ S_t = \gamma \times \left( \frac{Y_t}{L_t} \right) + (1 - \gamma) \times S_{t-m} \] where \(L_t\) represents the level component, \(T_t\) represents the trend component, \(S_t\) represents the seasonal component, \(Y_t\) is the actual observation, and \(\alpha, \beta, \gamma\) are smoothing parameters (0 < \(\alpha, \beta, \gamma\) < 1), and \(m\) is the length of the seasonal cycle.
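
For further reading, the Holt and Holt-Winters variants are implemented in the forecast package as holt() and hw(). The minimal sketch below uses the built-in AirPassengers series purely because it has a clear trend and seasonality:

library(forecast)

# Double exponential smoothing (Holt's method): level + trend
holt_fit <- holt(AirPassengers, h = 24)

# Triple exponential smoothing (Holt-Winters): level + trend + seasonality
hw_fit <- hw(AirPassengers, seasonal = "multiplicative", h = 24)

plot(hw_fit, main = "Holt-Winters Forecast of AirPassengers")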

4.5.2.2 SES Model

Consider the model fit and forecast plot of the SES model below.

getSymbols("NPN.JO", from = "2008-01-01", to = "2019-12-31")
## [1] "NPN.JO"
Naspers = ts(NPN.JO$NPN.JO.Adjusted, frequency = 21) # group the daily data into months


train.data = window(Naspers, end = c(120,19)) #training data for 10 years
test.data = window(Naspers, start = c(120,20)) #test data for 1 year

brown = ses(train.data, h = 566)

plot(brown, main = "Brown Exponential smoothing", ylab = "Price", xlab = "Month")
lines(test.data, lty = 3)

Exponential smoothing methods are relatively simple yet effective for forecasting time-series data, especially when the data exhibit trend, seasonality, or both. It is worth exploring whether the more complex exponential smoothing models fit better, but it is clear that the SES model fails to capture the actual prices over the forecast horizon.

4.5.3 ARIMA Model

The Autoregressive Integrated Moving Average (ARIMA) model is a widely used time-series forecasting method that combines autoregression (AR), differencing (I for integrated), and moving average (MA) components to handle different types of time-series data. It’s a versatile model suitable for both stationary and non-stationary data exhibiting trends and seasonality.

4.5.3.1 Components of ARIMA:

  1. Autoregression (AR):
    • AR(p): Represents the relationship between the current observation and its previous observations.
    • Formula: \(Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p} + \varepsilon_t\)
    • The \(Y_{t-i}\) terms represent past observations, \(c\) is a constant, \(\phi_i\) are the autoregressive coefficients, and \(\varepsilon_t\) is the error term.
  2. Differencing (I for Integrated):
    • Differencing (d): Addresses non-stationarity by subtracting the current observation from the previous observation.
    • Formula (First-order differencing): \(Y'_t = Y_t - Y_{t-1}\)
    • Applied iteratively if more differencing is needed to achieve stationarity.
  3. Moving Average (MA):
    • MA(q): Models the relationship between the current observation and past forecast errors.
    • Formula: \(Y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + ... + \theta_q \varepsilon_{t-q}\)
    • The \(\varepsilon_{t-i}\) terms represent past forecast errors, \(c\) is a constant, \(\theta_i\) are the moving average coefficients, and \(\varepsilon_t\) is the current error term.

4.5.3.2 ARIMA Notation: ARIMA(p, d, q)

  • p: Number of autoregressive terms (AR order).
  • d: Number of differences needed for stationarity.
  • q: Number of moving average terms (MA order).

4.5.3.3 Steps to Build an ARIMA Model:

  1. Identify Stationarity: Check for trends and seasonality; apply differencing if needed to achieve stationarity.
  2. Determine \(p\), \(d\), and \(q\): Use ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots to estimate model parameters.
  3. Fit ARIMA Model: Use the identified \(p\), \(d\), and \(q\) values to fit the ARIMA model to the data.
  4. Model Diagnostics: Assess the model’s goodness-of-fit using residuals analysis, AIC (Akaike Information Criterion), and other diagnostic measures.
  5. Forecasting: Use the fitted model to make future predictions.

4.5.3.4 Example of ARIMA

getSymbols("NPN.JO", from = "2010-01-01", to = "2019-12-31")
## [1] "NPN.JO"
returns = RETURN(NPN.JO$NPN.JO.Adjusted)
returns = na.omit(returns)
adf.test(returns) 
## Warning in adf.test(returns): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  returns
## Dickey-Fuller = -14.104, Lag order = 13, p-value = 0.01
## alternative hypothesis: stationary
#If the p-value is less than 0.05, we reject the null hypothesis of a unit root, i.e., we conclude that the returns series is stationary
auto.arima(returns) #determines the optimal number of parameters for returns, in this case ARIMA(1,0,1)
## Series: returns 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1      ma1    mean
##       0.8451  -0.8837  0.0012
## s.e.  0.1281   0.1135  0.0003
## 
## sigma^2 = 0.0003879:  log likelihood = 6470.9
## AIC=-12933.79   AICc=-12933.78   BIC=-12910.37
ARIMAModel = arima(returns["2008::2018"], c(1,0,1)) # based on the auto.arima function
ARIMAModel
## 
## Call:
## arima(x = returns["2008::2018"], order = c(1, 0, 1))
## 
## Coefficients:
##          ar1      ma1  intercept
##       0.7840  -0.8321     0.0012
## s.e.  0.0954   0.0856     0.0003
## 
## sigma^2 estimated as 0.000397:  log likelihood = 5820.22,  aic = -11632.44
#Fitted ARIMA(1,0,1): returns_t = 0.0012 + 0.7840*returns_(t-1) - 0.8321*e_(t-1) + e_t

############### Diagnostic Checks
#Residual requirements:
  #There should be no correlation in the residuals
  #Residuals should have zero mean
  #Residuals should be normally distributed

et = residuals(ARIMAModel)
acf(et) ##is there correlation between the residuals?

plot.ts(et) #Test for zero mean

chart.Histogram(et,
                methods = c("add.density", "add.normal"),
                colorset = c("blue", "green", "red"))

There are three conditions to check when fitting an ARIMA model:

  1. The residuals must be uncorrelated (white noise), i.e., the lags must show no significant autocorrelation. When plotting the autocorrelation function (ACF) of the residuals, we observe that several lags do show correlation.
  2. The residuals must have a constant zero mean. This condition seems to be satisfied when observing the residuals plot.
  3. The residuals must follow a normal distribution. Looking at the histogram of the residuals, overlaid with a normal distribution, we can see that the empirical data has a bell-shaped curve; however, it does not align closely with the normal distribution. In fact, one can argue that the empirical distribution is left-skewed and has a higher peak.

4.5.3.5 ARIMA Forecasts

forecast_ARIMA = forecast(ARIMAModel,300)

returns.df = as.data.frame(returns)

plot(forecast_ARIMA, xlab = "Time", ylab = "Returns",
      main = "ARIMA Forecast vs Actual Data")
lines(returns.df, col = "blue")

Although ARIMA models are effective for modeling and forecasting time-series data with trend and seasonality, they require careful tuning of parameters and diagnostic checks to ensure model adequacy and accuracy in capturing the underlying patterns in the data. For our demonstration, the model does not seem to satisfy the assumptions of an ARIMA model. This is more evident when comparing the forecasted values against the actual values: the forecast fails to capture 95% of the returns within the given year.

4.5.4 GARCH Model

Let’s consider the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. The GARCH model is a statistical model used to describe the variance in a time series, particularly in financial markets. It’s employed to capture the volatility clustering often observed in financial data, where periods of high volatility tend to be followed by more high-volatility periods and vice versa.

The GARCH model is composed of the following (a short simulation after this list makes the variance recursion concrete):

  1. Autoregressive (AR) Component: The AR component in a GARCH model signifies that the variance or volatility at a particular time point is dependent on previous variance values. It captures the persistence of volatility changes.

  2. Conditional Heteroskedasticity: The “heteroskedasticity” part refers to the changing variance over time. In financial markets, volatility isn’t constant; it fluctuates. GARCH models accommodate this by allowing volatility to vary over time based on past observations.

  3. Generalized Structure: The “generalized” aspect means it’s an extension of ARCH (Autoregressive Conditional Heteroskedasticity) models. GARCH models incorporate not only past squared errors (like in ARCH) but also past variance values to forecast future variance.
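
To make the variance recursion concrete, here is a minimal hand-rolled GARCH(1,1) simulation in base R. The parameter values (omega, alpha, beta) are illustrative assumptions, not estimates; the point is that large shocks feed into the next period's variance, producing the calm and turbulent spells described above.

set.seed(1)
n = 1000
omega = 1e-5; alpha = 0.08; beta = 0.90          # illustrative GARCH(1,1) parameters
sig2 = numeric(n); eps = numeric(n)
sig2[1] = omega / (1 - alpha - beta)             # start at the unconditional variance
eps[1]  = sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] = omega + alpha * eps[t - 1]^2 + beta * sig2[t - 1]   # variance recursion
  eps[t]  = sqrt(sig2[t]) * rnorm(1)                            # shock drawn with that variance
}
plot.ts(eps, main = "Simulated GARCH(1,1) returns: volatility clustering")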

One way to demonstrate that financial data does exhibit heteroskedasticity is to plot the rolling volatility of the asset’s price over time. Using the JSE-listed Naspers Limited (NPN.JO), we’ll demonstrate that the volatility of a financial instrument is unlikely to be homoskedastic.

getSymbols("NPN.JO", from = "2008-01-01", to = "2020-12-31")
## [1] "NPN.JO"
Returns = RETURN(NPN.JO$NPN.JO.Adjusted)
Returns = na.omit(Returns)

chart.RollingPerformance(R = Returns["2008::2019"],
                         width = 252,
                         FUN = "sd.annualized",
                         scale = 252, main = "Naspers' Yearly Rolling Volatility")

The above, together with the non-constant mean demonstrated earlier, suggests that Naspers’ returns cannot be optimally modeled with the usual ARIMA model. Now consider the Standard GARCH (sGARCH) model with a constant, time-invariant mean. Using the same Naspers data, we’ll specify the mean model, the variance model, and the error distribution.

#Considering an sGARCH model for NASPERS

spec <- ugarchspec(variance.model = list(model = "sGARCH"),   # standard GARCH(1,1) variance
                   mean.model = list(armaOrder = c(0, 0)),    # constant mean, no ARMA terms
                   distribution.model = "norm")               # normally distributed errors

fit <- ugarchfit(spec, data = Returns)                        # fit the model to the returns

From theory, we know that the return and conditional variance are given as follows:

\[ R_t = \mu + \epsilon_t \] and, \[\sigma^2_t = \omega +\alpha_1\epsilon^2_{t-1} + \beta_1\sigma^2_{t - 1}\]

Hence the Model equation is as follows:

fit
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001577    0.000326   4.8445    1e-06
## omega   0.000014    0.000001  15.1329    0e+00
## alpha1  0.070523    0.004961  14.2150    0e+00
## beta1   0.899076    0.007176 125.2843    0e+00
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001577    0.000299   5.2805        0
## omega   0.000014    0.000001  10.4756        0
## alpha1  0.070523    0.006065  11.6278        0
## beta1   0.899076    0.008665 103.7652        0
## 
## LogLikelihood : 8289.683 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -4.9689
## Bayes        -4.9616
## Shibata      -4.9689
## Hannan-Quinn -4.9663
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      1.849 0.17395
## Lag[2*(p+q)+(p+q)-1][2]     5.451 0.03083
## Lag[4*(p+q)+(p+q)-1][5]     8.423 0.02311
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      6.235 0.01253
## Lag[2*(p+q)+(p+q)-1][5]     6.626 0.06383
## Lag[4*(p+q)+(p+q)-1][9]    11.276 0.02640
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.06551 0.500 2.000  0.7980
## ARCH Lag[5]   0.44958 1.440 1.667  0.8984
## ARCH Lag[7]   5.93486 2.315 1.543  0.1457
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  35.496
## Individual Statistics:              
## mu     0.02616
## omega  3.15912
## alpha1 0.09933
## beta1  0.10680
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.07 1.24 1.6
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value     prob sig
## Sign Bias           0.7248 0.468653    
## Negative Sign Bias  2.9495 0.003205 ***
## Positive Sign Bias  1.0163 0.309560    
## Joint Effect       15.8782 0.001201 ***
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     148.4    4.411e-22
## 2    30     144.0    3.385e-17
## 3    40     233.2    1.716e-29
## 4    50     280.9    2.766e-34
## 
## 
## Elapsed time : 1.208936

Substituting the estimated parameters, we have:

\[ R_t = \mu + \epsilon_t \] \[\implies R_t = 0.001577 + \epsilon_t \]

whilst, \[\sigma^2_t = \omega +\alpha_1\epsilon^2_{t-1} + \beta_1\sigma^2_{t - 1}\] \[\implies \sigma^2_t = 0.000014 + 0.070523\epsilon^2_{t-1} + 0.899076\sigma^2_{t-1}\]

4.5.4.1 Model Diagnostics

Note that the parameter estimates of our model all have p-values less than 0.05; the parameters are therefore statistically significant at the 95% confidence level. To compare this model against others, we also use the following:

  1. Information Criteria: the model with the lowest information criteria ranks as the best-fitting model.
  2. Ljung-Box Test: this tests for serial correlation in the standardized residuals, with the null hypothesis that there is no serial correlation. For this model the lag-1 p-value (0.174) is greater than 0.05, so we do not reject the null, i.e., we treat the standardized residuals as serially uncorrelated (the longer-lag statistics are borderline).
  3. The Twelve-Plot Analysis: the rugarch package can summarise twelve key diagnostic plots for the fitted sGARCH model. We produce them as follows:
plot(fit, which = "all")
## 
## please wait...calculating quantiles...

plot(fit, which = 2)
## 
## please wait...calculating quantiles...
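
If you prefer a numeric cross-check of the Ljung-Box section of the summary, the standardized residuals can be extracted and tested directly. A minimal sketch, assuming the fit object from above and rugarch’s residuals() method with standardize = TRUE:

z = as.numeric(residuals(fit, standardize = TRUE))   # standardized residuals from the sGARCH fit

Box.test(z,   lag = 10, type = "Ljung-Box")          # H0: no serial correlation in the residuals
Box.test(z^2, lag = 10, type = "Ljung-Box")          # H0: no remaining ARCH effects in the squares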

You can obtain forecasts of the mean and of the volatility in the following manner:

forecasts = ugarchforecast(fitORspec = fit,
                           n.ahead = 20)  #generate 20 data points in the future

plot(fitted(forecasts)) #plot the mean, remember we assumed constant mean

plot(sigma(forecasts)) #plot the forecasted volatility
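
The sigma values above are daily volatilities. If you want them on the annualized scale used in the earlier rolling-volatility chart, a quick conversion (assuming 252 trading days per year) is:

annVol = sqrt(252) * sigma(forecasts)                 # annualize the daily volatility forecasts
plot(as.numeric(annVol), type = "l",
     xlab = "Days ahead", ylab = "Annualized volatility")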

4.5.5 Types of GARCH Model

In this section, we will tweak a few assumptions of our GARCH model and determine whether the added complexity is justified; in other words, whether each adjustment improves the model or merely complicates it.

We will refer to the adjusted models as follows:

  1. Model 2: GARCH with a skewed Student’s t-distribution
  2. Model 3: GJR-GARCH
  3. Model 4: AR(1) GJR-GARCH
  4. Model 5: GJR-GARCH in Mean

4.5.5.1 SSTD GARCH

Note that the code below changes only the distribution.model argument, from “norm” to “sstd”.

  spec <- ugarchspec(variance.model = list(model = "sGARCH"),
                   mean.model = list(armaOrder = c(0, 0)), 
                   distribution.model = "sstd")

fitSSTD <- ugarchfit(spec, data = Returns)
fitSSTD
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001533    0.000325   4.7148    2e-06
## omega   0.000014    0.000002   7.5490    0e+00
## alpha1  0.082643    0.004134  19.9912    0e+00
## beta1   0.889553    0.010003  88.9249    0e+00
## skew    0.986921    0.022308  44.2401    0e+00
## shape   6.901228    0.713409   9.6736    0e+00
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001533    0.000302   5.0792        0
## omega   0.000014    0.000003   5.3097        0
## alpha1  0.082643    0.008711   9.4867        0
## beta1   0.889553    0.009885  89.9922        0
## skew    0.986921    0.021205  46.5414        0
## shape   6.901228    0.966134   7.1431        0
## 
## LogLikelihood : 8344.153 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.0004
## Bayes        -4.9894
## Shibata      -5.0004
## Hannan-Quinn -4.9965
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      1.697 0.19267
## Lag[2*(p+q)+(p+q)-1][2]     5.231 0.03527
## Lag[4*(p+q)+(p+q)-1][5]     8.084 0.02808
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.102 0.04283
## Lag[2*(p+q)+(p+q)-1][5]     4.417 0.20645
## Lag[4*(p+q)+(p+q)-1][9]     9.737 0.05706
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.0124 0.500 2.000 0.91134
## ARCH Lag[5]    0.6273 1.440 1.667 0.84551
## ARCH Lag[7]    6.9377 2.315 1.543 0.08957
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  24.657
## Individual Statistics:              
## mu     0.02368
## omega  3.04717
## alpha1 0.31264
## beta1  0.24096
## skew   0.13179
## shape  0.91072
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value     prob sig
## Sign Bias           0.8848 0.376332    
## Negative Sign Bias  2.3442 0.019128  **
## Positive Sign Bias  0.6619 0.508084    
## Joint Effect       12.4548 0.005977 ***
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     86.26    1.515e-10
## 2    30    115.34    2.994e-12
## 3    40    186.69    3.622e-21
## 4    50    159.29    1.370e-13
## 
## 
## Elapsed time : 0.8552871
plot(fitSSTD, which = "all")
## 
## please wait...calculating quantiles...

Note three notable differences: (1) the individual statistics now also include the skew and shape parameters; (2) the Adjusted Pearson Goodness-of-Fit statistics are substantially smaller (although the p-values remain below 0.05); and (3) the QQ-plot’s empirical quantiles align more closely with the theoretical quantiles. Based on these three points, this adjustment gives a better model for the data.

4.5.5.2 GJR-GARCH

The GJR-GARCH model, named after Glosten-Jagannathan-Runkle, also known as the Threshold GARCH model, is an extension of the basic GARCH model that incorporates asymmetry in volatility responses to shocks.

In a GJR-GARCH model:

  1. Threshold Effect: It introduces an additional parameter to account for asymmetry. This parameter captures the idea that volatility responses might be different for positive and negative shocks or that the impact of shocks might vary depending on certain conditions or thresholds.

  2. Asymmetric Volatility: The model allows for different responses to positive and negative shocks. For example, it might imply that negative shocks have a larger impact on volatility than positive shocks, capturing the observed tendency of volatility to increase more after negative events.

The GJR-GARCH model is expressed as:

\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 I_{t-1<0} + \beta \sigma_{t-1}^2 \]

Where:

  - \(\sigma_t^2\) is the conditional variance at time \(t\).
  - \(\omega\) is the intercept.
  - \(\alpha\), \(\beta\), and \(\gamma\) are parameters.
  - \(\varepsilon_{t-1}^2\) is the squared error term at time \(t-1\).
  - \(I_{t-1<0}\) is an indicator function that takes the value 1 when \(\varepsilon_{t-1}\) is negative and 0 otherwise.

The GJR-GARCH model allows for more nuanced modeling of volatility dynamics by considering asymmetry in how shocks affect volatility. This asymmetry can be particularly relevant in financial markets, where negative events may have different effects on volatility compared to positive events.

Let’s get into the code. Note that, relative to Model 2, the only adjustment is updating the variance.model from "sGARCH" to "gjrGARCH".

  spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
                   mean.model = list(armaOrder = c(0, 0)), #
                   distribution.model = "sstd")

fitGJR <- ugarchfit(spec, data = Returns)
fitGJR
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001131    0.000321   3.5203 0.000431
## omega   0.000015    0.000002   9.4008 0.000000
## alpha1  0.026258    0.004619   5.6844 0.000000
## beta1   0.891690    0.009269  96.1988 0.000000
## gamma1  0.104451    0.018004   5.8016 0.000000
## skew    0.970917    0.022169  43.7955 0.000000
## shape   7.318456    0.811349   9.0201 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001131    0.000307   3.6883 0.000226
## omega   0.000015    0.000002   7.2755 0.000000
## alpha1  0.026258    0.008971   2.9270 0.003422
## beta1   0.891690    0.008689 102.6194 0.000000
## gamma1  0.104451    0.020464   5.1041 0.000000
## skew    0.970917    0.020781  46.7216 0.000000
## shape   7.318456    0.977729   7.4852 0.000000
## 
## LogLikelihood : 8360.958 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.0099
## Bayes        -4.9970
## Shibata      -5.0099
## Hannan-Quinn -5.0053
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      1.153 0.28285
## Lag[2*(p+q)+(p+q)-1][2]     5.503 0.02985
## Lag[4*(p+q)+(p+q)-1][5]     8.933 0.01721
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.381 0.03633
## Lag[2*(p+q)+(p+q)-1][5]     4.804 0.16943
## Lag[4*(p+q)+(p+q)-1][9]    10.278 0.04372
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3] 0.0000222 0.500 2.000 0.99624
## ARCH Lag[5] 0.6628101 1.440 1.667 0.83475
## ARCH Lag[7] 7.0973268 2.315 1.543 0.08275
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  25.4221
## Individual Statistics:              
## mu     0.04578
## omega  3.37121
## alpha1 0.36338
## beta1  0.30802
## gamma1 0.41537
## skew   0.13935
## shape  0.92961
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.69 1.9 2.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias           1.5172 0.12932    
## Negative Sign Bias  0.8067 0.41989    
## Positive Sign Bias  1.9815 0.04762  **
## Joint Effect        6.1938 0.10255    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     83.21    5.163e-10
## 2    30     70.45    2.627e-05
## 3    40    161.46    7.735e-17
## 4    50    227.74    7.425e-25
## 
## 
## Elapsed time : 1.561519
plot(fitGJR, which = "all")
## 
## please wait...calculating quantiles...

The most notable difference now lies in the twelfth plot, the News Impact Curve. Where the sGARCH curve was symmetric, weighting negative and positive news about the asset equally, the GJR-GARCH curve assigns a larger volatility impact to negative news than to positive news. Note also that the model’s additional parameter, gamma1, is statistically significant, as are all the other parameters.
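
If you want to look at the asymmetry directly rather than hunting for it in the twelve-panel display, rugarch’s newsimpact() helper returns the curve’s coordinates, which can then be plotted for the symmetric and asymmetric fits side by side. A minimal sketch, assuming the fit and fitGJR objects from above:

niSGARCH = newsimpact(object = fit)       # symmetric sGARCH news impact curve
niGJR    = newsimpact(object = fitGJR)    # asymmetric gjrGARCH news impact curve

plot(niGJR$zx, niGJR$zy, type = "l", col = "red", lwd = 2,
     xlab = "Shock", ylab = "Conditional variance", main = "News Impact Curves")
lines(niSGARCH$zx, niSGARCH$zy, col = "blue", lwd = 2)
legend("topleft", legend = c("gjrGARCH", "sGARCH"), col = c("red", "blue"), lwd = 2)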

4.5.5.3 AR(1) GJR-GARCH

An AR(1) GJR-GARCH model expands on the above by combining a first-order autoregressive (AR(1)) model for the mean with the asymmetric GJR-GARCH model for the variance.

Here’s a breakdown:

  1. AR(1): The AR(1) model is an Autoregressive model of order 1. It means the current value of a time series is linearly dependent on its previous value with a lag of 1. Mathematically, it can be represented as \(y_t = c + \phi_1 y_{t-1} + \varepsilon_t\), where \(y_t\) is the value at time \(t\), \(\phi_1\) is the autoregressive coefficient, \(c\) is a constant term, and \(\varepsilon_t\) is the error term.

  2. GJR-GARCH: As explained earlier, the GJR-GARCH model introduces asymmetric responses to volatility shocks, allowing for different effects of positive and negative shocks on volatility.

An AR(1) GJR-GARCH model combines the autoregressive nature of AR(1) with the volatility dynamics of the GJR-GARCH model. The model expresses both the conditional mean (through the AR(1) process) and the conditional variance (through the GJR-GARCH process) of a time series, making it suitable for capturing both the trend and the volatility clustering observed in financial time series data.

The mathematical representation of an AR(1) GJR-GARCH model combines the AR(1) equation with the GJR-GARCH equation, allowing for the joint modeling of the mean and volatility of a time series. The specifics of the equations will involve parameters related to both the AR(1) process and the GJR-GARCH process, incorporating autoregressive dynamics and asymmetry in volatility responses simultaneously.
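
Putting the two pieces together with the notation already used above (where \(z_t\) is a standardized innovation, here skewed Student’s t), the mean and variance equations are:

\[ R_t = \mu + \phi_1 R_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t \]

\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 I_{t-1<0} + \beta \sigma_{t-1}^2 \]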

To fit this model in R, the only change is to the armaOrder argument, as follows:

  spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
                   mean.model = list(armaOrder = c(1, 0)), #
                   distribution.model = "sstd")

fitAR.GJR <- ugarchfit(spec, data = Returns)
fitAR.GJR
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001161    0.000314   3.6921 0.000222
## ar1    -0.027169    0.017633  -1.5408 0.123374
## omega   0.000014    0.000001  10.4157 0.000000
## alpha1  0.026678    0.004905   5.4386 0.000000
## beta1   0.893328    0.009066  98.5374 0.000000
## gamma1  0.100962    0.017702   5.7034 0.000000
## skew    0.971005    0.022146  43.8450 0.000000
## shape   7.285811    0.813388   8.9574 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.001161    0.000304   3.8234 0.000132
## ar1    -0.027169    0.017135  -1.5856 0.112835
## omega   0.000014    0.000002   8.3855 0.000000
## alpha1  0.026678    0.008338   3.1995 0.001377
## beta1   0.893328    0.008340 107.1094 0.000000
## gamma1  0.100962    0.020003   5.0474 0.000000
## skew    0.971005    0.020914  46.4291 0.000000
## shape   7.285811    0.954831   7.6305 0.000000
## 
## LogLikelihood : 8362.142 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.0100
## Bayes        -4.9953
## Shibata      -5.0100
## Hannan-Quinn -5.0047
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                    0.08841 0.766213
## Lag[2*(p+q)+(p+q)-1][2]   4.61768 0.001068
## Lag[4*(p+q)+(p+q)-1][5]   8.28579 0.005842
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.025 0.04482
## Lag[2*(p+q)+(p+q)-1][5]     4.411 0.20709
## Lag[4*(p+q)+(p+q)-1][9]     9.797 0.05543
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]  0.005517 0.500 2.000  0.9408
## ARCH Lag[5]  0.646406 1.440 1.667  0.8397
## ARCH Lag[7]  6.980286 2.315 1.543  0.0877
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  27.7792
## Individual Statistics:             
## mu     0.0457
## ar1    0.1590
## omega  3.7331
## alpha1 0.3638
## beta1  0.3084
## gamma1 0.4103
## skew   0.1353
## shape  0.9244
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.89 2.11 2.59
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.3676 0.1715    
## Negative Sign Bias  0.8703 0.3842    
## Positive Sign Bias  1.8521 0.0641   *
## Joint Effect        5.6012 0.1327    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20    111.19    4.799e-15
## 2    30     88.79    5.589e-08
## 3    40    159.13    1.901e-16
## 4    50    143.70    3.092e-11
## 
## 
## Elapsed time : 1.902537
plot(fitAR.GJR, which = "all")
## 
## please wait...calculating quantiles...

We do not see anything unusual in the plots. Note, however, that the p-value for the AR(1) parameter estimate is 0.123374 > 0.05, so this estimate is statistically insignificant and adds no value to the model. In conclusion, this model is not better than the GJR-GARCH model.

4.5.5.4 GJR-GARCH in Mean

In our earlier models we assumed a constant mean, invariant over time. The GARCH-in-mean adjustment lets the conditional variance feed back into the mean equation (via the archm term), and we test whether this additional parameter is statistically significant.
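
With archm = TRUE and archpow = 2 in the specification below, the conditional variance enters the mean equation (the archm coefficient plays the role of \(\lambda\)), so that, combined with the AR(1) term, the mean model becomes roughly:

\[ R_t = \mu + \phi_1 R_{t-1} + \lambda \sigma_t^2 + \varepsilon_t \]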

  spec <- ugarchspec(variance.model = list(model = "gjrGARCH"),
                   mean.model = list(armaOrder = c(1, 0),
                                     archm = T,
                                     archpow = 2), 
                   distribution.model = "sstd")

fit.Mean.GJR <- ugarchfit(spec, data = Returns)
fit.Mean.GJR
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.000418    0.000432  0.96832 0.332887
## ar1    -0.024653    0.017741 -1.38962 0.164644
## archm   2.011310    1.049917  1.91568 0.055405
## omega   0.000016    0.000002  9.09135 0.000000
## alpha1  0.025841    0.005470  4.72413 0.000002
## beta1   0.887169    0.009795 90.57805 0.000000
## gamma1  0.104874    0.017419  6.02051 0.000000
## skew    0.970993    0.022120 43.89717 0.000000
## shape   7.325487    0.848489  8.63356 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.000418    0.000640  0.65344 0.513473
## ar1    -0.024653    0.017367 -1.41957 0.155732
## archm   2.011310    1.539225  1.30670 0.191314
## omega   0.000016    0.000002  6.61439 0.000000
## alpha1  0.025841    0.008677  2.97802 0.002901
## beta1   0.887169    0.010053 88.25275 0.000000
## gamma1  0.104874    0.019181  5.46746 0.000000
## skew    0.970993    0.020629 47.06906 0.000000
## shape   7.325487    0.974813  7.51476 0.000000
## 
## LogLikelihood : 8362.779 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -5.0098
## Bayes        -4.9933
## Shibata      -5.0098
## Hannan-Quinn -5.0039
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                     0.1079 0.742538
## Lag[2*(p+q)+(p+q)-1][2]    4.1203 0.003295
## Lag[4*(p+q)+(p+q)-1][5]    7.1630 0.016638
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.993 0.04568
## Lag[2*(p+q)+(p+q)-1][5]     4.387 0.20954
## Lag[4*(p+q)+(p+q)-1][9]     9.720 0.05755
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]  0.006982 0.500 2.000 0.93341
## ARCH Lag[5]  0.620405 1.440 1.667 0.84759
## ARCH Lag[7]  6.901323 2.315 1.543 0.09119
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  18.734
## Individual Statistics:              
## mu     0.04589
## ar1    0.16893
## archm  0.07821
## omega  2.10323
## alpha1 0.35716
## beta1  0.31667
## gamma1 0.41967
## skew   0.13016
## shape  0.91074
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          2.1 2.32 2.82
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias           1.7438 0.08129   *
## Negative Sign Bias  0.5438 0.58660    
## Positive Sign Bias  2.1004 0.03577  **
## Joint Effect        6.3934 0.09396   *
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     87.79    8.137e-11
## 2    30     83.17    3.937e-07
## 3    40    132.81    3.685e-12
## 4    50    154.64    7.061e-13
## 
## 
## Elapsed time : 4.204651
plot(fit.Mean.GJR, which = "all")
## 
## please wait...calculating quantiles...

Once again, the parameters added by this adjustment (the archm term, together with the retained ar1 term) have estimates that are statistically insignificant at the 5% level.

In conclusion, the best GARCH model for our financial data is the GJR-GARCH model with a skewed Student’s t-distribution for the residuals.
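
To compare all five fitted models at a glance, the information criteria can be collected with rugarch’s infocriteria() function; a sketch, assuming the five fit objects from above:

ic = cbind(infocriteria(fit), infocriteria(fitSSTD), infocriteria(fitGJR),
           infocriteria(fitAR.GJR), infocriteria(fit.Mean.GJR))
colnames(ic) = c("sGARCH", "sstd sGARCH", "gjrGARCH", "AR(1) gjrGARCH", "gjrGARCH-in-mean")
round(ic, 4)   # lower is better for each criterion

The differences between the GJR variants are marginal by the Akaike criterion, but the plain GJR-GARCH is favoured by the Bayes criterion and avoids the statistically insignificant extra parameters.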

4.5.6 Simulating NASPERS’ Stocks

Let’s prepare our data for the simulation.

sFinal = ugarchspec(variance.model = list(model = "gjrGARCH"),
                   mean.model = list(armaOrder = c(0, 0)),
                   distribution.model = "sstd")
setfixed(sFinal) = as.list(coef(fitGJR))   # fix the parameters at the estimates from fitGJR

f2008 = ugarchforecast(data = Returns["/2008-12"],
                       fitORspec = sFinal,
                       n.ahead = 252)

f2019 = ugarchforecast(data = Returns["/2019-12"],
                       fitORspec = sFinal,
                       n.ahead = 252)

par(mfrow = c(2,1))
 plot(sigma(f2008))
 plot(sigma(f2019))

You will find that the shape of this asset’s forecasted volatility is consistent with that of many listed shares across the world: volatility tended to decline after 2008, whilst it tended to increase in the latter part of 2019.

par(mfrow = c(1,1))

simulations = ugarchpath(spec = sFinal,
                         m.sim = 5, #simulate five lines
                         n.sim = 1*252, #over one year, if more than one multiply 252 by the desired number of years
                         rseed = 123)

p = 231695.3*apply(fitted(simulations),2, 'cumsum') + 231695.3 #The number 231695.3 was the adjusted closing price for 2019-12-31
matplot(p, type = "l", lwd = 3)
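
A common next step is to summarize the simulated end-of-horizon prices. With only five paths this is purely illustrative (increase m.sim for a meaningful distribution), but the mechanics look like this, using the p matrix from above:

terminal = p[nrow(p), ]                                    # last simulated price on each path
round(quantile(terminal, probs = c(0.05, 0.50, 0.95)), 2)  # spread of one-year-ahead outcomes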