Overview

This project tests whether daily market volatility in the S&P 500 increased significantly during the COVID-19 crash (March-April 2020) compared with the same two months of 2019, before COVID.

Introduction

For this project, the data was obtained with the “tidyquant” package in R, which retrieves financial data directly from Yahoo Finance. The daily S&P 500 returns were then split into two time periods: pre-COVID (March 2019 - April 2019) and COVID (March 2020 - April 2020).

Exploring the Data

# Load Libraries 

library(tidyquant)
library(dplyr)
library(ggplot2)

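# Store Data in Environment 

# Download daily S&P 500 index prices (ticker ^GSPC) from Yahoo Finance.
# The summary below shows that the last date retrieved is 2020-04-29.
sp500 <- tq_get("^GSPC", 
                from = "2019-03-01",
                to = "2020-04-30")

# Convert adjusted closing prices to simple (arithmetic) daily returns.
# The first row (2019-03-01) has no prior close in the download, so its
# return appears as 0 in the output below.
sp500_returns <- sp500 %>%
  tq_transmute(
    select = adjusted, 
    mutate_fun = periodReturn, 
    period = "daily",
    col_rename = "daily_return"
  )

# Split the returns into the two comparison windows
pre_covid <- sp500_returns %>%
  filter(date >= "2019-03-01", 
         date <= "2019-04-30")

covid <- sp500_returns %>%
  filter(date >= "2020-03-01",
         date <= "2020-04-30")

# Label each observation with its year
pre_covid <- pre_covid %>%
  mutate(period = "2019")

covid <- covid %>% 
  mutate(period = "2020")

# Stack both periods into a single tibble
sp500_final <- bind_rows(pre_covid, covid)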

# View the Structure of the Dataset

str(sp500_final)
## tibble [84 × 3] (S3: tbl_df/tbl/data.frame)
##  $ date        : Date[1:84], format: "2019-03-01" "2019-03-04" ...
##  $ daily_return: num [1:84] 0 -0.00388 -0.00113 -0.00652 -0.00813 ...
##  $ period      : chr [1:84] "2019" "2019" "2019" "2019" ...
# Preview First Few Lines of Data 

head(sp500_final)
## # A tibble: 6 × 3
##   date       daily_return period
##   <date>            <dbl> <chr> 
## 1 2019-03-01      0       2019  
## 2 2019-03-04     -0.00388 2019  
## 3 2019-03-05     -0.00113 2019  
## 4 2019-03-06     -0.00652 2019  
## 5 2019-03-07     -0.00813 2019  
## 6 2019-03-08     -0.00213 2019

The sp500_final dataset contains 84 observations, one per trading day, and three variables: date (the trading date), daily_return (the simple daily return of the adjusted closing price, expressed as a decimal), and period (the year, “2019” or “2020”, to which the observation belongs).
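The daily_return column holds simple returns, \(r_t = P_t / P_{t-1} - 1\), where \(P_t\) is the adjusted close on trading day \(t\). The sketch below computes such returns in base R for a short, made-up price vector; the helper name simple_returns() and the prices are hypothetical. As in the first row of the table above, the first return is set to 0 because there is no prior close to compare against.

```r
# Hypothetical helper: simple daily returns from a vector of closing prices.
# r_t = P_t / P_{t-1} - 1; the first value has no prior close and is set to 0.
simple_returns <- function(prices) {
  c(0, prices[-1] / prices[-length(prices)] - 1)
}

# Made-up adjusted closes for four trading days (illustration only)
prices <- c(100, 102, 99.96, 101.9592)
returns <- simple_returns(prices)
print(round(returns, 4))
```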

# Summary of Data

summary(sp500_final)
##       date             daily_return           period         
##  Min.   :2019-03-01   Min.   :-0.1198406   Length:84         
##  1st Qu.:2019-03-31   1st Qu.:-0.0061816   Class :character  
##  Median :2019-09-30   Median : 0.0009817   Mode  :character  
##  Mean   :2019-09-29   Mean   : 0.0010626                     
##  3rd Qu.:2020-03-30   3rd Qu.: 0.0110231                     
##  Max.   :2020-04-29   Max.   : 0.0938277

The dates range from March 1, 2019 to April 29, 2020, so April 30, 2020 is not included in the data. The daily returns range from approximately -0.1198 (-11.98%) to 0.0938 (9.38%). The period column has length 84 and contains the character values “2019” and “2020”.

# Time-Series Plot 

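# Relabel the periods for the legend and map every date onto the year 2020
# (keeping month and day) so both periods share one March-April x-axis.
# These columns are used only for plotting.
sp500_final <- sp500_final %>%
  mutate(
    period = ifelse(format(date, "%Y") == "2019",
                    "Pre-COVID",
                    "COVID"),
    
    plot_date = as.Date(
      paste0("2020-", format(date, "%m-%d"))
    )
  )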

ggplot(sp500_final,
       aes(x = plot_date,
           y = daily_return * 100,
           color = period,
           group = period)) +
  
  geom_line(linewidth = 1) +
  
  labs(
    title = "S&P 500 Daily Returns",
    subtitle = "March-April 2019 (Pre-COVID) vs. March-April 2020 (COVID)",
    x = "Date",
    y = "Daily Return (%)",
    color = "Period"
  ) +
  
  scale_color_manual(values = c(
    "Pre-COVID" = "red",
    "COVID" = "blue"
  )) +
  
  scale_x_date(date_labels = "%b %d") +
  
  theme_minimal()

For visualization only, a “plot_date” column was added to the “sp500_final” dataset that keeps each observation's month and day but sets the year to 2020, so the pre-COVID (2019) and COVID (2020) series can be overlaid on a common March-April axis. The same step relabels “period” as “Pre-COVID” or “COVID” for the legend. Neither change affects the statistical analysis, which uses the “pre_covid” and “covid” data frames.
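The year-swapping step behind plot_date can be checked in isolation. The sketch below wraps the same expression in a hypothetical helper, align_to_2020(), applied to a few example dates. Because 2020 is a leap year, every 2019 calendar date maps to a valid 2020 date.

```r
# Hypothetical helper mirroring the plot_date expression:
# keep month and day, replace the year with 2020 (plotting only).
align_to_2020 <- function(d) {
  as.Date(paste0("2020-", format(d, "%m-%d")))
}

# Example dates from both periods
dates <- as.Date(c("2019-03-01", "2019-04-30", "2020-03-16"))
aligned <- align_to_2020(dates)
print(aligned)
```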

Based on the time-series plot, daily returns during the COVID period swing far more widely in both the negative and positive directions than during the comparatively stable pre-COVID period.

Analysis

Hypotheses

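\(H_0: \mu_{|r_{2020}|} = \mu_{|r_{2019}|}\)

\(H_A: \mu_{|r_{2020}|} > \mu_{|r_{2019}|}\)

where \(\mu_{|r_{2019}|}\) and \(\mu_{|r_{2020}|}\) denote the mean absolute daily return of the S&P 500 in the pre-COVID and COVID periods, respectively.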
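The test statistic reported below comes from R's t.test(), which by default (var.equal = FALSE) runs Welch's two-sample t-test. Writing \(\bar{x}\), \(s\), and \(n\) for the sample mean, standard deviation, and size of the absolute daily returns in each period, the statistic and its approximate (Welch-Satterthwaite) degrees of freedom are:

\[
t = \frac{\bar{x}_{2020} - \bar{x}_{2019}}{\sqrt{\dfrac{s_{2020}^2}{n_{2020}} + \dfrac{s_{2019}^2}{n_{2019}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_{2020}^2}{n_{2020}} + \dfrac{s_{2019}^2}{n_{2019}}\right)^2}{\dfrac{\left(s_{2020}^2/n_{2020}\right)^2}{n_{2020}-1} + \dfrac{\left(s_{2019}^2/n_{2019}\right)^2}{n_{2019}-1}}
\]

For the one-sided alternative \(H_A\), the p-value is the upper-tail probability \(P(T_\nu \ge t)\).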

Sample Statistics

# Mean and Standard Deviation (Pre-COVID) 

mean_2019 <- mean(abs(pre_covid$daily_return))
sd_2019 <- sd(abs(pre_covid$daily_return))

mean_2019
## [1] 0.004133875
sd_2019
## [1] 0.004179569
# Mean and Standard Deviation (During COVID)

mean_2020 <- mean(abs(covid$daily_return))
sd_2020 <- sd(abs(covid$daily_return))

mean_2020
## [1] 0.03618841
sd_2020
## [1] 0.02809939

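During the pre-COVID period (March 2019 - April 2019), the mean absolute daily return was approximately 0.0041 (about 0.41% per day), with a standard deviation of about 0.0042. During the COVID period (March 2020 - April 2020), the mean absolute daily return was approximately 0.0362 (about 3.62% per day), with a standard deviation of about 0.0281. The average daily move during COVID was therefore roughly 8.8 times as large as before COVID.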

Test Statistic and P-Value

# Welch two-sample t-test (R's default, var.equal = FALSE), one-sided:
# is the mean absolute daily return larger during COVID than pre-COVID?
vol_test <- t.test(abs(covid$daily_return), abs(pre_covid$daily_return),
                   alternative = "greater")

# Test Statistic 

vol_test$statistic
##        t 
## 7.312493
# P-Value

vol_test$p.value
## [1] 2.33374e-09

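Using a one-sided Welch two-sample t-test (the default in R's t.test(), which does not assume equal variances), the test statistic is approximately 7.31 and the p-value is approximately 2.33e-09.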
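As a cross-check, the Welch statistic can be recomputed from the rounded sample statistics printed earlier. The sketch below assumes 42 trading days in each period; that split is not printed in the output, but it is consistent with the 84 total rows. welch_from_summary() is a hypothetical helper, and pt() supplies the one-sided upper-tail p-value.

```r
# Hypothetical helper: one-sided Welch two-sample t-test from summary statistics
# (mean, standard deviation, and sample size for each group).
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  # Upper-tail p-value for the alternative mean1 > mean2
  p_value <- pt(t_stat, df = df, lower.tail = FALSE)
  list(t = t_stat, df = df, p = p_value)
}

# Summary statistics of absolute daily returns (COVID first, pre-COVID second);
# n = 42 per period is an assumption.
res <- welch_from_summary(m1 = 0.03618841, s1 = 0.02809939, n1 = 42,
                          m2 = 0.004133875, s2 = 0.004179569, n2 = 42)
print(res$t)
print(res$p)
```

Running t.test() on the raw vectors remains the authoritative computation; this version only shows how the reported statistic follows from the sample means and standard deviations.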

Conclusions and Decisions

Because the p-value (2.33374e-09) is far below the significance level alpha (0.05), we reject the null hypothesis in favor of the alternative. Together with the large test statistic (7.31), this indicates that the mean absolute daily return of the S&P 500, our measure of volatility, was significantly higher during the COVID period (March-April 2020) than during the pre-COVID period (March-April 2019).

Limitations

In this analysis, the volatility was approximated using the mean of the absolute value of the daily returns.

\(\text{volatility} \approx \frac{1}{n}\sum_{t=1}^{n} |r_t|\)

However, in financial analysis, the standard deviation of returns is the more conventional measure of market volatility. The mean absolute daily return still captures the average magnitude of daily price moves, but it is not the most complete or conventional measure. The standard deviation measures dispersion around the mean return and, because it squares each deviation, gives more weight to the wild swings typical of financial returns. The mean absolute daily return should therefore be viewed as an approximation of volatility.
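The two volatility measures can be computed side by side. The sketch below uses a short, made-up vector of daily returns (hypothetical values, not S&P 500 data), and volatility_measures() is a hypothetical helper. For returns that are normally distributed with mean zero, the expected absolute return equals \(\sigma\sqrt{2/\pi} \approx 0.80\sigma\), which is one way to see that the mean absolute return tracks the standard deviation without being identical to it.

```r
# Hypothetical helper: two ways to summarize volatility from daily returns.
volatility_measures <- function(returns) {
  c(
    mean_abs = mean(abs(returns)),  # measure used in this analysis
    std_dev  = sd(returns)          # conventional measure (sample SD, n - 1 denominator)
  )
}

# Made-up daily returns for illustration only
r <- c(0.01, -0.02, 0.03, -0.01)
vm <- volatility_measures(r)
print(vm)
```

Applied to this project's data, the same helper could be called as volatility_measures(pre_covid$daily_return) and volatility_measures(covid$daily_return); those results are not reported here.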