Overview

Over the last month, we have witnessed all-time high levels of volatility. This has been reflected by the VIX index, also known as the fear gauge. The spike in volatility came along the large sell-off in the stock market triggered by concerns about the severe economic impact of the COVID-19 epidemic. In this note, I will delve a little into the global data on the confirmed COVID-19 cases, published by Johns Hopkins University. The purpose of this vignette is to help us associate the onslaught of the epidemic with uncertainty in the stock market.

Getting Data

We start by downloading the recent data on COVID-19 cases using the github page of the CSSE at Johns Hopkins University. Rather than downloading manually, we can run the following command:

url = "https://github.com/CSSEGISandData/COVID-19/archive/master.zip"
temp <- tempfile()
download.file(url,temp)
trying URL 'https://github.com/CSSEGISandData/COVID-19/archive/master.zip'
Content type 'application/zip' length 59155051 bytes (56.4 MB)
==================================================
downloaded 56.4 MB

The data file contains a number of data sets on the individual country as well as the aggregate global level. After unzipping the data content, I focus on the aggregate level. This can be done using the following commands:

unz_files <- unzip(temp)
data_files <- unz_files[grep("/csse_covid_19_data/csse_covid_19_time_series/",unz_files)]
data_files <- data_files[grep("confirmed_global.csv",data_files)]
ds_list <- lapply(data_files,function(x) read.csv(x,stringsAsFactors = F))

The above downloads the data for all files and store them in a list. We can now combine the list altogether for some data preparation and manipulation in the following.

Cleaning Data

The main issue is that the data need to be organized in a time series format. We do so in the following steps.

library(plyr)
library(lubridate)
ds <- ldply(ds_list,data.frame)

First, let’s aggregate the data on the country level. We refer to data.table library to do so.

library(data.table)
DS <- data.table(ds)
DS$Lat <- DS$Long <- NULL
select_var <- names(DS)[!names(DS) %in% c("Province.State","Country.Region")]
DS <- DS[,lapply(.SD,sum),by =  "Country.Region", .SDcols = select_var ]

Second, let us pivot the data in an xts format. To do so, let reconstruct the dates from the variables’ names with some fixing

all_dates <- gsub("X","",select_var)
all_dates <- strsplit(all_dates,"\\.")
d <- sapply(all_dates,function(x) x[2])
d[nchar(d) == 1] <- paste(0,d[nchar(d) == 1],sep = "")
m <- sapply(all_dates,function(x) x[1])
m[nchar(m) == 1] <- paste(0,m[nchar(m) == 1],sep = "")
y <- sapply(all_dates,function(x) x[3])
y[nchar(y) == 2] <- paste(20,y[nchar(y) == 2],sep = "")
all_dates <- paste(y,m,d,sep = "-")

Given the dates, we can pivot the data easily using the following commands

library(xts)
DS_t <- data.frame(DS[,select_var,with = F])
DS_t <- t(DS_t)
rownames(DS_t) <- all_dates
colnames(DS_t) <- DS$Country.Region
DS_t <- as.xts(DS_t)

Time Series Summary

Let’s focus on the 10 countries with the largest cases

select_countries <- names(DS_t)[ order(DS_t[nrow(DS_t),],decreasing = T)][1:10]
DS_t10 <- DS_t[,select_countries]
tail(data.frame(DS_t10))

Give the time series data, we plot the number of confirmed cases since the beginning of the data, which is Jan 22, 2020.

library(RColorBrewer)
nb.cols <- length(select_countries)
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
plot(DS_t10,legend.loc = "topleft",col = mycolors,main = "Confirmed Cases")

We see that number of confirmed cases in China been roughly constant since mid Fed, whereas the number of cases started to increase exponentially in mid March 2020. We take a closer look at this period below.

plot(DS_t10["20200315/",],legend.loc = "topleft",col = mycolors,main = "Confirmed Cases")

Market Reaction

We consider two major indices to relate to the associate impact of the COVID-19 with the market. We will focus on the S&P 500 and the VIX index. Given the aggregate number of cases, we focus on the average confirmed cases over time as a proxy for the onslaught of the epidemic.

S&P Vs. VIX

We take a quick a look at the cumulative return of each index since the beginning of the year.

library(quantmod)
getSymbols("^VIX")
[1] "^VIX"
getSymbols("^GSPC")
[1] "^GSPC"
SNP <- GSPC$GSPC.Adjusted
VIX <- VIX$VIX.Adjusted
ds_fin <- merge(SNP,VIX)
ds_fin <- ds_fin/lag(ds_fin)-1
ds_fin <- ds_fin["20200101/",]
names(ds_fin) <- c("SNP","VIX")
plot(cumprod(ds_fin+1),legend.loc = "topleft",main = "Cumulative Return")

The red line corresponds to the VIX index, whereas the black line corresponds to the return on the S&P 500. We can see that both are negatively correlated, where drops in the equity index is accompanied by large increases in the VIX, hence the expression the fear gauge. At the same time, we witness a small recovery in the equity market as the number of confirmed cases of COVID-19 has been relatively maintained recently.

Altogether

We move to investigate the market reactions to the increase in confirmed cases. Such number can be viewed as an exponential function of time, as we can see below

ds_covid <- as.xts(apply(DS_t10,1,sum))
plot(ds_covid,main ="Total Confirmed Cases")

To get a better perspective, we can think about the slope of this function, which denotes the velocity of the number of cases

ds_covid_1 <- ds_covid/lag(ds_covid)-1
plot(ds_covid_1,main = "Increase Rate")

The above plot indicates the velocity of the confirmed cases. We can see that in all cases it is positive. However, at the same time we observe that rate declined relative to the beginning of the epidemic. This brings us to the acceleration of the confirmed cases, which corresponds to the second derivative of the confirmed cases over time:

ds_covid_2 <- ds_covid_1/lag(ds_covid_1)-1
plot(ds_covid_2,main = "Acceleration Rate")

We note that the acceleration rate spikes in Feb 13, 2020. Taking a closer look this is the case when the data in China exhibits that truncation characteristic. The result of which leads to a large increase in the second derivative. To mitigate this, we use a local regression to smooth out the trend over time and plot below

a <- na.omit(as.numeric(ds_covid_2))
x <- 1:length(a)
a_fit <- predict(loess(a~x))
names(a_fit) <- date(na.omit(ds_covid_2))
a_fit <- as.xts(a_fit)
plot(a_fit, main = "Acceleration Rate Smoothed",cex = 0.5)

The above plot provides a more robust perspective in terms of the acceleration trend over time. We discern that the acceleration of the confirmed cases has decreased over time. In other words, while the number of confirmed cases are still increasing, the increase rate at has been slowing down relatively.

Given the above insights, let us combine altogether and investigate the market reaction with respect to the increase in the confirmed cases. We refer to the plotly package to illustrate this relationship. We do so for a couple of reasons. First, it provides excellent interactive visualization platform. Second, the scale of the y-axis can be easily modified to accommodate a log-scale. This is relevant when controlling for level increases (such as price levels). Finally, two different time series can be plotted easily on the opposite y-axis. On the left hand side, we plot the market data, whereas we plot the COVID-19 data on the right hand side y-axis.

library(plotly)
# choose the labels format
f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)

# x-axis 
xl <- list(
  title = "Date",
  titlefont = f
)

# y-axis for S&P
yl <- list(
  title = "S&P 500",
  titlefont = f,
  type = "log"
)

# right hand y-axis for VIX
yl2 <- list(title = "COVID-19 Rate",
        overlaying = "y",
        titlefont = f, 
        side = "right",
        type = "log")


snp <- as.numeric(cumprod(ds_fin$SNP+1))
vix <- as.numeric(cumprod(ds_fin$VIX+1))
covid <- as.numeric(ds_covid_1[,1])
p <- plot_ly()
p <- add_lines(p,x = ~ date(ds_fin), y = ~ snp, name = "S&P 500")
p <- add_lines(p,x = ~ date(ds_fin), y = ~ vix, name = "VIX")
p <- add_lines(p,x = ~ date(ds_covid_1), y = ~ covid ,yaxis = "y2", name = "COVID-19") 
p <- layout(p,yaxis = yl, yaxis2 = yl2,xaxis = xl)
p

NA

A couple of comments are in order. First, we note that the major spike in the VIX index took place during the time that the velocity of the COVID-19 confirmed cases started to increase again. This can be traced back to Feb 23rd, 2020. Before that, this is the time during which China demonstrated positive signs of maintaining it. However, afterwards, this is the time during which we witnessed the spread of the epidemic outside China, especially in the US.

Second, it is interesting to see how much the fear gauge is associated with the COVID-19 velocity, i.e. increase rate. Since mid-March, we observe that the VIX has dropped from its peak, while at the same time the velocity of the COVID-19 has also dropped. We discern that both time series exhibit an inverse U shape since late Feb 2020. The opposite holds true for the S&P 500, where we are currently witnessing a small recovery.

Concluding Remarks

Economic activity is not simply contracting with the ongoing epidemic. We are actually witnessing a complete freeze in the economy. This has already manifested in a large sell-off since late Feb 2020. The velocity of confirmed cases of COVID-19 seems to be dropping over the last two weeks, the same holds true for the market fear gauge (VIX). The S&P 500 seems to be rebounding as the other indicators slow down. Nonetheless, there is more uncertainty to come given the spillover effect of the epidemic and its long-term impact. For the interested reader, I suggest looking into this report by BlackRock.

Acknowledgement

The above vignette has greatly benefited from data, comments, libraries, and other code sources posted on the web. For instance, we followed Dirk Eddelbuettel’s instructions on Stack Overflow to download the data in an automated manner. Special thanks to him and other members of the R community. Also, we thank Data Novia for pointing out how to expand color palettes. Finally, our sincere graduate goes to the team at John Hopkins for putting this great effort and making this data publicly available.

---
title: "COVID-19 and the Fear Gauge"
author: "Majeed Simaan"
date: "April 8, 2020"
output:
  html_notebook: defa ult
  html_document:
    df_print: paged
  pdf_document: default
---

# Overview
Over the last month, we have witnessed all-time high levels of volatility. This has been reflected by the VIX index, also known as the fear gauge. The spike in volatility came along the large sell-off in the stock market triggered by concerns about the severe economic impact of the COVID-19 epidemic. In this note, I will delve a little into the global data on the confirmed COVID-19 cases, published by  Johns Hopkins University. The purpose of this vignette is to help us associate  the onslaught of the epidemic with uncertainty in the stock market.


# Getting Data
We start by downloading the recent data on COVID-19 cases using the github page of the CSSE at Johns Hopkins University. Rather than downloading manually, we can run the following command:
```{r}
url = "https://github.com/CSSEGISandData/COVID-19/archive/master.zip"
temp <- tempfile()
download.file(url,temp)
```
The data file contains a number of data sets on the individual country as well as the aggregate  global level. After unzipping the data content, I focus on the aggregate level. This can be done using the following commands:
```{r}
unz_files <- unzip(temp)
data_files <- unz_files[grep("/csse_covid_19_data/csse_covid_19_time_series/",unz_files)]
data_files <- data_files[grep("confirmed_global.csv",data_files)]
ds_list <- lapply(data_files,function(x) read.csv(x,stringsAsFactors = F))
```
The above downloads the data for all files and store them in a list. We can now combine the list altogether for some data preparation and manipulation in the following.

## Cleaning Data
The main issue is that the data need to be organized in a time series format. We do so in the following steps.
```{r,message=F,warning=F}
library(plyr)
library(lubridate)
ds <- ldply(ds_list,data.frame)
```
First, let's aggregate the data on the country level. We refer to `data.table` library to do so.
```{r,message = F,warning=F}
library(data.table)
DS <- data.table(ds)
DS$Lat <- DS$Long <- NULL
select_var <- names(DS)[!names(DS) %in% c("Province.State","Country.Region")]
DS <- DS[,lapply(.SD,sum),by =  "Country.Region", .SDcols = select_var ]
```
Second, let us pivot the data in an `xts` format. To do so, let reconstruct the dates from the variables' names with some fixing
```{r}
all_dates <- gsub("X","",select_var)
all_dates <- strsplit(all_dates,"\\.")
d <- sapply(all_dates,function(x) x[2])
d[nchar(d) == 1] <- paste(0,d[nchar(d) == 1],sep = "")
m <- sapply(all_dates,function(x) x[1])
m[nchar(m) == 1] <- paste(0,m[nchar(m) == 1],sep = "")
y <- sapply(all_dates,function(x) x[3])
y[nchar(y) == 2] <- paste(20,y[nchar(y) == 2],sep = "")
all_dates <- paste(y,m,d,sep = "-")
```
Given the dates, we can pivot the data easily using the following commands
```{r,message = F,warning=F}
library(xts)
DS_t <- data.frame(DS[,select_var,with = F])
DS_t <- t(DS_t)
rownames(DS_t) <- all_dates
colnames(DS_t) <- DS$Country.Region
DS_t <- as.xts(DS_t)
```

## Time Series Summary
Let's focus on the 10 countries with the largest cases
```{r}
select_countries <- names(DS_t)[ order(DS_t[nrow(DS_t),],decreasing = T)][1:10]
DS_t10 <- DS_t[,select_countries]
tail(data.frame(DS_t10))
```

Give the time series data, we plot the number of confirmed cases since the beginning of the data, which is Jan 22, 2020.
```{r,fig.align="center",message=F,warning=F,fig.width=6}
library(RColorBrewer)
nb.cols <- length(select_countries)
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
plot(DS_t10,legend.loc = "topleft",col = mycolors,main = "Confirmed Cases")
```
We see that number of confirmed cases in China been roughly constant since mid Fed, whereas the number of cases started to increase exponentially in mid March 2020. We take a closer look at this period below.

```{r,fig.width=6,fig.align="center"}
plot(DS_t10["20200315/",],legend.loc = "topleft",col = mycolors,main = "Confirmed Cases")
```


# Market Reaction 
We consider two major indices to relate to the associate impact of the COVID-19 with the market. We will focus on the S\&P 500 and the VIX index. Given the aggregate number of cases, we focus on the average confirmed cases over time as a proxy for the onslaught of the epidemic.

## S\&P Vs. VIX
We take a quick a look at the cumulative return of each index since the beginning of the year. 
```{r,message=F,warning=F,fig.width=6,fig.align="center"}
library(quantmod)
getSymbols("^VIX")
getSymbols("^GSPC")

SNP <- GSPC$GSPC.Adjusted
VIX <- VIX$VIX.Adjusted
ds_fin <- merge(SNP,VIX)
ds_fin <- ds_fin/lag(ds_fin)-1
ds_fin <- ds_fin["20200101/",]
names(ds_fin) <- c("SNP","VIX")
plot(cumprod(ds_fin+1),legend.loc = "topleft",main = "Cumulative Return")
```
The red line corresponds to the VIX index, whereas the black line corresponds to the return on the S\&P 500. We can see that both are negatively correlated, where drops in the equity index is accompanied by large increases in the VIX, hence the expression the fear gauge. At the same time, we witness a small recovery in the equity market  as the number of confirmed cases of COVID-19 has been relatively maintained recently. 

## Altogether
We move to investigate the market reactions to the increase in confirmed cases. Such number can be viewed as an exponential function of time, as we can see below
```{r,fig.width=6,fig.align="center"}
ds_covid <- as.xts(apply(DS_t10,1,sum))
plot(ds_covid,main ="Total Confirmed Cases")
```
To get a better perspective, we can think about the slope of this function, which denotes the velocity of the number of cases 
```{r,fig.width=6,fig.align="center"}
ds_covid_1 <- ds_covid/lag(ds_covid)-1
plot(ds_covid_1,main = "Increase Rate")
```
The above plot indicates the velocity of the confirmed cases. We can see that in all cases it is positive. However, at the same time we observe that rate declined relative to the beginning of the epidemic. This brings us to the acceleration of the confirmed cases, which corresponds to the second derivative of the confirmed cases over time:
```{r,fig.width=6,fig.align="center"}
ds_covid_2 <- ds_covid_1/lag(ds_covid_1)-1
plot(ds_covid_2,main = "Acceleration Rate")
```
We note that the acceleration rate spikes in Feb 13, 2020. Taking a closer look this is the case when the data in China exhibits that truncation characteristic. The result of which leads to a large increase in the second derivative. To mitigate this, we use a local regression to smooth out the trend over time and plot below
```{r,fig.width=6,fig.align="center"}
a <- na.omit(as.numeric(ds_covid_2))
x <- 1:length(a)
a_fit <- predict(loess(a~x))
names(a_fit) <- date(na.omit(ds_covid_2))
a_fit <- as.xts(a_fit)
plot(a_fit, main = "Acceleration Rate Smoothed",cex = 0.5)
```
The above plot provides a more robust perspective in terms of the acceleration trend over time. We discern that the acceleration of the confirmed cases has decreased over time. In other words, while the number of confirmed cases are still increasing, the increase rate at has been slowing down relatively. 

Given the above insights, let us combine altogether and investigate the market reaction with respect to the increase in the confirmed cases. We refer to the `plotly` package to illustrate this relationship. We do so for a couple of reasons. First, it provides excellent interactive visualization platform. Second, the scale of the y-axis can be easily modified to accommodate a log-scale. This is relevant when controlling for level increases (such as price levels). Finally, two different time series can be plotted easily on the opposite y-axis. On the left hand side, we plot the market data, whereas we plot the COVID-19 data on the right hand side y-axis.
```{r,message=FALSE,warning=F,fig.width=6,fig.align="center"}
library(plotly)
# choose the labels format
f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)

# x-axis 
xl <- list(
  title = "Date",
  titlefont = f
)

# y-axis for S&P
yl <- list(
  title = "S&P 500",
  titlefont = f,
  type = "log"
)

# right hand y-axis for VIX
yl2 <- list(title = "COVID-19 Rate",
        overlaying = "y",
        titlefont = f, 
        side = "right",
        type = "log")


snp <- as.numeric(cumprod(ds_fin$SNP+1))
vix <- as.numeric(cumprod(ds_fin$VIX+1))
covid <- as.numeric(ds_covid_1[,1])
p <- plot_ly()
p <- add_lines(p,x = ~ date(ds_fin), y = ~ snp, name = "S&P 500")
p <- add_lines(p,x = ~ date(ds_fin), y = ~ vix, name = "VIX")
p <- add_lines(p,x = ~ date(ds_covid_1), y = ~ covid ,yaxis = "y2", name = "COVID-19") 
p <- layout(p,yaxis = yl, yaxis2 = yl2,xaxis = xl)
p

```
A couple of comments are in order. First, we note that the major spike in the VIX index took place during the time that the velocity of the COVID-19 confirmed cases started to increase again. This can be traced back to Feb 23rd, 2020. Before that, this is the time during which China demonstrated positive signs of maintaining it. However, afterwards, this is the time during which we witnessed the spread of the epidemic outside China, especially in the US. 

Second, it is interesting to see how much the fear gauge is associated with the COVID-19 velocity, i.e. increase rate. Since mid-March, we observe that the VIX has dropped from its peak, while at the same time the velocity of the COVID-19 has also dropped. We discern that both time series exhibit an inverse U shape since late Feb 2020. The opposite holds true for the S\&P 500, where we are currently witnessing a  small recovery.

# Concluding Remarks
Economic activity is not simply contracting with the ongoing epidemic. We are actually witnessing a complete freeze in the economy. This has already manifested in a large sell-off since late Feb 2020. The velocity of confirmed cases of COVID-19 seems to be dropping over the last two weeks, the same holds true for the market fear gauge (VIX). The S\&P 500 seems to be rebounding as the other indicators slow down. Nonetheless, there is more uncertainty to come given the spillover effect of the epidemic and its long-term impact. For the interested reader, I suggest looking into this [report](https://www.blackrock.com/corporate/insights/blackrock-investment-institute/publications/global-macro-outlook) by BlackRock. 

# Acknowledgement
The above vignette has greatly benefited from data, comments, libraries,  and other code sources posted on the web. For instance, we followed Dirk Eddelbuettel's instructions on [Stack Overflow](https://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data) to download the data in an automated manner. Special thanks to him and other members of the R community. Also, we thank [Data Novia](https://www.datanovia.com/en/blog/easy-way-to-expand-color-palettes-in-r) for pointing out how to expand color palettes. 
Finally, our sincere graduate goes to the team at John Hopkins for putting this great effort and making this data publicly available. 