Portfolio Analysis and Efficient Frontier - NASDAQ-100
Retrieving Financial Data
library(PortfolioAnalytics)
library(ROI)
library(ROI.plugin.quadprog)
library(MASS)
library(dplyr)
library(caTools)
library(quantmod)
library(timeSeries)
library(astsa)
library(forecast)
# Get the risk-free rate (1-month Treasury constant maturity rate, DGS1MO) from FRED
rf <- getSymbols('DGS1MO', src = 'FRED', auto.assign = FALSE)
# ticker for NASDAQ-100 Index
ndx_symbol <- '^NDX'
ndx <- getSymbols(ndx_symbol, from = "2000-01-01", auto.assign = FALSE)
str(ndx)
## An 'xts' object on 2000-01-03/2018-05-14 containing:
## Data: num [1:4620, 1:6] 3756 3767 3543 3488 3337 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "NDX.Open" "NDX.High" "NDX.Low" "NDX.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2018-05-15 18:06:06"
Checking for missing values
any(is.na(ndx))
## [1] FALSE
#which(is.na(ndx)) # would locate the NAs, had there been any
Rename columns and convert to monthly data
It makes sense to look at daily (or other frequency) returns in some cases, but today we are analyzing trends over a longer period, followed by optimization and modelling, so we’d like our data to be a bit closer to a normal distribution.
ndx_monthly <- to.monthly(ndx)
ndx_returns <- Return.calculate(ndx_monthly$ndx.Adjusted)
ndx_returns <- ndx_returns[(-1),]  # drop the leading NA produced by Return.calculate
names(ndx_returns) <- 'NDX_Monthly'
*Note: above I subset our ndx data to the ‘Adjusted’ column. For the NASDAQ-100 (as it is an index) the Adjusted and Close columns are exactly the same. I use ‘Adjusted’ only because, for some of the assets I have been working with, splits and dividends are not accounted for in the Close data, and I am keeping the code consistent across examples for personal convenience.
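A quick sanity check of that claim (a sketch; it simply compares the two columns of the monthly series):
# For an index, the Adjusted and Close columns should be identical
all.equal(as.numeric(ndx_monthly$ndx.Close),
          as.numeric(ndx_monthly$ndx.Adjusted))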
Quickly plot returns for the whole period
plot.zoo(ndx_returns, main = 'NASDAQ-100 Monthly Returns', xlab = 'Date')
Calculate Mean, Geometric Mean, Standard Deviation
# Compute the mean monthly returns
print(paste0('Arithmetic Mean Returns: ', round(mean(ndx_returns), 6)))
## [1] "Arithmetic Mean Returns: 0.005494"
# Compute the geometric mean of monthly returns
print(paste0('Geometric Mean Returns: ', round(mean.geometric(ndx_returns), 6)))
## [1] "Geometric Mean Returns: 0.003042"
# Compute the standard deviation
print(paste0('StdDev Returns: ', round(sd(ndx_returns), 6)))
## [1] "StdDev Returns: 0.069128"
Computing the annualized risk-free rate, excess returns, and Sharpe ratio
Compute the annualized risk-free rate
rf <- zoo(rf, order.by = index(ndx_returns))  # put rf on the monthly return index
# There are just a few missing values in the rf series, so we impute by
# LOCF (last observation carried forward), which makes the most sense for
# this metric and our purposes today
rf <- na.locf(rf)
annualized_rf <- (1 + rf)^12 - 1
# Plot the annualized risk free rate
plot.zoo(annualized_rf, main = 'rf: Annualized 1-Month T-Bond', xlab = 'Date')
Compute the series of excess portfolio returns
rf <- as.xts(rf)
ndx_excess <- ndx_returns - rf
# Compare the means
print(paste0('Mean Returns: ', mean(ndx_returns)))
## [1] "Mean Returns: 0.00549423846606738"
print(paste0('Mean Excess: ', mean(ndx_excess)))
## [1] "Mean Excess: -2.09782394335211"
So we have some rather negative excess returns (so not really excess, I suppose) for the entirety of the period from 2000 to 2018. The sheer magnitude is suspicious, though; more on that below.
# Look at mean excess for 2010~2018:
print(paste0('Mean Excess: ', mean(ndx_excess['2010/2018'])))
## [1] "Mean Excess: -1.71935532248883"
Rises a bit, but still clearly negative.
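One caveat worth flagging: the magnitudes above (roughly -200% per month) point to a units mismatch rather than truly catastrophic underperformance, since FRED quotes DGS1MO as an annualized percentage (e.g. 1.5 means 1.5%). A sketch of the conversion I would apply before subtracting (not part of the original analysis):
# Convert the annualized percent yield to a monthly decimal rate
rf_monthly <- (1 + rf / 100)^(1/12) - 1
ndx_excess_adj <- ndx_returns - rf_monthly
mean(ndx_excess_adj)  # now on the same scale as mean(ndx_returns)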
Compute the Sharpe ratio
ndx_sharpe <- mean(ndx_excess) / sd(ndx_returns)
# Display annualized mean, sd and sharpe ratio all
# at once using table.AnnualizedReturns()
table.AnnualizedReturns(ndx_returns)
## NDX_Monthly
## Annualized Return 0.0371
## Annualized Std Dev 0.2395
## Annualized Sharpe (Rf=0%) 0.1550
So we see the NASDAQ-100 is a pretty volatile index, with relatively low annualized returns and a not-so-impressive Sharpe ratio for the period as a whole.
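For reference, the annualized figures in that table follow directly from the monthly statistics (a quick sketch; the last line reproduces the Rf=0% Sharpe):
# Annualized return: compound the geometric mean over 12 months
(1 + mean.geometric(ndx_returns))^12 - 1  # ~0.0371
# Annualized volatility: scale the monthly sd by sqrt(12)
sd(ndx_returns) * sqrt(12)                # ~0.2395
# Annualized Sharpe ratio (Rf = 0%)
((1 + mean.geometric(ndx_returns))^12 - 1) / (sd(ndx_returns) * sqrt(12))  # ~0.1550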
Exploratory Analysis and visualization
Extract the annualized mean, volatility, and Sharpe ratio of ndx_returns
index(ndx_returns) <- as.Date(index(ndx_returns))
returns_ann <- Return.annualized(ndx_returns)
sd_ann <- StdDev.annualized(ndx_returns)
sharpe_ann <- SharpeRatio.annualized(ndx_returns, rf)
Plot the 12-month rolling annualized mean
chart.RollingPerformance(R = ndx_returns, width = 12, FUN = "Return.annualized")
# Plot the 12-month rolling annualized standard deviation
chart.RollingPerformance(R = ndx_returns, width = 12, FUN = "StdDev.annualized")
# Plot the 12-month rolling annualized Sharpe ratio
chart.RollingPerformance(R = ndx_returns, width = 12, FUN = "SharpeRatio.annualized", Rf = rf)
What we can discern from the above charts is that the general trend over time has been decreasing volatility and increasing annualized Sharpe ratio and returns.
Between 2000 and 2004 volatility is especially high but falling in the wake of the dot-com bubble, with another volatility jump around the 2008 financial crisis. The inverse pattern holds for returns and the Sharpe ratio.
Year-to-year distribution comparisons
Comparing volatility: daily vs. monthly returns
par(mfrow = c(1, 2), mar = c(3, 2, 2, 2))
# Daily returns were not computed above, so build them here from the
# adjusted daily prices (dropping the leading NA)
ndx_daily_returns <- Return.calculate(Ad(ndx))[-1, ]
# Plot daily returns distribution
chart.Histogram(ndx_daily_returns, methods = c('add.density', 'add.normal'))
# Plot monthly returns distribution
chart.Histogram(ndx_returns, methods = c('add.density', 'add.normal'))
This will be true of the vast majority of stocks and indexes: the monthly distribution is much closer to normal (although not perfectly normal, of course). Daily returns such as these typically have fat tails and plenty of skew.
Calculate skewness and kurtosis measures for daily and monthly returns
print(paste0('Daily Returns - Skewness: ', round(skewness(ndx_daily_returns), 4)))
## [1] "Daily Returns - Skewness: 0.4097"
print(paste0('Monthly Returns - Skewness: ', round(skewness(ndx_returns), 4)))
## [1] "Monthly Returns - Skewness: -0.551"
print(paste0('Daily Returns - Kurtosis: ', round(kurtosis(ndx_daily_returns), 4)))
## [1] "Daily Returns - Kurtosis: 7.8405"
print(paste0('Monthly Returns - Kurtosis: ', round(kurtosis(ndx_returns), 4)))
## [1] "Monthly Returns - Kurtosis: 1.7568"
Note especially the daily returns kurtosis: kurtosis() reports excess kurtosis here, so ~7.8 versus 0 for a normal distribution. This is visualized above in the heavy tails of our daily returns histogram.
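To make the eyeball test formal, a Jarque-Bera test (which combines skewness and kurtosis) can be applied to both series; a minimal sketch using tseries, which we load again for the ADF test further down:
library(tseries)
# Null hypothesis: the data are normally distributed
jarque.bera.test(as.numeric(ndx_daily_returns)) # expect a decisive rejection
jarque.bera.test(as.numeric(ndx_returns))       # likely also rejected, less extremely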
Downside measures of volatility and risk
Below are some more specific measures of risk pertinent to assessing assets for investment.
1. SemiDeviation: sd() weights volatility above and below the mean equally, so we use the semideviation to look specifically at the ‘below’ half of the standard deviation (i.e. how volatile returns are when they come in below average). Humans are naturally risk-averse, so we want to take care to minimize losses: a $100 loss hurts more than a $100 gain pleases.
2. VaR: Value at Risk is the largest loss we would expect within a given period (1 month, in this case) at a given confidence level (generally 95% or so).
3. ES: Expected Shortfall is the average loss taken in the cases where losses exceed the VaR threshold (again, at a given confidence level). Each of these maps to a simple empirical calculation, as the sketch below shows.
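A minimal hand-rolled version of the three measures on the monthly returns (a sketch; note that the PerformanceAnalytics defaults can differ slightly, e.g. VaR() uses the Cornish-Fisher ‘modified’ estimator rather than the plain historical quantile used here):
r <- as.numeric(ndx_returns)
# Historical 95% VaR: the 5th percentile of returns
var_95 <- quantile(r, 0.05)
# Historical 95% ES: the average return at or below the VaR threshold
es_95 <- mean(r[r <= var_95])
# Semideviation: volatility computed from below-mean observations only
semidev <- sqrt(mean(pmin(r - mean(r), 0)^2))
c(var_95, ES = es_95, SemiDev = semidev)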
# Calculate the SemiDeviation
print(paste0('Monthly Returns - SemiDeviation: ', round(SemiDeviation(ndx_returns), 4)))
## [1] "Monthly Returns - SemiDeviation: 0.0521"
# Calculate the value at risk
print(paste0('VaR - 95%: ', round(VaR(ndx_returns, p = 0.95), 4))) # default
## [1] "VaR - 95%: -0.1159"
print(paste0('VaR - 99%: ', round(VaR(ndx_returns, p = 0.99), 4)))
## [1] "VaR - 99%: -0.2041"
# Calculate the expected shortfall
print(paste0('ES - 95%: ', round(ES(ndx_returns, p = 0.95), 4))) # default
## [1] "ES - 95%: -0.1785"
print(paste0('ES - 99%: ', round(ES(ndx_returns, p = 0.99), 4)))
## [1] "ES - 99%: -0.2385"
Above are our VaR and ES values calculated for confidence levels of 95% and 99%, for reference.
Drawdowns
# Table of drawdowns
print(table.Drawdowns(ndx_returns))
## From Trough To Depth Length To Trough Recovery
## 1 2000-04-01 2002-09-01 2015-02-01 -0.8107 179 30 149
## 2 2015-12-01 2016-02-01 2016-07-01 -0.0993 8 3 5
## 3 2015-08-01 2015-09-01 2015-10-01 -0.0889 3 2 1
## 4 2018-02-01 2018-03-01 2018-05-01 -0.0531 4 2 2
## 5 2015-06-01 2015-06-01 2015-07-01 -0.0247 2 1 1
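As an aside, a drawdown is just the decline from the running peak of cumulative wealth, so the headline number is easy to recover by hand (a sketch, alongside the PerformanceAnalytics one-liner):
# Cumulative wealth of $1 invested, and the decline from its running peak
wealth <- cumprod(1 + as.numeric(ndx_returns))
dd <- wealth / cummax(wealth) - 1
min(dd)                  # should match the -0.8107 depth in the table above
maxDrawdown(ndx_returns) # PerformanceAnalytics equivalent (positive magnitude)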
# Plot drawdowns
chart.Drawdown(ndx_returns)
The fall from its high peak in 2000 makes the rest of the data difficult to judge, so let’s look at drawdowns beginning in the year 2002.
# Table of drawdowns
print(table.Drawdowns(ndx_returns['2002/2018']))
## From Trough To Depth Length To Trough Recovery
## 1 2007-11-01 2009-02-01 2011-01-01 -0.5011 39 16 23
## 2 2002-01-01 2002-09-01 2004-12-01 -0.4721 36 9 27
## 3 2005-01-01 2005-04-01 2005-11-01 -0.1236 11 4 7
## 4 2006-02-01 2006-07-01 2006-10-01 -0.1177 9 6 3
## 5 2011-05-01 2011-09-01 2012-01-01 -0.1102 9 5 4
# Plot drawdowns
chart.Drawdown(ndx_returns['2002/2018'])
Now that we’ve done some risk evaluation, let’s take a look at some predictions of the future.
ARIMA Model-fitting
Looking at a few ARIMA models and selecting by hand.
Check for stationarity
library(tseries)
adf.test(ndx_returns)
## Warning in adf.test(ndx_returns): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ndx_returns
## Dickey-Fuller = -5.2781, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
Cool. We can move on to ARIMA modelling without differencing.
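The forecast package can confirm this with its own unit-root testing (a sketch; ndiffs() estimates how many differences are needed for stationarity):
# Expect 0 here, i.e. no differencing required
ndiffs(as.numeric(ndx_returns))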
But just for fun: had the series been non-stationary, we could compare ndx_returns and its differences visually:
# Plot ndx_returns and diff(ndx_returns)
plot(ndx_returns, main = 'NDX Returns')
plot(diff(ndx_returns), main = 'NDX Returns - Differences')
# Plot P/ACF for ndx_returns and diff(ndx_returns)
acf2(ndx_returns)
## ACF PACF
## [1,] 0.07 0.07
## [2,] -0.09 -0.09
## [3,] 0.02 0.03
## [4,] 0.01 -0.01
## [5,] 0.07 0.08
## [6,] 0.05 0.04
## [7,] 0.05 0.06
## [8,] -0.08 -0.08
## [9,] -0.04 -0.02
## [10,] 0.12 0.11
## [11,] 0.06 0.04
## [12,] -0.02 -0.02
## [13,] -0.14 -0.14
## [14,] -0.04 -0.01
## [15,] 0.02 0.00
## [16,] 0.11 0.11
## [17,] 0.00 -0.03
## [18,] 0.00 0.05
## [19,] 0.10 0.13
## [20,] -0.11 -0.13
## [21,] 0.09 0.09
## [22,] 0.03 -0.03
## [23,] -0.04 0.00
## [24,] -0.04 -0.04
## [25,] -0.03 -0.03
acf2(diff(ndx_returns))
## ACF PACF
## [1,] -0.41 -0.41
## [2,] -0.12 -0.35
## [3,] 0.06 -0.20
## [4,] -0.07 -0.24
## [5,] 0.07 -0.14
## [6,] -0.03 -0.15
## [7,] 0.10 0.03
## [8,] -0.09 -0.05
## [9,] -0.05 -0.11
## [10,] 0.10 -0.02
## [11,] 0.00 0.03
## [12,] 0.06 0.14
## [13,] -0.14 -0.01
## [14,] -0.01 -0.07
## [15,] 0.00 -0.13
## [16,] 0.11 0.03
## [17,] -0.05 -0.05
## [18,] -0.05 -0.10
## [19,] 0.18 0.16
## [20,] -0.27 -0.11
## [21,] 0.14 0.02
## [22,] 0.02 -0.02
## [23,] -0.04 0.02
## [24,] 0.00 0.01
## [25,] 0.00 0.08
ARIMA model fitting
auto.arima(ndx_returns)
## Series: ndx_returns
## ARIMA(1,1,0) with drift
##
## Coefficients:
## ar1 drift
## -0.4107 -0.0005
## s.e. 0.0620 0.0041
##
## sigma^2 estimated as 0.007313: log likelihood=228.69
## AIC=-451.39 AICc=-451.28 BIC=-441.22
ndx_fit110 <- auto.arima(ndx_returns)
NOTE:
I like to look at our options for model parameters manually as well, partly to visualize residuals and p-values, and partly to understand more deeply what I am fitting.
The forecast package’s auto.arima() is nice because it catches parameters that may cause issues down the line, for instance coefficients on the non-invertibility boundary.
For example, I liked this ARIMA(0, 1, 1) configuration (an MA(1) on the differences),
ndx_fit011 <- sarima(ndx_returns, 0, 1, 1)
## initial value -2.371907
## iter 2 value -2.509335
## iter 3 value -2.557740
## iter 4 value -2.583791
## iter 5 value -2.585557
## iter 6 value -2.585819
## iter 7 value -2.585984
## iter 8 value -2.586220
## iter 9 value -2.586290
## iter 10 value -2.586292
## iter 10 value -2.586292
## iter 10 value -2.586292
## final value -2.586292
## converged
## initial value -2.620560
## iter 2 value -2.631609
## iter 3 value -2.642425
## iter 4 value -2.667448
## iter 5 value -2.668896
## iter 6 value -2.669901
## iter 7 value -2.670370
## iter 8 value -2.670377
## iter 9 value -2.670381
## iter 9 value -2.670381
## iter 9 value -2.670381
## final value -2.670381
## converged
as it had the best (lowest) AIC/BIC values of any configuration I tested (I won’t show them all here). But then you notice the MA coefficient of -1, which could lead to complications and inaccuracies.
I do like the output format from the astsa package, however.
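To see the boundary issue directly, the underlying stats::arima fit is stored in the sarima result (visible as $fit in the sarima output shown below), so the coefficient can be inspected:
# Inspect the estimated MA(1) coefficient from the sarima() fit
ndx_fit011$fit$coef
# An ma1 estimate at (or extremely near) -1 sits on the invertibility
# boundary, where standard errors and forecasts become unreliable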
# Using our auto-fitted model
sarima(ndx_returns, 1, 1, 0)
## initial value -2.376713
## iter 2 value -2.468443
## iter 3 value -2.468461
## iter 4 value -2.468461
## iter 4 value -2.468461
## iter 4 value -2.468461
## final value -2.468461
## converged
## initial value -2.463184
## iter 2 value -2.463207
## iter 3 value -2.463207
## iter 3 value -2.463207
## iter 3 value -2.463207
## final value -2.463207
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, optim.control = list(trace = trc, REPORT = 1,
## reltol = tol))
##
## Coefficients:
## ar1 constant
## -0.4107 -0.0005
## s.e. 0.0620 0.0041
##
## sigma^2 estimated as 0.007246: log likelihood = 228.69, aic = -451.39
##
## $degrees_of_freedom
## [1] 217
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.4107 0.0620 -6.6286 0.000
## constant -0.0005 0.0041 -0.1208 0.904
##
## $AIC
## [1] -3.909076
##
## $AICc
## [1] -3.89948
##
## $BIC
## [1] -4.878225
This lets us view our diagnostics together, and provides the t-table and information criteria as output variables.
Forecasting
Now we can quickly make a forecast of the NDX returns trend using our (ARI) model
fcast <- forecast(ndx_fit110, h = 12)
plot(fcast)
# Just another way to do it, with astsa
# (here for 24 months into the future):
#sarima.for(ndx_returns, n.ahead = 24, p = 1, d = 1, q = 0)
Compare model to a hold-out set
Actually, I want to compare a couple of models here. Mostly for the sake of example, and to be thorough, I am going to compare the auto.arima()-optimized ARIMA(1, 1, 0) with the ARIMA(0, 1, 1) that had the better information criteria.
# Set the hold-out period (the last 25 observations, roughly the final two years)
hold <- window(ts(ndx_returns), start = 196)
# Fit our two models (excluding the hold-out observations) to compare
ndx_fit110_nh <- arima(ts(ndx_returns[-c(196:220)]), order = c(1, 1, 0))
ndx_fit011_nh <- arima(ts(ndx_returns[-c(196:220)]), order = c(0, 1, 1))
fcast_110_nh <- forecast(ndx_fit110_nh, h = 24)
fcast_011_nh <- forecast(ndx_fit011_nh, h = 24)
par(mfrow = c(2, 1) , mar=c(2, 2, 2, 1))
plot(fcast_110_nh, main='autofit - ARIMA(1,1,0)', ylim = c(-0.6, 0.6))
lines(ts(ndx_returns))
plot(fcast_011_nh, main='AIC selection - ARIMA(0,1,1)', ylim = c(-0.6, 0.6))
lines(ts(ndx_returns))
So this is just one example, and we have a pretty calm time series to work with here, but we can see right off the bat that the major difference between our forecasts is the degree to which each accounts for volatility.
Seeing as we are looking at stock trends I’d say this comparison is just one more reason to go with the autofit model, as it is more likely to account for sudden jumps or drops in NDX returns.
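To go beyond eyeballing the plots, forecast::accuracy() can score both models against the held-out observations (a sketch; hold has 25 observations, so we trim it to the 24-step forecast horizon):
# Out-of-sample error measures (RMSE, MAE, etc.) for each model
accuracy(fcast_110_nh, hold[1:24])
accuracy(fcast_011_nh, hold[1:24])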
Perhaps a bit more interesting(?)
An abridged example of forecasting Apple (AAPL) stock returns.
aapl <- getSymbols('AAPL', auto.assign = FALSE)[, 6]  # column 6 is AAPL.Adjusted
str(aapl)
## An 'xts' object on 2007-01-03/2018-05-14 containing:
## Data: num [1:2861, 1] 8.07 8.25 8.19 8.23 8.92 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "AAPL.Adjusted"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2018-05-15 18:06:14"
plot(aapl)
aapl_returns <- Return.calculate(aapl)[-1, ]  # drop the leading NA
aapl_fit <- auto.arima(diff(aapl_returns))
aapl_fcast <- forecast(aapl_fit, h = 12)
plot(aapl_fcast)  # note: plotting the AAPL forecast, not the earlier NDX one
Of course, what we are seeking here is the separation of trend from noise. Since we are dealing with variation in asset prices (and returns) over time, there is bound to be some error. But our intention in doing univariate analysis on individual returns before deciding whether to invest is to determine whether the behavior of the asset’s returns is non-random and, assuming there is a trend, in what direction it is headed, how quickly, and whether it has staying power.
The thing with predicting stocks is that in the short term it is hard to say what will happen, given arbitrage and the fact that available information is supposedly already captured in market prices. As such, we need to make the best long-term predictions we can on a number of assets simultaneously, in order to capture maximum gains (returns, given the unknown) while reducing risk (volatility) in case we are wrong in one or two places.
But in order to build that full, well-rounded, risk-averse portfolio, it pays to look at and model the trends of constituent assets individually, which we have managed with the above.