Preliminary Analysis and Visualization (DVA)

Retrieving Financial Data and Preparing Returns

Retrieving Symbols for a potential diversified portfolio.

I’ve deliberately picked 20 assets: stocks from a few GICS sectors (with an emphasis on IT and energy), several ETFs spanning small, mid and large caps, the iShares 1-3 Year Treasury Bond ETF (SHY), and a couple of potential shorts.

symbols <- c('IJR', 'USO', 'SZK', 'XBI', 'HACK', 'FSLR', 'USD', 'IWF', 'GE',
             'JKI', 'ATVI', 'VYM', 'FORM', 'SHY', 'CTRN', 'WGL', 'BSCI',
             'AMZN', 'FDX', 'ROST')

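library(quantmod)              # getSymbols()
library(PerformanceAnalytics)  # Return.calculate() and the chart.* functions

# Download each symbol and keep column 6 (the adjusted close)
dva_prices <- NULL
for (sym in symbols) {
        dva_prices <- cbind(dva_prices, getSymbols(sym, from='2015-01-01',
                                           auto.assign=FALSE)[,6])
}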

# Check which dates have a missing value in any column
index(dva_prices)[rowSums(is.na(dva_prices)) > 0]

## Date of length 0

The check returns no dates, so there is nothing to fill in this run. If a handful of values were missing, we would fill them by cubic spline interpolation, which is what the following call does:

dva_prices <- na.spline(dva_prices)
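
To make the interpolation step concrete, here is a minimal base-R sketch on a made-up price vector. It assumes a plain numeric series with evenly spaced observations and uses stats::spline directly; it is meant as an illustration of cubic spline filling, not as a reproduction of what na.spline does internally.

```r
# Toy price series with two gaps (made-up numbers, not taken from dva_prices)
prices <- c(100, 101, NA, 104, 103, NA, 106)
t_idx  <- seq_along(prices)
known  <- !is.na(prices)

# Fit a cubic spline through the known points and evaluate it at the gaps
filled <- prices
filled[!known] <- spline(t_idx[known], prices[known],
                         xout = t_idx[!known], method = "fmm")$y

print(round(filled, 2))
```

One design caveat: a spline can overshoot around sharp moves, so for longer gaps a linear fill or carrying the last observation forward may be the more conservative choice.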

Rename columns and calculate returns from adjusted closing prices

colnames(dva_prices) <- symbols

dva_returns <- Return.calculate(dva_prices)
dva_returns <- dva_returns[(-1),]
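
For reference, Return.calculate computes simple (discrete) returns by default, i.e. each price divided by the previous one, minus one. The sketch below does the same arithmetic by hand on three hypothetical prices; the first observation has no predecessor, which is why the first row of dva_returns is dropped above.

```r
# Hypothetical adjusted closing prices for one asset
prices <- c(100, 110, 99)

# Simple return: P_t / P_{t-1} - 1
simple_returns <- diff(prices) / head(prices, -1)

# Log returns, for comparison: log(P_t) - log(P_{t-1})
log_returns <- diff(log(prices))

print(simple_returns)
print(round(log_returns, 4))
```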

Preliminary Analysis

We will begin by looking at individual asset return means and volatilities (mean, sd), followed by multivariate relationships (correlation, covariance). This step generally follows individual asset time series analysis (see the NDX Returns sample, below this portfolio optimization module), but it could just as well be part of the asset selection process, since correlations between assets are crucial when deciding what to include in a portfolio in the first place. Were I actually responsible for choosing the assets in a client’s portfolio, the univariate and multivariate analyses would be more fluid and would feed back into each other; there isn’t a set-in-stone ‘Step 1. Step 2. …’ order to the process. Even if I like the look of my near-future forecasts for an individual asset, I may elect not to include it if it shows rather high covariance with another asset I have already decided on, out of volatility considerations, or I may decide that one or both deserve especially strict weight constraints.
Anyway, my point is that the format you see below, and across my posted projects in general, follows such a straightforward course for the sake of presentation and succinctness. If I may, this is a manufactured canal of data science that I have constructed for quick and efficient transport of comprehension, whereas a true and comprehensive analysis would take the form of a meandering river cutting through the uneven terrain of what, which, where and when.

Enough of the chatter. Let’s visualize some relationships, shall we?

Plot all means vs. StdDevs

# Create a vector of means 
means <- colMeans(dva_returns)
# Create a vector of standard deviations
sds <- apply(dva_returns, 2, 'sd')

# Create a scatter plot means~sds
plot(sds, means, main = 'Individual Asset Means vs. StdDevs')
text(sds, means, labels = colnames(dva_returns), pos = 1, cex = 0.9, col = 'blue')
abline(h = 0, lty = 3)

We could fix some of the clutter, but we’re really looking for the outliers anyway. Note the high mean:sd ratios of Amazon (probably not surprising anyone) and Activision Blizzard. On the other hand, SZK (ProShares UltraShort Consumer Goods) combines high volatility with a relatively low ratio.
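
If reading the ratios off the scatter plot gets tedious, they can also be ranked directly. The sketch below uses simulated returns for three hypothetical assets (A, B, C) so that it runs on its own; with the real data, toy_returns would be replaced by dva_returns.

```r
set.seed(1)

# Simulated daily returns for three hypothetical assets (not real data)
toy_returns <- cbind(A = rnorm(250, 0.0010, 0.01),
                     B = rnorm(250, 0.0005, 0.02),
                     C = rnorm(250, 0.0000, 0.03))

# Mean-to-standard-deviation ratio per asset, highest first
ratio  <- colMeans(toy_returns) / apply(toy_returns, 2, sd)
ranked <- sort(ratio, decreasing = TRUE)

print(round(ranked, 3))
```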

Visualizing correlations

While a bit large, the correlation plot below visualizes the correlation matrix for all symbols in our portfolio.

library(corrplot)

## corrplot 0.84 loaded

cor_matrix <- cor(dva_returns)
corrplot(cor_matrix, method = 'circle')

At a glance, we can use the plot above to decide which multivariate relationships we want to investigate further.
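
With 20 symbols, eyeballing the plot can be complemented by simply listing the most correlated pairs. This is a small sketch on simulated data (the assets X, Y, Z and the data frame layout are my own choices for illustration); swapping in cor(dva_returns) for cm should give the same kind of table for the real portfolio.

```r
set.seed(42)

# Two hypothetical assets driven by a common factor, plus one independent asset
common <- rnorm(300)
toy_returns <- cbind(X = common + rnorm(300, sd = 0.3),
                     Y = common + rnorm(300, sd = 0.3),
                     Z = rnorm(300))
cm <- cor(toy_returns)

# Keep each pair once by looking only at the upper triangle
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(asset1 = rownames(cm)[idx[, "row"]],
                    asset2 = colnames(cm)[idx[, "col"]],
                    corr   = cm[idx],
                    stringsAsFactors = FALSE)

# Strongest relationships (positive or negative) first
pairs <- pairs[order(-abs(pairs$corr)), ]
print(pairs)
```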

Bivariate Correlations

For instance, if we wanted to look at the relationship between two particularly correlated assets, we could extract them individually. It is, of course, possible to plot the relationships of all assets at once in a matrix of plots, but that gets pretty difficult to read with this many symbols.

So, for instance, IJR (iShares S&P SmallCap600 Index (ETF)) and JKI (iShares Morningstar Mid Value Index (ETF)) are strongly correlated.

chart.Correlation(dva_returns[, c('IJR', 'JKI')])

We can replace the assets to be plotted with any pair from our portfolio:

# e.g.
chart.Correlation(dva_returns[, c('AMZN', 'ROST')])

Assets that consistently move together can be liabilities in a portfolio: high correlation over time means that if a shock suddenly brings down one asset, a highly correlated asset will likely come down with it (it doesn’t have to be a major shock either; we’re just talking negative returns). That’s why we have diversification and hedging in the first place. Again, for the sake of conciseness and presentation, I will not be modifying the list of assets throughout this sample analysis and optimization. It’d be no fun analyzing a ‘corr-less’ set of assets anyway, and there’s potential later on to look at long-short combinations to engineer hedging using correlated assets.
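
The effect of correlation on risk can be seen with the textbook two-asset formula, sd_p = sqrt(w1^2 sd1^2 + w2^2 sd2^2 + 2 w1 w2 rho sd1 sd2). The sketch below assumes a 50/50 split between two hypothetical assets that each have a 2% daily standard deviation; port_sd is a helper written for this example only.

```r
# Portfolio standard deviation for two assets with weights w1 and 1 - w1
port_sd <- function(w1, sd1, sd2, rho) {
  w2 <- 1 - w1
  variance <- w1^2 * sd1^2 + w2^2 * sd2^2 + 2 * w1 * w2 * rho * sd1 * sd2
  sqrt(pmax(variance, 0))  # guard against tiny negative rounding errors
}

rhos <- c(-1, 0, 0.5, 0.9, 1)
vols <- sapply(rhos, function(r) port_sd(0.5, 0.02, 0.02, r))

print(data.frame(rho = rhos, portfolio_sd = round(vols, 4)))
```

As the correlation approaches 1, the portfolio's volatility approaches that of the individual assets and the diversification benefit disappears.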

Asset returns visualizations

Here we use some of PerformanceAnalytics’ built-in functions for visualization.

Rolling Performance

charts.PerformanceSummary(dva_returns, main = 'dva_returns Performance (ALL)')

Cluttered and hard to draw much from this, so say we want to compare just a couple assets:
(SZK = ProShares UltraShort Consumer Goods ETF, ATVI = Activision Blizzard)

charts.PerformanceSummary(dva_returns[,c('SZK','ATVI')], main = 'SZK vs. ATVI Performance')

We can also look at the distributions of returns of individual assets.

chart.Boxplot(dva_returns)

And again, if we want a closer look to compare a few assets’ distributions (AMZN = Amazon.com, Inc., CTRN = Citi Trends, Inc., ROST = Ross Stores, Inc.):

chart.Boxplot(dva_returns[,c('AMZN', 'CTRN', 'ROST')], main ='Return Distributions: Retail Players')

At this point it would make sense to investigate further any individual assets that catch our eye. That is outside the scope of this portfolio optimization demonstration, but check out the NASDAQ-100 univariate analysis module below, or better yet some of my other projects in which I have done more extensive univariate analysis.

Efficient Frontier and Visualization of Portfolio Options

Next I’m gonna look at some potential portfolios using the efficient frontier concept. I referenced this guy Frank’s blog for some plotting ideas. Though I won’t go into finance with R/Python as deep as he does (not yet, anyway ;)), I’m sure I’ll check out some more of his projects down the line, and I recommend it to anyone curious about how you can combine your passions to create a fun and productivity-enhancing work of art. http://programmingforfinance.com/
But I digress… The following code uses R’s fPortfolio package, which is not my favorite package for flexible portfolio optimization for a few reasons, but it does lend itself to some nice visualization options.
So the following is a valid way to look at different optimized portfolios, but I include a more detailed optimization process below in my ‘TXG Portfolio Optimization’ Markdown using the PortfolioAnalytics package.

(atm there is no PortfolioAnalytics version on CRAN for R 3.5.0; apparently the maintainers have been a bit slow to respond, so I may be using an older version of the package, depending on when you read this)

fPortfolio built-in frontier viz:

library(fPortfolio)  # portfolioFrontier(), minvariancePortfolio(), tangencyPortfolio()

# Time series version of dva_returns
dva_returns_ts <- as.timeSeries(dva_returns)
# Create and plot fPortfolio efficient frontier
dva_effFrontier <- portfolioFrontier(dva_returns_ts)
plot(dva_effFrontier, c(1, 2, 3, 5, 7))

We can choose exactly what to show on the plot from the options fPortfolio offers. Above, the frontier portfolios are displayed along with the min-risk portfolio (in red), the tangency portfolio, the equal-weights portfolio (blue square), and the Monte Carlo portfolios (smaller black points).
The thing is, while it may sometimes be useful to visualize the efficient frontier (and I think it is good for presentation to a non-technical audience), in this and many other cases investors may not be satisfied with the tangency or minimum-risk portfolio. The diversified set of assets we are working with today has relatively low risk for its return, so (depending on volatility and return targets) the more desirable portfolios will likely be further off to the right.
That said, fPortfolio does, of course, allow you to set constraints and/or pass a portfolio specification object to the portfolioFrontier function. However, as a personal preference I like to use PortfolioAnalytics to play with specification and optimization (see the full optimization markdown), so with fPortfolio here we’ll just look at visualizations of the minimum-risk and tangency portfolios, for the sake of example.
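
For intuition about what the minimum-risk point on the frontier represents, here is a base-R sketch of the textbook closed-form global minimum-variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). It assumes a fully invested portfolio with shorting allowed and no weight caps, and it runs on simulated returns for three hypothetical assets, so the numbers will not match what fPortfolio's constrained solver produces for the real data.

```r
set.seed(7)

# Simulated daily returns for three hypothetical assets (not real data)
toy_returns <- cbind(A = rnorm(500, 0.0005, 0.010),
                     B = rnorm(500, 0.0008, 0.020),
                     C = rnorm(500, 0.0003, 0.005))
Sigma <- cov(toy_returns)
ones  <- rep(1, ncol(toy_returns))

# Solve Sigma %*% w_raw = 1, then rescale so the weights sum to 1
w_raw <- solve(Sigma, ones)
w_gmv <- w_raw / sum(w_raw)
names(w_gmv) <- colnames(toy_returns)

# Compare portfolio variance against a naive equal-weight portfolio
port_var <- function(w) as.numeric(t(w) %*% Sigma %*% w)
print(round(w_gmv, 3))
print(c(gmv = port_var(w_gmv), equal = port_var(ones / 3)))
```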

Frontier Weights

# get allocations for each instrument for each point on the efficient frontier
frontierWeights <- getWeights(dva_effFrontier) 
colnames(frontierWeights) <- symbols
risk_return <- frontierPoints(dva_effFrontier)
barplot(t(frontierWeights), main='Frontier Weights', col=rainbow(ncol(frontierWeights)+2), legend=colnames(frontierWeights))

MIN VAR PORTF WEIGHTS (cap of 10%)

mvp <- minvariancePortfolio(dva_returns_ts, spec=portfolioSpec(), constraints='maxW[1:20] = 0.1')
mvp

## 
## Title:
##  MV Minimum Variance Portfolio 
##  Estimator:         covEstimator 
##  Solver:            solveRquadprog 
##  Optimize:          minRisk 
##  Constraints:       maxW 
## 
## Portfolio Weights:
##    IJR    USO    SZK    XBI   HACK   FSLR    USD    IWF     GE    JKI 
## 0.1000 0.0224 0.0377 0.0000 0.0150 0.0000 0.0000 0.1000 0.0814 0.1000 
##   ATVI    VYM   FORM    SHY   CTRN    WGL   BSCI   AMZN    FDX   ROST 
## 0.0079 0.1000 0.0000 0.1000 0.0055 0.1000 0.1000 0.0325 0.0302 0.0676 
## 
## Covariance Risk Budgets:
##     IJR     USO     SZK     XBI    HACK    FSLR     USD     IWF      GE 
##  0.1338  0.0312  0.0526  0.0000  0.0209  0.0000  0.0000  0.1185  0.1135 
##     JKI    ATVI     VYM    FORM     SHY    CTRN     WGL    BSCI    AMZN 
##  0.1217  0.0110  0.1121  0.0000 -0.0021  0.0077  0.0966  0.0008  0.0454 
##     FDX    ROST 
##  0.0421  0.0943 
## 
## Target Returns and Risks:
##   mean    Cov   CVaR    VaR 
## 0.0005 0.0062 0.0143 0.0096 
## 
## Description:
##  Wed May 23 14:56:45 2018 by user: Wesley
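
The ‘Covariance Risk Budgets’ in the printout can be read as each asset's share of total portfolio variance, which is why they sum to (roughly) 1. The sketch below assumes the standard definition, budget_i = w_i (Σw)_i / (wᵀΣw), and uses simulated returns and hypothetical weights; I have not checked it line by line against fPortfolio's own code.

```r
set.seed(3)

# Simulated daily returns and hypothetical weights (not the portfolio above)
toy_returns <- cbind(A = rnorm(500, 0, 0.010),
                     B = rnorm(500, 0, 0.020),
                     C = rnorm(500, 0, 0.005))
Sigma <- cov(toy_returns)
w <- c(A = 0.5, B = 0.3, C = 0.2)

# Each asset's contribution to portfolio variance, as a share of the total
marginal <- as.numeric(Sigma %*% w)
budgets  <- w * marginal / sum(w * marginal)

print(round(budgets, 4))
```

Under this definition a budget is negative when the asset's covariance with the overall portfolio is negative, which would explain the small negative entry for SHY.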

TANGENCY PORTF WEIGHTS (cap of 10%)

tangencyPort <- tangencyPortfolio(dva_returns_ts, spec=portfolioSpec(), constraints='maxW[1:20] = 0.1')
tangencyPort

## 
## Title:
##  MV Tangency Portfolio 
##  Estimator:         covEstimator 
##  Solver:            solveRquadprog 
##  Optimize:          minRisk 
##  Constraints:       maxW 
## 
## Portfolio Weights:
##    IJR    USO    SZK    XBI   HACK   FSLR    USD    IWF     GE    JKI 
## 0.1000 0.0000 0.0531 0.0000 0.0000 0.0000 0.0000 0.1000 0.0000 0.0451 
##   ATVI    VYM   FORM    SHY   CTRN    WGL   BSCI   AMZN    FDX   ROST 
## 0.1000 0.0221 0.0186 0.1000 0.0120 0.1000 0.1000 0.1000 0.0491 0.1000 
## 
## Covariance Risk Budgets:
##     IJR     USO     SZK     XBI    HACK    FSLR     USD     IWF      GE 
##  0.1060  0.0000  0.0985  0.0000  0.0000  0.0000  0.0000  0.1017  0.0000 
##     JKI    ATVI     VYM    FORM     SHY    CTRN     WGL    BSCI    AMZN 
##  0.0408  0.1779  0.0191  0.0295 -0.0017  0.0145  0.0788  0.0006  0.1603 
##     FDX    ROST 
##  0.0538  0.1202 
## 
## Target Returns and Risks:
##   mean    Cov   CVaR    VaR 
## 0.0008 0.0069 0.0156 0.0112 
## 
## Description:
##  Wed May 23 14:56:45 2018 by user: Wesley
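
As a quick sanity check on the two printouts, we can compare their return per unit of risk. The sketch below copies the rounded ‘mean’ and ‘Cov’ figures from the output above, reads ‘Cov’ as the portfolio's daily standard deviation (my interpretation), and assumes a risk-free rate of zero.

```r
# Rounded daily figures copied from the two portfolio printouts
mvp_mean <- 0.0005; mvp_sd <- 0.0062
tan_mean <- 0.0008; tan_sd <- 0.0069
rf <- 0  # assumed daily risk-free rate

# Sharpe-style ratio: excess mean return divided by standard deviation
sharpe <- c(min_variance = (mvp_mean - rf) / mvp_sd,
            tangency     = (tan_mean - rf) / tan_sd)

print(round(sharpe, 3))
```

The tangency portfolio accepts slightly more risk in exchange for a better ratio, which is what we would expect from a portfolio chosen to maximize it.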

Extracting measures of VaR (Value at Risk)

mvpweights <- getWeights(mvp)
tangencyweights <- getWeights(tangencyPort)

# Extract covariance risk (standard deviation), VaR and CVaR at the 5% level
covRisk(dva_returns_ts, mvpweights)

##         Cov 
## 0.006153309

varRisk(dva_returns_ts, mvpweights, alpha = 0.05)

##       VaR.5% 
## -0.009647651

cvarRisk(dva_returns_ts, mvpweights, alpha = 0.05)

##     CVaR.5% 
## -0.01429125

Minimum-Variance Portfolio Weights

# Plot MVP weights with base graphics; weights are scaled to percent to match the axis label
barplot(mvpweights * 100, names.arg=symbols, main='Minimum Variance Portfolio Weights', xlab='Asset', ylab='Weight In Portfolio (%)', col=rainbow(length(mvpweights)+2), las=2)

Minimum-Variance PF and Tangency PF Weights with ggplot2

library(ggplot2)
# ggplot MVP weights: keep labels and weights (in percent) together in one data frame
df <- data.frame(assets = symbols, weight = as.numeric(mvpweights) * 100)
ggplot(data=df, aes(x=assets, y=weight, fill=assets)) +
        geom_bar(stat='identity', position=position_dodge(),colour='black') +
        geom_text(aes(label=sprintf('%.02f %%',weight)),
                  position=position_dodge(width=0.9), vjust=-0.25, check_overlap = TRUE) +
        ggtitle('Minimum Variance Portfolio Optimal Weights')+ theme(plot.title = element_text(hjust = 0.5)) +
        labs(x= 'Assets', y = 'Weight (%)')

# Same plot for the tangency portfolio weights (in percent)
dft <- data.frame(assets = symbols, weight = as.numeric(tangencyweights) * 100)
ggplot(data=dft, aes(x=assets, y=weight, fill=assets)) +
        geom_bar(stat='identity', position=position_dodge(),colour='black') +
        geom_text(aes(label=sprintf('%.02f %%',weight)),
                  position=position_dodge(width=0.9), vjust=-0.25, check_overlap = TRUE) +
        ggtitle('Tangency Portfolio Weights')+ theme(plot.title = element_text(hjust = 0.5)) +
        labs(x= 'Assets', y = 'Weight (%)')

NOTES
Again, the above is part of the preliminary analysis and preparation process for optimization. Obviously if we intend to develop a return-maximizing, risk-averse portfolio using all the assets we have specifically chosen, we’re not just going to pick the min-variance or tangency portfolio and toss any assets that come up at 0%. As such, the true, optimized portfolios that we end up seriously considering will be produced in the next module. The above visualizations do, however, give us a good starting idea of what sorts of assets we might focus on for minimizing risk and selecting a theoretical optimal portfolio given our asset set.

Once we have determined the assets we hope to include, checked correlation/covariance and return/risk relationships at the asset and portfolio levels, and done some preliminary optimal portfolio visualization, we are ready to move on to the careful selection of parameters (namely, constraints and objectives) that we will abide by when optimizing our final portfolio with a more sophisticated method. This is the topic of the following module, ‘Portfolio Optimization’.