1 All Seasons Portfolio Part 2

This is my third post connecting what I am learning with my personal passion.

2 Business Science Problem Framework (BSPF)

  1. How To Successfully Manage A Data Science Project: The Business Science Problem Framework from Matt Dancho’s Business Science University (https://www.business-science.io/business/2018/06/19/business-science-problem-framework.html)

  2. Principles by Ray Dalio (https://www.principles.com/)

  3. Reading the wealth of articles on Reproducible Finance with R: Code Flows and Shiny Apps for Portfolio Analysis by Jonathan K. Regenstein (http://www.reproduciblefinance.com/)

Credits go to both Matt Dancho and Jonathan K. Regenstein for making this post possible: Matt for educating me and creating the tidyquant package, and Jonathan for putting together a book for beginners.

3 Business Understanding

3.1 Developing a Portfolio based on Ray Dalio’s All Weather Fund

In Tony Robbins’s book Money: Master the Game, I learned about the All Weather Fund and about personal investing for the general public. Since then, I have taken a deeper dive into this portfolio and its asset classes to determine whether data science can help me unlock additional financial benefits.

Objective and Key Result

  • Objective: Optimize an investment portfolio based on Ray Dalio’s All Weather Fund

  • Hypothesis and Key Result: An asset weighting with a better return per unit of risk exists beyond what Ray Dalio has prescribed

3.2 Breakdown of All Weather Fund

  1. 40% Long Term Bonds (TLT)
  2. 30% Stocks (VTI)
  3. 15% Intermediate Term Bonds (IEF)
  4. 7.5% Gold (GLD)
  5. 7.5% Commodities (DBC)
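
For reference, the same allocation can be expressed as the ticker and weight vectors that the rest of the code in this post assumes (the variable names symbols and w are my own):

# All Weather tickers and weights, in the order listed above
symbols <- c("TLT", "VTI", "IEF", "GLD", "DBC")
w       <- c(0.40, 0.30, 0.15, 0.075, 0.075)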

4 My Workflow

Here’s a breakdown of the workflow I used to create the All Seasons Portfolio:

  1. Collecting Data: Source a reproducible function to import, transform and build a stock portfolio using the tidyquant package. Create a second function to pull the Fama French factors.

  2. Visualize Data: Visualize the data to understand the correlation of the Fama French factors to the portfolio

  3. Volatility of Portfolio: Chart a comparison of asset and portfolio standard deviations

  4. Modelling: Forecast the portfolio returns with machine learning using the h2o package

  5. Tuning: An initial dive into tuning the hyperparameters of a deep learning algorithm

5 Walkthrough - Building the All Weather Fund Portfolio with tidyquant

5.1 Import Data

tq_get() grabs five years of daily stock prices from Yahoo Finance, which are then converted to monthly returns.
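
Below is a minimal sketch of that import step, assuming the symbols and w vectors from Section 3.2; the original workflow wraps this in a reproducible function, but the core tidyquant calls look like this:

# Pull 5 years of daily prices and convert each asset to monthly returns
library(tidyverse)
library(tidyquant)

returns_monthly_tbl <- tq_get(symbols,
                              get  = "stock.prices",
                              from = Sys.Date() - lubridate::years(5),
                              to   = Sys.Date()) %>%
    group_by(symbol) %>%
    tq_transmute(select     = adjusted,
                 mutate_fun = periodReturn,
                 period     = "monthly",
                 col_rename = "monthly.returns")

# Aggregate the assets into a single portfolio return series using the weights w
portfolio_returns_tbl <- returns_monthly_tbl %>%
    tq_portfolio(assets_col  = symbol,
                 returns_col = monthly.returns,
                 weights     = w,
                 col_rename  = "monthly.returns")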

6 Visualizing Results

What are the Fama French factors? Briefly, the Fama French model extends the capital asset pricing model (CAPM) by adding variables for market, size, value, investment and profitability. While there is some debate over whether the Three-Factor Model or the Five-Factor Model is better, for the purposes of feeding the machine learning model I used the Five-Factor Model. You will see several abbreviations used, and here is what they stand for:

  • Mkt-RF: the return of the market minus the risk-free rate
  • SMB: Small Minus Big, the size factor
  • HML: High Minus Low, the value factor
  • RMW: Robust Minus Weak, the profitability factor
  • CMA: Conservative Minus Aggressive, the investment factor
  • RF: the risk-free rate
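
As a sketch of the second import function mentioned in the workflow, the five factors can be pulled from Kenneth French’s data library. The URL points at the official monthly Five-Factor CSV; the skip count and row filter are my assumptions about that file’s layout, so verify them against the download:

library(tidyverse)
library(lubridate)

get_ff_5_factors <- function() {
    url  <- paste0("https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/",
                   "ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip")
    temp <- tempfile(fileext = ".zip")
    download.file(url, temp, quiet = TRUE)
    read_csv(unzip(temp, exdir = tempdir()), skip = 3,
             col_types = cols(.default = "c")) %>%
        rename(date = 1) %>%
        filter(str_detect(date, "^[0-9]{6}$")) %>%      # keep the monthly rows
        mutate(date = ymd(paste0(date, "01")),
               across(-date, ~ as.numeric(.x) / 100))   # percent -> decimal
}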

6.1 Correlation Analysis to Fama French Factors

6.1.1 Correlation Funnel to Fama French 5 Factor

Exploratory Data Analysis (EDA) is usually labour intensive. Because of this, I view the correlation funnel as essential to EDA: it streamlines the process of preparing, processing and visualizing a correlation analysis. If there were more factors, the visualization would take on the shape of a funnel. In the following correlation funnel, what’s interesting to note is that Ray Dalio’s All Weather Fund leans more toward the profitability and market factors. As a test, I also performed a linear regression on the factors, and the results looked fairly similar.
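
For the linear regression cross-check mentioned above, a minimal sketch looks like this; Port_FF_5_tbl (the monthly portfolio returns joined to the five factors) appears in the modelling code later, but the factor column names are my assumption:

# Regress portfolio returns on the five factors as a sanity check
ff_lm_fit <- lm(Portfolio ~ Mkt.RF + SMB + HML + RMW + CMA,
                data = na.omit(Port_FF_5_tbl))
summary(ff_lm_fit)   # the largest loadings sit on the market and profitability factors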

7 Portfolio Volatility

7.1 Asset and Portfolio Standard Deviation Comparison

In the following visualization, Ray Dalio’s multi-asset portfolio has lower volatility than the individual indexes. As an experiment, I also pulled in the S&P 500 to compare against the portfolio volatility.

# 5.0 Portfolio Standard Deviation ----
asset_returns_wide <- returns_port_tbl %>%
    filter(symbol != "Portfolio") %>%
    pivot_wider(names_from = "symbol", values_from= "monthly.returns") %>%
    # convert to an xts object with timetk
    timetk::tk_xts(date_var = date)

# 5.1 Portfolio std deviation is 2.05%. Our portfolio has lower volatility
port_std_dev <- asset_returns_wide %>%
    StdDev(weights = w) %>%
    round(.,4) * 100

asset_names <- names(asset_returns_wide)

# 5.2 Asset std deviation is 2-5%


SP500_tbl <- individual_asset_multi_period_data("SPY", end, start, period = "monthly") %>%
    add_column(symbol = "S&P500") %>%
    select(symbol, everything())

asset_std_dev <- returns_port_tbl %>%
    filter(symbol != "Portfolio") %>%
    rbind(SP500_tbl) %>%
    pivot_wider(names_from = "symbol", values_from= "monthly.returns") %>%
    select(-date) %>%
    map_df(~ StdDev(.)) %>%
    round(.,4) *100

std_dev_comparison_tbl <- tibble(Name = c("Portfolio", asset_names, "SP500"),
       Std_Dev = c(port_std_dev, asset_std_dev)) %>%
    unnest() %>%
    mutate(Std_Dev = Std_Dev/100)

std_dev_comparison_tbl %>%
    ggplot(aes(Name, Std_Dev)) +
    geom_point() +
    scale_y_continuous(labels = scales::percent) +
    geom_label(aes(label = scales::percent(round(Std_Dev, 5))), hjust = "inward") +
    labs(title = "Asset and Portfolio Standard Deviation Comparison",
         caption = "Ray Dalio's All Weather Portfolio",
         subtitle = "Portfolio Standard Deviation is 1.76%",
         x = "",
         y = "Standard Deviation") +
    # Single annotation instead of one overplotted label per row
    annotate("label", x = "Portfolio", y = 0.02,
             label = "Portfolio Standard Deviation",
             color = palette_light()[[2]],
             fontface = "bold") +
    theme_tq()

8 Modelling with h2o

8.1 Time Series Forecasting with Machine Learning

Initializing h2o starts my machine learning (ML) journey. The h2o package has been essential, providing a range of ML algorithms, performance metrics and auxiliary functions that make the workflow easy and powerful.

To forecast the portfolio returns, I used h2o.automl() to apply automated machine learning to the time series. In this analysis, I used four years of data as the training set and 2019 data as the testing set.
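
A hedged sketch of that AutoML step follows. h2o.automl() is the real h2o entry point, but the train/test tibble names and the run-time budget are my assumptions (test_tbl_h2o reappears in the prediction code below):

library(h2o)
h2o.init()

# Convert the train (2015-2018) and test (2019) splits to H2O frames
train_h2o    <- as.h2o(train_tbl)
test_tbl_h2o <- as.h2o(test_tbl)

# Predict portfolio returns from everything except the date column
y <- "Portfolio"
x <- setdiff(names(train_h2o), c(y, "date"))

portfolio_automl_models_h2o <- h2o.automl(
    x = x, y = y,
    training_frame    = train_h2o,
    leaderboard_frame = test_tbl_h2o,
    max_runtime_secs  = 300
)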

To compare models, I calculated a few residual metrics. These metrics help evaluate the accuracy of a model on the testing dataset. Typically, MAPE is the more popular metric; however, it does not perform well when the actual values are small (i.e. percentage returns), which inflates MAPE dramatically. As a result, the more reliable measure here is Mean Absolute Error (MAE).
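
As a quick illustration of why MAPE breaks down here, the two metrics reduce to these one-liners (actual and pred stand in for the test-set columns):

mae  <- mean(abs(actual - pred))
mape <- mean(abs((actual - pred) / actual))   # divides by actual, so near-zero monthly returns inflate it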

The following visualization applies the top three leaderboard models (in this case, three GBMs) to the testing dataset and compares their predictions with the actual returns of the portfolio. Although the forecast is still in its infancy, there are opportunities to improve.

source("00_Scripts/prediction_error_tbl.R")

# Reproducible predictions based on the leaderboard models
model_1 <- prediction_error_tbl(Portfolio_tbl   = Port_FF_5_tbl,
                                h2o_leaderboard = portfolio_automl_models_h2o,
                                n               = 1,
                                test_tbl        = test_tbl_h2o)
model_2 <- prediction_error_tbl(Portfolio_tbl   = Port_FF_5_tbl,
                                h2o_leaderboard = portfolio_automl_models_h2o,
                                n               = 2,
                                test_tbl        = test_tbl_h2o)
model_3 <- prediction_error_tbl(Portfolio_tbl   = Port_FF_5_tbl,
                                h2o_leaderboard = portfolio_automl_models_h2o,
                                n               = 3,
                                test_tbl        = test_tbl_h2o)
# Plot with a custom "spooky" theme
library(extrafont)
loadfonts(device = "win")
theme_spooky = function(base_size = 10, base_family = "Chiller") {

    theme_grey(base_size = base_size, base_family = base_family) %+replace%

        theme(
            # Specify axis options
            axis.line = element_blank(),
            axis.text.x = element_text(size = base_size*0.8, color = "white", lineheight = 0.9),
            axis.text.y = element_text(size = base_size*0.8, color = "white", lineheight = 0.9),
            axis.ticks = element_line(color = "white", size  =  0.2),
            axis.title.x = element_text(size = base_size, color = "white", margin = margin(0, 10, 0, 0)),
            axis.title.y = element_text(size = base_size, color = "white", angle = 90, margin = margin(0, 10, 0, 0)),
            axis.ticks.length = unit(0.3, "lines"),
            # Specify legend options
            legend.background = element_rect(color = NA, fill = "gray10"),
            legend.key = element_rect(color = "white", fill = "gray10"),
            legend.key.size = unit(1.2, "lines"),
            legend.key.height = NULL,
            legend.key.width = NULL,
            legend.text = element_text(size = base_size*0.8, color = "white"),
            legend.title = element_text(size = base_size*0.8, face = "bold", hjust = 0, color = "white"),
            legend.position = "none",
            legend.text.align = NULL,
            legend.title.align = NULL,
            legend.direction = "vertical",
            legend.box = NULL,
            # Specify panel options
            panel.background = element_rect(fill = "gray10", color = NA),
            #panel.border = element_rect(fill = NA, color = "white"),
            panel.border = element_blank(),
            panel.grid.major = element_line(color = "grey35"),
            panel.grid.minor = element_line(color = "grey20"),
            panel.spacing = unit(0.5, "lines"),
            # Specify facetting options
            strip.background = element_rect(fill = "grey30", color = "grey10"),
            strip.text.x = element_text(size = base_size*0.8, color = "white"),
            strip.text.y = element_text(size = base_size*0.8, color = "white",angle = -90),
            # Specify plot options
            plot.background = element_rect(color = "gray10", fill = "gray10"),
            plot.title = element_text(size = base_size*1.2, color = "white",hjust=0,lineheight=1.25,
                                      margin=margin(2,2,2,2)),
            plot.subtitle = element_text(size = base_size*1, color = "white",hjust=0,  margin=margin(2,2,2,2)),
            plot.caption = element_text(size = base_size*0.8, color = "white",hjust=0),
            plot.margin = unit(rep(1, 4), "lines")

        )

}

Port_FF_5_tbl %>%
    na.omit() %>%
    # filter(lubridate::year(date) >= 2018) %>%
    ggplot(aes(date, Portfolio)) +
    geom_point(size = 2, color = "gray", alpha = 0.5, shape = 21, fill = "orange") +
    geom_line(color = "orange", size = 0.5) +
    geom_ma(n = 12, color = "white") +
    # Predictions - Model 1 (spooky purple)
    geom_point(aes(y = pred), size = 2, color = "gray", alpha = 1, shape = 21, fill = "purple", data = model_1) +
    geom_line(aes(y = pred), color = "purple", size = 0.5, data = model_1) +
    # Predictions - Model 2
    geom_point(aes(y = pred), size = 2, color = "gray", alpha = 1, shape = 21, fill = palette_light()[[2]], data = model_2) +
    geom_line(aes(y = pred), color = palette_light()[[2]], size = 0.5, data = model_2) +
    # Predictions - Model 3
    geom_point(aes(y = pred), size = 2, color = "gray", alpha = 1, shape = 21, fill = palette_light()[[3]], data = model_3) +
    geom_line(aes(y = pred), color = palette_light()[[3]], size = 0.5, data = model_3) +
    # Aesthetics
    theme_spooky(base_size = 20, base_family = "Chiller") +
    annotate("text", x = ymd("2019-05-01"), y = -0.015,
             color = "purple", label = pull_model_name(h2o_leaderboard = portfolio_automl_models_h2o,
                                                       n = 1)) +
    annotate("text", x = ymd("2019-02-01"), y = 0.04000,
             color = palette_light()[[2]], label = pull_model_name(h2o_leaderboard = portfolio_automl_models_h2o,
                                                                   n = 2)) +
    annotate("text", x = ymd("2019-09-01"), y = -0.030,
             color = palette_light()[[3]], label = pull_model_name(h2o_leaderboard = portfolio_automl_models_h2o,
                                                       n = 3)) +
    scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
    scale_y_continuous(labels = scales::percent_format())+
    labs(title = "Forecast of Portfolio Returns: 2019",
         subtitle = "MAE = 1.39%",
         caption = "Ray Dalio's All Weather Portfolio",
         y = "Portfolio Returns", x = "Date")

8.2 h2o Leaderboard Metrics

I created a reproducible function to evaluate the top N models by a chosen metric. Since this is a regression analysis, the available metrics are mae, mean_residual_deviance, mse, rmse and rmsle. In the following examples, I share three visualizations ordered by mean_residual_deviance, by RMSE and by MAE. The ordering barely changes except when sorting by MAE.
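
The helper is built on the leaderboard itself; here is a minimal sketch of pulling and ordering it, where h2o.get_leaderboard() is the real accessor and the rest is my assumption:

# Pull the AutoML leaderboard into a tibble and rank models by a chosen metric
leaderboard_tbl <- h2o.get_leaderboard(portfolio_automl_models_h2o) %>%
    as.data.frame() %>%
    as_tibble()

leaderboard_tbl %>%
    arrange(mae) %>%
    slice(1:10)   # top 10 models by MAE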

9 Hyperparameter Tuning

9.1 Tuning the h2o Deep Learning Model

Within the deep learning model, there are a number of hyperparameters available to tune, including but not limited to hidden and epochs. By inspecting the fitted model (for example, via ?h2o.deeplearning or the model’s @allparameters slot), you can identify all the parameters associated with it.

In the following analysis, multiple deep learning models were generated from my experimentation with several different parameters. The conclusion was that deep learning model 9 had the best MAE; however, it still fell short of the MAE achieved by the GBM models.
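
My experimentation loosely followed a grid search. Here is a hedged sketch using h2o.grid(), the real h2o grid-search API, reusing the x, y and train_h2o names from the AutoML sketch; the grid values themselves are illustrative only:

# Grid search over network shape (hidden) and training passes (epochs)
deeplearning_grid <- h2o.grid(
    algorithm      = "deeplearning",
    grid_id        = "deeplearning_grid_01",
    x = x, y = y,
    training_frame = train_h2o,
    hyper_params   = list(
        hidden = list(c(10, 10), c(50, 20), c(100, 50, 10)),
        epochs = c(10, 50, 100)
    )
)

# Rank the grid models by MAE (lowest first)
h2o.getGrid("deeplearning_grid_01", sort_by = "mae", decreasing = FALSE)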

10 Parting Thoughts

Visualizing the forecasts and seeing myself inch closer was the most fulfilling accomplishment. Forecasting may seem simple, but it isn’t easy: it took me numerous days to figure out what machine learning really is and how to apply it to my problem. h2o’s AutoML was a real life saver.

“Success is a journey, not a destination. The doing is often more important than the outcome” - Arthur Ashe

It was really exciting to see the illustrations and analysis I could make from what I have learned so far. I plan on writing a Part 3 to this post as I dig deeper into this portfolio, solving an optimization problem to determine the optimal asset weights.

If you want to learn more, I am currently working through a four-part Data Science for Business course track. I have completed the Business Science 201 course and am close to completing the Business Science 102 course.