tl;dr: You’ll learn how to use purrr, caret and list-cols to quickly create hundreds of dataset + model combinations, store data & model objects neatly in one tibble, and post-process programmatically. These tools enable succinct functional programming in which a lot gets done with just a few lines of code.

The motivation

I want to write a quick blog post on two phenomenal pieces of code written by Kuhn et al. and Wickham et al., namely caret and purrr. These play so well with the tidyverse that they have become an indispensable part of my repertoire.

In any data science project, I want to investigate the effect of various combinations of predictor transformations, model types, and tuning methods.

For each possible combination, I want to quantify model performance using common metrics like AIC or SBC. Commonly, I’ll select the model that has the ‘best’ performance among all such models.

Traditionally, I end up with many R objects: one for each new combination of transformation-model_type-tuning_method. For example, boostFit, xgbFit, glmFit, elasticnetFit for untransformed variables. If I have any transformations, I might also have boostFit.xform, xgbFit.xform, glmFit.xform etc. Add to that, investigation of grouped vs ungrouped variables… boostFit.xform.grouped, xgbFit.xform.ungrouped etc. You get the idea.

The challenge with this approach is that the data and the models remain separated, there’s a lot of repeated code for object management, manipulation and plotting, and in order to compare all the models together, we have to somehow stitch the results together. (For the last point, resamples() in caret works beautifully, but requires the same number of resamples in each model.)

The approach I’m presenting below is a combination of a few approaches I learnt through the APM book, the caret documentation, the grammar and verbs of the tidyverse, as well as a couple of useful talks at the 2017 RStudio conference in Orlando (notably the ones on purrr and list-cols). What you’ll also see is that the code is extremely succinct, which is simply a joy to write and read.

An example using BostonHousing data: purrr and caret

Load libs & data

The libraries I’m using here are tidyr, tibble, dplyr, magrittr, purrr, and caret. The dataset is from mlbench.

library(tidyr)
library(tibble)
library(dplyr)
library(magrittr)
library(purrr)
library(caret)
library(mlbench)
data("BostonHousing")
head(BostonHousing)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio      b
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
## 4  2.94 33.4
## 5  5.33 36.2
## 6  5.21 28.7

Transformations on Xs

For the purposes of this demonstration, I’ll simply create two new sets of variables: a Box-Cox transformation of the originals - caret’s preProcess() makes this easy - and the squared values of the originals. Save each new variable set in a new character vector which follows the naming convention preds.xxxx.[^1]

[^1]: This makes it super easy to find all such variable sets quickly using ls(pattern = 'preds') and store them in a character vector.

# The originals
response <- 'medv'
preds.original <- colnames(BostonHousing[,1:13])

# Box-Cox transformation
prepTrain <- preProcess(x = BostonHousing[,preds.original], method = c('BoxCox'))
boxcoxed <- predict(prepTrain,newdata = BostonHousing[,preds.original])
colnames(boxcoxed) <- paste0(colnames(boxcoxed),'.boxed')
preds.boxcoxed <- colnames(boxcoxed)

# Squaring
squared <- (BostonHousing[,c(1:3,5:13)])^2
colnames(squared) <- paste0(colnames(squared),'.sq')
preds.sq <- colnames(squared)

# All together now...
BostonHousing %<>% 
  cbind(boxcoxed,squared)

# Make sure everything is numeric (for xgboost to work), and also NOT a
# tibble (some caret functions have trouble with tibbles)
BostonHousing %<>% 
  map_df(.f = ~as.numeric(.x)) %>% as.data.frame()

str(BostonHousing)
## 'data.frame':    506 obs. of  39 variables:
##  $ crim         : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn           : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus        : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ nox          : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm           : num  6.58 6.42 7.18 7 7.15 ...
##  $ age          : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis          : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad          : num  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax          : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio      : num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ b            : num  397 397 393 395 397 ...
##  $ lstat        : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv         : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
##  $ crim.boxed   : num  -5.06 -3.6 -3.6 -3.43 -2.67 ...
##  $ zn.boxed     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus.boxed  : num  0.994 2.966 2.966 0.914 0.914 ...
##  $ chas.boxed   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ nox.boxed    : num  -0.83 -1.09 -1.09 -1.13 -1.13 ...
##  $ rm.boxed     : num  2.81 2.76 3 2.94 2.99 ...
##  $ age.boxed    : num  175 224 161 110 137 ...
##  $ dis.boxed    : num  1.41 1.6 1.6 1.8 1.8 ...
##  $ rad.boxed    : num  0 0.693 0.693 1.099 1.099 ...
##  $ tax.boxed    : num  1.88 1.87 1.87 1.87 1.87 ...
##  $ ptratio.boxed: num  117 158 158 174 174 ...
##  $ b.boxed      : num  78764 78764 77157 77866 78764 ...
##  $ lstat.boxed  : num  1.89 2.78 1.61 1.2 1.99 ...
##  $ crim.sq      : num  3.99e-05 7.46e-04 7.45e-04 1.05e-03 4.77e-03 ...
##  $ zn.sq        : num  324 0 0 0 0 ...
##  $ indus.sq     : num  5.34 49.98 49.98 4.75 4.75 ...
##  $ nox.sq       : num  0.289 0.22 0.22 0.21 0.21 ...
##  $ rm.sq        : num  43.2 41.2 51.6 49 51.1 ...
##  $ age.sq       : num  4251 6225 3733 2098 2938 ...
##  $ dis.sq       : num  16.7 24.7 24.7 36.8 36.8 ...
##  $ rad.sq       : num  1 4 4 9 9 9 25 25 25 25 ...
##  $ tax.sq       : num  87616 58564 58564 49284 49284 ...
##  $ ptratio.sq   : num  234 317 317 350 350 ...
##  $ b.sq         : num  157530 157530 154315 155733 157530 ...
##  $ lstat.sq     : num  24.8 83.54 16.24 8.64 28.41 ...

Here are our new predictor variable sets:

pred_varsets <- ls(pattern = 'preds')
pred_varsets
## [1] "preds.boxcoxed" "preds.original" "preds.sq"

Create a starter dataframe

I first create a starter dataframe where the input data is repeated as many times as there are predictor variable sets. enframe() allows us to embed objects in a dataframe column.

num_var_select <- length(pred_varsets)
list(BostonHousing) %>% 
    rep(num_var_select) %>% 
    enframe(name = 'id', value = 'rawdata') %>% 
    mutate(pred_varsets = pred_varsets) -> starter_df
starter_df
## # A tibble: 3 x 3
##      id rawdata                 pred_varsets  
##   <int> <list>                  <chr>         
## 1     1 <data.frame [506 x 39]> preds.boxcoxed
## 2     2 <data.frame [506 x 39]> preds.original
## 3     3 <data.frame [506 x 39]> preds.sq

Now, I split the raw data into a train.X column which houses data only for those predictor variables identified in the pred_varsets column. map2 is a great function which maps over two inputs in parallel and passes them to a function.

I also create a train.Y for the response variable here.

# Function to select columns in the raw data
filterColumns <- function(x,y){
    x[,(colnames(x) %in% eval(parse(text=y)))]
}

# Create X and Y columns
starter_df %<>% 
  transmute(
  id,
  pred_varsets,
  train.X = map2(rawdata, pred_varsets,  ~ filterColumns(.x, .y)),
  train.Y = map(rawdata, ~ .x$medv)
  )

starter_df
## # A tibble: 3 x 4
##      id pred_varsets   train.X                 train.Y    
##   <int> <chr>          <list>                  <list>     
## 1     1 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]>
## 2     2 preds.original <data.frame [506 x 13]> <dbl [506]>
## 3     3 preds.sq       <data.frame [506 x 12]> <dbl [506]>
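
As an aside, eval(parse(text = y)) works here because each preds.* vector lives in the global environment. A sketch of an equivalent selector using get(), under the same assumption, avoids parsing strings:

# Equivalent column selection with get() instead of eval(parse());
# assumes the preds.* vectors are visible from the calling environment.
filterColumns <- function(x, y){
    x[, colnames(x) %in% get(y)]
}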

Select the models

This is where I can select which models I want in the analysis. Each model should be in a function of this style:

modelName <- function(X, Y){
    ctrl <- trainControl(
        ...
    )
    train(
        x = X,
        y = Y,
        trControl = ctrl,
        method = '## modelname ##',
        ...
    )
}

I’m using caret exclusively, so each function needs a trainControl() and a train(). Learn more about caret here.

rpartModel <- function(X, Y) {
    ctrl <- trainControl(
        ## 5-fold CV
        method = "repeatedcv",
        number = 5
    )
    train(
        x = X,
        y = Y,
        method = 'rpart2',
        trControl = ctrl,
        tuneGrid = data.frame(maxdepth=c(2,3,4,5)),
        preProc = c('center', 'scale')
    )
}
xgbTreeModel <- function(X,Y){
    ctrl <- trainControl(
        ## 5-fold CV
        method = "repeatedcv",
        number = 5
    )
    train(
        x=X,
        y=Y,
        method = 'xgbTree',
        trControl = ctrl,
        tuneGrid = expand.grid(nrounds = c(100,300,500), 
                              max_depth = c(2,4,6) ,
                              eta = 0.1,
                              gamma = 1, 
                              colsample_bytree = 1, 
                              min_child_weight = 1, 
                              subsample = 1),
        preProc = c('center', 'scale')
    )
}

Once these functions are set up, enframe them into a dataframe.

model_list <- list(rpartModel=rpartModel,
                   xgbModel=xgbTreeModel) %>%
    enframe(name = 'modelName',value = 'model')

model_list
## # A tibble: 2 x 2
##   modelName  model 
##   <chr>      <list>
## 1 rpartModel <fun> 
## 2 xgbModel   <fun>

Create data-model combinations

Now, we’re ready to combine the two together. starter_df has all the predictor varset combinations, and model_list has the list of all models. I’m assuming I want to run each combination of the two; so if I have 3 variable sets and 2 models, I have a total of 6 models to run. This code sets that up:

train_df <-
    starter_df[rep(1:nrow(starter_df),nrow(model_list)),]

train_df %<>%
    bind_cols(
        model_list[rep(1:nrow(model_list),nrow(starter_df)),] %>% arrange(modelName)
    ) %>%
    mutate(id=1:nrow(.))
train_df
## # A tibble: 6 x 6
##      id pred_varsets   train.X                 train.Y     modelName model
##   <int> <chr>          <list>                  <list>      <chr>     <lis>
## 1     1 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]> rpartMod… <fun>
## 2     2 preds.original <data.frame [506 x 13]> <dbl [506]> rpartMod… <fun>
## 3     3 preds.sq       <data.frame [506 x 12]> <dbl [506]> rpartMod… <fun>
## 4     4 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]> xgbModel  <fun>
## 5     5 preds.original <data.frame [506 x 13]> <dbl [506]> xgbModel  <fun>
## 6     6 preds.sq       <data.frame [506 x 12]> <dbl [506]> xgbModel  <fun>
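
As an aside, if you have tidyr >= 1.0 available, its expand_grid() crosses the rows of two data frames (list-columns included), so the same setup can be sketched in two lines:

# A sketch of the same crossing with tidyr's expand_grid(), which
# pairs every row of starter_df with every row of model_list.
train_df <- tidyr::expand_grid(starter_df, model_list) %>%
    mutate(id = row_number())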

Solve the models

The data is almost all set up now. invoke_map() is a function which can call functions and pass them arguments. Since we need to pass both train.X and train.Y together, there’s an intermediate call to map2() to “listify” these first into params.

All the models are then fit, and their results (the model objects themselves) are stored in modelFits.

train_df %<>%
  mutate(params = map2(train.X, train.Y,  ~ list(X = .x, Y = .y)),
                 modelFits=invoke_map(model,params)
         )
train_df %>% dplyr::select(pred_varsets,modelName,params,modelFits)
## # A tibble: 6 x 4
##   pred_varsets   modelName  params     modelFits  
##   <chr>          <chr>      <list>     <list>     
## 1 preds.boxcoxed rpartModel <list [2]> <S3: train>
## 2 preds.original rpartModel <list [2]> <S3: train>
## 3 preds.sq       rpartModel <list [2]> <S3: train>
## 4 preds.boxcoxed xgbModel   <list [2]> <S3: train>
## 5 preds.original xgbModel   <list [2]> <S3: train>
## 6 preds.sq       xgbModel   <list [2]> <S3: train>
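
Note that invoke_map() has since been deprecated in newer versions of purrr; a sketch of the same fitting step using map2() plus base R’s do.call():

# Same fit step without invoke_map(): call each model function with
# its matching list of arguments via do.call().
train_df %<>%
  mutate(modelFits = map2(model, params, ~ do.call(.x, .y)))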

Extract results

Now, I can extract pretty much any model performance or hypertuning parameter using purrr. Since caret is so lovingly standardized, it doesn’t matter if I’m using a glm, xgboost, rpart2, or an ANN: the code remains the same.

train_df %<>% 
    mutate(
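        # Note: max() takes the largest value across the tuning grid; for an
        # error metric like RMSE you may prefer min(), or the row matching
        # bestTune.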
        RMSE=map_dbl(modelFits,~max(.x$results$RMSE)),
        RMSESD=map_dbl(modelFits,~max(.x$results$RMSESD)),
        Rsq=map_dbl(modelFits,~max(.x$results$Rsquared)),
        bestTune=map(modelFits,~.x$bestTune)
    )
train_df %>% dplyr::select(-train.X,-train.Y,-params,-modelFits)
## # A tibble: 6 x 8
##      id pred_varsets   modelName  model   RMSE RMSESD   Rsq bestTune      
##   <int> <chr>          <chr>      <list> <dbl>  <dbl> <dbl> <list>        
## 1     1 preds.boxcoxed rpartModel <fun>   6.07  1.04  0.741 <data.frame […
## 2     2 preds.original rpartModel <fun>   6.21  0.878 0.742 <data.frame […
## 3     3 preds.sq       rpartModel <fun>   5.94  0.982 0.721 <data.frame […
## 4     4 preds.boxcoxed xgbModel   <fun>   3.59  0.675 0.891 <data.frame […
## 5     5 preds.original xgbModel   <fun>   3.26  0.463 0.889 <data.frame […
## 6     6 preds.sq       xgbModel   <fun>   3.47  0.756 0.896 <data.frame […

This allows us to very quickly visualize the results using lattice or ggplot across all models.

lattice::dotplot(Rsq~pred_varsets|modelName,train_df)

train_df %>% 
    ggplot(aes(x=pred_varsets,color=modelName))+
    geom_point(aes(y=RMSE),size=2)+
    geom_errorbar(aes(ymin = RMSE-RMSESD,ymax= RMSE+RMSESD),size=.5,width=.15)

Since the model fit objects themselves are embedded, I can still look at each model’s internals. For example, to plot the results of the 5-fold CV on the grid search for the xgboost model:

plot(train_df$modelFits[train_df$modelName=='xgbModel' & train_df$pred_varsets=='preds.original'][[1]])
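
Similarly, anything else caret exposes on a train object is one subscript away; for example, variable importance via varImp() (the row index here is arbitrary):

# Variable importance for one embedded fit; pick any row you like.
varImp(train_df$modelFits[[5]])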

In conclusion

These packages make investigating a very large number of datasets and models easy. With just a few lines of code, I ran a total of 6 different models, each with a 5-fold cross validation, and for the xgboost models a grid search across two tuning parameters, to select the best model on any number of performance criteria. Yet, everything remains neatly arranged in one dataframe, which can be saved as .RData and retrieved later.
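
For example (the file name is arbitrary):

# Persist the whole tibble, embedded data and model objects included.
save(train_df, file = 'train_df.RData')
# ...and in a later session:
load('train_df.RData')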

Also remember that the raw data replicated in the rawdata column of starter_df doesn’t have to be the exact same dataset for each row either. So you could leverage this methodology for a train-validate-test approach, or for resampled training sets, where each row has a completely different dataset embedded within (see the sketch below). It really depends on your creativity and how you write subsequent code. You’re definitely going to find more material online on this topic; be sure to check r-bloggers.
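
For instance, here’s a minimal sketch (object names are my own) that embeds a different bootstrap resample of BostonHousing in each row:

# Each row gets its own bootstrap resample instead of the same dataset.
set.seed(42)
resampled_df <- tibble(id = 1:5) %>%
  mutate(rawdata = map(id, ~ BostonHousing[sample(nrow(BostonHousing),
                                                  replace = TRUE), ]))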

Applying a function over rows of a data frame: benchmarking multiple ways with purrr/purrrlyr and other iterative methods


@dattali asked, “what’s a safe way to iterate over rows of a data frame?” The example was to convert each row into a list and return a list of lists, indexed first by row, then by column.
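
To make the target shape concrete, here’s a toy illustration (data invented):

# For a data frame like data.frame(x = 1:2, y = c("a", "b")),
# the desired result is:
# list(
#   list(x = 1, y = "a"),  # row 1
#   list(x = 2, y = "b")   # row 2
# )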

A number of people gave suggestions on Twitter, which I’ve collected here. I’ve benchmarked these methods with data of various sizes; scroll down to see a plot of times.

library(purrr)
library(dplyr)
library(tidyr)
library(purrrlyr)

# @dattali
# Using apply (only safe when all cols are same type)
f_apply <- function(df) {
  apply(df, 1, function(row) as.list(row))  
}

# @drob
# split + lapply
f_split_lapply <- function(df) {
  df <- split(df, seq_len(nrow(df)))
  lapply(df, function(row) as.list(row))
}

# @winston_chang
# lapply over row indices
f_lapply_row <- function(df) {
  lapply(seq_len(nrow(df)), function(i) as.list(df[i,,drop=FALSE]))
}

# @winston_chang
# lapply + lapply: Treat data frame as list, and then slice out lists
f_lapply_lapply <- function(df) {
  cols <- seq_len(length(df))
  names(cols) <- names(df)

  lapply(seq_len(nrow(df)), function(row) {
    lapply(cols, function(col) {
      df[[col]][[row]]
    })
  })
}

# @winston_chang
# purrrlyr::by_row
f_by_row <- function(df) {
  res <- by_row(df, function(row) as.list(row))
  res$.out
}

# @JennyBryan
# purrr::pmap
f_pmap <- function(df) {
  pmap(df, list)
}

# purrr::pmap, but coerce df to a list first
f_pmap_aslist <- function(df) {
  pmap(as.list(df), list)
}

# @krlmlr
# dplyr::rowwise
f_rowwise <- function(df) {
  df %>% rowwise %>% do(row = as.list(.))
}
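
As a quick toy sanity check (not part of the original benchmarks), the methods should agree on content once the names that split() adds are stripped:

# pmap and split+lapply should produce identical row-lists on toy data.
df_check <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
identical(f_pmap(df_check), unname(f_split_lapply(df_check)))  # should be TRUE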

Benchmark each of them, using data sets with varying numbers of rows:

run_benchmark <- function(nrow) {
  # Make some data
  df <- data.frame(
    x = rnorm(nrow),
    y = runif(nrow),
    z = runif(nrow)
  )
  
  res <- list(
    apply         = system.time(f_apply(df)),
    split_lapply  = system.time(f_split_lapply(df)),
    lapply_row    = system.time(f_lapply_row(df)),
    lapply_lapply = system.time(f_lapply_lapply(df)),
    by_row        = system.time(f_by_row(df)),
    pmap          = system.time(f_pmap(df)),
    pmap_aslist   = system.time(f_pmap_aslist(df)),
    rowwise       = system.time(f_rowwise(df))
  )
  
  # Get elapsed times
  res <- lapply(res, `[[`, "elapsed")

  # Add nrow to front
  res <- c(nrow = nrow, res)
  res
}

# Run the benchmarks for various size data
all_times <- lapply(1:5, function(n) {
  run_benchmark(10^n)
})

# Convert to data frame
times <- lapply(all_times, as.data.frame)
times <- do.call(rbind, times)

knitr::kable(times)
nrow    apply  split_lapply  lapply_row  lapply_lapply  by_row    pmap  pmap_aslist  rowwise
1e+01    0.00          0.00        0.00           0.00    0.05    0.00         0.09     0.08
1e+02    0.01          0.01        0.01           0.02    0.11    0.00         0.00     0.17
1e+03    0.00          0.07        0.05           0.03    1.00    0.00         0.00     0.52
1e+04    0.08          0.94        0.82           0.26   10.84    0.03         0.03     4.70
1e+05    1.08         46.61       17.66           2.78  150.84    0.34         0.35    59.96

Plot times

This plot shows the number of seconds needed to process n rows, for each method. Both the x and y axes use log scales, so each step along the x scale represents a 10x increase in the number of rows, and each step along the y scale represents a 10x increase in time.

library(ggplot2)
library(scales)

# Convert to long format
times_long <- gather(times, method, seconds, -nrow)

# Set order of methods, for plots
times_long$method <- factor(times_long$method,
  levels = c("apply", "split_lapply", "lapply_row", "lapply_lapply", "by_row",
    "pmap", "pmap_aslist", "rowwise")
)

# Plot with log-log axes
ggplot(times_long, aes(x = nrow, y = seconds, colour = method)) +
  geom_point() +
  geom_line() +
  annotation_logticks(sides = "trbl") +
  theme_bw() +
  scale_y_continuous(trans = log10_trans(),
    breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)),
    minor_breaks = NULL) +
  scale_x_continuous(trans = log10_trans(),
    breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)),
    minor_breaks = NULL)
## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis

Forecasting multiple time series using purrr: one more awesome use

Beautiful data science with functional programming and R

If you follow what’s going on in the R ecosystem, you know that the language has been revolutionized by two packages: dplyr and magrittr. The former provides elegant methods for working with relational data, while the latter brought the pipe %>% to R.

Together, these packages make handling tabular data a snap. For those of you who haven’t seen these packages in action, here’s a little taste:

iris %>%
    group_by(Species) %>%
    summarize(Mean.Width=mean(Sepal.Width))
## # A tibble: 3 x 2
##   Species    Mean.Width
##   <fct>           <dbl>
## 1 setosa           3.43
## 2 versicolor       2.77
## 3 virginica        2.97

The syntax is super clear; even someone with no R experience can understand those three lines of code. No wonder these packages caught on so fast.

But what if you don’t have tabular data? What if you’re using JSON or maybe a time series? Sure, you can shoehorn those data types into a data.frame, but it’s not the most natural fit. This is where my new favorite R package, purrr, can help you out.

What’s purrr?

The goal of purrr is to bring more functional programming to R. It gives you easy-to-use mappers, reducers and filters that you can chain together. In short, it’s dplyr for non-tabular data. That’s great in theory, you’re probably thinking, but how do I use it in practice? What does the code look like? Well, I say, the best explanation is always an example.
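
Here’s a tiny, self-contained taste of that chaining (toy numbers, nothing to do with the GDP example below):

library(purrr)

# Keep the even numbers, square each, then add them up:
# 4 + 16 + 36 + 64 + 100
1:10 %>%
  keep(~ .x %% 2 == 0) %>%
  map_dbl(~ .x ^ 2) %>%
  reduce(`+`)
## [1] 220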

An Example: Forecasting Gross Domestic Product

Let’s say you want to forecast the GDP of a bunch of countries using R. You could force this data into a data.frame but it’s going to be a bit weird. Or you could use R’s built-in time series class but then you don’t have dplyr’s easy syntax. This is where purrr really excels: you can use time series objects and have dplyr-like syntax.

Getting the data from Quandl

We’re going to use the awesome Quandl API to get a bunch of GDP data and then forecast the GDP for 9 different countries for the next 10 years. First, let’s load some libraries and define the datasets that we’re interested in.

library(Quandl)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(purrr)
library(forecast)
library(dplyr)
datasets = c(Germany="FRED/DEURGDPR",
             Singapore="FRED/SGPRGDPR",
             Finland="FRED/FINRGDPR",
             France="FRED/FRARGDPR",
             Italy="FRED/ITARGDPR",
             Ireland="FRED/IRLRGDPR",
             Japan="FRED/JPNRGDPR",
             Netherlands="FRED/NLDRGDPR",
             Norway="FRED/NORRGDPR")

Next, let’s use the map function from purrr to pull down the data as a time series object for each of these countries.

time_series = datasets %>%
  map(Quandl, type="ts")

What’s going on with those two lines of code? The map function takes a list or vector and a function as inputs. It then applies the function to each element. So, for each of the elements in the datasets vector, it’s calling the Quandl function to retrieve a time series object. The type="ts" at the end ensures that we’re getting a time series object instead of a data.frame.

You may be thinking that map is awfully similar to lapply, and you’re right. In fact, purrr wraps the standard lapply, but it provides an improved syntax and expanded capabilities that we’ll see later.

If you look at time_series, you’ll see that it’s a list of time series, one for each of the countries. (I’ve truncated the output to save space.)

str(time_series)
## List of 9
##  $ Germany    : Time-Series [1:52] from 1960 to 2011: 684721 716424 749838 770928 822282 ...
##  $ Singapore  : Time-Series [1:52] from 1960 to 2011: 7208 7795 8349 9181 8841 ...
##  $ Finland    : Time-Series [1:37] from 1975 to 2011: 85842 86137 86344 88865 95194 ...
##  $ France     : Time-Series [1:52] from 1960 to 2011: 525384 550628 588468 624954 665117 ...
##  $ Italy      : Time-Series [1:52] from 1960 to 2011: 493997 534536 567702 599551 616317 ...
##  $ Ireland    : Time-Series [1:42] from 1970 to 2011: 36835 38525 41188 43513 44647 ...
##  $ Japan      : Time-Series [1:52] from 1960 to 2011: 574859 642998 698275 759635 844605 ...
##  $ Netherlands: Time-Series [1:52] from 1960 to 2011: 163562 168306 175543 181336 196931 ...
##  $ Norway     : Time-Series [1:52] from 1960 to 2011: 59380 62997 64692 67143 70508 ...

Forecasting GDP

Next, we’re going to train an ARIMA model on each of these time series and then forecast for the next 10 years. The code for this is super easy with purrr. Check out my other blog post on time series if you want to find out more about these models.

forecasts = time_series %>%
  map(auto.arima) %>%
  map(forecast, h=10)

forecasts
## $Germany
##      Point Forecast   Lo 80   Hi 80   Lo 95   Hi 95
## 2012        3254005 3125365 3382645 3057267 3450743
## 2013        3303415 3121490 3485339 3025185 3581644
## 2014        3352824 3130013 3575635 3012064 3693584
## 2015        3402233 3144953 3659513 3008757 3795709
## 2016        3451643 3163995 3739291 3011723 3891562
## 2017        3501052 3185949 3816154 3019144 3982960
## 2018        3550461 3210111 3890811 3029941 4070981
## 2019        3599871 3236021 3963720 3043411 4156330
## 2020        3649280 3263360 4035200 3059066 4239494
## 2021        3698689 3291893 4105485 3076549 4320829
## 
## $Singapore
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2012       314488.9 306869.7 322108.1 302836.4 326141.4
## 2013       334273.0 322897.8 345648.3 316876.1 351670.0
## 2014       355945.8 342549.7 369342.0 335458.2 376433.5
## 2015       364567.7 347442.0 381693.5 338376.1 390759.3
## 2016       376784.1 354788.1 398780.1 343144.1 410424.1
## 2017       395694.1 369588.2 421800.0 355768.6 435619.6
## 2018       409962.5 379526.6 440398.4 363414.8 456510.3
## 2019       421816.8 386192.3 457441.3 367333.9 476299.7
## 2020       437396.3 396493.2 478299.4 374840.5 499952.2
## 2021       453099.6 406952.4 499246.7 382523.6 523675.6
## 
## $Finland
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2012       205227.8 198962.2 211493.5 195645.3 214810.3
## 2013       208433.7 197695.6 219171.7 192011.2 224856.1
## 2014       211639.5 197806.4 225472.5 190483.6 232795.3
## 2015       214845.3 198492.9 231197.7 189836.4 239854.1
## 2016       218051.1 199518.7 236583.5 189708.3 246393.9
## 2017       221256.9 200775.3 241738.6 189933.0 252580.9
## 2018       224462.8 202201.9 246723.6 190417.8 258507.8
## 2019       227668.6 203760.6 251576.6 191104.5 264232.7
## 2020       230874.4 205425.7 256323.2 191953.9 269794.9
## 2021       234080.2 207178.8 260981.7 192938.1 275222.4
## 
## $France
##      Point Forecast   Lo 80   Hi 80   Lo 95   Hi 95
## 2012        2321635 2292282 2350987 2276744 2366525
## 2013        2356110 2307001 2405218 2281005 2431215
## 2014        2390585 2327642 2453527 2294323 2486846
## 2015        2425060 2350818 2499301 2311517 2538602
## 2016        2459535 2375500 2543570 2331014 2588055
## 2017        2494010 2401209 2586810 2352083 2635936
## 2018        2528485 2427678 2629292 2374313 2682656
## 2019        2562960 2454737 2671182 2397447 2728472
## 2020        2597435 2482273 2712596 2421310 2773559
## 2021        2631909 2510204 2753615 2445776 2818043
## 
## $Italy
##      Point Forecast   Lo 80   Hi 80   Lo 95   Hi 95
## 2012        1962509 1924947 2000071 1905063 2019955
## 2013        1974968 1918926 2031011 1889259 2060678
## 2014        1987427 1915164 2059691 1876910 2097945
## 2015        1999886 1912210 2087563 1865797 2133976
## 2016        2012345 1909542 2115149 1855121 2169570
## 2017        2024805 1906911 2142698 1844502 2205107
## 2018        2037264 1904183 2170344 1833734 2240793
## 2019        2049723 1901278 2198167 1822696 2276749
## 2020        2062182 1898147 2226217 1811312 2313051
## 2021        2074641 1894758 2254523 1799534 2349747
## 
## $Ireland
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2012       186180.6 181847.1 190514.1 179553.0 192808.2
## 2013       189686.3 180220.0 199152.6 175208.9 204163.8
## 2014       193192.0 180525.5 205858.6 173820.2 212563.9
## 2015       196697.8 181490.1 211905.4 173439.6 219955.9
## 2016       200203.5 182822.3 217584.6 173621.3 226785.7
## 2017       203709.2 184397.6 223020.8 174174.7 233243.7
## 2018       207214.9 186149.1 228280.7 174997.5 239432.3
## 2019       210720.6 188035.8 233405.4 176027.2 245414.0
## 2020       214226.3 190030.7 238422.0 177222.2 251230.4
## 2021       217732.1 192114.4 243349.7 178553.3 256910.8
## 
## $Japan
##      Point Forecast   Lo 80   Hi 80   Lo 95   Hi 95
## 2012        4400750 4304313 4497187 4253263 4548238
## 2013        4418175 4269352 4566999 4190570 4645781
## 2014        4435601 4237821 4633381 4133122 4738079
## 2015        4453026 4206494 4699558 4075987 4830064
## 2016        4470451 4174303 4766599 4017531 4923371
## 2017        4487876 4140794 4834959 3957059 5018693
## 2018        4505301 4105755 4904848 3894247 5116356
## 2019        4522727 4069086 4976368 3828943 5216511
## 2020        4540152 4030745 5049559 3761081 5319223
## 2021        4557577 3990720 5124434 3690645 5424510
## 
## $Netherlands
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2012       722492.5 712037.2 732947.8 706502.6 738482.4
## 2013       733202.0 715504.3 750899.8 706135.6 760268.4
## 2014       743911.5 721171.5 766651.6 709133.7 778689.4
## 2015       754621.1 727769.5 781472.6 713555.1 795687.0
## 2016       765330.6 734918.3 795742.8 718819.1 811842.1
## 2017       776040.1 742442.4 809637.8 724656.9 827423.3
## 2018       786749.6 750243.5 823255.8 730918.3 842581.0
## 2019       797459.1 758259.7 836658.6 737508.7 857409.6
## 2020       808168.7 766449.4 849887.9 744364.6 871972.8
## 2021       818878.2 774782.9 862973.5 751440.3 886316.1
## 
## $Norway
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2012       310730.3 307307.9 314152.7 305496.3 315964.4
## 2013       315564.8 309262.5 321867.0 305926.3 325203.2
## 2014       320399.2 312169.8 328628.7 307813.3 332985.1
## 2015       325233.7 315449.5 335017.8 310270.1 340197.3
## 2016       330068.1 318944.5 341191.8 313056.0 347080.3
## 2017       334902.6 322584.2 347220.9 316063.3 353741.9
## 2018       339737.0 326330.1 353144.0 319232.8 360241.3
## 2019       344571.5 330157.9 358985.1 322527.7 366615.3
## 2020       349406.0 334051.5 364760.4 325923.4 372888.5
## 2021       354240.4 337999.6 370481.2 329402.2 379078.6

With purrr, chaining together these maps becomes super easy. No messy loops or confusing lapply calls to deal with. It’s functional programming at its best.

Plotting the forecasts

Now that we have our forecasts, it would be good to visualize them. We’re going to use the map2 function to do this.

par(mfrow=c(3,3)) # set grid for plots
map2(forecasts, names(forecasts), 
     function(forecast, country) plot(forecast, 
                                      main=country, 
                                      bty="n",
                                      ylab="GDP",
                                      xlab="Year"))

## $Germany
## $Germany$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 3254005 3303415 3352824 3402233 3451643 3501052 3550461 3599871
##  [9] 3649280 3698689
## 
## $Germany$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 3125365 3057267
## 2013 3121490 3025185
## 2014 3130013 3012064
## 2015 3144953 3008757
## 2016 3163995 3011723
## 2017 3185949 3019144
## 2018 3210111 3029941
## 2019 3236021 3043411
## 2020 3263360 3059066
## 2021 3291893 3076549
## 
## $Germany$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 3382645 3450743
## 2013 3485339 3581644
## 2014 3575635 3693584
## 2015 3659513 3795709
## 2016 3739291 3891562
## 2017 3816154 3982960
## 2018 3890811 4070981
## 2019 3963720 4156330
## 2020 4035200 4239494
## 2021 4105485 4320829
## 
## 
## $Singapore
## $Singapore$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 314488.9 334273.0 355945.8 364567.7 376784.1 395694.1 409962.5
##  [8] 421816.8 437396.3 453099.6
## 
## $Singapore$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 306869.7 302836.4
## 2013 322897.8 316876.1
## 2014 342549.7 335458.2
## 2015 347442.0 338376.1
## 2016 354788.1 343144.1
## 2017 369588.2 355768.6
## 2018 379526.6 363414.8
## 2019 386192.3 367333.9
## 2020 396493.2 374840.5
## 2021 406952.4 382523.6
## 
## $Singapore$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 322108.1 326141.4
## 2013 345648.3 351670.0
## 2014 369342.0 376433.5
## 2015 381693.5 390759.3
## 2016 398780.1 410424.1
## 2017 421800.0 435619.6
## 2018 440398.4 456510.3
## 2019 457441.3 476299.7
## 2020 478299.4 499952.2
## 2021 499246.7 523675.6
## 
## 
## $Finland
## $Finland$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 205227.8 208433.7 211639.5 214845.3 218051.1 221256.9 224462.8
##  [8] 227668.6 230874.4 234080.2
## 
## $Finland$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 198962.2 195645.3
## 2013 197695.6 192011.2
## 2014 197806.4 190483.6
## 2015 198492.9 189836.4
## 2016 199518.7 189708.3
## 2017 200775.3 189933.0
## 2018 202201.9 190417.8
## 2019 203760.6 191104.5
## 2020 205425.7 191953.9
## 2021 207178.8 192938.1
## 
## $Finland$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 211493.5 214810.3
## 2013 219171.7 224856.1
## 2014 225472.5 232795.3
## 2015 231197.7 239854.1
## 2016 236583.5 246393.9
## 2017 241738.6 252580.9
## 2018 246723.6 258507.8
## 2019 251576.6 264232.7
## 2020 256323.2 269794.9
## 2021 260981.7 275222.4
## 
## 
## $France
## $France$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 2321635 2356110 2390585 2425060 2459535 2494010 2528485 2562960
##  [9] 2597435 2631909
## 
## $France$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 2292282 2276744
## 2013 2307001 2281005
## 2014 2327642 2294323
## 2015 2350818 2311517
## 2016 2375500 2331014
## 2017 2401209 2352083
## 2018 2427678 2374313
## 2019 2454737 2397447
## 2020 2482273 2421310
## 2021 2510204 2445776
## 
## $France$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 2350987 2366525
## 2013 2405218 2431215
## 2014 2453527 2486846
## 2015 2499301 2538602
## 2016 2543570 2588055
## 2017 2586810 2635936
## 2018 2629292 2682656
## 2019 2671182 2728472
## 2020 2712596 2773559
## 2021 2753615 2818043
## 
## 
## $Italy
## $Italy$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 1962509 1974968 1987427 1999886 2012345 2024805 2037264 2049723
##  [9] 2062182 2074641
## 
## $Italy$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 1924947 1905063
## 2013 1918926 1889259
## 2014 1915164 1876910
## 2015 1912210 1865797
## 2016 1909542 1855121
## 2017 1906911 1844502
## 2018 1904183 1833734
## 2019 1901278 1822696
## 2020 1898147 1811312
## 2021 1894758 1799534
## 
## $Italy$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 2000071 2019955
## 2013 2031011 2060678
## 2014 2059691 2097945
## 2015 2087563 2133976
## 2016 2115149 2169570
## 2017 2142698 2205107
## 2018 2170344 2240793
## 2019 2198167 2276749
## 2020 2226217 2313051
## 2021 2254523 2349747
## 
## 
## $Ireland
## $Ireland$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 186180.6 189686.3 193192.0 196697.8 200203.5 203709.2 207214.9
##  [8] 210720.6 214226.3 217732.1
## 
## $Ireland$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 181847.1 179553.0
## 2013 180220.0 175208.9
## 2014 180525.5 173820.2
## 2015 181490.1 173439.6
## 2016 182822.3 173621.3
## 2017 184397.6 174174.7
## 2018 186149.1 174997.5
## 2019 188035.8 176027.2
## 2020 190030.7 177222.2
## 2021 192114.4 178553.3
## 
## $Ireland$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 190514.1 192808.2
## 2013 199152.6 204163.8
## 2014 205858.6 212563.9
## 2015 211905.4 219955.9
## 2016 217584.6 226785.7
## 2017 223020.8 233243.7
## 2018 228280.7 239432.3
## 2019 233405.4 245414.0
## 2020 238422.0 251230.4
## 2021 243349.7 256910.8
## 
## 
## $Japan
## $Japan$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 4400750 4418175 4435601 4453026 4470451 4487876 4505301 4522727
##  [9] 4540152 4557577
## 
## $Japan$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 4304313 4253263
## 2013 4269352 4190570
## 2014 4237821 4133122
## 2015 4206494 4075987
## 2016 4174303 4017531
## 2017 4140794 3957059
## 2018 4105755 3894247
## 2019 4069086 3828943
## 2020 4030745 3761081
## 2021 3990720 3690645
## 
## $Japan$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##          80%     95%
## 2012 4497187 4548238
## 2013 4566999 4645781
## 2014 4633381 4738079
## 2015 4699558 4830064
## 2016 4766599 4923371
## 2017 4834959 5018693
## 2018 4904848 5116356
## 2019 4976368 5216511
## 2020 5049559 5319223
## 2021 5124434 5424510
## 
## 
## $Netherlands
## $Netherlands$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 722492.5 733202.0 743911.5 754621.1 765330.6 776040.1 786749.6
##  [8] 797459.1 808168.7 818878.2
## 
## $Netherlands$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 712037.2 706502.6
## 2013 715504.3 706135.6
## 2014 721171.5 709133.7
## 2015 727769.5 713555.1
## 2016 734918.3 718819.1
## 2017 742442.4 724656.9
## 2018 750243.5 730918.3
## 2019 758259.7 737508.7
## 2020 766449.4 744364.6
## 2021 774782.9 751440.3
## 
## $Netherlands$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 732947.8 738482.4
## 2013 750899.8 760268.4
## 2014 766651.6 778689.4
## 2015 781472.6 795687.0
## 2016 795742.8 811842.1
## 2017 809637.8 827423.3
## 2018 823255.8 842581.0
## 2019 836658.6 857409.6
## 2020 849887.9 871972.8
## 2021 862973.5 886316.1
## 
## 
## $Norway
## $Norway$mean
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##  [1] 310730.3 315564.8 320399.2 325233.7 330068.1 334902.6 339737.0
##  [8] 344571.5 349406.0 354240.4
## 
## $Norway$lower
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 307307.9 305496.3
## 2013 309262.5 305926.3
## 2014 312169.8 307813.3
## 2015 315449.5 310270.1
## 2016 318944.5 313056.0
## 2017 322584.2 316063.3
## 2018 326330.1 319232.8
## 2019 330157.9 322527.7
## 2020 334051.5 325923.4
## 2021 337999.6 329402.2
## 
## $Norway$upper
## Time Series:
## Start = 2012 
## End = 2021 
## Frequency = 1 
##           80%      95%
## 2012 314152.7 315964.4
## 2013 321867.0 325203.2
## 2014 328628.7 332985.1
## 2015 335017.8 340197.3
## 2016 341191.8 347080.3
## 2017 347220.9 353741.9
## 2018 353144.0 360241.3
## 2019 358985.1 366615.3
## 2020 364760.4 372888.5
## 2021 370481.2 379078.6
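
A side note: since plot() is called purely for its side effect, purrr’s walk2() (the side-effect twin of map2()) returns its input invisibly, which would suppress the long printed list above:

# walk2() calls plot() for each forecast but prints nothing extra.
par(mfrow = c(3, 3))
walk2(forecasts, names(forecasts),
      ~ plot(.x, main = .y, bty = "n", ylab = "GDP", xlab = "Year"))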

Turning the forecasts into a data.frame

Of course, we wouldn’t be doing functional programming if we didn’t have a reducing step. Thankfully, purrr has a reduce function made for doing this. I’m going to take each of my forecasts, convert them to data.frames, and then stack them together with rbind. The code for this is again really simple.

forecasts_df = forecasts %>%
  map(as.data.frame) %>%
  reduce(rbind)

forecasts_df
##       Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 2012       3254005.3 3125365.2 3382645.4 3057267.3 3450743.4
## 2013       3303414.6 3121490.1 3485339.2 3025185.0 3581644.3
## 2014       3352823.9 3130012.8 3575635.1 3012063.6 3693584.2
## 2015       3402233.3 3144953.1 3659513.5 3008757.1 3795709.4
## 2016       3451642.6 3163994.6 3739290.6 3011722.9 3891562.2
## 2017       3501051.9 3185949.3 3816154.5 3019144.0 3982959.7
## 2018       3550461.2 3210111.5 3890810.9 3029941.2 4070981.2
## 2019       3599870.5 3236021.4 3963719.7 3043411.3 4156329.8
## 2020       3649279.8 3263359.5 4035200.1 3059065.7 4239494.0
## 2021       3698689.1 3291893.4 4105484.9 3076548.8 4320829.5
## 20121       314488.9  306869.7  322108.1  302836.4  326141.4
## 20131       334273.0  322897.8  345648.3  316876.1  351670.0
## 20141       355945.8  342549.7  369342.0  335458.2  376433.5
## 20151       364567.7  347442.0  381693.5  338376.1  390759.3
## 20161       376784.1  354788.1  398780.1  343144.1  410424.1
## 20171       395694.1  369588.2  421800.0  355768.6  435619.6
## 20181       409962.5  379526.6  440398.4  363414.8  456510.3
## 20191       421816.8  386192.3  457441.3  367333.9  476299.7
## 20201       437396.3  396493.2  478299.4  374840.5  499952.2
## 20211       453099.6  406952.4  499246.7  382523.6  523675.6
## 20122       205227.8  198962.2  211493.5  195645.3  214810.3
## 20132       208433.7  197695.6  219171.7  192011.2  224856.1
## 20142       211639.5  197806.4  225472.5  190483.6  232795.3
## 20152       214845.3  198492.9  231197.7  189836.4  239854.1
## 20162       218051.1  199518.7  236583.5  189708.3  246393.9
## 20172       221256.9  200775.3  241738.6  189933.0  252580.9
## 20182       224462.8  202201.9  246723.6  190417.8  258507.8
## 20192       227668.6  203760.6  251576.6  191104.5  264232.7
## 20202       230874.4  205425.7  256323.2  191953.9  269794.9
## 20212       234080.2  207178.8  260981.7  192938.1  275222.4
## 20123      2321634.7 2292282.3 2350987.0 2276744.2 2366525.2
## 20133      2356109.6 2307001.2 2405218.1 2281004.7 2431214.6
## 20143      2390584.6 2327642.4 2453526.9 2294322.8 2486846.5
## 20153      2425059.6 2350818.0 2499301.2 2311516.9 2538602.3
## 20163      2459534.6 2375499.5 2543569.7 2331014.0 2588055.2
## 20173      2494009.6 2401208.8 2586810.3 2352083.1 2635936.1
## 20183      2528484.6 2427677.5 2629291.6 2374313.5 2682655.6
## 20193      2562959.5 2454736.9 2671182.2 2397447.2 2728471.8
## 20203      2597434.5 2482272.8 2712596.3 2421309.8 2773559.2
## 20213      2631909.5 2510203.6 2753615.4 2445776.5 2818042.5
## 20124      1962509.1 1924947.0 2000071.2 1905062.8 2019955.3
## 20134      1974968.2 1918925.7 2031010.6 1889258.6 2060677.7
## 20144      1987427.3 1915163.8 2059690.7 1876909.8 2097944.7
## 20154      1999886.3 1912210.0 2087562.7 1865796.9 2133975.7
## 20164      2012345.4 1909541.6 2115149.2 1855120.6 2169570.3
## 20174      2024804.5 1906910.9 2142698.2 1844501.7 2205107.3
## 20184      2037263.6 1904182.8 2170344.4 1833734.1 2240793.1
## 20194      2049722.7 1901277.9 2198167.5 1822696.0 2276749.4
## 20204      2062181.8 1898147.0 2226216.6 1811312.2 2313051.4
## 20214      2074640.9 1894758.4 2254523.3 1799534.4 2349747.3
## 20125       186180.6  181847.1  190514.1  179553.0  192808.2
## 20135       189686.3  180220.0  199152.6  175208.9  204163.8
## 20145       193192.0  180525.5  205858.6  173820.2  212563.9
## 20155       196697.8  181490.1  211905.4  173439.6  219955.9
## 20165       200203.5  182822.3  217584.6  173621.3  226785.7
## 20175       203709.2  184397.6  223020.8  174174.7  233243.7
## 20185       207214.9  186149.1  228280.7  174997.5  239432.3
## 20195       210720.6  188035.8  233405.4  176027.2  245414.0
## 20205       214226.3  190030.7  238422.0  177222.2  251230.4
## 20215       217732.1  192114.4  243349.7  178553.3  256910.8
## 20126      4400750.2 4304313.4 4497187.0 4253262.9 4548237.6
## 20136      4418175.4 4269352.2 4566998.6 4190570.0 4645780.8
## 20146      4435600.6 4237820.5 4633380.8 4133122.0 4738079.2
## 20156      4453025.8 4206493.7 4699558.0 4075987.4 4830064.3
## 20166      4470451.1 4174302.7 4766599.4 4017531.2 4923370.9
## 20176      4487876.3 4140793.6 4834958.9 3957059.2 5018693.4
## 20186      4505301.5 4105754.6 4904848.4 3894247.2 5116355.8
## 20196      4522726.7 4069085.7 4976367.7 3828942.7 5216510.7
## 20206      4540151.9 4030744.7 5049559.1 3761080.9 5319222.9
## 20216      4557577.1 3990720.5 5124433.8 3690644.7 5424509.6
## 20127       722492.5  712037.2  732947.8  706502.6  738482.4
## 20137       733202.0  715504.3  750899.8  706135.6  760268.4
## 20147       743911.5  721171.5  766651.6  709133.7  778689.4
## 20157       754621.1  727769.5  781472.6  713555.1  795687.0
## 20167       765330.6  734918.3  795742.8  718819.1  811842.1
## 20177       776040.1  742442.4  809637.8  724656.9  827423.3
## 20187       786749.6  750243.5  823255.8  730918.3  842581.0
## 20197       797459.1  758259.7  836658.6  737508.7  857409.6
## 20207       808168.7  766449.4  849887.9  744364.6  871972.8
## 20217       818878.2  774782.9  862973.5  751440.3  886316.1
## 20128       310730.3  307307.9  314152.7  305496.3  315964.4
## 20138       315564.8  309262.5  321867.0  305926.3  325203.2
## 20148       320399.2  312169.8  328628.7  307813.3  332985.1
## 20158       325233.7  315449.5  335017.8  310270.1  340197.3
## 20168       330068.1  318944.5  341191.8  313056.0  347080.3
## 20178       334902.6  322584.2  347220.9  316063.3  353741.9
## 20188       339737.0  326330.1  353144.0  319232.8  360241.3
## 20198       344571.5  330157.9  358985.1  322527.7  366615.3
## 20208       349406.0  334051.5  364760.4  325923.4  372888.5
## 20218       354240.4  337999.6  370481.2  329402.2  379078.6

The reduce step sequentially binds each data.frame. Once you have the final data in forecasts_df, you can start working with dplyr, since the data is in a nice tabular format.
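
As an aside, dplyr’s bind_rows() does the same stacking in one step; with .id it also records which country each row came from as a proper column (the column name here is my choice):

# Same stacking with bind_rows(); .id = "country" keeps the list
# element (country) name as a column instead of mangled row names.
forecasts_df2 <- forecasts %>%
  map(as.data.frame) %>%
  bind_rows(.id = "country")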

Other References

purrr Lessons and Examples

Beautiful data science with functional programming and R

Map Functions From the Purrr Package: Part 1

Running a model on separate groups