tl;dr: You’ll learn how to use purrr, caret and list-cols to quickly create hundreds of dataset + model combinations, store data & model objects neatly in one tibble, and post-process programmatically. These tools enable succinct functional programming in which a lot gets done with just a few lines of code.
I want to write a quick blog post on two phenomenal pieces of code written by Kuhn et al. and Wickham et al., namely caret and purrr. These play so well with the tidyverse that they have become an indispensable part of my repertoire.
In any data science project, I want to investigate the effect of various combinations of data transformations, model types, and tuning methods.
For each of the various combinations possible, I want to quantify model performance using common performance metrics like AIC or SBC. Commonly, I’ll select the model that has the ‘best’ possible performance among all such models.
Traditionally, I end up with many R objects: one for each new combination of transformation-model_type-tuning_method. For example, boostFit, xgbFit, glmFit, elasticnetFit for untransformed variables. If I have any transformations, I might also have boostFit.xform, xgbFit.xform, glmFit.xform etc. Add to that, investigation of grouped vs ungrouped variables… boostFit.xform.grouped, xgbFit.xform.ungrouped etc. You get the idea.
The challenge with this approach is that the data and the models remain separated, there’s a lot of repeated code for object management, manipulation and plotting, and in order to compare all the models together, we have to somehow stitch the results together. (For the last point, resamples() in caret works beautifully, but requires the same number of resamples in each model.)
The approach I’m presenting below is a combination of a few approaches I learnt through the APM book, the caret documentation, the grammar and verbiage of the tidyverse, as well as a couple of useful talks at the 2017 RStudio conference in Orlando [notably the ones on purrr and list-cols]. What you’ll also see is that the code is extremely succinct, which is simply a joy to write and read.
BostonHousing data: purrr and caret
The libraries I’m using here are tidyr, tibble, dplyr, magrittr, purrr, and caret. The dataset is from mlbench.
library(tidyr)
library(tibble)
library(dplyr)
library(magrittr)
library(purrr)
library(caret)
library(mlbench)
data("BostonHousing")
head(BostonHousing)
## crim zn indus chas nox rm age dis rad tax ptratio b
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12
## lstat medv
## 1 4.98 24.0
## 2 9.14 21.6
## 3 4.03 34.7
## 4 2.94 33.4
## 5 5.33 36.2
## 6 5.21 28.7
For the purposes of this demonstration, I’ll simply create two new sets of variables: a Box-Cox transformation of the originals - caret’s preProcess() makes this easy - and the squared values of the originals. Save each new variable set in a character vector which follows the naming convention preds.xxxx.[^1]
[^1]: This makes it super easy to find all such variable sets quickly using ls(pattern = 'preds') and store them in a character vector.
# The originals
response <- 'medv'
preds.original <- colnames(BostonHousing[,1:13])
# Box-Cox transformation
prepTrain <- preProcess(x = BostonHousing[,preds.original], method = c('BoxCox'))
boxcoxed <- predict(prepTrain,newdata = BostonHousing[,preds.original])
colnames(boxcoxed) <- paste0(colnames(boxcoxed),'.boxed')
preds.boxcoxed <- colnames(boxcoxed)
# Squaring
squared <- (BostonHousing[,c(1:3,5:13)])^2
colnames(squared) <- paste0(colnames(squared),'.sq')
preds.sq <- colnames(squared)
# All together now...
BostonHousing %<>%
cbind(boxcoxed,squared)
# Make sure everything is numeric (for xgboost to work), and also NOT a tibble (some caret functions have trouble with tibbles)
BostonHousing %<>%
map_df(.f = ~as.numeric(.x)) %>% as.data.frame()
str(BostonHousing)
## 'data.frame': 506 obs. of 39 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : num 1 1 1 1 1 1 1 1 1 1 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : num 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio : num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ b : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
## $ crim.boxed : num -5.06 -3.6 -3.6 -3.43 -2.67 ...
## $ zn.boxed : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus.boxed : num 0.994 2.966 2.966 0.914 0.914 ...
## $ chas.boxed : num 1 1 1 1 1 1 1 1 1 1 ...
## $ nox.boxed : num -0.83 -1.09 -1.09 -1.13 -1.13 ...
## $ rm.boxed : num 2.81 2.76 3 2.94 2.99 ...
## $ age.boxed : num 175 224 161 110 137 ...
## $ dis.boxed : num 1.41 1.6 1.6 1.8 1.8 ...
## $ rad.boxed : num 0 0.693 0.693 1.099 1.099 ...
## $ tax.boxed : num 1.88 1.87 1.87 1.87 1.87 ...
## $ ptratio.boxed: num 117 158 158 174 174 ...
## $ b.boxed : num 78764 78764 77157 77866 78764 ...
## $ lstat.boxed : num 1.89 2.78 1.61 1.2 1.99 ...
## $ crim.sq : num 3.99e-05 7.46e-04 7.45e-04 1.05e-03 4.77e-03 ...
## $ zn.sq : num 324 0 0 0 0 ...
## $ indus.sq : num 5.34 49.98 49.98 4.75 4.75 ...
## $ nox.sq : num 0.289 0.22 0.22 0.21 0.21 ...
## $ rm.sq : num 43.2 41.2 51.6 49 51.1 ...
## $ age.sq : num 4251 6225 3733 2098 2938 ...
## $ dis.sq : num 16.7 24.7 24.7 36.8 36.8 ...
## $ rad.sq : num 1 4 4 9 9 9 25 25 25 25 ...
## $ tax.sq : num 87616 58564 58564 49284 49284 ...
## $ ptratio.sq : num 234 317 317 350 350 ...
## $ b.sq : num 157530 157530 154315 155733 157530 ...
## $ lstat.sq : num 24.8 83.54 16.24 8.64 28.41 ...
Here are our new predictor variable sets:
pred_varsets <- ls(pattern = 'preds')
pred_varsets
## [1] "preds.boxcoxed" "preds.original" "preds.sq"
I first create a starter dataframe where the input data is repeated as many times as the number of predictor variable sets. enframe() allows us to embed objects in a dataframe column.
num_var_select <- length(pred_varsets)
list(BostonHousing) %>%
rep(num_var_select) %>%
enframe(name = 'id', value = 'rawdata') %>%
mutate(pred_varsets = pred_varsets) -> starter_df
starter_df
## # A tibble: 3 x 3
## id rawdata pred_varsets
## <int> <list> <chr>
## 1 1 <data.frame [506 x 39]> preds.boxcoxed
## 2 2 <data.frame [506 x 39]> preds.original
## 3 3 <data.frame [506 x 39]> preds.sq
Now, I split the raw data into a train.X column which houses data only for those predictor variables identified in the pred_varsets column. map2() is a great function which maps over two inputs in parallel and passes each pair to a function.
I also create a train.Y for the response variable here.
# Function to select columns in the raw data
filterColumns <- function(x,y){
x[,(colnames(x) %in% eval(parse(text=y)))]
}
# Create X and Y columns
starter_df %<>%
transmute(
id,
pred_varsets,
train.X = map2(rawdata, pred_varsets, ~ filterColumns(.x, .y)),
train.Y = map(rawdata, ~ .x$medv)
)
starter_df
## # A tibble: 3 x 4
## id pred_varsets train.X train.Y
## <int> <chr> <list> <list>
## 1 1 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]>
## 2 2 preds.original <data.frame [506 x 13]> <dbl [506]>
## 3 3 preds.sq <data.frame [506 x 12]> <dbl [506]>
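As an aside, the eval(parse(text = y)) trick works, but get() is a slightly cleaner way to fetch an object by its character name. A minimal, equivalent sketch (assuming, as above, that the preds.xxxx vectors live in the calling environment):
# Same column selection using get() instead of eval(parse())
filterColumns <- function(x, y) {
  x[, colnames(x) %in% get(y)]
}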
This is where I can select which models I want in the analysis. Each model should be in a function of this style:
modelName <- function(X, Y){
ctrl <- trainControl(
...
)
train(
x = X,
y = Y,
trControl = ctrl,
method = '## modelname ##',
...
)
}
I’m using caret exclusively, so each function needs a trainControl() and a train(). Learn more about caret in its package documentation.
rpartModel <- function(X, Y) {
ctrl <- trainControl(
## 5-fold CV
method = "repeatedcv",
number = 5
)
train(
x = X,
y = Y,
method = 'rpart2',
trControl = ctrl,
tuneGrid = data.frame(maxdepth=c(2,3,4,5)),
preProc = c('center', 'scale')
)
}
xgbTreeModel <- function(X,Y){
ctrl <- trainControl(
## 5-fold CV
method = "repeatedcv",
number = 5
)
train(
x=X,
y=Y,
method = 'xgbTree',
trControl = ctrl,
tuneGrid = expand.grid(nrounds = c(100,300,500),
max_depth = c(2,4,6) ,
eta = 0.1,
gamma = 1,
colsample_bytree = 1,
min_child_weight = 1,
subsample = 1),
preProc = c('center', 'scale')
)
}
Once these functions are set up, enframe these into a dataframe.
model_list <- list(rpartModel=rpartModel,
xgbModel=xgbTreeModel) %>%
enframe(name = 'modelName',value = 'model')
model_list
## # A tibble: 2 x 2
## modelName model
## <chr> <list>
## 1 rpartModel <fun>
## 2 xgbModel <fun>
Now, we’re ready to combine the two together. starter_df has all the predictor varset combinations, and model_list holds all the models. I’m assuming I want to run each combination of the two; so if I have 3 variable sets and 2 models, I have a total of 6 models to run. This code sets that up:
train_df <-
starter_df[rep(1:nrow(starter_df),nrow(model_list)),]
train_df %<>%
bind_cols(
model_list[rep(1:nrow(model_list),nrow(starter_df)),] %>% arrange(modelName)
) %>%
mutate(id=1:nrow(.))
train_df
## # A tibble: 6 x 6
## id pred_varsets train.X train.Y modelName model
## <int> <chr> <list> <list> <chr> <lis>
## 1 1 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]> rpartMod… <fun>
## 2 2 preds.original <data.frame [506 x 13]> <dbl [506]> rpartMod… <fun>
## 3 3 preds.sq <data.frame [506 x 12]> <dbl [506]> rpartMod… <fun>
## 4 4 preds.boxcoxed <data.frame [506 x 13]> <dbl [506]> xgbModel <fun>
## 5 5 preds.original <data.frame [506 x 13]> <dbl [506]> xgbModel <fun>
## 6 6 preds.sq <data.frame [506 x 12]> <dbl [506]> xgbModel <fun>
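For what it’s worth, the replicate-and-bind dance above can also be written as a cross join. A hedged sketch using a dummy key (tidyr::crossing() can be fussy with list-columns, so a full_join on a constant key is a reliable fallback):
# Cross join starter_df with model_list via a constant key
train_df <- starter_df %>%
  mutate(.k = 1) %>%
  full_join(model_list %>% mutate(.k = 1), by = ".k") %>%
  select(-.k) %>%
  mutate(id = row_number())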
The data is almost all set up now. invoke_map() is a function which calls each function in a list with its corresponding arguments. Since we need to pass both train.X and train.Y together, there’s an intermediate call to map2() to “listify” these first into params.
All the models are then fit, and their results (the model objects themselves) are stored in modelFits.
train_df %<>%
mutate(params = map2(train.X, train.Y, ~ list(X = .x, Y = .y)),
modelFits=invoke_map(model,params)
)
train_df %>% dplyr::select(pred_varsets,modelName,params,modelFits)
## # A tibble: 6 x 4
## pred_varsets modelName params modelFits
## <chr> <chr> <list> <list>
## 1 preds.boxcoxed rpartModel <list [2]> <S3: train>
## 2 preds.original rpartModel <list [2]> <S3: train>
## 3 preds.sq rpartModel <list [2]> <S3: train>
## 4 preds.boxcoxed xgbModel <list [2]> <S3: train>
## 5 preds.original xgbModel <list [2]> <S3: train>
## 6 preds.sq xgbModel <list [2]> <S3: train>
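A side note: newer purrr releases have deprecated invoke_map(). If you’re on one of those, the same fit step can be written with map2() and do.call(); a minimal sketch:
# Equivalent without invoke_map(): call each model function on its params list
train_df %<>%
  mutate(modelFits = map2(model, params, ~ do.call(.x, .y)))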
Now, I can extract pretty much any model performance metric or tuning parameter using purrr. Since caret is so lovingly standardized, it doesn’t matter if I’m using a glm, xgboost, rpart2, or ann - the code remains the same.
train_df %<>%
mutate(
RMSE=map_dbl(modelFits,~max(.x$results$RMSE)),
RMSESD=map_dbl(modelFits,~max(.x$results$RMSESD)),
Rsq=map_dbl(modelFits,~max(.x$results$Rsquared)),
bestTune=map(modelFits,~.x$bestTune)
)
train_df %>% dplyr::select(-train.X,-train.Y,-params,-modelFits)
## # A tibble: 6 x 8
## id pred_varsets modelName model RMSE RMSESD Rsq bestTune
## <int> <chr> <chr> <list> <dbl> <dbl> <dbl> <list>
## 1 1 preds.boxcoxed rpartModel <fun> 6.07 1.04 0.741 <data.frame […
## 2 2 preds.original rpartModel <fun> 6.21 0.878 0.742 <data.frame […
## 3 3 preds.sq rpartModel <fun> 5.94 0.982 0.721 <data.frame […
## 4 4 preds.boxcoxed xgbModel <fun> 3.59 0.675 0.891 <data.frame […
## 5 5 preds.original xgbModel <fun> 3.26 0.463 0.889 <data.frame […
## 6 6 preds.sq xgbModel <fun> 3.47 0.756 0.896 <data.frame […
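One caution about the extraction above: max() returns the largest value across the entire tuning grid, which for an error metric like RMSE is the worst grid point rather than the chosen one. A hedged sketch of pulling the winning RMSE instead (caret’s default selection minimizes RMSE):
# RMSE of the selected tuning parameters, not the grid maximum
train_df %>%
  mutate(bestRMSE = map_dbl(modelFits, ~ min(.x$results$RMSE)))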
Having these metrics as columns allows us to very quickly visualize the results using lattice or ggplot across all models.
lattice::dotplot(Rsq~pred_varsets|modelName,train_df)
train_df %>%
ggplot(aes(x=pred_varsets,color=modelName))+
geom_point(aes(y=RMSE),size=2)+
geom_errorbar(aes(ymin = RMSE-RMSESD,ymax= RMSE+RMSESD),size=.5,width=.15)
Since the model fit objects themselves are embedded, I can still look at each model’s internals. For example, to plot the results of the 5-fold CV on the grid search for the xgboost model:
plot(train_df$modelFits[train_df$modelName=='xgbModel' & train_df$pred_varsets=='preds.original'][[1]])
These packages make investigating a very large number of datasets and models easy. With just a few lines of code, I ran a total of 6 different models - and for each model: a 5-fold cross validation, and for the xgboost models: a grid search across two tuning parameters - to select the best model on any number of performance criteria. Yet, everything remains neatly arranged in one dataframe, which can be saved as .RData and retrieved later.
Also remember that the raw data replicated in the data column of starter_df doesn’t have to be the exact same dataset for each row either. So you could leverage this methodology for a train-validate-test approach, or for resampled training sets, where each row has a completely different dataset embedded within. It really depends on your creativity and how you write subsequent code. You’re definitely going to find more material online on this topic; be sure to check r-bloggers.
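As an illustration of that idea, here’s a minimal sketch (my own, not from the original post) that embeds a different bootstrap resample of BostonHousing in each row; the row count and seed are arbitrary:
# Each row carries its own bootstrap resample of the data
set.seed(42)
resampled_df <- tibble(id = 1:5) %>%
  mutate(
    rawdata = map(id, ~ BostonHousing[sample(nrow(BostonHousing), replace = TRUE), ]),
    train.X = map(rawdata, ~ .x[, preds.original]),
    train.Y = map(rawdata, ~ .x$medv)
  )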
purrr/purrrlyr and other iterative methods
Source for this document.
@dattali asked, “what’s a safe way to iterate over rows of a data frame?” The example was to convert each row into a list and return a list of lists, indexed first by row, then by column.
A number of people gave suggestions on Twitter, which I’ve collected here. I’ve benchmarked these methods with data of various sizes; scroll down to see a plot of times.
library(purrr)
library(dplyr)
library(tidyr)
library(purrrlyr)
# @dattali
# Using apply (only safe when all cols are same type)
f_apply <- function(df) {
apply(df, 1, function(row) as.list(row))
}
# @drob
# split + lapply
f_split_lapply <- function(df) {
df <- split(df, seq_len(nrow(df)))
lapply(df, function(row) as.list(row))
}
# @winston_chang
# lapply over row indices
f_lapply_row <- function(df) {
lapply(seq_len(nrow(df)), function(i) as.list(df[i,,drop=FALSE]))
}
# @winston_chang
# lapply + lapply: Treat data frame as list, and then slice out lists
f_lapply_lapply <- function(df) {
cols <- seq_len(length(df))
names(cols) <- names(df)
lapply(seq_len(nrow(df)), function(row) {
lapply(cols, function(col) {
df[[col]][[row]]
})
})
}
# @winston_chang
# purrrlyr::by_row
f_by_row <- function(df) {
res <- by_row(df, function(row) as.list(row))
res$.out
}
# @JennyBryan
# purrr::pmap
f_pmap <- function(df) {
pmap(df, list)
}
# purrr::pmap, but coerce df to a list first
f_pmap_aslist <- function(df) {
pmap(as.list(df), list)
}
# @krlmlr
# dplyr::rowwise
f_rowwise <- function(df) {
df %>% rowwise %>% do(row = as.list(.))
}
Benchmark each of them, using data sets with varying numbers of rows:
run_benchmark <- function(nrow) {
# Make some data
df <- data.frame(
x = rnorm(nrow),
y = runif(nrow),
z = runif(nrow)
)
res <- list(
apply = system.time(f_apply(df)),
split_lapply = system.time(f_split_lapply(df)),
lapply_row = system.time(f_lapply_row(df)),
lapply_lapply = system.time(f_lapply_lapply(df)),
by_row = system.time(f_by_row(df)),
pmap = system.time(f_pmap(df)),
pmap_aslist = system.time(f_pmap_aslist(df)),
rowwise = system.time(f_rowwise(df))
)
# Get elapsed times
res <- lapply(res, `[[`, "elapsed")
# Add nrow to front
res <- c(nrow = nrow, res)
res
}
# Run the benchmarks for various size data
all_times <- lapply(1:5, function(n) {
run_benchmark(10^n)
})
# Convert to data frame
times <- lapply(all_times, as.data.frame)
times <- do.call(rbind, times)
knitr::kable(times)
| nrow | apply | split_lapply | lapply_row | lapply_lapply | by_row | pmap | pmap_aslist | rowwise |
|---|---|---|---|---|---|---|---|---|
| 1e+01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.09 | 0.08 |
| 1e+02 | 0.01 | 0.01 | 0.01 | 0.02 | 0.11 | 0.00 | 0.00 | 0.17 |
| 1e+03 | 0.00 | 0.07 | 0.05 | 0.03 | 1.00 | 0.00 | 0.00 | 0.52 |
| 1e+04 | 0.08 | 0.94 | 0.82 | 0.26 | 10.84 | 0.03 | 0.03 | 4.70 |
| 1e+05 | 1.08 | 46.61 | 17.66 | 2.78 | 150.84 | 0.34 | 0.35 | 59.96 |
This plot shows the number of seconds needed to process n rows, for each method. Both the x and y use log scales, so each step along the x scale represents a 10x increase in number of rows, and each step along the y scale represents a 10x increase in time.
library(ggplot2)
library(scales)
# Convert to long format
times_long <- gather(times, method, seconds, -nrow)
# Set order of methods, for plots
times_long$method <- factor(times_long$method,
levels = c("apply", "split_lapply", "lapply_row", "lapply_lapply", "by_row",
"pmap", "pmap_aslist", "rowwise")
)
# Plot with log-log axes
ggplot(times_long, aes(x = nrow, y = seconds, colour = method)) +
geom_point() +
geom_line() +
annotation_logticks(sides = "trbl") +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)),
minor_breaks = NULL) +
scale_x_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)),
minor_breaks = NULL)
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous y-axis
purrr awesome use
Beautiful data science with functional programming and R
If you follow what’s going on in the R ecosystem, you know that the language has been revolutionized by two packages: dplyr and magrittr. The former provides elegant methods for working with relational data, while the latter brought the pipe %>% to R.
Together, these packages make handling tabular data a snap. For those of you who haven’t seen these packages in action, here’s a little taste:
iris %>%
group_by(Species) %>%
summarize(Mean.Width=mean(Sepal.Width))
## # A tibble: 3 x 2
##   Species    Mean.Width
## 1 setosa           3.43
## 2 versicolor       2.77
## 3 virginica        2.97
The syntax is super clear: even someone with no R experience can understand those three lines of code. No wonder these packages caught on so fast.
But what if you don’t have tabular data? What if you’re using JSON or maybe a time series? Sure, you can shoehorn those data types into a data.frame, but it’s not the most natural fit. This is where my new favorite R package, purrr, can help you out.
What’s purrr?
The goal of purrr is to bring more functional programming to R. It gives you easy-to-use mappers, reducers and filters that you can chain together. In short, it’s dplyr for non-tabular data. That’s great in theory, you’re probably thinking, but how do I use it in practice? What does the code look like? Well, I say, the best explanation is always an example.
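To make that concrete, here’s a tiny chain of my own (not from the original post) that filters, maps and reduces in one pipeline:
# Square the even numbers in 1:10, then sum them
library(purrr)
1:10 %>%
  keep(~ .x %% 2 == 0) %>%   # filter: keep the evens
  map_dbl(~ .x^2) %>%        # map: square each one
  reduce(`+`)                # reduce: add them up
## [1] 220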
An Example: Forecasting Gross Domestic Product
Let’s say you want to forecast the GDP of a bunch of countries using R. You could force this data into a data.frame but it’s going to be a bit weird. Or you could use R’s built-in time series class but then you don’t have dplyr’s easy syntax. This is where purrr really excels: you can use time series objects and have dplyr-like syntax.
Getting the data from Quandl
We’re going to use the awesome Quandl API to get a bunch of GDP data and then forecast the GDP for 9 different countries for the next 10 years. First, let’s load some libraries and define the datasets that we’re interested in.
library(Quandl)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
library(purrr)
library(forecast)
library(dplyr)
datasets = c(Germany="FRED/DEURGDPR",
Singapore="FRED/SGPRGDPR",
Finland="FRED/FINRGDPR",
France="FRED/FRARGDPR",
Italy="FRED/ITARGDPR",
Ireland="FRED/IRLRGDPR",
Japan="FRED/JPNRGDPR",
Netherlands="FRED/NLDRGDPR",
Norway="FRED/NORRGDPR")
Next, let’s use the map function from purrr to pull down the data as a time series object for each of these countries.
time_series = datasets %>%
map(Quandl, type="ts")
What’s going on with those two lines of code? The map function takes a list or vector and a function as inputs. It then applies the function to each element. So, for each of the elements in the datasets vector, it’s calling the Quandl function to retrieve a time series object. The type="ts" at the end ensures that we’re getting a time series object instead of a data.frame.
You may be thinking that map is awfully similar to lapply, and you’re right. In fact, purrr is wrapping the standard lapply. But it’s providing an improved syntax and expanded capabilities that we’ll see later.
If you look at time_series, you’ll see that it’s a list of time series, one for each of the countries. (I truncated the output to save space.)
str(time_series)
## List of 9
## $ Germany : Time-Series [1:52] from 1960 to 2011: 684721 716424 749838 770928 822282 ...
## $ Singapore : Time-Series [1:52] from 1960 to 2011: 7208 7795 8349 9181 8841 ...
## $ Finland : Time-Series [1:37] from 1975 to 2011: 85842 86137 86344 88865 95194 ...
## $ France : Time-Series [1:52] from 1960 to 2011: 525384 550628 588468 624954 665117 ...
## $ Italy : Time-Series [1:52] from 1960 to 2011: 493997 534536 567702 599551 616317 ...
## $ Ireland : Time-Series [1:42] from 1970 to 2011: 36835 38525 41188 43513 44647 ...
## $ Japan : Time-Series [1:52] from 1960 to 2011: 574859 642998 698275 759635 844605 ...
## $ Netherlands: Time-Series [1:52] from 1960 to 2011: 163562 168306 175543 181336 196931 ...
## $ Norway : Time-Series [1:52] from 1960 to 2011: 59380 62997 64692 67143 70508 ...
Forecasting GDP
Next, we’re going to train an ARIMA model on each of these time series and then forecast for the next 10 years. The code for this is super easy with purrr. Check out my other blog post on time series if you want to find out more about these models.
forecasts = time_series %>%
map(auto.arima) %>%
map(forecast, h=10)
forecasts
## $Germany
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 3254005 3125365 3382645 3057267 3450743
## 2013 3303415 3121490 3485339 3025185 3581644
## 2014 3352824 3130013 3575635 3012064 3693584
## 2015 3402233 3144953 3659513 3008757 3795709
## 2016 3451643 3163995 3739291 3011723 3891562
## 2017 3501052 3185949 3816154 3019144 3982960
## 2018 3550461 3210111 3890811 3029941 4070981
## 2019 3599871 3236021 3963720 3043411 4156330
## 2020 3649280 3263360 4035200 3059066 4239494
## 2021 3698689 3291893 4105485 3076549 4320829
##
## $Singapore
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 314488.9 306869.7 322108.1 302836.4 326141.4
## 2013 334273.0 322897.8 345648.3 316876.1 351670.0
## 2014 355945.8 342549.7 369342.0 335458.2 376433.5
## 2015 364567.7 347442.0 381693.5 338376.1 390759.3
## 2016 376784.1 354788.1 398780.1 343144.1 410424.1
## 2017 395694.1 369588.2 421800.0 355768.6 435619.6
## 2018 409962.5 379526.6 440398.4 363414.8 456510.3
## 2019 421816.8 386192.3 457441.3 367333.9 476299.7
## 2020 437396.3 396493.2 478299.4 374840.5 499952.2
## 2021 453099.6 406952.4 499246.7 382523.6 523675.6
##
## $Finland
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 205227.8 198962.2 211493.5 195645.3 214810.3
## 2013 208433.7 197695.6 219171.7 192011.2 224856.1
## 2014 211639.5 197806.4 225472.5 190483.6 232795.3
## 2015 214845.3 198492.9 231197.7 189836.4 239854.1
## 2016 218051.1 199518.7 236583.5 189708.3 246393.9
## 2017 221256.9 200775.3 241738.6 189933.0 252580.9
## 2018 224462.8 202201.9 246723.6 190417.8 258507.8
## 2019 227668.6 203760.6 251576.6 191104.5 264232.7
## 2020 230874.4 205425.7 256323.2 191953.9 269794.9
## 2021 234080.2 207178.8 260981.7 192938.1 275222.4
##
## $France
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 2321635 2292282 2350987 2276744 2366525
## 2013 2356110 2307001 2405218 2281005 2431215
## 2014 2390585 2327642 2453527 2294323 2486846
## 2015 2425060 2350818 2499301 2311517 2538602
## 2016 2459535 2375500 2543570 2331014 2588055
## 2017 2494010 2401209 2586810 2352083 2635936
## 2018 2528485 2427678 2629292 2374313 2682656
## 2019 2562960 2454737 2671182 2397447 2728472
## 2020 2597435 2482273 2712596 2421310 2773559
## 2021 2631909 2510204 2753615 2445776 2818043
##
## $Italy
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 1962509 1924947 2000071 1905063 2019955
## 2013 1974968 1918926 2031011 1889259 2060678
## 2014 1987427 1915164 2059691 1876910 2097945
## 2015 1999886 1912210 2087563 1865797 2133976
## 2016 2012345 1909542 2115149 1855121 2169570
## 2017 2024805 1906911 2142698 1844502 2205107
## 2018 2037264 1904183 2170344 1833734 2240793
## 2019 2049723 1901278 2198167 1822696 2276749
## 2020 2062182 1898147 2226217 1811312 2313051
## 2021 2074641 1894758 2254523 1799534 2349747
##
## $Ireland
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 186180.6 181847.1 190514.1 179553.0 192808.2
## 2013 189686.3 180220.0 199152.6 175208.9 204163.8
## 2014 193192.0 180525.5 205858.6 173820.2 212563.9
## 2015 196697.8 181490.1 211905.4 173439.6 219955.9
## 2016 200203.5 182822.3 217584.6 173621.3 226785.7
## 2017 203709.2 184397.6 223020.8 174174.7 233243.7
## 2018 207214.9 186149.1 228280.7 174997.5 239432.3
## 2019 210720.6 188035.8 233405.4 176027.2 245414.0
## 2020 214226.3 190030.7 238422.0 177222.2 251230.4
## 2021 217732.1 192114.4 243349.7 178553.3 256910.8
##
## $Japan
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 4400750 4304313 4497187 4253263 4548238
## 2013 4418175 4269352 4566999 4190570 4645781
## 2014 4435601 4237821 4633381 4133122 4738079
## 2015 4453026 4206494 4699558 4075987 4830064
## 2016 4470451 4174303 4766599 4017531 4923371
## 2017 4487876 4140794 4834959 3957059 5018693
## 2018 4505301 4105755 4904848 3894247 5116356
## 2019 4522727 4069086 4976368 3828943 5216511
## 2020 4540152 4030745 5049559 3761081 5319223
## 2021 4557577 3990720 5124434 3690645 5424510
##
## $Netherlands
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 722492.5 712037.2 732947.8 706502.6 738482.4
## 2013 733202.0 715504.3 750899.8 706135.6 760268.4
## 2014 743911.5 721171.5 766651.6 709133.7 778689.4
## 2015 754621.1 727769.5 781472.6 713555.1 795687.0
## 2016 765330.6 734918.3 795742.8 718819.1 811842.1
## 2017 776040.1 742442.4 809637.8 724656.9 827423.3
## 2018 786749.6 750243.5 823255.8 730918.3 842581.0
## 2019 797459.1 758259.7 836658.6 737508.7 857409.6
## 2020 808168.7 766449.4 849887.9 744364.6 871972.8
## 2021 818878.2 774782.9 862973.5 751440.3 886316.1
##
## $Norway
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 310730.3 307307.9 314152.7 305496.3 315964.4
## 2013 315564.8 309262.5 321867.0 305926.3 325203.2
## 2014 320399.2 312169.8 328628.7 307813.3 332985.1
## 2015 325233.7 315449.5 335017.8 310270.1 340197.3
## 2016 330068.1 318944.5 341191.8 313056.0 347080.3
## 2017 334902.6 322584.2 347220.9 316063.3 353741.9
## 2018 339737.0 326330.1 353144.0 319232.8 360241.3
## 2019 344571.5 330157.9 358985.1 322527.7 366615.3
## 2020 349406.0 334051.5 364760.4 325923.4 372888.5
## 2021 354240.4 337999.6 370481.2 329402.2 379078.6
##
With purrr, chaining together these maps becomes super easy. No messy loops or confusing lapply calls to deal with. It’s functional programming at its best.
Plotting the forecasts
Now that we have our forecasts, it would be good to visualize them. We’re going to use the map2 function to do this.
par(mfrow=c(3,3)) # set grid for plots
map2(forecasts, names(forecasts),
function(forecast, country) plot(forecast,
main=country,
bty="n",
ylab="GDP",
xlab="Year"))## $Germany
## $Germany$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 3254005 3303415 3352824 3402233 3451643 3501052 3550461 3599871
## [9] 3649280 3698689
##
## $Germany$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 3125365 3057267
## 2013 3121490 3025185
## 2014 3130013 3012064
## 2015 3144953 3008757
## 2016 3163995 3011723
## 2017 3185949 3019144
## 2018 3210111 3029941
## 2019 3236021 3043411
## 2020 3263360 3059066
## 2021 3291893 3076549
##
## $Germany$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 3382645 3450743
## 2013 3485339 3581644
## 2014 3575635 3693584
## 2015 3659513 3795709
## 2016 3739291 3891562
## 2017 3816154 3982960
## 2018 3890811 4070981
## 2019 3963720 4156330
## 2020 4035200 4239494
## 2021 4105485 4320829
##
##
## $Singapore
## $Singapore$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 314488.9 334273.0 355945.8 364567.7 376784.1 395694.1 409962.5
## [8] 421816.8 437396.3 453099.6
##
## $Singapore$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 306869.7 302836.4
## 2013 322897.8 316876.1
## 2014 342549.7 335458.2
## 2015 347442.0 338376.1
## 2016 354788.1 343144.1
## 2017 369588.2 355768.6
## 2018 379526.6 363414.8
## 2019 386192.3 367333.9
## 2020 396493.2 374840.5
## 2021 406952.4 382523.6
##
## $Singapore$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 322108.1 326141.4
## 2013 345648.3 351670.0
## 2014 369342.0 376433.5
## 2015 381693.5 390759.3
## 2016 398780.1 410424.1
## 2017 421800.0 435619.6
## 2018 440398.4 456510.3
## 2019 457441.3 476299.7
## 2020 478299.4 499952.2
## 2021 499246.7 523675.6
##
##
## $Finland
## $Finland$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 205227.8 208433.7 211639.5 214845.3 218051.1 221256.9 224462.8
## [8] 227668.6 230874.4 234080.2
##
## $Finland$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 198962.2 195645.3
## 2013 197695.6 192011.2
## 2014 197806.4 190483.6
## 2015 198492.9 189836.4
## 2016 199518.7 189708.3
## 2017 200775.3 189933.0
## 2018 202201.9 190417.8
## 2019 203760.6 191104.5
## 2020 205425.7 191953.9
## 2021 207178.8 192938.1
##
## $Finland$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 211493.5 214810.3
## 2013 219171.7 224856.1
## 2014 225472.5 232795.3
## 2015 231197.7 239854.1
## 2016 236583.5 246393.9
## 2017 241738.6 252580.9
## 2018 246723.6 258507.8
## 2019 251576.6 264232.7
## 2020 256323.2 269794.9
## 2021 260981.7 275222.4
##
##
## $France
## $France$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 2321635 2356110 2390585 2425060 2459535 2494010 2528485 2562960
## [9] 2597435 2631909
##
## $France$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 2292282 2276744
## 2013 2307001 2281005
## 2014 2327642 2294323
## 2015 2350818 2311517
## 2016 2375500 2331014
## 2017 2401209 2352083
## 2018 2427678 2374313
## 2019 2454737 2397447
## 2020 2482273 2421310
## 2021 2510204 2445776
##
## $France$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 2350987 2366525
## 2013 2405218 2431215
## 2014 2453527 2486846
## 2015 2499301 2538602
## 2016 2543570 2588055
## 2017 2586810 2635936
## 2018 2629292 2682656
## 2019 2671182 2728472
## 2020 2712596 2773559
## 2021 2753615 2818043
##
##
## $Italy
## $Italy$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 1962509 1974968 1987427 1999886 2012345 2024805 2037264 2049723
## [9] 2062182 2074641
##
## $Italy$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 1924947 1905063
## 2013 1918926 1889259
## 2014 1915164 1876910
## 2015 1912210 1865797
## 2016 1909542 1855121
## 2017 1906911 1844502
## 2018 1904183 1833734
## 2019 1901278 1822696
## 2020 1898147 1811312
## 2021 1894758 1799534
##
## $Italy$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 2000071 2019955
## 2013 2031011 2060678
## 2014 2059691 2097945
## 2015 2087563 2133976
## 2016 2115149 2169570
## 2017 2142698 2205107
## 2018 2170344 2240793
## 2019 2198167 2276749
## 2020 2226217 2313051
## 2021 2254523 2349747
##
##
## $Ireland
## $Ireland$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 186180.6 189686.3 193192.0 196697.8 200203.5 203709.2 207214.9
## [8] 210720.6 214226.3 217732.1
##
## $Ireland$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 181847.1 179553.0
## 2013 180220.0 175208.9
## 2014 180525.5 173820.2
## 2015 181490.1 173439.6
## 2016 182822.3 173621.3
## 2017 184397.6 174174.7
## 2018 186149.1 174997.5
## 2019 188035.8 176027.2
## 2020 190030.7 177222.2
## 2021 192114.4 178553.3
##
## $Ireland$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 190514.1 192808.2
## 2013 199152.6 204163.8
## 2014 205858.6 212563.9
## 2015 211905.4 219955.9
## 2016 217584.6 226785.7
## 2017 223020.8 233243.7
## 2018 228280.7 239432.3
## 2019 233405.4 245414.0
## 2020 238422.0 251230.4
## 2021 243349.7 256910.8
##
##
## $Japan
## $Japan$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 4400750 4418175 4435601 4453026 4470451 4487876 4505301 4522727
## [9] 4540152 4557577
##
## $Japan$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 4304313 4253263
## 2013 4269352 4190570
## 2014 4237821 4133122
## 2015 4206494 4075987
## 2016 4174303 4017531
## 2017 4140794 3957059
## 2018 4105755 3894247
## 2019 4069086 3828943
## 2020 4030745 3761081
## 2021 3990720 3690645
##
## $Japan$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 4497187 4548238
## 2013 4566999 4645781
## 2014 4633381 4738079
## 2015 4699558 4830064
## 2016 4766599 4923371
## 2017 4834959 5018693
## 2018 4904848 5116356
## 2019 4976368 5216511
## 2020 5049559 5319223
## 2021 5124434 5424510
##
##
## $Netherlands
## $Netherlands$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 722492.5 733202.0 743911.5 754621.1 765330.6 776040.1 786749.6
## [8] 797459.1 808168.7 818878.2
##
## $Netherlands$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 712037.2 706502.6
## 2013 715504.3 706135.6
## 2014 721171.5 709133.7
## 2015 727769.5 713555.1
## 2016 734918.3 718819.1
## 2017 742442.4 724656.9
## 2018 750243.5 730918.3
## 2019 758259.7 737508.7
## 2020 766449.4 744364.6
## 2021 774782.9 751440.3
##
## $Netherlands$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 732947.8 738482.4
## 2013 750899.8 760268.4
## 2014 766651.6 778689.4
## 2015 781472.6 795687.0
## 2016 795742.8 811842.1
## 2017 809637.8 827423.3
## 2018 823255.8 842581.0
## 2019 836658.6 857409.6
## 2020 849887.9 871972.8
## 2021 862973.5 886316.1
##
##
## $Norway
## $Norway$mean
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## [1] 310730.3 315564.8 320399.2 325233.7 330068.1 334902.6 339737.0
## [8] 344571.5 349406.0 354240.4
##
## $Norway$lower
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 307307.9 305496.3
## 2013 309262.5 305926.3
## 2014 312169.8 307813.3
## 2015 315449.5 310270.1
## 2016 318944.5 313056.0
## 2017 322584.2 316063.3
## 2018 326330.1 319232.8
## 2019 330157.9 322527.7
## 2020 334051.5 325923.4
## 2021 337999.6 329402.2
##
## $Norway$upper
## Time Series:
## Start = 2012
## End = 2021
## Frequency = 1
## 80% 95%
## 2012 314152.7 315964.4
## 2013 321867.0 325203.2
## 2014 328628.7 332985.1
## 2015 335017.8 340197.3
## 2016 341191.8 347080.3
## 2017 347220.9 353741.9
## 2018 353144.0 360241.3
## 2019 358985.1 366615.3
## 2020 364760.4 372888.5
## 2021 370481.2 379078.6
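If you only want the plots and not the printed return value, purrr’s walk2() is arguably the better tool here: walk functions exist purely for their side effects and return their input invisibly, so nothing is dumped to the console. A minimal sketch:
# Same plots, no console dump: walk2() is map2() for side effects
par(mfrow=c(3,3))
walk2(forecasts, names(forecasts),
      function(forecast, country) plot(forecast,
                                       main=country,
                                       bty="n",
                                       ylab="GDP",
                                       xlab="Year"))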
Turning the forecasts into a data.frame
Of course, we wouldn’t be doing functional programming if we didn’t have a reducing step. Thankfully, purrr has a reduce function made for exactly this. I’m going to take each of my forecasts, convert them to data.frames and then stack them together with rbind. The code for this is again really simple.
forecasts_df = forecasts %>%
map(as.data.frame) %>%
reduce(rbind)
forecasts_df
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2012 3254005.3 3125365.2 3382645.4 3057267.3 3450743.4
## 2013 3303414.6 3121490.1 3485339.2 3025185.0 3581644.3
## 2014 3352823.9 3130012.8 3575635.1 3012063.6 3693584.2
## 2015 3402233.3 3144953.1 3659513.5 3008757.1 3795709.4
## 2016 3451642.6 3163994.6 3739290.6 3011722.9 3891562.2
## 2017 3501051.9 3185949.3 3816154.5 3019144.0 3982959.7
## 2018 3550461.2 3210111.5 3890810.9 3029941.2 4070981.2
## 2019 3599870.5 3236021.4 3963719.7 3043411.3 4156329.8
## 2020 3649279.8 3263359.5 4035200.1 3059065.7 4239494.0
## 2021 3698689.1 3291893.4 4105484.9 3076548.8 4320829.5
## 20121 314488.9 306869.7 322108.1 302836.4 326141.4
## 20131 334273.0 322897.8 345648.3 316876.1 351670.0
## 20141 355945.8 342549.7 369342.0 335458.2 376433.5
## 20151 364567.7 347442.0 381693.5 338376.1 390759.3
## 20161 376784.1 354788.1 398780.1 343144.1 410424.1
## 20171 395694.1 369588.2 421800.0 355768.6 435619.6
## 20181 409962.5 379526.6 440398.4 363414.8 456510.3
## 20191 421816.8 386192.3 457441.3 367333.9 476299.7
## 20201 437396.3 396493.2 478299.4 374840.5 499952.2
## 20211 453099.6 406952.4 499246.7 382523.6 523675.6
## 20122 205227.8 198962.2 211493.5 195645.3 214810.3
## 20132 208433.7 197695.6 219171.7 192011.2 224856.1
## 20142 211639.5 197806.4 225472.5 190483.6 232795.3
## 20152 214845.3 198492.9 231197.7 189836.4 239854.1
## 20162 218051.1 199518.7 236583.5 189708.3 246393.9
## 20172 221256.9 200775.3 241738.6 189933.0 252580.9
## 20182 224462.8 202201.9 246723.6 190417.8 258507.8
## 20192 227668.6 203760.6 251576.6 191104.5 264232.7
## 20202 230874.4 205425.7 256323.2 191953.9 269794.9
## 20212 234080.2 207178.8 260981.7 192938.1 275222.4
## 20123 2321634.7 2292282.3 2350987.0 2276744.2 2366525.2
## 20133 2356109.6 2307001.2 2405218.1 2281004.7 2431214.6
## 20143 2390584.6 2327642.4 2453526.9 2294322.8 2486846.5
## 20153 2425059.6 2350818.0 2499301.2 2311516.9 2538602.3
## 20163 2459534.6 2375499.5 2543569.7 2331014.0 2588055.2
## 20173 2494009.6 2401208.8 2586810.3 2352083.1 2635936.1
## 20183 2528484.6 2427677.5 2629291.6 2374313.5 2682655.6
## 20193 2562959.5 2454736.9 2671182.2 2397447.2 2728471.8
## 20203 2597434.5 2482272.8 2712596.3 2421309.8 2773559.2
## 20213 2631909.5 2510203.6 2753615.4 2445776.5 2818042.5
## 20124 1962509.1 1924947.0 2000071.2 1905062.8 2019955.3
## 20134 1974968.2 1918925.7 2031010.6 1889258.6 2060677.7
## 20144 1987427.3 1915163.8 2059690.7 1876909.8 2097944.7
## 20154 1999886.3 1912210.0 2087562.7 1865796.9 2133975.7
## 20164 2012345.4 1909541.6 2115149.2 1855120.6 2169570.3
## 20174 2024804.5 1906910.9 2142698.2 1844501.7 2205107.3
## 20184 2037263.6 1904182.8 2170344.4 1833734.1 2240793.1
## 20194 2049722.7 1901277.9 2198167.5 1822696.0 2276749.4
## 20204 2062181.8 1898147.0 2226216.6 1811312.2 2313051.4
## 20214 2074640.9 1894758.4 2254523.3 1799534.4 2349747.3
## 20125 186180.6 181847.1 190514.1 179553.0 192808.2
## 20135 189686.3 180220.0 199152.6 175208.9 204163.8
## 20145 193192.0 180525.5 205858.6 173820.2 212563.9
## 20155 196697.8 181490.1 211905.4 173439.6 219955.9
## 20165 200203.5 182822.3 217584.6 173621.3 226785.7
## 20175 203709.2 184397.6 223020.8 174174.7 233243.7
## 20185 207214.9 186149.1 228280.7 174997.5 239432.3
## 20195 210720.6 188035.8 233405.4 176027.2 245414.0
## 20205 214226.3 190030.7 238422.0 177222.2 251230.4
## 20215 217732.1 192114.4 243349.7 178553.3 256910.8
## 20126 4400750.2 4304313.4 4497187.0 4253262.9 4548237.6
## 20136 4418175.4 4269352.2 4566998.6 4190570.0 4645780.8
## 20146 4435600.6 4237820.5 4633380.8 4133122.0 4738079.2
## 20156 4453025.8 4206493.7 4699558.0 4075987.4 4830064.3
## 20166 4470451.1 4174302.7 4766599.4 4017531.2 4923370.9
## 20176 4487876.3 4140793.6 4834958.9 3957059.2 5018693.4
## 20186 4505301.5 4105754.6 4904848.4 3894247.2 5116355.8
## 20196 4522726.7 4069085.7 4976367.7 3828942.7 5216510.7
## 20206 4540151.9 4030744.7 5049559.1 3761080.9 5319222.9
## 20216 4557577.1 3990720.5 5124433.8 3690644.7 5424509.6
## 20127 722492.5 712037.2 732947.8 706502.6 738482.4
## 20137 733202.0 715504.3 750899.8 706135.6 760268.4
## 20147 743911.5 721171.5 766651.6 709133.7 778689.4
## 20157 754621.1 727769.5 781472.6 713555.1 795687.0
## 20167 765330.6 734918.3 795742.8 718819.1 811842.1
## 20177 776040.1 742442.4 809637.8 724656.9 827423.3
## 20187 786749.6 750243.5 823255.8 730918.3 842581.0
## 20197 797459.1 758259.7 836658.6 737508.7 857409.6
## 20207 808168.7 766449.4 849887.9 744364.6 871972.8
## 20217 818878.2 774782.9 862973.5 751440.3 886316.1
## 20128 310730.3 307307.9 314152.7 305496.3 315964.4
## 20138 315564.8 309262.5 321867.0 305926.3 325203.2
## 20148 320399.2 312169.8 328628.7 307813.3 332985.1
## 20158 325233.7 315449.5 335017.8 310270.1 340197.3
## 20168 330068.1 318944.5 341191.8 313056.0 347080.3
## 20178 334902.6 322584.2 347220.9 316063.3 353741.9
## 20188 339737.0 326330.1 353144.0 319232.8 360241.3
## 20198 344571.5 330157.9 358985.1 322527.7 366615.3
## 20208 349406.0 334051.5 364760.4 325923.4 372888.5
## 20218 354240.4 337999.6 370481.2 329402.2 379078.6
The reduce step sequentially binds each data.frame. Once you have the final data in forecasts_df, you can start working with dplyr, since the data is in a nice tabular format.
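For instance, here’s a hedged sketch (my own; the country names and years are read off the output above) that labels each row and then summarizes with dplyr:
# Tag rows by country (each forecast spans the same 10 years), then summarize
forecasts_df %>%
  mutate(country = rep(names(forecasts), each = 10),
         year = rep(2012:2021, times = length(forecasts))) %>%
  group_by(country) %>%
  summarize(mean_forecast = mean(`Point Forecast`))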