A tour of Torch

Tip

This is me getting up to speed with using torch in R from the following resources:

library(torch)
library(torchvision)

1 torch components

1.1 Creating tensors

Tensors are the core data structures in torch.

1.1.1 From R objects

Tensors can be created from R atomic vectors, matrices and arrays using the torch_tensor function:

# Create torch tensor from atomic vector
x <- torch_tensor(c(1, 2, 3))
x
torch_tensor
 1
 2
 3
[ CPUFloatType{3} ]

It seems the default tensor layout is by rows?

# Create torch tensor from matrix
m <- matrix(runif(6), nrow = 2)
x <- torch_tensor(m)
x
torch_tensor
 0.3777  0.3414  0.2290
 0.7923  0.0225  0.9320
[ CPUFloatType{2,3} ]
# Create torch tensors from arrays
a <- array(runif(16), dim = c(4, 2, 2))
x <- torch_tensor(a)
x
torch_tensor
(1,.,.) = 
  0.6295  0.3177
  0.0594  0.7283

(2,.,.) = 
  0.2989  0.5853
  0.9285  0.7687

(3,.,.) = 
  0.9246  0.7718
  0.2538  0.4100

(4,.,.) = 
  0.4965  0.4471
  0.0952  0.5655
[ CPUFloatType{4,2,2} ]
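
On the layout question raised above: a quick check suggests that torch_tensor keeps every element at the same [row, column] position it had in R, even though R itself fills matrices column by column. This is only a sketch with a small hand-made matrix; the values and the names m, x and m_back are illustrative and not part of the original notes.

```r
library(torch)

# R fills matrices column by column, so row 1 of m is 1, 3, 5
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
x <- torch_tensor(m)

# The tensor has the same shape as the matrix ...
x$shape

# ... and the same value at each [row, column] position
as.numeric(x[1, 2])
m[1, 2]

# Converting back gives the original matrix values
m_back <- as.array(x)
all(m_back == m)
```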

1.1.2 Using initialization functions

Tensors can also be created using torch initialization functions.

# Return a tensor filled with values drawn from a unit normal distribution
x <- torch_randn(5, 2, 3)
x
torch_tensor
(1,.,.) = 
  0.2484  0.5517  1.6350
 -0.8673  0.5252 -1.5049

(2,.,.) = 
  0.5737 -0.2363 -0.3382
 -0.7096  0.8415 -1.1913

(3,.,.) = 
 -0.8613  0.8600  0.4103
 -0.1573 -1.9381  0.7452

(4,.,.) = 
  0.1448 -1.4085  2.6813
  1.0972  2.9855 -1.3764

(5,.,.) = 
  0.7437 -0.5917 -0.5778
 -0.0745 -0.0062  0.3855
[ CPUFloatType{5,2,3} ]
torch_zeros(5)
torch_tensor
 0
 0
 0
 0
 0
[ CPUFloatType{5} ]

1.1.3 Converting back to R

torch provides methods such as as.array, as.matrix, as.numeric and as.integer to convert tensors back to R objects.

# as.array is the most general method and allows
# converting any type of tensor
x <- torch_randn(2, 2)
as.array(x)
           [,1]       [,2]
[1,] -0.4641779  0.7707540
[2,] -0.2247133 -0.4308816

1.2 Tensor attributes

  • data type

  • device

  • dimensions

  • requires_grad

1.2.1 Accessing attributes

Tensor attributes can be accessed using the $ operator:

x <- torch_randn(2, 2)
# Access data type
x$dtype
torch_Float
# Access device
x$device
torch_device(type='cpu')
# Access dimensions
x$shape
[1] 2 2
# Access the requires_grad flag
x$requires_grad
[1] FALSE

1.2.2 Modifying attributes

Default tensor attributes can be modified when creating the tensors or later using the $to method.

# Modify tensor during creation
x <- torch_randn(2, 2, dtype = torch_float64())
x
torch_tensor
-0.1781  0.8839
-0.6321  1.4523
[ CPUDoubleType{2,2} ]
# Modify using $to
x <- torch_randn(2, 2)
x <- x$to(dtype = torch_float16())
x
torch_tensor
-1.3828  1.0049
 0.5552 -0.4260
[ CPUHalfType{2,2} ]
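
The example above reassigns x because, when the dtype changes, $to returns a converted tensor rather than modifying the original. A minimal sketch of that behaviour, with the illustrative names x_int and x_dbl (not from the original notes):

```r
library(torch)

# An integer R vector becomes a Long (64-bit integer) tensor
x_int <- torch_tensor(1:3)
x_int$dtype

# $to() returns a converted tensor; x_int keeps its original dtype
x_dbl <- x_int$to(dtype = torch_float64())
x_dbl$dtype
x_int$dtype
```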

1.2.3 CUDA devices

Moving between devices is also done with the $to method, but only the CPU device is available on all systems. Moving between devices is an important operation because tensor operations happen on the device where the tensor is located, so if you want to use the fast GPU implementations, you need to move tensors to the CUDA device. A common pattern in torch is to create a device object at the beginning of your script and reuse it as you create and move tensors. For example:

# Create device
device <- if (cuda_is_available()) "cuda" else "cpu"

x <- torch_randn(2, 2, device = device)
x
torch_tensor
-1.2370  0.3609
-0.8836  0.1419
[ CPUFloatType{2,2} ]
y <- x$to(device = device)
y
torch_tensor
-1.2370  0.3609
-0.8836  0.1419
[ CPUFloatType{2,2} ]

1.3 Indexing tensors

Indexing tensors in torch is very similar to indexing vectors, matrices and arrays in R, with an important difference when using negative indexes.

In torch, negative indexes don’t remove the element; instead, selection starts from the end, which is the behavior needed more frequently.

x <- torch_tensor(1:5)
x
torch_tensor
 1
 2
 3
 4
 5
[ CPULongType{5} ]

Index tensors

# Take the first element
x[1]
torch_tensor
1
[ CPULongType{} ]
# Negative index from last
x[-1]
torch_tensor
5
[ CPULongType{} ]
# Select first 3 elements
x[1:3]
torch_tensor
 1
 2
 3
[ CPULongType{3} ]
# Selecting from 3rd element to last using N
x[3:N]
torch_tensor
 3
 4
 5
[ CPULongType{3} ]
# Select the last 2 elements
x[-2:N]
torch_tensor
 4
 5
[ CPULongType{2} ]
# Select using a boolean tensor
x[x>2]
torch_tensor
 3
 4
 5
[ CPULongType{3} ]
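
To make the difference in negative indexing concrete, here is a small side-by-side sketch with base R. The names r_vec and t_vec are illustrative only.

```r
library(torch)

r_vec <- c(1, 2, 3, 4, 5)
t_vec <- torch_tensor(r_vec)

# Base R: a negative index drops that position
r_vec[-1]

# torch: a negative index counts from the end
as.numeric(t_vec[-1])
as.numeric(t_vec[-2])
```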

1.3.1 Multidimensional selections

When indexing a tensor with multiple dimensions, you can use dimension-specific indices separated by commas, just like in R. For example:

x <- torch_randn(2, 2, 3)
x
torch_tensor
(1,.,.) = 
 -0.5067 -2.2206 -0.5674
 -3.0301 -0.4091 -0.8339

(2,.,.) = 
 -2.1815  1.5919 -0.7555
  0.5166 -0.3398 -1.1716
[ CPUFloatType{2,2,3} ]
# Select the first element of the first two dimensions
x[1, 1, ]
torch_tensor
-0.5067
-2.2206
-0.5674
[ CPUFloatType{3} ]
# Select everything from a dimension using empty argument
x[, , 1]
torch_tensor
-0.5067 -3.0301
-2.1815  0.5166
[ CPUFloatType{2,2} ]
# or
x[..,1]
torch_tensor
-0.5067 -3.0301
-2.1815  0.5166
[ CPUFloatType{2,2} ]
# You can also add a new dimension using the newaxis sugar
x[.., newaxis]
torch_tensor
(1,1,.,.) = 
 -0.5067
 -2.2206
 -0.5674

(2,1,.,.) = 
 -2.1815
  1.5919
 -0.7555

(1,2,.,.) = 
 -3.0301
 -0.4091
 -0.8339

(2,2,.,.) = 
  0.5166
 -0.3398
 -1.1716
[ CPUFloatType{2,2,3,1} ]
# By default, selecting a single element from a
# dimension drops that dimension, as shown here;
# you can change this behavior by setting drop = FALSE
x[1, ..]
torch_tensor
-0.5067 -2.2206 -0.5674
-3.0301 -0.4091 -0.8339
[ CPUFloatType{2,3} ]
# Subset assignment is also supported
x[1, 1, 1] <- 0
x[1, 1, 1]
torch_tensor
0
[ CPUFloatType{} ]

1.4 Array Computation

torch provides more than 200 functions and methods that operate on tensors. They range from mathematical operations to utilities for reshaping and modifying tensors.

Most operations have both CPU and GPU backends, and torch will use the backend corresponding to the tensor device.

See some examples below:

x <- c(1, 2, 3) %>% 
  torch_tensor()

# Subtract 1 from every element
x %>% 
  torch_sub(1)
torch_tensor
 0
 1
 2
[ CPUFloatType{3} ]
# many torch_* functions have a corresponding tensor method
x$sub(1)
torch_tensor
 0
 1
 2
[ CPUFloatType{3} ]
x %>% 
  torch_exp() %>% 
  torch_log()
torch_tensor
 1
 2
 3
[ CPUFloatType{3} ]

Full documentation: https://torch.mlverse.org/docs/reference/index.html#section-mathematical-operations-on-tensors

1.4.1 Reduction functions

x <- rbind(c(1,2,3), 4:6) %>% 
  torch_tensor()
x
torch_tensor
 1  2  3
 4  5  6
[ CPUFloatType{2,3} ]
# Sum of all elements in the input tensor
x$sum()
torch_tensor
21
[ CPUFloatType{} ]
# Reduce the first dimension (rows): the rows are collapsed,
# leaving one sum per column
x %>% 
  torch_sum(dim = 1)
torch_tensor
 5
 7
 9
[ CPUFloatType{3} ]
# Reduce the second dimension (columns): the columns are collapsed,
# leaving one sum per row
x %>% 
  torch_sum(dim = 2)
torch_tensor
  6
 15
[ CPUFloatType{2} ]
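
One way to keep the dim argument straight is to compare it with base R's colSums and rowSums on the same data. The sketch below assumes nothing beyond the functions already used in this section; the names m, col_sums and row_sums are illustrative.

```r
library(torch)

m <- rbind(c(1, 2, 3), c(4, 5, 6))
x <- torch_tensor(m)

# dim = 1 collapses the rows: one sum per column, like colSums
col_sums <- as.numeric(torch_sum(x, dim = 1))
col_sums
colSums(m)

# dim = 2 collapses the columns: one sum per row, like rowSums
row_sums <- as.numeric(torch_sum(x, dim = 2))
row_sums
rowSums(m)
```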

1.4.2 Broadcasting

Broadcasting allows tensors of different shapes to be combined in binary arithmetic operations.

# Simplest broadcasting example
torch_tensor(c(1, 2, 3)) + 1
torch_tensor
 2
 3
 4
[ CPUFloatType{3} ]
# Adding a (3,2) matrix to a (2) vector
torch_ones(3, 2) + torch_tensor(c(1, 2))
torch_tensor
 2  3
 2  3
 2  3
[ CPUFloatType{3,2} ]
torch_ones(2, 3) + torch_tensor(c(1, 2, 3))
torch_tensor
 2  3  4
 2  3  4
[ CPUFloatType{2,3} ]
# Danger, Will Robinson: a (10,1) tensor plus a (10) vector broadcasts to (10,10)
torch_ones(10, 1) + torch_tensor(rep(1, 10))
torch_tensor
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
 2  2  2  2  2  2  2  2  2  2
[ CPUFloatType{10,10} ]
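
The 10 x 10 result comes from broadcasting a (10, 1) tensor against a (10) vector, which is treated as a single row of 10 values. If an element-wise sum was intended, one option is to give the vector an explicit second dimension first. A sketch using the newaxis sugar from the indexing section; the names column and v are illustrative.

```r
library(torch)

column <- torch_ones(10, 1)
v <- torch_tensor(rep(1, 10))

# Shapes (10, 1) and (10) broadcast to (10, 10)
(column + v)$shape

# Reshape v to (10, 1) so the shapes match and nothing is expanded
(column + v[, newaxis])$shape
```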

1.5 What’s Autograd?

Autograd allows torch to compute exact derivatives of tensor operations with minimal code changes. It’s the central feature for making torch useful for training neural network models.

Suppose we have the operation y = x^3 and we want to compute the derivative dy/dx at the point x = 2. Since dy/dx = 3x^2, the answer is 3 · 2^2 = 12. See: wolfram

# Create `x` that requires gradient
x <- torch_tensor(2, requires_grad = TRUE)

# Compute `y` as usual
y <- x^3

# Call backward, which does the actual computation of gradients
y$backward()

# Extract the derivative
x$grad
torch_tensor
 12
[ CPUFloatType{1} ]
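
As a sanity check on the autograd result, the derivative can also be approximated numerically in plain R with a central finite difference. This is a sketch; the step size h is an arbitrary illustrative choice.

```r
library(torch)

# Derivative of y = x^3 at x = 2 from autograd
x <- torch_tensor(2, requires_grad = TRUE)
y <- x^3
y$backward()
autograd_value <- as.numeric(x$grad)

# Central finite-difference approximation in plain R
h <- 1e-3
numeric_value <- ((2 + h)^3 - (2 - h)^3) / (2 * h)

autograd_value
numeric_value
```

The two values should agree closely; the finite difference carries a small approximation error, while autograd returns the exact derivative.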

1.5.1 Disabling autograd

It might be useful to disable autograd for a few operations, e.g. when performing inference on a model.

x <- torch_tensor(2, requires_grad = TRUE)

with_no_grad({
  y <- x^3
})

# Fails since there is no operation tracked by autograd
# y$backward()
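
Instead of triggering the error, the effect of with_no_grad can be inspected through the requires_grad attribute of the result. A minimal sketch; the name z is illustrative.

```r
library(torch)

x <- torch_tensor(2, requires_grad = TRUE)

# Inside with_no_grad the operation is not tracked
with_no_grad({
  y <- x^3
})
y$requires_grad

# Outside, the same operation is tracked by autograd
z <- x^3
z$requires_grad
```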

1.5.2 A slightly more advanced example

Suppose now that we have a function f(x) = 3x^2 - 2x and we want to find its minimum using gradient descent. We define this function in R with:

f <- function(x){
  3*x^2 - 2*x
}

Next, we are going to use the gradient descent algorithm to find its minimum.

Tip
  1. We must disable autograd (using with_no_grad) when updating weights, because we don’t want autograd to track the weight updating operation, as we don’t want to back-propagate this operation later.

  2. The update operation must happen in-place on the weight tensor, otherwise this tensor is no longer a leaf tensor and torch can no longer back-propagate gradients for it.

  3. We must manually erase the gradients that we just used to update the tensors, usually by setting them to zero. This must also happen in-place, by using, for example, x$grad$zero_(). By default torch accumulates gradients after backward, and in general we want to start afresh the gradient accumulation after the weight update.

The torch optim_* functions save us from having to remember all these details.
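
The third point, gradient accumulation, is easy to see in isolation. The sketch below reuses y = x^3 at x = 2, where the derivative is 12; the names first, second and after_zero are illustrative.

```r
library(torch)

x <- torch_tensor(2, requires_grad = TRUE)

# First backward pass: dy/dx = 3 * x^2 = 12
y <- x^3
y$backward()
first <- as.numeric(x$grad)

# Second backward pass without zeroing: another 12 is added
y <- x^3
y$backward()
second <- as.numeric(x$grad)

# Zero the gradient in place before the next use
with_no_grad({
  x$grad$zero_()
})
after_zero <- as.numeric(x$grad)

c(first, second, after_zero)
```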

# Define learning rate
lr <- 0.1

# Number of iterations
num_iter <- 20

# Start with a random number
x <- torch_randn(1, requires_grad = TRUE)


for (i in seq_len(num_iter)){
  y <- f(x)
  y$backward()
  with_no_grad({
    x$sub_(lr*x$grad)
    x$grad$zero_()
  })
}
x
torch_tensor
 0.3333
[ CPUFloatType{1} ][ requires_grad = TRUE ]
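
The value 0.3333 can be cross-checked without torch: the derivative f'(x) = 6x - 2 is zero at x = 1/3. The sketch below runs the same update rule in plain R with the analytic derivative; the starting point and the names f_prime and x_plain are illustrative assumptions.

```r
# Analytic derivative of f(x) = 3x^2 - 2x
f_prime <- function(x) 6 * x - 2

lr <- 0.1
x_plain <- 2   # arbitrary starting point

# Same update rule as the torch loop: x <- x - lr * gradient
for (i in seq_len(20)) {
  x_plain <- x_plain - lr * f_prime(x_plain)
}

x_plain
```

With lr = 0.1 each step shrinks the distance to 1/3 by a factor of 0.4, which is why 20 iterations are enough here.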

If we make a plot of this function, we can easily see that this value corresponds to the minimum:

library(tidyverse)
df <- tibble::tibble(x = as.numeric(x), y = f(x))

df %>% 
  ggplot(mapping = aes(x = x, y = y)) +
  geom_point(color = "red", size = 2) +
  stat_function(fun = f) +
  xlim(-2, 2)

1.6 Optimizers

torch optimizers are torch’s abstraction to encapsulate weight-updating logic.

First, let’s come back to the example we used in the autograd section. In this example we manually updated the weights using the simplest form of the gradient descent algorithm.

The logic for updating the weights is wrapped in the with_no_grad block.

# Start with a random number
x <- torch_randn(1, requires_grad = TRUE)

for (i in seq_len(num_iter)){
  y <- f(x)
  y$backward()
  
  # ------ -> updating the weights
  with_no_grad({
    x$sub_(lr*x$grad)
    x$grad$zero_()
  })
  # ------ <- updating the weights
}

x
torch_tensor
 0.3333
[ CPUFloatType{1} ][ requires_grad = TRUE ]

We can rewrite this piece using the packaged torch optimizers.

# Learning rate
lr <- 0.1

# Number of iterations
num_iter <- 20

# Starting point
x <- torch_randn(1, requires_grad = TRUE)

# We create the optimizer and the first argument is a list
# of weights we want to optimize. It also takes a
# learning rate argument
optimizer <- optim_sgd(x, lr = lr)

for (i in seq_len(num_iter)){
  # Refresh the grad attribute of all parameters
  optimizer$zero_grad()
  y <- f(x)
  y$backward()
  # Perform one update step for all parameters
  optimizer$step()
}
x
torch_tensor
 0.3333
[ CPUFloatType{1} ][ requires_grad = TRUE ]

Note that we no longer need to wrap our calls in with_no_grad, nor manually perform the update step with an in-place operation. The optimizer takes care of all of these implementation details. We still need to compute the loss and use the $backward() method to populate the grad attribute of each parameter.
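
The same pattern extends to several parameters by passing them to the optimizer as a list. The sketch below uses a made-up two-parameter loss with its minimum at a = 1 and b = -2; the loss, the names a and b, and the 100 iterations are illustrative assumptions, not part of the original example.

```r
library(torch)

# Two separate parameters, both starting at zero
a <- torch_zeros(1, requires_grad = TRUE)
b <- torch_zeros(1, requires_grad = TRUE)

# Pass both parameters to the optimizer as a list
optimizer <- optim_sgd(list(a, b), lr = 0.1)

for (i in seq_len(100)) {
  optimizer$zero_grad()
  # Illustrative loss with its minimum at a = 1, b = -2
  loss <- (a - 1)^2 + (b + 2)^2
  loss$backward()
  optimizer$step()
}

c(a$item(), b$item())
```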

1.7 Neural network modules

In torch, all layers and models are called neural network modules, or nn_modules for short.

Deep learning models can be thought of as functions that operate on tensors, with one special technical feature: they have state (weights and other parameters) that changes during training.

1.7.1 Implemented nn_modules

torch provides implementations of many of the most common neural network layers, like convolutional, recurrent, pooling and activation layers, as well as common loss functions.

# Create a linear module with 10 inputs and 1 output
linear <- nn_linear(in_features = 10, out_features = 1)
linear
An `nn_module` containing 11 parameters.

-- Parameters ------------------------------------------------------------------
* weight: Float [1:1, 1:10]
* bias: Float [1:1]
x <- torch_randn(3, 10)
linear(x)
torch_tensor
-0.2091
 0.5430
-0.7450
[ CPUFloatType{3,1} ][ grad_fn = <AddmmBackward0> ]

Instances of nn_modules also have useful methods, for example to inspect their parameters or move them to a different device.

# List parameters
str(linear$parameters)
List of 2
 $ weight:Float [1:1, 1:10]
 $ bias  :Float [1:1]
# Access individual parameters
linear$weight
torch_tensor
 0.1418  0.1971  0.3020  0.1634 -0.2996 -0.1990 -0.1604  0.0243 -0.1020 -0.2254
[ CPUFloatType{1,10} ][ requires_grad = TRUE ]
linear$bias
torch_tensor
0.01 *
 1.7467
[ CPUFloatType{1} ][ requires_grad = TRUE ]
# Moves the parameters to the specified device
linear$to(device = "cpu")

Find list of modules: Function reference
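
The printed shapes show that nn_linear stores its weight as (out_features, in_features). A hedged way to confirm what the module computes is to reproduce its output by hand with a transpose. The names manual and max_diff are illustrative.

```r
library(torch)

linear <- nn_linear(in_features = 10, out_features = 1)
x <- torch_randn(3, 10)

# The weight is stored as (out_features, in_features), hence the transpose
manual <- torch_mm(x, linear$weight$t()) + linear$bias

# Largest absolute difference between the module output and the manual result
max_diff <- (linear(x) - manual)$abs()$max()$item()
max_diff
```

The difference should be zero or within floating-point rounding.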

1.7.2 Custom nn_modules

To build a custom nn_module one requires 2 functions:

  • initialize: The initialize method is used to initialize model parameters and has access to the self object that can be used to share states between methods

  • forward: The forward method describes the transformations that the nn_module is going to perform on input data.

# Create a Linear nn_module
Linear <- nn_module(
  # Initialize model parameters
  initialize = function(in_features, out_features){
    # nn_parameter tells nn_module that w and b are parameters
    self$w <- nn_parameter(torch_randn(in_features, out_features))
    self$b <- nn_parameter(torch_zeros(out_features))
  },
  
  # Describe the transformation applied to the input
  forward = function(input){
    # Matrix multiplication plus bias
    torch_mm(input, self$w) + self$b
  }
)

# Create an instance of it
lin <- Linear(in_features = 10, out_features = 1)
lin
An `nn_module` containing 11 parameters.

-- Parameters ------------------------------------------------------------------
* w: Float [1:10, 1:1]
* b: Float [1:1]

We now have an instance of the Linear module that is called lin. We are now able to use this instance to actually perform the linear model computation on a tensor. We use the instance as an R function, but it will actually delegate to the forward method that we defined earlier.

x <- torch_randn(3, 10)
lin(x)
torch_tensor
 1.4604
-0.5456
-0.9156
[ CPUFloatType{3,1} ][ grad_fn = <AddBackward0> ]

1.7.3 Combining multiple modules

nn_modules can also include sub-modules, and this is what allows us to write models using the same abstraction that we use to write layers.

For example, let’s build a multi-layer perceptron module with a ReLU activation.

# MLP with ReLU activation
nn_mlp <- nn_module(
  # Initialize model states
  initialize = function(in_features, hidden_features, out_features){
    self$fc1 <- nn_linear(in_features, hidden_features)
    self$relu <- nn_relu()
    self$fc2 <- nn_linear(hidden_features, out_features)
  },
  # Define transformations that will be performed
  forward = function(input){
    input %>% 
      self$fc1() %>% 
      self$relu() %>% 
      self$fc2()
  }
)
mlp <- nn_mlp(in_features = 10, hidden_features = 5, out_features = 1)
mlp
An `nn_module` containing 61 parameters.

-- Modules ---------------------------------------------------------------------
* fc1: <nn_linear> #55 parameters
* relu: <nn_relu> #0 parameters
* fc2: <nn_linear> #6 parameters
# Calling the model
x <- torch_randn(3, 10)
mlp(x)
torch_tensor
 0.1746
 0.0258
 0.1690
[ CPUFloatType{3,1} ][ grad_fn = <AddmmBackward0> ]
Tip

In torch there’s no difference between modules and models, i.e., an nn_module can be as low-level as a ReLU activation, or a much higher-level ResNet model.

1.7.4 Sequential modules

When the forward method of an nn_module just calls its submodules in sequence, as in the previous example, one can use the nn_sequential container to skip writing the forward method:

mlp <- nn_sequential(
  nn_linear(10, 5),
  nn_relu(),
  nn_linear(5, 1)
)

mlp
An `nn_module` containing 61 parameters.

-- Modules ---------------------------------------------------------------------
* 0: <nn_linear> #55 parameters
* 1: <nn_relu> #0 parameters
* 2: <nn_linear> #6 parameters
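
The 61 parameters reported above can be accounted for by hand: each linear layer has in × out weights plus out biases. A small sketch that counts them both ways; the names n_params and expected are illustrative.

```r
library(torch)

mlp <- nn_sequential(
  nn_linear(10, 5),
  nn_relu(),
  nn_linear(5, 1)
)

# Count the elements of every parameter tensor in the model
n_params <- sum(sapply(mlp$parameters, function(p) p$numel()))

# Each linear layer has in * out weights plus out biases
expected <- (10 * 5 + 5) + (5 * 1 + 1)

c(n_params, expected)
```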

1.7.5 Functional API

Most nn_* modules have an nnf_* counterpart, for example, nn_relu() and nnf_relu().

Sometimes the functional API is more convenient, especially if the module counterpart does not include parameters, because it allows you to avoid initializing the module.

I didn’t get this clearly, so more reading on this later.
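
A small comparison may help here. The sketch below applies ReLU both ways to the same input: the module form needs a layer object to be created first, while the functional form is called directly. The names relu_layer, out_module and out_functional are illustrative.

```r
library(torch)

x <- torch_tensor(c(-1, 0, 2))

# Module form: create the layer object first, then call it
relu_layer <- nn_relu()
out_module <- as.numeric(relu_layer(x))

# Functional form: call the function directly, nothing to initialize
out_functional <- as.numeric(nnf_relu(x))

out_module
out_functional
```

Both give the same result. For a layer with no parameters, such as ReLU, the functional form simply saves a step.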

1.7.6 Example: training a linear model

Let’s use everything we learned until now to train a linear model on simulated data. First, let’s simulate a data set.

We will generate a matrix with 100 observations of 3 variables, all randomly generated from the standard normal distribution. The response tensor will be generated using the equation: y = 0.5 + 2*x1 - 3*x2 + x3 + noise. The noise term is a small amount of noise sampled from a normal distribution with mean 0 and standard deviation 0.1.

# Generate a matrix with 100 observations of 3 variables
x <- torch_randn(100, 3)

# Equation for output tensor
y <- 0.5 + 2*x[, 1] - 3*x[, 2] + x[, 3] + torch_randn(100)/10

# Reshape y to have dimensions 100 x 1
y <- y[, newaxis]
y
torch_tensor
-0.8791
 2.6720
 0.8301
-0.7848
-0.7519
-2.3814
 2.4995
 4.6559
-5.5641
-5.9122
-3.0375
-2.3074
-1.5036
-1.0461
 1.4571
 2.5013
-4.6902
 0.6530
-7.4685
 3.2246
 0.8506
 3.3885
-3.6155
 2.2077
-0.9626
 2.2580
 1.1011
 3.4453
 1.3866
 1.7551
... [the output was truncated (use n=-1 to disable)]
[ CPUFloatType{100,1} ]

We now define our model and optimizer:

# Define model: MLP
# model <- nn_sequential(
#   nn_linear(in_features = 3, out_features = 32),
#   nn_relu(),
#   nn_linear(in_features = 32, out_features = 1)
# )
model <- nn_linear(in_features = 3, out_features = 1)
model
An `nn_module` containing 4 parameters.

-- Parameters ------------------------------------------------------------------
* weight: Float [1:1, 1:3]
* bias: Float [1:1]
# Define optimizer that implements SGD 
opt <- optim_sgd(model$parameters, lr = 0.1)
opt
<optim_sgd>
  Inherits from: <torch_Optimizer>
  Public:
    add_param_group: function (param_group) 
    clone: function (deep = FALSE) 
    defaults: list
    initialize: function (params, lr = optim_required(), momentum = 0, dampening = 0, 
    load_state_dict: function (state_dict) 
    param_groups: list
    state: State, R6
    state_dict: function () 
    step: function (closure = NULL) 
    zero_grad: function () 
  Private:
    step_helper: function (closure, loop_fun) 

Training loop

# Training loop to see whether we can obtain function weights back
for (iter in 1:10){
  # Refresh the grad attribute of all parameters
  opt$zero_grad()
  pred <- model(x)
  loss <- nnf_mse_loss(pred, y)
  # calculates the gradients/back propagation
  loss$backward()
  # use the optimizer to update model parameters
  opt$step()
  cat("Loss at step ", iter, ": ", loss$item(), "\n")
}
Loss at step  1 :  16.37103 
Loss at step  2 :  10.025 
Loss at step  3 :  6.223922 
Loss at step  4 :  3.91603 
Loss at step  5 :  2.495534 
Loss at step  6 :  1.609513 
Loss at step  7 :  1.049808 
Loss at step  8 :  0.6920332 
Loss at step  9 :  0.4608567 
Loss at step  10 :  0.3100319 
Note

The idiom of zeroing gradients is here to stay: values stored in grad fields accumulate, so whenever we’re done using them, we need to zero them out before reuse.

Finally, we can look at the parameter values. Compared with the values used to simulate the data (weights 2, -3 and 1, bias 0.5), they should be similar; after only 10 iterations they are close but have not fully converged.

model$weight
torch_tensor
 1.7861 -2.6981  0.7968
[ CPUFloatType{1,3} ][ requires_grad = TRUE ]
model$bias
torch_tensor
 0.2601
[ CPUFloatType{1} ][ requires_grad = TRUE ]
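
Running the same loop for more iterations should bring the estimates closer to the simulated coefficients. The sketch below regenerates the data so that it is self-contained; the seed, the 200 iterations and the names weights and bias are illustrative choices.

```r
library(torch)
torch_manual_seed(1)   # arbitrary seed, only for reproducibility

# Simulate data with the same equation as above
x <- torch_randn(100, 3)
y <- 0.5 + 2 * x[, 1] - 3 * x[, 2] + x[, 3] + torch_randn(100) / 10
y <- y[, newaxis]

model <- nn_linear(in_features = 3, out_features = 1)
opt <- optim_sgd(model$parameters, lr = 0.1)

# Same training loop, run for 200 steps instead of 10
for (iter in 1:200) {
  opt$zero_grad()
  loss <- nnf_mse_loss(model(x), y)
  loss$backward()
  opt$step()
}

# Compare with the simulated coefficients 2, -3, 1 and intercept 0.5
weights <- as.numeric(model$weight$detach())
bias <- as.numeric(model$bias$detach())
round(weights, 2)
round(bias, 2)
```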

Save the model for inference using torch_save.

Warning

saveRDS doesn’t work correctly for torch models.

# # Finally save model
# torch_save(model, "model.pt")
# 
# # To reload model
# torch_load("model.pt")
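
A runnable version of the save and reload round trip might look like the sketch below. It writes to a temporary file rather than "model.pt" and uses a fresh, untrained nn_linear so that it is self-contained; both are assumptions made for illustration.

```r
library(torch)

model <- nn_linear(in_features = 3, out_features = 1)

# Temporary file used only for this illustration
path <- tempfile(fileext = ".pt")
torch_save(model, path)
reloaded <- torch_load(path)

# Both models should give the same predictions on the same input
x <- torch_randn(5, 3)
original_pred <- as.numeric(model(x)$detach())
reloaded_pred <- as.numeric(reloaded(x)$detach())
all.equal(original_pred, reloaded_pred)
```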

1.8 Datasets and dataloaders

torch_dataset is the object representing data in torch.

1.8.1 Custom datasets

A new torch_dataset can be created using the dataset function, which requires the following 3 functions as arguments:

  • initialize: takes inputs for dataset initialization

  • .getitem: takes a single integer as input and returns an observation of the dataset

  • .length: returns total number of observations

# Custom torch_dataset
mydataset <- dataset(
  initialize = function(n_rows, n_cols){
    self$x <- torch_randn(n_rows, n_cols)
    self$y <- torch_randn(n_rows)
  },
  
  # We subset the previously initialized x and y using index provided
  .getitem = function(index){
    list(self$x[index, ], self$y[index])
  },
  
  # Number of rows by looking at the initialized tensor x
  .length = function(){
    self$x$shape[1]
  }
  
)

The dataset function creates a definition of how to initialize and get elements from a dataset and compute length. Initialize dataset and start extracting elements from it:

# Initialize
ds <- mydataset(n_rows = 10, n_cols = 3)

# length
length(ds)
[1] 10
# Extract first observation
ds[1]
[[1]]
torch_tensor
-0.3693
 0.0837
-0.5006
[ CPUFloatType{3} ]

[[2]]
torch_tensor
-2.22254
[ CPUFloatType{} ]
# or equivalently
ds$.getitem(1)
[[1]]
torch_tensor
-0.3693
 0.0837
-0.5006
[ CPUFloatType{3} ]

[[2]]
torch_tensor
-2.22254
[ CPUFloatType{} ]

1.8.2 Common patterns

The dataset() function allows us to define data loading and pre-processing in a very flexible way. We can decide how to implement the dataset in the way it works best for our problem.

See:

1.8.3 Dataloaders

Dataloaders are torch’s abstraction used to iterate over datasets in batches, and optionally shuffle and prepare data in parallel.

A dataloader is created by passing a dataset instance to the dataloader() function:

library(torchvision)
# Taking the validation dataset
mnist <- mnist_dataset(root = "data-raw/mnist", download = TRUE, train = FALSE)

# Data loader
dl <- dataloader(mnist, batch_size = 32, shuffle = TRUE)

# Number of batches we can extract from dataloader
length(dl)
[1] 313

length() returns the number of batches we can extract from the dataloader.

Dataloaders can be iterated on using the coro::loop() function combined with a for loop. The reason we need coro::loop() is that batches in dataloaders are only computed when they are actually used, to avoid large memory usage.

total <- 0
coro::loop(for (batch in dl){
  total <- total + batch$x$shape[1]
})
total
[1] 10000

You can think of a dataloader as an object similar to an R list, with the important difference that the elements are not actually computed yet; they get computed every time you loop through it.
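
Dataloaders work the same way with custom datasets, which avoids the MNIST download when experimenting. The sketch below repeats the custom dataset definition from the earlier section so that it is self-contained; the batch size of 4 and the name batch_sizes are illustrative.

```r
library(torch)

# Same structure as the custom dataset defined earlier
mydataset <- dataset(
  initialize = function(n_rows, n_cols) {
    self$x <- torch_randn(n_rows, n_cols)
    self$y <- torch_randn(n_rows)
  },
  .getitem = function(index) {
    list(self$x[index, ], self$y[index])
  },
  .length = function() {
    self$x$shape[1]
  }
)

ds <- mydataset(n_rows = 10, n_cols = 3)
dl <- dataloader(ds, batch_size = 4)

# 10 observations in batches of 4 give 3 batches
length(dl)

# Record the number of rows in each batch
batch_sizes <- c()
coro::loop(for (batch in dl) {
  batch_sizes <- c(batch_sizes, batch[[1]]$shape[1])
})
batch_sizes
```

The last batch holds the 2 leftover observations, since 10 is not a multiple of 4.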