Simple Neural Network in R

Mikey Tabak, PhD

27 January 2023

A tutorial from Quantitative Science Consulting


This tutorial will show you how to train your first neural network in R. I am glossing over the mathematical details and focusing on implementation so that you can follow this process with your own data. We are using the incredibly powerful torch package, which allows you to build, train, and deploy neural networks entirely in R.


In this tutorial, we will predict the price of diamonds based on some features of the diamonds (e.g., carat, depth, and physical dimensions). Price is a continuous response variable (or “target”), and you could use the same code to predict any continuous target in your own dataset (e.g., height, distance).

Load libraries

# for manipulating data
library(dplyr)
# for functional programming tools
library(purrr)
# for neural network functionality
library(torch)

The first time you load torch, you may be prompted to install additional software that the package needs. Type Yes.
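If you prefer to trigger that installation explicitly instead of waiting for the prompt, the torch package provides install_torch(), which you only need to run once after installing the package:

# one-time setup: download the backend libraries used by the torch package
torch::install_torch()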


Data preprocessing

Choose your dataset

Here we will only consider continuous features. Handling categorical features in a neural network is more advanced; let’s keep it simple for now, and in a future tutorial I’ll show you how to include categorical features. If you want to use your own data, this is the only step you need to change. Specifically, change the raw_data object, specify the column names of the continuous features, and specify the target name.

url_data <- "https://raw.githubusercontent.com/mikeyEcology/Dada-Analysis-on-Diamonds-Dataset/main/diamonds.csv"

raw_data <- url_data |>
    url() |>
    vroom::vroom()
continuous_features <- c("carat", "depth", "table", "x", "y", "z")
target_name <- "price"

In this example, we are using data on diamonds, and we predict the price of the diamonds based on some of their features. These data are from Kaggle, where you can read more details and find the feature definitions.
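As a concrete sketch of swapping in your own data, the built-in mtcars dataset could be used instead (a hypothetical substitution; the rest of this tutorial sticks with the diamonds data):

# hypothetical alternative: predict fuel economy (mpg) from three continuous features
raw_data <- tibble::as_tibble(mtcars)
continuous_features <- c("wt", "hp", "disp")
target_name <- "mpg"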

Standardize and scale the numerical data

Many statistical and machine learning methods perform better if the data are standardized and scaled. For neural networks it is especially important: failure to do so can lead to the problem of vanishing or exploding gradients. Fortunately, base R has a function to scale our covariates. Note that we will need to unscale the output of our model to get real-world predictions.

features <- raw_data |>
    dplyr::select(dplyr::all_of(continuous_features)) |>
    scale() |>
    # scale() returns a matrix; convert it back to a data frame
    tibble::as_tibble()

target <- raw_data |>
    dplyr::select(dplyr::all_of(target_name)) |>
    scale() |>
    tibble::as_tibble()

train_val <- dplyr::bind_cols(features, target)

After scaling these features, we expect each column to have mean = 0 and sd = 1. Because of floating-point arithmetic, the values might differ very slightly from 0 and 1, but they should be close.

apply(train_val, 2, mean)
##         carat         depth         table             x             y 
##  3.563583e-14  5.518015e-13 -3.419117e-14  1.151112e-13  8.160402e-14 
##             z         price 
## -2.502553e-13 -6.337301e-15
apply(train_val, 2, sd)
## carat depth table     x     y     z price 
##     1     1     1     1     1     1     1
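As an aside, scale() also records the means and standard deviations it used as attributes, which is another way to undo the scaling later. A toy sketch, separate from the diamonds workflow:

x <- c(10, 20, 30, 40)
x_scaled <- scale(x)
# the centering and scaling values are stored as attributes
attr(x_scaled, "scaled:center")  # mean of x
attr(x_scaled, "scaled:scale")   # sd of x
# invert the transformation to recover the original values
as.numeric(x_scaled) * attr(x_scaled, "scaled:scale") + attr(x_scaled, "scaled:center")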

Visualize the data

You should always plot your data before fitting models! There are many ways to look at your data, and that is the subject of another tutorial. Here we’ll just plot each variable against the others.

pairs(train_val)
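Another quick check is the distribution of the target itself, for example with a simple histogram of the scaled price:

# histogram of the (scaled) target
hist(train_val$price, main = "Scaled diamond price", xlab = "Scaled price")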

Convert the data into the dataset class used by torch

You might be unfamiliar with this type of data construct in R. Object-oriented programming is less common in R, and it can take some getting used to.

dset <- torch::dataset(
    name = "dset",

    initialize = function(indices) {
        data <- self$prepare_data(train_val[indices, ])
        self$x <- data[[1]]
        self$y <- data[[2]]
    },

    .getitem = function(i) {
        x <- self$x[i, ]
        y <- self$y[i, ]
        
        list(x, y)
    },
  
    .length = function() {
        dim(self$y)[1]
    },

    prepare_data = function(input) {
        feature_cols <- input |>
            dplyr::select(dplyr::all_of(continuous_features)) |>
            as.matrix()

        target_col <- input |>
            dplyr::select(dplyr::all_of(target_name)) |>
            as.matrix()
    
        list(
            torch_tensor(feature_cols),
            torch_tensor(target_col)
        )
    }
)
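To get a feel for how this class behaves, you can instantiate it on a few rows and call its methods directly (a quick sanity check; tmp_ds is just an illustrative name):

# build a tiny dataset from the first five rows of train_val
tmp_ds <- dset(1:5)
# .length() reports how many samples the dataset holds
tmp_ds$.length()
# .getitem() returns one (features, target) pair as torch tensors
tmp_ds$.getitem(1)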

Separate the data into training and validation datasets

One portion of the data (in this case 80%) will be used to train the network. The remainder of the data will be used to validate the net: to determine how good the network is at predicting data it has not seen.
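If you want the split (and therefore the results) to be reproducible, you can set a random seed before sampling; 123 below is an arbitrary choice:

# optional: fix the random seed so the train/validation split is the same on every run
set.seed(123)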

train_indices <- sample(1:nrow(train_val), size = floor(0.8 * nrow(train_val)))
valid_indices <- setdiff(1:nrow(train_val), train_indices)

Make datasets and dataloaders

Dataloaders are special structures in torch (and other deep learning software) that allow the data to be loaded appropriately in batches. To set up a dataloader, we first need to set up a dataset (using the dataset class defined above).

# bigger batch sizes train faster. The general consensus is that this is not a hyperparameter worth tuning.
batch_size <- 256

train_ds <- dset(train_indices)
train_dl <- train_ds |>
    dataloader(batch_size = batch_size, shuffle = TRUE)

valid_ds <- dset(valid_indices)
valid_dl <- valid_ds |>
    dataloader(batch_size = batch_size, shuffle = FALSE)
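It can be reassuring to pull a single batch out of a dataloader and check its dimensions; dataloader_make_iter() and dataloader_next() are torch’s helpers for stepping through a dataloader manually:

# peek at one batch to confirm the shapes
batch <- dataloader_next(dataloader_make_iter(train_dl))
dim(batch[[1]])  # features: batch_size x number of features
dim(batch[[2]])  # target: batch_size x 1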


Neural Network

Define the model

The model structure is also defined using object-oriented programming. It will take some getting used to, but it is very customizable. We are building an NN with one hidden layer and fc1_dim neurons in that layer. Our output layer has only 1 neuron because this is regression (there is one value predicted by the network).

net <- nn_module(
  "net",

  initialize = function(num_numerical, fc1_dim) {
    self$fc1 <- nn_linear(num_numerical, fc1_dim)
    self$output <- nn_linear(fc1_dim, 1)
  },

  forward = function(x) {
    x |> 
        self$fc1() |>
        nnf_relu() |>
        self$output() |>
        # the sigmoid activation squashes each prediction into (0, 1)
        nnf_sigmoid()
  }
)

Specify aspects of the network

See comments in code for details

# This is the number of neurons in the hidden layer of the neural network
fc1_dim <- 32
# The number of features is the number of neurons in the input layer
num_numerical <- length(continuous_features)

# build this neural net
model <- net(num_numerical, fc1_dim)
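Printing the model object is a quick way to confirm the architecture and the number of parameters in each layer:

# printing an nn_module instance lists its layers and parameter counts
model
# the trainable tensors are available as a named list
names(model$parameters)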

Send the model to a Graphics Processing Unit (GPU) if you have one. This is very important for bigger models; a GPU speeds up computation considerably. If you would rather not use your GPU, set device <- "cpu". If you are using the GPU, always send the model there before specifying the optimizer (below).

device <- if (cuda_is_available()) torch_device("cuda:0") else "cpu"

model <- model$to(device = device)

Define the hyperparameters. These are options that you can “tune” (adjust) to try to improve the model fit.

learning_rate <- 0.01
# using stochastic gradient descent for the optimizer
optimizer <- optim_sgd(model$parameters, lr = learning_rate, momentum = 0)
num_epochs <- 10
# using mean squared error loss
loss_func <- nnf_mse_loss
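As an aside, torch ships with several other optimizers; Adam (optim_adam()) is a common alternative to try when tuning, though the results shown below were produced with SGD:

# alternative optimizer (not used here): Adam adapts per-parameter learning rates
# optimizer <- optim_adam(model$parameters, lr = learning_rate)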

Train and validate the NN

Training and validation are both conducted in this single loop over epochs.

for (epoch in 1:num_epochs) {

  model$train()
  train_losses <- c()  

  coro::loop(for (b in train_dl) {
    optimizer$zero_grad()
    output <- model(b[[1]]$to(device = device))
    loss <- loss_func(output, b[[2]]$to(dtype = torch_float(), device = device))
    loss$backward()
    optimizer$step()
    train_losses <- c(train_losses, loss$item())
  })

  model$eval()
  valid_losses <- c()

  coro::loop(for (b in valid_dl) {
    output <- model(b[[1]]$to(device = device))
    loss <- loss_func(output, b[[2]]$to(dtype = torch_float(), device = device))
    valid_losses <- c(valid_losses, loss$item())
  })

  cat(sprintf("Epoch %d: training loss: %3f, validation loss: %3f\n", epoch, mean(train_losses), mean(valid_losses)))
}
## Epoch 1: training loss: 0.908324, validation loss: 0.758647
## Epoch 2: training loss: 0.684765, validation loss: 0.658025
## Epoch 3: training loss: 0.624438, validation loss: 0.621939
## Epoch 4: training loss: 0.599572, validation loss: 0.603768
## Epoch 5: training loss: 0.585223, validation loss: 0.592853
## Epoch 6: training loss: 0.576741, validation loss: 0.585594
## Epoch 7: training loss: 0.570447, validation loss: 0.580454
## Epoch 8: training loss: 0.566900, validation loss: 0.576667
## Epoch 9: training loss: 0.563283, validation loss: 0.573795
## Epoch 10: training loss: 0.560574, validation loss: 0.571567

Evaluate the model on the validation dataset

# put the model in evaluation mode before making predictions
model$eval()

# predict the target for the validation dataset
target_pred <- c()
coro::loop(for (b in valid_dl) {
    output <- model(b[[1]]$to(device = device))
    # convert the batch of predictions from a tensor to a plain numeric vector
    target_pred <- c(target_pred, as.numeric(as_array(output)))
})

# get the observed target from the validation dataset
# we use `as_array` to convert from tensors to standard R data structures
target_obs <- as_array(valid_ds$y)

Recall that we scaled the data before fitting the model. We need to unscale the observations and predictions to see them on their original scale.

# get mean and std dev from original data (these were used when scaling by the `scale` function)
raw_target <- raw_data |>
    dplyr::select(dplyr::all_of(target_name)) |>
    dplyr::pull() 
std_dev <- sd(raw_target)
mn <- mean(raw_target)

# unscale
y_obs <- (target_obs * std_dev) + mn
y_pred <- (target_pred * std_dev) + mn
plot(
    y_obs, y_pred, main = "Model evaluation", 
    xlab = "Observed target value", 
    ylab = "Predicted target value", 
    axes = FALSE, pch = 16
)
# dashed red line shows perfect 1:1 agreement between observed and predicted
segments(
    x0 = 0, y0 = 0, x1 = max(y_obs), y1 = max(y_obs), lty = 2, lwd = 4, col = "red"
)
axis(1)
axis(2, las = 2)

Correlation between observed and predicted

# Pearson correlation
cor(y_obs, y_pred, method = "pearson")
##           [,1]
## [1,] 0.8930453
# Spearman correlation
cor(y_obs, y_pred, method = "spearman")
##           [,1]
## [1,] 0.9595739

You can also use your other favorite methods to compare observed to predicted values.
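For example, root mean squared error (RMSE) and mean absolute error (MAE) are easy to compute in base R and are expressed in the same units as price:

# root mean squared error of the predictions (in the original price units)
sqrt(mean((y_obs - y_pred)^2))
# mean absolute error
mean(abs(y_obs - y_pred))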

This is my first data science tutorial. Is it helpful for you? Please reach out on Twitter or LinkedIn to let me know what types of tutorials would be most helpful.

Follow QSC on LinkedIn to stay up to date with the latest tutorials.


Exercises