All Aboard the cloudml_train()

I prefer to develop locally on my laptop but beyond MNIST or pre-trained models for simple tutorials, training Deep Learning models is pretty intense. Fortunately RStudio, in cahoots with Google, has native R support for their Google Cloud Machine Learning Service. This service lets you send off training jobs to Google’s cloud environment (instead of killing your laptop for a few days).

If you would rather run in the cloud, or are having issues getting things set up locally, you might consider a fully hosted environment like IBM Watson Studio that lets you run RStudio in your browser.

Setting up the Google Cloud SDK

Installation is pretty straightforward but setup can be tricky (as I discovered). RStudio has provided instructions but there are a few missing steps.

I already have a Google Cloud account. If you don’t and you want to follow along with my journey, set one up before you do anything else. The RStudio docs linked above give some insight into what you want to set up.

The RStudio docs tell us to run the following bits of code:

install.packages("cloudml")

library(cloudml)
gcloud_install()

Success! Let’s try the next step of the RStudio docs.

The next step needs a training script for CloudML to run. If you’re not quite sure what that’s all about, RStudio documents it under the training_run() function here: Run a training script

And because we’re saving our precious grey cells for more important deep learning, we’ll use RStudio’s example “mnist_mlp.R” script from their tfruns repository: Train a simple deep NN on the MNIST dataset

I used some fancy R code to throw it into a “training_run” script that calls training_run (and thus gets me all the pretty plots). I think my inner Perl Hacker is peeking out in this code.

download.file("https://raw.githubusercontent.com/rstudio/tfruns/master/inst/examples/mnist_mlp/mnist_mlp.R", here::here("bin/mnist_mlp.R"))

fileConn<-file(here::here("bin/training_run.R"))
writeLines(c("library(tfruns)","tfruns::training_run(\"bin/mnist_mlp.R\")"), fileConn)
close(fileConn)

I also want to save a copy of my model for posterity, so I’m going to update my script for that as well.

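# Timestamped base name for the saved model files (no extension)
model_filename <- paste("models/", Sys.time(), " mnist", sep="")

# Append the save calls to the training script. The value of model_filename is
# written in as a literal, because the R session that runs the script in the
# cloud won't have this variable defined.
fileConn <- file(here::here("bin/mnist_mlp.R"), open = "at")
writeLines(c(sprintf("model_filename <- \"%s\"", model_filename),
             "save_model_hdf5(model, filepath = paste0(model_filename, \".h5\"))",
             "save_model_weights_hdf5(model, filepath = paste0(model_filename, \"_weights.h5\"))"),
           fileConn)
close(fileConn)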

Alright let’s see what happens!

> job <- cloudml_train("training-run.R")

Hrm, an error message saying I need to run “gsutil config” to set up authorization.

gcloud auth login

So, I ran this command, which opened up a page in my browser & I was able to login to my Google Cloud account. Great, now let’s try the other command it said to run:

gsutil config

Unfortunately I was caught in some kind of a loop after this & it kept thinking I wasn’t authorized. So I ran gcloud_install() again, which seemed to fix whatever was going on.

Let’s try the next step of the RStudio docs again. Nope, another error.

> job <- cloudml_train("training-run.R")

Submitting training job to CloudML...
error in running commandError: ERROR: gcloud invocation failed [exit status 1]

[command]
/Users/auggy/google-cloud-sdk/bin/gsutil ls -p

[output]


[errmsg]
You are attempting to perform an operation that requires a project id, with none configured. Please re-run gsutil config and make sure to follow the instructions for finding and entering your default project id.

OK well, I need to set a default project. gsutil config never prompted me for one, even when I re-ran it, so I was stuck in the mud. After much web spelunking I found some nuggets.

From the terminal, I need to figure out my active project with gcloud config list:

➜  using-tensorflow-with-r git:(master) ✗ gcloud config list
[core]
account = ohai@foo.com
disable_usage_reporting = False

Your active configuration is: [default]

I have no idea what [default] means. I’m sure there’s a config file somewhere, but right now I don’t care. In any case, I found the command to set the project, but I wasn’t sure what the “project id” was or where to find it. After more spelunking, I learned the command to list my projects is `gcloud projects list`:

➜  using-tensorflow-with-r git:(master) ✗ gcloud projects list
PROJECT_ID                     NAME                           PROJECT_NUMBER
derp-learnings                 Derp Learnings                 XXXXXXXXXXXXX
intro-to-tensorflow            Intro to Tensorflow            XXXXXXXXXXXXX
open-source-community-metrics  Open Source Community Metrics  XXXXXXXXXXXXX

Now I can set the default project with gcloud config set project:

➜  using-tensorflow-with-r git:(master) ✗ gcloud config set project derp-learnings
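Before resubmitting, it can help to confirm from R that gcloud really has a default project now. Here is a minimal sketch of that check. The helper active_gcloud_project() is made up for this post (it is not part of cloudml), and it assumes gcloud is on the PATH that R sees, which may not be the case if the SDK lives under ~/google-cloud-sdk.

```r
# Hypothetical helper (not part of cloudml): return the active gcloud project,
# or NA if none is configured or the gcloud binary can't be found.
active_gcloud_project <- function(gcloud = Sys.which("gcloud")) {
  if (!nzchar(gcloud)) return(NA_character_)
  # `gcloud config get-value project` prints the project id on stdout;
  # when nothing is set, stdout is empty (or "(unset)").
  out <- suppressWarnings(
    system2(gcloud, c("config", "get-value", "project"),
            stdout = TRUE, stderr = FALSE)
  )
  out <- trimws(out)
  out <- out[nzchar(out) & out != "(unset)"]
  if (length(out) == 0) NA_character_ else out[[1]]
}

project <- active_gcloud_project()
if (is.na(project)) {
  message("No default project found; try: gcloud config set project PROJECT_ID")
} else {
  message("Active gcloud project: ", project)
}
```

If this reports no project even after gcloud config set project, the likely culprit is that R is picking up a different gcloud install (or none) than your terminal.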

All Aboard!

OK I think we’re ready to run the next step in the RStudio docs without error!

job <- cloudml_train(here::here("bin/training_run.R"))

Error in validate_application(application, entrypoint) : Entrypoint /Users/auggy/dev/R/hello-cloudml/training_run.R not found under /Users/auggy/dev/R/hello-cloudml

I spoke too soon! The cloudml_train() function requires a relative path because under the hood it’s using getwd() to figure out the fully qualified path to the training script. Let me go file an issue for this… one sec.

OK, done! Github issue
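If you like building paths with here::here() like I do, one workaround is to turn the absolute path back into one relative to the project root before handing it over. This is only a sketch: relative_to() is a hypothetical helper written with base R, and it assumes your working directory is the project root when you submit the job.

```r
# Hypothetical helper: express `path` relative to `root` (both may be absolute).
relative_to <- function(path, root = getwd()) {
  path <- normalizePath(path, winslash = "/", mustWork = FALSE)
  root <- normalizePath(root, winslash = "/", mustWork = FALSE)
  # Make sure the prefix ends in exactly one slash before comparing
  prefix <- paste0(sub("/+$", "", root), "/")
  if (!startsWith(path, prefix)) {
    stop("`path` is not under `root`: ", path)
  }
  substring(path, nchar(prefix) + 1)
}

# Usage (not run here; needs the project layout from this post):
# script <- relative_to(here::here("bin/training_run.R"), here::here())
# job <- cloudml_train(script)   # script is now "bin/training_run.R"
```

Failing loudly when the script sits outside the project seemed safer than guessing, since the entrypoint has to live under the directory that gets uploaded.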

Whew, alright, third time’s a charm perhaps? Has it been three times? I’ve lost count.

(PSSSST… Don’t forget to hit “Y” on the Console)

job <- cloudml_train("bin/training_run.R")

Submitting training job to CloudML...
Job 'cloudml_2019_02_08_000011003' successfully submitted.

View job in the Cloud Console at:
https://console.cloud.google.com/ml/jobs/cloudml_2019_02_08_000011003?project=derp-learnings

View logs at:
https://console.cloud.google.com/logs?resource=ml.googleapis.com%2Fjob_id%2Fcloudml_2019_02_08_000011003&project=derp-learnings

Check job status with:     job_status("cloudml_2019_02_08_000011003")

Collect job output with:   job_collect("cloudml_2019_02_08_000011003")

After collect, view with:  view_run("runs/cloudml_2019_02_08_000011003")

Monitor and collect job in RStudio Terminal? [Y/n]: y
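If you answer “n” at that prompt, you’ll have to check on the job yourself. Here is a rough sketch of a polling loop. wait_until_done() is a hypothetical helper, and the usage comment assumes job_status() returns a list with a state field holding values like SUCCEEDED or FAILED; I haven’t confirmed that shape, so inspect str(job_status(job)) before relying on it.

```r
# Hypothetical helper: call `get_state()` until it reports a terminal state.
# `get_state` is any zero-argument function returning the state as a string.
wait_until_done <- function(get_state, interval = 30, max_checks = 120) {
  terminal <- c("SUCCEEDED", "FAILED", "CANCELLED")
  for (i in seq_len(max_checks)) {
    state <- get_state()
    if (state %in% terminal) return(state)
    Sys.sleep(interval)  # wait before asking again
  }
  stop("Job did not finish after ", max_checks, " checks")
}

# Usage (not run here; needs cloudml and a submitted job).
# ASSUMPTION: job_status(job)$state holds the job state.
# final <- wait_until_done(function() job_status(job)$state)
# if (final == "SUCCEEDED") job_collect(job)
```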

Riding the cloudml_train()!

tfruns was kind enough to generate a fancy HTML report for me once my training job completed.

Output from tfruns

Souvenir Shop

Remember, we added that line to the training script above? Our script serialized the model out to a folder called “models”, which ends up in the cloudml runs folder.

To access it programmatically, we just need the job id because that’s what our particular subfolder is named!

job$id

## [1] "cloudml_2019_02_08_000011003"

model_path <- here::here("runs", job$id, model_filename)
file.copy(paste0(model_path, ".h5"), here::here("models"))
file.copy(paste0(model_path, "_weights.h5"), here::here("models"))

[1] "/Users/auggy/dev/R/derp-learnings/hello-cloudml/runs/cloudml_2019_02_08_000011003/models/2019-02-08 11:01:04 mnist"
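If you’ve lost track of the exact model_filename (say, in a fresh R session), the job id alone is enough to find the files. A minimal sketch, assuming the collected run lives under runs/<job id> as it does above; copy_run_models() is a hypothetical helper, not something cloudml or tfruns provides.

```r
# Hypothetical helper: copy every .h5 file from a collected run into `dest`.
copy_run_models <- function(job_id, runs_dir = "runs", dest = "models") {
  run_dir <- file.path(runs_dir, job_id)
  found <- list.files(run_dir, pattern = "\\.h5$",
                      recursive = TRUE, full.names = TRUE)
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  copied <- file.copy(found, dest, overwrite = TRUE)
  # Return the destination paths of the files that were actually copied
  file.path(dest, basename(found))[copied]
}

# Usage (not run here; needs a collected job):
# copy_run_models(job$id, runs_dir = here::here("runs"), dest = here::here("models"))
```

Note that overwrite = TRUE replaces any file of the same name already sitting in the destination folder; the timestamp in the file name makes a clash unlikely.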

Next Stop

Next time we’ll explore exporting our Keras model using ONNX & importing models so we can carpool with our PyTorch friends!

All Aboard the cloudml_train()

Augustina Ragwitz

May 13, 2019
