This is an example of running an R version of Google Datalab

Google Datalab is a service that lets you easily interact with your data in the Google Cloud. This document is an exercise in replicating the same functionality with R:

  • Runs on Google Cloud infrastructure using googleComputeEngineR within its own Docker container (see the launch sketch after this list)
  • Uses RStudio and its R Markdown Notebooks to replicate the Jupyter/IPython functionality
  • Auto-authentication with Google Cloud services to work with BigQuery and Cloud Storage data
  • Cross-language support for Python, SQL and bash via R Notebooks
  • Python data analysis libraries pandas and NumPy
  • Visualisation via R libraries such as the htmlwidgets family
  • Installation of TensorFlow and RStudio's tensorflow package
  • Installation of the TensorFlow helper library tflearn
  • Installation of feather to help R and Python share data nicely
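For example, launching the instance that everything below runs upon can itself be done from R with googleComputeEngineR. A minimal sketch, assuming your project and zone are already configured; the VM name, login and machine type are illustrative:

library(googleComputeEngineR)

## assumes gce_global_project() and gce_global_zone() are set
## launches an RStudio Server Docker container on its own GCE VM
vm <- gce_vm(name = "r-datalab",
             template = "rstudio",
             username = "rstudio",
             password = "mypassword",
             predefined_type = "n1-standard-1")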

Setup

library(googleAuthR)
## this reuses the authentication of the GCE instance we are on
gar_gce_auth()

library(bigQueryR)
## list authenticated projects
myproject <- bqr_list_projects()

library(googleCloudStorageR)
## Setting scopes to https://www.googleapis.com/auth/devstorage.full_control
## If you need additional scopes set do so via options(googleAuthR.scopes.selected = c('scope1', 'scope2')) before loading library and include one required scope.
## list Cloud Storage buckets
gcs_list_buckets(myproject$id[[1]])
##                                       name storageClass location
## 1 artifacts.mark-edmondson-gde.appspot.com     STANDARD       US
## 2      mark-edmondson-gde-minecraft-backup     STANDARD       US
## 3              mark-edmondson-public-files     STANDARD       EU
##               updated
## 1 2016-10-07 11:37:55
## 2 2015-11-10 09:28:38
## 3 2016-08-27 20:47:23
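With authentication done, querying BigQuery and fetching Cloud Storage objects work from the same session. A quick sketch, where the dataset, table and file names are hypothetical:

## query BigQuery (dataset and table names are hypothetical)
result <- bqr_query(projectId = myproject$id[[1]],
                    datasetId = "my_dataset",
                    query = "SELECT COUNT(*) AS n FROM [my_dataset.my_table]")

## download an object from Cloud Storage (file name is hypothetical)
gcs_get_object("my_file.csv", bucket = "mark-edmondson-public-files")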

A demo of running Python in the same document:

hiss = 'sssssssss'
print "Pythons go %s." % hiss
## Pythons go sssssssss.

It also works with SQL and bash; for example, pip freeze run via a bash chunk:

pip freeze
## Cython==0.25.1
## Pillow==3.4.2
## argparse==1.2.1
## cffi==0.8.6
## chardet==2.3.0
## colorama==0.3.2
## cryptography==0.6.1
## feather-format==0.3.1
## funcsigs==1.0.2
## h5py==2.6.0
## html5lib==0.999
## mock==2.0.0
## ndg-httpsclient==0.3.2
## numpy==1.11.2
## pandas==0.19.1
## pbr==1.10.0
## ply==3.4
## protobuf==3.0.0
## pyOpenSSL==0.14
## pyasn1==0.1.7
## pycparser==2.10
## python-dateutil==2.6.0
## pytz==2016.7
## requests==2.4.3
## six==1.10.0
## tensorflow==0.11.0
## tflearn==0.2.2
## urllib3==1.9.1
## wheel==0.29.0
## wsgiref==0.1.2

Transfer data between R and Python with feather

From the introductory blog post for feather:

library(feather)
df <- mtcars
path <- "my_data.feather"
write_feather(df, path)
import feather
path = 'my_data.feather'
df = feather.read_dataframe(path)
df.head()
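The round trip back into R is just as easy:

library(feather)
df2 <- read_feather("my_data.feather")
head(df2)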

TensorFlow

Hello world Python

from __future__ import print_function

import tensorflow as tf

# Simple hello world using TensorFlow

# Create a Constant op
# The op is added as a node to the default graph.
#
# The value returned by the constructor represents the output
# of the Constant op.
hello = tf.constant('Hello, TensorFlow!')

# Start tf session
sess = tf.Session()

# Run the op
print(sess.run(hello))
## Hello, TensorFlow!

Hello world R

library(tensorflow)
sess <- tf$Session()
hello <- tf$constant('Hello, TensorFlow!')
sess$run(hello)
## [1] "Hello, TensorFlow!"

tflearn Titanic example

from __future__ import print_function

import numpy as np
import tflearn

# Download the Titanic dataset
from tflearn.datasets import titanic
titanic.download_dataset('titanic_dataset.csv')

# Load CSV file, indicate that the first column represents labels
from tflearn.data_utils import load_csv
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Preprocessing function
def preprocess(data, columns_to_ignore):
    # Sort by descending id and delete columns
    for id in sorted(columns_to_ignore, reverse=True):
        [r.pop(id) for r in data]
    for i in range(len(data)):
      # Converting 'sex' field to float (id is 1 after removing labels column)
      data[i][1] = 1. if data[i][1] == 'female' else 0.
    return np.array(data, dtype=np.float32)

# Ignore 'name' and 'ticket' columns (id 1 & 6 of data array)
to_ignore=[1, 6]

# Preprocess data
data = preprocess(data, to_ignore)

# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)

# Let's create some data for DiCaprio and Winslet
dicaprio = [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0000]
winslet = [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0000]
# Preprocess data
dicaprio, winslet = preprocess([dicaprio, winslet], to_ignore)
# Predict surviving chances (class 1 results)
pred = model.predict([dicaprio, winslet])
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])

tflearn using R as well

From the tflearn quickstart, modified to use R for data preprocessing:

import tflearn

# Download the Titanic dataset to local file 'titanic_dataset.csv'
from tflearn.datasets import titanic
titanic.download_dataset('titanic_dataset.csv')
## Scipy not supported!

Use R to process data:

library(dplyr)

titanic <- read.csv('titanic_dataset.csv')

processed <- titanic %>%
  select(-name, -ticket) %>%
  mutate(sex = as.numeric(as.factor(sex)) - 1) # female = 0, male = 1
str(processed)
## 'data.frame':    1309 obs. of  7 variables:
##  $ survived: int  1 1 0 0 0 1 1 0 1 0 ...
##  $ pclass  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ sex     : num  0 1 0 1 0 1 0 1 0 1 ...
##  $ age     : num  29 0.917 2 30 25 ...
##  $ sibsp   : int  0 1 1 1 1 0 1 0 2 0 ...
##  $ parch   : int  0 2 2 2 2 0 0 0 0 0 ...
##  $ fare    : num  211 152 152 152 152 ...
write.table(processed, "processed.csv", sep = ",", quote = FALSE, row.names = FALSE)
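Alternatively, the feather workflow from earlier could carry the processed data across with column types preserved; a sketch, with the file name illustrative:

library(feather)

## Python can then read this via feather.read_dataframe("processed.feather")
write_feather(processed, "processed.feather")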

Back to Python to run the model:

from __future__ import print_function

import numpy as np
import tflearn

# Load processed CSV file, indicate that the first column represents labels
from tflearn.data_utils import load_csv
data, labels = load_csv('processed.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

data = np.array(data, dtype=np.float32)

# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16)

# Let's create some data for DiCaprio and Winslet
# (columns already numeric: pclass, sex, age, sibsp, parch, fare)
dicaprio = [3, 1, 19, 0, 0, 5.0000]
winslet = [1, 0, 17, 1, 2, 100.0000]

# Predict surviving chances (class 1 results)
pred = model.predict([dicaprio, winslet])
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])
## Scipy not supported!
## ---------------------------------
## Run id: U527AR
## Log directory: /tmp/tflearn_logs/
## ---------------------------------
## Training samples: 1309
## Validation samples: 0
## --
## Training Step: 1  | Adam | epoch: 001 | loss: 0.00000 -- iter: 0016/1309
## Training Step: 2  | total loss: 0.62355 | Adam | epoch: 001 | loss: 0.62355 -- iter: 0032/1309
## Training Step: 3  | total loss: 0.68624 | Adam | epoch: 001 | loss: 0.68624 -- iter: 0048/1309
## ...
## Training Step: 82  | total loss: 0.63598 | Adam | epoch: 001 | loss: 0.63598 -- iter: 1309/1309
## --
## Training Step: 83  | total loss: 0.62936 | Adam | epoch: 002 | loss: 0.62936 -- iter: 0016/1309
## Training Step: 84  | total loss: 0.62307 | Adam | epoch: 002 | loss: 0.62307 -- iter: 0032/1309
## ...
## (training log truncated)