Update the reticulate package if needed (uncomment one of the lines below):
#update.packages("reticulate")
#devtools::install_github("rstudio/reticulate")
The tfruns package provides a suite of tools for tracking, visualizing, and managing TensorFlow training runs and experiments from R:
Track the hyperparameters, metrics, output, and source code of every training run.
Compare hyperparameters and metrics across runs to find the best performing model.
Automatically generate reports to visualize individual training runs or comparisons between runs.
No changes to source code required (run data is automatically captured for all Keras and TF Estimator models).
#install.packages("tfruns")
library(tfruns)
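As a quick illustration of the tfruns workflow (a sketch only; it assumes a training script named train.R containing the model-training code exists in the working directory):
#training_run("train.R")   # execute the script as a tracked training run
#ls_runs()                 # list hyperparameters, metrics, and metadata for all runs
#compare_runs()            # visually compare the two most recent runs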
#devtools::install_github("rstudio/tfestimators")
library(tensorflow)
library(tidyverse)
library(tfestimators)
library(data.table)
library(mlr)
library(reticulate)
The estimators available in the tfestimators package are listed below.
library(knitr)
knitr::include_graphics('/Users/nanaakwasiabayieboateng/Documents/memphisclassesbooks/DataMiningscience/Deeplearning/tfestimator.png')
The data is available at the UCI Machine Learning Repository.
url<-"http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
data<-fread(url)
trying URL 'http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
Content type 'text/plain; charset=UTF-8' length 23279 bytes (22 KB)
==================================================
downloaded 22 KB
#read_csv(url)
colnames(data)<-c("preg","plas","pres","skin","test","mass","pedi","age","class")
data%>%glimpse()
Observations: 768
Variables: 9
$ preg <int> 6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10, 10, 1, 5, 7, 0...
$ plas <int> 148, 85, 183, 89, 137, 116, 78, 115, 197, 125, 110, ...
$ pres <int> 72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74, 80, 6...
$ skin <int> 35, 29, 0, 23, 35, 0, 32, 0, 45, 0, 0, 0, 0, 23, 19,...
$ test <int> 0, 0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 17...
$ mass <dbl> 33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5...
$ pedi <dbl> 0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248, 0.1...
$ age <int> 50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 30, 34, 57, ...
$ class <int> 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1...
Title: Pima Indians Diabetes Database
Sources: National Institute of Diabetes and Digestive and Kidney Diseases
The diagnostic, binary-valued variable investigated is whether the patient shows signs of diabetes according to World Health Organization criteria (i.e., if the 2 hour post-load plasma glucose was at least 200 mg/dl at any survey examination or if found during routine medical care). The population lives near Phoenix, Arizona, USA.
Results: The ADAP algorithm used in the original study makes a real-valued prediction between 0 and 1, which was transformed into a binary decision using a cutoff of 0.448. Using 576 training instances, the sensitivity and specificity of the algorithm were 76% on the remaining 192 instances.
Relevant Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.
Number of Instances: 768
Number of Attributes: 8 plus class
Class variable (0 or 1)
Missing Attribute Values: Yes
Class Distribution: (class value 1 is interpreted as “tested positive for diabetes”)
Class Value    Number of Instances
0              500
1              268
Although the original documentation flags missing attribute values (several columns use 0 to encode a missing measurement), there are no explicit NA values in the dataset:
colSums(is.na(data))
preg plas pres skin test mass pedi age class
0 0 0 0 0 0 0 0 0
summarizeColumns(data)
Split the data into training and test datasets.
# Split the data into training and test data sets
index <- sample(1:nrow(data), size = 0.80 * nrow(data))
train <- data[index, ]
test <- data[-index, ]
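Note that sample() is random; to make the split above reproducible, a seed could be set before sampling (the value below is arbitrary):
#set.seed(123)  # arbitrary seed; makes the train/test split reproducible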
Estimators can accept data from arbitrary data sources through an ‘input function’. The input function selects feature and response columns from the input source as well as defines how data will be drawn (e.g. batch size, epochs, etc.). The tfestimators package provides the input_fn() helper function for generating input functions from common R data structures, e.g. R matrices and data frames.
# return an input_fn for a given subset of data
data_input_fn <- function(data, num_epochs = 1) {
input_fn(data,
features = c("preg","plas","pres","skin","test","mass","pedi","age"),
response = "class",
batch_size = 32,
num_epochs = num_epochs)
}
Next, we define the feature columns for our model. Feature columns map the raw input data to the data that we actually feed into the training, evaluation, and prediction steps. We can also transform variables into categorical variables for the analysis by using the tfestimators::feature_columns() function to get the data into the shape expected for an input Tensor. Category levels are set by passing a list to the vocabulary_list argument. For example, to convert a variable with two levels we can use column_categorical_with_vocabulary_list("variable", vocabulary_list = list("level 1", "level 2")), as sketched after the feature_columns() call below.
cols <- feature_columns(
column_numeric("preg","plas","pres","skin","test","mass","pedi","age")
)
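All eight predictors here are numeric, so only column_numeric() is needed. Purely for illustration (the column name and levels below are hypothetical and do not exist in this dataset), a two-level categorical feature column would be declared as:
#cols_with_cat <- feature_columns(
#  column_numeric("preg","plas","pres","skin","test","mass","pedi","age"),
#  column_categorical_with_vocabulary_list("smoker", vocabulary_list = list("yes", "no"))
#)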
Next, we create the estimator by calling the linear_classifier() function and passing it a set of feature columns:
model <- linear_classifier(feature_columns = cols)
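The figure of available estimators above also includes a deep neural network classifier; a DNN model could be constructed in the same way (a sketch only, with hidden_units chosen arbitrarily):
#dnn_model <- dnn_classifier(feature_columns = cols,
#                            hidden_units = c(16, 8),  # two hidden layers; sizes are arbitrary
#                            n_classes = 2)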
We then train the model, feeding it the training data through the input function for 10 epochs.
# train the model
model %>% train(data_input_fn(train, num_epochs = 10))
[-] Training -- loss: 22.18, step: 1
[\] Training -- loss: 898.94, step: 2
[|] Training -- loss: 26.33, step: 3
[/] Training -- loss: 112.03, step: 4
[-] Training -- loss: 238.68, step: 5
[\] Training -- loss: 549.02, step: 6
[|] Training -- loss: 255.83, step: 7
[/] Training -- loss: 127.88, step: 8
[-] Training -- loss: 90.98, step: 9
[\] Training -- loss: 177.39, step: 10
[|] Training -- loss: 263.25, step: 11
[/] Training -- loss: 181.84, step: 12
[-] Training -- loss: 116.20, step: 13
[\] Training -- loss: 272.25, step: 14
[|] Training -- loss: 29.15, step: 15
[/] Training -- loss: 45.41, step: 16
[-] Training -- loss: 117.46, step: 17
[\] Training -- loss: 41.48, step: 18
[|] Training -- loss: 159.80, step: 19
[/] Training -- loss: 113.04, step: 20
[-] Training -- loss: 142.92, step: 21
[\] Training -- loss: 248.46, step: 22
[|] Training -- loss: 85.69, step: 23
[/] Training -- loss: 216.56, step: 24
[-] Training -- loss: 68.48, step: 25
[\] Training -- loss: 52.60, step: 26
[|] Training -- loss: 82.04, step: 27
[/] Training -- loss: 184.77, step: 28
[-] Training -- loss: 84.66, step: 29
[\] Training -- loss: 56.09, step: 30
[|] Training -- loss: 62.31, step: 31
[/] Training -- loss: 94.20, step: 32
[-] Training -- loss: 51.16, step: 33
[\] Training -- loss: 58.38, step: 34
[|] Training -- loss: 112.49, step: 35
[/] Training -- loss: 138.90, step: 36
[-] Training -- loss: 48.63, step: 37
[\] Training -- loss: 52.38, step: 38
[|] Training -- loss: 23.78, step: 39
[/] Training -- loss: 81.91, step: 40
[-] Training -- loss: 117.72, step: 41
[\] Training -- loss: 34.53, step: 42
[|] Training -- loss: 52.25, step: 43
[/] Training -- loss: 103.42, step: 44
[-] Training -- loss: 64.43, step: 45
[\] Training -- loss: 40.75, step: 46
[|] Training -- loss: 73.24, step: 47
[/] Training -- loss: 72.88, step: 48
[-] Training -- loss: 141.48, step: 49
[\] Training -- loss: 69.24, step: 50
[|] Training -- loss: 81.84, step: 51
[/] Training -- loss: 76.14, step: 52
[-] Training -- loss: 50.69, step: 53
[\] Training -- loss: 26.84, step: 54
[|] Training -- loss: 80.43, step: 55
[/] Training -- loss: 111.80, step: 56
[-] Training -- loss: 114.20, step: 57
[\] Training -- loss: 49.81, step: 58
[|] Training -- loss: 40.30, step: 59
[/] Training -- loss: 47.10, step: 60
[-] Training -- loss: 79.76, step: 61
[\] Training -- loss: 118.48, step: 62
[|] Training -- loss: 73.83, step: 63
[/] Training -- loss: 20.61, step: 64
[-] Training -- loss: 32.32, step: 65
[\] Training -- loss: 76.05, step: 66
[|] Training -- loss: 38.57, step: 67
[/] Training -- loss: 67.98, step: 68
[-] Training -- loss: 30.62, step: 69
[\] Training -- loss: 30.51, step: 70
[|] Training -- loss: 25.79, step: 71
[/] Training -- loss: 21.09, step: 72
[-] Training -- loss: 39.48, step: 73
[\] Training -- loss: 38.96, step: 74
[|] Training -- loss: 30.10, step: 75
[/] Training -- loss: 33.58, step: 76
[-] Training -- loss: 44.39, step: 77
[\] Training -- loss: 121.28, step: 78
[|] Training -- loss: 89.70, step: 79
[/] Training -- loss: 59.57, step: 80
[-] Training -- loss: 62.29, step: 81
[\] Training -- loss: 30.87, step: 82
[|] Training -- loss: 56.52, step: 83
[/] Training -- loss: 32.06, step: 84
[-] Training -- loss: 60.52, step: 85
[\] Training -- loss: 99.41, step: 86
[|] Training -- loss: 27.82, step: 87
[/] Training -- loss: 52.77, step: 88
[-] Training -- loss: 49.96, step: 89
[\] Training -- loss: 72.59, step: 90
[|] Training -- loss: 67.14, step: 91
[/] Training -- loss: 33.29, step: 92
[-] Training -- loss: 27.95, step: 93
[\] Training -- loss: 17.61, step: 94
[|] Training -- loss: 20.99, step: 95
[/] Training -- loss: 27.35, step: 96
[-] Training -- loss: 19.51, step: 97
[\] Training -- loss: 39.69, step: 98
[|] Training -- loss: 25.91, step: 99
[/] Training -- loss: 25.87, step: 100
[-] Training -- loss: 23.38, step: 101
[\] Training -- loss: 25.72, step: 102
[|] Training -- loss: 49.72, step: 103
[/] Training -- loss: 19.75, step: 104
[-] Training -- loss: 19.75, step: 105
[\] Training -- loss: 37.84, step: 106
[|] Training -- loss: 83.40, step: 107
[/] Training -- loss: 104.30, step: 108
[-] Training -- loss: 59.79, step: 109
[\] Training -- loss: 24.61, step: 110
[|] Training -- loss: 28.32, step: 111
[/] Training -- loss: 35.22, step: 112
[-] Training -- loss: 66.99, step: 113
[\] Training -- loss: 49.32, step: 114
[|] Training -- loss: 27.45, step: 115
[/] Training -- loss: 34.46, step: 116
[-] Training -- loss: 53.99, step: 117
[\] Training -- loss: 30.21, step: 118
[|] Training -- loss: 59.44, step: 119
[/] Training -- loss: 23.67, step: 120
[-] Training -- loss: 17.16, step: 121
[\] Training -- loss: 15.45, step: 122
[|] Training -- loss: 43.75, step: 123
[/] Training -- loss: 135.51, step: 124
[-] Training -- loss: 58.39, step: 125
[\] Training -- loss: 35.61, step: 126
[|] Training -- loss: 22.55, step: 127
[/] Training -- loss: 25.20, step: 128
[-] Training -- loss: 35.48, step: 129
[\] Training -- loss: 27.58, step: 130
[|] Training -- loss: 58.30, step: 131
[/] Training -- loss: 49.04, step: 132
[-] Training -- loss: 28.12, step: 133
[\] Training -- loss: 40.11, step: 134
[|] Training -- loss: 29.50, step: 135
[/] Training -- loss: 31.44, step: 136
[-] Training -- loss: 43.53, step: 137
[\] Training -- loss: 51.67, step: 138
[|] Training -- loss: 42.87, step: 139
[/] Training -- loss: 34.58, step: 140
[-] Training -- loss: 39.75, step: 141
[\] Training -- loss: 31.81, step: 142
[|] Training -- loss: 24.74, step: 143
[/] Training -- loss: 25.95, step: 144
[-] Training -- loss: 21.55, step: 145
[\] Training -- loss: 25.45, step: 146
[|] Training -- loss: 39.19, step: 147
[/] Training -- loss: 23.43, step: 148
[-] Training -- loss: 47.57, step: 149
[\] Training -- loss: 50.52, step: 150
[|] Training -- loss: 94.98, step: 151
[/] Training -- loss: 40.20, step: 152
[-] Training -- loss: 20.37, step: 153
[\] Training -- loss: 21.75, step: 154
[|] Training -- loss: 30.24, step: 155
[/] Training -- loss: 42.65, step: 156
[-] Training -- loss: 23.87, step: 157
[\] Training -- loss: 26.00, step: 158
[|] Training -- loss: 25.41, step: 159
[/] Training -- loss: 52.50, step: 160
[-] Training -- loss: 24.05, step: 161
[\] Training -- loss: 18.57, step: 162
[|] Training -- loss: 27.64, step: 163
[/] Training -- loss: 35.61, step: 164
[-] Training -- loss: 19.14, step: 165
[\] Training -- loss: 20.32, step: 166
[|] Training -- loss: 19.01, step: 167
[/] Training -- loss: 24.10, step: 168
[-] Training -- loss: 33.67, step: 169
[\] Training -- loss: 59.86, step: 170
[|] Training -- loss: 70.62, step: 171
[/] Training -- loss: 29.50, step: 172
[-] Training -- loss: 29.61, step: 173
[\] Training -- loss: 25.99, step: 174
[|] Training -- loss: 28.93, step: 175
[/] Training -- loss: 62.65, step: 176
[-] Training -- loss: 53.62, step: 177
[\] Training -- loss: 35.48, step: 178
[|] Training -- loss: 17.98, step: 179
[/] Training -- loss: 28.27, step: 180
[-] Training -- loss: 34.04, step: 181
[\] Training -- loss: 27.57, step: 182
[|] Training -- loss: 26.55, step: 183
[/] Training -- loss: 21.56, step: 184
[-] Training -- loss: 26.21, step: 185
[\] Training -- loss: 27.46, step: 186
[|] Training -- loss: 27.01, step: 187
[/] Training -- loss: 35.04, step: 188
[-] Training -- loss: 32.12, step: 189
[\] Training -- loss: 67.88, step: 190
[|] Training -- loss: 19.98, step: 191
[/] Training -- loss: 26.74, step: 192
We can evaluate the model's accuracy with the evaluate() function, using our test data set for validation.
model %>% evaluate(data_input_fn(test), simplify = TRUE)
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
[-] Evaluating -- loss: 36.51, step: 1
[\] Evaluating -- loss: 37.55, step: 2
[|] Evaluating -- loss: 29.75, step: 3
[/] Evaluating -- loss: 34.83, step: 4
[-] Evaluating -- loss: 31.47, step: 5
After we’ve finished training our model, we can use it to generate predictions from new data.
#model %>% predict(data_input_fn(test), simplify = FALSE)
predictions <- model %>% predict(data_input_fn(test), simplify = TRUE)
predictions %>% head()
glimpse(predictions)
Observations: 154
Variables: 5
$ logits <list> [0.1383194, 3.272619, 1.162019, 6.597319, 6...
$ logistic <list> [0.5345248, 0.9634774, 0.7616994, 0.9986379...
$ probabilities <list> [<0.4654752, 0.5345248>, <0.03652255, 0.963...
$ class_ids <list> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, ...
$ classes <list> ["1", "1", "1", "1", "1", "1", "1", "1", "1...
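Because predictions$classes stores the predicted label for each test row as a character string, a rough check of test-set performance can be made by comparing it with the true labels (a sketch):
pred_class <- as.integer(unlist(predictions$classes))  # predicted labels as integers
table(predicted = pred_class, actual = test$class)     # confusion matrix on the test set
mean(pred_class == test$class)                         # overall accuracy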
#simplify_fn <- function(predictions) {
# lapply(predictions, function(x) x$probabilities)
#}
#model %>% predict(data_input_fn(test), simplify = simplify_fn)
Models created via tfestimators are persisted on disk. To obtain the location of where the model artifacts are stored, we can call model_dir():
saved_model_dir <- model_dir(model)
We can subsequently load the saved model (in a new session) by passing the directory to the model_dir argument of the model constructor, and then use it for prediction or to continue training:
cols <- feature_columns(
column_numeric("preg","plas","pres","skin","test","mass","pedi","age")
)
loaded_model <- linear_classifier(feature_columns = cols,
model_dir = saved_model_dir)
loaded_model
A TensorFlow classifier [<tensorflow.python.estimator.canned.linear.LinearClassifier>]
Model Directory: /var/folders/mj/w1gxzjcd0qx2cw_0690z7y640000gn/T/tmpnl5hwxo3
Model has been trained for 192 steps.
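The restored estimator behaves like the original, so we could (for example) re-evaluate it on the test set or resume training from the saved checkpoint; a sketch:
#loaded_model %>% evaluate(data_input_fn(test))                # re-evaluate on the test set
#loaded_model %>% train(data_input_fn(train, num_epochs = 5))  # or continue training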