Spark Summit talk by Andrej Karpathy at Tesla
The toolchain for the (software) 2.0 stack does not exist.
Different models need to be run across many runtimes, each producing many results.
MLflow can run models across runtimes while tracking their results.
“Helps teams manage their machine learning lifecycle.”
Install Anaconda or Miniconda.
Implicit MLflow run:
library(mlflow)
# Log a parameter (key-value pair)
mlflow_log_param("param1", 5)
# Log a metric; metrics can be updated throughout the run
mlflow_log_metric("foo", 1)
mlflow_log_metric("foo", 2)
mlflow_log_metric("foo", 3)
# Log an artifact (output file)
writeLines("Hello world!", "output.txt")
mlflow_log_artifact("output.txt")
Run terminates when the R session finishes or by running mlflow_end_run().
Useful when sourcing files.
Explicit MLflow run:
library(mlflow)
with(mlflow_start_run(), {
  # Log a parameter (key-value pair)
  mlflow_log_param("param1", 5)
  # Log a metric; metrics can be updated throughout the run
  mlflow_log_metric("foo", 1)
  mlflow_log_metric("foo", 2)
  mlflow_log_metric("foo", 3)
  # Log an artifact (output file)
  writeLines("Hello world!", "output.txt")
  mlflow_log_artifact("output.txt")
})
Or adding the following to tracking.R
in RStudio 1.2:
Create dependencies snapshot:
Then restore snapshot:
mlflow_run(
  "train.R",
  "https://github.com/rstudio/mlflow-example",
  param_list = list(alpha = 0.2)
)
Elasticnet model (alpha=0.2, lambda=0.5):
RMSE: 0.827574750159859
MAE: 0.632070002076146
R2: 0.227227498131926
Elasticnet model (alpha=0.200000, l1_ratio=0.100000):
RMSE: 0.7836984021909766
MAE: 0.6142020452688988
R2: 0.20673590971167466
Or from bash:
Elasticnet model (alpha=0.5, lambda=0.5):
RMSE: 0.828684594922867
MAE: 0.627503954965052
R2: 0.19208126758775
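The RMSE, MAE and R2 values printed by the runs above are standard regression metrics. A minimal sketch of how such values can be computed in base R follows. The helper regression_metrics() is hypothetical and is not taken from train.R, the example model is an assumption, and R2 is computed here as one minus the ratio of residual to total sum of squares, which may differ from the definition the example project uses.

```r
# Hypothetical helper, not part of mlflow or of the example project's train.R.
regression_metrics <- function(actual, predicted) {
  residuals <- actual - predicted
  c(
    rmse = sqrt(mean(residuals^2)),   # root mean squared error
    mae  = mean(abs(residuals)),      # mean absolute error
    r2   = 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
  )
}

# Illustrative model only (assumption): mpg regressed on weight in mtcars.
fit <- lm(mpg ~ wt, data = mtcars)
metrics <- regression_metrics(mtcars$mpg, unname(fitted(fit)))
print(metrics)

# Each value could then be tracked with the logging call shown earlier, e.g.
# mlflow_log_metric("rmse", metrics[["rmse"]])
```

Returning a named vector keeps the metric names next to their values, so each entry maps directly onto one mlflow_log_metric() call.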
Generic functions can be serialized with carrier::crate:
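The code for this step is not shown here. A minimal sketch, assuming the model is a linear regression of mpg on all other mtcars columns (an assumption, chosen because the prediction input that follows supplies exactly those columns). The names model, predictor and new_car are illustrative.

```r
library(carrier)

# Assumed model: linear regression of mpg on every other mtcars column.
model <- lm(mpg ~ ., data = mtcars)

# crate() bundles the function with only the objects passed to it explicitly,
# so functions from other packages must be namespaced (stats::predict).
predictor <- crate(
  function(new_data) stats::predict(model, new_data),
  model = model
)

# The crated function can be called like any other R function.
new_car <- data.frame(cyl = 2, disp = 160, hp = 110, drat = 3.9, wt = 2.62,
                      qsec = 16.46, vs = 0, am = 1, gear = 4, carb = 4)
prediction <- predictor(new_car)
print(prediction)

# Assumed next step, using the function named later in these slides:
# mlflow_save_model(predictor, "model")
```

Because the crate carries its own copy of the model, it does not depend on anything else in the session that created it, which is what makes it suitable for saving and loading elsewhere.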
mlflow_rfunc_predict(
  "model",
  data = data.frame(cyl = 2, disp = 160, hp = 110, drat = 3.9, wt = 2.62,
                    qsec = 16.46, vs = 0, am = 1, gear = 4, carb = 4)
)
1
23.04527
Or from bash:
However, mlflow_save_model() can be extended by packages:
library(keras)
cars <- scale(mtcars)
model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = ncol(mtcars) - 1) %>%
  layer_dense(units = 4, activation = "relu") %>%
  layer_dense(units = 1, activation = "relu") %>%
  compile(
    loss = "mse",
    optimizer = optimizer_rmsprop(),
    metrics = "mean_absolute_error"
  )
fit(model, cars[,-1], cars[,1], epochs = 10)
mlflow_save_model(model, "keras")
mlflow_rfunc_predict(
  "keras",
  data = data.frame(cyl = 1, disp = 1, hp = 1, drat = 1, wt = 1,
                    qsec = 1, vs = 1, am = 1, gear = 1, carb = -1)
)
[,1]
[1,] 0.02378437
[{"cyl":1,"disp":1,"hp":1,"drat":1,"wt":1,"qsec":1,"vs":1,"am":1,"gear":1,"carb":-1}]
{"predictions": [[0.0238]]}
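The two lines above are a JSON request body (an array with one record per row) and the JSON response. The command that sends the request is not shown here. A minimal sketch of producing and reading that JSON from R, assuming the jsonlite package (an assumption; the slides do not name a JSON library):

```r
library(jsonlite)

# Same input as the keras prediction call above.
new_data <- data.frame(cyl = 1, disp = 1, hp = 1, drat = 1, wt = 1,
                       qsec = 1, vs = 1, am = 1, gear = 1, carb = -1)

# toJSON() serializes a data frame row by row, giving an array of records.
payload <- toJSON(new_data)
print(payload)

# Parsing a response of the shape shown above gives a numeric matrix.
response <- fromJSON('{"predictions": [[0.0238]]}')
print(response$predictions)
```

Building the payload from a data frame keeps the column names identical to those the model was trained on.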
# Source: spark<?> [?? x 1]
  result
*  <dbl>
1   22.6
2   22.1
3   26.3
4   21.2
5   17.7
mlflow.org
github.com/mlflow/mlflow
rpubs.com/jluraschi/mlflow-rseattle
@javierluraschi