Data workflows in RStudio Connect

@javierluraschi

09/10/2019

Today

mlflow

MLflow docs require manually downloading wine-quality.csv.

r2d3

The r2d3 package docs require downloading flare.csv.

readr

The readr package defines readr_example() to avoid downloads.
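A minimal sketch of that pattern, assuming readr is installed; "mtcars.csv" is one of the example files bundled with the package and is used here only for illustration:

```r
library(readr)

# readr_example() resolves the path to a file that ships inside the package,
# so the example runs without downloading anything.
path <- readr_example("mtcars.csv")

# Read the bundled file like any other CSV.
mtcars_tbl <- read_csv(path)
```

The trade-off is that the data has to be shipped inside the package, which does not scale to large or frequently updated files.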

R4DS

“R for Data Science” reads “data/heights.csv” in its data import chapter.

Is this reproducible?

Workarounds

Add to .gitignore? Upstream changes? Share across projects?

Pins

Caching with Pins

With pins we can easily cache resources; pin() returns the path of the locally cached copy:

"/Users/javierluraschi/Library/Caches/pins/local/flare/flare.csv"
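The call that produced this path is not preserved in the slide text. The following is only a sketch of the caching idea, assuming the pin()/pin_get() API of the pins release current at the time of this talk (newer releases use a different API). The "demo" board, the temporary directories, and the two-row stand-in file are all illustrative assumptions, not the original flare.csv:

```r
library(pins)

# Assumption: a throwaway board so the sketch does not touch the default cache.
board_dir <- file.path(tempdir(), "pins-demo")
dir.create(board_dir, showWarnings = FALSE)
board_register_local(name = "demo", cache = board_dir)

# Stand-in for the resource to cache (hypothetical columns and values).
# pin() also accepts a remote URL in place of this local path.
src <- file.path(tempdir(), "flare.csv")
write.csv(data.frame(name = c("a", "b"), size = c(1, 2)), src, row.names = FALSE)

# Cache the file under the name "flare" on the demo board.
pin(src, name = "flare", board = "demo")

# pin_get() returns the path of the cached copy, which is read as usual.
cached <- pin_get("flare", board = "demo")
flare <- read.csv(cached)
```

Once pinned, later sessions can call pin_get() alone and keep working from the cache when the original source is unreachable.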

But wait, there is more…

Intro

Functionality

You can use the pins package to:

  • Pin remote resources locally to work offline and cache results with ease. pin() stores resources in boards, and pin_get() retrieves them.
  • Discover new resources across different boards using pin_find().
  • Share resources on GitHub, Kaggle or RStudio Connect by registering new boards with board_register().
  • Resources can be anything from CSV, JSON, or image files to arbitrary R objects.
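The pin, discover, and retrieve steps above can be sketched as follows, again assuming the pin()/pin_get()/pin_find() API described in this talk. The "scratch" board and the pin name "mtcars_demo" are hypothetical names chosen for the example:

```r
library(pins)

# Assumption: a temporary local board named "scratch" for this sketch only.
scratch_dir <- file.path(tempdir(), "pins-scratch")
dir.create(scratch_dir, showWarnings = FALSE)
board_register_local(name = "scratch", cache = scratch_dir)

# Pin: store an arbitrary R object (a built-in data frame) with a description.
pin(mtcars, name = "mtcars_demo",
    description = "Motor Trend car data", board = "scratch")

# Discover: search the board for pins matching some text.
found <- pin_find("mtcars", board = "scratch")

# Retrieve: load the pinned object back into the session.
cars <- pin_get("mtcars_demo", board = "scratch")
```

The same three calls work against a shared board; only the board argument changes.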

What can I pin?

Anything!

Where can I store pins?

Anywhere! That is, anywhere that implements the ‘board’ interface.

What is a board?

A storage location, like your local file system, GitHub, Kaggle or RStudio Connect.
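As a rough sketch of registering boards with the API from this talk: the board name "team_cache", the repository "owner/repo", the server URL, and the CONNECT_API_KEY environment variable are all placeholders. The remote registrations need real credentials, so they are left commented out:

```r
library(pins)

# A board on the local file system; name and location are illustrative.
cache_dir <- file.path(tempdir(), "team_cache")
dir.create(cache_dir, showWarnings = FALSE)
board_register_local(name = "team_cache", cache = cache_dir)

# Remote boards are registered in the same way (placeholder values):
# board_register_github(name = "github", repo = "owner/repo")
# board_register_rsconnect(name = "rsconnect",
#                          server = "https://connect.example.com",
#                          key = Sys.getenv("CONNECT_API_KEY"))

# List the boards registered in this session.
boards <- board_list()
```

After registration, a board is addressed by name through the board argument of pin(), pin_get(), and pin_find().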

RStudio

Pin

Authentication

Discover

Share

Resources

A pin can be anything: a CSV or JSON file, an image, or an arbitrary R object.

Extensions

A pin can also be extended!

Use Cases

Pins support many use cases, from caching remote resources to creating data pipelines.

See rstudio.github.io/pins/articles/use-cases.