Using AWS S3 for caching with Shiny and memoise

This document demonstrates using AWS S3 for caching with Shiny and memoise. It uses the {aws.s3} package. Note that this is a proof of concept, and I don’t know much about the robustness of {aws.s3}. See the {aws.s3} documentation for more information.

Preliminaries

This is code for creating an S3 caching object. You can skip this section if you just want to see the S3 cache in action.

This code assumes your credentials are stored in ~/.aws/config and that the profile is labeled [profile myname] (replace myname with the actual profile name). The {aws.s3} package also supports other ways of providing credentials, but this is what worked for me.

# Install aws.s3 if needed.
# install.packages("aws.s3")

library(cachem)
library(memoise)

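# Loads AWS variables from ~/.aws/config.
# profile_name is the name of the profile. For example, if there's a section
# in the config file with `[profile foo]`, then "foo" is the name.
load_aws_vars <- function(profile_name) {
  lines <- readLines("~/.aws/config")
  startline <- grep(sprintf("[profile %s]", profile_name), lines, fixed = TRUE)
  if (length(startline) != 1) {
    stop("Expected exactly one [profile ", profile_name, "] section in ~/.aws/config")
  }
  # Keep only the lines after the profile header and before the next section,
  # so keys belonging to other profiles are not picked up.
  lines <- lines[-seq_len(startline)]
  next_section <- grep("^\\s*\\[", lines)
  if (length(next_section) > 0) {
    lines <- lines[seq_len(next_section[1] - 1)]
  }
  # Turn `aws_key = value` lines into environment variables, e.g.
  # aws_access_key_id -> AWS_ACCESS_KEY_ID.
  lines <- lines[grepl("^aws", lines)]
  if (length(lines) == 0) {
    stop("No aws_* settings found for profile ", profile_name)
  }
  vars <- as.list(trimws(sub("^[^=]*=", "", lines)))
  names(vars) <- toupper(trimws(sub("=.*$", "", lines)))
  do.call(Sys.setenv, vars)
}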

# This function clears an S3 bucket
clear_bucket <- function(bucket) {
  keys <- aws.s3::get_bucket_df(bucket)$Key
  aws.s3::delete_object(keys, bucket)
}

# Create an S3 cache object with a cachem-compatible interface. This uses the
# old-style memoise::cache_s3(), and wraps it.
s3_cache <- function(target, read_only = FALSE) {
  structure(
    memoise:::wrap_old_cache(memoise::cache_s3(target)),
    class = "s3_cache"
  )
}
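The end result of load_aws_vars() is a Sys.setenv() call with names such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. {aws.s3} reads these environment variables, along with AWS_DEFAULT_REGION. If you would rather not parse the config file, you can probably set them directly in place of the load_aws_vars(profile_name = ...) call used below. This is a sketch. The values are placeholders, and the region is an assumption you should change to match your bucket.

```r
# Placeholder credentials (assumptions): replace with your own values.
# These are the environment variables {aws.s3} looks for.
Sys.setenv(
  AWS_ACCESS_KEY_ID     = "your-access-key-id",
  AWS_SECRET_ACCESS_KEY = "your-secret-access-key",
  AWS_DEFAULT_REGION    = "us-east-1"  # assumed region; use your bucket's region
)
```

Avoid hard-coding real secrets in scripts you share. Keeping them in a file outside the project, as load_aws_vars() does, is one way to do that.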

Creating and using the S3 cache

Next, we’ll create a memory cache, a disk cache, and an S3 cache, and compose them into a single layered cache object. (The memory and disk caches are very small, just for the purposes of this demonstration.) A cache_layered() object searches the first cache. If there’s a miss, it searches the next one, and so on. If there’s a hit in (for example) the third-level cache, the value is also copied to the earlier caches (the first- and second-level caches).

# NOTE: You will need to customize these values
profile_name <- "winston"
bucket_name  <- "cache-demo-1.stdout.org"

load_aws_vars(profile_name = profile_name)
# Run this to clear the bucket
# clear_bucket(bucket_name)

m <- cache_mem(max_size = 3e5)
d <- cache_disk(max_size = 1e6)
s <- s3_cache(bucket_name)

# Create layered cache which logs messages to the console. Note that messages
# won't show in the generated HTML, but they will show if you're using this code
# interactively.
cl <- cache_layered(m, d, s, logfile = stderr())

NOTE: As of this writing, cachem::cache_layered() is still in an experimental stage.
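The following toy model shows the search-and-promote behavior described earlier. It uses plain R environments as stand-in caches. This is an illustration only and not cachem’s implementation. The names new_toy_cache() and layered_get() are hypothetical. Also note that this sketch returns NULL on a miss, while cachem signals a miss with a key_missing() sentinel.

```r
# Toy layered cache: each layer is an environment (a stand-in, not cachem).
new_toy_cache <- function() new.env()

# Search layers in order; on a hit in layer i, copy the value into
# every earlier layer (1..i-1) so later lookups hit sooner.
layered_get <- function(key, layers) {
  for (i in seq_along(layers)) {
    if (exists(key, envir = layers[[i]], inherits = FALSE)) {
      value <- get(key, envir = layers[[i]], inherits = FALSE)
      for (j in seq_len(i - 1)) assign(key, value, envir = layers[[j]])
      return(value)
    }
  }
  NULL  # missed in every layer
}

mem  <- new_toy_cache()
disk <- new_toy_cache()
s3   <- new_toy_cache()
assign("answer", 42, envir = s3)   # value exists only in the last layer

layered_get("answer", list(mem, disk, s3))
exists("answer", envir = mem, inherits = FALSE)  # now promoted to the first layer
```

After the first lookup, "answer" is present in all three layers. A second lookup is then satisfied by the first layer.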

To demonstrate the layered cache, we’ll use a Fibonacci function. Without caching, calculating fib(32) takes about 3 seconds:

# Fibonacci function
fib <- function(n) {
  if (n <= 1) return(n)
  fib(n-1) + fib(n-2)
}

system.time(
  fib(32)
)
#>    user  system elapsed 
#>   2.670   0.016   2.691
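The un-memoized version is slow because the same subproblems are recomputed over and over. One way to see this is to count the calls. The number of calls satisfies calls(n) = calls(n-1) + calls(n-2) + 1, which works out to 2 * fib(n+1) - 1. For fib(32), that is 7,049,155 calls. The helper count_fib_calls() below is hypothetical and exists only for this illustration. It is run with a smaller n to keep it quick.

```r
# Count how many times the plain recursive fib() is invoked for a given n.
count_fib_calls <- function(n) {
  calls <- 0
  fib_counted <- function(n) {
    calls <<- calls + 1          # increment the counter in the enclosing scope
    if (n <= 1) return(n)
    fib_counted(n - 1) + fib_counted(n - 2)
  }
  value <- fib_counted(n)
  list(value = value, calls = calls)
}

res <- count_fib_calls(20)
res$value  # 6765
res$calls  # 21891, i.e. 2 * fib(21) - 1
```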

When memoized with the layered cache, the first call is slower (about 5.5 seconds here), because each S3 interaction takes about 0.1 seconds from my home computer. It’s probably faster when running in AWS. Because this example makes many round trips to S3, it takes longer than the un-memoized version. This isn’t wasted time, though: it populates the S3 cache, so future runs in other R processes or on other machines will be fast.

fib <- memoise(fib, cache = cl)
system.time(
  fib(32)
)
#>    user  system elapsed 
#>   0.602   0.043   5.519
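Memoization cuts the work down to one computation per distinct argument. For fib(32), that means 33 computations (n = 0 through 32), and each one is a cache miss followed by a cache write. That count matches the 33 keys in the S3 key listing under "Inspecting caches". Writes like these are why the first memoized run pays for many S3 round trips. The sketch below shows the idea with an environment standing in for the cache. The names memo_env and fib_memo() are hypothetical. memoise itself stores values under hashed keys, not the raw argument.

```r
# Minimal memoized Fibonacci using an environment as an in-memory cache.
memo_env <- new.env()
computations <- 0

fib_memo <- function(n) {
  key <- as.character(n)
  if (!is.null(memo_env[[key]])) return(memo_env[[key]])  # cache hit
  computations <<- computations + 1                       # cache miss
  value <- if (n <= 1) n else fib_memo(n - 1) + fib_memo(n - 2)
  memo_env[[key]] <- value                                # cache write
  value
}

fib_memo(32)          # 2178309
computations          # 33: one computation per distinct n from 0 to 32
length(ls(memo_env))  # 33 cached entries
```

A second fib_memo(32) call is a single cache hit and leaves computations at 33.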

If we call the memoized version again, it will be very fast, because it will have a cache hit in the memory cache.

system.time(
  fib(32)
)
#>    user  system elapsed 
#>   0.000   0.000   0.001

NOTE: The layered cache with S3 will be slower in the event of cache misses in the memory and disk layers, because when those layers have a miss, it makes a query to S3, which takes some time. So it is not necessarily a good choice for every use case.
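A back-of-the-envelope model makes this trade-off concrete. The model is a simplifying assumption, not a measurement. Each layer is queried only if every earlier layer missed, and each query costs that layer’s latency. Writes and the cost of recomputing on a full miss are ignored. The function name expected_lookup_time() and the hit rates below are hypothetical. The 0.1-second S3 latency is the figure quoted above.

```r
# Expected lookup time for a layered cache under a simple model:
#   hit_rates[i]: chance layer i has the key, given all earlier layers missed
#   latencies[i]: time (seconds) one query to layer i takes
expected_lookup_time <- function(hit_rates, latencies) {
  stopifnot(length(hit_rates) == length(latencies))
  # Probability of reaching each layer: 1 for the first layer, then the
  # product of the miss rates of all earlier layers.
  reach <- cumprod(c(1, 1 - hit_rates[-length(hit_rates)]))
  sum(reach * latencies)
}

# Hypothetical: memory and disk are near-free compared with S3 (~0.1 s).
expected_lookup_time(hit_rates = c(0.5, 0.5, 1), latencies = c(0, 0, 0.1))
# 0.025: a quarter of lookups fall through to S3
```

Under this model, the S3 layer only pays off if the lookups that fall through to it are cheaper than recomputing the value.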

Now, imagine we’re in a separate R process or even on a different computer. To simulate this, we’ll create a new layered cache, so that the memory and disk caches are empty. However, the S3 bucket still has the recently added contents.

m <- cache_mem(max_size = 3e5)
d <- cache_disk(max_size = 1e6)
s <- s3_cache(bucket_name)

cl <- cache_layered(m, d, s, logfile = stderr())

# Make the memoized fib function
fib <- function(n) {
  if (n <= 1) return(n)
  fib(n-1) + fib(n-2)
}
fib <- memoise(fib, cache = cl)

When we call the memoized function, it has a miss in the memory and disk caches, but has a hit in S3. This is reasonably fast, at about 0.1 seconds on my home computer. (Again, this will probably be faster if run in AWS.)

system.time(
  fib(32)
)
#>    user  system elapsed 
#>   0.014   0.001   0.107

If we call it again, it will have a cache hit in the memory cache, so it will be instantaneous:

system.time(
  fib(32)
)
#>    user  system elapsed 
#>       0       0       0

Inspecting caches

Each cache object can be inspected and manipulated:

# Objects in memory cache
m$keys()
#> [1] "5bdee8b8ac01912d3d3b7ef9f586579a"

# Objects in S3 cache
s$keys()
#>  [1] "00c5db5426583c6d263f0875ceb46e5a" "05fcf52fe581c96186f5362547bd865a"
#>  [3] "07d68be904946e80c67e253afc5881cb" "07e2f922db1b4d4a4f6a7835cfec139d"
#>  [5] "2427fb2864ff129f28725ce057a36659" "27270d52c1bb4a7bffa7a4f32b61a5f6"
#>  [7] "27ecf764767d4efd9e0db36108424e06" "2d857c0c0f0e84365bec3c9ddfc3c763"
#>  [9] "45e4e10bb0f8859314f7c2205dacac5b" "48abc23b199425a96b795473a8775472"
#> [11] "574a562eb6e432f514584722c5456100" "5b1cc320f586cfba7bfd44ccd9d38cbe"
#> [13] "5bdee8b8ac01912d3d3b7ef9f586579a" "6529bb1845ad44b1266b31be78c20f1a"
#> [15] "685d88584422a74b2177ed831be28787" "6c954dc19354bae1872bd60df1429212"
#> [17] "71f6b58b31f687b7b6523739fce6582e" "7bbe11e70a9aa73e85e61096f868cabb"
#> [19] "7e5c17137f954e64c2e9f9d537fc0226" "809db362b4588abd8a92f961c5b6ade1"
#> [21] "85f7cb2175c6f7a69c5453f5e2589e25" "9420fff086fadcbbca0e7c900029c142"
#> [23] "b89abb18015fb038da752782c8cb02fe" "c5328f92593ac6fc0390bda945cae6e5"
#> [25] "c8c6673dd4528ff0612680572f6509c5" "c9894982ed52a72de07e4490d1906889"
#> [27] "d26636eb9309df4785a6ad2e9eb432af" "dcdd58937b6baffffa37c7cd3fd6a0be"
#> [29] "e200221d37f71957fe2b3da47c57c887" "e8db4a44951e9f9437c5c2c2f1c51eeb"
#> [31] "f6e514306adbebb00decb8a67d8b0936" "f6fc5093307c784dadbefe64ca5b7409"
#> [33] "fc2155a8dae390f203856e2f5bcd9c4b"

# Objects in the layered cache (will include keys for all layers)
cl$keys()
#>  [1] "5bdee8b8ac01912d3d3b7ef9f586579a" "00c5db5426583c6d263f0875ceb46e5a"
#>  [3] "05fcf52fe581c96186f5362547bd865a" "07d68be904946e80c67e253afc5881cb"
#>  [5] "07e2f922db1b4d4a4f6a7835cfec139d" "2427fb2864ff129f28725ce057a36659"
#>  [7] "27270d52c1bb4a7bffa7a4f32b61a5f6" "27ecf764767d4efd9e0db36108424e06"
#>  [9] "2d857c0c0f0e84365bec3c9ddfc3c763" "45e4e10bb0f8859314f7c2205dacac5b"
#> [11] "48abc23b199425a96b795473a8775472" "574a562eb6e432f514584722c5456100"
#> [13] "5b1cc320f586cfba7bfd44ccd9d38cbe" "6529bb1845ad44b1266b31be78c20f1a"
#> [15] "685d88584422a74b2177ed831be28787" "6c954dc19354bae1872bd60df1429212"
#> [17] "71f6b58b31f687b7b6523739fce6582e" "7bbe11e70a9aa73e85e61096f868cabb"
#> [19] "7e5c17137f954e64c2e9f9d537fc0226" "809db362b4588abd8a92f961c5b6ade1"
#> [21] "85f7cb2175c6f7a69c5453f5e2589e25" "9420fff086fadcbbca0e7c900029c142"
#> [23] "b89abb18015fb038da752782c8cb02fe" "c5328f92593ac6fc0390bda945cae6e5"
#> [25] "c8c6673dd4528ff0612680572f6509c5" "c9894982ed52a72de07e4490d1906889"
#> [27] "d26636eb9309df4785a6ad2e9eb432af" "dcdd58937b6baffffa37c7cd3fd6a0be"
#> [29] "e200221d37f71957fe2b3da47c57c887" "e8db4a44951e9f9437c5c2c2f1c51eeb"
#> [31] "f6e514306adbebb00decb8a67d8b0936" "f6fc5093307c784dadbefe64ca5b7409"
#> [33] "fc2155a8dae390f203856e2f5bcd9c4b"
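In the output above, the layered listing starts with the single memory key, followed by the remaining S3 keys, with no duplicates. This is consistent with a de-duplicated union of the layers’ keys in layer order. To see which layer holds each key, a small helper like the hypothetical key_locations() below can tabulate them. With the live caches you would call key_locations(list(memory = m$keys(), disk = d$keys(), s3 = s$keys())). The example uses shortened, partly made-up key vectors, because d$keys() is not shown above.

```r
# Hypothetical helper: a logical matrix with one row per key (union of all
# layers, in layer order) and one column per layer.
key_locations <- function(layer_keys) {
  all_keys <- unique(unlist(layer_keys, use.names = FALSE))
  out <- matrix(FALSE, nrow = length(all_keys), ncol = length(layer_keys),
                dimnames = list(all_keys, names(layer_keys)))
  for (layer in names(layer_keys)) {
    out[, layer] <- all_keys %in% layer_keys[[layer]]
  }
  out
}

# Shortened, illustrative keys (not the full hashes shown above).
locs <- key_locations(list(
  memory = c("5bdee8b8"),
  disk   = c("5bdee8b8"),
  s3     = c("00c5db54", "5bdee8b8", "fc2155a8")
))
locs
```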

You can reset an individual cache by calling $reset():

# Reset memory cache
m$reset()

# Reset S3 cache
s$reset()

Using layered S3 cache with Shiny

Here’s a demo of a Shiny application with a plot that uses the layered cache. Note that the cache object is passed directly to the bindCache() call. To use the cache for the entire app instead, call shinyOptions(cache = cl) at the top of the app. (See the Shiny documentation for more information about cache scoping in Shiny applications.)

To run this in your own R session, you’ll also need to run the code above that creates the layered cache.

library(shiny)
shinyApp(
  fluidPage(
    sidebarLayout(
      sidebarPanel(
        sliderInput("n", "Number of points", 4, 32, value = 8, step = 4)
      ),
      mainPanel(plotOutput("plot"))
    )
  ),
  function(input, output, session) {
    # Print out pixelratio because that can affect plot caching
    message("pixelratio: ", isolate(session$clientData$pixelratio))

    output$plot <- renderPlot({
        Sys.sleep(2)  # Add an artificial delay
        seqn <- seq_len(input$n)
        plot(mtcars$wt[seqn], mtcars$mpg[seqn],
             xlim = range(mtcars$wt), ylim = range(mtcars$mpg))
      }) %>% 
      bindCache(input$n, cache = cl)
  }
)
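For the app-wide alternative mentioned above, the cache is set once with shinyOptions(cache = ...), and bindCache() is called without a cache argument, so it falls back to its default of cache = "app". This is a sketch. To keep it self-contained, a plain cachem memory cache named app_cache stands in for the layered S3 cache. Both app_cache and app are hypothetical names, and in practice you would pass cl instead.

```r
library(shiny)
library(cachem)

# Stand-in cache (assumption): replace with the layered cache `cl` from above.
app_cache <- cache_mem(max_size = 3e5)

# Set the app-level cache; bindCache() uses it when no `cache` is given.
shinyOptions(cache = app_cache)

app <- shinyApp(
  fluidPage(
    sliderInput("n", "Number of points", 4, 32, value = 8, step = 4),
    plotOutput("plot")
  ),
  function(input, output, session) {
    output$plot <- renderPlot({
      seqn <- seq_len(input$n)
      plot(mtcars$wt[seqn], mtcars$mpg[seqn],
           xlim = range(mtcars$wt), ylim = range(mtcars$mpg))
    }) %>%
      bindCache(input$n)  # no `cache` argument: uses the app-level cache
  }
)

# runApp(app)  # uncomment to launch the app
```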

Winston Chang

2021-02-05
