rstudio::conf(2022)

Highlights and Notes from the Conference

Author

Tyson S. Barrett, PhD

Published

August 4, 2022

Conference Takeaways

This document highlights insights from rstudio::conf(2022). I likely missed some, but luckily just about everything is or will be online. For example, check out the workshops or this repo linking to talk materials.

A few things that are immediately important to mention:

  1. RStudio is changing its name to Posit! This is a move to show their place in data science (and science in general). They do not see themselves as just an R company, but rather as a much broader analytics and communication company open to using the best tools available (which can often be R but can also be python, julia, etc.). This name change will impact a few things, but we’ve been told RStudio (the IDE) will stay RStudio. As such, most of the contact points between us and the tools will be unchanged. They said many times this isn’t a move away from R, but rather a move that makes it clearer they are inclusive and plan to incorporate more languages into their toolbox.
  2. Quarto is taking over the world (at least that’s how it felt at the conference). Quarto is the new rmarkdown with expanded features and modern styling. It has so much going for it that it’s pretty obvious that it is the future of analytic/scientific communication. Note that this document was written in Quarto.
  3. Shiny is expanding. First, there is now a Shiny for python, with similar features but with no plans to keep them “equal.” Second, new tools (like a point-n-click UI designer that writes code for you) are being distributed. In conjunction with all the extensions to shiny, it makes shiny a tool for production-grade applications.
  4. New tools for machine learning, working with databases, and designing for impact. These are broadly the themes of the rest of the talks (those not related to shiny or Quarto).
    • dbcooper: slick interface for working with databases that feels more natural for individuals used to working with data frames
    • pool: powerful tool to reduce congestion when accessing databases, particularly useful for shiny applications
    • renv: just use it because it is amazing and will save you heartache

Posit

For more information on the rebranding, visit their website.

Quarto

Quarto is quite impressive. It can be used to produce individual documents, websites, blogs, presentations, and more. It feels more modern and is clearly the tool of the future for tying data, code, and output together in beautiful documents. A few things are noteworthy.

  1. It is multi-lingual. It is designed to work with R obviously but already handles python, julia, observable, and will certainly add many more.
  2. Its ability to produce beautiful presentations is helpful when communicating analytical concepts (e.g., code, output, equations).
  3. If you use rmarkdown you can change the .rmd extension to .qmd and it will work out of the box. So no need to make drastic changes if you want to adopt Quarto in your workflow.
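The extension change in the last point can be scripted. Below is a minimal sketch, assuming a hypothetical file called report.Rmd (created here as a stand-in) and, for the optional render step, that the quarto R package and the Quarto command line tool are installed.

```r
# Create a stand-in R Markdown file (hypothetical example content)
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(c("---", "title: \"Report\"", "---", "", "Some text."), rmd_path)

# Copy it under the .qmd extension; the original file is left untouched
qmd_path <- sub("\\.Rmd$", ".qmd", rmd_path)
file.copy(rmd_path, qmd_path, overwrite = TRUE)

# Optional: render the copy interactively if the quarto package is installed
if (interactive() && requireNamespace("quarto", quietly = TRUE)) {
  quarto::quarto_render(qmd_path)
}
```

Copying rather than renaming keeps the original around while you check that the rendered output still looks right.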

There are already great docs online if you want to learn more!

Note that this document is made with Quarto.

Shiny

Given I spent the first two days in a workshop devoted to shiny, I’ve spent a considerable amount of time diving deeper into its new features, extensions, and future. My work thus far is strictly with R but I’m sure use cases for python are many.

A few useful tools that I picked up in the workshop and in the talks are below.

Modules

  • Avoid namespace collisions when using the same widget across different areas of your app
  • Allow you to encapsulate distinct app interfaces
  • Organize code into logical and easy-to-understand components
  • Facilitate collaboration
  • Work kind of like regular functions in R
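To see how modules avoid id collisions, it can help to look at what NS() actually returns. This small sketch uses made-up module ids ("mod1" and "mod2") and shows that the same widget id gets a different prefix in each module.

```r
library(shiny)

# NS() returns a function that prefixes ids with the module's id
ns_mod1 <- NS("mod1")
ns_mod2 <- NS("mod2")

id_one <- ns_mod1("input1")   # "mod1-input1"
id_two <- ns_mod2("input1")   # "mod2-input1"

# Same widget id, two distinct ids in the final app
id_one != id_two
```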

Anatomy of a module:

artUI <- function(id) {
  ns <- NS(id)
  tagList(
    checkboxInput(
      ns("input1"),                   # wrap ids in ns()
      "Check Here"
    ),
    selectInput(
      ns("input2"),                   # wrap ids in ns()
      "Select Object",
      choices = c("jar", "vase"),
      selected = "jar",
      multiple = FALSE
    ),
    plotOutput(ns("plot1"))
  )
}

artServer <- function(id) {
  moduleServer(                             # wrap the regular server stuff in moduleServer()
    id,
    function(input, output, session) {      # regular server part
      df <- reactive({
        # do something fancy
      })
      
      output$plot1 <- renderPlot({
        ggplot(df(), aes(x = x, y = y)) +
          geom_point()
      })
    }
  )
}

moduleServer() encapsulates the server-side logic and applies the module’s namespace to it.

Invoking modules:

ui <- fluidPage(
  fluidRow(
    artUI("mod1")
  )
)

server <- function(input, output, session) {
  artServer("mod1")
}

shinyApp(ui, server)

Both the UI and server module functions can take additional arguments.

UI function:

  • Reasonable inputs: static values, vectors, flags
  • Avoid reactive parameters in UI
  • Return value for UI is a tagList() of inputs, output placeholders, and other UI elements

Server function:

  • Input parameters and return values can be a mix of static and reactive objects
artServer <- function(id, df, title = "Amazing") {
  moduleServer(id,
    function(input, output, session) {
      user_selections <- reactive({
        list(input1 = input$input1,
             input2 = input$input2)
      })
      
      output$plot1 <- renderPlot({
        ggplot(df(), aes(x = x, y = y)) +
          geom_point() +
          ggtitle(title)
      })
      
      user_selections
    }
  )
}

# app server
df <- reactive({
  art_data |>
    filter(dept == input$dept)
})

artServer("mod1", df)

In the code above, df is a reactive, but we pass it to the module without (), so the module receives the reactive itself rather than its current value. Inside the module, we invoke it as df() to get the value. Likewise, the module returns user_selections (the reactive) rather than user_selections() (its value), so the calling code can read it reactively.
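The same pattern can be checked without launching an app by using shiny's testServer(). The sketch below uses a hypothetical module, countServer, that takes a reactive data frame and returns a reactive row count. It is only meant to illustrate where the parentheses go.

```r
library(shiny)

# Hypothetical module: takes a reactive data frame, returns a reactive row count
countServer <- function(id, df) {
  moduleServer(id, function(input, output, session) {
    n_rows <- reactive({
      nrow(df())          # invoke the reactive with () to read its value
    })
    n_rows                # return the reactive itself, without ()
  })
}

# testServer() runs the module in a mock session (no browser needed)
testServer(
  countServer,
  args = list(df = reactive(head(mtcars, 3))),   # pass the reactive, not its value
  {
    # session$returned is the reactive returned by the module
    stopifnot(session$returned() == 3)
  }
)
```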

Put module scripts in the R folder.

Consider the example below:

art_search_UI <- function(id, dept_choices) {
  ns <- NS(id)
  tagList(
    fluidRow(
      column(
        width = 4,
        textInput(
          ns("search_box"),
          "Search Query",
          placeholder = "enter single word"
        )
      ),
      column(
        width = 6,
        selectInput(
          ns("dept"),
          "Select Department",
          choices = dept_choices,
          selectize = FALSE
        )
      )
    ),
    fluidRow(
      column(
        width = 4,
        actionButton(
          ns("search_btn"),
          label = "Search",
          icon = icon("keyboard")
        )
      )
    )
  )
}

art_search_Server <- function(id) {
  moduleServer(
    id,
    function(input, output, session) {
      
      search_results <- reactive({
        if (!shiny::isTruthy(input$search_box)) {
          shinyWidgets::show_toast(
            "Enter a search term",
            type = "error",
            position = "top"
          )
          return(NULL)
        }
        
        search_test <- search_dept_data(q = input$search_box, departmentId = input$dept)
        if (is.null(search_test)) {
          message("I got nothing")
          shinyWidgets::show_toast(
            "I got nothing",
            type = "error",
            position = "center"
          )
          return(NULL)
        }
        
        search_test
        
      }) %>% bindEvent(input$search_btn, ignoreInit = TRUE)

      search_results
    }
  )
}

bindEvent() delays evaluation of the reactive until the event it is bound to fires, so the search runs only when input$search_btn is clicked.
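A stripped-down sketch of the same idea, using made-up input names (txt and go) and shiny's testServer() so it runs without a browser. Here setting go stands in for clicking a button. ignoreInit is left at its default in this sketch.

```r
library(shiny)

server <- function(input, output, session) {
  # Equivalent to reactive({...}) %>% bindEvent(input$go)
  shout <- bindEvent(
    reactive({
      toupper(input$txt)
    }),
    input$go
  )
}

testServer(server, {
  session$setInputs(txt = "vase", go = 1)   # type something, then "click"
  first <- shout()

  session$setInputs(txt = "jar")            # typing alone does not recompute
  stopifnot(shout() == first)

  session$setInputs(go = 2)                 # "clicking" again does
  stopifnot(shout() == "JAR")
})
```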

shinyWidgets notifications

  • show_toast() provides a nice pop-up error message (see the code above)
  • Following it with return(NULL) exits the reactive early

This is a form of defensive programming: you plan for problems and communicate them clearly to the user instead of surfacing a weird, cryptic error.

bslib

This package allows you to edit elements of the default bootstrap theme used by shiny directly in R.

  • Can explore theme options interactively
  • Built upon the Sass stylesheet language to extend traditional CSS with modern features

Run the following to play around with the theme:

library(shiny)
library(bslib)

bslib::bs_theme_preview()

When you make changes to the preview, the code needed to use that style will show up in the console.

To see it in the app itself, you can insert the following into the server.

bs_themer()

This allows you to play around with theme elements within your own app.
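Putting the pieces together, a minimal sketch of a themed app might look like the following. The theme values (the "flatly" Bootswatch theme and the primary color) and the inputs are placeholders, not recommendations from the workshop.

```r
library(shiny)
library(bslib)

# Hypothetical theme choices; values reported by the previewer could go here
my_theme <- bs_theme(version = 5, bootswatch = "flatly", primary = "#2C3E50")

ui <- fluidPage(
  theme = my_theme,
  titlePanel("Themed app"),
  sliderInput("n", "Number of points", min = 10, max = 100, value = 50),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  bs_themer()   # interactive theming panel; remove before deploying
  output$scatter <- renderPlot(plot(rnorm(input$n), rnorm(input$n)))
}

# Only launch the app when running interactively
if (interactive()) shinyApp(ui, server)
```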

shinytest2

Built on testthat, shinytest2 allows you to automate the testing of your app. This can be a very important addition to your workflow as you will be able to catch bugs far quicker.

# File: simple-app/app.R
library(shiny)
ui <- fluidPage(
  textInput("name", "What is your name?"),
  actionButton("greet", "Greet"),
  textOutput("greeting")
)
server <- function(input, output, session) {
  output$greeting <- renderText({
    req(input$greet)
    paste0("Hello ", isolate(input$name), "!")
  })
}
shinyApp(ui, server)

With this simple app created, we can create a test that will call the app, insert inputs, click on “greet”, and produce values.

# File: simple-app/tests/testthat/test-shinytest2.R
library(shinytest2)

test_that("shinytest2 recording: simple-app", {
  app <- AppDriver$new(name = "simple-app", height = 407, width = 348)
  app$set_inputs(name = "Tyson")
  app$click("greet")
  app$expect_values()
})

This package has a lot of depth so check out the docs for more use cases and more in-depth tests.

cicerone

This package lets you add a guided walkthrough that users see when they first encounter your app. This code is an example of using cicerone (put use_cicerone() in the UI as well).

guide <- cicerone::Cicerone$
  new(allow_close = TRUE)$
  step(
    "dept",
    "Department",
    "Choose from any department"
  )$
  step(
    "choice_table",
    "Your Choices",
    "Each choice will appear in a table here"
  )

The dept and choice_table are names of objects in the UI.
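For context, here is a sketch of how the guide could be wired into a small app. The UI elements are invented stand-ins with matching ids, and starting the guide with guide$init()$start() in the server follows my reading of the cicerone README, so treat it as an assumption to check against the docs.

```r
library(shiny)
library(cicerone)

guide <- Cicerone$
  new(allow_close = TRUE)$
  step("dept", "Department", "Choose from any department")$
  step("choice_table", "Your Choices", "Each choice will appear in a table here")

ui <- fluidPage(
  use_cicerone(),   # loads the cicerone dependencies
  selectInput("dept", "Select Department", choices = c("Paintings", "Sculpture")),
  tableOutput("choice_table")
)

server <- function(input, output, session) {
  guide$init()$start()   # initialize and launch the walkthrough
  output$choice_table <- renderTable(data.frame(choice = input$dept))
}

# Only launch the app when running interactively
if (interactive()) shinyApp(ui, server)
```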

Debugging with browser()

If you want to inspect the environment at a certain point in the app, put browser() in any reactive. When that reactive runs, the debugger opens and you can look at the current objects (including inputs).

You can also use conditionals to invoke browser() only when certain things are triggered. For example:

if (!is.null(input$timevis_selected)) browser()

httr2

httr2 is a rewrite of httr built around a pipeable API:

  • Build a request object to facilitate different pieces of a request workflow
  • Ability to perform dry-runs before actually sending the request
  • Converts HTTP errors to R errors

An example of pulling from the MET API:

library(dplyr)
library(tidyr)
library(purrr)
library(httr2)

# refer to https://metmuseum.github.io/ for documentation of API endpoints
base_url <- "https://collectionapi.metmuseum.org/public/collection/v1"

# How many artwork pieces have been updated in the museum database since July 1st, 2022?
req <- request(base_url) %>%
  req_url_path_append("objects") %>%
  # add query parameter metadataDate
  req_url_query(metadataDate = "2022-07-01")

req_dry_run(req)    # dry run

resp <- req_perform(req)      # actually performs the request
resp_status(resp)             # 200 is OK

# parse the JSON body into an R list
objects_updated <- resp %>%
  resp_body_json()

# Example of taking JSON to tibble
req <- request(base_url) %>%
  req_url_path_append("departments")

resp <- req_perform(req)
resp_status(resp)

departments <- resp %>%
  resp_body_json() %>%
  purrr::pluck("departments") %>%
  transpose() %>%
  tibble::as_tibble() %>%
  tidyr::unnest(cols = c("departmentId", "displayName"))
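Because httr2 turns HTTP errors into R errors, a failed request can be handled with tryCatch(). The sketch below defines two hypothetical helpers (object_request and fetch_object) around the same MET endpoint. The httr2_http condition class is what I understand httr2 to signal for HTTP error statuses, so treat the handler name as an assumption to verify.

```r
library(httr2)

base_url <- "https://collectionapi.metmuseum.org/public/collection/v1"

# Hypothetical helper: build (but do not send) the request for one object id
object_request <- function(id) {
  request(base_url) |>
    req_url_path_append("objects", as.character(id))
}

# Hypothetical helper: return the parsed object, or NULL on an HTTP error
fetch_object <- function(id) {
  tryCatch(
    resp_body_json(req_perform(object_request(id))),
    httr2_http = function(cnd) {
      message("Request failed: ", conditionMessage(cnd))
      NULL
    }
  )
}

# Inspect the URL without touching the network
req <- object_request(1)
req$url
```

Calling fetch_object() with an id that does not exist should then give NULL and a message rather than stopping the app.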

Debounce

Debouncing controls how quickly something reacts. For example, we don’t want to run a new search every time another letter is typed into a search bar.

query_term <- reactive({
  input$object_search
}) %>% 
  debounce(1000)   # makes it wait 1 second

search_res <- reactive({
  req(query_term())
  # other stuff you want it to do
})

CSS

To use an external CSS file, save it in the app’s www folder and link to it in the UI:

tags$head(
    tags$link(
      rel = "stylesheet", 
      type = "text/css", 
      href = "custom.css"
    )
  )

Tidymodels

From the very start, tidymodels made appearances via the first keynote (Julia Silge and Max Kuhn who have a new book out, available for free here), a whole section of talks devoted to extensions to it, and a book signing with Max and Julia. I highlight some of the key takeaways from my perspective (although there are so many things that could be included). Take a look at the docs if you want to see more.

Censored models

One extension that is useful for my work handles censored (survival) models. The package is called censored and has some documentation and examples.

The example they provide that is most relevant to my work uses proportional hazards models.

library(tidymodels)
library(censored)
library(survival)
tidymodels_prefer()

data(cancer)

# some adjustment of the data to fit the example
lung <- lung %>% drop_na()
lung_train <- lung[-c(1:5), ]
lung_test <- lung[1:5, ]

Then to actually run the model we will use a few steps:

set.seed(1)

proportional_hazards() %>%
  set_engine("survival") %>% 
  set_mode("censored regression") %>% 
  fit(Surv(time, status) ~ ., data = lung_train)
parsnip model object

Call:
survival::coxph(formula = Surv(time, status) ~ ., data = data, 
    model = TRUE, x = TRUE)

                coef  exp(coef)   se(coef)      z       p
inst      -0.0291726  0.9712488  0.0131293 -2.222 0.02629
age        0.0146341  1.0147417  0.0119705  1.223 0.22151
sex       -0.5977137  0.5500678  0.2051326 -2.914 0.00357
ph.ecog    0.7507039  2.1184906  0.2536100  2.960 0.00308
ph.karno   0.0137315  1.0138262  0.0132752  1.034 0.30096
pat.karno -0.0082098  0.9918238  0.0082560 -0.994 0.32002
meal.cal  -0.0001233  0.9998767  0.0002841 -0.434 0.66435
wt.loss   -0.0188464  0.9813301  0.0082051 -2.297 0.02162

Likelihood ratio test=32.61  on 8 df, p=7.224e-05
n= 162, number of events= 116 
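The held-out rows in lung_test can then be used for prediction. This is a sketch under a few assumptions: it refits the same model so the block is self-contained, it uses na.omit() in place of drop_na(), and it assumes the survival engine supports type = "time" predictions in your installed version of censored.

```r
library(parsnip)
library(censored)   # also attaches survival, which provides Surv() and lung

# Same split as above: first five complete rows held out for testing
lung_complete <- na.omit(survival::lung)
lung_train <- lung_complete[-c(1:5), ]
lung_test  <- lung_complete[1:5, ]

ph_fit <- proportional_hazards() |>
  set_engine("survival") |>
  set_mode("censored regression") |>
  fit(Surv(time, status) ~ ., data = lung_train)

# Predicted survival time for each held-out patient
preds <- predict(ph_fit, new_data = lung_test, type = "time")
preds
```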

Clustering

I wanted to just highlight that this was available if you are doing cluster analysis. It’s new but looks cool. It has documentation here and is designed to work within the tidymodels framework.

Working with Databases

Two packages seem very useful for ED&A:

  1. dbcooper: an innovative way to interact with databases that feels more natural for R users.
  2. pool: a package to automatically pool connections to databases (and automatically disconnect). Particularly useful for integration with shiny.

dbcooper

dbcooper turns a database connection into a collection of functions, handling logic for keeping track of connections and letting you take advantage of autocompletion when exploring a database. This example is from the GitHub page.

library(dbcooper)
dbc_init(con, "con_name")

dbc_init then creates user-friendly accessor functions in your global environment. (You could also pass it an environment in which the functions will be created).

dbc_init adds several functions when it initializes a database source, each starting with the prefix passed as its second argument. In the GitHub example that prefix is lahman, so each function starts with lahman_.

  • _list: Get a list of tables
  • _tbl: Access a table that can be worked with in dbplyr
  • _query: Perform a SQL query and work with the result
  • _execute: Execute a query (such as a CREATE or DROP)
  • _src: Retrieve a dbi_src for the database

For instance, we could start by finding the names of the tables in the Lahman database.

lahman_list()
#>  [1] "AllstarFull"         "Appearances"         "AwardsManagers"     
#>  [4] "AwardsPlayers"       "AwardsShareManagers" "AwardsSharePlayers" 
#>  [7] "Batting"             "BattingPost"         "CollegePlaying"     
#> [10] "Fielding"            "FieldingOF"          "FieldingOFsplit"    
#> [13] "FieldingPost"        "HallOfFame"          "HomeGames"          
#> [16] "LahmanData"          "Managers"            "ManagersHalf"       
#> [19] "Master"              "Parks"               "People"             
#> [22] "Pitching"            "PitchingPost"        "Salaries"           
#> [25] "Schools"             "SeriesPost"          "Teams"              
#> [28] "TeamsFranchises"     "TeamsHalf"           "sqlite_stat1"       
#> [31] "sqlite_stat4"

We can access one of these tables with lahman_tbl(), then put it through any kind of dplyr operation.

lahman_tbl("Batting")
#> # Source:   SQL [?? x 22]
#> # Database: sqlite 3.34.1
#> #   [/private/var/folders/wp/6jpw10dj1b13vw5n9bvf1dvc0000gn/T/RtmpuEyzKR/lahman.sqlite]
#>    playerID  yearID stint teamID lgID      G    AB     R     H   X2B
#>    <chr>      <int> <int> <chr>  <chr> <int> <int> <int> <int> <int>
#>  1 abercda01   1871     1 TRO    NA        1     4     0     0     0
#>  2 addybo01    1871     1 RC1    NA       25   118    30    32     6
#>  3 allisar01   1871     1 CL1    NA       29   137    28    40     4
#>  4 allisdo01   1871     1 WS3    NA       27   133    28    44    10
#>  5 ansonca01   1871     1 RC1    NA       25   120    29    39    11
#>  6 armstbo01   1871     1 FW1    NA       12    49     9    11     2
#>  7 barkeal01   1871     1 RC1    NA        1     4     0     1     0
#>  8 barnero01   1871     1 BS1    NA       31   157    66    63    10
#>  9 barrebi01   1871     1 FW1    NA        1     5     1     1     1
#> 10 barrofr01   1871     1 BS1    NA       18    86    13    13     2
#> # … with more rows, and 12 more variables: X3B <int>, HR <int>,
#> #   RBI <int>, SB <int>, CS <int>, BB <int>, SO <int>, IBB <int>,
#> #   HBP <int>, SH <int>, SF <int>, GIDP <int>
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

pool

pool manages a pool of database connections, so a shiny app can scale to many users without repeatedly opening and closing connections.

  • Can help scale the use of databases with shiny
  • dbPool() creates the pool (it replaces dbConnect())
  • Each query goes to the pool first, then fetches or initializes a connection
  • Also handles the disconnects in shiny
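A minimal sketch of the pattern, assuming the RSQLite package is installed and using a throwaway SQLite file with made-up data in place of a real database.

```r
library(pool)
library(DBI)

# Throwaway database file standing in for a real database
db_file <- tempfile(fileext = ".sqlite")

# dbPool() takes the same arguments you would give dbConnect()
pool <- dbPool(RSQLite::SQLite(), dbname = db_file)

# The pool can be used wherever a DBI connection is expected
dbWriteTable(pool, "art", data.frame(dept = c("Paintings", "Sculpture"),
                                     n = c(12L, 7L)))
paintings <- dbGetQuery(pool, "SELECT * FROM art WHERE dept = 'Paintings'")

# Close the pool when done; in a shiny app this usually goes in
# onStop(function() poolClose(pool))
poolClose(pool)
```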

Use renv

Create reproducible environments for your R projects

  • Next generation of packrat
  • Isolated package library from rest of your system
  • Transfer projects to different collaborators/platforms
  • Reproducible package installation
  • Easily create new projects or convert existing projects

Upon initializing a project:

  1. Creates a project level .Rprofile to activate custom package library on start up
  2. Creates the lockfile renv.lock to describe the state of the project library
  3. renv/library is the project’s private library (by default the packages themselves live in a central cache and are linked into it rather than copied)
  4. renv/activate.R performs activation
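A typical day-to-day workflow might look like the sketch below. The has_lockfile() helper is hypothetical and only there to pick between first-time setup and routine use. The renv calls are guarded so nothing runs outside an interactive session.

```r
# Hypothetical helper: does this folder already have an renv lockfile?
has_lockfile <- function(project = ".") {
  file.exists(file.path(project, "renv.lock"))
}

if (interactive() && requireNamespace("renv", quietly = TRUE)) {
  if (!has_lockfile()) {
    renv::init()       # set up renv/ and renv.lock for the current project
  } else {
    renv::status()     # report differences between the library and lockfile
    renv::snapshot()   # record the current package versions in renv.lock
    # renv::restore()  # on another machine: reinstall what the lockfile records
  }
}
```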