Overview

RStudio Desktop

RStudio is at its heart a client-server application. Let’s look at RStudio Desktop first, as it’s the simplest configuration, consisting of a single client and a single server. Even though it appears to the user as a single application, these two processes communicate over a socket using JSON.
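To make the idea of two processes exchanging JSON over a socket concrete, here is a minimal sketch. It is not RStudio's actual wire protocol: the newline-delimited framing, the `console_input` method name, and the message fields are all assumptions made for illustration.

```python
import json
import socket


def send_json(sock: socket.socket, message: dict) -> None:
    """Serialize a message as one line of JSON (assumed framing) and send it."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def recv_json(sock: socket.socket) -> dict:
    """Read bytes until the newline that ends one message, then decode it."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buffer += chunk
    return json.loads(buffer)


# A connected socket pair stands in for the client and session processes.
client, session = socket.socketpair()

# Hypothetical request: the client asks the session to handle console input.
send_json(client, {"method": "console_input", "params": ["1 + 1"]})
request = recv_json(session)

# Hypothetical reply: the session acknowledges the request it received.
send_json(session, {"handled": request["method"]})
reply = recv_json(client)
print(reply)

client.close()
session.close()
```

The point is only that each side sees structured messages rather than raw bytes; which process plays the "server" role matters little at this level.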

rstudio.exe

This process functions much like a web browser. And in fact, it very nearly is a web browser. On Windows (rstudio.exe) and Linux (rstudio), the desktop executable is little more than a Qt window frame with some menus and a QtWebKit web browser control. On macOS, the desktop frame is written in Cocoa for performance reasons, but the architecture is otherwise identical.

rsession.exe

This is the process containing the actual R session. It isn’t quite the same as starting R at the command line as it doesn’t run the same binary you get when you type R at the console. Instead, it loads R as a library – a DLL on Windows, and a shared object file on Linux and macOS – and calls the library when needed to evaluate R code, process console input, and so on.

The R session process acts much like a web server in the desktop configuration, but we’ll avoid referring to it as the “server” for reasons which will soon become clear.

RStudio Server

The open-source version of RStudio Server is much like RStudio Desktop, except that a web browser functions as the client, and there can be multiple sessions (one per user). It also introduces a new process called rserver, which is responsible for routing traffic from web browsers to sessions.

The communication between the web browser and the server happens over the network, and the communication between the server and the individual R sessions happens over a Unix domain socket.

The rserver process is also responsible for serving the login page and starting sessions. When a user visits the RStudio URL, it will check to see if that user already has a session running, and:

If they have a session running, redirect them to that session.
If not, start a new session (a new rsession), and redirect them to the new session.
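The routing decision above can be sketched in a few lines. This is an illustration of the logic only, assuming one session per user as in the open-source server; the `SessionRouter` class and the `launch` callable are hypothetical names, not part of rserver.

```python
from typing import Callable, Dict


class SessionRouter:
    """Tracks one session per user and launches a session on first visit."""

    def __init__(self, launch: Callable[[str], str]) -> None:
        self._launch = launch                 # hypothetical: starts an rsession
        self._sessions: Dict[str, str] = {}   # user name -> session handle

    def route(self, user: str) -> str:
        """Return the user's existing session, starting one if none exists."""
        if user not in self._sessions:
            self._sessions[user] = self._launch(user)
        return self._sessions[user]


launched = []


def fake_launch(user: str) -> str:
    """Stand-in launcher that records each launch and returns a handle."""
    launched.append(user)
    return f"session-for-{user}"


router = SessionRouter(fake_launch)
first = router.route("bob")    # no session yet, so one is started
second = router.route("bob")   # the existing session is reused
print(first, second, launched)
```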

RStudio Server Pro

RStudio Server Pro introduces some more moving parts. In RStudio Server Pro (hereafter “RSP”), there can be multiple sessions for each user. Sessions are also launched differently, in order to support features like PAM sessions.

rserver-launcher

This is the process which actually launches sessions. It receives launch requests from the rserver process, and runs the rsession-run script to start the session.

rserver-http

RSP includes a bundled copy of Nginx, which is used as a reverse proxy to feed HTTP traffic into the main binary. The Nginx configuration is dynamically generated each time RStudio Server starts.

rserver-monitor

The “monitor” process is responsible for RSP’s metrics and monitoring features.

Authentication

PAM

When PAM authentication is used, RStudio uses a helper binary called rserver-pam to validate the username and password. This binary is launched with the username as an argument; the password is sent from RStudio to the rserver-pam executable over standard input, after which rserver-pam passes it on to PAM itself. RStudio uses rserver-pam’s exit code to determine whether the authentication was successful.
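The calling convention described above (username as an argument, password on standard input, result in the exit code) looks roughly like the following sketch. Two things here are assumptions: that an exit code of 0 means success, and that the password is written without a trailing newline. The stand-in helper is a small Python one-liner, not rserver-pam.

```python
import subprocess
import sys
from typing import List


def check_password(helper_argv: List[str], username: str, password: str) -> bool:
    """Run a helper with the username as an argument and the password on stdin.

    Returns True when the helper exits with code 0 (assumed to mean success).
    """
    result = subprocess.run(
        [*helper_argv, username],
        input=password.encode("utf-8"),   # the password never appears in argv
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


# Stand-in helper for demonstration: accepts only the password "secret".
FAKE_HELPER = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if sys.stdin.read() == 'secret' else 1)",
]

print(check_password(FAKE_HELPER, "bob", "secret"))
print(check_password(FAKE_HELPER, "bob", "wrong"))
```

Passing the password over standard input rather than as an argument keeps it out of the process list, which is presumably why the helper is designed this way.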

It’s not possible to run rserver-pam directly, but RStudio comes with a pamtester utility which can be used to invoke it for testing purposes.

Sessions

In the open source version of RStudio, PAM is used only when the user signs in. However, in many environments, PAM modules also need to run to initialize the R session; for instance, to set environment variables, or issue a Kerberos ticket for use with the session. RStudio Server Pro makes this possible by delegating session launches to a dedicated binary named rserver-launcher.

You can disable PAM sessions by using auth-pam-sessions-enabled=0 in rserver.conf. If you do this, you’ll notice that rserver-launcher is not used at all; instead, RStudio will launch sessions directly.

Session Management

One of the primary functions performed by RStudio Server is the creation and management of R sessions. Understanding everything that happens when a session is started and how its lifetime is managed is invaluable in troubleshooting!

Starting an R Session

R itself performs a large number of tasks when it starts up, many of which can run arbitrary code. They are all documented in one of R’s most useful help pages, R: Initialization at the Start of a Session. In RStudio Desktop and RStudio Server, there is little else executed when a session starts up.

RStudio Server Pro

As we’ve already seen, however, RStudio Server Pro launches sessions a little differently.

Prior to launching the process, it sets environment variables appropriate to the R version. This is what makes it possible for it to support multiple R installations simultaneously.
Instead of launching the rsession process directly, it uses the rserver-launcher process, which is responsible for doing PAM session management (documented in the admin guide).
The sessions are launched using an rsession-run script. This script runs /etc/rstudio/rsession-profile, then uses bash to launch the session, with the result that startup actions performed by bash will be run prior to starting the session.
Finally, the session itself starts, with all of the R Startup behavior described above.

Command line R

One of the first questions to ask when experiencing trouble in RStudio is whether the same problem exists when running R at the command line. This helps to ascertain whether the problem actually has anything to do with RStudio, which it often does not.

R at the command line follows the R documentation with regard to startup, but it’s important to note that typing R in your console actually invokes a script – try which R to see where the script lives. This script is responsible for setting variables such as R_HOME. RStudio does not invoke this script, so customizations to the script, which are often performed by distributions such as MRO, are not reflected in RStudio.

Suspend and Resume

State Folders

RStudio has three main levels of state storage: per-user, per-project, and per-session.

Type	Example
Per-User	Preferred color scheme and pane layout.
Per-Project	Indentation and tab settings.
Per-Session	Open files, environment contents.

There are two folders used by RStudio to save this state. One is .rstudio, which is stored in the user’s home directory, and the other is .Rproj.user, which is stored in the top level of a project.

Both store session-specific data, but .rstudio serves two purposes: first, it stores all the data that is user-specific, but not project-specific, such as global settings. Second, it doubles as a .Rproj.user folder when there is no specific project open. For this reason, many of the folders that exist in .rstudio also exist in .Rproj.user, but the reverse is not true.

.rstudio

File/Folder	Contents
`client-state/`	The state of the IDE’s user interface, such as the current size of the panels.
`history_database`	History of R console commands and when they were executed.
`projects_settings/`	A list of unique IDs for every project, and which to load at startup.
`monitored/`	Global user preferences, saved settings, snippets, and MRU lists (recently opened files).
`rversion-settings/`	Default R versions for the user and each of their projects.
`unsaved-notebooks/`	Cached output for R Markdown notebooks that haven’t been saved.
`notebooks/`	Cached output for R Markdown notebooks that have been saved.

Again, .rstudio is also a project folder, so it also contains all the files and folders described in .Rproj.user below.

.Rproj.user

Because there is only one .Rproj.user folder but potentially several different users for the project, each user has their own folder beneath .Rproj.user. The folder is named with the user’s ID, which can be found as contextIdentifier="XXXXXXXX" in .rstudio/monitored/user-settings. For this reason, removing the .rstudio folder has the side effect of losing all your state in each project – not because that state is actually stored in .rstudio, but because the key used to look it up is stored there.
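If you need to find a user's folder under .Rproj.user while troubleshooting, the lookup can be sketched as below. It assumes only what is stated above, namely that the settings file contains a line of the form contextIdentifier="XXXXXXXX"; the other key in the sample text and the function name are hypothetical.

```python
import re
from typing import Optional


def find_context_identifier(user_settings_text: str) -> Optional[str]:
    """Return the value of the contextIdentifier line, or None if absent."""
    match = re.search(
        r'^contextIdentifier="([^"]*)"', user_settings_text, re.MULTILINE
    )
    return match.group(1) if match else None


# Hypothetical file contents; only the contextIdentifier line matters here.
sample_settings = 'someOtherSetting="1"\ncontextIdentifier="1A2B3C4D"\n'

context_id = find_context_identifier(sample_settings)
print(context_id)

# The per-user folder for a project would then be named after this ID.
print(f"/home/user/project/.Rproj.user/{context_id}")
```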

Each user’s folder inside .Rproj.user contains the following contents:

File/Folder	Contents
`console`	The scrollback buffer, and state, of each open terminal.
`saved_source_markers`	Source markers (lint, syntax errors, etc.) for open files.
`sessions/`	Session-specific data.
`sessions/graphics/`	Plots, past and present.
`sessions/active/`	List of active sessions and session-specific data for each.
`sessions/session-routes/`	Specifies the RSP instance associated with a session.
`session-persistent-state`	Which browser is connected to the session, and whether it crashed.
`pcs/`	The state of the IDE’s user interface.
`sources/`	Source database: contents of open files. Was called `sdb` prior to 1.1.
`sources/per/t/`	Contents of files with filenames (“titled”).
`sources/per/u/`	Contents of files that have never been saved (“untitled”).
`sources/prop/`	Properties/preferences associated with files.
`viewer-cache/`	The data for any open data viewer tabs.
`viewer_history`	A list of past locations (URLs) for the Viewer tab.

There are also a handful of files in .Rproj.user which aren’t user-specific. These are stored directly beneath .Rproj.user in a folder called shared.

File/Folder	Contents
`notebooks/paths`	A list of unique IDs for each notebook in the project.
`notebooks/`	Per-user notebook data, such as saved notebook output.
`users`	A list of all the users who currently have the project open.

Licenses

RStudio supports two different kinds of licenses: traditional and floating. They are implemented using TurboActivate and TurboFloat, respectively.

The License Manager

RStudio validates its license when it starts up, and every 12 hours thereafter, using a helper binary called license-manager. This binary is a sort of Swiss Army knife. It links against both the TurboActivate and TurboFloat libraries. It includes commands run both by the user, such as activate, and by RStudio, such as verify.

Signed Output

You might be wondering why an enterprising administrator couldn’t just replace this binary with one that always reports that the license is valid. This is possible, but it’s not easy. The license manager commands that are designed to be invoked by RStudio include a signature that’s validated by RStudio when the JSON is consumed. You can see this for yourself by running a license manager command directly.

$ sudo rstudio-server license-manager verify
KZGZ1daqboVYZ4WCoCIN06MHTkGTU7l5INE3bXcfnWw=
{"days-left":1,"status":"evaluation","ts":1485974178819.000}

This is exactly what RStudio does to ensure that the license is valid. Since it also validates the signature, forging a response is difficult unless you have the signing key.
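The two-line output shown above (a Base64 signature followed by a JSON payload) can be picked apart as in the sketch below. The parsing follows the sample output directly. The verification step does not: the real signing scheme and key are not described in this document, so HMAC-SHA256 with a made-up key is used purely as a stand-in to show the general shape of "recompute and compare".

```python
import base64
import hashlib
import hmac
import json
from typing import Tuple


def parse_signed_output(output: str) -> Tuple[bytes, str, dict]:
    """Split license-manager output into (signature bytes, raw JSON, parsed JSON)."""
    signature_line, payload_line = output.strip().splitlines()[:2]
    return base64.b64decode(signature_line), payload_line, json.loads(payload_line)


def demo_sign(payload_line: str, key: bytes) -> bytes:
    """Stand-in signature (HMAC-SHA256); NOT the scheme RStudio actually uses."""
    return hmac.new(key, payload_line.encode("utf-8"), hashlib.sha256).digest()


def demo_verify(signature: bytes, payload_line: str, key: bytes) -> bool:
    """Recompute the stand-in signature and compare in constant time."""
    return hmac.compare_digest(signature, demo_sign(payload_line, key))


SAMPLE = (
    "KZGZ1daqboVYZ4WCoCIN06MHTkGTU7l5INE3bXcfnWw=\n"
    '{"days-left":1,"status":"evaluation","ts":1485974178819.000}\n'
)

signature, payload_line, payload = parse_signed_output(SAMPLE)
print(len(signature), payload["status"], payload["days-left"])

# Round trip with a made-up key: any change to the payload breaks the check.
demo_key = b"not-the-real-key"
good = demo_sign(payload_line, demo_key)
print(demo_verify(good, payload_line, demo_key))
print(demo_verify(good, payload_line.replace("evaluation", "activated"), demo_key))
```

Note that the signature covers the payload text exactly as printed, which is why the raw JSON line is kept alongside the parsed form.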

Commands

Here’s a list of each license-manager command and what it does:

Command	Argument	Action
`acquire-lease`		Acquire a lease on a floating license. Used by RStudio.
`activate-offline-request`		Generate a request to activate offline.
`activate-offline`	File	Use a file to activate offline.
`activate`	Key	Activate using the specified key.
`begin-evaluation-offline`	File	Use a file to begin a verified trial offline.
`begin-evaluation-request`		Generate a request to begin a verified trial.
`deactivate-offline`		Deactivate offline.
`deactivate`		Deactivate online.
`extend-evaluation-offline`	Key	Use an offline key to extend evaluation.
`extend-evaluation`	Key	Use an online key to extend evaluation.
`initialize`		Load licensing and start an eval if necessary. Used by RStudio.
`license-server`		Set the server to use for floating licensing.
`status-offline`		Report the current license status.
`status`		Report the current license status (identical to `status-offline`)
`verify`		Verify that there is a valid license. Used by RStudio.

Floating Licensing

Floating licensing is a little more complicated; instead of just providing a key to RSP, you provide the key to a license server, and then you tell RSP where the license server is.

Internally this works by using the acquire-lease argument to the license manager. In fact, if you want to test floating licensing, you can invoke acquire-lease yourself, just as RStudio does.

$ sudo rstudio-server license-manager acquire-lease

If it is successful in obtaining a lease, the command will not terminate; instead, it will keep running, renewing the lease automatically when necessary, and printing the license status on standard output whenever it changes. Press Ctrl+C (or send e.g. SIGTERM) to terminate the command and relinquish the lease.

Project Sharing

Project Sharing is a Server Pro feature which enables users to share projects with each other. One user (the owner) maintains a copy of the project in their home directory, and other users are granted access to the project.

Initial Permissions

When a project is initially shared, the first thing we need to do is grant other users the ability to edit all the files in the project. This is done using Access Control Lists (ACLs), so that the original file permissions are not modified.

Owner Bits to ACL

The first step is to take the Unix permissions for the file’s owner and convert them into ACL entries. For example, if Bob has read and execute permission for a file (-r-x------), and shares it with Alice, and Eve, then we need to create an ACL which grants the following permissions:

Read and execute for Alice
Read and execute for Eve

Again, these are attached with an ACL, so that they don’t mess with the file’s actual (Unix) permissions.
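As a rough illustration of this step, the sketch below turns a file's owner permission bits into one ACL entry per user. The `user:name:perms` text form is the short entry syntax accepted by tools such as setfacl; the function name is hypothetical, and the sketch only builds the entries, it does not apply them.

```python
import stat
from typing import List


def owner_bits_to_acl_entries(mode: int, users: List[str]) -> List[str]:
    """Build one ACL entry per user, mirroring the owner's rwx bits in `mode`."""
    perms = (
        ("r" if mode & stat.S_IRUSR else "-")
        + ("w" if mode & stat.S_IWUSR else "-")
        + ("x" if mode & stat.S_IXUSR else "-")
    )
    return [f"user:{name}:{perms}" for name in users]


# Bob's file is -r-x------ (octal 0500) and is shared with Alice and Eve.
entries = owner_bits_to_acl_entries(0o500, ["alice", "eve"])
print(entries)
```

Because only the owner bits are read, the group and other bits of the file play no part in what the shared users are granted.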

Directory Default Bits

Most systems that have ACLs support a “default ACL”. This is a special ACL that is attached to directories. It does not act on the directory itself. Instead, it is used as a template and copied to the ACL for any new entries that are created in the directory.

Obviously we don’t want to have to try to set all the ACLs for new content as it’s created (this would be very difficult to do!). So part of establishing the ACLs for directories involves the creation of a default ACL which grants all users with whom the project is shared access to new content created in the directory.

Changing ACLs

It’s not possible for the rsession process to change ACLs itself. This is because projects can be shared by people other than their owners, but it is not generally possible to change the ACLs for files or folders you don’t own (for obvious reasons).

In order to modify ACLs, the following happens:

rsession creates a JSON blob describing the ACL change required, and sends it upstream to rserver.
rserver (which has dropped privilege) starts a helper process called rserver-acls.
rserver-acls runs as root and drops all nonessential capabilities. It then receives the JSON describing the ACL change required on standard input.
rserver-acls attempts to execute the ACL change and prepares a JSON blob describing the result.
rserver passes the result back to rsession, where it’s sent to the user (project

NFSv3, v4, and File Systems

The rserver-acls process is responsible for performing the difficult task of taking an abstract job (grant read access to Alice) and converting it into actual ACL changes. It is here, for example, that we must translate the permissions change into a POSIX ACL, NFSv3 ACL, or NFSv4 ACL.

It is here that a lot of customers run into trouble, because ACLs aren’t used by a lot of mainstream software (RStudio is often the only thing on a system that needs them) and are often not configured or even supported.

In order to be usable, an ACL must be supported both at the transport layer (e.g. an NFS client/server) and the storage layer (e.g. a ZFS volume on a NAS).

POSIX/NFSv3

POSIX and NFSv3 ACLs were the first to be supported by project sharing, because they’re effectively the same thing. This means that ACLs work locally just like they do on the network, so RStudio didn’t have to worry about whether the project was on a network drive or not. Convenient!

A POSIX ACL is implemented as an extended attribute on the file. This means that in order to use POSIX ACLs, the volume on which the file is stored has to support extended attributes (user_xattr). On an NFS mount, acl is often required as a mount option.

NFSv4

NFSv4 ACLs are a good deal more complicated than NFSv3 ACLs, and they’re also much more expressive.

Directory Traversal

Yet another technical hurdle for project sharing is that file permissions alone don’t guarantee that you can use the file. This is a problem in project sharing because the directory in which the project resides is often in a locked-down home directory. A simplified example:

/home          dr-xr-xr-x
  /bob         drwx------
    /project   drwx---r--
      /file.R  -rw-rw-rw-

In this example, file.R, while it is readable by anyone according to its permission bits, is actually only readable by bob because no one else can get into the project directory.

In Linux, the execute (--x) bit addresses this problem. On directories, it gives “traversal” permission. So in order to actually make it possible for anyone to read file.R, the following permissions are needed:

/home          dr-xr-xr-x
  /bob         drwx-----x
    /project   drwx---r-x
      /file.R  -rw-rw-rw-

Note that this doesn’t give anyone the permission to see what’s in bob’s home directory (as they would need read permission on the directory to do that).

Project sharing has to account for this, which means it needs to modify ACLs not only on the project files themselves but also on every directory above the project, to ensure users can access it.

Of course we don’t want to try to change permissions on all directories up to /, so we stop at the home directory. (Currently this means that we don’t try to do directory traversal fixup unless the project is in a home directory.)
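The rule above (fix up every directory above the project, but stop at the home directory, and do nothing if the project isn't under one) can be sketched as a small path computation. The function name is hypothetical, and the sketch only lists the directories; it makes no ACL changes.

```python
from pathlib import PurePosixPath
from typing import List


def traversal_dirs(project_dir: str, home_dir: str) -> List[str]:
    """List directories above the project, up to and including the home directory.

    Returns an empty list when the project is not inside the home directory,
    in which case no traversal fixup is attempted.
    """
    project = PurePosixPath(project_dir)
    home = PurePosixPath(home_dir)
    if home not in project.parents:
        return []
    dirs = []
    for parent in project.parents:   # nearest parent first
        dirs.append(str(parent))
        if parent == home:           # never walk above the home directory
            break
    return dirs


print(traversal_dirs("/home/bob/project", "/home/bob"))
print(traversal_dirs("/home/bob/work/2019/project", "/home/bob"))
print(traversal_dirs("/srv/projects/demo", "/home/bob"))
```

Each directory in the result needs an entry granting the shared users execute (traversal) permission, but not read permission, so the directory's contents stay private.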

Magic File Advertisement

Once the permissions are correct, we need to make it possible for other users to open the project. They can’t just browse for it, since they likely cannot read the directories above it; they need a direct link.

In order to provide this, every shared project has a magic .proj file in the Shared Storage folder. This folder path is configurable, but its default is:

/var/lib/rstudio-server/shared-storage/shared-projects

The name of the file is the user ID and project ID of the shared project (see Session URLs elsewhere in this doc for details). It contains a small JSON payload such as the following:

{
   "project_dir":"/home/jonathan/example",
   "project_file":"/home/jonathan/example/example.Rproj",
   "project_owner":"jonathan",
   "share_times":[
      {
         "time":1563830777255,
         "user":"maria"
      }
   ],
   "shared_with":[
      "maria"
   ],
   "updated_by":"jonathan"
}

The Magic File has ACLs just like the project it represents, so it can only be read by users with whom the project is shared.

The RStudio IDE enumerates all the Magic Files when it’s building the “Shared with Me” sections of the IDE, which is how it’s able to know quickly when you’ve shared a project with someone.
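For troubleshooting, it can be handy to read a Magic File yourself. The sketch below parses the sample payload shown above and answers the question the IDE is effectively asking: is this project shared with a given user? The function name is hypothetical; the field names come from the sample.

```python
import json
from typing import Optional

SAMPLE_MAGIC_FILE = """
{
   "project_dir":"/home/jonathan/example",
   "project_file":"/home/jonathan/example/example.Rproj",
   "project_owner":"jonathan",
   "share_times":[
      {
         "time":1563830777255,
         "user":"maria"
      }
   ],
   "shared_with":[
      "maria"
   ],
   "updated_by":"jonathan"
}
"""


def shared_project_for(magic_file_text: str, user: str) -> Optional[str]:
    """Return the project file path if the project is shared with `user`."""
    info = json.loads(magic_file_text)
    if user in info.get("shared_with", []):
        return info["project_file"]
    return None


print(shared_project_for(SAMPLE_MAGIC_FILE, "maria"))
print(shared_project_for(SAMPLE_MAGIC_FILE, "eve"))
```

In practice the ACL on the Magic File means a user with whom the project is not shared would fail to read the file at all, so the `shared_with` check is a second line of defense rather than the first.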

Presence Indicators

When you have an active R session in a shared project, the other users who are in the project can see that you’re there too.

This feature is implemented using “sentry files”, which are created by each R session when it has the project open and removed by the R session when it ends. The files live here:

/home/user/project/.Rproj.user/shared/users

RStudio monitors this directory to show you who is active in the project.

Collaborative Editing

The RStudio IDE has real-time collaborative editing. This is one of the most important project sharing features because it largely eliminates the possibility of write conflicts. Without it, it would be very easy for two people working on a project at the same time to overwrite each other’s changes without realizing it.

The core collaborative editing experience is powered by third-party utilities – most notably Firepad. Firepad is an open-source collaborative editing engine which is meant to work with Google’s Firebase.

Here’s a rough overview of how collaborative editing is coordinated:

Using presence indicators (see above), RStudio sessions notice that two users have the same file open.
The sessions figure out which user has the newer copy of the file.
The user who has the most recent copy of the file becomes the host. That user’s rsession process starts a process called filebase, which emulates just enough of the Firebase interface to power a collaborative editor for one file.
The filebase process starts a collaborative editing server listening on a WebSocket and sends the address and port to all of the RStudio sessions.
All of the RStudio sessions initialize a Firepad instance, which binds the text editor in RStudio (ACE) to the collaborative editing server URL.
The filebase server continues to run until all users have closed the file.

Session URLs

RStudio session URLs look completely random. Here’s an example:

https://myserver/s/93ff08241028fde4b4e02

However, there is some internal order. The 21 hex digits are divided into the user ID (5 digits), the project ID (8 digits), and the session ID (8 digits).

User ID

The user ID (93ff0 in this example) is the ID of the user who owns the project in question (not necessarily the current user). The user ID in the URL is an obfuscated version of the user ID the user would have in Linux.

Project ID

The project ID (8241028f in this example) is the ID of the project. Every project folder has a unique project ID which is assigned the first time the project is opened (a mapping of project folders to IDs can be found in the user’s .rstudio folder).

There are some project IDs baked in to RStudio; for example cfc78a31 is a special ID corresponding to the empty project (None).

Session ID

The session ID (de4b4e02 in this example) is the ID of the individual R session.

Notice that (a) all of your own session URLs start with the same 5 hex digits, and (b) opening a new session for your current project will produce a URL which is identical except for the last 8 digits.
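Splitting a session URL into its parts is a simple slicing exercise, sketched below using the 5/8/8 layout described above. The function and type names are hypothetical.

```python
from typing import NamedTuple


class SessionUrlIds(NamedTuple):
    user_id: str      # first 5 hex digits
    project_id: str   # next 8 hex digits
    session_id: str   # last 8 hex digits


def split_session_token(token: str) -> SessionUrlIds:
    """Split the 21-hex-digit token from a session URL into its three IDs."""
    token = token.strip().lower()
    if len(token) != 21 or any(c not in "0123456789abcdef" for c in token):
        raise ValueError(f"expected 21 hex digits, got {token!r}")
    return SessionUrlIds(token[:5], token[5:13], token[13:])


url = "https://myserver/s/93ff08241028fde4b4e02"
ids = split_session_token(url.rsplit("/", 1)[-1])
print(ids.user_id, ids.project_id, ids.session_id)
```

This makes it easy to check, for example, whether two URLs from a support ticket refer to the same project or just the same user.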

Multiple Versions of R

One of the banner features in RStudio Server Pro is its support for multiple versions of R. The customer-facing use of this feature is well documented so in this document we will focus primarily on the internals.

Querying

When RSP starts up, it attempts to ascertain all of the versions of R that are available. For versions of R installed in “standard” locations, it should Just Work; for versions installed in other locations, the admin needs to create a text file that lists the installation paths of the other versions.

TODO

Here are some more things we’d like to see in this document:

discuss how multiple versions of R work
discuss how project, user, and session IDs work
discuss source databases
rstudio graphics device
discuss the session local streams; follow HTTP request to session
unusual IPC
discuss how to iterate development

RStudio Architecture

Jonathan McPherson
