1 Introduction

R statistical software and its associated tools have the potential to increase your efficiency, simplify collaboration, and help you develop reproducible products. Often the biggest limitation to these tools is lack of awareness. This document provides an overview of some of the many R-related tools and is intended for individuals with at least a basic understanding of R. We will learn to create R functions, a robust project structure using RStudio, aesthetically pleasing documentation with R Markdown, and interactive tools with Shiny. In addition, we will delve into the ecosystem of packages developed by RStudio, known as the Tidyverse (e.g., dplyr, tidyr, purrr, and ggplot2), which provide useful and intuitive functions for data manipulation, processing, and visualization. Overall, this document will provide you with a working knowledge of some of the most widely used R-related tools available for your next project.

2 Quick Reference

Cheat Sheets: https://www.rstudio.com/resources/cheatsheets/
R Bloggers: https://www.r-bloggers.com/
Questions
- StackOverflow: https://stackoverflow.com
- R Community: https://community.rstudio.com/
Style Guides
- Hadley Wickham: http://style.tidyverse.org/index.html
- Google: https://google.github.io/styleguide/Rguide.xml
Books and Papers
- R for Data Science: http://r4ds.had.co.nz/
- Advanced R: http://adv-r.had.co.nz/
- R Packages: http://r-pkgs.had.co.nz/
- Tidy Data: https://www.jstatsoft.org/article/view/v059i10
Packages
- Shiny (Interactive Apps): https://shiny.rstudio.com/
  - Tutorials: https://shiny.rstudio.com/tutorial/
- Leaflet (Interactive Maps): https://rstudio.github.io/leaflet/
- DT (Interactive Tables): https://rstudio.github.io/DT/
- Dygraphs (Interactive Time Series Plots): https://rstudio.github.io/dygraphs/
- Plotly (Interactive Plots): https://plot.ly/r/
- Tidyverse Packages (Ecosystem of Packages): https://www.tidyverse.org/
- Rmarkdown (Documentation): https://rmarkdown.rstudio.com/lesson-1.html
- USGS GitHub: https://github.com/USGS-R
  - dataRetrieval (Acquire data from the Water Quality Portal): https://cran.r-project.org/web/packages/dataRetrieval/vignettes/dataRetrieval.html
  - EGRET (Analysis of long-term changes in water quality and streamflow): https://cran.r-project.org/web/packages/EGRET/vignettes/EGRET.pdf

3 Installation

3.1 R

https://cran.r-project.org/bin/windows/base/

3.2 RStudio

https://www.rstudio.com/products/rstudio/download/#download

3.3 Git

https://git-scm.com/downloads

4 Updating Software and Packages

4.1 R

Make sure RStudio is closed, then run the following code in the R GUI (code copied from: https://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/).

# Install the installr package if it is missing, then load it.
if (!require(installr)) {
  install.packages("installr")
  require(installr)
}

# using the package:
updateR()

4.2 RStudio

In RStudio -> Help -> Check for Updates -> follow instructions

4.3 R-Packages

In RStudio -> Tools -> Check for Package Updates… -> follow instructions

5 Version Control

Version control software keeps track of changes made to files. This gives the user the ability to revert to an earlier version, provides a backup of the files, and simplifies collaboration.

Git is a free, open-source version control system (https://git-scm.com/), and GitHub is a platform that allows the user to store the changes made via Git in the form of repositories (https://github.com/). R Studio has integrated Git into its IDE, making it easy to work with repositories from GitHub (https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN). Git will need to be installed locally (https://git-scm.com/downloads) and a GitHub account will need to be created (https://github.com/join?source=header-home) before these tools can be accessed from R Studio.

5.1 Git Resources

https://git-scm.com/doc

5.2 Link R Studio to GitHub Repository

Below are the steps for initializing an R project file connected to a GitHub repository. It is much easier to create the GitHub repository before creating the project.

Create a new repository online on your GitHub account (https://github.com/).
Create a new project in R Studio (File -> New Project).
Select Version Control and Git (Version Control -> Git).
- R Studio should automatically recognize Git on your computer, but I had to specify where git.exe was located by going to Tools -> Global Options -> Git/SVN -> locate git.exe (my file path: “C:/Users/zsmith/AppData/Local/Programs/Git/bin/git.exe”).
Paste the repository URL (https://github.com/username/repository.git) into the “Repository URL” box in the R Studio window. To find the URL:
- Within the repository’s “Code” tab on GitHub, select the green button labeled “Clone or Download.”
- Copy the HTTPS URL provided.
Use the “Create project as sub-directory of:” box to manage where you want to store the project on your computer.
Click “Create Project.” The project will now be linked to the GitHub repository. A “Git” tab will appear within the “Environment” pane in R Studio.

If you have an existing R project without a repository on GitHub and you would like to start using version control, I recommend starting from scratch. Create a GitHub repository, connect it to a new R Studio project file, and copy the old R project files into the new R project folder.

5.3 Push and Pull Repository Changes in R Studio

Whenever updates are made to the files within the R project folder, they will be queued in the “Git” tab that appears in the “Environment” pane in R Studio.

5.3.1 Pull

Pull every time you open the project to make sure you have the most up-to-date version of the repository. Changes may have been made by you from a different computer or by one of your collaborators.

5.3.2 Commit

A commit records a snapshot of your changes together with an informative message to yourself or collaborators indicating why you made the change. When the “Commit” button is selected, an “RStudio: Review Changes” window will appear. In this window, all of the altered files will appear in the upper left pane. By selecting an individual file in the upper left pane, the user can see the changes that were implemented in the bottom pane of the “RStudio: Review Changes” window. Deletions will appear in red, while insertions will appear in green. One or more files can be staged and then the user has three options: 1) Revert, 2) Ignore, or 3) Commit.

The “Revert” button will discard the uncommitted changes to the selected file(s), returning them to the state recorded in the most recent commit.
The “Ignore” button will add the selected file(s) to the .gitignore file. The .gitignore file informs Git that a file, and any subsequent changes to it, should not be added to the GitHub repository. GitHub rejects very large files (over 100 MB), and thus large data sets should be added to the .gitignore file. Also, files containing sensitive information (e.g., usernames or passwords) should be added to the .gitignore file.
Staged file(s) require a commit message, an informative message indicating why a change was made, prior to being committed. All commits remain local until the “Push” button is used to send the changes to the GitHub repository.
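
As a rough sketch of what the “Ignore” button does behind the scenes, the base R code below adds patterns to a .gitignore file. The project folder is a temporary directory, and the file names (data/large_dataset.csv and credentials.R) are hypothetical placeholders rather than files from this project.

```r
# Work in a temporary folder so that no real project is modified.
project.path <- tempfile("rproject_")
dir.create(project.path)
gitignore.path <- file.path(project.path, ".gitignore")

# Hypothetical files that should never reach the GitHub repository.
ignore.vec <- c(
  "data/large_dataset.csv",
  "credentials.R"
)

# Read any existing entries so that they are preserved.
existing.vec <- if (file.exists(gitignore.path)) {
  readLines(gitignore.path)
} else {
  character(0)
}

# Combine old and new entries without duplicates and rewrite the file.
updated.vec <- union(existing.vec, ignore.vec)
writeLines(updated.vec, gitignore.path)

readLines(gitignore.path)
```

Each line of the .gitignore file is a pattern; Git skips any untracked file that matches one of them.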

5.3.3 Push

Push commits from R Studio to the GitHub repository.

5.4 Repository Branch

When a repository is created it consists of a single branch labeled “master.” The master branch will suffice as you first develop your project. However, you may reach a point where the master branch is functioning well (without any known issues) but you want to make some dramatic development changes. Rather than committing the changes to the master branch (potentially breaking your working product), you can create a new isolated branch to work on your development changes. In this case, the new branch starts as a copy of the current state of the master branch, and any edits made to the new branch do not affect the master branch.

When working in R Studio with GitHub, use the drop-down menu (located in the top right corner of the Git tab in the Environment pane) to select the branch you want to work on. In the image below I clicked on “master” and now I can see three branches are available for this project: 1) master, 2) Development, and 3) Zsmith. Simply select a name to change the branch you are working on.

5.4.1 Create a New Branch

Creating a new branch is relatively simple. I know of three ways to create a repository branch: 1) via R Studio, 2) via GitHub, and 3) via Git Shell.

5.4.1.1 R Studio

In the “Git” tab of the Environment pane in R Studio, there is a button for creating a new branch. Click on this button:

5.4.1.2 GitHub

Log on to your GitHub account (https://github.com/) and navigate to the repository you are working on. Under the “Code” tab, click on the “Branch:” button. This will produce a drop-down menu (as seen in the image below), where you can select an existing branch or create a new branch by typing the name you want to assign your new branch into the input box labeled “Find or create a branch.” Once you type in the new name, a new box will appear in the drop-down menu that says “Create Branch”. Click on this box to create the new branch.

5.4.1.3 Git Shell

The Git Shell can be accessed via R Studio in the Git tab of the Environment pane. Click on the “More” dropdown menu and then click “Shell…” (see image below).

A new window will appear. Use this link to understand how to create and work with branches in the Git Shell: https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging.

5.5 Merge Branches

After you have vetted a new branch, you can merge the new branch with the master branch (or some other branch). The merge will join all the changes made in the new branch and all of the changes made in the master branch. You may run into conflict issues if both branches updated the same section of code (https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/).

Merging branches is not as simple as creating branches. As far as I know, branches can only be merged using the Git Shell (see Section 5.4.1.3 to learn how to access the Git Shell). Use the following link to understand how to merge branches: https://help.github.com/articles/merging-an-upstream-repository-into-your-fork/
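
The branch-and-merge workflow described above can be sketched from the R console by passing Git commands to system2(). This is only an illustration under several assumptions: Git is installed and on the system path, the repository is a throwaway folder in a temporary directory, and the branch name (development), file names, and committer identity are hypothetical. The same git commands, without the R wrapper, can be typed directly into the Git Shell.

```r
# Only run the demonstration when Git can be found on the system path.
git.available <- nzchar(Sys.which("git"))

if (git.available) {
  # Throwaway repository so that no real project is touched.
  repo.path <- tempfile("demo_repo_")
  dir.create(repo.path)

  # Helper that runs "git -C <repo> <args>" with a hypothetical identity.
  git <- function(...) {
    system2(
      "git",
      c("-C", shQuote(repo.path),
        "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
        ...),
      stdout = TRUE, stderr = TRUE
    )
  }

  git("init")
  writeLines("# Project notes", file.path(repo.path, "README.md"))
  git("add", "README.md")
  git("commit", "-m", shQuote("Initial commit"))

  # Record the name of the starting branch (for example, master).
  base.branch <- git("rev-parse", "--abbrev-ref", "HEAD")

  # Create an isolated branch and commit a change to it.
  git("checkout", "-b", "development")
  writeLines("x <- 1", file.path(repo.path, "analysis.R"))
  git("add", "analysis.R")
  git("commit", "-m", shQuote("Add analysis script"))

  # Return to the starting branch and merge the vetted branch into it.
  git("checkout", base.branch)
  git("merge", "development")

  # After the merge, the file created on the development branch is present.
  merged.files <- list.files(repo.path)
}
```

Because the starting branch has no new commits of its own in this sketch, the merge cannot produce a conflict; conflicts arise only when both branches change the same section of a file.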

6 Style Guide

Code style is more important than you may first imagine. Adopting a consistent style will make it easier for you and your collaborators to read and comprehend your code. Please review Hadley Wickham’s style guide (http://style.tidyverse.org/index.html) and follow it in future R code. As mentioned in Hadley Wickham’s guide, it is adapted from Google’s style guide (https://google.github.io/styleguide/Rguide.xml); therefore, there are many similarities. I do not want to recreate these style guides here; instead, I want to highlight what I believe are some of the more important features.

6.1 Names

File names, function names, and column names should not contain spaces. It is very easy to create a name with two consecutive spaces by mistake and very frustrating to later troubleshoot why a call to this name in your R code returns an error.

I use snake case (“snake_case”) instead of names with spaces (“snake case”) or camel case (“snakeCase”). Many programmers use camel case, but I find that I make more typos when I use that naming scheme and that snake case is easier to read. Please use snake case.

6.1.1 Object Names: Descriptive Suffix

For object names, I prefer a style similar to the one found in Google’s style guide but with a descriptive suffix. I cannot provide a reference for this style, but at one point I adopted a descriptive suffix that describes the object’s class (e.g., data frame = “.df”, vector = “.vec”, scalar = “.scal”, list = “.lst”, matrix = “.mat”, and time series = “.ts”). I have found this simple naming scheme to be very helpful because I immediately know the intended class of the object at any point where it is referenced in the script. This makes it easier to identify a bug if an object named “object.df” is represented in my RStudio Environment pane as a vector or list.

Examples:

# Data Frame-------------------------------------------------------------------
my.df <- data.frame()

# Vector-----------------------------------------------------------------------
my.vec <- c("a", "b", "c")
my.scal <- "a"

my.vec <- c(1:3)
my.scal <- 1

# Matrix-----------------------------------------------------------------------
my.mat <- matrix()

# List-------------------------------------------------------------------------
my.lst <- list()

# Time Series------------------------------------------------------------------
my.ts <- ts()
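
A small sketch of how the suffix convention can help catch class errors early: the suffix states the intended class, and an explicit check confirms it. The object names and the subsetting mistake below are hypothetical.

```r
# The .df suffix signals that this object is intended to be a data frame.
sites.df <- data.frame(
  site = c("a", "b", "c"),
  depth = c(1.2, 3.4, 5.6)
)

# Selecting a single column with [ , ] silently drops the data frame class,
# so the object no longer matches its .df suffix.
depth.df <- sites.df[, "depth"]
is.data.frame(depth.df)

# Adding drop = FALSE keeps the class that the suffix promises.
depth.df <- sites.df[, "depth", drop = FALSE]
stopifnot(is.data.frame(depth.df))
```

Placing a stopifnot() check after a risky step turns a silent class change into an immediate, easy-to-locate error.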

6.2 Spacing and Indenting

Please follow the spacing (http://style.tidyverse.org/syntax.html#spacing) and indenting (http://style.tidyverse.org/syntax.html#indenting) guidelines provided in Hadley Wickham’s style guide. I find it very difficult to follow code that does not adhere to these guidelines.

The following “good” and “bad” examples will create the exact same data frame. However, the “good” example is much easier to read and interpret.

Good Example:

good.df <- data.frame(
  alphabet = letters,
  square_root = sqrt(81),
  add = 1 + 1,
  subtract = 1 - 1,
  multiply = 2 * 2,
  divide = 2 / 2,
  power = 2^2
)

Bad Example:

bad.df<-data.frame(alphabet=letters,square_root=sqrt(81),add=1+1,subtract=1-1,multiply=2*2,divide=2/2,power=2^2)

7 R Markdown

R Markdown is an excellent R-package that allows the user to integrate text, R-code, and R-code output into a well formatted document (e.g., HTML, MS Word, PDF). See the following link for R Markdown tutorials: https://rmarkdown.rstudio.com/lesson-1.html.

My recommendation is to create an R Markdown file for every R-project. The intention is to document as much of the project as possible. R Markdown provides a more readable document, with better descriptions of how and why an activity was performed, than a standard R script with a few commented lines.

7.1 Create a New Document

Click on the new document button:
Click on R Markdown:

Provide a “Title:”, select the “Default Output Format:”, and click “OK”

A new R Markdown document will appear with some instructions and example text/code. Delete everything after the YAML header:

---
title: "Untitled"
author: "Zachary M. Smith"
date: "September 23, 2018"
output: html_document
---

7.2 Editing

Again, your best resource for learning how to use R Markdown will be the R Markdown website (https://rmarkdown.rstudio.com/lesson-1.html), but I will describe some of the general features here.

7.2.1 Heading Text

Heading text follows one or more hash-sign(s) (#) and a space. The number of hash-signs determines the hierarchy of headings. For example, “# Heading 1” would represent the primary heading, “## Heading 2” would represent the secondary heading, “### Heading 3” would represent the tertiary heading, and so forth.

7.2.2 Plain Text

Simply add text below the YAML header.

7.2.3 Insert Code Chunks

To insert a code chunk, press Ctrl + Alt + I in the source pane (top left pane in the default settings of RStudio). A code chunk will appear:

Inside the code chunk you can write and run R-code. If you print the output of your R-code it will appear below the code chunk in the source pane and the printed output will appear in the final compiled document. This is useful for producing figures and tables.
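
To show how the pieces described above fit together, the following sketch uses base R to write a minimal R Markdown file containing a YAML header, headings, plain text, and one code chunk. The file name (example.Rmd), its location in a temporary directory, and the document contents are hypothetical.

```r
# Three backticks open and close a code chunk; build them with strrep()
# so that they can be written from inside this example.
fence <- strrep("`", 3)

rmd.vec <- c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "# Heading 1",
  "",
  "Plain text goes below the YAML header.",
  "",
  "## Heading 2",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Save the document to a temporary folder.
rmd.path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd.vec, rmd.path)

# Print the file to see the layout of the finished document.
cat(readLines(rmd.path), sep = "\n")
```

The chunk opens with three backticks followed by {r} and closes with three backticks; everything between those two lines is treated as R code.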

7.3 Compile the Document

To view the html document, you must compile the document using Knit. Follow these steps to Knit the document:

Find and click the Knit button (it looks like a ball of yarn) in the toolbar above the editor window.
If a window appears saying “Install Required Packages” for R Markdown, install the necessary packages for knitting the document.
The compiled file will be saved in the same directory as your Rmd file (your R Markdown file).
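
The Knit button can also be reproduced from the R console with rmarkdown::render(). The sketch below rests on a few assumptions: it writes a small hypothetical file (knit_demo.Rmd) to a temporary directory and only renders it when the rmarkdown package and pandoc are both available.

```r
# Write a small, hypothetical R Markdown file to a temporary folder.
fence <- strrep("`", 3)
rmd.path <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(
  c(
    "---",
    "title: \"Knit Demo\"",
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r}"),
    "1 + 1",
    fence
  ),
  rmd.path
)

# Rendering requires the rmarkdown package and a pandoc installation.
can.render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can.render) {
  # render() returns the path of the compiled file, which by default is
  # saved in the same directory as the Rmd file.
  html.path <- rmarkdown::render(rmd.path, quiet = TRUE)
}
```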

7.4 File Management

I store the R Markdown file(s) in a sub-directory labeled “notebook” within the R-project folder (rproject/notebook).

7.5 Child Documents

In general, I find that a single R Markdown file quickly becomes unwieldy. I recommend breaking the document up into multiple “child” documents and sourcing these child documents in a parent document. My child documents generally represent major subsections of the document.

I store the parent R Markdown file in the “notebook” folder (rproject/notebook) and the child R Markdown files in a sub-directory of my “notebook” folder called “sections” (rproject/notebook/sections). In the parent file, the child files are sourced within the code chunk header using “child = 'sections/example.Rmd'”. After sourcing all the child chunks, the parent file can be knit (compiled) like a normal R Markdown document. The code in the child documents cannot be run interactively from the parent file.
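
The parent and child layout can be sketched with base R and knitr. Everything here is illustrative: the project is created in a temporary directory, the file names (parent.Rmd and example.Rmd) and their contents are hypothetical, and the parent is only knit when the knitr package is installed. The sketch assumes knitr's default behavior of resolving the child path relative to the parent document's folder.

```r
fence <- strrep("`", 3)

# Hypothetical project layout: rproject/notebook/sections.
project.path <- tempfile("rproject_")
notebook.path <- file.path(project.path, "notebook")
sections.path <- file.path(notebook.path, "sections")
dir.create(sections.path, recursive = TRUE)

# Child document representing one major subsection.
writeLines(
  c("## Import", "", "Text from the child document."),
  file.path(sections.path, "example.Rmd")
)

# Parent document: an empty chunk whose header points to the child file.
parent.file <- file.path(notebook.path, "parent.Rmd")
writeLines(
  c(
    "---",
    "title: \"Parent\"",
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r child = 'sections/example.Rmd'}"),
    fence
  ),
  parent.file
)

# Knit the parent to Markdown; the child text is pulled into the output.
if (requireNamespace("knitr", quietly = TRUE)) {
  output.file <- knitr::knit(
    parent.file,
    output = file.path(notebook.path, "parent.md"),
    quiet = TRUE
  )
  knitted.vec <- readLines(output.file)
}
```

Note that the chunk carrying the child option is left empty; its only job is to pull the child document into the parent.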

7.5.1 Extract and Run R-Code from R Markdown Files

The parent file is great for organizing sections of your document, but the child documents cannot be executed within R Studio like a normal code chunk. Without the ability to easily execute the R code within the child documents it can become very difficult to develop new child documents because new child documents often depend on upstream code execution.

Imagine you have a parent document that sources child sections which import and clean your data. You now want to visualize your data; accordingly, you begin to develop a visualization child document, which depends on information from the upstream child sections. It would be inefficient and inappropriate to repeat all the steps from the upstream child sections within the visualization section. Therefore, you need an effective way to execute the upstream child sections while you continue to develop the visualization section. The inefficient way of doing this is to open each child Rmd file in R Studio and execute them manually in the correct sequence. This becomes tedious after you have three or more documents (imagine doing this for 10+ child sections). The most efficient way that I have found to run upstream child sections is to extract the R-code chunks from each Rmd file, save them in a “raw_script” folder, and then source/execute the scripts within a regular R script file (.R).

7.5.1.1 R Code

In this section we establish the file path to the folder that contains all the child documents. The names of the child documents are extracted and stored as a vector. The grepl() function is used to retain only the Rmd files stored in the vector.

sections.path <- "notebook/sections"
r.files.vec <- list.files(sections.path)
# Keep only the file names that end in ".Rmd".
r.files.vec <- r.files.vec[grepl("\\.Rmd$", r.files.vec)]

Next, a file path is specified for the R-scripts that will be extracted from the R Markdown documents; I place these files within a “raw_script/extracted” folder. The map() function from the purrr package is used to loop through each file in the vector (r.files.vec). Within the map() loop, the purl() function from knitr is used to extract the R-code from the R Markdown documents and save the code to the specified folder.

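extracted.path <- "notebook/raw_script/extracted"
purrr::map(r.files.vec, function(file.i) {
  # Swap the ".Rmd" extension for ".R" to name the extracted script.
  file.name <- gsub("\\.Rmd$", "", file.i)
  extracted.file <- paste0(file.name, ".R")
  # purl() extracts the R code chunks from the R Markdown file.
  knitr::purl(
    input = file.path(sections.path, file.i),
    output = file.path(extracted.path, extracted.file)
  )
})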

Finally, create a vector of file names (source.vec) stored in the “raw_script/extracted” folder. You will want to type these out manually (do not use the list.files() function) because in this format you can easily comment out certain scripts and only run the scripts of interest. The map() function is then used to loop through each specified file in source.vec. Keep in mind that the order of the file names specified in source.vec will determine the order in which these files are executed in the map() function; therefore, order the files in source.vec from furthest upstream to furthest downstream. Each iteration of the loop executes (sources) the specified R-script.

source.vec <- c(
  "import_data.R",
  "clean_data.R",
  "visualize_data.R"
)

purrr::map(source.vec, function(source.i) {
  source(file.path(extracted.path, source.i))
})

Once all the R-scripts extracted from the upstream child R Markdown files have been executed, you can begin or continue work on a new child R Markdown document. I keep all the above code in a single R-script and execute the entire script each time I use this file to make sure all of the files are up-to-date.

7.6 Parameterized Reports

https://www.coursera.org/lecture/reproducible-templates-analysis/adding-parameters-in-a-document-template-6fQwc

8 Shiny

8.1 What is Shiny?

Shiny is an R package that enables the developer to create interactive web applications (apps).

Example: https://zsmith.shinyapps.io/WQT_Shiny/

8.2 Resources

R Studio has a website dedicated to shiny (https://shiny.rstudio.com/). There are a lot of resources available here, but for those just learning shiny or looking for a shiny refresher I would direct you to the tutorial page (https://shiny.rstudio.com/tutorial/).

8.3 Project Composition

Shiny apps are mainly composed of three files: 1) ui.R, 2) server.R, and 3) global.R.

8.3.1 ui.R

In this file, you will specify the aesthetic aspects of the app. This includes the presence/absence of a navigation bar, the presence and position of a dropdown menu, the presence and position of a slider bar, and the location of figures created in the server.R file.

8.3.2 server.R

In this file, you will specify reactive functions that respond to user inputs. For example, an app may contain a dropdown menu of sample sites and a scatterplot representing the site selected in the dropdown menu. Each time the user selects a new sample site from the dropdown menu, the scatterplot would update.

8.3.3 global.R

In this file, you should include code that is static. This includes: loading libraries, sourcing functions, and potentially importing data. These activities are intended to occur once and will not be reacting to user inputs.
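
The division of labor among the three files can be sketched in a single script. This is a minimal illustration, not a complete app: the data set (samples.df), the input and output names (site and scatter), and the labels are hypothetical, and the code only runs when the shiny package is installed. In a real project each commented section would live in its own file.

```r
if (requireNamespace("shiny", quietly = TRUE)) {
  # global.R: static code that runs once (libraries, functions, data).
  library(shiny)
  samples.df <- data.frame(
    site = rep(c("site_a", "site_b"), each = 5),
    day = rep(1:5, times = 2),
    value = c(2, 4, 3, 5, 6, 8, 7, 9, 6, 7)
  )

  # ui.R: layout of the dropdown menu and the location of the figure.
  ui <- fluidPage(
    selectInput("site", "Sample site:", choices = unique(samples.df$site)),
    plotOutput("scatter")
  )

  # server.R: reactive code that responds to the dropdown selection.
  server <- function(input, output, session) {
    site.df <- reactive({
      samples.df[samples.df$site == input$site, ]
    })
    output$scatter <- renderPlot({
      plot(site.df()$day, site.df()$value, xlab = "Day", ylab = "Value")
    })
  }

  # Combine the pieces; call runApp(app) to launch the app interactively.
  app <- shinyApp(ui = ui, server = server)
}
```

Each time a new site is selected, the reactive expression site.df() is recalculated and the scatterplot is redrawn, while the code in the global.R portion is not run again.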

8.3.4 Structure

Shiny projects generally grow rapidly and it can become difficult to navigate hundreds of lines of code in a ui.R or server.R file. My preference is to break the code up into independent and more manageable scripts that are sourced in the ui.R or server.R files. For example, imagine you are developing an app which contains two tabs, one dedicated to tabular data and one dedicated to an interactive map. I would develop separate R scripts for server code associated with each tab. Similarly, I would create separate R scripts for the ui code associated with each tab. These files are stored in the appropriate folders labeled either “ui” or “server.” When sourcing files in a shiny app you must specify “local = TRUE”.

source("server/select_import_server.R", local = TRUE)

8.3.5 R-Packages

Most Shiny apps will require multiple R-packages. I recommend loading all of the necessary R-packages in the global.R file. This makes it simple to identify all the packages you must have installed locally to edit and develop a given shiny app. One way to simplify this task is to use the example provided by the following link: https://www.r-bloggers.com/install-and-load-missing-specifiedneeded-packages-on-the-fly/. Following this script’s template:

You specify all of the necessary R-packages.
The code checks that all these packages are installed.
If any packages are not installed, the code will install these packages.

This makes it easier to collaborate with others or work on multiple computers.
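
The three steps above can be sketched in base R as follows. This is not the linked script itself, only an approximation of the same idea; the package names are stand-ins (base packages are used so that the example runs without installing anything) and should be replaced with the packages your app needs.

```r
# Packages the app depends on; replace with your own list.
packages.vec <- c("stats", "utils")

# Identify any packages that are not yet in the local library and
# install them.
missing.vec <- setdiff(packages.vec, rownames(installed.packages()))
if (length(missing.vec) > 0) {
  install.packages(missing.vec)
}

# Load every package and record whether each one loaded successfully.
loaded.vec <- vapply(
  packages.vec,
  require,
  FUN.VALUE = logical(1),
  character.only = TRUE
)
loaded.vec
```

Placing code like this at the top of global.R means a collaborator only needs to run the app once to obtain any missing packages.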

8.4 Helpful R-Packages

8.4.1 DT (Interactive Tables)

The R-package, DT, is a great resource for creating interactive tables (https://rstudio.github.io/DT/).

8.4.2 dygraphs (Interactive Time Series Plots)

The R-package, dygraphs, is a great resource for creating interactive time series plots (https://rstudio.github.io/dygraphs/).

8.4.3 leaflet (Interactive Maps)

The R-package, leaflet, is a great resource for creating interactive maps (http://rstudio.github.io/leaflet/). When using this package in shiny there are a few steps you need to take to make the map function well (http://rstudio.github.io/leaflet/shiny.html). It is generally useful to render the base map once as shiny output and then use leafletProxy within reactive functions to update the points presented on the map. Using leafletProxy, only the map points will update; the base map will remain unchanged. This avoids reloading the entire map each time the points are updated, which would otherwise make the map appear to flash.
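
A minimal sketch of the leafletProxy pattern, assuming the shiny and leaflet packages are installed. The site names, coordinates, and input/output names (site and map) are hypothetical, and the default OpenStreetMap tiles are used in place of a custom base layer.

```r
if (requireNamespace("shiny", quietly = TRUE) &&
    requireNamespace("leaflet", quietly = TRUE)) {
  library(shiny)
  library(leaflet)

  # Hypothetical sample sites and coordinates.
  sites.df <- data.frame(
    site = c("site_a", "site_b"),
    lng = c(-77.0, -76.6),
    lat = c(38.9, 39.3)
  )

  ui <- fluidPage(
    selectInput("site", "Sample site:", choices = sites.df$site),
    leafletOutput("map")
  )

  server <- function(input, output, session) {
    # Render the base map once; it does not depend on any input.
    output$map <- renderLeaflet({
      leaflet() %>%
        addTiles() %>%
        setView(lng = -76.8, lat = 39.1, zoom = 8)
    })

    # Update only the points when the selected site changes.
    observe({
      selected.df <- sites.df[sites.df$site == input$site, ]
      leafletProxy("map", data = selected.df) %>%
        clearMarkers() %>%
        addCircleMarkers(lng = ~lng, lat = ~lat)
    })
  }

  # Call runApp(app) to launch the app interactively.
  app <- shinyApp(ui = ui, server = server)
}
```

Because renderLeaflet() does not read input$site, the base map is drawn only once; the observer that calls leafletProxy() is the only code that reruns when the selection changes.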

8.4.3.1 Mapbox

Mapbox is a platform that allows you to host interactive base map layers (https://www.mapbox.com). This is useful for shiny because these base layers can be referenced via the leaflet package function addTiles.

# The pipe (%>%) requires that leaflet (or magrittr) has been loaded, e.g., in global.R.
leaflet::leaflet() %>%
    leaflet::addTiles(urlTemplate = "https://api.mapbox.com/styles/v1/skaisericprb/citvqu6qb002p2jo1ig5hnvtk/tiles/256/{z}/{x}/{y}?access_token=pk.eyJ1Ijoic2thaXNlcmljcHJiIiwiYSI6ImNpa2U3cGN1dDAwMnl1cm0yMW94bWNxbDEifQ.pEG_X7fqCAowSN8Xr6rX8g")

8.4.4 plotly (Interactive Figures)

The R-package, plotly, is a great resource for creating interactive figures (https://plot.ly/r/).

8.5 Publishing

shinyapps.io is a shiny hosting platform provided by R Studio (http://www.shinyapps.io/). Users must create either a free or a paid shinyapps.io account. The free account limits the number of applications you can publish and the number of hours the apps can be active per month. There are multiple tiers of paid accounts. Higher tiers allow the user to publish more applications, provide more active hours per month, and include additional benefits.

8.5.1 How to Publish to shinyapps.io

Click on the “Publish to Server” button located in the top right corner of the source pane.
A drop-down menu will appear. Select the appropriate shinyapps.io app.
- If this is your first time connecting to a shinyapps.io account:
  - Make sure you have created a shinyapps.io account (https://www.shinyapps.io/admin/#/login).
  - In the drop-down menu select “Manage Accounts.” This will bring you to the “Publishing Accounts” section of R Studio’s “Options.”
  - Select “Connect…” -> “ShinyApps.io” option -> follow instructions.
The “Publish to Server” window will appear. Select all of the files that are necessary for the app to run. Do not include unnecessary files, as this could slow down your app or make the app too large to host under your current account settings. Check the “Launch browser” check box. Click “Publish.”
A “Deploy” tab will appear in the console pane, which will inform you that R Studio is working on publishing your app. This will take a few minutes. If there are any issues, the app will stop deploying and you will receive an error message.

R in Practice (DRAFT)

Zachary M. Smith

September 23, 2018