R and RStudio Tips and Tricks (Guide)

Mark Bounthavong

28 September 2024

Introduction

This is a guide on using R and RStudio. I wrote this guide as a way to capture all the various tips and tricks that I’ve accumulated as an R / RStudio user. I plan to update this periodically.

At the time of writing, I’m using R version 4.2.3 and RStudio version 2023.03.0 Build 386.

You may have a more recent (or older) build, but many (if not all) of these tips and tricks will work.

I recommend that you subscribe to R-bloggers, a great resource for R-related content (e.g., news, guides, tutorials, and updates).

Data Frame and Object

There are a lot of terms and lingo that we’ll introduce in this guide. The most common ones that you’ll learn about are in this section.

A data frame is a two-dimensional table that contains rows and columns; in other words, it is a table that contains your data.

An object can represent anything that is created in R. An object can be a value, vector, matrix, data frame, or the result of a function. We refer to anything that we create or assign a value to as an object. For example, we can assign a value of 5 to an object called x. Once we do this, we can use the object in a variety of ways. In this example, I printed the value of the object using the print(x) function. The output is the value 5.

x <- 5

print(x)
## [1] 5
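
To make the different kinds of objects more concrete, here is a minimal sketch that creates a vector, a matrix, a data frame, and an object that stores the result of a function. The object names (v, m, patients, avg) and the values in them are made up for illustration.

```r
# A vector: several values of the same type
v <- c(1, 2, 3)

# A matrix: values arranged in rows and columns
m <- matrix(1:4, nrow = 2)

# A data frame: a table with named columns (hypothetical data)
patients <- data.frame(
  id  = 1:3,
  sex = c("Male", "Female", "Male")
)

# The result of a function can also be stored as an object
avg <- mean(v)

print(avg)
## [1] 2

# dim() reports the number of rows and columns of the data frame
dim(patients)
## [1] 3 2
```

Each of these objects will appear in the Environment pane of RStudio once the code is run.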

Use RStudio instead of R when coding

R is the programming language in which we write code and run analyses.

RStudio is the graphical user interface that allows us to write scripts and view our output. Formally, it’s an integrated development environment (IDE) for R.

I recommend that you write your scripts in RStudio rather than typing code directly into R.

You can code in the R Console as needed. Sometimes I code directly in the R Console for a quick calculation or to test a piece of code. However, I recommend using the R script to maintain a log of your work.

Use the R Script, not the R Console

Scripting in RStudio allows you to write, edit, and save your R code without having to type it directly into the R Console. It is a great environment to maintain a record of the code that you are writing or testing. Additionally, you can save your script and share it with others.
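
As a small illustration of why a saved script is useful, the sketch below writes two lines of code to a temporary file and then runs the whole file with source(). The file contents and the object name total are hypothetical; in practice you would save your script from the RStudio editor instead of using writeLines().

```r
# Create a temporary .R file that stands in for a saved script
script_path <- tempfile(fileext = ".R")
writeLines(
  c("# A tiny saved script",
    "total <- 2 + 3"),
  script_path
)

# Run every line of the saved script
source(script_path)

print(total)
## [1] 5

# Remove the temporary file
unlink(script_path)
```

Anyone who receives the script file can run it the same way and reproduce the result.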

Let’s look at the R Console:

R Console.

Coding directly in the R Console will not allow you to edit or save your work. It’s best to use the R Script in RStudio for writing, editing, and saving your R-related work.

Here’s what RStudio looks like.

RStudio interface.

RStudio is divided into four quadrants or panes.

The R Script is in the top left quadrant. This pane is where you will write, edit, and save your R code. This is also the area where you will do most of your work.

The bottom left quadrant is the R Console. This is where you will view the output from your R code.

The top right quadrant contains information about the R environment such as data sets, data frames, and objects. This is where you can review the various data frames and objects that are created over the course of your analysis.

The bottom right quadrant is the Miscellaneous pane and contains the Files, Plots, Packages, and Help tabs. This is where your data visualizations will appear.

Once you get used to RStudio, you’ll find it to be quite convenient when coding in R.

Annotation

Annotation is a hallmark of good coding. Whenever possible, annotate your code. You annotate code in R using the hash symbol (#). R ignores everything on a line after the #.

Here is an example:

#### Annotation example
    
#### Load library
library("openxlsx")

I recommend using annotation to help you organize your code and help others replicate your work. It is also a good way to help you recall what you did. Many years later when you return to your code, you will be grateful for good annotations.
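
Here is a slightly longer sketch of annotated code. It shows a section comment, an inline comment, and a line of code that has been switched off by commenting it out. The object names and values are made up for illustration.

```r
#### Enter the data
heights_cm <- c(150, 160, 170)  # heights in centimeters (hypothetical values)

#### Summarize the data
mean_height <- mean(heights_cm)

# The next line is commented out, so R does not run it
# mean_height <- median(heights_cm)

print(mean_height)
## [1] 160
```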

Packages in R

An R package is a collection of R functions, help files, and example data bundled in a standardized format. Packages can be downloaded and installed from the Comprehensive R Archive Network (CRAN), the main international R resource site. R packages make coding in R much easier because they provide ready-made functions for tasks that would otherwise require complex code. Using the functions in an R package can increase productivity and efficiency.

Suppose we wanted to create a contingency table (2 x 2 matrix) where the frequency and proportions of males and females are grouped according to their diabetes diagnosis. We can use the gmodels package, which contains the CrossTable() function. You can read more about the gmodels package in the Comprehensive R Archive Network (CRAN) site.

To use a package in RStudio, you need to install it into your library using the install.packages() function and then load it using the library() function. Note: Once you install an R package, you don’t need to re-install it every time you start a new R session. However, you do need to load the R package each time you start an R session using the library() function.

#### Install gmodels package
## install.packages("gmodels")  ### This is the R code to install the "gmodels" package. ###

#### Load library
library("gmodels")
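
One common way to avoid re-installing a package that is already in your library is to check for it first. The sketch below is one possible approach, not the only one. The helper name is_installed is my own; requireNamespace() is a base R function. The install and load lines are commented out so that the example runs without an internet connection.

```r
# Hypothetical helper: returns TRUE if a package is already installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# The "stats" package ships with R, so this returns TRUE
is_installed("stats")

# Install gmodels only if it is missing, then load it
# if (!is_installed("gmodels")) install.packages("gmodels")
# library("gmodels")
```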

Functions in R

An R function is code that automates a specific action. Within a function are arguments, which need to be entered in order for the action to take place.

For example, the CrossTable() function has various arguments that are needed for it to work.

Here are some of the arguments in the CrossTable() function:

x: a variable from a data frame (independent variable)

y: a variable from a data frame (dependent variable)

The arguments x and y are needed for the CrossTable() function to generate a contingency table.
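
To see how arguments work in general, here is a minimal sketch of a user-defined function. The function name row_proportion and its arguments n and row_total are hypothetical and are not part of the gmodels package. The two counts are taken from the CrossTable() output shown later in this section.

```r
# Hypothetical function with two arguments:
#   n         = the count in one cell of a table
#   row_total = the total count for that row
row_proportion <- function(n, row_total) {
  n / row_total
}

# Arguments can be supplied by name, just like x and y in CrossTable()
round(row_proportion(n = 533, row_total = 6017), 3)
## [1] 0.089
```

If you leave out an argument that has no default value, R will stop with an error message when the function tries to use it.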

Here is a motivating CrossTable() example using data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS). The data is already in a data frame that is denoted as hc2021p. We want to create a contingency table for the number of males and females with and without diabetes who are in the hc2021p data frame.

CrossTable(x = hc2021p$sex, y = hc2021p$diabetes)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  11751 
## 
##  
##              | hc2021p$diabetes 
##  hc2021p$sex | No diabetes |    Diabetes |   Row Total | 
## -------------|-------------|-------------|-------------|
##         Male |        5484 |         533 |        6017 | 
##              |       0.100 |       1.086 |             | 
##              |       0.911 |       0.089 |       0.512 | 
##              |       0.510 |       0.536 |             | 
##              |       0.467 |       0.045 |             | 
## -------------|-------------|-------------|-------------|
##       Female |        5272 |         462 |        5734 | 
##              |       0.105 |       1.139 |             | 
##              |       0.919 |       0.081 |       0.488 | 
##              |       0.490 |       0.464 |             | 
##              |       0.449 |       0.039 |             | 
## -------------|-------------|-------------|-------------|
## Column Total |       10756 |         995 |       11751 | 
##              |       0.915 |       0.085 |             | 
## -------------|-------------|-------------|-------------|
## 
## 

From the output, there are 533 (8.9%) males with diabetes and 462 (8.1%) females with diabetes. You can interpret the output by looking at the “Cell Contents.” This key shows that the values in each cell of the contingency table denote, from top to bottom, the count (N), the Chi-square contribution, the row proportion, the column proportion, and the table proportion.
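
As a rough check on how to read the cell contents, the sketch below rebuilds the same quantities in base R from the four counts shown in the output. The object names (counts, row_prop, and so on) are my own, and this is not necessarily how CrossTable() computes the values internally; it only applies the standard formulas.

```r
# Counts copied from the CrossTable() output above
counts <- matrix(
  c(5484, 533,
    5272, 462),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Male", "Female"),
                  diabetes = c("No diabetes", "Diabetes"))
)

# N / Row Total, N / Col Total, and N / Table Total
row_prop   <- prop.table(counts, margin = 1)
col_prop   <- prop.table(counts, margin = 2)
table_prop <- prop.table(counts)

# Chi-square contribution: (observed - expected)^2 / expected
expected      <- outer(rowSums(counts), colSums(counts)) / sum(counts)
chisq_contrib <- (counts - expected)^2 / expected

# Values for the "Male, Diabetes" cell
round(row_prop["Male", "Diabetes"], 3)       # 0.089
round(col_prop["Male", "Diabetes"], 3)       # 0.536
round(table_prop["Male", "Diabetes"], 3)     # 0.045
round(chisq_contrib["Male", "Diabetes"], 3)  # 1.086
```

These four values match the second through fifth lines of the "Male, Diabetes" cell in the output.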

To see what arguments are in a function, you can type ?CrossTable into the RStudio Console. This will open the R documentation in the lower right quadrant of RStudio. The documentation contains helpful information about the function such as its description, arguments, and examples.

?CrossTable  ### Opens the R Documentation if the package is loaded
??CrossTable ### Searches the help files of all installed packages, even if the package is not loaded

It is important to learn the arguments in the function. If you run into an error with a function, it’s likely due to a missing argument or an incorrectly used argument. If that happens, pay attention to the error message in the R Console.
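
You can also inspect a function's arguments from code. The sketch below uses the base R function sd() as a stand-in so that it runs without any extra packages; the same calls should work for CrossTable() once gmodels is loaded. The object name msg is my own.

```r
# List the argument names of a function
names(formals(sd))
## [1] "x"     "na.rm"

# Capture the error message produced when a required argument is missing
msg <- tryCatch(
  sd(),
  error = function(e) conditionMessage(e)
)

# The message names the argument that was left out
print(msg)
```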

Change the theme

You can change the theme of RStudio according to your preferences (Tools > Global Options > Appearance). Try the different themes until you find one that you prefer.

Changing your RStudio theme.

Turn on “Show line numbers”

It is recommended that you turn on the “Show line numbers” option in RStudio. This will add the line number to each line of code in the R Script panel. This is very helpful when making edits or during a troubleshooting process.

Turn on show line numbers.

After you turn the option on, the line numbers will appear on the left side of the R Script.

Line numbers appear left of the R Script.

Summary

I hope these tips and tricks are helpful to users. As I gain more experience with R and RStudio, I will update this guide in the future. Meanwhile, I hope you have a fantastic journey using these tools in your own work.

Disclaimers

This is a work in progress, and I will update this periodically. This is for educational purposes only.