Introduction
This is a guide on using R and RStudio. I wrote this guide as a way to capture all the various tips and tricks that I’ve accumulated as an R / RStudio user. I plan to update this periodically.
While preparing this guide, I used R version 4.2.3 and RStudio version 2023.03.0 Build 386.
You may have a more recent (or older) build, but most (if not all) of these tips and tricks should still work.
I recommend that you subscribe to R-bloggers, a great resource for R-related content (e.g., news, guides, tutorials, and updates).
Data Frame and Object
There are a lot of terms and lingo that we’ll introduce in this guide. The most common ones that you’ll learn about are in this section.
A data frame is a two-dimensional table that contains rows and columns; in other words, it is the table that contains your data.
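To make the definition concrete, here is a minimal sketch that builds a small data frame by hand. The object name patients and its columns (id, sex, age) are made up for illustration and do not come from any real data set.

```r
# Build a small data frame: each argument becomes a column,
# and each position across the columns forms a row.
patients <- data.frame(
  id  = c(1, 2, 3),
  sex = c("Male", "Female", "Female"),
  age = c(54, 61, 47)
)

# Inspect the two dimensions: number of rows, then number of columns
dim(patients)

# List the column names
names(patients)

# Pull out a single column with the $ operator
patients$age
```

The $ operator shown on the last line is the same one used later in this guide to select columns such as hc2021p$sex.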
An object is anything that is created in R: a value, a vector, a matrix, a data frame, or the result of a function. Whenever we create something or assign it a name, we refer to it as an object. For example, we can assign a value of 5 to an object called x. Once we do this, we can use the object in a variety of ways. In this example, I print the value of the object using the print(x) function, which outputs a value of 5.
x <- 5
print(x)
## [1] 5
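The same assignment operator works for other kinds of objects. The sketch below uses made-up names (v, m, avg) to show a vector, a matrix, and the result of a function, each stored as an object.

```r
# A vector: several values stored in one object
v <- c(2, 4, 6)

# A matrix: values arranged in 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, ncol = 3)

# The result of a function can also be stored as an object
avg <- mean(v)

# class() reports what kind of object each one is
class(v)
class(m)

# Print the stored result
print(avg)
```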
Use RStudio instead of R when coding
R is the programming language in which we write code and run analyses.
RStudio is the graphical user interface that allows us to script and view our output. Formally, it’s the integrated development environment (IDE) for R.
I recommend that you write your code as a script in RStudio rather than typing it directly into R.
You can code in the R Console as needed. Sometimes I code directly in the R Console for a quick calculation or to test a piece of code. However, I recommend using the R script to maintain a log of your work.
Use the R Script, not the R Console
Scripting in RStudio allows you to write, edit, and save your R code rather than typing it directly into the R Console. It is a great environment to maintain a record of the code that you are writing or testing. Additionally, you can save your script and share it with others.
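A saved script is a plain-text file of R code, so it can be run again later in one step. The sketch below is only an illustration: it writes a two-line script to a temporary file (the name script_path is made up) and runs it with source(). In practice, you would save the script from the RStudio editor instead.

```r
# Write two lines of R code to a temporary script file
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 5", "y <- x * 2"), script_path)

# source() runs every line of the saved script in order
source(script_path)

# The objects created by the script are now available
print(y)
```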
Let’s look at the R Console:
R Console.
Coding directly into the R Console does not give you an easy way to edit or save your work. It’s best to use RStudio for writing, editing, and saving your R-related work.
Here’s what RStudio looks like.
RStudio interface.
RStudio is divided into four quadrants or panes.
The R Script is in the top left quadrant. This pane is where you will write, edit, and save your R code. This is also the area where you will do most of your work.
The bottom left quadrant is the R Console. This is where you will view the output from your R code.
The top right quadrant contains information about the R environment, such as the data sets, data frames, and objects. This is where you can review the various data frames and objects that are created over the course of your analysis.
The bottom right quadrant is the Miscellaneous pane and contains the Files, Plots, Packages, and Help tabs. This is where your data visualizations will appear.
Once you get used to RStudio, you’ll find it to be quite convenient when coding in R.
Annotation
Annotation is a hallmark of good coding. Whenever possible, annotate your code. You annotate code in RStudio using the hash symbol (#).
Here is an example:
#### Annotation example
#### Load library
library("openxlsx")
I recommend using annotation to help you organize your code and help others replicate your work. This also is a good method to help you recall what you did. Many years later when you return to your code, you will be grateful for good annotations.
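As a slightly longer sketch, annotations can label the sections of a script and explain individual lines. The blood pressure values and object names below are hypothetical. In RStudio, a comment line that ends with four or more dashes is also treated as a code section, which makes long scripts easier to navigate.

```r
#### Section 1: Create example data ----
# Hypothetical systolic blood pressure readings (mmHg)
sbp <- c(118, 135, 142, 127)

#### Section 2: Summarize ----
mean_sbp <- mean(sbp)   # average reading across the four values
print(mean_sbp)
```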
Packages in R
An R package is a collection of tools that contains R functions, help files, and example data. Packages follow a standardized format that R can download and install safely from the Comprehensive R Archive Network (CRAN), the main international R resource site. R packages make coding in R much easier because they provide ready-made functions for complex tasks, which can increase productivity and efficiency.
Suppose we wanted to create a contingency table (2 x 2 matrix) where the frequency and proportions of males and females are grouped according to their diabetes diagnosis. We can use the gmodels package, which contains the CrossTable() function. You can read more about the gmodels package on the Comprehensive R Archive Network (CRAN) site.
To use a package in RStudio, you need to install it into your library using the install.packages() function and then load it using the library() function. Note: Once you install an R package, you don’t need to re-install it every time you start a new R session. However, you do need to load the package with the library() function each time you start an R session.
#### Install gmodels package
## install.packages("gmodels") ### This is the R code to install the "gmodels" package. ###
#### Load library
library("gmodels")
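Because installing is a one-time step and loading is needed in every session, some users combine the two. The sketch below is one possible approach, not a required one; the helper name ensure_package is made up for this example.

```r
# Hypothetical helper: load a package, installing it first only if it is missing
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # runs only when the package is not yet installed
  }
  library(pkg, character.only = TRUE)   # load the package for the current session
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without installing anything
ensure_package("stats")

# For the package used in this guide, the call would be:
# ensure_package("gmodels")
```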
Functions in R
An R function is code that automates a specific action. Within a function are arguments, which need to be entered in order for the action to take place.
For example, the CrossTable() function has various arguments that are needed for it to work.
Here are some of the arguments in the CrossTable() function:
x: a variable from a data frame (independent variable)
y: a variable from a data frame (dependent variable)
The arguments x and y are needed for the CrossTable() function to cross-tabulate two variables into a contingency table.
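The idea of required arguments can be sketched with a small user-defined function. The function bmi and its arguments weight_kg and height_m are hypothetical and are not part of any package; they only illustrate how named arguments are supplied.

```r
# Hypothetical function with two arguments and no default values
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Supply both arguments by name, as in CrossTable(x = ..., y = ...)
result <- bmi(weight_kg = 70, height_m = 1.75)
print(round(result, 1))

# Named arguments can be given in any order
bmi(height_m = 1.75, weight_kg = 70)
```

Naming the arguments makes the call easier to read and protects against passing values in the wrong order.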
Here is a motivating CrossTable() example using data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS). The data are already in a data frame called hc2021p. We want to create a contingency table for the number of males and females with and without diabetes who are in the hc2021p data frame.
CrossTable(x = hc2021p$sex, y = hc2021p$diabetes)
##
##
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 11751
##
##
##              | hc2021p$diabetes
##  hc2021p$sex | No diabetes |    Diabetes |   Row Total |
## -------------|-------------|-------------|-------------|
##         Male |        5484 |         533 |        6017 |
##              |       0.100 |       1.086 |             |
##              |       0.911 |       0.089 |       0.512 |
##              |       0.510 |       0.536 |             |
##              |       0.467 |       0.045 |             |
## -------------|-------------|-------------|-------------|
##       Female |        5272 |         462 |        5734 |
##              |       0.105 |       1.139 |             |
##              |       0.919 |       0.081 |       0.488 |
##              |       0.490 |       0.464 |             |
##              |       0.449 |       0.039 |             |
## -------------|-------------|-------------|-------------|
## Column Total |       10756 |         995 |       11751 |
##              |       0.915 |       0.085 |             |
## -------------|-------------|-------------|-------------|
##
##
From the output, there are 533 (8.9%) males with diabetes and 462 (8.1%) females with diabetes; these percentages are the row proportions. You can interpret the output by looking at the “Cell Contents” key. It shows that each cell of the contingency table lists, from top to bottom, the count (N), the Chi-square contribution, the row proportion, the column proportion, and the table proportion.
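To see where these numbers come from, the sketch below recomputes them from the four cell counts in the output above using base R functions. It is not how CrossTable() works internally; it is only a way to check the arithmetic. The object names (counts, row_prop, and so on) are made up for this example.

```r
# Cell counts copied from the CrossTable() output above
counts <- matrix(
  c(5484, 533,
    5272, 462),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Male", "Female"),
                  diabetes = c("No diabetes", "Diabetes"))
)

# N / Row Total: proportions within each sex
row_prop <- prop.table(counts, margin = 1)

# N / Col Total: proportions within each diabetes group
col_prop <- prop.table(counts, margin = 2)

# N / Table Total: proportions of all observations
table_prop <- prop.table(counts)

# Chi-square contribution: (observed - expected)^2 / expected
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
chisq_contrib <- (counts - expected)^2 / expected

# Round to three decimals to compare with the CrossTable() output
round(row_prop, 3)
round(chisq_contrib, 3)
```

For example, the row proportion for males with diabetes is 533 / 6017, which rounds to the 0.089 shown in the table.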
To see what arguments are in a function, you can type ?CrossTable into the RStudio Console. This will open an R documentation page in the lower right quadrant of RStudio. The page contains helpful information about the function such as its description, arguments, and examples.
?CrossTable ### Opens the R Documentation if the package is loaded
??CrossTable ### Searches the help pages of all installed packages, even if the package is not loaded
It is important to learn the arguments in the function. If you run into an error with a function, it’s likely due to a missing argument or incorrectly using the argument. If that happens, pay attention to the error message in the R Console.
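Besides the help page, the arguments of a function can be inspected from code. The sketch below uses the base R function sd() so that it runs without any extra packages; the object name msg is made up. It also shows one way to capture the error message that appears when a required argument is missing.

```r
# List the argument names of a function without opening the help page
names(formals(sd))

# args() prints the full signature, including default values
args(sd)

# Calling a function without a required argument raises an error;
# tryCatch() captures the message so it can be read or stored
msg <- tryCatch(
  sd(),
  error = function(e) conditionMessage(e)
)
print(msg)
```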
Change the theme
You can change the theme of RStudio according to your preferences (in the menu, go to Tools > Global Options > Appearance). Try the different themes until you find one that you prefer.
Changing your RStudio theme.
Turn on “Show line numbers”
I recommend turning on the “Show line numbers” option in RStudio (Tools > Global Options > Code > Display). This adds a line number next to each line of code in the R Script pane, which is very helpful when making edits or troubleshooting.
Turn on show line numbers.
After you turn the option on, the line numbers will appear on the left side of the R Script.
Line numbers appear left of the R Script.
Summary
I hope these tips and tricks are helpful to users. As I gain more experience with R and RStudio, I will update this guide in the future. Meanwhile, I hope you have a fantastic journey using these tools in your own work.
Disclaimers
This is a work in progress, and I will update it periodically. This is for educational purposes only.