Introduction to RStudio

RStudio is a four-pane workspace for

  1. creating files containing R scripts,
  2. typing R commands,
  3. viewing command histories, and
  4. viewing plots and more.

The four panels:

  1. Top-left panel: Code editor for creating and opening files that contain R scripts. The R script is where you keep a record of your work. You can also type and run R commands here, which is what we will do.

  2. Bottom-left panel: R console for typing R commands

  3. Top-right panel:

    • Environment tab: shows the list of R objects you created during your R session
    • History tab: shows the history of all previous commands

  4. Bottom-right panel:

    • Files tab: shows the files and folders in your working directory. The folders listed here are the ones on my computer where R is installed. You will want to create a folder for this course, similar to what I have done, by clicking the “New Folder” icon on the toolbar of this tab.
    • Plots tab: shows the history of plots you created. From this tab, you can export a plot to a PDF or an image file.
    • Packages tab: shows the external R packages available on your system. A checked box means the package is loaded in R.

Some basic operations in R

Simple operations:

Arithmetic Operators include:

Operator Description
+ addition
- subtraction
* multiplication
/ division
^ or ** exponentiation

Logical Operators include:

Operator Description
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== exactly equal to
!= not equal to

3+2
## [1] 5
3-2
## [1] 1
3*2
## [1] 6
3/2
## [1] 1.5
3^2
## [1] 9
2>2 # FALSE
## [1] FALSE
2>=2 # TRUE
## [1] TRUE
2==2 # TRUE
## [1] TRUE
2!=2 # FALSE
## [1] FALSE

Defining and storing scalars

a=4 # save 4 in a
b=3 # save 3 in b
c=a*b # compute a*b and save a*b in c
c # print c in the console
## [1] 12
d = (2>=4) # compute 2>=4 and save it as d
d # print d in the console
## [1] FALSE

Vectors

v=c(1,4,3,2) # create a vector [1,4,3,2]
v # print v in the console
## [1] 1 4 3 2
u=1:4 # generate the vector 1,2,3,4 and save it as u
u # print u in the console
## [1] 1 2 3 4
u=c(1,2,3,4)
u # print u in the console
## [1] 1 2 3 4

Element-wise operations for vectors

v+2
## [1] 3 6 5 4
v*2
## [1] 2 8 6 4
u^2
## [1]  1  4  9 16
u+v
## [1] 2 6 6 6
u*v
## [1] 1 8 9 8
# element-wise logical comparisons
v
## [1] 1 4 3 2
v>2
## [1] FALSE  TRUE  TRUE FALSE
v==2
## [1] FALSE FALSE FALSE  TRUE
v>=2
## [1] FALSE  TRUE  TRUE  TRUE

Subsetting vectors

v = c(1,4,3,2)
v[2] # the second element of v
## [1] 4
v[c(1,4)] # the first and fourth elements of v
## [1] 1 2
v[1:3] # the first, second, and third elements of v
## [1] 1 4 3
v[c(FALSE,FALSE,TRUE,TRUE)] # the third and fourth elements of v
## [1] 3 2
# subsetting using logical conditions
v
## [1] 1 4 3 2
which(v>2) # returns c(2,3), the positions where v>2
## [1] 2 3
v[which(v>2)] # uses those positions to return the matching elements of v
## [1] 4 3
v[c(2,3)] # gives 4,3 again
## [1] 4 3
v[v>2] # gives 4,3 again
## [1] 4 3
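As a small worked example of the two equivalent forms above, the sketch below keeps only the values above a cutoff. The vector scores and the value threshold are hypothetical and chosen only for illustration.

```r
# Hypothetical data, for illustration only
scores = c(55, 72, 90, 48, 81)
threshold = 60

above = scores > threshold   # logical vector, one entry per element
idx = which(above)           # positions where the condition is TRUE

by_position = scores[idx]    # subset by position
by_logical = scores[above]   # subset directly by the logical vector

by_logical
## [1] 72 90 81
identical(by_position, by_logical) # both forms select the same elements
## [1] TRUE
```

The direct logical form is shorter; which() is mainly useful when you need the positions themselves.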

Use some basic R functions

Almost everything in R is done through functions! Here we highlight some essential functions for simple computations.

v = c(1.2,4.6,3.3,2.8) 
sqrt(v)
## [1] 1.095445 2.144761 1.816590 1.673320
log(v)
## [1] 0.1823216 1.5260563 1.1939225 1.0296194
round(v)
## [1] 1 5 3 3
round(v, digits=1)
## [1] 1.2 4.6 3.3 2.8
sum(v)
## [1] 11.9
mean(v)
## [1] 2.975
hist(v)

w = 3*v + 2
plot(v, w)

u = c("Jane","Jane","John","Jane","John")
table(u)
## u
## Jane John 
##    3    2
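To see how these functions combine, here is a minimal sketch that reproduces mean(v) by hand. It also uses length(), which is not introduced above and returns the number of elements in a vector.

```r
v = c(1.2, 4.6, 3.3, 2.8)

n = length(v)              # number of elements in v
manual_mean = sum(v) / n   # add up the values and divide by the count

manual_mean
## [1] 2.975
all.equal(manual_mean, mean(v)) # agrees with the built-in function
## [1] TRUE
```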

Data frames

A data frame is a table in which each column contains the values of one variable and each row contains one observation, that is, one value from each column. Suppose we have measured “height” (in cm) and determined “sex” (m/f) for 16 individuals.

height <- c(184.0, 174.2, 166.6, 193.2, 173.8, 166.4, 175.4, 183.3, 159.4, 171.8, 179.2, 165.8, 170.4, 178.1, 171.4, 159.7)
sex <- c('m', 'm', 'm', 'm', 'm', 'm', 'm', 'm', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f')

We create a data frame by running the following line:

dat = data.frame(height, sex)
dat
##    height sex
## 1   184.0   m
## 2   174.2   m
## 3   166.6   m
## 4   193.2   m
## 5   173.8   m
## 6   166.4   m
## 7   175.4   m
## 8   183.3   m
## 9   159.4   f
## 10  171.8   f
## 11  179.2   f
## 12  165.8   f
## 13  170.4   f
## 14  178.1   f
## 15  171.4   f
## 16  159.7   f

The environment now holds the data frame in an object called dat. It combines height and sex, which still exist as separate vector objects, and having both copies can easily create confusion. To remove the objects height and sex, we can use the function rm().

rm(sex)
rm(height)

The two objects have disappeared from the list in the Environment window on the right. If we now ask for height (or sex), R returns an error.
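One way to check this without stopping a script is sketched below. It uses a shortened, hypothetical height vector, along with exists() and tryCatch(), two base R functions that are not otherwise covered here.

```r
height = c(184.0, 174.2, 166.6)   # shortened hypothetical vector
dat = data.frame(height)
rm(height)

# FALSE: height is no longer an object in the current environment
exists("height", inherits = FALSE)
## [1] FALSE

# The values are still available through the data frame
dat$height
## [1] 184.0 174.2 166.6

# Asking for the removed object raises an error; tryCatch() captures the
# error message (in an English locale, typically "object 'height' not found")
msg = tryCatch(height, error = function(e) conditionMessage(e))
msg
```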

The information that was stored in height and sex is still available in dat. We can access that information using the $ sign as follows:

dat$height
##  [1] 184.0 174.2 166.6 193.2 173.8 166.4 175.4 183.3 159.4 171.8 179.2 165.8
## [13] 170.4 178.1 171.4 159.7
dat$sex
##  [1] "m" "m" "m" "m" "m" "m" "m" "m" "f" "f" "f" "f" "f" "f" "f" "f"

We can add a column by running dat$new_column_name = vector, where the vector holds the values to attach (here, one value per row). For instance,

dat$smoking_status = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0)
dat
##    height sex smoking_status
## 1   184.0   m              0
## 2   174.2   m              0
## 3   166.6   m              0
## 4   193.2   m              0
## 5   173.8   m              0
## 6   166.4   m              1
## 7   175.4   m              0
## 8   183.3   m              0
## 9   159.4   f              1
## 10  171.8   f              0
## 11  179.2   f              0
## 12  165.8   f              1
## 13  170.4   f              1
## 14  178.1   f              0
## 15  171.4   f              1
## 16  159.7   f              0
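A new column can also be computed from an existing one. The sketch below uses a shortened, hypothetical version of dat, and the column names height_m and tall are made up for illustration.

```r
# Shortened hypothetical data frame
dat = data.frame(height = c(184.0, 159.4, 171.8),
                 sex = c("m", "f", "f"))

dat$height_m = dat$height / 100   # height in metres, computed element-wise
dat$tall = dat$height > 170       # logical column from a comparison

names(dat)
## [1] "height"   "sex"      "height_m" "tall"
dat$height_m
## [1] 1.840 1.594 1.718
dat$tall
## [1]  TRUE FALSE  TRUE
```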

We can subset a data frame in much the same way as a vector, now giving the row indices first and the column indices second:

dat[c(1,3,4), c(1,2)] # rows 1, 3, 4 and columns 1, 2
##   height sex
## 1  184.0   m
## 3  166.6   m
## 4  193.2   m
dat[c(1,3,4), ] # rows 1, 3, 4 and all columns
##   height sex smoking_status
## 1  184.0   m              0
## 3  166.6   m              0
## 4  193.2   m              0
dat[, c(1,2)] # all rows and columns 1, 2
##    height sex
## 1   184.0   m
## 2   174.2   m
## 3   166.6   m
## 4   193.2   m
## 5   173.8   m
## 6   166.4   m
## 7   175.4   m
## 8   183.3   m
## 9   159.4   f
## 10  171.8   f
## 11  179.2   f
## 12  165.8   f
## 13  170.4   f
## 14  178.1   f
## 15  171.4   f
## 16  159.7   f
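The logical subsetting shown earlier for vectors carries over to the rows of a data frame. The sketch below assumes a shortened, hypothetical version of dat and selects rows by a condition on a column.

```r
# Shortened hypothetical data frame
dat = data.frame(height = c(184.0, 174.2, 159.4, 171.8),
                 sex = c("m", "m", "f", "f"))

is_female = dat$sex == "f"     # logical vector, one entry per row
females = dat[is_female, ]     # keep the rows where the condition is TRUE

females$height
## [1] 159.4 171.8

# Mean height among the selected rows
mean(dat$height[is_female])
## [1] 165.6
```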

Import and save data

Set the working directory

The working directory is the default location of any files you read into R or save out of R. You can only have one working directory active at any given time. The active working directory is called your current working directory.

To see your current working directory, use getwd().

To set a new working directory, use setwd(path_to_the_folder). Alternatively, browse to the desired folder in the Files tab of the bottom-right RStudio panel and click More (the gear icon) > Set As Working Directory.
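A minimal sketch of switching the working directory and switching back is shown below. It uses R's per-session temporary folder, returned by tempdir(), as a stand-in for your course folder; that choice is an assumption made so the example runs on any computer.

```r
old_wd = getwd()      # remember where we started

new_wd = tempdir()    # stand-in for your own course folder
setwd(new_wd)         # make it the current working directory
moved_to = getwd()    # now reports the new location

setwd(old_wd)         # switch back to the original folder
```

In your own scripts, replace tempdir() with the path to your folder, for example setwd("path_to_the_folder").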

Import data

Text file

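For a tab-delimited text file, you can use the read.table() function. By default, read.table() treats any whitespace (tabs or spaces) as a separator, which works when no field contains spaces: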

df <- read.table(file = "file.txt", header = TRUE)

The argument header = TRUE tells R that the first row of the text file contains the variable names.
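When a field can contain spaces, passing sep = "\t" tells read.table() to split on tabs only. The sketch below first writes a small, made-up tab-delimited file to R's temporary folder and then reads it back. The file name and contents are hypothetical.

```r
# Write a small hypothetical tab-delimited file to the temporary folder
txt_file = file.path(tempdir(), "example.txt")
writeLines(c("name\tcity\tage",
             "Jane\tNew York\t31",
             "John\tBoston\t27"), txt_file)

# sep = "\t" keeps "New York" together as a single field
people = read.table(file = txt_file, header = TRUE, sep = "\t")

names(people)
## [1] "name" "city" "age"
people$age
## [1] 31 27
```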

CSV file

Another commonly used data format is the comma-separated values (CSV) file. To import a CSV file into R, you can use the read.csv() function:

df2 <- read.csv(file = "file.csv", header = TRUE)

R data

The .RData format is used to save multiple R objects in a single file. To load an .RData file, use the load() function.

# Load objects in myFile.RData into my workspace
load(file = "myFile.RData")

Other files

There are many other types of data files that can be imported in R, which we will not discuss here. Usually, it is a matter of finding the right function and syntax. You may try File > Import Dataset > From Text (readr) from the RStudio menu, which usually works well.

Save data

We can save the data frame into a tab-delimited text file or a CSV file. For instance,

write.csv(x = df, file = "new_csv_file.csv", row.names = FALSE)

exports the data frame df to new_csv_file.csv.
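For the tab-delimited case, a comparable sketch uses write.table() with sep = "\t". The data frame and file name below are hypothetical, and the file is written to R's temporary folder so the example does not touch your working directory.

```r
# Hypothetical data frame for illustration
df = data.frame(id = c(1, 2, 3), group = c("a", "b", "a"))

out_file = file.path(tempdir(), "new_text_file.txt")

# quote = FALSE writes text values without surrounding quotation marks
write.table(x = df, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read the file back to check that the contents survived the round trip
df_back = read.table(file = out_file, header = TRUE, sep = "\t")
names(df_back)
## [1] "id"    "group"
```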

You can save one or more objects in your workspace as an .RData file.

# Save two objects as a new .RData file in my current working directory
save(list = c("v","w"), file="myFile.RData") # save v and w as myFile.RData
load("myFile.RData") # load v and w back into the R workspace
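To confirm that load() really restores the saved objects, the sketch below saves v and w, removes them, and loads them again. Writing to R's temporary folder via tempdir() is an assumption made here so the example leaves your working directory untouched.

```r
v = c(1.2, 4.6, 3.3, 2.8)
w = 3 * v + 2

rdata_file = file.path(tempdir(), "myFile.RData")
save(list = c("v", "w"), file = rdata_file)

rm(v, w)                    # remove both objects from the workspace

loaded = load(rdata_file)   # load() returns the names of the restored objects
loaded
## [1] "v" "w"
v
## [1] 1.2 4.6 3.3 2.8
```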