RStudio is a four-pane workspace for working with R:
Top-left panel: Code editor, where you create and open files containing R script. The R script is where you keep a record of your work. You can also type and run R commands here, which is what we will do.
Bottom-left panel: R console for typing R commands
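Top-right panel: Environment, listing the objects currently stored in your workspace
Bottom-right panel: Files, plots, packages, and help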
Arithmetic Operators include:
| Operator | Description |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| ^ or ** | exponentiation |
Logical Operators include:
| Operator | Description |
|---|---|
| > | greater than |
| >= | greater than or equal to |
| == | exactly equal to |
| != | not equal to |
3+2
## [1] 5
3-2
## [1] 1
3*2
## [1] 6
3/2
## [1] 1.5
3^2
## [1] 9
2>2 # FALSE
## [1] FALSE
2>=2 # TRUE
## [1] TRUE
2==2 # TRUE
## [1] TRUE
2!=2 # FALSE
## [1] FALSE
a=4 # save 4 in a
b=3 # save 3 in b
c=a*b # compute a*b and save a*b in c
c # print c in the console
## [1] 12
d = (2>=4) # compute 2>=4 and save it as d
d # print d in the console
## [1] FALSE
v=c(1,4,3,2) # create a vector [1,4,3,2]
v # print v in the console
## [1] 1 4 3 2
u=1:4 # generate the vector 1,2,3,4 and save it as u
u # print u in the console
## [1] 1 2 3 4
u=c(1,2,3,4)
u # print u in the console
## [1] 1 2 3 4
v+2
## [1] 3 6 5 4
v*2
## [1] 2 8 6 4
u^2
## [1] 1 4 9 16
u+v
## [1] 2 6 6 6
u*v
## [1] 1 8 9 8
## logical arguments
v
## [1] 1 4 3 2
v>2
## [1] FALSE TRUE TRUE FALSE
v==2
## [1] FALSE FALSE FALSE TRUE
v>=2
## [1] FALSE TRUE TRUE TRUE
v = c(1,4,3,2)
v[2] # the second element of v
## [1] 4
v[c(1,4)] # the first and fourth element of v
## [1] 1 2
v[1:3] # the first, second, and third element of v
## [1] 1 4 3
v[c(FALSE,FALSE,TRUE,TRUE)] # the third and fourth element of v
## [1] 3 2
## subsetting using logical arguments
v
## [1] 1 4 3 2
which(v>2) # outputs c(2,3) which are the locations where v>2
## [1] 2 3
v[which(v>2)] # takes c(2,3) as stated above and gives the v linked to them
## [1] 4 3
v[c(2,3)] # gives 4,3 again
## [1] 4 3
v[v>2] # gives 4,3 again
## [1] 4 3
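The subsetting forms above all return the same elements. The following minimal sketch checks this for the vector v used above, and also counts the matching elements with sum(), which treats TRUE as 1 and FALSE as 0. The names idx, by_position, and by_logical are only illustrative.

```r
v <- c(1, 4, 3, 2)

idx <- which(v > 2)           # positions where the condition holds: 2 3
by_position <- v[idx]         # subset using those positions
by_logical  <- v[v > 2]       # subset directly with the logical vector

identical(by_position, by_logical)   # TRUE: both give 4 3
sum(v > 2)                           # 2: the number of elements greater than 2
```

Subsetting with the logical vector directly is usually the shortest form; which() is handy when the positions themselves are of interest.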
Almost everything in R is done through functions! Here we highlight some essential functions for simple computations.
sqrt(): calculates the square root of a (set of) numeric value(s).
log(): calculates the natural log of a (set of) numeric value(s).
round(): rounds a (set of) numeric value(s) to the specified number of decimal places (default 0).
sum(): calculates the sum of a set of numeric values.
mean(): calculates the mean of a set of numeric values.
hist(): draws a histogram from a set of numeric values.
plot(): draws a scatterplot from two vectors.
table(): creates a frequency table from a set of numeric or character values.
v = c(1.2,4.6,3.3,2.8)
sqrt(v)
## [1] 1.095445 2.144761 1.816590 1.673320
log(v)
## [1] 0.1823216 1.5260563 1.1939225 1.0296194
round(v)
## [1] 1 5 3 3
round(v, digits=1)
## [1] 1.2 4.6 3.3 2.8
sum(v)
## [1] 11.9
mean(v)
## [1] 2.975
hist(v)
w = 3*v + 2
plot(v, w)
u = c("Jane","Jane","John","Jane","John")
table(u)
## u
## Jane John
## 3 2
A data frame is a table in which each column contains values of one variable and each row contains one set of values from each column. Suppose we have measured “height” (in cm) and determined “sex” (m/f) for 16 individuals.
height <- c(184.0, 174.2, 166.6, 193.2, 173.8, 166.4, 175.4, 183.3, 159.4, 171.8, 179.2, 165.8, 170.4, 178.1, 171.4, 159.7)
sex <- c('m', 'm', 'm', 'm', 'm', 'm', 'm', 'm', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f')
We create a data frame by running the following line:
dat = data.frame(height, sex)
dat
## height sex
## 1 184.0 m
## 2 174.2 m
## 3 166.6 m
## 4 193.2 m
## 5 173.8 m
## 6 166.4 m
## 7 175.4 m
## 8 183.3 m
## 9 159.4 f
## 10 171.8 f
## 11 179.2 f
## 12 165.8 f
## 13 170.4 f
## 14 178.1 f
## 15 171.4 f
## 16 159.7 f
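To check that a data frame has the shape we expect (one row per individual, one column per variable), a few base R functions are useful. The sketch below uses a hypothetical four-row data frame called small, built from the first four individuals above, and names the columns directly inside data.frame() so that no separate vectors are created.

```r
# 'small' is an illustrative name; the values are the first four rows of the example
small <- data.frame(height = c(184.0, 174.2, 166.6, 193.2),
                    sex    = c("m", "m", "m", "m"))

nrow(small)    # number of rows (individuals): 4
ncol(small)    # number of columns (variables): 2
names(small)   # column names: "height" "sex"
str(small)     # compact overview of each column's type and first values
```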
R stores the data frame in an object called dat. This object is the
combination of height and sex, which are also still stored as separate
vector objects. Keeping both copies can easily create confusion! To remove
the separate objects height and sex, we can use the function
rm().
rm(sex)
rm(height)
The two objects have disappeared from the list in the Environment
window on the right. If we now ask for height
(or sex), R returns an error.
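A small self-contained sketch of this behaviour, using a hypothetical two-row data frame dat_small: after rm(), the standalone vectors are gone, but the columns inside the data frame are untouched. Here exists() and tryCatch() are used only to show the outcome without stopping the script; the exact wording of the error message may differ between R versions and language settings.

```r
height <- c(184.0, 174.2)
sex <- c("m", "m")
dat_small <- data.frame(height, sex)   # 'dat_small' is an illustrative name

rm(height, sex)                        # rm() accepts several objects at once

exists("height", inherits = FALSE)     # FALSE: the vector is no longer in the workspace

# Asking for the removed object raises an error; capture its message instead of stopping
msg <- tryCatch(height, error = function(e) conditionMessage(e))
msg

dat_small$height                       # the column in the data frame is still there
```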
The information that was stored in height and
sex is still available in dat. We can access
that information using the $ operator as follows:
dat$height
## [1] 184.0 174.2 166.6 193.2 173.8 166.4 175.4 183.3 159.4 171.8 179.2 165.8
## [13] 170.4 178.1 171.4 159.7
dat$sex
## [1] "m" "m" "m" "m" "m" "m" "m" "m" "f" "f" "f" "f" "f" "f" "f" "f"
We can add a column by assigning the vector we want to attach to a new
column name, as in dat$new_column_name = new_vector. For
instance,
dat$smoking_status = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0)
dat
## height sex smoking_status
## 1 184.0 m 0
## 2 174.2 m 0
## 3 166.6 m 0
## 4 193.2 m 0
## 5 173.8 m 0
## 6 166.4 m 1
## 7 175.4 m 0
## 8 183.3 m 0
## 9 159.4 f 1
## 10 171.8 f 0
## 11 179.2 f 0
## 12 165.8 f 1
## 13 170.4 f 1
## 14 178.1 f 0
## 15 171.4 f 1
## 16 159.7 f 0
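A new column can also be computed from an existing one. The sketch below assumes a hypothetical three-row data frame dat_small and adds height in metres plus a logical column; the column names height_m and tall and the 170 cm cut-off are made up for illustration.

```r
dat_small <- data.frame(height = c(184.0, 174.2, 166.6),
                        sex    = c("m", "m", "f"))

dat_small$height_m <- dat_small$height / 100   # convert cm to m, one value per row
dat_small$tall     <- dat_small$height > 170   # logical column: TRUE TRUE FALSE

dat_small
```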
We can subset the data frame by row and column indices, similarly to a vector:
dat[c(1,3,4), c(1,2)] # rows 1, 3, 4 and columns 1, 2
## height sex
## 1 184.0 m
## 3 166.6 m
## 4 193.2 m
dat[c(1,3,4), ] # rows 1, 3, 4 and all columns
## height sex smoking_status
## 1 184.0 m 0
## 3 166.6 m 0
## 4 193.2 m 0
dat[, c(1,2)] # all rows and columns 1, 2
## height sex
## 1 184.0 m
## 2 174.2 m
## 3 166.6 m
## 4 193.2 m
## 5 173.8 m
## 6 166.4 m
## 7 175.4 m
## 8 183.3 m
## 9 159.4 f
## 10 171.8 f
## 11 179.2 f
## 12 165.8 f
## 13 170.4 f
## 14 178.1 f
## 15 171.4 f
## 16 159.7 f
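Just as with vectors, rows can also be selected with a logical condition, and columns can be selected by name instead of by position. A minimal sketch on a hypothetical four-row data frame dat_small (the 172 cm threshold is arbitrary):

```r
dat_small <- data.frame(height = c(184.0, 174.2, 159.4, 171.8),
                        sex    = c("m", "m", "f", "f"))

# Rows where sex equals "f", all columns
females <- dat_small[dat_small$sex == "f", ]
females

# Rows where height exceeds 172, columns chosen by name
tall <- dat_small[dat_small$height > 172, c("height", "sex")]
tall
```

Note the comma inside the square brackets: the condition before it selects rows, and whatever follows it selects columns.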
The working directory is the default location of any files you read into R or save out of R. You can only have one working directory active at any given time. The active working directory is called your current working directory.
To see your current working directory, use getwd().
To set a new working directory, use
setwd(path_to_the_folder), or browse to the
desired folder in the Files tab of the lower-right RStudio panel and click
the gear icon (More) > Set As Working Directory
to set the folder as the working directory.
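A short sketch of both functions. Since we cannot know which folders exist on your computer, the example uses R's session temporary folder, tempdir(), as a stand-in for path_to_the_folder, and switches back afterwards.

```r
old_wd <- getwd()     # remember the current working directory
old_wd

setwd(tempdir())      # tempdir() stands in for path_to_the_folder
getwd()               # now reports the temporary folder

setwd(old_wd)         # switch back to where we started
getwd()
```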
For a tab-delimited text file, you can use the
read.table() function:
df <- read.table(file = "file.txt", header = TRUE)
The argument header = TRUE tells R that the first row of
the text file contains the variable names.
Another commonly used data format is the comma-separated values (CSV)
file. To import a CSV file into R, you can use the
read.csv() function:
df2 <- read.csv(file = "file.csv", header = TRUE)
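The files file.txt and file.csv above are placeholders, so those lines only run if such files exist in your working directory. The following self-contained sketch first writes two tiny example files to temporary locations and then reads them back. The file contents and the names txt_file, csv_file, df_txt, and df_csv are made up for illustration, and sep = "\t" is added to state the tab delimiter explicitly.

```r
# Create a small tab-delimited file with a header row
txt_file <- tempfile(fileext = ".txt")
writeLines(c("height\tsex", "184.0\tm", "159.4\tf"), txt_file)
df_txt <- read.table(file = txt_file, header = TRUE, sep = "\t")

# Create the same data as a CSV file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("height,sex", "184.0,m", "159.4,f"), csv_file)
df_csv <- read.csv(file = csv_file, header = TRUE)

names(df_txt)     # "height" "sex", taken from the first row of the file
df_txt$height     # 184.0 159.4
df_csv
```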
R data
The .RData format is used to save multiple R objects. To
load an .RData file, use the load()
function.
# Load objects in myFile.RData into my workspace
load(file = "myFile.RData")
There are many other types of data files that can be imported in R,
which we will not discuss here. Usually, it is a matter of finding the
right function and syntax. You may also try
File > Import Dataset > From Text (readr) from the
dropdown menu of RStudio, which usually works well.
We can save the data frame into a tab-delimited text file or a CSV file. For instance,
write.csv(x = df, file = "new_csv_file.csv", row.names = FALSE)
exports the data frame df to
new_csv_file.csv in the current working directory.
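For the tab-delimited case, one possible approach is write.table() with sep = "\t". The sketch below writes a hypothetical two-row data frame out to a temporary file and reads it back to confirm the round trip; quote = FALSE is an optional choice that leaves text values unquoted in the file.

```r
out <- data.frame(height = c(184.0, 159.4), sex = c("m", "f"))   # illustrative data

tsv_file <- tempfile(fileext = ".txt")
write.table(x = out, file = tsv_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

readLines(tsv_file)    # the raw lines of the file, starting with the header row

back <- read.table(file = tsv_file, header = TRUE, sep = "\t")
back
```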
You can save (multiple) objects in your workspace as
an .RData file.
# Save two objects as a new .RData file in my current working directory
save(list = c("v", "w"), file = "myFile.RData") # save v and w as myFile.RData
load("myFile.RData") # load v,w into the R workspace
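To see that save() and load() really restore the objects, here is a minimal round trip. It assumes the vectors v and w from earlier, and it writes to a temporary file (the name rdata_file is illustrative) rather than to the working directory.

```r
v <- c(1.2, 4.6, 3.3, 2.8)
w <- 3 * v + 2

rdata_file <- tempfile(fileext = ".RData")
save(list = c("v", "w"), file = rdata_file)   # write both objects to one file

rm(v, w)                                      # remove them from the workspace

loaded <- load(rdata_file)   # restores the objects and returns their names
loaded
w                            # w is available again
```

Unlike read.csv(), load() does not need to be assigned to a name: the objects come back under the names they were saved with.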