E115L Week 1: R basics

by Grace Yuh Chwen Lee, Winter 2025


What is R?

R is a programming language for statistical computing and data visualization. It is widely used in diverse disciplines, from statistics all the way to... biology! In addition to core R, a large number of extension packages significantly augment the capabilities of R (and you can be an R developer too!).

R is free and open source.


RStudio Windows

When working in RStudio, you will generally see 4 windows:

  1. Source: This is where you write your code. It will not be evaluated until it is run in the console. You can generate a new .R file by clicking on File > New File > R Script.

    To run a line of code, select the line and either (1) click Run in the top right corner of Source or (2) press command+enter (Mac) or Ctrl+Enter (Windows/Linux) on your keyboard.

  2. Console: This is where your code from the source is evaluated by R. You can also type code directly into the console and hit enter on your keyboard for quick calculations that you don’t need to save. We do not recommend doing this though.

  3. Environment/History: The environment tab shows variables you have defined (see below) in your working space. The history tab shows your command history.

  4. Files/Plots/Packages/Help: Here, you can view your file directories, plots, packages, and access R Help.

You can change the order and size of these windows as well as minimize/maximize certain windows according to your preferences.


How to annotate your code in R?

It is very important to annotate your code (what you type in Source). Annotations help you remember why you did something and how you interpreted the results.

To annotate code, simply add # at the beginning of a line.
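As a minimal sketch (the variable name total is made up for illustration), a comment can sit on its own line or follow code on the same line. R ignores everything from # to the end of the line:

```r
# A full-line comment: R skips this line entirely
total = 2 + 3   # a trailing comment: R still runs the code before the #
total           # prints [1] 5
```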

Try typing each of the following lines in the R console, hit enter after each one, and compare the results.

This is an instruction
#This is an instruction

Q1

What output did you get, and why are they different?

Start a new R source document for your answers to this tutorial. Annotate the section and the question number before writing each answer, as follows:

  • ##Name:

  • Questions in that section numbered (e.g. #Q1), followed by the answer to the question.

As an example:

## Name: Grace Lee
# Q1:
This is an instruction 
# Include your output in console here

#This is an instruction 
# include your output in console here
# explain why you think they are different

For all lab reports, please turn in both your results (calculated numbers on console, exported files, or plots) and your annotated code.

Basic Concepts

Variable

A quantity, quality, or property one can measure.

Example: Your true height and weight are two variables.

Value

The state of a variable when measured at a particular time under a particular condition. This measurement could vary between measurements made at different times, under different conditions, or by different people.

Example: Your measured height and weight

Observation

A set of measurements made under (hopefully) constant conditions. An observation will contain several values, each for a different variable.

Example: Your height and weight measured this morning.

When talking to R, we need to define a variable and then tell R the values of that variable.

Try executing the following code in R Source.

heightS = c(156, 168, 182, 170, 145, 201)

Here, heightS is a variable and 156, 168, 182, 170, 145, 201 are values of that variable. We also refer to heightS as an R object.

We will learn about what c( ) and = mean in a second.

Types of variables

Categorical: variables that can only take one of a small set of values

Example: Sunny, Cloudy, Rainy

Example: treatment group (control - no drug vs treatment - with drug)

Numerical: variables that can take a wide range of values. These can be discrete or continuous.

Example: % of cloud coverage

Example: number of colonies, distance from colonies
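As a rough sketch, the examples above could be stored in R as follows. The vector names and all values are made up for illustration; text values are written in quotation marks, which are covered under Data Types below.

```r
# Categorical: each value is one of a small set of labels
weatherS = c("Sunny", "Cloudy", "Rainy", "Sunny")
unique(weatherS)                        # lists the distinct categories

# Numerical, continuous: any value within a range is possible
cloud_coverS = c(5.5, 80, 100, 12.25)   # % of cloud coverage

# Numerical, discrete: counts take whole-number values
colony_countS = c(12, 0, 7, 31)         # number of colonies
```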


Arithmetic operators

Arithmetic operators are used with numeric values to perform common mathematical operations, just like a calculator. Here are some you might use:

Operator   Name             Example
+          Addition         x + y
-          Subtraction      x - y
*          Multiplication   x * y
/          Division         x / y
^          Exponent         x ^ y
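A quick sketch of each operator in the table, using two made-up values for x and y:

```r
x = 7   # illustrative values only
y = 2
x + y   # 9
x - y   # 5
x * y   # 14
x / y   # 3.5
x ^ y   # 49
```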
Q2

Try the following. Do they give the same answer? Do the order-of-operations rules apply here?

25*5+3*2
25*(5+3)*2

Assignment operators (= or <-)

Assignment operators are used to assign values to variables.

A variable name must start with a letter (or a period that is not followed by a digit) and can be a combination of letters, digits, periods (.), and underscores (_). It cannot start with a digit or an underscore (_). Variable names are case-sensitive, and reserved words cannot be used as variable names (you will learn those in a second!).

You can use either = or <- to assign values to a variable.

When you assign a value to a variable, it gets added to your environment (it will show up in the Environment window in RStudio).
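The naming rules and both assignment operators can be sketched as follows. All variable names and values here are hypothetical:

```r
plant_height = 12.5     # assign with =
plant.height2 <- 12.5   # assign with <-; periods and digits are allowed in names
identical(plant_height, plant.height2)   # TRUE: both operators store the same value

Plant_height = 99       # capital P: a separate variable, because names are case-sensitive
plant_height            # still 12.5
ls()                    # lists the variables currently in your environment
```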

Try assigning your/your friend’s height and weight to height and weight.

height = 1.70  # example value: replace with your or your friend's height in meters
weight = 65    # example value: replace with your or your friend's weight in kg

Q3

Look at the Environment window in RStudio. What variables show up there, and what are their values?

Tip: use names ending in a capital S (nameS, such as heightS) for vectors (stay tuned!)

You can also combine a mathematical operation and an assignment in one step.

BMI = weight/height^2

When you assign a value to a variable, the value is not printed in the R console. To see the value of a variable, either check the Environment window or simply type the variable name in the console.
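A small sketch of the ways to see a stored value. The variable names are made up; wrapping an assignment in parentheses is an extra shortcut that assigns and prints in one step:

```r
speed = 60 / 1.5              # an assignment prints nothing
speed                         # typing the name prints the value: [1] 40
print(speed)                  # print() does the same
(speed_doubled = speed * 2)   # parentheses assign and print: [1] 80
```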

Q4

What is your/your friend’s BMI? Remember to include code (commented) for how you get to the answers.

You can also do calculations using an assigned variable later. This comes in handy and will be used again and again in class. For example:

a = 2*37/13
a
[1] 5.692308
a*123
[1] 700.1538

Q5

What is your/your friend’s height in feet? Try using the height variable defined above. One meter = 3.28 feet.


R functions

R has many useful built-in functions to perform statistical tests, as well as organize your data.

To see the details of a function, type ?fun or help(fun), where fun is the function you would like to learn about. In RStudio, the help page will pop up in the Help tab.

Try executing ?mean or help(mean)

Try the following code. Here heightS is a vector, which we will learn about in a second.

heightS
mean(heightS)
max(heightS)

You can provide the arguments either in the order the function expects, or by name, in which case you can write them in any order.

rnorm(1000, 0, 5)
rnorm(n = 1000, mean = 0, sd = 5)

Q6

Briefly describe what the above two function calls are doing and whether they are the same. You can start with ?rnorm to figure out what the function is doing.
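Positional versus named arguments can be sketched with a different function, sort(), whose first two arguments are x and decreasing. The vector scoreS is made up for illustration:

```r
scoreS = c(3, 1, 2)                   # hypothetical values

sort(scoreS, TRUE)                    # by position: x first, then decreasing
sort(x = scoreS, decreasing = TRUE)   # by name, in the usual order
sort(decreasing = TRUE, x = scoreS)   # by name, in a different order
# all three calls return 3 2 1
```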


Vectors

Vectors are series of values in R (a vector can have length 1). They have only one dimension, their length.

Check whether an R object is a vector with is.vector(). It returns a logical TRUE or FALSE.

is.vector(heightS)
[1] TRUE

A single value is a vector of length 1.

is.vector(height)
[1] TRUE
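The point that a single value is a vector of length 1 can be checked with length(). A short sketch with a made-up variable:

```r
single = 42            # hypothetical single value
is.vector(single)      # TRUE
length(single)         # 1
length(c(4, 8, 15))    # 3
```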

Vectors can contain many elements. There are several ways to create vectors.

v1 = c(1, 3, 5, 7)
v2 = 1:10
v3 = (1:4)*2-1
v4 = (0:10)*2
v5 = seq(0,20,2)
v6 = rep(3, 4)

Q7

What is each of these lines of code doing?

Hint: execute each line of code or look at the environment window

Hint: remember how to learn about a function

You can do operations directly with vectors. For example:

v1 +2
[1] 3 5 7 9
v2*2
 [1]  2  4  6  8 10 12 14 16 18 20
v1*v3
[1]  1  9 25 49

Operations on vectors are applied to each element.

Q8

Do you think you could do v1+v2? Or v1+v3?

Try it! Write out what you get and why you think it is working/not working.

Q9

Write 6 temperatures (32, 69, 72, 80, 100, 621) in Fahrenheit and assign them to a vector called temp_fS. Convert them into Celsius and save the results in temp_cS.

Some useful R functions for vectors

heightS
[1] 156 168 182 170 145 201
sum(heightS)
[1] 1022
length(heightS)
[1] 6
max(heightS)
[1] 201
min(heightS)
[1] 145
mean(heightS)
[1] 170.3333
median(heightS)
[1] 169
var(heightS)
[1] 385.8667
summary(heightS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  145.0   159.0   169.0   170.3   179.0   201.0 

Data Types

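Data Type               Example
Numeric                 10.5, 20.8, 1000.33, 0.234
Integer                 1L, 3L, 5L, 6L, 12L
Character               "Year", "2030", "Anteater"
Logical (AKA Boolean)   TRUE, FALSE

Note: a whole number typed as 1 is stored as numeric; add an L suffix (1L) to make it an integer.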

You need quotation marks (" and ") before and after a character phrase.

You can use the class() function to check the type of an R object. This is important because different classes of data behave differently. For example, you cannot perform mathematical operations on character objects.
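As a quick sketch, class() can be tried on one value of each type. Note that a whole number is stored as numeric unless it carries an L suffix:

```r
class(10.5)         # "numeric"
class(5)            # "numeric": whole numbers are numeric by default
class(5L)           # "integer": the L suffix makes an integer
class("Anteater")   # "character"
class(TRUE)         # "logical"
```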

Q10

Why do you get an error message for one of the following two code blocks?

noS = c(10, 20)
class(noS)
sum(noS)

chaS = c("today", "tomorrow", "2024")
class(chaS)
sum(chaS)

Note that a vector can only hold one data type. If one of the elements in a vector is a character, all elements are converted to character.

mixS = c("Friday", 5, 2024, "2024")

class(mixS)
[1] "character"

Logical variables

R understands TRUE or FALSE (no quotation marks) as logical values (T and F also work). These are very useful for data manipulation, which we will learn about in a second (and use throughout the course).

logicS = c(TRUE, TRUE, FALSE, TRUE)
logicS
[1]  TRUE  TRUE FALSE  TRUE

Comparison operators

Comparison operators are used to compare two values. Here are some commonly used ones.

Operator   Name                       Example
==         Equal                      x == y
!=         Not equal                  x != y
>          Greater than               x > y
<          Less than                  x < y
>=         Greater than or equal to   x >= y
<=         Less than or equal to      x <= y
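A short sketch of the comparison operators on two made-up values. Each comparison returns a logical TRUE or FALSE:

```r
x = 5    # illustrative values only
y = 8
x == y   # FALSE
x != y   # TRUE
x > y    # FALSE
x < y    # TRUE
x >= 5   # TRUE: 5 is equal to 5
y <= 5   # FALSE
```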
Q11

Evaluate whether the following statements are true or false in R

if height is strictly greater than 1

if height is not equal to 1.5

You can also use comparison operators on vectors.

Q12

Try the following code and explain how comparison operators act on vectors.

v1 < v6

v1 == 3

Logical operators

Logical operators are used to combine conditional statements. Here are some commonly used ones:

Operator   Description
&          AND
|          OR
!          NOT

a = 3
b = 2
a == 1 & b == 2
[1] FALSE
a == 1 | b == 2
[1] TRUE

Q13

Call out the heightS vector. Write a logical statement to find the elements that are (1) greater than or equal to 170 and (2) even.

Hint: %% operator returns the remainder. e.g., 1 %% 2 will return 1
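A minimal sketch of the remainder operator on single numbers, together with the NOT operator (!) from the table above:

```r
7 %% 2           # 1: dividing 7 by 2 leaves a remainder of 1
10 %% 5          # 0: 10 is a multiple of 5
6 %% 2 == 0      # TRUE: 6 is even
!(6 %% 2 == 0)   # FALSE: ! flips TRUE to FALSE
```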

Isolating part of a vector (data filtering)

Important

This is very important for analyzing your data in the future. Please make sure you fully understand this part!!

We use [ ] to isolate the part of the vector we want.

heightS
[1] 156 168 182 170 145 201
heightS[2]
[1] 168

[2] tells R to return the second element of the heightS vector.

To get more than one element of a vector, we can use another vector to do that.

v = 1:3
heightS[v]
[1] 156 168 182

or you could simply write it as the following

heightS[1:3]
[1] 156 168 182

Try the following code and see if you can tell what each line is selecting.

y = 5:7
y[1]
y[2]
y[3]
y[1:2]
y[c(1,3)]
y[c(FALSE,TRUE,TRUE)]

Q14

Make a vector that goes from 11 to 81 in steps of 5 (that is, 11, 16, ...). Put the first, third, and tenth elements in a new vector called vector_test.

Finally, and importantly, we can use logical vectors to filter data.

logic_v = heightS %% 3 == 0 & heightS > 150
logic_v
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE
heightS[logic_v]
[1] 156 168 201

Q15

Explain what the above code is doing.

Assign your logical test in Q13 to a logical variable, and use it to isolate the elements in heightS that meet those criteria.
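The same filtering pattern can be sketched on a different, made-up vector (weightS and heavy_v are hypothetical names). The logical test can be stored first or written directly inside [ ]:

```r
weightS = c(52, 75, 61, 88, 47)        # hypothetical weights in kg
heavy_v = weightS > 60                 # FALSE TRUE TRUE TRUE FALSE
weightS[heavy_v]                       # 75 61 88
weightS[weightS > 60 & weightS < 80]   # 75 61
```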


Homework

  1. Provide your answers to the above questions in an R file. Remember to comment your code.