E115L Week 1: R basics
by Grace Yuh Chwen Lee, Spring 2024
What is R?
R is a programming language for statistical computing and data visualization. It is widely used in diverse disciplines, from statistics all the way to... biology! In addition to core R, a large number of extension packages significantly augment the capabilities of R (and you can be an R developer too!).
R is free and open source.
RStudio Windows
When working in RStudio, you will generally see four windows:
Source: This is where you write your code. It will not be evaluated until it is run in the console. You can create a new .R file by clicking File > New File > R Script.
To run a line of code, select the line and either (1) click Run in the top right corner of Source or (2) press Command+Enter on your keyboard (Ctrl+Enter on Windows).
Console: This is where the code from Source is evaluated by R. You can also type code directly into the console and press Enter for quick calculations that you don’t need to save. We do not recommend doing this, though.
Environment/History: The Environment tab shows the variables you have defined (see below) in your working space. The History tab shows your command history.
Files/Plots/Packages/Help: Here, you can view your file directories, plots, and packages, and access R Help.
You can change the order and size of these windows as well as minimize/maximize certain windows according to your preferences.
How to annotate your code in R?
It is very important to annotate your code (stuff you type in source). This helps you remember why you did something and your interpretations.
To annotate code, simply add # at the beginning of a line.
Try typing the following two lines in the R console, press Enter after each, and compare what happens.
This is an instruction
#This is an instruction
Start a new R source document for your answers to this tutorial. Before writing each answer, annotate the section and the question number as follows:
## Name:
Questions in that section, numbered (e.g. # Q1), each followed by the answer to the question.
As an example:
## Name: Grace Lee
# Q1:
This is an instruction
# Include your output in console here
#This is an instruction
# include your output in console here
# explain why you think they are different
For all lab reports, please turn in both your results (calculated numbers from the console, exported files, or plots) and your annotated code.
Basic Concepts
Variable
A quantity, quality, or property one can measure.
Example: Your true height and weight are two variables.
Value
The state of a variable when measured at a particular time under a particular condition. This measurement could vary between measurements made at different times, under different conditions, or by different people.
Example: Your measured height and weight
Observation
A set of measurements made under (hopefully) constant conditions. An observation contains several values, each belonging to a different variable.
Example: Your height and weight measured this morning.
When talking to R, we need to define a variable and then tell R the values of that variable.
Try executing the following code in R Source.
heightS = c(156, 168, 182, 170, 145, 201)
Here, heightS is a variable and 156, 168, 182, 170, 145, 201 are the values of that variable. We also refer to heightS as an R object.
We will learn about what c( ) and = mean in a second.
Types of variables
Categorical: variables that can only take one of a small set of values
Example: Sunny, Cloudy, Rainy
Example: treatment group (control - no drug vs treatment - with drug)
Numerical: variables that can take a wide range of values. Can be discrete or continuous
Example: % of cloud coverage
Example: number of colonies, distance from colonies
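In R, the two kinds of variables are usually stored differently, which the sketch below tries to illustrate. All names and values here (weatherS, colonyS, distanceS) are made up for illustration, and c( ) is explained later in this tutorial.

```r
# Categorical: each value is one of a small set of labels
weatherS = c("Sunny", "Cloudy", "Rainy", "Sunny")
unique(weatherS)     # "Sunny"  "Cloudy" "Rainy"

# Numerical, discrete: counts
colonyS = c(12, 30, 7, 18)

# Numerical, continuous: measurements that can take any value in a range
distanceS = c(1.5, 0.25, 3.75, 2.0)

# Arithmetic makes sense for numerical variables
mean(colonyS)        # 16.75
```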
Arithmetic operators
Arithmetic operators are used with numeric values to perform common mathematical operations, just like a calculator. Here are some you might use:
| Operator | Name | Example |
|---|---|---|
| + | Addition | x + y |
| - | Subtraction | x - y |
| * | Multiplication | x * y |
| / | Division | x / y |
| ^ | Exponent | x ^ y |
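As a quick sketch, each operator in the table can be tried on a pair of example values. The values 7 and 2 are chosen here only for illustration.

```r
# Example values, assumed for illustration
x = 7
y = 2

x + y   # addition: 9
x - y   # subtraction: 5
x * y   # multiplication: 14
x / y   # division: 3.5
x ^ y   # exponent: 49
```

R follows the usual order of operations, so parentheses change which part of an expression is calculated first.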
25*5-3*2
25*(5-3)*2
Assignment operators (= or <-)
Assignment operators are used to assign values to variables.
A variable name must start with a letter and can be a combination of letters, digits, periods (.), and underscores (_). It cannot start with a number or an underscore. Variable names are case-sensitive, and reserved words cannot be used as variable names (you will learn those in a second!).
You can use either = or <- to assign values to a variable.
When you assign a value to a variable, the variable is added to your environment (it will show up in the Environment window in RStudio).
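The naming rules and the two assignment operators can be sketched as follows. The variable names (colony_count, colony.count2, dose, Dose) are hypothetical and used only for illustration.

```r
# Both assignment operators store a value in a variable
colony_count = 12
colony.count2 <- 12
colony_count == colony.count2   # TRUE: both hold the value 12

# Names are case-sensitive, so these are two separate variables
dose = 1
Dose = 5
dose == Dose                    # FALSE

# Invalid names (commented out because each would stop with an error)
# 2dose = 1    # starts with a number
# _dose = 1    # starts with an underscore
# TRUE = 1     # TRUE is a reserved word
```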
Try assigning your (or your friend’s) height and weight to height and weight.
height = YOUR OR YOUR FRIEND'S HEIGHT in meters
weight = YOUR OR YOUR FRIEND'S WEIGHT in kg
Tip: use names ending in a capital S, such as heightS, for vectors (stay tuned!)
You can also combine a mathematical operation and an assignment in one step.
BMI = weight/height^2
When you assign a value to a variable, the value is not printed in the R console. To see the value of a variable, either check the Environment window or simply type the variable name in the console.
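Here is a minimal sketch of assigning and then inspecting a variable. The height and weight values are hypothetical placeholders, and round() is a base R function used here only to shorten the printed number.

```r
# Hypothetical measurements, for illustration only
height = 1.70   # meters
weight = 65     # kilograms

# Assignment prints nothing in the console
BMI = weight/height^2

# Typing the variable name prints its value
BMI

# round() keeps the printed value short
round(BMI, 1)   # 22.5
```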
You can also do calculations using an assigned variable later. This comes in handy and will be used again and again in class. For example:
a = 2*34/13
a
[1] 5.230769
a*111
[1] 580.6154
R functions
R has many useful built-in functions to perform statistical tests, as well as organize your data.
To learn the details of a function, type ?fun or help(fun), where fun is the function you would like to learn about. In RStudio, the documentation opens in the Help tab.
Try executing ?mean or help(mean).
Try the following code. Here heightS is a vector, which we will learn about in a second.
heightS
mean(heightS)
max(heightS)
You can supply the arguments of a function either by position, in the order the function defines them, or by name, in which case you can write them in any order. The two calls below ask for the same thing: 10 random numbers drawn uniformly between 0 and 100.
runif(10, 0, 100)
runif(min = 0, max = 100, n = 10)
Vectors
A vector is a series of elements in R (it can have just one element). Vectors have only one dimension: their length.
Check whether an R object is a vector with is.vector(). It returns a logical TRUE or FALSE.
is.vector(heightS)
[1] TRUE
A single value is a vector of length 1.
a = 1
is.vector(a)
[1] TRUE
Vectors can contain many elements. There are several ways to create vectors.
v1 = c(1, 3, 5, 7)
v2 = 10:20
v3 = seq(0,20,2)
v4 = rep(3, 4)
You can do operations directly with vectors. For example:
v1 + 2
[1] 3 5 7 9
v2*2
[1] 20 22 24 26 28 30 32 34 36 38 40
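To see what each way of creating a vector produces, the sketch below prints two of them and then combines two vectors of the same length. It only uses the vectors defined above.

```r
v1 = c(1, 3, 5, 7)     # combine the listed values
v2 = 10:20             # every integer from 10 to 20
v3 = seq(0, 20, 2)     # from 0 to 20 in steps of 2
v4 = rep(3, 4)         # the value 3 repeated 4 times

v3            # 0 2 4 6 8 10 12 14 16 18 20
v4            # 3 3 3 3
length(v2)    # 11

# Operations between two vectors of the same length work element by element
v1 + v4       # 4 6 8 10
v1 * v4       # 3 9 15 21
```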
Some useful R functions for vectors
heightS
[1] 156 168 182 170 145 201
sum(heightS)
[1] 1022
length(heightS)
[1] 6
max(heightS)
[1] 201
min(heightS)
[1] 145
mean(heightS)
[1] 170.3333
median(heightS)
[1] 169
var(heightS)
[1] 385.8667
summary(heightS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  145.0   159.0   169.0   170.3   179.0   201.0
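Some of these functions are related to each other, which the following sketch illustrates. sd() and range() are base R functions that are not covered above; they are included here only as an illustration.

```r
heightS = c(156, 168, 182, 170, 145, 201)

# sd() is the standard deviation, the square root of the variance
sd(heightS)
sqrt(var(heightS))   # same value as sd(heightS)

# range() returns the minimum and maximum together
range(heightS)       # 145 201

# The mean is the sum divided by the number of elements
sum(heightS) / length(heightS)   # same value as mean(heightS)
```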
Data Types
| Data Type | Example |
|---|---|
| Numeric | 10.5, 20.8, 1000.33, 0.234 |
| Integer | 1L, 3L, 5L, 6L, 12L (the L suffix tells R to store the number as an integer) |
| Character | "Year", "2030", "Anteater" |
| Logical (AKA Boolean) | TRUE, FALSE |
Character values must be enclosed in quotation marks (" before and " after).
You can use the class() function to check the type of an R object. This is important because different classes of data behave differently. For example, you cannot perform mathematical operations on character objects.
noS = c(10, 20)
class(noS)
sum(noS)
chaS = c("today", "tomorrow", "2024")
class(chaS)
sum(chaS)   # gives an error: sum() does not work on character values
Note that a vector can hold only one data type. If any element of a vector is a character, all of its elements are converted to character.
mixS = c("Friday", 5, 2024, "2024")
class(mixS)
[1] "character"
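The sketch below checks the class of one value of each type and shows one way to turn numbers stored as text back into numbers. yearS and yearS_num are hypothetical names, and as.numeric() is a base R function not covered above.

```r
class(10.5)     # "numeric"
class(5L)       # "integer" (the L suffix asks for an integer)
class("2024")   # "character"
class(TRUE)     # "logical"

# Numbers stored as text cannot be summed until they are converted
yearS = c("2023", "2024")          # hypothetical example vector
yearS_num = as.numeric(yearS)
class(yearS_num)                   # "numeric"
sum(yearS_num)                     # 4047
```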
Logical variables
R understands TRUE and FALSE (no quotation marks) as logical values (T and F also work). These are very useful for data manipulation, which we will learn about in a second (and use throughout the course).
logicS = c(TRUE, TRUE, FALSE, TRUE)
logicS
[1]  TRUE  TRUE FALSE  TRUE
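One reason logical vectors are handy is that R treats TRUE as 1 and FALSE as 0 in arithmetic. A small sketch using the logicS vector from above:

```r
logicS = c(TRUE, TRUE, FALSE, TRUE)

class(logicS)   # "logical"

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(logicS)     # 3, the number of TRUE values
mean(logicS)    # 0.75, the proportion of TRUE values
```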
Comparison operators
Comparison operators are used to compare two values. Here are some commonly used ones.
| Operator | Name | Example |
|---|---|---|
| == | Equal | x == y |
| != | Not equal | x != y |
| > | Greater than | x > y |
| < | Less than | x < y |
| >= | Greater than or equal to | x >= y |
| <= | Less than or equal to | x <= y |
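Comparison operators also work on whole vectors, returning one TRUE or FALSE per element. The sketch below applies a few of them to the heightS vector; the cutoff of 170 is an arbitrary value chosen for illustration.

```r
heightS = c(156, 168, 182, 170, 145, 201)

heightS > 170     # FALSE FALSE  TRUE FALSE FALSE  TRUE
heightS == 170    # FALSE FALSE FALSE  TRUE FALSE FALSE
heightS != 170    #  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# sum() counts the TRUE values, so this is how many heights are at least 170
sum(heightS >= 170)   # 3
```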
Logical operators
Logical operators are used to combine conditional statements. Here are some commonly used ones:
| Operator | Description |
|---|---|
| & | AND |
| \| | OR |
| ! | NOT |
a = 3
b = 2
a == 1 & b == 2
[1] FALSE
a == 1 | b == 2
[1] TRUE
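A few more combinations, including the NOT operator, are sketched below with the same a and b. The last line applies & to the heightS vector; the cutoffs 150 and 180 are arbitrary values chosen for illustration.

```r
a = 3
b = 2

!(a == 1)          # TRUE: a is not 1
a > 1 & b > 1      # TRUE: both conditions hold
a > 5 | b > 5      # FALSE: neither condition holds

# The operators also work element by element on vectors
heightS = c(156, 168, 182, 170, 145, 201)
heightS > 150 & heightS < 180     # TRUE  TRUE FALSE  TRUE FALSE FALSE
```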
Isolating part of a vector (data filtering)
We use [ ] to isolate the part of the vector we want.
heightS
[1] 156 168 182 170 145 201
heightS[2]
[1] 168
[2] tells R to return the second element of the heightS vector.
To get more than one element of a vector, we can index it with another vector.
v = 1:3
heightS[v]
[1] 156 168 182
Or you could simply write it as follows:
heightS[1:3]
[1] 156 168 182
Try the following code and see if you can tell what each line selects.
y = 5:7
y[1]
y[2]
y[3]
y[1:2]
y[c(1,3)]
y[c(FALSE,TRUE,TRUE)]
Finally, and importantly, we can use logical vectors to filter data.
logic_v = heightS %% 3 == 0 & heightS > 150
logic_v
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE
heightS[logic_v]
[1] 156 168 201
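It may help to break the filter above into its two conditions. In this sketch, div3 and tall are hypothetical names for the intermediate logical vectors, and which() is a base R function not covered above.

```r
heightS = c(156, 168, 182, 170, 145, 201)

# %% gives the remainder of a division, so this asks "divisible by 3?"
div3 = heightS %% 3 == 0     # TRUE  TRUE FALSE FALSE FALSE  TRUE
tall = heightS > 150         # TRUE  TRUE  TRUE  TRUE FALSE  TRUE

# & keeps only the positions where both conditions are TRUE
heightS[div3 & tall]         # 156 168 201

# The condition can also be written directly inside [ ]
heightS[heightS > mean(heightS)]   # 182 201

# which() reports the positions of the TRUE values
which(div3 & tall)           # 1 2 6
```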
Homework
- Provide your answers to the above questions in an R file. Remember to comment your code.
- Tidy up your Wednesday data (serial dilution) and download it as a .csv file to your computer (or wherever you can access it during next Friday's lab).