Introduction
This article contains the training material for our second meetup, “Data Structures in R”. The aim of the meetup is to introduce you to the basic data structures used when programming in R.
We cover: 1. Vectors 2. Matrices 3. Data frames 4. Lists 5. Arrays
1. Vectors
A vector is the basic data structure in R. It can contain a single element or multiple elements.
1.1 How to create a vector
Vectors with multiple elements
Using c() function
Vectors are generally created using the c() function. They can be numeric, logical, character, or factor.
Let’s create our first vectors
## [1] 1 2 3 6 8
## [1] TRUE FALSE TRUE TRUE
## [1] "Apple" "Banana" "Orange"
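The calls that produced the three outputs above are not shown. A minimal sketch that reproduces them with c() follows; the variable names are assumptions chosen for illustration.

```r
# Variable names are illustrative assumptions, not taken from the original.
num_vec <- c(1, 2, 3, 6, 8)                 # numeric vector
lgl_vec <- c(TRUE, FALSE, TRUE, TRUE)       # logical vector
chr_vec <- c("Apple", "Banana", "Orange")   # character vector

print(num_vec)
print(lgl_vec)
print(chr_vec)

# A factor can be built from a character vector.
fct_vec <- factor(chr_vec)
print(levels(fct_vec))
```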
Using seq() function
You can also create a vector using the seq() function. In seq(), you specify the step with the by argument.
## [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
Alternatively, you can specify the length of the vector with the length.out argument.
## [1] 1.000000 3.333333 5.666667 8.000000
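Both seq() outputs above can be reproduced with a sketch like the following. The start and end values are inferred from the printed results, and the variable names are assumptions.

```r
# Step of 0.2 from 1 to 3: eleven values.
by_seq <- seq(from = 1, to = 3, by = 0.2)
print(by_seq)

# Four equally spaced values from 1 to 8.
len_seq <- seq(from = 1, to = 8, length.out = 4)
print(len_seq)
```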
Using rep() function
You can also create a vector using the rep() function, combined with the each or times argument.
## [1] 1 2 3 1 2 3 1 2 3
## [1] 1 1 1 2 2 2 3 3 3
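The difference between times and each is easiest to see side by side. This sketch assumes the input was 1:3, which is consistent with the two outputs above.

```r
# times repeats the whole vector; each repeats every element in turn.
rep_times <- rep(1:3, times = 3)
rep_each  <- rep(1:3, each = 3)

print(rep_times)
print(rep_each)
```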
1.2 How to know the type and length of a vector
To know the type of a vector, you use a function called typeof() and to know the length, you use a function called length().
## [1] "double"
## [1] "character"
## [1] 3
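A short sketch of typeof() and length() that gives results like those above. The vectors are assumptions; any numeric vector created with c() is stored as "double" unless its values are written as integers (for example 1:3 or 1L).

```r
# Illustrative vectors (assumed, not from the original chunk).
numbers <- c(1, 2, 3, 6, 8)
fruits  <- c("Apple", "Banana", "Orange")

print(typeof(numbers))   # storage type of a numeric vector
print(typeof(fruits))    # storage type of a character vector
print(length(fruits))    # number of elements
print(typeof(1:3))       # an integer sequence, for comparison
```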
1.3 How to combine vectors
Vectors can be combined with the c() function. For example, the following two vectors h1 and h2 are combined into a new vector containing the elements of both.
## [1] "banana" "orange" "cherry" "strawberry" "kiwi"
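A sketch of the combination described above. How the five fruits were split between h1 and h2 is not shown in the original, so the split used here is an assumption.

```r
# The split between h1 and h2 is assumed for illustration.
h1 <- c("banana", "orange")
h2 <- c("cherry", "strawberry", "kiwi")

# c() concatenates its arguments in the order given.
h <- c(h1, h2)
print(h)
```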
1.4 How to access an element of a vector
Vector indices in R start at 1, unlike most programming languages, where indices start at 0.
We can use a vector of integers as an index to access specific elements.
We can also use negative integers to return all elements except those specified.
## [1] "banana"
## [1] 54
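The indexing rules above can be sketched as follows. The two vectors are assumptions chosen so that the first two calls return values like those printed above.

```r
# Illustrative vectors (assumed).
fruits <- c("banana", "orange", "cherry")
ages   <- c(22, 54, 31)

print(fruits[1])         # first element (indices start at 1)
print(ages[2])           # second element
print(fruits[c(1, 3)])   # integer vector as index: elements 1 and 3
print(fruits[-2])        # negative index: everything except element 2
```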
1.5 How to modify an element of a vector
We can modify a vector using the assignment operator.
## [1] "banana" "strawberry" "hello"
## [1] 1 22 3 100 96
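A sketch of modification by assignment that ends with the two vectors shown above. The starting values are assumptions; only the final results come from the printed output.

```r
# Starting values are assumed for illustration.
fruits <- c("banana", "orange", "cherry")
fruits[2] <- "strawberry"          # replace one element
fruits[3] <- "hello"
print(fruits)

numbers <- c(1, 2, 3, 4, 5)
numbers[2] <- 22                   # replace one element
numbers[c(4, 5)] <- c(100, 96)     # replace several elements at once
print(numbers)
```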
1.7 Vector arithmetic
You can apply all the arithmetic operations to vectors; they work element by element:
## [1] 3 5 11 14 13
## [1] 2 4 6 10 12
## [1] -1 -1 -5 -4 -1
## [1] 0.5000000 0.6666667 0.3750000 0.5555556 0.8571429
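The four outputs above are consistent with the following sketch. The vectors a and b are inferred from the printed results and are therefore an assumption, as is the reading of the second output as a multiplication by 2.

```r
# a and b are inferred from the outputs (assumption).
a <- c(1, 2, 3, 5, 6)
b <- c(2, 3, 8, 9, 7)

print(a + b)   # element-wise sum
print(2 * a)   # every element multiplied by a scalar
print(a - b)   # element-wise difference
print(a / b)   # element-wise division
```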
2. Matrices
2.1 How to create a matrix
Matrices are R objects in which the elements are arranged in a two-dimensional array.
A matrix can be considered as a combination of vectors. Matrices can be created by column-binding or row-binding vectors with the cbind() and rbind() functions.
## x y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
## [,1] [,2] [,3]
## x 1 2 3
## y 10 11 12
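The two matrices above can be reproduced with a sketch like this one; the values of x and y are read off the printed output.

```r
x <- 1:3
y <- 10:12

by_col <- cbind(x, y)   # x and y become columns (3 rows, 2 columns)
by_row <- rbind(x, y)   # x and y become rows (2 rows, 3 columns)

print(by_col)
print(by_row)
```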
The basic syntax for creating a matrix in R is: matrix(data, nrow, ncol, byrow, dimnames)
- data is the input vector whose elements become the data elements of the matrix.
- nrow is the number of rows to be created.
- ncol is the number of columns to be created.
- byrow is a logical flag. If TRUE, the input vector elements are arranged by row.
- dimnames are the names assigned to the rows and columns.
Matrices can be created directly from vectors by adding a dimension attribute.
## [1] 2 6
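A sketch of both approaches: matrix() with byrow, and setting the dim attribute on a plain vector. The 12-element vector is an assumption; only the resulting dimensions (2 and 6) come from the output above.

```r
# matrix() fills column by column unless byrow = TRUE.
by_row_fill <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col_fill <- matrix(1:6, nrow = 2, ncol = 3)
print(by_row_fill)
print(by_col_fill)

# Adding a dimension attribute turns a vector into a matrix.
v <- 1:12            # assumed input vector
dim(v) <- c(2, 6)
print(dim(v))
```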
Let’s consider a matrix P. Let’s define the column and row names :
rownames <- c("gene1", "gene2", "gene3", "gene4")
colnames <- rep(c("control", "treatment"), times = 2, each=1)
Let’s create the matrix:
## control treatment control treatment
## gene1 1 2 3 4
## gene2 5 6 7 8
## gene3 9 10 11 12
## gene4 13 14 15 16
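The call that built P is not shown. A sketch that yields the matrix above, assuming the values 1 to 16 were filled row by row; the names row_labels and col_labels are used here instead of rownames and colnames so that the base functions of the same name are not masked.

```r
# row_labels / col_labels are illustrative names (assumption).
row_labels <- c("gene1", "gene2", "gene3", "gene4")
col_labels <- rep(c("control", "treatment"), times = 2)

P <- matrix(1:16, nrow = 4, ncol = 4, byrow = TRUE,
            dimnames = list(row_labels, col_labels))
print(P)
```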
2.2 Access to an element of the matrix
To access the element at 1st row, 3rd column :
## [1] 3
Let’s access all the rows of column 4:
## gene1 gene2 gene3 gene4
## 4 8 12 16
Let’s access the 2nd row of all the columns:
## control treatment control treatment
## 5 6 7 8
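The three accesses above can be written as follows. P is rebuilt here so the sketch runs on its own; its construction is the same assumption as before (values 1 to 16 filled by row).

```r
P <- matrix(1:16, nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4),
                            rep(c("control", "treatment"), times = 2)))

print(P[1, 3])   # element at row 1, column 3
print(P[, 4])    # all rows of column 4 (empty row index)
print(P[2, ])    # row 2 across all columns (empty column index)
```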
2.3 Matrix Multiplication
R has two multiplication operators for matrices:
The first is *, the ordinary multiplication sign. It performs a simple element-by-element multiplication of two matrices.
The second is %*%, which performs algebraic matrix multiplication between the two matrices.
Let’s see an example:
M is a matrix of 4 rows and 3 columns.
## [,1] [,2] [,3]
## [1,] 3 7 11
## [2,] 4 8 12
## [3,] 5 9 13
## [4,] 6 10 14
What’s the dimension of M ? NOTE: The dimensions of a matrix are the number of rows by the number of columns. If a matrix has a rows and b columns, it is an a×b matrix.
## [1] 4 3
P is a matrix of 4 rows and 3 columns
## [,1] [,2] [,3]
## [1,] 20 24 28
## [2,] 21 25 29
## [3,] 22 26 30
## [4,] 23 27 31
NOTE: P and M have same dimensions so we can do element by element multiplication.
## [,1] [,2] [,3]
## [1,] 60 168 308
## [2,] 84 200 348
## [3,] 110 234 390
## [4,] 138 270 434
Let’s move to algebraic matrix multiplication in R.
N is a matrix of 3 rows and 4 columns
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
What’s the dimension of N ?
## [1] 3 4
Let’s do matrix multiplication in R :
## [,1] [,2] [,3] [,4]
## [1,] 50 113 176 239
## [2,] 56 128 200 272
## [3,] 62 143 224 305
## [4,] 68 158 248 338
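The whole multiplication walkthrough can be reproduced with the sketch below. The matrices are reconstructed from the printed values (filled column by column), which is an assumption about how the original chunks created them.

```r
M <- matrix(3:14,  nrow = 4)   # 4 x 3
P <- matrix(20:31, nrow = 4)   # 4 x 3
N <- matrix(1:12,  nrow = 3)   # 3 x 4

print(dim(M))

# Element-by-element product: M and P must have the same dimensions.
elementwise <- M * P
print(elementwise)

# Algebraic product: the number of columns of M must equal the number of rows of N.
product <- M %*% N
print(product)
```

The result of M %*% N has as many rows as M and as many columns as N, here 4 x 4.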
3. Data Frames
Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications.
Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows.
Unlike matrices, data frames can store different classes of objects in each column. Matrices must have every element be the same class (e.g. all integers or all numeric).
3.1 How to create a data frame
Data frames are usually created by reading in a dataset with read.table() or read.csv(). However, data frames can also be created explicitly with the data.frame() function.
# Create the data frame
data <- data.frame(
Gender = rep(c("Male", "Female"), times = c(4, 6)),
Height = c(152, 171.5, 165, 170, 155, 167, 172.1, 165.8, 167, 181),
Weight = c(61, 93, 78, 61, 45, 57, 61, 55, 56, 71),
Age = c(22, 38, 26, 60, 25, 17, 26, 18, 17, 55),
Married = rep(c("Yes", "No"), times = c(6, 4))
)
print(data)
## Gender Height Weight Age Married
## 1 Male 152.0 61 22 Yes
## 2 Male 171.5 93 38 Yes
## 3 Male 165.0 78 26 Yes
## 4 Male 170.0 61 60 Yes
## 5 Female 155.0 45 25 Yes
## 6 Female 167.0 57 17 Yes
## 7 Female 172.1 61 26 No
## 8 Female 165.8 55 18 No
## 9 Female 167.0 56 17 No
## 10 Female 181.0 71 55 No
## [1] 10
## [1] 5
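The two numbers at the end of the output above match the row and column counts of the data frame. A minimal sketch, assuming the omitted calls were nrow() and ncol():

```r
data <- data.frame(
  Gender  = rep(c("Male", "Female"), times = c(4, 6)),
  Height  = c(152, 171.5, 165, 170, 155, 167, 172.1, 165.8, 167, 181),
  Weight  = c(61, 93, 78, 61, 45, 57, 61, 55, 56, 71),
  Age     = c(22, 38, 26, 60, 25, 17, 26, 18, 17, 55),
  Married = rep(c("Yes", "No"), times = c(6, 4))
)

print(nrow(data))   # number of rows (observations)
print(ncol(data))   # number of columns (variables)
print(dim(data))    # both at once
```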
R objects can have names, which is very useful for writing readable code and self-describing objects.
Note that for data frames there is a separate function for setting the row names, row.names(). The column names of a data frame are stored as its names (as in a list), so you can set them with the names() function; colnames() also works on data frames.
3.2 How to set column names and row names of a data frame
## [1] "Gender" "Height" "Weight" "Age" "Married"
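The column names printed above can be read, and changed, with names(). In this sketch the small data frame df and the replacement name are hypothetical and used only for illustration.

```r
# df is a small hypothetical data frame.
df <- data.frame(Gender = c("Male", "Female"),
                 Height = c(152, 155),
                 Weight = c(61, 45))

print(names(df))             # read the column names

names(df)[3] <- "WeightKg"   # rename one column
print(names(df))

row.names(df) <- c("1st row", "2nd row")   # set the row names
print(df)
```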
row.names(data) <- c("1st row", "2nd row", "3rd row","4th row","5th row","6th row",
"7th row","8th row","9th row","10th row")
data
## Gender Height Weight Age Married
## 1st row Male 152.0 61 22 Yes
## 2nd row Male 171.5 93 38 Yes
## 3rd row Male 165.0 78 26 Yes
## 4th row Male 170.0 61 60 Yes
## 5th row Female 155.0 45 25 Yes
## 6th row Female 167.0 57 17 Yes
## 7th row Female 172.1 61 26 No
## 8th row Female 165.8 55 18 No
## 9th row Female 167.0 56 17 No
## 10th row Female 181.0 71 55 No
3.3 Header of the data frame
Running head() on the data frame returns its first rows (six by default), which gives a quick look at the variables and their values; tail() returns the last rows.
## Gender Height Weight Age Married
## 1st row Male 152.0 61 22 Yes
## 2nd row Male 171.5 93 38 Yes
## 3rd row Male 165.0 78 26 Yes
## 4th row Male 170.0 61 60 Yes
## 5th row Female 155.0 45 25 Yes
## 6th row Female 167.0 57 17 Yes
## Gender Height Weight Age Married
## 5th row Female 155.0 45 25 Yes
## 6th row Female 167.0 57 17 Yes
## 7th row Female 172.1 61 26 No
## 8th row Female 165.8 55 18 No
## 9th row Female 167.0 56 17 No
## 10th row Female 181.0 71 55 No
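A sketch of the two calls, assuming the outputs above came from head() and tail() with their default of six rows. The data frame is rebuilt without the custom row names so the block stays short.

```r
data <- data.frame(
  Gender  = rep(c("Male", "Female"), times = c(4, 6)),
  Height  = c(152, 171.5, 165, 170, 155, 167, 172.1, 165.8, 167, 181),
  Weight  = c(61, 93, 78, 61, 45, 57, 61, 55, 56, 71),
  Age     = c(22, 38, 26, 60, 25, 17, 26, 18, 17, 55),
  Married = rep(c("Yes", "No"), times = c(6, 4))
)

first_rows <- head(data)       # first six rows
last_rows  <- tail(data)       # last six rows
print(first_rows)
print(last_rows)
print(head(data, n = 3))       # n controls how many rows are returned
```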
3.4 Structure of the data frame
To get information about the structure of the dataset (e.g. whether a variable is numeric or character), use str(); summary() then gives summary statistics for each column.
## 'data.frame': 10 obs. of 5 variables:
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Height : num 152 172 165 170 155 ...
## $ Weight : num 61 93 78 61 45 57 61 55 56 71
## $ Age : num 22 38 26 60 25 17 26 18 17 55
## $ Married: chr "Yes" "Yes" "Yes" "Yes" ...
## Gender Height Weight Age
## Length:10 Min. :152.0 Min. :45.00 Min. :17.0
## Class :character 1st Qu.:165.2 1st Qu.:56.25 1st Qu.:19.0
## Mode :character Median :167.0 Median :61.00 Median :25.5
## Mean :166.6 Mean :63.80 Mean :30.4
## 3rd Qu.:171.1 3rd Qu.:68.50 3rd Qu.:35.0
## Max. :181.0 Max. :93.00 Max. :60.0
## Married
## Length:10
## Class :character
## Mode :character
##
##
##
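A sketch of the two calls behind the outputs above, assuming they were str() and summary(). The extra lines check two of the printed statistics by hand.

```r
data <- data.frame(
  Gender  = rep(c("Male", "Female"), times = c(4, 6)),
  Height  = c(152, 171.5, 165, 170, 155, 167, 172.1, 165.8, 167, 181),
  Weight  = c(61, 93, 78, 61, 45, 57, 61, 55, 56, 71),
  Age     = c(22, 38, 26, 60, 25, 17, 26, 18, 17, 55),
  Married = rep(c("Yes", "No"), times = c(6, 4))
)

str(data)              # type and first values of every column
print(summary(data))   # per-column summary statistics

# The same statistics can be computed for a single column.
print(mean(data$Weight))
print(median(data$Age))
```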
3.5 Missing values in a data frame
Check the data for missing values, which R represents as NA.
## Gender Height Weight Age Married
## 0 0 0 0 0
## Gender Height Weight Age Married
## 1st row FALSE FALSE FALSE FALSE FALSE
## 2nd row FALSE FALSE FALSE FALSE FALSE
## 3rd row FALSE FALSE FALSE FALSE FALSE
## 4th row FALSE FALSE FALSE FALSE FALSE
## 5th row FALSE FALSE FALSE FALSE FALSE
## 6th row FALSE FALSE FALSE FALSE FALSE
## 7th row FALSE FALSE FALSE FALSE FALSE
## 8th row FALSE FALSE FALSE FALSE FALSE
## 9th row FALSE FALSE FALSE FALSE FALSE
## 10th row FALSE FALSE FALSE FALSE FALSE
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1] 0
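The four outputs above are consistent with the calls below, all built on is.na(). Which column was used for the third output is not shown, so the choice of Age is an assumption.

```r
data <- data.frame(
  Gender  = rep(c("Male", "Female"), times = c(4, 6)),
  Height  = c(152, 171.5, 165, 170, 155, 167, 172.1, 165.8, 167, 181),
  Weight  = c(61, 93, 78, 61, 45, 57, 61, 55, 56, 71),
  Age     = c(22, 38, 26, 60, 25, 17, 26, 18, 17, 55),
  Married = rep(c("Yes", "No"), times = c(6, 4))
)

print(colSums(is.na(data)))   # number of missing values per column
print(is.na(data))            # logical matrix: TRUE where a value is missing
print(is.na(data$Age))        # missing values in one column (column assumed)
print(sum(is.na(data)))       # total number of missing values
```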
3.6 Adding and removing columns from a data frame
There are many different ways of adding and removing columns from a data frame.
# Ways to add a column
data$Status <- c("student", "employee", "freelancer", "student", "employee", "freelancer", "student", "employee", "freelancer", "student")
# data[["Status"]] <- c("student", "employee", "freelancer", "student", "employee", "freelancer",
# "student", "employee", "freelancer", "student")
# data[,"Status"] <- c("student", "employee", "freelancer", "student", "employee", "freelancer",
# "student", "employee", "freelancer", "student")
# data$Status <- "employee" # Use the same value (employee) for all rows
# Ways to remove the column (only one of these is needed)
data$Age <- NULL
# data[["Age"]] <- NULL
# data[,"Age"] <- NULL
# data[[4]] <- NULL
# data[,4] <- NULL
# data <- subset(data, select = -Age)
data
## Gender Height Weight Married Status
## 1st row Male 152.0 61 Yes student
## 2nd row Male 171.5 93 Yes employee
## 3rd row Male 165.0 78 Yes freelancer
## 4th row Male 170.0 61 Yes student
## 5th row Female 155.0 45 Yes employee
## 6th row Female 167.0 57 Yes freelancer
## 7th row Female 172.1 61 No student
## 8th row Female 165.8 55 No employee
## 9th row Female 167.0 56 No freelancer
## 10th row Female 181.0 71 No student
3.7 Reordering the columns in a data frame
## Gender Weight Height Married Status
## 1st row Male 61 152.0 Yes student
## 2nd row Male 93 171.5 Yes employee
## 3rd row Male 78 165.0 Yes freelancer
## 4th row Male 61 170.0 Yes student
## 5th row Female 45 155.0 Yes employee
## 6th row Female 57 167.0 Yes freelancer
## 7th row Female 61 172.1 No student
## 8th row Female 55 165.8 No employee
## 9th row Female 56 167.0 No freelancer
## 10th row Female 71 181.0 No student
# To actually change `data`, you need to save it back into `data`:
# data <- data[c(1,3,2,4,5)]
# Reorder by column name
data[c("Gender", "Weight", "Height","Married","Status")]
## Gender Weight Height Married Status
## 1st row Male 61 152.0 Yes student
## 2nd row Male 93 171.5 Yes employee
## 3rd row Male 78 165.0 Yes freelancer
## 4th row Male 61 170.0 Yes student
## 5th row Female 45 155.0 Yes employee
## 6th row Female 57 167.0 Yes freelancer
## 7th row Female 61 172.1 No student
## 8th row Female 55 165.8 No employee
## 9th row Female 56 167.0 No freelancer
## 10th row Female 71 181.0 No student
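The first output in this section corresponds to reordering by column position. A minimal sketch of both approaches on a small hypothetical data frame df (not the full dataset):

```r
# df is a hypothetical three-column data frame used for illustration.
df <- data.frame(Gender = c("Male", "Female"),
                 Height = c(152, 155),
                 Weight = c(61, 45))

by_index <- df[c(1, 3, 2)]                        # reorder by position
by_name  <- df[c("Gender", "Weight", "Height")]   # reorder by name
print(by_index)

# Neither call changes df itself; assign the result back to keep the new order.
df <- df[c(1, 3, 2)]
print(names(df))
```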
3.8 Convert a column to a factor
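We can recode a column and convert it to a factor with R’s built-in functions. Here the values of Married are recoded to "1" and "0", and the column is then converted with factor().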
data$Married[data$Married=="Yes"] <- "1"
data$Married[data$Married=="No"] <- "0"
# Convert the column to a factor
data$Married <- factor(data$Married)
data
## Gender Height Weight Married Status
## 1st row Male 152.0 61 1 student
## 2nd row Male 171.5 93 1 employee
## 3rd row Male 165.0 78 1 freelancer
## 4th row Male 170.0 61 1 student
## 5th row Female 155.0 45 1 employee
## 6th row Female 167.0 57 1 freelancer
## 7th row Female 172.1 61 0 student
## 8th row Female 165.8 55 0 employee
## 9th row Female 167.0 56 0 freelancer
## 10th row Female 181.0 71 0 student
4. Lists
Lists are data structures whose components can have mixed data types. They allow you to store a variety of objects under one name: matrices, vectors, data frames, or even other lists. The components of a list can differ in length, characteristics, and type. We can think of a list as a kind of super data type in R: you can store practically any piece of information in it.
4.1 How to create a list
A list is created using the list() function, whose arguments are the list components. The syntax is list(comp1, comp2, ...). Remember that these components can be matrices, vectors, data frames, or even other lists. The following example creates a list containing strings, numbers, and vectors.
list1 <- list(name = "Sami", hobbie = c("programming", "cinema", "statistics", "mathematics"), age=54, genome = "ACGTCGTACAGTGCACGTGCA")
list1
## $name
## [1] "Sami"
##
## $hobbie
## [1] "programming" "cinema" "statistics" "mathematics"
##
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
4.2 How to access components of a list
Lists can be indexed the same way as vectors. Indexing with [ gives us a sublist, not the content of the component.
Let’s see an example
- indexing using a character vector:
## $age
## [1] 54
##
## $hobbie
## [1] "programming" "cinema" "statistics" "mathematics"
- indexing using a logical vector:
## $name
## [1] "Sami"
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
- indexing using an integer vector:
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
Be aware that [ returns a list.
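The three kinds of index can be sketched as follows. The exact index vectors are inferred from the components shown in each output, so they are an assumption.

```r
list1 <- list(name = "Sami",
              hobbie = c("programming", "cinema", "statistics", "mathematics"),
              age = 54, genome = "ACGTCGTACAGTGCACGTGCA")

by_name    <- list1[c("age", "hobbie")]             # character vector
by_logical <- list1[c(TRUE, FALSE, FALSE, TRUE)]    # logical vector
by_integer <- list1[c(3, 4)]                        # integer vector

print(by_name)
print(by_logical)
print(by_integer)

# Even a single index with [ gives back a list.
print(class(list1["name"]))
```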
4.3 How to access an element of a component of a list
To retrieve the content, we need to use [[.
However, this approach will allow us to access only a single component at a time. Let’s say we want to see the name:
## [1] "Sami"
Another way of accessing content of a list is the $ operator.
## [1] "Sami"
An advantage of $ is that it allows partial matching on tags. Let’s see an example:
## [1] "Sami"
Besides selecting components, we often need to select specific elements within them. For example, to select the first element of the second component of the list, we run list1[[2]][1]
## [1] "programming"
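The access patterns of this section, gathered in one sketch. The abbreviated tag used for partial matching (na) is an assumption; any prefix that identifies name uniquely would behave the same way.

```r
list1 <- list(name = "Sami",
              hobbie = c("programming", "cinema", "statistics", "mathematics"),
              age = 54, genome = "ACGTCGTACAGTGCACGTGCA")

print(list1[["name"]])   # content of one component
print(list1$name)        # same content with the $ operator
print(list1$na)          # partial matching: "na" identifies "name"
print(list1[[2]][1])     # first element of the second component
```

Partial matching is convenient interactively, but spelling tags out in full avoids surprises when a new component with a similar name is added.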
4.4 How to modify a list in R
We can change components of a list through reassignment, using any of the access techniques discussed above.
Note that reassigning an existing component keeps the order of the components unchanged.
or
Let’s check
## $name
## [1] "Ali"
##
## $hobbie
## [1] "programming" "cinema" "statistics" "mathematics"
##
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
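The two assignments separated by "or" above are not shown. A sketch of what they may have looked like, assuming the name component was changed to "Ali" as in the output:

```r
list1 <- list(name = "Sami",
              hobbie = c("programming", "cinema", "statistics", "mathematics"),
              age = 54, genome = "ACGTCGTACAGTGCACGTGCA")

list1[["name"]] <- "Ali"   # reassign with [[
list1$name <- "Ali"        # or, equivalently, with $

print(list1$name)
print(names(list1))        # the component order is unchanged
```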
4.5 How to add components to a list
To add new components we simply assign values using new tags.
## $name
## [1] "Ali"
##
## $hobbie
## [1] "programming" "cinema" "statistics" "mathematics"
##
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
##
## $passed
## [1] FALSE
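A sketch of adding the passed component shown at the end of the output above; the assignment form used in the original is not shown, so both variants here are assumptions.

```r
list1 <- list(name = "Ali",
              hobbie = c("programming", "cinema", "statistics", "mathematics"),
              age = 54, genome = "ACGTCGTACAGTGCACGTGCA")

list1$passed <- FALSE          # a new tag appends a new component
# list1[["passed"]] <- FALSE   # equivalent form

print(names(list1))
print(list1$passed)
```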
4.6 How to delete components from a list
We can delete a component by assigning NULL to it.
## $name
## [1] "Ali"
##
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
##
## $passed
## [1] FALSE
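In the output above the hobbie component is gone, which is consistent with the following sketch (the exact call in the original is an assumption):

```r
list1 <- list(name = "Ali",
              hobbie = c("programming", "cinema", "statistics", "mathematics"),
              age = 54, genome = "ACGTCGTACAGTGCACGTGCA", passed = FALSE)

list1$hobbie <- NULL   # assigning NULL removes the component

print(names(list1))
print(length(list1))
```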
4.7 How to merge Lists
You can merge several lists into one by passing them all to the c() function (list(list1, list2) would instead create a nested list with two components). Let’s create a second list.
# create a second list
list2=list("ahmed", c("programming", "music", "biology", "mathematics"), 65, "AggTTAGTAGATGTAGTCCCGTAA")
names(list2) <- c("name", "hobbie", "age", "genome")
list2
## $name
## [1] "ahmed"
##
## $hobbie
## [1] "programming" "music" "biology" "mathematics"
##
## $age
## [1] 65
##
## $genome
## [1] "AggTTAGTAGATGTAGTCCCGTAA"
Let’s merge the two lists.
## $name
## [1] "Ali"
##
## $age
## [1] 54
##
## $genome
## [1] "ACGTCGTACAGTGCACGTGCA"
##
## $passed
## [1] FALSE
##
## $name
## [1] "ahmed"
##
## $hobbie
## [1] "programming" "music" "biology" "mathematics"
##
## $age
## [1] 65
##
## $genome
## [1] "AggTTAGTAGATGTAGTCCCGTAA"
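The flat, eight-component result above is what c() produces. A sketch of the merge, with list1 in the state it had after the earlier deletions, compared with nesting via list():

```r
list1 <- list(name = "Ali", age = 54,
              genome = "ACGTCGTACAGTGCACGTGCA", passed = FALSE)
list2 <- list(name = "ahmed",
              hobbie = c("programming", "music", "biology", "mathematics"),
              age = 65, genome = "AggTTAGTAGATGTAGTCCCGTAA")

merged <- c(list1, list2)      # one flat list with all components
nested <- list(list1, list2)   # a list whose two components are lists

print(length(merged))
print(length(nested))
print(names(merged))
```

Because both lists use the same tags, the merged list contains duplicated names; merged$name returns only the first match.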
5. Arrays
Arrays are R data objects that can store data in more than two dimensions. Be aware that an array can only store values of a single data type.
5.1 How to create an array
An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create the array. For example, if we create an array of dimension (2, 3, 4), it contains 4 rectangular matrices, each with 2 rows and 3 columns.
# create first vector
vector1 <- seq(5,20, 5) #seq() function generates a sequence of numbers.
# create second vector
vector2 <- seq(25,60,5)
vector1
## [1] 5 10 15 20
vector2
## [1] 25 30 35 40 45 50 55 60
The seq() function generates a sequence of numbers: seq(from, to, by)
- from, to: the first and last numbers of the sequence
- by: the step or increment (default is 1)
Let’s create our first array of dimension (2,3,4). The syntax is the following: Array_NAME <- array(data, dim = c(row_Size, column_Size, matrices), dimnames)
- data – The input vector that is given to the array.
- matrices – The number of matrices in the array (its third dimension).
- row_Size – The number of rows in each matrix of the array.
- column_Size – The number of columns in each matrix of the array.
- dimnames – Used to change the default names of rows and columns to the user’s preference.
## , , 1
##
## [,1] [,2] [,3]
## [1,] 5 15 25
## [2,] 10 20 30
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 35 45 55
## [2,] 40 50 60
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 5 15 25
## [2,] 10 20 30
##
## , , 4
##
## [,1] [,2] [,3]
## [1,] 35 45 55
## [2,] 40 50 60
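The array printed above can be reproduced with the sketch below; the object name is an assumption borrowed from the next section. The two vectors hold 12 values in total while the array has 24 cells, so R recycles the values, which is why matrices 3 and 4 repeat matrices 1 and 2.

```r
vector1 <- seq(5, 20, 5)    # 5 10 15 20
vector2 <- seq(25, 60, 5)   # 25 30 ... 60

# Object name assumed; dim = c(rows, columns, matrices).
myfirstarray <- array(c(vector1, vector2), dim = c(2, 3, 4))
print(myfirstarray)
print(dim(myfirstarray))
```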
5.2 How to rename the rows and/or the columns of an array
Let’s name the four matrices array1, array2, array3 and array4 by using array.names. The row names are set to (“row1”,“row2”) and the column names to (“column1”,“column2”,“column3”). Each matrix has 2 rows and 3 columns.
column.names <- c("column1","column2","column3")
row.names <- c("row1","row2")
array.names <- c("array1","array2","array3","array4")
myfirstarray <- array(c(vector1,vector2),dim = c(2,3,4),dimnames = list(row.names,column.names,array.names))
myfirstarray
## , , array1
##
## column1 column2 column3
## row1 5 15 25
## row2 10 20 30
##
## , , array2
##
## column1 column2 column3
## row1 35 45 55
## row2 40 50 60
##
## , , array3
##
## column1 column2 column3
## row1 5 15 25
## row2 10 20 30
##
## , , array4
##
## column1 column2 column3
## row1 35 45 55
## row2 40 50 60
5.3 How to access an element of an array
Let’s see how the elements in the array can be extracted with the following examples. Let’s access the number 55. The number 55 is in the first row of the third column of the second array.
## [1] 55
We can extract an entire matrix of myfirstarray with the syntax myfirstarray[ , ,1]: leaving the row and column positions empty selects all rows and all columns, and the 1 selects the first matrix of the array.
## column1 column2 column3
## row1 5 15 25
## row2 10 20 30
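Both accesses above can be sketched as follows. The array is rebuilt so the block runs on its own, and the dimension names are generated with paste0() for brevity, which is a shortcut not used in the original.

```r
myfirstarray <- array(c(seq(5, 20, 5), seq(25, 60, 5)), dim = c(2, 3, 4),
                      dimnames = list(c("row1", "row2"),
                                      paste0("column", 1:3),
                                      paste0("array", 1:4)))

# Index order is [row, column, matrix].
print(myfirstarray[1, 3, 2])
print(myfirstarray["row1", "column3", "array2"])   # same element by name

# Empty row and column positions select the whole first matrix.
print(myfirstarray[, , 1])
```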
5.4 How to create matrices from this array
Let’s display the first matrix
## column1 column2 column3
## row1 5 15 25
## row2 10 20 30
Let’s display the second matrix
## column1 column2 column3
## row1 35 45 55
## row2 40 50 60
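A sketch of how the two matrices above can be extracted and then used like any other matrix. The names matrix1 and matrix2 are assumptions, and the dimension names are again generated with paste0() for brevity.

```r
myfirstarray <- array(c(seq(5, 20, 5), seq(25, 60, 5)), dim = c(2, 3, 4),
                      dimnames = list(c("row1", "row2"),
                                      paste0("column", 1:3),
                                      paste0("array", 1:4)))

# matrix1 / matrix2 are illustrative names.
matrix1 <- myfirstarray[, , 1]
matrix2 <- myfirstarray[, , 2]
print(matrix1)
print(matrix2)

# The slices are ordinary matrices, so matrix operations apply.
print(is.matrix(matrix1))
print(matrix1 + matrix2)
```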