Chapter 1: Introduction to Basic Elements in R

R has various data containers (e.g., variables, vectors, matrices, factors, data frames, and lists) on which we run our analyses. In this chapter, we examine these basic elements systematically using examples to get you going.

1. Variables: Simple Containers

Variables have different types and usually contain a specific piece of information (e.g., a specific characteristic such as height, purchase intention, or gender) about cases (e.g., experimental participants or customers). Some common variable types (classes) are numeric, character, and logical.

1.1. Defining Variables

The R code below defines one variable of each of the three types discussed above (numeric, character, and logical, respectively):

A = 2.78
My_text = "Hello"
My_logical = FALSE

Assignments in R can also be made with the <- assignment operator. For example, x <- 2 assigns 2 to the variable x. Another way to assign a value to a variable is the assign() function. The name of the variable comes first, always within quotation marks (""), followed by the value to be given to it:

assign("a",2)
a

## [1] 2

1.2. Calling Variables

After a value has been assigned to a variable, we can examine its contents in different ways. For example, we can use the print() function or simply type the name of the variable:

print(A)

## [1] 2.78

A

## [1] 2.78

1.3. Checking Variables’ Class

Everything in R is an object. Objects belong to classes (types), and each class has its own specific characteristics. To identify the class of an object (e.g., the type of a variable), use the class() function as follows:

class(A)

## [1] "numeric"

class(My_text)

## [1] "character"

class(My_logical)

## [1] "logical"

A more comprehensive way to examine an object is the str() (structure) function. Whereas class() only identifies the type of an object (e.g., whether it is numeric or character), str() also shows further information, such as the object's length and its first few values. str() is especially useful for summarizing lists and data frames.

str(A)

##  num 2.78

str(My_text)

##  chr "Hello"

str(My_logical)

##  logi FALSE

Yet another alternative is the summary() function. Numeric variables are summarized by their minimum, quartiles, median, mean, and maximum. Categorical (factor) and logical vectors are summarized by the count of each value. Multidimensional objects, such as matrices and data frames, are summarized column by column. For example, summary() on a data frame works like calling summary() on each individual column.
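The sketch below uses made-up vectors `scores` and `passed`, which are assumptions for illustration and are not used elsewhere in the chapter. It shows summary() applied to a numeric vector and to a logical vector:

```r
# Hypothetical example data (not from the text above)
scores <- c(1, 2, 3, 4, 100)
summary(scores)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

passed <- c(TRUE, FALSE, TRUE, TRUE)
summary(passed)   # the mode ("logical") plus counts of FALSE and TRUE
```

Notice how the single large value 100 pulls the mean (22) far above the median (3).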

Most of the common classes have their own is.*() functions:

is.character("red lorry, yellow lorry")

## [1] TRUE

is.logical(FALSE)

## [1] TRUE

is.numeric(2)

## [1] TRUE

is.list(list(a = 1, b = 2))  #we learn about lists later in this chapter.

## [1] TRUE

We can see a complete list of all the is.*() functions in the base package using:

ls(pattern = "^is", baseenv())

##  [1] "is.array"                "is.atomic"              
##  [3] "is.call"                 "is.character"           
##  [5] "is.complex"              "is.data.frame"          
##  [7] "is.double"               "is.element"             
##  [9] "is.environment"          "is.expression"          
## [11] "is.factor"               "is.finite"              
## [13] "is.function"             "is.infinite"            
## [15] "is.integer"              "is.language"            
## [17] "is.list"                 "is.loaded"              
## [19] "is.logical"              "is.matrix"              
## [21] "is.na"                   "is.na.data.frame"       
## [23] "is.na.numeric_version"   "is.na.POSIXlt"          
## [25] "is.na<-"                 "is.na<-.default"        
## [27] "is.na<-.factor"          "is.na<-.numeric_version"
## [29] "is.name"                 "is.nan"                 
## [31] "is.null"                 "is.numeric"             
## [33] "is.numeric.Date"         "is.numeric.difftime"    
## [35] "is.numeric.POSIXt"       "is.numeric_version"     
## [37] "is.object"               "is.ordered"             
## [39] "is.package_version"      "is.pairlist"            
## [41] "is.primitive"            "is.qr"                  
## [43] "is.R"                    "is.raw"                 
## [45] "is.recursive"            "is.single"              
## [47] "is.symbol"               "is.table"               
## [49] "is.unsorted"             "is.vector"              
## [51] "isatty"                  "isBaseNamespace"        
## [53] "isdebugged"              "isIncomplete"           
## [55] "isNamespace"             "isNamespaceLoaded"      
## [57] "isOpen"                  "isRestart"              
## [59] "isS4"                    "isSeekable"             
## [61] "isSymmetric"             "isSymmetric.matrix"     
## [63] "isTRUE"

In the preceding example, ls() lists the objects it is asked to list. pattern = "^is" is a regular expression that matches names beginning with "is", and baseenv() is a function that returns the environment of the base package. (The exact list depends on your R version.)

Note: is.numeric returns TRUE for integers as well as floating point values.
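Here is a minimal illustration of this note. It relies on R's `L` suffix for integer literals, which is not introduced elsewhere in this chapter:

```r
x_double  <- 5    # numbers typed without a suffix are stored as doubles
x_integer <- 5L   # the L suffix creates an integer

is.numeric(x_double)    # TRUE
is.numeric(x_integer)   # TRUE: integers count as numeric too
is.integer(x_double)    # FALSE
is.integer(x_integer)   # TRUE
class(x_integer)        # "integer"
```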

1.4. Changing Class of Variables

Sometimes we want to change the type of an object. This is called casting (or coercion), and most is.*() functions have a corresponding as.*() function to achieve this.

For example, in the code below we first assign a character string to a variable and then convert it to a number.

myvar = "1234"
class(myvar)

as.numeric(myvar)  #returns a numeric copy of myvar; myvar itself is unchanged.
class(myvar)       #still "character".
myvar + 10         #error: non-numeric argument to binary operator.

as.numeric(myvar) + 10  #this works.

Note: as.*() functions return a converted copy of a variable; they do not change the class of the original variable. To change the class of the variable itself, one can assign to class(), though this is not recommended, because class assignment is usually meant for a different, more technical purpose. For illustration purposes only:

myvar= "1234"
myvar + 10   #does not work because myvar is still a character variable. 

class(myvar) = "numeric"
myvar + 10  #now it works.

The usual approach is to assign the result of as.*() to a new variable. For example:

myvar="1234"
new_var= as.numeric(myvar)
new_var+10

## [1] 1244
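We can also overwrite the original variable with its converted version. The sketch below shows that, plus what happens when a string cannot be read as a number. The vector `bad` is a hypothetical example:

```r
myvar <- "1234"
myvar <- as.numeric(myvar)   # overwrite the original with its numeric version
class(myvar)                 # "numeric"
myvar + 10                   # 1244

# Strings that do not look like numbers become NA, with a warning
bad <- as.numeric(c("12", "abc"))   # Warning: NAs introduced by coercion
bad                                  # 12 NA
```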

1.5. Special Numbers and Logical Symbols

To help with arithmetic, R supports four special numeric values: Inf, -Inf, NaN, and NA. The first two are, of course, positive and negative infinity, but the second pair needs a little more explanation. NaN is short for "not a number" and means that a calculation either did not make mathematical sense or could not be performed properly. NA is short for "not available" and represents a missing value, a problem all too common in data analysis.

In general, if our calculation involves a missing value, then the results will also be missing:

grade = c(3, 5, 7, NA)
mean(grade)   #results in NA, because of a missing value in the vector.

## [1] NA
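Many summary functions accept an `na.rm` argument that drops missing values before computing. A brief sketch with the same `grade` vector:

```r
grade <- c(3, 5, 7, NA)
mean(grade, na.rm = TRUE)   # 5: the NA is removed before averaging
sum(grade, na.rm = TRUE)    # 15
```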

There are functions to check for these special values. Once again, the is.*() family of functions comes in handy:

is.na(grade) #checks whether any of the values in the grade vector is NA (missing).

## [1] FALSE FALSE FALSE  TRUE

is.nan(grade) #checks whether each value is undefined, that is, NaN.

## [1] FALSE FALSE FALSE FALSE

limits = c(Inf, 2, 5)  #Inf (infinity) is not undefined, so it is not NaN.
is.nan(limits)

## [1] FALSE FALSE FALSE

is.infinite(limits)  #this indicates which value in the vector is infinity.

## [1]  TRUE FALSE FALSE

impossible = c(Inf/Inf, 9, 0)  #infinity divided by infinity is undefined, so is.nan() returns TRUE for the first element this time.
is.nan(impossible)

## [1]  TRUE FALSE FALSE
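A subtle point is that is.na() also returns TRUE for NaN, whereas is.nan() returns FALSE for NA. A short sketch, using the hypothetical vector `specials` built from divisions by zero:

```r
specials <- c(1/0, -1/0, 0/0, NA)   # Inf, -Inf, NaN, NA

is.infinite(specials)   # TRUE  TRUE  FALSE FALSE
is.nan(specials)        # FALSE FALSE TRUE  FALSE
is.na(specials)         # FALSE FALSE TRUE  TRUE: NaN also counts as NA
```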

Finally, ! is used for not, & is used for and, and | is used for or.
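The following is a small sketch of these operators on hypothetical logical vectors `x` and `y`. Applied to vectors, the operators work element by element:

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

!x      # FALSE FALSE TRUE  TRUE
x & y   # TRUE  FALSE FALSE FALSE
x | y   # TRUE  TRUE  TRUE  FALSE

# Combined with comparisons: which values lie strictly between 2 and 5?
v <- c(1, 3, 4, 6)
v > 2 & v < 5   # FALSE TRUE TRUE FALSE
```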

1.6. Formatting Numbers

To control how numbers are printed, we can use the format() function. Notice that although the input here is numeric, the output is always a character vector (or array).

initial = 1:3
powered = exp(initial)  #computes e^1, e^2, and e^3.

powered  # see the output numbers.

## [1]  2.718282  7.389056 20.085537

class(powered)

## [1] "numeric"

format(powered, digits = 3, scientific = FALSE)  #digits sets the minimum number of significant digits; all elements share the same number of decimals, so 20.085537 ends up with 4 significant digits (20.09).

## [1] " 2.72" " 7.39" "20.09"

format(powered, digits = 3, scientific = TRUE)  #scientific notation.

## [1] "2.72e+00" "7.39e+00" "2.01e+01"

2. Vectors

Another basic element in R is the vector, which combines and stores a set of values of the same type: for example, all numeric or all character.

2.1. Creating Vectors and Naming Their Elements

We can create a vector with the function c(), which stands for combine. Moreover, we can assign names to the elements of a vector using the names() function:

some_vector = c("John Doe", "poker player")
names(some_vector) = c("Name", "Profession")

This code first creates a vector some_vector and then gives its two elements a name. The first element is labeled Name, while the second element is labeled Profession. Printing the vector leads to the following:

some_vector

##           Name     Profession 
##     "John Doe" "poker player"

We can also check the structure of our vector:

str(some_vector)

##  Named chr [1:2] "John Doe" "poker player"
##  - attr(*, "names")= chr [1:2] "Name" "Profession"

Example 1: Quick Naming

We can also specify names directly when we create a vector, in the form name = value (a name containing a space, such as "kiwi fruit", must be quoted). We can name some elements of a vector and leave others unnamed.

fruits= c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
fruits

##      apple     banana kiwi fruit            
##          1          2          3          4

We can also name the elements of a vector after it has been created, using the names() function. Imagine we have two numeric vectors: the first contains the amounts in dollars that you won playing poker from Monday to Friday last week, and the second contains the amounts in dollars that you won playing roulette on those days.

poker_vector = c(140, -50, 20, -120, 240)

roulette_vector = c(-24, -50, 100, -350, 10)

We want to name the elements of both vectors after the days of the week. Because the names are the same for both vectors, we can avoid repetition by first creating a vector that contains the names and then assigning that vector as the names of each of the vectors above.

days_vector = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

names(poker_vector) = days_vector  

names(roulette_vector) = days_vector

poker_vector

##    Monday   Tuesday Wednesday  Thursday    Friday 
##       140       -50        20      -120       240

roulette_vector

##    Monday   Tuesday Wednesday  Thursday    Friday 
##       -24       -50       100      -350        10

To retrieve the names of a vector, we can use the names() function again (when it is not on the left-hand side of an assignment, it returns the names):

names(poker_vector)

## [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"

Note: Can a vector contain different types of data?

Not really. If we combine values of different types with c(), R coerces them all to one common type. For example, vector a below is created from two numbers and a character string, yet when we check its class, it is a character vector: R has converted 2 and 5 into the strings "2" and "5". Vector b, by contrast, has only numeric elements, and its class is numeric. (To store elements of different types side by side, we use lists, which we cover later in this chapter.)

a = c(2, "Rad", 5)

b= c(4,6,9)

class(a)

## [1] "character"

class(b)

## [1] "numeric"

Because a is a character vector, we cannot run arithmetic operations on it. For example, adding the two vectors (a + b) is meaningless, and R stops with an error (non-numeric argument to binary operator).
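The following is a short sketch of R's coercion rules (logical, then numeric, then character, with everything converted to the most flexible type present). It also captures the message produced by a + b, using tryCatch(), which is not otherwise covered in this chapter:

```r
class(c(1, TRUE))     # "numeric": TRUE becomes 1
class(c(1, "a"))      # "character": 1 becomes "1"
class(c(TRUE, "a"))   # "character": TRUE becomes "TRUE"

a <- c(2, "Rad", 5)
b <- c(4, 6, 9)
# Capture the error message instead of stopping the script
result <- tryCatch(a + b, error = function(e) conditionMessage(e))
result   # "non-numeric argument to binary operator"
```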

2.2. Indexing Vectors: Calling and Selecting Specific Elements in a Vector

Let's work with the poker vector we defined in the previous section.

poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector

Imagine we want to select elements of this vector. In general, to select elements of a vector (and also of matrices, data frames, etc.), we use square brackets []. Between the square brackets, we indicate which element(s) to select. For example, to select the first element of the vector, we use the following line of code:

poker_vector[1]

## Monday 
##    140

poker_vector[4]

## Thursday 
##     -120

It is important to notice that the first element in a vector has index of 1, not 0 as in many other programming languages.
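Because counting starts at 1, index 0 does not return the first element. Likewise, an index past the end does not raise an error. A brief sketch with the same poker vector:

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

poker_vector[1]   # Monday 140: the first element
poker_vector[0]   # an empty vector, not the first element
poker_vector[6]   # NA, since there is no sixth element
```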

Another way to call an element is to call its corresponding name label, instead of using its numeric location. For example, the following code also returns the first element in the vector:

poker_vector["Monday"]

## Monday 
##    140

We can also call and return more than one element of a vector. For example, if we would like to call elements 2, 3 and 4 from our vector, then the following methods will do the job for us:

poker_vector[2:4]    # a:b generates the sequence of integers from a to b, including a and b.

##   Tuesday Wednesday  Thursday 
##       -50        20      -120

An alternative method is to use a vector of locations we are interested in within the brackets. For example:

poker_vector[c(2,3,4)]

##   Tuesday Wednesday  Thursday 
##       -50        20      -120

This method is particularly useful when we want to select elements that are not adjacent to each other, or in an arbitrary order.

poker_vector[c(1,4,5)]

##   Monday Thursday   Friday 
##      140     -120      240

The index vector can also be stored in a variable first and then used within the brackets:

element_place = c(1,4,5)
poker_vector[element_place]

##   Monday Thursday   Friday 
##      140     -120      240

Another way to call the elements using their names is:

poker_vector[c("Monday","Friday")]

## Monday Friday 
##    140    240

We should note that (1) we must use square brackets, and (2) for a vector, the brackets must contain only ONE index object. That object can be a single number, a character string, a vector, etc., but it is always ONE object. Therefore, code like poker_vector[1,3,5] is wrong: the brackets contain three separate indices rather than one, so R treats them as three dimensions and reports an error. The correct form is poker_vector[c(1,3,5)].
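The sketch below contrasts the two forms. It uses tryCatch() only to capture the error message so the script keeps running:

```r
poker_vector <- c(140, -50, 20, -120, 240)

poker_vector[c(1, 3, 5)]   # correct: ONE index vector inside the brackets

# Incorrect: three separate indices are read as three dimensions
wrong <- tryCatch(poker_vector[1, 3, 5], error = function(e) conditionMessage(e))
wrong   # "incorrect number of dimensions"
```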

Another useful function is length(). Every vector has a length, which tells us how many elements it contains. This is a non-negative integer (yes, zero-length vectors are allowed), and we can access it with the length() function. Below are several examples:

length(1:5)

## [1] 5

length(c(TRUE, FALSE, NA))

## [1] 3

sn <- c("Sheena", "leads", "Sheila", "needs") 
length(sn) # returns the number of strings in the vector.

## [1] 4

If instead of number of strings in a character vector, we were interested in knowing the length of each string in that vector (i.e., number of their characters), then nchar() function can be used:

sn <- c("Sheena", "leads", "Sheila", "needs") 
nchar(sn)

## [1] 6 5 6 5

Besides reading the number of elements in a vector, we can also use length() to assign a new length to a vector, but this is an unusual thing to do and probably indicates bad code. If you shorten a vector, the values at the end are removed; if you extend it, missing values (NA) are added to the end:

poincare = c(1, 0, 0, 0, 2, 0, 2, 0)  
length(poincare) = 3 
poincare

## [1] 1 0 0

length(poincare) = 8 
poincare

## [1]  1  0  0 NA NA NA NA NA

2.3. Selecting Vector Elements by Comparison

We can also examine the elements of a vector using logical (comparison) operators. The comparison is then conducted on every element of the vector, and the result is a vector of TRUE or FALSE values, one for each comparison. In other words, the result is a logical vector of the same length as the original vector.

For example, let's check which elements of poker_vector are bigger than 50. The single line below does the job:

poker_vector > 50  #when there is a TRUE in the output, it means that its corresponding element in the vector satisfies the condition.

##    Monday   Tuesday Wednesday  Thursday    Friday 
##      TRUE     FALSE     FALSE     FALSE      TRUE
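To answer the yes/no question "is there any element bigger than 50?", we can wrap the comparison in any(). The sketch below also uses which() to get the positions of the TRUE values, and sum() to count them; none of these three functions is introduced elsewhere in this chapter:

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

any(poker_vector > 50)     # TRUE: at least one element exceeds 50
which(poker_vector > 50)   # positions of the TRUE values: Monday 1, Friday 5
sum(poker_vector > 50)     # 2: TRUE counts as 1 and FALSE as 0
```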

Usually, our aim goes beyond simply running a logical operation on every element of a vector. For example, we may want to select only those elements that satisfy a condition; in our example, those bigger than 50. Below, we introduce two ways to do so:

# method 1: 
poker_vector[poker_vector>50]

## Monday Friday 
##    140    240

# method 2
bigger_than_50_true_vector = poker_vector>50

poker_vector[bigger_than_50_true_vector]

## Monday Friday 
##    140    240

Note that when a logical vector (or a comparison that produces one) is placed within the brackets, as in the code above, only the elements at the positions marked TRUE are returned. This technique, called logical indexing, is one of R's great conveniences.

2.4. Recycling and Repetition

The rep() function is very useful for creating a vector with repeated elements:

rep(1:5, 3)

##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

rep(1:5, each = 3) #compare the output with the previous code.

##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

rep(1:5, times = 1:5) #element i is repeated times[i] times.

##  [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5

rep(1:5, length.out = 7) #repeats 1 to 5 until 7 numbers in total are created.

## [1] 1 2 3 4 5 1 2

rep_len(1:5, 7)  #an alternative for the above code.

## [1] 1 2 3 4 5 1 2
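Recycling, the other half of this section's title, happens whenever R combines two vectors of different lengths element by element: the shorter vector is reused from its start. A minimal sketch:

```r
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24: c(10, 20) is used twice
c(1, 2, 3, 4) * 2           # 2 4 6 8: a single number is recycled
c(1, 2, 3) + c(10, 20)      # 11 22 13, with a warning because 3 is not a multiple of 2
```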

2.5. Pasting Vectors

The paste() function joins the elements of its first argument to the corresponding elements of its second argument, separated by a space by default. If one argument is shorter, its elements are recycled. The output is a character vector, even if the arguments are numeric.

paste(c(1,2,3), c(1:10))  #the shorter vector is recycled. Remember that the output is a character vector.

##  [1] "1 1"  "2 2"  "3 3"  "1 4"  "2 5"  "3 6"  "1 7"  "2 8"  "3 9"  "1 10"

paste(c("hello", "hi"), c("bye", "bye"))

## [1] "hello bye" "hi bye"

paste(c("hello", "hi"), c("bye", "bye"), sep = "-") #indicating the separator type.

## [1] "hello-bye" "hi-bye"
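Two related options that often come in handy are paste0(), which joins without any separator, and the collapse argument, which merges all elements into a single string. A quick sketch:

```r
paste0("day", 1:3)                       # "day1" "day2" "day3"
paste(c("a", "b", "c"), collapse = "+")  # "a+b+c": one single string
```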

Example 2: A Random-Number Vector and More

Create a vector containing 100 randomly generated numbers from a standard normal distribution, that is, with a mean of 0 and a standard deviation (sd) of 1. Then (a) check which numbers in the vector are bigger than 1, (b) print those numbers, and (c) use the length() function to find how many numbers in the vector are bigger than 1. What proportion of the numbers are bigger than 1?

My_Vector = rnorm(100, mean=0, sd=1)

#(a)
My_Vector > 1

##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [12]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
##  [23] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [34] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [45] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
##  [78]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
## [100]  TRUE

#(b)
My_Vector[My_Vector>1]

##  [1] 2.125449 1.742505 1.518000 1.557337 1.559839 2.933353 1.265785
##  [8] 1.681433 1.275373 2.181931 1.297052 1.638294 1.862588 1.123541
## [15] 1.771729 1.589635 1.603114

#(c)
Bigger_than_ONE_Vector = My_Vector[My_Vector>1]
length(Bigger_than_ONE_Vector)   #returns the length of a vector.

## [1] 17

percent = length(Bigger_than_ONE_Vector)/100  #proportion (out of 1) of numbers bigger than 1 in our sample vector.
percent

## [1] 0.17
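Because rnorm() draws new random numbers on every run, your output will differ from the one shown above. The sketch below fixes the random seed with set.seed() so that results are reproducible; the seed 123 is an arbitrary choice, so the numbers will not match the run above. It also shows a shortcut: the mean of a logical vector is the proportion of TRUE values.

```r
set.seed(123)   # arbitrary seed, chosen only for reproducibility
My_Vector <- rnorm(100, mean = 0, sd = 1)

prop_sum  <- sum(My_Vector > 1) / length(My_Vector)
prop_mean <- mean(My_Vector > 1)   # mean of TRUE/FALSE = proportion of TRUEs
all.equal(prop_sum, prop_mean)     # TRUE
```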

Note: a nice extension is to repeat the above code 5000 times. Each time, we record the proportion of random numbers between -1 and 1, that is, within +/- 1 SD of the mean. The average of those 5000 proportions is an estimate of the true proportion of values that lie within +/- 1 SD in a normal distribution. The same can be done for +/- 2 SD and then +/- 3 SD; the latter should include about 99.7% of all observations. We will do this exercise when we study loops.
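The exact values that such a simulation should approach can be computed directly from the standard normal cumulative distribution function, pnorm(), which is not otherwise introduced in this chapter. A short sketch:

```r
# Share of a standard normal distribution within +/- k standard deviations
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 3)   # 0.683 0.954 0.997
```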

3. Matrices: Two-Dimensional Arrays of Elements of the Same Type

3.1. Creating a Matrix

We can make a matrix with the matrix() function. Matrices have rows and columns, so we need to specify how the elements are arranged. For example, let's begin by arranging the numbers 1 to 9 in a 3x3 matrix.

matrix(1:9, byrow = TRUE, ncol = 3) # arrange numbers 1 to 9, in a 3x3 matrix, in a row-based order.

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

matrix(1:9, byrow = FALSE, ncol = 3) # arrange numbers 1 to 9, in a 3x3 matrix, in a column-based order.

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

The output of matrix() function is a matrix object and can be saved into another object/variable.

my_object = matrix(1:20, byrow = TRUE, ncol = 10)  # a matrix of 2x10, that is 2 rows and 10 columns. 

str(my_object)  #indicates the structure of the matrix object.

##  int [1:2, 1:10] 1 11 2 12 3 13 4 14 5 15 ...

my_object

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]   11   12   13   14   15   16   17   18   19    20
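str() lists the values as 1 11 2 12 ... because R stores every matrix column by column (column-major order), even when it was filled with byrow = TRUE. as.vector() exposes this internal order:

```r
my_object <- matrix(1:20, byrow = TRUE, ncol = 10)
as.vector(my_object)[1:4]   # 1 11 2 12: down column 1, then down column 2, ...
```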

Example 3: Combining Vectors into a Matrix

Imagine that the following three vectors give Star Wars box-office sales, in millions of dollars, in the US (first element) and outside the US (second element) for three episodes of the movie: A New Hope, The Empire Strikes Back, and Return of the Jedi. (a) Create a new vector that combines all the sales and call it total_sales. (b) Create a 3x2 matrix of Star Wars sales, such that the first column is US sales and the second is non-US sales; each row then represents one episode. (c) Print the matrix.

new_hope = c(460.998, 314.4)
empire_strikes = c(290.475, 247.900)
return_jedi = c(309.306, 165.8)

total_sales = c(new_hope, empire_strikes, return_jedi)
star_wars_matrix = matrix(total_sales, byrow = TRUE, ncol = 2)
star_wars_matrix

##         [,1]  [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8

3.2. Naming Rows and Columns

Now that we have made a matrix, we can label its rows and columns. To name rows, we use the rownames() function; to name columns, we use the colnames() function.

First, we define two vectors containing the names for the columns (regions of sales) and the rows (film titles), respectively:

region = c("US", "non-US")
titles = c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

Now we assign these names using the rownames() and colnames() functions as follows:

rownames(star_wars_matrix) = titles
colnames(star_wars_matrix) = region

We could also do this naming more directly, as follows:

rownames(star_wars_matrix) = c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
colnames(star_wars_matrix) = c("US", "non-US")

We can now print the matrix and see the outcome:

star_wars_matrix

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8
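Names can also be attached when the matrix is created, through the dimnames argument of matrix(): a list whose first element holds the row names and whose second holds the column names. A sketch that rebuilds the same table under the hypothetical name star_wars_matrix2:

```r
star_wars_matrix2 <- matrix(c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8),
                            byrow = TRUE, ncol = 2,
                            dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                              "Return of the Jedi"),   # row names
                                            c("US", "non-US")))        # column names
star_wars_matrix2
```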

If a matrix's rows and columns already have names, we can retrieve them with the same rownames() and colnames() functions.

rownames(star_wars_matrix)

## [1] "A New Hope"              "The Empire Strikes Back"
## [3] "Return of the Jedi"

colnames(star_wars_matrix)

## [1] "US"     "non-US"

3.3. Dimensions and Summary of Matrices

For matrices, the dim() function returns an integer vector with the dimensions of the matrix: the number of rows and then the number of columns.

dim(star_wars_matrix)

## [1] 3 2

For matrices, the functions nrow() and ncol() return the number of rows and columns, respectively:

nrow(star_wars_matrix)

## [1] 3

ncol(star_wars_matrix)

## [1] 2

The length() function that we have previously used with vectors also works on matrices. In this case it returns the product of dimensions:

length(star_wars_matrix)

## [1] 6

We can also use the summary() function on matrices. As mentioned before, summary() summarizes multidimensional objects column by column.

summary(star_wars_matrix)

##        US            non-US     
##  Min.   :290.5   Min.   :165.8  
##  1st Qu.:299.9   1st Qu.:206.8  
##  Median :309.3   Median :247.9  
##  Mean   :353.6   Mean   :242.7  
##  3rd Qu.:385.2   3rd Qu.:281.1  
##  Max.   :461.0   Max.   :314.4

3.4. Calculating Sum of Rows and Columns in a Matrix

Imagine we want to see the total sales for each Star Wars movie, as well as the total sales in each region. The functions rowSums() and colSums() do the job for us: rowSums() calculates the sum of each row of the matrix, and colSums() calculates the sum of each column. Each function returns a vector, and if the rows or columns of the matrix have names, those names label the elements of the returned vector.

movie_sales = rowSums(star_wars_matrix)
movie_sales

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                 775.398                 538.375                 475.106

region_sales = colSums(star_wars_matrix)
region_sales

##       US   non-US 
## 1060.779  728.100

3.5. Adding a Column OR Row to an Already Existing Matrix

The c() function turns matrices into vectors, combines them, and returns a single vector, so it is not useful for combining matrices. We use the functions cbind() and rbind() instead.

Imagine we would like to add a column to star_wars_matrix that contains the total sales in each row, that is, the total sales for each movie. We can do this with the cbind() function, which stands for column bind. The outcome is a new matrix that can be named and used in further analyses.

We use the vector of row sales that we calculated in the previous section here and bind it to the matrix.

new_matrix_col = cbind(star_wars_matrix, movie_sales)
new_matrix_col   #notice that the name of the vector (i.e., movie_sales) is now the label name of the new column in our matrix. Cool!

##                              US non-US movie_sales
## A New Hope              460.998  314.4     775.398
## The Empire Strikes Back 290.475  247.9     538.375
## Return of the Jedi      309.306  165.8     475.106

In a similar fashion, the function rbind() adds a row to an already existing matrix. The output is a new matrix that can be named and used.

new_matrix_row = rbind(star_wars_matrix, region_sales)
new_matrix_row #notice that the name of the vector (i.e., region_sales) is now the label of the new row in our matrix.

##                               US non-US
## A New Hope               460.998  314.4
## The Empire Strikes Back  290.475  247.9
## Return of the Jedi       309.306  165.8
## region_sales            1060.779  728.1

Example 4: Combining Matrices

Make two matrices of 3x3, one arranging numbers from 1 to 9, and the other from 10 to 18. Then combine them both into a new matrix of 3x6. Could we also make a matrix of 6x3 out of these two?

A = matrix(1:9, byrow = TRUE, ncol = 3)
B = matrix(10:18, byrow = TRUE, ncol=3)

M = cbind(A, B)
M         #matrix of 3x6

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    3   10   11   12
## [2,]    4    5    6   13   14   15
## [3,]    7    8    9   16   17   18

N = rbind(A, B)
N        #matrix of 6x3

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
## [5,]   13   14   15
## [6,]   16   17   18

NOTE: As the example above shows, both cbind() and rbind() can combine whole matrices, not only a matrix and a vector. For example, big_matrix = cbind(matrix1, matrix2, ..., matrixN) combines N matrices column-wise into a bigger one.
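Combining only works when the shapes are compatible: cbind() needs matrices with the same number of rows, and rbind() needs the same number of columns. The sketch below triggers the mismatch on purpose with hypothetical matrices `three_rows` and `two_rows`, using tryCatch() to capture the message:

```r
three_rows <- matrix(1:9, ncol = 3)   # 3 x 3
two_rows   <- matrix(1:4, ncol = 2)   # 2 x 2

res <- tryCatch(cbind(three_rows, two_rows), error = function(e) conditionMessage(e))
res   # an error message saying the numbers of rows must match
```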

3.6. Indexing Elements of a Matrix

As with vectors, we use square brackets [ ] to select one or more elements of a matrix. Whereas vectors have only one dimension, matrices have two (rows and columns), so a comma separates the row index from the column index. For example, the following code returns the element in the 3rd row and 2nd column of star_wars_matrix:

star_wars_matrix[3,2]

## [1] 165.8

Also, the codes below return the entire 2nd row and the entire 1st column of the star_wars_matrix, respectively.

star_wars_matrix[2,] #returning 2nd row of the matrix.

##      US  non-US 
## 290.475 247.900

star_wars_matrix[,1] #returning 1st column of the matrix.

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                 460.998                 290.475                 309.306

star_wars_matrix[c(1,3),1] #returns elements [1,1] and [3,1].

##         A New Hope Return of the Jedi 
##            460.998            309.306

star_wars_matrix

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8

Example 5: Returning Parts of a Matrix

Create a 10 x 10 matrix containing the numbers 1 to 100. Name the matrix my_matrix and do the following:
a) return all elements in rows 2 to 4 and columns 3 to 6;
b) return elements (1,1) and (3,1) with one command;
c) try the code my_matrix[c(1,3,5), c(2,4,6)]. What is the output?

my_matrix = matrix(1:100, byrow = TRUE, ncol = 10)
my_matrix

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]   11   12   13   14   15   16   17   18   19    20
##  [3,]   21   22   23   24   25   26   27   28   29    30
##  [4,]   31   32   33   34   35   36   37   38   39    40
##  [5,]   41   42   43   44   45   46   47   48   49    50
##  [6,]   51   52   53   54   55   56   57   58   59    60
##  [7,]   61   62   63   64   65   66   67   68   69    70
##  [8,]   71   72   73   74   75   76   77   78   79    80
##  [9,]   81   82   83   84   85   86   87   88   89    90
## [10,]   91   92   93   94   95   96   97   98   99   100

# Part A. 
my_matrix[2:4, 3:6]

##      [,1] [,2] [,3] [,4]
## [1,]   13   14   15   16
## [2,]   23   24   25   26
## [3,]   33   34   35   36

# Part B. 
my_matrix[c(1,3), 1]

## [1]  1 21

# Part C. 
my_matrix[c(1,3,5), c(2,4,6)] #returns a 3x3 matrix: elements (1,2), (1,4), and (1,6) of my_matrix in its first row, (3,2), (3,4), and (3,6) in its second row, and (5,2), (5,4), and (5,6) in its last row.

##      [,1] [,2] [,3]
## [1,]    2    4    6
## [2,]   22   24   26
## [3,]   42   44   46

Note: For visualizing two-dimensional variables such as matrices and data frames, the View() function (notice the capital “V”) displays the variable as a (read-only) spreadsheet.

View(my_matrix)

3.7. Matrix Arithmetic

k * Matrix multiplies every element of Matrix by k. Similarly, but contrary to the rules of matrix algebra, Matrix1 * Matrix2 (assuming the two matrices have the same dimensions) returns a new matrix in which each element is the product of the corresponding elements of Matrix1 and Matrix2.

NOTE: As you may have noticed, multiplying two matrices with * does not return the matrix product we learn in linear algebra. To get the algebraic product of two matrices, use the %*% operator in R.

Below, we create a matrix consisting of the squares of 1 to 9:

M1 = matrix(1:9, byrow = TRUE, ncol = 3)
Square_Matrix = M1*M1
Square_Matrix

##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36
## [3,]   49   64   81

Element-wise multiplication is also useful, for example, when matrix A holds the quantity of product (i,j) sold in a store and matrix B, with exactly the same size and structure, holds the price of product (i,j). A sales matrix for all products can then be created with the command A*B.
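A minimal sketch of this idea, with made-up quantities and prices for 2 stores (rows) and 3 products (columns). All numbers here are hypothetical:

```r
# Hypothetical data: 2 stores (rows) x 3 products (columns)
quantity <- matrix(c(10, 0, 5,
                      2, 8, 1), byrow = TRUE, nrow = 2)
price    <- matrix(c(1.5, 2.0, 4.0,
                     1.5, 2.0, 4.0), byrow = TRUE, nrow = 2)

sales <- quantity * price   # element-wise: quantity (i,j) times price (i,j)
sales                       # row 1: 15 0 20; row 2: 3 16 4
sum(sales)                  # 58: total revenue across all stores and products
```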

Finally, in the following code, we do the mathematical (algebraic) multiplication of two matrices:

My_test = matrix(1:4, byrow=TRUE, nrow=2)
My_test

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

Math_way_multiplication_matrix = My_test%*% My_test
Math_way_multiplication_matrix

##      [,1] [,2]
## [1,]    7   10
## [2,]   15   22

To transpose a matrix, we use the function t().

t(My_test)  # do not confuse this with the inverse matrix; this is just the transpose.

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

To calculate the inverse of a matrix, that is A^-1, we use the function solve():

solve(My_test)

##      [,1] [,2]
## [1,] -2.0  1.0
## [2,]  1.5 -0.5

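Recall that A %*% A^-1 is the identity matrix I. We can check this (up to floating-point rounding, which is why a tiny value such as 1.110223e-16 appears below instead of exactly 0):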

My_test%*%solve(My_test)

##      [,1]         [,2]
## [1,]    1 1.110223e-16
## [2,]    0 1.000000e+00
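Because of this rounding, comparing the result with the identity matrix using == can fail; a tolerance-based comparison is safer. The sketch below uses base R's diag() (which builds an identity matrix) and all.equal(). It also shows that solve() can solve a linear system A %*% x = b directly; the right-hand side c(5, 11) is an arbitrary example.

```r
My_test = matrix(1:4, byrow = TRUE, nrow = 2)

# diag(2) is the 2x2 identity matrix; all.equal() compares numbers up to a
# small tolerance, so rounding errors like 1.110223e-16 are ignored
isTRUE(all.equal(My_test %*% solve(My_test), diag(2)))   # TRUE

# solve(A, b) solves the system A %*% x = b:
#   1*x1 + 2*x2 = 5
#   3*x1 + 4*x2 = 11
x = solve(My_test, c(5, 11))
x   # approximately c(1, 2)
```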

4. Factors: a Vector of Nominal or Ordinal Data

The gender_vector below contains the sex of five individuals. It has two categories, or in R terms ‘factor levels’: Male and Female. The function factor() encodes the input vector and turns it into a factor. In other words, the output is still a vector, but its elements now carry meaning: they are levels of a factor.

gender_vector = c("Male","Female","Female","Male","Male") #a typical vector
gender_factor = factor(gender_vector)  #turning the input vector into a factor

gender_factor

## [1] Male   Female Female Male   Male  
## Levels: Female Male

Note: Every time we call a factor, R returns not only the entire vector of observations but also, below it, the levels of the factor.

By default, the function factor() turns a vector into an unordered (nominal) factor, as in the example above. In other words, it does not rank its levels in any meaningful way. However, factors can also be ordinal, in which case their levels are comparable. To create an ordinal factor, we use the same function factor() with the ordered and levels arguments. Setting ordered = TRUE indicates that the factor is ordered, and the levels argument lists the values of the factor in the correct, ascending order.

temperature_vector = c("High", "Low", "High","Low", "Medium")
temperature_factor = factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))

temperature_factor

## [1] High   Low    High   Low    Medium
## Levels: Low < Medium < High

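Note: Factor variables can also be created when we build a data frame. In R versions before 4.0.0, data.frame() automatically turned character columns into factors (its stringsAsFactors argument defaulted to TRUE); since R 4.0.0 the default is FALSE, so character columns stay character unless you set stringsAsFactors = TRUE. We will see an example in the data frames section.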
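Because the default of data.frame()'s stringsAsFactors argument changed in R 4.0.0, setting it explicitly gives the same result on every R version. A small sketch (the column values are arbitrary):

```r
chars = c("a", "b", "a")

# Explicitly request factor conversion (the default before R 4.0.0)
df_factor = data.frame(chars, stringsAsFactors = TRUE)
class(df_factor$chars)      # "factor"

# Keep the column as character (the default since R 4.0.0)
df_character = data.frame(chars, stringsAsFactors = FALSE)
class(df_character$chars)   # "character"
```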

4.1. Changing Names of Factor Levels

Sometimes we want to rename the levels of a factor for clarity or other reasons. R allows us to do this with the function levels(). A good illustration is raw survey data. A standard questionnaire item is the gender of the respondent. As you saw in the previous section, this is a factor, and when the questionnaire is conducted on the street its levels are often coded as “M” and “F”. Imagine we have a vector indicating the gender of 5 respondents. When you start your data analysis, your main concern is to keep a clear overview of all the variables and what they mean. At that point, you will often want to change the factor levels to “Male” and “Female” instead of “M” and “F” to make your life easier.

gender_vector = c("M", "F", "F", "M", "M")
gender_factor = factor(gender_vector) # creating a factor with levels "F" and "M"

levels(gender_factor) = c("Female", "Male") # renaming the levels in their stored order

gender_factor

## [1] Male   Female Female Male   Male  
## Levels: Female Male

Note: the levels() function renames the existing levels in the order in which they are stored, which by default is alphabetical. In other words, the first new name (“Female”) is assigned to “F” and the second (“Male”) to “M”, because “F” precedes “M” in alphabetical order.
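To avoid depending on alphabetical order, the old codes can instead be mapped to new names explicitly when the factor is created, using the levels and labels arguments of factor(). A sketch with the same hypothetical survey codes:

```r
gender_vector = c("M", "F", "F", "M", "M")

# levels lists the raw codes in the order we want; labels gives the new
# names in the same order, so "M" becomes "Male" and "F" becomes "Female"
gender_factor = factor(gender_vector,
                       levels = c("M", "F"),
                       labels = c("Male", "Female"))

levels(gender_factor)   # "Male" "Female"
```

Here the pairing between old codes and new names is written out, so reordering or adding codes later cannot silently mislabel respondents.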

4.2. Summarizing Information about Factor Variables

The function summary() returns a summary of the object it receives as input. If the input is a factor, it returns its levels and the number of observations at each level.

summary(gender_factor)

## Female   Male 
##      2      3

NOTE: if the input of the summary() function is a numerical vector, it returns basic descriptive statistics of the vector.

numerical_vector = c(1,2,3,4,5)
summary(numerical_vector)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       2       3       3       4       5

4.3. Comparing Elements of an Ordered Factor Vector

Imagine you have 5 data analysts working for you and you have classified them based on their speed at work.

speed_vector = c("fast", "slow", "slow", "fast", "insane")
speed_factor = factor(speed_vector, ordered = TRUE, levels = c("slow", "fast", "insane")) #we turn the above vector into an ordered factor vector.

Remember that a factor is still a vector at heart, except that its elements are classified into meaningful levels, which are either ordered or not. Now imagine that one day the second data analyst complains about the fifth data analyst and claims that the fifth analyst is slow. You want to test whether this claim is true. We can do this with a simple logical comparison, just as we would check whether element 2 of a numeric vector is bigger than element 5. This comparison is possible only because the vector is an ordered factor. The result FALSE below shows that the second analyst (“slow”) is not faster than the fifth (“insane”), so the complaint does not hold.

speed_factor[2] > speed_factor[5]

## [1] FALSE
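Because speed_factor is ordered, it can also be compared with a level name given as a string, and functions such as max() respect the level order. A short sketch that rebuilds the same speed data:

```r
speed_vector = c("fast", "slow", "slow", "fast", "insane")
speed_factor = factor(speed_vector, ordered = TRUE,
                      levels = c("slow", "fast", "insane"))

# Compare every analyst with one level: TRUE FALSE FALSE TRUE TRUE
speed_factor > "slow"

# The highest level present in the data: insane
max(speed_factor)

# Keep only the analysts who are at least "fast"
speed_factor[speed_factor >= "fast"]
```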

5. Data Frames

Like matrices, data frames are two-dimensional structures made of rows and columns. Unlike vectors and matrices, however, a data frame can hold several data types at the same time.

Each column represents a variable and each row an observation/case/participant. Variables can be of different kinds: numeric, logical, character.

5.1. Creating Data Frames

Imagine we have the following five vectors (of the same length) that we would like to combine into a data frame, with each vector becoming one column. The function data.frame() does this for us.

name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

planets_dataf <- data.frame(name, type, diameter, rotation, rings)

planets_dataf  # notice that the names of the vectors become the column names of the data frame.

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 8 Neptune          Gas giant    3.883     0.67  TRUE

To see the structure of a data frame, use the function str().

str(planets_dataf)

## 'data.frame':    8 obs. of  5 variables:
##  $ name    : Factor w/ 8 levels "Earth","Jupiter",..: 4 8 1 3 2 6 7 5
##  $ type    : Factor w/ 2 levels "Gas giant","Terrestrial planet": 2 2 2 2 1 1 1 1
##  $ diameter: num  0.382 0.949 1 0.532 11.209 ...
##  $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
##  $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...

Let’s check the class of the name and type columns. The outputs in this section reflect the pre-4.0.0 default, under which data.frame() turns character vectors into factors; in R 4.0.0 and later, add stringsAsFactors = TRUE to the data.frame() call to reproduce them.

class(planets_dataf$name)

## [1] "factor"

class(planets_dataf$type)

## [1] "factor"

We can now check the levels, and the number of levels, of each of these factors:

levels(planets_dataf$name)

## [1] "Earth"   "Jupiter" "Mars"    "Mercury" "Neptune" "Saturn"  "Uranus" 
## [8] "Venus"

nlevels(planets_dataf$name)

## [1] 8

levels(planets_dataf$type)

## [1] "Gas giant"          "Terrestrial planet"

nlevels(planets_dataf$type)

## [1] 2

5.2. Making Sense of Big Data Frames using head() and tail() functions

The function head() returns the first few observations (rows) of a data frame, and the function tail() returns the last few. Both also print a header line containing the names of the variables in the data frame.
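A minimal sketch of both functions follows. To stay self-contained it rebuilds only two columns of planets_dataf, and the row counts passed to the n argument are arbitrary.

```r
# Two columns of the planets data frame used in this chapter
planets_dataf = data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
)

head(planets_dataf)      # first 6 rows (the default, n = 6)
head(planets_dataf, 3)   # first 3 rows: Mercury, Venus, Earth
tail(planets_dataf, 2)   # last 2 rows: Uranus, Neptune
```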

5.3. Indexing Elements in Data Frames

Similar to vectors and matrices, we use the square brackets to identify element/elements. For example, the code below returns the element in row 1, column 3:

planets_dataf[1,3]

## [1] 0.382

The same job can be done using row and column names. For example, the code below prints the element in the first row of the column called “name”.

planets_dataf[1,"name"]

## [1] Mercury
## Levels: Earth Jupiter Mars Mercury Neptune Saturn Uranus Venus

Moreover, the following code returns several elements at once:

planets_dataf[1:3, 2:5] # returning rows 1 to 3, at columns 2 to 5.

##                 type diameter rotation rings
## 1 Terrestrial planet    0.382    58.64 FALSE
## 2 Terrestrial planet    0.949  -243.02 FALSE
## 3 Terrestrial planet    1.000     1.00 FALSE

planets_dataf[1:3, "rings"] # rows 1 to 3, under the column "rings"

## [1] FALSE FALSE FALSE

planets_dataf[,"name"]  # All the rows, but from the name column only.

## [1] Mercury Venus   Earth   Mars    Jupiter Saturn  Uranus  Neptune
## Levels: Earth Jupiter Mars Mercury Neptune Saturn Uranus Venus

planets_dataf[,1] # complete first column (from all rows).

## [1] Mercury Venus   Earth   Mars    Jupiter Saturn  Uranus  Neptune
## Levels: Earth Jupiter Mars Mercury Neptune Saturn Uranus Venus

planets_dataf[2,] # complete second row (from all columns)

##    name               type diameter rotation rings
## 2 Venus Terrestrial planet    0.949  -243.02 FALSE

planets_dataf[c(1,2),3]  # returns elements (1,3) and (2,3), respectively.

## [1] 0.382 0.949

planets_dataf[c(1,2),]   # returns the first and second rows.

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE

To single out a particular column, we can also use the $ symbol, in the form planets_dataf$name. This shortcut only works when the columns have names, and $ can only refer to columns (variables), not rows.

planets_dataf$diameter[planets_dataf$diameter > 1] # focuses only on the diameter column and returns values from that column.

## [1] 11.209  9.449  4.007  3.883

planets_dataf[planets_dataf$diameter > 1, ]  # returns all cases whose diameter is bigger than 1, with all their information; compare it with the previous code.

##      name      type diameter rotation rings
## 5 Jupiter Gas giant   11.209     0.41  TRUE
## 6  Saturn Gas giant    9.449     0.43  TRUE
## 7  Uranus Gas giant    4.007    -0.72  TRUE
## 8 Neptune Gas giant    3.883     0.67  TRUE

planets_dataf[planets_dataf$diameter > 1 | planets_dataf$rings == FALSE, ]  # | means "or": here every planet meets at least one of the two conditions

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 8 Neptune          Gas giant    3.883     0.67  TRUE

5.4. Subsetting: Selecting planets that have rings

Imagine that we are interested only in the planets that have rings, and we want to select them from the rest of the data set. This is called subsetting.

The function subset() does the job for us. It has the structure subset(my_dataframe, subset = some_condition). For our example, here is the code:

subset(planets_dataf, subset = rings)  # because the rings column is logical, only the rows with TRUE values are returned. So simply writing "subset = rings" does the job.

##      name      type diameter rotation rings
## 5 Jupiter Gas giant   11.209     0.41  TRUE
## 6  Saturn Gas giant    9.449     0.43  TRUE
## 7  Uranus Gas giant    4.007    -0.72  TRUE
## 8 Neptune Gas giant    3.883     0.67  TRUE

Suppose we are interested in planets bigger than Earth, that is, those whose value in the diameter column is bigger than one. In that case, the following code does the job:

subset(planets_dataf, subset= diameter>1)

##      name      type diameter rotation rings
## 5 Jupiter Gas giant   11.209     0.41  TRUE
## 6  Saturn Gas giant    9.449     0.43  TRUE
## 7  Uranus Gas giant    4.007    -0.72  TRUE
## 8 Neptune Gas giant    3.883     0.67  TRUE

planets_dataf$diameter[planets_dataf$diameter > 1]  # compare this output with that of the code above

## [1] 11.209  9.449  4.007  3.883

Notice that the condition given to the subset = argument refers only to the column of interest. The data frame is already specified in the first argument of subset(), so there is no need to name it again (we write diameter rather than planets_dataf$diameter).
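subset() also accepts a select argument that keeps only the listed columns, and conditions can be combined with & (and) or | (or). A sketch on a reduced, self-contained copy of the planets data; the threshold 5 is an arbitrary choice:

```r
planets_dataf = data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Ringed planets with diameter below 5, keeping only two columns;
# this returns Uranus and Neptune
small_ringed = subset(planets_dataf,
                      subset = diameter < 5 & rings,
                      select = c(name, diameter))
small_ringed
```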

Example 5: Sorting a Vector Using order()

Suppose we would like to sort the elements of a numerical vector in ascending order, that is, from minimum to maximum. The function order() returns the positions (indices) of the elements of the input vector, arranged so that the first index points to the smallest element, the second to the next smallest, and so on. Notice that this function returns only these positions, not the sorted vector itself. That is enough, however, to rebuild the sorted vector by indexing.

test = c(200, -1, 14, -750)  #initial unordered vector. 
ordered_positions = order(test)

ordered_positions  # positions of the elements of test, from the smallest element to the largest

## [1] 4 2 3 1

test[ordered_positions]  # returns the elements of test in ascending order.

## [1] -750   -1   14  200
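order() also takes a decreasing argument, and when only the sorted values are needed, base R's sort() returns them directly. A short sketch on the same test vector:

```r
test = c(200, -1, 14, -750)

order(test, decreasing = TRUE)   # positions from largest to smallest: 1 3 2 4
sort(test)                       # sorted values: -750 -1 14 200
sort(test, decreasing = TRUE)    # 200 14 -1 -750
```

order() remains the tool of choice when the same ordering must be applied to something else, such as the rows of a data frame.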

Example 6: Sorting a Data Frame Using order()

In the same fashion as in the previous example, we can sort a data frame. Let’s sort all cases/rows of planets_dataf by the values of the diameter column, in ascending order.

diameters_ord = order(planets_dataf$diameter) # first, let's find the sorted positions of the diameter variable.

These positions give exactly the order in which we want to present the cases/rows of the data frame. So we can index the rows in that order and keep all the columns:

planets_dataf[diameters_ord,]

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 8 Neptune          Gas giant    3.883     0.67  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
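The same approach extends to descending order and to several sort keys. The sketch below rebuilds three columns of the planets data; passing two vectors to order() sorts by the first and breaks ties with the second, and the minus sign reverses only the numeric key.

```r
planets_dataf = data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
)

# Largest planet first
largest_first = planets_dataf[order(planets_dataf$diameter, decreasing = TRUE), ]

# By type first, then by decreasing diameter within each type
by_type = planets_dataf[order(planets_dataf$type, -planets_dataf$diameter), ]
by_type
```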

5.5. Combining Data Frames

Data frames can also be combined using the merge() function. When two data frames share columns, merge() joins them in the style of a database join: rows are matched on key values, and the columns of both data frames are placed side by side. To join two data frames, you need to specify which columns contain the key values to match up. By default, merge() uses all the columns the two data frames have in common, but more often you will want to use a single shared ID column, given with the by argument.

v1 = c(2, 3, 5)
v2 = c("M", "S", "I")
firstdf = data.frame(v1, v2)
firstdf

v3 = c(6, 7, 8)
v4 = c("T", "L", "W")
seconddf = data.frame(v3, v4)
seconddf

thirddf = data.frame(v1, v4)
thirddf

merge(firstdf, seconddf, by = "v1")  # returns an error because seconddf has no column v1.

merge(firstdf, thirddf, by="v1") #this works.
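The all, all.x, by.x and by.y arguments of merge() control which rows are kept and how key columns with different names are matched. A sketch with small, made-up data frames (customers, orders and orders2 are hypothetical):

```r
# Hypothetical data: customers and their orders share an "id" key
customers = data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cai"))
orders    = data.frame(id = c(1, 1, 3, 4), amount = c(10, 25, 7, 12))

# Inner join (the default): only ids present in both, here 1, 1 and 3
inner = merge(customers, orders, by = "id")

# Left join: keep every customer; Ben (id 2) has no order, so amount is NA
left = merge(customers, orders, by = "id", all.x = TRUE)

# Full join: keep all ids from both sides, including order id 4
full = merge(customers, orders, by = "id", all = TRUE)

# Key columns with different names are matched with by.x and by.y
orders2 = data.frame(customer_id = c(1, 3), amount = c(5, 9))
renamed = merge(customers, orders2, by.x = "id", by.y = "customer_id")
```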

6. Lists: Boxes that Contain Different Objects

A Recap:

Vectors (one-dimensional arrays) can hold numeric, character or logical values. All elements of a vector have the same data type.
Matrices (two-dimensional arrays) can hold numeric, character or logical values. As in vectors, all elements of a matrix have the same data type.
Data frames (two-dimensional objects) can hold numeric, character or logical values. Within a column/variable all elements have the same data type, but different columns can have different data types.
A list in R is similar to a notebook or to-do list at work or school: its items most likely differ in length, characteristics, type, etc. A list in R gathers various objects under one umbrella, that is, the name of the list. These objects can be matrices, vectors, data frames, even other lists. They do not even need to be related or similar to each other in any way. You could say that a list is a kind of super data type: you can store practically any piece of information in it. Lists do not have dimensions.

Note: Because lists can contain other lists, they are considered recursive variables. Vectors, matrices, and arrays, by contrast, are atomic. (An object is never both recursive and atomic.) The functions is.recursive() and is.atomic() let us test which kind a variable is.
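A quick sketch of these two tests on a few simple objects; the objects themselves are arbitrary, and mtcars is the data frame built into R:

```r
my_vec  = 1:3
my_mat  = matrix(1:4, nrow = 2)
my_list = list(1, "a", TRUE)

is.atomic(my_vec)       # TRUE
is.atomic(my_mat)       # TRUE: a matrix is an atomic vector with dimensions
is.recursive(my_list)   # TRUE
is.atomic(my_list)      # FALSE
is.recursive(mtcars)    # TRUE: a data frame is a list of columns
```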

6.1. Constructing a List

We can create a list using the list() function. Below, we create and return a list containing a vector, a matrix and a data frame.

my_vector = c(1:10)
my_matrix = matrix(1:9, ncol = 3, byrow=TRUE)

my_dataframe = mtcars[1:10,]  # first 10 rows of mtcars, a data frame built into R

my_list =list(my_vector, my_matrix, my_dataframe) #creating a list

my_list   #calling a list

## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## 
## [[3]]
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

6.2. Naming Elements of a List

Just as with a to-do list, you do not want to forget what the components of your list stand for. That is why you should give them names. Here we introduce two methods to do so, just as with vectors.

Method1. Direct method of naming while making a list:
- my_list = list(name1 = your_comp1, name2 = your_comp2,…).

This creates a list and simultaneously names its components name1, name2, and so on.

Method 2. If you want to name the components of a list after you have created it, you can use the names() function, as you did with vectors. The following commands are equivalent to the code in Method 1:
- my_list = list(your_comp1, your_comp2)
- names(my_list) = c("name1", "name2")

Note: Once we have named the objects in our list, we can use their names to summon them.

Now, we use both Method 1 and Method 2 to name the objects in the list we created above (i.e., my_list).

# Method1. 
my_list = list("Object1"= my_vector, "Object2"=my_matrix, "Object3"=my_dataframe)

my_list

## $Object1
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $Object2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## 
## $Object3
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

# Method 2. Here we change the names, so we can see the difference between codes below and above. 

names(my_list)= c("Object_One", "Object_Two", "Object_Three" )
my_list

## $Object_One
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $Object_Two
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## 
## $Object_Three
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

6.3. Selecting and Returning Objects from a List

To select objects from a list, one should use double brackets [[]]. Inside the double brackets we can use either the name of an object inside "" marks or just a number to refer to the position of that object in the list. Another approach is using $ and then attaching the name of the object.

Suppose we would like to return my_matrix from my_list. Here are three ways to do it:

my_list[[2]]

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

my_list$Object_Two

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

my_list[["Object_Two"]]

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
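Single brackets behave differently from double brackets on a list: [ ] returns a smaller list (even when it selects just one object), while [[ ]] and $ return the object itself. A short sketch with a two-object version of my_list:

```r
my_list = list(Object_One = 1:10,
               Object_Two = matrix(1:9, ncol = 3, byrow = TRUE))

is.list(my_list[2])          # TRUE: single brackets return a sub-list
is.matrix(my_list[[2]])      # TRUE: double brackets return the matrix itself
length(my_list[c(1, 2)])     # 2: single brackets can select several objects
```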

6.4. Selecting Elements of Objects in Lists

Suppose we want to see the first row of the data frame, which is the third object in the list. Here is how it works:

my_list[[3]][1,]

##           mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

Here is how to call the first element of the vector, which is the first object in the list:

my_list[[1]][1]

## [1] 1

# Or doing the same, using the name of objects: 

my_list[["Object_One"]][1]

## [1] 1

We can use the $ sign to call objects by their names and then select elements within them with the usual []. For example, the code below calls the second object (i.e., the matrix named “Object_Two”) and returns its first row.

my_list$Object_Two[1,]

## [1] 1 2 3

Interestingly, one can use the $ sign several times. The code below calls the data frame object from the list, then returns a column in it called mpg.

my_list$Object_Three$mpg   # REALLY COOL!

##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2

6.5. Adding New Objects to a List

A list is a box that contains objects. One can therefore add new objects to a list using the ordinary c() command.

vector_NEW = c(2,6,8, 9)
my_list = c(my_list, vector_NEW)
my_list

## $Object_One
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $Object_Two
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## 
## $Object_Three
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 
## [[4]]
## [1] 2
## 
## [[5]]
## [1] 6
## 
## [[6]]
## [1] 8
## 
## [[7]]
## [1] 9

One issue with the above code is that each element of vector_NEW has been added to the list as a separate object. The list previously had 3 objects; after the code above it has 7. In other words, vector_NEW has not been added as a single vector object but as four separate objects. In such cases, it may be better to define the original list again, including the new objects we want it to contain.
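Two other ways to add a vector as one single object are sketched below: wrap it in list() before calling c(), or assign it to a new name with $. The name vector_NEW for the new list element is just an illustrative choice.

```r
my_list = list(Object_One = 1:10)
vector_NEW = c(2, 6, 8, 9)

# Wrapping the vector in list() makes c() add it as ONE new object
list_a = c(my_list, list(vector_NEW = vector_NEW))
length(list_a)       # 2

# Alternatively, assign the vector directly to a new name
my_list$vector_NEW = vector_NEW
length(my_list)      # 2
my_list$vector_NEW   # 2 6 8 9
```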

Note: We can use the str() or summary() functions to get an overview of a list's structure.

Note: NULL is a special value that represents an empty variable. Its most common use is in lists, but it also crops up with data frames and function arguments. When you create a list, you may wish to specify that an element should exist but have no contents. For example, the following list contains the UK bank holidays for 2013 by month. Some months have no bank holidays, so we use NULL to represent this absence.

uk_bank_holidays_2013 = list(Jan = "New Year's Day",
                             Feb = NULL,
                             Mar = "Good Friday",
                             Apr = "Easter Monday",
                             May = c("Early May Bank Holiday", "Spring Bank Holiday"),
                             Jun = NULL,
                             Jul = NULL,
                             Aug = "Summer Bank Holiday",
                             Sep = NULL,
                             Oct = NULL,
                             Nov = NULL,
                             Dec = c("Christmas Day", "Boxing Day"))

uk_bank_holidays_2013

## $Jan
## [1] "New Year's Day"
## 
## $Feb
## NULL
## 
## $Mar
## [1] "Good Friday"
## 
## $Apr
## [1] "Easter Monday"
## 
## $May
## [1] "Early May Bank Holiday" "Spring Bank Holiday"   
## 
## $Jun
## NULL
## 
## $Jul
## NULL
## 
## $Aug
## [1] "Summer Bank Holiday"
## 
## $Sep
## NULL
## 
## $Oct
## NULL
## 
## $Nov
## NULL
## 
## $Dec
## [1] "Christmas Day" "Boxing Day"

It is important to understand the difference between NULL and the special missing value NA. The biggest difference is that NA is a scalar value (a vector of length one), whereas NULL takes up no space at all: it has length zero.

length(NULL)

## [1] 0

length(NA)

## [1] 1

You can test for NULL using the function is.null() (and for missing values using is.na()). Missing values are not NULL, and NULL values are not missing either: NULL should be interpreted as non-existent, not as missing.
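One practical consequence is worth knowing: list() keeps elements given as NULL, but assigning NULL to an existing element with $ or [[ ]] removes that element. To keep an element and set it to NULL, assign list(NULL) with single brackets. A sketch with a shortened, hypothetical holidays list:

```r
holidays = list(Jan = "New Year's Day", Feb = NULL, Mar = "Good Friday")

length(holidays)           # 3: Feb exists, with no contents
is.null(holidays$Feb)      # TRUE
is.na(NA)                  # TRUE

# Assigning NULL with $ REMOVES the element
holidays$Mar = NULL
length(holidays)           # 2

# Assigning list(NULL) with single brackets keeps the element, now empty
holidays["Jan"] = list(NULL)
length(holidays)           # 2
is.null(holidays$Jan)      # TRUE
```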