2.1 Overview: Types of Objects

As mentioned in Module 1, R is often called an object-oriented language. That is, the main purpose of R is to manipulate ‘objects’ to accomplish tasks. Your goal is to assign objects and then use functions to manipulate them.
There are many types (or classes) of objects. Many functions are specifically tailored to deal with specific types of objects. Therefore, it is critical that you understand the distinctions between different types of objects, and how to best make use of each. Some packages generate special types of objects, which can then be manipulated or analyzed in special ways. Here, we will cover some of the most common types of objects you will encounter.

| Object Type | Detail |
|---|---|
| Numeric | Numbers |
| Character | Text |
| Factor | A set of characters with finite levels |
| Logical | TRUE or FALSE |
| Date | Dates and times can take on special formats |
| Vector | A variable with multiple values of the same type (e.g., numeric, character, logical, etc.) |
| Matrix | A two-dimensional array of values of the same type (most often numbers) |
| Array | A set of values arranged in any number of dimensions. For example, you can have a three-dimensional array, which is essentially a stack of matrices. |
| Data frame | A two-dimensional object in which each column is a vector (e.g., numeric or character) of the same length. What you typically think of as a spreadsheet. |
| List | A bundle of any set of components. Each element in a list can be any type of object. Once you get used to them, lists are very useful. |

2.2 Vectors

A vector is essentially a one-dimensional set of elements of the same type. The elements can be numbers (numeric vectors), characters, etc.

2.2.1 Vectors of different types

Let’s try making a numeric vector using a function called c() (for ‘combine’):

v=c(4,3,5,3,2,3,1)
v
## [1] 4 3 5 3 2 3 1

Objects can also be text. Text objects are called character strings. In R, all text needs to be contained within quotes (single or double quotes are allowed). Otherwise, R will treat the text as the name of an object and try to find an object with that name.

We can combine multiple character strings into a vector. Each element can be a single letter, word, phrase, or entire sentences.

chars=c("a", "word", "or a phrase")
chars
## [1] "a"           "word"        "or a phrase"

If you try to combine letters and numbers into a single vector, it will turn into a character vector, with numbers treated as text:

numbersletters=c(1,2,3, "one", "two", "three")
numbersletters
## [1] "1"     "2"     "3"     "one"   "two"   "three"
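One way to confirm this conversion is to ask R for the class of each vector. The short sketch below reuses the vectors defined above. It also uses as.numeric(), a base R function that converts text back to numbers where it can and returns NA (with a warning, suppressed here) where it cannot. The object name back is just an illustrative choice.

```r
v=c(4,3,5,3,2,3,1)
numbersletters=c(1,2,3, "one", "two", "three")
class(v)
## [1] "numeric"
class(numbersletters)
## [1] "character"
# "1", "2", "3" convert back to numbers; the words cannot, so they become NA
back=suppressWarnings(as.numeric(numbersletters))
back
## [1]  1  2  3 NA NA NA
```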

Factors are different from characters in that they have levels. This will become a bit more important later when we start playing with dataframes.

factors=as.factor(numbersletters) #convert the vector above to factors
factors
## [1] 1     2     3     one   two   three
## Levels: 1 2 3 one three two
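To see what "levels" means in practice, here is a small sketch using the base R functions levels(), nlevels() and as.integer() on the factor from above. The output assumes the default alphabetical ordering of levels shown in the example.

```r
numbersletters=c(1,2,3, "one", "two", "three")
factors=as.factor(numbersletters)
levels(factors)      # the set of possible values, sorted alphabetically by default
## [1] "1"     "2"     "3"     "one"   "three" "two"
nlevels(factors)     # how many levels there are
## [1] 6
as.integer(factors)  # each element is stored as the position of its level
## [1] 1 2 3 4 6 5
```

Notice that "two" is stored as 6 and "three" as 5, because "three" comes before "two" alphabetically.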

Objects can also be logical objects, i.e., TRUE or FALSE. Note all capitals. This class can be really important and useful.

logic=c(TRUE, TRUE, FALSE, FALSE)
logic
## [1]  TRUE  TRUE FALSE FALSE

One cool thing to note is that we can convert logical objects into numerics by adding a number:

logic+0
## [1] 1 1 0 0

You can see that TRUE becomes 1 and FALSE becomes 0.
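Because TRUE counts as 1 and FALSE as 0, functions that add or average numbers can be used to count TRUE values. The sketch below is one possible illustration: it uses sum() and mean() (covered in more detail in the next section) and the comparison operator >, which returns a logical vector.

```r
logic=c(TRUE, TRUE, FALSE, FALSE)
sum(logic)    # number of TRUE values
## [1] 2
mean(logic)   # proportion of TRUE values
## [1] 0.5
v=c(4,3,5,3,2,3,1)
v>3           # a comparison returns a logical vector
## [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
sum(v>3)      # how many elements of v are greater than 3
## [1] 2
```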

2.2.2 Vector Functions

You can measure various attributes of a vector. For example, let’s find out how many numbers there are in the vector v from above and add up all of the numbers. Try:

length(v)
## [1] 7
sum(v)
## [1] 21

From this, we can calculate the mean.

sum(v)/length(v)
## [1] 3

Of course, there is a pre-packaged function that calculates the mean of a vector, so this is simpler:

mean(v)
## [1] 3

Here are some more mathematical functions you can try out. Try typing these, and also try looking at the details of each function using ?functionname (for example, ?median):

| Function | Meaning |
|---|---|
| max() | maximum value |
| min() | minimum value |
| sum() | sum |
| mean() | average |
| median() | median |
| range() | returns vector of min and max values |
| var() | sample variance |

We can manipulate vectors as a whole. For example, let’s multiply the vector by 10.

v*10
## [1] 40 30 50 30 20 30 10
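The same idea extends to operations between two vectors of the same length, which R carries out element by element. Here is a small sketch; the vector w is made up for this illustration.

```r
v=c(4,3,5,3,2,3,1)
w=c(1,2,3,4,5,6,7)   # hypothetical second vector, same length as v
v+w                  # first element plus first element, and so on
## [1] 5 5 8 7 7 9 8
v*w
## [1]  4  6 15 12 10 18  7
v^2                  # square each element
## [1] 16  9 25  9  4  9  1
```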

2.2.3 Indexing: The importance of [ ]

For multi-element objects (i.e., anything that is a combination of numbers, letters, etc.), we can locate specific elements within objects using square brackets []. For example, we can ask what is the 6th number in the numeric vector v, or the second element in the character vector chars from above.

v[6]
## [1] 3
chars[2]
## [1] "word"
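Square brackets can do more than pick out a single element. The sketch below shows a few common patterns in base R: several positions at once, a range of positions, a negative index (which drops an element), and a logical condition.

```r
v=c(4,3,5,3,2,3,1)
v[c(1,3)]   # first and third elements
## [1] 4 5
v[2:4]      # elements 2 through 4
## [1] 3 5 3
v[-1]       # everything except the first element
## [1] 3 5 3 2 3 1
v[v>3]      # only the elements greater than 3
## [1] 4 5
```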

2.3 Matrices

Ok, now let’s try a matrix. This is a two-dimensional set of values (here, numbers), so when we create a matrix, we also need to specify the dimensions. Let’s demonstrate the difference between vectors and matrices:

1:9 #the colon creates a vector of consecutive integers
## [1] 1 2 3 4 5 6 7 8 9
vec=1:9
mat=matrix(1:9,nrow=3)

Now look at the objects vec and mat

vec
## [1] 1 2 3 4 5 6 7 8 9
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Note that R fills the matrix column by column, with the numbers running from top to bottom. This is important to remember when you are creating matrices. You can make R construct matrices by rows (which is more intuitive to me) by:

mat2=matrix(1:9,nrow=3,byrow=TRUE)
mat2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Now, try a slight variation:

mat3=matrix(1:10,nrow=2,byrow=TRUE)
rownames(mat3)=c("row1","row2")
colnames(mat3)=c("A","B","C","D","E")
mat3
##      A B C D  E
## row1 1 2 3 4  5
## row2 6 7 8 9 10

You can see that matrices can be “rectangular”, and also you can name the dimensions (rows & columns) of the matrix using rownames() and colnames().

Indexing in a matrix requires two values inside the square brackets: [row, column]. You can also use this to look at entire rows or columns. For example:

mat3[2,3] #what is the number in row 2, column 3?
## [1] 8
mat3[2,] #what are the values of row 2?
##  A  B  C  D  E 
##  6  7  8  9 10
mat3[,4] #what are the values of column 4?
## row1 row2 
##    4    9
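Since mat3 has row and column names, you can also index by name instead of by position. This is a short sketch of that alternative using the same matrix:

```r
mat3=matrix(1:10,nrow=2,byrow=TRUE)
rownames(mat3)=c("row1","row2")
colnames(mat3)=c("A","B","C","D","E")
mat3["row2","C"]    # same cell as mat3[2,3]
## [1] 8
mat3[,c("A","E")]   # columns A and E only
##      A  E
## row1 1  5
## row2 6 10
```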

You can conduct mathematical operations on matrices:

mat3*10 #multiply all values in mat3 by 10
##       A  B  C  D   E
## row1 10 20 30 40  50
## row2 60 70 80 90 100
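The overview described an array as a stack of matrices. As a rough sketch of that idea, the base R function array() takes the values and a vector of dimensions; indexing then needs one value per dimension. The object name arr and its dimensions are chosen only for illustration.

```r
arr=array(1:24, dim=c(2,3,4))   # 2 rows, 3 columns, 4 layers
arr[,,1]                        # the first layer is a 2 x 3 matrix
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
arr[2,3,4]                      # row 2, column 3, layer 4
## [1] 24
```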

2.4 Dataframes

In most cases, your data will be organized in the form of a dataframe. A dataframe is an object with rows and columns in which each row represents an observation (sometimes called a case), and each column is a measurement of a variable (sometimes called a field). Whereas all values in a matrix must be of the same type (e.g., all numeric), each column of a dataframe can hold a different type: numeric, character, factor, or other formats (e.g., dates, logical variables such as TRUE and FALSE).

Let’s try creating a dataframe by combining a factor (categorical variable) and a numeric vector.

sex=factor(c(rep("M",5), rep("F",5))) #factor() makes sex a categorical variable
size=c(9,8,8,9,7,5,4,4,3,4)
dat=data.frame(sex, size)
dat
##    sex size
## 1    M    9
## 2    M    8
## 3    M    8
## 4    M    9
## 5    M    7
## 6    F    5
## 7    F    4
## 8    F    4
## 9    F    3
## 10   F    4

Notice that the columns already have names. The data.frame function uses the object names as the default column names. However, you can also assign column names using arguments inside the function:

dat=data.frame(Gender=sex, Size=size) #Notice the capitalization
dat
##    Gender Size
## 1       M    9
## 2       M    8
## 3       M    8
## 4       M    9
## 5       M    7
## 6       F    5
## 7       F    4
## 8       F    4
## 9       F    3
## 10      F    4

Indexing in dataframes

We can refer to rows or columns in the dataframe using square brackets, just as with the other objects we have learned already.

dat[1,] #first row
##   Gender Size
## 1      M    9
dat[,2] #second column
##  [1] 9 8 8 9 7 5 4 4 3 4

You can also get the columns of the dataframe using the $ operator:

dat$Gender
##  [1] M M M M M F F F F F
## Levels: F M

Here, the output shows the “levels” available in this column because it is a factor.
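Square brackets and the $ operator can be combined to pull out the rows that meet a condition. The sketch below rebuilds the same dataframe and shows a few patterns you might try; it is one way to do this, not the only one.

```r
sex=factor(c(rep("M",5), rep("F",5)))
size=c(9,8,8,9,7,5,4,4,3,4)
dat=data.frame(Gender=sex, Size=size)
dat[dat$Size>7,]                  # rows where Size is greater than 7
##   Gender Size
## 1      M    9
## 2      M    8
## 3      M    8
## 4      M    9
dat[3,"Size"]                     # row 3 of the Size column
## [1] 8
mean(dat$Size[dat$Gender=="M"])   # mean Size for rows where Gender is M
## [1] 8.2
mean(dat$Size[dat$Gender=="F"])   # mean Size for rows where Gender is F
## [1] 4
```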

You can find out the type of variable for each column using the function class():

class(dat$Gender)
## [1] "factor"
class(dat$Size)
## [1] "numeric"

Two more useful functions: str() gives you the structure of the object, and summary() gives you some basic info on each column.

str(dat)
## 'data.frame':    10 obs. of  2 variables:
##  $ Gender: Factor w/ 2 levels "F","M": 2 2 2 2 2 1 1 1 1 1
##  $ Size  : num  9 8 8 9 7 5 4 4 3 4
summary(dat)
##  Gender      Size    
##  F:5    Min.   :3.0  
##  M:5    1st Qu.:4.0  
##         Median :6.0  
##         Mean   :6.1  
##         3rd Qu.:8.0  
##         Max.   :9.0

Built-in data sets

The base R program comes with a bunch of datasets as part of the program. To load a specific data set, you simply use the function data(). For example, to load the data set called ‘iris’:

data("iris")

Now let’s look at this dataset. Here, I’m going to use the function head(), which will display only the first 6 lines of the dataset:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
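A few other base R functions are handy for getting a first look at a dataset. As a small sketch: head() accepts an argument n for the number of rows to show, nrow() counts the rows, and names() lists the column names.

```r
data("iris")
head(iris, n=3)   # show only the first 3 rows
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
nrow(iris)        # number of rows (observations)
## [1] 150
names(iris)       # the five column names shown in the header above
```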

Built-in datasets are often useful for learning how functions work. Examples within help files often make use of built-in data sets to demonstrate how something works. Some R packages also include their own built-in data sets for the same reason.


2.5 Lists

A list is a powerful and flexible tool in R. Dataframes, matrices and arrays have many constraints (e.g., each row must have the same number of columns). In contrast, you can combine any set of objects together into a list.
As an example, let’s create three vectors that are of different lengths with different types of elements (number, logical, and character).

apples=c(1,2,3,4,5)
oranges=c(TRUE, FALSE)
grapes=c("grape", "Grape", "GRAPE")

We can try to combine these objects into a dataframe, but we won’t be able to because the vectors are different lengths:

data.frame(apples, oranges, grapes)
## Error in data.frame(apples, oranges, grapes): arguments imply differing number of rows: 5, 2, 3

However, we can combine these into a list:

mylist=list(apples, oranges, grapes) 
mylist
## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1]  TRUE FALSE
## 
## [[3]]
## [1] "grape" "Grape" "GRAPE"

Lists are structured differently than other objects. In a list, each component or item is indexed using a double bracket [[]]. So the first item in the list (i.e., apples) is:

mylist[[1]]
## [1] 1 2 3 4 5

… and the second element within the third item (i.e., grapes) would be:

mylist[[3]][2]
## [1] "Grape"
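It is worth noting the difference between single and double brackets here. As a quick sketch, single brackets on a list give back a (smaller) list, while double brackets give back the item itself, which you can then pass to other functions:

```r
apples=c(1,2,3,4,5)
oranges=c(TRUE, FALSE)
grapes=c("grape", "Grape", "GRAPE")
mylist=list(apples, oranges, grapes)
class(mylist[1])     # single brackets: a list containing the first item
## [1] "list"
class(mylist[[1]])   # double brackets: the item itself
## [1] "numeric"
length(mylist)       # number of items in the list
## [1] 3
sum(mylist[[1]])     # functions work on the extracted item
## [1] 15
```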

You can name the items within a list when creating it, or afterwards:

#These do the same thing
mylist=list(apples=apples, oranges=oranges, grapes=grapes) 
names(mylist)=c("apples", "oranges", "grapes")
mylist
## $apples
## [1] 1 2 3 4 5
## 
## $oranges
## [1]  TRUE FALSE
## 
## $grapes
## [1] "grape" "Grape" "GRAPE"

Once you name the items in a list, you can use the $ operator to call a specific item:

mylist$grapes
## [1] "grape" "Grape" "GRAPE"

You can even combine different dataframes into a list. Let’s do this by loading several built-in data sets and then combining them into a list (output hidden):

data("iris")
data("trees")
data("Loblolly")
mydata=list(iris, trees, Loblolly)
mydata
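As a small preview of why this is convenient, here is a sketch that names the items in the list and then applies one function to every dataframe at once. It uses sapply(), a base R function from the apply family; treat it as a taste of what is possible rather than the approach we will necessarily use later.

```r
data("iris")
data("trees")
data("Loblolly")
mydata=list(iris=iris, trees=trees, Loblolly=Loblolly)
names(mydata)
## [1] "iris"     "trees"    "Loblolly"
sapply(mydata, nrow)   # number of rows in each dataframe
##     iris    trees Loblolly 
##      150       31       84
```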

Lists may not be intuitive to you yet, but you will see how convenient this type of object can be when we get around to more complex tasks such as batch processing and apply functions.

Group Exercise: Learning new functions

Now that we have learned a few basic types of objects, we can start playing around with some more functions. You will collaborate as a group to learn a set of functions. Each person should choose one or two functions, skim the help file, play around with the function and learn what it does. Then take turns teaching each other what the functions do.

Hints:

  • You can get the help file by using the command ?function_name
  • Different functions use different types of objects as inputs. You can find out what types of objects the function can deal with in the “Arguments” section of the help file
  • Help files almost always have example usage at the bottom. These are often the most useful parts of the help files. You should be able to copy and paste these examples to your console and run them to see what they do.
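The hints above might look like this in practice, using paste() as the example function. help() and example() are base R functions; example() runs the code from the bottom of the help file for you.

```r
?paste             # open the help file for paste()
help("paste")      # the same thing, written as a function call
example("paste")   # run the examples from the bottom of the help file
```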

Functions to learn:

  • dim()
  • quantile()
  • t()
  • rowSums()
  • sort()
  • order()
  • paste()
  • table()
  • nchar()