Introduction
Previously, I created a “Tips and Tricks (Guide) for R,” which introduced users to the R environment, particularly RStudio. In this article, I want to discuss some other features of R that are helpful for new users, specifically vectors, matrices, and dataframes.
I encourage users to review the “Tips and Tricks (Guide) for R” before diving into this one.
Note: This guide was prepared using R version 4.4.2 and RStudio version 2024.12.1+563.
R objects - vectors
In the previous article, I introduced R objects and how they are used in the R environment. I want to expand on this because there are many types of objects you can create in R, and each has unique features and properties.
Recall that an object can represent anything that is created
in R. An object can be a value, a vector, a matrix, a data frame, or the
result of a function. Whenever we create something or assign something
to a name, we refer to it as an object. For example, we can assign the
value 5 to an object called x. Once we do this, we can use the object in
a variety of ways. In this example, I print the value of the object
using the print() function, which displays 5.
x <- 5
print(x)
## [1] 5
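Once x holds a value, it can be reused in later expressions. Here is a minimal sketch of that idea; the object name doubled is only illustrative.

```r
x <- 5              # Assign the value 5 to the object x
doubled <- x * 2    # Use x in a calculation and store the result in a new object
print(doubled)
## [1] 10
class(x)            # Ask R what kind of object x is
## [1] "numeric"
```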
But what is a vector or a matrix?
A vector is an ordered collection of items that are all of the same type. For example, I can create a vector of numbers or a vector of character strings (text).
The c() function is short for combine. We use the
c() function to combine data into a vector.
I use the str() function to display the structure of each
vector. This tells me what data type each vector contains.
x <- c(1, 2, 3, 4, 5) # Vector of numbers from 1 to 5
y <- c("A", "B", "C", "D") # Vector of characters from A to D
z <- c("yellow", "red", "blue") # Vector of words or character strings
str(x)
## num [1:5] 1 2 3 4 5
str(y)
## chr [1:4] "A" "B" "C" "D"
str(z)
## chr [1:3] "yellow" "red" "blue"
The vector x contains numeric data. The vectors
y and z contain character data (also known as
strings or text).
Notice that each vector contains a single type of data. In the first
vector x, the values are whole numbers, but R stores them as numeric
(num) rather than as integers. In the second vector y, every element is
a single-character string. The last vector z also contains character
data, with each element being a longer string (a word).
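To check how R stores a vector internally, the typeof() function can be used alongside str(). The sketch below is illustrative; the name x_int is an assumption, and the L suffix is how R marks a number as an integer.

```r
x <- c(1, 2, 3, 4, 5)            # Numbers typed without an L suffix are stored as doubles
x_int <- c(1L, 2L, 3L)           # The L suffix requests integer storage
z <- c("yellow", "red", "blue")  # Character strings

typeof(x)
## [1] "double"
typeof(x_int)
## [1] "integer"
typeof(z)
## [1] "character"
```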
What happens if you mix these data types in a vector? Let’s find out.
x <- c(1, "A", 3, "candy", 5) # Vector mixing numbers and character strings
str(x)
## chr [1:5] "1" "A" "3" "candy" "5"
Notice that the number 1 is now stored as the character "1" and not as a
numeric value. Why isn’t it numeric? A vector can hold only one data
type, so when some of the elements are characters, such as "A" and
"candy", R converts (coerces) every element to character.
Keep this in mind when you create a vector from numeric and character values. It affects how you can use the vector later; for example, arithmetic does not work on a character vector.
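As a rough illustration of the consequence, the sketch below converts a mixed vector back to numeric with as.numeric(). The object names mixed and converted are illustrative. Entries that cannot be read as numbers become NA, and R normally issues a warning, which is silenced here with suppressWarnings().

```r
mixed <- c(1, "A", 3, "candy", 5)   # Every element is coerced to character

# sum(mixed) would stop with an error because the vector is character.
# Converting back to numeric turns the non-numeric entries into NA.
converted <- suppressWarnings(as.numeric(mixed))
print(converted)
## [1]  1 NA  3 NA  5

# Ignore the NA values when adding up what is left
sum(converted, na.rm = TRUE)
## [1] 9
```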
R objects - matrices
A matrix is an array of values arranged into rows and columns. Since a matrix is arranged into rows and columns, it is 2-dimensional.
Here is an example of a simple 2 x 2 matrix.
Matrix example
We can create a matrix in R using the
matrix() function. Because a matrix is an object, we can give it a
name. In this example, we denote the matrix as m1.
m1 <- matrix(c(1, 2, 3, 4), # Values in the matrix
             nrow = 2,      # Number of rows
             ncol = 2,      # Number of columns
             byrow = FALSE) # Order by columns first
print(m1)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
Notice that the numbers fill the matrix column by column: the first column is filled from top to bottom, and then the second.
Matrix example 1
We can change the byrow = FALSE argument to
byrow = TRUE to arrange the numbers by prioritizing the
rows over the columns. This will change the arrangement of the matrix
(see Matrix example 2). We will denote this new matrix as
m2.
m2 <- matrix(c(1, 2, 3, 4), # Values in the matrix
             nrow = 2,      # Number of rows
             ncol = 2,      # Number of columns
             byrow = TRUE)  # Order by rows first
print(m2)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
Matrix example 2
Matrices are important for computational operations. In some cases, we will need to use matrices to perform operations such as addition, subtraction, multiplication, and division.
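The operations mentioned above can be sketched with the two matrices from the earlier examples. One point worth noting is that the * operator multiplies element by element, while %*% performs matrix multiplication.

```r
# Recreate the two example matrices
m1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = FALSE)
m2 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

m1 + m2     # Element-wise addition
m1 - m2     # Element-wise subtraction
m1 * m2     # Element-wise multiplication (not the matrix product)
m1 / m2     # Element-wise division
m1 %*% m2   # Matrix multiplication
t(m1)       # Transpose: swaps rows and columns
```

In this particular case, t(m1) gives the same arrangement as m2, because filling by row is the transpose of filling by column.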
In biostatistics, matrices and vectors help in performing regression analysis with multiple variables.
Here is an example of a simple linear regression model in matrix and vector forms:
Linear regression model in matrix form
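In the usual notation, the model is written as y = Xb + e, where y is the vector of outcomes, X is the design matrix (a column of 1s for the intercept plus one column per predictor), b is the vector of coefficients, and e is the vector of errors. The least-squares estimate of b solves (X'X) b = X'y. The sketch below illustrates this standard formulation and is not a reproduction of the figure. It assumes the cars dataset that ships with R (columns speed and dist); the names X, y, beta_hat, and fit are illustrative.

```r
# Outcome vector and design matrix from the built-in cars dataset
y <- cars$dist
X <- cbind(1, cars$speed)   # Column of 1s for the intercept, then the predictor

# Least-squares estimate: solve (X'X) b = X'y for b
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Compare with the coefficients estimated by lm()
fit <- lm(dist ~ speed, data = cars)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))
## [1] TRUE
```

In practice, lm() is the more convenient and numerically careful route; the matrix version is shown only to connect the formula to R code.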
R objects - dataframes
A dataframe in R is a 2-dimensional, tabular data
structure that stores several vectors of equal length as columns in a
single object, and each column can have a different data type. More
importantly, since it’s an object, we can use it in a
variety of ways, such as computation and applied statistics.
We can convert a matrix into a dataframe using the
data.frame() function.
Let’s suppose we have 4 individuals, each with a unique identifier
id. Each individual has their age measured in years,
which is denoted by the variable age. They are also
assigned to a group denoted by the grouping variable group.
Individuals can be in group == 0 or
group == 1.
Step 1 - Create vectors
First, we’ll create a vector for each variable. The data types will be
numeric (not character). There are three vectors (id,
age, and group). For this exercise, we will
enter the values for each vector by hand.
Step 2 - Convert vectors into a matrix
Then, we will create a matrix using the vectors. We will denote this
matrix as m3.
Step 3 - Convert a matrix into a dataframe
Afterwards, we will convert the matrix into a dataframe. We will
denote this as df1.
Step 4 - Add labels to the dataframe
Lastly, we will name the columns of the dataframe: ID,
Age, and Group.
Here is the R code:
## Step 1: Create vectors of the variables
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group <- c(0, 0, 1, 1)
## Step 2: Create a matrix with the vectors of variables
m3 <- matrix(c(id, age, group),
             nrow = 4,
             ncol = 3,
             byrow = FALSE)
## Step 3: Transform matrix into a dataframe
df1 <- data.frame(m3)
## Step 4: Label the variables in the dataframe
names(df1) <- c("ID", "Age", "Group")
## Print the dataframe and inspect
print(df1)
## ID Age Group
## 1 1 45 0
## 2 2 65 0
## 3 3 37 1
## 4 4 29 1
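The matrix step can also be skipped by passing the vectors straight to data.frame(). This is a sketch of an alternative route, not a replacement for the steps above; the names df2, group_label, and m_mixed, and the text labels themselves, are illustrative. Building the dataframe directly keeps each column's own data type, whereas a matrix converts every value to character as soon as one column contains text.

```r
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group_label <- c("group 0", "group 0", "group 1", "group 1")  # Illustrative text labels

# Build the dataframe directly; column names are set in the same call
df2 <- data.frame(ID = id, Age = age, Group = group_label,
                  stringsAsFactors = FALSE)
str(df2)          # ID and Age stay numeric, Group stays character

# The same vectors bound into a matrix are all coerced to character
m_mixed <- cbind(id, age, group_label)
typeof(m_mixed)
## [1] "character"
```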
Once the matrix has been converted to a dataframe, we can perform
operations on the data inside. For example, we can calculate the mean
(average) age of the individuals in the dataframe using the
mean() function. Inside the mean() function,
we specify the variable we want to average. Because the Age variable is
a column of the dataframe df1, we refer to it as df1$Age. The $
operator tells R that the Age variable is
part of the dataframe df1.
mean_age <- mean(df1$Age) # Store the result under a name that does not mask the mean() function
print(mean_age)
## [1] 44
The mean age of the sample in the dataframe is 44 years.
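The same idea extends to summaries by group. The sketch below rebuilds the example dataframe so that it runs on its own, then uses tapply() to calculate the mean age within each group; the name group_means is illustrative.

```r
# Rebuild the example dataframe so this block is self-contained
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Age = c(45, 65, 37, 29),
                  Group = c(0, 0, 1, 1))

# Mean age within each group: apply mean() to Age, split by Group
group_means <- tapply(df1$Age, df1$Group, mean)
print(group_means)   # Group 0: 55, Group 1: 33

# Select only the rows for individuals in group 1
df1[df1$Group == 1, ]
```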
Final thoughts
In this exercise, we created vectors and matrices with
the intention of using them to build a dataframe. A dataframe allows
us to perform statistical analyses such as descriptive or
inferential analysis. Knowing how data are structured in R
will help users better understand the nuances associated with data,
particularly when performing statistical analysis.
Now that we have gone through some examples of vectors, matrices, and
dataframes, you will be better prepared to navigate the R
environment and perform statistical computation and analyses in the
future.
Disclaimers
This is a work in progress and subject to updates in the future.
This is for educational purposes only.