Introduction
Previously, I created a “Tips and Tricks (Guide) for R,” which introduced users to the R environment, particularly RStudio. In this article, I discuss some other features of R that are helpful for new users, specifically vectors, matrices, and dataframes.
I encourage users to review the “Tips and Tricks (Guide) for R” before diving into this one.
Note: This guide was prepared using R version 4.4.2 and RStudio version 2024.12.1+563.
Contents
This article covers three types of R objects (vectors, matrices, and dataframes), followed by some final thoughts.
R objects - vectors
In the previous article, I introduced R objects and how they are used in the R environment. I want to expand on this because there are many types of objects you can create in R, and each has unique features and properties.
Recall that an object can represent anything that is created in R. An object can be a value, a vector, a matrix, a dataframe, or the results from a function. We create an object by assigning something to it. For example, we can assign a value of 5 to an object called x. Once we do this, we can use the object in a variety of ways. In this example, I printed the value of the object using the print() function, which displays the value 5.
x <- 5
print(x)
## [1] 5
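As a small illustration of the point that an object can also hold the result of a function, the sketch below stores the output of the base R function sqrt() in an object and then reuses it in another calculation. The object names root and doubled are arbitrary choices for this example.

```r
# Store the result of a function in an object
root <- sqrt(16)
print(root)
## [1] 4

# Reuse the object in another calculation and store that result too
doubled <- root * 2
print(doubled)
## [1] 8
```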
But what is a vector or a matrix?
A vector is a sequence of items that are all of the same type. For example, I can create a vector of numbers or a vector of characters and text strings.
The c() function is short for combine. We use the c() function to combine data into a vector.
I use the str() function to display the structure of each vector. This tells me what data type each vector contains and how many elements it has.
x <- c(1, 2, 3, 4, 5) # Vector of numbers from 1 to 5
y <- c("A", "B", "C", "D") # Vector of characters from A to D
z <- c("yellow", "red", "blue") # Vector of words or character strings
str(x)
## num [1:5] 1 2 3 4 5
str(y)
## chr [1:4] "A" "B" "C" "D"
str(z)
## chr [1:3] "yellow" "red" "blue"
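Once a vector exists, a few base R tools cover most everyday tasks. The sketch below, using the same numeric vector x as above, shows length(), selecting elements with square brackets, arithmetic applied to every element at once, and using c() again to extend a vector.

```r
x <- c(1, 2, 3, 4, 5)

# Number of elements in the vector
length(x)
## [1] 5

# Square brackets select elements by position (R starts counting at 1)
x[2]
## [1] 2
x[c(1, 5)]
## [1] 1 5

# Arithmetic is applied to every element at once
x * 2
## [1]  2  4  6  8 10

# c() can also combine an existing vector with new values
c(x, 6, 7)
## [1] 1 2 3 4 5 6 7
```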
The vector x contains numeric data. The vectors y and z contain character data.
Notice that each vector contains only one type of data. The first vector, x, contains whole numbers, but R stores them as numeric (double-precision) values, which is why str() reports num rather than int. The second vector, y, contains single characters, and the last vector, z, contains longer character strings (words). R treats both y and z as the same character type (chr).
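To see the difference between numeric and integer storage, the base R function typeof() reports how R stores a vector. In this sketch, the L suffix and the colon operator (1:5) are two standard ways to create true integers; the name x_int is just an illustrative choice.

```r
x <- c(1, 2, 3, 4, 5)   # whole numbers, but stored as doubles
typeof(x)
## [1] "double"

x_int <- c(1L, 2L, 3L, 4L, 5L)   # the L suffix requests integers
typeof(x_int)
## [1] "integer"
str(x_int)
## int [1:5] 1 2 3 4 5

# The colon operator also produces an integer vector
str(1:5)
## int [1:5] 1 2 3 4 5
```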
What happens if you mix these data types in a vector? Let’s find out.
x <- c(1, "A", 3, "candy", 5) # Vector mixing numbers and characters
str(x)
## chr [1:5] "1" "A" "3" "candy" "5"
Notice that the number 1 is now stored as the character "1". Why isn’t it numeric? Because a vector can hold only one data type, R coerced (converted) every element to character, since some of the data are characters, such as "A" and "candy".
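Because the mixed vector above is stored as character, numeric functions no longer behave as expected. The sketch below shows what happens with mean() and one way to recover the numbers with as.numeric(). The name x_num is an illustrative choice, and turning the non-numeric entries into NA is only one possible way to handle them.

```r
x <- c(1, "A", 3, "candy", 5)

# mean() cannot average character data: it returns NA
# and warns "argument is not numeric or logical: returning NA"
mean(x)
## [1] NA

# as.numeric() converts what it can; text that is not a number becomes NA
# (R warns "NAs introduced by coercion", which is suppressed here)
x_num <- suppressWarnings(as.numeric(x))
print(x_num)
## [1]  1 NA  3 NA  5

# na.rm = TRUE drops the missing values before averaging
mean(x_num, na.rm = TRUE)
## [1] 3
```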
Keep this in mind when you create a vector that mixes numeric and character data. It affects how you can use the vector later in your programming; for example, numeric functions such as mean() will not work on a character vector.
R objects - matrices
A matrix is an array of values arranged into rows and columns, so it is 2-dimensional. Like a vector, all values in a matrix must be of the same data type.
Here is an example of a simple 2 x 2 matrix.
Matrix example
We can create a matrix in R using the matrix() function.
m1 <- matrix(c(1, 2, 3, 4),
nrow = 2,
ncol = 2,
byrow = FALSE)
print(m1)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
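Notice that R fills the matrix column by column: the first two values fill the first column, and the next two fill the second column. This is what byrow = FALSE specifies, and it is also the default.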
Matrix example 1
We can change the byrow = FALSE argument to byrow = TRUE to fill the matrix row by row instead. This changes the arrangement of the matrix (see Matrix example 2).
m2 <- matrix(c(1, 2, 3, 4),
nrow = 2,
ncol = 2,
byrow = TRUE)
print(m2)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
Matrix example 2
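To work with the values inside a matrix, R uses square brackets with a row and a column position. The sketch below rebuilds m1 and m2 from above, shows dim() and a few ways to index them, and uses the transpose function t() to confirm that filling by column and then transposing gives the same matrix as filling by row.

```r
m1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = FALSE)
m2 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

# dim() returns the number of rows and columns
dim(m1)
## [1] 2 2

# Select a single value with [row, column]
m1[1, 2]
## [1] 3
m2[1, 2]
## [1] 2

# Leave the row or column position blank to select a whole column or row
m1[, 1]
## [1] 1 2
m2[1, ]
## [1] 1 2

# t() swaps rows and columns; the column-filled m1, transposed, equals m2
identical(t(m1), m2)
## [1] TRUE
```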
Matrices are important for computational operations. In R, the operators +, -, *, and / add, subtract, multiply, and divide matrices element by element, while the %*% operator performs matrix multiplication.
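As a sketch of these operations, the code below applies element-wise operators and matrix multiplication to m1 and m2 from the previous examples. Note that m1 * m2 multiplies values in matching positions, whereas m1 %*% m2 multiplies rows by columns, so the two results differ.

```r
m1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = FALSE)
m2 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

# +, -, * and / work element by element (matching positions)
m1 + m2
##      [,1] [,2]
## [1,]    2    5
## [2,]    5    8

m1 * m2
##      [,1] [,2]
## [1,]    1    6
## [2,]    6   16

# %*% performs matrix multiplication (rows of m1 times columns of m2)
m1 %*% m2
##      [,1] [,2]
## [1,]   10   14
## [2,]   14   20
```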
In biostatistics, matrices and vectors help in performing regression analysis with multiple variables.
Here is an example of a simple linear regression model in matrix and vector form:
Linear regression model in matrix form
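In its standard matrix form, the linear regression model is written as Y = Xβ + ε, where Y is the vector of outcomes, X is the design matrix (a column of 1s for the intercept plus one column per predictor), β is the vector of coefficients, and ε is the vector of errors. The ordinary least-squares estimate is β̂ = (XᵀX)⁻¹XᵀY. As a sketch, the code below computes this estimate with matrix operations and compares it with lm(). It uses R's built-in mtcars dataset (mpg as the outcome, wt as the predictor); that choice is an assumption made purely for illustration.

```r
# Outcome vector Y and design matrix X (intercept column of 1s plus wt)
Y <- mtcars$mpg
X <- cbind(1, mtcars$wt)

# Least-squares estimate: solve (X'X) b = X'Y, equivalent to (X'X)^-1 X'Y
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)
print(beta_hat)

# Compare with the coefficients estimated by lm()
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
all.equal(as.vector(beta_hat), unname(coef(fit)))
## [1] TRUE
```

Using solve(A, b) rather than computing the inverse with solve(A) first is a common choice because it is numerically more stable; both express the same formula.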
R objects - dataframes
A dataframe in R is a 2-dimensional, tabular data structure that stores several vectors of equal length as columns in a single object. Unlike a matrix, each column can have a different data type; for example, one column can be numeric while another is character. More importantly, because a dataframe is an object, we can use it in a variety of ways, such as in computations and applied statistics.
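To see how a dataframe keeps different data types side by side, the sketch below builds one directly from vectors with data.frame(). The data (a small table of pets) is hypothetical and only for illustration. Converting it with as.matrix() shows the contrast with a matrix, which must hold a single type.

```r
# Each named argument becomes a column; columns can hold different types
pets <- data.frame(
  name   = c("Rex", "Tom", "Nemo"),   # character column
  weight = c(30.5, 4.2, 0.1),         # numeric column
  is_dog = c(TRUE, FALSE, FALSE)      # logical column
)
str(pets)

# The type of each column is kept separately
sapply(pets, class)

# A matrix, by contrast, forces every value to one type (here, character)
typeof(as.matrix(pets))
## [1] "character"
```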
We can convert a matrix into a dataframe using the data.frame() function.
Let’s suppose we have 4 individuals, each with a unique identifier, ID. Each individual has their age measured in years, denoted by the variable Age. They are also assigned to a group, denoted by the grouping variable Group. Individuals can be in Group == 0 or Group == 1.
First, we’ll create vectors for each variable. The data types will be numeric (not character), because a matrix can hold only one data type. Then, we will combine the vectors into a matrix. Next, we will convert the matrix into a dataframe. Lastly, we will add labels to the variables: ID, Age, and Group.
## Create vectors of the variables
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group <- c(0, 0, 1, 1)
## Create a matrix with the vectors of variables
m3 <- matrix(c(id, age, group),
nrow = 4,
ncol = 3,
byrow = FALSE)
## Transform matrix into a dataframe
df1 <- data.frame(m3)
## Label the variables in the dataframe
names(df1) <- c("ID", "Age", "Group")
## Print the dataframe and inspect
print(df1)
## ID Age Group
## 1 1 45 0
## 2 2 65 0
## 3 3 37 1
## 4 4 29 1
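Going through a matrix works here only because all three variables are numeric; if one of them were character, the matrix would coerce every value to character. A common shortcut, sketched below, is to pass the vectors straight to data.frame() and name the columns in the same call. The name df2 is an illustrative choice; all.equal() checks that both routes produce the same dataframe.

```r
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group <- c(0, 0, 1, 1)

# Route from the text: vectors -> matrix -> dataframe -> rename columns
m3 <- matrix(c(id, age, group), nrow = 4, ncol = 3, byrow = FALSE)
df1 <- data.frame(m3)
names(df1) <- c("ID", "Age", "Group")

# Shortcut: name the columns while building the dataframe from the vectors
df2 <- data.frame(ID = id, Age = age, Group = group)

# Both approaches give the same dataframe
all.equal(df1, df2)
## [1] TRUE
```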
Once the matrix has been converted to a dataframe, we can perform operations on the data inside. For example, we can estimate the mean age of the individuals in the dataframe using the mean() function.
Inside the mean() function, we include the variable we want the mean of. Because Age is stored inside the dataframe df1, we refer to it as df1$Age: the $ operator tells R that the Age variable is part of the dataframe df1.
mean_age <- mean(df1$Age) # A descriptive name avoids reusing the function name mean
print(mean_age)
## [1] 44
The mean age of the sample in the dataframe is 44.
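The $ operator is not the only way to pull a column out of a dataframe, and mean() can also be applied within subgroups. The sketch below rebuilds df1 so it runs on its own, compares $ with two equivalent forms of indexing, and uses the base R function tapply() to compute the mean age in each Group.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Age = c(45, 65, 37, 29),
                  Group = c(0, 0, 1, 1))

# Equivalent ways to pull out the Age column as a vector
identical(df1$Age, df1[["Age"]])
## [1] TRUE
identical(df1$Age, df1[, "Age"])
## [1] TRUE

# Mean age within each group: tapply() splits Age by Group, then applies mean()
tapply(df1$Age, df1$Group, mean)
##  0  1
## 55 33

# Mean age for Group == 1 only, using a logical condition inside [ ]
mean(df1$Age[df1$Group == 1])
## [1] 33
```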
Final thoughts
We created vectors and matrices and used them to build a dataframe. A dataframe allows us to perform statistical analyses, such as descriptive or inferential analysis. Knowing how data is structured in R will help users better understand the nuances of their data, particularly when performing statistical analysis.
Disclaimers
This is a work in progress and subject to updates in the future.
This is for educational purposes only.