R and RStudio Tips and Tricks (Guide) - Part 2

Mark Bounthavong

30 August 2025

Introduction

Previously, I created a “Tips and Tricks (Guide) for R,” which introduced users to the R environment, particularly RStudio. In this article, I discuss some other features of R that are helpful for new users, specifically vectors, matrices, and dataframes.

I encourage users to review the “Tips and Tricks (Guide) for R” before diving into this one.

Note: This guide was prepared using R version 4.4.2 and RStudio version 2024.12.1+563.

R objects - vectors

In the previous article, I introduced R objects and how they are used in the R environment. I want to expand on this because there are many types of objects you can create in R, and each has unique features and properties.

Recall that an object can represent anything that is created in R. An object can be a value, a vector, a matrix, a data frame, or the result of a function. We refer to anything that we create, or assign a value to, as an object. For example, we can assign a value of 5 to an object called x. Once we do this, we can use the object in a variety of ways. In this example, I printed the value of the object using the print() function. The output shows a value of 5.

x <- 5

print(x)
## [1] 5
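As a small illustration of using an object "in a variety of ways," the sketch below reuses x in a calculation. The object name y is only an example name chosen for this illustration.

```r
x <- 5        # assign the value 5 to the object x

y <- x + 10   # use x in a calculation and store the result in a new object
print(y)
## [1] 15

x * 2         # an expression typed on its own also prints its result
## [1] 10
```

Note that x itself is unchanged by these calculations; it still holds 5 until we assign something else to it.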

But what is a vector or a matrix?

A vector is an ordered sequence of items that are all of the same type. For example, I can create a vector of numbers or a vector of characters and text.

The c() function is short for combine. We use the c() function to combine data into a vector.

I use the str() function to display the structure of the vector. This will tell me what data type is contained in each vector.

x <- c(1, 2, 3, 4, 5) # Vector of numbers from 1 to 5
y <- c("A", "B", "C", "D") # Vector of characters from A to D
z <- c("yellow", "red", "blue") # Vector of words or character strings

str(x)
##  num [1:5] 1 2 3 4 5
str(y)
##  chr [1:4] "A" "B" "C" "D"
str(z)
##  chr [1:3] "yellow" "red" "blue"

The vector x contains numeric data. The vectors y and z contain character data.

Notice that each vector contains a single type of data. In the first vector x, the data are numbers, which R stores as the numeric type (not as integers, even though they are whole numbers). In the second vector y, the data are single characters. The last vector z contains character strings, or text.
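If you want to confirm how R stores a vector, the typeof(), class(), and length() functions are a quick check. The sketch below is only an illustration; x_int is an example name used here to contrast a true integer vector with the numeric vector x.

```r
x <- c(1, 2, 3, 4, 5)   # whole numbers typed this way are stored as numeric (double)
x_int <- 1:5            # the colon operator creates an integer vector instead

typeof(x)
## [1] "double"
typeof(x_int)
## [1] "integer"
class(x)
## [1] "numeric"
length(x)               # number of elements in the vector
## [1] 5
```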

What happens if you mix these data types in a vector? Let’s find out.

x <- c(1, "A", 3, "candy", 5) # Vector mixing numbers and character strings

str(x)
##  chr [1:5] "1" "A" "3" "candy" "5"

Notice that the number 1 is now stored as the character "1". Why isn’t it numeric? A vector can hold only one data type, so when numbers and characters are mixed, R converts (coerces) every element to character, because values such as "A" and "candy" cannot be represented as numbers.

Keep this in mind when you are creating a vector of numeric and/or character data types. It can impact how you can use these vectors in your programming.
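To see one way this can affect your programming, the sketch below converts the mixed vector back to numeric with as.numeric(). The name x_num is just an example, and suppressWarnings() is used here only to hide the warning R gives when it cannot convert some of the text.

```r
x <- c(1, "A", 3, "candy", 5)   # every element is coerced to character

# Converting back to numeric turns text that is not a number into NA
x_num <- suppressWarnings(as.numeric(x))
print(x_num)
## [1]  1 NA  3 NA  5

# na.rm = TRUE drops the missing values before the mean is calculated
mean(x_num, na.rm = TRUE)
## [1] 3
```

Without na.rm = TRUE, the mean of a vector that contains NA is itself NA, so it is worth checking the data type before running calculations.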

R objects - matrices

A matrix is an array of values arranged into rows and columns. Since a matrix is arranged into rows and columns, it is 2-dimensional.

Here is an example of a simple 2 x 2 matrix.

[Figure: Matrix example]

We can use R to create a matrix using the matrix() function.

m1 <- matrix(c(1, 2, 3, 4), 
             nrow = 2, 
             ncol = 2,
             byrow = FALSE)

print(m1)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Notice that the numbers fill the matrix column by column: the first column is filled from top to bottom before the second column is started.

[Figure: Matrix example 1]

We can change the byrow = FALSE argument to byrow = TRUE to fill the matrix row by row instead of column by column. This will change the arrangement of the matrix (see Matrix example 2).

m2 <- matrix(c(1, 2, 3, 4), 
             nrow = 2, 
             ncol = 2,
             byrow = TRUE)

print(m2)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

[Figure: Matrix example 2]
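The effect of byrow is easier to see when the matrix is not square. The sketch below fills a 2 x 3 matrix with the numbers 1 to 6 both ways; the names v, by_col, and by_row are example names used only for this illustration.

```r
v <- 1:6   # the values to place in the matrix

# Fill column by column (the default behavior)
by_col <- matrix(v, nrow = 2, ncol = 3, byrow = FALSE)
print(by_col)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# Fill row by row
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```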

Matrices are important for computational operations. In some cases, we will need to use matrices to perform operations such as additions, subtractions, multiplications, and divisions.
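As a brief sketch of these operations, the example below recreates the two matrices from earlier and combines them. Note that + and * work element by element, while %*% performs matrix multiplication.

```r
m1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = FALSE)
m2 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

m1 + m2     # element-wise addition
##      [,1] [,2]
## [1,]    2    5
## [2,]    5    8

m1 * m2     # element-wise multiplication
##      [,1] [,2]
## [1,]    1    6
## [2,]    6   16

m1 %*% m2   # matrix multiplication (rows of m1 times columns of m2)
##      [,1] [,2]
## [1,]   10   14
## [2,]   14   20
```

Subtraction (-) and division (/) follow the same element-wise pattern as + and *.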

In biostatistics, matrices and vectors help in performing regression analysis with multiple variables.

Here is an example of a simple linear regression model in matrix and vector form:

[Figure: Linear regression model in matrix form]
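For readers who cannot see the figure, one common way to write a simple linear regression model in matrix form is sketched below. The notation is an assumption and may differ from the original figure: y holds the n outcomes, X holds a column of 1s for the intercept and a column with the predictor, beta holds the intercept and slope, and epsilon holds the error terms.

```latex
\[
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{\mathbf{X}}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}}
+
\underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{\boldsymbol{\varepsilon}}
\]
```

Each row of this equation is the familiar model for one individual: y_i = beta_0 + beta_1 x_i + epsilon_i.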

R objects - dataframes

A dataframe in R is a 2-dimensional data structure that stores vectors of equal length as columns, and each column can have a different data type. In almost all cases, the dataframe is a tabular structure. More importantly, since it’s an object, we can use it in a variety of ways, such as computation and applied statistics.

We can convert a matrix into a dataframe using the data.frame() function.

Let’s suppose we had 4 individuals, each with a unique identifier denoted by ID. Each individual has their age measured in years, which is denoted by the variable Age. They are also assigned to a group denoted by the grouping variable Group. Individuals can be in Group == 0 or Group == 1. First, we’ll create vectors for each variable. The data types will be numeric (not character). Then, we will create a matrix using the vectors. Next, we will convert the matrix into a dataframe. Lastly, we will add labels to the variables: ID, Age, and Group.

## Create vectors of the variables
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group <- c(0, 0, 1, 1)

## Create a matrix with the vectors of variables
m3 <- matrix(c(id, age, group), 
             nrow = 4,
             ncol = 3,
             byrow = FALSE)

## Transform matrix into a dataframe
df1 <- data.frame(m3)

## Label the variables in the dataframe
names(df1) <- c("ID", "Age", "Group")

## Print the dataframe and inspect
print(df1)
##   ID Age Group
## 1  1  45     0
## 2  2  65     0
## 3  3  37     1
## 4  4  29     1
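Going through a matrix works here because all three variables are numeric. A matrix, like a vector, holds only one data type, so if one variable were text, every value would be coerced to character. As an alternative sketch, data.frame() can also build the dataframe directly from the vectors and keep each column's own type. The group labels "control" and "treated" and the name df2 are hypothetical and used only for this illustration.

```r
id <- c(1, 2, 3, 4)
age <- c(45, 65, 37, 29)
group <- c("control", "control", "treated", "treated")  # hypothetical text labels

# Combining numeric and character vectors in a matrix coerces everything to character
typeof(cbind(age, group))
## [1] "character"

# Building the dataframe directly keeps each column's data type
# (in R 4.0 and later, text columns stay character by default)
df2 <- data.frame(ID = id, Age = age, Group = group)

class(df2$Age)
## [1] "numeric"
class(df2$Group)
## [1] "character"
```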

Once the matrix has been converted to a dataframe, we can perform operations on the data inside. For example, we can estimate the mean age of the individuals in the dataframe using the mean() function. Inside the mean() function, we have to specify the variable we want to average. To do this, we give the dataframe df1 and the variable Age, joined by the $ operator, which lets R know that the Age variable is part of the dataframe df1.

mean_age <- mean(df1$Age) # Store the result under a name that differs from the mean() function

print(mean_age)
## [1] 44

The mean age of the sample in the dataframe is 44.
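The same $ notation can be combined with a logical condition to summarize a subset of the data. The sketch below rebuilds df1 directly from the values shown above so that it runs on its own, then estimates the mean age within each group.

```r
# Rebuild the example dataframe so this block is self-contained
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Age = c(45, 65, 37, 29),
                  Group = c(0, 0, 1, 1))

mean(df1$Age[df1$Group == 0])      # mean age of individuals in Group 0
## [1] 55
mean(df1$Age[df1$Group == 1])      # mean age of individuals in Group 1
## [1] 33

tapply(df1$Age, df1$Group, mean)   # both group means in a single call
##  0  1 
## 55 33 
```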

Final thoughts

We can create vectors and matrices and use them to build a dataframe. A dataframe allows us to perform statistical analyses such as descriptive or inferential analysis. Knowing how data are structured in R will help users better understand the nuances associated with data, particularly when it comes to performing statistical analysis.

Disclaimers

This is a work in progress and subject to updates in the future.

This is for educational purposes only.