Data structures

1. Vectors

To create a vector in R, we use the c() function, which combines its arguments into a single vector. Vectors are also useful as the columns of a data frame.

We start with a vector that contains the names of five South American countries.

c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")

## [1] "Chile"    "Ecuador"  "Bolivia"  "Paraguay" "Uruguay"

In R we can also store values in named objects. Here we store the previous vector in an object called pais (country) and create a second object, poblacion (population), that contains the population of those five countries in millions.

pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)

Finally, we display the two objects by typing their names.

pais

## [1] "Chile"    "Ecuador"  "Bolivia"  "Paraguay" "Uruguay"

poblacion

## [1] 20 18 12  6  3
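
As a small complement, the sketch below inspects the two vectors with base R functions: length() and class() report the size and type, and square brackets select elements by position or by a logical condition. The elements and the 10-million threshold are chosen only for illustration.

```r
pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)

length(pais)          # number of elements: 5
class(pais)           # "character"
class(poblacion)      # "numeric"

pais[2]               # second element: "Ecuador"
poblacion[c(1, 5)]    # first and fifth elements: 20 3

# Logical indexing: countries with more than 10 million inhabitants
# (the threshold is an arbitrary illustration)
pais[poblacion > 10]  # "Chile" "Ecuador" "Bolivia"
```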

2. Vectors to data frames

We can also combine vectors into a data frame. Here we create a data frame with two columns, one for each vector.

df_paises <- data.frame(pais, poblacion)

And we display it.

df_paises

##       pais poblacion
## 1    Chile        20
## 2  Ecuador        18
## 3  Bolivia        12
## 4 Paraguay         6
## 5  Uruguay         3

We can also set the column names explicitly if we want them to differ from the vector names.

df_paises <- data.frame(country = pais,
                        population = poblacion)

And we display the same data frame with the new names.

df_paises

##    country population
## 1    Chile         20
## 2  Ecuador         18
## 3  Bolivia         12
## 4 Paraguay          6
## 5  Uruguay          3
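
To check what the data frame contains, a minimal sketch using base R accessors follows. It rebuilds df_paises so that it runs on its own; names(), nrow(), the $ operator and str() are standard base R, and which columns to look at is only an illustrative choice.

```r
pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)
df_paises <- data.frame(country = pais, population = poblacion)

names(df_paises)      # column names: "country" "population"
nrow(df_paises)       # number of rows: 5
df_paises$population  # one column, returned as a numeric vector
str(df_paises)        # compact summary of columns and their types
```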

3. Matrices

We can create a matrix that contains the integers from 1 to 9. We specify three rows; since there are nine values, the matrix also has three columns. By default, R fills a matrix column by column.

matrix(1:9, nrow = 3)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

We can also fill the matrix by rows.

matrix(1:9, byrow = TRUE, nrow = 3)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
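
The two fill orders are related by transposition. The sketch below (the object names by_col and by_row are introduced here only for illustration) shows that transposing the column-filled matrix with t() gives the row-filled one, and that elements are addressed as [row, column].

```r
by_col <- matrix(1:9, nrow = 3)                # filled column by column
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)  # filled row by row

dim(by_col)                   # dimensions: 3 3
identical(t(by_col), by_row)  # TRUE: transposing swaps the fill order
by_row[2, 3]                  # row 2, column 3: 6
```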

Now let’s create a matrix based on three vectors. Each vector holds the area (in thousands of square kilometers) and the 2023 population (in millions of people) of a Latin American country.

brazil <- c(8515.8, 211.1)
argentina <- c(2780.4, 45.54)
mexico <- c(1964.4, 129.7)

Next, we combine them into a single vector of six values.

latin_america <- c(brazil, argentina, mexico)

We then arrange this vector into a matrix with three rows, filled by rows, so that each country occupies one row.

latin_america_matrix <- matrix(latin_america, byrow = TRUE, nrow = 3)
latin_america_matrix

##        [,1]   [,2]
## [1,] 8515.8 211.10
## [2,] 2780.4  45.54
## [3,] 1964.4 129.70

We could also create the same matrix without the intermediate vector latin_america.

latin_america_matrix <- matrix(c(brazil, argentina, mexico), nrow = 3, byrow = TRUE)
latin_america_matrix

##        [,1]   [,2]
## [1,] 8515.8 211.10
## [2,] 2780.4  45.54
## [3,] 1964.4 129.70

We can also set the row and column names. We start by creating the vectors of names.

figures <- c("area", "population")
countries <- c("brazil", "argentina", "mexico")

And then we assign them to the matrix.

colnames(latin_america_matrix) <- figures
rownames(latin_america_matrix) <- countries

The names can also be specified when the matrix is created, through the dimnames argument: a list with the row names first and then the column names.

latin_america_matrix <- matrix(latin_america,
                               nrow = 3, byrow = TRUE,
                               dimnames = list(countries, figures)) # rows and then columns

latin_america_matrix

##             area population
## brazil    8515.8     211.10
## argentina 2780.4      45.54
## mexico    1964.4     129.70

Finally, we compute the total area and population of these three countries, which will become an additional row.

totals <- colSums(latin_america_matrix)
totals

##       area population 
##   13260.60     386.34

And we attach them as a new row at the end of the existing matrix.

latam_with_totals <- rbind(latin_america_matrix, totals)
latam_with_totals

##              area population
## brazil     8515.8     211.10
## argentina  2780.4      45.54
## mexico     1964.4     129.70
## totals    13260.6     386.34
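
Once a matrix has row and column names, they can be used for indexing instead of positions. The sketch below rebuilds the matrix so that it runs on its own; the area_share calculation and its name are an illustrative addition, not part of the original walkthrough.

```r
countries <- c("brazil", "argentina", "mexico")
figures <- c("area", "population")
latin_america_matrix <- matrix(c(8515.8, 211.1, 2780.4, 45.54, 1964.4, 129.7),
                               nrow = 3, byrow = TRUE,
                               dimnames = list(countries, figures))
totals <- colSums(latin_america_matrix)
latam_with_totals <- rbind(latin_america_matrix, totals)

latam_with_totals["mexico", "population"]  # single value: 129.7
latam_with_totals[, "area"]                # whole column as a named vector
latam_with_totals["totals", ]              # whole row as a named vector

# Illustrative: each country's share of the combined area, in percent
area_share <- latin_america_matrix[, "area"] / latam_with_totals["totals", "area"] * 100
round(area_share, 1)
```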

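4. Matrices to data frames

A matrix can be converted to a data frame by passing it to data.frame(). The row and column names of the matrix are kept.

df_latam <- data.frame(latin_america_matrix)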

We can also use colnames() to rename the data frame’s columns.

colnames(df_latam) <- c("Area_Km2_k", "Pop_M")
df_latam

##           Area_Km2_k  Pop_M
## brazil        8515.8 211.10
## argentina     2780.4  45.54
## mexico        1964.4 129.70

We can also drop the row names (i.e. the country names) and store those values in a regular column named country.

rownames(df_latam) <- NULL
df_latam$country <- rownames(latin_america_matrix)
df_latam

##   Area_Km2_k  Pop_M   country
## 1     8515.8 211.10    brazil
## 2     2780.4  45.54 argentina
## 3     1964.4 129.70    mexico
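
The new column is appended at the right. If we prefer the country to come first, one option is to reorder the columns by name, as in the sketch below. It rebuilds the data frame directly so that it runs on its own; the column order is just a preference.

```r
df_latam <- data.frame(Area_Km2_k = c(8515.8, 2780.4, 1964.4),
                       Pop_M = c(211.1, 45.54, 129.7),
                       country = c("brazil", "argentina", "mexico"))

# Select the columns in the desired order
df_latam <- df_latam[, c("country", "Area_Km2_k", "Pop_M")]
names(df_latam)  # "country" "Area_Km2_k" "Pop_M"
```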

Pablo Herrera

March 2025
