rm(list = ls())
setwd("C:/00 Pablo/Programacion/R/Apunte 101")
To create a vector in R, we use the c() function, which combines its arguments into a single vector. Vectors are also useful as the columns of a data frame.
We start with a vector that contains the names of five South American countries.
c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
## [1] "Chile" "Ecuador" "Bolivia" "Paraguay" "Uruguay"
In R we can also store information in objects. Here we store the previous vector in an object called pais, and create a second object, poblacion, with the population (in millions) of those five countries.
pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)
Finally, we display the two objects by typing their names.
pais
## [1] "Chile" "Ecuador" "Bolivia" "Paraguay" "Uruguay"
poblacion
## [1] 20 18 12 6 3
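As a quick check on the two vectors, the sketch below inspects their type and length and extracts a few elements. It only uses base R (class(), length() and square-bracket indexing); the particular elements selected are an illustrative choice.

```r
# Recreate the two vectors from the text
pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)

class(pais)          # "character"
class(poblacion)     # "numeric"
length(pais)         # 5

# Indexing starts at 1 in R
pais[1]              # "Chile"
poblacion[c(1, 5)]   # 20 3

# A logical condition can also be used to select elements
pais[poblacion > 10] # "Chile" "Ecuador" "Bolivia"
```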
We can also build a data frame from vectors. Here we create a data frame with two columns, one for each vector.
df_paises <- data.frame(pais, poblacion)
And we display it.
df_paises
##       pais poblacion
## 1    Chile        20
## 2  Ecuador        18
## 3  Bolivia        12
## 4 Paraguay         6
## 5  Uruguay         3
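Once the data frame exists, a few base R functions help to inspect it. The following is a minimal sketch that rebuilds the data frame from the same two vectors and looks at its dimensions, column names and one of its columns.

```r
pais <- c("Chile", "Ecuador", "Bolivia", "Paraguay", "Uruguay")
poblacion <- c(20, 18, 12, 6, 3)
df_paises <- data.frame(pais, poblacion)

dim(df_paises)            # 5 2 (rows, columns)
names(df_paises)          # "pais" "poblacion"
str(df_paises)            # compact summary of each column's type

# A single column is extracted with $ and comes back as a vector
df_paises$poblacion       # 20 18 12 6 3
sum(df_paises$poblacion)  # 59
```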
We can also set the column names explicitly if we want them to differ from the vectors’ names.
df_paises <- data.frame(country = pais,
                        population = poblacion)
And we display the same data frame with the new names.
df_paises
##    country population
## 1    Chile         20
## 2  Ecuador         18
## 3  Bolivia         12
## 4 Paraguay          6
## 5  Uruguay          3
For matrices, we can create a matrix that contains all the integers from 1 to 9. We specify three rows; since there are nine numbers, it will also have three columns. By default, the matrix is filled by columns.
matrix(1:9, nrow = 3)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
We can also fill the matrix by rows.
matrix(1:9, byrow = TRUE, nrow = 3)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
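The two filling orders are related: with the same values, the matrix filled by rows is the transpose of the one filled by columns. The sketch below checks this with t() and identical(); the object names by_col and by_row are only illustrative.

```r
by_col <- matrix(1:9, nrow = 3)                # default: filled column by column
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)  # filled row by row

dim(by_col)                   # 3 3
identical(t(by_col), by_row)  # TRUE: one is the transpose of the other

# ncol can be given instead of nrow; the other dimension is inferred
identical(matrix(1:9, ncol = 3), by_col)  # TRUE
```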
Now let’s create a matrix based on three vectors. Each vector holds the area (in thousands of square kilometers) and the 2023 population (in millions of people) of a Latin American country.
brazil <- c(8515.8, 211.1)
argentina <- c(2780.4, 45.54)
mexico <- c(1964.4, 129.7)
Then we can create a vector of the six values.
latin_america <- c(brazil, argentina, mexico)
Then we arrange this vector into a matrix with three rows, filled by rows so that each country occupies one row.
latin_america_matrix <- matrix(latin_america, byrow = TRUE, nrow = 3)
latin_america_matrix
##        [,1]   [,2]
## [1,] 8515.8 211.10
## [2,] 2780.4  45.54
## [3,] 1964.4 129.70
We could also create the very same matrix without creating the vector latin_america.
latin_america_matrix <- matrix(c(brazil, argentina, mexico), nrow = 3, byrow = TRUE)
latin_america_matrix
##        [,1]   [,2]
## [1,] 8515.8 211.10
## [2,] 2780.4  45.54
## [3,] 1964.4 129.70
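Another possible route, not used in the text, is to stack the three vectors with rbind(), which binds its arguments as rows. The sketch below compares both results; with_matrix and with_rbind are illustrative names. The only difference is that rbind() takes the row names from the names of the objects passed to it.

```r
brazil <- c(8515.8, 211.1)
argentina <- c(2780.4, 45.54)
mexico <- c(1964.4, 129.7)

with_matrix <- matrix(c(brazil, argentina, mexico), nrow = 3, byrow = TRUE)
with_rbind <- rbind(brazil, argentina, mexico)

rownames(with_rbind)  # "brazil" "argentina" "mexico"

# Same values once the row names are dropped with unname()
all.equal(with_matrix, unname(with_rbind))  # TRUE
```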
We can also set the row and column names. We start by creating the vectors of names.
figures <- c("area", "population")
countries <- c("brazil", "argentina", "mexico")
And then we assign them to the matrix.
colnames(latin_america_matrix) <- figures
rownames(latin_america_matrix) <- countries
The names can also be specified when the matrix is created, through the dimnames argument: a list with the row names first and the column names second.
latin_america_matrix <- matrix(latin_america,
                               nrow = 3, byrow = TRUE,
                               dimnames = list(countries, figures)) # rows and then columns
latin_america_matrix
##             area population
## brazil    8515.8     211.10
## argentina 2780.4      45.54
## mexico    1964.4     129.70
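One practical benefit of naming rows and columns is that elements can be selected by name instead of by position. The sketch below rebuilds the same matrix in a single call so that it runs on its own, and then shows a few selections.

```r
latin_america_matrix <- matrix(
  c(8515.8, 211.1, 2780.4, 45.54, 1964.4, 129.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("brazil", "argentina", "mexico"),
                  c("area", "population"))
)

latin_america_matrix["brazil", "area"]  # 8515.8
latin_america_matrix["mexico", ]        # whole row, as a named vector
latin_america_matrix[, "population"]    # whole column, as a named vector

# Positions still work: row 2, column 1
latin_america_matrix[2, 1]              # 2780.4
```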
Finally, we create an additional row that contains the total area and population of these three countries.
totals <- colSums(latin_america_matrix)
totals
##       area population
##   13260.60     386.34
And we attach it to the end of the existing matrix with rbind().
latam_with_totals <- rbind(latin_america_matrix, totals)
latam_with_totals
##              area population
## brazil     8515.8     211.10
## argentina  2780.4      45.54
## mexico     1964.4     129.70
## totals    13260.6     386.34
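Two small variations on this step, offered as a sketch rather than as part of the original workflow: the new row can be named directly inside rbind(), and cbind() does the same job for columns. The density column is a hypothetical example, computed as people per square kilometer from the units stated earlier (population in millions, area in thousands of square kilometers).

```r
latin_america_matrix <- matrix(
  c(8515.8, 211.1, 2780.4, 45.54, 1964.4, 129.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("brazil", "argentina", "mexico"),
                  c("area", "population"))
)

# Name the new row in the call instead of relying on the object name
with_total <- rbind(latin_america_matrix, total = colSums(latin_america_matrix))
rownames(with_total)  # "brazil" "argentina" "mexico" "total"

# cbind() adds a column; density here is an illustrative derived figure
# (millions of people / thousands of km2) * 1000 = people per km2
density <- latin_america_matrix[, "population"] /
  latin_america_matrix[, "area"] * 1000
with_density <- cbind(latin_america_matrix, density)
colnames(with_density)  # "area" "population" "density"
```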
A matrix can also be converted into a data frame with data.frame().
df_latam <- data.frame(latin_america_matrix)
We can also use colnames() to change the names of the data frame’s columns.
colnames(df_latam) <- c("Area_Km2_k", "Pop_M")
df_latam
##           Area_Km2_k  Pop_M
## brazil        8515.8 211.10
## argentina     2780.4  45.54
## mexico        1964.4 129.70
We can also delete the row names (i.e. the countries’ names) and store those values in a proper column named country.
rownames(df_latam) <- NULL
df_latam$country <- rownames(latin_america_matrix)
df_latam
##   Area_Km2_k  Pop_M   country
## 1     8515.8 211.10    brazil
## 2     2780.4  45.54 argentina
## 3     1964.4 129.70    mexico
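The same result can probably be reached in fewer steps. The sketch below is one possible alternative, not the method used above: it passes the country names as the first column and sets row.names = NULL in data.frame() so that the matrix row names are not carried over. The name df_alt is illustrative, and the columns keep the matrix’s own names (area, population).

```r
latin_america_matrix <- matrix(
  c(8515.8, 211.1, 2780.4, 45.54, 1964.4, 129.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("brazil", "argentina", "mexico"),
                  c("area", "population"))
)

# country goes first; row.names = NULL asks for automatic row names (1, 2, 3)
df_alt <- data.frame(country = rownames(latin_america_matrix),
                     latin_america_matrix,
                     row.names = NULL)
df_alt

# Columns of an existing data frame can also be reordered by name
df_alt[, c("area", "population", "country")]
```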