Input -Output with R

Author

Kamal Romero

Published

April 7, 2024

R programming language: A (very) brief introduction

What Is R?

The following definition is from the book Learning R de Richard Cotton:

Just to confuse you, R refers to two things. There is R, the programming language, and R, the piece of software that you use to run programs written in R. Fortunately, most of the time it should be clear from the context which R is being referred to.

R (the language) was created in the early 1990s by Ross Ihaka and Robert Gentleman, then both working at the University of Auckland. It is based upon the S language that was developed at Bell Laboratories in the 1970s, primarily by John Chambers. R (the software) is a GNU project, reflecting its status as important free and open source software.

Strengths of R

  • Free and open source

  • Platform independent

  • Foster’s a reproducible workflow

  • Active community of users and programmers making R better

Working environment

In this course we will use Rstudio.

Rstudio is an IDE (integrated development environment) that works as a graphical interface that facilitates the use of the R language.

You have a online version, Posit cloud

R vs RStudio

The following image and sentence is taken from the book Statistical Inference via Data Science: A ModernDive into R and the Tidyverse

Figure 1: R vs Studio Source.

More precisely, R is a programming language that runs computations, while RStudio is an integrated development environment (IDE) that provides an interface by adding many convenient features and tools. So just as the way of having access to a speedometer, rearview mirrors, and a navigation system makes driving much easier, using RStudio’s interface makes using R much easier as well.

Alternatives

There are alternatives to Rstudio, but Rstudio is the de facto R IDE for many.

An alternative that is gaining a lot of popularity is the well-known Visual Studio Code.

If you don’t want to use Rstudio, this would be the suggested alternative

Highlights of RStudio

Figure 2: RStudio pane layout. Source.
  • Lower Right:

    • Files, Plots, Packages, Help
  • Upper Right:

    • Environment, History
  • Lower Left: Console

  • Upper Left: Text editor

  • Nice Features:

    • Importing Data
    • Tab completion
    • ChatGTP

Projects, directories and libraries: Organising the working environment

To keep all our files organised, including databases we load or graphics we create, we are going to work in what RStudio calls projects.

Essentially, a RStudio project is a folder or directory on your computer that contains all the elements of your project.

Figure 3: Create a R project

The use of projects in RStudio is a good practice that allows you to keep control of all the files used in a project.

Projects are often not only complex but also dynamic, and the management of all the elements that make up a project is often an essential part of the workflow.

Furthermore, the organisation into projects facilitates reproducibility.

For a more detailed discussion, read this section of this good book.

Basic functioning

Next, we are going to introduce the basic handling of the working environment, such as defining variables, making comments to the code, etc. In this process, we will be introducing language concepts that we will be defining in a formal way later on.

Arithmetic operations

To introduce us to the use of the editor and the command line of Rstudio, we will start with some very basic operations

2 + 2
[1] 4
5 - 3
[1] 2
3 * 2
[1] 6
6 / 3
[1] 2

It is possible to apply standard association rules as well as operations beyond the basic ones (power, logarithm, etc.)

(5 + 3) / 4
[1] 2
3^2
[1] 9
log(100, base = 10)
[1] 2
5 %% 3
[1] 2
sqrt(9)
[1] 3

Assigning variables

We can store values by assigning a name to them, so that we can access the value later.

x <- 4

x
[1] 4
y <- (5 + 3) / 4

y
[1] 2

You create variables, or create new objects with the <- operator. You can also do this in the more conventional way with = but this is not standard practice.

Employees <- 150

Employees
[1] 150

With names you have to respect certain conventions: they must start with a letter and can only contain letters, numbers, _ and ..

You are free to name variables as you like, but there are some rules of style, for example

i_use_snake_case
otherPeopleUseCamelCase
some.people.use.periods
What_EVer.I.wha_NT

Figure 4: An illustration of variable naming styles by the great Allison Horst. Source.
New_analysts_January <- 5

New_analysts_February <- 3

Analysts <- New_analysts_January + New_analysts_February

Analysts
[1] 8

Introduce comments

Comments are initiated with #

# Calculation of number of analysts
Analysts <- New_analysts_January + New_analysts_February

Analysts
[1] 8

Errors

Throughout the course we will talk about R runtime errors, but it is worth getting used to them, as they will always be with us 😢, but they give us a guide to solve them 😊

## analyst_number_calculation
Analysts_update <- New_analysts_January + New_analysts_February + New_analysts_March

Analyst_update

Libraries

Libraries or packages are perhaps the most commonly used elements in R for practical work.

A formal definition of a library from the book R Packages by Hadley Wickham and Jennifer Bryan is as follows:

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of March 2023, there were over 19,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages”.

An R package is a way to share code in an organised way that expands the possibilities of R by extending its functionality.

Libraries are installed using the install.packages() command, and libraries are loaded with library().

install.packages(emoji)
library(emoji)

Basic types in R (Data Types)

In order to work with data, it is necessary to understand how data is stored in the computer by each programming language.

The structures that store numerical information, and the way they are accessed, differ from Python to R, or from R to other languages.

The most important family of variable types in R are vectors, which can be classified as either atomic or list.

The common data objects in R are:

  • Vectors: one dimensional array
    • Types: numeric, integer, character, factor, logical
  • Matrices: two dimensional array
    • Each column must have the same type
  • Data Frames: two dimensional array
    • Columns may have different types
  • Lists
    • Items don’t need to be the same size.

The structure of atomic vectors is as follows:

Figure 5: Atomic vectors. Source: Advanced R
Number <- 1.0 # (real, floating)
Integer <- 1
Character <- "ab"   
Logical <- TRUE  

Number
[1] 1
Integer
[1] 1
Character
[1] "ab"
Logical
[1] TRUE

Be careful when working with different types

Number + Character

When we perform an operation with two different numeric types (real + integer), R forces (coerces) the result to the type with the highest precision, in this case the real type.

Sum <- Number + Integer

Sum
[1] 2
typeof(Sum)
[1] "double"

Several things here, we’ve had our first approximation to a function in R, a topic we’ll explore in more detail later. Like the intuitive idea we have of a function from high school mathematics, a function in R has an argument (variable in parentheses) and gives us a result.

In R, functions have their name followed by parentheses, where we place the argument variable(s): a_function(x).

The typeof() function tells us the type of the variable (numeric, integer or logical).

typeof(Number)
[1] "double"
typeof(Integer)
[1] "double"
typeof(Character)
[1] "character"
typeof(Logical)
[1] "logical"

There are also specific functions to determine whether a variable is of a specific type

is.numeric(Number)
[1] TRUE
is.integer(Integer) 
[1] FALSE
is.character(Character)
[1] TRUE
is.logical(Logical)
[1] TRUE

We see that R tells us that Integer is not an integer, to specify an integer we have to put a letter L at the end of the number

Integer_2 <- 1L
Integer_2
[1] 1
is.integer(Integer_2)
[1] TRUE
typeof(Integer_2)
[1] "integer"

Vectors

We are going to use R to analyse data and create statistical or algorithmic models from it.

Most data is represented in tables: spreadsheets, relational databases tables, .csv files, etc.

Most statistical models use as input data in table form.

The most commonly used objects for working with tables in R are data frames and other variants (tibble or data.tables for example).

Before understanding how to work with tables, let’s review the concept of vector, which is the basic type on which data frames are built.

What characterises a vector is that it can store only data of the same type.

vector_numeric <- c(1, 10, 49)
vector_character <- c("a", "b", "c")

vector_numeric
[1]  1 10 49
vector_character
[1] "a" "b" "c"

Vectors are one-dimensional arrays (row or column) that can store numbers, characters or logical variables.

As we have seen above, vectors are created with the c() command where the c stands for combine.

Be careful not to confuse this structure with vectors as elements of a vector space (more on this later).

vector_mixed <- c(1,2, "a")
vector_mixed
[1] "1" "2" "a"

In the previous example we wanted to create a vector with elements of different types, numeric and character. R has converted all the elements to character.

If there are characters in a vector R converts all the elements to character, if they are all numeric but of different types, R converts them to the type with the highest precision (double). What happens with logical vectors?

vector_mixed2 <- c(1,2,TRUE)
vector_mixed2
[1] 1 2 1
typeof(vector_mixed2)
[1] "double"

In this case R has converted the elements of the vector to numeric.

We observe something that later will be very useful, R has assigned to the logical variable TRUE the number 1. The variable FALSE has been assigned a zero.

Can we change a variable or a vector type? YES

vector_numeric
[1]  1 10 49
as.character(vector_numeric)
[1] "1"  "10" "49"
logic_vector <- c(TRUE, FALSE)

logic_vector
[1]  TRUE FALSE
as.numeric(logic_vector)
[1] 1 0

There are several functions in R that allow for changes of type

as.character(logic_vector)
[1] "TRUE"  "FALSE"

There are a couple of other ways to create vectors

vector_1 <- 1:5

vector_2 <- seq(1,5)

vector_1
[1] 1 2 3 4 5
vector_2
[1] 1 2 3 4 5

where seq stands for sequence

Since in this course the data we will use is external and not generated by us, we will not go into the different ways of generating vectors. The following box elaborates in more detail on this topic and is optional, you can follow the rest of the lecture without reading it.

Note

More complex sequences can be created with the seq() function.

vector_3 <- seq(1,10, by = 2)

vector_4 <- seq(1,10, length.out = 20)

vector_5 <- seq(1,10, along.with = vector_1)

vector_3
[1] 1 3 5 7 9
vector_4
 [1]  1.000000  1.473684  1.947368  2.421053  2.894737  3.368421  3.842105
 [8]  4.315789  4.789474  5.263158  5.736842  6.210526  6.684211  7.157895
[15]  7.631579  8.105263  8.578947  9.052632  9.526316 10.000000
vector_5
[1]  1.00  3.25  5.50  7.75 10.00

Another command to generate sequences is rep.

repeated <- rep(4,10)

repeated
 [1] 4 4 4 4 4 4 4 4 4 4

We can repeat not only a value but vectors

repeat_2 <- rep(1:4, 4)

repeat_2
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
repeat_3 <- rep(1:4, each = 4)

repeat_3
 [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
repeat_4 <- rep(1:4, each=2, times=2)

repeat_4
 [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
repeat_5 <- rep(1:4, c(2,3,4,5))

repeat_5
 [1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4

We can also create random vectors, i.e., realisations of a random variable from a given distribution

random_normal <- rnorm(10)

random_uniform <- runif(10)

random_normal
 [1]  2.26744026 -0.23434046  0.99443512  1.76363938  0.49630264 -0.60692147
 [7] -1.32656912 -0.70248140 -0.04814444 -0.17562381
random_uniform
 [1] 0.61299812 0.88554631 0.74677384 0.04601948 0.75496588 0.07011662
 [7] 0.02468990 0.16496740 0.66402780 0.12522288

The elements of the vector can be assigned names

names(vector_1)
NULL
names_vec <- c("one", "two", "three", "four", "five")

names(vector_1) <- names_vec

names(vector_1)
[1] "one"   "two"   "three" "four"  "five" 

Another characteristic of vectors besides their type, is their length or dimension, which we can determine with the length() function.

length(vector_1)
[1] 5
length(vector_4)
[1] 20
length(repeat_5)
[1] 14

Arithmetic operations with vectors

Arithmetic operations can be performed with vectors

vector_1
  one   two three  four  five 
    1     2     3     4     5 
vector_2
[1] 1 2 3 4 5
Vec_sum <- vector_1 +vector_2

Vec_sum
  one   two three  four  five 
    2     4     6     8    10 
vector_1
  one   two three  four  five 
    1     2     3     4     5 
vector_2
[1] 1 2 3 4 5
Vec_product <- vector_1 * vector_2

Vec_product
  one   two three  four  five 
    1     4     9    16    25 

In this case, both vectors have the same dimension. What happens if the opposite is true?

vector_1 
  one   two three  four  five 
    1     2     3     4     5 
seq(6,15)
 [1]  6  7  8  9 10 11 12 13 14 15
vector_1 + seq(6,15)
 [1]  7  9 11 13 15 12 14 16 18 20

This peculiar behaviour is called recycling. Observe the result: what has R done in this case?

Vectorisation of operations

An operation is said to be vectorised if it can be applied to all elements of a vector.

vector_numeric
[1]  1 10 49
vector_numeric + 2
[1]  3 12 51
vector_numeric * 2
[1]  2 20 98
vector_numeric / 2
[1]  0.5  5.0 24.5
vector_numeric^2
[1]    1  100 2401
sqrt(vector_numeric)
[1] 1.000000 3.162278 7.000000
log(vector_numeric)
[1] 0.000000 2.302585 3.891820

We note that when an arithmetic operator (+, *, -, /, ^) is applied to a vector, the operation is performed on each of the elements of the vector. Similarly, when we apply a function (sqrt(), log()) it uses all the elements of the vector as arguments and its output is a vector of dimension equal to the dimension of the original vector.

Vector inspection

It is possible to apply certain functions to analyse some characteristics of vectors.

Remember that the ultimate goal is to perform data analysis, some of these functions will be used on a regular basis when we want to inspect data in a table.

Basic statistics of a vector

summary(vector_4)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.50    5.50    7.75   10.00 

Values at the beginning and at the end

head(vector_4)
[1] 1.000000 1.473684 1.947368 2.421053 2.894737 3.368421
tail(vector_4)
[1]  7.631579  8.105263  8.578947  9.052632  9.526316 10.000000

Selections

It is possible to construct new vectors from a vector

vector_1
  one   two three  four  five 
    1     2     3     4     5 
vector_square <- vector_1^2

vector_square
  one   two three  four  five 
    1     4     9    16    25 

How do we access the elements of a vector?

vector_square[1]
one 
  1 
vector_square[3]
three 
    9 

Several elements in a row (slice)

vector_square[1:3] 
  one   two three 
    1     4     9 
vector_square[3:1]
three   two   one 
    9     4     1 

Non consecutive elements of a vector

vector_square[c(1,3)]
  one three 
    1     9 
vector_square[c(3,1)]
three   one 
    9     1 

All elements except some

vector_square[-c(1,3)] 
 two four five 
   4   16   25 
vector_square[-length(vector_square)]
  one   two three  four 
    1     4     9    16 

All elements satisfying a condition

vector_5
[1]  1.00  3.25  5.50  7.75 10.00
vector_5[vector_5 < 5]
[1] 1.00 3.25
vector_5[vector_5 > 5]
[1]  5.50  7.75 10.00

The condition can be an equality

vector_5[vector_5 == 5.5]
[1] 5.5

These conditions are very important for data analysis. Suppose you want to locate in a table with the financial data from a sample of bank costumers, those customers whose income is greater than 30,000 per year.

It is also useful to locate data and replace it with another value. Following the example above, replace the income column for customers whose annual income is less than 12,000 per year with zero, as we will not be working with that segment.

vector_5
[1]  1.00  3.25  5.50  7.75 10.00
vector_5[vector_5 < 5] <- 0

vector_5
[1]  0.00  0.00  5.50  7.75 10.00

In case the vectors have names, we can use these to select elements

vector_square["two"]
two 
  4 
vector_square[c("one", "three")]
  one three 
    1     9 

More functions applied to vectors

vector_5 <- seq(1,10, along.with = vector_1)

vector_5
[1]  1.00  3.25  5.50  7.75 10.00
max(vector_5)
[1] 10
min(vector_5)
[1] 1
sum(vector_5)
[1] 27.5
prod(vector_5)
[1] 1385.312

Matrices

So far we have been working with a one-dimensional data structure, but in data analysis we are sometimes interested in discovering or finding relationships between variables, so it is necessary to look at them as a whole in order to capture interrelationships.

In our particular case, we will represent subsets of the input output table as matrices, and apply linear algebra operations to them.

This forces us to have to work in more than one dimension, in R we can do that with matrices and arrays.

A matrix is nothing more than a two-dimensional structure that allows us to store homogeneous data. To construct a matrix we use the matrix() function

vector_6 <- seq(8)

vector_6
[1] 1 2 3 4 5 6 7 8
A <- matrix(vector_6)

A
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6
[7,]    7
[8,]    8

We have a one-column vector. What if we want more than one dimension?

A <- matrix(vector_6, nrow = 2)

A
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
A <- matrix(vector_6, ncol = 2)

A
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
A <- matrix(vector_6, nrow = 4, ncol = 2)

A
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

We notice that the matrix is filled by columns, we can change that behaviour

A <- matrix(vector_6, nrow = 4, ncol = 2, byrow = TRUE)

A
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8

Just as we had the name attribute for vectors, we can also name the rows and columns of a matrix. To inspect the row names of a matrix we use rownames(), and the row names colnames().

rownames(A)
NULL
colnames(A)
NULL

NULL indicates when an expression or function results in an undefined value. In this case it indicates that the array has no names. Let’s assign names to it by associating a vector of characters

rownames(A) <- c("row_1", "row_2", "row_3", "row_4")
colnames(A) <- c("column_1", "column_2")

rownames(A)
[1] "row_1" "row_2" "row_3" "row_4"
colnames(A)
[1] "column_1" "column_2"
A
      column_1 column_2
row_1        1        2
row_2        3        4
row_3        5        6
row_4        7        8

`dimnames() is another way to inspect/assign the names of an array.

dimnames(A)
[[1]]
[1] "row_1" "row_2" "row_3" "row_4"

[[2]]
[1] "column_1" "column_2"

Is there anything strange about this structure? There are lists, a topic that we will not be covered in this course.

Accessing the elements of an array is very similar to the case of vectors, we just need to differentiate between dimensions with a comma ,.

A
      column_1 column_2
row_1        1        2
row_2        3        4
row_3        5        6
row_4        7        8
A[1,1]
[1] 1
A[3,2]
[1] 6
A["row_2", "column_2"]
[1] 4
A[1:2, 1]
row_1 row_2 
    1     3 
A[c("row_1", "row_4"), 2]
row_1 row_4 
    2     8 

One operation that is widely used in practice is to create a new table from another by binding rows and columns. This is done with the rbind() and cbind() functions.

If we want to add columns based on another matrix:

set.seed(125)

matrix_A <- matrix(data = runif(n = 8, min = 10, max = 50), nrow = 2, ncol = 4)

matrix_B <- matrix(data = runif(n = 2, min = 10, max = 50), nrow = 2, ncol = 1)

matrix_A
         [,1]     [,2]     [,3]     [,4]
[1,] 42.98698 21.99122 48.60780 31.32369
[2,] 14.67404 24.26243 48.70242 23.28622
matrix_B
         [,1]
[1,] 36.15375
[2,] 34.58198

We combine both matrices

matrix_AB <- cbind(matrix_A, matrix_B)

matrix_AB
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 42.98698 21.99122 48.60780 31.32369 36.15375
[2,] 14.67404 24.26243 48.70242 23.28622 34.58198

If we want to add rows based on another matrix:

matrix_C <- matrix(data = runif(n = 5, min = 10, max = 50), nrow = 1, ncol = 5)

matrix_C
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 10.56518 40.53518 24.37221 14.05403 32.27189
matrix_ABC <- rbind(matrix_AB, matrix_C)

matrix_ABC
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 42.98698 21.99122 48.60780 31.32369 36.15375
[2,] 14.67404 24.26243 48.70242 23.28622 34.58198
[3,] 10.56518 40.53518 24.37221 14.05403 32.27189

As mentioned before, matrices are objects that can be manipulated as matrices in the sense of linear algebra. For example, we can calculate the transpose or the inverse of a matrix

Transpose

A
      column_1 column_2
row_1        1        2
row_2        3        4
row_3        5        6
row_4        7        8
t(A)
         row_1 row_2 row_3 row_4
column_1     1     3     5     7
column_2     2     4     6     8

Inverse

A
      column_1 column_2
row_1        1        2
row_2        3        4
row_3        5        6
row_4        7        8
solve(A[1:2,1:2])
         row_1 row_2
column_1  -2.0   1.0
column_2   1.5  -0.5

Diagonal

vector_diagonal <- c(1,2,3,4)

diag(vector_diagonal)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    2    0    0
[3,]    0    0    3    0
[4,]    0    0    0    4

In this case

diag(4)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1

The diag(n) command generates a square diagonal matrix of ones whit dimension n

The %*% operator performs products of matrices, as we will see in the practical examples

Note

Use the help to determine what the following linear algebra functions do

det(), svd(), eigen(), qr(), chol()

Data Frames

The data.frame will be one of the most commonly used objects in R for data analysis.

Data frames are tables or rectangular arrays, we can treat them as if they were an Excel sheet in which we organise the data we want to analyse.

As in Excel, we can do calculations with the tables, create new columns (variables), perform statistical analysis, etc.

Inspecting the data

In general, we usually inspect the data in a table to familiarise ourselves with it and see what kind of analysis we can do with it.

Let’s use one of the datasets that already comes with the basic R installation, mtcars.

We can view a user friendly version of the table in RStudio

mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

There are several ways to take a quick look at the data without having to view the whole table.

For example we can see the first 6 rows

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Or the last 6

tail(mtcars)
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

If instead of 6, we want to see a certain number of rows, we simply tell the head and tail functions to do so

head(mtcars, n = 3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
tail(mtcars, n = 2)
               mpg cyl disp  hp drat   wt qsec vs am gear carb
Maserati Bora 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
Volvo 142E    21.4   4  121 109 4.11 2.78 18.6  1  1    4    2

Sometimes we use very large tables that store many variables, in this case before displaying even a fraction of the table, it is useful to inspect the column names of the data frame

colnames(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"

In general, in order to determine how many variables and how many observations we have, we use the dim function, which gives the number of columns and rows in the table

dim(mtcars)
[1] 32 11

We can see the number of rows and columns separately with nrow and ncol.

ncol(mtcars)
[1] 11
nrow(mtcars)
[1] 32

Another way is to use the str function. Although it is not intuitive and even intimidating at first glance, it is not that complicated and provides useful information.

str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

A version that provides statistical information is summary.

summary(mtcars)
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

Now that we have our data in a data frame, the next step is to access subsets of data in the table (a column or group of rows that meet a condition), manipulate it and create new variables.

We can access the data in the data frame in the same way as we did with vectors, data frames are a kind of set of vectors.

Remember that we could access elements of vectors by position or name

mtcars[1,2]
[1] 6
mtcars[1,4]
[1] 110

In the first case we tell R that we want it to show us the data of the first row and the second column, in the second case the data of the first row and the fourth column.

In the case that we want to see a specific column or row, we leave one of the dimensions blank.

If we want to access the information of the second row

mtcars[2, ]
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

If we want to access the information in the fourth column

mtcars[ , 4]
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
[20]  65  97 150 150 245 175  66  91 113 264 175 335 109

We can also get the information per column using the column name

mtcars[ , "cyl"]
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

Likewise, we can inspect a data frame column by typing the data frame name followed by a $ sign and the column name.

The advantage of this method is that the column names are autocompleted

mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

With position, like vectors, we can access sets of rows and columns (slices).

For example, if we want to keep only the first 3 rows and the fourth and fifth columns, we can access the following position sets

mtcars[1:3, 4:5]
               hp drat
Mazda RX4     110 3.90
Mazda RX4 Wag 110 3.90
Datsun 710     93 3.85

We can build a new table from this sub-table, assigning a name to the latter

mtcars_peque <- mtcars[1:3, 4:5]

mtcars_peque
               hp drat
Mazda RX4     110 3.90
Mazda RX4 Wag 110 3.90
Datsun 710     93 3.85

Sub-table with first 10 rows only

mtcars[1:10, ]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

Sub-table with last 2 columns only

mtcars[, 4:5]
                     hp drat
Mazda RX4           110 3.90
Mazda RX4 Wag       110 3.90
Datsun 710           93 3.85
Hornet 4 Drive      110 3.08
Hornet Sportabout   175 3.15
Valiant             105 2.76
Duster 360          245 3.21
Merc 240D            62 3.69
Merc 230             95 3.92
Merc 280            123 3.92
Merc 280C           123 3.92
Merc 450SE          180 3.07
Merc 450SL          180 3.07
Merc 450SLC         180 3.07
Cadillac Fleetwood  205 2.93
Lincoln Continental 215 3.00
Chrysler Imperial   230 3.23
Fiat 128             66 4.08
Honda Civic          52 4.93
Toyota Corolla       65 4.22
Toyota Corona        97 3.70
Dodge Challenger    150 2.76
AMC Javelin         150 3.15
Camaro Z28          245 3.73
Pontiac Firebird    175 3.08
Fiat X1-9            66 4.08
Porsche 914-2        91 4.43
Lotus Europa        113 3.77
Ford Pantera L      264 4.22
Ferrari Dino        175 3.62
Maserati Bora       335 3.54
Volvo 142E          109 4.11

We can select data with criteria based on meeting a condition

mtcars[mtcars$cyl > 4 , ]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
mtcars[mtcars$cyl == 6 , ]
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

We can complicate it

mtcars[mtcars$cyl == 6 & mtcars$mpg > 18, ]
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

We have asked R to show us the data in the table corresponding to cars with 6 cylinders and miles per gallon greater that 18.

Both conditions are fulfilled at the same time.

In case you need that only one of the condition holds:

mtcars[mtcars$cyl == 6 | mtcars$mpg > 18, ]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Read and write data in R

As mentioned earlier, instead of generating the data we will be analysing, we typically obtain the data from external sources. In our case, the input-output tables are generated by the national statistical institutes.

These tables are shared in various formats, such as Excel or flat files (mainly csv). It is advisable to prioritize the use of flat files.

Similarly, we will also want to save the data generated in R in a format that can be read in other platforms, such as Excel.

Reading data in R

In this presentation we will focus on two formats: Excel and csv.

csv files

The R base function for reading csv files is read.csv().

We need to provide the function with the file’s location path.

mtcars_csv <- read.csv(file = "mtcars.csv")

head(mtcars_csv)

Inspecting the function, we can find additional arguments. Let us explore them using the tab button.

In practice, we typically use the arguments header, sep and dec

xlsx files

Base R does not have a function to read Excel files, so we must rely on a library. The most common choice is the readxl package.

Once we load the library, we will use the function read_excel()

library(readxl)

mtcars_excel <- read_excel("mtcars.xlsx")

head(mtcars_excel)

In the practice section, we will use some of the other arguments of the read_excel() function. It’s worth mentioning the skip argument, which is used if we don’t wish to read the first n lines of an Excel file.

Writing data in R

Unsurprisingly, we write data from R using the functions write.csv() and write_xlsx()

We must provide both functions with the object (data frame) we wish to save and the file path.

write.csv(x = airquality, file = "my_airquality.csv")

library(writexl)

write_xlsx(x = airquality, path = "my_airquality.xlsx")

Input Output Analysis

We will use as an example a input output table of Belgium with 10 sectors

Read the data

We have our input-output table in Excel format. In order to manipulate this data in R, we must have an R object in memory.

When we read data from an Excel or csv file, this object is loaded as a data frame.

BE_Z <- read_xlsb(path = "figaroIO/FIGARO_ICIOI_23ed_2021.xlsb",
                  sheet = "BE",
                  range = "C12:L22")

BE_Ytemp <- read_xlsb(path = "figaroIO/FIGARO_ICIOI_23ed_2021.xlsb",
                      sheet = "BE",
                      range = "M12:R22")

BE_X <- read_xlsb(path = "figaroIO/FIGARO_ICIOI_23ed_2021.xlsb",
                  sheet = "BE",
                  range = "S12:S22")

Technical coefficients matrix

Let’s revisit the representation of interindustry production relations using the interindustry sales matrix \(Z\) and the final demand vector \(y\) as a system of linear equations in matrix form:

\[x=Zi+y\] In this equation, \(Z\) represents the interindustry sales matrix, \(i\) is a column vector of ones, and \(y\) is a vector representing the final demand by industry.

We can define the technical coefficient \(a_{ij}\) as:

\[a_{ij}=\frac{z_{ij}}{x_j}\]

Here, \(z_{ij}\) represents the value of interindustry sales between industries \(i\) and \(j\), and \(x_j\) represents the value of production in industry \(j\).

Using the technical coefficient definition, we can rewrite \(z_{ij}\) as \(z_{ij}=a_{ij}x_j\), which can be represented in matrix notation as:

\[x=Ax+y\]

Where \(A\) represents the technical coefficient matrix.

Once we have our input-output data loaded in R, the next step is to construct our \(Z\) and \(A\) matrices. While our data may initially be loaded as data frames in R, it’s important to note that we need to convert them to matrices in order to perform matrix algebra operations. Therefore, our first task is to convert the data frames to matrices.

Z <- as.matrix(BE_Z)

We add names to the matrix rows

rownames(Z) <- colnames(Z)

Next, we repeat this process with the output vector. Specifically, we need to extract the output column from the BE_X data frame and transform it into a vector, as opposed to a one-dimensional matrix. It’s important to note the distinction, as using a vector in this context is more appropriate for the calculations we will be performing.

x <- as.vector(BE_X$X)

We now have all the necessary elements to compute the technical coefficients matrix.

A = Z %*% diag(1 / x) 

colnames(A) <- rownames(A)

Leontief inverse

From the equation \(x=Ax+y\), we can derive an expression that informs us of the output required to meet a given exogenous level of final demand:

\[x=(I-A)^{-1}y\]

where \((I-A)^{-1}\) is known as the Leontief inverse or the total requirements matrix.

We compute the Leontief inverse in R:

#Number of sectors
n <- length(x) 

#Identy matrix
I <- diag(n) 

#Leontief inverse
B <- solve(I - A) 

Multipliers

Applying the change operator \(\Delta\) to both sides of the expression \(x=(I-A)^{-1}y\) we obtain

\[\Delta x=(I-A)^{-1}\Delta y\] Next, let’s consider the relationship between a unit change in the demand vector \(\Delta y\) and the resulting change in production \(\Delta x\), which can be expressed as \((I-A)^{-1}\). Let’s define \(B=(I-A)^{-1}\).

The matrix \(B\) enables us to calculate the increase in the production of each sector in response to changes in exogenous factors, such as variations in final demand

#Number of sectors
n <- length(x) 

#Identy matrix
I <- diag(n) 

#Leontief matrix
B <- solve(I - A) 

Interpretation

For instance, let’s compare the interpretation of the elements of \(A\) and \(B\), such as the elements in the first and second rows and the first column of matrix \(A\)

A[1, 1]
[1] 0.06982499
A[2, 1]
[1] 0.3041312

The element \(A_{1,1}\) indicates that in order to produce 1 unit of output in the agriculture, forestry, and fishing industry, 0.069825 units of the agriculture, forestry, and fishing industry are required.

Similarly, the element \(A_{2,1}\) tells us that to produce 1 unit of output in the agriculture, forestry, and fishing industry, 0.3041312 units of the mining and quarrying industry are needed.

By summing the columns of matrix \(A\), we can obtain the direct effect of a unitary change in the demand for the product in a specific industry

direct <- colSums(A, na.rm = TRUE)

direct
        A       B.E         F       G.I         J         K         L       M.N 
0.5636941 0.3477060 0.6233963 0.3335816 0.3543786 0.3378214 0.2422528 0.3520851 
      O.Q       R.U 
0.2186530 0.4343436 
direct[which.max(direct)]
        F 
0.6233963 
sort(direct, decreasing = TRUE)
        F         A       R.U         J       M.N       B.E         K       G.I 
0.6233963 0.5636941 0.4343436 0.3543786 0.3520851 0.3477060 0.3378214 0.3335816 
        L       O.Q 
0.2422528 0.2186530 

These numbers do not account for the feedback loops generated by the interdependence between sectors.

The matrix \(B\) incorporates the indirect effect stemming from the fact that industries supplying the agriculture sector, for instance, also require inputs from other industries.

By summing the columns of matrix \(B\), we obtain the direct + indirect effects of a unitary change in the demand for the product in a specific industry.

For example, let’s consider the elements in the first and second rows and the first column of matrix \(B\)

B[1, 1]
[1] 1.08592
B[2, 1]
[1] 0.4232464

The element \(B_{1,1}\) indicates that in order to meet a 1 unit increase in demand for agriculture, forestry, and fishing industry products, 1.0859204 units of agriculture, forestry, and fishing industry input are required. This includes the additional unit itself, as well as 0.0859204 units needed as input for other sectors.

Similarly, the element \(B_{2,1}\) shows that to satisfy a 1 unit increase in demand for agriculture, forestry, and fishing industry products, 0.4232464 units of mining and quarrying industry input are necessary.

By calculating the column sums of matrix \(B\), we can determine the multiplier for industry \(j\), which represents the total value of production in all industries required for all stages of production in order to produce one unit of product \(j\) for final use in the economy

total <- colSums(B, na.rm = TRUE)

total[which.max(total)]
       F 
2.161749 
sort(total, decreasing = TRUE)
       F        A      R.U      B.E        J      M.N        K      G.I 
2.161749 1.902851 1.686612 1.551760 1.547939 1.543445 1.516327 1.512203 
       L      O.Q 
1.387931 1.334102 

Closed model

So far, the model we have been dealing with treats the household sector as completely exogenous. This approach does not consider the feedback effects of this sector in response to an initial increase in demand.

When the production of a given sector increases, it can stimulate the demand for labor, leading to higher income earnings and potentially expanding consumer consumption. This generates additional rounds of spending that are not accounted for in the previous section’s analysis.

To incorporate the household sector into the \(x=(I-A)^{-1}y\) model, we expand the \(A\) matrix by including the household sector’s final demand column and the labor input row in the following manner

\[\begin{bmatrix} \mathbf{x} \\ x_{n+1} \end{bmatrix} = \begin{bmatrix} \mathbf{A} & \mathbf{h_c} \\ \mathbf{h_r} & 0 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ x_{n+1} \end{bmatrix} + \begin{bmatrix} \mathbf{y} \\ y_{n+1} \end{bmatrix}\]

where \(\mathbf{h_c}\) represents the consumption coefficients (household consumption by sector divided by the total labor remuneration) and \(\mathbf{h_r}\) represents the labor earnings by sector divided by the sector’s output.

We will now proceed to load these additional elements into our R environment

BE_C <- read_xlsb(path = "figaroIO/FIGARO_ICIOI_23ed_2021.xlsb",
                  sheet = "BE",
                  range = "N12:N22")

c <- as.matrix(BE_C$P3_S14) 

BE_R <- read_xlsb(path = "figaroIO/FIGARO_ICIOI_23ed_2021.xlsb",
                  sheet = "BE",
                  range = "C27:L28")

We rearrange the labor earnings vector

rTemp <- t(BE_R) 
r <- as.matrix(rTemp)

Compute the consumption and earnings coefficients

hc <- c / sum(r) # Consumption coefficient

hr <- r / x # Earnings coefficient

hr <- t(hr) # Transpose

Build the \(\begin{bmatrix} \mathbf{A} & \mathbf{h_c} \\ \mathbf{h_r} & 0 \end{bmatrix}\) matrix

AF <- matrix(NA, ncol = n + 1, nrow = n + 1)  # A Matrix for the closed model
AF <- rbind(cbind(A, hc), cbind(hr, 0)) # Technical coefficients matrix

Compute the Leontief inverse matrix for the closed model

IF <- diag(n + 1) # Identity (n+1)x(n+1)

BF <- solve(IF - AF) # Leontief inverse matrix

Obtain the multipliers

totalF <- colSums(BF[,1:n], na.rm = TRUE)

total[which.max(total)]
       F 
2.161749 
sort(total, decreasing = TRUE)
       F        A      R.U      B.E        J      M.N        K      G.I 
2.161749 1.902851 1.686612 1.551760 1.547939 1.543445 1.516327 1.512203 
       L      O.Q 
1.387931 1.334102