It is often convenient to think of biological datasets as ‘matrices’: they hold the values of the same set of variables (e.g., expression levels of different genes) across a set of biological samples (e.g., cell lines under different treatment protocols). Learning to perform mathematical operations on matrices (the subject of ‘linear algebra’) is therefore worthwhile. A matrix is a grid of numbers, each of which can be referred to by its row and column indices. Let’s define a matrix A of dimension 3 x 2 (it has 3 rows and 2 columns).
\[A = \begin{bmatrix} A_{11}&A_{12} \\ A_{21}&A_{22} \\ A_{31}&A_{32} \\ \end{bmatrix}\]
Two matrices can be multiplied only if the number of columns of the first equals the number of rows of the second. The result is a new matrix with as many rows as the first matrix and as many columns as the second.
\[\text{If } AB = C, \quad C_{ij} = \sum_{k = 1}^{p} A_{ik} B_{kj}\]
If A is an (n x p) matrix and B is a (p x m) matrix, then AB will be an (n x m) matrix.
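As a small illustration of the formula, take the case p = 2 (an assumption made here only to keep the expansion short). Each entry of C is then a sum of two products, pairing row i of A with column j of B:

\[C_{ij} = \sum_{k = 1}^{2} A_{ik} B_{kj} = A_{i1} B_{1j} + A_{i2} B_{2j}\]

Every entry of C therefore combines one whole row of A with one whole column of B, which is why the number of columns of A has to match the number of rows of B.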
Let’s make a matrix in R.
Matrix A with dimensions (3 x 2). Note that R fills a matrix column by column by default.
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
A
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
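Since each number in a matrix can be referred to by its row and column indices, a short sketch of how that looks in R may help. It rebuilds the same A so it runs on its own; the name `A_byrow` is introduced here purely for illustration.

```r
# Rebuild the matrix from the text so this block is self-contained.
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

dim(A)      # number of rows, then number of columns
A[2, 1]     # element in row 2, column 1
A[3, ]      # the whole third row
A[, 2]      # the whole second column

# byrow = TRUE fills the matrix row by row instead of the
# default column-by-column order.
A_byrow <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
A_byrow[2, 1]   # now 3 rather than 2
```

Leaving one index empty, as in `A[3, ]`, selects everything along that dimension.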
Let’s make another matrix B with dimensions (2 x 3).
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
B
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
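To multiply these two matrices we use the matrix multiplication operator “%*%”. (A plain “*” multiplies element by element instead, and requires both matrices to have identical dimensions.)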
C <- A %*% B
C
##      [,1] [,2] [,3]
## [1,]    9   19   29
## [2,]   12   26   40
## [3,]   15   33   51
As you can see, the resulting matrix C is a (3 x 3) matrix, with the 3 rows of A and the 3 columns of B.
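To connect this result back to the summation formula, here is a minimal sketch that computes each entry of the product with explicit loops and compares it with `%*%`. The name `C_manual` is introduced here for illustration only; in practice `%*%` is the tool to use.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

n <- nrow(A)  # rows of the result
p <- ncol(A)  # shared inner dimension
m <- ncol(B)  # columns of the result
stopifnot(nrow(B) == p)  # multiplication is only defined if these match

# C[i, j] is the sum over k of A[i, k] * B[k, j].
C_manual <- matrix(0, nrow = n, ncol = m)
for (i in seq_len(n)) {
  for (j in seq_len(m)) {
    C_manual[i, j] <- sum(A[i, ] * B[, j])
  }
}

all.equal(C_manual, A %*% B)  # TRUE

# The order matters: B %*% A is (2 x 3) times (3 x 2), giving a (2 x 2) matrix.
dim(B %*% A)
```

For example, the top-left entry is 1 * 1 + 4 * 2 = 9, which matches the output above.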
As we already know, a set of values sampled from a population can be modeled as a ‘random variable’. If multiple measurements are obtained from the same objects, the vectors holding each measurement’s values can be placed side by side and treated as a matrix.
A matrix with just one row or one column is usually referred to as a vector.
Several measurements on the same samples, arranged as a matrix, is usually the form taken by the datasets we will be working with ^_^
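As a sketch of how measurement vectors become a matrix in R, the block below binds two vectors as columns. The gene names, sample names, and numbers are made up for illustration and do not come from any real dataset.

```r
# Hypothetical expression levels of two genes in the same four samples
# (made-up values, for illustration only).
gene_a <- c(2.1, 3.4, 1.8, 2.9)
gene_b <- c(0.5, 0.7, 0.4, 0.9)

# cbind() places the vectors side by side as columns: samples in rows,
# measured variables in columns.
expr <- cbind(gene_a, gene_b)
rownames(expr) <- paste0("sample", 1:4)

dim(expr)                    # 4 rows (samples), 2 columns (genes)
expr["sample2", "gene_b"]    # look up a value by row and column name

# Extracting a single column gives back a plain vector by default;
# drop = FALSE keeps it as a one-column matrix.
is.matrix(expr[, "gene_a"])                 # FALSE
is.matrix(expr[, "gene_a", drop = FALSE])   # TRUE
```

Putting samples in rows and variables in columns is one common convention; some packages expect the transpose, which `t(expr)` provides.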