MVA Class 4 — Covariance Matrices

Covariance Matrices, Correlation, and Mahalanobis Distance

Today’s topic is the covariance matrix and related concepts, including correlation and Mahalanobis distance, as stated in the official syllabus (00:03:52).
The discussion will cover the application of these quantities and serve as a review of matrix algebra, as computing them involves translating ordinary formulas into matrix expressions (00:04:21).
The goal is to learn how to compute these quantities, which will help in understanding how to translate formulas involving sums into matrix expressions (00:04:24).
The class will likely review some conceptual and geometric aspects of matrix algebra, but the focus will be on computing covariance and correlation matrices (00:04:48).

Correlation and Regression

Correlation is a measure of bivariate linear dependence, and the class will cover the formulas for correlation and covariance, with R code provided at the end of the class (00:05:47).
The discussion will also touch on the relationship between regression lines and the data cloud, including the concept of best-fitting lines (00:06:18).
The class will use R code to demonstrate the concepts, but the instructor is having trouble navigating the interface due to changes in the iPhone and iPad Zoom layout (00:05:29).

Major Axis Regression

Major axis regression is a type of regression that is not commonly discussed but is often used in ecology; it differs from ordinary regression in where the error term is placed in the equation (00:07:18).
In major axis regression, the errors are measured perpendicular to the major axis line, which makes it a Euclidean-distance approach (00:08:22).
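As a rough illustration of the perpendicular-error idea, the sketch below takes the major axis to be the direction of the first eigenvector of the covariance matrix, which is the standard way to get an orthogonal fit. The data are made up, and the variable names are hypothetical.

```r
# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

S <- cov(cbind(x, y))          # 2 x 2 covariance matrix
e <- eigen(S)                  # eigenvalues come back in decreasing order
v <- e$vectors[, 1]            # direction of the major axis

slope_ma  <- v[2] / v[1]       # major axis slope (perpendicular errors)
slope_ols <- S[1, 2] / S[1, 1] # ordinary slope of y on x (vertical errors)

print(c(major_axis = slope_ma, ols = slope_ols))
```

For positively correlated data the major axis slope falls between the slope of y on x and the reciprocal of the slope of x on y.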

Simple Linear Regression

In simple linear regression, the equation y = alpha + beta x + error describes the relationship between two variables, where beta equals the covariance of x and y divided by the variance of x (00:08:39).
The equation can also be written as y = mu y + beta (x - mu x), where mu y and mu x are the means of y and x, respectively (00:09:31).
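A minimal sketch of the slope formula, using made-up data in centimeters and grams. It computes beta as cov(x, y) / var(x), recovers alpha from the means, and compares the result with R's `lm()`.

```r
# Hypothetical data: x in centimeters, y in grams
x <- c(10, 12, 15, 18, 20)
y <- c(31, 35, 44, 52, 58)

beta  <- cov(x, y) / var(x)        # slope, in grams per centimeter
alpha <- mean(y) - beta * mean(x)  # intercept, in grams

# Centered form: y-hat = mean(y) + beta * (x - mean(x))
yhat_centered <- mean(y) + beta * (x - mean(x))

fit <- lm(y ~ x)                   # least squares fit for comparison
print(c(alpha = alpha, beta = beta))
print(coef(fit))
```

The two ways of writing the line give the same fitted values; only the bookkeeping for the intercept differs.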
When working with units in the equation, the units of alpha must match the units of y, and the units of the error term must also match the units of y (00:10:56).
The "check the dimensions" rule from matrix algebra has a counterpart for units: when adding things, their units must be the same (00:11:29).
The units of beta are the units of y divided by the units of x, and the units of the error term are the same as the units of y (00:11:15).
Using concrete units, such as centimeters and grams, can help make the equation more understandable and easier to work with (00:10:29).
In many social science areas, units are not often considered, but thinking about units can help determine whether a model makes sense or not, as adding different units (e.g., grams and centimeters) is meaningless (00:11:42).
For the slope beta, the units have to cancel so that beta times x comes out in the units of y: if y is in grams and x is in centimeters, beta must be in grams per centimeter (00:12:26).

Correlation Coefficients and Units

A correlation coefficient has no units, or it can be thought of as being in standard deviation units; in the expression beta = r times (sd of y / sd of x), r is the unitless part (00:12:58).
Mahalanobis distance is a way of making a dimensionless quantity, which will be discussed later in the class (00:13:20).
When performing principal component analysis (PCA), dimensions must be managed, and different models manage them in different ways, but it is essential to think about them (00:13:34).
The standard deviation of y has units of grams, so multiplying beta by the ratio sd of x / sd of y cancels the units and leaves the unitless r (00:13:59).

Matrix Representation of Regression

Regression can be written in matrix form, and the expected value of y given X can be expressed using a formula that considers the dimensions of the variables (00:14:47).
The notation X represents a vector that is p by one, where p is the number of variables or covariates, and y is a scalar (one by one) (00:16:11).
The formula for the expected value of y given X involves the p by 1 vector X and the scalar y; stacking the two gives a (p + 1) by 1 vector (00:16:39).

Vectors in R

In R, a vector can be created by stacking numbers, with the “c” function used to combine the numbers, and the vector can be extended by adding more numbers to the end, separated by commas (00:16:59).
The expected value of a vector X is written mu_X, a vector containing the means of all the variables in X: mu_x1, mu_x2, …, mu_xp (00:17:43).
The vector X written with an arrow on top is a single p by 1 vector described by population means; it is not the data set, which would have multiple rows (00:18:02).
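A small sketch of these ideas in R, with made-up numbers: `c()` builds and extends a vector, and `colMeans()` gives the sample counterpart of the mean vector for a data matrix with several rows.

```r
# Build a vector with c(), then extend it by adding numbers at the end
x <- c(4, 8, 15)
x <- c(x, 16, 23)
print(length(x))

# A hypothetical data set: 4 rows (cases), p = 3 columns (variables)
X <- cbind(x1 = c(1, 2, 3, 4),
           x2 = c(10, 20, 30, 40),
           x3 = c(5, 5, 6, 8))

xbar <- colMeans(X)   # one mean per variable: the sample analogue of mu_X
print(dim(X))         # the data set has several rows
print(xbar)           # the mean vector has p entries
```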

Covariance Matrix

The covariance matrix is represented by the Greek letter Sigma (Σ), and it is a p by p matrix that contains the variances and covariances of the variables in X (00:18:59).
The notation for the covariance matrix is often written with a big Sigma and a slash through it to distinguish it from the sum sign, and it is commonly used to represent the population covariance matrix (00:20:21).
The data covariance matrix is often represented by a big S, which distinguishes it from the population covariance matrix (00:21:04).
Sigma specifically denotes the population covariance matrix of the variables in X (00:19:22).
The covariance matrix Sigma has variances on the diagonal and covariances on the off-diagonal (00:21:58).
The entry Sigma 1,1 is the variance of x1, which is a measure of the spread or dispersion of that variable (00:22:17).
The covariance matrix is a summary of the data, performing data reduction, and it does not contain information about the identity of individual cases (00:22:57).
The covariance matrix and the mean vector are used in structural equation modeling (SEM) to fit models, but there is still more information in the data set that is not used (00:23:35).
Linear regression only uses the means, standard deviations, and correlation coefficients, which can be computed from the covariance matrix and the mean vector (00:24:16).
The covariance matrix is used in various statistical techniques, including factor analysis, confirmatory factor analysis, and longitudinal models, which are often used in structural equation modeling (00:24:50).
The variances on the diagonal represent the spread of each variable, and the covariances on the off-diagonal represent the relationships between pairs of variables (00:25:39).
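A short sketch of the layout described above, on made-up data. R's `cov()` returns the data covariance matrix S; note that it divides by n - 1 rather than n.

```r
# Hypothetical data: 4 cases, 3 variables
X <- cbind(x1 = c(2, 4, 6, 8),
           x2 = c(1, 3, 2, 6),
           x3 = c(9, 7, 8, 4))

S <- cov(X)            # p by p data covariance matrix (n - 1 divisor)
print(S)
print(diag(S))         # variances sit on the diagonal
print(isSymmetric(S))  # cov(x1, x2) equals cov(x2, x1)
```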

Covariance and its Interpretation

Covariance is a measure of linear dependence, essentially describing how much data points deviate from the major axis, and it is symmetric (00:26:29).
There is a way to generalize the regression formula, allowing for the calculation of regressions using matrix formulas, although this is more for theoretical understanding rather than practical application (00:26:36).
Covariance is not typically used for direct computation in computers, but rather for understanding the underlying theory (00:27:12).
The population covariance is a particular expected value that measures the average linear relationship between two variables (00:27:34).
The formula for covariance involves multiplying the deviations of two variables together and taking the average, which is similar to the standard data formula for variance (00:27:52).
Covariance is a generalized form of variance that measures the average linear relationship between two variables, rather than the scatter around one mean (00:28:36).
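The "multiply the deviations and average" recipe can be sketched directly, here on made-up data. The version below divides by n; R's built-in `cov()` and `var()` divide by n - 1, so the two differ by a factor of n / (n - 1).

```r
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)
n <- length(x)

# Average product of deviations (divide by n)
cov_n <- mean((x - mean(x)) * (y - mean(y)))

# Variance is the same formula with y replaced by x
var_n <- mean((x - mean(x))^2)

print(c(cov_n = cov_n, var_n = var_n))
print(c(cov_builtin = cov(x, y), var_builtin = var(x)))
```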
Covariance measures second-order dependence, which is linear dependence, whereas means are first-order quantities (00:28:52).
Higher-order quantities include skewness (third-order) and kurtosis (fourth-order), with no specific names for quantities beyond these (00:29:11).

Matrix Expressions and Sample Mean

The goal is to understand how to write these concepts in matrix expressions, which will be explored further (00:29:55).
Matrix expressions can be used to represent sums and other calculations, such as the sum of x values multiplied by their respective weights (00:30:28).
The concept of using matrices to write formulas is introduced, with the example of the sample mean being represented as a vector expression (00:31:02).
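As a sketch of the sum-to-matrix translation, the sample mean can be written as (1/n) times a row of ones times the data vector, and a weighted sum as a row of weights times the data vector. The numbers and weights below are made up.

```r
x <- c(3, 5, 7, 9)
n <- length(x)

ones <- rep(1, n)
xbar <- as.numeric(t(ones) %*% x) / n   # (1/n) 1'x: the sample mean

w <- c(0.1, 0.2, 0.3, 0.4)              # hypothetical weights
wsum <- as.numeric(t(w) %*% x)          # w'x: sum of x values times weights

print(c(xbar = xbar, wsum = wsum))
```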

Contrasts and their Applications

Matrices can be used to perform various general operations, such as creating contrasts, which are useful for making specific comparisons between means (00:32:18).
Contrasts are often tedious to calculate, but can be useful in certain situations, such as when designing experiments with specific comparisons in mind (00:32:43).
An example of a contrast is given, where the means of two groups are compared, and a matrix formula is used to compute the contrast (00:34:07).
The formula for the contrast is represented as M transpose times a vector of coefficients, where M is a vector of means (00:34:22).
The coefficients in the vector are used to weight the means and compute the contrast, with the example given being (mu1 + mu2)/2 - (mu3 + mu4)/2 (00:34:32).
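A minimal sketch of that contrast as a vector product, with made-up group means. The coefficient vector (1/2, 1/2, -1/2, -1/2) compares the average of the first two means with the average of the last two.

```r
m    <- c(10, 14, 7, 9)              # hypothetical means of four groups
cvec <- c(1/2, 1/2, -1/2, -1/2)      # contrast coefficients (they sum to zero)

contrast <- as.numeric(t(m) %*% cvec)
print(contrast)                      # (10 + 14)/2 - (7 + 9)/2
```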
A discussion about the use of contrasts in item response theory is mentioned, with a comment about not randomizing distractors when writing items (00:34:56).
The importance of designing studies with contrasts in mind is emphasized, as it can make the analysis more meaningful and interpretable (00:33:08).

Matrices in Statistical Analysis

Matrices are used to simplify formulas and expressions, making it easier to work with complex data (00:35:55).
In statistical analysis, matrices can be used to define contrast matrices or contrast vectors, which are necessary for computing certain statistics (00:36:35).
The ability to compute sample means is not the only benefit of using matrices; they can also be used to compute more complex statistics, such as the covariance matrix (00:36:49).

Computing the Covariance Matrix

The core of the covariance matrix is the sums of squares and cross products matrix, which can be computed as D^T D, where D = (x, y) is the matrix whose columns are the data vectors x and y (00:37:31).
To form D, the vectors x and y are combined into a single matrix using the cbind function, which binds the two vectors together as columns (00:38:00).
The transpose of the resulting matrix is then multiplied by the matrix itself to compute the sums of squares and cross products (00:38:50).
The formula for computing the covariance matrix using matrices is the same regardless of the size of the matrix, making it easier to work with large datasets (00:39:42).
The final step in computing the covariance matrix involves fixing up the result to include a factor of 1/n, which is a simple step (00:39:52).
The sums of squares and cross products can be calculated for arbitrarily sized matrices and many data vectors, providing a way to work with matrix algebra (00:39:58).
The order of calculating the product of two vectors does not matter, as X transpose Y and Y transpose X are exactly the same due to the summing of their products (00:40:47).
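A short sketch of the sums of squares and cross products computation on made-up vectors. It binds x and y into columns with `cbind()`, multiplies the transpose by the matrix, and checks that x'y and y'x agree.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

D <- cbind(x, y)          # n by 2 data matrix
SSCP <- t(D) %*% D        # 2 by 2 sums of squares and cross products
print(SSCP)

# The cross product does not depend on the order of the two vectors
xy <- as.numeric(t(x) %*% y)
yx <- as.numeric(t(y) %*% x)
print(c(xy = xy, yx = yx))
```

The same line, `t(D) %*% D`, works unchanged when D has more columns; `crossprod(D)` is a built-in shortcut for it.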
Matrix algebra is beneficial as it saves the effort of thinking about the arithmetic, rather than the actual computational effort, allowing for easier handling of complex calculations (00:41:51).
The benefit of matrix algebra is that it allows for the use of computer languages and mathematical expressions that do not require thinking about the underlying arithmetic (00:42:19).

Centering a Data Vector

The sum of squares and cross products is not centered, but there is a way to write a matrix expression that centers a data vector using matrix algebra (00:42:58).
Centering a data vector involves subtracting the mean from each value, and this can be achieved using a specific matrix expression, which will be explored in the homework assignment (00:43:41).
A matrix Q can be used to center a data vector, and this will be further explored in the homework assignment (00:44:16).
Multiplying a vector by the matrix Q results in a centered vector, where Q = I - (1/n) 1 1^T, with I being the n by n identity matrix, 1 the n by 1 vector of ones, and n the number of elements in the vector (00:44:57).
The matrix Q is known as a centering matrix and has interesting properties, such as being idempotent: Q squared equals Q, a property it shares with the identity matrix (00:46:59).
The centering matrix Q can be used to center a vector, and once a vector is centered, there is no more centering to be done, as it is already centered (00:48:55).
The property that Q squared equals Q is demonstrated by multiplying out (FOILing) the product of I - (1/n) 1 1^T with itself; because 1^T 1 = n, most terms drop out, leaving only I - (1/n) 1 1^T (00:48:10).
The centering matrix Q is an important concept in linear models and is used in various applications, including statistics (00:49:13).
The example provided uses a 3x3 matrix Q to center a vector X, resulting in a new vector with centered values (00:46:32).
QQ equals Q because Q centers a vector, and if a vector is already centered, it cannot be centered again (00:49:43).
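A minimal sketch of the centering matrix for n = 3, using made-up numbers. It builds Q = I - (1/n) 1 1', centers a vector, checks that QQ = Q, and uses Q to center the cross products (divided by n - 1 here so the result matches R's `cov()`).

```r
x <- c(2, 4, 9)
y <- c(1, 5, 3)
n <- length(x)

ones <- matrix(1, n, 1)                    # n by 1 vector of ones
Q <- diag(n) - (ones %*% t(ones)) / n      # centering matrix

print(Q %*% x)                             # x minus its mean: -3, -1, 4
print(all.equal(Q %*% Q, Q))               # centering twice changes nothing

D <- cbind(x, y)
S <- t(D) %*% Q %*% D / (n - 1)            # centered cross products
print(S)
```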

Converting Covariance to Correlation Matrix

To convert a covariance matrix to a correlation matrix, each covariance is divided by the product of the corresponding standard deviations (00:50:28).
Element by element, the correlation is r_xy = covariance of x and y / (standard deviation of x * standard deviation of y) (00:50:23).
The diag operator in R is used to extract the diagonal elements of a matrix or create a diagonal matrix from a vector (00:50:45).
Applying the diag operator twice, first to the original matrix to extract its diagonal and then to the result, gives a diagonal matrix that holds only the diagonal elements (00:51:19).
To calculate the correlation matrix, the variances on the diagonal of the covariance matrix are extracted using the diag operator and raised to the negative one half power, which gives the reciprocals of the standard deviations (00:51:42).
The negative one half power of a diagonal matrix is equivalent to taking the reciprocal of the square root of the diagonal elements (00:52:49).
The correlation matrix is R = D^(-1/2) S D^(-1/2), where D is the diagonal matrix of variances: the covariance matrix is multiplied on the left by the inverse standard deviations and then on the right by the inverse standard deviations again (00:52:30).
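The conversion can be sketched as follows, on made-up data. The diagonal of S is pulled out with `diag()`, raised to the -1/2 power, and turned back into a diagonal matrix with a second `diag()` call; the built-in `cov2cor()` and `cor()` are used only as checks.

```r
X <- cbind(x1 = c(2, 4, 6, 8),
           x2 = c(1, 3, 2, 6),
           x3 = c(9, 7, 8, 4))
S <- cov(X)

# diag(S) extracts the variances; diag() of that vector builds a diagonal matrix
D_inv_sqrt <- diag(diag(S)^(-1/2))       # 1 / standard deviations on the diagonal

R <- D_inv_sqrt %*% S %*% D_inv_sqrt     # correlation matrix
print(R)
print(all.equal(unname(R), unname(cov2cor(S))))
```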

Interpretability of Covariance and Correlation

Covariance matrices can be written as a matrix expression, similar to correlation matrices, and are used in math due to their convenience, not because they have a direct interpretation (00:53:54).
A covariance has units that are the product of the units of the two variables being measured; a correlation coefficient is a covariance between standardized variables, which makes it more interpretable than a covariance (00:54:19).
Covariances are not typically interpretable and are considered “head scratchers,” whereas standard deviations and correlation coefficients have more meaningful interpretations (00:54:29).
Variances and covariances are used in calculations, but standard deviations and correlation coefficients are more commonly discussed due to their interpretability (00:54:57).
Normal distributions are often understood in terms of standard deviation units rather than variance units, as variances represent the average squared deviation (00:55:16).
Covariances are used throughout the semester because the math is convenient, not because they are typically interpretable, although there may be times when they are more interpretable than standard deviations (00:55:52).

Dummy Coding and Matrix Operations

Dummy coding is a method of coding discrete variables, and it is possible to use matrices to perform calculations with dummy-coded variables (00:56:20).
When working with dummy-coded variables, it is necessary to include all categories, although in regression analysis, a reference category is typically excluded (00:57:01).
Matrices can be chopped into pieces and recombined using operations like binding, which can be useful for performing calculations with dummy-coded variables (00:57:22).
When binding two matrices together, they must have the same number of rows (to bind side by side) or the same number of columns (to stack), but not necessarily both, as demonstrated with matrices A and B being combined to form matrix D (00:57:50).
To understand the properties of matrix A, we can calculate A transpose times A, which involves multiplying the transpose of A by A itself (00:58:21).
The result of A transpose times A is a diagonal matrix containing the frequencies of each column in A, which can also be obtained by summing down each column of A (01:00:36).
The rules of dummy coding state that if a value is 1 in one column, it must be 0 in the other columns, which is why the off-diagonal entries are zero (00:59:01).
There are alternative methods, such as probabilistic coding, which allow for different values, but this is not used in the current class (00:59:27).
The dimensions of A transpose times A are determined by the dimensions of A and its transpose, resulting in a 3x3 matrix in this case (00:59:44).
The resulting diagonal matrix has the sums of each column on the diagonal and zeros on the off-diagonals (01:01:36).
Similarly, B transpose times B also results in a diagonal matrix with the sums of each column on the diagonal and zeros on the off-diagonals (01:01:40).
To ensure the dimensions match, A transpose B or B transpose A can be used, where A is a 6 by 3 matrix and B is a 6 by 2 matrix, resulting in A transpose B being 3 by 2 and B transpose A being 2 by 3 (01:02:19).
A transpose B and B transpose A provide the same information, just flipped on its side, so only one of them needs to be calculated (01:03:00).
The result of the multiplication is a 3 by 2 matrix, which represents a crosstab or frequency table showing the relationship between two variables (01:03:42).
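A sketch of these products with made-up category codes for six cases: A is 6 by 3, B is 6 by 2, A'A is a diagonal matrix of frequencies, and A'B is the crosstab. The variable names `g` and `h` are hypothetical.

```r
g <- c(1, 1, 2, 3, 3, 3)        # hypothetical 3-category variable
h <- c(1, 2, 2, 1, 1, 2)        # hypothetical 2-category variable

A <- outer(g, 1:3, "==") * 1    # 6 by 3 indicator matrix (one 1 per row)
B <- outer(h, 1:2, "==") * 1    # 6 by 2 indicator matrix
D <- cbind(A, B)                # same number of rows, so they bind side by side

AtA <- t(A) %*% A               # 3 by 3 diagonal matrix of frequencies
AtB <- t(A) %*% B               # 3 by 2 crosstab of g by h

print(AtA)
print(AtB)
print(table(g, h))              # built-in crosstab for comparison
```

`t(B) %*% A` gives the same table transposed, so only one of the two needs to be computed.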

Matrix Algebra and Statistical Calculations

Matrix functions and expressions can be used to calculate various statistical measures, including means and covariance matrices (01:04:02).
Linear algebra, specifically matrix algebra, is a fundamental tool for solving statistical problems, and is used in calculations for many statistical methods (01:04:20).
The crosstab or frequency table can be used to show the relationship between two variables, with the rows and columns representing different groups or categories (01:05:39).
Many statistical measures and calculations, including covariance matrices, means, regression, and crosstabs, can be expressed and solved using matrix formulas (01:06:23).
Matrix algebra can also be used to solve systems of linear equations (01:06:26).
Expressing statistical problems in terms of matrices is a key step in solving them using linear algebra (01:06:35).

Coding Schemes and Block Matrices

Dummy coding and one-hot coding are methods used to represent categorical data in a numerical format, with one-hot coding being a more comprehensive version that includes all categories, whereas dummy coding is a reduced version that drops one of the columns (01:07:07).
In machine learning, the term “one-hot encoding” is often used to describe this process, and there are various other coding schemes, such as Helmert coding, that can be used depending on the specific application (01:07:49).
To perform these coding schemes, a matrix formula can be used, and it is possible to work with matrices of matrices, also known as block matrices, as long as the rules of matrix operations are followed (01:08:46).
Block matrices can be thought of as a way of chopping up a larger matrix into smaller sub-matrices, and this can be useful for organizing and analyzing data (01:09:27).
In R, block matrices can be created and manipulated using matrix subsetting and array subsetting, and the specific commands and functions used will depend on the structure and organization of the data (01:10:13).
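One way to see the difference between the coding schemes is through `model.matrix()`, sketched here on a made-up factor. This is only an illustration of the idea, not necessarily the approach used in the class code.

```r
f <- factor(c("a", "b", "c", "a", "c", "c"))   # hypothetical categorical variable

onehot <- model.matrix(~ f - 1)     # one column per category (one-hot)
dummy  <- model.matrix(~ f)[, -1]   # intercept dropped: "a" is the reference

print(onehot)
print(dummy)
print(contr.helmert(3))             # Helmert coding for 3 categories
```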

Data Structures in R

The key objects used in R for working with data include vectors, matrices, and data frames, and understanding how to work with these objects is essential for performing data analysis and manipulation tasks (01:10:54).
A data frame is a way to combine character data and numerical data together, allowing for the inclusion of row labels or column labels as character data, which can be useful for certain functions like plotting (01:11:16).
Data frames can be thought of as matrices that allow for cheating by including different types of data, and they are often used in functions needed for the class (01:12:08).
Tibbles are similar to data frames but enforce additional rules, and many packages are not written with tibbles in mind, so it may be necessary to convert tibbles to data frames (01:12:18).
When loading data, it may come in as a tibble, and it is often necessary to convert it to a data frame, which requires understanding coercion (01:13:12).
Character data cannot be multiplied, so it is necessary to code data correctly, and matrices can be thought of as block matrices or collections of smaller matrices (01:13:56).
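A small sketch of why coercion matters, using a made-up data frame. Converting mixed columns to a matrix turns everything into character, so the numeric columns are selected before doing matrix arithmetic. A tibble can usually be converted first with `as.data.frame()`, which is not run here to avoid depending on an extra package.

```r
df <- data.frame(id     = c("s1", "s2", "s3"),   # character labels
                 height = c(150, 160, 170),
                 weight = c(50, 60, 65))

m_char <- as.matrix(df)             # mixed types: everything becomes character
print(is.character(m_char))

M <- as.matrix(df[, c("height", "weight")])   # numeric columns only
rownames(M) <- df$id                # keep the labels as row names
print(t(M) %*% M)                   # matrix arithmetic now works
```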

Block Matrices and Matrix Operations

A common type of matrix is a block diagonal matrix, which consists of smaller matrices along the diagonal and zeros or other values elsewhere (01:15:37).
Block diagonal matrices can be symmetric, with the same pattern of values above and below the diagonal (01:14:57).
Matrices can be composed of smaller blocks or sub-matrices, which can be useful for understanding and working with larger matrices (01:14:15).
Matrices can be chopped up or sliced and diced as long as the rules are obeyed, and the resulting parts can be used for various computations, such as transposing, which can be useful for tasks like working with cross tabs (01:16:05).
Block diagonal matrices are not necessarily diagonal because they don’t have zeros everywhere else, but the blocks can form diagonals, and this structure can be useful for certain computations (01:15:41).
A matrix can be block symmetric, meaning every block is symmetric, and this property can be useful for understanding the relationships between variables (01:17:35).
Matrices can be symmetric, meaning they are equal to their transpose, and this property can be useful for understanding the relationships between variables (01:17:41).
In R, matrices and arrays can be subsetted using indices, and this can be useful for extracting specific parts of a matrix or array for further computation (01:18:28).
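A minimal sketch of building a block matrix and cutting it back into pieces with index subsetting. The blocks A and B are made up, and `rbind()`/`cbind()` are used to assemble them.

```r
A <- matrix(1:4, 2, 2)
B <- matrix(5:8, 2, 2)
Z <- matrix(0, 2, 2)                   # block of zeros

# Block diagonal matrix: A and B on the diagonal, zeros elsewhere
M <- rbind(cbind(A, Z),
           cbind(Z, B))
print(M)

print(M[1:2, 1:2])                     # top-left block (A)
print(M[3:4, 3:4])                     # bottom-right block (B)
print(M[1:2, 3:4])                     # off-diagonal block (zeros)

S <- M + t(M)                          # a symmetric matrix equals its transpose
print(isSymmetric(S))
```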

Covariance Matrices and Data Relationships

Covariance-type matrices, that is, cross-product matrices, can represent frequencies, cross tabs, and other types of data, and they can be used to understand the relationships between variables (01:19:13).
Frequencies relate a variable to itself, similar to a variance, and can indicate variability, while cross tabs relate two variables to each other, similar to a covariance (01:19:20).
Cross tabs can be seen as a type of covariance for categorical data, and they can be used to understand the relationships between variables (01:19:45).

Gaussian Elimination and Linear Regression

Gaussian elimination is due to Carl Friedrich Gauss, who also came up with the Gaussian distribution in the same paper. It is related to linear regression, which Gauss is credited with proposing in 1795, although other mathematicians, such as Adrien-Marie Legendre from France and Robert Adrain from the United States, also invented it around the same time (01:20:08).
The reason Gauss’s paper is more well-known is that it is more complete than the others and proposes a method for solving linear regression, whereas the other papers only proposed the idea (01:21:45).
The mathematicians who worked on linear regression, including Gauss, were mostly astronomers or surveyors who were trying to figure out how to combine multiple data points properly (01:22:08).
In Gaussian elimination, a linear system of equations, such as ax = b, can be represented in different ways, including as a vector system or a matrix equation, which are all equivalent representations of the same information (01:24:04).
A linear system of equations can be written as a matrix equation, where the coefficients of the variables are arranged in a matrix, and the variables and constants are represented as vectors (01:23:22).
The different representations of a linear system, including the original equations, the vector system, and the matrix equation, all contain the same information and are equivalent (01:24:21).
Gauss figured out that there was a matrix equation that needed to be solved to get the regression coefficients: beta hat = (X^T X)^(-1) X^T y, the OLS estimator (01:25:12).
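A sketch of the OLS formula on made-up data, written literally with an explicit inverse and then in the form that solves the equations directly. `lm()` is used only as a check.

```r
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)

X <- cbind(1, x1, x2)                        # column of ones for the intercept

# Literal translation of (X'X)^(-1) X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Same answer without forming the inverse: solve (X'X) b = X'y
beta_hat2 <- solve(t(X) %*% X, t(X) %*% y)

print(as.numeric(beta_hat))
print(coef(lm(y ~ x1 + x2)))
```

Solving the system directly is generally preferred in practice, since it avoids computing the inverse.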

Matrix Inversion and Gaussian Elimination

The problem with this equation is that it requires computing an inverse, which can be difficult, especially for large matrices (01:25:58).
Gaussian elimination is a systematic way to solve systems of linear equations and can be used to calculate matrix inverses or determine if a matrix can’t be inverted (01:27:07).
Gaussian elimination can also tell us about the concept of rank, which will be discussed further next week (01:27:20).

Invertible Matrices and Linear Equations

To find the inverse of a matrix, we need to find a matrix that, when multiplied by the original matrix, equals the identity matrix (01:27:41).
The inverse of a matrix is used to solve systems of linear equations, similar to how division is used in ordinary algebra (01:27:56).
For a matrix to be invertible, it must be square, meaning it has the same number of rows and columns, and therefore the same number of unknowns as equations (01:29:03).
The rule learned in high school is that, to solve a system of linear equations, the number of unknowns must be equal to the number of equations, and there must not be linear dependence, meaning no columns or rows can be copies of each other (01:29:14).
However, in statistics, it is common to deal with inconsistent systems where there are more observations than unknowns, and there are also situations where there are more parameters or variables than observations (01:29:41).
Dimension reduction is a technique used to address these situations (01:30:27).
A 2x2 matrix can be inverted using a simple formula, but larger matrices require more complex formulas or techniques such as the block inverse theorem (01:30:55).
The block inverse theorem allows for a matrix to be chopped into smaller pieces, inverted, and then reassembled (01:31:35).
Inverting large matrices, such as 5x5 or larger, can be impractical and may require special formulas or techniques (01:32:12).
A 4x4 matrix is the largest size that can be reasonably inverted by hand, and larger matrices are typically inverted using computational methods (01:32:18).
The inverse of a matrix can be used to solve a system of linear equations, and multiplying a matrix by its inverse results in the identity matrix (01:33:08).

Determinants and Matrix Inversion

The concept of determinants can be applied to various matrix sizes, such as 3x3, 4x4, 5x5, or 6x6, and a generalized algorithm was developed to simplify the process (01:33:29).
Determinants were previously used to solve equations, but they are considered tedious and difficult to work with, especially for larger matrices (01:33:48).
The formula for finding the inverse of a 2x2 matrix, which involves the determinant, has been around for a long time and is an example of a concept that has been reinvented multiple times (01:34:50).
The formula for the inverse of a 2x2 matrix is important for conceptual understanding, and it is crucial to understand the limitations of the formula, specifically that the determinant (ad - bc) cannot be zero (01:35:30).
The reason the determinant cannot be zero is that it would result in dividing by zero, which is a fundamental rule that cannot be broken in mathematics (01:35:56).
The determinant (ad - bc) being zero implies that the two columns of the matrix are proportional to each other, which means the matrix is not invertible (01:36:32).
The determinant can be represented geometrically as the area of a parallelogram (a volume in higher dimensions), and it provides information about the relationship between the columns or rows of the matrix (01:37:10).
If the columns or rows of the matrix are parallel to each other, the determinant will be zero, indicating that the matrix is not invertible (01:37:34).
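A sketch of the 2x2 inverse formula: for a matrix with rows (a, b) and (c, d), the inverse is 1/(ad - bc) times the matrix with rows (d, -b) and (-c, a). The function name `inv2` is hypothetical, and the example matrices are made up.

```r
inv2 <- function(M) {
  det2 <- M[1, 1] * M[2, 2] - M[1, 2] * M[2, 1]     # ad - bc
  if (det2 == 0) stop("determinant is zero: matrix is not invertible")
  # Filled column by column: (d, -c) then (-b, a)
  matrix(c(M[2, 2], -M[2, 1], -M[1, 2], M[1, 1]), 2, 2) / det2
}

M <- matrix(c(2, 1, 5, 3), 2, 2)    # a = 2, c = 1, b = 5, d = 3
print(inv2(M))
print(inv2(M) %*% M)                # identity matrix

P <- matrix(c(1, 2, 2, 4), 2, 2)    # second column is twice the first
result <- tryCatch(inv2(P), error = function(e) e)
print(conditionMessage(result))
```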

Correlation Matrices and Determinants

For a 2x2 correlation matrix with correlation coefficient r, the determinant is 1 minus r squared, so the inverse equals 1 divided by 1 minus r squared times a simple 2x2 matrix, and that factor becomes very large as r gets near 1 (01:38:40).
When the correlation coefficient (r) is equal to plus or minus 1, the inverse will be undefined because it involves dividing by 0 (01:38:57).
In practice, a correlation coefficient of 1 or -1 is often the result of a mistake, as it is rare to encounter such a high correlation in real data (01:39:21).
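A short sketch of the 2x2 correlation matrix case, with r = 0.9 chosen arbitrarily. It checks that the determinant is 1 - r^2 and shows how quickly the factor 1 / (1 - r^2) grows as r approaches 1.

```r
r <- 0.9
R <- matrix(c(1, r, r, 1), 2, 2)

print(det(R))                                        # 1 - r^2
R_inv <- matrix(c(1, -r, -r, 1), 2, 2) / (1 - r^2)   # 2x2 inverse formula
print(all.equal(R_inv, solve(R)))

# The scaling factor blows up as r approaches 1
r_values <- c(0.5, 0.9, 0.99, 0.999)
print(data.frame(r = r_values, factor = 1 / (1 - r_values^2)))
```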

Generalizing the Inverse Formula

Gauss developed a method to generalize the inverse formula, which involves using elementary row operations to transform the original matrix into a diagonal matrix (01:39:48).
Each elementary row operation is equivalent to multiplying by a matrix, and the product of those matrices forms the inverse of the original matrix (01:40:50).
The process of finding the inverse of a matrix using elementary row operations involves starting with the original matrix and iteratively removing elements below the diagonal, while building up the terms needed for the inverse on the other side of the matrix (01:41:21).
Applying that series of matrices to the original matrix reduces it step by step, and their combined product is the inverse (01:41:04).
The process of finding the inverse of a matrix can be tedious, even for small matrices, but it is a logical and systematic process (01:41:14).
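The row-operation procedure can be sketched as Gauss-Jordan elimination on the augmented matrix [A | I]. This version reduces A all the way to the identity and adds row swaps (partial pivoting) for numerical safety, which may differ from the steps shown in class. The function name `gauss_jordan_inverse` and the example matrix are hypothetical.

```r
gauss_jordan_inverse <- function(A, tol = 1e-12) {
  n <- nrow(A)
  stopifnot(n == ncol(A))                     # only square matrices
  aug <- cbind(A, diag(n))                    # augmented matrix [A | I]
  for (j in seq_len(n)) {
    # Pick the largest entry in column j, at or below the diagonal
    p <- which.max(abs(aug[j:n, j])) + j - 1
    if (abs(aug[p, j]) < tol) stop("matrix is singular (no usable pivot)")
    if (p != j) aug[c(j, p), ] <- aug[c(p, j), ]   # swap rows
    aug[j, ] <- aug[j, ] / aug[j, j]               # scale pivot row to 1
    for (i in seq_len(n)[-j]) {
      aug[i, ] <- aug[i, ] - aug[i, j] * aug[j, ]  # clear the rest of column j
    }
  }
  aug[, (n + 1):(2 * n), drop = FALSE]        # right half is now the inverse
}

A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 0, 0), 3, 3)
A_inv <- gauss_jordan_inverse(A)
print(A_inv)
print(all.equal(A_inv %*% A, diag(3)))
```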

Non-Invertible Matrices and Linear Dependence

A matrix can be non-invertible if one of its columns is dependent on the other columns, as seen in the example matrix, where the third column is dependent on the first column because it is obtained by multiplying the first column by 3 (01:42:38).
The original example matrix is also non-invertible because its third column is dependent on the first column, as it is obtained by adding 3 to the first column (01:42:55).
The columns of the matrix are linearly dependent and can be expressed as linear combinations of each other, for example, the third column is equal to the first column plus 3 (01:43:24).
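As an illustration of linear dependence, the sketch below assumes a 3x3 matrix holding the integers 1 through 9, which may differ from the matrix used in the lecture. Its columns satisfy column 1 minus twice column 2 plus column 3 equals zero, so the matrix has less than full rank and cannot be inverted.

```r
M <- matrix(1:9, nrow = 3, ncol = 3)   # assumed example matrix
print(M)

# A linear combination of the columns that gives the zero vector
combo <- M[, 1] - 2 * M[, 2] + M[, 3]
print(combo)

# Rank below 3 means the columns are linearly dependent
print(qr(M)$rank)
```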

Mahalanobis Distance and Data Analysis

The discussion of matrices and their properties is relevant to the Mahalanobis distance, which will be covered next (01:43:56).
An example from Solomon Kullback's book “Information Theory and Statistics” is presented. It involves data on soldiers in the United States Army in World War II, including a general IQ measure, high school grades, and a job-specific test score (01:45:39).
The data includes two groups of soldiers: unsuccessful candidates (group 1) and successful candidates (group 2), with 96 and 209 observations, respectively (01:46:23).
The mean scores are not available, but the differences between the groups are provided, showing that the unsuccessful candidates scored 17.6 points lower on the general IQ test (01:46:45).
IQ scores are reported on a standard-score scale, in the same way as T-scores, with a mean of 100 and a standard deviation of 15 (01:46:56).
IQ scores are computed as z-scores, then multiplied by 15 and added to 100 to avoid negative numbers (01:47:19).
A mean difference of 17.6 IQ points is a little more than one standard deviation (17.6 / 15 ≈ 1.2), which is a large difference (01:47:48).
Unsuccessful candidates also have lower scores on the other two measures, with mean differences of 1.8 on high school grades and 5.3 on the job-specific test (01:48:05).
To understand the data, we need to know the standard deviations, but we can start by looking at the variances (01:48:22).
The variances of the unsuccessful and successful candidates are presented, but we need to calculate the standard deviations to make sense of the data (01:48:37).
The standard deviations of the two groups are calculated, showing the variability within each group (01:48:56).
The data may have been subject to prescreening, which could affect the results (01:49:16).
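
The IQ scaling described above is a one-line transformation. This is a small Python sketch with made-up function names, showing the conversion from z-scores and the size of the 17.6-point difference in standard-deviation units.

```python
def z_to_iq(z, mean=100.0, sd=15.0):
    """Convert a z-score to the IQ metric (mean 100, SD 15)."""
    return mean + sd * z


def iq_diff_to_sd(diff, sd=15.0):
    """Express a difference in IQ points in standard-deviation units."""
    return diff / sd


print(z_to_iq(-1.0))                  # 85.0
print(round(iq_diff_to_sd(17.6), 2))  # 1.17
```

The same conversion cannot yet be applied to the other two measures, because their standard deviations have to be derived from the variances first.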

Covariance Matrix Validation

The covariance matrices for the unsuccessful and successful candidates are presented, but we need to determine whether they are valid (01:49:34).
A valid covariance matrix must be symmetric, which these matrices appear to be, but we also need to check for positive definiteness (01:50:08).
The covariance matrices are compared, with one being almost diagonal due to small off-diagonal values (01:50:37).
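
Both validity checks can be automated. The sketch below is one possible approach, not the method used in class: it tests symmetry entry by entry and tests positive definiteness by attempting a Cholesky factorization, which succeeds only for positive definite matrices. The matrices are made up for illustration and are not the lecture's data.

```python
import math


def is_symmetric(S, tol=1e-9):
    """True if S equals its transpose up to a small tolerance."""
    n = len(S)
    return all(abs(S[i][j] - S[j][i]) <= tol for i in range(n) for j in range(i))


def cholesky(S):
    """Lower-triangular L with L L^T = S; raises ValueError if S is not positive definite."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def is_valid_covariance(S):
    """Symmetric and positive definite."""
    if not is_symmetric(S):
        return False
    try:
        cholesky(S)
        return True
    except ValueError:
        return False


good = [[4.0, 1.0], [1.0, 9.0]]
bad = [[1.0, 2.0], [2.0, 1.0]]  # symmetric, but implies a correlation of 2
print(is_valid_covariance(good), is_valid_covariance(bad))  # True False
```

The second matrix passes the symmetry check yet fails the factorization, which is why symmetry alone is not enough.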

Correlation Analysis and Interpretation

To compare the covariances, we can convert them to correlations by pre- and post-multiplying each covariance matrix by the inverse of the diagonal matrix of standard deviations (01:51:35).
The correlation matrices of the two groups are presented, showing a medium correlation between the g-loaded test and high school grades, which is expected (01:52:06).
The job-specific test does not correlate with the g-loaded test, which is desirable because it suggests that the job-specific test is assessing job-specific knowledge rather than general intelligence (01:52:27).
The job-specific test is designed to measure how good someone would be at a specific job, and it is beneficial that it does not correlate with g-loaded tests, as it means that the assessment is not biased towards general intelligence (01:53:04).
The unsuccessful candidates scored about a standard deviation lower on the job-specific test, which, given its low correlation with the g-loaded test, indicates that it is assessing job-specific knowledge rather than general intelligence (01:53:27).
The assessment is useful because it is not strongly correlated with a g-loaded test, which means that it is not just measuring general intelligence, but rather job-specific knowledge (01:54:02).
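
The conversion from covariances to correlations used in this section amounts to dividing each covariance by the product of the two standard deviations. This is a minimal Python sketch with a made-up function name and made-up numbers, not the lecture's matrices.

```python
import math


def cov_to_cor(S):
    """R = D^{-1} S D^{-1}, where D is the diagonal matrix of standard deviations."""
    n = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(n)]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


S = [[4.0, 3.0], [3.0, 9.0]]  # made-up covariance matrix: SDs are 2 and 3
print(cov_to_cor(S))          # [[1.0, 0.5], [0.5, 1.0]]
```

Because the result is unit-free, the two groups' matrices can be compared entry by entry even when their variances differ.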

Regression and Temporal Precedence

Matrix formulas can be used to compute the regression of the job-specific test score on the g-loaded test and high school grades, but it is more meaningful to regress the job-specific test score on high school GPA, because high school GPA comes first in time (01:54:32).
Regressing the old thing on the new thing is not desirable, as it does not make sense temporally; it is generally more meaningful to regress the new thing on the old thing (01:55:10).
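
The matrix formula for these regressions generalizes the simple-regression slope given earlier in the class (covariance of x and y divided by the variance of x). As a sketch, with notation assumed here rather than taken from the lecture: S_xx is the covariance matrix of the predictors and s_xy is the vector of covariances between each predictor and the outcome.

```latex
\boldsymbol{\beta} = \mathbf{S}_{xx}^{-1}\,\mathbf{s}_{xy},
\qquad
\hat{y} = \mu_y + \boldsymbol{\beta}^{\top}\left(\mathbf{x} - \boldsymbol{\mu}_x\right)
```

With a single predictor, S_xx reduces to the variance of x and the expression collapses to the familiar ratio of covariance to variance.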

Data Simulation and Covariance Matrices

The mvrnorm function in the MASS library can be used to simulate data, which is useful for working with covariance matrices (01:55:51).
A multivariate normal data set can be simulated with specific properties, including a mean vector and a covariance matrix, using the empirical argument (empirical = TRUE), which forces the new data set to have exactly those properties (01:56:25).
The empirical flag allows for the creation of a data set with a specific correlation structure, mean vector, and standard deviations, making it useful for testing specific hypotheses, but not for simulation studies or bootstrapping (01:57:32).
Computing covariance by hand is not recommended and is only done for demonstration purposes; instead, a function should be used (01:57:57).
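
The idea behind forcing a simulated sample to have an exact mean vector and covariance matrix can be sketched as follows. This is a two-variable Python illustration of the concept, not a port of mvrnorm and not a claim about how it is implemented: the random draws are centered, whitened with the Cholesky factor of their own sample covariance, then recolored with the Cholesky factor of the target. All function names and numbers are made up.

```python
import math
import random


def chol2(S):
    """Cholesky factor of a 2x2 matrix, returned as (a, b, c) for L = [[a, 0], [b, c]]."""
    a = math.sqrt(S[0][0])
    b = S[0][1] / a
    c = math.sqrt(S[1][1] - b * b)
    return a, b, c


def sample_cov2(rows):
    """Sample mean vector and covariance matrix (divisor n - 1) of two-column data."""
    n = len(rows)
    m = [sum(r[j] for r in rows) / n for j in range(2)]
    S = [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
          for j in range(2)] for i in range(2)]
    return m, S


def simulate_exact(n, mu, Sigma, seed=0):
    """Draw n bivariate points whose sample mean and covariance equal mu and Sigma."""
    rng = random.Random(seed)
    Z = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    m, C = sample_cov2(Z)
    a, b, c = chol2(C)         # factor of the sample covariance of the draws
    ta, tb, tc = chol2(Sigma)  # factor of the target covariance
    out = []
    for z1, z2 in Z:
        # Center, then whiten by solving L_C w = z: sample covariance of w is I.
        w1 = (z1 - m[0]) / a
        w2 = ((z2 - m[1]) - b * w1) / c
        # Recolor with the target factor and shift to the target mean.
        out.append([mu[0] + ta * w1, mu[1] + tb * w1 + tc * w2])
    return out


mu = [100.0, 50.0]
Sigma = [[225.0, 60.0], [60.0, 100.0]]
X = simulate_exact(200, mu, Sigma)
m, S = sample_cov2(X)
print([round(v, 6) for v in m])
print([[round(v, 6) for v in row] for row in S])
```

Because every sample reproduces the target exactly, there is no sampling variability in the mean or covariance, which is why this kind of data is unsuitable for simulation studies or bootstrapping.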

Data Standardization and the Scale Function

The scale function is a useful tool that centers and/or standardizes columns of data, allowing for the creation of z-scores (01:58:31).
The scale function can be used to standardize data, eliminating the mean and resulting in standardized differences or z-scores (01:59:54).
Standardizing data does not change the correlation matrix, which remains the same as before standardization (02:00:39).
The Mahalanobis distance is a measure that centers, standardizes, and rotates data to a specific direction in multivariate space (02:00:42).
Using the scale function to standardize data results in a correlation matrix that is identical to the original correlation matrix (02:01:10).
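
That standardizing leaves correlations unchanged can be verified numerically. The sketch below mimics, for a single column, the default centering and scaling that R's scale function applies (subtract the mean, divide by the sample standard deviation); the helper names and data are made up for this example.

```python
import math


def column_stats(x):
    """Mean and sample standard deviation (divisor n - 1) of one column."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return mean, sd


def standardize(x):
    """Center and scale one column to z-scores."""
    mean, sd = column_stats(x)
    return [(v - mean) / sd for v in x]


def correlation(x, y):
    """Pearson correlation of two columns."""
    mx, sx = column_stats(x)
    my, sy = column_stats(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (sx * sy)


x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
zx, zy = standardize(x), standardize(y)
print(round(correlation(x, y), 6), round(correlation(zx, zy), 6))
```

The two printed values agree because correlation already divides out the means and standard deviations that standardization removes.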
The approach of calculating the covariance matrix by hand should be avoided unless specifically required; the built-in functions (presumably cov and cor) should be used instead, as this is more practical for real-life applications (02:01:27).
There are situations where the function is not working as desired, and knowing how to calculate Covariance matrices can be helpful in troubleshooting (02:01:43).
In some cases, all cross products are needed instead of a covariance matrix, and knowing how to calculate them can be useful (02:01:58).
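
The Mahalanobis distance mentioned in this section can be sketched for two variables. This is a minimal Python illustration under stated assumptions: the function name and numbers are made up, and it uses the standard definition, the square root of (x - mu)' S^{-1} (x - mu), with the closed-form inverse of a 2 × 2 matrix.

```python
import math


def mahalanobis_2d(x, mu, S):
    """Mahalanobis distance sqrt((x - mu)^T S^{-1} (x - mu)) for two variables."""
    d1, d2 = x[0] - mu[0], x[1] - mu[1]  # centering
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a symmetric 2x2 matrix.
    i11, i12, i22 = S[1][1] / det, -S[0][1] / det, S[0][0] / det
    q = d1 * (i11 * d1 + i12 * d2) + d2 * (i12 * d1 + i22 * d2)
    return math.sqrt(q)


mu = [0.0, 0.0]
S = [[1.0, 0.8], [0.8, 1.0]]  # made-up: two standardized variables with r = 0.8
print(mahalanobis_2d([1.0, 1.0], mu, S))   # along the data cloud
print(mahalanobis_2d([1.0, -1.0], mu, S))  # against the data cloud
```

Both points are the same Euclidean distance from the mean, but the second lies against the direction of the correlation and therefore gets a much larger Mahalanobis distance.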

Future Topics: M Distance, Eigenvalues, and SVD

The next topics to be covered are the Mahalanobis distance, eigenvalues, and singular value decomposition, which will lay the groundwork for principal components analysis and correspondence analysis (02:02:05).
The singular value decomposition will be used to develop the technology needed for principal components analysis and correspondence analysis (02:02:24).
The next class will cover another technique, followed by a week off, and then the focus will shift to applying the techniques learned, including using singular value decomposition and bootstrapping to assess the stability of answers (02:02:43).
The class will cover various topics, including principal components analysis, correspondence analysis, multidimensional scaling, clustering, discriminant analysis, regularization, bootstrapping, auto-encoding, cross-validation, and permutation (02:04:06).
The syllabus includes topics such as regularization, bootstrapping, auto-encoding, the jackknife, permutation, and cross-validation, which will be covered in the remaining classes (02:04:18).