MVA Class 4 — Covariance Matrices

Covariance Matrices, Correlation, and Mahalanobis Distance
- Today’s topic is the covariance matrix and related concepts,
including correlation and Mahalanobis distance, as stated in the
official syllabus (00:03:52).
- The discussion will cover the application of these quantities and
serve as a review of matrix algebra, as computing them involves
translating ordinary formulas into matrix expressions (00:04:21).
- The goal is to learn how to compute these quantities, which will
help in understanding how to translate formulas involving sums into
matrix expressions (00:04:24).
- The class will likely review some conceptual and geometric aspects
of matrix algebra, but the focus will be on computing covariance and
correlation matrices (00:04:48).
Correlation and Regression
- Correlation is a measure of bivariate linear dependence,
and the class will cover the formulas for correlation and covariance,
with R code provided at the end of the class (00:05:47).
- The discussion will also touch on the relationship between
regression lines and the data cloud, including the concept of
best-fitting lines (00:06:18).
- The class will use R code to demonstrate the concepts, but the
instructor is having trouble navigating the interface due to changes in
the Zoom layout on the iPhone and iPad (00:05:29).
Major Axis Regression
- Major Axis regression is a type of regression that is not commonly
discussed, but is often used in ecology, and it has to do with where the
error term is placed in the equation (00:07:18).
- In Major Axis regression, the errors are measured perpendicular to
the major axis line, which makes it a Euclidean-distance approach
(00:08:22).
Simple Linear Regression
- In simple linear regression, the equation y = alpha + beta x + error
describes the relationship between two variables, where beta equals the
covariance of x and y divided by the variance of x (00:08:39).
- The equation can also be written as y = mu y + beta (x - mu x),
where mu y and mu x are the means of y and x, respectively (00:09:31).
- When working with units in the equation, the units of alpha must
match the units of y, and the units of the error term must also match
the units of y (00:10:56).
- The “check the dimensions” rule in matrix algebra has a counterpart
for units: when adding things, their units must be the same (00:11:29).
- The units of beta are the units of y divided by the units of x, and
the units of the error term are the same as the units of y (00:11:15).
- Using concrete units, such as centimeters and grams, can help make
the equation more understandable and easier to work with (00:10:29).
- In many social science areas, units are not often considered, but
thinking about units can help determine whether a model makes sense or
not, as adding different units (e.g., grams and centimeters) is
meaningless (00:11:42).
- For the slope beta, the units have to cancel out: if y is in grams
and x is in centimeters, beta must be in grams per centimeter, so that
beta times x comes out in grams (00:12:26).
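The slope and intercept formulas in this section can be checked numerically. The following is a minimal sketch in R; the data values and the choice of centimeters and grams are made up for illustration and are not from the lecture.

```r
# Hypothetical data: x in centimeters, y in grams (made-up values)
x <- c(10, 12, 15, 18, 20)
y <- c(30, 33, 41, 46, 52)

# Slope: covariance of x and y divided by the variance of x
# (units: grams per centimeter)
beta <- cov(x, y) / var(x)

# Intercept: from y = mu_y + beta (x - mu_x), alpha = mu_y - beta * mu_x
# (units: grams)
alpha <- mean(y) - beta * mean(x)

# Compare with R's built-in least-squares fit
fit <- lm(y ~ x)
print(c(alpha = alpha, beta = beta))
print(coef(fit))
```

Both printed results should agree, since lm() fits the same least-squares line.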
Correlation Coefficients and Units
Matrix Representation of Regression
- Regression can be written in matrix form, and the expected value of
y given X can be expressed using a formula that considers the dimensions
of the variables (00:14:47).
- The notation X represents a vector that is p by one, where p is the
number of variables or covariates, and y is a scalar (one by one) (00:16:11).
- The formula for the expected value of y given X involves a p by 1
vector X and a 1 by 1 scalar y; stacked together they form a (p + 1) by
1 vector (00:16:39).
Vectors in R
- In R, a vector can be created by stacking numbers, with the “c”
function used to combine the numbers, and the vector can be extended by
adding more numbers to the end, separated by commas (00:16:59).
- The expected value of a vector X is represented as Mu of X, which is
a vector containing the means of all the variables in X, denoted as Mu
x1, Mu x2, …, Mu xP (00:17:43).
- The vector X with a hat on top (x with the r on top) is a p by 1
vector, representing the population means, and not the data set, which
would have multiple rows (00:18:02).
Covariance Matrix
- The covariance matrix is represented by the Greek letter Sigma (Σ),
and it is a p by p matrix that contains the variances and covariances
of the variables in X (00:18:59).
- The notation for the covariance matrix is often written with a big
Sigma and a slash through it to distinguish it from the sum sign, and it
is commonly used to represent the population covariance matrix (00:20:21).
- The data covariance matrix is often represented by a big S, and it
is used to distinguish it from the population covariance matrix (00:21:04).
- The covariance matrix Sigma is used to represent the population
covariance matrix, and it is a p by p matrix that contains the variances
and covariances of the variables in X (00:19:22).
- The covariance matrix, denoted as Big Sigma, is a matrix that
contains variances and covariances of variables, with variances on the
diagonal and covariances on the off-diagonal (00:21:58).
- The variance of a variable, represented as Sigma 1, 1, is the
variance of x 1, which is a measure of the spread or dispersion of the
variable (00:22:17).
- The covariance matrix is a summary of the data, performing data
reduction, and it does not contain information about the identity of
individual cases (00:22:57).
- The covariance matrix and the mean vector are used in structural
equation modeling (SEM) to fit models, but there is still more
information in the data set that is not used (00:23:35).
- Linear regression only uses the means, standard deviations, and
correlation coefficients, which can be computed from the covariance
matrix and the mean vector (00:24:16).
- The covariance matrix is used in various statistical techniques,
including factor analysis, confirmatory factor analysis, and
longitudinal models, which are often used in Structural Equation
Modeling (00:24:50).
- The covariance matrix has variances on the diagonal and covariances
on the off-diagonal, with the variances representing the spread of the
variables and the covariances representing the relationship between
variables (00:25:39).
Covariance and its Interpretation
- Covariance is a measure of linear dependence, essentially describing
how much data points deviate from the major axis, and it is symmetric
(00:26:29).
- There is a way to generalize the regression formula, allowing for
the calculation of regressions using matrix formulas, although this is
more for theoretical understanding rather than practical application (00:26:36).
- Covariance is not typically used for direct computation in
computers, but rather for understanding the underlying theory (00:27:12).
- The population covariance is a peculiar expected value that measures
the average linear relationship between two variables (00:27:34).
- The formula for covariance involves multiplying the deviations of
two variables together and taking the average, which is similar to the
standard data formula for variance (00:27:52).
- Covariance is a generalized form of variance that measures the
average linear relationship between two variables, rather than the
scatter around one mean (00:28:36).
- Covariance measures second-order dependence, which is linear
dependence, whereas means are first-order quantities (00:28:52).
- Higher-order quantities include skewness (third-order) and kurtosis
(fourth-order), with no specific names for quantities beyond these (00:29:11).
Matrix Expressions and Sample Mean
- The goal is to understand how to write these concepts in matrix
expressions, which will be explored further (00:29:55).
- Matrix expressions can be used to represent sums and other
calculations, such as the sum of x values multiplied by their respective
weights (00:30:28).
- The concept of using matrices to write formulas is introduced, with
the example of the sample mean being represented as a vector expression
(00:31:02).
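As a small illustration of writing sums as vector expressions, the sketch below computes a sample mean and a weighted sum with matrix products in R. The data and the weights are hypothetical values chosen for the example.

```r
# Hypothetical data vector and weights (made-up values)
x <- c(2, 4, 6, 8)
w <- c(0.1, 0.2, 0.3, 0.4)

n    <- length(x)
ones <- rep(1, n)                     # vector of ones

# Sample mean as a vector expression: (1/n) * 1' x
xbar <- as.numeric(t(ones) %*% x) / n

# Weighted sum as a vector expression: w' x
wsum <- as.numeric(t(w) %*% x)

print(xbar)   # same value as mean(x)
print(wsum)   # same value as sum(w * x)
```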
Contrasts and their Applications
- Matrices can be used to perform various general operations, such as
creating contrasts, which are useful for making specific comparisons
between means (00:32:18).
- Contrasts are often tedious to calculate, but can be useful in
certain situations, such as when designing experiments with specific
comparisons in mind (00:32:43).
- An example of a contrast is given, where the means of two groups are
compared, and a matrix formula is used to compute the contrast (00:34:07).
- The formula for the contrast is represented as M transpose times a
vector of coefficients, where M is a vector of means (00:34:22).
- The coefficients in the vector are used to weight the means and
compute the contrast, with the example given being (mu 1 + mu 2) / 2
minus (mu 3 + mu 4) / 2 (00:34:32).
- A discussion about the use of contrasts in item response theory is
mentioned, with a comment about not randomizing distractors when writing
items (00:34:56).
- The importance of designing studies with contrasts in mind is
emphasized, as it can make the analysis more meaningful and
interpretable (00:33:08).
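The contrast described in this section can be sketched as a single vector product in R. The four group means are hypothetical values; only the coefficient pattern (the average of groups 1 and 2 minus the average of groups 3 and 4) comes from the notes.

```r
# Hypothetical group means mu1..mu4 (made-up values)
m <- c(10, 12, 7, 9)

# Contrast coefficients: average of groups 1 and 2
# minus average of groups 3 and 4
cvec <- c(1/2, 1/2, -1/2, -1/2)

# Contrast as m' c
contrast <- as.numeric(t(m) %*% cvec)
print(contrast)   # (10 + 12)/2 - (7 + 9)/2 = 3

# Contrast coefficients sum to zero
print(sum(cvec))
```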
Matrices in Statistical Analysis
- Matrices are used to simplify formulas and expressions, making it
easier to work with complex data (00:35:55).
- In statistical analysis, matrices can be used to define contrast
matrices or contrast vectors, which are necessary for computing certain
statistics (00:36:35).
- The ability to compute sample means is not the only benefit of using
matrices; they can also be used to compute more complex statistics, such
as the covariance matrix (00:36:49).
Computing the Covariance Matrix
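- The covariance matrix can be computed from the product
(X, Y)^T (X, Y), where (X, Y) is the matrix whose columns are the data
vectors X and Y; with the transpose on the left, the result is a small
square matrix of sums of squares and cross products (00:37:31).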
- To compute the covariance matrix, the vectors X and Y are combined
into a single matrix using the cbind function, which stacks the columns
of the two vectors together (00:38:00).
- The resulting matrix is then multiplied by its transpose to compute
the covariance matrix (00:38:50).
- The formula for computing the covariance matrix using matrices is the
same regardless of the size of the matrix, making it easier to work
with large datasets (00:39:42).
- The final step in computing the covariance matrix involves fixing up
the result to include a factor of 1/n, which is a simple step (00:39:52).
- The sums of squares and cross products can be calculated for
arbitrarily sized matrices and many data vectors, providing a way to
work with matrix algebra (00:39:58).
- The order of calculating the product of two vectors does not matter,
as X transpose Y and Y transpose X are exactly the same due to the
summing of their products (00:40:47).
- Matrix algebra is beneficial as it saves the effort of thinking
about the arithmetic, rather than the actual computational effort,
allowing for easier handling of complex calculations (00:41:51).
- The benefit of matrix algebra is that it allows for the use of
computer languages and mathematical expressions that do not require
thinking about the underlying arithmetic (00:42:19).
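The steps in this section (bind the columns, multiply by the transpose, then rescale) can be sketched in R as follows. The data are made up. The notes mention a factor of 1/n; this sketch divides by n - 1 instead so that the result can be compared with R's cov(), and it centers the columns first, which is an extra step needed for a covariance matrix.

```r
# Hypothetical data vectors (made-up values)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)

D <- cbind(x, y)           # n x 2 data matrix
n <- nrow(D)

# Uncentered sums of squares and cross products: 2 x 2
sscp <- t(D) %*% D
print(sscp)

# Center each column, then form the covariance matrix
Dc <- scale(D, center = TRUE, scale = FALSE)
S  <- t(Dc) %*% Dc / (n - 1)   # n - 1 to match cov(); use n for the 1/n version
print(S)
print(cov(D))
```

The same two lines, t(D) %*% D and the rescaling, work unchanged if more columns are bound into D.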
Centering a Data Vector
- The sum of squares and cross products is not centered, but there is
a way to write a matrix expression that centers a data vector using
matrix algebra (00:42:58).
- Centering a data vector involves subtracting the mean from each
value, and this can be achieved using a specific matrix expression,
which will be explored in the homework assignment (00:43:41).
- A matrix Q can be used to center a data vector, and this will be
further explored in the homework assignment (00:44:16).
- Multiplying a vector by the matrix Q results in a centered vector,
where Q is defined as I minus (1/N) times 1 1^T, with I being the N by
N identity matrix, 1 a column vector of N ones, and N the number of
elements in the vector (00:44:57).
- The matrix Q is known as a centering matrix and has interesting
properties, such as Q squared equals Q, similar to the identity matrix
(00:46:59).
- The centering matrix Q can be used to center a vector, and once a
vector is centered, there is no more centering to be done, as it is
already centered (00:48:55).
- The property that Q squared equals Q is demonstrated by expanding
(FOILing) the product, which results in most terms cancelling, leaving
only I minus (1/N) times 1 1^T (00:48:10).
- The centering matrix Q is an important concept in linear models and
is used in various applications, including statistics (00:49:13).
- The example provided uses a 3x3 matrix Q to center a vector X,
resulting in a new vector with centered values (00:46:32).
- QQ equals Q because Q centers a vector, and a vector that is already
centered is left unchanged when it is centered again (00:49:43).
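A minimal R sketch of the centering matrix, assuming a made-up vector of length 3 (the lecture's own 3x3 example values are not given in these notes):

```r
# Hypothetical data vector (made-up values)
x <- c(1, 2, 6)
n <- length(x)

ones <- matrix(1, nrow = n, ncol = 1)     # column vector of ones
Q    <- diag(n) - (ones %*% t(ones)) / n  # Q = I - (1/n) 1 1'

# Multiplying by Q subtracts the mean from each element
xc <- Q %*% x
print(xc)            # x - mean(x): -2, -1, 3

# Q is idempotent: centering twice is the same as centering once
print(all.equal(Q %*% Q, Q))
print(all.equal(Q %*% xc, xc))
```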
Converting Covariance to Correlation Matrix
- To convert a covariance matrix to a correlation matrix, each
covariance is divided by the product of the corresponding standard
deviations (00:50:28).
- The correlation r_XY is calculated using the formula r_XY =
covariance of X and Y / (standard deviation of X * standard deviation
of Y) (00:50:23).
- The diag operator in R is used to extract the diagonal elements of a
matrix or create a diagonal matrix from a vector (00:50:45).
- Applying the diag operator twice, first to the original matrix and
then to the result, produces a diagonal matrix that keeps only the
diagonal elements of the original matrix (00:51:19).
- To calculate the correlation matrix, the variances on the diagonal of
the covariance matrix are extracted using the diag operator, placed in
a diagonal matrix, and raised to the negative one half power (00:51:42).
- The negative one half power of a diagonal matrix is equivalent to
taking the reciprocal of the square root of the diagonal elements (00:52:49).
- The correlation matrix is calculated by multiplying the inverse of
the diagonal matrix of standard deviations by the covariance matrix and
then multiplying by the inverse of the standard deviations again
(00:52:30).
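The conversion described in this section can be sketched in R with diag(). The 2x2 covariance matrix is hypothetical; cov2cor() is R's built-in function for the same conversion and is used here only as a check.

```r
# Hypothetical covariance matrix (made-up values):
# variances 4 and 9, covariance 3
S <- matrix(c(4, 3,
              3, 9), nrow = 2, byrow = TRUE)

# diag() on a matrix extracts the diagonal; diag() on a vector
# builds a diagonal matrix, so applying it twice keeps only the variances
V <- diag(diag(S))

# V^(-1/2): reciprocal square roots of the variances on the diagonal
V_inv_half <- diag(1 / sqrt(diag(S)))

# Correlation matrix: V^(-1/2) S V^(-1/2)
R <- V_inv_half %*% S %*% V_inv_half
print(R)             # off-diagonal is 3 / (2 * 3) = 0.5
print(cov2cor(S))    # built-in check
```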
Interpretability of Covariance and Correlation
- Covariance matrices can be written as a matrix expression, similar to
correlation matrices, and are used in math due to their convenience,
not because they have a direct interpretation (00:53:54).
- A covariance has units equal to the product of the units of the two
variables being measured, whereas a correlation coefficient is a
standardized, unit-free covariance, making it more interpretable
(00:54:19).
- Covariances are not typically interpretable and are considered “head
scratchers,” whereas standard deviations and correlation coefficients
have more meaningful interpretations (00:54:29).
- Variances and covariances are used in calculations, but standard
deviations and correlation coefficients are more commonly discussed due
to their interpretability (00:54:57).
- Normal distributions are often understood in terms of standard
deviation units rather than variance units, as variances represent the
average squared deviation (00:55:16).
- Covariances are used throughout the semester because the math is
convenient, not because they are typically interpretable, although there
may be times when they are more interpretable than standard deviations
(00:55:52).
Dummy Coding and Matrix Operations
- Dummy coding is a method of coding discrete variables, and it is
possible to use matrices to perform calculations with dummy-coded
variables (00:56:20).
- When working with dummy-coded variables, it is necessary to include
all categories, although in regression analysis, a reference category is
typically excluded (00:57:01).
- Matrices can be chopped into pieces and recombined using operations
like binding, which can be useful for performing calculations with
dummy-coded variables (00:57:22).
- When binding two matrices together, they must have the same number
of rows or columns, but not necessarily both, as demonstrated with
matrices A and B being combined to form matrix D (00:57:50).
- To understand the properties of matrix A, we can calculate A
transpose times A, which involves multiplying the transpose of A by A
itself (00:58:21).
- The result of A transpose times A is a diagonal matrix containing
the frequencies of each column in A, which can be obtained by summing
down the rows of A (01:00:36).
- The rules of dummy coding state that if a value is 1 in one column,
it must be 0 in the other columns, and this is what makes A transpose
times A a diagonal matrix (00:59:01).
- There are alternative methods, such as probabilistic coding, which
allow for different values, but this is not used in the current class (00:59:27).
- The dimensions of A transpose times A are determined by the
dimensions of A and its transpose, resulting in a 3x3 matrix in this
case (00:59:44).
- The resulting diagonal matrix has the sums of each column on the
diagonal and zeros on the off-diagonals, which can be represented as a
diagonal matrix with the sums of each column (01:01:36).
- Similarly, B transpose times B also results in a diagonal matrix
with the sums of each column on the diagonal and zeros on the
off-diagonals (01:01:40).
- To ensure the dimensions match, A transpose B or B transpose A can
be used, where A is a 6 by 3 matrix and B is a 6 by 2 matrix, resulting
in A transpose B being 3 by 2 and B transpose A being 2 by 3 (01:02:19).
- A transpose B and B transpose A provide the same information, just
flipped on its side, so only one of them needs to be calculated (01:03:00).
- The result of the multiplication is a 3 by 2 matrix, which
represents a crosstab or frequency table showing the relationship
between two variables (01:03:42).
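The dummy-coding products in this section can be sketched in R as follows. The two categorical variables are made up; only the shapes (A is 6 by 3, B is 6 by 2) follow the notes, and the helper expression using outer() is one convenient way to build the indicator columns, not necessarily the one used in class.

```r
# Hypothetical categorical variables for 6 cases (made-up values)
g1 <- c(1, 1, 2, 2, 3, 3)   # 3 categories
g2 <- c(1, 2, 1, 2, 1, 1)   # 2 categories

# One indicator column per category
A <- outer(g1, 1:3, "==") * 1   # 6 x 3
B <- outer(g2, 1:2, "==") * 1   # 6 x 2

# Binding by columns requires the same number of rows
D <- cbind(A, B)                # 6 x 5

# A'A: diagonal matrix of category frequencies
AtA <- t(A) %*% A
print(AtA)

# A'B: 3 x 2 crosstab of g1 by g2; B'A is its transpose
AtB <- t(A) %*% B
print(AtB)
print(table(g1, g2))            # same counts
```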
Matrix Algebra and Statistical Calculations
- Matrix functions and expressions can be used to calculate various
statistical measures, including means and covariance matrices
(01:04:02).
- Linear algebra, specifically matrix algebra, is a fundamental tool
for solving statistical problems, and is used in calculations for many
statistical methods (01:04:20).
- The crosstab or frequency table can be used to show the relationship
between two variables, with the rows and columns representing different
groups or categories (01:05:39).
- Many statistical measures and calculations, including covariance
matrices, means, regression, and crosstabs, can be expressed and solved
using matrix formulas (01:06:23).
- Matrix algebra can also be used to solve systems of linear equations
(01:06:26).
- Expressing statistical problems in terms of matrices is a key step
in solving them using linear algebra (01:06:35).
Coding Schemes and Block Matrices
- Dummy coding and one-hot coding are methods used to represent
categorical data in a numerical format, with one-hot coding being a more
comprehensive version that includes all categories, whereas dummy coding
is a reduced version that drops one of the columns (01:07:07).
- In machine learning, the term “one-hot encoding” is often used to
describe this process, and there are various other coding schemes, such
as Helmert coding, that can be used depending on the specific
application (01:07:49).
- To perform these coding schemes, a matrix formula can be used, and
it is possible to work with matrices of matrices, also known as block
matrices, as long as the rules of matrix operations are followed (01:08:46).
- Block matrices can be thought of as a way of chopping up a larger
matrix into smaller sub-matrices, and this can be useful for organizing
and analyzing data (01:09:27).
- In R, block matrices can be created and manipulated using matrix
subsetting and array subsetting, and the specific commands and functions
used will depend on the structure and organization of the data (01:10:13).
Data Structures in R
- The key objects used in R for working with data include vectors,
matrices, and data frames, and understanding how to work with these
objects is essential for performing data analysis and manipulation tasks
(01:10:54).
- A data frame is a way to combine character data and numerical data
together, allowing for the inclusion of row labels or column labels as
character data, which can be useful for certain functions like plotting
(01:11:16).
- Data frames can be thought of as matrices that allow for cheating by
including different types of data, and they are often used in functions
needed for the class (01:12:08).
- Tibbles are similar to data frames but enforce additional rules, and
many packages are not written with tibbles in mind, so it may be
necessary to convert tibbles to data frames (01:12:18).
- When loading data, it may come in as a tibble, and it is often
necessary to convert it to a data frame, which requires understanding
coercion (01:13:12).
- Character data cannot be multiplied, so it is necessary to code data
correctly, and matrices can be thought of as block matrices or
collections of smaller matrices (01:13:56).
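A small base-R sketch of the coercion issue described in this section. The data frame is made up. The tibble package is not loaded here; for a tibble, as.data.frame() is the usual base-R coercion call.

```r
# Hypothetical data frame mixing character and numeric columns
df <- data.frame(id     = c("a", "b", "c"),
                 height = c(150, 160, 170),
                 weight = c(50, 60, 70))

# Coercing everything to a matrix forces one common type: character
M_bad <- as.matrix(df)
print(is.character(M_bad))   # TRUE, so it cannot be used in %*%

# Keep only the numeric columns before doing matrix algebra
M <- as.matrix(df[, c("height", "weight")])
print(is.numeric(M))         # TRUE

# Now matrix products work
print(t(M) %*% M)
```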
Block Matrices and Matrix Operations
- A common type of matrix is a block diagonal matrix, which consists
of smaller matrices along the diagonal and zero blocks elsewhere
(01:15:37).
- Block diagonal matrices can be symmetric, with the same pattern of
values above and below the diagonal (01:14:57).
- Matrices can be composed of smaller blocks or sub-matrices, which
can be useful for understanding and working with larger matrices (01:14:15).
- Matrices can be chopped up or sliced and diced as long as the rules
are obeyed, and the resulting parts can be used for various
computations, such as transposing, which can be useful for tasks like
working with cross tabs (01:16:05).
- Block diagonal matrices are not necessarily diagonal matrices,
because the blocks themselves can contain nonzero off-diagonal entries;
it is the blocks that line up along the diagonal, and this structure
can be useful for certain computations (01:15:41).
- A matrix can be block symmetric, meaning every block is symmetric,
and this property can be useful for understanding the relationships
between variables (01:17:35).
- Matrices can be symmetric, meaning they are equal to their
transpose, and this property can be useful for understanding the
relationships between variables (01:17:41).
- In R, matrices and arrays can be subsetted using indices, and this
can be useful for extracting specific parts of a matrix or array for
further computation (01:18:28).
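To make the block and subsetting ideas concrete, here is a minimal R sketch that assembles a block diagonal matrix from two made-up blocks and then pulls the blocks back out with index subsetting.

```r
# Hypothetical blocks (made-up values)
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)   # 2 x 2 block
B <- matrix(5, nrow = 1, ncol = 1)             # 1 x 1 block

# Assemble a 3 x 3 block diagonal matrix: blocks on the diagonal,
# zero blocks elsewhere
M <- rbind(cbind(A, matrix(0, 2, 1)),
           cbind(matrix(0, 1, 2), B))
print(M)

# Subset by row and column indices to recover the blocks
print(M[1:2, 1:2])             # top-left block (equals A)
print(M[3, 3, drop = FALSE])   # bottom-right block, kept as a matrix

# A symmetric matrix equals its transpose
print(isSymmetric(M))
```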
Covariance Matrices and Data Relationships
- Matrices of the same form as the covariance matrix can represent
frequencies, cross tabs, and other types of data, and they can be used
to understand the relationships between variables (01:19:13).
- Frequencies are related to the variable with itself, similar to
variance, and can indicate variability, while cross tabs are related to
the relationships between variables, similar to covariance (01:19:20).
- Cross tabs can be seen as a type of covariance for categorical data,
and they can be used to understand the relationships between variables
(01:19:45).
Gaussian Elimination and Linear Regression
- Gaussian elimination was written up by Carl Friedrich Gauss, who also
came up with the Gaussian distribution in the same paper, and it is
related to linear regression, which was proposed by Gauss in 1795,
although other mathematicians, such as Adrien-Marie Legendre from France
and Robert Adrain from the United States, also invented it around the
same time (01:20:08).
- The reason Gauss’s paper is more well-known is that it is more
complete than the others and proposes a method for solving linear
regression, whereas the other papers only proposed the idea (01:21:45).
- The mathematicians who worked on linear regression, including Gauss,
were mostly astronomers or surveyors who were trying to figure out how
to combine multiple data points properly (01:22:08).
- In Gaussian elimination, a linear system of equations, such as Ax =
b, can be represented in different ways, including as a vector system or
a matrix equation, which are all equivalent representations of the same
information (01:24:04).
- A linear system of equations can be written as a matrix equation,
where the coefficients of the variables are arranged in a matrix, and
the variables and constants are represented as vectors (01:23:22).
- The different representations of a linear system, including the
original equations, the vector system, and the matrix equation, all
contain the same information and are equivalent (01:24:21).
- Gauss figured out that there was a matrix equation that needed to be
solved to get the regression coefficients, which is
(X^T X)^(-1) X^T y = beta hat, the OLS estimator (01:25:12).
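The OLS estimator can be sketched directly in R. The data are made up; the second computation uses solve() with two arguments to solve the linear system without forming the inverse explicitly, which is a common alternative and not something stated in the notes.

```r
# Hypothetical data (made-up values)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Design matrix with a column of ones for the intercept
X <- cbind(1, x)

# beta hat = (X'X)^(-1) X'y, written literally
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Same estimate by solving (X'X) b = X'y as a linear system
beta_hat2 <- solve(t(X) %*% X, t(X) %*% y)

print(beta_hat)
print(coef(lm(y ~ x)))   # built-in check
```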
Matrix Inversion and Gaussian Elimination
- The problem with this equation is that it requires computing an
inverse, which can be difficult, especially for large matrices (01:25:58).
- Gaussian elimination is a systematic way to solve systems of linear
equations and can be used to calculate matrix inverses or determine if a
matrix can’t be inverted (01:27:07).
- Gaussian elimination can also tell us about the concept of rank,
which will be discussed further next week (01:27:20).
Invertible Matrices and Linear Equations
- To find the inverse of a matrix, we need to find a matrix that, when
multiplied by the original matrix, equals the identity matrix (01:27:41).
- The inverse of a matrix is used to solve systems of linear
equations, similar to how division is used in ordinary algebra (01:27:56).
- For a matrix to be invertible, it must be square, meaning it has the
same number of rows and columns, and therefore the same number of
unknowns as equations (01:29:03).
- From high school algebra: to solve a system of linear equations, the
number of unknowns must be equal to the number of equations, and there
must not be linear dependence, meaning no columns or rows can be copies
of each other (01:29:14).
- However, in statistics, it is common to deal with inconsistent
systems where there are more observations than unknowns, and there are
also situations where there are more parameters or variables than
observations (01:29:41).
- Dimension reduction is a technique used to address these situations
(01:30:27).
- A 2x2 matrix can be inverted using a simple formula, but larger
matrices require more complex formulas or techniques such as the block
inverse theorem (01:30:55).
- The block inverse theorem allows for a matrix to be chopped into
smaller pieces, inverted, and then reassembled (01:31:35).
- Inverting large matrices, such as 5x5 or larger, can be impractical
and may require special formulas or techniques (01:32:12).
- A 4x4 matrix is the largest size that can be reasonably inverted by
hand, and larger matrices are typically inverted using computational
methods (01:32:18).
- The inverse of a matrix can be used to solve a system of linear
equations, and multiplying a matrix by its inverse results in the
identity matrix (01:33:08).
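A short R sketch of using an inverse to solve a linear system, with a made-up 2x2 coefficient matrix and right-hand side:

```r
# Hypothetical system A x = b (made-up values)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

# solve() with one argument returns the inverse
A_inv <- solve(A)

# The solution is x = A^(-1) b
x <- A_inv %*% b
print(x)

# A times its inverse is the identity matrix (up to rounding)
print(A %*% A_inv)

# Plugging x back in recovers b
print(A %*% x)
```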
Determinants and Matrix Inversion
- The concept of determinants can be applied to various matrix sizes,
such as 3x3, 4x4, 5x5, or 6x6, and a generalized algorithm was developed
to simplify the process (01:33:29).
- Determinants were previously used to solve equations, but they are
considered tedious and difficult to work with, especially for larger
matrices (01:33:48).
- The formula for finding the inverse of a 2x2 matrix, which involves
the determinant, has been around for a long time and is an example of a
concept that has been reinvented multiple times (01:34:50).
- The formula for the inverse of a 2x2 matrix is important for
conceptual understanding, and it is crucial to understand the
limitation of the formula, specifically that the determinant (ad - bc)
cannot be zero (01:35:30).
- The reason the determinant cannot be zero is that the formula divides
by it, and division by zero is undefined (01:35:56).
- The determinant (ad - bc) being zero implies that the two columns of
the matrix are proportional to each other, which means the matrix is not
invertible (01:36:32).
- The determinant can be represented geometrically as the area of a
parallelogram (a volume in higher dimensions), and it provides
information about the relationship between the columns or rows of the
matrix (01:37:10).
- If the columns or rows of the matrix are parallel to each other, the
determinant will be zero, indicating that the matrix is not invertible
(01:37:34).
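The 2x2 inverse formula and its zero-determinant limitation can be sketched as a small R function. The function name inv2x2 and the example matrices are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: inverse of a 2x2 matrix [a b; c d]
# using (1 / (ad - bc)) * [d -b; -c a]
inv2x2 <- function(M) {
  a  <- M[1, 1]; b <- M[1, 2]
  cc <- M[2, 1]; d <- M[2, 2]   # 'cc' avoids masking R's c() function
  det_M <- a * d - b * cc
  if (det_M == 0) stop("determinant is zero: matrix is not invertible")
  matrix(c(d, -b, -cc, a), nrow = 2, byrow = TRUE) / det_M
}

# Invertible example (made-up values)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
print(inv2x2(A))
print(solve(A))    # built-in check

# Proportional columns: second column is 2 times the first, so ad - bc = 0
P <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)
print(det(P))
res <- try(inv2x2(P), silent = TRUE)
print(inherits(res, "try-error"))
```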
Correlation Matrices and Determinants
- The inverse of a 2x2 correlation matrix with correlation coefficient
r_XY contains the factor 1 divided by (1 minus r squared), so it becomes
very large as r approaches 1 (01:38:40).
- When the correlation coefficient (r) is equal to plus or minus 1,
the inverse will be undefined because it involves dividing by 0 (01:38:57).
- In practice, a correlation coefficient of 1 or -1 is often the
result of a mistake, as it is rare to encounter such a high correlation
in real data (01:39:21).
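For a 2x2 correlation matrix the determinant is 1 - r^2, so the 2x2 inverse formula gives the factor 1 / (1 - r^2). The R sketch below checks this for an arbitrary, made-up value of r and shows how the factor grows as r approaches 1.

```r
# Hypothetical correlation (made-up value)
r <- 0.9
R <- matrix(c(1, r,
              r, 1), nrow = 2, byrow = TRUE)

# Inverse from the 2x2 formula: (1 / (1 - r^2)) * [1 -r; -r 1]
R_inv_formula <- matrix(c(1, -r,
                          -r, 1), nrow = 2, byrow = TRUE) / (1 - r^2)

print(R_inv_formula)
print(solve(R))      # built-in check

# The scaling factor grows without bound as r approaches 1
r_values <- c(0.5, 0.9, 0.99, 0.999)
print(1 / (1 - r_values^2))
```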
Non-Invertible Matrices and Linear Dependence
- A matrix can be non-invertible if one of its columns is dependent on
the other columns, as seen in the example of the matrix (012) 345-6789,
where the third column is dependent on the first column because it is
obtained by multiplying the first column by 3 (01:42:38).
- The original matrix (123) 456-7890 is also non-invertible because
its third column is dependent on the first column, as it is obtained by
adding 3 to the first column (01:42:55).
- The matrix is linearly dependent, and its columns can be expressed
as linear combinations of each other, for example, the third column is
equal to the first column plus 3 (01:43:24).
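Linear dependence can be checked numerically in R. The matrix below is a made-up example (not the one from the lecture) whose third column is 3 times the first.

```r
# Hypothetical matrix: third column = 3 * first column (made-up values)
col1 <- c(1, 2, 4)
col2 <- c(0, 1, 5)
M <- cbind(col1, col2, 3 * col1)
print(M)

# The determinant is zero (up to floating-point error)
print(det(M))

# The rank is 2 rather than 3: only two independent columns
print(qr(M)$rank)

# Trying to invert a singular matrix raises an error
res <- try(solve(M), silent = TRUE)
print(inherits(res, "try-error"))
```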
Mahalanobis Distance and Data Analysis
- The discussion of matrices and their properties is relevant to the
topic of Mahalanobis distance, which will be covered next (01:43:56).
- An example from a book by Solomon Kullback, “Information Theory and
Statistics”, is presented, which involves data from soldiers in the
United States Army in World War II, including measures of general IQ,
high school grades, and job-specific test scores (01:45:39).
- The data includes two groups of soldiers: unsuccessful candidates
(group 1) and successful candidates (group 2), with 96 and 209
observations, respectively (01:46:23).
- The mean scores are not available, but the differences between the
groups are provided, showing that the unsuccessful candidates scored
17.6 points lower on the general IQ test (01:46:45).
- IQ scores are scaled in the same way as T-scores, as rescaled
z-scores, here with a mean of 100 and a standard deviation of 15
(01:46:56).
- IQ scores are computed as z-scores, then multiplied by 15 and added
to 100 to avoid negative numbers (01:47:19).
- A mean difference of 17.6 in IQ scores is equivalent to a standard
deviation down, which is a significant difference (01:47:48).
- Unsuccessful candidates also scored lower than successful candidates
on the other two measures, by 1.8 on high school grades and by 5.3 on
the job-specific test (01:48:05).
- To understand the data, we need to know the standard deviations, but
we can start by looking at the variances (01:48:22).
- The variances of the unsuccessful and successful candidates are
presented, but we need to calculate the standard deviations to make
sense of the data (01:48:37).
- The standard deviations of the two groups are calculated, showing
the variability within each group (01:48:56).
- The data may have been subject to prescreening, which could affect
the results (01:49:16).
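The IQ rescaling and the move from variances to standard deviations described above can be sketched in R. The raw scores and the variances below are made-up illustration values, not the figures from the Kullback example; only the 17.6-point difference and the 100/15 scale come from the notes.

```r
# Hypothetical raw test scores (made up for illustration).
raw <- c(12, 15, 9, 20, 14, 17, 11, 18)

# Standardize to z-scores, then rescale to mean 100 and SD 15.
z  <- (raw - mean(raw)) / sd(raw)
iq <- 100 + 15 * z
print(round(c(mean = mean(iq), sd = sd(iq)), 2))

# The reported group difference of 17.6 points, in SD units on this scale.
d_iq <- 17.6 / 15
print(round(d_iq, 2))

# Standard deviations are the square roots of the variances
# (hypothetical variances, not the values shown in class).
variances <- c(g = 225, hs = 4, job = 25)
sds <- sqrt(variances)
print(sds)
```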
Covariance Matrix Validation
- The covariance matrices for the unsuccessful and successful
candidates are presented, but we need to determine whether they are
valid (01:49:34).
- A valid covariance matrix must be symmetric, which these matrices
appear to be, but we also need to check for positive definiteness
(01:50:08).
- The covariance matrices are compared, with one being almost diagonal
due to small off-diagonal values (01:50:37).
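One way to run the two validity checks above in R, as a sketch. The matrix `S` is a hypothetical covariance matrix, not one of the matrices from the lecture, and `bad` is a deliberately invalid one. Strictly speaking a covariance matrix only has to be positive semi-definite; positive definiteness (all eigenvalues strictly positive) is the stronger condition needed for the inverse to exist.

```r
# Hypothetical 3 x 3 covariance matrix (not the values from the lecture).
S <- matrix(c(225, 12,  5,
               12,  4,  1,
                5,  1, 25), nrow = 3, byrow = TRUE)

# Check 1: symmetry.
is_sym <- isSymmetric(S)

# Check 2: positive definiteness, i.e. all eigenvalues strictly positive.
ev    <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 0)
print(c(symmetric = is_sym, positive_definite = is_pd))

# A symmetric matrix that is NOT a valid covariance matrix: the implied
# correlation would be 20 / (sqrt(4) * sqrt(9)), which is greater than 1.
bad    <- matrix(c( 4, 20,
                   20,  9), nrow = 2, byrow = TRUE)
bad_ev <- eigen(bad, symmetric = TRUE, only.values = TRUE)$values
print(bad_ev)  # one eigenvalue is negative
```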
Correlation Analysis and Interpretation
- To compare the covariances, we can convert them to correlations by
pre- and post-multiplying each covariance matrix by the inverse of the
diagonal matrix of standard deviations (01:51:35).
- The correlation matrices of the two groups are presented, showing a
medium correlation between the g-loaded test and high school grades,
which is expected (01:52:06).
- The job-specific test does not correlate with the g-loaded test,
which is desirable as it suggests that the job-specific test is
assessing job-specific knowledge rather than general intelligence (01:52:27).
- The job-specific test is designed to measure how good someone would
be at a specific job, and it is beneficial that it does not correlate
with g-loaded tests, as it means that the assessment is not biased
towards general intelligence (01:53:04).
- The unsuccessful candidates scored a standard deviation lower on the
job-specific test, indicating that the test is assessing job-specific
knowledge rather than general intelligence (01:53:27).
- The assessment is useful because it is not strongly correlated with
a g-loaded test, which means that it is not just measuring general
intelligence, but rather job-specific knowledge (01:54:02).
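The conversion from covariances to correlations mentioned at the start of this section can be written as R = D^-1 S D^-1, where D is the diagonal matrix of standard deviations. A short R sketch follows, using a hypothetical covariance matrix and hypothetical variable names (`g`, `hs`, `job`) rather than the lecture's values; the result is compared against the built-in `cov2cor()`.

```r
# Hypothetical covariance matrix with made-up variable names.
vars <- c("g", "hs", "job")
S <- matrix(c(225, 12,  5,
               12,  4,  1,
                5,  1, 25),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# D^-1: diagonal matrix holding 1 / standard deviation of each variable.
D_inv <- diag(1 / sqrt(diag(S)))

# Correlation matrix: pre- and post-multiply S by D^-1.
R <- D_inv %*% S %*% D_inv
dimnames(R) <- dimnames(S)
print(round(R, 3))

# The built-in conversion should agree with the matrix formula.
R_builtin <- cov2cor(S)
print(all.equal(R, R_builtin))
```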
Regression and Temporal Precedence
- Matrix formulas can be used to compute the regression of the
job-specific test score on the g-loaded test and high school grades,
but it is more meaningful to regress the job-specific test score on
high school GPA, because high school GPA comes first in time (01:54:32).
- Regressing the old thing on the new thing is not desirable, as it
does not make sense temporally; it is generally more meaningful to
regress the new thing on the old thing (01:55:10).
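As a sketch of the matrix formulas mentioned above: the slopes of a regression can be computed from the covariance matrix alone as b = S_xx^-1 s_xy, and the intercept from the means. The data below are simulated with arbitrary, made-up parameters (they are not the Army data), and the variable names `g`, `hs`, and `job` are assumptions; `lm()` is used only to confirm that the two routes agree.

```r
set.seed(1)
n   <- 200
# Simulated stand-ins for the three measures (arbitrary parameters).
g   <- rnorm(n, mean = 100, sd = 15)
hs  <- 0.05 * g + rnorm(n)
job <- 2 * hs + rnorm(n, sd = 3)

X <- cbind(g = g, hs = hs)
S <- cov(cbind(X, job = job))

# Partition the covariance matrix into predictor and outcome blocks.
S_xx <- S[c("g", "hs"), c("g", "hs")]
s_xy <- S[c("g", "hs"), "job"]

# Slopes: b = S_xx^-1 s_xy; intercept: mean(y) - b' mean(x).
b <- solve(S_xx, s_xy)
a <- mean(job) - sum(b * colMeans(X))

# Compare with the usual regression function.
fit <- lm(job ~ g + hs)
print(rbind(matrix_formula = c(a, b), lm = coef(fit)))
```

With a single predictor this reduces to the slope being the covariance of x and y divided by the variance of x.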
Data Simulation and Covariance Matrices
- The mvrnorm function in the MASS library can be used to simulate
data, which can be useful for working with covariance matrices
(01:55:51).
- A multivariate normal data set can be simulated with specific
properties, including a mean vector and a covariance matrix, using the
empirical flag, which forces the new data set to have exactly those
sample properties (01:56:25).
- The empirical flag allows for the creation of a data set with a
specific correlation structure, mean vector, and standard deviations,
making it useful for testing specific hypotheses, but not for simulation
studies or bootstrapping, because every simulated data set would have
identical sample moments (01:57:32).
- Computing covariance by hand is not recommended and is only done for
demonstration purposes; instead, a function should be used (01:57:57).
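A minimal sketch of the simulation described above, using `mvrnorm()` from the MASS package with `empirical = TRUE`. The mean vector and covariance matrix are hypothetical values chosen for illustration; the sample size of 96 simply echoes the size of group 1.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical target mean vector and covariance matrix.
vars  <- c("g", "hs", "job")
mu    <- c(g = 100, hs = 10, job = 50)
Sigma <- matrix(c(225, 12,  5,
                   12,  4,  1,
                    5,  1, 25),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

set.seed(42)

# empirical = TRUE: the SAMPLE mean and covariance match mu and Sigma exactly.
X_emp <- mvrnorm(n = 96, mu = mu, Sigma = Sigma, empirical = TRUE)

# Default (empirical = FALSE): mu and Sigma are population values,
# so the sample statistics vary from draw to draw.
X_rnd <- mvrnorm(n = 96, mu = mu, Sigma = Sigma)

print(all.equal(unname(colMeans(X_emp)), unname(mu)))
print(all.equal(unname(cov(X_emp)), unname(Sigma)))
print(round(cov(X_rnd), 2))
```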
Data Standardization and the Scale Function
- The scale function is a useful tool that centers and/or standardizes
columns of data, allowing for the creation of z-scores (01:58:31).
- The scale function can be used to standardize data, eliminating the
mean and resulting in standardized differences or z-scores (01:59:54).
- Standardizing data does not change the correlation matrix, which
remains the same as before standardization (02:00:39).
- The Mahalanobis distance is a measure that centers, standardizes,
and rotates data to a specific direction in multivariate space
(02:00:42).
- Using the scale function to standardize data results in a
correlation matrix that is identical to the original correlation matrix
(02:01:10).
- Calculating covariance matrices by hand should be avoided unless
specifically required; instead, the built-in cov and cor functions
should be used, as they are more practical for real-life applications
(02:01:27).
- There are situations where the function is not working as desired,
and knowing how to calculate Covariance matrices
can be helpful in troubleshooting (02:01:43).
- In some cases, all cross products are needed instead of a covariance
matrix, and knowing how to calculate them can be useful (02:01:58).
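The points in this section can be checked with a short R sketch on simulated data (the data and column names are made up). It standardizes with `scale()`, confirms that the correlation matrix is unchanged, rebuilds the covariance and correlation matrices by hand from cross products for comparison with `cov()` and `cor()`, and shows the raw cross-products matrix.

```r
set.seed(7)
# Simulated data with arbitrary means and SDs (illustration only).
X <- cbind(a = rnorm(50, 100, 15),
           b = rnorm(50, 10, 2),
           c = rnorm(50, 50, 5))
n <- nrow(X)

Z  <- scale(X)                 # center and divide by column SDs: z-scores
Xc <- scale(X, scale = FALSE)  # center only

# Standardizing leaves the correlation matrix unchanged.
print(all.equal(cor(Z), cor(X)))

# By hand: covariance = cross products of centered data / (n - 1).
S_hand <- crossprod(Xc) / (n - 1)
# By hand: correlation = cross products of z-scores / (n - 1).
R_hand <- crossprod(Z) / (n - 1)
print(all.equal(S_hand, cov(X)))
print(all.equal(R_hand, cor(X)))

# Raw (uncentered) sums of squares and cross products, t(X) %*% X.
raw_cp <- crossprod(X)
print(round(raw_cp, 1))
```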
Future Topics: M Distance, Eigenvalues, and SVD
- The next topics to be covered will be Mahalanobis distance,
eigenvalues, and singular value decomposition, which will lay the
groundwork for principal components analysis and correspondence
analysis (02:02:05).
- The singular value decomposition will be used to develop the
technology needed for principal components analysis and correspondence
analysis (02:02:24).
- The next class will cover another technique, followed by a week off,
and then the focus will shift to applying the techniques learned,
including using singular value decomposition and bootstrapping to assess
the stability of answers (02:02:43).
- The class will cover various topics, including principal component
analysis, correspondence analysis, multi-dimensional scaling,
clustering, discriminant analysis, regularization, bootstrapping,
auto-encoding, cross-validation, and permutation (02:04:06).
- The syllabus includes topics such as regularization, bootstrapping,
auto-encoding, the jackknife, permutation, and cross-validation, which
will be covered in the remaining classes (02:04:18).