MVA Class 5 — M-distances, Eigendecomposition, SVD

Introduction and Course Overview
- The class is about to start, and the instructor, Jay Verkuilen,
mentions that the homework is due on Monday and that he will provide hints on
the questions during the last 15-20 minutes of class (00:08:19).
- Jay Verkuilen plans to discuss the Mahalanobis distance (M-distance)
and singular value decomposition, which are connected to the previous topics of
covariance matrices and matrix multiplication (00:09:44).
- The class will also cover eigenvalues, which are substantially
connected to the other topics (00:09:51).
- Jay Verkuilen mentions that the topics are connected and will
provide a high-level overview, with readings available for those who
want a more in-depth understanding (00:10:41).
- The instructor posts a reminder in the chat that the last 15 minutes
of the class will be dedicated to discussing the homework (00:10:34).
Mahalanobis Distance
- The class begins with a discussion of the Mahalanobis distance, named
after the Indian statistician Prasanta Chandra Mahalanobis, who developed
the concept (00:11:32).
- Mahalanobis was a physicist who studied Einsteinian relativity at
Cambridge and later worked as a statistician in India, contributing to
the foundations of multivariate statistics in the 1930s (00:11:39).
- Principal component analysis (PCA) has its roots in the 19th century, but the
modern version was developed in the 1930s, with Mahalanobis working on
the concept at an anthropological laboratory, studying skulls and trying
to make measurements in different units comparable (00:12:43).
- When dealing with one dimension or variable, one of the most obvious
ways to make measurements comparable is to z-score them; because the deviation
from the mean and the standard deviation carry the same units, the units cancel
out (00:13:58).
- Used as a distance, the z-score is unsigned: it is calculated as
the absolute value of y minus the mean, divided by the standard
deviation (00:14:39).
- In the 1920s, researchers started dealing with multivariate
observations by rescaling variables by their standard deviations,
effectively z-scoring each variable but leaving the correlation
structure alone (00:15:20).
- This approach converts the covariance matrix into a correlation
matrix but fails to take the correlation structure of the variables into
account (00:16:00).
- Mahalanobis, with his background in general relativity, proposed using
ideas from it to decorrelate the variables, which can be achieved by taking
the inverse of the covariance matrix, denoted Sigma inverse (00:17:11).
- Mahalanobis' approach involves using the inverse of the covariance
matrix to decorrelate the variables, which is a key concept in principal
component analysis (00:17:18).
- The Mahalanobis distance is a proper multivariate generalization of the
z-score; it is usually unsigned because it is difficult to assign a sign to a
vector, with the possible exception of cases uniformly below the mean, which
could be considered negative (00:17:41).
- The actual Mahalanobis distance D_M is obtained by taking the
square root of D_M squared, but in practice the squared Mahalanobis
distance is often used instead (00:17:53).
- The reason for using the squared Mahalanobis distance is its
connection to the Chi-square distribution, allowing for reference to the
Chi-square table, and because the ordering of the squared and unsquared
distances is the same (00:19:20).
- The choice between using the squared or unsquared Mahalanobis
distance depends on the function used, and it’s recommended to check the
documentation or perform test cases to determine the output (00:19:05).
- The multivariate analog of the standard deviation (sigma) is needed to
calculate the Mahalanobis distance; it is a square-root matrix, the positive
square root of the variance-covariance matrix (00:21:23).
- The base R function uses the squared distance, but it’s essential to
check the documentation to confirm the output of specific functions (00:21:40).
- To obtain the Mahalanobis distance, one can take the square root of the
squared Mahalanobis distance; the choice between the two often depends on the
specific application or interest (00:20:10).
- The Mahalanobis distance is built from the multivariate z-score
z = S^{-1}(y - mu), where S is the multivariate analog of sigma, and it is
essential to find the square-root matrix of the variance-covariance
matrix to obtain S (00:20:50).
- To verify the calculation of Mahalanobis distances, one can compare
the results to a base function or check if the values are equal to the
square root of the expected value, with Peter Johnson suggesting this as
an easy check (00:22:09).
- Jay Verkuilen mentions that another way to verify the calculation is
to see if the Mahalanobis distances are scaling correctly, specifically
that they should scale like the square root of the number of variables
if the values are not squared, or like the number of variables if they
are squared (00:22:38).
- Verkuilen explains that if the Mahalanobis distances are scaling like
the square root of the number of variables, most of the values should be
between 0 and 2 times the square root of the number of variables (00:23:01).
- Verkuilen notes that the square root of 5 is approximately 2.2, and
that the values should be close to this range if they are scaling
correctly (00:23:20).
- Verkuilen emphasizes that the best way to verify the calculation is
to check the documentation, rather than relying on these rough checks (00:23:31).
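As a concrete illustration of these checks, here is a minimal R sketch (using simulated data, since the class dataset is not reproduced here) that computes squared Mahalanobis distances with the base mahalanobis() function, verifies one case by hand, and applies the rough scaling checks described above.

```r
# Minimal sketch with simulated data (the class data are not reproduced here).
set.seed(1)
Y <- matrix(rnorm(500 * 5), ncol = 5)           # 500 cases, p = 5 variables
mu    <- colMeans(Y)
Sigma <- cov(Y)

d2 <- mahalanobis(Y, center = mu, cov = Sigma)  # base R returns SQUARED distances

# Hand check for one case: (y - mu)' Sigma^{-1} (y - mu)
y1 <- Y[1, ]
d2_manual <- t(y1 - mu) %*% solve(Sigma) %*% (y1 - mu)
all.equal(as.numeric(d2_manual), d2[1])

# Rough scaling checks from the lecture: squared distances average about p,
# unsquared distances about sqrt(p) (sqrt(5) is roughly 2.2).
mean(d2)
mean(sqrt(d2))
```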
Matrix Square Root, Eigenvalues, and Eigenvectors
- The concept of the square root of a matrix is introduced, which is necessary
to define the Mahalanobis distance, and Verkuilen notes that this requires an
understanding of eigenvalues and eigenvectors (00:23:43).
- Verkuilen explains that the matrix square root is not simply the
square root of each element in the covariance matrix, but rather a more
complex operation that requires the use of eigenvalues and eigenvectors
(00:24:14).
- If Y is multivariate normal or approximately multivariate normal,
the squared Mahalanobis distance is approximately chi-square with
degrees of freedom given by the dimension of Y (00:24:52).
- Verkuilen notes that most of the time, people use squared
Mahalanobis distances in an ordinal fashion, but if exact values are
needed, the square root of the distance must be taken (00:25:24).
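Continuing the simulated objects from the sketch above (d2 and p = 5), the chi-square connection can be used as a rough reference for flagging unusually distant cases; the 0.975 quantile below is an illustrative choice, not a cutoff prescribed in the lecture.

```r
# Chi-square reference for squared Mahalanobis distances (approximate under
# multivariate normality); continues the objects from the previous sketch.
p <- 5
cutoff <- qchisq(0.975, df = p)   # value you would look up in a chi-square table
which(d2 > cutoff)                # cases unusually far from the mean

# If unsquared distances are wanted, take the square root; ordering is unchanged.
d <- sqrt(d2)
```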
Dimensions and Calculations of Mahalanobis Distance
- The dimensions of the mean vector are P by 1, but when transposed,
it becomes 1 by P, where P is the number of variables in the column
vector y (00:26:53).
- The dimensions of the covariance matrix are P by P, where P is the
number of variables in the random vector (00:27:24).
- Inverting the covariance matrix operates similarly to standardization, and it
can be used to calculate the squared Mahalanobis distance (00:27:41).
- The squared Mahalanobis distance can be calculated as z transpose z,
where z is the multivariate analog of the z-score (00:28:47).
- The covariance matrix Sigma must be invertible
to calculate the Mahalanobis distance, but there are ways to work around
this if Sigma is not invertible (00:29:14).
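The multivariate z-score described above can be made concrete with a symmetric square root of the covariance matrix, built from its eigendecomposition (anticipating the next section). This sketch continues the simulated objects (Sigma, mu, y1, d2) from the earlier one.

```r
# Symmetric (positive) square root of Sigma via eigendecomposition, and the
# multivariate z-score z = S^{-1}(y - mu), so that sum(z^2) = squared distance.
es     <- eigen(Sigma, symmetric = TRUE)
S_half <- es$vectors %*% diag(sqrt(es$values)) %*% t(es$vectors)

all.equal(S_half %*% S_half, Sigma)   # S %*% S recovers the covariance matrix

z <- solve(S_half, y1 - mu)           # decorrelated, standardized deviations
sum(z^2)                              # matches d2[1] from the earlier sketch
```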
Eigendecomposition and Singular Value Decomposition (SVD)
- The symmetric square-root matrix depends on the properties of eigenvalues,
which will be discussed in relation to the eigendecomposition of a matrix and
the singular value decomposition (00:29:53).
- The motivation for eigendecomposition and singular value
decomposition is to break down a complicated matrix into simpler
components (00:30:30).
- Eigendecomposition and singular value decomposition can be used to
simplify complex matrices, much as composite numbers can be broken down into
prime factors that cannot be broken down further (00:31:04).
- The concept of prime factorization is discussed, where a number is
broken down into its simplest components, such as 270 being factored
into 2 times 3 cubed times 5 (00:31:23).
- This concept also applies to polynomials, which can be factored into simpler
components, such as x^3 + x^2 - x - 1 factoring into (x + 1)^2 (x - 1) (00:32:39).
- The goal of eigenvalues, eigen decomposition, and singular value
decomposition is to take complex matrices and break them down into
simpler, more understandable components (00:33:23).
- The concept of rank is introduced, which refers to the underlying
dimensionality of a matrix; the rank is at least one for any nonzero matrix,
with the zero matrix having rank zero (00:34:09).
- The rank of a matrix X, which is N by P, is less than or equal to
the minimum of N and P, meaning that if N is 5,000 and P is 10, the rank
of the matrix will be 10 or less (00:35:00).
- An example of a rank-one matrix is given, where a 3 by 1 matrix is
multiplied by a 1 by 3 matrix, resulting in a 3 by 3 matrix, also known
as an outer product (00:35:29).
- In the example, the resulting matrix has rows that are all the same (each a
multiple of the same row vector), indicating two one-dimensional things
interacting with each other and producing a highly structured outer product (00:36:06).
- The outer product of two vectors is rank one, and invertible matrices are
full rank, meaning their rank equals the number of rows (or columns), and they
must be square (00:37:02).
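A quick R sketch of the outer-product example (with arbitrary illustrative vectors): the product of a 3 by 1 and a 1 by 3 vector is a 3 by 3 matrix of rank one, while an invertible square matrix is full rank.

```r
# Outer product of two vectors: a rank-one 3 x 3 matrix.
a <- c(1, 2, 3)          # treated as 3 x 1
b <- c(4, 5, 6)          # t(b) is 1 x 3
A <- a %*% t(b)          # every row is a multiple of b
A
qr(A)$rank               # 1: only one underlying dimension

# By contrast, an invertible (full-rank) square matrix:
B <- diag(c(3, 2, 1))
qr(B)$rank               # 3
```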
- A 10 by 10 matrix with full rank is invertible, and the singular value
decomposition (SVD) can be used to write a matrix as the product of three
matrices: U, D, and V transpose (00:37:39).
- The SVD breaks down a matrix into 3 components: row singular vectors
(U), column singular vectors (V), and singular values (D) (00:38:04).
- The singular values (the diagonal entries of D) are arranged from greatest to
least and are all nonnegative; the smallest can be zero (00:39:24).
- U and V are orthogonal matrices, meaning their transposes are their
inverses, and U transpose U equals the identity or V transpose V equals
the identity (00:39:55).
- Orthogonal matrices have geometric meanings and are characterized by
their transposes being their inverses (00:40:10).
- Singular Value Decomposition (SVD) involves a rotation, a stretch,
and another rotation, which can be thought of as a decomposition into P
rank one layers, where each layer is an outer product of two vectors
with weights represented by the diagonal matrix D (00:41:10).
- The singular value decomposition creates layers of rank-one approximations,
with each subsequent layer orthogonal to the previous ones; typically only a
few layers are used, such as 2 or 3 dimensions in principal component
analysis (PCA) (00:41:53).
- The SVD formula weights each rank-one matrix by its singular value; the first
rank-one layer is the best approximation, the next is the next best, and so on,
similar to residual analysis (00:42:30).
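A sketch of this layer-by-layer view of the SVD, using a small simulated matrix: each layer is a singular value times an outer product of singular vectors, and adding all layers back recovers the original matrix.

```r
# SVD as a sum of rank-one layers d_k * u_k %*% t(v_k), best layers first.
set.seed(2)
X  <- matrix(rnorm(20 * 4), nrow = 20)   # hypothetical 20 x 4 data matrix
sv <- svd(X)                             # X = U D V'

layer   <- function(k) sv$d[k] * sv$u[, k] %*% t(sv$v[, k])
X_rank1 <- layer(1)                      # best rank-one approximation
X_rank2 <- X_rank1 + layer(2)            # add the next-best layer
X_full  <- Reduce(`+`, lapply(1:4, layer))

all.equal(X_full, X)                     # all layers together recover X
sum((X - X_rank2)^2)                     # residual left after two layers
```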
- An example of an orthogonal matrix is provided: a 2 by 2 matrix Q with
entries 1 over root 2, 1 over root 2, 1 over root 2, and negative 1 over
root 2, which can be factored to show a contrast matrix (00:43:10).
- Orthogonal matrices and singular value decomposition have
similarities to contrasts in analysis of variance (ANOVA), where the
grand mean is removed, followed by main effects, interactions, and so on
(00:43:44).
- Multiplying Q transpose by Q results in the identity matrix, and the
matrix Q is symmetric (00:44:42).
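Here is the 2 by 2 example written out in R, verifying that the transpose of Q is its inverse and that Q preserves vector length (the defining geometric property of an orthogonal matrix).

```r
# The 2 x 2 orthogonal matrix from the example: Q = (1/sqrt(2)) * [1 1; 1 -1].
Q <- matrix(c(1, 1,
              1, -1), nrow = 2, byrow = TRUE) / sqrt(2)
round(t(Q) %*% Q, 10)        # identity matrix: the transpose is the inverse

# Orthogonal matrices rotate/reflect but do not stretch: lengths are preserved.
v <- c(3, 4)
sqrt(sum(v^2))               # length 5
sqrt(sum((Q %*% v)^2))       # still 5
```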
- The process of decomposing a matrix into simpler components allows
for understanding what’s happening to vectors as they are multiplied,
and it enables taking steps sequentially to analyze the transformation
(00:45:18).
- By breaking down the multiplication by a matrix into simpler
components, it’s possible to think about it as a multi-step process,
such as decomposing P into UDV transpose, and then multiplying U, D, and
V transpose sequentially (00:46:11).
- This decomposition also enables intervention during the process,
such as dealing with a matrix that’s close to being singular, which will
be discussed in the topic of regularization (00:46:37).
- Breaking down the matrix into meaningful pieces, such as rotation,
stretch, and rotation, provides an ability to understand what the matrix
is doing (00:47:03).
- Multiplying a vector by a matrix can change the length of the
vector, and sometimes it’s desirable to preserve the original length,
which can be achieved by using a different matrix, such as Q (00:48:04).
- Matrix Q is designed to preserve the length of the vector, unlike
matrix P, which changes the length (00:49:04).
- Understanding matrices as actions on vectors, and being able to
decompose them into simpler components, allows for a deeper
understanding of what the matrix is doing (00:49:13).
- Multiplying by a diagonal matrix stretches the data; for example, multiplying
the vector (1, 1, 1) by diag(3, 2, 1) stretches it to (3, 2, 1) (00:49:43)
- Singular values have no specific interpretation at this point, but they
will take on meaning in the context of principal component analysis, which is
the next class (00:50:04)
- Singular values will be interpreted as having a meaning in terms of
the data when PCA is covered (00:50:33)
- One of the most useful things that can be done with the singular value
decomposition is to test whether a matrix is rank deficient, that is, not
full rank (00:51:00)
- SVD can be used to determine how many dimensions a matrix has lost,
which can be useful in identifying rank deficiency (00:51:07)
- An example of using SVD to identify rank deficiency is in a multiple
imputation project, where the algorithm was crashing constantly due to a
rank deficient covariance matrix (00:52:22)
- The problem in the multiple imputation project was caused by
variables that were functionally dependent on each other, such as items
and a total score, which is dependent on the items (00:53:40)
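The items-plus-total-score situation can be reproduced in a small simulated example: the total is a linear combination of the items, so the covariance matrix loses a dimension, and the SVD makes that visible. The data below are made up purely for illustration.

```r
# Detecting rank deficiency with the SVD: a total score that depends on the items.
set.seed(3)
items <- matrix(rnorm(100 * 4), ncol = 4)
dat   <- cbind(items, total = rowSums(items))   # total = exact linear combination

S <- cov(dat)
d <- svd(S)$d
round(d, 8)                  # the last singular value is numerically zero
sum(d > 1e-8 * d[1])         # numerical rank: 4, not 5 -> one dimension lost
```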
- Multiple imputation is a difficult and challenging task, especially
when dealing with missing data, and can take a significant amount of
time to complete (00:51:57)
- The use of SVD helped to identify the rank deficiency in the
covariance matrix, which was causing the multiple imputation algorithm
to crash (00:53:12)
- Multiple imputation involves running a large number of regression
analyses, which can be challenging when dealing with a large dataset,
especially if it’s not well-documented, as was the case with a dataset
containing around 150 variables (00:54:25).
- Singular value decomposition (SVD) was found to be a useful tool in
identifying the dependencies in the dataset and understanding the
matrix’s rank (00:55:12).
- SVD is considered a crucial numerical algorithm, possibly the second
most important after Gaussian elimination, and is used in various
applications, including Google (00:55:51).
- By repeatedly running SVD and examining the documentation, it was
possible to identify and remove problematic variables, eventually
obtaining a matrix with no zero eigenvalues (00:56:06).
- The process of identifying the dependencies and running the analysis
was time-consuming, taking several days to complete, due to the
complexity of the analysis and the limited computing power at the time
(00:56:44).
Eigenvalues and Eigenvectors for Symmetric Matrices
- The discussion will focus on eigenvalues and eigenvectors, specifically for
symmetric matrices, as the non-symmetric case can be more complex (00:57:33).
- The eigenvalue eigenvector decomposition for a symmetric matrix A
can be represented as A = UΛU^T, where U is orthogonal, Λ is diagonal,
and U^T is orthogonal (00:58:04).
- The application of Eigenvalues and Eigenvector decomposition will
focus on symmetric matrices, which are commonly used in various fields,
such as symmetric covariance matrices, correlation matrices, and
distance matrices (00:58:48).
- Symmetric matrices are preferred because sums-of-squares (cross-product)
matrices are symmetric, and they are widely used in many applications (00:58:55).
- There are other applications of Eigenvalues and Eigenvectors that
involve non-symmetric matrices, such as Markov chains, which are
discrete time series models used to track the movement of individuals or
objects (00:59:31).
- Markov chains can be used to model various phenomena, such as
college persistence, where a person’s decision to register or stop out
is tracked over time (00:59:58).
- In cases where non-symmetric matrices are used, the eigenvalues and
eigenvectors will not be the same as those obtained from symmetric
matrices (01:00:24).
- To create a symmetric matrix, a matrix X can be multiplied by its
transpose, X transpose, resulting in a symmetric matrix X transpose X (01:00:42).
- Using the SVD, X can be expressed as the product of three matrices, U, D,
and V transpose, where U and V are orthogonal matrices and D is a diagonal
matrix (01:01:04).
- Substituting the SVD of X, the product X transpose X equals V times D squared
times V transpose, which has the same form as the eigendecomposition (01:01:44).
- This shows that a symmetric matrix can always be obtained by
multiplying a matrix by its transpose, and the resulting matrix can be
decomposed into its Eigenvalues and Eigenvectors (01:02:13).
- The same process can be applied to the product X times X transpose,
resulting in a symmetric matrix that can be decomposed into its
Eigenvalues and Eigenvectors (01:02:34).
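A short sketch of this relationship with a small simulated matrix: the eigenvalues of X'X are the squared singular values of X, and the eigenvectors match the right singular vectors up to sign.

```r
# Relationship between svd(X) and eigen(t(X) %*% X), on hypothetical data.
set.seed(4)
X <- scale(matrix(rnorm(30 * 3), ncol = 3), center = TRUE, scale = FALSE)

sv <- svd(X)                            # X = U D V'
ee <- eigen(t(X) %*% X)                 # X'X = V D^2 V'

all.equal(sv$d^2, ee$values)            # squared singular values = eigenvalues
round(abs(sv$v) - abs(ee$vectors), 8)   # same vectors up to sign flips
```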
- Squared singular values are also known as eigenvalues, and either may be
listed in principal component analysis output depending on the program used;
FactoMineR is an example that calls them eigenvalues (01:02:54).
- These eigenvalues function like variances if the matrix A is
positive semi-definite, meaning it follows the rules of a correlation
matrix, including being symmetric and not having extremely strong
correlations between variables (01:03:18).
- Being positive semi-definite is a characteristic of all covariance matrices
and distance matrices (01:04:01).
- The computation of eigenvalues and eigenvectors is typically done
using a computer, as it can be a complex and time-consuming process,
especially for large matrices (01:04:16).
- The development of numerical linear algebra, including eigenvalue
algorithms, began in the 1950s, with some of the earliest algorithms
being developed by psychometricians who worked with multivariate data (01:05:28).
- One of the first eigenvalue algorithms was programmed in the 1950s,
although the specific problem it was solving is not remembered (01:05:46).
- The development of numerical linear algebra was necessary due to the
lack of computers before the 1950s, which made solving large matrix
problems difficult (01:06:17).
- Inverting a 10 by 10 matrix was a challenging task in the past, and
it was not until the development of computers that such tasks became
more manageable (01:06:40).
- An Eigenvector is a vector that, when multiplied by a matrix A,
results in a scaled version of itself, with the eigenvalue being the
scalar (01:07:10).
- The concept of Eigenvectors and eigenvalues can be understood
geometrically, although this may not be intuitive for everyone (01:07:10).
- An Eigenvector is a vector that, when multiplied by a matrix, only
gets stretched or shrunk, but does not change direction, and its
eigenvalue determines the amount of stretching or shrinking (01:07:31).
- When a matrix is applied to one of its eigenvectors, the result is the same
vector scaled by the corresponding eigenvalue, and this property makes
eigenvectors useful in linear algebra (01:08:02).
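The stretch-without-rotation idea can be checked directly in R on a small symmetric matrix (chosen here just for illustration): multiplying the matrix by an eigenvector gives the same vector scaled by its eigenvalue.

```r
# An eigenvector is only stretched or shrunk, never rotated.
A  <- matrix(c(2, 1,
               1, 2), nrow = 2, byrow = TRUE)   # small symmetric example
ee <- eigen(A)
lambda <- ee$values[1]        # 3
v      <- ee$vectors[, 1]     # eigenvector for the largest eigenvalue

A %*% v                       # same direction as v ...
lambda * v                    # ... scaled by the eigenvalue
all.equal(as.numeric(A %*% v), lambda * v)
```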
- A vector that is close to an eigenvector will remain close to that
eigenvector after multiplication by the matrix, and this concept is related to
the Gershgorin disk, which provides a way to quantify how close is close (01:10:00).
- The Gershgorin disk is a concept in advanced linear algebra that provides a
way to determine how close a vector is to an eigenvector; it is named after a
Russian mathematician whose name has several transliterations, such as
Gershgorin or Gerschgorin (01:10:22).
- When eigenvalues are too close to each other, it can be difficult to
determine the correct orientation of a configuration of points, and this
can lead to poorly identified models or ambiguous results (01:10:52).
- In general, distinct eigenvalues are desirable, as they provide a
clear and unique solution, but when eigenvalues are close to each other,
it can be challenging to distinguish between different aspects of the
data (01:11:26).
- The gaps between eigenvalues are related to how well-determined the
corresponding eigenvectors are, with larger gaps indicating more clearly
identified eigenvectors (01:11:42).
Homework Discussion and Data Analysis Example
- The process of calculating Mahalanobis distances does not require manually
computing the covariance, as this was already done previously; the focus is on
rerunning the steps up to generating x1 and x2, as well as z1 and z2,
which involves z-scoring the columns without taking dependencies into
account (01:21:53).
- The eigendecomposition and singular value decomposition of the covariance
matrices s1 and s2 are examined, where s1 is the covariance matrix for
individuals who failed in a program and s2 is the covariance matrix for those
who succeeded (01:22:42).
- The three variables in the study are an IQ score, high school GPA,
and a job-specific test score, each with different metrics, such as IQ
scores on the usual mean-100, SD-15 metric and high school GPA on a 0 to 4
metric (01:23:14).
- The Eigenvalues and Eigenvectors of the covariance matrix s1 are
obtained using the eigen function in R, which returns a vector of
Eigenvalues and a matrix of Eigenvectors (01:24:21).
- The eigendecomposition involves a diagonal matrix of eigenvalues, but the
eigen function returns the eigenvalues as a vector, requiring the use of the
diag function to reconstruct the diagonal matrix (01:24:56).
- The Eigenvectors are also obtained, and it is noted that they can be
used to reconstruct the original covariance matrix s1 by multiplying the
Eigenvectors, the diagonal matrix of Eigenvalues, and the transpose of
the Eigenvectors (01:25:26).
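The reconstruction step looks like the sketch below. The matrix s1 from the homework data is not reproduced here, so a placeholder 3 by 3 covariance-like matrix stands in; the formula U Lambda U' is the same either way.

```r
# Reconstructing a covariance matrix from its eigendecomposition.
# s1 is a PLACEHOLDER here; in the homework it is the group-1 covariance matrix.
s1 <- matrix(c(225,   6,  30,
                 6, 0.9, 1.2,
                30, 1.2, 100), nrow = 3, byrow = TRUE)

ee     <- eigen(s1)
Lambda <- diag(ee$values)             # eigen() gives a vector; diag() rebuilds the matrix
U      <- ee$vectors

s1_rebuilt <- U %*% Lambda %*% t(U)   # shuffle, stretch, shuffle back
all.equal(s1, s1_rebuilt)
```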
- Examining the eigenvectors, each row or column has one value that is large in
magnitude, with the remaining values being small or close to zero (01:26:49).
- The relationship among the variables is not particularly strong, as
indicated by the fairly small correlations, which is why the
Eigenvectors end up having big and small values (01:27:13).
- Multiplying the eigenvectors by the eigenvalues helps recover the
variances, and the last step rotates them back into position (01:27:57).
- Rounding numbers can be a useful tool for understanding, as it can
help simplify the matrix and make it easier to comprehend (01:28:58).
- The Eigenvector matrix is not a resizing tool, but rather a tool
that flips, flops, or reorders the elements (01:29:30).
- Multiplying a vector by the Eigenvector matrix can result in
negative elements and a change in the order of the elements (01:29:50).
- The Eigenvector matrix is essentially shuffling the elements around,
and it can be used to reconstruct the original matrix by multiplying it
with its transpose and the Eigenvalues (01:30:30).
- The process of reconstructing the original matrix involves shuffling
the elements, stretching or shrinking them, and then shuffling them
again to get back to the original position (01:31:20).
- The Eigenvector matrix times its transpose can put the elements back
in their original places, and the Eigenvalues can rescale them back to
the original values (01:31:05).
- The Eigenvalues obtained from a matrix have interpretations, with
the largest Eigenvalue associated with the strongest variance, the next
largest with the next strongest variance, and so on, taking into account
all correlations and interrelationships (01:31:32).
- The method of obtaining these eigenvalues is not arbitrary, as demonstrated
by the specific values obtained from the example matrix, approximately 1.14,
4.27, and 27; the function returns them flipped around, sorted from largest to
smallest (01:32:06).
- The eigen function returns a vector and a matrix, while the svd function
returns a vector and two matrices (01:33:15).
- The svd function applied to a covariance matrix gives the same
eigenvectors as obtained previously, but the singular values are not the
same as the eigenvalues (01:33:56).
- To obtain the same singular values as the Eigenvalues, the data
needs to be column-centered, and the singular values are the square
roots of the Eigenvalues (01:34:30).
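The relationship can be verified on any numeric data matrix; a sketch with simulated data is below, with the 1/(n - 1) scaling of the covariance matrix made explicit.

```r
# SVD of the column-centered data versus eigenvalues of the covariance matrix.
set.seed(5)
X  <- matrix(rnorm(50 * 3), ncol = 3)           # hypothetical data
Xc <- scale(X, center = TRUE, scale = FALSE)    # column-center only
n  <- nrow(X)

sv <- svd(Xc)$d
ev <- eigen(cov(X))$values

all.equal(sv^2 / (n - 1), ev)   # squared singular values / (n - 1) = eigenvalues
```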
- The Eigenvectors obtained from the Svd of a correlation matrix are
the same as those obtained from the covariance matrix, except they can
differ by a sign (01:36:55).
- Multiplying an eigenvector by negative one gives an equally valid
eigenvector; the sign is arbitrary (01:37:01).
- The scale function is useful in data analysis, and there is a more
advanced version, the sweep function, that can perform powerful
operations (01:37:17).
- Applying the singular value decomposition to a correlation matrix requires
dividing by the standard deviations, but once this is done, the results are
consistent with the eigenvectors obtained from the covariance matrix (01:37:35).
- Singular value decomposition (SVD) and eigendecomposition produce singular
values and eigenvalues, as well as singular vectors and eigenvectors, which
are often the same (01:37:41).
- Double centering is a process that removes both column and row means
from a matrix, which can be useful in certain situations, such as when
individual differences among people need to be removed (01:38:21).
- When double centering is applied, the eigenvectors of the resulting
matrix are often very similar to each other, and the eigenvalues can be
affected, sometimes resulting in a rank deficient matrix (01:39:32).
- Double centering can potentially remove individual differences among
people by subtracting overall individual differences (01:41:57).
- The effect of double centering on eigenvalues can be seen by
comparing the eigenvalues of the original data and the double-centered
data, which can result in similar values (01:42:01).
- Double centering can make a matrix rank deficient because the subtracted row
and column means introduce linear dependencies (01:42:34).
- The purpose of double centering is to remove individual differences,
and it will be encountered again in homework and correspondence analysis
(01:41:37).
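Double centering itself is easy to do with sweep(); the sketch below (simulated matrix) removes column means, then row means, and shows that the result typically loses a rank.

```r
# Double centering: remove column means, then row means, using sweep().
set.seed(6)
M <- matrix(rnorm(6 * 4), nrow = 6)

M1 <- sweep(M,  2, colMeans(M))      # subtract column means
M2 <- sweep(M1, 1, rowMeans(M1))     # then subtract row means

round(colMeans(M2), 10)              # numerically zero
round(rowMeans(M2), 10)              # numerically zero
qr(M2)$rank                          # 3, not 4: double centering dropped a rank
```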
- R uses scientific notation to display certain numbers, which can
sometimes cause confusion (01:40:59).
- The process of reconstructing a matrix and interpreting it can be
complex and time-consuming, especially for larger datasets, making it
impractical to do by hand for anything bigger than a 3x3 matrix (01:43:01).
- A dataset comparing the head lengths and breadths of 25 pairs of
brothers is used as an example, with the data being small and having
been used extensively in research (01:44:01).
- The dataset includes measurements for the eldest and second sons,
with variables labeled as L1, B1, L2, and B2, representing the head
length and breadth of each brother (01:44:37).
- It is expected that L1 and B1 would be correlated, as well as L2 and
B2, since bigger people tend to have bigger heads, and siblings would
likely be correlated with each other, but to a smaller extent (01:44:54).
- The importance of thinking theoretically and using descriptive
analysis when dealing with datasets is emphasized, such as examining the
correlation matrix and grouping similar variables together (01:45:22).
- A correlation matrix can be used to identify patterns and
relationships between variables, such as the correlation between each
brother’s measurements and the inter-brother correlations (01:46:20).
- Examining the standard deviations of the variables can also provide
insight, and it is noted that drastically different standard deviations
may require harmonizing the scales (01:46:57).
- A scatter plot of the data is presented, showing a high correlation
between the variables, with the audience agreeing that the correlation
is “pretty highly correlated” (01:47:37).
- The data points are relatively strongly correlated and appear to be
tolerably Gaussian, but there might be an outlier due to the small
sample size (01:47:54).
- The scale function is useful for centering and scaling data, and it
provides the means and standard deviations of the variables, making it
easy to undo the scaling if needed (01:48:28).
- The scale function standardizes the data but does not transform it to the
Mahalanobis (decorrelated) level, and it also provides the center and
standard deviation of the variables (01:48:55).
- The sweep function can be used to apply other transformations
besides scaling and rescaling, making it a useful tool for data
manipulation (01:49:37).
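A brief sketch of scale() and sweep() on a built-in dataset (trees, chosen only because it is small and always available): scale() stores what it removed, and sweep() can put it back.

```r
# scale() standardizes columns and records the centers/SDs; sweep() can undo it.
Z <- scale(trees)                    # built-in data: Girth, Height, Volume

attr(Z, "scaled:center")             # column means that were removed
attr(Z, "scaled:scale")              # column standard deviations

# Undo: multiply the SDs back in, then add the means back.
X_back <- sweep(sweep(Z, 2, attr(Z, "scaled:scale"), `*`),
                2, attr(Z, "scaled:center"), `+`)
all.equal(as.matrix(trees), X_back, check.attributes = FALSE)
```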
- To perform the eigendecomposition, the data needs to be converted into a
matrix, and then the matrix can be multiplied by its transpose to obtain the
desired dimensions (01:49:51).
- The dimensions of the resulting matrices are as expected, with a
25x25 matrix and a 4x4 matrix (01:50:18).
- The eigenvectors and eigenvalues can be used to understand the
relationships between the variables, with the first eigenvector
representing the overall size and the subsequent eigenvectors
representing contrasts between variables (01:50:42).
- The reconstruction formula can be used to verify the results of the
eigendecomposition, and it involves using the eigenvalues and
eigenvectors to reconstitute the original matrix (01:51:42).
- The zapsmall function in R can be used to replace very small values in a
matrix with zero, but it does not always work as expected (01:52:24).
- The zapsmall function is useful for making numbers in scientific
notation more readable, but it doesn’t always help, especially
when the scale between numbers is dramatically different (01:52:36).
- Rounding is a very helpful technique that makes a massive difference
in making matrices easier to understand, especially for non-experts who
may be overwhelmed by scientific notation (01:53:12).
- Rounding requires judgment and experience, and it’s something that
can be improved over time (01:54:04).
- The rank of a matrix is the number of non-zero eigenvalues, and a
full-rank matrix has the same number of non-zero eigenvalues as its
dimensions (01:54:29).
- To make a rank-3 approximation of a matrix, the smallest eigenvalue
can be set to zero, effectively throwing out the corresponding
information (01:55:09).
- The resulting matrix is an approximation of the original matrix, and
the eigenvalues will be smaller (01:55:30).
- A rank-2 approximation is expected to be slightly worse than a
rank-3 approximation, but the difference depends on the specific case (01:56:14).
- The residual matrix of a rank-2 approximation is indeed worse than
that of a rank-3 approximation, but it may not be bad enough to be a
concern (01:56:40).
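A sketch of the lower-rank approximation step, using a placeholder 4 by 4 covariance matrix built from simulated data (the head-measurement data are not reproduced here): zero out the smallest eigenvalues, rebuild, and compare residuals.

```r
# Lower-rank approximations by zeroing the smallest eigenvalues.
set.seed(7)
X <- matrix(rnorm(25 * 4), ncol = 4)   # placeholder for the 25 x 4 head data
S <- cov(X)

ee <- eigen(S)
lowrank <- function(k) {
  vals <- ee$values
  vals[(k + 1):length(vals)] <- 0      # throw away the smallest eigenvalues
  ee$vectors %*% diag(vals) %*% t(ee$vectors)
}

S3 <- lowrank(3)                        # rank-3 approximation
S2 <- lowrank(2)                        # rank-2 approximation
sum((S - S3)^2)                         # residual sum of squares, rank 3
sum((S - S2)^2)                         # larger: rank 2 is somewhat worse
```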
- To make lower-rank approximations, smaller eigenvalues are thrown out, and
this is exactly what is done in singular value decomposition (SVD) and
principal component analysis (PCA) (01:56:49).
- Lower rank approximations of the full data are used as the basis for
analysis (01:57:18).
- The properties of eigenvalues are reviewed, including the fact that
eigenvalues and eigenvectors should be orthogonal, except for possible
negative one flips (01:57:40).
- The direction of eigenvectors is not identified, so expect negative
one flips to happen, which can be confusing when doing bootstrapping
with PCA (01:58:58).
- The SVD is used to find the singular values, which are the square
root of the eigenvalues (01:59:33).
- The mechanics of SVD and eigen decomposition are important to
understand, as they will be used to do PCA and other methods in the next
class (02:00:04).
- Principal component analysis involves taking the singular value decomposition
of a data matrix, scaling the variables to make them comparable, and then
interpreting the results (02:00:08).
- Methods like correspondence analysis, principal components analysis,
and some variations of multi-dimensional scaling are all SVD-based (02:00:52).
- Regularization will be used in machine learning methods, such as
autoencoders, to soften the decision of whether to keep or drop data,
rather than making a binary decision (02:01:06).
- The Singular Value Decomposition (SVD) is a powerful tool, but it
has its issues and limitations, and there are things it could do better
(02:01:42).
- The SVD is the basis of how the modern world runs, but it doesn’t do
exactly what is wanted, and it has problems that need to be fixed (02:01:53).
Homework Help and Q&A
- Homework is due, and if students need extra time, it’s fine, and
they can take an extra week or half the time, as the assignments are for
learning, not for evaluation (02:04:02).
- The homework involves various matrix expressions and calculations,
and students are encouraged to write them out step by step and follow
the order of operations (02:04:32).
- Some of the matrix expressions may look complicated, but they can be
broken down into simpler steps, and students can use previously
calculated values to avoid redoing everything (02:04:47).
- The instructor is available to answer questions and provide help,
and students can also ask for office hours if needed (02:03:34).
- A student, shemontee, asked for clarification on a specific problem,
F, which involves a 1 column vector and a row vector with 3 ones, and
the instructor provided an explanation (02:05:22).
- When multiplying a row vector by a column vector, the result is not
equal to the multiplication of the column vector by the row vector, as
the dimensions are different (02:05:59).
- A matrix is symmetric if it is equal to its transpose, and this
property implies that the matrix is square (02:06:27).
- When explaining properties of matrices, it is sufficient to state
the property in one or two words, such as “symmetric” or “undefined”,
and provide a brief explanation if necessary (02:06:54).
- If a matrix operation is undefined, it is acceptable to explain why,
for example, by stating that the dimensions are incompatible (02:07:14).
- When multiplying a vector by a matrix, the result can be explained
by describing the effect of the matrix on the vector, such as adding or
taking differences between the vector’s components (02:09:20).
- An arbitrary vector is a vector with variables, not concrete
numbers, and not all components can be equal to 0 (02:08:46).
- To explain what happens to a vector under matrix multiplication, one
should take the vector and multiply it by different matrices, then
describe the resulting effect on the vector (02:09:01).
- The matrices used in the discussion are not complex and are similar
to those that would be encountered in real-life scenarios, although they
may be larger in size (02:09:29).
- Carla O asked for clarification on the dimensions of the matrix X,
and it was confirmed that X should be transposed to a 3 by 1 column (02:10:16).
- Jay Verkuilen frequently writes matrices in their transposed form to
make the page easier to read and to conserve space (02:10:27).
- This is a common practice, but it is essential to remember to
transpose the matrix back when working with it (02:11:03).
- Jay Verkuilen does not expect students to transpose the matrices
themselves, as it is primarily done for readability (02:11:12).
- The discussion moved on to multiplying matrices and understanding the
results, with a focus on using different Mahalanobis and Euclidean
distances (02:11:41).
- Jay Verkuilen provided a dataset on Parkinson’s and asked students
to use different cases to calculate Mahalanobis and Euclidean distances
(02:11:53).
- Fabio Setti asked if it is possible for Mahalanobis and Euclidean
distances to be the same, and Jay Verkuilen confirmed that it is
possible if the variables have the same variances and no correlations (02:12:23).
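The point about Mahalanobis and Euclidean distances agreeing can be checked with a tiny example: with an identity covariance matrix (equal unit variances, no correlations), the two squared distances are identical; with a common variance other than 1 they are proportional.

```r
# Mahalanobis vs Euclidean distance under an identity covariance matrix.
y   <- c(1, 2, 3)
mu  <- c(0, 0, 0)
Sig <- diag(3)                             # unit variances, zero correlations

d2_mahal <- mahalanobis(rbind(y), center = mu, cov = Sig)
d2_eucl  <- sum((y - mu)^2)
all.equal(as.numeric(d2_mahal), d2_eucl)   # both equal 14
```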
- Jay Verkuilen emphasized the importance of understanding the data
before performing any advanced analysis and encouraged students to use
the R function to run descriptive analysis (02:13:03).
- The due date for a course post can be changed to a week later if
needed, and students can ask questions online if they require
clarification (02:13:36).
- For a specific problem, students are not expected to do calculations
by hand, and they can use R for the computations (02:13:52).
- When computing Mahalanobis distances, students should use the covariance
matrix of group N, not the full data set (02:14:49).
- The Mahalanobis distance function should be computed relative to the mean
vector of group N, and this applies to all comparisons (02:14:57).
- The question was updated from last year, and there may be occasional
bugs or errors, but the instructor will fix them and upload the
corrected version (02:15:03).
- For all comparisons, students should compare from group N, except in
one case where they need to look at how things differ from each other (02:15:54).
- The instructor will put a comment on the post to clarify that all
Mahalanobis distances should be computed relative to group N (02:16:26).
- Office hours for the next week are available, but the instructor may
be unavailable on Thursday and Friday, so students should get in touch
earlier to schedule a meeting (02:16:41).
- Monday or Wednesday are generally better days for office hours,
while Tuesday is not ideal due to the instructor’s doctor’s appointments
(02:17:12).
- Jay Verkuilen is available to meet on either Monday or Wednesday,
and prefers if the group can come together (02:17:42).
- For calculations, if a mean vector is needed, it comes from group n,
and if a covariance matrix is needed, it also comes from group n (02:17:55).
- Jay Verkuilen will close the session and post the content when it’s
done (02:18:07).
- The session ends with thank you messages from Carla O, Fabio Setti,
and Jay Verkuilen (02:18:14).