MVA Class 5 — M-distances, Eigendecomposition, SVD

Introduction and Course Overview

  • The class is about to start, and the instructor, Jay Verkuilen, mentions that the homework is due on Monday and that he will provide hints on the questions during the last 15-20 minutes of the class (00:08:19).
  • Jay Verkuilen plans to discuss the M-distance (Mahalanobis distance) and singular value decomposition, which are connected to the previous topics of covariance matrices and matrix multiplication (00:09:44).
  • The class will also cover eigenvalues, which are substantially connected to the other topics (00:09:51).
  • Jay Verkuilen mentions that the topics are connected and will provide a high-level overview, with readings available for those who want a more in-depth understanding (00:10:41).
  • The instructor posts a reminder in the chat that the last 15 minutes of the class will be dedicated to discussing the homework (00:10:34).

Mahalanobis Distance

  • The class begins with a discussion of the Mahalanobis distance, named after the Indian statistician Prasanta Chandra Mahalanobis, who developed the concept (00:11:32).
  • Mahalanobis was a physicist who studied Einsteinian relativity at Cambridge and later worked as a statistician in India, contributing to the foundations of multivariate statistics in the 1930s (00:11:39).
  • Principal component analysis (PCA) has its roots in the 19th century, but the modern version was developed in the 1930s; Mahalanobis worked on the concept at an anthropological laboratory, studying skull measurements and trying to make measurements in different units comparable (00:12:43).
  • When dealing with one dimension or variable, one of the most obvious ways to make measurements comparable is to z-score it: subtracting the mean and dividing by the standard deviation cancels out the units (00:13:58).
  • The absolute value of the z-score, |y - mean| / standard deviation, is unsigned and can be used as a distance (00:14:39).
  • In the 1920s, researchers started dealing with multivariate observations by rescaling variables by their standard deviations, effectively z-scoring each variable but leaving the correlation structure alone (00:15:20).
  • This approach converts the covariance matrix into a correlation matrix but fails to take the correlation structure of the variables into account (00:16:00).
  • Mahalanobis, with his background in general relativity, proposed using ideas from that field to decorrelate the variables, which can be achieved by taking the inverse of the covariance matrix, denoted Sigma inverse (00:17:11).
  • Mahalanobis' approach of using the inverse of the covariance matrix to decorrelate the variables is a key concept underlying principal component analysis (00:17:18).
  • The Mahalanobis distance is the proper multivariate generalization of the z-score, and it is usually unsigned because it is difficult to decide the sign of a vector (with the possible exception of cases uniformly below the mean, which could be considered negative) (00:17:41).
  • The actual Mahalanobis distance (DM) is obtained by taking the square root of DM squared, but in practice, the squared Mahalanobis distance is often used instead (00:17:53).
  • The reason for using the squared Mahalanobis distance is its connection to the Chi-square distribution, allowing for reference to the Chi-square table, and because the ordering of the squared and unsquared distances is the same (00:19:20).
  • The choice between using the squared or unsquared Mahalanobis distance depends on the function used, and it’s recommended to check the documentation or perform test cases to determine the output (00:19:05).
  • The multivariate analog of the standard deviation sigma is needed to calculate the Mahalanobis distance; it is a square root matrix, the positive square root of the variance-covariance matrix (00:21:23).
  • The base R mahalanobis() function returns the squared distance, but it is essential to check the documentation to confirm the output of any specific function (00:21:40).
  • To obtain the Mahalanobis distance, one can take the square root of the squared Mahalanobis distance, and the choice between the two often depends on the specific application or interest (00:20:10).
  • The multivariate z-score used in the Mahalanobis distance is defined by analogy with (y - mu)/sigma as z = S^(-1)(y - mu), where S is the multivariate analog of sigma; it is essential to find the square root matrix of the variance-covariance matrix to obtain S (00:20:50).
  • To verify the calculation of Mahalanobis distances, one can compare the results to a base function or check if the values are equal to the square root of the expected value, with Peter Johnson suggesting this as an easy check (00:22:09).
  • Jay Verkuilen mentions that another way to verify the calculation is to see if the Mahalanobis distances are scaling correctly, specifically that they should scale like the square root of the number of variables if the values are not squared, or like the number of variables if they are squared (00:22:38).
  • Verkuilen explains that if the Mahalanobis distances are scaling like the square root of the number of variables, most of the values should be between 0 and 2 times the square root of the number of variables (00:23:01).
  • Verkuilen notes that the square root of 5 is approximately 2.2, so with five variables the values should mostly fall in that kind of range if they are scaling correctly (00:23:20).
  • Verkuilen emphasizes that the best way to verify the calculation is to check the documentation, rather than relying on these rough checks (see the sketch after this list) (00:23:31).
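
A minimal R sketch of these checks, using a hypothetical data matrix Y (all names and data here are illustrative, not from the homework): base R's mahalanobis() returns the squared distance, and the square roots should sit in the neighborhood of the square root of the number of variables.

    # Hypothetical example: n = 200 cases, p = 5 equally correlated variables
    set.seed(1)
    p <- 5
    Y <- matrix(rnorm(200 * p), ncol = p) %*% chol(0.5 * diag(p) + 0.5)

    mu <- colMeans(Y)
    S  <- cov(Y)

    d2 <- mahalanobis(Y, center = mu, cov = S)  # base R returns the SQUARED distance
    d  <- sqrt(d2)                              # unsquared Mahalanobis distance

    # Rough scaling checks: squared distances average about p,
    # and unsquared distances sit near sqrt(p) (sqrt(5) is about 2.2)
    mean(d2)     # close to p
    median(d)    # in the neighborhood of sqrt(p)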

Matrix Square Root, Eigenvalues, and Eigenvectors

  • The concept of a Square root of a matrix is introduced, which is necessary to define the Mahalanobis distance, and Verkuilen notes that this requires an understanding of eigenvalues and eigenvectors (00:23:43).
  • Verkuilen explains that the matrix square root is not simply the square root of each element in the covariance matrix, but rather a more complex operation that requires the use of eigenvalues and eigenvectors (00:24:14).
  • If Y is multivariate normal or approximately multivariate normal, the squared Mahalanobis distance is approximately chi-square distributed, with degrees of freedom given by the dimension of Y (00:24:52).
  • Verkuilen notes that most of the time, people use squared Mahalanobis distances in an ordinal fashion, but if exact distance values are needed, the square root must be taken; the chi-square connection is illustrated in the sketch below (00:25:24).
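
A small sketch of the chi-square reference point, reusing the hypothetical d2 and p objects from the earlier sketch (an assumption for illustration only):

    # Under approximate multivariate normality, d2 is roughly chi-square with p df
    cutoff   <- qchisq(0.975, df = p)       # 97.5th percentile of chi-square(p)
    outliers <- which(d2 > cutoff)          # cases with unusually large squared M-distance

    # Or convert each squared distance to an upper-tail p-value
    pvals <- pchisq(d2, df = p, lower.tail = FALSE)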

Dimensions and Calculations of Mahalanobis Distance

  • The dimensions of the mean vector are P by 1, but when transposed, it becomes 1 by P, where P is the number of variables in the column vector y (00:26:53).
  • The dimensions of the covariance matrix are P by P, where P is the number of variables in the random vector (00:27:24).
  • Inverting the covariance matrix plays the same role that dividing by the variance plays in univariate standardization, and it is what appears in the squared Mahalanobis distance (00:27:41).
  • The squared Mahalanobis distance can be written as D_M^2 = z^T z = (y - mu)^T Sigma^(-1) (y - mu), where z is the multivariate analog of the z-score (00:28:47).
  • The covariance matrix Sigma must be invertible to calculate the Mahalanobis distance, but there are ways to work around this if Sigma is not invertible; the direct computation is shown in the sketch after this list (00:29:14).
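
The quadratic form above can be written out directly in R. A sketch for a single case, again reusing the hypothetical Y, mu, and S objects from the earlier sketch:

    # Squared M-distance for one case, written as (y - mu)' S^{-1} (y - mu)
    y <- Y[1, ]
    d2_manual <- drop(t(y - mu) %*% solve(S) %*% (y - mu))

    d2_manual
    mahalanobis(Y, mu, S)[1]   # agrees with the base R function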

Eigendecomposition and Singular Value Decomposition (SVD)

  • The symmetric square root matrix depends on the properties of eigenvalues, which will be discussed in relation to Eigendecomposition of a matrix and Singular value decomposition (00:29:53).
  • The motivation for eigendecomposition and singular value decomposition is to break down a complicated matrix into simpler components (00:30:30).
  • Eigendecomposition and singular value decomposition break complex matrices into simpler pieces, much as composite numbers are broken down into prime factors, which cannot be broken down any further (00:31:04).
  • The concept of prime factorization is discussed, where a number is broken down into its simplest components, such as 270 being factored into 2 times 3 cubed times 5 (00:31:23).
  • This concept is also applied to polynomials, where they can be factored into simpler components, such as x^3 + x^2 - x - 1 factoring into (x + 1)^2 (x - 1) (00:32:39).
  • The goal of eigenvalues, eigen decomposition, and singular value decomposition is to take complex matrices and break them down into simpler, more understandable components (00:33:23).
  • The concept of rank is introduced, which refers to the underlying dimensionality of a matrix; any nonzero matrix has rank at least one (the zero matrix has rank zero) (00:34:09).
  • The rank of a matrix X, which is N by P, is less than or equal to the minimum of N and P, meaning that if N is 5,000 and P is 10, the rank of the matrix will be 10 or less (00:35:00).
  • An example of a rank-one matrix is given, where a 3 by 1 matrix is multiplied by a 1 by 3 matrix, resulting in a 3 by 3 matrix, also known as an outer product (00:35:29).
  • The resulting matrix in the example has all rows the same, showing that it is really two one-dimensional things interacting with each other: an outer product is a highly structured matrix (00:36:06).
  • The outer product of two nonzero vectors is rank one, while invertible matrices are full rank, meaning their rank equals the number of rows (or columns), and they must be square (00:37:02).
  • A 10 by 10 matrix with full rank is invertible, and the Singular value decomposition (SVD) can be used to write a matrix as a product of 3 matrices: U, D, and V (00:37:39).
  • The SVD breaks down a matrix into 3 components: row singular vectors (U), column singular vectors (V), and singular values (D) (00:38:04).
  • The singular values on the diagonal of D are arranged from greatest to least and are all nonnegative; a singular value of 0 signals rank deficiency (00:39:24).
  • U and V are orthogonal matrices, meaning their transposes are their inverses: U transpose U equals the identity and V transpose V equals the identity (00:39:55).
  • Orthogonal matrices have geometric meanings and are characterized by their transposes being their inverses (00:40:10).
  • Singular Value Decomposition (SVD) involves a rotation, a stretch, and another rotation, which can be thought of as a decomposition into P rank one layers, where each layer is an outer product of two vectors with weights represented by the diagonal matrix D (00:41:10).
  • The singular value decomposition process creates layers of rank-one approximations, with each subsequent layer orthogonal to the previous one, and typically only a few layers are used, such as 2 or 3 dimensions in principal components analysis (PCA) (00:41:53).
  • The SVD can thus be written as a sum of rank-one matrices, each weighted by its singular value, with the first rank-one matrix being the best approximation, the next one the best approximation to what remains, and so on, similar to residual analysis (00:42:30).
  • An example of an orthogonal matrix is provided: a 2 by 2 matrix Q with entries 1/√2, 1/√2 in the first row and 1/√2, -1/√2 in the second, which can be factored as 1/√2 times a matrix of 1s and -1s, showing a contrast structure (00:43:10).
  • Orthogonal matrices and singular value decomposition have similarities to contrasts in analysis of variance (ANOVA), where the grand mean is removed, followed by main effects, interactions, and so on (00:43:44).
  • Multiplying Q transpose by Q results in the identity matrix, and the matrix Q is symmetric (00:44:42).
  • The process of decomposing a matrix into simpler components allows for understanding what’s happening to vectors as they are multiplied, and it enables taking steps sequentially to analyze the transformation (00:45:18).
  • By breaking down the multiplication by a matrix into simpler components, it’s possible to think about it as a multi-step process, such as decomposing P into UDV transpose, and then multiplying U, D, and V transpose sequentially (00:46:11).
  • This decomposition also enables intervention during the process, such as dealing with a matrix that’s close to being singular, which will be discussed in the topic of regularization (00:46:37).
  • Breaking down the matrix into meaningful pieces, such as rotation, stretch, and rotation, provides an ability to understand what the matrix is doing (00:47:03).
  • Multiplying a vector by a matrix can change the length of the vector, and sometimes it’s desirable to preserve the original length, which can be achieved by using a different matrix, such as Q (00:48:04).
  • Matrix Q is designed to preserve the length of the vector, unlike matrix P, which changes the length (00:49:04).
  • Understanding matrices as actions on vectors, and being able to decompose them into simpler components, allows for a deeper understanding of what the matrix is doing (00:49:13).
  • Multiplying by a diagonal matrix stretches the data; for example, multiplying the vector (1, 1, 1) by diag(3, 2, 1) stretches it to (3, 2, 1) (00:49:43)
  • Singular values currently have no specific interpretation, but they will have meaning in the context of Principal component analysis, which is the next class (00:50:04)
  • Singular values will be interpreted as having a meaning in terms of the data when PCA is covered (00:50:33)
  • One of the most useful things that can be done with the SVD is to test whether a matrix is rank deficient, that is, not full rank (see the sketch after this list) (00:51:00)
  • SVD can be used to determine how many dimensions a matrix has lost, which can be useful in identifying rank deficiency (00:51:07)
  • An example of using SVD to identify rank deficiency is in a multiple imputation project, where the algorithm was crashing constantly due to a rank deficient covariance matrix (00:52:22)
  • The problem in the multiple imputation project was caused by variables that were functionally dependent on each other, such as items and a total score, which is dependent on the items (00:53:40)
  • Multiple imputation is a difficult and challenging task, especially when dealing with missing data, and can take a significant amount of time to complete (00:51:57)
  • The use of SVD helped to identify the rank deficiency in the covariance matrix, which was causing the multiple imputation algorithm to crash (00:53:12)
  • Multiple imputation involves running a large number of regression analyses, which can be challenging when dealing with a large dataset, especially if it’s not well-documented, as was the case with a dataset containing around 150 variables (00:54:25).
  • Singular value decomposition (SVD) was found to be a useful tool in identifying the dependencies in the dataset and understanding the matrix’s rank (00:55:12).
  • SVD is considered a crucial numerical algorithm, possibly the second most important after Gaussian elimination, and is used in various applications, including Google (00:55:51).
  • By repeatedly running SVD and examining the documentation, it was possible to identify and remove the problematic variables, eventually obtaining a matrix with no zero eigenvalues (00:56:06).
  • The process of identifying the dependencies and running the analysis was time-consuming, taking several days to complete, due to the complexity of the analysis and the limited computing power at the time (00:56:44).
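
A sketch of these SVD mechanics in R with hypothetical small examples: an outer product is rank one, and a "total score" column that is the sum of its items produces a covariance matrix whose smallest singular value is numerically zero.

    # Rank-one matrix: outer product of a 3 x 1 vector and a 1 x 3 vector
    u <- c(1, 1, 1)
    v <- c(2, 3, 4)
    A <- outer(u, v)         # 3 x 3 matrix; every row is the same here
    qr(A)$rank               # 1

    # Covariance matrix with a built-in dependency: total = item1 + item2
    set.seed(2)
    item1 <- rnorm(100)
    item2 <- rnorm(100)
    X <- cbind(item1, item2, total = item1 + item2)
    S <- cov(X)

    sv <- svd(S)
    sv$d                     # singular values; the last one is numerically zero
    sum(sv$d > 1e-8)         # numerical rank = 2, not 3: the matrix is rank deficient

    # U and V are orthogonal: their transposes are their inverses
    round(t(sv$u) %*% sv$u, 10)   # identity matrix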

Eigenvalues and Eigenvectors for Symmetric Matrices

  • The discussion focuses on eigenvalues and eigenvectors for symmetric matrices, as the non-symmetric case can be more complex (00:57:33).
  • The eigenvalue eigenvector decomposition for a symmetric matrix A can be represented as A = UΛU^T, where U is orthogonal, Λ is diagonal, and U^T is orthogonal (00:58:04).
  • The application of Eigenvalues and Eigenvector decomposition will focus on symmetric matrices, which are commonly used in various fields, such as symmetric covariance matrices, correlation matrices, and distance matrices (00:58:48).
  • Symmetric matrices are the focus because the sums-of-squares matrices that arise in practice are symmetric, and they are widely used in many applications (00:58:55).
  • There are other applications of Eigenvalues and Eigenvectors that involve non-symmetric matrices, such as Markov chains, which are discrete time series models used to track the movement of individuals or objects (00:59:31).
  • Markov chains can be used to model various phenomena, such as college persistence, where a person’s decision to register or stop out is tracked over time (00:59:58).
  • In cases where non-symmetric matrices are used, the eigenvalues and eigenvectors will not behave the same way as those obtained from symmetric matrices (01:00:24).
  • To create a symmetric matrix, a matrix X can be multiplied by its transpose, X transpose, resulting in a symmetric matrix X transpose X (01:00:42).
  • The matrix X itself can be expressed via its SVD as the product of three matrices, X = U D V^T, where U and V are orthogonal matrices and D is a diagonal matrix (01:01:04).
  • Substituting the SVD of X, the product X transpose X can be expressed as V times D squared times V transpose, which has the same form as the eigendecomposition of a symmetric matrix (01:01:44).
  • This shows that a symmetric matrix can always be obtained by multiplying a matrix by its transpose, and the resulting matrix can be decomposed into its Eigenvalues and Eigenvectors (01:02:13).
  • The same process can be applied to the product X times X transpose, resulting in a symmetric matrix that can be decomposed into its Eigenvalues and Eigenvectors (01:02:34).
  • Squared singular values are the eigenvalues, and PCA output may report either one depending on the program used, with FactoMineR being an example that calls them eigenvalues (01:02:54).
  • These eigenvalues function like variances if the matrix A is positive semi-definite, meaning it follows the rules of a correlation matrix, including being symmetric and not having extremely strong correlations between variables (01:03:18).
  • Being positive semi-definite is a characteristic of all covariance matrices and distance matrices (01:04:01).
  • The computation of eigenvalues and eigenvectors is typically done using a computer, as it can be a complex and time-consuming process, especially for large matrices (01:04:16).
  • The development of numerical linear algebra, including eigenvalue algorithms, began in the 1950s, with some of the earliest algorithms being developed by psychometricians who worked with multivariate data (01:05:28).
  • One of the first eigenvalue algorithms was programmed in the 1950s, although the specific problem it was solving is not remembered (01:05:46).
  • The development of numerical linear algebra was necessary due to the lack of computers before the 1950s, which made solving large matrix problems difficult (01:06:17).
  • Inverting a 10 by 10 matrix was a challenging task in the past, and it was not until the development of computers that such tasks became more manageable (01:06:40).
  • An Eigenvector is a vector that, when multiplied by a matrix A, results in a scaled version of itself, with the eigenvalue being the scalar (01:07:10).
  • The concept of Eigenvectors and eigenvalues can be understood geometrically, although this may not be intuitive for everyone (01:07:10).
  • An Eigenvector is a vector that, when multiplied by a matrix, only gets stretched or shrunk, but does not change direction, and its eigenvalue determines the amount of stretching or shrinking (01:07:31).
  • Multiplying the matrix by one of its eigenvectors returns the same vector scaled by the eigenvalue (Av = λv), and this property makes eigenvectors useful in linear algebra (see the sketch after this list) (01:08:02).
  • A vector that is close to an eigenvector will remain close to that eigenvector after multiplication by the matrix, and this idea is related to the Gershgorin disk, which provides a way to quantify how close is close (01:10:00).
  • The Gershgorin disk is a concept in advanced linear algebra that provides a way to quantify this kind of closeness; it is named after a Russian mathematician whose name is transliterated in several ways (Gershgorin, Gerschgorin, and so on) (01:10:22).
  • When eigenvalues are too close to each other, it can be difficult to determine the correct orientation of a configuration of points, and this can lead to poorly identified models or ambiguous results (01:10:52).
  • In general, distinct eigenvalues are desirable, as they provide a clear and unique solution, but when eigenvalues are close to each other, it can be challenging to distinguish between different aspects of the data (01:11:26).
  • The gaps between eigenvalues are related to how well-determined the corresponding eigenvectors are, with larger gaps indicating more clearly identified eigenvectors (01:11:42).
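
A sketch of the A = UΛU^T decomposition and the Av = λv property in R, on a hypothetical symmetric matrix built as X transpose X:

    # A symmetric (positive semi-definite) matrix built as X'X
    set.seed(3)
    X <- matrix(rnorm(10 * 3), nrow = 10, ncol = 3)
    A <- t(X) %*% X

    e <- eigen(A)
    U <- e$vectors                     # orthogonal: t(U) %*% U = I
    L <- diag(e$values)                # diagonal matrix of eigenvalues

    round(U %*% L %*% t(U) - A, 10)    # reconstruction error: all zeros (A = U L U')

    # Eigenvector property: A v = lambda v (same direction, stretched by lambda)
    v <- U[, 1]
    cbind(A %*% v, e$values[1] * v)    # the two columns match

    # Squared singular values of X are the eigenvalues of X'X
    svd(X)$d^2
    e$values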

Mahalanobis Distance and Whitening Transform

  • Jay Verkuilen discusses the Mahalanobis distance, starting with a covariance matrix S with full rank, which can be decomposed into S = U L U^T, where U holds the eigenvectors and L is the diagonal matrix of eigenvalues (01:12:19).
  • The inverse of the covariance matrix S can be calculated by inverting the eigenvalues and keeping the same eigenvectors: S^(-1) = U L^(-1) U^T (01:12:46).
  • Verkuilen explains that this is not how the inverse is computed in practice, but it is important to understand, as it comes up in many concepts involving inverse eigenvalues (01:13:44).
  • He defines S^(1/2) = U L^(1/2) U^T, where L^(1/2) has the square roots of the eigenvalues on the diagonal, which are the singular values (01:14:15).
  • Verkuilen shows that multiplying S^(1/2) by itself gives back the original covariance matrix S, and that S^(-1) can be written as S^(-1/2) S^(-1/2) (01:15:03).
  • He concludes that the symmetric square root, also known as the eigenvalue square root, can be used to break the calculation of the inverse covariance matrix into steps (see the sketch after this list) (01:16:15).
  • M-distance is essentially a process that takes all the dependencies among the variables into account, producing z-scores that consider all the variables and their correlations at once (01:16:45).
  • The symmetric square root used in M-distance takes into account the strong correlations between variables, decorrelating and standardizing the data (01:17:00).
  • This process is also known as a whitening transform, which is an analogy to the colors of light, where white light represents all frequencies, and colored light represents enhanced or reduced frequencies, similar to variances and covariances in data (01:17:59).
  • The whitening transform turns “colored” data with interrelationships into “white noise” with no relationships, allowing for comparison of columns, variables, and cases on a common metric (01:18:47).
  • Matrices help solve complex problems by providing a way to think about and solve equations that would be difficult or impossible to solve otherwise, such as the inverse of a covariance matrix (01:19:35).
  • M-distance is the multivariate analog of Z-scoring, taking into account the variables and their dependencies, and can be thought of as rotating, squishing, and transforming a cloud of data into a circle (01:20:12).
  • The process of M-distance is necessary to put data on a common metric, allowing for comparison and analysis of the data (01:19:10).
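
A sketch of the symmetric (eigenvalue) square root and the whitening transform described above; the data and the whiten_setup() helper are hypothetical, and MASS::mvrnorm() is used only to simulate correlated data.

    # Symmetric square root and inverse square root of a covariance matrix S
    whiten_setup <- function(S) {
      e <- eigen(S, symmetric = TRUE)
      U <- e$vectors
      list(
        S_half     = U %*% diag(sqrt(e$values))     %*% t(U),   # S^(1/2)
        S_inv_half = U %*% diag(1 / sqrt(e$values)) %*% t(U)    # S^(-1/2)
      )
    }

    set.seed(4)
    Y  <- MASS::mvrnorm(500, mu = c(0, 0), Sigma = matrix(c(2, 1.5, 1.5, 2), 2))
    S  <- cov(Y)
    ws <- whiten_setup(S)

    range(ws$S_half %*% ws$S_half - S)   # ~0: the square root times itself gives S back

    # Whitening: subtract the means, then multiply by S^(-1/2)
    Z <- sweep(Y, 2, colMeans(Y)) %*% ws$S_inv_half
    round(cov(Z), 2)      # identity: decorrelated, unit-variance ("white") data
    head(rowSums(Z^2))    # squared M-distances; matches mahalanobis(Y, colMeans(Y), S)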

Homework Discussion and Data Analysis Example

  • The process of calculating Mahalanobis distances does not require manually computing the covariance, as this was already done previously; the focus is on rerunning the steps up to generating x1 and x2, as well as z1 and z2, which involves z-scoring the columns without taking dependencies into account (01:21:53).
  • The eigendecomposition and singular value decomposition of the covariance matrices s1 and s2 are examined, where s1 is the covariance matrix for individuals who failed in a program and s2 is the covariance matrix for those who succeeded (01:22:42).
  • The three variables in the study are an IQ score, high school GPA, and a job-specific test score, each on a different metric, such as IQ scores on the usual mean-100, SD-15 metric and high school GPA on a 0 to 4 metric (01:23:14).
  • The Eigenvalues and Eigenvectors of the covariance matrix s1 are obtained using the eigen function in R, which returns a vector of Eigenvalues and a matrix of Eigenvectors (01:24:21).
  • In the eigendecomposition, the eigenvalues form a diagonal matrix, but the eigen function returns them as a vector, requiring the diag function to reconstruct the diagonal matrix (01:24:56).
  • The Eigenvectors are also obtained, and it is noted that they can be used to reconstruct the original covariance matrix s1 by multiplying the Eigenvectors, the diagonal matrix of Eigenvalues, and the transpose of the Eigenvectors (01:25:26).
  • The eigenvectors are examined, and it is observed that in each row or column one value is large in magnitude, with the remaining values small or close to zero (01:26:49).
  • The relationship among the variables is not particularly strong, as indicated by the fairly small correlations, which is why the Eigenvectors end up having big and small values (01:27:13).
  • Multiplying the eigenvectors by the eigenvalues rescales things so that the variances re-emerge, and the last step rotates them back into position (01:27:57).
  • Rounding numbers can be a useful tool for understanding, as it can help simplify the matrix and make it easier to comprehend (01:28:58).
  • The Eigenvector matrix is not a resizing tool, but rather a tool that flips, flops, or reorders the elements (01:29:30).
  • Multiplying a vector by the Eigenvector matrix can result in negative elements and a change in the order of the elements (01:29:50).
  • The Eigenvector matrix is essentially shuffling the elements around, and it can be used to reconstruct the original matrix by multiplying it with its transpose and the Eigenvalues (01:30:30).
  • The process of reconstructing the original matrix involves shuffling the elements, stretching or shrinking them, and then shuffling them again to get back to the original position (01:31:20).
  • The Eigenvector matrix times its transpose can put the elements back in their original places, and the Eigenvalues can rescale them back to the original values (01:31:05).
  • The Eigenvalues obtained from a matrix have interpretations, with the largest Eigenvalue associated with the strongest variance, the next largest with the next strongest variance, and so on, taking into account all correlations and interrelationships (01:31:32).
  • The method of obtaining these eigenvalues is not arbitrary, as demonstrated by the specific values obtained from the example matrix; note that the eigen function returns them sorted from largest to smallest, which reorders them relative to the original variables (01:32:06).
  • The eigen function returns a vector and a matrix, while the svd function returns a vector and two matrices (01:33:15).
  • The svd function applied to the data gives essentially the same eigenvectors as those obtained from the covariance matrix, but the singular values are not the same as the eigenvalues (01:33:56).
  • To make the two correspond, the data need to be column-centered; the singular values are then the square roots of the eigenvalues (01:34:30).
  • The eigenvectors obtained from the svd of the correlation matrix are the same as those obtained from the covariance matrix, except that they can differ by a sign (01:36:55).
  • Multiplying an eigenvector by negative one gives an equally valid eigenvector (01:37:01).
  • The scale function is useful in data analysis, and there is a more general function called sweep that can perform more powerful operations (01:37:17).
  • Applying the singular value decomposition in correlation-matrix form requires dividing the columns by their standard deviations, but once this is done, the results are consistent with the eigenvectors obtained from the covariance matrix (01:37:35).
  • Singular value decomposition and eigendecomposition produce singular values and eigenvalues, as well as singular vectors and eigenvectors, which in these applications are often the same (01:37:41).
  • Double centering is a process that removes both column and row means from a matrix, which can be useful in certain situations, such as when individual differences among people need to be removed (01:38:21).
  • When double centering is applied, the eigenvectors of the resulting matrix are often very similar to each other, and the eigenvalues can be affected, sometimes resulting in a rank deficient matrix (01:39:32).
  • Double centering can potentially remove individual differences among people by subtracting overall individual differences (01:41:57).
  • The effect of double centering on eigenvalues can be seen by comparing the eigenvalues of the original data and the double-centered data, which can result in similar values (01:42:01).
  • Double centering can make a matrix rank deficient because subtracting out the row and column means removes a dimension (01:42:34).
  • The purpose of double centering is to remove individual differences, and it will be encountered again in homework and correspondence analysis (01:41:37).
  • R uses scientific notation to display certain numbers, which can sometimes cause confusion (01:40:59).
  • The process of reconstructing a matrix and interpreting it can be complex and time-consuming, especially for larger datasets, making it impractical to do by hand for anything bigger than a 3x3 matrix (01:43:01).
  • A dataset comparing the head lengths and breadths of 25 pairs of brothers is used as an example, with the data being small and having been used extensively in research (01:44:01).
  • The dataset includes measurements for the eldest and second sons, with variables labeled as L1, B1, L2, and B2, representing the head length and breadth of each brother (01:44:37).
  • It is expected that L1 and B1 would be correlated, as well as L2 and B2, since bigger people tend to have bigger heads, and siblings would likely be correlated with each other, but to a smaller extent (01:44:54).
  • The importance of thinking theoretically and using descriptive analysis when dealing with datasets is emphasized, such as examining the correlation matrix and grouping similar variables together (01:45:22).
  • A correlation matrix can be used to identify patterns and relationships between variables, such as the correlation between each brother’s measurements and the inter-brother correlations (01:46:20).
  • Examining the standard deviations of the variables can also provide insight, and it is noted that drastically different standard deviations may require harmonizing the scales (01:46:57).
  • A scatter plot of the data is presented, showing a high correlation between the variables, with the audience agreeing that the correlation is “pretty highly correlated” (01:47:37).
  • The data points are relatively strongly correlated and appear to be tolerably Gaussian, but there might be an outlier due to the small sample size (01:47:54).
  • The scale function is useful for centering and scaling data, and it provides the means and standard deviations of the variables, making it easy to undo the scaling if needed (01:48:28).
  • The scale function standardizes the data but does not transform it to the Mahalanobis (whitened) level, and it also returns the centers and standard deviations of the variables (01:48:55).
  • The sweep function can be used to apply other transformations besides scaling and rescaling, making it a useful tool for data manipulation (01:49:37).
  • To perform Eigendecomposition of a matrix, the data needs to be converted into a matrix, and then the matrix can be multiplied by its transpose to obtain the desired dimensions (01:49:51).
  • The dimensions of the resulting matrices are as expected, with a 25x25 matrix and a 4x4 matrix (01:50:18).
  • The eigenvectors and eigenvalues can be used to understand the relationships between the variables, with the first eigenvector representing the overall size and the subsequent eigenvectors representing contrasts between variables (01:50:42).
  • The reconstruction formula can be used to verify the results of the eigendecomposition, and it involves using the eigenvalues and eigenvectors to reconstitute the original matrix (01:51:42).
  • The zapsmall function in R can be used to suppress very small values in a matrix, but it does not always work as expected (01:52:24).
  • The zapsmall function is useful for making numbers in scientific notation more readable, but it doesn't always help, especially when the scales of the numbers are dramatically different (01:52:36).
  • Rounding is a very helpful technique that makes a massive difference in making matrices easier to understand, especially for non-experts who may be overwhelmed by scientific notation (01:53:12).
  • Rounding requires judgment and experience, and it’s something that can be improved over time (01:54:04).
  • The rank of a matrix is the number of non-zero eigenvalues, and a full-rank matrix has as many non-zero eigenvalues as its dimension (01:54:29).
  • To make a rank-3 approximation of a matrix, the smallest eigenvalue can be set to zero, effectively throwing out the corresponding information (see the sketch after this list) (01:55:09).
  • The resulting matrix is an approximation of the original matrix, and the eigenvalues will be smaller (01:55:30).
  • A rank-2 approximation is expected to be slightly worse than a rank-3 approximation, but the difference depends on the specific case (01:56:14).
  • The residual matrix of a rank-2 approximation is indeed worse than that of a rank-3 approximation, but it may not be bad enough to be a concern (01:56:40).
  • To make lower-rank approximations, smaller eigenvalues are thrown out, and this is exactly what is done in singular value decomposition (SVD) and principal components analysis (PCA) (01:56:49).
  • Lower rank approximations of the full data are used as the basis for analysis (01:57:18).
  • The properties of the eigendecomposition are reviewed, including the fact that the eigenvectors should be orthogonal and are only identified up to possible negative-one flips (01:57:40).
  • The direction of eigenvectors is not identified, so expect negative one flips to happen, which can be confusing when doing bootstrapping with PCA (01:58:58).
  • The SVD is used to find the singular values, which are the square root of the eigenvalues (01:59:33).
  • The mechanics of SVD and eigen decomposition are important to understand, as they will be used to do PCA and other methods in the next class (02:00:04).
  • Principal component analysis involves taking the Singular value decomposition of a data matrix, scaling the variables to make them comparable, and then interpreting the results (02:00:08).
  • Methods like correspondence analysis, principal components analysis, and some variations of multi-dimensional scaling are all SVD-based (02:00:52).
  • Regularization will be used in machine learning methods, such as autoencoders, to soften the decision of whether to keep or drop data, rather than making a binary decision (02:01:06).
  • The Singular Value Decomposition (SVD) is a powerful tool, but it has its issues and limitations, and there are things it could do better (02:01:42).
  • The SVD is the basis of how the modern world runs, but it doesn’t do exactly what is wanted, and it has problems that need to be fixed (02:01:53).
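
A sketch of the reconstruction and lower-rank approximation steps from the worked example, on a hypothetical 4 x 4 correlation matrix standing in for the head-size variables (the actual data values are not reproduced here):

    # Hypothetical correlation matrix for four variables like L1, B1, L2, B2
    Rcorr <- matrix(c(1.00, 0.73, 0.71, 0.70,
                      0.73, 1.00, 0.69, 0.71,
                      0.71, 0.69, 1.00, 0.74,
                      0.70, 0.71, 0.74, 1.00), nrow = 4)

    e <- eigen(Rcorr)
    U <- e$vectors
    L <- e$values

    # Reconstruction formula: Rcorr = U diag(L) U'
    zapsmall(U %*% diag(L) %*% t(U) - Rcorr)   # all zeros

    # Rank-3 approximation: set the smallest eigenvalue to zero
    L3 <- L; L3[4] <- 0
    R3 <- U %*% diag(L3) %*% t(U)
    round(Rcorr - R3, 3)      # residual matrix: fairly small

    # Rank-2 approximation: drop the two smallest eigenvalues
    L2 <- L; L2[3:4] <- 0
    R2 <- U %*% diag(L2) %*% t(U)
    round(Rcorr - R2, 3)      # residual is larger, but may still be tolerable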

Homework Help and Q&A

  • Homework is due, and if students need extra time, it’s fine, and they can take an extra week or half the time, as the assignments are for learning, not for evaluation (02:04:02).
  • The homework involves various matrix expressions and calculations, and students are encouraged to write them out step by step and follow the orders of operation (02:04:32).
  • Some of the matrix expressions may look complicated, but they can be broken down into simpler steps, and students can use previously calculated values to avoid redoing everything (02:04:47).
  • The instructor is available to answer questions and provide help, and students can also ask for office hours if needed (02:03:34).
  • A student, shemontee, asked for clarification on a specific problem, F, which involves a 1 column vector and a row vector with 3 ones, and the instructor provided an explanation (02:05:22).
  • When multiplying a row vector by a column vector, the result is not the same as multiplying the column vector by the row vector, as the dimensions of the results differ (a 1 by 1 scalar versus a full matrix) (02:05:59).
  • A matrix is symmetric if it is equal to its transpose, and this property implies that the matrix is square (02:06:27).
  • When explaining properties of matrices, it is sufficient to state the property in one or two words, such as “symmetric” or “undefined”, and provide a brief explanation if necessary (02:06:54).
  • If a matrix operation is undefined, it is acceptable to explain why, for example, by stating that the dimensions are incompatible (02:07:14).
  • When multiplying a vector by a matrix, the result can be explained by describing the effect of the matrix on the vector, such as adding or taking differences between the vector’s components (02:09:20).
  • An arbitrary vector is a vector with variables, not concrete numbers, and not all components can be equal to 0 (02:08:46).
  • To explain what happens to a vector under matrix multiplication, one should take the vector and multiply it by different matrices, then describe the resulting effect on the vector (02:09:01).
  • The matrices used in the discussion are not complex and are similar to those that would be encountered in real-life scenarios, although they may be larger in size (02:09:29).
  • Carla O asked for clarification on the dimensions of the matrix X, and it was confirmed that X should be transposed to a 3 by 1 column (02:10:16).
  • Jay Verkuilen frequently writes matrices in their transposed form to make the page easier to read and to conserve space (02:10:27).
  • This is a common practice, but it is essential to remember to transpose the matrix back when working with it (02:11:03).
  • Jay Verkuilen does not expect students to transpose the matrices themselves, as it is primarily done for readability (02:11:12).
  • The discussion moved on to multiplying matrices and understanding the results, with a focus on using the Mahalanobis and Euclidean distances for different cases (02:11:41).
  • Jay Verkuilen provided a dataset on Parkinson’s and asked students to use different cases to calculate Mahalanobis and Euclidean distances (02:11:53).
  • Fabio Setti asked if it is possible for Mahalanobis and Euclidean distances to be the same, and Jay Verkuilen confirmed that it is possible if the variables have the same variances and no correlations (02:12:23).
  • Jay Verkuilen emphasized the importance of understanding the data before performing any advanced analysis and encouraged students to use the R function to run descriptive analysis (02:13:03).
  • The due date for a course post can be changed to a week later if needed, and students can ask questions online if they require clarification (02:13:36).
  • For a specific problem, students are not expected to do calculations by hand, and they can use R for the computations (02:13:52).
  • When computing Mahalanobis distances, students should use the covariance matrix of group N, not the full data set (02:14:49).
  • The Mahalanobis distance function should be computed relative to the mean vector of group N, and this applies to all comparisons (02:14:57).
  • The question was updated from last year, and there may be occasional bugs or errors, but the instructor will fix them and upload the corrected version (02:15:03).
  • For all comparisons, students should compare from group N, except in one case where they need to look at how things differ from each other (02:15:54).
  • The instructor will put a comment on the post to clarify that all Mahalanobis distances should be computed relative to group N (02:16:26).
  • Office hours for the next week are available, but the instructor may be unavailable on Thursday and Friday, so students should get in touch earlier to schedule a meeting (02:16:41).
  • Monday or Wednesday are generally better days for office hours, while Tuesday is not ideal due to the instructor’s doctor’s appointments (02:17:12).
  • Jay Verkuilen is available to meet on either Monday or Wednesday, and prefers if the group can come together (02:17:42).
  • For calculations, if a mean vector is needed, it comes from group n, and if a covariance matrix is needed, it also comes from group n (02:17:55).
  • Jay Verkuilen will close the session and post the content when it’s done (02:18:07).
  • The session ends with thank you messages from Carla O, Fabio Setti, and Jay Verkuilen (02:18:14).