Visualizing Homelessness Data Using Principal Component Analysis

Data Is Just Lots of Vectors!

  • We represent an enormous amount of information around us in the form of numeric data vectors.


  • Examples:
    • Points on graph of \( y=x^2 \)
    • Colors of pixels in a photo


Example: Homelessness in the U.S.

  • If doing research on a social issue like homelessness in the U.S., we could collect a \( 3\times 50 \) matrix of data:

    • Each of \( n=50 \) states is represented by a column vector of the form, \[ \small{ \begin{aligned} &\mathbf{x}_i=\begin{bmatrix}x_{i1}\\x_{i2}\\x_{i3}\end{bmatrix},&1\leq i\leq n \end{aligned} } \]
    • \( x_{i1},x_{i2},x_{i3} \) are three measurable factors (e.g., unemployment rate, poverty threshold, median rent).

Example: Plotting Homelessness Data

  • I can plot these column vectors since they are elements of \( \mathbb{R}^3 \); a minimal plotting sketch follows.
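
A minimal R sketch of this idea, using made-up stand-in values and the scatterplot3d package (both the data and the choice of plotting tool are assumptions, not part of the original analysis):

# Sketch with made-up values: each state is a column vector in R^3,
# plotted as a point with the scatterplot3d package.
# install.packages("scatterplot3d")
library(scatterplot3d)

set.seed(1)
X <- matrix(runif(3 * 50), nrow = 3)    # stand-in for a 3 x 50 data matrix
scatterplot3d(t(X),                     # rows of X become (x1, x2, x3) coordinates
              xlab = "x1", ylab = "x2", zlab = "x3",
              main = "State data vectors in R^3")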


The "Curse" of Dimensionality

  • Homelessness in the U.S. is related to many factors: poverty levels, unemployment rates, median rent, median income, social services [1].

  • Question: What if I want to consider more than \( p=3 \) factors?

  • Answer: To plot \( n \) data vectors with \( p\geq 4 \) entries, we must select only \( 3 \) entries to view at a time.

Plotting the whole matrix of data to gain the necessary intuition is no longer possible.

Introduction

  • Principal Component Analysis (PCA) is a process that reduces the dimension of data while maintaining much of its variance, i.e., its “information”.

Overview of Presentation:

[1] Theory behind Principal Component Analysis

  • How PCA compresses large datasets for plotting while maintaining the important information.

[2] Applying PCA to a small dataset for each of the 50 states

  • Demonstrate applying PCA to reduce dimension of data on U.S. homelessness.

Theory of Principal Component Analysis

  • To maintain most of the “information” in our data, we want to maintain the majority of its variance.

  • Goal of PCA is to find orthogonal directions (unit vectors) of maximum variance in our data.

  • Recall our general \( p\times n \) matrix of data, \[ \small{ W=\begin{bmatrix}\mathbf{x}_1&\mathbf{x}_2&\cdots&\mathbf{x}_n\end{bmatrix} } \]

  • Column vectors are elements of \( \mathbb{R}^p \), \[ \small{ \begin{aligned} &\mathbf{x}_i=\begin{bmatrix}x_{i1}\\\vdots\\x_{ip}\end{bmatrix},&1\leq i\leq n \end{aligned} } \]

Descriptions of Original Data

  • Mean of original data is, \[ \small{ \mathbf{\bar{x}}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i } \]

  • Variance of the original data is captured by the covariance matrix, \[ \small{ S=\frac{1}{n-1}\sum_{i=1}^n(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T } \]

  • Covariance matrix is necessarily a \( p\times p \) symmetric matrix.

  • The variances of the \( p \) variates appear on the diagonal and the covariances on the off-diagonal; a short computational sketch follows.
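
A minimal R sketch (on simulated stand-in data) of computing the mean vector and covariance matrix above, checked against R's built-in cov():

# Sketch with simulated data: sample mean vector and covariance matrix
# for a p x n matrix W whose columns are the observations x_i.
set.seed(2)
p <- 3; n <- 50
W <- matrix(rnorm(p * n), nrow = p)   # stand-in data

x_bar <- rowMeans(W)                  # mean vector, length p
B     <- W - x_bar                    # mean-deviation form (x_bar recycled down each column)
S     <- B %*% t(B) / (n - 1)         # p x p sample covariance matrix

# cov() expects observations in rows, so compare against cov(t(W)):
all.equal(S, cov(t(W)), check.attributes = FALSE)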

Maximizing Variance

  • To find the first unit vector \( \mathbf{w}\in\mathbb{R}^p \) pointing in the direction of maximum variance:

    • Project our data onto a unit vector \( \mathbf{w} \): \[ \small{ \begin{aligned} \mathbf{w}^TW&=\begin{bmatrix}\mathbf{x}_1\cdot\mathbf{w}&\mathbf{x}_2\cdot\mathbf{w}&\cdots&\mathbf{x}_n\cdot\mathbf{w}\end{bmatrix}\\ &=\begin{bmatrix}y_1&y_2&\cdots&y_n\end{bmatrix}\\ &=\mathbf{y} \end{aligned} } \]
    • Maximize \( \mathrm{Var}(\mathbf{y}) \)
  • Formula for Variance [8] \[ \small{ \mathrm{Var}(\mathbf{y})=\frac{1}{n-1}\sum_{i=1}^n(y_i-\bar{y})^2 } \]

Maximizing Variance of Projected Data

  • The mean of our projected data is, \[ \small{ \begin{aligned} \bar{y}&=\frac{1}{n}\sum_{i=1}^n y_i\\ &=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\cdot\mathbf{w}\\ &=\mathbf{w}\cdot\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\\ &=\mathbf{w}\cdot\bar{\mathbf{x}} \end{aligned} } \]
  • The variance of our projected data is, \[ \small{ \begin{aligned} \mathrm{Var(\mathbf{y})}&=\frac{1}{n-1}\sum_{i=1}^n(y_i-\bar{y})^2\\ &=\frac{1}{n-1}\sum_{i=1}^n(\mathbf{w}\cdot\mathbf{x}_i-\mathbf{w}\cdot\bar{\mathbf{x}})^2\\ &=\frac{1}{n-1}\sum_{i=1}^n\left(\mathbf{w}^T(\mathbf{x}_i-\bar{\mathbf{x}})\right)\left((\mathbf{x}_i-\bar{\mathbf{x}})^T\mathbf{w}\right)\\ &=\frac{1}{n-1}\sum_{i=1}^n\mathbf{w}^T(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T\mathbf{w}\\ \end{aligned} } \]

A New Perspective

  • A nice result: \[ \small{ \begin{aligned} \mathrm{Var(\mathbf{y})}&=\frac{1}{n-1}\sum_{i=1}^n\mathbf{w}^T(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T\mathbf{w}\\ &=\mathbf{w}^T\left(\frac{1}{n-1}\sum_{i=1}^n(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T\right)\mathbf{w}\\ &=\mathbf{w}^TS\mathbf{w} \end{aligned} } \]

  • To find the unit \( \mathbf{w} \) that gives the direction of maximum variance in our data, we maximize, \[ \boxed{ \begin{aligned} &\mathbf{w}^TS\mathbf{w},&\left\lVert\mathbf{w}\right\rVert=1 \end{aligned} } \]
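
A quick numerical check of this identity in R, again on simulated stand-in data:

# Sketch: the variance of the projected data equals w^T S w.
set.seed(3)
p <- 3; n <- 50
W <- matrix(rnorm(p * n), nrow = p)
S <- cov(t(W))                            # sample covariance of the columns of W

w <- rnorm(p); w <- w / sqrt(sum(w^2))    # a random unit vector
y <- as.vector(t(W) %*% w)                # projections x_i . w

c(var_y = var(y), wSw = as.numeric(t(w) %*% S %*% w))   # the two values agree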

Introduction to Quadratic Forms

  • Quadratic Form: function on \( \mathbb{R}^n \) whose value at position vector \( \mathbf{x} \) is given by, \[ \small{ Q(\mathbf{x})=\mathbf{x}^TA\mathbf{x} } \]

for an \( n\times n \) symmetric matrix \( A \).

  • An important theorem tells us how to find, \[ \small{ M=\mathrm{Max}\{\mathbf{x}^TA\mathbf{x}\, | \, \left\lVert\mathbf{x}\right\rVert=1\} } \]

  • Pertinent to economics, physics, signal processing [4].


Maximizing Quadratic Forms

Theorem: Maximizing \( \{\mathbf{w}^TS\mathbf{w}\,| \,\left\lVert\mathbf{w}\right\rVert=1 \} \)

  • Let \( \lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n \) be the eigenvalues of \( S \) in decreasing order, and let \( \{\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_n\} \) be the corresponding orthonormal eigenvectors of \( S \).

  • Then, \[ \small{ M=\mathrm{Max}\{\mathbf{w}^TS\mathbf{w} \, | \, \left\lVert\mathbf{w}\right\rVert=1\}=\lambda_1 } \]

\[ \small{ \lambda_1=\mathbf{u}_1^TS\mathbf{u}_1 } \]

Conclusion: The direction of maximum variance is given by the unit eigenvector of \( S \) corresponding to its largest eigenvalue.

Extending to n Orthogonal Directions

  • Theorem extends to \( n \) orthogonal directions of maximum variance \( \{\mathbf{w}_1,\ldots,\mathbf{w}_n\} \).
  • Directions of max. variance given by orthonormal unit eigenvectors of \( S \), \( \{\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_n\} \).

For \( k=2,\ldots,n \) the maximum value of \( \mathbf{w}^TS\mathbf{w} \) where, \[ \small{ \begin{aligned} &\mathbf{w}^T\mathbf{w}=1,&\mathbf{w}^T\mathbf{u}_1=0&,\ldots,&\mathbf{w}^T\mathbf{u}_{k-1}=0 \end{aligned} } \]

is \( \lambda_k \) and occurs when \( \mathbf{w}=\mathbf{u}_k \) [4].

Main Idea: The orthonormal eigenvectors of \( S \) are the data's orthogonal directions of maximum variance, i.e., the principal components.

Singular Value Decomposition

  • To find Principal Components, use SVD, \[ \small{ \begin{aligned} \boxed{A=U\Sigma V^T} \end{aligned} } \]

  • Singular values \( \sigma_1,\ldots,\sigma_n \) are square roots of the eigenvalues of \( A^TA \), \[ \small{ \sigma_1,\ldots,\sigma_n=\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n} } \] where \( A^TA\mathbf{v}_i=\lambda_i\mathbf{v}_i \).

  • Left singular vectors of \( A \) are columns of \( U \)

  • Right singular vectors of \( A \) are columns of \( V \) and orthonormal eigenvectors of \( A^TA \),

\[ \small{ \begin{aligned} &\mathbf{v}_1,\ldots,\mathbf{v_n}\, & A^TA\mathbf{v}_i=\lambda_i\mathbf{v}_i \end{aligned} } \]
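
A short R illustration of these relationships on an arbitrary matrix (the matrix and names here are illustrative assumptions):

# Sketch: svd() returns d (singular values), u (columns of U), v (columns of V).
set.seed(4)
A <- matrix(rnorm(20 * 3), nrow = 20)
s <- svd(A)

# Singular values are the square roots of the eigenvalues of t(A) %*% A;
# the columns of v are corresponding orthonormal eigenvectors (up to sign).
e <- eigen(t(A) %*% A)
all.equal(s$d, sqrt(e$values))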

Applying SVD to Perform PCA

  • If we consider data \( W \) in mean-deviation form, \[ \small{ B=\begin{bmatrix}\mathbf{x}_1-\bar{\mathbf{x}}&\mathbf{x}_2-\bar{\mathbf{x}}&\cdots&\mathbf{x}_n-\bar{\mathbf{x}}\end{bmatrix} } \]

  • Apply SVD to \( A=\frac{1}{\sqrt{n-1}}B^T \), so that we get the eigenvalues and eigenvectors of, \[ \small{ \begin{aligned} A^TA&=\left(\frac{1}{\sqrt{n-1}}B^T\right)^T\left(\frac{1}{\sqrt{n-1}}B^T\right)\\ &=\frac{1}{n-1}BB^T\\ &=S \end{aligned} } \]

Applying SVD to Perform PCA Continued...

  • Thus, applying the SVD to \( \frac{1}{\sqrt{n-1}}B^T \), \[ \small{ \frac{1}{\sqrt{n-1}}B^T=U\Sigma V^T } \] gives the eigenvalues and eigenvectors of \( S \).

  • Singular values of \( \frac{1}{\sqrt{n-1}}B^T \) are square roots of eigenvalues of \( S \).

  • Right singular vectors are the columns of \( V \) and are orthonormal eigenvectors of \( S \).

  • Thus, the right singular vectors are the principal components of \( S \); a minimal sketch of the whole recipe follows.
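
A minimal R sketch of this recipe on simulated stand-in data (all names and values here are assumptions):

# Sketch: principal components via the SVD of A = B^T / sqrt(n - 1).
set.seed(5)
p <- 3; n <- 50
W <- matrix(rnorm(p * n), nrow = p)       # stand-in for the p x n data matrix

B <- W - rowMeans(W)                      # mean-deviation form
A <- t(B) / sqrt(n - 1)                   # n x p matrix with A^T A = S
s <- svd(A)

pcs       <- s$v                          # principal components = right singular vectors
var_along <- s$d^2                        # eigenvalues of S (variance along each PC)
all.equal(var_along, eigen(cov(t(W)))$values)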

Applying PCA to Homelessness Data

  • Collect \( 3\times 50 \) matrix of data from 2010.
  • Sources: U.S. Census Bureau and U.S. Department of Housing and Urban Development.

  • Data: \( n=50 \) vectors of form, \[ \small{ \mathbf{x}_i=\begin{bmatrix}h\\cb\\p\end{bmatrix} } \]

  • \( h=\mathrm{percent\,of\,state\,population\,that\,is\,homeless} \)

  • \( cb=\mathrm{percent\,of\,state\,population\,that\,is\,cost\,burdened} \)

  • \( p=\mathrm{percent\,of\,state\,population\,in\,deep\,poverty} \)

Applying PCA to Homelessness Data

  • Original Data


Using R Built-In Functions

  • We can use R's built-in svd() function to find the principal components of the data; a sketch of the computation appears after the output below.
[1] "d="
[1] 4.10582391 1.23541736 0.09071855
[1] "Eigenvalues of S="
[1] 16.857790020  1.526256047  0.008229856
[1] "Total Variance"
[1] 18.39228
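
A sketch of how output like the above could be produced, assuming an object W holds the 3 x 50 matrix of (h, cb, p) values with one column per state (the variable names are assumptions):

# Assumes W is the 3 x 50 data matrix described earlier (one column per state).
n <- ncol(W)
B <- W - rowMeans(W)                 # mean-deviation form
s <- svd(t(B) / sqrt(n - 1))         # SVD of (1/sqrt(n-1)) B^T

print("d=");                print(s$d)      # singular values
print("Eigenvalues of S="); print(s$d^2)    # variances along each PC
print("Total Variance");    print(sum(s$d^2))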

Using R Built-In Functions

[1] "First six rows of u="
            [,1]        [,2]        [,3]
[1,]  0.11619489  0.14915260 -0.11649125
[2,] -0.12755129 -0.21413139  0.15892807
[3,]  0.08573686  0.17879062  0.03870884
[4,] -0.04074753  0.16857778 -0.09683053
[5,]  0.21843789 -0.03495111  0.14053457
[6,]  0.04246217 -0.05934145  0.17601419
[1] "v="
           [,1]         [,2]         [,3]
[1,] 0.00948655 -0.008718115  0.999916996
[2,] 0.99333641 -0.114778510 -0.010424853
[3,] 0.11485987  0.993352852  0.007571169

Percentage of Variance Explained by each PC

[1] "% total variance explained by PC1="
[1] 91.6569
[1] "% total variance explained by PC2="
[1] 8.298353
[1] "% total variance explained by PC3="
[1] 0.04474626
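
Continuing the sketch above, these percentages could be computed from the singular values as follows:

# Percent of total variance explained by each principal component.
pct <- 100 * s$d^2 / sum(s$d^2)
for (k in seq_along(pct)) {
  print(paste0("% total variance explained by PC", k, "="))
  print(pct[k])
}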

Original Data with Component Axes


Dimension Reduction to 2D


Homelessness Index

  • Can create a homelessness index based on our data's first principal component, \[ \small{ \mathbf{v}_1=\begin{bmatrix}0.00949\\0.99334\\0.11486\end{bmatrix} } \]

  • The index provides a quick scalar quantification of the “strength” of each state's relationship to factors related to homelessness.

\[ \small{ h_{\mathrm{index}} = 0.00949\hat{h}+0.99334\hat{cb}+0.11486\hat{p} } \] where \( \hat{h},\hat{cb},\hat{p} \) are the values of our data in mean-deviation form. A code sketch follows.
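
A sketch of computing this index for every state, assuming the objects B (the data in mean-deviation form) and s (its SVD) from the earlier sketch; the alphabetical state ordering is also an assumption:

# h_index for each state: project the mean-deviation data onto the first PC.
v1      <- s$v[, 1]                       # first principal component
h_index <- as.vector(t(B) %*% v1)         # one score per state (one column of B each)

# Assumes the columns of B are ordered alphabetically by state, like state.abb.
names(h_index) <- state.abb
head(sort(h_index, decreasing = TRUE))    # states with the largest index values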

Homelessness Index Plot


Works Cited

[1] Chris Glynn, Thomas H. Byrne, Dennis P. Culhane. “Inflection points in community-level homeless rates.” The Annals of Applied Statistics, 15(2) 1037-1053 June 2021. https://doi.org/10.1214/20-AOAS1414

[2] “2007 - 2021 Point-in-Time Estimates by State”, PIT and HIC Data Since 2007, US Department of Housing and Urban Development, 2022,

[3] Jolliffe IT, Cadima J. 2016 Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A 374: 20150202. http://dx.doi.org/10.1098/rsta.2015.0202

[4] Lay, David C., et al. Linear Algebra and Its Applications. Sixth ed., Pearson, 2022.

Works Cited Continued...

[5] “DECENNIALCD1132010.” P1: Census Bureau Table, United States Census Bureau, 2010, https://tinyurl.com/h8nzjpmv. Accessed 27 Nov. 2022.

[6] American Community Survey. “ACSDP1Y2010.” DP04: Census Bureau Table, United States Census Bureau, 2010, https://tinyurl.com/y2xzxt2r. Accessed 27 Nov. 2022.

[7] American Community Survey. “ACSST1Y2010.” S1701: Poverty Status in the Past 12 Months, United States Census Bureau, 2010, https://tinyurl.com/5etsm7xm. Accessed 27 Nov. 2022.

Works Cited Continued ...

[8] Li, Xiaodong. “STA135 Lecture 2 : Sample Mean Vector and Sample Covariance Matrix.” STA135. Davis, California, https://www.stat.ucdavis.edu/~xdgli/Xiaodong_Li_Teaching_files/135Note2.pdf. Accessed 24 Oct. 2022.

[9] Henderson, Thomas C. “A Geometric Interpretation of the Covariance Matrix.” CS 4640 Fall Semester 2019. https://www.cs.utah.edu/~tch/CS4640F2019/resources/A%20geometric%20interpretation%20of%20the%20covariance%20matrix.pdf. Accessed 30 Nov. 2022.

[10] “What Is ‘Deep Poverty’?” Center for Poverty and Inequality Research, University of California, Davis, 18AD, https://poverty.ucdavis.edu/faq/what-deep-poverty.

Works Cited Continued ...

[11] “Rental Burdens: Rethinking Affordability Measures: HUD User.” Rental Burdens: Rethinking Affordability Measures | HUD USER, PD&R Edge, https://www.huduser.gov/portal/pdredge/pdr_edge_featd_article_092214.html.

[12] “Principal Component Analysis from Scratch in Python.” AskPython, AskPython, 19 Oct. 2020, https://www.askpython.com/python/examples/principal-component-analysis.

[13] Newsom, Jason T. “Principal Component Analysis.” Psy 523/623 Structural Equation Modeling, Spring 2020. https://web.pdx.edu/~newsomj/semclass/ho_pca.pdf. Accessed 30 Nov. 2022.

Works Cited Continued ...

[14] “Homelessness: A State of Emergency.” Homelessness: A State of Emergency, National Alliance to End Homelessness, 18 Oct. 2018, https://endhomelessness.org/resource/homelessness-a-state-of-emergency/.

[15] West, Mary, and Karin Gepp. “Maslow's Hierarchy of Needs Pyramid: Uses and Criticism.” Medical News Today, MediLexicon International, 28 July 2022, https://www.medicalnewstoday.com/articles/maslows-hierarchy-of-needs.

[16] US Census Bureau. “Poverty Thresholds.” Census.gov, United States Census Bureau, 13 Sept. 2022, https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-poverty-thresholds.html.

Works Cited Continued ...

[17] Chartier, Timothy P. When Life Is Linear: From Computer Graphics to Bracketology. Mathematical Association of America, 2015.

[18] “RGB Image.” https://datagenetics.com/blog/august32013/index.html. Accessed 2 Dec. 2022.


Maximizing Quadratic Forms

Theorem: Finding \( \begin{aligned}M=\mathrm{Max}\{\mathbf{w}^TS\mathbf{w} \, \, | \, \left\lVert\mathbf{w}\right\rVert=1\}\end{aligned} \):

  • By the Spectral Theorem for Symmetric Matrices [4], we can orthogonally diagonalize \( S \): \[ \begin{aligned} S&=PDP^{-1}\\ &=\begin{bmatrix}\mathbf{u}_1&\cdots&\mathbf{u}_n\end{bmatrix}\begin{bmatrix}\lambda_1&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\lambda_n\end{bmatrix}\begin{bmatrix}\mathbf{u}_1^T\\\vdots\\\mathbf{u}_n^T\end{bmatrix} \end{aligned} \]

  • \( \lambda_1\geq\lambda_2\geq\ldots\geq\lambda_n \) are eigenvalues of \( S \)

  • \( \mathbf{u}_1,\ldots,\mathbf{u}_n \) are corresponding orthonormal eigenvectors of \( S \)

Maximizing Quadratic Forms Continued...

\[ \begin{aligned} &S=PDP^{-1}, &D=P^{-1}SP \end{aligned} \]

  • Perform the orthogonal change of variable, \[ \begin{aligned} &\mathbf{w}=P\mathbf{y}, &P^T=P^{-1} \end{aligned} \] Then, the quadratic form becomes, \[ \begin{aligned} \mathbf{w}^TS\mathbf{w}&=(P\mathbf{y})^TS(P\mathbf{y})\\ &=\mathbf{y}^T(P^TSP)\mathbf{y}\\ &=\mathbf{y}^TD\mathbf{y} \end{aligned} \] Since \( P \) is orthogonal, \( \left\lVert\mathbf{y}\right\rVert=\left\lVert P\mathbf{y}\right\rVert=\left\lVert\mathbf{w}\right\rVert=1 \).

  • For concreteness, suppose \( S \) is a \( 3\times 3 \) matrix; the derivation of \( M=\mathrm{Max}\{\mathbf{w}^TS\mathbf{w}\, | \, \left\lVert\mathbf{w}\right\rVert=1\} \) below extends to any size.

Maximizing Quadratic Forms Continued...

If \( S \) is a \( 3\times 3 \) matrix, \[ \begin{aligned} \mathbf{y}^TD\mathbf{y}&=\begin{bmatrix}y_1&y_2&y_3\end{bmatrix}\begin{bmatrix}\lambda_1&0&0\\0&\lambda_2&0\\0&0&\lambda_3\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\\ &=\lambda_1y_1^2+\lambda_2y_2^2+\lambda_3y_3^2 \end{aligned} \]

Since \[ \lambda_1\geq\lambda_2\geq\lambda_3 \]

We have, \[ \begin{aligned} \mathbf{y}^TD\mathbf{y}&=\lambda_1y_1^2+\lambda_2y_2^2+\lambda_3y_3^2\\ &\leq\lambda_1y_1^2+\lambda_1y_2^2+\lambda_1y_3^2\\ \end{aligned} \]

Maximizing Quadratic Forms Continued...

Note, \[ \begin{aligned} \mathbf{y}^TD\mathbf{y}&=\lambda_1y_1^2+\lambda_2y_2^2+\lambda_3y_3^2\\ &\leq\lambda_1y_1^2+\lambda_1y_2^2+\lambda_1y_3^2\\ &=\lambda_1(y_1^2+y_2^2+y_3^2)\\ &=\lambda_1\left\lVert\mathbf{y}\right\rVert^2\\ &=\lambda_1 \end{aligned} \]

  • \( \mathbf{y}^TD\mathbf{y} \) will always be less than or equal to the largest eigenvalue of \( S \) for any unit vector \( \mathbf{y} \)

Maximizing Quadratic Forms Continued...

  • For \( \mathbf{y}=\begin{bmatrix}1\\0\\0\end{bmatrix} \), \[ \begin{aligned} \mathbf{y}^TD\mathbf{y}&=\begin{bmatrix}1&0&0\end{bmatrix}\begin{bmatrix}\lambda_1&0&0\\0&\lambda_2&0\\0&0&\lambda_3\end{bmatrix}\begin{bmatrix}1\\0\\0\end{bmatrix}\\ &=\lambda_1\\ \end{aligned} \]

  • Thus, \( \mathbf{y}^TD\mathbf{y}=\mathrm{Max}\{\mathbf{y}^TD\mathbf{y} \, | \, \left\lVert\mathbf{y}\right\rVert=1\}=\lambda_1 \) when \( \mathbf{y}=\mathbf{e}_1 \).

Maximizing Quadratic Forms Continued...

  • For \( \mathbf{y}=\mathbf{e}_1 \), \[ \begin{aligned} \mathbf{w}&=P\mathbf{y}\\ &=\begin{bmatrix}\mathbf{u}_1&\mathbf{u}_2&\mathbf{u}_3\end{bmatrix}\begin{bmatrix}1\\0\\0\end{bmatrix}\\ &=\mathbf{u}_1 \end{aligned} \]

  • Thus, the unit \( \mathbf{w} \) that maximizes \( \mathbf{w}^TS\mathbf{w} \) is the unit eigenvector of \( S \) corresponding to its largest eigenvalue.
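
A numerical sanity check of this result in R, on a randomly generated symmetric matrix (everything in the snippet is an illustrative assumption):

# Sketch: for a symmetric S, the max of w^T S w over unit vectors is the largest
# eigenvalue, attained at the corresponding unit eigenvector u1.
set.seed(6)
M <- matrix(rnorm(9), 3, 3)
S <- (M + t(M)) / 2                       # a random symmetric 3 x 3 matrix
e <- eigen(S)
u1 <- e$vectors[, 1]

c(lambda1 = e$values[1],
  quad_at_u1 = as.numeric(t(u1) %*% S %*% u1))   # equal

# Random unit vectors never exceed lambda1:
w <- matrix(rnorm(3 * 1000), nrow = 3)
w <- sweep(w, 2, sqrt(colSums(w^2)), "/")        # normalize each column
max(colSums(w * (S %*% w))) <= e$values[1]       # TRUE (up to rounding)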

Theory of SVD

  • SVD answers the question of how much a matrix “stretches” a unit vector \( \mathbf{x} \).

  • Note: the same unit vector \( \mathbf{x} \) that maximizes \( \left\lVert A\mathbf{x}\right\rVert \), also maximizes,

\[ \begin{aligned} \left\lVert A\mathbf{x}\right\rVert^2&=(A\mathbf{x})^T(A\mathbf{x})\\ &=\mathbf{x}^TA^TA\mathbf{x}\\ &=\mathbf{x}^T(A^TA)\mathbf{x} \end{aligned} \]

Theory of SVD

  • We know from the previous theorem that the maximum of \( \mathbf{x}^T(A^TA)\mathbf{x} \) over unit vectors is the greatest eigenvalue of \( A^TA \)

  • Maximum occurs when \( \mathbf{x}=\mathbf{v}_1 \) where \( A^TA\mathbf{v}_1=\lambda_1\mathbf{v}_1 \)

  • \( A \) stretches \( \mathbf{v}_1 \) the most, by the amount \( \sigma_1=\sqrt{\lambda_1} \), and stretches the remaining mutually orthogonal directions \( \mathbf{v}_2,\ldots,\mathbf{v}_n \) by the singular values \( \sigma_2\geq\cdots\geq\sigma_n \), the square roots of the eigenvalues of \( A^TA \) in decreasing order; a small numerical check follows.
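
A small R check of this stretching interpretation (the matrix and names are assumptions):

# Sketch: A stretches the first right singular vector v1 by sigma1 = sqrt(lambda1).
set.seed(7)
A <- matrix(rnorm(20 * 3), nrow = 20)
s <- svd(A)
v1 <- s$v[, 1]

c(stretch = sqrt(sum((A %*% v1)^2)),   # ||A v1||
  sigma1  = s$d[1])                    # largest singular value; the two values agree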

Applying SVD to Perform PCA Continued...

\[ \begin{aligned} \frac{1}{n-1}BB^T&=\frac{1}{n-1}\begin{bmatrix}\mathbf{x}_1-\bar{\mathbf{x}}&\mathbf{x}_2-\bar{\mathbf{x}}&\cdots&\mathbf{x}_n-\bar{\mathbf{x}}\end{bmatrix}\begin{bmatrix}(\mathbf{x}_1-\bar{\mathbf{x}})^T\\(\mathbf{x}_2-\bar{\mathbf{x}})^T\\\vdots\\(\mathbf{x}_n-\bar{\mathbf{x}})^T\end{bmatrix}\\ &=\frac{1}{n-1}\sum_{i=1}^n\begin{bmatrix}x_{1i}-\bar{x}_1\\\vdots\\x_{pi}-\bar{x}_p\end{bmatrix}\begin{bmatrix}x_{1i}-\bar{x}_1&\cdots&x_{pi}-\bar{x}_p\end{bmatrix}\\ &=\frac{1}{n-1}\sum_{i=1}^n(\mathbf{x}_i-\mathbf{\bar{x}})(\mathbf{x}_i-\mathbf{\bar{x}})^T\\ &=S \end{aligned} \]

3D Plot of Original Data


Plot of Percent Homeless