<style type="text/css">
.remark-slide-number { display: none; }
</style>

<center>

# Understanding Covariance & Correlation

#### University of Toronto Department of Statistics

<img src="data:image/png;base64,#Figs/animation.gif" style="display: block; margin: auto;" />

---

<center>

# Recall: Variance

**Variance** measures how far individual data points are spread out from the mean.

<img src="data:image/png;base64,#Figs/var.png" width="3200" style="display: block; margin: auto;" />

**Variance** tells us about the **spread of scores** for ***a single variable***

---

<center>

# Extending to Covariance

**Covariance** is like variance, but for two variables: it tells us **how two variables vary together**.

<img src="data:image/png;base64,#Figs/cov.png" width="3600" style="display: block; margin: auto;" />

.left[
- **Positive**: Higher scores on *x* are associated with ***higher*** scores on *y*
- **Negative**: Higher scores on *x* are associated with ***lower*** scores on *y*
- **Low / zero covariance**: No clear linear relationship between *x* and *y*
]

---

<center>

# Extending to Covariance

**Covariance** is like variance, but for two variables: it tells us **how two variables vary together**.

.left[
The formula for covariance of a **population** is:
]

`$$Cov(X,Y) = \mathbb{E}[(X - \mu_{x})(Y - \mu_{y})]$$`

.left[
In this formula, `\(X\)` and `\(Y\)` are random variables.

- `\(\mu_{x}\)` is the expected value (mean) of `\(X\)`
- `\(\mu_{y}\)` is the expected value (mean) of `\(Y\)`
- `\(\mathbb{E}\)` denotes the expectation operator
]

Population covariance is often difficult to ascertain, so in practice we measure a subset of the population (i.e., **a sample**) to estimate it.

---

<center>

# Extending to Covariance

**Covariance** is like variance, but for two variables: it tells us **how two variables vary together**.

.left[
The formula for covariance of a **sample** is:
]

`$$cov(x,y) = \dfrac{\sum(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$`

.left[
- `\(x_i\)` represents each data point's x-value
- `\(\bar{x}\)` represents the mean of *x*
- `\(y_i\)` represents each data point's y-value
- `\(\bar{y}\)` represents the mean of *y*
- `\(N\)` represents the number of paired observations
]

This formula distills the extent to which two variables move together into a single numerical value.

---

<center>

# Extending to Covariance

**Covariance** is like variance, but for two variables: it tells us **how two variables vary together**.

.left[
- Units of covariance depend on the units of the quantities being measured.
]

.left[
***Imagine that we measure a group of 100 students' heights and weights...***
]

<img src="data:image/png;base64,#Figs/units.png" width="75%" style="display: block; margin: auto;" />

Although the nature of the relationship between the quantities is the same, the value of the covariance changes with the units.

---

<center>

# Enter Correlation!

**Correlation** standardizes covariance and makes it unit-free.

.left[
Regardless of what units of measure are used, **correlation ranges from -1 to +1**.
]

.left[
- There are many different ways to write the correlation formula.
- One form that follows directly from covariance is:
]

`$$r = \dfrac{cov(x,y)}{s_x \cdot s_y}$$`

.left[
- Covariance is divided by the product of the variables' standard deviations.
- `\(s_x\)` and `\(s_y\)` are the sample standard deviations of *x* and *y*
]

---

<center>

# Enter Correlation!

**Correlation** standardizes covariance and makes it unit-free.

.left[
Regardless of what units of measure are used, **correlation ranges from -1 to +1**.
]

.left[
- The correlation between two quantities is the same no matter what scale of measurement is used.
]

<img src="data:image/png;base64,#Figs/cor_units.png" width="75%" style="display: block; margin: auto;" />

---

<center>

# Magnitude of Correlation

.left[
- The magnitude of correlation does not provide complete information about the relationship
- Infinitely many data patterns can produce the same correlation!
]

<img src="data:image/png;base64,#Figs/quartet.png" width="75%" style="display: block; margin: auto;" />

---

<center>

# Magnitude of Correlation

.left[
- Try adjusting the slider below to see how the plotted data points change to match the input!
]

<iframe src="https://black-cat-enthusiast.shinyapps.io/correl/" width="100%" height="400" style="border:none;"></iframe>

.left[
- [Click here](https://www.guessthecorrelation.com/) to play a quick and fun "guess the correlation" game.
]

---

<center>

# Summary

.left[
- **Covariance** measures how two variables move together
  + Units matter! The scales of measurement alter the calculated value of covariance.
]

.left[
- **Correlation** standardizes covariance so that it can only range from -1 to +1
  + The unit-free nature of correlation makes it easier to compare different outcomes.
]

.left[
- Visualizing data in addition to checking correlation is important to understand the nature of the relationship between two variables.
]
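
---

<center>

# Worked Example: Covariance vs. Correlation

.left[
The sample covariance and correlation formulas can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `sample_cov` and `sample_corr` are illustrative, and the heights and weights are made-up values for five hypothetical students.

```python
from math import sqrt


def sample_cov(x, y):
    """Sample covariance: sum((x_i - x_bar) * (y_i - y_bar)) / (N - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Correlation: r = cov(x, y) / (s_x * s_y)."""
    s_x = sqrt(sample_cov(x, x))  # cov(x, x) is the sample variance of x
    s_y = sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)


# Made-up data for five hypothetical students (assumption, not real data)
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [50.0, 58.0, 63.0, 72.0, 80.0]

# The same measurements expressed in inches and pounds
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

cov_metric = sample_cov(height_cm, weight_kg)
cov_imperial = sample_cov(height_in, weight_lb)
r_metric = sample_corr(height_cm, weight_kg)
r_imperial = sample_corr(height_in, weight_lb)

print(f"cov (cm, kg): {cov_metric:.2f}   cov (in, lb): {cov_imperial:.2f}")
print(f"r   (cm, kg): {r_metric:.4f}   r   (in, lb): {r_imperial:.4f}")
```

Converting the units changes the covariance but leaves the correlation unchanged, up to floating-point rounding.
]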