MAS 261 - Lecture 20
Introduction to Correlation and Covariance
Housekeeping
Today’s plan
Upcoming Dates
R Help 🪄
Review and New Questions
Two Sided Test of Proportions and Contingency Tables
Row Percentages and Column Percentages
Understanding Correlation
Examining correlations visually and quantitatively
Estimating correlation quantitatively
Converting Correlation to Covariance
Why and How
Conversion Formulas
Upcoming Dates
HW 6 was due 10/29 (Grace period ends today 10/30)
Demo videos are posted on Blackboard
This assignment seems long but it’s not.
It consists of just three hypothesis tests with questions about each test.
Most questions are multiple choice, but do not just guess and keep trying.
HW 7 is posted and is due Wed. 11/5 at midnight.
- Videos are posted.
Test 2 is on November 11th and will include material up through Lecture 20
Lecture 21 - Intro to Portfolio Management will be on Final Exam, not on Test 2.
R and RStudio
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
For those who want to go further with R/RStudio:
If you are interested in downloading R and RStudio to your own computer, I can guide you through the process.
The software is completely free but it does have to be updated a couple times each year.
Review and NEW
Do Gen-Zs and Millennials differ from Gen-Xers with respect to daylight savings?
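As a rough sketch of how this review question could be checked in R, the two-sided test of proportions can be run with `prop.test()` on the survey counts from the Original Data table. The variable names are my own, and the default settings used here (including the continuity correction) are an assumption that may differ from what was used in earlier lectures.

```r
# 'Yes' counts and group sizes from the Original Data table
yes_counts <- c(`18-44` = 228, `45-64` = 201)
group_sizes <- c(`18-44` = 433, `45-64` = 319)

# Two-sided test of H0: the 'Yes' proportion is the same in both age groups
# (prop.test applies a continuity correction by default)
result <- prop.test(x = yes_counts, n = group_sizes, alternative = "two.sided")

result$estimate  # sample proportion of 'Yes' in each age group
result$p.value   # p-value for the two-sided test
```

A small p-value would suggest that the two age groups differ in the proportion who said 'Yes'.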
Column and Row Percentages
Original Data
| Age | Yes | No/Not Sure | Row Totals |
|---|---|---|---|
| 18-44 | 228 | 205 | 433 |
| 45-64 | 201 | 118 | 319 |
| Col. Totals | 429 | 323 | 752 |
Row %: Percentage of each age group that said ‘Yes’ or ‘No/Not Sure’.
| Age | Yes | No/Not Sure |
|---|---|---|
| 18-44 | 52.66 | 47.34 |
| 45-64 | 63.01 | 36.99 |
Column %: Percentage of each response (‘Yes’ or ‘No/Not Sure’) that came from each age group.
| Age | Yes | No/Not Sure |
|---|---|---|
| 18-44 | 53.15 | 63.47 |
| 45-64 | 46.85 | 36.53 |
Lecture 20 In-class Exercises - Q1-Q2
Poll Everywhere - My User Name: penelopepoolereisenbies685
Review data with new concepts - Use tables on previous slide
Question 1. What percentage of all the ‘Yes, let’s end daylight savings!’ votes are in the 45-64 age group?
Round percentage to one decimal place.
Question 2. What percentage of all 18-44 year olds said ‘No’ or ‘Not sure’ when asked if they want to eliminate daylight savings?
Round percentage to one decimal place.
Note: There will be homework questions providing more practice on relating questions to these percentage tables.
Row and column percentages can be calculated from raw data, but I provide them.
These questions focus on interpretation instead of arithmetic.
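For anyone curious how the row and column percentage tables are produced, here is a minimal sketch using the counts from the Original Data table. The object names (`dst`, `row_pct`, `col_pct`) are made up for illustration, and this is one of several ways to do it in R.

```r
# Counts from the Original Data table (rows: age group, columns: response)
dst <- matrix(c(228, 205,
                201, 118),
              nrow = 2, byrow = TRUE,
              dimnames = list(Age = c("18-44", "45-64"),
                              Response = c("Yes", "No/Not Sure")))

addmargins(dst)  # counts with row and column totals added

# margin = 1 divides by row totals; margin = 2 divides by column totals
row_pct <- round(100 * prop.table(dst, margin = 1), 2)
col_pct <- round(100 * prop.table(dst, margin = 2), 2)

row_pct  # each row sums to 100
col_pct  # each column sums to 100
```

Row percentages answer questions that start with an age group; column percentages answer questions that start with a response.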
Linear Correlations
- The last part of the course will focus on understanding linear relationships between two or more quantitative variables.
- We will introduce the first part of this topic today and then build on these concepts after Quiz 2.
Often if we have two quantitative variables we want to understand the extent to which they are associated.
The first step is often to plot the data using a scatterplot.
We can also use quantitative measures of association to understand these relationships.
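A minimal sketch of that first step, using simulated data rather than the grocery data from lecture. The variable names and the numbers that generate the data are made up for illustration only.

```r
set.seed(261)  # makes the simulated data reproducible

# Simulated (hypothetical) data: y tends to increase as x increases
x <- runif(30, min = 200, max = 800)
y <- 0.02 * x + rnorm(30, mean = 0, sd = 2)

# Scatterplot: look for direction (up or down) and how tightly points cluster
plot(x, y,
     xlab = "x (simulated)", ylab = "y (simulated)",
     main = "Scatterplot of two quantitative variables")
abline(lm(y ~ x), lty = 2)  # dashed reference line showing the linear trend
```

Points that rise from left to right suggest a positive relationship; points that fall suggest a negative one.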
Grocery Sales per Sq. Ft. and Planned Store Openings
Understanding Linear Relationships
Direction of the Relationship
Strength of the Linear Relationship
In addition to determining whether the relationship is positive or negative,
- We also want to quantify how strong the relationship is.
To quantify the strength of a linear relationship, we calculate:
Pearson’s correlation coefficient, \(R_{xy}\).
\(R_{xy} = 0.85\)
How do we interpret this value?
- …Spoiler: This is a strong positive correlation!
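To show what the `cor` command is doing, here is a small sketch with made-up numbers (not the grocery data). It computes the correlation step by step as covariance divided by the product of the two standard deviations, then compares that to R's built-in result.

```r
# Small hypothetical data set, for illustration only
x <- c(2, 4, 5, 7, 9, 10)
y <- c(1, 3, 4, 8, 9, 12)
n <- length(x)

# Sample covariance: average product of deviations from the means (n - 1 in the denominator)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance rescaled by the two standard deviations
r_manual <- cov_manual / (sd(x) * sd(y))

r_builtin <- cor(x, y)  # Pearson's correlation from the built-in command

r_manual
r_builtin
```

Both values agree, and a value close to 1 indicates a strong positive linear relationship.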
Interpreting \(R_{xy}\), the correlation coefficient
- The most extreme \(R_{xy}\) values, \(R_{xy} = -1\) and \(R_{xy} = 1\), represent ‘perfectly correlated data’.
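A quick sketch of perfectly correlated data, using made-up values: when y is an exact straight-line function of x, the correlation sits at one of the two extremes.

```r
x <- 1:10

y_up <- 3 * x + 2       # exact line with a positive slope
y_down <- -0.5 * x + 4  # exact line with a negative slope

cor(x, y_up)    # 1 (up to floating-point rounding): perfect positive correlation
cor(x, y_down)  # -1 (up to floating-point rounding): perfect negative correlation
```

Real data almost never fall exactly on a line, so observed values land strictly between -1 and 1.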
Range of \(R_{xy}\) Guidelines for Interpretation
Example of Negative Correlation
Lecture 20 In-class Exercises - Q3
Poll Everywhere - My User Name: penelopepoolereisenbies685
When NOT to use \(R_{xy}\)
\(R_{xy}\) is only valid when examining linear relationships.
If the data have a curvilinear relationship, there are other tools that will be covered in other courses.
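Here is a small illustration with made-up data of why \(R_{xy}\) can mislead for curved relationships. In this sketch y is completely determined by x, yet the correlation is essentially zero because the relationship is not linear.

```r
# Hypothetical curved (U-shaped) relationship: y depends entirely on x
x <- -10:10
y <- x^2

r_curve <- cor(x, y)
r_curve  # essentially 0, even though y is perfectly predictable from x
```

This is why the scatterplot comes first: a plot of these data would show the curve immediately.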
Calculating Covariance from Correlation
\(R_{xy}\), the correlation, is straightforward to interpret because it is unitless.
\(R_{xy}\) is ALWAYS between -1 and 1 and is interpreted the same way regardless of the units of x and y.
Another measure, Covariance, is also useful for calculations.
In Lecture 21, we will cover how to create and examine a linear combination of multiple variables.
Example: Mutual funds and stock portfolios are linear combinations of stocks.
In order to examine linear combinations of variables we first calculate their covariance:
Covariance of two variables, X and Y:
\(COV_{xy} = R_{xy} \times S_{x} \times S_{y}\)
\(R_{xy} = \frac{COV_{xy}}{S_{x} \times S_{y}}\)
- \(S_{x}\) is the standard deviation of x and \(S_{y}\) is the standard deviation of y.
Calculating \(COV_{xy}\) from the Data or \(R_{xy}\)
Below I show Covariance/Correlation calculations using the Grocery Data
In HW 7 you will use these formulas because you don’t have the data.
ALSO: Remember that if you are given the variance (which you are), the standard deviation is the square root of the variance.
The R command to find a square root is `sqrt()`.
```{r echo=TRUE}
Rxy <- cor(grocery$sales_sq_ft, grocery$openings)       # correlation of x and y
Sx <- sd(grocery$sales_sq_ft)                           # standard deviation of x
Sy <- sd(grocery$openings)                              # standard deviation of y
Rxy * Sx * Sy                                           # covariance from Rxy and the SDs
cov(grocery$sales_sq_ft, grocery$openings)              # covariance directly from the data
cov(grocery$sales_sq_ft, grocery$openings) / (Sx * Sy)  # Rxy recovered from the covariance
```
[1] 1754.144
[1] 1754.144
[1] 0.8517842
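When only summary values are available, the same formulas still work. The sketch below uses hypothetical numbers (not from the grocery data or from HW 7) and assumes you are given the correlation and the two variances.

```r
# Hypothetical summary values, for illustration only
r_xy <- 0.6   # correlation between x and y
var_x <- 25   # variance of x
var_y <- 4    # variance of y

# Standard deviation is the square root of variance
s_x <- sqrt(var_x)  # 5
s_y <- sqrt(var_y)  # 2

# Correlation to covariance: COV = R * Sx * Sy
cov_xy <- r_xy * s_x * s_y
cov_xy  # 0.6 * 5 * 2 = 6

# Covariance back to correlation: R = COV / (Sx * Sy)
r_check <- cov_xy / (s_x * s_y)
r_check  # recovers the original correlation of 0.6
```

A common mistake is to plug the variances directly into the formula; take the square roots first.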
Key Points from Today
This short lecture is an introduction to linear associations between variables.
We will continue this discussion in Lecture 21 when we examine linear combinations of variables.
- This topic will provide insight into Portfolio Management.
For now, you are expected to understand
- How to interpret a scatterplot
- Calculating \(R_{xy}\) in R using the `cor` command
- Interpreting \(R_{xy}\)
- When NOT to use \(R_{xy}\) to examine data associations
- How to convert \(R_{xy}\) to \(COV_{xy}\) and vice versa
To submit an Engagement Question or Comment about material from Lecture 20: Submit it by midnight today (day of lecture).