Correlation measures how two variables are related:
- Positive correlation: When one goes up, the other goes up
- Negative correlation: When one goes up, the other goes down
- No correlation: No clear linear relationship
The correlation coefficient ranges from -1 to +1.
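As a quick illustration of the three cases, here is a minimal sketch using small made-up vectors. The names x, y_pos, y_neg and y_none are hypothetical and the numbers are invented for the example; only cor() is a real base R function.

```r
# Toy data invented for illustration only
x <- c(-2, -1, 0, 1, 2)

y_pos  <- 2 * x + 1        # rises steadily as x rises
y_neg  <- -3 * x           # falls steadily as x rises
y_none <- c(3, 1, 2, 1, 3) # no overall upward or downward trend

r_pos  <- cor(x, y_pos)    # perfect positive relationship
r_neg  <- cor(x, y_neg)    # perfect negative relationship
r_none <- cor(x, y_none)   # no linear trend

print(round(c(positive = r_pos, negative = r_neg, none = r_none), 3))
```

With these particular vectors, the three coefficients should land on the two ends of the scale and at its midpoint. Real data will almost always fall somewhere in between.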
The correlation coefficient (Pearson’s r) is calculated as:
\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\]
where \(\bar{x}\) and \(\bar{y}\) are the means of variables X and Y.
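To make the formula concrete, the sketch below computes it term by term and compares the result with R's built-in cor(). The helper pearson_r and the two short vectors are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper that follows the formula above step by step
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Made-up example values
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

manual  <- pearson_r(x, y)
builtin <- cor(x, y)   # Pearson is the default method

# The two values should agree up to floating-point rounding
print(c(manual = manual, builtin = builtin))
print(isTRUE(all.equal(manual, builtin)))
```

In practice you would call cor() directly. Writing the formula out by hand is mainly useful for seeing that the numerator measures how the two variables vary together, while the denominator rescales the result to lie between -1 and +1.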
We’ll look at how people’s height relates to their weight using R’s built-in women dataset.
##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
The scatter plot shows a strong positive correlation between height and weight.
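The code behind the table and the plot is not shown here. One way to get a similar view, assuming base R graphics, is sketched below. The axis labels, the title and the variable name r_women are choices made for this example; the units come from the dataset's documentation (height in inches, weight in pounds).

```r
# First ten rows of the built-in women dataset
print(head(women, 10))

# Scatter plot of weight against height (base R graphics)
plot(women$height, women$weight,
     xlab = "Height (in)", ylab = "Weight (lb)",
     main = "Height vs. weight (women dataset)")

# Pearson correlation between the two columns
r_women <- cor(women$height, women$weight)
print(round(r_women, 3))
```

The printed coefficient should be close to +1, which matches the tight upward pattern in the scatter plot.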
# Calculate correlation between two variables
x <- mtcars$mpg
y <- mtcars$wt
correlation <- cor(x, y)
print(paste("Correlation:", round(correlation, 3)))
## [1] "Correlation: -0.868"
This shows a strong negative correlation: heavier cars tend to get fewer miles per gallon!
A common rule of thumb for interpreting the strength of a correlation:
\[|r| = 0.0 \text{ to } 0.3: \text{ weak correlation}\]
\[|r| = 0.3 \text{ to } 0.7: \text{ moderate correlation}\]
\[|r| = 0.7 \text{ to } 1.0: \text{ strong correlation}\]
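These cutoffs can be turned into a small labelling helper. The function correlation_strength below is hypothetical, not part of base R. Because the ranges above share their endpoints, the sketch assumes that a value of exactly 0.3 or 0.7 goes into the stronger category.

```r
# Hypothetical helper: label |r| using the cutoffs listed above.
# Assumption: boundary values (0.3, 0.7) fall into the stronger category.
correlation_strength <- function(r) {
  stopifnot(is.numeric(r))
  a <- pmin(abs(r), 1)   # guard against tiny floating-point overshoot past 1
  cut(a,
      breaks = c(0, 0.3, 0.7, 1),
      labels = c("weak", "moderate", "strong"),
      right = FALSE,          # intervals are [0, 0.3), [0.3, 0.7), [0.7, 1]
      include.lowest = TRUE)  # with right = FALSE this keeps |r| = 1
}

# Example: the mpg vs. weight correlation from mtcars
r <- cor(mtcars$mpg, mtcars$wt)
print(as.character(correlation_strength(r)))
```

The sign is dropped with abs() because strength depends only on the size of r; the direction of the relationship is read from the sign separately.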
Remember: Correlation does NOT mean causation!
Key points about correlation:
- Use the cor() function to compute it in R