What is Correlation?

Correlation measures the strength and direction of the linear relationship between two variables:

  • Positive correlation: When one goes up, the other tends to go up
  • Negative correlation: When one goes up, the other tends to go down
  • No correlation: No clear linear relationship

The correlation coefficient ranges from -1 to +1.

The Correlation Formula

The correlation coefficient (Pearson’s r) is calculated as:

\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\]

where \(\bar{x}\) and \(\bar{y}\) are the means of variables X and Y.
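
As a minimal sketch of the formula in action, the snippet below computes r term by term and compares it with R's built-in cor(). The vectors x and y are made-up illustrative values, not data from this presentation.

```r
# Illustrative data (hypothetical values chosen for easy arithmetic)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

dx <- x - mean(x)   # deviations of x from its mean
dy <- y - mean(y)   # deviations of y from its mean

# Numerator: sum of cross-products; denominator: sqrt of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The hand-computed value should agree with the built-in function
print(all.equal(r_manual, cor(x, y)))
```

For these values the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / sqrt(60), roughly 0.775.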

Example: Height and Weight

We’ll look at how people’s height relates to their weight using R’s built-in women dataset. The first 10 of its 15 rows are shown below.

##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142

Basic Visualization
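
The plot itself is not reproduced in this text. A basic scatter plot of the women dataset could be drawn along the following lines; the choice of base graphics, the title, and the point style are assumptions, not necessarily what the original slide used.

```r
# Scatter plot of height against weight from the built-in women dataset
# (base graphics; styling here is an assumption, not the original slide's)
plot(women$height, women$weight,
     xlab = "Height (in)",
     ylab = "Weight (lbs)",
     main = "Height vs. Weight",
     pch  = 19)
```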

Analysis of Plot

The scatter plot shows a strong positive correlation between height and weight.

  • Taller individuals tend to weigh more
  • Data points follow a clear upward trend
  • The pattern supports a strong linear relationship
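
The visual impression can be checked numerically. As a small sketch, the correlation for the full women dataset can be computed with cor(); the variable name r_women is introduced here for illustration only.

```r
# Pearson correlation between height and weight in the built-in women dataset
r_women <- cor(women$height, women$weight)
print(round(r_women, 3))
```

A value close to +1 is consistent with the strong upward trend described above.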

Simple R Code Example

# Calculate correlation between two variables
x <- mtcars$mpg
y <- mtcars$wt

correlation <- cor(x, y)
print(paste("Correlation:", round(correlation, 3)))
## [1] "Correlation: -0.868"

This shows a strong negative correlation between fuel economy and weight: heavier cars get fewer miles per gallon!

Scatter Plot with Trend
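
The plot for this slide is not reproduced in this text, and the data it used is not stated. As a hedged sketch, a scatter plot with a fitted trend line could look like the following, assuming the mtcars variables from the previous example and a simple least-squares line from lm().

```r
# Fit a straight line: mpg as a function of weight (assumed variables from mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Scatter plot with the fitted line overlaid
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch  = 19)
abline(fit, col = "red", lwd = 2)
```

The negative slope of the line matches the sign of the correlation coefficient.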

3D Interactive Plot

Interpreting Correlation Values

\[|r| = 0.0 \text{ to } 0.3: \text{ weak correlation}\]

\[|r| = 0.3 \text{ to } 0.7: \text{ moderate correlation}\]

\[|r| = 0.7 \text{ to } 1.0: \text{ strong correlation}\]
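
These bands can be turned into a small helper. The function interpret_r below is hypothetical (not part of base R), and because the ranges above share their endpoints, it assumes that a boundary value such as 0.3 or 0.7 belongs to the stronger band.

```r
# Hypothetical helper: label a correlation by the bands above.
# Assumption: boundary values (0.3, 0.7) are assigned to the stronger band.
interpret_r <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  cut(abs(r),
      breaks = c(0, 0.3, 0.7, 1),
      labels = c("weak", "moderate", "strong"),
      right = FALSE,          # intervals are [0, 0.3), [0.3, 0.7), [0.7, 1]
      include.lowest = TRUE)  # with right = FALSE this keeps |r| = 1 in the last band
}

# Example: the mpg vs. weight correlation from mtcars
print(interpret_r(cor(mtcars$mpg, mtcars$wt)))
```

The sign is dropped before labelling, so the result describes strength only; direction still comes from the sign of r.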

Remember: Correlation does NOT mean causation!

Summary

Key points about correlation:

  • Measures strength and direction of linear relationships
  • Ranges from -1 (perfect negative) to +1 (perfect positive)
  • Easy to calculate in R using the cor() function
  • Visualizations help understand the relationships
  • Always check for outliers and non-linear patterns
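
The last point can be illustrated with two small made-up examples (all values below are illustrative, not from the datasets above): a perfectly predictable but non-linear relationship that yields r = 0, and a single outlier that reverses the sign of an otherwise perfect negative correlation.

```r
# Non-linear pattern: y is fully determined by x, yet there is no linear trend
x_curve <- -5:5
y_curve <- x_curve^2
r_curve <- cor(x_curve, y_curve)

# Outlier: five points on a perfectly decreasing line, plus one extreme point
x_base <- c(1, 2, 3, 4, 5)
y_base <- c(5, 4, 3, 2, 1)
r_base    <- cor(x_base, y_base)                 # without the outlier
r_outlier <- cor(c(x_base, 20), c(y_base, 20))   # with the outlier at (20, 20)

print(c(curve = r_curve, base = r_base, with_outlier = r_outlier))
```

In both cases a scatter plot reveals what the single number hides, which is why plotting the data before interpreting r is worthwhile.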