Correlations

Illya Mowerman, Ph.D.

Why correlations?

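A correlation measures the strength and direction of the linear relationship between two continuous variables, and a correlation test tells us whether that relationship is statistically significant.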

What is the relationship between MPG and HP?

Can you see the relationship?

library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Scatter Plot: MPG vs. Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

What is the relationship between MPG and HP?

Can you see the relationship better?

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatter Plot: MPG vs. Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

What is the relationship between MPG and HP?

This is how to test it statistically

cor_test_result <- cor.test(mtcars$mpg, mtcars$hp)
cor_test_result
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684
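The printed result is an ordinary R list, so the numbers above can be pulled out individually. The following is a minimal sketch, assuming the same `mtcars` data; the variable names `r_hat`, `p_val`, `ci`, and `summary_line` are illustrative and not part of the original slides.

```r
# Re-run the test from the slide and keep the result object
cor_test_result <- cor.test(mtcars$mpg, mtcars$hp)

# cor.test() returns an "htest" list; these components are described in ?cor.test
r_hat <- unname(cor_test_result$estimate)      # sample correlation coefficient
p_val <- cor_test_result$p.value               # two-sided p-value
ci    <- as.numeric(cor_test_result$conf.int)  # lower and upper 95% bounds

# summary_line is a hypothetical name used only for this illustration
summary_line <- sprintf("r = %.2f, 95%% CI [%.2f, %.2f], p = %.3g",
                        r_hat, ci[1], ci[2], p_val)
print(summary_line)
```

Extracting the components this way is handy when the coefficient, interval, and p-value need to go into a table or a sentence rather than being read off the console.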

Introduction to Correlation

Interpreting P-values

Interpreting Correlation Coefficients

What is the relationship between MPG and Weight?

cor.test(mtcars$mpg, mtcars$wt)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$wt
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9338264 -0.7440872
## sample estimates:
##        cor 
## -0.8676594
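To see where the coefficient comes from, it can be reproduced by hand: Pearson's r is the covariance of the two variables divided by the product of their standard deviations. This is a small sketch on the same data; `r_manual` and `r_builtin` are illustrative names.

```r
# Pearson's r = cov(x, y) / (sd(x) * sd(y))
x <- mtcars$wt
y <- mtcars$mpg

r_manual  <- cov(x, y) / (sd(x) * sd(y))  # computed from the definition
r_builtin <- cor(x, y)                    # what cor() and cor.test() report

# Both approaches should agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```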

Visualizing Correlations: MPG vs Weight

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatter Plot: MPG vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")

Interpretation: We can clearly see that as weight increases, fuel efficiency (MPG) decreases.

What is the relationship between Horsepower and Quarter-mile Time?

cor.test(mtcars$hp, mtcars$qsec)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$hp and mtcars$qsec
## t = -5.4946, df = 30, p-value = 5.766e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8475998 -0.4774331
## sample estimates:
##        cor 
## -0.7082234

Visualizing Correlations: Horsepower vs Quarter-mile Time

ggplot(mtcars, aes(x = hp, y = qsec)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Scatter Plot: Quarter-mile Time vs. Horsepower",
       x = "Horsepower",
       y = "Quarter-mile Time (seconds)")

Interpretation: Cars with more horsepower generally complete the quarter-mile in less time.

Layman’s Explanation of Key Correlations

Computing Correlations

cor_matrix <- cor(mtcars)
round(cor_matrix, 2)
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
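A full matrix can be hard to scan, so one option is to pull out a single row and rank it. The sketch below, assuming the question of interest is which variables track MPG most closely, orders the MPG correlations by absolute strength; `mpg_cors` and `mpg_ranked` are illustrative names.

```r
cor_matrix <- cor(mtcars)

# Take the mpg row and drop mpg's correlation with itself
mpg_cors <- cor_matrix["mpg", colnames(cor_matrix) != "mpg"]

# Order by absolute value so strong negative and strong positive both rank high
mpg_ranked <- mpg_cors[order(abs(mpg_cors), decreasing = TRUE)]
round(mpg_ranked, 2)
```

Ranking by absolute value matters here because most of the strong relationships with MPG are negative.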

Correlation Matrix Visualization

library(corrplot)  # provides corrplot()

corrplot(cor_matrix, method = "color", type = "upper", order = "hclust", 
         tl.col = "black", tl.srt = 45)

Statistical Significance: MPG vs Weight

cor_test_result <- cor.test(mtcars$mpg, mtcars$wt)
cor_test_result
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$wt
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9338264 -0.7440872
## sample estimates:
##        cor 
## -0.8676594

Interpretation:
- Correlation coefficient: -0.8677 (strong negative correlation)
- P-value: 1.294e-10 (much smaller than 0.05, so statistically significant)
- If the true correlation were zero, a sample correlation this strong would be extremely unlikely, so we can be very confident it is not due to chance alone
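The t statistic and p-value in the output can be reconstructed from the coefficient and the sample size, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This is a rough sketch of that calculation; `t_manual` and `p_manual` are illustrative names.

```r
r <- cor(mtcars$mpg, mtcars$wt)  # sample correlation
n <- nrow(mtcars)                # 32 cars, so df = 30

# Test statistic for H0: true correlation is 0
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

c(t = t_manual, p = p_manual)
```

The formula shows why the same r becomes more significant as n grows: the sqrt(n - 2) factor scales the statistic.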

Multiple Variable Relationships

pairs(mtcars[, c("mpg", "disp", "hp", "wt")], 
      main = "Scatter Plot Matrix of Selected mtcars Variables")

This matrix shows relationships between multiple variables at once.

Layman’s Explanation of Overall Patterns

Best Practices and Pitfalls

Conclusion

Thank You!

Questions? Comments?