2024-03-29

Introduction to Correlation

  • Correlation measures the strength and direction of the relationship between two variables
  • Positive correlation: As one variable increases, the other variable also tends to increase
  • Negative correlation: As one variable increases, the other variable tends to decrease
  • Common correlation coefficients: Pearson, Spearman, and Kendall
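Here is a minimal R sketch of the sign convention, using made-up vectors. The names `x`, `y_up`, and `y_down` are hypothetical. In R, all three coefficients listed above come from `cor()`, selected with its `method` argument (`"pearson"`, `"spearman"`, or `"kendall"`).

```r
# Hypothetical toy data illustrating the direction of a correlation
x      <- 1:10
y_up   <- 2 * x + 3      # increases as x increases
y_down <- 100 - 5 * x    # decreases as x increases

r_pos <- cor(x, y_up)    # +1 (up to rounding): perfect positive linear relationship
r_neg <- cor(x, y_down)  # -1 (up to rounding): perfect negative linear relationship
print(c(positive = r_pos, negative = r_neg))

# The same function computes the rank-based coefficients via `method`
cor(x, y_up, method = "spearman")
cor(x, y_up, method = "kendall")
```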

Pearson Correlation Coefficient

\[ r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2}\sum{(y_i - \bar{y})^2}}} \]

  • Measures the strength and direction of the linear relationship between two variables
  • Computed as the covariance of the two variables divided by the product of their standard deviations, so it always lies between -1 and +1
  • Widely used for assessing linear relationships between continuous variables in many fields, including statistics, economics, and psychology
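You can check the formula numerically. In the sketch below, the helper `pearson_r` is a hypothetical name, not part of base R. It computes r term by term from the deviations in the formula, then compares the result with base R's `cor()` and with the covariance-over-standard-deviations form. The mpg and hp columns of mtcars serve only as sample input.

```r
data(mtcars)

# Hypothetical helper: Pearson's r written term by term from the formula
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations x_i - x_bar
  dy <- y - mean(y)   # deviations y_i - y_bar
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(mtcars$mpg, mtcars$hp)
r_builtin <- cor(mtcars$mpg, mtcars$hp)   # method = "pearson" is the default
# Covariance divided by the product of the standard deviations
r_cov <- cov(mtcars$mpg, mtcars$hp) / (sd(mtcars$mpg) * sd(mtcars$hp))

all.equal(r_manual, r_builtin)
all.equal(r_manual, r_cov)
```

Both checks should report `TRUE`. The n - 1 denominators used by `cov()` and `sd()` cancel in the ratio.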

Spearman Correlation

\[ \rho = \frac{\sum{(x'_i - m_{x'})(y'_i - m_{y'})}}{\sqrt{\sum{(x'_i - m_{x'})^2}\sum{(y'_i - m_{y'})^2}}} \]

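  • Measures the strength and direction of a monotonic relationship between two variables
  • Equivalent to the Pearson coefficient computed on ranks: \(x'_i\) and \(y'_i\) are the ranks of \(x_i\) and \(y_i\), and \(m_{x'}\), \(m_{y'}\) are their mean ranks
  • Because it uses ranks rather than raw values, it is less sensitive to outliers and can capture monotonic relationships that are not linear
  • Useful when the data violate the assumptions of linearity or normality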
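Here is a small R sketch of the two claims above, on hypothetical data. For a strictly increasing but curved relationship, Spearman's rho equals 1 while Pearson's r does not. Rho is also identical to Pearson's r applied to the ranks. Base R's `rank()` gives tied values their average rank.

```r
# Hypothetical data: strictly increasing, strongly non-linear
x <- 1:20
y <- exp(x / 2)

r_pearson  <- cor(x, y)                       # below 1: the relationship is not linear
r_spearman <- cor(x, y, method = "spearman")  # 1 (up to rounding): the relationship is monotonic
r_ranks    <- cor(rank(x), rank(y))           # Pearson's r computed on the ranks

c(pearson = r_pearson, spearman = r_spearman, pearson_on_ranks = r_ranks)
all.equal(r_spearman, r_ranks)
```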

Kendall Correlation

\[ \tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)} \]

  • Quantifies association by comparing pairs of observations. \(n_c\) is the number of concordant pairs (both variables order the two observations the same way), \(n_d\) is the number of discordant pairs, and \(\frac{1}{2}n(n-1)\) is the total number of pairs
  • Assesses monotonic relationships without assuming linearity or normality
  • Suitable when a non-parametric (rank-based) measure of association is needed
  • The following slides use R's built-in mtcars dataset to illustrate correlation
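The Kendall formula becomes concrete when you count pairs directly. In this minimal sketch, the helper `kendall_tau_a` and the input vectors are hypothetical. The helper implements the formula above (often called tau-a) with a double loop over all pairs. On tie-free data it agrees with `cor(..., method = "kendall")`. With tied values, tie-adjusted variants of tau use a different denominator, so the two results can differ.

```r
# Hypothetical helper: Kendall's tau from concordant/discordant pair counts
kendall_tau_a <- function(x, y) {
  n <- length(x)
  n_c <- 0
  n_d <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) n_c <- n_c + 1   # pair ordered the same way in x and y
      if (s < 0) n_d <- n_d + 1   # pair ordered in opposite ways
    }
  }
  (n_c - n_d) / (n * (n - 1) / 2)
}

# Hypothetical tie-free data: 12 concordant and 3 discordant pairs out of 15
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

tau_manual  <- kendall_tau_a(x, y)            # (12 - 3) / 15 = 0.6
tau_builtin <- cor(x, y, method = "kendall")
all.equal(tau_manual, tau_builtin)
```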

Correlation Between MPG & HP

  • Scatter plot demonstrates the relationship between MPG and HP
  • Negative correlation: as MPG increases, HP tends to decrease (more powerful cars tend to be less fuel-efficient)
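Base R's `cor.test()` can quantify and test the relationship shown in the scatter plot. This sketch assumes the default Pearson method, and the object name `res` is arbitrary.

```r
data(mtcars)

# Pearson correlation between mpg and hp, with a test of H0: true correlation = 0
res <- cor.test(mtcars$mpg, mtcars$hp)

res$estimate   # sample correlation r (negative for these variables)
res$conf.int   # 95% confidence interval for the true correlation
res$p.value    # p-value of the test
```

Base R's `plot(mtcars$hp, mtcars$mpg)` draws a scatter plot similar to the one on this slide. The slide's own plotting code is not shown, so this is only an approximation.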

Correlation Between Displacement & HP

  • Scatter plot demonstrates the relationship between Disp and HP
  • Positive correlation: As the displacement increases, HP tends to increase
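The document introduces three coefficients, so one way to confirm that the positive association does not depend on the choice of measure is to compute all three for disp and hp. Here is a minimal sketch:

```r
data(mtcars)

methods <- c("pearson", "spearman", "kendall")
# sapply over a character vector keeps the method names as element names
r_disp_hp <- sapply(methods, function(m) cor(mtcars$disp, mtcars$hp, method = m))
round(r_disp_hp, 3)
```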

Correlation Between Cyl, Wt, & Qsec

  • Scatter plot demonstrates the relationship between Cyl, Wt, and Qsec
  • Negative correlation: as Cyl increases, Qsec (quarter-mile time) tends to decrease
  • Positive correlation: As Cyl increases, Wt also increases
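For three or more variables, passing a data frame to `cor()` returns the full matrix of pairwise correlations, which summarizes every relationship on this slide in one table. Here is a minimal sketch using the mtcars column names:

```r
data(mtcars)

vars <- c("cyl", "wt", "qsec")
cor_mat <- cor(mtcars[, vars])   # pairwise Pearson correlations; diagonal is 1
round(cor_mat, 2)
```

In base R, `pairs(mtcars[, vars])` draws the corresponding matrix of scatter plots.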

Correlation Between Cyl & Qsec

# Load the mtcars dataset
data(mtcars)

# Calculate correlation between 'cyl' and 'qsec' for the entire dataset
correlation <- cor(mtcars$cyl, mtcars$qsec)

# Print the correlation value
print(paste("Correlation between 'cyl' and 'qsec':", correlation))
## [1] "Correlation between 'cyl' and 'qsec': -0.591242073768869"
  • The moderate negative correlation (-0.59) indicates that cars with more cylinders tend to have shorter quarter-mile times, i.e. they tend to accelerate faster.
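Here are two follow-up checks on this result, sketched in base R. First, `cor.test()` reports whether the Pearson correlation differs significantly from zero. Second, `cyl` takes only the values 4, 6, and 8, which produces many ties, so the rank-based Spearman and Kendall coefficients provide a useful cross-check. The object names are arbitrary.

```r
data(mtcars)

table(mtcars$cyl)   # cyl has only three distinct values

# Pearson correlation with a test of H0: true correlation = 0
test_cyl_qsec <- cor.test(mtcars$cyl, mtcars$qsec)
test_cyl_qsec$p.value

# Rank-based alternatives via cor(), which avoids the tie warnings cor.test() gives for these methods
rho <- cor(mtcars$cyl, mtcars$qsec, method = "spearman")
tau <- cor(mtcars$cyl, mtcars$qsec, method = "kendall")

round(c(pearson = unname(test_cyl_qsec$estimate), spearman = rho, kendall = tau), 3)
```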

Conclusion

  • We explored the concept of correlation and its role in statistical analysis
  • We compared three correlation coefficients (Pearson, Spearman, and Kendall) and the situations in which each one applies
  • Through visualizations and mathematical formulations, we illustrated how correlation analysis can provide insights into relationships within data sets
  • Understanding correlations enables us to make informed decisions in various fields such as economics, finance, and healthcare