Correlation analysis studies the relationship between two numeric variables. This relationship can be measured by the covariance, but the value of the covariance is hard to interpret because it depends on the units of the variables. This is why it is better to convert the covariance into the Pearson correlation coefficient $r_{x_1,x_2}$ ($\rho$ for the population).

Pearson correlation coefficient

$r_{x_1,x_2} = \frac{S_{x_1,x_2}}{\sqrt{S^2_{x_1} S^2_{x_2}}}$

$-1 \le r_{x_1,x_2} \le 1$
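
To make the formula concrete, here is a minimal Python sketch that computes the sample covariance and the Pearson coefficient by hand. The function names and the data values are made up for illustration and do not come from a real dataset. The sketch is meant to show that rescaling a variable (metres to centimetres) changes the covariance but leaves r unchanged.

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance of two equally long sequences (divisor n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Covariance divided by the square root of the product of the variances."""
    # The variance of a variable is its covariance with itself.
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

# Illustrative values only: the same heights in metres and in centimetres.
height_m = [1.0, 1.2, 1.5, 1.7, 2.0]
weight_kg = [20.0, 26.0, 33.0, 41.0, 50.0]
height_cm = [h * 100 for h in height_m]

cov_m = sample_cov(height_m, weight_kg)
cov_cm = sample_cov(height_cm, weight_kg)   # 100 times larger than cov_m
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)      # same as r_m up to rounding

print(cov_m, cov_cm)
print(r_m, r_cm)
```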

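The Pearson correlation coefficient estimates both the strength and the direction of the linear relationship between two numeric variables. It is crucial to note that it measures only the LINEAR relationship: a strong non-linear relationship can still produce a coefficient close to zero.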
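
The restriction to linear relationships can be illustrated with a small Python sketch. The data are constructed for the purpose: in the first case y is an exact linear function of x, in the second y is completely determined by x through a parabola. The helper name `pearson_r` is an assumption of this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]

# Perfect linear relationship: r is 1 (up to rounding).
y_linear = [2 * v + 1 for v in x]
r_linear = pearson_r(x, y_linear)

# Perfect but non-linear (parabolic) relationship on a symmetric range: r is 0.
y_parabola = [v ** 2 for v in x]
r_parabola = pearson_r(x, y_parabola)

print(r_linear, r_parabola)
```

A coefficient near zero therefore says only that there is no linear relationship, which is why the scatterplot comes first.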

Assumptions

  • two numeric variables (if this assumption isn’t met because one or both variables are ordinal, use the Spearman correlation coefficient instead)
  • linear relationship between them (check with a scatterplot)

Method

  1. Create a scatterplot
  2. Estimate the Pearson correlation coefficient (the output of the corresponding test also reports it)
  3. Conduct the hypothesis test about the value of the correlation coefficient (H0: ρ = 0 against H1: ρ ≠ 0)
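
Steps 2 and 3 can be sketched in Python as follows. The text does not name a specific test, so this sketch assumes the usual t test for H0: ρ = 0, with statistic t = r·√(n − 2) / √(1 − r²) and n − 2 degrees of freedom. The data are invented for illustration, and the critical value 2.306 (two-sided, 5 % level, 8 degrees of freedom) is taken from a t table. Statistical software would normally report a p-value instead.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0; undefined when r is exactly +1 or -1."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Invented data, n = 10 observations.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 1, 4, 6, 5, 9, 7, 8, 12, 10]

r = pearson_r(x, y)
t = t_statistic(r, len(x))

# Two-sided critical value for alpha = 0.05 and 8 degrees of freedom (t table).
T_CRITICAL = 2.306
reject_h0 = abs(t) > T_CRITICAL

print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```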

Spearman correlation coefficient

Assumptions

  • one or both variables are ordinal rather than numeric
  • monotonic relationship between them (check with a scatterplot)

Method

  1. Create a scatterplot
  2. Estimate the Spearman correlation coefficient (the output of the corresponding test also reports it)
  3. Conduct the hypothesis test about the value of the correlation coefficient
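
The Spearman coefficient is the Pearson coefficient computed on the ranks of the observations, with tied values sharing their average rank. The Python sketch below follows that definition. The helper names and the data are assumptions made for illustration, and ordinal categories would first have to be coded as ordered numbers.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman coefficient: Pearson coefficient of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented data: y grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho_s = spearman_r(x, y)   # 1 up to rounding: the ordering agrees perfectly
r_p = pearson_r(x, y)      # below 1: the points do not lie on a line

print(rho_s, r_p)
print(average_ranks([10, 20, 20, 30]))
```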

As a reminder: if H0 is not rejected, there is not enough evidence of a correlation between the variables.

! Don’t confuse correlation with dependency or causality: correlation analysis says nothing about either. The height of a child is correlated with the number of words the child knows, but both depend on the child’s age. This is the ‘problem of the third variable’. Always speak about a relationship or connection, never about dependency.
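
The third-variable problem can be made visible with a small constructed example in Python. The numbers below are not real measurements: height and vocabulary are both generated from age, and the deviations within each age group are chosen so that they are unrelated to each other. All names and coefficients are assumptions of this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

ages = [2, 3, 4, 5, 6]
# Four children per age; the two deviation patterns are unrelated to each other.
height_dev_cm = [-2, -2, 2, 2]
words_dev = [-30, 30, -30, 30]

# Each child: (age, height in cm, number of known words), both driven by age.
children = [
    (age, 80 + 7 * age + dh, 200 * age + dw)
    for age in ages
    for dh, dw in zip(height_dev_cm, words_dev)
]

# Across all children, height and vocabulary are strongly correlated.
overall_r = pearson_r([h for _, h, _ in children], [w for _, _, w in children])

# Within a single age group, the correlation disappears.
within_r = {
    age: pearson_r(
        [h for a, h, _ in children if a == age],
        [w for a, _, w in children if a == age],
    )
    for age in ages
}

print(overall_r)
print(within_r)
```

Once age is held fixed, nothing links the two variables in this example, even though the overall coefficient is high.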