Correlation vs Regression

Rahul Venugopal

May 5, 2021

We have two variables. How do we calculate correlation and regression parameters?

Our dataset

  • ads: number of ads shown per day
  • packets: number of packets of chips bought
library(ggpubr)
ads <- c(5,4,4,6,8)
packets <- c(8,9,10,13,15)
ads
## [1] 5 4 4 6 8
packets
## [1]  8  9 10 13 15

Seeing is believing

library(ggpubr)
data <- data.frame(ads,packets)
ggscatter(data,
          x = "ads",
          y = "packets",
          color = "steelblue",
          palette = "npg",
          add = "reg.line",
          rug = TRUE) + 
  stat_cor(method = "pearson", label.x = 4, label.y = 16)

Let us run a correlation analysis to check whether these two variables are associated.

cor(ads,packets)
## [1] 0.8711651
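
cor() returns only the coefficient. As a minimal sketch, base R's cor.test() attaches a significance test and a confidence interval to the same estimate. The object name ct is chosen here only for illustration.

```r
ads <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

# Pearson correlation together with a t-test of the null hypothesis r = 0
ct <- cor.test(ads, packets, method = "pearson")

ct$estimate   # correlation coefficient, same value as cor(ads, packets)
ct$p.value    # p-value of the test
ct$conf.int   # 95% confidence interval for r
```

With only five observations the confidence interval is wide, so the estimate should be read with caution.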

Run a linear model to see how well ads predict packets

packets_ads <- lm(packets ~ ads)
summary(packets_ads)
## 
## Call:
## lm(formula = packets ~ ads)
## 
## Residuals:
##        1        2        3        4        5 
## -2.39286  0.12500  1.12500  1.08929  0.05357 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   2.8036     2.7676   1.013   0.3857  
## ads           1.5179     0.4939   3.073   0.0544 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.653 on 3 degrees of freedom
## Multiple R-squared:  0.7589, Adjusted R-squared:  0.6786 
## F-statistic: 9.444 on 1 and 3 DF,  p-value: 0.05443
  • For every additional ad shown per day, the predicted number of packets sold goes up by 1.5179
  • If no ads are shown, the model predicts 2.8036 packets sold (the intercept)
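
The two coefficients define the fitted line packets = 2.8036 + 1.5179 × ads. Below is a small sketch of using that line for prediction with predict(). The value of 6 ads and the names new_day, pred, and by_hand are chosen purely for illustration.

```r
ads <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
packets_ads <- lm(packets ~ ads)

# Hypothetical new observation: a day on which 6 ads are shown
new_day <- data.frame(ads = 6)

# Prediction from the fitted model
pred <- predict(packets_ads, newdata = new_day)

# The same value computed by hand from the intercept and slope
by_hand <- coef(packets_ads)[["(Intercept)"]] + coef(packets_ads)[["ads"]] * 6

pred
by_hand
```

The intercept describes a day with zero ads, which lies outside the observed range of 4 to 8 ads, so it is an extrapolation rather than something the data show directly.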

Calculating the correlation value, r

cov(ads,packets)/(sd(ads)*sd(packets))
## [1] 0.8711651
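
The one-liner above is the definition of Pearson's r: the covariance divided by the product of the two standard deviations. As a sketch, the same quantity can be built up from sums of deviations. The helper names n, dx, dy, cov_xy, sd_x, sd_y, and r_manual are introduced here only for illustration.

```r
ads <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
n <- length(ads)

# Deviations of each observation from its variable's mean
dx <- ads - mean(ads)
dy <- packets - mean(packets)

# Sample covariance and standard deviations (n - 1 in the denominator,
# matching what cov() and sd() use)
cov_xy <- sum(dx * dy) / (n - 1)
sd_x <- sqrt(sum(dx^2) / (n - 1))
sd_y <- sqrt(sum(dy^2) / (n - 1))

r_manual <- cov_xy / (sd_x * sd_y)
r_manual

# Compare with the built-in function
all.equal(r_manual, cor(ads, packets))
```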

Finding the correlation coefficient from regression

\(\beta = r \cdot \frac{\sigma_{packets}}{\sigma_{ads}}\)
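
This relation follows from the least-squares slope for a single predictor, which is the covariance of the two variables divided by the variance of the predictor. A short derivation, writing \(x\) for ads and \(y\) for packets (notation assumed here for brevity):

\[
\beta = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}
      = \frac{r \, \sigma_x \, \sigma_y}{\sigma_x^2}
      = r \cdot \frac{\sigma_y}{\sigma_x}
\]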

beta <- coef(packets_ads)["ads"]

We know that \(\beta = 1.517857\). Rearranging the equation above gives \(r = \beta \cdot \frac{\sigma_{ads}}{\sigma_{packets}}\), so we can recover r from the slope:

packets_ads[[1]][2] * (sd(ads)/sd(packets))
##       ads 
## 0.8711651
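
For a regression with a single predictor, the square of r equals the Multiple R-squared reported by summary() (0.7589 in the output above). A quick check, offered as a sketch; the names r and r_squared are illustrative.

```r
ads <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
packets_ads <- lm(packets ~ ads)

# Correlation coefficient and the model's R-squared
r <- cor(ads, packets)
r_squared <- summary(packets_ads)$r.squared

r^2
r_squared

# The two agree up to floating-point tolerance
all.equal(r^2, r_squared)
```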