We have two variables. How do we calculate correlation and regression parameters?
Our dataset
ads: Number of ads shown per day
packets: Number of packets of chips bought
## [1] 5 4 4 6 8
## [1] 8 9 10 13 15
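The code that created the two vectors is not shown, so here is a minimal sketch of how they could be entered. The values are copied from the printed output above; the use of `c()` is an assumption about how the data were typed in.

```r
# Values copied from the printed output above
ads     <- c(5, 4, 4, 6, 8)      # number of ads shown per day
packets <- c(8, 9, 10, 13, 15)   # number of packets of chips bought

# Print both vectors to check them against the output above
ads
packets
```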
Seeing is believing
library(ggpubr)
data <- data.frame(ads,packets)
ggscatter(data,
x = "ads",
y = "packets",
color = "steelblue",
palette = "npg",
add = "reg.line",
rug = TRUE) +
stat_cor(method = "pearson", label.x = 4, label.y = 16)
Let us run a correlation analysis to check whether these two variables are associated!
## [1] 0.8711651
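The call behind this number is not shown. A likely candidate is base R's `cor()`, sketched below together with a by-hand check from the definition of Pearson's r. The names `r`, `r_manual` and `ct` are illustrative, and the `cor.test()` step is an optional extra, not something the original output includes.

```r
# Data from the dataset section
ads     <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

# Pearson correlation (the default method of cor())
r <- cor(ads, packets)

# Same quantity from its definition:
# covariance divided by the product of the standard deviations
r_manual <- cov(ads, packets) / (sd(ads) * sd(packets))

# Optional: test the null hypothesis that the true correlation is zero
ct <- cor.test(ads, packets, method = "pearson")

print(r)          # 0.8711651
print(r_manual)   # same value
print(ct$p.value)
```

With a single predictor, the p-value from `cor.test()` is the same as the p-value of the slope in the linear model of `packets` on `ads`.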
Run a linear model to see how well ads predict packets
##
## Call:
## lm(formula = packets ~ ads)
##
## Residuals:
## 1 2 3 4 5
## -2.39286 0.12500 1.12500 1.08929 0.05357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8036 2.7676 1.013 0.3857
## ads 1.5179 0.4939 3.073 0.0544 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.653 on 3 degrees of freedom
## Multiple R-squared: 0.7589, Adjusted R-squared: 0.6786
## F-statistic: 9.444 on 1 and 3 DF, p-value: 0.05443
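The `Call:` line in the output shows the model formula, so the fit can be reproduced roughly as below. The object name `fit` is an assumption. The sketch also recomputes the two coefficients from the closed-form least-squares formulas, which is one way to see where the estimates come from.

```r
# Data from the dataset section
ads     <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

# Fit the simple linear regression shown in the Call: line
fit <- lm(packets ~ ads)
summary(fit)

# Closed-form least-squares estimates for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope     <- cov(ads, packets) / var(ads)
intercept <- mean(packets) - slope * mean(ads)

# Both approaches should agree
coef(fit)
c(intercept = intercept, slope = slope)
```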
- For every additional ad shown per day, the model predicts about 1.5179 more packets sold (the slope)
- If no ads are shown, the predicted number of packets sold is 2.8036 (the intercept)
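The two interpretations can be checked with `predict()`. This is a small sketch: `fit`, `new_days` and `pred` are illustrative names, and the ad counts 0, 6 and 7 are picked only for the example.

```r
# Data and model from the sections above
ads     <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
fit <- lm(packets ~ ads)

# Hypothetical days with 0, 6 and 7 ads
new_days <- data.frame(ads = c(0, 6, 7))
pred <- predict(fit, newdata = new_days)
pred

# Prediction at ads = 0 is the intercept
pred[[1]]

# Going from 6 to 7 ads changes the prediction by the slope
pred[[3]] - pred[[2]]

# Observed range of ads, for comparison with ads = 0
range(ads)
```

Note that the observed values of `ads` run from 4 to 8, so the prediction at 0 ads is an extrapolation beyond the data.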