Olesya Volchenko and Anna Shirokanova
May 13, 2021
| Relationship | Third variable |
|---|---|
| The larger the foot size of a kid, the more clever s/he is | ? |
| The shorter the person, the longer the hair of that person | ? |
| The larger the school class, the better the average grades | ? |
| People using the Internet daily in Africa are happier | ? |
| Ice-cream sales are positively related to the number of people drowning | ? |
| Relationship | Third variable |
|---|---|
| The larger the foot size of a kid, the more clever s/he is | Age |
| The taller the person, the shorter the hair of that person | Gender |
| The larger the school class, the better the average grades | School size / equipment |
| People using the Internet daily in Africa are happier | Income |
| Ice-cream sales are positively related to the number of people drowning | Season |
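The first row of the table can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names (`age`, `foot`, `score`) and all coefficients are assumptions chosen for illustration, not values from any real study. Both foot size and test score depend on age alone, yet they correlate strongly until age enters the model.

```r
set.seed(1)
n <- 500

# Hypothetical data: age drives both variables, which never affect each other
age   <- runif(n, min = 6, max = 14)          # age in years
foot  <- 12 + 0.9 * age + rnorm(n, sd = 1)    # foot size depends on age only
score <- 10 + 5 * age + rnorm(n, sd = 5)      # test score depends on age only

# Raw association: looks as if larger feet go with higher scores
cor(foot, score)
coef(lm(score ~ foot))["foot"]

# Controlling for the third variable: the foot coefficient shrinks towards zero
coef(lm(score ~ foot + age))["foot"]
```

Adding the third variable as a predictor is the simplest way to check whether an association survives once the common cause is held constant.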
Two variables, X and Y (n = 50), both normally distributed
| y | x |
|---|---|
| 0.0949945 | -0.2807451 |
| -1.4442674 | -0.5776430 |
| -2.4506750 | -1.1273309 |
| -2.2369508 | -0.9033092 |
| 2.1592886 | 0.4880545 |
| -6.8133458 | -1.6500107 |
## y x
## Min. :-8.146 Min. :-2.7091
## 1st Qu.:-3.083 1st Qu.:-0.9292
## Median :-1.367 Median :-0.3983
## Mean :-1.138 Mean :-0.3787
## 3rd Qu.: 1.265 3rd Qu.: 0.2604
## Max. : 7.922 Max. : 2.4979
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 21.127, df = 48, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9134637 0.9715847
## sample estimates:
## cor
## 0.9502104
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.39265 -0.72426 0.04457 0.70943 1.95334
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.009131 0.153084 -0.06 0.953
## x 2.980894 0.141096 21.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.014 on 48 degrees of freedom
## Multiple R-squared: 0.9029, Adjusted R-squared: 0.9009
## F-statistic: 446.3 on 1 and 48 DF, p-value: < 2.2e-16
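The two outputs above report the same t value (21.13) and the squared correlation equals R-squared (0.95² ≈ 0.90). The sketch below checks these identities on simulated data. The data-generating process (`y = 3x + noise`) and the seed are assumptions, so the numbers will not reproduce the output shown above exactly.

```r
set.seed(42)
x <- rnorm(50)
y <- 3 * x + rnorm(50)   # assumed data-generating process, for illustration only

ct  <- cor.test(x, y)
fit <- summary(lm(y ~ x))
r   <- unname(ct$estimate)

# In simple regression, the squared Pearson correlation is R-squared
all.equal(r^2, fit$r.squared)

# The t statistic of the correlation test equals the t value of the slope
all.equal(unname(ct$statistic), fit$coefficients["x", "t value"])

# The slope is the correlation rescaled by the ratio of standard deviations
all.equal(r * sd(y) / sd(x), fit$coefficients["x", "Estimate"])
```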
A sample of 200 people, males and females
## age salary sex
## Min. :17.00 Min. : 49.63 Length:200
## 1st Qu.:26.75 1st Qu.: 84.08 Class :character
## Median :30.00 Median : 94.42 Mode :character
## Mean :30.09 Mean : 95.50
## 3rd Qu.:34.25 3rd Qu.:106.66
## Max. :46.00 Max. :146.75
|  | salary |  |
|---|---|---|
| Predictors | Estimates | p |
| (Intercept) | -0.29 | 0.583 |
| age | 3.02 | <0.001 |
| sex [M] | 9.85 | <0.001 |
| Observations | 200 | |
| R2 / R2 adjusted | 0.994 / 0.994 | |
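A model with `age` and `sex` but no interaction fits two parallel lines: the same age slope for both groups, shifted by the `sex [M]` coefficient. The sketch below simulates data of that shape. The data frame `sim`, the seed, and the generating values (slope 3, gap 10, noise SD 1.5) are assumptions loosely inspired by the estimates in the table, not the original dataset.

```r
set.seed(2021)
n <- 200

# Hypothetical data: one common age slope, constant gap between the sexes
sex    <- sample(c("F", "M"), n, replace = TRUE)
age    <- round(rnorm(n, mean = 30, sd = 5))
salary <- 3 * age + 10 * (sex == "M") + rnorm(n, sd = 1.5)
sim    <- data.frame(age, salary, sex)

fit <- lm(salary ~ age + sex, data = sim)
coef(fit)

# The predicted male-female gap is the same at every age (parallel lines)
new <- expand.grid(age = c(20, 40), sex = c("F", "M"))
new$pred <- predict(fit, newdata = new)
gap <- new$pred[new$sex == "M"] - new$pred[new$sex == "F"]
gap
```

Both entries of `gap` equal the `sexM` coefficient, which is what "controlling for age" means in an additive model.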
## educ dummy1 dummy2
## 1 tertiary 0 0
## 2 secondary 1 0
## 3 primary 0 1
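R builds such dummy variables automatically when a factor enters a model formula. As a minimal sketch, assuming `tertiary` is the reference category as in the output above, `model.matrix()` shows the coding that `lm()` would use internally. The column names it produces (`educsecondary`, `educprimary`) differ from the `dummy1`/`dummy2` labels shown above.

```r
# Three education levels; the first level listed becomes the reference category
educ <- factor(c("tertiary", "secondary", "primary"),
               levels = c("tertiary", "secondary", "primary"))

# Treatment (dummy) coding: one 0/1 column per non-reference level
X <- model.matrix(~ educ)
X

# The same coding without the intercept column
contrasts(educ)
```

A factor with k levels needs only k - 1 dummies, because the reference category is identified by all dummies being zero.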
Artwork by @allison_horst
model3 <- lm(salary ~ age + sex, data = genderdata2)
model4 <- lm(salary ~ age * sex, data = genderdata2)
tab_model(model3, model4, show.ci = FALSE)

|  | salary |  | salary |  |
|---|---|---|---|---|
| Predictors | Estimates | p | Estimates | p |
| (Intercept) | 9.62 | 0.164 | 100.67 | <0.001 |
| age | 3.97 | <0.001 | 0.98 | <0.001 |
| sex [M] | 81.04 | <0.001 | -96.41 | <0.001 |
| age * sex [M] |  |  | 5.89 | <0.001 |
| Observations | 200 |  | 200 |  |
| R2 / R2 adjusted | 0.904 / 0.903 |  | 0.990 / 0.990 |  |
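In the interaction model, the `age` and `sex [M]` coefficients are conditional effects: the age slope for the reference group (females) and the sex gap at age 0. The arithmetic below uses only the rounded estimates reported in the table to recover one regression line per group. The object names are chosen here for illustration.

```r
# Rounded estimates of the interaction model, copied from the table
b <- c(intercept = 100.67, age = 0.98, sexM = -96.41, age_sexM = 5.89)

# Separate regression line for each group
female <- c(intercept = b[["intercept"]],
            slope     = b[["age"]])
male   <- c(intercept = b[["intercept"]] + b[["sexM"]],
            slope     = b[["age"]] + b[["age_sexM"]])
rbind(female, male)

# Predicted male-female gap at a given age: it grows by 5.89 per year of age
gap_at <- function(a) b[["sexM"]] + b[["age_sexM"]] * a
gap_at(30)
```

The large negative `sex [M]` estimate therefore does not mean that men earn less. It describes the gap at age 0, far outside any plausible age range, while at age 30 the predicted gap is close to the `sex [M]` estimate of the model without the interaction.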
Sometimes, yes. Heteroskedasticity may indicate moderation (an interaction effect). Let’s recall the assumptions first.
x <- 1:100
y1 <- rnorm(n = 100, mean = x, sd = 10)
y2 <- rnorm(n = 100, mean = x, sd = 0.4*x)
par(mfrow = c(1, 2))
plot(x, y1, pch = 16); abline(lm(y1 ~ x), col = "red")
plot(x, y2, pch = 16); abline(lm(y2 ~ x), col = "red")

The right-hand panel, where the data points ‘fan out’, suggests that a third variable comes into play at high values of X.
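The visual impression of 'fanning out' can be backed by a formal check. The sketch below hand-codes the studentized (Koenker) form of the Breusch-Pagan test in base R: regress the squared residuals on the predictor and compare n times the auxiliary R-squared with a chi-squared distribution. The helper name `bp_test` and the seed are assumptions. The `lmtest` package offers `bptest()` for the same purpose.

```r
set.seed(123)   # assumed seed so the example is reproducible
x  <- 1:100
y1 <- rnorm(n = 100, mean = x, sd = 10)        # constant error variance
y2 <- rnorm(n = 100, mean = x, sd = 0.4 * x)   # error variance grows with x

# Hypothetical helper: studentized Breusch-Pagan test for one predictor
bp_test <- function(y, x) {
  res2 <- resid(lm(y ~ x))^2                   # squared residuals of the main model
  aux  <- summary(lm(res2 ~ x))                # auxiliary regression on the predictor
  stat <- length(y) * aux$r.squared            # test statistic: n * R-squared
  c(statistic = stat,
    p.value   = pchisq(stat, df = 1, lower.tail = FALSE))
}

bp_test(y1, x)   # homoskedastic case
bp_test(y2, x)   # heteroskedastic case: expect a small p-value
```

A significant result only says that the residual spread depends on X. Whether an omitted moderator is behind it has to be tested by adding the candidate interaction to the model.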
| Feature | Value |
|---|---|
| Mean x | 9.0 |
| Variance x | 10.0 |
| Mean y | 7.5 |
| Variance y | 3.75 |
| Correlation between x and y | 0.816 |
| Regression fitted line | y = 3 + 0.5x |
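These values match the classic Anscombe quartet, so the sketch below assumes that this is the dataset meant here and recomputes the summary for all four x-y pairs from R's built-in `anscombe` data. The variances in the table correspond to the population formula (dividing by n). R's `var()` divides by n - 1 and would return 11 and about 4.13 instead.

```r
data(anscombe)   # built-in dataset with columns x1..x4 and y1..y4

stats <- sapply(1:4, function(i) {
  x   <- anscombe[[paste0("x", i)]]
  y   <- anscombe[[paste0("y", i)]]
  n   <- length(x)
  fit <- coef(lm(y ~ x))
  c(mean_x    = mean(x),
    var_x     = var(x) * (n - 1) / n,   # population variance, as in the table
    mean_y    = mean(y),
    var_y     = var(y) * (n - 1) / n,
    cor_xy    = cor(x, y),
    intercept = fit[[1]],
    slope     = fit[[2]])
})
colnames(stats) <- paste0("set", 1:4)
round(stats, 2)

# The four sets look very different once plotted
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       pch = 16, xlab = "x", ylab = "y", main = paste("Set", i))
  abline(stats["intercept", i], stats["slope", i], col = "red")
}
```

Near-identical summary statistics across the four sets are the reason to plot the data before trusting a correlation or a fitted line.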