Olesya Volchenko and Anna Shirokanova
April 2018
| Relationship | Third variable |
|---|---|
| The larger the foot size of a kid, the more clever s/he is | ? |
| The lower the person, the longer the hair of that person | ? |
| The larger the school class, the better are average grades | ? |
| People using the Internet daily in Africa are happier | ? |
| Ice-cream sales are positively related to the number of people drowning | ? |
| Relationship | Third variable |
|---|---|
| The larger the foot size of a kid, the more clever s/he is | Age |
| The taller the person, the shorter the hair of that person | Gender |
| The larger the school class, the better are average grades | School size / equipment |
| People using the Internet daily in Africa are happier | Income |
| Ice-cream sales are positively related to the number of people drowning | Season |
There are variables X and Y
| y | x |
|---|---|
| 2.3434072 | 1.1062056 |
| -5.7028911 | -1.6623645 |
| -1.2161804 | -0.2888668 |
| 1.0362797 | 0.5752666 |
| 1.3464117 | 0.3875749 |
| 2.0395967 | 0.2835436 |
| 1.2407077 | 0.1530659 |
| 4.1694978 | 1.0044248 |
| -0.8050663 | -0.1653561 |
| 2.7972354 | 1.1869595 |
| 1.2171334 | 0.2319040 |
| 0.2003237 | 0.4613343 |
| 1.4274962 | 0.6209180 |
| 6.7268963 | 2.1472300 |
| 0.4021215 | 0.1328819 |
| 0.3360385 | 0.0873094 |
| 2.6072062 | 0.6217123 |
| -4.0830110 | -0.8438386 |
| 5.9026377 | 1.3107213 |
| 2.9125798 | 0.7172064 |
| 4.9443642 | 1.6982076 |
| 2.9856271 | 1.1205246 |
| 1.3026918 | 0.3585168 |
| 1.9968218 | 0.3650547 |
| 4.3063373 | 1.7451252 |
| -4.0624447 | -1.3103822 |
| 3.3363657 | 1.3952343 |
| -4.5741076 | -1.2653576 |
| 4.8788024 | 1.2802166 |
| 1.5956168 | 0.5503804 |
| -5.5108671 | -1.9861473 |
| 1.0963555 | 0.3808488 |
| 4.0262914 | 1.0942326 |
| -0.9364985 | -0.6841675 |
| -2.9122435 | -0.9642431 |
| -5.3560686 | -1.6591004 |
| -5.0853931 | -1.3529958 |
| -2.7138662 | -0.6265333 |
| 0.7735975 | 0.4267002 |
| -0.0738835 | -0.4384743 |
| 4.4310860 | 1.0516549 |
| -3.1855930 | -0.9060857 |
| 0.8373001 | 0.2054775 |
| 2.8863445 | 1.1665390 |
| 3.6380946 | 0.6640382 |
| -4.3165562 | -0.7684066 |
| 2.0484183 | 0.2688752 |
| -4.3024800 | -1.1724599 |
| 6.1449382 | 1.7689950 |
| -3.1971329 | -0.7483182 |
plot(x, y, pch = 16)cor.test(x, y, method = "pearson")##
## Pearson's product-moment correlation
##
## data: x and y
## t = 26.972, df = 48, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9449707 0.9821281
## sample estimates:
## cor
## 0.9685582
model1 <- lm(y ~ x)
summary(model1)##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77859 -0.61624 -0.05182 0.61653 1.66946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03548 0.12302 -0.288 0.774
## x 3.25672 0.12074 26.972 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8539 on 48 degrees of freedom
## Multiple R-squared: 0.9381, Adjusted R-squared: 0.9368
## F-statistic: 727.5 on 1 and 48 DF, p-value: < 2.2e-16
plot(x, y, pch = 16)
abline(model1, col = "red")
text(0, 2, paste("y = ", round(model1$coefficients[1], 1), "+", round(model1$coefficients[2],
1), "* x"))
abline(v = 0, lty = 2, col = "gray")20 males and females (this is too few for a real regression)
## age salary sex
## 1 30 98.88262 M
## 2 36 118.76878 M
## 3 38 126.83304 M
## 4 35 114.72223 M
## 5 35 114.31441 M
## 6 26 90.22501 M
model2 <- lm(salary ~ age + sex, data = genderdata)sjt.lm(model2, show.ci = F)| salary | |||
| B | p | ||
| (Intercept) | -1.72 | .545 | |
| age | 3.07 | <.001 | |
| sex (M) | 9.88 | <.001 | |
| Observations | 20 | ||
| R2 / adj. R2 | .992 / .991 | ||
## educ dummy1 dummy2
## 1 tertiary 0 0
## 2 secondary 1 0
## 3 primary 0 1
## age salary sex
## 1 35 240.6612 M
## 2 27 194.1506 M
## 3 40 281.2867 M
## 4 22 157.2166 M
## 5 24 166.1852 M
## 6 36 257.1027 M
model3 <- lm(salary ~ age + sex, data = genderdata2)
model4 <- lm(salary ~ age * sex, data = genderdata2)
sjt.lm(model3, model4, show.ci = F)| salary | salary | |||||
| B | p | B | p | |||
| (Intercept) | -17.75 | .470 | 111.38 | <.001 | ||
| age | 4.80 | <.001 | 0.51 | .093 | ||
| sexM | 83.72 | <.001 | -112.88 | <.001 | ||
| age:sexM | 6.57 | <.001 | ||||
| Observations | 20 | 20 | ||||
| R2 / adj. R2 | .894 / .882 | .995 / .994 | ||||
x <- 1:100
y1 <- rnorm(n = 100, mean = x, sd = 10)
y2 <- rnorm(n = 100, mean = x, sd = 0.4*x)
par(mfrow = c(1, 2))
plot(x, y1, pch = 16); abline(lm(y1 ~ x), col = "red")
plot(x, y2, pch = 16); abline(lm(y2 ~ x), col = "red")| Feature | Value |
|---|---|
| Mean x | 9.0 |
| Variance x | 10.0 |
| Mean y | 7.5 |
| Variance y | 3.75 |
| Correlation between x and y | 0.816 |
| Regression fitted line | y = 3 + 0.5x |
This example originates from the book by Jeremy Miles and Mark Shevlin (2000).
##
## Shapiro-Wilk normality test
##
## data: df$grade
## W = 0.93009, p-value = 0.01622
model3 = lm(grade ~ books + attend, data = df)
summary(model3)##
## Call:
## lm(formula = grade ~ books + attend, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.802 -13.374 0.060 9.173 32.295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.379 7.745 4.827 2.41e-05 ***
## books 4.037 1.753 2.303 0.0270 *
## attend 1.284 0.587 2.187 0.0352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.05 on 37 degrees of freedom
## Multiple R-squared: 0.3287, Adjusted R-squared: 0.2924
## F-statistic: 9.059 on 2 and 37 DF, p-value: 0.0006278
model3scaled <- lm(grade ~ scale(as.numeric(books)) +
scale(as.numeric(attend)), data = df)
summary(model3scaled)##
## Call:
## lm(formula = grade ~ scale(as.numeric(books)) + scale(as.numeric(attend)),
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.802 -13.374 0.060 9.173 32.295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 63.550 2.222 28.602 <2e-16 ***
## scale(as.numeric(books)) 5.782 2.511 2.303 0.0270 *
## scale(as.numeric(attend)) 5.490 2.511 2.187 0.0352 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.05 on 37 degrees of freedom
## Multiple R-squared: 0.3287, Adjusted R-squared: 0.2924
## F-statistic: 9.059 on 2 and 37 DF, p-value: 0.0006278
model5 = lm(grade ~ attend * books, data = df)
summary(model5)##
## Call:
## lm(formula = grade ~ attend * books, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.844 -11.850 1.466 8.163 34.198
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.2213 11.2621 4.903 2.02e-05 ***
## attend -0.1368 0.8782 -0.156 0.8771
## books -6.2078 5.1508 -1.205 0.2360
## attend:books 0.7349 0.3494 2.104 0.0425 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.44 on 36 degrees of freedom
## Multiple R-squared: 0.4022, Adjusted R-squared: 0.3524
## F-statistic: 8.073 on 3 and 36 DF, p-value: 0.0003058
anova(model3, model5)## Analysis of Variance Table
##
## Model 1: grade ~ books + attend
## Model 2: grade ~ attend * books
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 37 7306.2
## 2 36 6506.4 1 799.78 4.4252 0.04246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model6 = lm(grade ~ lucky * books, data = df)
summary(model6)##
## Call:
## lm(formula = grade ~ lucky * books, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.481 -9.802 -2.542 11.454 29.109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.542 4.980 9.547 2.11e-11 ***
## lucky 11.759 8.227 1.429 0.16152
## books 6.004 2.104 2.853 0.00713 **
## lucky:books -1.709 3.336 -0.512 0.61149
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.46 on 36 degrees of freedom
## Multiple R-squared: 0.3086, Adjusted R-squared: 0.251
## F-statistic: 5.356 on 3 and 36 DF, p-value: 0.003737
sjp.int(model5, type = "eff")sjp.int(model5, type = "eff", show.ci = T)x = plotSlopes(model5, plotx = "books",
modx = "attend", interval = "conf")plot(testSlopes(x), shade = T)## Values of attend OUTSIDE this interval:
## lo hi
## -141.07291 13.13872
## cause the slope of (b1 + b2*attend)books to be statistically significant
This is marginal effect
sjp.int(model5, type = "eff")sjp.int(model5, type = "eff", show.ci = T)sjp.int(model5, type = "eff", show.ci = T, mdrt.values = "all")sjt.lm(model3, model5, model6)## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
| grade | grade | grade | ||||||||||
| B | CI | p | B | CI | p | B | CI | p | ||||
| (Intercept) | 37.38 | 21.69 – 53.07 | <.001 | 55.22 | 32.38 – 78.06 | <.001 | 47.54 | 37.44 – 57.64 | <.001 | |||
| books | 4.04 | 0.48 – 7.59 | .027 | -6.21 | -16.65 – 4.24 | .236 | 6.00 | 1.74 – 10.27 | .007 | |||
| attend | 1.28 | 0.09 – 2.47 | .035 | -0.14 | -1.92 – 1.64 | .877 | ||||||
| attend:books | 0.73 | 0.03 – 1.44 | .042 | |||||||||
| lucky | 11.76 | -4.93 – 28.44 | .162 | |||||||||
| lucky:books | -1.71 | -8.47 – 5.06 | .611 | |||||||||
| Observations | 40 | 40 | 40 | |||||||||
| R2 / adj. R2 | .329 / .292 | .402 / .352 | .309 / .251 | |||||||||
For interpretation, see: https://www.theanalysisfactor.com/interpreting-interactions-in-regression/
predictors - >= 1 pc.
R2
rpubs.com/shirokaner/lablm1