Олеся Волченко
22 ноября 2022
| Взаимосвязь | третья переменная |
|---|---|
| Чем больше размер ноги у ребенка, тем он/она умнее | ? |
| Чем меньше рост человека, тем длиннее у него волосы | ? |
| Успеваемость детей выше в больших классах | ? |
| Люди, использующие интернет, более счастливы | ? |
| Продажи мороженого положительно связаны с числом утонувших | ? |
image source: http://www.math.com/school/subject2/lessons/S2U4L2GL.html#sm1
Есть 2 переменные: x и y
x <- rnorm(50, 0, 1)
y <- 3 * x + rnorm(50, 0, 1)
data <- data.frame(x, y)
library(knitr)
kable(data)| x | y |
|---|---|
| -0.0420556 | 0.5375396 |
| 0.8294886 | 2.8540532 |
| -0.0365314 | 0.7394153 |
| 0.2105915 | 1.8580477 |
| 0.9946852 | 2.3642316 |
| -1.6090263 | -2.7957574 |
| -1.8319644 | -5.2320124 |
| 1.1489460 | 4.5646913 |
| -2.0052268 | -6.2681064 |
| -0.4576069 | -1.9061540 |
| -0.8430764 | -3.3892007 |
| 0.2194096 | 1.3838598 |
| 0.6534667 | 1.4168551 |
| 0.9266658 | 3.5620675 |
| 0.2123688 | 1.4899677 |
| -1.1216443 | -3.5516881 |
| 0.8906316 | 3.0148303 |
| 1.0072476 | 3.7730458 |
| -0.4796909 | -2.4513453 |
| 0.0556535 | 0.0458745 |
| 0.1889695 | -0.2651105 |
| 0.1163043 | -0.0669968 |
| -2.0867750 | -7.6099401 |
| 1.4688240 | 4.6039039 |
| -0.4476705 | -1.9104429 |
| 0.1508349 | -0.1324566 |
| 1.3299648 | 3.8582357 |
| -1.4597276 | -3.7786285 |
| 2.0213237 | 5.6173115 |
| 0.3902761 | 1.9510453 |
| -0.8059297 | -0.5299832 |
| -0.9580052 | -2.4226523 |
| 0.6459511 | 2.1698346 |
| 2.0494385 | 5.7930219 |
| 0.2644132 | 2.2680020 |
| 1.1738013 | 1.7261510 |
| 0.1894638 | 0.6644543 |
| 0.2548628 | 0.5268850 |
| 0.3071534 | 0.1429770 |
| 0.1029325 | 0.0201419 |
| 1.8215032 | 5.3972471 |
| 0.7463561 | 1.5809223 |
| -0.3374921 | -1.8074850 |
| -0.6843752 | -3.0694714 |
| 0.6212784 | 2.0992369 |
| 0.3754429 | 1.3152758 |
| 1.6351216 | 3.9681053 |
| 0.3276417 | 2.3506046 |
| 0.7083033 | 0.3979142 |
| 0.2424356 | 0.0150273 |
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 23.256, df = 48, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9274307 0.9762872
## sample estimates:
## cor
## 0.9583755
##
## Call:
## lm(formula = y ~ x, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.71322 -0.60132 -0.08079 0.61095 1.90246
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.006924 0.125357 0.055 0.956
## x 2.924218 0.125741 23.256 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8716 on 48 degrees of freedom
## Multiple R-squared: 0.9185, Adjusted R-squared: 0.9168
## F-statistic: 540.8 on 1 and 48 DF, p-value: < 2.2e-16
Dummy-переменные:
Данные о зарплате и возрасте 20 мужчин и женщин
| age | salary | sex |
|---|---|---|
| 36 | 117.07733 | M |
| 39 | 127.09086 | M |
| 34 | 112.68538 | M |
| 29 | 98.48969 | M |
| 25 | 86.51680 | M |
| 34 | 110.50185 | M |
| 30 | 99.26856 | M |
| 36 | 118.09416 | M |
| 31 | 102.80794 | M |
| 31 | 101.94750 | M |
| 36 | 108.88374 | F |
| 23 | 67.45167 | F |
| 36 | 108.70234 | F |
| 22 | 65.09488 | F |
| 31 | 91.36838 | F |
| 20 | 57.64829 | F |
| 28 | 83.64096 | F |
| 46 | 137.19560 | F |
| 29 | 85.96395 | F |
| 32 | 93.91930 | F |
par(mfrow = c(1, 2))
plot(genderdata$age, genderdata$salary, pch = 16, xlab = "age", ylab = "salary")
boxplot(genderdata$salary ~ genderdata$sex)##
## Call:
## lm(formula = salary ~ age + sex, data = genderdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49009 -0.79342 -0.09051 0.63942 1.78846
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.80055 1.27777 -1.409 0.177
## age 3.02929 0.04065 74.524 < 2e-16 ***
## sexM 10.79666 0.48940 22.061 5.99e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.076 on 17 degrees of freedom
## Multiple R-squared: 0.9975, Adjusted R-squared: 0.9972
## F-statistic: 3435 on 2 and 17 DF, p-value: < 2.2e-16
| salary | ||
|---|---|---|
| Predictors | Estimates | p |
| (Intercept) | -1.80 | 0.177 |
| age | 3.03 | <0.001 |
| sex [M] | 10.80 | <0.001 |
| Observations | 20 | |
| R2 / R2 adjusted | 0.998 / 0.997 | |
## educ dummy1 dummy1.1
## 1 tertiary 0 0
## 2 secondary 1 0
## 3 primary 0 1
Зависимая переменная
Независимые переменные
data <- read.spss("/Users/olesyavolchenko/Yandex.Disk.localized/datafiles/ESS/ESS7e02_2.sav", to.data.frame = T, use.value.labels = T)
data1 <- data[which(data$cntry == "United Kingdom"), ]
data1$trust <- as.numeric(data1$ppltrst) + as.numeric(data1$pplfair) + as.numeric(data1$pplhlp)
data1$agea <- as.numeric(as.character(data1$agea))
data1$eduyrs <- as.numeric(as.character(data1$eduyrs))
data1$hinctnta <- as.numeric(data1$hinctnta)| Overall (N=2264) | |
|---|---|
| trust | |
| Mean (SD) | 20.1 (5.22) |
| Median [Min, Max] | 21.0 [3.00, 33.0] |
| Missing | 14 (0.6%) |
| agea | |
| Mean (SD) | 52.2 (18.4) |
| Median [Min, Max] | 52.0 [15.0, 94.0] |
| Missing | 21 (0.9%) |
| eduyrs | |
| Mean (SD) | 13.6 (3.75) |
| Median [Min, Max] | 13.0 [0, 47.0] |
| Missing | 14 (0.6%) |
| hinctnta | |
| Mean (SD) | 4.95 (3.00) |
| Median [Min, Max] | 5.00 [1.00, 10.0] |
| Missing | 363 (16.0%) |
| marsts | |
| Legally married | 119 (5.3%) |
| In a legally registered civil union | 0 (0%) |
| Legally separated | 0 (0%) |
| Legally divorced/civil union dissolved | 280 (12.4%) |
| Widowed/civil partner died | 278 (12.3%) |
| None of these (NEVER married or in legally registered civil union) | 621 (27.4%) |
| Missing | 966 (42.7%) |
##
## Shapiro-Wilk normality test
##
## data: data1$trust
## W = 0.97308, p-value < 2.2e-16
##
## Call:
## lm(formula = trust ~ gndr + agea + eduyrs + hinctnta + marsts,
## data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.4422 -3.1892 0.6636 3.7162 15.6552
##
## Coefficients:
## Estimate
## (Intercept) 13.65197
## gndrFemale 0.21721
## agea 0.04946
## eduyrs 0.20883
## hinctnta 0.15818
## marstsLegally divorced/civil union dissolved -0.60940
## marstsWidowed/civil partner died 0.58870
## marstsNone of these (NEVER married or in legally registered civil union) 0.18298
## Std. Error
## (Intercept) 1.20466
## gndrFemale 0.34950
## agea 0.01280
## eduyrs 0.04971
## hinctnta 0.06588
## marstsLegally divorced/civil union dissolved 0.65411
## marstsWidowed/civil partner died 0.70527
## marstsNone of these (NEVER married or in legally registered civil union) 0.63993
## t value
## (Intercept) 11.333
## gndrFemale 0.621
## agea 3.865
## eduyrs 4.201
## hinctnta 2.401
## marstsLegally divorced/civil union dissolved -0.932
## marstsWidowed/civil partner died 0.835
## marstsNone of these (NEVER married or in legally registered civil union) 0.286
## Pr(>|t|)
## (Intercept) < 2e-16
## gndrFemale 0.534407
## agea 0.000118
## eduyrs 2.88e-05
## hinctnta 0.016517
## marstsLegally divorced/civil union dissolved 0.351730
## marstsWidowed/civil partner died 0.404072
## marstsNone of these (NEVER married or in legally registered civil union) 0.774978
##
## (Intercept) ***
## gndrFemale
## agea ***
## eduyrs ***
## hinctnta *
## marstsLegally divorced/civil union dissolved
## marstsWidowed/civil partner died
## marstsNone of these (NEVER married or in legally registered civil union)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.438 on 1061 degrees of freedom
## (1195 пропущенных наблюдений удалены)
## Multiple R-squared: 0.04898, Adjusted R-squared: 0.0427
## F-statistic: 7.806 on 7 and 1061 DF, p-value: 3.082e-09
| trust | ||
|---|---|---|
| Predictors | Estimates | p |
| (Intercept) | 13.65 | <0.001 |
| gndr [Female] | 0.22 | 0.534 |
| agea | 0.05 | <0.001 |
| eduyrs | 0.21 | <0.001 |
| hinctnta | 0.16 | 0.017 |
|
marsts [Legally divorced/civil union dissolved] |
-0.61 | 0.352 |
|
marsts [Widowed/civil partner died] |
0.59 | 0.404 |
|
marsts [None of these (NEVER married or in legally registered civil union)] |
0.18 | 0.775 |
| Observations | 1069 | |
| R2 / R2 adjusted | 0.049 / 0.043 | |
data1$marsts1 <- relevel(data1$marsts, ref = "Widowed/civil partner died")
model1 <- lm(trust ~ gndr + agea + eduyrs + hinctnta + marsts1, data = data1)
summary(model1)##
## Call:
## lm(formula = trust ~ gndr + agea + eduyrs + hinctnta + marsts1,
## data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.4422 -3.1892 0.6636 3.7162 15.6552
##
## Coefficients:
## Estimate
## (Intercept) 14.24066
## gndrFemale 0.21721
## agea 0.04946
## eduyrs 0.20883
## hinctnta 0.15818
## marsts1Legally married -0.58870
## marsts1Legally divorced/civil union dissolved -1.19810
## marsts1None of these (NEVER married or in legally registered civil union) -0.40571
## Std. Error
## (Intercept) 1.28284
## gndrFemale 0.34950
## agea 0.01280
## eduyrs 0.04971
## hinctnta 0.06588
## marsts1Legally married 0.70527
## marsts1Legally divorced/civil union dissolved 0.53831
## marsts1None of these (NEVER married or in legally registered civil union) 0.62926
## t value
## (Intercept) 11.101
## gndrFemale 0.621
## agea 3.865
## eduyrs 4.201
## hinctnta 2.401
## marsts1Legally married -0.835
## marsts1Legally divorced/civil union dissolved -2.226
## marsts1None of these (NEVER married or in legally registered civil union) -0.645
## Pr(>|t|)
## (Intercept) < 2e-16
## gndrFemale 0.534407
## agea 0.000118
## eduyrs 2.88e-05
## hinctnta 0.016517
## marsts1Legally married 0.404072
## marsts1Legally divorced/civil union dissolved 0.026247
## marsts1None of these (NEVER married or in legally registered civil union) 0.519234
##
## (Intercept) ***
## gndrFemale
## agea ***
## eduyrs ***
## hinctnta *
## marsts1Legally married
## marsts1Legally divorced/civil union dissolved *
## marsts1None of these (NEVER married or in legally registered civil union)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.438 on 1061 degrees of freedom
## (1195 пропущенных наблюдений удалены)
## Multiple R-squared: 0.04898, Adjusted R-squared: 0.0427
## F-statistic: 7.806 on 7 and 1061 DF, p-value: 3.082e-09
model1scaled <- lm(scale(trust) ~ gndr + scale(as.numeric(agea)) + scale(as.numeric(eduyrs)) +
scale(as.numeric(hinctnta)) + marsts1, data = data1)
summary(model1scaled)##
## Call:
## lm(formula = scale(trust) ~ gndr + scale(as.numeric(agea)) +
## scale(as.numeric(eduyrs)) + scale(as.numeric(hinctnta)) +
## marsts1, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5323 -0.6108 0.1271 0.7118 2.9985
##
## Coefficients:
## Estimate
## (Intercept) 0.05856
## gndrFemale 0.04160
## scale(as.numeric(agea)) 0.17419
## scale(as.numeric(eduyrs)) 0.15000
## scale(as.numeric(hinctnta)) 0.09075
## marsts1Legally married -0.11276
## marsts1Legally divorced/civil union dissolved -0.22948
## marsts1None of these (NEVER married or in legally registered civil union) -0.07771
## Std. Error
## (Intercept) 0.10017
## gndrFemale 0.06694
## scale(as.numeric(agea)) 0.04507
## scale(as.numeric(eduyrs)) 0.03571
## scale(as.numeric(hinctnta)) 0.03779
## marsts1Legally married 0.13508
## marsts1Legally divorced/civil union dissolved 0.10311
## marsts1None of these (NEVER married or in legally registered civil union) 0.12053
## t value
## (Intercept) 0.585
## gndrFemale 0.621
## scale(as.numeric(agea)) 3.865
## scale(as.numeric(eduyrs)) 4.201
## scale(as.numeric(hinctnta)) 2.401
## marsts1Legally married -0.835
## marsts1Legally divorced/civil union dissolved -2.226
## marsts1None of these (NEVER married or in legally registered civil union) -0.645
## Pr(>|t|)
## (Intercept) 0.558979
## gndrFemale 0.534407
## scale(as.numeric(agea)) 0.000118
## scale(as.numeric(eduyrs)) 2.88e-05
## scale(as.numeric(hinctnta)) 0.016517
## marsts1Legally married 0.404072
## marsts1Legally divorced/civil union dissolved 0.026247
## marsts1None of these (NEVER married or in legally registered civil union) 0.519234
##
## (Intercept)
## gndrFemale
## scale(as.numeric(agea)) ***
## scale(as.numeric(eduyrs)) ***
## scale(as.numeric(hinctnta)) *
## marsts1Legally married
## marsts1Legally divorced/civil union dissolved *
## marsts1None of these (NEVER married or in legally registered civil union)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.042 on 1061 degrees of freedom
## (1195 пропущенных наблюдений удалены)
## Multiple R-squared: 0.04898, Adjusted R-squared: 0.0427
## F-statistic: 7.806 on 7 and 1061 DF, p-value: 3.082e-09
| trust | scale(trust) | |||
|---|---|---|---|---|
| Predictors | Estimates | p | Estimates | p |
| (Intercept) | 14.24 | <0.001 | 0.06 | 0.559 |
| gndr [Female] | 0.22 | 0.534 | 0.04 | 0.534 |
| agea | 0.05 | <0.001 | ||
| eduyrs | 0.21 | <0.001 | ||
| hinctnta | 0.16 | 0.017 | ||
| marsts1 [Legally married] | -0.59 | 0.404 | -0.11 | 0.404 |
|
marsts1 [Legally divorced/civil union dissolved] |
-1.20 | 0.026 | -0.23 | 0.026 |
|
marsts1 [None of these (NEVER married or in legally registered civil union)] |
-0.41 | 0.519 | -0.08 | 0.519 |
| agea | 0.17 | <0.001 | ||
| eduyrs | 0.15 | <0.001 | ||
| hinctnta | 0.09 | 0.017 | ||
| Observations | 1069 | 1069 | ||
| R2 / R2 adjusted | 0.049 / 0.043 | 0.049 / 0.043 | ||
Что нужно?
Что мы получим?
Artwork by @allison_horst
Используя базу данных ESS round 5, на примере одной страны проанализируйте, как связаны идеальный возраст выхода на пенсию (agertr, зависимая переменная) и следующие независимые переменные: возраст (agea), пол (gndr) и семейное положение (marsts)