In the ESS data, depression is measured with the CES-D8 scale. First we load the data, check the reliability of the scale, and compute the score:
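A minimal sketch of such a reliability check and scoring step follows. It runs on simulated answers rather than the ESS file, so the item names `item1` to `item8`, the sample size, and the complete-case scoring are assumptions; in real data, positively worded items would have to be reverse-coded first. Cronbach's alpha is computed by hand from its definition so that no extra package is needed.

```r
# Simulated stand-in for eight items answered on a 1-4 scale.
# Item names and data are hypothetical; replace them with the real, recoded items.
set.seed(42)
n <- 500
latent <- rnorm(n)  # shared factor that drives all eight items
items <- sapply(1:8, function(i) {
  raw <- latent + rnorm(n)
  cut(raw, breaks = c(-Inf, -0.5, 0.5, 1.5, Inf), labels = FALSE)
})
colnames(items) <- paste0("item", 1:8)

# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of the sum score)
k <- ncol(items)
alpha <- k / (k - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))

# Scale score: mean of the eight items, so it stays on the 1-4 response scale
cesd8 <- rowMeans(items)

round(alpha, 2)
summary(cesd8)
```

Taking the item mean rather than the sum keeps the score on the original response scale, which matches the 1 to 4 range of the summary reported here.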
Depression values across Europe can then be summarized as follows:
summary(df$cesd8)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.375 1.625 1.695 2.000 4.000 799
Use a regression model to describe the gender effect on depression in Europe. To insert a new ```{r} code chunk in RStudio, press Ctrl+Alt+I.
tapply(df$cesd8, df$gndr, mean, na.rm = TRUE)
## Male Female
## 1.628996 1.752677
df$female = as.numeric(df$gndr == "Female")
table(df$female)
##
## 0 1
## 18760 21396
model = lm(cesd8 ~ female, data=df)
summary(model)
##
## Call:
## lm(formula = cesd8 ~ female, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7527 -0.3777 -0.1277 0.2473 2.3710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.628996 0.003654 445.8 <2e-16 ***
## female 0.123682 0.005007 24.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4956 on 39355 degrees of freedom
## (799 observations deleted due to missingness)
## Multiple R-squared: 0.01527, Adjusted R-squared: 0.01524
## F-statistic: 610.2 on 1 and 39355 DF, p-value: < 2.2e-16
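With a single 0/1 predictor, the regression only re-expresses the two group means: the intercept (1.628996) is the male mean, and the slope (0.123682) is the female-minus-male difference from the tapply() output, up to rounding. A small sketch on simulated data shows the same identity; the data frame `toy` and its columns are made up for illustration.

```r
set.seed(1)
# Hypothetical data: 50 men (female = 0) and 50 women (female = 1)
toy <- data.frame(female = rep(c(0, 1), each = 50))
toy$score <- 1.6 + 0.12 * toy$female + rnorm(100, sd = 0.5)

fit <- lm(score ~ female, data = toy)

mean_male <- mean(toy$score[toy$female == 0])
mean_female <- mean(toy$score[toy$female == 1])

# Intercept equals the mean of the reference group (female == 0)
all.equal(unname(coef(fit)["(Intercept)"]), mean_male)
# Slope equals the difference between the two group means
all.equal(unname(coef(fit)["female"]), mean_female - mean_male)
```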
tapply(df$cesd8, df$health, mean, na.rm = TRUE)
## Very good      Good      Fair       Bad  Very bad
##  1.472164  1.625812  1.867098  2.265404  2.616609
lm(cesd8 ~ health, data=df)
##
## Call:
## lm(formula = cesd8 ~ health, data = df)
##
## Coefficients:
##    (Intercept)      healthGood      healthFair       healthBad  healthVery bad
##         1.4722          0.1536          0.3949          0.7932          1.1444
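Because health is a factor, lm() uses dummy (treatment) coding with the first level, "Very good", as the reference: the intercept is that group's mean, and every other coefficient is a difference from it. The sketch below only re-adds the printed coefficients to recover the group means from tapply(); the vector names are chosen for illustration.

```r
# Coefficients copied from the lm() output
coefs <- c(intercept = 1.4722, Good = 0.1536, Fair = 0.3949,
           Bad = 0.7932, `Very bad` = 1.1444)

# Group mean = intercept (reference group "Very good") + the group's own coefficient
implied_means <- c(`Very good` = coefs[["intercept"]],
                   coefs[["intercept"]] + coefs[-1])

# Group means copied from the tapply() output
observed_means <- c(`Very good` = 1.472164, Good = 1.625812, Fair = 1.867098,
                    Bad = 2.265404, `Very bad` = 2.616609)

# Remaining differences are rounding error from the four-digit printout
round(implied_means - observed_means, 4)
```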
In general, the CES-D8 scores show a unimodal, slightly right-skewed distribution (the mean of 1.695 lies above the median of 1.625):
summary(df$cesd8)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.375 1.625 1.695 2.000 4.000 799
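summary() alone cannot show the shape of a distribution; a histogram and a skewness coefficient can. A minimal sketch on simulated scores follows. The vector `scores` is a made-up stand-in for df$cesd8, and the moment-based skewness formula is one common choice among several.

```r
set.seed(7)
# Hypothetical right-skewed scores on a 1-4 range, standing in for df$cesd8
scores <- pmin(1 + rgamma(1000, shape = 2, scale = 0.35), 4)

# Moment-based sample skewness: positive values indicate a longer right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
}

skewness(scores)
mean(scores) > median(scores)  # typical for right-skewed data

# The histogram shows whether there is a single peak
hist(scores, breaks = 20, main = "Simulated scores", xlab = "Score")
```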
Depression varies with age because biological, psychological, and social conditions change over the life course: it often rises in adolescence due to developmental and social stress, stabilizes or declines in midlife as coping improves, and may increase again in older age due to illness, loss, and loneliness.
Overall, we expect depression scores to increase with age.
To test this hypothesis, we estimate a linear regression model.
model_1= lm(cesd8 ~ agea, data=df)
summary(model_1)
##
## Call:
## lm(formula = cesd8 ~ agea, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03759 -0.36137 -0.07893 0.26964 2.39464
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.53726 0.04843 31.744 < 2e-16 ***
## agea16 0.07411 0.05472 1.354 0.175597
## agea17 0.03023 0.05493 0.550 0.582030
## agea18 0.10798 0.05461 1.977 0.048013 *
## agea19 0.12499 0.05370 2.328 0.019935 *
## agea20 0.11668 0.05428 2.150 0.031578 *
## agea21 0.09496 0.05439 1.746 0.080804 .
## agea22 0.09448 0.05425 1.742 0.081582 .
## agea23 0.10096 0.05440 1.856 0.063471 .
## agea24 0.10552 0.05365 1.967 0.049232 *
## agea25 0.15896 0.05388 2.950 0.003177 **
## agea26 0.11634 0.05404 2.153 0.031342 *
## agea27 0.09818 0.05426 1.809 0.070394 .
## agea28 0.13110 0.05380 2.437 0.014825 *
## agea29 0.13511 0.05347 2.527 0.011508 *
## agea30 0.12470 0.05335 2.338 0.019416 *
## agea31 0.09238 0.05364 1.722 0.085059 .
## agea32 0.09954 0.05306 1.876 0.060645 .
## agea33 0.12745 0.05249 2.428 0.015183 *
## agea34 0.10673 0.05269 2.026 0.042797 *
## agea35 0.12177 0.05272 2.310 0.020900 *
## agea36 0.05820 0.05297 1.099 0.271892
## agea37 0.11597 0.05299 2.188 0.028642 *
## agea38 0.07890 0.05232 1.508 0.131556
## agea39 0.06810 0.05239 1.300 0.193618
## agea40 0.11542 0.05273 2.189 0.028618 *
## agea41 0.12506 0.05237 2.388 0.016955 *
## agea42 0.09132 0.05228 1.747 0.080680 .
## agea43 0.10377 0.05231 1.984 0.047285 *
## agea44 0.14334 0.05220 2.746 0.006032 **
## agea45 0.14633 0.05240 2.792 0.005235 **
## agea46 0.12806 0.05253 2.438 0.014774 *
## agea47 0.10534 0.05234 2.012 0.044179 *
## agea48 0.09292 0.05229 1.777 0.075555 .
## agea49 0.11420 0.05197 2.197 0.028005 *
## agea50 0.11947 0.05216 2.291 0.021992 *
## agea51 0.15321 0.05214 2.938 0.003301 **
## agea52 0.15323 0.05194 2.950 0.003181 **
## agea53 0.15182 0.05185 2.928 0.003413 **
## agea54 0.14496 0.05202 2.786 0.005331 **
## agea55 0.18683 0.05210 3.586 0.000336 ***
## agea56 0.16667 0.05199 3.206 0.001349 **
## agea57 0.18987 0.05188 3.660 0.000252 ***
## agea58 0.17107 0.05163 3.314 0.000921 ***
## agea59 0.20262 0.05194 3.901 9.59e-05 ***
## agea60 0.17041 0.05196 3.280 0.001040 **
## agea61 0.13789 0.05201 2.651 0.008022 **
## agea62 0.16134 0.05230 3.085 0.002038 **
## agea63 0.14219 0.05194 2.737 0.006195 **
## agea64 0.15870 0.05160 3.076 0.002102 **
## agea65 0.13682 0.05200 2.631 0.008510 **
## agea66 0.16096 0.05196 3.098 0.001950 **
## agea67 0.15599 0.05187 3.007 0.002636 **
## agea68 0.16188 0.05178 3.126 0.001773 **
## agea69 0.15090 0.05208 2.898 0.003762 **
## agea70 0.17359 0.05228 3.321 0.000899 ***
## agea71 0.14735 0.05257 2.803 0.005067 **
## agea72 0.15880 0.05256 3.021 0.002519 **
## agea73 0.20567 0.05269 3.904 9.49e-05 ***
## agea74 0.22546 0.05259 4.287 1.81e-05 ***
## agea75 0.23595 0.05273 4.475 7.68e-06 ***
## agea76 0.24728 0.05320 4.648 3.36e-06 ***
## agea77 0.26888 0.05350 5.026 5.02e-07 ***
## agea78 0.33565 0.05412 6.202 5.62e-10 ***
## agea79 0.21659 0.05503 3.936 8.30e-05 ***
## agea80 0.25237 0.05483 4.603 4.18e-06 ***
## agea81 0.29689 0.05606 5.296 1.19e-07 ***
## agea82 0.37678 0.05617 6.707 2.01e-11 ***
## agea83 0.29440 0.05688 5.176 2.28e-07 ***
## agea84 0.43039 0.05746 7.490 7.02e-14 ***
## agea85 0.45192 0.05931 7.620 2.60e-14 ***
## agea86 0.42201 0.06095 6.923 4.47e-12 ***
## agea87 0.50033 0.06364 7.861 3.90e-15 ***
## agea88 0.27524 0.06725 4.093 4.27e-05 ***
## agea89 0.30149 0.06917 4.359 1.31e-05 ***
## agea90 0.46923 0.05832 8.046 8.78e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4939 on 39052 degrees of freedom
## (1028 observations deleted due to missingness)
## Multiple R-squared: 0.02197, Adjusted R-squared: 0.02009
## F-statistic: 11.7 on 75 and 39052 DF, p-value: < 2.2e-16
table(df$agea)
##
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## 107 380 371 389 460 409 404 410 397 461 439 429 415 447 479 496 468 523 604 579
## 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## 571 541 540 633 625 567 625 637 633 659 615 602 633 644 698 662 666 700 721 684
## 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
## 674 688 720 776 705 696 689 637 704 783 694 710 714 738 671 649 599 600 587 594
## 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## 575 518 485 428 375 377 314 308 282 265 212 184 148 118 103 243
class(df$agea)
## [1] "factor"
# Oops, why so many parameters? class() shows that agea is a factor, so lm() estimates one dummy variable per age value (reference category: age 15)
# Fix: convert agea to numeric. as.numeric() applied directly to a factor returns the internal level codes (1, 2, 3, ...) instead of the ages, so convert to character first: as.numeric(as.character(df$agea))
age_num = as.numeric(as.character(df$agea))
model_2= lm(cesd8 ~ age_num , data=df)
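The pitfall can be reproduced on a tiny factor with made-up values: as.numeric() alone returns the positions of the levels, while going through as.character() recovers the original numbers. The same toy factor also shows why the factor model needed so many parameters: the design matrix gets one column per level, with the first level absorbed into the intercept.

```r
# Toy factor with three distinct "ages" stored as labels
age_f <- factor(c("15", "90", "16", "90"))

levels(age_f)                      # "15" "16" "90"
as.numeric(age_f)                  # level codes: 1 3 2 3
as.numeric(as.character(age_f))    # actual values: 15 90 16 90

# Equivalent idiom: convert the levels once, then index by the factor
as.numeric(levels(age_f))[age_f]

# One intercept plus (number of levels - 1) dummy columns
ncol(model.matrix(~ age_f))        # 3
```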
plot(age_num, df$cesd8)
abline(lm(cesd8 ~ age_num, data = df), col = "red")
plot(df$agea, df$cesd8)
# df$agea is still a factor here, so plot() draws one boxplot per age value instead of a scatterplot
Describe the association between depression and the age of respondents by applying correlation and regression analysis. Then add gender to develop a multiple regression model for depression.
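For the correlation part of the task, Pearson's r and its significance test can be obtained with cor.test(). The sketch below uses simulated data (the names `age` and `score` and the effect size are made up) and also checks the link to regression: in a model with a single predictor, R-squared equals the squared correlation.

```r
set.seed(3)
# Hypothetical data: ages 15-90 and a score that rises slightly with age
age <- sample(15:90, 300, replace = TRUE)
score <- 1.5 + 0.003 * age + rnorm(300, sd = 0.5)

# Pearson correlation with confidence interval and p-value
ct <- cor.test(age, score)
ct$estimate
ct$p.value

# With one predictor, R-squared of the regression is the squared correlation
fit <- lm(score ~ age)
all.equal(summary(fit)$r.squared, unname(ct$estimate)^2)
```

On the data used here, the analogous call would be cor.test(age_num, df$cesd8); cor.test() drops incomplete pairs before computing the correlation.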
model_3 = lm(cesd8 ~ age_num+female, data=df)
summary(model_3)
##
## Call:
## lm(formula = cesd8 ~ age_num + female, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.86258 -0.35092 -0.08342 0.26825 2.43407
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.4784555 0.0077136 191.67 <2e-16 ***
## age_num 0.0029157 0.0001329 21.94 <2e-16 ***
## female 0.1217120 0.0049864 24.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.492 on 39125 degrees of freedom
## (1028 observations deleted due to missingness)
## Multiple R-squared: 0.02734, Adjusted R-squared: 0.02729
## F-statistic: 549.9 on 2 and 39125 DF, p-value: < 2.2e-16
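{
# Use the numeric age: df$agea is a factor and would be drawn as boxplots
plot(age_num, df$cesd8, ylim = c(1, 2.5), pch = 16, col = "#ff000033")
# abline() takes one intercept and one slope, so draw the fitted line for men (female = 0)
abline(a = coef(model_3)["(Intercept)"], b = coef(model_3)["age_num"])
}
# Age is significantly associated with depression (p < 0.001): about 0.003 points per additional year, holding gender constant
# Gender is significantly associated with depression (p < 0.001): women score about 0.12 points higher than men of the same age
# Converting age to numeric only changes the plot if age_num is plotted; with the factor df$agea, plot() keeps drawing boxplots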
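The coefficients of model_3 translate into expected scores by plugging values into the fitted equation, cesd8 = 1.478 + 0.0029 * age + 0.122 * female. A short worked sketch using the printed estimates follows; the helper `predict_cesd8` is introduced here only for illustration.

```r
# Estimates copied from summary(model_3)
b0 <- 1.4784555        # intercept
b_age <- 0.0029157     # change per additional year of age
b_female <- 0.1217120  # difference between women and men of the same age

predict_cesd8 <- function(age, female) b0 + b_age * age + b_female * female

predict_cesd8(age = 40, female = 1)  # 40-year-old woman, about 1.72
predict_cesd8(age = 40, female = 0)  # 40-year-old man, about 1.60

# Expected difference between ages 20 and 80, holding gender constant: about 0.17
b_age * (80 - 20)
```

With the fitted object available, predict(model_3, newdata = data.frame(age_num = 40, female = 1)) should give the same value. The age effect is statistically significant but small in size: sixty years of age correspond to less than 0.2 points on a scale from 1 to 4.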