library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(table1)
##
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
##
## units, units<-
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library(ggplot2)
library(ggpubr)
library(sjPlot)
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.
load('test1-1.RData')
ess_ger <- subset(ess, cntry == 'Germany')
class(ess_ger$conf)
## [1] "numeric"
class(ess_ger$gndr)
## [1] "factor"
class(ess_ger$eduy)
## [1] "numeric"
label(ess_ger$conf) = 'Level of conformity'
label(ess_ger$gndr) = 'Gender'
label(ess_ger$eduy) = 'Years of education'
t1kable(table1(~ conf + gndr + eduy, data = ess_ger)) %>%
  kable_styling(font_size = 16)
 | Overall (N=2358) |
---|---|
Level of conformity | |
Mean (SD) | 3.68 (1.13) |
Median [Min, Max] | 3.50 [1.00, 6.00] |
Missing | 53 (2.2%) |
Gender | |
Male | 1212 (51.4%) |
Female | 1146 (48.6%) |
Years of education | |
Mean (SD) | 14.3 (3.50) |
Median [Min, Max] | 14.0 [0, 30.0] |
Missing | 7 (0.3%) |
Here I checked the class of each variable and produced descriptive statistics. The mean conformity value is 3.68 (SD 1.13), the gender split is almost equal (51.4% male and 48.6% female), and the mean number of years of education is 14.3, with a minimum of 0 and a maximum of 30.
#Task 1 (4 points):
Estimate pairwise relationships between conformity values and gender and education (select appropriate tests, check the assumptions, interpret the results). Visualize the relationships between conformity values and other variables.
conf is numeric and gndr is binary, so a two-sample t-test is appropriate. H0: the mean values of conformity do not differ between the male and female groups.
# First, check the assumptions: equality of variances and normality within each group
var.test(ess_ger$conf~ess_ger$gndr)
##
## F test to compare two variances
##
## data: ess_ger$conf by ess_ger$gndr
## F = 0.94576, num df = 1183, denom df = 1120, p-value = 0.344
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.8424239 1.0615852
## sample estimates:
## ratio of variances
## 0.9457645
shapiro.test(ess_ger$conf[ess_ger$gndr == 'Female'])
##
## Shapiro-Wilk normality test
##
## data: ess_ger$conf[ess_ger$gndr == "Female"]
## W = 0.97248, p-value = 9.238e-14
shapiro.test(ess_ger$conf[ess_ger$gndr == 'Male'])
##
## Shapiro-Wilk normality test
##
## data: ess_ger$conf[ess_ger$gndr == "Male"]
## W = 0.96223, p-value < 2.2e-16
t.test(ess_ger$conf ~ ess_ger$gndr, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ess_ger$conf by ess_ger$gndr
## t = 2.1376, df = 2303, p-value = 0.03266
## alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
## 95 percent confidence interval:
## 0.008316332 0.193015129
## sample estimates:
## mean in group Male mean in group Female
## 3.724662 3.623996
effectsize::cohens_d(ess_ger$conf~ess_ger$gndr)
## Cohen's d | 95% CI
## ------------------------
## 0.09 | [0.01, 0.17]
##
## - Estimated using pooled SD.
ggplot(ess_ger, aes(x=gndr, y=conf)) +
geom_boxplot()+
theme_minimal(base_size = 16)+
ylab('conformity')+xlab('gender')+
ggpubr::stat_compare_means(method = "t.test")
Interpretation: Assumptions: the p-value for var.test is above 0.05 (0.344), so we fail to reject the null hypothesis of equal variances and can use the pooled-variance t-test. For shapiro.test the p-values for the female and male groups (9.238e-14 and < 2.2e-16, respectively) are below 0.05, so conformity is not normally distributed in either group. With more than 1,100 observations per group, the t-test is generally considered robust to this departure from normality, so we proceed with it.
T-test: the p-value is below 0.05 (0.03266), so we reject the null hypothesis: mean conformity differs between the male and female groups. The mean is about 3.72 for males and 3.62 for females, so male respondents score about 0.10 higher on average (95% CI 0.01 to 0.19).
Effect size: Cohen's d is 0.09 (95% CI 0.01 to 0.17), well below 0.2, the conventional threshold for a small effect, so the difference is statistically significant but negligible in size.
Visualization: the boxplot shows that conformity differs slightly between the two groups, which is consistent with the result of the formal test.
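As a cross-check on the effect size, Cohen's d for a pooled-variance two-sample t-test can be recovered from the t statistic and the group sizes alone. The sketch below uses only numbers copied from the t.test() and var.test() output above (group sizes are the F-test degrees of freedom plus one); the variable names are illustrative.

```r
# Values copied from the t.test() and var.test() output above
t_stat   <- 2.1376
n_male   <- 1183 + 1  # numerator df of the F test + 1
n_female <- 1120 + 1  # denominator df of the F test + 1

# For a pooled-variance t-test, t = d / sqrt(1/n1 + 1/n2),
# so d = t * sqrt(1/n1 + 1/n2)
cohens_d_from_t <- t_stat * sqrt(1 / n_male + 1 / n_female)

print(round(cohens_d_from_t, 2))
```

The result rounds to 0.09, the same value that effectsize::cohens_d reported.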
conf is numeric and eduy is numeric, so a correlation test is appropriate. H0: there is no correlation between conformity values and years of education.
cor.test(ess_ger$conf, ess_ger$eduy)
##
## Pearson's product-moment correlation
##
## data: ess_ger$conf and ess_ger$eduy
## t = -7.1582, df = 2299, p-value = 1.095e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1873879 -0.1074406
## sample estimates:
## cor
## -0.1476554
ggscatter(ess_ger, x = "eduy", y = "conf",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "eduy", ylab = "conf")
Interpretation: the p-value is below 0.05 (1.095e-12), so we reject the null hypothesis: there is an association between conformity and years of education. The correlation coefficient is approximately -0.15 (95% CI -0.19 to -0.11), so the correlation is small and negative: respondents with more years of education tend to report lower conformity, and vice versa.
Visualization: the plot confirms that the relationship between years of education and conformity is negative.
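To make the size of this correlation more concrete, the coefficient, the share of variance the two variables have in common, and the confidence interval can all be reconstructed from the t statistic and degrees of freedom in the cor.test() output. This is a minimal sketch using the standard formulas (r from t, Fisher z-transformation for the interval); the variable names are illustrative.

```r
# Values copied from the cor.test() output above
t_stat <- -7.1582
df     <- 2299

# Pearson r recovered from the test statistic: r = t / sqrt(t^2 + df)
r <- t_stat / sqrt(t_stat^2 + df)

# Share of variance the two variables have in common
r_squared <- r^2

# 95% confidence interval via the Fisher z-transformation
n    <- df + 2
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

print(round(c(r = r, r_squared = r_squared), 4))
print(round(ci, 4))
```

The squared correlation is about 0.02, so years of education and conformity share only about 2% of their variance, which supports reading the association as small.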
#Task 2 (6 points):
Estimate a model to predict conformity values by social characteristics (age, gender, area of residence, and number of years of education), as well as an interaction between age and gender. Interpret the coefficients and comment on model fit. Visualize the interaction between age and gender and comment on the result.
m1 <- lm(conf ~ domicil + eduy + age * gndr, data = ess_ger)
summary(m1)
##
## Call:
## lm(formula = conf ~ domicil + eduy + age * gndr, data = ess_ger)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.02691 -0.80889 0.05107 0.85483 2.91159
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.877900 0.143060 27.107 < 2e-16
## domicilSuburbs or outskirts of big city -0.081002 0.082927 -0.977 0.32878
## domicilTown or small city 0.007904 0.070912 0.111 0.91126
## domicilCountry village -0.014835 0.072923 -0.203 0.83881
## domicilFarm or home in countryside -0.295865 0.156675 -1.888 0.05910
## eduy -0.048172 0.006793 -7.092 1.76e-12
## age 0.011658 0.001667 6.993 3.51e-12
## gndrFemale 0.197281 0.128853 1.531 0.12589
## age:gndrFemale -0.006783 0.002439 -2.781 0.00546
##
## (Intercept) ***
## domicilSuburbs or outskirts of big city
## domicilTown or small city
## domicilCountry village
## domicilFarm or home in countryside .
## eduy ***
## age ***
## gndrFemale
## age:gndrFemale **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.104 on 2288 degrees of freedom
## (61 observations deleted due to missingness)
## Multiple R-squared: 0.05057, Adjusted R-squared: 0.04725
## F-statistic: 15.23 on 8 and 2288 DF, p-value: < 2.2e-16
Interpretation:
For intercept: the intercept (about 3.88, significant) is the predicted level of conformity for the reference case: a male respondent aged 0 with 0 years of education living in a big city. Since survey respondents are not aged 0, this is an extrapolation and serves mainly as a baseline.
For domicil: none of the domicil categories differs significantly from the baseline category 'A big city' at the 0.05 level. The largest difference is for 'Farm or home in countryside' (-0.30, p = 0.059), which is only marginally significant.
For eduy: the effect of years of education on conformity is significant (p-value 1.76e-12) and negative: each additional year of education is associated with a decrease in conformity of about 0.05, holding the other predictors constant.
For age: the interaction between age and gender is significant (p-value 0.005), so the effect of age differs by gender. For males, each additional year of age is associated with an increase in conformity of about 0.012. For females the increase is 0.0117 - 0.0068 = 0.0049, that is, about 0.005 per year.
For gender: because the model includes the interaction, the gndrFemale coefficient (0.197) is the difference between females and males at age 0, and it is not significant (p-value 0.126). The gender difference at other ages depends on the interaction term.
Model fit: R-squared is very small (0.051), which means that the model explains only about 5% of the variance in our dependent variable (conformity); the adjusted R-squared is slightly smaller (0.047). The model as a whole is nonetheless significant (F = 15.23 on 8 and 2288 DF, p-value < 2.2e-16).
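The age slopes for each gender and the adjusted R-squared can be checked by hand from the summary(m1) output. The sketch below only re-does arithmetic on coefficients and fit statistics copied from that output; the variable names are illustrative, and the sample size is inferred as residual degrees of freedom plus the number of estimated coefficients.

```r
# Coefficients copied from summary(m1)
b_age        <- 0.011658   # age slope for the reference gender (Male)
b_age_female <- -0.006783  # age:gndrFemale interaction

# Simple slopes of age by gender
slope_male   <- b_age
slope_female <- b_age + b_age_female

# Adjusted R-squared from R-squared, sample size and number of predictors
r2 <- 0.05057
p  <- 8              # predictors, excluding the intercept
n  <- 2288 + p + 1   # residual df + estimated coefficients
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(c(male = slope_male, female = slope_female), 4))
print(round(adj_r2, 5))
```

The female slope comes out at about 0.0049 per year of age, and the adjusted R-squared matches the 0.04725 shown in the model summary.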
sjPlot::plot_model(m1, type = 'pred', terms = c('age', 'gndr'))
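Here we see that the effect of age on conformity is stronger for males than for females: the line for males is steeper. The model also implies that at age 0 the predicted conformity is lower for males than for females, but this is an extrapolation beyond the ages of actual respondents, and the gender difference at that point is not significant.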
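The point where the two predicted lines cross can be worked out from the point estimates in summary(m1): the female-minus-male gap at a given age is the gndrFemale coefficient plus the interaction coefficient times age. This is a rough sketch based on point estimates only; because the gndrFemale coefficient is not significant, the location of the crossover is quite uncertain. The function and variable names are illustrative.

```r
# Coefficients copied from summary(m1)
b_female     <- 0.197281   # gndrFemale: female - male gap at age 0
b_age_female <- -0.006783  # age:gndrFemale: change in the gap per year of age

# Predicted female - male difference in conformity at a given age,
# holding domicil and eduy constant
gender_gap <- function(age) b_female + b_age_female * age

# Age at which the predicted gap is zero (the two lines cross)
crossover_age <- -b_female / b_age_female

print(round(crossover_age, 1))
print(round(gender_gap(c(20, 40, 60, 80)), 3))
```

Under these estimates the lines cross at roughly age 29: below that age predicted conformity is slightly higher for females, and above it the predicted level is higher for males, with the gap widening as age increases.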