We are going to use ANOVA to explore subjective well-being in Belgium.
Research question: How does the extent of being hampered in daily activities by illness vary with age? This topic belongs to the sociology of medicine, which explores how illness affects everyday life, communication with other people and subjective well-being.
Variables.
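To answer our research question, we have chosen two variables: one categorical (hlthhmp, whether the respondent is hampered in daily activities by illness) and one continuous (agea1, the respondent's age).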
library(ggplot2)
# Boxplots of age for each level of being hampered by illness
ggplot() +
geom_boxplot(data = BE, aes(x = hlthhmp, y = agea1), col = "#E52B50", fill = "#F0F8FF") +
ylab("Age") +
xlab("Hampered in daily activities by illness") +
ggtitle("The extent of being hampered by illness depending on age") +
theme_bw()
After visualising our data we got three boxplots. The median ages of the three groups of respondents, who are hampered by illness to different extents, are not equal. The older the person, the more hampered he or she feels. Besides, there are no outliers, so we are lucky.
As our sample is within the size limit of R's shapiro.test() (at most 5000 observations), we will check the normality of the age distribution with the Shapiro-Wilk test.
null (\(H_0\)): the distribution is normal.
alternative (\(H_1\)): the distribution is not normal.
shapiro.test(BE$agea1)
##
## Shapiro-Wilk normality test
##
## data: BE$agea1
## W = 0.97279, p-value < 2.2e-16
Conclusion: As the p-value is extremely small, the null hypothesis can be rejected, so the distribution is not normal. That is why we decided to build a histogram to look at the distribution of respondents' age.
ggplot() +
geom_histogram(data = BE, aes(x = agea1), binwidth = 1, fill="#008080", col="#483D8B", alpha = 0.5)+
ggtitle("") +
theme_bw()
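However, according to this histogram, the distribution is quite similar to normal, but it is positively skewed, which means that there are fewer elderly respondents. With a sample of this size, the Shapiro-Wilk test tends to flag even small departures from normality, so the histogram gives a more practical picture.
We will check the homogeneity of variances with the Bartlett test and the Levene test.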
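Strictly speaking, the normality assumption of ANOVA concerns the residuals of the model rather than the pooled variable. Below is a minimal sketch of checking the residuals instead. The data frame `sim` is simulated and only stands in for `BE`; the group sizes, means and standard deviation are hypothetical values, not survey results.

```r
# Simulated stand-in for the survey data (hypothetical values, not ESS results)
set.seed(1)
sim <- data.frame(
  age   = c(rnorm(50, 62, 15), rnorm(150, 55, 15), rnorm(300, 47, 15)),
  group = factor(rep(c("Yes a lot", "Yes to some extent", "No"),
                     times = c(50, 150, 300)))
)

# Fit the one-way ANOVA model and take its residuals
fit <- aov(age ~ group, data = sim)
res <- residuals(fit)

# Shapiro-Wilk test on the residuals, plus a Q-Q plot for a visual check
sw <- shapiro.test(res)
print(sw)
qqnorm(res)
qqline(res)
```

With the real data, the same steps would be applied to a model fitted on `BE`, for example `aov(agea1 ~ as.factor(hlthhmp), data = BE)`.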
null (\(H_0\)): the variances are equal.
alternative (\(H_1\)): the variances are not equal.
bartlett.test(BE$agea1 ~ BE$hlthhmp)
##
## Bartlett test of homogeneity of variances
##
## data: BE$agea1 by BE$hlthhmp
## Bartlett's K-squared = 1.1168, df = 2, p-value = 0.5721
library(car)  # provides leveneTest()
leveneTest(BE$agea1~as.factor(BE$hlthhmp))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.259 0.2842
## 1762
Conclusion: In both the Bartlett test (p = 0.5721) and the Levene test (p = 0.2842) the p-value is bigger than 0.05. So the null hypothesis cannot be rejected, and we treat the variances as equal.
Because we have equal variances, we can use either oneway.test() or aov() to run the ANOVA. We decided to use oneway.test().
null (\(H_0\)): the mean ages of the groups are equal, so age is not associated with the extent of being hampered by illness in daily activities.
alternative (\(H_1\)): the mean ages of the groups are not all equal, so age is associated with the extent of being hampered by illness in daily activities.
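With equal variances assumed, the two functions give the same F statistic. Here is a small sketch on simulated data to illustrate this. The data frame `sim` and its group labels are hypothetical and are not taken from `BE`.

```r
# Simulated three-group data (hypothetical values)
set.seed(42)
sim <- data.frame(
  age   = c(rnorm(40, 60, 12), rnorm(60, 52, 12), rnorm(80, 45, 12)),
  group = factor(rep(c("a", "b", "c"), times = c(40, 60, 80)))
)

# F statistic from oneway.test() with equal variances assumed
f_oneway <- unname(oneway.test(age ~ group, data = sim, var.equal = TRUE)$statistic)

# F statistic from the classical aov() table
f_aov <- summary(aov(age ~ group, data = sim))[[1]][["F value"]][1]

# Both routes lead to the same value (up to floating-point error)
same_f <- isTRUE(all.equal(f_oneway, f_aov))
print(c(oneway = f_oneway, aov = f_aov))
print(same_f)
```

One practical difference is that an aov() fit can be passed on to TukeyHSD(), while oneway.test() only returns the test result.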
oneway.test(BE$agea1~as.factor(BE$hlthhmp), var.equal = TRUE)
##
## One-way analysis of means
##
## data: BE$agea1 and as.factor(BE$hlthhmp)
## F = 59.1, num df = 2, denom df = 1762, p-value < 2.2e-16
F-ratio = 59.1 with 2 and 1762 degrees of freedom. The F-ratio is large, which points to significant differences between the means. The p-value is extremely small (< 2.2e-16), so the null hypothesis can be rejected: the means of our groups are not all equal.
Conclusion: According to the results of ANOVA, we can conclude that the extent to which people are hampered by illness in daily activities is associated with their age.
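A significant F does not say how strong the association is. As a rough supplement (it is not part of the analysis above), eta squared can be recovered from the reported F-ratio and degrees of freedom with the standard identity eta squared = F * df1 / (F * df1 + df2). The variable names below are only illustrative.

```r
# Values reported by oneway.test() above
f_value    <- 59.1
df_between <- 2
df_within  <- 1762

# Share of the variance in age that is accounted for by the grouping
eta_sq <- (f_value * df_between) / (f_value * df_between + df_within)
round(eta_sq, 3)  # 0.063
```

So the three groups account for roughly 6% of the variance in age.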
aov.out <- aov(BE$agea1 ~ as.factor(BE$hlthhmp))
plot(TukeyHSD(aov.out), las = 2)
TukeyHSD(aov.out)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BE$agea1 ~ as.factor(BE$hlthhmp))
##
## $`as.factor(BE$hlthhmp)`
## diff lwr upr p adj
## Yes to some extent-Yes a lot -7.399901 -11.99027 -2.809535 0.0004729
## No-Yes a lot -15.611704 -19.76375 -11.459659 0.0000000
## No-Yes to some extent -8.211802 -10.79221 -5.631390 0.0000000
As ANOVA is significant, we can apply a post hoc test. As we have equal variances, we do not need the Games-Howell test, which is designed for unequal variances. A Bonferroni correction would also be valid, but it is usually more conservative when all pairs are compared. That is why we used Tukey HSD to perform multiple pairwise comparisons between the means of the three groups.
Conclusion: according to Tukey HSD, the differences between all pairs of groups are significant, with extremely small adjusted p-values. The smallest difference (about 7.4 years) is between the respondents who answered “Yes a lot” and “Yes to some extent”, with an adjusted p-value of about 0.0005. At the same time, the biggest difference (about 15.6 years) is between the respondents who answered “No” and “Yes a lot”.
Overall, in spite of the fact that these results might seem pretty obvious, it was very useful training in applying the ANOVA test. Moreover, in future research it would be interesting to compare more detailed data about the extent of being hampered by illness, for example between different social groups. Thanks!
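The Tukey table can also be read programmatically. Below is a minimal sketch, on simulated data, that keeps only the pairs with an adjusted p-value below 0.05 and sorts them by the size of the difference. The data frame `sim` is hypothetical and only imitates the structure of `BE`.

```r
# Simulated stand-in for the survey data (hypothetical values, not ESS results)
set.seed(7)
sim <- data.frame(
  age   = c(rnorm(50, 62, 15), rnorm(150, 55, 15), rnorm(300, 47, 15)),
  group = factor(rep(c("Yes a lot", "Yes to some extent", "No"),
                     times = c(50, 150, 300)))
)

# TukeyHSD() returns one matrix per factor with columns diff, lwr, upr, p adj
tk <- TukeyHSD(aov(age ~ group, data = sim))$group

# Keep the pairs that are significant at the 5% level
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]

# Sort them from the largest to the smallest absolute difference in mean age
sig_sorted <- sig[order(abs(sig[, "diff"]), decreasing = TRUE), , drop = FALSE]
print(sig_sorted)
```

The first row of the sorted table is then the pair of groups with the largest difference in mean age.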