The Boring Team: Alex Stephenson, Anna Gorobtsova, Elizaveta Dyachenko, Marina Romanova Country: Slovenia Year: 2014
First, we download our dataset and all the needed packages.
library(foreign)
library(ggplot2)
library(dplyr)
library(car)
library(DescTools)
library(readr)
library(scales)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(data.table)
Slovenia <- read.spss("ESS7SI.sav", na.rm = T, to.data.frame = T, use.value.labels = T)
dim(Slovenia)## [1] 1224 601
There are 1224 observations of 601 variables.
maritalb: Legal marital status
weight: Weight of respondent (kg)
H0: Difference between mean weights for people of different marital statuses in Slovenia is not statistically significant.
We wanted to look at the relationship between weight and marital status - whether different marital status’ had a significant effect on differences in weight. Weight gain is a problem world wide and Slovenia is no exception - 69.5% and 57.7% of men and women respectively are overweight and 29.5 %and 27.8% obese. This is approximately double the worldwide average (Source: Source: WHO Global Health Observatory Data Repository (1)). Current research suggests that marriage is associated with weight gain and that divorce and widowhood are linked to weight loss (Umberson 1992, Sobal et al 2003) - the first argument is in line with our findings: there was a significant difference in the average weight of those who had married vs. those who had never married.
Further analysis might look to control age - for example, there are below the age of 25 there were 136 respondents who had never married (1/3 of the total amount), only 2 who had married and no divorcees or widows. It is possible, therefore, that age is playing a role in the lower weight of the never married cohort (given we know rates of obesity are higher in older people).
par(mar = c(5, 25, 2, 2))
barplot(table(Slovenia$maritalb)/nrow(Slovenia)*100, las = 2, horiz = T) Given our analysis is aiming to understand whether marital status has an effect on weight we have decided to merge variable “Legally married”" with variable “In a legally registered civil union” and variable “Legally divorced/civil union dissolved” with variable “Legally separated”. For the purpose of this analysis they are the same - a civil union is a non religious marriage certificate (often between homosexual couples whereby the church, or the law, prohibits marriages). Slovenia has recognized partnerships (Slovene: “partnerska zveza”“) since 24 February 2017. These provide same-sex partners with all the legal rights of marriages, with the exception of joint adoption and in vitro fertilisation. Previously, Slovenia had recognized the more limited”registrirana partnerska skupnost" for same-sex couples since 23 July 2006, which gave same-sex partners access to one another’s pensions and property.
Slovenia$maritalb1 <- ifelse(Slovenia$maritalb == "Legally married", "Married",
ifelse(Slovenia$maritalb == "In a legally registered civil union", "Married",
ifelse(Slovenia$maritalb == "Legally divorced/civil union dissolved", "Divorced",
ifelse(Slovenia$maritalb == "Legally separated", "Divorced",
ifelse(Slovenia$maritalb == "Widowed/civil partner died", "Widowed", "Never married")
)
)
)
)par(mar = c(3,10,0,3))
barplot(table(Slovenia$maritalb1)/nrow(Slovenia)*100, las = 2, horiz = T, col=c(1,2,3,4))Now we have four groups of comparable size, which is needed for ANOVA.
We make a separate data base and transfrom both variables into the variables of the right classes.
Slovenia1 <- select(Slovenia, maritalb1, weight)
Slovenia1$weight <- as.numeric(as.character(Slovenia1$weight))
Slovenia1$maritalb1 <- as.character(Slovenia1$maritalb1) ggplot(Slovenia1, aes(x = weight)) +
geom_density(alpha = 0.4, fill = "blue") summary(Slovenia1$weight)## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 40.00 65.00 75.00 76.93 86.00 140.00 23
The weight is normally distributed
describeBy(Slovenia1$weight, Slovenia1$maritalb1, mat = TRUE) %>%
select(Marital_Status = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max,
Skew=skew, Kurtosis=kurtosis, st.error = se) %>%
kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
caption="Weight by Marital Status") %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE) | Marital_Status | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
|---|---|---|---|---|---|---|---|---|---|
| Divorced | 58 | 78.50 | 13.76 | 78.0 | 48 | 105 | -0.10 | -1.02 | 1.81 |
| Married | 618 | 79.43 | 15.63 | 78.0 | 42 | 140 | 0.63 | 0.59 | 0.63 |
| Never married | 402 | 73.44 | 15.53 | 72.0 | 42 | 136 | 0.60 | 0.31 | 0.77 |
| Widowed | 106 | 75.21 | 13.58 | 74.5 | 40 | 112 | 0.44 | 0.35 | 1.32 |
The weight is normally distrubited within different marital statuses.
Then we create a boxplot.
ggplot(Slovenia1, aes(x = maritalb1, y = weight)) +
geom_boxplot(aes(fill = maritalb1, alpha = 0.4)) +
theme_classic() +
labs(x = "Marital Status", y = "Weight", title = "The Distrubution of Weight according to Marital Status") +
theme(legend.position="none")From the boxplot, we see that the weight is distributed rather normally across the marital statuses and that it is slightly lower among the never married respondents.
ANOVA has three assumptions:
1. independence of observations
2. normal distribution of residuals
3. equal variances
leveneTest(Slovenia1$weight ~ Slovenia1$maritalb1) P-value is 0.1783 therefore variances are equal. Thus, in ANOVA we should indicate var.equal = T.
oneway.test(Slovenia1$weight ~ Slovenia1$maritalb1, var.equal = T)##
## One-way analysis of means
##
## data: Slovenia1$weight and Slovenia1$maritalb1
## F = 13.062, num df = 3, denom df = 1180, p-value = 2.146e-08
aov.out <- aov(Slovenia1$weight ~ Slovenia1$maritalb1)
summary(aov.out) ## Df Sum Sq Mean Sq F value Pr(>F)
## Slovenia1$maritalb1 3 9220 3073.2 13.06 2.15e-08 ***
## Residuals 1180 277629 235.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness
F(3, 1180) = 13.062 and p-value of <0.001 indicate that the difference in weight across marital status is statistically significant.
plot(aov.out, 2) Residual are normal.
Next, we want to find out which of our variables create the significance. As our variances are equal we can run a Tukey test to find out.
par(mar = c(5, 10, 3, 1))
Tukey <- TukeyHSD(aov.out)
plot(Tukey, las = 2) As we can see from the graph, the pairwise tests that have significantly different means at a 95% confidence level are ‘Married’ and ‘Never married’ and ‘Widowed’ and ‘Married’. This fits the sociological theory that marrying and being widowed cause a change in weight.
Non-parametric test for ANOVA is the Kruskal-Wallis test.
H0: Mean ranks of people from different marital satuses are the same.
Slovenia1$maritalb2 <- as.factor(Slovenia1$maritalb1)
kruskal.test(Slovenia1$weight ~ Slovenia1$maritalb2) ##
## Kruskal-Wallis rank sum test
##
## data: Slovenia1$weight by Slovenia1$maritalb2
## Kruskal-Wallis chi-squared = 38.916, df = 3, p-value = 1.808e-08
With KW chi-square(3) = 38.916 and p-value < 0.001, we can state that the mean ranks of different marital statuses are not the same. This results confirm our results in the ANOVA test.
As Kruskal-Wallis test is significant, we run non-parametric post hoc test - Dunn’s test.
DunnTest(weight ~ maritalb2, data = Slovenia1) ##
## Dunn's test of multiple comparisons using rank sums : holm
##
## mean.rank.diff pval
## Married-Divorced 2.154196 0.9634
## Never married-Divorced -130.326471 0.0331 *
## Widowed-Divorced -84.496259 0.3902
## Never married-Married -132.480667 8.7e-09 ***
## Widowed-Married -86.650455 0.0635 .
## Widowed-Never married 45.830212 0.4388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Applying Dunn’s non-paramtric test, we can say that there is statistically significant difference between ‘Never Married - Divorced’, ‘Never married - Married’, and ‘Widowed-Married’, which actually proves the results of Kruskal-Wallis test.