Introduction

TEAM: Alex Stephenson, Anna Gorobtsova, Elizaveta Dyachenko, Marina Romanova

COUNTRY: Slovenia

YEAR: 2014

TOPIC: Health and Care

INDIVIDUAL CONTRIBUTION:

  • We chose the variables and interpreted the hypotheses and conclusions together. Also, each of us added his/her piece of information into the prehistory of our analysis.
  • Marina and Elizaveta were dealing with manipulations on marital status and a boxplot.
  • Marina also checked the homogeneity of variances, while Elizaveta was responsibel for the look of the project.
  • Anna checked the normality of residuals and run F-test.
  • Alex run post hoc test, non-parametric test and non-parametric post hoc test.

First, we download our dataset and all the needed packages.

library(foreign)
library(ggplot2)
library(dplyr)
library(car)
library(DescTools)
library(readr)
library(scales)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(data.table)

Slovenia <- read.spss("ESS7SI.sav", na.rm = T, to.data.frame = T, use.value.labels = T)
dim(Slovenia)
## [1] 1224  601

There are 1224 observations of 601 variables.

Variables

maritalb: Legal marital status
weight: Weight of respondent (kg)

Hypotheses

H0: Difference between mean weights for people of different marital statuses in Slovenia is not statistically significant.
H1: Difference between mean weights for people of different marital statuses in Slovenia is statistically significant.

We wanted to look at the relationship between weight and marital status - whether different marital status’ had a significant effect on differences in weight. Weight gain is a problem world wide and Slovenia is no exception - 69.5% and 57.7% of men and women respectively are overweight and 29.5 %and 27.8% obese. This is approximately double the worldwide average (Source: Source: WHO Global Health Observatory Data Repository (1)). Current research suggests that marriage is associated with weight gain and that divorce and widowhood are linked to weight loss (Umberson 1992, Sobal et al 2003) - the first argument is in line with our findings: there was a significant difference in the average weight of those who had married vs. those who had never married.

Further analysis might look to control age - for example, there are below the age of 25 there were 136 respondents who had never married (1/3 of the total amount), only 2 who had married and no divorcees or widows. It is possible, therefore, that age is playing a role in the lower weight of the never married cohort (given we know rates of obesity are higher in older people).

Marital Status

par(mar = c(5, 25, 2, 2))
barplot(table(Slovenia$maritalb)/nrow(Slovenia)*100, las = 2, horiz = T)  

Given our analysis is aiming to understand whether marital status has an effect on weight we have decided to merge variable “Legally married”" with variable “In a legally registered civil union” and variable “Legally divorced/civil union dissolved” with variable “Legally separated”. For the purpose of this analysis they are the same - a civil union is a non religious marriage certificate (often between homosexual couples whereby the church, or the law, prohibits marriages). Slovenia has recognized partnerships (Slovene: “partnerska zveza”“) since 24 February 2017. These provide same-sex partners with all the legal rights of marriages, with the exception of joint adoption and in vitro fertilisation. Previously, Slovenia had recognized the more limited”registrirana partnerska skupnost" for same-sex couples since 23 July 2006, which gave same-sex partners access to one another’s pensions and property.

Slovenia$maritalb1 <- ifelse(Slovenia$maritalb == "Legally married", "Married",
                             ifelse(Slovenia$maritalb == "In a legally registered civil union", "Married",
                                    ifelse(Slovenia$maritalb == "Legally divorced/civil union dissolved", "Divorced",
                                           ifelse(Slovenia$maritalb == "Legally separated", "Divorced",
                                                  ifelse(Slovenia$maritalb == "Widowed/civil partner died", "Widowed", "Never married")
                                           )
                                    )
                             )
)
par(mar = c(3,10,0,3))
barplot(table(Slovenia$maritalb1)/nrow(Slovenia)*100, las = 2, horiz = T, col=c(1,2,3,4))

Now we have four groups of comparable size, which is needed for ANOVA.

We make a separate data base and transfrom both variables into the variables of the right classes.

Slovenia1 <- select(Slovenia, maritalb1, weight)

Slovenia1$weight <- as.numeric(as.character(Slovenia1$weight))
Slovenia1$maritalb1 <- as.character(Slovenia1$maritalb1)  

Weight by Marital Status

describeBy(Slovenia1$weight, Slovenia1$maritalb1, mat = TRUE) %>% 
  select(Marital_Status = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max, 
                Skew=skew, Kurtosis=kurtosis, st.error = se) %>% 
  kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
        caption="Weight by Marital Status") %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE) 
Weight by Marital Status
Marital_Status N Mean SD Median Min Max Skew Kurtosis st.error
Divorced 58 78.50 13.76 78.0 48 105 -0.10 -1.02 1.81
Married 618 79.43 15.63 78.0 42 140 0.63 0.59 0.63
Never married 402 73.44 15.53 72.0 42 136 0.60 0.31 0.77
Widowed 106 75.21 13.58 74.5 40 112 0.44 0.35 1.32

The weight is normally distrubited within different marital statuses.

Then we create a boxplot.

ggplot(Slovenia1, aes(x = maritalb1, y = weight)) +
  geom_boxplot(aes(fill = maritalb1, alpha = 0.4)) +
  theme_classic() +
  labs(x = "Marital Status", y = "Weight", title = "The Distrubution of Weight according to Marital Status") +
  theme(legend.position="none")

From the boxplot, we see that the weight is distributed rather normally across the marital statuses and that it is slightly lower among the never married respondents.

ANOVA

ANOVA has three assumptions:
1. independence of observations
2. normal distribution of residuals
3. equal variances

Homogenity of variances

leveneTest(Slovenia1$weight ~ Slovenia1$maritalb1)  

P-value is 0.1783 therefore variances are equal. Thus, in ANOVA we should indicate var.equal = T.

F-test

oneway.test(Slovenia1$weight ~ Slovenia1$maritalb1, var.equal = T)
## 
##  One-way analysis of means
## 
## data:  Slovenia1$weight and Slovenia1$maritalb1
## F = 13.062, num df = 3, denom df = 1180, p-value = 2.146e-08
aov.out <- aov(Slovenia1$weight ~ Slovenia1$maritalb1) 
summary(aov.out) 
##                       Df Sum Sq Mean Sq F value   Pr(>F)    
## Slovenia1$maritalb1    3   9220  3073.2   13.06 2.15e-08 ***
## Residuals           1180 277629   235.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness

F(3, 1180) = 13.062 and p-value of <0.001 indicate that the difference in weight across marital status is statistically significant.

Normality of residuals

plot(aov.out, 2) 

Residual are normal.

Post hoc test

Next, we want to find out which of our variables create the significance. As our variances are equal we can run a Tukey test to find out.

par(mar = c(5, 10, 3, 1))
Tukey <- TukeyHSD(aov.out)
plot(Tukey, las = 2)  

As we can see from the graph, the pairwise tests that have significantly different means at a 95% confidence level are ‘Married’ and ‘Never married’ and ‘Widowed’ and ‘Married’. This fits the sociological theory that marrying and being widowed cause a change in weight.

Double-checking with non-parametric test

Non-parametric test for ANOVA is the Kruskal-Wallis test.
H0: Mean ranks of people from different marital satuses are the same.

Slovenia1$maritalb2 <- as.factor(Slovenia1$maritalb1)
kruskal.test(Slovenia1$weight ~ Slovenia1$maritalb2)  
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Slovenia1$weight by Slovenia1$maritalb2
## Kruskal-Wallis chi-squared = 38.916, df = 3, p-value = 1.808e-08

With KW chi-square(3) = 38.916 and p-value < 0.001, we can state that the mean ranks of different marital statuses are not the same. This results confirm our results in the ANOVA test.

As Kruskal-Wallis test is significant, we run non-parametric post hoc test - Dunn’s test.

DunnTest(weight ~ maritalb2, data = Slovenia1) 
## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                        mean.rank.diff    pval    
## Married-Divorced             2.154196  0.9634    
## Never married-Divorced    -130.326471  0.0331 *  
## Widowed-Divorced           -84.496259  0.3902    
## Never married-Married     -132.480667 8.7e-09 ***
## Widowed-Married            -86.650455  0.0635 .  
## Widowed-Never married       45.830212  0.4388    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Applying Dunn’s non-paramtric test, we can say that there is statistically significant difference between ‘Never Married - Divorced’, ‘Never married - Married’, and ‘Widowed-Married’, which actually proves the results of Kruskal-Wallis test.