Team: Zdravstvuyte!

Team members: Batozhargalova Dari, Istomina Anastasia, Kyrbasova Galina (BSC183).

Topic: Perception of ideal timing of life in the United Kingdom.

Data: European Social Survey, Round 9 (2018), country - United Kingdom, theme - Timing of life.

Project 1. Describing the data.

In this project we analyze the variables related to our topic: we describe their scales of measurement, visualize them in graphs and look for correlations. We are interested in how people in the United Kingdom perceive the ideal timing of life and what this perception depends on. For this purpose, we chose 17 variables from the original dataset, calculated central tendency measures for the numeric ones and created graphs to understand their distributions and draw logical conclusions.

Members of the team contributed most to: Identification of variable types, descriptives’ table, 1 histogram, scatterplots, stacked barplots - Istomina Anastasia. Identification of variable types, barplots, scatterplots - Kyrbasova Galina. Identification of variable types, boxplots, report and overall design - Batozhargalova Dari.

Let’s get started!

Describing variables.

First of all, we load the libraries we need for analysis and the ESS9 data, and choose variables relevant for our topic.

library(foreign)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(stringr)

ESS9 <- read.spss("C:/Users/Admin/Documents/ESS9GB.sav", use.value.labels = TRUE, to.data.frame = TRUE)

dataset <- ESS9 %>% 
  select(agea, gndr, evmar, bthcld, iagmr, iagpnt, iagrtr, iaglptn, ageadlt, anvcld, alvgptn, rlgdngb, prtclcgb, yrbrn, maryr, lvptnyr, fcldbrn)

Let’s have a look at the variables we have chosen!

Number <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17")
Label <- c("agea", "gndr", "yrbrn", "prtclcgb", "rlgdngb", "evmar", "maryr", "bthcld", "fcldbrn", "lvptnyr", "iaglptn", "iagmr", "iagpnt", "iagrtr", "ageadlt", "anvcld", "alvgptn")
Meaning <- c("Age of respondent", "Gender", "Year of birth", "Political affiliation", "Religion", "Are or ever been married", "Year first married", "Ever given birth to/fathered a child", "Year first child was born", "Year first lived with a partner for 3 months or more", "Ideal age to start living with partner not married to", "Ideal age for getting married", "Ideal age for becoming parents", "Ideal age for retiring", "Age to become adults", "Approval of never having children", "Approval of living with partner not married to")
Level_Of_Measurement <- c("Ratio", "Nominal", "Interval", "Nominal", "Nominal", "Nominal", "Interval", "Nominal", "Interval", "Interval", "Ratio", "Ratio", "Ratio", "Ratio", "Ratio", "Ordinal", "Ordinal")
var_table <- data.frame(Number, Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(var_table) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Number  Label     Meaning                                                Level_Of_Measurement
1       agea      Age of respondent                                      Ratio
2       gndr      Gender                                                 Nominal
3       yrbrn     Year of birth                                          Interval
4       prtclcgb  Political affiliation                                  Nominal
5       rlgdngb   Religion                                               Nominal
6       evmar     Are or ever been married                               Nominal
7       maryr     Year first married                                     Interval
8       bthcld    Ever given birth to/fathered a child                   Nominal
9       fcldbrn   Year first child was born                              Interval
10      lvptnyr   Year first lived with a partner for 3 months or more   Interval
11      iaglptn   Ideal age to start living with partner not married to  Ratio
12      iagmr     Ideal age for getting married                          Ratio
13      iagpnt    Ideal age for becoming parents                         Ratio
14      iagrtr    Ideal age for retiring                                 Ratio
15      ageadlt   Age to become adults                                   Ratio
16      anvcld    Approval of never having children                      Ordinal
17      alvgptn   Approval of living with partner not married to         Ordinal

As can be seen from the table above, we have 17 variables in total, with different scales of measurement: ratio, interval, ordinal and nominal. Next, let’s calculate central tendency measures for them.

Calculating central tendency measures.

We compute means, medians and modes only for the ratio variables, since a mean is not meaningful for nominal or ordinal ones. Since all of them are stored as factors, we convert the ratio variables to character and then to numeric.

dataset$agea <- as.numeric(as.character(dataset$agea))
dataset$iagmr <- as.numeric(as.character(dataset$iagmr))
dataset$iagpnt <- as.numeric(as.character(dataset$iagpnt))
dataset$iagrtr <- as.numeric(as.character(dataset$iagrtr))
dataset$iaglptn <- as.numeric(as.character(dataset$iaglptn))
dataset$ageadlt <- as.numeric(as.character(dataset$ageadlt))
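
The two-step conversion matters because calling as.numeric() directly on a factor returns its internal level codes rather than the labels. A minimal sketch on made-up values (the vector `ages` is hypothetical and not taken from the ESS data):

```r
# Hypothetical factor of ages, similar in form to a labelled SPSS column
ages <- factor(c("25", "18", "30", "18"))

# Direct conversion gives the internal level codes, not the ages
codes <- as.numeric(ages)

# Going through character first recovers the actual values
values <- as.numeric(as.character(ages))

print(codes)   # 2 1 3 1
print(values)  # 25 18 30 18
```

Any label that is not a number (for example, a refusal category) would become NA with a warning in this conversion, so it is worth checking the number of NA values afterwards.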

To get mode values for the variables, we write our own function, because base R has no function for the statistical mode. Functions for the mean and the median exist by default.

datatable <- dataset %>% 
  select(agea, iagmr, iagpnt, iagrtr, iaglptn, ageadlt)
datatable <- na.omit(datatable)

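# Statistical mode: the most frequent value of x.
# Note: this masks base R's mode(), which returns the storage mode of an object.
mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # most frequent one; ties go to the value seen first
}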

var.agea <- c(round(mean(datatable$agea), 2), mode(datatable$agea), median(datatable$agea))
names(var.agea) <- c("mean", "mode", "median")
var.iagmr <- c(round(mean(datatable$iagmr), 2), mode(datatable$iagmr), median(datatable$iagmr))
var.iagpnt <- c(round(mean(datatable$iagpnt), 2), mode(datatable$iagpnt), median(datatable$iagpnt))
var.iagrtr <- c(round(mean(datatable$iagrtr), 2), mode(datatable$iagrtr), median(datatable$iagrtr))
var.iaglptn <- c(round(mean(datatable$iaglptn), 2), mode(datatable$iaglptn), median(datatable$iaglptn))
var.ageadlt <- c(round(mean(datatable$ageadlt), 2), mode(datatable$ageadlt), median(datatable$ageadlt))

table <- data.frame(var.agea, var.iagmr, var.iagpnt, var.iagrtr, var.iaglptn, var.ageadlt, stringsAsFactors = FALSE)
kable(table) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

        var.agea  var.iagmr  var.iagpnt  var.iagrtr  var.iaglptn  var.ageadlt
mean       51.64      24.97       26.32       62.56        21.33        19.21
mode       56.00      25.00       25.00       60.00        18.00        18.00
median     52.00      25.00       26.00       63.00        21.00        18.00
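
One detail of the mode helper defined above is how it handles ties: which.max() picks the first maximum, so the tied value that appears first in the data wins. A small sketch on made-up vectors, using the hypothetical name `stat_mode` so that base R's mode() is not masked:

```r
# Same logic as the mode helper in this report, under a hypothetical name
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

clear_winner <- stat_mode(c(18, 25, 18, 30, 18))  # 18 occurs three times
tie <- stat_mode(c(25, 18, 18, 25))               # 25 and 18 both occur twice

print(clear_winner)  # 18
print(tie)           # 25, the tied value that appears first
```

For the variables in the table this only matters if two ages are exactly equally frequent, which we have not checked here.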

Here we see some statistics on our variables: for example, the average age of respondents is a little above 50. To explore the data more deeply, check the graphs we constructed!

Creating graphs.

Histograms

  1. As we said before, on average, our respondents are about 52 years old. But what about the whole distribution of age?
ggplot(datatable, aes(agea)) + geom_histogram(binwidth = 5, color = "black", fill = "#FFD39B", alpha = 0.5) + geom_vline(aes(xintercept = median(datatable$agea), color = 'median'), linetype="solid", size=1) +
  geom_vline(aes(xintercept = mean(datatable$agea), color = 'mean'), linetype="solid", size=1) + geom_vline(aes(xintercept = mode(datatable$agea), color = 'mode'), linetype="solid", size=1) +
  theme_bw() +  scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666")) + labs(title = "Distribution of age", x = "Age of respondent", y = "Number of people") 

The age of respondents is roughly normally distributed but slightly left-skewed, so our data contain more people aged over 50 than under 50.

  2. To explore what the concept of ideal timing of life means for people in the UK, let’s find out at what age, in their opinion, people become adults and are considered as such.
ggplot(datatable, aes(ageadlt)) + geom_histogram(fill = "plum", color = "black", alpha = 0.5) + xlim(c(10,30)) + geom_vline(aes(xintercept = median(datatable$ageadlt), color = 'median'), linetype="solid", size=1) + geom_vline(aes(xintercept = mean(datatable$ageadlt), color = 'mean'), linetype="solid", size=1) + geom_vline(aes(xintercept = mode(datatable$ageadlt), color = 'mode'), linetype="solid", size=1) + theme_bw() + scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666")) + labs(title = "At what age do people become adults?", x = "Age", y = "Number of people")

Conclusion №1: the distribution of the age of becoming an adult in the UK is right-skewed, which means that respondents tend to think people grow up before their 20s - they report “18” far more frequently than any other age.
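
The direction of skew read off the two histograms can be cross-checked against the descriptives table with a common rule of thumb: a mean above the median hints at a right skew, a mean below it at a left skew. This is only a sketch of that heuristic, not a formal skewness test; the numbers are copied from the table in this report, and the data frame `stats` is a name introduced here for illustration.

```r
# Means and medians copied from the descriptives table in this report
stats <- data.frame(
  variable = c("agea", "ageadlt"),
  mean     = c(51.64, 19.21),
  median   = c(52, 18)
)

# Rule of thumb only: compare the mean with the median
stats$skew_hint <- ifelse(stats$mean > stats$median, "right-skewed",
                   ifelse(stats$mean < stats$median, "left-skewed",
                          "roughly symmetric"))

print(stats)
```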

Boxplots

Next, we are interested in what age people perceive as ideal for such life events as first living with a partner for a long period of time, getting married, having a first child and retiring from work. We also wonder whether this opinion differs across age categories: teenagers (under 19), young adults (19-34), middle-aged (35-54), the elderly (55-64) and aged people (65 and above). We think boxplots are the best visualization for this, so let’s have a look at them.

  1. What do different age groups of people in the UK think about the ideal age to start living with a partner one is not married to?
datatable$agecat <- ifelse(datatable$agea < 19, 
                         "under 19",
                         ifelse(datatable$agea >= 19 & datatable$agea < 35, 
                                "19-34",
                                ifelse(datatable$agea >= 35 & datatable$agea < 55,
                                       "35-54",
                                       ifelse(datatable$agea >= 55 & datatable$agea < 65,
                                              "55-64",
                                              "65 and above"))))
datatable$agecat <- factor(datatable$agecat, levels = c("under 19", "19-34", "35-54", "55-64", "65 and above"))

ggplot(datatable, aes(x = agecat, y = iaglptn)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people live with partner for 3 months or more for the first time?", x = "Age categories of respondents", y = "Age to live with a partner") + theme_bw()
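
The same age grouping can also be expressed with base R's cut(), which avoids the nested ifelse() calls and returns an ordered set of factor levels in one step. A minimal sketch on made-up ages chosen to sit on the group boundaries (the vector `ages` is hypothetical):

```r
# Made-up ages chosen to sit on the group boundaries
ages <- c(17, 18, 19, 34, 35, 54, 55, 64, 65, 80)

# right = FALSE makes each interval closed on the left: [19, 35), [35, 55), ...
agecat <- cut(
  ages,
  breaks = c(-Inf, 19, 35, 55, 65, Inf),
  labels = c("under 19", "19-34", "35-54", "55-64", "65 and above"),
  right  = FALSE
)

print(data.frame(ages, agecat))
```

With right = FALSE, the boundary ages 19, 35, 55 and 65 fall into the upper group, matching the >= comparisons used in the report.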

Conclusion №2: While teenagers basically agree that the ideal age should be around 20, the opinions of young adults are spread more towards older ages and reach up to 23. The middle-aged and elderly groups are noticeably more “conservative”: in general, they tend to name higher ages, up to 25, with a maximum at 33. This suggests that the perceived ideal age for living with a partner rises as people get older. Surprisingly, though, for those aged 65 and above the interquartile range narrows again to 19-23, while the median does not change.

  2. What do different age groups of people in the UK think about the ideal age for getting married?
ggplot(datatable, aes(x = agecat, y = iagmr)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people get married?", x = "Age categories of respondents", y = "Age to get married") + theme_bw()

Conclusion №3: The ideal age for getting married does not change dramatically across age groups - the median for all of them remains 25 years. However, the interquartile range reaches highest in the 35-54 group, up to 30, and is lower in the groups on both sides of it on the horizontal axis. Thus, teenagers are the most “liberal” group, as they believe people can already get married at 20; those aged 65 and above come second; and young adults and the elderly do not differ in their opinion: the ideal age range for them is from 23 to 27.

  3. What do different age groups of people in the UK think about the ideal age for having a first child?
ggplot(datatable, aes(x = agecat, y = iagpnt)) + geom_boxplot(fill = "mistyrose1", color = "black") + 
  labs(title = "At what age should people give birth/father a child?", x = "Age categories of respondents", y = "Age to have a first child") + theme_bw()

Conclusion №4: In general, the ideal age for having a first child starts at 23 for people in the UK; the median rises to 28 by the middle-aged group and falls back to 25 among aged people.

  4. What do different age groups of people in the UK think about the ideal age for retiring from work?
ggplot(datatable, aes(x = agecat, y = iagrtr)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people retire from their work?", x = "Age categories of respondents", y = "Age to get retired") + theme_bw()

Conclusion №5: The interquartile range is the same across all groups except teenagers: they think people should retire at 58 and older, while the others believe the better age would be 60 and older. Nevertheless, the median values vary from 60 for the middle-aged and the elderly to 63 for young adults and aged people.

Barplots

Now we want to know whether, for example, the ideal age for getting married or having a first child differs by the political affiliation of British respondents.

  1. What do people of different political views think on average about ideal age to get married?
  2. What do people of different political views think on average about ideal age to become a mother/father?
politics <- dataset %>% 
  select(prtclcgb, iagmr, iagpnt) 
politics$iagmr <- as.numeric(as.character(politics$iagmr))
politics$iagpnt <- as.numeric(as.character(politics$iagpnt))
politics$prtclcgb <- as.character(politics$prtclcgb)
politics <- na.omit(politics)
politics$prtclcgb <- str_replace_all(politics$prtclcgb, "nir", "NI")
politics$prtclcgb <- str_replace_all(politics$prtclcgb, "Sinn FГ©in", "Sinn Fein")

politics1 <- politics %>% 
  group_by(prtclcgb) %>%
  summarize(iagmr_mean = mean(iagmr))

ggplot(politics1, aes(x = reorder(prtclcgb, -iagmr_mean), y = iagmr_mean)) + geom_col(fill = "darkseagreen2", color = "black") + coord_flip() + labs(title = "1. Average ideal age to get married \nby respondents' political affiliation", x = "Political affiliation", y = "Average ideal age to get married") + theme_bw()

politics2 <- politics %>%
  group_by(prtclcgb) %>%
  summarize(iagpnt_mean = mean(iagpnt))

ggplot(politics2, aes(x = reorder(prtclcgb, -iagpnt_mean), y = iagpnt_mean)) + geom_col(fill = "lightskyblue", color = "black") + coord_flip() + labs(title = "2. Average ideal age to become mother/father \nby respondents' political affiliation", x = "Political affiliation", y = "Average ideal age to become mother/father") + theme_bw()

(NI) indicates political parties of Northern Ireland.

Conclusion №6: Speaking about marriage, supporters of the progressive Green Party of Northern Ireland are the most “liberal” in this sense, since they name the lowest ideal age, around 18 years. In contrast, adherents of Plaid Cymru, a nationalist party of Wales, believe that people should marry at 30-31. For having a first child, the picture hardly changes: although some reshuffling can be noticed (e.g. supporters of the Irish party Sinn Fein consider 26-27 the ideal age for marriage but 24 for giving birth to/fathering a child), the positions of Green Party NI and Plaid Cymru remain the same.
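
When comparing group means like these, it can help to look at how many respondents stand behind each mean, since a party with only a few supporters in the sample can produce an extreme average. The report does not state the group sizes, so the sketch below uses made-up data; the data frame `toy` and its party names are purely illustrative.

```r
library(dplyr)

# Made-up data: party names and ages are illustrative only
toy <- data.frame(
  party = c("A", "A", "A", "A", "B"),
  iagmr = c(24, 26, 25, 27, 18)
)

# Report the group size next to each mean, smallest groups first
party_summary <- toy %>%
  group_by(party) %>%
  summarise(n = n(), iagmr_mean = mean(iagmr)) %>%
  arrange(n)

print(party_summary)
```

In this toy example the lowest mean comes from a single respondent, which is the kind of case where a bar in the plot should be read with caution.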

Stacked barplots

Next, let’s see whether approval of not having children and of living with a partner one is not married to differs across religious groups.

  1. Do people of different religions in the UK approve of never having children?
  2. Do people of different religions in the UK approve of living with a partner one is not married to?
religion <- dataset %>% 
  filter(!is.na(rlgdngb), !is.na(anvcld), !is.na(alvgptn)) 

ggplot(religion, aes(x = anvcld, fill = rlgdngb)) + geom_bar(position = "fill") + coord_flip() + labs(title = "Do you approve of people \nnot having children?", x = "Approval of never having children", y = "Share of respondents") + scale_fill_discrete(name = "Religion") + theme_bw() 

ggplot(religion, aes(x = alvgptn, fill = rlgdngb)) + geom_bar(position = "fill") + coord_flip() + labs(title = "Do you approve of people \nliving with a partner \nthey are not married to?", x = "Approval of living with partner not married to", y = "Share of respondents") + scale_fill_discrete(name = "Religion") + theme_bw()

Conclusion №7: As can be seen from the two stacked barplots above, members of the Church of England tend to be the most “liberal” in approving of people not having children or living with a partner they are not married to. The most “conservative” religious group seems to be Muslims, as they tend to strongly disapprove.
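
One point to keep in mind when reading these plots: with position = "fill", each bar is normalized within its x category, so with the answer on the x axis a bar shows the religious composition of that answer, and a large group can dominate every bar. A minimal sketch of the two possible normalizations, on made-up counts (the matrix `counts` and its group names are hypothetical):

```r
# Made-up counts: rows are religious groups, columns are answers
counts <- matrix(
  c(60, 30, 10,
     5,  3,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(religion = c("Group A", "Group B"),
                  answer   = c("Approve", "Neither", "Disapprove"))
)

# Share of each religious group within an answer
# (the view given by x = answer, fill = religion)
within_answer <- prop.table(counts, margin = 2)

# Share of each answer within a religious group
within_religion <- prop.table(counts, margin = 1)

print(round(within_answer, 2))
print(round(within_religion, 2))
```

In the toy numbers, Group A makes up most of every answer simply because it is larger, while the within-group shares show how each group actually splits its answers. Swapping the x and fill aesthetics in the plots would give the second view.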

Scatter plots

Now we want to explore gender differences in the perceived ideal age and the real age at which respondents started living with a partner, got married and gave birth to/fathered a child. Let’s have a look at our fancy scatter plots! :)

scatter <- dataset %>%
  select(lvptnyr, yrbrn, gndr, iaglptn, maryr, iagmr, fcldbrn, iagpnt)
scatter$lvptnyr <- as.numeric(as.character(scatter$lvptnyr))
scatter$yrbrn <- as.numeric(as.character(scatter$yrbrn))
scatter$iaglptn <- as.numeric(as.character(scatter$iaglptn))
scatter$maryr <- as.numeric(as.character(scatter$maryr))
scatter$iagmr <- as.numeric(as.character(scatter$iagmr))
scatter$fcldbrn <- as.numeric(as.character(scatter$fcldbrn))
scatter$iagpnt <- as.numeric(as.character(scatter$iagpnt))
scatter <- na.omit(scatter)

scatter1 <- scatter %>%
  mutate(real_age = lvptnyr - yrbrn) 
scatter1$gndr <- relevel(scatter1$gndr, ref = "Female") 

 ggplot(scatter1, aes(x = real_age, y = iaglptn, color = gndr)) + geom_point() + geom_smooth(method = "lm") + theme_bw() + labs(title = "1. Comparison of ideal and real age to start living with a partner", x = "Real age of respondents when they started living with a partner", y = "Ideal age to start living with a partner")

scatter2 <- scatter %>%
  mutate(real_age = maryr - yrbrn)
scatter2$gndr <- relevel(scatter2$gndr, ref = "Female") 

ggplot(scatter2, aes(x = real_age, y = iagmr, color = gndr)) + geom_point() + geom_smooth(method = "lm") + labs(title = "2. Comparison of ideal and real age to get married", x = "Real age of respondents when they married", y = "Ideal age to get married") + theme_bw()

scatter3 <- scatter %>%
  mutate(real_age = fcldbrn - yrbrn)
scatter3$gndr <- relevel(scatter3$gndr, ref = "Female")

ggplot(scatter3, aes(x = real_age, y = iagpnt, color = gndr)) + geom_point() + geom_smooth(method = "lm") + labs(title = "3. Comparison of ideal and real age to become mother/father", x = "Real age of respondents when they became mother/father", y = "Ideal age to become mother/father") + theme_bw()

Conclusion №8: Looking at the graphs, we see a positive dependence of the ideal age on the real age of the event for both males and females. However, all the scatter plots show males placing the ideal age somewhat later than females - their trend line is higher and steeper - while females enter these life events later than males, as their points are located more to the right.
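
The comparison of trend lines can also be made numerically. Because gender is mapped to colour, geom_smooth(method = "lm") fits one line per gender, and the same slopes can be obtained with lm(). The sketch below uses made-up respondents (the data frame `toy` is hypothetical) and repeats the report's way of deriving the real age as event year minus birth year, which can be off by up to a year depending on birthdays.

```r
# Made-up respondents: years and ideal ages are illustrative only
toy <- data.frame(
  gndr  = rep(c("Female", "Male"), each = 4),
  yrbrn = rep(c(1960, 1970, 1980, 1990), times = 2),
  maryr = rep(c(1980, 1992, 2004, 2016), times = 2),
  iagmr = c(20, 21, 22, 23, 21, 22.5, 24, 25.5)
)

# Real age at the event, approximated as event year minus birth year
toy$real_age <- toy$maryr - toy$yrbrn

# One linear fit per gender, mirroring the separate geom_smooth() lines
slopes <- sapply(split(toy, toy$gndr), function(d) {
  unname(coef(lm(iagmr ~ real_age, data = d))["real_age"])
})

print(round(slopes, 2))
```

A steeper slope means the ideal age rises faster with the real age; whether the difference between the two slopes is meaningful would need a proper test, which is beyond this descriptive project.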