Team: Zdravstvuyte!
Team members: Batozhargalova Dari, Istomina Anastasia, Kyrbasova Galina (BSC183).
Topic: Perception of ideal timing of life in the United Kingdom.
Data: European Social Survey, Round 9 (2018), country - United Kingdom, theme - Timing of life.
In this project we analyze variables related to our topic: we describe their scales of measurement, visualize them in graphs and look for correlations. We are interested in how people in the United Kingdom perceive the ideal timing of life events and what this perception depends on. For this purpose, we chose 17 variables from the original dataset, calculated measures of central tendency for the numeric ones and created graphs to understand their distributions and draw logical conclusions.
Members of the team contributed most to: Identification of variable types, descriptives’ table, 1 histogram, scatterplots, stacked barplots - Istomina Anastasia. Identification of variable types, barplots, scatterplots - Kyrbasova Galina. Identification of variable types, boxplots, report and overall design - Batozhargalova Dari.
Let’s get started!
First of all, we load the libraries we need for the analysis and the ESS9 data, and choose the variables relevant to our topic.
library(foreign)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(stringr)
ESS9 <- read.spss("C:/Users/Admin/Documents/ESS9GB.sav", use.value.labels = TRUE, to.data.frame = TRUE)
dataset <- ESS9 %>%
select(agea, gndr, evmar, bthcld, iagmr, iagpnt, iagrtr, iaglptn, ageadlt, anvcld, alvgptn, rlgdngb, prtclcgb, yrbrn, maryr, lvptnyr, fcldbrn)
Let’s have a look at the variables we have chosen!
Number <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17")
Label <- c("agea", "gndr", "yrbrn", "prtclcgb", "rlgdngb", "evmar", "maryr", "bthcld", "fcldbrn", "lvptnyr", "iaglptn", "iagmr", "iagpnt", "iagrtr", "ageadlt", "anvcld", "alvgptn")
Meaning <- c("Age of respondent", "Gender", "Year of birth", "Political affiliation", "Religion", "Is or has ever been married", "Year first married", "Ever given birth to/fathered a child", "Year first child was born", "Year first lived with a partner for 3 months or more", "Ideal age to start living with partner not married to", "Ideal age for getting married", "Ideal age for becoming parents", "Ideal age for retiring", "Age to become adults", "Approval of never having children", "Approval of living with partner not married to")
Level_Of_Measurement <- c("Ratio", "Nominal", "Interval", "Nominal", "Nominal", "Nominal", "Interval", "Nominal", "Interval", "Interval", "Ratio", "Ratio", "Ratio", "Ratio", "Ratio", "Ordinal", "Ordinal")
var_table <- data.frame(Number, Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)
kable(var_table) %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Number | Label | Meaning | Level_Of_Measurement |
|---|---|---|---|
| 1 | agea | Age of respondent | Ratio |
| 2 | gndr | Gender | Nominal |
| 3 | yrbrn | Year of birth | Interval |
| 4 | prtclcgb | Political affiliation | Nominal |
| 5 | rlgdngb | Religion | Nominal |
| 6 | evmar | Is or has ever been married | Nominal |
| 7 | maryr | Year first married | Interval |
| 8 | bthcld | Ever given birth to/fathered a child | Nominal |
| 9 | fcldbrn | Year first child was born | Interval |
| 10 | lvptnyr | Year first lived with a partner for 3 months or more | Interval |
| 11 | iaglptn | Ideal age to start living with partner not married to | Ratio |
| 12 | iagmr | Ideal age for getting married | Ratio |
| 13 | iagpnt | Ideal age for becoming parents | Ratio |
| 14 | iagrtr | Ideal age for retiring | Ratio |
| 15 | ageadlt | Age to become adults | Ratio |
| 16 | anvcld | Approval of never having children | Ordinal |
| 17 | alvgptn | Approval of living with partner not married to | Ordinal |
As can be seen from the table above, we have 17 variables in total with different scales of measurement: ratio, interval, ordinal and nominal. Next, let’s calculate measures of central tendency for them.
We compute means, medians and modes only for the ratio variables, since these measures are not all meaningful for the other scales. Since all of the variables were read in as factors, we first convert the ratio ones to character and then to numeric.
dataset$agea <- as.numeric(as.character(dataset$agea))
dataset$iagmr <- as.numeric(as.character(dataset$iagmr))
dataset$iagpnt <- as.numeric(as.character(dataset$iagpnt))
dataset$iagrtr <- as.numeric(as.character(dataset$iagrtr))
dataset$iaglptn <- as.numeric(as.character(dataset$iaglptn))
dataset$ageadlt <- as.numeric(as.character(dataset$ageadlt))
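The detour through `as.character()` matters because calling `as.numeric()` directly on a factor returns the internal level codes, not the labelled values. A minimal sketch with a made-up factor (the values are illustrative and not taken from ESS9):

```r
# Hypothetical factor resembling an age variable that was read in with value labels
ages <- factor(c("25", "30", "18", "30"))

# Direct conversion returns the level codes (position of each value among the sorted levels)
codes <- as.numeric(ages)

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(ages))

print(codes)
print(values)
```

Labels that are not numbers at all would become `NA` (with a warning) in the second conversion, which is why `na.omit()` is useful afterwards.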
To get the mode of each variable, we write our own function, because base R's mode() reports the storage type of an object rather than its most frequent value; mean() and median() are available by default.
datatable <- dataset %>%
select(agea, iagmr, iagpnt, iagrtr, iaglptn, ageadlt)
datatable <- na.omit(datatable)
mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
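To see how the function above behaves, here is a small sketch on made-up vectors. The name `stat_mode` is hypothetical and is used only so that this example does not mask base R's `mode()`; the logic is the same.

```r
# Same logic as the function above, under a hypothetical name
stat_mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # the value with the highest count
}

# Made-up answers: 18 occurs three times
single <- stat_mode(c(18, 21, 18, 25, 18))

# With a tie, which.max() picks the first maximum, so the value that appears first wins
tied <- stat_mode(c(25, 18, 18, 25))

print(single)
print(tied)
```

The tie rule is worth keeping in mind when a variable has two equally popular answers: only one of them is reported.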
var.agea <- c(round(mean(datatable$agea), 2), mode(datatable$agea), median(datatable$agea))
names(var.agea) <- c("mean", "mode", "median")
var.iagmr <- c(round(mean(datatable$iagmr), 2), mode(datatable$iagmr), median(datatable$iagmr))
var.iagpnt <- c(round(mean(datatable$iagpnt), 2), mode(datatable$iagpnt), median(datatable$iagpnt))
var.iagrtr <- c(round(mean(datatable$iagrtr), 2), mode(datatable$iagrtr), median(datatable$iagrtr))
var.iaglptn <- c(round(mean(datatable$iaglptn), 2), mode(datatable$iaglptn), median(datatable$iaglptn))
var.ageadlt <- c(round(mean(datatable$ageadlt), 2), mode(datatable$ageadlt), median(datatable$ageadlt))
table <- data.frame(var.agea, var.iagmr, var.iagpnt, var.iagrtr, var.iaglptn, var.ageadlt, stringsAsFactors = FALSE)
kable(table) %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| | var.agea | var.iagmr | var.iagpnt | var.iagrtr | var.iaglptn | var.ageadlt |
|---|---|---|---|---|---|---|
| mean | 51.64 | 24.97 | 26.32 | 62.56 | 21.33 | 19.21 |
| mode | 56.00 | 25.00 | 25.00 | 60.00 | 18.00 | 18.00 |
| median | 52.00 | 25.00 | 26.00 | 63.00 | 21.00 | 18.00 |
Here we see some statistics on our variables: for example, the average age of respondents was a little above 50 years. To explore the data more deeply, check the graphs we constructed!
ggplot(datatable, aes(agea)) + geom_histogram(binwidth = 5, color = "black", fill = "#FFD39B", alpha = 0.5) + geom_vline(aes(xintercept = median(datatable$agea), color = 'median'), linetype="solid", size=1) +
geom_vline(aes(xintercept = mean(datatable$agea), color = 'mean'), linetype="solid", size=1) + geom_vline(aes(xintercept = mode(datatable$agea), color = 'mode'), linetype="solid", size=1) +
theme_bw() + scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666")) + labs(title = "Distribution of age", x = "Age of respondent", y = "Number of people")
The age of respondents is close to normally distributed but left-skewed, so our data contain more people aged over 50 than under 50.
ggplot(datatable, aes(ageadlt)) + geom_histogram(fill = "plum", color = "black", alpha = 0.5) + xlim(c(10,30)) + geom_vline(aes(xintercept = median(datatable$ageadlt), color = 'median'), linetype="solid", size=1) + geom_vline(aes(xintercept = mean(datatable$ageadlt), color = 'mean'), linetype="solid", size=1) + geom_vline(aes(xintercept = mode(datatable$ageadlt), color = 'mode'), linetype="solid", size=1) + theme_bw() + scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666")) + labs(title = "At what age do people become adults?", x = "Age", y = "Number of people")
Conclusion №1: the distribution of the age of becoming an adult in the UK is right-skewed: most answers are concentrated at the lower end, below 20, and “18” is by far the most frequent answer.
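The direction of skew read off a histogram can also be checked numerically. The sketch below is one possible way: it assumes a simple moment-based skewness formula (the helper `skewness` is defined here and is not part of the original analysis) and uses made-up answers, not ESS9 data.

```r
# Moment-based sample skewness: positive = longer right tail, negative = longer left tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Made-up answers clustered at 18 with a few high values
adult_age <- c(18, 18, 18, 18, 19, 20, 21, 25, 30)

skew_adult <- skewness(adult_age)

# A quick rule of thumb: in a right-skewed sample the mean usually exceeds the median
mean_above_median <- mean(adult_age) > median(adult_age)

print(skew_adult)
print(mean_above_median)
```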
Next, we are interested in what age people perceive as ideal for life events such as first living with a partner for a long period of time, getting married, having a first child and retiring from work. We also wonder whether this opinion differs across age categories: teenagers (under 19), young adults (19-34), the middle-aged (35-54), the elderly (55-64) and aged people (65 and above). We think boxplots are the best visualization for this, so let’s have a look at them.
datatable$agecat <- ifelse(datatable$agea < 19,
"under 19",
ifelse(datatable$agea >= 19 & datatable$agea < 35,
"19-34",
ifelse(datatable$agea >= 35 & datatable$agea < 55,
"35-54",
ifelse(datatable$agea >= 55 & datatable$agea < 65,
"55-64",
"65 and above"))))
datatable$agecat <- factor(datatable$agecat, levels = c("under 19", "19-34", "35-54", "55-64", "65 and above"))
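The same categories could also be built in one step with `cut()`, which returns a factor with the levels already in order. This is an alternative sketch on made-up ages, not the code used for the plots in this report:

```r
# Made-up ages, including the boundary values of each category
age <- c(17, 18, 19, 34, 35, 54, 55, 64, 65, 80)

# right = FALSE makes every interval closed on the left: [19, 35), [35, 55), ...
agecat <- cut(age,
              breaks = c(-Inf, 19, 35, 55, 65, Inf),
              labels = c("under 19", "19-34", "35-54", "55-64", "65 and above"),
              right = FALSE)

print(table(agecat))
```

Because the breaks sit in a single vector, changing a boundary means editing one number instead of two nested conditions.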
ggplot(datatable, aes(x = agecat, y = iaglptn)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people live with partner for 3 months or more for the first time?", x = "Age categories of respondents", y = "Age to live with a partner") + theme_bw()
Conclusion №2: While teenagers largely agree that the ideal age is around 20, the opinions of young adults are spread more towards older ages, up to 23. The middle-aged and elderly groups are noticeably more “conservative”: in general, they tend to name higher ages, up to 25, with a maximum at 33. This suggests that the perceived ideal age for living with a partner rises as people get older. Surprisingly, though, among those aged 65 and above the interquartile range narrows again to 19-23, while the median value does not change.
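The medians and interquartile ranges discussed for the boxplots can also be computed directly instead of being read off the picture. A minimal sketch with made-up data (the data frame `toy` and its values are hypothetical, not ESS9 results):

```r
# Made-up data: ideal ages reported by two hypothetical age groups
toy <- data.frame(
  agecat = rep(c("19-34", "35-54"), each = 5),
  ideal  = c(20, 21, 22, 23, 24, 20, 22, 25, 27, 30)
)

# Lower quartile, median and upper quartile per group
quartiles <- aggregate(ideal ~ agecat, data = toy,
                       FUN = quantile, probs = c(0.25, 0.5, 0.75))

print(quartiles)
```

The distance between the first and the third number in each row is the interquartile range, i.e. the height of the box.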
ggplot(datatable, aes(x = agecat, y = iagmr)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people get married?", x = "Age categories of respondents", y = "Age to get married") + theme_bw()
Conclusion №3: The ideal age for getting married does not change dramatically across age groups: the median remains 25 years for all of them. However, the interquartile range is widest in the 35-54 group, where it reaches up to 30, and narrows on both sides of the horizontal scale. Thus, teenagers are the most “liberal” group, as they believe people can already get married at 20; those aged 65 and above come second; and young adults and the elderly do not differ in their opinion: the ideal age range for them is from 23 to 27.
ggplot(datatable, aes(x = agecat, y = iagpnt)) + geom_boxplot(fill = "mistyrose1", color = "black") +
labs(title = "At what age should people give birth/father a child?", x = "Age categories of respondents", y = "Age to have a first child") + theme_bw()
Conclusion №4: In general, people in the UK place the ideal age for having a first child at 23 or older. The median value rises to 28 in the middle-aged group and falls to 25 among aged people.
ggplot(datatable, aes(x = agecat, y = iagrtr)) + geom_boxplot(fill = "mistyrose1", color = "black") + labs(title = "At what age should people retire from their work?", x = "Age categories of respondents", y = "Age to get retired") + theme_bw()
Conclusion №5: The interquartile range is the same across all groups except teenagers: they think people should retire at 58 or older, while the others consider 60 or older more suitable. The median values, however, vary from 60 for the middle-aged and the elderly to 63 for young adults and aged people.
Now we want to know whether, for example, the ideal age for getting married or having a first child differs by the political affiliation of British respondents.
politics <- dataset %>%
select(prtclcgb, iagmr, iagpnt)
politics$iagmr <- as.numeric(as.character(politics$iagmr))
politics$iagpnt <- as.numeric(as.character(politics$iagpnt))
politics$prtclcgb <- as.character(politics$prtclcgb)
politics <- na.omit(politics)
politics$prtclcgb <- str_replace_all(politics$prtclcgb, "nir", "NI")
politics$prtclcgb <- str_replace_all(politics$prtclcgb, "Sinn FГ©in", "Sinn Fein")
politics1 <- politics %>%
group_by(prtclcgb) %>%
summarize(iagmr_mean = mean(iagmr))
ggplot(politics1, aes(x = reorder(prtclcgb, -iagmr_mean), y = iagmr_mean)) + geom_col(fill = "darkseagreen2", color = "black") + coord_flip() + labs(title = "1. Average ideal age to get married \nby respondents' political affiliation", x = "Political affiliation", y = "Average ideal age to get married") + theme_bw()
politics2 <- politics %>%
group_by(prtclcgb) %>%
summarize(iagpnt_mean = mean(iagpnt))
ggplot(politics2, aes(x = reorder(prtclcgb, -iagpnt_mean), y = iagpnt_mean)) + geom_col(fill = "lightskyblue", color = "black") + coord_flip() + labs(title = "2. Average ideal age to become mother/father \nby respondents' political affiliation", x = "Political affiliation", y = "Average ideal age to become mother/father") + theme_bw()
Conclusion №6: As for marriage, supporters of the progressive Green Party of Northern Ireland are the most “liberal”, with the lowest average ideal age of around 18 years. By contrast, adherents of Plaid Cymru, a nationalist party of Wales, believe that people should marry at 30-31. For having a first child, the picture hardly changes: although some reshuffling can be noticed (e.g. supporters of the Irish party Sinn Fein consider 26-27 the ideal age for marriage but 24 for having a first child), the positions of Green Party NI and Plaid Cymru remain the same.
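When comparing group means like these, it can help to look at the number of respondents behind each bar, since a mean based on one or two people says little about a party's supporters. Whether any ESS9 group is that small is not stated in this report, so the sketch below only illustrates the check on made-up data (party labels "A", "B", "C" are hypothetical):

```r
# Made-up data: party labels and ideal marriage ages
toy <- data.frame(
  party = c("A", "A", "A", "A", "B", "C", "C"),
  iagmr = c(24, 25, 26, 25, 18, 30, 31)
)

# Mean and number of respondents per party
party_mean <- tapply(toy$iagmr, toy$party, mean)
party_n    <- table(toy$party)

summary_tab <- data.frame(party    = names(party_mean),
                          mean_age = as.numeric(party_mean),
                          n        = as.integer(party_n))

print(summary_tab)
```

In this toy table the lowest mean belongs to a group with a single respondent, which is exactly the situation the count column is meant to reveal.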
Next, let’s see whether approval of never having children and of living with an unmarried partner differs across religious groups.
religion <- dataset %>%
filter(!is.na(rlgdngb), !is.na(anvcld), !is.na(alvgptn))
ggplot(religion, aes(x = anvcld, fill = rlgdngb)) + geom_bar(position = "fill") + coord_flip() + labs(title = "Do you approve of people \nnot having children?", x = "Approval of never having children", y = "Share of population") + scale_fill_discrete(name = "Religion") + theme_bw()
ggplot(religion, aes(x = alvgptn, fill = rlgdngb)) + geom_bar(position = "fill") + coord_flip() + labs(title = "Do you approve of people \nliving with a partner \nthey are not married to?", x = "Approval of living with partner not married to", y = "Share of population") + scale_fill_discrete(name = "Religion") + theme_bw()
Conclusion №7: As can be seen from the two stacked barplots above, members of the Church of England tend to be the most “liberal” in approving of people who never have children or who live with a partner they are not married to. The most “conservative” religious group seems to be Muslims, as they tend to strongly disapprove.
Now we want to explore gender differences between the ideal age and the real age at which respondents started living with a partner, got married and had their first child. Let’s have a look at our fancy scatter plots!:)
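In the barplots above each bar is an answer category, so the coloured segments show the religious composition of the people who gave that answer. A complementary view is the share of each answer within a religious group, which `prop.table()` gives directly. A minimal sketch on made-up answers (groups "X" and "Y" are hypothetical, not ESS9 categories):

```r
# Made-up answers: six respondents of religion X, two of religion Y
religion <- c("X", "X", "X", "X", "X", "X", "Y", "Y")
approval <- c("Approve", "Approve", "Approve",
              "Disapprove", "Disapprove", "Disapprove",
              "Approve", "Approve")

counts <- table(religion, approval)

# margin = 1: shares within each religion (every row sums to 1)
within_religion <- prop.table(counts, margin = 1)

# margin = 2: shares within each answer (every column sums to 1),
# which corresponds to position = "fill" with the answer on the x axis
within_answer <- prop.table(counts, margin = 2)

print(within_religion)
print(within_answer)
```

Here group X makes up most of the "Approve" answers simply because it is larger, while group Y approves more often within its own members; the two tables answer different questions.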
scatter <- dataset %>%
select(lvptnyr, yrbrn, gndr, iaglptn, maryr, iagmr, fcldbrn, iagpnt)
scatter$lvptnyr <- as.numeric(as.character(scatter$lvptnyr))
scatter$yrbrn <- as.numeric(as.character(scatter$yrbrn))
scatter$iaglptn <- as.numeric(as.character(scatter$iaglptn))
scatter$maryr <- as.numeric(as.character(scatter$maryr))
scatter$iagmr <- as.numeric(as.character(scatter$iagmr))
scatter$fcldbrn <- as.numeric(as.character(scatter$fcldbrn))
scatter$iagpnt <- as.numeric(as.character(scatter$iagpnt))
scatter <- na.omit(scatter)
scatter1 <- scatter %>%
mutate(real_age = lvptnyr - yrbrn)
scatter1$gndr <- relevel(scatter1$gndr, ref = "Female")
ggplot(scatter1, aes(x = real_age, y = iaglptn, color = gndr)) + geom_point() + geom_smooth(method = "lm") + theme_bw() + labs(title = "1. Comparison of ideal and real age to start living with a partner", x = "Real age of respondents when they started living with a partner", y = "Ideal age to start living with a partner")
scatter2 <- scatter %>%
mutate(real_age = maryr - yrbrn)
scatter2$gndr <- relevel(scatter2$gndr, ref = "Female")
ggplot(scatter2, aes(x = real_age, y = iagmr, color = gndr)) + geom_point() + geom_smooth(method = "lm") + labs(title = "2. Comparison of ideal and real age to get married", x = "Real age of respondents when they married", y = "Ideal age to get married") + theme_bw()
scatter3 <- scatter %>%
mutate(real_age = fcldbrn - yrbrn)
scatter3$gndr <- relevel(scatter3$gndr, ref = "Female")
ggplot(scatter3, aes(x = real_age, y = iagpnt, color = gndr)) + geom_point() + geom_smooth(method = "lm") + labs(title = "3. Comparison of ideal and real age to become mother/father", x = "Real age of respondents when they became mother/father", y = "Ideal age to become mother/father") + theme_bw()
Conclusion №8: Looking at the graphs, we see a positive relationship between the ideal age and the real age of the event for both males and females. However, all the scatter plots show that males place the ideal age somewhat later than females (their trend line is higher and steeper), while females enter these life events later than males, as their points are located further to the right.
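The trend lines drawn by `geom_smooth(method = "lm")` with `color = gndr` are separate linear fits for each gender. The same lines can be obtained from one `lm()` model with an interaction term, which puts the differences in height and steepness into numbers. The sketch below uses made-up, noise-free data (not ESS9 values), so the coefficients are easy to read:

```r
# Made-up data: the same real ages for six hypothetical women and six hypothetical men
real_age <- rep(c(20, 22, 25, 28, 30, 35), times = 2)
gndr <- factor(rep(c("Female", "Male"), each = 6), levels = c("Female", "Male"))

# Assumed relationships for illustration only:
# females: ideal = 20 + 0.2 * real_age, males: ideal = 19 + 0.3 * real_age
ideal <- ifelse(gndr == "Female", 20 + 0.2 * real_age, 19 + 0.3 * real_age)

# real_age * gndr fits a separate intercept and slope for each gender
fit <- lm(ideal ~ real_age * gndr)
coefs <- coef(fit)

print(round(coefs, 3))
```

The `real_age:gndrMale` coefficient is the difference in slope between males and females; a positive value corresponds to a steeper male trend line.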