Team members: Tolstokoraya Darya, Baturina Elina, Sorokina Darya, Suetina Anna

General information about our research

We chose Switzerland as a country for our analysis.

Topic: “Digital and social contacts within family and workplace and their relation to subjective well-being and social exclusion”

Research question: How are digital and social contacts within the family and workplace related to subjective well-being and social exclusion?

In this project we explore several sub-questions connected with this topic.

Used packages and functions

library(dplyr)
library (kableExtra)
library(ggplot2)
library(foreign)
library(sjlabelled)
library(sjPlot)
library(ggpubr)
library(psych)
library(readr)

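# Returns the most frequent value(s) of x, or NA if all values are equally frequent
Mode = function(x){ 
  ta = table(x)        # frequency of each distinct value
  tam = max(ta)        # highest frequency
  if (all(ta == tam)) {
    mod = NA           # no value is more frequent than the others
  } else if (is.numeric(x)) {
    mod = as.numeric(names(ta)[ta == tam])
  } else {
    mod = names(ta)[ta == tam]
  }
  return(mod)
}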
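
To illustrate how the Mode helper behaves with ties, here is a small check on made-up vectors. The function is repeated so that the snippet runs on its own, and the example vectors are hypothetical. A tie returns several values, which is why a single mode line cannot always be drawn on a histogram.

```r
# Copy of the Mode helper so that the snippet is self-contained
Mode <- function(x) {
  ta <- table(x)
  tam <- max(ta)
  if (all(ta == tam)) {
    return(NA)                       # all values equally frequent: no mode
  }
  if (is.numeric(x)) {
    as.numeric(names(ta)[ta == tam])
  } else {
    names(ta)[ta == tam]
  }
}

single <- Mode(c(1, 2, 2, 3))        # one most frequent value
tied   <- Mode(c(1, 1, 2, 2, 3))     # two most frequent values
flat   <- Mode(c(1, 2, 3))           # every value occurs once

print(single)
print(tied)
print(flat)
```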

The data

Here we load the dataset that we use: the 10th round of the ESS.

ESS <- read.csv('/Users/admin/Downloads/ESS10-3/ESS10.csv', header = T)

Next, we filter the data by country and select the variables that we use in the analysis.

ESS10 <- ESS %>% 
  filter(cntry == "CH") %>% 
  select(idno, yrbrn, sclmeet, impfree, happy, ttminpnt, mcclose, edlvdch)
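
The ESS stores non-response as special numeric codes (the filters in this report drop values such as 77, 88, 99 and 6666 to 9999). As a hedged sketch, such codes could be converted to NA in one place instead of being filtered variable by variable. The helper name `codes_to_na` and the toy data frame are hypothetical, and the code lists for each variable should be checked against the ESS codebook.

```r
# Hypothetical helper: replace the given non-response codes with NA
codes_to_na <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

# Toy data mimicking the coding of two variables used in this report
toy <- data.frame(
  happy    = c(8, 9, 77, 10, 88),
  ttminpnt = c(30, 6666, 15, 8888, 120)
)

toy$happy    <- codes_to_na(toy$happy,    c(77, 88, 99))
toy$ttminpnt <- codes_to_na(toy$ttminpnt, c(6666, 7777, 8888, 9999))

print(toy)

# Summary functions can then skip the missing values explicitly
mean_happy <- mean(toy$happy, na.rm = TRUE)
print(mean_happy)
```

With this approach every later step works on the same data frame, and functions such as `mean()` or `cor()` handle the missing values through their own arguments.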

DESCRIPTIVE TABLE

The table with the description of the variables we use for the analysis

Label = c("idno", "yrbrn", "sclmeet", "impfree", "happy", "ttminpnt", "mcclose", "edlvdch") 
Meaning = c("Respondent's identification number", "Year of birth", "How often socially meet with friends, relatives or colleagues", "Important to make own decisions and be free", "How happy the person is", "Travel time to parent, in minutes", "Online/mobile communication makes people feel closer to one another", "Highest level of education, Switzerland")
Level_Of_Measurement <- c("Nominal", "Interval", "Interval", "Interval", "Interval", "Ratio", "Interval", "Nominal")
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Label     Meaning                                                              Level_Of_Measurement
idno      Respondent’s identification number                                   Nominal
yrbrn     Year of birth                                                        Interval
sclmeet   How often socially meet with friends, relatives or colleagues        Interval
impfree   Important to make own decisions and be free                          Interval
happy     How happy the person is                                              Interval
ttminpnt  Travel time to parent, in minutes                                    Ratio
mcclose   Online/mobile communication makes people feel closer to one another  Interval
edlvdch   Highest level of education, Switzerland                              Nominal

Investigating the variables

YEAR BORN

ESS10$yrbrn <- as.numeric(ESS10$yrbrn)

ESS10_yrbrn <- ESS10 %>% 
  select (yrbrn) %>% 
  filter(yrbrn!=7777)

table(ESS10_yrbrn$yrbrn)
## 
## 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 
##    5    5    3    7   10    5    9    5   11   11   10   14    6   16   21   18 
## 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 
##   26   13   17   19   23   19   21   19   26   24   22   26   25   25   26   37 
## 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 
##   24   25   33   30   37   21   36   24   20   19   24   17   32   19   20   18 
## 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 
##   21   29   26   33   22   26   21   23   16   32   22   26   17   13   24   11 
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 
##   17   27   24   15   16   19   22   23   22   15   22    6
summary(ESS10_yrbrn$yrbrn)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1931    1957    1971    1972    1987    2006
ggplot(ESS10_yrbrn)+
  geom_histogram(aes(x = yrbrn), fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("Year of birth") + 
  ylab("Number of people") +
  geom_vline(aes(xintercept = mean(yrbrn), color = 'mean'), linetype="solid",linewidth = 1) +
  geom_vline(aes(xintercept = median(yrbrn), color = 'median'), linetype="solid", linewidth = 1)+
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Years people were born")+
  xlim(1925, 2015)

The histogram does not follow a bell-curve shape, so the data is not normally distributed. Nevertheless, the mean (1972) and median (1971) lie close together near the center of the distribution. We do not draw the mode because this distribution has two most frequent values (1962 and 1967).

SOCIAL MEET

ESS10$sclmeet <- as.numeric(ESS10$sclmeet)

ESS10_sclmeet<- ESS10 %>% 
  select (sclmeet) %>% 
  filter(sclmeet != 88)

table(ESS10_sclmeet$sclmeet)
## 
##   1   2   3   4   5   6   7 
##   7  64 137 321 333 491 169
summary(ESS10_sclmeet$sclmeet)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   5.000   5.009   6.000   7.000
ggplot(ESS10_sclmeet)+
  geom_histogram( aes(x = sclmeet), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How often socially meet with friends, relatives or colleagues") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(sclmeet), color = 'mean'), linetype="solid", linewidth = 2.5) +
  geom_vline(aes(xintercept = median(sclmeet), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(sclmeet), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Social meetings") 

The distribution is slightly skewed to the left. The mean (5.01) and median (5) almost coincide, while the mode (6) lies one point higher.

FREE VALUES

ESS10$impfree <- as.numeric(ESS10$impfree)

ESS10_impfree<- ESS10 %>% 
  select (impfree)

table(ESS10$impfree)
## 
##   1   2   3   4   5   6   7   8 
## 661 640 169  35   8   4   2   4
summary(ESS10$impfree)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.772   2.000   8.000
ggplot(ESS10_impfree)+
  geom_histogram( aes(x = impfree), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("Important to make own decisions and be free") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(impfree), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(impfree), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(impfree), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
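  ggtitle("The value of freedom and independence") 

The histogram is skewed to the right, as it has a long right tail, so we conclude that the data is not normally distributed. The mean (1.77), median (2) and mode (1) all lie at the left end of the scale and are close to one another. Note that the values 7 and 8 are still included here, although they are treated as missing-value codes and filtered out in the correlation analysis below.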

HAPPY

ESS10$happy <- as.numeric(ESS10$happy)

ESS10_happy<- ESS10 %>% 
  select (happy)

table(ESS10_happy$happy)
## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   4   7  12  17  51  62 220 537 380 231
summary(ESS10_happy$happy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   8.000   8.086   9.000  10.000
ggplot(ESS10_happy)+
  geom_histogram( aes(x = happy), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How happy people feel") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(happy), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(happy), color = 'median'), linetype="solid", linewidth = 2.5)+
  geom_vline(aes(xintercept = Mode(happy), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Feeling of happiness") 

Again the data is not normally distributed: the long left tail shows that the distribution is skewed to the left. The mean (8.09), median (8) and mode (8) practically coincide at a score of about 8.

TIME TO PARENT

ESS10$ttminpnt <- as.numeric(ESS10$ttminpnt)

ESS10_ttminpnt<- ESS10 %>% 
  select (ttminpnt) %>% 
  filter (ttminpnt < 6666)

table(ESS10_ttminpnt$ttminpnt)
## 
##    0    1    2    3    4    5    7    8   10   12   15   16   20   25   30   35 
##    2   25   20    9    6   60    7    2   78    1   58    1   66   13   62    6 
##   36   37   39   40   42   45   50   55   60   70   75   80   90  116  119  120 
##    1    1    1   15    1   20   11    2   37    2    7    2   27    1    1   25 
##  125  140  150  179  180  200  210  239  240  270  285  300  310  330  359  360 
##    1    1   10    1   12    1    4    1   23    3    1   11    1    3    1   12 
##  370  390  420  450  480  495  510  520  540  599  600  660  690  720  735  780 
##    1    3   15    1   15    1    2    1    5    1   14    1    2   13    1    5 
##  840  900 1020 1080 1200 1260 1320 1380 1440 1500 1920 2880 3000 4320 
##    6   10    1    9    8    2    1    1    7    2    1    1    1    1
summary(ESS10_ttminpnt$ttminpnt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    10.0    30.0   188.1   180.0  4320.0
ggplot(ESS10_ttminpnt)+
  geom_histogram(aes(x = ttminpnt), fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How long it takes to get to parents, min") + 
  ylab("Number of people") +
  geom_vline(aes(xintercept = mean(ttminpnt), color = 'mean'), linetype="solid",linewidth = 1) +
  geom_vline(aes(xintercept = median(ttminpnt), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(ttminpnt), color = 'mode'), linetype="solid",linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Time to get to parents")+
  xlim(0, 2000)+
  ylim(0,150)

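The distribution is not normal and is strongly right-skewed. The mode (10) and median (30) lie at the far left of the scale, while the mean (188.1) is pulled to the right by a small number of very long travel times.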
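
The direction of skew described in this section can also be checked numerically. The sketch below computes the moment-based sample skewness on simulated data; the data and the helper name `sample_skewness` are illustrative assumptions, not the ESS values. A positive value indicates a long right tail, and in that case the mean typically lies above the median.

```r
# Moment-based sample skewness:
# third central moment divided by the second central moment to the power 3/2
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^(3 / 2)
}

set.seed(1)
right_skewed <- rexp(1000, rate = 1 / 60)   # simulated "travel times" with a long right tail

sk <- sample_skewness(right_skewed)
print(sk)
print(c(mean = mean(right_skewed), median = median(right_skewed)))
```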

COMMUNICATION

ESS10$mcclose <- as.numeric(ESS10$mcclose)

ESS10_mcclose<- ESS10 %>% 
  select (mcclose) %>% 
  filter(mcclose != 88 & mcclose !=77)


table(ESS10_mcclose$mcclose)
## 
##   0   1   2   3   4   5   6   7   8   9  10 
##  86  31 109 124 106 231 159 228 266  62  86
summary(ESS10_mcclose$mcclose)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   6.000   5.575   8.000  10.000
ggplot(ESS10_mcclose)+
  geom_histogram( aes(x = mcclose), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("Online/mobile communication makes people feel closer to one another") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(mcclose), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(mcclose), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(mcclose), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Online/mobile communication") 

The distribution of observations is not normal, as the histogram does not have a bell-curve shape. The mean (5.58) and median (6) are close to each other and lie near the middle of the scale. The mode equals 8 and is shifted to the right.

EDUCATION LEVEL

ESS10_edlvdch <- ESS10 %>% 
  select(edlvdch) %>% 
  filter(edlvdch != 7777)

# Collapse the detailed Swiss education codes into four broad levels
ESS10_edlvdch <- ESS10_edlvdch %>% 
  mutate(edlvdch = case_when(
    edlvdch %in% 1:6   ~ "1. Compulsory",
    edlvdch %in% 7:13  ~ "2. Vocational",
    edlvdch %in% 14:16 ~ "3. Higher Vocational",
    edlvdch %in% 17:23 ~ "4. University",
    TRUE               ~ as.character(edlvdch)  # leave any other code unchanged
  ))

ESS10_edlvdch$edlvdch <- as.factor(ESS10_edlvdch$edlvdch)

table(ESS10_edlvdch$edlvdch)
## 
##        1. Compulsory        2. Vocational 3. Higher Vocational 
##                  299                  625                  198 
##        4. University 
##                  400
ggplot(ESS10_edlvdch)+
  geom_bar(aes(x = edlvdch), fill="#367588", col="#6a5acd", alpha = 0.5)+
  xlab("Level of education") + 
  ylab("Number of people") +
  ggtitle("Highest level of education")

The graph shows that the education levels are unequally represented among respondents in Switzerland: the vocational level is the most common (625 respondents), while “Higher Vocational” is the least common (198 respondents).

DESCRIPTIVE STATISTICS

mode_yrbrn <- table(ESS10_yrbrn$yrbrn)
mode_yrbrn <- names(mode_yrbrn )[mode_yrbrn  == max(mode_yrbrn )]
mode_yrbrn <- paste(mode_yrbrn , collapse = ", ")



v.yrbrn <- c(round (mean(ESS10_yrbrn$yrbrn), 2), mode_yrbrn, median(ESS10_yrbrn$yrbrn))
names(v.yrbrn ) <- c("mean", "mode", "median")

v.sclmeet <- c(round(mean(ESS10_sclmeet$sclmeet), 2), Mode(ESS10_sclmeet$sclmeet), median(ESS10_sclmeet$sclmeet))
names(v.sclmeet) <- c("mean", "mode", "median")

v.impfree <- c(round(mean(ESS10_impfree$impfree), 2), Mode(ESS10_impfree$impfree), median(ESS10_impfree$impfree))
names(v.impfree) <- c("mean", "mode", "median")

v.happy <- c(round(mean(ESS10_happy$happy), 2), Mode(ESS10_happy$happy), median(ESS10_happy$happy))
names(v.happy) <- c("mean", "mode", "median")

v.ttminpnt <- c(round(mean(ESS10_ttminpnt$ttminpnt), 2), Mode(ESS10_ttminpnt$ttminpnt), median(ESS10_ttminpnt$ttminpnt))
names(v.ttminpnt) <- c("mean", "mode", "median")

v.mcclose <- c(round(mean(ESS10_mcclose$mcclose), 2), Mode(ESS10_mcclose$mcclose), median(ESS10_mcclose$mcclose))
names(v.mcclose) <- c("mean", "mode", "median")

v.edlvdch <- c(NA, Mode(ESS10_edlvdch$edlvdch), NA)
names(v.edlvdch) <- c("mean", "mode", "median") 


tendencymeasures =  data.frame(v.yrbrn, v.sclmeet, v.impfree, v.happy, v.ttminpnt, v.mcclose, v.edlvdch,  stringsAsFactors = FALSE)
kable(tendencymeasures) %>%    
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
        v.yrbrn     v.sclmeet  v.impfree  v.happy  v.ttminpnt  v.mcclose  v.edlvdch
mean    1971.66     5.01       1.77       8.09     188.11      5.58       NA
mode    1962, 1967  6.00       1.00       8.00     10.00       8.00       2. Vocational
median  1971        5.00       2.00       8.00     30.00       6.00       NA

Analysis

CORRELATION

We filter the data separately for each of the four correlations.

ESS10_cor1 <- ESS10 %>% 
  select (ttminpnt, mcclose, idno)%>%
  filter(ttminpnt != 6666 & ttminpnt != 7777 & ttminpnt != 8888 & ttminpnt != 9999) %>% 
  filter (mcclose != 77 & mcclose != 88 & mcclose != 99)
  
ESS10_cor2 <- ESS10 %>% 
  select (ttminpnt, impfree, idno)%>%
  filter(ttminpnt != 6666 & ttminpnt != 7777 & ttminpnt != 8888 & ttminpnt != 9999) %>% 
  filter (impfree != 7 & impfree != 8 & impfree != 9) 
  
ESS10_cor3 <- ESS10 %>% 
  select (ttminpnt, happy, idno)%>%
  filter(ttminpnt != 6666 & ttminpnt != 7777 & ttminpnt != 8888 & ttminpnt != 9999) %>% 
  filter (happy != 77 & happy != 88 & happy != 99)


ESS10_cor4 <- ESS10 %>% 
  select (ttminpnt, sclmeet, idno)%>%
  filter(ttminpnt != 6666 & ttminpnt != 7777 & ttminpnt != 8888 & ttminpnt != 9999) %>% 
  filter (sclmeet < 77)

For our correlation analysis we chose a continuous outcome, ttminpnt, and computed four correlations with it. The other variables are:

1) mcclose - Online/mobile communication makes people feel closer to one another
2) impfree - Important to make own decisions and be free
3) happy - How happy are you
4) sclmeet - How often socially meet with friends, relatives or colleagues

Correlation 1 - continuous (ratio) outcome - ttminpnt - Travel time to parent, in minutes and discrete (quasi-interval) variable: mcclose - Online/mobile communication makes people feel closer to one another.

RQ: Does the belief that “online/mobile communication makes people feel closer to one another” correlate with travel time to a parent?

Research Hypothesis

H0: correlation between ttminpnt and mcclose is 0 and there is no association between the variables

HA: correlation between ttminpnt and mcclose is not 0 and there is an association between the variables

The study “Contact between parents and adult children: The role of time constraints, commuting and automobility” by Ori Rubin examined to what extent the frequency of contact between parents and their adult children living away from home is associated with time allocated to work, including commuting time, and with automobility. We are interested not in commuting time to work but in travel time to a parent’s home. Since our topic is social communication, we examine whether attitudes towards online communication are associated with how long it takes to get to a parent.
Reference: Rubin, O. (2015). Contact between parents and adult children: The role of time constraints, commuting and automobility. Journal of Transport Geography, 49, 76–84. https://doi.org/10.1016/j.jtrangeo.2015.10.013

Assumptions for correlation

Check for normality

par(mfrow=c(2,2))
hist (ESS10_cor1$ttminpnt)
qqnorm(ESS10_cor1$ttminpnt)
qqline(ESS10_cor1$ttminpnt)
hist (ESS10_cor1$mcclose)
qqnorm(ESS10_cor1$mcclose)
qqline(ESS10_cor1$mcclose)

shapiro.test(ESS10_cor1$mcclose)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_cor1$mcclose
## W = 0.95016, p-value = 1.359e-15
shapiro.test(ESS10_cor1$ttminpnt)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_cor1$ttminpnt
## W = 0.53855, p-value < 2.2e-16

Interpretation: The histograms and Q-Q plots show that neither variable is normally distributed (although mcclose is closer to normal than ttminpnt). In the Shapiro-Wilk test the p-value is below 0.05 in both cases, so we reject the null hypothesis of normality for both variables.
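
As a minimal illustration of how the Shapiro-Wilk test reacts to skewed data, the sketch below applies `shapiro.test()` to two simulated samples (not the ESS data). Note that `shapiro.test()` accepts between 3 and 5000 observations.

```r
set.seed(42)
normal_sample <- rnorm(500, mean = 50, sd = 10)   # roughly bell-shaped
skewed_sample <- rexp(500, rate = 1 / 60)         # long right tail

res_normal <- shapiro.test(normal_sample)
res_skewed <- shapiro.test(skewed_sample)

# H0 of the test: the sample comes from a normal distribution.
# W close to 1 is compatible with normality; a small p-value rejects H0.
print(c(W = unname(res_normal$statistic), p = res_normal$p.value))
print(c(W = unname(res_skewed$statistic), p = res_skewed$p.value))
```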

Assumptions of Pearson’s correlation:
- measurement level (interval or ratio, continuous): met
- normal distribution of the variables: not met

Therefore we cannot use Pearson’s correlation coefficient and use Spearman’s rank correlation coefficient instead. (Kendall’s tau is another rank-based coefficient that could be used with ordinal data; we report Spearman’s rho throughout.)
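
The reason Spearman’s coefficient does not need normality can be made concrete: it is Pearson’s coefficient computed on the ranks of the data, so only the ordering of the values matters. A short sketch on simulated data (the variables `x` and `y` are made up for illustration):

```r
set.seed(7)
x <- rexp(200, rate = 1 / 60)                                      # skewed, continuous
y <- round(pmin(10, pmax(0, 5 + 0.01 * x + rnorm(200, sd = 2))))   # 0-10 scale with many ties

rho_direct <- cor(x, y, method = "spearman")
rho_ranks  <- cor(rank(x), rank(y), method = "pearson")   # rank() gives tied values their average rank

print(c(direct = rho_direct, via_ranks = rho_ranks))
```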

Correlation test

Because we use Spearman’s correlation, we restate the hypotheses in terms of a monotonic relationship.

H0: there is no monotonic relationship between ttminpnt and mcclose in the population

HA: there is a monotonic relationship between ttminpnt and mcclose in the population

cor.test(ESS10_cor1$mcclose, ESS10_cor1$ttminpnt, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  ESS10_cor1$mcclose and ESS10_cor1$ttminpnt
## S = 68713458, p-value = 0.0001099
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1378685

Interpretation: The p-value (0.0001099) is below 0.05, so we reject the null hypothesis. There is a statistically significant monotonic relationship between ttminpnt and mcclose; the correlation coefficient is 0.138, so the correlation is positive and small.
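
When reporting a test like this one, the estimate and p-value can be taken from the object returned by `cor.test()` and rounded, instead of being copied from the printed output. A minimal sketch on simulated data (the variables `travel` and `closer` are made up and unrelated to the ESS values):

```r
set.seed(123)
travel <- round(rexp(300, rate = 1 / 60))        # simulated travel times in minutes
closer <- sample(0:10, 300, replace = TRUE)      # simulated answers on a 0-10 scale

# exact = FALSE requests the approximate p-value, which avoids the
# warning that an exact p-value cannot be computed when there are ties
res <- cor.test(travel, closer, method = "spearman", exact = FALSE)

rho <- unname(res$estimate)   # Spearman's rho
p   <- res$p.value

cat(sprintf("rho = %.3f, p = %.4f\n", rho, p))
```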

library(ggpubr)
ggscatter(ESS10_cor1, x = "ttminpnt", y = "mcclose", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Travel time to parents, in minutes", 
          ylab = "Value: online/mobile communication makes people feel closer to one another")

Interpretation: The regression line shows a positive trend (it rises towards the right), but the points are scattered far from the line, which indicates a weak association.

Correlation 2 - continuous (ratio) outcome - ttminpnt - Travel time to parent, in minutes and discrete (quasi-interval) variable: impfree - Important to make own decisions and be free.

RQ: Does respondents’ belief in the importance of making one’s own decisions and being free correlate with travel time to a parent?

Research Hypothesis

H0: correlation between ttminpnt and impfree is 0 and there is no association between the variables

HA: correlation between ttminpnt and impfree is not 0 and there is an association between the variables

The study “Letting go or holding on? Parents’ perceptions of their relationships with their children during emerging adulthood” examined the difficulties parents perceive in ‘letting go’ of their grown-up children and in acknowledging their developing autonomy, and described a range of parental strategies in response to young people’s growing independence. It made us ask whether valuing independence is associated with how far a child lives from a parent. That is why we chose ttminpnt and impfree, which indicates how important it is for a person to be independent and make their own choices.
Reference: Kloep, M., & Hendry, L. B. (2010). Letting go or holding on? Parents’ perceptions of their relationships with their children during emerging adulthood. British Journal of Developmental Psychology, 28(4), 817–834. https://doi.org/10.1348/026151009x480581

Assumptions for correlation

Check for normality

par(mfrow=c(1,2))
hist (ESS10_cor2$impfree)
qqnorm(ESS10_cor2$impfree)
qqline(ESS10_cor2$impfree)

shapiro.test(ESS10_cor2$impfree)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_cor2$impfree
## W = 0.78962, p-value < 2.2e-16

Interpretation: The histogram and Q-Q plot show that the data is not normally distributed. In the Shapiro-Wilk test the p-value is below 0.05, so we reject the null hypothesis of normality.

Assumptions of Pearson’s correlation:
- measurement level (interval or ratio, continuous): met
- normal distribution of the variables: not met

Therefore we cannot use Pearson’s correlation coefficient and use Spearman’s rank correlation coefficient instead. (Kendall’s tau is another rank-based coefficient that could be used with ordinal data; we report Spearman’s rho throughout.)

Correlation test

Because we use Spearman’s correlation, we restate the hypotheses in terms of a monotonic relationship.

H0: there is no monotonic relationship between ttminpnt and impfree in the population

HA: there is a monotonic relationship between ttminpnt and impfree in the population

cor.test(ESS10_cor2$impfree, ESS10_cor2$ttminpnt, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  ESS10_cor2$impfree and ESS10_cor2$ttminpnt
## S = 79595629, p-value = 0.6441
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.01650188

Result: The p-value (0.6441) is above 0.05, so we fail to reject the null hypothesis. We find no statistically significant monotonic relationship between ttminpnt and impfree; the correlation coefficient is 0.017, which is positive but very close to zero.

ggscatter(ESS10_cor2, x = "ttminpnt", y = "impfree", 
          add = "reg.line", 
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Travel time to parents, in minutes", 
          ylab = "Value: importance to make own decisions and be free")

Interpretation: The regression line is almost flat, with only a very slight positive slope, and the points are scattered far from it. Consistent with the non-significant test, the plot shows no clear relationship between travel time to parents and the importance of making one’s own decisions and being free.

Correlation 3 - continuous (ratio) outcome - ttminpnt - Travel time to parent, in minutes and continuous (quasi-interval) variable: happy - How happy are you.

RQ: Does respondents’ self-assessed happiness correlate with travel time to a parent?

Research Hypothesis

H0: correlation between ttminpnt and happy is 0 and there is no association between the variables

HA: correlation between ttminpnt and happy is not 0 and there is an association between the variables

The study “Does distance make happiness? geographic proximity of adult children and the well-being of older persons” investigated the association between the intergenerational geographic proximity of adult children and the well-being of older persons in China. The results show that older adults who live independently but have adult children living close by have significantly higher life satisfaction than those who live with or at a distance from their children. We also consider distance, but we ask whether the distance between child and parent is associated with the child’s subjective well-being rather than the parent’s. That is why we chose happy as a variable, as it measures the respondent’s own assessment of their happiness.
Reference: Wang, Y., & Tsay, W. (2022). Does distance make happiness? geographic proximity of adult children and the well-being of older persons. Journal of Aging & Social Policy, 36(2), 222–240. https://doi.org/10.1080/08959420.2022.2080464

Assumptions for correlation

Check for normality

par(mfrow=c(1,2))
hist (ESS10_cor3$happy)
qqnorm(ESS10_cor3$happy)
qqline(ESS10_cor3$happy)

shapiro.test(ESS10_cor3$happy)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_cor3$happy
## W = 0.8627, p-value < 2.2e-16

Interpretation: The histogram and Q-Q plot show that the data is not normally distributed. In the Shapiro-Wilk test the p-value is below 0.05 (it is < 2.2e-16), so we reject the null hypothesis of normality.

Assumptions of Pearson’s correlation:
- measurement level (interval or ratio, continuous): met
- normal distribution of the variables: not met

Therefore we cannot use Pearson’s correlation coefficient and use Spearman’s rank correlation coefficient instead. (Kendall’s tau is another rank-based coefficient that could be used with ordinal data; we report Spearman’s rho throughout.)

Correlation test

Because we use Spearman’s correlation, we restate the hypotheses in terms of a monotonic relationship.

H0: there is no monotonic relationship between ttminpnt and happy in the population

HA: there is a monotonic relationship between ttminpnt and happy in the population

cor.test(ESS10_cor3$ttminpnt, ESS10_cor3$happy, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  ESS10_cor3$ttminpnt and ESS10_cor3$happy
## S = 86438796, p-value = 0.09267
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## -0.05994177

Result: The p-value (0.09267) is above 0.05, so we fail to reject the null hypothesis. We find no statistically significant monotonic relationship between ttminpnt and happy; the correlation coefficient is -0.060, which is negative and very small.

ggscatter(ESS10_cor3, x = "ttminpnt", y = "happy", 
          add = "reg.line", 
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Travel time to parents, in minutes", 
          ylab = "How happy are you")

Interpretation: The regression line shows a slight negative trend (it falls towards the right): as travel time to parents increases, reported happiness tends to decrease. However, the points are scattered far from the line and the test above is not significant, so the association is weak at best.

Correlation 4 - continuous (ratio) outcome - ttminpnt - Travel time to parent, in minutes and continuous (quasi-interval) variable: sclmeet - How often socially meet with friends, relatives or colleagues.

RQ: Does the frequency of social meetings with friends, relatives or colleagues correlate with travel time to a parent?

Research Hypothesis

H0: correlation between ttminpnt and sclmeet is 0 and there is no association between the variables

HA: correlation between ttminpnt and sclmeet is not 0 and there is an association between the variables

Here we again rely on the study “Contact between parents and adult children: The role of time constraints, commuting and automobility” by Ori Rubin, which examined to what extent the frequency of contact between parents and their adult children living away from home is associated with time allocated to work, including commuting time, and with automobility. Here we replace the commuting variable with the frequency of social interactions, which we consider an interesting relationship to examine.
Reference: Rubin, O. (2015). Contact between parents and adult children: The role of time constraints, commuting and automobility. Journal of Transport Geography, 49, 76–84. https://doi.org/10.1016/j.jtrangeo.2015.10.013

Assumptions for correlation

Check for normality

par(mfrow=c(1,2))
hist (ESS10_cor4$sclmeet)
qqnorm(ESS10_cor4$sclmeet)
qqline(ESS10_cor4$sclmeet)

shapiro.test(ESS10_cor4$sclmeet)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_cor4$sclmeet
## W = 0.91116, p-value < 2.2e-16

Interpretation: The histogram and Q-Q plot show that the data is not normally distributed. In the Shapiro-Wilk test the p-value is below 0.05 (it is < 2.2e-16), so we reject the null hypothesis of normality.

Assumptions of Pearson’s correlation:
- measurement level (interval or ratio, continuous): met
- normal distribution of the variables: not met

Therefore we cannot use Pearson’s correlation coefficient and use Spearman’s rank correlation coefficient instead. (Kendall’s tau is another rank-based coefficient that could be used with ordinal data; we report Spearman’s rho throughout.)

Correlation test

Because we use Spearman’s correlation, we restate the hypotheses in terms of a monotonic relationship.

H0: there is no monotonic relationship between ttminpnt and sclmeet in the population

HA: there is a monotonic relationship between ttminpnt and sclmeet in the population

cor.test(ESS10_cor4$ttminpnt, ESS10_cor4$sclmeet, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  ESS10_cor4$ttminpnt and ESS10_cor4$sclmeet
## S = 92153701, p-value = 0.000157
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.1343329

Interpretation: The p-value (0.000157) is below 0.05, so we reject the null hypothesis. There is a statistically significant monotonic relationship between ttminpnt and sclmeet; the correlation coefficient is -0.134, so the correlation is negative and small.

library(ggpubr)
ggscatter(ESS10_cor4, x = "ttminpnt", y = "sclmeet", 
          add = "reg.line", 
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Travel time to parents, in minutes", 
          ylab = "How often socially meet with friends, relatives or colleagues")

Interpretation: The regression line shows a negative trend (it falls towards the right): as travel time to parents increases, the frequency of social meetings with friends, relatives or colleagues decreases. The points are scattered far from the line, which indicates a weak association.

Correlation matrix

library(sjPlot)

ESS10_corr1 <- merge(ESS10_cor1, ESS10_cor2, all = TRUE)
ESS10_corr2 <- merge(ESS10_cor3, ESS10_cor4, all = TRUE)

ESS10_corr <- merge(ESS10_corr1, ESS10_corr2, all = TRUE)

ESS10_corr <- ESS10_corr %>% select(-idno)
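The pairwise merge() calls above can also be written as a single step. A minimal sketch on toy data, assuming idno is the only intended key (the data frames a, b and c_df are hypothetical stand-ins for the ESS10_cor subsets). Note that merge() without a by argument joins on all columns the two data frames share, so naming the key explicitly makes the behaviour easier to check.

```r
a    <- data.frame(idno = c(1, 2, 3), happy   = c(8, 7, 9))
b    <- data.frame(idno = c(2, 3, 4), sclmeet = c(5, 6, 4))
c_df <- data.frame(idno = c(1, 4),    impfree = c(2, 1))

# Full outer join of all subsets on the respondent id;
# respondents missing from a subset get NA in its columns
merged <- Reduce(function(x, y) merge(x, y, by = "idno", all = TRUE),
                 list(a, b, c_df))
merged
```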


tab_corr(ESS10_corr[, 1:5], 
         corr.method = "spearman", wrap.labels = 70)
          ttminpnt   mcclose    impfree    happy      sclmeet
ttminpnt             0.136***   0.017      -0.059     -0.134***
mcclose   0.136***              0.044      0.057      0.052
impfree   0.017      0.044                 -0.065     -0.040
happy     -0.059     0.057      -0.065                0.094**
sclmeet   -0.134***  0.052      -0.040     0.094**
Computed correlation used spearman-method with listwise-deletion.
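The note about listwise deletion means that a respondent is dropped from the whole matrix if any of the five variables is missing. A small sketch on toy data (the data frame df is invented for illustration) shows the same idea with base R:

```r
df <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(5, 3, 4, 1, 2, 6)
)

# Keep only rows with no missing value in any column
complete <- df[complete.cases(df), ]
nrow(complete)

# Correlations on the complete rows match cor() with use = "complete.obs"
r_manual  <- cor(complete, method = "spearman")
r_builtin <- cor(df, use = "complete.obs", method = "spearman")
all.equal(r_manual, r_builtin)
```

Because of this, the coefficients in the matrix are based on fewer respondents than the pairwise tests above and can differ from them slightly.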

A graphical table of the correlation

sjp.corr(ESS10_corr[, 1:5], wrap.labels = 100, decimals = 2)

Overall conclusions on correlation analysis:

Looking at our correlation matrix, we see only two statistically significant correlations with the ttminpnt variable: mcclose (Online/mobile communication makes people feel closer to one another), with a correlation coefficient of 0.136, and sclmeet (How often socially meet with friends, relatives or colleagues), with a correlation coefficient of -0.134. None of the variables we use is normally distributed, which is why we used Spearman’s coefficient for the correlation analysis.

REGRESSION

Research hypothesis: the distance from parents increases with the person’s age and level of education.

RQ: Is there a significant relation between distance from parents and the level of education of the person?

The study “Too Far to Go On? Distance to School and University Participation” by Marc Frenette investigates how the proximity of students’ residences to educational institutions affects their decision to pursue higher education. Young people who attend school and live ‘out of commuting distance’ are far less likely to go on to university than students living ‘within commuting distance’. This research made us question whether there is a significant relation between distance from parents and the level of education of the person.

Reference: Frenette, M. (2006). Too far to go on? Distance to school and university participation. Education Economics, 14(1), 31–58. https://doi.org/10.1080/09645290500481865

Working with variables

Filtering the data that we will use to build the regression models

ESS10_reg <- ESS10 %>% 
  select (ttminpnt, edlvdch, idno, yrbrn)%>%
  filter(ttminpnt != 6666 & ttminpnt != 7777 & ttminpnt != 8888 & ttminpnt != 9999) %>% 
  filter(edlvdch <= 23) %>% 
  filter(yrbrn != 7777)
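The chain of != comparisons above can be shortened by listing the missing-value codes once. A minimal sketch on toy data, assuming the same four codes (6666, 7777, 8888, 9999) mark missing answers; the data frame toy is invented for the example:

```r
toy <- data.frame(idno = 1:6,
                  ttminpnt = c(15, 6666, 120, 7777, 8888, 9999))

# Codes treated as missing in this sketch
missing_codes <- c(6666, 7777, 8888, 9999)

# Base R version; with dplyr this would be
# toy %>% filter(!ttminpnt %in% missing_codes)
clean <- toy[!toy$ttminpnt %in% missing_codes, ]
clean
```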

We regrouped the variable “level of education” according to Switzerland’s education system. The four resulting categories are shown below.

# Collapse the detailed edlvdch codes into four broader categories
ESS10_reg$edlvdch[ESS10_reg$edlvdch %in% 1:6] <- "1. Compulsory"
ESS10_reg$edlvdch[ESS10_reg$edlvdch %in% 7:13] <- "2. Vocational"
ESS10_reg$edlvdch[ESS10_reg$edlvdch %in% 14:16] <- "3. Higher Vocational"
ESS10_reg$edlvdch[ESS10_reg$edlvdch %in% 17:23] <- "4. University"

class(ESS10_reg$edlvdch)
## [1] "character"
ESS10_reg$edlvdch <- as.factor(ESS10_reg$edlvdch)
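An alternative way to do the same regrouping is cut(), which returns a factor directly. This is only a sketch under the assumption that the codes are whole numbers from 1 to 23; the vector codes below holds the boundary values of each group for illustration.

```r
codes <- c(1, 6, 7, 13, 14, 16, 17, 23)

# Intervals are (0,6], (6,13], (13,16], (16,23]
edu <- cut(codes,
           breaks = c(0, 6, 13, 16, 23),
           labels = c("1. Compulsory", "2. Vocational",
                      "3. Higher Vocational", "4. University"))

data.frame(codes, edu)
```

Any code outside 1 to 23 would become NA here, so this version also makes unexpected values visible.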

Constructing a boxplot

library(ggplot2)

ggplot(ESS10_reg)+
  geom_boxplot(aes(y=ttminpnt, x=edlvdch))+
  xlab("Education level")+
  ylab("Time to parents in min")

Regression Models

For our analysis, we need to convert the year of birth variable to age

ESS10_reg$age <- 2020 - ESS10_reg$yrbrn

1. Model

H0: There is no significant relation between distance from parents (outcome) and the age of the person (continuous predictor)

HA: There is a significant relation between distance from parents (outcome) and the age of the person (continuous predictor)

m1 <- lm(ttminpnt ~ age, data = ESS10_reg)
tab_model(m1, show.ci = F, show.se = T)
                    ttminpnt
Predictors          Estimates   std. Error   p
(Intercept)         211.95      48.36        <0.001
age                 -0.57       1.10         0.604
Observations        782
R2 / R2 adjusted    0.000 / -0.001

2. Model

H0: There is no significant relation between distance from parents (outcome) and the level of education of the person (categorical predictor)

HA: There is a significant relation between distance from parents (outcome) and the level of education of the person (categorical predictor)

m2 <- lm(ttminpnt ~ age + edlvdch, data = ESS10_reg)
tab_model(m1, m2, show.ci = F, show.se = T)
                                ttminpnt                           ttminpnt
Predictors                      Estimates   std. Error   p         Estimates   std. Error   p
(Intercept)                     211.95      48.36        <0.001    355.92      56.36        <0.001
age                             -0.57       1.10         0.604     0.21        1.08         0.849
edlvdch [2. Vocational]                                            -235.92     41.87        <0.001
edlvdch [3. Higher Vocational]                                     -273.07     48.83        <0.001
edlvdch [4. University]                                            -134.68     42.16        0.001
Observations                    782                                782
R2 / R2 adjusted                0.000 / -0.001                     0.054 / 0.049

First, we looked at the relation between distance from parents and age and built the first model. The result was not significant (p-value = 0.604 for age; adjusted R squared is negative, -0.001). We then added the education variable. The second model is clearly better than the first one: every education level differs significantly from the reference category (p-value ≤ 0.001) and R squared rises to 0.054. The R squared is still small, so the model explains only about 5% of the variation in distance, but it explains the distance variable better than the first model. Let’s compare the models using an ANOVA test.
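A negative adjusted R squared is not an error: the adjustment penalises the number of predictors and can push a near-zero R squared below zero. The sketch below applies the standard formula to the rounded values from the table (adj_r2 is a helper written for this illustration, and p is the number of predictors excluding the intercept), so it reproduces the table only up to rounding.

```r
# Adjusted R squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model 1: one predictor (age)
m1_adj <- adj_r2(0.000, n = 782, p = 1)

# Model 2: age plus three education dummies
m2_adj <- adj_r2(0.054, n = 782, p = 4)

round(c(m1_adj, m2_adj), 3)
```

Rounded to three decimals, both values agree with the adjusted R squared reported by tab_model.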

ANOVA

anova(m1, m2)
## Analysis of Variance Table
## 
## Model 1: ttminpnt ~ age
## Model 2: ttminpnt ~ age + edlvdch
##   Res.Df       RSS Df Sum of Sq      F    Pr(>F)    
## 1    780 105635050                                  
## 2    777  99992507  3   5642543 14.615 2.877e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the results of the ANOVA test, the second model explains the outcome variable significantly better than the first one (F = 14.615, p-value = 2.877e-09). Age is not significant in either model (p-value = 0.849 in the second model); it stays in the final model m2 only as a control variable.
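The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares and degrees of freedom printed above. A short sketch (the variable names are chosen for this illustration):

```r
# Values copied from the anova(m1, m2) output
rss1 <- 105635050; df1 <- 780   # Model 1: ttminpnt ~ age
rss2 <- 99992507;  df2 <- 777   # Model 2: ttminpnt ~ age + edlvdch

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of the F distribution with (3, 777) degrees of freedom
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(f_stat, 3)
p_val
```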

Final model: Distance from parents ~ age + level of education

tab_model(m2, show.ci = F, show.se = T)
                                ttminpnt
Predictors                      Estimates   std. Error   p
(Intercept)                     355.92      56.36        <0.001
age                             0.21        1.08         0.849
edlvdch [2. Vocational]         -235.92     41.87        <0.001
edlvdch [3. Higher Vocational]  -273.07     48.83        <0.001
edlvdch [4. University]         -134.68     42.16        0.001
Observations                    782
R2 / R2 adjusted                0.054 / 0.049

plot_models(m2)

Interpretation: Based on the model results, there is a significant relation between distance from parents and level of education. The reference category in our model is the compulsory level of education. Holding age constant, people with this level of education have the longest distance to parents: the intercept is 355.92 minutes (the prediction at age 0), which corresponds to about 364 minutes at age 40, because each year of age adds only 0.21 minutes. Compared with the reference category, people with a university degree are about 135 minutes closer to their parents (estimated coefficient -134.68). The groups closest to their parents are “vocational” and “higher vocational”: they are 235.92 and 273.07 minutes closer to their parents than people with compulsory education.

Constructing regression model equation

The general equation looks like this: E(Y) = β0 + β1X1 + β2I2 + β3I3 + β4I4.

β0 = 355.92, β1 = 0.21 (X1 is the age variable)

β2 = -235.92, β3 = -273.07, β4 = -134.68

To construct the regression equation, we treat the levels of education as dummy variables: I2, I3 and I4 are indicators for the vocational, higher vocational and university levels.

  • For people with compulsory level of education: I2 = 0, I3 = 0, and I4 = 0, so E(Y) = 355.92 + 0.21X1 - 235.92(0) - 273.07(0) - 134.68(0) = 355.92 + 0.21*X1

  • For people with vocational level of education: I2 = 1, I3 = 0, and I4 = 0, so E(Y) = 355.92 + 0.21X1 - 235.92(1) - 273.07(0) - 134.68(0) = 355.92 + 0.21*X1 - 235.92

  • For people with higher vocational level of education: I2 = 0, I3 = 1, and I4 = 0, so E(Y) = 355.92 + 0.21X1 - 235.92(0) - 273.07(1) - 134.68(0) = 355.92 + 0.21*X1 - 273.07

  • For people with university level of education: I2 = 0, I3 = 0, and I4 = 1, so E(Y) = 355.92 + 0.21X1 - 235.92(0) - 273.07(0) - 134.68(1) = 355.92 + 0.21*X1 - 134.68
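The equations above can be turned into a small prediction helper. This is a sketch that hard-codes the rounded coefficients from the tab_model output of the final model; predict_ttminpnt is a hypothetical function written for this illustration and works for one respondent at a time. With the fitted model object available, predict(m2, newdata = ...) would do the same without rounding.

```r
# Rounded coefficients of the final model (compulsory education is the reference)
coefs <- c(intercept = 355.92,
           age = 0.21,
           "2. Vocational" = -235.92,
           "3. Higher Vocational" = -273.07,
           "4. University" = -134.68)

# Expected travel time to parents (minutes) for one respondent
predict_ttminpnt <- function(age, education) {
  edu_effect <- if (education == "1. Compulsory") 0 else coefs[[education]]
  coefs[["intercept"]] + coefs[["age"]] * age + edu_effect
}

predict_ttminpnt(40, "1. Compulsory")
predict_ttminpnt(40, "2. Vocational")
```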

GENERAL CONCLUSION ON ANALYSIS

In our study, we investigated how the geographical distance between parents and children (measured as travel time) relates to a number of variables of interest to us. Using correlation analysis, we found a statistically significant relationship between parent-child distance and the respondent’s agreement that online/mobile communication makes people feel closer to one another. The relationship is positive but small (0.136): respondents who live farther from their parents tend to rate online communication higher. We assume that this is because greater distance makes it more difficult to communicate in person, so other means of maintaining social connections become more valuable.
In addition, the correlation analysis showed a statistically significant negative relationship between geographic distance from parents and the frequency of in-person social interactions. No statistically significant relationship was found for the other variables (happiness and value of freedom). Using regression analysis, we examined the relationship of parent-child distance with age and education. No significant relationship with age was found, whereas education is significantly related to distance: people with compulsory education live farthest from their parents, and people with vocational and higher vocational education live closest.