Team: Tolstokoraya Darya

Baturina Elina

Sorokina Darya

Suetina Anna

General Information about our research

We chose Switzerland as a country for our analysis.

Topic: “Digital and social contacts within family and workplace and its relation to subjective well-being, happiness and social exclusion”

Some background and justification of research interest:

Why Switzerland as a country? - Switzerland is one of the wealthiest countries in the world, with a GDP per capita of $92,463 (Richest Countries in the world 2024, n.d.). Wealth is closely tied to how people's work is organized and, plausibly, to how happy they are and whether they enjoy their lives, since living conditions in such a wealthy country are likely to be of high quality. This made us curious about what the ESS portal data on Switzerland show about happiness, work and family life. We wanted to dive deeper and observe which factors are related to the happiness level of the population of Switzerland. SOURCE: Richest Countries in the world 2024. (n.d.).

Used packages and functions

library(dplyr)
library (kableExtra)
library(ggplot2)
library(foreign)
library(sjlabelled)
library(sjPlot)
library(ggpubr)
library(psych)
library(readr)
library(rstatix)
library(DescTools) 
library(sjstats)
library(corrplot)
library(effsize)
library(coin)
library(RGraphics)
library(rcompanion)
library(car)

# Returns the most frequent value(s) of x; NA when all values occur equally often
Mode = function(x){ 
  ta = table(x)
  tam = max(ta)
  if (all(ta == tam)) {
    mod = NA
  } else if (is.numeric(x)) {
    mod = as.numeric(names(ta)[ta == tam])
  } else {
    mod = names(ta)[ta == tam]
  }
  return(mod)
}
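
A short usage sketch may help to see how this helper behaves on edge cases. The input vectors below are made up for illustration, and the function body is repeated (in a condensed form) only so that the snippet runs on its own.

```r
# Condensed copy of the Mode() helper so the snippet is self-contained
Mode <- function(x) {
  ta <- table(x)
  tam <- max(ta)
  if (all(ta == tam)) return(NA)   # every value equally frequent: no mode
  if (is.numeric(x)) as.numeric(names(ta)[ta == tam]) else names(ta)[ta == tam]
}

Mode(c(1, 2, 2, 3))      # single mode
Mode(c(1, 1, 2, 2, 3))   # two modes are both returned
Mode(c("a", "b", "b"))   # works for character vectors too
Mode(c(1, 2, 3))         # all frequencies equal, so NA
```

Note that a constant vector such as c(5, 5) also falls into the "all frequencies equal" branch and yields NA, which is worth keeping in mind when reading the summary tables.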

Project 1

Research question: How are digital and social contacts within family and workplace related to subjective well-being and social exclusion?

Literature review

In the study “Technological affluence and subjective well-being” (Kavetsos, G., & Koutroumpis, P., 2011), the researchers use a recent European pooled cross-sectional dataset and focus only on access to the internet and ownership of electronic gadgets and their relation to subjective well-being. In our case we want to go further and observe how connections via the Internet between family members and between coworkers/bosses are related to a person’s subjective well-being. “Following Scanlon (1993) we study whether an individual’s well-being is enhanced due to the ownership of such technological amenities (TV, mobile phones, which in this setting can be viewed as substantive goods.” They used data only about the possession of electronic technologies: “These surveys ask individuals whether they own a television (TV), digital video disk player (DVD), compact disk music player (CD), computer (PC), internet connection, fixed telephone line and private mobile phone.”
In our research, by contrast, we look at communication patterns, not just ownership. Also, their country sample does not include Switzerland, so no such analysis exists for this country, which is another research gap. “Responses are available for 29 countries, including: Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Turkey and the United Kingdom.”

SOURCE: Kavetsos, G., & Koutroumpis, P. (2011). Technological affluence and subjective well-being. Journal of Economic Psychology, 32(5), 742–753.

The study “Always Available, Always attached: A relational perspective on the effects of mobile phones and social media on Subjective Well-Being” by Taylor & Bazarova is the closest one to ours in terms of what it measures: “We want to address how interpersonal communication across the media ecosystem affects the multidimensional concept of subjective well-being (SWB)” (Taylor, S. H., & Bazarova, N. N., 2021). They focus on digital contacts and the feeling of happiness. However, the study covered only romantic relationships: “One hundred fourteen romantic couples in long-term relationships were recruited to participate in this study during January to March 2019.” (Taylor, S. H., & Bazarova, N. N., 2021). We do not plan to narrow our analysis to such a limited sample; instead we use data that describe the situation at the national level.

SOURCE: Taylor, S. H., & Bazarova, N. N. (2021). Always Available, Always attached: A relational perspective on the effects of mobile phones and social media on Subjective Well-Being. Journal of Computer-Mediated Communication, 26(4), 187–206.

We want to look not only at digital connection patterns, but also at real-life social connections and their relation to the level of subjective well-being. The study on social capital and subjective well-being by Hommerich, C., & Tiefenbach, T. (2017) discusses concepts that we also want to analyze: “To address this issue, we propose the concept of social affiliation, measuring the feeling of belonging to the social whole, of being a respected and valued member of society.” We likewise have variables such as “Feel like part of your team, how much” and “Take part in social activities compared to others of the same age”, which refer to the feeling of belonging to the social whole. However, that study was conducted in Japan, while we want to observe Switzerland.

SOURCE: Hommerich, C., & Tiefenbach, T. (2017). Analyzing the relationship between Social capital and Subjective Well-Being: The mediating role of social affiliation. Journal of Happiness Studies, 19(4), 1091–1114. https://doi.org/10.1007/s10902-017-9859-9

Description of variables and graphs

Downloading the data from ESS round 10

ESS1 <- read_csv(file = '/Users/admin/Downloads/ESS10/ESS10.csv')

Selecting the country & needed variables from dataset

ESS101 <- ESS1 %>% 
  filter(cntry == "CH") %>% 
  select(idno, acchome, sclact, closepnt, teamfeel, happy, ttminpnt)

ESS10_11 <- ESS1 %>% 
  filter(cntry == "CH") %>% 
  select(idno, acchome, sclact, closepnt, teamfeel, happy, ttminpnt)

Nominal variable: “acchome”

This variable represents the ability of the respondent to access the internet from home

#R represents this variable as numeric, so we assign the factor variable type
ESS101$acchome <- factor(ESS101$acchome, labels = c("Don't have an access", "Have an access"), ordered= F)

class(ESS101$acchome)

## [1] "factor"

summary(ESS101$acchome)

## Don't have an access       Have an access 
##                  104                 1419

Plot 1: How many people have access to the internet at home?

ggplot() +
  geom_bar(data = ESS101, aes(x = acchome), fill="#00FFFF", col="#0000FF", alpha = 0.5) +
  xlab("Having an ability to access the internet: Home") + 
  ylab("Number of people") +
  ggtitle("The level of people's access to the Internet at home")

This is a nominal (binary) variable, thus we cannot check the normality of the distribution or describe its shape.

Conclusion 1: In Switzerland there are many more people who have access to the internet at home (1419) than people who do not (104).

Ordinal variable 1: “sclact”

This variable represents answers of respondents to the question “Compared to other people of your age, how often would you say you take part in social activities?”.

ESS10_sclact <- ESS10_11 %>% 
  filter(sclact != 8 & sclact != 7)

#Deleting observations, which are not needed for the analysis: Refusal* & Don't know*
ESS101$sclact[ESS101$sclact == 8 | ESS101$sclact == 7] <- NA

#R represents this variable as numeric, so we assign the ordered factor variable type
ESS101$sclact <- factor(ESS101$sclact, labels = c("Much less than most", "Less than most", "About the same", "More than most", "Much more than most"), ordered= T)

class(ESS101$sclact)

## [1] "ordered" "factor"

summary(ESS101$sclact)

## Much less than most      Less than most      About the same      More than most 
##                 116                 455                 691                 206 
## Much more than most                NA's 
##                  31                  24
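
The same recoding can be sketched in a single step. The snippet below uses a small made-up vector rather than the ESS file, and it passes the `levels` argument explicitly, which is an addition to the code above: it guards against labels being shifted when one of the answer codes happens to be absent from the data.

```r
# Hypothetical answer codes: 1-5 are valid, 7/8/9 are refusal, don't know, no answer
sclact_raw <- c(3, 2, 8, 5, 7, 3, 1, 9, 4)

# Turn the special codes into NA in one step
sclact_clean <- ifelse(sclact_raw %in% c(7, 8, 9), NA, sclact_raw)

# Fix the mapping code -> label explicitly through `levels`
sclact_fct <- factor(
  sclact_clean,
  levels = 1:5,
  labels = c("Much less than most", "Less than most", "About the same",
             "More than most", "Much more than most"),
  ordered = TRUE
)

summary(sclact_fct)
```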

Plot 2: How often do people participate in social activities (compared to others of same age)?

#sclact is already a factor here, so this step only drops the rows where it is NA
ESS101 = ESS101 %>%
  filter(!is.na(sclact))

ESS101$sclact <- factor(ESS101$sclact, labels = c("Much less than most", "Less than most", "About the same", "More than most", "Much more than most"), ordered= F)

ggplot(ESS101 %>% 
         filter(sclact != "NA")) +
  geom_bar(aes(x = sclact), fill="#99FF66", col="#990033", alpha = 0.5) +
  xlab("Frequency of participation in social activities") + 
  ylab("Number of people") +
  ggtitle("The degree of participation in social activities (compared to others of same age)")

The distribution is roughly bell-shaped, but it is a bit right-skewed.
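
The visual impression of skewness can be cross-checked numerically. The sketch below rebuilds the answer codes 1 to 5 from the frequencies printed by summary() above and computes the moment-based sample skewness by hand; treating the ordinal codes as equally spaced numbers is an assumption made only for this rough check.

```r
# Frequencies of the five answer categories, taken from the summary() output
counts <- c(116, 455, 691, 206, 31)
codes  <- rep(1:5, times = counts)

# Moment-based skewness: third central moment divided by variance^(3/2)
m  <- mean(codes)
m2 <- mean((codes - m)^2)
m3 <- mean((codes - m)^3)
skew <- m3 / m2^1.5

skew   # a positive value points to a longer right tail
```

The value comes out only slightly above zero, so the right skew is mild.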

Conclusion 2: Most often, people in Switzerland think that they take part in social activities about as often as others of their age (691 respondents). The next largest group thinks they participate less than most (455), and far fewer think they participate more than most.

Ordinal variable 2: “closepnt”

This variable represents answers of respondents to the question “Taking everything into consideration, how close do you feel to him/her?”.

ESS10_closepnt <- ESS10_11 %>% 
  select(idno, closepnt) %>% 
  filter(closepnt < 6)

#Deleting observations, which are not needed for the analysis: Not applicable* & Refusal* & Don't know*
ESS101$closepnt[ESS101$closepnt == 6 | ESS101$closepnt == 7 | ESS101$closepnt == 8 | ESS101$closepnt == 9] <- NA

#R represents this variable as numeric, so we assign the ordered factor variable type
ESS101$closepnt <- factor(ESS101$closepnt, labels = c("Extremely close", "Very close", "Quite close", "Not very close", "Not at all close"), ordered= T)

class(ESS101$closepnt)

## [1] "ordered" "factor"

summary(ESS101$closepnt)

##  Extremely close       Very close      Quite close   Not very close 
##              209              432              237               67 
## Not at all close             NA's 
##               23              531

Plot 3: How close are people to their parents?

ggplot(ESS101 %>% 
         filter(closepnt != "NA")) +
  geom_bar(aes(x = closepnt), fill="#CCCCFF", col="#FF7F50", alpha = 0.5) +
  xlab("How close to parents") + 
  ylab("Number of people") +
  ggtitle("The degree of closeness to parents")

The distribution is roughly bell-shaped, but it is a bit right-skewed.

Conclusion 3: We see that people in Switzerland tend to be close to their parents; fewer people have distant relationships within the family.

Interval variable 1: “teamfeel”

This variable represents answers of respondents to the question “If you work in a team, how much do you feel like part of your team?”

ESS10_teamfeel <- ESS101 %>%
  select(teamfeel) %>%
  filter(teamfeel <= 10)

class(ESS10_teamfeel$teamfeel)

## [1] "numeric"

summary(ESS10_teamfeel$teamfeel)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   9.000   8.487  10.000  10.000

Plot 4: Do people in Switzerland feel part of their working team?

ggplot(ESS10_teamfeel)+
  geom_histogram( aes(x = teamfeel), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How much people feel like a part of their working team") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=2))+
  geom_vline(aes(xintercept = mean(teamfeel), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(teamfeel), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(teamfeel), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("The level of feeling like a part of working team")

ggplot(ESS10_teamfeel)+
  geom_density( aes(x = teamfeel), fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How much people feel like a part of their working team") + 
  ylab("Density") +
  scale_x_continuous(breaks= seq(0, 10, by=2))+
  geom_vline(aes(xintercept = mean(teamfeel), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(teamfeel), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(teamfeel), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("The level of feeling like a part of working team")

The distribution is not normal, and very left-skewed.

Conclusion 4: As can be seen from the graph, people in Switzerland mostly feel like a part of their working team, as the histogram is left-skewed. Moreover, all measures of central tendency are higher than 8, which represents a high level of feeling like a part of a working team.

Interval variable 2: “happy”

This variable represents answers of respondents to the question “Taking all things together, how happy would you say you are?”

ESS10_happy <- ESS101 %>%
  select(happy) %>%
  filter(happy <= 10)

class(ESS10_happy$happy)

## [1] "numeric"

summary(ESS10_happy$happy)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   8.000   8.087   9.000  10.000

Plot 5: Do people in Switzerland feel happy?

ggplot(ESS10_happy)+
  geom_histogram( aes(x = happy), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlab("How much people feel happy") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=2))+
  geom_vline(aes(xintercept = mean(happy), color = 'mean'), linetype="solid",linewidth = 1) +
  geom_vline(aes(xintercept = median(happy), color = 'median'), linetype="solid", linewidth = 2)+
  geom_vline(aes(xintercept = Mode(happy), color = 'mode'), linetype="solid",linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Happiness in Switzerland")

The distribution is not normal, and very left-skewed.

Conclusion 5: The graph illustrates that, on average, people in Switzerland feel happy: the histogram is left-skewed, and the mean, median and mode are all located at approximately 8 points, which is much higher than the central point (5).

Ratio variable: “ttminpnt”

This variable represents answers of respondents to the question “About how long would it take you to get to where your parents live, on average? Think of the way you would travel and of the time it would take door to door.”

ESS10_ttminpnt <- ESS101 %>%
  select(idno, ttminpnt) %>% 
  filter(ttminpnt != 6666) %>% 
  filter (ttminpnt != 7777) %>% 
  filter(ttminpnt != 8888) %>% 
  filter (ttminpnt != 9999)

class(ESS10_ttminpnt$ttminpnt)

## [1] "numeric"

summary(ESS10_ttminpnt$ttminpnt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    10.0    30.0   179.6   180.0  4320.0

Plot 6: How long does it take for people to get to their parents?

ggplot(ESS10_ttminpnt)+
  geom_histogram(aes(x = ttminpnt), fill="gray", col="#FF6347", alpha = 0.5) +
  xlab("How long does it take to get to parents, min") + 
  ylab("Number of people") +
  geom_vline(aes(xintercept = mean(ttminpnt), color = 'mean'), linetype="solid",linewidth = 1) +
  geom_vline(aes(xintercept = median(ttminpnt), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(ttminpnt), color = 'mode'), linetype="solid",linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Time to parents")+
  xlim(0, 1500)+
  ylim(0, 175)

The distribution is not normal, and very right-skewed.

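Conclusion 6: The mean travel time to parents in Switzerland is about 3 hours (179.6 minutes), but this value is pulled up by a few very long trips (the maximum is 4320 minutes). The median is only 30 minutes, and most people need less than 250 minutes to get to their parents.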
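
Because the distribution is so skewed, the mean and the median tell different stories. A toy example with invented travel times (not ESS values) illustrates how one very long trip pulls the mean upward while the median barely moves.

```r
# Hypothetical travel times in minutes; the last value is one extreme trip
times <- c(5, 10, 10, 20, 30, 45, 60, 180, 4320)

mean(times)                 # pulled up by the single extreme value
median(times)               # robust to the extreme value
mean(times[times < 4320])   # mean without the extreme trip
```

For a right-skewed variable like this one, reporting the median next to the mean gives a fairer picture of a typical respondent.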

Summary of descriptive statistics

v.acchome <- c(NA, Mode(ESS10_11$acchome), NA)
names(v.acchome) <- c("mean", "mode", "median")

v.sclact <- c(NA, Mode(ESS10_sclact$sclact), median(ESS10_sclact$sclact))
names(v.sclact) <- c("mean", "mode", "median")

v.closepnt <- c(NA, Mode(ESS10_closepnt$closepnt), median(ESS10_closepnt$closepnt))
names(v.closepnt) <- c("mean", "mode", "median")

ESS10_teamfeel$teamfeel =  as.numeric(as.character(ESS10_teamfeel$teamfeel))
v.teamfeel <- c(mean(ESS10_teamfeel$teamfeel), Mode(ESS10_teamfeel$teamfeel), median(ESS10_teamfeel$teamfeel))
names(v.teamfeel) <- c("mean", "mode", "median")

ESS10_happy$happy =  as.numeric(as.character(ESS10_happy$happy))
v.happy <- c(mean(ESS10_happy$happy), Mode(ESS10_happy$happy), median(ESS10_happy$happy))
names(v.happy) <- c("mean", "mode", "median")

v.ttminpnt <- c(mean(ESS10_ttminpnt$ttminpnt), Mode(ESS10_ttminpnt$ttminpnt), median(ESS10_ttminpnt$ttminpnt))
names(v.ttminpnt) <- c("mean", "mode", "median")

tendencymeasures =  data.frame(v.acchome, v.sclact, v.closepnt, v.teamfeel, v.happy, v.ttminpnt, stringsAsFactors = FALSE)
kable(tendencymeasures) %>%    
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	v.acchome	v.sclact	v.closepnt	v.teamfeel	v.happy	v.ttminpnt
mean	NA	NA	NA	8.486766	8.086725	179.5907
mode	1	3	2	10.000000	8.000000	10.0000
median	NA	3	2	9.000000	8.000000	30.0000

Plots

Scatterplot

Do people who feel like part of their working team feel happier than those who do not feel like part of their team?

ggplot(ESS101 %>%
         filter(happy <=10 & teamfeel <= 10 ), aes(x = teamfeel, y = happy)) +
  geom_point( )+
  ylab("How happy are you") +
  xlab("How much you feel like a team") +
  ggtitle("Feeling happy due to feeling like a part of a working team") +
   scale_x_continuous(breaks= seq(0, 10, by=2))+
   scale_y_continuous(breaks= seq(0, 10, by=2))+
  theme_minimal()+
  geom_count()

Answer: Looking at the graph, it can be seen that there is a slight connection between the feeling of belonging to the working team and happiness: many points combine high values on both scales and are larger in size (that is, they represent a bigger number of occurrences). However, there are still cases where high values of happiness correspond to low values of feeling like a part of a team and vice versa.

Boxplot

Is there a relation between how close a parent and child live and the closeness of their relationship (assessed by the child)?

ESS_boxplot <- full_join(ESS10_ttminpnt, ESS10_closepnt, by = "idno")

ggplot(ESS101 %>% 
         filter(closepnt != "NA") %>% 
         filter(ttminpnt != 6666) %>% 
         filter (ttminpnt != 7777) %>% 
         filter(ttminpnt != 8888) %>% 
         filter (ttminpnt != 9999), aes(x=closepnt, y=ttminpnt))+
  geom_boxplot(aes(fill = closepnt))+
  stat_summary(fun = mean, geom = "point", size = 2, col = "orange")+
  ylim(0, 500)+
  xlab("Closeness to parents")+
  ylab("Time to parents")+
  ggtitle("The relation of closeness within family and time between parents and child")

Answer: We see that the average travel time to parents increases as the degree of relationship closeness decreases. This suggests that families who live farther from each other tend to have less close relationships.

Stacked barplot

Are people who have access to the internet at home more involved in social activities (compared to others of the same age)?

ggplot(ESS101 %>% 
         filter(sclact != "NA"), aes(x = sclact, fill = acchome)) +
  geom_bar(position="fill")+
  coord_flip()+
  xlab("The degree of participation in social activities") + 
  ylab("Having an ability to access the internet: Home") +
  ggtitle("Participation in social activities due to access to the Internet")

Answer: Looking at the graph, we can see that in Switzerland about 80% of the people who think that they participate in social activities much less than others of their age have internet access at home. Among the people who think that they participate in social activities much more than others, the share with internet access at home is about 90%.
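
With position = "fill", each bar shows the share of home internet access within one participation category, not the other way round. The small cross-table below uses invented counts (chosen only to mimic shares of about 80% and 90%) to show how the two conditional readings differ.

```r
# Hypothetical counts: rows = home internet access, columns = participation level
tab <- matrix(
  c(20, 80,    # "Much less": 20 without access, 80 with access
     3, 27),   # "Much more":  3 without access, 27 with access
  nrow = 2,
  dimnames = list(acchome = c("No access", "Access"),
                  sclact  = c("Much less", "Much more"))
)

# Share of access WITHIN each participation category (what the filled bars show)
within_sclact <- prop.table(tab, margin = 2)

# Share of participation categories WITHIN each access group (a different question)
within_access <- prop.table(tab, margin = 1)

within_sclact
within_access
```

In this toy table 80% of the "Much less" group have access, yet about three quarters of all people with access are in the "Much less" group, so the two readings should not be mixed up.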

Summary of findings

Having considered six variables related to the topic “Digital and social contacts within family and workplace and its relation to subjective well-being and social exclusion”, we can see that, overall, people who are less engaged in social and digital interactions tend to have lower indicators of subjective well-being. For example, the more a person feels like a part of a team, the happier they tend to feel. What is more, families who live close to each other tend to have closer relationships than families who live farther apart. The final point is that people who have access to the Internet at home tend to be more socially active compared to others.

Project 2

Downloading the data from ESS round 10

ESS2 <- read_csv(file = '/Users/admin/Downloads/ESS10/ESS10.csv')

Selecting the country & needed variables from dataset

ESS10 <- ESS2 %>% 
  filter(cntry == "CH") %>% 
  select(idno, acchome, domicil, gndr, ttminpnt, speakpnt)

ESS10_1 <- ESS2 %>% 
  filter(cntry == "CH") %>% 
  select(idno, acchome, domicil, gndr, ttminpnt, speakpnt)

Describing variables

Label = c("acchome", "domicil", "gndr", "ttminpnt", "speakpnt") 
Meaning = c("Ability to access the Internet from home", "Area of living type", "Gender", "Time to parents in minutes", "Frequency of speaking to parents")
Level_Of_Measurement <- c("Nominal, binary", "Nominal", "Nominal, binary", "Ratio", "Ordinal")
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label	Meaning	Level_Of_Measurement
acchome	Ability to access the Internet from home	Nominal, binary
domicil	Area of living type	Nominal
gndr	Gender	Nominal, binary
ttminpnt	Time to parents in minutes	Ratio
speakpnt	Frequency of speaking to parents	Ordinal

Tests

Chi-square

Research question: Is there a relation between the respondent’s description of the type of the area of living and their ability to access the internet from home?

Internet usage in Switzerland has recently become a well-studied topic. The most complete research on it was conducted within the World Internet Project. The authors collected statistics on Internet usage in the country across many parameters (e.g. age, purposes, fears, types of usage, etc.). The research showed that Switzerland is one of the leading countries in the world in terms of Internet usage: around 92% of the population use the Internet. Based on this fact we hypothesized that people all over the country have equal access to the Internet, i.e. that there is no difference in access to the Internet between different types of areas.

Reference: Latzer, Michael and Büchi, Moritz and Festic, Noemi, Internet Use in Switzerland 2011—2019: Trends, Attitudes and Effects. Summary Report from the World Internet Project – Switzerland (2020). Zurich, Switzerland: University of Zurich, 2020

Variables

We use 2 categorical variables for this test.

First variable “acchome” – nominal, binary. This variable represents the ability of the respondent to access the internet from home - “Imagine you wanted to access the Internet. At which of these locations would you be able to do it?” (People marked or did not mark home as a location to access the internet)

ESS10$acchome <- factor(ESS10$acchome, labels = c("Don't have an access", "Have an access"), ordered= F)
class(ESS10$acchome)

## [1] "factor"

summary(ESS10$acchome)

## Don't have an access       Have an access 
##                  104                 1419

Second variable “domicil” - domicile, respondent’s description - nominal. This variable represents the respondent’s description of the type of area where they live. For this variable we first delete the observations that are not needed for the analysis: 7 = “Refusal”, 8 = “Don’t know” and 9 = “No answer”.

ESS10$domicil[ESS10$domicil == 7 | ESS10$domicil == 8 | ESS10$domicil == 9] <- NA
ESS10$domicil <- factor(ESS10$domicil, labels = c("A big city", "Suburbs or outskirts of big city", "Town or small town", "Country village", "Farm or home in countryside"), ordered= F)
class(ESS10$domicil)

## [1] "factor"

summary(ESS10$domicil)

##                       A big city Suburbs or outskirts of big city 
##                              112                              164 
##               Town or small town                  Country village 
##                              386                              800 
##      Farm or home in countryside                             NA's 
##                               60                                1

Descriptive plot

The descriptive plot below shows the number of people living in each type of area.

ggplot(ESS10)+
  geom_bar(aes(x=domicil, fill=acchome), position="stack", na.rm = TRUE)+
  scale_x_discrete(na.translate = FALSE)+
  ggtitle("The relationship between the description of the area type and ability to access the internet from home")+
  xlab("Description of the area of living")+
  ylab("Number of respondents")+
 labs(caption = "ESS10, Switzerland")+
  theme(axis.text.x = element_text(angle=65, vjust = 0.5))

We see that more than half of the respondents (800) live in a country village, whereas the other types of areas have far fewer residents. Because the group sizes differ so much, the proportions are hard to read from the stacked bar plot, and it would be hard to derive valid conclusions from it. To solve this issue we build plot_xtab to look at the proportions.

library(sjPlot)
plot_xtab (ESS10$domicil, ESS10$acchome, margin = "row", bar.pos = "stack",
         show.summary = TRUE)

Interpretation: The proportions are approximately equal across area types. Because the differences are small, it is hard to judge from the plot alone whether they are significant, so we run a chi-squared test to find out.

Checking assumptions

Assumptions:

Data is independent, the categories are mutually exclusive
at least 5 observations per cell

table(ESS10$acchome, ESS10$domicil)

##                       
##                        A big city Suburbs or outskirts of big city
##   Don't have an access          8                               10
##   Have an access              104                              154
##                       
##                        Town or small town Country village
##   Don't have an access                 17              60
##   Have an access                      369             740
##                       
##                        Farm or home in countryside
##   Don't have an access                           9
##   Have an access                                51

exp<-chisq.test(ESS10$acchome, ESS10$domicil)
exp$expected

##                       ESS10$domicil
## ESS10$acchome          A big city Suburbs or outskirts of big city
##   Don't have an access   7.653088                         11.20631
##   Have an access       104.346912                        152.79369
##                       ESS10$domicil
## ESS10$acchome          Town or small town Country village
##   Don't have an access           26.37582        54.66491
##   Have an access                359.62418       745.33509
##                       ESS10$domicil
## ESS10$acchome          Farm or home in countryside
##   Don't have an access                    4.099869
##   Have an access                         55.900131

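Every observed count is at least 5. Among the expected counts, one cell (“Don't have an access” in “Farm or home in countryside”, 4.10) is slightly below 5. Since this affects only 1 of 10 cells, we treat the assumption as approximately met, but the result should be read with some caution.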
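
The check can be reproduced from the observed counts printed above. The sketch below enters them by hand as a matrix and applies a commonly used rule of thumb (no expected count below 1 and at most 20% of expected counts below 5); that rule is our assumption here, not something fixed by chisq.test() itself.

```r
# Observed counts copied from the table() output above
obs <- matrix(
  c(8, 104, 10, 154, 17, 369, 60, 740, 9, 51),
  nrow = 2,
  dimnames = list(
    acchome = c("Don't have an access", "Have an access"),
    domicil = c("A big city", "Suburbs", "Town", "Country village", "Farm")
  )
)

# R warns here because one expected count is below 5, so the warning is silenced
expected <- suppressWarnings(chisq.test(obs))$expected

min(expected)                          # smallest expected count
share_below_5 <- mean(expected < 5)    # proportion of cells with expected count < 5
share_below_5
all(expected >= 1) && share_below_5 <= 0.20
```

If the rule of thumb were violated, a Fisher's exact test or chisq.test(..., simulate.p.value = TRUE) would be possible alternatives.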

Chi-square Test

H0: There is no association between the type of the area of living and ability to access the Internet from home

HA: There is association between the type of the area of living and ability to access the Internet from home

chisq.test(ESS10$acchome, ESS10$domicil)

## 
##  Pearson's Chi-squared test
## 
## data:  ESS10$acchome and ESS10$domicil
## X-squared = 10.579, df = 4, p-value = 0.03173

Our p-value = 0.03173 is below the 0.05 significance level, so we reject the null hypothesis: the two categorical variables are not independent, i.e. there is an association between the type of the area of living and the ability to access the Internet from home. In other words, the ability to access the Internet from home differs across the types of areas people live in.
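
A significant p-value says little about how strong the association is. As a hedged complement, Cramér's V can be computed by hand from the reported statistic; the sample size of 1522 is taken as the sum of the observed counts in the contingency table above.

```r
# Values taken from the test output and the observed table
x_squared <- 10.579   # Pearson chi-squared statistic
n         <- 1522     # respondents with valid answers on both variables
n_rows    <- 2        # categories of acchome
n_cols    <- 5        # categories of domicil

# Cramér's V = sqrt( X^2 / (n * (min(rows, cols) - 1)) )
cramers_v <- sqrt(x_squared / (n * (min(n_rows, n_cols) - 1)))
cramers_v
```

A value of roughly 0.08 is usually read as a weak association, so the difference between area types is statistically detectable but small in size.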

Post-Hoc test

The analysis of the standardized residuals:

res <- chisq.test(ESS10$acchome, ESS10$domicil)
res$stdres

##                       ESS10$domicil
## ESS10$acchome          A big city Suburbs or outskirts of big city
##   Don't have an access  0.1349795                       -0.3952332
##   Have an access       -0.1349795                        0.3952332
##                       ESS10$domicil
## ESS10$acchome          Town or small town Country village
##   Don't have an access         -2.1892416       1.0854129
##   Have an access                2.1892416      -1.0854129
##                       ESS10$domicil
## ESS10$acchome          Farm or home in countryside
##   Don't have an access                   2.5581477
##   Have an access                        -2.5581477

Describe residuals: The residuals of 2.5581477 and -2.5581477 in the “Farm or home in countryside” category (for “Don't have an access” and “Have an access” respectively) indicate substantial deviations between the observed and expected values. There is a positive association between living on a farm or in a home in the countryside and not having access to the Internet from home.

In the “Town or small town” category the residuals (-2.1892416 and 2.1892416) are also outside the range from -2 to 2. So there is a positive association between living in a town or small town and having access to the Internet from home.

The other values are within the range from -2 to 2, meaning that the observed counts in these cells do not differ notably from the expected ones.
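
The cells that drive the result can also be picked out programmatically. The sketch re-enters the observed counts from the contingency table above and flags standardized residuals beyond ±2; the threshold of 2 is the rule of thumb used in the text, not a formal test.

```r
# Observed counts copied from the table() output above
obs <- matrix(
  c(8, 104, 10, 154, 17, 369, 60, 740, 9, 51),
  nrow = 2,
  dimnames = list(
    acchome = c("Don't have an access", "Have an access"),
    domicil = c("A big city", "Suburbs or outskirts of big city",
                "Town or small town", "Country village",
                "Farm or home in countryside")
  )
)

# The warning about small expected counts is silenced for this illustration
res <- suppressWarnings(chisq.test(obs))

# TRUE where the standardized residual lies outside [-2, 2]
flagged <- abs(res$stdres) > 2

# Area types with at least one flagged cell
flagged_areas <- names(which(apply(flagged, 2, any)))
flagged_areas

round(res$stdres, 2)
```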

Visualize residuals:

corrplot(chisq.test(ESS10$acchome, ESS10$domicil)$stdres,  is.corr = FALSE, method = "number")

Conclusions:

After conducting the chi-squared test we can conclude that there is a relation between the respondent’s description of the type of the area of living and their ability to access the Internet from home (the categorical variables “domicil” and “acchome” are not independently distributed). The residual analysis shows which categories contribute most to the test result. In our sample, fewer people from a town or small town lack access to the Internet at home than expected (17 observed against about 26 expected). On the other hand, more people who live on a farm or in a home in the countryside lack access to the Internet at home than expected (9 observed against about 4 expected).

Thus, our original hypothesis is not supported: people in Switzerland who live in different types of areas have different levels of access to the Internet from home.

T-test

Research question: Do Swiss people of different genders (female, male) differ in the mean time, in minutes, needed to get to their parents' place of living?

Kolk's (2016) study examined the geographical distance between children and their parents of different gender in Sweden. Unfortunately, the study does not have data on children of different ages; however, it reports that mothers, in comparison to fathers, tend to live closer to their children. We applied this logic to our data examination and hypothesized that female children tend to live closer to their parents in Switzerland.

Reference: Kolk, Martin (2016). A Life-Course Analysis of Geographical Distance to Siblings, Parents, and Grandparents in Sweden. Population, Space and Place

Data inspection

We are going to do an independent samples t-test, where:

Categorical variable: gndr - Gender of respondents

ESS10_ttminpnt <- ESS10 %>%
  select(gndr, ttminpnt) %>% 
  filter(ttminpnt != 6666) %>% 
  filter (ttminpnt != 7777) %>% 
  filter(ttminpnt != 8888) %>% 
  filter (ttminpnt != 9999)

ESS10_ttminpnt$gndr <- factor(ESS10_ttminpnt$gndr, labels = c("Male", "Female"), ordered= F)
class (ESS10_ttminpnt$gndr)

## [1] "factor"

summary(ESS10_ttminpnt$gndr)

##   Male Female 
##    391    397

Description of variables: The “gndr” variable is categorical and binary, since there are 2 categories (according to the summary output, there are 391 males and 397 females). R identified the class of the variable as “numeric”, but we converted it into “factor”, which corresponds to the categorical type of data.

Continuous variable: ttminpnt - Travel time to parent, in minutes

ESS10_ttminpnt$ttminpnt <- as.numeric(ESS10_ttminpnt$ttminpnt)
class(ESS10_ttminpnt$ttminpnt)

## [1] "numeric"

summary(ESS10_ttminpnt$ttminpnt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    10.0    30.0   188.1   180.0  4320.0

The variable “ttminpnt” is a continuous variable. It was identified as “integer” type by R, so we converted it into “numeric”. According to the central tendency measures of “ttminpnt”, the mean time to get to the parents is 188.1 minutes and the median is 30.0, while the minimum is 0 and the maximum is 4320.

Descriptive plot

Boxplot can help to visualize our data:

library (ggplot2)
ggplot(ESS10_ttminpnt)+
  geom_boxplot(aes(x=gndr, y=ttminpnt), fill="#FFDDFF", col="#221100",alpha = 0.5)+
  ylim(0, 500)+
  ggtitle("Minutes spent on getting to the parents by Gender of Respondent")+
  xlab("Gender of respondents")+
  ylab("Duration of time for getting to the parents in minutes")

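Interpretation: based on the plot, females need slightly more time to get to their parents (median of approximately 25 minutes) than males (approximately 20 minutes). This may also mean that the distance between females and their parents is larger than that between males and their parents. Note that ylim(0, 500) drops observations above 500 minutes before the boxplot statistics are computed, so the plotted medians can differ from the full-sample ones.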

Summary about data inspection:

there are more than 300 observations in both groups
in the plot, females need slightly more time to get to their parents (= their typical distance to parents appears longer)

Checking assumptions

Checking the normality assumption for the t-test

Here we are going to check normality of distribution of our continuous variable (time to get to parents) by Gender.

Density plot

ggplot(ESS10_ttminpnt, aes(x = ttminpnt, color = gndr, fill = gndr)) +
      geom_density(alpha = 0.5) +
      labs(title = "Minutes spent on getting to parents by Gender", x = "Duration of time to get to the parents in Minutes", y = "Density") +
      theme_classic()

Interpretation: this density plot shows that the distributions are skewed to the right (i.e. the right tail is stretched).

Skew and kurtosis

#install.packages("psych")
library(psych)
describeBy(ESS10_ttminpnt, group = ESS10_ttminpnt$gndr)

## 
##  Descriptive statistics by group 
## group: Male
##          vars   n   mean     sd median trimmed   mad min  max range skew
## gndr        1 391   1.00   0.00      1    1.00  0.00   1    1     0  NaN
## ttminpnt    2 391 195.61 382.47     30   99.64 37.06   0 3000  3000  3.3
##          kurtosis    se
## gndr          NaN  0.00
## ttminpnt    14.71 19.34
## ------------------------------------------------------------ 
## group: Female
##          vars   n   mean     sd median trimmed   mad min  max range skew
## gndr        1 397   2.00   0.00      2    2.00  0.00   2    2     0  NaN
## ttminpnt    2 397 180.73 352.11     35  103.69 44.48   1 4320  4319 5.34
##          kurtosis    se
## gndr          NaN  0.00
## ttminpnt     49.2 17.67

Interpretation:

Males: skew (3.3) is not normal (more than 0.5). And kurtosis (14.71) is not normal (more than 1), as the graph above tells us (very sharp top and long tail).

Females: skew (5.34) is not normal (more than 0.5). And kurtosis (49.2) is not normal (more than 1), as the graph above tells us (very sharp top and long tail).

In both groups distribution is skewed and not normal.

QQ-plot

Here we also check the normality of the variable visually:

qqnorm(ESS10_ttminpnt$ttminpnt)
qqline(ESS10_ttminpnt$ttminpnt)

Interpretation: the Q-Q plot does not look normal (heavy right tail and U-shaped line). The points on the plot clearly do not follow a straight line.

Shapiro test

Here we also check the normality of our data with a help of test.

shapiro.test(ESS10_ttminpnt$ttminpnt)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS10_ttminpnt$ttminpnt
## W = 0.54001, p-value < 2.2e-16

Interpretation: according to the Shapiro-Wilk test, we reject the null hypothesis of normality (p-value < 0.05), so the variable is not normally distributed.
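The Shapiro-Wilk test above is run on the pooled variable, whereas the t-test assumption concerns each group separately. A minimal sketch of a by-group check follows. It uses simulated right-skewed data, not the ESS data, and the object names `sim` and `p_by_group` are only illustrative.

```r
set.seed(1)

# Simulated right-skewed travel times (hypothetical, NOT ESS data)
sim <- data.frame(
  gndr     = factor(rep(c("Male", "Female"), each = 200)),
  ttminpnt = c(rexp(200, rate = 1 / 180), rexp(200, rate = 1 / 190))
)

# Shapiro-Wilk p-value computed separately within each gender group
p_by_group <- tapply(sim$ttminpnt, sim$gndr,
                     function(x) shapiro.test(x)$p.value)
print(p_by_group)
```

With the real data, the same call on ESS10_ttminpnt would give one p-value per gender.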

Homogeneity of variances assumption

Here is visualization of comparison of the variances in the groups (males and females) with the help of boxplots:

ggplot(ESS10_ttminpnt, aes(x = gndr, y = ttminpnt)) + 
    ylim(0, 500)+
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 4) +  # fun.y is deprecated since ggplot2 3.3.0
  theme_classic() +
  ggtitle("Minutes spent on getting to parents by Gender of Respondent")

Interpretation: women have a wider distribution, while men have a narrower one. Women typically spend slightly more time reaching their parents than men (the median in the female group is slightly higher than in the male group). The means for women and men are almost the same. Both distributions are skewed, because the mean points are displaced substantially towards the longer tail and do not align with the medians. There are also many outliers (points on the plot).

Here we are going to use the test in order to check our visualization results.

H0: Variances are equal.

HA: Variances are not equal.

bartlett.test(ESS10_ttminpnt$ttminpnt ~ ESS10_ttminpnt$gndr)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  ESS10_ttminpnt$ttminpnt by ESS10_ttminpnt$gndr
## Bartlett's K-squared = 2.683, df = 1, p-value = 0.1014

Interpretation: according to Bartlett’s test, we fail to reject the null hypothesis (p-value > 0.05), so there is no evidence that the variances of the groups differ.

T-Test

The distributions of the continuous variable are not normal but the number of observations in both groups is high enough, so we can try to run t-test (and ignore non-parametric for now).

H0: The mean time to get to the parents for males is equal to the mean time to get to the parents for females.

HA: The mean time to get to the parents for males is not equal to the mean time to get to the parents for females.

Note: Bartlett’s test did not reject the equality of variances, so a pooled-variance t-test (var.equal = T) would also be admissible. We nevertheless apply Welch’s correction (var.equal = F), which remains valid whether or not the variances are equal.

t.test(ESS10_ttminpnt$ttminpnt ~ ESS10_ttminpnt$gndr, var.equal = F)

## 
##  Welch Two Sample t-test
## 
## data:  ESS10_ttminpnt$ttminpnt by ESS10_ttminpnt$gndr
## t = 0.56798, df = 778.57, p-value = 0.5702
## alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
## 95 percent confidence interval:
##  -36.54920  66.31075
## sample estimates:
##   mean in group Male mean in group Female 
##             195.6113             180.7305

Interpretation: according to the Welch two-sample t-test, we fail to reject the null hypothesis (p-value > 0.05), so there is no statistically significant difference in the mean time to get to the parents between males and females.
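As a cross-check, the Welch statistic can be reproduced by hand from the group means, standard deviations and sample sizes reported above. This is a sketch of the textbook formulas (Welch t and the Welch-Satterthwaite degrees of freedom); because the standard deviations are rounded, the result agrees with the t.test() output only approximately.

```r
# Summary statistics taken from the describeBy() and t.test() output above
m1 <- 195.6113; s1 <- 382.47; n1 <- 391   # Male
m2 <- 180.7305; s2 <- 352.11; n2 <- 397   # Female

# Squared standard errors of the two group means
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (m1 - m2) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
```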

Effect size (t-test)

cohen.d(ESS10_ttminpnt$ttminpnt ~ ESS10_ttminpnt$gndr, na.rm = T)

## 
## Cohen's d
## 
## d estimate: 0.04049353 (negligible)
## 95 percent confidence interval:
##       lower       upper 
## -0.09938187  0.18036893

Interpretation: the Cohen’s d effect size estimate is 0.04. This value indicates a negligible effect size, which means that there is very little difference between the mean values of the two groups being compared. This is consistent with the result of the t-test.
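The estimate can be reproduced from the summary statistics reported earlier, assuming the classic pooled-standard-deviation definition of Cohen’s d. A minimal sketch:

```r
# Group means, standard deviations and sizes from the output above
m1 <- 195.6113; s1 <- 382.47; n1 <- 391   # Male
m2 <- 180.7305; s2 <- 352.11; n2 <- 397   # Female

# Pooled standard deviation of the two groups
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d <- (m1 - m2) / s_pooled
print(round(d, 4))
```

A difference of about 15 minutes is small relative to a pooled standard deviation of several hundred minutes, which is why d is close to zero.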

Non-parametric alternative to the t-test

Since our data are not normally distributed, the t-test is not fully reliable in this case. So we run its non-parametric alternative (the Wilcoxon rank sum test) to double-check the results.

H0: The distribution of time to get to the parents in minutes is the same for males and females (the location shift is equal to 0).

HA: The distribution of time to get to the parents in minutes differs between males and females (the location shift is not equal to 0).

wilcox.test(ESS10_ttminpnt$ttminpnt ~ ESS10_ttminpnt$gndr)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ESS10_ttminpnt$ttminpnt by ESS10_ttminpnt$gndr
## W = 72632, p-value = 0.1183
## alternative hypothesis: true location shift is not equal to 0

Interpretation: according to the Wilcoxon rank sum test, our p-value is 0.1183, which is greater than 0.05. So, we fail to reject the null hypothesis, which means there is no significant difference in the time needed to get to parents between women and men.

wilcox_effsize(ttminpnt ~ gndr, data = ESS10_ttminpnt, na.rm = T)

## # A tibble: 1 × 7
##   .y.      group1 group2 effsize    n1    n2 magnitude
## * <chr>    <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 ttminpnt Male   Female  0.0556   391   397 small

Interpretation: the effect size is 0.056, which the function labels as a small effect. This is consistent with the result of the non-parametric test: there is very little difference in the time needed to get to parents between males and females.
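The effect size r for the Wilcoxon test is defined as Z divided by the square root of the total sample size. The sketch below approximates Z from the two-sided p-value printed by wilcox.test() above; this is an assumption made for illustration, since wilcox_effsize() obtains Z from the test itself, so small discrepancies are possible.

```r
# Values taken from the wilcox.test() output and the group sizes above
p <- 0.1183
N <- 391 + 397

# Approximate |Z| from the two-sided p-value via the normal quantile function
z <- qnorm(1 - p / 2)

# Effect size r = Z / sqrt(N)
r <- z / sqrt(N)
print(round(r, 4))
```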

Conclusions and answer to the RQ: Based on the visualizations and tests, the data are not normally distributed. According to the tests we ran, there is no statistically significant difference in the time in minutes needed to get to the parents between females and males. Thus, there is not enough evidence to state that Swiss people of different genders differ in the mean time in minutes spent on getting to their parents.

ANOVA

Research question: Is there a relation between the amount of time people spend to get to their parents and the frequency of speaking with them in person in Switzerland?

The study conducted by Schwarz, Trommsdorff, Albert and Mayer examined the quality of parent-child relationships. One of the measures they used in the analysis was “residential distance”. They found that residential distance correlates negatively with emotional and instrumental types of support, especially in mother-child relationships. Therefore, we hypothesized that there is a relation between the distance between parent and child and the frequency of their communication.

Reference: Beate Schwarz; Gisela Trommsdorff; Isabelle Albert; Boris Mayer (2005). Adult Parent–Child Relationships: Relationship Quality, Support, and Reciprocity. , 54(3), 396–417. doi:10.1111/j.1464-0597.2005.00217.x

Data inspection

ESS10_anova <- ESS2 %>% 
  filter(cntry == "CH" & speakpnt <= 7 & ttminpnt != 6666) %>% 
  select(idno, ttminpnt, speakpnt)

First variable: speakpnt. This variable answers the question “How often do you speak with them in person? Please only include occasions where you are physically in the same location.” It indicates the frequency of speaking to parents in person.

ESS10_anova$speakpnt <- factor(ESS10_anova$speakpnt, labels = c('Several times a day', 'Once a day', 'Several times a week', 'Several times a month', 
                                                    'Once a month', 'Less often', 'Never' ), ordered = T)
class(ESS10_anova$speakpnt)

## [1] "ordered" "factor"

summary(ESS10_anova$speakpnt)

##   Several times a day            Once a day  Several times a week 
##                    24                    34                   177 
## Several times a month          Once a month            Less often 
##                   234                    86                   210 
##                 Never 
##                    36

The second variable ttminpnt was described previously - Travel time to parent, in minutes

ESS10_anova$ttminpnt <- as.numeric(ESS10_anova$ttminpnt)
class(ESS10_anova$ttminpnt)

## [1] "numeric"

summary(ESS10_anova$ttminpnt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    10.0    30.0   335.6   239.0  8888.0

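The variable “ttminpnt” is a continuous variable. Note that the maximum of 8888 is an ESS missing-value code rather than a real travel time: the filter above removes only 6666, so the codes 7777, 8888 and 9999 can remain in the data and inflate the mean (335.6 here versus 188.1 in the t-test sample, where all four codes were removed).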
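The summary above shows a maximum of 8888, which is one of the missing-value codes that were filtered out in the t-test section (6666, 7777, 8888, 9999). A minimal sketch of recoding all four codes to NA in one step is shown below; the vector `ttminpnt_raw` is a hypothetical toy example, not the ESS data.

```r
# Hypothetical travel times with ESS missing codes mixed in (NOT real data)
ttminpnt_raw <- c(0, 10, 30, 240, 6666, 7777, 8888, 9999)

# The four special codes filtered out in the t-test section
missing_codes <- c(6666, 7777, 8888, 9999)

# Replace the special codes with NA so that they cannot distort the statistics
ttminpnt_clean <- ifelse(ttminpnt_raw %in% missing_codes, NA, ttminpnt_raw)

print(summary(ttminpnt_clean))
max_clean <- max(ttminpnt_clean, na.rm = TRUE)
```

Recoding to NA rather than dropping rows keeps the data frame intact for variables that are valid on the same row.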

Descriptive plot

Now let’s make a box plot in order to inspect our data

ggplot(ESS10_anova)+
  geom_boxplot(aes(x=speakpnt, y=ttminpnt), fill="#367588", col="#6a5acd", alpha = 0.5)+
   scale_x_discrete(na.translate = FALSE)+
  ggtitle("Relationship between the frequency of live communication with parents and time of people getting to parents")+
  xlab("How often speak")+
  ylab("Time to parent")+
  theme(axis.text = element_text(size = 7, angle=90))

Interpretation: some groups differ visually, while others do not. The difference is hard to assess because the many outliers compress the boxes.

Let’s group the categories by approximate frequency

ESS10_anova$speak <- rep(NA, length(ESS10_anova$speakpnt)) #new variable with grouped data from speakpnt

ESS10_anova$speak [ESS10_anova$speakpnt == "Several times a day"| 
ESS10_anova$speakpnt == "Once a day"] <- "Daily"

ESS10_anova$speak [ESS10_anova$speakpnt == "Several times a week" ] <- "Weekly"

ESS10_anova$speak [ESS10_anova$speakpnt  == "Several times a month"| 
ESS10_anova$speakpnt  == "Once a month" ] <- "Monthly" 

ESS10_anova$speak [ESS10_anova$speakpnt  == "Less often"] <- "Less often"
ESS10_anova$speak [ESS10_anova$speakpnt  == "Never" ] <- "Never" 

ESS10_anova$speak  <- as.factor(ESS10_anova$speak)

ESS10_anova$speak <- factor(ESS10_anova$speak, levels = c("Daily", "Weekly", "Monthly", "Less often", "Never"))
table(ESS10_anova$speak)

## 
##      Daily     Weekly    Monthly Less often      Never 
##         58        177        320        210         36

And make a box plot for the new groups of variables

ggplot(ESS10_anova)+
  geom_boxplot(aes(x=speak, y=ttminpnt), fill="#367588", col="#6a5acd", alpha = 0.5)+
   scale_x_discrete(na.translate = FALSE)+
  ggtitle("Relationship between the frequency of live communication with parents and time of people getting to parents")+
  xlab("How often speak")+
  ylab("Time to parent")+
  theme(axis.text = element_text(size = 7, angle=90))

Interpretation: we still see some difference between the groups with different frequency of speaking with parents in person, but it is hard to assess the significance of this difference only visually.

Checking assumptions for ANOVA test

Homogeneity of variances

H0 variances are equal

H1 variances are not equal

leveneTest(ESS10_anova$ttminpnt ~ ESS10_anova$speak)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   4  19.414 2.956e-15 ***
##       796                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Variances are not equal, as the p-value is less than 0.05. Thus, we will use var.equal = F in our ANOVA test later.
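Levene’s test with center = median (the Brown-Forsythe variant shown in the output header) is equivalent to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below demonstrates this on simulated data with deliberately unequal spreads; the data and object names are hypothetical.

```r
set.seed(42)

# Simulated data (hypothetical): three groups with clearly different spreads
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  y     = c(rnorm(100, sd = 1), rnorm(100, sd = 3), rnorm(100, sd = 6))
)

# Absolute deviation of each observation from its own group median
sim$abs_dev <- abs(sim$y - ave(sim$y, sim$group, FUN = median))

# One-way ANOVA on the absolute deviations = Levene's test (center = median)
levene_fit <- anova(lm(abs_dev ~ group, data = sim))
levene_p   <- levene_fit[["Pr(>F)"]][1]
print(levene_fit)
```

A small p-value here means that the typical distance from the group median differs between groups, i.e. the variances are not homogeneous.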

Performing F-test

oneway.test(ESS10_anova$ttminpnt ~ ESS10_anova$speak, var.equal = F)

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  ESS10_anova$ttminpnt and ESS10_anova$speak
## F = 11.486, num df = 4.00, denom df = 178.95, p-value = 2.541e-08

str(oneway.test(ESS10_anova$ttminpnt ~ ESS10_anova$speak, var.equal = F))

## List of 5
##  $ statistic: Named num 11.5
##   ..- attr(*, "names")= chr "F"
##  $ parameter: Named num [1:2] 4 179
##   ..- attr(*, "names")= chr [1:2] "num df" "denom df"
##  $ p.value  : num 2.54e-08
##  $ method   : chr "One-way analysis of means (not assuming equal variances)"
##  $ data.name: chr "ESS10_anova$ttminpnt and ESS10_anova$speak"
##  - attr(*, "class")= chr "htest"

Checking the residuals

one.way.anova <- aov(ESS10_anova$ttminpnt ~ ESS10_anova$speak)
summary(one.way.anova)

##                    Df    Sum Sq  Mean Sq F value Pr(>F)    
## ESS10_anova$speak   4 118573377 29643344   24.19 <2e-16 ***
## Residuals         796 975437988  1225425                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than 0.05, the difference in the time needed to get to parents across the frequency groups is statistically significant.

Now let’s check the second assumption for ANOVA: the normality of residuals

plot(one.way.anova, 2)

We see that the points are not lying along the diagonal line, so our distribution is far from normal

Now lets check the normality of residuals using a test

anova_residuals <- residuals(one.way.anova)
describe(anova_residuals)

##    vars   n mean      sd median trimmed   mad      min     max   range skew
## X1    1 801    0 1104.22 -84.38 -117.09 71.57 -1597.25 8750.35 10347.6 6.14
##    kurtosis    se
## X1     41.1 39.02

The skew and kurtosis are much more than 2, so we again see that the residuals are not normal

shapiro.test(x = anova_residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  anova_residuals
## W = 0.33506, p-value < 2.2e-16

The residuals are clearly not normal, as the p-value is extremely low

hist(anova_residuals)

Visually we also see that the residuals are not normal, as their distribution is skewed to the right and has many outliers

Since not all the assumptions for ANOVA are met (namely, our residuals are not normally distributed), we will use the non-parametric alternative to ANOVA, the Kruskal-Wallis test.

kruskal.test(ESS10_anova$ttminpnt ~ ESS10_anova$speakpnt)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  ESS10_anova$ttminpnt by ESS10_anova$speakpnt
## Kruskal-Wallis chi-squared = 391.25, df = 6, p-value < 2.2e-16

The p-value is less than 0.05, so there is a significant difference between the mean ranks of the frequency groups (here the seven original categories of speakpnt)

Post-Hoc for non parametric test

DunnTest(ESS10$ttminpnt ~ ESS10$speakpnt)

## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##       mean.rank.diff    pval    
## 2-1       -542.97148 8.4e-14 ***
## 3-1       -717.23890 < 2e-16 ***
## 4-1       -657.65260 < 2e-16 ***
## 5-1       -559.38737 < 2e-16 ***
## 6-1       -361.76776 1.1e-15 ***
## 7-1       -295.86223  0.0027 ** 
## 66-1       133.62426  0.0076 ** 
## 77-1       -98.62574  1.0000    
## 88-1       502.62426  1.0000    
## 3-2       -174.26741  0.2120    
## 4-2       -114.68111  1.0000    
## 5-2        -16.41588  1.0000    
## 6-2        181.20373  0.1575    
## 7-2        247.10926  0.1575    
## 66-2       676.59574 < 2e-16 ***
## 77-2       444.34574  1.0000    
## 88-2      1045.59574  0.2439    
## 4-3         59.58630  1.0000    
## 5-3        157.85153  0.0893 .  
## 6-3        355.47114 3.9e-16 ***
## 7-3        421.37667 5.5e-07 ***
## 66-3       850.86316 < 2e-16 ***
## 77-3       618.61316  0.6543    
## 88-3      1219.86316  0.0893 .  
## 5-4         98.26523  0.9297    
## 6-4        295.88484 1.2e-12 ***
## 7-4        361.79037 2.6e-05 ***
## 66-4       791.27686 < 2e-16 ***
## 77-4       559.02686  0.9297    
## 88-4      1160.27686  0.1286    
## 6-5        197.61961  0.0058 ** 
## 7-5        263.52514  0.0342 *  
## 66-5       693.01163 < 2e-16 ***
## 77-5       460.76163  1.0000    
## 88-5      1062.01163  0.2221    
## 7-6         65.90553  1.0000    
## 66-6       495.39202 < 2e-16 ***
## 77-6       263.14202  1.0000    
## 88-6       864.39202  0.6543    
## 66-7       429.48649 4.1e-08 ***
## 77-7       197.23649  1.0000    
## 88-7       798.48649  0.9297    
## 77-66     -232.25000  1.0000    
## 88-66      369.00000  1.0000    
## 88-77      601.25000  1.0000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the output above (with codes 1 to 7 corresponding to the speakpnt categories from “Several times a day” to “Never”), the following pairs differ significantly at the 0.05 level: Several times a day versus each of the other six categories; Less often versus Several times a week, Several times a month and Once a month; and Never versus Several times a week, Several times a month and Once a month. Note that this call uses the unfiltered ESS10 data, so the missing-value codes 66, 77 and 88 appear as separate groups; comparisons involving them are not substantively meaningful, and the ranks of the valid groups are also affected by uncleaned ttminpnt values.
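The header of the Dunn test output shows that the p-values are adjusted with the Holm method. A minimal sketch of what this adjustment does, using three made-up raw p-values and base R’s p.adjust():

```r
# Three hypothetical raw p-values from pairwise comparisons
p_raw <- c(0.01, 0.04, 0.03)

# Holm: the smallest p-value is multiplied by 3, the next by 2, the largest by 1,
# and the adjusted values are forced to be non-decreasing in rank order
p_holm <- p.adjust(p_raw, method = "holm")
print(p_holm)
```

With 45 pairwise comparisons, as in the output above, the smallest raw p-value is multiplied by 45, which is why many moderately small raw p-values end up at 1.0000.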

Effect size

epsilonSquared(x = ESS10$ttminpnt, g = ESS10$speakpnt)

## epsilon.squared 
##            0.71

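The value printed above (0.71) is computed on the unfiltered ESS10 data, which include the missing-value codes. For the filtered ESS10_anova data used in the Kruskal-Wallis test, epsilon-squared is 0.489, which still represents a large effect, so the difference among the groups of frequency of speaking to parents in person is both statistically significant and substantial.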
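Epsilon-squared for a Kruskal-Wallis test can be obtained directly from the test statistic as H / (n − 1). The sketch below applies this formula to the Kruskal-Wallis output reported earlier (H = 391.25) and to the 801 observations in ESS10_anova.

```r
# Kruskal-Wallis chi-squared statistic and sample size from the output above
H <- 391.25
n <- 801

# Epsilon-squared: share of the variability in ranks accounted for by the grouping
eps_sq <- H / (n - 1)
print(round(eps_sq, 3))
```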

Conclusions and answer to the RQ: In conclusion, we can see a relation between residential distance and the frequency of in-person communication in parent-child relations: the larger the distance between parent and child, the less frequently they communicate in person. We see statistical support for our research hypothesis.

Project 3

As we mentioned in our previous projects, we chose Switzerland as a country for our analysis. Switzerland is known for its rich economy and stable political system. Switzerland is the world’s ninth-happiest country according to the World Happiness Report 2024. We wondered how the social interactions of people in this country affect their sense of general well-being and happiness.

The overall topic of our research is: ‘Digital and social contacts within family and workplace and its relation to subjective well-being and happiness’. However, for this part of our research we decided to focus on feelings of happiness and to look at potential social factors that may influence happiness levels.

Thus, our research question is: What factors connected to social contacts are related to people’s level of happiness?

Happiness is currently considered one of the most important individual goals in human life. We decided to focus on people’s happiness because we believe that happiness is the most general indicator of a person’s emotional state and well-being. It is known that there are statistically significant factors that influence the level of people’s happiness (for example: health, earnings, etc.). We wondered whether other, less known factors related to social interactions can influence people’s happiness levels. For example, the book by Prilleltensky [1] is devoted to a qualitative analysis of the influence of social factors of belonging to different groups on people’s happiness. From the theories of social psychology it is known that belonging to certain communities positively affects the general mental state of a person. Communities and quality social interactions provide a supportive and positive environment.

A related study was conducted using data from Holland, where researchers found correlations between the frequency and quality of people’s social connections and their overall sense of happiness. [2] A number of other studies have also found correlations between the quality of social relationships in the family and happiness. For example, a study by Tammisalo, K., Danielsbacka, M., Tanskanen, A. O., & Arpino, B. reveals how relationships with different family members are related to levels of happiness. [3] In addition to family contact, research reveals the importance of work relationships in influencing an individual’s happiness. [4]

Thus, we propose the following research hypotheses:

The more social contacts a person has and the more often he/she participates in social activities the higher his/her level of happiness.
The more a person feels that he belongs to a community of colleagues, the higher his level of happiness.
The better a person rates the closeness of their relationship with their parents, the higher their level of happiness.
The more work relationships interfere with relationships with family, the lower a person’s level of happiness.

References:

[1] Prilleltensky, I., & Prilleltensky, O. (2021). How people matter: Why it affects health, happiness, love, work, and society. Cambridge University Press.
[2] Arampatzi, E., Burger, M. J., & Novik, N. (2018). Social network sites, individual social capital and happiness. Journal of Happiness Studies, 19, 99-122.
[3] Tammisalo, K., Danielsbacka, M., Tanskanen, A. O., & Arpino, B. (2024). Social media contact with family members and happiness in younger and older adults. Computers in Human Behavior, 153, 108103.
[4] Haar, J., Schmitz, A., Di Fabio, A., & Daellenbach, U. (2019). The role of relationships at work and happiness: A moderated moderated mediation study of New Zealand managers. Sustainability, 11(12), 3443.

Descriptive statistics of used variables

Upload data

ESS <- read.csv('/Users/admin/Downloads/ESS10/ESS10.csv', header = T)

Filtering the data

ESS <- ESS %>% 
  filter(cntry == "CH") %>% 
  select(idno, sclact, sclmeet, closepnt, teamfeel,  happy, hhlipnt, colprop, jbprtfp)

The table with the description of the variables we use for the analysis

Label = c("idno", "sclact", "sclmeet", "closepnt", "teamfeel",  "happy", "jbprtfp") 
Meaning = c("Respondent's identification number", "Taking part in social activities", "How often socially meet with friends, relatives or colleagues", "How close a person feels to parent", "Feeling like part of your work team", "How happy the person is", "Job prevents you from giving time to partner/family, how often")
Level_Of_Measurement <- c("Nominal", "Quasi interval", "Quasi interval", "Quasi interval", "Quasi interval", "Quasi interval", "Ordinal")  # idno is an identifier, hence nominal
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label	Meaning	Level_Of_Measurement
idno	Respondent’s identification number	Nominal
sclact	Taking part in social activities	Quasi interval
sclmeet	How often socially meet with friends, relatives or colleagues	Quasi interval
closepnt	How close a person feels to parent	Quasi interval
teamfeel	Feeling like part of your work team	Quasi interval
happy	How happy the person is	Quasi interval
jbprtfp	Job prevents you from giving time to partner/family, how often	Ordinal

Investigating the variables

Outcome

“happy” - How happy are you

table(ESS$happy)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   4   7  12  17  51  62 220 537 380 231

ESS$happy <- as.numeric(ESS$happy)

ESS_happy<- ESS %>% 
  select (happy)

table(ESS_happy$happy)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   4   7  12  17  51  62 220 537 380 231

summary(ESS_happy$happy)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   8.000   8.086   9.000  10.000

class(ESS_happy$happy)

## [1] "numeric"

ggplot(ESS_happy)+
  geom_histogram( aes(x = happy), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("How happy people feel") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(happy), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(happy), color = 'median'), linetype="solid", linewidth = 2.5)+
  geom_vline(aes(xintercept = Mode(happy), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Feeling of happiness")

The data are not normally distributed: there is a long left tail, so the distribution is skewed to the left. The mean (8.09), median (8) and mode (8) are approximately equal, meaning that Swiss people report a high level of subjective well-being.
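The three measures of central tendency can be verified directly from the frequency table printed above. The sketch below rebuilds the variable from the counts; the vector name `happy` and the which.max() shortcut for the mode are illustrative choices, not the Mode() helper defined at the start of the report.

```r
# Frequencies of the answers 0..10, copied from table(ESS$happy) above
counts <- c(2, 4, 7, 12, 17, 51, 62, 220, 537, 380, 231)

# Rebuild the individual answers from the frequency table
happy <- rep(0:10, times = counts)

mean_happy   <- mean(happy)
median_happy <- median(happy)
# Mode: the answer with the highest frequency
mode_happy   <- as.numeric(names(which.max(table(happy))))

print(c(mean = round(mean_happy, 3), median = median_happy, mode = mode_happy))
```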

describeBy(ESS_happy$happy, group = ESS_happy$happy >0)

## 
##  Descriptive statistics by group 
## group: FALSE
##    vars n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 2    0  0      0       0   0   0   0     0  NaN      NaN  0
## ------------------------------------------------------------ 
## group: TRUE
##    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 1521  8.1 1.46      8    8.26 1.48   1  10     9 -1.33     3.12 0.04

Interpretation: Skew (-1.33) is not normal (less than - 0.5). And kurtosis (3.12) is not normal (more than 1), as the graph above tells us (not normally distributed with a sharp top and long left tail). So distribution is not normal.

Predictors

“sclact” - Taking part in social activities

table(ESS$sclact)

## 
##   1   2   3   4   5   7   8 
## 116 455 691 206  31   1  23

ESS_sclact <- ESS %>% 
  select( sclact) %>% 
  filter(sclact < 6)

table(ESS_sclact$sclact)

## 
##   1   2   3   4   5 
## 116 455 691 206  31

class(ESS_sclact$sclact)

## [1] "integer"

summary(ESS_sclact$sclact)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    2.72    3.00    5.00

ESS_sclact$sclact <- as.numeric(ESS_sclact$sclact)

ggplot(ESS_sclact)+
  geom_histogram( aes(x = sclact), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("Taking part in social activities") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(sclact), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(sclact), color = 'median'), linetype="solid", linewidth = 3)+
  geom_vline(aes(xintercept = Mode(sclact), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("The degree of participation in social activities (compared to others of same age)")

The distribution is close to normal, although slightly right-skewed. Most people in Switzerland think that they take part in social activities at about the same level as others of their age (“3” stands for “About the same”). Fewer people think they participate less than others (“1” and “2”, “Much less than most” and “Less than most” respectively). The smallest number of people believe they participate much more than their peers (“5” stands for “Much more than most”).

describeBy(ESS_sclact, group = ESS_sclact$sclact >0)

## 
##  Descriptive statistics by group 
## group: TRUE
##        vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## sclact    1 1499 2.72 0.87      3    2.72 1.48   1   5     4 0.05    -0.04 0.02

Interpretation: Skew (0.05) is normal (less than 0.5). And kurtosis (-0.04) is normal (within +-1), as the graph above tells us (relatively normal histogram without very sharp top and long tails). So distribution is rather normal according to these results.

“sclmeet” - How often socially meet with friends, relatives or colleagues

table(ESS$sclmeet)

## 
##   1   2   3   4   5   6   7  88 
##   7  64 137 321 333 491 169   1

ESS$sclmeet <- as.numeric(ESS$sclmeet)

ESS_sclmeet<- ESS %>% 
  select (sclmeet) %>% 
  filter(sclmeet != 88)

table(ESS_sclmeet$sclmeet)

## 
##   1   2   3   4   5   6   7 
##   7  64 137 321 333 491 169

summary(ESS_sclmeet$sclmeet)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   5.000   5.009   6.000   7.000

ESS_sclmeet$sclmeet <- as.numeric(ESS_sclmeet$sclmeet)

ggplot(ESS_sclmeet)+
  geom_histogram( aes(x = sclmeet), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("How often socially meet with friends, relatives or colleagues") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(sclmeet), color = 'mean'), linetype="solid", linewidth = 2.5) +
  geom_vline(aes(xintercept = median(sclmeet), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(sclmeet), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Social meetings")

The data are slightly skewed to the left; the mean (5.01) and median (5) nearly coincide. As the mode shows, the most frequent response of Swiss respondents is that they meet with friends, relatives or colleagues several times a week (“6” stands for “Several times a week”). Far fewer people report that they never meet with friends, relatives or colleagues or do so less than once a month (1 and 2 respectively).

describeBy(ESS_sclmeet, group = ESS_sclmeet$sclmeet >0)

## 
##  Descriptive statistics by group 
## group: TRUE
##         vars    n mean   sd median trimmed  mad min max range skew kurtosis
## sclmeet    1 1522 5.01 1.34      5    5.08 1.48   1   7     6 -0.5    -0.39
##           se
## sclmeet 0.03

Interpretation: Skew (-0.5) is at the edge of the ±0.5 range and kurtosis (-0.39) is within ±1, in line with the graph above (no very sharp peak, but a small left tail). So the distribution is approximately normal according to these results.

“closepnt” - How close a respondent feels to parent

table(ESS$closepnt)

## 
##   1   2   3   4   5   6   7   8 
## 211 441 241  69  23 536   1   1

# Filter the observations
ESS_closepnt <- ESS %>% 
  select(closepnt) %>% 
  filter(closepnt < 6)

table(ESS_closepnt$closepnt)

## 
##   1   2   3   4   5 
## 211 441 241  69  23

# We need to invert the scale first, as in the initial scale 1 stands for "Extremely close" and 5 is for "Not at all close"
ESS_closepnt$closepnt <- as.numeric (6 - ESS_closepnt$closepnt)
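
The inversion above can be checked without the ESS file. The following is a minimal sketch, assuming only the filtered closepnt counts printed above; `reverse_code` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: reverse-code an item so that higher values mean "closer".
# For a 1..5 scale this is the same as 6 - x.
reverse_code <- function(x, lo = 1, hi = 5) {
  (lo + hi) - x
}

# Rebuild the filtered closepnt values from the frequency table above
closepnt_raw <- rep(1:5, times = c(211, 441, 241, 69, 23))
closepnt_rev <- reverse_code(closepnt_raw)

# The counts should appear in reversed order after recoding
table(closepnt_rev)
```

Reversing twice returns the original codes, which is a quick sanity check that no category was lost.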


ggplot(ESS_closepnt)+
  geom_histogram( aes(x = closepnt), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("How close a respondent feels to parent") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(closepnt), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(closepnt), color = 'median'), linetype="solid", linewidth = 3)+
  geom_vline(aes(xintercept = Mode(closepnt), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("The degree of closeness to parents")

The histogram is left-skewed, so the data deviate somewhat from a normal distribution. The most common response (441 of 985 respondents) is that they are very close to their parents (which is “4” after we recode the variable), while only a small minority (23) say they are not close at all. So people in Switzerland tend to feel close to their parents, and fewer people have distant relationships within the family.

describeBy(ESS_closepnt, group = ESS_closepnt$closepnt > 0)

## 
##  Descriptive statistics by group 
## group: TRUE
##          vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## closepnt    1 985 3.76 0.94      4    3.85 1.48   1   5     4 -0.67     0.29
##            se
## closepnt 0.03

Interpretation: Skew (-0.67) is outside the ±0.5 range, while kurtosis (0.29) is within ±1. This agrees with the graph above (no very sharp peak, but a noticeable left tail). So the distribution is close to normal but moderately left-skewed.

“teamfeel” - Feel like part of your work team, how much

table(ESS$teamfeel)

## 
##   0   1   2   3   4   5   6   7   8   9  10  55  66  77  88 
##   7   4   6  13   7  17  23  90 212 187 316 100 530   4   7

ESS$teamfeel <- as.numeric(ESS$teamfeel)

# Filter the observations
ESS_teamfeel <- ESS %>% 
  select (teamfeel) %>% 
  filter (teamfeel <=10)

table(ESS_teamfeel$teamfeel)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   7   4   6  13   7  17  23  90 212 187 316

summary(ESS_teamfeel$teamfeel)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   9.000   8.475  10.000  10.000

ggplot(ESS_teamfeel)+
  geom_histogram( aes(x = teamfeel), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("Feeling like part of your working team") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(teamfeel), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(teamfeel), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(teamfeel), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Feeling like a part of your work team")

The graph is skewed to the left, so our data are not normally distributed. The most frequent response of Swiss respondents is that they completely feel like part of a team (“10”). The median response is 9 and the mean is about 8.5. All measures of central tendency are higher than 8, meaning respondents mostly feel that they belong to their team, which suggests they feel comfortable in their work teams.

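The variable “teamfeel” is measured on a scale from 0 to 10. Because we call describeBy() with the grouping condition teamfeel > 0, respondents who answered 0 would be placed in a separate group and excluded from the main statistics. To keep all respondents in one group we shift the scale to 1-11, with all values keeping their meaning. Adding a constant does not change skew or kurtosis, but the mean, median and mode reported below are one unit higher than on the original 0-10 scale.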
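
As a quick check, here is a minimal sketch showing that adding a constant leaves skew and kurtosis unchanged while moving the mean by that constant. It rebuilds teamfeel from the frequency table above. `skewness` and `excess_kurtosis` are hypothetical moment-based helpers, and psych may apply a slightly different small-sample correction, so the printed values need not match describeBy() to the last digit.

```r
# Rebuild teamfeel (0-10) from the frequency table reported above
teamfeel <- rep(0:10, times = c(7, 4, 6, 13, 7, 17, 23, 90, 212, 187, 316))

# Hypothetical moment-based helpers (not the psych implementation)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

shifted <- teamfeel + 1  # the 1-11 version used for describeBy()

c(skew_original = skewness(teamfeel), skew_shifted = skewness(shifted))
c(kurt_original = excess_kurtosis(teamfeel), kurt_shifted = excess_kurtosis(shifted))
c(mean_original = mean(teamfeel), mean_shifted = mean(shifted))
```

Since shape statistics are shift-invariant, calling psych::describe() on the original 0-10 variable (without a grouping condition) would be an alternative that avoids the recode altogether.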

ESS_teamfeel$teamfeel <-  (ESS_teamfeel$teamfeel + 1)
table (ESS_teamfeel$teamfeel)

## 
##   1   2   3   4   5   6   7   8   9  10  11 
##   7   4   6  13   7  17  23  90 212 187 316

library(psych)
describeBy(ESS_teamfeel, group = ESS_teamfeel$teamfeel > 0)

## 
##  Descriptive statistics by group 
## group: TRUE
##          vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## teamfeel    1 882 9.48 1.81     10     9.8 1.48   1  11    10 -2.02     5.37
##            se
## teamfeel 0.06

Interpretation: Skew (-2.02) is well outside the ±0.5 range and kurtosis (5.37) is far above 1, in line with the graph above (a sharp peak and a long left tail). So the distribution is not normal.
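
The rule of thumb used in these interpretations (skew within ±0.5, kurtosis within ±1) can be applied to all four variables at once. This is a minimal sketch with values copied from the describeBy() outputs above; `looks_normal` is a hypothetical helper and the cut-offs are the ones stated in this report, not a formal test.

```r
# Hypothetical helper encoding the rule of thumb used in this report
looks_normal <- function(skew, kurtosis, skew_cut = 0.5, kurt_cut = 1) {
  abs(skew) <= skew_cut & abs(kurtosis) <= kurt_cut
}

# Skew and kurtosis copied from the describeBy() outputs above
shape <- data.frame(
  variable = c("sclact", "sclmeet", "closepnt", "teamfeel"),
  skew     = c(0.05, -0.50, -0.67, -2.02),
  kurtosis = c(-0.04, -0.39, 0.29, 5.37)
)
shape$approx_normal <- looks_normal(shape$skew, shape$kurtosis)
shape
```

Note that sclmeet sits exactly on the skew cut-off, so its classification depends on whether the boundary is treated as inclusive.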

jbprtfp - Job prevents you from giving time to partner/family, how often

table(ESS$jbprtfp)

## 
##   1   2   3   4   5   6  66  77  88 
## 126 277 348 177  18  35 530   4   8

# Filtering observations 
ESS_jbprtfp <- ESS %>% 
  select(idno, jbprtfp) %>% 
  filter(jbprtfp < 6)

table(ESS_jbprtfp$jbprtfp)

## 
##   1   2   3   4   5 
## 126 277 348 177  18

class(ESS_jbprtfp$jbprtfp)

## [1] "integer"

#Recode into 3 categories
ESS_jbprtfp$jbprtfp <- dplyr::recode(ESS_jbprtfp$jbprtfp,
                             "1"= "Never/hardly ever",
                             "2"= "Never/hardly ever",
                             "3"= "Sometimes",
                             "4"= "Often/always",
                             "5"= "Often/always")

# After recoding, the variable holds text labels, so we convert it to an ordered factor
ESS_jbprtfp$jbprtfp <- factor(ESS_jbprtfp$jbprtfp, levels = c("Never/hardly ever", "Sometimes", "Often/always"), ordered= T)



ggplot(ESS_jbprtfp %>% 
         filter(!is.na(jbprtfp))) +
  geom_bar(aes(x = jbprtfp), fill="#CCCCFF", col="#FF7F50", alpha = 0.5) +
  xlab("The frequency of job preventing from giving time to partner/family") + 
  ylab("Number of people") +
  ggtitle("The frequency of job preventing from giving time to partner/family")

Most Swiss respondents report that their job never or hardly ever prevents them from devoting time to their partner or family. The fewest respondents say that their job often or always prevents them from doing so. An intermediate number report that their job sometimes keeps them from giving time to the people close to them.

Table of descriptive statistics

v.sclact <- c(round(mean(ESS_sclact$sclact), 2), Mode(ESS_sclact$sclact), median(ESS_sclact$sclact))
names(v.sclact) <- c("mean", "mode", "median")

v.sclmeet <- c(round(mean(ESS_sclmeet$sclmeet), 2), Mode(ESS_sclmeet$sclmeet), median(ESS_sclmeet$sclmeet))
names(v.sclmeet) <- c("mean", "mode", "median")

v.closepnt <- c(round(mean(ESS_closepnt$closepnt), 2), Mode(ESS_closepnt$closepnt), median(ESS_closepnt$closepnt))
names(v.closepnt) <- c("mean", "mode", "median")

v.teamfeel <- c(round(mean(ESS_teamfeel$teamfeel), 2), Mode(ESS_teamfeel$teamfeel), median(ESS_teamfeel$teamfeel))
names(v.teamfeel) <- c("mean", "mode", "median")

v.happy <- c(round(mean(ESS_happy$happy), 2), Mode(ESS_happy$happy), median(ESS_happy$happy))
names(v.happy) <- c("mean", "mode", "median")

v.jbprtfp <- c(NA, Mode(ESS_jbprtfp$jbprtfp), "Sometimes")
names(v.jbprtfp) <- c("mean", "mode", "median")


tendencymeasures =  data.frame(v.sclact, v.sclmeet, v.closepnt, v.teamfeel, v.happy,  v.jbprtfp,  stringsAsFactors = FALSE)
kable(tendencymeasures) %>%    
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	v.sclact	v.sclmeet	v.closepnt	v.teamfeel	v.happy	v.jbprtfp
mean	2.72	5.01	3.76	9.48	8.09	NA
mode	3.00	6.00	4.00	11.00	8.00	Never/hardly ever
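median	3.00	5.00	4.00	10.00	8.00	Sometimes

Note: the teamfeel values in this table are on the shifted 1-11 scale (original value plus 1). On the original 0-10 scale the mean is 8.48, the mode is 10 and the median is 9.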
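
The median category for jbprtfp is typed in by hand in the code above. A minimal sketch of how it could be derived instead, assuming only the category counts implied by the jbprtfp table (126 + 277, 348, 177 + 18); `median_category` is a hypothetical helper.

```r
# Counts of the three recoded jbprtfp categories, derived from the table above
counts <- c("Never/hardly ever" = 126 + 277,
            "Sometimes"         = 348,
            "Often/always"      = 177 + 18)

# Hypothetical helper: the median category of an ordered variable is the
# first category whose cumulative share reaches 50%
median_category <- function(counts) {
  cum_share <- cumsum(counts) / sum(counts)
  names(counts)[which(cum_share >= 0.5)[1]]
}

median_category(counts)            # median category
names(counts)[which.max(counts)]   # modal category
```

This agrees with the table: the mode and the median fall in different categories, which is possible for ordinal data.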

Correlation analysis

For our correlation analysis we chose a continuous outcome, happy (respondents report how happy they are), and tested its correlation with four variables.

Our variables:

1) sclact - Taking part in social activities

2) closepnt - How close a person feels to parent

3) teamfeel - Feeling like part of your team

4) sclmeet - How often socially meet with friends, relatives or colleagues

Filtering the data

ESS_cor1 <- ESS %>% 
  select (sclact, happy, idno)%>%
  filter(sclact != 7 & sclact != 8)  %>% 
  filter (happy < 11)

ESS_cor2 <- ESS %>% 
  select (sclmeet, happy, idno)%>%
  filter(sclmeet <=7) %>% 
  filter (happy < 11) 

  
ESS_cor3 <- ESS %>% 
  select (closepnt, happy, idno)%>%
  filter(closepnt <=5) %>% 
  filter (happy < 11)

#Recoding the initial scale of closepnt variable
ESS_cor3$closepnt <- (6 - ESS_cor3$closepnt)

table(ESS_cor3$closepnt)

## 
##   1   2   3   4   5 
##  23  69 241 441 211

ESS_cor4 <- ESS %>% 
  select (teamfeel, happy, idno)%>%
  filter(teamfeel <= 10) %>% 
  filter (happy < 11)

Checking the class of the variables for correlation and change if needed

class(ESS$happy)

## [1] "numeric"

class(ESS$sclact)

## [1] "integer"

class(ESS$sclmeet)

## [1] "numeric"

class(ESS$closepnt)

## [1] "integer"

class(ESS$teamfeel)

## [1] "numeric"

# Changing the variable type to numeric for correlation
ESS$happy <- as.numeric(ESS$happy)
ESS$sclact <- as.numeric(ESS$sclact)
ESS$sclmeet <- as.numeric(ESS$sclmeet)
ESS$closepnt <- as.numeric(ESS$closepnt)
ESS$teamfeel <- as.numeric(ESS$teamfeel)

Checking assumptions for correlation

Shapiro test to define the distribution of the variables and decide what test to use

options(scipen = 999)

shapiro.test(ESS$happy)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS$happy
## W = 0.8598, p-value < 0.00000000000000022

shapiro.test(ESS$sclact)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS$sclact
## W = 0.80976, p-value < 0.00000000000000022

shapiro.test(ESS$sclmeet)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS$sclmeet
## W = 0.34364, p-value < 0.00000000000000022

shapiro.test(ESS$closepnt)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS$closepnt
## W = 0.80974, p-value < 0.00000000000000022

shapiro.test(ESS$teamfeel)

## 
##  Shapiro-Wilk normality test
## 
## data:  ESS$teamfeel
## W = 0.69648, p-value < 0.00000000000000022

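The Shapiro-Wilk tests reject normality for all five variables (p < 0.001), so for the correlation analysis we apply the Spearman test. Note that the tests above were run on the unfiltered ESS columns, which for some variables (for example sclmeet and teamfeel) still contain missing-value codes such as 88, so the W statistics for those variables are affected by these codes. The ordinal nature of the items is in itself a sufficient reason to prefer Spearman.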
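
The unfiltered sclmeet column contains a single 88 code (see the first sclmeet table). The following is a minimal sketch of how much one such code can move the Shapiro-Wilk statistic, assuming the column consists only of the counts printed in that table.

```r
# Rebuild sclmeet from the frequency tables above: valid codes 1..7,
# plus the single 88 code present in the unfiltered column
sclmeet_valid <- rep(1:7, times = c(7, 64, 137, 321, 333, 491, 169))
sclmeet_raw   <- c(sclmeet_valid, 88)

w_raw   <- shapiro.test(sclmeet_raw)$statistic    # with the missing code
w_valid <- shapiro.test(sclmeet_valid)$statistic  # valid answers only

c(W_with_missing_code = unname(w_raw), W_valid_only = unname(w_valid))
```

Running the test on the filtered subsets (ESS_cor1 to ESS_cor4) rather than on ESS would keep the assumption check consistent with the data actually used in cor.test().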

Correlation 1 - taking part in social activities and feeling happy

Statistical hypotheses for correlation analysis:

H0: There is no association between taking part in social activities and feeling happy

HA: There is an association between taking part in social activities and feeling happy

cor.test(ESS_cor1$sclact, ESS_cor1$happy, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  ESS_cor1$sclact and ESS_cor1$happy
## S = 497645342, p-value = 0.00001053
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.113525

The p-value (0.00001053) is below 0.05, so we can reject the null hypothesis. We conclude that sclact and happy are significantly correlated (there is a monotonic relationship between them) with a correlation coefficient of 0.113525 (0.11): a positive but small, statistically significant correlation between taking part in social activities and feeling happy.
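
Spearman's rho is the Pearson correlation computed on ranks, which is why it only requires a monotonic relationship. A minimal sketch on toy data (the numbers below are made up for illustration and are not taken from the ESS file):

```r
# Toy data, invented for illustration only
happy  <- c(8, 9, 7, 10, 6, 8, 9, 5)
sclact <- c(3, 3, 2, 4, 2, 3, 5, 1)

rho_builtin <- cor(sclact, happy, method = "spearman")
rho_by_rank <- cor(rank(sclact), rank(happy))  # Pearson on average ranks

c(builtin = rho_builtin, by_rank = rho_by_rank)
```

Ties receive average ranks in both computations, which matters for survey items like these where many respondents share the same answer.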

library(ggpubr)
ggscatter(ESS_cor1, x = "happy", y = "sclact", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",  # was misspelled as corr.method
          xlab = "feeling happy", 
          ylab = "taking part in social activities") +  
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.5)

Interpretation: The fitted line shows a positive trend (rising to the right), but the points are widely scattered and lie close to the line only on the right side, which means the association is not very strong.

Correlation 2 - the frequency of social meetings with friends, relatives or colleagues and feeling happy

Statistical hypotheses for correlation:

H0: There is no association between the frequency of social meets with friends, relatives or colleagues and feeling happy

HA: There is an association between the frequency of social meets with friends, relatives or colleagues and feeling happy

cor.test(ESS_cor2$sclmeet, ESS_cor2$happy, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  ESS_cor2$sclmeet and ESS_cor2$happy
## S = 546702171, p-value = 0.006581
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.06962447

The p-value (0.006581) is below 0.05, so we can reject the null hypothesis. We conclude that sclmeet and happy are significantly correlated (there is a monotonic relationship between them) with a correlation coefficient of 0.06962447 (0.07): a positive but very small, statistically significant correlation between the frequency of social meetings with friends, relatives or colleagues and feeling happy.

ggscatter(ESS_cor2, x = "happy", y = "sclmeet", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "feeling happy", 
          ylab = "the frequency of socially meet with friends, relatives or colleagues")+  
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.5)

Interpretation: We see a positive trend (rising to the right): as the frequency of social meetings with friends, relatives or colleagues increases, the feeling of happiness also increases. The points are not very close to the line, which indicates a weak association, although there is a cluster on the right side.

Correlation 3 - The feeling of closeness to a parent and feeling happy

Statistical hypotheses for correlation:

H0: There is no association between the feeling of closeness and feeling happy

HA: There is an association between the feeling of closeness and feeling happy

cor.test(ESS_cor3$closepnt, ESS_cor3$happy, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  ESS_cor3$closepnt and ESS_cor3$happy
## S = 135515083, p-value = 0.000002566
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1491938

The p-value (0.000002566) is below 0.05, so we can reject the null hypothesis. We conclude that closepnt and happy are significantly correlated (there is a monotonic relationship between them) with a correlation coefficient of 0.1491938 (0.15): a positive but small, statistically significant correlation between the feeling of closeness to a parent and feeling happy.

library(ggpubr)
ggscatter(ESS_cor3, x = "happy", y = "closepnt", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "feeling happy", 
          ylab = "The feeling of closeness to a parent")+
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.5)

Interpretation: The fitted line shows a positive trend (rising to the right), but the points are widely scattered and lie close to the line only on the right side, which means the association is not very strong.

Correlation 4 - Feeling like a part of a work team and feeling happy

Statistical hypotheses for correlation:

H0: There is no association between the feeling like a part of a team and feeling happy

HA: There is an association between the feeling like a part of a team and feeling happy

cor.test(ESS_cor4$teamfeel, ESS_cor4$happy, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  ESS_cor4$teamfeel and ESS_cor4$happy
## S = 91192793, p-value = 0.000000001281
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.2025443

The p-value (0.000000001281) is below 0.05, so we can reject the null hypothesis. We conclude that teamfeel and happy are significantly correlated (there is a monotonic relationship between them) with a correlation coefficient of 0.2025443 (0.2): a positive but small, statistically significant correlation between feeling like a part of a team and feeling happy.

ggscatter(ESS_cor4, x = "happy", y = "teamfeel", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Feeling happy", 
          ylab = "Feeling like a part of a team") +  
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.5)

Interpretation: The fitted line shows a positive trend (rising to the right): as the feeling of being part of a team increases, the feeling of happiness also increases. The points are widely scattered around the line, so the association is not very strong (we can also notice a cluster in the right corner).

Correlation matrix

ESS_corr1 <- merge(ESS_cor1, ESS_cor2, all = TRUE)
ESS_corr2 <- merge(ESS_cor3, ESS_cor4, all = TRUE)

ESS_corr <- merge(ESS_corr1, ESS_corr2, all = TRUE)


ESS_corr <- ESS_corr %>% select(-idno)

tab_corr(ESS_corr[, 1:5], 
         corr.method = "spearman", wrap.labels = 70)

	happy	sclact	sclmeet	closepnt	teamfeel
happy		0.120**	0.074*	0.124***	0.214***
sclact	0.120**		0.249***	0.147***	0.091*
sclmeet	0.074*	0.249***		0.166***	0.089*
closepnt	0.124***	0.147***	0.166***		0.129***
teamfeel	0.214***	0.091*	0.089*	0.129***
Computed correlation used spearman-method with listwise-deletion.
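
The matrix above is built from a full outer merge followed by listwise deletion, so its coefficients are computed on fewer respondents than the pairwise cor.test() results and can differ slightly from them. A minimal sketch of both steps on toy frames (made-up values, not ESS data):

```r
# Toy frames (made-up values) mimicking two filtered subsets
a <- data.frame(idno = 1:3, happy = c(8, 9, 7), sclact = c(2, 3, 3))
b <- data.frame(idno = 2:4, happy = c(9, 7, 6), sclmeet = c(5, 6, 4))

# all = TRUE keeps respondents that appear in either frame (full outer join);
# the join keys are the shared columns idno and happy
merged <- merge(a, b, all = TRUE)
merged

# Listwise deletion keeps only rows with no missing values
complete <- merged[complete.cases(merged), ]
nrow(complete)
```

Respondents missing from one subset get NA in that subset's column, and those rows drop out under listwise deletion.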

A graphical table of the correlation:

sjp.corr(ESS_corr[, 1:5], wrap.labels = 100, decimals = 3, corr.method = "spearman")

Regression

ESS_reg <- ESS %>% 
  filter(happy < 77 & teamfeel <= 10 & jbprtfp < 6)

ESS_reg$teamfeel <- as.numeric(ESS_reg$teamfeel)
ESS_reg$jbprtfp <-  as.factor(ESS_reg$jbprtfp)
ESS_reg$happy <- as.numeric(ESS_reg$happy)

ESS_reg$jbprtfp <- dplyr::recode(ESS_reg$jbprtfp,
                             "1"= "Never/hardly ever",
                             "2"="Never/hardly ever",
                             "3"="Sometimes",
                             "4"="Often/always",
                             "5"="Often/always")
table(ESS_reg$jbprtfp)

## 
## Never/hardly ever         Sometimes      Often/always 
##               353               313               178

m1 <- lm(happy ~ teamfeel, data = ESS_reg)
m2<- lm(happy ~ teamfeel + jbprtfp, data = ESS_reg)
sjPlot::tab_model(m1, m2, show.ci = F)

	happy		happy
Predictors	Estimates	p	Estimates	p
(Intercept)	6.75	<0.001	7.07	<0.001
teamfeel	0.16	<0.001	0.15	<0.001
jbprtfp [Sometimes]			-0.25	0.020
jbprtfp [Often/always]			-0.61	<0.001
Observations	844		844
R² / R² adjusted	0.044 / 0.042		0.070 / 0.067

Model 1

H0: There is no significant relation between feeling happy (outcome) and the team feeling (continuous predictor)

HA: There is a significant relation between feeling happy (outcome) and the team feeling (continuous predictor)

Model 2

H0: There is no significant relation between feeling happy (outcome) and how often the job prevents giving time to partner/family (categorical predictor)

HA: There is a significant relation between feeling happy (outcome) and how often the job prevents giving time to partner/family (categorical predictor)

First we looked at the relation between feeling happy and the team feeling and built the first model. The coefficient is significant (p-value < 0.001) and the adjusted R² equals 0.042, i.e. the model explains about 4.2% of the variance in the dependent variable. We then added the variable on how often the job prevents giving time to partner/family. Both of its categories differ significantly from the reference category (p-value = 0.020 for “Sometimes” and p-value < 0.001 for “Often/always”), and the adjusted R² rises to 0.067, meaning that the second model explains about 6.7% of the variance. The R² is still relatively small, but the second model explains feeling happy better. Let’s compare the models formally using an ANOVA test.

anova(m1,m2)

## Analysis of Variance Table
## 
## Model 1: happy ~ teamfeel
## Model 2: happy ~ teamfeel + jbprtfp
##   Res.Df    RSS Df Sum of Sq      F      Pr(>F)    
## 1    842 1581.4                                    
## 2    840 1537.3  2    44.139 12.059 0.000006864 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the results of the ANOVA test, the second model has a smaller RSS (1537.3) than the first (1581.4), i.e. the first model leaves more variation unexplained, so the second model fits the data better. The improvement is statistically significant (F = 12.059, p-value = 6.864e-06).
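
The F statistic in the ANOVA table can be reproduced by hand from the printed sums of squares and degrees of freedom. A minimal sketch, with all inputs copied from the anova(m1, m2) output above (they are rounded, so the result is approximate):

```r
# Values copied from the anova(m1, m2) table above
rss_m2      <- 1537.3   # residual sum of squares, model 2
ss_diff     <- 44.139   # reduction in RSS when jbprtfp is added
df_diff     <- 2        # extra parameters in model 2
df_resid_m2 <- 840      # residual degrees of freedom, model 2

# F = (explained improvement per extra parameter) / (residual variance of model 2)
f_stat  <- (ss_diff / df_diff) / (rss_m2 / df_resid_m2)
p_value <- pf(f_stat, df_diff, df_resid_m2, lower.tail = FALSE)

round(f_stat, 3)
signif(p_value, 3)
```

The two degrees of freedom correspond to the two dummy variables that the three-level jbprtfp factor adds to the model.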

sjPlot::tab_model(m2, show.ci = F)

	happy
Predictors	Estimates	p
(Intercept)	7.07	<0.001
teamfeel	0.15	<0.001
jbprtfp [Sometimes]	-0.25	0.020
jbprtfp [Often/always]	-0.61	<0.001
Observations	844
R² / R² adjusted	0.070 / 0.067

sjPlot::plot_models(m2)

Interpretation: Based on the model results we see a significant relation between feeling happy and both predictors: team feeling and how often the job prevents giving time to partner/family. For each one-point increase in feeling part of a team, predicted happiness increases by 0.15, holding the other predictor constant. The reference jbprtfp category in our model is a job which never/hardly ever prevents giving time to partner/family. The intercept (7.07) is the predicted happiness of a person in this reference category whose teamfeel equals 0. Comparing the other categories with the reference one at the same level of team feeling, people whose job sometimes prevents them from giving time to partner/family are 0.25 points less happy (-0.25 estimate coefficient), and people whose job often/always does so are 0.61 points less happy (-0.61 estimate coefficient).

To summarise, people are happier when their job does not prevent them from giving time to their partner and family, and the feeling of happiness increases the more one feels part of a team.
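
The comparison with a reference category comes from the way lm() turns a factor into 0/1 indicator columns. A minimal sketch with a toy factor (four made-up observations) that uses the same level order as the model:

```r
# Toy factor with the same level order as in the model
jbprtfp <- factor(
  c("Never/hardly ever", "Sometimes", "Often/always", "Sometimes"),
  levels = c("Never/hardly ever", "Sometimes", "Often/always")
)

# model.matrix() shows the design matrix lm() builds internally:
# an intercept plus one 0/1 indicator per non-reference level
X <- model.matrix(~ jbprtfp)
X
```

The first level has no column of its own: it is represented by zeros in both indicators, which is why its effect is absorbed into the intercept.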

Constructing regression model equation

The general equation looks like this: E(Y) = β0 + β1X1 + β2I2 + β3I3, where X1 is the continuous predictor and I2, I3 are 0/1 indicators for the two non-reference categories of the categorical predictor.

The equation was requested for the whole model with its significant coefficients, so in our case it looks like this:

feeling happy = 7.07 + 0.15 * team feeling - 0.25 * job sometimes prevents giving time to partner/family - 0.61 * job often/always prevents giving time to partner/family
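
To make the equation concrete, here is a minimal sketch that evaluates it for a given respondent profile. The coefficients are the rounded estimates from the tab_model(m2) output, and `predict_happy` is a hypothetical helper, so the results are approximations of what predict(m2, ...) would return.

```r
# Coefficients copied from the tab_model(m2) output above (rounded)
b0 <- 7.07; b_team <- 0.15; b_sometimes <- -0.25; b_often <- -0.61

# Hypothetical helper that evaluates the fitted equation;
# the logical comparisons act as the 0/1 indicator variables
predict_happy <- function(teamfeel, jbprtfp = "Never/hardly ever") {
  b0 + b_team * teamfeel +
    b_sometimes * (jbprtfp == "Sometimes") +
    b_often * (jbprtfp == "Often/always")
}

predict_happy(10, "Never/hardly ever")  # 7.07 + 0.15 * 10
predict_happy(10, "Sometimes")          # additionally - 0.25
predict_happy(10, "Often/always")       # additionally - 0.61
```

The differences between the three calls equal the dummy coefficients, because teamfeel is held constant at 10.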

Summary of findings

Let’s now go back to our research hypotheses and take a general overview of our results based on the previous analysis:

1) The more social contacts a person has and the more often he/she participates in social activities the higher his/her level of happiness.

The hypothesis was supported. People who meet other people and take part in social activities more frequently are happier, although both correlations are small (0.07 and 0.11).

2) The more a person feels that he belongs to a community of colleagues, the higher his level of happiness.

The hypothesis was supported. People who feel more like a part of their working team are happier.

3) The better a person rates the closeness of their relationship with their parents, the higher their level of happiness.

The hypothesis was supported. People who have closer relationships with their parents are happier.

4) The more work relationships interfere with relationships with family, the lower a person’s level of happiness.

The hypothesis was supported. People whose job prevents them from giving time to their family are less happy.

Going back to our discussion of explaining happiness, we see that our analysis describes the relationships only partially. The share of variance in happiness explained by our predictors (our social factors) is low (adjusted R² of 0.067 in the better model). Therefore, we conclude that there are other factors (such as income and health) which probably explain happiness better.

Project 4

This study is a continuation of our study number 3. Here we aim to expand our understanding of the relationship between social contacts and happiness.

As a reminder, the overall topic of our research is: ‘Digital and social contacts within family and workplace and its relation to subjective well-being and happiness’. However, for this part of our research we decided to focus on feelings of happiness and to look at potential social factors that may influence happiness levels.

Thus, our research question is: What factors influence the relation between connection to social groups and feelings of happiness?

A number of other studies have also found a relationship between the quality of social relationships in the family and happiness. For example, a study by Tammisalo, K., Danielsbacka, M., Tanskanen, A. O., & Arpino, B. reveals how relationships with different family members are related to the level of happiness. [1] In addition to family contact, research reveals the importance of work relationships in influencing an individual’s happiness. [2]

In our project, we focused on two types of communities: co-workers and family. These groups were chosen because of their qualitative differences from each other. In a coworking community, people are in a more formal setting, while a family is based on personal relationships between its members.

References:

[1] Tammisalo, K., Danielsbacka, M., Tanskanen, A. O., & Arpino, B. (2024). Social media contact with family members and happiness in younger and older adults. Computers in Human Behavior, 153, 108103.

[2] Haar, J., Schmitz, A., Di Fabio, A., & Daellenbach, U. (2019). The role of relationships at work and happiness: A moderated moderated mediation study of New Zealand managers. Sustainability, 11(12), 3443.

Descriptive statistics

The table with the description of the variables we use for the analysis

For this investigation we chose variables that were shown to have a statistically significant correlation in the previous project.

Label = c("idno", "closepnt", "teamfeel",  "happy", "hhlipnt", "colprop") 
Meaning = c("Respondent's identification number", "How close a person feels to parent", "Feeling like part of your work team", "How happy the person is", "Parent lives in same household with a respondent", "Proportion of colleagues based at the same location")
# idno is only an identifier, so its level of measurement is nominal rather than ratio
Level_Of_Measurement <- c("Nominal (identifier)", "Quasi interval", "Quasi interval", "Quasi interval", "Nominal, binary", "Ordinal")
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label	Meaning	Level_Of_Measurement
idno	Respondent’s identification number	Nominal (identifier)
closepnt	How close a person feels to parent	Quasi interval
teamfeel	Feeling like part of your work team	Quasi interval
happy	How happy the person is	Quasi interval
hhlipnt	Parent lives in same household with a respondent	Nominal, binary
colprop	Proportion of colleagues based at the same location	Ordinal

Investigating the variables

Outcome

“happy” - How happy are you

table(ESS$happy)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   4   7  12  17  51  62 220 537 380 231

ESS$happy <- as.numeric(ESS$happy)

ESS_happy<- ESS %>% 
  select (happy)

table(ESS_happy$happy)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   4   7  12  17  51  62 220 537 380 231

summary(ESS_happy$happy)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   8.000   8.086   9.000  10.000

class(ESS_happy$happy)

## [1] "numeric"

ggplot(ESS_happy)+
  geom_histogram( aes(x = happy), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("How happy people feel") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(happy), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(happy), color = 'median'), linetype="solid", linewidth = 2.5)+
  geom_vline(aes(xintercept = Mode(happy), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Feeling of happiness")

describeBy(ESS_happy$happy, group = ESS_happy$happy >0)

## 
##  Descriptive statistics by group 
## group: FALSE
##    vars n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 2    0  0      0       0   0   0   0     0  NaN      NaN  0
## ------------------------------------------------------------ 
## group: TRUE
##    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 1521  8.1 1.46      8    8.26 1.48   1  10     9 -1.33     3.12 0.04

Predictors

“closepnt” - How close a respondent feels to parent

table(ESS$closepnt)

## 
##   1   2   3   4   5   6   7   8 
## 211 441 241  69  23 536   1   1

# Filter the observations
ESS_closepnt <- ESS %>% 
  select(closepnt) %>% 
  filter(closepnt < 6)

table(ESS_closepnt$closepnt)

## 
##   1   2   3   4   5 
## 211 441 241  69  23

# We need to invert the scale first, as in the initial scale 1 stands for "Extremely close" and 5 is for "Not at all close"
ESS_closepnt$closepnt <- as.numeric (6 - ESS_closepnt$closepnt)


ggplot(ESS_closepnt)+
  geom_histogram( aes(x = closepnt), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("How close a respondent feels to parent") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(closepnt), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(closepnt), color = 'median'), linetype="solid", linewidth = 3)+
  geom_vline(aes(xintercept = Mode(closepnt), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("The degree of closeness to parents")

describeBy(ESS_closepnt, group = ESS_closepnt$closepnt > 0)

## 
##  Descriptive statistics by group 
## group: TRUE
##          vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## closepnt    1 985 3.76 0.94      4    3.85 1.48   1   5     4 -0.67     0.29
##            se
## closepnt 0.03

“teamfeel” - Feel like part of your work team, how much

table(ESS$teamfeel)

## 
##   0   1   2   3   4   5   6   7   8   9  10  55  66  77  88 
##   7   4   6  13   7  17  23  90 212 187 316 100 530   4   7

ESS$teamfeel <- as.numeric(ESS$teamfeel)

# Filter the observations
ESS_teamfeel <- ESS %>% 
  select (teamfeel) %>% 
  filter (teamfeel <=10)

table(ESS_teamfeel$teamfeel)

## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   7   4   6  13   7  17  23  90 212 187 316

summary(ESS_teamfeel$teamfeel)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.000   9.000   8.475  10.000  10.000

ggplot(ESS_teamfeel)+
  geom_histogram( aes(x = teamfeel), binwidth = 1, fill="#367588", col="#6a5acd", alpha = 0.5) +
  xlim(c(0, 10))+
  xlab("Feeling like part of your working team") + 
  ylab("Number of people") +
  scale_x_continuous(breaks= seq(0, 10, by=1))+
  geom_vline(aes(xintercept = mean(teamfeel), color = 'mean'), linetype="solid", linewidth = 1) +
  geom_vline(aes(xintercept = median(teamfeel), color = 'median'), linetype="solid", linewidth = 1)+
  geom_vline(aes(xintercept = Mode(teamfeel), color = 'mode'), linetype="solid", linewidth = 1) +
  scale_color_manual(name = "Measurement", values = c(median = "#1e90ff", mean = "#ffb6c1", mode = "#C9F76F"))+
  ggtitle("Feeling like a part of your work team")

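# Shift the scale by 1 (0-10 becomes 1-11): the grouping condition teamfeel > 0
# used in describeBy() below would otherwise split off the zeros.
# Note: all statistics computed after this line are on the shifted 1-11 scale.
ESS_teamfeel$teamfeel <- ESS_teamfeel$teamfeel + 1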
table (ESS_teamfeel$teamfeel)

## 
##   1   2   3   4   5   6   7   8   9  10  11 
##   7   4   6  13   7  17  23  90 212 187 316

library(psych)
describeBy(ESS_teamfeel, group = ESS_teamfeel$teamfeel > 0)

## 
##  Descriptive statistics by group 
## group: TRUE
##          vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## teamfeel    1 882 9.48 1.81     10     9.8 1.48   1  11    10 -2.02     5.37
##            se
## teamfeel 0.06
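
Because the scale was shifted by 1 before calling describeBy(), the mean (9.48) and median (10) above are one point higher than in the summary() output, while the standard deviation is not affected. A minimal sketch that rebuilds the responses from the frequency table shown earlier; the names `counts`, `teamfeel_raw` and `teamfeel_shifted` are introduced here for illustration only and are not part of the analysis:

```r
# Frequencies of teamfeel on the original 0-10 scale (copied from the table above)
counts <- c(7, 4, 6, 13, 7, 17, 23, 90, 212, 187, 316)
teamfeel_raw <- rep(0:10, times = counts)

# Adding a constant moves the mean and the median but leaves the spread unchanged
teamfeel_shifted <- teamfeel_raw + 1

mean(teamfeel_raw)      # mean on the original 0-10 scale
mean(teamfeel_shifted)  # exactly one point higher
median(teamfeel_raw)    # median on the original scale
all.equal(sd(teamfeel_raw), sd(teamfeel_shifted))  # the standard deviation is the same
```

If the statistics are needed on the original scale, subtracting 1 from the reported mean, median, mode, min and max is enough.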

“hhlipnt” - Parent lives in the same household as the respondent

table (ESS$hhlipnt)

## 
##   1   2   6   7 
## 183 803 536   1

#Filtering the observations
ESS_hhlipnt <- ESS %>% 
  select(hhlipnt) %>% 
  filter(hhlipnt <=2)

table(ESS_hhlipnt$hhlipnt)

## 
##   1   2 
## 183 803

class(ESS_hhlipnt$hhlipnt)

## [1] "integer"

# R stores this variable as an integer, so we convert it to an ordered factor with labels
ESS_hhlipnt$hhlipnt <- factor(ESS_hhlipnt$hhlipnt, labels = c("Yes", "No"), ordered = TRUE)


ggplot(ESS_hhlipnt)+
  geom_bar(aes(x = hhlipnt), fill="#367588", col="#6a5acd", alpha = 0.5)+
  xlab("Living with parents: yes or no") + 
  ylab("Number of people") +
  ggtitle("Living with parents")

The majority of respondents (803 of 986) report that they do not live in the same household as their parents.

“colprop” - Proportion of colleagues based at the same location

table(ESS$colprop)

## 
##   1   2   3   4   5   6   7  55  66  77  88 
## 175 185  82 131  93 154  78  78 530   5  12

# Filtering observations 
ESS_colprop <- ESS %>% 
  select(idno, colprop) %>% 
  filter(colprop <= 7)

table(ESS_colprop$colprop)

## 
##   1   2   3   4   5   6   7 
## 175 185  82 131  93 154  78

class(ESS_colprop$colprop)

## [1] "integer"

# recode in 3 categories
ESS_colprop$colprop <- dplyr::recode(ESS_colprop$colprop,
                             "7"= "Small or none",
                             "6"= "Small or none",
                             "5"="A half",
                             "4"="A half",
                             "3"="A half", 
                             "2"="Very large",
                             "1"="Very large")


# recode() returns character values, so we convert the variable to an ordered factor
ESS_colprop$colprop <- factor(ESS_colprop$colprop, levels = c("Small or none", "A half", "Very large"), ordered = TRUE)

ggplot(ESS_colprop %>% 
         filter(!is.na(colprop))) +
  geom_bar(aes(x = colprop), fill="#CCCCFF", col="#FF7F50", alpha = 0.5) +
  xlab("The shares of colleagues at the same location") + 
  ylab("Number of people") +
  ggtitle("Proportion of colleagues based at the same location on a normal working day")

The graph shows that the largest group of respondents work with all or a very large share of their colleagues at the same location on a normal working day (“All” and “Very large” on the initial scale). The smallest group report that few or none of their colleagues are based at the same location. A medium-sized group report that about half of their colleagues are based at the same location.
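
To put numbers on this comparison, the three collapsed categories can be recomputed directly from the frequencies of the original seven answer codes. This is a sketch under stated assumptions: the lookup vector `groups` mirrors the recoding used above, and the object names are hypothetical.

```r
# Frequencies of the original 7 answer codes (copied from table(ESS$colprop) above)
counts <- c("1" = 175, "2" = 185, "3" = 82, "4" = 131, "5" = 93, "6" = 154, "7" = 78)

# Lookup from original code to collapsed category (same mapping as the recode() call)
groups <- c("1" = "Very large", "2" = "Very large",
            "3" = "A half", "4" = "A half", "5" = "A half",
            "6" = "Small or none", "7" = "Small or none")

# Sum the frequencies within each collapsed category and put them in scale order
collapsed <- tapply(counts, groups[names(counts)], sum)
collapsed <- collapsed[c("Small or none", "A half", "Very large")]

# Share of respondents in each category, in percent
shares <- round(100 * collapsed / sum(collapsed), 1)
print(collapsed)
print(shares)
```

The largest category holds roughly 40% of the valid answers, so it is the most common answer rather than an absolute majority.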

Descriptive statistics table

table(ESS_colprop$colprop)

## 
## Small or none        A half    Very large 
##           232           306           360

table(ESS_jbprtfp$jbprtfp)

## 
## Never/hardly ever         Sometimes      Often/always 
##               403               348               195

v.closepnt <- c(round(mean(ESS_closepnt$closepnt), 2), Mode(ESS_closepnt$closepnt), median(ESS_closepnt$closepnt))
names(v.closepnt) <- c("mean", "mode", "median")

v.teamfeel <- c(round(mean(ESS_teamfeel$teamfeel), 2), Mode(ESS_teamfeel$teamfeel), median(ESS_teamfeel$teamfeel))
names(v.teamfeel) <- c("mean", "mode", "median")

v.happy <- c(round(mean(ESS_happy$happy), 2), Mode(ESS_happy$happy), median(ESS_happy$happy))
names(v.happy) <- c("mean", "mode", "median")

v.hhlipnt <- c(NA, Mode(ESS_hhlipnt$hhlipnt), NA)
names(v.hhlipnt) <- c("mean", "mode", "median") 

v.colprop <- c(NA, Mode(ESS_colprop$colprop), "A half")
names(v.colprop) <- c("mean", "mode", "median")



tendencymeasures =  data.frame( v.closepnt, v.teamfeel, v.happy, v.hhlipnt, v.colprop,  stringsAsFactors = FALSE)
kable(tendencymeasures) %>%    
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

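	v.closepnt	v.teamfeel	v.happy	v.hhlipnt	v.colprop
mean	3.76	9.48	8.09	NA	NA
mode	4.00	11.00	8.00	No	Very large
median	4.00	10.00	8.00	NA	A half

Note: the teamfeel values are reported on the shifted 1-11 scale that was used for describeBy(). On the original 0-10 scale the mean is 8.48, the mode is 10 and the median is 9.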
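
The median of colprop is typed in by hand in the code above. As a cross-check, it can be derived from the ordered factor itself. The sketch below rebuilds the variable from the reported frequencies; `lev`, `colprop`, `median_category` and `mode_category` are illustrative names, not objects from the analysis.

```r
# Rebuild the recoded variable from the frequencies reported above
lev <- c("Small or none", "A half", "Very large")
colprop <- factor(rep(lev, times = c(232, 306, 360)), levels = lev, ordered = TRUE)

# The median category is the level of the middle observation in sorted order;
# type = 1 makes quantile() return an observed code instead of interpolating
median_code <- unname(quantile(as.integer(colprop), probs = 0.5, type = 1))
median_category <- lev[median_code]

# The mode is the most frequent level
mode_category <- names(which.max(table(colprop)))

median_category
mode_category
```

This works only because the factor is ordered; for an unordered variable such as hhlipnt a median is not meaningful, which is why it is left as NA in the table.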

Multiple regression model

We start by selecting the variables and filtering out the missing-value codes. We then recode the categorical variables.

ESS_regr <- ESS %>% 
  filter(happy < 77 & closepnt  < 6 & hhlipnt < 6 & teamfeel <= 10 & colprop < 55)
table(ESS_regr$hhlipnt)

## 
##   1   2 
##  96 620

# Reverse the 1-5 closeness scale (6 - x) so that higher values mean a closer relationship
ESS_regr$closepnt <- as.numeric(6 - ESS_regr$closepnt)
ESS_regr$hhlipnt <-  as.factor(ESS_regr$hhlipnt)
ESS_regr$happy <- as.numeric(ESS_regr$happy)
ESS_regr$colprop <- as.factor(ESS_regr$colprop)

ESS_regr$colprop <- dplyr::recode(ESS_regr$colprop,
                             "1"= "Very large",
                             "2"="Very large",
                             "3"="A half",
                             "4"="A half",
                             "5"="A half", 
                             "6"="Small or none",
                             "7"="Small or none")

ESS_regr$hhlipnt <- dplyr::recode(ESS_regr$hhlipnt,
                             "1"= "Yes",
                             "2"="No")

Constructing additive model

m3 <- lm(happy ~ closepnt + hhlipnt + teamfeel + colprop, data = ESS_regr)
sjPlot::tab_model(m3, show.ci = F)

	happy
Predictors	Estimates	p
(Intercept)	6.08	<0.001
closepnt	0.16	0.006
hhlipnt [No]	0.18	0.239
teamfeel	0.16	<0.001
colprop [A half]	-0.04	0.707
colprop [Small or none]	-0.24	0.073
Observations	716
R² / R² adjusted	0.069 / 0.063

Interpretation: The model explains about 6.3% of the variance in happiness (adjusted R² = 0.063). It has two significant predictors of happiness in Switzerland: closeness to parents and feeling like part of a work team. Each one-point increase in closeness to parents is associated with a 0.16-point increase in happiness, holding the other predictors constant. The same applies to feeling like part of a team: the more connected a person feels to their work team, the happier they are, with each one-point increase associated with a 0.16-point increase in happiness. The two estimates are equal, but closeness is measured on a 5-point scale and team feeling on an 11-point scale, so the per-point effects are not directly comparable. Living with parents and the proportion of colleagues at the same location are not significant at the 0.05 level.
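
To make the additive model concrete, the sketch below computes predicted happiness by hand from the rounded coefficients in the table above. The helper `predict_happy()` and the vector `b` are hypothetical and serve only as an illustration; with the fitted object available, `predict(m3, newdata = ...)` would give unrounded values.

```r
# Rounded coefficients copied from the tab_model() output of m3
b <- c(intercept = 6.08, closepnt = 0.16, hhlipnt_no = 0.18,
       teamfeel = 0.16, colprop_half = -0.04, colprop_small = -0.24)

# Hypothetical helper: predicted happiness for one respondent profile
predict_happy <- function(closepnt, lives_with_parents, teamfeel, colprop) {
  unname(
    b["intercept"] +
      b["closepnt"] * closepnt +
      b["hhlipnt_no"] * (!lives_with_parents) +
      b["teamfeel"] * teamfeel +
      b["colprop_half"] * (colprop == "A half") +
      b["colprop_small"] * (colprop == "Small or none")
  )
}

# Two profiles that differ only by one point of team feeling
p1 <- predict_happy(closepnt = 4, lives_with_parents = FALSE, teamfeel = 8, colprop = "Very large")
p2 <- predict_happy(closepnt = 4, lives_with_parents = FALSE, teamfeel = 9, colprop = "Very large")

p1
p2 - p1  # the difference equals the teamfeel coefficient
```

Because the model has no interaction terms, the difference between the two predictions is the same whatever values are chosen for the other predictors.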

Multiple model with interactions

m4 <- lm(happy ~ closepnt * hhlipnt + teamfeel * colprop, data = ESS_regr)
sjPlot::tab_model( m4, show.ci = F)

	happy
Predictors	Estimates	p
(Intercept)	3.17	<0.001
closepnt	0.53	0.002
hhlipnt [No]	1.93	0.012
teamfeel	0.32	<0.001
colprop [A half]	1.99	0.006
colprop [Small or none]	1.36	0.030
closepnt × hhlipnt [No]	-0.43	0.018
teamfeel × colprop [A half]	-0.23	0.005
teamfeel × colprop [Small or none]	-0.18	0.010
Observations	716
R² / R² adjusted	0.090 / 0.079

Interpretation: In the model with interaction terms, all of the chosen predictors are significantly associated with happiness, and both moderations are significant as well. Because of the interactions, each main effect now describes the reference group, or the point where the other variable in the interaction equals zero.

Starting with closeness to parents: for respondents who live with their parents (the reference group), each one-point increase in closeness is associated with a 0.53-point increase in happiness. A positive effect is also found for feeling like part of a work team: in the reference group, where a very large share of colleagues work at the same location, a one-point increase in team feeling is associated with a 0.32-point increase in happiness.

Turning to the categorical predictors, the coefficient for not living with parents (1.93) is the expected difference in happiness at a closeness score of 0. This score lies outside the observed range of 1 to 5, so the coefficient should not be read as an average difference between the two groups. Similarly, the coefficients for “A half” (1.99) and “Small or none” (1.36) are differences from the “Very large” group at a team feeling score of 0, not average differences.

As for the interaction effects, living separately from parents lowers the slope of closeness by 0.43, from 0.53 to 0.10. This means that happiness is more strongly related to closeness to parents for people who live with their parents than for those who live separately. For the share of colleagues at the same location, having about half of them or few or none nearby reduces the effect of feeling like part of a team on happiness, compared with the group where a very large share work together. Notably, the reduction is larger for “A half” (0.23, giving a slope of 0.09) than for “Small or none” (0.18, giving a slope of 0.14).
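
The group-specific slopes implied by the interaction terms can be recomputed from the rounded coefficients in the table above. This is a sketch with illustrative variable names, not part of the original analysis: each slope is the reference slope plus the interaction coefficient of that group.

```r
# Rounded coefficients copied from the tab_model() output of m4
closepnt_ref     <- 0.53   # slope of closepnt when hhlipnt = "Yes" (reference)
closepnt_x_no    <- -0.43  # change in that slope when hhlipnt = "No"
teamfeel_ref     <- 0.32   # slope of teamfeel when colprop = "Very large" (reference)
teamfeel_x_half  <- -0.23  # change in that slope for "A half"
teamfeel_x_small <- -0.18  # change in that slope for "Small or none"

# Slope of closeness to parents within each living arrangement
closepnt_slopes <- c(Yes = closepnt_ref, No = closepnt_ref + closepnt_x_no)

# Slope of team feeling within each colleague-proportion group
teamfeel_slopes <- c("Very large"    = teamfeel_ref,
                     "A half"        = teamfeel_ref + teamfeel_x_half,
                     "Small or none" = teamfeel_ref + teamfeel_x_small)

round(closepnt_slopes, 2)
round(teamfeel_slopes, 2)
```

All slopes stay positive, so closeness and team feeling are positively related to happiness in every group; the interactions change only the strength of the relationship.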

Comparing the models

sjPlot::tab_model(m3, m4, show.ci = F)

	happy		happy
Predictors	Estimates	p	Estimates	p
(Intercept)	6.08	<0.001	3.17	<0.001
closepnt	0.16	0.006	0.53	0.002
hhlipnt [No]	0.18	0.239	1.93	0.012
teamfeel	0.16	<0.001	0.32	<0.001
colprop [A half]	-0.04	0.707	1.99	0.006
colprop [Small or none]	-0.24	0.073	1.36	0.030
closepnt × hhlipnt [No]			-0.43	0.018
teamfeel × colprop [A half]			-0.23	0.005
teamfeel × colprop [Small or none]			-0.18	0.010
Observations	716		716
R² / R² adjusted	0.069 / 0.063		0.090 / 0.079

anova(m3,m4)

## Analysis of Variance Table
## 
## Model 1: happy ~ closepnt + hhlipnt + teamfeel + colprop
## Model 2: happy ~ closepnt * hhlipnt + teamfeel * colprop
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    710 1337.2                                
## 2    707 1307.8  3    29.392 5.2964 0.001291 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

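According to the ANOVA results, the second model fits significantly better (F(3, 707) = 5.30, p = 0.001). Its RSS is lower (1307.8 against 1337.2) and its adjusted R² is higher (0.079 against 0.063), so the improvement remains after accounting for the three additional parameters.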
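
The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares: F = ((RSS1 - RSS2) / (df1 - df2)) / (RSS2 / df2). A minimal sketch using the values printed above; the variable names are illustrative only.

```r
# Values copied from the anova(m3, m4) output
df_additive    <- 710      # residual degrees of freedom, model 1
rss_interact   <- 1307.8   # residual sum of squares, model 2
df_interact    <- 707      # residual degrees of freedom, model 2
sum_sq         <- 29.392   # reduction in RSS (model 1 minus model 2)

# Number of extra parameters in the interaction model
df_diff <- df_additive - df_interact

# F statistic: mean reduction in RSS per extra parameter,
# divided by the residual variance of the larger model
f_stat <- (sum_sq / df_diff) / (rss_interact / df_interact)

# Upper-tail probability of the F distribution with (df_diff, df_interact) df
p_value <- pf(f_stat, df_diff, df_interact, lower.tail = FALSE)

f_stat
p_value
```

Small deviations from the printed values are expected because the inputs are rounded.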

Constructing plots

Plot 1 – closepnt*hhlipnt

library(ggplot2)
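ggplot(ESS_regr, aes(x=closepnt, y=happy, color=hhlipnt)) +
  # geom_jitter() already draws the points, so an extra geom_point() would plot each one twice
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.4) +
  # Regression lines are added last so that they are drawn on top of the points
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)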

Comment: the plot visually confirms the interpretation above. Closeness to parents is positively related to happiness in both groups, but the relationship is stronger in the “Yes” group, which is colored pink. We can draw this conclusion from the slopes of the regression lines: the pink line is steeper than the blue one, which corresponds to the “No” group.

Plot 2 – teamfeel*colprop

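ggplot(ESS_regr, aes(x=teamfeel, y=happy, color=colprop)) +
  # geom_jitter() already draws the points, so an extra geom_point() would plot each one twice
  geom_jitter(width = 0.45, height = 0.45, alpha = 0.4) +
  # Regression lines are added last so that they are drawn on top of the points
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)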

Comment: the effect of feeling like part of a work team on happiness is strongest in the group where a very large share of colleagues work in one physical place, as its regression line is the steepest of the three. In the “A half” and “Small or none” groups the effect is approximately the same: their lines have similar slopes, and both are flatter than the line for “Very large”.

Summary of findings

Several factors are related to happiness, and some of these relationships depend on a person’s living and working arrangements. In the family environment, happiness is related to closeness to parents, but this relationship is moderated by whether the respondent lives in the same household as their parents or separately from them. For people who live in the same household as their parents, closeness to them is more strongly related to their level of happiness. In the work environment, we have shown that the feeling of belonging to the team is positively related to happiness, and this relationship is stronger for people most of whose colleagues are based at the same location.

Data analysis Projects

Academic Weapons
