Project 1. Explaining subjective Well-Being after crisis: Financial security and employment status

library(readr)
library(haven)
library("wesanderson")
library(ggplot2)
library(dplyr)
library(forcats)
library(gridExtra)
library(formattable)

wvs <- read_csv("~/Course paper/wvs_final.csv")[,-1]
wvs2011 <- read_csv("~/Course paper/wvs_2011.csv")
wvs11 = wvs %>% filter(year == 2011)

wvs11 = wvs11 %>% dplyr::select(`E-equality`, satisf, happy, `SD-finance`, `SD-quantile`, `SD-gender`, `SD-age`) %>% mutate(`SD-employ` = wvs2011$V229, `lose_job` = wvs2011$V181)

wvs11 = wvs11 %>% filter(happy != 'NA')
wvs11 = wvs11 %>% filter(satisf != 'NA')

wvs11$`SD-employ`[wvs11$`SD-employ` == 8] = NA

#var_lab(wvs11$`lose_job`) #1 - very much 2-a great deal 3-not much 4-not at all
#table(wvs11$`lose_job`)

Introduction

The concept of personal subjective well-being has been of a great scientific interest for numerous sociologists all over the world during the last several decades. Of course, all countries have their specific histories and development trajectories which makes each country a unique object for analysis in terms of sociological perspective of subjective wellbeing. However, I purposefully decided to narrow down the scope of the analysis to the Russian case, since the past couple of decades in Russia’s history can be characterized as time of significant social, political, and cultural change.

Still, for simplicity, in this project I will consider only one time point - a WVS survey of 2011 (wave 6), which is the first years after the global economic crisis of 2008-2010. Having said that, my general idea will be based on employment reliability and financial satisfaction as the core factors determining subjective wellbeing of Russian citizens in 2011.

Of course, according to various literature, there are many other important factors determining personal wellbeing such as state of health, individual values concerning work-leisure balance, level of religiosity, level of political and civic engagement, emancipative values, as well as level of institutional and generalized trust. However, it is quite reasonable to assume, and numerous scientific works also point out that financial stability and self-provision capacity make a huge contribution to the perceived level of wellbeing, especially for those individuals recently gone through economic crisis.

Data Description

For this project I have chosen World Values Survey (WVS) dataset for Russian Federation, year 2011. Originally, it consisted of 2500 observations. However, after filtering out all the observations where outcome variables (happiness and life satisfaction) were missing, there were 2370 observations left for the analysis.

All the preliminary data manipulations (e.g. cleaning, variables selection, recoding, etc.) were performed earlier as this data is used as a part of my course work. In this project I will go straight to the task at hand. However, in case if you have any concerns or doubts, you are more than welcome to check this script.

Hypotheses

Hypotheses <- c("H1:", "H2:", "H3:", "H4:")

Description <- c("The level of wellbeing is positively associated with the level of satisfaction with financial situation in household", "The level of wellbeing is positively associated with the subjective placement on the income scale", "The level of wellbeing is positively associated with full-time stable employment", "The level of wellbeing is negatively associated with the fear of loosing a job")

Confirmed <- c("","" ,"" ,"")

VarTable1 <- data.frame(Hypotheses, Description, Confirmed)

formattable(VarTable1, 
            align =c("c","l","l","l","l"), 
            list(`Indicator Name` = formatter(
              "span", style = ~ style(color = "grey",font.weight = "bold"))))

Hypotheses	Description	Confirmed
H1:	The level of wellbeing is positively associated with the level of satisfaction with financial situation in household
H2:	The level of wellbeing is positively associated with the subjective placement on the income scale
H3:	The level of wellbeing is positively associated with full-time stable employment
H4:	The level of wellbeing is negatively associated with the fear of loosing a job

Analysis and technical interpretation

Creating wellbeing (happiness) index

To properly represent the concept of happiness in a broader sense, subjective well-being index (SWB index) is often used in the scientific literature. The concept of wellbeing includes two parts: happiness and life satisfaction.

The level of happiness represents dynamic part of subjective wellbeing, reflecting rather on internal emotional state and mood of an individual in relatively short-term perspective, whereas life satisfaction allows to consider static part of subjective wellbeing, representing respondent’s overall judgement about his/her own life and reflecting rather on external circumstances such as individual achievement, income, quality of social and personal relationships in long-term perspective.

According to K. Welzel, there is a great way for index construction using variables with different scales. His approach is as follows (and will be used in the current project):

“To have a uniform scale range, most variables brought into a range of 0 to 1.0 with the possible minimum set at 0 and the maximum at 1.0. For a 4-point scale this implies a coding of 0, .33, .66 and 1.0. For a 5-point scale the coding would be 0, .25, .50, .75 and 1.0. Using this uniform format facilitates the construction of composite indices and makes comparable the magnitudes of unstandardized regression coefficients in multivariate regression analyses.”

First,

both happiness and life satisfaction were reverted in the same direction meaning that in both variables greater values now represent greater intensity.

Second,

happy(1 = Very happy, 4 = Not at all happy) were recoded as follows: 1=1, 2=0.66, 3=0.33, 4=0 [happy_index], and life satisfaction (1 = Not at all satisfied, 10 = Completely satisfied) was recoded as (life satisfaction - 1)/(10 - 1) [satisf_index].

Third,

SWB index was constructed as follows: SWB = happy_index * satisf_index, which ranges from [0,1].

wvs11$happy_index[wvs11$happy == 1] = 1
wvs11$happy_index[wvs11$happy == 2] = 0.66
wvs11$happy_index[wvs11$happy == 3] = 0.33
wvs11$happy_index[wvs11$happy == 4] = 0

wvs11$satisf_index = (wvs11$satisf - 1)/(10 - 1)

wvs11$SWB = wvs11$happy_index*wvs11$satisf_index

The choice of explanatory variables

According to the main idea stated in the introductory part, the majority of predictors for analysis were chosen in a way to reflect employment status and financial satisfaction of a respondent.The controls are genger and age. Outcome variable is subjective wellbeing index.The Table below shows all the predictors in the final dataset.

Variables <- c("Income inequality attitude", "Satisfaction with financial situation", "Subjective income decile", "The fear of loosing a job","Employment status")

Range <- c("1-10: More equal - More diverse","1-10: Dissatisfied - Satisfied" ,"1-10: First decile-Tenth decile" ,"1-4: Very much - Not at all", "Categories for different levels")

VarTable2 <- data.frame(Variables, Range)

formattable(VarTable2, 
            align =c("l","l"), 
            list(`Indicator Name` = formatter(
              "span", style = ~ style(color = "grey",font.weight = "bold"))))

Variables	Range
Income inequality attitude	1-10: More equal - More diverse
Satisfaction with financial situation	1-10: Dissatisfied - Satisfied
Subjective income decile	1-10: First decile-Tenth decile
The fear of loosing a job	1-4: Very much - Not at all
Employment status	Categories for different levels

Descriptive statistics

ggplot(wvs11, aes(x = SWB))+
  geom_histogram(alpha = 0.5, color = "black", fill = "lightblue", binwidth = 0.05)+
  labs(title = "Subjective Well-Being index distribution\nfor representative sample of Russian population in 2011
            ", y  = "", x = "SWB") +
  theme_minimal() +
  scale_x_continuous(breaks = 0:10*0.1, limits = c(0,1)) +
  theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

The following visuals are based on SWB index dichotomization: Those who scored >= 0.8 on SWB were labelled to have High SWB index. Those who scored between 0.4 and 0.8 were labbeled as having Moderate SWB. Those who scored below 0.4 were labbeled as having Low SWB index.

Socio-demographic characteristics:

wvs11$SWB_cut[wvs11$SWB >= 0.8] = "High"

wvs11$SWB_cut[wvs11$SWB >= 0.4 & wvs11$SWB < 0.8] = "Moderate"

wvs11$SWB_cut[wvs11$SWB < 0.4] = "Low"

wvs11$SWB_cut = ordered(wvs11$SWB_cut, levels = c("Low", "Moderate", "High"))


wvs11$`SD-gender`[wvs11$`SD-gender` == 1] = "Male"
wvs11$`SD-gender`[wvs11$`SD-gender` == 2] = "Female"
wvs11$`SD-gender` = as.factor(wvs11$`SD-gender`)

gender = wvs11 %>% group_by(`SD-gender`, SWB_cut) %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

ggplot(gender, aes(x = `SD-gender`, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.97, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "             %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "             %")), y = 0.3, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by gender
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

As it can be seen from the graph, gender doesn’t help to discriminate between those of high and low subjaective wellbeing. Within both males and females, on average, there are 5% high-level SWB individuals, 40% moderate level and about 55% low-level.

wvs11$age_cut = cut(wvs11$`SD-age`, breaks = c(18,25,35,45,55,65,75,91))
age = wvs11 %>% group_by(age_cut, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))
            
ggplot(age, aes(x = age_cut, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "             %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.25, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by age groups
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

As for the age of the respondent, the general almost linear trend can be easily observed: the older the respondent gets, the lower the SWB index becomes.

Other predictors:

equality = wvs11 %>% group_by(`E-equality`, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

equality$`E-equality` = as.factor(equality$`E-equality`)
            
ggplot(equality, aes(x = `E-equality`, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "          %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.25, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by income inequality attitudes:\n1-more equal, 10-more diverse
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

Still, there is no clear picture about the association between income inequality attitudes and SWB. As it can be seen from the graph, more than 12% of people who firmly believe that incomes should be made more diverse exhibit high levels of SWB. However, in all other cases, the percentages are, on average, lower and it is hard to differentiate any other outstanding feature.

finance = wvs11 %>% group_by(`SD-finance`, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

finance$`SD-finance` = as.factor(finance$`SD-finance`)
            
ggplot(finance, aes(x = `SD-finance`, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "          %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.25, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by satisfaction with\nfinancial situation: 1-dissatisfied, 10-satisfied
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

As for the satisfaction with financial situation, it can be noticed that people with low satisfaction rates experience low wellbeing rates as well. The more satisfied financially a person becomes, the higher the percentage of moderete and high SWB can be observed. On average, there is about 5-6 % of people with high wellbeing level within whose who score from 1 to 8 on financial satisfaction. However, among people who score from 9 to 10, there is about 18 % of respondents with high level of wellbeing.

decile = wvs11 %>% group_by(`SD-quantile`, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

decile$`SD-quantile` = as.factor(decile$`SD-quantile`)
            
ggplot(decile, aes(x = `SD-quantile`, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "          %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.15, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by subjective placement on\nincome scale: 1-first decile, 10-tenth decile
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

An extremely interesting picture can be observed with subjective placement on income scale: there is a steady linear trend attributing higher income decile to higher SWB but only among first 8 out of 10 deciles. It seems like that on average people of 8th income decile are the most happiest ones, whereas in the 10th decile high level SWB category is absent at all!

lose_job = wvs11 %>% group_by(lose_job, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

lose_job$lose_job = as.factor(lose_job$lose_job)
            
ggplot(lose_job, aes(x = lose_job, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "          %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.15, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by degree of fear of\nloosing a current job: 1-Very much, 4-Not at all
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

Surprisingly enought, the fear of loosing a current job or not finding a new on doesn’t reveal any distinctive trend. The distribution of SWB index in approximately equal in both categories.

wvs11$`SD-employ`[wvs11$`SD-employ` == 1] = "Full"
wvs11$`SD-employ`[wvs11$`SD-employ` == 2] = "Part"
wvs11$`SD-employ`[wvs11$`SD-employ` == 3] = "Self"
wvs11$`SD-employ`[wvs11$`SD-employ` == 4] = "Retired"
wvs11$`SD-employ`[wvs11$`SD-employ` == 5] = "Housewife"
wvs11$`SD-employ`[wvs11$`SD-employ` == 6] = "Student"
wvs11$`SD-employ`[wvs11$`SD-employ` == 7] = "Unemployed"

wvs11$`SD-employ` = as.factor(wvs11$`SD-employ`)

wvs11$`SD-employ` = ordered(wvs11$`SD-employ`, levels = c("Full", "Part", "Self", "Housewife", "Student", "Retired", "Unemployed"))

employ = wvs11 %>% group_by(`SD-employ`, SWB_cut) %>% na.omit() %>% summarise(count = n()) %>% mutate(perc = round((count/sum(count)*100),1))

employ$`SD-employ` = as.factor(employ$`SD-employ`)
            
ggplot(employ, aes(x =`SD-employ`, y = count, fill = forcats::fct_rev(SWB_cut))) + 
       geom_bar(stat = "identity", color = "black", alpha = 0.6, position = "fill") + 
       geom_text(aes(label = ifelse(SWB_cut == "High", perc, "          %")), y = 0.98, size = 3) +
       geom_text(aes(label = ifelse(SWB_cut == "Moderate", perc, "          %")), y = 0.75, size = 3)+
       geom_text(aes(label = ifelse(SWB_cut == "Low", perc, "           %")), y = 0.15, size = 3) +
       scale_y_continuous(labels = scales::percent)+
       labs(title = "Subjective Well-Being by employment status
            ", y  = "", x = "") +
       theme_minimal() +
       guides(fill=guide_legend(title= "SWB level:"))+
       scale_fill_manual(values = wes_palette("Chevalier1")[c(3,4,1)])+
       theme(text = element_text(size = 10),
       plot.title = element_text(hjust = 0.5), title = element_text(size = 10))

Speaking about emplyment status, it can be noticed that students and self-emplyed people exhibit the highest levels of wellbeing, whereas retired and unemployed people are the most depressed ones.

Regression analysis: forward or backward selection?

As it is stated in the A.Field book, “the backward direction is preferable to the forward method.” However, let’s try both forward and backward methods to select the best one for the given the data.

For forward, backward and sequential selection with min = 1 and max = 7 predictors (according to the data), the lowest RMSE was 0.2143363, 0.2145073, 0.2144430 accordingly. It was found in the model with 3 predictors. In huge summary summary(forward/backward/both.model$finalModel) it can be seen that those predictors in descending order of importance are:

Subjective placement on income scale
Satisfaction with financial situation
Age

library(tidyverse)
library(caret)
library(leaps)
library(MASS)

wvs11$lose_job[wvs11$lose_job == 1|wvs11$lose_job == 2] = "Afraid"
wvs11$lose_job[wvs11$lose_job == 3|wvs11$lose_job == 4] = "Not afraid"

wvs11$lose_job = as.factor(wvs11$lose_job)

wvs11_modeling = wvs11 %>% dplyr::select(-satisf, -happy, -happy_index, -satisf_index, -SWB_cut, -age_cut) %>% na.omit()

set.seed(123)

# Set up repeated 10-fold cross-validation
train.control <- trainControl(method = "cv", number = 10)

# Train the model
forward.model <- train(SWB ~., data = wvs11_modeling,
                    method = "leapForward", 
                    tuneGrid = data.frame(nvmax = 1:7),
                    trControl = train.control)
#as.data.frame(forward.model$results)

set.seed(124)

# Set up repeated 10-fold cross-validation
train.control <- trainControl(method = "cv", number = 10)

# Train the model
backward.model <- train(SWB ~., data = wvs11_modeling,
                    method = "leapBackward", 
                    tuneGrid = data.frame(nvmax = 1:7),
                    trControl = train.control)
#as.data.frame(backward.model$results)

set.seed(125)

# Set up repeated 10-fold cross-validation
train.control <- trainControl(method = "cv", number = 10)

# Train the model
both.model <- train(SWB ~., data = wvs11_modeling,
                    method = "leapSeq", 
                    tuneGrid = data.frame(nvmax = 1:7),
                    trControl = train.control)
#as.data.frame(both.model$results)

Now, given that RMSE(forward) = 0.2143363 , RMSE(backward) = 0.2145073 , RMSE(stepwise) = 0.2144430, it would be logical to select forward selection method, as it has the lowest RMSE. However, it contradicts the literature recommendation.

Still, all the models perform almost equally well and show the same final result in terms of predictors selection. Three best predictors were included in the final model below.

Final model

model <- lm(SWB ~ `SD-quantile` + `SD-finance` + scale(`SD-age`), 
   data = wvs11_modeling)

library(sjPlot) 
tab_model(model)

	SWB
Predictors	Estimates	CI	p
(Intercept)	0.18	0.15 – 0.21	<0.001
`SD-quantile`	0.03	0.02 – 0.03	<0.001
`SD-finance`	0.02	0.01 – 0.02	<0.001
scale(`SD-age`)	-0.04	-0.05 – -0.03	<0.001
Observations	1916
R² / adjusted R²	0.156 / 0.154

Overall, the final model with 3 predictors explains about 15.5% of the variance in the outcome variable (whereas model with all existing predictors explains around 16%). Of course, all predictors are statistically significant.

As a starting point, it can be mentioned that for people of average age, who consider themselves belonging to the 1st income decile and experiencing total dissatisfaction with financial situation, subjective wellbeing index is predicted to be at least 0.18 (out of 1).

The greatest contribution in explaining SWB, as it was mentioned earlier, is done by the subjective placement on income scale variable. To be more precise, for people of average age and total dissatisfaction with financial situation in the household, each additional point increase within the scale of income deciles will be associated with +0.028 or 3% increase in subjective wellbeing index.

Besides, for people of average age who place themselves in the 1st income decile, each additional point increase in satisfaction with financial situation will be associated with +0.018 or 2% increase in subjective wellbeing.

Last but not least, for people who completely dissatisfied with financial situation and place themselves in the 1st income decile, each additional +1 SD increase above the mean age will be associated with -0.039 or -4% decrease in subjective wellbeing index.

Model diagnostics

library(car)
vif(model)

##   `SD-quantile`    `SD-finance` scale(`SD-age`) 
##        1.197996        1.126573        1.067879

Variance inflation factor shows all the estimates to be less then 5, so everything is fine.

outlierTest(model)

## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferonni p
## 1141 3.807099         0.00014504       0.2779

Outlier test shows only one observation - 1141.

par(mfrow=c(2,2))
plot(model)

By inspecting QQ-Plot, it can be seen that the data distribution is far from perfectly normal. There is also an outlier 1141 can be spotted. By examining Residuals VS Leverages plot, it can be also seen that there are no leverages in the data as there are no observations falling behind Cool’s distance line. However, there is a bunch of outliers: 1806, 647, 499.

library(car)
leveragePlots(model)

In summary, the following outliers will be removed: 1141, 1806, 647, 499, 167, 1037, 1660.

wvs11_modeling = wvs11_modeling[-c(1141, 1806, 647, 499, 167, 1037, 1660), ]

model_out <- lm(SWB ~ `SD-quantile` + `SD-finance` + scale(`SD-age`), 
   data = wvs11_modeling)

#summary(model_out)

Having removed all the outliers, R-squared increased by 1 %, whereas all the other coefficients remained relatively the same.

Non-linear effect

Even though I generally assume that there are no non-linear effects in the data, just for the reason of example let’s suppose that there is a cubic relationships between SWB and financial satisfaction. The graph below gives some incentives to assume that.

library(tidyverse)
library(caret)
theme_set(theme_minimal())

# Split the data into training and test set
set.seed(126)
training.samples <- wvs11_modeling$SWB %>%
  createDataPartition(p = 0.8, list = FALSE)

train.data  <- wvs11_modeling[training.samples, ]
test.data <- wvs11_modeling[-training.samples, ]

ggplot(train.data, aes(`SD-finance`, SWB) ) +
  geom_point() +
  stat_smooth()

# Build the model
model_l <- lm(SWB ~ `SD-finance`, data = train.data)
# Make predictions
predictions_l <- model_l %>% predict(test.data)

# Build the model
model_poly <- lm(SWB ~ poly(`SD-finance`, 3, raw = TRUE), data = train.data)
# Make predictions
predictions <- model_poly %>% predict(test.data)
# Model performance
as<- data.frame(
  RMSE_lin = RMSE(predictions_l, test.data$SWB),
  R2_lin = R2(predictions_l, test.data$SWB),
  RMSE_poly = RMSE(predictions, test.data$SWB),
  R2_poly = R2(predictions, test.data$SWB)
)

formattable(as, 
            align =c("l","l"), 
            list(`Indicator Name` = formatter(
              "span", style = ~ style(color = "grey",font.weight = "bold"))))

RMSE_lin	R2_lin	RMSE_poly	R2_poly
0.213874	0.09801051	0.2148258	0.08747304

From the output it can be seen that for linear regression RMSE is lower and R2 is higher, which means that liner model is better and there is no apparent reasons to assume that there is a cubic relationships.

Interaction effect

Let`s suppose that with age, SWB index drops faster for respondent’s in the first income deciles, while drops slower for respondents in the last income deciles.

model_int <- lm(SWB ~ `SD-finance` +`SD-age` * `SD-quantile`, 
                data = wvs11_modeling)


plot_model(model_int, type = "std")

According to the interaction model summary, interaction term is not statistically significant (neither any other interaction terms available in the data). Beta coefficients for subjective placement on income scale makes the largest contribution into explaining SWB, whereas interaction term fails to do so.

Meaningful interpretation

In general, it was found that people’s age and perception of their own financial situation greatly interconnected with their subjective level of well-being, whereas other aspects of people’s life (such an employment, fear of losing a job, etc.) are not helpful in understanding SWB distribution.

On average, a starting point for measuring subjective well-being will be 17 out of 100, which basically means that even if a person of average age is extremely poor and completely dissatisfied with his or her financial situation, there is almost no people who score 0 or about 0 on SWB index.

First, it is important to consider how people allocate themselves on income scale. Let us consider that those who are placed in the 1st income decile score 17 out of 100 on SWB. Going further, analysis show that roughly speaking each step up in income deciles result in +3 out of 100 to predicted SWB score. For example, in the 5th decile, predicted SWB score will on average constitute 17 + (3x4) = 29 out of 100.

Second, the scheme with financial situation is very similar to the previous one, except one: each additional step up in financial satisfaction will result in +2 out of 100 to predicted SWB. This means that higher financial satisfaction remains resulting in higher SWB, however this association is weaker compared to previous one with subjective placement on income scale.

Third, the older the person gets, the lower the SWB becomes. Each additional +1 SD above the mean age would result in -4 out of 100 to predicted SWB. In our case, standard deviation of age is 16.5 years, while mean age is 44.5. Thus, if an extremely poor person of average age (44.5) scores 17 out of 100 of SWB, then an extremely poor person of 44.5 + 16.5 = 61 y. o. would score 17 - 4 = 13 out of 100 in predicted SWB.

Conclusion

As it was shown in this project, financial stability is indeed quite important while considering personal subjective well-being as it managed to explain about 15% of the variation in SWB index.

As it turned out, only perception of financial situation (subjective placement on income scale and financial satisfaction) plays an important role in explaining SWB, whereas income inequality attitudes, job security and employment status failed to be equally influential. As for the control variables, gender makes no significant contribution into explaining SWB, while age of the respondent - does.

In general, there is positive association between 1) subjective placement on income scale, 2) financial satisfaction and Subjective Well-Being and negative association between age of a respondent and SWB. People of 8th income decile were found to exhibit the highest level of wellbeing. The same can be said about students and self-employed people. On the contrary, people of lower income deciles, retired or unemployed were found to exhibit the lowest levels of wellbeing.

No non-linear relationships were found in the data, as well as no significant interaction terms.

Hypotheses <- c("H1:", "H2:", "H3:", "H4:")

Description <- c("The level of wellbeing is positively associated with the level of satisfaction with financial situation in household", "The level of wellbeing is positively associated with the subjective placement on the income scale", "The level of wellbeing is positively associated with full-time stable employment", "The level of wellbeing is negatively associated with the fear of loosing a job")

Confirmed <- c("True","True" ,"Not confirmed" ,"No confirmed")

VarTable1 <- data.frame(Hypotheses, Description, Confirmed)

formattable(VarTable1, 
            align =c("c","l","c"), 
            list(`Indicator Name` = formatter(
              "span", style = ~ style(color = "grey",font.weight = "bold"))))

Hypotheses	Description	Confirmed
H1:	The level of wellbeing is positively associated with the level of satisfaction with financial situation in household	True
H2:	The level of wellbeing is positively associated with the subjective placement on the income scale	True
H3:	The level of wellbeing is positively associated with full-time stable employment	Not confirmed
H4:	The level of wellbeing is negatively associated with the fear of loosing a job	No confirmed