final_project

Introduction.
Methodology: Describe the data, selected variables.
Preparation of data
Dealing with missings
2nd level variables
Simple bivariate tests
Multilevel regression
Interactions
- Interaction with 1st level and 1st level
- Interaction 1st level and 2nd level
Random effects
Diagnostics
- Extra model
Findings
Reflection

Introduction.

Financial satisfaction is an important matter because it shapes people’s lives and their happiness with life in general. In this project, I examine how different indicators shape financial satisfaction. To get a more detailed picture, I use data on 20 countries from around the world, which helps to show how the effects of different variables differ between countries.

The main effects I am interested in are those of life satisfaction and beliefs about income equality. Satisfaction with life and satisfaction with one’s financial situation are connected, since both reflect how happy a person is with their situation and prospects. Beliefs about income equality reflect a person’s economic views, and I argue that they are worth studying because the assessment of personal happiness is affected by beliefs about the general situation.

This leads to the following research question:

How do life satisfaction and beliefs about income equality affect satisfaction with one’s financial situation among individuals with varying levels of education and income?

To answer this research question, I formulated the following hypotheses:

People with higher life satisfaction scores are more satisfied with their financial situation.
People who don’t believe that incomes should be equal are more satisfied with their financial situation.

Additional hypotheses:

Higher levels of education are associated with higher satisfaction with financial situation.
The more children a person has, the less satisfied they are with their financial situation.
The higher the self-assessed level of income, the higher the satisfaction with financial situation.

Hypotheses for the 2nd level variables:

People coming from countries with higher GDP per capita are more satisfied with their financial situation.
People coming from countries with higher human capital index are more satisfied with their financial situation.
People coming from countries with higher unemployment rate are less satisfied with their financial situation.

Methodology: Describe the data, selected variables.

To answer the research question and test the hypotheses, I selected the following countries: Russia, Kazakhstan, Ukraine, USA, Thailand, China, Japan, South Korea, Germany, Australia, Mexico, Turkey, Canada, Indonesia, Netherlands, Singapore, Pakistan, Brazil, Mongolia, Colombia. The selection was motivated by the availability of all required variables and by the diversity of these countries on the second-level variables: they differ in GDP per capita, HCI, and unemployment rate. Studying them should therefore reveal differences that matter for understanding satisfaction with one’s financial situation.

The dependent variable is satisfaction with the financial situation of the household, measured on a 10-point scale from 1 (Completely dissatisfied) to 10 (Completely satisfied).

The independent variables are the following:

first level

Life satisfaction, measured on a 10-point scale from 1 (Completely dissatisfied) to 10 (Completely satisfied).
Self-assessed income level, measured on a 10-point scale from 1 (Lowest group) to 10 (Highest group).
Highest level of education, a factor with five levels: primary and lower, secondary, post-secondary, BA, and higher than BA.
Belief about equality of incomes, measured on a 10-point scale from 1 (Incomes should be made more equal) to 10 (There should be greater incentives for individual effort).
Number of children the respondent has.
Controls: sex and age.

second level

These data were obtained from World Bank Open Data.

GDP per capita.
Human Capital Index (HCI).
Unemployment rate.

Preparation of data

After filtering, renaming, and releveling the variables, the data have the following structure: 39,892 observations of 9 variables.

view_df(data, show.type = TRUE, show.frq = TRUE, show.prc = TRUE, show.na = TRUE)

Data frame: data
ID	Name	Type	missings	Values	Value Labels	Freq.	%
1	country	categorical	0 (0.00%)		Australia Brazil Canada China Colombia Germany Indonesia Japan Kazakhstan South Korea Mexico Mongolia Netherlands Pakistan Russia <… truncated>	1813 1762 4018 3036 1520 1528 3200 1353 1276 1245 1741 1638 2145 1995 1810	4.54 4.42 10.07 7.61 3.81 3.83 8.02 3.39 3.20 3.12 4.36 4.11 5.38 5.00 4.54
2	finsat	integer	293 (0.73%)	range: 1-10
3	lifesat	integer	248 (0.62%)	range: 1-10
4	selfincome	integer	1381 (3.46%)	range: 1-10
5	age	integer	32 (0.08%)	range: 17-98
6	sex	categorical	23 (0.06%)		male female	18840 21029	47.25 52.75
7	edu	categorical	521 (1.31%)		primary and lower secondary post-secondary BA higher BA	6498 14037 7654 7588 3594	16.50 35.65 19.44 19.27 9.13
8	equalinc	integer	631 (1.58%)	range: 1-10
9	children	integer	565 (1.42%)	range: 0-22

The distributions of the variables are shown below:

library(ggplot2)
library(ggpubr)  # ggarrange()

# The scales take integer values 1-10, so binwidth = 1 draws one bar per value
ggarrange(
ggplot(data = data, aes(x = finsat)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Financial satisfaction',
       y = 'Count'),

ggplot(data = data, aes(x = lifesat)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Life satisfaction',
       y = 'Count'),

ggplot(data = data, aes(x = selfincome)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Income groups',
       y = 'Count'),

ggplot(data = data, aes(x = equalinc)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Incomes should not be equal',
       y = 'Count'),

ncol = 2, nrow = 2)

None of the variables is normally distributed, with the partial exception of income group. Even this plot, however, shows too many people placing themselves in the lowest income group for the distribution to be called normal.

On average, people are rather satisfied with their financial situation or rate it as average. They are more positive about their life in general. Most people place their household in a middle income group, and again the distribution is close to normal. As for beliefs about income equality, more people think that there should be greater incentives for individual effort. However, there is also a large group at the other end of the scale who think that incomes should be made more equal.

# Age and number of children are whole numbers, so binwidth = 1 gives one bar per value
ggarrange(
ggplot(data = data, aes(x = age)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Age',
       y = 'Count'),

ggplot(data = data, aes(x = sex)) +
  geom_bar(fill = "skyblue") +
  theme_minimal() +
  labs(x = 'Sex',
       y = 'Count'),

ggplot(data = data, aes(x = edu)) +
  geom_bar(fill = "skyblue") +
  theme_minimal() +
  labs(x = 'Education',
       y = 'Count'),

ggplot(data = data, aes(x = children)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  theme_minimal() +
  labs(x = 'Number of children',
       y = 'Count'),

ncol = 2, nrow = 2)

Age is roughly normally distributed but right-skewed. Men and women are almost equally represented, with slightly more women (52.75% vs. 47.25%). The most common level of education is secondary, and the least common is a degree higher than a BA. Most people have no children or only a few.

Dealing with missings

mdf <- missing_data.frame (data[,-1])
mdf <- change (mdf, y = c("age", "finsat", "lifesat", "selfincome", "equalinc"), what = "type", to = "positive-continuous")
mdf <- change (mdf, y = c("edu"), what = "type", to = "ordered-categorical")
show(mdf)

## Object of class missing_data.frame with 39892 observations on 8 variables
## 
## There are 67 missing data patterns
## 
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
## 
##                           type missing method  model
## finsat     positive-continuous     293    ppd linear
## lifesat    positive-continuous     248    ppd linear
## selfincome positive-continuous    1381    ppd linear
## age        positive-continuous      32    ppd linear
## sex                     binary      23    ppd  logit
## edu        ordered-categorical     521    ppd ologit
## equalinc   positive-continuous     631    ppd linear
## children            continuous     565    ppd linear
## 
##                 family     link transformation
## finsat        gaussian identity            log
## lifesat       gaussian identity            log
## selfincome    gaussian identity            log
## age           gaussian identity            log
## sex           binomial    logit           <NA>
## edu        multinomial    logit           <NA>
## equalinc      gaussian identity            log
## children      gaussian identity    standardize

#image(mdf)

As the output shows, the share of missing values is low for all variables. The highest share is for self-assessed income level (1,381 observations, 3.46%). I assume that the values are missing at random (MAR).

To deal with the missing values, I impute them separately for each country using the random-forest method (missForest).

After imputing each country separately, I combine all the datasets. The resulting table, without missing values, is shown below.

d<-rbind(a_imputed_missForest$ximp,b_imputed_missForest$ximp,c_imputed_missForest$ximp,ch_imputed_missForest$ximp,
      co_imputed_missForest$ximp, g_imputed_missForest$ximp, i_imputed_missForest$ximp, j_imputed_missForest$ximp,
      k_imputed_missForest$ximp, s_imputed_missForest$ximp, m_imputed_missForest$ximp, mo_imputed_missForest$ximp,
      n_imputed_missForest$ximp, p_imputed_missForest$ximp, r_imputed_missForest$ximp, sin_imputed_missForest$ximp,
      tur_imputed_missForest$ximp, uk_imputed_missForest$ximp, th_imputed_missForest$ximp, us_imputed_missForest$ximp)
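The twenty `rbind()` arguments above come from twenty separate imputation objects. A more compact pattern is to split the data by country, impute each piece, and stack the results. The sketch below uses a hypothetical helper `impute_by_group()`; its default `simple_impute()` is only a median/mode stand-in so the code runs without extra packages. In the project, the imputation function would instead be something like `function(x) missForest::missForest(x)$ximp`.

```r
# Hypothetical stand-in imputer (NOT missForest): numeric columns get the
# median, factor/character columns get the most frequent value.
simple_impute <- function(x) {
  for (col in names(x)) {
    miss <- is.na(x[[col]])
    if (!any(miss)) next
    if (is.numeric(x[[col]])) {
      x[[col]][miss] <- median(x[[col]], na.rm = TRUE)
    } else {
      counts <- table(x[[col]])
      x[[col]][miss] <- names(counts)[which.max(counts)]
    }
  }
  x
}

# Hypothetical helper: impute within each group, then stack the groups again.
# The grouping column is removed before imputing and re-attached afterwards.
impute_by_group <- function(data, group, impute_fn = simple_impute) {
  pieces <- split(data, data[[group]], drop = TRUE)
  imputed <- lapply(pieces, function(piece) {
    out <- impute_fn(piece[, setdiff(names(piece), group), drop = FALSE])
    out[[group]] <- piece[[group]]
    out
  })
  result <- do.call(rbind, imputed)
  rownames(result) <- NULL
  result
}

# Toy example with made-up values
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  finsat  = c(2, NA, 4, 8, NA, 10),
  sex     = factor(c("male", "male", NA, "female", NA, "female"))
)
res <- impute_by_group(toy, "country")
# Assumed project usage:
# d <- impute_by_group(data, "country", function(x) missForest::missForest(x)$ximp)
```

Imputing within each country keeps one country's distribution from driving the imputed values of another, which is the same idea as the separate per-country runs above.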

view_df(d, show.type = TRUE, show.frq = TRUE, show.prc = TRUE, show.na = TRUE)

Data frame: d
ID	Name	Type	missings	Values	Value Labels	Freq.	%
1	country	categorical	0 (0.00%)		Australia Brazil Canada China Colombia Germany Indonesia Japan Kazakhstan South Korea Mexico Mongolia Netherlands Pakistan Russia <… truncated>	1813 1762 4018 3036 1520 1528 3200 1353 1276 1245 1741 1638 2145 1995 1810	4.54 4.42 10.07 7.61 3.81 3.83 8.02 3.39 3.20 3.12 4.36 4.11 5.38 5.00 4.54
2	finsat	numeric	0 (0.00%)	range: 1.0-10.0
3	lifesat	numeric	0 (0.00%)	range: 1.0-10.0
4	selfincome	numeric	0 (0.00%)	range: 1.0-10.0
5	age	numeric	0 (0.00%)	range: 17.0-98.0
6	sex	categorical	0 (0.00%)		male female	18850 21042	47.25 52.75
7	edu	categorical	0 (0.00%)		primary and lower secondary post-secondary BA higher BA	6551 14228 7795 7661 3657	16.42 35.67 19.54 19.20 9.17
8	equalinc	numeric	0 (0.00%)	range: 1.0-10.0
9	children	numeric	0 (0.00%)	range: -0.0-22.0

2nd level variables

Next, I add the second-level variables: GDP per capita, unemployment rate, and HCI. These data were obtained from the World Bank.

seclev <- data.frame(id=1:20,
                  gdppc=c('15270.7', '11492.0', '4534.0', '76329.6', '6910.0', '12720.2', '34017.3', '32422.6', '48718.0', '65099.8', '11496.5', '10674.5', '55522.4', '4788.0', '57025.0', '82807.6', '1588.9', '8917.7', '5045.5', '6624.2'),
                  inflation=c('15.8', '19.8', '34.3','7.0', '4.7', '2.2', '0.3','1.3','5.3', '7.1', '6.7', '96.0', '7.7', '9.6', '5.5', '9.1', '14.0','8.3','17.7','14.3'),
                  unemp=c('3.9', '4.9', '9.8', '3.6', '0.9', '5.0', '2.6', '2.9', '3.1', '3.7', '3.3', '10.4', '5.3', '3.5', '3.5', '3.6', '5.6', '9.2', '6.2', '10.6'),
                  hci=c('0.68', '0.63', '0.63', '0.70', '0.61', '0.65', '0.80', '0.80', '0.75', '0.77', '0.61', '0.65', '0.80', '0.54', '0.79', '0.88', '0.41', '0.55', '0.61', '0.60'),
                  gini=c('36.0', '27.8', '25.6', '39.8', '35.1', '37.1', '32.9', '31.4', '31.7', '34.3', '45.4', '41.9', '31.7', '37.9', '26.0', '36.0', '29.6', '52.9', '32.7', '51.5'),
                  regime=c('Electoral autocracy', 'Electoral autocracy', 'Electoral autocracy', 'Liberal democracy', 'Closed autocracy', 'Closed autocracy', 'Liberal democracy', 'Liberal democracy', 'Liberal democracy', 'Liberal democracy', 'Electoral democracy', 'Electoral autocracy', 'Electoral democracy', 'Electoral democracy', 'Liberal democracy', 'Electoral autocracy', 'Electoral autocracy', 'Electoral democracy', 'Electoral democracy', 'Electoral democracy'),
                  country=c('Russia', 'Kazakhstan', 'Ukraine', 'USA', 'Thailand', 'China', 'Japan', 'South Korea', 'Germany', 'Australia', 'Mexico', 'Turkey', 'Canada', 'Indonesia', 'Netherlands', 'Singapore', 'Pakistan', 'Brazil', 'Mongolia', 'Colombia'))

da <- merge(d, seclev, by = "country", all = TRUE)

# seclev stores the numbers as strings; convert via character so that
# factor-coded columns (older R versions) are not turned into level codes
num_cols <- c("gdppc", "inflation", "unemp", "hci", "gini")
da[num_cols] <- lapply(da[num_cols], function(x) as.numeric(as.character(x)))
da$regime <- as.factor(da$regime)
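Because `merge(..., all = TRUE)` keeps unmatched rows, a country spelled differently in the two tables (for example "USA" in one and "United States" in the other) would silently produce rows with missing values. The summary below shows no missing values, so the names matched here, but a quick check before merging is cheap. The helper `check_merge_keys()` is hypothetical and the toy tables are made up.

```r
# Hypothetical helper: list key values that appear in only one of two tables
check_merge_keys <- function(x, y, by) {
  kx <- unique(as.character(x[[by]]))
  ky <- unique(as.character(y[[by]]))
  list(only_in_x = setdiff(kx, ky), only_in_y = setdiff(ky, kx))
}

# Toy tables: "USA" vs "United States" is a deliberate mismatch
people <- data.frame(country = c("USA", "USA", "Japan"), finsat = c(7, 5, 6))
macro  <- data.frame(country = c("United States", "Japan"), unemp = c(3.6, 2.6))

keys <- check_merge_keys(people, macro, "country")
merged <- merge(people, macro, by = "country", all = TRUE)
# The two USA respondents get no unemployment value, and an empty
# "United States" row is added: 4 rows instead of 3.
```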

summary(da)

##         country          finsat          lifesat         selfincome    
##  Canada     : 4018   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
##  Indonesia  : 3200   1st Qu.: 5.000   1st Qu.: 6.000   1st Qu.: 3.000  
##  China      : 3036   Median : 7.000   Median : 7.000   Median : 5.000  
##  USA        : 2596   Mean   : 6.391   Mean   : 7.181   Mean   : 4.847  
##  Turkey     : 2415   3rd Qu.: 8.000   3rd Qu.: 9.000   3rd Qu.: 6.000  
##  Netherlands: 2145   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##  (Other)    :22482                                                     
##       age            sex                       edu           equalinc     
##  Min.   :17.00   male  :18850   primary and lower: 6551   Min.   : 1.000  
##  1st Qu.:31.00   female:21042   secondary        :14228   1st Qu.: 4.000  
##  Median :44.00                  post-secondary   : 7795   Median : 6.000  
##  Mean   :44.77                  BA               : 7661   Mean   : 5.973  
##  3rd Qu.:57.00                  higher BA        : 3657   3rd Qu.: 8.000  
##  Max.   :98.00                                            Max.   :10.000  
##                                                                           
##     children           id            gdppc         inflation    
##  Min.   : 0.00   Min.   : 1.00   Min.   : 1589   Min.   : 0.30  
##  1st Qu.: 0.00   1st Qu.: 6.00   1st Qu.: 6910   1st Qu.: 5.50  
##  Median : 2.00   Median :12.00   Median :12720   Median : 7.70  
##  Mean   : 1.65   Mean   :10.84   Mean   :29692   Mean   :14.58  
##  3rd Qu.: 2.00   3rd Qu.:15.00   3rd Qu.:55522   3rd Qu.:14.00  
##  Max.   :22.00   Max.   :20.00   Max.   :82808   Max.   :96.00  
##                                                                 
##      unemp             hci              gini                       regime     
##  Min.   : 0.900   Min.   :0.4100   Min.   :25.60   Closed autocracy   : 4536  
##  1st Qu.: 3.500   1st Qu.:0.6100   1st Qu.:31.70   Electoral autocracy:10797  
##  Median : 3.900   Median :0.6500   Median :36.00   Electoral democracy:13879  
##  Mean   : 5.067   Mean   :0.6746   Mean   :36.04   Liberal democracy  :10680  
##  3rd Qu.: 5.600   3rd Qu.:0.7900   3rd Qu.:39.80                              
##  Max.   :10.600   Max.   :0.8800   Max.   :52.90                              
##

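The dataset also contains three additional second-level variables: inflation, the Gini index, and political regime. They are not covered by the hypotheses; inflation and the Gini index appear only as extra controls in the full model below, and political regime is not used further.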

Simple bivariate tests

In this part, I run bivariate tests between the dependent variable and each predictor.

Life satisfaction

cor.test(da$finsat, da$lifesat)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$lifesat
## t = 133.22, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5480819 0.5616651
## sample estimates:
##       cor 
## 0.5549105

ggplot(da, aes(x=lifesat, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Life satisfaction',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation between these variables is positive and fairly strong (r = 0.55): people with higher life satisfaction tend to be more financially satisfied.
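The `t` value reported by `cor.test()` follows directly from the correlation and the sample size: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below recomputes two of the tests in this section from the printed estimates. It also shows why, with almost 40,000 respondents, even very small correlations come close to significance.

```r
# t statistic of the Pearson correlation test, df = n - 2
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

n <- 39892                                 # respondents after imputation
t_life     <- t_from_r(0.5549105, n)       # life satisfaction
t_children <- t_from_r(0.009505978, n)     # number of children

# Two-sided p-value for the children correlation
p_children <- 2 * pt(-abs(t_children), df = n - 2)
```

Here `t_life` is about 133.22 and `t_children` about 1.90 with p of about 0.058, matching the `cor.test()` output.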

Income level

cor.test(da$finsat, da$selfincome)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$selfincome
## t = 69.625, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3204001 0.3378999
## sample estimates:
##       cor 
## 0.3291783

ggplot(da, aes(x=selfincome, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Income lvl',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation is positive and moderate (r = 0.33): people who place their household in a higher income group tend to be more financially satisfied.

Belief about equality of incomes

cor.test(da$finsat, da$equalinc)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$equalinc
## t = 24.482, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1119880 0.1313238
## sample estimates:
##       cor 
## 0.1216675

ggplot(da, aes(x=equalinc, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Belief about income equality',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation is weak and positive (r = 0.12): people who believe that there should be greater incentives for individual effort tend to be slightly more financially satisfied.

Education

TukeyHSD(aov(da$finsat~da$edu))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = da$finsat ~ da$edu)
## 
## $`da$edu`
##                                         diff         lwr         upr     p adj
## secondary-primary and lower       0.07580887 -0.01840439 0.170022133 0.1815165
## post-secondary-primary and lower -0.01201743 -0.11777927 0.093744411 0.9980004
## BA-primary and lower              0.48485270  0.37866933 0.591036068 0.0000000
## higher BA-primary and lower       0.55951072  0.42926019 0.689761247 0.0000000
## post-secondary-secondary         -0.08782630 -0.17674307 0.001090472 0.0547747
## BA-secondary                      0.40904382  0.31962607 0.498461576 0.0000000
## higher BA-secondary               0.48370185  0.36671541 0.600688281 0.0000000
## BA-post-secondary                 0.49687013  0.39535676 0.598383486 0.0000000
## higher BA-post-secondary          0.57152815  0.44505580 0.698000492 0.0000000
## higher BA-BA                      0.07465802 -0.05216704 0.201483081 0.4936123

ggplot(da, aes(x=edu, y=finsat)) + 
  geom_boxplot()+
  theme_minimal() + 
  labs(x = 'Education',
       y = 'Financial satisfaction')

The significant differences are between respondents with a BA or higher degree and each of the three lower groups (primary and lower, secondary, post-secondary): people with at least a BA are more financially satisfied, by about 0.4 to 0.6 points. The differences among the three lower groups, and between BA and higher than BA, are not significant at the 5% level.

Age

cor.test(da$finsat, da$age)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$age
## t = 6.7647, df = 39890, p-value = 1.354e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02404557 0.04364935
## sample estimates:
##        cor 
## 0.03385072

ggplot(da, aes(x=age, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Age',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation between age and financial satisfaction is positive but very weak (r = 0.03): on average, older people are slightly more financially satisfied. It is significant mainly because of the large sample size.

Sex

t.test(da$finsat ~ da$sex)

## 
##  Welch Two Sample t-test
## 
## data:  da$finsat by da$sex
## t = 4.0227, df = 39519, p-value = 5.764e-05
## alternative hypothesis: true difference in means between group male and group female is not equal to 0
## 95 percent confidence interval:
##  0.04801586 0.13926864
## sample estimates:
##   mean in group male mean in group female 
##             6.440673             6.347030

ggplot(da, aes(x=sex, y=finsat)) + 
  geom_boxplot()+
  theme_minimal() + 
  labs(x = 'Sex',
       y = 'Financial satisfaction')

There is a small but significant difference between men and women in financial satisfaction: the mean is 6.44 for men and 6.35 for women.

Number of children

cor.test(da$finsat, da$children)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$children
## t = 1.8987, df = 39890, p-value = 0.05762
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.0003071809  0.0193173068
## sample estimates:
##         cor 
## 0.009505978

ggplot(da, aes(x=children, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Number of children',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation is extremely small (r < 0.01) and not significant at the 5% level (p = 0.058), so there is no clear evidence of a bivariate association between the number of children and financial satisfaction.

GDP per capita

cor.test(da$finsat, da$gdppc)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$gdppc
## t = 9.0871, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03565404 0.05523976
## sample estimates:
##        cor 
## 0.04545127

ggplot(da, aes(x=gdppc, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'GDP per capita',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

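There is a weak positive correlation between financial satisfaction and GDP per capita (r = 0.05): people from countries with higher GDP per capita tend to be slightly more financially satisfied. Note, however, that GDP per capita takes only 20 distinct values, one per country, so this test treats respondents from the same country as independent observations and likely overstates the significance; the multilevel models below account for this.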
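A rough way to gauge how much an individual-level test overstates the evidence for a country-level variable is the design effect for a predictor that is constant within clusters, deff = 1 + (m - 1) * ICC, where m is the average number of respondents per country. The sketch below plugs in the ICC of 0.054 from the null model reported in the multilevel section and assumes equal country sizes, so it is an approximation, not a proper test.

```r
# Approximate design effect for a country-level predictor (equal cluster sizes assumed)
n_obs       <- 39892
n_countries <- 20
icc         <- 0.054                  # ICC of the null model (multilevel section)

m     <- n_obs / n_countries          # average respondents per country
deff  <- 1 + (m - 1) * icc            # variance inflation factor
n_eff <- n_obs / deff                 # effective sample size

t_naive    <- 9.0871                  # cor.test() t for GDP per capita
t_adjusted <- t_naive / sqrt(deff)    # t after deflating by the design effect
```

With these numbers the effective sample size is only a few hundred and the adjusted t for GDP per capita falls well below 1.96, which is in line with the non-significant GDP coefficient in the multilevel model.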

Unemployment

cor.test(da$finsat, da$unemp)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$unemp
## t = -13.649, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07794129 -0.05840625
## sample estimates:
##        cor 
## -0.0681803

ggplot(da, aes(x=unemp, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'Unemployment',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

There is a weak negative correlation between financial satisfaction and the unemployment rate (r = -0.07): the higher the unemployment rate, the less financially satisfied people are.

HCI

cor.test(da$finsat, da$hci)

## 
##  Pearson's product-moment correlation
## 
## data:  da$finsat and da$hci
## t = -0.72524, df = 39890, p-value = 0.4683
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.013443836  0.006182166
## sample estimates:
##          cor 
## -0.003631185

ggplot(da, aes(x=hci, y=finsat))+ 
  geom_point(color="grey")+ 
  geom_smooth(method=lm, color="skyblue")+
  theme_minimal() + 
  labs(x = 'HCI',
       y = 'Financial satisfaction')

## `geom_smooth()` using formula = 'y ~ x'

The correlation between financial satisfaction and HCI is essentially zero (r = -0.004) and not significant (p = 0.47). The scatter plot is also hard to interpret because Pakistan’s HCI (0.41) is far lower than that of the other countries.

Multilevel regression

Before building the model, I check whether including the second level (country) is justified.

library(lmerTest)  # lmer() with Satterthwaite t-tests, as in the output below

nullmodel <- lmer(finsat ~ (1 | country), data = da, REML = FALSE)
summary(nullmodel)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 178658.8 178684.5 -89326.4 178652.8    39889 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7742 -0.6469  0.1396  0.6756  2.2092 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.2944   0.5426  
##  Residual             5.1458   2.2684  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   6.3500     0.1219 19.9585   52.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

icc(nullmodel)  # about 5% of the variance lies between countries

## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.054
##   Unadjusted ICC: 0.054

The ICC is 0.054, which means that about 5% of the variance in financial satisfaction lies between countries. This justifies including the second level in the model.
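The ICC can be recomputed from the variance components in the null model summary: ICC = tau00 / (tau00 + sigma^2), where tau00 is the between-country intercept variance and sigma^2 the residual variance. The sketch below is plain arithmetic on the printed values; applying the same formula to `model11` gives the 0.03 shown later by `tab_model()`.

```r
# Share of total variance that lies between countries
icc_from_components <- function(tau00, sigma2) tau00 / (tau00 + sigma2)

icc_null    <- icc_from_components(tau00 = 0.2944, sigma2 = 5.1458)  # null model
icc_model11 <- icc_from_components(tau00 = 0.1176, sigma2 = 3.2972)  # all 1st-level predictors
```

`icc_null` rounds to 0.054 and `icc_model11` to about 0.034: once individual predictors are added, a smaller share of the remaining variance is between countries.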

First, I add only the control variables, age and sex.

model1<-lmer(finsat~ age + sex + (1|country), data=da, REML = FALSE)
summary(model1)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 178617.5 178660.5 -89303.8 178607.5    39887 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8457 -0.6248  0.1397  0.7005  2.2729 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.293    0.5413  
##  Residual             5.140    2.2672  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  6.189e+00  1.267e-01  2.349e+01  48.850  < 2e-16 ***
## age          4.346e-03  7.282e-04  3.987e+04   5.969 2.41e-09 ***
## sexfemale   -6.479e-02  2.282e-02  3.988e+04  -2.839  0.00453 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) age   
## age       -0.263       
## sexfemale -0.107  0.044

Both variables are significant at this point: age has a positive effect on financial satisfaction and being female a negative one.

The next step is to include all first-level variables.

model11<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children +(1|country), data=da, REML = FALSE)
summary(model11)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +  
##     children + (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 160912.6 161024.4 -80443.3 160886.6    39879 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7011 -0.5337  0.0699  0.6147  4.6827 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.1176   0.3429  
##  Residual             3.2972   1.8158  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        5.142e-01  9.588e-02  4.699e+01   5.363 2.44e-06 ***
## age                8.928e-03  6.696e-04  3.890e+04  13.334  < 2e-16 ***
## sexfemale         -1.603e-02  1.839e-02  3.989e+04  -0.871  0.38361    
## lifesat            5.586e-01  4.689e-03  3.983e+04 119.115  < 2e-16 ***
## selfincome         2.656e-01  4.806e-03  3.988e+04  55.265  < 2e-16 ***
## edusecondary       6.442e-02  2.982e-02  3.939e+04   2.160  0.03074 *  
## edupost-secondary  1.917e-02  3.693e-02  3.762e+04   0.519  0.60369    
## eduBA              1.752e-01  3.639e-02  3.867e+04   4.815 1.48e-06 ***
## eduhigher BA       1.456e-01  4.507e-02  3.810e+04   3.231  0.00123 ** 
## equalinc           3.271e-02  3.352e-03  3.987e+04   9.758  < 2e-16 ***
## children          -6.196e-02  6.954e-03  3.979e+04  -8.910  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) age    sexfml lifest slfncm edscnd edpst- eduBA  edhgBA
## age         -0.319                                                        
## sexfemale   -0.143  0.084                                                 
## lifesat     -0.266 -0.045 -0.028                                          
## selfincome  -0.138  0.067  0.038 -0.190                                   
## edusecondry -0.273  0.133  0.042  0.009 -0.086                            
## edpst-scndr -0.261  0.156  0.049  0.014 -0.125  0.667                     
## eduBA       -0.248  0.176  0.052 -0.001 -0.187  0.673  0.659              
## eduhigherBA -0.192  0.116  0.051 -0.007 -0.215  0.575  0.602  0.578       
## equalinc    -0.151  0.004  0.033 -0.095 -0.090 -0.025 -0.029 -0.027 -0.017
## children    -0.003 -0.414 -0.058 -0.023 -0.008  0.105  0.101  0.122  0.092
##             equlnc
## age               
## sexfemale         
## lifesat           
## selfincome        
## edusecondry       
## edpst-scndr       
## eduBA             
## eduhigherBA       
## equalinc          
## children    -0.020

tab_model(model11)

	finsat
Predictors	Estimates	CI	p
(Intercept)	0.51	0.33 – 0.70	<0.001
age	0.01	0.01 – 0.01	<0.001
sex [female]	-0.02	-0.05 – 0.02	0.384
lifesat	0.56	0.55 – 0.57	<0.001
selfincome	0.27	0.26 – 0.28	<0.001
edu [secondary]	0.06	0.01 – 0.12	0.031
edu [post-secondary]	0.02	-0.05 – 0.09	0.604
edu [BA]	0.18	0.10 – 0.25	<0.001
edu [higher BA]	0.15	0.06 – 0.23	0.001
equalinc	0.03	0.03 – 0.04	<0.001
children	-0.06	-0.08 – -0.05	<0.001
Random Effects
σ²	3.30
τ₀₀ _country	0.12
ICC	0.03
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.360 / 0.382

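After including all first-level variables, almost all of them are significant; the exceptions are being female and post-secondary education. Age, life satisfaction, income, secondary education, BA and higher degrees, and the belief that there should be greater incentives for individual effort all have positive effects. The number of children has a negative effect, which may seem surprising since the bivariate Pearson correlation was slightly positive, although not significant. One plausible explanation is that the number of children is correlated with age (the negative correlation between the age and children estimates, -0.41, suggests this), so once age is controlled for, the net association of children with financial satisfaction becomes negative.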
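The sign change for children can be illustrated with a small simulation. The sketch below uses made-up data and coefficients, not the survey: the number of children rises with age, age raises satisfaction, and children lower it. A regression on children alone then picks up part of the age effect, while adding age recovers the negative coefficient.

```r
# Simulated illustration only (made-up data and coefficients)
set.seed(42)
n <- 5000
age <- runif(n, 18, 80)
children <- rpois(n, lambda = age / 25)          # older people have more children
satisfaction <- 0.03 * age - 0.1 * children + rnorm(n)

# Children alone: absorbs part of the positive age effect
b_marginal <- coef(lm(satisfaction ~ children))[["children"]]
# Controlling for age: the true negative effect of children reappears
b_partial  <- coef(lm(satisfaction ~ children + age))[["children"]]
```

In this setup `b_marginal` comes out positive (around 0.05) and `b_partial` negative (around -0.1).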

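Finally, I add the second-level variables, together with inflation and the Gini index as additional country-level controls. Because GDP per capita and HCI are on very different scales, they are rescaled: GDP per capita is divided by 100, so its coefficient refers to a change of 100 units, and HCI is multiplied by 10, so its coefficient refers to a change of 0.1 HCI points.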

da$gdp100 <- da$gdppc / 100  # GDP per capita in units of 100
da$hci100 <- da$hci * 10     # HCI x 10: one unit = 0.1 HCI points (despite the name)
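Linear rescaling does not change the model fit or the significance of a predictor; it only changes the unit of its coefficient (the coefficient on GDP/100 is 100 times the coefficient on raw GDP). Comparable scales mainly help readability and numerical stability, and lme4 warns when predictors are on very different scales. The simulated check below uses `lm()` and made-up data to show this.

```r
# Simulated check (made-up data): rescaling a predictor rescales its coefficient
set.seed(1)
gdppc <- runif(200, 1500, 85000)
y <- 5 + 0.00001 * gdppc + rnorm(200)

fit_raw    <- lm(y ~ gdppc)
fit_scaled <- lm(y ~ I(gdppc / 100))     # same predictor in units of 100

b_raw    <- coef(fit_raw)[[2]]
b_scaled <- coef(fit_scaled)[[2]]        # equals 100 * b_raw
```

The fitted values of the two models are identical, so only the interpretation of the coefficient changes.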

model12<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children + gdp100 + inflation + unemp + hci100 + gini + (1|country), data=da, REML = FALSE)
summary(model12)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +  
##     children + gdp100 + inflation + unemp + hci100 + gini + (1 |      country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 160915.7 161070.4 -80439.8 160879.7    39874 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7019 -0.5337  0.0701  0.6149  4.6828 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.08252  0.2873  
##  Residual             3.29715  1.8158  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        2.182e+00  7.475e-01  2.010e+01   2.919  0.00846 ** 
## age                8.938e-03  6.706e-04  3.986e+04  13.329  < 2e-16 ***
## sexfemale         -1.605e-02  1.840e-02  3.988e+04  -0.872  0.38309    
## lifesat            5.586e-01  4.691e-03  3.989e+04 119.077  < 2e-16 ***
## selfincome         2.657e-01  4.807e-03  3.989e+04  55.273  < 2e-16 ***
## edusecondary       6.415e-02  2.984e-02  3.981e+04   2.150  0.03156 *  
## edupost-secondary  1.803e-02  3.698e-02  3.926e+04   0.488  0.62582    
## eduBA              1.750e-01  3.644e-02  3.976e+04   4.803 1.57e-06 ***
## eduhigher BA       1.439e-01  4.512e-02  3.927e+04   3.189  0.00143 ** 
## equalinc           3.265e-02  3.352e-03  3.987e+04   9.742  < 2e-16 ***
## children          -6.221e-02  6.958e-03  3.989e+04  -8.941  < 2e-16 ***
## gdp100             4.737e-04  4.287e-04  1.988e+01   1.105  0.28239    
## inflation         -6.344e-04  4.246e-03  1.985e+01  -0.149  0.88274    
## unemp             -4.991e-02  3.521e-02  2.006e+01  -1.417  0.17172    
## hci100            -1.993e-01  1.008e-01  1.998e+01  -1.977  0.06199 .  
## gini              -5.414e-03  1.005e-02  2.011e+01  -0.539  0.59583    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

tab_model(model12)

	finsat
Predictors	Estimates	CI	p
(Intercept)	2.18	0.72 – 3.65	0.004
age	0.01	0.01 – 0.01	<0.001
sex [female]	-0.02	-0.05 – 0.02	0.383
lifesat	0.56	0.55 – 0.57	<0.001
selfincome	0.27	0.26 – 0.28	<0.001
edu [secondary]	0.06	0.01 – 0.12	0.032
edu [post-secondary]	0.02	-0.05 – 0.09	0.626
edu [BA]	0.18	0.10 – 0.25	<0.001
edu [higher BA]	0.14	0.06 – 0.23	0.001
equalinc	0.03	0.03 – 0.04	<0.001
children	-0.06	-0.08 – -0.05	<0.001
gdp100	0.00	-0.00 – 0.00	0.269
inflation	-0.00	-0.01 – 0.01	0.881
unemp	-0.05	-0.12 – 0.02	0.156
hci100	-0.20	-0.40 – -0.00	0.048
gini	-0.01	-0.03 – 0.01	0.590
Random Effects
σ²	3.30
τ₀₀ _country	0.08
ICC	0.02
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.369 / 0.385

As the results show, most of the second-level variables are not significant. Inflation (p = 0.88) and the Gini coefficient (p = 0.60) have the highest p-values, so they are removed from the model.

model13<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children + gdp100 + unemp + hci100 + (1|country), data=da, REML = FALSE)
summary(model13)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +  
##     children + gdp100 + unemp + hci100 + (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 160912.0 161049.5 -80440.0 160880.0    39876 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7020 -0.5339  0.0700  0.6148  4.6840 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.08386  0.2896  
##  Residual             3.29715  1.8158  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        1.968e+00  6.192e-01  2.022e+01   3.178  0.00468 ** 
## age                8.945e-03  6.704e-04  3.981e+04  13.342  < 2e-16 ***
## sexfemale         -1.597e-02  1.840e-02  3.989e+04  -0.868  0.38541    
## lifesat            5.585e-01  4.689e-03  3.982e+04 119.111  < 2e-16 ***
## selfincome         2.657e-01  4.805e-03  3.987e+04  55.289  < 2e-16 ***
## edusecondary       6.454e-02  2.982e-02  3.952e+04   2.164  0.03046 *  
## edupost-secondary  1.885e-02  3.695e-02  3.821e+04   0.510  0.60993    
## eduBA              1.756e-01  3.641e-02  3.931e+04   4.824 1.41e-06 ***
## eduhigher BA       1.450e-01  4.508e-02  3.819e+04   3.217  0.00130 ** 
## equalinc           3.268e-02  3.351e-03  3.985e+04   9.750  < 2e-16 ***
## children          -6.223e-02  6.957e-03  3.989e+04  -8.944  < 2e-16 ***
## gdp100             4.548e-04  4.258e-04  1.994e+01   1.068  0.29822    
## unemp             -5.799e-02  2.646e-02  2.003e+01  -2.191  0.04042 *  
## hci100            -1.910e-01  9.900e-02  2.006e+01  -1.929  0.06802 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

tab_model(model13)

	finsat
Predictors	Estimates	CI	p
(Intercept)	1.97	0.75 – 3.18	0.001
age	0.01	0.01 – 0.01	<0.001
sex [female]	-0.02	-0.05 – 0.02	0.385
lifesat	0.56	0.55 – 0.57	<0.001
selfincome	0.27	0.26 – 0.28	<0.001
edu [secondary]	0.06	0.01 – 0.12	0.030
edu [post-secondary]	0.02	-0.05 – 0.09	0.610
edu [BA]	0.18	0.10 – 0.25	<0.001
edu [higher BA]	0.14	0.06 – 0.23	0.001
equalinc	0.03	0.03 – 0.04	<0.001
children	-0.06	-0.08 – -0.05	<0.001
gdp100	0.00	-0.00 – 0.00	0.285
unemp	-0.06	-0.11 – -0.01	0.028
hci100	-0.19	-0.38 – 0.00	0.054
Random Effects
σ²	3.30
τ₀₀ _country	0.08
ICC	0.02
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.369 / 0.385

anova(model12,model13)

## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model12: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + inflation + unemp + hci100 + gini + (1 | country)
##         npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)
## model13   16 160912 161049 -80440   160880                     
## model12   18 160916 161070 -80440   160880 0.2905  2     0.8648
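The Chisq column of `anova()` is the drop in deviance (-2 log-likelihood) between two nested ML fits, and its p-value comes from a chi-square distribution whose degrees of freedom equal the difference in the number of parameters (`npar`). A small base-R sketch, using the chi-square values copied from the anova outputs in this section:

```r
# p-value of a likelihood-ratio test between two nested models
# chisq = deviance(smaller model) - deviance(larger model)
# df    = npar(larger model) - npar(smaller model)
lrt_p <- function(chisq, df) {
  pchisq(chisq, df = df, lower.tail = FALSE)
}

# model13 (16 parameters) vs model12 (18 parameters)
p_12_13 <- lrt_p(0.2905, df = 18 - 16)
round(p_12_13, 4)   # 0.8648, as in the anova table

# model11 (13 parameters) vs model13 (16 parameters)
p_11_13 <- lrt_p(6.6736, df = 16 - 13)
round(p_11_13, 4)   # compare with 0.08306 in the anova table
```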

Removing these two variables did not make the model significantly worse (χ²(2) = 0.29, p = 0.86), and both AIC and BIC are lower. Therefore, only GDP, unemployment and HCI are kept at the second level. The effects in this model are interpreted as follows:

The positive effect is shown by:

1. Age. Each additional year of age adds on average 0.01 to a person's financial satisfaction.
2. Life satisfaction. Each additional point of life satisfaction adds on average 0.56 to financial satisfaction.
3. Income level. Each additional step on the self-assessed income scale adds on average 0.27 to financial satisfaction.
4. Secondary education. Compared with people with low education, people with secondary education have on average 0.06 higher financial satisfaction.
5. BA. Compared with people with low education, people with a BA have on average 0.18 higher financial satisfaction.
6. Higher than BA. Compared with people with low education, people with education higher than a BA have on average 0.14 higher financial satisfaction.
7. Belief about income inequality. Each additional point towards the belief that incomes should not be equal adds on average 0.03 to financial satisfaction.

The negative effect is shown by:

Children. Each additional child decreases financial satisfaction by 0.06 on average.
Unemployment. Each one-unit increase in the country's unemployment rate decreases financial satisfaction by 0.06 on average.
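Because the model is linear, the predicted difference between two people is the sum of each coefficient times the difference in that predictor, holding all other variables constant. The sketch below uses the rounded model13 estimates; the two respondents are hypothetical and only illustrate the arithmetic.

```r
# Rounded fixed effects from summary(model13)
b <- c(lifesat = 0.5585, selfincome = 0.2657, children = -0.0622)

# Two hypothetical respondents who differ only in these three predictors
person_a <- c(lifesat = 5, selfincome = 4, children = 2)
person_b <- c(lifesat = 8, selfincome = 6, children = 0)

# Predicted difference in financial satisfaction (person_b minus person_a)
diff_ab <- sum(b * (person_b - person_a))
diff_ab
```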

anova(model11, model13)

## Data: da
## Models:
## model11: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + (1 | country)
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
##         npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)  
## model11   13 160913 161024 -80443   160887                       
## model13   16 160912 161049 -80440   160880 6.6736  3    0.08306 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, the likelihood-ratio test comparing the model with only 1st-level variables and the model with 2nd-level variables is not significant at the 5% level (χ²(3) = 6.67, p = 0.083). Therefore, the 2nd-level variables do not significantly improve the model.

model_performance(model11)

## Model was not fitted with REML, however, `estimator = "REML"`. Set
##   `estimator = "ML"` to obtain identical results as from `AIC()`.

## # Indices of model performance
## 
## AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.610e+05 | 1.610e+05 | 1.611e+05 |      0.382 |      0.360 | 0.034 | 1.815 | 1.816

model_performance(model13)

## Model was not fitted with REML, however, `estimator = "REML"`. Set
##   `estimator = "ML"` to obtain identical results as from `AIC()`.

## # Indices of model performance
## 
## AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.610e+05 | 1.610e+05 | 1.612e+05 |      0.385 |      0.369 | 0.025 | 1.815 | 1.816

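AIC is practically the same for the model with only 1st-level variables and the model with 2nd-level variables (160913 vs 160912), so the two do not differ in fit. BIC, however, is higher for the model with 2nd-level variables (161049 vs 161024), because BIC penalises the three extra parameters more strongly. The R² of the model with 2nd-level variables is slightly higher (marginal 0.369 vs 0.360, conditional 0.385 vs 0.382), which means that it explains a slightly larger proportion of the variance in financial satisfaction.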
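The AIC, BIC and ICC values above can be reproduced by hand from the printed output of model13, which also shows why BIC reacts more strongly to extra parameters than AIC. A base-R sketch, using the deviance, `npar` and variance components copied from `summary(model13)` and the anova table:

```r
# deviance = -2 * log-likelihood of the ML fit, k = npar, n = observations
info_criteria <- function(deviance, k, n) {
  c(AIC = deviance + 2 * k,        # penalty of 2 per parameter
    BIC = deviance + k * log(n))   # penalty of log(n) per parameter
}

n <- 39892
ic13 <- info_criteria(deviance = 160880.0, k = 16, n = n)
round(ic13, 1)

# Per-parameter penalty of each criterion for this sample size
round(c(AIC = 2, BIC = log(n)), 2)

# ICC of a random-intercept model: share of total variance between countries
icc <- function(tau00, sigma2) tau00 / (tau00 + sigma2)
icc13 <- icc(tau00 = 0.08386, sigma2 = 3.29715)  # variance components of model13
round(icc13, 3)
```

With n = 39892, each parameter costs about 10.6 BIC points but only 2 AIC points. The three 2nd-level variables reduce the deviance by about 6.7, which slightly outweighs the AIC penalty of 6 but not the BIC penalty of about 32.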

Interactions

Interaction with 1st level and 1st level

For the interaction between two 1st-level variables, I have chosen education and life satisfaction.

model21<-lmer(finsat~ age + sex + selfincome +edu*lifesat + equalinc + children + gdp100 + unemp + hci100 + (1|country), data=da, REML = FALSE)
summary(model21)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu * lifesat + equalinc +  
##     children + gdp100 + unemp + hci100 + (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 160873.6 161045.5 -80416.8 160833.6    39872 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7419 -0.5400  0.0699  0.6133  4.7726 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.08255  0.2873  
##  Residual             3.29335  1.8148  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                             Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)                2.335e+00  6.173e-01  2.061e+01   3.783 0.001120 ** 
## age                        8.879e-03  6.702e-04  3.981e+04  13.248  < 2e-16 ***
## sexfemale                 -1.492e-02  1.839e-02  3.988e+04  -0.812 0.417053    
## selfincome                 2.639e-01  4.810e-03  3.987e+04  54.850  < 2e-16 ***
## edusecondary              -2.987e-01  9.052e-02  3.988e+04  -3.300 0.000969 ***
## edupost-secondary         -5.529e-01  1.070e-01  3.978e+04  -5.168 2.38e-07 ***
## eduBA                     -4.199e-01  1.151e-01  3.989e+04  -3.649 0.000264 ***
## eduhigher BA              -4.237e-01  1.485e-01  3.985e+04  -2.853 0.004339 ** 
## lifesat                    5.088e-01  9.385e-03  3.989e+04  54.218  < 2e-16 ***
## equalinc                   3.260e-02  3.350e-03  3.985e+04   9.734  < 2e-16 ***
## children                  -6.266e-02  6.954e-03  3.989e+04  -9.011  < 2e-16 ***
## gdp100                     4.458e-04  4.225e-04  1.994e+01   1.055 0.303919    
## unemp                     -5.747e-02  2.626e-02  2.003e+01  -2.189 0.040643 *  
## hci100                    -1.896e-01  9.823e-02  2.006e+01  -1.931 0.067803 .  
## edusecondary:lifesat       4.963e-02  1.176e-02  3.989e+04   4.222 2.43e-05 ***
## edupost-secondary:lifesat  7.911e-02  1.397e-02  3.989e+04   5.663 1.49e-08 ***
## eduBA:lifesat              8.193e-02  1.506e-02  3.988e+04   5.440 5.36e-08 ***
## eduhigher BA:lifesat       7.770e-02  1.935e-02  3.989e+04   4.015 5.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

tab_model(model21)

	finsat
Predictors	Estimates	CI	p
(Intercept)	2.34	1.13 – 3.55	<0.001
age	0.01	0.01 – 0.01	<0.001
sex [female]	-0.01	-0.05 – 0.02	0.417
selfincome	0.26	0.25 – 0.27	<0.001
edu [secondary]	-0.30	-0.48 – -0.12	0.001
edu [post-secondary]	-0.55	-0.76 – -0.34	<0.001
edu [BA]	-0.42	-0.65 – -0.19	<0.001
edu [higher BA]	-0.42	-0.71 – -0.13	0.004
lifesat	0.51	0.49 – 0.53	<0.001
equalinc	0.03	0.03 – 0.04	<0.001
children	-0.06	-0.08 – -0.05	<0.001
gdp100	0.00	-0.00 – 0.00	0.291
unemp	-0.06	-0.11 – -0.01	0.029
hci100	-0.19	-0.38 – 0.00	0.054
edu [secondary] × lifesat	0.05	0.03 – 0.07	<0.001
edu [post-secondary] × lifesat	0.08	0.05 – 0.11	<0.001
edu [BA] × lifesat	0.08	0.05 – 0.11	<0.001
edu [higher BA] × lifesat	0.08	0.04 – 0.12	<0.001
Random Effects
σ²	3.29
τ₀₀ _country	0.08
ICC	0.02
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.370 / 0.385

anova(model13, model21)

## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model21: finsat ~ age + sex + selfincome + edu * lifesat + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
##         npar    AIC    BIC logLik deviance Chisq Df Pr(>Chisq)    
## model13   16 160912 161049 -80440   160880                        
## model21   20 160874 161046 -80417   160834 46.31  4  2.123e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot_model(model21, type="int")

As the anova result shows, adding the interaction significantly improves the model (χ²(4) = 46.31, p < 0.001), and AIC drops from 160912 to 160874.

Once the interaction is included, the main effects of education become negative, but they now describe the difference from the lowest education group for a person with a life satisfaction score of 0. All interaction terms are positive, so this gap shrinks as life satisfaction rises and turns positive at higher levels of life satisfaction (at roughly 5 to 7 points, depending on the education level).

The graph shows the same pattern: at the highest level of life satisfaction, every level of education has a more positive effect on financial satisfaction than low education, while at the lowest level of life satisfaction every level of education has a more negative effect.
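With an interaction, the effect of each education level (relative to the lowest education group) depends on life satisfaction: it equals the education coefficient plus the interaction coefficient times the life satisfaction score. A sketch using the model21 estimates; the values 1 and 10 are illustrative and assume a 1-to-10 life satisfaction scale, which is not stated in this section. These are point estimates only, without confidence intervals.

```r
# Education main effects and education x lifesat interactions from summary(model21)
edu_main <- c(secondary = -0.2987, post_secondary = -0.5529, BA = -0.4199, higher_BA = -0.4237)
edu_int  <- c(secondary =  0.04963, post_secondary = 0.07911, BA = 0.08193, higher_BA = 0.07770)

# Effect of each education level vs. the reference group at a given lifesat value
edu_effect <- function(lifesat) edu_main + edu_int * lifesat

round(rbind(lifesat_1 = edu_effect(1), lifesat_10 = edu_effect(10)), 2)

# Life satisfaction score at which each education effect changes sign
crossover <- -edu_main / edu_int
round(crossover, 1)
```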

Interaction 1st level and 2nd level

model23<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp*lifesat + hci100 + gdp100 + (1|country), data=da, REML = FALSE)
summary(model23)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu + equalinc + children +  
##     unemp * lifesat + hci100 + gdp100 + (1 | country)
##    Data: da
## 
##      AIC      BIC   logLik deviance df.resid 
## 160838.5 160984.6 -80402.3 160804.5    39875 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7350 -0.5352  0.0687  0.6142  4.6518 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.08666  0.2944  
##  Residual             3.29087  1.8141  
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        1.428e+00  6.322e-01  2.058e+01   2.258  0.03491 *  
## age                8.777e-03  6.701e-04  3.982e+04  13.099  < 2e-16 ***
## sexfemale         -1.816e-02  1.838e-02  3.988e+04  -0.988  0.32310    
## selfincome         2.647e-01  4.802e-03  3.987e+04  55.125  < 2e-16 ***
## edusecondary       6.446e-02  2.979e-02  3.955e+04   2.164  0.03050 *  
## edupost-secondary  1.707e-02  3.692e-02  3.833e+04   0.462  0.64389    
## eduBA              1.717e-01  3.638e-02  3.936e+04   4.718 2.39e-06 ***
## eduhigher BA       1.420e-01  4.504e-02  3.832e+04   3.153  0.00162 ** 
## equalinc           3.253e-02  3.348e-03  3.985e+04   9.717  < 2e-16 ***
## children          -6.192e-02  6.951e-03  3.989e+04  -8.908  < 2e-16 ***
## unemp              4.901e-02  2.957e-02  2.924e+01   1.657  0.10816    
## lifesat            6.379e-01  1.027e-02  3.975e+04  62.105  < 2e-16 ***
## hci100            -1.917e-01  1.006e-01  2.003e+01  -1.905  0.07117 .  
## gdp100             4.531e-04  4.326e-04  1.991e+01   1.047  0.30748    
## unemp:lifesat     -1.518e-02  1.747e-03  3.949e+04  -8.689  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

tab_model(model23)

	finsat
Predictors	Estimates	CI	p
(Intercept)	1.43	0.19 – 2.67	0.024
age	0.01	0.01 – 0.01	<0.001
sex [female]	-0.02	-0.05 – 0.02	0.323
selfincome	0.26	0.26 – 0.27	<0.001
edu [secondary]	0.06	0.01 – 0.12	0.030
edu [post-secondary]	0.02	-0.06 – 0.09	0.644
edu [BA]	0.17	0.10 – 0.24	<0.001
edu [higher BA]	0.14	0.05 – 0.23	0.002
equalinc	0.03	0.03 – 0.04	<0.001
children	-0.06	-0.08 – -0.05	<0.001
unemp	0.05	-0.01 – 0.11	0.097
lifesat	0.64	0.62 – 0.66	<0.001
hci100	-0.19	-0.39 – 0.01	0.057
gdp100	0.00	-0.00 – 0.00	0.295
unemp × lifesat	-0.02	-0.02 – -0.01	<0.001
Random Effects
σ²	3.29
τ₀₀ _country	0.09
ICC	0.03
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.370 / 0.386

anova(model13, model21, model23)

## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model23: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 | country)
## model21: finsat ~ age + sex + selfincome + edu * lifesat + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
##         npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)    
## model13   16 160912 161049 -80440   160880                         
## model23   17 160839 160985 -80402   160805 75.413  1     <2e-16 ***
## model21   20 160874 161046 -80417   160834  0.000  3          1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot_model(model23, type="int")

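Including the interaction between a 1st-level and a 2nd-level variable improves the model (χ²(1) = 75.41, p < 0.001 compared with model13), and this model has the lowest AIC and BIC of all models so far (AIC = 160839). Note that model21 and model23 are not nested, so the last row of the anova table (χ² = 0, p = 1) is not a valid likelihood-ratio test; these two models should be compared with AIC and BIC instead. The model results show that the higher the level of life satisfaction, the stronger the negative effect of unemployment. This finding is also supported by the graph.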
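In model23 the cross-level interaction means that the slope of unemployment depends on life satisfaction, and vice versa. A sketch of these conditional slopes using the model23 point estimates; the life satisfaction and unemployment values are illustrative, and no standard errors are computed.

```r
# Coefficients from summary(model23)
b_unemp    <- 0.04901
b_lifesat  <- 0.6379
b_interact <- -0.01518

# Conditional effect of unemployment at a given level of life satisfaction
unemp_slope <- function(lifesat) b_unemp + b_interact * lifesat
# Conditional effect of life satisfaction at a given unemployment rate
lifesat_slope <- function(unemp) b_lifesat + b_interact * unemp

round(unemp_slope(c(1, 5, 10)), 3)
round(lifesat_slope(c(0, 10)), 3)

# Life satisfaction score above which the unemployment slope turns negative
round(-b_unemp / b_interact, 2)
```

On these point estimates, the unemployment slope is positive only at very low life satisfaction (below about 3.2) and becomes increasingly negative above it.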

Random effects

For the random slope, I have chosen the self-assessed income level (selfincome), so that its effect can vary between countries.

model31<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp*lifesat + hci100 + gdp100 + (1+ selfincome|country), data=da, control = lmerControl(optimizer ="Nelder_Mead"))
summary(model31)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu + equalinc + children +  
##     unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
##    Data: da
## Control: lmerControl(optimizer = "Nelder_Mead")
## 
## REML criterion at convergence: 160543.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.8079 -0.5343  0.0657  0.6150  4.7142 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  country  (Intercept) 0.529710 0.72781       
##           selfincome  0.009776 0.09887  -0.92
##  Residual             3.256631 1.80461       
## Number of obs: 39892, groups:  country, 20
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        9.504e-01  6.256e-01  1.882e+01   1.519  0.14535    
## age                8.847e-03  6.718e-04  3.981e+04  13.168  < 2e-16 ***
## sexfemale         -9.102e-03  1.832e-02  3.986e+04  -0.497  0.61935    
## selfincome         2.744e-01  2.274e-02  1.916e+01  12.069 2.11e-10 ***
## edusecondary       9.018e-02  2.991e-02  3.954e+04   3.016  0.00257 ** 
## edupost-secondary  5.478e-02  3.698e-02  3.843e+04   1.481  0.13853    
## eduBA              1.774e-01  3.643e-02  3.936e+04   4.870 1.12e-06 ***
## eduhigher BA       9.826e-02  4.497e-02  3.703e+04   2.185  0.02891 *  
## equalinc           3.125e-02  3.336e-03  3.980e+04   9.369  < 2e-16 ***
## children          -6.456e-02  6.943e-03  3.986e+04  -9.299  < 2e-16 ***
## unemp              9.228e-02  2.898e-02  2.583e+01   3.184  0.00376 ** 
## lifesat            6.357e-01  1.040e-02  3.954e+04  61.100  < 2e-16 ***
## hci100            -1.698e-01  9.639e-02  1.620e+01  -1.762  0.09691 .  
## gdp100             7.867e-04  4.141e-04  1.605e+01   1.900  0.07560 .  
## unemp:lifesat     -1.618e-02  1.775e-03  3.918e+04  -9.116  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

tab_model(model31, show.std = TRUE)

	finsat
Predictors	Estimates	std. Beta	CI	standardized CI	p	std. p
(Intercept)	0.95	-0.04	-0.28 – 2.18	-0.11 – 0.02	0.129	0.209
age	0.01	0.06	0.01 – 0.01	0.05 – 0.07	<0.001	<0.001
sex [female]	-0.01	-0.00	-0.05 – 0.03	-0.02 – 0.01	0.619	0.619
selfincome	0.27	0.25	0.23 – 0.32	0.21 – 0.29	<0.001	<0.001
edu [secondary]	0.09	0.04	0.03 – 0.15	0.01 – 0.06	0.003	0.003
edu [post-secondary]	0.05	0.02	-0.02 – 0.13	-0.01 – 0.05	0.139	0.139
edu [BA]	0.18	0.08	0.11 – 0.25	0.05 – 0.11	<0.001	<0.001
edu [higher BA]	0.10	0.04	0.01 – 0.19	0.00 – 0.08	0.029	0.029
equalinc	0.03	0.04	0.02 – 0.04	0.03 – 0.05	<0.001	<0.001
children	-0.06	-0.04	-0.08 – -0.05	-0.05 – -0.03	<0.001	<0.001
unemp	0.09	-0.03	0.04 – 0.15	-0.08 – 0.03	0.001	0.354
lifesat	0.64	0.49	0.62 – 0.66	0.48 – 0.50	<0.001	<0.001
hci100	-0.17	-0.08	-0.36 – 0.02	-0.17 – 0.01	0.078	0.078
gdp100	0.00	0.09	-0.00 – 0.00	-0.00 – 0.19	0.057	0.057
unemp × lifesat	-0.02	-0.04	-0.02 – -0.01	-0.04 – -0.03	<0.001	<0.001
Random Effects
σ²	3.26
τ₀₀ _country	0.53
τ₁₁ _{country.selfincome}	0.01
ρ₀₁ _country	-0.92
ICC	0.05
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.366 / 0.395

anova(model23, model31)

## refitting model(s) with ML (instead of REML)

## Data: da
## Models:
## model23: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 | country)
## model31: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
##         npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)    
## model23   17 160839 160985 -80402   160805                         
## model31   19 160471 160634 -80217   160433 371.51  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ranova(model31)

## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## finsat ~ age + sex + selfincome + edu + equalinc + children + unemp + lifesat + hci100 + gdp100 + (1 + selfincome | country) + unemp:lifesat
##                                          npar logLik    AIC    LRT Df
## <none>                                     19 -80272 160582          
## selfincome in (1 + selfincome | country)   17 -80459 160951 373.51  2
##                                          Pr(>Chisq)    
## <none>                                                 
## selfincome in (1 + selfincome | country)  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The anova test shows that this model is significantly better than the previous one, which included only a random intercept (χ²(2) = 371.51, p < 0.001), and ranova() confirms that the random slope of income is significant. This model also has the lowest AIC and BIC of all models fitted so far, so it is treated as the final model. The predictors are interpreted as follows:

The positive effect is shown by:

1. Age. Each additional year of age adds on average 0.01 to a person's financial satisfaction.
2. Life satisfaction. Each additional point of life satisfaction adds on average 0.64 to financial satisfaction (because of the interaction below, this is the effect when unemployment equals zero).
3. Income level. Each additional step on the self-assessed income scale adds on average 0.27 to financial satisfaction.
4. Secondary education. Compared with people with low education, people with secondary education have on average 0.09 higher financial satisfaction.
5. BA. Compared with people with low education, people with a BA have on average 0.18 higher financial satisfaction.
6. Higher than BA. Compared with people with low education, people with education higher than a BA have on average 0.10 higher financial satisfaction.
7. Belief about income inequality. Each additional point towards the belief that incomes should not be equal adds on average 0.03 to financial satisfaction.

The negative effect is shown by:

Children. With every child the level of financial satisfaction decreases by 0.06.

Interaction effect:

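The higher the level of life satisfaction, the more negative the effect of unemployment on financial satisfaction (unemp × lifesat = -0.02). Because of this interaction, the main effects of unemployment (0.09) and life satisfaction (0.64) describe each variable's effect when the other one equals zero.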

Random effect:

The random slope for income level is significant: the effect of income on financial satisfaction (the dependent variable) differs between countries.

The random slope graph can be seen below:

dotplot(ranef(model31, condVar=TRUE))

## $country

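The plot shows that the effect of income level differs between countries. The values in the selfincome panel are each country's deviation from the average income slope (0.27), not the slope itself.

The effect of income is below average for: Pakistan, Thailand, Netherlands, Japan, Mexico, Mongolia, Colombia, Indonesia. It is above average for: USA, Ukraine, Canada, Russia, Germany, Brazil, Singapore. The intervals include 0, i.e. the effect is close to the average, for: China, South Korea, Turkey, Australia, Kazakhstan.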
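`ranef()` returns each country's deviation from the average income slope, so a country's own slope is the fixed effect plus its deviation. Assuming, as `lmer` does, that the random slopes are normally distributed, the fixed effect and the random-slope standard deviation from `summary(model31)` give a rough range for the country-specific slopes:

```r
# Fixed slope of selfincome and SD of its random slope, from summary(model31)
b_income  <- 0.2744
sd_income <- 0.09887

# Range expected to cover about 95% of country-specific income slopes
# (assumes normally distributed random slopes, as in the lmer model)
slope_range <- b_income + c(-1, 1) * qnorm(0.975) * sd_income
round(slope_range, 3)
```

Both ends of this range are positive, which suggests that income raises financial satisfaction in practically every country and that the countries differ mainly in how strongly it does so.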

Diagnostics

model_performance(model31)

## # Indices of model performance
## 
## AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.606e+05 | 1.606e+05 | 1.607e+05 |      0.395 |      0.366 | 0.046 | 1.804 | 1.805

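Compared with the previous models, the final model has the lowest AIC and BIC. Its conditional R² is 0.395, which means that the fixed and random effects together explain about 39.5% of the variance in financial satisfaction (the fixed effects alone explain 36.6%).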

residuals <- resid(model31)
plot(residuals, main = "Residuals")

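The residuals are spread evenly around zero without a visible pattern, which suggests no obvious systematic misfit. Note that this plot shows the residuals in the order of the observations; plotting them against the fitted values (plot(fitted(model31), residuals)) would be a stronger check.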

qqnorm(residuals)
qqline(residuals)

The Q-Q plot is not ideal: the residuals deviate from the reference line at both ends, which indicates that the tails of the residual distribution depart from normality.

vif_model <- vif(model31)
print(vif_model)

##                   GVIF Df GVIF^(1/(2*Df))
## age           1.314342  1        1.146447
## sex           1.013325  1        1.006641
## selfincome    1.006487  1        1.003238
## edu           1.105423  4        1.012607
## equalinc      1.011858  1        1.005912
## children      1.280316  1        1.131510
## unemp         1.520246  1        1.232983
## lifesat       5.007857  1        2.237824
## hci100        2.810049  1        1.676320
## gdp100        2.899219  1        1.702709
## unemp:lifesat 5.292935  1        2.300638

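In general, the GVIF values are acceptable. However, life satisfaction (5.01) and its interaction with unemployment (5.29) have rather high values, which suggests possible multicollinearity in the model. This is most likely structural: lifesat appears both on its own and inside the unemp:lifesat product, so the two terms are strongly correlated. Centering lifesat and unemp before forming the interaction would usually reduce it.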
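A high VIF is what one would expect when a variable enters the model both alone and inside a product term. The sketch below uses simulated data (`x` and `z` are hypothetical stand-ins for lifesat and unemp, not the project variables) and computes the VIF of `x` as 1 / (1 - R²) from regressing `x` on the other predictors, once with the raw product and once with both variables mean-centered before multiplying:

```r
# Simulated, independent predictors (hypothetical stand-ins)
set.seed(1)
n <- 5000
x <- runif(n, 1, 10)
z <- runif(n, 2, 15)

# VIF of x: 1 / (1 - R^2) from regressing x on z and the product x * z
vif_x <- function(x, z) {
  r2 <- summary(lm(x ~ z + I(x * z)))$r.squared
  1 / (1 - r2)
}

vif_raw      <- vif_x(x, z)                       # raw product term
vif_centered <- vif_x(x - mean(x), z - mean(z))   # centered before multiplying
round(c(raw = vif_raw, centered = vif_centered), 2)
```

In this simulation the raw VIF is several times larger than the centered one, which stays close to 1. Centering changes the meaning of the main effects (they become effects at the mean of the other variable) but not the interaction coefficient or the model fit.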

Extra model

To remove the multicollinearity, an extra model is fitted without life satisfaction (and therefore also without its interaction with unemployment).

model41<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp + hci100 + gdp100 + (1+ selfincome|country), data=da, control = lmerControl(optimizer ="Nelder_Mead"))
tab_model(model41)

	finsat
Predictors	Estimates	CI	p
(Intercept)	5.35	3.50 – 7.19	<0.001
age	0.01	0.01 – 0.01	<0.001
sex [female]	0.06	0.02 – 0.10	0.007
selfincome	0.39	0.33 – 0.46	<0.001
edu [secondary]	0.08	0.01 – 0.15	0.020
edu [post-secondary]	0.03	-0.06 – 0.11	0.523
edu [BA]	0.19	0.11 – 0.28	<0.001
edu [higher BA]	0.12	0.01 – 0.22	0.027
equalinc	0.07	0.06 – 0.08	<0.001
children	-0.05	-0.07 – -0.03	<0.001
unemp	-0.03	-0.10 – 0.05	0.520
hci100	-0.33	-0.62 – -0.04	0.023
gdp100	0.00	0.00 – 0.00	0.015
Random Effects
σ²	4.39
τ₀₀ _country	1.22
τ₁₁ _{country.selfincome}	0.02
ρ₀₁ _country	-0.92
ICC	0.08
N _country	20
Observations	39892
Marginal R² / Conditional R²	0.153 / 0.219

anova(model31, model41)

## refitting model(s) with ML (instead of REML)

## Data: da
## Models:
## model41: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp + hci100 + gdp100 + (1 + selfincome | country)
## model31: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
##         npar    AIC    BIC logLik deviance Chisq Df Pr(>Chisq)    
## model41   17 172396 172542 -86181   172362                        
## model31   19 160471 160634 -80217   160433 11929  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

vif_model1 <- vif(model41)
print(vif_model1)

##                GVIF Df GVIF^(1/(2*Df))
## age        1.309402  1        1.144291
## sex        1.012176  1        1.006070
## selfincome 1.002969  1        1.001484
## edu        1.101886  4        1.012202
## equalinc   1.002761  1        1.001380
## children   1.279523  1        1.131160
## unemp      1.204905  1        1.097682
## hci100     2.804680  1        1.674718
## gdp100     2.893936  1        1.701157

residuals1 <- resid(model41)
plot(residuals1, main = "Residuals")

qqnorm(residuals1)
qqline(residuals1)

As we can see, this model performs worse than the previous one: its AIC is much higher (172396 vs 160471), and its R² is lower (marginal 0.153 and conditional 0.219, vs 0.366 and 0.395). However, the Q-Q plot looks better, and there is no multicollinearity problem here, as all GVIF values are below 3.

Findings

The research question was: How does life satisfaction and belief in income equality affect satisfaction with financial situation among individuals with varying levels of education and income?

This project has shown that life satisfaction, income, and the belief that incomes should not be equal are positively associated with financial satisfaction, while the number of children is negatively associated with it.

Hypotheses

People with higher scores of life satisfaction are more satisfied with their financial situation. - confirmed
People who don’t believe that incomes should be equal are more satisfied with their financial situation. - confirmed
Higher levels of education are associated with higher satisfaction with financial situation. - largely confirmed (post-secondary education does not differ significantly from low education)
The more children the person has, the less they are satisfied with their financial situation. - confirmed
The higher the self-assessed level of income, the higher the satisfaction with financial situation. - confirmed
People coming from countries with higher GDP per capita are more satisfied with their financial situation - wasn’t supported
People coming from countries with higher human capital index are more satisfied with their financial situation. - wasn’t supported
People coming from countries with a higher unemployment rate are less satisfied with their financial situation. - partly supported, since this holds for people with higher levels of life satisfaction.

Reflection

The project was interesting to me; however, I encountered a memory problem on my computer, because the image of the missing-data frame (image(mdf)) could not be plotted. Also, for some reason I was not able to run plot_model and could not find a solution.

Otherwise, the work on selecting variables and creating different models, especially the model with the random slope, was very intriguing and important to me.