Financial satisfaction is an important topic because it shapes people's lives and their overall happiness. In this project I therefore examine how different indicators affect financial satisfaction. To get a more detailed picture, I use data on 20 countries from around the world, which makes it possible to see how the effects of these variables differ between countries.
My main interest is in the effects of life satisfaction and belief in income equality. Life satisfaction and satisfaction with one's financial situation are connected, since both reflect how content a person is with their circumstances and prospects. Belief in income equality is tied to a person's economic views. I argue that it is worth studying because the assessment of personal happiness is affected by beliefs about the general situation.
Thus, I pose the following research question:
How do life satisfaction and belief in income equality affect satisfaction with one's financial situation among individuals with varying levels of education and income?
To answer this RQ I developed the following hypotheses:
Additional hypotheses:
Hypotheses for the 2nd level variables:
To answer the research question and address the hypotheses, I have selected the following countries: Russia, Kazakhstan, Ukraine, USA, Thailand, China, Japan, South Korea, Germany, Australia, Mexico, Turkey, Canada, Indonesia, Netherlands, Singapore, Pakistan, Brazil, Mongolia, Colombia. The selection was motivated by the availability of all required variables and by the diversity of these countries in terms of the 2nd level variables: they have different levels of GDP per capita, HCI, and unemployment rate. Therefore, I expect that studying them will reveal differences that are essential to understanding satisfaction with one's financial situation.
The dependent variable is satisfaction with the financial situation of the household. It is measured on a 10-point scale, from 1 (completely dissatisfied) to 10 (completely satisfied).
The independent variables are the following:
first level
Life satisfaction. Measured on a 10-point scale, from 1 (completely dissatisfied) to 10 (completely satisfied).
Self-assessed level of income. Measured on a 10-point scale, from 1 (lowest group) to 10 (highest group).
Highest level of education. Factor variable with five levels: primary and lower, secondary, post-secondary, BA, higher than BA.
Belief about equality of incomes. Measured on a 10-point scale, from 1 (incomes should be made more equal) to 10 (there should be greater incentives for individual effort).
Number of children that person has.
Control: sex, age.
second level
The data were obtained from World Bank Open Data.
GDP per capita.
Human Capital Index.
Unemployment rate.
After filtering, renaming, and releveling the variables, the data have the following structure. The total number of observations is 39892 and the number of variables is 9.
view_df(data, show.type =T, show.frq = T, show.prc = T, show.na = T)
| ID | Name | Type | Label | missings | Values | Value Labels | Freq. | % |
|---|---|---|---|---|---|---|---|---|
| 1 | country | categorical | | 0 (0.00%) | | Australia, Brazil, Canada, China, Colombia, Germany, Indonesia, Japan, Kazakhstan, South Korea, Mexico, Mongolia, Netherlands, Pakistan, Russia <… truncated> | 1813, 1762, 4018, 3036, 1520, 1528, 3200, 1353, 1276, 1245, 1741, 1638, 2145, 1995, 1810 | 4.54, 4.42, 10.07, 7.61, 3.81, 3.83, 8.02, 3.39, 3.20, 3.12, 4.36, 4.11, 5.38, 5.00, 4.54 |
| 2 | finsat | integer | | 293 (0.73%) | range: 1-10 | | | |
| 3 | lifesat | integer | | 248 (0.62%) | range: 1-10 | | | |
| 4 | selfincome | integer | | 1381 (3.46%) | range: 1-10 | | | |
| 5 | age | integer | | 32 (0.08%) | range: 17-98 | | | |
| 6 | sex | categorical | | 23 (0.06%) | | male, female | 18840, 21029 | 47.25, 52.75 |
| 7 | edu | categorical | | 521 (1.31%) | | primary and lower, secondary, post-secondary, BA, higher BA | 6498, 14037, 7654, 7588, 3594 | 16.50, 35.65, 19.44, 19.27, 9.13 |
| 8 | equalinc | integer | | 631 (1.58%) | range: 1-10 | | | |
| 9 | children | integer | | 565 (1.42%) | range: 0-22 | | | |
The distributions of the variables are shown below:
ggarrange(
ggplot(data = data, aes(x = finsat)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Financial satisfaction',
y = 'Count'),
ggplot(data = data, aes(x = lifesat)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Life satisfaction',
y = 'Count'),
ggplot(data = data, aes(x = selfincome)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Income groups',
y = 'Count'),
ggplot(data = data, aes(x = equalinc)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Incomes should not be equal',
y = 'Count'),
ncol = 2, nrow = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
None of the variables is normally distributed, with the partial exception of income group. Even there, too many respondents place themselves in the lowest income group for the distribution to be called properly normal.
These plots show that, on average, people are rather satisfied with their financial situation or rate it as average. Ratings of life satisfaction are more positive. Most people rate their household's income group as average, and the distribution is again close to normal. As for beliefs about income equality, more people think that there should be greater incentives for individual effort. However, there is also a large group of people who think that incomes should be completely equal.
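The `stat_bin()` messages above appear because 30 bins are applied to scales that only take the values 1 to 10. A minimal sketch in base R of an alternative view, one count per scale point, is given below. The vector `x` holds toy values for illustration, not the survey data.

```r
# Toy responses on a 1-10 scale (illustrative values, not the survey data)
x <- c(1, 1, 3, 5, 5, 5, 7, 8, 8, 10, NA)

# One cell per scale point; empty scale points are kept, NA is dropped
counts <- table(factor(x, levels = 1:10))

# Percentage of valid answers at each scale point
shares <- round(100 * prop.table(counts), 1)

counts
shares
# barplot(counts) would draw one bar per scale point; in ggplot2 the
# analogous choices are geom_bar() or geom_histogram(binwidth = 1)
```

With one bar per scale point, spikes at the ends of the scale are easier to see than in a 30-bin histogram.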
ggarrange(
ggplot(data = data, aes(x = age)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Age',
y = 'Count'),
ggplot(data = data, aes(x = sex)) +
geom_bar(fill="skyblue")+
theme_minimal() +
labs( x = 'Sex',
y = 'Count'),
ggplot(data = data, aes(x = edu)) +
geom_bar(fill="skyblue")+
theme_minimal() +
labs(x = 'Education',
y = 'Count'),
ggplot(data = data, aes(x = children)) +
geom_histogram(fill="skyblue")+
theme_minimal() +
labs(x = 'Number of children',
y = 'Count'),
ncol = 2, nrow = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The age distribution is close to normal but right-skewed. The numbers of male and female respondents are almost equal, with slightly more women. The most common level of education is secondary; the least common is a degree higher than a BA. Most people have either no children or a small number of children.
mdf <- missing_data.frame (data[,-1])
mdf <- change (mdf, y = c("age", "finsat", "lifesat", "selfincome", "equalinc"), what = "type", to = "positive-continuous")
mdf <- change (mdf, y = c("edu"), what = "type", to = "ordered-categorical")
show(mdf)
## Object of class missing_data.frame with 39892 observations on 8 variables
##
## There are 67 missing data patterns
##
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
##
## type missing method model
## finsat positive-continuous 293 ppd linear
## lifesat positive-continuous 248 ppd linear
## selfincome positive-continuous 1381 ppd linear
## age positive-continuous 32 ppd linear
## sex binary 23 ppd logit
## edu ordered-categorical 521 ppd ologit
## equalinc positive-continuous 631 ppd linear
## children continuous 565 ppd linear
##
## family link transformation
## finsat gaussian identity log
## lifesat gaussian identity log
## selfincome gaussian identity log
## age gaussian identity log
## sex binomial logit <NA>
## edu multinomial logit <NA>
## equalinc gaussian identity log
## children gaussian identity standardize
#image(mdf)
As we can see, the number of missing values is low for most of the variables. The variable with the highest share of missing values is self-assessed income (3.46%). The missingness mechanism is most likely MAR (missing at random).
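The shares of missing values can be checked directly from the counts in the output above. The following is a minimal base R sketch; the object names `n_missing` and `share_missing` are illustrative and do not come from the original analysis.

```r
# Missing-value counts as reported in the missing_data.frame output
n_obs <- 39892
n_missing <- c(finsat = 293, lifesat = 248, selfincome = 1381, age = 32,
               sex = 23, edu = 521, equalinc = 631, children = 565)

# Share of missing values per variable, in percent
share_missing <- round(100 * n_missing / n_obs, 2)

# Sorted from the most to the least affected variable
share_missing[order(share_missing, decreasing = TRUE)]

# On the raw data frame the same quantity would be
# round(100 * colMeans(is.na(data)), 2)
```

Every variable stays below 4% missing, which supports treating imputation as a modest correction, not a major source of information.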
To deal with missing values, I impute them separately for each country using the random forest method (missForest).
After imputing each country separately, I join all the datasets together. The resulting table, with no missing values, is shown below.
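The per-country imputation step is not shown in the report. The sketch below illustrates the split, impute, and combine pattern in base R. It rests on a loud assumption: the helper `impute_simple()` is a hypothetical stand-in (median for numeric columns, most frequent level for factors) used only so that the example runs without extra packages. In the actual analysis the imputed data come from the `$ximp` element of the missForest result, as in the `rbind()` call below.

```r
# ASSUMPTION: stand-in imputer, NOT the random forest method used in the
# report. Median for numeric columns, most frequent level for factors.
impute_simple <- function(df) {
  for (col in names(df)) {
    miss <- is.na(df[[col]])
    if (!any(miss)) next
    if (is.numeric(df[[col]])) {
      df[[col]][miss] <- median(df[[col]], na.rm = TRUE)
    } else {
      df[[col]][miss] <- names(which.max(table(df[[col]])))
    }
  }
  df
}

# Toy data: a country column plus two survey items (illustrative values)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  finsat  = c(5, NA, 7, 2, 3, NA),
  sex     = factor(c("male", NA, "male", "female", "female", "male"))
)

# Split by country, impute within each country, then stack the pieces again
pieces  <- split(toy, toy$country)
imputed <- do.call(rbind, lapply(pieces, impute_simple))
rownames(imputed) <- NULL
imputed
```

Using `split()` and `lapply()` avoids keeping one separately named object per country, so the combine step does not need to list all 20 objects by hand.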
d<-rbind(a_imputed_missForest$ximp,b_imputed_missForest$ximp,c_imputed_missForest$ximp,ch_imputed_missForest$ximp,
co_imputed_missForest$ximp, g_imputed_missForest$ximp, i_imputed_missForest$ximp, j_imputed_missForest$ximp,
k_imputed_missForest$ximp, s_imputed_missForest$ximp, m_imputed_missForest$ximp, mo_imputed_missForest$ximp,
n_imputed_missForest$ximp, p_imputed_missForest$ximp, r_imputed_missForest$ximp, sin_imputed_missForest$ximp,
tur_imputed_missForest$ximp, uk_imputed_missForest$ximp, th_imputed_missForest$ximp, us_imputed_missForest$ximp)
view_df(d, show.type =T, show.frq = T, show.prc = T, show.na = T)
| ID | Name | Type | Label | missings | Values | Value Labels | Freq. | % |
|---|---|---|---|---|---|---|---|---|
| 1 | country | categorical | | 0 (0.00%) | | Australia, Brazil, Canada, China, Colombia, Germany, Indonesia, Japan, Kazakhstan, South Korea, Mexico, Mongolia, Netherlands, Pakistan, Russia <… truncated> | 1813, 1762, 4018, 3036, 1520, 1528, 3200, 1353, 1276, 1245, 1741, 1638, 2145, 1995, 1810 | 4.54, 4.42, 10.07, 7.61, 3.81, 3.83, 8.02, 3.39, 3.20, 3.12, 4.36, 4.11, 5.38, 5.00, 4.54 |
| 2 | finsat | numeric | | 0 (0.00%) | range: 1.0-10.0 | | | |
| 3 | lifesat | numeric | | 0 (0.00%) | range: 1.0-10.0 | | | |
| 4 | selfincome | numeric | | 0 (0.00%) | range: 1.0-10.0 | | | |
| 5 | age | numeric | | 0 (0.00%) | range: 17.0-98.0 | | | |
| 6 | sex | categorical | | 0 (0.00%) | | male, female | 18850, 21042 | 47.25, 52.75 |
| 7 | edu | categorical | | 0 (0.00%) | | primary and lower, secondary, post-secondary, BA, higher BA | 6551, 14228, 7795, 7661, 3657 | 16.42, 35.67, 19.54, 19.20, 9.17 |
| 8 | equalinc | numeric | | 0 (0.00%) | range: 1.0-10.0 | | | |
| 9 | children | numeric | | 0 (0.00%) | range: -0.0-22.0 | | | |
Now I add the 2nd level variables, namely GDP per capita, unemployment rate, and HCI. All of these data were obtained from the World Bank.
seclev <- data.frame(id=1:20,
gdppc=c('15270.7', '11492.0', '4534.0', '76329.6', '6910.0', '12720.2', '34017.3', '32422.6', '48718.0', '65099.8', '11496.5', '10674.5', '55522.4', '4788.0', '57025.0', '82807.6', '1588.9', '8917.7', '5045.5', '6624.2'),
inflation=c('15.8', '19.8', '34.3','7.0', '4.7', '2.2', '0.3','1.3','5.3', '7.1', '6.7', '96.0', '7.7', '9.6', '5.5', '9.1', '14.0','8.3','17.7','14.3'),
unemp=c('3.9', '4.9', '9.8', '3.6', '0.9', '5.0', '2.6', '2.9', '3.1', '3.7', '3.3', '10.4', '5.3', '3.5', '3.5', '3.6', '5.6', '9.2', '6.2', '10.6'),
hci=c('0.68', '0.63', '0.63', '0.70', '0.61', '0.65', '0.80', '0.80', '0.75', '0.77', '0.61', '0.65', '0.80', '0.54', '0.79', '0.88', '0.41', '0.55', '0.61', '0.60'),
gini=c('36.0', '27.8', '25.6', '39.8', '35.1', '37.1', '32.9', '31.4', '31.7', '34.3', '45.4', '41.9', '31.7', '37.9', '26.0', '36.0', '29.6', '52.9', '32.7', '51.5'),
regime=c('Electoral autocracy', 'Electoral autocracy', 'Electoral autocracy', 'Liberal democracy', 'Closed autocracy', 'Closed autocracy', 'Liberal democracy', 'Liberal democracy', 'Liberal democracy', 'Liberal democracy', 'Electoral democracy', 'Electoral autocracy', 'Electoral democracy', 'Electoral democracy', 'Liberal democracy', 'Electoral autocracy', 'Electoral autocracy', 'Electoral democracy', 'Electoral democracy', 'Electoral democracy'),
country=c('Russia', 'Kazakhstan', 'Ukraine', 'USA', 'Thailand', 'China', 'Japan', 'South Korea', 'Germany', 'Australia', 'Mexico', 'Turkey', 'Canada', 'Indonesia', 'Netherlands', 'Singapore', 'Pakistan', 'Brazil', 'Mongolia', 'Colombia'))
da<-merge(d, seclev, by="country", all = T)
da$gdppc<-as.numeric(as.character(da$gdppc))
da$inflation<-as.numeric(as.character(da$inflation))
da$unemp<-as.numeric(as.character(da$unemp))
da$hci<-as.numeric(as.character(da$hci))
da$gini<-as.numeric(as.character(da$gini))
da$regime <- as.factor(da$regime)
summary(da)
## country finsat lifesat selfincome
## Canada : 4018 Min. : 1.000 Min. : 1.000 Min. : 1.000
## Indonesia : 3200 1st Qu.: 5.000 1st Qu.: 6.000 1st Qu.: 3.000
## China : 3036 Median : 7.000 Median : 7.000 Median : 5.000
## USA : 2596 Mean : 6.391 Mean : 7.181 Mean : 4.847
## Turkey : 2415 3rd Qu.: 8.000 3rd Qu.: 9.000 3rd Qu.: 6.000
## Netherlands: 2145 Max. :10.000 Max. :10.000 Max. :10.000
## (Other) :22482
## age sex edu equalinc
## Min. :17.00 male :18850 primary and lower: 6551 Min. : 1.000
## 1st Qu.:31.00 female:21042 secondary :14228 1st Qu.: 4.000
## Median :44.00 post-secondary : 7795 Median : 6.000
## Mean :44.77 BA : 7661 Mean : 5.973
## 3rd Qu.:57.00 higher BA : 3657 3rd Qu.: 8.000
## Max. :98.00 Max. :10.000
##
## children id gdppc inflation
## Min. : 0.00 Min. : 1.00 Min. : 1589 Min. : 0.30
## 1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 6910 1st Qu.: 5.50
## Median : 2.00 Median :12.00 Median :12720 Median : 7.70
## Mean : 1.65 Mean :10.84 Mean :29692 Mean :14.58
## 3rd Qu.: 2.00 3rd Qu.:15.00 3rd Qu.:55522 3rd Qu.:14.00
## Max. :22.00 Max. :20.00 Max. :82808 Max. :96.00
##
## unemp hci gini regime
## Min. : 0.900 Min. :0.4100 Min. :25.60 Closed autocracy : 4536
## 1st Qu.: 3.500 1st Qu.:0.6100 1st Qu.:31.70 Electoral autocracy:10797
## Median : 3.900 Median :0.6500 Median :36.00 Electoral democracy:13879
## Mean : 5.067 Mean :0.6746 Mean :36.04 Liberal democracy :10680
## 3rd Qu.: 5.600 3rd Qu.:0.7900 3rd Qu.:39.80
## Max. :10.600 Max. :0.8800 Max. :52.90
##
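The table also contains some additional second level variables: inflation, the Gini coefficient, and political regime. Inflation and the Gini coefficient are tested in the full model below and then dropped because they are not significant; political regime is not used further.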
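Because `merge(..., all = T)` keeps unmatched rows and fills them with `NA`, a mismatch in country labels between the two tables would pass silently. A minimal sketch of a key check is given below. The data frames `survey` and `macro` are toy examples, and the label "Turkiye" is a deliberately mismatched spelling for illustration.

```r
# Toy first-level and second-level tables (illustrative, not the real data)
survey <- data.frame(country = c("USA", "USA", "Japan", "Turkey"),
                     finsat  = c(7, 5, 6, 4))
macro  <- data.frame(country = c("USA", "Japan", "Turkiye"),
                     unemp   = c(3.6, 2.6, 10.4))

# Country labels present in one table but not in the other
only_in_survey <- setdiff(unique(survey$country), macro$country)
only_in_macro  <- setdiff(macro$country, unique(survey$country))

# A full outer join keeps the unmatched rows and pads them with NA
merged <- merge(survey, macro, by = "country", all = TRUE)
n_unmatched <- sum(is.na(merged$unemp) | is.na(merged$finsat))

only_in_survey
only_in_macro
n_unmatched
```

In the `summary(da)` output above no `NA` counts are reported for the second level variables, which is consistent with all country labels having matched.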
In this part I run bivariate tests between the dependent variable and the predictors.
cor.test(da$finsat, da$lifesat)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$lifesat
## t = 133.22, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5480819 0.5616651
## sample estimates:
## cor
## 0.5549105
ggplot(da, aes(x=lifesat, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Life satisfaction',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
The correlation between these variables is rather high (0.55) and positive. Thus, people with higher life satisfaction are more financially satisfied.
cor.test(da$finsat, da$selfincome)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$selfincome
## t = 69.625, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3204001 0.3378999
## sample estimates:
## cor
## 0.3291783
ggplot(da, aes(x=selfincome, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Income lvl',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
The correlation is positive and equals 0.33. Thus, people who place themselves in higher income groups are more financially satisfied.
cor.test(da$finsat, da$equalinc)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$equalinc
## t = 24.482, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1119880 0.1313238
## sample estimates:
## cor
## 0.1216675
ggplot(da, aes(x=equalinc, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Belief about income equality',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
The correlation is low and positive (0.12). This means that people who believe there should be greater incentives for individual effort are somewhat more financially satisfied.
TukeyHSD(aov(da$finsat~da$edu))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = da$finsat ~ da$edu)
##
## $`da$edu`
## diff lwr upr p adj
## secondary-primary and lower 0.07580887 -0.01840439 0.170022133 0.1815165
## post-secondary-primary and lower -0.01201743 -0.11777927 0.093744411 0.9980004
## BA-primary and lower 0.48485270 0.37866933 0.591036068 0.0000000
## higher BA-primary and lower 0.55951072 0.42926019 0.689761247 0.0000000
## post-secondary-secondary -0.08782630 -0.17674307 0.001090472 0.0547747
## BA-secondary 0.40904382 0.31962607 0.498461576 0.0000000
## higher BA-secondary 0.48370185 0.36671541 0.600688281 0.0000000
## BA-post-secondary 0.49687013 0.39535676 0.598383486 0.0000000
## higher BA-post-secondary 0.57152815 0.44505580 0.698000492 0.0000000
## higher BA-BA 0.07465802 -0.05216704 0.201483081 0.4936123
ggplot(da, aes(x=edu, y=finsat)) +
geom_boxplot()+
theme_minimal() +
labs(x = 'Education',
y = 'Financial satisfaction')
The significant comparisons are those between respondents with a BA or a higher degree and every group below a BA. People with degrees are significantly more financially satisfied than people with primary and lower, secondary, or post-secondary education. The differences among the three lower groups, and between BA and higher than BA, are not significant at the 5% level.
cor.test(da$finsat, da$age)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$age
## t = 6.7647, df = 39890, p-value = 1.354e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.02404557 0.04364935
## sample estimates:
## cor
## 0.03385072
ggplot(da, aes(x=age, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Age',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
There is a weak positive correlation between age and financial satisfaction (r = 0.03). Thus, older respondents are on average slightly more financially satisfied.
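With almost 40,000 observations, even very small correlations come out as statistically significant, so the size of r matters more than the p-value. As a worked check, the t statistics reported by `cor.test()` can be reproduced from r and the degrees of freedom. The helper name `t_from_r` is illustrative.

```r
# t statistic of a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

df <- 39890                          # degrees of freedom from the output above

t_age     <- t_from_r(0.03385072, df)  # age vs. financial satisfaction
t_lifesat <- t_from_r(0.5549105, df)   # life satisfaction vs. financial satisfaction

# Share of variance in financial satisfaction shared with each predictor
r2_age     <- 0.03385072^2
r2_lifesat <- 0.5549105^2

round(c(t_age = t_age, t_lifesat = t_lifesat), 2)
round(c(r2_age = r2_age, r2_lifesat = r2_lifesat), 3)
```

Age shares about 0.1% of the variance with financial satisfaction, against roughly 31% for life satisfaction, although both correlations are significant.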
t.test(da$finsat ~ da$sex)
##
## Welch Two Sample t-test
##
## data: da$finsat by da$sex
## t = 4.0227, df = 39519, p-value = 5.764e-05
## alternative hypothesis: true difference in means between group male and group female is not equal to 0
## 95 percent confidence interval:
## 0.04801586 0.13926864
## sample estimates:
## mean in group male mean in group female
## 6.440673 6.347030
ggplot(da, aes(x=sex, y=finsat)) +
geom_boxplot()+
theme_minimal() +
labs(x = 'Sex',
y = 'Financial satisfaction')
There is a small but significant difference in financial satisfaction between men and women: the mean is 6.44 for men and 6.35 for women.
cor.test(da$finsat, da$children)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$children
## t = 1.8987, df = 39890, p-value = 0.05762
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.0003071809 0.0193173068
## sample estimates:
## cor
## 0.009505978
ggplot(da, aes(x=children, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Number of children',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
The correlation is extremely small (r = 0.01), positive, and not statistically significant at the 5% level (p = 0.058). Thus, there is no clear bivariate association between the number of children and financial satisfaction.
cor.test(da$finsat, da$gdppc)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$gdppc
## t = 9.0871, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03565404 0.05523976
## sample estimates:
## cor
## 0.04545127
ggplot(da, aes(x=gdppc, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'GDP per capita',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
There is a weak positive correlation between financial satisfaction and GDP per capita (r = 0.05). Thus, people from countries with higher GDP per capita are on average slightly more financially satisfied.
cor.test(da$finsat, da$unemp)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$unemp
## t = -13.649, df = 39890, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07794129 -0.05840625
## sample estimates:
## cor
## -0.0681803
ggplot(da, aes(x=unemp, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'Unemployment',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
There is a weak negative correlation between financial satisfaction and the unemployment rate (r = -0.07). Thus, the higher the unemployment rate, the less financially satisfied people are.
cor.test(da$finsat, da$hci)
##
## Pearson's product-moment correlation
##
## data: da$finsat and da$hci
## t = -0.72524, df = 39890, p-value = 0.4683
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.013443836 0.006182166
## sample estimates:
## cor
## -0.003631185
ggplot(da, aes(x=hci, y=finsat))+
geom_point(color="grey")+
geom_smooth(method=lm, color="skyblue")+
theme_minimal() +
labs(x = 'HCI',
y = 'Financial satisfaction')
## `geom_smooth()` using formula = 'y ~ x'
The correlation between financial satisfaction and HCI is negative but close to zero and not significant (r = -0.004, p = 0.47). It is also hard to interpret because of the extremely low HCI value for respondents from Pakistan (0.41).
Before building the model, I first check whether including a second level is justified.
nullmodel <- lmer(finsat ~ (1 | country), data = da, REML = FALSE)
summary(nullmodel)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 178658.8 178684.5 -89326.4 178652.8 39889
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7742 -0.6469 0.1396 0.6756 2.2092
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.2944 0.5426
## Residual 5.1458 2.2684
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.3500 0.1219 19.9585 52.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
icc(nullmodel) #5% is described by country
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.054
## Unadjusted ICC: 0.054
The ICC is 0.054, which means that about 5% of the variance in financial satisfaction is attributable to differences between countries. This justifies including the second level in the model.
I start by adding only the control variables, age and sex.
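The ICC can be reproduced by hand from the variance components printed in the null model summary. It is the between-country variance divided by the total variance. The variable names in this sketch are illustrative.

```r
# Variance components from the random effects table of the null model
tau00  <- 0.2944   # between-country (intercept) variance
sigma2 <- 5.1458   # residual (within-country) variance

# ICC = between-group variance / (between-group + within-group variance)
icc_null <- tau00 / (tau00 + sigma2)
round(icc_null, 3)
```

The same formula applied to a model with predictors gives the share of the remaining, unexplained variance that lies between countries.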
model1<-lmer(finsat~ age + sex + (1|country), data=da, REML = FALSE)
summary(model1)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 178617.5 178660.5 -89303.8 178607.5 39887
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8457 -0.6248 0.1397 0.7005 2.2729
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.293 0.5413
## Residual 5.140 2.2672
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.189e+00 1.267e-01 2.349e+01 48.850 < 2e-16 ***
## age 4.346e-03 7.282e-04 3.987e+04 5.969 2.41e-09 ***
## sexfemale -6.479e-02 2.282e-02 3.988e+04 -2.839 0.00453 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) age
## age -0.263
## sexfemale -0.107 0.044
Both variables are significant at this point: age has a positive effect on financial satisfaction, and being female has a negative one.
The next step is to include all first level variables.
model11<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children +(1|country), data=da, REML = FALSE)
summary(model11)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +
## children + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 160912.6 161024.4 -80443.3 160886.6 39879
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7011 -0.5337 0.0699 0.6147 4.6827
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.1176 0.3429
## Residual 3.2972 1.8158
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.142e-01 9.588e-02 4.699e+01 5.363 2.44e-06 ***
## age 8.928e-03 6.696e-04 3.890e+04 13.334 < 2e-16 ***
## sexfemale -1.603e-02 1.839e-02 3.989e+04 -0.871 0.38361
## lifesat 5.586e-01 4.689e-03 3.983e+04 119.115 < 2e-16 ***
## selfincome 2.656e-01 4.806e-03 3.988e+04 55.265 < 2e-16 ***
## edusecondary 6.442e-02 2.982e-02 3.939e+04 2.160 0.03074 *
## edupost-secondary 1.917e-02 3.693e-02 3.762e+04 0.519 0.60369
## eduBA 1.752e-01 3.639e-02 3.867e+04 4.815 1.48e-06 ***
## eduhigher BA 1.456e-01 4.507e-02 3.810e+04 3.231 0.00123 **
## equalinc 3.271e-02 3.352e-03 3.987e+04 9.758 < 2e-16 ***
## children -6.196e-02 6.954e-03 3.979e+04 -8.910 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) age sexfml lifest slfncm edscnd edpst- eduBA edhgBA
## age -0.319
## sexfemale -0.143 0.084
## lifesat -0.266 -0.045 -0.028
## selfincome -0.138 0.067 0.038 -0.190
## edusecondry -0.273 0.133 0.042 0.009 -0.086
## edpst-scndr -0.261 0.156 0.049 0.014 -0.125 0.667
## eduBA -0.248 0.176 0.052 -0.001 -0.187 0.673 0.659
## eduhigherBA -0.192 0.116 0.051 -0.007 -0.215 0.575 0.602 0.578
## equalinc -0.151 0.004 0.033 -0.095 -0.090 -0.025 -0.029 -0.027 -0.017
## children -0.003 -0.414 -0.058 -0.023 -0.008 0.105 0.101 0.122 0.092
## equlnc
## age
## sexfemale
## lifesat
## selfincome
## edusecondry
## edpst-scndr
## eduBA
## eduhigherBA
## equalinc
## children -0.020
tab_model(model11)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.51 | 0.33 – 0.70 | <0.001 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | -0.02 | -0.05 – 0.02 | 0.384 |
| lifesat | 0.56 | 0.55 – 0.57 | <0.001 |
| selfincome | 0.27 | 0.26 – 0.28 | <0.001 |
| edu [secondary] | 0.06 | 0.01 – 0.12 | 0.031 |
| edu [post-secondary] | 0.02 | -0.05 – 0.09 | 0.604 |
| edu [BA] | 0.18 | 0.10 – 0.25 | <0.001 |
| edu [higher BA] | 0.15 | 0.06 – 0.23 | 0.001 |
| equalinc | 0.03 | 0.03 – 0.04 | <0.001 |
| children | -0.06 | -0.08 – -0.05 | <0.001 |
| Random Effects | |||
| σ2 | 3.30 | ||
| τ00 country | 0.12 | ||
| ICC | 0.03 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.360 / 0.382 | ||
After including all first level variables, almost all of them are significant; the exceptions are sex and post-secondary education (relative to primary and lower). Age, life satisfaction, self-assessed income, secondary education, BA and higher than BA education, and the belief that incomes should not be equal all have positive effects. The number of children has a negative effect, which is surprising given that the bivariate Pearson correlation was slightly positive (though not significant at the 5% level).
Finally, I add the second level variables. Because GDP per capita and HCI are measured on very different scales, they are rescaled: GDP per capita is divided by 100 and HCI is multiplied by 10 (the resulting variable is named hci100 in the code, although the multiplier is 10).
da$gdp100<-da$gdppc/100
da$hci100<-da$hci*10
model12<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children + gdp100 + inflation + unemp + hci100 + gini + (1|country), data=da, REML = FALSE)
summary(model12)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +
## children + gdp100 + inflation + unemp + hci100 + gini + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 160915.7 161070.4 -80439.8 160879.7 39874
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7019 -0.5337 0.0701 0.6149 4.6828
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.08252 0.2873
## Residual 3.29715 1.8158
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.182e+00 7.475e-01 2.010e+01 2.919 0.00846 **
## age 8.938e-03 6.706e-04 3.986e+04 13.329 < 2e-16 ***
## sexfemale -1.605e-02 1.840e-02 3.988e+04 -0.872 0.38309
## lifesat 5.586e-01 4.691e-03 3.989e+04 119.077 < 2e-16 ***
## selfincome 2.657e-01 4.807e-03 3.989e+04 55.273 < 2e-16 ***
## edusecondary 6.415e-02 2.984e-02 3.981e+04 2.150 0.03156 *
## edupost-secondary 1.803e-02 3.698e-02 3.926e+04 0.488 0.62582
## eduBA 1.750e-01 3.644e-02 3.976e+04 4.803 1.57e-06 ***
## eduhigher BA 1.439e-01 4.512e-02 3.927e+04 3.189 0.00143 **
## equalinc 3.265e-02 3.352e-03 3.987e+04 9.742 < 2e-16 ***
## children -6.221e-02 6.958e-03 3.989e+04 -8.941 < 2e-16 ***
## gdp100 4.737e-04 4.287e-04 1.988e+01 1.105 0.28239
## inflation -6.344e-04 4.246e-03 1.985e+01 -0.149 0.88274
## unemp -4.991e-02 3.521e-02 2.006e+01 -1.417 0.17172
## hci100 -1.993e-01 1.008e-01 1.998e+01 -1.977 0.06199 .
## gini -5.414e-03 1.005e-02 2.011e+01 -0.539 0.59583
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
tab_model(model12)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 2.18 | 0.72 – 3.65 | 0.004 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | -0.02 | -0.05 – 0.02 | 0.383 |
| lifesat | 0.56 | 0.55 – 0.57 | <0.001 |
| selfincome | 0.27 | 0.26 – 0.28 | <0.001 |
| edu [secondary] | 0.06 | 0.01 – 0.12 | 0.032 |
| edu [post-secondary] | 0.02 | -0.05 – 0.09 | 0.626 |
| edu [BA] | 0.18 | 0.10 – 0.25 | <0.001 |
| edu [higher BA] | 0.14 | 0.06 – 0.23 | 0.001 |
| equalinc | 0.03 | 0.03 – 0.04 | <0.001 |
| children | -0.06 | -0.08 – -0.05 | <0.001 |
| gdp100 | 0.00 | -0.00 – 0.00 | 0.269 |
| inflation | -0.00 | -0.01 – 0.01 | 0.881 |
| unemp | -0.05 | -0.12 – 0.02 | 0.156 |
| hci100 | -0.20 | -0.40 – -0.00 | 0.048 |
| gini | -0.01 | -0.03 – 0.01 | 0.590 |
| Random Effects | |||
| σ2 | 3.30 | ||
| τ00 country | 0.08 | ||
| ICC | 0.02 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.369 / 0.385 | ||
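In the table above the gdp100 estimate is displayed as 0.00 because the effect per 100 USD is too small to show at two decimals. A coarser unit makes the same estimate readable. The sketch below only rescales the coefficient printed by `summary(model12)`. The name `gdp10k` is hypothetical, and the estimate itself is not significant, so the numbers are point estimates for illustration only.

```r
# Estimate from summary(model12): change in finsat per 100 USD of GDP per capita
b_gdp100 <- 4.737e-04

# The same effect expressed per 10,000 USD (100 units of gdp100)
b_gdp10k <- b_gdp100 * 100

# Implied difference between the richest and poorest country in the sample
# (GDP per capita of 82807.6 and 1588.9 in the second level table)
gdp_range   <- 82807.6 - 1588.9
implied_gap <- b_gdp100 * gdp_range / 100

round(c(per_10k_usd = b_gdp10k, richest_vs_poorest = implied_gap), 3)

# Refitting with a rescaled predictor would give the same result directly,
# e.g. da$gdp10k <- da$gdppc / 10000 (hypothetical variable name)
```

Rescaling a predictor changes only the unit of its coefficient; the t value, the p-value, and the model fit stay the same.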
As the results of this model show, none of the second level variables is clearly significant; only HCI is borderline (p = 0.062 in the summary, p = 0.048 in the table). Inflation and the Gini coefficient have the largest p-values and are therefore removed from the model.
model13<-lmer(finsat~ age + sex + lifesat + selfincome +edu + equalinc + children + gdp100 + unemp + hci100 + (1|country), data=da, REML = FALSE)
summary(model13)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + lifesat + selfincome + edu + equalinc +
## children + gdp100 + unemp + hci100 + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 160912.0 161049.5 -80440.0 160880.0 39876
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7020 -0.5339 0.0700 0.6148 4.6840
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.08386 0.2896
## Residual 3.29715 1.8158
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.968e+00 6.192e-01 2.022e+01 3.178 0.00468 **
## age 8.945e-03 6.704e-04 3.981e+04 13.342 < 2e-16 ***
## sexfemale -1.597e-02 1.840e-02 3.989e+04 -0.868 0.38541
## lifesat 5.585e-01 4.689e-03 3.982e+04 119.111 < 2e-16 ***
## selfincome 2.657e-01 4.805e-03 3.987e+04 55.289 < 2e-16 ***
## edusecondary 6.454e-02 2.982e-02 3.952e+04 2.164 0.03046 *
## edupost-secondary 1.885e-02 3.695e-02 3.821e+04 0.510 0.60993
## eduBA 1.756e-01 3.641e-02 3.931e+04 4.824 1.41e-06 ***
## eduhigher BA 1.450e-01 4.508e-02 3.819e+04 3.217 0.00130 **
## equalinc 3.268e-02 3.351e-03 3.985e+04 9.750 < 2e-16 ***
## children -6.223e-02 6.957e-03 3.989e+04 -8.944 < 2e-16 ***
## gdp100 4.548e-04 4.258e-04 1.994e+01 1.068 0.29822
## unemp -5.799e-02 2.646e-02 2.003e+01 -2.191 0.04042 *
## hci100 -1.910e-01 9.900e-02 2.006e+01 -1.929 0.06802 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
tab_model(model13)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 1.97 | 0.75 – 3.18 | 0.001 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | -0.02 | -0.05 – 0.02 | 0.385 |
| lifesat | 0.56 | 0.55 – 0.57 | <0.001 |
| selfincome | 0.27 | 0.26 – 0.28 | <0.001 |
| edu [secondary] | 0.06 | 0.01 – 0.12 | 0.030 |
| edu [post-secondary] | 0.02 | -0.05 – 0.09 | 0.610 |
| edu [BA] | 0.18 | 0.10 – 0.25 | <0.001 |
| edu [higher BA] | 0.14 | 0.06 – 0.23 | 0.001 |
| equalinc | 0.03 | 0.03 – 0.04 | <0.001 |
| children | -0.06 | -0.08 – -0.05 | <0.001 |
| gdp100 | 0.00 | -0.00 – 0.00 | 0.285 |
| unemp | -0.06 | -0.11 – -0.01 | 0.028 |
| hci100 | -0.19 | -0.38 – 0.00 | 0.054 |
| Random Effects | |||
| σ2 | 3.30 | ||
| τ00 country | 0.08 | ||
| ICC | 0.02 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.369 / 0.385 | ||
anova(model12,model13)
## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model12: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + inflation + unemp + hci100 + gini + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model13 16 160912 161049 -80440 160880
## model12 18 160916 161070 -80440 160880 0.2905 2 0.8648
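Removing these variables did not make the model worse: the likelihood-ratio test is not significant (p = 0.865) and the AIC is slightly lower (160912 vs. 160916). Therefore, only GDP, unemployment and HCI are kept as second-level variables. The effects in model13 are interpreted as follows: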
The positive effect is shown by:
1. Age. Each additional year of age adds on average 0.01 to a person's financial satisfaction.
2. Life satisfaction. Each additional level of life satisfaction adds on average 0.56 to financial satisfaction.
3. Income level. Each additional level of income adds on average 0.27 to financial satisfaction.
4. Secondary education. Compared with people with low education, people with secondary education have 0.06 higher financial satisfaction.
5. BA. Compared with people with low education, people with a BA have 0.18 higher financial satisfaction.
6. Higher than BA. Compared with people with low education, people with more than a BA have 0.14 higher financial satisfaction.
7. Belief about inequality of incomes. Each additional level of the belief that incomes should not be equal adds 0.03 to financial satisfaction.
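The negative effect is shown by:
1. Children. Each additional child lowers financial satisfaction by 0.06 on average.
2. Unemployment. Each additional unit of the country's unemployment rate lowers financial satisfaction by 0.06 on average.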
anova(model11, model13)
## Data: da
## Models:
## model11: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + (1 | country)
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model11 13 160913 161024 -80443 160887
## model13 16 160912 161049 -80440 160880 6.6736 3 0.08306 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
However, the likelihood-ratio test comparing the model with only first-level variables (model11) and the model with second-level variables (model13) is not significant (p = 0.083). Therefore, the second-level variables do not significantly improve the model.
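The p-values in the two anova tables above can be reproduced by hand, which makes the logic of the likelihood-ratio test explicit. The following is a minimal sketch in base R. The helper name `lrt_p` is hypothetical, and the chi-square statistics and parameter counts are copied from the printed output rather than recomputed from the fitted models.

```r
# Hypothetical helper: p-value of a likelihood-ratio test.
# chisq = difference in deviance between two nested models,
# df    = difference in the number of estimated parameters.
lrt_p <- function(chisq, df) {
  pchisq(chisq, df = df, lower.tail = FALSE)
}

# model13 vs. model12 (inflation and gini removed): 18 - 16 = 2 parameters
p_12_13 <- lrt_p(0.2905, df = 18 - 16)

# model11 vs. model13 (gdp100, unemp, hci100 added): 16 - 13 = 3 parameters
p_11_13 <- lrt_p(6.6736, df = 16 - 13)

round(c(p_12_13 = p_12_13, p_11_13 = p_11_13), 4)
```

Both values are above 0.05, so neither removing inflation and Gini nor adding the three remaining second-level variables changes the fit significantly.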
model_performance(model11)
## Model was not fitted with REML, however, `estimator = "REML"`. Set
## `estimator = "ML"` to obtain identical results as from `AIC()`.
## # Indices of model performance
##
## AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.610e+05 | 1.610e+05 | 1.611e+05 | 0.382 | 0.360 | 0.034 | 1.815 | 1.816
model_performance(model13)
## Model was not fitted with REML, however, `estimator = "REML"`. Set
## `estimator = "ML"` to obtain identical results as from `AIC()`.
## # Indices of model performance
##
## AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.610e+05 | 1.610e+05 | 1.612e+05 | 0.385 | 0.369 | 0.025 | 1.815 | 1.816
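The AIC values of the two models are practically the same (160913 for model11 and 160912 for model13), so by this criterion the models do not differ in predictive quality. The BIC, which penalizes additional parameters more strongly, is slightly lower for the model with only first-level variables (161024 vs. 161049). However, the R2 values of the model with second-level variables are a bit higher (marginal 0.369 vs. 0.360, conditional 0.385 vs. 0.382), which means that it explains a slightly larger proportion of the variance in the dependent variable.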
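To see why AIC and BIC can rank the same two models differently, it may help to rebuild both criteria from the deviance and the number of parameters. This is only a sketch in base R: the helper `info_criteria` is hypothetical, and the deviance values are the rounded ones from the anova output, so the results are approximate.

```r
# Hypothetical helper: information criteria from the quantities
# printed by summary() and anova().
info_criteria <- function(deviance, npar, n_obs) {
  c(AIC = deviance + 2 * npar,           # penalty of 2 per parameter
    BIC = deviance + log(n_obs) * npar)  # penalty of log(n) per parameter
}

n_obs <- 39892  # number of observations reported for every model

# Deviance (-2 * logLik) and npar copied from the anova output
m11 <- info_criteria(deviance = 160887, npar = 13, n_obs = n_obs)
m13 <- info_criteria(deviance = 160880, npar = 16, n_obs = n_obs)

round(rbind(model11 = m11, model13 = m13), 1)
```

With almost 40,000 observations, each extra parameter costs about 10.6 points of BIC but only 2 points of AIC, which is why the three additional country-level predictors look neutral under AIC and unfavourable under BIC.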
For the interaction between two 1st level variables I’ve chosen education and life satisfaction.
model21<-lmer(finsat~ age + sex + selfincome +edu*lifesat + equalinc + children + gdp100 + unemp + hci100 + (1|country), data=da, REML = FALSE)
summary(model21)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu * lifesat + equalinc +
## children + gdp100 + unemp + hci100 + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 160873.6 161045.5 -80416.8 160833.6 39872
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7419 -0.5400 0.0699 0.6133 4.7726
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.08255 0.2873
## Residual 3.29335 1.8148
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.335e+00 6.173e-01 2.061e+01 3.783 0.001120 **
## age 8.879e-03 6.702e-04 3.981e+04 13.248 < 2e-16 ***
## sexfemale -1.492e-02 1.839e-02 3.988e+04 -0.812 0.417053
## selfincome 2.639e-01 4.810e-03 3.987e+04 54.850 < 2e-16 ***
## edusecondary -2.987e-01 9.052e-02 3.988e+04 -3.300 0.000969 ***
## edupost-secondary -5.529e-01 1.070e-01 3.978e+04 -5.168 2.38e-07 ***
## eduBA -4.199e-01 1.151e-01 3.989e+04 -3.649 0.000264 ***
## eduhigher BA -4.237e-01 1.485e-01 3.985e+04 -2.853 0.004339 **
## lifesat 5.088e-01 9.385e-03 3.989e+04 54.218 < 2e-16 ***
## equalinc 3.260e-02 3.350e-03 3.985e+04 9.734 < 2e-16 ***
## children -6.266e-02 6.954e-03 3.989e+04 -9.011 < 2e-16 ***
## gdp100 4.458e-04 4.225e-04 1.994e+01 1.055 0.303919
## unemp -5.747e-02 2.626e-02 2.003e+01 -2.189 0.040643 *
## hci100 -1.896e-01 9.823e-02 2.006e+01 -1.931 0.067803 .
## edusecondary:lifesat 4.963e-02 1.176e-02 3.989e+04 4.222 2.43e-05 ***
## edupost-secondary:lifesat 7.911e-02 1.397e-02 3.989e+04 5.663 1.49e-08 ***
## eduBA:lifesat 8.193e-02 1.506e-02 3.988e+04 5.440 5.36e-08 ***
## eduhigher BA:lifesat 7.770e-02 1.935e-02 3.989e+04 4.015 5.96e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
tab_model(model21)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 2.34 | 1.13 – 3.55 | <0.001 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | -0.01 | -0.05 – 0.02 | 0.417 |
| selfincome | 0.26 | 0.25 – 0.27 | <0.001 |
| edu [secondary] | -0.30 | -0.48 – -0.12 | 0.001 |
| edu [post-secondary] | -0.55 | -0.76 – -0.34 | <0.001 |
| edu [BA] | -0.42 | -0.65 – -0.19 | <0.001 |
| edu [higher BA] | -0.42 | -0.71 – -0.13 | 0.004 |
| lifesat | 0.51 | 0.49 – 0.53 | <0.001 |
| equalinc | 0.03 | 0.03 – 0.04 | <0.001 |
| children | -0.06 | -0.08 – -0.05 | <0.001 |
| gdp100 | 0.00 | -0.00 – 0.00 | 0.291 |
| unemp | -0.06 | -0.11 – -0.01 | 0.029 |
| hci100 | -0.19 | -0.38 – 0.00 | 0.054 |
| edu [secondary] × lifesat | 0.05 | 0.03 – 0.07 | <0.001 |
| edu [post-secondary] × lifesat | 0.08 | 0.05 – 0.11 | <0.001 |
| edu [BA] × lifesat | 0.08 | 0.05 – 0.11 | <0.001 |
| edu [higher BA] × lifesat | 0.08 | 0.04 – 0.12 | <0.001 |
| Random Effects | |||
| σ2 | 3.29 | ||
| τ00 country | 0.08 | ||
| ICC | 0.02 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.370 / 0.385 | ||
anova(model13, model21)
## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model21: finsat ~ age + sex + selfincome + edu * lifesat + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model13 16 160912 161049 -80440 160880
## model21 20 160874 161046 -80417 160834 46.31 4 2.123e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model21, type="int")
As the likelihood-ratio test shows, including the interaction effect significantly improves the model (p < 0.001).
In this model the main effects of education are negative, but they describe the difference from low education only at a life satisfaction of 0. All interaction terms are positive, so the negative effect of every level of education becomes smaller and eventually turns positive as life satisfaction increases.
The graph shows the same pattern: at the highest level of life satisfaction, every level of education has a more positive effect on financial satisfaction, while at the lowest level of life satisfaction, every level of education has a more negative effect.
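The combined effect of education and its interaction with life satisfaction can be worked out directly from the model21 coefficients. The sketch below uses base R with the estimates copied from the summary output; the function `edu_gap` is a hypothetical helper, and evaluating it at 1 and 10 assumes that lifesat is coded on a 1 to 10 scale, which is not stated in this section.

```r
# Coefficients copied from summary(model21)
edu_main <- c(secondary = -0.2987, `post-secondary` = -0.5529,
              BA = -0.4199, `higher BA` = -0.4237)
edu_by_lifesat <- c(secondary = 0.04963, `post-secondary` = 0.07911,
                    BA = 0.08193, `higher BA` = 0.07770)

# Education gap relative to low education at a given life satisfaction
edu_gap <- function(lifesat) edu_main + edu_by_lifesat * lifesat

# Life satisfaction at which each gap changes sign
crossover <- -edu_main / edu_by_lifesat

round(rbind(gap_at_1 = edu_gap(1), gap_at_10 = edu_gap(10),
            crossover = crossover), 2)
```

Under that assumed coding, every education gap is negative at the lowest life satisfaction and positive at the highest, with the sign change falling roughly between 5 and 7 depending on the education level.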
model23<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp*lifesat + hci100 + gdp100 + (1|country), data=da, REML = FALSE)
summary(model23)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu + equalinc + children +
## unemp * lifesat + hci100 + gdp100 + (1 | country)
## Data: da
##
## AIC BIC logLik deviance df.resid
## 160838.5 160984.6 -80402.3 160804.5 39875
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.7350 -0.5352 0.0687 0.6142 4.6518
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 0.08666 0.2944
## Residual 3.29087 1.8141
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.428e+00 6.322e-01 2.058e+01 2.258 0.03491 *
## age 8.777e-03 6.701e-04 3.982e+04 13.099 < 2e-16 ***
## sexfemale -1.816e-02 1.838e-02 3.988e+04 -0.988 0.32310
## selfincome 2.647e-01 4.802e-03 3.987e+04 55.125 < 2e-16 ***
## edusecondary 6.446e-02 2.979e-02 3.955e+04 2.164 0.03050 *
## edupost-secondary 1.707e-02 3.692e-02 3.833e+04 0.462 0.64389
## eduBA 1.717e-01 3.638e-02 3.936e+04 4.718 2.39e-06 ***
## eduhigher BA 1.420e-01 4.504e-02 3.832e+04 3.153 0.00162 **
## equalinc 3.253e-02 3.348e-03 3.985e+04 9.717 < 2e-16 ***
## children -6.192e-02 6.951e-03 3.989e+04 -8.908 < 2e-16 ***
## unemp 4.901e-02 2.957e-02 2.924e+01 1.657 0.10816
## lifesat 6.379e-01 1.027e-02 3.975e+04 62.105 < 2e-16 ***
## hci100 -1.917e-01 1.006e-01 2.003e+01 -1.905 0.07117 .
## gdp100 4.531e-04 4.326e-04 1.991e+01 1.047 0.30748
## unemp:lifesat -1.518e-02 1.747e-03 3.949e+04 -8.689 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
tab_model(model23)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 1.43 | 0.19 – 2.67 | 0.024 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | -0.02 | -0.05 – 0.02 | 0.323 |
| selfincome | 0.26 | 0.26 – 0.27 | <0.001 |
| edu [secondary] | 0.06 | 0.01 – 0.12 | 0.030 |
| edu [post-secondary] | 0.02 | -0.06 – 0.09 | 0.644 |
| edu [BA] | 0.17 | 0.10 – 0.24 | <0.001 |
| edu [higher BA] | 0.14 | 0.05 – 0.23 | 0.002 |
| equalinc | 0.03 | 0.03 – 0.04 | <0.001 |
| children | -0.06 | -0.08 – -0.05 | <0.001 |
| unemp | 0.05 | -0.01 – 0.11 | 0.097 |
| lifesat | 0.64 | 0.62 – 0.66 | <0.001 |
| hci100 | -0.19 | -0.39 – 0.01 | 0.057 |
| gdp100 | 0.00 | -0.00 – 0.00 | 0.295 |
| unemp × lifesat | -0.02 | -0.02 – -0.01 | <0.001 |
| Random Effects | |||
| σ2 | 3.29 | ||
| τ00 country | 0.09 | ||
| ICC | 0.03 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.370 / 0.386 | ||
anova(model13, model21, model23)
## Data: da
## Models:
## model13: finsat ~ age + sex + lifesat + selfincome + edu + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## model23: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 | country)
## model21: finsat ~ age + sex + selfincome + edu * lifesat + equalinc + children + gdp100 + unemp + hci100 + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model13 16 160912 161049 -80440 160880
## model23 17 160839 160985 -80402 160805 75.413 1 <2e-16 ***
## model21 20 160874 161046 -80417 160834 0.000 3 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model23, type="int")
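Including the interaction between a first-level and a second-level variable improves the model further. This model has the lowest AIC of all models fitted so far (160839). Note that model21 and model23 are not nested, so the last row of the anova table (Chisq = 0, p = 1) is not a meaningful test; these two models are better compared by AIC. The model results show that the higher the level of life satisfaction, the stronger the negative effect of unemployment: each additional level of life satisfaction lowers the unemployment coefficient by about 0.015. This finding is also supported by the graph.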
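Because unemployment enters the model together with its interaction with life satisfaction, its effect is a line in lifesat rather than a single number. A minimal base R sketch of that line, using the estimates from summary(model23), is given here; `unemp_slope` is a hypothetical helper, and the chosen levels 1, 5 and 10 assume a 1 to 10 coding of lifesat.

```r
# Coefficients copied from summary(model23)
b_unemp         <- 0.04901   # unemp slope when lifesat = 0
b_unemp_lifesat <- -0.01518  # change in the unemp slope per level of lifesat

# Slope of unemployment at a given level of life satisfaction
unemp_slope <- function(lifesat) b_unemp + b_unemp_lifesat * lifesat

lifesat_levels <- c(1, 5, 10)  # assumed 1-10 coding of lifesat
slopes <- setNames(unemp_slope(lifesat_levels),
                   paste0("lifesat_", lifesat_levels))
round(slopes, 3)

# Life satisfaction at which the slope changes sign
sign_change <- -b_unemp / b_unemp_lifesat
round(sign_change, 2)
```

The point estimate of the slope is slightly positive for the least satisfied respondents and becomes increasingly negative above a life satisfaction of about 3, which is the pattern described in the text.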
For the random slope, I have chosen income level.
model31<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp*lifesat + hci100 + gdp100 + (1+ selfincome|country), data=da, control = lmerControl(optimizer ="Nelder_Mead"))
summary(model31)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: finsat ~ age + sex + selfincome + edu + equalinc + children +
## unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
## Data: da
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 160543.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8079 -0.5343 0.0657 0.6150 4.7142
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## country (Intercept) 0.529710 0.72781
## selfincome 0.009776 0.09887 -0.92
## Residual 3.256631 1.80461
## Number of obs: 39892, groups: country, 20
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 9.504e-01 6.256e-01 1.882e+01 1.519 0.14535
## age 8.847e-03 6.718e-04 3.981e+04 13.168 < 2e-16 ***
## sexfemale -9.102e-03 1.832e-02 3.986e+04 -0.497 0.61935
## selfincome 2.744e-01 2.274e-02 1.916e+01 12.069 2.11e-10 ***
## edusecondary 9.018e-02 2.991e-02 3.954e+04 3.016 0.00257 **
## edupost-secondary 5.478e-02 3.698e-02 3.843e+04 1.481 0.13853
## eduBA 1.774e-01 3.643e-02 3.936e+04 4.870 1.12e-06 ***
## eduhigher BA 9.826e-02 4.497e-02 3.703e+04 2.185 0.02891 *
## equalinc 3.125e-02 3.336e-03 3.980e+04 9.369 < 2e-16 ***
## children -6.456e-02 6.943e-03 3.986e+04 -9.299 < 2e-16 ***
## unemp 9.228e-02 2.898e-02 2.583e+01 3.184 0.00376 **
## lifesat 6.357e-01 1.040e-02 3.954e+04 61.100 < 2e-16 ***
## hci100 -1.698e-01 9.639e-02 1.620e+01 -1.762 0.09691 .
## gdp100 7.867e-04 4.141e-04 1.605e+01 1.900 0.07560 .
## unemp:lifesat -1.618e-02 1.775e-03 3.918e+04 -9.116 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
tab_model(model31, show.std = TRUE)
| finsat | ||||||
|---|---|---|---|---|---|---|
| Predictors | Estimates | std. Beta | CI | standardized CI | p | std. p |
| (Intercept) | 0.95 | -0.04 | -0.28 – 2.18 | -0.11 – 0.02 | 0.129 | 0.209 |
| age | 0.01 | 0.06 | 0.01 – 0.01 | 0.05 – 0.07 | <0.001 | <0.001 |
| sex [female] | -0.01 | -0.00 | -0.05 – 0.03 | -0.02 – 0.01 | 0.619 | 0.619 |
| selfincome | 0.27 | 0.25 | 0.23 – 0.32 | 0.21 – 0.29 | <0.001 | <0.001 |
| edu [secondary] | 0.09 | 0.04 | 0.03 – 0.15 | 0.01 – 0.06 | 0.003 | 0.003 |
| edu [post-secondary] | 0.05 | 0.02 | -0.02 – 0.13 | -0.01 – 0.05 | 0.139 | 0.139 |
| edu [BA] | 0.18 | 0.08 | 0.11 – 0.25 | 0.05 – 0.11 | <0.001 | <0.001 |
| edu [higher BA] | 0.10 | 0.04 | 0.01 – 0.19 | 0.00 – 0.08 | 0.029 | 0.029 |
| equalinc | 0.03 | 0.04 | 0.02 – 0.04 | 0.03 – 0.05 | <0.001 | <0.001 |
| children | -0.06 | -0.04 | -0.08 – -0.05 | -0.05 – -0.03 | <0.001 | <0.001 |
| unemp | 0.09 | -0.03 | 0.04 – 0.15 | -0.08 – 0.03 | 0.001 | 0.354 |
| lifesat | 0.64 | 0.49 | 0.62 – 0.66 | 0.48 – 0.50 | <0.001 | <0.001 |
| hci100 | -0.17 | -0.08 | -0.36 – 0.02 | -0.17 – 0.01 | 0.078 | 0.078 |
| gdp100 | 0.00 | 0.09 | -0.00 – 0.00 | -0.00 – 0.19 | 0.057 | 0.057 |
| unemp × lifesat | -0.02 | -0.04 | -0.02 – -0.01 | -0.04 – -0.03 | <0.001 | <0.001 |
| Random Effects | ||||||
| σ2 | 3.26 | |||||
| τ00 country | 0.53 | |||||
| τ11 country.selfincome | 0.01 | |||||
| ρ01 country | -0.92 | |||||
| ICC | 0.05 | |||||
| N country | 20 | |||||
| Observations | 39892 | |||||
| Marginal R2 / Conditional R2 | 0.366 / 0.395 | |||||
anova(model23, model31)
## refitting model(s) with ML (instead of REML)
## Data: da
## Models:
## model23: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 | country)
## model31: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model23 17 160839 160985 -80402 160805
## model31 19 160471 160634 -80217 160433 371.51 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ranova(model31)
## ANOVA-like table for random-effects: Single term deletions
##
## Model:
## finsat ~ age + sex + selfincome + edu + equalinc + children + unemp + lifesat + hci100 + gdp100 + (1 + selfincome | country) + unemp:lifesat
## npar logLik AIC LRT Df
## <none> 19 -80272 160582
## selfincome in (1 + selfincome | country) 17 -80459 160951 373.51 2
## Pr(>Chisq)
## <none>
## selfincome in (1 + selfincome | country) < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As the anova results show, this model is even better than the previous one, which included only a random intercept. Thus, we can conclude that this model is the best of all. Detailed information on the predictors follows:
The positive effect is shown by:
1. Age. Each additional year of age adds on average 0.01 to a person's financial satisfaction.
2. Life satisfaction. Each additional level of life satisfaction adds on average 0.64 to financial satisfaction (at an unemployment rate of 0, because of the interaction).
3. Income level. Each additional level of income adds on average 0.27 to financial satisfaction.
4. Secondary education. Compared with people with low education, people with secondary education have 0.09 higher financial satisfaction.
5. BA. Compared with people with low education, people with a BA have 0.18 higher financial satisfaction.
6. Higher than BA. Compared with people with low education, people with more than a BA have 0.10 higher financial satisfaction.
7. Belief about inequality of incomes. Each additional level of the belief that incomes should not be equal adds 0.03 to financial satisfaction.
The negative effect is shown by:
1. Children. Each additional child lowers financial satisfaction by 0.06 on average.
Interaction effect:
The higher the level of life satisfaction, the stronger the negative effect of unemployment on financial satisfaction.
Random effect:
The random slope for income level is significant. The effect of income level on the dependent variable differs between countries.
The random slope graph can be seen below:
dotplot(ranef(model31, condVar=TRUE))
## $country
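From the plot we can see that the effect of income level differs between countries.
The plot shows country-specific deviations from the average income slope (0.27), not the slopes themselves. The deviation is negative (a weaker than average effect) for: Pakistan, Thailand, Netherlands, Japan, Mexico, Mongolia, Colombia, Indonesia. It is positive (a stronger than average effect) for: USA, Ukraine, Canada, Russia, Germany, Brazil, Singapore. The interval includes 0 for: China, South Korea, Turkey, Australia, Kazakhstan.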
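The values returned by ranef() are deviations from the fixed effect, so a country's own income slope is the fixed slope plus its deviation. The individual deviations are not printed in this report, but the fixed slope and the standard deviation of the random slope are enough to sketch the range in which most country slopes should fall. The base R snippet below does this under the model's own assumption of normally distributed random effects; the variable names are illustrative.

```r
# Values copied from summary(model31)
mean_slope <- 0.2744   # fixed effect of selfincome
sd_slope   <- 0.09887  # Std.Dev. of the random slope across countries

# Range expected to contain about 95% of country-specific slopes,
# assuming normally distributed random effects (a model assumption)
z <- qnorm(0.975)
slope_range <- c(lower = mean_slope - z * sd_slope,
                 upper = mean_slope + z * sd_slope)
round(slope_range, 2)
```

The lower bound stays above zero, which suggests that income is positively related to financial satisfaction in most countries, only with different strength.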
model_performance(model31)
## # Indices of model performance
##
## AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma
## -----------------------------------------------------------------------------------
## 1.606e+05 | 1.606e+05 | 1.607e+05 | 0.395 | 0.366 | 0.046 | 1.804 | 1.805
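Compared to the AIC and BIC of the previous models, the final model has the lowest values. It also has a conditional R2 of 0.395, which means that the fixed and random effects together explain 39.5% of the variance in financial satisfaction (the fixed effects alone explain 36.6%).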
residuals <- resid(model31)
plot(residuals, main = "Residuals")
The residuals plot shows that the residuals are scattered around the horizontal line in the middle without any visible pattern, which suggests that the model fits reasonably well.
qqnorm(residuals)
qqline(residuals)
The Q-Q plot is not ideal because the points deviate from the line at the beginning and at the end, that is, in both tails of the distribution.
vif_model <- vif(model31)
print(vif_model)
## GVIF Df GVIF^(1/(2*Df))
## age 1.314342 1 1.146447
## sex 1.013325 1 1.006641
## selfincome 1.006487 1 1.003238
## edu 1.105423 4 1.012607
## equalinc 1.011858 1 1.005912
## children 1.280316 1 1.131510
## unemp 1.520246 1 1.232983
## lifesat 5.007857 1 2.237824
## hci100 2.810049 1 1.676320
## gdp100 2.899219 1 1.702709
## unemp:lifesat 5.292935 1 2.300638
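In general, the GVIF values are fine for most variables. However, life satisfaction (5.01) and its interaction with unemployment (5.29) have rather high values, which means that multicollinearity is possible in the model. Part of this is expected, because an interaction term is always correlated with its components.
To address the multicollinearity problem, we can remove life satisfaction from the predictors.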
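The third column of the vif() output, GVIF^(1/(2*Df)), rescales the GVIF so that terms with different degrees of freedom (such as the four-category edu factor) can be compared with one-parameter terms. The base R sketch below recomputes that column for three rows of the output above; the vector names are illustrative and the input values are copied from the printed table.

```r
# GVIF and Df copied from the vif(model31) output
gvif <- c(edu = 1.105423, lifesat = 5.007857, `unemp:lifesat` = 5.292935)
dfs  <- c(edu = 4,        lifesat = 1,        `unemp:lifesat` = 1)

# Adjusted value reported in the third column: GVIF^(1 / (2 * Df))
adj_gvif <- gvif^(1 / (2 * dfs))
round(adj_gvif, 3)
```

For terms with one degree of freedom the adjusted value is simply the square root of the GVIF, so it should be judged against the square root of whatever VIF cutoff is used.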
model41<-lmer(finsat~ age + sex + selfincome +edu + equalinc + children + unemp + hci100 + gdp100 + (1+ selfincome|country), data=da, control = lmerControl(optimizer ="Nelder_Mead"))
tab_model(model41)
| finsat | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 5.35 | 3.50 – 7.19 | <0.001 |
| age | 0.01 | 0.01 – 0.01 | <0.001 |
| sex [female] | 0.06 | 0.02 – 0.10 | 0.007 |
| selfincome | 0.39 | 0.33 – 0.46 | <0.001 |
| edu [secondary] | 0.08 | 0.01 – 0.15 | 0.020 |
| edu [post-secondary] | 0.03 | -0.06 – 0.11 | 0.523 |
| edu [BA] | 0.19 | 0.11 – 0.28 | <0.001 |
| edu [higher BA] | 0.12 | 0.01 – 0.22 | 0.027 |
| equalinc | 0.07 | 0.06 – 0.08 | <0.001 |
| children | -0.05 | -0.07 – -0.03 | <0.001 |
| unemp | -0.03 | -0.10 – 0.05 | 0.520 |
| hci100 | -0.33 | -0.62 – -0.04 | 0.023 |
| gdp100 | 0.00 | 0.00 – 0.00 | 0.015 |
| Random Effects | |||
| σ2 | 4.39 | ||
| τ00 country | 1.22 | ||
| τ11 country.selfincome | 0.02 | ||
| ρ01 country | -0.92 | ||
| ICC | 0.08 | ||
| N country | 20 | ||
| Observations | 39892 | ||
| Marginal R2 / Conditional R2 | 0.153 / 0.219 | ||
anova(model31, model41)
## refitting model(s) with ML (instead of REML)
## Data: da
## Models:
## model41: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp + hci100 + gdp100 + (1 + selfincome | country)
## model31: finsat ~ age + sex + selfincome + edu + equalinc + children + unemp * lifesat + hci100 + gdp100 + (1 + selfincome | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model41 17 172396 172542 -86181 172362
## model31 19 160471 160634 -80217 160433 11929 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vif_model1 <- vif(model41)
print(vif_model1)
## GVIF Df GVIF^(1/(2*Df))
## age 1.309402 1 1.144291
## sex 1.012176 1 1.006070
## selfincome 1.002969 1 1.001484
## edu 1.101886 4 1.012202
## equalinc 1.002761 1 1.001380
## children 1.279523 1 1.131160
## unemp 1.204905 1 1.097682
## hci100 2.804680 1 1.674718
## gdp100 2.893936 1 1.701157
residuals1 <- resid(model41)
plot(residuals1, main = "Residuals")
qqnorm(residuals1)
qqline(residuals1)
As we can see, this model performs worse than the previous one in terms of AIC, since its value is higher (172396 vs. 160471). Moreover, the marginal R2 is much lower (0.153 vs. 0.366). However, the Q-Q plot looks better and the multicollinearity problem is gone: all GVIF values are below 3.
The Research question was: How does life satisfaction and belief in income equality affect satisfaction with financial situation among individuals with varying levels of education and income?
This project has shown that life satisfaction, income, and the belief that incomes should not be equal are positively associated with financial satisfaction.
**Hypotheses**
The project was interesting to me. However, I ran into a problem with my computer's memory: the image of the missing-data frame could not be plotted (image(mdf)). Also, for some reason I was not able to run plot_model and could not find a solution.
Otherwise, the work on selecting variables and building different models, especially the one with the random slope, was very engaging and important to me.