Introduction

In this work I analyze how several social and economic parameters contribute to the level of happiness in a country. For this purpose I take Finland, a country that has held the top position in the world happiness ranking for the third year in a row (source: https://worldhappiness.report/), and ask what makes it the “happiest” country in the world. It would be interesting to discover the relationships between happiness and several factors in the years preceding the three years when the country was at the top.

This ranking is based on polling that includes six variables of interest: GDP per capita, social support, healthy life expectancy, freedom, generosity, and absence of corruption. Following this thought, I tried to consider the same parameters when working with the World Values Survey dataset, which includes waves conducted from 1981 to 2007. Yet, because some of the mentioned variables have no counterparts in the dataset, I ended up with a slightly different set of variables: age (age), level of household income (income), subjective feeling of health (health), subjective degree of control over one’s own life choices (choice), subjective satisfaction with the financial situation of the household (household_income_satisf), and subjective happiness (happy) and life satisfaction (satisfaction), the last two being combined into the outcome variable.

Hypotheses

While household income and satisfaction with it may correspond to the GDP per capita variable, subjective perception of health to healthy life expectancy, and freedom of choice over the life course to the freedom variable, age feels out of place. I chose it as a predictor of happiness because I wanted to see whether a person becomes less happy as s/he gets older. My logic is that young people, especially children, are less bothered by the material as well as spiritual problems that one faces during life. The idea is that such problems accumulate throughout our life, as only a minority of them can be resolved quickly and permanently, and that the size of this load of problems has an impact on happiness.

The predicting variables can be roughly classified into 3 groups of parameters: economic, personal demographic, and social. The first group includes level of income and satisfaction with this level of income, the second concerns age and health status, and the third group covers feeling of free choice.

The hypotheses for the study can be formulated as follows:

  • H1: level of happiness in a country is associated with economic factors.
    • H1a: economic factors, such as level of household income and satisfaction with this level, are positively associated with level of happiness in a country.
  • H2: level of happiness in a country is associated with personal demographic factors.
    • H2a: personal demographic factors, such as age and health, are negatively associated with level of happiness in a country.
  • H3: level of happiness in a country is associated with social factors.
    • H3a: social factors, such as free choice, are positively associated with level of happiness in a country.
  • H4: Adding an interaction term to the model increases its explanatory power.

NB! I do NOT claim that these groups of factors are exhaustively represented by the chosen variables.

Data Description

After the required variables were selected out of 289 possible ones, the data were filtered by country, and cases with missing values were omitted, I got a dataset comprising 2368 cases for each of 7 variables.

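library(dplyr)
library(foreign)
library(ggplot2)

# Read the SPSS file; value labels become factor levels
wvs <- read.spss("wvs.sav", to.data.frame = TRUE)

# Keep Finnish respondents and the seven variables of interest, then drop incomplete cases
wvs_FN <- wvs %>%
  filter(country == "Finland ") %>%
  dplyr::select(V237, V10, V253, V22, V11, V46, V68)
wvs_FN <- na.omit(wvs_FN)

names(wvs_FN) <- c("age", "happy", "income",
                   "satisfaction", "health", "choice", "household_income_satisf")
# Caution: if age was read as a factor, as.numeric() returns level codes rather than ages
wvs_FN$age <- as.numeric(wvs_FN$age)
str(wvs_FN)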
## 'data.frame':    2368 obs. of  7 variables:
##  $ age                    : num  11 20 36 52 31 24 34 7 19 8 ...
##  $ happy                  : Factor w/ 4 levels "Very happy","Quite happy",..: 2 2 3 2 2 2 2 2 2 2 ...
##  $ income                 : Factor w/ 10 levels "Lower step","second step",..: 8 10 10 2 8 10 10 9 9 10 ...
##  $ satisfaction           : Factor w/ 10 levels "Dissatisfied",..: 7 9 7 6 8 8 5 9 7 8 ...
##  $ health                 : Factor w/ 5 levels "Very good","Good",..: 2 1 1 3 3 1 2 1 1 1 ...
##  $ choice                 : Factor w/ 10 levels "None at all",..: 9 9 6 4 8 7 6 7 7 7 ...
##  $ household_income_satisf: Factor w/ 10 levels "Dissatisfied",..: 4 9 6 6 7 3 5 5 6 2 ...
##  - attr(*, "variable.labels")= Named chr  "" "Set" "Unified respondent number" "Country/Region codes" ...
##   ..- attr(*, "names")= chr  "S003" "S004" "S007" "S009" ...
##  - attr(*, "codepage")= int 1251
##  - attr(*, "na.action")= 'omit' Named int  1 3 4 6 7 8 9 10 14 20 ...
##   ..- attr(*, "names")= chr  "1" "3" "4" "6" ...

Analysis

Descriptive statistics

First of all, let’s look at the distribution of answers by categories of happiness in Finland during the 1981-2007 period.

# Bar heights are shares of all respondents (count / total count)
ggplot(data = wvs_FN, aes(x = happy)) +
  geom_bar(aes(y = after_stat(count / sum(count))),
           alpha = 0.6, color = "black", fill = "blue2") +
  scale_y_continuous(labels = scales::percent) +
  labs(subtitle = "Barplot",
       y = "Percentage of People",
       x = "Subjective Happiness",
       title = "Subjective Happiness in Finland") +
  theme_bw()

So, 91% of respondents, a clear majority, rate their level of happiness as at least “Quite happy”.
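A share like this can be read off a frequency table. Below is a minimal sketch on a made-up vector of answers; the `answers` vector is hypothetical and is not the WVS data, so the resulting number is not the one reported above.

```r
# Hypothetical answers, for illustration only (not the WVS data)
answers <- factor(
  c("Very happy", "Quite happy", "Quite happy", "Not very happy",
    "Quite happy", "Very happy", "Quite happy", "Quite happy",
    "Not at all happy", "Quite happy"),
  levels = c("Very happy", "Quite happy", "Not very happy", "Not at all happy")
)

# Share of each answer category
shares <- prop.table(table(answers))

# Share of respondents who are at least "Quite happy"
at_least_quite <- sum(shares[c("Very happy", "Quite happy")])
at_least_quite
```

On the real data the same two lines would be applied to `wvs_FN$happy`.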

The initial idea of this study was to apply regression analysis, which requires a quantitative outcome variable. As the subjective level of happiness is originally a categorical variable, I decided to combine it with another variable, subjective satisfaction with life, thus creating a numeric index that can be used as the outcome of a regression model.

The resulting variable ranges from 1 to 7 and follows an approximately normal, though left-skewed, distribution. As can be seen in the histogram below, the median (white dashed line) and the mean (red line) nearly coincide, which is consistent with a roughly bell-shaped distribution.

wvs_FN$happy0 <-   ifelse(wvs_FN$happy=="Not at all happy",1,
                            ifelse(wvs_FN$happy=="Not very happy",2,
                                   ifelse(wvs_FN$happy=="Quite happy",3,
                                          ifelse(wvs_FN$happy=="Very happy",4, NA))))

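# Satisfaction is a 10-level factor: the two labelled ends are set explicitly,
# the remaining levels fall back to their integer codes (2-9)
wvs_FN$satisf1 <-   ifelse(wvs_FN$satisfaction=="Satisfied",10,
                             ifelse(wvs_FN$satisfaction=="Dissatisfied",1,
                                    wvs_FN$satisfaction))

# Happiness index: row-wise mean of the two numeric components
wvs_FN$happyIND <- rowMeans(wvs_FN[c('happy0','satisf1')], na.rm=TRUE)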

ggplot(wvs_FN, aes(x = happyIND)) +
  ggtitle("Distribution of Happiness Index") + labs(subtitle = "Histogram") +
  xlab("Happiness Index") + 
  ylab("") +
  geom_histogram(binwidth = 0.5, fill = "sienna1", col= "black", alpha = 0.7) +
  geom_vline(aes(xintercept = mean(wvs_FN$happyIND)), linetype="solid", color="#8B0000", size=1) +
  geom_vline(aes(xintercept = median(wvs_FN$happyIND)), linetype="dashed", color="white", size=1) +
  theme_bw()

range(wvs_FN$happyIND, na.rm=T)
## [1] 1 7
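To make the 1 to 7 range concrete, here is a small worked sketch on toy values (not the WVS data). It also shows that satisfaction, measured on 1-10, weighs more in the mean than happiness, measured on 1-4. The rescaled variant at the end (`rescale01`, `happyIND_equal`) is a hypothetical alternative added for comparison only; it is not what this analysis uses.

```r
# Toy rows: lowest possible answers, highest possible answers, two in between
toy <- data.frame(
  happy0  = c(1, 4, 3, 2),   # 1 = "Not at all happy" ... 4 = "Very happy"
  satisf1 = c(1, 10, 8, 5)   # 1 = "Dissatisfied" ... 10 = "Satisfied"
)

# Same construction as in the analysis: row-wise mean of the two components
toy$happyIND <- rowMeans(toy[c("happy0", "satisf1")], na.rm = TRUE)
range(toy$happyIND)          # (1 + 1) / 2 = 1 and (4 + 10) / 2 = 7

# Hypothetical alternative: put both components on a 0-1 scale before averaging,
# so that each contributes equally to the index
rescale01 <- function(x, lo, hi) (x - lo) / (hi - lo)
toy$happyIND_equal <- (rescale01(toy$happy0, 1, 4) +
                         rescale01(toy$satisf1, 1, 10)) / 2
toy
```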

Consequently, several predictors were recoded to numeric values as well: choice, household_income_satisf, and income, renamed choice_rec, household_isr, and income_num, respectively.

The bivariate relationships between these variables, age, and the outcome are shown in the following scatterplots.
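Recoding a labelled factor to numbers has one well-known trap in R, sketched below on toy factors (the level labels for `choice` are assumed to be ordered from "None at all" to "A great deal"; the `age_f` vector is hypothetical). Converting a factor directly gives its level codes, which is what one wants for an ordered scale but not for a factor whose labels are themselves numbers.

```r
# Toy 10-point scale with labelled end points, ordered from low to high
choice_levels <- c("None at all", as.character(2:9), "A great deal")
choice <- factor(c("A great deal", "4", "None at all", "9"),
                 levels = choice_levels)

# Level codes coincide with the scale points because the levels are ordered
choice_rec <- as.integer(choice)
choice_rec

# Pitfall: a factor whose labels are numbers (hypothetical ages)
age_f <- factor(c("45", "23", "67"))
as.numeric(age_f)                 # level codes, not ages
as.numeric(as.character(age_f))   # the actual values
```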

wvs_FN$household_isr <-  ifelse(wvs_FN$household_income_satisf=="Satisfied",10,
                             ifelse(wvs_FN$household_income_satisf=="Dissatisfied",1,
                                    wvs_FN$household_income_satisf))

wvs_FN$choice_rec <-  ifelse(wvs_FN$choice=="A great deal",10,
                             ifelse(wvs_FN$choice=="None at all",1,
                                    wvs_FN$choice))

# Map each income label to its step number; labels are spelled as in the dataset,
# and any label not listed here becomes NA
income_steps <- c("Lower step" = 1, "second step" = 2, "Third step" = 3,
                  "Fourth step" = 4, "Fifth step" = 5, "Sixth step" = 6,
                  "Seventh step" = 7, "Eigth step" = 8, "Nineth step" = 9,
                  "Tenth step" = 10)
wvs_FN$income_num <- unname(income_steps[as.character(wvs_FN$income)])

age <- ggplot(wvs_FN, aes(age, happyIND) ) +
  geom_point() +
  stat_smooth() + 
  labs(y="Happiness Level", 
       x="Age") + theme_bw()

inc <- ggplot(wvs_FN, aes(income_num, happyIND) ) +
  geom_point() +
  stat_smooth() + 
  labs(y="Happiness Level", 
       x="Income Level") + theme_bw()

free <- ggplot(wvs_FN, aes(choice_rec, happyIND) ) +
  geom_point() +
  stat_smooth() +
  labs(y="Happiness Level", 
       x="Freedom of Choice") + theme_bw()

sat <- ggplot(wvs_FN, aes(household_isr, happyIND) ) +
  geom_point() +
  stat_smooth() + 
  labs(y="Happiness Level", 
       x="Satisfaction with Income Level") + theme_bw()

pdp::grid.arrange(age, inc, free, sat, ncol = 2)

Judging by these plots, the other variables follow a roughly linear trend when predicting the outcome, whereas the relationship between the level of happiness and satisfaction with household income is non-linear. But let me come back to this point a bit later.

The correlation between the above-mentioned variables and the outcome was highest for the “happiness index - satisfaction with household income” pair (0.41) and lowest for the “age - level of happiness” pair (-0.08). The correlation coefficients for these and the remaining pairs can be seen on the plot below. The strongest correlations are colored in red and blue, while the weakest are closer to white. Negative values signify a negative, or reversed, relationship between variables, while positive values indicate a positive association.

library(ggcorrplot)
cor_plot <- wvs_FN %>% dplyr::select(happyIND, age, household_isr, choice_rec, income_num) %>% cor() %>%round(2)

ggcorrplot(cor_plot, hc.order = TRUE, type = "lower",
   lab = TRUE)
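`cor()` computes Pearson correlations by default. Since most of the predictors here are ordinal scales, a rank-based (Spearman) matrix is a common cross-check. The sketch below uses simulated data with made-up variable names (`x`, `y`, `z`), not the WVS variables, and only illustrates the two calls.

```r
set.seed(1)
n <- 200

# Simulated 10-point scales: y depends on x, z is unrelated noise
x <- sample(1:10, n, replace = TRUE)
z <- sample(1:10, n, replace = TRUE)
y <- 3 + 0.2 * x + rnorm(n)
toy <- data.frame(y = y, x = x, z = z)

# Pearson (default) and Spearman correlation matrices, rounded as in the text
pearson  <- round(cor(toy), 2)
spearman <- round(cor(toy, method = "spearman"), 2)

pearson
spearman
```

If the two matrices tell the same story, the choice of coefficient matters little for the conclusions.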

Now, to the one and only categorical variable that will be used in the model, the subjective perception of health. The plot below suggests some kind of relationship between this variable and the outcome: the median values of the happiness index differ across categories of perceived health. This suggests that perception of health has some influence on the level of happiness.

ggplot(data = wvs_FN) + 
  geom_boxplot(aes(x = health, y = happyIND), col= "red") + 
  labs(subtitle="Boxplots", 
       y="Happiness Index", 
       x="Perception of Health", 
       title="Distribution of Happiness Index Values by Feeling Healthy") +
  theme_bw()

To be more confident about this, I check the association with the Kruskal–Wallis one-way analysis of variance test, a non-parametric analogue of the well-known ANOVA.

options(scipen = 999)
kruskal.test(happyIND ~ health, data = wvs_FN)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  happyIND by health
## Kruskal-Wallis chi-squared = 296.89, df = 4, p-value <
## 0.00000000000000022

The returned p-value is lower than the alpha level, which allows me to reject the null hypothesis of this test, namely that the mean ranks of the different groups (here, the categories of the health variable) are the same. Thus, the association between the health variable and the outcome variable is statistically significant.
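The test only says that at least one group differs; it does not say which ones. A minimal sketch on simulated data (three made-up groups, not the WVS health categories) shows the test itself and one possible follow-up, pairwise Wilcoxon tests with Holm correction. The follow-up is my suggestion and is not part of the analysis above.

```r
set.seed(42)

# Simulated scores for three groups with shifted locations
toy <- data.frame(
  group = factor(rep(c("good", "fair", "poor"), each = 50),
                 levels = c("good", "fair", "poor")),
  score = c(rnorm(50, mean = 5), rnorm(50, mean = 4), rnorm(50, mean = 3))
)

# Omnibus test: are the groups' rank distributions the same?
kw <- kruskal.test(score ~ group, data = toy)
kw

# Possible follow-up: which pairs of groups differ (Holm-adjusted p-values)
pw <- pairwise.wilcox.test(toy$score, toy$group, p.adjust.method = "holm")
pw
```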

Thus, I’ve discovered that the variables chosen for regression analysis do exhibit some relationship with the outcome variable, happiness level, and can proceed to the model building.

Regression modeling

Model №1

Let the model with all of the initially selected variables be the first model (model1). Judging by the output table of this model, the following statements can be made:

  1. the overall p-value is much lower than the alpha level, meaning that the regression model is statistically significant, which supports the H1, H2, and H3 hypotheses;

  2. the adjusted R-squared value is 0.3077, meaning that about 31% of the variance in the outcome variable can be explained with this model;

  3. all of the predictor variables, except for age, have a statistically significant effect on the outcome variable at the 0.05 alpha level (the largest p-value among them is 0.0373, for income). This partly disconfirms H2;

  4. health has a reversed relationship with the level of happiness: the coefficients of all health categories are negative. “Very good” is the reference category, so each coefficient compares a category with it: all other variables being constant, reporting “Good” to “Very poor” health is associated with a level of happiness lower by 0.207 to 1.491 points than reporting “Very good” health. This partly supports H2a;

  5. income level is, surprisingly, negatively associated with the outcome as well, partly disconfirming H1a. Each additional point of the income level variable, all other variables being constant, is associated with a decrease of 0.013 in the level of happiness;

  6. freedom of choice has a positive relationship with the outcome variable, which supports the H3a hypothesis. Each additional point of the freedom of choice variable, all other variables being constant, is associated with an increase of 0.115 in the level of happiness;

  7. satisfaction with the level of household income has a positive association with the outcome variable, partly confirming H1a. Each additional point of satisfaction with the level of household income, all other variables being constant, is associated with an increase of 0.166 in the level of happiness.

model1 <- lm(happyIND ~ age  + health + income_num + choice_rec + household_isr, 
             data = wvs_FN)
summary(model1)
## 
## Call:
## lm(formula = happyIND ~ age + health + income_num + choice_rec + 
##     household_isr, data = wvs_FN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6447 -0.3910  0.0581  0.5396  2.1846 
## 
## Coefficients:
##                  Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)      3.867469   0.106855  36.193 < 0.0000000000000002 ***
## age             -0.001296   0.001274  -1.018               0.3090    
## healthGood      -0.207038   0.044190  -4.685          0.000002956 ***
## healthFair      -0.556379   0.055290 -10.063 < 0.0000000000000002 ***
## healthPoor      -1.104759   0.092618 -11.928 < 0.0000000000000002 ***
## healthVery poor -1.490750   0.266228  -5.600          0.000000024 ***
## income_num      -0.013155   0.006314  -2.083               0.0373 *  
## choice_rec       0.114519   0.010963  10.446 < 0.0000000000000002 ***
## household_isr    0.165663   0.008837  18.747 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.864 on 2359 degrees of freedom
## Multiple R-squared:  0.3101, Adjusted R-squared:  0.3077 
## F-statistic: 132.5 on 8 and 2359 DF,  p-value: < 0.00000000000000022
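The health rows in the table above have no entry for “Very good” because R uses the first factor level as the reference category. A minimal sketch on toy data (made-up scores, one predictor only) shows that each coefficient is simply the difference between that category and the reference.

```r
# Toy data: four made-up scores per health category
toy <- data.frame(
  health = factor(rep(c("Very good", "Good", "Fair"), each = 4),
                  levels = c("Very good", "Good", "Fair")),
  score  = c(6, 5, 6, 7,   5, 5, 6, 4,   4, 3, 5, 4)
)

fit <- lm(score ~ health, data = toy)
coef(fit)

# The same numbers from group means: intercept = mean of the reference group,
# other coefficients = group mean minus reference mean
group_means <- tapply(toy$score, toy$health, mean)
group_means - group_means["Very good"]

# Changing the reference category only changes what the coefficients compare with
fit_ref_fair <- lm(score ~ relevel(health, ref = "Fair"), data = toy)
```

With several predictors in the model the logic is the same, except that the differences are adjusted for the other variables.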

Model №2

The statistical insignificance of the age variable in model1 suggests the possibility of removing it from the set of predictors. To assess whether this decision is justified, the AIC values of the model with age (the first model) and without it (let it be the second model, model2) can be compared.

model2 <- lm(happyIND~health + household_isr + choice_rec + income_num, 
             data=wvs_FN)
AIC(model1)
## [1] 6038.85
AIC(model2)
## [1] 6037.89

The first value, corresponding to the first model, is higher than the second value, corresponding to the second model. A lower AIC is preferred, so the model is not worse off without the age variable, although a difference of about 1 point is small. Let’s also consider the output summary of the model without age (model2).

summary(model2)
## 
## Call:
## lm(formula = happyIND ~ health + household_isr + choice_rec + 
##     income_num, data = wvs_FN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6410 -0.3933  0.0545  0.5400  2.1696 
## 
## Coefficients:
##                  Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)      3.844754   0.104498  36.793 < 0.0000000000000002 ***
## healthGood      -0.215034   0.043486  -4.945          0.000000815 ***
## healthFair      -0.577700   0.051165 -11.291 < 0.0000000000000002 ***
## healthPoor      -1.132902   0.088392 -12.817 < 0.0000000000000002 ***
## healthVery poor -1.521325   0.264529  -5.751          0.000000010 ***
## household_isr    0.162706   0.008345  19.497 < 0.0000000000000002 ***
## choice_rec       0.115887   0.010880  10.651 < 0.0000000000000002 ***
## income_num      -0.012381   0.006268  -1.975               0.0484 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.864 on 2360 degrees of freedom
## Multiple R-squared:  0.3098, Adjusted R-squared:  0.3077 
## F-statistic: 151.3 on 7 and 2360 DF,  p-value: < 0.00000000000000022

Well, all the points valid for the first model still apply here, with only slight differences in the numbers. The age variable is not present, though. The predictive power of the model didn’t change: the adjusted R-squared value remained the same to four decimal places (0.3077).

Nevertheless, let me stick to model2 as the best one so far and move on to model diagnostics.
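Both criteria used in this comparison, AIC and adjusted R-squared, penalize extra parameters, and both can be reproduced by hand. The sketch below uses the built-in `mtcars` data as a stand-in (not the WVS models); `manual_aic` and `manual_adj_r2` are helper names introduced here for illustration.

```r
# Stand-in models on built-in data: one with an extra predictor, one without
fit_full  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_small <- lm(mpg ~ wt + hp, data = mtcars)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
manual_aic <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), p = number of predictors
manual_adj_r2 <- function(fit) {
  n  <- nobs(fit)
  p  <- length(coef(fit)) - 1
  r2 <- summary(fit)$r.squared
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

c(builtin = AIC(fit_full), manual = manual_aic(fit_full))
c(builtin = summary(fit_small)$adj.r.squared, manual = manual_adj_r2(fit_small))

# AIC difference between model1 and model2, using the values reported in the text
aic_diff <- 6038.85 - 6037.89
aic_diff
```

A common rule of thumb treats AIC differences of less than about 2 as weak evidence for either model, so a difference of this size mainly says that dropping age costs nothing.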

plot(model2)

The diagnostic plots suggest that the distribution of the residuals is approximately normal: on the normal Q-Q plot the residuals approximately follow a straight line, and their spread is roughly constant along the range of fitted values (3rd plot). Slight deviations from normality in the tails are present, yet this was expected from the left skew seen in the histogram of the happiness index. According to the “Residuals vs Leverage” plot, there are no influential observations: no points fall outside the Cook’s distance contours.

So, model2 is okay.
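The visual reading of the diagnostic plots can be backed up with numbers. The sketch below again uses `mtcars` as a stand-in model. The cutoffs (4/n for Cook’s distance, 2p/n for leverage) are common rules of thumb that I am assuming here, not thresholds used by `plot()` itself.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

cd  <- cooks.distance(fit)   # influence of each observation on the fit
lev <- hatvalues(fit)        # leverage of each observation
n   <- nobs(fit)
p   <- length(coef(fit))

# Observations exceeding the rule-of-thumb cutoffs
flag_cook <- names(cd)[cd > 4 / n]
flag_lev  <- names(lev)[lev > 2 * p / n]
flag_cook
flag_lev

# A formal check of residual normality to complement the Q-Q plot
shapiro.test(residuals(fit))
```

With a sample as large as the one in this study, a normality test tends to reject even for mild deviations, so the Q-Q plot remains the more informative check.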

Polynomial Model

Now, let me remind you of the moment when I said that satisfaction with household income has a non-linear relationship with the happiness level. Here is the scatterplot again:

ggplot(wvs_FN, aes(household_isr, happyIND) ) +
  geom_point(color = "tomato", size = 3, alpha = 0.3) +
  stat_smooth(color = "black") + 
  labs(title = "Happiness Level VS Satisfaction with Income Level",
       subtitle = "Scatterplot",
       y="Happiness Level", 
       x="Satisfaction with Income Level") + theme_bw()

It seems that the happiness level rises until satisfaction with household income reaches about 2.5, then steadily decreases until about 4.0, after which it increases again. There is a way to model such an effect, and that’s what I am doing next: adding a quadratic term for this predictor to model2 to capture the non-linear trend.

model_poly <- lm(happyIND ~  I(household_isr^2) + health + income_num + choice_rec + household_isr, 
             data = wvs_FN)
summary(model_poly)
## 
## Call:
## lm(formula = happyIND ~ I(household_isr^2) + health + income_num + 
##     choice_rec + household_isr, data = wvs_FN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6490 -0.3923  0.0577  0.5384  2.1960 
## 
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)         3.802711   0.140729  27.021 < 0.0000000000000002 ***
## I(household_isr^2) -0.001378   0.003089  -0.446               0.6556    
## healthGood         -0.215877   0.043534  -4.959         0.0000007599 ***
## healthFair         -0.577707   0.051174 -11.289 < 0.0000000000000002 ***
## healthPoor         -1.131755   0.088445 -12.796 < 0.0000000000000002 ***
## healthVery poor    -1.516346   0.264809  -5.726         0.0000000116 ***
## income_num         -0.012304   0.006272  -1.962               0.0499 *  
## choice_rec          0.116077   0.010890  10.659 < 0.0000000000000002 ***
## household_isr       0.179010   0.037486   4.775         0.0000019038 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8642 on 2359 degrees of freedom
## Multiple R-squared:  0.3098, Adjusted R-squared:  0.3075 
## F-statistic: 132.4 on 8 and 2359 DF,  p-value: < 0.00000000000000022

Well, the modification did not do the job: the quadratic term of the household income satisfaction variable has no statistically significant effect on the outcome variable. Moreover, the adjusted R-squared value dropped by 0.0002 points, indicating that this model (model_poly) accounts for no more variation in the outcome than model2. Everything else (direction of predictor-outcome relationships, significance of the predictors’ effects on the happiness level, the approximate range of the predictors’ coefficients) remained the same as in model2 and, actually, model1. One thing perhaps worth reporting is that the coefficient of the quadratic term is negative. This can be attributed to the very beginning of the x-axis on the scatterplot, where the curve bends in an inverted U.
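Whether a quadratic term earns its place can also be checked with a nested-model F-test. The sketch below uses simulated data in which the true relationship really is curved, so the outcome differs from the one above; the variable names are made up. It also shows why the standard error of the linear term grew in model_poly (from 0.008 to 0.037): x and x² are strongly correlated, which `poly()` avoids by using orthogonal terms.

```r
set.seed(7)

# Simulated 10-point predictor with a genuinely curved effect on y
x <- rep(1:10, times = 20)
y <- 2 + 0.8 * x - 0.06 * x^2 + rnorm(length(x), sd = 0.5)
toy <- data.frame(x = x, y = y)

fit_lin  <- lm(y ~ x, data = toy)
fit_quad <- lm(y ~ x + I(x^2), data = toy)

# Nested-model F-test: does the quadratic term improve the fit?
cmp <- anova(fit_lin, fit_quad)
cmp

# Raw x and x^2 are almost collinear, which inflates their standard errors
cor(toy$x, toy$x^2)

# Orthogonal polynomial: same fitted values, uncorrelated terms
fit_orth <- lm(y ~ poly(x, 2), data = toy)
```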

Interaction Model

Finally, to check the H4 hypothesis, let me add the interaction effect to the model.

Let’s suppose that the relationship between happiness level and satisfaction with household income is moderated by subjective health state. That is, the way one feels about one’s health influences both how fast the happiness level increases as satisfaction with income rises (the slope of the trend) and the level from which this relationship starts (the intercept). The logic behind it is as follows: the more satisfied a person is with his/her income, the happier s/he is, yet the strength of this relationship is either heightened or lowered by the subjective perception of the person’s health. Thus, the assumption is that a person who is highly satisfied with his/her income level will on average be happier if s/he feels really healthy than if s/he feels that his/her health is poor.

Let’s test this assumption by adding a multiplication (interaction) term to the previous model (model2).

model3 <- lm(happyIND~ household_isr*health + health + household_isr + choice_rec + income_num, 
             data=wvs_FN)
summary(model3)
## 
## Call:
## lm(formula = happyIND ~ household_isr * health + health + household_isr + 
##     choice_rec + income_num, data = wvs_FN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6526 -0.3868  0.0489  0.5303  2.2627 
## 
## Coefficients:
##                               Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)                    4.07995    0.14102  28.932 < 0.0000000000000002 ***
## household_isr                  0.13269    0.01556   8.529 < 0.0000000000000002 ***
## healthGood                    -0.31187    0.14540  -2.145               0.0321 *  
## healthFair                    -0.91472    0.16073  -5.691         0.0000000142 ***
## healthPoor                    -2.13969    0.22429  -9.540 < 0.0000000000000002 ***
## healthVery poor               -2.24190    0.56592  -3.962         0.0000766811 ***
## choice_rec                     0.11158    0.01086  10.272 < 0.0000000000000002 ***
## income_num                    -0.01075    0.00626  -1.717               0.0861 .  
## household_isr:healthGood       0.01334    0.02006   0.665               0.5062    
## household_isr:healthFair       0.04879    0.02233   2.185               0.0290 *  
## household_isr:healthPoor       0.17082    0.03473   4.919         0.0000009307 ***
## household_isr:healthVery poor  0.11920    0.08805   1.354               0.1760    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8596 on 2356 degrees of freedom
## Multiple R-squared:  0.318,  Adjusted R-squared:  0.3148 
## F-statistic: 99.85 on 11 and 2356 DF,  p-value: < 0.00000000000000022

It seems like the situation has changed now. The following observations can be made:

  1. the model as a whole is still statistically significant (the overall p-value is much lower than the 0.05 alpha level);

  2. the effect of income level on happiness, all other variables being equal, is no longer statistically significant at the 0.05 level, unlike in the other models: the p-value of its coefficient (0.0861) is higher than the alpha level;

  3. the signs of the coefficients remained the same as in the previous models. The coefficients of choice_rec and income_num changed very little, while the main-effect coefficients of health and household_isr now describe the effect when the other interacting variable is at zero or at its reference category, so they are not directly comparable with model2;

  4. adding the interaction term increased the proportion of variance in the outcome explained by the model: R-squared is now about 32% (0.318), and the adjusted R-squared rose from 0.3077 to 0.3148, which supports H4;

  5. the interaction effect between satisfaction with level of income and health is statistically significant only for the “Fair” and “Poor” health categories;

  6. the interaction effect between satisfaction with level of income and health suggests a positive association with the outcome variable.
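A more direct way to test H4 than comparing R-squared values by eye is a nested-model F-test between the model without and with the interaction. The sketch below runs it on simulated data with made-up names (`x`, `g`, `y`), where the slope of `x` truly differs between two groups; it illustrates the technique and does not reproduce the results above.

```r
set.seed(11)

# Simulated data: the slope of x is 0.2 in group "a" and 0.5 in group "b"
x <- rep(1:10, times = 20)
g <- factor(rep(c("a", "b"), each = 100))
y <- 1 + 0.2 * x + ifelse(g == "b", -1 + 0.3 * x, 0) + rnorm(200, sd = 0.5)
toy <- data.frame(x = x, g = g, y = y)

fit_add <- lm(y ~ x + g, data = toy)   # additive model, common slope
fit_int <- lm(y ~ x * g, data = toy)   # x * g expands to x + g + x:g

# F-test for the interaction term(s) as a block
cmp <- anova(fit_add, fit_int)
cmp

# Adjusted R-squared before and after adding the interaction
c(additive    = summary(fit_add)$adj.r.squared,
  interaction = summary(fit_int)$adj.r.squared)
```

On the real models the equivalent call would be `anova(model2, model3)`, which tests all four interaction coefficients at once.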

Now, let’s consider the graphical representation of the model with interaction term.

require(sjPlot)
plot_model(model3, type = "int") + theme_sjplot2() + labs(x = "Satisfaction with Household Income", y = "Happiness Index", title = "Predicted Values of Happiness Index")

The trend lines of happiness values predicted by income satisfaction for the two health categories with a statistically significant interaction are colored in purple and green. These lines approach each other toward the top of the income satisfaction scale, which is what moderation looks like for these two categories. Judging by the model coefficients, people with “fair” health who are not satisfied with their income are predicted to be about 1.1 points happier than those with “poor” health and the same income satisfaction, while at the highest income satisfaction the two groups are predicted to be almost equally happy, because the happiness of those with “poor” health rises faster with income satisfaction. That’s a curious relationship for which I don’t have an accurate explanation. One possible reading is that when you’re not okay with how much your household earns, your health weighs more heavily on how happy you feel, while once you are satisfied with the income level, you feel happy even when your health is worse. Still, it’s rather complicated.
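The slopes and the meeting point of the “Fair” and “Poor” lines can be worked out directly from the model3 coefficients. The numbers below are copied from the summary output above; the helper names (`gap`, `crossover`) are introduced here for illustration.

```r
# Coefficients copied from summary(model3)
b_isr <- 0.13269                             # slope for the reference ("Very good")
main  <- c(Fair = -0.91472, Poor = -2.13969) # health main effects
inter <- c(Fair =  0.04879, Poor =  0.17082) # household_isr:health interactions

# Slope of income satisfaction within each health category
slopes <- b_isr + inter
slopes

# Predicted happiness gap, Fair minus Poor, at income satisfaction x
# (the other predictors are held equal, so they cancel out)
gap <- function(x) unname((main["Fair"] + inter["Fair"] * x) -
                            (main["Poor"] + inter["Poor"] * x))
gap(1)    # gap at the lowest income satisfaction
gap(10)   # gap at the highest income satisfaction

# Income satisfaction at which the two lines would meet
crossover <- unname((main["Fair"] - main["Poor"]) / (inter["Poor"] - inter["Fair"]))
crossover
```

The gap shrinks from roughly 1.1 points at the bottom of the scale to practically zero at the top, and the lines meet at about 10.04, that is, essentially at the end of the 1-10 scale.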

Conclusion

Summary

That’s all for the analysis of factors related to the level of happiness in Finland. Let’s briefly summarize what we have learned.

tab_model(model1, model_poly, model2, model3)
Dependent variable in all four models: happyIND. Each cell shows Estimate (CI) p.

Predictors                      model1                        model_poly                    model2                        model3
(Intercept)                     3.87 (3.66 – 4.08) <0.001     3.80 (3.53 – 4.08) <0.001     3.84 (3.64 – 4.05) <0.001     4.08 (3.80 – 4.36) <0.001
age                             -0.00 (-0.00 – 0.00) 0.309
health: Good                    -0.21 (-0.29 – -0.12) <0.001  -0.22 (-0.30 – -0.13) <0.001  -0.22 (-0.30 – -0.13) <0.001  -0.31 (-0.60 – -0.03) 0.032
health: Fair                    -0.56 (-0.66 – -0.45) <0.001  -0.58 (-0.68 – -0.48) <0.001  -0.58 (-0.68 – -0.48) <0.001  -0.91 (-1.23 – -0.60) <0.001
health: Poor                    -1.10 (-1.29 – -0.92) <0.001  -1.13 (-1.31 – -0.96) <0.001  -1.13 (-1.31 – -0.96) <0.001  -2.14 (-2.58 – -1.70) <0.001
health: Very poor               -1.49 (-2.01 – -0.97) <0.001  -1.52 (-2.04 – -1.00) <0.001  -1.52 (-2.04 – -1.00) <0.001  -2.24 (-3.35 – -1.13) <0.001
income num                      -0.01 (-0.03 – -0.00) 0.037   -0.01 (-0.02 – -0.00) 0.050   -0.01 (-0.02 – -0.00) 0.048   -0.01 (-0.02 – 0.00) 0.086
choice rec                      0.11 (0.09 – 0.14) <0.001     0.12 (0.09 – 0.14) <0.001     0.12 (0.09 – 0.14) <0.001     0.11 (0.09 – 0.13) <0.001
household isr                   0.17 (0.15 – 0.18) <0.001     0.18 (0.11 – 0.25) <0.001     0.16 (0.15 – 0.18) <0.001     0.13 (0.10 – 0.16) <0.001
I(household_isr^2)                                            -0.00 (-0.01 – 0.00) 0.656
household_isr:healthGood                                                                                                  0.01 (-0.03 – 0.05) 0.506
household_isr:healthFair                                                                                                  0.05 (0.01 – 0.09) 0.029
household_isr:healthPoor                                                                                                  0.17 (0.10 – 0.24) <0.001
household_isr:healthVery poor                                                                                             0.12 (-0.05 – 0.29) 0.176
Observations                    2368                          2368                          2368                          2368
R2 / adjusted R2                0.310 / 0.308                 0.310 / 0.307                 0.310 / 0.308                 0.318 / 0.315

  • Level of happiness in Finland is associated with economic factors, freedom of choice, and health; no statistically significant association with age was found.

  • Among the four models built to predict Finnish happiness, the greatest proportion of explained variance in the outcome is observed in the last model, which includes health, level of income, satisfaction with the level of income, freedom of choice, and the interaction between health and satisfaction with the level of income as predictors.

  • Age is only weakly correlated with the level of happiness in Finland (-0.08), and its effect is not statistically significant in the linear model that includes it (model1).

  • The relationship between health and the level of happiness appears to be negative in all built models: compared with the reference category (“Very good”), every other category of perceived health, all other predictors being constant, is associated with a lower value of the outcome variable.

  • The relationship between level of income and the level of happiness appears to be slightly negative in all built models, indicating that an increase in the predictor, all other predictors being constant, is associated with a small decrease in the outcome variable.

  • The impact of level of income on the level of happiness is statistically significant at the 0.05 level in all models except for the last one, which includes an interaction term.

  • The relationship between freedom of choice and the level of happiness appears to be positive in all built models, indicating that an increase in the predicting variable, all other predictors being constant, is associated with an increase in the outcome variable.

  • The relationship between satisfaction with income level and the level of happiness appears to be positive in all built models, indicating that an increase in the predicting variable, all other predictors being constant, is associated with an increase in the outcome variable.

  • The interaction effect between satisfaction with income level and health is statistically significant for the “Fair” and “Poor” health categories and is positive: compared with people in “Very good” health, each additional point of satisfaction with income level, all other predictors being constant, is associated with an extra increase of 0.05 points (for the “Fair” health category) and of 0.17 points (for the “Poor” health category) in the outcome variable.

And here the end comes. Thank you for bearing with me!!