The data come from the World Values Survey (WVS) database. Using data for the latest period (2010-2014), we select the country of interest, South Korea, and work with part of the data. We keep the entire time period, which gives 3421 observations, enough to analyze and predict happiness; the regression models below use 1195 of them.
The next step is to choose the variables that will explain the level of happiness.
Since we want to predict the level of happiness from income, position in society and the prestige of work, we use the following variables:
finsatisf - How satisfied are you with the financial situation of your household? (1 = completely dissatisfied, 10 = completely satisfied)
hardsuccess - How would you place your views on this scale? 1 means you agree completely with the statement on the left, 10 means you agree completely with the statement on the right, and values in between express intermediate views.
Left (1): In the long run, hard work usually brings a better life.
Right (10): Hard work doesn't generally bring success; it's more a matter of luck and connections.
employment - employment status. With paid employment: 1 = full-time employee (30 hours a week or more), 2 = part-time employee (less than 30 hours a week), 3 = self-employed. Without paid employment: 4 = retired/pensioned, 5 = housewife not otherwise employed, 6 = student, 7 = unemployed, 8 = other.
class - subjective social class: 1 = upper class, 2 = upper middle class, 3 = lower middle class, 4 = working class, 5 = lower class.
The variable we explain:
happy - 1 = very happy, 2 = rather happy, 3 = not very happy, 4 = not at all happy.
The second variable used to build the happiness INDEX:
satisf - satisfaction with life, from 1 = completely dissatisfied to 10 = completely satisfied.
All our variables are factors (the exception is age, which is numeric). However, the original scales are not all suitable for the analysis, so the next step was to recode them so that a higher value means a more positive answer. For happy: Not at all happy = 1, Not very happy = 2, and so on up to Very happy = 4; the same logic was applied to satisf, finsatisf and hardsuccess.
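A minimal sketch of this reverse-coding step in R, assuming the raw WVS codes are already numeric and that negative missing-value codes have been set to NA beforehand. The helper `reverse_scale()` and the toy vector are hypothetical; the report does not show its own recoding code.

```r
# Reverse a Likert-type scale so that a higher value means a more positive answer.
# For a scale running from lo to hi, the reversed value is (lo + hi) - x.
reverse_scale <- function(x, lo, hi) {
  # Guard against codes outside the scale (NA is allowed and stays NA)
  stopifnot(all(is.na(x) | (x >= lo & x <= hi)))
  (lo + hi) - x
}

# Hypothetical raw answers: 1 = very happy ... 4 = not at all happy
happy_raw <- c(1, 2, 2, 3, 4, NA)
happy <- reverse_scale(happy_raw, lo = 1, hi = 4)
happy   # now 4 = very happy ... 1 = not at all happy
```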
Once the variables were recoded, we created the index needed for building the models: the happiness INDEX is constructed from two variables, happy and satisf. First, let us look at plots of these two variables.
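The report does not state how happy and satisf are combined. One common option, shown here purely as an assumption, is to standardize both recoded items and average their z-scores, so that the 4-point and the 10-point scale get equal weight. The function `make_index()` and the toy data frame are hypothetical.

```r
# Hypothetical two-item happiness index:
# standardize each item (mean 0, sd 1) and average the z-scores.
make_index <- function(happy, satisf) {
  z_happy  <- as.numeric(scale(happy))
  z_satisf <- as.numeric(scale(satisf))
  (z_happy + z_satisf) / 2
}

# Toy data, already recoded so that higher = happier / more satisfied
d <- data.frame(
  happy  = c(4, 3, 3, 2, 4, 1, 3, 2),
  satisf = c(9, 7, 6, 4, 8, 2, 7, 5)
)
d$happyIND <- make_index(d$happy, d$satisf)

# The items should correlate positively if they measure the same construct
cor(d$happy, d$satisf)
round(d$happyIND, 2)
```

A plain sum of the recoded items or a principal-component score would be equally plausible choices; the z-score average simply keeps the index centered at 0.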
Most participants are quite happy, but some are not very happy, and it would be interesting to know why.
Again, few people are completely dissatisfied with their life and few are completely satisfied; most participants are more or less satisfied.
After looking at the plots and creating the INDEX, let us look at its distribution with a regular histogram and a standardized one.
The INDEX looks approximately normally distributed, so we can proceed.
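A histogram can be complemented by a numerical check. This sketch standardizes a simulated stand-in for the INDEX and runs base R's `shapiro.test()` (valid for 3 to 5000 observations). With a sample of this size even small departures from normality can give a small p-value, so the test is best read together with the plots.

```r
set.seed(42)
# Simulated stand-in for the happiness INDEX (not the WVS data)
index <- rnorm(1195, mean = 0, sd = 1)

# Standardize: subtract the mean and divide by the standard deviation
z <- (index - mean(index)) / sd(index)

# Approximate sample skewness: close to 0 for a symmetric distribution
skewness <- mean(z^3)

# Shapiro-Wilk test of normality
sw <- shapiro.test(z)
c(skewness = skewness, W = unname(sw$statistic), p = sw$p.value)
```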
Before building the models, it is important to look at the distribution of our predictors.
The distribution of finsatisf
Few Koreans are completely satisfied with their financial situation; most are more or less satisfied, although a sizable number are not satisfied either.
The distribution of employment
The graph shows that most Koreans work full time and few are unemployed.
The distribution of hardsuccess
Notably, Koreans tend to believe that hard work really does bring success (values 1, 2 and 3).
The distribution of class
Respondents most often describe themselves as lower middle class, and fairly often as upper middle class.
Let us look at the relationships between our outcome variable, happy, and the predictors.
The relationship between happy and employment
The graph shows some outliers, but also a difference in happiness across employment groups. For example, on average people who are unemployed or have a part-time job are less happy.
The relationship between happy and class
People who consider themselves upper class seem on average happier than others, and the working and lower classes are less happy than the upper classes.
The relationship between happy and hardsuccess
No clear pattern is visible, but the variable may still turn out to be significant in the model.
The relationship between happy and finsatisf
There are many extreme cases (outliers), which may affect the regression. The relationship is positive: the more satisfied a person is with the financial situation of the household, the happier he is.
The higher the satisfaction with the household's financial situation, the higher the level of happiness (Hagerty & Veenhoven, 2003). Michael R. Hagerty and Ruut Veenhoven use the theory of absolute utility, which predicts that extra income allows each person to satisfy additional needs and thereby raises average long-term happiness; their study found that satisfaction with one's financial situation positively affects the feeling of happiness only in the short term.
Individuals who believe that their hard work brings success feel happier.
Full-time workers, part-time workers and retired individuals are likely to be happier than unemployed individuals (Lawrence et al., 2015). According to a longitudinal study of happiness in the USA, about 30% of individuals with a full-time job, a part-time job or in retirement are “very happy”, compared with only 18% of unemployed individuals. Individuals also feel happier as they get older (Stone et al., 2010). In a study of psychological well-being and age, well-being declined in early adulthood, and after about age 50 people felt considerably happier than in their younger years.
Individuals belonging to higher classes feel happier (Cameron, 1975). Paul Cameron concluded that people of a higher social class report more positive and happy moods than people with a lower position.
To predict happiness, we first build a model containing all the predictors and look at its significance.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$hardsuccess1 +
## wvsKR1$employment + wvsKR1$class)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9606 -0.4655 0.0111 0.4934 3.4490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.224063 0.143430 -8.534 < 2e-16 ***
## wvsKR1$finsatisf12 0.514464 0.124068 4.147 3.62e-05 ***
## wvsKR1$finsatisf13 0.659641 0.119297 5.529 3.96e-08 ***
## wvsKR1$finsatisf14 0.918816 0.113766 8.076 1.65e-15 ***
## wvsKR1$finsatisf15 1.173729 0.111577 10.519 < 2e-16 ***
## wvsKR1$finsatisf16 1.517238 0.110906 13.680 < 2e-16 ***
## wvsKR1$finsatisf17 1.677895 0.116939 14.348 < 2e-16 ***
## wvsKR1$finsatisf18 2.115567 0.161752 13.079 < 2e-16 ***
## wvsKR1$finsatisf110 2.795932 0.238240 11.736 < 2e-16 ***
## wvsKR1$hardsuccess12 0.172406 0.079709 2.163 0.030747 *
## wvsKR1$hardsuccess13 0.091913 0.094103 0.977 0.328907
## wvsKR1$hardsuccess14 0.135191 0.089237 1.515 0.130054
## wvsKR1$hardsuccess15 0.177270 0.097880 1.811 0.070383 .
## wvsKR1$hardsuccess16 0.172175 0.100708 1.710 0.087597 .
## wvsKR1$hardsuccess17 0.266182 0.123495 2.155 0.031334 *
## wvsKR1$hardsuccess18 0.004769 0.136674 0.035 0.972173
## wvsKR1$hardsuccess19 -0.049126 0.168479 -0.292 0.770654
## wvsKR1$hardsuccess110 0.289107 0.083688 3.455 0.000571 ***
## wvsKR1$employmentHousewife -0.164734 0.064706 -2.546 0.011028 *
## wvsKR1$employmentOther -0.256835 0.081735 -3.142 0.001718 **
## wvsKR1$employmentPart time -0.326759 0.100251 -3.259 0.001149 **
## wvsKR1$employmentRetired -0.592034 0.134066 -4.416 1.10e-05 ***
## wvsKR1$employmentSelf employed -0.095628 0.109554 -0.873 0.382909
## wvsKR1$employmentStudents -0.077495 0.080850 -0.958 0.338010
## wvsKR1$employmentUnemployed -0.171329 0.129388 -1.324 0.185711
## wvsKR1$classLower middle class 0.138021 0.113062 1.221 0.222427
## wvsKR1$classUpper class 0.470612 0.317285 1.483 0.138280
## wvsKR1$classUpper middle class 0.065752 0.122877 0.535 0.592680
## wvsKR1$classWorking class -0.122818 0.122239 -1.005 0.315232
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8116 on 1166 degrees of freedom
## Multiple R-squared: 0.3568, Adjusted R-squared: 0.3414
## F-statistic: 23.1 on 28 and 1166 DF, p-value: < 2.2e-16
The p-value of the F-test is below 0.05, so the model as a whole is significant. R-squared measures the share of the variance of the outcome that the model explains: the higher it is, the better the model fits the data. This model explains about 36% of the variance of the INDEX (adjusted R-squared = 0.34). finsatisf is highly significant (***), and so are four of the seven employment categories. hardsuccess and class, on the other hand, are mostly not significant: only three of the nine hardsuccess levels pass the 5% threshold, and no class level does. The explanatory power is not very strong.
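To make the R-squared figures concrete, this sketch recomputes R-squared and adjusted R-squared from the residual and total sums of squares and compares them with `summary()`. The data are simulated; the variable names only mimic those of the report.

```r
set.seed(1)
n <- 300
d <- data.frame(
  finsatisf  = factor(sample(1:10, n, replace = TRUE)),
  employment = factor(sample(c("Full time", "Part time", "Retired"), n, replace = TRUE))
)
# Simulated outcome: rises with financial satisfaction, plus noise
d$index <- 0.3 * as.numeric(d$finsatisf) + rnorm(n)

fit <- lm(index ~ finsatisf + employment, data = d)

rss <- sum(residuals(fit)^2)              # residual sum of squares
tss <- sum((d$index - mean(d$index))^2)   # total sum of squares
p   <- length(coef(fit)) - 1              # number of non-intercept coefficients

r2     <- 1 - rss / tss
adj_r2 <- 1 - (rss / (n - p - 1)) / (tss / (n - 1))
c(r2 = r2, adj_r2 = adj_r2)
```

Adjusted R-squared is the better yardstick for comparing models with different numbers of predictors, since plain R-squared can only rise when a predictor is added.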
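The intercept is -1.22. Since all predictors in this model are factors, it is the predicted INDEX for a respondent in the reference category of each one: finsatisf = 1, hardsuccess = 1, full-time employment and lower class. (Age is not included in this model.)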
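Under R's default treatment contrasts, the first level of each factor (alphabetical unless releveled) is the reference category, and the intercept equals the prediction for a case at all reference levels. This sketch checks that with `predict()` on simulated data; the level labels echo the report, the numbers do not.

```r
set.seed(2)
n <- 200
d <- data.frame(
  employment = factor(sample(c("Full time", "Part time", "Retired"), n, replace = TRUE)),
  class      = factor(sample(c("Lower class", "Working class", "Upper class"), n, replace = TRUE))
)
d$index <- rnorm(n)

fit <- lm(index ~ employment + class, data = d)

# Reference levels under treatment contrasts: the first level of each factor
ref <- data.frame(
  employment = factor(levels(d$employment)[1], levels = levels(d$employment)),
  class      = factor(levels(d$class)[1], levels = levels(d$class))
)
ref                                   # Full time, Lower class
unname(coef(fit)[1])                  # the intercept...
unname(predict(fit, newdata = ref))   # ...equals the prediction at the reference levels
```

`relevel()` changes the reference category, for example `relevel(d$class, ref = "Working class")`.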
The next step of our analysis is to find the best model. We use backward elimination: removing non-significant predictors one at a time until we reach a good model. We do not have many predictors, so this step is quite simple.
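Base R's `drop1()` performs one step of backward elimination: it refits the model without each term in turn and reports an F-test for each removal (a factor is dropped as a whole, as in the report). In this simulated example hardsuccess has no real effect; `step()` would instead run the whole elimination by AIC.

```r
set.seed(3)
n <- 400
d <- data.frame(
  finsatisf   = sample(1:10, n, replace = TRUE),
  hardsuccess = sample(1:10, n, replace = TRUE),
  employment  = factor(sample(c("Full time", "Part time", "Retired"), n, replace = TRUE))
)
# Simulated outcome: depends on finsatisf and employment, not on hardsuccess
d$index <- 0.3 * d$finsatisf - 0.4 * (d$employment == "Retired") + rnorm(n)

full <- lm(index ~ finsatisf + hardsuccess + employment, data = d)

# One backward step: F-test for dropping each term from the current model
tab <- drop1(full, test = "F")
tab

# Candidate for removal: the term with the largest p-value (the "<none>" row is NA)
worst   <- rownames(tab)[which.max(tab[["Pr(>F)"]])]
reduced <- update(full, as.formula(paste(". ~ . -", worst)))
formula(reduced)
```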
First, we remove hardsuccess1, since the previous model showed that it is mostly not significant.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$employment +
## wvsKR1$class)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8647 -0.4387 -0.0083 0.5063 3.5807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.09065 0.13179 -8.276 3.44e-16 ***
## wvsKR1$finsatisf12 0.50082 0.12399 4.039 5.71e-05 ***
## wvsKR1$finsatisf13 0.66108 0.11871 5.569 3.18e-08 ***
## wvsKR1$finsatisf14 0.92209 0.11347 8.126 1.11e-15 ***
## wvsKR1$finsatisf15 1.16308 0.11132 10.448 < 2e-16 ***
## wvsKR1$finsatisf16 1.53027 0.11081 13.810 < 2e-16 ***
## wvsKR1$finsatisf17 1.69070 0.11677 14.479 < 2e-16 ***
## wvsKR1$finsatisf18 2.07441 0.16157 12.839 < 2e-16 ***
## wvsKR1$finsatisf110 2.79215 0.23835 11.714 < 2e-16 ***
## wvsKR1$employmentHousewife -0.17554 0.06453 -2.720 0.006618 **
## wvsKR1$employmentOther -0.24747 0.08156 -3.034 0.002465 **
## wvsKR1$employmentPart time -0.32996 0.09966 -3.311 0.000958 ***
## wvsKR1$employmentRetired -0.60818 0.13405 -4.537 6.29e-06 ***
## wvsKR1$employmentSelf employed -0.10120 0.10951 -0.924 0.355633
## wvsKR1$employmentStudents -0.09971 0.08038 -1.240 0.215067
## wvsKR1$employmentUnemployed -0.18816 0.12938 -1.454 0.146111
## wvsKR1$classLower middle class 0.15269 0.11253 1.357 0.175085
## wvsKR1$classUpper class 0.43503 0.31550 1.379 0.168204
## wvsKR1$classUpper middle class 0.07795 0.12234 0.637 0.524138
## wvsKR1$classWorking class -0.11025 0.12209 -0.903 0.366715
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8143 on 1175 degrees of freedom
## Multiple R-squared: 0.3475, Adjusted R-squared: 0.3369
## F-statistic: 32.93 on 19 and 1175 DF, p-value: < 2.2e-16
The p-value is below 0.05, so the model is significant and the predictors explain part of the variation in happiness. This model explains about 35% of the variance (adjusted R-squared = 0.34). finsatisf is significant again, and so is employment, while no class level is significant. The intercept is -1.09: the predicted INDEX for a respondent with finsatisf = 1 who works full time and belongs to the lower class. The finsatisf coefficients are positive and grow with the level of satisfaction; all employment coefficients are negative, i.e. every group is less happy than full-time employees; the class coefficients, relative to the lower class, are positive for the upper and middle classes and negative for the working class. The explanatory power is not very strong.
Next, we remove the class variable.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$employment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8114 -0.4574 -0.0252 0.4961 3.6641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.02927 0.10086 -10.205 < 2e-16 ***
## wvsKR1$finsatisf12 0.50252 0.12372 4.062 5.19e-05 ***
## wvsKR1$finsatisf13 0.67853 0.11708 5.796 8.73e-09 ***
## wvsKR1$finsatisf14 0.94735 0.11211 8.450 < 2e-16 ***
## wvsKR1$finsatisf15 1.18626 0.10885 10.898 < 2e-16 ***
## wvsKR1$finsatisf16 1.56030 0.10747 14.519 < 2e-16 ***
## wvsKR1$finsatisf17 1.72075 0.11300 15.227 < 2e-16 ***
## wvsKR1$finsatisf18 2.10526 0.15711 13.400 < 2e-16 ***
## wvsKR1$finsatisf110 2.84957 0.23253 12.255 < 2e-16 ***
## wvsKR1$employmentHousewife -0.16754 0.06478 -2.586 0.009819 **
## wvsKR1$employmentOther -0.23954 0.08187 -2.926 0.003500 **
## wvsKR1$employmentPart time -0.38329 0.09899 -3.872 0.000114 ***
## wvsKR1$employmentRetired -0.57970 0.13439 -4.314 1.74e-05 ***
## wvsKR1$employmentSelf employed -0.07190 0.10938 -0.657 0.511078
## wvsKR1$employmentStudents -0.07381 0.08010 -0.921 0.357008
## wvsKR1$employmentUnemployed -0.22635 0.12590 -1.798 0.072452 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8186 on 1179 degrees of freedom
## Multiple R-squared: 0.3384, Adjusted R-squared: 0.3299
## F-statistic: 40.2 on 15 and 1179 DF, p-value: < 2.2e-16
The p-value is below 0.05. This model explains about 34% of the variance (adjusted R-squared = 0.33). R-squared always decreases when a predictor is removed, so the more informative comparison is the adjusted R-squared, which penalizes the number of predictors: it also fell (from 0.337 to 0.330), so removing class made the model slightly worse.
Next, we also remove employment, to see how much of happiness a single predictor explains.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ wvsKR1$finsatisf1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9254 -0.4085 0.0237 0.5326 3.6166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.22137 0.09318 -13.108 < 2e-16 ***
## wvsKR1$finsatisf12 0.55201 0.12467 4.428 1.04e-05 ***
## wvsKR1$finsatisf13 0.72974 0.11700 6.237 6.19e-10 ***
## wvsKR1$finsatisf14 0.99941 0.11159 8.956 < 2e-16 ***
## wvsKR1$finsatisf15 1.26713 0.10831 11.699 < 2e-16 ***
## wvsKR1$finsatisf16 1.62935 0.10710 15.213 < 2e-16 ***
## wvsKR1$finsatisf17 1.78730 0.11298 15.819 < 2e-16 ***
## wvsKR1$finsatisf18 2.14900 0.15695 13.692 < 2e-16 ***
## wvsKR1$finsatisf110 2.92503 0.23326 12.540 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8282 on 1186 degrees of freedom
## Multiple R-squared: 0.3187, Adjusted R-squared: 0.3141
## F-statistic: 69.34 on 8 and 1186 DF, p-value: < 2.2e-16
The p-value is below 0.05. This model explains about 32% of the variance (adjusted R-squared = 0.31). R-squared is smaller because only one predictor remains.
Finally, we fit a model without finsatisf, keeping only employment.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ wvsKR1$employment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.91265 -0.54538 0.05894 0.49662 2.69888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.175084 0.047957 3.651 0.000273 ***
## wvsKR1$employmentHousewife -0.237500 0.077351 -3.070 0.002186 **
## wvsKR1$employmentOther -0.312467 0.097062 -3.219 0.001320 **
## wvsKR1$employmentPart time -0.715074 0.116096 -6.159 9.98e-10 ***
## wvsKR1$employmentRetired -0.478726 0.160460 -2.983 0.002908 **
## wvsKR1$employmentSelf employed -0.054106 0.130728 -0.414 0.679036
## wvsKR1$employmentStudents -0.005434 0.095488 -0.057 0.954625
## wvsKR1$employmentUnemployed -0.409321 0.149426 -2.739 0.006249 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9805 on 1187 degrees of freedom
## Multiple R-squared: 0.04429, Adjusted R-squared: 0.03865
## F-statistic: 7.858 on 7 and 1187 DF, p-value: 2.483e-09
The p-value is below 0.05, but this model explains only about 4% of the variance (adjusted R-squared = 0.04), so its explanatory power is weak. Most employment categories are significant, and several coefficients are larger in absolute value than in the models that include finsatisf (for example, part time: -0.72 here versus -0.38 before), probably because employment now also picks up part of the effect of financial satisfaction.
Now since we have several models, we need to compare them to find the best one.
For nested models the usual tool is anova (an F-test); for non-nested models, AIC. Our models are nested, so we use anova.
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$employment + wvsKR1$class
## Model 2: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$hardsuccess1 +
## wvsKR1$employment + wvsKR1$class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1175 779.10
## 2 1166 767.98 9 11.12 1.8758 0.05175 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
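Here p = 0.052 > 0.05: adding hardsuccess does not significantly improve the fit, so the reduced model (without hardsuccess) is preferred, although the result is borderline.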
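The F statistic that anova() prints for two nested models can be reproduced from their residual sums of squares. This sketch plugs in the rounded values from the comparison above (model1 against the full model), so it should match the printed F = 1.8758 and p = 0.05175 up to rounding. The helper `nested_f_test()` is hypothetical.

```r
# F-test for nested linear models:
# F = ((RSS_reduced - RSS_full) / q) / (RSS_full / df_full), q = number of dropped coefficients
nested_f_test <- function(rss_reduced, rss_full, df_reduced, df_full) {
  q <- df_reduced - df_full
  f <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
  p <- pf(f, df1 = q, df2 = df_full, lower.tail = FALSE)
  c(F = f, df1 = q, df2 = df_full, p = p)
}

# Residual sums of squares and residual df from the first anova table
res <- nested_f_test(rss_reduced = 779.10, rss_full = 767.98,
                     df_reduced = 1175, df_full = 1166)
round(res, 4)
```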
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$employment
## Model 2: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$hardsuccess1 +
## wvsKR1$employment + wvsKR1$class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1179 790.00
## 2 1166 767.98 13 22.025 2.5724 0.001632 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here p < 0.05, so the full model is better than the reduced model without hardsuccess and class.
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1
## Model 2: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$hardsuccess1 +
## wvsKR1$employment + wvsKR1$class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1186 813.51
## 2 1166 767.98 20 45.529 3.4563 4.353e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again p < 0.05, so the full model is better than the model with finsatisf only.
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ wvsKR1$employment
## Model 2: wvsKR1$happyIND2 ~ wvsKR1$finsatisf1 + wvsKR1$hardsuccess1 +
## wvsKR1$employment + wvsKR1$class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1187 1141.12
## 2 1166 767.98 21 373.15 26.978 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The last anova test also gives p < 0.05, so the full model is better than the model with employment only.
Therefore, the full model (model0) is better than the three smaller models (models 2, 3 and 4), while model1, which drops only hardsuccess, is not significantly worse than model0. Since model1 is simpler, it is our best model.
Model diagnostics should also be presented. First, we check the model for multicollinearity, i.e. strong linear dependence among two or more predictors. When it is present, the coefficient estimates of the regression become unstable. We therefore use the VIF, the variance inflation factor, which measures how much the variance of a regression coefficient is inflated by multicollinearity in the model.
## GVIF Df GVIF^(1/(2*Df))
## wvsKR1$finsatisf1 1.423944 8 1.022335
## wvsKR1$employment 1.276656 7 1.017599
## wvsKR1$class 1.491733 4 1.051263
All GVIF values are below 1.5, and GVIF^(1/(2*Df)) is close to 1 for every predictor, far below the common threshold of 5. The predictors are only weakly correlated, so multicollinearity is not a problem in this model.
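Because finsatisf, employment and class are factors, the table above reports generalized VIFs (GVIF, after Fox and Monette) rather than ordinary VIFs; GVIF^(1/(2*Df)) puts terms with different numbers of dummy columns on a comparable scale. This base-R sketch computes GVIF from the correlation matrix of the model-matrix columns. The helper `gvif()` is hypothetical; it is meant to follow the same definition as `car::vif()` for linear models, and with two numeric predictors it reduces to the ordinary VIF, 1/(1 - r^2).

```r
# Generalized variance inflation factors for an lm fit (base R only).
# For term j with column set J: GVIF_j = det(R_JJ) * det(R_-J,-J) / det(R),
# where R is the correlation matrix of the non-intercept model-matrix columns.
gvif <- function(fit) {
  mm      <- model.matrix(fit)
  term_of <- attr(mm, "assign")[-1]          # term index of each non-intercept column
  X       <- mm[, -1, drop = FALSE]
  labels  <- attr(terms(fit), "term.labels")
  stopifnot(length(labels) >= 2)             # GVIF needs at least two terms
  R     <- cor(X)
  det_R <- det(R)
  out <- t(sapply(seq_along(labels), function(j) {
    J  <- which(term_of == j)
    g  <- det(R[J, J, drop = FALSE]) * det(R[-J, -J, drop = FALSE]) / det_R
    df <- length(J)
    c(GVIF = g, Df = df, GVIF_adj = g^(1 / (2 * df)))
  }))
  rownames(out) <- labels
  out
}

set.seed(4)
n   <- 500
x1  <- rnorm(n)
x2  <- 0.6 * x1 + rnorm(n)                   # deliberately correlated with x1
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y   <- x1 + x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + grp)
gvif(fit)
```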
The next step of the diagnostics is to look at the residuals and leverage.
The first plot shows the residuals (y axis) against the fitted values (x axis). The residuals show no visible pattern or funnel shape around zero, as expected in a linear model with constant error variance, so this plot does not suggest heteroscedasticity.
The normal Q-Q plot shows that the residuals follow the dotted line closely, except for observations 577, 1098 and 1165, which is acceptable. The residuals are approximately normally distributed.
The scale-location plot shows the spread of the residuals across the range of fitted values. A horizontal red line would indicate that the residuals have uniform variance across the range. For our model the line is not horizontal, which points to some heteroscedasticity.
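Heteroscedasticity can also be tested formally. This sketch implements Koenker's studentized version of the Breusch-Pagan test by hand (the default of `lmtest::bptest()`): regress the squared residuals on the predictors and compare n times the auxiliary R-squared with a chi-squared distribution whose degrees of freedom equal the number of predictors. The data are simulated with an error spread that grows with x.

```r
set.seed(7)
n <- 400
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 * x)   # error spread grows with x: heteroscedastic

fit <- lm(y ~ x)

# Studentized Breusch-Pagan: auxiliary regression of squared residuals on the predictors
aux     <- lm(residuals(fit)^2 ~ x)
lm_stat <- n * summary(aux)$r.squared        # LM statistic = n * R^2
k       <- length(coef(aux)) - 1             # degrees of freedom = number of predictors
p_value <- pchisq(lm_stat, df = k, lower.tail = FALSE)
c(LM = lm_stat, df = k, p = p_value)
```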
The last plots show some outliers, but no influential points. Outliers are observations whose response y does not follow the general trend of the rest of the data. A point has high leverage if its predictor values are extreme, i.e. unusual compared with the other X values. A point is influential if removing it would substantially change the fitted coefficients; this typically happens when high leverage is combined with a large residual, and it is measured by Cook's distance. Influential points deserve a closer look, but they should not be deleted automatically; the decision also depends on how many observations we have. No points lie beyond the Cook's distance lines, so there are no highly influential observations, which is good.
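Leverage and influence can also be inspected numerically with base R's `hatvalues()` and `cooks.distance()`. The cut-offs used here (2p/n for leverage, 4/n for Cook's distance) are common rules of thumb, not fixed standards; the contour lines in R's residuals-vs-leverage plot are drawn at Cook's distance 0.5 and 1 by default. The data are simulated, with one planted point that has both an extreme x value and a large residual.

```r
set.seed(5)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
x[n] <- 6                  # planted point with an extreme x value (high leverage)
y[n] <- 1 + 0.5 * 6 - 4    # ...that also lies far below the trend (influential)

fit <- lm(y ~ x)
h   <- hatvalues(fit)        # leverage: how unusual each x value is
cd  <- cooks.distance(fit)   # influence: how much the fit changes without the point
p   <- length(coef(fit))     # number of coefficients

high_leverage <- which(h > 2 * p / n)
influential   <- which(cd > 4 / n)
list(high_leverage = high_leverage, influential = influential)
```

The leverages always sum to the number of coefficients, which is why 2p/n ("twice the average leverage") is a natural reference point.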
Other tests to assess the adequacy of our model
## rstudent unadjusted p-value Bonferonni p
## 1098 4.482532 8.0979e-06 0.009677
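The Bonferroni-adjusted p-value (0.0097 < 0.05) is significant, so observation 1098 is flagged as an outlier. The test only concerns the size of the residual; as the Cook's distance plot showed, this observation does not strongly influence the regression line.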
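The printed values can be reproduced by hand. For a linear model, `car::outlierTest()` compares the studentized residual with a t distribution on (residual df - 1) degrees of freedom and multiplies the p-value by the number of observations. Plugging in observation 1098 (rstudent = 4.482532), model1's 1175 residual degrees of freedom and n = 1195 (1175 plus its 20 coefficients) should recover 8.0979e-06 and 0.009677. The helper `bonferroni_outlier()` is hypothetical; for a fitted model, `rstudent(fit)` supplies the studentized residuals.

```r
# Two-sided p-value of a studentized residual, with Bonferroni adjustment
bonferroni_outlier <- function(rstud, df_resid, n) {
  p_unadj <- 2 * pt(abs(rstud), df = df_resid - 1, lower.tail = FALSE)
  c(unadjusted = p_unadj, bonferroni = min(1, n * p_unadj))
}

# Observation 1098 in model1: 1175 residual df, 1195 observations
res <- bonferroni_outlier(rstud = 4.482532, df_resid = 1175, n = 1195)
signif(res, 4)
```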
library(car)  # qqPlot() for fitted lm objects comes from the car package
qqPlot(model1, main = "QQ Plot")
## [1] 1098 1165
There is another way to draw the Q-Q plot, which also shows approximately normal residuals.
Here is another way to show the leverage plot.
The distribution of the studentized residuals is approximately normal.
To check whether non-linear effects give a better model, we use three methods: polynomials, splines and GAM.
Polynomial
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ poly(finsatisf1, 3) + employment +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9187 -0.4491 -0.0169 0.4869 3.5483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04855 0.11391 0.426 0.67007
## poly(finsatisf1, 3)1 18.50086 0.89505 20.670 < 2e-16 ***
## poly(finsatisf1, 3)2 0.60084 0.84270 0.713 0.47599
## poly(finsatisf1, 3)3 1.66392 0.82843 2.009 0.04482 *
## employmentHousewife -0.17701 0.06450 -2.745 0.00615 **
## employmentOther -0.25499 0.08104 -3.146 0.00169 **
## employmentPart time -0.32664 0.09963 -3.279 0.00107 **
## employmentRetired -0.61375 0.13388 -4.584 5.04e-06 ***
## employmentSelf employed -0.09531 0.10941 -0.871 0.38389
## employmentStudents -0.09626 0.08026 -1.199 0.23064
## employmentUnemployed -0.19563 0.12876 -1.519 0.12896
## classLower middle class 0.15309 0.11226 1.364 0.17294
## classUpper class 0.43735 0.31498 1.389 0.16524
## classUpper middle class 0.08157 0.12193 0.669 0.50364
## classWorking class -0.11063 0.12175 -0.909 0.36372
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8149 on 1180 degrees of freedom
## Multiple R-squared: 0.3438, Adjusted R-squared: 0.336
## F-statistic: 44.16 on 14 and 1180 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ finsatisf1 + employment + class
## Model 2: wvsKR1$happyIND2 ~ poly(finsatisf1, 3) + employment + class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1175 779.10
## 2 1180 783.52 -5 -4.4272 1.3354 0.2466
## [1] 2922.081
## [1] 2918.852
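The lower the AIC, the better. The cubic model has a slightly lower AIC (2918.9) than model1 with finsatisf as a factor (2922.1), but only because it uses fewer parameters: its residual sum of squares is slightly higher (783.5 vs 779.1), and the F-test (p = 0.25) shows no significant difference between the two. Modeling finsatisf with a polynomial therefore does not meaningfully improve the model.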
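For a Gaussian linear model, R's `AIC()` equals n·log(RSS/n) + n·(log(2π) + 1) + 2(k + 1), where k is the number of coefficients and the extra 1 counts the error variance. Plugging in the RSS values from the anova table above (n = 1195; k = 20 for model1, k = 15 for the cubic model) reproduces the printed AICs 2922.081 and 2918.852 up to rounding, and splits their difference into a fit part and a penalty part. The helper `aic_from_rss()` is hypothetical.

```r
# AIC of a Gaussian linear model from its residual sum of squares
aic_from_rss <- function(rss, n, k) {
  n * log(rss / n) + n * (log(2 * pi) + 1) + 2 * (k + 1)
}

n <- 1195
aic_factor <- aic_from_rss(rss = 779.10, n = n, k = 20)  # finsatisf as a factor
aic_cubic  <- aic_from_rss(rss = 783.52, n = n, k = 15)  # poly(finsatisf1, 3)
round(c(factor = aic_factor, cubic = aic_cubic), 2)

# Difference (factor - cubic) = fit part + penalty part
c(fit_part = n * log(779.10 / 783.52), penalty_part = 2 * (20 - 15))
```

The fit part is negative (the factor model fits slightly better), but the penalty for five extra coefficients outweighs it.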
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ poly(finsatisf1, 4) + employment +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9149 -0.4527 -0.0205 0.4834 3.5422
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04729 0.11420 0.414 0.67884
## poly(finsatisf1, 4)1 18.50228 0.89546 20.662 < 2e-16 ***
## poly(finsatisf1, 4)2 0.60310 0.84315 0.715 0.47457
## poly(finsatisf1, 4)3 1.66390 0.82877 2.008 0.04491 *
## poly(finsatisf1, 4)4 0.14087 0.82266 0.171 0.86407
## employmentHousewife -0.17721 0.06453 -2.746 0.00612 **
## employmentOther -0.25441 0.08115 -3.135 0.00176 **
## employmentPart time -0.32623 0.09970 -3.272 0.00110 **
## employmentRetired -0.61450 0.13401 -4.585 5.01e-06 ***
## employmentSelf employed -0.09492 0.10948 -0.867 0.38613
## employmentStudents -0.09617 0.08030 -1.198 0.23131
## employmentUnemployed -0.19414 0.12911 -1.504 0.13292
## classLower middle class 0.15431 0.11254 1.371 0.17056
## classUpper class 0.43500 0.31540 1.379 0.16810
## classUpper middle class 0.08264 0.12214 0.677 0.49877
## classWorking class -0.10920 0.12209 -0.894 0.37127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8152 on 1179 degrees of freedom
## Multiple R-squared: 0.3438, Adjusted R-squared: 0.3355
## F-statistic: 41.18 on 15 and 1179 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ finsatisf1 + employment + class
## Model 2: wvsKR1$happyIND2 ~ poly(finsatisf1, 4) + employment + class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1175 779.1
## 2 1179 783.5 -4 -4.4077 1.6619 0.1565
## Analysis of Variance Table
##
## Model 1: wvsKR1$happyIND2 ~ poly(finsatisf1, 3) + employment + class
## Model 2: wvsKR1$happyIND2 ~ poly(finsatisf1, 4) + employment + class
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1180 783.52
## 2 1179 783.50 1 0.019485 0.0293 0.8641
## [1] 2922.081
## [1] 2918.852
## [1] 2920.823
A fourth-degree polynomial does not help either: the quartic term is not significant (p = 0.86) and the AIC rises to 2920.8.
Spline
##
## Call:
## lm(formula = happyIND2 ~ employment + bs(finsatisf1, knots = knots) +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9187 -0.4491 -0.0169 0.4869 3.5483
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.54505 0.20299 7.611 5.53e-14 ***
## employmentHousewife -0.17701 0.06450 -2.745 0.00615 **
## employmentOther -0.25499 0.08104 -3.146 0.00169 **
## employmentPart time -0.32664 0.09963 -3.279 0.00107 **
## employmentRetired -0.61375 0.13388 -4.584 5.04e-06 ***
## employmentSelf employed -0.09531 0.10941 -0.871 0.38389
## employmentStudents -0.09626 0.08026 -1.199 0.23064
## employmentUnemployed -0.19563 0.12876 -1.519 0.12896
## bs(finsatisf1, knots = knots)1 NA NA NA NA
## bs(finsatisf1, knots = knots)2 NA NA NA NA
## bs(finsatisf1, knots = knots)3 -2.59618 0.20116 -12.906 < 2e-16 ***
## bs(finsatisf1, knots = knots)4 -1.49246 0.18091 -8.250 4.19e-16 ***
## bs(finsatisf1, knots = knots)5 -1.50138 0.35694 -4.206 2.79e-05 ***
## bs(finsatisf1, knots = knots)6 NA NA NA NA
## classLower middle class 0.15309 0.11226 1.364 0.17294
## classUpper class 0.43735 0.31498 1.389 0.16524
## classUpper middle class 0.08157 0.12193 0.669 0.50364
## classWorking class -0.11063 0.12175 -0.909 0.36372
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8149 on 1180 degrees of freedom
## Multiple R-squared: 0.3438, Adjusted R-squared: 0.336
## F-statistic: 44.16 on 14 and 1180 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Response: wvsKR1$happyIND2
## Df Sum Sq Mean Sq F value Pr(>F)
## finsatisf1 8 380.49 47.562 71.7307 < 2.2e-16 ***
## employment 7 23.50 3.358 5.0640 1.119e-05 ***
## class 4 10.91 2.726 4.1119 0.002587 **
## Residuals 1175 779.10 0.663
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Response: wvsKR1$happyIND2
## Df Sum Sq Mean Sq F value Pr(>F)
## poly(finsatisf1, 3) 3 375.43 125.143 188.4676 < 2.2e-16 ***
## employment 7 24.10 3.442 5.1840 7.85e-06 ***
## class 4 10.95 2.738 4.1237 0.002533 **
## Residuals 1180 783.52 0.664
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2918.852
## [1] 2918.852
## [1] 2922.081
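The spline model gives exactly the same fit as the cubic polynomial (identical residuals, R-squared and AIC), and three of its basis coefficients are not estimable because of singularities. So the spline does not improve the model either.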
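One way to see how a spline fit can coincide with a polynomial fit: a cubic B-spline basis without interior knots spans the same space as `poly(x, 3)`, so the two models give identical fitted values and AIC. This sketch uses `bs()` from the `splines` package that ships with R, on simulated data; the report's own `knots` are not shown, so it does not reproduce the report's exact basis.

```r
library(splines)

set.seed(6)
n <- 300
x <- sample(1:10, n, replace = TRUE)      # discrete 1-10 scale, like finsatisf
y <- 0.3 * x + 0.02 * (x - 5)^2 + rnorm(n)

m_poly <- lm(y ~ poly(x, 3))
m_bs   <- lm(y ~ bs(x, degree = 3))       # cubic B-spline, no interior knots

# Same column space: identical fitted values and AIC, different coefficients
all.equal(fitted(m_poly), fitted(m_bs))
c(poly = AIC(m_poly), bs = AIC(m_bs))
```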
GAM
AIC(modelgam)   # 2899.895
AIC(model1pl0)  # 2892.488
So GAM did not improve the model either: its AIC (2899.9) is higher than that of the comparison model (2892.5).
Therefore, we continue with the model without non-linear effects.
Next, we try adding interaction effects, using age as the additional variable.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ finsatisf1 + employment + age +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9612 -0.4590 -0.0265 0.4436 3.6296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.882609 0.118417 -7.453 1.75e-13 ***
## finsatisf1 0.294515 0.012598 23.377 < 2e-16 ***
## employment -0.026821 0.010531 -2.547 0.010992 *
## age -0.006273 0.001720 -3.647 0.000277 ***
## class -0.060197 0.018903 -3.185 0.001487 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8221 on 1190 degrees of freedom
## Multiple R-squared: 0.3265, Adjusted R-squared: 0.3242
## F-statistic: 144.2 on 4 and 1190 DF, p-value: < 2.2e-16
The p-value is below 0.05, so the model is significant; it explains about 33% of the variance. All four predictors are significant: finsatisf, employment, age and class. Note that in this model finsatisf1, employment and class enter as numeric codes, so each gets a single slope; for employment, a nominal variable, such a slope has no clear substantive meaning. The intercept is -0.88, the predicted INDEX when all predictors equal 0, which lies outside the observed range. A one-point increase in finsatisf raises the INDEX by 0.29, a one-step increase in the employment code lowers it by 0.027, one extra year of age lowers it by 0.006, and a one-step increase in the class code lowers it by 0.06.
Hypothesis: the older a person is and the higher the class he considers himself to belong to, the happier he is.
Interpretation:
1 - Lower class 2 - Lower middle class 3 - Upper class 4 - Upper middle class 5 - Working class
The interaction effect is significant. The hypothesis is confirmed: with increasing age, the upper middle class feels happier, while for the lower classes happiness declines with age.
Hypothesis: the older a person is and the more satisfied he is with his financial situation, the happier he is.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ age * finsatisf1 + employment +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0387 -0.4421 -0.0155 0.4546 3.8209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5255270 0.2057840 -2.554 0.010780 *
## age -0.0147889 0.0043683 -3.386 0.000734 ***
## finsatisf1 0.2185959 0.0379533 5.760 1.07e-08 ***
## employment -0.0255225 0.0105329 -2.423 0.015536 *
## class -0.0602325 0.0188751 -3.191 0.001454 **
## age:finsatisf1 0.0018072 0.0008524 2.120 0.034197 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8208 on 1189 degrees of freedom
## Multiple R-squared: 0.329, Adjusted R-squared: 0.3262
## F-statistic: 116.6 on 5 and 1189 DF, p-value: < 2.2e-16
The p-value is below 0.05, so the model is significant; it explains about 33% of the variance. finsatisf, employment, class and age are all significant, and so is the interaction age:finsatisf1 (p = 0.034). The age coefficient is larger in absolute value than in the previous model (-0.015 vs -0.006), but because of the interaction it now measures the effect of age at finsatisf1 = 0; likewise, the finsatisf1 coefficient (0.22) is its effect at age 0. The intercept is -0.53, the predicted INDEX when all predictors equal 0. A one-step increase in the employment code changes the INDEX by -0.03, and a one-step increase in the class code by -0.06.
Interpretation:
1 - completely satisfied, 9 - completely dissatisfied
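The hypothesis is supported, with a qualification. People who are satisfied with their financial situation are much happier than those who are not, and the positive interaction term means that this gap widens with age. The age slope equals -0.0148 + 0.0018 × finsatisf1, so happiness declines with age for people with low financial satisfaction and rises slightly with age only at the top of the satisfaction scale (above about 8).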
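Because of the interaction, the age slope is b_age + b_int × finsatisf1. This sketch tabulates that slope from the estimates printed above and finds the satisfaction level at which it changes sign.

```r
# Estimates from the age * finsatisf1 model above
b_age <- -0.0147889
b_int <-  0.0018072

finsatisf <- 1:10
age_slope <- b_age + b_int * finsatisf   # change in INDEX per extra year of age
names(age_slope) <- finsatisf
round(age_slope, 4)

# finsatisf1 level at which the age slope is zero
-b_age / b_int
```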
Hypothesis: the older a person is and the more stable his paid employment, the happier he is.
##
## Call:
## lm(formula = wvsKR1$happyIND2 ~ finsatisf1 + age * employment +
## class, data = wvsKR1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9657 -0.4587 -0.0115 0.4574 3.6180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9627870 0.1599010 -6.021 2.3e-09 ***
## finsatisf1 0.2941031 0.0126129 23.318 < 2e-16 ***
## age -0.0041720 0.0032995 -1.264 0.20633
## employment -0.0060472 0.0297596 -0.203 0.83901
## class -0.0600479 0.0189074 -3.176 0.00153 **
## age:employment -0.0005617 0.0007526 -0.746 0.45561
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8222 on 1189 degrees of freedom
## Multiple R-squared: 0.3268, Adjusted R-squared: 0.324
## F-statistic: 115.4 on 5 and 1189 DF, p-value: < 2.2e-16
The p-value is below 0.05, so the model is significant; it explains about 33% of the variance. finsatisf and class are significant, but employment, age and the interaction age:employment are not. The intercept is -0.96, the predicted INDEX when all predictors equal 0. A one-point increase in finsatisf raises the INDEX by 0.29; a one-step increase in the employment code changes it by -0.006, one extra year of age by -0.004 and a one-step increase in the class code by -0.06.
Interpretation:
1 - Full-time paid job 8 - Other
The interaction effect is not significant (p = 0.46), so the hypothesis cannot be confirmed. We can only note that people in full-time work are indeed happier, but their happiness index does not rise with age: it is higher for them, but falls at almost the same rate as for other people.
Therefore, only our first and second interaction models show significant interaction effects.
We analyzed the data and predicted the level of happiness of South Koreans. After describing the data, building regression models and choosing the best one, we tested it and added interaction effects to better understand the relationships between the variables. The best model contains finsatisf, class and employment; hardsuccess was removed because it was not significant.
Back to our hypotheses.
The higher level of satisfaction with the financial situation of household corresponds with the higher level of happiness - was confirmed. The model showed a strong positive relationship between the happiness index and the financial situation of Koreans.
Individuals who believe that their hard work brings success feel happier - was not confirmed. hardsuccess was only weakly related to the happiness index and was removed from the model.
Full-time workers, part-time workers and retired individuals are likely to be happier than unemployed individuals - was partially confirmed. Employment status is related to happiness and full-time workers are happier than the other groups, but the age × employment interaction was not significant, so we could not show that part-time workers become less happy with age.
Individuals belonging to higher classes feel happier - was confirmed. The model showed a positive relationship between class and the level of happiness, and the interaction model showed that people of higher classes become happier with age.
Therefore, we can conclude that people in Korea are indeed happier when they are financially stable, have good paid jobs, and consider themselves to be in the upper class.
Further work could improve the models and look at the level of happiness from other angles.
Hagerty, M. R., & Veenhoven, R. (2003). Wealth and happiness revisited: Growing national income does go with greater happiness. Social Indicators Research, 64(1), 1-27.
Lawrence, E. M., Rogers, R. G., & Wadsworth, T. (2015). Happiness and longevity in the United States. Social Science & Medicine, 145, 115-119.
Piff, P. K., & Moskowitz, J. P. (2018). Wealth, poverty, and happiness: Social class is differentially associated with positive emotions. Emotion, 18, 902-905.
Cameron, P. (1975). Mood as an indicant of happiness: Age, sex, social class, and situational differences. Journal of Gerontology, 30(2), 216-224.
Stone, A. A., et al. (2010). A snapshot of the age distribution of psychological well-being in the United States. Proceedings of the National Academy of Sciences, 107(22), 9985-9990.