At first I tried to continue the study of Iceland that I started a year ago in a course with A. Shirokanova, but this wave does not include Iceland. I then switched the analysis to Norway, since I am going to study there, but Norway is not in this wave either, so I picked Japan, because it is the best country in the world.
This analysis relies on two core articles:
+ Easterlin, R. A. (2003). Explaining happiness. Proceedings of the National Academy of Sciences, 100(19), 11176–11183. https://doi.org/10.1073/pnas.1633144100
+ Veenhoven, R. (1991). Is happiness relative? Social Indicators Research, 24(1), 1–34. https://doi.org/10.1007/BF00292648
Easterlin (2003) explains happiness by age cohort, marital status and income. His main findings include the relation between age and health: as a person becomes older, he or she becomes less healthy and less happy. Marriage increases overall happiness at any age.
Veenhoven (1991) discusses the relative nature of happiness, and his main findings include tracking important events in people's lives that can strongly affect their perception of life.
Based on this literature, the following hypotheses were formulated:
1. Negative experience increases happiness
2. More money, more happiness
3. Older people are less healthy, which leads to less happiness
4. People are happier in the middle of life
5. Old people have experienced more in life and care less about money
library(dplyr)
wvs <- readRDS("~/datanal/3year/hw/F00007762-WV6_Data_R_v20180912.rds")
wvs <- wvs %>% filter(V2 == 392)
wvs_JP <- wvs
wvs_JP$happy <- as.integer(wvs_JP$V10)
wvs_JP$age <- as.integer(wvs_JP$V242)
wvs_JP$health <- as.integer(wvs_JP$V11)
wvs_JP$marital <- as.factor(wvs_JP$V57)
wvs_JP$money <- as.factor(wvs_JP$V59)
wvs_JP$crime <- as.factor(ifelse( wvs_JP$V179 == 1 | wvs_JP$V180 == 1, 1, 0))
# suffer: the lowest code reported on any of V188-V191 (4 if none of them is 1, 2 or 3)
wvs_JP$suffer <- ifelse(
  wvs_JP$V188 == 1 | wvs_JP$V189 == 1 | wvs_JP$V190 == 1 | wvs_JP$V191 == 1, 1,
  ifelse(
    wvs_JP$V188 == 2 | wvs_JP$V189 == 2 | wvs_JP$V190 == 2 | wvs_JP$V191 == 2, 2,
    ifelse(
      wvs_JP$V188 == 3 | wvs_JP$V189 == 3 | wvs_JP$V190 == 3 | wvs_JP$V191 == 3, 3, 4)))
# drop missing-value codes (-3, -2, -1) and keep only the analysis variables
wvs_JP <- wvs_JP %>%
  filter(!happy %in% c(-3,-2,-1)) %>%
  filter(age > 15) %>%
  filter(!health %in% c(-3,-2,-1)) %>%
  filter(!marital %in% c(-3,-2,-1)) %>%
  filter(!money %in% c(-3,-2,-1)) %>%
  filter(!V179 %in% c(-3,-2,-1)) %>%
  filter(!V188 %in% c(-3,-2,-1) | !V189 %in% c(-3,-2,-1) | !V190 %in% c(-3,-2,-1) | !V191 %in% c(-3,-2,-1)) %>%
  dplyr::select(happy, age, health, marital, money, crime, suffer)
The predicted variable is the level of happiness, measured by the question “Taking all things together, would you say you are” with 4 levels from Very happy (1) to Not at all happy (4).
The crime variable was constructed as an index from two questions: whether the respondent has experienced crime, and whether their family has.
Money is a relative scale that measures satisfaction with the current financial situation of the household, from 1 to 10.
Suffer is also an index, measured by a set of questions on negative experiences in the last 12 months:
+ Gone without enough food to eat
+ Felt unsafe from crime in your home
+ Gone without medicine or medical treatment that you needed
+ Gone without a cash income
For each respondent, the maximum level of hardship across the four items (the lowest code) was taken for the suffer variable, which has a 4-level scale.
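The construction of the suffer index can also be sketched with a row-wise minimum instead of nested ifelse() calls. This is only an illustration on a hypothetical toy data frame (the values are made up), and it assumes that lower codes mean more frequent hardship and that negative codes mark missing answers.

```r
# Toy stand-in for V188-V191 (assumed: low code = more frequent hardship,
# negative codes = missing answers); values are invented for illustration
toy <- data.frame(
  V188 = c(4, 2, -1,  4),
  V189 = c(4, 3,  4, -2),
  V190 = c(1, 4,  4,  4),
  V191 = c(4, 4,  3,  4)
)

# Treat negative codes as missing before combining the items
items <- toy
items[items < 0] <- NA

# Row-wise minimum code across the four items, ignoring missing answers
suffer <- do.call(pmin, c(items, na.rm = TRUE))
suffer
```

Unlike the nested ifelse() version, a respondent with all four items missing would get NA here rather than 4.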
library(psych)
knitr::kable(describe(wvs_JP))
variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
happy | 1 | 2148 | 1.785847 | 0.6508530 | 2 | 1.719767 | 0.0000 | 1 | 4 | 3 | 0.4684161 | 0.2555566 | 0.0140432 |
age | 2 | 2148 | 50.523743 | 16.1036146 | 51 | 50.818605 | 19.2738 | 18 | 80 | 62 | -0.1260729 | -1.0319298 | 0.3474611 |
health | 3 | 2148 | 2.427840 | 0.8335689 | 2 | 2.420930 | 1.4826 | 1 | 4 | 3 | 0.0048275 | -0.5851558 | 0.0179856 |
marital* | 4 | 2148 | 3.251397 | 2.0204181 | 2 | 2.940698 | 0.0000 | 2 | 7 | 5 | 1.1375209 | -0.5488163 | 0.0435937 |
money* | 5 | 2148 | 7.067970 | 2.2457408 | 7 | 7.123256 | 2.9652 | 2 | 11 | 9 | -0.2428700 | -0.5912202 | 0.0484554 |
crime* | 6 | 2148 | 1.061452 | 0.2402144 | 1 | 1.000000 | 0.0000 | 1 | 2 | 1 | 3.6496004 | 11.3248563 | 0.0051830 |
suffer | 7 | 2148 | 3.223929 | 0.8328631 | 3 | 3.330233 | 1.4826 | 1 | 4 | 3 | -0.8587002 | 0.0400307 | 0.0179703 |
The median age is 51, which is normal for Japan: it is a very old society. Most people are very happy or rather happy with life (same as in theory). They have rarely suffered hardship or been victims of crime.
library(car)
library(RColorBrewer)
pairs.panels(wvs_JP, hist.col = "#D0104C")
As is clear from the scatter matrix, there is a strong negative correlation between age and marital status, and between happiness and money (lower values of happy mean more happiness, so financially satisfied people are happier). Happiness and health are positively correlated. Most people are married, have not experienced crime, and have not suffered hardship.
library(ggplot2)
ggplot(data = wvs_JP, aes(x = age))+
geom_density(fill = "#D0104C")+
labs(x = "Age",
y = "Density",
title = "Distribution of respondents' age in Japan")+
theme_minimal()
Age ranges from 18 to 80 years and has two modes, near 40 and 60 years.
library(stats)
ks.test(rnorm(10^4), wvs_JP$happy)
##
## Two-sample Kolmogorov-Smirnov test
##
## data: rnorm(10^4) and wvs_JP$happy
## D = 0.8336, p-value < 2.2e-16
## alternative hypothesis: two-sided
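As we see, our distribution is not really normal: the null hypothesis that both samples come from the same distribution is rejected with p-value < 2.2e-16. Note that the reference sample here is standard normal (mean 0, sd 1), so the test also reacts to the different mean and spread of happy. However, we are okay with this violation, because there are more than 2,000 observations.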
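A variant of this check compares the variable with a normal distribution that has the sample's own mean and standard deviation, so that only the shape is tested. The sketch below uses simulated data as a hypothetical stand-in for wvs_JP$happy (the probabilities are invented). Estimating the parameters from the same sample makes the p-value only approximate.

```r
set.seed(1)
# Hypothetical stand-in for a four-point happiness item
happy <- sample(1:4, size = 2000, replace = TRUE,
                prob = c(0.30, 0.60, 0.08, 0.02))

# One-sample KS test against a normal with the sample mean and sd;
# ties are unavoidable for a discrete item, hence suppressWarnings()
ks <- suppressWarnings(
  ks.test(happy, "pnorm", mean = mean(happy), sd = sd(happy))
)
ks$p.value
```

For a four-point item normality is rejected either way. What matters more for the regression is the behaviour of the residuals.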
Let's start the forward approach to building the regression.
Obviously, we will start with wealth.
model1 <- lm(happy ~ ordered(money), data = wvs_JP)
library(stargazer)
stargazer(model1, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## ============================================
## HAPPY
## --------------------------------------------
## ordered(money).L -1.048***
## (.058)
##
## ordered(money).Q -.119*
## (.054)
##
## ordered(money).C .044
## (.051)
##
## ordered(money)4 -.122*
## (.053)
##
## ordered(money)5 .123*
## (.054)
##
## ordered(money)6 -.098*
## (.048)
##
## ordered(money)7 .128**
## (.042)
##
## ordered(money)8 -.066
## (.039)
##
## ordered(money)9 .075*
## (.036)
##
## Constant 1.844***
## (.015)
##
## Observations 2,148
## R2 .162
## Adjusted R2 .158
## Residual Std. Error .597 (df = 2138)
## F Statistic 45.856*** (df = 9; 2138)
## --------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
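Mean happiness is near 2 (Rather happy). Because happy is coded from 1 (very happy) to 4 (not at all happy), the negative linear term means that the more people are satisfied with their money, the happier they are; the quadratic term is also significant. The terms printed as ordered(money)4 to ordered(money)9 are the higher-order polynomial contrasts (powers 4 to 9) of the ordered factor, not individual levels of the scale, so they do not show that a neutral level of satisfaction decreases happiness. The base model explains 16% of the variance.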
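How R labels the coefficients of an ordered factor can be checked on a small simulated example. The data below are hypothetical. The point is only the naming: lm() applies polynomial contrasts (contr.poly) to ordered factors, so a ten-level factor gets nine terms called .L, .Q, .C, ^4, ..., ^9.

```r
# Hypothetical ten-point satisfaction item stored as an ordered factor
money <- factor(rep(1:10, times = 20), ordered = TRUE)
set.seed(2)
# Simulated outcome; the levels are 1..10, so the codes equal the labels here
y <- 3 - 0.15 * as.integer(money) + rnorm(length(money), sd = 0.5)

fit <- lm(y ~ money)

# Coefficient names are polynomial contrasts, not individual levels
names(coef(fit))
```

To get one coefficient per level instead, the variable can be passed as an unordered factor, or as a number if a single slope is enough.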
model2 <- update(model1, ~.+health)
stargazer(model2, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## ordered(money).L -.864***
## (.056)
##
## ordered(money).Q -.071
## (.051)
##
## ordered(money).C .059
## (.048)
##
## ordered(money)4 -.103*
## (.050)
##
## ordered(money)5 .111*
## (.050)
##
## ordered(money)6 -.089
## (.046)
##
## ordered(money)7 .134***
## (.040)
##
## ordered(money)8 -.096**
## (.037)
##
## ordered(money)9 .080*
## (.034)
##
## health .247***
## (.015)
##
## Constant 1.241***
## (.040)
##
## Observations 2,148
## R2 .256
## Adjusted R2 .252
## Residual Std. Error .563 (df = 2137)
## F Statistic 73.397*** (df = 10; 2137)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
Worse health means less happiness. The explanatory power of the model increased: R2 rises from .162 to .256.
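Whether an added predictor improves a nested model can be tested formally with anova(). The sketch below uses simulated data with hypothetical variable names and effect sizes; it is not the WVS sample.

```r
set.seed(3)
n <- 500
money  <- sample(1:10, n, replace = TRUE)
health <- sample(1:4,  n, replace = TRUE)
happy  <- 2.5 - 0.1 * money + 0.25 * health + rnorm(n, sd = 0.5)
d <- data.frame(happy, money, health)

m_small <- lm(happy ~ money, data = d)
m_big   <- update(m_small, . ~ . + health)   # m_small is nested in m_big

# F test for the added term and the gain in R-squared
cmp <- anova(m_small, m_big)
cmp
summary(m_big)$r.squared - summary(m_small)$r.squared
```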
model3 <- update(model2, ~.+ marital)
stargazer(model3, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## ordered(money).L -.805***
## (.055)
##
## ordered(money).Q -.071
## (.049)
##
## ordered(money).C .056
## (.047)
##
## ordered(money)4 -.106*
## (.048)
##
## ordered(money)5 .101*
## (.049)
##
## ordered(money)6 -.062
## (.044)
##
## ordered(money)7 .108**
## (.039)
##
## ordered(money)8 -.089*
## (.035)
##
## ordered(money)9 .079*
## (.033)
##
## health .254***
## (.015)
##
## marital2 .199*
## (.093)
##
## marital3 .288***
## (.054)
##
## marital4 .221
## (.138)
##
## marital5 .131*
## (.057)
##
## marital6 .344***
## (.031)
##
## Constant 1.131***
## (.039)
##
## Observations 2,148
## R2 .302
## Adjusted R2 .297
## Residual Std. Error .546 (df = 2132)
## F Statistic 61.492*** (df = 15; 2132)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
With married respondents as the base category, every other marital status is associated with lower happiness. Being single (marital6, .344***) has the strongest effect on happiness. Being separated (marital4) does not matter in terms of happiness.
However, according to Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6. https://doi.org/10.1038/s41562-017-0189-z, I cannot claim that there is no relation. I am only saying that I was not able to find a strong relation; the estimate for this category is .221 with a standard error of .138.
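The uncertainty around this coefficient can be made explicit with an approximate 95% confidence interval. The sketch uses the estimate and standard error from the table and a normal approximation, which is an assumption (though a mild one with more than 2,000 residual degrees of freedom).

```r
est <- 0.221   # marital4 coefficient from the model3 table
se  <- 0.138   # its standard error

# Approximate 95% interval: estimate +/- 1.96 standard errors
ci <- est + c(-1, 1) * qnorm(0.975) * se
round(ci, 3)
```

The interval runs from slightly below zero to about .49, so the data are compatible with no effect as well as with a fairly large one.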
model4 <- update(model3, ~.+ crime)
stargazer(model4, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## ordered(money).L -.805***
## (.055)
##
## ordered(money).Q -.072
## (.050)
##
## ordered(money).C .056
## (.047)
##
## ordered(money)4 -.106*
## (.048)
##
## ordered(money)5 .100*
## (.049)
##
## ordered(money)6 -.062
## (.044)
##
## ordered(money)7 .108**
## (.039)
##
## ordered(money)8 -.088*
## (.036)
##
## ordered(money)9 .079*
## (.033)
##
## health .253***
## (.015)
##
## marital2 .199*
## (.093)
##
## marital3 .288***
## (.054)
##
## marital4 .222
## (.138)
##
## marital5 .131*
## (.057)
##
## marital6 .343***
## (.031)
##
## crime1 .010
## (.049)
##
## Constant 1.131***
## (.040)
##
## Observations 2,148
## R2 .302
## Adjusted R2 .297
## Residual Std. Error .546 (df = 2131)
## F Statistic 57.625*** (df = 16; 2131)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
Experience of crime shows no detectable effect on happiness (crime1 = .010, standard error .049). This variable is excluded from the later models.
model5 <- update(model4, ~.+ ordered(suffer) - crime)
stargazer(model5, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## ordered(money).L -.787***
## (.056)
##
## ordered(money).Q -.087
## (.050)
##
## ordered(money).C .061
## (.047)
##
## ordered(money)4 -.114*
## (.048)
##
## ordered(money)5 .104*
## (.049)
##
## ordered(money)6 -.063
## (.044)
##
## ordered(money)7 .107**
## (.039)
##
## ordered(money)8 -.094**
## (.035)
##
## ordered(money)9 .082*
## (.033)
##
## health .251***
## (.015)
##
## marital2 .187*
## (.093)
##
## marital3 .285***
## (.054)
##
## marital4 .208
## (.138)
##
## marital5 .125*
## (.057)
##
## marital6 .340***
## (.031)
##
## ordered(suffer).L -.111**
## (.043)
##
## ordered(suffer).Q .053
## (.036)
##
## ordered(suffer).C -.007
## (.028)
##
## Constant 1.173***
## (.043)
##
## Observations 2,148
## R2 .304
## Adjusted R2 .298
## Residual Std. Error .545 (df = 2129)
## F Statistic 51.729*** (df = 18; 2129)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
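Experience of struggles in life has a linear relation with happiness. Higher values of suffer mean less frequent hardship and higher values of happy mean less happiness, so the negative linear term (-.111**) means that people who suffered less in the last 12 months are happier now, which does not support the first hypothesis.
Let’s check the model for multicollinearity.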
vif(model5)
## GVIF Df GVIF^(1/(2*Df))
## ordered(money) 1.225983 9 1.011383
## health 1.089645 1 1.043861
## marital 1.089711 5 1.008628
## ordered(suffer) 1.144411 3 1.022736
All GVIF values are well below 5, so everything’s OK!
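What the variance inflation factor measures can be shown by hand for a single predictor: regress it on the other predictors and take 1 / (1 - R^2). The data below are simulated and the variable names are hypothetical.

```r
set.seed(4)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)

# VIF of x1: how well the other predictors explain it
r2_x1  <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1 <- 1 / (1 - r2_x1)
vif_x1
```

For terms with several degrees of freedom, such as the factors here, the GVIF^(1/(2*Df)) column is the comparable quantity. It is on the scale of the square root of a VIF, so a cut-off of 5 for a VIF corresponds to about 2.2 in that column.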
outlierTest(model5)
## rstudent unadjusted p-value Bonferonni p
## 2004 4.557849 5.461e-06 0.01173
The test flags observation 2004 as an outlier (Bonferroni p = .012); there are no other serious problems with outliers.
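The Bonferroni p-value in this output is the unadjusted p-value multiplied by the number of observations tested. A minimal check with the printed numbers:

```r
n_obs        <- 2148        # observations in model5
p_unadjusted <- 5.461e-06   # unadjusted p-value printed for observation 2004

# Bonferroni adjustment: multiply by the number of tests, cap at 1
p_bonf <- min(1, p_unadjusted * n_obs)
p_bonf
```

A value below .05 means the residual is unusually large even after allowing for the fact that the largest of 2,148 residuals was picked.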
qqPlot(model5, main="QQ Plot")
## [1] 138 2004
Residuals are approximately normally distributed; there are some outliers (observations 138 and 2004).
par(mfrow=c(2,2))
plot(model5)
The model describes the data acceptably in each plot. The graphs look fine, although they look funny because of the data structure (the variables are discrete rather than continuous). There are outliers.
# Rows flagged as outliers in the diagnostics
out_rows <- c(138, 2004, 412, 1266, 175, 1157, 254, 585)
wvs_out <- wvs_JP[out_rows, ]
wvs_JP <- wvs_JP[-out_rows, ]
For the record, I wrote down each outlier, checked each particular respondent, and found some anomalies in their answers. They are excluded from the data.
model6 <- lm(happy ~ ordered(money) + health + marital + ordered(suffer) + age, data = wvs_JP)
stargazer(model6, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## ordered(money).L -.804***
## (.056)
##
## ordered(money).Q -.097*
## (.049)
##
## ordered(money).C .055
## (.046)
##
## ordered(money)4 -.111*
## (.048)
##
## ordered(money)5 .089
## (.048)
##
## ordered(money)6 -.041
## (.044)
##
## ordered(money)7 .098**
## (.038)
##
## ordered(money)8 -.089*
## (.035)
##
## ordered(money)9 .079*
## (.032)
##
## health .241***
## (.015)
##
## marital2 .150
## (.092)
##
## marital3 .295***
## (.053)
##
## marital4 .384**
## (.145)
##
## marital5 .099
## (.057)
##
## marital6 .370***
## (.036)
##
## ordered(suffer).L -.105*
## (.042)
##
## ordered(suffer).Q .053
## (.036)
##
## ordered(suffer).C -.012
## (.028)
##
## age .002*
## (.001)
##
## Constant 1.079***
## (.059)
##
## Observations 2,140
## R2 .312
## Adjusted R2 .306
## Residual Std. Error .535 (df = 2120)
## F Statistic 50.629*** (df = 19; 2120)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
The final model explains about 31% of the variance. The intercept is situated near 1.1, which means happy. More satisfaction with the financial situation means more happiness. Better health means more happiness. Not being married decreases happiness. Higher values of suffer mean less frequent hardship, and the linear term is negative, so less frequent suffering goes with more happiness.
vif(model6)
## GVIF Df GVIF^(1/(2*Df))
## ordered(money) 1.267103 9 1.013239
## health 1.136205 1 1.065929
## marital 1.598494 5 1.048024
## ordered(suffer) 1.163428 3 1.025549
## age 1.585285 1 1.259081
outlierTest(model6)
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferonni p
## 939 3.841103 0.00012608 0.26981
qqPlot(model6, main="QQ Plot")
## [1] 939 1688
plot(model6)
Again, there are outliers, and for a better model it is necessary to check them and remove them if possible.
library(MASS)
sresid <- studres(model6)
hist(sresid, freq=FALSE,
main="Distribution of Studentized Residuals")
xfit<-seq(min(sresid),max(sresid),length=40)
yfit<-dnorm(xfit)
lines(xfit, yfit)
Residuals are approximately normally distributed!
ncvTest(model6)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 49.08955, Df = 1, p = 2.4454e-12
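The test rejects constant variance (p = 2.4454e-12), so there is heteroscedasticity across the fitted values, and the standard errors above should be read with caution.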
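When a score test like this one returns a small p-value, a common response is to keep the coefficients and replace the standard errors with heteroscedasticity-consistent ones. The sketch below computes the basic HC0 sandwich estimator in base R on simulated data; the data and names are hypothetical, and this is a swapped-in technique rather than something done in the analysis above.

```r
set.seed(5)
n <- 400
x <- runif(n, 1, 10)
y <- 1 + 0.3 * x + rnorm(n, sd = 0.2 * x)   # error spread grows with x
fit <- lm(y ~ x)

X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))        # (X'X)^-1
meat  <- crossprod(X * e)           # sum over i of e_i^2 * x_i x_i'
vcov_hc0 <- bread %*% meat %*% bread

# Classical and robust standard errors side by side
cbind(classical = sqrt(diag(vcov(fit))),
      robust    = sqrt(diag(vcov_hc0)))
```

If the two columns differ noticeably, inference based on the classical standard errors is doubtful.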
model7 <- lm(happy ~ + health + marital + poly(age, 2, raw = TRUE) , data = wvs_JP)
stargazer(model7, type = "text", title = "Happiness and nonlinear age", style = "ajs")
##
## Happiness and nonlinear age
##
## ==================================================
## HAPPY
## --------------------------------------------------
## health .298***
## (.015)
##
## marital2 .150
## (.097)
##
## marital3 .401***
## (.056)
##
## marital4 .560***
## (.153)
##
## marital5 .129*
## (.061)
##
## marital6 .448***
## (.041)
##
## poly(age, 2, raw = TRUE)1 .024***
## (.006)
##
## poly(age, 2, raw = TRUE)2 -0.000***
## (0.000)
##
## Constant .366**
## (.142)
##
## Observations 2,140
## R2 .223
## Adjusted R2 .220
## Residual Std. Error .567 (df = 2131)
## F Statistic 76.551*** (df = 8; 2131)
## --------------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
Here we ignore the nonlinear effect of money and focus only on the quadratic effect of age on happiness.
ggplot(wvs_JP, aes(age, happy) ) +
geom_point() +
stat_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE))+
theme_minimal()
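The effect of age is really small, due to the structure of the data, but it is significant. Because higher values of happy mean less happiness, the positive linear and negative quadratic terms describe a score that rises from age 18 at a decreasing rate: the least happy point lies later in life, and the happiest respondents are at one of the ends of the age range, not in the middle. This does not support the hypothesis that people are happier in the middle of life.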
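The turning point of a quadratic fit follows from the two age coefficients as -b1 / (2 * b2). It cannot be read from the table here because the quadratic coefficient is rounded to -0.000. The sketch below shows the calculation on simulated data with a hypothetical turning point at age 50.

```r
set.seed(6)
age   <- sample(18:80, 1000, replace = TRUE)
# Simulated score with a true turning point at 0.03 / (2 * 0.0003) = 50
score <- 1 + 0.03 * age - 0.0003 * age^2 + rnorm(1000, sd = 0.3)

fit <- lm(score ~ poly(age, 2, raw = TRUE))
b   <- unname(coef(fit))   # b[1] intercept, b[2] linear, b[3] quadratic

# Turning point of b[1] + b[2] * age + b[3] * age^2
turning_age <- -b[2] / (2 * b[3])
turning_age
```

Applied to model7, the same formula on the unrounded coefficients (coef(model7)) would show whether the turning point falls inside the observed range of 18 to 80.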
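# note: money is a factor here, so as.integer() returns its internal level codes rather than the printed labels
wvs_JP$money <- as.integer(wvs_JP$money)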
model8 <- lm(happy ~ money + health + marital + suffer + age +
money*age, data = wvs_JP)
stargazer(model8, type = "text", title = "Happiness and money", style = "ajs")
##
## Happiness and money
##
## =============================================
## HAPPY
## ---------------------------------------------
## money -.040*
## (.017)
##
## health .241***
## (.015)
##
## marital2 .140
## (.092)
##
## marital3 .300***
## (.053)
##
## marital4 .345*
## (.145)
##
## marital5 .092
## (.057)
##
## marital6 .377***
## (.036)
##
## suffer -.025
## (.015)
##
## age .009***
## (.003)
##
## money:age -.001**
## (0.000)
##
## Constant 1.345***
## (.144)
##
## Observations 2,140
## R2 .307
## Adjusted R2 .304
## Residual Std. Error .536 (df = 2129)
## F Statistic 94.379*** (df = 10; 2129)
## ---------------------------------------------
## Notes: *P < .05
## **P < .01
## ***P < .001
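One caveat about the numeric money variable used in model8: as.integer() applied to a factor returns the internal level codes, not the labels. The descriptive table shows money* running from 2 to 11, which suggests an unused missing-value level is still attached to the factor, so the codes are probably shifted by one relative to the 1 to 10 answers. A shift does not change the money:age coefficient, but it changes the intercept and the lower-order terms. The toy factor below is hypothetical and only illustrates the mechanism.

```r
# Hypothetical factor with an unused missing-value level "-1",
# as can be left behind after filtering rows
money <- factor(c("3", "10", "7"), levels = c("-1", 1:10))

as.integer(money)                 # internal level codes: 4 11 8
as.integer(as.character(money))   # the answers themselves: 3 10 7
```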
library(sjPlot)
plot_model(model8, type = "int") + theme_minimal()
As is clear from the plot, younger people are much happier than older people when money is lacking; however, if a person is satisfied with the current financial situation, age does not matter. Money = Happiness.
Probably this phenomenon can be explained by the relative nature of happiness: in old age there is less joy, because older people used to have a better financial situation.
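The same reading can be obtained from the coefficients directly: with an interaction, the slope of the happiness score with respect to age depends on money. The sketch uses the rounded coefficients from the model8 table, so the numbers are approximate, and money is on whatever numeric scale the model was fitted with.

```r
b_age <- 0.009    # age coefficient from model8 (rounded)
b_int <- -0.001   # money:age coefficient from model8 (rounded)

# Change in the happiness score per year of age at a given money value
age_slope <- function(money) b_age + b_int * money
age_slope(c(1, 5, 10))
```

At low values of money the slope is positive (the score rises with age, meaning less happiness), and it shrinks towards zero as money increases.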
anova(model8, model7, model6)
## Analysis of Variance Table
##
## Model 1: happy ~ money + health + marital + suffer + age + money * age
## Model 2: happy ~ +health + marital + poly(age, 2, raw = TRUE)
## Model 3: happy ~ ordered(money) + health + marital + ordered(suffer) +
## age
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2129 611.85
## 2 2131 685.96 -2 -74.105 129.312 < 2.2e-16 ***
## 3 2120 607.46 11 78.502 24.906 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
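Model 6 is the best one, because it has the lowest RSS (607.46). Note that these three models are not nested in one another, so the F tests in this table should be read with caution.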
AIC(model6)
## [1] 3420.193
AIC(model8)
## [1] 3417.627
However, by Akaike’s Information Criterion, model 8 is slightly better (3417.6 against 3420.2; lower is better). I think that all the models are good.
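Information criteria are the usual way to compare models that are not nested, as long as they are fitted to the same observations. The sketch below puts AIC and BIC for two such models in one table; the data and model names are hypothetical.

```r
set.seed(7)
n <- 500
money <- sample(1:10,  n, replace = TRUE)
age   <- sample(18:80, n, replace = TRUE)
happy <- 2.5 - 0.1 * money + 0.002 * age + rnorm(n, sd = 0.5)
d <- data.frame(happy, money, age)

m_a <- lm(happy ~ money + age, data = d)
m_b <- lm(happy ~ poly(age, 2, raw = TRUE), data = d)   # not nested in m_a

# AIC and BIC side by side; lower is better for both
ic <- cbind(AIC(m_a, m_b), BIC = BIC(m_a, m_b)$BIC)
ic
```

A difference of only a few points, as between model 6 and model 8 above, is weak evidence for either model.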
In Japan, old people care about a lack of money more than the young, and it makes them less happy. Age barely affects happiness. A wife or husband is happiness.