Loading the data

library(readr)
wvsRU <- read_csv("~/datanal/3year/wvsRU.csv")
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   age = col_integer(),
##   marital = col_character(),
##   edu = col_character(),
##   income = col_character(),
##   happy = col_character(),
##   satisf = col_character(),
##   country = col_character()
## )
head(wvsRU)
## # A tibble: 6 x 8
##      X1   age marital     edu                income  happy   satisf country
##   <int> <int> <chr>       <chr>              <chr>   <chr>   <chr>  <chr>  
## 1 19334    35 Married     Complete primary … <NA>    Quite … 5      RU     
## 2 19335    55 Married     Incomplete second… Third … Not ve… 5      RU     
## 3 19336    42 Married     Incomplete second… Sixth … Very h… 4      RU     
## 4 19337    28 Married     Complete secondar… Fourth… Very h… Satis… RU     
## 5 19338    34 Married     Complete secondar… Fourth… Quite … 5      RU     
## 6 19339    23 Single/Nev… Incomplete second… Fourth… Quite … 5      RU

We want to understand whether the variables “happy” and “satisf” generally work in the same way, that is, whether they characterize groups equally. For the given set of variables (“age”, “marital”, “edu”, “income”, “happy”, “satisf”), create a set of hypotheses for happiness and satisfaction. Treat satisfaction as a numeric variable. First, explore the distributions. Then test bivariate relationships. As the last step, visualize. What is your conclusion? Which variable would you use in your analysis of happiness? Why?

  1. In general, how are happiness and satisfaction related to the sociodemographic characteristics?
  2. Have you found any differences in the effects? Interpret them.
  3. Which of the two variables would you take for your analysis of happiness? Why?
  4. What problems have you faced?

Hypotheses for happiness and satisfaction

  1. The level of satisfaction is related to marital status.
  2. The higher the income step, the higher the satisfaction.
  3.

Data manipulation

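let’s recode our satisfaction variable: the end labels “Satisfied” and “Dissatisfied” become 10 and 1, so that the whole variable can be converted to numeric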

wvsRU$sat = ifelse(wvsRU$satisf == "Satisfied", 10, ifelse(wvsRU$satisf == "Dissatisfied", 1, wvsRU$satisf)) 
wvsRU$sat = as.numeric(wvsRU$sat)

let’s look at the distribution

library(ggplot2)
hist(wvsRU$sat)

library(ggpubr)
## Loading required package: magrittr
ggqqplot(wvsRU$sat)

and check it for normality with a simple ks.test. (We have more than 5000 observations, and shapiro.test() in R accepts at most 5000, so the Shapiro-Wilk test cannot be applied to the full sample.)

(\(H_0\)): the data is distributed normally;
(\(H_A\)): the data is not distributed normally.

library(ggplot2)

ggplot(data = wvsRU, aes(x =  sat))+
  geom_density()

library(stats)

ks.test(rnorm(10^4), wvsRU$sat)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  rnorm(10^4) and wvsRU$sat
## D = 0.90568, p-value < 2.2e-16
## alternative hypothesis: two-sided

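As we see, our distribution is not normal: p-value < 2.2e-16, so we reject the null hypothesis of normality. Note that rnorm(10^4) is drawn from the standard normal distribution (mean 0, sd 1), while sat takes values from 1 to 10, so the large D statistic also reflects the difference in location and scale, not only in shape. However, we are okay with this violation, since luckily we have many observations.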
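A way to test the shape of the distribution without mixing in the difference in mean and spread is a one-sample KS test against a normal distribution with the sample's own mean and sd. The sketch below is only an illustration: `sat_demo` is simulated stand-in data, not the survey variable, and because the parameters are estimated from the same sample and the scores contain ties, the p-value is approximate.

```r
# Simulated stand-in for wvsRU$sat (hypothetical data): integer scores 1..10.
set.seed(1)
sat_demo <- sample(1:10, size = 1000, replace = TRUE)

# One-sample KS test against a normal distribution whose mean and sd
# are estimated from the sample itself.
# suppressWarnings() hides the warning about ties in discrete data.
ks_demo <- suppressWarnings(
  ks.test(sat_demo, "pnorm", mean = mean(sat_demo), sd = sd(sat_demo))
)

ks_demo$statistic  # D: largest gap between empirical and normal CDF
ks_demo$p.value    # approximate, see the caveats above
```

With the real data, `sat_demo` would be replaced by `wvsRU$sat` with missing values removed.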

to work with happiness, let’s make it numeric the same way as satisfaction: the happier, the bigger the value

wvsRU$happy = as.factor(wvsRU$happy)
summary(wvsRU$happy)
## Not at all happy   Not very happy      Quite happy       Very happy 
##              507             3161             5516             1079 
##             NA's 
##              518
wvsRU$hap = ifelse(wvsRU$happy == "Not at all happy", 1, ifelse(wvsRU$happy == "Not very happy", 2, ifelse(wvsRU$happy == "Quite happy", 3, ifelse(wvsRU$happy == "Very happy", 4, wvsRU$happy))))
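The nested ifelse() above can also be written by stating the level order explicitly and taking the integer codes of the factor. This is a minimal sketch on a small made-up vector (`happy_demo` is hypothetical), not a replacement the original analysis used.

```r
# Hypothetical answers, including a missing value.
happy_demo <- c("Very happy", "Not at all happy", "Quite happy", NA,
                "Not very happy")

# Explicit order: the happier, the bigger the value.
happy_levels <- c("Not at all happy", "Not very happy",
                  "Quite happy", "Very happy")

# Integer codes follow the order of `levels`; NA stays NA.
hap_demo <- as.integer(factor(happy_demo, levels = happy_levels))
hap_demo
```

Stating the levels explicitly avoids relying on the alphabetical order of the labels, which happens to match the intended order here.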

test bivariate relationships

t1 = t.test(wvsRU$hap, wvsRU$sat)
t2 = t.test(wvsRU$hap, wvsRU$sat, paired = TRUE)
t1
## 
##  Welch Two Sample t-test
## 
## data:  wvsRU$hap and wvsRU$sat
## t = -120.39, df = 12380, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.106701 -3.007159
## sample estimates:
## mean of x mean of y 
##  2.698334  5.755264
t2
## 
##  Paired t-test
## 
## data:  wvsRU$hap and wvsRU$sat
## t = -138.37, df = 10132, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.121768 -3.034553
## sample estimates:
## mean of the differences 
##                -3.07816

As the tests show, the two variables HAPPINESS and SATISFACTION have significantly different means (p-value < 2.2e-16); the mean of the paired differences equals -3.07816. However, the variables are measured on different scales (1 to 4 and 1 to 10), so a difference in means is expected and does not by itself show whether the two are related.
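Since the t-test compares means of variables on different scales, a rank correlation is one way to ask directly whether the two ordinal scores move together. The sketch below uses simulated scores (`hap_sim` and `sat_sim` are hypothetical, built from a shared latent variable), so the number it prints says nothing about the survey.

```r
# Simulate two ordinal scores driven by the same latent well-being.
set.seed(2)
latent <- rnorm(500)
hap_sim <- cut(latent + rnorm(500, sd = 0.8),
               breaks = c(-Inf, -1.5, 0, 1.5, Inf), labels = FALSE)  # 1..4
sat_sim <- cut(latent + rnorm(500, sd = 0.8),
               breaks = 10, labels = FALSE)                          # 1..10

# Spearman correlation works on ranks, so the different scales do not matter.
# suppressWarnings() hides the note about ties in the p-value computation.
rho <- suppressWarnings(cor.test(hap_sim, sat_sim, method = "spearman"))
rho$estimate
rho$p.value
```

On the real data the call would be `cor.test(wvsRU$hap, wvsRU$sat, method = "spearman")`.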

regressions

Satisfaction and Marital status

wvsRU$marital = as.factor(wvsRU$marital)
m1 = lm(sat ~ marital, data = wvsRU)
summary(m1)
## 
## Call:
## lm(formula = sat ~ marital, data = wvsRU)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0550 -1.8134  0.0583  2.0583  5.2920 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        4.81342    0.09039  53.252  < 2e-16 ***
## maritalLiving together as married  0.87584    0.20630   4.246  2.2e-05 ***
## maritalMarried                     1.12826    0.09522  11.849  < 2e-16 ***
## maritalSeparated                   0.12658    0.19648   0.644    0.519    
## maritalSingle/Never married        1.24159    0.10797  11.499  < 2e-16 ***
## maritalWidowed                    -0.10547    0.12283  -0.859    0.391    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.467 on 10531 degrees of freedom
##   (244 observations deleted due to missingness)
## Multiple R-squared:  0.03249,    Adjusted R-squared:  0.03203 
## F-statistic: 70.73 on 5 and 10531 DF,  p-value: < 2.2e-16

As we see, not all marital statuses are significantly related to the level of satisfaction. Compared to the divorced group (the reference category):

“Living together as married” is positively associated, with coef = 0.9 (p-value = 2.2e-05)
“Married” is positively associated, with coef = 1.1 (p-value < 2e-16)
“Single/Never married” is positively associated, with coef = 1.2 (p-value < 2e-16)
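The comparison group is “Divorced” only because as.factor() sorts the labels alphabetically and lm() uses the first level as the reference. A small sketch on a made-up vector (`marital_demo` is hypothetical) shows how to check the reference level and how relevel() would change it.

```r
# Hypothetical marital statuses.
marital_demo <- factor(c("Married", "Divorced", "Widowed",
                         "Married", "Separated"))

# Levels are sorted alphabetically; the first one is the reference in lm().
levels(marital_demo)[1]

# relevel() moves a chosen category to the front without changing the data.
marital_ref <- relevel(marital_demo, ref = "Married")
levels(marital_ref)[1]
```

Changing the reference changes what each coefficient is compared to, but not the overall fit of the model.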

Happiness and Marital status

m2 = lm(hap ~ marital, data = wvsRU)
summary(m2)
## 
## Call:
## lm(formula = hap ~ marital, data = wvsRU)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.808 -0.788  0.212  0.212  1.790 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        2.31276    0.02565  90.164  < 2e-16 ***
## maritalLiving together as married  0.35184    0.06031   5.834 5.58e-09 ***
## maritalMarried                     0.47524    0.02703  17.579  < 2e-16 ***
## maritalSeparated                   0.03099    0.05618   0.552  0.58118    
## maritalSingle/Never married        0.49500    0.03066  16.146  < 2e-16 ***
## maritalWidowed                    -0.10266    0.03494  -2.938  0.00331 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6926 on 10209 degrees of freedom
##   (566 observations deleted due to missingness)
## Multiple R-squared:  0.0771, Adjusted R-squared:  0.07665 
## F-statistic: 170.6 on 5 and 10209 DF,  p-value: < 2.2e-16

As we see, not all marital statuses are significantly related to the level of happiness. Compared to the divorced group (the reference category):

“Living together as married” is positively associated, with coef = 0.35 (p-value = 5.58e-09)
“Married” is positively associated, with coef = 0.48 (p-value < 2e-16)
“Single/Never married” is positively associated, with coef = 0.49 (p-value < 2e-16)
“Widowed” is negatively associated, with coef = -0.1 (p-value = 0.00331)

We should also notice that the R2 of the second model is more than 2 times bigger (but it’s still only 0.07).

In general, the models are similar, but in the case of happiness there is one more marital status that is significantly associated with the outcome, and in a negative way.

We may assume that being widowed, compared with being divorced, doesn’t imply being less satisfied, but implies being less happy.

wvsRU$income = as.factor(wvsRU$income)
wvsRU$income <- ordered(wvsRU$income, levels = c("Lower step", "second step", "Third step", "Fourth step", "Fifth step", "Sixth step", "Seventh step", "Eigth step", "Nineth step", "Tenth step"))
m3 = lm(hap ~ income, data = wvsRU)
summary(m3)
## 
## Call:
## lm(formula = hap ~ income, data = wvsRU)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8568 -0.5884  0.2262  0.2944  1.6133 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.691728   0.008445 318.740  < 2e-16 ***
## income.L     0.444065   0.030936  14.354  < 2e-16 ***
## income.Q    -0.165206   0.029997  -5.507 3.75e-08 ***
## income.C     0.019528   0.028380   0.688   0.4914    
## income^4    -0.005142   0.028093  -0.183   0.8548    
## income^5    -0.045935   0.027265  -1.685   0.0921 .  
## income^6     0.033271   0.025690   1.295   0.1953    
## income^7     0.036913   0.023851   1.548   0.1217    
## income^8     0.005063   0.022443   0.226   0.8215    
## income^9     0.014616   0.022163   0.659   0.5096    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7087 on 8347 degrees of freedom
##   (2424 observations deleted due to missingness)
## Multiple R-squared:  0.03123,    Adjusted R-squared:  0.03018 
## F-statistic:  29.9 on 9 and 8347 DF,  p-value: < 2.2e-16

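There is a positive linear relation between happiness and income step (income.L = 0.44, p-value < 2e-16). There is also a negative quadratic relation (income.Q = -0.17, p-value = 3.75e-08), which suggests that the gain in happiness flattens at the higher income steps.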
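The terms income.L, income.Q, income.C and so on appear because income is an ordered factor, for which R uses orthogonal polynomial contrasts by default. A minimal sketch with a made-up four-level factor (`income_demo` and its labels are hypothetical) shows the contrast matrix behind such terms.

```r
# Hypothetical ordered factor with four levels.
income_demo <- factor(c("low", "mid", "high", "top"),
                      levels = c("low", "mid", "high", "top"),
                      ordered = TRUE)

# Default contrasts for ordered factors: orthogonal polynomials.
# Columns are the linear (.L), quadratic (.Q) and cubic (.C) trends
# across the levels, not comparisons of single levels.
poly_contrasts <- contrasts(income_demo)
poly_contrasts
```

Each coefficient in the model therefore describes a trend across all income steps, which is why a term such as income^4 cannot be read as “the fourth step”.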

wvsRUS = na.omit(wvsRU)
m4 = lm(sat ~ income, data = wvsRUS)
summary(m4)
## 
## Call:
## lm(formula = sat ~ income, data = wvsRUS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3501 -1.6120 -0.1183  1.7926  5.6366 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.609119   0.031604 177.481  < 2e-16 ***
## income.L     1.936910   0.114250  16.953  < 2e-16 ***
## income.Q    -0.536676   0.110449  -4.859 1.21e-06 ***
## income.C     0.043040   0.105273   0.409   0.6827    
## income^4    -0.053406   0.104731  -0.510   0.6101    
## income^5    -0.100063   0.102596  -0.975   0.3294    
## income^6     0.167033   0.097003   1.722   0.0851 .  
## income^7     0.163358   0.090498   1.805   0.0711 .  
## income^8    -0.009641   0.085881  -0.112   0.9106    
## income^9     0.124971   0.084200   1.484   0.1378    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.431 on 6680 degrees of freedom
## Multiple R-squared:  0.04943,    Adjusted R-squared:  0.04815 
## F-statistic:  38.6 on 9 and 6680 DF,  p-value: < 2.2e-16

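After removing NAs (na.omit() drops every row with a missing value in any column, which leaves 6690 observations), there is a strong positive linear relation between satisfaction and income step (income.L = 1.94). There is also a weaker negative quadratic relation between satisfaction and income step (income.Q = -0.54).

As with happiness, satisfaction is lower at the low income steps. The coefficients are larger than in the happiness model, but satisfaction is measured on a 1 to 10 scale and happiness on a 1 to 4 scale, so the magnitudes are not directly comparable.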
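Because the two outcomes are on different scales (1 to 4 and 1 to 10), one way to compare the income effects is to standardize both outcomes first. The sketch below runs on simulated data and, as a simplifying assumption, treats the income step as a number from 1 to 10; all names (`hap_sim2`, `sat_sim2`, `demo`) are hypothetical.

```r
# Simulated data: both outcomes rise with the income step.
set.seed(3)
n <- 300
income_step <- sample(1:10, n, replace = TRUE)
hap_sim2 <- pmin(4,  pmax(1, round(2 + 0.15 * income_step + rnorm(n, sd = 0.7))))
sat_sim2 <- pmin(10, pmax(1, round(3 + 0.45 * income_step + rnorm(n, sd = 2))))

# Standardize the outcomes (mean 0, sd 1) so the slopes share one unit:
# standard deviations of the outcome per income step.
demo <- data.frame(income_step,
                   hap_z = as.numeric(scale(hap_sim2)),
                   sat_z = as.numeric(scale(sat_sim2)))

slope_hap <- coef(lm(hap_z ~ income_step, data = demo))["income_step"]
slope_sat <- coef(lm(sat_z ~ income_step, data = demo))["income_step"]
c(happiness = unname(slope_hap), satisfaction = unname(slope_sat))
```

On standardized outcomes, a larger slope means a stronger association relative to the spread of that variable, which the raw coefficients cannot show.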

Education?

wvsRUS$edu = as.factor(wvsRUS$edu)
wvsRU$edu <- ordered(wvsRU$edu, levels = c("Incomplete primary school", "Complete primary school", "Incomplete secondary school: technical/ vocational type", "Incomplete secondary school: university-preparatory type", "Complete secondary school: technical/ vocational type", "Complete secondary school: university-preparatory type", "Some university-level education, without degree", "University - level education, with degree"))
m5 = lm(hap ~ ordered(edu), data = wvsRUS)
summary(m5)
## 
## Call:
## lm(formula = hap ~ ordered(edu), data = wvsRUS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9799 -0.6758  0.2018  0.3242  1.3533 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2.75178    0.01017 270.673  < 2e-16 ***
## ordered(edu).L  0.17673    0.02996   5.898 3.85e-09 ***
## ordered(edu).Q  0.04172    0.02878   1.450 0.147123    
## ordered(edu).C -0.05760    0.02829  -2.036 0.041777 *  
## ordered(edu)^4 -0.08480    0.03056  -2.775 0.005530 ** 
## ordered(edu)^5 -0.09649    0.02936  -3.287 0.001017 ** 
## ordered(edu)^6 -0.09227    0.02681  -3.442 0.000582 ***
## ordered(edu)^7 -0.15617    0.02734  -5.711 1.17e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.726 on 6682 degrees of freedom
## Multiple R-squared:  0.01158,    Adjusted R-squared:  0.01054 
## F-statistic: 11.18 on 7 and 6682 DF,  p-value: 3.813e-14

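There is a weak positive linear relation between happiness and level of education (0.18, p-value = 3.85e-09). The cubic and all higher-order terms are negative and significant, but weak. These are polynomial contrast terms, not comparisons of neighbouring levels, so they do not show that each level is less happy than the level below it. Note also that the level order was set on wvsRU$edu, while the model is fitted on wvsRUS, where edu is a plain factor; ordered(edu) therefore follows the alphabetical order of the labels, and the trend terms should be read with caution.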
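For the polynomial terms to follow the real order of education, the level order has to be set on the same data frame that is passed to lm(). The sketch below shows the difference on a tiny made-up data frame; `df_demo` and the shortened labels in `edu_levels` are hypothetical, and with the real data the full labels from the survey would be used.

```r
# Hypothetical shortened labels in their substantive order.
edu_levels <- c("Incomplete primary", "Complete primary", "Complete secondary")

df_demo <- data.frame(
  edu = c("Complete secondary", "Incomplete primary",
          "Complete primary", "Incomplete primary"),
  stringsAsFactors = FALSE
)

# ordered() applied to a plain factor keeps its alphabetical level order.
alphabetical <- levels(ordered(as.factor(df_demo$edu)))
alphabetical

# Setting the levels explicitly on the data frame used in the model.
df_demo$edu <- factor(df_demo$edu, levels = edu_levels, ordered = TRUE)
levels(df_demo$edu)

# A label that does not match any level silently becomes NA, so check.
stopifnot(!anyNA(df_demo$edu))
```

The final check matters because a label with a typo or an extra space does not raise an error; it just turns those rows into missing values.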

m6 = lm(sat ~ ordered(edu), data = wvsRUS)
summary(m6)
## 
## Call:
## lm(formula = sat ~ ordered(edu), data = wvsRUS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5772 -1.5772 -0.3629  1.9430  4.9430 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.68179    0.03461 164.148  < 2e-16 ***
## ordered(edu).L  0.64353    0.10201   6.308 3.00e-10 ***
## ordered(edu).Q  0.28396    0.09797   2.898  0.00376 ** 
## ordered(edu).C -0.12166    0.09631  -1.263  0.20657    
## ordered(edu)^4 -0.51816    0.10403  -4.981 6.50e-07 ***
## ordered(edu)^5 -0.29952    0.09995  -2.997  0.00274 ** 
## ordered(edu)^6 -0.26729    0.09129  -2.928  0.00342 ** 
## ordered(edu)^7 -0.72579    0.09310  -7.796 7.38e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 6682 degrees of freedom
## Multiple R-squared:  0.01706,    Adjusted R-squared:  0.01603 
## F-statistic: 16.57 on 7 and 6682 DF,  p-value: < 2.2e-16

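There is a positive linear relation (0.64) and a positive quadratic relation (0.28) between satisfaction and level of education. The fourth- to seventh-order terms are negative and significant. They are polynomial contrast terms rather than individual education levels, so they cannot be read as a decrease in satisfaction for complete secondary school or university education in particular. As in m5, edu in wvsRUS follows the alphabetical order of the labels, so these trends should be read with caution.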

Conclusion

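We don’t need no education. The negative higher-order education terms suggest that higher education may decrease life satisfaction and happiness as well, although the linear trend is positive, so this reading is tentative. Perhaps education in Russia shows people their prospects in this country. Money increases happiness, as always.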