library(readr)
wvsRU <- read_csv("~/datanal/3year/wvsRU.csv")
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## age = col_integer(),
## marital = col_character(),
## edu = col_character(),
## income = col_character(),
## happy = col_character(),
## satisf = col_character(),
## country = col_character()
## )
head(wvsRU)
## # A tibble: 6 x 8
## X1 age marital edu income happy satisf country
## <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 19334 35 Married Complete primary … <NA> Quite … 5 RU
## 2 19335 55 Married Incomplete second… Third … Not ve… 5 RU
## 3 19336 42 Married Incomplete second… Sixth … Very h… 4 RU
## 4 19337 28 Married Complete secondar… Fourth… Very h… Satis… RU
## 5 19338 34 Married Complete secondar… Fourth… Quite … 5 RU
## 6 19339 23 Single/Nev… Incomplete second… Fourth… Quite … 5 RU
We want to understand whether the variables “happy” and “satisf” generally work in the same way, that is, whether they characterize groups equally. For the given set of variables (“age”, “marital”, “edu”, “income”, “happy”, “satisf”), create a set of hypotheses for happiness and satisfaction. Treat satisfaction as a numeric variable. First, explore the distributions. Then test the bivariate relationships. As the last step, visualize. What is your conclusion? Which variable would you use in your analysis of happiness? Why?
Hypotheses for happiness and satisfaction:
1. The level of satisfaction is related to marital status.
2. The higher the income step, the higher the satisfaction.
Let’s recode the satisfaction variable into a numeric scale from 1 to 10: “Satisfied” becomes 10 and “Dissatisfied” becomes 1.
wvsRU$sat = ifelse(wvsRU$satisf == "Satisfied", 10, ifelse(wvsRU$satisf == "Dissatisfied", 1, wvsRU$satisf))
wvsRU$sat = as.numeric(wvsRU$sat)
Let’s look at the distribution.
library(ggplot2)
hist(wvsRU$sat)
library(ggpubr)
## Loading required package: magrittr
ggqqplot(wvsRU$sat)
We check it for normality with a simple ks.test. (We have more than 5000 observations, and shapiro.test() in R accepts at most 5000, so the Shapiro-Wilk test cannot be applied to the full sample.)
(\(H_0\)): the data is distributed normally;
(\(H_A\)): the data is not distributed normally.
ggplot(data = wvsRU, aes(x = sat)) +
  geom_density()
library(stats)
ks.test(rnorm(10^4), wvsRU$sat)
##
## Two-sample Kolmogorov-Smirnov test
##
## data: rnorm(10^4) and wvsRU$sat
## D = 0.90568, p-value < 2.2e-16
## alternative hypothesis: two-sided
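As we see, the test rejects the null hypothesis of normality (D = 0.90568, p-value < 2.2e-16), so our distribution is not normal. Note that this call compares sat with a sample from the standard normal distribution (mean 0, sd 1), so much of the large D statistic reflects the difference in location and scale rather than in shape. However, we are okay with this violation: with more than 10,000 observations, the comparisons of means and the linear models below are reasonably robust to non-normality.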
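The call above compares sat with draws from a standard normal distribution. Below is a minimal sketch of a one-sample alternative, assuming the goal is to compare the shape of the distribution with a normal distribution that has the sample's own mean and standard deviation. The data are simulated stand-ins (`sat_sim` is a hypothetical name), not the WVS values.

```r
# Simulated stand-in for a 1-10 satisfaction score (not the real data)
set.seed(1)
sat_sim <- sample(1:10, size = 5000, replace = TRUE)

# One-sample KS test against a normal distribution with the sample's
# own mean and sd. Discrete data contain ties, which makes ks.test()
# emit a warning; it is expected here, so we silence it.
ks_res <- suppressWarnings(
  ks.test(sat_sim, "pnorm", mean = mean(sat_sim), sd = sd(sat_sim))
)

ks_res$statistic  # distance between the empirical and the normal CDF
ks_res$p.value
```

Because the mean and sd are estimated from the same sample, the p-value of this test tends to be conservative, so it is best read as a rough check rather than an exact test.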
To work with happiness, let’s make it numeric in the same way as satisfaction: the happier the respondent, the larger the value.
wvsRU$happy = as.factor(wvsRU$happy)
summary(wvsRU$happy)
## Not at all happy Not very happy Quite happy Very happy
## 507 3161 5516 1079
## NA's
## 518
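The four categories are mapped to numbers by the nested ifelse() call that follows. As an illustration only, the same mapping could be sketched as a named lookup vector; `happy_toy`, `happy_scores` and `hap_toy` are hypothetical names, and the toy vector is made up.

```r
# Hypothetical toy vector with the four answer categories and a missing value
happy_toy <- c("Very happy", "Not at all happy", NA,
               "Quite happy", "Not very happy")

# Named lookup vector: category label -> numeric score (higher = happier)
happy_scores <- c("Not at all happy" = 1, "Not very happy" = 2,
                  "Quite happy" = 3, "Very happy" = 4)

# Index the lookup by label; missing labels stay NA
hap_toy <- unname(happy_scores[as.character(happy_toy)])
hap_toy
```

The lookup keeps the label-to-score mapping in one place, which makes it easier to check than a chain of nested conditions.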
wvsRU$hap = ifelse(wvsRU$happy == "Not at all happy", 1, ifelse(wvsRU$happy == "Not very happy", 2, ifelse(wvsRU$happy == "Quite happy", 3, ifelse(wvsRU$happy == "Very happy", 4, wvsRU$happy))))
t1 = t.test(wvsRU$hap, wvsRU$sat)
t2 = t.test(wvsRU$hap, wvsRU$sat, paired = TRUE)
t1
##
## Welch Two Sample t-test
##
## data: wvsRU$hap and wvsRU$sat
## t = -120.39, df = 12380, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.106701 -3.007159
## sample estimates:
## mean of x mean of y
## 2.698334 5.755264
t2
##
## Paired t-test
##
## data: wvsRU$hap and wvsRU$sat
## t = -138.37, df = 10132, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.121768 -3.034553
## sample estimates:
## mean of the differences
## -3.07816
As both tests show, the two variables, happiness and satisfaction, have significantly different means (p-value < 2.2e-16); the mean paired difference equals -3.07816. However, happiness is coded from 1 to 4 and satisfaction from 1 to 10, so this difference mostly reflects the different scales and does not by itself show whether the two variables characterize groups in the same way. For that, we compare how each of them relates to the other variables.
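Since the two variables are measured on different scales, a rank correlation is one possible way to check whether they move together. The sketch below uses simulated data under the assumption that a single latent well-being score drives both answers; `latent`, `hap_sim` and `sat_sim` are hypothetical names, and the numbers say nothing about the WVS data.

```r
set.seed(42)
n <- 1000

# Hypothetical latent well-being score behind both answers
latent <- rnorm(n)

# Noisy versions of the latent score, cut into 4 and 10 ordered categories
hap_sim <- cut(latent + rnorm(n, sd = 0.7), breaks = 4, labels = FALSE)
sat_sim <- cut(latent + rnorm(n, sd = 0.7), breaks = 10, labels = FALSE)

# Spearman rank correlation; exact = FALSE because ordinal data have ties
rho_res <- cor.test(hap_sim, sat_sim, method = "spearman", exact = FALSE)
rho_res$estimate
rho_res$p.value
```

A rank correlation does not depend on the units of either scale, which is why it is used here instead of a comparison of means.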
Satisfaction and Marital status
wvsRU$marital = as.factor(wvsRU$marital)
m1 = lm(sat ~ marital, data = wvsRU)
summary(m1)
##
## Call:
## lm(formula = sat ~ marital, data = wvsRU)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0550 -1.8134 0.0583 2.0583 5.2920
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.81342 0.09039 53.252 < 2e-16 ***
## maritalLiving together as married 0.87584 0.20630 4.246 2.2e-05 ***
## maritalMarried 1.12826 0.09522 11.849 < 2e-16 ***
## maritalSeparated 0.12658 0.19648 0.644 0.519
## maritalSingle/Never married 1.24159 0.10797 11.499 < 2e-16 ***
## maritalWidowed -0.10547 0.12283 -0.859 0.391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.467 on 10531 degrees of freedom
## (244 observations deleted due to missingness)
## Multiple R-squared: 0.03249, Adjusted R-squared: 0.03203
## F-statistic: 70.73 on 5 and 10531 DF, p-value: < 2.2e-16
As we see, not all marital statuses are significantly related to the level of satisfaction. Compared to the divorced group (the reference category):
“Living together as married” is positively associated, coef = 0.88 (p-value = 2.2e-05);
“Married” is positively associated, coef = 1.13 (p-value < 2e-16);
“Single/Never married” is positively associated, coef = 1.24 (p-value < 2e-16).
“Separated” and “Widowed” do not differ significantly from the divorced group.
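The comparison group is “Divorced” because lm() takes the first factor level, alphabetical by default, as the reference. Here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical) of how relevel() would change the reference group:

```r
set.seed(7)

# Hypothetical toy data: three marital groups with different mean scores
toy <- data.frame(
  marital = factor(rep(c("Divorced", "Married", "Widowed"), each = 50)),
  sat = c(rnorm(50, mean = 5), rnorm(50, mean = 6), rnorm(50, mean = 4.8))
)

# Default: the first level in alphabetical order ("Divorced") is the reference
fit_default <- lm(sat ~ marital, data = toy)
names(coef(fit_default))

# relevel() moves "Married" to the first position, so the coefficients
# become differences from the married group
toy$marital2 <- relevel(toy$marital, ref = "Married")
fit_married <- lm(sat ~ marital2, data = toy)
names(coef(fit_married))
```

Changing the reference does not change the fit of the model, only which group differences the coefficients report.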
Happiness and Marital status
m2 = lm(hap ~ marital, data = wvsRU)
summary(m2)
##
## Call:
## lm(formula = hap ~ marital, data = wvsRU)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.808 -0.788 0.212 0.212 1.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.31276 0.02565 90.164 < 2e-16 ***
## maritalLiving together as married 0.35184 0.06031 5.834 5.58e-09 ***
## maritalMarried 0.47524 0.02703 17.579 < 2e-16 ***
## maritalSeparated 0.03099 0.05618 0.552 0.58118
## maritalSingle/Never married 0.49500 0.03066 16.146 < 2e-16 ***
## maritalWidowed -0.10266 0.03494 -2.938 0.00331 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6926 on 10209 degrees of freedom
## (566 observations deleted due to missingness)
## Multiple R-squared: 0.0771, Adjusted R-squared: 0.07665
## F-statistic: 170.6 on 5 and 10209 DF, p-value: < 2.2e-16
As we see, not all marital statuses are significantly related to the level of happiness. Compared to the divorced group (the reference category):
“Living together as married” is positively associated, coef = 0.35 (p-value = 5.58e-09);
“Married” is positively associated, coef = 0.48 (p-value < 2e-16);
“Single/Never married” is positively associated, coef = 0.495 (p-value < 2e-16);
“Widowed” is negatively associated, coef = -0.10 (p-value = 0.00331).
We should also note that the R2 of the second model is more than twice as large (0.077 vs. 0.032), although it is still small.
In general, the models are similar, but in the case of happiness there is one more marital status, “Widowed”, that is associated with the variable, and the association is negative.
We may assume that being widowed rather than divorced doesn’t imply being less satisfied, but does imply being less happy.
wvsRU$income = as.factor(wvsRU$income)
wvsRU$income <- ordered(wvsRU$income, levels = c("Lower step", "second step", "Third step", "Fourth step", "Fifth step", "Sixth step", "Seventh step", "Eigth step", "Nineth step", "Tenth step"))
m3 = lm(hap ~ income, data = wvsRU)
summary(m3)
##
## Call:
## lm(formula = hap ~ income, data = wvsRU)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8568 -0.5884 0.2262 0.2944 1.6133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.691728 0.008445 318.740 < 2e-16 ***
## income.L 0.444065 0.030936 14.354 < 2e-16 ***
## income.Q -0.165206 0.029997 -5.507 3.75e-08 ***
## income.C 0.019528 0.028380 0.688 0.4914
## income^4 -0.005142 0.028093 -0.183 0.8548
## income^5 -0.045935 0.027265 -1.685 0.0921 .
## income^6 0.033271 0.025690 1.295 0.1953
## income^7 0.036913 0.023851 1.548 0.1217
## income^8 0.005063 0.022443 0.226 0.8215
## income^9 0.014616 0.022163 0.659 0.5096
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7087 on 8347 degrees of freedom
## (2424 observations deleted due to missingness)
## Multiple R-squared: 0.03123, Adjusted R-squared: 0.03018
## F-statistic: 29.9 on 9 and 8347 DF, p-value: < 2.2e-16
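There is a positive linear relation between happiness and level of income (income.L = 0.44, p-value < 2e-16). There is also a negative quadratic relation (income.Q = -0.17, p-value = 3.75e-08), which suggests that happiness rises with income but the increase flattens at the higher steps.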
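The income.L and income.Q terms come from the polynomial contrasts that R applies to ordered factors by default. Below is a small sketch with simulated data (`income_step` and `y` are hypothetical, with only four levels for brevity) in which the outcome rises and then flattens, giving a positive linear and a negative quadratic term:

```r
# Polynomial contrast matrix R uses for an ordered factor with 4 levels
cp <- contr.poly(4)
colnames(cp)  # linear, quadratic and cubic trend components

# Hypothetical toy data: the outcome rises with the step, then flattens
set.seed(3)
income_step <- factor(rep(1:4, each = 100), ordered = TRUE)
y <- c(1, 2, 2.6, 2.8)[as.integer(income_step)] + rnorm(400, sd = 0.5)

# For an ordered factor, lm() reports trend components, not level effects
fit <- lm(y ~ income_step)
coef(fit)
```

Each coefficient describes one component of the trend across all levels, so it cannot be read as the effect of a particular income step.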
wvsRUS = na.omit(wvsRU)
m4 = lm(sat ~ income, data = wvsRUS)
summary(m4)
##
## Call:
## lm(formula = sat ~ income, data = wvsRUS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3501 -1.6120 -0.1183 1.7926 5.6366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.609119 0.031604 177.481 < 2e-16 ***
## income.L 1.936910 0.114250 16.953 < 2e-16 ***
## income.Q -0.536676 0.110449 -4.859 1.21e-06 ***
## income.C 0.043040 0.105273 0.409 0.6827
## income^4 -0.053406 0.104731 -0.510 0.6101
## income^5 -0.100063 0.102596 -0.975 0.3294
## income^6 0.167033 0.097003 1.722 0.0851 .
## income^7 0.163358 0.090498 1.805 0.0711 .
## income^8 -0.009641 0.085881 -0.112 0.9106
## income^9 0.124971 0.084200 1.484 0.1378
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.431 on 6680 degrees of freedom
## Multiple R-squared: 0.04943, Adjusted R-squared: 0.04815
## F-statistic: 38.6 on 9 and 6680 DF, p-value: < 2.2e-16
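Without NAs, there is a positive linear relation between satisfaction and level of income (income.L = 1.94, p-value < 2e-16). There is also a weaker negative quadratic relation (income.Q = -0.54, p-value = 1.21e-06).
As with happiness, satisfaction is lower at the low income steps. The coefficients are larger here, but satisfaction is measured on a scale from 1 to 10 and happiness from 1 to 4, so they are not directly comparable. In terms of fit, income explains slightly more variance in satisfaction (R2 = 0.049) than in happiness (R2 = 0.031), although the two models are fitted on different samples (m4 uses only complete cases).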
Education?
wvsRUS$edu = as.factor(wvsRUS$edu)
wvsRU$edu <- ordered(wvsRU$edu, levels = c("Incomplete primary school", "Complete primary school", "Incomplete secondary school: technical/ vocational type", "Incomplete secondary school: university-preparatory type", "Complete secondary school: technical/ vocational type", "Complete secondary school: university-preparatory type", "Some university-level education, without degree", "University - level education, with degree"))
m5 = lm(hap ~ ordered(edu), data = wvsRUS)
summary(m5)
##
## Call:
## lm(formula = hap ~ ordered(edu), data = wvsRUS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9799 -0.6758 0.2018 0.3242 1.3533
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.75178 0.01017 270.673 < 2e-16 ***
## ordered(edu).L 0.17673 0.02996 5.898 3.85e-09 ***
## ordered(edu).Q 0.04172 0.02878 1.450 0.147123
## ordered(edu).C -0.05760 0.02829 -2.036 0.041777 *
## ordered(edu)^4 -0.08480 0.03056 -2.775 0.005530 **
## ordered(edu)^5 -0.09649 0.02936 -3.287 0.001017 **
## ordered(edu)^6 -0.09227 0.02681 -3.442 0.000582 ***
## ordered(edu)^7 -0.15617 0.02734 -5.711 1.17e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.726 on 6682 degrees of freedom
## Multiple R-squared: 0.01158, Adjusted R-squared: 0.01054
## F-statistic: 11.18 on 7 and 6682 DF, p-value: 3.813e-14
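There is a weak positive linear relation between happiness and level of education (coef = 0.18, p-value = 3.85e-09) and a very weak negative cubic relation (coef = -0.06, p-value = 0.04). The higher-order terms (degrees 4 to 7) are also negative and significant, so the relation is not monotonic, and the model explains only about 1% of the variance. Note that the explicit level order above is assigned to wvsRU$edu, while the model uses wvsRUS, where ordered(edu) falls back to the alphabetical order of the labels; the trend terms therefore do not follow the educational ladder and should be read with caution.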
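In the code above, the explicit level order is assigned to wvsRU$edu, while the models are fitted on wvsRUS. A short sketch of why this matters, using a hypothetical toy vector (`edu_toy`) with three of the education labels:

```r
# Hypothetical toy vector with three of the education labels
edu_toy <- c("Incomplete primary school", "Complete primary school",
             "University - level education, with degree")

# Without explicit levels, ordered() sorts the labels alphabetically,
# so "Complete primary school" comes before "Incomplete primary school"
levels_default <- levels(ordered(edu_toy))
levels_default

# With explicit levels, the order follows the educational ladder
edu_levels <- c("Incomplete primary school", "Complete primary school",
                "University - level education, with degree")
levels_explicit <- levels(ordered(edu_toy, levels = edu_levels))
levels_explicit
```

Checking levels() on the exact data frame passed to lm() is a cheap way to confirm which order the polynomial terms actually use.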
m6 = lm(sat ~ ordered(edu), data = wvsRUS)
summary(m6)
##
## Call:
## lm(formula = sat ~ ordered(edu), data = wvsRUS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5772 -1.5772 -0.3629 1.9430 4.9430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.68179 0.03461 164.148 < 2e-16 ***
## ordered(edu).L 0.64353 0.10201 6.308 3.00e-10 ***
## ordered(edu).Q 0.28396 0.09797 2.898 0.00376 **
## ordered(edu).C -0.12166 0.09631 -1.263 0.20657
## ordered(edu)^4 -0.51816 0.10403 -4.981 6.50e-07 ***
## ordered(edu)^5 -0.29952 0.09995 -2.997 0.00274 **
## ordered(edu)^6 -0.26729 0.09129 -2.928 0.00342 **
## ordered(edu)^7 -0.72579 0.09310 -7.796 7.38e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 6682 degrees of freedom
## Multiple R-squared: 0.01706, Adjusted R-squared: 0.01603
## F-statistic: 16.57 on 7 and 6682 DF, p-value: < 2.2e-16
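There is a positive linear relation (coef = 0.64, p-value = 3.00e-10) and a positive quadratic relation (coef = 0.28, p-value = 0.004) between satisfaction and level of education. The terms of degree 4 to 7 are negative and significant. These polynomial terms describe the shape of the trend across all levels rather than the effect of any single level, so they do not directly show a decrease in satisfaction for complete secondary school (any type) or for university-level education with or without a degree. As in the happiness model, ordered(edu) on wvsRUS uses the alphabetical order of the labels, so the trend should be read with caution.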
We don’t need no education. Higher education seems to decrease both life satisfaction and happiness. Education in Russia shows people their prospects in this country. Money increases happiness, as always.