Group members

Anna Gorobtsova
Nadezda Bykova
Artem Kulikov
Anastasia Vlasenko

Introduction

General idea:

This paper rests on the assumption that some predictors of happiness are universal, working irrespective of country, while others reflect specific cultural features. Two candidates for universal predictors are citizens’ state of health and their satisfaction with their financial situation. Several arguments suggest that these two variables matter in most countries. First, a person’s general state of health is closely connected with his or her mental health, and people who often experience stress, anxiety or depression are more likely to be less satisfied with their lives. Second, dissatisfaction with one’s financial situation indicates that a person lacks the resources to satisfy his or her needs, so people who are not satisfied with their financial situation are more likely to be unhappy. These variables should therefore work for almost any country, as all people want to be healthy and financially stable.

However, some variables work only for some cultures and not for others. For example, some countries value hard work, while others might have a more hedonic attitude towards life. Moreover, some cultures value freedom of choice and the ability to make decisions independently. Therefore, the extent to which people are free in their decisions might be an important predictor of happiness.

Therefore, the general research question is as follows: Do culture-specific features contribute substantially to the overall level of happiness in the chosen country? To answer it, data from the World Values Survey for the years 2005-2006 are used.

Country choice and data description:

The United States of America was chosen for the analysis. Out of 347922 observations in the whole WVS dataset, 8155 are from the USA; after omitting all NAs, 3931 complete cases remain.

The whole WVS dataset contains 289 variables. However, as stated in the introduction, only four of them are used as predictors in model building:

V11 - measures state of health
V46 - measures freedom of choice
V120 - measures the extent to which a person values hard work
V68 - measures one’s satisfaction with financial situation

Additionally, two variables were used to create the happiness index, which is the dependent variable (the exact formula for the index is specified during the analysis):

V10 - measures happiness level of a person
V22 - measures one’s life satisfaction

Finally, gender and age are also taken into account in some cases:

V235 - gender
V237 - age

Hypothesis:

The main hypothesis is that universal predictors, such as health and financial situation, explain more variance than culture-specific ones, such as the value of hard work and freedom of choice.

Data preparation

Loading necessary packages:

library(foreign)
library(ggplot2)
library(car)
library(lmtest)
library(sjPlot)
library(ggcorrplot)

Importing data:

wvs <- read.spss("wvs.sav", to.data.frame = TRUE, use.value.labels = TRUE)

Recoding satisfaction variable:

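# Collapse the 10-point life satisfaction scale (V22) into four bands
wvs$sat <- rep(NA, length(wvs$V22))

# 1-2 -> "1" (the lowest point is labelled "Dissatisfied")
wvs$sat[wvs$V22 == "Dissatisfied" |
          wvs$V22 == "2"] <- "1"

# 3-5 -> "2"
wvs$sat[wvs$V22 == "3" |
          wvs$V22 == "4" |
          wvs$V22 == "5"] <- "2"

# 6-8 -> "3"
wvs$sat[wvs$V22 == "6" |
          wvs$V22 == "7" |
          wvs$V22 == "8"] <- "3"

# 9-10 -> "4" (the highest point is labelled "Satisfied")
wvs$sat[wvs$V22 == "Satisfied" |
          wvs$V22 == "9"] <- "4"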

Recoding happiness variable:

wvs$V10 <-   ifelse(wvs$V10 =="Not at all happy",1,
                            ifelse(wvs$V10 =="Not very happy",2,
                                   ifelse(wvs$V10 =="Quite happy",3,
                                          ifelse(wvs$V10 =="Very happy",4, NA))))
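The nested `ifelse()` chain above can also be written as a lookup in a named vector. This is only a sketch on made-up answers (the vector `answers` and the label "No answer" are illustrative assumptions, not WVS records); any label that is missing from the lookup becomes NA, which matches the behaviour of the final `NA` branch above.

```r
# Named lookup vector: answer label -> numeric code (same mapping as above)
happy_codes <- c("Not at all happy" = 1,
                 "Not very happy"   = 2,
                 "Quite happy"      = 3,
                 "Very happy"       = 4)

# Made-up example answers; "No answer" is a hypothetical unmatched label
answers <- factor(c("Very happy", "Quite happy", "No answer", "Not very happy"))

# Index the lookup by label; unmatched labels give NA
recoded <- unname(happy_codes[as.character(answers)])
print(recoded)
```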

Creating index of happiness:

To create the happiness index, we sum the satisfaction variable (V22) and the happiness variable (V10). Before doing so, we recoded the satisfaction variable into four levels so that both variables are on the same 1-4 scale:

wvs$hapIND <- as.numeric(wvs$V10) + as.numeric(wvs$sat)
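To make the construction of the index concrete, here is a small sketch on toy values (not WVS records). It uses `cut()` with the same band limits as the recoding above, which assumes the satisfaction answers are already numeric from 1 to 10; the variable names are illustrative only.

```r
# Toy values: life satisfaction on the 1-10 scale, happiness on the 1-4 scale
satisfaction_raw <- c(1, 2, 5, 6, 8, 9, 10)
happiness        <- c(1, 2, 2, 3, 3, 4, 4)

# Same bands as the recoding above: 1-2 -> 1, 3-5 -> 2, 6-8 -> 3, 9-10 -> 4
satisfaction_band <- cut(satisfaction_raw,
                         breaks = c(0, 2, 5, 8, 10),
                         labels = FALSE)

# The index is the sum of the two 1-4 components, so it ranges from 2 to 8
happiness_index <- happiness + satisfaction_band
print(happiness_index)
```

Because both components run from 1 to 4, the index can only take values from 2 to 8.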

Creating a subset with USA and recoding necessary variables:

wvsUSA <- subset(wvs, V2 == "USA")
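# Only the end points of V68 carry text labels; the remaining levels fall
# through as their integer factor codes (assumed to match the labels 2-9)
wvsUSA$V68 <- ifelse(wvsUSA$V68 == "Completely satisfied",10,
                        ifelse(wvsUSA$V68 =="Completely dissatisfied",1,
                               wvsUSA$V68))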

wvsUSA$V11 <- ifelse(wvsUSA$V11=="Very good",4,
                     ifelse(wvsUSA$V11=="Good",3,
                            ifelse(wvsUSA$V11=="Fair",2,
                                   ifelse(wvsUSA$V11=="Poor",1, NA))))       
 
wvsUSA$V46 <- ifelse (wvsUSA$V46=="None at all",1,
                      ifelse(wvsUSA$V46=="A great deal",10,
                             wvsUSA$V46))

wvsUSA$V120 <- ifelse(wvsUSA$V120=="Hard work doesn't generally bring success - it's more a matter of luck and connections",10,
                      ifelse(wvsUSA$V120=="In the long run, hard work usually brings a better life",1,
                             wvsUSA$V120))

Creating a separate list with needed variables in order to remove NA’s:

# Columns kept for the analysis (named `keep` to avoid masking base::save)
keep <- c("V10", "V11", "V2", "V68", "V120", "V46", "V22", "V4", "hapIND", "V237","V239", "V235")
data1 <- wvsUSA[keep] 
data1 <- na.omit(data1)

wvsUSA1 <- data1

Specifying variable types:

wvsUSA1$V237 <- as.numeric(as.character(wvsUSA1$V237))
wvsUSA1$V120 <- as.numeric(as.character(wvsUSA1$V120))
wvsUSA1$V46 <- as.numeric(as.character(wvsUSA1$V46))
wvsUSA1$V68 <- as.numeric(as.character(wvsUSA1$V68))
wvsUSA1$V11 <- as.factor(wvsUSA1$V11)
wvsUSA1$V239 <- as.numeric(as.character(wvsUSA1$V239))

Descriptive statistics of the USA dataset

Visualizing happiness:

ggplot(wvsUSA1, aes(x = hapIND)) +
  geom_bar(col = "navy", fill = "cornflowerblue") +
  xlab("Happiness level, from low to high") +
  ylab("Number of observations") +
  ggtitle("Happiness level for USA") +
  geom_vline(aes(xintercept = mean(wvsUSA1$hapIND), colour="Mean"), lwd=1.1 )

The bar plot shows that people in the USA generally report being fairly happy: the mean of the index is about 6.5 on a scale from 2 to 8.

State of health in the USA

ggplot(wvsUSA1, aes(x = V11, y = hapIND)) +
  geom_boxplot(col = "navy", fill = "cornflowerblue") +
  theme(axis.text.x=element_text(vjust = 0.5)) +
  xlab("Level of health") + ggtitle("Health boxplot") + ylab("Subjective happiness level")+
  scale_x_discrete(labels=c("Poor", "Fair", "Good", "Very good"))

The boxplot above shows that people with very good health have the highest median happiness level.

Freedom of choice in USA

ggplot(wvsUSA1, aes(x=V46)) + 
  geom_histogram(binwidth=0.5, col = "navy", fill = "cornflowerblue")+
  labs(title="Freedom of choice histogram", x="Self-perceived freedom of choice", y = "Number of people")+
  # mark the mean of the plotted variable (V46), not of the happiness index
  geom_vline(aes(xintercept = mean(wvsUSA1$V46), colour="Mean"), lwd=1.1 )

On the histogram above, 1 means that people feel they have no freedom of choice at all and 10 means that they have a great deal of choice. We can see that people do not feel completely free in their choices, but they still report some degree of freedom.

Hard work histogram

ggplot(wvsUSA1, aes(x=V120)) + 
  geom_histogram(binwidth=0.5,col = "navy", fill = "cornflowerblue")+
  labs(title="Hard work histogram", x="Does hard work bring a better life", y = "Number of people")+
  # mark the mean of the plotted variable (V120), not of the happiness index
  geom_vline(aes(xintercept = mean(wvsUSA1$V120), colour="Mean"), lwd=1.1 )

On the histogram above, 1 means that people value hard work and believe that it brings a better life, while 10 means that they think hard work doesn’t generally bring success. It can be seen that people in America mostly value hard work and believe that it can bring success and a better life.

Financial satisfaction histogram

ggplot(wvsUSA1, aes(x=V68)) + 
  geom_histogram(binwidth=0.5, col = "navy", fill = "cornflowerblue")+
  labs(title="Financial situation histogram", x="Financial satisfaction level", y = "Number of people")+
  # mark the mean of the plotted variable (V68), not of the happiness index
  geom_vline(aes(xintercept = mean(wvsUSA1$V68), colour="Mean"), lwd=1.1 )

On the histogram above, 1 means that respondents are completely dissatisfied with the financial situation of their household and 10 means that they are completely satisfied with it. It can be seen that people are generally quite satisfied with their financial situation.

Correlation matrix

corrsubset <- wvsUSA1[c("hapIND", "V46", "V120", "V68")]
names(corrsubset) <- c ("Happiness", "Choice", "Hard work", 
                      "Financial satisfaction")
cornum <- cor(corrsubset, use="complete.obs")
ggcorrplot(cornum, lab = TRUE, type = "lower")

The correlation matrix shows that our dependent variable correlates most strongly with the choice variable and the financial satisfaction variable, with coefficients of 0.40 and 0.43 respectively. Both correlations are positive and moderate: the higher the level of freedom, the higher the level of happiness, and the higher the financial satisfaction, the higher the level of happiness. All other correlations, both with the Happiness variable and among the predictors, are weaker.

T-test for gender variable

thap <- t.test(hapIND ~ V235, wvsUSA1)
thap

## 
##  Welch Two Sample t-test
## 
## data:  hapIND by V235
## t = -0.147, df = 3923.2, p-value = 0.8831
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.07572420  0.06516071
## sample estimates:
##   mean in group Male mean in group Female 
##             6.504343             6.509625

The p-value equals 0.8831, which is well above any conventional significance level. Therefore, we fail to reject the null hypothesis: there is no evidence of a difference in mean happiness between men and women.

Anova for health variable

aov.out <- aov(wvsUSA1$hapIND ~ wvsUSA1$V11)
summary(aov.out)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## wvsUSA1$V11    3    415  138.40   118.9 <2e-16 ***
## Residuals   3927   4571    1.16                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is small, which means that the difference in mean happiness level between health groups is statistically significant.

Building models

Assumptions of linear regression:

Linearity
Homoscedasticity
No multicollinearity
Normality of residuals

Model 1 (happiness ~ health):

model1<-lm(hapIND ~ V11, data = wvsUSA1) 
summary(model1)

## 
## Call:
## lm(formula = hapIND ~ V11, data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8758 -0.7255  0.1242  0.9056  2.2745 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.72549    0.08723  65.640  < 2e-16 ***
## V112         0.36894    0.09701   3.803 0.000145 ***
## V113         0.69123    0.09112   7.586 4.11e-14 ***
## V114         1.15028    0.09169  12.545  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.079 on 3927 degrees of freedom
## Multiple R-squared:  0.08326,    Adjusted R-squared:  0.08256 
## F-statistic: 118.9 on 3 and 3927 DF,  p-value: < 2.2e-16

This model explains only about 8% of the variance in the dependent variable. All coefficients are statistically significant.

Interpretation of coefficients:

Compared with people in poor health, those in fair health are happier by 0.36894 on average
Compared with people in poor health, those in good health are happier by 0.69123 on average
Compared with people in poor health, those in very good health are happier by 1.15028 on average
So generally, the better a person’s health, the happier he or she is
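Since health is a factor with "Poor" as the reference level, the intercept is the mean happiness of the reference group and each coefficient is an offset from it. The sketch below simply adds the reported estimates of model 1; the object names are illustrative.

```r
# Estimates copied from the model 1 summary above
intercept <- 5.72549
offsets <- c("Poor" = 0, "Fair" = 0.36894, "Good" = 0.69123, "Very good" = 1.15028)

# Predicted mean happiness index for each health group
group_means <- intercept + offsets
print(round(group_means, 3))
```

With health as the only predictor, these predicted values are the group means of the happiness index.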

Model 2 (happiness ~ health + financial situation):

model2<-lm(hapIND ~ V11 + V68, data = wvsUSA1) 
summary(model2)

## 
## Call:
## lm(formula = hapIND ~ V11 + V68, data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0612 -0.6955  0.0243  0.7294  2.9710 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.667153   0.088570  52.694  < 2e-16 ***
## V112        0.336978   0.088797   3.795  0.00015 ***
## V113        0.584855   0.083492   7.005  2.9e-12 ***
## V114        0.942783   0.084260  11.189  < 2e-16 ***
## V68         0.180923   0.006556  27.597  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9875 on 3926 degrees of freedom
## Multiple R-squared:  0.2322, Adjusted R-squared:  0.2314 
## F-statistic: 296.8 on 4 and 3926 DF,  p-value: < 2.2e-16

This model includes only those predictors which were assumed to be universal. It explains ~23% of the variance. All coefficients are statistically significant.

Interpretation of coefficients:

Compared with people in poor health, those in fair health are happier by 0.336978 on average, holding financial satisfaction constant
Compared with people in poor health, those in good health are happier by 0.584855 on average
Compared with people in poor health, those in very good health are happier by 0.942783 on average
So generally, the better a person’s health, the happier he or she is
With a one-unit increase in the financial satisfaction variable, the level of happiness increases by 0.180923. This means that people who are satisfied with the financial situation of their household are generally happier

Model 3 (happiness ~ health + financial situation + freedom of choice):

model3<-lm(hapIND ~ V11 + V68 + V46, data = wvsUSA1) 
summary(model3)

## 
## Call:
## lm(formula = hapIND ~ V11 + V68 + V46, data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5167 -0.6033  0.0048  0.6599  3.1753 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.750342   0.096469  38.876  < 2e-16 ***
## V112        0.269057   0.084788   3.173  0.00152 ** 
## V113        0.469508   0.079872   5.878 4.49e-09 ***
## V114        0.785450   0.080786   9.723  < 2e-16 ***
## V68         0.147451   0.006482  22.750  < 2e-16 ***
## V46         0.164472   0.008349  19.701  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9422 on 3925 degrees of freedom
## Multiple R-squared:  0.3013, Adjusted R-squared:  0.3004 
## F-statistic: 338.5 on 5 and 3925 DF,  p-value: < 2.2e-16

Here we start adding culture-specific predictors. This model adds only one, freedom of choice, which raises the explained variance by approximately 7 percentage points (from ~23% to ~30%). All coefficients are statistically significant.

Interpretation of coefficients:

Compared with people in poor health, those in fair health are happier by 0.269057 on average, holding the other predictors constant
Compared with people in poor health, those in good health are happier by 0.469508 on average
Compared with people in poor health, those in very good health are happier by 0.785450 on average
So generally, the better a person’s health, the happier he or she is
With a one-unit increase in the financial satisfaction variable, the level of happiness increases by 0.147451. This means that people who are satisfied with the financial situation of their household are generally happier
With a one-unit increase in the choice variable, the level of happiness increases by 0.164472. This means that the more freedom of choice a person has, the happier he or she is

Model 4 (happiness ~ health + financial satisfaction + freedom of choice + hard work):

model4<-lm(hapIND ~ V11 + V68 + V46 + V120, data = wvsUSA1) 
summary(model4)

## 
## Call:
## lm(formula = hapIND ~ V11 + V68 + V46 + V120, data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5059 -0.6169  0.0219  0.6547  3.3359 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.897671   0.101400  38.438  < 2e-16 ***
## V112         0.277721   0.084592   3.283  0.00104 ** 
## V113         0.481076   0.079707   6.036 1.73e-09 ***
## V114         0.792382   0.080593   9.832  < 2e-16 ***
## V68          0.144328   0.006500  22.203  < 2e-16 ***
## V46          0.160632   0.008369  19.194  < 2e-16 ***
## V120        -0.029814   0.006474  -4.605 4.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9398 on 3924 degrees of freedom
## Multiple R-squared:  0.305,  Adjusted R-squared:  0.304 
## F-statistic: 287.1 on 6 and 3924 DF,  p-value: < 2.2e-16

This model contains both universal and culture-specific predictors. The two culture-specific variables together add only about 7 percentage points of explained variance to the model with two universal predictors (R-squared rises from 0.2322 to 0.305). Therefore, the hypothesis stated at the beginning is supported: universal predictors like health and financial satisfaction explain a bigger share of variance than the culture-specific ones. Note that these increments depend on the order in which the predictors are entered. All coefficients are statistically significant.

Interpretation of coefficients:

Compared with people in poor health, those in fair health are happier by 0.277721 on average, holding the other predictors constant
Compared with people in poor health, those in good health are happier by 0.481076 on average
Compared with people in poor health, those in very good health are happier by 0.792382 on average
So generally, the better a person’s health, the happier he or she is
With a one-unit increase in the financial satisfaction variable, the level of happiness increases by 0.144328. This means that people who are satisfied with the financial situation of their household are generally happier
With a one-unit increase in the choice variable, the level of happiness increases by 0.160632. This means that the more freedom of choice a person has, the happier he or she is
With a one-unit increase in the hard work variable, the level of happiness decreases by 0.029814. Because higher values of V120 mean a weaker belief that hard work brings success, this means that the less a person believes in hard work, the less happy he or she is; in other words, people who value hard work are slightly happier.

Comparing models:

anova(model1,model2)

## Analysis of Variance Table
## 
## Model 1: hapIND ~ V11
## Model 2: hapIND ~ V11 + V68
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3927 4571.4                                  
## 2   3926 3828.7  1    742.69 761.57 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(model2,model3)

## Analysis of Variance Table
## 
## Model 1: hapIND ~ V11 + V68
## Model 2: hapIND ~ V11 + V68 + V46
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3926 3828.7                                  
## 2   3925 3484.2  1    344.52 388.11 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(model3,model4)

## Analysis of Variance Table
## 
## Model 1: hapIND ~ V11 + V68 + V46
## Model 2: hapIND ~ V11 + V68 + V46 + V120
##   Res.Df    RSS Df Sum of Sq     F    Pr(>F)    
## 1   3925 3484.2                                 
## 2   3924 3465.4  1    18.731 21.21 4.246e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In each step of the model comparison the F-test is significant, which means that every added predictor significantly improves the fit over the previous model. Therefore, we can claim that model 4 (which contains both universal and culture-specific predictors) is the best of the four.
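The R-squared values and the last F statistic can be recomputed from the residual sums of squares reported in the tables above. This is a minimal sketch: the total sum of squares is backed out from the reported R-squared of model 1, so the results agree with the summaries only up to rounding, and the object names are illustrative.

```r
# Residual sums of squares copied from the anova() tables above
rss <- c(model1 = 4571.4, model2 = 3828.7, model3 = 3484.2, model4 = 3465.4)

# Total sum of squares, backed out from model 1: R2 = 1 - RSS / TSS
r2_model1 <- 0.08326
tss <- rss[["model1"]] / (1 - r2_model1)

# R-squared of every model and the gain from each added predictor
r2 <- 1 - rss / tss
delta_r2 <- diff(c(0, r2))
print(round(rbind(r2, delta_r2), 4))

# F statistic for model 3 vs model 4:
# (reduction in RSS / added parameters) / (RSS of larger model / its residual df)
f_34 <- (18.731 / 1) / (rss[["model4"]] / 3924)
print(round(f_34, 2))
```

Read this way, financial satisfaction adds about 15 percentage points of explained variance, freedom of choice about 7, and hard work less than half a point, given the order in which the predictors were entered.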

Checking for multicollinearity:

vif(model4)

##          GVIF Df GVIF^(1/(2*Df))
## V11  1.043999  3        1.007202
## V68  1.112580  1        1.054789
## V46  1.114957  1        1.055915
## V120 1.031463  1        1.015610

All values are close to 1 and well below the common threshold of 5. Therefore, it can be concluded that we do not have multicollinearity.
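For a numeric predictor, the variance inflation factor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below illustrates this on simulated data (the variables `x1`, `x2`, `x3` are made up and are not WVS variables); for factors such as V11, `car::vif()` reports the generalised version (GVIF) shown above instead.

```r
set.seed(1)
n <- 500

# Simulated predictors: x2 is mildly correlated with x1, x3 is independent
x1 <- rnorm(n)
x2 <- 0.3 * x1 + rnorm(n)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing predictor j on the rest
vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif_manual, 3))
```

A value of 1 means the predictor is uncorrelated with the others; the larger the value, the more its standard error is inflated by overlap with the other predictors.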

Model diagnostics:

par(mfrow = c(2,2))
plot(model4)

Residuals vs Fitted: the dots are not quite evenly dispersed around zero, which suggests that we face the problem of heteroscedasticity
Normal Q-Q: the plot suggests that the residuals are approximately normally distributed
Residuals vs Leverage: there are no high-leverage or influential cases, as the Cook’s distance lines are not present on the last plot

Checking for heteroscedasticity again:

bptest(model4)

## 
##  studentized Breusch-Pagan test
## 
## data:  model4
## BP = 89.518, df = 6, p-value < 2.2e-16

ncvTest(model4)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 80.54891, Df = 1, p = < 2.22e-16

Both tests produce significant p-values, which confirms that we have the problem of heteroscedasticity. The other linear regression assumptions have been met. Under heteroscedasticity the coefficient estimates remain unbiased, but the usual standard errors, and therefore the p-values, may be unreliable. Heteroscedasticity is a common problem for cross-sectional studies like this one: in the case of the USA, for example, values may differ considerably between states, being rather small in some and rather large in others, especially taking into account that the USA is a rather diverse country.
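One common way to deal with unreliable standard errors is to use heteroscedasticity-consistent (robust) standard errors. This was not done in the analysis above; the sketch below only illustrates the HC0 (White) sandwich estimator in base R on simulated data, where the error spread grows with the predictor. All names are illustrative.

```r
set.seed(42)
n <- 1000

# Simulated data with error variance that increases with x
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

X <- model.matrix(fit)   # design matrix (intercept and x)
e <- residuals(fit)      # OLS residuals

# HC0 sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)
vcov_hc0 <- bread %*% meat %*% bread

robust_se    <- sqrt(diag(vcov_hc0))
classical_se <- sqrt(diag(vcov(fit)))
print(cbind(classical_se, robust_se))
```

The coefficients themselves are unchanged; only the standard errors, and the tests built on them, differ between the two columns.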

Adding non-linear effect in our model:

model6 <-lm(hapIND ~ V11 + V68 + V46 + V120 + V4 + poly(V237, 3, raw = TRUE), data = wvsUSA1)
summary(model6)

## 
## Call:
## lm(formula = hapIND ~ V11 + V68 + V46 + V120 + V4 + poly(V237, 
##     3, raw = TRUE), data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4204 -0.6053  0.0203  0.6549  3.2831 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 4.502e+00  2.834e-01  15.886  < 2e-16 ***
## V112                        3.204e-01  8.416e-02   3.807 0.000143 ***
## V113                        5.502e-01  8.004e-02   6.875 7.20e-12 ***
## V114                        8.867e-01  8.184e-02  10.835  < 2e-16 ***
## V68                         1.354e-01  6.671e-03  20.299  < 2e-16 ***
## V46                         1.581e-01  8.326e-03  18.991  < 2e-16 ***
## V120                       -2.721e-02  6.439e-03  -4.225 2.44e-05 ***
## V4Rather important         -4.400e-01  6.793e-02  -6.478 1.04e-10 ***
## V4Not very important       -2.188e-01  1.662e-01  -1.317 0.187974    
## V4Not at all important     -5.642e-02  3.312e-01  -0.170 0.864746    
## poly(V237, 3, raw = TRUE)1 -4.770e-02  1.758e-02  -2.713 0.006702 ** 
## poly(V237, 3, raw = TRUE)2  1.034e-03  3.617e-04   2.859 0.004279 ** 
## poly(V237, 3, raw = TRUE)3 -6.343e-06  2.317e-06  -2.737 0.006221 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9322 on 3918 degrees of freedom
## Multiple R-squared:  0.3172, Adjusted R-squared:  0.3151 
## F-statistic: 151.7 on 12 and 3918 DF,  p-value: < 2.2e-16

The summary table shows that all three age terms are significant: the linear term is negative (-4.770e-02), the quadratic term is positive (1.034e-03) and the cubic term is negative (-6.343e-06). Taken together, they imply that predicted happiness first decreases with age, then increases, and then decreases again at the oldest ages. The individual polynomial coefficients should not be read separately as per-year changes.

Plotting model with non-linear effect:

ggplot(wvsUSA1, aes(V237, hapIND)) +
  geom_point() +
  # stat_smooth() has no `model` argument, so a cubic in age is fitted directly
  # (age only, without the other predictors of model6)
  stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE))

The plot illustrates the trend described above. People are least happy at around 35-40 years old. This is probably because growing older brings many responsibilities, which makes people a little more anxious, so the level of happiness drops. After 40, when children are more or less grown up and people are more stable financially and emotionally, they start to enjoy life more because they have more time for themselves. When people reach approximately 80 years old, their happiness starts to decrease again, probably because during this period they begin to experience many health problems.
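The ages at which the fitted cubic changes direction can be found by setting its derivative to zero. The sketch below uses the rounded age coefficients from the summary of model6, so the results are approximate; the object names are illustrative.

```r
# Age coefficients of the raw cubic polynomial, copied from the summary above
b1 <- -4.770e-02
b2 <-  1.034e-03
b3 <- -6.343e-06

# Derivative of b1*age + b2*age^2 + b3*age^3 is b1 + 2*b2*age + 3*b3*age^2;
# polyroot() takes coefficients in increasing order of power
turning_points <- sort(Re(polyroot(c(b1, 2 * b2, 3 * b3))))

# Sign of the second derivative: positive = local minimum, negative = local maximum
second_derivative <- 2 * b2 + 6 * b3 * turning_points
print(round(turning_points, 1))
print(ifelse(second_derivative > 0, "minimum", "maximum"))
```

With these rounded coefficients the age effect has its minimum in the early-to-mid thirties and its maximum in the mid seventies, holding the other predictors constant.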

Adding interaction effect

model7 <- lm(hapIND ~ V11 + V68 + V120 + V4 + V46*V235, data = wvsUSA1)
summary(model7)

## 
## Call:
## lm(formula = hapIND ~ V11 + V68 + V120 + V4 + V46 * V235, data = wvsUSA1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4973 -0.6183  0.0192  0.6524  3.2311 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             4.059443   0.118631  34.219  < 2e-16 ***
## V112                    0.298394   0.084258   3.541 0.000403 ***
## V113                    0.493619   0.079418   6.215 5.65e-10 ***
## V114                    0.802530   0.080301   9.994  < 2e-16 ***
## V68                     0.144087   0.006476  22.251  < 2e-16 ***
## V120                   -0.029121   0.006446  -4.518 6.44e-06 ***
## V4Rather important     -0.436179   0.068120  -6.403 1.70e-10 ***
## V4Not very important   -0.193597   0.166839  -1.160 0.245966    
## V4Not at all important -0.004553   0.331725  -0.014 0.989051    
## V46                     0.139543   0.011429  12.210  < 2e-16 ***
## V235Female             -0.259634   0.124122  -2.092 0.036523 *  
## V46:V235Female          0.036358   0.015781   2.304 0.021283 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9347 on 3919 degrees of freedom
## Multiple R-squared:  0.3134, Adjusted R-squared:  0.3115 
## F-statistic: 162.7 on 11 and 3919 DF,  p-value: < 2.2e-16

plot_model(model7, type = "int")

Here we added an interaction between the freedom of choice variable and the gender variable.

Interpretation of the result

The effect of freedom of choice on happiness is not the same for men and women, which means that we do have an interaction effect. The slope of freedom of choice is 0.139543 for men and 0.139543 + 0.036358 = 0.175901 for women, so happiness grows faster with freedom of choice for women. At low levels of freedom of choice women are predicted to be less happy than men (the coefficient for Female is -0.259634), but the two lines on the plot cross at a freedom of choice value of roughly 7, after which the predicted happiness of women exceeds that of men.
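The gender-specific slopes and the point where the two predicted lines meet follow directly from the coefficients of model7. A short worked sketch using the reported estimates; the object names are illustrative.

```r
# Estimates copied from the model7 summary above
b_choice      <-  0.139543   # V46 (slope for men, the reference group)
b_female      <- -0.259634   # V235Female (gap at V46 = 0)
b_interaction <-  0.036358   # V46:V235Female (extra slope for women)

slope_men   <- b_choice
slope_women <- b_choice + b_interaction

# The lines cross where the female gap is zero: b_female + b_interaction * V46 = 0
crossing <- -b_female / b_interaction

print(c(slope_men = slope_men, slope_women = slope_women, crossing = crossing))
```

The crossing point does not depend on the other predictors, because they shift the predictions for men and women by the same amount.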

Conclusion

The analysis answers the research question. Culture-specific predictors, namely the extent to which a person values hard work and the extent to which he or she is free in making choices and decisions, add relatively little to the model with only universal predictors. However, they still significantly increase the share of explained variance. Therefore, considering only universal predictors is not sufficient; it is still important to take into account the culture-specific features of the country.

Data Analysis 1st task

Anna Gorobtsova, group 172

21 04 2020
