Practical task 4

The practical task includes three types of action:

  • Running a script that is already written for you. The script embraces something new or reminds the material. I give you an explanation and interpretation of the results.
  • Answering questions. You should write down each answer.
  • Writing a script. You should write down own script and explanation of the results.

If there are no errors in the script, you can knit a document, and send me doc, Html or pdf file.

# Attaching the packages to work with
library(foreign)
library(sjPlot)
library(ggplot2)
# Loading the dataset and inspecting the variables
wvs <- read.spss("wvs RU.sav", to.data.frame=TRUE)
names(wvs)
##   [1] "S003"    "S004"    "S007"    "S009"    "S010"    "S018"    "S018a"  
##   [8] "S019"    "S019a"   "S021"    "S021A"   "S024"    "S024A"   "S025"   
##  [15] "S025A"   "S026"    "V1"      "V1A"     "V1B"     "V2"      "V2A"    
##  [22] "V3"      "V4"      "V5"      "V6"      "V7"      "V8"      "V9"     
##  [29] "V10"     "V11"     "V12"     "V13"     "V14"     "V15"     "V16"    
##  [36] "V17"     "V18"     "V19"     "V20"     "V21"     "V22"     "V23"    
##  [43] "V24"     "V25"     "V26"     "V27"     "V28"     "V29"     "V30"    
##  [50] "V31"     "V32"     "V33"     "V34"     "V35"     "V36"     "V37"    
##  [57] "V38"     "V39"     "V40"     "V41"     "V42"     "V43"     "V44"    
##  [64] "V45"     "V46"     "V47"     "V48"     "V49"     "V50"     "V51"    
##  [71] "V52"     "V53"     "V54"     "V55"     "V56"     "V57"     "V58"    
##  [78] "V59"     "V60"     "V61"     "V62"     "V63"     "V64"     "V65"    
##  [85] "V66"     "V67"     "V68"     "V69"     "V69HK"   "V70"     "V70HK"  
##  [92] "V71"     "V72"     "V73"     "V74HK"   "V74"     "V73HK"   "V75"    
##  [99] "V76"     "V77"     "V78"     "V79"     "V80"     "V81"     "V82"    
## [106] "V83"     "V84"     "V85"     "V86"     "V87"     "V88"     "V89"    
## [113] "V90"     "V91"     "V92"     "V93"     "V94"     "V95"     "V96"    
## [120] "V97"     "V98"     "V99"     "V100"    "V101"    "V102"    "V103"   
## [127] "V104"    "V105"    "V106"    "V107"    "V108"    "V109"    "V110"   
## [134] "V111"    "V112"    "V113"    "V114"    "V115"    "V116"    "V117"   
## [141] "V118"    "V119"    "V120"    "V121"    "V122"    "V123"    "V124"   
## [148] "V125"    "V126"    "V127"    "V128"    "V129"    "V130"    "V131"   
## [155] "V132"    "V133"    "V134"    "V135"    "V136"    "V137"    "V138"   
## [162] "V139"    "V140"    "V141"    "V142"    "V143"    "V144"    "V145"   
## [169] "V146"    "V147"    "V148"    "V149"    "V150"    "V151"    "V152"   
## [176] "V153"    "V154"    "V155"    "V156"    "V157"    "V158"    "V159"   
## [183] "V160"    "V161"    "V162"    "V163"    "V164"    "V165"    "V166"   
## [190] "V167"    "V168"    "V169"    "V170"    "V171"    "V172"    "V173"   
## [197] "V174"    "V175"    "V176"    "V177"    "V178"    "V179"    "V180"   
## [204] "V181"    "V182"    "V183"    "V184"    "V185"    "V186"    "V187"   
## [211] "V188"    "V189"    "V190"    "V191"    "V192"    "V193"    "V194"   
## [218] "V195"    "V196"    "V197"    "V198"    "V199"    "V200"    "V201"   
## [225] "V202"    "V203"    "V204"    "V205"    "V206"    "V207"    "V208"   
## [232] "V209"    "V210"    "V211"    "V212"    "V214"    "V215"    "V216"   
## [239] "V217"    "V218"    "V219"    "V220"    "V221"    "V222"    "V223"   
## [246] "V224"    "V225"    "V226"    "V227"    "V228"    "V229"    "V230"   
## [253] "V231"    "V232"    "V233"    "V233A"   "V234"    "V235"    "V236"   
## [260] "V237"    "V238"    "V239"    "V240"    "V241"    "V242"    "V243"   
## [267] "V244"    "V245"    "V246"    "V247"    "V248"    "V249"    "V250"   
## [274] "V251"    "V252"    "V252B"   "V253"    "V253CS"  "V254"    "V255"   
## [281] "V255CS"  "V256"    "V257"    "V258"    "V259A"   "V259"    "V260"   
## [288] "number"  "country"

There is a total of 10781 observations of 289 variables.

PART 1. Preparing the data

# Selecting particular variables to work with
wvs.ext <- data.frame(wvs$V237, wvs$V55, wvs$V238, wvs$V253, 
                      wvs$V10, wvs$V22, wvs$S009, wvs$V23,
                      wvs$V46, wvs$V205,
                      wvs$V4,wvs$V5,wvs$V6,wvs$V7,wvs$V8,wvs$V9)
names (wvs.ext) <- c ("age", "marital", "edu", "income", 
                      "happy", "satisf", "country", "trust", 
                      "choice", "divorce", "Family", 
                      "Friends", "LeisureTime", "Politics",
                      "Work", "Religion")

Recoding variables

Recoding variable “satisf”

# checking which class it belongs to
class(wvs.ext$satisf)
## [1] "factor"
# We observe that now "satisf" is a factor variable

table(wvs.ext$satisf)
## 
## Dissatisfied            2            3            4            5            6 
##          767          427          902          869         2262         1135 
##            7            8            9    Satisfied 
##         1304         1363          621          941
# All the levels except for the lowest and highest one and coded as numbers (a Likert scale). We should substitute "Satisfied" and "Dissatisfied" values with 10 and 1 accordingly 

wvs.ext$satisf1 <-   ifelse(wvs.ext$satisf=="Satisfied",10,
                             ifelse(wvs.ext$satisf=="Dissatisfied",1,
                                    wvs.ext$satisf))
table(wvs.ext$satisf, wvs.ext$satisf1)
##               
##                   1    2    3    4    5    6    7    8    9   10
##   Dissatisfied  767    0    0    0    0    0    0    0    0    0
##   2               0  427    0    0    0    0    0    0    0    0
##   3               0    0  902    0    0    0    0    0    0    0
##   4               0    0    0  869    0    0    0    0    0    0
##   5               0    0    0    0 2262    0    0    0    0    0
##   6               0    0    0    0    0 1135    0    0    0    0
##   7               0    0    0    0    0    0 1304    0    0    0
##   8               0    0    0    0    0    0    0 1363    0    0
##   9               0    0    0    0    0    0    0    0  621    0
##   Satisfied       0    0    0    0    0    0    0    0    0  941
wvs.ext$satisf <- as.numeric(wvs.ext$satisf)

# Double-checking
class(wvs.ext$satisf)
## [1] "numeric"
table(wvs.ext$satisf)
## 
##    1    2    3    4    5    6    7    8    9   10 
##  767  427  902  869 2262 1135 1304 1363  621  941

Recoding variable “happy”

class(wvs.ext$happy)
## [1] "factor"
# We observe that now "happy" is a factor variable
table(wvs.ext$happy)
## 
##       Very happy      Quite happy   Not very happy Not at all happy 
##             1079             5516             3161              507
# There is a total of four categories that can be recoeded to numbers from 1 to 4

# Vusгalizing the distribution
ggplot(wvs.ext, aes(x=happy))+
  geom_bar(fill="#b3e2cd", na.rm = TRUE)+
  labs(x="Answer", y="Number of observations", title = "Distribution of observations of the variable happy")+
  theme_bw()+
  stat_count(aes(label=..count..), vjust=0, 
                          geom="text", position="identity")

# We recode these categories using numbers
wvs.ext$happy0 <-   ifelse(wvs.ext$happy=="Not at all happy",1,
                            ifelse(wvs.ext$happy=="Not very happy",2,
                                   ifelse(wvs.ext$happy=="Quite happy",3,
                                          ifelse(wvs.ext$happy=="Very happy",4, NA))))
table(wvs.ext$happy, wvs.ext$happy0)
##                   
##                       1    2    3    4
##   Very happy          0    0    0 1079
##   Quite happy         0    0 5516    0
##   Not very happy      0 3161    0    0
##   Not at all happy  507    0    0    0
wvs.ext$happy<- as.numeric(wvs.ext$happy)

# Double-checking
class(wvs.ext$happy)
## [1] "numeric"
table(wvs.ext$happy)
## 
##    1    2    3    4 
## 1079 5516 3161  507

Recoding variable “income”

class(wvs.ext$income)
## [1] "factor"
# Similarly, the variable "income" is now a factor
table(wvs.ext$income)
## 
##   Lower step  second step   Third step  Fourth step   Fifth step   Sixth step 
##          413          718         1363         1225         1301          872 
## Seventh step   Eigth step  Nineth step   Tenth step 
##         1002          772          512          448
# There are ten factor levels starting from "Lower step" up to "Tenth step". We shall recode those to numeric values 

wvs.ext$income1 <- ifelse (wvs.ext$income=="Eigth step",8, 
                            ifelse(wvs.ext$income=="Fifth step",5, 
                                   ifelse(wvs.ext$income=="Fourth step",4, 
                                          ifelse(wvs.ext$income=="Lower step",1,
                                                 ifelse(wvs.ext$income=="Nineth step",9,
                                                        ifelse(wvs.ext$income=="second step",2,
                                                               ifelse(wvs.ext$income=="Seventh step",7,
                                                                      ifelse(wvs.ext$income=="Sixth step",6, 
                                                                             ifelse(wvs.ext$income=="Tenth step",10,
                                                                                    ifelse(wvs.ext$income=="Third step",3,NA))))))))))                                                                               
table(wvs.ext$income, wvs.ext$income1)
##               
##                   1    2    3    4    5    6    7    8    9   10
##   Lower step    413    0    0    0    0    0    0    0    0    0
##   second step     0  718    0    0    0    0    0    0    0    0
##   Third step      0    0 1363    0    0    0    0    0    0    0
##   Fourth step     0    0    0 1225    0    0    0    0    0    0
##   Fifth step      0    0    0    0 1301    0    0    0    0    0
##   Sixth step      0    0    0    0    0  872    0    0    0    0
##   Seventh step    0    0    0    0    0    0 1002    0    0    0
##   Eigth step      0    0    0    0    0    0    0  772    0    0
##   Nineth step     0    0    0    0    0    0    0    0  512    0
##   Tenth step      0    0    0    0    0    0    0    0    0  448
wvs.ext$income<- as.numeric(wvs.ext$income)

# Double-checking
class(wvs.ext$income)
## [1] "numeric"
table(wvs.ext$income)
## 
##    1    2    3    4    5    6    7    8    9   10 
##  413  718 1363 1225 1301  872 1002  772  512  448

Recoding variable “choice”

class(wvs.ext$choice)
## [1] "factor"
table(wvs.ext$choice)
## 
##  None at all            2            3            4            5            6 
##          552          307          531          650         1786          894 
##            7            8            9 A great deal 
##         1021         1175          568         1239
# In case of the "choice" variable we face the same situation as we did with the variable "satisf": we need to recode the very first and last values

wvs.ext$choice1 <- ifelse (wvs.ext$choice=="None at all",1,
                            ifelse(wvs.ext$choice=="A great deal",10, 
                                   wvs.ext$choice))                                                                               
table(wvs.ext$choice1,wvs.ext$choice)
##     
##      None at all    2    3    4    5    6    7    8    9 A great deal
##   1          552    0    0    0    0    0    0    0    0            0
##   2            0  307    0    0    0    0    0    0    0            0
##   3            0    0  531    0    0    0    0    0    0            0
##   4            0    0    0  650    0    0    0    0    0            0
##   5            0    0    0    0 1786    0    0    0    0            0
##   6            0    0    0    0    0  894    0    0    0            0
##   7            0    0    0    0    0    0 1021    0    0            0
##   8            0    0    0    0    0    0    0 1175    0            0
##   9            0    0    0    0    0    0    0    0  568            0
##   10           0    0    0    0    0    0    0    0    0         1239
wvs.ext$choice<- as.numeric(wvs.ext$choice)

# Double-checking
class(wvs.ext$choice)
## [1] "numeric"
table(wvs.ext$choice)
## 
##    1    2    3    4    5    6    7    8    9   10 
##  552  307  531  650 1786  894 1021 1175  568 1239

PART 2. Index of happiness

Creating index of happiness

# We create a new variable "happyIND" by using function rowMeans on two rows (varables) - "happy0" and "satisf1"
wvs.ext$happyIND<- rowMeans(wvs.ext[c('happy0','satisf1')], na.rm=T)                            
summary(wvs.ext$happyIND)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   3.000   4.000   4.262   5.500  10.000      60
# Mean equals to 4.262 and median equals to 4
class(wvs.ext$happyIND)
## [1] "numeric"
# Visualizing the distribution

ggplot(wvs.ext, aes(x=happyIND))+
  geom_histogram(fill="#fdcdac")+
  labs(title = "Distribution of the new variable happyIND", x="HappyIND", y="Frequency")

The distribution looks quite normal though it is probably slightly right-skewed.

Standartization

By standartizing we technically put different variables on the same scale which makes it easier to interpret all of them in the same manner and compare with each other. To put it simply, a distirubtion that was standartised has a mean of 0 and a standard deviation of 1.

# Standartizing happyIND and creating a new variable
wvs.ext$happyINDscale <- scale(wvs.ext$happyIND)

# Visualization to observe that the distribution still has the same "shape" but its position on x-axis changed
ggplot(wvs.ext, aes(x=happyINDscale))+
  geom_histogram(fill="#fdcdac")+
  labs(title = "Distribution of the new variable happyIND", x="HappyIND", y="Frequency")

Modeling the relationship between the level of happiness, income, and age

model1 <- lm(wvs.ext$happyINDscale ~ as.numeric(wvs.ext$age) + as.numeric(wvs.ext$income1) + as.numeric(wvs.ext$choice))
summary(model1)
## 
## Call:
## lm(formula = wvs.ext$happyINDscale ~ as.numeric(wvs.ext$age) + 
##     as.numeric(wvs.ext$income1) + as.numeric(wvs.ext$choice))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9537 -0.5839  0.0046  0.5627  4.6403 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -0.8561183  0.0383086  -22.35   <2e-16 ***
## as.numeric(wvs.ext$age)     -0.0090692  0.0006403  -14.16   <2e-16 ***
## as.numeric(wvs.ext$income1)  0.0482466  0.0040943   11.78   <2e-16 ***
## as.numeric(wvs.ext$choice)   0.1280611  0.0039149   32.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8773 on 7928 degrees of freedom
##   (2849 observations deleted due to missingness)
## Multiple R-squared:  0.1816, Adjusted R-squared:  0.1813 
## F-statistic: 586.6 on 3 and 7928 DF,  p-value: < 2.2e-16

The model explains 18% of the variance in the dependent variable and, judging by the small p-value (<0.05) the model is better than having no model at all (null model that is just based on the mean). All predictors are statistically significant and the regression equation can be written out as:

\[Predicted Happiness index = - 0.8561183 - 0.0090692 * age + 0.0482466 * income + 0.1280611 * choice\]

The interpretation would be:

With all predictor variables equal to 0 the predicted index of happiness would be - 0.8561183. With each one-point increase in the value of variable age the happiness index decreases by 0.0090692 while in case of a one-point increase in both income and choice variables it increases by 0.0482466 and 0.1280611 accordingly

To put it in plain words, the level of happiness decreases as a person gets older and increases when a person earns more or has more control over his life.

# Do you have outliers and leverage?
plot(model1)

By loking at the very last graph we see that there are three outliers that do not get along with the majority of other cases. However, there are no dots outside of the line of Cook’s distance (the line is not even visible on the plot).

PART 3. Interaction models

Ok, the older the person is, the lower the level of happiness is; and the richer the person is, the happier hi/she is. And we can suggest that rich people are happier even in old age. Let’s test this hypothesis!

Model 2

wvs.ext$age <- as.numeric(wvs.ext$age)
wvs.ext$income1 <- as.numeric(wvs.ext$income1)
wvs.ext$choice <- as.numeric(wvs.ext$choice)

# Building a new model with an interaction effect
model2 <- lm(wvs.ext$happyINDscale~ choice + age * income1 , 
             data= wvs.ext)
summary(model2)
## 
## Call:
## lm(formula = wvs.ext$happyINDscale ~ choice + age * income1, 
##     data = wvs.ext)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9269 -0.5843  0.0024  0.5612  4.6413 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.8167208  0.0534746 -15.273  < 2e-16 ***
## choice       0.1278022  0.0039225  32.581  < 2e-16 ***
## age         -0.0104389  0.0014465  -7.217 5.82e-13 ***
## income1      0.0407445  0.0081998   4.969 6.87e-07 ***
## age:income1  0.0002793  0.0002645   1.056    0.291    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8772 on 7927 degrees of freedom
##   (2849 observations deleted due to missingness)
## Multiple R-squared:  0.1818, Adjusted R-squared:  0.1813 
## F-statistic: 440.2 on 4 and 7927 DF,  p-value: < 2.2e-16

Though the model still explains 18% of the variance and p-value is statistically significant, the interation effect is not statistically significant (0.291).

In this case we are dealing with an interaction effect between two continious variables: age and income. We don’t really have to explore the effect further since it is not significant, but it we were to write out the equation it would look as follows:

\[Predicted Happiness index = - 0.8167208 + 0.1278022 * choice - 0.0104389 * age + 0.0407445 * income + 0.0002793 * age * income \]
The addition of the interaction effect slightly changes the coefficients of other variables. If previously we were dealing with the pure effect of the age on happiness and concluded that as the person gets older the happiness decreases, now we suppose that this effect is different for different levels of income.

Now, when the income changes it changes both the result of “0.0407445 * income” and “0.0002793 * age * income” terms. The effect of age on happiness is now a sum of the pure effect and the interaction effect. Unfortunately, the interaction effect is not significant.

plot_model(model2, type = "int")

The graph also makes it clear that the interactive effect is insignificant, which means that with age the level of happiness decreases equally for both the poor and the rich: the lines have similar slopes even though they have different intercepts.

anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: wvs.ext$happyINDscale ~ as.numeric(wvs.ext$age) + as.numeric(wvs.ext$income1) + 
##     as.numeric(wvs.ext$choice)
## Model 2: wvs.ext$happyINDscale ~ choice + age * income1
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   7928 6101.1                           
## 2   7927 6100.3  1    0.8581 1.1151  0.291

We can compare the second model to the first one because they have the same predictors and we simply add the interaction effect to the new one. As we see, the Df is equal to 1 meaning that the new model has one additional parameter (interaction effect). However, the p-value is big (>0.05) which means that the additive model was actually better than the one with the interaction effect.

What about the choice? If older people feel they have completely free choice and control over their lives, are they happier than those who think “no choice at all”?

Model 3

# Building a new model
model3 <- lm(wvs.ext$happyINDscale~ age * choice + income1 , 
             data= wvs.ext)
summary(model3)
## 
## Call:
## lm(formula = wvs.ext$happyINDscale ~ age * choice + income1, 
##     data = wvs.ext)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8555 -0.5720 -0.0009  0.5641  4.6403 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.6454136  0.0570204 -11.319  < 2e-16 ***
## age         -0.0158706  0.0015073 -10.529  < 2e-16 ***
## choice       0.0943615  0.0078117  12.080  < 2e-16 ***
## income1      0.0472345  0.0040932  11.540  < 2e-16 ***
## age:choice   0.0011569  0.0002322   4.983  6.4e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8759 on 7927 degrees of freedom
##   (2849 observations deleted due to missingness)
## Multiple R-squared:  0.1842, Adjusted R-squared:  0.1838 
## F-statistic: 447.5 on 4 and 7927 DF,  p-value: < 2.2e-16
plot_model(model3, type = "int")

While the model still explains around 18% of variance and remains better than the null model based on the mean, the interaction effect is now statistically significant (6.4e-07). It is quite clear on the plot that the line for choice=1 goes down steeper than the line for choice=10. It can be concluded that index of happiness drops more sharply in case of those with those who have “no choice at all”. The trend line for those who feel like they have control over their life is a lot flatter.

The equation would look like this:

\[Predicted Happiness index = - 0.6454136 + 0.0943615 * choice - 0.0158706 * age + 0.0472345 * income + 0.0011569 * age * choice \]
Even though the effect of age on happiness is still negative, the interaction effect is positive (even though its small) and when the “choice” is high the decrease of happiness “slows down”a little.

anova(model1, model3)
## Analysis of Variance Table
## 
## Model 1: wvs.ext$happyINDscale ~ as.numeric(wvs.ext$age) + as.numeric(wvs.ext$income1) + 
##     as.numeric(wvs.ext$choice)
## Model 2: wvs.ext$happyINDscale ~ age * choice + income1
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   7928 6101.1                                  
## 2   7927 6082.1  1    19.049 24.828 6.401e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The third model is, in fact, better than the first one judging by the small p-value and the interaction effect improved it.

Model 4

# Let's take a look at all the variables in the dataset
str(wvs.ext)
## 'data.frame':    10781 obs. of  22 variables:
##  $ age          : num  21 41 28 14 20 9 14 23 21 31 ...
##  $ marital      : Factor w/ 8 levels "Married","Living together as married",..: 1 1 1 1 1 6 1 1 1 1 ...
##  $ edu          : Factor w/ 9 levels "No formal education",..: 3 4 6 5 7 6 7 3 4 6 ...
##  $ income       : num  NA 3 6 4 4 4 NA 3 3 1 ...
##  $ happy        : num  2 3 1 1 2 2 2 2 1 2 ...
##  $ satisf       : num  5 5 4 10 5 5 6 5 10 7 ...
##  $ country      : Factor w/ 1 level "RU    ": 1 1 1 1 1 1 1 1 1 1 ...
##  $ trust        : Factor w/ 2 levels "Most people can be trusted",..: 1 2 2 2 2 2 1 2 2 1 ...
##  $ choice       : num  NA 5 4 9 8 5 10 5 9 8 ...
##  $ divorce      : Factor w/ 10 levels "Never justifiable",..: NA 1 3 1 1 1 1 1 2 5 ...
##  $ Family       : Factor w/ 4 levels "Very important",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Friends      : Factor w/ 4 levels "Very important",..: 2 3 2 2 2 2 1 2 3 1 ...
##  $ LeisureTime  : Factor w/ 4 levels "Very important",..: NA NA 3 1 3 2 2 2 1 2 ...
##  $ Politics     : Factor w/ 4 levels "Very important",..: 4 NA 3 3 3 2 1 3 2 3 ...
##  $ Work         : Factor w/ 4 levels "Very important",..: 2 NA 1 1 1 2 1 1 1 1 ...
##  $ Religion     : Factor w/ 4 levels "Very important",..: 1 NA 1 1 2 2 1 1 1 1 ...
##  $ satisf1      : num  5 5 4 10 5 5 6 5 10 7 ...
##  $ happy0       : num  3 2 4 4 3 3 3 3 4 3 ...
##  $ income1      : num  NA 3 6 4 4 4 NA 3 3 1 ...
##  $ choice1      : num  NA 5 4 9 8 5 10 5 9 8 ...
##  $ happyIND     : num  4 3.5 4 7 4 4 4.5 4 7 5 ...
##  $ happyINDscale: num [1:10781, 1] -0.17 -0.495 -0.17 1.779 -0.17 ...
##   ..- attr(*, "scaled:center")= num 4.26
##   ..- attr(*, "scaled:scale")= num 1.54
table(wvs.ext$trust)
## 
## Most people can be trusted       Canґt be too careful 
##                       2821                       7395

After checking all the variables (and trying to recode them) I have come to the conclusion that one of the variables that can actually be used for this model is “trust” (which has only two levels). I’ve decided to build a model with an interaction effect between age and trust.

# Building the model
model4 <- lm(happyINDscale ~  income1 + choice + age * trust, data=wvs.ext)
summary(model4)
## 
## Call:
## lm(formula = happyINDscale ~ income1 + choice + age * trust, 
##     data = wvs.ext)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9372 -0.5815  0.0061  0.5595  4.4663 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -0.824860   0.052183 -15.807  < 2e-16 ***
## income1                        0.048832   0.004143  11.786  < 2e-16 ***
## choice                         0.126242   0.003970  31.801  < 2e-16 ***
## age                           -0.006840   0.001230  -5.561 2.77e-08 ***
## trustCanґt be too careful     -0.027938   0.046373  -0.602    0.547    
## age:trustCanґt be too careful -0.003164   0.001440  -2.197    0.028 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8725 on 7656 degrees of freedom
##   (3119 observations deleted due to missingness)
## Multiple R-squared:  0.1832, Adjusted R-squared:  0.1827 
## F-statistic: 343.5 on 5 and 7656 DF,  p-value: < 2.2e-16

As we can see, the model still explains around 18% of the variance in the outcome variable and since p-value is small (<0.05) we can conclude that the model is better than the null model based on the mean value. The interaction effect is significant (0.028) even though the the pure effect of the variable trust is not statistically significant. Initially, judging by the coefficient for the trust the level of happiness would decrease by 0.027938 when the variable “trust” takes on the value of “Can’t be too careful”. Interestengly enough, that means that people who don’t trust others are less happy. The trend for other variables remains the same.

The equation would be as follows:

\[Predicted Happiness index = - 0.824860 + 0.126242 * choice - 0.006840 * age + 0.048832 * income - 0.003164 * age * trust \]

plot_model(model4, type = "int")

We observe that the line for those people who don’t really trust others goes down a little bit steeper than it does for those who think that most people can be trusted. Even though the relationship between age and happiness is still negative, this effect is “slowed down” by the “trust variable” for the latter group of people. Both lines have a similar intercept and the actual difference in happiness levels of these two groups can only be observed later in life. We may hypothesize that those people who trust others more make, on average, more friends as they grow older and later in life its those social connections that make them a little bit happier than those who didn’t trust people and were alone for a major part of their life.