library(readr)
library(dplyr)
library(lattice)
library(ggplot2)
# install.packages("mi")
library(mi)
library(missForest)
library(misty)
library(lme4)
library(sjPlot)
library(performance)
library(effects)
library(DHARMa)
library(car)
library(see)
library(insight)
library(mvnmle)

wvs7 <- read_csv("~/Downloads/wvs7.csv")

Introduction

The literature on financial satisfaction relates it to diverse factors, including financial behaviour, financial stress, income and financial knowledge. I decided to focus on financial behaviour as a predictor since it was one of the most frequently mentioned predictors in the papers I reviewed (Falahati et al., 2012; Joo & Grable, 2004; Paudel & Ghising, 2024). One of these papers reports that greater financial satisfaction results from a positive attitude towards finances, adequate financial knowledge and efficient financial management techniques (Paudel & Ghising, 2024). How people manage their finances, including saving and investing, is a major factor in how satisfied they are with their overall financial situation. Therefore, I focus on variables that describe respondents’ financial situation and behaviour, such as whether they have enough money for necessities, their income and their savings patterns.

Research question: What are the key factors within financial behaviour that affect satisfaction with the financial situation?

References:

Falahati, L., Sabri, M. F., & Paim, L. H. (2012). Assessment a model of financial satisfaction predictors: Examining the mediate effect of financial behaviour and financial strain. World Applied Sciences Journal, 20(2), 190-197.

Joo, S. H., & Grable, J. E. (2004). An exploratory framework of the determinants of financial satisfaction. Journal of Family and Economic Issues, 25, 25-50.

Paudel, S. R., & Ghising, M. (2024). Predictors of Financial Satisfaction: Mediating Role of Financial Behavior. Journal of Emerging Management Studies, 2(2), 1-18.

Variables

Outcome: Satisfaction with the financial situation (Q50- How satisfied are you with the financial situation of your household?)

First level predictors:

Q51 Frequency you/family (last 12 month): Gone without enough food to eat
Q53 Frequency you/family (last 12 month): Gone without needed medicine or treatment that you needed
Q286 Family savings during past year
Q285 Are you the chief wage earner in your house
Q279 Employment status
Q288 Income
Q262 Age
Q260 Gender

Second level predictors:

B_COUNTRY countries
regionWB Geographic region with 7 groups
clfhrat Civil Liberties rating (1=high to 7=low)

Hypotheses

I have the following hypotheses:

H1: Those who have gone without enough food to eat have lower financial satisfaction

H2: Those who have gone without needed medicine or treatment have lower financial satisfaction

H3: People who save money have a higher level of financial satisfaction

H4: Primary wage earners in a household will have a lower level of financial satisfaction

H5: Unemployed people will have a lower level of financial satisfaction than employed people

H6: People with higher income will have a higher level of financial satisfaction

H7: Financial satisfaction levels will differ between countries from different geographic regions

H8: Countries with higher mean income will have a higher level of financial satisfaction

H9: Countries with a higher level of civil liberties will have a higher level of financial satisfaction

Selecting variables.

data1 = wvs7 %>% select (B_COUNTRY, Q50, Q262, Q260, Q288, Q51, Q53, Q286, Q285, Q279, regionWB, clfhrat)

Selection of countries.

I have 7 regions in my data set, so I decided to choose 3 countries from each region, with one exception (North America) where I could pick only 2. Therefore, I have 20 countries in my data set. The number after each region name is its regionWB code.

East Asia & Pacific (7): China, Japan, Indonesia

Europe & Central Asia (6): Germany, Serbia, Russia

Latin America & Caribbean (5): Argentina, Mexico, Colombia

Middle East & North Africa (4): Egypt, Iran, Morocco

North America (3): United States, Canada

South Asia (2): India, Bangladesh, Pakistan

Sub-Saharan Africa (1): Nigeria, Kenya, Zimbabwe

data = data1 %>% filter (B_COUNTRY %in% c(156, 392, 360, 276, 688, 643, 32, 484, 170, 818, 364, 504, 840, 124, 356, 50, 586, 566, 404, 716))

lookup = c(country = "B_COUNTRY", fin_satisf="Q50", age="Q262", sex="Q260", income="Q288",
            food="Q51", medicine="Q53", savings="Q286", earner="Q285", employement="Q279", region="regionWB", liberties="clfhrat")

data = rename(data, all_of(lookup))

summary (data)

##     country      fin_satisf          age              sex        
##  Min.   : 32   Min.   :-5.000   Min.   : -5.00   Min.   :-2.000  
##  1st Qu.:156   1st Qu.: 5.000   1st Qu.: 28.00   1st Qu.: 1.000  
##  Median :364   Median : 6.000   Median : 40.00   Median : 2.000  
##  Mean   :405   Mean   : 6.104   Mean   : 41.73   Mean   : 1.507  
##  3rd Qu.:586   3rd Qu.: 8.000   3rd Qu.: 53.00   3rd Qu.: 2.000  
##  Max.   :840   Max.   :10.000   Max.   :100.00   Max.   : 2.000  
##      income            food         medicine         savings      
##  Min.   :-5.000   Min.   :-5.0   Min.   :-5.000   Min.   :-5.000  
##  1st Qu.: 3.000   1st Qu.: 3.0   1st Qu.: 2.000   1st Qu.: 1.000  
##  Median : 5.000   Median : 4.0   Median : 4.000   Median : 2.000  
##  Mean   : 4.571   Mean   : 3.4   Mean   : 3.227   Mean   : 1.965  
##  3rd Qu.: 6.000   3rd Qu.: 4.0   3rd Qu.: 4.000   3rd Qu.: 2.000  
##  Max.   :10.000   Max.   : 4.0   Max.   : 4.000   Max.   : 4.000  
##      earner        employement         region        liberties    
##  Min.   :-5.000   Min.   :-5.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.:3.000   1st Qu.:1.000  
##  Median : 2.000   Median : 3.000   Median :4.000   Median :4.000  
##  Mean   : 1.509   Mean   : 3.196   Mean   :4.234   Mean   :3.588  
##  3rd Qu.: 2.000   3rd Qu.: 5.000   3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   : 2.000   Max.   : 8.000   Max.   :7.000   Max.   :6.000

Descriptive analysis.

Changing type of variables.

data$sex = as.factor (data$sex)
data$country = as.factor (data$country)
data$food = as.factor (data$food)
data$medicine = as.factor (data$medicine)
data$savings = as.factor (data$savings)
data$earner = as.factor (data$earner)
data$employement = as.factor (data$employement)
data$region = as.factor (data$region)
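The column-by-column conversion above can also be written as a single step. The sketch below is a base R illustration on made-up toy data; `toy` and `cat_cols` are hypothetical names, not objects from this analysis.

```r
# Toy data standing in for the renamed WVS columns (values are made up)
toy <- data.frame(
  sex = c(1, 2, 2, -2),
  food = c(4, 3, 4, 1),
  age = c(25, 40, 31, 58)
)

# Hypothetical vector naming the columns to treat as categorical
cat_cols <- c("sex", "food")

# Convert every listed column to a factor in one step
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

str(toy)
```

Keeping the column names in one vector means a new categorical variable only has to be added in one place.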

summary (data)

##     country        fin_satisf          age         sex            income      
##  124    : 4018   Min.   :-5.000   Min.   : -5.00   -2:    5   Min.   :-5.000  
##  360    : 3200   1st Qu.: 5.000   1st Qu.: 28.00   -1:    2   1st Qu.: 3.000  
##  156    : 3036   Median : 6.000   Median : 40.00   1 :17406   Median : 5.000  
##  840    : 2596   Mean   : 6.104   Mean   : 41.73   2 :17942   Mean   : 4.571  
##  586    : 1995   3rd Qu.: 8.000   3rd Qu.: 53.00              3rd Qu.: 6.000  
##  643    : 1810   Max.   :10.000   Max.   :100.00              Max.   :10.000  
##  (Other):18700                                                                
##  food       medicine   savings    earner      employement    region  
##  -5:    1   -5:    2   -5:   26   -5:   40   1      :11834   1:3718  
##  -2:   47   -2:   60   -2:  349   -2:  164   3      : 5438   2:4887  
##  -1:   54   -1:   66   -1:  515   -1:   99   5      : 5226   3:6614  
##  1 : 1897   1 : 2650   1 :10230   1 :16112   4      : 3695   4:3899  
##  2 : 4661   2 : 6074   2 :15579   2 :18940   7      : 3128   5:4264  
##  3 : 5641   3 : 6508   3 : 5207              2      : 2837   6:4384  
##  4 :23054   4 :19995   4 : 3449              (Other): 3197   7:7589  
##    liberties    
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :4.000  
##  Mean   :3.588  
##  3rd Qu.:5.000  
##  Max.   :6.000  
##

Adding country names.

data$country <- factor(data$country, 
                            levels = c(156, 392, 360, 276, 688, 643, 32, 484, 170, 818, 364, 504, 840, 124, 356, 50, 586, 566, 404, 716), 
                            labels = c("China", "Japan", "Indonesia", "Germany", "Serbia", "Russia", "Argentina", "Mexico", "Colombia", "Egypt", "Iran", "Morocco", "USA", "Canada", "India", "Bangladesh", "Pakistan", "Nigeria", "Kenya", "Zimbabwe"))
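Two long parallel vectors of codes and labels are easy to misalign. As a possible alternative, a named vector keeps each code next to its label. This is only a sketch on three of the countries; `country_lookup` and `codes` are hypothetical names.

```r
# Hypothetical lookup: names are the numeric country codes, values are labels
country_lookup <- c("156" = "China", "392" = "Japan", "360" = "Indonesia")

# Made-up codes standing in for the country column
codes <- c(392, 156, 156, 360)

# Build the labelled factor from the lookup, so codes and labels stay paired
country <- factor(codes,
                  levels = as.numeric(names(country_lookup)),
                  labels = unname(country_lookup))

table(country)
```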

Analyzing distribution of variables.

densityplot(~data$age | data$country)

In some countries the age distribution is skewed to the right, which may partly reflect missing values, since these are still coded as negative numbers at this stage.
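One way to check how much the negative codes contribute is to compute their share within each country. The following is a minimal base R sketch on made-up data, assuming (as in this data set) that every negative value is a missing-value code; `toy` and `neg_share` are hypothetical names.

```r
# Made-up ages for two hypothetical countries; negative values are missing codes
toy <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  age = c(25, -5, 40, 31, 58, 19, -2, -1)
)

# Proportion of negative (missing-coded) ages within each country
neg_share <- tapply(toy$age < 0, toy$country, mean)
neg_share
```

A country with a share close to zero is unlikely to owe the shape of its density plot to the missing codes.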

densityplot(~data$fin_satisf | data$country)

Overall, individuals tend to report fairly high levels of financial satisfaction (above 5). However, there are differences between countries: some distributions are much more strongly skewed to the left.

densityplot(~data$income | data$country)

The distributions look quite close to normal in most countries, but in some cases the negative missing-value codes create a peak on the left.

ggplot(data, aes(x=sex)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

The proportions of the two sexes seem similar in all countries.

ggplot(data, aes(x=food)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

Most people report category 4 (“Never”), but there are already marked differences between countries: in some the categories are at similar levels, while in others “Never” has far more responses than the rest.

ggplot(data, aes(x=medicine)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

The pattern is the same as for the previous variable, which makes sense since they measure similar things.

ggplot(data, aes(x=savings)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

The distributions seem to differ a lot between countries, but overall people mostly report 2 (“Just get by”).

ggplot(data, aes(x=earner)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

The trends differ by country, but the difference does not seem too drastic in most cases.

ggplot(data, aes(x=employement)) + geom_bar(fill = "lightblue") + facet_wrap(vars(country))

Employed people usually make up the largest share of the sample, but the balance between categories differs across countries.

Creating 2nd level variable - mean income.

# Note: income still contains the negative missing-value codes at this point
data = data %>%
  group_by(country) %>%
  mutate(mean_income = mean(income, na.rm = TRUE)) %>%
  ungroup()
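At this stage the negative missing-value codes have not yet been recoded as NA, so `na.rm` does not exclude them from the average. A possible variant that excludes them first is sketched below in base R on made-up data. `toy` and `income_valid` are hypothetical names, and `ave()` is used here in place of the `group_by()`/`mutate()` pattern only to keep the example free of package dependencies.

```r
# Made-up incomes for two hypothetical countries; negative values are missing codes
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  income = c(4, 6, -1, 2, -2, 7)
)

# Treat the negative codes as missing before averaging
income_valid <- ifelse(toy$income < 1, NA, toy$income)

# Country-level mean of the valid incomes, attached to every row
toy$mean_income <- ave(income_valid, toy$country,
                       FUN = function(x) mean(x, na.rm = TRUE))

toy
```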

Checking how the data is distributed within variables.

table(data$sex)

## 
##    -2    -1     1     2 
##     5     2 17406 17942

table(data$income)

## 
##   -5   -2   -1    1    2    3    4    5    6    7    8    9   10 
##   19  416  334 3729 2437 3934 4606 7468 5254 3981 1926  611  640

table(data$fin_satisf)

## 
##   -5   -2   -1    1    2    3    4    5    6    7    8    9   10 
##    3   80   64 2611 1247 2110 2605 5075 4514 5129 5374 2594 3949

table(data$food)

## 
##    -5    -2    -1     1     2     3     4 
##     1    47    54  1897  4661  5641 23054

table(data$medicine)

## 
##    -5    -2    -1     1     2     3     4 
##     2    60    66  2650  6074  6508 19995

table(data$savings)

## 
##    -5    -2    -1     1     2     3     4 
##    26   349   515 10230 15579  5207  3449

table(data$earner)

## 
##    -5    -2    -1     1     2 
##    40   164    99 16112 18940

table(data$employement)

## 
##    -5    -2    -1     1     2     3     4     5     6     7     8 
##    43   347    65 11834  2837  5438  3695  5226  2305  3128   437

table(data$region)

## 
##    1    2    3    4    5    6    7 
## 3718 4887 6614 3899 4264 4384 7589

table(data$liberties)

## 
##    1    2    3    4    5    6 
## 9495 1003 5999 4466 6847 7545

Keeping in mind that negative values are missing-value codes, every category has enough observations thanks to the large sample, so no variable has to be recoded.
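The same check can be summarised as the smallest valid category count per variable. This is a rough base R sketch on made-up data, assuming the factor levels are still the numeric codes; `min_valid_count` is a hypothetical helper, not part of any package used here.

```r
# Hypothetical helper: smallest count among categories with a positive code
min_valid_count <- function(x) {
  tab <- table(x)
  valid <- as.numeric(names(tab)) > 0  # drop the negative missing-value codes
  min(tab[valid])
}

# Made-up categorical variables; negative values are missing codes
toy <- data.frame(
  savings = factor(c(1, 2, 2, 3, 4, -1, 2, 1)),
  earner = factor(c(1, 2, 2, 1, -2, 1, 2, 2))
)

min_counts <- sapply(toy, min_valid_count)
min_counts
```

A small minimum would point to a category that might need to be merged with a neighbouring one.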

Missings.

summary (data)

##       country        fin_satisf          age         sex       
##  Canada   : 4018   Min.   :-5.000   Min.   : -5.00   -2:    5  
##  Indonesia: 3200   1st Qu.: 5.000   1st Qu.: 28.00   -1:    2  
##  China    : 3036   Median : 6.000   Median : 40.00   1 :17406  
##  USA      : 2596   Mean   : 6.104   Mean   : 41.73   2 :17942  
##  Pakistan : 1995   3rd Qu.: 8.000   3rd Qu.: 53.00             
##  Russia   : 1810   Max.   :10.000   Max.   :100.00             
##  (Other)  :18700                                               
##      income       food       medicine   savings    earner      employement   
##  Min.   :-5.000   -5:    1   -5:    2   -5:   26   -5:   40   1      :11834  
##  1st Qu.: 3.000   -2:   47   -2:   60   -2:  349   -2:  164   3      : 5438  
##  Median : 5.000   -1:   54   -1:   66   -1:  515   -1:   99   5      : 5226  
##  Mean   : 4.571   1 : 1897   1 : 2650   1 :10230   1 :16112   4      : 3695  
##  3rd Qu.: 6.000   2 : 4661   2 : 6074   2 :15579   2 :18940   7      : 3128  
##  Max.   :10.000   3 : 5641   3 : 6508   3 : 5207              2      : 2837  
##                   4 :23054   4 :19995   4 : 3449              (Other): 3197  
##  region     liberties      mean_income   
##  1:3718   Min.   :1.000   Min.   :3.455  
##  2:4887   1st Qu.:1.000   1st Qu.:4.095  
##  3:6614   Median :4.000   Median :4.443  
##  4:3899   Mean   :3.588   Mean   :4.571  
##  5:4264   3rd Qu.:5.000   3rd Qu.:4.952  
##  6:4384   Max.   :6.000   Max.   :5.615  
##  7:7589

Recoding missings as NA.

data = data %>%
  mutate(income = ifelse(income < 1, NA, income))

data = data %>%
  mutate(fin_satisf = ifelse(fin_satisf < 1, NA, fin_satisf))

data = data %>%
  mutate(age = ifelse(age < 1, NA, age))

data = data %>%
  mutate(liberties = ifelse(liberties < 1, NA, liberties))

data = data %>%
  mutate(sex = case_when(
    sex %in% c("-1", "-2") ~ NA,
    TRUE ~ as.character(sex))) %>%
  mutate(sex = factor(sex))

data = data %>%
  mutate(food = case_when(
    food %in% c("-1", "-2", "-5") ~ NA,
    TRUE ~ as.character(food))) %>%
  mutate(food = factor(food))

data = data %>%
  mutate(medicine = case_when(
    medicine %in% c("-1", "-2", "-5") ~ NA,
    TRUE ~ as.character(medicine))) %>%
  mutate(medicine = factor(medicine))

data = data %>%
  mutate(savings = case_when(
    savings %in% c("-1", "-2", "-5") ~ NA,
    TRUE ~ savings)) %>%
  mutate(savings = factor(savings))

data = data %>%
  mutate(earner = case_when(
    earner %in% c("-1", "-2", "-5") ~ NA,
    TRUE ~ earner)) %>%
  mutate(earner = factor(earner))

data = data %>%
  mutate(employement = case_when(
    employement %in% c("-1", "-2", "-5", "-4") ~ NA,
    TRUE ~ employement)) %>%
  mutate(employement = factor(employement))
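The factor recodes above all follow one rule: any level with a negative code becomes NA. That rule could be wrapped in a small helper and applied to several columns at once. The sketch below is base R on made-up data and assumes the levels are still numeric codes (that is, before the levels are renamed); `neg_levels_to_na` and `cat_cols` are hypothetical names.

```r
# Hypothetical helper: keep only levels with a positive code.
# Values whose level is dropped become NA.
neg_levels_to_na <- function(f) {
  f <- as.factor(f)
  keep <- levels(f)[as.numeric(levels(f)) > 0]
  factor(f, levels = keep)
}

# Made-up categorical variables; negative values are missing codes
toy <- data.frame(
  food = factor(c(4, 3, -1, 4, -5, 1)),
  earner = factor(c(1, 2, 2, -2, 1, 1))
)

cat_cols <- c("food", "earner")
toy[cat_cols] <- lapply(toy[cat_cols], neg_levels_to_na)

summary(toy)
```

Because the helper tests the sign of the code instead of listing specific values, it would also catch a negative code that is missing from a hand-written list.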

Renaming levels.

data$food <- factor(data$food, 
                            levels = c(1, 2, 3, 4), 
                            labels = c("Often", "Sometimes", "Rarely", "Never"))

data$medicine <- factor(data$medicine, 
                            levels = c(1, 2, 3, 4), 
                            labels = c("Often", "Sometimes", "Rarely", "Never"))

data$savings <- factor(data$savings, 
                            levels = c(1, 2, 3, 4), 
                            labels = c("Save money", "Just get by", "Spent some savings and borrowed money", "Spent savings and borrowed money"))

data$earner <- factor(data$earner, 
                            levels = c(1, 2), 
                            labels = c("Yes", "No"))

data$employement <- factor(data$employement, 
                            levels = c(1, 2, 3, 4, 5, 6, 7, 8), 
                            labels = c("Full time", "Part time", "Self employed", "Retired/pensioned", "Housewife not otherwise employed", "Student", "Unemployed", "Other"))

data$sex <- factor(data$sex, 
                            levels = c(1, 2), 
                            labels = c("Male", "Female"))

data$region <- factor(data$region, 
                            levels = c(1, 2, 3, 4, 5, 6, 7), 
                            labels = c("Sub-Saharan Africa", "South Asia", "North America", "Middle East/North Africa", "LA|Caribbean", "Europe|Central Asia", "East Asia | Pacific"))

summary (data)

##       country        fin_satisf          age             sex       
##  Canada   : 4018   Min.   : 1.000   Min.   : 16.00   Male  :17406  
##  Indonesia: 3200   1st Qu.: 5.000   1st Qu.: 28.00   Female:17942  
##  China    : 3036   Median : 6.000   Median : 40.00   NA's  :    7  
##  USA      : 2596   Mean   : 6.136   Mean   : 41.77                 
##  Pakistan : 1995   3rd Qu.: 8.000   3rd Qu.: 53.00                 
##  Russia   : 1810   Max.   :10.000   Max.   :100.00                 
##  (Other)  :18700   NA's   :147      NA's   :25                     
##      income              food            medicine    
##  Min.   : 1.000   Often    : 1897   Often    : 2650  
##  1st Qu.: 3.000   Sometimes: 4661   Sometimes: 6074  
##  Median : 5.000   Rarely   : 5641   Rarely   : 6508  
##  Mean   : 4.709   Never    :23054   Never    :19995  
##  3rd Qu.: 6.000   NA's     :  102   NA's     :  128  
##  Max.   :10.000                                      
##  NA's   :769                                         
##                                   savings       earner     
##  Save money                           :10230   Yes :16112  
##  Just get by                          :15579   No  :18940  
##  Spent some savings and borrowed money: 5207   NA's:  303  
##  Spent savings and borrowed money     : 3449               
##  NA's                                 :  890               
##                                                            
##                                                            
##                            employement                         region    
##  Full time                       :11834   Sub-Saharan Africa      :3718  
##  Self employed                   : 5438   South Asia              :4887  
##  Housewife not otherwise employed: 5226   North America           :6614  
##  Retired/pensioned               : 3695   Middle East/North Africa:3899  
##  Unemployed                      : 3128   LA|Caribbean            :4264  
##  (Other)                         : 5579   Europe|Central Asia     :4384  
##  NA's                            :  455   East Asia | Pacific     :7589  
##    liberties      mean_income   
##  Min.   :1.000   Min.   :3.455  
##  1st Qu.:1.000   1st Qu.:4.095  
##  Median :4.000   Median :4.443  
##  Mean   :3.588   Mean   :4.571  
##  3rd Qu.:5.000   3rd Qu.:4.952  
##  Max.   :6.000   Max.   :5.615  
##

Analysing patterns in missings.

str (data)

## tibble [35,355 × 13] (S3: tbl_df/tbl/data.frame)
##  $ country    : Factor w/ 20 levels "China","Japan",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ fin_satisf : num [1:35355] 4 5 10 10 6 10 3 4 10 5 ...
##  $ age        : num [1:35355] 50 34 35 71 37 58 32 34 65 27 ...
##  $ sex        : Factor w/ 2 levels "Male","Female": 1 2 1 2 2 2 1 2 1 1 ...
##  $ income     : num [1:35355] 5 3 5 7 4 4 5 4 5 5 ...
##  $ food       : Factor w/ 4 levels "Often","Sometimes",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ medicine   : Factor w/ 4 levels "Often","Sometimes",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ savings    : Factor w/ 4 levels "Save money","Just get by",..: 2 NA NA NA 4 NA 4 3 2 2 ...
##  $ earner     : Factor w/ 2 levels "Yes","No": 1 1 1 2 1 2 1 2 1 2 ...
##  $ employement: Factor w/ 8 levels "Full time","Part time",..: 1 1 1 4 3 4 3 2 3 3 ...
##  $ region     : Factor w/ 7 levels "Sub-Saharan Africa",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ liberties  : num [1:35355] 2 2 2 2 2 2 2 2 2 2 ...
##  $ mean_income: num [1:35355] 4.73 4.73 4.73 4.73 4.73 ...

data = as.data.frame(data)
mdf = missing_data.frame (data) 
show (mdf)

## Object of class missing_data.frame with 35355 observations on 13 variables
## 
## There are 61 missing data patterns
## 
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
## 
##                              type missing method  model
## country     unordered-categorical       0   <NA>   <NA>
## fin_satisf             continuous     147    ppd linear
## age                    continuous      25    ppd linear
## sex                        binary       7    ppd  logit
## income                 continuous     769    ppd linear
## food        unordered-categorical     102    ppd mlogit
## medicine    unordered-categorical     128    ppd mlogit
## savings     unordered-categorical     890    ppd mlogit
## earner                     binary     303    ppd  logit
## employement unordered-categorical     455    ppd mlogit
## region      unordered-categorical       0   <NA>   <NA>
## liberties              continuous       0   <NA>   <NA>
## mean_income            continuous       0   <NA>   <NA>
## 
##                  family     link transformation
## country            <NA>     <NA>           <NA>
## fin_satisf     gaussian identity    standardize
## age            gaussian identity    standardize
## sex            binomial    logit           <NA>
## income         gaussian identity    standardize
## food        multinomial    logit           <NA>
## medicine    multinomial    logit           <NA>
## savings     multinomial    logit           <NA>
## earner         binomial    logit           <NA>
## employement multinomial    logit           <NA>
## region             <NA>     <NA>           <NA>
## liberties          <NA>     <NA>    standardize
## mean_income        <NA>     <NA>    standardize

Correcting the variable types in the mdf object: food, medicine and savings are ordered categories.

mdf = change (mdf, y = "food", what = "type", to = "ordered-categorical")
mdf = change (mdf, y = "medicine", what = "type", to = "ordered-categorical")
mdf = change (mdf, y = "savings", what = "type", to = "ordered-categorical")
show (mdf)

## Object of class missing_data.frame with 35355 observations on 13 variables
## 
## There are 61 missing data patterns
## 
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
## 
##                              type missing method  model
## country     unordered-categorical       0   <NA>   <NA>
## fin_satisf             continuous     147    ppd linear
## age                    continuous      25    ppd linear
## sex                        binary       7    ppd  logit
## income                 continuous     769    ppd linear
## food          ordered-categorical     102    ppd ologit
## medicine      ordered-categorical     128    ppd ologit
## savings       ordered-categorical     890    ppd ologit
## earner                     binary     303    ppd  logit
## employement unordered-categorical     455    ppd mlogit
## region      unordered-categorical       0   <NA>   <NA>
## liberties              continuous       0   <NA>   <NA>
## mean_income            continuous       0   <NA>   <NA>
## 
##                  family     link transformation
## country            <NA>     <NA>           <NA>
## fin_satisf     gaussian identity    standardize
## age            gaussian identity    standardize
## sex            binomial    logit           <NA>
## income         gaussian identity    standardize
## food        multinomial    logit           <NA>
## medicine    multinomial    logit           <NA>
## savings     multinomial    logit           <NA>
## earner         binomial    logit           <NA>
## employement multinomial    logit           <NA>
## region             <NA>     <NA>           <NA>
## liberties          <NA>     <NA>    standardize
## mean_income        <NA>     <NA>    standardize

na.test(data)

##  Little's MCAR Test
## 
##       n nIncomp nPattern    chi2  df     p
##   35355    2161       61 2390.68 599 0.000

The p-value is less than 0.05, so we reject the null hypothesis that the data are missing completely at random (MCAR).

Assessing the patterns.

image (mdf)

The largest numbers of missing values are in savings, income, employement and earner. The missing values do not appear to be interlinked and make up a small part of the data (2,161 of 35,355 observations, about 6%, have at least one), so we can impute them.
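The size of the problem can also be put into numbers by computing the share of missing values per variable and the share of incomplete rows. This is a minimal base R sketch on made-up data; `toy`, `miss_by_var` and `miss_rows` are hypothetical names.

```r
# Made-up data with a few missing values
toy <- data.frame(
  income = c(5, NA, 3, 7, NA, 4, 6, 2),
  savings = c(1, 2, NA, 2, 1, 3, 4, 2),
  age = c(30, 41, 52, 28, 19, 60, 33, 45)
)

# Share of missing values in each variable
miss_by_var <- colMeans(is.na(toy))

# Share of rows with at least one missing value
miss_rows <- mean(!complete.cases(toy))

miss_by_var
miss_rows
```

Applied to the full data, the second quantity should correspond to nIncomp divided by n in the output of Little's test.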

Imputation.

Splitting the data set into one data set per country.

cntry1 = data %>% filter (country == "China")
cntry2 = data %>% filter (country == "Japan")
cntry3 = data %>% filter (country == "Indonesia")
cntry4 = data %>% filter (country == "Germany")
cntry5 = data %>% filter (country == "Serbia")
cntry6 = data %>% filter (country == "Russia")
cntry8 = data %>% filter (country == "Argentina")
cntry9 = data %>% filter (country == "Mexico")
cntry10 = data %>% filter (country == "Colombia")
cntry11 = data %>% filter (country == "Egypt")
cntry12 = data %>% filter (country == "Iran")
cntry13 = data %>% filter (country == "Morocco")
cntry14 = data %>% filter (country == "USA")
cntry15 = data %>% filter (country == "Canada")
cntry17 = data %>% filter (country == "India")
cntry18 = data %>% filter (country == "Bangladesh")
cntry19 = data %>% filter (country == "Pakistan")
cntry20 = data %>% filter (country == "Nigeria")
cntry21 = data %>% filter (country == "Kenya")
cntry22 = data %>% filter (country == "Zimbabwe")

set.seed(123)
imputed_missForest = missForest(cntry1, verbose = F)

data_imputed1 <- data.frame (
  country = cntry1$country,
  income_n = imputed_missForest$ximp$income,
  fin_satisf_n = imputed_missForest$ximp$fin_satisf,
  age_n = imputed_missForest$ximp$age,
  sex_n = imputed_missForest$ximp$sex,
  food_n = imputed_missForest$ximp$food,
  medicine_n = imputed_missForest$ximp$medicine,
  savings_n = imputed_missForest$ximp$savings, 
  earner_n = imputed_missForest$ximp$earner,
  employement_n = imputed_missForest$ximp$employement,
  region = cntry1$region,
  liberties_n = imputed_missForest$ximp$liberties, 
  mean_income = cntry1$mean_income
)
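Each country goes through the same filter, impute and collect steps, so the pattern could be written once as a split-apply-combine loop. The sketch below is base R on made-up data. `impute_by_group` and `median_impute` are hypothetical helpers, and the median imputer is only a stand-in so the example runs without extra packages; it is not the random forest imputation used in this analysis. The `_n` renaming of columns is left out.

```r
# Hypothetical helper: split by a grouping column, impute each piece, recombine.
# `impute_fun` is any function that takes a data frame and returns it filled in.
impute_by_group <- function(df, group, impute_fun, seed = 123) {
  pieces <- split(df, df[[group]], drop = TRUE)
  filled <- lapply(pieces, function(piece) {
    set.seed(seed)  # same seed before each group, as in the per-country calls
    impute_fun(piece)
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

# Stand-in imputer (an assumption): fill numeric columns with the group median
median_impute <- function(d) {
  for (col in names(d)) {
    if (is.numeric(d[[col]])) {
      d[[col]][is.na(d[[col]])] <- median(d[[col]], na.rm = TRUE)
    }
  }
  d
}

# Made-up data for two countries
toy <- data.frame(
  country = factor(c("China", "China", "China", "Japan", "Japan", "Japan")),
  income = c(4, NA, 6, 2, 9, NA)
)

toy_imputed <- impute_by_group(toy, "country", median_impute)
toy_imputed

# With missForest installed, the stand-in could be swapped for:
# impute_by_group(data, "country",
#                 function(d) missForest::missForest(d, verbose = FALSE)$ximp)
```

Note that the recombined rows come back ordered by group, which may differ from the original row order.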

set.seed(123) 
imputed_missForest2 = missForest(cntry2, verbose = F)

data_imputed2 <- data.frame (
  country = cntry2$country,
  income_n = imputed_missForest2$ximp$income,
  fin_satisf_n = imputed_missForest2$ximp$fin_satisf,
  age_n = imputed_missForest2$ximp$age,
  sex_n = imputed_missForest2$ximp$sex,
  food_n = imputed_missForest2$ximp$food,
  medicine_n = imputed_missForest2$ximp$medicine,
  savings_n = imputed_missForest2$ximp$savings, 
  earner_n = imputed_missForest2$ximp$earner,
  employement_n = imputed_missForest2$ximp$employement,
  region = cntry2$region,
  liberties_n = imputed_missForest2$ximp$liberties, 
  mean_income = cntry2$mean_income
)

set.seed(123)
imputed_missForest3 = missForest(cntry3, verbose = F)

data_imputed3 <- data.frame (
  country = cntry3$country,
  income_n = imputed_missForest3$ximp$income,
  fin_satisf_n = imputed_missForest3$ximp$fin_satisf,
  age_n = imputed_missForest3$ximp$age,
  sex_n = imputed_missForest3$ximp$sex,
  food_n = imputed_missForest3$ximp$food,
  medicine_n = imputed_missForest3$ximp$medicine,
  savings_n = imputed_missForest3$ximp$savings, 
  earner_n = imputed_missForest3$ximp$earner,
  employement_n = imputed_missForest3$ximp$employement,
  region = cntry3$region,
  liberties_n = imputed_missForest3$ximp$liberties, 
  mean_income = cntry3$mean_income
)

set.seed(123)
imputed_missForest4 = missForest(cntry4, verbose = F)

data_imputed4 <- data.frame (
  country = cntry4$country,
  income_n = imputed_missForest4$ximp$income,
  fin_satisf_n = imputed_missForest4$ximp$fin_satisf,
  age_n = imputed_missForest4$ximp$age,
  sex_n = imputed_missForest4$ximp$sex,
  food_n = imputed_missForest4$ximp$food,
  medicine_n = imputed_missForest4$ximp$medicine,
  savings_n = imputed_missForest4$ximp$savings, 
  earner_n = imputed_missForest4$ximp$earner,
  employement_n = imputed_missForest4$ximp$employement,
  region = cntry4$region,
  liberties_n = imputed_missForest4$ximp$liberties, 
  mean_income = cntry4$mean_income
)

set.seed(123)
imputed_missForest5 = missForest(cntry5, verbose = F)

data_imputed5 <- data.frame (
  country = cntry5$country,
  income_n = imputed_missForest5$ximp$income,
  fin_satisf_n = imputed_missForest5$ximp$fin_satisf,
  age_n = imputed_missForest5$ximp$age,
  sex_n = imputed_missForest5$ximp$sex,
  food_n = imputed_missForest5$ximp$food,
  medicine_n = imputed_missForest5$ximp$medicine,
  savings_n = imputed_missForest5$ximp$savings, 
  earner_n = imputed_missForest5$ximp$earner,
  employement_n = imputed_missForest5$ximp$employement,
  region = cntry5$region,
  liberties_n = imputed_missForest5$ximp$liberties, 
  mean_income = cntry5$mean_income
)

set.seed(123)
imputed_missForest6 = missForest(cntry6, verbose = F)

data_imputed6 <- data.frame (
  country = cntry6$country,
  income_n = imputed_missForest6$ximp$income,
  fin_satisf_n = imputed_missForest6$ximp$fin_satisf,
  age_n = imputed_missForest6$ximp$age,
  sex_n = imputed_missForest6$ximp$sex,
  food_n = imputed_missForest6$ximp$food,
  medicine_n = imputed_missForest6$ximp$medicine,
  savings_n = imputed_missForest6$ximp$savings, 
  earner_n = imputed_missForest6$ximp$earner,
  employement_n = imputed_missForest6$ximp$employement,
  region = cntry6$region,
  liberties_n = imputed_missForest6$ximp$liberties, 
  mean_income = cntry6$mean_income
)

set.seed(123)
imputed_missForest8 = missForest(cntry8, verbose = F)

data_imputed8 <- data.frame (
  country = cntry8$country,
  income_n = imputed_missForest8$ximp$income,
  fin_satisf_n = imputed_missForest8$ximp$fin_satisf,
  age_n = imputed_missForest8$ximp$age,
  sex_n = imputed_missForest8$ximp$sex,
  food_n = imputed_missForest8$ximp$food,
  medicine_n = imputed_missForest8$ximp$medicine,
  savings_n = imputed_missForest8$ximp$savings, 
  earner_n = imputed_missForest8$ximp$earner,
  employement_n = imputed_missForest8$ximp$employement,
  region = cntry8$region,
  liberties_n = imputed_missForest8$ximp$liberties, 
  mean_income = cntry8$mean_income
)

set.seed(123)
imputed_missForest9 = missForest(cntry9, verbose = F)

data_imputed9 <- data.frame (
  country = cntry9$country,
  income_n = imputed_missForest9$ximp$income,
  fin_satisf_n = imputed_missForest9$ximp$fin_satisf,
  age_n = imputed_missForest9$ximp$age,
  sex_n = imputed_missForest9$ximp$sex,
  food_n = imputed_missForest9$ximp$food,
  medicine_n = imputed_missForest9$ximp$medicine,
  savings_n = imputed_missForest9$ximp$savings, 
  earner_n = imputed_missForest9$ximp$earner,
  employement_n = imputed_missForest9$ximp$employement,
  region = cntry9$region,
  liberties_n = imputed_missForest9$ximp$liberties, 
  mean_income = cntry9$mean_income
)

set.seed(123)
imputed_missForest10 = missForest(cntry10, verbose = F)

data_imputed10 <- data.frame (
  country = cntry10$country,
  income_n = imputed_missForest10$ximp$income,
  fin_satisf_n = imputed_missForest10$ximp$fin_satisf,
  age_n = imputed_missForest10$ximp$age,
  sex_n = imputed_missForest10$ximp$sex,
  food_n = imputed_missForest10$ximp$food,
  medicine_n = imputed_missForest10$ximp$medicine,
  savings_n = imputed_missForest10$ximp$savings, 
  earner_n = imputed_missForest10$ximp$earner,
  employement_n = imputed_missForest10$ximp$employement,
  region = cntry10$region,
  liberties_n = imputed_missForest10$ximp$liberties, 
  mean_income = cntry10$mean_income
)

set.seed(123)
imputed_missForest11 = missForest(cntry11, verbose = F)

data_imputed11 <- data.frame (
  country = cntry11$country,
  income_n = imputed_missForest11$ximp$income,
  fin_satisf_n = imputed_missForest11$ximp$fin_satisf,
  age_n = imputed_missForest11$ximp$age,
  sex_n = imputed_missForest11$ximp$sex,
  food_n = imputed_missForest11$ximp$food,
  medicine_n = imputed_missForest11$ximp$medicine,
  savings_n = imputed_missForest11$ximp$savings, 
  earner_n = imputed_missForest11$ximp$earner,
  employement_n = imputed_missForest11$ximp$employement,
  region = cntry11$region,
  liberties_n = imputed_missForest11$ximp$liberties, 
  mean_income = cntry11$mean_income
)

set.seed(123)
imputed_missForest12 = missForest(cntry12, verbose = F)

data_imputed12 <- data.frame (
  country = cntry12$country,
  income_n = imputed_missForest12$ximp$income,
  fin_satisf_n = imputed_missForest12$ximp$fin_satisf,
  age_n = imputed_missForest12$ximp$age,
  sex_n = imputed_missForest12$ximp$sex,
  food_n = imputed_missForest12$ximp$food,
  medicine_n = imputed_missForest12$ximp$medicine,
  savings_n = imputed_missForest12$ximp$savings, 
  earner_n = imputed_missForest12$ximp$earner,
  employement_n = imputed_missForest12$ximp$employement,
  region = cntry12$region,
  liberties_n = imputed_missForest12$ximp$liberties, 
  mean_income = cntry12$mean_income
)

set.seed(123)
imputed_missForest13 = missForest(cntry13, verbose = F)

data_imputed13 <- data.frame (
  country = cntry13$country,
  income_n = imputed_missForest13$ximp$income,
  fin_satisf_n = imputed_missForest13$ximp$fin_satisf,
  age_n = imputed_missForest13$ximp$age,
  sex_n = imputed_missForest13$ximp$sex,
  food_n = imputed_missForest13$ximp$food,
  medicine_n = imputed_missForest13$ximp$medicine,
  savings_n = imputed_missForest13$ximp$savings, 
  earner_n = imputed_missForest13$ximp$earner,
  employement_n = imputed_missForest13$ximp$employement,
  region = cntry13$region,
  liberties_n = imputed_missForest13$ximp$liberties, 
  mean_income = cntry13$mean_income
)

set.seed(123)
imputed_missForest14 = missForest(cntry14, verbose = F)

data_imputed14 <- data.frame (
  country = cntry14$country,
  income_n = imputed_missForest14$ximp$income,
  fin_satisf_n = imputed_missForest14$ximp$fin_satisf,
  age_n = imputed_missForest14$ximp$age,
  sex_n = imputed_missForest14$ximp$sex,
  food_n = imputed_missForest14$ximp$food,
  medicine_n = imputed_missForest14$ximp$medicine,
  savings_n = imputed_missForest14$ximp$savings, 
  earner_n = imputed_missForest14$ximp$earner,
  employement_n = imputed_missForest14$ximp$employement,
  region = cntry14$region,
  liberties_n = imputed_missForest14$ximp$liberties, 
  mean_income = cntry14$mean_income
)

set.seed(123)
imputed_missForest15 = missForest(cntry15, verbose = F)

data_imputed15 <- data.frame (
  country = cntry15$country,
  income_n = imputed_missForest15$ximp$income,
  fin_satisf_n = imputed_missForest15$ximp$fin_satisf,
  age_n = imputed_missForest15$ximp$age,
  sex_n = imputed_missForest15$ximp$sex,
  food_n = imputed_missForest15$ximp$food,
  medicine_n = imputed_missForest15$ximp$medicine,
  savings_n = imputed_missForest15$ximp$savings, 
  earner_n = imputed_missForest15$ximp$earner,
  employement_n = imputed_missForest15$ximp$employement,
  region = cntry15$region,
  liberties_n = imputed_missForest15$ximp$liberties, 
  mean_income = cntry15$mean_income
)

set.seed(123)
imputed_missForest17 = missForest(cntry17, verbose = F)

data_imputed17 <- data.frame (
  country = cntry17$country,
  income_n = imputed_missForest17$ximp$income,
  fin_satisf_n = imputed_missForest17$ximp$fin_satisf,
  age_n = imputed_missForest17$ximp$age,
  sex_n = imputed_missForest17$ximp$sex,
  food_n = imputed_missForest17$ximp$food,
  medicine_n = imputed_missForest17$ximp$medicine,
  savings_n = imputed_missForest17$ximp$savings, 
  earner_n = imputed_missForest17$ximp$earner,
  employement_n = imputed_missForest17$ximp$employement,
  region = cntry17$region,
  liberties_n = imputed_missForest17$ximp$liberties, 
  mean_income = cntry17$mean_income
)

set.seed(123)
imputed_missForest18 = missForest(cntry18, verbose = F)

data_imputed18 <- data.frame (
  country = cntry18$country,
  income_n = imputed_missForest18$ximp$income,
  fin_satisf_n = imputed_missForest18$ximp$fin_satisf,
  age_n = imputed_missForest18$ximp$age,
  sex_n = imputed_missForest18$ximp$sex,
  food_n = imputed_missForest18$ximp$food,
  medicine_n = imputed_missForest18$ximp$medicine,
  savings_n = imputed_missForest18$ximp$savings, 
  earner_n = imputed_missForest18$ximp$earner,
  employement_n = imputed_missForest18$ximp$employement,
  region = cntry18$region,
  liberties_n = imputed_missForest18$ximp$liberties, 
  mean_income = cntry18$mean_income
)

set.seed(123)
imputed_missForest19 = missForest(cntry19, verbose = F)

data_imputed19 <- data.frame (
  country = cntry19$country,
  income_n = imputed_missForest19$ximp$income,
  fin_satisf_n = imputed_missForest19$ximp$fin_satisf,
  age_n = imputed_missForest19$ximp$age,
  sex_n = imputed_missForest19$ximp$sex,
  food_n = imputed_missForest19$ximp$food,
  medicine_n = imputed_missForest19$ximp$medicine,
  savings_n = imputed_missForest19$ximp$savings, 
  earner_n = imputed_missForest19$ximp$earner,
  employement_n = imputed_missForest19$ximp$employement,
  region = cntry19$region,
  liberties_n = imputed_missForest19$ximp$liberties, 
  mean_income = cntry19$mean_income
)

set.seed(123)
imputed_missForest20 = missForest(cntry20, verbose = F)

data_imputed20 <- data.frame (
  country = cntry20$country,
  income_n = imputed_missForest20$ximp$income,
  fin_satisf_n = imputed_missForest20$ximp$fin_satisf,
  age_n = imputed_missForest20$ximp$age,
  sex_n = imputed_missForest20$ximp$sex,
  food_n = imputed_missForest20$ximp$food,
  medicine_n = imputed_missForest20$ximp$medicine,
  savings_n = imputed_missForest20$ximp$savings, 
  earner_n = imputed_missForest20$ximp$earner,
  employement_n = imputed_missForest20$ximp$employement,
  region = cntry20$region,
  liberties_n = imputed_missForest20$ximp$liberties, 
  mean_income = cntry20$mean_income
)

set.seed(123)
imputed_missForest21 = missForest(cntry21, verbose = F)

data_imputed21 <- data.frame (
  country = cntry21$country,
  income_n = imputed_missForest21$ximp$income,
  fin_satisf_n = imputed_missForest21$ximp$fin_satisf,
  age_n = imputed_missForest21$ximp$age,
  sex_n = imputed_missForest21$ximp$sex,
  food_n = imputed_missForest21$ximp$food,
  medicine_n = imputed_missForest21$ximp$medicine,
  savings_n = imputed_missForest21$ximp$savings, 
  earner_n = imputed_missForest21$ximp$earner,
  employement_n = imputed_missForest21$ximp$employement,
  region = cntry21$region,
  liberties_n = imputed_missForest21$ximp$liberties, 
  mean_income = cntry21$mean_income
)

set.seed(123)
imputed_missForest22 = missForest(cntry22, verbose = F)

data_imputed22 <- data.frame (
  country = cntry22$country,
  income_n = imputed_missForest22$ximp$income,
  fin_satisf_n = imputed_missForest22$ximp$fin_satisf,
  age_n = imputed_missForest22$ximp$age,
  sex_n = imputed_missForest22$ximp$sex,
  food_n = imputed_missForest22$ximp$food,
  medicine_n = imputed_missForest22$ximp$medicine,
  savings_n = imputed_missForest22$ximp$savings, 
  earner_n = imputed_missForest22$ximp$earner,
  employement_n = imputed_missForest22$ximp$employement,
  region = cntry22$region,
  liberties_n = imputed_missForest22$ximp$liberties, 
  mean_income = cntry22$mean_income
)

Combining into one data set.

dataset_list = list(data_imputed1, data_imputed2, data_imputed3, data_imputed4, 
                     data_imputed5, data_imputed6, data_imputed8, data_imputed9, 
                     data_imputed10, data_imputed11, data_imputed12, data_imputed13, 
                     data_imputed14, data_imputed15, data_imputed17, data_imputed18, 
                     data_imputed19, data_imputed20, data_imputed21, data_imputed22)

combined_data = do.call(rbind, dataset_list)
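The per-country blocks above all follow the same pattern, so they could in principle be generated by a small helper instead of being written out by hand. The sketch below is only an illustration under assumptions: `build_imputed()` and `toy_impute()` are hypothetical names, the toy data are invented, and `toy_impute()` is a trivial mean/mode stand-in used so that the example runs without `missForest`. In the real workflow the `ximp` element returned by `missForest()` would take its place.

```r
# Hypothetical helper: assemble one country's data frame from the raw data
# (cntry) and an imputed copy (ximp), appending "_n" to the imputed columns.
build_imputed <- function(cntry, ximp,
                          keep = c("country", "region", "mean_income")) {
  imputed_cols <- setdiff(names(ximp), keep)
  out <- ximp[imputed_cols]
  names(out) <- paste0(imputed_cols, "_n")
  cbind(cntry[intersect(keep, names(cntry))], out)
}

# Stand-in imputer (NOT missForest): mean for numeric columns, mode for factors.
toy_impute <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (anyNA(x)) {
      if (is.numeric(x)) {
        x[is.na(x)] <- mean(x, na.rm = TRUE)
      } else {
        x[is.na(x)] <- names(which.max(table(x)))
      }
      df[[col]] <- x
    }
  }
  df
}

# Invented toy data for two countries
toy_a <- data.frame(
  country = factor("A"), income = c(1, NA, 3),
  sex = factor(c("Male", NA, "Male"), levels = c("Male", "Female"))
)
toy_b <- data.frame(
  country = factor("B"), income = c(4, 6, NA),
  sex = factor(c("Female", "Female", NA), levels = c("Male", "Female"))
)

country_list <- list(toy_a, toy_b)
imputed_list <- lapply(country_list, function(d) build_imputed(d, toy_impute(d)))
combined_toy <- do.call(rbind, imputed_list)
print(combined_toy)
```

The column order produced by this helper differs from the hand-written blocks (identifier columns come first), which does not matter for `rbind()` as long as every country is built the same way.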

Post-imputation interpretation

I will assess the imputation by visualizing NA distribution and results of the imputation.

Savings.

ggplot() + 
  geom_bar(data = data, aes(savings), fill = "red") +
  geom_bar(data = combined_data, aes(savings_n), fill = "blue", alpha = 0.5) +
  theme_minimal()

Sex.

ggplot() + 
  geom_bar(data = data, aes(sex), fill = "red") +
  geom_bar(data = combined_data, aes(sex_n), fill = "blue", alpha = 0.5) +
  theme_minimal()

Food.

ggplot() + 
  geom_bar(data = data, aes(food), fill = "red") +
  geom_bar(data = combined_data, aes(food_n), fill = "blue", alpha = 0.5) +
  theme_minimal()

Medicine.

ggplot() + 
  geom_bar(data = data, aes(medicine), fill = "red") +
  geom_bar(data = combined_data, aes(medicine_n), fill = "blue", alpha = 0.5) +
  theme_minimal()

Earner.

ggplot() + 
  geom_bar(data = data, aes(earner), fill = "red") +
  geom_bar(data = combined_data, aes(earner_n), fill = "blue", alpha = 0.5) +
  theme_minimal()

Employment.

ggplot() + 
  geom_bar(data = data, aes(employement), fill = "red") +
  geom_bar(data = combined_data, aes(employement_n), fill = "blue", alpha = 0.5) +
  theme_minimal() + coord_flip()

As the plots show, for every variable the imputed data (blue, with the former NAs filled in) follow the same distribution as the original data (red), so the main patterns are preserved.
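The visual comparison can be complemented by a numeric one. A minimal sketch, using invented toy vectors rather than the survey data: it compares category shares before imputation (ignoring NAs) with the shares after imputation. The helper name `share_shift()` is hypothetical.

```r
# Change in category shares (percentage points) between two factor vectors
share_shift <- function(before, after) {
  p_before <- prop.table(table(before))   # table() drops NAs by default
  p_after  <- prop.table(table(after))
  shift <- as.vector(p_after) - as.vector(p_before)
  names(shift) <- names(p_before)
  round(100 * shift, 2)
}

lv <- c("Often", "Sometimes", "Rarely", "Never")
before <- factor(c("Never", "Never", NA, "Rarely", "Often", "Never", NA, "Sometimes"),
                 levels = lv)
after <- before
after[is.na(after)] <- "Never"            # pretend the imputer chose "Never"

shift <- share_shift(before, after)
print(shift)
```

Shifts close to zero for every category would support the conclusion drawn from the bar charts; in this toy case the shift is deliberately large because a quarter of the values were missing.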

For the numeric variables, the imputation can be assessed by comparing the mean and median values before and after.

Data before imputation.

summary (data)

##       country        fin_satisf          age             sex       
##  Canada   : 4018   Min.   : 1.000   Min.   : 16.00   Male  :17406  
##  Indonesia: 3200   1st Qu.: 5.000   1st Qu.: 28.00   Female:17942  
##  China    : 3036   Median : 6.000   Median : 40.00   NA's  :    7  
##  USA      : 2596   Mean   : 6.136   Mean   : 41.77                 
##  Pakistan : 1995   3rd Qu.: 8.000   3rd Qu.: 53.00                 
##  Russia   : 1810   Max.   :10.000   Max.   :100.00                 
##  (Other)  :18700   NA's   :147      NA's   :25                     
##      income              food            medicine    
##  Min.   : 1.000   Often    : 1897   Often    : 2650  
##  1st Qu.: 3.000   Sometimes: 4661   Sometimes: 6074  
##  Median : 5.000   Rarely   : 5641   Rarely   : 6508  
##  Mean   : 4.709   Never    :23054   Never    :19995  
##  3rd Qu.: 6.000   NA's     :  102   NA's     :  128  
##  Max.   :10.000                                      
##  NA's   :769                                         
##                                   savings       earner     
##  Save money                           :10230   Yes :16112  
##  Just get by                          :15579   No  :18940  
##  Spent some savings and borrowed money: 5207   NA's:  303  
##  Spent savings and borrowed money     : 3449               
##  NA's                                 :  890               
##                                                            
##                                                            
##                            employement                         region    
##  Full time                       :11834   Sub-Saharan Africa      :3718  
##  Self employed                   : 5438   South Asia              :4887  
##  Housewife not otherwise employed: 5226   North America           :6614  
##  Retired/pensioned               : 3695   Middle East/North Africa:3899  
##  Unemployed                      : 3128   LA|Caribbean            :4264  
##  (Other)                         : 5579   Europe|Central Asia     :4384  
##  NA's                            :  455   East Asia | Pacific     :7589  
##    liberties      mean_income   
##  Min.   :1.000   Min.   :3.455  
##  1st Qu.:1.000   1st Qu.:4.095  
##  Median :4.000   Median :4.443  
##  Mean   :3.588   Mean   :4.571  
##  3rd Qu.:5.000   3rd Qu.:4.952  
##  Max.   :6.000   Max.   :5.615  
##

Data after imputation.

summary (combined_data)

##       country         income_n      fin_satisf_n        age_n       
##  Canada   : 4018   Min.   : 1.00   Min.   : 1.000   Min.   : 16.00  
##  Indonesia: 3200   1st Qu.: 3.00   1st Qu.: 5.000   1st Qu.: 28.00  
##  China    : 3036   Median : 5.00   Median : 6.000   Median : 40.00  
##  USA      : 2596   Mean   : 4.71   Mean   : 6.136   Mean   : 41.76  
##  Pakistan : 1995   3rd Qu.: 6.00   3rd Qu.: 8.000   3rd Qu.: 53.00  
##  Russia   : 1810   Max.   :10.00   Max.   :10.000   Max.   :100.00  
##  (Other)  :18700                                                    
##     sex_n             food_n          medicine_n   
##  Male  :17407   Often    : 1897   Often    : 2654  
##  Female:17948   Sometimes: 4664   Sometimes: 6093  
##                 Rarely   : 5647   Rarely   : 6515  
##                 Never    :23147   Never    :20093  
##                                                    
##                                                    
##                                                    
##                                  savings_n     earner_n   
##  Save money                           :10517   Yes:16243  
##  Just get by                          :16109   No :19112  
##  Spent some savings and borrowed money: 5264              
##  Spent savings and borrowed money     : 3465              
##                                                           
##                                                           
##                                                           
##                           employement_n                        region    
##  Full time                       :12068   Sub-Saharan Africa      :3718  
##  Self employed                   : 5463   South Asia              :4887  
##  Housewife not otherwise employed: 5302   North America           :6614  
##  Retired/pensioned               : 3761   Middle East/North Africa:3899  
##  Unemployed                      : 3137   LA|Caribbean            :4264  
##  Part time                       : 2841   Europe|Central Asia     :4384  
##  (Other)                         : 2783   East Asia | Pacific     :7589  
##   liberties_n     mean_income   
##  Min.   :1.000   Min.   :3.455  
##  1st Qu.:1.000   1st Qu.:4.095  
##  Median :4.000   Median :4.443  
##  Mean   :3.588   Mean   :4.571  
##  3rd Qu.:5.000   3rd Qu.:4.952  
##  Max.   :6.000   Max.   :5.615  
##

For financial satisfaction, income, and age, the median is unchanged and the mean changes only marginally (for example, the mean age moves from 41.77 to 41.76), which suggests that the imputation did not distort the distributions of these variables. Therefore, I conclude that the imputation performed well.
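One reason the means barely move is that only a small share of each numeric variable was missing in the first place. A quick check, using the NA counts printed by `summary(data)` above and the total of 35,355 respondents:

```r
n_total   <- 35355                                         # respondents in the data
n_missing <- c(fin_satisf = 147, age = 25, income = 769)   # NA counts from summary(data)

share_missing <- round(100 * n_missing / n_total, 2)       # percent of values imputed
share_missing
```

With only a small percentage of values imputed, large shifts in the mean or median would not be expected, so stable summary statistics are a necessary but not a sufficient sign of good imputation quality.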

Simple tests

Centering income within countries (centering within cluster, CWC).

combined_data$income_c = center(combined_data$income_n, type ="CWC", cluster = combined_data$country)
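Centering within cluster subtracts each country's own mean from its respondents' values, so the centered variable has mean zero inside every country. A minimal base-R sketch of that idea on invented toy data (it shows the arithmetic only and is not a claim about how `misty::center()` is implemented):

```r
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  income  = c(2, 4, 6, 7, 9)
)

# ave() returns the group mean aligned with each row
toy$income_c <- toy$income - ave(toy$income, toy$country)
print(toy)

# Group means of the centered variable are zero in each country
tapply(toy$income_c, toy$country, mean)
```

After this step the centered variable carries only within-country differences in income; differences in income level between countries are removed from it.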

First test

cor.test(combined_data$fin_satisf_n,combined_data$income_c)

## 
##  Pearson's product-moment correlation
## 
## data:  combined_data$fin_satisf_n and combined_data$income_c
## t = 60.405, df = 35353, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2963872 0.3152846
## sample estimates:
##      cor 
## 0.305866

Visualization of the test

scatter.smooth(combined_data$fin_satisf_n,combined_data$income_c)

The p-value is below 0.05, so the correlation is statistically significant, and r = 0.31 indicates a moderate positive relationship: higher income (relative to the country mean) is associated with a higher level of financial satisfaction.
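The strength of the relationship can be made more concrete with the squared correlation, and the reported t statistic can be reproduced from r and the degrees of freedom. A short worked check using only the values printed in the test output:

```r
r  <- 0.305866    # Pearson correlation from the output above
df <- 35353       # degrees of freedom from the output above

r_squared <- r^2                          # share of variance the two variables have in common
t_stat    <- r * sqrt(df) / sqrt(1 - r^2) # t statistic implied by r and df

round(c(r_squared = r_squared, t = t_stat), 3)
```

The squared correlation is roughly 0.09, so income centered within countries shares about 9% of its variance with financial satisfaction.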

Second test

t.test(combined_data$fin_satisf_n~combined_data$sex_n)

## 
##  Welch Two Sample t-test
## 
## data:  combined_data$fin_satisf_n by combined_data$sex_n
## t = 2.0914, df = 35325, p-value = 0.0365
## alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
## 95 percent confidence interval:
##  0.003583581 0.110496901
## sample estimates:
##   mean in group Male mean in group Female 
##             6.165050             6.108009

boxplot(combined_data$fin_satisf_n~combined_data$sex_n)

The p-value is below 0.05, so the difference in satisfaction between men and women is statistically significant, but the effect is very small: men score only about 0.06 points higher on average (6.17 vs. 6.11).
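With more than 35,000 observations, even tiny differences reach significance, so an effect size is useful next to the p-value. A rough sketch: Cohen's d computed from the group means reported above, with the standard deviation of financial satisfaction assumed to be about 2.56 (the residual standard error of the intercept-only model fitted later in this document). Treating that value as the pooled SD is an approximation.

```r
mean_male   <- 6.165050
mean_female <- 6.108009
sd_assumed  <- 2.564      # assumption: overall SD used in place of the pooled SD

cohens_d <- (mean_male - mean_female) / sd_assumed
round(cohens_d, 3)
```

A d of roughly 0.02 is far below the conventional threshold of 0.2 for a small effect.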

Third test

t.test(combined_data$fin_satisf_n~combined_data$earner_n)

## 
##  Welch Two Sample t-test
## 
## data:  combined_data$fin_satisf_n by combined_data$earner_n
## t = -6.0073, df = 34420, p-value = 1.905e-09
## alternative hypothesis: true difference in means between group Yes and group No is not equal to 0
## 95 percent confidence interval:
##  -0.2179316 -0.1107053
## sample estimates:
## mean in group Yes  mean in group No 
##          6.047267          6.211585

boxplot(combined_data$fin_satisf_n~combined_data$earner_n)

The p-value is below 0.05, so the difference in satisfaction between primary wage earners and other respondents is statistically significant, but the effect is very small: respondents who are not the primary wage earner score only about 0.16 points higher on average (6.21 vs. 6.05).

Fourth test

cor.test(combined_data$fin_satisf_n,combined_data$age_n)

## 
##  Pearson's product-moment correlation
## 
## data:  combined_data$fin_satisf_n and combined_data$age_n
## t = 4.3108, df = 35353, p-value = 1.631e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01250027 0.03333689
## sample estimates:
##        cor 
## 0.02292107

Visualization of the test

scatter.smooth(combined_data$fin_satisf_n,combined_data$age_n)

The p-value is below 0.05, so the relationship is statistically significant, but it is positive and very weak (r = 0.02). The near absence of a relationship is also visible in the graph.

Fifth test

TukeyHSD(aov(combined_data$fin_satisf_n~combined_data$food_n))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = combined_data$fin_satisf_n ~ combined_data$food_n)
## 
## $`combined_data$food_n`
##                       diff         lwr       upr     p adj
## Sometimes-Often  0.1758748 0.003582608 0.3481669 0.0433194
## Rarely-Often     0.7338502 0.565949807 0.9017506 0.0000000
## Never-Often      1.8583397 1.707239826 2.0094397 0.0000000
## Rarely-Sometimes 0.5579754 0.432789467 0.6831614 0.0000000
## Never-Sometimes  1.6824650 1.580916119 1.7840139 0.0000000
## Never-Rarely     1.1244896 1.030584674 1.2183944 0.0000000

boxplot(combined_data$fin_satisf_n~combined_data$food_n)

All pairwise differences are significant at the 0.05 level, although the Sometimes-Often difference is only marginally so (adjusted p = 0.043). Those who never had problems with money for food have the highest level of financial satisfaction, while those who had these problems most often have the lowest.

Sixth test

TukeyHSD(aov(combined_data$fin_satisf_n~combined_data$medicine_n))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = combined_data$fin_satisf_n ~ combined_data$medicine_n)
## 
## $`combined_data$medicine_n`
##                       diff       lwr       upr p adj
## Sometimes-Often  0.7247131 0.5763617 0.8730645     0
## Rarely-Often     1.0914435 0.9445571 1.2383299     0
## Never-Often      1.9429486 1.8112087 2.0746886     0
## Rarely-Sometimes 0.3667304 0.2530518 0.4804089     0
## Never-Sometimes  1.2182355 1.1249477 1.3115233     0
## Never-Rarely     0.8515051 0.7605652 0.9424451     0

boxplot(combined_data$fin_satisf_n~combined_data$medicine_n)

There are significant differences between all groups. Those who never had problems with money for medical treatment have the highest level of financial satisfaction, while those who had these problems most often have the lowest.

Seventh test

TukeyHSD(aov(combined_data$fin_satisf_n~combined_data$savings_n))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = combined_data$fin_satisf_n ~ combined_data$savings_n)
## 
## $`combined_data$savings_n`
##                                                                               diff
## Just get by-Save money                                                 -1.36712827
## Spent some savings and borrowed money-Save money                       -1.38904063
## Spent savings and borrowed money-Save money                            -2.31724414
## Spent some savings and borrowed money-Just get by                      -0.02191236
## Spent savings and borrowed money-Just get by                           -0.95011587
## Spent savings and borrowed money-Spent some savings and borrowed money -0.92820351
##                                                                               lwr
## Just get by-Save money                                                 -1.4461910
## Spent some savings and borrowed money-Save money                       -1.4955192
## Spent savings and borrowed money-Save money                            -2.4407780
## Spent some savings and borrowed money-Just get by                      -0.1220366
## Spent savings and borrowed money-Just get by                           -1.0682166
## Spent savings and borrowed money-Spent some savings and borrowed money -1.0661693
##                                                                                upr
## Just get by-Save money                                                 -1.28806558
## Spent some savings and borrowed money-Save money                       -1.28256206
## Spent savings and borrowed money-Save money                            -2.19371030
## Spent some savings and borrowed money-Just get by                       0.07821191
## Spent savings and borrowed money-Just get by                           -0.83201510
## Spent savings and borrowed money-Spent some savings and borrowed money -0.79023769
##                                                                            p adj
## Just get by-Save money                                                 0.0000000
## Spent some savings and borrowed money-Save money                       0.0000000
## Spent savings and borrowed money-Save money                            0.0000000
## Spent some savings and borrowed money-Just get by                      0.9432146
## Spent savings and borrowed money-Just get by                           0.0000000
## Spent savings and borrowed money-Spent some savings and borrowed money 0.0000000

boxplot(combined_data$fin_satisf_n~combined_data$savings_n)

There are significant differences between all groups except “Spent some savings and borrowed money-Just get by” (adjusted p = 0.94). Those who save money have the highest level of financial satisfaction, while those who spent savings and borrowed money have the lowest.

Eighth test

TukeyHSD(aov(combined_data$fin_satisf_n~combined_data$employement_n))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = combined_data$fin_satisf_n ~ combined_data$employement_n)
## 
## $`combined_data$employement_n`
##                                                           diff         lwr
## Part time-Full time                                -0.26439659 -0.42430052
## Self employed-Full time                            -0.45996376 -0.58500643
## Retired/pensioned-Full time                         0.21246843  0.06926759
## Housewife not otherwise employed-Full time         -0.10512290 -0.23146571
## Student-Full time                                  -0.03994609 -0.21296706
## Unemployed-Full time                               -1.38494240 -1.53861855
## Other-Full time                                    -0.82530400 -1.19870170
## Self employed-Part time                            -0.19556717 -0.37293735
## Retired/pensioned-Part time                         0.47686502  0.28625812
## Housewife not otherwise employed-Part time          0.15927369 -0.01901545
## Student-Part time                                   0.22445050  0.01053283
## Unemployed-Part time                               -1.12054581 -1.31914306
## Other-Part time                                    -0.56090741 -0.95492562
## Retired/pensioned-Self employed                     0.67243219  0.50995944
## Housewife not otherwise employed-Self employed      0.35484086  0.20701184
## Student-Self employed                               0.42001767  0.23073700
## Unemployed-Self employed                           -0.92497865 -1.09675545
## Other-Self employed                                -0.36534024 -0.74654459
## Housewife not otherwise employed-Retired/pensioned -0.31759133 -0.48106681
## Student-Retired/pensioned                          -0.25241452 -0.45415193
## Unemployed-Retired/pensioned                       -1.59741084 -1.78282411
## Other-Retired/pensioned                            -1.03777243 -1.42531284
## Student-Housewife not otherwise employed            0.06517681 -0.12496527
## Unemployed-Housewife not otherwise employed        -1.27981951 -1.45254503
## Other-Housewife not otherwise employed             -0.72018111 -1.10181390
## Unemployed-Student                                 -1.34499632 -1.55429959
## Other-Student                                      -0.78535791 -1.18487930
## Other-Unemployed                                    0.55963840  0.16810623
##                                                            upr     p adj
## Part time-Full time                                -0.10449266 0.0000149
## Self employed-Full time                            -0.33492109 0.0000000
## Retired/pensioned-Full time                         0.35566928 0.0001856
## Housewife not otherwise employed-Full time          0.02121992 0.1860556
## Student-Full time                                   0.13307489 0.9970245
## Unemployed-Full time                               -1.23126626 0.0000000
## Other-Full time                                    -0.45190631 0.0000000
## Self employed-Part time                            -0.01819699 0.0188812
## Retired/pensioned-Part time                         0.66747193 0.0000000
## Housewife not otherwise employed-Part time          0.33756283 0.1201368
## Student-Part time                                   0.43836817 0.0318201
## Unemployed-Part time                               -0.92194856 0.0000000
## Other-Part time                                    -0.16688920 0.0004249
## Retired/pensioned-Self employed                     0.83490494 0.0000000
## Housewife not otherwise employed-Self employed      0.50266988 0.0000000
## Student-Self employed                               0.60929834 0.0000000
## Unemployed-Self employed                           -0.75320184 0.0000000
## Other-Self employed                                 0.01586410 0.0715969
## Housewife not otherwise employed-Retired/pensioned -0.15411585 0.0000001
## Student-Retired/pensioned                          -0.05067711 0.0037294
## Unemployed-Retired/pensioned                       -1.41199756 0.0000000
## Other-Retired/pensioned                            -0.65023203 0.0000000
## Student-Housewife not otherwise employed            0.25531888 0.9685316
## Unemployed-Housewife not otherwise employed        -1.10709398 0.0000000
## Other-Housewife not otherwise employed             -0.33854831 0.0000003
## Unemployed-Student                                 -1.13569305 0.0000000
## Other-Student                                      -0.38583653 0.0000001
## Other-Unemployed                                    0.95117057 0.0003929

boxplot(combined_data$fin_satisf_n~combined_data$employement_n)

There are significant differences between almost all groups. The most important observation is that unemployed respondents have significantly lower financial satisfaction than every other employment group, including full-time workers (about 1.38 points lower).

Ninth test

cor.test(combined_data$fin_satisf_n,combined_data$mean_income)

## 
##  Pearson's product-moment correlation
## 
## data:  combined_data$fin_satisf_n and combined_data$mean_income
## t = 20.418, df = 35353, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.0976448 0.1182494
## sample estimates:
##       cor 
## 0.1079587

Visualization of the test

scatter.smooth(combined_data$fin_satisf_n,combined_data$mean_income)

The p-value is below 0.05, so the relationship is statistically significant, but it is positive and weak (r = 0.11). The weakness of the relationship is also visible in the graph.

Tenth test

cor.test(combined_data$fin_satisf_n, combined_data$liberties_n)

## 
##  Pearson's product-moment correlation
## 
## data:  combined_data$fin_satisf_n and combined_data$liberties_n
## t = -17.844, df = 35353, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.10479864 -0.08413714
## sample estimates:
##         cor 
## -0.09447806

Visualization of the test

scatter.smooth(combined_data$fin_satisf_n, combined_data$liberties_n)

The p-value is below 0.05, so the relationship is statistically significant. The coefficient is negative (r = -0.09), but because the liberties variable is reverse-coded, this corresponds to a very weak positive relationship between liberties and financial satisfaction. The near absence of a relationship is also visible in the graph.
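Reverse-coding a variable flips the sign of its correlation but leaves the magnitude unchanged, which is why a negative coefficient on a reversed scale can be read as a positive relationship. A small illustration on simulated data (the variables are invented, and the 1-to-6 range simply mirrors the summary of `liberties_n` above):

```r
set.seed(1)
liberties_rev <- sample(1:6, 200, replace = TRUE)       # reversed scale (assumed: low = more freedom)
satisfaction  <- 7 - 0.3 * liberties_rev + rnorm(200)   # simulated outcome

liberties_nat <- 7 - liberties_rev                      # recode so that high = more freedom

r_rev <- cor(satisfaction, liberties_rev)
r_nat <- cor(satisfaction, liberties_nat)
c(reversed = r_rev, natural = r_nat)
```

The two coefficients have the same absolute value and opposite signs, so recoding the variable before analysis would only change how the result reads, not its strength.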

Eleventh test

TukeyHSD(aov(combined_data$fin_satisf_n~combined_data$region))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = combined_data$fin_satisf_n ~ combined_data$region)
## 
## $`combined_data$region`
##                                                     diff         lwr
## South Asia-Sub-Saharan Africa                 2.48761401  2.32922745
## North America-Sub-Saharan Africa              1.90599539  1.75681074
## Middle East/North Africa-Sub-Saharan Africa   1.04781656  0.88098455
## LA|Caribbean-Sub-Saharan Africa               2.09825442  1.93494497
## Europe|Central Asia-Sub-Saharan Africa        1.57109171  1.40882669
## East Asia | Pacific-Sub-Saharan Africa        1.97166358  1.82596831
## North America-South Asia                     -0.58161862 -0.71890684
## Middle East/North Africa-South Asia          -1.43979744 -1.59608217
## LA|Caribbean-South Asia                      -0.38935959 -0.54187834
## Europe|Central Asia-South Asia               -0.91652230 -1.06792220
## East Asia | Pacific-South Asia               -0.51595042 -0.64943865
## Middle East/North Africa-North America       -0.85817883 -1.00513010
## LA|Caribbean-North America                    0.19225903  0.04931942
## Europe|Central Asia-North America            -0.33490369 -0.47664884
## East Asia | Pacific-North America             0.06566819 -0.05676084
## LA|Caribbean-Middle East/North Africa         1.05043785  0.88916606
## Europe|Central Asia-Middle East/North Africa  0.52327514  0.36306106
## East Asia | Pacific-Middle East/North Africa  0.92384702  0.78043946
## Europe|Central Asia-LA|Caribbean             -0.52716271 -0.68370537
## East Asia | Pacific-LA|Caribbean             -0.12659083 -0.26588470
## East Asia | Pacific-Europe|Central Asia       0.40057188  0.26250399
##                                                      upr     p adj
## South Asia-Sub-Saharan Africa                 2.64600056 0.0000000
## North America-Sub-Saharan Africa              2.05518004 0.0000000
## Middle East/North Africa-Sub-Saharan Africa   1.21464858 0.0000000
## LA|Caribbean-Sub-Saharan Africa               2.26156386 0.0000000
## Europe|Central Asia-Sub-Saharan Africa        1.73335672 0.0000000
## East Asia | Pacific-Sub-Saharan Africa        2.11735886 0.0000000
## North America-South Asia                     -0.44433040 0.0000000
## Middle East/North Africa-South Asia          -1.28351271 0.0000000
## LA|Caribbean-South Asia                      -0.23684084 0.0000000
## Europe|Central Asia-South Asia               -0.76512240 0.0000000
## East Asia | Pacific-South Asia               -0.38246219 0.0000000
## Middle East/North Africa-North America       -0.71122755 0.0000000
## LA|Caribbean-North America                    0.33519863 0.0014259
## Europe|Central Asia-North America            -0.19315853 0.0000000
## East Asia | Pacific-North America             0.18809722 0.6943696
## LA|Caribbean-Middle East/North Africa         1.21170965 0.0000000
## Europe|Central Asia-Middle East/North Africa  0.68348922 0.0000000
## East Asia | Pacific-Middle East/North Africa  1.06725458 0.0000000
## Europe|Central Asia-LA|Caribbean             -0.37062005 0.0000000
## East Asia | Pacific-LA|Caribbean              0.01270304 0.1034226
## East Asia | Pacific-Europe|Central Asia       0.53863977 0.0000000

boxplot(combined_data$fin_satisf_n~combined_data$region)

There are significant differences between all groups except “East Asia | Pacific-North America” and “East Asia | Pacific-LA|Caribbean”. Sub-Saharan Africa has the lowest level of financial satisfaction: every other region scores between about 1.0 and 2.5 points higher.

Model

Null model for the 1st level.

fit = lm(fin_satisf_n ~ 1, data=combined_data)
summary(fit)

## 
## Call:
## lm(formula = fin_satisf_n ~ 1, data = combined_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1361 -1.1361 -0.1361  1.8639  3.8639 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.13609    0.01364     450   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.564 on 35354 degrees of freedom

logLik(fit)

## 'log Lik.' -83455.38 (df=2)

nullmodel <- lmer(fin_satisf_n ~ (1 | country), data = combined_data, REML = FALSE) 
summary(nullmodel)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: fin_satisf_n ~ (1 | country)
##    Data: combined_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##  163185.4  163210.9  -81589.7  163179.4     35352 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5937 -0.6270  0.1228  0.7006  2.6119 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.8407   0.9169  
##  Residual             5.8973   2.4284  
## Number of obs: 35355, groups:  country, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   5.9799     0.2055    29.1

logLik(nullmodel)

## 'log Lik.' -81589.72 (df=3)
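The two log-likelihoods printed above can be compared with a likelihood ratio test, since both models were fitted by maximum likelihood. The sketch below simply plugs in the printed values; the variable names are mine. One assumption to keep in mind: the country variance is tested on the boundary of its parameter space (a variance cannot be negative), so the plain chi-square p-value is conservative, which does not matter here given the size of the statistic.

```r
# Log-likelihoods copied from the logLik() output above
ll_single_level <- -83455.38  # lm(fin_satisf_n ~ 1), df = 2
ll_multilevel   <- -81589.72  # lmer(fin_satisf_n ~ (1 | country)), df = 3

# Likelihood ratio statistic and difference in number of parameters
lr_stat <- 2 * (ll_multilevel - ll_single_level)
df_diff <- 3 - 2

# Upper-tail chi-square p-value
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

cat("LR statistic:", lr_stat, "on", df_diff, "df; p-value:", p_value, "\n")
```

A statistic of about 3731 on 1 degree of freedom indicates that adding the country-level random intercept improves the fit considerably.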

performance::icc(nullmodel)

## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.125
##   Unadjusted ICC: 0.125

The ICC is 0.125, which means that 12.5% of the variance in financial satisfaction lies between countries rather than between individuals within a country, so a multilevel model is justified.
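The ICC can be reproduced by hand from the variance components in the random effects table of the null model: it is the country-level variance divided by the total variance. The sketch below uses the printed values; the variable names are mine.

```r
# Variance components copied from summary(nullmodel)
var_country  <- 0.8407  # between-country (random intercept) variance
var_residual <- 5.8973  # within-country (residual) variance

# ICC = between-group variance / total variance
icc_by_hand <- var_country / (var_country + var_residual)

round(icc_by_hand, 3)
```

Rounded to three decimals, this agrees with the value returned by `performance::icc()`.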

Next, I add the first-level predictors one by one and compare each model with the previous one.

m1 <- lmer(fin_satisf_n ~ food_n + (1 | country), data = combined_data, REML = FALSE)

anova(nullmodel, m1)

## Data: combined_data
## Models:
## nullmodel: fin_satisf_n ~ (1 | country)
## m1: fin_satisf_n ~ food_n + (1 | country)
##           npar    AIC    BIC logLik -2*log(L)  Chisq Df Pr(>Chisq)    
## nullmodel    3 163185 163211 -81590    163179                         
## m1           6 160691 160742 -80339    160679 2500.7  3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1