About the data

The European Values Study (EVS) and the World Values Survey (WVS) are large, cross-national, longitudinal survey research programs that have been running since the early 1980s. Both repeat a large set of questions across survey waves. In 2017, EVS and WVS undertook a joint data collection, with EVS covering European countries and WVS covering the rest of the world.

In this document, we explore the data collected in Austria in 2018. The survey aims to provide insight into how values evolve and how they affect society, informing academic and policy discussions at both national and international levels. Here it offers a view of the social, political, and cultural dynamics within Austria.

Preparing the data

First, we summarized the data and gathered all the information in the table below. We also renamed the “Happiness” and “Life Satisfaction” variables.

#Preparing the data 
# Loading the data on rda
load("c:/users/dell/downloads/Austria.rda")

#Library
# Descriptive statistics
library(psych)
library(knitr)
library(ggplot2)
library(haven)
library(dplyr)
library(stats)

# Summary statistics
summary(data)
##    cntry_AN              year       reg_nuts1              A001      
##  Length:844         Min.   :2018   Length:844         Min.   :1.000  
##  Class :character   1st Qu.:2018   Class :character   1st Qu.:1.000  
##  Mode  :character   Median :2018   Mode  :character   Median :1.000  
##                     Mean   :2018                      Mean   :1.164  
##                     3rd Qu.:2018                      3rd Qu.:1.000  
##                     Max.   :2018                      Max.   :4.000  
##       A002            A004            A005            A006      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000  
##  Median :1.000   Median :2.000   Median :1.000   Median :3.000  
##  Mean   :1.417   Mean   :2.431   Mean   :1.649   Mean   :2.655  
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.000  
##  Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##       A008            A009            A170             A173       
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 7.000   1st Qu.: 6.000  
##  Median :2.000   Median :2.000   Median : 8.000   Median : 8.000  
##  Mean   :1.754   Mean   :1.975   Mean   : 7.919   Mean   : 7.255  
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 9.000  
##  Max.   :4.000   Max.   :5.000   Max.   :10.000   Max.   :10.000  
##       A065            A066             A067             A068        
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.00000  
##  Mean   :0.359   Mean   :0.1363   Mean   :0.1351   Mean   :0.07583  
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##       A071              A072              A074             A078        
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.0000   Median :0.00000  
##  Mean   :0.04028   Mean   :0.06043   Mean   :0.2761   Mean   :0.03199  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.0000   Max.   :1.00000  
##       A079            A080_01          A080_02           A124_02       
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.07346   Mean   :0.0936   Mean   :0.04976   Mean   :0.09479  
##  3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##     A124_03          A124_06          A124_08          A124_09      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.6789   Mean   :0.1943   Mean   :0.7855   Mean   :0.1232  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       A165            B008           D001_B        G007_18_B    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median :1.000   Median :2.000  
##  Mean   :1.491   Mean   :1.491   Mean   :1.178   Mean   :1.864  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :2.000   Max.   :3.000   Max.   :4.000   Max.   :4.000  
##    G007_33_B       G007_34_B      G007_35_B       G007_36_B          E023      
##  Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.00   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :2.000   Median :3.00   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :1.648   Mean   :2.62   Mean   :2.469   Mean   :2.387   Mean   :2.179  
##  3rd Qu.:2.000   3rd Qu.:3.00   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :4.000   Max.   :4.00   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##       E025            E026            E027            E028      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :1.000   Median :3.000   Median :2.000   Median :3.000  
##  Mean   :1.615   Mean   :2.368   Mean   :2.283   Mean   :2.685  
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :3.000   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##       E035             E037             E039           E111_01      
##  Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.: 3.000   1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 5.000  
##  Median : 5.000   Median : 5.000   Median : 3.000   Median : 7.000  
##  Mean   : 5.121   Mean   : 4.614   Mean   : 3.477   Mean   : 6.411  
##  3rd Qu.: 7.000   3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 8.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##       E117            E235             F025             F028      
##  Min.   :1.000   Min.   : 1.000   Min.   :0.0000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.: 9.000   1st Qu.:0.0000   1st Qu.:3.000  
##  Median :1.000   Median :10.000   Median :1.0000   Median :6.000  
##  Mean   :1.348   Mean   : 9.104   Mean   :0.9929   Mean   :5.351  
##  3rd Qu.:2.000   3rd Qu.:10.000   3rd Qu.:1.0000   3rd Qu.:8.000  
##  Max.   :4.000   Max.   :10.000   Max.   :9.0000   Max.   :8.000  
##       F034            F050             F063             F116       
##  Min.   :1.000   Min.   :0.0000   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 3.000   1st Qu.: 1.000  
##  Median :1.000   Median :1.0000   Median : 6.000   Median : 1.000  
##  Mean   :1.436   Mean   :0.7251   Mean   : 5.608   Mean   : 1.793  
##  3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 8.000   3rd Qu.: 2.000  
##  Max.   :3.000   Max.   :1.0000   Max.   :10.000   Max.   :10.000  
##       F118             F119             F120             F121       
##  Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.: 5.000   1st Qu.: 1.000   1st Qu.: 4.000   1st Qu.: 5.000  
##  Median : 8.000   Median : 5.000   Median : 6.000   Median : 8.000  
##  Mean   : 7.081   Mean   : 4.578   Mean   : 5.996   Mean   : 7.419  
##  3rd Qu.:10.000   3rd Qu.: 7.000   3rd Qu.: 9.000   3rd Qu.:10.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##       F122             F123             F132             G052      
##  Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 3.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.:2.000  
##  Median : 6.000   Median : 3.000   Median : 5.000   Median :3.000  
##  Mean   : 5.963   Mean   : 3.919   Mean   : 4.668   Mean   :2.883  
##  3rd Qu.: 9.000   3rd Qu.: 6.000   3rd Qu.: 7.000   3rd Qu.:4.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :5.000  
##       X001            X002           X003          G027A            X007      
##  Min.   :1.000   Min.   :1937   Min.   :18.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1953   1st Qu.:37.0   1st Qu.:1.000   1st Qu.:1.000  
##  Median :2.000   Median :1968   Median :50.0   Median :1.000   Median :3.000  
##  Mean   :1.556   Mean   :1967   Mean   :50.8   Mean   :1.105   Mean   :2.942  
##  3rd Qu.:2.000   3rd Qu.:1981   3rd Qu.:65.0   3rd Qu.:1.000   3rd Qu.:5.000  
##  Max.   :2.000   Max.   :2000   Max.   :82.0   Max.   :2.000   Max.   :6.000  
##       X011          x026_01           X013          X025A_01    
##  Min.   :0.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:0.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:3.000  
##  Median :1.000   Median :1.000   Median :2.000   Median :3.000  
##  Mean   :1.412   Mean   :1.098   Mean   :2.259   Mean   :3.547  
##  3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:3.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :4.000   Max.   :6.000   Max.   :8.000  
##       X028         X047E_EVS5    
##  Min.   :1.000   Min.   : 1.000  
##  1st Qu.:1.000   1st Qu.: 3.000  
##  Median :2.000   Median : 5.000  
##  Mean   :2.573   Mean   : 4.773  
##  3rd Qu.:4.000   3rd Qu.: 7.000  
##  Max.   :8.000   Max.   :10.000
#Colnames
colnames(data)
##  [1] "cntry_AN"   "year"       "reg_nuts1"  "A001"       "A002"      
##  [6] "A004"       "A005"       "A006"       "A008"       "A009"      
## [11] "A170"       "A173"       "A065"       "A066"       "A067"      
## [16] "A068"       "A071"       "A072"       "A074"       "A078"      
## [21] "A079"       "A080_01"    "A080_02"    "A124_02"    "A124_03"   
## [26] "A124_06"    "A124_08"    "A124_09"    "A165"       "B008"      
## [31] "D001_B"     "G007_18_B"  "G007_33_B"  "G007_34_B"  "G007_35_B" 
## [36] "G007_36_B"  "E023"       "E025"       "E026"       "E027"      
## [41] "E028"       "E035"       "E037"       "E039"       "E111_01"   
## [46] "E117"       "E235"       "F025"       "F028"       "F034"      
## [51] "F050"       "F063"       "F116"       "F118"       "F119"      
## [56] "F120"       "F121"       "F122"       "F123"       "F132"      
## [61] "G052"       "X001"       "X002"       "X003"       "G027A"     
## [66] "X007"       "X011"       "x026_01"    "X013"       "X025A_01"  
## [71] "X028"       "X047E_EVS5"
# Renaming the variables
colnames(data)[9] <- "Happiness"
colnames(data)[11] <- "Life_Satisfaction"
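
Renaming by position breaks silently if the column order ever changes. A minimal sketch of renaming by column name instead is given here; `toy_data` and its values are hypothetical stand-ins, and only the column names A008 and A170 come from the dataset.

```r
# Hypothetical stand-in for the survey data; only the column names are real.
toy_data <- data.frame(A008 = c(1, 2, 2), A009 = c(1, 3, 2), A170 = c(8, 7, 10))

# Map old names to new names, then rename only the columns that match.
new_names <- c(A008 = "Happiness", A170 = "Life_Satisfaction")
matched <- names(toy_data) %in% names(new_names)
names(toy_data)[matched] <- unname(new_names[names(toy_data)[matched]])

print(names(toy_data))
```

With this approach the rename still targets the right variables if columns are added or reordered before this step.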

Descriptive statistics

“A008” variable (happiness)

## Descriptive statistics for Happiness variable
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 844 1.753554 0.6450278 2 1.753554 0 1 4 3 0.4147694 -0.1000776 0.0222028
Summary of Happiness
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.754   2.000   4.000

“A170” variable (life satisfaction)

## Descriptive statistics for Life Satisfaction variable
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 844 7.919431 1.805002 8 7.919431 1.4826 1 10 9 -1.022233 0.7895252 0.0621307
Summary of Life Satisfaction
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   7.000   8.000   7.919   9.000  10.000

Analyzing the distribution of categorical variable “Happiness”:

For categorical variables, such as “Happiness,” which represents different categories of happiness, we decided to use a bar plot to illustrate the distribution. With this, we displayed the frequency of responses for each category of happiness:

## 
##   1   2   3   4 
## 300 457  82   5

This bar plot shows the number of respondents in each category of happiness. In this dataset, the large majority of respondents report being either “Very happy” (300) or “Quite happy” (457), together 757 of 844 (about 90%), while few report lower levels of happiness (“Not very happy”, 82, and “Not at all happy”, 5).
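
As a rough sketch, the percentages and the bar plot can be reproduced directly from the frequency table above. The counts are copied from that table; the object names and plot labels are illustrative choices, not the original code.

```r
# Counts copied from the frequency table above (codes 1 to 4).
happiness_counts <- c("Very happy" = 300, "Quite happy" = 457,
                      "Not very happy" = 82, "Not at all happy" = 5)

# Percentage of respondents in each category.
happiness_pct <- round(100 * happiness_counts / sum(happiness_counts), 1)
print(happiness_pct)

# Bar plot of the raw counts.
barplot(happiness_counts, ylab = "Number of respondents",
        main = "Distribution of Happiness")
```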

Analyzing the distribution of numerical variable “Life_Satisfaction”:

As “Life_Satisfaction” is measured on a numerical scale from 1 to 10, we can visualize its distribution using a histogram. The frequency of each of the 10 scale points is:

## 
##   1   2   3   4   5   6   7   8   9  10 
##   3   2  17  28  51  57  98 235 179 174

This histogram shows how respondents are distributed across the levels of life satisfaction. The most frequent rating is 8 (235 respondents), and 588 of 844 respondents (about 70%) rate their life satisfaction 8 or higher, indicating a relatively high level of satisfaction.
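
The summary statistics reported earlier can be checked against this frequency table. The sketch below recomputes the mean, the sample standard deviation and the share of ratings of 8 or higher from the counts alone; the variable names are illustrative.

```r
# Counts copied from the frequency table above, for scores 1 to 10.
scores <- 1:10
life_sat_counts <- c(3, 2, 17, 28, 51, 57, 98, 235, 179, 174)
n <- sum(life_sat_counts)

# Weighted mean and sample standard deviation from grouped data.
mean_ls <- sum(scores * life_sat_counts) / n
sd_ls <- sqrt(sum(life_sat_counts * (scores - mean_ls)^2) / (n - 1))

# Share of respondents with a rating of 8 or higher.
share_8_plus <- sum(life_sat_counts[scores >= 8]) / n

print(c(mean = mean_ls, sd = sd_ls, share_8_plus = share_8_plus))
```

The mean and standard deviation should agree with the descriptive statistics reported for Life Satisfaction (7.919 and 1.805).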

Relation between Life Satisfaction and Happiness

##     
##      4 - Not at all happy 3 - Not very happy 2 - Quite happy 1 - Very happy
##   1                     2                  0               1              0
##   2                     1                  0               1              0
##   3                     0                  6               7              4
##   4                     1                 13               8              6
##   5                     0                 19              26              6
##   6                     0                 12              34             11
##   7                     1                 14              65             18
##   8                     0                 14             171             50
##   9                     0                  0              92             87
##   10                    0                  4              52            118

Graph

This type of graph helps visualize the distribution of happiness levels for each life satisfaction rating and can provide insights into the relationship between the two variables.

Happiness Level 1 (Very happy):

Pattern: Respondents who report being very happy tend to have higher life satisfaction scores. Of the 300 respondents in this group, 273 (91%) rate their life satisfaction between 7 and 10. Insights: Reporting a high level of happiness goes together with higher life satisfaction, which matches the expectation that people who feel very happy also rate their overall life positively.

Happiness Level 2 (Quite happy):

Pattern: Respondents who report being quite happy also concentrate in the upper part of the scale, although less strongly: 380 of 457 (83%) rate their life satisfaction between 7 and 10, and the most common rating is 8 rather than 10. Insights: The positive association seen for Happiness Level 1 also holds here, which reinforces the notion that higher happiness levels correspond to higher life satisfaction.

Happiness Levels 3 and 4 (Not very happy and Not at all happy):

Pattern: Respondents who report lower happiness levels (3 and 4) tend to have lower life satisfaction scores. Most of them fall in categories 1-6: 50 of the 82 “Not very happy” respondents (61%) and 4 of the 5 “Not at all happy” respondents. Insights: Lower happiness goes together with lower life satisfaction, although the “Not at all happy” group is too small (5 respondents) to support firm conclusions.

Overall Implications:

The observed patterns reinforce the intuitive expectation that higher happiness levels correspond to higher life satisfaction.

The concentration of the “Very happy” and “Quite happy” groups in the upper life satisfaction scores shows that the two measures are closely related. A cross-tabulation shows association only, so it cannot tell us whether happiness drives life satisfaction or the reverse.
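
One way to make this comparison concrete is to convert the cross-table into column shares. The sketch below rebuilds the table from the counts shown earlier and computes, for each happiness level, the share of respondents with a life satisfaction rating of 8 or higher; the cut-off of 8 and the object names are illustrative choices.

```r
# Counts copied from the cross-table above: rows are life satisfaction 1 to 10,
# columns are happiness levels in the order shown in the table.
happy_by_ls <- matrix(
  c(2,  0,   1,   0,
    1,  0,   1,   0,
    0,  6,   7,   4,
    1, 13,   8,   6,
    0, 19,  26,   6,
    0, 12,  34,  11,
    1, 14,  65,  18,
    0, 14, 171,  50,
    0,  0,  92,  87,
    0,  4,  52, 118),
  nrow = 10, byrow = TRUE,
  dimnames = list(as.character(1:10),
                  c("Not at all happy", "Not very happy",
                    "Quite happy", "Very happy")))

# Share of each happiness group falling in each life satisfaction score.
col_shares <- prop.table(happy_by_ls, margin = 2)

# Share of each happiness group with a rating of 8, 9 or 10.
high_ls_share <- colSums(col_shares[8:10, ])
print(round(100 * high_ls_share, 1))
```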

Relations between happiness and life satisfaction with other variables

In this section we explore how happiness and life satisfaction relate to marital status, health and income.

Marital status

The variable “Marital_Status” was measured by categorizing individuals into different relationship statuses. Respondents indicated their marital status by choosing one of the provided options, including:

  • 1 - Married
  • 2 - Living together as married
  • 3 - Divorced
  • 4 - Separated
  • 5 - Widowed
  • 6 - Single/Never married

Happiness by Marital Status

##                       
##                          1   2   3   4   5   6
##   4 - Not at all happy   1   0   3   0   0   1
##   3 - Not very happy    25   0  23   1  16  17
##   2 - Quite happy      216   2  75   1  44 119
##   1 - Very happy       172   3  36   0  16  73

In summary, the data suggests an association between marital status and reported happiness. Married respondents report being “Very happy” more often (172 of 414, about 42%) than single (73 of 210, about 35%), divorced (36 of 137, about 26%) or widowed respondents (16 of 76, about 21%). The “Living together as married” and “Separated” groups contain only 5 and 2 respondents, so no conclusions can be drawn for them.

Life Satisfaction by Marital Status

##     
##        1   2   3   4   5   6
##   1    1   0   2   0   0   0
##   2    0   0   2   0   0   0
##   3    6   0   3   0   3   5
##   4   10   0   7   1   2   8
##   5   23   0  13   0   1  14
##   6   31   0   7   1   7  11
##   7   47   1  15   0   8  27
##   8  101   2  39   0  29  64
##   9   93   1  28   0   6  51
##   10 102   1  21   0  20  30

In summary, life satisfaction differs only modestly across marital statuses. The share rating their life satisfaction 8 or higher is similar for married (296 of 414, about 71%), widowed (55 of 76, about 72%) and single respondents (145 of 210, about 69%), and somewhat lower for divorced respondents (88 of 137, about 64%). Life satisfaction is influenced by a range of factors, and this table is only a snapshot of the reported levels within this dataset.

Income

The variable “Income” was measured by categorizing households based on their total income, which includes wages, salaries, pensions, and other sources, after accounting for taxes and deductions. Each household is assigned a letter representing its income group.

  • 1 A - 1st decile
  • 2 B - 2nd decile
  • 3 C - 3rd decile
  • 4 D - 4th decile
  • 5 E - 5th decile
  • 6 F - 6th decile
  • 7 G - 7th decile
  • 8 H - 8th decile
  • 9 I - 9th decile
  • 10 J - 10th decile

Happiness by Income

##                       
##                         1  2  3  4  5  6  7  8  9 10
##   4 - Not at all happy  2  0  0  2  0  1  0  0  0  0
##   3 - Not very happy   21 19 16  6 12  1  6  1  0  0
##   2 - Quite happy      29 86 50 72 71 42 40 37 17 13
##   1 - Very happy       13 30 26 31 48 37 44 38 19 14

In summary, the data suggests that happiness rises with income. The share of “Very happy” respondents grows from 13 of 65 (20%) in the 1st decile to roughly half in the 7th to 10th deciles (for example, 14 of 27, about 52%, in the 10th). “Not very happy” responses are concentrated at the bottom: 56 of the 82 fall in the three lowest deciles, and only 1 falls in the 8th to 10th. Happiness is influenced by many factors, so this table describes how it intersects with income rather than establishing a cause.
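
Because the income groups differ in size, raw counts are hard to compare across deciles. The sketch below rebuilds the table above and expresses “Very happy” as a share of each decile; the object names are illustrative.

```r
# Counts copied from the "Happiness by Income" table: rows are happiness
# levels, columns are income deciles 1 to 10.
happy_by_income <- matrix(
  c( 2,  0,  0,  2,  0,  1,  0,  0,  0,  0,
    21, 19, 16,  6, 12,  1,  6,  1,  0,  0,
    29, 86, 50, 72, 71, 42, 40, 37, 17, 13,
    13, 30, 26, 31, 48, 37, 44, 38, 19, 14),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Not at all happy", "Not very happy",
                    "Quite happy", "Very happy"),
                  as.character(1:10)))

# Number of respondents per decile and share of "Very happy" within each.
decile_n <- colSums(happy_by_income)
very_happy_share <- happy_by_income["Very happy", ] / decile_n
print(round(100 * very_happy_share, 1))
```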

Life Satisfaction by Income

##     
##       1  2  3  4  5  6  7  8  9 10
##   1   2  0  0  1  0  0  0  0  0  0
##   2   1  0  0  1  0  0  0  0  0  0
##   3   3  2  0  3  2  2  1  3  0  1
##   4   4  5  5  3  3  1  5  2  0  0
##   5   6  7  9  6  8  2  4  7  0  2
##   6   5  8  9  8  8  4  7  3  3  2
##   7   8 18 13 17 17  8 10  2  2  3
##   8  14 50 24 27 39 22 25 18 11  5
##   9  10 27 17 22 26 19 15 24  9 10
##   10 12 18 15 23 28 23 23 17 11  4

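In summary, the data suggests a positive association between life satisfaction and income. The share of respondents rating their life satisfaction 8 or higher rises from 36 of 65 (about 55%) in the 1st decile to 59 of 76 (about 78%) in the 8th and 31 of 36 (about 86%) in the 9th; it is lower in the 10th decile (19 of 27, about 70%), where counts are small. Low ratings (1 to 5) appear in most deciles and are most frequent in the 1st (16 of 65, about 25%), not in the middle of the distribution.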

Health

The variable “Health” was measured by assessing individuals’ subjective evaluation of their overall health. Respondents were asked to describe their state of health, choosing from options such as:

  • 1 - Very good
  • 2 - Good
  • 3 - Fair
  • 4 - Poor
  • 5 - Very poor

Happiness by Health

##                       
##                          1   2   3   4   5
##   4 - Not at all happy   1   0   1   0   3
##   3 - Not very happy     3  15  45  18   1
##   2 - Quite happy      113 223 104  15   2
##   1 - Very happy       172 100  22   6   0

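In summary, the data suggests a positive association between happiness and self-reported health. The share of “Very happy” respondents falls from 172 of 289 (about 60%) among those in very good health to 100 of 338 (about 30%) for good health and 22 of 172 (about 13%) for fair health. The “Poor” and “Very poor” groups are small (39 and 6 respondents), so their distributions are less stable.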

Life Satisfaction by Health

##     
##        1   2   3   4   5
##   1    1   0   1   0   1
##   2    1   0   0   0   1
##   3    4   7   3   3   0
##   4    7   7   5   8   1
##   5   10  15  20   4   2
##   6   11  23  18   5   0
##   7   25  40  27   5   1
##   8   64 100  59  12   0
##   9   77  81  19   2   0
##   10  89  65  20   0   0

In summary, the data suggests a positive association between life satisfaction and self-reported health. The share rating their life satisfaction 8 or higher is 230 of 289 (about 80%) for “Very good” health, 246 of 338 (about 73%) for “Good”, 98 of 172 (about 57%) for “Fair” and 14 of 39 (about 36%) for “Poor”. None of the 6 respondents in “Very poor” health gives a rating above 7.

Trust & Life Satisfaction

##     
##        1   2   8   9
##   1    0   3   0   0
##   2    1   1   0   0
##   3    6  11   0   0
##   4   11  17   0   0
##   5   18  33   0   0
##   6   27  30   0   0
##   7   41  57   0   0
##   8  123 112   0   0
##   9  113  66   0   0
##   10  90  84   0   0

Trust Level 1 (Most people can be trusted):

Pattern: Respondents who say most people can be trusted cluster at the top of the life satisfaction scale: 326 of 430 (about 76%) give a rating of 8 or higher, with the largest counts at 8 and 9. Insights: Individuals who express trust in others tend to report higher life satisfaction. This aligns with existing literature suggesting that general trust in people is linked to positive well-being.

Trust Level 2 (Can’t be too careful):

Pattern: Respondents in this group also lean towards high ratings, but less strongly: 262 of 414 (about 63%) give a rating of 8 or higher, and 95 (about 23%) give a rating of 6 or lower, compared with 63 of 430 (about 15%) in Trust Level 1. Insights: People who say you can’t be too careful are still mostly satisfied with their lives, but their ratings are more spread out across the scale.

Trust Levels 8 and 9 (Don’t know and No answer):

Pattern: Both of these trust levels have zero counts across all life satisfaction levels. Insights: No respondent in this dataset is coded “Don’t know” or “No answer” on the trust question; all 844 fall into levels 1 or 2. The empty columns are unused category codes, so they say nothing about these respondents’ life satisfaction and can be dropped from the table.

Overall Implications:

Trust in people seems to be a factor associated with higher life satisfaction. The pattern observed in Trust Level 1 implies that fostering a sense of trust in a community might contribute positively to overall life satisfaction.

Obs: Because the “Don’t know” and “No answer” codes are unused here, the comparison above rests on all 844 respondents and is not affected by item nonresponse on the trust question.
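
Empty columns such as 8 and 9 typically appear when a variable is stored as a factor whose declared levels include codes that no respondent used. The sketch below illustrates this with hypothetical responses; it assumes the trust variable is a factor with levels 1, 2, 8 and 9, which the report does not state explicitly.

```r
# Hypothetical responses; the levels 8 and 9 are declared but never observed.
trust <- factor(c(1, 2, 2, 1, 2, 1), levels = c(1, 2, 8, 9))
life_sat <- c(8, 7, 9, 10, 5, 8)

# The cross-table keeps all declared levels, so columns 8 and 9 are all zero.
tab_all <- table(life_sat, trust)
print(tab_all)

# droplevels() removes unused levels before tabulating.
trust_clean <- droplevels(trust)
tab_clean <- table(life_sat, trust_clean)
print(tab_clean)
```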

Trust & Happiness

##    
##       1   2   8   9
##   4 173 127   0   0
##   3 232 225   0   0
##   2  24  58   0   0
##   1   1   4   0   0

Trust Level 1 (Most people can be trusted):

Pattern: The rows of this table use the recoded happiness scale (4 = Very happy, 1 = Not at all happy), as the row totals of 300, 457, 82 and 5 match the frequencies reported earlier. Respondents who trust most people report being “Very happy” more often (173 of 430, about 40%) than those in Trust Level 2 (127 of 414, about 31%). Insights: There is a positive association between trusting most people and higher happiness levels. This aligns with the idea that a general sense of trust can contribute to an individual’s overall happiness.

Trust Level 2 (Can’t be too careful):

Pattern: Respondents in this group are less often “Very happy” (127 of 414, about 31%) and more often “Not very happy” or “Not at all happy” (62 of 414, about 15%, compared with 25 of 430, about 6%, in Trust Level 1). The share of “Quite happy” is almost identical in both groups (about 54%). Insights: People who say you can’t be too careful are still mostly happy, but they are less concentrated in the top category.

Trust Levels 8 and 9 (Don’t know and No answer):

Pattern: Both of these trust levels have zero counts across all happiness levels. Insights: No respondent in this dataset is coded “Don’t know” or “No answer” on the trust question; all 844 fall into levels 1 or 2. The empty columns are unused category codes, so they say nothing about these respondents’ happiness and can be dropped from the table.

Overall Implications:

Similar to the analysis with life satisfaction, trust in people seems to be associated with higher happiness. The pattern observed in Trust Level 1 implies that fostering a sense of trust in a community might contribute positively to overall happiness.

Obs: Because the “Don’t know” and “No answer” codes are unused here, the comparison above rests on all 844 respondents and is not affected by item nonresponse on the trust question.

Weak and Strong ties

Results

Weak ties: Friends

Friends x satisfaction

Friends x happiness

As expected, people with higher life satisfaction and happiness tend to rely more on friends.

Strong ties: Family

Family x satisfaction

Family x happiness

The same pattern appears for the importance of family, and it is even more pronounced.

Finding components for political interest and attitude dimension

For this, we ran a principal component analysis (PCA) on the following variables from the study:

# Renaming the variables
colnames(data)[6] <- "Politics_Importance"
colnames(data)[37] <- "Interest_In_Politics"
colnames(data)[38] <- "Sign_Petition"
colnames(data)[39] <- "Join_Boycotts"
colnames(data)[40] <- "Attend_Demonstrations"
colnames(data)[41] <- "Join_Unofficial_Strikes"
colnames(data)[45] <- "Satisfaction_Political_System"
colnames(data)[46] <- "Democratic_Political_System"
colnames(data)[47] <- "Importance_of_Democracy"

Notes about these variables:

A004 - Important in life: Politics
  • 1 Very important
  • 2 Rather important
  • 3 Not very important
  • 4 Not at all important
E023 - Interest in politics
  • 1 Very interested
  • 2 Somewhat interested
  • 3 Not very interested
  • 4 Not at all interested
E025 - Political action: signing a petition
  • 1 Have done
  • 2 Might do
  • 3 Would never do
E026 - Political action: joining in boycotts
  • 1 Have done
  • 2 Might do
  • 3 Would never do
E027 - Political action: attending lawful/peaceful demonstrations
  • 1 Have done
  • 2 Might do
  • 3 Would never do
E111_01 - Satisfaction with the political system
  • 1 Not satisfied at all
  • 2 2
  • 3 3
  • 4 4
  • 5 5
  • 6 6
  • 7 7
  • 8 8
  • 9 9
  • 10 Completely satisfied

Note: We reversed this scale so that 1 means completely satisfied and 10 means not satisfied at all. Lower values then indicate a more favorable answer, as they do for the other variables listed here, which makes the results easier to interpret.

E117 - Political system: Having a democratic political system
  • 1 Very good
  • 2 Fairly good
  • 3 Fairly bad
  • 4 Very bad
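
The reversal of the E111_01 scale described in the note above can be sketched as follows. The example values are hypothetical; the only assumption taken from the text is that the original scale runs from 1 to 10.

```r
# Hypothetical answers on the original 1-10 scale
# (1 = not satisfied at all, 10 = completely satisfied).
satisfaction_original <- c(1, 4, 7, 10)

# Reverse the scale: (max + min) - x, which for a 1-10 scale is 11 - x.
satisfaction_reversed <- 11 - satisfaction_original

print(satisfaction_reversed)
```

After the reversal, 1 means completely satisfied and 10 means not satisfied at all, so the variable points in the same direction as the other items.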

Results

## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     1.6305 1.2188 1.0347 0.9433 0.75966 0.70775 0.66743
## Proportion of Variance 0.3323 0.1857 0.1338 0.1112 0.07213 0.06261 0.05568
## Cumulative Proportion  0.3323 0.5180 0.6518 0.7630 0.83517 0.89779 0.95347
##                            PC8
## Standard deviation     0.61012
## Proportion of Variance 0.04653
## Cumulative Proportion  1.00000

Upon analyzing the scree plot, a distinct elbow was evident, signifying that the first few principal components, particularly PC1, PC2 and PC3, capture a substantial portion of the overall variance in the data. The elbow method, validated by the plot’s noticeable bend after PC3, guided our decision to prioritize these key contributors.

Subsequent examination of the cumulative proportion of variance reinforced this choice: PC1, PC2 and PC3 together explain 65.18% of the dataset’s total variance, and they are the only components with a standard deviation above 1. Retaining these three components balances capturing the essential information against keeping the analysis simple.
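
The proportions in the output above follow directly from the component standard deviations. As a quick check, the sketch below recomputes them from the printed values; the variable names are illustrative, and the "variance above 1" rule is the common Kaiser criterion, which the report does not name explicitly.

```r
# Standard deviations copied from the "Importance of components" output.
sdev <- c(1.6305, 1.2188, 1.0347, 0.9433, 0.75966, 0.70775, 0.66743, 0.61012)

# Each component's variance is its squared standard deviation.
variance <- sdev^2
prop_var <- variance / sum(variance)
cum_var <- cumsum(prop_var)

# Cumulative proportion of variance, comparable to the printed row.
print(round(cum_var, 4))

# Number of components with variance above 1 (Kaiser criterion).
n_kaiser <- sum(variance > 1)
print(n_kaiser)
```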

Component Loadings:

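The component loadings below are the coefficients (eigenvector weights) that define each principal component as a combination of the standardized variables. Higher absolute values indicate that a variable contributes more strongly to that component; multiplying a loading by the component’s standard deviation gives the correlation between the variable and the component.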

##                                       PC1         PC2           PC3         PC4
## Politics_Importance           -0.38769993  0.10930526 -0.5975678525 -0.16283646
## Interest_In_Politics          -0.44139015  0.10596216 -0.4917631056 -0.03212581
## Sign_Petition                 -0.44630768  0.04246822  0.1954184735  0.09642800
## Join_Boycotts                 -0.38129810  0.28180280  0.4193160240 -0.13711993
## Attend_Demonstrations         -0.40908839  0.26544071  0.4014660444 -0.02375614
## Satisfaction_Political_System -0.06015696 -0.42941116  0.1049512349 -0.88298350
## Democratic_Political_System   -0.25414117 -0.57682388  0.1219735685  0.22667153
## Importance_of_Democracy        0.27335838  0.55500406 -0.0003222443 -0.33577938
##                                        PC5         PC6         PC7          PC8
## Politics_Importance           -0.182743226  0.07462493 -0.05028878 -0.642374613
## Interest_In_Politics           0.005883865 -0.04394387  0.05036271  0.739311604
## Sign_Petition                  0.796688592  0.15338228 -0.28238512 -0.116357075
## Join_Boycotts                 -0.379769364 -0.48364703 -0.44981680  0.009836965
## Attend_Demonstrations         -0.169767718  0.33542029  0.67633937 -0.041058439
## Satisfaction_Political_System  0.109837127 -0.02086056  0.07975329  0.049524013
## Democratic_Political_System   -0.378305629  0.52085919 -0.34024522  0.079074022
## Importance_of_Democracy       -0.060269308  0.59230335 -0.36509696  0.129407687
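
To illustrate how the values in the table above relate to correlations, the sketch below takes the PC1 column and rescales it by the PC1 standard deviation. It assumes the PCA was run on standardized variables, which is consistent with the component variances summing to 8 for 8 variables; the object names are illustrative.

```r
# PC1 column copied from the loadings table above.
pc1 <- c(Politics_Importance           = -0.38769993,
         Interest_In_Politics          = -0.44139015,
         Sign_Petition                 = -0.44630768,
         Join_Boycotts                 = -0.38129810,
         Attend_Demonstrations         = -0.40908839,
         Satisfaction_Political_System = -0.06015696,
         Democratic_Political_System   = -0.25414117,
         Importance_of_Democracy       =  0.27335838)

# Standard deviation of PC1 from the "Importance of components" output.
sdev_pc1 <- 1.6305

# Eigenvector columns have unit length: the squared weights sum to 1.
print(sum(pc1^2))

# Assumption: variables were standardized, so weight * sdev is the
# correlation between each variable and PC1.
cor_pc1 <- pc1 * sdev_pc1
print(round(cor_pc1, 3))
```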

Graph


PC1:

Strong Negative Loadings: “Politics_Importance”: -0.3877, “Interest_In_Politics”: -0.4414, “Sign_Petition”: -0.4463, “Join_Boycotts”: -0.3813, “Attend_Demonstrations”: -0.4091

Individuals with lower scores on PC1 are less likely to find politics important in their lives, have lower interest in politics, and are less inclined to engage in specific political actions such as signing petitions, joining boycotts, or attending lawful/peaceful demonstrations.

PC3:

Strong Negative Loadings:

“Important in life: Politics”: -0.5975 “Interest in politics”: -0.4917

Positive Loadings (only the boycott and demonstration loadings exceed 0.4):

“Political action: signing a petition”: 0.1954 “Political action: joining in boycotts”: 0.4192 “Political action: attending lawful/peaceful demonstrations”: 0.4013 “Satisfaction with the political system (reversed)”: 0.1059 “Political system: Having a democratic political system”: 0.1225

Individuals with lower scores on PC3 are less likely to consider politics important in their lives and have less interest in politics. At the same time, they are more likely to engage in specific political actions such as signing petitions, joining boycotts, and attending lawful/peaceful demonstrations. They also tend to express higher satisfaction with the political system and to prefer a democratic political system.

Overall Interpretation:

PC1:

Captures the lack of interest and engagement in specific political actions among those who find politics less important.

PC2:

Emphasizes a strong link between satisfaction and a preference for democracy.

PC3:

Suggests a group that, while not personally prioritizing politics, actively engages in political activities and supports a democratic system.
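As a reminder of how loadings like those above are read, here is a minimal sketch on simulated data. The variables `interest`, `petition` and `democracy` are hypothetical and are not the EVS items. The sketch assumes the components were obtained with `prcomp()` on standardized variables, which the `pca_result$x` object used later in this report suggests but does not confirm.

```r
# Simulated data only: three hypothetical survey items, two of them correlated
set.seed(1)
n <- 200
interest  <- rnorm(n)
petition  <- 0.8 * interest + rnorm(n, sd = 0.6)
democracy <- rnorm(n)
items <- data.frame(interest, petition, democracy)

# PCA on centered and standardized variables
pca_demo <- prcomp(items, center = TRUE, scale. = TRUE)

# Loadings: one column per component, one row per variable
loadings_demo <- pca_demo$rotation
print(round(loadings_demo, 3))

# Share of total variance captured by each component
var_share <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
print(round(var_share, 3))

# The sign of a component is arbitrary: flipping the scores of a component
# leaves its variance unchanged, so it describes the same dimension
flipped_scores <- -pca_demo$x[, 1]
stopifnot(isTRUE(all.equal(var(flipped_scores), var(pca_demo$x[, 1]))))
```

Because the sign of a component is arbitrary, an interpretation is safest when it rests on which variables load together and with which relative signs, rather than on whether a loading is positive or negative in absolute terms.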

Freedom of Choice and Control: exploring the association between this variable and Life Satisfaction and Happiness

We tested how strong the associations are using two approaches, correlation analysis and regression analysis, and then compared the findings.

Note: We recoded happiness so that its direction can be compared with that of the other two variables, which allows a more meaningful comparison across the dataset.

Q: Please use the scale to indicate how much freedom of choice and control you feel you have over the way your life turns out?

Scale: 1 = None at all; 2 to 9 = unlabeled intermediate points; 10 = A great deal

1: Correlation Analysis

# Load Hmisc (optional here: base R cor() already supports method = "spearman")
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.3.2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:base':
## 
##     format.pval, units
# Renaming the variables
colnames(data)[12] <- "Freedom_of_Choice_Control"

# Convert to numeric if necessary
data$Freedom_of_Choice_Control <- as.numeric(data$Freedom_of_Choice_Control)
data$Happiness <- as.numeric(data$Happiness)

# Extract the happiness variable + other variables
happiness <- data$Happiness
life_satisfaction <- data$Life_Satisfaction
freedom_choice_control <- data$Freedom_of_Choice_Control

# Standardize the variables
scaled_happiness <- scale(happiness)
scaled_life_satisfaction <- scale(life_satisfaction)
scaled_freedom_choice_control <- scale(freedom_choice_control)

# Pearson correlation between Freedom_of_Choice_Control and Happiness
pearson_corr_fc_happiness <- cor(data$Freedom_of_Choice_Control, data$Happiness, method = "pearson")

# Spearman correlation between Freedom_of_Choice_Control and Happiness
spearman_corr_fc_happiness <- cor(data$Freedom_of_Choice_Control, data$Happiness, method = "spearman")

# Pearson correlation between Freedom_of_Choice_Control and life_satisfaction
pearson_corr_fc_life_satisfaction <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "pearson")

# Spearman correlation between Freedom_of_Choice_Control and life_satisfaction
spearman_corr_fc_life_satisfaction <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "spearman")

Results:

## Pearson Correlation between Freedom_of_Choice_Control and Happiness: -0.2132671
## Spearman Correlation between Freedom_of_Choice_Control and Happiness: -0.1842096

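There is a weak negative correlation (about -0.2) between freedom of choice/control and the Happiness score. The sign has to be read against the coding of Happiness: if the item keeps the original EVS coding (1 = very happy, 4 = not at all happy), a negative coefficient means that respondents who perceive more freedom of choice/control tend to report slightly greater happiness, and those who perceive less tend to report slightly less. In absolute terms, this correlation is weaker than the correlation with life satisfaction.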

## Pearson Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.492049
## Spearman Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.4683609

There is a moderate positive correlation (close to 0.5) between freedom of choice/control and life satisfaction. This suggests that as freedom of choice/control increases, there is a tendency for life satisfaction to increase, and vice versa.
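Since the sign of a correlation depends on how a scale is coded, the following minimal sketch on simulated data shows that reversing a 4-point item flips the sign of both Pearson and Spearman correlations while leaving their size unchanged. The variables `freedom` and `happy_original` are hypothetical, and the coding 1 = very happy to 4 = not at all happy is an assumption made for illustration.

```r
# Simulated data only: a 1-10 "freedom" rating and a 4-point happiness item
set.seed(42)
n <- 500
freedom <- sample(1:10, n, replace = TRUE)

# Assumed coding: 1 = very happy ... 4 = not at all happy,
# so more freedom pushes respondents toward LOWER category numbers
latent <- -0.3 * freedom + rnorm(n, sd = 2)
happy_original <- as.integer(cut(latent,
                                 breaks = quantile(latent, 0:4 / 4),
                                 include.lowest = TRUE))

# Reverse the scale so that higher values mean more happiness
happy_reversed <- 5 - happy_original

r_original <- cor(freedom, happy_original)
r_reversed <- cor(freedom, happy_reversed)
rho_original <- cor(freedom, happy_original, method = "spearman")
rho_reversed <- cor(freedom, happy_reversed, method = "spearman")

print(c(pearson_original = r_original, pearson_reversed = r_reversed))
print(c(spearman_original = rho_original, spearman_reversed = rho_reversed))
```

Checking which version of the variable entered `cor()` is therefore enough to decide whether a negative coefficient means "more freedom, less happiness" or the opposite.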

2: Regression analysis

Life satisfaction and freedom of choice/control

## 
## Call:
## lm(formula = Life_Satisfaction ~ Freedom_of_Choice_Control, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9313 -0.8078  0.1922  0.8775  4.8217 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                4.74004    0.20127   23.55   <2e-16 ***
## Freedom_of_Choice_Control  0.43825    0.02672   16.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.572 on 842 degrees of freedom
## Multiple R-squared:  0.2421, Adjusted R-squared:  0.2412 
## F-statistic:   269 on 1 and 842 DF,  p-value: < 2.2e-16

In the analysis of Life Satisfaction predicted by Freedom of Choice/Control, the linear regression model reveals a statistically significant and positive relationship. The estimated coefficient for Freedom_of_Choice_Control is 0.43825, indicating that, on average, a one-unit increase in Freedom of Choice/Control is associated with a 0.43825 unit increase in Life Satisfaction. The model is highly significant (p-value < 2.2e-16), and approximately 24.21% of the variability in Life Satisfaction is explained by Freedom of Choice/Control, as indicated by the R-squared value. These findings suggest that, while Freedom of Choice/Control is a significant predictor of Life Satisfaction, other unexamined factors may contribute to the remaining variance.

Happiness and freedom of choice/control

## 
## Call:
## lm(formula = Happiness ~ Freedom_of_Choice_Control, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1781 -0.6351  0.1613  0.2970  2.4328 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                2.24600    0.08072  27.825  < 2e-16 ***
## Freedom_of_Choice_Control -0.06788    0.01072  -6.334 3.88e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6306 on 842 degrees of freedom
## Multiple R-squared:  0.04548,    Adjusted R-squared:  0.04435 
## F-statistic: 40.12 on 1 and 842 DF,  p-value: 3.881e-10

In the analysis of Happiness predicted by Freedom of Choice/Control, the linear regression model also reveals a statistically significant relationship, but with a negative coefficient. The estimated coefficient for Freedom_of_Choice_Control is -0.06788, indicating that a one-unit increase in Freedom of Choice/Control is associated with a decrease of 0.06788 units in the Happiness score. How this reads substantively depends on the coding of Happiness: if the item keeps the original EVS coding (1 = very happy, 4 = not at all happy), a lower score means greater happiness, so more perceived freedom goes together with slightly greater happiness. The model is highly significant (p-value = 3.88e-10), yet the R-squared value is low at 4.55%, so Freedom of Choice/Control explains only a small portion of the variability in Happiness and leaves ample room for other factors.

Combining both results: correlation analysis and regression analysis

The analysis provides clear evidence of an association between individual autonomy, as measured by Freedom of Choice/Control, and life satisfaction. This is supported by a moderate positive correlation of 0.49 and by a significant positive coefficient in the regression model, which explains 24.21% of the variance. The relationship with the Happiness score is weaker: the correlation is -0.21 and the regression model explains only 4.55% of the variance, with the interpretation of the sign depending on how Happiness is coded. Individual autonomy is therefore more closely linked to life satisfaction than to happiness, and the connection with happiness warrants further investigation.
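The two approaches agree for a structural reason: in a regression with a single predictor, R-squared equals the squared Pearson correlation, and the t statistic of the slope can be recovered from r and the sample size. The short sketch below checks this with the values reported in the output above; the helper `t_from_r` is introduced here for illustration only.

```r
# Values reported in the output above
n <- 844
r_life  <- 0.492049     # Pearson r: freedom of choice/control vs. life satisfaction
r_happy <- -0.2132671   # Pearson r: freedom of choice/control vs. happiness

# In a regression with one predictor, R-squared is the squared correlation
r2_life  <- r_life^2
r2_happy <- r_happy^2

# The slope's t statistic follows from r and n (illustrative helper)
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)
t_life  <- t_from_r(r_life, n)
t_happy <- t_from_r(r_happy, n)

print(round(c(r2_life = r2_life, r2_happy = r2_happy), 4))
print(round(c(t_life = t_life, t_happy = t_happy), 2))
```

The results should line up with the regression summaries (R-squared of about 0.2421 and 0.0455, t values of about 16.40 and -6.33), which is why the correlation and the simple regression cannot disagree about the strength of a bivariate association.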

“Respondent immigrant / born in country”: exploring the association between this variable and Life Satisfaction and Happiness

For task 9, among the variables not yet used to explain SWB in this project, we chose “Respondent immigrant / born in country” (renamed to Immigrant_Status). We chose it because we wanted to know whether being an immigrant or a native is associated with happiness and life satisfaction. To test this, we used two approaches: correlation analysis and regression analysis.

1: Correlation Analysis

colnames(data)[65] <- "Immigrant_Status"

# Standardize the variables
scaled_immigrant_status <- scale(data$Immigrant_Status)

### Correlation
# Pearson correlation between Immigrant_Status and Happiness
pearson_corr_ims_happiness <- cor(data$Immigrant_Status, data$Happiness, method = "pearson")

# Spearman correlation between Immigrant_Status and Happiness
spearman_corr_ims_happiness <- cor(data$Immigrant_Status, data$Happiness, method = "spearman")

# Pearson correlation between Immigrant_Status and Life_Satisfaction
pearson_corr_ims_life_satisfaction <- cor(data$Immigrant_Status, data$Life_Satisfaction, method = "pearson")

# Spearman correlation between Immigrant_Status and Life_Satisfaction
spearman_corr_ims_life_satisfaction <- cor(data$Immigrant_Status, data$Life_Satisfaction, method = "spearman")

Results:

## Pearson Correlation between Immigrant Status and Happiness: 0.04149277
## Spearman Correlation between Immigrant Status and Happiness: 0.04024693

A correlation coefficient of about 0.04 between immigrant status and the Happiness score indicates a very weak positive correlation. The two variables show only a negligible tendency to move in the same direction.

## Pearson Correlation between Immigrant Status and Life_Satisfaction: -0.02315874
## Spearman Correlation between Immigrant Status and Life_Satisfaction: -0.03093079

A correlation coefficient of -0.02 between immigrant status and life satisfaction suggests an extremely weak negative correlation. This indicates a slight tendency for immigrant status and life satisfaction to move in opposite directions, but again, the relationship is extremely weak.

2: Regression analysis

Life satisfaction and immigrant status

## 
## Call:
## lm(formula = Life_Satisfaction ~ Immigrant_Status, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9338 -0.9338  0.0662  1.0662  2.2022 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        8.0698     0.2322  34.758   <2e-16 ***
## Immigrant_Status  -0.1360     0.2024  -0.672    0.502    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.806 on 842 degrees of freedom
## Multiple R-squared:  0.0005363,  Adjusted R-squared:  -0.0006507 
## F-statistic: 0.4518 on 1 and 842 DF,  p-value: 0.5017

The regression analysis results show the relationship between life satisfaction and immigrant status:

The coefficient for immigrant status is -0.1360 with a standard error of 0.2024 and a t-value of -0.672. However, this coefficient is not statistically significant (p-value = 0.502), indicating that there is no evidence to reject the null hypothesis that the coefficient is equal to zero.

The adjusted R-squared value is -0.0006507, indicating that the model does not explain a significant portion of the variance in life satisfaction.

The F-statistic is 0.4518 with a p-value of 0.5017, suggesting that the model as a whole does not significantly predict life satisfaction.

Overall, the results suggest that there is no significant relationship between immigrant status and life satisfaction.

Happiness and immigrant status

## 
## Call:
## lm(formula = Happiness ~ Immigrant_Status, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8315 -0.7444  0.2556  0.2556  2.2556 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.65728    0.08292  19.987   <2e-16 ***
## Immigrant_Status  0.08709    0.07227   1.205    0.229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6449 on 842 degrees of freedom
## Multiple R-squared:  0.001722,   Adjusted R-squared:  0.000536 
## F-statistic: 1.452 on 1 and 842 DF,  p-value: 0.2285

The regression analysis results show the relationship between happiness and immigrant status:

The coefficient for immigrant status is 0.08709 with a standard error of 0.07227 and a t-value of 1.205. This coefficient is not statistically significant (p-value = 0.229), indicating that there is no evidence to reject the null hypothesis that the coefficient is equal to zero.

The adjusted R-squared value is 0.000536, indicating that the model explains a very small fraction of the variance in happiness.

The F-statistic is 1.452 with a p-value of 0.2285, suggesting that the model as a whole does not significantly predict happiness.

Overall, the results suggest that there is no significant relationship between immigrant status and happiness.

Combining both results: correlation analysis and regression analysis

In the correlation analysis, immigrant status showed a very weak positive correlation with happiness (Pearson: 0.0415, Spearman: 0.0402) and an extremely weak negative correlation with life satisfaction (Pearson: -0.0232, Spearman: -0.0309). In the regression analysis, immigrant status did not emerge as a significant predictor of either life satisfaction or happiness. The regression models explained almost none of the variability (Adjusted R-squared: -0.0006507 for life satisfaction and 0.000536 for happiness), indicating that immigrant status alone does not significantly predict well-being. Both approaches therefore agree: the weak correlations observed cannot be distinguished from zero in these bivariate models.
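With a two-category predictor such as immigrant status, a simple regression is equivalent to an equal-variance two-sample t-test: the slope is the difference between the two group means, and the p-values coincide. The sketch below illustrates this on simulated data; `group` and `outcome` are hypothetical stand-ins and not the EVS variables.

```r
# Simulated data only: a binary indicator coded 1/2 and a 1-10 outcome
set.seed(7)
n <- 300
group   <- sample(c(1, 2), n, replace = TRUE, prob = c(0.85, 0.15))
outcome <- pmin(pmax(round(rnorm(n, mean = 8, sd = 1.8)), 1), 10)

# Simple linear regression on the binary predictor
fit   <- lm(outcome ~ group)
slope <- unname(coef(fit)["group"])
p_lm  <- summary(fit)$coefficients["group", "Pr(>|t|)"]

# Equal-variance two-sample t-test on the same data
tt <- t.test(outcome ~ factor(group), var.equal = TRUE)

# The slope equals the difference in group means (group 2 minus group 1)
mean_diff <- mean(outcome[group == 2]) - mean(outcome[group == 1])

print(c(slope = slope, mean_diff = mean_diff))
print(c(p_lm = p_lm, p_ttest = tt$p.value))
```

Read this way, the coefficient of -0.1360 in the life satisfaction model is simply the estimated gap in mean life satisfaction between the two immigrant-status groups.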

Relationship between “life satisfaction” and “control over life”, “health” and “immigrant status”

### Correlation Analysis
# Calculate correlation coefficients
pearson_corr_life_satisfaction_control <- cor(data$Life_Satisfaction, data$Freedom_of_Choice_Control, method = "pearson")
spearman_corr_life_satisfaction_control <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "spearman")

pearson_corr_immigrant_health <- cor(data$Immigrant_Status, data$Health, method = "pearson")
spearman_corr_immigrant_health <- cor(data$Immigrant_Status, data$Health, method = "spearman")

# Print correlation coefficients
cat("Pearson Correlation between Life_Satisfaction and Freedom_of_Choice_Control:", pearson_corr_life_satisfaction_control, "\n")
## Pearson Correlation between Life_Satisfaction and Freedom_of_Choice_Control: 0.492049
cat("Spearman Correlation between Life_Satisfaction and Freedom_of_Choice_Control:", spearman_corr_life_satisfaction_control, "\n")
## Spearman Correlation between Life_Satisfaction and Freedom_of_Choice_Control: 0.4683609
cat("Pearson Correlation between Immigrant_Status and Health:", pearson_corr_immigrant_health, "\n")
## Pearson Correlation between Immigrant_Status and Health: 0.0009282111
cat("Spearman Correlation between Immigrant_Status and Health:", spearman_corr_immigrant_health, "\n")
## Spearman Correlation between Immigrant_Status and Health: 0.00827119

Interpretation of the correlation coefficients:

  • Freedom of Choice Control and Life Satisfaction: The Pearson correlation coefficient between Freedom of Choice Control and Life Satisfaction is 0.49. This suggests a moderate positive correlation between these variables, indicating that as the level of perceived freedom of choice and control increases, life satisfaction tends to increase as well. The Spearman correlation coefficient of 0.47 confirms this positive relationship.

  • Immigrant Status and Health: The correlation coefficient between Immigrant Status and Health is close to 0 (0.0009 for Pearson and 0.0082 for Spearman). This indicates a very weak correlation between immigrant status and health status, suggesting that there is no significant linear relationship between these two variables.

Graph

# Create a contingency table between variables
contingency_table_aa <- table(data$Life_Satisfaction, data$Freedom_of_Choice_Control)
print(contingency_table_aa)
##     
##       1  2  3  4  5  6  7  8  9 10
##   1   2  0  0  0  1  0  0  0  0  0
##   2   0  0  0  1  0  1  0  0  0  0
##   3   1  0  4  3  5  2  1  1  0  0
##   4   1  1  1  5  7  7  2  3  1  0
##   5   1  0  5  8 16  7  2  2  3  7
##   6   2  1  2  7 14 13  7  4  4  3
##   7   0  0  6  6 13 20 24 12 10  7
##   8   1  1  2  8 22 23 51 73 27 27
##   9   1  0  1  1  9 14 39 69 28 17
##   10  1  0  2  3 11 14 12 32 17 82
# Spineplot for Life Satisfaction vs. Control over Life
spineplot(contingency_table_aa, 
          main = "Spine Plot: Life Satisfaction vs. Control over Life",
          xlab = "Life Satisfaction",
          ylab = "Control over Life", 
          col.lab = "black",
          col = c("#43ED9E", "#FA617D", "#7F5AF8", "#FFDD32", "#42A5F5", "#FF8A65", "#66BB6A", "#FFEB3B", "#9575CD", "#78909C"))

# Create a contingency table
contingency_table_bb <- table(data$Health, data$Immigrant_Status)

# Spineplot for Health vs. Immigrant Status
spineplot(contingency_table_bb, 
          main = "Spine Plot: Health vs. Immigrant Status",
          xlab = "Health", 
          ylab = "Immigrant Status", 
          col.lab = "black",
          col = c("#43ED9E", "#FA617D"))

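# Add legend (fill order follows the col argument of spineplot() above,
# assuming the first level of Immigrant_Status is "Born in this country")
legend("topright", 
       legend = c("Born in this country", "Immigrant to this country"), 
       fill = c("#43ED9E", "#FA617D"))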

Test for Statistical Significance

# Perform hypothesis tests (Pearson correlation tests)
test_result_control <- cor.test(data$Life_Satisfaction, data$Freedom_of_Choice_Control)
test_result_health <- cor.test(data$Immigrant_Status, data$Health)

# Print the test results
cat("\nTest for Statistical Significance - Life Satisfaction and Control over Life:\n")
## 
## Test for Statistical Significance - Life Satisfaction and Control over Life:
print(test_result_control)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Life_Satisfaction and data$Freedom_of_Choice_Control
## t = 16.401, df = 842, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4391485 0.5415493
## sample estimates:
##      cor 
## 0.492049
cat("\nTest for Statistical Significance - Immigrant Status and Health:\n")
## 
## Test for Statistical Significance - Immigrant Status and Health:
print(test_result_health)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Immigrant_Status and data$Health
## t = 0.026934, df = 842, p-value = 0.9785
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06655821  0.06840618
## sample estimates:
##          cor 
## 0.0009282111

To interpret the results of the statistical significance tests:

  1. Test for Statistical Significance - Life Satisfaction and Control over Life:
    • The Pearson correlation coefficient between life satisfaction and control over life is 0.492.
    • The p-value of the test is extremely low, less than 2.2e-16, indicating strong statistical evidence that the observed correlation is not due to chance.
    • This suggests that there is a significant and positive association between life satisfaction and control over life.
  2. Test for Statistical Significance - Immigrant Status and Health:
    • The Pearson correlation coefficient between immigrant status and health is 0.00093.
    • The p-value of the test is 0.9785, which is very high.
    • This indicates that there is no statistical evidence to reject the null hypothesis that there is no correlation between immigrant status and health. The observed correlation may have occurred by chance.
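The first test can be checked independently of the raw file, because a correlation depends only on the joint distribution of the two variables. The sketch below rebuilds the paired observations from the Life_Satisfaction by Freedom_of_Choice_Control contingency table printed above and reruns `cor.test()`. The object names are chosen here for illustration, and the counts are copied by hand from that table.

```r
# Counts copied from the contingency table above
# (rows: Life_Satisfaction 1-10, columns: Freedom_of_Choice_Control 1-10)
counts <- matrix(c(
  2, 0, 0, 0,  1,  0,  0,  0,  0,  0,
  0, 0, 0, 1,  0,  1,  0,  0,  0,  0,
  1, 0, 4, 3,  5,  2,  1,  1,  0,  0,
  1, 1, 1, 5,  7,  7,  2,  3,  1,  0,
  1, 0, 5, 8, 16,  7,  2,  2,  3,  7,
  2, 1, 2, 7, 14, 13,  7,  4,  4,  3,
  0, 0, 6, 6, 13, 20, 24, 12, 10,  7,
  1, 1, 2, 8, 22, 23, 51, 73, 27, 27,
  1, 0, 1, 1,  9, 14, 39, 69, 28, 17,
  1, 0, 2, 3, 11, 14, 12, 32, 17, 82
), nrow = 10, byrow = TRUE)

# Expand the table back into one (row, column) pair per respondent
cells <- which(counts > 0, arr.ind = TRUE)
reps  <- counts[counts > 0]
life_satisfaction <- rep(cells[, "row"], times = reps)
freedom           <- rep(cells[, "col"], times = reps)

# Pearson test and Spearman coefficient on the reconstructed pairs
res <- cor.test(life_satisfaction, freedom)
rho <- cor(life_satisfaction, freedom, method = "spearman")

print(length(freedom))   # number of reconstructed respondents
print(res$estimate)
print(res$conf.int)
print(rho)
```

If the counts were transcribed correctly, the estimate and confidence interval should agree with the test output reported above.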

Relation between Life Satisfaction and Sex

The contingency table shows the distribution of life satisfaction levels across different genders (“1 - Male” and “2 - Female”). Each cell in the table represents the frequency of individuals falling into a specific combination of life satisfaction level and gender.

## Warning: package 'vcd' was built under R version 4.3.2
## Loading required package: grid
##     
##      Male Female
##   1     2      1
##   2     0      2
##   3     4     13
##   4    15     13
##   5    25     26
##   6    25     32
##   7    45     53
##   8   108    127
##   9    80     99
##   10   71    103
## Total number of males: 375
## Total number of females: 469

The contingency table shows how life satisfaction levels are distributed among males and females. Because the sample contains more females (469) than males (375), raw counts are higher for females at most levels, so the comparison is more informative when made in within-sex proportions. Here are some observations:

  1. Life Satisfaction Levels:
    • In raw counts, females outnumber males at levels 2 and 3 and at every level from 5 to 10; males are more numerous only at levels 1 and 4.
    • In proportions the two distributions are close: levels 8 to 10 account for about 69% of males (259 of 375) and about 70% of females (329 of 469).
  2. Gender Impact on Life Satisfaction:
    • The higher female counts at levels 8 and 9 mostly reflect the larger number of female respondents; the shares are similar (about 29% of males and 27% of females at level 8, and about 21% of both at level 9).
    • At the lower end, level 3 is somewhat more common among females (13 of 469 versus 4 of 375), while level 4 is somewhat more common among males (15 of 375 versus 13 of 469). Both differences rest on very small counts.
  3. Statistical Analysis:
    • To rigorously assess the association between gender and life satisfaction, statistical tests such as a chi-square test or a logistic regression can be employed.
    • These tests can determine whether the observed differences are statistically significant or if they could occur by chance.
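Because the two groups differ in size, within-sex proportions are easier to compare than raw counts. The sketch below rebuilds the contingency table from the counts printed above and computes column proportions with `prop.table()`; the object names are chosen here for illustration.

```r
# Counts copied from the Life Satisfaction by Sex table above
tab <- as.table(matrix(
  c(  2,   1,
      0,   2,
      4,  13,
     15,  13,
     25,  26,
     25,  32,
     45,  53,
    108, 127,
     80,  99,
     71, 103),
  ncol = 2, byrow = TRUE,
  dimnames = list(Life_Satisfaction = as.character(1:10),
                  Sex = c("Male", "Female"))
))

# Share of each life satisfaction level within each sex (columns sum to 1)
col_share <- prop.table(tab, margin = 2)
print(round(100 * col_share, 1))

# Share of respondents reporting levels 8 to 10, by sex
high_share <- colSums(col_share[8:10, ])
print(round(100 * high_share, 1))
```

Comparing the two columns of percentages side by side makes it easier to see whether a higher count reflects a real difference in distribution or only the larger number of female respondents.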

Graph

A spineplot is a suitable choice for visualizing the relationship between two categorical variables, such as gender (male/female) and life satisfaction levels.

X-Axis (Life Satisfaction): Represents the levels of life satisfaction ranging from 1 to 10.

Y-Axis (Proportion): Represents, within each life satisfaction level, the share of respondents of each sex. The width of each bar reflects how many respondents chose that level.

Spines: Each life satisfaction level forms one bar, split into two segments, one for each category of “Sex” (1 - Male and 2 - Female).

Interpretation:

Overall Distribution: The spineplot provides an overview of the distribution of life satisfaction levels for both males and females.

Comparison: Comparing the male and female segments across the bars, the split is broadly similar at most life satisfaction levels, so the distribution appears somewhat similar between males and females.

Life Satisfaction Peaks: For both males and females, there is a noticeable peak around life satisfaction level 8, indicating that a substantial number of individuals from both genders report high life satisfaction at this level.

Gaps or Patterns: There are no significant gaps in the spines, suggesting that most life satisfaction levels have representation for both males and females.

Trends: The trend shows that, in general, the distribution is comparable between males and females across different life satisfaction levels.

Overall Impression: The spineplot suggests that, based on this data, there isn’t a clear and distinctive pattern indicating a strong association between sex and life satisfaction. The distributions are somewhat similar for both genders.

Keep in mind that this interpretation is based on the visual patterns observed in the spineplot; further statistical analysis may provide additional insights.
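To make the reading of the spineplot concrete, the sketch below computes the two quantities such a plot encodes when life satisfaction is on the x-axis: the relative width of each bar and the male/female split inside it. It then draws the plot with base R `spineplot()`. The counts are copied from the table above, and the exact call used for the original figure is not shown in this report, so the orientation here is an assumption.

```r
# Counts copied from the Life Satisfaction by Sex table above
tab <- as.table(matrix(
  c(2, 1, 0, 2, 4, 13, 15, 13, 25, 26,
    25, 32, 45, 53, 108, 127, 80, 99, 71, 103),
  ncol = 2, byrow = TRUE,
  dimnames = list(Life_Satisfaction = as.character(1:10),
                  Sex = c("Male", "Female"))
))

# Bar widths: share of all respondents at each life satisfaction level
bar_width <- rowSums(tab) / sum(tab)
print(round(bar_width, 3))

# Segment heights: male/female split within each level (rows sum to 1)
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 3))

# Draw the plot: rows of the table go on the x-axis, columns form the segments
spineplot(tab, main = "Life Satisfaction by Sex (counts from the table above)")
```

Levels 1 and 2 contain only three and two respondents, so their bars are extremely narrow and their splits should not be over-interpreted.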

Statistical Analysis

To investigate whether “life satisfaction” differs with respect to “sex,” we can use a statistical test of independence. Because both variables are treated as categorical here (sex with two levels and life satisfaction with ten), the chi-square test of independence is a suitable choice.

The steps taken before conducting the chi-square test of independence were:

Step 1: Set Hypotheses

  • Null Hypothesis (H0): There is no association between life satisfaction and sex.
  • Alternative Hypothesis (H1): There is an association between life satisfaction and sex.

Step 2: Choose Significance Level

  • Typically, the significance level (α) is set to 0.05.
## Warning in chisq.test(contingency_table_x): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table_x
## X-squared = 7.8392, df = 9, p-value = 0.5504

Interpreting the results of the test includes:

  1. Chi-Squared Statistic:
    • Represents the magnitude of the difference between observed and expected frequencies. The higher the value, the greater the divergence.
    • In this case, the value is 7.8392.
  2. Degrees of Freedom:
    • Equals (number of rows - 1) * (number of columns - 1) of the contingency table.
    • Here, there are 2 categories for “Sex” and 10 categories for “Life Satisfaction,” so (2-1) * (10-1) = 9 degrees of freedom.
  3. p-value:
    • Represents the probability of observing a chi-square statistic as extreme as the calculated one, under the null hypothesis that the variables are independent.
    • If the p-value is low (usually less than 0.05), you can reject the null hypothesis.

What the Results Mean:

  • Chi-Squared Statistic:
    • The value of 7.8392 cannot be judged on its own; it has to be compared with the chi-square distribution for the relevant degrees of freedom.
  • Degrees of Freedom:
    • With 9 degrees of freedom, the critical value at the 0.05 level is about 16.92, so the observed statistic of 7.8392 falls well short of it.
  • p-value:
    • The p-value of 0.5504 is relatively high, indicating that there is no significant evidence against the null hypothesis.
    • There is no statistical support for the idea that life satisfaction differs significantly between men and women.

Conclusion:

  • Based on the Results:
    • There is no statistically significant evidence to suggest an association between “Life Satisfaction” and “Sex.”
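    • The observed variation may be due to chance.
    • R warned that the chi-square approximation may be incorrect, which typically happens when some expected counts are small (here the lowest life satisfaction levels have very few respondents), so the p-value should be read as approximate.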
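The test can be reproduced directly from the contingency table printed earlier. The sketch below reruns `chisq.test()` on those counts, compares the statistic with the 5% critical value, inspects the expected counts behind the warning, and adds a Monte Carlo p-value as one possible cross-check. The simulation option and the seed are choices made here and are not part of the original analysis.

```r
# Counts copied from the Life Satisfaction by Sex table above
tab <- as.table(matrix(
  c(2, 1, 0, 2, 4, 13, 15, 13, 25, 26,
    25, 32, 45, 53, 108, 127, 80, 99, 71, 103),
  ncol = 2, byrow = TRUE,
  dimnames = list(Life_Satisfaction = as.character(1:10),
                  Sex = c("Male", "Female"))
))

# Asymptotic chi-square test (the small-count warning is expected here)
test_asym <- suppressWarnings(chisq.test(tab))
print(test_asym$statistic)
print(test_asym$parameter)
print(test_asym$p.value)

# Critical value at alpha = 0.05 with 9 degrees of freedom
crit <- qchisq(0.95, df = 9)
print(crit)

# Smallest expected count: values below 5 explain the warning
print(min(test_asym$expected))

# Cross-check with a Monte Carlo p-value, which does not rely on the
# large-sample approximation
set.seed(123)
test_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
print(test_sim$p.value)
```

Both p-values can then be compared with the 0.05 threshold; if they lead to the same decision, the small expected counts are unlikely to be driving the conclusion.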

OLS regression

colnames(data)[71] <- "Employment_Status"
# Convert Life_Satisfaction to numeric
data$ls <- as.numeric(as.character(data$Life_Satisfaction))
data$health_factor <- factor(data$Health)
data$ms_factor <- factor(data$Marital_Status)
data$ed_factor <- factor(data$Education)
data$fcc_factor <- factor(data$Freedom_of_Choice_Control)
data$es_factor <- factor(data$Employment_Status)
data$income_factor <- factor(data$Income)
data$family_factor <- factor(data$Family_Importance)
data$friends_factor <- factor(data$Friends_Importance)

# Extract the principal components
PC1 <- pca_result$x[, 1]
PC2 <- pca_result$x[, 2]
PC3 <- pca_result$x[, 3]

# Add the principal components to the model
model_austria <- lm(ls ~ Age + I(Age^2) + health_factor + ed_factor + ms_factor + es_factor + Sex + family_factor + friends_factor + Trust_Most_People + fcc_factor + income_factor + Immigrant_Status + PC1 + PC2 + PC3, data = data)


summary(model_austria)
## 
## Call:
## lm(formula = ls ~ Age + I(Age^2) + health_factor + ed_factor + 
##     ms_factor + es_factor + Sex + family_factor + friends_factor + 
##     Trust_Most_People + fcc_factor + income_factor + Immigrant_Status + 
##     PC1 + PC2 + PC3, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3228 -0.7249  0.0900  0.8463  3.7652 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         6.8461429  0.7480785   9.152  < 2e-16 ***
## Age                                -0.0174337  0.0217101  -0.803 0.422203    
## I(Age^2)                            0.0002434  0.0002170   1.121 0.262517    
## health_factor2                     -0.3607685  0.1167250  -3.091 0.002067 ** 
## health_factor3                     -0.9060952  0.1511924  -5.993 3.13e-09 ***
## health_factor4                     -1.9980959  0.2629628  -7.598 8.50e-14 ***
## health_factor5                     -2.9401805  0.6452451  -4.557 6.02e-06 ***
## ed_factor3                         -0.3510046  0.1629346  -2.154 0.031521 *  
## ed_factor4                         -0.1389833  0.2130489  -0.652 0.514363    
## ed_factor5                         -0.0639857  0.3427820  -0.187 0.851971    
## ed_factor6                         -0.3550311  0.2744471  -1.294 0.196174    
## ed_factor7                         -0.4101972  0.2555920  -1.605 0.108918    
## ed_factor8                         -0.6463454  0.5107157  -1.266 0.206041    
## ms_factor2                         -0.0610846  0.6470681  -0.094 0.924814    
## ms_factor3                         -0.0554273  0.1649357  -0.336 0.736920    
## ms_factor4                         -1.9746885  1.0047074  -1.965 0.049714 *  
## ms_factor5                         -0.1020759  0.2131926  -0.479 0.632216    
## ms_factor6                          0.0755158  0.1590379   0.475 0.635040    
## es_factor2                          0.6273998  0.1891472   3.317 0.000952 ***
## es_factor3                          0.6860731  0.3130727   2.191 0.028713 *  
## es_factor4                          0.2292128  0.1936719   1.184 0.236963    
## es_factor5                          0.6139157  0.3315804   1.851 0.064473 .  
## es_factor6                          0.3235803  0.3965150   0.816 0.414712    
## es_factor7                         -0.2504944  0.2784085  -0.900 0.368535    
## es_factor8                         -0.1376870  0.5104698  -0.270 0.787442    
## SexFemale                          -0.0792532  0.1048404  -0.756 0.449910    
## family_factorRather important      -0.2926571  0.1626289  -1.800 0.072315 .  
## family_factorNot very important    -0.2277235  0.3427414  -0.664 0.506617    
## family_factorNot at all important   1.3507210  1.0205200   1.324 0.186032    
## friends_factorRather important      0.1344582  0.1083460   1.241 0.214972    
## friends_factorNot very important   -0.4587303  0.2861957  -1.603 0.109367    
## friends_factorNot at all important -2.6856154  1.4146889  -1.898 0.058011 .  
## Trust_Most_People2                 -0.1398153  0.1032622  -1.354 0.176130    
## fcc_factor2                         0.3740162  0.9424557   0.397 0.691583    
## fcc_factor3                         0.1367166  0.5644020   0.242 0.808663    
## fcc_factor4                         0.6885071  0.5238389   1.314 0.189111    
## fcc_factor5                         0.7647836  0.4930846   1.551 0.121298    
## fcc_factor6                         1.2010960  0.4925963   2.438 0.014976 *  
## fcc_factor7                         1.5578949  0.4893847   3.183 0.001513 ** 
## fcc_factor8                         1.9184906  0.4838476   3.965 8.00e-05 ***
## fcc_factor9                         1.8856998  0.4970020   3.794 0.000159 ***
## fcc_factor10                        2.6571468  0.4898558   5.424 7.74e-08 ***
## income_factor2                      0.1811685  0.2202121   0.823 0.410927    
## income_factor3                      0.1649567  0.2392277   0.690 0.490687    
## income_factor4                      0.3823277  0.2328188   1.642 0.100954    
## income_factor5                      0.4633034  0.2370188   1.955 0.050970 .  
## income_factor6                      0.7301691  0.2668782   2.736 0.006360 ** 
## income_factor7                      0.4863428  0.2616906   1.858 0.063475 .  
## income_factor8                      0.5061840  0.2686403   1.884 0.059899 .  
## income_factor9                      0.8838371  0.3195557   2.766 0.005810 ** 
## income_factor10                     0.3894775  0.3528727   1.104 0.270045    
## Immigrant_Status                   -0.0517032  0.1615893  -0.320 0.749078    
## PC1                                 0.1075613  0.0347368   3.096 0.002028 ** 
## PC2                                 0.3454944  0.0429129   8.051 3.02e-15 ***
## PC3                                -0.0488147  0.0499047  -0.978 0.328296    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.374 on 789 degrees of freedom
## Multiple R-squared:  0.4575, Adjusted R-squared:  0.4203 
## F-statistic: 12.32 on 54 and 789 DF,  p-value: < 2.2e-16

Note: Variable employment status (es): 1 = Full time (30 hours a week or more); 2 = Part time (less than 30 hours a week); 3 = Self employed; 4 = Retired/pensioned; 5 = Housewife (not otherwise employed); 6 = Student; 7 = Unemployed; 8 = Other.

Which are the variables with a strong association?

Based on the provided analysis, several variables exhibit a strong association with life satisfaction:

Perceived Health Levels 2, 3, 4, and 5: Lower levels of perceived health show substantial decreases in life satisfaction. The coefficients indicate reductions ranging from approximately 0.36 to 2.94 units compared to the reference level, a robust relationship between health perception and life satisfaction.

Part-time Employment (es_factor2): Part-time employment is associated with higher life satisfaction, with an estimated increase of approximately 0.63 units compared to full-time employment.

Freedom of Choice and Control: Higher levels of perceived freedom of choice and control correspond to substantially higher life satisfaction. The coefficients for levels 6 to 10 range from approximately 1.20 to 2.66 units relative to the reference level, and all five are statistically significant at the 0.05 level.

Income Level: Higher income deciles are associated with higher life satisfaction. The coefficients for Deciles 6 to 9 range from approximately 0.49 to 0.88 units, but only Deciles 6 (0.73) and 9 (0.88) are statistically significant at the 0.05 level; Deciles 7 and 8 are significant only at the 0.10 level.

Principal Components PC1 and PC2: Both show statistically significant associations with life satisfaction. Each unit increase in PC1 corresponds to an increase of approximately 0.11 units in life satisfaction, and each unit increase in PC2 to an increase of approximately 0.35 units.

Together, these coefficients point to health, part-time employment, perceived freedom of choice and control, income, and the principal components as the main correlates of life satisfaction in this model.

Which ones are surprisingly not statistically significant?

At the 0.05 significance level, the variables for which no level is statistically significant include age, sex, trust, and our variable of interest, immigrant status. For age and sex this is not surprising. However, one might naturally expect immigrants and people who do not trust others to be less satisfied with life. Although the coefficients for not trusting people and for being an immigrant are both negative, they are not statistically significant.

Is your variable of interest important for life satisfaction or not at all?

We did not find statistical significance for our variable of interest (p = 0.75), so we cannot conclude that immigrant status affects life satisfaction. Taken at face value, the coefficient (-0.05) would mean that an immigrant scores about 0.05 points lower on the life satisfaction scale than a non-immigrant, holding the other variables constant. This is a difference in scale points, not in percent, and it is very small.
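To see how uncertain this estimate is, the t-statistic, p-value, and a 95% confidence interval can be rebuilt from the figures printed in the regression table for Immigrant_Status. This is a minimal sketch that uses only the reported estimate, standard error, and the 789 residual degrees of freedom; the object names are ours, not part of the original analysis.

```r
# Figures copied from the regression output for Immigrant_Status
est <- -0.0517032   # estimated coefficient
se  <-  0.1615893   # standard error
df  <-  789         # residual degrees of freedom

# t-statistic and two-sided p-value
t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df)

# 95% confidence interval for the coefficient
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

cat("t value:", round(t_value, 3), "\n")
cat("p value:", round(p_value, 3), "\n")
cat("95% CI :", round(ci, 3), "\n")
```

If the interval contains zero, the data are compatible with both a small negative and a small positive effect of immigrant status.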

Is the variable employment (with all its levels) statistically significant?

Yes, but not for all levels. At the 0.05 significance level, only part-time employment and self-employment are statistically significant; the other statuses are not.
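The individual t-tests only compare each status with the reference level. To judge the variable with all its levels at once, one option is a partial F-test that compares the model with and without the factor. The sketch below uses simulated data; the data frame `sim` and its columns `ls`, `health`, and `es` are hypothetical stand-ins, not the EVS variables.

```r
# Simulated stand-in data (hypothetical names, not the EVS columns)
set.seed(1)
n <- 300
sim <- data.frame(
  es     = factor(sample(1:4, n, replace = TRUE)),  # employment status, 4 levels
  health = rnorm(n)
)
# Outcome depends on health and on employment level 2 only
sim$ls <- 7 + 0.5 * sim$health + 0.6 * (sim$es == "2") + rnorm(n)

# Full model includes the factor; reduced model drops it
full    <- lm(ls ~ health + es, data = sim)
reduced <- lm(ls ~ health, data = sim)

# Partial F-test: do all employment dummies jointly improve the fit?
joint_test <- anova(reduced, full)
print(joint_test)
```

A small p-value in the second row of the table would indicate that the factor as a whole matters, even if some individual levels are not significant.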

In the bivariate analysis, we may not have seen a sex effect, is this in the multivariate case still so?

Yes, it is still so: in the multivariate model the sex effect is not statistically significant (p = 0.45). The coefficient for being female is -0.079, which means that women score on average about 0.08 points lower on the life satisfaction scale than men, holding the other variables constant. This is a difference in scale points, not a percentage, and it is small relative to the range of the answers.

How is the dependence on age? At what age is the minimum level of life satisfaction reached?

age_graph <- seq(min(data$Age), max(data$Age), 1)
ls_graph <- 6.8461429 + -0.0117454 * age_graph + 0.0002139 * age_graph^2

# Create a scatter plot with a regression line
plot(data$Age, data$Life_Satisfaction, 
     main = "Scatter plot with Regression Line",
     xlab = "Age", ylab = "Life_Satisfaction")
lines(age_graph, ls_graph, col = "red")

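# Find the age at which the fitted quadratic reaches its minimum.
# For y = a + b * Age + c * Age^2 with c > 0, the minimum lies at Age = -b / (2 * c).
b_age <- -0.0117454
c_age <- 0.0002139
min_life_satisfaction_age <- -b_age / (2 * c_age)
cat("The age at which the minimum life satisfaction is reached:", round(min_life_satisfaction_age, 1), "\n")
## The age at which the minimum life satisfaction is reached: 27.5

The fitted curve is U-shaped: predicted life satisfaction falls slightly until about age 27.5 and rises afterwards. Since the age terms are not statistically significant at the 0.05 level, this turning point is estimated imprecisely.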

Is there a marital status significantly different from the other levels of this variable?

Yes. Marital status 4 (Separated) is the only level that is statistically significant, and its coefficient (-1.97) is considerably larger in absolute value than those of the other levels.

Are the social capital variables (or principal components) of any relevance?

Yes, but only PC1 and PC2 are statistically significant; PC3 is not (p = 0.33). Higher scores on PC1, which likely represents political engagement and satisfaction with the political system, are associated with higher life satisfaction: about 0.11 units per unit increase in the PC1 score (p = 0.002). PC2 shows a stronger association, with an estimated increase of approximately 0.35 units per unit increase in the PC2 score (p < 0.001). PC2 may reflect other dimensions of well-being not captured by PC1.
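What a principal component represents is read from its loadings. The following is a minimal sketch of inspecting loadings with prcomp on simulated items; the item names are hypothetical and are not the EVS social capital variables.

```r
# Simulated items driven by two latent dimensions (hypothetical names)
set.seed(2)
n <- 200
political <- rnorm(n)
social    <- rnorm(n)
items <- data.frame(
  trust_parliament = political + rnorm(n, sd = 0.5),
  trust_government = political + rnorm(n, sd = 0.5),
  meet_friends     = social + rnorm(n, sd = 0.5),
  club_member      = social + rnorm(n, sd = 0.5)
)

# PCA on standardized items
pca <- prcomp(items, center = TRUE, scale. = TRUE)

# Loadings of each item on the first two components
loadings <- round(pca$rotation[, 1:2], 2)
print(loadings)

# Share of total variance explained by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 2))
```

Items with large absolute loadings on a component are the ones that define it, which is the basis for labels such as "political engagement".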