The European Values Study (EVS) and the World Values Survey (WVS) are large-scale, cross-national, longitudinal survey research programs that have been running since the early 1980s. Both programs repeat a large set of questions from wave to wave. In 2017, EVS and WVS undertook a joint data collection effort, with EVS covering European countries and WVS covering the rest of the world.
In this document, we explore the data collected in Austria in 2018. The survey aims to provide insight into how values evolve and how they affect society, informing academic and policy discussions at both national and international levels. Here, it offers a view of the social, political, and cultural dynamics within Austria.
First, we summarized the data and gathered all the information in the table below. We also renamed the “Happiness” and “Life Satisfaction” variables.
#Preparing the data
# Loading the data on rda
load("c:/users/dell/downloads/Austria.rda")
#Library
# Descriptive statistics
library(psych)
library(knitr)
library(ggplot2)
library(haven)
library(dplyr)
library(stats)
# Summary statistics
summary(data)
## cntry_AN year reg_nuts1 A001
## Length:844 Min. :2018 Length:844 Min. :1.000
## Class :character 1st Qu.:2018 Class :character 1st Qu.:1.000
## Mode :character Median :2018 Mode :character Median :1.000
## Mean :2018 Mean :1.164
## 3rd Qu.:2018 3rd Qu.:1.000
## Max. :2018 Max. :4.000
## A002 A004 A005 A006
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000
## Median :1.000 Median :2.000 Median :1.000 Median :3.000
## Mean :1.417 Mean :2.431 Mean :1.649 Mean :2.655
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## A008 A009 A170 A173
## Min. :1.000 Min. :1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 7.000 1st Qu.: 6.000
## Median :2.000 Median :2.000 Median : 8.000 Median : 8.000
## Mean :1.754 Mean :1.975 Mean : 7.919 Mean : 7.255
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.: 9.000 3rd Qu.: 9.000
## Max. :4.000 Max. :5.000 Max. :10.000 Max. :10.000
## A065 A066 A067 A068
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.359 Mean :0.1363 Mean :0.1351 Mean :0.07583
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## A071 A072 A074 A078
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.04028 Mean :0.06043 Mean :0.2761 Mean :0.03199
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## A079 A080_01 A080_02 A124_02
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.07346 Mean :0.0936 Mean :0.04976 Mean :0.09479
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## A124_03 A124_06 A124_08 A124_09
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.6789 Mean :0.1943 Mean :0.7855 Mean :0.1232
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## A165 B008 D001_B G007_18_B
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :1.000 Median :1.000 Median :1.000 Median :2.000
## Mean :1.491 Mean :1.491 Mean :1.178 Mean :1.864
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :2.000 Max. :3.000 Max. :4.000 Max. :4.000
## G007_33_B G007_34_B G007_35_B G007_36_B E023
## Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
## Median :2.000 Median :3.00 Median :2.000 Median :2.000 Median :2.000
## Mean :1.648 Mean :2.62 Mean :2.469 Mean :2.387 Mean :2.179
## 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.000
## E025 E026 E027 E028
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :1.000 Median :3.000 Median :2.000 Median :3.000
## Mean :1.615 Mean :2.368 Mean :2.283 Mean :2.685
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000
## E035 E037 E039 E111_01
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 3.000 1st Qu.: 3.000 1st Qu.: 2.000 1st Qu.: 5.000
## Median : 5.000 Median : 5.000 Median : 3.000 Median : 7.000
## Mean : 5.121 Mean : 4.614 Mean : 3.477 Mean : 6.411
## 3rd Qu.: 7.000 3rd Qu.: 6.000 3rd Qu.: 5.000 3rd Qu.: 8.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## E117 E235 F025 F028
## Min. :1.000 Min. : 1.000 Min. :0.0000 Min. :1.000
## 1st Qu.:1.000 1st Qu.: 9.000 1st Qu.:0.0000 1st Qu.:3.000
## Median :1.000 Median :10.000 Median :1.0000 Median :6.000
## Mean :1.348 Mean : 9.104 Mean :0.9929 Mean :5.351
## 3rd Qu.:2.000 3rd Qu.:10.000 3rd Qu.:1.0000 3rd Qu.:8.000
## Max. :4.000 Max. :10.000 Max. :9.0000 Max. :8.000
## F034 F050 F063 F116
## Min. :1.000 Min. :0.0000 Min. : 1.000 Min. : 1.000
## 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.: 3.000 1st Qu.: 1.000
## Median :1.000 Median :1.0000 Median : 6.000 Median : 1.000
## Mean :1.436 Mean :0.7251 Mean : 5.608 Mean : 1.793
## 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.: 8.000 3rd Qu.: 2.000
## Max. :3.000 Max. :1.0000 Max. :10.000 Max. :10.000
## F118 F119 F120 F121
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 5.000 1st Qu.: 1.000 1st Qu.: 4.000 1st Qu.: 5.000
## Median : 8.000 Median : 5.000 Median : 6.000 Median : 8.000
## Mean : 7.081 Mean : 4.578 Mean : 5.996 Mean : 7.419
## 3rd Qu.:10.000 3rd Qu.: 7.000 3rd Qu.: 9.000 3rd Qu.:10.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## F122 F123 F132 G052
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. :1.000
## 1st Qu.: 3.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.:2.000
## Median : 6.000 Median : 3.000 Median : 5.000 Median :3.000
## Mean : 5.963 Mean : 3.919 Mean : 4.668 Mean :2.883
## 3rd Qu.: 9.000 3rd Qu.: 6.000 3rd Qu.: 7.000 3rd Qu.:4.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :5.000
## X001 X002 X003 G027A X007
## Min. :1.000 Min. :1937 Min. :18.0 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1953 1st Qu.:37.0 1st Qu.:1.000 1st Qu.:1.000
## Median :2.000 Median :1968 Median :50.0 Median :1.000 Median :3.000
## Mean :1.556 Mean :1967 Mean :50.8 Mean :1.105 Mean :2.942
## 3rd Qu.:2.000 3rd Qu.:1981 3rd Qu.:65.0 3rd Qu.:1.000 3rd Qu.:5.000
## Max. :2.000 Max. :2000 Max. :82.0 Max. :2.000 Max. :6.000
## X011 x026_01 X013 X025A_01
## Min. :0.000 Min. :1.000 Min. :1.000 Min. :2.000
## 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:3.000
## Median :1.000 Median :1.000 Median :2.000 Median :3.000
## Mean :1.412 Mean :1.098 Mean :2.259 Mean :3.547
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :4.000 Max. :6.000 Max. :8.000
## X028 X047E_EVS5
## Min. :1.000 Min. : 1.000
## 1st Qu.:1.000 1st Qu.: 3.000
## Median :2.000 Median : 5.000
## Mean :2.573 Mean : 4.773
## 3rd Qu.:4.000 3rd Qu.: 7.000
## Max. :8.000 Max. :10.000
#Colnames
colnames(data)
## [1] "cntry_AN" "year" "reg_nuts1" "A001" "A002"
## [6] "A004" "A005" "A006" "A008" "A009"
## [11] "A170" "A173" "A065" "A066" "A067"
## [16] "A068" "A071" "A072" "A074" "A078"
## [21] "A079" "A080_01" "A080_02" "A124_02" "A124_03"
## [26] "A124_06" "A124_08" "A124_09" "A165" "B008"
## [31] "D001_B" "G007_18_B" "G007_33_B" "G007_34_B" "G007_35_B"
## [36] "G007_36_B" "E023" "E025" "E026" "E027"
## [41] "E028" "E035" "E037" "E039" "E111_01"
## [46] "E117" "E235" "F025" "F028" "F034"
## [51] "F050" "F063" "F116" "F118" "F119"
## [56] "F120" "F121" "F122" "F123" "F132"
## [61] "G052" "X001" "X002" "X003" "G027A"
## [66] "X007" "X011" "x026_01" "X013" "X025A_01"
## [71] "X028" "X047E_EVS5"
# Renaming the variables
colnames(data)[9] <- "Happiness"
colnames(data)[11] <- "Life_Satisfaction"
## Descriptive statistics for Happiness variable
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 844 | 1.753554 | 0.6450278 | 2 | 1.753554 | 0 | 1 | 4 | 3 | 0.4147694 | -0.1000776 | 0.0222028 |
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.754 2.000 4.000
## Descriptive statistics for Life Satisfaction variable
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 844 | 7.919431 | 1.805002 | 8 | 7.919431 | 1.4826 | 1 | 10 | 9 | -1.022233 | 0.7895252 | 0.0621307 |
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 7.000 8.000 7.919 9.000 10.000
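The Happiness statistics above can be cross-checked from frequency counts alone. The following is a minimal sketch in base R that rebuilds the 844 Happiness responses from the counts reported in this document (300, 457, 82, 5) instead of reading the original .rda file; the object names are illustrative and not taken from the original analysis.

```r
# Rebuild the Happiness responses from the reported frequency counts
# (1 = Very happy ... 4 = Not at all happy). Object names are illustrative.
happiness_counts <- c(300, 457, 82, 5)
happiness <- rep(1:4, times = happiness_counts)

n      <- length(happiness)
mean_h <- mean(happiness)
sd_h   <- sd(happiness)
se_h   <- sd_h / sqrt(n)   # standard error of the mean

round(c(n = n, mean = mean_h, sd = sd_h, se = se_h), 4)
summary(happiness)
```

The same approach works for Life_Satisfaction by replacing the counts and the `1:4` scale with the ten reported frequencies and `1:10`.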
For categorical variables, such as “Happiness,” which represents different categories of happiness, we decided to use a bar plot to illustrate the distribution. With this, we displayed the frequency of responses for each category of happiness:
##
## 1 2 3 4
## 300 457 82 5
This bar plot shows the number of individuals falling into each category of happiness, providing a visual representation of the distribution. In this specific dataset, the majority of respondents seem to report being either “Very Happy” or “Quite Happy,” with fewer individuals indicating lower levels of happiness (“Not Very Happy” and “Not at All Happy”).
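A bar plot of this kind can be sketched in base R as follows. The counts are the ones reported above; the plot title and axis labels are assumptions, and the original report may have used ggplot2 instead.

```r
# Frequencies reported above, labelled with the answer categories
happiness_counts <- c("Very happy" = 300, "Quite happy" = 457,
                      "Not very happy" = 82, "Not at all happy" = 5)

# Base-graphics bar plot (title and labels are illustrative)
barplot(happiness_counts,
        main = "Happiness (Austria, 2018)",
        xlab = "Answer category",
        ylab = "Number of respondents")

# Share of respondents who are very happy or quite happy
share_happy <- sum(happiness_counts[1:2]) / sum(happiness_counts)
round(share_happy, 3)
```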
As “Life_Satisfaction” represents a numerical scale from 1 to 10 for life satisfaction, we can visualize the distribution using a histogram. Life satisfaction can be measured by these 10 categories:
##
## 1 2 3 4 5 6 7 8 9 10
## 3 2 17 28 51 57 98 235 179 174
This histogram provides an overview of how individuals in the dataset distribute across different levels of life satisfaction, with the highest count in the “8” category, indicating a relatively high level of satisfaction.
##
## 4 - Not at all happy 3 - Not very happy 2 - Quite happy 1 - Very happy
## 1 2 0 1 0
## 2 1 0 1 0
## 3 0 6 7 4
## 4 1 13 8 6
## 5 0 19 26 6
## 6 0 12 34 11
## 7 1 14 65 18
## 8 0 14 171 50
## 9 0 0 92 87
## 10 0 4 52 118
This type of graph helps visualize the distribution of happiness levels for each life satisfaction rating and can provide insights into the relationship between the two variables.
Happiness Level 1 (Very happy). Pattern: Respondents who report being very happy tend to have higher life satisfaction scores: 273 of the 300 respondents in this group chose a score between 7 and 10. Insights: There is a clear positive relationship between reporting a high level of happiness and having higher life satisfaction scores. This aligns with the expectation that individuals who feel very happy are more likely to rate their overall life satisfaction positively.
Happiness Level 2 (Quite happy). Pattern: Similar to Happiness Level 1, individuals who report being quite happy also have higher life satisfaction scores: 380 of the 457 respondents in this group chose a score between 7 and 10, with a peak at 8 (171). Insights: The positive association observed in Happiness Level 1 is also evident in respondents who indicate being quite happy. This reinforces the notion that higher happiness levels correspond to higher life satisfaction.
Happiness Levels 3 and 4 (Not very happy, Not at all happy). Pattern: Respondents who report lower happiness levels tend to have lower life satisfaction scores: 54 of the 87 respondents in these two groups chose a score between 1 and 6. Insights: Lower happiness goes together with lower life satisfaction, which is the same positive association seen from the other end of the scale.
The observed patterns reinforce the intuitive expectation that higher happiness levels correspond to higher life satisfaction.
The strong positive association between very happy and quite happy categories and higher life satisfaction scores suggests that happiness is a key factor influencing individuals’ overall life satisfaction.
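The patterns described above can be quantified directly from the cross-tabulation. The sketch below re-enters the table by hand (it does not use the original data file) and computes, for each happiness category, the share of respondents with a life satisfaction score of 7 to 10; the object names are illustrative.

```r
# Cross-tabulation from this report:
# rows = life satisfaction (1-10), columns = happiness category
tab <- matrix(
  c(2,  0,   1,   0,
    1,  0,   1,   0,
    0,  6,   7,   4,
    1, 13,   8,   6,
    0, 19,  26,   6,
    0, 12,  34,  11,
    1, 14,  65,  18,
    0, 14, 171,  50,
    0,  0,  92,  87,
    0,  4,  52, 118),
  nrow = 10, byrow = TRUE,
  dimnames = list(satisfaction = as.character(1:10),
                  happiness = c("Not at all happy", "Not very happy",
                                "Quite happy", "Very happy"))
)

# Within each happiness category, share of respondents scoring 7 to 10
high_share <- colSums(tab[7:10, ]) / colSums(tab)
round(high_share, 2)
```

Comparing shares within each column, rather than raw counts, avoids being misled by the very different sizes of the happiness groups (5 to 457 respondents).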
In this section, we explore how happiness and life satisfaction relate to marital status, health, and income.
The variable “Marital_Status” was measured by categorizing individuals into different relationship statuses. Respondents indicated their marital status by choosing one of six options, shown as codes 1 to 6 in the tables below: married (1), living together as married (2), divorced (3), separated (4), widowed (5), and single/never married (6).
##
## 1 2 3 4 5 6
## 4 - Not at all happy 1 0 3 0 0 1
## 3 - Not very happy 25 0 23 1 16 17
## 2 - Quite happy 216 2 75 1 44 119
## 1 - Very happy 172 3 36 0 16 73
Very Happy (1): Of the 300 respondents in this category, 172 are married (code 1), 73 are single/never married (code 6), 36 are divorced (code 3), 16 are widowed (code 5), and 3 are living together as married (code 2).
Quite Happy (2): Married respondents are again the largest group (216 of 457), followed by single/never married (119), divorced (75), and widowed (44). Codes 2 and 4 account for only 2 and 1 respondents.
Not Very Happy (3): The 82 respondents in this category are spread across married (25), divorced (23), single/never married (17), and widowed (16). Because the divorced group is much smaller than the married group (137 versus 414 respondents), divorced individuals are clearly overrepresented here.
Not at All Happy (4): Only 5 respondents fall in this category (3 divorced, 1 married, 1 single/never married), which is too few to support any conclusion.
In summary, the data suggest an association between marital status and reported happiness. About 42% of married respondents are very happy (172 of 414), compared with about 35% of single/never married respondents (73 of 210) and about 26% of divorced respondents (36 of 137). Only 5 respondents are living together as married, so no pattern can be read from that group.
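Because the marital status groups differ greatly in size, column percentages are easier to compare than raw counts. The following sketch re-enters the table above by hand and uses `prop.table()` to compute the distribution of happiness within each marital status code; the object names are illustrative.

```r
# Happiness (rows) by marital status code (columns), as reported above
tab <- matrix(
  c(  1, 0,  3, 0,  0,   1,
     25, 0, 23, 1, 16,  17,
    216, 2, 75, 1, 44, 119,
    172, 3, 36, 0, 16,  73),
  nrow = 4, byrow = TRUE,
  dimnames = list(happiness = c("Not at all happy", "Not very happy",
                                "Quite happy", "Very happy"),
                  marital_status = as.character(1:6))
)

group_size <- colSums(tab)                 # respondents per marital status
col_share  <- prop.table(tab, margin = 2)  # each column sums to 1

group_size
round(col_share["Very happy", ], 3)
```

Shares for codes 2 and 4 rest on 5 and 2 respondents respectively, so they should not be interpreted.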
##
## 1 2 3 4 5 6
## 1 1 0 2 0 0 0
## 2 0 0 2 0 0 0
## 3 6 0 3 0 3 5
## 4 10 0 7 1 2 8
## 5 23 0 13 0 1 14
## 6 31 0 7 1 7 11
## 7 47 1 15 0 8 27
## 8 101 2 39 0 29 64
## 9 93 1 28 0 6 51
## 10 102 1 21 0 20 30
Dissatisfied (1): Only 3 respondents chose the lowest score: 2 are divorced (code 3) and 1 is married (code 1). These counts are too small to indicate a pattern.
Scores 2 to 9: Every larger marital status group is spread over a wide range of scores, and no single status dominates a particular score. Score 8 is the most common answer among divorced (39), widowed (29), and single/never married (64) respondents, and it is nearly the most common among married respondents (101).
Satisfied (10): Of the 174 respondents who chose the highest score, 102 are married, 30 are single/never married, 21 are divorced, and 20 are widowed. Married respondents make up about 59% of this group but only about 49% of the sample, so they are overrepresented at the top of the scale.
In summary, the data suggests a varied distribution of life satisfaction across different marital statuses. While some patterns are observed, it’s important to note that life satisfaction is influenced by a range of factors, and this analysis provides a snapshot of the reported levels within the given dataset.
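The variable “Income” was measured by categorizing households based on their total income, which includes wages, salaries, pensions, and other sources, after accounting for taxes and deductions. Each household is assigned a letter representing its income group. The tables below show the ten groups (deciles) as codes 1 to 10, so the letters H, I, and J used later correspond to deciles 8, 9, and 10.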
##
## 1 2 3 4 5 6 7 8 9 10
## 4 - Not at all happy 2 0 0 2 0 1 0 0 0 0
## 3 - Not very happy 21 19 16 6 12 1 6 1 0 0
## 2 - Quite happy 29 86 50 72 71 42 40 37 17 13
## 1 - Very happy 13 30 26 31 48 37 44 38 19 14
Very Happy (1): Respondents reporting the highest level of happiness appear in every income decile, with the largest counts in the 5th (48), 7th (44), and 8th (38) deciles.
Quite Happy (2): This category is also spread across all deciles, with the largest counts in the 2nd (86), 4th (72), and 5th (71) deciles.
Not Very Happy (3): Respondents in this category are concentrated in the lower deciles, with the largest counts in the 1st (21), 2nd (19), and 3rd (16) deciles and almost none above the 7th.
Not at All Happy (4): Only 5 respondents fall in this category: 2 in the 1st decile, 2 in the 4th, and 1 in the 6th.
In summary, happy respondents are found in every income category, and raw counts alone do not single out one decile because the deciles differ in size. Relative to decile size, however, the share of very happy respondents rises from 20% in the 1st decile (13 of 65) to 50% or more in the 8th to 10th deciles (38 of 76, 19 of 36, and 14 of 27). Happiness is influenced by many factors, and this table only shows how it intersects with income.
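Since the income deciles contain between 27 and 135 respondents, it helps to look at the share of very happy respondents within each decile. The sketch below re-enters the happiness by income table by hand; the object names are illustrative.

```r
# Happiness (rows) by income decile (columns), as reported above
tab <- matrix(
  c( 2,  0,  0,  2,  0,  1,  0,  0,  0,  0,
    21, 19, 16,  6, 12,  1,  6,  1,  0,  0,
    29, 86, 50, 72, 71, 42, 40, 37, 17, 13,
    13, 30, 26, 31, 48, 37, 44, 38, 19, 14),
  nrow = 4, byrow = TRUE,
  dimnames = list(happiness = c("Not at all happy", "Not very happy",
                                "Quite happy", "Very happy"),
                  decile = as.character(1:10))
)

decile_n         <- colSums(tab)                    # respondents per decile
share_very_happy <- tab["Very happy", ] / decile_n  # within-decile share

decile_n
round(share_very_happy, 3)
```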
##
## 1 2 3 4 5 6 7 8 9 10
## 1 2 0 0 1 0 0 0 0 0 0
## 2 1 0 0 1 0 0 0 0 0 0
## 3 3 2 0 3 2 2 1 3 0 1
## 4 4 5 5 3 3 1 5 2 0 0
## 5 6 7 9 6 8 2 4 7 0 2
## 6 5 8 9 8 8 4 7 3 3 2
## 7 8 18 13 17 17 8 10 2 2 3
## 8 14 50 24 27 39 22 25 18 11 5
## 9 10 27 17 22 26 19 15 24 9 10
## 10 12 18 15 23 28 23 23 17 11 4
Dissatisfied (1): Only 3 respondents chose the lowest score: 2 in the 1st decile and 1 in the 4th.
Scores 2 to 9: The counts are spread across all deciles, and no single income category dominates a particular score. Relative to decile size, higher scores are more common in the upper deciles (H, I, J).
Satisfied (10): In absolute terms, the highest score is most frequent in the 5th decile (28), followed by the 4th, 6th, and 7th (23 each). Relative to decile size, it is most common in the 9th decile (11 of 36) and the 6th decile (23 of 81), so the top score alone does not increase steadily with income.
In summary, the data suggest a positive association between life satisfaction and income: scores of 9 or 10 are chosen by about one third of respondents in the three lowest deciles and by more than half in the three highest (H, I, J). Scores of 1 to 4 are rare everywhere but most frequent in the 1st decile (10 of 65).
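One way to summarize the table above is the mean life satisfaction score per income decile. The following sketch re-enters the table by hand and treats the 1 to 10 scale as numeric, which is an assumption of this example; the object names are illustrative.

```r
# Life satisfaction (rows, 1-10) by income decile (columns, 1-10)
tab <- matrix(
  c( 2,  0,  0,  1,  0,  0,  0,  0,  0,  0,
     1,  0,  0,  1,  0,  0,  0,  0,  0,  0,
     3,  2,  0,  3,  2,  2,  1,  3,  0,  1,
     4,  5,  5,  3,  3,  1,  5,  2,  0,  0,
     6,  7,  9,  6,  8,  2,  4,  7,  0,  2,
     5,  8,  9,  8,  8,  4,  7,  3,  3,  2,
     8, 18, 13, 17, 17,  8, 10,  2,  2,  3,
    14, 50, 24, 27, 39, 22, 25, 18, 11,  5,
    10, 27, 17, 22, 26, 19, 15, 24,  9, 10,
    12, 18, 15, 23, 28, 23, 23, 17, 11,  4),
  nrow = 10, byrow = TRUE,
  dimnames = list(satisfaction = as.character(1:10),
                  decile = as.character(1:10))
)

scores <- 1:10
# tab * scores multiplies each row by its score (recycling down the columns)
mean_by_decile <- colSums(tab * scores) / colSums(tab)
share_9_10     <- colSums(tab[9:10, ]) / colSums(tab)

round(mean_by_decile, 2)
round(share_9_10, 3)
```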
The variable “Health” was measured by assessing individuals’ subjective evaluation of their overall health. Respondents were asked to describe their state of health by choosing one of five options, coded 1 to 5 in the tables below: 1 = Very good, 2 = Good, 3 = Fair, 4 = Poor, 5 = Very poor.
##
## 1 2 3 4 5
## 4 - Not at all happy 1 0 1 0 3
## 3 - Not very happy 3 15 45 18 1
## 2 - Quite happy 113 223 104 15 2
## 1 - Very happy 172 100 22 6 0
Very Happy (1): Individuals reporting higher happiness levels are predominantly in the “Very good” and “Good” categories of health. This suggests a positive association between high happiness and good health.
Quite Happy (2): Respondents in this category are mostly in the “Good” (223) and “Very good” (113) health categories, with a further 104 in “Fair”.
Not Very Happy (3): Respondents with lower happiness levels are concentrated in the “Fair” health category (45 of 82), followed by “Poor” (18) and “Good” (15).
Not at All Happy (4): Only 5 respondents fall in this category, and 3 of them report the worst health category (code 5).
In summary, the data suggest a positive association between higher happiness levels and better self-reported health. Respondents with lower happiness levels are found mainly in the “Fair” and “Poor” categories rather than in “Very good” or “Good”.
##
## 1 2 3 4 5
## 1 1 0 1 0 1
## 2 1 0 0 0 1
## 3 4 7 3 3 0
## 4 7 7 5 8 1
## 5 10 15 20 4 2
## 6 11 23 18 5 0
## 7 25 40 27 5 1
## 8 64 100 59 12 0
## 9 77 81 19 2 0
## 10 89 65 20 0 0
Dissatisfied (1): Only 3 respondents chose the lowest score, one each in health categories 1 (“Very good”), 3 (“Fair”), and 5.
Scores 2 to 9: The frequencies are spread across various life satisfaction levels, and no single health category dominates a particular score. However, individuals reporting higher life satisfaction levels tend to be more prevalent in the “Very good” and “Good” health categories.
Satisfied (10): Individuals reporting the highest life satisfaction levels are more concentrated in the “Very good” and “Good” health categories, indicating a positive association between high life satisfaction and better self-reported health.
In summary, the data suggests a positive association between higher life satisfaction levels and better self-reported health, particularly in the “Very good” and “Good” health categories. Lower life satisfaction levels are more dispersed across different health categories.
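The strength of this association can be expressed as a rank correlation computed from the table itself. The sketch below re-enters the life satisfaction by health table by hand, expands it to one row per respondent, and computes Spearman's rho. Treating both variables as ordinal ranks is an assumption of this example, and the object names are illustrative.

```r
# Life satisfaction (rows, 1-10) by health code (columns, 1 = best, 5 = worst)
tab <- matrix(
  c( 1,   0,  1,  0, 1,
     1,   0,  0,  0, 1,
     4,   7,  3,  3, 0,
     7,   7,  5,  8, 1,
    10,  15, 20,  4, 2,
    11,  23, 18,  5, 0,
    25,  40, 27,  5, 1,
    64, 100, 59, 12, 0,
    77,  81, 19,  2, 0,
    89,  65, 20,  0, 0),
  nrow = 10, byrow = TRUE
)

# Expand the table to one (row, col) pair per respondent
idx   <- which(tab > 0, arr.ind = TRUE)
cases <- idx[rep(seq_len(nrow(idx)), times = tab[idx]), ]

# Row index = life satisfaction score, column index = health code
rho <- cor(cases[, "row"], cases[, "col"], method = "spearman")
rho
```

Because a higher health code means worse health, a negative rho here corresponds to the positive association between good health and life satisfaction described above.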
##
## 1 2 8 9
## 1 0 3 0 0
## 2 1 1 0 0
## 3 6 11 0 0
## 4 11 17 0 0
## 5 18 33 0 0
## 6 27 30 0 0
## 7 41 57 0 0
## 8 123 112 0 0
## 9 113 66 0 0
## 10 90 84 0 0
Trust Level 1 (most people can be trusted). Pattern: There is a noticeable positive association between trusting most people and higher life satisfaction. The counts peak at satisfaction levels 8 and 9 (123 and 113 respondents), and 47% of this group (203 of 430) chose a score of 9 or 10. Insights: Individuals who express trust in others tend to report higher life satisfaction. This aligns with existing literature suggesting that a general trust in people is linked to positive well-being.
Trust Level 2 (can’t be too careful). Pattern: The counts also peak at level 8 (112), but they are spread more widely over the lower scores, and only 36% of this group (150 of 414) chose a score of 9 or 10. Insights: People who express caution and indicate that they can’t be too careful show a less clear relationship with life satisfaction. The counts are spread across various satisfaction levels, and there isn’t a dominant trend.
Trust Levels 8 and 9 (Don’t know, No answer). Pattern: Both these trust levels have zero counts across all life satisfaction levels. Insights: No respondent in this dataset is recorded as “Don’t know” or “No answer” on the trust question, so these columns carry no information about life satisfaction. They presumably appear in the table only because the codes are defined for the variable.
Trust in people seems to be a factor associated with higher life satisfaction. The pattern observed in Trust Level 1 implies that fostering a sense of trust in a community might contribute positively to overall life satisfaction.
Obs: Because the “Don’t know” and “No answer” columns are empty, the comparison above rests entirely on the two substantive trust categories.
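To check whether the difference between the two trust groups is larger than chance variation would suggest, one option is a two-sample test of proportions. The sketch below compares the share of respondents scoring 9 or 10 on life satisfaction in each group, using the counts from the table above. The choice of the 9 to 10 cut-off and of `prop.test()` is an assumption of this example, not the method used in the original analysis.

```r
# Life satisfaction counts (scores 1-10) per trust group, from the table above
trusting <- c(0, 1, 6, 11, 18, 27, 41, 123, 113, 90)   # most people can be trusted
careful  <- c(3, 1, 11, 17, 33, 30, 57, 112, 66, 84)   # can't be too careful

high    <- c(sum(trusting[9:10]), sum(careful[9:10]))  # respondents scoring 9 or 10
group_n <- c(sum(trusting), sum(careful))

# Two-sample test for equality of proportions (with continuity correction)
test <- prop.test(x = high, n = group_n)
round(unname(test$estimate), 3)
test$p.value
```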
##
## 1 2 8 9
## 4 173 127 0 0
## 3 232 225 0 0
## 2 24 58 0 0
## 1 1 4 0 0
In this table happiness is reverse-coded relative to the earlier tables, so that 4 = Very happy and 1 = Not at all happy (the row totals of 300, 457, 82, and 5 match the earlier frequencies).
Trust Level 1 (most people can be trusted). Pattern: Individuals who express trust in most people tend to report higher levels of happiness: 405 of the 430 respondents in this group are “Quite happy” or “Very happy”, and 40% are “Very happy” (173 of 430). Insights: There is a positive association between trusting most people and higher happiness levels. This aligns with the idea that a general sense of trust can contribute to an individual’s overall happiness.
Trust Level 2 (can’t be too careful). Pattern: Most of this group is also “Quite happy” or “Very happy” (352 of 414), but only 31% are “Very happy” (127 of 414), and 62 respondents fall in the two lowest happiness categories, compared with 25 in Trust Level 1. Insights: People who express caution and indicate that they can’t be too careful report somewhat lower happiness, and the relationship is less pronounced than for Trust Level 1.
Trust Levels 8 and 9 (Don’t know, No answer). Pattern: Both these trust levels have zero counts across all happiness levels. Insights: No respondent in this dataset is recorded as “Don’t know” or “No answer” on the trust question, so these columns carry no information about happiness.
Similar to the analysis with life satisfaction, trust in people seems to be associated with higher happiness. The pattern observed in Trust Level 1 implies that fostering a sense of trust in a community might contribute positively to overall happiness.
Obs: Because the “Don’t know” and “No answer” columns are empty, the comparison above rests entirely on the two substantive trust categories.
As expected, people with greater life satisfaction and happiness tend to rely more on friends.
The same holds for the importance of family, and to a greater extent.
For this, we ran a PCA on the following variables from the study:
# Renaming the variables
colnames(data)[6] <- "Politics_Importance"
colnames(data)[37] <- "Interest_In_Politics"
colnames(data)[38] <- "Sign_Petition"
colnames(data)[39] <- "Join_Boycotts"
colnames(data)[40] <- "Attend_Demonstrations"
colnames(data)[41] <- "Join_Unofficial_Strikes"
colnames(data)[45] <- "Satisfaction_Political_System"
colnames(data)[46] <- "Democratic_Political_System"
colnames(data)[47] <- "Importance_of_Democracy"
OBS: We reversed the Satisfaction_Political_System scale so that 1 means completely satisfied and 10 means not satisfied at all. This makes the results easier to interpret.
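Reversing a 1 to 10 scale amounts to subtracting each value from 11. A minimal sketch follows; the helper name `reverse_scale` is hypothetical and is not part of the original analysis.

```r
# Reverse a bounded rating scale: min becomes max and vice versa.
# `reverse_scale` is a hypothetical helper, not from the original report.
reverse_scale <- function(x, min_value = 1, max_value = 10) {
  (min_value + max_value) - x
}

reverse_scale(c(1, 5, 10))   # returns 10 6 1
```

Missing values stay missing under this transformation, and the mean of the reversed variable equals 11 minus the original mean.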
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.6305 1.2188 1.0347 0.9433 0.75966 0.70775 0.66743
## Proportion of Variance 0.3323 0.1857 0.1338 0.1112 0.07213 0.06261 0.05568
## Cumulative Proportion 0.3323 0.5180 0.6518 0.7630 0.83517 0.89779 0.95347
## PC8
## Standard deviation 0.61012
## Proportion of Variance 0.04653
## Cumulative Proportion 1.00000
Upon analyzing the scree plot, a distinct elbow was evident, signifying that the first few principal components, particularly PC1, PC2 and PC3, capture a substantial portion of the overall variance in the data. The elbow method, validated by the plot’s noticeable bend after PC3, guided our decision to prioritize these key contributors.
Subsequent examination of the cumulative proportion of variance reinforced this choice, revealing that PC1, PC2, and PC3 collectively explain 65.18% of the dataset’s total variance. This proportion supports the decision to focus on these principal components, striking a balance between capturing essential information and maintaining analytical simplicity.
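The proportions of variance can be recomputed from the standard deviations in the PCA summary: each component's variance is its squared standard deviation, divided by the total. The sketch below uses the rounded standard deviations printed above, so results agree with the summary only up to rounding; the object names are illustrative.

```r
# Standard deviations of PC1..PC8 as printed in the PCA summary
sdev <- c(1.6305, 1.2188, 1.0347, 0.9433, 0.75966, 0.70775, 0.66743, 0.61012)

eigenvalues <- sdev^2                          # variance of each component
prop_var    <- eigenvalues / sum(eigenvalues)  # proportion of variance
cum_var     <- cumsum(prop_var)                # cumulative proportion

round(rbind(prop_var, cum_var), 4)

# Scree plot: look for the bend ("elbow") in the curve
plot(eigenvalues, type = "b",
     xlab = "Principal component", ylab = "Variance (eigenvalue)",
     main = "Scree plot")
```

With eight standardized variables the eigenvalues sum to about 8, which is why the first three components (about 2.66, 1.49, and 1.07) account for roughly 65% of the variance.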
The component loadings represent the correlation between the original variables and each principal component. Higher absolute values indicate a stronger correlation.
## PC1 PC2 PC3 PC4
## Politics_Importance -0.38769993 0.10930526 -0.5975678525 -0.16283646
## Interest_In_Politics -0.44139015 0.10596216 -0.4917631056 -0.03212581
## Sign_Petition -0.44630768 0.04246822 0.1954184735 0.09642800
## Join_Boycotts -0.38129810 0.28180280 0.4193160240 -0.13711993
## Attend_Demonstrations -0.40908839 0.26544071 0.4014660444 -0.02375614
## Satisfaction_Political_System -0.06015696 -0.42941116 0.1049512349 -0.88298350
## Democratic_Political_System -0.25414117 -0.57682388 0.1219735685 0.22667153
## Importance_of_Democracy 0.27335838 0.55500406 -0.0003222443 -0.33577938
## PC5 PC6 PC7 PC8
## Politics_Importance -0.182743226 0.07462493 -0.05028878 -0.642374613
## Interest_In_Politics 0.005883865 -0.04394387 0.05036271 0.739311604
## Sign_Petition 0.796688592 0.15338228 -0.28238512 -0.116357075
## Join_Boycotts -0.379769364 -0.48364703 -0.44981680 0.009836965
## Attend_Demonstrations -0.169767718 0.33542029 0.67633937 -0.041058439
## Satisfaction_Political_System 0.109837127 -0.02086056 0.07975329 0.049524013
## Democratic_Political_System -0.378305629 0.52085919 -0.34024522 0.079074022
## Importance_of_Democracy -0.060269308 0.59230335 -0.36509696 0.129407687
## Warning in plot.window(...): "text" is not a graphical parameter
## (the same warning is repeated for plot.xy, axis, box, title, and text.default)
PC1, strong negative loadings: “Politics_Importance”: -0.3877 “Interest_In_Politics”: -0.4414 “Sign_Petition”: -0.4463 “Join_Boycotts”: -0.3813 “Attend_Demonstrations”: -0.4091
Individuals with lower scores on PC1 are less likely to find politics important in their lives, have lower interest in politics, and are less inclined to engage in specific political actions such as signing petitions, joining boycotts, or attending lawful/peaceful demonstrations.
PC3, negative loadings: “Important in life: Politics”: -0.5976 “Interest in politics”: -0.4918
PC3, positive loadings: “Political action: signing a petition”: 0.1954 “Political action: joining in boycotts”: 0.4193 “Political action: attending lawful/peaceful demonstrations”: 0.4015 “Satisfaction with the political system (reversed)”: 0.1050 “Political system: Having a democratic political system”: 0.1220
Individuals with lower scores on PC3 are less likely to consider politics important in their lives and have less interest in politics. At the same time, they are more likely to engage in specific political actions such as signing petitions, joining boycotts, and attending lawful/peaceful demonstrations. They also tend to express higher satisfaction with the political system and to prefer a democratic political system.
PC1:
Captures the lack of interest and engagement in specific political actions among those who find politics less important.
PC2:
Emphasizes a strong link between satisfaction and a preference for democracy.
PC3:
Suggests a group that, while not personally prioritizing politics, actively engages in political activities and supports a democratic system.
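For readers who want to reproduce this kind of analysis, the sketch below runs a PCA with `prcomp()` on simulated data, since the original data file is not included here. The four items and the latent trait are hypothetical and only serve to show where the loadings, the variance summary, and the component scores are found.

```r
set.seed(1)
n <- 200
engagement <- rnorm(n)   # hypothetical latent trait driving three items

# Simulated survey items (hypothetical, not EVS variables)
items <- data.frame(
  item_a =  engagement + rnorm(n, sd = 0.5),
  item_b =  engagement + rnorm(n, sd = 0.5),
  item_c = -engagement + rnorm(n, sd = 0.5),
  item_d =  rnorm(n)     # unrelated item
)

# Center and standardize so that each item contributes equally
pca <- prcomp(items, center = TRUE, scale. = TRUE)

round(pca$rotation, 2)     # loadings: one column per component
summary(pca)$importance    # standard deviation and variance proportions
head(pca$x)                # component scores for the first respondents
```

The sign of each component is arbitrary: multiplying all loadings of a component by -1 gives an equivalent solution, so interpretations should rest on which variables load together and in which relative direction.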
We decided to test how strong the associations are by using two approaches, correlation analysis and regression analysis, and then comparing the findings.
Note: We decided to recode happiness to examine its alignment with the direction of movement in the other two variables and to facilitate a more meaningful comparison across the dataset.
Q: Please use the scale to indicate how much freedom of choice and control you feel you have over the way your life turns out.
Scale: 1 = None at all, 2 to 9 = intermediate values, 10 = A great deal.
# Load the necessary library for spearman correlation
library(Hmisc)
# Renaming the variables
colnames(data)[12] <- "Freedom_of_Choice_Control"
# Convert to numeric if necessary
data$Freedom_of_Choice_Control <- as.numeric(data$Freedom_of_Choice_Control)
data$Happiness <- as.numeric(data$Happiness)
# Extract the happiness variable + other variables
happiness <- data$Happiness
life_satisfaction <- data$Life_Satisfaction
freedom_choice_control <- data$Freedom_of_Choice_Control
# Standardize the variables
scaled_happiness <- scale(happiness)
scaled_life_satisfaction <- scale(life_satisfaction)
scaled_freedom_choice_control <- scale(freedom_choice_control)
# Pearson correlation between Freedom_of_Choice_Control and Happiness
pearson_corr_fc_happiness <- cor(data$Freedom_of_Choice_Control, data$Happiness, method = "pearson")
# Spearman correlation between Freedom_of_Choice_Control and Happiness
spearman_corr_fc_happiness <- cor(data$Freedom_of_Choice_Control, data$Happiness, method = "spearman")
# Pearson correlation between Freedom_of_Choice_Control and life_satisfaction
pearson_corr_fc_life_satisfaction <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "pearson")
# Spearman correlation between Freedom_of_Choice_Control and life_satisfaction
spearman_corr_fc_life_satisfaction <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "spearman")
## Pearson Correlation between Freedom_of_Choice_Control and Happiness: -0.2132671
## Spearman Correlation between Freedom_of_Choice_Control and Happiness: -0.1842096
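There is a weak negative correlation (close to -0.2) between freedom of choice/control and happiness. A negative coefficient means that the two scores move in opposite directions: as the perceived freedom of choice/control score increases, the Happiness score tends to decrease slightly, and vice versa. How this reads substantively depends on the coding of Happiness: if lower values denote greater happiness, as in the original EVS item, more perceived freedom goes together with greater happiness. The correlation is weaker than the correlation with life satisfaction.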
## Pearson Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.492049
## Spearman Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.4683609
There is a moderate positive correlation (close to 0.5) between freedom of choice/control and life satisfaction. This suggests that as freedom of choice/control increases, there is a tendency for life satisfaction to increase, and vice versa.
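Because the sign of a correlation depends on how an ordinal item is coded, it can help to see what reverse-coding does. The sketch below uses simulated data, not the survey; `happy_raw` is a hypothetical 4-point item assumed to be coded 1 = happiest to 4 = least happy, and `happy_rev` is its reversed version.

```r
# Simulated data: a 1-10 "freedom" score and a 4-point item where 1 is best.
set.seed(42)
n <- 500
freedom <- sample(1:10, n, replace = TRUE)

# Latent unhappiness falls as freedom rises; cut it into four ordered codes.
latent <- -0.3 * freedom + rnorm(n, sd = 2)
happy_raw <- as.integer(cut(latent,
                            breaks = quantile(latent, 0:4 / 4),
                            include.lowest = TRUE))

# Reverse-code so that higher values mean "happier" (1 <-> 4, 2 <-> 3).
happy_rev <- 5 - happy_raw

pearson_raw  <- cor(freedom, happy_raw, method = "pearson")
pearson_rev  <- cor(freedom, happy_rev, method = "pearson")
spearman_raw <- cor(freedom, happy_raw, method = "spearman")
spearman_rev <- cor(freedom, happy_rev, method = "spearman")

# Reversing the scale flips the sign but leaves the magnitude unchanged.
print(c(pearson_raw = pearson_raw, pearson_rev = pearson_rev))
print(c(spearman_raw = spearman_raw, spearman_rev = spearman_rev))
```

The strength of the association is the same in both codings; only the direction of the verbal interpretation changes.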
##
## Call:
## lm(formula = Life_Satisfaction ~ Freedom_of_Choice_Control, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9313 -0.8078 0.1922 0.8775 4.8217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.74004 0.20127 23.55 <2e-16 ***
## Freedom_of_Choice_Control 0.43825 0.02672 16.40 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.572 on 842 degrees of freedom
## Multiple R-squared: 0.2421, Adjusted R-squared: 0.2412
## F-statistic: 269 on 1 and 842 DF, p-value: < 2.2e-16
In the analysis of Life Satisfaction predicted by Freedom of Choice/Control, the linear regression model reveals a statistically significant and positive relationship. The estimated coefficient for Freedom_of_Choice_Control is 0.43825, indicating that, on average, a one-unit increase in Freedom of Choice/Control is associated with a 0.43825 unit increase in Life Satisfaction. The model is highly significant (p-value < 2.2e-16), and approximately 24.21% of the variability in Life Satisfaction is explained by Freedom of Choice/Control, as indicated by the R-squared value. These findings suggest that, while Freedom of Choice/Control is a significant predictor of Life Satisfaction, other unexamined factors may contribute to the remaining variance.
##
## Call:
## lm(formula = Happiness ~ Freedom_of_Choice_Control, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1781 -0.6351 0.1613 0.2970 2.4328
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.24600 0.08072 27.825 < 2e-16 ***
## Freedom_of_Choice_Control -0.06788 0.01072 -6.334 3.88e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6306 on 842 degrees of freedom
## Multiple R-squared: 0.04548, Adjusted R-squared: 0.04435
## F-statistic: 40.12 on 1 and 842 DF, p-value: 3.881e-10
In the analysis of Happiness predicted by Freedom of Choice/Control, the linear regression model also reveals a statistically significant relationship, but with a negative association. The estimated coefficient for Freedom_of_Choice_Control is -0.06788, indicating that a one-unit increase in Freedom of Choice/Control is associated with a decrease of 0.06788 units in Happiness. The model is highly significant (p-value = 3.88e-10), yet the R-squared value is relatively low at 4.55%, suggesting that Freedom of Choice/Control explains a modest portion of the variability in Happiness. This indicates a nuanced relationship where an increase in Freedom of Choice/Control is associated with a decrease in Happiness, but the strength of this association is limited, leaving room for the influence of other factors on Happiness.
The analysis provides clear evidence of an association between individual autonomy, as measured by Freedom of Choice/Control, and life satisfaction. This is supported by a moderate positive correlation of 0.49 and a significant positive coefficient in the regression model, where 24.21% of the variance is explained. The relationship with happiness is weaker, with a correlation of -0.21 and only 4.55% of the variance in Happiness explained by the regression model. While individual autonomy is clearly linked to life satisfaction, the connection with happiness is weaker and warrants further investigation.
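The correlation and regression results agree for a structural reason: in a regression with a single predictor, R-squared equals the squared Pearson correlation. A minimal sketch on simulated data (the variables `x` and `y` are hypothetical), followed by the same check using the figures reported above:

```r
# Simulated predictor and outcome
set.seed(7)
n <- 300
x <- sample(1:10, n, replace = TRUE)
y <- 5 + 0.4 * x + rnorm(n, sd = 1.5)

fit <- lm(y ~ x)
r <- cor(x, y)

# With one predictor, R-squared is the square of the Pearson correlation.
r2_model <- summary(fit)$r.squared
print(c(r_squared = r2_model, r_squared_from_cor = r^2))

# Same identity applied to the correlations reported in this document
r2_life_satisfaction <- 0.492049^2      # compare with Multiple R-squared 0.2421
r2_happiness         <- (-0.2132671)^2  # compare with Multiple R-squared 0.04548
print(round(c(r2_life_satisfaction, r2_happiness), 4))
```

The two approaches therefore carry the same information about strength of association here; the regression adds the slope in the units of the outcome.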
For task 9, out of the variables that were not used to explain SWB within this project, we chose the variable “Respondent immigrant / born in country” (renamed to Immigrant_Status). We chose it because we wanted to know whether being an immigrant or a native affects happiness and life satisfaction. To test this we used two different approaches: correlation analysis and regression analysis.
colnames(data)[65] <- "Immigrant_Status"
# Standardize the variables
scaled_immigrant_status <- scale(data$Immigrant_Status)
### Correlation
# Pearson correlation between Immigrant_Status and Happiness
pearson_corr_ims_happiness <- cor(data$Immigrant_Status, data$Happiness, method = "pearson")
# Spearman correlation between Immigrant_Status and Happiness
spearman_corr_ims_happiness <- cor(data$Immigrant_Status, data$Happiness, method = "spearman")
# Pearson correlation between Immigrant_Status and Life_Satisfaction
pearson_corr_ims_life_satisfaction <- cor(data$Immigrant_Status, data$Life_Satisfaction, method = "pearson")
# Spearman correlation between Immigrant_Status and Life_Satisfaction
spearman_corr_ims_life_satisfaction <- cor(data$Immigrant_Status, data$Life_Satisfaction, method = "spearman")
## Pearson Correlation between Immigrant Status and Happiness: 0.04149277
## Spearman Correlation between Immigrant Status and Happiness: 0.04024693
A correlation coefficient of 0.04 between immigrant status and happiness suggests a very weak positive correlation. This means that there is a slight tendency for the immigrant status and happiness scores to move in the same direction, but the relationship is very weak.
## Pearson Correlation between Immigrant Status and Life_Satisfaction: -0.02315874
## Spearman Correlation between Immigrant Status and Life_Satisfaction: -0.03093079
A correlation coefficient of -0.02 between immigrant status and life satisfaction suggests an extremely weak negative correlation. This indicates a slight tendency for immigrant status and life satisfaction to move in opposite directions, but again, the relationship is extremely weak.
##
## Call:
## lm(formula = Life_Satisfaction ~ Immigrant_Status, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9338 -0.9338 0.0662 1.0662 2.2022
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.0698 0.2322 34.758 <2e-16 ***
## Immigrant_Status -0.1360 0.2024 -0.672 0.502
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.806 on 842 degrees of freedom
## Multiple R-squared: 0.0005363, Adjusted R-squared: -0.0006507
## F-statistic: 0.4518 on 1 and 842 DF, p-value: 0.5017
The regression analysis results show the relationship between life satisfaction and immigrant status:
The coefficient for immigrant status is -0.1360 with a standard error of 0.2024 and a t-value of -0.672. However, this coefficient is not statistically significant (p-value = 0.502), indicating that there is no evidence to reject the null hypothesis that the coefficient is equal to zero.
The adjusted R-squared value is -0.0006507, indicating that the model does not explain a significant portion of the variance in life satisfaction.
The F-statistic is 0.4518 with a p-value of 0.5017, suggesting that the model as a whole does not significantly predict life satisfaction.
Overall, the results suggest that there is no significant relationship between immigrant status and life satisfaction.
##
## Call:
## lm(formula = Happiness ~ Immigrant_Status, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8315 -0.7444 0.2556 0.2556 2.2556
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.65728 0.08292 19.987 <2e-16 ***
## Immigrant_Status 0.08709 0.07227 1.205 0.229
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6449 on 842 degrees of freedom
## Multiple R-squared: 0.001722, Adjusted R-squared: 0.000536
## F-statistic: 1.452 on 1 and 842 DF, p-value: 0.2285
The regression analysis results show the relationship between happiness and immigrant status:
The coefficient for immigrant status is 0.08709 with a standard error of 0.07227 and a t-value of 1.205. This coefficient is not statistically significant (p-value = 0.229), indicating that there is no evidence to reject the null hypothesis that the coefficient is equal to zero.
The adjusted R-squared value is 0.000536, indicating that the model explains a very small fraction of the variance in happiness.
The F-statistic is 1.452 with a p-value of 0.2285, suggesting that the model as a whole does not significantly predict happiness.
Overall, the results suggest that there is no significant relationship between immigrant status and happiness.
In the correlation analysis, immigrant status showed a very weak positive correlation with happiness (Pearson: 0.0415, Spearman: 0.0402) and an extremely weak negative correlation with life satisfaction (Pearson: -0.0232, Spearman: -0.0309). In the regression analysis, immigrant status did not emerge as a significant predictor of either life satisfaction or happiness. The regression models explained minimal variability (Adjusted R-squared: -0.0006507 for life satisfaction and 0.000536 for happiness), indicating that immigrant status alone does not significantly predict well-being. Both approaches therefore lead to the same conclusion: the weak correlations observed between immigrant status and well-being are not statistically significant.
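With a two-valued predictor such as immigrant status, the regression slope is simply the difference between the two group means, and its p-value matches a pooled-variance t-test. The sketch below illustrates this on simulated data; `status` and `outcome` are hypothetical stand-ins, with `status` assumed to take the values 1 and 2.

```r
# Simulated binary predictor (1/2) and a 1-10 outcome with no true group effect
set.seed(3)
n <- 400
status <- sample(c(1, 2), n, replace = TRUE, prob = c(0.85, 0.15))
outcome <- round(pmin(pmax(rnorm(n, mean = 8, sd = 1.8), 1), 10))

fit_bin <- lm(outcome ~ status)

# Slope = mean of group 2 minus mean of group 1
slope <- unname(coef(fit_bin)["status"])
mean_diff <- mean(outcome[status == 2]) - mean(outcome[status == 1])
print(c(slope = slope, mean_diff = mean_diff))

# The p-value of the slope equals that of a two-sample t-test with equal variances
p_lm <- summary(fit_bin)$coefficients["status", "Pr(>|t|)"]
tt <- t.test(outcome ~ factor(status), var.equal = TRUE)
print(c(p_lm = p_lm, p_ttest = tt$p.value))
```

Read this way, a coefficient of -0.1360 for life satisfaction is a difference of about 0.14 scale points between the two groups.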
### Correlation Analysis
# Calculate correlation coefficients
pearson_corr_life_satisfaction_control <- cor(data$Life_Satisfaction, data$Freedom_of_Choice_Control, method = "pearson")
spearman_corr_life_satisfaction_control <- cor(data$Freedom_of_Choice_Control, data$Life_Satisfaction, method = "spearman")
pearson_corr_immigrant_health <- cor(data$Immigrant_Status, data$Health, method = "pearson")
spearman_corr_immigrant_health <- cor(data$Immigrant_Status, data$Health, method = "spearman")
# Print correlation coefficients
cat("Pearson Correlation between Freedom_of_Choice_Control and Life_Satisfaction:", pearson_corr_life_satisfaction_control, "\n")
## Pearson Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.492049
cat("Spearman Correlation between Freedom_of_Choice_Control and Life_Satisfaction:", spearman_corr_life_satisfaction_control, "\n")
## Spearman Correlation between Freedom_of_Choice_Control and Life_Satisfaction: 0.4683609
cat("Pearson Correlation between Immigrant_Status and Health:", pearson_corr_immigrant_health, "\n")
## Pearson Correlation between Immigrant_Status and Health: 0.0009282111
cat("Spearman Correlation between Immigrant_Status and Health:", spearman_corr_immigrant_health, "\n")
## Spearman Correlation between Immigrant_Status and Health: 0.00827119
Interpretation of the correlation coefficients:
Freedom of Choice/Control and Life Satisfaction: The Pearson correlation coefficient between Freedom of Choice/Control and Life Satisfaction is 0.49. This suggests a moderate positive correlation between these variables, indicating that as the level of perceived freedom of choice and control increases, life satisfaction tends to increase as well. The Spearman correlation coefficient of 0.47 confirms this positive relationship.
Immigrant Status and Health: The correlation coefficient between Immigrant Status and Health is close to 0 (0.0009 for Pearson and 0.0082 for Spearman). This indicates a very weak correlation between immigrant status and health status, suggesting that there is no significant linear relationship between these two variables.
# Create a contingency table between variables
contingency_table_aa <- table(data$Life_Satisfaction, data$Freedom_of_Choice_Control)
print(contingency_table_aa)
##
## 1 2 3 4 5 6 7 8 9 10
## 1 2 0 0 0 1 0 0 0 0 0
## 2 0 0 0 1 0 1 0 0 0 0
## 3 1 0 4 3 5 2 1 1 0 0
## 4 1 1 1 5 7 7 2 3 1 0
## 5 1 0 5 8 16 7 2 2 3 7
## 6 2 1 2 7 14 13 7 4 4 3
## 7 0 0 6 6 13 20 24 12 10 7
## 8 1 1 2 8 22 23 51 73 27 27
## 9 1 0 1 1 9 14 39 69 28 17
## 10 1 0 2 3 11 14 12 32 17 82
# Spineplot for Life Satisfaction vs. Control over Life
spineplot(contingency_table_aa,
main = "Spine Plot: Life Satisfaction vs. Control over Life",
xlab = "Life Satisfaction",
ylab = "Control over Life",
col.lab = "black",
col = c("#43ED9E", "#FA617D", "#7F5AF8", "#FFDD32", "#42A5F5", "#FF8A65", "#66BB6A", "#FFEB3B", "#9575CD", "#78909C"))
# Create a contingency table
contingency_table_bb <- table(data$Health, data$Immigrant_Status)
# Spineplot for Health vs. Immigrant Status
spineplot(contingency_table_bb,
main = "Spine Plot: Health vs. Immigrant Status",
xlab = "Health",
ylab = "Immigrant Status",
col.lab = "black",
col = c("#43ED9E", "#FA617D"))
# Add legend (fill order matches the col vector passed to spineplot, first level first)
legend("topright",
legend = c("Born in this country", "Immigrant to this country"),
fill = c("#43ED9E", "#FA617D"))
# Perform hypothesis tests (Pearson correlation tests)
test_result_control <- cor.test(data$Life_Satisfaction, data$Freedom_of_Choice_Control)
test_result_health <- cor.test(data$Immigrant_Status, data$Health)
# Print the test results
cat("\nTest for Statistical Significance - Life Satisfaction and Control over Life:\n")
##
## Test for Statistical Significance - Life Satisfaction and Control over Life:
print(test_result_control)
##
## Pearson's product-moment correlation
##
## data: data$Life_Satisfaction and data$Freedom_of_Choice_Control
## t = 16.401, df = 842, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4391485 0.5415493
## sample estimates:
## cor
## 0.492049
cat("\nTest for Statistical Significance - Immigrant Status and Health:\n")
##
## Test for Statistical Significance - Immigrant Status and Health:
print(test_result_health)
##
## Pearson's product-moment correlation
##
## data: data$Immigrant_Status and data$Health
## t = 0.026934, df = 842, p-value = 0.9785
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.06655821 0.06840618
## sample estimates:
## cor
## 0.0009282111
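Interpretation of the statistical significance tests: The correlation between life satisfaction and control over life (r = 0.49, 95% confidence interval 0.44 to 0.54, p-value < 2.2e-16) is statistically significant, so the null hypothesis of zero correlation is rejected. The correlation between immigrant status and health (r = 0.0009, 95% confidence interval -0.07 to 0.07, p-value = 0.9785) is not statistically significant.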
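The t statistic and confidence interval printed by `cor.test()` can be recovered from the correlation and the sample size alone. The sketch below does this for the reported life satisfaction correlation, using the Fisher z-transformation for the interval; the values of `r` and `n` are taken from the output above.

```r
# Reported Pearson correlation and sample size (df = 842, so n = 844)
r <- 0.492049
n <- 844

# t statistic for H0: true correlation is zero
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4))
print(p_value)
```

The interval excludes zero by a wide margin, which is why the test rejects the null hypothesis so decisively.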
The contingency table shows the distribution of life satisfaction levels across different genders (“1 - Male” and “2 - Female”). Each cell in the table represents the frequency of individuals falling into a specific combination of life satisfaction level and gender.
## Warning: package 'vcd' was built under R version 4.3.2
## Loading required package: grid
##
## Male Female
## 1 2 1
## 2 0 2
## 3 4 13
## 4 15 13
## 5 25 26
## 6 25 32
## 7 45 53
## 8 108 127
## 9 80 99
## 10 71 103
## Total number of males: 375
## Total number of females: 469
This contingency table can indicate whether life satisfaction is associated with, or differs by, gender. Some observations follow:
A spineplot is a suitable choice for visualizing the relationship between two categorical variables, such as gender (male/female) and life satisfaction levels.
X-Axis (Life Satisfaction): Represents the levels of life satisfaction ranging from 1 to 10.
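Y-Axis (Proportion): In a spineplot the vertical axis shows, within each bar, the share of each category of the second variable; the number of individuals at each level is reflected in the width of the bars.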
Spines: There are two spines, one for each category of “Sex” (1 - Male and 2 - Female).
Overall Distribution: The spineplot provides an overview of the distribution of life satisfaction levels for both males and females.
Comparison: Comparing the widths of corresponding segments in both spines, it appears that the distribution is somewhat similar between males and females.
Life Satisfaction Peaks: For both males and females, there is a noticeable peak around life satisfaction level 8, indicating that a substantial number of individuals from both genders report high life satisfaction at this level.
Gaps or Patterns: There are no significant gaps in the spines, suggesting that most life satisfaction levels have representation for both males and females.
Trends: The trend shows that, in general, the distribution is comparable between males and females across different life satisfaction levels.
Overall Impression: The spineplot suggests that, based on this data, there isn’t a clear and distinctive pattern indicating a strong association between sex and life satisfaction. The distributions are somewhat similar for both genders.
Keep in mind that this interpretation is based on the visual patterns observed in the spineplot, and further statistical analysis may provide additional insights.
To investigate whether “life satisfaction” differs with respect to “sex,” it’s possible to use a statistical test for independence. Given the nature of the data (categorical variables with more than two levels), the chi-square test for independence is a suitable choice.
The chi-square test for independence was run on the contingency table above, with the following result:
## Warning in chisq.test(contingency_table_x): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: contingency_table_x
## X-squared = 7.8392, df = 9, p-value = 0.5504
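Interpreting the results of the test: with X-squared = 7.8392, df = 9 and a p-value of 0.5504, we fail to reject the null hypothesis of independence at the 0.05 level, so there is no evidence that life satisfaction differs by sex. The warning indicates that some expected cell counts are small, so the chi-squared approximation should be read with caution.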
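The test can be reproduced directly from the counts in the contingency table shown earlier. The sketch below rebuilds that table by hand, counts the cells whose expected frequency is below 5 (the usual reason for the approximation warning), and adds a Monte Carlo p-value as a cross-check; the object names are hypothetical, and the simulated p-value is an added robustness check, not part of the original analysis.

```r
# Counts copied from the Life Satisfaction by Sex table (rows = levels 1 to 10)
tab <- matrix(
  c(2, 0, 4, 15, 25, 25, 45, 108, 80, 71,      # Male
    1, 2, 13, 13, 26, 32, 53, 127, 99, 103),   # Female
  ncol = 2,
  dimnames = list(Life_Satisfaction = as.character(1:10),
                  Sex = c("Male", "Female"))
)

# Asymptotic chi-square test of independence (warning suppressed on purpose)
test_asym <- suppressWarnings(chisq.test(tab))
print(test_asym)

# Cells with an expected count below 5 make the approximation less reliable
n_small <- sum(test_asym$expected < 5)
print(n_small)

# Monte Carlo p-value, which does not rely on the chi-square approximation
set.seed(123)
test_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 5000)
print(test_sim$p.value)
```

The sparse cells are the two lowest life satisfaction levels, which very few respondents chose.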
colnames(data)[71] <- "Employment_Status"
# Convert Life_Satisfaction to numeric
data$ls <- as.numeric(as.character(data$Life_Satisfaction))
data$health_factor <- factor(data$Health)
data$ms_factor <- factor(data$Marital_Status)
data$ed_factor <- factor(data$Education)
data$fcc_factor <- factor(data$Freedom_of_Choice_Control)
data$es_factor <- factor(data$Employment_Status)
data$income_factor <- factor(data$Income)
data$family_factor <- factor(data$Family_Importance)
data$friends_factor <- factor(data$Friends_Importance)
# Extract the principal components
PC1 <- pca_result$x[, 1]
PC2 <- pca_result$x[, 2]
PC3 <- pca_result$x[, 3]
# Add the principal components to the model
model_austria <- lm(ls ~ Age + I(Age^2) + health_factor + ed_factor + ms_factor + es_factor + Sex + family_factor + friends_factor + Trust_Most_People + fcc_factor + income_factor + Immigrant_Status + PC1 + PC2 + PC3, data = data)
summary(model_austria)
##
## Call:
## lm(formula = ls ~ Age + I(Age^2) + health_factor + ed_factor +
## ms_factor + es_factor + Sex + family_factor + friends_factor +
## Trust_Most_People + fcc_factor + income_factor + Immigrant_Status +
## PC1 + PC2 + PC3, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3228 -0.7249 0.0900 0.8463 3.7652
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.8461429 0.7480785 9.152 < 2e-16 ***
## Age -0.0174337 0.0217101 -0.803 0.422203
## I(Age^2) 0.0002434 0.0002170 1.121 0.262517
## health_factor2 -0.3607685 0.1167250 -3.091 0.002067 **
## health_factor3 -0.9060952 0.1511924 -5.993 3.13e-09 ***
## health_factor4 -1.9980959 0.2629628 -7.598 8.50e-14 ***
## health_factor5 -2.9401805 0.6452451 -4.557 6.02e-06 ***
## ed_factor3 -0.3510046 0.1629346 -2.154 0.031521 *
## ed_factor4 -0.1389833 0.2130489 -0.652 0.514363
## ed_factor5 -0.0639857 0.3427820 -0.187 0.851971
## ed_factor6 -0.3550311 0.2744471 -1.294 0.196174
## ed_factor7 -0.4101972 0.2555920 -1.605 0.108918
## ed_factor8 -0.6463454 0.5107157 -1.266 0.206041
## ms_factor2 -0.0610846 0.6470681 -0.094 0.924814
## ms_factor3 -0.0554273 0.1649357 -0.336 0.736920
## ms_factor4 -1.9746885 1.0047074 -1.965 0.049714 *
## ms_factor5 -0.1020759 0.2131926 -0.479 0.632216
## ms_factor6 0.0755158 0.1590379 0.475 0.635040
## es_factor2 0.6273998 0.1891472 3.317 0.000952 ***
## es_factor3 0.6860731 0.3130727 2.191 0.028713 *
## es_factor4 0.2292128 0.1936719 1.184 0.236963
## es_factor5 0.6139157 0.3315804 1.851 0.064473 .
## es_factor6 0.3235803 0.3965150 0.816 0.414712
## es_factor7 -0.2504944 0.2784085 -0.900 0.368535
## es_factor8 -0.1376870 0.5104698 -0.270 0.787442
## SexFemale -0.0792532 0.1048404 -0.756 0.449910
## family_factorRather important -0.2926571 0.1626289 -1.800 0.072315 .
## family_factorNot very important -0.2277235 0.3427414 -0.664 0.506617
## family_factorNot at all important 1.3507210 1.0205200 1.324 0.186032
## friends_factorRather important 0.1344582 0.1083460 1.241 0.214972
## friends_factorNot very important -0.4587303 0.2861957 -1.603 0.109367
## friends_factorNot at all important -2.6856154 1.4146889 -1.898 0.058011 .
## Trust_Most_People2 -0.1398153 0.1032622 -1.354 0.176130
## fcc_factor2 0.3740162 0.9424557 0.397 0.691583
## fcc_factor3 0.1367166 0.5644020 0.242 0.808663
## fcc_factor4 0.6885071 0.5238389 1.314 0.189111
## fcc_factor5 0.7647836 0.4930846 1.551 0.121298
## fcc_factor6 1.2010960 0.4925963 2.438 0.014976 *
## fcc_factor7 1.5578949 0.4893847 3.183 0.001513 **
## fcc_factor8 1.9184906 0.4838476 3.965 8.00e-05 ***
## fcc_factor9 1.8856998 0.4970020 3.794 0.000159 ***
## fcc_factor10 2.6571468 0.4898558 5.424 7.74e-08 ***
## income_factor2 0.1811685 0.2202121 0.823 0.410927
## income_factor3 0.1649567 0.2392277 0.690 0.490687
## income_factor4 0.3823277 0.2328188 1.642 0.100954
## income_factor5 0.4633034 0.2370188 1.955 0.050970 .
## income_factor6 0.7301691 0.2668782 2.736 0.006360 **
## income_factor7 0.4863428 0.2616906 1.858 0.063475 .
## income_factor8 0.5061840 0.2686403 1.884 0.059899 .
## income_factor9 0.8838371 0.3195557 2.766 0.005810 **
## income_factor10 0.3894775 0.3528727 1.104 0.270045
## Immigrant_Status -0.0517032 0.1615893 -0.320 0.749078
## PC1 0.1075613 0.0347368 3.096 0.002028 **
## PC2 0.3454944 0.0429129 8.051 3.02e-15 ***
## PC3 -0.0488147 0.0499047 -0.978 0.328296
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.374 on 789 degrees of freedom
## Multiple R-squared: 0.4575, Adjusted R-squared: 0.4203
## F-statistic: 12.32 on 54 and 789 DF, p-value: < 2.2e-16
Note: Employment status (es) is coded as follows: 1 = Full time (30 hours a week or more); 2 = Part time (less than 30 hours a week); 3 = Self employed; 4 = Retired/pensioned; 5 = Housewife (not otherwise employed); 6 = Student; 7 = Unemployed; 8 = Other.
Based on the provided analysis, several variables exhibit a strong association with life satisfaction:
Perceived Health Levels 2, 3, 4, and 5: Lower levels of perceived health, represented by health levels 2, 3, 4, and 5, show substantial decreases in life satisfaction. The coefficients indicate reductions ranging from approximately 0.36 to 2.94 units compared to the reference level (level 1), all significant at the 0.01 level, highlighting a robust relationship between health perception and life satisfaction.
Employment Status (es_factor2 and es_factor3): Part-time employment is associated with higher life satisfaction, with an estimated increase of approximately 0.63 units compared to full-time employment (p < 0.001). Self-employment shows a similar increase of approximately 0.69 units (p = 0.029).
Freedom of Choice and Control: Because this variable enters the model as a factor, each coefficient compares one level with the reference level (level 1). Levels 6 to 10 are statistically significant, with coefficients ranging from approximately 1.20 to 2.66 units, indicating considerably higher life satisfaction with greater perceived freedom and control.
Income Level: Deciles 6 and 9 are associated with significantly higher life satisfaction at the 0.05 level, with coefficients of approximately 0.73 and 0.88 units relative to the first decile. Deciles 5, 7, and 8 have positive coefficients of approximately 0.46 to 0.51 units but are only marginally significant (p-values between 0.05 and 0.10).
Principal Components PC1 and PC2: PC1 and PC2 both show statistically significant associations with life satisfaction. For PC1, each unit increase corresponds to an increase of approximately 0.11 units in life satisfaction, while for PC2, each unit increase corresponds to an increase of approximately 0.35 units.
These coefficients quantify the association of each variable with life satisfaction while holding the other predictors in the model constant.
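Coefficients of factor variables such as `health_factor2` are always comparisons with an omitted reference level. The sketch below shows this on simulated data with a single factor predictor; the data frame `d` and its values are hypothetical, and with several predictors the coefficients become adjusted rather than raw differences.

```r
# Simulated 5-level health item and a life satisfaction score
set.seed(11)
n <- 300
health <- sample(1:5, n, replace = TRUE)
d <- data.frame(
  ls = 9 - 0.6 * (health - 1) + rnorm(n),
  health_factor = factor(health)
)

# Level "1" is the reference; the other levels get one coefficient each
fit_f <- lm(ls ~ health_factor, data = d)
print(coef(fit_f))

# With one factor predictor, each coefficient is a difference in group means
group_means <- tapply(d$ls, d$health_factor, mean)
diffs_from_ref <- as.numeric(group_means[-1] - group_means[1])
print(diffs_from_ref)

# Changing the reference level changes what every coefficient is compared to
d$health_ref3 <- relevel(d$health_factor, ref = "3")
fit_r <- lm(ls ~ health_ref3, data = d)
print(coef(fit_r))
```

The fitted values are identical under either reference level; only the comparison expressed by each coefficient changes.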
At the 0.05 level of significance, the variables for which no level reaches statistical significance are age, sex, trust, and the variable of interest, immigrant status. This is not entirely surprising. One might naturally expect immigrants and people who do not trust others to be less satisfied with life, and the coefficients for not trusting people and for being an immigrant are indeed negative, but they are not statistically significant.
We did not find statistical significance for our variable of interest (p-value = 0.749), so we cannot conclude that its coefficient differs from zero. Taken at face value, the coefficient (-0.05) would mean that being an immigrant is associated with life satisfaction about 0.05 points lower on the 1 to 10 scale than for a native, with the other predictors held constant.
Employment status is statistically significant, but not for all levels. At the 0.05 level of significance, only the statuses part-time and self-employed are statistically significant. The others do not show statistical significance.
Although the coefficient is not statistically significant (p-value = 0.450), being female is associated with life satisfaction about 0.08 points lower on the 1 to 10 scale, with the other predictors held constant. This follows from reading the coefficient of the binary sex variable (-0.079) in the units of the life satisfaction scale.
age_graph <- seq(min(data$Age), max(data$Age), 1)
# Intercept and age coefficients taken from summary(model_austria)
ls_graph <- 6.8461429 + -0.0174337 * age_graph + 0.0002434 * age_graph^2
# Create a scatter plot with a regression line
plot(data$Age, data$Life_Satisfaction,
main = "Scatter plot with Regression Line",
xlab = "Age", ylab = "Life_Satisfaction")
lines(age_graph, ls_graph, col = "red")
# Age of the first respondent who reports the lowest observed life satisfaction score
min_life_satisfaction_age <- data$Age[which.min(data$Life_Satisfaction)]
cat("The age at which the minimum life satisfaction is reached:", min_life_satisfaction_age, "\n")
## The age at which the minimum life satisfaction is reached: 44
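The code above returns the age of the first respondent who reports the lowest life satisfaction score, which is 44. This is not the turning point of the fitted quadratic age profile: using the reported coefficients, that minimum lies at 0.0174337 / (2 × 0.0002434) ≈ 35.8 years. Neither age term is statistically significant, so this turning point is estimated imprecisely.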
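For a model containing Age and Age squared, the age at which predicted life satisfaction is lowest is the vertex of the fitted parabola, -b_Age / (2 × b_Age²). A minimal sketch using the two age coefficients reported in the model summary; the helper `turning_point()` is a hypothetical name introduced for illustration.

```r
# Vertex of the parabola b0 + b1 * age + b2 * age^2
turning_point <- function(b1, b2) -b1 / (2 * b2)

# Coefficients of Age and I(Age^2) as printed by summary(model_austria)
b_age  <- -0.0174337
b_age2 <-  0.0002434

turning_age <- turning_point(b_age, b_age2)

# A positive squared term means the curve is U-shaped, so the vertex is a minimum
is_minimum <- b_age2 > 0

print(round(turning_age, 1))
print(is_minimum)
```

With a fitted model object at hand, the same calculation could take the values from `coef(model_austria)` instead of typing them in, which avoids copying errors.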
Yes. Marital status 4 (Separated) is the only level that is statistically significant at the 0.05 level (p-value = 0.0497), and the absolute value of its coefficient (1.97) is considerably larger than those of the other levels.