Warning: package 'faux' was built under R version 4.4.3
************
Welcome to faux. For support and examples visit:
https://debruine.github.io/faux/
- Get and set global package options with: faux_options()
************
library(haven) # Imports survey data
library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)
library(lmtest)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(sandwich)
Warning: package 'sandwich' was built under R version 4.4.3
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(jtools)
Warning: package 'jtools' was built under R version 4.4.3
library(ggplot2)
library(ggeffects)
Warning: package 'ggeffects' was built under R version 4.4.3
library(caret)
Warning: package 'caret' was built under R version 4.4.3
Loading required package: lattice
Attaching package: 'caret'
The following object is masked from 'package:purrr':
lift
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
library(flextable)
Warning: package 'flextable' was built under R version 4.4.3
Attaching package: 'flextable'
The following object is masked from 'package:jtools':
theme_apa
The following object is masked from 'package:purrr':
compose
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
The variable “How to describe the state of personal health” is reverse coded so that higher values indicate better health. This matches the direction of the dependent variable “How Satisfied with Life” and of independent variables such as “How satisfied with personal finances,” where higher values indicate greater satisfaction.
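The recoding step itself is not shown in this report. A minimal sketch of one way to do it, assuming the raw item is coded 1 to 5 with 1 as the best health; the vector `health_raw` and its values are hypothetical:

```r
# Hypothetical raw responses on an assumed 1-5 scale (1 = best health); NA = missing
health_raw <- c(1, 2, 3, 4, 5, NA)

# Reverse code: subtract from (max + min) so that 1 <-> 5, 2 <-> 4, and 3 stays 3
personal_health <- (5 + 1) - health_raw

personal_health
```

Missing values stay missing under this transformation, so no separate handling is needed before the recode.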
new_names <- c("country_number", "Country", "Weight", "personal_health", "life_satisfaction", "personal_finances", "trust_in_people", "science_technology", "biological_sex", "marital_status", "age_group", "education") # Give your variables new informative names
# Update column names
colnames(WVS_9_Countries_reduced) <- new_names # Apply new names to your data frame
head(WVS_9_Countries_reduced)
skimr::skim(WVS_9_Countries_reduced) #Checks the variables in your data frame; evaluate for missing data
Data summary
Name                      WVS_9_Countries_reduced
Number of rows            16536
Number of columns         12
_______________________
Column type frequency:
  character               1
  numeric                 11
________________________
Group variables           None

Variable type: character
skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Country                0              1    3    3      0         9           0

Variable type: numeric
skim_variable       n_missing  complete_rate    mean      sd     p0     p25  p50  p75  p100  hist
country_number              0           1.00  413.41  297.72  50.00  124.00  504  716   840  ▇▁▃▂▆
Weight                      0           1.00    1.00    0.46   0.12    0.91    1    1    10  ▇▁▁▁▁
personal_health            95           0.99    3.75    0.88   1.00    3.00    4    4     5  ▁▁▅▇▃
life_satisfaction         104           0.99    6.90    2.25   1.00    6.00    7    8    10  ▁▂▅▇▅
personal_finances         128           0.99    6.20    2.47   1.00    5.00    7    8    10  ▂▃▆▇▅
trust_in_people           273           0.98    1.69    0.46   1.00    1.00    2    2     2  ▃▁▁▁▇
science_technology        316           0.98    7.60    2.25   1.00    6.00    8   10    10  ▁▁▃▆▇
biological_sex              0           1.00    1.50    0.50   1.00    1.00    2    2     2  ▇▁▁▁▇
marital_status            148           0.99    2.68    2.14   1.00    1.00    1    5     6  ▇▁▁▁▃
age_group                   5           1.00    2.25    0.98   1.00    2.00    2    3     4  ▅▇▁▅▃
education                 279           0.98    1.93    1.05   1.00    1.00    2    3     4  ▇▃▁▃▂
Non Interactive Model
Hypothesis and regression model
Directional hypothesis: People who are less satisfied with their finances and who report poorer health are expected to be less satisfied with life than people who are more satisfied with their finances and report better health. Prior research suggests that age may be positively correlated with life satisfaction (Baird, Lucas, and Donnellan 2010). Accordingly, older age groups are expected to report higher life satisfaction.
In the OLS analysis, I treat the independent variables “How satisfied with personal finances” and “How to describe the state of personal health” as numerical variables, because each one-unit increase corresponds to a higher level of satisfaction or a better state of health. I treat the variable “age” as a categorical variable so that each age group can be compared with a reference group.
I selected education, biological sex, and trust in people as control variables. Education and biological sex are demographic factors, and some studies show that social trust is associated with life satisfaction (Zhang 2020). Including the “trust” variable therefore makes it possible to examine the association between trust and individuals’ life satisfaction.
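The call that fits `directional_hypothesis` is not shown in this report. The sketch below illustrates how a model of this shape could be specified; the term names are taken from the `plot_summs()` call later in the document, while the data frame `sim`, its values, and the coefficients used to simulate it are made up for illustration.

```r
# Simulated stand-in data; the real analysis uses WVS_9_Countries_reduced
set.seed(1)
n <- 500
sim <- data.frame(
  personal_finances = sample(1:10, n, replace = TRUE),
  personal_health   = sample(1:5,  n, replace = TRUE),
  age_group         = sample(1:4,  n, replace = TRUE),
  education         = sample(1:4,  n, replace = TRUE),
  biological_sex    = sample(1:2,  n, replace = TRUE),
  trust_in_people   = sample(1:2,  n, replace = TRUE)
)
sim$life_satisfaction <- 1.6 + 0.5 * sim$personal_finances +
  0.5 * sim$personal_health + rnorm(n, sd = 1.7)

# Numeric predictors enter as-is; categorical ones are wrapped in factor(),
# so the first level (e.g. age group 1) becomes the reference category
fit <- lm(
  life_satisfaction ~ personal_finances + personal_health +
    factor(age_group) + factor(education) +
    factor(biological_sex) + factor(trust_in_people),
  data = sim
)

names(coef(fit))
```

Wrapping a variable in `factor()` produces one dummy coefficient per non-reference level, which is why the regression table has three rows for age and three for education.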
stargazer(directional_hypothesis, digits=3, type="text", dep.var.labels=c("How Satisfied with Life"), covariate.labels=c("Satisfied With Personal Finances", "Describe State Of Personal Health", "30-49", "50-64", " 65+", "Some Post HS education", "BA/BS", "Advanced Degree", "Female", "Need to be very careful about trusting people"),single.row =TRUE)
===========================================================================
                                                    Dependent variable:
                                                -----------------------------
                                                  How Satisfied with Life
---------------------------------------------------------------------------
Satisfied With Personal Finances                0.515*** (0.006)
Describe State Of Personal Health               0.505*** (0.017)
30-49                                           0.183*** (0.035)
50-64                                           0.302*** (0.040)
65+                                             0.466*** (0.047)
Some Post HS education                          -0.028 (0.035)
BA/BS                                           -0.089** (0.037)
Advanced Degree                                 -0.082* (0.048)
Female                                          0.067** (0.027)
Need to be very careful about trusting people   -0.025 (0.031)
Constant                                        1.641*** (0.078)
---------------------------------------------------------------------------
Observations                                    15,990
R2                                              0.434
Adjusted R2                                     0.434
Residual Std. Error                             1.700 (df = 15979)
F Statistic                                     1,227.490*** (df = 10; 15979)
===========================================================================
Note:                                           *p<0.1; **p<0.05; ***p<0.01
The table above shows the results of the OLS model. Each coefficient is the average difference in life satisfaction associated with the predictor, holding the other variables in the model constant:
- a) A one-unit increase in financial satisfaction is associated with a 0.515-point increase in life satisfaction.
- b) A one-unit increase in the personal health rating is associated with a 0.505-point increase in life satisfaction.
- c) Compared with the “18-29” age group, life satisfaction is 0.183 points higher in the “30-49” group, 0.302 points higher in the “50-64” group, and 0.466 points higher in the “65+” group.
- d) Compared with the “HS degree or less” education group, life satisfaction is 0.028 points lower in the “Some Post HS education” group (not statistically significant), 0.089 points lower in the “BA/BS” group, and 0.082 points lower in the “Advanced Degree” group (significant only at the 0.1 level).
- e) Compared with males, life satisfaction is 0.067 points higher for females.
- f) Compared with the “Most people can be trusted” group, life satisfaction is 0.025 points lower in the “Need to be very careful about trusting people” group; this difference is not statistically significant.
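To make the coefficients concrete, the sketch below plugs them into the regression equation for a hypothetical respondent. The coefficient values are copied from the table; the respondent (financial satisfaction 7, personal health 4, all categorical variables at their reference levels) is an illustrative assumption.

```r
# Coefficients copied from the OLS table above
b <- c(constant = 1.641, finances = 0.515, health = 0.505, age_65_plus = 0.466)

# Hypothetical respondent: finances = 7, health = 4, and every dummy at its
# reference level (18-29, HS degree or less, male, most people can be trusted)
pred_ref <- b[["constant"]] + b[["finances"]] * 7 + b[["health"]] * 4

# Same respondent, but in the 65+ age group
pred_65 <- pred_ref + b[["age_65_plus"]]

c(reference = pred_ref, age_65_plus = pred_65)
```

The two predictions differ by exactly the “65+” coefficient, which is what "holding the other variables constant" means in practice.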
footnote_text <- str_wrap("Note: state of personal health is 'Very Good', Age group is set to '18-29' and '65+'", width = 70)
ggplot(personal_finances_three_term, aes(x = factor(x), y = predicted, group = as.factor(facet), fill = as.factor(facet))) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  theme_minimal(base_size = 7) +
  scale_fill_manual(name = "Age", values = c("lightgrey", "darkgrey"), labels = c("1" = "18-29", "4" = "65+")) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied With Life", title = "Bar Chart by Response Level and Group", subtitle = footnote_text) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = .3, # Thinner lines
                width = .2, position = position_dodge(width = .7))
footnote_text <- str_wrap("Note: state of personal health is 'Very Good', Age group is set to '18-29' and '65+'", width = 70)
ggplot(personal_finances_three_term, aes(x = factor(x), y = predicted, group = as.factor(facet), color = as.factor(facet))) +
  geom_line(linewidth = 1) + # Line for the predicted values
  geom_point(size = 2) + # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = as.factor(facet)), alpha = 0.2) + # Confidence intervals
  scale_color_manual(name = "Age", values = c("lightgrey", "darkgrey"), labels = c("1" = "18-29", "4" = "65+")) + # Update legend labels
  scale_fill_manual(name = "Age", values = c("lightgrey", "darkgrey"), labels = c("1" = "18-29", "4" = "65+")) + # Update fill legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied With Life", title = "Line Chart by Response Level and Group", subtitle = footnote_text)
The plots above use ggpredict() to estimate how satisfaction with personal finances relates to predicted life satisfaction when the state of personal health is held at “Very Good” and the age group is set to “18-29” or “65+.”
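The `ggpredict()` call that builds `personal_finances_three_term` is not shown. As a rough base-R analogue of that kind of output (not the author's code), the sketch below builds a grid of predictor values and calls `predict()` with confidence intervals. The data frame `sim` and its values are simulated, and the model is a reduced version with only three predictors.

```r
# Simulated stand-in data; variable names follow the document, values are made up
set.seed(2)
n <- 400
sim <- data.frame(
  personal_finances = sample(1:10, n, replace = TRUE),
  personal_health   = sample(1:5,  n, replace = TRUE),
  age_group         = sample(1:4,  n, replace = TRUE)
)
sim$life_satisfaction <- 1.6 + 0.5 * sim$personal_finances +
  0.5 * sim$personal_health + 0.15 * sim$age_group + rnorm(n, sd = 1.7)

fit <- lm(life_satisfaction ~ personal_finances + personal_health +
            factor(age_group), data = sim)

# Grid: every finances level, health fixed at 5, age groups 1 and 4
grid <- expand.grid(personal_finances = 1:10,
                    personal_health   = 5,
                    age_group         = c(1, 4))

# Predicted values with 95% confidence intervals (columns fit, lwr, upr)
pred <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))
head(pred)
```

Each row of `pred` corresponds to one bar or point in a chart like the ones above, with `lwr` and `upr` playing the role of `conf.low` and `conf.high`.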
Coefficient Plots and Interpretation
plot_summs(directional_hypothesis, model.names =c("OLS: How Satisfied with Life"), coefs=c( "Satisfied With Personal Finances"="personal_finances", "Describe State Of Personal Health"="personal_health", "30-49"="factor(age_group)2", "50-64"="factor(age_group)3", " 65+"="factor(age_group)4", "Some Post HS education"="factor(education)2", "BA/BS"="factor(education)3", "Advanced Degree"="factor(education)4", "Female"="factor(biological_sex)2", "Need to be very careful about trusting people"="factor(trust_in_people)2"),inner_ci_level = .9, robust=TRUE)
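According to the plot, the coefficient for “Satisfied With Personal Finances” is 0.515 and the coefficient for “Describe State Of Personal Health” is 0.505. Per unit, financial well-being has a marginally larger association with life satisfaction across these nine countries, although the two variables are measured on different scales (1 to 10 and 1 to 5), so the per-unit coefficients are not directly comparable as measures of importance.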
Re-estimate the model using robust standard errors
stargazer(directional_hypothesis, robust1,robust3, type="text", dep.var.labels=c("How Satisfied with Life"), covariate.labels=c("Satisfied With Personal Finances", "Describe State Of Personal Health", "30-49", "50-64", " 65+", "Some Post HS education", "BA/BS", "Advanced Degree", "Female", "Need to be very careful about trusting people"),single.row =TRUE)
===============================================================================================================
                                                                   Dependent variable:
                                                -----------------------------------------------------------------
                                                                 How Satisfied with Life
                                                       OLS                       coefficient
                                                                                     test
                                                       (1)                 (2)                 (3)
---------------------------------------------------------------------------------------------------------------
Satisfied With Personal Finances                0.515*** (0.006)    0.515*** (0.008)    0.515*** (0.008)
Describe State Of Personal Health               0.505*** (0.017)    0.505*** (0.019)    0.505*** (0.019)
30-49                                           0.183*** (0.035)    0.183*** (0.036)    0.183*** (0.036)
50-64                                           0.302*** (0.040)    0.302*** (0.041)    0.302*** (0.041)
65+                                             0.466*** (0.047)    0.466*** (0.047)    0.466*** (0.047)
Some Post HS education                          -0.028 (0.035)      -0.028 (0.035)      -0.028 (0.035)
BA/BS                                           -0.089** (0.037)    -0.089*** (0.034)   -0.089*** (0.034)
Advanced Degree                                 -0.082* (0.048)     -0.082** (0.040)    -0.082** (0.040)
Female                                          0.067** (0.027)     0.067** (0.027)     0.067** (0.027)
Need to be very careful about trusting people   -0.025 (0.031)      -0.025 (0.028)      -0.025 (0.028)
Constant                                        1.641*** (0.078)    1.641*** (0.085)    1.641*** (0.085)
---------------------------------------------------------------------------------------------------------------
Observations                                    15,990
R2                                              0.434
Adjusted R2                                     0.434
Residual Std. Error                             1.700 (df = 15979)
F Statistic                                     1,227.490*** (df = 10; 15979)
===============================================================================================================
Note:                                           *p<0.1; **p<0.05; ***p<0.01
The table above shows the model re-estimated with robust standard errors.
We found that: a) the coefficients do not change when robust standard errors are applied; b) the standard errors increase for some variables, such as “Satisfied With Personal Finances,” remain the same for others, such as “Female,” and decrease for others, such as “trust”; c) no coefficient loses significance, and “BA/BS” and “Advanced Degree” become significant at a stricter level. Thus, there is little warrant for using the robust standard error adjustment here.
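The code that creates `robust1` and `robust3` is not shown, so the estimator types are unknown. As an illustration of why coefficients stay fixed while standard errors move, the sketch below computes one common heteroskedasticity-robust variant (HC1) by hand in base R on simulated data. The choice of HC1 and all variable names are assumptions; in practice `sandwich::vcovHC()` together with `lmtest::coeftest()` is the usual route.

```r
# Simulated data whose error variance grows with x (heteroskedastic by construction)
set.seed(3)
n <- 300
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

X <- model.matrix(fit)  # design matrix, including the intercept column
e <- residuals(fit)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^-1
meat  <- crossprod(X * e)     # X' diag(e^2) X
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_robust    <- sqrt(diag(vcov_hc1))

# Coefficients are identical in both columns of a table; only the SEs differ
cbind(estimate = coef(fit), se_classical, se_robust)
```

The robust estimator changes only the variance formula, so point estimates, R-squared, and residual standard error are unaffected, as in the table above.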
Interactive Model
Hypothesis and regression model
interactive_hypothesis <- lm(life_satisfaction ~ personal_finances + personal_health, data = WVS_9_Countries_reduced)
interactive_hypothesis_interaction <- lm(life_satisfaction ~ personal_finances * personal_health, data = WVS_9_Countries_reduced)
stargazer(interactive_hypothesis, interactive_hypothesis_interaction, type = "text", digits = 3, dep.var.labels = c("Satisfied With Life"), covariate.labels = c("Satisfied With Personal Finances", "State Of Personal Health", "Satisfied With Personal Finances:State Of Personal Health"), single.row = TRUE)
=====================================================================================================================
                                                                             Dependent variable:
                                                           ---------------------------------------------------------
                                                                             Satisfied With Life
                                                                       (1)                           (2)
---------------------------------------------------------------------------------------------------------------------
Satisfied With Personal Finances                           0.524*** (0.006)              0.675*** (0.021)
State Of Personal Health                                   0.470*** (0.016)              0.705*** (0.035)
Satisfied With Personal Finances:State Of Personal Health                                -0.042*** (0.006)
Constant                                                   1.895*** (0.060)              1.076*** (0.126)
---------------------------------------------------------------------------------------------------------------------
Observations                                               16,362                        16,362
R2                                                         0.430                         0.432
Adjusted R2                                                0.430                         0.432
Residual Std. Error                                        1.702 (df = 16359)            1.699 (df = 16358)
F Statistic                                                6,170.153*** (df = 2; 16359)  4,145.519*** (df = 3; 16358)
=====================================================================================================================
Note:                                                      *p<0.1; **p<0.05; ***p<0.01
According to the table above, the effect of financial satisfaction on life satisfaction is weaker for healthier individuals than for persons in poorer health, because the coefficient on the interaction term is negative (-0.042). The coefficient of 0.675 in the interactive model is the slope of financial satisfaction when personal health equals zero, a value outside the 1 to 5 scale, so it should not be compared directly with the 0.524 from the non-interactive model.
As the p-value is less than 0.05, the interaction term is significant. The significant coefficient on the interaction term indicates that the influence of “How satisfied with personal finances” on “How satisfied with life” differs depending on “How people describe their state of personal health.”
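In a model with an interaction, the slope of financial satisfaction at a given health level is the main-effect coefficient plus the interaction coefficient times that health level. A short sketch using the coefficients from column (2) of the table; the variable names are illustrative:

```r
# Coefficients copied from column (2) of the interaction table
b_finances    <- 0.675   # slope of finances when health = 0 (outside the 1-5 scale)
b_interaction <- -0.042  # change in that slope per one-unit increase in health

health <- 1:5
slope_finances <- b_finances + b_interaction * health

data.frame(health, slope_finances)
```

The slope falls from 0.633 at the lowest health rating to 0.465 at the highest, so each additional point of financial satisfaction is associated with a smaller gain in life satisfaction among people in better health.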
ggplot(financial_satisfaction, aes(x = factor(x), y = predicted, group = group, fill = group)) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "State Of Personal Health", values = c("lightgrey", "darkgrey"), labels = c("1" = "Very Poor", "5" = "Very Good")) + # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied with Life", title = "Bar Chart by Response Level and Group") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3, # Thinner lines
                width = 0.2, position = position_dodge(width = 0.7))
ggplot(financial_satisfaction, aes(x = factor(x), y = predicted, group = group, color = group)) +
  geom_line(linewidth = 1) + # Line for the predicted values
  geom_point(size = 2) + # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1) + # Confidence intervals
  scale_color_manual(name = "State Of Personal Health", values = c("lightgrey", "darkgrey"), labels = c("1" = "Very Poor", "5" = "Very Good")) + # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied with Life", title = "Line Chart by Response Level and Group")
According to the predictions and the graphic, the two lines have different slopes, which is consistent with the significant interaction term in the interactive model. At every level of satisfaction with personal finances, holding everything else constant, people with very good personal health are predicted to be more satisfied with their lives than people with very poor personal health. It is also worth noting that as satisfaction with personal finances increases, the difference in predicted life satisfaction between very good and very poor personal health decreases.
ggplot(financial_satisfaction, aes(x = factor(x), y = predicted, group = group, fill = group)) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "State Of Personal Health", values = c("lightgrey", "darkgrey"), labels = c("1" = "Very Poor", "5" = "Very Good")) + # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied with Life", title = "Bar Chart by Response Level and Group") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3, # Thinner lines
                width = 0.2, position = position_dodge(width = 0.7))
ggplot(financial_satisfaction_no_interact, aes(x = factor(x), y = predicted, group = as.factor(group), color = as.factor(group))) +
  geom_line(linewidth = 1) + # Line for the predicted values
  geom_point(size = 2) + # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = as.factor(group)), alpha = 0.2) + # Confidence intervals
  scale_color_manual(name = "State Of Personal Health", values = c("lightgrey", "darkgrey"), labels = c("1" = "Very Poor", "5" = "Very Good")) + # Update legend labels
  scale_fill_manual(name = "State Of Personal Health", values = c("lightgrey", "darkgrey"), labels = c("1" = "Very Poor", "5" = "Very Good")) + # Update fill legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Satisfied With Personal Finances", y = "Predicted Satisfied With Life", title = "Line Chart by Response Level and Group")
The graphic for the non-interactive model shows two parallel lines: by construction, that model forces the relationship between satisfaction with personal finances and satisfaction with life to be the same for “very good” and “very poor” personal health. Because the coefficient on the interaction term is significant, the interactive model describes this relationship more accurately: it allows the effect of financial satisfaction to vary with personal health rather than assuming it away.
Conclusion
According to the fit statistics from the regression summaries, the R-squared and adjusted R-squared are higher for the interactive model than for the non-interactive model (0.432 versus 0.430), and the residual standard error is lower (1.699 versus 1.702). The F statistic is lower for the interactive model (4,145.519 versus 6,170.153), but it tests each model against an intercept-only model and is not a direct comparison of the two. Thus, according to the graphics and the fit statistics, the interactive model fits the data slightly more closely.
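A direct way to compare two nested models is a partial F-test, which was not run in this report. The sketch below shows the idea on simulated data; the variable names and the coefficients used to generate the data are assumptions chosen only to resemble the setting.

```r
# Simulated data with a small negative interaction between two predictors
set.seed(4)
n <- 500
finances <- sample(1:10, n, replace = TRUE)
health   <- sample(1:5,  n, replace = TRUE)
y <- 1 + 0.7 * finances + 0.7 * health - 0.04 * finances * health +
  rnorm(n, sd = 1.7)

m_additive    <- lm(y ~ finances + health)
m_interaction <- lm(y ~ finances * health)

# Partial F-test: does adding the interaction term reduce residual variance?
cmp <- anova(m_additive, m_interaction)
cmp

# With one added term, the F statistic equals the squared t statistic of that term
t_int <- summary(m_interaction)$coefficients["finances:health", "t value"]
c(F = cmp$F[2], t_squared = t_int^2)
```

Because only one term is added, this test gives the same answer as the significance test on the interaction coefficient itself.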
Baird, Brendan M., Richard E. Lucas, and M. Brent Donnellan. 2010. “Life Satisfaction Across the Lifespan: Findings from Two Nationally Representative Panel Studies.” Social Indicators Research 99 (2): 183–203. https://doi.org/10.1007/s11205-010-9584-9.
Zhang, Robert Jiqi. 2020. “Social Trust and Satisfaction with Life: A Cross-Lagged Panel Analysis Based on Representative Samples from 18 Societies.” Social Science & Medicine 251 (April): 112901. https://doi.org/10.1016/j.socscimed.2020.112901.