Libraries
library(psych)
library(rlist)
library(tidyr)
library(tidyverse)
library(stringr)

Load and process data
load("wave_6_data.rdata")

Loading the file creates the data frame WV6_Data_R_v20201117. It has many columns, but we will focus on a select subset: the first 15 columns plus V23.
in_scope_columns <- colnames(WV6_Data_R_v20201117[,1:15])
in_scope_columns <- c(in_scope_columns, "V23")
wave_6 <- WV6_Data_R_v20201117 %>%
select(all_of(in_scope_columns))
## ℹ Use `all_of(in_scope_columns)` instead of `in_scope_columns` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
With our columns selected, we can now rename them for convenience.
new_col_names <- c("wave","country_code","country_regions","cow","c_cow_alpha",
"b_country_alpha","interview_id","family_imp",
"friends_imp","leisure_imp","politics_imp",
"work_imp","religion_imp","current_happiness",
"current_health","life_satisfaction")
colnames(wave_6) <- new_col_names

We need to remove all rows where life_satisfaction is NA, since it is our response variable.
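Dropping NA values assumes that missing answers are stored as NA. WVS data files commonly code non-responses such as "Don't know" or "No answer" as negative numbers instead; if that holds for this file (worth confirming in the wave 6 codebook), those codes should be converted to NA before rows are dropped. A minimal base R sketch on a made-up data frame; `recode_negative_to_na()` and `toy` are hypothetical names, not part of the project code:

```r
# Hypothetical helper: treat negative survey codes as missing.
# Assumes every valid answer in `cols` is a non-negative number.
recode_negative_to_na <- function(df, cols) {
  for (col in cols) {
    values <- df[[col]]
    values[!is.na(values) & values < 0] <- NA
    df[[col]] <- values
  }
  df
}

# Made-up example data, not taken from the survey
toy <- data.frame(
  life_satisfaction = c(7, -2, 9, NA, -1),
  family_imp        = c(1, 1, -1, 2, 1)
)

toy <- recode_negative_to_na(toy, c("life_satisfaction", "family_imp"))

# Base R equivalent of drop_na(life_satisfaction): keep rows with a valid response
toy_complete <- toy[!is.na(toy$life_satisfaction), ]
print(toy_complete)
```

In the project, a helper like this would be applied to wave_6 (the importance items and life_satisfaction) before the drop_na() step.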
wave_6 <- wave_6 %>%
drop_na(life_satisfaction)

To see which countries are included in the survey, we count respondents per country and plot the 15 countries with the largest samples.
wave_6 %>%
count(c_cow_alpha) %>%
arrange(desc(n)) %>%
head(15) %>%
ggplot(mapping = aes(x = reorder(c_cow_alpha, -n), n)) +
geom_bar(stat = "identity") +
labs(x = "Country", y = "Number of respondents")

For our analysis, let’s focus on the USA.
usa_data <- wave_6 %>%
filter(c_cow_alpha == "USA")

Define “engagement_ranking” as the average importance an individual places on the six measured life values: family, friends, leisure, politics, work, and religion.
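The averaging relies on rowMeans() with na.rm = TRUE: each respondent's score is the mean of the items they answered, and a respondent who answered none of the six items gets NaN (which is.na() also treats as missing). A small base R illustration on made-up ratings; the `items` data frame is hypothetical and only mirrors the project's column names:

```r
# Made-up importance ratings for three respondents (not survey data)
items <- data.frame(
  family_imp   = c(1, 2, NA),
  friends_imp  = c(2, 3, NA),
  leisure_imp  = c(3, NA, NA),
  politics_imp = c(4, 4, NA),
  work_imp     = c(1, 2, NA),
  religion_imp = c(1, 1, NA)
)

# Same computation as the mutate() step: mean across the six columns,
# skipping missing answers
engagement_ranking <- rowMeans(items, na.rm = TRUE)
print(engagement_ranking)  # respondent 3 answered nothing, so the mean is NaN

# How many items each respondent actually answered
n_items_answered <- rowSums(!is.na(items))
print(n_items_answered)
```

Because respondent 2 skipped one item, their score averages five answers rather than six; counting answered items makes it easy to set a minimum before trusting a score.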
usa_data <- usa_data %>%
mutate(
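engagement_ranking = rowMeans(select(., family_imp:religion_imp), na.rm = TRUE)  # mean of the six importance items, selected by name rather than position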
)

View the first rows of the data.
head(usa_data)

Research Question
Is a person’s engagement in measured life values predictive of self-reported life satisfaction?
Alternatively (assuming we cover multiple logistic regression), can life_satisfaction be modeled from the importance levels an individual assigns to Family, Friends, Leisure, Politics, Work, and Religion?
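One way to set up the second question as a multiple logistic regression is to dichotomize life_satisfaction and fit a binomial glm() on the six importance items. The cutoff (a rating of 7 or more counting as "satisfied"), the 1 to 4 item scale, and the simulated data below are assumptions for illustration only, not survey results; an ordinal model such as MASS::polr() would keep the full rating scale instead. With engagement_ranking as the single predictor, the same kind of call addresses the first question.

```r
set.seed(42)  # reproducible simulated data
n <- 2000

# Simulated respondents: importance items on a hypothetical 1 to 4 scale
# (lower = more important); only family_imp affects the outcome here
sim <- data.frame(
  family_imp   = sample(1:4, n, replace = TRUE),
  friends_imp  = sample(1:4, n, replace = TRUE),
  leisure_imp  = sample(1:4, n, replace = TRUE),
  politics_imp = sample(1:4, n, replace = TRUE),
  work_imp     = sample(1:4, n, replace = TRUE),
  religion_imp = sample(1:4, n, replace = TRUE)
)
latent <- 7 - sim$family_imp + rnorm(n, sd = 2)
sim$life_satisfaction <- pmin(pmax(round(latent), 1), 10)  # clamp to 1..10

# Hypothetical cutoff: a rating of 7 or more counts as "satisfied"
sim$satisfied <- as.integer(sim$life_satisfaction >= 7)

# Multiple logistic regression on the six importance items
fit <- glm(
  satisfied ~ family_imp + friends_imp + leisure_imp +
    politics_imp + work_imp + religion_imp,
  family = binomial,
  data = sim
)
print(summary(fit))

# Odds ratios for a one-step increase on each item
print(round(exp(coef(fit)), 3))
```

With the real data, sim would be replaced by usa_data, and the cutoff would need a justification rather than being assumed.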
Cases
Each case represents a randomly selected survey respondent from the United States. There are 2,216 observations in the given dataset. Surveys took place between 2010 and 2014.
Data Collection
Surveying and data collection are managed by the World Values Survey Association.
Inglehart, R., C. Haerpfer, A. Moreno, C. Welzel, K. Kizilova, J. Diez-Medrano, M. Lagos, P. Norris, E. Ponarin & B. Puranen et al. (eds.). 2014. World Values Survey: Round Six - Country-Pooled Datafile Version: www.worldvaluessurvey.org/WVSDocumentationWV6.jsp. Madrid: JD Systems Institute.
Type of Study
Observational
Data Source
Data is collected by the World Values Survey Association and is available online at https://www.worldvaluessurvey.org/WVSDocumentationWV6.jsp.
For this project, data was downloaded manually and read into memory.
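load() restores objects under the names they were saved with, which is why the analysis can use WV6_Data_R_v20201117 right after loading without ever assigning it. A sketch of loading into a separate environment so the restored names can be inspected first; the temporary file and `demo_df` stand in for the real download and are hypothetical:

```r
# Stand-in for the downloaded file: save a small data frame to a temporary .rdata file
demo_df <- data.frame(V1 = c(6, 6), V23 = c(8, 5))
path <- tempfile(fileext = ".rdata")
save(demo_df, file = path)
rm(demo_df)

# load() invisibly returns the names of the objects it restored
env <- new.env()
loaded_names <- load(path, envir = env)
print(loaded_names)

# Retrieve the object by name instead of relying on a hard-coded global
survey <- get(loaded_names[1], envir = env)
str(survey)
```

For the project file, the equivalent call would be `loaded_names <- load("wave_6_data.rdata", envir = env)`.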
Response
The response variable is life_satisfaction, which is ordinal.
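Because life_satisfaction is ordinal, one option is to store it as an ordered factor so that comparisons and tabulations respect the order of the categories. This sketch assumes whole-number ratings on a 1 to 10 scale (to be confirmed against the codebook); the `ratings` vector is made up:

```r
# Made-up ratings (not survey data); 1 to 10 is an assumed scale
ratings <- c(8, 10, 5, 8, 7)

life_satisfaction_ord <- factor(ratings, levels = 1:10, ordered = TRUE)

# Order comparisons work on ordered factors
print(life_satisfaction_ord > "6")

# Levels that never occur are kept, so the table includes zero counts
print(table(life_satisfaction_ord))
```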
Explanatory
The explanatory variable is the engagement_ranking score, which is numerical and bounded between 1 and 5.
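Since the score is stated to lie between 1 and 5, a quick range check on the computed values can flag problems such as negative missing-value codes that were averaged in as if they were answers. A minimal sketch with a hypothetical helper, `check_bounds()`, on made-up scores:

```r
# Hypothetical helper: report values outside [lower, upper], ignoring NA and NaN
check_bounds <- function(x, lower, upper) {
  out_of_range <- !is.na(x) & (x < lower | x > upper)
  list(ok = !any(out_of_range), bad_positions = which(out_of_range))
}

# Made-up scores; -0.5 mimics an average dragged down by a negative code
scores <- c(1.5, 2.25, NaN, -0.5, 5)

result <- check_bounds(scores, lower = 1, upper = 5)
print(result)
```

Applied to the project data, the call would be `check_bounds(usa_data$engagement_ranking, 1, 5)`.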
Relevant Summary Statistics
describe(usa_data$engagement_ranking)

usa_data %>%
ggplot() +
geom_histogram(aes(x = engagement_ranking), bins = 15)

describe(usa_data$life_satisfaction)

usa_data %>%
count(life_satisfaction) %>%
arrange(desc(n)) %>%
ggplot(aes(x=reorder(life_satisfaction, -n), n)) +
geom_bar(stat="identity")
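describe() from psych reports the mean and standard deviation, which treat life_satisfaction as if it were interval-scaled. For an ordinal response, order-based summaries such as the median, quartiles, and the frequency table are often a better fit. A base R sketch on made-up ratings; quantile() type 1 is used so that the quartiles are always observed rating values:

```r
# Made-up ratings on an assumed 1 to 10 scale (not survey data)
ratings <- c(8, 10, 5, 8, 7, 9, 8, 6, NA)

# Order-based summaries; the NA is excluded
med <- median(ratings, na.rm = TRUE)
quartiles <- quantile(ratings, probs = c(0.25, 0.75), na.rm = TRUE, type = 1)

# Frequency table and the most common rating
counts <- table(ratings)
mode_rating <- as.numeric(names(counts)[which.max(counts)])

print(med)
print(quartiles)
print(counts)
print(mode_rating)
```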