Libraries

library(psych)
library(rlist)
library(tidyr)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ dplyr   1.0.7
## ✓ tibble  3.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ✓ purrr   0.3.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::%+%()   masks psych::%+%()
## x ggplot2::alpha() masks psych::alpha()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()

library(stringr)

Load and process data

load("wave_6_data.rdata")

There are many columns, but we will be focusing on a select subset.

in_scope_columns <- colnames(WV6_Data_R_v20201117[,1:15])
in_scope_columns <- c(in_scope_columns, "V23")

wave_6 <- WV6_Data_R_v20201117 %>%
            select(all_of(in_scope_columns))

With our columns selected, we can now rename them for convenience.

new_col_names <- c("wave","country_code","country_regions","cow","c_cow_alpha",
                   "b_country_alpha","interview_id","family_imp",
                   "friends_imp","leisure_imp","politics_imp",
                   "work_imp","religion_imp","current_happiness",
                   "current_health","life_satisfaction")

colnames(wave_6) <- new_col_names

We need to remove all rows where life_satisfaction is NA, as this is our response variable.

wave_6 <- wave_6 %>% 
  drop_na(life_satisfaction)
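
As a quick check of what drop_na() does when given a column name, the sketch below runs it on a small made-up data frame (toy and its values are illustrative, not survey data). Only rows with a missing life_satisfaction are dropped; missing values in other columns are left in place.

```r
library(tidyr)

# Toy data, made up for illustration; not taken from the survey
toy <- data.frame(
  family_imp        = c(1, NA, 2, 1),
  life_satisfaction = c(7, 8, NA, 5)
)

# Drop only the rows where the response variable is missing
toy_clean <- drop_na(toy, life_satisfaction)

nrow(toy_clean)                    # number of rows that remain
sum(is.na(toy_clean$family_imp))   # NAs in other columns are not removed
```

This matters for the later steps: predictors such as family_imp can still contain NA after this filter.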

Which countries are included in the survey? The plot below shows the 15 countries with the most respondents.

wave_6 %>%
  count(c_cow_alpha) %>%
  arrange(desc(n)) %>%
  head(15) %>%
  ggplot(mapping = aes(x = reorder(c_cow_alpha, -n), y = n)) + 
  geom_col()

For our analysis, let’s focus on the USA.

usa_data <- wave_6 %>%
              filter(c_cow_alpha=="USA")

Define “engagement_ranking” as the average importance an individual places on the measured life values.

usa_data <- usa_data %>%
  mutate(
    # Average the six importance items (columns 8 to 13), selected by name
    engagement_ranking = rowMeans(across(family_imp:religion_imp), na.rm = TRUE)
  )
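
To make the effect of na.rm=TRUE concrete, here is a minimal sketch on made-up responses (the toy data frame is hypothetical and uses only three of the six items). The mean is taken over whichever items a respondent answered, so a respondent who skipped every item ends up with NaN rather than a number.

```r
# Toy responses, made up for illustration; not taken from the survey
toy <- data.frame(
  family_imp  = c(1, 1, NA),
  friends_imp = c(2, NA, NA),
  leisure_imp = c(3, 3, NA)
)

# Row-wise mean over the items that were answered
toy$engagement_ranking <- rowMeans(toy, na.rm = TRUE)

toy$engagement_ranking
```

In this toy case the first two respondents get the same score even though the second skipped an item, which is worth keeping in mind when interpreting the averaged score.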

View the data

head(usa_data)

Research Question

Is a person’s engagement in measured life values predictive of self-reported life satisfaction?

Alternatively (assuming we cover multiple logistic regression), can life_satisfaction be modeled from an individual’s selected importance levels for Family, Friends, Leisure, Politics, Work, and Religion?
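
One possible way to approach a question like this with an ordinal response is proportional-odds (ordinal logistic) regression. The sketch below is only an illustration on simulated data, not the method this project commits to: sim, latent, the three-level response, and the coefficient used to generate the data are all assumptions made up for the example. It uses polr() from the MASS package.

```r
set.seed(606)

# Simulated data, for illustration only; not the survey responses
n <- 300
sim <- data.frame(engagement_ranking = runif(n, min = 1, max = 5))

# Hypothetical latent score, cut into three ordered categories
latent <- -0.8 * sim$engagement_ranking + rlogis(n)
sim$life_satisfaction <- cut(
  latent,
  breaks = quantile(latent, probs = c(0, 1/3, 2/3, 1)),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE,
  ordered_result = TRUE
)

# Proportional-odds logistic regression; Hess = TRUE stores the Hessian
# so that summary(fit) can report standard errors
fit <- MASS::polr(life_satisfaction ~ engagement_ranking, data = sim, Hess = TRUE)

coef(fit)
```

Calling MASS::polr() with the namespace prefix avoids attaching MASS, which would otherwise mask dplyr::select().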

Cases

Each case represents a randomly selected survey respondent from the United States. There are 2,216 observations in the given dataset. Surveys took place between 2010 and 2014.

Data Collection

Surveying and data collection are managed by the World Values Survey Association.

Inglehart, R., C. Haerpfer, A. Moreno, C. Welzel, K. Kizilova, J. Diez-Medrano, M. Lagos, P. Norris, E. Ponarin & B. Puranen et al. (eds.). 2014. World Values Survey: Round Six - Country-Pooled Datafile Version: www.worldvaluessurvey.org/WVSDocumentationWV6.jsp. Madrid: JD Systems Institute.

Type of Study

Observational

Data Source

Data is collected by the World Values Survey Association and is available online at https://www.worldvaluessurvey.org/WVSDocumentationWV6.jsp.

For this project, data was downloaded manually and read into memory.

Response

The response variable is life_satisfaction, which is ordinal.

Explanatory

The explanatory variable is the engagement_ranking score, which is numerical and bounded between 1 and 5.

Relevant Summary Statistics

describe(usa_data$engagement_ranking)

usa_data %>%
  ggplot() +
  geom_histogram(aes(x=engagement_ranking), bins=15)

describe(usa_data$life_satisfaction)

usa_data %>%
  count(life_satisfaction) %>%
  arrange(desc(n)) %>%
  ggplot(aes(x=reorder(life_satisfaction, -n), y=n)) +
  geom_col()

606 Project Proposal

Alec

10/31/2021