Younger Generations’ Perception and Self-Evaluation of Well-Being
With the rise of social media usage in the past two decades, younger generations are gaining access to more resources and information faster than ever before. In 2017, research indicated that approximately 69% of adults and 81% of teenagers in the United States utilize social media (Smith 2017). This phenomenon has created a paradox of sorts in the development of both the Millennial generation as well as Generation Z. On one hand, the speed and openness with which experienced social media users can share information with another has created an environment where people can freely share their thoughts and feelings online. Consequently, the speed and accessibility of online information can exacerbate feelings of missing out compared to peers, increase awareness about worldwide injustice, and ultimately cause young adults to feel burdened by knowledge.
One particular issue worth highlighting is the double-edged sword of younger generations being more acutely aware of mental health issues. Studies indicate that both Generation Z and Millennials are concerned issues such as childhood trauma and problems they have with their peers. Mass shootings, massive amounts of debt, and other modern-era issues were cited as some of the major concerns facing younger generations today (Benthune 2019). As a result of this increased awareness, younger Americans are more likely to overcome the stigma of seeking therapy and be seen at an early age (Morin 2021) than older generations. The particular conundrum is that this focus and awareness of what constitutes good versus poor mental health may lead younger generations to rate their well-being more poorly than older generations that may have a higher sense of “dealing” with or not even perceiving problems as more than mere inconvenience. In other words, younger generations, despite being more attuned to their overall well-being, may be more likely to report their mental health and overall well-being as worse than older generations.
To test our hypothesis that younger generations report worse mental health and well-being than older generations, we will conduct analysis on a dataset based on well-being and associated factors. This dataset is comprised of a selection of American adults who were asked to self-evaluate their well-being and commenting on associated life experiences: examples of this include physical health, income levels, and sleep. We will use a variety of data analysis techniques to evaluate younger vs older generations of patients. In the process, we will examine behavioral factors that may affect younger generations' perceptions of their own wellbeing.
#loading in the appropriate libraries
library(dplyr)
library(ggplot2)
library(reshape2)
library(plotly)
library(rvest)
library(tidyr)
library(rmarkdown)
library(DT)
library(Amelia)
library(e1071)
library(gmodels)As our source of well-being information for this analysis, we will be utilizing the dataset created in Case-Study 27: Evidence of well-being, health behavior, and environmental factors. This dataset is an aggregate from multiple data sources containing information on factors associated with well-being as well as individuals’ perception of their mental health.
#reading in the well-being data
wb_data <- read.csv("27_Well_being_Health_Behavior_Environmental_Factors_Dataset.csv")Generating summary statistic of the data
#summarizing the data
summary(wb_data)## X age wellbeing SocialSupport
## Min. : 1 Min. : 7.00 Min. :0.000 Min. :1.000
## 1st Qu.: 98925 1st Qu.:45.00 1st Qu.:0.000 1st Qu.:4.000
## Median :197849 Median :57.00 Median :0.000 Median :5.000
## Mean :197849 Mean :56.32 Mean :0.055 Mean :4.181
## 3rd Qu.:296773 3rd Qu.:69.00 3rd Qu.:0.000 3rd Qu.:5.000
## Max. :395697 Max. :99.00 Max. :1.000 Max. :5.000
## NA's :17025 NA's :20998
## race income GeneralHealth PoorPhysicalHealthDays
## Min. :0.000 Min. :1.00 Min. :1.000 Min. : 0.000
## 1st Qu.:1.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.: 0.000
## Median :1.000 Median :4.00 Median :3.000 Median : 0.000
## Mean :0.806 Mean :3.61 Mean :2.586 Mean : 4.472
## 3rd Qu.:1.000 3rd Qu.:5.00 3rd Qu.:3.000 3rd Qu.: 3.000
## Max. :1.000 Max. :5.00 Max. :5.000 Max. :30.000
## NA's :5309 NA's :54657 NA's :1555 NA's :9341
## PoorMentalHealthDays Asthma HealthInsurance CVD
## Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.000
## Median : 0.000 Median :0.0000 Median :1.0000 Median :0.000
## Mean : 3.497 Mean :0.0936 Mean :0.8955 Mean :0.067
## 3rd Qu.: 2.000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :30.000 Max. :1.0000 Max. :1.0000 Max. :1.000
## NA's :7361 NA's :2590 NA's :991 NA's :3928
## LimitedActivity Diabetes employ BMI
## Min. :0.0000 Min. :0.0000 Min. :1.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:1.000
## Median :0.0000 Median :0.0000 Median :3.000 Median :2.000
## Mean :0.2723 Mean :0.1274 Mean :3.885 Mean :1.933
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:7.000 3rd Qu.:3.000
## Max. :1.0000 Max. :1.0000 Max. :8.000 Max. :3.000
## NA's :1858 NA's :395 NA's :1322 NA's :17290
## HeavyDrinker CurrentSmoker PoorSleepDays PhysicalActivity
## Min. :0.000 Min. :0.0000 Min. : 0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median : 3.000 Median :1.0000
## Mean :0.046 Mean :0.1583 Mean : 7.707 Mean :0.7303
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:12.000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.0000 Max. :30.000 Max. :1.0000
## NA's :10742 NA's :2463 NA's :7560 NA's :533
## marital PrematureMortality HouseholdIncome ParkAccess
## Min. :0.0000 Min. :133.5 Min. : 22289 Min. : 1.00
## 1st Qu.:0.0000 1st Qu.:294.0 1st Qu.: 41103 1st Qu.: 13.00
## Median :1.0000 Median :345.0 Median : 47930 Median : 32.00
## Mean :0.5552 Mean :355.8 Mean : 50177 Mean : 34.58
## 3rd Qu.:1.0000 3rd Qu.:405.7 3rd Qu.: 56166 3rd Qu.: 52.00
## Max. :1.0000 Max. :883.5 Max. :119525 Max. :100.00
## NA's :1559 NA's :9633
## CrimeRate UnemploymentRate WaterSafety HighSchoolRate
## Min. : 12.0 Min. : 1.100 Min. : 0.000 Min. : 27.0
## 1st Qu.: 202.0 1st Qu.: 7.000 1st Qu.: 0.000 1st Qu.: 74.0
## Median : 343.0 Median : 8.400 Median : 1.000 Median : 80.0
## Mean : 405.6 Mean : 8.716 Mean : 6.511 Mean : 79.1
## 3rd Qu.: 558.0 3rd Qu.:10.200 3rd Qu.: 5.000 3rd Qu.: 86.0
## Max. :2062.0 Max. :29.700 Max. :100.000 Max. :100.0
## NA's :4174 NA's :5039 NA's :16
## SomeCollegeRate AccessToRecFacility FastFoodPercentage PM2.5
## Min. :23.20 Min. : 0.00 Min. : 8.00 Min. : 6.50
## 1st Qu.:54.00 1st Qu.: 6.90 1st Qu.: 44.00 1st Qu.:10.86
## Median :61.20 Median : 9.80 Median : 50.00 Median :11.74
## Mean :60.45 Mean :10.13 Mean : 48.33 Mean :11.63
## 3rd Qu.:67.70 3rd Qu.:13.00 3rd Qu.: 54.00 3rd Qu.:12.78
## Max. :90.40 Max. :57.50 Max. :100.00 Max. :14.78
## NA's :42
## PopulationDensity
## Min. : 1.7
## 1st Qu.: 66.3
## Median : 276.9
## Mean : 1186.5
## 3rd Qu.: 991.3
## Max. :69468.4
##
There are some observed NAs in the dataset. We will go ahead and visualize this data using Amelia to see if there are any columsn that are not worth using due to overly missing data.
missmap(wb_data, main = "Missing values vs observed")Most values are readily available. We will remove the index column (X) and some of the county-level variables we will not be examining during this analysis.
#subsetting the data
wb_data = subset(wb_data, select = -c(X, PM2.5, PopulationDensity))Because missing data is approximately 1% of the dataset, we will ignore missing values instead of imputing with Amelia.
Well-being appears to be defined as a categorical factor. We will redefine it, along with several other factors that are categorical as factors.
wb_data$wellbeing <- as.factor(wb_data$wellbeing)
wb_data$race <- as.factor(wb_data$race)
wb_data$Asthma <- as.factor(wb_data$Asthma)
wb_data$HealthInsurance <- as.factor(wb_data$HealthInsurance)
wb_data$CVD <- as.factor(wb_data$CVD)
wb_data$LimitedActivity <- as.factor(wb_data$LimitedActivity)
wb_data$Diabetes <- as.factor(wb_data$Diabetes)
wb_data$HeavyDrinker <- as.factor(wb_data$HeavyDrinker)
wb_data$CurrentSmoker <- as.factor(wb_data$CurrentSmoker)
wb_data$PhysicalActivity <- as.factor(wb_data$PhysicalActivity)
wb_data$marital <- as.factor(wb_data$marital)#reexamining the data with categorical variables accurately defined
summary(wb_data)## age wellbeing SocialSupport race income
## Min. : 7.00 0 :357734 Min. :1.000 0 : 75583 Min. :1.00
## 1st Qu.:45.00 1 : 20938 1st Qu.:4.000 1 :314805 1st Qu.:2.00
## Median :57.00 NA's: 17025 Median :5.000 NA's: 5309 Median :4.00
## Mean :56.32 Mean :4.181 Mean :3.61
## 3rd Qu.:69.00 3rd Qu.:5.000 3rd Qu.:5.00
## Max. :99.00 Max. :5.000 Max. :5.00
## NA's :20998 NA's :54657
## GeneralHealth PoorPhysicalHealthDays PoorMentalHealthDays Asthma
## Min. :1.000 Min. : 0.000 Min. : 0.000 0 :356310
## 1st Qu.:2.000 1st Qu.: 0.000 1st Qu.: 0.000 1 : 36797
## Median :3.000 Median : 0.000 Median : 0.000 NA's: 2590
## Mean :2.586 Mean : 4.472 Mean : 3.497
## 3rd Qu.:3.000 3rd Qu.: 3.000 3rd Qu.: 2.000
## Max. :5.000 Max. :30.000 Max. :30.000
## NA's :1555 NA's :9341 NA's :7361
## HealthInsurance CVD LimitedActivity Diabetes employ
## 0 : 41258 0 :365585 0 :286595 0 :344933 Min. :1.000
## 1 :353448 1 : 26184 1 :107244 1 : 50369 1st Qu.:1.000
## NA's: 991 NA's: 3928 NA's: 1858 NA's: 395 Median :3.000
## Mean :3.885
## 3rd Qu.:7.000
## Max. :8.000
## NA's :1322
## BMI HeavyDrinker CurrentSmoker PoorSleepDays PhysicalActivity
## Min. :1.000 0 :367224 0 :330981 Min. : 0.000 0 :106578
## 1st Qu.:1.000 1 : 17731 1 : 62253 1st Qu.: 0.000 1 :288586
## Median :2.000 NA's: 10742 NA's: 2463 Median : 3.000 NA's: 533
## Mean :1.933 Mean : 7.707
## 3rd Qu.:3.000 3rd Qu.:12.000
## Max. :3.000 Max. :30.000
## NA's :17290 NA's :7560
## marital PrematureMortality HouseholdIncome ParkAccess
## 0 :175318 Min. :133.5 Min. : 22289 Min. : 1.00
## 1 :218820 1st Qu.:294.0 1st Qu.: 41103 1st Qu.: 13.00
## NA's: 1559 Median :345.0 Median : 47930 Median : 32.00
## Mean :355.8 Mean : 50177 Mean : 34.58
## 3rd Qu.:405.7 3rd Qu.: 56166 3rd Qu.: 52.00
## Max. :883.5 Max. :119525 Max. :100.00
## NA's :9633
## CrimeRate UnemploymentRate WaterSafety HighSchoolRate
## Min. : 12.0 Min. : 1.100 Min. : 0.000 Min. : 27.0
## 1st Qu.: 202.0 1st Qu.: 7.000 1st Qu.: 0.000 1st Qu.: 74.0
## Median : 343.0 Median : 8.400 Median : 1.000 Median : 80.0
## Mean : 405.6 Mean : 8.716 Mean : 6.511 Mean : 79.1
## 3rd Qu.: 558.0 3rd Qu.:10.200 3rd Qu.: 5.000 3rd Qu.: 86.0
## Max. :2062.0 Max. :29.700 Max. :100.000 Max. :100.0
## NA's :4174 NA's :5039 NA's :16
## SomeCollegeRate AccessToRecFacility FastFoodPercentage
## Min. :23.20 Min. : 0.00 Min. : 8.00
## 1st Qu.:54.00 1st Qu.: 6.90 1st Qu.: 44.00
## Median :61.20 Median : 9.80 Median : 50.00
## Mean :60.45 Mean :10.13 Mean : 48.33
## 3rd Qu.:67.70 3rd Qu.:13.00 3rd Qu.: 54.00
## Max. :90.40 Max. :57.50 Max. :100.00
## NA's :42
This data utilizes data from 2010, which would mean that the oldest possible age of a Generation Z individual is 13 years old using the following definitions (2021):
Gen Z born 1997-2012 –> 9-24 (2021), 0-13 (2010)
Millenials born 1981-1996 –> 25-40 (2021), 14-29 (2010)
Gen X born 1965-1980 –> 41-56 (2021) –> 30-45 (2010)
Boomers generally born 1965 –> 57+ (2021) –> 46+ (2010)
We will examine summary statistics of three defined groups: the “young” generation (gen Z + millenial), gen x, and baby boomers using the definitions outlined above for 2010.
young_gen <- wb_data %>%
filter(age < 30, .keep_all = TRUE)
datatable(summary(young_gen))gen_x <- wb_data %>%
filter(age < 46 & age > 29, .keep_all = TRUE)
datatable(summary(gen_x))oldest_gen <- wb_data %>%
filter(age > 45, .keep_all = TRUE)
datatable(summary(oldest_gen))plot_ly(x = wb_data$wellbeing, type = "histogram", name = "Full Population Well-Being Histogram")## Warning: Ignoring 17025 observations
#proportion of wellbeing responses for young generation
prop.table(table(wb_data$wellbeing))##
## 0 1
## 0.94470676 0.05529324
plot_ly(x = young_gen$wellbeing, type = "histogram", name = "Gen Z + Millenial Well-Being Histogram")## Warning: Ignoring 1401 observations
#proportion of wellbeing responses for young generation
prop.table(table(young_gen$wellbeing))##
## 0 1
## 0.94493132 0.05506868
plot_ly(x = gen_x$wellbeing, type = "histogram", name = "Gen Z Well-Being Histogram")## Warning: Ignoring 3141 observations
#proportion of wellbeing responses for generation x
prop.table(table(gen_x$wellbeing))##
## 0 1
## 0.94455555 0.05544445
plot_ly(x = oldest_gen$wellbeing, type = "histogram", name = "Baby Boomer Well-Being Histogram")## Warning: Ignoring 12483 observations
#proportion of wellbeing responses for oldest generation
prop.table(table(oldest_gen$wellbeing))##
## 0 1
## 0.94472736 0.05527264
Based on the histogram visualizations and the outcomes of the proportion tables, there appears to be minimal differences in self-evaluated well-being ratings between the three generations. We will continue investigating other factors related to well-being to determine if there are differences between young and older generations.
Poor Mental Health Days are another self-rated metric where participants rate how many poor mental health days they had in a month. Using Linear Regression, we can examine any trends in self-reported mental health days against age.
#making dataframe that only contains complete cases for linear regression
complete_wb <- wb_data[complete.cases(wb_data), ]
fit_wb<-lm(PoorMentalHealthDays ~ age, data=complete_wb)
plot_ly(complete_wb, x = ~age, y = ~PoorMentalHealthDays, type = 'scatter', mode = "markers", name="Data") %>%
add_trace(x=~mean(age), y=~mean(PoorMentalHealthDays), type="scatter", mode="markers",
name="(mean(age), mean(PoorMentalHealthDays))", marker=list(size=20, color='blue', line=list(color='yellow', width=2))) %>%
add_lines(x = ~age, y = fit_wb$fitted.values, mode = "lines", name="Linear Model") %>%
layout(title=paste0("Linear Regression Model of Age Vs. Poor Mental Health Days",
round(cor(complete_wb$age, complete_wb$PoorMentalHealthDays),3)))