This analysis explores the 2022 All Employee Survey (AES) data that the VA makes publicly available. Understanding employee sentiment across multiple VA departments provides insight into how their work environments can be improved. Employee satisfaction in the VA could be indicative of the care veterans are receiving, so it's important to be mindful of employees' input.
Environment Setup
# set options
options(stringsAsFactors = FALSE) # no automatic conversion of strings to factors
options("scipen" = 100, "digits" = 4) # suppress scientific notation
# activate packages
library(knitr)
library(lattice)
library(tidyverse)
library(likert)
library(dplyr)
library(naniar)
library(ggplot2)
library(corrplot)
library(ggcorrplot)
survey <- read.csv('../data/raw/AES_2022_PRDF.csv')
head(survey)
year administration gender age minority supv vaten s79 f03 f04 f06 f11 f12
1 2022 VHA 1 1 2 2 1 1 5 5 5 5 5
2 2022 VHA 2 2 2 1 1 1 5 5 4 4 5
3 2022 VHA 2 2 2 2 2 2 4 4 NA 4 4
4 2022 VHA 1 2 2 2 2 4 4 2 5 2 5
5 2022 VHA 1 2 2 2 1 1 5 5 5 5 5
6 2022 VHA 2 2 2 1 2 1 5 5 5 5 5
f17 f20 f29 f41 f47 f48 f49 f51 f52 f53 f54 f56 f59 f60 f61 f63 f64 f65
1 5 5 5 5 5 5 5 5 5 5 5 5 5 3 5 5 5 5
2 5 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5
3 4 3 4 4 5 5 5 4 5 3 3 4 4 3 3 4 4 4
4 5 4 5 5 4 5 5 5 5 4 4 5 4 5 5 5 4 3
5 5 5 5 5 5 5 5 5 5 4 4 4 4 5 5 5 4 5
6 4 5 5 4 5 5 5 5 5 3 4 4 2 5 4 4 4 5
s07f69 s09f71 s13f10 s15f40 s16f01 s31f42 s53f24
1 5 5 5 5 5 5 5
2 4 4 4 4 5 5 NA
3 4 3 2 4 4 5 3
4 3 2 3 4 5 3 3
5 5 4 5 4 5 5 5
6 5 3 5 3 5 5 5
Summary statistics of each column
summary(survey)
year administration gender age
Min. :2022 Length:297864 Min. :1.0 Min. :1
1st Qu.:2022 Class :character 1st Qu.:1.0 1st Qu.:1
Median :2022 Mode :character Median :2.0 Median :2
Mean :2022 Mean :1.9 Mean :2
3rd Qu.:2022 3rd Qu.:2.0 3rd Qu.:2
Max. :2022 Max. :4.0 Max. :2
NA's :10208
minority supv vaten s79 f03
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:1 1st Qu.:1 1st Qu.:1 1st Qu.:1 1st Qu.:3
Median :1 Median :1 Median :1 Median :1 Median :4
Mean :1 Mean :1 Mean :1 Mean :2 Mean :4
3rd Qu.:2 3rd Qu.:1 3rd Qu.:2 3rd Qu.:2 3rd Qu.:5
Max. :2 Max. :2 Max. :3 Max. :6 Max. :5
NA's :14974 NA's :5153 NA's :7051 NA's :3936 NA's :3727
f04 f06 f11 f12 f17
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:3 1st Qu.:4 1st Qu.:3 1st Qu.:4 1st Qu.:3
Median :4 Median :4 Median :4 Median :4 Median :4
Mean :4 Mean :4 Mean :4 Mean :4 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5 Max. :5
NA's :4015 NA's :4710 NA's :6011 NA's :3513 NA's :9385
f20 f29 f41 f47 f48
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:4 1st Qu.:4 1st Qu.:3 1st Qu.:4 1st Qu.:4
Median :4 Median :4 Median :4 Median :4 Median :5
Mean :4 Mean :4 Mean :3 Mean :4 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:4 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5 Max. :5
NA's :4327 NA's :5400 NA's :12964 NA's :5749 NA's :4962
f49 f51 f52 f53 f54
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:4 1st Qu.:4 1st Qu.:4 1st Qu.:3 1st Qu.:3
Median :5 Median :4 Median :5 Median :4 Median :4
Mean :4 Mean :4 Mean :4 Mean :3 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5 Max. :5
NA's :8108 NA's :3536 NA's :8105 NA's :8232 NA's :12943
f56 f59 f60 f61 f63
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:3 1st Qu.:3 1st Qu.:3 1st Qu.:3 1st Qu.:3
Median :4 Median :4 Median :4 Median :4 Median :4
Mean :4 Mean :4 Mean :4 Mean :4 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5 Max. :5
NA's :5614 NA's :10673 NA's :30806 NA's :8061 NA's :4259
f64 f65 s07f69 s09f71 s13f10
Min. :1 Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:3 1st Qu.:3 1st Qu.:3 1st Qu.:3 1st Qu.:3
Median :4 Median :4 Median :4 Median :4 Median :4
Mean :4 Mean :4 Mean :4 Mean :4 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5 Max. :5
NA's :3221 NA's :4792 NA's :3383 NA's :3620 NA's :4461
s15f40 s16f01 s31f42 s53f24
Min. :1 Min. :1 Min. :1 Min. :1
1st Qu.:3 1st Qu.:3 1st Qu.:4 1st Qu.:3
Median :4 Median :4 Median :5 Median :4
Mean :4 Mean :4 Mean :4 Mean :4
3rd Qu.:5 3rd Qu.:5 3rd Qu.:5 3rd Qu.:5
Max. :5 Max. :5 Max. :5 Max. :5
NA's :6666 NA's :3564 NA's :4746 NA's :12978
The abbreviated demographic columns are renamed for clarity, and each survey item is tagged with a suffix for its category: personal, supervisor, work group, or senior leadership.
colnames(survey)[7] <- "va_tenure"
colnames(survey)[6] <- "supervisor"
personal = c(8, 9, 10, 11, 12, 13, 14, 17, 29, 31, 32, 33, 34, 35, 36)
workg = c(15, 16, 26, 38)
superv = c(18, 19, 20, 21, 22, 37)
srleader = c(23, 24, 25, 27, 28, 30)
colnames(survey)[workg] <- paste(colnames(survey)[workg], 'workgrp', sep='_')
colnames(survey)[personal] <- paste(colnames(survey)[personal], 'personal', sep='_')
colnames(survey)[superv] <- paste(colnames(survey)[superv], 'superv', sep='_')
colnames(survey)[srleader] <- paste(colnames(survey)[srleader], 'srlead', sep='_')
colnames(survey)
[1] "year" "administration" "gender" "age"
[5] "minority" "supervisor" "va_tenure" "s79_personal"
[9] "f03_personal" "f04_personal" "f06_personal" "f11_personal"
[13] "f12_personal" "f17_personal" "f20_workgrp" "f29_workgrp"
[17] "f41_personal" "f47_superv" "f48_superv" "f49_superv"
[21] "f51_superv" "f52_superv" "f53_srlead" "f54_srlead"
[25] "f56_srlead" "f59_workgrp" "f60_srlead" "f61_srlead"
[29] "f63_personal" "f64_srlead" "f65_personal" "s07f69_personal"
[33] "s09f71_personal" "s13f10_personal" "s15f40_personal" "s16f01_personal"
[37] "s31f42_superv" "s53f24_workgrp"
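Positional indices like the ones above break silently if the column order ever changes. A minimal sketch of a name-based alternative follows; the `item_groups` lookup and the `toy` data frame are hypothetical stand-ins, and only a handful of items are shown.

```r
# Hypothetical lookup: item code -> category suffix (subset of items only)
item_groups <- c(f03 = "personal",
                 f20 = "workgrp", f29 = "workgrp",
                 f47 = "superv",
                 f53 = "srlead")

# Toy data frame standing in for `survey`
toy <- data.frame(year = 2022, f03 = 5, f20 = 4, f29 = 4, f47 = 5, f53 = 3)

# Rename only the columns that appear in the lookup, matching by name
hit <- names(toy) %in% names(item_groups)
names(toy)[hit] <- paste(names(toy)[hit], item_groups[names(toy)[hit]], sep = "_")

names(toy)
```

Columns not listed in the lookup (here `year`) are left untouched, so the same approach should tolerate reordered or additional columns.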
Percent of missing values in entire dataframe:
pct_miss(survey)
[1] 2.243
Percent of rows with at least one missing value:
pct_miss_case(survey)
[1] 32.22
Percent of rows that are complete:
pct_complete_case(survey)
[1] 67.78
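To make the three percentages concrete, here is a small base R sketch that computes the same kinds of quantities on a toy data frame. The variable names are made up for illustration, and I'm assuming the naniar helpers reduce to these definitions.

```r
# Toy data: 12 cells, 3 of them missing; rows 1 and 2 contain an NA
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, 2, 5),
                  c = c(1, 2, 3, 4))

pct_missing_cells <- mean(is.na(toy)) * 100            # share of all cells that are NA
pct_rows_with_na  <- mean(!complete.cases(toy)) * 100  # share of rows with at least one NA
pct_rows_complete <- mean(complete.cases(toy)) * 100   # share of fully observed rows

c(pct_missing_cells, pct_rows_with_na, pct_rows_complete)
```

The gap between the first two numbers mirrors the survey result: a small share of missing cells can still touch a large share of rows when the missing values are spread across many respondents.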
survey %>%
gg_miss_var(show_pct = TRUE)
survey %>%
gg_miss_var(show_pct = TRUE, facet = administration)
Based on these plots it seems that each department had similarly
distributed missing values for each question. The question
f60 (Overall, how good a job do you feel is being done by
the manager directly above your immediate supervisor?) is missing the
most values.
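The comparison across administrations can also be checked numerically rather than by eye. The sketch below uses a toy data frame with hypothetical group labels (`ADMIN_A`, `ADMIN_B`) and two item columns, and computes the percent missing per item within each group using base R.

```r
# Toy stand-in for `survey`; group labels and values are made up
toy <- data.frame(
  administration = c("ADMIN_A", "ADMIN_A", "ADMIN_B", "ADMIN_B"),
  f60 = c(NA, 4, NA, NA),
  f03 = c(5, 4, 3, NA)
)

# Every column except the grouping variable is treated as a survey item
items <- setdiff(names(toy), "administration")

# Percent of missing values per item, within each administration
miss_by_admin <- aggregate(toy[items],
                           by = list(administration = toy$administration),
                           FUN = function(x) mean(is.na(x)) * 100)
miss_by_admin
```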
library(gridExtra)
# visualizing demographics
# note: `cat6` is a color palette vector that is assumed to be defined elsewhere
p1 <- ggplot(survey, aes(y = age)) +
geom_bar(fill = cat6[1]) +
scale_y_discrete(limit = c(1.0, 2.0), labels = c('Under 40', 'Over 40'))
p2 <- ggplot(survey, aes(y = gender)) +
geom_bar(fill = cat6[1]) +
scale_y_discrete(limit = c(1.0, 2.0, 3.0, 4.0),
labels = c('Male', 'Female', 'Neither', 'Prefer not to say'))
p3 <- ggplot(survey, aes(y = minority)) +
geom_bar(fill = cat6[1]) +
scale_y_discrete(limit = c(1.0, 2.0), labels = c('Non-minority', 'Minority'))
p4 <- ggplot(survey, aes(y = supervisor)) +
geom_bar(fill = cat6[1]) +
scale_y_discrete(limit = c(1.0, 2.0), labels = c('Non-supervisor', 'Supervisor'))
p5 <- ggplot(survey, aes(y = va_tenure)) +
geom_bar(fill = cat6[1]) +
scale_y_discrete(limit = c(1.0, 2.0, 3.0),
labels = c('Less than 10 years', '10 to 20 years', 'More than 20 years'))
p6 <- ggplot(survey, aes(y = administration)) +
geom_bar(fill = cat6[1])
grid.arrange(p1, p2, p3, p4, p5, p6, ncol= 3)
Most respondents work in the Veterans Health Administration (VHA), have been employed by the VA for less than 10 years, and are non-supervisors.
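A claim like this is easier to verify with a frequency table than from the bar heights alone. As a rough sketch on made-up tenure codes (the labels are the ones used in the plot above), the numeric codes can be turned into a labeled factor and summarized as percentages:

```r
# Made-up tenure codes; NA marks a skipped question
toy_tenure <- c(1, 1, 1, 2, 3, NA, 1, 2)

# Attach the labels used in the demographic plot to the numeric codes
tenure <- factor(toy_tenure, levels = 1:3,
                 labels = c("Less than 10 years", "10 to 20 years",
                            "More than 20 years"))

# Percent of non-missing responses in each category
pct_tenure <- round(prop.table(table(tenure)) * 100, 1)
pct_tenure
```

`table()` drops `NA` by default, so the percentages are relative to respondents who answered the question.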
Correlation can be used to identify the relationship between survey items and determine if there are any problems or inconsistencies with the items.
Creating the correlation matrix
# correlation matrix of non-demographic survey questions
cor_mat <- survey %>%
select(8:38) %>%
cor(., use = "pairwise.complete.obs")
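With `use = "pairwise.complete.obs"`, each coefficient is computed from the rows where both items are present, so different cells of the matrix can rest on different numbers of respondents. The toy sketch below (made-up items `q1` to `q3`) contrasts this with listwise deletion and also shows Spearman's rank correlation, which is one common alternative for ordinal Likert responses; it is not what the analysis above uses.

```r
# Toy Likert-style items with a few missing answers
toy <- data.frame(q1 = c(1, 2, 3, 4, 5, NA),
                  q2 = c(2, 2, 3, 5, 5, 4),
                  q3 = c(5, 4, NA, 2, 1, 1))

# Pairwise deletion: each pair uses every row where both items are observed
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: only rows with no missing values at all are used
cor_listwise <- cor(toy, use = "complete.obs")

# Rank-based alternative for ordinal data
cor_spearman <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

# Number of rows behind each pairwise coefficient
n_pairs <- crossprod(!is.na(toy))
n_pairs
```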
Plotting with corrplot:
corrplot(cor_mat,
order = "hclust", # hierarchical clustering
tl.cex = 0.7)
Plotting with ggcorrplot:
ggcorrplot(cor_mat,
type = "lower", # show lower part of matrix only
hc.order = TRUE, # hierarchical clustering
outline.color = "white",
tl.cex = 7.0)
Both plots show that the first item s79_personal (Are you considering leaving your job within the next year?) is not highly correlated with the rest of the survey items. I’m still going to include this item in the personal group as it does relate to an employee’s personal attitudes/behavior.
Stacked bar charts are used to visualize Likert-type survey items.
The likert package provides an easy way to create these
charts and clearly label the percentages of each response.
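The three types of survey responses in this dataset are agreement (Strongly Disagree to Strongly Agree), satisfaction (Very Dissatisfied to Very Satisfied), and quality (Very Poor to Very Good).
Missing/Do not know responses are labeled as 6 in the data.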
Functions to re-code the responses
# disagree, agree questions
likert_agree_recode <- function(x) {
responses <- c("Strongly Disagree", "Disagree", "Neutral", "Agree",
"Strongly Agree")
# use response int value as index to retrieve label
y <- ifelse(is.na(x) | x==6, NA, responses[x])
y <- factor(y, levels = responses)
return(y)
}
# satisfied, dissatisfied questions
likert_satis_recode <- function(x) {
responses <- c("Very Dissatisfied", "Dissatisfied", "Neutral",
"Satisfied", "Very Satisfied")
y <- ifelse(is.na(x) | x==6, NA, responses[x])
y <- factor(y, levels = responses)
return(y)
}
# poor, good questions
likert_poor_recode <- function(x) {
responses <- c("Very Poor", "Poor", "Fair", "Good", "Very Good")
y <- ifelse(is.na(x) | x==6, NA, responses[x])
y <- factor(y, levels = responses)
return(y)
}
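The three functions differ only in their label vectors, so they could be generated from a single helper. This is a sketch of one way to do that, not a replacement for the functions above; `make_likert_recode` is a hypothetical name.

```r
# Hypothetical factory: returns a recode function for a given set of labels
make_likert_recode <- function(responses) {
  function(x) {
    # Codes outside 1..length(responses) (such as 6) and NA both become NA
    x[!(x %in% seq_along(responses))] <- NA
    # Use the remaining integer codes as indices into the label vector
    factor(responses[x], levels = responses)
  }
}

agree_recode <- make_likert_recode(c("Strongly Disagree", "Disagree", "Neutral",
                                     "Agree", "Strongly Agree"))

recoded <- agree_recode(c(1, 6, NA, 5))
recoded
```

The satisfaction and quality scales would then be created by calling the same factory with their own label vectors.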
pers_items_1 <- select(survey, ends_with('_personal')) %>%
select(-c(1, 9, 10 ,11, 12))
# use question as item name
names(pers_items_1) <- c(
f03_personal="I feel encouraged to come up with new and better ways
of doing things.",
f04_personal="My work gives me a feeling of personal accomplishment.",
f06_personal="I know what is expected of me on the job.",
f11_personal="My talents are used well in the workplace.",
f12_personal="I know how my work relates to the agency's goals.",
f17_personal="I can disclose a suspected violation of any law, rule,
or regulation without fear of reprisal.",
f41_personal="I believe the results of this survey will be used to make
my agency a better place to work.",
s13f10_personal="My workload is reasonable.",
s15f40_personal="I recommend my organization as a good place to work.",
s16f01_personal="I am given a real opportunity to improve my skills in my
organization."
)
pers_items_2 <- select(survey, ends_with('_personal')) %>%
select(c(9, 10 ,11, 12))
names(pers_items_2) <- c(
f63_personal = "How satisfied are you with your involvement in decisions
that affect your work?",
f65_personal = "How satisfied are you with the recognition you receive for
doing a good job?",
s07f69_personal = "Considering everything, how satisfied are you with your job?",
s09f71_personal = "Considering everything, how satisfied are you with your
organization?"
)
# transform items into factors then likert object
personal_likert_da <- pers_items_1 %>%
mutate_all(likert_agree_recode) %>%
likert()
personal_likert_sa <- pers_items_2 %>%
mutate_all(likert_satis_recode) %>%
likert()
plot(personal_likert_da,
group.order = names(pers_items_1),
legend.position="right",
centered = FALSE,
wrap = 40) +
theme(axis.text = element_text(size = 12))
plot(personal_likert_sa,
group.order = names(pers_items_2),
legend.position="right",
centered = FALSE,
wrap = 40) +
theme(axis.text = element_text(size = 12))
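As I understand the likert plots, the percentages printed on each bar summarize the low responses, the neutral midpoint, and the high responses. The sketch below reproduces that kind of summary in base R for a single made-up item, which can be handy for sanity-checking a chart.

```r
responses <- c("Strongly Disagree", "Disagree", "Neutral", "Agree",
               "Strongly Agree")

# Made-up answers to one agreement item
toy_item <- factor(responses[c(5, 4, 4, 3, 2, 5, 4, 1)], levels = responses)

# Percent of answers in each response category
pct <- prop.table(table(toy_item)) * 100

pct_low     <- sum(pct[1:2])  # Strongly Disagree + Disagree
pct_neutral <- pct[[3]]       # Neutral
pct_high    <- sum(pct[4:5])  # Agree + Strongly Agree

c(low = pct_low, neutral = pct_neutral, high = pct_high)
```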
Observations
# s79 takes the codes 1-6 in the data, so six labels are supplied to match the
# six limits. This assumes the codes follow the order listed here and that
# missing answers are stored as NA rather than as a seventh code.
ggplot(survey, aes(y = s79_personal)) +
  geom_bar(fill = cat6[1]) +
  scale_y_discrete(limit = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
                   labels = c('No', 'Yes but taking another job within VA', 'Yes, to retire',
                              'Yes, to take another job within the Federal government',
                              'Yes, to take another job outside the Federal government',
                              'Yes, other')) +
  ggtitle("Are you considering leaving your job within the next year?") +
  theme(plot.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12))