Abstract

This notebook explores the 2022 All Employee Survey (AES) data, which the VA makes publicly available. Understanding employee sentiment across multiple VA departments provides insight into how their work environments can be improved. Employee satisfaction in the VA could be indicative of the care veterans are receiving, so it's important to be mindful of employees' input.


Environment Setup

# set options
options(stringsAsFactors = FALSE)     # no automatic conversion of strings to factors
options("scipen" = 100, "digits" = 4) # suppress scientific notation, print 4 significant digits

# activate packages
library(knitr)
library(lattice)
library(tidyverse)
library(likert)
library(dplyr)
library(naniar)
library(ggplot2)
library(corrplot)
library(ggcorrplot)

Load Data

survey <- read.csv('../data/raw/AES_2022_PRDF.csv')
head(survey)
  year administration gender age minority supv vaten s79 f03 f04 f06 f11 f12
1 2022            VHA      1   1        2    2     1   1   5   5   5   5   5
2 2022            VHA      2   2        2    1     1   1   5   5   4   4   5
3 2022            VHA      2   2        2    2     2   2   4   4  NA   4   4
4 2022            VHA      1   2        2    2     2   4   4   2   5   2   5
5 2022            VHA      1   2        2    2     1   1   5   5   5   5   5
6 2022            VHA      2   2        2    1     2   1   5   5   5   5   5
  f17 f20 f29 f41 f47 f48 f49 f51 f52 f53 f54 f56 f59 f60 f61 f63 f64 f65
1   5   5   5   5   5   5   5   5   5   5   5   5   5   3   5   5   5   5
2   5   5   5   4   5   5   5   5   5   5   5   5   5   5   5   5   5   5
3   4   3   4   4   5   5   5   4   5   3   3   4   4   3   3   4   4   4
4   5   4   5   5   4   5   5   5   5   4   4   5   4   5   5   5   4   3
5   5   5   5   5   5   5   5   5   5   4   4   4   4   5   5   5   4   5
6   4   5   5   4   5   5   5   5   5   3   4   4   2   5   4   4   4   5
  s07f69 s09f71 s13f10 s15f40 s16f01 s31f42 s53f24
1      5      5      5      5      5      5      5
2      4      4      4      4      5      5     NA
3      4      3      2      4      4      5      3
4      3      2      3      4      5      3      3
5      5      4      5      4      5      5      5
6      5      3      5      3      5      5      5

Summary statistics of each column

summary(survey)
      year      administration         gender         age       
 Min.   :2022   Length:297864      Min.   :1.0   Min.   :1      
 1st Qu.:2022   Class :character   1st Qu.:1.0   1st Qu.:1      
 Median :2022   Mode  :character   Median :2.0   Median :2      
 Mean   :2022                      Mean   :1.9   Mean   :2      
 3rd Qu.:2022                      3rd Qu.:2.0   3rd Qu.:2      
 Max.   :2022                      Max.   :4.0   Max.   :2      
                                                 NA's   :10208  
    minority          supv          vaten           s79            f03      
 Min.   :1       Min.   :1      Min.   :1      Min.   :1      Min.   :1     
 1st Qu.:1       1st Qu.:1      1st Qu.:1      1st Qu.:1      1st Qu.:3     
 Median :1       Median :1      Median :1      Median :1      Median :4     
 Mean   :1       Mean   :1      Mean   :1      Mean   :2      Mean   :4     
 3rd Qu.:2       3rd Qu.:1      3rd Qu.:2      3rd Qu.:2      3rd Qu.:5     
 Max.   :2       Max.   :2      Max.   :3      Max.   :6      Max.   :5     
 NA's   :14974   NA's   :5153   NA's   :7051   NA's   :3936   NA's   :3727  
      f04            f06            f11            f12            f17      
 Min.   :1      Min.   :1      Min.   :1      Min.   :1      Min.   :1     
 1st Qu.:3      1st Qu.:4      1st Qu.:3      1st Qu.:4      1st Qu.:3     
 Median :4      Median :4      Median :4      Median :4      Median :4     
 Mean   :4      Mean   :4      Mean   :4      Mean   :4      Mean   :4     
 3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5     
 Max.   :5      Max.   :5      Max.   :5      Max.   :5      Max.   :5     
 NA's   :4015   NA's   :4710   NA's   :6011   NA's   :3513   NA's   :9385  
      f20            f29            f41             f47            f48      
 Min.   :1      Min.   :1      Min.   :1       Min.   :1      Min.   :1     
 1st Qu.:4      1st Qu.:4      1st Qu.:3       1st Qu.:4      1st Qu.:4     
 Median :4      Median :4      Median :4       Median :4      Median :5     
 Mean   :4      Mean   :4      Mean   :3       Mean   :4      Mean   :4     
 3rd Qu.:5      3rd Qu.:5      3rd Qu.:4       3rd Qu.:5      3rd Qu.:5     
 Max.   :5      Max.   :5      Max.   :5       Max.   :5      Max.   :5     
 NA's   :4327   NA's   :5400   NA's   :12964   NA's   :5749   NA's   :4962  
      f49            f51            f52            f53            f54       
 Min.   :1      Min.   :1      Min.   :1      Min.   :1      Min.   :1      
 1st Qu.:4      1st Qu.:4      1st Qu.:4      1st Qu.:3      1st Qu.:3      
 Median :5      Median :4      Median :5      Median :4      Median :4      
 Mean   :4      Mean   :4      Mean   :4      Mean   :3      Mean   :4      
 3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      
 Max.   :5      Max.   :5      Max.   :5      Max.   :5      Max.   :5      
 NA's   :8108   NA's   :3536   NA's   :8105   NA's   :8232   NA's   :12943  
      f56            f59             f60             f61            f63      
 Min.   :1      Min.   :1       Min.   :1       Min.   :1      Min.   :1     
 1st Qu.:3      1st Qu.:3       1st Qu.:3       1st Qu.:3      1st Qu.:3     
 Median :4      Median :4       Median :4       Median :4      Median :4     
 Mean   :4      Mean   :4       Mean   :4       Mean   :4      Mean   :4     
 3rd Qu.:5      3rd Qu.:5       3rd Qu.:5       3rd Qu.:5      3rd Qu.:5     
 Max.   :5      Max.   :5       Max.   :5       Max.   :5      Max.   :5     
 NA's   :5614   NA's   :10673   NA's   :30806   NA's   :8061   NA's   :4259  
      f64            f65           s07f69         s09f71         s13f10    
 Min.   :1      Min.   :1      Min.   :1      Min.   :1      Min.   :1     
 1st Qu.:3      1st Qu.:3      1st Qu.:3      1st Qu.:3      1st Qu.:3     
 Median :4      Median :4      Median :4      Median :4      Median :4     
 Mean   :4      Mean   :4      Mean   :4      Mean   :4      Mean   :4     
 3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5     
 Max.   :5      Max.   :5      Max.   :5      Max.   :5      Max.   :5     
 NA's   :3221   NA's   :4792   NA's   :3383   NA's   :3620   NA's   :4461  
     s15f40         s16f01         s31f42         s53f24     
 Min.   :1      Min.   :1      Min.   :1      Min.   :1      
 1st Qu.:3      1st Qu.:3      1st Qu.:4      1st Qu.:3      
 Median :4      Median :4      Median :5      Median :4      
 Mean   :4      Mean   :4      Mean   :4      Mean   :4      
 3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      3rd Qu.:5      
 Max.   :5      Max.   :5      Max.   :5      Max.   :5      
 NA's   :6666   NA's   :3564   NA's   :4746   NA's   :12978  

Survey Components

Demographics

  • Administration:
    • VHA = Veterans Health Administration Field
    • VHACO = Veterans Health Administration Central Office
    • VACO = VA Central Office
    • VBA = Veterans Benefits Administration
    • NCA = National Cemetery Administration
    • OI&T = Office of Information and Technology
    • OIG = Office of the Inspector General
    • OGC = Office of General Counsel
    • BVA = Board of Veterans Appeals
  • Gender:
    • 1 = Male
    • 2 = Female
    • 3 = Neither female nor male (includes: intersex, non-binary, not listed)
    • 4 = Prefer not to say
  • Age:
    • 1 = Under 40
    • 2 = Over 40
  • Minority:
    • 1 = Non-minority
    • 2 = Minority
  • Supervisor:
    • 1 = Non-supervisor
    • 2 = Supervisor
  • VA tenure:
    • 1 = Less than 10 years
    • 2 = 10 to 20 years
    • 3 = More than 20 years

Survey Items

Data Cleaning

Descriptive Column Names

Renaming columns for clarity and categorizing survey items into personal, supervisor, work group, and senior leadership.

colnames(survey)[7] <- "va_tenure"
colnames(survey)[6] <- "supervisor"

# column positions of the survey items in each group
personal <- c(8, 9, 10, 11, 12, 13, 14, 17, 29, 31, 32, 33, 34, 35, 36)
workg    <- c(15, 16, 26, 38)
superv   <- c(18, 19, 20, 21, 22, 37)
srleader <- c(23, 24, 25, 27, 28, 30)

colnames(survey)[workg]  <- paste(colnames(survey)[workg], 'workgrp', sep='_')
colnames(survey)[personal]  <- paste(colnames(survey)[personal], 'personal', sep='_')
colnames(survey)[superv]  <- paste(colnames(survey)[superv], 'superv', sep='_')
colnames(survey)[srleader]  <- paste(colnames(survey)[srleader], 'srlead', sep='_')
colnames(survey)
 [1] "year"            "administration"  "gender"          "age"            
 [5] "minority"        "supervisor"      "va_tenure"       "s79_personal"   
 [9] "f03_personal"    "f04_personal"    "f06_personal"    "f11_personal"   
[13] "f12_personal"    "f17_personal"    "f20_workgrp"     "f29_workgrp"    
[17] "f41_personal"    "f47_superv"      "f48_superv"      "f49_superv"     
[21] "f51_superv"      "f52_superv"      "f53_srlead"      "f54_srlead"     
[25] "f56_srlead"      "f59_workgrp"     "f60_srlead"      "f61_srlead"     
[29] "f63_personal"    "f64_srlead"      "f65_personal"    "s07f69_personal"
[33] "s09f71_personal" "s13f10_personal" "s15f40_personal" "s16f01_personal"
[37] "s31f42_superv"   "s53f24_workgrp" 
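Renaming by position works as long as the column order of the CSV never changes. As a hedged alternative, here is a minimal sketch of the same idea driven by column names. The data frame `toy` and the lookup `item_groups` are made-up stand-ins for illustration, not part of the original analysis.

```r
# Toy data frame standing in for a few AES columns (values are made up).
toy <- data.frame(
  supv  = c(1, 2),
  vaten = c(1, 3),
  f20   = c(5, 4),
  f47   = c(4, 4),
  f53   = c(3, 5)
)

# Hypothetical lookup: item code -> group suffix.
item_groups <- c(f20 = "workgrp", f47 = "superv", f53 = "srlead")

# Rename the two demographic columns by name rather than by position.
names(toy)[names(toy) == "supv"]  <- "supervisor"
names(toy)[names(toy) == "vaten"] <- "va_tenure"

# Append the group suffix to every column that appears in the lookup.
hit <- names(toy) %in% names(item_groups)
names(toy)[hit] <- paste(names(toy)[hit], item_groups[names(toy)[hit]], sep = "_")

print(names(toy))
```

Because the lookup is keyed on item codes, a reordered or extended file would still get the same suffixes.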

Missing Values

Summary

Percent of missing values in entire dataframe:

pct_miss(survey)
[1] 2.243

Percent of rows with at least one missing value:

pct_miss_case(survey)
[1] 32.22

Percent of rows that are complete:

pct_complete_case(survey)
[1] 67.78
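To make it clear what these three figures measure, here is a small base R sketch that computes the same kinds of quantities on a made-up data frame (`toy` is hypothetical). It illustrates the definitions and is not a description of how naniar computes them internally.

```r
# Made-up data: 4 rows, 3 columns, 3 missing cells in total.
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 2, 5),
  c = c(1, 2, 3, 4)
)

# Share of all cells that are missing.
pct_missing_cells <- mean(is.na(toy)) * 100

# Share of rows with at least one missing value.
pct_rows_with_na <- mean(!complete.cases(toy)) * 100

# Share of rows with no missing value at all.
pct_rows_complete <- mean(complete.cases(toy)) * 100

print(c(cells = pct_missing_cells,
        rows_with_na = pct_rows_with_na,
        rows_complete = pct_rows_complete))
```

The last two figures always add up to 100, which matches the 32.22 and 67.78 reported above.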

Missing values for entire dataset

survey %>%
  gg_miss_var(show_pct = TRUE)

Missing value distribution by administration

survey %>%
  gg_miss_var(show_pct = TRUE, facet = administration)

Based on these plots, it seems that each administration had similarly distributed missing values for each question. The question f60 (Overall, how good a job do you feel is being done by the manager directly above your immediate supervisor?) is missing the most values: 30,806 of 297,864 rows, or about 10.3%.
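The ranking shown in the plots can also be checked numerically. The sketch below computes the percent missing per column and sorts it, using a made-up data frame (`toy`, with invented values) in place of the survey.

```r
# Made-up responses for three items; NA marks a skipped question.
toy <- data.frame(
  f03 = c(5, 4, NA, 3, 5),
  f60 = c(NA, NA, 4, NA, 2),
  f64 = c(4, 4, 4, 5, 5)
)

# Percent of missing values per column, largest first.
pct_missing_by_item <- sort(colMeans(is.na(toy)) * 100, decreasing = TRUE)

print(pct_missing_by_item)
```

Applied to the real data, the first element of the sorted vector would be the item with the most missing responses.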


EDA

Demographic Data

library(gridExtra)

# `cat6` is not defined in this excerpt; it is assumed to be a six-colour
# palette. The fallback below is only a stand-in so the code runs on its own.
if (!exists("cat6")) cat6 <- grDevices::hcl.colors(6, "Dark 3")

# visualizing demographics
# the demographic codes are numeric, so each axis uses a continuous scale
# with one labelled break per code
p1 <- ggplot(survey, aes(y = age)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:2, labels = c('Under 40', 'Over 40'))

p2 <- ggplot(survey, aes(y = gender)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:4, 
                         labels = c('Male', 'Female', 'Neither', 'Prefer not to say'))

p3 <- ggplot(survey, aes(y = minority)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:2, labels = c('Non-minority', 'Minority'))

p4 <- ggplot(survey, aes(y = supervisor)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:2, labels = c('Non-supervisor', 'Supervisor'))

p5 <- ggplot(survey, aes(y = va_tenure)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:3, 
                         labels = c('Less than 10 years', '10 to 20 years', 'More than 20 years'))

p6 <- ggplot(survey, aes(y = administration)) + 
      geom_bar(fill = cat6[1])

grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3)

The majority of respondents work in the Veterans Health Administration field, have been employed by the VA for less than 10 years, and are non-supervisors.
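The bar charts give a visual impression of these shares. As a rough sketch of how they could be put into numbers, the example below turns tenure codes into a labelled factor and tabulates percentages. The codes in `toy` are invented; only the code-to-label mapping comes from the Demographics list above.

```r
# Made-up tenure codes; NA marks a respondent who skipped the question.
toy <- data.frame(va_tenure = c(1, 1, 2, 3, 1, NA, 2, 1))

# Attach the labels from the codebook to the numeric codes.
tenure <- factor(
  toy$va_tenure,
  levels = 1:3,
  labels = c("Less than 10 years", "10 to 20 years", "More than 20 years")
)

# Percent of non-missing respondents in each tenure band.
tenure_share <- round(prop.table(table(tenure)) * 100, 1)

print(tenure_share)
```

Note that `table()` drops missing values by default, so the percentages are shares of respondents who answered the question.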

Correlation

Correlation can be used to identify the relationship between survey items and determine if there are any problems or inconsistencies with the items.

Creating the correlation matrix

# correlation matrix of non-demographic survey questions
cor_mat <- survey %>%
  select(8:38) %>%
  cor(., use = "pairwise.complete.obs")
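With `use = "pairwise.complete.obs"`, each correlation is computed from the rows where both items in that pair were answered, so different cells of the matrix can rest on different numbers of respondents. The sketch below shows the difference from listwise deletion (`use = "complete.obs"`) on a made-up data frame; `toy` and its values are illustrative only.

```r
# Made-up responses: q1 and q2 each have one missing value, in different rows.
toy <- data.frame(
  q1 = c(1, 2, 3, 4, 5, NA),
  q2 = c(2, 4, 6, 8, NA, 1),
  q3 = c(5, 3, 4, 1, 2, 2)
)

# Pairwise deletion: each pair uses every row where both items are present.
pairwise <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: only rows with no missing value in any column are used.
listwise <- cor(toy, use = "complete.obs")

# Number of rows available for each pair under pairwise deletion.
n_pairs <- crossprod(1 - is.na(toy))

print(round(pairwise, 3))
print(round(listwise, 3))
print(n_pairs)
```

Since about a third of the survey rows contain at least one missing value, pairwise deletion keeps considerably more data than listwise deletion would.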

Plotting with corrplot:

corrplot(cor_mat, 
         order = "hclust", # hierarchical clustering
         tl.cex = 0.7)

Plotting with ggcorrplot:

ggcorrplot(cor_mat,
           type = "lower", # show lower part of matrix only
           hc.order = TRUE, # hierarchical clustering
           outline.color = "white",
           tl.cex = 7.0) 

Both plots show that the first item, s79_personal (Are you considering leaving your job within the next year?), is not highly correlated with the rest of the survey items. That is expected, since its codes are unordered answer categories rather than a 1 to 5 rating. I'm still going to include this item in the personal group because it relates to an employee's personal attitudes and behavior.


Visualizing Survey Responses

Stacked bar charts are used to visualize Likert-type survey items. The likert package provides an easy way to create these charts and clearly label the percentages of each response.

The three types of survey responses in this dataset are:

  • Strongly Disagree to Strongly Agree
  • Very Dissatisfied to Very Satisfied
  • Very Poor to Very Good

Missing/Do not know responses are labeled as 6 in the data.

Functions to re-code the responses

# disagree, agree questions
likert_agree_recode <- function(x) {
  responses <- c("Strongly Disagree", "Disagree", "Neutral", "Agree", 
                 "Strongly Agree") 
  
  # use response int value as index to retrieve label
  y <- ifelse(is.na(x) | x==6, NA, responses[x])
  
  y <- factor(y, levels = responses)
  
  return(y)
}

# satisfied, dissatisfied questions 
likert_satis_recode <- function(x) {
  responses = c("Very Dissatisfied", "Dissatisfied", "Neutral", 
               "Satisfied", "Very Satisfied")
  
  y <- ifelse(is.na(x) | x==6, NA, responses[x])
  
  y <- factor(y, levels = responses)
  
  return(y)
}

# poor, good questions
likert_poor_recode <- function(x) {
  responses <- c("Very Poor", "Poor", "Fair", "Good", "Very Good")
  
  y <- ifelse(is.na(x) | x==6, NA, responses[x])
  
  y <- factor(y, levels = responses)
  
  return(y)
}
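The three functions above differ only in their label vectors. As a hedged alternative, here is a sketch of a single helper that takes the labels as an argument. The name `likert_recode` is hypothetical and not part of the original notebook.

```r
# Map integer codes 1..k to the k labels supplied; anything else
# (including 6 and NA) becomes NA because it is not among the levels.
likert_recode <- function(x, labels) {
  factor(x, levels = seq_along(labels), labels = labels)
}

agree_labels <- c("Strongly Disagree", "Disagree", "Neutral", "Agree",
                  "Strongly Agree")

codes   <- c(5, 1, 6, NA, 3)
recoded <- likert_recode(codes, agree_labels)

print(recoded)
```

The same helper could be called with the satisfaction or poor-to-good labels, for example inside `mutate_all(likert_recode, labels = agree_labels)`.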

 

Personal job satisfaction survey items

pers_items_1 <- select(survey, ends_with('_personal')) %>%
  select(-c(1, 9, 10, 11, 12))

# use question as item name; each label is kept on one line so that no
# line breaks or indentation end up inside the plot labels
names(pers_items_1) <- c(
  f03_personal = "I feel encouraged to come up with new and better ways of doing things.",
  f04_personal = "My work gives me a feeling of personal accomplishment.",
  f06_personal = "I know what is expected of me on the job.",
  f11_personal = "My talents are used well in the workplace.",
  f12_personal = "I know how my work relates to the agency's goals.",
  f17_personal = "I can disclose a suspected violation of any law, rule, or regulation without fear of reprisal.",
  f41_personal = "I believe the results of this survey will be used to make my agency a better place to work.",
  s13f10_personal = "My workload is reasonable.",
  s15f40_personal = "I recommend my organization as a good place to work.",
  s16f01_personal = "I am given a real opportunity to improve my skills in my organization."
)

pers_items_2 <- select(survey, ends_with('_personal')) %>%
  select(c(9, 10, 11, 12))

# use question as item name, one label per line
names(pers_items_2) <- c(
  f63_personal = "How satisfied are you with your involvement in decisions that affect your work?",
  f65_personal = "How satisfied are you with the recognition you receive for doing a good job?",
  s07f69_personal = "Considering everything, how satisfied are you with your job?",
  s09f71_personal = "Considering everything, how satisfied are you with your organization?"
)

# transform items into factors then likert object
personal_likert_da <- pers_items_1 %>%
  mutate_all(likert_agree_recode) %>%
  likert()

personal_likert_sa <- pers_items_2 %>%
  mutate_all(likert_satis_recode) %>%
  likert()

plot(personal_likert_da,
     group.order = names(pers_items_1),
     legend.position="right",
     centered = FALSE,
     wrap = 40) + 
     theme(axis.text = element_text(size = 12))

plot(personal_likert_sa,
     group.order = names(pers_items_2),
     legend.position="right",
     centered = FALSE,
     wrap = 40) + 
     theme(axis.text = element_text(size = 12))

Observations

  • Overall, 70% of respondents reported satisfaction with their job, while 65% reported satisfaction with their organization.
    • Clarifying whether "organization" refers to the VA in general or to specific departments could improve the survey question.
  • Employee innovation and creativity need to be fostered and encouraged to boost productivity and engagement.
    • Only 66% of respondents felt encouraged to be innovative or that their talents were being fully utilized.
  • 63% of respondents agreed their workload was reasonable.
    • I would recommend looking into the staffing of each work unit and ensuring the distribution of work is fair.
  • There is a need for action resulting from the analysis of this survey data.
    • Only 53% of respondents felt the survey would lead to improvements in the workplace.
    • Only 60% of respondents were satisfied with their involvement in workplace decisions.
    • Following up with employees about what changes they would like to see and keeping them informed on the progress of those changes should be prioritized.
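The percentages in these observations are read off the likert plots. One way such figures could be checked directly is sketched below, assuming that a "favorable" response means a code of 4 or 5 and that missing answers and code 6 are left out of the denominator. The helper `pct_favorable` and the data in `toy` are hypothetical.

```r
# Made-up responses on a 1-5 scale; NA marks a skipped question.
toy <- data.frame(
  job_satisfaction = c(5, 4, 2, 4, NA, 3, 5, 1),
  workload         = c(4, 2, 3, 5, 4, NA, 1, 2)
)

# Percent of valid responses (codes 1-5) that are 4 or 5.
pct_favorable <- function(x) {
  valid <- x[!is.na(x) & x <= 5]
  mean(valid >= 4) * 100
}

favorable <- sapply(toy, pct_favorable)

print(round(favorable, 1))
```

If the plotted percentages use a different denominator, for example one that counts neutral responses separately, the numbers from this sketch would differ from the ones shown in the charts.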

 

Job turnover

# s79 codes are numeric, so a continuous scale with labelled breaks is used;
# the original call supplied 7 labels for 6 limits. Code 7 does not occur in
# the data (the maximum is 6), so its label is not drawn.
ggplot(survey, aes(y = s79_personal)) + 
      geom_bar(fill = cat6[1]) + 
      scale_y_continuous(breaks = 1:7, 
                         labels = c('No', 'Yes, to take another job within VA', 'Yes, to retire',
                                    'Yes, to take another job within the Federal government',
                                    'Yes, to take another job outside the Federal government',
                                    'Yes, other', 'Missing/Do not know')) +
      ggtitle("Are you considering leaving your job within the next year?") + 
      theme(plot.title = element_text(size = 14, face = "bold"), axis.text = element_text(size = 12))