David Doret
Version 1.1
Created May 2020, Last Revised June 2020

In a nutshell

In this R notebook, I analyse survey data related to whether Identity and Access Management (IAM) teams are predominantly shared or dedicated in organizations, split by IAM sub-domains of activity and for IAM as a whole. A likert chart is used to display the data.

Introduction

As part of the IAM Performance Measurement 2020 Survey, I surveyed IAM professionals about whether IAM teams were predominantly shared or dedicated for every IAM sub-domain and for IAM as a whole.

The original survey question #20 was:

In your organization, are IAM processes managed by dedicated and specialized teams? Please choose among the following models the one that best fits your organization.

Predominantly dedicated: Most IAM process tasks are managed by specialized teams that are dedicated to IAM.

Balanced: Some IAM process tasks are managed by specialized teams that are dedicated to IAM while others are managed by shared teams for whom IAM is one subject among others.

Predominantly shared: Most IAM process tasks are managed by shared teams for whom IAM is one subject among others, e.g.: IT, Security or Service Desk teams.

The notebook focuses on data analysis. Its business implications are discussed in a distinct article.

Setting up the technical environment

To conduct our data analysis, we first need to setup our R technical environment.

# Set console to English
Sys.setenv(LANG = "en");

# Install R packages as needed
if(!require("devtools")) install.packages("devtools");
if(!require("knitr")) install.packages("knitr");
if(!require("likert")) install.packages("likert");

Loading the IAMPerf2020 data environment

Then we need to load our survey data. In accordance with the Open Source and Open Data spirit of the Open-Measure project, all data is publicly available on GitHub. Isn’t that extremely cool, hm?

I centralized data loading in a “setup” script to simplify the loading of all data, the application of consistent factors, etc.

# Load the IAMPerf2020 data environment.
devtools::source_url("https://raw.githubusercontent.com/Open-Measure/Open-Data/master/IAMPerf2020-Dataset/IAMPerf2020Setup.R?raw=TRUE")
## SHA-1 hash of file is 4f9a65d894e33c58aa3ebfb27ec9d2e2b4f8db54
## [1] "Loading the IAMPerf2020 environment..."
## Loading required package: RCurl
## [1] "IAMPerf2020 environment loaded."
# In the original survey data, we are interested in question #23: IAM Goals.
question_number = 20;
# In the dataset, columns related to this question are prefixed with "Q20R"
question_prefix = paste("Q", question_number, "R", sep = ""); 

IAM domains and sub-domains

Let’s have a look at our IAM domain and sub-domain categories.

# Count the number of goal categories (this will be needed later on)
domains_count = nrow(iamperf2020_q20_domains); 

# Have a quick loot at the goal categories
knitr::kable(iamperf2020_q20_domains);
X Title
Q20R1 IAM (in general)
Q20R2 Workforce IAM
Q20R3 3rd Party IAM
Q20R4 Customer IAM
Q20R5 IoT IAM
Q20R6 PAM / TAM

Team dedication

For every IAM domain and sub-domain in the sample, participants had to answer how the corresponding teams were dedicated to their function. So let’s have a look at our team dedication categories.

# And let's have a quick look at the data
knitr::kable(iamperf2020_q20_team_dedication);
X Title
1 Predominantly shared
2 Balanced
3 Predominantly dedicated

Question Data

It’s now time to prepare our survey data and look at it.

# In the original survey data, every domain has one corresponding column.
# They are easily recognizable because they are prefixed by Q20R followed
# by an integer value.
interesting_columns = paste("Q20R", 1:domains_count, sep = "");

# And let's have a quick look at the top records in our dataset.
knitr::kable(dplyr::sample_n(iamperf2020_survey[,interesting_columns], 10));
Q20R1 Q20R2 Q20R3 Q20R4 Q20R5 Q20R6
Predominantly dedicated Predominantly dedicated Predominantly dedicated Balanced Balanced Balanced
NA NA NA NA NA NA
NA NA NA NA NA NA
Predominantly dedicated Predominantly dedicated Predominantly shared Predominantly dedicated NA Predominantly dedicated
Predominantly dedicated Predominantly dedicated Balanced Predominantly shared NA Predominantly shared
Balanced Predominantly dedicated Balanced Balanced NA Predominantly dedicated
Balanced Balanced Predominantly shared Predominantly shared Predominantly shared Predominantly dedicated
Balanced NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA

As we can see, we now have a data structure with one row per survey answer and one column per selectable option (that is IAM domain). The cell values correspond to the team dedication value set for that domain by the participant.

Basic Counting

We have a lot of dropouts in surveys, especially at the beginning of the survey. The increase of dropouts as we advance towards the end of the survey should be studied as well but we’ll keep this for another workbook. For the time being, let’s clarify the size of our sample data and the number of dropouts for this question.

# Count the number of survey answers.
n = nrow(iamperf2020_survey);

# I count as unanswered those rows where all categorical values are NA.
na_per_row = colSums(apply(iamperf2020_survey[,interesting_columns], 1, is.na));
answer_status_per_row = ifelse(
  na_per_row == 6,
  "Unanswered",
  "Answered");
answered = nrow(iamperf2020_survey[answer_status_per_row == "Answered",]);
unanswered = nrow(iamperf2020_survey[answer_status_per_row == "Unanswered",]);

# Prepare a legend to inform on the sample size and answer rate.
legend_block = paste(
  "n: ", n,
  ", answered: ", answered, " (", round(answered / n * 100), "%)",
  ", unanswered:", unanswered, " (", round(unanswered / n * 100), "%)", 
  sep = "");

print(legend_block)
## [1] "n: 201, answered: 90 (45%), unanswered:111 (55%)"

Likert Chart

We have 3 dimensions in our data: participants, domains and team dedication. Data visualization is powerful to help us grasp data, but finding the right visualization technique is sometimes challenging. Here, I choose a Likert chart. It will allow us to see the frequency of participant answers across all goals and all priorities in a single picture. Isn’t that amazing?

# Re-set the column names to the long titles to get proper labels on the graph.
colnames(iamperf2020_survey[,interesting_columns]) = iamperf2020_q20_domains$Title;

likert_data = likert::likert(iamperf2020_survey[,interesting_columns]);

likert_colors = c("#ff9955", "#dddddd", "#5599ff");

graphics::plot(
  likert_data,
  type = "bar",
  col = likert_colors)

Cool, hm?

References

I list here a bunch of articles that have been inspiring or helpful while writing this notebook.