In a nutshell
Introduction
Setting up the technical environment
Retrieve the data
Basic counting
Introducing the Euler chart
Labeling
Plotting
Percentages per function
Sample and answer rate
References

David Doret May 2020

In a nutshell

This R notebook uses an Euler chart to visualize the reporting lines of IAM managers.

Introduction

As part of the IAM Performance Measurement 2020 Survey, I surveyed IAM professionals about the reporting lines IAM managers in their organization. In this data analysis notebook, I use R to analyse and visualize the answers to this question.

The original survey question #22 was:

To whom reports the IAM manager in your organization?

The non-exclusive possible answers were:

CFO
CISO
CIO
COO
CEO
Other
I don’t know

When the survey participant selected the **Other* option, he could type in the other function name. In this notebook, I will not analyse the answers provided in free text form.

Setting up the technical environment

To conduct this data analysis, we first need to setup our technical environment.

# Set console to English
Sys.setenv(LANG = "en");

# Install R packages as needed
if(!require("eulerr")) install.packages("eulerr");
if(!require("tidyr")) install.packages("tidyr");
if(!require("RCurl")) install.packages("RCurl");
if(!require("plyr")) install.packages("plyr");
if(!require("knitr")) install.packages("knitr");

# Load libraries
library(eulerr);
library(tidyr);
library(RCurl);
library(plyr);
library(knitr);

Retrieve the data

Then we need to load the survey data. In accordance with the Open Source and Open Data spirit of the Open-Measure project, all data is publicly available on GitHub.

# Configuration options
survey_url = "https://raw.githubusercontent.com/Open-Measure/Open-Data/master/IAMPerf2020-Dataset/IAMPerf2020.csv";
question_number = 22;

# Download the survey data from GitHub
survey_raw <- RCurl::getURL(survey_url);

# Interpret the CSV to get structured data
survey_data <- read.csv (text = survey_raw);

# In the original survey data, 
# we are interested in question 22: Reporting line of the line manager.
# The column names corresponding to the question options are prefixed with Q22.
# The column names of the options ("CFO", "CISO", ...) are suffixed with Ax.
# Here is the mapping table:
# Q22A1: CFO
# Q22A2: CISO
# Q22A3: CIO
# Q22A4: COO
# Q22A5: CEO
# Q22A6: Other
# Q22A7: I don't know

# Pick only the columns we are interested in,
# and rename columns to friendly names in the process.
question_data = data.frame(
  CFO = survey_data$Q22A1,
  CISO = survey_data$Q22A2,
  CIO = survey_data$Q22A3,
  COO = survey_data$Q22A4,
  CEO = survey_data$Q22A5,
  Other = survey_data$Q22A6
);

# Note: in the survey data, column Q22A7 is the "I don't know" option.
# We discard it here on purpose as "NA".

knitr::kable(head(question_data))

CFO	CISO	CIO	COO	CEO	Other
NA	1	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	1
NA	1	NA	NA	NA	1
NA	1	1	NA	NA	NA

As we can see, we have a table structure with one row per survey answer and one column per selectable option. When the option was selected by the participant, it takes value 1 and N/A otherwise.

Note that options are not exclusive. In consequence we may find both single-valued answers and multi-valued answers.

Basic counting

Let’s count how many times every option was selected, independently on whether the options come from single-valued answers or multi-valued answers.

# Some basic stats.
n = nrow(question_data)

# We may use the sum function because a selected option correspond to a value of 1.
option_count = c(
  CFO = sum(!is.na(question_data$CFO)),
  CISO = sum(!is.na(question_data$CISO)),
  CIO = sum(!is.na(question_data$CIO)),
  COO = sum(!is.na(question_data$COO)),
  CEO = sum(!is.na(question_data$CEO)),
  Other = sum(!is.na(question_data$Other))
  );

knitr::kable(data.frame(t(sort(option_count, decreasing = TRUE))));

CISO	CIO	Other	COO	CEO	CFO
42	15	13	7	2	0

We immediately spot the high frequency of the CISO reporting line, followed by the CIO.

Introducing the Euler chart

We could stop the analysis here and output a bar or pie chart but this would hide an interesting aspect of our data: multi-valued answers. To overcome this limitation, we will use a different visualization: an Euler chart. This visualization will show the relative importance of individual options by the size of its bubbles but will also display the multi-valued answers as interesections between bubbles.

To do this, we should count every unique combination of selected options.

# The question options in the survey are non-exclusive,
# thus we need to keep track of all combinations (e.g. CISO + CIO).
# What we really need is to count both multiple value answers (e.g. CISO + CIO)
# and and single value answers (e.g. CISO).
# To accomplish this, reduce the table to its unique row combinations (e.g. CISO + CIO).
# And count the occurrences of these combinations.
# The plyr count function does all of this in a single shot,
# and appends the freq column (# of occurences).
question_flat = data.frame(plyr::count(question_data));

knitr::kable(question_flat[order(question_flat$freq, decreasing = TRUE),]);

	CFO	CISO	CIO	COO	CEO	Other	freq
9	NA	NA	NA	NA	NA	NA	129
3	NA	1	NA	NA	NA	NA	36
8	NA	NA	NA	NA	NA	1	11
5	NA	NA	1	NA	NA	NA	10
6	NA	NA	NA	1	NA	NA	6
1	NA	1	1	NA	NA	NA	4
2	NA	1	NA	NA	NA	1	2
7	NA	NA	NA	NA	1	NA	2
4	NA	NA	1	1	NA	NA	1

Labeling

In our Euler graph, we wish to label bubbles with the title of the selected options, but we will not label intersections. In effect, intersections are self-explanatory (this is the whole visual value of Euler graphs). But to generate the Euler graph, we still need to keep track of these intersections.

So let’s label our data.

# Count the number of selected options per answer (ie row).
# This will help us distinguish between single-valued and multi-valued answers.
# We couldn't sum directly all columns because the freq column is now present.
question_flat$option_count =
  tidyr::replace_na(question_flat$CFO, 0) +
  tidyr::replace_na(question_flat$CISO, 0) +
  tidyr::replace_na(question_flat$CIO, 0) +
  tidyr::replace_na(question_flat$COO, 0) +
  tidyr::replace_na(question_flat$CEO, 0) +
  tidyr::replace_na(question_flat$Other, 0);

# Basic stats.
answered = sum(question_flat$freq[question_flat$option_count != 0]);
unanswered = sum(question_flat$freq[question_flat$option_count == 0]);

# To facilitate concatenation, prepare columns with strings.
question_flat$CFO_text = ifelse(question_flat$CFO == 1, "CFO", NA);
question_flat$CISO_text = ifelse(question_flat$CISO == 1, "CISO", NA);
question_flat$CIO_text = ifelse(question_flat$CIO == 1, "CIO", NA);
question_flat$COO_text = ifelse(question_flat$COO == 1, "COO", NA);
question_flat$CEO_text = ifelse(question_flat$CEO == 1, "CEO", NA);
question_flat$Other_text = ifelse(question_flat$Other == 1, "Other", NA);

# Declare a utility function to perform the paste operation
# on a vector (here it will be rows) while discarding NAs in the process.
get_technical_label = function(some_vector, sep = "") {
  some_vector = some_vector[!is.na(some_vector)];
  some_vector = base::paste(some_vector, collapse = sep);
  return(some_vector);
};

# Multi-value concatenations.
# The euler uses a naming convention
# where bubble names are listed and separated by "&", e.g. "A&B",
# to designate set intersections.
question_flat$technical_label = apply(
  question_flat[,c("CFO_text","CISO_text","CIO_text","COO_text","CEO_text","Other_text")], 
  1, 
  get_technical_label, "&");

# To populate human friendly readable labels, we only keep the single-valued ones.
question_flat$friendly_label = ifelse(
  question_flat$option_count == 1, 
  question_flat$technical_label, # A single-valued label.
  "" # A blank label.
  );

# All this work to get this simple representation:
knitr::kable(question_flat[order(question_flat$freq, decreasing = TRUE),][,c("technical_label", "friendly_label", "freq")]);

	technical_label	friendly_label	freq
9			129
3	CISO	CISO	36
8	Other	Other	11
5	CIO	CIO	10
6	COO	COO	6
1	CISO&CIO		4
2	CISO&Other		2
7	CEO	CEO	2
4	CIO&COO		1

Plotting

And finally, we may plot the Euler chart.

# Remove entries with no selected option
euler_data = question_flat$freq[question_flat$option_count != 0];
names(euler_data) = question_flat$technical_label[question_flat$option_count != 0];
euler_labels = question_flat$friendly_label[question_flat$option_count != 0];

# Keep a basic sample handily available to see how intersections are accounted for
# because we do not want to double-count entries.
# euler_data = c(100, 100, 100)
# names(euler_data) = c("A", "A&B", "B")

# Transform our data in the object class required by the eulerr package.
eulerr_object = eulerr::euler(euler_data)

# Some branding
palette = c("#0066ff", "#dddddd", "#dddddd", "#5599ff", "#5599ff");

# And now we may draw our Euler chart
plot(
  eulerr_object,
  #labels = euler_labels,
  fills = palette,
  edges = TRUE,
  quantities = list(fontsize = 24, cex = .5, type = "percent"),
  alpha = 0.7,
  main = "The reporting lines of the IAM manager"
  );

Percentages per function

It should be noted that percentages appearing on the Euler chart may be misleading for newbies. In effect, you must add the single-valued answers with the multi-valued answers to get the full percentage per function.

We should thus enrich our graph with the total percentages per function.

function_percentages = sort(option_count, decreasing = TRUE);
function_percentages = function_percentages / answered * 100;
function_percentages = c(function_percentages, Total = sum(function_percentages));
knitr::kable(data.frame(Percentage = function_percentages), digits = 1);

	Percentage
CISO	58.3
CIO	20.8
Other	18.1
COO	9.7
CEO	2.8
CFO	0.0
Total	109.7

It may surprise that the total percentage is greater than 100% but this is explained by multi-valued answers that account in several categories.

Sample and answer rate

And finally we should accompany the graph with information on the sample and answer rate.

# Prepare a legend to inform on the sample size and answer rate.
legend_block = paste(
  "n: ", n,
  ", answered: ", answered, " (", round(answered / n * 100), "%)",
  ", unanswered:", unanswered, " (", round(unanswered / n * 100), "%)", 
  sep = "");

print(legend_block)

## [1] "n: 201, answered: 72 (36%), unanswered:129 (64%)"

CFO	CISO	CIO	COO	CEO	Other
NA	1	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	1
NA	1	NA	NA	NA	1
NA	1	1	NA	NA	NA

	CFO	CISO	CIO	COO	CEO	Other	freq
9	NA	NA	NA	NA	NA	NA	129
3	NA	1	NA	NA	NA	NA	36
8	NA	NA	NA	NA	NA	1	11
5	NA	NA	1	NA	NA	NA	10
6	NA	NA	NA	1	NA	NA	6
1	NA	1	1	NA	NA	NA	4
2	NA	1	NA	NA	NA	1	2
7	NA	NA	NA	NA	1	NA	2
4	NA	NA	1	1	NA	NA	1

CFO	CISO	CIO	COO	CEO	Other
NA	1	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	1
NA	1	NA	NA	NA	1
NA	1	1	NA	NA	NA

	CFO	CISO	CIO	COO	CEO	Other	freq
9	NA	NA	NA	NA	NA	NA	129
3	NA	1	NA	NA	NA	NA	36
8	NA	NA	NA	NA	NA	1	11
5	NA	NA	1	NA	NA	NA	10
6	NA	NA	NA	1	NA	NA	6
1	NA	1	1	NA	NA	NA	4
2	NA	1	NA	NA	NA	1	2
7	NA	NA	NA	NA	1	NA	2
4	NA	NA	1	1	NA	NA	1

Visualizing the reporting lines of the IAM Manager with a Euler chart