This R notebook uses an Euler chart to visualize the reporting lines of IAM managers.
As part of the IAM Performance Measurement 2020 Survey, I surveyed IAM professionals about the reporting lines IAM managers in their organization. In this data analysis notebook, I use R to analyse and visualize the answers to this question.
The original survey question #22 was:
To whom reports the IAM manager in your organization?
The non-exclusive possible answers were:
When the survey participant selected the **Other* option, he could type in the other function name. In this notebook, I will not analyse the answers provided in free text form.
To conduct this data analysis, we first need to setup our technical environment.
# Set console to English
Sys.setenv(LANG = "en");
# Install R packages as needed
if(!require("eulerr")) install.packages("eulerr");
if(!require("tidyr")) install.packages("tidyr");
if(!require("RCurl")) install.packages("RCurl");
if(!require("plyr")) install.packages("plyr");
if(!require("knitr")) install.packages("knitr");
# Load libraries
library(eulerr);
library(tidyr);
library(RCurl);
library(plyr);
library(knitr);
Then we need to load the survey data. In accordance with the Open Source and Open Data spirit of the Open-Measure project, all data is publicly available on GitHub.
# Configuration options
survey_url = "https://raw.githubusercontent.com/Open-Measure/Open-Data/master/IAMPerf2020-Dataset/IAMPerf2020.csv";
question_number = 22;
# Download the survey data from GitHub
survey_raw <- RCurl::getURL(survey_url);
# Interpret the CSV to get structured data
survey_data <- read.csv (text = survey_raw);
# In the original survey data,
# we are interested in question 22: Reporting line of the line manager.
# The column names corresponding to the question options are prefixed with Q22.
# The column names of the options ("CFO", "CISO", ...) are suffixed with Ax.
# Here is the mapping table:
# Q22A1: CFO
# Q22A2: CISO
# Q22A3: CIO
# Q22A4: COO
# Q22A5: CEO
# Q22A6: Other
# Q22A7: I don't know
# Pick only the columns we are interested in,
# and rename columns to friendly names in the process.
question_data = data.frame(
CFO = survey_data$Q22A1,
CISO = survey_data$Q22A2,
CIO = survey_data$Q22A3,
COO = survey_data$Q22A4,
CEO = survey_data$Q22A5,
Other = survey_data$Q22A6
);
# Note: in the survey data, column Q22A7 is the "I don't know" option.
# We discard it here on purpose as "NA".
knitr::kable(head(question_data))
| CFO | CISO | CIO | COO | CEO | Other |
|---|---|---|---|---|---|
| NA | 1 | NA | NA | NA | NA |
| NA | NA | NA | NA | NA | NA |
| NA | NA | NA | NA | NA | NA |
| NA | NA | NA | NA | NA | 1 |
| NA | 1 | NA | NA | NA | 1 |
| NA | 1 | 1 | NA | NA | NA |
As we can see, we have a table structure with one row per survey answer and one column per selectable option. When the option was selected by the participant, it takes value 1 and N/A otherwise.
Note that options are not exclusive. In consequence we may find both single-valued answers and multi-valued answers.
Let’s count how many times every option was selected, independently on whether the options come from single-valued answers or multi-valued answers.
# Some basic stats.
n = nrow(question_data)
# We may use the sum function because a selected option correspond to a value of 1.
option_count = c(
CFO = sum(!is.na(question_data$CFO)),
CISO = sum(!is.na(question_data$CISO)),
CIO = sum(!is.na(question_data$CIO)),
COO = sum(!is.na(question_data$COO)),
CEO = sum(!is.na(question_data$CEO)),
Other = sum(!is.na(question_data$Other))
);
knitr::kable(data.frame(t(sort(option_count, decreasing = TRUE))));
| CISO | CIO | Other | COO | CEO | CFO |
|---|---|---|---|---|---|
| 42 | 15 | 13 | 7 | 2 | 0 |
We immediately spot the high frequency of the CISO reporting line, followed by the CIO.
We could stop the analysis here and output a bar or pie chart but this would hide an interesting aspect of our data: multi-valued answers. To overcome this limitation, we will use a different visualization: an Euler chart. This visualization will show the relative importance of individual options by the size of its bubbles but will also display the multi-valued answers as interesections between bubbles.
To do this, we should count every unique combination of selected options.
# The question options in the survey are non-exclusive,
# thus we need to keep track of all combinations (e.g. CISO + CIO).
# What we really need is to count both multiple value answers (e.g. CISO + CIO)
# and and single value answers (e.g. CISO).
# To accomplish this, reduce the table to its unique row combinations (e.g. CISO + CIO).
# And count the occurrences of these combinations.
# The plyr count function does all of this in a single shot,
# and appends the freq column (# of occurences).
question_flat = data.frame(plyr::count(question_data));
knitr::kable(question_flat[order(question_flat$freq, decreasing = TRUE),]);
| CFO | CISO | CIO | COO | CEO | Other | freq | |
|---|---|---|---|---|---|---|---|
| 9 | NA | NA | NA | NA | NA | NA | 129 |
| 3 | NA | 1 | NA | NA | NA | NA | 36 |
| 8 | NA | NA | NA | NA | NA | 1 | 11 |
| 5 | NA | NA | 1 | NA | NA | NA | 10 |
| 6 | NA | NA | NA | 1 | NA | NA | 6 |
| 1 | NA | 1 | 1 | NA | NA | NA | 4 |
| 2 | NA | 1 | NA | NA | NA | 1 | 2 |
| 7 | NA | NA | NA | NA | 1 | NA | 2 |
| 4 | NA | NA | 1 | 1 | NA | NA | 1 |
In our Euler graph, we wish to label bubbles with the title of the selected options, but we will not label intersections. In effect, intersections are self-explanatory (this is the whole visual value of Euler graphs). But to generate the Euler graph, we still need to keep track of these intersections.
So let’s label our data.
# Count the number of selected options per answer (ie row).
# This will help us distinguish between single-valued and multi-valued answers.
# We couldn't sum directly all columns because the freq column is now present.
question_flat$option_count =
tidyr::replace_na(question_flat$CFO, 0) +
tidyr::replace_na(question_flat$CISO, 0) +
tidyr::replace_na(question_flat$CIO, 0) +
tidyr::replace_na(question_flat$COO, 0) +
tidyr::replace_na(question_flat$CEO, 0) +
tidyr::replace_na(question_flat$Other, 0);
# Basic stats.
answered = sum(question_flat$freq[question_flat$option_count != 0]);
unanswered = sum(question_flat$freq[question_flat$option_count == 0]);
# To facilitate concatenation, prepare columns with strings.
question_flat$CFO_text = ifelse(question_flat$CFO == 1, "CFO", NA);
question_flat$CISO_text = ifelse(question_flat$CISO == 1, "CISO", NA);
question_flat$CIO_text = ifelse(question_flat$CIO == 1, "CIO", NA);
question_flat$COO_text = ifelse(question_flat$COO == 1, "COO", NA);
question_flat$CEO_text = ifelse(question_flat$CEO == 1, "CEO", NA);
question_flat$Other_text = ifelse(question_flat$Other == 1, "Other", NA);
# Declare a utility function to perform the paste operation
# on a vector (here it will be rows) while discarding NAs in the process.
get_technical_label = function(some_vector, sep = "") {
some_vector = some_vector[!is.na(some_vector)];
some_vector = base::paste(some_vector, collapse = sep);
return(some_vector);
};
# Multi-value concatenations.
# The euler uses a naming convention
# where bubble names are listed and separated by "&", e.g. "A&B",
# to designate set intersections.
question_flat$technical_label = apply(
question_flat[,c("CFO_text","CISO_text","CIO_text","COO_text","CEO_text","Other_text")],
1,
get_technical_label, "&");
# To populate human friendly readable labels, we only keep the single-valued ones.
question_flat$friendly_label = ifelse(
question_flat$option_count == 1,
question_flat$technical_label, # A single-valued label.
"" # A blank label.
);
# All this work to get this simple representation:
knitr::kable(question_flat[order(question_flat$freq, decreasing = TRUE),][,c("technical_label", "friendly_label", "freq")]);
| technical_label | friendly_label | freq | |
|---|---|---|---|
| 9 | 129 | ||
| 3 | CISO | CISO | 36 |
| 8 | Other | Other | 11 |
| 5 | CIO | CIO | 10 |
| 6 | COO | COO | 6 |
| 1 | CISO&CIO | 4 | |
| 2 | CISO&Other | 2 | |
| 7 | CEO | CEO | 2 |
| 4 | CIO&COO | 1 |
And finally, we may plot the Euler chart.
# Remove entries with no selected option
euler_data = question_flat$freq[question_flat$option_count != 0];
names(euler_data) = question_flat$technical_label[question_flat$option_count != 0];
euler_labels = question_flat$friendly_label[question_flat$option_count != 0];
# Keep a basic sample handily available to see how intersections are accounted for
# because we do not want to double-count entries.
# euler_data = c(100, 100, 100)
# names(euler_data) = c("A", "A&B", "B")
# Transform our data in the object class required by the eulerr package.
eulerr_object = eulerr::euler(euler_data)
# Some branding
palette = c("#0066ff", "#dddddd", "#dddddd", "#5599ff", "#5599ff");
# And now we may draw our Euler chart
plot(
eulerr_object,
#labels = euler_labels,
fills = palette,
edges = TRUE,
quantities = list(fontsize = 24, cex = .5, type = "percent"),
alpha = 0.7,
main = "The reporting lines of the IAM manager"
);
It should be noted that percentages appearing on the Euler chart may be misleading for newbies. In effect, you must add the single-valued answers with the multi-valued answers to get the full percentage per function.
We should thus enrich our graph with the total percentages per function.
function_percentages = sort(option_count, decreasing = TRUE);
function_percentages = function_percentages / answered * 100;
function_percentages = c(function_percentages, Total = sum(function_percentages));
knitr::kable(data.frame(Percentage = function_percentages), digits = 1);
| Percentage | |
|---|---|
| CISO | 58.3 |
| CIO | 20.8 |
| Other | 18.1 |
| COO | 9.7 |
| CEO | 2.8 |
| CFO | 0.0 |
| Total | 109.7 |
It may surprise that the total percentage is greater than 100% but this is explained by multi-valued answers that account in several categories.
And finally we should accompany the graph with information on the sample and answer rate.
# Prepare a legend to inform on the sample size and answer rate.
legend_block = paste(
"n: ", n,
", answered: ", answered, " (", round(answered / n * 100), "%)",
", unanswered:", unanswered, " (", round(unanswered / n * 100), "%)",
sep = "");
print(legend_block)
## [1] "n: 201, answered: 72 (36%), unanswered:129 (64%)"