Early in the history of the Brigade Network, the Brigade Organizer’s Playbook (BOP) was curated from the learnings of early brigade leaders, with examples of what worked and how to replicate some of the organizing actions they took. However, the BOP as it stood was not serving the needs of the core teams trying to bring people together around projects. As the Brigade Network has grown and evolved into a blend of small and large brigades spanning five regions, it has become harder to identify priorities and share resources effectively.
The BOP Extension Project was initiated to create a framework for understanding needs across the network and to simultaneously gather brigade learnings and best practices, all while maintaining transparency. The goal of this project is to inform how we can make a better brigade organizing experience for all brigade captains based on their priorities, and ultimately to build on the existing playbook and improve its value as a collective, high-value resource for brigade guidance.
Specifically, this project aims to discover which topics currently in the BOP are most in need of examples and additional support, according to brigade leads. In addition, this project characterizes how these needs differ based on brigade size, region, and length of leadership. We use the responses to draw conclusions about how brigade leads want to be supported by Code for America.
A separate report will detail the insights we can draw from a qualitative analysis of the interviews. In this report, we draw insights from the quantitative data.
The data were collected from a series of Zoom interviews between a member of the BOP Extension Project team and a leader of a local Code for America brigade. At the time this report was written, we had conducted 72 of these interviews. Each interview was structured with 33 unique questions. For 26 of these questions, we asked the interviewee to provide a numeric ranking that we can use for quantitative analysis, and to elaborate on their rankings in their own words. The remaining 7 questions were open-ended. For the numeric questions, we used this Google form to collect the responses. We collected the text of the interviews using the Otter AI transcription service.
We asked brigade leads: “We would like to learn about your brigade’s need for an example or model when carrying out a variety of activities and topics.” We then asked brigade leads to provide a score from 0 to 6 for each task, where the numbers mean the following:
1 = We do this well, we don't need an example
2 = I don't need an example
3 = An example could be useful
4 = I will need an example in the future
5 = I wish I had an example yesterday
6 = Not having an example has limited my brigade
Responses of 0 are treated as missing values in the analyses that follow.
In addition to the data we collected directly from interviews, we collected data about the brigades’ membership by scraping membership numbers from each brigade’s meetup.com homepage. We also asked each brigade lead to tell us the length of time they have been in a leadership position for the brigade, and we used the official Code for America regions to categorize the brigades geographically.
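The scraping itself was done outside of this report. As a rough illustration only, a minimal scraper in R using rvest might look like the following sketch; the CSS selector for the member count is a hypothetical placeholder and is not the selector the project’s actual scraper used.
# Minimal sketch of a Meetup membership scrape (illustrative only).
library(rvest)
library(readr)
get_meetup_members <- function(meetup_url) {
  page <- read_html(meetup_url)
  # ".memberCount" is a placeholder selector, not the one used by our scraper
  members_text <- html_text2(html_element(page, ".memberCount"))
  parse_number(members_text)  # e.g., "2,023 members" -> 2023
}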
The following sections contain data analysis conducted using the R programming environment, version 4.0.3, and the cloud implementation of RStudio Server Pro, Version 1.4.1106-5. For full transparency, we display the R code we used to load, prepare, and analyze the data along with the results of the code.
We begin by loading the following libraries:
library(tidyverse)
library(DT)
library(googlesheets4)
library(mirt)
library(knitr)
library(plotly)
As data collection on this project is ongoing, we begin by generating a clean version of the data that is appropriate for our analyses. We start by pulling the most recent data from our Google sheet:
url <- "https://docs.google.com/spreadsheets/d/10uZjjrXpZH-y6wZBHZvhKgWNUBj7Xvyh9yoiqL4F_9I/edit?usp=sharing"
data <- read_sheet(url)
Next, we keep only the columns in the data that we will use going forward:
.names <- c("0.2 Can I confirm that the Brigade you are in is:",
"0.3 [Copy/paste and send their name in the chat] Can I confirm that I have the spelling of your name right? I've just sent it in the chat. [if you need help pronouncing their name, you can say the following] Can you help me with the pronunciation of your name. I want to make sure I get it right.",
"1.0.0 Hosting hack nights, how would you rank this 0 through 6",
"1.0.1 Days of action (for example, the National Day of Civic Hacking this past September)",
"1.0.2 Cultivating government partnerships",
"1.0.3 Cultivating community partnerships",
"1.0.4 Hosting a workshop to help partners identify user needs",
"1.0.5 Practicing lean software development",
"1.0.6 Conducting user testing",
"1.0.7 Code of Conduct - what happens after the fork - creating strategies for how to deal with Code of Conduct violations",
"1.0.8 Building a core team",
"1.0.9 Drafting a strategic plan for your brigade",
"1.0.10 Drafting a strategic plan for a project",
"1.0.11 Fundraising",
"1.0.12 Tools to manage your brigade, for example: Discourse, Google Groups, Meetup, GitHub Issues, and Slack",
"1.0.13 Developing a brand and media strategy",
"1.0.14 Onboarding to the national network",
"1.0.15 Guide for how to make open-source projects replicable by other brigades",
"1.0.16 Running a remote brigade",
"1.0.17 How to set and achieve DEI (diversity, equity, and inclusion) goals",
"1.0.18 Connecting people with local government job opportunities",
"1.0.19 Workforce development (resume help/LinkedIn review, career coaching, guided skill development)")
data <- data[, names(data) %in% .names]
names(data)[1:2] <- c("brigade", "interviewee")
Next, we reshape the data so that each row represents one question from one interview, and we push this dataset back to the Google sheet as the “cleaned_data” tab:
data <- data %>%
pivot_longer(cols = all_of(.names[-c(1,2)]), names_to="Question", values_to="Ranking") %>%
mutate(Ranking = ifelse(Ranking==0, NA, Ranking)) %>%
filter(!is.na(interviewee))
sheet_write(data, ss = url, sheet="cleaned_data")
In addition, we pull the brigade metadata – brigade size, length of leadership, and region – into this dataset.
For brigade size, we pull the data from the Google sheet that contains the information we scraped from meetup.com:
url <- "https://docs.google.com/spreadsheets/d/10uZjjrXpZH-y6wZBHZvhKgWNUBj7Xvyh9yoiqL4F_9I/edit?usp=sharing"
meetup <- read_sheet(url, sheet='Live-Meetup-data')
The data contain a flaw: a few brigades were listed at 1/1000th of their membership size because the scraper interpreted a comma as a decimal point (so that 2,023 is read as 2.023). These affected values are all less than 5, so we multiply them by 1000:
meetup <- meetup %>%
mutate(`Meetup "members"` = ifelse(`Meetup "members"` < 5,
`Meetup "members"`*1000,
`Meetup "members"`))
Meetup’s membership numbers are notoriously inaccurate, as many people who sign up for the Meetup group do not participate in brigade activities, projects, or events. As such, we should not use the raw numbers to characterize the exact size of each brigade’s membership. However, we can use these numbers to distinguish large from small brigades. The median number of Meetup members across brigades is 771: we code brigades with fewer than 771 Meetup members as “small,” and brigades with at least 771 Meetup members as “large.” This distinction depends on an assumption that the ratio of active members to Meetup members is roughly equal across brigades and uncorrelated to brigade size, at least to an extent that we shouldn’t see large brigades misclassified as small, or small brigades misclassified as large. We create this size distinction and merge the size into the interview data:
meetup <- meetup %>%
mutate(size = ifelse(`Meetup "members"` < median(`Meetup "members"`, na.rm=TRUE),
"Small", "Large")) %>%
dplyr::select(`Brigade Name`, size)
data <- full_join(data, meetup, by = c("brigade" = "Brigade Name"))
The scraper failed to acquire the total membership for a few of the brigades, so we enter these sizes manually:
data[data$brigade %in% c("Code for Durham", "Open NC Collaborative"),]$size <- "Large"
data[is.na(data$size),]$size <- "Small"
data <- filter(data, brigade != "Last Updated: Sat Aug 08 2020")
Here are the brigades designated as large and as small according to this approach:
sizetabsmall <- filter(data, size=="Small")
sizetablarge <- filter(data, size=="Large")
| Small Brigades | Large Brigades |
|---|---|
| Code for ABQ Code for Akron Code for Anchorage Code for Asheville Code for BCS Code for Bloomington Code for BTV Code for Buffalo Code for Charlottesville Code for Chico Code for Connecticut Code for Dallas Code for Fort Collins Code for Fresno Code for Gainesville Code for Greenville Code for Hackensack Code for Hampton Roads Code for Hawaii Code for Indianapolis Code for Iowa Code for Kentuckiana Code for Lansing Code for Las Vegas Code for Little Rock Code for Milwaukee Code for Montana Code for Muskogee Code for Nebraska Code for New Hampshire Code for New Orleans Code for New River Valley Code for Phoenix Code for Puerto Rico Code for Santa Barbara Code for Sonoma County Code for Syracuse Code for Tallahassee Code for Upper Valley Code for Utah Code for Worcester Code Island Open Boise Open Columbus (Indiana) Open Eugene Open Maine Open Savannah Open Toledo | BetaNYC Code for Atlanta Code for Baltimore Code for Boston Code for Boulder Code for Cary Code for Chicago Code for Dayton Code for DC Code for Denver Code for Durham Code for Fort Lauderdale Code for Greensboro Code for Jersey City Code for Miami Code for Nashville Code for Newark Code for NoVA Code for Orlando Code for PDX Code for Philly Code for Pittsburgh Code for Sacramento Code for San Francisco Code for San Jose Code for Tampa Bay Code for Tucson Code for Tulsa Hack for LA Brigade Hack Michiana KC Digital Drive Open Austin Open Charlotte Brigade Open Cleveland Open Columbus (Ohio) Open Data Delaware Open NC Collaborative Open Oakland Open San Diego Open Seattle Open Twin Cities OpenSTL Sketch City (Houston) |
We collected the length of time that the current brigade lead has been in that role for each brigade, and the official region for the brigade as of 2021. We saved these data in our Google sheet, and we can bring them into R as follows:
leadreg <- read_sheet(url, sheet='Interview Response List + Tracking', skip=2)
## Reading from "BOP Interview Responses and Tracking"
## Range "'Interview Response List + Tracking'!3:5000000"
## New names:
## * `` -> ...27
## * `` -> ...28
## * `` -> ...29
## * `` -> ...30
## * `` -> ...31
## * ...
leadreg <- leadreg %>%
dplyr::select(`Brigade Name`, `Length of Brigade Leadership quantified`, `2021Region`) %>%
filter(!is.na(`Brigade Name`))
names(leadreg) <- c("brigade", "leadlength", "region")
A few brigades are missing a region, so we fill in the correct values manually:
leadreg$region[leadreg$brigade %in% c("Code for Anchorage", "Hack for LA",
"Code for Santa Barbara","Code for Chico")] <- "Pacific"
leadreg$region[leadreg$brigade %in% c("Code for Akron", "Code for Milwaukee",
"Code for Nebraska")] <- "Midwest"
leadreg$region[leadreg$brigade %in% c("Code for Little Rock")] <- "Southeast"
Some brigades have multiple leads with different lengths of time they have been in that role. In these cases, to reduce the data to one record per brigade, we keep the maximum tenure from among the captains:
leadreg <- leadreg %>%
group_by(brigade) %>%
summarize(leadlength = max(leadlength, na.rm=TRUE),
region = first(region)) %>%
mutate(leadlength = ifelse(leadlength != -Inf, leadlength, NA))
## Warning in max(leadlength, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
## Warning in max(leadlength, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
## Warning in max(leadlength, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
## `summarise()` ungrouping output (override with `.groups` argument)
The following histogram represents the distribution of lengths of leadership:
g <- ggplot(leadreg, aes(x=leadlength, fill="#69b3a2")) +
geom_histogram(show.legend = FALSE, bins=15, color="black") +
xlab("Years") +
ggtitle("Distribution of Length of Leadership") +
theme(legend.position="none")
ggplotly(g)
## Warning: Removed 3 rows containing non-finite values (stat_bin).
The following bar chart illustrates the number of brigades in each region:
g <- ggplot(leadreg, aes(y=region, fill=as.factor(region))) +
geom_bar() +
xlab("Count") +
ggtitle("Number of Brigade Leads Interviewed By Region") +
theme(legend.position="none")
ggplotly(g)
Finally we merge the leadership and region data into the working dataset:
data <- full_join(data, leadreg)
## Joining, by = "brigade"
data <- filter(data, !is.na(interviewee))
data <- data %>%
mutate(shortquestion = Question,
shortquestion = fct_recode(shortquestion,
"Hacknights"="1.0.0 Hosting hack nights, how would you rank this 0 through 6",
"Days of action"="1.0.1 Days of action (for example, the National Day of Civic Hacking this past September)",
"Gov. Partnerships"="1.0.2 Cultivating government partnerships",
"Community Partnerships"="1.0.3 Cultivating community partnerships",
"User needs workshops"="1.0.4 Hosting a workshop to help partners identify user needs",
"Lean software"="1.0.5 Practicing lean software development",
"User Testing"="1.0.6 Conducting user testing",
"Code of Conduct"="1.0.7 Code of Conduct - what happens after the fork - creating strategies for how to deal with Code of Conduct violations",
"Core team"="1.0.8 Building a core team",
"Brigade plan"="1.0.9 Drafting a strategic plan for your brigade",
"Project plan"="1.0.10 Drafting a strategic plan for a project",
"Fundraising"="1.0.11 Fundraising",
"Brigade management tools"="1.0.12 Tools to manage your brigade, for example: Discourse, Google Groups, Meetup, GitHub Issues, and Slack",
"Brand and media"="1.0.13 Developing a brand and media strategy",
"National network"="1.0.14 Onboarding to the national network",
"Make open-source projects replicable"="1.0.15 Guide for how to make open-source projects replicable by other brigades",
"Running a remote brigade"="1.0.16 Running a remote brigade",
"DEI"="1.0.17 How to set and achieve DEI (diversity, equity, and inclusion) goals",
"Local gov. job opportunities"="1.0.18 Connecting people with local government job opportunities",
"Workforce development"="1.0.19 Workforce development (resume help/LinkedIn review, career coaching, guided skill development)"))
We save a copy of the cleaned, working dataset as a tab on our Google sheet:
sheet_write(data, ss = url, sheet="cleaned_data_with_metadata")
## Writing to "BOP Interview Responses and Tracking"
## Writing to sheet "cleaned_data_with_metadata"
The most straightforward way to understand brigade needs is to calculate the mean score for every task. Scores have values between 1 and 6, where higher values indicate that brigades have a greater need for an example. We calculate these means, along with the counts of each value for each question:
data2 <- data %>%
group_by(Question, shortquestion) %>%
summarize(`Mean Ranking` = round(mean(Ranking, na.rm=TRUE), 2),
`We do this well, we don't need an example` = sum(Ranking==1, na.rm=TRUE),
`I don’t need an example`= sum(Ranking==2, na.rm=TRUE),
`An example could be useful`= sum(Ranking==3, na.rm=TRUE),
`I will need an example in the future`= sum(Ranking==4, na.rm=TRUE),
`I wish I had an example yesterday`= sum(Ranking==5, na.rm=TRUE),
`Not having an example has limited my brigade`= sum(Ranking==6, na.rm=TRUE)) %>%
arrange(-`Mean Ranking`)
## `summarise()` regrouping output by 'Question' (override with `.groups` argument)
The aggregated data, sorted from the task with the highest average score to the task with the lowest, are shown in the following table:
datatable(data2)
We can also visualize these averages:
g <- ggplot(data2, aes(x = (`Mean Ranking`) , y = reorder(shortquestion, `Mean Ranking`), fill = `Mean Ranking`)) +
geom_bar(stat = "summary", fun = "mean") +
labs(title = "Overall Average Ranking of Survey Items across All Brigades",
subtitle = "DEI ranks highest across network",
caption = "(Data from BOP Brigade Lead Survey)",
x = "Average Ranking",
y = "Question",
color = "Average ranking") +
theme(axis.text=element_text(size=12),
axis.title=element_text(size=14,face="bold"),
plot.title = element_text(size=22, hjust = 0.5),
plot.subtitle = element_text(size=16, hjust = 0.5)) +
scale_fill_continuous(guide = FALSE)
g
Overall, we see that DEI ranks highest on average across all of the interviews we conducted. However, DEI is not the only question for which brigades reported a higher need for an example: brigades also request help with developing a strategic plan for the brigade, building a core team, fundraising, and onboarding to the national network. However, as we can see from the raw counts of the responses, some of the tasks are more divided across the network than others: cultivating government partnerships, for example, has more responses at either extreme end of the scale than most of the other tasks. This indicates a level of nuance we should explore: what distinguishes a medium level of preparedness from a high level of preparedness? Which of these questions are related to the overall concept of brigade capacity, and which are separate ideas? And how do these divisions depend on factors such as region, length of brigade leadership, and brigade size? We explore these questions in the next sections.
We asked each interviewee 20 task-ranking questions. For each question, the interviewee reports the extent to which they feel they need help with a different area of brigade operations. Many of these questions are correlated: if someone reports that they could use some help with practicing lean software development, it is likely that they could also use some help with user testing. Given these correlations, we can imagine that there exists an underlying variable – the self-reported preparedness index (SRPI) – that characterizes the extent to which a brigade requests assistance from Code for America and the National Advisory Council with brigade operations in general. In statistical terms, the SRPI is the causal factor and each of the 20 questions is an effect of the SRPI, as illustrated below:
It is important to note that the index is a self-reported index, not a true measure of general preparedness. In no way should these indices be taken as an evaluation of brigades or brigade leads.
An aggregate index like the SRPI is useful in this case because it allows us to better understand which of the 20 questions are most closely related to the SRPI, and it gives us an opportunity to see how the SRPI varies with external variables such as brigade size and length of leadership.
To measure the SRPI, we need a method that accounts for the fact that some tasks are easier for brigades than others, while at the same time weighting the questions differently based on how closely they correlate to the other questions. These kinds of measurement problems are solved well by a class of models from psychometrics called item response theory (IRT) models (Embretson and Reise 2013). IRT models generate estimates of three quantities:
A measurement of the latent variable (in this case, the self-reported preparedness index) for each participant in the study,
a metric that expresses the “difficulty” of each question: that is, the value of the latent variable at which two adjacent responses to a question are equally probable, and
a metric that expresses how well the question “discriminates” between participants with high and low values of the latent variable. In this case, questions with high discrimination scores get the greatest weights in the overall SRPI, and questions with low discrimination get the smallest weights. These low-weighted questions should be thought of not as unimportant, but rather as measuring a concept that is essentially different from self-reported preparedness.
The specific version of IRT we employ in this case is the graded response model (Samejima 1997), which works specifically with ordinal data.
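For readers unfamiliar with IRT notation, the graded response model in its standard textbook form expresses the probability that interviewee $i$ gives a response in category $k$ or higher to question $j$ as a logistic function of the latent trait:

$$\Pr(Y_{ij} \geq k \mid \theta_i) = \frac{1}{1 + \exp\left[-a_j\left(\theta_i - b_{jk}\right)\right]}$$

Here $\theta_i$ is the latent trait for interviewee $i$ (which we reverse-code to form the SRPI), $a_j$ is the discrimination of question $j$ (reported below as “relevance”), and $b_{jk}$ are the category thresholds (reported below as “difficulty”).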
We first reshape the data so that each question occupies a different column. We collapse the responses for each question to three categories:
If the original response to a question is 1 or 2, indicating that the interviewee feels they need little or no help with a task, the question is coded as 1.
If the original response to a question is 3 or 4, indicating that the interviewee feels they need some help with a task, the question is coded as 2.
If the original response to a question is 5 or 6, indicating that the interviewee feels they need a great deal of help with a task, the question is coded as 3.
The following code prepares the data in this way:
data <- pivot_wider(data,
id_cols = c('brigade', 'interviewee', 'size', 'leadlength', 'region'),
names_from = "shortquestion",
values_from = "Ranking")
grmdata <- data
# Collapse the 1-6 rankings into three categories: 1-2 -> 1, 3-4 -> 2, 5-6 -> 3
grmdata[,-c(1:5)] <- ceiling(grmdata[,-c(1:5)]/2)
The graded response model is implemented in the mirt package in R (Chalmers and others 2012):
my.grm <- mirt(grmdata[,-c(1:5)], 1, itemtype="graded", verbose=FALSE)
First we can extract the values of the SRPI for every interviewee in the data. These scores range from roughly -2.5 to 2.5, where positive values indicate that an interviewee self-reports as highly prepared so that they do not require much help in general, and negative values indicate that an interviewee self-reports as being in greater need of help with brigade tasks. The overall distribution of SRPI scores is illustrated below:
grmdata <- grmdata %>%
mutate(preparedness = fscores(my.grm)*-1) %>%
dplyr::select(brigade, interviewee, preparedness)
sheet_write(grmdata, ss = url, sheet="preparedness")
g <- ggplot(grmdata, aes(x=preparedness, fill="#69b3a2")) +
geom_histogram(show.legend = FALSE, bins=15, color="black") +
xlab("Self-Reported Preparedness Index") +
ggtitle("Distribution of SRPI Scores")
g
One great advantage of using a statistical measurement model like the graded response model is that we can gain insight into how the different questions operate: how difficult they are, and how similar the responses to one question are to the responses to the other questions. We extract two difficulty indices and one relevance index:
(Difficulty in achieving a medium level of preparedness) The level of the SRPI needed to have an equal probability of responding with low or medium preparedness to a question. The higher this value, the more likely any brigade lead will state that they are in most need of an example.
(Difficulty in achieving a high level of preparedness) The level of the SRPI needed to have an equal probability of responding with medium or high preparedness to a question. The lower this value, the more likely any brigade lead will state that they are in least need of an example.
(Relevance) The higher this value, the more similar the responses to a question are to the responses to other questions. The smaller this value, the more idiosyncratic the responses to the question are. Questions with high relevance get the most weight in the calculation of the SRPI.
We first extract all these metrics from the GRM:
y <- coef(my.grm, IRTpars=TRUE)
questions <- names(y)
disc <- as.numeric(sapply(y, FUN=function(x){x[1]}))   # a: discrimination ("relevance")
diff1 <- as.numeric(sapply(y, FUN=function(x){x[2]}))  # b1: first difficulty threshold
diff2 <- as.numeric(sapply(y, FUN=function(x){x[3]}))  # b2: second difficulty threshold
item.pars <- data.frame(questions, disc, diff1, diff2)
# Drop the last row, which contains group-level parameters rather than an item
item.pars <- item.pars[1:(nrow(item.pars)-1),]
# The SRPI reverses the sign of the latent trait, so we flip the thresholds to put them on the preparedness scale
item.pars <- mutate(item.pars, questions = fct_reorder(questions, disc, .desc = FALSE),
diff1 = -diff1, diff2 = -diff2)
The following figure plots the questions by their difficulty in achieving a medium level of preparedness:
item.pars <- mutate(item.pars, questions = fct_reorder(questions, diff2, .desc = FALSE))
g <- ggplot(item.pars, aes(y = questions, x = diff2, fill=diff2)) +
geom_col() +
scale_fill_continuous(guide = FALSE) +
xlab("Difficulty") +
ylab("") +
ggtitle("Difficulty of Achieving a Medium Level \nof Preparedness for Each Task") +
annotate("text", x=-3.5, y=2, label="Less Difficult") +
annotate("text", x=-3.5, y=19, label="More Difficult")
g
The following figure plots the questions by their difficulty in achieving a high level of preparedness:
item.pars <- mutate(item.pars, questions = fct_reorder(questions, diff1, .desc = FALSE))
g <- ggplot(item.pars, aes(y = questions, x = diff1, fill=diff1)) +
geom_col() +
scale_fill_continuous(guide = FALSE) +
xlab("Difficulty") +
ylab("") +
ggtitle("Difficulty of Achieving a High Level \nof Preparedness for Each Task") +
annotate("text", x=3, y=2, label="Less Difficult") +
annotate("text", x=3, y=19, label="More Difficult")
g
While all of the responses across tasks are positively correlated, some are more strongly correlated with more of the other tasks while a few tasks are more idiosyncratic. The following figure plots the questions by relevance to the overarching concept of “preparedness”:
item.pars <- mutate(item.pars, questions = fct_reorder(questions, disc, .desc = FALSE))
g <- ggplot(item.pars, aes(y = questions, x = disc, fill=disc)) +
geom_col() +
scale_fill_continuous(guide = FALSE) +
xlab("Relevance") +
ylab("") +
ggtitle("Relevance of Each Question to the\nSelf-Reported Preparedness Index")
g
Participating in days of action, developing a brand and media strategy, onboarding to the national network, and developing a code of conduct are the tasks which appear to be least related to the rest of the tasks. That is not to say that these tasks aren’t important – just that they are more related to a different conception of “preparedness” than what the other questions in this survey are measuring.
Should Code for America provide different support and guidance to large and small brigades? We examine whether there are differences between the brigades we coded as large and as small above in their overall rankings on each question. We also use independent samples t-tests to determine whether these differences are statistically significant (although please note that with 20 tests, we would expect one false positive result at the 95% level on average). First we calculate these differences and t-test confidence intervals:
diff <- data.frame()
df <- apply(data[,6:25], 2, FUN=function(x){
tt <- t.test(x ~ data$size)
# -diff(tt$estimate) = mean(Large) - mean(Small): positive values mean large brigades report more need
df <- data.frame(difference = -diff(tt$estimate),
lb = tt$conf.int[1],
ub = tt$conf.int[2])
return(df)
})
df <- data.frame(matrix(unlist(df), nrow=length(df), byrow=TRUE))
df <- cbind(names(data)[6:25], df)
names(df) <- c("question", "difference", "LB", "UB")
df <- arrange(df, difference)
Next we visualize all of these differences. In the plot below, dots that appear to the left of the vertical line at 0 indicate that small brigades report needing more guidance on the task than large brigades, and dots that appear to the right of the line indicate that large brigades need more help than small brigades. The horizontal lines are 95% confidence intervals for t-tests, and statistical significance at the 95% level is indicated when the horizontal line does not intersect the vertical line at 0:
g <- ggplot(df, aes(x = difference, y = reorder(question, -difference))) +
geom_point() +
geom_vline(xintercept=0, lty=2) +
geom_segment(aes(x = LB, y = reorder(question, -difference),
xend = UB, yend = reorder(question, -difference))) +
annotate("text", x=-1.25, y=8, label="Small brigades \nneed more help") +
annotate("text", x=.85, y=8, label="Large brigades \nneed more help") +
labs(
title = "Few Meaningful Differences Between Small and Large Brigades",
subtitle = "Only Running Hacknights is Statistically Significant",
caption = "Horizontal lines are 95% confidence intervals from independent samples t-tests. \nIf the horizontal line does not cross 0 on the x-axis, then the difference is statistically significant at the 95% level."
) +
ylab("Brigade Task")
g
In general, small brigades request more help than large brigades on most of the tasks. Only one of these differences – for running hacknights – is large enough to be statistically significant. Among the tasks, the only one for which large brigades appear to request more help than small ones is fostering DEI.
Again, we must use caution when drawing conclusions, because we simply do not have enough data to establish statistical significance for most comparisons. We see this pattern again when comparing small and large brigades in their overall self-reported preparedness index values:
data <- full_join(data, grmdata, by = c("brigade", "interviewee"))
t.test(data$preparedness ~ data$size)
##
## Welch Two Sample t-test
##
## data: data$preparedness by data$size
## t = 1.3457, df = 67.423, p-value = 0.1829
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1401134 0.7201639
## sample estimates:
## mean in group Large mean in group Small
## 0.1292559 -0.1607694
We see that large brigades have a somewhat higher level of preparedness overall than small ones, although this difference is not statistically significant. We can visualize this difference with a box plot:
g <- ggplot(data, aes(x=size, y=preparedness, fill=size)) +
geom_boxplot() +
theme(legend.position = "none") +
xlab("Brigade Size") +
ylab("Self-Reported Preparedness Index") +
labs(
title = "Large Brigades Report Slightly More Preparedness Than Small Brigades",
subtitle = "But the difference is not statistically significant"
)
g
The evidence in the data supports two tasks as being especially important for Code for America to take into account when building training and support programs for large and small brigades. Small brigades request more help with most tasks, but especially with hack nights, and large brigades request more help with DEI.
We can measure whether brigades with longer-tenured leaders self-report higher levels of preparedness. In the following figure, we plot each brigade’s leadership length by the SRPI. We superimpose a linear fit on the data to gauge whether there is any relationship between these two variables:
g <- ggplot(data, aes(x=leadlength, y=preparedness)) +
geom_point() +
geom_smooth(method="lm") +
xlab("Maximum length ") +
labs(
title = "Brigades with Longer-tenured Leadership Have Slightly Higher Average SRPI",
subtitle = "But there's a lot of variance, and the relationship is not statistically significant"
) +
xlab("Years that the longest-tenured brigade lead has been in that role") +
ylab("Self-reported preparedness index")
g
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 3 rows containing missing values (geom_point).
Likewise, we can use a boxplot to visualize differences between brigades from different regions in terms of their SRPI:
data2 <- filter(data, !is.na(region))
g <- ggplot(data2, aes(x=region, y=preparedness, fill=region)) +
geom_boxplot() +
theme(legend.position = "none") +
xlab("Region") +
ylab("Self-Reported Preparedness Index") +
labs(
title = "Bridages from the Midwest and Pacific are Slightly More Prepared",
subtitle = "But the differences are small and not statistically significant"
)
g
In both the case of length of leadership and the case of region, there are some differences in the data, but the differences are very small and there is substantial variation across brigades. Neither leadership length nor region appears to be a major driving force in and of itself of the responses we’ve collected about brigade needs.
From the outset of this project, the BOP extension team set as goals finding answers to the following questions:
For which topics that are or aren’t currently in the BOP are examples and support most needed? How do these needs differ based on:
Brigade size (from the interviews + Metropolitan Statistical Areas)
Region
Length of leadership
How do brigade leads want to be supported by Code for America?
The findings outlined in this report suggest that these factors – size, length of leadership, and region – are not the main drivers of where brigades report a need for support. Size seems to have some impact: smaller brigades are more likely to request help with hack nights, and larger brigades are more likely to request support with DEI. Region and length of leadership do not appear to generate much systematic variation in how brigade leads responded to the questions in this survey.
We measured an underlying concept from the brigade leads’ responses to various tasks, and we call this concept the self-reported preparedness index. The SRPI can be thought of as a measurement of how well prepared a brigade is to handle the various tasks discussed in this survey. The higher the SRPI, the less likely the brigade will report that they need an example on any given task. But some tasks are more difficult than others, so that a brigade will need a higher SRPI value before they report no need for an example. The two difficulty figures above show that some tasks are difficult at both transitions – from high need to medium need for an example, and from medium need to low need: in other words, some tasks are difficult both for brigades to achieve and to master. The following tasks fall into this category:
ip <- item.pars %>%
mutate(hard = diff2 > median(diff2),
hard_master = diff1 > median(diff1),
rank_int = rank(-diff2),
rank_master = rank(-diff1))
df <- filter(ip, hard & hard_master)
kable(select(df, questions, rank_int, rank_master),
caption = "Tasks that are hard for Brigades to achieve and hard to master")
| questions | rank_int | rank_master |
|---|---|---|
| Core team | 1 | 6 |
| Brigade plan | 6 | 3 |
| Project plan | 5 | 9 |
| Fundraising | 4 | 7 |
| DEI | 2 | 4 |
Brigades report higher needs for an example with building a core team, drafting strategic plans for projects and for the brigade, fundraising, and DEI. These topics should be the main focus of any plan to provide additional support and guidance to brigades regardless of the prior experience level of the brigade.
Some tasks are difficult for brigades to achieve but, once achieved, relatively easy to master. These tasks are:
df <- filter(ip, hard & !hard_master)
kable(select(df, questions, rank_int, rank_master),
caption = "Tasks that are hard for Brigades to achieve, but not to master")
| questions | rank_int | rank_master |
|---|---|---|
| Gov. Partnerships | 3 | 11 |
| Community Partnerships | 8 | 14 |
| User needs workshops | 7 | 12 |
| Lean software | 9 | 18 |
| Workforce development | 10 | 15 |
It appears to be worthwhile to provide guidance and support to brigades to help them get started with cultivating government and community partnerships, hosting workshops to identify user needs, practicing lean software development, and assisting their volunteers with workforce development. However, these efforts should be focused on brigades that self-report lower levels of preparation, as the gap between achieving and mastering these skills is small.
Some tasks are easier to achieve but relatively difficult to master:
df <- filter(ip, !hard & hard_master)
kable(select(df, questions, rank_int, rank_master),
caption = "Tasks that are easy for Brigades to achieve, but hard to master")
| questions | rank_int | rank_master |
|---|---|---|
| Days of action | 17 | 8 |
| Code of Conduct | 20 | 5 |
| Brand and media | 12 | 2 |
| National network | 14 | 1 |
| Make open-source projects replicable | 11 | 10 |
These tasks all involve easily-adopted models that brigades can use: there are ongoing projects that brigades can participate in during days of action without having to plan the project from scratch; Code for America and established brigades have template codes of conduct that other brigades can adopt; Code for America sets new brigades up with some media resources such as Slack and Meetup, which is part of the initial onboarding process with the national network; and simply using GitHub makes projects replicable. But at the same time, it is hard to master these skills, or even to know how to move beyond the initial models. It is hard for a brigade to plan their own new projects for a day of action, write an original code of conduct, expand a media and branding presence, delve deeper into the national network, or become a GitHub rockstar. If Code for America wishes to provide support on these tasks, it should be geared towards helping brigades achieve a level of mastery.
Finally, some tasks are relatively easy for brigades to both achieve and to master:
df <- filter(ip, !hard & !hard_master)
kable(select(df, questions, rank_int, rank_master),
caption = "Tasks that are easy for Brigades to achieve and to master")
| questions | rank_int | rank_master |
|---|---|---|
| Hacknights | 18 | 16 |
| User Testing | 13 | 13 |
| Brigade management tools | 19 | 20 |
| Running a remote brigade | 16 | 17 |
| Local gov. job opportunities | 15 | 19 |
Brigades by and large report less need for examples and support for hosting hack nights, conducting user testing, using tools to communicate with the brigade or running remote events, or connecting people with local job opportunities. These tasks are generally essential for a brigade to exist in an active state at all. As our interviews were with active brigades, these brigades may be more universally successful with these tasks already.
Because the data show that certain tasks are difficult to achieve in any capacity, and other tasks are difficult to master, we recommend that Code for America focus on two levels of support for brigades: a “beginner’s” level and an “advanced” level. The beginner’s level should emphasize forming partnerships with government and communities, software and workforce development, and assessing user needs. The advanced track should emphasize developing new projects for days of action events, customizing and fully abiding by a code of conduct, improving the GitHub organization of code and software, optimizing a brand and media strategy, and becoming more deeply involved in the national Code for America network. Both tracks should also include training on DEI, fundraising, developing strategic plans for brigade operations and for projects, and growing and supporting a core leadership team. We do not see compelling evidence that assignment to the beginner’s or advanced tracks should be made according to size, leadership length, or region. A better approach may be to ask brigade leads to self-select into a track, and to provide support based on their own assessment of their and their brigade’s level of preparedness to tackle all of the tasks that a successful brigade will accomplish.