There are three goals in this notebook:
Establish a data cleaning pipeline to convert the data in the BOP Awareness and Usage Data Google sheet into a clean, ready-for-analysis version of the data.
Save this version of the the data as another tab in the the Google sheet.
Generate some basic tables and plots to illustrate the responses.
We will use the following libraries:
library(tidyverse)
library(DT)
library(googlesheets4)
We first load the data. We set all 23 columns to be character-valued:
googleurl <- "https://docs.google.com/spreadsheets/d/1T3DVMbGVjlap_QnVx98vcRF650cD49JMPRFSj48yNuI/edit?usp=sharing"
data <- read_sheet(googleurl, col_types="ccccccccccccccccccccccc")
We will only be using the Interview ID, Question, Form Response, Response (Cleaned up), and Size columns, so we keep only these columns:
data <- data[,c(1,2,3,5,6)]
Currently each row is defined by an interview and a question. We reshape the data to wide format in which the questions are in different columns, and there is one row per interview. Both the Form Response and Response (Cleaned up) columns are broken up into four new columns, one for each question:
data <- pivot_wider(data,
id_cols = c(`Interview ID`, Size),
names_from = Question,
values_from = c(`Form Response`, `Response (Cleaned up)`))
Next we give the columns much more manageable names:
colnames(data) <- c("ID", "size", "seenBOP", "howoften", "howhelpful", "recommend",
"seenBOP_response", "howoften_response", "howhelpful_response",
"recommend_response")
We convert size to a factor (categorical) class, and we recode several values of howoften to deal with idiosyncratic text entries. We also convert howhelpful and recommend to numeric class and reorder the categories of size:
data <- data %>%
mutate(size = factor(size),
howoften = fct_recode(howoften,
"Not anymore" = "Initially it was weekly and then stopped referring because there were no updates",
"Not anymore" = "Hasn't referred it in a long time, none of this is relevant to smaller brigade",
"Monthly" = "3 monthly",
"Not anymore" = "Can't remember the last time I referred to it"),
howhelpful = as.numeric(howhelpful),
recommend = as.numeric(recommend),
size = fct_relevel(size, "XSMALL : 0 to 7.25", "SMALL (75% mark) : 8.25 to 14",
"MEDIUM (50% mark) : 15 to 29", "LARGE (25% mark) : 30 to 99",
"XLARGE : 100"))
We can use the sheet_write() function:
sheet_write(data, ss = googleurl, sheet="Cleaned")
## Writing to "BOP Awareness and Usage"
## Writing to sheet "Cleaned"
First, we can create a histogram of seenBOP:
g <- ggplot(data, aes(x = seenBOP)) +
geom_bar(fill="blue") +
xlab("Have you seen the Brigade Organizer’s Playbook (BOP)?") +
ylab("Count") +
theme(text=element_text(size=16, family="serif")) +
geom_text(aes(label = ..count..), stat = "count", vjust = 1.5, colour = "white")
g
We can plot these counts along with brigade size. First we need a smaller dataframe with the brigade sizes, whether or not they’ve seen the BOP, and the count:
data2 <- data %>%
mutate(yes = (seenBOP=="Yes"),
no = (seenBOP=="No")) %>%
group_by(size) %>%
summarize(`Seen BOP` = sum(yes),
`Has not seen BOP` = sum(no)) %>%
pivot_longer(cols = c(`Seen BOP`, `Has not seen BOP`),
names_to = "seenBOP")
## `summarise()` ungrouping output (override with `.groups` argument)
g <- ggplot(data2, aes(x = size, y = value, fill = seenBOP)) +
geom_col(position="dodge") +
xlab("Brigade Size") +
ylab("Count") +
theme(text=element_text(size=16, family="serif")) +
#geom_text(aes(label = ..count..), stat = "count", vjust = 1.5, colour = "white") +
coord_flip()
g