The starting point for a Choose Your Own Adventure is the stories. Designing the scenarios can seem a bit overwhelming at first, especially if your story has many different options. The easiest thing to do is to start small – it’s much easier to make something more complicated later on. This document will serve as a map for the coding that occurs later. Start by writing a question and several options for answers. For options that will go to additional questions, add a note for the route and then add headers for those questions as a placeholder (eg “Question 2”, “Question 3”, etc). For options that lead to an ending, write the ending at the end of your document along with a placeholder number (eg “End 1”, “End 2”, etc). This numbering may change later, but it’s important to account for the options now.
Once your first question is complete, go to the first question placeholder that you wrote and repeat the process from above. Some of the options that route to an ending that has already been written, but if not, add a new ending option. Repeat this process until all of your questions have options.
While the fun could theoretically continue indefinitely, eventually you will want to reach a question where all of the options route to an ending. This can be somewhat artificial. For example, there may be one correct option whereas other options route to an ending that indicates the user was “less than” correct.
Go back to your first question and evaluate the options.
Repeat this process for all questions.
Now that the scenarios are written, they need to be moved into a medium that allows users to traverse the options. For our purposes, the easiest way to do this was through a survey that allows users to be routed to different questions. We opted to use Google Forms, but there are likely other services (Microsoft forms, Qualtrics, etc) that would achieve the same goal.
###Survey sections To start, create sections that correspond to all of the questions and endings. You may want to include a header and question for user identification, but wait until the end to do this. This will make it easier to associate your current document with the survey; otherwise, all of the questions and endings will be one more (eg n+1) compared to how they are assigned in your document.
Once all of the sections are created, start filling in the prompts and options. When adding the options, make sure that “Go to section based on answer” is selected; this can be found within the three vertical dots at the bottom of the section. Now the routes indicated within your document can be added into the survey. Sample:
If you want to be able to identify individual users, add in a section at the beginning for users to enter an ID. If you want users to be “blinded” by the meaning of each ending, make the text within each ending the same within the survey, but make sure that your document contains a key to the meaning of each ending.
Try your survey several times to ensure that
Most programs will give the option to have the survey data saved as a spreadsheet. Access this data and save it for later use.
Access the survey using the “readxl” package and the file name. If you do not have a working drive set, you will need to do that so that R knows where to find the file.
data1 <- read_excel("MTX CYOA ScenarioIV_levels (Responses)_TOTAL.xlsx")
I find it easiest to work with the survey data as a data frame
MTXasDF <- as.data.frame(data1)
Because of the way the cases are constructed, many of the columns of data have the same header. Our survey also had some other questions asking users about their confidence in the answers that was used for later analysis. For creating the CYOA, however, I just want to choose the columns that are relevant to the map. Therefore, I first chose only the relevant columns
MTX_COA <- MTXasDF[,c(2,4,3,6,5,9,8)]
and then gave them a unique ID.
names(MTX_COA) <- c("ID", "Q1","Q1fluid", "Q2","Q2fluid","Q3","Q3fluid")
Finally, I renamed this as a new data frame:
MTX_data1 <- MTX_COA
If you look closely at the survey data, you will see that each cell of data contains the full text of the question option. This is difficult to work with, so we need to transform this data into a more workable data frame. The first step is to add the endings and to assigns the text data to different ending outcomes. The next step will be to route from question to question.
The important thing to note about this organization is that the spreadsheet is recording IF a user visited that node or ending. The pathing will be determined based on this information.
Right now, the spreadsheet only has question options, but no endings. Add the endings to the spreadsheet. For Scenario IV, there are seven possible endings. This code creates a column for the ending with no data within it.
MTX_data1$End4 <- NA
MTX_data1$End5 <- NA
MTX_data1$End6 <- NA
MTX_data1$End7 <- NA
MTX_data1$End8 <- NA
MTX_data1$End9 <- NA
MTX_data1$End10 <- NA
Note that I numbered the endings as if they were questions (eg I had three questions and my first ending is End4 rather than Q4 or End 1). This is revised later in the process when visualizing the map, but for this first iteration, this is how I kept things straight. Feel free to assign ending names in whatever way makes the most sense to you!
The next step is to link the text within the spreadsheet to a specific ending. This should be the same information that is within your document or the survey. I find it easiest to go question by question. The basic format of the code is to indicate that if the text within a question option should route to an ending, then that ending number should be added to the ending column. For example
MTX_data1$End4[MTX_data1$Q1=="Obtain next level in 18 hours (Hour 42)"] <- 4
indicates that if the text “Obtain next level in 18 hours (Hour 42)” is found within the Q1 column, then the number 4 should be added to the End4 column of the spreadsheet.
Make sure that all endings for a question are accounted for. For example, for Scenario IV Question 1, there are nine options that route to an ending:
MTX_data1$End4[MTX_data1$Q1=="Obtain next level in 18 hours (Hour 42)"] <- 4
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 6 hours (Hour 30)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 12 hours (Hour 36)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 18 hours (Hour 42)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 6 hours (Hour 30)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 12 hours (Hour 36)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 18 hours (Hour 42)"] <- 5
MTX_data1$End6[MTX_data1$Q1=="Give glucarpidase 50U/kg/dose IV once"] <- 6
MTX_data1$End7[MTX_data1$Q1=="Discharge the patient home"] <- 7
Repeat this process for all questions.
Notes:
If there is any type of answer in the question column, this means that the user visited this question. Therefore, this information can be input by checking for any text within the column. If there is text within the cell, then the user visited this question. In terms of the coding,
MTX_data1$Q1 <- ifelse(is.na(MTX_data1$Q1), NA, 1)
will replace any text of the Q1 column with NA if the user did not visit and 1 if the user did visit. This code is repeated for all questions.
Assigning information for the fluid pathing is performed in the same way as assigning the endings. Instead of a number being assigned, a yes or no designation was used.
MTX_data1$Q1fluid[MTX_data1$Q1fluid=="No fluid changes are necessary at this time."] <- "no"
MTX_data1$Q1fluid[MTX_data1$Q1fluid=="Increase fluids by 75 ml/m2/hr."] <- "yes"
By the end of this section, the spreadsheet text should be replaced with numbers. All users should have an number within one of the ending columns.
The next step is to use the document to code all possible options. Each question (including the hydration options in this example) is defined as a node and the pathing between is defined as an edge.This is accomplished through an Excel file.
The best way to start this process is to assign a Node Number to each Question, Hydration Point, and Ending. For example, I assigned Question 1 as 1, Q1Fluid “Yes” as 2, and Q1Fluid “No” as 3. The process then repeats. For my example, I got a bit out of order and assigned the questions (eg Q2 is 4, Q3 is 5) before returning to assigning the fluid options (eg Q2Fluid Yes is 6, Q2fluid No is 7, etc).
The next step is to make sure that all options within the document/survey are routed to a node. Four columns should be made:
The “Node 1” column indicates the starting node, the “arrow” column adds an arrow (->), the “Node 2” column indicates the ending node, and “edge” concatenates Node 1 to Node 2 with an arrow between. In Excel, this can be accomplished with =CONCATENATE(A2, B2, C2).
In my example, Question 1 (Node 1) routes to the Q1fluid Yes (Node 2) or Q1 fluid No (Node 3). By entering these values into the proper columns, I have generated two edges: 1->2 and 1->3.
This process is then repeated for all other options within the question. From Question 1, there are six possible routes:
Therefore, these edges should be set from Q1fluid Yes (Node 2) or Q1 fluid No (Node 3) to each of these nodes. This process is repeated for all questions.
TWO VERY IMPORTANT STEPS TO REMEMBER
This map is then accessed by R by reading the file. Mine is named data:
data <- read_excel("DatagenerationIV_levels.xlsx")
The final step to create a map is to link the paths of users to the map. To do this, I created a table (livecountX) that counts how many users used a particular path (or edge). This will give a count of how many users went directly from one node to another. For example, if we look at the first eight lines of the code:
livecountX <- c(nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q1))),
nrow(MTX_data1 %>% filter(Q1fluid=="no") %>% filter(!is.na(Q1))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q2))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q3)) %>% select(Q2) %>% filter_all(all_vars(is.na(.)))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End4)) %>% select(Q2:Q3) %>% filter_all(all_vars(is.na(.)))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End5)) %>% select(Q2:End4) %>% filter_all(all_vars(is.na(.)))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End6)) %>% select(Q2:End5) %>% filter_all(all_vars(is.na(.)))),
nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End7)) %>% select(Q2:End6) %>% filter_all(all_vars(is.na(.)))))
the meaning is as follows:
Line 1: count the number of users who answered Q1fluid as yes (as indicated in our modified spreadsheet) AND who have information with Q1.
Line 2: count the number of users who answered Q1fluid as no AND who have information with Q1.
Line 3: count the number of users who answered Q1fluid as yes AND who have information with Q2.
Line 4: count the number of users who answered Q1fluid as yes AND who have information within Q3 BUT no information with Q2. This is what counts the direct movement from Q1 to Q3 rather than a user going from Q1 to Q2 to Q3
Line 5: count the number of users who answered Q1fluid as yes AND who have information within End4 BUT no information with Q2 or Q3.
Line 6: count the number of users who answered Q1fluid as yes AND who have information within End 5 BUT no information with Q2, Q3, or End 4.
Line 7: count the number of users who answered Q1fluid as yes AND who have information within End 6 BUT no information with Q2, Q3, End 4, or End 5.
Line 8: count the number of users who answered Q1fluid as yes AND who have information within End 7 BUT no information with Q2, Q3, End 4, End 5, or End 6.
This is repeated for all possible routes
TO CHECK YOURSELF
The number of lines in this code should equal the number of rows in the Excel map spreadsheet. If you do not have a one to one alignment here, you will get an error message. I find it easy to copy and paste this code into the Excel map sheet to make sure that each path has a matching line of code.
I then add a column into the Excel map (which is now in R as a table named “data”). I also added a row that calculates the percent of users who used a particular path: This map is then accessed by R by reading the file. Mine is named data:
data$LiveCount <- livecountX
users <- length(which(MTX_data1$Q1 != "NA"))
data$LivePercent <- livecountX/users
To draw the map, the DiagrammeR package is used. DiagrammeR draws maps based on a node data frame and an edge data frame. The following code sets up those data frames and details the options I used to set the shapes and colors within the map.
The following code was used to name the nodes. This is where I changed the naming convention so that endings started at End 1. The first part of the code used the “data” sheet to gather all unique node names and put them in a table called “nd”. A label column was then added with the name of each node.
node_list <- unique(data$node1); node_list2 <- unique(data$node2)
nl <- unique(c(node_list, node_list2))
nd <- as.data.frame(nl)
nd$id <- nd$nl
nd <- nd %>% select(-nl)
nd <- nd %>% arrange(id)
nd$label <- c("24Hr", "Yes", "No","30Hr","36Hr",
"Yes", "No", "Yes", "No", "end1",
"end2", "end3", "end4", "end5", "end6", "end7")
A “color” and “shape” column were added to the “nd” sheet. The basic syntax is I am assigning colors or shapes based on the ID of the node within the nd sheet.
nd$color <- NULL
nd$color <- ifelse(nd$id %in% c("10", "11", "12", "13", "14", "16"), "gray93",
ifelse(nd$id %in% c("1", "2", "5", "9", "15"), "Honeydew2", "white"))
nd$shape <- NULL
nd$shape = ifelse(nd$id %in% c("10", "11", "12", "13", "14","15", "16"), "square",
ifelse(nd$label == "Yes", "diamond",
ifelse(nd$label == "No", "diamond","circle")))
For example, this code says that all endings should be shaded gray, the “correct” route should be shaded honeydew, and all others should be shaded white. Similarly, for the shapes, all endings should be shaped as a square, all hydration choices should be shaped as a diamond, and everything else (eg the questions) should be shaped as a circle.
To set up the edges, I created a table called “edge_list”. This used information from the “data” sheet, including path information and the number of users who traversed that path. If the path was used, the edge was set to navy blue; otherwise it was gray. The weight of the path corresponded to a multiple of the LivePercent column.
colnames(data)[colnames(data)=="node1"] <- "from"
colnames(data)[colnames(data)=="node2"] <- "to"
colnames(data)[colnames(data)=="val2"] <- "LivePercent"
edge_list <- select(data, to, from, LivePercent)
edge_list$style <- "solid"
edge_list$color <- NULL
edge_list$color <- ifelse(edge_list$LivePercent != 0, "navyblue", "gray85")
edge_list$penwidth <- 8*(edge_list$LivePercent+.25)
The node data frame takes information from the nd table to set parameters. Documentation within DiagrammeR lists all of the options. Here, the type, label, shape, and fill color are all set from the “nd” table. The fontsize, penwidth, color, fontcolor, and fontname are all set within this code, but this is accomplished using information from the “nd” table. For example, this code is how I made the correct pathing have a bolder outline (penwidth) and bolder text (fontname).
nodes_df <- create_node_df(n = nrow(nd),
type = nd$label,
label = nd$label,
shape = nd$shape,
fillcolor = nd$color,
fontsize = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "15", "16"), "18",
ifelse(nd$label == "Yes", "18",
ifelse(nd$label == "No", "18","20"))),
penwidth = ifelse(nd$id %in% c("1", "2", "5", "9", "15"), 4, 1),
color = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "15", "16"), "black",
ifelse(nd$id %in% c("1", "2", "5", "9"), "black", "navyblue")),
fixedsize = FALSE,
fontcolor = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "15", "16"), "black",
ifelse(nd$label == "Yes", "black",
ifelse(nd$label == "No", "black","DarkSlateGray"))),
fontname = ifelse(nd$id %in% c("1", "4", "5"), "Helvetica-Bold", "Helvetica"))
The edges data frame can be set entirely through the edge_list table.
edges_df <- create_edge_df(to = edge_list$to,
from = edge_list$from,
style = edge_list$style,
color = edge_list$color,
penwidth = edge_list$penwidth,
rel = edge_list$to)
To create a graph with DiagrammeR, create_graph is used. This combines the nodes and edges data frame. Other attributes can be specified as well. For example, I specified that my layout should be in the dot configuration, the direction should be top to bottom, and the splines should both in the ortho alignment. This is what gives the subway map style look.
Total <-
create_graph(
nodes_df = nodes_df,
edges_df = edges_df) %>%
add_global_graph_attrs(
attr = c("layout", "rankdir", "splines"),
value = c("dot", "TB", "ortho"),
attr_type = c("graph", "graph", "graph"))
The graph is then rendered and labeled.
Total %>% render_graph(title = "MTX Scenario IV_levels")
To export the graph:
export_graph(Total,file_name = "ScenIV_MS_MTX.svg",file_type = "svg")
Sample:
The spreadsheet of user behavior can be used to determine how many users had correct pathing (eg touched the correct nodes at some point in their journey) or perfect pathing (eg touched only the correct nodes). This is done by counting the number of users who took these paths. While I did use the MTX_data1 spreadsheet for this, I modified it slightly and renamed it as MTX_Correct:
MTX_Correct <- MTX_data1
To facilitate proper counting of the hydration choices, these were changed from Yes/No to 1/0:
MTX_Correct[MTX_Correct == "no"] <- "0"
MTX_Correct[MTX_Correct == "yes"] <- "1"
To create a table of the number of perfect users, the following code was used:
y1 <- count(MTX_Correct %>% filter(Q1 == "1"))
y2 <- count(MTX_Correct %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
y3 <- count(MTX_Correct %>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
y4 <- count(MTX_Correct %>% filter(Q3fluid == "0")%>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
y5 <- count(MTX_Correct %>% filter(End9 == "9")%>% filter(Q3fluid == "0")%>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
perfect.table <-rbind(y1, y2, y3, y4, y5)
perfect.table$ID <- c("Q1", "Q1fluid","Q3",
"Q3fluid", "end6")
perfect.table <- perfect.table[,c(2,1)]
perfect.table$percent <- perfect.table$n/users*100
View(perfect.table)
While there is probably an easier way to do this (eg by creating a loop), I honestly could not figure out how to do it so I just brute forced it by copying code from the “From survey data to map data” section.
To calculate how many nodes each user touched, I modified the MTX_data1 sheet so that the fluid answers were uniquely labeled. This is probably something I should have done earlier, but here we are.
MTX_nodes <- MTX_data1
MTX_nodes$Q1fluid[MTX_nodes$Q1fluid == "no"] <- "Q1n"
MTX_nodes$Q1fluid[MTX_nodes$Q1fluid == "yes"] <- "Q1y"
MTX_nodes$Q2fluid[MTX_nodes$Q2fluid == "no"] <- "Q2n"
MTX_nodes$Q2fluid[MTX_nodes$Q2fluid == "yes"] <- "Q2y"
MTX_nodes$Q3fluid[MTX_nodes$Q3fluid == "no"] <- "Q3n"
MTX_nodes$Q3fluid[MTX_nodes$Q3fluid == "yes"] <- "Q3y"
To calculate the number of nodes touched for the entire cohort:
nodes.touched<-nrow(table(unlist(MTX_nodes[,2:14])))
To calculated the number that each individual touched, I added a column to the the sheet that any entry within the sheet that was not NA:
MTX_nodes$touched <- apply(MTX_nodes[,2:14], 1, function(x)length(unique(na.omit(x))))
#Figures ### Endings used For bar charts, the ggplot2, ggsignif, and DescTools packages were used. To compare groups, I first re-organized MTX_data1 so that groups of users (in this case, SOC or Methotracks) were placed together in the chart. This can be done in R, but I manually altered the Excel sheet. I first counted the number of users for each group who used each ending:
end.analysis2 <-as.data.frame(rbind(
length(which(MTX_data1[1:18,8:14] == "4")),
length(which(MTX_data1[1:18,8:14] == "5")),
length(which(MTX_data1[1:18,8:14] == "6")),
length(which(MTX_data1[1:18,8:14] == "7")),
length(which(MTX_data1[1:18,8:14] == "8")),
length(which(MTX_data1[1:18,8:14] == "9")),
length(which(MTX_data1[1:18,8:14] == "10")),
length(which(MTX_data1[19:36,8:14] == "4")),
length(which(MTX_data1[19:36,8:14] == "5")),
length(which(MTX_data1[19:36,8:14] == "6")),
length(which(MTX_data1[19:36,8:14] == "7")),
length(which(MTX_data1[19:36,8:14] == "8")),
length(which(MTX_data1[19:36,8:14] == "9")),
length(which(MTX_data1[19:36,8:14] == "10"))))
Groups were then assigned:
end.analysis2$Group[1:7] <- "SOC"
end.analysis2$Group[8:14] <- "MTX"
The ends were labeled:
end.analysis2$end <- c("End1", "End2", "End3", "End4", "End5", "End6", "End7", "End1", "End2", "End3", "End4", "End5", "End6", "End7")
The sheet was organized and labeled:
end.analysis2 <- end.analysis2[,c(2,3,1)]
names(end.analysis2) <- c("Group", "End", "Count")
And the percent of individuals who arrived at each ending was calculated and added to the sheet:
SOCUsers <- sum(end.analysis2$Count[which(end.analysis2$Group == "SOC")])
MTXUsers <- sum(end.analysis2$Count[which(end.analysis2$Group == "MTX")])
end.analysis2$Percent <- ifelse(end.analysis2$Group == "SOC", (end.analysis2$Count)/SOCUsers*100, (end.analysis2$Count)/MTXUsers*100)
end.analysis2$remain <- ifelse(end.analysis2$Group == "SOC", SOCUsers-end.analysis2$Count, MTXUsers-end.analysis2$Count)
A Gtest was used for statistical analysis:
for (i in 0:6) {
obs <- rbind(c(end.analysis2[8+i,3], end.analysis2[8+i,5]),
c(end.analysis2[1+i,3], end.analysis2[1+i,5]))
G<-GTest(x = obs, correct = "none")
end.analysis2$pvalue[i+1] <- G$p.value
}
end.analysis2$pvalue[8:14] <- NA
Note that this time I finally figured out how to use the loop function!
The meaning of the calculated p-value was assigned and comparison lines were added:
annot <- data.frame(value=end.analysis2$pvalue, mean=ifelse(end.analysis2$pvalue<0.05, "*", "NS"))
annot <- annot[1:7,]
lines=data.frame(x=c(0.875, 1.875, 2.875, 3.875, 4.875, 5.875, 6.875), xend=c(1.125,2.125,3.125,4.125,5.125,6.125,7.125),
y=c(25, 21, 5, 5, 37, 75, 50), annotation=annot$mean)
Finally, all information was put together in GGPLOT:
library(ggplot2)
library(ggsignif)
Fig2C <- ggplot(end.analysis2, aes(x=End,y=Percent, fill=Group))+theme_minimal()+
geom_bar(stat="identity", position=position_dodge2(reverse=TRUE),width=0.6)+
scale_fill_manual("legend", values = c("MTX" = "SteelBlue3", "SOC" = "LightSteelBlue2"))+
ylim(c(0, 100))+
guides(fill = guide_legend(reverse="TRUE"))+
geom_signif(xmin = lines$x, xmax = lines$xend, y_position = lines$y, annotation = lines$annotation,
tip_length = 0)
print(Fig2C)
Note: a similar approach is taken for perfect pathing, but this is a bit more complicated since only the users with perfect pathing are counted. However, the approach is the same in the code.