Note: If you would like to collaborate on creating a CYOA assessment and map, please reach out to Caitlin Hanlon.

Designing the “Choose Your Own Adventure” Scenarios


I. Organizing the scenarios

The foundation of a Choose Your Own Adventure (CYOA) is the story itself. Designing a branching scenario can feel overwhelming—especially when your story includes many different paths. The key is to start small. It’s much easier to expand and add complexity later.

This document will act as a map for the coding process that comes afterward.

Step 1: Write Your First Question

Step 2: Follow the Placeholders

Once you’ve written your first question and its options, move to the next placeholder question and repeat the process.

Step 3: Decide When to End

While your story could go on forever, you’ll eventually want to reach a point where every option leads to a conclusion. This may feel a bit artificial: for instance, you might design one “correct” path while others lead to alternate or less ideal endings, but that’s part of the fun and structure of CYOA.


II. Option accounting

Go back to your first question and evaluate the options.

Repeat this process for all questions.

Making a CYOA Survey


I. Survey options

Now that the scenarios are written, they need to be moved into a medium that allows users to traverse the options. For our purposes, the easiest way to do this was through a survey that allows users to be routed to different questions. We opted to use Google Forms, but there are likely other services (Microsoft forms, Qualtrics, etc) that would achieve the same goal.


II. Survey sections

To start, create sections that correspond to all of the questions and endings. You may want to include a header and question for user identification, but wait until the end to do this. This will make it easier to associate your current document with the survey; otherwise, all of the questions and endings will be one more (eg n+1) compared to how they are assigned in your document.

Once all of the sections are created, start filling in the prompts and options. When adding the options, make sure that “Go to section based on answer” is selected; this can be found within the three vertical dots at the bottom of the section of a Google Form. Now the routes indicated within your document can be added into the survey.

Sample:


III. Beginnings and endings If you want to be able to identify individual users, add in a section at the beginning for users to enter an ID. If you want users to be “blinded” by the meaning of each ending, make the text within each ending the same within the survey, but make sure that your document contains a key to the meaning of each ending.


IV. Survey data Try your survey several times to ensure that

  1. The routing is operating correctly, and
  2. The data is accessible.

Most programs will give the option to have the survey data saved as a spreadsheet. Access this data and save it for later use.


Organizing the data in R


I. Reading the survey data in R

Access the survey using the “readxl” package and the file name. If you do not have a working drive set, you will need to do that so that R knows where to find the file.

data1 <- read_excel("MTX CYOA ScenarioIV_levels (Responses)_TOTAL.xlsx")


II. Cleaning the survey I find it easiest to work with the survey data as a data frame

MTXasDF <- as.data.frame(data1)

Because of the way the cases are constructed, many of the columns of data have the same header. Our survey also had some other questions asking users about their confidence in the answers that was used for later analysis. For creating the CYOA, however, I just want to choose the columns that are relevant to the map. Therefore, I first chose only the relevant columns

MTX_COA <- MTXasDF[,c(2,4,3,6,5,9,8)]

and then gave them a unique ID.

names(MTX_COA) <- c("ID", "Q1","Q1fluid", "Q2","Q2fluid","Q3","Q3fluid")

Finally, I renamed this as a new data frame:

MTX_data1 <- MTX_COA


III. From text to pathing

If you look closely at the survey data, you will see that each cell of data contains the full text of the question option. This is difficult to work with, so we need to transform this data into a more workable data frame. The first step is to add the endings and to assigns the text data to different ending outcomes. The next step will be to route from question to question.

The important thing to note about this organization is that the spreadsheet is recording IF a user visited that node or ending. The pathing will be determined based on this information.


IV. Adding endings Right now, the spreadsheet only has question options, but no endings. Add the endings to the spreadsheet. For Scenario IV, there are seven possible endings. This code creates a column for the ending with no data within it.

MTX_data1$End4 <- NA
MTX_data1$End5 <- NA
MTX_data1$End6 <- NA
MTX_data1$End7 <- NA
MTX_data1$End8 <- NA
MTX_data1$End9 <- NA
MTX_data1$End10 <- NA

Note that I numbered the endings as if they were questions (eg I had three questions and my first ending is End4 rather than Q4 or End 1). This is revised later in the process when visualizing the map, but for this first iteration, this is how I kept things straight. Feel free to assign ending names in whatever way makes the most sense to you!


V. Assigning endings

The next step is to link the text within the spreadsheet to a specific ending. This should be the same information that is within your document or the survey. I find it easiest to go question by question. The basic format of the code is to indicate that if the text within a question option should route to an ending, then that ending number should be added to the ending column. For example

MTX_data1$End4[MTX_data1$Q1=="Obtain next level in 18 hours (Hour 42)"] <- 4

indicates that if the text “Obtain next level in 18 hours (Hour 42)” is found within the Q1 column, then the number 4 should be added to the End4 column of the spreadsheet.

Make sure that all endings for a question are accounted for. For example, for Scenario IV Question 1, there are nine options that route to an ending:

MTX_data1$End4[MTX_data1$Q1=="Obtain next level in 18 hours (Hour 42)"] <- 4
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 6 hours (Hour 30)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 12 hours (Hour 36)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 15mg/m2/dose every 6 hours, obtain next level in 18 hours (Hour 42)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 6 hours (Hour 30)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 12 hours (Hour 36)"] <- 5
MTX_data1$End5[MTX_data1$Q1=="Give leucovorin at 100mg/m2/dose every 6 hours, obtain next level in 18 hours (Hour 42)"] <- 5
MTX_data1$End6[MTX_data1$Q1=="Give glucarpidase 50U/kg/dose IV once"] <- 6
MTX_data1$End7[MTX_data1$Q1=="Discharge the patient home"] <- 7

Repeat this process for all questions.

Notes:


VI. Routing between questions

If there is any type of answer in the question column, this means that the user visited this question. Therefore, this information can be input by checking for any text within the column. If there is text within the cell, then the user visited this question. In terms of the coding,

MTX_data1$Q1 <- ifelse(is.na(MTX_data1$Q1), NA, 1)

will replace any text of the Q1 column with NA if the user did not visit and 1 if the user did visit. This code is repeated for all questions.


VII. Routing for fluid Assigning information for the fluid pathing is performed in the same way as assigning the endings. Instead of a number being assigned, a yes or no designation was used.

MTX_data1$Q1fluid[MTX_data1$Q1fluid=="No fluid changes are necessary at this time."] <- "no"
MTX_data1$Q1fluid[MTX_data1$Q1fluid=="Increase fluids by 75 ml/m2/hr."] <- "yes"

By the end of this section, the spreadsheet text should be replaced with numbers. All users should have an number within one of the ending columns.


Creating the map blueprint


The next step is to build the blueprint of the CYOA map by linking each question (or node) to the next using edges, which represent the possible paths between them. These connections are defined in an Excel file and will be used later by a program called DiagrammeR (more on that later).

I. Creating the Excel sheet

Create an Excel sheet with four columns:

  1. node1
  2. arrow
  3. node2
  4. edge

The “Node 1” column indicates the starting node, the “arrow” column adds an arrow (->), the “Node 2” column indicates the ending node, and “edge” concatenates Node 1 to Node 2 with an arrow between. In Excel, this can be accomplished with =CONCATENATE(A2, B2, C2).

I saved my sheet as “DatagenerationIV_levels.xlsx”. Our next step will be to create rows that define which nodes are linking together.


II. Adding in node data

The best way to start this process is return to your Scenario Document. Assign a Node Number to each Question, Hydration Point, and Ending. For example, I assigned Question 1 as 1, Q1Fluid “Yes” as 2, and Q1Fluid “No” as 3. The process then repeats. For my example, I got a bit out of order and assigned the questions (eg Q2 is 4, Q3 is 5) before returning to assigning the fluid options (eg Q2Fluid Yes is 6, Q2fluid No is 7, etc).

In my example, Question 1 (Node 1) routes to the Q1fluid Yes (Node 2) or Q1 fluid No (Node 3). By entering these values into the proper columns, I have generated two edges: 1->2 and 1->3.

This process is then repeated for all other options within the question. From Question 1 fluid (yes or no), there are six possible routes:

  1. Question 2 (Node 4)
  2. Question 3 (Node 5)
  3. End 4 (Node 10)
  4. End 5 (Node 11)
  5. End 6 (Node 12)
  6. End 7 (Node 13)

Therefore, these edges should be set from Q1fluid Yes (Node 2) or Q1 fluid No (Node 3) to each of these nodes. This process is repeated for all questions.

Sample:

TWO VERY IMPORTANT STEPS TO REMEMBER

  1. The “node2” column has to be in numerical order (smallest to largest).
  2. Make sure that all routes are accounted for. Count the number of different paths from Question 1 in the document or survey. This should equal the number of rows within the Excel sheet corresponding this node. Fix any discrepancies now, but there will be a method to check later.

This map is then accessed by R by reading the file. Mine is named data:

data <- read_excel("DatagenerationIV_levels.xlsx")

While this won’t look like much of a map now, the Excel information will give R a blueprint to interpret how nodes are connected through edges.


From survey data to map data

The final step to create a map is to link the paths of users to the map. To do this, I created a table (livecountX) that counts how many users used a particular path (or edge). This will give a count of how many users went directly from one node to another. For example, if we look at the first eight lines of the code:

livecountX <- c(nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q1))),
                nrow(MTX_data1 %>% filter(Q1fluid=="no") %>% filter(!is.na(Q1))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q2))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(Q3)) %>% select(Q2) %>% filter_all(all_vars(is.na(.)))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End4)) %>% select(Q2:Q3) %>% filter_all(all_vars(is.na(.)))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End5)) %>% select(Q2:End4) %>% filter_all(all_vars(is.na(.)))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End6)) %>% select(Q2:End5) %>% filter_all(all_vars(is.na(.)))),
                nrow(MTX_data1 %>% filter(Q1fluid=="yes") %>% filter(!is.na(End7)) %>% select(Q2:End6) %>% filter_all(all_vars(is.na(.)))))

The meaning is as follows:

This is repeated for all possible routes

TO CHECK YOURSELF

The number of lines in this code should equal the number of rows in the Excel map spreadsheet. If you do not have a one to one alignment here, you will get an error message. I find it easy to copy and paste this code into the Excel map sheet to make sure that each path has a matching line of code.

I then add a column into the Excel map (which is now in R as a table named “data”). I also added a row that calculates the percent of users who used a particular path: This map is then accessed by R by reading the file. Mine is named data:

data$LiveCount <- livecountX
users <- length(which(MTX_data1$Q1 != "NA"))
data$LivePercent <- livecountX/users

Using Diagrammer

To draw the map, the DiagrammeR package is used. DiagrammeR draws maps based on a node data frame and an edge data frame. The following code sets up those data frames and details the options I used to set the shapes and colors within the map.


I. Setting up the nodes

The following code was used to name the nodes. This is where I changed the naming convention so that endings started at End 1. The first part of the code used the “data” sheet to gather all unique node names and put them in a table called “nd”. A label column was then added with the name of each node.

node_list <- unique(data$node1); node_list2 <- unique(data$node2)
nl <- unique(c(node_list, node_list2))
nd <- as.data.frame(nl)
nd$id <- nd$nl
nd <- nd %>% select(-nl)
nd <- nd %>% arrange(id)
nd$label <- c("1", "2", "3","4","7", 
              "6", "5", "8", "9", "end1",
              "end2", "end3", "end4", "end5", "FINAL", "end7")


II. Adding shapes and colors to nodes

A “color” and “shape” column were added to the “nd” sheet. The basic syntax is I am assigning colors or shapes based on the ID of the node within the nd sheet.

nd$color <- NULL
nd$color <- ifelse(nd$id %in% c("10", "11", "12", "13", "14", "16"), "gray70", 
                   ifelse(nd$id %in% c("1", "2", "5", "9", "15"), "white", "gray92"))  

nd$shape <- NULL
nd$shape = ifelse(nd$id %in% c("1", "10", "11", "12", "13", "14","15", "16"), "square", 
                  ifelse(nd$id %in% c("2",  "9", "3", "7", "6", "8") , "diamond",
                         "circle"))

For example, this code says that all endings should be shaded dark gray (gray70), the “expert” route should be shaded white, and all others should be shaded like gray (gray92) Similarly, for the shapes, all endings should be shaped as a square, all hydration choices should be shaped as a diamond, and everything else (eg the questions) should be shaped as a circle.


III. Adding colors and weights to edges

To set up the edges, I created a table called “edge_list”. This used information from the “data” sheet, including path information and the number of users who traversed that path.

colnames(data)[colnames(data)=="node1"] <- "from"
colnames(data)[colnames(data)=="node2"] <- "to"
colnames(data)[colnames(data)=="val2"] <- "LivePercent"

edge_list <- select(data, to, from, LivePercent) 
edge_list$style <- "solid"
edge_list$color <- NULL
edge_list$color <- ifelse((edge_list$LivePercent != 0 & edge_list$to == 5 & edge_list$from == 3) |
                            (edge_list$LivePercent != 0 & edge_list$to == 5 & edge_list$from == 6) |
                            (edge_list$LivePercent != 0 & edge_list$to == 5 & edge_list$from == 7) |
                            (edge_list$LivePercent != 0 & edge_list$to == 15 & edge_list$from == 8)
                          , "turquoise4", 
                          ifelse((edge_list$LivePercent != 0 & edge_list$to == 3 & edge_list$from == 1) |
                                   (edge_list$LivePercent != 0 & edge_list$to == 4 & edge_list$from == 2) |
                                   (edge_list$LivePercent != 0 & edge_list$to == 4 & edge_list$from == 3) |
                                   (edge_list$LivePercent != 0 & edge_list$to == 8 & edge_list$from == 5) 
                                 , "darkorange3", 
                                 ifelse((edge_list$LivePercent != 0 & edge_list$to == 6 & edge_list$from == 4) |
                                          (edge_list$LivePercent != 0 & edge_list$to == 7 & edge_list$from == 4) 
                                        , "dodgerblue4", 
                                        ifelse((edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 2) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 11 & edge_list$from == 2) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 12 & edge_list$from == 2) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 2) |
                                                 
                                                 (edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 3) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 11 & edge_list$from == 3) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 12 & edge_list$from == 3) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 3) |
                                                 
                                                 (edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 6) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 11 & edge_list$from == 6) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 12 & edge_list$from == 6) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 6) |
                                                 
                                                 (edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 7) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 11 & edge_list$from == 7) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 12 & edge_list$from == 7) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 7) |
                                                 
                                                 (edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 8) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 8) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 14 & edge_list$from == 8) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 16 & edge_list$from == 8) |
                                                 
                                                 (edge_list$LivePercent != 0 & edge_list$to == 10 & edge_list$from == 9) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 13 & edge_list$from == 9) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 14 & edge_list$from == 9) |
                                                 (edge_list$LivePercent != 0 & edge_list$to == 16 & edge_list$from == 9) 
                                               , "deeppink4", 
                                               
                                               ifelse((edge_list$LivePercent != 0), "black",   
                                                      
                                                      "gray89")))))

edge_list$penwidth <- 8*(edge_list$LivePercent+.15)


IV. Setting the node data frame

The node data frame takes information from the nd table to set parameters. Documentation within DiagrammeR lists all of the options. Here, the type, label, shape, and fill color are all set from the “nd” table. The fontsize, penwidth, color, fontcolor, and fontname are all set within this code, but this is accomplished using information from the “nd” table. For example, this code is how I made the correct pathing have a bolder outline (penwidth) and bolder text (fontname). Rank is used to specify which nodes should be on which row in the map. I chose to have the expert path displayed on the top row with the various ending on the bottom row.

nodes_df <- create_node_df(n = nrow(nd),
                           type = nd$label,
                           label = nd$label,
                           shape = nd$shape,
                           fillcolor = nd$color,
                           penwidth = ifelse(nd$id %in% c("1", "2", "5", "9", "15"), 3, 1),
                           
                           style = "filled, solid",
                           
                           fontsize = ifelse(nd$id %in% c("1", "2", "5", "9"), 18, #correct path
                                             ifelse(nd$id %in% c("3", "7", "8", "6"), 12, 12)), #hydrations #endings
                           
                           color = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "15", "16"), "gray22", 
                                          ifelse(nd$id %in% c("1", "2", "5", "9"), "black", "gray37")),
                       
                           fixedsize = FALSE,
                           fontcolor = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "15", "16"), "black", 
                                                      ifelse(nd$id %in% c("1", "2", "5", "9"), "black", "gray37")),
                           fontname = ifelse(nd$id %in% c("1", "5"), "Helvetica-Bold", "Helvetica"),
                           rank = ifelse(nd$id %in% c("10", "11", "12", "13", "14", "16"), "1", 
                               ifelse(nd$id %in% c("1", "2", "5", "9", "15"), "2", "3"))


V. Setting the edge data frame

The edges data frame can be set entirely through the edge_list table.

edges_df <- create_edge_df(to = edge_list$to,
                           from = edge_list$from,
                           style = edge_list$style,
                           color = edge_list$color,
                           penwidth = edge_list$penwidth,
                           rel = edge_list$to)

Putting it all together

To create a graph with DiagrammeR, create_graph is used. This combines the nodes and edges data frame. Other attributes can be specified as well. For example, I specified that my layout should be in the dot configuration, the direction should be top to bottom, and the splines should both in the ortho alignment. This is what gives the subway map style look.

Total <-
  create_graph(
    nodes_df = nodes_df,
    edges_df = edges_df) %>%
  add_global_graph_attrs(
    attr = c("layout","ranksep", "rankdir", "splines"),
    value = c("dot","0.5", "TB", "ortho"),
    attr_type = c("graph","graph", "graph", "graph"))
#spline: ortho = subway, line = diagonal

The graph is then rendered and labeled.

Total %>% render_graph(title = "MTX Scenario IV_levels")

To export the graph:

export_graph(Total,file_name = "ScenIV_MS_MTX.svg",file_type = "svg")


Calculations


I. Expert pathing

The spreadsheet of user behavior can be used to determine how many users followed different paths. The example here will be for calculating the number of users who remained on the expert path (top line of the map). While I did use the MTX_data1 spreadsheet for this, I modified it slightly and renamed it as MTX_Correct:

MTX_Correct <- MTX_data1

To facilitate proper counting of the hydration choices, these were changed from Yes/No to 1/0:

MTX_Correct[MTX_Correct == "no"] <- "0"
MTX_Correct[MTX_Correct == "yes"] <- "1"

To create a table of the number of expert users, the following code was used:

z1 <- count(MTX_Correct %>% filter(Q1 == "1"))
z2 <- count(MTX_Correct %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
z3 <- count(MTX_Correct %>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
z4 <- count(MTX_Correct %>% filter(Q3fluid == "0")%>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))
z5 <- count(MTX_Correct %>% filter(End9 == "9")%>% filter(Q3fluid == "0")%>% filter(Q3 == "3") %>% filter(is.na(Q2)) %>% filter(Q1fluid == "1") %>% filter(Q1 == "1"))

expert.table <-rbind(z1, z2, z3, z4, z5)
expert.table$ID <- c("START", "Q1fluid","Q3", 
                      "Q3fluid", "FINAL")
expert.table <- expert.table[,c(2,1)]
expert.table$percent <- expert.table$n/users*100

While there is probably an easier way to do this (eg by creating a loop), I honestly could not figure out how to do it so I just brute forced it by copying code from the “From survey data to map data” section.


II. Nodes touched

To calculate how many nodes each user touched, I modified the MTX_data1 sheet so that the fluid answers were uniquely labeled. This is probably something I should have done earlier, but here we are.

MTX_nodes <- MTX_data1
  MTX_nodes$Q1fluid[MTX_nodes$Q1fluid == "no"] <- "Q1n"
  MTX_nodes$Q1fluid[MTX_nodes$Q1fluid == "yes"] <- "Q1y"
  MTX_nodes$Q2fluid[MTX_nodes$Q2fluid == "no"] <- "Q2n"
  MTX_nodes$Q2fluid[MTX_nodes$Q2fluid == "yes"] <- "Q2y"
  MTX_nodes$Q3fluid[MTX_nodes$Q3fluid == "no"] <- "Q3n"
  MTX_nodes$Q3fluid[MTX_nodes$Q3fluid == "yes"] <- "Q3y"

To calculate the number of nodes touched for the entire cohort:

nodes.touched<-nrow(table(unlist(MTX_nodes[,2:14])))

To calculated the number that each individual touched, I added a column to the the sheet that any entry within the sheet that was not NA:

MTX_nodes$touched <- apply(MTX_nodes[,2:14], 1, function(x)length(unique(na.omit(x))))