Abstract

The origin of this project was a request when a research team led by Dr. Colette Daiute at the CUNY Graduate Center requested an upgrade to their Interactive Digital Narrative (IDN) Complexity Metric.

The purpose of this project is to create a robust IDN Tree Complexity Metric that well describes the nature of the shape of an IDN Narrative Tree and quantifies which dimensions are most predictive of its complexity through a series of cascading deliverables.

Methodology

In the Literature Review section I will:

Introduce key elements of Narrative Theory and Complexity Theory
Defend why complexity is the best lens to view IDN Tree structure through
Outline methods for collecting and aggregating expert opinion on IDN Tree Complexity

In the Data Collection and Analysis section I will:

Discuss the flask app built for data collection
Explore the creation of ELO rankings for Complexity and how they were built from pairwise comparison in the Data Collection phase
Investigate the characteristics of the predictors and their relation to the tree complexity
Outline some initial hurdles in creating the IDN Tree Complexity Metric

In the Modeling section I will:

Build relevant candidtate predictive models and analyze their effectiveness to predict expert opinion on IDN Tree Complexity moving forward

In the Conclusions and Next Steps section I will:

Discuss the use of this IDN Tree Complexity Metric in research
Outline some future changes that could be made to improve the Data Collection Process

Literature Review

This literature review will support the goal of creating a more robust Interactive Digital Narrative Tree Complexity Metric for the field of Interactive Digital Narrative Design.

To build a case, I will examine:

Narrative Theory
Complexity Theory
Measuring Complexity
Pairwise Comparison
Use of Complexity in IDN
Current state of tree-structure complexity metrics in IDN
Data Collection
Model Building

Narrative Theory:

Interactive Digital Narrative (IDN) involves the creation and use of digital tools toward multi-authored narratives (Koenitz 2015). An example would be a story based video game. IDN Theory differs from Narrative Theory because of the branching nature of an IDN tool. The IDN designer creates potential branching narratives and procedural mechanics so the player participates in the story creation. In a book, you read from the first page to the last, the author of the book has laid out one path through the narrative, but in a narrative video game, the player makes choices that co-create one or more versions of the narrative.

Although the field is now discussing a wide variety of interactive mechanics(Kreminski and Wardrip-Fruin 2018), the most basic IDN pattern is a decision tree structure. Because the data for this project is with beginning IDN designers, the quantification of the tree structure and its relation to other features of the IDN design and process is of foundational importance(C. Daiute, Cox, and Murray 2021).

That said, measuring the basic IDN tree structure has remained elusive.

Complexity Theory

Complexity evades a formal definition, as each person has their own intuitive definition of what is complex and what is not. Even among scientists this holds true,

“complex systems … involve many (relatively) simple individual elements interacting locally with one another”(Polančič and Cegnar 2017)

Narrative tree structures are made up of the most basic component parts of branches and nodes(Nelson, n.d.). The nodes are where the story elements are contained, such as characters actions and objects. The branches are links to subsequent or recursive story nodes. In order to be engaging IDNs must provide rich enough opportunities to be engaging for the player.

Complexity Depends on the Observer

“Only when observations are made, as produced by an acquisition model, is when the question of complexity becomes relevant: after the observer’s model is incorporated.”

Emergent levels of complexity

If the complexity of the most basic IDN can be quantified, then this could lead to the quantification of higher level IDNs in the future. Funes suggests “interactions at a lower level of organization (e.g. subatomic particles) result in higher levels with aggregate rules of their own”(Funes 2001)

Measuring Complexity

There is an interesting history of measuring Complexity by comparison. One particular relevant study took place in 1982 by Dr. MacEachren entitled “Map Complexity: Comparison and Measurement.” The initial procedure was to determine the subjective complexity of chloropleth and isopleth test maps , to determine which form was more complex. This was accomplished by generating a psychological complexity scale to which physical complexity measures for each map could subsequently be compared.(MacEachren 1982)

In the study “Map Complexity: Comparison and Measurement” MacEachren sought to quantify the consensus that chloropleth maps were more complex than isopleth maps. Isopleth maps define features by densities, while chloropleth maps define them by boundaries(Słomska-Przech K. 2021) The procedure was to obtain the subjective complexity of the maps by participants by having them visually compare and rate the complexity of the different types of maps. This data was used to create a “psychological complexity scale,” which was then compared to physical features of each type of map. In essence they had participants select the more complex map, which they then generated a scalar metric from, and then modeled this metric from the features of the maps.

Pairwise Comparison

At the heart of this project is the need for a ranked list of narrative tree structures to produce the scalar element of complexity as referred to in MacEachren ’s study. Thurstone’s Law of Comparative Judgement implies that there is evidence that comparing alternatives pairwise versus ranking all alternatives or selecting one from the set of alternatives is more effective (for humans)(Fürnkranz and Hüllermeier 2011). Pairwise comparison has been leveraged for machine learning applications with the Learning by pairwise comparison (LPC) paradigm(Fürnkranz and Hüllermeier 2011), which will be addressed in the Data and Modeling sections.

The method of Pairwise Comparison I am proposing is as follows:

compare each two candidates head to head
award each candidate 1 point for each head-to-head victory

This data can be aggregated in many different ways, such as a win/loss metric or not aggregated at all, with the raw data used to train specific kinds of models. These pairwise “points” can also be aggregated into an ELO ranking of heirarchy.

Use of Complexity in Interactive Digital Narratives

The term complexity is widely used in the IDN space. In fact there is a group called the The INDCOR project (Interactive Narrative Design for Complexity Representations). They are referring to the complexity of the “space” in which IDNs exist socially. They work at the abstract level of the ideal IDN, but do not look empirically(Perkis, n.d.)

‘A Complexity Analysis Matrix for Narrative Userly Texts’ by Noam Knoller, Christian Roth and Dennis Haak develops a method for measuring the complexity of the IDN(Knoller, Roth, and Haak 2021).

This research adapts the ‘Learning Progression Model’ (Yoon, Goh, and Yang 2019)

breaks complexity of a system down into 6 complex system ideas.

Scaling Effects
Networked interactions(non-linearity, interdependence, emergent patterns)
Multiple causes
Dynamics
(Decentralized) order
non-determination

This project applies Yoon’s work to create a complexity vector space. They then manually code these vectors for the IDN, and a users opinion on the completed IDN. They then measure the euclidean distance between these vectors to determine how much the student understands the system.

This is a metric for the overall complexity of all the aspects of a completed IDN, that can transcend the dimensionality of the story.

What I am focusing on is not the overall complexity of the completed narrative, but strictly a 2 dimensional Tree Structure Complexity of the proto-story elements. This basic structural analysis (nodes and branches) and the patterns of node and branch sections, not the story text. If the narrative tree structure complexity is robust, it provides an independent measure of basic structure to correlate with other elements, such as analysis of the narrative text, and designer/player interaction process.

In addition, this basic metric could provide quantifiable measures of structural change during the IDN design process, which is especially relevant to beginning designers.

Current state of tree structure complexity metrics in IDN Design process

In one study with beginning IDN designers, researchers used a connection-density measure adapted from neural connectivity research(Colette Daiute, Duncan, and Marchenko 2018 ; Rubinov and Sporns 2010).

They computed branches divided by nodes for a measure of connection density. The quantitative metric did not correlate well with their qualitiative Narrative analysis, which is why they are attempting to update the metric. In addition this measure was further updated(C. Daiute, Cox, and Murray 2021) to address logical flaws. The connection density metric could be manipulated by special cases of tree structures so the metric was incoherent. An example would be a single node with 4 branches would have a higher connection density than an infinitely long tree structure with 2 branches per node. A version of the connection density metric created to correct these issues was created(C. Daiute, Cox, and Murray 2021) based loosley on fractal dimensionality, with length and maximum pathway variables included, but additional kinds of logical incoherence could be created with special cases of tree structures.

The identification of these prior measurement issues motivated the rationale, measure and design for this project. When two tree structures are drawn, the tree struture complexity metric should mirror the expert’s intuition on which of the trees is more complex. Pairwise ranking of a sufficiently large dataset of narrative tree structures leads to the computation of a psychological complexity continuum. This continuous variable can then be modeled and the model can be used to compute the intuition of an expert on trees outside of the training dataset.

Data Collection

The sample of the data will be fairly small due to the restrictions of the number of research participants in the workshop. There are an expected 50-100 trees to be collected during the data collection phase, and their features will be extracted through a text scraping algorithm.

METHOD:

The data will be collected from a set of IDN experts opinions on the tree-structure complexity through a tree-structure complexity game using segmented training data set from a research workshop. The results will be stored on an SQL server, recording each rating event. The event variables retained will be:

ID of the winning tree
ID of the losing tree
ID of the rater (anonymous)

The reason this data will not be aggregated at the time is to potentially take advantage of the EloChoice algorithm that will be discussed in the modeling section.

The results of the expert survey will be aggregated in two ways:

winrate metric by aggregating total wins and total losses for each tree
ELO rating will be generated from the data using the eloChoice algorithm

ELO is a dominance rank ordination method. In brief, ELO looks at competitive matchup (think two chess players playing a match) and will assign points (positive to the winner, negative to the loser.) The amount of points assigned depends on the models pre-match assumptions of who is going to win. competitors with ELO in the same range, will have a relatively similar amount of positive and negative points assigned, while a match with a large difference in ELO will assign small values if the pre-match assumption is met, or a large amount of points assigned if it is upset(Albers and Vries 2001).

The winrate metric does not take into account any pre-match assumptions, but will be included in the exploratory stage as a comparison with ELO.

Model Building

Linear regression and ensemble methods will be performed. Most likely \(R^2\) and \(AIC_c\) will be used to compare models, because the sample sizes will be relatively small. \(AIC_c\) is a version of Akaike’s Information Criteria that has a correction for small sample sizes(Hurvich and Tsai 1989).

Despite the relative performance of the models, because the tree-structure complexity metric is being created for use and interpretation by researchers in the field, a version of the linear model will be produced that will be maximized for intelligibility.

Data Collection and Analysis

Overview of the Data Collection Process

The data for the Narrative Tree Complexity Metric was collected through a web app built with the Python web application flask. It is hosted on the heroku cloud application platform and can be found here. https://ctree-postgres.herokuapp.com/ .

Research participants were selected for their involvement in the IDN field and were specifically requested to choose the tree that they believe was ‘more structurally complex.’ Because complexity is understood in a multitude of ways, a brief synopsis of what I mean by narrative tree structure complexity (as detailed in the literature review) was given. Other than that, no direction was provided on how to determine which tree was more structurally complex. This is also detailed again in the “learn more about this app” page on the website.

When the participant selects which tree they intuit as more complex and submit their choice, the winner and loser are recorded to a SQL database hosted on heroku. The participant is then presented with a new random pairing of trees.

After a sufficiently large sample of ratings, the data was pulled down from the server and was in the following form.

## # A tibble: 6 x 3
##      id w       l      
##   <dbl> <chr>   <chr>  
## 1     1 117.PNG 132.PNG
## 2     2 63.PNG  6.PNG  
## 3     3 111.PNG 130.PNG
## 4     4 132.PNG 107.PNG
## 5     5 165.PNG 130.PNG
## 6     6 165.PNG 75.PNG

The ‘winner’ is the tree that was selected to be more complex, and the ‘loser’ is the tree that was not selected.

These visualizations of the narrative trees had been generated in the sherlock web application for an interactive narrative design study. The visualizations were created from their design files.

These files take the following form:

The structure of the file and delimiters determine the structure of the tree.

For example: * ‘#’ denotes a new node * ‘[[]]’ denotes a branch, and its destination

These features were processed in Python to calculate the following variables of an individual tree:

number of nodes
number of branches
number of leaves (nodes with no posterior links)
number of non-leaf nodes (nodes minus leaves)
number of choice nodes (nodes with multiple exit branches)
maximum path length (path length is defined as the number of nodes connected to each other reaching from the origin to a leaf, and then the maximum is taken)
average path length (the average of all of the possible path’s lengths)
recursive branches (branches in a path that link back to a node that has already been traveled to)

These features were then processed automatically for all 300 trees in the original dataset and had the following output form:

## # A tibble: 6 x 9
##   name                nodes branches leaves   nln choice_nodes max_path avg_path
##   <chr>               <dbl>    <dbl>  <dbl> <dbl>        <dbl>    <dbl>    <dbl>
## 1 ./DesignTurns/017-~    17       16      5    12            4        8      7.4
## 2 ./DesignTurns/056-~    19       31      1    18            5        6      6  
## 3 ./DesignTurns/018-~     3        2      2     1            1        2      2  
## 4 ./DesignTurns/160-~    15        9      9     6            1        4      4  
## 5 ./DesignTurns/122-~     7        4      2     5            0        0      0  
## 6 ./DesignTurns/157-~    23       23      5    18            5       16     10  
## # ... with 1 more variable: recursive_branches <dbl>

Github for the tree scraping algorithm can be found here: https://github.com/JackJosephWright/tree_project/tree/main/regex_stuff

These trees were then randomly sampled, and the pictures for the data collection phase were then collected.

After both the tree-file scraping and the complexity intuition rating stages of the project were completed, the rating events were then processed into an ELO score for every tree that was rated. The rationale for using an ELO score as a complexity metric is detailed in the literature review section. These scores were then joined to the features for analysis by a unique index That was tracked throughout this process.

The ELO scores post-processing took the following form:

## # A tibble: 6 x 2
##   keyName value
##   <chr>   <dbl>
## 1 165.PNG  1694
## 2 153.PNG  1592
## 3 35.PNG   1583
## 4 115.PNG  1566
## 5 123.PNG  1547
## 6 171.PNG  1393

Intuitively the scores seem to track complexity fairly well. Here are the visualizations for one of the highest rated, and one of the lowest rated narrative tree structures:

elo score: 1694

elo score: 365

Overview of the Data

As detailed in the subsection dataset size below, the size of this data was particularly small. The size of the data combined with the fact that the model needed to be comprehensible and used analytically for researchers in the IDN space led me towards some form of regression or interpretable tree structure. In the discussion section I will dig deeper into the possibility of launching a web tool that would allow researchers to upload Twine files and have the features and complexity calculated automatically. I would, however, like to have an analytical form that a researcher could plug into a column in Excel. This led me to focus on some form of linear regression.

The pair plot above shows that while a lot of the variables are positively correlated with the complexity metric ‘value,’ they are also highly correlated with each other. Reducing the dimensionality of the data will both remove highly correlated predictors and follow the rule of thumb that there should be at least 10 observations per predictor in a linear regression.

Looking at the pair plots of the data, it seems that all of our predictors have a right skew. In addition, there are two data points in almost every pair plot that are extreme outliers.

##   value index nodes branches leaves nln choice_nodes max_path avg_path
## 1  1694   165    43       61      3  40           19       32 24.03793
## 2  1393   171    50       56      6  44           11       17 13.27778
##   recursive_branches is.outlier is.extreme
## 1                639       TRUE      FALSE
## 2                  3       TRUE      FALSE

outlier data

Tree 165 and 171 appear as outliers in most of the predictor columns, yet due to the way the ELO is calculated they are not outliers in terms of ELO. These data points could apply undue leverage on a predictive model without being accounted for.

## 
## Call:
## lm(formula = value ~ log10(nodes), data = temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -428.74  -96.16   16.26  125.30  275.70 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     84.53      80.06   1.056    0.297    
## log10(nodes)   892.45      74.21  12.026 1.18e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 170 on 45 degrees of freedom
## Multiple R-squared:  0.7627, Adjusted R-squared:  0.7574 
## F-statistic: 144.6 on 1 and 45 DF,  p-value: 1.184e-15

According to a simple linear model, 75% of the variance in the complexity metric alone can be accounted for by a transformation of its most predictive variable.

Recursive Branches

Recursion is a well established element in complexity metrics across fields. Let’s look at the number of recursive branches vs the complexity metric.

The two outliers in recursive_branches are clearly calculation errors. The method in which recursion is calculated will be detailed in its own section.

Recursivity could be included as a dummy variable for whether or not a tree contains a recursive element.

This exploratory analysis indicates some difference between the distributions of the complexity scores, so this might be worth including in our analysis.

PCA

The following is the percent of variation explained by each PC.

Over 70% of the variance can be explained by the first principal component. Recall that from our simple regression with the log of the predictor “node” we were able to get an \(R^2\) of over .75. The remaining several principle components do explain a decent amount of the variation left in the data however.

Note that the predictors in PC1 have almost identical loadings. This could be due to the high co linearity among the predictors.

This plot can be intuited by the two largest trees (most nodes) being highly negative, and the most positive being the smallest along PC1. In PC2 we see highly negative values from trees with recursion and low leaf counts (compared to their size) and positive values in trees without recursion and high relative leaf counts. An interesting example here is an incoherent tree (175) where the author included a bunch of extra nodes unconnected to the tree itself, which the scraping algorithm will identify as leaves.

We will most likely not model off of the PCA, because the goal of this project is intelligibility and ease of use by researchers, both of which PCA hinders.

Data Collection Details

Collecting the Data

Collecting Tree Features

The first step in the data collection process was to programatically calculate the features of the narrative tree structures. This was done in Python by creating a Tree class object and an internal Node class object, then iterating through the narrative tree text files and scraping the relevant information. The code can be found here.

One of the more complex processes was to calculate a tree’s “path length.” This is a method inside of the Tree object called get_longest_path. Once the nodes are defined inside a Tree object, get_longest_path will iterate through the nodes and compile a list of lists recording the iterator’s previous node locations. If there is a ‘choice node,’ or node with more than one branch. The iterator will copy the list of previous destinations as a new element in the list of lists. Once the iterator lands in a node that has no branches out (leaf), it will record the length of its list in a 3rd list. If the iterator follows a branch that returns to a node it has previously visited, it will delete the path and increment recursive_branches for the Tree object. Because The path lists are deleted once the iterator reaches a leaf or a recursive branch, the process will complete when the main list is empty.

The result of this process is a list of all of the possible paths and their lengths. From this list the maximum path length and the average path length are calculated.

The Tree object is initialized with a narrative tree text file output from the sherlock application and has a method called features() that will output the relevant information about the tree in a list.

These lists can then be compiled into a data frame.

Selecting Narrative Trees for the Study

One of the outputs from the Tree object was a boolean called incoherent. Sherlock does not require the narratives to be functional, so the Tree object tested for things like branches that do not lead to nodes. After some sorting for coherence, 50 trees were randomly selected for analysis in the flask app. The reason why a sample of 50 trees was chosen will be discussed in the ELO section of Data Collection Details.

flask Application

The application that the experts used to compare the narrative tree structures was developed with a flask framework in Python. This application (linked above) is hosted on the heroku platform. The github for this application can be found here.

The basic operation of the application is to randomly select two non-identical trees from the static files and display them on screen. Once a selection is made, the server will write the relevant data to a postgres SQL server also hosted on heroku, and two new trees will be presented.

Possible Further Development

As will be detailed in the ELO section, the data set was severely limited by the fact that the number of ratings necessary to get a particular tree’s ELO to stabilize is fairly high. A potential fix for this is to actively calculate the ELO for a tree on the heroku server and store this information along with the ratings. After one random tree is selected, the current ELO for all trees could then be filtered so a partner with a similar current ELO could be compared. This would cause winners to be compared with winners and vice versa.

ELO and its Implications on the Study

ELO is calculated sequentially where every new ‘player’ (in our case narrative tree) competes against another player. Each player’s initial ELO (1000) is updated upward or downward by k depending if it wins the competition. The k value (set to the default 100) is penalized depending on the initial difference between the two competitors’ ELOs. For a high ELO winner against a low ELO loser, the k will be very small, but if the inverse occurs the k will be high.

This is a plot of the changing ELO over the course of the study.

unchanged image

This study has a hierarchy stability rating of .98 over 1500 paired rating events. The number of possible players creates \(\frac{n(n-1)}{2}\) possible combinations of trees.

The exponential growth of possible tree combinations made using more than 50 trees impossible due to the time constraints of the project.

Possible Further Development

The average number of ratings per picture when using a purely random selection method was 63.5 with a standard deviation of 22.9.

If heirarchy stability were created with 63 ratings per tree, a new method that incorporated ratings where competitors only came from local ELO neighbors and the random selection was prioritized for trees with low rating counts, a much higher total number of trees could be analyzed with far fewer comparisons.

Modeling

Taking into account our exploration of the data and modeling requirements as detailed above, the following model candidates were tested: * Simple Linear Regression * Penalized Regression * MARS Regression * Support Vector Machine Regression * Random Forest Regression * CART Decision Trees (Regression) * K-nearest Neighbors Regression

For this process, I used tidymodels modeling framework.

Data Processing

Because these different models prefer different types of preparations to the data, the models were tested with the following transformations.

normalized Data Candidate models were tuned and tested on out-of-bag observations for the following models:

SVM radial model
KNN model
Penalized Regression

Interacted Predictors Predictors were transformed with Orthogonal Polynomial Basis Functions and full interaction between predictors to try to account for the covariance:

Penalized Regression
KNN model

Unprocessed Data:

Candidate models were trained on the raw data with no transformation or scaling:

MARS Regression
Random Forest Model
Penalized Regression
Simple Linear Regression

Resampling Techniques

To account for the small data set, a .5 train/test split was used. Bootstrapping was selected for resampling during tuning to attempt to account for bias in the small training set. 1000 boostrap samples were created for tuning and training the models.

## # A workflow set/tibble: 7 x 4
##   wflow_id                  info             option    result    
##   <chr>                     <list>           <list>    <list>    
## 1 simple_MARS               <tibble [1 x 4]> <opts[0]> <list [0]>
## 2 simple_RF                 <tibble [1 x 4]> <opts[0]> <list [0]>
## 3 simple_linear_reg         <tibble [1 x 4]> <opts[0]> <list [0]>
## 4 normalized_SVM_radial     <tibble [1 x 4]> <opts[0]> <list [0]>
## 5 normalized_linear_reg     <tibble [1 x 4]> <opts[0]> <list [0]>
## 6 full_quad_linear_reg_poly <tibble [1 x 4]> <opts[0]> <list [0]>
## 7 full_quad_KNN_spec        <tibble [1 x 4]> <opts[0]> <list [0]>

Above is a table of models that will be tested and the different transformation strategies that will be employed.

Model Tuning

Since there are so many permutations of models to build, and the computing power for this project is fairly low, efficient grid search via racing with ANOVA models was used for tuning the models.

This process uses a repeated measure ANOVA model to quickly eliminate tuning parameter combinations that are unlikely to lead to the best results. This dramatically sped up the tuning process.

library(finetune)
conflicted::conflict_prefer("rescale", "psych")
race_ctrl<-
  control_race(
    save_pred = TRUE, 
    parallel_over = 'everything',
    save_workflow = TRUE
  )

    race_results<-
      all_workflows%>%
      workflow_map(
        'tune_race_anova',
        seed = 1503, 
        resamples = boots,
        grid = 10,
        control = race_ctrl,
        verbose = TRUE
      )
  

race_results

## # A workflow set/tibble: 7 x 4
##   wflow_id                  info             option    result   
##   <chr>                     <list>           <list>    <list>   
## 1 simple_MARS               <tibble [1 x 4]> <opts[3]> <race[+]>
## 2 simple_RF                 <tibble [1 x 4]> <opts[3]> <race[+]>
## 3 simple_linear_reg         <tibble [1 x 4]> <opts[3]> <race[+]>
## 4 normalized_SVM_radial     <tibble [1 x 4]> <opts[3]> <race[+]>
## 5 normalized_linear_reg     <tibble [1 x 4]> <opts[3]> <race[+]>
## 6 full_quad_linear_reg_poly <tibble [1 x 4]> <opts[3]> <race[+]>
## 7 full_quad_KNN_spec        <tibble [1 x 4]> <opts[3]> <race[+]>

Model Selection

25,000 models were tuned and trained, and the Random Forest Regression performed the best on its out-of-bag samples in the test set in both root mean squared error and adjusted r squared. Unfortunately the simple linear and penalized regression models performed poorly, despite a strong linear relationship between the predictors and the response variable. This is likely due to the small sample sizes in our data set, and the fact that our modeling process is correctly discounting these due to the high variance between bootstraps.

## # A tibble: 124 x 9
##    wflow_id    .config   preproc model .metric .estimator    mean     n  std_err
##    <chr>       <chr>     <chr>   <chr> <chr>   <chr>        <dbl> <int>    <dbl>
##  1 simple_MARS Preproce~ recipe  mars  rmse    standard   575.       13 216.    
##  2 simple_MARS Preproce~ recipe  mars  rsq     standard     0.390    13   0.0733
##  3 simple_MARS Preproce~ recipe  mars  rmse    standard   318.      101  15.4   
##  4 simple_MARS Preproce~ recipe  mars  rsq     standard     0.507   101   0.0235
##  5 simple_RF   Preproce~ recipe  rand~ rmse    standard   308.        3  43.1   
##  6 simple_RF   Preproce~ recipe  rand~ rsq     standard   NaN         0  NA     
##  7 simple_RF   Preproce~ recipe  rand~ rmse    standard   251.        4  25.8   
##  8 simple_RF   Preproce~ recipe  rand~ rsq     standard     0.506     4   0.0976
##  9 simple_RF   Preproce~ recipe  rand~ rmse    standard   212.      101   3.75  
## 10 simple_RF   Preproce~ recipe  rand~ rsq     standard     0.637   101   0.0172
## # ... with 114 more rows

## # A tibble: 1 x 3
##    mtry min_n .config              
##   <int> <int> <chr>                
## 1     2    13 Preprocessor1_Model09

## # A tibble: 2 x 4
##   .metric .estimator .estimate .config             
##   <chr>   <chr>          <dbl> <chr>               
## 1 rmse    standard     202.    Preprocessor1_Model1
## 2 rsq     standard       0.751 Preprocessor1_Model1

Rebuild Random Forest, testing for variable importance

As shown above, the variable nodes is the most important to the model. This makes intuitive sense because nodes is a correlary for the size of the tree.

Conclusions and Next Steps

The goal of this project was to determine and justify how to quantify the IDN Tree Structure, and create a stable metric to relate statistically to other elements of the study.

In the Literature Review section, I explained that complexity is the appropriate frame in order to quantify tree structures, and that this can be modeled through expert opinion and pairwise comparison of those opinions.

In the Data Collection and Analysis section, we see the clear linear relationships between the tree structure features and the complexity score.

In the Modeling section we see that even on a small data set, accurate models are able to be built and future complexity scores can be estimated without putting unlabeled IDN Trees through the labor intensive pairwise comparison strategy.

Applications in Research

The IDN Tree Complexity Metric will complete the set of measures to assess users’ learning of the IDN design process. This metric will allow researchers to statistically inter-relate the following relevant measures: * Twine elements scraped from IDN design text files * IDN design narrative content analysis (characters, events and settings) in the nodes * Player reflections of peers’ designs (such as cognitive & emotional reactions, evaluations of the design, and suggestions for improvement) * IDN Tree Complexity Metric

This ability to quantify the complexity will be valuable for ongoing research and practice in the interactive digital narrative field. If the research finds that the IDN Tree Complexity is correlated to elements like game player appeal or the author’s understanding of the Twine interface, the ability to quantify the IDN Tree Structure is important among other factors of IDN design learning is an important indicator.

Next Steps

As detailed in the above sections, the data collection phase can be improved. A major improvement will be to rerun the expert opinion polling with a larger data set, thereby allowing our models to account for a larger percent of the variance without introducing undue bias. A larger data set might also allow us to use more interpretable models such as simple linear regression which researchers could use to calculate the IDN Tree Complexity Metric analytically as opposed to running their IDN Tree feature data through the selected Random Forest Model.

Bibliography

Albers, Paul CH, and Han de Vries. 2001. “Elo-Rating as a Tool in the Sequential Estimation of Dominance Strengths.” Animal Behaviour. Academic Press.

Daiute, C., D. Cox, and J. T. Murray. 2021. “Proc. Imagining the Other for Interactive Digital Narrative Design Learning in Real Time in Sherlock.” In Proc. 14th Interactive Storytelling, edited by R. Rouse, H. Koenitz, and M. Haahr, 454–61. Springer, Cham.

Daiute, Colette, Robert O. Duncan, and Fedor Marchenko. 2018. “Meta-Communication Between Designers and Players of Interactive Digital Narratives.” In Interactive Storytelling, edited by Rebecca Rouse, Hartmut Koenitz, and Mads Haahr, 134–42. Cham: Springer International Publishing.

Funes, Pablo. 2001. “Evolution of Complexity in Real-World Domains.” PhD thesis, Waltham, Massachusetts, USA: Brandeis University.

Fürnkranz, Johannes, and Eyke Hüllermeier. 2011. “Preference Learning and Ranking by Pairwise Comparison.” In Preference Learning, edited by Johannes Fürnkranz and Eyke Hüllermeier, 65–82. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-14125-6_4.

Hurvich, Clifford M, and Chih-Ling Tsai. 1989. “Regression and Time Series Model Selection in Small Samples.” Biometrika 76 (2): 297–307.

Knoller, Noam, Christian Roth, and Dennis Haak. 2021. “A Complexity Analysis Matrix for Narrative Userly Texts.” https://doi.org/10.13140/RG.2.2.16551.68004.

Koenitz, H. 2015. “Towards a Specific Theory of Interactive Digital Narrative.” In Interactive Digital Narrative, edited by M.Haahr H. Koenitz G. Ferri, 91–105. Routledge.

Kreminski, M., and N. Wardrip-Fruin. 2018. “Proc. Sketching a Map of the Storylets Design Space.” In Proc. 11th Interactive Storytelling, edited by Haahr M. Rouse R. Koenitz H, 160–64. Springer, Cham.

MacEachren, A. 1982. “Map Complexity: Comparison and Measurement.” The American Cartographer 9 (1): 31–46. https://doi.org/10.1559/152304082783948286.

Nelson, P. n.d.

Perkis, A. n.d.

Polančič, G., and B Cegnar. 2017. “Complexity Metrics for Process Models - a Systematic Literature Review.” Computer Standards & Interfaces 51: 104–17. https://doi.org/https://doi.org/10.1016/j.csi.2016.12.003.

Rubinov, Mikail, and Olaf Sporns. 2010. “Complex Network Measures of Brain Connectivity: Uses and Interpretations.” NeuroImage 52 (3): 1059–69. https://doi.org/https://doi.org/10.1016/j.neuroimage.2009.10.003.

Słomska-Przech K., Izabela Małgorzata. 2021. “Do Different Map Types Support Map Reading Equally? Comparing Choropleth, Graduated Symbols, and Isoline Maps for Map Use Tasks.” ISPRS International Journal of Geo-Information 10 (2). https://doi.org/10.3390/ijgi10020069.

Yoon, Susan, Sao-Ee Goh, and Zhitong Yang. 2019. “Toward a Learning Progression of Complex Systems Understanding.” Complicity: An International Journal of Complexity and Education 16 (1). https://doi.org/10.29173/cmplct29340.

IDN Tree Complexity

Jack Wright

5/10/2022

Abstract

Methodology

Literature Review

Narrative Theory:

Complexity Theory

Measuring Complexity

Pairwise Comparison

Use of Complexity in Interactive Digital Narratives

Current state of tree structure complexity metrics in IDN Design process

Data Collection

Model Building

Data Collection and Analysis

Overview of the Data Collection Process

Overview of the Data

Recursive Branches

PCA

Data Collection Details

Collecting the Data

Collecting Tree Features

Selecting Narrative Trees for the Study

flask Application

Possible Further Development

ELO and its Implications on the Study

Possible Further Development

Modeling

Data Processing

Resampling Techniques

Model Tuning

Model Selection

Rebuild Random Forest, testing for variable importance

Conclusions and Next Steps

Applications in Research

Next Steps

Bibliography