The data for the Narrative Tree Complexity Metric was collected
through a web app built with the python web application
flask. It is hosted on the heroku cloud application
platform and can be found here. https://ctree-postgres.herokuapp.com/ .
A group of research participants were selected for their involvement in the interactive digital narrative field and were specifically requested to choose the tree that they believe was ‘more structurally complex.’ Because complexity is understood in a multitude of ways, a brief synopsis of what I mean by narrative tree structure complexity (as detailed in the literature review) was given. Other than that, no direction was given on how to determine which tree was more structurally complex. This is also detailed again in the “learn more about this app” page on the website.
When the participant selects which tree they intuit as more complex and submit, the winner and loser are recorded to a SQL database hosted on heroku, and presented with a new random pairing of trees.
After a sufficiently large sample of ratings, the data was pulled down from the server and was in the following form.
## # A tibble: 6 x 3
## id w l
## <dbl> <chr> <chr>
## 1 1 117.PNG 132.PNG
## 2 2 63.PNG 6.PNG
## 3 3 111.PNG 130.PNG
## 4 4 132.PNG 107.PNG
## 5 5 165.PNG 130.PNG
## 6 6 165.PNG 75.PNG
The ‘winner’ is the tree that was selected to be more complex, and the ‘loser’ is the tree that was not selected.
These visualizations of the narrative trees are generated in the sherlock web application for interactive narrative design which is a react.js app. The visualizations are created from their design files.
These files take the following form:
unchanged image
The structure of the file and delimiters determine the structure of the tree.
For example: * ‘#’ denotes a new node * ‘[[]]’ denotes a branch, and its destination
These features were processed in python to calculate the following features of an individual tree:
These features were then processed automatically for all 300 trees in the original dataset and had the following output form:
## # A tibble: 6 x 9
## name nodes branches leaves nln choice_nodes max_path avg_path
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ./DesignTurns/017-~ 17 16 5 12 4 8 7.4
## 2 ./DesignTurns/056-~ 19 31 1 18 5 6 6
## 3 ./DesignTurns/018-~ 3 2 2 1 1 2 2
## 4 ./DesignTurns/160-~ 15 9 9 6 1 4 4
## 5 ./DesignTurns/122-~ 7 4 2 5 0 0 0
## 6 ./DesignTurns/157-~ 23 23 5 18 5 16 10
## # ... with 1 more variable: recursive_branches <dbl>
These trees were then randomly sampled, and the pictures for the data collection phase were then collected.
After both the tree-file scraping and the complexity intuition rating stages of the project were completed, the rating events were then processed into an ELO score for every tree that was rated. The rationale for using an ELO score as a complexity metric is detailed in the literature review section. These scores were then joined to the features for analysis by a unique index That was tracked throughout this process.
The ELO scores post processing took the following form:
## # A tibble: 6 x 2
## keyName value
## <chr> <dbl>
## 1 165.PNG 1694
## 2 153.PNG 1592
## 3 35.PNG 1583
## 4 115.PNG 1566
## 5 123.PNG 1547
## 6 171.PNG 1393
Intuitively the scores seem to track complexity fairly well. Here are the visualizations for one of the highest rated, and one of the lowest rated narrative tree structures:
elo score: 1694
elo score: 365
As detailed in the subsection dataset size below, the
size of this data was particularly small. This combined with the fact
that the model needed to be comprehensible and used analytically for
researchers in the IDN space led me towards some form of regression. In
the discussion section I will dig deeper into the
possibility of launching a web tool that would allow researchers to
upload twine files and have the features and complexity calculated
automatically, but I would like to have an analytical form that a
researcher could plug into a column in excel. This led me to focus on
some form of linear regression.
The pair plot above shows that while a lot of the variables are positively correlated with the complexity metric ‘value’, they are also highly correlated with each other. Reducing the dimensionality of the data will both remove highly correlated predictors and follow the rule of thumb that there should be at least 10 observations per predictor in a linear regression.
Looking at the pairs plots of the data, it seems that all of our predictors have a right skew, and there are two datapoints in almost every pairs plots that are extreme outliers.
## value index nodes branches leaves nln choice_nodes max_path avg_path
## 1 1694 165 43 61 3 40 19 32 24.03793
## 2 1393 171 50 56 6 44 11 17 13.27778
## recursive_branches is.outlier is.extreme
## 1 639 TRUE FALSE
## 2 3 TRUE FALSE
Tree 165 and 171 appear as outliers in most of the predictor columns, yet due to the way the ELO is calculated they are not outliers in terms of ELO. These datapoints could apply undue leverage on a predictive model without being accounted for.
##
## Call:
## lm(formula = value ~ log10(nodes), data = temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -428.74 -96.16 16.26 125.30 275.70
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.53 80.06 1.056 0.297
## log10(nodes) 892.45 74.21 12.026 1.18e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 170 on 45 degrees of freedom
## Multiple R-squared: 0.7627, Adjusted R-squared: 0.7574
## F-statistic: 144.6 on 1 and 45 DF, p-value: 1.184e-15
According to a simple linear model, 75% of the variance in the complexity metric alone can be accounted for by a transformation of its most predictive variable.
Recursion is a well established element in complexity metrics across fields. Lets look at the number of recursive branches vs the complexity metric
The two outliers in recursive_branches are clearly
calculation errors. The method in which recursion is calculated will be
detailed in its own section.
Recursivity could be included as a dummy variable for whether or not a tree contains a recursive element
There seems to be some difference between the distributions of the complexity scores, so this might be worth including in our analysis.
## Recipe
##
## Inputs:
##
## role #variables
## id 1
## outcome 1
## predictor 8
##
## Training data contained 47 data points and no missing data.
##
## Operations:
##
## Centering and scaling for nodes, branches, leaves, nln, choice_nodes, max... [trained]
## PCA extraction with nodes, branches, leaves, nln, choice_nodes, max_... [trained]
percent variation explained by each PC
Over 70% of the variance can be explained by the first principal
component. Recall that from our simple regression with the log of the
predictor “node” we were able to get an R^2 of over 75%. The remaining
several principle components do explain a decent amount of the variation
left in the data however.
Note that the predictors in PC1 have almost identical loadings, This could be due to the high colinearity between the predictors.
This plot can be intuited by the two largest trees (most nodes) being highly negative, and the most positive being the smallest. PC2 we see highly negative values from trees with recursion and low leaf counts (compared to their size) and positive values without recursion and high relative leaf counts. An interesting example here is an incoherent tree (175) where the author included a bunch of extra nodes unconnected to the tree itself, which the scraping algorithm will identify as a leaf.
We will most likely not model off of the PCA, because the goal of this project is intelligibility and ease of use by researchers, both of which PCA hinders.
The first step in the data collection process was to programatically
calculate the features of the narrative tree structures. This was done
in python by creating a Tree class object and an internal
Node class object, then iterating through the narrative
tree text files and scraping the relevant information. The code can be
found here.
One of the more complex processes was to calculate a trees “path
length.” This is a method inside of the Tree object called
get_longest_path. Once the nodes are defined inside a
Tree object, get_longest_path will iterate
through the nodes and compile a list of lists recording the iterators
previous node locations. If there is a ‘choice node’, or node with more
than one branch. The iterator will copy the list of previous
destinations as a new element in the list of lists. Once the iterator
lands in a node that has no branches out (leaf) it will record the
length of its list in a 3rd list. If the iterator follows a branch that
returns to a node it has previously visited, it will delete the path and
increment recursive_branches for the Tree
object. because The path lists are deleted once the iterator reaches a
leaf or a recursive branch, the process will complete when the main list
is empty.
The result of this process is a list of all of the possible paths and their lengths. From this list the maximum path length and the average path length are calculated.
The Tree object is initialized with a narrative tree
text file output from the sherlock application and has a method
called features() that will output the relevant information
about the tree in a list.
These lists can then be compiled into a dataframe.
One of the outputs from the Tree object was a boolean
called incoherent. Sherlock does not require the narratives
to be functional, so the Tree object tested for things like
branches that do not lead to nodes. After some sorting for coherence, 50
trees were randomly selected for analysis in the flask app. The reason
why a sample of 50 trees were chosen will be discussed in the
ELO section of Data Collection Details.
The application that the experts used to compare the narrative tree
structurs was developed with a flask framework in python.
This application (linked above) is hosted on the heroku
platform. The github for this application can be found here.
The basic operation of the application is to randomly select two non-identical trees from the static files and display them on screen. Once a selection is made, the server will write the relevant data to a postgres SQL server also hosted on heroku, and two new trees will be presented.
As will be detailed in the ELO section, the dataset was severely limited by the fact that the number of ratings necessary to get a particular tree’s ELO to stabilize is fairly high. A potential fix for this is to actively calculate the ELO for a tree on the heroku server and store this information along with the ratings. After one random tree is selected, the current ELO for all trees could then be filtered so a partner with a similar current ELO could be compared. This would cause winners to be compared with winners and vice versa.
ELO is calculated sequentially where every new ‘player’ (in our case narrative tree) competes against another player. Each player’s initial ELO (1000) is updated upward or downward by k depending if it wins the competition. the k value (set to the default 100) is penalized depending on the initial difference between the two competitors ELO. A high ELO winner against a low ELO loaser the k will be very small, but if the inverse occurs the k will be high.
This is a plot of the changing ELO over the course of the study
unchanged image
This study has a hierarchy stability rating of .98 over 1500 paired rating events. The number of possible players creates \(\frac{n(n-1)}{2}\) possible combinations of trees.
The exponential growth of the possible combinations made using more than 50 trees in the study (in order to complete the project within the deadline) with the random pairing method of comparison.
The average number of ratings per picture when using a purely random selection method was 63.5 with a standard deviation of 22.9.
If heirarchy stability was created with 63 ratings per tree, a new method that incorporated ratings where competitors only came from local ELO neighbors and the random selection was prioritized for trees with low rating counts, a much higher total number of trees could be analyzed with far less comparisons.