A. Summary

DC Comics has created more than 6,500 characters since the introduction of the DC Universe (DCU). Much as they may enjoy their creations, the DCU writers reserve the right to eliminate them as the plot progresses. Consumers of DC entertainment products need to be mentally prepared for the departure of their favorite superheroes, which leads to the problem statement: is it possible to forecast the death of the DCU’s main characters in the next decade? Using supervised machine learning, we predict the survival status of the main DCU characters over the next 10 years from their profile attributes, e.g. frequency of appearances, tenure, identity secrecy, villainous behavior, etc. The data is obtained from the FiveThirtyEight GitHub repository, which originates from the DC Wikia web scrape on 8/24/2018: https://github.com/fivethirtyeight/data/tree/master/comic-characters.

Among Support Vector Machine (SVM), Logistic Regression (GLM), and K-Nearest Neighbors (KNN), GLM achieves the highest validation accuracy (72.6%), narrowly ahead of SVM (72.5%) and KNN (70.8%). Polynomial SVM, GLM, and KNN predict that much of the Justice League and the Bat family, including Batman, Superman, and Nightwing, will not survive; the Radial SVM agrees except that it spares Batman and Superman. Among Batman’s key villains, the Joker is predicted deceased by Radial SVM, GLM, and KNN, and Two-Face by KNN only. The Linear SVM model suggests that all characters will survive another decade, while KNN presents the deadliest reality, in which fewer than half of them (20 of 47) survive.

B. Analysis

1. Data Preparation

library(caret)
library(kernlab)
library(stats)
library(factoextra)

1.1 Data Cleaning

Import data, remove unused attributes, convert Year of first appearance to Tenure, and filter out missing values.

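df <- read.csv('dc-wikia-data.csv', header = TRUE)
df$YEAR <- 2018 - df$YEAR  # years since first appearance, measured from 2018
colnames(df)[colnames(df) == 'YEAR'] <- 'TENURE'  # rename by name rather than by position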
df[,c('GSM', 'FIRST.APPEARANCE')] <- list(NULL)
str(df)
## 'data.frame':    6895 obs. of  11 variables:
##  $ page_id    : int  1422 23387 1458 1659 1576 1448 1486 1451 71760 1380 ...
##  $ name       : chr  "Batman (Bruce Wayne)" "Superman (Clark Kent)" "Green Lantern (Hal Jordan)" "James Gordon (New Earth)" ...
##  $ urlslug    : chr  "\\/wiki\\/Batman_(Bruce_Wayne)" "\\/wiki\\/Superman_(Clark_Kent)" "\\/wiki\\/Green_Lantern_(Hal_Jordan)" "\\/wiki\\/James_Gordon_(New_Earth)" ...
##  $ ID         : chr  "Secret Identity" "Secret Identity" "Secret Identity" "Public Identity" ...
##  $ ALIGN      : chr  "Good Characters" "Good Characters" "Good Characters" "Good Characters" ...
##  $ EYE        : chr  "Blue Eyes" "Blue Eyes" "Brown Eyes" "Brown Eyes" ...
##  $ HAIR       : chr  "Black Hair" "Black Hair" "Brown Hair" "White Hair" ...
##  $ SEX        : chr  "Male Characters" "Male Characters" "Male Characters" "Male Characters" ...
##  $ APPEARANCES: int  3093 2496 1565 1316 1237 1231 1121 1095 1075 1028 ...
##  $ TENURE     : num  79 32 59 31 78 77 77 29 49 62 ...
##  $ ALIVE      : chr  "Living Characters" "Living Characters" "Living Characters" "Living Characters" ...
df <- df[complete.cases(df),]

Store the identification attributes in names and the prediction attributes in profiles. data.matrix converts every categorical column to integer codes; the response ALIVE is then converted back to a factor.

names <- as.data.frame(df[,1:3])
colnames(names)
## [1] "page_id" "name"    "urlslug"
df <- data.matrix(df)
profiles <- as.data.frame(df[,-seq(1,3)])
colnames(profiles)
## [1] "ID"          "ALIGN"       "EYE"         "HAIR"        "SEX"        
## [6] "APPEARANCES" "TENURE"      "ALIVE"
profiles$ALIVE <- as.factor(profiles$ALIVE)
# profiles[,c(seq(1:5),8)] <- lapply(profiles[,c(seq(1:5),8)], factor)
str(profiles)
## 'data.frame':    2097 obs. of  8 variables:
##  $ ID         : num  3 3 3 2 3 2 2 3 2 3 ...
##  $ ALIGN      : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ EYE        : num  3 3 4 4 3 3 3 3 3 3 ...
##  $ HAIR       : num  1 1 4 16 1 1 2 1 2 2 ...
##  $ SEX        : num  3 3 3 3 3 1 3 3 1 3 ...
##  $ APPEARANCES: num  3093 2496 1565 1316 1237 ...
##  $ TENURE     : num  79 32 59 31 78 77 77 29 49 62 ...
##  $ ALIVE      : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
head(profiles)
##   ID ALIGN EYE HAIR SEX APPEARANCES TENURE ALIVE
## 1  3     2   3    1   3        3093     79     2
## 2  3     2   3    1   3        2496     32     2
## 3  3     2   4    4   3        1565     59     2
## 4  2     2   4   16   3        1316     31     2
## 5  3     2   3    1   3        1237     78     2
## 6  2     2   3    1   1        1231     77     2
nrow(names) == nrow(profiles)
## [1] TRUE

“Living Characters” is encoded as “2” in profiles$ALIVE.
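
The encoding comes from data.matrix, which turns character columns into factors (levels sorted alphabetically) and then into integer codes. A minimal sketch on a made-up three-row data frame (the toy values are illustrative assumptions, not rows from the dataset) shows why “Deceased Characters” maps to 1 and “Living Characters” to 2:

```r
# Toy data frame; the values are illustrative assumptions.
toy <- data.frame(
  ALIVE = c("Living Characters", "Deceased Characters", "Living Characters"),
  APPEARANCES = c(3093L, 12L, 40L),
  stringsAsFactors = FALSE
)

# data.matrix() converts character columns to factors, then to integer codes.
codes <- data.matrix(toy)
alive_codes <- unname(codes[, "ALIVE"])
alive_codes  # 2 1 2: "Deceased Characters" sorts first, so it gets code 1
```

The same rule explains the codes in the other categorical columns, so the integers carry alphabetical order only, not any meaningful ranking.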

1.2 Test Data

The objective of this analysis is to predict the survival status of the most important characters in the DC Comics Universe. The importance of a character is determined by their frequency of appearance since their first introduction. The frequency threshold is set to 2 times the standard deviation of the appearance counts data.

threshold <- 2 * sd(profiles$APPEARANCES)
threshold
## [1] 282.4999

Assuming that the fate of the less important characters had been decided permanently by the time the data source was scraped, we can rely on their status to identify the patterns that distinguish living from deceased characters. Living characters whose appearance count meets the threshold, the “main” characters, are automatically selected for the test data, leaving the rest for the train and validation datasets.

Add 10 years to the main characters’ TENURE to reflect the timeframe of the predictions.

test_rows <- which((profiles$APPEARANCES >= threshold) & (profiles$ALIVE == '2'))
summary(profiles[test_rows,1:7]$TENURE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.00   32.00   44.00   48.55   75.50   80.00
profiles[test_rows,1:7]$TENURE <- profiles[test_rows,1:7]$TENURE + 10
summary(profiles[test_rows,1:7]$TENURE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.00   42.00   54.00   58.55   85.50   90.00
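
The selection rule can be tried on a small scale. The sketch below applies the same two steps (cutoff at twice the standard deviation of the appearance counts, then a 10-year shift in tenure for living characters at or above the cutoff) to a six-row toy data frame; the numbers and the name cutoff are invented for illustration.

```r
# Toy profiles; all values are illustrative assumptions.
toy <- data.frame(
  APPEARANCES = c(900, 15, 30, 8, 700, 22),
  TENURE      = c(79, 10, 25, 5, 40, 12),
  ALIVE       = factor(c("2", "2", "1", "2", "1", "2"))  # "2" = living
)

# Cutoff: twice the standard deviation of the appearance counts (about 817 here).
cutoff <- 2 * sd(toy$APPEARANCES)

# "Main" characters: at or above the cutoff and still alive.
main_rows <- which(toy$APPEARANCES >= cutoff & toy$ALIVE == "2")

# Project their tenure 10 years forward; all other rows are left unchanged.
toy$TENURE[main_rows] <- toy$TENURE[main_rows] + 10
toy
```

Only the first row qualifies: the fifth row has many appearances but falls below the cutoff and is already deceased.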

1.3 Principal Component Analysis

pca <- prcomp(profiles[,1:7], scale. = TRUE, center = TRUE)
summary(pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5    PC6     PC7
## Standard deviation     1.1983 1.1092 1.0684 0.9826 0.9235 0.8650 0.79088
## Proportion of Variance 0.2051 0.1758 0.1631 0.1379 0.1218 0.1069 0.08936
## Cumulative Proportion  0.2051 0.3809 0.5440 0.6819 0.8037 0.9106 1.00000
fviz_eig(pca)

Reduce the total dimensions from 7 to 2 by keeping the top 2 principal components, which together explain about 38% of the variance, for the train and test data.

profiles.pca <- data.frame(pca$x[,1:2])
profiles.pca <- cbind(profiles.pca, ALIVE = profiles$ALIVE)
str(profiles.pca)
## 'data.frame':    2097 obs. of  3 variables:
##  $ PC1  : num  -15.43 -11.27 -8.16 -5.75 -7.59 ...
##  $ PC2  : num  -1.967 -1.399 -0.705 -1.926 -0.369 ...
##  $ ALIVE: Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
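
The proportions reported by summary(pca) follow directly from the component standard deviations: each component’s share is its variance divided by the total variance. A short sketch that recomputes them from the printed values (the vector names are hypothetical; the numbers are copied from the output above):

```r
# Standard deviations copied from the summary(pca) output.
sdev <- c(1.1983, 1.1092, 1.0684, 0.9826, 0.9235, 0.8650, 0.79088)

# Share of variance per component, and the running total.
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

round(prop_var, 4)
round(cum_var[2], 2)  # about 0.38: the share retained by PC1 and PC2
```

With the fitted object, the same quantity would be pca$sdev^2 / sum(pca$sdev^2).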

1.4 Train and Validation Data

Get the test data.

test <- profiles.pca[test_rows,1:2]
dim(test)
## [1] 47  2
head(test)
##          PC1        PC2
## 1 -15.431043 -1.9669383
## 2 -11.268335 -1.3988434
## 3  -8.161795 -0.7047242
## 4  -5.753929 -1.9257187
## 5  -7.589292 -0.3685112
## 6  -6.691241 -2.3585444

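The remaining data is split 70/30 into train and validation sets; createDataPartition stratifies the split by ALIVE, so both sets keep a similar share of deceased characters.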

set.seed(400)
train_valid <- profiles.pca[-test_rows,]
sample <- createDataPartition(y = train_valid[,3], p = 0.7, list = FALSE)
train <- train_valid[sample,] 
dim(train)
## [1] 1436    3
summary(train$ALIVE)
##    1    2 
##  396 1040
valid <- train_valid[-sample,]
dim(valid)
## [1] 614   3
summary(valid$ALIVE)
##   1   2 
## 169 445

The train data contains 396 deceased characters out of 1,436 (about 28%), and the validation data contains 169 out of 614 (also about 28%).
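
Because the classes are imbalanced, one reference point worth keeping in mind is the accuracy of a trivial classifier that labels every character as alive. A short sketch using the class counts printed above (the variable names are hypothetical):

```r
# Class counts copied from summary(train$ALIVE) and summary(valid$ALIVE).
train_counts <- c(deceased = 396, alive = 1040)
valid_counts <- c(deceased = 169, alive = 445)

# Accuracy obtained by always predicting the majority class.
baseline_train <- max(train_counts) / sum(train_counts)
baseline_valid <- max(valid_counts) / sum(valid_counts)

round(c(train = baseline_train, valid = baseline_valid), 4)
```

Any model whose accuracy sits at roughly 0.724 on these splits is therefore doing no better than this always-alive baseline.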

2. Modeling

Since the objective is to classify each character into one of two survival statuses, deceased or alive, three classification methods, Support Vector Machine (SVM), Logistic Regression (GLM), and K-Nearest Neighbors (KNN), are used to build models from the train data. The models are assessed by their accuracy on the validation dataset to determine the best model, which is then applied to the test data to make the final predictions.

accuracies <- numeric(3)  # validation accuracy for SVM, GLM, and KNN
acc_svm <- data.frame(Train_acc = numeric(), Valid_acc = numeric())

2.1 Support Vector Machine

Develop SVM models on the training data using linear, polynomial, and radial kernels, each tuned with 5-fold cross-validation. Evaluate the accuracy of each model on the validation data.
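
For reference, the three kernels differ only in how they measure similarity between two characters’ principal component scores. The sketch below writes them out in base R following the parameterization documented for kernlab’s vanilladot, polydot, and rbfdot kernels. The function names are hypothetical, and offset = 1 for the polynomial kernel is an assumption rather than something shown in this analysis.

```r
# Linear kernel: plain inner product.
linear_kernel <- function(x, y) sum(x * y)

# Polynomial kernel: (scale * <x, y> + offset)^degree.
# offset = 1 is an assumed value for illustration.
poly_kernel <- function(x, y, degree = 3, scale = 0.1, offset = 1) {
  (scale * sum(x * y) + offset)^degree
}

# Radial basis function kernel: exp(-sigma * ||x - y||^2).
rbf_kernel <- function(x, y, sigma = 0.1) exp(-sigma * sum((x - y)^2))

# Two illustrative points in the (PC1, PC2) plane.
a <- c(1, 2)
b <- c(3, 4)

linear_kernel(a, b)  # 11
poly_kernel(a, b)    # (0.1 * 11 + 1)^3 = 9.261
rbf_kernel(a, b)     # exp(-0.8)
```

The degree, scale, and sigma values mirror entries in the tuning grids used below; C is not part of the kernel and controls the penalty for margin violations instead.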

2.1.1 Linear SVM

set.seed(450)

model_svm1 <- train(form = ALIVE ~ .,
              data = train,
              trControl = trainControl(method = "cv", number = 5),
              method = 'svmLinear',
              tuneGrid = expand.grid(C = c(1, 10, 100)),
              preProcess = c("center","scale"))
model_svm1
## Support Vector Machines with Linear Kernel 
## 
## 1436 samples
##    2 predictor
##    2 classes: '1', '2' 
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1149, 1149, 1149, 1149, 1148 
## Resampling results across tuning parameters:
## 
##   C    Accuracy   Kappa
##     1  0.7242354  0    
##    10  0.7242354  0    
##   100  0.7242354  0    
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was C = 1.
pred1_train <- predict(model_svm1, train[,1:2]) 
as.matrix(table(train$ALIVE, pred1_train))
##    pred1_train
##        1    2
##   1    0  396
##   2    0 1040
print(paste('Train accuracy:', mean(pred1_train == train$ALIVE)))
## [1] "Train accuracy: 0.724233983286908"
pred1_valid <- predict(model_svm1, valid[,1:2])
as.matrix(table(valid$ALIVE, pred1_valid))
##    pred1_valid
##       1   2
##   1   0 169
##   2   0 445
print(paste('Validation accuracy:', mean(pred1_valid == valid$ALIVE)))
## [1] "Validation accuracy: 0.724755700325733"
acc_svm <- rbind(acc_svm, c(mean(pred1_train == train$ALIVE), mean(pred1_valid == valid$ALIVE)))

2.1.2 Polynomial SVM

set.seed(450)

model_svm2 <- train(form = ALIVE ~ .,
              data = train,
              trControl = trainControl(method = "cv", number = 5),
              method = 'svmPoly',
              tuneGrid = expand.grid(C = c(1, 10, 100),
                                     degree = 3, scale = c(0.1, 1)),
              preProcess = c("center","scale"))
model_svm2
## Support Vector Machines with Polynomial Kernel 
## 
## 1436 samples
##    2 predictor
##    2 classes: '1', '2' 
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1149, 1149, 1149, 1149, 1148 
## Resampling results across tuning parameters:
## 
##   C    scale  Accuracy   Kappa
##     1  0.1    0.7242354  0    
##     1  1.0    0.7242354  0    
##    10  0.1    0.7242354  0    
##    10  1.0    0.7242354  0    
##   100  0.1    0.7242354  0    
##   100  1.0    0.7242354  0    
## 
## Tuning parameter 'degree' was held constant at a value of 3
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.1 and C = 1.
# Predict with the polynomial model (model_svm2); the output shown below came from a run that passed model_svm1 here.
pred2_train <- predict(model_svm2, train[,1:2])
as.matrix(table(train$ALIVE, pred2_train))
##    pred2_train
##        1    2
##   1    0  396
##   2    0 1040
print(paste('Train accuracy:', mean(pred2_train == train$ALIVE)))
## [1] "Train accuracy: 0.724233983286908"
pred2_valid <- predict(model_svm2, valid[,1:2])
as.matrix(table(valid$ALIVE, pred2_valid))
##    pred2_valid
##       1   2
##   1   0 169
##   2   0 445
print(paste('Validation accuracy:', mean(pred2_valid == valid$ALIVE)))
## [1] "Validation accuracy: 0.724755700325733"
acc_svm <- rbind(acc_svm, c(mean(pred2_train == train$ALIVE), mean(pred2_valid == valid$ALIVE)))

2.1.3 Radial SVM

set.seed(450)

model_svm3 <- train(form = ALIVE ~ .,
              data = train,
              trControl = trainControl(method = "cv", number = 5),
              method = 'svmRadial',
              tuneGrid = expand.grid(C = c(1, 10, 100),
                                     sigma = c(.01, .1)),
              preProcess = c("center","scale"))
model_svm3
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 1436 samples
##    2 predictor
##    2 classes: '1', '2' 
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1149, 1149, 1149, 1149, 1148 
## Resampling results across tuning parameters:
## 
##   C    sigma  Accuracy   Kappa     
##     1  0.01   0.7242354  0.00000000
##     1  0.10   0.7242354  0.00000000
##    10  0.01   0.7242354  0.00000000
##    10  0.10   0.7263260  0.01082394
##   100  0.01   0.7242354  0.00000000
##   100  0.10   0.7263236  0.01302925
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.1 and C = 10.
# Predict with the radial model (model_svm3); the output shown below came from a run that passed model_svm1 here.
pred3_train <- predict(model_svm3, train[,1:2])
as.matrix(table(train$ALIVE, pred3_train))
##    pred3_train
##        1    2
##   1    0  396
##   2    0 1040
print(paste('Train accuracy:', mean(pred3_train == train$ALIVE)))
## [1] "Train accuracy: 0.724233983286908"
pred3_valid <- predict(model_svm3, valid[,1:2])
as.matrix(table(valid$ALIVE, pred3_valid))
##    pred3_valid
##       1   2
##   1   0 169
##   2   0 445
print(paste('Validation accuracy:', mean(pred3_valid == valid$ALIVE)))
## [1] "Validation accuracy: 0.724755700325733"
acc_svm <- rbind(acc_svm, c(mean(pred3_train == train$ALIVE), mean(pred3_valid == valid$ALIVE)))

2.1.4 SVM Models Evaluation

colnames(acc_svm) <- c('Train.Acc', 'Validation.Acc')
rownames(acc_svm) <- c('Linear', 'Polynomial', 'Radial')
acc_svm
##            Train.Acc Validation.Acc
## Linear      0.724234      0.7247557
## Polynomial  0.724234      0.7247557
## Radial      0.724234      0.7247557

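The three rows are identical because the train and validation outputs reported above for the Polynomial and Radial kernels were produced with model_svm1, so the table reports the linear model three times. That model classifies every deceased character as “Alive”, so its accuracy equals the share of living characters (1040/1436 on the train data, 445/614 on the validation data). The cross-validation summaries tell a similar story for the fitted models themselves: Kappa is 0 for the Linear and Polynomial kernels and about 0.01 for the Radial kernel. The three models are applied separately to the test data in Section 3, where their predictions do differ.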

accuracies[1] <- mean(pred2_valid == valid$ALIVE)

2.2 Logistic Regression

Train a logistic regression model, i.e. generalized linear model (GLM) using 5-fold cross validation.
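
As a rough illustration of what a binomial GLM fits, the sketch below calls stats::glm directly on simulated data and classifies with a 0.5 probability cutoff. The data, seed, and coefficients are invented for the example. With a factor response, glm models the probability of the second level, here “2” (alive).

```r
set.seed(1)

# Simulated two-component data; all values are illustrative assumptions.
toy <- data.frame(PC1 = rnorm(200), PC2 = rnorm(200))
toy$ALIVE <- factor(ifelse(0.8 * toy$PC1 + rnorm(200) > -0.5, "2", "1"))

# Logistic regression: log-odds of being alive as a linear function of PC1 and PC2.
fit <- glm(ALIVE ~ PC1 + PC2, data = toy, family = binomial(link = "logit"))

# Predicted probability of the second factor level ("2" = alive).
p_alive <- predict(fit, type = "response")

# Classify as alive when the predicted probability exceeds 0.5.
pred <- factor(ifelse(p_alive > 0.5, "2", "1"), levels = levels(toy$ALIVE))

table(actual = toy$ALIVE, predicted = pred)
mean(pred == toy$ALIVE)  # in-sample accuracy
```

When most characters are alive and the predictors are weak, few predicted probabilities fall below 0.5, which is one way a logistic model can end up labeling nearly everyone as alive.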

set.seed(450)

model_glm <- train(form = ALIVE ~ .,
                data = train,
                family = binomial(link = 'logit'),
                trControl = trainControl(method = "cv", number = 5),
                method = "glm",
                preProcess = "scale")
model_glm
## Generalized Linear Model 
## 
## 1436 samples
##    2 predictor
##    2 classes: '1', '2' 
## 
## Pre-processing: scaled (2) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1149, 1149, 1149, 1149, 1148 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.7277173  0.0180639
pred_glm <- predict(model_glm, valid) # apply on validation data
as.matrix(table(valid$ALIVE, pred_glm))
##    pred_glm
##       1   2
##   1   1 168
##   2   0 445
acc_glm <- mean(pred_glm == valid$ALIVE)
print(paste('GLM Validation Accuracy:', acc_glm))
## [1] "GLM Validation Accuracy: 0.726384364820847"
accuracies[2] <- acc_glm

2.3 K-Nearest Neighbors

caret’s train function with method = "knn" tunes the number of neighbors k (here over the values 5, 7, and 9) using 5-fold cross-validation and returns the model with the highest cross-validated accuracy.
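
The idea behind the classifier can be shown in a few lines of base R. This is a simplified sketch, not the implementation caret uses: the helper name knn_vote and the toy points are hypothetical, and ties are broken by taking the first class rather than at random.

```r
# Hypothetical helper: classify one point by majority vote among its k nearest neighbors.
knn_vote <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from new_x to every training row.
  diffs <- sweep(train_x, 2, new_x)
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest training points.
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Two small clusters in the (PC1, PC2) plane; values are illustrative.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- factor(c("1", "1", "1", "2", "2", "2"))

knn_vote(train_x, train_y, new_x = c(0.5, 0.5), k = 3)  # "1"
knn_vote(train_x, train_y, new_x = c(5.5, 5.5), k = 3)  # "2"
```

Larger values of k smooth the decision boundary, which is why k is the parameter tuned by cross-validation.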

set.seed(450)
model_knn <- train(form = ALIVE ~ ., 
                 data = train,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 trControl = trainControl(method = "cv", number = 5))
model_knn
## k-Nearest Neighbors 
## 
## 1436 samples
##    2 predictor
##    2 classes: '1', '2' 
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 1149, 1149, 1149, 1149, 1148 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa     
##   5  0.6810492  0.04783585
##   7  0.6963632  0.06454090
##   9  0.7033246  0.05126340
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 9.
pred_knn <- predict(model_knn, valid) # apply on validation data
as.matrix(table(valid$ALIVE, pred_knn))
##    pred_knn
##       1   2
##   1  23 146
##   2  33 412
acc_knn <- mean(pred_knn == valid$ALIVE)

print(paste('KNN Validation Accuracy:', acc_knn))
## [1] "KNN Validation Accuracy: 0.708469055374593"
accuracies[3] <- acc_knn
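
Accuracy alone hides how the errors are distributed, which is what the Kappa column in the model summaries captures. The sketch below computes both from a 2x2 confusion matrix using the standard Cohen’s kappa formula; the helper name accuracy_kappa is hypothetical, and the matrices are the KNN validation table above and a table for a model that predicts “2” (alive) for every validation character.

```r
# Hypothetical helper: accuracy and Cohen's kappa from a confusion matrix
# (rows = actual class, columns = predicted class).
accuracy_kappa <- function(cm) {
  n <- sum(cm)
  observed <- sum(diag(cm)) / n                      # observed agreement
  expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  c(accuracy = observed, kappa = (observed - expected) / (1 - expected))
}

# KNN validation confusion matrix, copied from the output above.
cm_knn <- matrix(c(23, 146,
                   33, 412), nrow = 2, byrow = TRUE)

# A model that labels every validation character as alive.
cm_all_alive <- matrix(c(0, 169,
                         0, 445), nrow = 2, byrow = TRUE)

res_knn <- accuracy_kappa(cm_knn)
res_all <- accuracy_kappa(cm_all_alive)
round(res_knn, 4)
round(res_all, 4)  # kappa is 0 (up to rounding): no better than chance agreement
```

KNN has the lower accuracy of the two but a positive kappa, because it correctly identifies 23 of the 169 deceased characters while the always-alive model identifies none.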

3. Model Comparisons and Predictions

Select the best model for prediction based on accuracy on the validation data.

compare <- data.frame('Model' = c('SVM', 'GLM', 'KNN'),'Accuracy' = accuracies)
compare
##   Model  Accuracy
## 1   SVM 0.7247557
## 2   GLM 0.7263844
## 3   KNN 0.7084691
compare$Model[which(max(compare$Accuracy) == compare$Accuracy, arr.ind = TRUE)]
## [1] "GLM"

The SVM and GLM validation accuracies are nearly identical: GLM classifies one more validation character correctly (446 versus 445 of 614). It is still interesting to compare the test-data predictions from all of the models, although the GLM result is preferred.

predict_svm1 <- predict(model_svm1, test)
summary(predict_svm1)
##  1  2 
##  0 47
predict_svm2 <- predict(model_svm2, test)
summary(predict_svm2)
##  1  2 
## 13 34
predict_svm3 <- predict(model_svm3, test)
summary(predict_svm3)
##  1  2 
## 17 30
predict_glm <- predict(model_glm, test)
summary(predict_glm)
##  1  2 
## 17 30
predict_knn <- predict(model_knn, test)
summary(predict_knn)
##  1  2 
## 27 20

The 3 SVM models predict differently on the test data. The Radial SVM and GLM models predict the same number of deaths (17 of 47), although not for exactly the same characters.

predicts.SVM <- data.frame('Character' = names[test_rows,2],
                       'SVM.Lin.Pred' = predict_svm1,
                       'SVM.Poly.Pred' = predict_svm2,
                       'SVM.Rad.Pred' = predict_svm3)
predicts.SVM[] <- lapply(predicts.SVM, as.character)
predicts.SVM[predicts.SVM =="1"] <- 'Deceased'
predicts.SVM[predicts.SVM == "2"] <- 'Alive'
predicts.SVM
##                           Character SVM.Lin.Pred SVM.Poly.Pred SVM.Rad.Pred
## 1              Batman (Bruce Wayne)        Alive      Deceased        Alive
## 2             Superman (Clark Kent)        Alive      Deceased        Alive
## 3        Green Lantern (Hal Jordan)        Alive      Deceased     Deceased
## 4          James Gordon (New Earth)        Alive      Deceased     Deceased
## 5       Richard Grayson (New Earth)        Alive      Deceased     Deceased
## 6       Wonder Woman (Diana Prince)        Alive      Deceased     Deceased
## 7            Aquaman (Arthur Curry)        Alive      Deceased     Deceased
## 8         Timothy Drake (New Earth)        Alive      Deceased     Deceased
## 9    Dinah Laurel Lance (New Earth)        Alive         Alive     Deceased
## 10              Flash (Barry Allen)        Alive      Deceased     Deceased
## 11       Barbara Gordon (New Earth)        Alive         Alive     Deceased
## 12        Jason Garrick (New Earth)        Alive      Deceased     Deceased
## 13            Lois Lane (New Earth)        Alive      Deceased     Deceased
## 14    Alfred Pennyworth (New Earth)        Alive      Deceased     Deceased
## 15          Carter Hall (New Earth)        Alive      Deceased     Deceased
## 16          Kyle Rayner (New Earth)        Alive         Alive        Alive
## 17     Alexander Luthor (New Earth)        Alive         Alive        Alive
## 18           Roy Harper (New Earth)        Alive         Alive     Deceased
## 19           Kara Zor-L (Earth-Two)        Alive         Alive        Alive
## 20       Garfield Logan (New Earth)        Alive         Alive        Alive
## 21          Guy Gardner (New Earth)        Alive         Alive        Alive
## 22         Victor Stone (New Earth)        Alive         Alive        Alive
## 23               Kon-El (New Earth)        Alive         Alive        Alive
## 24          James Olsen (New Earth)        Alive         Alive        Alive
## 25         John Stewart (New Earth)        Alive         Alive        Alive
## 26                Joker (New Earth)        Alive         Alive     Deceased
## 27       Zatanna Zatara (New Earth)        Alive         Alive        Alive
## 28   Michael Jon Carter (New Earth)        Alive         Alive        Alive
## 29  Cassandra Sandsmark (New Earth)        Alive         Alive        Alive
## 30       Harvey Bullock (New Earth)        Alive         Alive        Alive
## 31          Rachel Roth (New Earth)        Alive         Alive        Alive
## 32    Helena Bertinelli (New Earth)        Alive         Alive        Alive
## 33       Nathaniel Adam (New Earth)        Alive         Alive        Alive
## 34     John Constantine (New Earth)        Alive         Alive        Alive
## 35                 Lobo (New Earth)        Alive         Alive        Alive
## 36         Slade Wilson (New Earth)        Alive         Alive        Alive
## 37     Beatriz da Costa (New Earth)        Alive         Alive        Alive
## 38       William Batson (New Earth)        Alive         Alive     Deceased
## 39         Carol Ferris (New Earth)        Alive         Alive        Alive
## 40 Jennifer-Lynn Hayden (New Earth)        Alive         Alive        Alive
## 41        Renee Montoya (New Earth)        Alive         Alive        Alive
## 42          Harvey Dent (New Earth)        Alive         Alive        Alive
## 43    Courtney Whitmore (New Earth)        Alive         Alive        Alive
## 44          Kara Zor-El (New Earth)        Alive         Alive        Alive
## 45            Rex Tyler (New Earth)        Alive         Alive     Deceased
## 46         Pamela Isley (New Earth)        Alive         Alive        Alive
## 47         Pieter Cross (New Earth)        Alive         Alive        Alive
predicts.GLM.KNN <- data.frame('Character' = names[test_rows,2],
                       'GLM.Pred' = predict_glm,
                       'KNN.Pred' = predict_knn)
predicts.GLM.KNN[] <- lapply(predicts.GLM.KNN, as.character)
predicts.GLM.KNN[predicts.GLM.KNN =="1"] <- 'Deceased'
predicts.GLM.KNN[predicts.GLM.KNN == "2"] <- 'Alive'
predicts.GLM.KNN
##                           Character GLM.Pred KNN.Pred
## 1              Batman (Bruce Wayne) Deceased Deceased
## 2             Superman (Clark Kent) Deceased Deceased
## 3        Green Lantern (Hal Jordan) Deceased Deceased
## 4          James Gordon (New Earth) Deceased Deceased
## 5       Richard Grayson (New Earth) Deceased Deceased
## 6       Wonder Woman (Diana Prince) Deceased Deceased
## 7            Aquaman (Arthur Curry) Deceased Deceased
## 8         Timothy Drake (New Earth) Deceased Deceased
## 9    Dinah Laurel Lance (New Earth)    Alive Deceased
## 10              Flash (Barry Allen) Deceased Deceased
## 11       Barbara Gordon (New Earth)    Alive Deceased
## 12        Jason Garrick (New Earth) Deceased Deceased
## 13            Lois Lane (New Earth) Deceased Deceased
## 14    Alfred Pennyworth (New Earth) Deceased Deceased
## 15          Carter Hall (New Earth) Deceased Deceased
## 16          Kyle Rayner (New Earth)    Alive Deceased
## 17     Alexander Luthor (New Earth)    Alive Deceased
## 18           Roy Harper (New Earth) Deceased Deceased
## 19           Kara Zor-L (Earth-Two)    Alive    Alive
## 20       Garfield Logan (New Earth)    Alive Deceased
## 21          Guy Gardner (New Earth)    Alive Deceased
## 22         Victor Stone (New Earth)    Alive    Alive
## 23               Kon-El (New Earth)    Alive Deceased
## 24          James Olsen (New Earth)    Alive    Alive
## 25         John Stewart (New Earth)    Alive Deceased
## 26                Joker (New Earth) Deceased Deceased
## 27       Zatanna Zatara (New Earth)    Alive    Alive
## 28   Michael Jon Carter (New Earth)    Alive    Alive
## 29  Cassandra Sandsmark (New Earth)    Alive    Alive
## 30       Harvey Bullock (New Earth)    Alive    Alive
## 31          Rachel Roth (New Earth)    Alive    Alive
## 32    Helena Bertinelli (New Earth)    Alive    Alive
## 33       Nathaniel Adam (New Earth)    Alive    Alive
## 34     John Constantine (New Earth)    Alive    Alive
## 35                 Lobo (New Earth)    Alive    Alive
## 36         Slade Wilson (New Earth)    Alive    Alive
## 37     Beatriz da Costa (New Earth)    Alive    Alive
## 38       William Batson (New Earth) Deceased Deceased
## 39         Carol Ferris (New Earth)    Alive    Alive
## 40 Jennifer-Lynn Hayden (New Earth)    Alive    Alive
## 41        Renee Montoya (New Earth)    Alive    Alive
## 42          Harvey Dent (New Earth)    Alive Deceased
## 43    Courtney Whitmore (New Earth)    Alive    Alive
## 44          Kara Zor-El (New Earth)    Alive    Alive
## 45            Rex Tyler (New Earth) Deceased Deceased
## 46         Pamela Isley (New Earth)    Alive    Alive
## 47         Pieter Cross (New Earth)    Alive Deceased

4. Interpretations

  • All characters will survive another decade according to the Linear SVM model.
  • The other 4 models, Polynomial SVM, Radial SVM, GLM, and KNN, predict a much gloomier future for the DCU. All four agree that Green Lantern (Hal Jordan), Nightwing (Richard Grayson), Wonder Woman, Aquaman, the Flash (Barry Allen), Red Robin (Timothy Drake), James Gordon, Lois Lane, and Batman’s right hand Alfred Pennyworth will be deceased, and all but the Radial SVM add Batman and Superman to that list. Among Batman’s infamous enemies, the Joker is predicted deceased by Radial SVM, GLM, and KNN, and Two-Face (Harvey Dent) by KNN only. Unfortunately for DC Comics fans, the GLM model has the highest validation accuracy, if only narrowly. This may be a forecast of another Apokolips storyline in which massive destruction takes place on Earth at the hands of the alien force led by Darkseid!
  • Although most of the Radial SVM predictions match other models’ predictions, this model predicts that the pillars of the DCU, Batman and Superman, will be alive!
  • KNN presents the lowest survival rate, with only 20 of the 47 characters surviving. Deaths predicted by KNN alone include Superman’s supporting character Lex Luthor, the Teen Titans/Young Justice members Beast Boy (Garfield Logan) and Superboy (Kon-El), the Green Lanterns Kyle Rayner, Guy Gardner, and John Stewart, and Two-Face (Harvey Dent).

-###-