In this document we will be predicting fraudulent transactions from the ‘Credit Card Fraud Detection’ dataset available from Kaggle. After a short EDA, I will use the keras package to build an Artificial Neural Network (ANN) and compare the results to a traditional logistic regression.

Credit Card Fraud

In this dataset we have a collection of data from European credit card users spanning two days in September 2013. The dataset contains information on 284,807 transactions, 492 of which have been marked as fraudulent. Since only 0.17% of transactions are fraudulent, this is a highly unbalanced dataset.

The components of this dataset are all numerical values which are the result of a principal component analysis (PCA) transformation. Due to confidentiality laws, the original features are not available.

library(tidyverse)   # data manipulation and plotting
library(moments)     # skewness()
library(keras)       # neural network modeling
library(yardstick)   # tidy model metrics
library(caret)       # confusionMatrix()

ccdata <- read_csv('datasets/creditcarddata.csv')
glimpse(ccdata)
## Observations: 284,807
## Variables: 31
## $ Time   <int> 0, 0, 1, 1, 2, 2, 4, 7, 7, 9, 10, 10, 10, 11, 12, 12, 1...
## $ V1     <dbl> -1.3598071, 1.1918571, -1.3583541, -0.9662717, -1.15823...
## $ V2     <dbl> -0.07278117, 0.26615071, -1.34016307, -0.18522601, 0.87...
## $ V3     <dbl> 2.53634674, 0.16648011, 1.77320934, 1.79299334, 1.54871...
## $ V4     <dbl> 1.37815522, 0.44815408, 0.37977959, -0.86329128, 0.4030...
## $ V5     <dbl> -0.33832077, 0.06001765, -0.50319813, -0.01030888, -0.4...
## $ V6     <dbl> 0.46238778, -0.08236081, 1.80049938, 1.24720317, 0.0959...
## $ V7     <dbl> 0.239598554, -0.078802983, 0.791460956, 0.237608940, 0....
## $ V8     <dbl> 0.098697901, 0.085101655, 0.247675787, 0.377435875, -0....
## $ V9     <dbl> 0.3637870, -0.2554251, -1.5146543, -1.3870241, 0.817739...
## $ V10    <dbl> 0.09079417, -0.16697441, 0.20764287, -0.05495192, 0.753...
## $ V11    <dbl> -0.55159953, 1.61272666, 0.62450146, -0.22648726, -0.82...
## $ V12    <dbl> -0.61780086, 1.06523531, 0.06608369, 0.17822823, 0.5381...
## $ V13    <dbl> -0.99138985, 0.48909502, 0.71729273, 0.50775687, 1.3458...
## $ V14    <dbl> -0.31116935, -0.14377230, -0.16594592, -0.28792375, -1....
## $ V15    <dbl> 1.468176972, 0.635558093, 2.345864949, -0.631418118, 0....
## $ V16    <dbl> -0.47040053, 0.46391704, -2.89008319, -1.05964725, -0.4...
## $ V17    <dbl> 0.207971242, -0.114804663, 1.109969379, -0.684092786, -...
## $ V18    <dbl> 0.02579058, -0.18336127, -0.12135931, 1.96577500, -0.03...
## $ V19    <dbl> 0.40399296, -0.14578304, -2.26185710, -1.23262197, 0.80...
## $ V20    <dbl> 0.25141210, -0.06908314, 0.52497973, -0.20803778, 0.408...
## $ V21    <dbl> -0.018306778, -0.225775248, 0.247998153, -0.108300452, ...
## $ V22    <dbl> 0.277837576, -0.638671953, 0.771679402, 0.005273597, 0....
## $ V23    <dbl> -0.110473910, 0.101288021, 0.909412262, -0.190320519, -...
## $ V24    <dbl> 0.06692807, -0.33984648, -0.68928096, -1.17557533, 0.14...
## $ V25    <dbl> 0.12853936, 0.16717040, -0.32764183, 0.64737603, -0.206...
## $ V26    <dbl> -0.18911484, 0.12589453, -0.13909657, -0.22192884, 0.50...
## $ V27    <dbl> 0.133558377, -0.008983099, -0.055352794, 0.062722849, 0...
## $ V28    <dbl> -0.021053053, 0.014724169, -0.059751841, 0.061457629, 0...
## $ Amount <dbl> 149.62, 2.69, 378.66, 123.50, 69.99, 3.67, 4.99, 40.80,...
## $ Class  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...

We see a total of 31 variables.

  • features V1 - V28 are a result of the PCA transformation and are simply numerical representations.

  • Time is the number of seconds that have passed since the first transaction in the dataset took place.

  • Amount is the value of the transaction in dollars.

  • Class indicates whether the transaction is tagged as fraudulent: 0 means the transaction is not fraudulent, while 1 means it is fraudulent. This will be our target variable.

Exploratory Data Analysis

summary(ccdata)
##       Time              V1                  V2           
##  Min.   :     0   Min.   :-56.40751   Min.   :-72.71573  
##  1st Qu.: 54201   1st Qu.: -0.92037   1st Qu.: -0.59855  
##  Median : 84692   Median :  0.01811   Median :  0.06549  
##  Mean   : 94814   Mean   :  0.00000   Mean   :  0.00000  
##  3rd Qu.:139321   3rd Qu.:  1.31564   3rd Qu.:  0.80372  
##  Max.   :172792   Max.   :  2.45493   Max.   : 22.05773  
##  NA's   :1                                               
##        V3                 V4                 V5            
##  Min.   :-48.3256   Min.   :-5.68317   Min.   :-113.74331  
##  1st Qu.: -0.8904   1st Qu.:-0.84864   1st Qu.:  -0.69160  
##  Median :  0.1799   Median :-0.01985   Median :  -0.05434  
##  Mean   :  0.0000   Mean   : 0.00000   Mean   :   0.00000  
##  3rd Qu.:  1.0272   3rd Qu.: 0.74334   3rd Qu.:   0.61193  
##  Max.   :  9.3826   Max.   :16.87534   Max.   :  34.80167  
##                                                            
##        V6                 V7                 V8           
##  Min.   :-26.1605   Min.   :-43.5572   Min.   :-73.21672  
##  1st Qu.: -0.7683   1st Qu.: -0.5541   1st Qu.: -0.20863  
##  Median : -0.2742   Median :  0.0401   Median :  0.02236  
##  Mean   :  0.0000   Mean   :  0.0000   Mean   :  0.00000  
##  3rd Qu.:  0.3986   3rd Qu.:  0.5704   3rd Qu.:  0.32735  
##  Max.   : 73.3016   Max.   :120.5895   Max.   : 20.00721  
##                                                           
##        V9                 V10                 V11          
##  Min.   :-13.43407   Min.   :-24.58826   Min.   :-4.79747  
##  1st Qu.: -0.64310   1st Qu.: -0.53543   1st Qu.:-0.76249  
##  Median : -0.05143   Median : -0.09292   Median :-0.03276  
##  Mean   :  0.00000   Mean   :  0.00000   Mean   : 0.00000  
##  3rd Qu.:  0.59714   3rd Qu.:  0.45392   3rd Qu.: 0.73959  
##  Max.   : 15.59500   Max.   : 23.74514   Max.   :12.01891  
##                                                            
##       V12                V13                V14          
##  Min.   :-18.6837   Min.   :-5.79188   Min.   :-19.2143  
##  1st Qu.: -0.4056   1st Qu.:-0.64854   1st Qu.: -0.4256  
##  Median :  0.1400   Median :-0.01357   Median :  0.0506  
##  Mean   :  0.0000   Mean   : 0.00000   Mean   :  0.0000  
##  3rd Qu.:  0.6182   3rd Qu.: 0.66251   3rd Qu.:  0.4931  
##  Max.   :  7.8484   Max.   : 7.12688   Max.   : 10.5268  
##                                                          
##       V15                V16                 V17           
##  Min.   :-4.49894   Min.   :-14.12985   Min.   :-25.16280  
##  1st Qu.:-0.58288   1st Qu.: -0.46804   1st Qu.: -0.48375  
##  Median : 0.04807   Median :  0.06641   Median : -0.06568  
##  Mean   : 0.00000   Mean   :  0.00000   Mean   :  0.00000  
##  3rd Qu.: 0.64882   3rd Qu.:  0.52330   3rd Qu.:  0.39968  
##  Max.   : 8.87774   Max.   : 17.31511   Max.   :  9.25353  
##                                                            
##       V18                 V19                 V20           
##  Min.   :-9.498746   Min.   :-7.213527   Min.   :-54.49772  
##  1st Qu.:-0.498850   1st Qu.:-0.456299   1st Qu.: -0.21172  
##  Median :-0.003636   Median : 0.003735   Median : -0.06248  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   :  0.00000  
##  3rd Qu.: 0.500807   3rd Qu.: 0.458949   3rd Qu.:  0.13304  
##  Max.   : 5.041069   Max.   : 5.591971   Max.   : 39.42090  
##                                                             
##       V21                 V22                  V23           
##  Min.   :-34.83038   Min.   :-10.933144   Min.   :-44.80774  
##  1st Qu.: -0.22839   1st Qu.: -0.542350   1st Qu.: -0.16185  
##  Median : -0.02945   Median :  0.006782   Median : -0.01119  
##  Mean   :  0.00000   Mean   :  0.000000   Mean   :  0.00000  
##  3rd Qu.:  0.18638   3rd Qu.:  0.528554   3rd Qu.:  0.14764  
##  Max.   : 27.20284   Max.   : 10.503090   Max.   : 22.52841  
##                                                              
##       V24                V25                 V26          
##  Min.   :-2.83663   Min.   :-10.29540   Min.   :-2.60455  
##  1st Qu.:-0.35459   1st Qu.: -0.31715   1st Qu.:-0.32698  
##  Median : 0.04098   Median :  0.01659   Median :-0.05214  
##  Mean   : 0.00000   Mean   :  0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.43953   3rd Qu.:  0.35072   3rd Qu.: 0.24095  
##  Max.   : 4.58455   Max.   :  7.51959   Max.   : 3.51735  
##                                                           
##       V27                  V28                Amount        
##  Min.   :-22.565679   Min.   :-15.43008   Min.   :    0.00  
##  1st Qu.: -0.070840   1st Qu.: -0.05296   1st Qu.:    5.60  
##  Median :  0.001342   Median :  0.01124   Median :   22.00  
##  Mean   :  0.000000   Mean   :  0.00000   Mean   :   88.35  
##  3rd Qu.:  0.091045   3rd Qu.:  0.07828   3rd Qu.:   77.17  
##  Max.   : 31.612198   Max.   : 33.84781   Max.   :25691.16  
##                                                             
##      Class         
##  Min.   :0.000000  
##  1st Qu.:0.000000  
##  Median :0.000000  
##  Mean   :0.001728  
##  3rd Qu.:0.000000  
##  Max.   :1.000000  
## 
length(ccdata$Class)
## [1] 284807
summary(as.factor(ccdata$Class))
##      0      1 
## 284315    492
492/284315
## [1] 0.001730475

Of the 284,807 transactions contained in this dataset, only 492 are tagged as fraudulent. This makes for a highly unbalanced dataset: fraudulent activity accounts for only about 0.17% of all transactions, which is very low. Datasets with class ratios of around 4:1 are commonly still considered workable; here the ratio is closer to 578:1.
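
As a rough sketch (not part of the original run), the imbalance can also be expressed as a ratio computed directly from the Class column:

# Ratio of legitimate to fraudulent transactions (284,315 / 492)
sum(ccdata$Class == 0) / sum(ccdata$Class == 1)   # about 578 legitimate transactions per fraud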

# Checking for missing data
sapply(ccdata,  function(x) sum(is.na(x)))
##   Time     V1     V2     V3     V4     V5     V6     V7     V8     V9 
##      1      0      0      0      0      0      0      0      0      0 
##    V10    V11    V12    V13    V14    V15    V16    V17    V18    V19 
##      0      0      0      0      0      0      0      0      0      0 
##    V20    V21    V22    V23    V24    V25    V26    V27    V28 Amount 
##      0      0      0      0      0      0      0      0      0      0 
##  Class 
##      0

We see one missing value in the Time column, so we take a look at that transaction.

ccdata[which(is.na(ccdata$Time)), ]
## # A tibble: 1 x 31
##    Time    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10   V11
##   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    NA -1.41 -1.62  2.55 0.386 0.503 0.507 -1.74 0.761  3.00 -1.16 0.394
## # ... with 19 more variables: V12 <dbl>, V13 <dbl>, V14 <dbl>, V15 <dbl>,
## #   V16 <dbl>, V17 <dbl>, V18 <dbl>, V19 <dbl>, V20 <dbl>, V21 <dbl>, V22
## #   <dbl>, V23 <dbl>, V24 <dbl>, V25 <dbl>, V26 <dbl>, V27 <dbl>, V28
## #   <dbl>, Amount <dbl>, Class <int>

Here we see that the transaction missing a Time value has not been marked as fraudulent and is for an amount of $49.50. All values in V1 - V28 are within the normal ranges found in the variable summaries above, so we will keep this transaction in our dataset.

Visual Analysis

A quick visual analysis of our data prior to doing any prediction work.

# Visualize Amount feature
ggplot(ccdata, aes(x = Amount)) +
  geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Calculate skewness using moments package
skewness(ccdata$Amount)
## [1] 16.97764

The Amount variable is heavily right-skewed, both visually and as indicated by a skewness of about 17 (normally distributed data has a skewness of 0). As such we will normalize this feature before modeling.
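
As an alternative view (a sketch not used in the original analysis), plotting Amount on a log10 axis spreads out the long right tail; the +1 shift avoids taking the log of zero-value transactions:

# Histogram of Amount on a log10 scale; +1 handles $0 transactions
ggplot(ccdata, aes(x = Amount + 1)) +
  geom_histogram(bins = 50) +
  scale_x_log10()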

# Summary statistics of Amount feature by class
ccdata %>%
  group_by(Class) %>%
  summarise(min = min(Amount),
            mean = mean(Amount),
            max = max(Amount),
            sd = sd(Amount))
## # A tibble: 2 x 5
##   Class   min  mean   max    sd
##   <int> <dbl> <dbl> <dbl> <dbl>
## 1     0     0  88.3 25691   250
## 2     1     0 122    2126   257
ggplot(ccdata, aes(x = 1, y = Amount, fill = as.factor(Class), group = Class)) +
  geom_boxplot() 

This visual representation shows just how skewed the data is. Especially for the non-fraudulent cases, the data is so skewed that a boxplot is nearly useless.

Before normalizing, let's take a quick look at just the fraudulent cases.

# Plot only fraudulent data
ccdata %>%
  filter(Class == 1) %>%
  ggplot(aes(x = 1, y = Amount)) +
  geom_boxplot() +
  geom_violin(fill = 'lightblue', color = 'lightblue', alpha = .7) +
  scale_y_sqrt()

By using a square root scale we can better visualize where the data lies. The overlaid violin plot shows us that there is a secondary peak near the third quartile.

Data Cleansing

The Time feature denotes the number of seconds that have passed between each transaction and the first transaction in the dataset. Since we do not have a start time for the first transaction, the Time column contains very little useful information. As such, we will remove the ‘Time’ feature from our data.

# Removing the TIME feature
ccdata <- ccdata[, -1]
colnames(ccdata)
##  [1] "V1"     "V2"     "V3"     "V4"     "V5"     "V6"     "V7"    
##  [8] "V8"     "V9"     "V10"    "V11"    "V12"    "V13"    "V14"   
## [15] "V15"    "V16"    "V17"    "V18"    "V19"    "V20"    "V21"   
## [22] "V22"    "V23"    "V24"    "V25"    "V26"    "V27"    "V28"   
## [29] "Amount" "Class"
# Convert Class to a factor
ccdata$Class <- as.factor(ccdata$Class)

Preprocessing

# Create new dataframe on which we will create our models
ccproc <- ccdata

# Normalize the Amount feature
ccproc$Amount <- scale(ccproc$Amount, center = TRUE, scale = TRUE)
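
As a quick sanity check (a sketch, not part of the original run), the scaled feature should now have mean 0 and standard deviation 1:

# Verify that the normalized Amount is centered and scaled
amt <- as.numeric(ccproc$Amount)
round(c(mean = mean(amt), sd = sd(amt)), 3)   # expect roughly 0 and 1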

We will use an 80/20 train/test split.

#Splitting the data
set.seed(42)
train <- sample(1:nrow(ccproc), nrow(ccproc)* .8) 
model_data <- ccproc[train, ]
test_data <- ccproc[-train, ]

We do some preprocessing to get the data in a form compliant with keras.

# Predictor variable for the training and testing set
X_train <- model_data[, -30]
X_test <- test_data[, -30]


# Target Variable for training and testing set
Y_train <- model_data[, 30]
Y_test <- test_data[, 30]
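
One caveat worth noting (an assumption on my part, since the original run completed as shown): Class was converted to a factor above, but keras ultimately needs numeric input, so explicitly converting the target back to 0/1 is a safe way to avoid surprises when as.matrix() is later applied to a factor column. A minimal sketch with hypothetical helper objects:

# Hypothetical: explicit numeric 0/1 targets for keras
y_train_num <- as.numeric(as.character(model_data$Class))
y_test_num  <- as.numeric(as.character(test_data$Class))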

Model Building

To predict values in the test set, I will build two models:

  1. Logistic Regression using the glm() function

  2. Artificial Neural Network using keras

We will use the logistic regression model as a performance baseline to which we can compare the performance of the keras ANN.

Logistic Regression

The first model will be a logistic regression model. Logistic regression models are popular because they are easy to build and their output is easy to interpret.
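
To make the "easy to interpret" point concrete, here is a small illustration (not part of the original analysis) of what the logit link does: the model's linear predictor is passed through the sigmoid function to produce a probability between 0 and 1.

# The inverse logit (sigmoid) maps a linear predictor to a probability
sigmoid <- function(eta) 1 / (1 + exp(-eta))
sigmoid(c(-8.64, 0, 8.64))   # roughly 0.0002, 0.5 and 0.9998; -8.64 is close to the fitted intercept below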

# Fitting the model

glm_model <- glm(Class ~ . , family = binomial(link='logit'), data = model_data)
summary(glm_model)
## 
## Call:
## glm(formula = Class ~ ., family = binomial(link = "logit"), data = model_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.8107  -0.0297  -0.0195  -0.0125   4.5966  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -8.6409790  0.1629605 -53.025  < 2e-16 ***
## V1           0.0816868  0.0459098   1.779 0.075193 .  
## V2           0.0216503  0.0647928   0.334 0.738268    
## V3           0.0265704  0.0503064   0.528 0.597381    
## V4           0.6917694  0.0825821   8.377  < 2e-16 ***
## V5           0.1069355  0.0734670   1.456 0.145515    
## V6          -0.1105463  0.0843440  -1.311 0.189973    
## V7          -0.1064610  0.0734993  -1.448 0.147488    
## V8          -0.1638044  0.0334983  -4.890 1.01e-06 ***
## V9          -0.2576267  0.1237902  -2.081 0.037420 *  
## V10         -0.8069236  0.1055221  -7.647 2.06e-14 ***
## V11         -0.0814785  0.0845132  -0.964 0.335000    
## V12          0.1196401  0.1005030   1.190 0.233884    
## V13         -0.3360515  0.0926917  -3.625 0.000288 ***
## V14         -0.5853790  0.0705124  -8.302  < 2e-16 ***
## V15         -0.0361303  0.0945174  -0.382 0.702268    
## V16         -0.2058326  0.1411167  -1.459 0.144676    
## V17         -0.0249571  0.0772978  -0.323 0.746794    
## V18         -0.0005058  0.1432247  -0.004 0.997182    
## V19          0.1618482  0.1073241   1.508 0.131546    
## V20         -0.4133680  0.0900533  -4.590 4.43e-06 ***
## V21          0.3624651  0.0650006   5.576 2.46e-08 ***
## V22          0.5979897  0.1435925   4.164 3.12e-05 ***
## V23         -0.1326601  0.0640901  -2.070 0.038462 *  
## V24          0.0953750  0.1642361   0.581 0.561430    
## V25          0.0532637  0.1424103   0.374 0.708393    
## V26         -0.0262107  0.2117731  -0.124 0.901499    
## V27         -0.7904580  0.1337405  -5.910 3.41e-09 ***
## V28         -0.2325387  0.0960162  -2.422 0.015441 *  
## Amount       0.2369191  0.1046229   2.265 0.023543 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5875.3  on 227844  degrees of freedom
## Residual deviance: 1819.4  on 227815  degrees of freedom
## AIC: 1879.4
## 
## Number of Fisher Scoring iterations: 12
# Use the anova() function to see feature statistics

anova(glm_model, test='Chisq')
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Class
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                  227844     5875.3              
## V1      1   618.93    227843     5256.3 < 2.2e-16 ***
## V2      1   581.86    227842     4674.5 < 2.2e-16 ***
## V3      1   831.65    227841     3842.8 < 2.2e-16 ***
## V4      1   870.46    227840     2972.4 < 2.2e-16 ***
## V5      1    39.19    227839     2933.2 3.838e-10 ***
## V6      1    40.27    227838     2892.9 2.217e-10 ***
## V7      1    41.30    227837     2851.6 1.304e-10 ***
## V8      1    79.53    227836     2772.1 < 2.2e-16 ***
## V9      1    48.93    227835     2723.1 2.647e-12 ***
## V10     1   539.06    227834     2184.1 < 2.2e-16 ***
## V11     1    56.96    227833     2127.1 4.443e-14 ***
## V12     1    39.59    227832     2087.5 3.138e-10 ***
## V13     1    27.43    227831     2060.1 1.627e-07 ***
## V14     1   134.81    227830     1925.3 < 2.2e-16 ***
## V15     1     0.03    227829     1925.3  0.861331    
## V16     1    42.38    227828     1882.9 7.505e-11 ***
## V17     1     2.20    227827     1880.7  0.138375    
## V18     1     0.89    227826     1879.8  0.345281    
## V19     1     0.47    227825     1879.3  0.494363    
## V20     1     1.39    227824     1877.9  0.237828    
## V21     1     8.87    227823     1869.1  0.002904 ** 
## V22     1    14.63    227822     1854.4  0.000131 ***
## V23     1     8.53    227821     1845.9  0.003487 ** 
## V24     1     0.13    227820     1845.8  0.716258    
## V25     1     0.10    227819     1845.7  0.752440    
## V26     1     0.07    227818     1845.6  0.788516    
## V27     1    15.66    227817     1829.9 7.586e-05 ***
## V28     1     3.51    227816     1826.4  0.061096 .  
## Amount  1     7.07    227815     1819.4  0.007860 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we use the logistic model to predict on the testing data.

fitted.results <- predict(glm_model, test_data[, 1:29], type = 'response')
fitted.results <- ifelse(fitted.results > .5, 1, 0)

# caret's confusionMatrix() expects factors, so coerce the 0/1 predictions
conf_matrix_glm <- confusionMatrix(reference = unlist(Y_test), data = as.factor(fitted.results))
conf_matrix_glm
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction     0     1
##          0 56862    33
##          1     8    59
##                                          
##                Accuracy : 0.9993         
##                  95% CI : (0.999, 0.9995)
##     No Information Rate : 0.9984         
##     P-Value [Acc > NIR] : 1.886e-09      
##                                          
##                   Kappa : 0.7418         
##  Mcnemar's Test P-Value : 0.0001781      
##                                          
##             Sensitivity : 0.9999         
##             Specificity : 0.6413         
##          Pos Pred Value : 0.9994         
##          Neg Pred Value : 0.8806         
##              Prevalence : 0.9984         
##          Detection Rate : 0.9982         
##    Detection Prevalence : 0.9988         
##       Balanced Accuracy : 0.8206         
##                                          
##        'Positive' Class : 0              
## 

Of the 92 fraudulent transactions in our test set, the GLM model correctly predicted 59 occurrences, had 8 false positives, and produced 33 false negatives.

This gives an accuracy of 0.9993, a sensitivity of 0.9999, and a specificity of 0.6413.
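
To spell out where these numbers come from (a sketch recomputing them from the confusion matrix above; note that caret has taken class 0, the non-fraudulent class, as the 'positive' class, so specificity is effectively the fraud detection rate):

# Recompute accuracy, sensitivity and specificity from the GLM confusion matrix
tab <- conf_matrix_glm$table              # rows = predictions, columns = reference
sum(diag(tab)) / sum(tab)                 # accuracy:    (56862 + 59) / 56962
tab["0", "0"] / sum(tab[, "0"])           # sensitivity: 56862 / (56862 + 8)
tab["1", "1"] / sum(tab[, "1"])           # specificity: 59 / (59 + 33)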

Artificial Neural Network

Next we want to use keras to create an artificial neural network. Our network will have two hidden layers (16 and 8 units) with 10% dropout after each.

First we initialize the sequential model and define its layers.

# Building the ANN model
model_keras <- keras_model_sequential()
model2 <- model_keras %>%
  layer_dense(units = 16, kernel_initializer = 'uniform', activation = 'relu', input_shape = ncol(X_train)) %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 8, kernel_initializer = 'uniform' , activation = 'relu') %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid')


# Compiling

compile2 <- model_keras %>%
  compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = c('accuracy'))

compile2
## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 16)                    480         
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 16)                    0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 8)                     136         
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 8)                     0           
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 1)                     9           
## ===========================================================================
## Total params: 625
## Trainable params: 625
## Non-trainable params: 0
## ___________________________________________________________________________
summary(model2)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 16)                    480         
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 16)                    0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 8)                     136         
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 8)                     0           
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 1)                     9           
## ===========================================================================
## Total params: 625
## Trainable params: 625
## Non-trainable params: 0
## ___________________________________________________________________________

Now we fit the model and plot the training history.

# Fitting the model

kemodel <- fit(object = model2,
               x = as.matrix(X_train),
               y = as.matrix(Y_train),
               batch_size = 100,
               epochs = 30,
               validation_split = 0.30)

# Plotting the output

plot(kemodel) +
  labs( title = 'Deep Learning Training Result')

After just a few epochs, the accuracy levels off near its maximum and the loss near its minimum.
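
Since the validation metrics level off so quickly, one option (not used here, just a sketch) would be an early-stopping callback so that training halts once validation loss stops improving:

# Sketch only: stop training after 3 epochs with no improvement in val_loss
early_stop <- callback_early_stopping(monitor = 'val_loss', patience = 3)
# then pass callbacks = list(early_stop) to the fit() call above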

Now we use the ANN to predict on the test data, producing both predicted classes and predicted class probabilities.

# Predicting for the test data

pred_class <- predict_classes(object = model2,
                                x = as.matrix(X_test)) %>%
                                as.vector()

# Predicted class probability
pred_prob <- predict_proba(object = model2,
                           x = as.matrix(X_test)) %>%
  as.vector()

Some post-processing to gather the results into one tibble.

# Format test data and prediction vector
predict_value <- tibble(
  truth = as.factor(unlist(Y_test)) %>% fct_recode(Yes = '1', No = '0') ,
  estimate = as.factor(pred_class) %>% fct_recode(Yes = '1', No = '0'),
  pred_prob = pred_prob
)

print(predict_value)
## # A tibble: 56,962 x 3
##    truth estimate            pred_prob
##    <fct> <fct>                   <dbl>
##  1 No    No       0.000252            
##  2 No    No       0.000000000134      
##  3 No    No       0.000000124         
##  4 No    No       0.0000000683        
##  5 No    No       0.0000000556        
##  6 No    No       0.000000000000000517
##  7 No    No       0.00000000954       
##  8 No    No       0.0000000000000114  
##  9 No    No       0.00000165          
## 10 No    No       0.000000331         
## # ... with 56,952 more rows

Model Comparison

Now, to compare against the logistic regression model from earlier, let's create a confusion matrix.

# Confusion Matrix

conf_matrix_ann <- confusionMatrix(reference = predict_value$truth, data = predict_value$estimate)
conf_matrix_ann
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    No   Yes
##        No  56857    21
##        Yes    13    71
##                                           
##                Accuracy : 0.9994          
##                  95% CI : (0.9992, 0.9996)
##     No Information Rate : 0.9984          
##     P-Value [Acc > NIR] : 3.363e-12       
##                                           
##                   Kappa : 0.8065          
##  Mcnemar's Test P-Value : 0.2299          
##                                           
##             Sensitivity : 0.9998          
##             Specificity : 0.7717          
##          Pos Pred Value : 0.9996          
##          Neg Pred Value : 0.8452          
##              Prevalence : 0.9984          
##          Detection Rate : 0.9982          
##    Detection Prevalence : 0.9985          
##       Balanced Accuracy : 0.8858          
##                                           
##        'Positive' Class : No              
## 

By this confusion matrix, the ANN correctly predicted 71 fraud occurrences, with 13 false positives and 21 false negatives. This gives a sensitivity of 0.9998 and a specificity of 0.7717.

Let's compare the two models back to back.

ANN

# ANN
conf_matrix_ann$table
##           Reference
## Prediction    No   Yes
##        No  56857    21
##        Yes    13    71
conf_matrix_ann$byClass[1]
## Sensitivity 
##   0.9997714
conf_matrix_ann$byClass[2]
## Specificity 
##   0.7717391
predict_value %>% metrics(truth, estimate)
## # A tibble: 1 x 1
##   accuracy
##      <dbl>
## 1    0.999
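
Since the yardstick package is already loaded, the by-class metrics can also be recovered there (a sketch, assuming yardstick's default of treating the first factor level, 'No', as the event, which matches caret's choice of positive class):

# yardstick equivalents of caret's Sensitivity and Specificity for the ANN
predict_value %>% sens(truth, estimate)   # should be about 0.9998
predict_value %>% spec(truth, estimate)   # should be about 0.7717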

GLM

# GLM
conf_matrix_glm$table
##           Reference
## Prediction     0     1
##          0 56862    33
##          1     8    59
conf_matrix_glm$byClass[1]
## Sensitivity 
##   0.9998593
conf_matrix_glm$byClass[2]
## Specificity 
##   0.6413043
conf_matrix_glm$overall[1]
##  Accuracy 
## 0.9992802

Due to the high number of negative (non-fraudulent) cases in the testing dataset, both models have a very high sensitivity. It should be noted, however, that the GLM model caught noticeably fewer of the fraudulent transactions (59 of 92, versus 71 of 92 for the ANN).

The artificial neural network produced a significantly better specificity (0.7717 versus 0.6413). This means the ANN was much better at picking up on fraudulent transactions, though it is also somewhat more likely to produce a false positive.

These two findings are reflected in the accuracy of both models. Once again, due to the high number of negatives in the testing set, both values are very close to 1, though the accuracy of the ANN is slightly higher (about 1.23 × 10^-4 greater).
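
For completeness, the quoted gap can be recomputed directly from the two confusion matrices (a sketch, not part of the original run):

# Difference in overall accuracy between the ANN and the GLM
acc_ann <- sum(diag(conf_matrix_ann$table)) / sum(conf_matrix_ann$table)
acc_glm <- sum(diag(conf_matrix_glm$table)) / sum(conf_matrix_glm$table)
acc_ann - acc_glm   # roughly 1.23e-04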

Conclusion

This is a difficult dataset to predict on because of the extreme imbalance between positive and negative transactions. However, of the two models created here, the better model is the artificial neural network: even though it produces a slightly higher number of false positives on non-fraudulent transactions, it is much better at picking up on true fraudulent activity.