Credit Card Fraud Detection Using Neural Nets
Abstract
This research paper explores the functionality of neural nets by applying them to credit card fraud detection. Using data of continuous values that describe the attributes of credit card transactions, we train two different neural net models to predict which transactions are fraudulent.
Challenges
In this project there were a few challenges:
The data is massive: about 149 MB, with 284,807 observations of 31 variables
The data is imbalanced: fraudulent records make up only 0.172% of all transactions.
The neuralnet library is easier to use than the keras package. Although there are a lot of helpful keras tutorials out there, just understanding its use can be daunting at first.
Data Wrangling
Import the Data
Data Sources
The first data source was initially sampled to a smaller size and hosted on GitHub as shown below.
However, because of the challenges of the data, especially the extreme imbalance, the results on that sample were insignificant for neural nets and for every other model I tried.
Therefore, another data source was considered in order to make the entire data set available to optimize the results.
The method to get that to work was as follows:
From the json_data we derive a list of resources from our hosting server
We iterate using a for loop to find the type of resource named “derived/csv”
From there we can extract the URL and read the source into memory
# The jsonlite package provides fromJSON() for parsing the datapackage metadata
library(jsonlite)

json_file <- 'https://datahub.io/machine-learning/creditcard/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# Loop over the listed resources and read the one whose type is 'derived/csv'
for(i in 1:length(json_data$resources$datahub$type)){
  if(json_data$resources$datahub$type[i]=='derived/csv'){
    path_to_file = json_data$resources$path[i]
    creditcard <- read.csv(url(path_to_file))
  }
}

Using the str (short for structure) function, we review the data features; the call is sketched after the list below. The data is broken down as follows:
The Time column displays the number of seconds elapsed between each transaction and the first transaction in the data set
Columns V1 - V28 are PCA dimensionality-reduction values used to describe qualities of the credit card transaction while protecting the users' identities
The Amount column is the transaction amount
The Class column displays 1 for fraudulent transactions and 0 for valid transactions
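The structure summary shown next can be reproduced with a call of this form:

str(creditcard)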
## 'data.frame': 284807 obs. of 31 variables:
## $ Time : num 0 0 1 1 2 2 4 7 7 9 ...
## $ V1 : num -1.36 1.192 -1.358 -0.966 -1.158 ...
## $ V2 : num -0.0728 0.2662 -1.3402 -0.1852 0.8777 ...
## $ V3 : num 2.536 0.166 1.773 1.793 1.549 ...
## $ V4 : num 1.378 0.448 0.38 -0.863 0.403 ...
## $ V5 : num -0.3383 0.06 -0.5032 -0.0103 -0.4072 ...
## $ V6 : num 0.4624 -0.0824 1.8005 1.2472 0.0959 ...
## $ V7 : num 0.2396 -0.0788 0.7915 0.2376 0.5929 ...
## $ V8 : num 0.0987 0.0851 0.2477 0.3774 -0.2705 ...
## $ V9 : num 0.364 -0.255 -1.515 -1.387 0.818 ...
## $ V10 : num 0.0908 -0.167 0.2076 -0.055 0.7531 ...
## $ V11 : num -0.552 1.613 0.625 -0.226 -0.823 ...
## $ V12 : num -0.6178 1.0652 0.0661 0.1782 0.5382 ...
## $ V13 : num -0.991 0.489 0.717 0.508 1.346 ...
## $ V14 : num -0.311 -0.144 -0.166 -0.288 -1.12 ...
## $ V15 : num 1.468 0.636 2.346 -0.631 0.175 ...
## $ V16 : num -0.47 0.464 -2.89 -1.06 -0.451 ...
## $ V17 : num 0.208 -0.115 1.11 -0.684 -0.237 ...
## $ V18 : num 0.0258 -0.1834 -0.1214 1.9658 -0.0382 ...
## $ V19 : num 0.404 -0.146 -2.262 -1.233 0.803 ...
## $ V20 : num 0.2514 -0.0691 0.525 -0.208 0.4085 ...
## $ V21 : num -0.01831 -0.22578 0.248 -0.1083 -0.00943 ...
## $ V22 : num 0.27784 -0.63867 0.77168 0.00527 0.79828 ...
## $ V23 : num -0.11 0.101 0.909 -0.19 -0.137 ...
## $ V24 : num 0.0669 -0.3398 -0.6893 -1.1756 0.1413 ...
## $ V25 : num 0.129 0.167 -0.328 0.647 -0.206 ...
## $ V26 : num -0.189 0.126 -0.139 -0.222 0.502 ...
## $ V27 : num 0.13356 -0.00898 -0.05535 0.06272 0.21942 ...
## $ V28 : num -0.0211 0.0147 -0.0598 0.0615 0.2152 ...
## $ Amount: num 149.62 2.69 378.66 123.5 69.99 ...
## $ Class : int 0 0 0 0 0 0 0 0 0 0 ...
Data Prep
Using the createDataPartition function from caret, we sample our data into training and test sets (a sketch of this step follows the list below):
Take a 70% sample by setting the p argument (the percentage) to .7, set the list argument to FALSE so we get a matrix of indices instead of a list, and set the y argument to the Class column so the sampling preserves the class proportions
Assign the training data to the training variable by sub-setting on the index
Assign the testing data to the testing variable by excluding the index with a negative sign in the subset
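A minimal sketch of that partitioning, assuming the creditcard.training and creditcard.test variable names used in the modeling sections below (the seed value is arbitrary and only there for reproducibility):

library(caret)

# 70/30 split stratified on the Class column; list = FALSE returns a matrix of row indices
set.seed(123)
index <- createDataPartition(y = creditcard$Class, p = 0.7, list = FALSE)
creditcard.training <- creditcard[index, ]
creditcard.test <- creditcard[-index, ]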
Neural Nets Using BP (back propagation)
Model Training
Below we use the neuralnet package. The following are the steps we take to model the data:
Normalize all independent variables using the scale function so that differences in the variables' ranges don't interfere with the performance of the neural net model.
Since this is a classification problem, we set the linear.output argument to FALSE. For the hidden argument we use two hidden layers with 5 and 3 neurons respectively (mostly arbitrary on my part, based on recommended settings)
Lastly, in the formula we tell the model that our outcome feature is Class and that the rest of the features are predictors
library(dplyr)
library(neuralnet)

# Scale the first 30 columns (Time, V1-V28, Amount); the Class outcome is left untouched
creditcart.training.two <- creditcard.training %>% mutate_at(c(1:30), funs(c(scale(.))))
nn_model <- neuralnet(Class ~ ., data = creditcart.training.two, hidden = c(5,3), linear.output = F)

To show the plot in this presentation we must set the rep argument to "best" in order to show the repetition with the smallest error; otherwise, by default, the plot opens in a separate window.
In the plot below, the black lines represent the connections between the layers, labeled with the weight assigned to each connection, while the blue lines display the bias terms, which act as a sort of intercept for the model.
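A sketch of the plotting call, using the rep argument described above:

# rep = "best" plots the repetition with the smallest error inline rather than in a new window
plot(nn_model, rep = "best")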
Model Testing
Now that our model has been trained we scale the test data and begin predictions:
- Use the compute function of the neuralnet library; we make an explicit reference with the double colon operator :: since dplyr shares a function of the same name
# Apply the same scaling to the test set before predicting
creditcart.test.two <- creditcard.test %>% mutate_at(c(1:30), funs(c(scale(.))))
predicted.nn.values <- neuralnet::compute(nn_model, creditcart.test.two)

- After compute finishes its predictions, we use sapply to apply the round function over the results, since the neural net's predictions are probabilities rather than class labels (a sketch of the rounding step follows the raw output below)
Below you can see what the results look like before rounding:
## [,1]
## [1,] 9.351082e-14
## [2,] 9.290211e-14
## [3,] 9.288909e-14
## [4,] 9.296489e-14
## [5,] 9.604899e-14
## [6,] 9.427844e-14
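A sketch of the rounding step described above, assuming the raw probabilities live in the net.result component returned by compute:

# Round each predicted probability to the nearest class label (0 or 1)
predictions <- sapply(predicted.nn.values$net.result, round, digits = 0)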
After rounding, the data appears as follows:
## [1] 0 0 0 0 0 0
- Finally, we create a table of the predictions against the original test data to show a confusion matrix of the results (see the sketch below). The main diagonal (from top left to bottom right) represents our correct true positives and true negatives, while the opposing diagonal (from top right to bottom left) represents our errors, the false positives and false negatives
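A minimal sketch of that table call, assuming the rounded predictions vector from the previous step:

# Rows are the model's predictions, columns are the actual Class values from the test set
table(predictions, creditcard.test$Class)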
##
## predictions 0 1
## 0 85263 22
## 1 38 119
Keras Neural Net Model (Rectified Linear Unit)
We decided to add the popular keras library to the research to see how well it performed. The package runs on top of TensorFlow; one of its goals is that you only need to define a computation graph, which is language independent and hence decoupled from your choice of modeling platform. Although it has many options to tinker with, that complexity can sometimes be counterproductive if you make one mistake in the configuration.
Model Training
For the training and test sets we use our previously prepared data, scaling the predictors into matrix form with the scale function and one-hot encoding the outcome with to_categorical:

library(keras)

# Scale the predictors into matrices; one-hot encode the Class outcome
X_train <- creditcard.training %>% select(-Class) %>% scale()
y_train <- to_categorical(creditcard.training$Class)
X_test <- creditcard.test %>% select(-Class) %>% scale()
y_test <- to_categorical(creditcard.test$Class)

For the model itself we have to set some configurations before training:
The activation argument defines the function that computes a neuron's output given its set of inputs.
The relu value we set for the activation argument stands for Rectified Linear Unit. The function transforms each input to either zero or the input itself: if the number is greater than 0, it outputs the value unchanged; if the number is less than or equal to zero, it outputs zero. The idea is that the more positive the value, the more activated the neuron is. A simple illustration of the function follows below.
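As a standalone illustration (not part of the keras model itself), ReLU can be written in R as:

# ReLU: returns the input unchanged when positive, and zero otherwise
relu <- function(x) pmax(x, 0)
# relu(c(-2, -0.5, 0, 1.5, 3)) returns 0.0 0.0 0.0 1.5 3.0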
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = ncol(X_train)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 2, activation = 'sigmoid')

We set the epochs argument to the number of full passes we want to make through the network using all of the training data. That value was kept conservative given the size of the data and the machine power available. The validation_split argument was set to .2, a commonly used share of the training data to hold out for validation:
# Configure the loss, optimizer, and metrics, then train the model and keep the fit history
model %>% compile(loss = 'binary_crossentropy', optimizer = 'adam',
                  metrics = c('accuracy'))
history <- model %>% fit(X_train, y_train, epochs = 2, batch_size = 5,
                         validation_split = 0.2)

History of Keras Training for Error
History of Keras Training for Accuracy
Now that the model has been established, we can see below the dense layers and the model's output shape.
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense (Dense) (None, 256) 7936
## ________________________________________________________________________________
## dropout (Dropout) (None, 256) 0
## ________________________________________________________________________________
## dense_1 (Dense) (None, 128) 32896
## ________________________________________________________________________________
## dropout_1 (Dropout) (None, 128) 0
## ________________________________________________________________________________
## dense_2 (Dense) (None, 2) 258
## ================================================================================
## Total params: 41,090
## Trainable params: 41,090
## Non-trainable params: 0
## ________________________________________________________________________________
Keras Model Testing
We pass the test data set to the model to generate predictions using the predict_classes function of keras (a sketch follows below):
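A minimal sketch of that prediction step, assuming the scaled X_test matrix prepared earlier:

# predict_classes returns the predicted class label (0 or 1) for each row of the test matrix
predictions <- model %>% predict_classes(X_test)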
Below we create a confusion matrix for the keras predictions using the table function. We cast both the predictions and the dependent variable as factors using the factor function, setting the levels argument to our two labels: non-fraudulent (0) and fraudulent (1).
table(factor(predictions,
             levels=min(creditcard.test$Class):max(creditcard.test$Class)),
      factor(creditcard.test$Class,
             levels=min(creditcard.test$Class):max(creditcard.test$Class)))

##
## 0 1
## 0 85267 28
## 1 34 113
Conclusion
We can see from the confusion matrices that both neural net models are very accurate at predicting non-fraudulent transactions, with an average error rate of only around .02%. Test runs also showed a viable accuracy rate for predicting fraudulent transactions, with an average error rate of around 20%.
Most fraud detection systems use models that are only accurate 60-70% of the time when flagging a transaction as fraudulent, an acceptable risk of over-alerting in order to avoid the cost of an actual crime of identity theft.
In conclusion, there is a lot of potential in using neural net models, as shown by this research; the only caveat is understanding how best to use the models and for which problems.