Objective

Predict cheaters from the Affairs data in the AER package, using the classification tree algorithm in the rpart package.

Data Prep
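
A minimal sketch of how the data could be prepared. The binary `cheater` label (at least one reported affair), the 70/30 split, and the seed are assumptions made for illustration and are not taken from the original analysis.

```r
# Load the Affairs survey data shipped with the AER package.
data("Affairs", package = "AER")

# Assumption: a respondent reporting at least one affair is labelled "yes".
Affairs$cheater <- factor(ifelse(Affairs$affairs > 0, "yes", "no"))

# Assumption: a reproducible 70/30 train/validation split.
set.seed(123)
train_idx <- sample(nrow(Affairs), size = floor(0.7 * nrow(Affairs)))
train <- Affairs[train_idx, ]
val   <- Affairs[-train_idx, ]

# Class balance in the training set.
table(train$cheater)
```

Holding out a validation set before fitting keeps the later accuracy check independent of the data the tree was grown on.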

The Tree
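
One way such a tree might be fitted with `rpart`. The `cheater` label and the choice of predictors are assumptions; the raw `affairs` count is left out of the formula because it would give the answer away.

```r
library(rpart)

# Load the data and build the assumed binary target.
data("Affairs", package = "AER")
Affairs$cheater <- factor(ifelse(Affairs$affairs > 0, "yes", "no"))

# Fit a classification tree (method = "class") on the remaining survey variables.
fit <- rpart(
  cheater ~ gender + age + yearsmarried + children +
    religiousness + education + occupation + rating,
  data   = Affairs,
  method = "class"
)

# Text view of the fitted tree: each row is a node with its split rule,
# number of observations, and predicted class.
print(fit)
```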

Two Interpretations from the Tree

  1. The most likely cheaters:
  2. The least likely cheaters:

Variable Importance

Gender is widely perceived as a predictor of cheating. However, the data show that it is not a good predictor.
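
A sketch of how variable importance can be read from an `rpart` fit. The model specification below is an assumption (the same illustrative setup as a plain classification tree on the survey variables), so the ranking it produces may differ from the one discussed here. If the tree makes no splits, `variable.importance` is `NULL`.

```r
library(rpart)

# Load the data and build the assumed binary target.
data("Affairs", package = "AER")
Affairs$cheater <- factor(ifelse(Affairs$affairs > 0, "yes", "no"))

# Assumed model: classification tree on all survey variables except the raw count.
fit <- rpart(
  cheater ~ gender + age + yearsmarried + children +
    religiousness + education + occupation + rating,
  data   = Affairs,
  method = "class"
)

# Named numeric vector, sorted from most to least important;
# variables that never contribute to a split are absent.
importance <- fit$variable.importance
print(importance)

# Check whether gender contributes at all.
"gender" %in% names(importance)
```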

Model Accuracy

On the validation data, the model's predictions compare with the actual values as follows:

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  4445 
## 
##  
##                  | val$left 
## val$predicted_rf |         0 |         1 | Row Total | 
## -----------------|-----------|-----------|-----------|
##                0 |      3393 |        29 |      3422 | 
##                  |   228.713 |   746.005 |           | 
##                  |     0.992 |     0.008 |     0.770 | 
##                  |     0.997 |     0.028 |           | 
##                  |     0.763 |     0.007 |           | 
## -----------------|-----------|-----------|-----------|
##                1 |         9 |      1014 |      1023 | 
##                  |   765.061 |  2495.434 |           | 
##                  |     0.009 |     0.991 |     0.230 | 
##                  |     0.003 |     0.972 |           | 
##                  |     0.002 |     0.228 |           | 
## -----------------|-----------|-----------|-----------|
##     Column Total |      3402 |      1043 |      4445 | 
##                  |     0.765 |     0.235 |           | 
## -----------------|-----------|-----------|-----------|
## 
##
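
The summary metrics implied by the table can be recomputed by hand. The sketch below copies the four cell counts as printed (rows are predictions, columns are actual values) and treats class `1` as the positive class, which is an assumption.

```r
# Cell counts copied from the cross-tabulation above.
# matrix() fills column by column: actual 0 first, then actual 1.
cm <- matrix(
  c(3393, 9, 29, 1014),
  nrow = 2,
  dimnames = list(predicted = c("0", "1"), actual = c("0", "1"))
)

# Share of all observations that were classified correctly.
accuracy <- sum(diag(cm)) / sum(cm)

# Share of actual 1s predicted as 1 (sensitivity / recall).
sensitivity <- cm["1", "1"] / sum(cm[, "1"])

# Share of actual 0s predicted as 0 (specificity).
specificity <- cm["0", "0"] / sum(cm[, "0"])

# Share of predicted 1s that are actual 1s (precision).
precision <- cm["1", "1"] / sum(cm["1", ])

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 3)
```

Sensitivity and specificity here correspond to the `N / Col Total` entries on the table's diagonal (0.972 and 0.997), and precision to the `N / Row Total` entry for predicted `1` (0.991).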