Background

A national veterans’ organization wishes to develop a predictive model to improve the cost-effectiveness of their direct marketing campaign. The organization, with its in-house database of over 13 million donors, is one of the largest direct-mail fundraisers in the United States. According to their recent mailing records, the overall response rate is 5.1%. Out of those who responded (donated), the average donation is $13.00. Each mailing, which includes a gift of personalized address labels and assortments of cards and envelopes, costs $0.68 to produce and send. Using these facts, we take a sample of this dataset to develop a classification model that can effectively capture donors so that the expected net profit is maximized. Weighted sampling was used, under-representing the non-responders so that the sample has equal numbers of donors and non-donors.

Business Objectives and Goals

The goal is to improve the cost-effectiveness of the veterans’ organization direct marketing campaign. The objective of this effort is to develop a classification model that can effectively capture donors so that the expected net profit is maximized.

Data

For this project, we were given with a data sample. The dataset was already generated with weighted sampling as the original dataset/population has heavy non-responders. The sample given has almost equal number of donors and non-donors.

fund <- readRDS("fundraising.rds")
futu_fund <- readRDS("future_fundraising.rds")
set.seed(12345)
str(fund)
## tibble [3,000 x 21] (S3: tbl_df/tbl/data.frame)
##  $ zipconvert2        : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 2 1 2 ...
##  $ zipconvert3        : Factor w/ 2 levels "Yes","No": 2 2 2 1 1 2 2 2 2 2 ...
##  $ zipconvert4        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 1 ...
##  $ zipconvert5        : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 2 1 1 2 1 ...
##  $ homeowner          : Factor w/ 2 levels "Yes","No": 1 2 1 1 1 1 1 1 1 1 ...
##  $ num_child          : num [1:3000] 1 2 1 1 1 1 1 1 1 1 ...
##  $ income             : num [1:3000] 1 5 3 4 4 4 4 4 4 1 ...
##  $ female             : Factor w/ 2 levels "Yes","No": 2 1 2 2 1 1 2 1 1 1 ...
##  $ wealth             : num [1:3000] 7 8 4 8 8 8 5 8 8 5 ...
##  $ home_value         : num [1:3000] 698 828 1471 547 482 ...
##  $ med_fam_inc        : num [1:3000] 422 358 484 386 242 450 333 458 541 203 ...
##  $ avg_fam_inc        : num [1:3000] 463 376 546 432 275 498 388 533 575 271 ...
##  $ pct_lt15k          : num [1:3000] 4 13 4 7 28 5 16 8 11 39 ...
##  $ num_prom           : num [1:3000] 46 32 94 20 38 47 51 21 66 73 ...
##  $ lifetime_gifts     : num [1:3000] 94 30 177 23 73 139 63 26 108 161 ...
##  $ largest_gift       : num [1:3000] 12 10 10 11 10 20 15 16 12 6 ...
##  $ last_gift          : num [1:3000] 12 5 8 11 10 20 10 16 7 3 ...
##  $ months_since_donate: num [1:3000] 34 29 30 30 31 37 37 30 31 32 ...
##  $ time_lag           : num [1:3000] 6 7 3 6 3 3 8 6 1 7 ...
##  $ avg_gift           : num [1:3000] 9.4 4.29 7.08 7.67 7.3 ...
##  $ target             : Factor w/ 2 levels "Donor","No Donor": 1 1 2 2 1 1 1 2 1 1 ...
str(futu_fund)
## tibble [120 x 20] (S3: tbl_df/tbl/data.frame)
##  $ zipconvert2        : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
##  $ zipconvert3        : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 1 2 1 1 ...
##  $ zipconvert4        : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ zipconvert5        : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
##  $ homeowner          : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 2 2 ...
##  $ num_child          : num [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##  $ income             : num [1:120] 5 1 4 4 2 4 2 3 4 2 ...
##  $ female             : Factor w/ 2 levels "Yes","No": 1 2 1 2 1 1 1 2 2 2 ...
##  $ wealth             : num [1:120] 9 7 1 8 7 8 1 8 3 5 ...
##  $ home_value         : num [1:120] 1399 1355 835 1019 992 ...
##  $ med_fam_inc        : num [1:120] 637 411 310 389 524 371 209 253 302 335 ...
##  $ avg_fam_inc        : num [1:120] 703 497 364 473 563 408 259 285 324 348 ...
##  $ pct_lt15k          : num [1:120] 1 9 22 15 6 10 36 25 19 14 ...
##  $ num_prom           : num [1:120] 74 77 70 21 63 35 72 68 55 59 ...
##  $ lifetime_gifts     : num [1:120] 102 249 126 26 100 92 146 98 66 276 ...
##  $ largest_gift       : num [1:120] 6 15 6 16 20 37 12 5 7 15 ...
##  $ last_gift          : num [1:120] 5 7 6 16 3 37 11 3 5 13 ...
##  $ months_since_donate: num [1:120] 29 35 34 37 21 37 36 32 30 33 ...
##  $ time_lag           : num [1:120] 3 3 8 5 6 5 5 9 9 10 ...
##  $ avg_gift           : num [1:120] 4.86 9.58 4.34 13 7.69 ...
table(fund$target)
## 
##    Donor No Donor 
##     1499     1501
(100*1499)/3000
## [1] 49.96667

The training dataset has 3000 observations and the test dataset has 120 observations. The sample used is weighted 50:50, with 1499 Donors and 1501 Non- Donors, which is gives tells us that 49.97% of the observation are doners.

Summary

summary(fund)
##  zipconvert2 zipconvert3 zipconvert4 zipconvert5 homeowner    num_child    
##  No :2352    Yes: 551    No :2357    No :1846    Yes:2312   Min.   :1.000  
##  Yes: 648    No :2449    Yes: 643    Yes:1154    No : 688   1st Qu.:1.000  
##                                                             Median :1.000  
##                                                             Mean   :1.069  
##                                                             3rd Qu.:1.000  
##                                                             Max.   :5.000  
##      income      female         wealth        home_value      med_fam_inc    
##  Min.   :1.000   Yes:1831   Min.   :0.000   Min.   :   0.0   Min.   :   0.0  
##  1st Qu.:3.000   No :1169   1st Qu.:5.000   1st Qu.: 554.8   1st Qu.: 278.0  
##  Median :4.000              Median :8.000   Median : 816.5   Median : 355.0  
##  Mean   :3.899              Mean   :6.396   Mean   :1143.3   Mean   : 388.4  
##  3rd Qu.:5.000              3rd Qu.:8.000   3rd Qu.:1341.2   3rd Qu.: 465.0  
##  Max.   :7.000              Max.   :9.000   Max.   :5945.0   Max.   :1500.0  
##   avg_fam_inc       pct_lt15k        num_prom      lifetime_gifts  
##  Min.   :   0.0   Min.   : 0.00   Min.   : 11.00   Min.   :  15.0  
##  1st Qu.: 318.0   1st Qu.: 5.00   1st Qu.: 29.00   1st Qu.:  45.0  
##  Median : 396.0   Median :12.00   Median : 48.00   Median :  81.0  
##  Mean   : 432.3   Mean   :14.71   Mean   : 49.14   Mean   : 110.7  
##  3rd Qu.: 516.0   3rd Qu.:21.00   3rd Qu.: 65.00   3rd Qu.: 135.0  
##  Max.   :1331.0   Max.   :90.00   Max.   :157.00   Max.   :5674.9  
##   largest_gift       last_gift      months_since_donate    time_lag     
##  Min.   :   5.00   Min.   :  0.00   Min.   :17.00       Min.   : 0.000  
##  1st Qu.:  10.00   1st Qu.:  7.00   1st Qu.:29.00       1st Qu.: 3.000  
##  Median :  15.00   Median : 10.00   Median :31.00       Median : 5.000  
##  Mean   :  16.65   Mean   : 13.48   Mean   :31.13       Mean   : 6.876  
##  3rd Qu.:  20.00   3rd Qu.: 16.00   3rd Qu.:34.00       3rd Qu.: 9.000  
##  Max.   :1000.00   Max.   :219.00   Max.   :37.00       Max.   :77.000  
##     avg_gift            target    
##  Min.   :  2.139   Donor   :1499  
##  1st Qu.:  6.333   No Donor:1501  
##  Median :  9.000                  
##  Mean   : 10.669                  
##  3rd Qu.: 12.800                  
##  Max.   :122.167

Due to some very big and very small values for min and max, it is highly likely that there are outliers in coloumns home_value, med_fam_inc, avg_fam_inc, pct_lt15k, num_prom, lifetime_gifts, largest_gift, last_gift, time_lag and avg_gift.

corr <- (fund[,c(6:7,9:21)])
corr$target <- as.numeric(corr$target)
chart.Correlation(corr, histogram=FALSE, pch=19)

vif(as.data.frame(corr))
##              Variables       VIF
## 1            num_child  1.027086
## 2               income  1.198361
## 3               wealth  1.509487
## 4           home_value  2.495523
## 5          med_fam_inc 18.433712
## 6          avg_fam_inc 20.709328
## 7            pct_lt15k  2.040823
## 8             num_prom  1.964372
## 9       lifetime_gifts  1.994304
## 10        largest_gift  1.715450
## 11           last_gift  4.155421
## 12 months_since_donate  1.159323
## 13            time_lag  1.032709
## 14            avg_gift  4.470251
## 15              target  1.030083

From the correlation chart, we can see that there variables that highly correlate with other variable. Both med_fam_inc and avg_med_fam are correlated to income. We also have two sets of variables that are colinear, med_fam_inc with avg_fam_inc and last_gift with avg_gift.

Exclusion

There are no exclusions.

Variable Transformation

A transformation that would work here would be square root and then to use logistic transformation on the non-zero values. This would make them not zero.

Partitioning ( 80/20 Split and Cross Validation)

trainIndex <- createDataPartition(fund$target,p=.8,list=FALSE)
data_train <- fund[trainIndex,]
data_test <- fund[-trainIndex,]
nrow(data_train)
## [1] 2401
train_control <- trainControl(method="repeatedcv",number=10,repeats=3)

We did two different methods and it seems that doing cross validation was the best option because since the output sample is small.

Model Fit

glm_fit <- glm(target~., data = fund, family = "binomial")
summary(glm_fit) 
## 
## Call:
## glm(formula = target ~ ., family = "binomial", data = fund)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.90432  -1.15349   0.00153   1.15919   1.79778  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.885e+00  4.595e-01  -4.102 4.10e-05 ***
## zipconvert2Yes      -1.365e+01  2.670e+02  -0.051  0.95924    
## zipconvert3No        1.361e+01  2.670e+02   0.051  0.95934    
## zipconvert4Yes      -1.365e+01  2.670e+02  -0.051  0.95922    
## zipconvert5Yes      -1.365e+01  2.670e+02  -0.051  0.95922    
## homeownerNo          4.957e-02  9.412e-02   0.527  0.59847    
## num_child            2.752e-01  1.137e-01   2.422  0.01544 *  
## income              -6.952e-02  2.595e-02  -2.679  0.00738 ** 
## femaleNo             5.995e-02  7.673e-02   0.781  0.43463    
## wealth              -1.907e-02  1.800e-02  -1.059  0.28940    
## home_value          -1.074e-04  7.141e-05  -1.503  0.13272    
## med_fam_inc         -1.200e-03  9.303e-04  -1.289  0.19725    
## avg_fam_inc          1.756e-03  1.010e-03   1.738  0.08226 .  
## pct_lt15k           -9.519e-04  4.440e-03  -0.214  0.83024    
## num_prom            -3.682e-03  2.317e-03  -1.589  0.11204    
## lifetime_gifts       1.599e-04  3.721e-04   0.430  0.66743    
## largest_gift        -1.773e-03  3.091e-03  -0.574  0.56629    
## last_gift            9.923e-03  7.562e-03   1.312  0.18945    
## months_since_donate  5.922e-02  1.003e-02   5.906 3.51e-09 ***
## time_lag            -6.174e-03  6.789e-03  -0.909  0.36311    
## avg_gift             7.539e-03  1.106e-02   0.682  0.49526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4158.9  on 2999  degrees of freedom
## Residual deviance: 4062.0  on 2979  degrees of freedom
## AIC: 4104
## 
## Number of Fisher Scoring iterations: 12
coef(glm_fit) 
##         (Intercept)      zipconvert2Yes       zipconvert3No      zipconvert4Yes 
##       -1.884773e+00       -1.364510e+01        1.361190e+01       -1.365500e+01 
##      zipconvert5Yes         homeownerNo           num_child              income 
##       -1.365258e+01        4.956599e-02        2.752422e-01       -6.952306e-02 
##            femaleNo              wealth          home_value         med_fam_inc 
##        5.994674e-02       -1.907373e-02       -1.073676e-04       -1.199496e-03 
##         avg_fam_inc           pct_lt15k            num_prom      lifetime_gifts 
##        1.755584e-03       -9.519183e-04       -3.681972e-03        1.598657e-04 
##        largest_gift           last_gift months_since_donate            time_lag 
##       -1.772667e-03        9.923322e-03        5.921548e-02       -6.174095e-03 
##            avg_gift 
##        7.539422e-03
summary(glm_fit)$coef 
##                          Estimate   Std. Error     z value     Pr(>|z|)
## (Intercept)         -1.884773e+00 4.595006e-01 -4.10178679 4.099720e-05
## zipconvert2Yes      -1.364510e+01 2.670209e+02 -0.05110124 9.592448e-01
## zipconvert3No        1.361190e+01 2.670209e+02  0.05097692 9.593439e-01
## zipconvert4Yes      -1.365500e+01 2.670209e+02 -0.05113831 9.592153e-01
## zipconvert5Yes      -1.365258e+01 2.670209e+02 -0.05112926 9.592225e-01
## homeownerNo          4.956599e-02 9.412356e-02  0.52660550 5.984676e-01
## num_child            2.752422e-01 1.136519e-01  2.42180072 1.544382e-02
## income              -6.952306e-02 2.595057e-02 -2.67905680 7.382987e-03
## femaleNo             5.994674e-02 7.672748e-02  0.78129420 4.346295e-01
## wealth              -1.907373e-02 1.800369e-02 -1.05943418 2.894021e-01
## home_value          -1.073676e-04 7.141352e-05 -1.50346346 1.327196e-01
## med_fam_inc         -1.199496e-03 9.302622e-04 -1.28941665 1.972533e-01
## avg_fam_inc          1.755584e-03 1.010277e-03  1.73772557 8.225918e-02
## pct_lt15k           -9.519183e-04 4.440127e-03 -0.21438986 8.302430e-01
## num_prom            -3.681972e-03 2.317003e-03 -1.58911007 1.120355e-01
## lifetime_gifts       1.598657e-04 3.720630e-04  0.42967362 6.674331e-01
## largest_gift        -1.772667e-03 3.090851e-03 -0.57352069 5.662922e-01
## last_gift            9.923322e-03 7.562225e-03  1.31222251 1.894451e-01
## months_since_donate  5.921548e-02 1.002632e-02  5.90600490 3.505036e-09
## time_lag            -6.174095e-03 6.788749e-03 -0.90945987 3.631074e-01
## avg_gift             7.539422e-03 1.105534e-02  0.68197131 4.952571e-01
glm_resp <- predict(glm_fit, type = "response")
contrasts(fund$target)
##          No Donor
## Donor           0
## No Donor        1
glm_dona <- rep("Donor", 3000)
glm_dona[glm_resp>0.5] = "No Donor"
table(glm_dona, fund$target)
##           
## glm_dona   Donor No Donor
##   Donor      870      672
##   No Donor   629      829
mean(glm_dona==fund$target) 
## [1] 0.5663333

A logistic regression was used to find if any significant variables were present where p < .05.

Model Performance and Validation Results

For the following models we will be using the variables: ‘avg_gift,’ ‘lifetime_gifts,’ ‘num_child,’ ‘income,’ ‘home_value,’ ‘months_since_donate,’ and ‘time_lag.’ There were 3 variables excluded because of their collinearity to variables already going to be used. Those 3 are med_fam_inc and pct_lt15k which are both colinear with income. The variable last_gift was also excluded for being colinear with avg_gift.

KNN

knn.fit <- train(target~months_since_donate+largest_gift+avg_gift+num_child+income+home_value,
                 data=fund,
                 method='knn',
                 trControl = train_control,
                 tuneLength=20)
futu_fund.value=predict(knn.fit, futu_fund)
Value=c("value",as.character(futu_fund.value))
write.csv(Value,file="value_knn1.csv")
knn.fit
## k-Nearest Neighbors 
## 
## 3000 samples
##    6 predictor
##    2 classes: 'Donor', 'No Donor' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 2700, 2700, 2700, 2701, 2700, 2700, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa     
##    5  0.5328960  0.06576573
##    7  0.5257904  0.05157043
##    9  0.5226712  0.04533218
##   11  0.5252215  0.05041233
##   13  0.5282194  0.05639490
##   15  0.5293268  0.05862558
##   17  0.5253297  0.05062834
##   19  0.5259890  0.05194578
##   21  0.5247682  0.04950358
##   23  0.5241119  0.04819404
##   25  0.5257775  0.05151753
##   27  0.5282212  0.05639668
##   29  0.5269993  0.05395816
##   31  0.5272256  0.05441351
##   33  0.5262282  0.05243475
##   35  0.5258971  0.05177788
##   37  0.5277849  0.05555772
##   39  0.5262275  0.05244058
##   41  0.5258893  0.05176058
##   43  0.5233267  0.04664669
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.

The best value for the KNN model was k=5 which gives us an accuracy of 53.19%.

GLM

glm.fit.full <- train(target~.,data=data_train,method='glm',trControl = train_control)
summary(glm.fit.full)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.89857  -1.15445   0.00135   1.15097   1.84663  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -2.112e+00  5.170e-01  -4.086 4.39e-05 ***
## zipconvert2Yes      -1.362e+01  3.054e+02  -0.045  0.96444    
## zipconvert3No        1.367e+01  3.054e+02   0.045  0.96431    
## zipconvert4Yes      -1.371e+01  3.054e+02  -0.045  0.96419    
## zipconvert5Yes      -1.371e+01  3.054e+02  -0.045  0.96420    
## homeownerNo          6.876e-02  1.056e-01   0.651  0.51506    
## num_child            2.681e-01  1.262e-01   2.124  0.03369 *  
## income              -8.308e-02  2.928e-02  -2.838  0.00454 ** 
## femaleNo             2.470e-02  8.626e-02   0.286  0.77463    
## wealth              -1.669e-02  2.025e-02  -0.824  0.40982    
## home_value          -1.657e-04  7.967e-05  -2.080  0.03750 *  
## med_fam_inc         -1.060e-03  1.019e-03  -1.040  0.29824    
## avg_fam_inc          1.683e-03  1.120e-03   1.502  0.13303    
## pct_lt15k           -1.977e-03  4.978e-03  -0.397  0.69128    
## num_prom            -3.489e-03  2.658e-03  -1.313  0.18933    
## lifetime_gifts      -9.855e-05  5.266e-04  -0.187  0.85153    
## largest_gift         3.182e-03  7.741e-03   0.411  0.68100    
## last_gift            9.720e-03  8.964e-03   1.084  0.27819    
## months_since_donate  6.886e-02  1.138e-02   6.053 1.42e-09 ***
## time_lag            -1.584e-03  7.767e-03  -0.204  0.83840    
## avg_gift             5.146e-04  1.240e-02   0.042  0.96690    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3328.5  on 2400  degrees of freedom
## Residual deviance: 3233.3  on 2380  degrees of freedom
## AIC: 3275.3
## 
## Number of Fisher Scoring iterations: 12
pred.glm.full<-predict(glm.fit.full,data_test)
confusionMatrix(pred.glm.full,data_test$target)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Donor No Donor
##   Donor      174      147
##   No Donor   125      153
##                                           
##                Accuracy : 0.5459          
##                  95% CI : (0.5051, 0.5863)
##     No Information Rate : 0.5008          
##     P-Value [Acc > NIR] : 0.01513         
##                                           
##                   Kappa : 0.0919          
##                                           
##  Mcnemar's Test P-Value : 0.20291         
##                                           
##             Sensitivity : 0.5819          
##             Specificity : 0.5100          
##          Pos Pred Value : 0.5421          
##          Neg Pred Value : 0.5504          
##              Prevalence : 0.4992          
##          Detection Rate : 0.2905          
##    Detection Prevalence : 0.5359          
##       Balanced Accuracy : 0.5460          
##                                           
##        'Positive' Class : Donor           
## 
futu_fund.value=predict(glm.fit.full, futu_fund)
Value=c("value",as.character(futu_fund.value))
write.csv(Value,file="value_glm_full.csv")

This model gives us a prediction accuracy of 54.26%

glm.fit.imp <- train(target~months_since_donate+largest_gift+avg_gift+num_child+income+home_value,
        data=data_train,
        method='glm',
        trControl = train_control)
summary(glm.fit.imp) 
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7095  -1.1534   0.5867   1.1576   1.8228  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -2.397e+00  3.719e-01  -6.446 1.15e-10 ***
## months_since_donate  7.397e-02  1.094e-02   6.762 1.36e-11 ***
## largest_gift         1.958e-03  4.699e-03   0.417  0.67697    
## avg_gift             1.385e-02  8.992e-03   1.540  0.12365    
## num_child            2.988e-01  1.248e-01   2.395  0.01663 *  
## income              -7.892e-02  2.687e-02  -2.937  0.00331 ** 
## home_value          -8.916e-05  4.616e-05  -1.931  0.05344 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3328.5  on 2400  degrees of freedom
## Residual deviance: 3248.0  on 2394  degrees of freedom
## AIC: 3262
## 
## Number of Fisher Scoring iterations: 4
pred.glm.imp<-predict(glm.fit.imp,data_test)
confusionMatrix(pred.glm.imp,data_test$target)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Donor No Donor
##   Donor      173      145
##   No Donor   126      155
##                                          
##                Accuracy : 0.5476         
##                  95% CI : (0.5067, 0.588)
##     No Information Rate : 0.5008         
##     P-Value [Acc > NIR] : 0.01227        
##                                          
##                   Kappa : 0.0953         
##                                          
##  Mcnemar's Test P-Value : 0.27421        
##                                          
##             Sensitivity : 0.5786         
##             Specificity : 0.5167         
##          Pos Pred Value : 0.5440         
##          Neg Pred Value : 0.5516         
##              Prevalence : 0.4992         
##          Detection Rate : 0.2888         
##    Detection Prevalence : 0.5309         
##       Balanced Accuracy : 0.5476         
##                                          
##        'Positive' Class : Donor          
## 
futu_fund.value=predict(glm.fit.imp, futu_fund)
Value=c("value",as.character(futu_fund.value))
write.csv(Value,file="value_glm_imp.csv")

This model gives us a prediction accuracy of 57.76%

The model that gave us the best prediction accuracy was the GLM fit-train with an accuracy of 57.76.

Cut-off Analysis

I used a cut-off value of .5. This is because the data was creaed using weighted sampling.

Recommendation

For us to have gotten a better accuracy, we could have used a bigger sample size. Also, with the variables num_child and income, could have been altered since they were more of categorical values rather than continuous