Modeling Competition
Hailey Jensen
5/1/2022

library(tidyverse)
library(openintro)
library(readr)
library(ISLR)
library(dplyr)
library(ggcorrplot)
library(car)
library(randomForest)
library(caret)
library(class)

Background Information

A national veterans’ organization wishes to develop a predictive model to improve the cost-effectiveness of their direct marketing campaign. The organization, with its in-house database of over 13 million donors, is one of the largest direct-mail fundraisers in the United States. According to their recent mailing records, the overall response rate is 5.1%. Out of those who responded (donated), the average donation is $13.00. Each mailing, which includes a gift of personalized address labels and assortments of cards and envelopes, costs $0.68 to produce and send. Using these facts, we take a sample of this dataset to develop a classification model that can effectively capture donors so that the expected net profit is maximized. Weighted sampling was used, under-representing the non-responders so that the sample has equal numbers of donors and non-donors.
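
Before any modeling, it is worth seeing why targeting matters at all. The following back-of-the-envelope calculation, using only the figures quoted above, shows that an untargeted mass mailing is unprofitable:

# Economics of an untargeted mass mailing, from the figures above
response_rate <- 0.051  # overall response rate
avg_donation  <- 13.00  # average gift among responders
mailing_cost  <- 0.68   # cost to produce and send one piece

# Expected net profit per piece mailed blindly
response_rate * avg_donation - mailing_cost
# 0.051 * 13 - 0.68 = -0.017: mass mailing loses about 1.7 cents per piece,
# so profit depends on mailing only the people a model flags as likely donors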

Business Objectives and Goals

Our objective in this project is to improve the cost-effectiveness of the national veterans’ organization’s marketing campaign. Our goal is to analyze this data set and create a classification model to efficiently capture donors so the expected net profit is maximized.

Data Sources and Data Used

The fundraising file has 3,000 records, with about 50% donors and 50% non-donors. The data set was pre-generated using weighted sampling to get a near-equal number of donors and non-donors. Weighted sampling is used because donors are rare: at a 5.1% response rate, a simple random sample of 3,000 records would contain only about 150 donors, too few for a classifier to learn donor characteristics well. Oversampling the responders gives the model a balanced view of both classes.

future.fund <- read_rds("~/future_fundraising.rds")
fund <- read_rds("~/fundraising.rds")

Exploring the Data

table(fund$target)
## 
##    Donor No Donor 
##     1499     1501

We wanted an equal split of donors to non-donors and as you can see above, it’s about 50-50 at 1499 donors and 1501 non-donors.

Now, I want to look at the structure of the data sets:

str(future.fund)
## tibble [120 x 20] (S3: tbl_df/tbl/data.frame)
##  $ zipconvert2        : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
##  $ zipconvert3        : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 1 2 1 1 ...
##  $ zipconvert4        : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ zipconvert5        : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
##  $ homeowner          : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 2 2 ...
##  $ num_child          : num [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##  $ income             : num [1:120] 5 1 4 4 2 4 2 3 4 2 ...
##  $ female             : Factor w/ 2 levels "Yes","No": 1 2 1 2 1 1 1 2 2 2 ...
##  $ wealth             : num [1:120] 9 7 1 8 7 8 1 8 3 5 ...
##  $ home_value         : num [1:120] 1399 1355 835 1019 992 ...
##  $ med_fam_inc        : num [1:120] 637 411 310 389 524 371 209 253 302 335 ...
##  $ avg_fam_inc        : num [1:120] 703 497 364 473 563 408 259 285 324 348 ...
##  $ pct_lt15k          : num [1:120] 1 9 22 15 6 10 36 25 19 14 ...
##  $ num_prom           : num [1:120] 74 77 70 21 63 35 72 68 55 59 ...
##  $ lifetime_gifts     : num [1:120] 102 249 126 26 100 92 146 98 66 276 ...
##  $ largest_gift       : num [1:120] 6 15 6 16 20 37 12 5 7 15 ...
##  $ last_gift          : num [1:120] 5 7 6 16 3 37 11 3 5 13 ...
##  $ months_since_donate: num [1:120] 29 35 34 37 21 37 36 32 30 33 ...
##  $ time_lag           : num [1:120] 3 3 8 5 6 5 5 9 9 10 ...
##  $ avg_gift           : num [1:120] 4.86 9.58 4.34 13 7.69 ...
str(fund)
## tibble [3,000 x 21] (S3: tbl_df/tbl/data.frame)
##  $ zipconvert2        : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 2 1 2 ...
##  $ zipconvert3        : Factor w/ 2 levels "Yes","No": 2 2 2 1 1 2 2 2 2 2 ...
##  $ zipconvert4        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 1 ...
##  $ zipconvert5        : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 2 1 1 2 1 ...
##  $ homeowner          : Factor w/ 2 levels "Yes","No": 1 2 1 1 1 1 1 1 1 1 ...
##  $ num_child          : num [1:3000] 1 2 1 1 1 1 1 1 1 1 ...
##  $ income             : num [1:3000] 1 5 3 4 4 4 4 4 4 1 ...
##  $ female             : Factor w/ 2 levels "Yes","No": 2 1 2 2 1 1 2 1 1 1 ...
##  $ wealth             : num [1:3000] 7 8 4 8 8 8 5 8 8 5 ...
##  $ home_value         : num [1:3000] 698 828 1471 547 482 ...
##  $ med_fam_inc        : num [1:3000] 422 358 484 386 242 450 333 458 541 203 ...
##  $ avg_fam_inc        : num [1:3000] 463 376 546 432 275 498 388 533 575 271 ...
##  $ pct_lt15k          : num [1:3000] 4 13 4 7 28 5 16 8 11 39 ...
##  $ num_prom           : num [1:3000] 46 32 94 20 38 47 51 21 66 73 ...
##  $ lifetime_gifts     : num [1:3000] 94 30 177 23 73 139 63 26 108 161 ...
##  $ largest_gift       : num [1:3000] 12 10 10 11 10 20 15 16 12 6 ...
##  $ last_gift          : num [1:3000] 12 5 8 11 10 20 10 16 7 3 ...
##  $ months_since_donate: num [1:3000] 34 29 30 30 31 37 37 30 31 32 ...
##  $ time_lag           : num [1:3000] 6 7 3 6 3 3 8 6 1 7 ...
##  $ avg_gift           : num [1:3000] 9.4 4.29 7.08 7.67 7.3 ...
##  $ target             : Factor w/ 2 levels "Donor","No Donor": 1 1 2 2 1 1 1 2 1 1 ...

Our modeling data set (fund) has 3,000 observations; the future_fundraising set we must score at the end (future.fund) has 120.

Next, I’m going to check for correlations among the numeric predictors:

# Keep only the numeric predictors (drop the factor columns and the target)
temp <- fund[, c(6:7, 9:20)]
correlation <- cor(temp)
round(correlation, 5)
##                     num_child   income   wealth home_value med_fam_inc
## num_child             1.00000  0.09189  0.06018   -0.01196     0.04696
## income                0.09189  1.00000  0.20899    0.29197     0.36751
## wealth                0.06018  0.20899  1.00000    0.26116     0.37776
## home_value           -0.01196  0.29197  0.26116    1.00000     0.73815
## med_fam_inc           0.04696  0.36751  0.37776    0.73815     1.00000
## avg_fam_inc           0.04726  0.37859  0.38589    0.75257     0.97227
## pct_lt15k            -0.03172 -0.28319 -0.37515   -0.39909    -0.66536
## num_prom             -0.08643 -0.06901 -0.41212   -0.06451    -0.05078
## lifetime_gifts       -0.05095 -0.01957 -0.22547   -0.02407    -0.03525
## largest_gift         -0.01755  0.03318 -0.02528    0.05649     0.04703
## last_gift            -0.01295  0.10959  0.05259    0.15886     0.13598
## months_since_donate  -0.00556  0.07724  0.03371    0.02343     0.03234
## time_lag             -0.00607 -0.00155 -0.06642    0.00068     0.01520
## avg_gift             -0.01969  0.12406  0.09108    0.16877     0.13716
##                     avg_fam_inc pct_lt15k num_prom lifetime_gifts largest_gift
## num_child               0.04726  -0.03172 -0.08643       -0.05095     -0.01755
## income                  0.37859  -0.28319 -0.06901       -0.01957      0.03318
## wealth                  0.38589  -0.37515 -0.41212       -0.22547     -0.02528
## home_value              0.75257  -0.39909 -0.06451       -0.02407      0.05649
## med_fam_inc             0.97227  -0.66536 -0.05078       -0.03525      0.04703
## avg_fam_inc             1.00000  -0.68028 -0.05731       -0.04033      0.04310
## pct_lt15k              -0.68028   1.00000  0.03778        0.05962     -0.00788
## num_prom               -0.05731   0.03778  1.00000        0.53862      0.11381
## lifetime_gifts         -0.04033   0.05962  0.53862        1.00000      0.50726
## largest_gift            0.04310  -0.00788  0.11381        0.50726      1.00000
## last_gift               0.13138  -0.06175 -0.05587        0.20206      0.44724
## months_since_donate     0.03127  -0.00901 -0.28232       -0.14462      0.01979
## time_lag                0.02434  -0.01991  0.11962        0.03855      0.03998
## avg_gift                0.13176  -0.06248 -0.14725        0.18232      0.47483
##                     last_gift months_since_donate time_lag avg_gift
## num_child            -0.01295            -0.00556 -0.00607 -0.01969
## income                0.10959             0.07724 -0.00155  0.12406
## wealth                0.05259             0.03371 -0.06642  0.09108
## home_value            0.15886             0.02343  0.00068  0.16877
## med_fam_inc           0.13598             0.03234  0.01520  0.13716
## avg_fam_inc           0.13138             0.03127  0.02434  0.13176
## pct_lt15k            -0.06175            -0.00901 -0.01991 -0.06248
## num_prom             -0.05587            -0.28232  0.11962 -0.14725
## lifetime_gifts        0.20206            -0.14462  0.03855  0.18232
## largest_gift          0.44724             0.01979  0.03998  0.47483
## last_gift             1.00000             0.18672  0.07511  0.86640
## months_since_donate   0.18672             1.00000  0.01553  0.18911
## time_lag              0.07511             0.01553  1.00000  0.07008
## avg_gift              0.86640             0.18911  0.07008  1.00000

Strongly positively correlated predictor pairs:

- home_value and med_fam_inc (0.73815)
- avg_fam_inc and home_value (0.75257)
- avg_fam_inc and med_fam_inc (0.97227)
- num_prom and lifetime_gifts (0.53862)
- lifetime_gifts and largest_gift (0.50726)
- last_gift and avg_gift (0.86640)

Strongly negatively correlated predictor pairs:

- pct_lt15k and med_fam_inc (-0.66536)
- pct_lt15k and avg_fam_inc (-0.68028)
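
Since ggcorrplot is loaded but never used, a heat map of the same matrix is a cheap visual check on these numbers (a minimal sketch; plot not shown here):

# Visualize the correlation matrix computed above; the family-income /
# home-value cluster and the last_gift / avg_gift pair stand out
ggcorrplot(correlation, type = "lower")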

Next, we will be checking for collinearity:

dataf <- fund[, c(6:7, 9:20)]

# vif() needs a fitted model; regressing the first column (num_child) on the
# rest yields variance inflation factors for the remaining 13 predictors
vif(lm(num_child ~ ., data = dataf))
##              income              wealth          home_value         med_fam_inc 
##            1.186680            1.508818            2.478427           18.420766 
##         avg_fam_inc           pct_lt15k            num_prom      lifetime_gifts 
##           20.683498            2.039677            1.951447            1.994198 
##        largest_gift           last_gift months_since_donate            time_lag 
##            1.715154            4.150953            1.144156            1.032417 
##            avg_gift 
##            4.463250

The VIFs confirm the correlation findings: med_fam_inc and avg_fam_inc are highly collinear (VIF ≈ 18.4 and 20.7), and last_gift and avg_gift also show elevated VIFs (≈ 4.2 and 4.5), consistent with their 0.866 correlation.
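
One way to act on this, sketched below but not used in the analysis that follows, is to drop one variable from each collinear pair before modeling; which member of each pair to keep is a judgment call:

# Sketch: keep med_fam_inc and last_gift, drop their near-duplicates
fund.reduced <- dplyr::select(fund, -avg_fam_inc, -avg_gift)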

Lastly, we’re going to partition the data set into 80% training and 20% test:

set.seed(12345)
trainI <- sample(1:nrow(fund), round(nrow(fund) * 0.80))
train <- fund[trainI, ]
test <- fund[-trainI, ]
nrow(train)
## [1] 2400

We’re also going to set up the train control that caret will use for cross-validation (10-fold, repeated 3 times):

set.seed(12345)
train.ctrl <- trainControl(method="repeatedcv", number=10, repeats=3)

Performed Analyses

First, I’m going to fit a logistic regression model with all of the predictors:

glm_fit <- glm(target~., data = train, family = "binomial")
summary(glm_fit)
## 
## Call:
## glm(formula = target ~ ., family = "binomial", data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8281  -1.1407  -0.7284   1.1700   1.6933  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.836e+00  5.132e-01  -3.577 0.000347 ***
## zipconvert2Yes      -1.260e+01  2.289e+02  -0.055 0.956114    
## zipconvert3No        1.249e+01  2.289e+02   0.055 0.956490    
## zipconvert4Yes      -1.256e+01  2.289e+02  -0.055 0.956255    
## zipconvert5Yes      -1.252e+01  2.289e+02  -0.055 0.956365    
## homeownerNo          1.461e-01  1.059e-01   1.380 0.167626    
## num_child            3.336e-01  1.279e-01   2.609 0.009092 ** 
## income              -5.378e-02  2.876e-02  -1.870 0.061547 .  
## femaleNo             2.516e-02  8.592e-02   0.293 0.769667    
## wealth              -1.934e-02  2.001e-02  -0.967 0.333665    
## home_value          -9.847e-05  7.963e-05  -1.237 0.216244    
## med_fam_inc         -1.222e-03  1.063e-03  -1.149 0.250371    
## avg_fam_inc          1.679e-03  1.136e-03   1.478 0.139308    
## pct_lt15k           -3.451e-03  4.942e-03  -0.698 0.485073    
## num_prom            -3.980e-03  2.570e-03  -1.548 0.121526    
## lifetime_gifts       2.889e-04  4.031e-04   0.717 0.473484    
## largest_gift        -2.204e-03  3.388e-03  -0.651 0.515324    
## last_gift            1.374e-02  8.664e-03   1.585 0.112860    
## months_since_donate  5.381e-02  1.126e-02   4.777 1.78e-06 ***
## time_lag            -1.430e-03  7.746e-03  -0.185 0.853491    
## avg_gift             5.864e-03  1.237e-02   0.474 0.635556    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3327.0  on 2399  degrees of freedom
## Residual deviance: 3250.2  on 2379  degrees of freedom
## AIC: 3292.2
## 
## Number of Fisher Scoring iterations: 11

Based on the p-values, num_child (p ≈ 0.009) and months_since_donate (p < 0.001) are significant predictors, and income is marginally significant (p ≈ 0.06). The huge standard errors on the four zipconvert dummies indicate they are nearly linearly dependent (together they encode zip region), so their coefficients are not interpretable.
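
An alternative to hand-picking predictors by p-value (a sketch, not used in the analysis below) is AIC-based backward selection with base R's step():

# Drop terms from the full logistic model as long as AIC improves
glm.step <- step(glm_fit, direction = "backward", trace = 0)
summary(glm.step)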

Next, we’re going to fit a logistic regression using only those three predictors.

glm.fit2 <- glm(target ~ num_child + income + months_since_donate, data = train, family = 'binomial')
# predict() returns the probability of the second factor level, "No Donor";
# the labeling below is therefore inverted (see the discussion after the output)
pred.prob <- predict.glm(glm.fit2, newdata = test, type = 'response')
pred <- ifelse(pred.prob > .5, 'Donor', 'No Donor')
confusionMatrix(as.factor(pred), test$target, positive = 'Donor')
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Donor No Donor
##   Donor      105      147
##   No Donor   185      163
##                                           
##                Accuracy : 0.4467          
##                  95% CI : (0.4064, 0.4875)
##     No Information Rate : 0.5167          
##     P-Value [Acc > NIR] : 0.99974         
##                                           
##                   Kappa : -0.1126         
##                                           
##  Mcnemar's Test P-Value : 0.04229         
##                                           
##             Sensitivity : 0.3621          
##             Specificity : 0.5258          
##          Pos Pred Value : 0.4167          
##          Neg Pred Value : 0.4684          
##              Prevalence : 0.4833          
##          Detection Rate : 0.1750          
##    Detection Prevalence : 0.4200          
##       Balanced Accuracy : 0.4439          
##                                           
##        'Positive' Class : Donor           
## 

Our test accuracy is 44.67%, below the 51.67% no-information rate. This below-chance result is a labeling artifact: because "Donor" is the first level of target, glm() models the probability of "No Donor", so probabilities above 0.5 should have been labeled 'No Donor'. Flipping the labels swaps the rows of the confusion matrix and gives (185 + 147) / 600 = 55.33% accuracy.
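
For reference, the corrected thresholding would look like this (a sketch; the output above is from the original run):

# P(No Donor) > 0.5 should be labeled "No Donor"
pred.fixed <- ifelse(pred.prob > .5, 'No Donor', 'Donor')
confusionMatrix(as.factor(pred.fixed), test$target, positive = 'Donor')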

Next, I’m going to fit a randomForest model to see if that yields better results, again keeping only the significant predictors. Note that with mtry = 3 and only three predictors, every split considers all of the predictors, so the first fit below is bagging rather than a true random forest (hence the name bag.fund):

set.seed(12345)
bag.fund <- randomForest(target ~ num_child + income + months_since_donate, data =train, mtry = 3, importance = TRUE)
bag.fund
## 
## Call:
##  randomForest(formula = target ~ num_child + income + months_since_donate,      data = train, mtry = 3, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 50.5%
## Confusion matrix:
##          Donor No Donor class.error
## Donor      721      488   0.4036394
## No Donor   724      467   0.6078925
rando.fit <- train(target~ num_child + income + months_since_donate, data = train, method ='rf', trControl = train.ctrl, importance = TRUE)
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
pred.rf <- predict(rando.fit,test)
confusionMatrix(pred.rf,test$target)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Donor No Donor
##   Donor      197      180
##   No Donor    93      130
##                                           
##                Accuracy : 0.545           
##                  95% CI : (0.5042, 0.5854)
##     No Information Rate : 0.5167          
##     P-Value [Acc > NIR] : 0.08875         
##                                           
##                   Kappa : 0.0977          
##                                           
##  Mcnemar's Test P-Value : 1.94e-07        
##                                           
##             Sensitivity : 0.6793          
##             Specificity : 0.4194          
##          Pos Pred Value : 0.5225          
##          Neg Pred Value : 0.5830          
##              Prevalence : 0.4833          
##          Detection Rate : 0.3283          
##    Detection Prevalence : 0.6283          
##       Balanced Accuracy : 0.5493          
##                                           
##        'Positive' Class : Donor           
## 

The random forest had a test accuracy of 54.5%, a clear improvement over the (mislabeled) logistic regression.
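
Both forest fits request importance = TRUE but never inspect it; a quick look at the bagged fit (a sketch, output not shown) would reveal which of the three predictors drives the splits:

importance(bag.fund)   # mean decrease in accuracy and in Gini per predictor
varImpPlot(bag.fund)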

For my last model I want to test a K-nearest-neighbors classifier, once again using only the predictors deemed significant earlier.

knn <- train(target~num_child + income + months_since_donate, data=train, method='knn',trControl = train.ctrl, tuneLength=30)
pred.knn <- predict(knn,test)
confusionMatrix(pred.knn,test$target)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Donor No Donor
##   Donor      179      163
##   No Donor   111      147
##                                           
##                Accuracy : 0.5433          
##                  95% CI : (0.5025, 0.5837)
##     No Information Rate : 0.5167          
##     P-Value [Acc > NIR] : 0.102639        
##                                           
##                   Kappa : 0.0909          
##                                           
##  Mcnemar's Test P-Value : 0.002063        
##                                           
##             Sensitivity : 0.6172          
##             Specificity : 0.4742          
##          Pos Pred Value : 0.5234          
##          Neg Pred Value : 0.5698          
##              Prevalence : 0.4833          
##          Detection Rate : 0.2983          
##    Detection Prevalence : 0.5700          
##       Balanced Accuracy : 0.5457          
##                                           
##        'Positive' Class : Donor           
## 

This model has an accuracy of 54.33%, essentially tied with the random forest.

Selected Model

Out of the three models tested, I’ve chosen the KNN model. Its 54.33% test accuracy is essentially tied with the random forest’s 54.5%, and both clearly beat the logistic regression. We would hope to see a higher testing accuracy, but after much trial and error this is the best performance we achieved.
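
Since the stated business goal is net profit rather than raw accuracy, a rough profit check on the held-out test set is also possible. This is only a sketch: it ignores the weighted-sampling correction, so the donor rate (and hence the profit) is inflated relative to the real mailing population.

# Rough profit check on the balanced test sample, using the $13.00 average
# donation and $0.68 mailing cost from the background
mailed <- pred.knn == "Donor"                                  # who the model would mail
profit <- ifelse(test$target[mailed] == "Donor", 13 - 0.68, -0.68)
sum(profit)                                                    # net profit over the test mailing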

Testing the Model

For the final step in the process, I will apply my chosen KNN model to the “future_fundraising” data given to us, then write the predictions to a CSV file so they can readily be used.

# Reuse the KNN model fitted above rather than retraining it
pred.knn <- predict(knn, future.fund)
# Write the predictions as a one-column CSV with the header "value"
write.csv(data.frame(value = as.character(pred.knn)), file = "value_knn.csv", row.names = FALSE)

Recommendations

Our models did not achieve especially high accuracy. One improvement for the future would be to collect more data, i.e., more donor records. We could also have rescaled the features for the KNN model: KNN is distance-based, so the predictor with the largest range (here, months_since_donate) dominates the distance calculation when features are left unscaled.
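
As a minimal sketch of that last point, caret can standardize predictors inside the resampling loop via its preProcess argument (not run as part of this analysis):

# Center and scale predictors within each resampling fold before KNN
knn.scaled <- train(target ~ num_child + income + months_since_donate,
                    data = train, method = 'knn',
                    preProcess = c("center", "scale"),
                    trControl = train.ctrl, tuneLength = 30)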
