Overarching Question

How might we promote happiness with pet ownership? What factors impact the adoption of a pet, and likewise, what factors influence an animal not getting adopted?

Adopt Me Picture

Background Information

Pets have been proven to affect human happiness and mental health positively. By exploring adoption rates of pets, we hope to find ways to promote happiness with pet ownership and discover what characteristics of an animal (age, color, intake condition, etc.) affect whether it’ll be adopted. We are using animal shelter data from Long Beach, California.

Cleaning Dataset

shelter_data <- read.csv("C:/Users/kcoop/Desktop/CS/Data Science/3001/DS3001Final/animal-shelter-intakes-and-outcomes.csv")
column_index <- tibble(colnames(shelter_data))
summary(shelter_data)
##   Animal.ID         Animal.Name        Animal.Type        Primary.Color     
##  Length:30617       Length:30617       Length:30617       Length:30617      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Secondary.Color        Sex                DOB                 Age        
##  Length:30617       Length:30617       Length:30617       Min.   :-8.000  
##  Class :character   Class :character   Class :character   1st Qu.: 3.000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 4.000  
##                                                           Mean   : 4.913  
##                                                           3rd Qu.: 6.000  
##                                                           Max.   :54.000  
##                                                           NA's   :3786    
##  Intake.Date        Intake.Condition   Intake.Type        Intake.Subtype    
##  Length:30617       Length:30617       Length:30617       Length:30617      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Reason.for.Intake  Outcome.Date         Crossing         Jurisdiction      
##  Length:30617       Length:30617       Length:30617       Length:30617      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Outcome.Type       Outcome.Subtype    intake_is_dead     outcome_is_dead
##  Length:30617       Length:30617       Length:30617       Mode :logical  
##  Class :character   Class :character   Class :character   FALSE:24145    
##  Mode  :character   Mode  :character   Mode  :character   TRUE :6472     
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##  was_outcome_alive
##  Min.   :0.0000   
##  1st Qu.:1.0000   
##  Median :1.0000   
##  Mean   :0.7886   
##  3rd Qu.:1.0000   
##  Max.   :1.0000   
## 
mice::md.pattern(shelter_data)

##       Animal.ID Animal.Name Animal.Type Primary.Color Secondary.Color Sex DOB
## 26831         1           1           1             1               1   1   1
## 3786          1           1           1             1               1   1   1
##               0           0           0             0               0   0   0
##       Intake.Date Intake.Condition Intake.Type Intake.Subtype Reason.for.Intake
## 26831           1                1           1              1                 1
## 3786            1                1           1              1                 1
##                 0                0           0              0                 0
##       Outcome.Date Crossing Jurisdiction Outcome.Type Outcome.Subtype
## 26831            1        1            1            1               1
## 3786             1        1            1            1               1
##                  0        0            0            0               0
##       intake_is_dead outcome_is_dead was_outcome_alive  Age     
## 26831              1               1                 1    1    0
## 3786               1               1                 1    0    1
##                    0               0                 0 3786 3786
shelter_data <- shelter_data[,-c(1,2,5,7,12,13,15,18,19,20,21)] 
shelter_data[shelter_data == ""] <- NA
shelter_data<- na.omit(shelter_data)

shelter_data$days_in_shelter <- as.numeric(difftime(shelter_data$Outcome.Date, shelter_data$Intake.Date, units = "days"))

shelter_data <-shelter_data[,-c(5,8)]

# str(shelter_data)

mice::md.pattern(shelter_data)
##  /\     /\
## {  `---'  }
## {  O   O  }
## ==>  V <==  No need for mice. This data set is completely observed.
##  \  \|/  /
##   `-----'

##       Animal.Type Primary.Color Sex Age Intake.Condition Intake.Type
## 26455           1             1   1   1                1           1
##                 0             0   0   0                0           0
##       Jurisdiction Outcome.Type days_in_shelter  
## 26455            1            1               1 0
##                  0            0               0 0
# rescued vs not adopted
# select if shelter outcome is adopted

rescue_data <- subset(shelter_data, (Outcome.Type != "ADOPTION") & (Outcome.Type != "FOSTER TO ADOPT"))

abc <- names(select_if(shelter_data, is.character))
# view(abc)
abc <- abc[-c(4, 7)]
shelter_data[abc] <- lapply(shelter_data[abc], as.factor)
rescue_data[abc] <- lapply(rescue_data[abc], as.factor)

ADOPTED VS NOT ADOPTED

shelter_data$Animal.Type <- 
  fct_collapse(shelter_data$Animal.Type, 
               Other = c("GUINEA PIG", "LIVESTOCK", 
                         "OTHER", "RABBIT", "REPTILE", "WILD"))

shelter_data$Primary.Color <- 
  fct_collapse(shelter_data$Primary.Color,
               Orange = c("APRICOT",  "BLONDE", "CR LYNX PT", "CREAM", "CREAM PT", 
                          "CRM TABBY", "CRM TIGER","FAWN", "FLAME PT", "GOLD", 
                          "ORANGE", "ORG TABBY", "ORG TIGER", "PEACH", "WHEAT",
                          "Y BRINDLE", "YELLOW"),
               Other = c("BL BRINDLE", "BL LYNX PT", "BC LYNX PT", 
                         "BLUE","BLUE BRIND", "BLUE CREAM", "BLUE MERLE",
                         "BLUE PT", "BLUE TABBY", "BLUE TICK", "BUFF", "C-T PT",
                         "CALICO", "CALICO DIL", "CALICO PT", "CALICO TAB", 
                         "DAPPLE", "GREEN","L-C PT", "LC LYNX PT", "LI LYNX PT",
                         "LILAC PT", "LIVER", "LIVER TICK", "LYNX PT", "PINK", 
                         "RD LYNX PT","RED", "RED MERLE", "RUDDY", "S-T PT", 
                         "SABLE", "SEAL", "SEAL PT", "ST LYNX PT", "TORBI", 
                         "TORTIE", "TORTIE DIL", "TORTIE MUT", "TORTIE PT", 
                         "TRICOLOR", "UNKNOWN"),
               Black = c("BLACK","BLK SMOKE", "BLK TABBY", "BLK TIGER"),
               Brown_Tan = c("BR BRINDLE", "BRN MERLE", "BRN TABBY", "BRN TIGER", 
                             "BROWN", "CHOC PT","CHOCOLATE", "SEAL", "SEAL PT",
                             "TAN"),
               Gray_White = c("GRAY", "GRAY TABBY", "GRAY TIGER", "WHITE", 
                              "SILVER", "SL LYNX PT", "SLVR TABBY", "SNOWSHOE"))
 
shelter_data$Intake.Condition <- 
  fct_collapse(shelter_data$Intake.Condition, 
               Other = c("AGED", "UNDER AGE/WEIGHT", "WELFARE SEIZURES"),
               Behavior = c("BEHAVIOR  MILD", "BEHAVIOR  MODERATE", 
                            "BEHAVIOR  SEVERE", "FERAL", "FRACTIOUS"),
               Ill_Injured = c("ILL MILD", "ILL MODERATETE", "ILL SEVERE", 
                               "INJURED  MILD", "INJURED  MODERATE", 
                               "INJURED  SEVERE"),
               Normal = c("NORMAL"))

shelter_data$Intake.Type <- 
  fct_collapse(shelter_data$Intake.Type, 
               Owner_Surrender = c("OWNER SURRENDER"),
               Stray = c("STRAY"),
               Wildlife = c("WILDLIFE"),
               Other = c("CONFISCATE", "Euthenasia Required", "FOSTER", "RETURN",
                         "SAFE KEEP", "WELFARE SEIZED", "Adopted Animal Return",
                         "QUARANTINE", "TRAP, NEUTER, RETURN"))
 
shelter_data$Jurisdiction <- 
  fct_collapse(shelter_data$Jurisdiction, 
               LA = c("LA CITY", "LA COUNTY", "SIGNAL HILL", "TORRANCE AC"),
               OC = c("ORANGE CNTY", "SEAL BEACH", "CERRITOS", "IRVINE", 
                      "GARDEN GROVE", "LA HABRA", "LOS ALAMITOS", "SEAACA", 
                      "WESTMINSTER"),
               LB= c("LONG BEACH", "DISTRICT1", "DISTRICT2", "DISTRICT3",
                     "DISTRICT4", "DISTRICT5", "DISTRICT6", "DISTRICT7",
                     "DISTRICT8", "DISTRICT9"),
               OOA = c("OUT OF AREA"))

shelter_data$Outcome.Type <- 
  fct_collapse(shelter_data$Outcome.Type, 
               Adopted = c("ADOPTION", "FOSTER TO ADOPT"),
               Not_Adopted = c("COMMUNITY CAT", "DIED", "DISPOSAL", "DUPLICATE",
                               "EUTHANASIA", "FOSTER", "MISSING", 
                               "RETURN TO OWNER", "RETURN TO RESCUE", 
                               "RETURN TO WILD HABITAT", 
                               "SHELTER, NEUTER, RETURN", "TRANSFER", 
                               "TRANSPORT", "TRAP, NEUTER, RELEASE",  "RESCUE"))

Tabling Each Variable

table(shelter_data$Animal.Type) # bird, cat, dog, other
## 
##  BIRD   CAT   DOG Other 
##   943 13711  9563  2238
table(shelter_data$Primary.Color) # black, brown/tan, gray/white, orange, other
## 
##     Orange      Other      Black  Brown_Tan Gray_White 
##       2142       2691       6959       7257       7406
table(shelter_data$Sex) # male, female, neutered, spayed, unknown
## 
##   Female     Male Neutered   Spayed  Unknown 
##     6438     6910     4677     4215     4215
table(rescue_data$Age) # from -8 to 54
## 
##   -8    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##    1  828 1638 2226 3182 3721 3161 1616 1322  973  656  544  466  360  319  265 
##   15   16   17   18   19   20   21   22   23   24   25   28   54 
##  185  151   86   68   39   26   10    8    3    3    1    1    1
table(shelter_data$Intake.Condition) # normal, injured/ill, behavior, other
## 
##       Other    Behavior Ill_Injured      Normal 
##        5950        1626        4812       14067
table(shelter_data$Intake.Type) # owner surrender, stray, wildlife, other
## 
##           Other Owner_Surrender           Stray        Wildlife 
##            1350            2729           20124            2252
table(shelter_data$Jurisdiction) # LA, Long Beach, Orange County, Other
## 
##    OC    LB    LA   OOA 
##  2694 22953   680   128
table(shelter_data$Outcome.Type) # adopted/rescued, not adopted
## 
##     Adopted Not_Adopted 
##        4595       21860
# table(rescue_data$days_in_shelter) # from 0 to 730

RESCUE VS NOT ADOPTED

rescue_data$Animal.Type <- 
  fct_collapse(rescue_data$Animal.Type, 
               Other = c("GUINEA PIG", "LIVESTOCK", 
                         "OTHER", "RABBIT", "REPTILE", "WILD"))

rescue_data$Primary.Color <- 
  fct_collapse(rescue_data$Primary.Color,
               Orange = c("APRICOT",  "BLONDE", "CR LYNX PT", "CREAM", "CREAM PT", 
                          "CRM TABBY", "FAWN", "FLAME PT", "GOLD", 
                          "ORANGE", "ORG TABBY", "ORG TIGER", "PEACH", "WHEAT",
                          "Y BRINDLE", "YELLOW"),
               Other = c("BL BRINDLE", "BC LYNX PT", 
                         "BLUE","BLUE BRIND", "BLUE CREAM", "BLUE MERLE",
                         "BLUE PT", "BLUE TABBY", "BLUE TICK", "BUFF", "C-T PT",
                         "CALICO", "CALICO DIL", "CALICO PT", "CALICO TAB", 
                         "DAPPLE", "GREEN","L-C PT", "LC LYNX PT", "LI LYNX PT",
                         "LILAC PT", "LIVER", "LIVER TICK", "LYNX PT", "PINK", 
                         "RD LYNX PT","RED", "RED MERLE", "RUDDY", "S-T PT", 
                         "SABLE", "SEAL", "SEAL PT", "ST LYNX PT", "TORBI", 
                         "TORTIE", "TORTIE DIL", "TORTIE MUT", "TORTIE PT", 
                         "TRICOLOR", "UNKNOWN"),
               Black = c("BLACK","BLK SMOKE", "BLK TABBY", "BLK TIGER"),
               Brown_Tan = c("BR BRINDLE", "BRN MERLE", "BRN TABBY", "BRN TIGER", 
                             "BROWN", "CHOC PT","CHOCOLATE", "SEAL", "SEAL PT",
                             "TAN"),
               Gray_White = c("GRAY", "GRAY TABBY", "GRAY TIGER", "WHITE", 
                              "SILVER", "SL LYNX PT", "SLVR TABBY", "SNOWSHOE"))

rescue_data$Intake.Condition <- 
  fct_collapse(rescue_data$Intake.Condition, 
               Other = c("AGED", "UNDER AGE/WEIGHT", "WELFARE SEIZURES"),
               Behavior = c("BEHAVIOR  MILD", "BEHAVIOR  MODERATE", 
                            "BEHAVIOR  SEVERE", "FERAL", "FRACTIOUS"),
               Ill_Injured = c("ILL MILD", "ILL MODERATETE", "ILL SEVERE", 
                               "INJURED  MILD", "INJURED  MODERATE", 
                               "INJURED  SEVERE"),
               Normal = c("NORMAL"))

rescue_data$Intake.Type <- 
  fct_collapse(rescue_data$Intake.Type, 
               Owner_Surrender = c("OWNER SURRENDER"),
               Stray = c("STRAY"),
               Wildlife = c("WILDLIFE"),
               Other = c("CONFISCATE", "Euthenasia Required", "FOSTER", "RETURN",
                         "SAFE KEEP", "WELFARE SEIZED", "Adopted Animal Return",
                         "QUARANTINE", "TRAP, NEUTER, RETURN"))
 
rescue_data$Jurisdiction <- 
  fct_collapse(rescue_data$Jurisdiction, 
               LA = c("LA CITY", "LA COUNTY", "SIGNAL HILL"),
               OC = c("ORANGE CNTY", "SEAL BEACH", "CERRITOS", "WESTMINSTER",
                      "GARDEN GROVE", "LA HABRA", "LOS ALAMITOS", "SEAACA"),
               LB= c("LONG BEACH", "DISTRICT1", "DISTRICT2", "DISTRICT3",
                     "DISTRICT4", "DISTRICT5", "DISTRICT6", "DISTRICT7",
                     "DISTRICT8", "DISTRICT9"),
               OOA = c("OUT OF AREA"))

rescue_data$Outcome.Type <- 
  fct_collapse(rescue_data$Outcome.Type, 
               Rescued = c("RESCUE"),
               Not_Adopted = c("COMMUNITY CAT", "DIED", "DISPOSAL", "DUPLICATE",
                               "EUTHANASIA", "FOSTER", "MISSING", 
                               "RETURN TO OWNER", "RETURN TO RESCUE", 
                               "RETURN TO WILD HABITAT", 
                               "SHELTER, NEUTER, RETURN",
                               "TRANSFER","TRANSPORT", "TRAP, NEUTER, RELEASE"))

Tabling Each Variable

table(rescue_data$Animal.Type) # bird, cat, dog, other
## 
##  BIRD   CAT   DOG Other 
##   919 11103  7754  2084
table(rescue_data$Primary.Color) # black, brown/tan, gray/white, orange, other
## 
##     Orange      Other      Black  Brown_Tan Gray_White 
##       1714       2203       5763       6000       6180
table(rescue_data$Sex) # male, female, neutered, spayed, unknown
## 
##   Female     Male Neutered   Spayed  Unknown 
##     5925     6432     2805     2512     4186
table(rescue_data$Age) # from -8 to 54
## 
##   -8    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##    1  828 1638 2226 3182 3721 3161 1616 1322  973  656  544  466  360  319  265 
##   15   16   17   18   19   20   21   22   23   24   25   28   54 
##  185  151   86   68   39   26   10    8    3    3    1    1    1
table(rescue_data$Intake.Condition) # normal, injured/ill, behavior, other
## 
##       Other    Behavior Ill_Injured      Normal 
##        5439        1388        4262       10771
table(rescue_data$Intake.Type) # owner surrender, stray, wildlife, other
## 
##           Other Owner_Surrender           Stray        Wildlife 
##            1089            1952           16570            2249
table(rescue_data$Jurisdiction) # LA, Long Beach, Orange County, Other
## 
##    OC    LB    LA   OOA 
##  2295 18917   562    86
table(rescue_data$Outcome.Type) # adopted/rescued, not adopted
## 
## Not_Adopted     Rescued 
##       15729        6131
# table(rescue_data$days_in_shelter) # from 0 to 730

Exploratory Data Analysis

summary(shelter_data)
##  Animal.Type      Primary.Color        Sex            Age        
##  BIRD :  943   Orange    :2142   Female  :6438   Min.   :-8.000  
##  CAT  :13711   Other     :2691   Male    :6910   1st Qu.: 3.000  
##  DOG  : 9563   Black     :6959   Neutered:4677   Median : 4.000  
##  Other: 2238   Brown_Tan :7257   Spayed  :4215   Mean   : 4.954  
##                Gray_White:7406   Unknown :4215   3rd Qu.: 6.000  
##                                                  Max.   :54.000  
##     Intake.Condition          Intake.Type    Jurisdiction      Outcome.Type  
##  Other      : 5950   Other          : 1350   OC : 2694    Adopted    : 4595  
##  Behavior   : 1626   Owner_Surrender: 2729   LB :22953    Not_Adopted:21860  
##  Ill_Injured: 4812   Stray          :20124   LA :  680                       
##  Normal     :14067   Wildlife       : 2252   OOA:  128                       
##                                                                              
##                                                                              
##  days_in_shelter 
##  Min.   :  0.00  
##  1st Qu.:  1.00  
##  Median :  6.00  
##  Mean   : 16.44  
##  3rd Qu.: 15.00  
##  Max.   :799.00
summary(rescue_data)
##  Animal.Type      Primary.Color        Sex            Age        
##  BIRD :  919   Orange    :1714   Female  :5925   Min.   :-8.000  
##  CAT  :11103   Other     :2203   Male    :6432   1st Qu.: 3.000  
##  DOG  : 7754   Black     :5763   Neutered:2805   Median : 4.000  
##  Other: 2084   Brown_Tan :6000   Spayed  :2512   Mean   : 5.099  
##                Gray_White:6180   Unknown :4186   3rd Qu.: 7.000  
##                                                  Max.   :54.000  
##     Intake.Condition          Intake.Type    Jurisdiction      Outcome.Type  
##  Other      : 5439   Other          : 1089   OC : 2295    Not_Adopted:15729  
##  Behavior   : 1388   Owner_Surrender: 1952   LB :18917    Rescued    : 6131  
##  Ill_Injured: 4262   Stray          :16570   LA :  562                       
##  Normal     :10771   Wildlife       : 2249   OOA:   86                       
##                                                                              
##                                                                              
##  days_in_shelter 
##  Min.   :  0.00  
##  1st Qu.:  0.00  
##  Median :  3.00  
##  Mean   : 10.29  
##  3rd Qu.: 10.00  
##  Max.   :730.00

Methods

Using KNN and decision trees

Advantages of using KNN - easy to interpret and naturally handles multiclass datasets - non-parametric

KNN

dim(train)
## [1] 15873    29
dim(tune)
## [1] 5291   29
dim(test)
## [1] 5291   29
# Running kNN algorithm 
# training the classifier for k = 9 

set.seed(1984) # for randomized algorithm
shelter_9NN <- knn(train = train,#<- training set cases
               test = tune,    #<- tune set cases
               cl = train$Outcome.Type_Adopted,#<- category for true classification
               k = 9,#<- number of neighbors considered
               use.all = TRUE,
               prob = TRUE)# provides the output in probabilities 

str(shelter_9NN)
table(shelter_9NN)
table(tune$Outcome.Type_Adopted)
# looking at how the kNN classification compares to the true class using the confusion matrix

kNN_res = table(shelter_9NN,
                tune$Outcome.Type_Adopted)
kNN_res
##            
## shelter_9NN    0    1
##           0 4374  109
##           1    9  799
sum(kNN_res)  #<- the total is all the test examples
## [1] 5291
(4341+822)/(4341+822+19+109) # accuracy = TP+TN/(TP+TN+FP+FN)
## [1] 0.975808
kNN_res[row(kNN_res) == col(kNN_res)] # selecting true positives and true negatives 
## [1] 4374  799
kNN_acc = sum(kNN_res[row(kNN_res) == col(kNN_res)]) / sum(kNN_res) # accuracy rate calculation
kNN_acc 
## [1] 0.977698
# an 100% accuracy rate... find base rate to see the chance of guessing right if we don't know anything about the pet being adopted

confusionMatrix(as.factor(shelter_9NN), as.factor(tune$Outcome.Type_Adopted), positive = "1", dnn=c("Prediction", "Actual"), mode = "sens_spec")
## Confusion Matrix and Statistics
## 
##           Actual
## Prediction    0    1
##          0 4374  109
##          1    9  799
##                                           
##                Accuracy : 0.9777          
##                  95% CI : (0.9734, 0.9815)
##     No Information Rate : 0.8284          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.918           
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.8800          
##             Specificity : 0.9979          
##          Pos Pred Value : 0.9889          
##          Neg Pred Value : 0.9757          
##              Prevalence : 0.1716          
##          Detection Rate : 0.1510          
##    Detection Prevalence : 0.1527          
##       Balanced Accuracy : 0.9390          
##                                           
##        'Positive' Class : 1               
## 
#sensitivity, recall and true poss rate = TP/TP+FN
#specificity, true negative rate = TN/TN+FP

Decision Tree - Adoption

prevalence <- 1-table(shelter_data$Outcome.Type)[[2]]/length(shelter_data$Outcome.Type)  # calculate the proportion of salary that is the positive class 
prevalence
## [1] 0.1736912

The prevalence is the proportion of the positive class within the target variable, in this case, the pets that are adopted from the shelter. In this data set, the prevalence is roughly 17%. This means that roughly 83% of the data is the negative class, or pets that are not adopted. This metric can be used a baseline because a model that always predicts the negative class will be correct about 83% of the time at random.

dim(train)
## [1] 18519     9
dim(tune)
## [1] 3968    9
dim(test)
## [1] 3968    9
features <- train[,c(-8)] # dropping Outcome.Type column to get just the explanatory variables in features
target <- train$Outcome.Type  # add just the Outcome.Type column to target 


fitControl <- trainControl(method = "repeatedcv",  # use repeated cross validation with 5 folds and 3 repeats
                          number = 5,
                          repeats = 3, 
                          returnResamp="all",
                          classProbs = TRUE,
                          allowParallel = TRUE) 

tree.grid <- expand.grid(maxdepth=c(5,7,9,11))

set.seed(1984)  # set seed for reproducibility
shelter_mdl <- train(x=features,
                y=target,
                method="rpart2",#type of model uses maxdepth to select a model
                trControl=fitControl,#previously created
                tuneGrid=tree.grid,#expanded grid
                metric="ROC")#selected on of the metrics available from two variable summary.

shelter_mdl
## CART 
## 
## 18519 samples
##     8 predictor
##     2 classes: 'Adopted', 'Not_Adopted' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times) 
## Summary of sample sizes: 14816, 14815, 14815, 14816, 14814, 14816, ... 
## Resampling results across tuning parameters:
## 
##   maxdepth  Accuracy   Kappa    
##    5        0.8751191  0.5024275
##    7        0.8796909  0.5346578
##    9        0.8841008  0.5656531
##   11        0.8844247  0.5656978
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was maxdepth = 11.
plot(shelter_mdl)

varImp(shelter_mdl)
## rpart2 variable importance
## 
##                   Overall
## days_in_shelter  100.0000
## Sex               57.0850
## Age               48.8401
## Intake.Condition  27.9883
## Animal.Type       11.1253
## Intake.Type        9.7943
## Jurisdiction       0.3915
## Primary.Color      0.0000

Variables with greatest importance are days in the shelter, sex, age, the intake condition, and the type of animal.

predictandCM(shelter_mdl,tune,"raw",tune$Outcome.Type)
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    Adopted Not_Adopted
##   Adopted         419         201
##   Not_Adopted     270        3078
##                                           
##                Accuracy : 0.8813          
##                  95% CI : (0.8708, 0.8912)
##     No Information Rate : 0.8264          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5693          
##                                           
##  Mcnemar's Test P-Value : 0.001729        
##                                           
##             Sensitivity : 0.6081          
##             Specificity : 0.9387          
##          Pos Pred Value : 0.6758          
##          Neg Pred Value : 0.9194          
##              Prevalence : 0.1736          
##          Detection Rate : 0.1056          
##    Detection Prevalence : 0.1562          
##       Balanced Accuracy : 0.7734          
##                                           
##        'Positive' Class : Adopted         
## 
rpart.plot(shelter_mdl$finalModel, type=4,extra=101)

shelter_mdl$results
##   maxdepth  Accuracy     Kappa  AccuracySD     KappaSD
## 1        5 0.8751191 0.5024275 0.003850890 0.019545342
## 2        7 0.8796909 0.5346578 0.002661621 0.008677938
## 3        9 0.8841008 0.5656531 0.003791283 0.019207966
## 4       11 0.8844247 0.5656978 0.003857220 0.020019717

Decision Tree - Rescue

prevalence <- 1-table(rescue_data$Outcome.Type)[[1]]/length(rescue_data$Outcome.Type)  
prevalence
## [1] 0.2804666
dim(train)
## [1] 15303     9
dim(tune)
## [1] 3279    9
dim(test)
## [1] 3278    9
features <- train[,c(-8)] # dropping outcome type column to get just the explanatory variables in features
target <- train$Outcome.Type


fitControl <- trainControl(method = "repeatedcv",  # use repeated cross validation with 5 folds and 3 repeats
                          number = 5,
                          repeats = 3, 
                          returnResamp="all",
                          classProbs = TRUE,
                          allowParallel = TRUE) 

tree.grid <- expand.grid(maxdepth=c(5,7,9,11))

set.seed(1984)  # set seed for reproducibility
rescue_mdl <- train(x=features,
                y=target,
                method="rpart2",#type of model uses maxdepth to select a model
                trControl=fitControl,#previously created
                tuneGrid=tree.grid,#expanded grid
                metric="ROC")#selected on of the metrics available from two variable summary.

rescue_mdl
## CART 
## 
## 15303 samples
##     8 predictor
##     2 classes: 'Not_Adopted', 'Rescued' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times) 
## Summary of sample sizes: 12243, 12243, 12242, 12243, 12241, 12242, ... 
## Resampling results across tuning parameters:
## 
##   maxdepth  Accuracy   Kappa   
##    5        0.7676272  0.348522
##    7        0.7676272  0.348522
##    9        0.7676272  0.348522
##   11        0.7676272  0.348522
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was maxdepth = 5.
# plot(rescue_mdl)
varImp(rescue_mdl)
## rpart2 variable importance
## 
##                   Overall
## Intake.Condition 100.0000
## Age               73.3506
## Animal.Type       64.9296
## days_in_shelter   47.7059
## Intake.Type       36.2561
## Sex               29.6066
## Jurisdiction       0.1684
## Primary.Color      0.0000
predictandCM(rescue_mdl,tune,"raw",tune$Outcome.Type)
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    Not_Adopted Rescued
##   Not_Adopted        2180     619
##   Rescued             179     301
##                                           
##                Accuracy : 0.7566          
##                  95% CI : (0.7416, 0.7712)
##     No Information Rate : 0.7194          
##     P-Value [Acc > NIR] : 8.412e-07       
##                                           
##                   Kappa : 0.2942          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.3272          
##             Specificity : 0.9241          
##          Pos Pred Value : 0.6271          
##          Neg Pred Value : 0.7788          
##              Prevalence : 0.2806          
##          Detection Rate : 0.0918          
##    Detection Prevalence : 0.1464          
##       Balanced Accuracy : 0.6256          
##                                           
##        'Positive' Class : Rescued         
## 
rpart.plot(rescue_mdl$finalModel, type=4,extra=101)

rescue_mdl$results
##   maxdepth  Accuracy    Kappa  AccuracySD    KappaSD
## 1        5 0.7676272 0.348522 0.007767632 0.04021767
## 2        7 0.7676272 0.348522 0.007767632 0.04021767
## 3        9 0.7676272 0.348522 0.007767632 0.04021767
## 4       11 0.7676272 0.348522 0.007767632 0.04021767

Fairness Assessment

There are not protected classes because the data is on pets.

Evaluation of our model

In this analysis we studied the factors that impacted adoption using data from adoption shelters in Long Beach, California. Our results show that the animals that are adopted tend to be younger, owner-surrendered (as opposed to strays), and of the cat/dog variety (as opposed to a range of wildlife animals included in the database). Wildlife, older animals and those who come in with behavioral issues and illness/injuries are not adopted as often.

Puppy Picture

Conclusion

Shelters should focus on promoting animals in their shelters that are typically older Our main priority for this exploration is to see which factors contribute the most to an animal not being adopted in order to use this data to promote more instances of adoptions in shelters. This can be done through the shelter marketing older animals that are typically strays and with health issues more frequently. But a select quota of pets that hold a bunch of these “unadoptable” characteristics may continue to not be adopted at all despite a higher amount of promotion. However, there are different ways to handle these animals without them going to a home. Other opportunities for these animals lie in rescues, animal sanctuaries, and more. By doing this, it can help human society in a different way by people coming through and visiting and also through providing the opportunity to help these animals have better lives through volunteer work as well.

Old Dog Picture

Future work

One way that this study could be adapted in future work could be through applying different databases in different areas of the world to learn if/how adoption trends change in different places and in different cultures. Databases with more quantitative variables than the one we used from the Long Beach area would be especially useful, as we wanted to try clustering in our data analysis but didn’t have enough quantitative variables (i.e. weight) to do so. One reason to try clustering is because of the possibility of over-fitting with kNN analysis, which we experienced a bit throughout the process.