How might we promote happiness with pet ownership? What factors impact the adoption of a pet, and likewise, what factors influence an animal not getting adopted?
Pets have been proven to affect human happiness and mental health positively. By exploring adoption rates of pets, we hope to find ways to promote happiness with pet ownership and discover what characteristics of an animal (age, color, intake condition, etc.) affect whether it’ll be adopted. We are using animal shelter data from Long Beach, California.
shelter_data <- read.csv("C:/Users/kcoop/Desktop/CS/Data Science/3001/DS3001Final/animal-shelter-intakes-and-outcomes.csv")
column_index <- tibble(colnames(shelter_data))
summary(shelter_data)
## Animal.ID Animal.Name Animal.Type Primary.Color
## Length:30617 Length:30617 Length:30617 Length:30617
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Secondary.Color Sex DOB Age
## Length:30617 Length:30617 Length:30617 Min. :-8.000
## Class :character Class :character Class :character 1st Qu.: 3.000
## Mode :character Mode :character Mode :character Median : 4.000
## Mean : 4.913
## 3rd Qu.: 6.000
## Max. :54.000
## NA's :3786
## Intake.Date Intake.Condition Intake.Type Intake.Subtype
## Length:30617 Length:30617 Length:30617 Length:30617
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Reason.for.Intake Outcome.Date Crossing Jurisdiction
## Length:30617 Length:30617 Length:30617 Length:30617
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Outcome.Type Outcome.Subtype intake_is_dead outcome_is_dead
## Length:30617 Length:30617 Length:30617 Mode :logical
## Class :character Class :character Class :character FALSE:24145
## Mode :character Mode :character Mode :character TRUE :6472
##
##
##
##
## was_outcome_alive
## Min. :0.0000
## 1st Qu.:1.0000
## Median :1.0000
## Mean :0.7886
## 3rd Qu.:1.0000
## Max. :1.0000
##
mice::md.pattern(shelter_data)
## Animal.ID Animal.Name Animal.Type Primary.Color Secondary.Color Sex DOB
## 26831 1 1 1 1 1 1 1
## 3786 1 1 1 1 1 1 1
## 0 0 0 0 0 0 0
## Intake.Date Intake.Condition Intake.Type Intake.Subtype Reason.for.Intake
## 26831 1 1 1 1 1
## 3786 1 1 1 1 1
## 0 0 0 0 0
## Outcome.Date Crossing Jurisdiction Outcome.Type Outcome.Subtype
## 26831 1 1 1 1 1
## 3786 1 1 1 1 1
## 0 0 0 0 0
## intake_is_dead outcome_is_dead was_outcome_alive Age
## 26831 1 1 1 1 0
## 3786 1 1 1 0 1
## 0 0 0 3786 3786
shelter_data <- shelter_data[,-c(1,2,5,7,12,13,15,18,19,20,21)]
shelter_data[shelter_data == ""] <- NA
shelter_data<- na.omit(shelter_data)
shelter_data$days_in_shelter <- as.numeric(difftime(shelter_data$Outcome.Date, shelter_data$Intake.Date, units = "days"))
shelter_data <-shelter_data[,-c(5,8)]
# str(shelter_data)
mice::md.pattern(shelter_data)
## /\ /\
## { `---' }
## { O O }
## ==> V <== No need for mice. This data set is completely observed.
## \ \|/ /
## `-----'
## Animal.Type Primary.Color Sex Age Intake.Condition Intake.Type
## 26455 1 1 1 1 1 1
## 0 0 0 0 0 0
## Jurisdiction Outcome.Type days_in_shelter
## 26455 1 1 1 0
## 0 0 0 0
# rescued vs not adopted
# select if shelter outcome is adopted
rescue_data <- subset(shelter_data, (Outcome.Type != "ADOPTION") & (Outcome.Type != "FOSTER TO ADOPT"))
abc <- names(select_if(shelter_data, is.character))
# view(abc)
abc <- abc[-c(4, 7)]
shelter_data[abc] <- lapply(shelter_data[abc], as.factor)
rescue_data[abc] <- lapply(rescue_data[abc], as.factor)
shelter_data$Animal.Type <-
fct_collapse(shelter_data$Animal.Type,
Other = c("GUINEA PIG", "LIVESTOCK",
"OTHER", "RABBIT", "REPTILE", "WILD"))
shelter_data$Primary.Color <-
fct_collapse(shelter_data$Primary.Color,
Orange = c("APRICOT", "BLONDE", "CR LYNX PT", "CREAM", "CREAM PT",
"CRM TABBY", "CRM TIGER","FAWN", "FLAME PT", "GOLD",
"ORANGE", "ORG TABBY", "ORG TIGER", "PEACH", "WHEAT",
"Y BRINDLE", "YELLOW"),
Other = c("BL BRINDLE", "BL LYNX PT", "BC LYNX PT",
"BLUE","BLUE BRIND", "BLUE CREAM", "BLUE MERLE",
"BLUE PT", "BLUE TABBY", "BLUE TICK", "BUFF", "C-T PT",
"CALICO", "CALICO DIL", "CALICO PT", "CALICO TAB",
"DAPPLE", "GREEN","L-C PT", "LC LYNX PT", "LI LYNX PT",
"LILAC PT", "LIVER", "LIVER TICK", "LYNX PT", "PINK",
"RD LYNX PT","RED", "RED MERLE", "RUDDY", "S-T PT",
"SABLE", "SEAL", "SEAL PT", "ST LYNX PT", "TORBI",
"TORTIE", "TORTIE DIL", "TORTIE MUT", "TORTIE PT",
"TRICOLOR", "UNKNOWN"),
Black = c("BLACK","BLK SMOKE", "BLK TABBY", "BLK TIGER"),
Brown_Tan = c("BR BRINDLE", "BRN MERLE", "BRN TABBY", "BRN TIGER",
"BROWN", "CHOC PT","CHOCOLATE", "SEAL", "SEAL PT",
"TAN"),
Gray_White = c("GRAY", "GRAY TABBY", "GRAY TIGER", "WHITE",
"SILVER", "SL LYNX PT", "SLVR TABBY", "SNOWSHOE"))
shelter_data$Intake.Condition <-
fct_collapse(shelter_data$Intake.Condition,
Other = c("AGED", "UNDER AGE/WEIGHT", "WELFARE SEIZURES"),
Behavior = c("BEHAVIOR MILD", "BEHAVIOR MODERATE",
"BEHAVIOR SEVERE", "FERAL", "FRACTIOUS"),
Ill_Injured = c("ILL MILD", "ILL MODERATETE", "ILL SEVERE",
"INJURED MILD", "INJURED MODERATE",
"INJURED SEVERE"),
Normal = c("NORMAL"))
shelter_data$Intake.Type <-
fct_collapse(shelter_data$Intake.Type,
Owner_Surrender = c("OWNER SURRENDER"),
Stray = c("STRAY"),
Wildlife = c("WILDLIFE"),
Other = c("CONFISCATE", "Euthenasia Required", "FOSTER", "RETURN",
"SAFE KEEP", "WELFARE SEIZED", "Adopted Animal Return",
"QUARANTINE", "TRAP, NEUTER, RETURN"))
shelter_data$Jurisdiction <-
fct_collapse(shelter_data$Jurisdiction,
LA = c("LA CITY", "LA COUNTY", "SIGNAL HILL", "TORRANCE AC"),
OC = c("ORANGE CNTY", "SEAL BEACH", "CERRITOS", "IRVINE",
"GARDEN GROVE", "LA HABRA", "LOS ALAMITOS", "SEAACA",
"WESTMINSTER"),
LB= c("LONG BEACH", "DISTRICT1", "DISTRICT2", "DISTRICT3",
"DISTRICT4", "DISTRICT5", "DISTRICT6", "DISTRICT7",
"DISTRICT8", "DISTRICT9"),
OOA = c("OUT OF AREA"))
shelter_data$Outcome.Type <-
fct_collapse(shelter_data$Outcome.Type,
Adopted = c("ADOPTION", "FOSTER TO ADOPT"),
Not_Adopted = c("COMMUNITY CAT", "DIED", "DISPOSAL", "DUPLICATE",
"EUTHANASIA", "FOSTER", "MISSING",
"RETURN TO OWNER", "RETURN TO RESCUE",
"RETURN TO WILD HABITAT",
"SHELTER, NEUTER, RETURN", "TRANSFER",
"TRANSPORT", "TRAP, NEUTER, RELEASE", "RESCUE"))
table(shelter_data$Animal.Type) # bird, cat, dog, other
##
## BIRD CAT DOG Other
## 943 13711 9563 2238
table(shelter_data$Primary.Color) # black, brown/tan, gray/white, orange, other
##
## Orange Other Black Brown_Tan Gray_White
## 2142 2691 6959 7257 7406
table(shelter_data$Sex) # male, female, neutered, spayed, unknown
##
## Female Male Neutered Spayed Unknown
## 6438 6910 4677 4215 4215
table(rescue_data$Age) # from -8 to 54
##
## -8 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 1 828 1638 2226 3182 3721 3161 1616 1322 973 656 544 466 360 319 265
## 15 16 17 18 19 20 21 22 23 24 25 28 54
## 185 151 86 68 39 26 10 8 3 3 1 1 1
table(shelter_data$Intake.Condition) # normal, injured/ill, behavior, other
##
## Other Behavior Ill_Injured Normal
## 5950 1626 4812 14067
table(shelter_data$Intake.Type) # owner surrender, stray, wildlife, other
##
## Other Owner_Surrender Stray Wildlife
## 1350 2729 20124 2252
table(shelter_data$Jurisdiction) # LA, Long Beach, Orange County, Other
##
## OC LB LA OOA
## 2694 22953 680 128
table(shelter_data$Outcome.Type) # adopted/rescued, not adopted
##
## Adopted Not_Adopted
## 4595 21860
# table(rescue_data$days_in_shelter) # from 0 to 730
rescue_data$Animal.Type <-
fct_collapse(rescue_data$Animal.Type,
Other = c("GUINEA PIG", "LIVESTOCK",
"OTHER", "RABBIT", "REPTILE", "WILD"))
rescue_data$Primary.Color <-
fct_collapse(rescue_data$Primary.Color,
Orange = c("APRICOT", "BLONDE", "CR LYNX PT", "CREAM", "CREAM PT",
"CRM TABBY", "FAWN", "FLAME PT", "GOLD",
"ORANGE", "ORG TABBY", "ORG TIGER", "PEACH", "WHEAT",
"Y BRINDLE", "YELLOW"),
Other = c("BL BRINDLE", "BC LYNX PT",
"BLUE","BLUE BRIND", "BLUE CREAM", "BLUE MERLE",
"BLUE PT", "BLUE TABBY", "BLUE TICK", "BUFF", "C-T PT",
"CALICO", "CALICO DIL", "CALICO PT", "CALICO TAB",
"DAPPLE", "GREEN","L-C PT", "LC LYNX PT", "LI LYNX PT",
"LILAC PT", "LIVER", "LIVER TICK", "LYNX PT", "PINK",
"RD LYNX PT","RED", "RED MERLE", "RUDDY", "S-T PT",
"SABLE", "SEAL", "SEAL PT", "ST LYNX PT", "TORBI",
"TORTIE", "TORTIE DIL", "TORTIE MUT", "TORTIE PT",
"TRICOLOR", "UNKNOWN"),
Black = c("BLACK","BLK SMOKE", "BLK TABBY", "BLK TIGER"),
Brown_Tan = c("BR BRINDLE", "BRN MERLE", "BRN TABBY", "BRN TIGER",
"BROWN", "CHOC PT","CHOCOLATE", "SEAL", "SEAL PT",
"TAN"),
Gray_White = c("GRAY", "GRAY TABBY", "GRAY TIGER", "WHITE",
"SILVER", "SL LYNX PT", "SLVR TABBY", "SNOWSHOE"))
rescue_data$Intake.Condition <-
fct_collapse(rescue_data$Intake.Condition,
Other = c("AGED", "UNDER AGE/WEIGHT", "WELFARE SEIZURES"),
Behavior = c("BEHAVIOR MILD", "BEHAVIOR MODERATE",
"BEHAVIOR SEVERE", "FERAL", "FRACTIOUS"),
Ill_Injured = c("ILL MILD", "ILL MODERATETE", "ILL SEVERE",
"INJURED MILD", "INJURED MODERATE",
"INJURED SEVERE"),
Normal = c("NORMAL"))
rescue_data$Intake.Type <-
fct_collapse(rescue_data$Intake.Type,
Owner_Surrender = c("OWNER SURRENDER"),
Stray = c("STRAY"),
Wildlife = c("WILDLIFE"),
Other = c("CONFISCATE", "Euthenasia Required", "FOSTER", "RETURN",
"SAFE KEEP", "WELFARE SEIZED", "Adopted Animal Return",
"QUARANTINE", "TRAP, NEUTER, RETURN"))
rescue_data$Jurisdiction <-
fct_collapse(rescue_data$Jurisdiction,
LA = c("LA CITY", "LA COUNTY", "SIGNAL HILL"),
OC = c("ORANGE CNTY", "SEAL BEACH", "CERRITOS", "WESTMINSTER",
"GARDEN GROVE", "LA HABRA", "LOS ALAMITOS", "SEAACA"),
LB= c("LONG BEACH", "DISTRICT1", "DISTRICT2", "DISTRICT3",
"DISTRICT4", "DISTRICT5", "DISTRICT6", "DISTRICT7",
"DISTRICT8", "DISTRICT9"),
OOA = c("OUT OF AREA"))
rescue_data$Outcome.Type <-
fct_collapse(rescue_data$Outcome.Type,
Rescued = c("RESCUE"),
Not_Adopted = c("COMMUNITY CAT", "DIED", "DISPOSAL", "DUPLICATE",
"EUTHANASIA", "FOSTER", "MISSING",
"RETURN TO OWNER", "RETURN TO RESCUE",
"RETURN TO WILD HABITAT",
"SHELTER, NEUTER, RETURN",
"TRANSFER","TRANSPORT", "TRAP, NEUTER, RELEASE"))
table(rescue_data$Animal.Type) # bird, cat, dog, other
##
## BIRD CAT DOG Other
## 919 11103 7754 2084
table(rescue_data$Primary.Color) # black, brown/tan, gray/white, orange, other
##
## Orange Other Black Brown_Tan Gray_White
## 1714 2203 5763 6000 6180
table(rescue_data$Sex) # male, female, neutered, spayed, unknown
##
## Female Male Neutered Spayed Unknown
## 5925 6432 2805 2512 4186
table(rescue_data$Age) # from -8 to 54
##
## -8 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 1 828 1638 2226 3182 3721 3161 1616 1322 973 656 544 466 360 319 265
## 15 16 17 18 19 20 21 22 23 24 25 28 54
## 185 151 86 68 39 26 10 8 3 3 1 1 1
table(rescue_data$Intake.Condition) # normal, injured/ill, behavior, other
##
## Other Behavior Ill_Injured Normal
## 5439 1388 4262 10771
table(rescue_data$Intake.Type) # owner surrender, stray, wildlife, other
##
## Other Owner_Surrender Stray Wildlife
## 1089 1952 16570 2249
table(rescue_data$Jurisdiction) # LA, Long Beach, Orange County, Other
##
## OC LB LA OOA
## 2295 18917 562 86
table(rescue_data$Outcome.Type) # adopted/rescued, not adopted
##
## Not_Adopted Rescued
## 15729 6131
# table(rescue_data$days_in_shelter) # from 0 to 730
summary(shelter_data)
## Animal.Type Primary.Color Sex Age
## BIRD : 943 Orange :2142 Female :6438 Min. :-8.000
## CAT :13711 Other :2691 Male :6910 1st Qu.: 3.000
## DOG : 9563 Black :6959 Neutered:4677 Median : 4.000
## Other: 2238 Brown_Tan :7257 Spayed :4215 Mean : 4.954
## Gray_White:7406 Unknown :4215 3rd Qu.: 6.000
## Max. :54.000
## Intake.Condition Intake.Type Jurisdiction Outcome.Type
## Other : 5950 Other : 1350 OC : 2694 Adopted : 4595
## Behavior : 1626 Owner_Surrender: 2729 LB :22953 Not_Adopted:21860
## Ill_Injured: 4812 Stray :20124 LA : 680
## Normal :14067 Wildlife : 2252 OOA: 128
##
##
## days_in_shelter
## Min. : 0.00
## 1st Qu.: 1.00
## Median : 6.00
## Mean : 16.44
## 3rd Qu.: 15.00
## Max. :799.00
summary(rescue_data)
## Animal.Type Primary.Color Sex Age
## BIRD : 919 Orange :1714 Female :5925 Min. :-8.000
## CAT :11103 Other :2203 Male :6432 1st Qu.: 3.000
## DOG : 7754 Black :5763 Neutered:2805 Median : 4.000
## Other: 2084 Brown_Tan :6000 Spayed :2512 Mean : 5.099
## Gray_White:6180 Unknown :4186 3rd Qu.: 7.000
## Max. :54.000
## Intake.Condition Intake.Type Jurisdiction Outcome.Type
## Other : 5439 Other : 1089 OC : 2295 Not_Adopted:15729
## Behavior : 1388 Owner_Surrender: 1952 LB :18917 Rescued : 6131
## Ill_Injured: 4262 Stray :16570 LA : 562
## Normal :10771 Wildlife : 2249 OOA: 86
##
##
## days_in_shelter
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 3.00
## Mean : 10.29
## 3rd Qu.: 10.00
## Max. :730.00
Using KNN and decision trees
Advantages of using KNN - easy to interpret and naturally handles multiclass datasets - non-parametric
dim(train)
## [1] 15873 29
dim(tune)
## [1] 5291 29
dim(test)
## [1] 5291 29
# Running kNN algorithm
# training the classifier for k = 9
set.seed(1984) # for randomized algorithm
shelter_9NN <- knn(train = train,#<- training set cases
test = tune, #<- tune set cases
cl = train$Outcome.Type_Adopted,#<- category for true classification
k = 9,#<- number of neighbors considered
use.all = TRUE,
prob = TRUE)# provides the output in probabilities
str(shelter_9NN)
table(shelter_9NN)
table(tune$Outcome.Type_Adopted)
# looking at how the kNN classification compares to the true class using the confusion matrix
kNN_res = table(shelter_9NN,
tune$Outcome.Type_Adopted)
kNN_res
##
## shelter_9NN 0 1
## 0 4374 109
## 1 9 799
sum(kNN_res) #<- the total is all the test examples
## [1] 5291
(4341+822)/(4341+822+19+109) # accuracy = TP+TN/(TP+TN+FP+FN)
## [1] 0.975808
kNN_res[row(kNN_res) == col(kNN_res)] # selecting true positives and true negatives
## [1] 4374 799
kNN_acc = sum(kNN_res[row(kNN_res) == col(kNN_res)]) / sum(kNN_res) # accuracy rate calculation
kNN_acc
## [1] 0.977698
# an 100% accuracy rate... find base rate to see the chance of guessing right if we don't know anything about the pet being adopted
confusionMatrix(as.factor(shelter_9NN), as.factor(tune$Outcome.Type_Adopted), positive = "1", dnn=c("Prediction", "Actual"), mode = "sens_spec")
## Confusion Matrix and Statistics
##
## Actual
## Prediction 0 1
## 0 4374 109
## 1 9 799
##
## Accuracy : 0.9777
## 95% CI : (0.9734, 0.9815)
## No Information Rate : 0.8284
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.918
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8800
## Specificity : 0.9979
## Pos Pred Value : 0.9889
## Neg Pred Value : 0.9757
## Prevalence : 0.1716
## Detection Rate : 0.1510
## Detection Prevalence : 0.1527
## Balanced Accuracy : 0.9390
##
## 'Positive' Class : 1
##
#sensitivity, recall and true poss rate = TP/TP+FN
#specificity, true negative rate = TN/TN+FP
prevalence <- 1-table(shelter_data$Outcome.Type)[[2]]/length(shelter_data$Outcome.Type) # calculate the proportion of salary that is the positive class
prevalence
## [1] 0.1736912
The prevalence is the proportion of the positive class within the target variable, in this case, the pets that are adopted from the shelter. In this data set, the prevalence is roughly 17%. This means that roughly 83% of the data is the negative class, or pets that are not adopted. This metric can be used a baseline because a model that always predicts the negative class will be correct about 83% of the time at random.
dim(train)
## [1] 18519 9
dim(tune)
## [1] 3968 9
dim(test)
## [1] 3968 9
features <- train[,c(-8)] # dropping Outcome.Type column to get just the explanatory variables in features
target <- train$Outcome.Type # add just the Outcome.Type column to target
fitControl <- trainControl(method = "repeatedcv", # use repeated cross validation with 5 folds and 3 repeats
number = 5,
repeats = 3,
returnResamp="all",
classProbs = TRUE,
allowParallel = TRUE)
tree.grid <- expand.grid(maxdepth=c(5,7,9,11))
set.seed(1984) # set seed for reproducibility
shelter_mdl <- train(x=features,
y=target,
method="rpart2",#type of model uses maxdepth to select a model
trControl=fitControl,#previously created
tuneGrid=tree.grid,#expanded grid
metric="ROC")#selected on of the metrics available from two variable summary.
shelter_mdl
## CART
##
## 18519 samples
## 8 predictor
## 2 classes: 'Adopted', 'Not_Adopted'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times)
## Summary of sample sizes: 14816, 14815, 14815, 14816, 14814, 14816, ...
## Resampling results across tuning parameters:
##
## maxdepth Accuracy Kappa
## 5 0.8751191 0.5024275
## 7 0.8796909 0.5346578
## 9 0.8841008 0.5656531
## 11 0.8844247 0.5656978
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was maxdepth = 11.
plot(shelter_mdl)
varImp(shelter_mdl)
## rpart2 variable importance
##
## Overall
## days_in_shelter 100.0000
## Sex 57.0850
## Age 48.8401
## Intake.Condition 27.9883
## Animal.Type 11.1253
## Intake.Type 9.7943
## Jurisdiction 0.3915
## Primary.Color 0.0000
Variables with greatest importance are days in the shelter, sex, age, the intake condition, and the type of animal.
predictandCM(shelter_mdl,tune,"raw",tune$Outcome.Type)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Adopted Not_Adopted
## Adopted 419 201
## Not_Adopted 270 3078
##
## Accuracy : 0.8813
## 95% CI : (0.8708, 0.8912)
## No Information Rate : 0.8264
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5693
##
## Mcnemar's Test P-Value : 0.001729
##
## Sensitivity : 0.6081
## Specificity : 0.9387
## Pos Pred Value : 0.6758
## Neg Pred Value : 0.9194
## Prevalence : 0.1736
## Detection Rate : 0.1056
## Detection Prevalence : 0.1562
## Balanced Accuracy : 0.7734
##
## 'Positive' Class : Adopted
##
rpart.plot(shelter_mdl$finalModel, type=4,extra=101)
shelter_mdl$results
## maxdepth Accuracy Kappa AccuracySD KappaSD
## 1 5 0.8751191 0.5024275 0.003850890 0.019545342
## 2 7 0.8796909 0.5346578 0.002661621 0.008677938
## 3 9 0.8841008 0.5656531 0.003791283 0.019207966
## 4 11 0.8844247 0.5656978 0.003857220 0.020019717
prevalence <- 1-table(rescue_data$Outcome.Type)[[1]]/length(rescue_data$Outcome.Type)
prevalence
## [1] 0.2804666
dim(train)
## [1] 15303 9
dim(tune)
## [1] 3279 9
dim(test)
## [1] 3278 9
features <- train[,c(-8)] # dropping outcome type column to get just the explanatory variables in features
target <- train$Outcome.Type
fitControl <- trainControl(method = "repeatedcv", # use repeated cross validation with 5 folds and 3 repeats
number = 5,
repeats = 3,
returnResamp="all",
classProbs = TRUE,
allowParallel = TRUE)
tree.grid <- expand.grid(maxdepth=c(5,7,9,11))
set.seed(1984) # set seed for reproducibility
rescue_mdl <- train(x=features,
y=target,
method="rpart2",#type of model uses maxdepth to select a model
trControl=fitControl,#previously created
tuneGrid=tree.grid,#expanded grid
metric="ROC")#selected on of the metrics available from two variable summary.
rescue_mdl
## CART
##
## 15303 samples
## 8 predictor
## 2 classes: 'Not_Adopted', 'Rescued'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times)
## Summary of sample sizes: 12243, 12243, 12242, 12243, 12241, 12242, ...
## Resampling results across tuning parameters:
##
## maxdepth Accuracy Kappa
## 5 0.7676272 0.348522
## 7 0.7676272 0.348522
## 9 0.7676272 0.348522
## 11 0.7676272 0.348522
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was maxdepth = 5.
# plot(rescue_mdl)
varImp(rescue_mdl)
## rpart2 variable importance
##
## Overall
## Intake.Condition 100.0000
## Age 73.3506
## Animal.Type 64.9296
## days_in_shelter 47.7059
## Intake.Type 36.2561
## Sex 29.6066
## Jurisdiction 0.1684
## Primary.Color 0.0000
predictandCM(rescue_mdl,tune,"raw",tune$Outcome.Type)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Not_Adopted Rescued
## Not_Adopted 2180 619
## Rescued 179 301
##
## Accuracy : 0.7566
## 95% CI : (0.7416, 0.7712)
## No Information Rate : 0.7194
## P-Value [Acc > NIR] : 8.412e-07
##
## Kappa : 0.2942
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.3272
## Specificity : 0.9241
## Pos Pred Value : 0.6271
## Neg Pred Value : 0.7788
## Prevalence : 0.2806
## Detection Rate : 0.0918
## Detection Prevalence : 0.1464
## Balanced Accuracy : 0.6256
##
## 'Positive' Class : Rescued
##
rpart.plot(rescue_mdl$finalModel, type=4,extra=101)
rescue_mdl$results
## maxdepth Accuracy Kappa AccuracySD KappaSD
## 1 5 0.7676272 0.348522 0.007767632 0.04021767
## 2 7 0.7676272 0.348522 0.007767632 0.04021767
## 3 9 0.7676272 0.348522 0.007767632 0.04021767
## 4 11 0.7676272 0.348522 0.007767632 0.04021767
There are not protected classes because the data is on pets.
In this analysis we studied the factors that impacted adoption using data from adoption shelters in Long Beach, California. Our results show that the animals that are adopted tend to be younger, owner-surrendered (as opposed to strays), and of the cat/dog variety (as opposed to a range of wildlife animals included in the database). Wildlife, older animals and those who come in with behavioral issues and illness/injuries are not adopted as often.
Shelters should focus on promoting animals in their shelters that are typically older Our main priority for this exploration is to see which factors contribute the most to an animal not being adopted in order to use this data to promote more instances of adoptions in shelters. This can be done through the shelter marketing older animals that are typically strays and with health issues more frequently. But a select quota of pets that hold a bunch of these “unadoptable” characteristics may continue to not be adopted at all despite a higher amount of promotion. However, there are different ways to handle these animals without them going to a home. Other opportunities for these animals lie in rescues, animal sanctuaries, and more. By doing this, it can help human society in a different way by people coming through and visiting and also through providing the opportunity to help these animals have better lives through volunteer work as well.
One way that this study could be adapted in future work could be through applying different databases in different areas of the world to learn if/how adoption trends change in different places and in different cultures. Databases with more quantitative variables than the one we used from the Long Beach area would be especially useful, as we wanted to try clustering in our data analysis but didn’t have enough quantitative variables (i.e. weight) to do so. One reason to try clustering is because of the possibility of over-fitting with kNN analysis, which we experienced a bit throughout the process.