Link to download the dataset: https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes-10-01-2013-to-05-05-/9t4d-g238/about_data

Goal

The goal of this project is to find rules or patterns by which people adopt different types of anymals from the shelters to drive higher rates of adoption through the correct advertising and preferences changes.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

EDA

data <- read.csv("Austin_Animal_Center_Outcomes_(10_01_2013_to_05_05_2025)_20260127.csv")

str(data)

## 'data.frame':    173775 obs. of  12 variables:
##  $ Animal.ID       : chr  "A668305" "A673335" "A675999" "A679066" ...
##  $ Date.of.Birth   : chr  "2012-12-01" "2012-02-22" "2013-04-03" "2014-04-16" ...
##  $ Name            : chr  "" "" "" "" ...
##  $ DateTime        : chr  "2013-12-02T00:00:00-05:00" "2014-02-22T00:00:00-05:00" "2014-04-07T00:00:00-05:00" "2014-05-16T00:00:00-05:00" ...
##  $ MonthYear       : chr  "12-2013" "02-2014" "04-2014" "05-2014" ...
##  $ Outcome.Type    : chr  "Transfer" "Euthanasia" "Transfer" "" ...
##  $ Outcome.Subtype : chr  "Partner" "Suffering" "Partner" "" ...
##  $ Animal.Type     : chr  "Other" "Other" "Other" "Other" ...
##  $ Sex.upon.Outcome: chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
##  $ Age.upon.Outcome: chr  "1 year" "2 years" "1 year" "4 weeks" ...
##  $ Breed           : chr  "Turtle Mix" "Raccoon" "Turtle Mix" "Rabbit Sh" ...
##  $ Color           : chr  "Brown/Yellow" "Black/Gray" "Green" "Brown" ...

#View(data)

We do not need all the variables in this dataset, for example animal name, id, date are not relevant in my context and will not allow the model to generalise well.

data <- data %>% select(MonthYear, 
                Outcome.Type, 
                Outcome.Subtype, 
                Animal.Type, 
                Sex.upon.Outcome,
                Age.upon.Outcome,
                Breed,
                Color
                )
summary(data)

##   MonthYear         Outcome.Type       Outcome.Subtype    Animal.Type       
##  Length:173775      Length:173775      Length:173775      Length:173775     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Sex.upon.Outcome   Age.upon.Outcome      Breed              Color          
##  Length:173775      Length:173775      Length:173775      Length:173775     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character

Let me handle the data feature by feature.

Adoption Month

Firstly, MonthYear - represents in which month of which year a pet was adopted. Since the goal of this project is to use the results for the future, I am only interested in month metric. So I will categorise the month by its number and factor it.

data <- data %>% separate(MonthYear, into = c("month", "year"), sep = "-", convert = TRUE)
data <- data %>% select(-year)
unique(data$month)

##  [1] 12  2  4  5  6  7  8  9  1  3 10 11

data$month <- factor(data$month)

All months are present, so I do not need to look into that.

sum(is.na(data$month) | data$month == 0)

## [1] 0

No missing values

ggplot(data, aes(x = month)) +
  geom_bar() + 
  labs(
    title = "Pets adoption count in different months in the past 12 years in Austin", 
    y = "number of pets"
  )

Instead of month I actually want to encode season

data$season <- factor(ifelse(data$month %in% c(12,1,2), "Winter",
                        ifelse(data$month %in% c(3,4,5), "Spring",
                        ifelse(data$month %in% c(6,7,8), "Summer", "Fall"))))

Month on its own does not give me much information and may be too granular for my analysis, so I will remove it.

data <- data %>% select(-month)

ggplot(data, aes(x=season)) +
  geom_bar(fill="skyblue") +
  labs(title="Pet adoptions per season")

Outcome for the pet

table(data$Outcome.Type)

## 
##                        Adoption            Died        Disposal      Euthanasia 
##              46           84598            1672             877           10833 
##            Lost         Missing        Relocate Return to Owner       Rto-Adopt 
##               2              92              29           25691            1241 
##          Stolen        Transfer 
##               5           48689

data <- data[data$Outcome.Type != "",]

There are 46 occurances of “” empty sting value, which I wanted to assign to “Other”. However in case of association rules I think this could lead to rules that include this “Other” which will not bring any value to my research. So I will drop those values.

sum(is.na(data$Outcome.Type))

## [1] 0

data$Outcome.Type <- factor(data$Outcome.Type)

Outcome subtype

table(data$Outcome.Subtype)

## 
##                              Aggressive              At Vet                Barn 
##               94071                 613                 357                  16 
##            Behavior Court/Investigation          Customer S                Emer 
##                 176                 103                  21                  11 
##           Emergency             Enroute               Field              Foster 
##                  54                 112                 243               17949 
##           In Foster           In Kennel            In State          In Surgery 
##                 416                 873                  12                  33 
##             Medical             Offsite           Out State             Partner 
##                 355                 512                 931               40410 
##      Possible Theft                 Prc         Rabies Risk                SCRP 
##                  16                  20                4922                3211 
##                 Snr           Suffering            Underage 
##                4101                4154                  37

data$Outcome.Subtype[data$Outcome.Subtype == ""] <- "Other"
data$Outcome.Subtype = factor(data$Outcome.Subtype)

There are a lot of empty outcome subtypes (over 90.000 entries). Initially I wanted to remove this feature as we already have outcome type and this might not be so useful. However not that I see what kind of data there is actually inside I think this will be very useful for some of the association rules. So I am leaving it.

sum(is.na(data$Outcome.Subtype))

## [1] 0

Animal type

table(data$Animal.Type)

## 
##      Bird       Cat       Dog Livestock     Other 
##       876     69390     94475        34      8954

sum(is.na(data$Animal.Type))

## [1] 0

data$Animal.Type <- factor(data$Animal.Type)

Sex

table(data$Sex.upon.Outcome)

## 
## Intact Female   Intact Male Neutered Male          NULL Spayed Female 
##         21802         22255         60923             1         55253 
##       Unknown 
##         13495

data <- data[data$Sex.upon.Outcome != "Unknown" & data$Sex.upon.Outcome != "NULL",]

Even though I have substantial number of rows that are unkown , for the same reason I described above, I will drop such rows. Here I might have done clustering and just assign values from the cluster that the row “Unknown” was assigned to. But since the project is about association rules, I will not include this in here.

sum(is.na(data$Sex.upon.Outcome))

## [1] 0

I want to divide the sex feature and if the animal has undergone castration (were “fixed”) feature:

data <- data %>%
  mutate(
    sex = if_else(str_detect(Sex.upon.Outcome, "Female"), "Female", "Male"),
    
    if_fixed = if_else(str_detect(Sex.upon.Outcome, "Intact"), "Intact", "Fixed")
  ) %>%
  select(-Sex.upon.Outcome)

data$sex <- factor(data$sex)
data$if_fixed <- factor(data$if_fixed)

Age

table(data$Age.upon.Outcome)

## 
##  -1 years  -2 years  -3 years  -4 years   0 years     1 day   1 month    1 week 
##         5         1         2         1       114       245      7211       508 
##   1 weeks    1 year 10 months  10 years 11 months  11 years  12 years  13 years 
##       947     24924      2379      2360      1096       923      1156       739 
##  14 years  15 years  16 years  17 years  18 years  19 years    2 days  2 months 
##       533       459       199       107        64        29       300     23128 
##   2 weeks   2 years  20 years  22 years  23 years  24 years  25 years  28 years 
##      2115     24617        24         5         1         1         2         1 
##    3 days  3 months   3 weeks   3 years  30 years    4 days  4 months   4 weeks 
##       340      9012      2404      9453         1       195      5913      1814 
##   4 years    5 days  5 months   5 weeks   5 years    6 days  6 months   6 years 
##      5719       123      4164       193      5198       185      3727      3401 
##  7 months   7 years  8 months   8 years  9 months   9 years      NULL 
##      2236      3000      2629      2947      1725      1654         4

data <- data %>%
  filter(!Age.upon.Outcome %in% 
           c("-1 years", "-2 years", "-3 years", "-4 years", "0 years"))

There are some rows that are below 0 years which is impossible, I manually checked those rows and indeed the data for birth of the animal or the adoption date were input incorrectly. Luckily, we only have 9 rows like that. I will also remove rows with “0 year” as upon manually checking in the table those are also put by mistake, as adoption happens before birth.

This variables is very messy as there are entries in different units like days, years, weeks. I will clean up this data and then bin the data into bins of “Baby”, “Young”, “Adult”, “Senior”

data <- data %>% 
  filter(!is.na(Age.upon.Outcome), 
         !grepl("-", Age.upon.Outcome),
         Age.upon.Outcome != "NULL")

convert_to_years <- function(age_str) {
  parts <- unlist(strsplit(age_str, " "))
  value <- as.numeric(parts[1])
  unit <- parts[2]
  
  if (grepl("day", unit)) return(value / 365)
  if (grepl("week", unit)) return(value / 52)
  if (grepl("month", unit)) return(value / 12)
  if (grepl("year", unit)) return(value)
  return(NA)
}

data$age_numeric <- sapply(data$Age.upon.Outcome, convert_to_years)

# life stages: Baby (<0.5yr), Young (0.5-2), Adult (2-7), Senior (>7)
data$age_group <- cut(data$age_numeric, 
                          breaks = c(-Inf, 0.5, 2, 7, Inf), 
                          labels = c("Baby", "Young", "Adult", "Senior"))

data <- data %>% select(-Age.upon.Outcome, -age_numeric)

table(data$age_group)

## 
##   Baby  Young  Adult Senior 
##  62524  59606  26771  11205

data$age_group <- factor(data$age_group)

Breed

table(data$Breed)[1:20]

## 
##                           Abyssinian                       Abyssinian Mix 
##                                   13                                   14 
##                        Affenpinscher                    Affenpinscher Mix 
##                                    1                                   12 
##                     Afghan Hound Mix         Afghan Hound/German Shepherd 
##                                    1                                    1 
##      Afghan Hound/Labrador Retriever                     Airedale Terrier 
##                                    1                                    4 
##                 Airedale Terrier Mix       Airedale Terrier/Cairn Terrier 
##                                   35                                    1 
##       Airedale Terrier/Irish Terrier  Airedale Terrier/Labrador Retriever 
##                                    1                                    2 
## Airedale Terrier/Miniature Schnauzer          Airedale Terrier/Otterhound 
##                                    1                                    2 
##     Airedale Terrier/Standard Poodle                               Akbash 
##                                    1                                    2 
##                           Akbash Mix                Akbash/Great Pyrenees 
##                                   10                                    1 
##                                Akita                            Akita Mix 
##                                   37                                   76

Since there are so many of them, I will keep the top 20 and the rest I will assign to “Other” , in this case this informtation will be useful for me:

sum(is.na(data$Breed))

## [1] 0

data <- data %>%
  mutate(
    breed_group = fct_lump_n(Breed, n = 20, other_level = "Other Breed")
  ) %>%
  select(-Breed)

data$breed_group <- factor(data$breed_group)

Color

table(data$Color)[1:20]

## 
##                      Agouti          Agouti/Brown Tabby 
##                          33                           2 
##                 Agouti/Gold                 Agouti/Gray 
##                           1                           1 
##                Agouti/White                     Apricot 
##                           3                          98 
##               Apricot/Brown            Apricot/Tricolor 
##                           3                           2 
##               Apricot/White                       Black 
##                          16                       13117 
##               Black Brindle         Black Brindle/Black 
##                         179                          12 
##          Black Brindle/Blue     Black Brindle/Blue Tick 
##                           1                           2 
##         Black Brindle/Brown Black Brindle/Brown Brindle 
##                          40                           4 
##           Black Brindle/Tan         Black Brindle/White 
##                           4                         330 
##                 Black Smoke           Black Smoke/Black 
##                         237                           1

Since there are also so many of the colors, I will only keep top 20

sum(is.na(data$Color))

## [1] 0

data <- data %>%
  mutate(
    color_group = fct_lump_n(Color, n = 20, other_level = "Other Color")
  ) %>%
  select(-Color)

Assocuation rules - Apriori

names(data)

## [1] "Outcome.Type"    "Outcome.Subtype" "Animal.Type"     "season"         
## [5] "sex"             "if_fixed"        "age_group"       "breed_group"    
## [9] "color_group"

For the analysis of the Outcome type I need to remove the variable Outcome.Subtype, since they are very correlated. Same goes to Month variable, I will remove it as it is implicating that the animal was already adopted, which is not useful for my results. I might use it for later analysis.

data_sel <- data %>% select(
                            Outcome.Type, 
                            Animal.Type, 
                            sex, 
                            if_fixed, 
                            age_group, 
                            breed_group, 
                            color_group, 
                            season)

write.csv(data_sel, file = "data.csv", row.names = FALSE)

#install.packages("arules")
library(arules)

## Warning: package 'arules' was built under R version 4.5.2

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## 
## Attaching package: 'arules'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

trans1<-read.transactions("data.csv", format="basket", sep=",", skip=0) # reading the file as transactions

inspect(trans1[1:10])

##      items                     
## [1]  {age_group,               
##       Animal.Type,             
##       breed_group,             
##       color_group,             
##       if_fixed,                
##       Outcome.Type,            
##       season,                  
##       sex}                     
## [2]  {Adoption,                
##       Baby,                    
##       Dog,                     
##       Female,                  
##       Fixed,                   
##       Other Breed,             
##       Other Color,             
##       Summer}                  
## [3]  {Baby,                    
##       Bird,                    
##       Black,                   
##       Intact,                  
##       Male,                    
##       Other Breed,             
##       Summer,                  
##       Transfer}                
## [4]  {Brown/White,             
##       Chihuahua Shorthair Mix, 
##       Dog,                     
##       Intact,                  
##       Male,                    
##       Transfer,                
##       Winter,                  
##       Young}                   
## [5]  {Bird,                    
##       Intact,                  
##       Male,                    
##       Other Breed,             
##       Other Color,             
##       Transfer,                
##       Winter,                  
##       Young}                   
## [6]  {Dog,                     
##       Intact,                  
##       Male,                    
##       Other Breed,             
##       Tan,                     
##       Transfer,                
##       Winter,                  
##       Young}                   
## [7]  {Baby,                    
##       Cat,                     
##       Domestic Shorthair Mix,  
##       Female,                  
##       Intact,                  
##       Other Color,             
##       Transfer,                
##       Winter}                  
## [8]  {Baby,                    
##       Cat,                     
##       Domestic Longhair Mix,   
##       Female,                  
##       Intact,                  
##       Other Color,             
##       Transfer,                
##       Winter}                  
## [9]  {Adult,                   
##       Dog,                     
##       Fixed,                   
##       Male,                    
##       Other Breed,             
##       Other Color,             
##       Return to Owner,         
##       Winter}                  
## [10] {Adult,                   
##       Chihuahua Shorthair Mix, 
##       Dog,                     
##       Fixed,                   
##       Male,                    
##       Other Color,             
##       Transfer,                
##       Winter}

size(trans1)[1:10]

##  [1] 8 8 8 8 8 8 8 8 8 8

length(trans1)

## [1] 160107

itemLabels(trans1)

##  [1] "Adoption"                  "Adult"                    
##  [3] "age_group"                 "Animal.Type"              
##  [5] "Australian Cattle Dog Mix" "Baby"                     
##  [7] "Bird"                      "Black"                    
##  [9] "Black/Brown"               "Black/Tan"                
## [11] "Black/White"               "Blue"                     
## [13] "Blue/White"                "Border Collie Mix"        
## [15] "Boxer Mix"                 "breed_group"              
## [17] "Brown"                     "Brown Brindle/White"      
## [19] "Brown Tabby"               "Brown Tabby/White"        
## [21] "Brown/White"               "Calico"                   
## [23] "Cat"                       "Chihuahua Shorthair"      
## [25] "Chihuahua Shorthair Mix"   "color_group"              
## [27] "Dachshund Mix"             "Died"                     
## [29] "Disposal"                  "Dog"                      
## [31] "Domestic Longhair Mix"     "Domestic Medium Hair"     
## [33] "Domestic Medium Hair Mix"  "Domestic Shorthair"       
## [35] "Domestic Shorthair Mix"    "Euthanasia"               
## [37] "Fall"                      "Female"                   
## [39] "Fixed"                     "German Shepherd"          
## [41] "German Shepherd Mix"       "if_fixed"                 
## [43] "Intact"                    "Labrador Retriever"       
## [45] "Labrador Retriever Mix"    "Livestock"                
## [47] "Lost"                      "Male"                     
## [49] "Miniature Poodle Mix"      "Missing"                  
## [51] "Orange Tabby"              "Other"                    
## [53] "Other Breed"               "Other Color"              
## [55] "Outcome.Type"              "Pit Bull"                 
## [57] "Pit Bull Mix"              "Relocate"                 
## [59] "Return to Owner"           "Rto-Adopt"                
## [61] "season"                    "Senior"                   
## [63] "sex"                       "Siamese Mix"              
## [65] "Siberian Husky Mix"        "Spring"                   
## [67] "Stolen"                    "Summer"                   
## [69] "Tan"                       "Tan/White"                
## [71] "Tortie"                    "Transfer"                 
## [73] "Tricolor"                  "White"                    
## [75] "White/Black"               "White/Brown"              
## [77] "Winter"                    "Young"

round(itemFrequency(trans1),3)

##                  Adoption                     Adult                 age_group 
##                     0.526                     0.167                     0.000 
##               Animal.Type Australian Cattle Dog Mix                      Baby 
##                     0.000                     0.012                     0.391 
##                      Bird                     Black               Black/Brown 
##                     0.002                     0.082                     0.021 
##                 Black/Tan               Black/White                      Blue 
##                     0.023                     0.106                     0.019 
##                Blue/White         Border Collie Mix                 Boxer Mix 
##                     0.029                     0.007                     0.007 
##               breed_group                     Brown       Brown Brindle/White 
##                     0.000                     0.025                     0.018 
##               Brown Tabby         Brown Tabby/White               Brown/White 
##                     0.061                     0.032                     0.034 
##                    Calico                       Cat       Chihuahua Shorthair 
##                     0.018                     0.402                     0.012 
##   Chihuahua Shorthair Mix               color_group             Dachshund Mix 
##                     0.043                     0.000                     0.007 
##                      Died                  Disposal                       Dog 
##                     0.008                     0.001                     0.585 
##     Domestic Longhair Mix      Domestic Medium Hair  Domestic Medium Hair Mix 
##                     0.010                     0.013                     0.020 
##        Domestic Shorthair    Domestic Shorthair Mix                Euthanasia 
##                     0.139                     0.194                     0.028 
##                      Fall                    Female                     Fixed 
##                     0.258                     0.481                     0.725 
##           German Shepherd       German Shepherd Mix                  if_fixed 
##                     0.011                     0.025                     0.000 
##                    Intact        Labrador Retriever    Labrador Retriever Mix 
##                     0.275                     0.012                     0.054 
##                 Livestock                      Lost                      Male 
##                     0.000                     0.000                     0.519 
##      Miniature Poodle Mix                   Missing              Orange Tabby 
##                     0.006                     0.001                     0.030 
##                     Other               Other Breed               Other Color 
##                     0.011                     0.327                     0.319 
##              Outcome.Type                  Pit Bull              Pit Bull Mix 
##                     0.000                     0.021                     0.063 
##                  Relocate           Return to Owner                 Rto-Adopt 
##                     0.000                     0.159                     0.008 
##                    season                    Senior                       sex 
##                     0.000                     0.070                     0.000 
##               Siamese Mix        Siberian Husky Mix                    Spring 
##                     0.009                     0.006                     0.227 
##                    Stolen                    Summer                       Tan 
##                     0.000                     0.281                     0.026 
##                 Tan/White                    Tortie                  Transfer 
##                     0.031                     0.019                     0.269 
##                  Tricolor                     White               White/Black 
##                     0.022                     0.036                     0.029 
##               White/Brown                    Winter                     Young 
##                     0.020                     0.233                     0.372

Since I have 160.000 rows I will sample the data to not crush the RStudio

set.seed(123)
trans_sample <- trans1[sample(1:length(trans1), 30000)]  # 30k transactions

unique(data$Outcome.Type)

##  [1] Adoption        Transfer        Return to Owner Missing        
##  [5] Died            Euthanasia      Rto-Adopt       Disposal       
##  [9] Relocate        Stolen          Lost           
## 11 Levels: Adoption Died Disposal Euthanasia Lost Missing ... Transfer

Now let me find out what is the profile of the Animls that get adopted.

rules.Adoption<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Adoption"), control=list(verbose=F)) 

rules.Adoption.byconf<-sort(rules.Adoption, by="confidence", decreasing=TRUE)
inspect(head(rules.Adoption.byconf))

##     lhs                        rhs            support confidence    coverage     lift count
## [1] {Baby,                                                                                 
##      Fixed,                                                                                
##      Siamese Mix,                                                                          
##      Summer}                => {Adoption} 0.001000000          1 0.001000000 1.902708    30
## [2] {Baby,                                                                                 
##      Domestic Medium Hair,                                                                 
##      Fall,                                                                                 
##      Fixed}                 => {Adoption} 0.002033333          1 0.002033333 1.902708    61
## [3] {Baby,                                                                                 
##      Domestic Medium Hair,                                                                 
##      Fixed,                                                                                
##      Other Color}           => {Adoption} 0.001966667          1 0.001966667 1.902708    59
## [4] {Baby,                                                                                 
##      Calico,                                                                               
##      Fixed,                                                                                
##      Winter}                => {Adoption} 0.001233333          1 0.001233333 1.902708    37
## [5] {Baby,                                                                                 
##      Blue/White,                                                                           
##      Domestic Shorthair,                                                                   
##      Fixed}                 => {Adoption} 0.001233333          1 0.001233333 1.902708    37
## [6] {Baby,                                                                                 
##      Fall,                                                                                 
##      Fixed,                                                                                
##      Tan/White}             => {Adoption} 0.001266667          1 0.001266667 1.902708    38

So, it seems like mostly people adopt Baby animals. The profile with the biggest support and coverage is {Baby, Domestic Medium Hair, Fixed, Other Color}. Lifet is 1.9 which indicates the positive relationship between the Adoption and this profile. I can tell the same about the rest of the 5 profiles. also it looks like Cats appear in the profile more than other animals.

What about the animals that return to owners?

rules.RTO<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Return to Owner"), control=list(verbose=F)) 

rules.RTO.byconf<-sort(rules.RTO, by="confidence", decreasing=TRUE)
inspect(head(rules.RTO.byconf))

##     lhs                     rhs                   support confidence    coverage     lift count
## [1] {German Shepherd,                                                                          
##      Intact,                                                                                   
##      Male,                                                                                     
##      Young}              => {Return to Owner} 0.001000000  0.8571429 0.001166667 5.438724    30
## [2] {Dog,                                                                                      
##      German Shepherd,                                                                          
##      Intact,                                                                                   
##      Male,                                                                                     
##      Young}              => {Return to Owner} 0.001000000  0.8571429 0.001166667 5.438724    30
## [3] {Intact,                                                                                   
##      Siberian Husky Mix} => {Return to Owner} 0.001066667  0.7804878 0.001366667 4.952334    32
## [4] {Dog,                                                                                      
##      Intact,                                                                                   
##      Siberian Husky Mix} => {Return to Owner} 0.001066667  0.7804878 0.001366667 4.952334    32
## [5] {German Shepherd,                                                                          
##      Intact,                                                                                   
##      Young}              => {Return to Owner} 0.001200000  0.7659574 0.001566667 4.860136    36
## [6] {Dog,                                                                                      
##      German Shepherd,                                                                          
##      Intact,                                                                                   
##      Young}              => {Return to Owner} 0.001200000  0.7659574 0.001566667 4.860136    36

It is very clear that German Shepherd, Young Intact Male, return to owner the most. For me this raises a question if this is because Intact Male German Shepherds also run away from the owners the most or there are other reason for that. Lift for both of the profiles is very high (over 5), so we can be confident that there is a positive relationship between this profile of German Shepherd, Young Intact Male and Return to Owner outcome

Now I want to find out what animals sadly are Euthanised

rules.Euthanasia<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Euthanasia"), control=list(verbose=F)) 

rules.Euthanasia.byconf<-sort(rules.Euthanasia, by="confidence", decreasing=TRUE)
inspect(head(rules.Euthanasia.byconf))

##     lhs                            rhs          support     confidence
## [1] {Adult, Cat, Intact}        => {Euthanasia} 0.001033333 0.2183099 
## [2] {Female, Intact, Senior}    => {Euthanasia} 0.001166667 0.1832461 
## [3] {Intact, Senior}            => {Euthanasia} 0.002366667 0.1779449 
## [4] {Intact, Male, Senior}      => {Euthanasia} 0.001200000 0.1730769 
## [5] {Cat, Intact, Male, Winter} => {Euthanasia} 0.001000000 0.1595745 
## [6] {Cat, Intact, Male, Young}  => {Euthanasia} 0.001466667 0.1456954 
##     coverage    lift     count
## [1] 0.004733333 7.938540 31   
## [2] 0.006366667 6.663494 35   
## [3] 0.013300000 6.470722 71   
## [4] 0.006933333 6.293706 36   
## [5] 0.006266667 5.802708 30   
## [6] 0.010066667 5.298013 44

Mostly seniors and adults, also very often cats. Unfortunately young cats as well. And all of the profiles are supported by a very high lift (almost 8) and quite high confidence which proves the positive relationship.

Now I want to analyse what are the profiles of animals that get transfered.

rules.Transfer<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Transfer"), control=list(verbose=F)) 

rules.Transfer.byconf<-sort(rules.Transfer, by="confidence", decreasing=TRUE)
inspect(head(rules.Transfer.byconf))

##     lhs                      rhs            support confidence    coverage     lift count
## [1] {Brown Tabby,                                                                        
##      Intact,                                                                             
##      Summer,                                                                             
##      Young}               => {Transfer} 0.001100000          1 0.001100000 3.700962    33
## [2] {Brown Tabby,                                                                        
##      Domestic Shorthair,                                                                 
##      Female,                                                                             
##      Intact,                                                                             
##      Summer}              => {Transfer} 0.001400000          1 0.001400000 3.700962    42
## [3] {Brown Tabby,                                                                        
##      Cat,                                                                                
##      Intact,                                                                             
##      Summer,                                                                             
##      Young}               => {Transfer} 0.001100000          1 0.001100000 3.700962    33
## [4] {Baby,                                                                               
##      Brown Tabby,                                                                        
##      Domestic Shorthair,                                                                 
##      Female,                                                                             
##      Intact,                                                                             
##      Summer}              => {Transfer} 0.001066667          1 0.001066667 3.700962    32
## [5] {Brown Tabby,                                                                        
##      Cat,                                                                                
##      Domestic Shorthair,                                                                 
##      Female,                                                                             
##      Intact,                                                                             
##      Summer}              => {Transfer} 0.001400000          1 0.001400000 3.700962    42
## [6] {Baby,                                                                               
##      Brown Tabby,                                                                        
##      Cat,                                                                                
##      Domestic Shorthair,                                                                 
##      Female,                                                                             
##      Intact,                                                                             
##      Summer}              => {Transfer} 0.001066667          1 0.001066667 3.700962    32

Interesting observation that Brown Tabby intact animals, cats mostly - is the biggest profile with a solid lift and confidence of 1 !! which is huge.

Assocuation rules - ECLAT

Let me find a frequent itemset with a min suppot of 0.05

itemsets <- eclat(trans_sample, parameter = list(supp = 0.05, maxlen = 15))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.05      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 1500 
## 
## create itemset ... 
## set transactions ...[70 item(s), 30000 transaction(s)] done [0.01s].
## sorting and recoding items ... [26 item(s)] done [0.00s].
## creating bit matrix ... [26 row(s), 30000 column(s)] done [0.00s].
## writing  ... [517 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

itemsets_sorted <- sort(itemsets, by = "support")
inspect(head(itemsets_sorted, 10))

##      items             support   count
## [1]  {Fixed}           0.7242333 21727
## [2]  {Dog}             0.5852667 17558
## [3]  {Adoption}        0.5255667 15767
## [4]  {Male}            0.5163333 15490
## [5]  {Adoption, Fixed} 0.5044667 15134
## [6]  {Female}          0.4836667 14510
## [7]  {Dog, Fixed}      0.4485667 13457
## [8]  {Cat}             0.4009667 12029
## [9]  {Baby}            0.3918667 11756
## [10] {Fixed, Male}     0.3784333 11353

Let me plot the scatter plot for Rules of adoption and how it reflects on confidence, support, lift chart.

# install.packages("arulesViz")
library(arulesViz)

plot(rules.Adoption, method = "scatterplot", engine = "ggplot2")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

# --- EDA: Top Items Frequency ---
itemFrequencyPlot(trans_sample, topN = 20, type = "absolute", 
                  col = "steelblue", main = "Top 20 Most Frequent Animal Attributes")

[USL] Association rules project - Kamilla Tadjibaeva

Kamilla Tadjibaeva