Link to download the dataset: https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes-10-01-2013-to-05-05-/9t4d-g238/about_data
The goal of this project is to find rules or patterns by which people adopt different types of anymals from the shelters to drive higher rates of adoption through the correct advertising and preferences changes.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read.csv("Austin_Animal_Center_Outcomes_(10_01_2013_to_05_05_2025)_20260127.csv")
str(data)
## 'data.frame': 173775 obs. of 12 variables:
## $ Animal.ID : chr "A668305" "A673335" "A675999" "A679066" ...
## $ Date.of.Birth : chr "2012-12-01" "2012-02-22" "2013-04-03" "2014-04-16" ...
## $ Name : chr "" "" "" "" ...
## $ DateTime : chr "2013-12-02T00:00:00-05:00" "2014-02-22T00:00:00-05:00" "2014-04-07T00:00:00-05:00" "2014-05-16T00:00:00-05:00" ...
## $ MonthYear : chr "12-2013" "02-2014" "04-2014" "05-2014" ...
## $ Outcome.Type : chr "Transfer" "Euthanasia" "Transfer" "" ...
## $ Outcome.Subtype : chr "Partner" "Suffering" "Partner" "" ...
## $ Animal.Type : chr "Other" "Other" "Other" "Other" ...
## $ Sex.upon.Outcome: chr "Unknown" "Unknown" "Unknown" "Unknown" ...
## $ Age.upon.Outcome: chr "1 year" "2 years" "1 year" "4 weeks" ...
## $ Breed : chr "Turtle Mix" "Raccoon" "Turtle Mix" "Rabbit Sh" ...
## $ Color : chr "Brown/Yellow" "Black/Gray" "Green" "Brown" ...
#View(data)
We do not need all the variables in this dataset, for example animal name, id, date are not relevant in my context and will not allow the model to generalise well.
data <- data %>% select(MonthYear,
Outcome.Type,
Outcome.Subtype,
Animal.Type,
Sex.upon.Outcome,
Age.upon.Outcome,
Breed,
Color
)
summary(data)
## MonthYear Outcome.Type Outcome.Subtype Animal.Type
## Length:173775 Length:173775 Length:173775 Length:173775
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Sex.upon.Outcome Age.upon.Outcome Breed Color
## Length:173775 Length:173775 Length:173775 Length:173775
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
Let me handle the data feature by feature.
Firstly, MonthYear - represents in which month of which year a pet was adopted. Since the goal of this project is to use the results for the future, I am only interested in month metric. So I will categorise the month by its number and factor it.
data <- data %>% separate(MonthYear, into = c("month", "year"), sep = "-", convert = TRUE)
data <- data %>% select(-year)
unique(data$month)
## [1] 12 2 4 5 6 7 8 9 1 3 10 11
data$month <- factor(data$month)
All months are present, so I do not need to look into that.
sum(is.na(data$month) | data$month == 0)
## [1] 0
No missing values
ggplot(data, aes(x = month)) +
geom_bar() +
labs(
title = "Pets adoption count in different months in the past 12 years in Austin",
y = "number of pets"
)
Instead of month I actually want to encode season
data$season <- factor(ifelse(data$month %in% c(12,1,2), "Winter",
ifelse(data$month %in% c(3,4,5), "Spring",
ifelse(data$month %in% c(6,7,8), "Summer", "Fall"))))
Month on its own does not give me much information and may be too granular for my analysis, so I will remove it.
data <- data %>% select(-month)
ggplot(data, aes(x=season)) +
geom_bar(fill="skyblue") +
labs(title="Pet adoptions per season")
table(data$Outcome.Type)
##
## Adoption Died Disposal Euthanasia
## 46 84598 1672 877 10833
## Lost Missing Relocate Return to Owner Rto-Adopt
## 2 92 29 25691 1241
## Stolen Transfer
## 5 48689
data <- data[data$Outcome.Type != "",]
There are 46 occurances of “” empty sting value, which I wanted to assign to “Other”. However in case of association rules I think this could lead to rules that include this “Other” which will not bring any value to my research. So I will drop those values.
sum(is.na(data$Outcome.Type))
## [1] 0
data$Outcome.Type <- factor(data$Outcome.Type)
table(data$Outcome.Subtype)
##
## Aggressive At Vet Barn
## 94071 613 357 16
## Behavior Court/Investigation Customer S Emer
## 176 103 21 11
## Emergency Enroute Field Foster
## 54 112 243 17949
## In Foster In Kennel In State In Surgery
## 416 873 12 33
## Medical Offsite Out State Partner
## 355 512 931 40410
## Possible Theft Prc Rabies Risk SCRP
## 16 20 4922 3211
## Snr Suffering Underage
## 4101 4154 37
data$Outcome.Subtype[data$Outcome.Subtype == ""] <- "Other"
data$Outcome.Subtype = factor(data$Outcome.Subtype)
There are a lot of empty outcome subtypes (over 90.000 entries). Initially I wanted to remove this feature as we already have outcome type and this might not be so useful. However not that I see what kind of data there is actually inside I think this will be very useful for some of the association rules. So I am leaving it.
sum(is.na(data$Outcome.Subtype))
## [1] 0
table(data$Animal.Type)
##
## Bird Cat Dog Livestock Other
## 876 69390 94475 34 8954
sum(is.na(data$Animal.Type))
## [1] 0
data$Animal.Type <- factor(data$Animal.Type)
table(data$Sex.upon.Outcome)
##
## Intact Female Intact Male Neutered Male NULL Spayed Female
## 21802 22255 60923 1 55253
## Unknown
## 13495
data <- data[data$Sex.upon.Outcome != "Unknown" & data$Sex.upon.Outcome != "NULL",]
Even though I have substantial number of rows that are unkown , for the same reason I described above, I will drop such rows. Here I might have done clustering and just assign values from the cluster that the row “Unknown” was assigned to. But since the project is about association rules, I will not include this in here.
sum(is.na(data$Sex.upon.Outcome))
## [1] 0
I want to divide the sex feature and if the animal has undergone castration (were “fixed”) feature:
data <- data %>%
mutate(
sex = if_else(str_detect(Sex.upon.Outcome, "Female"), "Female", "Male"),
if_fixed = if_else(str_detect(Sex.upon.Outcome, "Intact"), "Intact", "Fixed")
) %>%
select(-Sex.upon.Outcome)
data$sex <- factor(data$sex)
data$if_fixed <- factor(data$if_fixed)
table(data$Age.upon.Outcome)
##
## -1 years -2 years -3 years -4 years 0 years 1 day 1 month 1 week
## 5 1 2 1 114 245 7211 508
## 1 weeks 1 year 10 months 10 years 11 months 11 years 12 years 13 years
## 947 24924 2379 2360 1096 923 1156 739
## 14 years 15 years 16 years 17 years 18 years 19 years 2 days 2 months
## 533 459 199 107 64 29 300 23128
## 2 weeks 2 years 20 years 22 years 23 years 24 years 25 years 28 years
## 2115 24617 24 5 1 1 2 1
## 3 days 3 months 3 weeks 3 years 30 years 4 days 4 months 4 weeks
## 340 9012 2404 9453 1 195 5913 1814
## 4 years 5 days 5 months 5 weeks 5 years 6 days 6 months 6 years
## 5719 123 4164 193 5198 185 3727 3401
## 7 months 7 years 8 months 8 years 9 months 9 years NULL
## 2236 3000 2629 2947 1725 1654 4
data <- data %>%
filter(!Age.upon.Outcome %in%
c("-1 years", "-2 years", "-3 years", "-4 years", "0 years"))
There are some rows that are below 0 years which is impossible, I manually checked those rows and indeed the data for birth of the animal or the adoption date were input incorrectly. Luckily, we only have 9 rows like that. I will also remove rows with “0 year” as upon manually checking in the table those are also put by mistake, as adoption happens before birth.
This variables is very messy as there are entries in different units like days, years, weeks. I will clean up this data and then bin the data into bins of “Baby”, “Young”, “Adult”, “Senior”
data <- data %>%
filter(!is.na(Age.upon.Outcome),
!grepl("-", Age.upon.Outcome),
Age.upon.Outcome != "NULL")
convert_to_years <- function(age_str) {
parts <- unlist(strsplit(age_str, " "))
value <- as.numeric(parts[1])
unit <- parts[2]
if (grepl("day", unit)) return(value / 365)
if (grepl("week", unit)) return(value / 52)
if (grepl("month", unit)) return(value / 12)
if (grepl("year", unit)) return(value)
return(NA)
}
data$age_numeric <- sapply(data$Age.upon.Outcome, convert_to_years)
# life stages: Baby (<0.5yr), Young (0.5-2), Adult (2-7), Senior (>7)
data$age_group <- cut(data$age_numeric,
breaks = c(-Inf, 0.5, 2, 7, Inf),
labels = c("Baby", "Young", "Adult", "Senior"))
data <- data %>% select(-Age.upon.Outcome, -age_numeric)
table(data$age_group)
##
## Baby Young Adult Senior
## 62524 59606 26771 11205
data$age_group <- factor(data$age_group)
table(data$Breed)[1:20]
##
## Abyssinian Abyssinian Mix
## 13 14
## Affenpinscher Affenpinscher Mix
## 1 12
## Afghan Hound Mix Afghan Hound/German Shepherd
## 1 1
## Afghan Hound/Labrador Retriever Airedale Terrier
## 1 4
## Airedale Terrier Mix Airedale Terrier/Cairn Terrier
## 35 1
## Airedale Terrier/Irish Terrier Airedale Terrier/Labrador Retriever
## 1 2
## Airedale Terrier/Miniature Schnauzer Airedale Terrier/Otterhound
## 1 2
## Airedale Terrier/Standard Poodle Akbash
## 1 2
## Akbash Mix Akbash/Great Pyrenees
## 10 1
## Akita Akita Mix
## 37 76
Since there are so many of them, I will keep the top 20 and the rest I will assign to “Other” , in this case this informtation will be useful for me:
sum(is.na(data$Breed))
## [1] 0
data <- data %>%
mutate(
breed_group = fct_lump_n(Breed, n = 20, other_level = "Other Breed")
) %>%
select(-Breed)
data$breed_group <- factor(data$breed_group)
table(data$Color)[1:20]
##
## Agouti Agouti/Brown Tabby
## 33 2
## Agouti/Gold Agouti/Gray
## 1 1
## Agouti/White Apricot
## 3 98
## Apricot/Brown Apricot/Tricolor
## 3 2
## Apricot/White Black
## 16 13117
## Black Brindle Black Brindle/Black
## 179 12
## Black Brindle/Blue Black Brindle/Blue Tick
## 1 2
## Black Brindle/Brown Black Brindle/Brown Brindle
## 40 4
## Black Brindle/Tan Black Brindle/White
## 4 330
## Black Smoke Black Smoke/Black
## 237 1
Since there are also so many of the colors, I will only keep top 20
sum(is.na(data$Color))
## [1] 0
data <- data %>%
mutate(
color_group = fct_lump_n(Color, n = 20, other_level = "Other Color")
) %>%
select(-Color)
names(data)
## [1] "Outcome.Type" "Outcome.Subtype" "Animal.Type" "season"
## [5] "sex" "if_fixed" "age_group" "breed_group"
## [9] "color_group"
For the analysis of the Outcome type I need to remove the variable Outcome.Subtype, since they are very correlated. Same goes to Month variable, I will remove it as it is implicating that the animal was already adopted, which is not useful for my results. I might use it for later analysis.
data_sel <- data %>% select(
Outcome.Type,
Animal.Type,
sex,
if_fixed,
age_group,
breed_group,
color_group,
season)
write.csv(data_sel, file = "data.csv", row.names = FALSE)
#install.packages("arules")
library(arules)
## Warning: package 'arules' was built under R version 4.5.2
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
trans1<-read.transactions("data.csv", format="basket", sep=",", skip=0) # reading the file as transactions
inspect(trans1[1:10])
## items
## [1] {age_group,
## Animal.Type,
## breed_group,
## color_group,
## if_fixed,
## Outcome.Type,
## season,
## sex}
## [2] {Adoption,
## Baby,
## Dog,
## Female,
## Fixed,
## Other Breed,
## Other Color,
## Summer}
## [3] {Baby,
## Bird,
## Black,
## Intact,
## Male,
## Other Breed,
## Summer,
## Transfer}
## [4] {Brown/White,
## Chihuahua Shorthair Mix,
## Dog,
## Intact,
## Male,
## Transfer,
## Winter,
## Young}
## [5] {Bird,
## Intact,
## Male,
## Other Breed,
## Other Color,
## Transfer,
## Winter,
## Young}
## [6] {Dog,
## Intact,
## Male,
## Other Breed,
## Tan,
## Transfer,
## Winter,
## Young}
## [7] {Baby,
## Cat,
## Domestic Shorthair Mix,
## Female,
## Intact,
## Other Color,
## Transfer,
## Winter}
## [8] {Baby,
## Cat,
## Domestic Longhair Mix,
## Female,
## Intact,
## Other Color,
## Transfer,
## Winter}
## [9] {Adult,
## Dog,
## Fixed,
## Male,
## Other Breed,
## Other Color,
## Return to Owner,
## Winter}
## [10] {Adult,
## Chihuahua Shorthair Mix,
## Dog,
## Fixed,
## Male,
## Other Color,
## Transfer,
## Winter}
size(trans1)[1:10]
## [1] 8 8 8 8 8 8 8 8 8 8
length(trans1)
## [1] 160107
itemLabels(trans1)
## [1] "Adoption" "Adult"
## [3] "age_group" "Animal.Type"
## [5] "Australian Cattle Dog Mix" "Baby"
## [7] "Bird" "Black"
## [9] "Black/Brown" "Black/Tan"
## [11] "Black/White" "Blue"
## [13] "Blue/White" "Border Collie Mix"
## [15] "Boxer Mix" "breed_group"
## [17] "Brown" "Brown Brindle/White"
## [19] "Brown Tabby" "Brown Tabby/White"
## [21] "Brown/White" "Calico"
## [23] "Cat" "Chihuahua Shorthair"
## [25] "Chihuahua Shorthair Mix" "color_group"
## [27] "Dachshund Mix" "Died"
## [29] "Disposal" "Dog"
## [31] "Domestic Longhair Mix" "Domestic Medium Hair"
## [33] "Domestic Medium Hair Mix" "Domestic Shorthair"
## [35] "Domestic Shorthair Mix" "Euthanasia"
## [37] "Fall" "Female"
## [39] "Fixed" "German Shepherd"
## [41] "German Shepherd Mix" "if_fixed"
## [43] "Intact" "Labrador Retriever"
## [45] "Labrador Retriever Mix" "Livestock"
## [47] "Lost" "Male"
## [49] "Miniature Poodle Mix" "Missing"
## [51] "Orange Tabby" "Other"
## [53] "Other Breed" "Other Color"
## [55] "Outcome.Type" "Pit Bull"
## [57] "Pit Bull Mix" "Relocate"
## [59] "Return to Owner" "Rto-Adopt"
## [61] "season" "Senior"
## [63] "sex" "Siamese Mix"
## [65] "Siberian Husky Mix" "Spring"
## [67] "Stolen" "Summer"
## [69] "Tan" "Tan/White"
## [71] "Tortie" "Transfer"
## [73] "Tricolor" "White"
## [75] "White/Black" "White/Brown"
## [77] "Winter" "Young"
round(itemFrequency(trans1),3)
## Adoption Adult age_group
## 0.526 0.167 0.000
## Animal.Type Australian Cattle Dog Mix Baby
## 0.000 0.012 0.391
## Bird Black Black/Brown
## 0.002 0.082 0.021
## Black/Tan Black/White Blue
## 0.023 0.106 0.019
## Blue/White Border Collie Mix Boxer Mix
## 0.029 0.007 0.007
## breed_group Brown Brown Brindle/White
## 0.000 0.025 0.018
## Brown Tabby Brown Tabby/White Brown/White
## 0.061 0.032 0.034
## Calico Cat Chihuahua Shorthair
## 0.018 0.402 0.012
## Chihuahua Shorthair Mix color_group Dachshund Mix
## 0.043 0.000 0.007
## Died Disposal Dog
## 0.008 0.001 0.585
## Domestic Longhair Mix Domestic Medium Hair Domestic Medium Hair Mix
## 0.010 0.013 0.020
## Domestic Shorthair Domestic Shorthair Mix Euthanasia
## 0.139 0.194 0.028
## Fall Female Fixed
## 0.258 0.481 0.725
## German Shepherd German Shepherd Mix if_fixed
## 0.011 0.025 0.000
## Intact Labrador Retriever Labrador Retriever Mix
## 0.275 0.012 0.054
## Livestock Lost Male
## 0.000 0.000 0.519
## Miniature Poodle Mix Missing Orange Tabby
## 0.006 0.001 0.030
## Other Other Breed Other Color
## 0.011 0.327 0.319
## Outcome.Type Pit Bull Pit Bull Mix
## 0.000 0.021 0.063
## Relocate Return to Owner Rto-Adopt
## 0.000 0.159 0.008
## season Senior sex
## 0.000 0.070 0.000
## Siamese Mix Siberian Husky Mix Spring
## 0.009 0.006 0.227
## Stolen Summer Tan
## 0.000 0.281 0.026
## Tan/White Tortie Transfer
## 0.031 0.019 0.269
## Tricolor White White/Black
## 0.022 0.036 0.029
## White/Brown Winter Young
## 0.020 0.233 0.372
Since I have 160.000 rows I will sample the data to not crush the RStudio
set.seed(123)
trans_sample <- trans1[sample(1:length(trans1), 30000)] # 30k transactions
unique(data$Outcome.Type)
## [1] Adoption Transfer Return to Owner Missing
## [5] Died Euthanasia Rto-Adopt Disposal
## [9] Relocate Stolen Lost
## 11 Levels: Adoption Died Disposal Euthanasia Lost Missing ... Transfer
Now let me find out what is the profile of the Animls that get adopted.
rules.Adoption<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Adoption"), control=list(verbose=F))
rules.Adoption.byconf<-sort(rules.Adoption, by="confidence", decreasing=TRUE)
inspect(head(rules.Adoption.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Baby,
## Fixed,
## Siamese Mix,
## Summer} => {Adoption} 0.001000000 1 0.001000000 1.902708 30
## [2] {Baby,
## Domestic Medium Hair,
## Fall,
## Fixed} => {Adoption} 0.002033333 1 0.002033333 1.902708 61
## [3] {Baby,
## Domestic Medium Hair,
## Fixed,
## Other Color} => {Adoption} 0.001966667 1 0.001966667 1.902708 59
## [4] {Baby,
## Calico,
## Fixed,
## Winter} => {Adoption} 0.001233333 1 0.001233333 1.902708 37
## [5] {Baby,
## Blue/White,
## Domestic Shorthair,
## Fixed} => {Adoption} 0.001233333 1 0.001233333 1.902708 37
## [6] {Baby,
## Fall,
## Fixed,
## Tan/White} => {Adoption} 0.001266667 1 0.001266667 1.902708 38
So, it seems like mostly people adopt Baby animals. The profile with the biggest support and coverage is {Baby, Domestic Medium Hair, Fixed, Other Color}. Lifet is 1.9 which indicates the positive relationship between the Adoption and this profile. I can tell the same about the rest of the 5 profiles. also it looks like Cats appear in the profile more than other animals.
What about the animals that return to owners?
rules.RTO<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Return to Owner"), control=list(verbose=F))
rules.RTO.byconf<-sort(rules.RTO, by="confidence", decreasing=TRUE)
inspect(head(rules.RTO.byconf))
## lhs rhs support confidence coverage lift count
## [1] {German Shepherd,
## Intact,
## Male,
## Young} => {Return to Owner} 0.001000000 0.8571429 0.001166667 5.438724 30
## [2] {Dog,
## German Shepherd,
## Intact,
## Male,
## Young} => {Return to Owner} 0.001000000 0.8571429 0.001166667 5.438724 30
## [3] {Intact,
## Siberian Husky Mix} => {Return to Owner} 0.001066667 0.7804878 0.001366667 4.952334 32
## [4] {Dog,
## Intact,
## Siberian Husky Mix} => {Return to Owner} 0.001066667 0.7804878 0.001366667 4.952334 32
## [5] {German Shepherd,
## Intact,
## Young} => {Return to Owner} 0.001200000 0.7659574 0.001566667 4.860136 36
## [6] {Dog,
## German Shepherd,
## Intact,
## Young} => {Return to Owner} 0.001200000 0.7659574 0.001566667 4.860136 36
It is very clear that German Shepherd, Young Intact Male, return to owner the most. For me this raises a question if this is because Intact Male German Shepherds also run away from the owners the most or there are other reason for that. Lift for both of the profiles is very high (over 5), so we can be confident that there is a positive relationship between this profile of German Shepherd, Young Intact Male and Return to Owner outcome
Now I want to find out what animals sadly are Euthanised
rules.Euthanasia<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Euthanasia"), control=list(verbose=F))
rules.Euthanasia.byconf<-sort(rules.Euthanasia, by="confidence", decreasing=TRUE)
inspect(head(rules.Euthanasia.byconf))
## lhs rhs support confidence
## [1] {Adult, Cat, Intact} => {Euthanasia} 0.001033333 0.2183099
## [2] {Female, Intact, Senior} => {Euthanasia} 0.001166667 0.1832461
## [3] {Intact, Senior} => {Euthanasia} 0.002366667 0.1779449
## [4] {Intact, Male, Senior} => {Euthanasia} 0.001200000 0.1730769
## [5] {Cat, Intact, Male, Winter} => {Euthanasia} 0.001000000 0.1595745
## [6] {Cat, Intact, Male, Young} => {Euthanasia} 0.001466667 0.1456954
## coverage lift count
## [1] 0.004733333 7.938540 31
## [2] 0.006366667 6.663494 35
## [3] 0.013300000 6.470722 71
## [4] 0.006933333 6.293706 36
## [5] 0.006266667 5.802708 30
## [6] 0.010066667 5.298013 44
Mostly seniors and adults, also very often cats. Unfortunately young cats as well. And all of the profiles are supported by a very high lift (almost 8) and quite high confidence which proves the positive relationship.
Now I want to analyse what are the profiles of animals that get transfered.
rules.Transfer<-apriori(data=trans_sample, parameter=list(supp=0.001, conf=0.08), appearance=list(default="lhs", rhs="Transfer"), control=list(verbose=F))
rules.Transfer.byconf<-sort(rules.Transfer, by="confidence", decreasing=TRUE)
inspect(head(rules.Transfer.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Brown Tabby,
## Intact,
## Summer,
## Young} => {Transfer} 0.001100000 1 0.001100000 3.700962 33
## [2] {Brown Tabby,
## Domestic Shorthair,
## Female,
## Intact,
## Summer} => {Transfer} 0.001400000 1 0.001400000 3.700962 42
## [3] {Brown Tabby,
## Cat,
## Intact,
## Summer,
## Young} => {Transfer} 0.001100000 1 0.001100000 3.700962 33
## [4] {Baby,
## Brown Tabby,
## Domestic Shorthair,
## Female,
## Intact,
## Summer} => {Transfer} 0.001066667 1 0.001066667 3.700962 32
## [5] {Brown Tabby,
## Cat,
## Domestic Shorthair,
## Female,
## Intact,
## Summer} => {Transfer} 0.001400000 1 0.001400000 3.700962 42
## [6] {Baby,
## Brown Tabby,
## Cat,
## Domestic Shorthair,
## Female,
## Intact,
## Summer} => {Transfer} 0.001066667 1 0.001066667 3.700962 32
Interesting observation that Brown Tabby intact animals, cats mostly - is the biggest profile with a solid lift and confidence of 1 !! which is huge.
Let me find a frequent itemset with a min suppot of 0.05
itemsets <- eclat(trans_sample, parameter = list(supp = 0.05, maxlen = 15))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.05 1 15 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 1500
##
## create itemset ...
## set transactions ...[70 item(s), 30000 transaction(s)] done [0.01s].
## sorting and recoding items ... [26 item(s)] done [0.00s].
## creating bit matrix ... [26 row(s), 30000 column(s)] done [0.00s].
## writing ... [517 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
itemsets_sorted <- sort(itemsets, by = "support")
inspect(head(itemsets_sorted, 10))
## items support count
## [1] {Fixed} 0.7242333 21727
## [2] {Dog} 0.5852667 17558
## [3] {Adoption} 0.5255667 15767
## [4] {Male} 0.5163333 15490
## [5] {Adoption, Fixed} 0.5044667 15134
## [6] {Female} 0.4836667 14510
## [7] {Dog, Fixed} 0.4485667 13457
## [8] {Cat} 0.4009667 12029
## [9] {Baby} 0.3918667 11756
## [10] {Fixed, Male} 0.3784333 11353
Let me plot the scatter plot for Rules of adoption and how it reflects on confidence, support, lift chart.
# install.packages("arulesViz")
library(arulesViz)
plot(rules.Adoption, method = "scatterplot", engine = "ggplot2")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
# --- EDA: Top Items Frequency ---
itemFrequencyPlot(trans_sample, topN = 20, type = "absolute",
col = "steelblue", main = "Top 20 Most Frequent Animal Attributes")