Introduction


The Problem and Why I Care

I first became interested in studying animal shelters last year, when my community received some devastating news: after receiving anonymous tips about sub-par conditions, the Guilford County Sheriff’s office visited the Guilford County Animal Shelter to see if the accusations had merit. Unfortunately, they did. Dozens of cases of animal abuse, neglect, and inhumane euthanasia were found at the shelter, which was run by a private company on behalf of the county. The private company was disbanded, criminal charges were filed, and the county has begun to manage the shelter itself. The abrupt change in management has been challenging for the county and has brought some very real concerns about animal safety and adoption to light. Although the private company may have kept records like Austin’s that Guilford County could use to improve decision making going forward, the data would be untrustworthy. The claims of animal neglect stem from allegations that records were mishandled and that suffering animals were not euthanized when necessary, so that the private company could boast a low euthanasia rate and a high adoption rate. The general problem the county faces right now is that it has no reliable information to guide its decision making.


Solving The Problem

I plan to study Austin’s data to test some common assumptions people have about animal shelters, framed as the questions below:

  • Question 1: How many animals that come through the shelter get adopted? Get transferred? Get euthanized?
    • Methodology: Study the counts of the different outcomes in the Outcome data.
    • Why it’s important: Knowing these counts would help the county project its expenses. If the survival rate (the rate of any outcome besides euthanasia) is high, that should be published for public relations purposes.
    • Packages Required: tidyverse to read in the data, ggplot to create a bar chart.
  • Question 2: How long do animals stay in the shelter?
    • Methodology: This would require tracking the Animal ID variable to compare Intake Date in the Intake data with the Outcome Date and Outcome Type from the Outcome data. I could then group the results by Outcome Type and compare them to what I find in Question 1.
    • Why it’s important: If the average time an animal stays at the shelter is high, perhaps the shelter needs to step up its marketing efforts. Do customers know where the shelter is located? Are the adoption fees comparable to other counties? Should more pictures of animals be put on social media?
    • Packages Required: tidyverse to read in the data, rename() and mutate() to prepare for merging, lubridate to change the Date columns from character vectors to dates. Do an inner mutating join of the data sets so Intake Date can be subtracted from Outcome Date to create the new variable Days at Shelter. ggplot to chart the findings.
  • Question 3: Are older animals adopted at a slower rate than younger animals?
    • Methodology: I would use the same methodology as the question above, but would include the Date of Birth variable from the Outcome data. Then, I would sort for those animals that were adopted and create a new variable for how many days they were in the shelter before adoption.
    • Why it’s important: Keeping animals in a shelter for long periods of time racks up added expenses for the shelter, and increases the chance of euthanasia.
    • Packages Required: tidyverse to read in the data, lubridate to change the Date columns from character vectors to dates. ggplot to compare Date of Birth to the new variable, Days at Shelter.
  • Question 4: Are there any other animals at the shelter besides dogs and cats?
    • Methodology: Study the counts of the different types of animals, then group them into “Dog or Cat” and “Other” to check the distribution.
    • Why it’s important: This is another point that is important for budget and PR reasons. If it costs the shelter a significant amount of money to care for the specialty animals, perhaps they should be transferred to another organization. If cost is low, the shelter should consider better advertising these animals so people know adoption is for more than just dogs and cats.
    • Packages Required: tidyverse to read in the data, ggplot to graph Intake Animal Type.
  • Question 5: Does the color of an animal dictate if they get adopted or euthanized?
    • Methodology: Compare the color of the animal with the outcome.
    • Why it’s important: The shelter sometimes holds events where you can adopt an animal of specific color at a discounted rate, such as their black dog and cat Black Friday discount.
    • Packages Required: tidyverse to read in the data, filter() to only look at the animals that are a single color, not mixed. ggplot to create a sideways bar chart with color of animal on the y axis and average days in shelter on the x axis.
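
The approach planned for Question 5 could look roughly like the sketch below. It is only an illustration under stated assumptions: the rows in `toy` are invented, the `Days at Shelter` column does not exist until the two data sets are merged, and the rule that mixed colors are written with a "/" (as in "Black/White") is an assumption about how the color field is recorded.

```r
library(tidyverse)

# Toy rows standing in for the merged data; all values are invented.
toy <- tibble(
  `Outcome Color`   = c("Black", "Black/White", "Black", "White", "Brown/Tan", "White"),
  `Days at Shelter` = c(10, 3, 20, 4, 7, 6)
)

# Assumption: a "/" in the color field marks a mixed-color animal.
single_color <- toy %>%
  filter(!str_detect(`Outcome Color`, "/")) %>%
  group_by(`Outcome Color`) %>%
  summarise(avg_days = mean(`Days at Shelter`), n = n())

# Sideways bar chart: color on the y axis, average days in shelter on the x axis.
color_plot <- ggplot(single_color) +
  geom_col(aes(x = `Outcome Color`, y = avg_days)) +
  coord_flip()
```

Keeping the count `n` next to the average makes it easier to spot colors whose average rests on only a handful of animals.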
Optional Questions

If the data leads me to do so, I may explore the parameters below. Similar to many of the questions above, finding answers to the questions below could help guide the county to make decisions about PR efforts and budgetary spending.

  • Question: Do significantly more animals arrive at the shelter as strays or as owner surrenders?
  • Question: Do many of the animals need medical attention beyond a routine check-up when they arrive?

How I’ll Help

My interest has been piqued, and I want to learn what I can from the Austin data to see how I can help my community better understand animal intake and adoption rates and trends. Finding answers to these questions will help me better understand the issues encountered by my local animal shelter. I hope to pass on this information to the Chairman of the County Commissioners, who is currently overseeing the county’s efforts to take over the management of the shelter. Hopefully, the information I discover from this data set will help me make a difference in my community.


Data Preparation

Data Sets To Be Analyzed

The data sets were both found on the Data.gov website.

  • Austin Animal Center Outcomes - 10/01/2013 to 07/21/2017
    • Collected to “check out” animals that were leaving the shelter
    • 12 variables
    • Used nrow() to discover there are 69,293 rows
    • Difference from the “Intake” data: this data set contains a Date of Birth variable
  • Austin Animal Center Intakes - 10/01/2013 to 07/21/2017
    • Collected to “check in” animals that were arriving at the shelter
    • 12 variables
    • Used nrow() to discover there are 69,513 rows
    • Difference from the “Outcome” data: this data set contains a Found Location variable

Cleaning Data

I started by reading in the data and labeling the two sets “intakes” and “outcomes.” I prepared the data for merging by changing character vectors to dates, renaming all variables that had identical names across the sets except for Animal ID, and dropping the Found Location, MonthYear, and Name variables.

The main variables I am concerned with are the Animal ID, Intake and Outcome Dates, Date of Birth, Animal Type, and Animal Color. Many questions can be answered and inferences can be made by combining the date of a certain event and a qualitative variable.

# Read in the "intakes" data
library(tidyverse)
library(lubridate)
intakes <- read_csv("Austin_Animal_Center_Intakes.csv") %>%
  # Prefix the shared column names with "Intake" so they stay distinct after merging
  rename(
    `Intake Date` = DateTime,
    `Intake Breed` = Breed,
    `Intake Color` = Color,
    `Intake Animal Type` = `Animal Type`
  ) %>%
  # Convert the character column to a Date
  mutate(`Intake Date` = as.Date(`Intake Date`, format = "%m/%d/%Y")) %>%
  # Drop the variables that are not needed for the analysis
  select(-`Found Location`, -MonthYear, -Name) %>%
  arrange(`Animal ID`)

# Print one row per Animal ID, keeping the last row for each animal
intakes[!duplicated(intakes$`Animal ID`, fromLast = TRUE), ]
## # A tibble: 63,231 x 9
##    `Animal ID` `Intake Date`   `Intake Type` `Intake Condition`
##          <chr>        <date>           <chr>              <chr>
##  1     A006100    2014-03-07   Public Assist             Normal
##  2     A047759    2014-04-02 Owner Surrender             Normal
##  3     A134067    2013-11-16   Public Assist            Injured
##  4     A141142    2013-11-16           Stray               Aged
##  5     A163459    2014-11-14           Stray             Normal
##  6     A165752    2014-09-15           Stray             Normal
##  7     A178569    2014-03-17   Public Assist             Normal
##  8     A189592    2015-09-18           Stray             Normal
##  9     A191351    2015-11-13           Stray             Normal
## 10     A197810    2014-12-08           Stray             Normal
## # ... with 63,221 more rows, and 5 more variables: `Intake Animal
## #   Type` <chr>, `Sex upon Intake` <chr>, `Age upon Intake` <chr>, `Intake
## #   Breed` <chr>, `Intake Color` <chr>
# Read in the "outcomes" data
library(tidyverse)
library(lubridate)
outcomes <- read_csv("Austin_Animal_Center_Outcomes.csv") %>%
  # Prefix the shared column names with "Outcome" so they stay distinct after merging
  rename(
    `Outcome Date` = DateTime,
    `Outcome Animal Type` = `Animal Type`,
    `Outcome Breed` = Breed,
    `Outcome Color` = Color
  ) %>%
  # Convert the character columns to Dates
  mutate(
    `Outcome Date` = as.Date(`Outcome Date`, format = "%m/%d/%Y"),
    `Date of Birth` = as.Date(`Date of Birth`, format = "%m/%d/%Y")
  ) %>%
  # Drop the variables that are not needed for the analysis
  select(-MonthYear, -Name) %>%
  arrange(`Animal ID`)

# Print one row per Animal ID, keeping the last row for each animal
outcomes[!duplicated(outcomes$`Animal ID`, fromLast = TRUE), ]
## # A tibble: 63,040 x 10
##    `Animal ID` `Outcome Date` `Date of Birth`  `Outcome Type`
##          <chr>         <date>          <date>           <chr>
##  1     A006100     2014-12-20      2007-07-09 Return to Owner
##  2     A047759     2014-04-07      2004-04-02        Transfer
##  3     A134067     2013-11-16      1997-10-16 Return to Owner
##  4     A141142     2013-11-17      1998-06-01 Return to Owner
##  5     A163459     2014-11-14      1999-10-19 Return to Owner
##  6     A165752     2014-09-15      1999-08-18 Return to Owner
##  7     A178569     2014-03-23      1999-03-17 Return to Owner
##  8     A189592     2015-09-18      1997-08-01 Return to Owner
##  9     A191351     2015-11-17      1999-08-21 Return to Owner
## 10     A197810     2014-12-22      2000-01-21        Transfer
## # ... with 63,030 more rows, and 6 more variables: `Outcome
## #   Subtype` <chr>, `Outcome Animal Type` <chr>, `Sex upon Outcome` <chr>,
## #   `Age upon Outcome` <chr>, `Outcome Breed` <chr>, `Outcome Color` <chr>

I have been unsuccessful in my attempts to merge my data. Some animals have been at the shelter on more than one occasion, and they are tracked by their Animal ID. As you can see from the first entry in the table below, Scamp has been at the shelter on two separate occasions. I attempted to use merge() and inner_join() along with an if() statement to merge the two lines only if the Intake Date was earlier than the Outcome Date, but was unsuccessful.
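
One possible way around the repeat-visit problem is sketched below; it is not something I have run on the full data. The idea is to number each animal's visits in date order in both tables and then join on Animal ID plus that visit number, so the first intake pairs with the first outcome, the second with the second, and so on. The toy tibbles, the `visit` column, and the helper `number_visits()` are hypothetical names made up for this illustration.

```r
library(tidyverse)

# Toy data: animal A1 visits twice, A2 once. IDs and dates are invented.
toy_intakes <- tibble(
  `Animal ID`   = c("A1", "A1", "A2"),
  `Intake Date` = as.Date(c("2014-03-07", "2014-12-19", "2014-04-02"))
)
toy_outcomes <- tibble(
  `Animal ID`    = c("A1", "A1", "A2"),
  `Outcome Date` = as.Date(c("2014-03-08", "2014-12-20", "2014-04-07"))
)

# Hypothetical helper: number each animal's rows in date order (1 = first visit).
number_visits <- function(df, date_col) {
  df %>%
    arrange(`Animal ID`, .data[[date_col]]) %>%
    group_by(`Animal ID`) %>%
    mutate(visit = row_number()) %>%
    ungroup()
}

# Pair the n-th intake with the n-th outcome for the same animal.
stays <- inner_join(
  number_visits(toy_intakes, "Intake Date"),
  number_visits(toy_outcomes, "Outcome Date"),
  by = c("Animal ID", "visit")
) %>%
  mutate(`Days at Shelter` = as.numeric(`Outcome Date` - `Intake Date`)) %>%
  # Drop pairs where the outcome precedes the intake, a sign of misaligned visits.
  filter(`Days at Shelter` >= 0)
```

This pairing assumes that every visit has both an intake row and an outcome row. The real files have different row counts (69,513 intakes versus 69,293 outcomes), so some visits would be misaligned or unmatched; the final filter() only catches the cases where the mismatch produces a negative stay.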

Exploratory Analysis

Question 1: How many animals that come through the shelter get adopted? Get transferred? Get euthanized?

#Bar chart with color fill to show breakdown of Outcome Type#
library(tidyverse)
ggplot(data = outcomes) +
  geom_bar(mapping = aes(x = `Outcome Type`, fill = `Outcome Type`))

More animals have the outcome of “Adopted” than any other outcome. Euthanasia is surprisingly low.
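
To put numbers behind the chart, the share of each outcome and the survival rate (any outcome besides euthanasia) could be computed along the lines of the sketch below. The rows in `toy_types` are invented, and the label strings "Adoption" and "Euthanasia" are assumptions about how the outcomes are spelled in the data; with the real data, `toy_types` would be replaced by `outcomes`.

```r
library(tidyverse)

# Toy outcome records; values and label spellings are assumptions for illustration.
toy_types <- tibble(
  `Outcome Type` = c("Adoption", "Adoption", "Transfer", "Return to Owner", "Euthanasia")
)

# Count each outcome and convert the counts to shares of all outcomes.
outcome_share <- toy_types %>%
  count(`Outcome Type`) %>%
  mutate(share = n / sum(n))

# Survival rate: proportion of records with any outcome other than euthanasia.
survival_rate <- mean(toy_types$`Outcome Type` != "Euthanasia")
```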

I wanted to see the breakdown of the outcomes by Animal Type, so I created the two graphs below.

#Bar chart by Animal Type#
library(tidyverse)
ggplot(data = outcomes) +
  geom_bar(mapping = aes(x = `Outcome Animal Type`, fill = `Outcome Type`))

#Proportional bar chart by Animal Type#
library(tidyverse)
ggplot(data = outcomes) +
  geom_bar(mapping = aes(x = `Outcome Type`, fill = `Outcome Animal Type`), position = "fill")

This graph supports my initial theory that most of the adoptions at the shelter are dogs and cats. Looking through a few of the individual data points, I found that most of the animals listed in the “Other” Animal Type are wild, such as opossums, foxes, and non-domesticated birds, so it makes sense that this animal type has high percentages in Died, Disposal and Relocated.

Question 4: Are there any other animals at the shelter besides dogs and cats?

count(outcomes, `Outcome Animal Type`)
## # A tibble: 5 x 2
##   `Outcome Animal Type`     n
##                   <chr> <int>
## 1                  Bird   294
## 2                   Cat 26020
## 3                   Dog 39150
## 4             Livestock     9
## 5                 Other  3820

Here we can see that there have been only 9 cases of livestock at the shelter over the past few years, so feeding and housing livestock probably requires a special contingency fund to budget for these odd, infrequent expenses.
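
The “Dog or Cat” versus “Other” grouping described in the methodology for this question can be sketched from the counts in the table above. The column name `group` and the object names are made up for this illustration; the counts are copied from the count() output.

```r
library(tidyverse)

# Counts copied from the count(outcomes, `Outcome Animal Type`) output.
type_counts <- tibble(
  `Outcome Animal Type` = c("Bird", "Cat", "Dog", "Livestock", "Other"),
  n = c(294, 26020, 39150, 9, 3820)
)

# Collapse the five types into two groups and compute each group's share.
grouped <- type_counts %>%
  mutate(group = if_else(`Outcome Animal Type` %in% c("Dog", "Cat"),
                         "Dog or Cat", "Other")) %>%
  group_by(group) %>%
  summarise(n = sum(n)) %>%
  mutate(share = n / sum(n))
```

Dogs and cats account for 65,170 of the 69,293 outcome records, or roughly 94%, leaving 4,123 records for birds, livestock, and other animals combined.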