Man’s best friend

Introduction

Problem Statement

Dogs have long been known as “Man’s best friend,” but nowadays, there are so many pups out there who need a little extra love. Many have been abandoned or born as strays, and they’re just waiting for a warm, welcoming home!

Short Explanation

Through the analysis of this data, we intend to identify patterns that adoption agencies can leverage to increase the placement of dogs into suitable homes. For instance, if the state of Florida has a significant population of huskies available for adoption, relocating these animals to states such as Michigan or Alaska may enhance their prospects of finding a permanent, nurturing home.

Proposed Approach

We will collect the provided data, perform data cleaning to ensure usability and accuracy, analyze the relationships among the various variables, and integrate external data sources to enhance the analysis. The questions we intend to answer are:

  • What breed of dog is least desirable by location or region?

  • What are the average ages of dogs entering the adoption system?

  • Is there a correlation between location and coat length?

  • Where are most shelter dogs from? Broken down by breed and age.

  • Is there a seasonal pattern to when dogs enter the adoption system?

How will this help?

When adoption agencies get a grasp on the types of dogs that people are looking for in various areas, they can help find loving homes for more pups in need. This means fewer sad and stressed dogs waiting in shelters, all eager to find their forever families!

Packages Required

All Packages

To analyze this data, we will use the following R packages:

library(tidyverse)
library(knitr)
library(lubridate)

Explanation

We will use the tidyverse package to effectively clean and transform our data, the knitr package to prepare a summary, and the lubridate package to easily work with the dates in our data.

Data Preparation

Original Source

This data was acquired from Petfinder on 09/20/2019. The original dataset contained three .csv files: dog_adoptable, dog_descriptions, and dog_destination. The temperature data was obtained from NOAA National Centers for Environmental Information, Climate at a Glance: Statewide Mapping, Average Temperature, published September 2022, retrieved on October 8, 2022 from https://www.ncdc.noaa.gov/cag/.

Dog_adoptable contained:

  • Observations: 90

  • Variables: 5

  • There were no oddities in this data set

Dog_destination contained:

  • Observations: 6194

  • Variables: 8

  • The description variable contained observations of jumbled text

Dog_descriptions contained:

  • Observations: 58180

  • Variables: 35

  • Many peculiarities including:

    • 33 observations were shifted over a column, either due to how the data was inputted or how the data was exported

    • Contained a “declawed” variable, likely because Petfinder also lists cats; it is not applicable to dogs

    • Date accessed was a date, but date posted was a date/time

Data Importing and Cleaning

The .csv files were imported using Base R. We first examined the codebook, dimensions, and variable types, and checked for duplicate values in the unique id variable.
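The import and first checks can be sketched as follows. This is a minimal illustration, not the exact code we ran: the file is a tiny temporary stand-in written inside the example, and its three columns are only a subset of the real variables.

```r
# Write a small stand-in CSV so the example runs on its own.
# (The real input would be one of the three Petfinder .csv files.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,breed_primary,sex",
             "1,Beagle,Male",
             "2,Boxer,Female",
             "2,Boxer,Female"), csv_path)

# Base R import, keeping text columns as character
dog_descriptions <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(dog_descriptions)   # observations and variables
str(dog_descriptions)   # variable types

# Rows whose id repeats an earlier row
n_duplicate_ids <- sum(duplicated(dog_descriptions$id))
n_duplicate_ids
```

A non-zero duplicate count is a prompt to inspect those rows before any joins are attempted.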

By doing that, we discovered the 33 observations that were shifted over a column in dog_descriptions. The next task was to correct that, which was done by separating the descriptions set into good data and bad data. The bad data was corrected by removing a variable and renaming the columns so it could be merged back into the good data set.
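A hedged sketch of that repair on toy data is shown below. The detection rule (a state value that is not a two-letter code), the overflow column, and the choice of which column to drop are all assumptions made for illustration; the real rows were identified by inspection.

```r
# Toy data: row 2 is shifted one column to the right starting at contact_city
dogs <- data.frame(
  id            = c(1, 2, 3),
  contact_city  = c("Albany", "junk", "Chandler"),
  contact_state = c("NY", "Las Vegas", "AZ"),
  contact_zip   = c("12220", "NV", "85249"),
  overflow      = c(NA, "89146", NA),   # hypothetical spill-over column
  stringsAsFactors = FALSE
)

# Assumed detection rule: a valid state is exactly two capital letters
is_shifted <- !grepl("^[A-Z]{2}$", dogs$contact_state)
good <- dogs[!is_shifted, ]
bad  <- dogs[is_shifted, ]

# Drop the column where the shift starts, then rename so values realign
bad <- bad[, names(bad) != "contact_city"]
names(bad) <- c("id", "contact_city", "contact_state", "contact_zip")

# Keep the same columns in the good data and merge the two sets back together
good  <- good[, names(bad)]
fixed <- rbind(good, bad)
fixed <- fixed[order(fixed$id), ]
fixed
```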

We then removed variables we did not need: for example, species was removed because all observations are dogs, and photo was removed because all of its values were NA. Next, we converted the posted variable from character to date/time and then to date, converted the accessed variable from character to date, removed the three observations with an Unknown value in the sex variable, and changed the NA values in the type variable to Dog.
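These conversions might look like the sketch below. The timestamp format of the raw posted values is an assumption, and the toy rows are invented for the example.

```r
library(lubridate)

dogs <- data.frame(
  posted   = c("2018-04-05 20:15:38", "2017-05-26 13:09:01", "2019-09-01 16:44:27"),
  accessed = c("2019-09-20", "2019-09-20", "2019-09-20"),
  sex      = c("Male", "Unknown", "Female"),
  type     = c("Dog", "Dog", NA),
  stringsAsFactors = FALSE
)

# character -> date/time -> date
dogs$posted <- as_date(ymd_hms(dogs$posted))

# character -> date
dogs$accessed <- ymd(dogs$accessed)

# Drop observations with an Unknown sex
dogs <- dogs[dogs$sex != "Unknown", ]

# Every observation is a dog, so fill the missing type values
dogs$type[is.na(dogs$type)] <- "Dog"

str(dogs)
```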

One of the biggest cleaning challenges was the contact_city variable. Many city names differed only in capitalization, contained typos, or used slightly different forms (Saint vs. St.). We began by converting them all to title case, then grouped them by contact_state and contact_zip and filtered for state and zip code combinations with more than one unique city. That result was joined back to the data with an inner join to list the city names for those combinations, sorted alphabetically so that variants appeared side by side. This allowed us to quickly see any typos or slight differences that we could correct.
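The city check can be sketched with dplyr and stringr as below. The toy rows and the intermediate names conflicting_zips and city_variants are hypothetical; only the column names come from our data.

```r
library(dplyr)
library(stringr)

dogs <- tibble(
  contact_city  = c("saint louis", "St. Louis", "ALBANY", "Albany", "Chandler"),
  contact_state = c("MO", "MO", "NY", "NY", "AZ"),
  contact_zip   = c("63101", "63101", "12220", "12220", "85249")
) %>%
  mutate(contact_city = str_to_title(contact_city))  # normalize capitalization

# State/zip combinations that still map to more than one city spelling
conflicting_zips <- dogs %>%
  group_by(contact_state, contact_zip) %>%
  summarise(n_cities = n_distinct(contact_city), .groups = "drop") %>%
  filter(n_cities > 1)

# Join back to list the competing spellings, sorted for easy review
city_variants <- dogs %>%
  inner_join(conflicting_zips, by = c("contact_state", "contact_zip")) %>%
  distinct(contact_state, contact_zip, contact_city) %>%
  arrange(contact_state, contact_zip, contact_city)

city_variants
```

In this toy example the Albany rows agree once title case is applied, so only the two Saint Louis spellings remain for manual correction.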

We then joined the cleaned dog_descriptions with dog_destination to form dog_data. The final step was releveling the age category so that it lists from youngest (Baby) to oldest (Senior).
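A minimal sketch of the join and releveling follows. The join key (id) and the join type (a left join that keeps every description) are assumptions for illustration, as are the toy rows.

```r
library(dplyr)

dog_descriptions <- tibble(id = c(1, 2, 3), age = c("Senior", "Baby", "Adult"))
dog_destination  <- tibble(id = 3, found = "Maryland")

dog_data <- dog_descriptions %>%
  # Assumed: keep all descriptions, attach destination details where they exist
  left_join(dog_destination, by = "id") %>%
  # Order the age categories from youngest to oldest
  mutate(age = factor(age, levels = c("Baby", "Young", "Adult", "Senior")))

levels(dog_data$age)
```

With the levels set explicitly, tables and plots list the age groups in life-stage order rather than alphabetically.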

Final Data Set

A small glimpse of the final data set looks like:
Summary of Dog Data
id breed_primary breed_secondary breed_mixed breed_unknown color_primary color_secondary color_tertiary age sex size coat fixed house_trained special_needs env_children env_dogs env_cats posted contact_city contact_state contact_zip contact_country accessed type found manual remove still_there avg_temp
41330726 German Shepherd Dog NA FALSE FALSE NA NA NA Young Male Large NA FALSE FALSE FALSE NA NA NA 2018-04-05 Las Vegas NV 89146 US 2019-09-20 Dog NA NA NA NA 51.57105
38169117 Boxer Pit Bull Terrier TRUE FALSE Black White / Cream NA Adult Female Large Short TRUE TRUE FALSE NA NA FALSE 2017-05-26 Chandler AZ 85249 US 2019-09-20 Dog NA NA NA NA 61.61579
45833989 Beagle NA FALSE FALSE NA NA NA Senior Male Medium Short TRUE TRUE FALSE TRUE TRUE TRUE 2019-09-01 Albany NY 12220 US 2019-09-20 Dog NA NA NA NA 46.60987
45515547 Mixed Breed NA FALSE FALSE NA NA NA Senior Male Medium Short TRUE TRUE FALSE NA NA FALSE 2019-08-06 Albany NY 12220 US 2019-09-20 Dog NA NA NA NA 46.60987
45294115 Basset Hound NA FALSE FALSE Brown / Chocolate White / Cream NA Senior Female Medium Short TRUE TRUE FALSE FALSE FALSE NA 2019-07-18 Albany NY 12220 US 2019-09-20 Dog NA NA NA NA 46.60987
45229004 American Bulldog NA TRUE FALSE NA NA NA Senior Male Large Short TRUE TRUE FALSE TRUE TRUE NA 2019-07-11 Saugerties NY 12477 US 2019-09-20 Dog NA NA NA NA 46.60987
45227052 Mixed Breed NA FALSE FALSE White / Cream NA NA Senior Female Medium Short TRUE TRUE FALSE TRUE TRUE NA 2019-07-11 Saugerties NY 12477 US 2019-09-20 Dog NA NA NA NA 46.60987
45569380 Maltese NA FALSE FALSE White / Cream NA NA Senior Female Small Short TRUE TRUE FALSE TRUE TRUE NA 2019-08-10 Bristow VA 20136 US 2019-09-20 Dog NA NA NA NA 56.58289
44694387 Fox Terrier Chihuahua TRUE FALSE Bicolor NA NA Young Male Small Short TRUE FALSE FALSE FALSE NA FALSE 2019-05-14 Silver Spring MD 20905 US 2019-09-20 Dog NA NA NA NA 56.03355
36978896 Alaskan Malamute NA FALSE FALSE Bicolor NA NA Adult Female Large NA TRUE TRUE FALSE NA NA FALSE 2016-12-15 Gettysburg PA 17325 US 2019-09-20 Dog Maryland NA TRUE NA 50.06053

Summary Information

Concise Summary of Each Variable in dog_data
Variable Type Unique_Values Missing_Values
id integer 57351 0
breed_primary character 216 0
breed_secondary character 191 38180
breed_mixed logical 2 0
breed_unknown logical 1 0
color_primary character 16 32878
color_secondary character 16 47321
color_tertiary character 15 58162
age factor 4 0
sex character 2 0
size character 4 0
coat character 7 31748
fixed logical 2 0
house_trained logical 2 0
special_needs logical 2 0
env_children logical 3 30784
env_dogs logical 3 23539
env_cats logical 3 39782
posted Date 1753 0
contact_city character 2120 0
contact_state character 53 0
contact_zip character 3464 12
contact_country character 2 0
accessed Date 1 0
type character 1 0
found character 601 53266
manual character 67 57288
remove logical 2 57685
still_there logical 2 59090
avg_temp numeric 49 498
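A table of this shape can be built in base R along the lines below. This is a sketch on toy data rather than the exact code behind the table; note that NA is counted as a distinct value here, which is consistent with the logical variables above that report three unique values.

```r
dog_data <- data.frame(
  id    = c(1L, 2L, 3L),
  coat  = c("Short", NA, "Short"),
  fixed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

variable_summary <- data.frame(
  Variable       = names(dog_data),
  Type           = vapply(dog_data, function(x) class(x)[1], character(1)),
  Unique_Values  = vapply(dog_data, function(x) length(unique(x)), integer(1)),
  Missing_Values = vapply(dog_data, function(x) sum(is.na(x)), integer(1)),
  row.names = NULL
)

variable_summary
```

Passing variable_summary to knitr::kable() would render it as a formatted table in the report.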

Proposed Exploratory Data Analysis

To answer some of our questions, such as whether there is a correlation between coat length and location, we will need other data, such as average temperature or latitude per state or country. We will also need to map each month of the year to a season to determine in which seasons dogs tend to enter the adoption system. Bar graphs and tables will be very useful for showing the patterns we find. We would also like to determine how most dogs enter the adoption system; however, that information is buried in jumbled description variables, and we are not yet confident about how to extract it efficiently. We do not plan to use any machine learning techniques.

Data Exploration

What Dog Is Least Desirable?

There are a few ways to determine the least desirable dog breed. We will answer this through the number of each dog breed and the average age of each dog breed.

Least Desirable Dog Breed by Total In System

The number one least desirable dog breed by total number in the system is the Pit Bull Terrier, with 7783 observations. The rest of the top ten is as follows: Labrador Retriever with 7082; Chihuahua with 3714; Mixed Breed with 3240; Terrier with 2609; Hound with 2267; German Shepherd Dog with 2097; Boxer with 2001; Shepherd with 1939; and American Staffordshire Terrier with 1800.
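A ranking like this can be produced with a single count, sketched here on toy data (the rows are invented; only breed_primary is a real variable name).

```r
library(dplyr)

dog_data <- tibble(
  breed_primary = c("Pit Bull Terrier", "Labrador Retriever", "Pit Bull Terrier",
                    "Chihuahua", "Pit Bull Terrier", "Labrador Retriever")
)

# Dogs per primary breed, largest first, keeping at most ten breeds
top_breeds <- dog_data %>%
  count(breed_primary, sort = TRUE, name = "total") %>%
  head(10)

top_breeds
```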

Least Desirable Dog Breed by State In System

The least desirable dog breed by state is the breed with the most dogs in that state; we then count the number of states in which each breed ranks first. Labrador Retrievers are the most populous shelter dog breed in 23 states, Pit Bull Terriers in 19 states, Mixed Breeds in 4, and Huskies in 2. American Staffordshire Terrier, Border Collie, Chihuahua, Golden Retriever, and Shar-Pei each lead in 1 state.

## # A tibble: 9 × 2
##   breed_primary                  states_count
##   <chr>                                 <int>
## 1 Labrador Retriever                       23
## 2 Pit Bull Terrier                         19
## 3 Mixed Breed                               4
## 4 Husky                                     2
## 5 American Staffordshire Terrier            1
## 6 Border Collie                             1
## 7 Chihuahua                                 1
## 8 Golden Retriever                          1
## 9 Shar-Pei                                  1
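One way to arrive at a table like the one above is sketched below on toy data. Breaking ties by keeping a single breed per state is an assumption of the sketch, as is the intermediate name top_breed_by_state.

```r
library(dplyr)

dog_data <- tibble(
  contact_state = c("NY", "NY", "NY", "AZ", "AZ", "NV", "NV", "NV"),
  breed_primary = c("Labrador Retriever", "Labrador Retriever", "Pit Bull Terrier",
                    "Pit Bull Terrier", "Pit Bull Terrier",
                    "Labrador Retriever", "Labrador Retriever", "Husky")
)

# Most common breed within each state (one row per state)
top_breed_by_state <- dog_data %>%
  count(contact_state, breed_primary, name = "n_dogs") %>%
  group_by(contact_state) %>%
  slice_max(n_dogs, n = 1, with_ties = FALSE) %>%
  ungroup()

# Number of states in which each breed ranks first
states_per_breed <- top_breed_by_state %>%
  count(breed_primary, name = "states_count", sort = TRUE)

states_per_breed
```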

What is the average age of dogs put up for adoption?

The graph shows the age distribution of dogs within the adoption system and highlights some distinct trends. Adult dogs form the largest group by a wide margin, which could suggest they face more challenges in being adopted than other age groups. Young dogs are the second-largest group, which may reflect both their appeal and the possibility that not all are quickly placed into homes. Puppies are fewer but still have a notable presence, possibly because high demand is offset by a steady intake. Senior dogs make up the smallest group, which might reflect either a lower intake or difficulties in rehoming older pets. This age distribution presents an opportunity for further analysis, such as exploring the factors behind these trends or considering strategies to boost adoption rates for less-represented age groups, like seniors.

Average age of dogs put up for adoption broken down by breed?

The most common age category among dogs put up for adoption is Adult, at 47.6% of all dogs listed. The breeds most likely to be put up for adoption at an older age are the Bolognese, Skye Terrier, Pembroke Welsh Corgi, Kyi Leo, Finnish Spitz, Sheep Dog, Lhasa Apso, Smooth Collie, Miniature Poodle, and Silky Terrier. The breeds most likely to be put up for adoption at a young age are the Spinone Italiano, Lakeland Terrier, Mountain Dog, Irish Terrier, Tibetan Mastiff, Petit Basset Griffon Vendeen, Portuguese Podengo, Wirehaired Dachshund, Border Terrier, and Ibizan Hound.

## # A tibble: 4 × 3
##   age    count percentage
##   <fct>  <int>      <dbl>
## 1 Adult  28249      47.6 
## 2 Young  16807      28.3 
## 3 Baby    9798      16.5 
## 4 Senior  4555       7.67
## # A tibble: 10 × 2
##    breed_primary        avg_age
##    <chr>                  <dbl>
##  1 Bolognese               4   
##  2 Skye Terrier            4   
##  3 Pembroke Welsh Corgi    3.56
##  4 Kyi Leo                 3.5 
##  5 Finnish Spitz           3.4 
##  6 Sheep Dog               3.33
##  7 Lhasa Apso              3.25
##  8 Smooth Collie           3.25
##  9 Miniature Poodle        3.23
## 10 Silky Terrier           3.12
## # A tibble: 10 × 2
##    breed_primary                avg_age
##    <chr>                          <dbl>
##  1 Spinone Italiano                1   
##  2 Lakeland Terrier                1.2 
##  3 Mountain Dog                    1.22
##  4 Irish Terrier                   1.65
##  5 Tibetan Mastiff                 1.7 
##  6 Petit Basset Griffon Vendeen    1.75
##  7 Portuguese Podengo              1.75
##  8 Wirehaired Dachshund            1.75
##  9 Border Terrier                  1.8 
## 10 Ibizan Hound                    1.8
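The avg_age values above range from 1 to 4, which is consistent with scoring the ordered age factor as Baby = 1, Young = 2, Adult = 3, and Senior = 4 and averaging by breed. The sketch below shows that calculation on toy data; the scoring is an assumption inferred from the output, not a documented step.

```r
library(dplyr)

age_levels <- c("Baby", "Young", "Adult", "Senior")

dog_data <- tibble(
  breed_primary = c("Bolognese", "Bolognese", "Beagle", "Beagle", "Spinone Italiano"),
  age = factor(c("Senior", "Senior", "Baby", "Adult", "Baby"), levels = age_levels)
)

# Assumed scoring: the integer code of the ordered age factor (1 to 4)
avg_age_by_breed <- dog_data %>%
  group_by(breed_primary) %>%
  summarise(avg_age = mean(as.integer(age)), .groups = "drop") %>%
  arrange(desc(avg_age))

avg_age_by_breed
```

Because the score is an ordinal code rather than an age in years, breeds with only a handful of dogs can land at either extreme, so the rankings are best read alongside the breed counts.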

Is there a correlation between location and coat length?

This heatmap and boxplot provide a clear view of the relationship between average state temperatures and coat types. The heatmap uses a gradient that shifts from dark to light blue: states with lower average temperatures appear in darker shades, while those with higher averages have lighter hues. Across the coat categories (curly, hairless, and so on), there is no meaningful pattern linking coat type to regional climate. This suggests that climate is rarely considered when matching dogs to homes, which has implications for animal welfare and points to a need to inform owners about dog breeds suited to their area.
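As a numeric companion to the boxplot, the temperature distribution per coat type can be summarized as in the sketch below. The toy values are invented, and the median is only one possible summary; it is not necessarily what the plots were built from.

```r
library(dplyr)

dog_data <- tibble(
  coat     = c("Short", "Short", "Long", NA, "Long"),
  avg_temp = c(60, 50, 46, 55, NA)
)

# Median state temperature for each coat type, ignoring missing values
coat_temp <- dog_data %>%
  filter(!is.na(coat), !is.na(avg_temp)) %>%
  group_by(coat) %>%
  summarise(n_dogs = n(), median_temp = median(avg_temp), .groups = "drop")

coat_temp
```

Similar medians across coat types would support the reading that coat and climate are not closely linked.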

Where are most shelter dogs from?

This bar chart offers a striking overview of the distribution of shelter dogs across states, while also providing an age-based breakdown using color-coded categories. States like Georgia, Nevada, and Ohio stand out for their notably higher number of shelter dogs, which might warrant further investigation into factors such as population density or shelter intake policies. The use of distinct colors for age groups makes it easy to see how age distribution varies within each state. Adults appear to dominate in most states, whereas babies, young dogs, and seniors are less prevalent but still visible. This visualization effectively highlights key trends and opens the door for deeper analysis, such as understanding the reasons for higher dog populations in certain states or exploring how age impacts adoption rates across different regions.

Is there a seasonal pattern to when dogs enter the adoption system?

This bar chart offers valuable insights into the seasonal trends of dog adoption postings. Fall stands out as the season with the highest number of dogs posted for adoption, closely followed by Summer. Spring sees a moderate number of postings, while Winter has the lowest. These patterns might reflect practical or societal factors, such as changes in shelter intake rates, adopter interest, or seasonal events like holidays that could influence posting activity. The clear distinction between seasons suggests an opportunity for shelters and agencies to optimize their efforts, perhaps by aligning campaigns or resources with peak and low activity periods. It’s a well-designed visualization that highlights an important temporal trend in the adoption system.
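The season assignment behind a chart like this can be sketched as follows. The helper month_to_season is hypothetical, and the meteorological season boundaries it uses (December to February as Winter, and so on) are an assumption.

```r
library(lubridate)

# Hypothetical helper: map dates to meteorological seasons
month_to_season <- function(dates) {
  m <- month(dates)
  ifelse(is.na(m), NA_character_,
  ifelse(m %in% c(12, 1, 2), "Winter",
  ifelse(m %in% 3:5, "Spring",
  ifelse(m %in% 6:8, "Summer", "Fall"))))
}

posted <- as.Date(c("2019-09-01", "2019-08-06", "2019-07-18",
                    "2016-12-15", "2018-04-05"))

season <- factor(month_to_season(posted),
                 levels = c("Winter", "Spring", "Summer", "Fall"))

# Number of postings per season
season_counts <- table(season)
season_counts
```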

Summary

This analysis aimed to address the challenges surrounding the high number of dogs in shelters by identifying patterns and trends that could help increase their chances of adoption. Using a dataset sourced from Petfinder and enhanced with NOAA temperature data, we carefully cleaned and structured the information to ensure usability and accuracy. Key variables were examined, and relationships between factors such as breed, age, location, and seasonality were analyzed using statistical methods and visualizations.

The findings revealed several important insights, including the prevalence of adult dogs in shelters, the geographic disparities with certain states housing significantly higher numbers of dogs, and the seasonal trends in adoption postings peaking during Fall and Summer. Additionally, the analysis found no meaningful pattern between coat types and regional climates, and it highlighted the dominance of particular breeds, such as Labrador Retrievers and Pit Bull Terriers, in the shelter system. These insights have practical implications for adoption agencies, allowing them to develop targeted campaigns for underrepresented groups like senior dogs, relocate specific breeds to areas with higher demand, and align resources with peak posting periods.

However, the study had limitations, such as the challenges of cleaning inconsistent data, the exclusion of external factors like economic conditions, and reliance on descriptive methods. Future work could expand the dataset, incorporate predictive models, or integrate adopter feedback for more comprehensive insights. This project demonstrates the potential of data-driven approaches in tackling real-world issues while highlighting opportunities for further exploration and refinement.