Dogs have long been known as “Man’s best friend,” but nowadays, there are so many pups out there who need a little extra love. Many have been abandoned or born as strays, and they’re just waiting for a warm, welcoming home!
Through the analysis of this data, we intend to identify patterns that adoption agencies can leverage to increase the placement of dogs into suitable homes. For instance, if the state of Florida has a significant population of huskies available for adoption, relocating these animals to states such as Michigan or Alaska may enhance their prospects of finding a permanent, nurturing home.
We will collect the provided data, perform data cleaning to ensure usability and accuracy, analyze the relationships among the various variables, and integrate external data sources to enhance the analysis. The questions we intend to answer are:
What breed of dog is least desirable by location or region?
What are the average ages of dogs entering the adoption system?
Is there a correlation between location and coat length?
Where are most shelter dogs from? Broken down by breed and age.
Is there a seasonal pattern to when dogs enter the adoption system?
When adoption agencies get a grasp on the types of dogs that people are looking for in various areas, they can help find loving homes for more pups in need. This means fewer sad and stressed dogs waiting in shelters, all eager to find their forever families!
To analyze this data, we will use the following R packages:
library(tidyverse)
library(knitr)
library(lubridate)
We will use the tidyverse package to effectively clean and transform our data, the knitr package to prepare a summary, and the lubridate package to easily work with the dates in our data.
This data was acquired from Petfinder on 09/20/2019. The original dataset contained three different .csv files: dog_adoptable, dog_descriptions, and dog_destination. The temperature data was obtained from NOAA National Centers for Environmental Information, Climate at a Glance: Statewide Mapping, Average Temperature, published September 2022, retrieved on October 8, 2022 from https://www.ncdc.noaa.gov/cag/.
Dog_adoptable contained:
Observations: 90
Variables: 5
There were no oddities in this data set
Dog_destination contained:
Observations: 6194
Variables: 8
The description variable contained observations of jumbled text
Dog_descriptions contained:
Observations: 58180
Variables: 35
Many peculiarities including:
33 observations were shifted over a column, either due to how the data was inputted or how the data was exported
Contained a “declawed” variable, likely because Petfinder also lists cats; it does not apply to dogs
Date accessed was a date, but date posted was a date/time
The .csv files were imported using Base R. We first examined the codebook, dimensions, and variable types, and checked for duplicate values in the unique id variable.
By doing that, we discovered the 33 observations that were shifted over a column in dog_descriptions. The next task was to correct that, which was done by separating the descriptions set into good data and bad data. The bad data was corrected by removing a variable and renaming the columns so it could be merged back into the good data set.
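The split-and-repair step described above can be sketched on toy data. This is only an illustration, not the code used in the project: the column names (`name`, `status`, `overflow`), the values, and the rule for detecting shifted rows are all assumptions.

```r
library(dplyr)

# Toy stand-in for dog_descriptions; every name and value here is hypothetical.
# Row 3 is shifted one column to the right: a stray value sits in `name`,
# and the real values spilled into `status` and `overflow`.
raw <- tibble(
  id       = c(1L, 2L, 3L),
  name     = c("Rex", "Bella", "stray-text"),
  status   = c("adoptable", "adoptable", "Milo"),
  overflow = c(NA, NA, "adoptable")
)

# Assumption: shifted rows are the ones with a value in the overflow column.
good <- raw %>% filter(is.na(overflow)) %>% select(-overflow)
bad  <- raw %>% filter(!is.na(overflow))

# Drop the column holding the stray value, then rename the remaining
# columns one step at a time so they line up with the good rows again.
bad_fixed <- bad %>%
  select(-name) %>%
  rename(name = status) %>%
  rename(status = overflow)

# Merge the repaired rows back into the good data.
repaired <- bind_rows(good, bad_fixed) %>% arrange(id)
```

Renaming in two separate steps avoids any ambiguity about reusing the name `status` within a single call.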
We then removed variables that we did not need: for example, species was removed because all observations are dogs, and photo was removed because all observations had NA values. Next, we converted the posted variable from character to date/time and then to date, converted the accessed variable from character to date, removed the three Unknown observations from the sex variable, and changed the NA values in the type variable to Dog.
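A minimal sketch of these column-level fixes, assuming toy rows and a simple `YYYY-MM-DD HH:MM:SS` timestamp format (the raw Petfinder format may differ and would need a matching parser):

```r
library(dplyr)
library(lubridate)

# Toy rows; values are hypothetical, column names follow the text above.
dogs <- tibble(
  id       = 1:4,
  species  = "Dog",
  photo    = NA,
  posted   = c("2019-09-01 10:15:00", "2019-08-06 08:00:00",
               "2019-07-18 17:30:00", "2019-07-11 12:00:00"),
  accessed = "2019-09-20",
  sex      = c("Male", "Female", "Unknown", "Female"),
  type     = c("Dog", NA, "Dog", "Dog")
)

cleaned <- dogs %>%
  select(-species, -photo) %>%               # constant or all-NA columns
  mutate(
    posted   = as_date(ymd_hms(posted)),     # character -> date/time -> date
    accessed = ymd(accessed),                # character -> date
    type     = if_else(is.na(type), "Dog", type)
  ) %>%
  filter(sex != "Unknown")                   # drop rows with unknown sex
```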
One of the biggest cleaning challenges was the contact_city variable. Many city names were capitalized differently, contained typos, or differed slightly in form (Saint vs. St.). We began by converting them all to title case, then grouped them by contact_state and contact_zip and filtered for state and zip code combinations that had more than one unique city. That result was inner-joined back to the data to retrieve the city names for those combinations, and sorted alphabetically for efficiency and accuracy. This allowed us to quickly see any typos or slight differences that we could correct.
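The city check could look roughly like the following. The rows are invented for illustration, and the final correction (preferring "St. Louis") is a hypothetical choice rather than one taken from the project.

```r
library(dplyr)
library(stringr)

# Toy contact data; only the column names come from the dataset.
cities <- tibble(
  contact_city  = c("saint louis", "St. Louis", "ALBANY", "Albany", "Chandler"),
  contact_state = c("MO", "MO", "NY", "NY", "AZ"),
  contact_zip   = c("63101", "63101", "12220", "12220", "85249")
)

# Step 1: normalise capitalisation.
cities <- cities %>% mutate(contact_city = str_to_title(contact_city))

# Step 2: find state/zip pairs that still map to more than one city spelling.
conflicts <- cities %>%
  group_by(contact_state, contact_zip) %>%
  summarise(n_cities = n_distinct(contact_city), .groups = "drop") %>%
  filter(n_cities > 1)

# Step 3: inner join back to list the competing spellings, sorted for review.
to_review <- cities %>%
  inner_join(conflicts, by = c("contact_state", "contact_zip")) %>%
  distinct(contact_state, contact_zip, contact_city) %>%
  arrange(contact_state, contact_zip, contact_city)

# Step 4: apply a manual correction once a preferred spelling is chosen.
cities <- cities %>%
  mutate(contact_city = if_else(contact_city == "Saint Louis",
                                "St. Louis", contact_city))
```

Title-casing first removes the purely cosmetic differences (ALBANY vs. Albany), so the review list only contains spellings that genuinely need a decision.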
We then joined the cleaned dog_descriptions with dog_destination to form dog_data. The final step was releveling the age category so that it is ordered from youngest (Baby) to oldest (Senior).
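Releveling can be done in base R by passing the desired order to `factor()`. The sketch below uses a made-up sample of the age column; placing Young before Adult between the two stated endpoints is an assumption.

```r
# Hypothetical sample of the age column.
age <- c("Senior", "Baby", "Adult", "Young", "Adult")

# Explicit level order: youngest to oldest (middle order is assumed).
age <- factor(age, levels = c("Baby", "Young", "Adult", "Senior"))

levels(age)      # level order now runs Baby, Young, Adult, Senior
as.integer(age)  # integer codes follow the level order
```

With the levels ordered this way, plots and tables list the categories from youngest to oldest instead of alphabetically.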
| id | breed_primary | breed_secondary | breed_mixed | breed_unknown | color_primary | color_secondary | color_tertiary | age | sex | size | coat | fixed | house_trained | special_needs | env_children | env_dogs | env_cats | posted | contact_city | contact_state | contact_zip | contact_country | accessed | type | found | manual | remove | still_there | avg_temp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41330726 | German Shepherd Dog | NA | FALSE | FALSE | NA | NA | NA | Young | Male | Large | NA | FALSE | FALSE | FALSE | NA | NA | NA | 2018-04-05 | Las Vegas | NV | 89146 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 51.57105 |
| 38169117 | Boxer | Pit Bull Terrier | TRUE | FALSE | Black | White / Cream | NA | Adult | Female | Large | Short | TRUE | TRUE | FALSE | NA | NA | FALSE | 2017-05-26 | Chandler | AZ | 85249 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 61.61579 |
| 45833989 | Beagle | NA | FALSE | FALSE | NA | NA | NA | Senior | Male | Medium | Short | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | 2019-09-01 | Albany | NY | 12220 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 46.60987 |
| 45515547 | Mixed Breed | NA | FALSE | FALSE | NA | NA | NA | Senior | Male | Medium | Short | TRUE | TRUE | FALSE | NA | NA | FALSE | 2019-08-06 | Albany | NY | 12220 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 46.60987 |
| 45294115 | Basset Hound | NA | FALSE | FALSE | Brown / Chocolate | White / Cream | NA | Senior | Female | Medium | Short | TRUE | TRUE | FALSE | FALSE | FALSE | NA | 2019-07-18 | Albany | NY | 12220 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 46.60987 |
| 45229004 | American Bulldog | NA | TRUE | FALSE | NA | NA | NA | Senior | Male | Large | Short | TRUE | TRUE | FALSE | TRUE | TRUE | NA | 2019-07-11 | Saugerties | NY | 12477 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 46.60987 |
| 45227052 | Mixed Breed | NA | FALSE | FALSE | White / Cream | NA | NA | Senior | Female | Medium | Short | TRUE | TRUE | FALSE | TRUE | TRUE | NA | 2019-07-11 | Saugerties | NY | 12477 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 46.60987 |
| 45569380 | Maltese | NA | FALSE | FALSE | White / Cream | NA | NA | Senior | Female | Small | Short | TRUE | TRUE | FALSE | TRUE | TRUE | NA | 2019-08-10 | Bristow | VA | 20136 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 56.58289 |
| 44694387 | Fox Terrier | Chihuahua | TRUE | FALSE | Bicolor | NA | NA | Young | Male | Small | Short | TRUE | FALSE | FALSE | FALSE | NA | FALSE | 2019-05-14 | Silver Spring | MD | 20905 | US | 2019-09-20 | Dog | NA | NA | NA | NA | 56.03355 |
| 36978896 | Alaskan Malamute | NA | FALSE | FALSE | Bicolor | NA | NA | Adult | Female | Large | NA | TRUE | TRUE | FALSE | NA | NA | FALSE | 2016-12-15 | Gettysburg | PA | 17325 | US | 2019-09-20 | Dog | Maryland | NA | TRUE | NA | 50.06053 |
| Variable | Type | Unique_Values | Missing_Values |
|---|---|---|---|
| id | integer | 57351 | 0 |
| breed_primary | character | 216 | 0 |
| breed_secondary | character | 191 | 38180 |
| breed_mixed | logical | 2 | 0 |
| breed_unknown | logical | 1 | 0 |
| color_primary | character | 16 | 32878 |
| color_secondary | character | 16 | 47321 |
| color_tertiary | character | 15 | 58162 |
| age | factor | 4 | 0 |
| sex | character | 2 | 0 |
| size | character | 4 | 0 |
| coat | character | 7 | 31748 |
| fixed | logical | 2 | 0 |
| house_trained | logical | 2 | 0 |
| special_needs | logical | 2 | 0 |
| env_children | logical | 3 | 30784 |
| env_dogs | logical | 3 | 23539 |
| env_cats | logical | 3 | 39782 |
| posted | Date | 1753 | 0 |
| contact_city | character | 2120 | 0 |
| contact_state | character | 53 | 0 |
| contact_zip | character | 3464 | 12 |
| contact_country | character | 2 | 0 |
| accessed | Date | 1 | 0 |
| type | character | 1 | 0 |
| found | character | 601 | 53266 |
| manual | character | 67 | 57288 |
| remove | logical | 2 | 57685 |
| still_there | logical | 2 | 59090 |
| avg_temp | numeric | 49 | 498 |
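A summary like the table above can be produced with a few `vapply()` calls. This is a sketch on a toy data frame, not the project's code; note that `unique()` counts NA as a value, which would explain logical columns showing three unique values.

```r
# Toy data frame; values are hypothetical, column names echo the dataset.
toy <- data.frame(
  id       = c(1L, 2L, 3L),
  coat     = c("Short", NA, "Short"),
  env_cats = c(TRUE, FALSE, NA),
  posted   = as.Date(c("2019-09-01", "2019-08-06", "2019-07-18")),
  stringsAsFactors = FALSE
)

# One row per variable: first class, distinct values (NA included), NA count.
summary_tbl <- data.frame(
  Variable       = names(toy),
  Type           = unname(vapply(toy, function(x) class(x)[1], character(1))),
  Unique_Values  = unname(vapply(toy, function(x) length(unique(x)), integer(1))),
  Missing_Values = unname(vapply(toy, function(x) sum(is.na(x)), integer(1))),
  stringsAsFactors = FALSE
)
```

Passing `summary_tbl` to `knitr::kable()` would render it as a table in the knitted report.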
To answer some of our questions, such as whether there is a correlation between coat length and location, we will need to access other data, such as average temperatures or latitude per state or country. We will also need to map each month of the year to a season, to determine in which seasons dogs tend to enter the adoption system. Bar graphs and tables will be very useful in visually demonstrating the patterns we find. We would also like to determine how most dogs enter the adoption system; however, that information is buried in jumbled description variables, and we are not yet confident about how to extract it efficiently. We do not plan on using any machine learning techniques.
There are a few ways to determine the least desirable dog breed. We will answer this through the number of each dog breed and the average age of each dog breed.
By total number in the system, the least desirable dog breed is the Pit Bull Terrier, with 7783 observations. The rest of the top ten are: Labrador Retriever with 7082; Chihuahua with 3714; Mixed Breed with 3240; Terrier with 2609; Hound with 2267; German Shepherd Dog with 2097; Boxer with 2001; Shepherd with 1939; and American Staffordshire Terrier with 1800.
The least desirable dog breed by state is determined by counting each breed within a state, taking the most numerous breed in that state, and then tallying how many states each breed leads. Labrador Retrievers are the most populous shelter dog breed in 23 states, Pit Bull Terriers in 19, Mixed Breeds in 4, and Huskies in 2. American Staffordshire Terrier, Border Collie, Chihuahua, Golden Retriever, and Shar-Pei lead in 1 state each.
## # A tibble: 9 × 2
## breed_primary states_count
## <chr> <int>
## 1 Labrador Retriever 23
## 2 Pit Bull Terrier 19
## 3 Mixed Breed 4
## 4 Husky 2
## 5 American Staffordshire Terrier 1
## 6 Border Collie 1
## 7 Chihuahua 1
## 8 Golden Retriever 1
## 9 Shar-Pei 1
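A tally of this shape (most common breed per state, then states per breed) could be computed as follows. The rows are invented, and breaking ties by keeping a single breed per state is an assumption about the method.

```r
library(dplyr)

# Toy listings; values are hypothetical.
dogs <- tibble(
  contact_state = c("NY", "NY", "NY", "AZ", "AZ", "NV", "NV", "NV"),
  breed_primary = c("Beagle", "Beagle", "Boxer", "Boxer", "Boxer",
                    "Husky", "Boxer", "Boxer")
)

# Most numerous breed within each state (one row per state).
top_by_state <- dogs %>%
  count(contact_state, breed_primary, name = "n_dogs") %>%
  group_by(contact_state) %>%
  slice_max(n_dogs, n = 1, with_ties = FALSE) %>%
  ungroup()

# Number of states in which each breed is the most numerous.
states_per_breed <- top_by_state %>%
  count(breed_primary, name = "states_count", sort = TRUE)
```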
The graph shows the age distribution of dogs in the adoption system. Adult dogs form the largest group (47.6%), which could suggest they face more challenges in being adopted than other age groups. Young dogs are the second-largest group (28.3%), which may indicate both their appeal and that not all are placed quickly. Puppies (the Baby category, 16.5%) are fewer but still have a notable presence, possibly reflecting high demand alongside a steady intake. Senior dogs make up the smallest group (7.67%), which might reflect either lower intake or difficulty rehoming older pets. This distribution invites further analysis, such as exploring the factors behind these trends or strategies to boost adoption rates for specific age groups such as seniors.
The most common age category for dogs put up for adoption is Adult, at 47.6% of all dogs listed. The breeds most likely to be put up for adoption at an elderly age are the Bolognese, Skye Terrier, Pembroke Welsh Corgi, Kyi Leo, Finnish Spitz, Sheep Dog, Lhasa Apso, Smooth Collie, Miniature Poodle, and the Silky Terrier. The breeds most likely to be put up for adoption at a young age are the Spinone Italiano, Lakeland Terrier, Mountain Dog, Irish Terrier, Tibetan Mastiff, Petit Basset Griffon Vendeen, Portuguese Podengo, Wirehaired Dachshund, Border Terrier, and the Ibizan Hound. In the tables below, a higher avg_age value indicates an older age category.
## # A tibble: 4 × 3
## age count percentage
## <fct> <int> <dbl>
## 1 Adult 28249 47.6
## 2 Young 16807 28.3
## 3 Baby 9798 16.5
## 4 Senior 4555 7.67
## # A tibble: 10 × 2
## breed_primary avg_age
## <chr> <dbl>
## 1 Bolognese 4
## 2 Skye Terrier 4
## 3 Pembroke Welsh Corgi 3.56
## 4 Kyi Leo 3.5
## 5 Finnish Spitz 3.4
## 6 Sheep Dog 3.33
## 7 Lhasa Apso 3.25
## 8 Smooth Collie 3.25
## 9 Miniature Poodle 3.23
## 10 Silky Terrier 3.12
## # A tibble: 10 × 2
## breed_primary avg_age
## <chr> <dbl>
## 1 Spinone Italiano 1
## 2 Lakeland Terrier 1.2
## 3 Mountain Dog 1.22
## 4 Irish Terrier 1.65
## 5 Tibetan Mastiff 1.7
## 6 Petit Basset Griffon Vendeen 1.75
## 7 Portuguese Podengo 1.75
## 8 Wirehaired Dachshund 1.75
## 9 Border Terrier 1.8
## 10 Ibizan Hound 1.8
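The tables above could be reproduced along these lines. The rows are toy data, and treating avg_age as the mean of the age factor's integer codes (1 = Baby through 4 = Senior) is an assumption that fits the 1 to 4 range in the output but is not confirmed by the text.

```r
library(dplyr)

# Toy listings; values are hypothetical.
dogs <- tibble(
  breed_primary = c("Bolognese", "Bolognese", "Beagle", "Beagle", "Beagle",
                    "Spinone Italiano"),
  age = factor(c("Senior", "Senior", "Adult", "Young", "Senior", "Baby"),
               levels = c("Baby", "Young", "Adult", "Senior"))
)

# Share of listings in each age category.
age_share <- dogs %>%
  count(age, name = "count", sort = TRUE) %>%
  mutate(percentage = 100 * count / sum(count))

# Assumed definition of avg_age: mean of the factor's integer codes per breed.
breed_age <- dogs %>%
  group_by(breed_primary) %>%
  summarise(avg_age = mean(as.integer(age)), .groups = "drop") %>%
  arrange(desc(avg_age))
```

Breeds with only a handful of listings can reach the extremes of this scale easily, so filtering on a minimum count per breed may be worth considering before ranking.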
This heatmap and boxplot show the relationship between average state temperatures and coat types. The heatmap uses a gradient from dark to light blue: states with lower average temperatures appear in darker shades, and those with higher averages in lighter ones. Across the coat categories (curly, hairless, and so on), no meaningful pattern linked to regional climate emerges. This suggests that climate is an underused consideration when matching dogs to homes, with implications for animal welfare and for informing owners about breeds suited to their area.
This bar chart offers a striking overview of the distribution of shelter dogs across states, while also providing an age-based breakdown using color-coded categories. States like Georgia, Nevada, and Ohio stand out for their notably higher number of shelter dogs, which might warrant further investigation into factors such as population density or shelter intake policies. The use of distinct colors for age groups makes it easy to see how age distribution varies within each state. Adults appear to dominate in most states, whereas babies, young dogs, and seniors are less prevalent but still visible. This visualization effectively highlights key trends and opens the door for deeper analysis, such as understanding the reasons for higher dog populations in certain states or exploring how age impacts adoption rates across different regions.
This bar chart shows the seasonal trends in dog adoption postings. Fall has the highest number of dogs posted for adoption, closely followed by Summer. Spring sees a moderate number of postings, while Winter has the lowest. These patterns might reflect practical or societal factors, such as changes in shelter intake rates, adopter interest, or seasonal events like holidays that could influence posting activity. Because the data is a single snapshot accessed on 09/20/2019, recent postings may also be overrepresented, which could contribute to the Summer and Fall peaks. The distinction between seasons suggests an opportunity for shelters and agencies to optimize their efforts, perhaps by aligning campaigns or resources with peak and low activity periods.
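The month-to-season mapping behind a chart like this could be sketched as below. The season boundaries (meteorological seasons for the Northern Hemisphere) and the helper name `season_of` are assumptions, and the dates are invented.

```r
library(dplyr)
library(lubridate)

# Hypothetical posting dates.
posted <- as.Date(c("2019-01-15", "2019-04-02", "2019-07-11", "2019-08-06",
                    "2019-09-01", "2019-10-20", "2019-11-03", "2018-12-25"))

# Assumed mapping: Dec-Feb Winter, Mar-May Spring, Jun-Aug Summer, Sep-Nov Fall.
season_of <- function(d) {
  m <- month(d)
  case_when(
    m %in% c(12, 1, 2) ~ "Winter",
    m %in% 3:5         ~ "Spring",
    m %in% 6:8         ~ "Summer",
    TRUE               ~ "Fall"
  )
}

# Count postings per season, largest first.
season_counts <- tibble(posted = posted) %>%
  mutate(season = season_of(posted)) %>%
  count(season, name = "n_posted", sort = TRUE)
```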
This analysis aimed to address the challenges surrounding the high number of dogs in shelters by identifying patterns and trends that could help increase their chances of adoption. Using a dataset sourced from Petfinder and enhanced with NOAA temperature data, we carefully cleaned and structured the information to ensure usability and accuracy. Key variables were examined, and relationships between factors such as breed, age, location, and seasonality were analyzed using statistical methods and visualizations.
The findings revealed several important insights, including the prevalence of adult dogs in shelters, the geographic disparities with certain states housing significantly higher numbers of dogs, and the seasonal trends in adoption postings peaking during Fall and Summer. Additionally, the analysis found no meaningful pattern between coat types and regional climates, and it highlighted the dominance of particular breeds, such as Labrador Retrievers and Pit Bull Terriers, in the shelter system. These insights have practical implications for adoption agencies, allowing them to develop targeted campaigns for underrepresented groups like senior dogs, relocate specific breeds to areas with higher demand, and align resources with peak posting periods.
However, the study had limitations, such as the challenges of cleaning inconsistent data, the exclusion of external factors like economic conditions, and reliance on descriptive methods. Future work could expand the dataset, incorporate predictive models, or integrate adopter feedback for more comprehensive insights. This project demonstrates the potential of data-driven approaches in tackling real-world issues while highlighting opportunities for further exploration and refinement.