Dogs come in all shapes, sizes, and personalities, and adoptable dogs are no exception. These dogs need goods and services just as humans do. Nearly every canine bone, leash, bowl, or bed comes to market through an extensive supply chain that is forecasted to meet demand. For example, bigger dogs need bigger beds, bones, collars, and cages. Smaller dogs need smaller products, which take up less retail shelf space and have lower freight costs.
With this data analysis, manufacturers can optimize their supply chains, and pet retailers can become more profitable through more accurate forecasts for canine-specific products. Ultimately, the analysis can reduce freight costs, prevent stockouts, and improve customer satisfaction.
____________________________________________________________________________________________________________________________
Our goal is to minimize shortages and reduce the cost of goods sold. This data analysis provides insights on adoptable canines across the country to improve purchasing and logistics, so that manufacturers and retailers get the right products to the right location at the right time.
This data analysis focuses on descriptive insights from a dataset on adoptable dogs. The data is from September 20, 2019, and is provided by PetFinder and GitHub. The analysis is intended to help canine manufacturers and retailers optimize merchandise in the regions with the most demand for specific products and services. Specifically, we’ll focus on three areas: the size, breed, and age of the dog.
# packages required to render this report
list.of.packages <- c("tidyverse", "scales", "readr", "maps", "DT", "knitr", "rmarkdown", "ggthemes", "plotly", "kableExtra", "tinytex", "ggalluvial", "ggbump")
library(tidyverse) # core packages for data wrangling and plotting
library(scales) # scale functions for visualization
library(readr) # to easily import delimited data
library(maps) # for geographical data
library(DT) # to create interactive tables in HTML
library(knitr) # for dynamic report generation
library(rmarkdown) # to render R Markdown documents into a variety of formats
library(ggthemes) # to apply a consistent theme across the report
library(plotly) # for interactive plotting
library(kableExtra) # for styling tables
library(tinytex) # lightweight LaTeX distribution for PDF output
library(ggalluvial) # to visualize frequency tables of categorical variables
library(ggbump) # bump charts for plotting rankings
The project contains data used in The Pudding essay Finding Forever Homes written by Amber Thomas and published in October 2019.
The dataset, dog_descriptions.csv, was downloaded from GitHub, where it is labeled Adoptable Dogs.
Our project uses the Finding Forever Homes data, which was initially collected from Petfinder.com on all adoptable dogs in the U.S. on a single day: September 20, 2019.
The data was originally collected for the Finding Forever Homes essay, which highlights where each state’s adoptable dogs are imported from and why they were relocated. The essay draws conclusions about the benefits and risks of transporting dogs for adoption.
The data comes in three csv files labeled dog_descriptions.csv, dog_moves.csv, and dog_travel.csv. However, we use only one of these datasets for this project.
dog_descriptions.csv has 58,180 entries with 36 variables. Each row represents an individual adoptable dog in the U.S. on September 20, 2019. Each dog has a unique I.D. number. Unless otherwise noted, all the data is exactly as reported by the shelter or rescue that posted an individual animal for adoption on PetFinder. Missing values are recorded as “NA” in the original datasets.
Data is imported from csv as shown below:
dog_descriptions <- readr::read_csv(url("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_descriptions.csv"))
Values in the original dataset were shifted into the wrong column, so the first cleaning step realigns them. We also cleaned the data by correcting misspellings and misinformation, and organized it to make it easier to work with.
After reviewing the variables in the dataset, we removed those that would not contribute to the story of our analysis. We kept only some of the original 36 variables in dog_descriptions.csv and dropped the rest.
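The column selection described above could be sketched as follows. This is a minimal illustration, not the code used for the report: the source column names (`breed_primary`, `contact_city`, `contact_state`, `contact_zip`) and the toy rows are assumptions chosen to resemble the raw file.

```r
library(dplyr)

# Toy stand-in for the raw data; column names and rows are assumptions.
raw_dogs <- tibble(
  id            = c(1, 2, 3),
  breed_primary = c("Pit Bull Terrier", "Chihuahua", "Boxer"),
  age           = c("Adult", "Baby", "Senior"),
  size          = c("Medium", "Small", "Large"),
  contact_city  = c("Atlanta", "Las Vegas", "Columbus"),
  contact_state = c("GA", "NV", "OH"),
  contact_zip   = c("30318", "89103", "43215"),
  url           = c("u1", "u2", "u3"),      # example of a column to drop
  coat          = c("Short", NA, "Short")   # example of a column to drop
)

# Keep only the variables needed for the analysis, renaming as we go.
dogs <- raw_dogs %>%
  select(
    id,
    breed = breed_primary,
    age,
    size,
    city  = contact_city,
    state = contact_state,
    zip   = contact_zip
  )
```

Renaming inside `select()` keeps the drop and the rename in a single step, so the shorter names are available to every later summary.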
After evaluating column specifications it is important to address missing values. To do this we utilize the is.na() function to determine which columns contain missing values.
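One way to run this check is to count the missing values in each column. The sketch below uses a small made-up data frame rather than the real dataset, so the counts are illustrative only.

```r
# Toy data with deliberate gaps (values are made up).
dogs <- data.frame(
  id    = 1:4,
  breed = c("Boxer", NA, "Hound", "Terrier"),
  size  = c("Large", "Small", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(dogs))

# Show only the columns that actually contain missing values.
na_counts[na_counts > 0]
```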
| Variable | Type | Description |
|---|---|---|
| id | numeric | Observation identifier |
| breed | character | Type of dog breed |
| age | character | Age of dog |
| size | character | Size of dog |
| city | character | City where dog is located |
| state | character | State where dog is located |
| zip | character | Zip code of where dog is located |
| region | character | United States region where dog is located |
For the purposes of our visualizations and analysis, we have divided the United States into eight regions, numbered 1 through 8, as noted on the map below.
Illustration: 8 Regions:
The dataset is based directly on Petfinder data for a single day, and almost all of its variables are character variables. After looking at the data, we found no apparent outliers.
Dog Breed
In our analysis, we took a look at which breeds were most popular in the US. Understanding the volume of dogs of a particular breed will allow for more products to be on shelves that are specific to customer needs. For example, hunting dogs may require more toys to run and play with while toy dogs may need more grooming products.
We started by finding the zip code and city with the highest concentration of the most popular breed in the US, the Pit Bull Terrier. This turned out to be zip code 30318 in Atlanta, GA, which had 242 Pit Bull Terriers.
| breed | city | zip | n |
|---|---|---|---|
| Pit Bull Terrier | Atlanta | 30318 | 242 |
| breed | city | zip | n |
|---|---|---|---|
| Chihuahua | Las Vegas | 89103 | 54 |
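Single-row tables like the two above can be produced by filtering to one breed, counting by city and zip code, and keeping the top row. The helper `top_location()` and the toy rows below are hypothetical and only illustrate the pattern.

```r
library(dplyr)

# Toy data (made-up rows) with the cleaned column names.
dogs <- tibble(
  breed = c("Pit Bull Terrier", "Pit Bull Terrier", "Pit Bull Terrier",
            "Chihuahua", "Chihuahua", "Chihuahua"),
  city  = c("Atlanta", "Atlanta", "Columbus",
            "Las Vegas", "Las Vegas", "Phoenix"),
  zip   = c("30318", "30318", "43215", "89103", "89103", "85001")
)

# Hypothetical helper: the city/zip pair with the most dogs of one breed.
top_location <- function(data, target_breed) {
  data %>%
    filter(breed == target_breed) %>%
    count(breed, city, zip, sort = TRUE) %>%  # largest counts first
    slice_head(n = 1)
}

pit_top <- top_location(dogs, "Pit Bull Terrier")
chi_top <- top_location(dogs, "Chihuahua")
```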
Ranking dogs by breed shows us which popular breeds to stock products for and whose needs to research further. In the US, the most popular breed is the Pit Bull Terrier, closely followed by the Labrador Retriever. There were over 7,000 dogs of each of these breeds on PetFinder.
| breed | breed_rank | n |
|---|---|---|
| Pit Bull Terrier | 1 | 7890 |
| Labrador Retriever | 2 | 7198 |
| Chihuahua | 3 | 3766 |
| Mixed Breed | 4 | 3242 |
| Terrier | 5 | 2641 |
| Hound | 6 | 2282 |
| German Shepherd Dog | 7 | 2122 |
| Boxer | 8 | 2050 |
| Shepherd | 9 | 1972 |
| American Staffordshire Terrier | 10 | 1862 |
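A ranking like the one above can be built by counting dogs per breed and numbering the sorted rows. This is a sketch on made-up data, and the object name `top_breeds` is an assumption.

```r
library(dplyr)

# Toy data: one row per dog (made-up counts).
dogs <- tibble(
  breed = c(rep("Pit Bull Terrier", 4), rep("Labrador Retriever", 3),
            rep("Chihuahua", 2), "Boxer")
)

top_breeds <- dogs %>%
  count(breed, sort = TRUE) %>%          # dogs per breed, largest first
  mutate(breed_rank = row_number()) %>%  # rank follows the sorted order
  select(breed, breed_rank, n) %>%
  slice_head(n = 3)                      # keep the top 3 (top 10 in the report)
```

`row_number()` gives every breed a distinct rank even when counts tie; `min_rank(desc(n))` would be the alternative if tied breeds should share a rank.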
Dog Size
In our analysis, we looked at where dogs of certain sizes were most popular in the US. Manufacturers and retailers could benefit from this by saving money on shipments of heavier, larger goods and distributing only what is needed to the right locations. For example, extra large dogs eat more food and need big crates and toys, while small dogs need small treats and collars.
First, we evaluated which dog sizes were most popular in different regions of the US. We predicted there would be big differences between regions due to rural vs. urban areas, climate, and the need for working dogs. However, we were surprised that, despite some differences, medium-sized dogs are the most common size in nearly every region. Extra large dogs are also far rarer than any other size, at under 3% of listings in every region.
We counted the dogs of each size per region here:
| region | size | n | prop |
|---|---|---|---|
| Southeast | |||
| Region-1 | Extra Large | 154 | 1.09 |
| Region-1 | Large | 3887 | 27.58 |
| Region-1 | Medium | 7631 | 54.15 |
| Region-1 | Small | 2420 | 17.17 |
| Mid South | |||
| Region-2 | Extra Large | 49 | 1.29 |
| Region-2 | Large | 1017 | 26.69 |
| Region-2 | Medium | 2113 | 55.46 |
| Region-2 | Small | 631 | 16.56 |
| Southwest | |||
| Region-3 | Extra Large | 151 | 2.94 |
| Region-3 | Large | 1329 | 25.84 |
| Region-3 | Medium | 2298 | 44.67 |
| Region-3 | Small | 1366 | 26.56 |
| West Coast | |||
| Region-4 | Extra Large | 25 | 0.96 |
| Region-4 | Large | 829 | 32.00 |
| Region-4 | Medium | 856 | 33.04 |
| Region-4 | Small | 881 | 34.00 |
| Pacific Northwest | |||
| Region-5 | Extra Large | 26 | 1.72 |
| Region-5 | Large | 471 | 31.17 |
| Region-5 | Medium | 673 | 44.54 |
| Region-5 | Small | 341 | 22.57 |
| Great Plains | |||
| Region-6 | Extra Large | 56 | 1.84 |
| Region-6 | Large | 900 | 29.60 |
| Region-6 | Medium | 1375 | 45.22 |
| Region-6 | Small | 710 | 23.35 |
| Midwest | |||
| Region-7 | Extra Large | 164 | 2.05 |
| Region-7 | Large | 2402 | 30.00 |
| Region-7 | Medium | 4030 | 50.34 |
| Region-7 | Small | 1410 | 17.61 |
| Northeast | |||
| Region-8 | Extra Large | 306 | 1.53 |
| Region-8 | Large | 4926 | 24.65 |
| Region-8 | Medium | 10932 | 54.70 |
| Region-8 | Small | 3821 | 19.12 |
After counting dogs by size in each region, we converted the counts into percentages of each region's total. The West Coast shows the weakest size preference: small, medium, and large dogs each make up roughly a third of its listings (32% to 34%).
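The count-then-percentage step can be sketched as below. The rows are made up, and `prop` is computed as a percentage to match the column in the table above.

```r
library(dplyr)

# Toy data: one row per dog (made-up rows).
dogs <- tibble(
  region = c(rep("Region-1", 4), rep("Region-4", 3)),
  size   = c("Medium", "Medium", "Large", "Small",
             "Small", "Medium", "Large")
)

size_by_region <- dogs %>%
  count(region, size) %>%                        # dogs per region and size
  group_by(region) %>%
  mutate(prop = round(100 * n / sum(n), 2)) %>%  # share within each region
  ungroup()
```

Grouping by region before `mutate()` makes `sum(n)` the regional total, so each region's percentages add up to roughly 100.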
From here, we looked at our own state of Ohio to see how many dogs there were of each size. In Ohio, most dogs are medium-sized.
| size | count_by_size |
|---|---|
| Extra Large | 67 |
| Large | 763 |
| Medium | 1321 |
| Small | 522 |
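A state-level count like this one needs only a filter and a count. The sketch below uses made-up rows; the column name `count_by_size` follows the table above.

```r
library(dplyr)

# Toy data (made-up rows).
dogs <- tibble(
  state = c("OH", "OH", "OH", "GA", "OH"),
  size  = c("Medium", "Medium", "Small", "Large", "Large")
)

ohio_sizes <- dogs %>%
  filter(state == "OH") %>%            # keep Ohio listings only
  count(size, name = "count_by_size")  # one row per size
```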
| state | size | n | percent_small |
|---|---|---|---|
| UT | Small | 300 | 61.86 |
| state | size | n | percent_large |
|---|---|---|---|
| SD | Large | 15 | 62.5 |
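The two single-row tables above appear to report the states with the highest share of small dogs and of large dogs. Under that assumption, they could be derived as in this sketch, which uses made-up rows and a generic `percent` column in place of `percent_small` and `percent_large`.

```r
library(dplyr)

# Toy data (made-up rows).
dogs <- tibble(
  state = c("UT", "UT", "UT", "SD", "SD", "OH", "OH", "OH", "OH"),
  size  = c("Small", "Small", "Large", "Large", "Large",
            "Small", "Medium", "Medium", "Large")
)

# Share of each size within each state.
size_share <- dogs %>%
  count(state, size) %>%
  group_by(state) %>%
  mutate(percent = round(100 * n / sum(n), 2)) %>%
  ungroup()

# State with the largest share of small dogs, and of large dogs.
top_small <- size_share %>% filter(size == "Small") %>% slice_max(percent, n = 1)
top_large <- size_share %>% filter(size == "Large") %>% slice_max(percent, n = 1)
```

Percentages from states with few listings are volatile: South Dakota's 62.5% rests on 15 large dogs out of only 24 listed, so it is worth reading `n` alongside the percentage.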
Dog Age
We looked at where dogs of certain ages were most common in the US. Dogs have evolving needs throughout their lives that owners must prepare for. Manufacturers and retailers therefore earn more customer trust when they can provide products for dogs of all ages. For example, puppies need training tools like clickers and harnesses, while elderly dogs need medical products.
We analyzed the zip codes where puppies were most commonly found in the US. A high count could indicate breeders or puppy mills in the area. Either way, more puppies should translate to more demand for puppy supplies in these areas.
| zip | city | state | age | n |
|---|---|---|---|---|
| 80126 | Littleton | CO | Baby | 87 |
| 11558 | Island Park | NY | Baby | 84 |
| 35051 | Columbiana | AL | Baby | 78 |
| 01810 | Andover | MA | Baby | 77 |
| 29607 | Greenville | SC | Baby | 77 |
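The puppy table above could be produced along these lines. The rows are made up, and treating the `Baby` age label as "puppy" is the only assumption carried over from the table.

```r
library(dplyr)

# Toy data (made-up rows).
dogs <- tibble(
  zip   = c("80126", "80126", "80126", "11558", "11558", "35051", "35051"),
  city  = c("Littleton", "Littleton", "Littleton",
            "Island Park", "Island Park", "Columbiana", "Columbiana"),
  state = c("CO", "CO", "CO", "NY", "NY", "AL", "AL"),
  age   = c("Baby", "Baby", "Baby", "Baby", "Baby", "Baby", "Adult")
)

puppy_zips <- dogs %>%
  filter(age == "Baby") %>%                       # puppies only
  count(zip, city, state, age, sort = TRUE) %>%   # largest counts first
  slice_head(n = 5)                               # top five zip codes
```

The same pattern, filtered on `age == "Senior"` and counted by `city` and `state` instead of `zip`, would give a city-level table for older dogs.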
We analyzed the cities where senior dogs were most commonly found in the US. A high count could reflect more families or elderly people in the area. Either way, more senior dogs should translate to more demand for elderly dog supplies in these cities.
| city | state | age | n |
|---|---|---|---|
| Kanab | UT | Senior | 86 |
| Las Vegas | NV | Senior | 75 |
| Phoenix | AZ | Senior | 74 |
| Chamblee | GA | Senior | 59 |
| New York | NY | Senior | 55 |
Finally, we evaluated which dog age was most common by percentage in each region of the US. As expected, we found adult dogs to be the most common age group in all regions. This makes sense, as the adult years span most of a dog’s life. Items for adult dogs will therefore be in the greatest demand in all pet stores.
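The most common age group per region can be found by computing each age's share within a region and keeping the largest. This is a sketch on made-up rows; the object name `top_age` is an assumption.

```r
library(dplyr)

# Toy data (made-up rows) using Petfinder-style age labels.
dogs <- tibble(
  region = c(rep("Region-1", 4), rep("Region-7", 3)),
  age    = c("Adult", "Adult", "Baby", "Senior",
             "Adult", "Adult", "Young")
)

top_age <- dogs %>%
  count(region, age) %>%
  group_by(region) %>%
  mutate(percent = round(100 * n / sum(n), 2)) %>%  # share within region
  slice_max(percent, n = 1, with_ties = FALSE) %>%  # top age group per region
  ungroup()
```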
Summary of Our Findings
After completing our data analysis, we believe market research is crucial for companies that want to source the right products for their customers. In the US, canine manufacturers and retailers should focus mainly on items for medium-sized, adult dogs. They should also keep specialty items for Pit Bull Terriers and Labrador Retrievers. Pet stores should bear in mind that many factors can affect an analysis based on just one day of data, so we recommend that they continue the research with similar PetFinder data over time. It is also important to note that there are many underlying reasons for our results that may not be explicit in the research we have presented. Finding other related data sources in the future can answer questions about our outcomes and give more context. Overall, our research gives canine manufacturers and suppliers a better understanding of the products needed most, so they can reduce costs while offering the best selection to their customers.