1 Introduction

This project will analyze animal intake and outcome data from the Austin Animal Center, a large municipal organization that processes thousands of animals annually. The analysis will use data from three distinct sources.

The primary data is drawn from two publicly maintained datasets published on the City of Austin’s official open data portal. These datasets contain detailed records for every animal processed by the center.

A third data source, daily weather conditions, will be retrieved using the GSODR R package. This package downloads data for the Austin area directly from the National Oceanic and Atmospheric Administration’s (NOAA) “Global Summary of the Day” (GSOD) database.

Figure 1. The Austin Animal Center (Image Source: City of Austin)

These datasets were chosen because they provide an opportunity to understand the operational side of an animal shelter.

The goal of this investigation is to identify patterns in the shelter’s population, such as the most common animal types and breeds, and to analyze the key factors that influence animal outcomes such as adoption, transfer, or return to owner. Specifically, this project will explore how an animal’s characteristics (such as type, breed, and age) and external factors (such as daily weather) relate to its total length of stay in the shelter.

2 Data Preparation

This analysis integrates three distinct data sources into a single, analysis-ready dataset.

3 Data Sourcing

  • Austin Animal Center Data: Two primary datasets, “Austin Animal Center Intakes” and “Austin Animal Center Outcomes,” were downloaded as CSV files from the City of Austin’s open data portal. I imported them into R using the read.csv() function.

  • Weather Data: I obtained the third dataset, daily weather, using the GSODR package. First, I used the package’s get_isd_history() function to identify the station ID for “AUSTIN-BERGSTROM INTL AIRPORT” (722540-13904). Then, I used the get_GSOD() function to download the daily weather summaries for this station for the years 2013 to 2025.

4 Data Processing and Cleaning

4.1 Challenges

The most significant challenge in data preparation was correctly linking an animal’s intake record to its corresponding outcome record. A single Animal.ID can appear multiple times in both datasets, which means the animal had multiple shelter stays. A simple inner_join on Animal.ID would therefore pair every intake of such an animal with every one of its outcomes, creating many incorrect many-to-many matches.

I resolved the problem through these steps:

  • The Source.Date (from Intakes) and Outcome.Date (from Outcomes) columns were parsed from character strings into date-time objects using the parse_date_time() function.

  • After the two datasets were joined on Animal.ID, all rows where the Outcome.Date occurred before the Intake.Date were removed.

  • To solve the many-to-many problem, the data was grouped by Animal.ID and Intake.Date. The slice_min() function was then applied to select only the earliest outcome for each unique intake event. This step made sure that each row in the Clean_Data frame represents a single shelter visit.
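The three steps above can be sketched with dplyr and lubridate as follows. This is a minimal illustration rather than the project’s actual code: the toy records are hypothetical, the intake timestamp column is called Intake.Date throughout, the timestamps use a simplified 24-hour “mdy HMS” format (the real CSV files may need a different `orders` value), and `relationship = "many-to-many"` assumes dplyr 1.1.0 or later, where it marks the many-to-many join as intentional.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy records: animal A001 has two separate shelter stays.
intakes <- data.frame(
  Animal.ID   = c("A001", "A001", "A002"),
  Intake.Date = c("01/05/2020 10:00:00", "03/01/2021 09:30:00",
                  "06/10/2020 14:15:00"),
  stringsAsFactors = FALSE
)
outcomes <- data.frame(
  Animal.ID    = c("A001", "A001", "A002"),
  Outcome.Date = c("01/20/2020 11:00:00", "03/15/2021 16:00:00",
                   "06/12/2020 13:00:00"),
  Outcome.Type = c("Adoption", "Return to Owner", "Transfer"),
  stringsAsFactors = FALSE
)

# Step 1: parse the character timestamps into date-time (POSIXct, UTC) values.
intakes  <- mutate(intakes,  Intake.Date  = parse_date_time(Intake.Date,  orders = "mdy HMS"))
outcomes <- mutate(outcomes, Outcome.Date = parse_date_time(Outcome.Date, orders = "mdy HMS"))

clean_data <- intakes %>%
  # Pair every intake with every outcome of the same animal (many-to-many).
  inner_join(outcomes, by = "Animal.ID", relationship = "many-to-many") %>%
  # Step 2: drop pairings where the outcome happened before the intake.
  filter(Outcome.Date >= Intake.Date) %>%
  # Step 3: keep only the earliest remaining outcome for each intake event.
  group_by(Animal.ID, Intake.Date) %>%
  slice_min(Outcome.Date, n = 1, with_ties = FALSE) %>%
  ungroup()

clean_data
```

In this toy example, A001’s second intake (March 2021) is initially paired with both of its outcomes; the filter removes the January 2020 outcome, and slice_min() keeps “Return to Owner” as the single outcome for that stay.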

Once the intake and outcome data were cleaned, they were joined with the weather dataset. To do this, the Intake.Date was converted to a simple Date object (dropping the time of day) and matched against the weather data’s YEARMODA column as the join key.
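A minimal sketch of this date-key join, assuming the weather table’s YEARMODA column is already of class Date and its daily mean temperature is in a TEMP column (°C); all rows are hypothetical. Converting with `as.Date(..., tz = "UTC")`, the same time zone parse_date_time() uses by default, keeps the recorded calendar day unchanged.

```r
library(dplyr)

# Hypothetical cleaned intakes with POSIXct timestamps (parsed as UTC).
clean_data <- data.frame(
  Animal.ID   = c("A001", "A002", "A003"),
  Intake.Date = as.POSIXct(c("2020-01-05 10:00:00", "2020-01-05 16:30:00",
                             "2020-07-04 09:00:00"), tz = "UTC")
)

# Hypothetical weather rows: one per day, mean temperature in degrees Celsius.
weather <- data.frame(
  YEARMODA = as.Date(c("2020-01-05", "2020-07-04")),
  TEMP     = c(8.3, 30.1)
)

final_data <- clean_data %>%
  # Drop the time of day so each intake matches one daily weather record.
  mutate(Intake.Day = as.Date(Intake.Date, tz = "UTC")) %>%
  # left_join keeps intakes whose day has no weather record (TEMP becomes NA).
  left_join(weather, by = c("Intake.Day" = "YEARMODA")) %>%
  rename(Avg_Temp_on_Intake = TEMP)

final_data
```

Using left_join() rather than inner_join() is a design choice in this sketch: it keeps every intake and makes any missing weather days visible as NA instead of silently dropping them.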

I also faced a problem with the data source itself. I first tried to use the newest data model, but quickly found that it did not contain enough data for this project, so I switched to the larger, older dataset. This change was not simple: adapting my code to the older data’s structure took a significant amount of time.

4.2 Engineered variables

  • Season.of.Intake: The Intake.Month was extracted from the Intake.Date. A custom function I created mapped this month to one of four seasons.

  • Age_in_Months: The animal’s date of birth and its date of intake were used to calculate its age at intake in days. The pmax() function set any negative ages (birth dates recorded after the intake date) to 0, and the ages in days were then converted into months.
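Both engineered variables can be sketched in base R. The month-to-season mapping below uses meteorological seasons (December to February as Winter, March to May as Spring, and so on), which is an assumption because the report does not list its custom function’s exact mapping. The day-to-month conversion likewise assumes an average month of 365.25 / 12 = 30.4375 days; both helper names are hypothetical.

```r
# Hypothetical helper: month number (1-12) to meteorological season.
# The report does not list its exact mapping, so this split is an assumption.
season_from_month <- function(month) {
  seasons <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
               "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
  factor(seasons[month], levels = c("Winter", "Spring", "Summer", "Fall"))
}

# Hypothetical helper: age at intake in months.
# pmax() sets negative ages (birth date after intake date) to 0;
# the divisor assumes an average month of 30.4375 days.
age_in_months <- function(date_of_birth, intake_date) {
  age_days <- as.numeric(difftime(intake_date, date_of_birth, units = "days"))
  pmax(age_days, 0) / 30.4375
}

# Usage with hypothetical dates; the second birth date is after the intake.
intake_dates <- as.Date(c("2020-01-15", "2020-07-01"))
birth_dates  <- as.Date(c("2019-01-15", "2020-08-01"))

intake_season <- season_from_month(as.integer(format(intake_dates, "%m")))
age_months    <- age_in_months(birth_dates, intake_dates)
```

Returning a factor with explicit levels keeps the seasons in calendar order in tables and plots instead of alphabetical order.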

4.3 Handling NA Values

Missing values in the Intake.Health.Condition variable appeared as empty strings ("") after import, because read.csv() does not treat blank text cells as missing by default. The na_if() function was used to convert all empty strings in this column into proper NA values so they would not cause problems during analysis.
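A minimal sketch of this conversion with dplyr’s na_if(); the rows are hypothetical and only the column name comes from the text.

```r
library(dplyr)

# Hypothetical rows: read.csv() leaves blank text cells as "" rather than NA.
intake_data <- data.frame(
  Animal.ID               = c("A001", "A002", "A003", "A004"),
  Intake.Health.Condition = c("Normal", "", "Sick", ""),
  stringsAsFactors = FALSE
)

# Replace empty strings in this one column with proper NA values.
intake_data <- intake_data %>%
  mutate(Intake.Health.Condition = na_if(Intake.Health.Condition, ""))

sum(is.na(intake_data$Intake.Health.Condition))
```

An alternative is to treat blanks as missing at import time with `read.csv(..., na.strings = c("", "NA"))`; note that this applies to every column rather than just Intake.Health.Condition.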

The final dataset contains 172,257 observations and 7 variables.

4.4 Variable Descriptions

ACP Variable Summary

Variable Name        Data Type   Description
-------------------  ----------  ----------------------------------------------------------------
Animal_Type          factor      The type of animal.
Health_Condition     factor      The health condition of the animal upon intake.
Outcome              factor      The final outcome status for the animal.
Intake_Season        factor      The season during which the animal was taken in.
Age_in_Months        numeric     The animal’s age at intake, in months.
Days_in_Shelter      numeric     The total number of days the animal spent in the shelter.
Avg_Temp_on_Intake   numeric     The average temperature (°C) on the day of the animal’s intake.

5 Univariate Analyses



This bar chart displays the number of animal intakes in each of the four seasons. The distribution is not uniform, indicating a distinct seasonal pattern. Summer is the busiest season, with about 47,000 intakes. Fall (about 45,000) and Spring (about 44,000) follow closely, while Winter has noticeably fewer, at approximately 36,000. Intake counts therefore peak in Summer and reach their lowest point in Winter, a difference of roughly 11,000 intakes between the busiest and quietest seasons.



This bar chart illustrates the frequency of different intake types. The data is heavily concentrated in one category, which identifies the main way animals enter the shelter. The most common intake type is “Stray”, with 118,078 intakes, or 68.5% of the total. This is substantially higher than the second most common category, “Owner Surrender”, with 35,257 intakes (20.5%). The remaining categories, such as “Public Assist” (10,364) and “Wildlife” (6,422), represent a much smaller fraction of total intakes.
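The percentages above follow directly from the reported counts and the 172,257-row total, and the short calculation below reproduces them. With the full data frame, `prop.table(table(...))` on the intake-type column would give the same shares; here a named vector simply holds the counts quoted in the text.

```r
# Intake counts quoted in the text; the total is the size of the final dataset.
intake_counts <- c("Stray" = 118078, "Owner Surrender" = 35257,
                   "Public Assist" = 10364, "Wildlife" = 6422)
total_intakes <- 172257

# Share of all intakes, as a percentage rounded to one decimal place.
shares <- round(100 * intake_counts / total_intakes, 1)
shares

# Intakes that fall into the remaining, smaller categories.
other_intakes <- total_intakes - sum(intake_counts)
```

The four named categories account for 170,121 intakes, leaving 2,136 in the remaining categories.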



The histogram of animal age at intake reveals a right-skewed distribution. A very large proportion of incoming animals are young: the highest count occurs in the first bin, near 0 months, representing very young animals or newborns. The median age at intake is 12.2 months, which shows that the typical animal is quite young. The mean age, 24.7 months, is considerably higher because it is pulled up by the long right tail of older animals. The large standard deviation of 34.7 months reflects this wide spread. The distribution also shows smaller peaks around 12 and 24 months, suggesting age rounding (sometimes called age heaping), where owners or staff record estimated ages as one or two years.



The distribution of the length of time animals stay in the shelter is visualized with a log-transformed x-axis. This was necessary because the raw data is strongly right-skewed: most stays are short, but there is a long tail of very long stays. On the log scale the distribution appears multimodal, with a peak around 5 days and a smaller peak near 1 day. The median length of stay is 6 days, while the mean is 21.1 days, pulled higher by the long stays in the tail. The standard deviation of 48.5 days shows substantial variability in how long animals remain in the shelter.
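The median, mean, and standard deviation quoted for age at intake and length of stay can be computed with a small helper like the one below; the function name and the toy values are hypothetical. The sketch also shows one practical detail of a log-scaled axis: log10(0) is -Inf, so any stay recorded as exactly 0 days cannot appear on a log axis unless an offset is added. The log10(x + 1) transform used here is one common workaround, offered as an assumption rather than a description of how the report’s plot was made.

```r
# Hypothetical helper: the summary statistics quoted for skewed variables.
describe_skewed <- function(x) {
  c(median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}

# Toy right-skewed lengths of stay in days (hypothetical values).
days <- c(0, 1, 1, 2, 5, 5, 6, 7, 30, 180)
summary_stats <- describe_skewed(days)
summary_stats

# A 0-day stay maps to -Inf on a plain log scale, so offset by 1 first.
log_days <- log10(days + 1)
```

As in the real data, a few very long stays pull the mean (23.7 days here) well above the median (5 days).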



6 Multivariate Analyses



This scatter plot compares “Age at Intake” (x-axis) against “Days in Shelter” (y-axis), with both axes on a log scale. The data points are color-coded by “Animal Type”. Most animals, primarily Stray (blue) and Owner Surrender (green), form a large cloud with no strong correlation between age and length of stay. Wildlife intakes (pink) show a distinct pattern, with nearly all points clustered at the bottom of the graph, indicating very short stays. Euthanasia Request (yellow) points are few and cluster in the lower right, representing older animals with short shelter stays. The relationship between age and shelter duration is very weak, with a correlation coefficient of 0.04.



This series of box plots illustrates the distribution of “Days in Shelter” (y-axis, log scale) for the four main “Outcome Types”, and there are clear differences in length of stay between outcomes. Adoption is associated with the longest shelter stays, with a median of 14 days. Euthanasia has the shortest stays, with a median of about 1 day, and many cases appear to be resolved within a few hours. Return to Owner and Transfer fall between these two groups, with medians of roughly 1 and 3 days, respectively.
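Computing the group medians directly, rather than reading them off a log-scaled box plot, gives exact values to quote. A minimal base-R sketch, using hypothetical rows in place of the cleaned dataset:

```r
# Hypothetical rows; real values come from the cleaned, joined dataset.
stays <- data.frame(
  Outcome = c("Adoption", "Adoption", "Adoption", "Euthanasia", "Euthanasia",
              "Return to Owner", "Transfer", "Transfer"),
  Days_in_Shelter = c(10, 14, 30, 0.2, 1, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Median length of stay for each outcome type.
median_stay <- aggregate(Days_in_Shelter ~ Outcome, data = stays, FUN = median)
median_stay
```

Because the median ignores how extreme the longest stays are, it is a more representative summary than the mean for these right-skewed durations.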



This stacked bar chart shows the relative proportions of different “Intake Types” across the four “Seasons of Intake”. The most striking feature of this graph is the stability of these proportions in Fall, Spring, and Summer. Stray (blue) is the largest category in every season, making up roughly two-thirds of intakes in those three seasons. Owner Surrender (red) is the next most common, consistently accounting for roughly 20% of intakes. Winter is the exception: the proportion of Stray intakes drops by approximately 6 percentage points. This tells us that while the total number of intakes changes from season to season, the mix of intake types remains largely steady, with Winter’s lower Stray share as the main departure.
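Within-season shares and a percentage-point difference can be computed with prop.table(). The toy data below is hypothetical, and comparing Winter with Summer is an illustrative choice, since the report does not state which baseline its 6-point figure uses.

```r
# Hypothetical toy intakes; the real analysis uses the full cleaned dataset.
season      <- c(rep("Summer", 10), rep("Winter", 10))
intake_type <- c(rep("Stray", 7), rep("Owner Surrender", 3),
                 rep("Stray", 6), rep("Owner Surrender", 4))

# Row-wise proportions: each season's intake-type shares sum to 1.
shares <- prop.table(table(season, intake_type), margin = 1)

# Change in the Stray share, in percentage points (Winter minus Summer).
stray_diff_pp <- 100 * (shares["Winter", "Stray"] - shares["Summer", "Stray"])
stray_diff_pp
```

A percentage-point difference compares two shares directly (70% to 60% is 10 points), which is not the same as a relative change (a 14% decrease in that example).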



This faceted stacked bar chart compares outcomes by the animal’s “Intake Health Condition” (Injured, Normal, Sick), with separate panels for “Owner Surrender” and “Stray” animals. For animals in ‘Normal’ condition, ‘Adoption’ (red) is the largest outcome in both groups, at approximately 50% for ‘Stray’ animals and 71% for ‘Owner Surrender’ animals. Health status matters most for ‘Sick’ animals: the ‘Euthanasia’ (green) proportion rises sharply, to over 15% for ‘Owner Surrender’ cases. A key difference between the panels is that ‘Return to Owner’ (blue) is a major outcome for ‘Stray’ animals, around 15% of those in ‘Normal’ condition, but is nearly nonexistent for ‘Owner Surrender’ animals.



This scatter plot examines the relationship between “Avg_Temp_on_Intake” and the total number of animals processed on days with that temperature. The plot shows a clear but non-linear relationship. Intakes are very low on cold days, below 5°C. As the temperature warms, the number of intakes steadily increases, as shown by the red line. The volume of intakes peaks at hot temperatures, roughly between 25°C and 30°C. However, at the very hottest temperatures, above 30°C, the number of daily intakes appears to drop off. The Pearson correlation coefficient between the daily temperature (°C) and the number of intakes is 0.63. This positive value indicates a general tendency for more intakes on warmer days; because Pearson’s r measures only linear association, it does not capture the drop-off above 30°C.
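A minimal sketch of the correlation step, assuming intake timestamps have already been reduced to calendar days and that the weather table has YEARMODA (Date) and TEMP (°C) columns; all values are hypothetical. The last lines average intakes per day within 5°C bins, an optional check (not part of the original analysis) that separates genuinely lower daily volume from temperature ranges that simply occur on fewer days. Days with no intakes would need to be added as zero counts, which table() alone does not do.

```r
# Hypothetical intake days and matching daily mean temperatures (deg C).
intake_day <- as.Date(c("2020-01-01", "2020-01-01", "2020-04-01",
                        "2020-04-01", "2020-04-01", "2020-07-01",
                        "2020-07-01", "2020-07-01", "2020-07-01"))
weather <- data.frame(
  YEARMODA = as.Date(c("2020-01-01", "2020-04-01", "2020-07-01")),
  TEMP     = c(4.0, 20.0, 29.0)
)

# Count intakes per calendar day, then attach that day's temperature.
daily <- as.data.frame(table(Intake.Day = intake_day), stringsAsFactors = FALSE)
daily$Intake.Day <- as.Date(daily$Intake.Day)
daily <- merge(daily, weather, by.x = "Intake.Day", by.y = "YEARMODA")

# Pearson correlation between daily temperature and daily intake count.
r <- cor(daily$TEMP, daily$Freq, method = "pearson")

# Optional check: mean intakes per day within 5-degree temperature bins.
daily$Temp_Bin <- cut(daily$TEMP, breaks = seq(0, 35, by = 5))
per_day_by_bin <- aggregate(Freq ~ Temp_Bin, data = daily, FUN = mean)
```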



This box plot shows the distribution of “Intake_Year” for each “Animal_Type” category and reveals a shift in intake classifications over time. The ‘Euthanasia Request’ category is heavily concentrated in the early years of the dataset, with a median around 2014, which suggests that this classification was largely discontinued or reclassified after 2015. Conversely, the ‘Abandoned’ category is much more recent, with a median intake year of 2023 and its entire interquartile range falling within the last few years. Standard categories such as ‘Owner Surrender’, ‘Public Assist’, and ‘Stray’ span the full timeline (2013-2025). ‘Wildlife’ intakes, however, have a somewhat earlier median year of 2017, suggesting that wildlife intakes may have declined over the years.

7 Reproducibility

Several steps were taken to ensure the report is reproducible:

  • All required R packages (GSODR, dplyr, lubridate, knitr, tibble, and ggplot2) are loaded in the first setup chunk. Anyone running this file would first need to install these packages.

  • The weather data does not depend on any local file: the code uses the get_GSOD() function from the GSODR package to download it directly from the NOAA GSOD database, so an internet connection is needed when the report is run.

  • The two primary datasets, Austin_Animal_Center_Intakes.csv and Austin_Animal_Center_Outcomes.csv, are loaded from local files. For anyone to reproduce the report, these two CSV files need to be located in the same directory as the .Rmd file.
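Before knitting, a short check like the one below can confirm that the packages listed above are installed and that both CSV files are in the working directory (the .Rmd file’s folder when knitting). The helper name check_inputs() is hypothetical; requireNamespace() and file.exists() are base R functions.

```r
# Hypothetical helper: report which required packages and input files are missing.
check_inputs <- function(pkgs, files) {
  # requireNamespace() returns FALSE (without an error) for absent packages.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  list(
    missing_packages = pkgs[!installed],
    missing_files    = files[!file.exists(files)]
  )
}

required_packages <- c("GSODR", "dplyr", "lubridate", "knitr", "tibble", "ggplot2")
data_files <- c("Austin_Animal_Center_Intakes.csv",
                "Austin_Animal_Center_Outcomes.csv")

status <- check_inputs(required_packages, data_files)
status
```

If `status$missing_packages` is not empty, `install.packages(status$missing_packages)` installs the absent packages; any entries in `status$missing_files` need to be downloaded from the open data portal and placed next to the .Rmd file.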

8 Conclusion

This analysis integrated animal shelter records with daily weather data to identify the factors associated with shelter operations and animal outcomes. The major findings show a strong seasonal and weather-related pattern. Animal intakes are not random: they peak during Summer and on hotter days, with the lowest intake volumes occurring in Winter. The lower Winter volume coincides with a drop in the proportion of “Stray” animals during the colder months.

As hypothesized, an animal’s journey through the shelter is strongly influenced by its specific circumstances. Among the factors examined, the final outcome showed the clearest association with shelter duration. “Adoption” is a long process, with a median stay of 14 days, while “Euthanasia” occurs very quickly, often within a day. As expected, “Return to Owner” was a major pathway for “Strays” but was nearly nonexistent for “Owner Surrenders.” “Adoption” was the dominant outcome for “Normal” animals, but “Euthanasia” rates rose sharply for “Sick” animals. Interestingly, no strong correlation was found between an animal’s age and its length of stay. This suggests that once an animal is in the shelter, other factors such as its health, intake type, and eventual outcome are more decisive than its age in determining how long it stays at the center.

Beyond animal characteristics, the longitudinal data revealed shifts in administrative trends. The analysis of intake years showed that “Euthanasia Requests” largely ceased after 2015, while “Abandoned” emerged as a distinct, much more recent category, with a median intake year of 2023. Furthermore, the relationship between temperature and intake volume proved to be non-linear. While warmer weather is generally associated with higher intakes, the data suggest that intake volume drops off once average daily temperatures exceed 30°C.

9 References

City of Austin. (2025). Austin Animal Center Intakes. Austin Open Data Portal. Retrieved November 12, 2025, from https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/pyqf-r2dc/about_data

City of Austin. (2025). Austin Animal Center Outcomes. Austin Open Data Portal. Retrieved November 12, 2025, from https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/gsvs-ypi7/about_data

City of Austin. (n.d.). The Austin Animal Center [Image]. KVUE. Retrieved November 11, 2025, from https://media.kvue.com/assets/KVUE/images/56c5cbdd-1b85-46fb-b4d5-c3f99d194763/56c5cbdd-1b85-46fb-b4d5-c3f99d194763_1920x1080.jpg

Grolemund, G., & Wickham, H. (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. https://www.jstatsoft.org/v40/i03/

National Oceanic and Atmospheric Administration (NOAA). (2025). Global Summary of the Day (GSOD). National Centers for Environmental Information.

Sparks, A. H., Hengl, T., & Nelson, A. (2018). GSODR: Global Summary of the Day Weather Data in R. R package version 1.1.5. https://CRAN.R-project.org/package=GSODR

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.