STA 111 Lab 8
The Data
The Austin Animal Shelter in Austin, Texas, is the largest no-kill shelter in the United States. The organization cares for dogs, cats, and other animals in need, and each year, this work results in thousands of adoptions.
Today, we will be working with a random sample of 1000 animals adopted from the Shelter in 2016 or 2017. Our task is to determine the average number of days an animal tends to spend in the Shelter prior to being adopted. We will build confidence intervals and perform hypothesis tests to take the results from our sample and generalize them to the population.
Our sample contains information on some of the animals that were adopted from the Shelter in 2016 or 2017.
Question 1
What is the population of interest?
Loading the Data
As we know, the first step in any data analysis is looking at the data. We need to know what kind of data we are working with, and what questions we might be able to explore using these data. To start, we need to load in the data.
This data load is going to be different from what we have done so far. This data set is a csv file and is posted on Canvas under Lab 8. Because we are loading our own data set, we have a few extra steps to load the data. Detailed instructions are provided here.
Check to be sure you see Adopted in your Environment Tab and that you have put the necessary code into a chunk in your RMarkdown document before you proceed.
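The linked instructions are the steps to follow for the lab. As a rough illustration of what loading a csv file looks like in R, the sketch below writes a tiny stand-in file and reads it back with read.csv. The file contents and the names toy_path and ToyAdopted are made up for this example and are not the lab data.

```r
# Write a tiny stand-in csv file so this sketch runs anywhere.
# (Hypothetical data; in the lab, the real file comes from Canvas.)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("animal_type,time_in_shelter_days",
             "Dog,12",
             "Cat,30",
             "Dog,5"),
           toy_path)

# read.csv() reads a comma-separated file into a data frame.
ToyAdopted <- read.csv(toy_path)

# Quick checks that the data loaded as expected.
dim(ToyAdopted)   # number of rows and columns
str(ToyAdopted)   # variable names and types
head(ToyAdopted)  # first few rows
```

Running the same kinds of checks on Adopted is a quick way to confirm that the number of rows and the variable names match what the lab describes.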
Obtaining the Sample Statistic
The data set we will be working with contains information on 16 different variables. The meaning of each variable is as follows:
- animal_type: the type of animal, e.g., dog or cat.
- breed: the specific breed of animal; example: Cocker Spaniel.
- color: the coat/feather color of the animal.
- intake_type: the reason that the animal was taken in to the shelter. Levels: Owner Surrender, Public Assist, Stray, Wildlife.
- intake_condition: the condition of the animal upon its arrival at the shelter.
- age_upon_intake_.days.: the age of the animal in days upon its arrival at the shelter.
- age_upon_intake_.years.: the age of the animal in years (rounded) upon its arrival at the shelter.
- intake_month: the month of the year the animal arrived at the shelter.
- intake_year: the year the animal arrived at the shelter.
- time_in_shelter_days: the number of days the animal remained in the shelter prior to being adopted.
- outcome_year: the year the animal was adopted from the shelter.
- outcome_subtype: if the adoption was not to a family home, where was the animal adopted? Options: Blank (adopted to a family), Foster (sent to a foster home) or Offsite (adopted by an organization or business).
- outcome_type: for all animals in our subset, the outcome of being in the shelter is “Adopted”.
- sex_upon_outcome: the sex of the animal, including information about neutering/spaying.
- age_upon_outcome_.days.: the age of the animal in days upon its departure from the shelter.
- age_upon_outcome_.years.: the age of the animal in years upon its departure from the shelter.
We are interested in the variable time_in_shelter_days, which expresses the number of days an animal is in the Shelter prior to being adopted.
Question 2
Find the sample mean for the number of days an animal stays in the Shelter. Remember that the function mean will be useful for this.
Question 3
Find the sample standard deviation for the number of days an animal stays in the Shelter. Remember that the function sd will be useful for this.
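As a reminder of how these two functions behave, here is a minimal sketch on a made-up vector. The name toy_days and its values are hypothetical; for the lab, the input would be a column of Adopted selected with the $ operator.

```r
# Hypothetical stays, in days, for five animals (not the lab data).
toy_days <- c(3, 10, 7, 21, 4)

# mean() returns the sample mean.
toy_mean <- mean(toy_days)  # 9

# sd() returns the sample standard deviation (it divides by n - 1).
toy_sd <- sd(toy_days)      # sqrt(52.5), about 7.25

toy_mean
toy_sd
```

If a column contains missing values, both functions return NA unless na.rm = TRUE is supplied.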
Confidence Interval
We have been asked to determine how many days on average an animal remains in the Shelter in Austin before being adopted.
Question 4
We have already computed the sample mean. Why would it not be reasonable to just report this sample statistic (i.e., sample mean) in response to this question?
Because we cannot just report the sample statistic, we need to utilize some sort of inference procedure. This means building a confidence interval or running a hypothesis test. We are going to begin with a confidence interval.
A confidence interval requires three pieces of information:
- A sample statistic (in our case a sample mean)
- A critical value (this tells us how many standard errors we want to stretch away from the mean)
- The standard error of our sample statistic (this tells us how much we expect our sample statistic to change from sample to sample)
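Put together, these three pieces give the usual form of a confidence interval. Written for a sample mean, with the standard error formula used later in this lab:

$$\text{sample statistic} \pm (\text{critical value}) \times (\text{standard error})$$

$$\bar{x} \pm (\text{critical value}) \times SE, \qquad SE = \frac{s}{\sqrt{n}}$$

Here \(s\) is the sample standard deviation and \(n\) is the sample size.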
Question 5
Our sample statistic is a sample mean \(\bar{x}\), and this comes from a sample with \(n = 1000\). Based on this, if we want to build a 95% confidence interval for the population mean, what distribution will we use to find our critical value?
Question 6
Find the critical value for the 95% confidence interval for the population mean. Hint: Helpful code for this will be
qt( INPUT1 , df = INPUT2)
To run the code, replace INPUT1 with a number between 0 and 1 that represents the percentile you want to compute. Then, replace INPUT2 with the degrees of freedom for your distribution.
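To see how the two inputs fit together, here is a sketch for a different setting than the one in this lab: a 90% confidence interval from a hypothetical sample of 25 observations. The names conf_level, n_toy, tail_area, and t_star are made up for the illustration.

```r
# Hypothetical setting: 90% confidence, sample size 25.
conf_level <- 0.90
n_toy <- 25

# A two-sided interval leaves half of the leftover probability in each tail.
tail_area <- (1 - conf_level) / 2

# The critical value is the percentile with tail_area above it,
# from a t distribution with n - 1 degrees of freedom.
t_star <- qt(1 - tail_area, df = n_toy - 1)
t_star  # about 1.71
```

Note that the percentile passed to qt is 0.95 here, not 0.90, because only 5% of the distribution sits above the upper critical value.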
Question 7
Is the critical value very similar to what you would have gotten using a normal distribution, or is it different? Why do you think that is?
Question 8
What is the standard error of \(\bar{x}\)?
Question 9
Make and interpret a 95% confidence interval for the population mean (the average number of days an animal remains in the Shelter before being adopted).
Hypothesis Test
Now that we have built a confidence interval, let’s explore how hypothesis tests could be useful in working with these data.
An individual who works at the Shelter claims that, on average, animals spend less than a month (30 days) in the Shelter prior to being adopted. In order to check this claim, we could either build a confidence interval or conduct a hypothesis test.
Question 10
Based on the 95% confidence interval you have built, how would you reply to the claim that on average, animals spend less than 30 days in the Shelter prior to being adopted?
Question 11
Run a hypothesis test to determine whether or not the average length of time an animal remains in the Shelter prior to being adopted is less than 30 days. Write down all 6 steps, interpret the p-value, and state your conclusion. Note: In Step 4, just state the distribution; you do not need to draw a picture. To produce the necessary mathematical notation for the 6 steps, copy and paste the following into the white space in your Markdown file.
Step 1: $$H_0:, H_a:$$
Step 2: $$\bar{x} =$$, $$SE= \frac{s}{\sqrt{n}} =$$
Step 3:
Step 4:
Step 5: The p-value is ?. This means that …
Step 6: Using a significance level of .05, we …
As part of a hypothesis test, we need to compute a p-value. We know how to do this using our t-tables, but we can also do it in R. Suppose we have computed a t-score for some sample mean, and found that it is -2.4. In order to compute the probability of seeing a t-score of -2.4 or lower, we use the code:
pt(-2.4, df = 999)
Note that this computes the probability of being less than or equal to the t-score of -2.4.
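The sketch below shows where a t-score like -2.4 could come from and how pt handles each tail. All of the numbers (x_bar, mu_0, s, n_toy) are hypothetical and chosen only so that the arithmetic is easy to follow; they are not from the Shelter data.

```r
# Hypothetical summary statistics (not the lab data).
x_bar <- 9.1   # sample mean
mu_0  <- 10    # value of the mean under the null hypothesis
s     <- 3     # sample standard deviation
n_toy <- 64    # sample size

# Standard error and t-score.
se <- s / sqrt(n_toy)            # 3 / 8 = 0.375
t_score <- (x_bar - mu_0) / se   # -0.9 / 0.375 = -2.4

# Lower-tail probability: P(T <= t_score), used for a "less than" alternative.
p_lower <- pt(t_score, df = n_toy - 1)

# Upper-tail probability of the mirrored t-score; by symmetry it is the same.
p_upper <- pt(-t_score, df = n_toy - 1, lower.tail = FALSE)

p_lower
p_upper
```

For a "greater than" alternative, lower.tail = FALSE is applied to the t-score itself, and for a two-sided alternative the one-tail probability is doubled.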
Question 12
What is the maximum probability of making a Type 1 Error we are willing to allow in our hypothesis test?
Just Dogs
Suppose we are only interested in the time it takes dogs to be adopted. There are more than just dogs in our Adopted data set, so the first step is to create a subset of our data that only contains information on the adoption of dogs. We can do that using the subset command:
Dogs <- subset(Adopted, Adopted$animal_type == "Dog")
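To see what subset does, here is a minimal sketch on a made-up data frame. The names toy_animals and toy_dogs and the values are hypothetical stand-ins, not the lab data.

```r
# Hypothetical data frame with the same two column names as the lab data.
toy_animals <- data.frame(
  animal_type = c("Dog", "Cat", "Dog", "Bird"),
  time_in_shelter_days = c(12, 30, 5, 8)
)

# Keep only the rows where animal_type is exactly "Dog".
toy_dogs <- subset(toy_animals, toy_animals$animal_type == "Dog")

nrow(toy_dogs)          # 2 rows remain
table(toy_dogs$animal_type)
```

The comparison is case sensitive, so "dog" would match no rows here. Checking the row count or a table of animal_type after subsetting is a quick way to confirm the subset contains what you expect.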
Question 13
Create a 95% confidence interval for the average number of days a dog is in the Shelter prior to being adopted.
Question 14
Interpret the interval.
Question 15
Consider the two 95% confidence intervals you have created (Questions 9 and 13). Based on these intervals, do you think that the average number of days it takes for an animal to be adopted overall is different from the average number of days it takes a dog to be adopted? Explain your reasoning.
This lab was written by Nicole Dalzell at Wake Forest University, using data provided here through the Austin Open Data initiative. This lab is released under a Creative Commons Attribution-NonCommercial 4.0 International License. Last updated 2022 August 4.