1. Load your chosen dataset into R Markdown.

2. Select the dependent variable you are interested in, along with independent variables that you believe are causing the dependent variable.

3. Create a linear model using the “lm()” command and save it to an object.

4. Call “summary()” on your new model.

5. Interpret the model’s R-squared and p-values. How much of the variance in the dependent variable does the overall model explain? Which variables are significant? Which are insignificant?

6. Choose some significant independent variables and interpret their Estimates (or Beta Coefficients). How does each independent variable individually affect the dependent variable?

7. Does the model you created meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
Animal_Control <- read_csv("Animal_Care_and_Control_Division_Annual_Statistics.csv")
## Rows: 22 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (17): Year, Number of Employees, Number of Division Vehicles, Annual Bud...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Adoptions_model <- lm(Euthanized ~ Adoptions, data = Animal_Control)
summary(Adoptions_model)
## 
## Call:
## lm(formula = Euthanized ~ Adoptions, data = Animal_Control)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1174.53  -192.86    51.67   211.99   886.80 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5662.387    875.929   6.464 2.65e-06 ***
## Adoptions     -2.253      0.406  -5.550 1.97e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 461.4 on 20 degrees of freedom
## Multiple R-squared:  0.6063, Adjusted R-squared:  0.5866 
## F-statistic:  30.8 on 1 and 20 DF,  p-value: 1.969e-05
Foster_model <- lm(Euthanized ~ `Fostered Animals`, data = Animal_Control)
summary(Foster_model)
## 
## Call:
## lm(formula = Euthanized ~ `Fostered Animals`, data = Animal_Control)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -661.78 -460.12  -99.59  470.44  900.22 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1567.8999   285.6842   5.488  2.7e-05 ***
## `Fostered Animals`   -0.8745     0.2818  -3.103  0.00586 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 549 on 19 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3363, Adjusted R-squared:  0.3014 
## F-statistic: 9.627 on 1 and 19 DF,  p-value: 0.005859
RTO_model <- lm(Euthanized ~ `Return to Owner`, data = Animal_Control)
summary(RTO_model)
## 
## Call:
## lm(formula = Euthanized ~ `Return to Owner`, data = Animal_Control)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1006.6  -553.5  -263.1   717.8  1275.8 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -1244.610   1503.150  -0.828    0.417
## `Return to Owner`     4.025      2.899   1.388    0.180
## 
## Residual standard error: 702.3 on 20 degrees of freedom
## Multiple R-squared:  0.08791,    Adjusted R-squared:  0.04231 
## F-statistic: 1.928 on 1 and 20 DF,  p-value: 0.1803
Adoptions_RTO_Foster <- lm(Euthanized ~ Adoptions + `Fostered Animals` + `Return to Owner`,
                           data = Animal_Control)
summary(Adoptions_RTO_Foster)
## 
## Call:
## lm(formula = Euthanized ~ Adoptions + `Fostered Animals` + `Return to Owner`, 
##     data = Animal_Control)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -754.1 -272.6  108.0  255.5  486.0 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3576.9917  1121.0412   3.191 0.005353 ** 
## Adoptions            -1.9252     0.4290  -4.488 0.000324 ***
## `Fostered Animals`   -0.1896     0.2593  -0.731 0.474530    
## `Return to Owner`     2.9390     1.8206   1.614 0.124861    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 391.9 on 17 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6974, Adjusted R-squared:  0.644 
## F-statistic: 13.06 on 3 and 17 DF,  p-value: 0.0001133

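#The multiple R-squared is 0.6974 and the adjusted R-squared is 0.644. R-squared is the share of the variance in Euthanized that is explained by the Adoptions, Fostered Animals, and Return to Owner IVs combined, so this model explains about 70% of the variance in the number of euthanized animals. The adjusted R-squared (about 64%) penalizes the model for each added predictor, which makes it the better figure for comparing this three-variable model with the one-variable models above. Because one year is missing a Fostered Animals value, this model is fit on 21 of the 22 years.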
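As a worked check, the adjusted R-squared can be reproduced from the multiple R-squared with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). This minimal sketch hard-codes the values printed by summary(Adoptions_RTO_Foster), assuming n = 21 rows (22 minus the one dropped for missingness, consistent with the 17 residual degrees of freedom) and p = 3 predictors.

```r
# Values copied from summary(Adoptions_RTO_Foster) above
r2 <- 0.6974  # Multiple R-squared
n  <- 21      # rows used: 22 years minus 1 dropped for missingness
p  <- 3       # predictors: Adoptions, Fostered Animals, Return to Owner

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 3)  # 0.644, matching the printed Adjusted R-squared
```

With the fitted model in the session, `summary(Adoptions_RTO_Foster)$r.squared` and `summary(Adoptions_RTO_Foster)$adj.r.squared` return the unrounded values directly.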

#The Intercept and Adoptions are both statistically significant. Adoptions is highly significant (p = 0.000324): for every 1 additional adoption, the number of euthanized animals is expected to decrease by approximately 1.93, holding all other variables constant.

#The Intercept is significant as well (p = 0.005353). It is the predicted number of euthanized animals, approximately 3,577, if all three independent variables were zero.
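The estimates can be read as a prediction equation. The sketch below plugs the coefficients from summary(Adoptions_RTO_Foster) into a small helper, `predict_euth()` (a hypothetical name, not part of any package), and compares two hypothetical years that differ only by 100 adoptions. The IV values 1000 and 500 are made up for illustration and are not from the dataset.

```r
# Coefficients copied from summary(Adoptions_RTO_Foster) above
coefs <- c(intercept = 3576.9917, adoptions = -1.9252,
           fostered = -0.1896, rto = 2.9390)

# Hypothetical helper: predicted Euthanized for given IV values
predict_euth <- function(adoptions, fostered, rto) {
  unname(coefs["intercept"] + coefs["adoptions"] * adoptions +
           coefs["fostered"] * fostered + coefs["rto"] * rto)
}

# With every IV at zero, the prediction is just the intercept
predict_euth(0, 0, 0)  # 3576.992

# Two made-up years that differ only by 100 adoptions
change <- predict_euth(1100, 500, 500) - predict_euth(1000, 500, 500)
change  # -192.52, i.e. 100 * -1.9252
```

On the fitted model itself, `predict()` with a `newdata` data frame whose column names match the model's variables gives the same kind of prediction without rounding the coefficients.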

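#The overall model p-value of 0.0001133 (from the F-statistic of 13.06 on 3 and 17 DF) is far below the 0.05 threshold, so the model as a whole is highly statistically significant. Strictly speaking, this p-value means that if none of the three IVs were related to the number of euthanized animals, an F-statistic this large would occur only about 0.01% of the time; it is not the probability that the relationship happened by accident.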
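The printed p-values can be reproduced from the test statistics with base R's `pf()` and `pt()`. This sketch uses the rounded F and t values shown in summary(Adoptions_RTO_Foster), so the results should only come out close to the printed 0.0001133, 0.000324 and 0.4745, not identical.

```r
# Overall F-test: F = 13.06 on 3 and 17 degrees of freedom (upper tail)
p_overall <- pf(13.06, df1 = 3, df2 = 17, lower.tail = FALSE)

# Two-sided t-tests for individual coefficients (17 residual df)
p_adopt  <- 2 * pt(-abs(-4.488), df = 17)  # Adoptions
p_foster <- 2 * pt(-abs(-0.731), df = 17)  # Fostered Animals

signif(c(overall = p_overall, adoptions = p_adopt, fostered = p_foster), 3)
```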

#The insignificant variables are Fostered Animals and Return to Owner.

#Fostered Animals has a p-value of 0.475, much higher than the 0.05 threshold for significance. Fostering does not have a statistically significant relationship with the number of euthanized animals in this model. I think I can transform the data to get it closer to significance.

#Return to Owner has a p-value of 0.125. While closer to the threshold than Fostered Animals, it still fails to meet the 0.05 cutoff and is statistically insignificant in this model. I will try transforming this variable as well to see if I can get it closer to significance.

#The effect of the significant independent variable, Adoptions, on the dependent variable, Euthanized, is as stated above: for every 1 additional adoption, the number of euthanized animals is expected to decrease by approximately 1.93, holding all other variables constant. This is slightly smaller in magnitude than the slope of -2.253 in the Adoptions-only model.
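To put the Adoptions estimate in context, a 95% confidence interval can be built by hand as estimate ± t(0.975, 17) × standard error, using the values printed for Adoptions in summary(Adoptions_RTO_Foster). On the fitted model, `confint(Adoptions_RTO_Foster)` returns the same kind of interval for every coefficient.

```r
# Adoptions row of summary(Adoptions_RTO_Foster)
est      <- -1.9252
se       <- 0.4290
df_resid <- 17  # residual degrees of freedom

t_crit <- qt(0.975, df_resid)       # two-sided 95% critical value
ci <- est + c(-1, 1) * t_crit * se  # lower and upper bounds
round(ci, 2)                        # about -2.83 and -1.02
```

Because the whole interval lies below zero, the data are consistent with each extra adoption being associated with roughly 1 to 2.8 fewer euthanized animals, holding the other IVs constant.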

#Also, when tested individually, Adoptions (p = 1.97e-05) and Fostered Animals (p = 0.00586) are both significant IVs. Return to Owner remains insignificant both on its own (p = 0.180) and in the combined model (p = 0.125).

plot(Adoptions_RTO_Foster,which=1)

#I think my model violates the assumption of linearity. In a Residuals vs Fitted plot, linearity holds when the residuals scatter evenly around the dotted horizontal line at zero and the red smoothed line stays roughly flat along it. In my plot the residuals vary too much: they do not fall evenly around the dotted line, nor are they closely clustered around the red line. I may need further data transformations to reduce this variation.
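For comparison, the sketch below uses simulated data (not the Animal Control dataset) to show what a clear linearity violation looks like: a straight line is fit to a curved relationship, and the red line in `plot(fit, which = 1)` bends. A nested-model F-test with `anova()` then checks whether adding a squared term improves the fit. The same idea could be tried on the real model, for example with `I(Adoptions^2)`, though which term to test is an assumption.

```r
set.seed(1)  # reproducible simulated data
x <- seq(1, 20, length.out = 40)
y <- 5 + 0.5 * x^2 + rnorm(40, sd = 2)  # true relationship is curved

fit_straight <- lm(y ~ x)
plot(fit_straight, which = 1)  # Residuals vs Fitted: red line bends

# Does a squared term significantly improve the straight-line fit?
fit_quad <- lm(y ~ x + I(x^2))
p_quad <- anova(fit_straight, fit_quad)$`Pr(>F)`[2]
p_quad  # far below 0.05, so the straight-line model misses the curvature
```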