library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
Animal_Control<-read_csv("Animal_Care_and_Control_Division_Annual_Statistics.csv")
## Rows: 22 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (17): Year, Number of Employees, Number of Division Vehicles, Annual Bud...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Adoptions_RTO_Foster <- lm(Euthanized~Adoptions+`Fostered Animals`+`Return to Owner`, data=Animal_Control)
plot(Adoptions_RTO_Foster,which=1)
raintest(Adoptions_RTO_Foster)
##
## Rainbow test
##
## data: Adoptions_RTO_Foster
## Rain = 8.1043, df1 = 11, df2 = 6, p-value = 0.009022
Adoptions_RTO_Foster_log<-lm(log(Euthanized)~log(Adoptions+`Fostered Animals`+`Return to Owner`),data=Animal_Control)
plot(Adoptions_RTO_Foster_log,which=1)
raintest(Adoptions_RTO_Foster_log)
##
## Rainbow test
##
## data: Adoptions_RTO_Foster_log
## Rain = 4.5353, df1 = 11, df2 = 8, p-value = 0.02055
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
durbinWatsonTest(Adoptions_RTO_Foster)
## lag Autocorrelation D-W Statistic p-value
## 1 0.1800719 1.569375 0.144
## Alternative hypothesis: rho != 0
plot(Adoptions_RTO_Foster,which = 3)
bptest(Adoptions_RTO_Foster)
##
## studentized Breusch-Pagan test
##
## data: Adoptions_RTO_Foster
## BP = 1.2035, df = 3, p-value = 0.7522
plot(Adoptions_RTO_Foster,which=2)
shapiro.test(Adoptions_RTO_Foster$residuals)
##
## Shapiro-Wilk normality test
##
## data: Adoptions_RTO_Foster$residuals
## W = 0.94478, p-value = 0.2704
plot(Adoptions_RTO_Foster_log,which=2)
vif(Adoptions_RTO_Foster)
## Adoptions `Fostered Animals` `Return to Owner`
## 1.470658 1.661375 1.226752
#By eye, my model doesn't appear to meet the assumption of linearity: the Residuals vs Fitted plot shows a curve, which suggests a transformation could make the relationship more linear. To test this formally I ran a rainbow test. Its p-value of 0.009022 is well below 0.05, so I reject the null hypothesis of linearity and conclude that the model is not adequately linear.
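#A sketch of how the rainbow test reacts to curvature, on simulated data rather than Animal_Control (the data frames `curved` and `straight` are hypothetical). It uses raintest() from lmtest, loaded above, with its defaults: the fit on the central half of the observations, taken in the order the rows appear, is compared with the fit on all of them.

```r
library(lmtest)

# Simulated stand-in data (assumption): one clearly curved, one linear
set.seed(1)
x <- 1:40
curved   <- data.frame(x = x, y = x^2 + rnorm(40, sd = 5))
straight <- data.frame(x = x, y = 3 * x + rnorm(40, sd = 5))

rain_curved   <- raintest(lm(y ~ x, data = curved))
rain_straight <- raintest(lm(y ~ x, data = straight))

alpha <- 0.05
c(curved = rain_curved$p.value, straight = rain_straight$p.value)

# Same decision rule as in the text: p < alpha means linearity is rejected
rain_curved$p.value < alpha
# The straight data have no curvature built in, so their p-value is usually,
# but not always, above alpha
```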
#The Durbin-Watson test returned a D-W statistic of 1.569 with a p-value of 0.144 (car computes this p-value by bootstrap, so it changes slightly from run to run). With a p-value above 0.05, I fail to reject the null hypothesis of no autocorrelation. The residuals are not significantly autocorrelated, so the model satisfies the independence-of-errors assumption required for linear modeling.
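#The D-W statistic itself is deterministic; only the p-value from car's durbinWatsonTest() is simulated (its defaults are simulate = TRUE with reps = 1000 bootstrap replications), so calling set.seed() first makes that p-value reproducible. A minimal sketch of the lag-1 statistic computed by hand, on simulated stand-in data; dw_stat() is a hypothetical helper, not part of car.

```r
# Hypothetical helper: lag-1 Durbin-Watson statistic from a fitted lm
dw_stat <- function(model) {
  e <- residuals(model)
  sum(diff(e)^2) / sum(e^2)   # ranges from 0 to 4; values near 2 mean little lag-1 autocorrelation
}

# Simulated stand-in data (assumption), 22 rows like Animal_Control
set.seed(2)
sim <- data.frame(x = 1:22)
sim$y <- 5 + 2 * sim$x + rnorm(22)

dw <- dw_stat(lm(y ~ x, data = sim))
dw
```

#Applied to the real model, dw_stat(Adoptions_RTO_Foster) should reproduce the D-W Statistic of 1.569375 printed above.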
#To check homoscedasticity I looked at the Scale-Location plot (which = 3) and ran a studentized Breusch-Pagan test. The BP test p-value of 0.7522 is far above 0.05, so I fail to reject the null hypothesis of constant error variance. There is no evidence of heteroscedasticity, and the model passes this assumption.
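#bptest() defaults to the studentized (Koenker) version: its statistic is n times the R-squared from regressing the squared residuals on the model's regressors, compared with a chi-squared distribution whose degrees of freedom equal the number of regressors (hence df = 3 above). A minimal sketch under that definition, on simulated stand-in data; bp_studentized() is a hypothetical helper.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test by hand
bp_studentized <- function(model) {
  e2 <- residuals(model)^2
  X  <- model.matrix(model)[, -1, drop = FALSE]  # regressors without the intercept column
  aux <- lm(e2 ~ X)                              # auxiliary regression, intercept included
  stat <- length(e2) * summary(aux)$r.squared    # n * R^2
  df <- ncol(X)
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated stand-in data with constant error variance (assumption)
set.seed(3)
sim <- data.frame(x1 = rnorm(22), x2 = rnorm(22), x3 = rnorm(22))
sim$y <- 1 + sim$x1 - sim$x2 + 0.5 * sim$x3 + rnorm(22)

bp_out <- bp_studentized(lm(y ~ x1 + x2 + x3, data = sim))
bp_out
```

#On the same model, lmtest's bptest() should report the same BP value.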
#The residuals also meet the assumption of normality. The Shapiro-Wilk test gave W = 0.94478 with p = 0.2704; since the p-value is above 0.05, I fail to reject the null hypothesis that the residuals are normally distributed, and a W this close to 1 is consistent with that. For learning purposes, I also ran plot(Adoptions_RTO_Foster_log, which = 2) to see how the Q-Q plot changed after the log transformation.
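#How close W has to be to 1 before it signals a problem depends on the sample size, so the decision rests on the p-value rather than on W alone. A minimal sketch of that decision rule, using simulated residuals as a stand-in for residuals(Adoptions_RTO_Foster) (resid_sim is hypothetical).

```r
# Simulated stand-in residuals (assumption), 22 values
set.seed(4)
resid_sim <- rnorm(22)

sw <- shapiro.test(resid_sim)
alpha <- 0.05
decision <- if (sw$p.value > alpha) {
  "fail to reject H0: no evidence against normality"
} else {
  "reject H0: residuals depart from normality"
}
c(W = unname(sw$statistic), p = sw$p.value)
decision

# Visual companion to the test, the same idea as plot(model, which = 2)
qqnorm(resid_sim)
qqline(resid_sim)
```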
#The VIF values are all well below the common cut-offs of 5 and 10: Adoptions = 1.470658, Fostered Animals = 1.661375, and Return to Owner = 1.226752. The three independent variables are not strongly correlated with one another, so the model passes the no-multicollinearity assumption.
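#For numeric predictors, each VIF is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. A minimal sketch on simulated predictors that borrow the Animal_Control column names (the values are made up, and vif_by_hand() is a hypothetical helper); for a model like this one it should agree with car::vif().

```r
# Hypothetical helper: VIF for each column of a numeric predictor table
vif_by_hand <- function(X) {
  X <- as.matrix(X)
  v <- vapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared  # predictor j on the others
    1 / (1 - r2)
  }, numeric(1))
  setNames(v, colnames(X))
}

# Simulated stand-in predictors (assumption); check.names = FALSE keeps the spaces
set.seed(5)
preds <- data.frame(
  Adoptions = rnorm(22, 3000, 400),
  `Fostered Animals` = rnorm(22, 800, 150),
  `Return to Owner` = rnorm(22, 1200, 200),
  check.names = FALSE
)

vifs <- vif_by_hand(preds)
vifs
vifs < 5   # rule-of-thumb cut-off used in the text
```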
#My model violated the assumption of linearity.
#I performed a log transformation on the data.
Adoptions_RTO_Foster_log<-lm(log(Euthanized)~log(Adoptions+`Fostered Animals`+`Return to Owner`),data=Animal_Control)
plot(Adoptions_RTO_Foster_log,which=1)
#The log-transformed model's rainbow test gave a p-value of 0.02055. That is still below 0.05, so the model is still considered non-linear, but compared with 0.009022 it was a move in the right direction.
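#Column names that contain spaces must be wrapped in backticks inside a formula. Note also that log(Adoptions + `Fostered Animals` + `Return to Owner`) collapses the three predictors into a single one, which is consistent with the log model's rainbow test reporting df2 = 8 instead of 6. One option, not tested in this analysis, is to log each predictor separately. A sketch on a hypothetical simulated stand-in, animal_sim, whose counts are all positive because log(0) is -Inf.

```r
# Hypothetical simulated stand-in for Animal_Control (assumption)
set.seed(6)
animal_sim <- data.frame(
  Euthanized = round(runif(22, 200, 2000)),
  Adoptions = round(runif(22, 1000, 4000)),
  `Fostered Animals` = round(runif(22, 100, 1000)),
  `Return to Owner` = round(runif(22, 300, 1500)),
  check.names = FALSE
)

# As in the text: one predictor, the log of the summed outcomes
log_sum <- lm(log(Euthanized) ~ log(Adoptions + `Fostered Animals` + `Return to Owner`),
              data = animal_sim)

# Alternative: each predictor logged separately, keeping three slopes
log_each <- lm(log(Euthanized) ~ log(Adoptions) + log(`Fostered Animals`) + log(`Return to Owner`),
               data = animal_sim)

length(coef(log_sum))   # intercept + one slope
length(coef(log_each))  # intercept + three slopes
```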
#I am not sure whether I should keep trying to address the linearity assumption or leave it alone. At first glance, the log-transformed model appears less curved and more linear than the original model. On closer inspection, though, I think the original model is more linear: the transformed model's residual plot has two bumps in it, whereas the original has only a single curve.
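#Comparing curvature across two separate plots is hard, so one option is to draw the two Residuals vs Fitted plots side by side; the rainbow test p-values (0.009022 and 0.02055) remain the formal comparison. A sketch on simulated stand-in data (m_raw and m_log are hypothetical); with the real data, pass Adoptions_RTO_Foster and Adoptions_RTO_Foster_log instead.

```r
# Simulated stand-in data with an exponential trend (assumption)
set.seed(7)
sim <- data.frame(x = runif(22, 1, 10))
sim$y <- exp(0.3 * sim$x + rnorm(22, sd = 0.2))

m_raw <- lm(y ~ x, data = sim)
m_log <- lm(log(y) ~ x, data = sim)

# Two panels in one row, then restore the previous layout
old_par <- par(mfrow = c(1, 2))
plot(m_raw, which = 1, main = "Original")
plot(m_log, which = 1, main = "Log-transformed")
par(old_par)
```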