# Import data
setwd("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Assignment/Lab Assignment 10")
Arrests <- read.csv("Arrests.csv")  # path is relative to the working directory set above
Sys.setenv(LANGUAGE = "en")  # show R messages in English
library(psych)
Warning: package ‘psych’ was built under R version 4.1.1
describe(Arrests)
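# Recode checks as binary: 0 if checks is zero, 1 otherwise
Arrests$checksbinary <- ifelse(Arrests$checks == 0, 0, 1)
# Keep only the outcome and the two predictors
Arrests2 <- Arrests[, c("checksbinary", "race", "age")]
# Fit the logistic regression and extract the fitted log odds (link scale)
linearity <- glm(checksbinary ~ ., family = binomial(link = "logit"), data = Arrests2)
logodds <- predict(linearity)
plotlin <- with(Arrests2, data.frame(checksbinary = checksbinary, logit = logodds))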
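# Plotting: fitted log odds against checksbinary, with loess and linear smoothers
# (checksbinary takes only the values 0 and 1, so loess has just two distinct x positions)
library(ggplot2)
ggplot(plotlin, aes(x = checksbinary, y = logit)) +
  geom_point() +
  labs(x = "checksbinary", y = "log odds") +
  geom_smooth(method = "loess", col = "#3e3e3e") +
  geom_smooth(method = "lm", col = "blue")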
`geom_smooth()` using formula 'y ~ x'
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at -0.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 1.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 6.4089e-031
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), :
pseudoinverse used at -0.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), :
neighborhood radius 1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), :
reciprocal condition number 6.4089e-031
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), :
There are other near singularities as well. 1.01
`geom_smooth()` using formula 'y ~ x'
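Because a two-valued x variable gives a smoother very little to work with, one common alternative for checking linearity of the logit is to compare empirical logits across bins of a continuous predictor such as age. The sketch below is only an illustration under stated assumptions: it runs on synthetic data rather than Arrests.csv, and the names `toy`, `bin`, `emp_logit` and `mid_age` are hypothetical. A roughly straight pattern of points would be consistent with a linear relationship on the logit scale.

```r
# Synthetic stand-in data (NOT the Arrests data); all object names are hypothetical.
set.seed(42)
n <- 500
toy <- data.frame(age = round(runif(n, 15, 60)))
toy$y <- rbinom(n, 1, plogis(-0.9 + 0.05 * toy$age))

# Bin the continuous predictor into quintiles.
breaks <- unique(quantile(toy$age, probs = seq(0, 1, by = 0.2)))
toy$bin <- cut(toy$age, breaks = breaks, include.lowest = TRUE)

# Events, totals and mean age per bin.
events  <- as.numeric(tapply(toy$y, toy$bin, sum))
totals  <- as.numeric(tapply(toy$y, toy$bin, length))
mid_age <- as.numeric(tapply(toy$age, toy$bin, mean))

# Empirical logit; the 0.5 adjustment keeps it finite if a bin has 0 or all events.
emp_logit <- log((events + 0.5) / (totals - events + 0.5))

# Plot empirical logits against mean age, with a straight reference line.
plot(mid_age, emp_logit, xlab = "mean age in bin", ylab = "empirical log odds")
abline(lm(emp_logit ~ mid_age), lty = 2)
```

The number of bins is a judgment call: more bins show finer structure but make each empirical logit noisier.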
lm1<-glm(checksbinary ~ race + age, family=binomial(link='logit'), data=Arrests2)
summary(lm1)
Call:
glm(formula = checksbinary ~ race + age, family = binomial(link = "logit"),
    data = Arrests2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2080  -1.2014   0.6957   0.9997   1.2457

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.91340    0.45213  -2.020  0.04336 *
race         0.99701    0.30664   3.251  0.00115 **
age          0.05387    0.01914   2.815  0.00488 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 361.16  on 276  degrees of freedom
Residual deviance: 340.09  on 274  degrees of freedom
AIC: 346.09

Number of Fisher Scoring iterations: 4
lm1null<-glm(checksbinary~1, family=binomial(link='logit'), data=Arrests2)
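The intercept-only model is the natural baseline for an overall test of the model with race and age. As a minimal sketch, the likelihood-ratio statistic can be computed by hand from the two deviances printed in the summary output (361.16 on 276 df and 340.09 on 274 df). The variable names below are illustrative and not part of the original script; with the fitted objects available, `anova(lm1null, lm1, test = "Chisq")` should give the same comparison.

```r
# Deviances and degrees of freedom copied from the summary(lm1) output.
null_dev  <- 361.16; null_df  <- 276
resid_dev <- 340.09; resid_df <- 274

# Likelihood-ratio statistic: drop in deviance from the null to the fitted model.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-square probability for the statistic.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

cat("LR statistic:", round(lr_stat, 2), "on", lr_df, "df; p =", signif(p_value, 3), "\n")
```

A small p-value here would indicate that race and age together improve on the intercept-only model.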
The logit coefficient for race is 0.997 (p = 0.001). Holding age constant, the log odds that a Black arrestee's name appears in a police database for a previous arrest, conviction, or parole are 0.997 higher than the log odds for a white arrestee.
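Log odds are hard to read directly, so it can help to exponentiate the coefficients into odds ratios. The sketch below does this using only the estimates and standard errors printed in the summary output, and adds approximate 95% Wald intervals; the object names (`est`, `se`, `odds_ratio`, `ci_low`, `ci_high`) are illustrative. With the fitted model in memory, `exp(coef(lm1))` would give the same point estimates.

```r
# Estimates and standard errors copied from the summary(lm1) coefficient table.
est <- c(race = 0.99701, age = 0.05387)
se  <- c(race = 0.30664, age = 0.01914)

# Odds ratios are the exponentiated logit coefficients.
odds_ratio <- exp(est)

# Approximate 95% Wald interval, computed on the log-odds scale and then exponentiated.
z <- qnorm(0.975)
ci_low  <- exp(est - z * se)
ci_high <- exp(est + z * se)

print(round(cbind(odds_ratio, ci_low, ci_high), 3))
```

Read this way, the race coefficient corresponds to an odds ratio of about 2.7: the odds of appearing in a police database are roughly 2.7 times higher for a Black arrestee than for a white arrestee of the same age.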