Load data
Victim <- read.csv("C:/Users/ASUS/Desktop/data/victims.csv", header = TRUE)
head(Victim)
## nV Race
## 1 0 White
## 2 0 White
## 3 0 White
## 4 0 White
## 5 0 White
## 6 0 White
Recode Race into a numeric variable, Raceij (Black = 1, White = 0), and save the result
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
Victim01 <- Victim %>% mutate(Raceij = recode(Race,
`Black` = "1",
`White` = "0"))
Victim01$Raceij <- as.numeric(Victim01$Raceij)
is.numeric(Victim01$Raceij)
## [1] TRUE
head(Victim01)
## nV Race Raceij
## 1 0 White 0
## 2 0 White 0
## 3 0 White 0
## 4 0 White 0
## 5 0 White 0
## 6 0 White 0
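The recode above goes through character strings ("1", "0") before `as.numeric()`. This relies on Race being read in as a character column. If Race were a factor, `as.numeric()` would likely return the internal level codes rather than 0/1. A minimal sketch of a more direct alternative, using a hypothetical toy data frame (`toy`) in place of victims.csv:

```r
# Hypothetical toy data standing in for victims.csv (columns nV and Race).
toy <- data.frame(nV = c(0, 2, 1, 0),
                  Race = c("White", "Black", "Black", "White"),
                  stringsAsFactors = FALSE)

# Compare against the target category directly; this works whether Race
# is stored as character or as a factor.
toy$Raceij <- as.integer(toy$Race == "Black")

# Cross-tabulate to confirm the mapping before modelling.
table(toy$Race, toy$Raceij)
```

The cross-tabulation is a cheap check that every Black row maps to 1 and every White row to 0.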
The negative binomial model below indicates that Raceij (Black vs. White) is a significant predictor of nV (number of victims).
A one-unit increase in Raceij (that is, being Black rather than White) increases the expected log count of nV by 1.7331, so Black interviewees tend to report a higher number of victims.
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
m1 <- glm.nb(nV ~ Raceij, link = log, data = Victim01)
summary(m1)
##
## Call:
## glm.nb(formula = nV ~ Raceij, data = Victim01, link = log, init.theta = 0.2023119205)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7184 -0.3899 -0.3899 -0.3899 3.5072
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.3832 0.1172 -20.335 < 2e-16 ***
## Raceij 1.7331 0.2385 7.268 3.66e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.2023) family taken to be 1)
##
## Null deviance: 471.57 on 1307 degrees of freedom
## Residual deviance: 412.60 on 1306 degrees of freedom
## AIC: 1001.8
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.2023
## Std. Err.: 0.0409
##
## 2 x log-likelihood: -995.7980
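Because the model uses a log link, the Raceij coefficient is easier to read after exponentiating. A worked version of that arithmetic, using only the estimates printed in the summary above (values rounded):

```latex
\log \mu_i = \beta_0 + \beta_1 \,\mathrm{Raceij}_i,
\qquad \hat\beta_0 = -2.3832, \quad \hat\beta_1 = 1.7331

\hat\mu_{\text{White}} = e^{-2.3832} \approx 0.092,
\qquad
\hat\mu_{\text{Black}} = e^{-2.3832 + 1.7331} = e^{-0.6501} \approx 0.522

\frac{\hat\mu_{\text{Black}}}{\hat\mu_{\text{White}}} = e^{1.7331} \approx 5.66
```

On the count scale, the expected nV for Black interviewees is roughly 5.7 times that for White interviewees.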
The Poisson model gives the same coefficient estimates as the negative binomial model, but with smaller standard errors. Furthermore, the
overdispersion test indicates that nV is overdispersed relative to the fitted Poisson model.
m2 <- glm(nV ~ Raceij, data = Victim01, family = poisson(link = log))
sjPlot::tab_model(m2, show.se = TRUE, show.r2 = FALSE, show.obs = FALSE)
Dependent variable: nV

| Predictors | Incidence Rate Ratios | Std. Error (log scale) | CI | p |
|---|---|---|---|---|
| (Intercept) | 0.09 | 0.10 | 0.08 – 0.11 | <0.001 |
| Raceij | 5.66 | 0.15 | 4.24 – 7.53 | <0.001 |
summary(m2)
##
## Call:
## glm(formula = nV ~ Raceij, family = poisson(link = log), data = Victim01)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0218 -0.4295 -0.4295 -0.4295 6.1874
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.38321 0.09713 -24.54 <2e-16 ***
## Raceij 1.73314 0.14657 11.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 962.80 on 1307 degrees of freedom
## Residual deviance: 844.71 on 1306 degrees of freedom
## AIC: 1122
##
## Number of Fisher Scoring iterations: 6
performance::check_overdispersion(m2)
## # Overdispersion test
##
## dispersion ratio = 1.746
## Pearson's Chi-Squared = 2279.873
## p-value = < 0.001
## Overdispersion detected.
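The reported dispersion ratio is consistent with the Pearson chi-squared statistic divided by the residual degrees of freedom (2279.873 / 1306 ≈ 1.746). A minimal sketch of that calculation by hand, assuming this is how the ratio is obtained and using simulated data (the variables `x`, `y`, and `fit` are hypothetical, not the victims data):

```r
# Simulated, hypothetical data: a binary predictor and an overdispersed count.
set.seed(42)
x <- rbinom(1000, 1, 0.2)
y <- rnbinom(1000, mu = exp(-2 + 1.5 * x), size = 0.3)

fit <- glm(y ~ x, family = poisson(link = log))

# Pearson chi-squared: sum of squared Pearson residuals.
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)

# Dispersion ratio: values well above 1 point to overdispersion.
dispersion_ratio <- pearson_chisq / df.residual(fit)

# Upper-tail chi-squared p-value for the null of no overdispersion.
p_value <- pchisq(pearson_chisq, df = df.residual(fit), lower.tail = FALSE)

c(dispersion_ratio = dispersion_ratio, p_value = p_value)
```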
Recompute the Poisson summary with the dispersion parameter set to 1.746, the ratio reported by the overdispersion test
summary(m2, dispersion=1.746)$coef
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.383208 0.1283420 -18.569195 5.705143e-77
## Raceij 1.733145 0.1936693 8.948992 3.587447e-19
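Supplying a dispersion value leaves the estimates unchanged and multiplies each Poisson standard error by the square root of that value. A worked check against the numbers printed above (rounded):

```latex
\mathrm{SE}_{\phi}(\hat\beta) = \sqrt{\phi}\;\mathrm{SE}_{\text{Poisson}}(\hat\beta),
\qquad \sqrt{1.746} \approx 1.3214

\mathrm{SE}_{\phi}(\hat\beta_0) \approx 1.3214 \times 0.09713 \approx 0.1283,
\qquad
\mathrm{SE}_{\phi}(\hat\beta_1) \approx 1.3214 \times 0.14657 \approx 0.1937

z_{\text{Raceij}} = \frac{1.733145}{0.1936693} \approx 8.949
```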
To address the overdispersion, a quasi-Poisson model is fitted. It estimates the dispersion parameter from the data and scales the standard errors accordingly, while leaving the coefficient estimates unchanged
summary(m3<- glm(nV ~ Raceij, family=quasipoisson, data = Victim01))
##
## Call:
## glm(formula = nV ~ Raceij, family = quasipoisson, data = Victim01)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0218 -0.4295 -0.4295 -0.4295 6.1874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.3832 0.1283 -18.57 <2e-16 ***
## Raceij 1.7331 0.1937 8.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.745694)
##
## Null deviance: 962.80 on 1307 degrees of freedom
## Residual deviance: 844.71 on 1306 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
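The quasi-Poisson standard errors match those obtained earlier by passing the dispersion to `summary()` by hand. A small sketch that checks this equivalence on simulated data (the objects `x`, `y`, `pois`, and `quasi` are hypothetical and are not the models fitted above):

```r
# Simulated, hypothetical data: a binary predictor and an overdispersed count.
set.seed(1)
x <- rbinom(500, 1, 0.3)
y <- rnbinom(500, mu = exp(-1 + 1.5 * x), size = 0.5)

pois  <- glm(y ~ x, family = poisson)
quasi <- glm(y ~ x, family = quasipoisson)

# Dispersion estimated by the quasi-Poisson fit.
phi <- summary(quasi)$dispersion

# Poisson summary with that dispersion supplied by hand.
se_manual <- summary(pois, dispersion = phi)$coefficients[, "Std. Error"]
se_quasi  <- summary(quasi)$coefficients[, "Std. Error"]

# Compare the two sets of standard errors.
all.equal(se_manual, se_quasi)
```

The only practical difference is that the quasi-Poisson summary reports t statistics instead of z statistics, and no AIC.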