1 Data Management

Load data

Victim <- read.csv("C:/Users/ASUS/Desktop/data/victims.csv", header = TRUE)
head(Victim)
##   nV  Race
## 1  0 White
## 2  0 White
## 3  0 White
## 4  0 White
## 5  0 White
## 6  0 White

Recode “Race” into a numeric variable (Black = 1, White = 0) and save the result as Victim01

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
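# Map the Race labels to the character codes "1" (Black) and "0" (White)
Victim01 <- Victim %>% mutate(Raceij = recode(Race,
                         `Black` = "1",
                         `White` = "0"))

# Convert the character codes to numeric so Raceij can enter the models as 0/1
Victim01$Raceij <- as.numeric(Victim01$Raceij)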
is.numeric(Victim01$Raceij)
## [1] TRUE
head(Victim01)
##   nV  Race Raceij
## 1  0 White      0
## 2  0 White      0
## 3  0 White      0
## 4  0 White      0
## 5  0 White      0
## 6  0 White      0
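
The same 0/1 indicator can also be built without the intermediate character codes. The sketch below uses a small made-up data frame (`toy` and its values are illustrative assumptions, not the victims data) and checks the mapping with a cross-tabulation.

```r
# Toy data standing in for the real file (values are made up for illustration)
toy <- data.frame(nV   = c(0, 2, 0, 1),
                  Race = c("White", "Black", "White", "Black"))

# Logical comparison converted to 0/1: Black = 1, White = 0
toy$Raceij <- as.numeric(toy$Race == "Black")

# Cross-tabulate to confirm every category was mapped as intended
table(toy$Race, toy$Raceij)
```

Comparing against the label directly gives the same result whether Race was read in as a character vector or as a factor.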

2 Analysis

2.1 m1 (negative binomial)

The output suggests that Raceij (Black vs. White) is a significant predictor of nV (number of victims). A one-unit increase in Raceij (being Black rather than White) raises the expected log count of nV by 1.7331, which corresponds to multiplying the expected count by exp(1.7331) ≈ 5.66. (Black interviewees tend to report a higher number of victims.)

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
m1<-glm.nb(nV ~ Raceij, link=log, data=Victim01)
summary(m1)
## 
## Call:
## glm.nb(formula = nV ~ Raceij, data = Victim01, link = log, init.theta = 0.2023119205)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7184  -0.3899  -0.3899  -0.3899   3.5072  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.3832     0.1172 -20.335  < 2e-16 ***
## Raceij        1.7331     0.2385   7.268 3.66e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.2023) family taken to be 1)
## 
##     Null deviance: 471.57  on 1307  degrees of freedom
## Residual deviance: 412.60  on 1306  degrees of freedom
## AIC: 1001.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.2023 
##           Std. Err.:  0.0409 
## 
##  2 x log-likelihood:  -995.7980
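
Because the model uses a log link, a coefficient is easier to read once exponentiated into an incidence rate ratio. The following is a minimal sketch on simulated data (the variables `x` and `y`, the seed, and the simulation settings are assumptions for illustration, not the victims data), with a Wald interval built from the reported standard errors.

```r
library(MASS)

set.seed(1)
# Simulated stand-in data (hypothetical): binary predictor, overdispersed counts
x <- rbinom(500, 1, 0.3)
y <- rnbinom(500, size = 0.5, mu = exp(-1 + 1.2 * x))
fit <- glm.nb(y ~ x)

# Estimates and standard errors on the log scale
est <- summary(fit)$coefficients[, "Estimate"]
se  <- summary(fit)$coefficients[, "Std. Error"]

# Exponentiate the estimate and the Wald 95% limits to get rate ratios
irr_table <- cbind(IRR   = exp(est),
                   lower = exp(est - qnorm(0.975) * se),
                   upper = exp(est + qnorm(0.975) * se))
irr_table

# The same arithmetic applied to the coefficient reported above
exp(1.7331)  # about 5.66
```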

2.2 m2 (Poisson model)

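The Poisson model gives the same coefficient estimates as the negative binomial model, but with smaller standard errors. Furthermore, the overdispersion test indicates that the counts are overdispersed relative to the Poisson assumption (dispersion ratio = 1.746, p < 0.001).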

m2 <- glm(nV ~ Raceij, data = Victim01, family = poisson(link = log))
sjPlot::tab_model(m2, show.se = TRUE, show.r2 = FALSE, show.obs = FALSE)
                                  nV
Predictors    Incidence Rate Ratios   std. Error   CI            p
(Intercept)   0.09                    0.10         0.08 – 0.11   <0.001
Raceij        5.66                    0.15         4.24 – 7.53   <0.001
summary(m2)
## 
## Call:
## glm(formula = nV ~ Raceij, family = poisson(link = log), data = Victim01)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0218  -0.4295  -0.4295  -0.4295   6.1874  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.38321    0.09713  -24.54   <2e-16 ***
## Raceij       1.73314    0.14657   11.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 962.80  on 1307  degrees of freedom
## Residual deviance: 844.71  on 1306  degrees of freedom
## AIC: 1122
## 
## Number of Fisher Scoring iterations: 6
performance::check_overdispersion(m2)
## # Overdispersion test
## 
##        dispersion ratio =    1.746
##   Pearson's Chi-Squared = 2279.873
##                 p-value =  < 0.001
## Overdispersion detected.
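
The reported figures are consistent with the usual Pearson-based calculation: 2279.873 / 1306 ≈ 1.746, that is, the Pearson chi-squared statistic divided by the residual degrees of freedom. The sketch below reproduces that calculation by hand on simulated data (the data, seed, and object names are assumptions for illustration); it is not a claim about how `check_overdispersion()` is implemented internally.

```r
set.seed(2)
# Simulated stand-in data (hypothetical): counts with extra-Poisson variation
x <- rbinom(500, 1, 0.3)
y <- rnbinom(500, size = 0.5, mu = exp(-1 + 1.2 * x))
fit_pois <- glm(y ~ x, family = poisson(link = "log"))

# Pearson chi-squared statistic: sum of squared Pearson residuals
pearson_chisq <- sum(residuals(fit_pois, type = "pearson")^2)

# Dispersion ratio: Pearson chi-squared over residual degrees of freedom
dispersion_ratio <- pearson_chisq / df.residual(fit_pois)

# Upper-tail chi-squared p-value; small values point to overdispersion
p_value <- pchisq(pearson_chisq, df = df.residual(fit_pois),
                  lower.tail = FALSE)

c(ratio = dispersion_ratio, chisq = pearson_chisq, p = p_value)
```

A ratio near 1 is what the Poisson assumption (variance equal to the mean) implies; values well above 1 suggest the standard errors from the plain Poisson fit are too small.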

Recompute the coefficient summary with the dispersion parameter set to 1.746

summary(m2, dispersion = 1.746)$coefficients
##              Estimate Std. Error    z value     Pr(>|z|)
## (Intercept) -2.383208  0.1283420 -18.569195 5.705143e-77
## Raceij       1.733145  0.1936693   8.948992 3.587447e-19
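
Supplying a dispersion value leaves the estimates unchanged and multiplies each Poisson standard error by the square root of the dispersion: 0.09713 × sqrt(1.746) ≈ 0.1283 and 0.14657 × sqrt(1.746) ≈ 0.1937, matching the output above. A minimal sketch on simulated data (data, seed, and object names are assumptions for illustration):

```r
set.seed(3)
# Simulated stand-in data (hypothetical), not the victims data
x <- rbinom(500, 1, 0.3)
y <- rnbinom(500, size = 0.5, mu = exp(-1 + 1.2 * x))
fit_pois <- glm(y ~ x, family = poisson)

phi <- 1.746  # dispersion value from the text, reused here only as an example

# Standard errors with the default dispersion of 1 and with phi supplied
se_default <- summary(fit_pois)$coefficients[, "Std. Error"]
se_scaled  <- summary(fit_pois, dispersion = phi)$coefficients[, "Std. Error"]

# The adjusted standard errors are the default ones times sqrt(phi)
all.equal(se_scaled, se_default * sqrt(phi))
```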

2.3 m3 (quasi-Poisson)

To address the overdispersion, a quasi-Poisson model is fitted, which estimates the dispersion parameter from the data and scales the standard errors accordingly

summary(m3<- glm(nV ~ Raceij, family=quasipoisson, data = Victim01))
## 
## Call:
## glm(formula = nV ~ Raceij, family = quasipoisson, data = Victim01)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0218  -0.4295  -0.4295  -0.4295   6.1874  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.3832     0.1283  -18.57   <2e-16 ***
## Raceij        1.7331     0.1937    8.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.745694)
## 
##     Null deviance: 962.80  on 1307  degrees of freedom
## Residual deviance: 844.71  on 1306  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6
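
The quasi-Poisson output has the same coefficients as the Poisson fit, and its standard errors (0.1283 and 0.1937) agree with the dispersion-adjusted Poisson summary. The sketch below checks both relationships on simulated data (data, seed, and object names are assumptions for illustration).

```r
set.seed(4)
# Simulated stand-in data (hypothetical), not the victims data
x <- rbinom(500, 1, 0.3)
y <- rnbinom(500, size = 0.5, mu = exp(-1 + 1.2 * x))

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Point estimates are identical; only the uncertainty changes
all.equal(coef(fit_pois), coef(fit_quasi))

# Dispersion estimated by the quasi-Poisson fit
phi_hat <- summary(fit_quasi)$dispersion

# Quasi-Poisson standard errors equal Poisson ones scaled by sqrt(phi_hat)
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
all.equal(se_quasi, se_pois * sqrt(phi_hat))
```

Since the quasi-Poisson family has no full likelihood, R reports the AIC as NA, so this model cannot be compared with m1 or m2 on AIC.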