As before, we work with the World Values Survey dataset, and the focus again lies on Germany.

For this project we looked for a binary variable and found an interesting one - r_victim. It records whether the respondent was the victim of a crime during the past year. This is our dependent variable.

We also chose three independent variables - satisfaction with the political system (stsf_polit_sys), whether the respondent carries a weapon for security (safe_weapon) and how frequent robberies are in the neighborhood (freq_robberies).

The two control variables are gender and age.

The reasons for choosing these variables are discussed in the theoretical framework section.

1. Theoretical framework

Age. Minors and the elderly are the categories of citizens at greatest risk of becoming victims of crime. While the former are usually more strongly protected by law, the security of the latter remains an urgent question (Hirschel & Rubin, 1982). Since there are no juveniles in our sample, we expect increasing age to be associated with a greater risk of becoming a victim.

Satisfaction with the political system (stsf_polit_sys). Trust or distrust of the authorities affects the likelihood that a person will report a crime to the police (Xie & Baumer, 2019). We think that a positive answer to the victimization question may depend on the police confirming the crime, which recognizes the respondent’s experience as that of a victim. Therefore, those who trust the political system are likely to be more open about having been the victim of a crime.

Gender. Many studies have raised the issue of crime and its victims, and gender is one of the factors that affect the likelihood of being a victim. The study by Anggi Saputra (2014) confirmed its main hypothesis that women are more likely to become victims of violence, robbery and many other types of crime.

Carrying a weapon (safe_weapon). Other researchers believe that if a person carries a weapon, the probability of becoming a victim of a crime is greatly reduced (Kai Thaler, 2011). The explanation is simple: the weapon serves as a means of self-defense.

Neighborhood (freq_robberies). The place where a person lives matters as well. According to Wesley Skogan (1986), if robberies are frequent in a person’s neighborhood, its residents are more likely to become victims of crime.

2. Data manipulations

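library(dplyr)  # provides recode_factor()

# Select the six variables by column position in the Wave 7 Germany data
germany1 <- Wave_7_Germany[, c(177, 295, 297, 287, 174, 165)]
names(germany1) <- c("r_victim", "gender", "age", "stsf_polit_sys", "safe_weapon", "freq_robberies")
# Negative codes mark missing answers: set them to NA and drop incomplete rows
germany1[germany1 < 0] <- NA
germany1 <- na.omit(germany1)
# Label the categorical variables. Levels follow the order given here,
# so "Yes" becomes the first (reference) level of r_victim
germany1$r_victim <- recode_factor(germany1$r_victim, "1" = "Yes", "2" = "No")
germany1$gender <- recode_factor(germany1$gender, "1" = "Male", "2" = "Female")
germany1$safe_weapon <- recode_factor(germany1$safe_weapon, "1" = "Yes", "2" = "No")
germany1$freq_robberies <- recode_factor(germany1$freq_robberies, "1" = "Very frequently", "2" = "Quite frequently",
                                         "3" = "Not frequently", "4" = "Not at all frequently")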
summary(germany1)
##  r_victim      gender         age       stsf_polit_sys   safe_weapon
##  Yes:  85   Male  :721   Min.   :18.0   Min.   : 1.000   Yes:  60   
##  No :1391   Female:755   1st Qu.:36.0   1st Qu.: 5.000   No :1416   
##                          Median :51.0   Median : 7.000              
##                          Mean   :50.7   Mean   : 6.373              
##                          3rd Qu.:65.0   3rd Qu.: 8.000              
##                          Max.   :96.0   Max.   :10.000              
##                freq_robberies
##  Very frequently      :  10  
##  Quite frequently     :  83  
##  Not frequently       : 373  
##  Not at all frequently:1010  
##                              
## 
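The level order of r_victim matters for everything that follows, so it may be worth checking the cleaning steps on a toy example. The sketch below uses a small made-up data frame (the values are illustrative, not survey data), and base R's factor() stands in for recode_factor() so that the block runs without extra packages.

```r
# Toy data, NOT from the survey: negative values stand for missing answers
toy <- data.frame(r_victim = c(1, 2, 2, -1, 2),
                  age      = c(34, 51, -2, 40, 67))

# Same cleaning steps as in the report
toy[toy < 0] <- NA
toy <- na.omit(toy)

# Base-R stand-in for the recode: codes 1/2 become the labels Yes/No
toy$r_victim <- factor(toy$r_victim, levels = c(1, 2), labels = c("Yes", "No"))

nrow(toy)             # rows left after removing incomplete cases
levels(toy$r_victim)  # the first level is the reference level
```

Any row with at least one negative code is dropped entirely, which is why the final sample is smaller than the raw data.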

3. Descriptive statistics

library(ggplot2)
ggplot(germany1, aes(x=r_victim)) +
  geom_bar(fill = "darkslategray4", alpha = 0.8) + 
  labs(x = "Been a Victim", y ="Number", title="Victim of a crime during past year") +
  theme_bw() + coord_flip() +
  theme(text = element_text(family = "Times New Roman"))

So the dataset contains few respondents who were victims of a crime: 85 of 1476, or about 5.8%. That may be good news for Germany and the country’s safety, but it is not good for our model.
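The imbalance can be put into numbers directly from the counts in the summary output. The following is a minimal sketch that takes the two counts as given; the object names are chosen here for illustration.

```r
# Counts taken from the summary(germany1) output
victim_counts <- c(Yes = 85, No = 1391)

# Share of each answer among the complete cases
victim_share <- prop.table(victim_counts)
round(100 * victim_share, 1)

# Accuracy of a rule that always predicts the majority answer
majority_accuracy <- max(victim_share)
round(100 * majority_accuracy, 2)
```

Any classifier fitted to these data has to beat the majority rule to be useful, which is a high bar when one answer covers more than nine cases in ten.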

ggplot(germany1, aes(x=gender)) +
  geom_bar(fill = "darkslategray4", alpha = 0.8) + 
  labs(x = "Gender", title="Gender Distribution", y = "Number") +
  theme_bw() + coord_flip() +
  theme(text = element_text(family = "Times New Roman"))

Here men (721) and women (755) are almost equally represented.

ggplot(germany1, aes(x=age)) +
  geom_density(fill="darkslategray4", color="#e9ecef", alpha=0.8) + 
  labs(x = "Age", title="Age Distribution", y = "Density") +
  theme_bw()+
  theme(text = element_text(family = "Times New Roman"))

The median age is 51 and the mean is 50.7. The distribution is not normal, and it is hard to judge its skew from the plot.

ggplot(germany1, aes(x=stsf_polit_sys)) +
  geom_bar(fill = "darkslategray4", alpha = 0.8) + 
  labs(x = "Satisfaction of political system", title="Satisfaction of political system Distribution", y = "Number") +
  scale_x_continuous(breaks= seq(from = 1, to = 10, by = 1)) +
  theme_bw() +
  theme(text = element_text(family = "Times New Roman"))

Most respondents are fairly satisfied with the political system they live in: the median is 7 and the mean is 6.4 on the 10-point scale.

ggplot(germany1, aes(x=safe_weapon)) +
  geom_bar(fill = "darkslategray4", alpha = 0.8) + 
  labs(x = "Carried a knife, gun or other weapon", title="Carried a weapon for reasons of security distribution", y = "Number") +
  theme_bw() +  coord_flip() +
  theme(text = element_text(family = "Times New Roman"))

Few people carry weapons: 60 of 1476 respondents, or about 4%.

ggplot(germany1, aes(x=freq_robberies)) +
  geom_bar(fill = "darkslategray4", alpha = 0.8) + 
  labs(x = "Frequency of Robberies in Neighbourhood", title="Frequency of Robberies in Neighbourhood Distribution", y = "Number") +
  theme_bw() + coord_flip() +
  theme(text = element_text(family = "Times New Roman")) 

4. Modelling

Logit Model

model_logit = glm(data = germany1, r_victim ~ gender + age + stsf_polit_sys + safe_weapon + freq_robberies,family=binomial)
summary(model_logit)
## 
## Call:
## glm(formula = r_victim ~ gender + age + stsf_polit_sys + safe_weapon + 
##     freq_robberies, family = binomial, data = germany1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8728   0.2224   0.2815   0.3646   1.2654  
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         -2.214179   0.893696  -2.478 0.013229 *  
## genderFemale                         0.673809   0.240274   2.804 0.005042 ** 
## age                                  0.018508   0.006747   2.743 0.006082 ** 
## stsf_polit_sys                       0.145094   0.048077   3.018 0.002545 ** 
## safe_weaponNo                        1.377508   0.366944   3.754 0.000174 ***
## freq_robberiesQuite frequently       0.945080   0.818580   1.155 0.248281    
## freq_robberiesNot frequently         1.542374   0.777546   1.984 0.047296 *  
## freq_robberiesNot at all frequently  1.890850   0.768845   2.459 0.013919 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 650.26  on 1475  degrees of freedom
## Residual deviance: 596.41  on 1468  degrees of freedom
## AIC: 612.41
## 
## Number of Fisher Scoring iterations: 6

Because r_victim has the levels Yes and No in that order, glm() treats Yes as the reference and models the probability of answering No, that is, of not having been a victim. The coefficients are differences in log-odds of that outcome.

gender - the log-odds of not being a victim are 0.67 higher for women than for men.

age - the log-odds of not being a victim increase by 0.02 with each additional year of age.

stsf_polit_sys - the log-odds of not being a victim increase by 0.15 with each additional level of satisfaction with the political system.

safe_weapon - the log-odds of not being a victim are 1.38 higher for people who do not carry a weapon than for those who do.

freq_robberies - compared with neighborhoods where robberies happen very frequently, the log-odds of not being a victim are 1.54 higher where robberies happen not frequently and 1.89 higher where they happen not at all frequently. The difference for "Quite frequently" (0.95) is not significant.
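Which answer a binomial glm() describes depends only on the level order of the response: the first level counts as failure and the other as success. The sketch below illustrates this on made-up data (not from the survey) in which a predictor x clearly raises the share of "Yes" answers; relevel() is one possible way to switch the modelled event.

```r
# Toy data, NOT from the survey: x = 1 raises the share of "Yes" from 20% to 80%
toy <- data.frame(
  x = rep(c(0, 1), each = 100),
  y = factor(c(rep(c("Yes", "No"), times = c(20, 80)),
               rep(c("Yes", "No"), times = c(80, 20))),
             levels = c("Yes", "No"))
)

# With the levels Yes, No the model describes P(y = "No")
m_no <- glm(y ~ x, family = binomial, data = toy)

# Making "No" the reference level switches the modelled event to "Yes"
toy$y_yes <- relevel(toy$y, ref = "No")
m_yes <- glm(y_yes ~ x, family = binomial, data = toy)

coef(m_no)["x"]   # negative: x lowers the odds of "No"
coef(m_yes)["x"]  # same size, positive: x raises the odds of "Yes"
```

The two fits are the same model seen from opposite sides, so only the signs of the coefficients change, not their size or significance.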

Odds ratio in model

exp(coef(model_logit))
##                         (Intercept)                        genderFemale 
##                           0.1092431                           1.9616948 
##                                 age                      stsf_polit_sys 
##                           1.0186803                           1.1561484 
##                       safe_weaponNo      freq_robberiesQuite frequently 
##                           3.9650079                           2.5730193 
##        freq_robberiesNot frequently freq_robberiesNot at all frequently 
##                           4.6756750                           6.6249950

These are odds ratios for the outcome the model describes, which is not being a victim (r_victim = No, the second factor level).

gender - the odds of not being a victim are 1.96 times as high for women as for men.

age - each additional year of age multiplies the odds of not being a victim by 1.02.

stsf_polit_sys - each additional level of satisfaction with the political system multiplies the odds of not being a victim by 1.16.

safe_weapon - the odds of not being a victim are 3.97 times as high for people who do not carry a weapon as for those who do.

freq_robberies - compared with neighborhoods where robberies happen very frequently, the odds of not being a victim are 4.68 times as high where robberies happen not frequently and 6.62 times as high where they happen not at all frequently.

So, for now it seems that the older you are, the more satisfied with the political system you are, if you are a woman, you live in a neighborhood with not frequent or not at all frequent robberies, and you do not carry a weapon, the less likely you are to report having been a victim. The model contains no interaction terms, so each effect is read with the other variables held constant.
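The odds ratios are simply the exponentiated logit coefficients, and the standard errors from the summary table allow a rough idea of their uncertainty. The sketch below copies four estimates from the output and adds approximate 95% Wald intervals; the intervals are an assumption of this sketch and were not part of the original analysis.

```r
# Estimates and standard errors copied from the summary(model_logit) output
est <- c(genderFemale = 0.673809, age = 0.018508,
         stsf_polit_sys = 0.145094, safe_weaponNo = 1.377508)
se  <- c(genderFemale = 0.240274, age = 0.006747,
         stsf_polit_sys = 0.048077, safe_weaponNo = 0.366944)

# Odds ratios for the modelled outcome (r_victim = "No")
odds_ratio <- exp(est)

# Approximate 95% Wald intervals: exponentiate estimate +/- z * standard error
z <- qnorm(0.975)
ci <- cbind(OR    = odds_ratio,
            lower = exp(est - z * se),
            upper = exp(est + z * se))
round(ci, 2)
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 5% level, which matches the stars in the summary table for these four terms.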

The Model Fit

library(DescTools)
PseudoR2(model_logit)
##   McFadden 
## 0.08281183

The pseudo R-squared tells us that the model fit is poor. For McFadden's measure, values between roughly 0.2 and 0.4 are usually taken to indicate a good fit, and ours is below 0.1.

This means the model is of little use for prediction, and we can only judge the relations between the variables.
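McFadden's measure can be reproduced by hand from the two deviances in the model summary, which makes it clearer what the number means. The sketch below does this and, as an addition not found in the original analysis, also computes a likelihood-ratio test against the intercept-only model.

```r
# Deviances copied from the summary(model_logit) output
null_deviance     <- 650.26
residual_deviance <- 596.41

# For a 0/1 outcome the deviance equals -2 * log-likelihood,
# so McFadden's R-squared reduces to a ratio of deviances
mcfadden <- 1 - residual_deviance / null_deviance
round(mcfadden, 4)

# Likelihood-ratio test of the model against the intercept-only model
lr_stat <- null_deviance - residual_deviance
lr_df   <- 1475 - 1468   # difference in residual degrees of freedom
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
lr_p
```

The predictors reduce the deviance by only about 8%, so a model can be clearly better than no predictors at all and still explain little.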

library(pscl)
hitmiss(model_logit)
## Classification Threshold = 0.5 
##        y=0  y=1
## yhat=0   2    1
## yhat=1  83 1390
## Percent Correctly Predicted = 94.31%
## Percent Correctly Predicted = 2.353%, for y = 0
## Percent Correctly Predicted = 99.93%  for y = 1
## Null Model Correctly Predicts 94.24%
## [1] 94.308943  2.352941 99.928109

Now we can see why the model is bad. Here y = 1 stands for No (not a victim) and y = 0 for Yes (victim). We correctly predict only 2 of the 85 victims (2.35%), against 99.93% of the non-victims.

The null model, which always predicts the majority answer, is right in 94.24% of cases, so our 94.31% is barely an improvement, and the aim of identifying victims is lost in the process.
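The percentages reported by hitmiss() follow directly from the classification table. The sketch below rebuilds that table from the printed counts and recomputes each figure; the object names are chosen here for illustration.

```r
# Classification table copied from the hitmiss(model_logit) output
# (y = 0 is "Yes, victim"; y = 1 is "No, not a victim")
confusion <- matrix(c(2, 83, 1, 1390), nrow = 2,
                    dimnames = list(predicted = c("yhat=0", "yhat=1"),
                                    observed  = c("y=0", "y=1")))

# Overall share of correct predictions
accuracy <- sum(diag(confusion)) / sum(confusion)

# Share of victims and of non-victims that the model identifies
victims_found    <- confusion["yhat=0", "y=0"] / sum(confusion[, "y=0"])
nonvictims_found <- confusion["yhat=1", "y=1"] / sum(confusion[, "y=1"])

# Accuracy of always predicting the larger class
null_accuracy <- max(colSums(confusion)) / sum(confusion)

round(100 * c(accuracy, victims_found, nonvictims_found, null_accuracy), 2)
```

Overall accuracy is dominated by the large non-victim column, which is why it stays high even though almost no victims are identified.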

If the probit model does not show better results, we will return to this in the discussion section.

Probit Model

model_probit = glm(data = germany1, r_victim ~ gender + age + stsf_polit_sys + safe_weapon + freq_robberies,
                   family=binomial(link = probit))
summary(model_probit)
## 
## Call:
## glm(formula = r_victim ~ gender + age + stsf_polit_sys + safe_weapon + 
##     freq_robberies, family = binomial(link = probit), data = germany1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9021   0.2213   0.2843   0.3707   1.1227  
## 
## Coefficients:
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         -0.938836   0.499833  -1.878  0.06034 .  
## genderFemale                         0.315055   0.112861   2.792  0.00525 ** 
## age                                  0.008315   0.003173   2.621  0.00877 ** 
## stsf_polit_sys                       0.069528   0.023452   2.965  0.00303 ** 
## safe_weaponNo                        0.698154   0.202056   3.455  0.00055 ***
## freq_robberiesQuite frequently       0.528950   0.468960   1.128  0.25935    
## freq_robberiesNot frequently         0.824780   0.446150   1.849  0.06451 .  
## freq_robberiesNot at all frequently  0.995263   0.441883   2.252  0.02430 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 650.26  on 1475  degrees of freedom
## Residual deviance: 597.91  on 1468  degrees of freedom
## AIC: 613.91
## 
## Number of Fisher Scoring iterations: 6

The probit model also describes the second level of r_victim, No, so the coefficients are shifts in the z-score of the probability of not being a victim.

gender - for women the z-score of the probability of not being a victim is 0.32 higher than for men.

age - with each additional year of age the z-score of the probability of not being a victim increases by 0.008.

stsf_polit_sys - with each additional level of satisfaction with the political system the z-score of the probability of not being a victim increases by 0.07.

safe_weapon - for those not carrying weapons the z-score of the probability of not being a victim is 0.70 higher than for those carrying weapons.

freq_robberies - for people living in neighborhoods with not at all frequent robberies the z-score of the probability of not being a victim is 0.995 higher than for those living in neighborhoods with very frequent robberies. The "Not frequently" level (0.82) is significant only at the 10% level.
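A z-score shift is hard to picture, so it can help to convert it into a probability for one concrete case. The sketch below does this with pnorm() for a hypothetical respondent; the profile (age 50, satisfaction 7, no weapon, robberies not at all frequent) is an assumption made purely for illustration.

```r
# Coefficients copied from the summary(model_probit) output
b <- c(intercept = -0.938836, female = 0.315055, age = 0.008315,
       stsf = 0.069528, no_weapon = 0.698154, not_at_all = 0.995263)

# Hypothetical respondent (an assumption for illustration): aged 50,
# satisfaction 7, no weapon, robberies "Not at all frequently"
z_male   <- b[["intercept"]] + b[["age"]] * 50 + b[["stsf"]] * 7 +
            b[["no_weapon"]] + b[["not_at_all"]]
z_female <- z_male + b[["female"]]

# pnorm() turns the z-score into the modelled probability, P(r_victim = "No")
p_male   <- pnorm(z_male)
p_female <- pnorm(z_female)

p_female - p_male  # gap in probability for this particular profile
```

The same z-score shift produces a different change in probability for every profile, which is the reason for reporting average marginal effects.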

Margins in model

library(margins)
summary(margins(model_probit))
##                               factor    AME     SE      z      p   lower  upper
##                                  age 0.0009 0.0003 2.5895 0.0096  0.0002 0.0016
##  freq_robberiesNot at all frequently 0.1868 0.1278 1.4618 0.1438 -0.0637 0.4373
##         freq_robberiesNot frequently 0.1686 0.1280 1.3172 0.1878 -0.0823 0.4194
##       freq_robberiesQuite frequently 0.1246 0.1310 0.9514 0.3414 -0.1321 0.3813
##                         genderFemale 0.0335 0.0120 2.7987 0.0051  0.0101 0.0570
##                        safe_weaponNo 0.1162 0.0468 2.4841 0.0130  0.0245 0.2078
##                       stsf_polit_sys 0.0074 0.0025 2.9234 0.0035  0.0025 0.0124

The average marginal effects refer to the probability the model describes, which is not being a victim (r_victim = No).

gender - for women the probability of not being a victim is about 3.4 percentage points higher than for men.

age - with each additional year of age the probability of not being a victim increases by about 0.09 percentage points.

stsf_polit_sys - with each additional level of satisfaction with the political system the probability of not being a victim increases by about 0.7 percentage points.

safe_weapon - for those not carrying weapons the probability of not being a victim is about 11.6 percentage points greater than for those who carry weapons.

For freq_robberies all the p-values are above the 0.05 threshold, so we cannot draw any conclusions.

The Model Fit

PseudoR2(model_probit)
##   McFadden 
## 0.08051674

The pseudo R-squared is slightly lower than in the logit model (0.081 against 0.083). The two models use the same predictors and differ only in the link function, so a small difference like this is not surprising.

In any case, our model does not fit the data well, so the best we can do is look at the relations between the variables.

hitmiss(model_probit)
## Classification Threshold = 0.5 
##        y=0  y=1
## yhat=0   1    0
## yhat=1  84 1391
## Percent Correctly Predicted = 94.31%
## Percent Correctly Predicted = 1.176%, for y = 0
## Percent Correctly Predicted = 100%  for y = 1
## Null Model Correctly Predicts 94.24%
## [1]  94.308943   1.176471 100.000000

Now 100% of the non-victims are predicted correctly, but only 1 of the 85 victims (1.2%).

The model fit is poor.

5. Discussion

A lack of results, as well as failures and errors, are also results. We can still draw some conclusions and discuss what we have. First, the data do not contain enough victimized respondents, so the model could not learn to recognize them. The more one category dominates, the more the model tends to predict only that category, because doing so already yields high accuracy. As a result, our model predicts victims very badly.

But since we are sociologists, sometimes it is enough to look at the directions of the relations between the variables.

When reading the directions, we must remember that both models estimate the probability of answering No to the victimization question, because No is the second level of r_victim. Women are therefore less likely than men to report having been victims of crime in Germany, which does not support the hypothesis put forward in the literature review. People who do not carry weapons are less likely to be victims, or, put differently, those who carry a weapon are more likely to report victimization. This runs against the self-defense argument from the review, although our data cannot show whether the weapon or the victimization came first.

The older respondents are, the less likely they are to report victimization, contrary to what was stated in the review part (but the estimates are so small that each additional year changes the probability by less than 0.1 percentage points).

Also, people who are more satisfied with the political system, as well as those who live in safe neighborhoods with almost no robberies, are less likely to be victims according to the model. The neighborhood result agrees with the stated hypothesis, while the satisfaction result goes against our expectation that satisfied respondents would be more open about victimization. But the dataset has significantly more people who are satisfied and live in safe neighborhoods, the marginal effects for freq_robberies are not significant, and the model fit is poor. So we cannot be sure of these relationships.

Sources: