According to data from the Gun Violence Archive, a total of 316 mass shooting incidents had occurred as of November 14, 2017. The Federal Bureau of Investigation (F.B.I.) [1] defines a “mass killing” as the killing of three or more people in a public place, but it also defines a “mass murderer” as someone who has killed four or more people in the same location. The Gun Violence Archive [2] describes itself as a not-for-profit organization that documents gun violence and gun crime nationally. In this project, we use shooting data from a dataset hosted on Kaggle, a website for data science and machine learning [3]. We examine categorical variables in the dataset and test whether selected outcomes are independent of them, using a series of categorical analysis techniques.
Our analysis covers four testing scenarios. First, we test the relationship between age group (teen or adult) and race (white or black) with \(\chi^2\)-based tests, including the Mantel-Haenszel \(\chi^2\), for a \(2 \times 2\) contingency table. The second test examines mental health status against shooting type (mass shooting event or not), controlling for race, using the Cochran-Mantel-Haenszel (CMH) test along with relative risks and odds ratios. The third test looks for a linear association between mass shooting events and age ranges using the Cochran-Armitage trend test. Finally, we fit a logistic regression of shooting type on day type (weekday or weekend), race, and mental health status. For the logistic regression we fit the model, identify the significant parameters, and use the Hosmer and Lemeshow goodness-of-fit test to check whether the model is adequate for the dataset.
The table is set up with the risk factor “Age Group” as the row variable and the response “Race” as the column variable. The age groups are Teens (ages 1-19) and Adults (ages 20 and above).
| Age Group | Black | White | Total |
|---|---|---|---|
| Teen | 12 | 31 | 43 |
| Adult | 63 | 104 | 167 |
| Total | 75 | 135 | 210 |
The results include a contingency table analysis using a \(\chi^2\) test, where the null hypothesis \(H_0:\beta=0\) states that the two variables are independent. The summary table from the SAS output follows.
| Statistic | Degrees of Freedom | Value | Probability |
|---|---|---|---|
| Chi-Square | 1 | 1.4355 | 0.2309 |
| Likelihood Ratio Chi-Square | 1 | 1.4779 | 0.2241 |
| Continuity Adj. Chi-Square | 1 | 1.0398 | 0.3079 |
| Mantel-Haenszel Chi-Square | 1 | 1.4287 | 0.232 |
| Phi Coefficient | | -0.0827 | |
| Contingency Coefficient | | 0.0824 | |
| Cramer’s V | | -0.0827 | |
Exact Test
| Cell (1,1) Frequency (F) | 12 |
|---|---|
| Left-sided Pr <= F | 0.1538 |
| Right-sided Pr >= F | 0.9174 |
| Table Probability (P) | 0.0712 |
| Two-sided Pr <= P | 0.2853 |
Since we observe a large p-value (\(p=0.2309\) for the Pearson \(\chi^2\)), we do not reject the null hypothesis \(H_0:\beta=0\); the data show no statistically significant association between race and age group.
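The Pearson \(\chi^2\) statistic in the summary table can be reproduced by hand. The following is a minimal Python sketch, not the SAS code used for the analysis; the function name `pearson_chi_square_2x2` is hypothetical, and the p-value uses the fact that for one degree of freedom the \(\chi^2\) tail probability equals `erfc(sqrt(x / 2))`.

```python
import math

def pearson_chi_square_2x2(table):
    """Return (statistic, p_value) for a 2x2 table of counts.

    No continuity correction is applied, so this corresponds to the
    plain "Chi-Square" row of the summary table.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: Teen, Adult; columns: Black, White (counts from the table above).
age_by_race = [[12, 31], [63, 104]]
chi2_stat, chi2_p = pearson_chi_square_2x2(age_by_race)
print(f"chi-square = {chi2_stat:.4f}, p = {chi2_p:.4f}")
```

The printed values should agree with the Chi-Square row of the SAS summary up to rounding.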
| Statistic | Value | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Odds Ratio | 0.639 | 0.3061 | 1.3342 |
| RR (Column 1) | 0.7398 | 0.4405 | 1.2423 |
| RR (Column 2) | 1.1576 | 0.9288 | 1.4429 |
Relative Risk
I am particularly interested in trends among teens across racial lines. Because age group is the row variable, the relative risks compare the racial makeup of teen and adult assailants: the probability that a teen assailant is black is \(0.7398\) times the probability that an adult assailant is black, while the probability that a teen assailant is white is \(1.1576\) times the corresponding probability for adults.
In this sample, teen assailants are therefore somewhat more likely to be white than adult assailants are, and the black group shows the opposite trend. Both 95% confidence intervals contain 1, so neither difference is statistically significant.
Odds Ratio
I also compared the two age groups using the odds ratio. The odds that a teen assailant is black, relative to the odds that an adult assailant is black, are \(0.639\), so black assailants are less common among teens than among adults. For the white group the odds ratio is the reciprocal, \(\Omega = 1/0.639 \approx 1.56\), so white assailants are more common among teens than among adults. The 95% confidence interval for the odds ratio, \((0.3061, 1.3342)\), contains 1, so this difference is not statistically significant.
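The odds ratio, the two relative risks, and their 95% limits can be checked with the usual large-sample (Wald) intervals on the log scale. The sketch below is an illustration in Python rather than the SAS procedure itself; the helper names `odds_ratio_ci` and `relative_risk_ci` are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(a, b, c, d, z=Z_95):
    """Odds ratio of a 2x2 table [[a, b], [c, d]] with a Wald interval."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

def relative_risk_ci(x1, n1, x2, n2, z=Z_95):
    """Ratio of proportions (x1/n1) / (x2/n2) with a Wald interval."""
    estimate = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Rows: Teen (12 black, 31 white), Adult (63 black, 104 white).
or_est, or_lo, or_hi = odds_ratio_ci(12, 31, 63, 104)
rr_col1 = relative_risk_ci(12, 43, 63, 167)   # column 1: Black
rr_col2 = relative_risk_ci(31, 43, 104, 167)  # column 2: White

print(f"OR       = {or_est:.4f} ({or_lo:.4f}, {or_hi:.4f})")
print("RR col 1 = {:.4f} ({:.4f}, {:.4f})".format(*rr_col1))
print("RR col 2 = {:.4f} ({:.4f}, {:.4f})".format(*rr_col2))
```

An interval that contains 1 corresponds to a difference that is not significant at the 5% level, which is the case for all three measures here.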
The \(\chi^2\) statistic assumes that the expected count in each cell is at least 5. Here the expected counts are large for every cell (the smallest is about 15.4), so the assumption is met and the conclusions are sound. Otherwise, we would rely on Fisher’s exact test at the end of the output, where the two-sided \(p=0.2853\). In the SAS output, the CHISQ option calculates the \(\chi^2\) statistics, which include the Pearson \(\chi^2\), likelihood-ratio \(\chi^2\), and Mantel-Haenszel \(\chi^2\). We observe that these statistics are asymptotically equivalent.
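Fisher’s exact test conditions on the table margins, so the (1,1) cell follows a hypergeometric distribution. As a hedged illustration (a Python sketch with a hypothetical function name `fisher_exact_2x2`, not the SAS implementation), the exact-test quantities can be computed by enumerating every table with the observed margins; the two-sided value is assumed to be the total probability of tables no more likely than the observed one.

```python
import math

def hypergeom_pmf(k, row1, col1, n):
    """P(cell (1,1) = k) when all margins of the 2x2 table are fixed."""
    return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

def fisher_exact_2x2(table):
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    k_min = max(0, row1 + col1 - n)
    k_max = min(row1, col1)
    pmf = {k: hypergeom_pmf(k, row1, col1, n) for k in range(k_min, k_max + 1)}
    p_obs = pmf[a]
    return {
        "table_prob": p_obs,
        "left": sum(p for k, p in pmf.items() if k <= a),
        "right": sum(p for k, p in pmf.items() if k >= a),
        # Small relative tolerance guards against floating-point ties.
        "two_sided": sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9)),
    }

# Rows: Teen, Adult; columns: Black, White.
fisher = fisher_exact_2x2([[12, 31], [63, 104]])
for name, value in fisher.items():
    print(f"{name}: {value:.4f}")
```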
Testing dependence of Shooting Type by Mental Health Status across Race.
We collected data on whether each incident was related to mental health. The response variable is shooting type, Mass Shooting (MS) versus Non-Mass Shooting (NMS), and the analysis controls for race.
| Race | Mental Health | Mass Shooting | Non-Mass Shooting |
|---|---|---|---|
| White | Yes | 39 | 27 |
| | No | 20 | 17 |
| Black | Yes | 6 | 8 |
| | No | 9 | 18 |
The summary statistics
| Statistic | Alternative Hypothesis | Degrees of Freedom | Value | Probability |
|---|---|---|---|---|
| 1 | Nonzero Correlation | 1 | 0.5334 | 0.46519 |
| 2 | Row Mean Scores Difference | 1 | 0.5334 | 0.46519 |
| 3 | General Association | 1 | 0.5334 | 0.46519 |
So, with \(p=0.4652\), we do not reject the null hypothesis of no association: after controlling for race, there is no statistically significant association between mental health status and type of shooting.
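For stratified \(2 \times 2\) tables, the CMH statistic sums, over strata, the difference between the observed (1,1) count and its expected value under independence, then divides the squared sum by the summed hypergeometric variances. The following Python sketch (function name `cmh_test` is hypothetical, and no continuity correction is assumed) applies this to the two race strata above.

```python
import math

def cmh_test(strata):
    """Cochran-Mantel-Haenszel test for a list of 2x2 tables [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Observed minus expected count in cell (1,1) for this stratum.
        diff_sum += a - row1 * col1 / n
        # Hypergeometric variance of cell (1,1) given the margins.
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    stat = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Each stratum: rows Mental Health Yes/No, columns Mass / Non-Mass Shooting.
white = [[39, 27], [20, 17]]
black = [[6, 8], [9, 18]]
cmh_stat, cmh_p = cmh_test([white, black])
print(f"CMH = {cmh_stat:.4f}, p = {cmh_p:.4f}")
```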
Controlling for Race = WHITE
| Statistic | Value | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Odds Ratio | 1.2278 | 0.5453 | 2.7646 |
| RR (Column 1) | 1.0932 | 0.7638 | 1.5646 |
| RR (Column 2) | 0.8904 | 0.5654 | 1.4021 |
Controlling for Race = BLACK
| Statistic | Value | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Odds Ratio | 1.5 | 0.3979 | 5.564 |
| RR (Column 1) | 1.2857 | 0.574 | 2.88 |
| RR (Column 2) | 0.8571 | 0.5064 | 1.4508 |
Common Odds Ratio and Relative Risks
| Statistic | Method | Value | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| Odds Ratio | Mantel-Haenszel | 1.2961 | 0.6486 | 2.59 |
| | Logit | 1.2966 | 0.6488 | 2.5912 |
| RR (Column 1) | Mantel-Haenszel | 1.1304 | 0.8131 | 1.5715 |
| | Logit | 1.1228 | 0.8092 | 1.5581 |
| RR (Column 2) | Mantel-Haenszel | 0.8784 | 0.6207 | 1.2431 |
| | Logit | 0.876 | 0.6212 | 1.2355 |
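With the CMH option we produced tables for relative risk (RR). From the tables above, the estimated probability of a mass shooting is higher for incidents involving mental health issues than for those without (the Mantel-Haenszel RR for column 1 is \(1.1304\)), although every 95% confidence interval contains 1, so the difference is not statistically significant.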
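The Mantel-Haenszel point estimates in the common odds ratio and relative risk table are weighted ratios of sums over the race strata. The Python sketch below reproduces only the point estimates, not the confidence limits; the function name `mantel_haenszel_estimates` is hypothetical.

```python
def mantel_haenszel_estimates(strata):
    """Pooled odds ratio and column-wise relative risks across 2x2 strata.

    Each stratum is [[a, b], [c, d]] with rows = exposure groups and
    columns = outcomes. Returns (odds_ratio, rr_column1, rr_column2).
    """
    or_num = or_den = 0.0
    rr1_num = rr1_den = 0.0
    rr2_num = rr2_den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        or_num += a * d / n
        or_den += b * c / n
        rr1_num += a * row2 / n
        rr1_den += c * row1 / n
        rr2_num += b * row2 / n
        rr2_den += d * row1 / n
    return or_num / or_den, rr1_num / rr1_den, rr2_num / rr2_den

# Rows: Mental Health Yes/No; columns: Mass / Non-Mass Shooting.
white = [[39, 27], [20, 17]]
black = [[6, 8], [9, 18]]
mh_or, mh_rr_col1, mh_rr_col2 = mantel_haenszel_estimates([white, black])
print(f"MH odds ratio = {mh_or:.4f}")
print(f"MH RR (column 1) = {mh_rr_col1:.4f}")
print(f"MH RR (column 2) = {mh_rr_col2:.4f}")
```

Pooling in this way is reasonable only when the stratum-specific odds ratios are similar, which is what the Breslow-Day test checks.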
Homogeneity of the Odds Ratios
| Breslow-Day Test | |
|---|---|
| Chi-Square | 0.0637 |
| Degrees of Freedom | 1 |
| Probability | 0.8007 |
With a large p-value (\(p=0.8007\)) for the Breslow-Day test, we do not reject homogeneity: the odds ratios do not differ significantly between the two races.
Testing for trends within the variable for Age referencing Mass Shooting incidents.
| Shooting Type | Age 1-19 | Age 20-39 | Age 40+ |
|---|---|---|---|
| Mass Shooting | 13 | 54 | 47 |
| Non-Mass Shooting | 40 | 62 | 27 |
| Statistic | Value | ASE | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| Gamma | 0.5313 | 0.0935 | 0.348 | 0.7146 |
| Kendall’s Tau-b | 0.3373 | 0.0642 | 0.2114 | 0.4631 |
| Stuart’s Tau-c | 0.4111 | 0.0798 | 0.2547 | 0.5675 |
| Somers’ D C\|R | 0.4427 | 0.0837 | 0.2786 | 0.6068 |
| Somers’ D R\|C | 0.2569 | 0.0499 | 0.1592 | 0.3547 |
| Pearson Correlation | 0.3776 | 0.0714 | 0.2378 | 0.5175 |
| Spearman Correlation | 0.3771 | 0.0718 | 0.2363 | 0.5178 |
| Lambda Asymmetric C\|R | 0.125 | 0.0662 | 0 | 0.2547 |
| Lambda Asymmetric R\|C | 0.2373 | 0.0837 | 0.0732 | 0.4014 |
| Lambda Symmetric | 0.1604 | 0.0621 | 0.0388 | 0.2821 |
| Uncertainty Coefficient C\|R | 0.0515 | 0.0191 | 0.014 | 0.089 |
| Uncertainty Coefficient R\|C | 0.1261 | 0.0467 | 0.0346 | 0.2175 |
| Uncertainty Coefficient Symmetric | 0.0731 | 0.0271 | 0.0199 | 0.1262 |
The SAS output shows the expected increasing trend in the proportion of mass shootings with increasing age (from \(24.53\%\) in the youngest group to \(63.51\%\) in the oldest). The age groups were coded as 1 = “1-19”, 2 = “20-39”, 3 = “40+”.
| Statistic (Z) | 4.3248 |
|---|---|
| One-sided Pr > Z | < 0.0001 |
| Two-sided Pr > \|Z\| | < 0.0001 |
The small right-sided p-value for the Cochran-Armitage test (one-sided Pr > Z) indicates that the probability of the Row 1 level (mass shooting) increases as age increases or, equivalently, that the probability of the Row 2 level (non-mass shooting) decreases as age increases.
The two-sided p-value tests against either an increasing or a decreasing alternative. This is the appropriate hypothesis when we want to determine whether age has a progressive effect on the probability of a mass shooting but the direction is unknown.
So, we can conclude that the Cochran-Armitage test supports the trend hypothesis. Because the asymptotic 95% confidence limits for the measures of association do not contain zero, the positive association is statistically significant. Similarly, the Pearson (\(0.3776\)) and Spearman (\(0.3771\)) correlation coefficients show evidence of a moderate positive association, as hypothesized.
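The trend statistic can be reproduced from the age table with the scores 1, 2, 3. This is a minimal Python sketch of the Cochran-Armitage test, not the SAS code used for the analysis; the function name `cochran_armitage_trend` is hypothetical, and the variance is assumed to use the pooled binomial proportion with divisor \(N\).

```python
import math

def cochran_armitage_trend(events, totals, scores):
    """Cochran-Armitage test for a linear trend in binomial proportions.

    events[i] of totals[i] subjects in group i have the outcome;
    scores[i] is the numeric score assigned to group i.
    Returns (z, one_sided_p, two_sided_p), where the one-sided value is Pr > Z.
    """
    n = sum(totals)
    p_bar = sum(events) / n
    numerator = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    score_sum = sum(s * m for s, m in zip(scores, totals))
    score_sq_sum = sum(s * s * m for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (score_sq_sum - score_sum ** 2 / n)
    z = numerator / math.sqrt(variance)
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, one_sided, two_sided

mass = [13, 54, 47]        # mass shootings by age group 1-19, 20-39, 40+
non_mass = [40, 62, 27]    # non-mass shootings by age group
totals = [m + k for m, k in zip(mass, non_mass)]

proportions = [m / t for m, t in zip(mass, totals)]
z_stat, p_one, p_two = cochran_armitage_trend(mass, totals, scores=[1, 2, 3])

print("proportion of mass shootings:", [f"{p:.2%}" for p in proportions])
print(f"Z = {z_stat:.4f}, one-sided p = {p_one:.2e}, two-sided p = {p_two:.2e}")
```

A positive Z with a small right-sided p-value corresponds to a proportion of mass shootings that rises with the age score.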
Testing a binary response against type of day (weekday/weekend), race (black/white), and mental health status to see whether the response for Mass Shooting is affected.
| Shooting Type | Day Type | Race | Mental Health | Count |
|---|---|---|---|---|
| Mass Shooting | Weekday | White | Yes | 36 |
| Mass Shooting | Weekday | White | No | 14 |
| Mass Shooting | Weekday | Black | Yes | 3 |
| Mass Shooting | Weekday | Black | No | 6 |
| Mass Shooting | Weekend | White | Yes | 3 |
| Mass Shooting | Weekend | White | No | 6 |
| Mass Shooting | Weekend | Black | Yes | 3 |
| Mass Shooting | Weekend | Black | No | 3 |
| Non-Mass Shooting | Weekday | White | Yes | 25 |
| Non-Mass Shooting | Weekday | White | No | 15 |
| Non-Mass Shooting | Weekday | Black | Yes | 6 |
| Non-Mass Shooting | Weekday | Black | No | 9 |
| Non-Mass Shooting | Weekend | White | Yes | 1 |
| Non-Mass Shooting | Weekend | White | No | 2 |
| Non-Mass Shooting | Weekend | Black | Yes | 2 |
| Non-Mass Shooting | Weekend | Black | No | 9 |
Fitting the Model and Hypothesis Testing
The likelihood ratio test, the efficient score test, and the Wald test assess the joint significance of the explanatory variables (Day Type, Race, Mental Health Status) under the full model. The global null hypothesis is \(H_0:\beta=0\), that is, that all slope parameters are zero.
| Test | Chi-Square | Degrees of Freedom | Probability |
|---|---|---|---|
| Likelihood Ratio Test | 9.3397 | 3 | 0.0251 |
| Score Test | 9.1831 | 3 | 0.027 |
| Wald Test | 8.7928 | 3 | 0.0322 |
So, with a small p-value (\(p=0.0270\)) for the Score Test, we reject the null hypothesis \(H_0\) that all slope parameters are equal to zero and conclude that at least one predictor is significant.
Parameter Estimates
The Analysis of Maximum Likelihood Estimates lists the parameter estimates with their standard errors and the Wald test for each parameter.
| Parameter | Level | Degrees of Freedom | Estimate | Standard Error | Wald Chi-Square | Probability |
|---|---|---|---|---|---|---|
| Intercept | | 1 | 0.8523 | 0.4337 | 3.8631 | 0.0494 |
| Day Type | Weekday | 1 | -0.4597 | 0.4042 | 1.2939 | 0.2553 |
| Race | Black | 1 | -0.9114 | 0.362 | 6.3368 | 0.0118 |
| Mental Health | No | 1 | -0.3509 | 0.3126 | 1.2604 | 0.2616 |
The only significant predictor is Race, with \(p=0.0118\). Day Type and Mental Health Status are not significant predictors.
Odds Ratio
| Effect | Comparison | Point Estimate | Lower 95% Wald CI | Upper 95% Wald CI |
|---|---|---|---|---|
| Day Type | Weekday vs Weekend | 0.631 | 0.286 | 1.394 |
| Race | Black vs White | 0.402 | 0.198 | 0.817 |
| Mental Health | No vs Yes | 0.704 | 0.382 | 1.299 |
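From the table, the estimated odds that a shooting is a mass shooting are lower for black shooters than for white shooters (odds ratio \(0.402\)), and this is the only interval that excludes 1. The odds of a mass shooting are also estimated to be lower when no mental health issue is reported (\(0.704\)) and, perhaps surprisingly, lower on weekdays than on weekends (\(0.631\)), but both of those confidence intervals contain 1, so neither effect is statistically significant.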
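Each odds ratio and its Wald limits follow directly from the corresponding parameter estimate \(\hat\beta\) and standard error: the point estimate is \(e^{\hat\beta}\) and the limits are \(e^{\hat\beta \pm 1.96\,SE}\). The Python sketch below starts from the rounded values in the parameter estimates table rather than refitting the model, so small rounding differences are expected; the function name `wald_summary` is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_summary(estimate, std_error, z=Z_95):
    """Odds ratio, Wald 95% limits, Wald chi-square, and p-value for one slope."""
    wald_chi_sq = (estimate / std_error) ** 2
    return {
        "odds_ratio": math.exp(estimate),
        "lower": math.exp(estimate - z * std_error),
        "upper": math.exp(estimate + z * std_error),
        "wald_chi_sq": wald_chi_sq,
        # Chi-square tail probability with 1 degree of freedom.
        "p_value": math.erfc(math.sqrt(wald_chi_sq / 2.0)),
    }

# (estimate, standard error) copied from the parameter estimates table.
slopes = {
    "Day Type: Weekday vs Weekend": (-0.4597, 0.4042),
    "Race: Black vs White": (-0.9114, 0.362),
    "Mental Health: No vs Yes": (-0.3509, 0.3126),
}

results = {name: wald_summary(b, se) for name, (b, se) in slopes.items()}
for name, r in results.items():
    print(f"{name}: OR = {r['odds_ratio']:.3f} "
          f"({r['lower']:.3f}, {r['upper']:.3f}), p = {r['p_value']:.4f}")
```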
Fit of the Model
| Chi-Square | Degrees of Freedom | Probability |
|---|---|---|
| 1.2203 | 6 | 0.9759 |
From the table, the large p-value (\(p=0.9759\)) for the Hosmer and Lemeshow test gives no evidence of lack of fit, so the model is an adequate fit for this analysis.
Through our series of analyses we have shown the following for the four tests:
1) For the relationship between Age and Race, we could not reject the null hypothesis \(H_0:\beta=0\) because of the large p-value, and concluded that there was no statistically significant association. Also, the Pearson \(\chi^2\), likelihood-ratio \(\chi^2\), and Mantel-Haenszel \(\chi^2\) statistics were asymptotically equivalent.
2) In testing the dependence of Shooting Type on Mental Health across Race, we concluded that the association was not significant when controlling for Race, and that the Odds Ratios did not differ significantly between races.
3) In testing for trends across Age Range, we concluded that there was a significant positive linear association, based on the Cochran-Armitage test and the Pearson and Spearman correlation coefficients. The proportion of mass shootings increased from \(24.53\%\) to \(63.51\%\) across the Age Ranges.
4) For the Binary Logistic Regression, we rejected the null hypothesis \(H_0: \beta=0\) in favor of the alternative that at least one predictor was significant. Through parameter estimation we determined that only the predictor for Race was significant. Then, using the Hosmer and Lemeshow Goodness-of-Fit test, we concluded that the model was an adequate fit for the analysis.