Abstract

According to data from the Gun Violence Archive, a total of 316 mass shooting incidents have occurred as of November 14, 2017. The Federal Bureau of Investigation (F.B.I.)1 defines a “mass killing” as the killing of three or more people in a public place, but it also defines a “mass murderer” as someone who has killed four or more people in the same location. The Gun Violence Archive2 describes itself as a not-for-profit organization that documents gun violence and gun crime nationally. In this project, we use shooting data from a dataset hosted on the Kaggle website for Data Science and Machine Learning3. We analyze several variables in the dataset through a series of categorical data analysis techniques, testing for association and independence among them.

Introduction

Through this series of analyses we look at four different testing scenarios. First, we test the relationship between age group (adult or teen) and race (white or black) with the Cochran-Mantel-Haenszel (CMH) test for a \(2\times 2\) contingency table. The second test covers mental health status in relation to shooting type (mass shooting event or not), controlling for race using the CMH test, along with relative risks and odds ratios. The third test is for linear association between mass shooting events and age ranges using the Cochran-Armitage trend test. Finally, we fit a logistic regression of shooting type on day type (weekday or weekend), race, and mental health status. In the logistic regression we fit the model, find the significant parameters, and use the Hosmer and Lemeshow goodness-of-fit test to see whether the model is adequate for the dataset.

Analysis

Relationship between Age and Race

The table is set up with the risk factor “Age Group” as the row variable and the response variable “Race” as the column variable. The age groups are Teens (ages 1-19) and Adults (ages 20 and above).

Association between Race and Age Group in Shooting Incidents
Age Group Black White Total
Teen 12 31 43
Adult 63 104 167
Total 75 135 210

The results include a contingency table analysis using a \(\chi^2\) test, where the null hypothesis \(H_0:\beta=0\) states that the two variables are independent. The summary table from the SAS output follows.

Statistics for Table of Age Group by Race
Statistic Degrees of Freedom Value Probability
Chi-Square 1 1.4355 0.2309
Likelihood Ratio Chi-Square 1 1.4779 0.2241
Continuity Adj. Chi-Square 1 1.0398 0.3079
Mantel-Haenszel Chi-Square 1 1.4287 0.232
Phi Coefficient -0.0827
Contingency Coefficient 0.0824
Cramer’s V -0.0827

Exact Test

Fisher’s Exact Test
Cell (1,1) Frequency (F) 12
Left-sided Pr <= F 0.1538
Right-sided Pr >= F 0.9174
Table Probability (P) 0.0712
Two-sided Pr <= P 0.2853

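Since we observe a large \(p\)-value of \(0.2309\) for the Pearson \(\chi^2\) statistic, we do not reject the null hypothesis \(H_0:\beta=0\), and conclude that there is no significant evidence of an association between race and age group. The phi coefficient of \(-0.0827\) likewise indicates that any association is weak.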
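As a cross-check on the SAS output, the Pearson \(\chi^2\) statistic, its \(p\)-value, and the phi coefficient can be recomputed by hand from the four cell counts. The following is a minimal sketch in Python (not the SAS code used for this report); the variable names are illustrative, and the \(p\)-value relies on the identity that the upper tail of a \(\chi^2\) distribution with 1 degree of freedom equals erfc of \(\sqrt{x/2}\).

```python
import math

# Observed counts from the Age Group by Race table.
# Rows = Teen, Adult; columns = Black, White.
observed = [[12, 31], [63, 104]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Upper-tail probability of a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_sq / 2))

# Phi coefficient for a 2x2 table
(a, b), (c, d) = observed
phi = (a * d - b * c) / math.sqrt(
    row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
)

print(f"chi-square = {chi_sq:.4f}, p-value = {p_value:.4f}, phi = {phi:.4f}")
```

The printed values should agree with the Chi-Square and Phi Coefficient rows of the SAS table up to rounding.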

Odds Ratio and Relative Risks
Statistic Value Lower 95% CI Upper 95% CI
Odds Ratio 0.639 0.3061 1.3342
RR (Column 1) 0.7398 0.4405 1.2423
RR (Column 2) 1.1576 0.9288 1.4429

Relative Risk
I am particularly interested in trends among teens across race lines. Because age group is the row variable, each relative risk compares teen assailants with adult assailants. The probability that a teen assailant is black is \(0.7398\) times the probability that an adult assailant is black, while the probability that a teen assailant is white is \(1.1576\) times the probability that an adult assailant is white.
It is interesting to note that white assailants make up a larger share of teen shooters than of adult shooters, while the black group shows the opposite trend. Both 95% confidence intervals contain 1, however, so neither difference is statistically significant.

Odds Ratio
I also wanted to compare the odds between the groups, so I used the odds ratio. The odds that a teen shooter is black rather than white are \(0.639\) times the corresponding odds for an adult shooter, which means that black assailants are relatively less common among teen shooters than among adult shooters. Taking the reciprocal gives the odds ratio for the white group, \(\Omega=1/0.639\approx 1.57\), so white assailants are relatively more common among teen shooters. The 95% confidence interval for the odds ratio, \((0.3061, 1.3342)\), contains 1, so this difference is not statistically significant.
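The odds ratio, the two relative risks, and their 95% confidence limits can be reproduced from the cell counts with the usual large-sample formulas on the log scale. This is a minimal sketch in Python, assuming the multiplier \(1.96\) for the 95% limits; the helper names `log_scale_ci` and `relative_risk` are illustrative and not part of the SAS analysis.

```python
import math

# Rows = Teen, Adult; columns = Black, White (counts from the table above)
a, b = 12, 31    # Teen: Black, White
c, d = 63, 104   # Adult: Black, White
Z_95 = 1.96      # approximate 97.5th percentile of the standard normal


def log_scale_ci(estimate, se_log):
    """Confidence limits built on the log scale and transformed back."""
    return (estimate * math.exp(-Z_95 * se_log),
            estimate * math.exp(Z_95 * se_log))


def relative_risk(x1, n1, x2, n2):
    """Ratio of two proportions x1/n1 and x2/n2 with its log-scale limits."""
    rr = (x1 / n1) / (x2 / n2)
    se_log_rr = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return rr, log_scale_ci(rr, se_log_rr)


# Odds ratio: (Teen Black * Adult White) / (Teen White * Adult Black)
odds_ratio = (a * d) / (b * c)
or_ci = log_scale_ci(odds_ratio, math.sqrt(1 / a + 1 / b + 1 / c + 1 / d))

# Column 1 (Black) and Column 2 (White) relative risks, Teen versus Adult
rr_black, rr_black_ci = relative_risk(a, a + b, c, c + d)
rr_white, rr_white_ci = relative_risk(b, a + b, d, c + d)

print(f"OR = {odds_ratio:.4f}, CI = ({or_ci[0]:.4f}, {or_ci[1]:.4f})")
print(f"RR (Black) = {rr_black:.4f}, CI = ({rr_black_ci[0]:.4f}, {rr_black_ci[1]:.4f})")
print(f"RR (White) = {rr_white:.4f}, CI = ({rr_white_ci[0]:.4f}, {rr_white_ci[1]:.4f})")
```

All three intervals straddle 1, which is why none of the differences between teens and adults can be called significant.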

The \(\chi^2\) statistic assumes that the expected count in each cell exceeds 5. Here the smallest expected count is about 15.4, so the assumption is met and the conclusions are sound. Otherwise, we would rely on Fisher’s exact test at the end of the output, where the two-sided \(p\)-value is \(0.2853\). In the SAS output, the CHISQ (\(\chi^2\)) option is used to calculate the \(\chi^2\) statistics, which include the Pearson \(\chi^2\), likelihood-ratio \(\chi^2\), and Mantel-Haenszel \(\chi^2\). These statistics are asymptotically equivalent, and here they take similar values.
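The expected-count check and Fisher’s exact test can both be sketched directly from the hypergeometric distribution of cell (1,1) with the margins held fixed. The Python below is an illustrative sketch, not the SAS procedure itself; the two-sided \(p\)-value is assumed to follow the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
import math

# Observed table: rows = Teen, Adult; columns = Black, White
a, b, c, d = 12, 31, 63, 104
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d
n = row1 + row2

# Expected counts under independence; the chi-square approximation
# is usually considered safe when each of these exceeds 5.
expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]


def table_prob(x):
    """Hypergeometric probability that cell (1,1) equals x, margins fixed."""
    return math.comb(col1, x) * math.comb(col2, row1 - x) / math.comb(n, row1)


lowest = max(0, row1 - col2)    # smallest feasible value of cell (1,1)
highest = min(row1, col1)       # largest feasible value of cell (1,1)
p_observed = table_prob(a)

left_sided = sum(table_prob(x) for x in range(lowest, a + 1))
right_sided = sum(table_prob(x) for x in range(a, highest + 1))

# Two-sided: total probability of tables no more likely than the observed one
# (the small factor guards against floating-point ties).
two_sided = sum(
    p for p in (table_prob(x) for x in range(lowest, highest + 1))
    if p <= p_observed * (1 + 1e-9)
)

print(f"smallest expected count = {min(expected):.2f}")
print(f"table probability = {p_observed:.4f}")
print(f"left = {left_sided:.4f}, right = {right_sided:.4f}, two-sided = {two_sided:.4f}")
```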

Confounding and Interaction in Contingency Tables

Testing dependence of Shooting Type on Mental Health Status across Race.
We collected data on mental health related incidents, with shooting type as the response variable, Mass Shootings (MS) versus Non-Mass Shootings (NMS), and race as the control variable.

Association between mental health status and shooting type, by race, in shooting incidents
Race Mental Health Mass Shooting Non-Mass Shooting
White Yes 39 27
White No 20 17
Black Yes 6 8
Black No 9 18

The summary statistics

The summary statistics for Mental Health by Shooting Type controlling for Race
Statistic Alternative Hypothesis Degrees of Freedom Value Probability
1 Nonzero Correlation 1 0.5334 0.46519
2 Row Mean Scores Difference 1 0.5334 0.46519
3 General Association 1 0.5334 0.46519

So, with a \(p\)-value of \(0.4652\), we do not reject the null hypothesis of no association, and conclude that there is no significant association between mental health status and type of shooting after controlling for race.
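For two \(2\times 2\) strata the CMH statistic reduces to a single sum over the race tables: the squared difference between the observed and expected count in cell (1,1), divided by the summed hypergeometric variances. The sketch below recomputes it in Python under that formula (no continuity correction is assumed); the dictionary layout is illustrative.

```python
import math

# One 2x2 table per race stratum.
# Rows = Mental Health (Yes, No); columns = (Mass Shooting, Non-Mass Shooting).
strata = {
    "White": [[39, 27], [20, 17]],
    "Black": [[6, 8], [9, 18]],
}

observed_sum = expected_sum = variance_sum = 0.0
for race, ((a, b), (c, d)) in strata.items():
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed_sum += a
    # Mean and variance of cell (1,1) under the null, margins fixed
    expected_sum += row1 * col1 / n
    variance_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))

cmh = (observed_sum - expected_sum) ** 2 / variance_sum

# Upper tail of a chi-square distribution with 1 degree of freedom
p_value = math.erfc(math.sqrt(cmh / 2))

print(f"CMH = {cmh:.4f}, p-value = {p_value:.4f}")
```

With a binary row and column variable the three CMH alternatives (nonzero correlation, row mean scores, general association) coincide, which is why the SAS table repeats the same value three times.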

Controlling for Race = WHITE

The Odds Ratio and Relative Risk table Controlling for Race = WHITE
Statistic Value Lower 95% CI Upper 95% CI
Odds Ratio 1.2278 0.5453 2.7646
RR (Column 1) 1.0932 0.7638 1.5646
RR (Column 2) 0.8904 0.5654 1.4021

Controlling for Race = BLACK

The Odds Ratio and Relative Risk table Controlling for Race = BLACK
Statistic Value Lower 95% CI Upper 95% CI
Odds Ratio 1.5 0.3979 5.564
RR (Column 1) 1.2857 0.574 2.88
RR (Column 2) 0.8571 0.5064 1.4508

Common Odds Ratio and Relative Risks

The Common Odds Ratio and Relative Risks Table
Statistic Method Value Lower 95% CI Upper 95% CI
Odds Ratio Mantel-Haenszel 1.2961 0.6486 2.59
Logit 1.2966 0.6488 2.5912
RR (Column 1) Mantel-Haenszel 1.1304 0.8131 1.5715
Logit 1.1228 0.8092 1.5581
RR (Column 2) Mantel-Haenszel 0.8784 0.6207 1.2431
Logit 0.876 0.6212 1.2355

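With the CMH option we produced the tables of odds ratios and relative risks (RR). From the tables above, the estimated probability of a mass shooting is higher in mental health related cases than in other cases (common Mantel-Haenszel RR for Column 1 of \(1.1304\)), but every 95% confidence interval contains 1, so the difference is not statistically significant.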
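The stratum-specific and Mantel-Haenszel point estimates in the tables above follow from simple weighted sums over the two race strata. The following Python sketch recomputes the point estimates only; the confidence limits need a separate variance formula that is not reproduced here, and the variable names are illustrative.

```python
# One 2x2 table per race stratum.
# Rows = Mental Health (Yes, No); columns = (Mass Shooting, Non-Mass Shooting).
strata = {
    "White": [[39, 27], [20, 17]],
    "Black": [[6, 8], [9, 18]],
}

stratum_or = {}
stratum_rr = {}
or_num = or_den = rr_num = rr_den = 0.0

for race, ((a, b), (c, d)) in strata.items():
    n = a + b + c + d
    # Within-stratum odds ratio and Column 1 (Mass Shooting) relative risk
    stratum_or[race] = (a * d) / (b * c)
    stratum_rr[race] = (a / (a + b)) / (c / (c + d))
    # Mantel-Haenszel weighted sums
    or_num += a * d / n
    or_den += b * c / n
    rr_num += a * (c + d) / n
    rr_den += c * (a + b) / n

mh_odds_ratio = or_num / or_den
mh_relative_risk = rr_num / rr_den

for race in strata:
    print(f"{race}: OR = {stratum_or[race]:.4f}, RR = {stratum_rr[race]:.4f}")
print(f"Mantel-Haenszel OR = {mh_odds_ratio:.4f}, RR = {mh_relative_risk:.4f}")
```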

Homogeneity of the Odds Ratios

Breslow-Day Test for Homogeneity of the Odds Ratios
Breslow-Day Test
Chi-Square 0.0637
Degrees of Freedom 1
Probability 0.8007

With a large \(p\)-value of \(0.8007\) for the Breslow-Day test, we do not reject the hypothesis of homogeneity, and conclude that the odds ratios do not differ significantly between the two race groups.
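The Breslow-Day statistic compares each stratum’s observed cell (1,1) count with the count expected if the Mantel-Haenszel common odds ratio held in that stratum. The sketch below is one way to compute it in Python, assuming the unadjusted form of the test (without Tarone’s correction); the function name `expected_cell` is illustrative.

```python
import math

# (a, b, c, d) per stratum: a, b = Mental Health Yes (Mass, Non-Mass);
# c, d = Mental Health No (Mass, Non-Mass).
strata = [
    (39, 27, 20, 17),  # White
    (6, 8, 9, 18),     # Black
]

# Mantel-Haenszel estimate of the common odds ratio
psi = sum(a * d / (a + b + c + d) for a, b, c, d in strata) / sum(
    b * c / (a + b + c + d) for a, b, c, d in strata
)


def expected_cell(row1, row2, col1, psi):
    """Cell (1,1) count that gives odds ratio psi with the margins fixed."""
    if abs(psi - 1) < 1e-12:
        return row1 * col1 / (row1 + row2)
    # Solve (1 - psi) A^2 + (row2 - col1 + psi (row1 + col1)) A - psi row1 col1 = 0
    qa = 1 - psi
    qb = row2 - col1 + psi * (row1 + col1)
    qc = -psi * row1 * col1
    root = math.sqrt(qb * qb - 4 * qa * qc)
    low, high = max(0, col1 - row2), min(row1, col1)
    candidates = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    # Exactly one root lies in the feasible range for the cell count
    return next(x for x in candidates if low <= x <= high)


breslow_day = 0.0
for a, b, c, d in strata:
    row1, row2, col1 = a + b, c + d, a + c
    ea = expected_cell(row1, row2, col1, psi)
    eb, ec, ed = row1 - ea, col1 - ea, row2 - col1 + ea
    variance = 1 / (1 / ea + 1 / eb + 1 / ec + 1 / ed)
    breslow_day += (a - ea) ** 2 / variance

# Degrees of freedom = number of strata - 1 = 1, so erfc gives the upper tail
p_value = math.erfc(math.sqrt(breslow_day / 2))

print(f"Breslow-Day = {breslow_day:.4f}, p-value = {p_value:.4f}")
```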

Odds Ratios

[Figure: Odds Ratios]

Relative Risks

[Figure: Relative Risks]
Cochran-Armitage Trend Test

Testing for a trend in the proportion of Mass Shooting incidents across age ranges.

Trend test in Age Range in Shooting Incidents
Shooting Type Age 1-19 Age 20-39 Age 40+
Mass Shooting 13 54 47
Non-Mass Shooting 40 62 27

Statistics for Table of Shooting Type by Age Range
Statistic Value ASE Lower 95% CI Upper 95% CI
Gamma 0.5313 0.0935 0.348 0.7146
Kendall’s Tau-b 0.3373 0.0642 0.2114 0.4631
Stuart’s Tau-c 0.4111 0.0798 0.2547 0.5675
Somers’ D C|R 0.4427 0.0837 0.2786 0.6068
Somers’ D R|C 0.2569 0.0499 0.1592 0.3547
Pearson Correlation 0.3776 0.0714 0.2378 0.5175
Spearman Correlation 0.3771 0.0718 0.2363 0.5178
Lambda Asymmetric C|R 0.125 0.0662 0 0.2547
Lambda Asymmetric R|C 0.2373 0.0837 0.0732 0.4014
Lambda Symmetric 0.1604 0.0621 0.0388 0.2821
Uncertainty Coefficient C|R 0.0515 0.0191 0.014 0.089
Uncertainty Coefficient R|C 0.1261 0.0467 0.0346 0.2175
Uncertainty Coefficient Symmetric 0.0731 0.0271 0.0199 0.1262

The SAS output shows the expected increasing trend in the proportion of mass shootings with increasing age (from \(24.53\%\) in the youngest group to \(63.51\%\) in the oldest). The age ranges were coded as 1=“1-19”, 2=“20-39”, 3=“40+”.

Cochran-Armitage Trend Test
Statistic (Z) 4.3248
One-sided Pr > Z < 0.0001
Two-sided Pr > |Z| < 0.0001

The small one-sided \(p\)-value (Pr > Z, with a positive \(Z\)) for the Cochran-Armitage test indicates that the probability of the Row 1 level (mass shooting) increases as age increases or, equivalently, that the probability of the Row 2 level (non-mass shooting) decreases as age increases.
The two-sided \(p\)-value tests against either an increasing or a decreasing alternative. This is the appropriate hypothesis when you want to determine whether age has a progressive effect on the probability of mass shootings but the direction is unknown.

So, we can conclude that the Cochran-Armitage test supports the trend hypothesis. In addition, the asymptotic 95% confidence limits for the ordinal measures of association (for example, Gamma and Kendall’s Tau-b) do not contain zero, which indicates a positive association. Similarly, the Pearson and Spearman correlation coefficients (both about \(0.38\)) show evidence of a positive association, as hypothesized.
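The Cochran-Armitage statistic can be recomputed from the \(2\times 3\) table using the scores 1, 2, 3 for the age ranges. The Python sketch below assumes the standard form of the test, in which the score-weighted deviation of the mass shooting counts is divided by its standard deviation under the null of a constant proportion; the variable names are illustrative.

```python
import math

scores = [1, 2, 3]          # codes for the age ranges 1-19, 20-39, 40+
mass = [13, 54, 47]         # Row 1: Mass Shooting
non_mass = [40, 62, 27]     # Row 2: Non-Mass Shooting

totals = [m + k for m, k in zip(mass, non_mass)]
n = sum(totals)

# Proportion of mass shootings in each age range and overall
proportions = [m / t for m, t in zip(mass, totals)]
p_bar = sum(mass) / n

# Score mean weighted by the column totals
mean_score = sum(s * t for s, t in zip(scores, totals)) / n

numerator = sum(m * (s - mean_score) for m, s in zip(mass, scores))
variance = p_bar * (1 - p_bar) * sum(
    t * (s - mean_score) ** 2 for t, s in zip(totals, scores)
)
z = numerator / math.sqrt(variance)

# Standard normal tail probabilities
one_sided = 0.5 * math.erfc(z / math.sqrt(2))       # Pr > Z
two_sided = math.erfc(abs(z) / math.sqrt(2))        # Pr > |Z|

print("proportions:", ", ".join(f"{100 * p:.2f}%" for p in proportions))
print(f"Z = {z:.4f}, one-sided p = {one_sided:.2e}, two-sided p = {two_sided:.2e}")
```

A positive \(Z\) here corresponds to the mass shooting proportion rising across the ordered age ranges.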

Logistic Regression

We model a binary response, whether or not a shooting is a Mass Shooting, as a function of day type (weekday/weekend), race (black/white), and mental health status, to see whether these variables affect the response.

Logistic Regression
Shooting Type Day Type Race Mental Health Count
Mass Shooting Weekday White Yes 36
Mass Shooting Weekday White No 14
Mass Shooting Weekday Black Yes 3
Mass Shooting Weekday Black No 6
Mass Shooting Weekend White Yes 3
Mass Shooting Weekend White No 6
Mass Shooting Weekend Black Yes 3
Mass Shooting Weekend Black No 3
Non-Mass Shooting Weekday White Yes 25
Non-Mass Shooting Weekday White No 15
Non-Mass Shooting Weekday Black Yes 6
Non-Mass Shooting Weekday Black No 9
Non-Mass Shooting Weekend White Yes 1
Non-Mass Shooting Weekend White No 2
Non-Mass Shooting Weekend Black Yes 2
Non-Mass Shooting Weekend Black No 9

Fitting the statistic and Hypothesis Testing

The likelihood ratio test and the efficient score test assess the joint significance of the explanatory variables (Day Type, Race, Mental Health Status) by testing the null hypothesis \(H_0:\beta=0\), that all slope parameters in the full model are zero.

Testing Global Null Hypothesis: BETA=0 from SAS
Test Chi-Square Degrees of Freedom Probability
Likelihood Ratio Test 9.3397 3 0.0251
Score Test 9.1831 3 0.027
Wald Test 8.7928 3 0.0322

So, with a small \(p\)-value of \(0.0270\) for the Score Test, we reject the null hypothesis \(H_0\) that all slope parameters are equal to zero, and conclude that at least one predictor is significant.

Parameter Estimates
The Analysis of Maximum Likelihood Estimates table lists the parameter estimates with their standard errors and the Wald test for each parameter.

Analysis of Maximum Likelihood Estimates
Parameter Degrees of Freedom Estimate Standard Error Wald Chi-Square Probability
Intercept 1 0.8523 0.4337 3.8631 0.0494
Day Type Weekday 1 -0.4597 0.4042 1.2939 0.2553
Race Black 1 -0.9114 0.362 6.3368 0.0118
Mental Health No 1 -0.3509 0.3126 1.2604 0.2616

The only significant parameter is Race, with a \(p\)-value of \(0.0118\). Day Type and Mental Health Status are not significant predictors.

Odds Ratio

Wald Confidence Interval for Odds Ratios
Effect Point Estimate Lower 95% Wald CI Upper 95% Wald CI
Day Type Weekday vs Weekend 0.631 0.286 1.394
Race Black vs White 0.402 0.198 0.817
Mental Health No vs Yes 0.704 0.382 1.299

From the table, the odds that a shooting is a mass shooting are lower for black shooters than for white shooters (odds ratio \(0.402\)), and this is the only interval that excludes 1. The estimated odds of a mass shooting are also lower when no mental health issue is reported (odds ratio \(0.704\)) and, surprisingly, lower on weekdays than on weekends (odds ratio \(0.631\)), but both of these intervals contain 1, so neither effect is statistically significant.
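Each row of the odds ratio table is a transformation of the corresponding estimate and standard error: the odds ratio is \(e^{\hat\beta}\), the Wald limits are \(e^{\hat\beta \pm 1.96\,SE}\), and the Wald \(\chi^2\) is \((\hat\beta/SE)^2\). The short Python sketch below applies these formulas to the rounded values printed in the Analysis of Maximum Likelihood Estimates table, so small differences in the last digit from the SAS output are to be expected.

```python
import math

# (estimate, standard error) copied from the maximum likelihood estimates table
estimates = {
    "Day Type Weekday vs Weekend": (-0.4597, 0.4042),
    "Race Black vs White": (-0.9114, 0.3620),
    "Mental Health No vs Yes": (-0.3509, 0.3126),
}
Z_95 = 1.96  # approximate 97.5th percentile of the standard normal

results = {}
for effect, (beta, se) in estimates.items():
    wald = (beta / se) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(wald / 2))
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    results[effect] = (wald, p_value, odds_ratio, lower, upper)
    print(
        f"{effect}: Wald = {wald:.4f}, p = {p_value:.4f}, "
        f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})"
    )
```

An interval that excludes 1 corresponds to a Wald \(p\)-value below \(0.05\), which here happens only for Race.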

Fit of the Model

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square Degrees of Freedom Probability
1.2203 6 0.9759

From the table, the large \(p\)-value of \(0.9759\) gives no evidence of lack of fit, so the model is an adequate fit for this analysis.

Conclusion

Through our series of analyses we have shown the following for the four tests:
1) For the relationship between Age and Race, we could not reject the null hypothesis \(H_0:\beta=0\) due to the large \(p\)-value, and concluded that there was no significant association. Also, the Pearson \(\chi^2\), likelihood-ratio \(\chi^2\), and Mantel-Haenszel \(\chi^2\) statistics were asymptotically equivalent.
2) On testing dependence of Shooting Type on Mental Health across Race, we concluded that the association was not significant when controlling for Race, and that the Odds Ratios did not differ significantly by Race.
3) On testing for trends across Age Range, we concluded that there was a significant positive linear association by the Cochran-Armitage test and the Pearson and Spearman correlation coefficients.
4) For the Binary Logistic Regression, we rejected the null hypothesis \(H_0: \beta=0\) in favor of the alternative that at least one predictor was significant. Through parameter estimation we determined that only the predictor for Race was significant. Then, using the Hosmer and Lemeshow Goodness-of-Fit test, we concluded that the model was an adequate fit for the analysis.

Reference