Analyzing the relationship between violent crimes

2026-03-04

What is the data/What is being studied?

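The data set is R's built-in USArrests, which records arrests per 100,000 residents for murder, assault, and rape in each of the 50 US states in 1973, along with the percentage of each state's population living in urban areas
We will use linear regression to examine the relationship between murder arrests and assault arrests, and whether a linear relationship exists between the two
If so, one may be useful as a predictor of the other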
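A quick way to inspect the data is sketched below. It assumes the analysis uses the `USArrests` data frame that ships with base R (the same one passed to `plot_ly()` later in the deck). It also computes the sample correlation between the two arrest rates as a first check for a linear relationship.

```r
# USArrests ships with base R (package "datasets"); no install needed
data("USArrests")

# 50 rows (one per state) with columns Murder, Assault, UrbanPop, Rape
str(USArrests)

# Sample correlation between assault and murder arrest rates (both per 100k)
r <- cor(USArrests$Assault, USArrests$Murder)
r
```

A correlation close to 1 suggests a strong positive linear association, which is what motivates fitting a regression line.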

What are the variables?

Assault arrests are the explanatory variable (X) that may be used to predict murder arrests, the response variable (Y)

Plot comparing the arrests for murders vs assault
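A minimal sketch of how such a scatter plot could be drawn, assuming ggplot2 is used; the axis labels and title are illustrative choices, not taken from the original slide:

```r
library(ggplot2)

# One point per state: assault arrest rate on x, murder arrest rate on y
p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point() +
  labs(x = "Assault arrests per 100k", y = "Murder arrests per 100k",
       title = "Murder vs. assault arrests by state")
print(p)
```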

Getting the linear model and equation

\[Y = \beta_0 + \beta_1X+\epsilon\]

Y: Murder arrests/100k residents
X: Assault arrests/100k residents
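\(\beta_0\): Intercept, the expected murder arrest rate when the assault arrest rate is zero
\(\beta_1\): Slope, the expected change in murder arrests per 100k for each additional assault arrest per 100k
\(\epsilon\): Random error term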
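The coefficients printed below match an ordinary least-squares fit of murder on assault. A minimal sketch of that call, assuming base R's `lm()` was used, is:

```r
# Fit Murder = beta0 + beta1 * Assault + error by least squares
fit <- lm(Murder ~ Assault, data = USArrests)

# Named vector: "(Intercept)" is beta0-hat, "Assault" is beta1-hat
coef(fit)
```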

## (Intercept)     Assault 
##  0.63168266  0.04190863
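Plugging these estimates into the model gives the fitted line. As an illustration, a hypothetical state with 200 assault arrests per 100k residents would have a predicted murder arrest rate of about 9.0 per 100k:

\[\hat{Y} = 0.6317 + 0.0419X\]
\[\hat{Y}(200) = 0.63168 + 0.04191 \times 200 \approx 9.01\]

Equivalently, every additional 100 assault arrests per 100k is associated with roughly 4.19 more murder arrests per 100k.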

The new plot of murders vs assault with the fitted regression line
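A minimal sketch of the scatter plot with the least-squares line overlaid, assuming ggplot2's `geom_smooth()` was used; the labels are illustrative:

```r
library(ggplot2)

# Scatter plot plus the least-squares line; method = "lm" fits y ~ x,
# and the default se = TRUE shades a 95% confidence band around the line
p_fit <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Assault arrests per 100k", y = "Murder arrests per 100k")
print(p_fit)
```

Passing `formula = y ~ x` explicitly silences the "using formula" message that ggplot2 otherwise prints when the formula is left at its default.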


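Calculating the test statistic

\[H_0: \beta_1 = 0\] \[H_a: \beta_1 \neq 0\]
\[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \qquad F = \frac{MSR}{MSE} = t^2\]

Using the values from summary(), the F-statistic is 86.45 on 1 and 48 degrees of freedom (equivalently, \(t = \sqrt{86.45} \approx 9.30\), since \(F = t^2\) in simple linear regression). This produces an extremely small p-value, so we reject \(H_0\): the linear relationship between assault and murder arrests is statistically significant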
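A minimal sketch of pulling these quantities out of `summary()`, assuming the same `lm(Murder ~ Assault)` fit; it also checks numerically that \(t^2 = F\) for a single predictor:

```r
fit <- lm(Murder ~ Assault, data = USArrests)
s <- summary(fit)

# t statistic for H0: beta1 = 0, i.e. beta1-hat / SE(beta1-hat)
t_stat <- s$coefficients["Assault", "t value"]

# Overall F statistic; s$fstatistic is c(value, numdf, dendf)
f_stat <- unname(s$fstatistic["value"])
df1 <- unname(s$fstatistic["numdf"])
df2 <- unname(s$fstatistic["dendf"])

# Upper-tail p-value from the F(df1, df2) distribution
p_val <- pf(f_stat, df1 = df1, df2 = df2, lower.tail = FALSE)

# With one predictor, t^2 and F agree
c(t = t_stat, t_squared = t_stat^2, F = f_stat, p = p_val)
```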

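The code for the final 3D plotly plot, which adds urban population as a third variable

library(plotly)  # provides plot_ly(), layout(), and re-exports the %>% pipe

# 3D scatter: assault and urban population on the horizontal axes,
# murder on the vertical axis, with points colored by murder rate
plot_ly(USArrests, x = ~Assault, y = ~UrbanPop, z = ~Murder,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = ~Murder, colorscale = 'Viridis')) %>%
  layout(scene = list(xaxis = list(title = 'Assault'),
                      yaxis = list(title = 'Urban Pop'),
                      zaxis = list(title = 'Murder')))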

The graph of the final 3D plot