2026-03-04

What is the data/What is being studied?

  • The data set is USArrests, a collection of arrest rates (per 100,000 residents) for each of the 50 US states
  • We will use linear regression to examine whether a linear relationship exists between the number of murder arrests and the number of assault arrests
  • If so, one may be used as a predictor of the other

What are the variables?

  • We will use assault arrests as the explanatory variable (X) and test whether it can predict murder arrests, the response variable (Y)
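
Before fitting anything, it can help to look at the two variables directly. The following is a minimal sketch, assuming the built-in USArrests data set that the plotting code in these slides uses; the names `arrests` and `r` are introduced here for illustration only.

```r
# USArrests ships with base R (package 'datasets'), so nothing needs downloading.
data(USArrests)

# Keep only the two variables used in the regression.
arrests <- USArrests[, c("Assault", "Murder")]

str(arrests)      # one row per state, both columns are rates per 100,000
summary(arrests)  # range, quartiles, and mean of each rate

# Pearson correlation as a first check for a linear association.
r <- cor(arrests$Assault, arrests$Murder)
r
```

A correlation close to 1 or -1 would suggest that a straight-line model is worth fitting; it does not by itself show that the slope is statistically significant.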

Plot comparing murder arrests vs. assault arrests

Getting the linear model and equation

\[Y = \beta_0 + \beta_1X+\epsilon\]

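  • Y: Murder arrests per 100,000 residents
  • X: Assault arrests per 100,000 residents
  • \(\beta_0\): Intercept, the expected murder arrest rate when the assault arrest rate is zero
  • \(\beta_1\): Slope, the expected change in the murder arrest rate per additional assault arrest (both per 100,000 residents)
  • \(\epsilon\): Random error term

Fitted coefficients:

## (Intercept)     Assault 
##  0.63168266  0.04190863

\[\hat{Y} = 0.632 + 0.0419X\]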
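
The coefficients above can be reproduced with a short sketch like the one below. It assumes the model was fitted with `lm()` on the built-in USArrests data; the object name `fit` and the value of 200 assault arrests used for the prediction are illustrative choices, not taken from the slides.

```r
# Fit Murder = beta_0 + beta_1 * Assault + error by ordinary least squares.
fit <- lm(Murder ~ Assault, data = USArrests)

# Named vector with the estimated intercept and slope.
coef(fit)

# Example use of the fitted equation: predicted murder arrest rate for a
# hypothetical state with 200 assault arrests per 100,000 residents.
predict(fit, newdata = data.frame(Assault = 200))
```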

Plot of murder vs. assault arrests with the fitted regression line
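
A plot of this kind could be drawn with ggplot2 roughly as follows. This is a sketch, not necessarily the exact code behind the slide: the confidence band (`se = TRUE`) and the axis labels are assumptions. Because no `formula` argument is passed to `geom_smooth()`, ggplot2 prints a message stating which formula it used.

```r
library(ggplot2)

p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point() +                           # one point per state
  geom_smooth(method = "lm", se = TRUE) +  # least-squares line with a confidence band
  labs(x = "Assault arrests per 100,000",
       y = "Murder arrests per 100,000")

p
```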

## `geom_smooth()` using formula = 'y ~ x'

Calculating the F statistic

\[H_0: \beta_1 = 0\] \[H_a: \beta_1 \neq 0\] \[F = \frac{MSR}{MSE}\]

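  • Using this and the values we get from summary(), we obtain an F-statistic of 86.45 on 1 and 48 degrees of freedom. This produces an extremely small p-value, so we reject \(H_0\) and conclude that the linear relationship is statistically significant
  • With a single predictor, the F-statistic equals the square of the t-statistic for \(\beta_1\), so the F test and the t test of \(H_0\) give the same p-value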
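
The F-statistic can be pulled out of the fitted model in more than one way. The sketch below assumes the same `lm()` fit on USArrests; the variable names (`fit`, `s`, `f`, `msr`, `mse`, `t_slope`) are illustrative.

```r
fit <- lm(Murder ~ Assault, data = USArrests)
s <- summary(fit)

# summary() stores the F-statistic as a named vector: value, numdf, dendf.
f <- s$fstatistic
f

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_value

# The same statistic computed as MSR / MSE from the ANOVA table.
a <- anova(fit)
msr <- a["Assault", "Mean Sq"]
mse <- a["Residuals", "Mean Sq"]
msr / mse

# With one predictor, the squared t-statistic of the slope equals F.
t_slope <- coef(s)["Assault", "t value"]
t_slope^2
```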

The code for the final 3D plotly plot

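library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# 3D scatter of assault arrests, urban population, and murder arrests;
# points are colored by the murder arrest rate.
plot_ly(USArrests, x = ~Assault, y = ~UrbanPop, z = ~Murder,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = ~Murder, colorscale = 'Viridis')) %>%
  layout(scene = list(xaxis = list(title = 'Assault'),
                      yaxis = list(title = 'Urban Pop'),
                      zaxis = list(title = 'Murder')))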

The graph of the final 3D plot