Introduction

I will fit linear regression models to the built-in R dataset USArrests, using both simple and multiple regression.

USArrests consists of 50 observations (one per US state) on 4 variables: Murder, Assault, UrbanPop, and Rape. Murder, Assault, and Rape record arrests per 100,000 residents in each US state in 1973. UrbanPop is the percentage of each state's population living in urban areas.
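
The structure described above can be checked directly in R. This is a minimal sketch using only base R; the variable name `arrests_dim` is my own and is not part of the original analysis.

```r
# USArrests ships with base R (package 'datasets'), so no download is needed
data("USArrests")

# Dimensions: rows are states, columns are variables
arrests_dim <- dim(USArrests)
print(arrests_dim)

# Variable names and storage types
str(USArrests)

# Minimum, quartiles, mean, and maximum of each variable
summary(USArrests)
```

The state names are stored as row names rather than as a column, so they do not count toward the 4 variables.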

My purpose is to explore how the urban population percentage relates to assault arrest rates, how assault, rape, and murder arrest rates relate to one another, and how known assault and rape arrest rates can be used to predict murder arrest rates.

Line Plot: Assault Arrest Rates and Urban Population

Code for the Line Plot

#Load ggplot2 for plotting
library(ggplot2)

#Scatter plot with a smoothed trend line, x-axis = Urban pop and y-axis = Assault rate
ggplot(USArrests, aes(x = UrbanPop, y = Assault)) +

  #Smooth red loess line and no confidence band
  geom_smooth(method = "loess", se = FALSE, color = "red") +

  #Plot points in blue, 0.5 transparency, size 3
  geom_point(color = "blue", alpha = 0.5, size = 3) +

  #Add % symbol to x-axis
  scale_x_continuous(labels = function(x) paste0(x, "%")) +

  #Titles and labels
  labs(title = "Assault Arrest Rates vs Urban Population",
    subtitle = "(Arrests per 100,000 residents in each US state, 1973)",
    x = "Urban population", y = "Assault arrests per 100,000 residents")

Scatterplot with Regression Line: Assault, Rape, and Murder Arrest Rates

Simple Regression Model: Rape Arrest Rate Predicted by Assault Arrest Rate

\[ \text{Rape}_i = \beta_0 + \beta_1 \cdot \text{Assault}_i + \epsilon_i \]

  • \(\text{Rape}_i\) = Rape arrest rate in state \(i\)
  • \(\text{Assault}_i\) = Assault arrest rate in state \(i\)
  • \(\beta_0\) = Intercept
  • \(\beta_1\) = Slope for assault rate
  • \(\epsilon_i\) = Error term
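
The simple model above can be estimated by ordinary least squares with `lm()`. This is a minimal sketch, not necessarily the exact code behind this report; the object names `simple_fit` and `simple_coefs` are my own.

```r
# Fit Rape_i = beta_0 + beta_1 * Assault_i + epsilon_i by ordinary least squares
simple_fit <- lm(Rape ~ Assault, data = USArrests)

# Estimated intercept (beta_0) and slope (beta_1)
simple_coefs <- coef(simple_fit)
print(simple_coefs)

# Standard errors, t-tests, and R-squared for the fit
summary(simple_fit)
```

The coefficient labelled `Assault` is the estimate of \(\beta_1\): the expected change in the rape arrest rate for a one-unit increase in the assault arrest rate.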

Multiple Regression Model: Assault, Rape, and Murder Arrest Rates

Multiple Regression Model: Murder Arrest Rate Predicted by Assault and Rape Arrest Rates

\[ \text{Murder}_i = \beta_0 + \beta_1 \cdot \text{Assault}_i + \beta_2 \cdot \text{Rape}_i + \epsilon_i \]

  • \(\text{Murder}_i\) = Murder arrest rate in state \(i\)
  • \(\text{Assault}_i\) = Assault arrest rate in state \(i\)
  • \(\text{Rape}_i\) = Rape arrest rate in state \(i\)
  • \(\beta_0\) = Intercept
  • \(\beta_1\) = Slope for assault rate
  • \(\beta_2\) = Slope for rape rate
  • \(\epsilon_i\) = Error term
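
The multiple regression model above can be fitted the same way, and the fitted model can then be used for prediction. This is a sketch under stated assumptions: the object names are my own, and the values in `new_state` are made up purely to illustrate `predict()`; they do not describe any real state.

```r
# Fit Murder_i = beta_0 + beta_1 * Assault_i + beta_2 * Rape_i + epsilon_i
multi_fit <- lm(Murder ~ Assault + Rape, data = USArrests)

# Estimated intercept (beta_0) and slopes (beta_1, beta_2)
multi_coefs <- coef(multi_fit)
print(multi_coefs)

# Hypothetical state with known assault and rape arrest rates (illustrative values)
new_state <- data.frame(Assault = 200, Rape = 20)

# Predicted murder arrest rate: beta_0 + beta_1 * 200 + beta_2 * 20
predicted_murder <- predict(multi_fit, newdata = new_state)
print(predicted_murder)
```

Each slope here is interpreted with the other predictor held fixed, so \(\beta_1\) in this model is generally not equal to the slope from a regression of Murder on Assault alone.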