Intro: The goal of this analysis is to explore patterns in crime across U.S. states and identify meaningful relationships between variables.

I start by examining the structure of the dataset before doing any transformations.

data("USArrests")
head(USArrests)
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00

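To simplify interpretation, I created a function that groups states by total crime level, defined as the sum of the murder, assault, and rape arrest rates: High above 200, Medium above 100, and Low otherwise.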

crime_category <- function(murder, assault, rape) {
  total <- murder + assault + rape
  
  if (total > 200) {
    "High"
  } else if (total > 100) {
    "Medium"
  } else {
    "Low"
  }
}

USArrests$CrimeLevel <- mapply(crime_category,
                              USArrests$Murder,
                              USArrests$Assault,
                              USArrests$Rape)

table(USArrests$CrimeLevel)
## 
##   High    Low Medium 
##     22      9     19
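
As a possible alternative to the row-by-row `mapply()` call, the same grouping can be sketched in vectorized form with base R's `cut()`. This is a minimal sketch, not the method used above: the copy named `arrests` and the `-Inf`/`Inf` break points are my own assumptions, chosen to reproduce the function's `> 100` and `> 200` comparisons.

```r
# Work on a copy (hypothetical name) so the original data frame is untouched.
arrests <- datasets::USArrests

# Sum of the three arrest rates, computed for all states at once.
total <- arrests$Murder + arrests$Assault + arrests$Rape

# With right = TRUE, cut() builds the intervals (-Inf, 100], (100, 200], (200, Inf],
# so a total must exceed 100 to be "Medium" and exceed 200 to be "High".
arrests$CrimeLevel <- cut(total,
                          breaks = c(-Inf, 100, 200, Inf),
                          labels = c("Low", "Medium", "High"),
                          right = TRUE)

# The result is a factor whose levels are ordered Low, Medium, High.
table(arrests$CrimeLevel)
```

One side effect of this design is that tables and plot legends list the groups in Low, Medium, High order rather than alphabetically.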

Next, I compared average crime values across the groups to understand how they differ.

crime_summary <- aggregate(cbind(Murder, Assault, Rape) ~ CrimeLevel,
                           data = USArrests,
                           mean)

knitr::kable(crime_summary)
|CrimeLevel |    Murder|   Assault|     Rape|
|:----------|---------:|---------:|--------:|
|High       | 11.727273| 251.50000| 28.11818|
|Low        |  2.855556|  60.11111| 11.36667|
|Medium     |  5.563158| 129.68421| 17.93158|

I then checked correlations among murder, assault, and urban population to see how these variables move together.

cor(USArrests[, 1:3])
##              Murder   Assault   UrbanPop
## Murder   1.00000000 0.8018733 0.06957262
## Assault  0.80187331 1.0000000 0.25887170
## UrbanPop 0.06957262 0.2588717 1.00000000
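
The matrix above covers only the first three columns, so rape is left out. A minimal sketch of how the check could be extended to all four variables, with a significance test on the murder and assault pair, might look like this. The object names `arrests`, `cor_all`, and `murder_assault` are hypothetical, and `cor.test()` here uses its default Pearson method.

```r
# Work on a copy (hypothetical name) of the built-in dataset.
arrests <- datasets::USArrests

# Correlation matrix over all four numeric columns, rounded for readability.
cor_all <- round(cor(arrests[, c("Murder", "Assault", "UrbanPop", "Rape")]), 2)
cor_all

# Pearson correlation test for murder vs assault:
# gives the estimate, a 95% confidence interval, and a p-value.
murder_assault <- cor.test(arrests$Murder, arrests$Assault)
murder_assault$estimate
murder_assault$conf.int
```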

I visualized the relationship between assault and murder to identify patterns and clusters across states.

library(ggplot2)
library(plotly)

p <- ggplot(USArrests,
            aes(x = Assault,
                y = Murder,
                color = CrimeLevel,
                text = rownames(USArrests))) +
  geom_point(size = 3, alpha = 0.8) +
  labs(title = "Assault vs Murder by State",
       x = "Assault Rate",
       y = "Murder Rate") +
  theme_minimal()

p

I then converted this to an interactive Plotly version, with state names in the tooltip, for better exploration.

ggplotly(p, tooltip = "text")

I also explored whether urban population has any relationship with crime levels.

# Create UrbanGroup as factor
USArrests$UrbanGroup <- factor(
  ifelse(USArrests$UrbanPop > 70, "Urban", "Less Urban"),
  levels = c("Less Urban", "Urban")
)

# Contingency table
table(USArrests$UrbanGroup, USArrests$CrimeLevel)
##             
##              High Low Medium
##   Less Urban   12   8     11
##   Urban        10   1      8
# Proportions
prop.table(table(USArrests$UrbanGroup, USArrests$CrimeLevel), margin = 1)
##             
##                    High        Low     Medium
##   Less Urban 0.38709677 0.25806452 0.35483871
##   Urban      0.52631579 0.05263158 0.42105263
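
To judge whether the difference in proportions could be due to chance, one option is Fisher's exact test on the contingency table. This is a sketch under my own assumptions rather than part of the original analysis: I chose Fisher's test over a chi-squared test because one cell (Urban and Low) holds a single state, and the names `arrests`, `urban_by_crime`, and `fisher_result` are hypothetical.

```r
# Rebuild both groupings on a copy so the example is self-contained.
arrests <- datasets::USArrests
total <- arrests$Murder + arrests$Assault + arrests$Rape

# Same thresholds as crime_category(), written in vectorized form.
arrests$CrimeLevel <- ifelse(total > 200, "High",
                             ifelse(total > 100, "Medium", "Low"))
arrests$UrbanGroup <- factor(
  ifelse(arrests$UrbanPop > 70, "Urban", "Less Urban"),
  levels = c("Less Urban", "Urban")
)

# 2 x 3 contingency table of urban group by crime level.
urban_by_crime <- table(arrests$UrbanGroup, arrests$CrimeLevel)

# Fisher's exact test of independence between the two groupings.
fisher_result <- fisher.test(urban_by_crime)
fisher_result$p.value
```

A large p-value would be consistent with urban grouping and crime level being unrelated, although with only 50 states the test has limited power.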

To demonstrate control flow, I used a loop to extract states with high crime levels.

for (i in seq_len(nrow(USArrests))) {
  if (USArrests$CrimeLevel[i] == "High") {
    cat(rownames(USArrests)[i], "\n")
  }
}
## Alabama 
## Alaska 
## Arizona 
## Arkansas 
## California 
## Colorado 
## Delaware 
## Florida 
## Georgia 
## Illinois 
## Louisiana 
## Maryland 
## Michigan 
## Mississippi 
## Missouri 
## Nevada 
## New Mexico 
## New York 
## North Carolina 
## South Carolina 
## Tennessee 
## Texas
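
Outside of a control-flow demonstration, the same list could be obtained without a loop through logical indexing. The sketch below recomputes the grouping so it runs on its own; `arrests`, `crime_level`, and `high_states` are hypothetical names.

```r
# Rebuild the crime grouping on a copy so the example is self-contained.
arrests <- datasets::USArrests
total <- arrests$Murder + arrests$Assault + arrests$Rape
crime_level <- ifelse(total > 200, "High",
                      ifelse(total > 100, "Medium", "Low"))

# Logical indexing keeps the row names where the condition is TRUE.
high_states <- rownames(arrests)[crime_level == "High"]
high_states
```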

Final Conclusion:

Crime across U.S. states is unevenly distributed: with the thresholds used here, 22 states fall into the High group, 19 into Medium, and 9 into Low.

Assault dominates the overall crime total because its rates are far larger than those for murder or rape, and it shows a strong relationship with murder rates (r ≈ 0.80).

Urban population alone is not a reliable indicator of crime levels: it correlates only weakly with murder (r ≈ 0.07) and assault (r ≈ 0.26), which suggests that crime is influenced by multiple complex factors.