Intro: The goal of this analysis is to explore patterns in violent-crime
arrests across U.S. states, using R's built-in USArrests dataset, and to
identify meaningful relationships between its variables.
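To simplify interpretation, I created a function that groups states by
the sum of their murder, assault, and rape rates: a total above 200 is
"High", a total above 100 is "Medium", and anything else is "Low".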
crime_category <- function(murder, assault, rape) {
  total <- murder + assault + rape
  if (total > 200) {
    "High"
  } else if (total > 100) {
    "Medium"
  } else {
    "Low"
  }
}
USArrests$CrimeLevel <- mapply(crime_category,
                               USArrests$Murder,
                               USArrests$Assault,
                               USArrests$Rape)
table(USArrests$CrimeLevel)
##
##   High    Low Medium
##     22      9     19
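The same grouping can also be written without a row-by-row `mapply()` call. The sketch below is an alternative, not the method used above: it assumes base R's `cut()` with breaks at 100 and 200, and the name `crime_level` is introduced only for illustration. Making the result an ordered factor also lists the groups as Low, Medium, High rather than alphabetically.

```r
# Combined rate per state, the same sum that crime_category() uses
total <- with(USArrests, Murder + Assault + Rape)

# right = TRUE gives the intervals (-Inf, 100], (100, 200], (200, Inf),
# matching "total > 200" and "total > 100" in the if/else version
crime_level <- cut(total,
                   breaks = c(-Inf, 100, 200, Inf),
                   labels = c("Low", "Medium", "High"),
                   right = TRUE,
                   ordered_result = TRUE)

# Counts per group, in Low < Medium < High order
table(crime_level)
```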
Next, I compared average crime values across the groups to
understand how they differ.
library(knitr)  # provides kable()
crime_summary <- aggregate(cbind(Murder, Assault, Rape) ~ CrimeLevel,
                           data = USArrests,
                           FUN = mean)
kable(crime_summary)
| CrimeLevel |    Murder |   Assault |     Rape |
|:-----------|----------:|----------:|---------:|
| High       | 11.727273 | 251.50000 | 28.11818 |
| Low        |  2.855556 |  60.11111 | 11.36667 |
| Medium     |  5.563158 | 129.68421 | 17.93158 |
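The group means suggest why the grouping tracks Assault so closely: in every group the Assault mean is several times larger than Murder and Rape combined, so it carries most of the weight in a plain sum. A minimal sketch to check this per state, where `assault_share` is a name introduced here for illustration:

```r
# Combined rate per state (same sum used for the grouping)
total <- with(USArrests, Murder + Assault + Rape)

# Fraction of each state's combined rate that comes from Assault
assault_share <- USArrests$Assault / total
names(assault_share) <- rownames(USArrests)

# Distribution of the share across the 50 states
summary(assault_share)

# States where Assault contributes the least to the total
head(sort(assault_share), 3)
```

If the three rates should contribute more evenly, one option would be to standardize each column with `scale()` before summing, although that would change the groups reported here.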
I then checked correlations among the first three columns (Murder,
Assault, and UrbanPop) to see how the two crime rates move together and
how each relates to urban population.
cor(USArrests[,1:3])
##              Murder   Assault   UrbanPop
## Murder   1.00000000 0.8018733 0.06957262
## Assault  0.80187331 1.0000000 0.25887170
## UrbanPop 0.06957262 0.2588717 1.00000000
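Because `USArrests[, 1:3]` holds Murder, Assault, and UrbanPop, the matrix above leaves out Rape, which is the fourth column. To compare the three crime variables with each other, the columns can be selected by name instead of by position. A small sketch (the object name `crime_cor` is illustrative):

```r
# Select the crime columns by name so the result does not depend on column order
crime_cor <- cor(USArrests[, c("Murder", "Assault", "Rape")])

# Round for readability
round(crime_cor, 2)
```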
I visualized the relationship between assault and murder to identify
patterns and clusters across states.
library(ggplot2)  # ggplot(), aes(), geom_point()
library(plotly)   # ggplotly()
p <- ggplot(USArrests,
            aes(x = Assault,
                y = Murder,
                color = CrimeLevel,
                text = rownames(USArrests))) +  # state name for the hover tooltip
  geom_point(size = 3, alpha = 0.8) +
  labs(title = "Assault vs Murder by State",
       x = "Assault Rate",
       y = "Murder Rate") +
  theme_minimal()
p
I enhanced this with an interactive Plotly version for better
exploration; hovering over a point shows the state name.
ggplotly(p, tooltip = "text")
I also explored whether urban population has any relationship with
crime levels.
# Create UrbanGroup as factor
USArrests$UrbanGroup <- factor(
  ifelse(USArrests$UrbanPop > 70, "Urban", "Less Urban"),
  levels = c("Less Urban", "Urban")
)
# Contingency table
table(USArrests$UrbanGroup, USArrests$CrimeLevel)
##
##              High Low Medium
##   Less Urban   12   8     11
##   Urban        10   1      8
# Proportions
prop.table(table(USArrests$UrbanGroup, USArrests$CrimeLevel), margin = 1)
##
##                    High        Low     Medium
##   Less Urban 0.38709677 0.25806452 0.35483871
##   Urban      0.52631579 0.05263158 0.42105263
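Whether these proportions differ by more than chance could be checked with a test of independence. The sketch below is an optional addition, not part of the analysis above: it rebuilds the two groupings so that it runs on its own, and it uses Fisher's exact test (`fisher.test()` from base R's stats package) rather than a chi-squared test because some cells are small (only one urban state is in the low group). The names `crime_level`, `urban_group`, `tab`, and `result` are illustrative.

```r
# Rebuild the crime grouping with the same 100 / 200 thresholds
total <- with(USArrests, Murder + Assault + Rape)
crime_level <- cut(total,
                   breaks = c(-Inf, 100, 200, Inf),
                   labels = c("Low", "Medium", "High"))

# Rebuild the urban grouping with the same 70% cutoff
urban_group <- factor(
  ifelse(USArrests$UrbanPop > 70, "Urban", "Less Urban"),
  levels = c("Less Urban", "Urban")
)

# 2 x 3 contingency table of urban group by crime level
tab <- table(urban_group, crime_level)

# Fisher's exact test of independence between the two groupings
result <- fisher.test(tab)
result$p.value
```

A large p-value would be consistent with the reading that urbanization alone does not separate the crime groups; with only 50 states, a small one should still be interpreted cautiously.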
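Final Conclusion:
Crime across U.S. states is unevenly distributed: with the thresholds
used here, the 50 states split into 22 high, 19 medium, and 9 low crime
states.
Assault dominates the combined score because its rates are far larger
than those for murder or rape, and it is strongly correlated with murder
rates (r = 0.80).
Urban population alone is not a reliable indicator of crime levels: its
correlation with murder (0.07) and assault (0.26) is weak, although only
1 of the 19 more urban states falls in the low group. This suggests that
crime is influenced by multiple factors beyond urbanization.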