Exploration on DataSheet

Including Plots

The code below loads the state crime dataset, inspects its structure, and then explores violent crime, property crime, and murder statistics with a series of plots.

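library(readr)    # read_csv()
library(dplyr)    # glimpse(), filter(), mutate(), arrange(), left_join(), %>%
library(ggplot2)  # ggplot(), map_data(), coord_map()
library(forcats)  # fct_reorder()

crime_data <- read_csv("C:/Users/kevin/Downloads/state_crime.csv")
# Quick look at the structure
glimpse(crime_data)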

## Rows: 3,115
## Columns: 21
## $ State                         <chr> "Alabama", "Alabama", "Alabama", "Alabam…
## $ Year                          <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966…
## $ Data.Population               <dbl> 3266740, 3302000, 3358000, 3347000, 3407…
## $ Data.Rates.Property.All       <dbl> 1035.4, 985.5, 1067.0, 1150.9, 1358.7, 1…
## $ Data.Rates.Property.Burglary  <dbl> 355.9, 339.3, 349.1, 376.9, 466.6, 473.7…
## $ Data.Rates.Property.Larceny   <dbl> 592.1, 569.4, 634.5, 683.4, 784.1, 812.1…
## $ Data.Rates.Property.Motor     <dbl> 87.3, 76.8, 83.4, 90.6, 108.0, 106.9, 13…
## $ Data.Rates.Violent.All        <dbl> 186.6, 168.5, 157.3, 182.7, 213.1, 199.8…
## $ Data.Rates.Violent.Assault    <dbl> 138.1, 128.9, 119.0, 142.1, 163.0, 149.1…
## $ Data.Rates.Violent.Murder     <dbl> 12.4, 12.9, 9.4, 10.2, 9.3, 11.4, 10.9, …
## $ Data.Rates.Violent.Rape       <dbl> 8.6, 7.6, 6.5, 5.7, 11.7, 10.6, 9.7, 10.…
## $ Data.Rates.Violent.Robbery    <dbl> 27.5, 19.1, 22.5, 24.7, 29.1, 28.7, 32.0…
## $ Data.Totals.Property.All      <dbl> 33823, 32541, 35829, 38521, 46290, 48215…
## $ Data.Totals.Property.Burglary <dbl> 11626, 11205, 11722, 12614, 15898, 16398…
## $ Data.Totals.Property.Larceny  <dbl> 19344, 18801, 21306, 22874, 26713, 28115…
## $ Data.Totals.Property.Motor    <dbl> 2853, 2535, 2801, 3033, 3679, 3702, 4606…
## $ Data.Totals.Violent.All       <dbl> 6097, 5564, 5283, 6115, 7260, 6916, 8098…
## $ Data.Totals.Violent.Assault   <dbl> 4512, 4255, 3995, 4755, 5555, 5162, 6249…
## $ Data.Totals.Violent.Murder    <dbl> 406, 427, 316, 340, 316, 395, 384, 415, …
## $ Data.Totals.Violent.Rape      <dbl> 281, 252, 218, 192, 397, 367, 341, 371, …
## $ Data.Totals.Violent.Robbery   <dbl> 898, 630, 754, 828, 992, 992, 1124, 1167…
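glimpse() reports 3,115 rows and 21 columns but not which years and places they cover. A minimal base-R sketch can report the year range, the number of distinct State values, and any columns with missing cells. coverage_summary() is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): summarise the
# coverage of a state-by-year data frame.
coverage_summary <- function(df) {
  na_counts <- colSums(is.na(df))           # missing cells per column
  list(
    years = range(df$Year, na.rm = TRUE),   # first and last year present
    n_states = length(unique(df$State)),    # counts "United States" too, if present
    missing = na_counts[na_counts > 0]      # only columns with at least one NA
  )
}

# Run on the real data only when it has been loaded above
if (exists("crime_data")) {
  coverage_summary(crime_data)
}
```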

names(crime_data)

##  [1] "State"                         "Year"                         
##  [3] "Data.Population"               "Data.Rates.Property.All"      
##  [5] "Data.Rates.Property.Burglary"  "Data.Rates.Property.Larceny"  
##  [7] "Data.Rates.Property.Motor"     "Data.Rates.Violent.All"       
##  [9] "Data.Rates.Violent.Assault"    "Data.Rates.Violent.Murder"    
## [11] "Data.Rates.Violent.Rape"       "Data.Rates.Violent.Robbery"   
## [13] "Data.Totals.Property.All"      "Data.Totals.Property.Burglary"
## [15] "Data.Totals.Property.Larceny"  "Data.Totals.Property.Motor"   
## [17] "Data.Totals.Violent.All"       "Data.Totals.Violent.Assault"  
## [19] "Data.Totals.Violent.Murder"    "Data.Totals.Violent.Rape"     
## [21] "Data.Totals.Violent.Robbery"

head(crime_data)

## # A tibble: 6 × 21
##   State    Year Data.Population Data.Rates.Property.All Data.Rates.Property.Bu…¹
##   <chr>   <dbl>           <dbl>                   <dbl>                    <dbl>
## 1 Alabama  1960         3266740                   1035.                     356.
## 2 Alabama  1961         3302000                    986.                     339.
## 3 Alabama  1962         3358000                   1067                      349.
## 4 Alabama  1963         3347000                   1151.                     377.
## 5 Alabama  1964         3407000                   1359.                     467.
## 6 Alabama  1965         3462000                   1393.                     474.
## # ℹ abbreviated name: ¹Data.Rates.Property.Burglary
## # ℹ 16 more variables: Data.Rates.Property.Larceny <dbl>,
## #   Data.Rates.Property.Motor <dbl>, Data.Rates.Violent.All <dbl>,
## #   Data.Rates.Violent.Assault <dbl>, Data.Rates.Violent.Murder <dbl>,
## #   Data.Rates.Violent.Rape <dbl>, Data.Rates.Violent.Robbery <dbl>,
## #   Data.Totals.Property.All <dbl>, Data.Totals.Property.Burglary <dbl>,
## #   Data.Totals.Property.Larceny <dbl>, Data.Totals.Property.Motor <dbl>, …
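The plots below label rates as "per 100,000". That matches the first row shown above (6,097 violent crimes / 3,266,740 residents × 100,000 ≈ 186.6), and a small sketch can check it across all rows. check_rate() is a hypothetical helper, and the tolerance of 0.1 is an assumption meant to absorb rounding of the published rates to one decimal place.

```r
# Hypothetical helper (not from the original analysis): share of rows where
# Total / Population * 100,000 matches the published rate within `tol`.
check_rate <- function(df, total_col, rate_col, tol = 0.1) {
  recomputed <- df[[total_col]] / df[["Data.Population"]] * 1e5
  # Published rates appear to be rounded to one decimal, so allow some slack
  ok <- abs(recomputed - df[[rate_col]]) <= tol
  mean(ok, na.rm = TRUE)
}

if (exists("crime_data")) {
  check_rate(crime_data, "Data.Totals.Violent.All", "Data.Rates.Violent.All")
}
```

A value near 1 would support the "per 100,000" labels; a much lower value would suggest the rates use a different denominator.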

states_of_interest <- c("California", "Texas", "Florida", "New York", "Illinois", "United States")

crime_data %>%
  filter(State %in% states_of_interest) %>%          # selected states plus the national row
  ggplot(aes(x = Year, y = `Data.Rates.Violent.All`, color = State)) +
  geom_line(linewidth = 1) +                         # `size` for lines is deprecated since ggplot2 3.4.0
  labs(title = "Violent Crime Rate Over Time",
       x = "Year", y = "Violent Crime Rate (per 100,000)",
       color = "State") +
  theme_minimal()

# The line plot shows a dramatic rise in violent crime rates from the 1960s
# through the early 1990s, followed by a steady decline in all selected states.
# The national rate ("United States") follows the same pattern. California and
# New York reached particularly high peaks in the early 1990s, while Texas and
# Florida show slightly lower peaks. This suggests that the factors driving
# violent crime may have been national in scope, while their magnitude varied
# by state. Next steps could include investigating socioeconomic changes,
# policing strategies, or drug-market dynamics that might explain these
# differences. It also motivates a closer look at the years around 1990 to
# understand why some states' rates spiked higher than others.
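Printing the year and value of each selected state's highest violent crime rate makes the peak comparisons in the note above checkable. This is a minimal base-R sketch; peak_by_state() is a hypothetical helper, and it assumes one row per state and year, as the head() output suggests.

```r
# Hypothetical helper (not from the original analysis): year and value of
# each state's highest rate, sorted from highest peak to lowest.
peak_by_state <- function(df, states, rate_col = "Data.Rates.Violent.All") {
  sub <- as.data.frame(df[df$State %in% states, ])
  sub <- sub[!is.na(sub[[rate_col]]), ]
  # One data frame per state; keep the row where the rate is largest
  peaks <- lapply(split(sub, sub$State), function(g) {
    g[which.max(g[[rate_col]]), c("State", "Year", rate_col)]
  })
  out <- do.call(rbind, peaks)
  out <- out[order(-out[[rate_col]]), ]   # highest peak first
  rownames(out) <- NULL
  out
}

if (exists("crime_data") && exists("states_of_interest")) {
  peak_by_state(crime_data, states_of_interest)
}
```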

crime_data %>%
  filter(Year == 2019, State != "United States") %>%
  ggplot(aes(x = `Data.Rates.Property.All`, y = `Data.Rates.Violent.All`)) +
  geom_point(aes(color = State), show.legend = FALSE) +
  geom_smooth(method = "lm", se = FALSE, color = "darkred") +
  labs(title = "Property vs. Violent Crime Rates (2019)",
       x = "Property Crime Rate (per 100,000)",
       y = "Violent Crime Rate (per 100,000)") +
  theme_minimal()

# The scatter plot suggests a moderate positive relationship between property
# and violent crime rates: states with higher property crime rates tend to have
# higher violent crime rates, but the relationship is not tight. The positive
# slope of the linear trend line means that, on average, a higher property
# crime rate goes with a higher violent crime rate, yet many states sit well
# above or below the line. Outliers are visible: for example, the District of
# Columbia has a very high violent crime rate relative to its property crime
# rate, while some states, such as New Hampshire, have low rates of both. This
# plot invites a deeper question: what makes some states deviate from the
# trend? Factors such as poverty, income inequality, drug enforcement policies,
# or gun laws could be examined to explain the residuals, and the outliers
# themselves are candidates for case studies.
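"Moderate" is easier to judge with a number, and "deviating from the trend" can be made concrete as a residual. A minimal sketch, using the hypothetical helper trend_outliers(), computes the Pearson correlation for one year, fits the same ordinary least-squares line that geom_smooth(method = "lm") draws, and lists the states with the largest absolute residuals.

```r
# Hypothetical helper (not from the original analysis): Pearson correlation,
# fitted slope, and the states with the largest absolute residuals for one year.
trend_outliers <- function(df, year = 2019, n = 5) {
  d <- as.data.frame(df[df$Year == year & df$State != "United States", ])
  d <- d[complete.cases(d[, c("Data.Rates.Property.All", "Data.Rates.Violent.All")]), ]
  r <- cor(d$Data.Rates.Property.All, d$Data.Rates.Violent.All)
  fit <- lm(Data.Rates.Violent.All ~ Data.Rates.Property.All, data = d)
  d$residual <- unname(resid(fit))   # observed minus fitted violent rate
  top <- d[order(-abs(d$residual)), c("State", "residual")]
  rownames(top) <- NULL
  list(correlation = r,
       slope = unname(coef(fit)[2]),  # change in violent rate per unit of property rate
       outliers = head(top, n))
}

if (exists("crime_data")) {
  trend_outliers(crime_data, year = 2019, n = 5)
}
```

Positive residuals mark states whose violent crime rate is higher than their property crime rate alone would predict; negative residuals mark the reverse.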

crime_data %>%
  filter(Year == 2019, State != "United States") %>%
  arrange(desc(`Data.Totals.Violent.Murder`)) %>%
  slice_head(n = 10) %>%
  mutate(State = fct_reorder(State, `Data.Totals.Violent.Murder`)) %>%
  ggplot(aes(x = `Data.Totals.Violent.Murder`, y = State)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 States by Total Murders (2019)",
       x = "Number of Murders", y = "") +
  theme_minimal()

# California, Texas, and Florida lead in total murders, largely reflecting their
# large populations. However, Illinois and Pennsylvania also appear in the top
# 10 despite having smaller populations than some other states, which may point
# to higher murder rates. Raw counts cannot settle this, so a natural next step
# is to rank states by murder rate per 100,000 residents (the dataset already
# provides this as Data.Rates.Violent.Murder) to see which states have the most
# severe murder problems relative to their population. We might also examine
# trends in these high-murder states over time to see whether the numbers are
# improving or worsening. The absolute counts still matter, for example for
# federal and state funding decisions on crime prevention and law enforcement.
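The per-capita ranking suggested above does not require recomputing anything, because the dataset already has a murder rate column. A minimal sketch, with top_by_rate() as a hypothetical helper, ranks states by that rate for a given year while excluding the national row, mirroring the filter used for the count plot.

```r
# Hypothetical helper (not from the original analysis): top n states by a
# rate column for one year, excluding the national "United States" row.
top_by_rate <- function(df, year = 2019, n = 10,
                        rate_col = "Data.Rates.Violent.Murder") {
  d <- as.data.frame(df[df$Year == year & df$State != "United States", ])
  d <- d[!is.na(d[[rate_col]]), ]
  d <- d[order(-d[[rate_col]]), c("State", rate_col)]  # highest rate first
  rownames(d) <- NULL
  head(d, n)
}

if (exists("crime_data")) {
  top_by_rate(crime_data, year = 2019, n = 10)
}
```

Comparing this list with the count-based top 10 shows which states rank high mainly because of population size.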

map_data_crime <- crime_data %>%
  filter(`Year` == 2019, State != "United States") %>%
  mutate(region = tolower(State)) %>%
  select(region, violent_rate = `Data.Rates.Violent.All`)

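# map_data("state") needs the maps package installed
us_states <- map_data("state")
map_df <- left_join(us_states, map_data_crime, by = "region")

ggplot(map_df, aes(x = long, y = lat, group = group, fill = violent_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +             # `size` is deprecated here since ggplot2 3.4.0
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +     # coord_map() needs the mapproj package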
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = "grey50") +
  labs(title = "Violent Crime Rate by State, 2019",
       fill = "Rate per 100,000") +
  theme_void() +
  theme(legend.position = "bottom")
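Because left_join() keeps every map polygon, a state with polygons but no crime data is drawn grey (na.value), while a state in the crime data with no polygon silently disappears. A small sketch lists the latter; unmatched_regions() is a hypothetical helper. Since the maps "state" database covers only the contiguous states plus the District of Columbia, Alaska and Hawaii would be expected to show up in the result.

```r
# Hypothetical helper (not from the original analysis): regions present in the
# crime data but absent from the map polygons, so they never appear on the map.
unmatched_regions <- function(crime_regions, map_regions) {
  sort(setdiff(unique(crime_regions), unique(map_regions)))
}

if (exists("map_data_crime") && exists("us_states")) {
  unmatched_regions(map_data_crime$region, us_states$region)
}
```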

crime_data %>%
  filter(State != "United States") %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  ggplot(aes(x = factor(Decade), y = `Data.Rates.Violent.All`)) +
  geom_boxplot(fill = "lightblue", outlier.color = "red", outlier.alpha = 0.5) +
  labs(title = "Distribution of Violent Crime Rates Across States by Decade",
       x = "Decade", y = "Violent Crime Rate (per 100,000)") +
  theme_minimal()
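The boxplot has no accompanying interpretation, and a numeric companion makes it easier to describe. A minimal sketch, with decade_summary() as a hypothetical helper, applies the same floor(Year / 10) * 10 binning and reports each decade's quartiles and the number of state-year observations. Assuming geom_boxplot() uses R's default quantile definition, these should line up with the box edges and medians.

```r
# Hypothetical helper (not from the original analysis): per-decade quartiles
# of state rates, excluding the national "United States" row.
decade_summary <- function(df, rate_col = "Data.Rates.Violent.All") {
  d <- as.data.frame(df[df$State != "United States", ])
  d <- d[!is.na(d[[rate_col]]), ]
  # Same decade binning as the boxplot code
  groups <- split(d[[rate_col]], floor(d$Year / 10) * 10)
  # quantile() defaults to type 7; one row per decade after t()
  q <- t(sapply(groups, quantile, probs = c(0.25, 0.5, 0.75), names = FALSE))
  data.frame(Decade = as.numeric(names(groups)),
             Q1 = q[, 1], Median = q[, 2], Q3 = q[, 3],
             n = lengths(groups), row.names = NULL)
}

if (exists("crime_data")) {
  decade_summary(crime_data)
}
```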

2026-02-22