This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
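library(readr)    # read_csv()
library(dplyr)    # glimpse(), filter(), mutate(), arrange(), slice_head(), %>%
library(ggplot2)  # ggplot(), geoms, map_data(), coord_map()
library(forcats)  # fct_reorder()
# Local copy of the state crime CSV; adjust the path for your machine
crime_data <- read_csv("C:/Users/kevin/Downloads/state_crime.csv")
# Quick look at the structure
glimpse(crime_data)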
## Rows: 3,115
## Columns: 21
## $ State <chr> "Alabama", "Alabama", "Alabama", "Alabam…
## $ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966…
## $ Data.Population <dbl> 3266740, 3302000, 3358000, 3347000, 3407…
## $ Data.Rates.Property.All <dbl> 1035.4, 985.5, 1067.0, 1150.9, 1358.7, 1…
## $ Data.Rates.Property.Burglary <dbl> 355.9, 339.3, 349.1, 376.9, 466.6, 473.7…
## $ Data.Rates.Property.Larceny <dbl> 592.1, 569.4, 634.5, 683.4, 784.1, 812.1…
## $ Data.Rates.Property.Motor <dbl> 87.3, 76.8, 83.4, 90.6, 108.0, 106.9, 13…
## $ Data.Rates.Violent.All <dbl> 186.6, 168.5, 157.3, 182.7, 213.1, 199.8…
## $ Data.Rates.Violent.Assault <dbl> 138.1, 128.9, 119.0, 142.1, 163.0, 149.1…
## $ Data.Rates.Violent.Murder <dbl> 12.4, 12.9, 9.4, 10.2, 9.3, 11.4, 10.9, …
## $ Data.Rates.Violent.Rape <dbl> 8.6, 7.6, 6.5, 5.7, 11.7, 10.6, 9.7, 10.…
## $ Data.Rates.Violent.Robbery <dbl> 27.5, 19.1, 22.5, 24.7, 29.1, 28.7, 32.0…
## $ Data.Totals.Property.All <dbl> 33823, 32541, 35829, 38521, 46290, 48215…
## $ Data.Totals.Property.Burglary <dbl> 11626, 11205, 11722, 12614, 15898, 16398…
## $ Data.Totals.Property.Larceny <dbl> 19344, 18801, 21306, 22874, 26713, 28115…
## $ Data.Totals.Property.Motor <dbl> 2853, 2535, 2801, 3033, 3679, 3702, 4606…
## $ Data.Totals.Violent.All <dbl> 6097, 5564, 5283, 6115, 7260, 6916, 8098…
## $ Data.Totals.Violent.Assault <dbl> 4512, 4255, 3995, 4755, 5555, 5162, 6249…
## $ Data.Totals.Violent.Murder <dbl> 406, 427, 316, 340, 316, 395, 384, 415, …
## $ Data.Totals.Violent.Rape <dbl> 281, 252, 218, 192, 397, 367, 341, 371, …
## $ Data.Totals.Violent.Robbery <dbl> 898, 630, 754, 828, 992, 992, 1124, 1167…
names(crime_data)
## [1] "State" "Year"
## [3] "Data.Population" "Data.Rates.Property.All"
## [5] "Data.Rates.Property.Burglary" "Data.Rates.Property.Larceny"
## [7] "Data.Rates.Property.Motor" "Data.Rates.Violent.All"
## [9] "Data.Rates.Violent.Assault" "Data.Rates.Violent.Murder"
## [11] "Data.Rates.Violent.Rape" "Data.Rates.Violent.Robbery"
## [13] "Data.Totals.Property.All" "Data.Totals.Property.Burglary"
## [15] "Data.Totals.Property.Larceny" "Data.Totals.Property.Motor"
## [17] "Data.Totals.Violent.All" "Data.Totals.Violent.Assault"
## [19] "Data.Totals.Violent.Murder" "Data.Totals.Violent.Rape"
## [21] "Data.Totals.Violent.Robbery"
head(crime_data)
## # A tibble: 6 × 21
##   State    Year Data.Population Data.Rates.Property.All Data.Rates.Property.Bu…¹
##   <chr>   <dbl>           <dbl>                   <dbl>                    <dbl>
## 1 Alabama  1960         3266740                   1035.                     356.
## 2 Alabama  1961         3302000                    986.                     339.
## 3 Alabama  1962         3358000                   1067                      349.
## 4 Alabama  1963         3347000                   1151.                     377.
## 5 Alabama  1964         3407000                   1359.                     467.
## 6 Alabama  1965         3462000                   1393.                     474.
## # ℹ abbreviated name: ¹Data.Rates.Property.Burglary
## # ℹ 16 more variables: Data.Rates.Property.Larceny <dbl>,
## # Data.Rates.Property.Motor <dbl>, Data.Rates.Violent.All <dbl>,
## # Data.Rates.Violent.Assault <dbl>, Data.Rates.Violent.Murder <dbl>,
## # Data.Rates.Violent.Rape <dbl>, Data.Rates.Violent.Robbery <dbl>,
## # Data.Totals.Property.All <dbl>, Data.Totals.Property.Burglary <dbl>,
## # Data.Totals.Property.Larceny <dbl>, Data.Totals.Property.Motor <dbl>, …
states_of_interest <- c("California", "Texas", "Florida", "New York", "Illinois", "United States")
crime_data %>%
  filter(State %in% states_of_interest) %>%  # State is a column name, so it is not quoted
  ggplot(aes(x = Year, y = `Data.Rates.Violent.All`, color = State)) +
  geom_line(linewidth = 1) +  # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
  labs(title = "Violent Crime Rate Over Time",
       x = "Year", y = "Violent Crime Rate (per 100,000)",
       color = "State") +
  theme_minimal()
# The line plot shows a dramatic rise in violent crime rates from the 1960s
# through the early 1990s, followed by a steady decline in every selected state.
# The national rate (the "United States" series) follows the same pattern.
# California and New York reach particularly high peaks in the early 1990s,
# while Texas and Florida peak slightly lower. This is consistent with the
# factors driving violent crime being national in scope, with their magnitude
# varying by state. Next steps could include investigating socioeconomic
# changes, policing strategies, or drug-market dynamics that might explain these
# differences, and a closer look at the period around 1990 to understand why
# some states' rates spiked higher than others.
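To look more closely at which states peaked highest around 1990, one option is to pull each state's peak year and peak rate. Below is a minimal base-R sketch; the helper `peak_by_state()` is a hypothetical name, and because the CSV lives on a local drive the example runs on the six Alabama rows (1960 to 1965) echoed by `glimpse()` above, so the "peak" here is only the maximum within that excerpt.

```r
# Hypothetical helper: for each state, keep the row with the highest value
# in `rate_col` (ties are resolved by which.max(), i.e. the first occurrence).
peak_by_state <- function(df, rate_col) {
  rows <- lapply(split(df, df$State), function(d) d[which.max(d[[rate_col]]), ])
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Excerpt only: Alabama violent crime rates per 100,000, copied from the
# glimpse() output above (1960 to 1965).
alabama <- data.frame(
  State = "Alabama",
  Year = 1960:1965,
  Data.Rates.Violent.All = c(186.6, 168.5, 157.3, 182.7, 213.1, 199.8)
)

peaks <- peak_by_state(alabama, "Data.Rates.Violent.All")
print(peaks)
```

On the full data, `peak_by_state(crime_data, "Data.Rates.Violent.All")` should return one row per state. A dplyr equivalent would be `crime_data %>% group_by(State) %>% slice_max(Data.Rates.Violent.All, n = 1)`, which by default keeps all tied rows rather than only the first.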
crime_data %>%
  filter(Year == 2019, State != "United States") %>%
  ggplot(aes(x = `Data.Rates.Property.All`, y = `Data.Rates.Violent.All`)) +
  geom_point(aes(color = State), show.legend = FALSE) +
  # An explicit formula silences geom_smooth()'s "using formula" message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkred") +
  labs(title = "Property vs. Violent Crime Rates (2019)",
       x = "Property Crime Rate (per 100,000)",
       y = "Violent Crime Rate (per 100,000)") +
  theme_minimal()
# The scatter plot shows a moderate positive association between property and
# violent crime rates: states with higher property crime rates tend to have
# higher violent crime rates, and the fitted line slopes upward, but there is
# considerable scatter around it. Outliers are visible: for example, the
# District of Columbia has a very high violent crime rate relative to its
# property crime rate, while states such as New Hampshire are low on both.
# This raises the question of what makes some states deviate from the trend.
# Factors such as poverty, income inequality, drug enforcement policies, or gun
# laws could be examined to explain the residuals, and the outliers are
# candidates for case studies.
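One way to pursue "what makes some states deviate from the trend?" is to fit the same straight line that `geom_smooth(method = "lm")` draws and rank states by the size of their residuals, alongside a correlation coefficient that puts a number on "moderate". The 2019 file is not bundled here, so this sketch swaps in R's built-in `USArrests` data (1973 arrest rates per 100,000 by state) and regresses `Murder` on `Assault` purely as a stand-in pair of rates; the helper `rank_residuals()` is hypothetical.

```r
# Hypothetical helper: fit y ~ x by ordinary least squares and return the
# observations sorted by absolute residual (largest deviation first).
rank_residuals <- function(x, y, labels) {
  fit <- lm(y ~ x)
  out <- data.frame(label = labels, x = x, y = y,
                    fitted = fitted(fit), residual = resid(fit))
  out <- out[order(-abs(out$residual)), ]
  rownames(out) <- NULL
  out
}

# Stand-in data: USArrests ships with base R (package 'datasets').
r <- cor(USArrests$Assault, USArrests$Murder)  # Pearson correlation
ranked <- rank_residuals(USArrests$Assault, USArrests$Murder,
                         rownames(USArrests))
cat("Pearson r:", round(r, 2), "\n")
print(head(ranked, 5))
```

Applied to the 2019 subset, `x` would be `Data.Rates.Property.All`, `y` would be `Data.Rates.Violent.All`, and `labels` would be `State`; the states at the top of the ranking are the deviations the commentary above asks about.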
crime_data %>%
  filter(Year == 2019, State != "United States") %>%
  arrange(desc(`Data.Totals.Violent.Murder`)) %>%
  slice_head(n = 10) %>%
  # Order the bars by murder count so the largest appears at the top
  mutate(State = fct_reorder(State, `Data.Totals.Violent.Murder`)) %>%
  ggplot(aes(x = `Data.Totals.Violent.Murder`, y = State)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 States by Total Murders (2019)",
       x = "Number of Murders", y = NULL) +
  theme_minimal()
# California, Texas, and Florida lead in total murders, largely reflecting their
# large populations. Illinois and Pennsylvania also appear in the top 10; any
# state that outranks a more populous state here must have a higher murder rate
# than that state. As a next step, we should rank states by murder rate per
# 100,000 residents (the Data.Rates.Violent.Murder column) to see which states
# have the most severe murder problems relative to their population. We might
# also examine trends in these high-murder states over time to see whether the
# numbers are improving or worsening. The absolute counts still matter, since
# they bear on federal and state funding decisions for crime prevention and law
# enforcement.
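The dataset already carries per-capita columns (`Data.Rates.*`), which appear to be totals divided by population and scaled to 100,000 residents. The sketch below checks that reading against the three Alabama rows (1960 to 1962) shown by `glimpse()` above; the column name `murder_rate_calc` is hypothetical.

```r
# Alabama, 1960 to 1962: values copied from the glimpse() output above.
al <- data.frame(
  Year = 1960:1962,
  Data.Population = c(3266740, 3302000, 3358000),
  Data.Totals.Violent.Murder = c(406, 427, 316),
  Data.Rates.Violent.Murder = c(12.4, 12.9, 9.4)
)

# Rate per 100,000 residents = total / population * 100000
al$murder_rate_calc <- al$Data.Totals.Violent.Murder / al$Data.Population * 1e5

# The reported rates are rounded to one decimal place, so compare with a
# tolerance of half a unit in that place.
matches <- abs(al$murder_rate_calc - al$Data.Rates.Violent.Murder) < 0.05
print(al)
print(matches)
```

With dplyr on the full data this would be `mutate(murder_rate_calc = Data.Totals.Violent.Murder / Data.Population * 1e5)`, although ranking states by the existing `Data.Rates.Violent.Murder` column should give the same ordering up to rounding.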
# One row per state for 2019, keyed by lowercase name to match map_data() regions
map_data_crime <- crime_data %>%
  filter(Year == 2019, State != "United States") %>%
  mutate(region = tolower(State)) %>%
  select(region, violent_rate = `Data.Rates.Violent.All`)
# map_data("state") needs the maps package installed; it covers the 48
# contiguous states plus DC, so Alaska and Hawaii do not appear on this map
us_states <- map_data("state")
map_df <- left_join(us_states, map_data_crime, by = "region")
# coord_map() needs the mapproj package installed
ggplot(map_df, aes(x = long, y = lat, group = group, fill = violent_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = "grey50") +
  labs(title = "Violent Crime Rate by State, 2019",
       fill = "Rate per 100,000") +
  theme_void() +
  theme(legend.position = "bottom")
crime_data %>%
  filter(State != "United States") %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  ggplot(aes(x = factor(Decade), y = `Data.Rates.Violent.All`)) +
  geom_boxplot(fill = "lightblue", outlier.color = "red", outlier.alpha = 0.5) +
  labs(title = "Distribution of Violent Crime Rates Across States by Decade",
       x = "Decade", y = "Violent Crime Rate (per 100,000)") +
  theme_minimal()
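The `floor(Year / 10) * 10` expression bins each year into the first year of its decade, and `factor()` turns those numbers into discrete categories so that `geom_boxplot()` draws one box per decade. A quick check of the binning on a few illustrative years (chosen here, not taken from the data); `decade_of()` is a hypothetical helper that wraps the same expression:

```r
# Hypothetical helper using the same binning rule as the Decade column above
decade_of <- function(year) floor(year / 10) * 10

years <- c(1960, 1964, 1969, 1970, 1999, 2010, 2019)
decades <- decade_of(years)
print(decades)        # 1960 1960 1960 1970 1990 2010 2010

# As a factor, each distinct decade becomes one level (one box on the x-axis);
# levels come from the sorted numeric values, so decades stay in time order
decade_levels <- levels(factor(decades))
print(decade_levels)  # "1960" "1970" "1990" "2010"
```

Because every state contributes one value per year, each box pools up to ten state-years per state, so a single state with persistently high rates can show up as several red outlier points within the same decade.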