# Load required packages
library(dslabs) # Contains the murders dataset
Warning: package 'dslabs' was built under R version 4.5.1
library(tidyverse) # For data manipulation and ggplot
Warning: package 'ggplot2' was built under R version 4.5.1
Warning: package 'tibble' was built under R version 4.5.1
Warning: package 'purrr' was built under R version 4.5.1
Warning: package 'stringr' was built under R version 4.5.1
Warning: package 'forcats' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes) # For additional themes
# Load the murders data set
data("murders")
# Preview the data set
head(murders)
       state abb region population total
1    Alabama  AL  South    4779736   135
2     Alaska  AK   West     710231    19
3    Arizona  AZ   West    6392017   232
4   Arkansas  AR  South    2915918    93
5 California  CA   West   37253956  1257
6   Colorado  CO   West    5029196    65
summary(murders)
    state               abb                      region     population
 Length:51          Length:51          Northeast    : 9   Min.   :  563626
 Class :character   Class :character   South        :17   1st Qu.: 1696962
 Mode  :character   Mode  :character   North Central:12   Median : 4339367
                                       West         :13   Mean   : 6075769
                                                          3rd Qu.: 6636084
                                                          Max.   :37253956
     total
 Min.   :   2.0
 1st Qu.:  24.5
 Median :  97.0
 Mean   : 184.4
 3rd Qu.: 268.0
 Max.   :1257.0
# Calculate the average murder rate for the country
r <- murders |>
  summarize(rate = sum(total) / sum(population) * 1e6) |>
  pull(rate)
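The rate above pools all murders and all residents before dividing, which is not the same as averaging the per-state rates. The sketch below illustrates the difference using only the six rows shown in the `head(murders)` preview, so its numbers are not the national figures. The data frame name `preview` and the variables `pooled_rate` and `mean_of_rates` are illustrative, not part of the original analysis.

```r
# Six rows copied from the head(murders) preview; `preview` is an illustrative name
preview <- data.frame(
  state      = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  population = c(4779736, 710231, 6392017, 2915918, 37253956, 5029196),
  total      = c(135, 19, 232, 93, 1257, 65),
  stringsAsFactors = FALSE
)

# Pooled rate: all murders divided by all residents, per million people
pooled_rate <- sum(preview$total) / sum(preview$population) * 1e6

# Unweighted mean of the per-state rates, for comparison
state_rates   <- preview$total / preview$population * 1e6
mean_of_rates <- mean(state_rates)

print(c(pooled = pooled_rate, mean_of_state_rates = mean_of_rates))
```

In this sample the pooled rate is higher than the mean of the state rates because California, the most populous state in the preview, carries the most weight in the pooled figure.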
# Create scatterplot - Population vs Total Murders
murders |>
  ggplot(aes(x = population / 1e6, y = total, color = region)) +
  geom_point(size = 3, alpha = .7) +
  geom_abline(intercept = log10(r), lty = 2, col = "darkgrey") +
  scale_x_log10("population (millions, log scale)") +
  scale_y_log10("total murders (log scale)") +
  labs(title = "US gun murders in 2010", color = "region") +
  scale_color_manual(values = c(
    "Northeast" = "#b2ffff",
    "South" = "#ffff31",
    "North Central" = "#5595d4",
    "West" = "#e0115f"
  )) +
  theme_minimal(base_size = 14, base_family = "serif")
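The call `geom_abline(intercept = log10(r))` relies on a property of log scales: a state with exactly the national rate satisfies `total = r * population_in_millions`, and taking `log10` of both sides gives a straight line with slope 1 and intercept `log10(r)`. The following is a minimal numeric check of that reasoning, not part of the original analysis. The value `r <- 30` is an arbitrary illustrative rate, not the one computed from the data.

```r
# Illustrative rate in murders per million people; NOT the value computed from the data
r <- 30

# Points on the line y = r * x, with x in millions of people
x <- c(0.5, 1, 10, 40)
y <- r * x

# On log10 axes, consecutive points should be joined by segments of slope 1
slopes <- diff(log10(y)) / diff(log10(x))

# The intercept is the log10 height where log10(x) = 0, i.e. at x = 1 million
intercept <- log10(y[x == 1])

print(slopes)
print(c(intercept = intercept, log10_r = log10(r)))
```

Because `geom_abline()` uses a default slope of 1, supplying only the intercept is enough here.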
Essay
For this visualization, I used the murders dataset from the dslabs package, which contains information on gun murders in 2010 for each U.S. state and the District of Columbia (51 rows), including the state name, abbreviation, geographic region, population, and total number of murders. I created a scatterplot with population (in millions) on the x-axis and the total number of murders on the y-axis. Both axes use a logarithmic scale because the values span several orders of magnitude, and the log scale keeps the smaller states from being compressed into one corner of the plot. I calculated the national murder rate (total murders divided by total population, expressed per million people) and added it as a dashed reference line using geom_abline(). On the log-log axes this is a line with slope 1 and intercept log10(r), so points above it have a higher rate than the national figure and points below it have a lower one. The points are colored according to the four U.S. regions: Northeast, South, North Central, and West. I applied theme_minimal() with a serif font to make the graph cleaner and more professional. This graph differs from the example in the notes because it uses custom region colors, larger semi-transparent points, and explicit axis labels and a title. The main pattern is that states with larger populations tend to have more total murders. However, some smaller states in the South sit above the reference line, meaning their murder counts are disproportionately high relative to their population, which highlights regional differences in gun-related violence.
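One way to check an observation like this numerically is to compute each state's rate and sort by it. The sketch below does so for the six states shown in the `head(murders)` preview only, so it illustrates the method rather than confirming the regional pattern in the full data. The names `preview`, `rate`, and `ranked` are illustrative.

```r
# Six rows copied from the head(murders) preview; `preview` is an illustrative name
preview <- data.frame(
  state      = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  region     = c("South", "West", "West", "South", "West", "West"),
  population = c(4779736, 710231, 6392017, 2915918, 37253956, 5029196),
  total      = c(135, 19, 232, 93, 1257, 65),
  stringsAsFactors = FALSE
)

# Murders per million people for each state
preview$rate <- preview$total / preview$population * 1e6

# Sort from highest to lowest rate
ranked <- preview[order(preview$rate, decreasing = TRUE), ]

print(ranked[, c("state", "region", "rate")])
```

Applying the same steps to the full `murders` data frame would show which states lie furthest above or below the reference line.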