DS Labs HW

Author

Leah Marshall

# Load required packages
library(dslabs)     # Contains the murders dataset
library(tidyverse)  # For data manipulation and ggplot
library(ggthemes)   # For additional themes
# Load the murders data set
data("murders")
# Preview the data set
head(murders)
       state abb region population total
1    Alabama  AL  South    4779736   135
2     Alaska  AK   West     710231    19
3    Arizona  AZ   West    6392017   232
4   Arkansas  AR  South    2915918    93
5 California  CA   West   37253956  1257
6   Colorado  CO   West    5029196    65
summary(murders)
    state               abb                      region     population      
 Length:51          Length:51          Northeast    : 9   Min.   :  563626  
 Class :character   Class :character   South        :17   1st Qu.: 1696962  
 Mode  :character   Mode  :character   North Central:12   Median : 4339367  
                                       West         :13   Mean   : 6075769  
                                                          3rd Qu.: 6636084  
                                                          Max.   :37253956  
     total       
 Min.   :   2.0  
 1st Qu.:  24.5  
 Median :  97.0  
 Mean   : 184.4  
 3rd Qu.: 268.0  
 Max.   :1257.0  
# Calculate the national gun murder rate per million people (used for the reference line)
r <- murders |>
  summarize(rate = sum(total) / sum(population) * 1e6) |>
  pull(rate)
# Create scatterplot: population vs. total murders, both on log scales
murders |>
  ggplot(aes(x = population / 1e6,
             y = total,
             color = region)) +
  geom_point(size = 3, alpha = 0.7) +
  # Reference line for the national rate: slope defaults to 1, so on
  # log-log axes this is total = r * (population in millions)
  geom_abline(intercept = log10(r), linetype = 2, color = "darkgrey") +
  scale_x_log10("population (millions, log scale)") +
  scale_y_log10("total murders (log scale)") +
  labs(title = "US gun murders in 2010",
       color = "region") +
  scale_color_manual(values = c(
    "Northeast" = "#b2ffff",
    "South" = "#ffff31",
    "North Central" = "#5595d4",
    "West" = "#e0115f"
  )) +
  theme_minimal(base_size = 14, base_family = "serif")

Essay

For this visualization, I used the murders dataset from the dslabs package, which records gun murders in 2010 for 51 entries (the 50 U.S. states plus the District of Columbia). Each row gives the state name, abbreviation, geographic region, population, and total number of gun murders. I created a scatterplot with population (in millions) on the x-axis and total murders on the y-axis. Both axes use a logarithmic scale because the values span several orders of magnitude, and the log scale keeps the small states from being compressed into one corner. I calculated the national murder rate per million people and added it as a dashed reference line with geom_abline(). On log-log axes, a line with slope 1 and intercept log10(r) marks the totals a state would have if it matched the national rate, so points above the line have a higher rate than the country as a whole and points below have a lower one. The points are colored by the four U.S. regions: Northeast, South, North Central, and West. I applied theme_minimal() with a serif font to give the graph a cleaner look.

This graph differs from the example in the notes because it uses custom region colors, larger semi-transparent points, and explicit axis and title labels. The main pattern is that states with larger populations tend to have more total murders. Some smaller states in the South, however, show disproportionately high murder counts relative to their population, which points to regional differences in gun-related violence.
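
The reading of the reference line can be checked numerically. The following is a minimal sketch in base R, assuming only the six rows printed by head(murders) above, so the pooled rate here (given the hypothetical name r_subset) differs from the r computed on all 51 rows. It shows that a point's vertical distance from the line on the log-log plot equals log10 of the ratio between that state's rate and the pooled rate.

```r
# Six rows copied from the head(murders) output; the full data set has 51 rows.
states <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  population = c(4779736, 710231, 6392017, 2915918, 37253956, 5029196),
  total = c(135, 19, 232, 93, 1257, 65)
)

# Pooled rate per million people for this subset (same formula as r in the analysis).
# r_subset is an illustrative name, not part of the original code.
r_subset <- sum(states$total) / sum(states$population) * 1e6

# Per-state rate per million people
states$rate <- states$total / states$population * 1e6

# Vertical gap between each point and the reference line on log-log axes:
# log10(total) - (log10(r) + log10(population / 1e6)) = log10(rate / r)
states$log_gap <- log10(states$total) -
  (log10(r_subset) + log10(states$population / 1e6))

# States sorted from highest to lowest rate; a positive log_gap means above the line
print(states[order(-states$rate), c("state", "rate", "log_gap")])
```

States with a positive log_gap sit above the dashed line. Running the same steps on the full murders data, grouped by region, would be one way to check the observation about smaller Southern states.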