knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The dataset used in this project contains information on human trafficking arrests recorded across multiple U.S. states and offense categories. It comes from data.gov and can be found here: https://catalog.data.gov/dataset/?tags=human-trafficking. The dataset includes counts of arrests for different offense types, such as commercial sex acts and involuntary servitude, and separates the totals by demographic groups including juvenile male, juvenile female, adult male, and adult female. These variables allow for comparisons between offense types, demographic categories, and geographic locations.

The data was compiled for reporting and monitoring purposes and provides a structured record of arrests related to human trafficking offenses. Because the dataset is organized by state and offense type, it can be used to identify patterns in arrest totals, examine differences across demographic groups, and explore which states report the highest numbers of cases. This type of information can be useful for understanding crime trends, evaluating enforcement activity, and supporting research related to public safety and criminal justice.

For this analysis, the data was cleaned and formatted to ensure that all numeric values could be used in calculations and graphs. Missing state names were filled where necessary, and arrest counts were converted into numeric form so that visualizations could be created. These preparation steps allowed the dataset to be used for graphical analysis, making it possible to explore distributions, compare offense types, and identify states with the highest arrest totals.

The goal of this project is to use graphical visualization to better understand patterns in human trafficking arrests and to determine whether certain offense types, demographic groups, or states show higher reported numbers than others.

library(readxl)
human_trafficking <- read_excel("C:/Users/hls68/OneDrive - Drexel University/human_trafficking.xlsx")
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
# copy dataset
ht <- human_trafficking
# rename columns
colnames(ht) <- c("State", "Offense", "Juvenile_male", "Juvenile_female", "Adult_male", "Adult_female")
# remove title/header rows
ht <- ht[-c(1,2,3,4), ]
# fill missing states and convert to numbers
ht <- ht %>%
  mutate(State = ifelse(State == "NA" | State == "", NA, State)) %>%
  fill(State, .direction = "down") %>%
  mutate(
    Juvenile_male = as.numeric(Juvenile_male),
    Juvenile_female = as.numeric(Juvenile_female),
    Adult_male = as.numeric(Adult_male),
    Adult_female = as.numeric(Adult_female)
  )
head(ht)
## # A tibble: 6 × 6
##   State    Offense         Juvenile_male Juvenile_female Adult_male Adult_female
##   <chr>    <chr>                   <dbl>           <dbl>      <dbl>        <dbl>
## 1 Alabama  Commercial Sex…             0               0         56            1
## 2 Alabama  Involuntary Se…             0               0          5            1
## 3 Arizona  Commercial Sex…             1               0         32            3
## 4 Arizona  Involuntary Se…             0               0          3            1
## 5 Arkansas Commercial Sex…             0               0          4            0
## 6 Arkansas Involuntary Se…             0               0          0            0

This dataset required several cleaning and transformation steps before it could be used for analysis. The original file contained column headers and formatting that needed to be adjusted so the data could be properly read by R. The columns were renamed to clearly identify the variables, including state, offense type, and arrest counts for juvenile male, juvenile female, adult male, and adult female categories. Some rows contained missing or repeated state names, so these values were filled in to ensure each record was correctly associated with its state.

In addition, the arrest count columns were originally read as character values, so they were converted to numeric format in order to perform calculations and create graphs. This step was necessary to allow the data to be summarized, grouped, and visualized correctly. After the cleaning process, the dataset was structured in a way that made it possible to generate histograms, bar charts, and state comparisons. The cleaning steps shown above ensure that the data is consistent and usable for graphical analysis.
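The fill and conversion steps described above can also be written more compactly and then checked. The following is a minimal sketch, not the code used for this report: the `raw` tibble is a hypothetical stand-in for the spreadsheet layout (state name only on the first row of each pair, counts stored as text), and the full offense labels are assumed from the description of the data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical raw rows that mimic the spreadsheet: the state appears only
# once per pair of rows and every count is read in as character.
raw <- tibble(
  State           = c("Alabama", NA, "Arizona", NA),
  Offense         = c("Commercial Sex Acts", "Involuntary Servitude",
                      "Commercial Sex Acts", "Involuntary Servitude"),
  Juvenile_male   = c("0", "0", "1", "0"),
  Juvenile_female = c("0", "0", "0", "0"),
  Adult_male      = c("56", "5", "32", "3"),
  Adult_female    = c("1", "1", "3", "1")
)

count_cols <- c("Juvenile_male", "Juvenile_female", "Adult_male", "Adult_female")

clean <- raw %>%
  fill(State, .direction = "down") %>%                # carry each state name downward
  mutate(across(all_of(count_cols), as.numeric))      # convert all count columns at once

# Number of missing values per count column after conversion;
# a nonzero value would mean some text could not be read as a number.
na_counts <- colSums(is.na(clean[count_cols]))
print(na_counts)
```

Checking `na_counts` after the conversion is a quick way to confirm that `as.numeric()` did not silently turn unexpected text into missing values.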

ggplot(ht, aes(x = Adult_male)) +
  geom_histogram(binwidth = 5, fill = "gray40", color = "black") +
  labs(
    title = "Distribution of Adult Male Human Trafficking Arrests",
    x = "Number of Arrests",
    y = "Frequency"
  ) +
  theme_minimal()

ggplot(ht, aes(x = Offense)) +
  geom_bar(fill = "gray40", color = "black") +
  labs(
    title = "Human Trafficking Arrests by Offense Type",
    x = "Offense Type",
    y = "Number of Records"
  ) +
  theme_minimal()

top_states <- ht %>%
  group_by(State) %>%
  summarise(total_adult_male = sum(Adult_male, na.rm = TRUE)) %>%
  arrange(desc(total_adult_male)) %>%
  slice(1:15)
ggplot(top_states, aes(x = total_adult_male, y = reorder(State, total_adult_male))) +
  geom_point(size = 3) +
  geom_line(aes(group = 1)) +
  labs(
    title = "Top 15 States: Adult Male Human Trafficking Arrests",
    x = "Number of Arrests",
    y = "State"
  ) +
  theme_minimal()


Graph 1 — Distribution of Adult Male Human Trafficking Arrests

This graph shows the distribution of adult male arrests for human trafficking offenses across the dataset. Each observation in the histogram is one state and offense type record, so the bars count records rather than states. Most records have a very low number of arrests, while a small number have much higher values. The histogram is heavily skewed to the right, meaning that the majority of state records report few arrests, but a few report significantly higher totals. This suggests that human trafficking arrests are not evenly distributed and may be concentrated in certain states, possibly due to differences in population size, enforcement activity, or reporting practices.
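A right skew can also be checked numerically, since the mean of a right-skewed variable is pulled above its median. The sketch below uses only the six adult male counts printed by `head(ht)` above, so it illustrates the check rather than describing the full dataset.

```r
# Adult male counts from the six rows shown in the head(ht) output
adult_male <- c(56, 5, 32, 3, 4, 0)

# Compare the mean and the median; a mean well above the median
# is consistent with a right-skewed distribution.
skew_check <- c(mean = mean(adult_male), median = median(adult_male))
print(skew_check)
```

On the full data, `summary(ht$Adult_male)` reports the same two statistics along with the quartiles.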

Graph 2 — Human Trafficking Arrests by Offense Type

This bar chart compares the two offense types in the dataset: commercial sex acts and involuntary servitude. Because geom_bar() counts rows rather than summing arrests, the bar heights show the number of records for each offense type, as the y-axis label "Number of Records" indicates. The bars are similar in height, which indicates that both categories are recorded at comparable rates across states. This shows that law enforcement reports both forms of human trafficking, but it does not show which offense accounts for more arrests; that comparison requires summing the arrest counts within each offense type.
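Because `geom_bar()` counts rows, a comparison of actual arrest totals needs the counts summed within each offense type first. The sketch below shows one way to do that. It uses only the six rows printed by `head(ht)` above, stored in an illustrative object named `ht_sample`, and assumes the truncated offense labels read "Commercial Sex Acts" and "Involuntary Servitude".

```r
library(dplyr)
library(tidyr)
library(tibble)

# The six rows shown in the head(ht) output (offense labels assumed in full)
ht_sample <- tribble(
  ~State,     ~Offense,                ~Juvenile_male, ~Juvenile_female, ~Adult_male, ~Adult_female,
  "Alabama",  "Commercial Sex Acts",   0, 0, 56, 1,
  "Alabama",  "Involuntary Servitude", 0, 0,  5, 1,
  "Arizona",  "Commercial Sex Acts",   1, 0, 32, 3,
  "Arizona",  "Involuntary Servitude", 0, 0,  3, 1,
  "Arkansas", "Commercial Sex Acts",   0, 0,  4, 0,
  "Arkansas", "Involuntary Servitude", 0, 0,  0, 0
)

offense_totals <- ht_sample %>%
  # Stack the four demographic columns into one Arrests column
  pivot_longer(Juvenile_male:Adult_female,
               names_to = "Group", values_to = "Arrests") %>%
  group_by(Offense) %>%
  summarise(total_arrests = sum(Arrests, na.rm = TRUE), .groups = "drop")

print(offense_totals)
```

Replacing `ht_sample` with `ht` and plotting `offense_totals` with `geom_col()` instead of `geom_bar()` would give bars whose heights are arrest totals.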

Graph 3 — Top 15 States: Adult Male Human Trafficking Arrests

This graph shows the top fifteen states with the highest number of adult male human trafficking arrests. The states are ordered from lowest to highest so the differences can be seen clearly. California has the highest number of arrests, followed by Texas and Alabama, with the remaining states showing gradually smaller totals. The graph indicates that arrest counts vary widely by state, which may reflect differences in population size, reporting systems, or enforcement priorities. The pattern suggests that a small number of states account for a large portion of the total arrests, rather than the arrests being evenly spread across the country.
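The statement that a small number of states account for a large portion of the arrests can be quantified with a cumulative share. The sketch below is illustrative only: it uses the six rows printed by `head(ht)` above (stored in an assumed object named `ht_sample`, with the offense labels spelled out), so the shares it prints describe three states, not the full dataset.

```r
library(dplyr)
library(tibble)

# The six rows shown in the head(ht) output (offense labels assumed in full)
ht_sample <- tribble(
  ~State,     ~Offense,                ~Adult_male,
  "Alabama",  "Commercial Sex Acts",   56,
  "Alabama",  "Involuntary Servitude",  5,
  "Arizona",  "Commercial Sex Acts",   32,
  "Arizona",  "Involuntary Servitude",  3,
  "Arkansas", "Commercial Sex Acts",    4,
  "Arkansas", "Involuntary Servitude",  0
)

state_share <- ht_sample %>%
  group_by(State) %>%
  summarise(total_adult_male = sum(Adult_male, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_adult_male)) %>%
  mutate(
    share     = total_adult_male / sum(total_adult_male),  # each state's fraction of the total
    cum_share = cumsum(share)                              # running fraction, largest states first
  )

print(state_share)
```

Run on the full `ht` data, the `cum_share` column shows how many states are needed to reach any given fraction of all adult male arrests.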

Based on this dataset, I came up with these three hypotheses:

Hypothesis 1: Arrest counts will be higher for adult offenders than juvenile offenders. Adult offenders are expected to have higher arrest counts than juvenile offenders based on national crime statistics. Research shows that arrests of individuals under age 18 make up a smaller share of total arrests compared to adults, and youth arrests have declined significantly over time. In recent years, juveniles accounted for only a small percentage of total arrests for serious offenses, meaning that the majority involved adult offenders. Studies also show that arrest rates increase during late adolescence and early adulthood, with individuals in their twenties more likely to be arrested than younger juveniles. These findings support the expectation that adult arrest totals will be higher than juvenile totals in this dataset (Office of Juvenile Justice and Delinquency Prevention, 2022; The Sentencing Project, 2023).
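Hypothesis 1 can be checked directly by summing the adult and juvenile columns. The sketch below is one possible way to do it, using only the six rows printed by `head(ht)` above (stored in an assumed object named `ht_sample`, with the offense labels spelled out); the `Age_group` column is introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# The six rows shown in the head(ht) output (offense labels assumed in full)
ht_sample <- tribble(
  ~State,     ~Offense,                ~Juvenile_male, ~Juvenile_female, ~Adult_male, ~Adult_female,
  "Alabama",  "Commercial Sex Acts",   0, 0, 56, 1,
  "Alabama",  "Involuntary Servitude", 0, 0,  5, 1,
  "Arizona",  "Commercial Sex Acts",   1, 0, 32, 3,
  "Arizona",  "Involuntary Servitude", 0, 0,  3, 1,
  "Arkansas", "Commercial Sex Acts",   0, 0,  4, 0,
  "Arkansas", "Involuntary Servitude", 0, 0,  0, 0
)

age_totals <- ht_sample %>%
  pivot_longer(Juvenile_male:Adult_female,
               names_to = "Group", values_to = "Arrests") %>%
  # Collapse the four demographic groups into Adult vs. Juvenile
  mutate(Age_group = if_else(startsWith(Group, "Adult"), "Adult", "Juvenile")) %>%
  group_by(Age_group) %>%
  summarise(total_arrests = sum(Arrests, na.rm = TRUE), .groups = "drop")

print(age_totals)
```

Substituting the full `ht` data for `ht_sample` gives the totals needed to evaluate the hypothesis across all states.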

Hypothesis 2: Some states will have significantly more arrests than others. Differences in arrest totals between states are expected because crime statistics are influenced by population size, law enforcement activity, and reporting practices. Larger states typically report higher numbers of arrests simply due to having more residents, while smaller states often show lower totals. In addition, crime data can vary depending on how actively offenses are investigated and recorded by local agencies. National crime reports note that arrest counts are not evenly distributed across the United States and tend to be concentrated in states with larger populations and higher levels of law enforcement activity. Because of these factors, it is reasonable to expect that some states will show significantly higher human trafficking arrest totals than others in this dataset (Federal Bureau of Investigation, 2022; Bureau of Justice Statistics, 2021).

Hypothesis 3: Certain offense types will occur more frequently than others. Differences in arrest totals between offense types are expected because certain crimes are reported and prosecuted more often than others. Crime statistics show that the frequency of arrests can vary depending on how offenses are defined, detected, and enforced by law enforcement agencies. Some types of crimes may be easier to identify or may receive greater enforcement attention, leading to higher recorded arrest counts. In addition, national crime reports note that arrest data often reflects reporting practices and legal classifications rather than the exact number of incidents occurring. Because of these factors, it is reasonable to expect that some human trafficking offense categories will appear more frequently than others in this dataset (Federal Bureau of Investigation, 2022; Bureau of Justice Statistics, 2021).

Conclusion

This analysis examined patterns in human trafficking arrests across different states, offense types, and demographic categories. The graphs showed that arrest totals are not evenly distributed, with adult offenders having higher counts than juveniles and certain states reporting much larger numbers of arrests than others. The visualization also showed that offense types occur at similar but not identical frequencies, which suggests that reporting practices and enforcement priorities may influence the totals. Overall, the data demonstrates that arrest counts vary based on demographic group, location, and offense classification, and these differences can be better understood through graphical visualization.

How did the visualization impact the way you understand the data?

The visualization made it easier to understand patterns that were not obvious when looking at the raw numbers. By creating histograms and bar charts, it became clear that most states have relatively low arrest totals while a few states have much higher counts. The graphs also helped show that adult arrest counts are higher than juvenile counts, which supports the hypothesis that adults make up the majority of arrests. In addition, the offense type graph showed that both categories appear frequently, but the totals are not exactly the same. Seeing the data visually made it easier to compare categories and understand how the values are distributed.

Can you make any predictions based on this analysis?

Based on this analysis, it is possible to make limited predictions about future arrest patterns. Because the graphs show that certain states consistently have higher arrest totals, it is reasonable to predict that those states may continue to report higher numbers in future data. It is also likely that adult offenders will continue to make up the majority of arrests, since national crime statistics show that adults are arrested more often than juveniles. However, predictions should be made carefully because arrest data depends on reporting practices, enforcement activity, and changes in laws, which can affect the totals from year to year.

What was easy about R?

One of the easier parts of using R was creating graphs once the data was cleaned. The ggplot function made it possible to quickly generate histograms and bar charts with labels and titles. After learning the basic structure of the commands, it became easier to modify the graphs and change the variables being displayed. Using R also made it simple to group data by state and calculate totals, which helped create the comparisons needed for the project.

What was difficult about learning these fundamentals?

The most difficult part of learning R was understanding how to clean and format the data before creating graphs. Errors often occurred when the columns were not in the correct format or when missing values were present. It also took time to learn the correct syntax for grouping, summarizing, and plotting the data. Another challenge was figuring out how to label graphs correctly and make sure the output matched the required format. Although the process was difficult at first, practicing the commands made it easier to understand how the different functions work together.

Resources

Bureau of Justice Statistics. (2021). Arrest data analysis tool. U.S. Department of Justice. https://bjs.ojp.gov/data-tools

Federal Bureau of Investigation. (2022). Crime in the United States 2022. U.S. Department of Justice. https://ucr.fbi.gov/crime-in-the-u.s

Office of Juvenile Justice and Delinquency Prevention. (2022). Trends in youth arrests. U.S. Department of Justice. https://ojjdp.ojp.gov/publications/trends-in-youth-arrests.pdf

The Sentencing Project. (2023). Youth justice by the numbers. https://www.sentencingproject.org/policy-brief/youth-justice-by-the-numbers/

U.S. Department of Justice. (2023). Human trafficking statistics and research. https://www.justice.gov/humantrafficking

Federal Bureau of Investigation. (2023). Uniform Crime Reporting program. https://www.fbi.gov/services/cjis/ucr