Domestic Violence in North Carolina

Author

Nseyo O

OpenAI. (2025, May 14). A digital painting depicting a single red rose, visibly trampled, damaged, and bleeding, lying on a cracked, worn concrete surface. Generated using DALL-E.

Who are the most common victims of domestic violence?

We will be examining this issue using a domestic violence dataset specifically focused on the counties of North Carolina. This analysis aims to uncover patterns and trends related to domestic violence incidents in the region, providing valuable insights for community awareness and interventions.

I discovered this dataset within the links for Statistics and Social Justice projects on our class Google Drive.

This project’s datasets were cleaned and made available by: Uma, RN; Tokuta, Alade; Lowe, Rebecca Zulli; Smith, Adrienne (2021). Domestic Violence NC. figshare. Dataset. https://doi.org/10.6084/m9.figshare.14552145.v3

The National Institutes of Health (NIH) defines domestic abuse as abusive behaviors through which one individual gains power over another individual.

First, we load the libraries we need.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(maps)


Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map

library(viridis)

Loading required package: viridisLite

Attaching package: 'viridis'

The following object is masked from 'package:maps':

    unemp

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(scales)


Attaching package: 'scales'

The following object is masked from 'package:viridis':

    viridis_pal

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

My files were hard to combine by hand, so I read them straight from the source folder and stacked them into one long tibble, ordered by reporting period in ascending order, e.g., 2004-2006 first.

files <- list.files(
  path        = "/Users/oworenibanseyo/Desktop/Data_110_Final_proj",                  
  pattern     = "^cleaned-NC-DV-\\d{4}-\\d{4}\\.csv$",  # Debugged with Chat GPT
  full.names  = TRUE
)


dv_all <- files |>
  set_names() |>                             
  map_dfr(read_csv, .id = "source") |>       
  mutate(
    Period = str_extract(source, "\\d{4}-\\d{4}") # Debugged with Chat GPT
  ) |>
  select(-source)

Rows: 206 Columns: 40
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (38): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 305 Columns: 40
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (38): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 204 Columns: 40
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (38): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 309 Columns: 40
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (38): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 208 Columns: 39
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (37): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 312 Columns: 44
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, Year
dbl (42): ID, Num.Calls, Num.Clients, White, Black, Hispanic, Amer.Indian, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
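
To see what the file pattern above accepts, here is a minimal sketch in base R. The file names are hypothetical, modelled on the naming scheme the pattern implies, and grepl() and regmatches() stand in for the stringr calls used in the pipeline.

```r
# Hypothetical file names; only the first two follow the expected naming scheme.
candidates <- c(
  "cleaned-NC-DV-2004-2006.csv",
  "cleaned-NC-DV-2016-2019.csv",
  "cleaned-NC-DV-2004.csv",      # only one year: rejected
  "raw-NC-DV-2004-2006.csv"      # wrong prefix: rejected
)

# Same pattern as in list.files() above.
pattern <- "^cleaned-NC-DV-\\d{4}-\\d{4}\\.csv$"
keep <- grepl(pattern, candidates)

# Pull the "YYYY-YYYY" period out of each accepted name
# (a base-R analogue of str_extract()).
accepted <- candidates[keep]
periods  <- regmatches(accepted, regexpr("\\d{4}-\\d{4}", accepted))

print(data.frame(file = accepted, Period = periods))
```

Anchoring the pattern with ^ and $ means a stray backup or partially named file in the same folder would not be read in by accident.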

In total, I have gathered 1,544 observations across 55 variables to analyze and explore further.

My first question follows the one posed for the project: across these years, which gender most often reported or disclosed domestic abuse?

We use pivot_longer() to reshape the data, then aggregate the totals by period and gender.

dv_gender <- dv_all |>
  pivot_longer(
    cols      = c(Male, Female),
    names_to  = "Gender",
    values_to = "Count"
  ) |>
  group_by(Period, Gender) |>
  summarize(
    Total = sum(Count, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(Period = factor(Period, levels = unique(Period)))


print(dv_gender)

# A tibble: 12 × 3
   Period    Gender  Total
   <fct>     <chr>   <dbl>
 1 2004-2006 Female  86073
 2 2004-2006 Male    12826
 3 2006-2009 Female 117759
 4 2006-2009 Male    23206
 5 2009-2011 Female 107862
 6 2009-2011 Male    19741
 7 2011-2014 Female 137798
 8 2011-2014 Male    26384
 9 2014-2016 Female  87075
10 2014-2016 Male    18176
11 2016-2019 Female 137292
12 2016-2019 Male    26021

Now we plot a simple time series showing the trend of domestic violence reports or admissions.

ggplot(dv_gender, aes(x = Period, y = Total, color = Gender, group = Gender)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 2) +
  scale_y_continuous(
    labels = label_number(scale = 1/1000, suffix = " K"), # Debugged with Chat GPT
    breaks = pretty_breaks(6)                             # Debugged with Chat GPT
  ) +
  labs(
    title = "Total Domestic-Violence Victims by Gender (2004–2019)",
    x     = "Reporting Period",
    y     = "Number of Victims",
    caption = "Source: https://ncadmin.nc.gov/about-doa/divisions/council-
for-women/women-statistics. ",
    color = "Gender"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

The results are concerning: the totals for women do not decline over the study period, and the raw series swings sharply, dropping in 2014-2016 and rebounding in 2016-2019. It remains unclear whether that drop reflects a genuine decrease in domestic violence cases or a difference in how the data were collected during that period. One clue is that the source files alternate between roughly 205 rows and roughly 310 rows, and the three lowest totals all fall in the shorter files, so part of the pattern may simply reflect how many reporting years each period covers. I am curious about the story behind these swings.
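
One way to check whether period length explains the swings is to divide each total by the number of years the period covers. The sketch below uses the totals printed in the dv_gender output; the period lengths are an assumption (end year minus start year), not something the dataset states, so the per-year figures are only as good as that assumption.

```r
# Female and male totals per period, copied from the dv_gender output above.
totals <- data.frame(
  Period = c("2004-2006", "2006-2009", "2009-2011",
             "2011-2014", "2014-2016", "2016-2019"),
  Female = c(86073, 117759, 107862, 137798, 87075, 137292),
  Male   = c(12826, 23206, 19741, 26384, 18176, 26021)
)

# ASSUMPTION: a period labelled "2004-2006" covers 2006 - 2004 = 2 reporting years.
start <- as.numeric(substr(totals$Period, 1, 4))
end   <- as.numeric(substr(totals$Period, 6, 9))
totals$Years <- end - start

# Average number of victims per reporting year within each period.
totals$FemalePerYear <- totals$Female / totals$Years
totals$MalePerYear   <- totals$Male / totals$Years

print(totals[, c("Period", "Years", "FemalePerYear", "MalePerYear")])
```

If the assumption holds, totals for two-year and three-year periods are not directly comparable, and the per-year columns are the fairer series to plot.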

Next, we reshape the data to get a count for each age group.

dv_age <- dv_all |> 
  pivot_longer(
    cols      = starts_with("Age"),
    names_to  = "AgeGroup",
    values_to = "Count"
  )

We then sum the counts within each age group, ignoring missing values and dropping the grouping afterwards.

dv_age_summary <- dv_age |> 
  group_by(AgeGroup) |>
  summarise(
    TotalVictims = sum(Count, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(TotalVictims)

The flipped coordinate bar graph illustrates the total number of domestic violence victims categorized by age groups across the years.

ggplot(dv_age_summary, aes(x = reorder(AgeGroup, TotalVictims), y = TotalVictims)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Total Domestic-Violence Victims by Age Group (2004–2019)",
    x     = "Age Group",
    y     = "Number of Victims",
    caption = "Source: https://ncadmin.nc.gov/about-doa/divisions/council-
for-women/women-statistics. "
  ) +
  theme_minimal(base_size = 12)

According to the chart, the age group with the most victims is 45-54. This analysis does not use marital status, so what follows is conjecture: this is an age at which many adults have been married for some time, and since the majority of victims are women, I surmise that many victims come from married households, particularly marriages that have lasted long enough for the victim to feel too scared to leave. It is also important to note that elderly individuals can be victims of abuse, potentially by family members or by caretakers in group homes for the elderly.

I will now create a correlation matrix to illustrate relationships with strong coefficients.

num_df <- dv_all |>
  select(
    Num.Calls,
    White, Black, Hispanic, Amer.Indian, Asian, Unknown.Race, Other.Race,
    Female, Male,
    Age.Under.25, Age.25.34, Age.35.44, Age.45.54, Age.55.64, Age.65.plus
  )


corr_mat <- cor(num_df, use = "pairwise.complete.obs")

Then convert it to a tibble for easier plotting and display the matrix on a heatmap.

corr_long <- as.data.frame(corr_mat) |>
  rownames_to_column("Var1") |>
  pivot_longer(-Var1, names_to = "Var2", values_to = "Corr")


ggplot(corr_long, aes(x = Var1, y = Var2, fill = Corr)) +
  geom_tile() +
  scale_fill_gradient2(midpoint = 0, low = "blue", high = "red", mid = "white") +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title   = element_blank()
  ) +
  labs(fill = "Pearson\nr",
       title = "Correlation Matrix of DV Counts, Race, Gender & Age",
       caption = "Source: https://ncadmin.nc.gov/about-doa/divisions/council-
for-women/women-statistics. ")
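
The two steps above (pairwise-complete correlation, then wide-to-long reshaping) can be tried on a tiny table first. This is a base-R sketch on hypothetical counts, with as.table() standing in for the rownames_to_column() and pivot_longer() calls; the column names are made up for illustration.

```r
# Hypothetical toy counts for four rows; NA marks a missing report.
toy <- data.frame(
  Female = c(120, 340, 80, 510),
  Male   = c(20, 55, NA, 90),
  Calls  = c(300, 900, 150, 1400)
)

# Pairwise-complete Pearson correlations: each pair of columns uses
# every row where both values are present.
corr_toy <- cor(toy, use = "pairwise.complete.obs")

# Base-R analogue of the wide-to-long step: one row per (Var1, Var2) pair.
# as.data.frame() on a table names the two index columns Var1 and Var2.
corr_toy_long <- as.data.frame(as.table(corr_toy), responseName = "Corr")

print(corr_toy_long)
```

A 3 x 3 matrix becomes nine rows, one per tile in the heatmap, which is the shape geom_tile() expects.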

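Domestic-violence reporting in North Carolina between 2004 and 2019 is overwhelmingly driven by female victims, with especially high representation among Black women and adults aged 25–44. One caveat: the matrix is built from separate race, gender, and age columns, so it shows which counts rise and fall together across rows; it cannot isolate a combined group such as Black women on its own.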

If domestic violence (DV) reports are increasing, does the number of DV calls show a corresponding trend between 2004 and 2019?

Let’s create a simple linear model to show association.

dv_all2 <- dv_all |>
  mutate(
    start = as.numeric(str_sub(Year, 1, 4)),
    end   = as.numeric(str_sub(Year, 6, 9)),
    YearMid = (start + end) / 2
  ) |>
  drop_na(Num.Calls, YearMid)


mod1 <- lm(Num.Calls ~ YearMid, data = dv_all2)
summary(mod1)


Call:
lm(formula = Num.Calls ~ YearMid, data = dv_all2)

Residuals:
    Min      1Q  Median      3Q     Max 
-1084.7  -853.6  -592.6    88.8 21574.5 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  9705.546  21872.992   0.444    0.657
YearMid        -4.299     10.874  -0.395    0.693

Residual standard error: 1846 on 1540 degrees of freedom
Multiple R-squared:  0.0001015, Adjusted R-squared:  -0.0005478 
F-statistic: 0.1563 on 1 and 1540 DF,  p-value: 0.6927

The fitted model has an estimated slope of -4.3, which means that, on average, there are about four fewer calls per row for each one-year increase in “YearMid”. However, this decline is not statistically significant (p = 0.693, well above our alpha = 0.05 threshold). Additionally, the model’s R² is nearly zero (0.0001), implying that “YearMid” explains virtually none of the variability in call volume.

In summary, there’s no clear evidence of a sustained upward or downward trend in domestic-violence calls during the 2004–2019 period.
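
The test statistics in the summary can be reproduced by hand from the slope and its standard error. The sketch below copies the rounded values from the summary(mod1) output, so its results match the printed ones only up to rounding; the confidence interval is an addition that the original output does not print.

```r
# Values copied from the summary(mod1) output above.
estimate  <- -4.299   # slope for YearMid
std_error <- 10.874   # its standard error
df_resid  <- 1540     # residual degrees of freedom

# t statistic: how many standard errors the slope lies from zero.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the F statistic equals the squared t statistic.
f_value <- t_value^2

# Approximate 95% confidence interval for the slope.
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

print(round(c(t = t_value, p = p_value, F = f_value), 4))
print(round(ci, 2))
```

Because the interval spans zero, the data are compatible with a small decrease, a small increase, or no change at all.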

Now, let’s plot this, as we can still gain some insight from these results.

ggplot(dv_all2, aes(x = YearMid, y = Num.Calls)) +
  geom_jitter(alpha = 0.3) +
  geom_smooth(method = "lm", col = "red", linewidth = 1) +
  theme_minimal(base_size = 12) +
  labs(
    title = "Trend of Total DV Calls Over Time",
    x     = "Mid-Year of Two-Year Window",
    y     = "Number of Calls"
  )

`geom_smooth()` using formula = 'y ~ x'

Each point is one row of the dataset (one county in one reporting year), so the plot shows how call counts vary from row to row rather than a statewide annual total. A few rows sit far above the rest (the largest residual is about 21,600 calls), but the fitted line is nearly flat at roughly 1,000 to 1,100 calls per row throughout 2004–2019, suggesting no significant upward or downward movement in the overall volume of calls.
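
The height of the fitted line can be worked out directly from the coefficients. In this sketch the two Year labels are hypothetical examples of the assumed "YYYY-YYYY" format, and the coefficients are the rounded values from summary(mod1), so the fitted values are approximate.

```r
# ASSUMPTION: Year labels look like "2004-2005"; these two are hypothetical examples.
year_labels <- c("2004-2005", "2018-2019")

# Same midpoint construction as in dv_all2, written with base-R substr().
start    <- as.numeric(substr(year_labels, 1, 4))
end      <- as.numeric(substr(year_labels, 6, 9))
year_mid <- (start + end) / 2

# Coefficients copied (rounded) from the summary(mod1) output above.
intercept <- 9705.546
slope     <- -4.299

# Fitted mean number of calls per row at each midpoint.
fitted_calls <- intercept + slope * year_mid

print(data.frame(Year    = year_labels,
                 YearMid = year_mid,
                 Fitted  = round(fitted_calls)))
```

The intercept of about 9,706 is the extrapolated value at year 0, which is why it looks large; only the fitted values inside the observed range of years are meaningful.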

What do we know now?

We have insight into the gender and age groups with the most prevalent domestic violence cases. Now we can put this on a map of North Carolina to see which counties have the highest reports.

Having run out of time to gather coordinates, I use ggplot2’s map_data() function, which draws on the maps package, to pull NC county boundaries, and then calculate the average number of domestic violence reports for each county.

dv_avg <- dv_all |>

  mutate(
    Total = Male + Female
  ) |>
  group_by(County) |>           
  summarize(
    avg_reports = mean(Total, na.rm = TRUE)
  ) |>
  ungroup() |>
  mutate(
    county_lower = tolower(County)         
  )


nc_map <- map_data("county") |>
  filter(region == "north carolina") |>
  mutate(county_lower = subregion)         


nc_map2 <- left_join(nc_map, dv_avg, by = "county_lower")

Now we plot our map.

ggplot(nc_map2, aes(long, lat, group = group, fill = avg_reports)) +
  geom_polygon(color = "white", linewidth = 0.15) +
  coord_fixed(1.3) +
  scale_fill_viridis(
    option    = "viridis",
    na.value  = "grey",
    name      = "Avg DV\nreports"
  ) +
  labs(
    title    = "Average Domestic Violence Reports by County (2004–2019)",
    subtitle = "North Carolina",
    x        = NULL, y = NULL,
    caption = "Source: https://ncadmin.nc.gov/about-doa/divisions/council-
for-women/women-statistics. "
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text  = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank()
  )

Counties on the map that have no matching row in our data could not be left blank, so they are filled in grey (na.value = "grey").
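
Grey counties can come from genuinely missing data or from names that fail to match in the join. A quick way to tell the two apart is to compare the join keys on each side. This is a base-R sketch with hypothetical county names; the trimws() step is an extra safeguard that is not part of the original join key.

```r
# Hypothetical county names as they might appear in the data and in the map.
data_counties <- c("Wake", "Mecklenburg", "New Hanover", "McDowell ")  # note trailing space
map_counties  <- c("wake", "mecklenburg", "new hanover", "mcdowell", "dare")

# Build the key the same way as county_lower, plus whitespace trimming (an addition).
data_key <- tolower(trimws(data_counties))

# Map counties with no data row: these would be drawn in the na.value colour.
missing_from_data <- setdiff(map_counties, data_key)

# Data counties with no polygon: these would be dropped silently by left_join().
missing_from_map <- setdiff(data_key, map_counties)

print(missing_from_data)
print(missing_from_map)
```

Running the same two setdiff() calls on dv_avg$county_lower and unique(nc_map$county_lower) would list the counties behind any grey areas on the map.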

On this map, we add interactivity to obtain the names of these counties and the average count of domestic violence.

dv_avg <- dv_all |>
  mutate(Total = Male + Female) |>
  group_by(County) |>
  summarize(avg_reports = mean(Total, na.rm = TRUE)) |>
  ungroup() |>
  mutate(county_lower = tolower(County))


nc_map <- map_data("county") |>
  filter(region == "north carolina") |>
  mutate(county_lower = subregion)


nc_map2 <- left_join(nc_map, dv_avg, by = "county_lower")



p <- ggplot(nc_map2, aes(
    x = long, 
    y = lat, 
    group = group,
    fill = avg_reports,
    text = paste0(
      "County: ", County, "\n",
      "Avg DV reports: ", round(avg_reports,1)
    )
  )) +
  geom_polygon(color = "white", linewidth = 0.15) +
  coord_fixed(1.3) +
  scale_fill_viridis(
    option   = "viridis",
    na.value = "grey",
    name     = "Avg DV\nreports"
  ) +
  labs(
    title    = "Average Domestic Violence Reports by County (NC) (2004–2019)",
    subtitle = "North Carolina",
    x        = NULL, 
    y        = NULL,
    caption = "Source: https://ncadmin.nc.gov/about-doa/divisions/council-
for-women/women-statistics. "
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text  = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank()
  )

ggplotly(p, tooltip = "text")

The subtitle and source caption disappeared when I made the map interactive, because ggplotly() did not carry them over. The map still shows the intensity of DV reports by county, and hovering the cursor over a county shows its name and average DV count.

In conclusion, analyzing these datasets gave me insight into domestic violence and the reported numbers across age groups. However, the original dataset does not specify the type of domestic violence, which prevents a clearer understanding of the most prevalent forms of abuse. Based on my findings, my tentative inference is that abuse of different types is likely to occur among married couples, most often directed against the female partner.

For more insight, I consider myself as an example to see why the count for men is so low. From experience, a good number of men will not report domestic abuse until there are serious consequences. I endured physical abuse in a non-married relationship, and the authorities were not contacted until physical property was at risk. Some people do not recognize abuse, while others do not report it because of how they might be perceived. This is also the case for some women. It is also worth noting that those who have endured abuse since childhood are less likely to report it. Unfortunately, looking back at the time series, the numbers for female victims are higher at the end of the period than at the start. I have yet to find a solution.

Sources:

Domestic violence definition: https://www.ncbi.nlm.nih.gov/books/NBK499891/#:~:text=Definitions,former%20or%20current%20intimate%20partners.
https://stackoverflow.com/questions/65000995/how-can-i-read-multiple-csv-files-into-r-at-once-and-know-which-file-the-data-is
https://stackoverflow.com/questions/59275281/filter-a-correlation-matrix-based-on-value-and-occurrence/59375523#59375523
https://rpubs.com/chidungkt/1250143
https://stackoverflow.com/questions/75088240/choropleth-map-and-subregions
Code debugging with ChatGPT/Open AI/Model 4