Population Change and Crime in Victorian LGAs (2023

---
title: "Population Change and Crime in Victorian LGAs (2023–2025)"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    theme: cosmo
    orientation: rows
    social: menu
    source_code: embed
---

```{r, include=FALSE}


# Load required packages
library(tidyverse)
library(readxl)
library(janitor)
library(stringr)
library(lubridate)
library(sf)
library(scales)
library(glue)
library(forcats)
library(plotly)
library(leaflet)
library(flexdashboard)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Folder with your data
DATA_DIR  <- "data"
ABS_XLSX  <- file.path(DATA_DIR, "32180DS0002_2023-24.xlsx")
CSA_XLSX  <- file.path(DATA_DIR, "Data_Tables_LGA_Recorded_Offences_Year_Ending_June_2025.xlsx")
LGA_GEO   <- file.path(DATA_DIR, "georef-australia-local-government-area.geojson")

# Helper functions
read_excel_safely <- function(path, sheet, skip = 0){
  suppressMessages(readxl::read_excel(path, sheet = sheet, skip = skip))
}
standardise_lga <- function(x){
  x |> str_to_upper() |> str_replace_all("&","AND") |>
    str_replace_all("\\(|\\)"," ") |> str_replace_all("[^A-Z0-9 ]"," ") |> str_squish()
}

# --- Load ABS population data (Victoria only; uses your actual column names) ---
# The Victoria table starts after 6 header rows
abs_tbl <- readxl::read_excel(ABS_XLSX, sheet = "Table 2", skip = 6)

abs_pop <- abs_tbl |>
  janitor::clean_names() |>
  # Your cleaned column names (from your printout): 
  # "lga_code", "lga_name_a", "no_3", "no_4", "no_5", "percent", "no_7", "no_8", "no_9", "km2", "persons_km2"
  dplyr::rename(
    lga_code = lga_code,
    lga_name = lga_name_a,
    pop_2023 = no_3,
    pop_2024 = no_4
  ) |>
  dplyr::mutate(
    lga_name_std   = standardise_lga(lga_name),
    pop_2023       = suppressWarnings(as.numeric(pop_2023)),
    pop_2024       = suppressWarnings(as.numeric(pop_2024)),
    pop_latest     = pop_2024,
    pop_prev       = pop_2023,
    pop_change     = pop_latest - pop_prev,
    pop_change_pct = dplyr::if_else(!is.na(pop_prev) & pop_prev > 0,
                                    100 * pop_change / pop_prev, NA_real_)
  ) |>
  dplyr::select(lga_name, lga_name_std, pop_2023, pop_2024, pop_latest, pop_prev, pop_change, pop_change_pct)



# --- Load CSA crime data ---
# --- Load CSA crime data (Table 01: LGA totals) ---
csa_tbl01 <- readxl::read_excel(CSA_XLSX, sheet = "Table 01", skip = 0)

csa_raw <- csa_tbl01 |>
  janitor::remove_empty(c("rows","cols")) |>
  janitor::clean_names()

# expected columns after clean_names():
# year, year_ending, police_region, local_government_area, offence_count, rate_per_100_000_population
# but detect robustly in case labels shift
nm <- names(csa_raw)
year_col   <- nm[stringr::str_detect(nm, "^year$|^year_?$")][1]
lga_col    <- nm[stringr::str_detect(nm, "local_government_area|^lga($|_)|.*lga.*")][1]
count_col  <- nm[stringr::str_detect(nm, "offence_count|offences?_count|number")][1]
rate_col   <- nm[stringr::str_detect(nm, "rate.*100_?000")][1]

stopifnot(!is.na(year_col), !is.na(lga_col), !is.na(count_col))

message(paste("CSA columns → year:", year_col, "| LGA:", lga_col, "| count:", count_col, "| rate:", rate_col %||% "none"))

csa_crime <- csa_raw |>
  dplyr::transmute(
    year          = suppressWarnings(as.integer(stringr::str_extract(as.character(.data[[year_col]]), "\\d{4}"))),
    lga_name      = .data[[lga_col]],
    offence_count = suppressWarnings(as.numeric(.data[[count_col]])),
    rate_per_100k = if (!is.na(rate_col)) suppressWarnings(as.numeric(.data[[rate_col]])) else NA_real_
  ) |>
  dplyr::filter(!is.na(lga_name)) |>
  dplyr::mutate(lga_name_std = standardise_lga(lga_name))

# --- Load LGA GeoJSON ---
lga_geo <- read_sf(LGA_GEO) |>
  clean_names() |>
  rename(lga_name_geo = lga_name, lga_code_geo = lga_code) |>
  mutate(lga_name_std = standardise_lga(lga_name_geo)) |>
  filter(str_to_upper(ste_name) %in% c("VICTORIA","VIC","2","02"))

# --- Join datasets ---
crime_pop <- csa_crime |>
  left_join(abs_pop |> select(lga_name_std, pop_latest, pop_prev, pop_change, pop_change_pct),
            by = "lga_name_std") |>
  mutate(
    rate_per_100k = if_else(
      is.na(rate_per_100k) & !is.na(offence_count) & !is.na(pop_latest) & pop_latest > 0,
      (as.numeric(offence_count)/pop_latest)*100000,
      suppressWarnings(as.numeric(rate_per_100k))
    )
  )

crime_pop_geo <- lga_geo |>
  select(lga_name_std, lga_name_geo, lga_code_geo, geometry) |>
  right_join(crime_pop, by = "lga_name_std") |>
  st_as_sf()

latest_year <- max(na.omit(crime_pop$year))
```

## Objective & Overview
Objective & Overview
=======================================================================
This storyboard explores how population changes across Victorian Local Government Areas (LGAs) relate to patterns in recorded criminal offences between 2023 and 2025. The goal is to highlight whether rapid population growth aligns with rising offence counts and to identify areas that show notable differences. By combining ABS population estimates with CSA offence statistics, the visualisations provide an evidence-based view of demographic and social change across Victoria.

## Data & Method
Data & Method
=======================================================================
The analysis combines open data from two official Victorian sources. Population estimates were sourced from the Australian Bureau of Statistics (ABS, 2023–24 regional dataset), while crime statistics were obtained from the Crime Statistics Agency (CSA) for the year ending June 2025. Both datasets were cleaned and standardised in R using tidyverse and janitor packages. Local Government Area (LGA) names were harmonised to enable merging, and totals were calculated to measure year-on-year change. A GeoJSON file containing LGA boundaries was joined to the data to support map-based visualisations. All transformations were reproducible within R Markdown using dplyr, ggplot2, plotly, and leaflet libraries.

## State Overview 
State Overview
=======================================================================
```{r, fig.height=6, fig.width=10}
vic_pop_latest <- sum(abs_pop$pop_latest, na.rm = TRUE)
vic_offences_latest <- sum(crime_pop %>% filter(year == latest_year) %>% pull(offence_count), na.rm = TRUE)
cat(glue::glue("Population (latest): {scales::comma(vic_pop_latest)}  |  Offences (YE {latest_year}): {scales::comma(vic_offences_latest)}"))


p <- crime_pop |>
filter(!is.na(year)) |>
group_by(year) |>
summarise(offences = sum(offence_count, na.rm = TRUE), .groups = "drop") |>
ggplot(aes(year, offences)) +
geom_line() + geom_point() +
scale_y_continuous(labels = comma) +
labs(title = "Recorded Offences in Victoria by Year", x = NULL, y = "Offences")
plotly::ggplotly(p)


```
* Shows total recorded offences across Victoria by year.
* Highlights the recent rise to YE 2025 after a mid-period dip.
* Useful for setting statewide context before LGA comparisons.


## Population Change by LGA
LGA Population change
=======================================================================

```{r, fig.height=9, fig.width=10}
map_df <- crime_pop_geo |> st_drop_geometry() |> distinct(lga_name_std, .keep_all = TRUE)
pal1 <- colorNumeric("Blues", domain = map_df$pop_change_pct)

leaflet(crime_pop_geo) |>
addProviderTiles("CartoDB.Positron") |>
addPolygons(fillColor = ~pal1(pop_change_pct),
color="#444", weight=0.5, fillOpacity=0.8,
label = ~glue("{lga_name_geo}: {round(pop_change_pct,1)}%")) |>
addLegend("bottomright", pal=pal1, values=map_df$pop_change_pct,
title="Population change (%)")

```
* Maps year-over-year population percentage change for each LGA.
* Darker shades indicate faster growth; pale tones indicate slower or negative change.
* Helps spot growth corridors that may experience service or policing pressures.

## Offences by LGA
Offences/LGA
=======================================================================

```{r, fig.height=7, fig.width=10}
latest_geo <- crime_pop_geo |> filter(year == latest_year)
pal2 <- colorNumeric("Reds", domain = latest_geo$offence_count)

leaflet(latest_geo) |>
addProviderTiles("CartoDB.Positron") |>
addPolygons(fillColor = ~pal2(offence_count),
color="#444", weight=0.5, fillOpacity=0.8,
label=~glue("{lga_name_geo}: {comma(offence_count)} offences")) |>
addLegend("bottomright", pal=pal2, values=latest_geo$offence_count,
title=glue("Offences (YE {latest_year})"))

```
* Maps year-over-year population percentage change for each LGA.
* Darker shades indicate faster growth; pale tones indicate slower or negative change.
* Helps spot growth corridors that may experience service or policing pressures.

## Growth vs Rate
Growth vs Rate
=======================================================================

```{r, fig.height=6, fig.width=10}
scatter_df <- crime_pop |> filter(year == latest_year) |> distinct(lga_name_std, .keep_all = TRUE)
p2 <- ggplot(scatter_df, aes(pop_change_pct, rate_per_100k, label=lga_name)) +
geom_point() +
geom_smooth(method="lm", se=FALSE, color="darkblue") +
labs(x="Population change (%)", y=glue("Rate per 100k (YE {latest_year})"),
title="Population Growth vs Crime Rate")
plotly::ggplotly(p2, tooltip=c("label","x","y"))


```
* Each point is an LGA; x = population % change, y = rate per 100k (YE 2025).
* The fitted line summarises the overall association across LGAs.
* Outliers indicate places where crime rates do not align with population growth.

## Top 10 Areas by Offences
Top offences
=======================================================================
```{r, fig.height=5, fig.width=10}
chg_df <- crime_pop %>%
  filter(!stringr::str_detect(stringr::str_to_upper(lga_name), "^TOTAL")) %>%
  group_by(lga_name_std, lga_name) %>%
  filter(year %in% c(latest_year, latest_year - 1)) %>%
  summarise(
    change = sum(offence_count[year == latest_year], na.rm = TRUE) -
             sum(offence_count[year == latest_year - 1], na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(change)) %>%
  slice_head(n = 10) %>%
  mutate(lga_name = forcats::fct_reorder(lga_name, change))

ggplot(chg_df, aes(lga_name, change)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = NULL, y = "Change in offences",
       title = "Top 10 LGAs by Increase in Offences (latest vs prior year)",
       caption = glue::glue("Year ending {latest_year} vs {latest_year-1}"))

```

* Ranks LGAs by the change in offences from prior year to YE 2025.
* Focuses attention on areas with the largest recent increases.
* Useful for prioritising follow-up analysis or local policy discussion.

## Notes & References
Notes & references
=======================================================================
* All data used in this storyboard were obtained from publicly available open-data sources.
* Population data were drawn from ABS regional statistics for Victorian LGAs (2023–24).
* Crime data were obtained from the Crime Statistics Agency Victoria for the year ending June 2025.
* Data cleaning, transformation, and merging were completed using R (tidyverse, janitor, sf, plotly, and leaflet).
* The analysis is for educational and illustrative purposes only; figures represent recorded offences and estimated       populations, not causal relationships.
* All visualisations were created and hosted via R Markdown (flexdashboard format).

References
* Australian Bureau of Statistics. (2024). Regional population, 2023–24 [Data set]. Australian Bureau of Statistics. https://www.abs.gov.au/statistics/people/population/regional-population

* Crime Statistics Agency. (2025). Recorded offences by local government area, year ending June 2025 [Data set]. Crime Statistics Agency Victoria. https://www.crimestatistics.vic.gov.au/

* RStudio PBC. (2024). RStudio: Integrated development environment for R [Computer software]. https://posit.co/