# Individual packages (following lab style)
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
library(forcats)
library(lubridate)
library(janitor)

# Tables
library(knitr)
library(kableExtra)

# Spatial
library(sf)
library(leaflet)
library(leaflet.extras)

# Interactive and advanced plots
library(plotly)
library(GGally)
library(ggridges)
library(ggforce)
library(ggcorrplot)

# Colour palettes and themes
library(viridis)
library(RColorBrewer)

# Set a consistent ggplot theme
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      plot.title    = element_text(face = "bold", size = 15),
      plot.subtitle = element_text(colour = "grey40"),
      legend.position = "bottom"
    )
)

1 Introduction

Colchester, one of Britain’s oldest recorded towns, combines a historic town centre, a growing university presence, and suburban residential areas. This report analyses street-level crime incidents reported by Essex Police in Colchester during 2025, alongside daily weather data from a nearby meteorological station. The analysis is framed by routine activity theory (Cohen & Felson, 1979), which holds that crime occurs when a motivated offender and a suitable target converge in time and space in the absence of a capable guardian.

This report tests three hypotheses derived from routine activity theory. The first is the seasonal hypothesis: monthly crime volume will rise with temperature, peaking in summer and falling in winter. The second is the spatial hypothesis: crime will concentrate in areas with high foot traffic and mixed land use, such as the town centre, shopping areas, and nightlife venues, rather than being spread evenly across Colchester. The third is the outcome hypothesis: most reported incidents will not reach a formal policing outcome, reflecting the difficulty of policing high-volume crime with limited evidence.

Crime data are retrieved from the UK Police open data portal (data.police.uk), which publishes monthly street-level crime data with locations anonymised by mapping each incident to a nearby street or landmark point. Weather data are retrieved from OGIMET station 3590, located near Colchester, which provides daily summaries of temperature, precipitation, humidity, wind speed, and sunshine hours. Combining the two sources at monthly resolution allows a discussion of whether environmental factors co-vary with crime, with two caveats: correlation does not imply causation, and a single year of data limits the strength of any inference.

The report proceeds from data cleaning and categorical analysis to static and interactive visualisations, temporal and spatial analysis, multivariate modelling of weather and crime, and a concluding discussion of vulnerability, equity, and ethics.

2 Data Preparation

2.1 Crime Data

The crime data are read with their original header row, which is then overwritten with concise, uniform column names. Coordinates are converted to numeric, and records with missing coordinates or coordinates outside a bounding box around Colchester are dropped. The date column, stored in YYYY-MM format, is converted to a Date object by appending ‘-01’. Derived temporal variables include month name, month number, and season. Three grouping factors are then created: crime_group collapses crime categories into five groups, loc_grp assigns each location to one of five types based on keywords in the street name, and outcome_grp collapses policing outcomes into four categories. Finally, records are deduplicated on the incident identifier and converted into an sf object for mapping.

# Read crime data using base R read.csv (Lab style)
crime_raw <- read.csv("crime25.csv", header = TRUE)

# Assign cleaner column names
colnames(crime_raw) <- c("row_num", "category", "persistent_id", "date",
                         "lat", "long", "street_id", "street_name",
                         "context", "id", "location_type", "location_subtype",
                         "outcome_status")

crime <- crime_raw %>%
  mutate(
    lat  = as.numeric(lat),
    long = as.numeric(long)
  ) %>%
  # Remove rows with missing coordinates
  filter(!is.na(lat), !is.na(long)) %>%
  # Keep only the broader Colchester area (bounding box)
  filter(lat > 51.85, lat < 51.95, long > 0.84, long < 0.96) %>%
  # Date handling
  mutate(
    date      = paste0(date, "-01"),
    date      = ymd(date),
    month     = month(date, label = TRUE, abbr = TRUE),
    month_num = month(date),
    year      = year(date),
    season    = factor(
      case_when(
        month_num %in% c(12, 1, 2)  ~ "Winter",
        month_num %in% 3:5          ~ "Spring",
        month_num %in% 6:8          ~ "Summer",
        month_num %in% 9:11         ~ "Autumn"
      ),
      levels = c("Spring", "Summer", "Autumn", "Winter")
    )
  )

crime <- crime %>%
  mutate(
    crime_group = factor(case_when(
      category %in% c("violent-crime", "robbery")
        ~ "Violent",
      category %in% c("burglary", "shoplifting", "other-theft",
                       "vehicle-crime", "bicycle-theft",
                       "criminal-damage-arson", "theft-from-the-person")
        ~ "Property",
      category %in% c("public-order", "anti-social-behaviour")
        ~ "Public Order / ASB",
      category %in% c("drugs", "possession-of-weapons")
        ~ "Drugs & Weapons",
      TRUE ~ "Other"
    ), levels = c("Violent", "Property", "Public Order / ASB",
                  "Drugs & Weapons", "Other")),

    loc_grp = factor(case_when(
      str_detect(street_name, regex("Shopping|Supermarket|Market",
                                    ignore_case = TRUE))
        ~ "Retail",
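      # Note: plain substring matching means "Bar" also matches names such as
      # "Barrack"; wrapping the alternatives in \\b word boundaries would be stricter
      str_detect(street_name, regex("Nightclub|Club|Bar|Pub",
                                    ignore_case = TRUE))
        ~ "Nightlife",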
      str_detect(street_name, regex("Park|Garden|Meadow|Green|Field|Open|Castle Bailey",
                                    ignore_case = TRUE))
        ~ "Open Space",
      str_detect(street_name, regex("Road|Street|Lane|Avenue|Close|Crescent|Drive|Way|Walk|Terrace|Hill|Mews|Rise|Place",
                                    ignore_case = TRUE))
        ~ "Residential Street",
      TRUE ~ "Other"
    ), levels = c("Retail", "Nightlife", "Open Space",
                  "Residential Street", "Other")),

    outcome_grp = factor(case_when(
      is.na(outcome_status) | outcome_status == "NA"
        ~ "No Outcome (ASB)",
      outcome_status %in% c(
        "Investigation complete; no suspect identified",
        "Unable to prosecute suspect",
        "Further investigation is not in the public interest",
        "Status update unavailable")
        ~ "No Further Action",
      outcome_status %in% c(
        "Offender given a caution",
        "Local resolution",
        "Formal action is not in the public interest",
        "Action to be taken by another organisation",
        "Court result unavailable")
        ~ "Formal Action",
      TRUE ~ "Ongoing / Other"
    ), levels = c("No Outcome (ASB)", "No Further Action",
                  "Formal Action", "Ongoing / Other"))
  )

# Deduplicate: keep distinct records by id (the unique incident identifier)
# Many ASB records have blank persistent_id, so we use id as the primary key
crime <- crime %>%
  distinct(id, .keep_all = TRUE)

# Spatial object for mapping
crime_sf <- st_as_sf(crime, coords = c("long", "lat"), crs = 4326)

After cleaning, the crime dataset comprises 5,956 incidents spanning 12 months, with the 14 original categories collapsed into five groups.
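The loc_grp rules above match keywords as plain substrings, so a pattern such as `Bar` also fires on street names that merely contain those letters. The base R sketch below uses a few illustrative street labels (hypothetical examples in the data.police.uk "On or near ..." style, not taken from the dataset) to compare the current rule with a stricter version that wraps the alternatives in `\\b` word boundaries. Whether the stricter rule is preferable depends on how the actual location labels are worded, so treat it as a possible refinement rather than a correction of the published tables.

```r
# Illustrative street labels (hypothetical, not drawn from crime25.csv)
streets <- c("On or near Nightclub",
             "On or near Barrack Street",
             "On or near Pub",
             "On or near Shopping Area")

# Current rule: case-insensitive substring match
loose  <- grepl("Nightclub|Club|Bar|Pub", streets, ignore.case = TRUE)

# Stricter rule: whole words only, using \\b word boundaries (PCRE syntax)
strict <- grepl("\\b(Nightclub|Club|Bar|Pub)\\b", streets,
                ignore.case = TRUE, perl = TRUE)

data.frame(street = streets, loose = loose, strict = strict)
```

Only "Barrack Street" changes classification: it matches `Bar` as a substring but not as a whole word.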

2.2 Weather Data

The weather data are read with their header, column names are standardised to lowercase snake_case with clean_names(), and key fields are renamed to short, descriptive names. Numeric coercion and the month and season fields mirror those applied to the crime data, and the records are filtered to 2025. A monthly weather summary is then created and joined to the monthly crime counts by month.

# Read weather data using base R read.csv (Lab style)
weather_raw <- read.csv("temp25.csv", header = TRUE)
weather_raw <- weather_raw %>% clean_names()

weather <- weather_raw %>%
  rename(
    temp_avg   = temperature_c_avg,
    temp_max   = temperature_c_max,
    temp_min   = temperature_c_min,
    humidity   = hr_avg,
    wind_kmh   = windkmh_int,
    pressure   = presslev_hp,
    precip     = precmm,
    sun_hours  = sun_d1h
  ) %>%
  mutate(
    date      = ymd(date),
    across(c(temp_avg, temp_max, temp_min, humidity, wind_kmh,
             pressure, precip, sun_hours, vis_km), as.numeric),
    month_num = month(date),
    month     = month(date, label = TRUE, abbr = TRUE),
    year      = year(date),
    season    = factor(
      case_when(
        month_num %in% c(12, 1, 2)  ~ "Winter",
        month_num %in% 3:5          ~ "Spring",
        month_num %in% 6:8          ~ "Summer",
        month_num %in% 9:11         ~ "Autumn"
      ),
      levels = c("Spring", "Summer", "Autumn", "Winter")
    )
  ) %>%
  filter(year == 2025)
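Both the crime and weather pipelines repeat the same `case_when()` season mapping. A small helper would keep the two in sync. The sketch below is a hypothetical base R refactor (the function name `month_to_season` is ours, not part of the original code), and it also shows the ‘-01’ step used to turn the crime data's YYYY-MM strings into dates.

```r
# Hypothetical helper: map month numbers (1-12) to meteorological seasons,
# mirroring the case_when() blocks used for the crime and weather data
month_to_season <- function(month_num) {
  lookup <- c("Winter", "Winter",            # Jan, Feb
              "Spring", "Spring", "Spring",  # Mar-May
              "Summer", "Summer", "Summer",  # Jun-Aug
              "Autumn", "Autumn", "Autumn",  # Sep-Nov
              "Winter")                      # Dec
  factor(lookup[month_num], levels = c("Spring", "Summer", "Autumn", "Winter"))
}

# The crime data store dates as "YYYY-MM"; appending "-01" gives a valid Date
crime_dates <- as.Date(paste0(c("2025-01", "2025-07", "2025-12"), "-01"))
month_nums  <- as.integer(format(crime_dates, "%m"))
seasons     <- month_to_season(month_nums)
seasons
```

Inside the dplyr pipelines, the same helper could be called as `season = month_to_season(month_num)`.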

weather_mo <- weather %>%
  group_by(month_num, month, season) %>%
  summarise(
    temp_avg_mo   = mean(temp_avg, na.rm = TRUE),
    precip_mo     = sum(precip, na.rm = TRUE),
    humidity_mo   = mean(humidity, na.rm = TRUE),
    wind_mo       = mean(wind_kmh, na.rm = TRUE),
    sun_hours_mo  = sum(sun_hours, na.rm = TRUE),
    .groups       = "drop"
  )

crime_monthly <- crime %>%
  count(month_num, month, season, name = "n_crimes") %>%
  left_join(weather_mo, by = c("month_num", "month", "season"))

# Also count by group per month (useful later)
crime_group_monthly <- crime %>%
  count(month_num, month, season, crime_group, name = "n_crimes") %>%
  left_join(weather_mo, by = c("month_num", "month", "season"))

# Pre-compute key statistics for inline references in the narrative
peak_month     <- crime_monthly %>% slice_max(n_crimes, n = 1)
low_month      <- crime_monthly %>% slice_min(n_crimes, n = 1)
total_crimes   <- nrow(crime)

# Outcome proportions
nfa_count <- sum(crime$outcome_grp == "No Further Action")
nfa_pct   <- round(nfa_count / total_crimes * 100, 1)
asb_count <- sum(crime$outcome_grp == "No Outcome (ASB)")
asb_pct   <- round(asb_count / total_crimes * 100, 1)
formal_count <- sum(crime$outcome_grp == "Formal Action")
formal_pct   <- round(formal_count / total_crimes * 100, 1)

# Violent crime total
violent_n   <- sum(crime$crime_group == "Violent")
violent_pct <- round(violent_n / total_crimes * 100, 1)
property_n  <- sum(crime$crime_group == "Property")

# Correlation coefficients (temperature and precipitation vs crime)
r_temp   <- round(cor(crime_monthly$temp_avg_mo, crime_monthly$n_crimes, use = "complete.obs"), 2)
r_precip <- round(cor(crime_monthly$precip_mo, crime_monthly$n_crimes, use = "complete.obs"), 2)
r_sun    <- round(cor(crime_monthly$sun_hours_mo, crime_monthly$n_crimes, use = "complete.obs"), 2)
r_humid  <- round(cor(crime_monthly$humidity_mo, crime_monthly$n_crimes, use = "complete.obs"), 2)

The weather dataset covers 365 daily records for 2025. The merged monthly table contains 12 rows, one per calendar month, linking aggregate crime counts with mean temperature, total precipitation, mean humidity, mean wind speed, and total sun hours.

3 Categorical Analysis

This section examines cross-tabulations of crime group against month, location type, and policing outcome. Each of the three two-way tables offers a different view of how crime is distributed in Colchester.

3.1 Crime Group by Month

# Base R frequency table (Lab 3 style)
tab_base <- table(crime$crime_group, crime$month)
tab_prop <- prop.table(tab_base, margin = 1)   # row proportions

# Tidied cross-tabulation for display
tab_gm <- crime %>%
  count(crime_group, month) %>%
  pivot_wider(names_from = month, values_from = n, values_fill = 0) %>%
  mutate(Total = rowSums(across(where(is.numeric))))

tab_gm %>%
  kable(caption = "Table 1: Crime incidents by group and month (2025)",
        format.args = list(big.mark = ",")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white")

Table 1: Crime incidents by group and month (2025)
crime_group	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sept	Oct	Nov	Dec	Total
Violent	166	188	179	191	244	232	224	244	184	252	214	201	2,519
Property	143	171	158	186	185	188	183	192	165	193	148	144	2,056
Public Order / ASB	65	85	86	106	131	85	84	95	101	64	88	52	1,042
Drugs & Weapons	24	16	20	22	28	22	10	23	18	15	23	29	250
Other	10	5	4	13	10	4	9	8	4	8	4	10	89

Table 1 shows that Violent crime is the largest group in every month, rising from 166 incidents in January to 244 in May and August and reaching its highest count, 252, in October. Property crime is the second-largest group in every month. Public Order / ASB is relatively steady but peaks in late spring (106 in April and 131 in May) rather than in summer. Drugs & Weapons and Other make up the remaining, much smaller categories.
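Reading peaks off Table 1 by eye is error-prone, so the base R sketch below transcribes the table into a matrix and asks `which.max()` for each group's busiest month. The counts are copied from Table 1 rather than recomputed from `crime`, so this checks the published table and does not replace the pipeline.

```r
# Counts transcribed from Table 1 (rows = crime group, columns = month)
tab1 <- matrix(c(
  166, 188, 179, 191, 244, 232, 224, 244, 184, 252, 214, 201,  # Violent
  143, 171, 158, 186, 185, 188, 183, 192, 165, 193, 148, 144,  # Property
   65,  85,  86, 106, 131,  85,  84,  95, 101,  64,  88,  52,  # Public Order / ASB
   24,  16,  20,  22,  28,  22,  10,  23,  18,  15,  23,  29,  # Drugs & Weapons
   10,   5,   4,  13,  10,   4,   9,   8,   4,   8,   4,  10   # Other
), nrow = 5, byrow = TRUE,
dimnames = list(c("Violent", "Property", "Public Order / ASB",
                  "Drugs & Weapons", "Other"), month.abb))

# Month with the highest count for each crime group
peak_by_group <- apply(tab1, 1, function(row) names(which.max(row)))
peak_by_group
```

Violent and Property crime both peak in October, Public Order / ASB in May, and Drugs & Weapons in December.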

3.2 Crime Group by Location

# Base R frequency table (Lab 3 style)
tab_loc_base <- table(crime$crime_group, crime$loc_grp)
tab_loc_prop <- prop.table(tab_loc_base, margin = 1)

# Tidied cross-tabulation for display
tab_gl <- crime %>%
  count(crime_group, loc_grp) %>%
  pivot_wider(names_from = loc_grp, values_from = n, values_fill = 0) %>%
  mutate(Total = rowSums(across(where(is.numeric))))

tab_gl %>%
  kable(caption = "Table 2: Crime incidents by group and location type",
        format.args = list(big.mark = ",")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white")

Table 2: Crime incidents by group and location type
crime_group	Retail	Nightlife	Open Space	Residential Street	Other	Total
Violent	178	117	222	1,554	448	2,519
Property	608	60	125	1,018	245	2,056
Public Order / ASB	128	43	76	621	174	1,042
Drugs & Weapons	19	6	27	132	66	250
Other	7	4	9	52	17	89

Table 2 shows that Residential Street locations record the most crime overall (3,377 incidents), which is consistent with the spatial distribution of the population, although this group is broad because it captures any location whose name contains a common street suffix. Retail locations (940 incidents) stand out for their crime mix: Property crimes such as shoplifting and theft account for 608 of them. Nightlife locations record far fewer incidents (230), but Violent crime makes up 117 of these (about 51%), above its 42.3% share of all incidents.
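Comparing raw counts across location types mixes up volume and composition. The base R sketch below transcribes Table 2 and uses column proportions, the same `prop.table(..., margin = 2)` idea as the code above but applied to location columns, to show the crime-group mix within Retail and Nightlife locations next to each group's overall share.

```r
# Counts transcribed from Table 2 (rows = crime group, columns = location type)
tab2 <- matrix(c(
  178, 117, 222, 1554, 448,   # Violent
  608,  60, 125, 1018, 245,   # Property
  128,  43,  76,  621, 174,   # Public Order / ASB
   19,   6,  27,  132,  66,   # Drugs & Weapons
    7,   4,   9,   52,  17    # Other
), nrow = 5, byrow = TRUE,
dimnames = list(c("Violent", "Property", "Public Order / ASB",
                  "Drugs & Weapons", "Other"),
                c("Retail", "Nightlife", "Open Space",
                  "Residential Street", "Other")))

# Column proportions (%): the crime-group mix within each location type
mix <- round(prop.table(tab2, margin = 2) * 100, 1)

# Overall share (%) of each crime group, for comparison
overall <- round(rowSums(tab2) / sum(tab2) * 100, 1)

cbind(mix[, c("Retail", "Nightlife")], Overall = overall)
```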

3.3 Outcome by Crime Group

# Base R frequency table (Lab 3 style)
tab_out_base <- table(crime$outcome_grp, crime$crime_group)
tab_out_prop <- prop.table(tab_out_base, margin = 2)  # column proportions

# Tidied cross-tabulation for display
tab_og <- crime %>%
  count(outcome_grp, crime_group) %>%
  pivot_wider(names_from = crime_group, values_from = n, values_fill = 0) %>%
  mutate(Total = rowSums(across(where(is.numeric))))

tab_og %>%
  kable(caption = "Table 3: Policing outcomes by crime group",
        format.args = list(big.mark = ",")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white")

Table 3: Policing outcomes by crime group
outcome_grp	Public Order / ASB	Violent	Property	Drugs & Weapons	Other	Total
No Outcome (ASB)	590	0	0	0	0	590
No Further Action	328	1,924	1,667	45	44	4,008
Formal Action	70	270	191	148	23	702
Ongoing / Other	54	325	198	57	22	656

As Table 3 shows, ‘No Further Action’ is by far the most common outcome group, covering 4,008 incidents (67.3% of the total). Within the Violent group alone, 1,924 incidents end this way, which suggests that many victims of violent crime in Colchester may not see a suspect charged or otherwise dealt with. The 590 ‘No Outcome (ASB)’ incidents are anti-social behaviour reports, for which data.police.uk does not publish an outcome; this reflects how ASB is recorded rather than an absence of police attention. The 702 ‘Formal Action’ incidents include cautions, local resolutions, court results, and cases passed to another organisation. ‘Ongoing / Other’ is a residual group that includes incidents still under investigation or awaiting a court outcome when the data were extracted.
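Because the crime groups differ greatly in size, column proportions show each group's outcome profile more clearly than raw counts. The base R sketch below transcribes Table 3 and normalises each crime-group column to 100%.

```r
# Counts transcribed from Table 3 (rows = outcome group, columns = crime group)
tab3 <- matrix(c(
  590,    0,    0,   0,  0,   # No Outcome (ASB)
  328, 1924, 1667,  45, 44,   # No Further Action
   70,  270,  191, 148, 23,   # Formal Action
   54,  325,  198,  57, 22    # Ongoing / Other
), nrow = 4, byrow = TRUE,
dimnames = list(c("No Outcome (ASB)", "No Further Action",
                  "Formal Action", "Ongoing / Other"),
                c("Public Order / ASB", "Violent", "Property",
                  "Drugs & Weapons", "Other")))

# Column proportions (%): how each crime group's incidents were resolved
outcome_mix <- round(prop.table(tab3, margin = 2) * 100, 1)
outcome_mix
```

On these figures, about 76% of Violent and 81% of Property incidents end in No Further Action, whereas Formal Action is the most common outcome for Drugs & Weapons (59.2%).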

4 Visualisations

This section presents a series of static plots exploring crime volume, the distribution of crime types, and the behaviour of weather variables across the seasons. Each plot is designed to highlight a different trend or contrast.

4.1 Bar Chart: Total Crimes by Group

# Step-by-step ggplot object building (Lab 6/7 style)
crime_counts <- crime %>% count(crime_group)

g <- ggplot(crime_counts, aes(x = reorder(crime_group, -n), y = n, fill = crime_group))
g1 <- g + geom_col(width = 0.7, show.legend = FALSE)
g2 <- g1 + geom_text(aes(label = format(n, big.mark = ",")),
                     vjust = -0.5, size = 4, fontface = "bold")
g3 <- g2 + scale_fill_manual(values = c("Violent" = "#E74C3C",
                                         "Property" = "#3498DB",
                                         "Public Order / ASB" = "#2ECC71",
                                         "Drugs & Weapons" = "#9B59B6",
                                         "Other" = "#E67E22"))
g4 <- g3 + labs(title    = "Total Crime Incidents by Group",
                subtitle = "Colchester, 2025",
                x = NULL, y = "Number of incidents")
g5 <- g4 + theme(axis.text.x = element_text(size = 11))
g5

Figure 1: Total crime counts by group (2025)

Figure 1 confirms that Violent crime dominates the dataset with 2,519 incidents (42.3% of the total), followed by Property crime (2,056). Public Order / ASB ranks third, while Drugs & Weapons and Other crime types account for a combined 5.7% of all incidents. This hierarchy is consistent with national patterns for a mid-size English town, where violence against the person has been the single largest recorded category in recent years.


4.2 Dot Plot: Monthly Crime Counts

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_monthly, aes(x = month, y = n_crimes))
g1 <- g + geom_segment(aes(xend = month, yend = 0), colour = "grey60", linewidth = 0.4)
g2 <- g1 + geom_point(size = 4, colour = "#2C3E50")
g3 <- g2 + labs(title    = "Monthly Crime Volume",
                subtitle = "Colchester, 2025",
                x = NULL, y = "Number of incidents")
g3

Figure 2: Total incidents per month (2025)

Figure 2 is a lollipop-style dot plot of total crime by month. Crime peaks in May at 598 incidents and is lowest in January at 408, a gap of 190 incidents (about 47% above the January level). This is broadly consistent with routine activity theory, under which warmer months bring more people outdoors and create more opportunities for some categories of crime. The pattern is not purely seasonal, however: October (532) records more incidents than June (531) or July (510).
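The peak-to-trough comparison for Figure 2 can be checked directly from the monthly totals, which are the column sums of Table 1. The base R sketch below hard-codes those sums.

```r
# Monthly totals: column sums of Table 1 (all crime groups combined)
monthly <- c(Jan = 408, Feb = 465, Mar = 447, Apr = 518, May = 598, Jun = 531,
             Jul = 510, Aug = 562, Sep = 472, Oct = 532, Nov = 477, Dec = 436)

peak   <- monthly[which.max(monthly)]        # busiest month
trough <- monthly[which.min(monthly)]        # quietest month
gap    <- unname(peak - trough)              # absolute difference
rel    <- round(gap / unname(trough) * 100)  # % above the trough

sprintf("%s: %.0f vs %s: %.0f (gap %.0f, %.0f%% higher)",
        names(peak), peak, names(trough), trough, gap, rel)
```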

4.3 Pie Chart: Outcome Proportions

outcome_counts <- crime %>%
  count(outcome_grp) %>%
  mutate(
    pct   = n / sum(n) * 100,
    label = paste0(outcome_grp, "\n", round(pct, 1), "%")
  )

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(outcome_counts, aes(x = "", y = n, fill = outcome_grp))
g1 <- g + geom_col(width = 1, colour = "white")
g2 <- g1 + coord_polar(theta = "y")
g3 <- g2 + geom_text(aes(label = label),
                     position = position_stack(vjust = 0.5), size = 3.5)
g4 <- g3 + scale_fill_brewer(palette = "Pastel1")
g5 <- g4 + labs(title = "Distribution of Policing Outcomes", fill = "Outcome")
g6 <- g5 + theme_void(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 15))
g6

Figure 3: Policing outcome distribution

As Figure 3 shows, the outcome distribution is heavily skewed. ‘No Further Action’ accounts for 67.3% of all incidents (4,008 cases), meaning that most reported crimes in Colchester were closed without a suspect being identified or prosecuted. Adding the ‘No Outcome (ASB)’ incidents (9.9%), for which no policing outcome is published, brings the share of incidents without formal action or a published outcome to 77.2%. Only 11.8% of incidents end in ‘Formal Action’ such as cautions, court proceedings, or local resolutions. These figures raise questions about victims’ confidence in reporting and about the system’s capacity to deliver tangible results for street-level offences.
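The percentages quoted for Figure 3 follow from the row totals of Table 3. The base R sketch below recomputes them, including the combined share, which is calculated from the raw counts rather than by adding rounded percentages.

```r
# Overall outcome counts (row totals of Table 3)
outcomes <- c("No Outcome (ASB)"  = 590,
              "No Further Action" = 4008,
              "Formal Action"     = 702,
              "Ongoing / Other"   = 656)

pct <- round(outcomes / sum(outcomes) * 100, 1)
pct

# Combined share: No Further Action plus ASB reports with no published outcome
combined <- round(sum(outcomes[c("No Outcome (ASB)", "No Further Action")]) /
                  sum(outcomes) * 100, 1)
combined
```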

4.4 Histogram: Average Daily Crime per Month

crime_monthly <- crime_monthly %>%
  mutate(
    days_in_month = days_in_month(ymd(paste0("2025-", month_num, "-01"))),
    avg_daily     = n_crimes / days_in_month
  )

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_monthly, aes(x = avg_daily))
g1 <- g + geom_histogram(bins = 8, fill = "#3498DB", colour = "white", alpha = 0.85)
g2 <- g1 + labs(title    = "Distribution of Average Daily Crime Rate",
                subtitle = "Each bar represents a range of daily averages across months",
                x = "Average daily incidents", y = "Number of months")
g2

Figure 4: Distribution of average daily crime rate across months

Figure 4 shows a histogram of the mean daily crime rate for each month, obtained by dividing the monthly total by the number of days in that month. This removes the effect of unequal month lengths, such as February’s 28 days. Daily rates range from 13.2 (January) to 19.3 (May), so even after adjusting for month length the busiest month records about 47% more incidents per day than the quietest.
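The report's code uses lubridate's `days_in_month()` for this adjustment. As a dependency-free cross-check, the base R sketch below derives the days in each month of 2025 from the gaps between consecutive month starts and applies them to the monthly totals from Table 1.

```r
# Monthly totals (column sums of Table 1)
monthly <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
names(monthly) <- month.abb

# Days in each month of 2025: differences between consecutive month starts
starts <- seq(as.Date("2025-01-01"), by = "month", length.out = 13)
days   <- as.integer(diff(starts))

avg_daily <- round(monthly / days, 1)
avg_daily
range(avg_daily)
```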

4.5 Density Plots: Temperature by Season

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(weather, aes(x = temp_avg, fill = season))
g1 <- g + geom_density(alpha = 0.55)
g2 <- g1 + scale_fill_manual(values = c("Spring" = "#2ECC71", "Summer" = "#E74C3C",
                                         "Autumn" = "#E67E22", "Winter" = "#3498DB"))
g3 <- g2 + labs(title    = "Daily Average Temperature Distribution by Season",
                subtitle = "Colchester weather station 3590, 2025",
                x = "Average temperature (°C)", y = "Density", fill = "Season")
g3

Figure 5: Distribution of average daily temperature by season

As Figure 5 shows, the overlaid density curves give a clear view of how daily mean temperatures are distributed within each season. The separation between Winter and Summer confirms the expected seasonal temperature gradient in Colchester, while the overlap between Spring and Autumn reflects their transitional nature. These distributions provide context for the seasonality of crime discussed later in the report.

4.6 Box and Violin Plots: Monthly Crime by Group

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_group_monthly, aes(x = crime_group, y = n_crimes, fill = crime_group))
g1 <- g + geom_violin(alpha = 0.3, show.legend = FALSE)
g2 <- g1 + geom_boxplot(width = 0.25, outlier.shape = 21, show.legend = FALSE)
g3 <- g2 + stat_summary(fun = median, geom = "point", shape = 18,
                        size = 3, colour = "red", show.legend = FALSE)
g4 <- g3 + scale_fill_manual(values = c("Violent" = "#E74C3C",
                                         "Property" = "#3498DB",
                                         "Public Order / ASB" = "#2ECC71",
                                         "Drugs & Weapons" = "#9B59B6",
                                         "Other" = "#E67E22"))
g5 <- g4 + labs(title    = "Monthly Crime Counts by Group",
                subtitle = "Violin + box plot (red diamond = median)",
                x = NULL, y = "Monthly incidents")
g6 <- g5 + theme(axis.text.x = element_text(angle = 15, hjust = 1))
g6

Figure 6: Monthly crime distribution by group (box + violin)

Figure 6 combines violin and box plots to show both the shape and the summary statistics of each distribution. Violent crime has the highest median and the widest spread, so it shows the most month-to-month variation. Property crime has the next-widest interquartile range, although Public Order / ASB spans a wider overall range (52 to 131 incidents per month) because of its May peak. Drugs & Weapons and Other have the narrowest violins and also the lowest medians, largely because their monthly counts are small.
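The spread that Figure 6 displays can be summarised numerically. The base R sketch below transcribes Table 1 and computes, for each group, the median, the interquartile range (the length of the box; `IQR()` and `geom_boxplot()` both use R's default type-7 quantiles), the standard deviation, and the full range. Different measures rank Property and Public Order / ASB differently, which is why the comparison depends on which kind of spread is meant.

```r
# Counts transcribed from Table 1 (rows = crime group, columns = month)
tab1 <- matrix(c(
  166, 188, 179, 191, 244, 232, 224, 244, 184, 252, 214, 201,  # Violent
  143, 171, 158, 186, 185, 188, 183, 192, 165, 193, 148, 144,  # Property
   65,  85,  86, 106, 131,  85,  84,  95, 101,  64,  88,  52,  # Public Order / ASB
   24,  16,  20,  22,  28,  22,  10,  23,  18,  15,  23,  29,  # Drugs & Weapons
   10,   5,   4,  13,  10,   4,   9,   8,   4,   8,   4,  10   # Other
), nrow = 5, byrow = TRUE,
dimnames = list(c("Violent", "Property", "Public Order / ASB",
                  "Drugs & Weapons", "Other"), month.abb))

spread <- data.frame(
  median      = apply(tab1, 1, median),
  iqr         = apply(tab1, 1, IQR),          # Q3 - Q1, the box length
  sd          = round(apply(tab1, 1, sd), 1),
  range_width = apply(tab1, 1, function(x) max(x) - min(x)),
  row.names   = rownames(tab1)
)
spread[order(-spread$iqr), ]
```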

4.7 Sina Plot: Location Groups Across Months

set.seed(42)
crime_sample <- crime %>% sample_n(min(2000, nrow(crime)))

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_sample, aes(x = loc_grp, y = month_num, colour = loc_grp))
g1 <- g + geom_sina(alpha = 0.5, size = 1.5, show.legend = FALSE)
g2 <- g1 + scale_y_continuous(breaks = 1:12, labels = month.abb[1:12])
g3 <- g2 + scale_colour_brewer(palette = "Dark2")
g4 <- g3 + labs(title    = "Incident Distribution by Location Group Across Months",
                subtitle = "Sina plot (random sample for readability)",
                x = NULL, y = "Month")
g5 <- g4 + theme(axis.text.x = element_text(angle = 15, hjust = 1))
g5

Figure 7: Incident distribution by location group across months

Figure 7 shows how incidents are distributed across months using a sina plot, a strip chart whose horizontal jitter is scaled by the local density of points. Retail and Residential Street locations show fairly even activity throughout the year, which suggests that these locations function as consistent hubs of activity. Nightlife locations appear denser in warmer months and during university term time, and Open Space locations appear denser in summer. Because the plot shows a random sample of 2,000 incidents (about a third of the data) and there are only 230 Nightlife incidents in total, patterns for the smaller groups should be treated as tentative.

5 Temporal and Spatial Analysis

This section examines how crime evolves over time and where it concentrates geographically. Static and interactive time series plots are used to analyse crime trends and seasonality, while maps display the spatial distribution of crime in Colchester.

5.1 Static Time Series with LOESS Smoother

ts_plot_data <- crime_monthly %>%
  mutate(date = ymd(paste0("2025-", month_num, "-01")))

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(ts_plot_data, aes(x = date, y = n_crimes))
g1 <- g + geom_line(colour = "#2C3E50", linewidth = 1)
g2 <- g1 + geom_point(size = 4, colour = "#2C3E50")
g3 <- g2 + geom_text(aes(label = n_crimes), vjust = -1, size = 3.5)
g4 <- g3 + geom_smooth(method = "loess", se = TRUE, colour = "#E74C3C",
                       fill = "#E74C3C", alpha = 0.15, linewidth = 1)
g5 <- g4 + scale_x_date(date_labels = "%b", date_breaks = "1 month")
g6 <- g5 + expand_limits(y = 0)
g7 <- g6 + labs(title    = "Monthly Crime Trend with LOESS Smoother",
                subtitle = "Colchester, 2025",
                x = NULL, y = "Number of incidents")
g7

Figure 8: Monthly crime trend with LOESS smoother

Figure 8 shows monthly crime counts with a LOESS curve superimposed. The curve rises from a low in January (408 incidents) to a mid-year peak, with the highest observed count in May (598), then declines through autumn to December (436). This pattern is consistent with the temperature cycle shown in Section 4 and with routine activity theory, under which longer daylight hours and warmer weather draw more people outdoors and increase the convergence of potential offenders and suitable targets.
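`geom_smooth(method = "loess")` fits `stats::loess()` with a default span of 0.75 and the loess default of local quadratic fits. The base R sketch below reproduces that fit on the 12 monthly totals. With only 12 points, each local fit uses about nine of them, so the curve is a broad summary of the annual shape rather than a precise trend estimate.

```r
# Monthly totals (column sums of Table 1) and month index
monthly   <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
month_num <- 1:12

# stats::loess() defaults: span = 0.75, degree = 2
fit      <- loess(monthly ~ month_num, span = 0.75, degree = 2)
smoothed <- round(fitted(fit), 1)

data.frame(month = month.abb, observed = monthly, smoothed = smoothed)
```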

5.2 Interactive Time Series: Crime and Temperature

The interactive chart below uses the plotly package to plot monthly crime counts alongside average temperature. Temperature is rescaled onto the crime-count axis so that both series share a single y-axis, and hovering over a point displays its values. This is one of the Level 7 interactive components in the report.

ts_data <- crime_monthly %>%
  mutate(date = ymd(paste0("2025-", month_num, "-01")))

# Compute rescaling factor outside aes to avoid issues
scale_factor <- max(ts_data$n_crimes, na.rm = TRUE) /
                max(ts_data$temp_avg_mo, na.rm = TRUE)

ts_data <- ts_data %>%
  mutate(temp_rescaled = temp_avg_mo * scale_factor)

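# Build ggplot object, then wrap with ggplotly (Lab 10 style).
# The 'text' aesthetic carries the unscaled values, so hover labels show real
# incident counts and temperatures rather than the rescaled axis positions.
G <- ggplot(ts_data, aes(x = date))
G <- G + geom_line(aes(y = n_crimes, colour = "Crime count"), linewidth = 0.8)
G <- G + geom_point(aes(y = n_crimes, colour = "Crime count",
                        text = paste0(format(date, "%b"), ": ", n_crimes, " incidents")),
                    size = 3)
G <- G + geom_line(aes(y = temp_rescaled, colour = "Avg temperature"),
                   linewidth = 0.8, linetype = "dashed")
G <- G + geom_point(aes(y = temp_rescaled, colour = "Avg temperature",
                        text = paste0(format(date, "%b"), ": ",
                                      round(temp_avg_mo, 1), " \u00b0C")),
                    size = 3)
G <- G + scale_colour_manual(values = c("Crime count" = "#2C3E50",
                                         "Avg temperature" = "#E74C3C"))
G <- G + scale_x_date(date_labels = "%b", date_breaks = "1 month")
G <- G + labs(title = "Monthly Crime Count vs Average Temperature",
              x = NULL, y = "Crime count (temperature rescaled)", colour = NULL)

# ggplot2 may warn about the unknown 'text' aesthetic; plotly uses it for tooltips
ggplotly(G, tooltip = "text")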

Figure 9: Interactive chart of monthly crime counts and rescaled temperature

The interactive chart shows how closely monthly crime counts track average temperature. Where the two series rise and fall together, a common driver such as increased outdoor activity and social interaction in warmer weather is plausible, but correlation does not establish causation.

5.3 Seasonal Heatmap

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_group_monthly, aes(x = month, y = crime_group, fill = n_crimes))
g1 <- g + geom_tile(colour = "white", linewidth = 0.8)
g2 <- g1 + scale_fill_viridis_c(option = "magma", direction = -1)
g3 <- g2 + labs(title    = "Crime Intensity Heatmap",
                subtitle = "Crime group \u00d7 month (darker = more incidents)",
                x = NULL, y = NULL, fill = "Incidents")
g4 <- g3 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
g4

Figure 10: Seasonal heatmap of crime group by month

Figure 10 is a tile heatmap in which colour intensity represents the number of incidents, making it easy to scan across the months for a given crime group, or down the groups for a given month. Violent crime forms the darkest row in every month, and most groups show visible changes in shading across the year.

5.4 Ridgeline Plot: Crime Types Across Months

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime, aes(x = month_num, y = crime_group, fill = after_stat(x)))
g1 <- g + geom_density_ridges_gradient(scale = 2.5, rel_min_height = 0.01,
                                       show.legend = FALSE)
g2 <- g1 + scale_fill_viridis_c(option = "plasma")
g3 <- g2 + scale_x_continuous(breaks = 1:12, labels = month.abb[1:12])
g4 <- g3 + labs(title    = "Temporal Distribution of Crime Types",
                subtitle = "Ridgeline density (wider = higher concentration in that month)",
                x = "Month", y = NULL)
g4

Figure 11: Ridgeline plot of crime types across months

Figure 11 shows a ridgeline (joy) plot of the monthly distribution of each crime group. Unlike the heatmap, it shows how each group's incidents are distributed across the year rather than absolute frequencies. The Violent crime ridge rises markedly between months 5 and 8 (May to August), again pointing to a late-spring and summer peak in violent crime. Property crime has a relatively flat ridge with a small mid-year peak, while Drugs & Weapons changes little across the year.
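Because each ridge is a density estimated within its own crime group, it reflects how that group's incidents are spread across the year rather than how many there are. A rough tabular analogue is each group's monthly share of its own annual total. The sketch below assumes the crime data frame has crime_group and month_num columns, as in the plotting code; monthly_share_by_group() is a hypothetical helper, not part of any package.

```r
library(dplyr)

# Hypothetical helper: for each crime group, the share of that group's
# incidents falling in each month (shares sum to 1 within each group).
monthly_share_by_group <- function(df) {
  df %>%
    count(crime_group, month_num, name = "n") %>%
    group_by(crime_group) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}

# Usage with the report's data (assumed columns):
# monthly_share_by_group(crime) %>% filter(crime_group == "Violent")
```

Comparing these shares across groups mirrors what the ridgelines show, while the heatmap's raw counts answer the separate question of which group is largest.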

5.5 Static Spatial Map

# Aggregate by rounded coordinates for point-size scaling
crime_agg <- crime %>%
  mutate(
    lat_r  = round(lat, 3),
    long_r = round(long, 3)
  ) %>%
  count(lat_r, long_r, crime_group, name = "count")

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_agg, aes(x = long_r, y = lat_r,
                            colour = crime_group, size = count))
g1 <- g + geom_point(alpha = 0.55)
g2 <- g1 + scale_colour_manual(values = c("Violent" = "#E74C3C",
                                           "Property" = "#3498DB",
                                           "Public Order / ASB" = "#2ECC71",
                                           "Drugs & Weapons" = "#9B59B6",
                                           "Other" = "#E67E22"))
g3 <- g2 + scale_size_continuous(range = c(1, 8))
g4 <- g3 + labs(title    = "Spatial Distribution of Crime Incidents",
                subtitle = "Point size reflects local incident count at aggregated coordinates",
                x = "Longitude", y = "Latitude",
                colour = "Crime group", size = "Count")
g5 <- g4 + coord_fixed(ratio = 1.5)
g6 <- g5 + theme(legend.position = "right")
g6

Figure 12: Spatial scatter of crime incidents in Colchester

Figure 12 plots the aggregated incident locations as a Cartesian scatter, with point size indicating the number of incidents at each location. Large, multi-coloured points cluster around the town centre, coinciding with its concentration of retail outlets, nightlife, and transport hubs. The sparser points in peripheral areas are consistent with lower population density and fewer commercial premises there.
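Rounding coordinates to three decimal places, as in crime_agg above, bins incidents into cells whose physical size depends on latitude. The sketch below approximates the cell dimensions on a spherical Earth (mean radius assumed to be about 6,371 km) and the undistorted y/x aspect ratio for a longitude/latitude plot, 1/cos(latitude). At Colchester's latitude (about 51.89° N, the value passed to setView() for the Leaflet map) this gives cells of roughly 111 m by 69 m and a ratio of about 1.62, slightly larger than the 1.5 passed to coord_fixed(). Both helpers are hypothetical.

```r
# Hypothetical helpers; spherical-Earth approximation (mean radius ~6,371 km).
cell_size_m <- function(digits, lat_deg, earth_radius_m = 6371008.8) {
  step_deg  <- 10^(-digits)                # width of one rounding bin in degrees
  m_per_deg <- pi * earth_radius_m / 180   # metres per degree of latitude
  c(north_south = step_deg * m_per_deg,
    east_west   = step_deg * m_per_deg * cos(lat_deg * pi / 180))
}

# Aspect ratio (y/x) that makes one metre north-south as long as one metre east-west
lonlat_aspect_ratio <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

cell_size_m(3, 51.89)        # roughly 111 m north-south, 69 m east-west
lonlat_aspect_ratio(51.89)   # about 1.62
```

ggplot2's coord_quickmap() applies a similar latitude-based correction automatically and is a reasonable alternative to a hand-chosen fixed ratio.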

5.6 Interactive Leaflet Map

The interactive Leaflet map below combines a heatmap layer showing crime density with individual markers whose popups give the crime category, street name, date, and policing outcome. Users can zoom, pan, and toggle either layer on or off.

leaflet(crime) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addHeatmap(
    lng = ~long, lat = ~lat,
    intensity = 1, blur = 15, max = 10, radius = 12,
    group = "Heatmap"
  ) %>%
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = 3, stroke = FALSE, fillOpacity = 0.4,
    color = ~colorFactor("Set1", crime_group)(crime_group),
    popup = ~paste0(
      "<b>Category:</b> ", category, "<br>",
      "<b>Street:</b> ", street_name, "<br>",
      "<b>Date:</b> ", date, "<br>",
      "<b>Outcome:</b> ", outcome_grp
    ),
    group = "Points"
  ) %>%
  addLayersControl(
    overlayGroups = c("Heatmap", "Points"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  setView(lng = 0.9, lat = 51.89, zoom = 14)

Figure 13: Interactive Leaflet heatmap of crime density

The Leaflet map allows the data to be explored in fine detail. The heatmap layer identifies the corridor around the town centre as the primary crime hotspot, with smaller hotspots near the railway station and major retail areas. Clicking a marker opens a popup with the details of that incident.
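The heatmap identifies hotspots visually; a simple way to quantify them is to rank the rounded-coordinate cells used for Figure 12 by incident count. top_cells() below is a hypothetical helper that assumes numeric latitude and longitude vectors such as crime$lat and crime$long. Because the portal snaps incidents to anonymised map points (Section 8.1), a top-ranked cell may partly reflect snapping rather than a genuine concentration.

```r
library(dplyr)

# Hypothetical helper: rank rounded-coordinate cells by incident count
# and report each cell's share of all incidents.
top_cells <- function(lat, long, digits = 3, k = 5) {
  data.frame(lat_r = round(lat, digits), long_r = round(long, digits)) %>%
    count(lat_r, long_r, name = "incidents") %>%
    mutate(share = incidents / sum(incidents)) %>%
    arrange(desc(incidents)) %>%
    slice_head(n = k)
}

# Usage with the report's data (assumed columns):
# top_cells(crime$lat, crime$long, k = 10)
```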

6 Multivariate Analysis

Having examined crime and weather separately, the analysis now turns to how they co-vary. To assess whether monthly weather is statistically related to crime volume, this section uses scatter plots, an interactive bubble chart, a pair plot, and a correlation matrix heatmap.

6.1 Scatter: Temperature vs Crime

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_monthly, aes(x = temp_avg_mo, y = n_crimes))
g1 <- g + geom_point(size = 4, colour = "#2C3E50")
g2 <- g1 + geom_smooth(method = "lm", se = TRUE, colour = "#E74C3C",
                       fill = "#E74C3C", alpha = 0.15)
g3 <- g2 + geom_text(aes(label = month), hjust = -0.2, size = 3.5)
g4 <- g3 + labs(title    = "Average Temperature vs Monthly Crime Count",
                subtitle = "Linear regression smoother",
                x = "Mean monthly temperature (\u00b0C)", y = "Number of incidents")
g4

Figure 14: Monthly average temperature vs crime count

Figure 14 plots monthly crime counts against mean monthly temperature; the Pearson correlation between the two is r = 0.7. The positive slope of the fitted line shows that warmer months tend to record more crime. This is consistent with routine activity theory, under which warmer weather increases time spent in public spaces, reduces guardianship of homes, and increases the convergence of potential offenders and targets (Cohen & Felson, 1979). The month labels show which months drive the overall trend.
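With monthly aggregation the sample is small, so it is worth checking whether r = 0.7 is distinguishable from zero. The standard test converts r into a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), on n - 2 degrees of freedom. The sketch below assumes twelve monthly observations and takes the rounded r = 0.7 at face value; on the real data, cor.test(crime_monthly$temp_avg_mo, crime_monthly$n_crimes) performs the same test directly.

```r
# Two-sided significance test for a Pearson correlation r from n pairs:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution on n - 2 df.
r_to_p <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p_val  <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, p = p_val)
}

# Assumed: 12 monthly observations, r rounded to 0.7 as reported
r_to_p(0.7, 12)   # t is about 3.10; p lies between 0.01 and 0.02
```

Under these assumptions the temperature association is significant at the 5% level, although with only twelve points one or two unusual months could shift it noticeably.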

6.2 Scatter: Precipitation vs Crime

# Step-by-step ggplot object building (Lab 6/7 style)
g <- ggplot(crime_monthly, aes(x = precip_mo, y = n_crimes))
g1 <- g + geom_point(size = 4, colour = "#2C3E50")
g2 <- g1 + geom_smooth(method = "lm", se = TRUE, colour = "#E74C3C",
                       fill = "#E74C3C", alpha = 0.15)
g3 <- g2 + geom_text(aes(label = month), hjust = -0.2, size = 3.5)
g4 <- g3 + labs(title    = "Total Precipitation vs Monthly Crime Count",
                subtitle = "Linear regression smoother",
                x = "Total monthly precipitation (mm)", y = "Number of incidents")
g4

Figure 15: Monthly total precipitation vs crime count

Figure 15 examines the relationship between rainfall and crime, yielding a Pearson correlation of r = -0.26, considerably weaker than that for temperature (r = 0.7). Because summer months tend to be both warmer and drier, part of this weak negative association may simply reflect temperature rather than any independent effect of rainfall. The inverse relationship between rainfall and crime should therefore be interpreted with caution.
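One way to probe whether the weak rainfall association is simply inherited from temperature is a partial correlation: correlate the parts of crime and precipitation that temperature does not explain linearly. The sketch below (hypothetical helpers, not the analysis used in this report) computes it from regression residuals and cross-checks it against the closed-form expression in terms of the three pairwise correlations. With a single year of monthly data the estimate will be noisy, so it is best read as a diagnostic rather than a formal test.

```r
# Partial correlation of x and y controlling for z, via residuals:
# remove the linear effect of z from each variable, then correlate what is left.
partial_cor <- function(x, y, z) {
  rx <- resid(lm(x ~ z))
  ry <- resid(lm(y ~ z))
  cor(rx, ry)
}

# Equivalent closed form using only the pairwise Pearson correlations
partial_cor_formula <- function(x, y, z) {
  rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# Usage with the report's data (assumed columns):
# partial_cor(crime_monthly$precip_mo, crime_monthly$n_crimes,
#             crime_monthly$temp_avg_mo)
```

If the partial correlation is close to zero, the rainfall association in Figure 15 is largely accounted for by temperature.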

6.3 Interactive Bubble Chart

# Build ggplot object, then wrap with ggplotly (Lab 10 style)
G <- ggplot(crime_monthly, aes(x = temp_avg_mo, y = n_crimes,
                                size = precip_mo, colour = humidity_mo))
G <- G + geom_point(alpha = 0.7)
G <- G + geom_text(aes(label = month), vjust = -1.2, size = 3, show.legend = FALSE)
G <- G + scale_colour_viridis_c(option = "viridis")
G <- G + scale_size_continuous(range = c(3, 15))
G <- G + labs(title  = "Crime, Temperature, Precipitation & Humidity",
              x = "Average monthly temperature (\u00b0C)",
              y = "Monthly crime count",
              colour = "Humidity (%)", size = "Precip (mm)")

ggplotly(G)

Figure 16: Interactive bubble chart of temperature, crime, precipitation and humidity

The interactive bubble chart displays four variables at once: temperature on the x-axis, crime count on the y-axis, precipitation as bubble size, and humidity as colour. Hovering over a bubble reveals the month label and the exact values. This makes it easy to check whether warm, dry months coincide with higher crime counts.

6.4 Pair Plot (GGally)

pairs_data <- crime_monthly %>%
  select(n_crimes, temp_avg_mo, precip_mo, humidity_mo, wind_mo, sun_hours_mo) %>%
  rename(
    Crimes      = n_crimes,
    Temperature = temp_avg_mo,
    Precip      = precip_mo,
    Humidity    = humidity_mo,
    Wind        = wind_mo,
    Sunshine    = sun_hours_mo
  )

ggpairs(
  pairs_data,
  lower = list(continuous = wrap("smooth", alpha = 0.5, colour = "#3498DB")),
  diag  = list(continuous = wrap("densityDiag", fill = "#3498DB", alpha = 0.4)),
  upper = list(continuous = wrap("cor", size = 5)),
  title = "Pair Plot: Crime Count and Weather Variables"
) +
  theme(plot.title = element_text(face = "bold", size = 14))

Figure 17: Pair plot of crime and weather variables

Figure 17 shows the pairwise relationships between monthly crime counts and each weather variable. The upper triangle gives the correlation coefficients, the diagonal shows a univariate density plot for each variable, and the lower triangle shows scatter plots with fitted smoothers (GGally's "smooth" panel fits a linear regression line by default). This view is useful both for spotting unexpected relationships, such as that between humidity and crime, and for confirming expected ones, such as those involving temperature and sunshine.

6.5 Correlation Matrix Heatmap

# Correlation matrix and p-values (Lab 9/10 style using ggcorrplot)
cor_mat <- cor(pairs_data, use = "complete.obs")
p_mat   <- ggcorrplot::cor_pmat(pairs_data)

ggcorrplot::ggcorrplot(cor_mat,
                       hc.order = TRUE,
                       type     = "upper",
                       lab      = TRUE,
                       lab_size = 4,
                       p.mat    = p_mat,
                       insig    = "blank",
                       colors   = c("#3498DB", "white", "#E74C3C"),
                       title    = "Correlation Matrix: Crime and Weather Variables",
                       ggtheme  = theme_minimal(base_size = 13))

Figure 18: Correlation matrix heatmap with hierarchical ordering

Figure 18 presents the same correlation structure as a colour-coded heatmap with hierarchical ordering. Correlations that are not significant according to the p-value matrix are blanked, so that only statistically significant coefficients are displayed. Temperature (r = 0.7) and sunshine hours (r = 0.57) have the strongest positive correlations with monthly crime counts, while humidity has a strong negative correlation (r = -0.71). Precipitation shows a much weaker association (r = -0.26), which does not reach significance with so few monthly observations. The hierarchical ordering places temperature and sunshine hours together, reflecting their shared seasonality, and reinforces the view that temperature-related variables are more strongly associated with crime volume than wind or precipitation.
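Whether a coefficient survives the blanking depends heavily on sample size. Solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r gives the smallest |r| that reaches significance, r_crit = t_crit / sqrt(t_crit^2 + n - 2). Assuming twelve monthly observations and ggcorrplot's default 5% level, the threshold is about 0.576: precipitation (r = -0.26) falls well short, and sunshine (r = 0.57 as reported) sits just below it, so it is worth confirming whether its tile is retained in Figure 18. critical_r() is a hypothetical helper.

```r
# Smallest |r| significant at level alpha (two-sided) for n paired observations,
# obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

critical_r(12)   # about 0.576 for 12 monthly observations
critical_r(36)   # about 0.33 for a three-year monthly series
```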

7 Vulnerability and Equity

The spatial distributions and categorical patterns identified above have implications beyond academic description. Understanding which populations are disproportionately affected by crime is vital for formulating inclusive public policy.

The clustering of incidents in the town centre, including retail areas, the nightclub district, and transport nodes, suggests that people who pass through or work in these areas are disproportionately affected. Retail workers in particular, who tend to earn lower incomes and have little choice over their routes or working hours, are especially exposed to shoplifting, public order offences, and violence.

Anti-social behaviour (ASB), although a lower-severity category, can have a significant cumulative impact on quality of life. The data suggest that ASB occurs in both residential areas and open spaces, that is, in the everyday environments where people live and spend their leisure time. For vulnerable groups such as elderly people, disabled people, and families with young children, ASB can substantially erode the sense of safety that underpins community cohesion.

The outcome statistics further highlight the issue of equity. As shown in Section 3 and Figure 3, 67.3 percent of all incidents end with a “No Further Action” outcome. For victims of property crime and lower-level violence, this may make reporting seem futile, and if the pattern persists over many months it could undermine public trust in the reporting process. Already vulnerable communities that experience higher crime rates may be further discouraged from reporting crime to the authorities.

The growing university population in Colchester adds a further complication. Students who are new to the area and living in unfamiliar surroundings may be more exposed to violent and public-order offences, particularly those who frequent the nightclub corridor. They may also be less inclined to report crime or to engage with the criminal justice system.

Addressing these disparities requires not only more detailed statistics, such as linking crime data to the Index of Multiple Deprivation, but also qualitative research within communities to understand their lived experience.

8 Ethical Considerations

8.1 Privacy and Coordinate Obfuscation

The UK Police open data portal deliberately generalizes crime locations by snapping each incident to a nearby street segment or landmark rather than publishing the exact location. This is an important privacy safeguard, since more precise locations could stigmatise particular households or buildings. The trade-off is spatial uncertainty: analysts should be cautious when interpreting clusters, because an apparent concentration at a single point may be an artifact of many incidents being snapped to the same location.

8.2 Under-Reporting and Policing Intensity

Recorded crime statistics reflect both the underlying level of criminal activity and the way it is reported and recorded. Domestic violence, sexual offences, and hate crime are known to be under-reported, so the recorded pattern may not match the true pattern. Conversely, areas with a heavier police presence, such as the town centre, are likely to be over-represented simply because more offences are observed and recorded there. The high concentration of crime around the police station is therefore likely to reflect these recording effects as much as the true distribution of crime.

8.3 Limitations of Single-Station Weather Data

Using a single weather station to represent conditions across the whole of Colchester assumes that its readings are representative of the entire area. Microclimatic variation, for example in wind exposure or rainfall, can be significant at street level but is invisible in the averaged data. Daily and monthly averages also mask variation within the day: a warm afternoon may increase opportunities for crime while a cold morning suppresses them, yet both are folded into a single figure.

8.4 Single-Year Temporal Limitation

With only one year of data (2025), true seasonal effects cannot be separated from anomalies specific to that year. A warm winter, a major public event, or a change in policing strategy could produce patterns that would not recur in other years. Additional years of data are needed to estimate seasonal effects reliably while controlling for long-term trends.
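The limitation can be made concrete with a seasonal decomposition. R's stl() separates a series into seasonal, trend, and remainder components, but it requires more than two complete seasonal cycles, so a single year of monthly counts cannot be decomposed at all. The sketch below uses synthetic counts purely for illustration; can_decompose_monthly() and fake_counts() are hypothetical helpers.

```r
# Hypothetical check: can a monthly series be split into seasonal + trend
# components with stl()? stl() stops with an error when the series has
# too few seasonal cycles; the error is caught here and reported as FALSE.
can_decompose_monthly <- function(counts) {
  series <- ts(counts, frequency = 12, start = c(2025, 1))
  fit <- tryCatch(stl(series, s.window = "periodic"), error = function(e) NULL)
  !is.null(fit)
}

# Synthetic seasonal counts (illustrative only, not the report's data)
fake_counts <- function(n_months) {
  round(500 + 80 * sin(2 * pi * (seq_len(n_months) - 2) / 12))
}

can_decompose_monthly(fake_counts(12))  # FALSE: one year is not enough
can_decompose_monthly(fake_counts(36))  # TRUE: three years can be decomposed
```

With several years of real data, the seasonal component returned by stl() could then be compared with temperature directly, separately from any long-term trend.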

9 Conclusion

The first hypothesis, on seasonality, is largely supported, although the peak falls in late spring rather than high summer. Monthly crime volume varies clearly, peaking in May with 598 incidents and falling to a trough in January with 408 incidents. Average monthly temperature is positively correlated with crime volume (Pearson r = 0.7), as are sunshine hours (r = 0.57), while precipitation shows a weaker negative correlation (r = -0.26). This is consistent with routine activity theory: in warmer months with longer days, more people are out in public spaces, increasing the likelihood that potential offenders and victims converge (Cohen & Felson, 1979; Field, 1992).

The spatial hypothesis is also supported. The static scatter plot (Figure 12) and the interactive Leaflet heatmap (Figure 13) show clear concentrations of incidents in the town centre, retail areas, nightlife areas, and around the railway station. Although residential areas account for the highest absolute crime volume because of their size, crime density per unit area appears markedly higher in mixed-use areas. This is consistent with routine activity theory's account of crime concentration: such activity nodes bring together large numbers of people while guardianship remains limited.

The outcome hypothesis is also supported: 77.2% of recorded incidents end in “No Further Action” or with no formal policing outcome, and only 11.8% result in formal action.

Taken together, these findings indicate that in a mid-sized English town such as Colchester, crime is seasonal, with some offence types more likely at certain times of the year, and spatially concentrated in busy mixed-use areas. However, most recorded incidents do not translate into a formal outcome for victims.

As discussed in Section 8, several constraints limit the conclusions that can be drawn from these data: coordinate obfuscation, under-reporting of crime, reliance on a single weather station, and the one-year window.

Possible extensions of this study include using data over multiple years to determine if the effects seen in the data are actually seasonal or if they are year-specific anomalies. It might be possible to correlate crime statistics with the Index of Multiple Deprivation (IMD) at the ward level to determine if crime is concentrated in more disadvantaged areas.

10 References

UK Police Open Data Portal. Available at: https://data.police.uk/.

OGIMET Synoptic Data. Available at: https://www.ogimet.com/.

Czernecki, B. (2023). R climate package (OGIMET interface). Available at: https://bczernecki.github.io/climate/.

Tierney, N. (2020). UK Police API (R package). Available at: https://ukpolice.njtierney.com/.

Wickham, H. et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686.

Cohen, L. E. and Felson, M. (1979). Social Change and Crime Rate Trends: A Routine Activity Approach. American Sociological Review, 44(4), 588–608.

Field, S. (1992). The Effect of Temperature on Crime. The British Journal of Criminology, 32(3), 340–351.

Cohn, E. G. and Rotton, J. (2000). Weather, Seasonal Trends, and Property Crimes in Minneapolis, 1987–1988. Journal of Environmental Psychology, 20(3), 257–272.

MA304 Assignment – Crime & Weather in Colchester (2025)

2507713

02 April 2026