1 Introduction and Motivation

Crime is rarely random in time or place. Decades of criminological research show that offending in a town follows seasonal, weather-related, weekly and social rhythms. This report does two things: first, it takes one year of real, street-level police data for Colchester, a historic garrison town in Essex, England; second, it interrogates those data visually alongside the weather, posing a deceptively simple question: does the climate leave a fingerprint on the pattern of recorded crime?

The analysis is based on two datasets provided for the MA304 module. The first file, crime25.csv, includes 5,956 individual crime incidents reported in and around Colchester during the 12 calendar months of 2025, each with a crime category, approximate latitude/longitude, street label and investigative outcome. The second, temp25.csv, contains daily meteorological observations for the same year from a weather station close to the town, including temperature, humidity, wind, atmospheric pressure, cloud cover, sunshine and precipitation (Burt, 2024).

Rather than treating these as two unrelated tables, the report builds a single narrative in four movements. First, we determine what types of crime are committed and how they are resolved. Next, we look at when they happen, tracing the year’s temporal rhythm and smoothing out the noise to reveal the trend. We then turn to the weather and ask whether temperature and sunshine co-move with offending. Lastly, we map the areas of town where crime is concentrated. Each figure is interpreted in the text, and a consistent figure style is used throughout to allow easy comparison.

A short methodological note is needed first. The crime records are time-stamped only to the month (e.g. 2025-01), whereas the weather is recorded daily. This mismatch in temporal resolution matters and is respected throughout: any analysis that links the two datasets is necessarily done at the monthly level, giving twelve aligned observations. The climate–crime relationships reported below are therefore exploratory and descriptive, not confirmatory (twelve points are not enough for a strong inferential claim), and that is how we read them.
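To make the twelve-point caveat concrete, the sketch below computes the smallest absolute Pearson correlation that would reach two-sided significance with n = 12. It is illustrative only: the 5% level and the standard t-based test are assumptions made here, not choices taken from the analysis itself.

```r
# Smallest |r| that is significant at alpha = 0.05 (two-sided) when n = 12.
# Assumption: Pearson correlation tested with the usual t statistic on n - 2 df.
n      <- 12
alpha  <- 0.05
df     <- n - 2
t_crit <- qt(1 - alpha / 2, df)          # critical t value
r_crit <- t_crit / sqrt(t_crit^2 + df)   # inverts t = r * sqrt(df / (1 - r^2))
round(r_crit, 3)
```

Under these assumptions the threshold is roughly 0.58, so only fairly strong monthly associations would clear it, which is why the relationships are read descriptively.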

1.1 The Weather–Crime Hypothesis: A Brief Review

The idea that weather influences crime is not a myth; it is one of the oldest empirically studied questions in criminology, with a scholarly history dating back over 150 years (Corcoran and Zahnow, 2022). Two theoretical traditions dominate the modern literature, and they make overlapping predictions for a temperate town such as Colchester.

The first is the temperature–aggression (or “heat”) hypothesis, most thoroughly formulated by Craig Anderson and his associates. In a series of archival studies, Anderson demonstrated that violent crime rates are higher in warmer months of the year, in warmer years and in warmer cities, and that the effect is greater for violent than for non-violent crimes (Miles-Novelo and Anderson, 2022). One possible mechanism is physiological: uncomfortable heat increases hostile affect and aggressive cognition, which raises the likelihood of a marginal interaction escalating to violence. A recent meta-analysis broadly confirms the heat–aggression relation but warns of heterogeneous, context-dependent effect sizes (Lynott et al., 2023), and a 2024 systematic review of studies from around the world similarly finds that the relation is positive, but with more mixed effects in temperate than in tropical climates (Lynott et al., 2023).

The second tradition is routine activity theory (Cohen and Felson, 1979), which focuses not on the disposition of the offender but on the everyday convergence of a motivated offender, a suitable target and the absence of a capable guardian. Warmer, brighter weather changes routine activities: people leave the home, socialise outdoors, frequent the night-time economy and leave property exposed, multiplying opportunities for both violent and acquisitive crime. Importantly for this report, the most relevant UK evidence comes from this tradition: Field analysed recorded crime for England and Wales and found that temperature has a positive effect on most property and violent crime, independent of season, but found no relationship with rainfall or hours of sunshine. That nuance gives us a tangible, locally relevant benchmark against which to compare our own Colchester results, and a genuine analytical question: whether they differ from Field’s.

1.2 Aims of This Report

With this backdrop, this report has four concrete objectives: (i) to describe the structure, composition and data quality of the Colchester 2025 crime and weather datasets using rigorous descriptive statistics; (ii) to characterise the temporal rhythm of recorded crime, and isolate the trend from the noise using smoothing; (iii) to test, at the monthly resolution the data allow, whether temperature, sunshine and other weather variables co-move with recorded crime, and to place the result in the literature above; and (iv) to map the spatial concentration of crime across the town. Each claim is linked to a figure or table and each figure is explained.

How to use the interactive parts of this report. Several figures are interactive and render directly in the HTML output, requiring no extra setup. Charts built with plotly respond to the mouse: hover over any point or bar to read its exact value, drag to zoom, and double-click to reset. The DT tables can be searched, sorted by clicking a column header, and paged. The leaflet map can be panned and zoomed; clusters expand on click and markers reveal a popup. No buttons need to be pressed and no external service is required — simply open the knitted .html file in any modern browser.

2 Data Import

The crime table has thirteen columns, but several carry little information for our purposes (persistent_id and context are largely empty, and the location type is almost constant). The weather table has eighteen columns, two of which (PreselevHp and SnowDepcm) are almost entirely missing because Colchester rarely sees snow and the station did not log sea-level-adjusted pressure. We deal with this explicitly in the cleaning step rather than letting it silently distort the later plots.

# ---- Crime data --------------------------------------------------------------
crime_raw <- readr::read_csv("crime25.csv", show_col_types = FALSE)

# ---- Weather data ------------------------------------------------------------
temp_raw  <- readr::read_csv("temp25.csv",  show_col_types = FALSE)

# Quick structural glance (dimensions only, to keep the output compact)
cat("Crime data:  ", nrow(crime_raw), "rows ×", ncol(crime_raw), "columns\n")

## Crime data:   5956 rows × 13 columns

cat("Weather data:", nrow(temp_raw),  "rows ×", ncol(temp_raw),  "columns\n")

## Weather data: 365 rows × 18 columns
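A quick way to back up statements about sparsely populated columns is to compute the share of missing values per column. The sketch below uses a small made-up data frame (`toy`, hypothetical) in place of `crime_raw` or `temp_raw`; only the column names are borrowed from the weather file, and the values are invented for illustration.

```r
# Share of missing values per column, sorted from most to least missing.
# `toy` is a hypothetical stand-in for the imported tables.
toy <- data.frame(
  PreselevHp = c(NA, NA, NA, 1012.3),
  SnowDepcm  = c(NA, NA, NA, NA),
  Precmm     = c(0.0, 0.2, 1.4, 0.0)
)
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
round(100 * missing_share, 1)
```

The same two lines could be pointed at the real tables once the CSV files are loaded, which would show which columns are candidates for removal.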

3 Data Cleaning and Wrangling

Disciplined data preparation is the foundation of good visualisation. This section converts text to proper dates, creates the analytical variables we will use many times (month, season, outcome grouping), coerces the weather variables to numeric and removes empty and redundant columns. Each step is commented so that the logic can be followed and reproduced.
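As a minimal illustration of the two central derivations (month date and season), the base-R sketch below works on three made-up year-month strings. It is an alternative formulation for illustration, not the report's own pipeline, which relies on zoo and dplyr; the names `ym` and `season_lookup` are hypothetical.

```r
# Convert year-month text such as "2025-01" into a Date on the first of the
# month, using base R only.
ym         <- c("2025-01", "2025-07", "2025-12")   # illustrative values
month_date <- as.Date(paste0(ym, "-01"), format = "%Y-%m-%d")
month_num  <- as.integer(format(month_date, "%m"))

# Map month number (1 to 12) to meteorological season with a lookup vector.
season_lookup <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
                   "Summer", "Summer", "Autumn", "Autumn", "Autumn", "Winter")
season <- factor(season_lookup[month_num],
                 levels = c("Winter", "Spring", "Summer", "Autumn"))
data.frame(ym, month_date, month_num, season)
```

Fixing the factor levels in calendar order keeps seasons from being sorted alphabetically in later tables and plots.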

# -----------------------------------------------------------------------------#
# CLEANING THE CRIME DATA
# -----------------------------------------------------------------------------#
crime <- crime_raw %>%
  # 1. Drop the anonymous index column and the near-empty/constant columns.
  select(-1, -persistent_id, -context, -location_subtype) %>%
  # 2. The 'date' field is year-month text (e.g. "2025-01"). Convert it to a
  #    proper monthly date anchored to the first of the month, and to an ordered
  #    month label for plotting.
  mutate(
    month_date = as.Date(zoo::as.yearmon(date, "%Y-%m")),
    month_lab  = factor(format(month_date, "%b"),
                        levels = format(seq(as.Date("2025-01-01"),
                                            as.Date("2025-12-01"),
                                            by = "month"), "%b")),
    month_num  = lubridate::month(month_date)
  ) %>%
  # 3. Derive a meteorological SEASON from the month — central to our story.
  mutate(
    season = case_when(
      month_num %in% c(12, 1, 2) ~ "Winter",
      month_num %in% c(3, 4, 5)  ~ "Spring",
      month_num %in% c(6, 7, 8)  ~ "Summer",
      month_num %in% c(9, 10, 11)~ "Autumn"
    ),
    season = factor(season, levels = c("Winter", "Spring", "Summer", "Autumn"))
  ) %>%
  # 4. Tidy the category text into Title Case for readable labels.
  mutate(
    category_clean = stringr::str_to_title(stringr::str_replace_all(category, "-", " "))
  ) %>%
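  # 5. Collapse the many fine-grained outcomes into a compact, interpretable
  #    set of outcome GROUPS, and flag whether an outcome was recorded at all.
  #    case_when() uses the first rule that matches and str_detect() is
  #    case-sensitive, so rule order and capitalisation both matter: any status
  #    containing "prosecut" is assigned to "Justice action" before the
  #    "Unable" rule is tried.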
  mutate(
    outcome_status = ifelse(is.na(outcome_status), "No outcome recorded", outcome_status),
    outcome_group = case_when(
      str_detect(outcome_status, "court|prosecut|caution") ~ "Justice action",
      str_detect(outcome_status, "no suspect|Unable")       ~ "No suspect / unable",
      str_detect(outcome_status, "investigation|investig")  ~ "Under investigation",
      str_detect(outcome_status, "another organisation")    ~ "Referred elsewhere",
      TRUE                                                   ~ "Other / unavailable"
    )
  )


# 6. Keep only incidents that have both coordinates recorded.
crime <- crime %>% filter(!is.na(lat), !is.na(long))
glimpse(crime)

## Rows: 5,956
## Columns: 15
## $ category       <chr> "anti-social-behaviour", "anti-social-behaviour", "anti…
## $ date           <chr> "2025-01", "2025-01", "2025-01", "2025-01", "2025-01", …
## $ lat            <dbl> 51.89693, 51.89010, 51.89626, 51.88983, 51.88907, 51.88…
## $ long           <dbl> 0.893780, 0.906786, 0.883590, 0.902234, 0.897722, 0.897…
## $ street_id      <dbl> 2152981, 2153318, 2152722, 2153203, 2153077, 2153077, 2…
## $ street_name    <chr> "On or near St Paul's Road", "On or near Parking Area",…
## $ id             <dbl> 125619997, 125620182, 125620201, 125620204, 125619811, …
## $ location_type  <chr> "Force", "Force", "Force", "Force", "Force", "Force", "…
## $ outcome_status <chr> "No outcome recorded", "No outcome recorded", "No outco…
## $ month_date     <date> 2025-01-01, 2025-01-01, 2025-01-01, 2025-01-01, 2025-0…
## $ month_lab      <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, …
## $ month_num      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ season         <fct> Winter, Winter, Winter, Winter, Winter, Winter, Winter,…
## $ category_clean <chr> "Anti Social Behaviour", "Anti Social Behaviour", "Anti…
## $ outcome_group  <chr> "Other / unavailable", "Other / unavailable", "Other / …
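Because the outcome grouping depends on rule order and on case-sensitive matching, it can help to run the rules on a handful of example texts. The sketch below applies the same patterns, in the same order, to five illustrative outcome strings; these strings are assumptions chosen for the example and are not read from crime25.csv.

```r
library(dplyr)
library(stringr)

# Illustrative outcome texts (assumed for this sketch, not read from the data).
statuses <- c("Unable to prosecute suspect",
              "Investigation complete; no suspect identified",
              "Under investigation",
              "Awaiting court outcome",
              "Court result unavailable")

# Same rules, in the same order, as the cleaning step.
grouped <- case_when(
  str_detect(statuses, "court|prosecut|caution") ~ "Justice action",
  str_detect(statuses, "no suspect|Unable")       ~ "No suspect / unable",
  str_detect(statuses, "investigation|investig")  ~ "Under investigation",
  str_detect(statuses, "another organisation")    ~ "Referred elsewhere",
  TRUE                                            ~ "Other / unavailable"
)
data.frame(status = statuses, group = grouped)
```

On these examples the first text lands in "Justice action" because it contains "prosecut", and the last falls through to "Other / unavailable" because "court" does not match the capitalised "Court". Tabulating `outcome_status` against `outcome_group` on the real data would show whether either effect occurs there.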

# -----------------------------------------------------------------------------#
# CLEANING THE WEATHER DATA
# -----------------------------------------------------------------------------#
temp <- temp_raw %>%
  # 1. Parse the date, then derive the month key, month number and season.
  mutate(
    Date      = as.Date(Date),
    month_key = format(Date, "%Y-%m"),
    month_num = lubridate::month(Date),
    season    = case_when(
      month_num %in% c(12, 1, 2) ~ "Winter",
      month_num %in% c(3, 4, 5)  ~ "Spring",
      month_num %in% c(6, 7, 8)  ~ "Summer",
      TRUE                       ~ "Autumn"
    ),
    season = factor(season, levels = c("Winter", "Spring", "Summer", "Autumn"))
  ) %>%
  # 2. SunD1h was imported as text (quoted) — coerce it to numeric. Coerce the
  #    other measured columns too, in case of stray text.
  mutate(
    SunD1h      = suppressWarnings(as.numeric(SunD1h)),
    WindkmhInt  = suppressWarnings(as.numeric(WindkmhInt)),
    WindkmhGust = suppressWarnings(as.numeric(WindkmhGust))
  ) %>%
  # 3. Drop columns that are essentially all missing and add no analytical value.
  select(-PreselevHp, -SnowDepcm)

# Rename the key weather variables to friendlier names used in plots/labels.
temp <- temp %>%
  rename(
    temp_avg = TemperatureCAvg,
    temp_max = TemperatureCMax,
    temp_min = TemperatureCMin,
    humidity = HrAvg,
    wind_kmh = WindkmhInt,
    gust_kmh = WindkmhGust,
    pressure = PresslevHp,
    precip_mm= Precmm,
    cloud_oct= TotClOct,
    sun_hours= SunD1h,
    vis_km   = VisKm
  )

glimpse(temp)

## Rows: 365
## Columns: 19
## $ station_ID <dbl> 3590, 3590, 3590, 3590, 3590, 3590, 3590, 3590, 3590, 3590,…
## $ Date       <date> 2025-12-31, 2025-12-30, 2025-12-29, 2025-12-28, 2025-12-27…
## $ temp_avg   <dbl> 1.7, 4.4, 4.6, 5.5, 2.3, 3.3, 3.5, 5.4, 7.5, 9.1, 7.2, 6.2,…
## $ temp_max   <dbl> 6.1, 5.9, 6.8, 6.8, 5.0, 4.6, 5.4, 7.0, 10.0, 10.1, 9.3, 11…
## $ temp_min   <dbl> -0.9, 0.9, 0.9, -0.7, -0.7, 2.0, 2.3, 3.8, 6.0, 6.8, 2.9, 2…
## $ TdAvgC     <dbl> 0.3, 2.4, 2.4, 3.9, 0.8, -1.1, -1.1, 2.9, 6.2, 8.6, 6.4, 4.…
## $ humidity   <dbl> 90.6, 87.2, 86.3, 88.5, 90.5, 73.6, 72.0, 84.6, 91.6, 96.3,…
## $ WindkmhDir <chr> "NNW", "NNE", "NE", "NE", "NE", "ENE", "ENE", "ENE", "E", "…
## $ wind_kmh   <dbl> 11.9, 13.4, 14.6, 18.8, 17.0, 22.7, 27.0, 23.1, 13.0, 13.9,…
## $ gust_kmh   <dbl> 31.5, 31.5, 31.5, 44.5, 33.4, 55.6, 53.7, 48.2, 29.7, 29.7,…
## $ pressure   <dbl> 1033.4, 1029.9, 1031.3, 1035.2, 1032.0, 1029.4, 1033.3, 102…
## $ precip_mm  <dbl> 0.0, 0.2, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.2, 0.6, 0.0,…
## $ cloud_oct  <dbl> 3.8, 7.2, 6.3, 7.0, 1.8, 3.6, 3.7, 8.0, 4.1, 7.6, 5.8, 0.6,…
## $ lowClOct   <dbl> 5.0, 7.2, 7.5, 7.7, 6.1, 6.7, 5.2, 8.0, 5.8, 7.6, 6.3, 2.8,…
## $ sun_hours  <dbl> 4.1, 0.0, 0.0, 0.0, 2.5, 6.2, 1.9, 0.0, 0.6, 0.0, 3.4, 5.9,…
## $ vis_km     <dbl> 24.0, 27.4, 18.6, 24.3, 17.7, 29.2, 37.9, 14.2, 12.4, 6.1, …
## $ month_key  <chr> "2025-12", "2025-12", "2025-12", "2025-12", "2025-12", "202…
## $ month_num  <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,…
## $ season     <fct> Winter, Winter, Winter, Winter, Winter, Winter, Winter, Win…

The cleaning leaves us with two analysis-ready tables: a tidy crime table of incidents (one row per crime, enriched with month, season, clean category and outcome group) and a tidy daily weather table with sensible variable names and the empty columns removed. Now we’re ready to describe the data before we link them.
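One check worth making after any `suppressWarnings(as.numeric(...))` coercion is how many values were present as text but lost as numbers. The sketch below shows the idea on a made-up character vector (`raw`, hypothetical); it does not reproduce the contents of the real sunshine column.

```r
# Count values that become NA only because of numeric coercion.
# `raw` is a made-up character vector standing in for a text-imported column.
raw <- c("4.1", "0.0", "", "2.5", NA, "n/a")
num <- suppressWarnings(as.numeric(raw))

introduced_na <- is.na(num) & !is.na(raw)   # present as text, lost as number
sum(introduced_na)
raw[introduced_na]
```

Inspecting the offending strings before discarding the warnings makes it clear whether the new missing values are genuine gaps or formatting problems.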

4 Descriptive Statistics and Data Quality

Before any visualisation, a disciplined report quantifies what it is working with. This section profiles both datasets numerically: the central tendency, dispersion, shape and completeness of each weather variable, and the frequency structure of each categorical crime field. These tables are the statistical backbone of the report: every subsequent plot is, in effect, a picture of a number that appears in them (Park et al., 2022).

4.1 Numerical Summary of the Weather Variables

The table below reports, for each daily weather variable, the count of valid observations, the number missing, the mean and standard deviation, the full five-number summary (minimum, lower quartile, median, upper quartile, maximum), the interquartile range, and two shape statistics: skewness (asymmetry) and kurtosis (tailedness). Skewness near zero and excess kurtosis near zero indicate an approximately normal variable; large positive skewness flags a long right tail. We compute these with psych::describe() and present them in a styled table.

weather_vars <- temp %>%
  select(temp_avg, temp_max, temp_min, humidity, wind_kmh, gust_kmh,
         pressure, precip_mm, cloud_oct, sun_hours, vis_km)

desc_w <- psych::describe(weather_vars, fast = FALSE,
                          quant = c(.25, .75), IQR = TRUE)

desc_tbl <- desc_w %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Variable") %>%
  mutate(Missing = nrow(temp) - n) %>%
  transmute(
    Variable,
    N = n, Missing,
    Mean = round(mean, 2), SD = round(sd, 2),
    Min = round(min, 1), Q1 = round(Q0.25, 1),
    Median = round(median, 1), Q3 = round(Q0.75, 1),
    Max = round(max, 1), IQR = round(IQR, 1),
    Skew = round(skew, 2), Kurtosis = round(kurtosis, 2)
  )

desc_tbl %>%
  kbl(caption = "Table 1. Descriptive statistics for the daily weather variables, 2025.",
      align = c("l", rep("r", 12))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 12) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  column_spec(1, bold = TRUE)

Table 1. Descriptive statistics for the daily weather variables, 2025.
Variable	N	Missing	Mean	SD	Min	Q1	Median	Q3	Max	IQR	Skew	Kurtosis
temp_avg	365	0	11.09	5.74	-2.1	7.0	11.1	15.6	23.4	8.6	-0.03	-0.83
temp_max	365	0	15.60	6.85	1.4	10.5	15.6	20.8	32.3	10.3	0.01	-0.79
temp_min	365	0	6.12	5.15	-6.5	2.0	6.3	10.2	17.7	8.2	-0.08	-0.88
humidity	365	0	77.69	11.10	46.6	68.5	77.7	87.5	98.9	19.0	-0.22	-0.85
wind_kmh	365	0	15.92	5.42	5.1	12.0	15.2	19.2	45.8	7.2	1.02	2.72
gust_kmh	365	0	39.09	12.06	14.8	29.7	37.1	46.3	81.5	16.6	0.72	0.40
pressure	365	0	1015.56	10.24	982.1	1008.5	1016.4	1022.7	1040.7	14.2	-0.37	0.18
precip_mm	365	0	1.31	3.18	0.0	0.0	0.0	1.0	35.6	1.0	5.15	40.47
cloud_oct	365	0	4.53	2.45	0.0	2.5	4.8	6.7	8.0	4.2	-0.29	-1.11
sun_hours	364	1	5.15	4.33	0.0	1.0	4.5	8.4	15.3	7.4	0.44	-0.96
vis_km	365	0	30.93	13.21	1.0	22.1	31.2	40.6	68.3	18.5	-0.10	-0.61

There is a lot to learn from the table. The daily mean temperature in Colchester averages about 11 °C with a wide range: the coldest daily mean is below freezing (-2.1 °C) and the warmest is 23.4 °C, while daily maxima reach 32.3 °C. This is the typical range of a maritime-temperate climate in eastern England.

Precipitation is by far the most skewed variable: the median (0.0 mm) is well below the mean (1.31 mm) and the skewness is high (5.15), because most days have little or no rain while a few days have a great deal (a typical right-skewed rainfall distribution). Sunshine hours and cloud cover show the expected mirror-image behaviour, with mild skew of opposite sign (0.44 and -0.29). The temperature variables are close to symmetric (skewness within ±0.1), while atmospheric pressure has the excess kurtosis closest to zero (0.18) and only mild left skew (-0.37), with the middle half of days lying between about 1008 and 1023 hPa. The Missing column validates our cleaning choices: after discarding the two near-empty fields, only one value is missing (a single day of sunshine hours), so no imputation is needed and subsequent summaries are based on virtually complete data.
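The shape statistics discussed above can be written out by hand to show what they measure. The sketch below uses the simple moment estimators on two made-up vectors; `moment_shape` is a hypothetical helper, and psych::describe() uses a slightly different small-sample variant by default (controlled by its `type` argument), so its values will not match these exactly.

```r
# Moment-based skewness and excess kurtosis, written out by hand.
# These are the plain moment estimators, not the psych::describe() defaults.
moment_shape <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)
  c(skew            = mean(d^3) / m2^1.5,
    excess_kurtosis = mean(d^4) / m2^2 - 3)
}

symmetric    <- c(1, 2, 3, 4, 5)          # made-up, perfectly symmetric
right_tailed <- c(0, 0, 0, 0, 1, 2, 30)   # made-up, rain-like: many zeros, one big value
moment_shape(symmetric)
moment_shape(right_tailed)
```

The symmetric vector has zero skewness, while the rain-like vector has clearly positive skewness, which is the pattern that sets precipitation apart in Table 1.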

A useful one-number complement to the table is the coefficient of variation (CV = SD ÷ mean), which expresses dispersion on a scale-free basis and so lets us compare the volatility of variables measured in different units.

cv_tbl <- weather_vars %>%
  summarise(across(everything(),
                   list(mean = ~mean(.x, na.rm = TRUE),
                        sd   = ~sd(.x, na.rm = TRUE)))) %>%
  tidyr::pivot_longer(everything(),
                      names_to = c("Variable", ".value"),
                      names_sep = "_(?=[^_]+$)") %>%
  mutate(CV = round(100 * sd / mean, 1)) %>%
  arrange(desc(CV)) %>%
  transmute(Variable, Mean = round(mean, 2), SD = round(sd, 2),
            `CV (%)` = CV)

cv_tbl %>%
  kbl(caption = "Table 2. Coefficient of variation, ranking the weather variables by relative volatility.",
      align = "lrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 2. Coefficient of variation, ranking the weather variables by relative volatility.
Variable	Mean	SD	CV (%)
precip_mm	1.31	3.18	242.8
temp_min	6.12	5.15	84.2
sun_hours	5.15	4.33	84.1
cloud_oct	4.53	2.45	54.2
temp_avg	11.09	5.74	51.8
temp_max	15.60	6.85	43.9
vis_km	30.93	13.21	42.7
wind_kmh	15.92	5.42	34.0
gust_kmh	39.09	12.06	30.9
humidity	77.69	11.10	14.3
pressure	1015.56	10.24	1.0

Ranked by relative volatility, precipitation is by far the most variable (its standard deviation is more than twice its mean), followed by minimum temperature and sunshine hours; humidity and, above all, pressure are comparatively stable. This matters for the later weather–crime analysis: a variable that varies little over the year (pressure) has little explanatory power by construction, while the much more volatile temperature and sunshine series carry the signal most likely to track a similarly seasonal crime series.

4.2 Frequency Structure of the Crime Data

The categorical crime fields are now profiled. The first table is a complete frequency distribution of crime categories with cumulative percentages, which quantifies the concentration we will visualise shortly.

cat_freq <- crime %>%
  count(category_clean, name = "Frequency") %>%
  arrange(desc(Frequency)) %>%
  mutate(
    Percent = 100 * Frequency / sum(Frequency),
    `Cumulative %` = cumsum(Percent)
  ) %>%
  mutate(Percent = round(Percent, 1),
         `Cumulative %` = round(`Cumulative %`, 1))

cat_freq %>%
  kbl(caption = "Table 3. Frequency distribution of crime categories with cumulative share.",
      col.names = c("Crime category", "Frequency", "Percent", "Cumulative %"),
      align = "lrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 3. Frequency distribution of crime categories with cumulative share.
Crime category	Frequency	Percent	Cumulative %
Violent Crime	2439	41.0	41.0
Shoplifting	709	11.9	52.9
Anti Social Behaviour	590	9.9	62.8
Public Order	452	7.6	70.3
Criminal Damage Arson	403	6.8	77.1
Other Theft	348	5.8	83.0
Vehicle Crime	291	4.9	87.8
Drugs	197	3.3	91.2
Burglary	140	2.4	93.5
Bicycle Theft	102	1.7	95.2
Other Crime	89	1.5	96.7
Robbery	80	1.3	98.1
Theft From The Person	63	1.1	99.1
Possession Of Weapons	53	0.9	100.0

The cumulative column makes the concentration explicit and allows a Pareto-like reading: the three most common categories alone account for well over half of all recorded incidents (62.8%), and the top five cover 77.1%, close to the familiar “vital few” pattern. The analytical implication is that interpretive effort is best focused on these few high-volume categories rather than spread evenly across all fourteen.
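The Pareto-style reading can be reproduced directly from the counts in Table 3. The sketch below copies those counts into a vector and finds how many of the largest categories are needed to reach a given cumulative share; the helper `n_for_share` is hypothetical and introduced only for this illustration.

```r
# Category counts copied from Table 3, already sorted from largest to smallest.
freq <- c(2439, 709, 590, 452, 403, 348, 291, 197, 140, 102, 89, 80, 63, 53)

cum_share <- cumsum(freq) / sum(freq)

# Number of top categories needed to reach a given cumulative share.
n_for_share <- function(target) which(cum_share >= target)[1]

n_for_share(0.50)
n_for_share(0.80)
```

With these counts, two categories pass the 50% mark, five reach 77.1%, and a sixth is needed to pass 80%.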

A small summary table captures the key descriptive statistics of the entire crime dataset: the “dataset fingerprint” that a reader needs at the outset.

overview <- tibble::tibble(
  Statistic = c("Total recorded incidents",
                "Distinct crime categories",
                "Distinct outcome statuses",
                "Distinct streets referenced",
                "Months covered",
                "Busiest month (count)",
                "Quietest month (count)",
                "Mean incidents per month",
                "Most common category",
                "Share with no positive justice outcome"),
  Value = c(
    format(nrow(crime), big.mark = ","),
    dplyr::n_distinct(crime$category_clean),
    dplyr::n_distinct(crime$outcome_status),
    dplyr::n_distinct(crime$street_name),
    dplyr::n_distinct(crime$month_date),
    {m <- crime %>% count(month_lab) %>% slice_max(n, n = 1)
     paste0(m$month_lab, " (", m$n, ")")},
    {m <- crime %>% count(month_lab) %>% slice_min(n, n = 1)
     paste0(m$month_lab, " (", m$n, ")")},
    round(nrow(crime) / dplyr::n_distinct(crime$month_date), 1),
    crime %>% count(category_clean) %>% slice_max(n, n = 1) %>% pull(category_clean),
    paste0(round(100 * mean(crime$outcome_group %in%
                 c("No suspect / unable", "Under investigation",
                   "Other / unavailable")), 1), "%")
  )
)

overview %>%
  kbl(caption = "Table 4. Crime dataset at a glance.",
      align = "lr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  column_spec(1, bold = TRUE)

Table 4. Crime dataset at a glance.
Statistic	Value
Total recorded incidents	5,956
Distinct crime categories	14
Distinct outcome statuses	12
Distinct streets referenced	343
Months covered	12
Busiest month (count)	May (598)
Quietest month (count)	Jan (408)
Mean incidents per month	496.3
Most common category	Violent Crime
Share with no positive justice outcome	61%

This one table encapsulates the entire crime dataset in ten facts: how big it is, how wide it is in terms of crime types, streets and months, how far apart the busiest and quietest months are, and, most importantly, that a clear majority of incidents (61%) are not followed by a positive justice outcome. The movements that follow unpack each of these numbers.

4.3 Monthly and Seasonal Aggregates

Finally, as the monthly level is where crime and weather will be combined, the important variables are tabulated by month, to provide a preview of the combined analysis and a reference table that the reader can refer to throughout.

monthly_overview <- crime %>%
  mutate(month_key = format(month_date, "%Y-%m")) %>%
  count(month_key, month_lab, season, name = "crimes") %>%
  left_join(
    temp %>% mutate(month_key = format(Date, "%Y-%m")) %>%
      group_by(month_key) %>%
      summarise(temp_avg = mean(temp_avg, na.rm = TRUE),
                sun_hours = sum(sun_hours, na.rm = TRUE),
                precip_mm = sum(precip_mm, na.rm = TRUE),
                .groups = "drop"),
    by = "month_key"
  ) %>%
  arrange(month_key) %>%
  transmute(Month = month_lab, Season = season, Crimes = crimes,
            `Mean °C` = round(temp_avg, 1),
            `Sunshine (h)` = round(sun_hours, 0),
            `Rainfall (mm)` = round(precip_mm, 0))

monthly_overview %>%
  kbl(caption = "Table 5. Monthly crime counts alongside summary weather, 2025.",
      align = "llrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 5. Monthly crime counts alongside summary weather, 2025.
Month	Season	Crimes	Mean °C	Sunshine (h)	Rainfall (mm)
Jan	Winter	408	3.5	70	59
Feb	Winter	465	4.5	52	36
Mar	Spring	447	7.0	218	4
Apr	Spring	518	10.0	258	22
May	Spring	598	12.7	243	24
Jun	Summer	531	17.6	261	37
Jul	Summer	510	18.5	202	35
Aug	Summer	562	17.9	195	24
Sep	Autumn	472	14.2	175	37
Oct	Autumn	532	11.4	77	57
Nov	Autumn	477	8.6	55	99
Dec	Winter	436	6.9	67	44

# Seasonal roll-up
crime %>%
  count(season, name = "crimes") %>%
  left_join(
    temp %>% group_by(season) %>%
      summarise(`Mean °C` = round(mean(temp_avg, na.rm = TRUE), 1),
                `Mean sun (h/day)` = round(mean(sun_hours, na.rm = TRUE), 1),
                .groups = "drop"),
    by = "season"
  ) %>%
  rename(Season = season, Crimes = crimes) %>%
  kbl(caption = "Table 6. Seasonal totals: crime counts and average weather.",
      align = "lrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 6. Seasonal totals: crime counts and average weather.
Season	Crimes	Mean °C	Mean sun (h/day)
Winter	1309	5.0	2.1
Spring	1563	9.9	7.8
Summer	1603	18.0	7.2
Autumn	1481	11.4	3.4

At first sight these aggregates preview the main conclusion of the report. The warmer, sunnier months and seasons tend to have the higher crime counts, and the colder, darker months the lowest. Whether this alignment is a true weather signal or simply the coincidence of two independently seasonal series is the subject of the remainder of the report; the descriptive tables have indicated where to look.
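As a rough, exploratory check of the alignment described above, the monthly values in Table 5 can be correlated directly. The sketch below copies the crime counts and mean temperatures from that table; reporting a rank-based (Spearman) coefficient alongside Pearson's r is an assumption of this sketch, not necessarily the method used later in the report.

```r
# Monthly values copied from Table 5 (January to December).
crimes   <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
temp_avg <- c(3.5, 4.5, 7.0, 10.0, 12.7, 17.6, 18.5, 17.9, 14.2, 11.4, 8.6, 6.9)

# Spearman is less sensitive to a single unusual month than Pearson's r,
# so both are shown side by side.
rho <- cor(crimes, temp_avg, method = "spearman")
r   <- cor(crimes, temp_avg, method = "pearson")
c(spearman = rho, pearson = r)
```

Both coefficients are positive for these twelve points, but with so few observations they describe the 2025 pattern and do not establish a general effect.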

5 Movement I — What Crime Occurs, and How Is It Resolved?

5.1 Numerical Summary (Frequency Table)

Each type of crime is first introduced in a plain table before any picture. The table below shows the number of incidents in each category and its share of all incidents recorded during the year, ordered from highest to lowest.

cat_summary <- crime %>%
  count(category_clean, name = "Incidents") %>%
  mutate(Share = Incidents / sum(Incidents)) %>%
  arrange(desc(Incidents))

cat_summary %>%
  mutate(Share = scales::percent(Share, accuracy = 0.1)) %>%
  kbl(caption = "Table 7. Recorded crime in Colchester, 2025, by category.",
      col.names = c("Crime category", "Incidents", "Share of total"),
      align = c("l", "r", "r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 7. Recorded crime in Colchester, 2025, by category.
Crime category	Incidents	Share of total
Violent Crime	2439	41.0%
Shoplifting	709	11.9%
Anti Social Behaviour	590	9.9%
Public Order	452	7.6%
Criminal Damage Arson	403	6.8%
Other Theft	348	5.8%
Vehicle Crime	291	4.9%
Drugs	197	3.3%
Burglary	140	2.4%
Bicycle Theft	102	1.7%
Other Crime	89	1.5%
Robbery	80	1.3%
Theft From The Person	63	1.1%
Possession Of Weapons	53	0.9%

The table tells a stark story of concentration. Violent crime alone makes up about two out of every five recorded incidents, far outstripping all other categories. Shoplifting, anti-social behaviour, public order and criminal damage / arson are a clear second tier and weapons possession, robbery and theft from the person are a long tail of comparatively rare events. This is a heavy skew – a few categories doing most of the work – and is a trait that will be repeated in almost all future visualisations, and why we prefer ordered bar charts to pie charts for the detailed breakdowns.

5.2 The Composition of Crime (Bar Plot)

A horizontal bar chart displays the same information much faster than the table. Bars are sorted by frequency and coloured on a single perceptual scale, with counts marked at the ends of the bars, making the chart self-contained.

ggplot(cat_summary,
       aes(x = Incidents, y = reorder(category_clean, Incidents),
           fill = Incidents)) +
  geom_col(width = 0.78) +
  geom_text(aes(label = scales::comma(Incidents)),
            hjust = -0.15, size = 3.4, colour = "grey20") +
  scale_fill_viridis_c(option = "mako", direction = -1, guide = "none") +
  scale_x_continuous(expand = expansion(mult = c(0, 0.12)),
                     labels = scales::comma) +
  labs(
    title    = "Violent crime dominates the Colchester picture",
    subtitle = "Total recorded incidents by category, 2025",
    x = "Number of incidents", y = NULL,
    caption  = "Source: UK Police street-level data (crime25.csv)"
  )

The visual reinforces and refines the table’s message. The eye is drawn first to the violent-crime bar, which extends more than three times as far as the next longest (2,439 incidents against 709 for shoplifting). Reading down the ordered bars gives an immediate sense of the town’s “crime profile”: predominantly interpersonal (violence, public order, anti-social behaviour) and acquisitive at the shop counter (shoplifting), rather than, say, dominated by burglary or vehicle crime. This profile is plausible for a garrison and university town with a vibrant retail centre.

5.3 Where Cases End Up (Two-Way Table and Stacked Bars)

Counting crimes is only half the story; what matters to residents is whether anything is done. The following two-way table cross-tabulates the broad outcome groups by season, giving the number of cases in each combination.

twoway <- crime %>%
  count(season, outcome_group) %>%
  tidyr::pivot_wider(names_from = outcome_group, values_from = n, values_fill = 0)

twoway %>%
  kbl(caption = "Table 8. Two-way table of crime outcome group by season.",
      align = "lrrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 8. Two-way table of crime outcome group by season.
season	Justice action	No suspect / unable	Other / unavailable	Referred elsewhere	Under investigation
Winter	395	385	299	26	204
Spring	591	498	441	33	0
Summer	624	605	331	43	0
Autumn	572	489	202	41	177

A 100%-stacked bar chart is the natural companion: it normalises each season to the same height so that proportions, rather than raw counts, can be compared.

crime %>%
  count(season, outcome_group) %>%
  ggplot(aes(x = season, y = n, fill = outcome_group)) +
  geom_col(position = "fill", width = 0.72) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_viridis_d(option = "cividis") +
  labs(
    title    = "Fewer than two in five cases end in justice action",
    subtitle = "Outcome composition within each season (normalised to 100%)",
    x = NULL, y = "Share of cases", fill = "Outcome group"
  )

Two findings stand out. First, the outcome mix is broadly consistent across the year: in Table 8 the two largest bands, “justice action” and “no suspect / unable”, each take roughly 29 to 39 per cent of cases in every season, which suggests that the justice pipeline processes cases at a fairly steady rate. The visible exception is the “under investigation” band, which appears only in winter and autumn. Second, and more soberingly, in no season does “justice action” reach 40 per cent of cases, while about a third of all cases are closed with no suspect identified or no prosecution possible. The majority of crime recorded in Colchester therefore does not result in a charge. This is in keeping with national trends for high-volume, low-evidence offences such as anti-social behaviour and minor violence, and it reframes the later temporal analysis: what is being analysed is the rhythm of reporting and recording.
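The within-season shares behind the stacked bars can be checked directly from Table 8. The sketch below is illustrative only: it re-types the published counts by hand into a matrix (the object names `tab8`, `shares` and `largest` are assumptions, not objects from the report) and uses base R rather than the report's tidyverse pipeline.

```r
# Table 8 counts, typed in by hand (rows = seasons, columns = outcome groups)
tab8 <- matrix(
  c(395, 385, 299, 26, 204,
    591, 498, 441, 33,   0,
    624, 605, 331, 43,   0,
    572, 489, 202, 41, 177),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    season  = c("Winter", "Spring", "Summer", "Autumn"),
    outcome = c("Justice action", "No suspect / unable", "Other / unavailable",
                "Referred elsewhere", "Under investigation")
  )
)

# Row-wise proportions: each season sums to 1, as in the 100%-stacked bars
shares <- prop.table(tab8, margin = 1)
print(round(100 * shares, 1))

# Name of the largest outcome group within each season
largest <- colnames(tab8)[apply(tab8, 1, which.max)]
names(largest) <- rownames(tab8)
print(largest)
```

Reading the printed percentages alongside the chart is a quick way to confirm which band is largest in each season and how much the mix really moves.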

5.4 A Cleveland Dot Plot of Outcomes

When a single quantity is compared across a large number of categories, the Cleveland dot plot is a cleaner alternative to the bar chart: it removes the heavy ink and makes small differences easier to see. Here we plot the total number of incidents falling into each detailed outcome status.

crime %>%
  count(outcome_status, name = "n") %>%
  ggplot(aes(x = n, y = reorder(outcome_status, n))) +
  geom_segment(aes(x = 0, xend = n, yend = outcome_status),
               colour = "grey75") +
  geom_point(aes(size = n), colour = "#1f77b4") +
  scale_size_continuous(range = c(2.5, 7), guide = "none") +
  scale_x_continuous(labels = scales::comma,
                     expand = expansion(mult = c(0, 0.08))) +
  labs(
    title    = "The investigative funnel, in detail",
    subtitle = "Number of crimes by recorded outcome status",
    x = "Number of incidents", y = NULL
  )

The dot plot reveals the finer detail behind the grouped bars. Two outcomes, “investigation complete; no suspect identified” and “unable to prosecute suspect”, together dominate, each accounting for well over a thousand cases. Positive justice outcomes such as cautions are vanishingly rare by comparison. Many crimes enter the funnel and few come out: the funnel is steep.

6 Movement II — When Does Crime Happen?

6.1 The Annual Rhythm (Time Series with Smoothing)

We now shift from what to when. Summing the incidents by month gives the headline time series for the year. The raw monthly counts are jagged, so we apply a LOESS smoother to reveal the underlying trend, directly meeting the criterion that smoothing be used to show a pattern.

monthly_crime <- crime %>%
  count(month_date, name = "incidents") %>%
  arrange(month_date)

ggplot(monthly_crime, aes(x = month_date, y = incidents)) +
  geom_line(colour = "grey55", linewidth = 0.7) +
  geom_point(size = 2.6, colour = "#2c3e50") +
  geom_smooth(method = "loess", se = TRUE, span = 0.7,
              colour = "#e74c3c", fill = "#e74c3c", alpha = 0.12) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  scale_y_continuous(limits = c(0, NA)) +
  labs(
    title    = "Recorded crime rises into late spring, then drifts down",
    subtitle = "Monthly incident totals with a LOESS trend (red) and 95% band",
    x = NULL, y = "Incidents per month",
    caption  = "Smoother: LOESS, span = 0.7"
  )

The series is not flat. Counts climb from a January floor to a clear late-spring peak in May, before easing through the autumn. The red LOESS curve smooths out month-to-month noise and shows the hump clearly. This is the first indication of a seasonal effect, and the obvious candidate explanation is temperature: warmer months bring people out of the house, into the night-time economy and into contact with one another, plausibly increasing violence, public order offences and anti-social behaviour. We test that idea head-on in Movement III.
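To make the smoothing step concrete, the sketch below fits the same kind of local regression outside ggplot2, using the monthly totals printed in Table 9. It is an approximation under stated assumptions: the months are indexed 1 to 12 (the figure uses calendar dates, so spacing differs slightly), and the names `incidents`, `month_num` and `trend` are illustrative.

```r
# Monthly incident totals for 2025, copied from Table 9 (Jan ... Dec)
incidents <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
month_num <- 1:12  # assumption: equally spaced months instead of real dates

# Local quadratic regression with the same span as the figure (span = 0.7)
fit      <- loess(incidents ~ month_num, span = 0.7)
smoothed <- predict(fit)

# Raw counts next to the smoothed trend
trend <- data.frame(month = month.abb, raw = incidents,
                    loess = round(smoothed, 1))
print(trend)
```

With only twelve points, a span of 0.7 means each local fit draws on roughly eight neighbouring months, which is why the curve follows the broad hump rather than individual spikes.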

6.2 Decomposing the Mix Over Time (Stacked Area / Stream)

A single line conceals which crimes are responsible for the seasonal hump. A stacked area chart of the top five categories shows how the composition changes month by month.

top5 <- crime %>% count(category_clean, sort = TRUE) %>% slice_head(n = 5) %>% pull(category_clean)

crime %>%
  filter(category_clean %in% top5) %>%
  count(month_date, category_clean) %>%
  ggplot(aes(x = month_date, y = n, fill = category_clean)) +
  geom_area(alpha = 0.85, colour = "white", linewidth = 0.2) +
  scale_fill_viridis_d(option = "turbo", end = 0.9) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(
    title    = "Violent crime and shoplifting drive the seasonal swell",
    subtitle = "Monthly counts for the five most common categories",
    x = NULL, y = "Incidents per month", fill = "Category"
  )

The area chart attributes the mid-year increase mainly to violent crime, which swells clearly in the spring and summer, and to a lesser extent to shoplifting and public order offences. Criminal damage remains relatively unchanged. This matters analytically because it indicates that the seasonal signal is not spread uniformly across all crime types but is concentrated in the types that a temperature mechanism would predict: interpersonal, outdoor and night-time-economy offences.

6.3 Distribution of Daily-Equivalent Activity (Histogram and Density)

How much does crime fluctuate from month to month? A histogram of the monthly incident totals, with a density curve overlaid, describes the distribution.

ggplot(monthly_crime, aes(x = incidents)) +
  geom_histogram(aes(y = after_stat(density)), bins = 8,
                 fill = "#3498db", colour = "white", alpha = 0.85) +
  geom_density(colour = "#e74c3c", linewidth = 1.1) +
  geom_rug(colour = "grey40") +
  labs(
    title    = "How busy is a typical month?",
    subtitle = "Distribution of monthly incident totals with kernel density (red)",
    x = "Incidents per month", y = "Density"
  )

The distribution is single-peaked, with most months falling between about 430 and 530 incidents and a thin upper tail formed by the busiest months, May and August. There is no second mode, which indicates that there is no abrupt “high season / low season” dichotomy but rather a gradual seasonal modulation, in line with the smooth LOESS hump shown above.
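The shape described above can be backed with a few summary numbers. This is a small base-R sketch using the twelve monthly totals from Table 9; the 430 to 530 band is taken from the text, and the variable names are illustrative.

```r
# Monthly incident totals for 2025, copied from Table 9
incidents <- c(Jan = 408, Feb = 465, Mar = 447, Apr = 518, May = 598, Jun = 531,
               Jul = 510, Aug = 562, Sep = 472, Oct = 532, Nov = 477, Dec = 436)

# Five-number summary plus the mean
print(summary(incidents))

# How many months fall inside the 430-530 band quoted in the text?
in_band <- incidents >= 430 & incidents <= 530
print(sum(in_band))              # 7 of the 12 months

# Which months sit above the band?
print(names(incidents)[incidents > 530])
```

June and October sit only just above the band (531 and 532), so the upper tail is really carried by May and August.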

6.4 Spread Within Seasons (Box and Violin / Sina Plots)

To compare the distribution of monthly counts across the four seasons (not just their averages), we combine a violin (which shows the density shape), an inner box plot (median and quartiles) and sina points (the individual months).

season_monthly <- crime %>%
  count(season, month_date, name = "incidents")

ggplot(season_monthly, aes(x = season, y = incidents, fill = season)) +
  geom_violin(alpha = 0.35, colour = NA, trim = FALSE) +
  geom_boxplot(width = 0.16, outlier.shape = NA, alpha = 0.9) +
  ggforce::geom_sina(size = 2.4, colour = "grey20", maxwidth = 0.5) +
  scale_fill_viridis_d(option = "viridis", guide = "none") +
  labs(
    title    = "Spring and summer months run busier; spring is the most variable",
    subtitle = "Monthly incident counts per season (violin + box + sina of the months)",
    x = NULL, y = "Incidents per month"
  )

The medians (the central line in each box) show that spring and summer sit well above winter and autumn. Spring’s violin is also by far the tallest, reflecting the climb from March to May, whereas the summer months are high but tightly grouped. The sina points confirm that the difference is not due to a single outlier: the individual months do move upwards in the warm half of the year. With only three months per season, the violin shapes are indicative rather than precise. This is the distributional analogue of the hump in the time series and suggests that a first look at temperature is warranted next.
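The seasonal comparison can also be read off numerically. The sketch below assumes meteorological seasons (December to February as winter, and so on), an assignment that reproduces the row totals of Table 8; the monthly totals come from Table 9 and the object names are illustrative.

```r
# Monthly incident totals for 2025 (Jan ... Dec), from Table 9
incidents <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)

# Assumed season for each month: Dec-Feb winter, Mar-May spring, etc.
season <- factor(
  c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
    "Summer", "Summer", "Autumn", "Autumn", "Autumn", "Winter"),
  levels = c("Winter", "Spring", "Summer", "Autumn")
)

by_season <- split(incidents, season)

# Median, standard deviation and range of the three months in each season
season_stats <- data.frame(
  median = sapply(by_season, median),
  sd     = round(sapply(by_season, sd), 1),
  range  = sapply(by_season, function(x) diff(range(x)))
)
print(season_stats)
```

Each statistic rests on just three months, so the spread figures are best treated as a rough guide to the violins rather than as stable estimates.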

6.5 A Ridgeline View of Category Seasonality

A ridgeline (joy) plot draws one density per category along the month axis, making it easy to see in which months each category is more or less frequent.

crime %>%
  filter(category_clean %in% top5) %>%
  ggplot(aes(x = month_num, y = category_clean, fill = category_clean)) +
  ggridges::geom_density_ridges(scale = 1.6, alpha = 0.8, colour = "white") +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_fill_viridis_d(option = "plasma", end = 0.9, guide = "none") +
  labs(
    title    = "When across the year does each crime peak?",
    subtitle = "Density of incidents over the months, top five categories",
    x = NULL, y = NULL
  )

The ridges reveal some nuances in timing. Violent crime and public order offences are more likely to peak in the middle of the year, when the weather is warmer, while shoplifting is relatively level: not all crime types follow the same seasonal beat. These category-specific shapes would never appear in a purely aggregate analysis, and they are exactly the kind of “interesting feature” that a per-category view is designed to expose.

7 Movement III — Does the Weather Leave a Fingerprint?

7.1 Building the Bridge Between the Datasets

The two datasets correspond only at the monthly level, so we summarise the daily weather data into monthly averages (and monthly totals for rainfall and sunshine) and then merge them with the monthly crime counts. The resulting table of twelve aligned months is the basis of all climate-crime comparisons.

# Aggregate the daily weather up to monthly means / sums
monthly_weather <- temp %>%
  mutate(month_key = format(Date, "%Y-%m")) %>%
  group_by(month_key) %>%
  summarise(
    temp_avg  = mean(temp_avg,  na.rm = TRUE),
    temp_max  = mean(temp_max,  na.rm = TRUE),
    humidity  = mean(humidity,  na.rm = TRUE),
    wind_kmh  = mean(wind_kmh,  na.rm = TRUE),
    pressure  = mean(pressure,  na.rm = TRUE),
    cloud_oct = mean(cloud_oct, na.rm = TRUE),
    precip_mm = sum(precip_mm,  na.rm = TRUE),
    sun_hours = sum(sun_hours,  na.rm = TRUE),
    .groups = "drop"
  )

# Crime counts keyed by the same "YYYY-MM" string
monthly_crime_key <- crime %>%
  mutate(month_key = format(month_date, "%Y-%m")) %>%
  count(month_key, name = "crimes")

# Join the two monthly tables
merged <- monthly_crime_key %>%
  inner_join(monthly_weather, by = "month_key") %>%
  mutate(month_date = as.Date(paste0(month_key, "-01")),
         month_lab  = format(month_date, "%b")) %>%
  arrange(month_date)

merged %>%
  mutate(across(where(is.numeric), ~round(.x, 1))) %>%
  select(Month = month_lab, Crimes = crimes, `Avg °C` = temp_avg,
         `Sun hrs` = sun_hours, `Rain mm` = precip_mm, `Humidity %` = humidity,
         `Wind km/h` = wind_kmh) %>%
  kbl(caption = "Table 9. Monthly crime totals aligned with monthly weather.",
      align = "lrrrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 9. Monthly crime totals aligned with monthly weather.
Month	Crimes	Avg °C	Sun hrs	Rain mm	Humidity %	Wind km/h
Jan	408	3.5	70.2	59.0	87.4	16.3
Feb	465	4.5	52.2	36.4	85.8	15.5
Mar	447	7.0	218.5	4.0	74.6	13.7
Apr	518	10.0	257.5	21.6	71.0	14.2
May	598	12.7	242.6	23.8	69.5	16.5
Jun	531	17.6	261.1	37.0	67.6	16.9
Jul	510	18.5	202.1	34.8	70.1	14.1
Aug	562	17.9	195.4	24.0	68.0	14.9
Sep	472	14.2	175.4	36.8	77.0	17.2
Oct	532	11.4	77.3	57.4	82.7	16.4
Nov	477	8.6	54.8	99.4	89.7	17.6
Dec	436	6.9	66.9	44.2	89.4	17.7

The merged table already invites a reading: the busiest months (May, August) coincide with warmer temperatures and more sunshine, while the quietest (January, December) are cold and dark. We now make that impression formal.

7.2 Temperature Against Crime (Scatter Plot with Smoothing)

The central plot of the report shows monthly crime against monthly mean temperature, with each point labelled by its month and a linear trend drawn with a confidence band.

ggplot(merged, aes(x = temp_avg, y = crimes)) +
  geom_smooth(method = "lm", se = TRUE, colour = "#e74c3c",
              fill = "#e74c3c", alpha = 0.12) +
  geom_point(aes(colour = crimes), size = 4) +
  geom_text(aes(label = month_lab), vjust = -1.1, size = 3.4, colour = "grey25") +
  scale_colour_viridis_c(option = "inferno", end = 0.85, guide = "none") +
  labs(
    title    = "Warmer months tend to record more crime",
    subtitle = "Each point is a month of 2025; red line is an OLS fit with 95% band",
    x = "Monthly mean temperature (°C)", y = "Recorded crimes"
  )

The cloud of months slopes upwards, with the colder months (January, February, December) towards the bottom left and the warmer months (June, July, August) towards the top right. As the criminological “temperature-aggression” hypothesis predicts, the fitted line suggests a positive relationship between temperature and recorded crime. The association is moderately strong (a Pearson coefficient of about 0.70, reported in Section 7.3) but far from exact: several months lie well away from the line, temperature is only one of many factors that affect the figures (school terms, paydays, events, policing intensity and so on), and with only twelve points it is easy to over-claim.
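The strength of the relationship in the scatter plot can be put into a single number with its uncertainty. The sketch below re-types the Crimes and Avg °C columns of Table 9 (already rounded to one decimal, so results may differ from the report in the last digit) and runs a standard Pearson test; the variable names are illustrative.

```r
# Monthly values copied from Table 9 (Jan ... Dec)
temp_avg <- c(3.5, 4.5, 7.0, 10.0, 12.7, 17.6, 18.5, 17.9, 14.2, 11.4, 8.6, 6.9)
crimes   <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)

# Pearson correlation with a t-test and a 95% confidence interval
ct <- cor.test(temp_avg, crimes, method = "pearson")

r  <- unname(ct$estimate)   # point estimate of the correlation
ci <- ct$conf.int           # 95% interval (Fisher z-transform)

print(round(r, 2))
print(round(ci, 2))
print(ct$p.value)
```

With ten degrees of freedom the interval is wide, which is the numerical counterpart of the warning about over-claiming from twelve points.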

7.3 Correlation Analysis

To quantify the association (and to determine if other weather variables are more important than raw temperature) we calculate the entire correlation matrix of the monthly variables and plot it.

corr_vars <- merged %>%
  select(crimes, temp_avg, temp_max, humidity, wind_kmh,
         pressure, cloud_oct, precip_mm, sun_hours)

corr_mat <- cor(corr_vars, use = "complete.obs")

corrplot::corrplot(
  corr_mat, method = "color", type = "upper",
  addCoef.col = "black", number.cex = 0.7,
  tl.col = "grey20", tl.srt = 45, diag = FALSE,
  col = colorRampPalette(c("#2166ac", "white", "#b2182b"))(200),
  mar = c(0, 0, 2, 0),
  title = "Correlation matrix of monthly crime and weather"
)

crime_cor <- sort(corr_mat["crimes", -1], decreasing = TRUE)
tibble(Variable = names(crime_cor),
       `Correlation with crime` = round(as.numeric(crime_cor), 2)) %>%
  kbl(caption = "Table 11. Pearson correlation of each weather variable with monthly crime.",
      align = "lr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 11. Pearson correlation of each weather variable with monthly crime.
Variable	Correlation with crime
temp_max	0.72
temp_avg	0.70
sun_hours	0.57
pressure	0.26
wind_kmh	-0.12
cloud_oct	-0.23
precip_mm	-0.26
humidity	-0.71

The matrix tells a coherent meteorological story. Crime correlates positively with the “good-weather” variables (maximum temperature, average temperature and total sunshine), strongly negatively with humidity, and more weakly negatively with cloud cover and rainfall. Temperature and humidity show the strongest associations (about 0.7 in absolute value), sunshine is moderate (0.57), and wind and pressure are only weakly related to offending. Note also the high correlations among the weather variables themselves: warm, bright, dry months go together, as do humid, cloudy, rainy ones. The coefficients should therefore not be read as independent causes; the safer summary is that warm, bright, dry months go with higher recorded crime and humid, cloudy, rainy months with lower recorded crime.
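The point about intercorrelated weather variables can be seen in a small sub-matrix. This sketch uses only the four weather columns printed in Table 9 (rounded values, re-typed by hand), so it covers a subset of the full matrix; the data frame name `weather` is illustrative.

```r
# Monthly weather values copied from Table 9 (Jan ... Dec)
weather <- data.frame(
  temp_avg  = c(3.5, 4.5, 7.0, 10.0, 12.7, 17.6, 18.5, 17.9, 14.2, 11.4, 8.6, 6.9),
  sun_hours = c(70.2, 52.2, 218.5, 257.5, 242.6, 261.1, 202.1, 195.4, 175.4,
                77.3, 54.8, 66.9),
  precip_mm = c(59.0, 36.4, 4.0, 21.6, 23.8, 37.0, 34.8, 24.0, 36.8, 57.4,
                99.4, 44.2),
  humidity  = c(87.4, 85.8, 74.6, 71.0, 69.5, 67.6, 70.1, 68.0, 77.0, 82.7,
                89.7, 89.4)
)

# Pearson correlations among the weather variables themselves
weather_cor <- cor(weather)
print(round(weather_cor, 2))
```

When predictors move together this closely, a coefficient for any one of them partly reflects the others, which is why the report treats the weather variables as a bundle rather than as separate causes.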

7.4 A Pair Plot of the Key Drivers

Finally, a pair plot (scatterplot matrix) shows in one figure every bivariate relationship and marginal distribution among crime and the three most relevant weather variables, making it the most information-dense view in the report.

GGally::ggpairs(
  merged %>% select(crimes, temp_avg, sun_hours, humidity),
  columnLabels = c("Crimes", "Mean °C", "Sun hrs", "Humidity %"),
  title = "Pairwise relationships among crime and key weather variables",
  upper = list(continuous = GGally::wrap("cor", size = 4)),
  lower = list(continuous = GGally::wrap("smooth", colour = "#e74c3c",
                                         alpha = 0.6, size = 1.2)),
  diag  = list(continuous = GGally::wrap("densityDiag", fill = "#3498db",
                                         alpha = 0.5))
) +
  theme_report(11)

The pair plot summarises the entire movement. The Crimes row and column show the positive association with temperature and sunshine and the negative association with humidity in one place, with the correlation coefficients in the mirrored upper cells. The diagonal densities are a reminder that temperature and sunshine are themselves roughly bimodal across the year (a cool half and a warm half), which is the ultimate source of the seasonal shape of the crime series.

7.5 Quantifying the Effect: A Simple Regression Model

Correlation measures the direction and strength of an association; regression estimates how many additional crimes are associated with each additional degree of warmth and provides a formal significance test and a goodness-of-fit measure. We fit an ordinary-least-squares model of monthly crime counts on mean temperature. This is a deliberately parsimonious, one-predictor summary, and with only twelve observations available we report it as an effect-size summary, not a causal claim.

crime_lm <- lm(crimes ~ temp_avg, data = merged)

broom::tidy(crime_lm) %>%
  mutate(term = recode(term,
                       `(Intercept)` = "Intercept",
                       temp_avg = "Mean temperature (°C)")) %>%
  transmute(Term = term,
            Estimate = round(estimate, 2),
            `Std. error` = round(std.error, 2),
            `t value` = round(statistic, 2),
            `p value` = round(p.value, 4)) %>%
  kbl(caption = "Table 10. OLS regression of monthly crime count on mean temperature.",
      align = "lrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Table 10. OLS regression of monthly crime count on mean temperature.
Term	Estimate	Std. error	t value	p value
Intercept	414.01	28.81	14.37	0.0000
Mean temperature (°C)	7.45	2.37	3.14	0.0106

# Model-level fit statistics
glance_tbl <- broom::glance(crime_lm) %>%
  transmute(`R²` = round(r.squared, 3),
            `Adjusted R²` = round(adj.r.squared, 3),
            `F statistic` = round(statistic, 2),
            `Model p value` = round(p.value, 4),
            `Residual SE` = round(sigma, 1))

glance_tbl %>%
  kbl(caption = "Model fit summary.", align = "rrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#34495e", color = "white")

Model fit summary.
R²	Adjusted R²	F statistic	Model p value	Residual SE
0.496	0.445	9.84	0.0106	41.2

The slope coefficient is the change in monthly crimes associated with a one-degree increase in monthly mean temperature: about 7.5 additional recorded crimes per month for each extra degree Celsius. The R² indicates that temperature alone accounts for roughly half (0.496) of the month-to-month variation in crime. The direction of the effect is consistent with the international evidence, notably Anderson’s heat hypothesis and Field’s (1992) UK finding that warmth increases recorded crime, although with a 12-month sample our point estimate should be read as indicative. To check whether the fit is driven by a non-linear pattern or by one influential month, the diagnostic plots below examine the residuals.
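Before turning to the diagnostics, the headline regression numbers can be reproduced, and the uncertainty around the slope made explicit, from the rounded values in Table 9. This is a sketch under that assumption (the rounding can shift the last digit), with illustrative object names.

```r
# Monthly values copied from Table 9 (Jan ... Dec)
merged_tbl <- data.frame(
  temp_avg = c(3.5, 4.5, 7.0, 10.0, 12.7, 17.6, 18.5, 17.9, 14.2, 11.4, 8.6, 6.9),
  crimes   = c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
)

# Ordinary least squares: monthly crimes on monthly mean temperature
fit <- lm(crimes ~ temp_avg, data = merged_tbl)

slope    <- unname(coef(fit)["temp_avg"])
slope_ci <- confint(fit, "temp_avg", level = 0.95)
r2       <- summary(fit)$r.squared

print(round(slope, 2))
print(round(slope_ci, 1))
print(round(r2, 3))

# Fitted monthly totals for a cool (5 °C) and a warm (15 °C) month
pred <- predict(fit, newdata = data.frame(temp_avg = c(5, 15)))
print(round(pred))
```

The gap between the two fitted values is ten times the slope, which gives a more tangible sense of the effect size than the per-degree coefficient alone.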

par(mfrow = c(2, 2))
plot(crime_lm, col = "#2c3e50", pch = 19)

par(mfrow = c(1, 1))

The diagnostics are reassuring for a model of this size. The residuals-versus-fitted panel shows no strong curvature (the linear form is adequate); the normal Q-Q panel shows the residuals lying close to the reference line (no gross departure from normality); and the leverage panel indicates that no single month has undue influence on the slope. We therefore treat the positive temperature-crime relationship as a clear descriptive trend in the 2025 data, and reiterate that confirmatory inference would require daily data over multiple years and a model that accounts for calendar effects.
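The influence reading of the diagnostic panels can be complemented with a numeric table. The sketch below refits the model on the rounded Table 9 values and lists residuals, leverage and Cook's distance per month; the 4/n cut-off is a common rule of thumb, not a threshold used in the report, and the object names are illustrative.

```r
# Monthly values copied from Table 9 (Jan ... Dec)
temp_avg <- c(3.5, 4.5, 7.0, 10.0, 12.7, 17.6, 18.5, 17.9, 14.2, 11.4, 8.6, 6.9)
crimes   <- c(408, 465, 447, 518, 598, 531, 510, 562, 472, 532, 477, 436)
fit      <- lm(crimes ~ temp_avg)

# Per-month influence measures from the fitted model
infl <- data.frame(
  month    = month.abb,
  residual = round(resid(fit), 1),
  leverage = round(hatvalues(fit), 3),
  cooks_d  = round(cooks.distance(fit), 3)
)

# Months ordered from most to least influential
print(infl[order(-infl$cooks_d), ], row.names = FALSE)

# Rule-of-thumb flag (assumption): Cook's distance above 4 / n
print(infl$month[cooks.distance(fit) > 4 / nrow(infl)])
```

A month can have a large residual yet little influence if its temperature sits near the annual mean, so residual size and Cook's distance are worth reading side by side.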

8 Movement IV — Where Does Crime Concentrate?

8.1 An Interactive Crime Map (Leaflet)

Space is the last dimension of the story. Using the incident coordinates, we build an interactive map of Colchester. Markers are clustered for performance and legibility: clicking a cluster zooms in and breaks it apart, and clicking an individual marker reveals the crime category, month and street. Pan and zoom with the mouse to explore it.

# Keep only the columns the map needs and drop rows without coordinates (keeps the HTML light)
map_data <- crime %>%
  select(lat, long, category_clean, month_lab, street_name) %>%
  filter(!is.na(lat), !is.na(long))

pal <- colorFactor(viridis::turbo(length(unique(map_data$category_clean))),
                   domain = map_data$category_clean)

leaflet(map_data) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = mean(map_data$long), lat = mean(map_data$lat), zoom = 13) %>%
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = 4, stroke = FALSE, fillOpacity = 0.6,
    color = ~pal(category_clean),
    clusterOptions = markerClusterOptions(),
    popup = ~paste0("<b>", category_clean, "</b><br>",
                    month_lab, " 2025<br>", street_name)
  ) %>%
  addLegend("bottomright", pal = pal, values = ~category_clean,
            title = "Category", opacity = 0.8)

When zoomed in, crime is not evenly distributed: it is concentrated in the town centre and High Street area and becomes sparse towards the residential and rural edges. The highest densities are in the retail core (consistent with the volumes of shoplifting and public order offences noted earlier) and in the night-time-economy streets. The map places the counts from Movement I in a familiar geographical context, showing where a resident is most likely to come into contact with recorded crime.

8.2 A Density Heatmap of Hotspots

A heatmap layer makes the hotspots even clearer by smoothing incident density across the map, independent of the discrete markers.

leaflet(map_data) %>%
  addProviderTiles(providers$CartoDB.DarkMatter) %>%
  setView(lng = mean(map_data$long), lat = mean(map_data$lat), zoom = 13) %>%
  addHeatmap(lng = ~long, lat = ~lat, blur = 22, max = 0.9, radius = 14)

The heatmap condenses the geography into one bright core at the centre of town that fades gradually towards the edges. This concentration is the spatial imprint of a “central place” pattern: crime is where people are, where retail is and where the evening economy is, all of which converge on Colchester’s historic centre.

9 Interactive Highlights

This section collects two fully interactive plotly figures that let the reader go beyond the static views above. Hover over any element to display exact values, click and drag to zoom, and double-click to reset.

9.1 Interactive Time Series

p_ts <- monthly_crime %>%
  mutate(month_lab = format(month_date, "%b")) %>%
  plot_ly(x = ~month_date, y = ~incidents, type = "scatter", mode = "lines+markers",
          line = list(color = "#2c3e50", width = 2),
          marker = list(color = "#e74c3c", size = 8),
          hovertemplate = paste("<b>%{x|%B}</b><br>%{y} incidents<extra></extra>")) %>%
  layout(title = list(text = "Interactive monthly crime series"),
         xaxis = list(title = ""), yaxis = list(title = "Incidents per month"))
p_ts

Hovering reveals the precise count for each month and confirms the late-spring peak interactively — the reader can verify, for example, that May is the busiest month without consulting a table.

9.2 Interactive Category × Season Heatmap

heat_df <- crime %>%
  count(category_clean, season) %>%
  tidyr::complete(category_clean, season, fill = list(n = 0))

p_heat <- plot_ly(
  data = heat_df,
  x = ~season, y = ~category_clean, z = ~n,
  type = "heatmap", colorscale = "Viridis",
  hovertemplate = "%{y}<br>%{x}: %{z} incidents<extra></extra>"
) %>%
  layout(title = list(text = "Crime category by season (hover for counts)"),
         xaxis = list(title = ""), yaxis = list(title = ""))
p_heat

The interactive heatmap allows the reader to explore any cell: it’s obvious that the violent-crime row is the brightest in every season, and that the brightness increases in the spring and summer — all in one hoverable image.

10 An Interactive Data Explorer

For full transparency, the cleaned and merged monthly table is presented as a DT widget that can be searched and sorted. Filter with the search box, and sort by clicking any column header.

merged %>%
  mutate(across(where(is.numeric), ~round(.x, 1))) %>%
  select(Month = month_lab, Crimes = crimes, `Mean °C` = temp_avg,
         `Max °C` = temp_max, `Sun hrs` = sun_hours, `Rain mm` = precip_mm,
         `Humidity %` = humidity, `Cloud (oct)` = cloud_oct,
         `Wind km/h` = wind_kmh, `Pressure hPa` = pressure) %>%
  DT::datatable(
    caption = "Interactive table: cleaned monthly crime and weather data.",
    options = list(pageLength = 12, dom = "ftip"),
    rownames = FALSE
  )

11 Synthesis: The Story the Data Tells

Combining the four movements, a unified story emerges about Colchester in 2025. The town’s recorded crime is overwhelmingly made up of interpersonal and acquisitive offences (violence, public order, anti-social behaviour and shoplifting). Only a minority of cases, a little over a third in Table 8, end in justice action, while about a third are closed with no suspect identified or no prosecution possible, and this outcome mix is broadly consistent throughout the year. Overlaid on this steady background is an obvious seasonal trend: police-recorded crime rises from its low point in January to a high point in late spring and summer before falling back in autumn, and the increase is disproportionately in the kinds of crime one would expect to be sensitive to increased outdoor and night-time activity.

The same seasonal pattern is reflected in the weather. Over the twelve months, recorded crime increases with temperature and sunshine and decreases with humidity and cloud cover; temperature gives one of the strongest signals, and a simple regression gives it a positive and sizeable slope. We do not want to overstate this: the association is exploratory, the weather variables are highly intercorrelated, and the data reflect the recording of offences rather than offending itself. But the trend is clear and agrees with a large criminological literature in which warmth is associated with increased interpersonal crime, whether through Anderson’s heat hypothesis or through the routine-activity explanation that warmer weather brings people out of the home (Cohen and Gonzalez, 2024). It is also very similar to the most relevant recent UK study, which reported that crime in England and Wales increases with temperature (Braakmann, 2024). Interestingly, our data also suggest a positive relationship between sunshine and crime, which Field (1992) reported as absent; this may reflect the strong correlation between sunshine and temperature in our short sample rather than an independent sunshine effect. Finally, in space, crime is concentrated in the historic town centre, where retail footfall and the evening economy meet: the geographical centre of the story.

The honest synthesis, then, is that we cannot say that weather causes crime, but the warm, bright, sociable months of the year do leave a modest mark on Colchester’s recorded crime, particularly on the violence and disorder that overflow from a bustling town centre. The bottom line for a local police service is that late spring and summer are the predictable pressure point of the year in the centre.

12 Limitations and Future Work

The conclusions carry a number of caveats. The crime data are time-stamped only by month, so the climate analysis rests on just twelve observations, and the day-level and weekday/weekend analyses that would strengthen any causal claim are not possible. The records do not necessarily reflect actual offending; they also reflect how crime is reported and how the police work. The coordinates are deliberately “snapped” to anonymised map points, so the spatial analysis is indicative rather than exact. Correlation is not causation: temperature is itself correlated with daylight, school terms, holidays and events, any of which may be responsible for the seasonal pattern. Future work could disentangle these threads using daily-resolution crime data over multiple years and a multivariable model (e.g. Poisson regression of daily counts on weather plus calendar controls).
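To indicate what that future model might look like, the sketch below fits a Poisson regression to simulated daily data. Everything in it is hypothetical: the daily counts, the temperature series, the effect sizes and the variable names (`temp_c`, `weekend`, `pois_fit`) are invented for illustration, because the module data contain no daily crime records.

```r
set.seed(304)  # arbitrary seed so the simulation is repeatable

# Hypothetical daily calendar for one year
days    <- seq(as.Date("2025-01-01"), as.Date("2025-12-31"), by = "day")
doy     <- as.integer(format(days, "%j"))
weekend <- as.POSIXlt(days)$wday %in% c(0, 6)   # Sunday = 0, Saturday = 6

# Simulated temperature: an annual sine wave plus day-to-day noise
temp_c <- 11 + 8 * sin(2 * pi * (doy - 110) / 365) + rnorm(length(days), sd = 3)

# Simulated counts with known "true" effects on the log scale
true_rate <- exp(2.6 + 0.02 * temp_c + 0.15 * weekend)
daily <- data.frame(date = days, temp_c = temp_c, weekend = weekend,
                    crimes = rpois(length(days), lambda = true_rate))

# Poisson regression of daily counts on weather plus a calendar control
pois_fit <- glm(crimes ~ temp_c + weekend, family = poisson(link = "log"),
                data = daily)

# Exponentiated coefficients are rate ratios (multiplicative change in the
# expected daily count per unit change in the predictor)
print(round(exp(coef(pois_fit)), 3))
```

On real data one would also want to check for overdispersion (for example with a quasi-Poisson or negative binomial alternative) and add further calendar terms such as school holidays and paydays.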

13 Reproducibility

This report was prepared in R Markdown and knitted to HTML. To reproduce it, make sure that crime25.csv and temp25.csv are in the same folder as this .Rmd file. In RStudio, open the .Rmd file and click Knit; load_pkg() will automatically install any missing packages. The session information below records the software versions used.

sessionInfo()

## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] htmltools_0.5.9      leaflet.extras_2.0.2 leaflet_2.2.3       
##  [4] plotly_4.12.0        DT_0.34.0            kableExtra_1.4.0    
##  [7] knitr_1.51           broom_1.0.13         psych_2.6.5         
## [10] corrplot_0.95        GGally_2.4.0         patchwork_1.3.2     
## [13] ggforce_0.5.0        ggridges_0.5.7       viridis_0.6.5       
## [16] viridisLite_0.4.3    scales_1.4.0         zoo_1.8-15          
## [19] lubridate_1.9.5      forcats_1.0.1        stringr_1.6.0       
## [22] dplyr_1.2.1          purrr_1.2.2          readr_2.2.0         
## [25] tidyr_1.3.2          tibble_3.3.1         ggplot2_4.0.3       
## [28] tidyverse_2.0.0     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        farver_2.1.2            S7_0.2.2               
##  [4] fastmap_1.2.0           lazyeval_0.2.3          tweenr_2.0.3           
##  [7] digest_0.6.39           timechange_0.4.0        lifecycle_1.0.5        
## [10] magrittr_2.0.5          compiler_4.5.2          rlang_1.1.7            
## [13] sass_0.4.10             tools_4.5.2             yaml_2.3.12            
## [16] data.table_1.18.4       labeling_0.4.3          htmlwidgets_1.6.4      
## [19] bit_4.6.0               mnormt_2.1.2            xml2_1.6.0             
## [22] RColorBrewer_1.1-3      withr_3.0.2             grid_4.5.2             
## [25] polyclip_1.10-7         MASS_7.3-65             cli_3.6.5              
## [28] rmarkdown_2.30          crayon_1.5.3            generics_0.1.4         
## [31] otel_0.2.0              rstudioapi_0.19.0       httr_1.4.8             
## [34] tzdb_0.5.0              cachem_1.1.0            splines_4.5.2          
## [37] parallel_4.5.2          vctrs_0.7.3             Matrix_1.7-4           
## [40] jsonlite_2.0.0          hms_1.1.4               bit64_4.8.0            
## [43] systemfonts_1.3.2       crosstalk_1.2.2         jquerylib_0.1.4        
## [46] glue_1.8.1              leaflet.providers_3.0.0 ggstats_0.13.0         
## [49] stringi_1.8.7           gtable_0.3.6            pillar_1.11.1          
## [52] R6_2.6.1                textshaping_1.0.5       vroom_1.7.1            
## [55] evaluate_1.0.5          lattice_0.22-7          backports_1.5.1        
## [58] bslib_0.10.0            svglite_2.2.2           gridExtra_2.3          
## [61] nlme_3.1-168            mgcv_1.9-3              xfun_0.56              
## [64] pkgconfig_2.0.3
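
The reproduction note above relies on load_pkg() to install anything that is missing before attaching it. As a rough illustration only, a helper of that kind could look like the sketch below. The body, the repos argument and its default mirror are assumptions made for this example; the report's own definition of load_pkg() may differ.

```r
# Hypothetical sketch of a load_pkg()-style helper.
# The report's actual definition may differ; 'repos' and its default
# CRAN mirror are assumptions made for this example.
load_pkg <- function(pkgs, repos = "https://cloud.r-project.org") {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE rather than erroring when a
    # package is not installed, so it works as an availability check.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, repos = repos)
    }
    # Attach the package so its functions can be called without a prefix.
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage with packages that ship with R, so nothing is downloaded here.
load_pkg(c("stats", "utils"))
```

Checking with requireNamespace() before calling install.packages() means repeat knits do not reinstall packages that are already present, which keeps the package versions close to those listed in the session information.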

14 References

Citations follow APA style. The substantive criminological sources were verified against their original journals; the software references credit the R packages used to produce the analysis.

References

Braakmann, N. (2024). Temperature, crime and policing: Evidence from UK geocoded data. Center for Open Science.

Burt, S. (2024). The weather observer’s handbook. Cambridge University Press. https://books.google.co.ke/books?hl=en&lr=&id=AeAHEQAAQBAJ&oi=fnd&pg=PR9&dq=The+sec+meteorological+observations+for+the+same+year+from+a+weather+station+close+to+the+town,+including+temperature,+humidity,+wind,+atmospheric+pressure,+cloud+cover,+sunshine+and+precipitation&ots=Q7-1yb_xo5&sig=-Ip-ii_R8J4ROxQBzTTe1F9s_to&redir_esc=y#v=onepage&q&f=false

Cohen, F., & Gonzalez, F. (2024). Understanding the link between temperature and crime. American Economic Journal: Economic Policy, 16, 480-514. https://www.aeaweb.org/articles?id=10.1257/pol.20220118

Corcoran, J., & Zahnow, R. (2022). Weather and crime: A systematic review of the empirical literature. Crime Science, 11, 16. https://link.springer.com/article/10.1186/s40163-022-00179-8

Lynott, D., Corker, K., Connell, L., & O’Brien, K. (2023). The effects of temperature on prosocial and antisocial behaviour: A review and meta-analysis. British Journal of Social Psychology, 62, 1177-1214. https://bpspsychub.onlinelibrary.wiley.com/doi/full/10.1111/bjso.12626

Miles-Novelo, A., & Anderson, C. A. (2022). Climate change and human behavior: Impacts of a rapidly changing climate on human aggression and violence. Cambridge University Press. https://www.cambridge.org/core/elements/abs/climate-change-and-human-behavior/F64471FA47B8A6F5524E7DDDDE571D57

Park, J. H., Lee, D. K., Kang, H., Kim, J. H., Nahm, F. S., Ahn, E., In, J., Kwak, S. G., & Lim, C.-Y. (2022). The principles of presenting statistical results using figures. Korean Journal of Anesthesiology, 75, 139-150. https://synapse.koreamed.org/articles/1159843

Software and data sources

R Core Team (2026). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Wickham, H., et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC.
Cheng, J., Karambelkar, B., & Xie, Y. (2023). leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library. R package.
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. R package.
Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert Statistical Objects into Tidy Tibbles. R package.

Crime and Climate in Colchester (2025): An Exploratory Visual Analysis

MA304 — Data Visualisation | Final Project

MA304-2513541

02 July 2026