In this work, we examine two datasets: one contains the approximate locations of street-level crimes recorded in the UK, and the other contains meteorological data. For both, we focus on the Colchester area and try to identify any relationships or other interesting findings across the two datasets.
# Load dplyr (provides %>%, select, mutate, group_by, summarise) and read the data
library(dplyr)
dfc <- read.csv("crime25.csv", header = TRUE)
dft <- read.csv("temp25.csv", header = TRUE)
As previously mentioned, two different datasets will be examined in this study. One natural question that arises is whether weather conditions influence human behaviour, and in particular, whether people are more likely to commit crimes under certain conditions.
For instance, does a warm and pleasant day encourage more activity (and perhaps more opportunities for crime), or do criminals carry on regardless of whether it is sunny, cold, or raining? Rather than assuming a simple answer, this analysis lets the data guide the conclusions, exploring whether any meaningful relationships emerge. So, let’s see whether the lovely weather of Colchester actually has more benefits than we think.
Before exploring any possible relationship between crime and weather, both datasets were first inspected and cleaned. This stage was necessary to ensure that variables were stored in suitable formats, missing values were identified, and the data could later be aligned for comparison.
The crime dataset and the weather dataset were initially examined using summary and structure functions in R in order to understand the variables available in each file. Particular attention was given to date-related variables, since the later stages of the analysis require crime observations and daily weather measurements to be compared on a common time scale.
Using data manipulation tools, the datasets were then prepared by selecting the relevant variables, checking for incomplete observations, and creating any additional variables needed for analysis. This preprocessing stage provided the foundation for the visualisations and comparisons developed in the remainder of the report.
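The inspection step described above can be sketched as follows. This is only a minimal illustration on a small made-up data frame (the object `toy` and its values are hypothetical, not taken from the crime or weather files); it shows the kind of structure, summary, and missing-value checks referred to in the text, using base R only.

```r
# Hypothetical toy data standing in for one of the real files
toy <- data.frame(
  date     = c("2025-01", "2025-01", "2025-02", NA),
  category = c("burglary", "drugs", NA, "robbery"),
  stringsAsFactors = FALSE
)

str(toy)       # variable names and storage types
summary(toy)   # basic summary of each column

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of rows with no missing values at all
n_complete <- sum(complete.cases(toy))
n_complete
```

The same calls could be applied to dfc and dft directly once the files have been read.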
# Crime data: drop uninformative columns and parse "YYYY-MM" as a date
dfc_clean <- dfc %>%
  select(-X, -context) %>%
  mutate(date = as.Date(paste0(date, "-01")))
# Weather data: parse the daily date column
dft_clean <- dft %>%
  mutate(Date = as.Date(Date))
# Aggregate daily weather to monthly means
dft_monthly <- dft_clean %>%
  mutate(month = format(Date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(
    temp_avg = mean(TemperatureCAvg, na.rm = TRUE),
    temp_max = mean(TemperatureCMax, na.rm = TRUE),
    temp_min = mean(TemperatureCMin, na.rm = TRUE),
    precipitation = mean(Precmm, na.rm = TRUE),
    sunshine = mean(SunD1h, na.rm = TRUE),
    wind = mean(WindkmhInt, na.rm = TRUE)
  )
# Count recorded crimes per month
dfc_monthly <- dfc_clean %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(crime_count = n())
Before conducting the main analysis, both datasets were cleaned and prepared to ensure consistency and comparability.
The crime dataset contained observations recorded at a monthly level, while the weather dataset provided daily measurements. To allow a like-for-like comparison, the weather data was aggregated to a monthly level by computing the mean of the daily values for key variables such as temperature, precipitation, and sunshine duration. However, aggregating daily weather data into monthly averages may mask short-term variations and extreme conditions, meaning that potential relationships at a finer temporal scale may not be fully captured.
Data cleaning involved removing variables with no useful information, converting date variables into appropriate formats, and selecting relevant variables for analysis. In particular, the crime data was summarised into monthly counts, allowing it to be aligned with the aggregated weather data.
This preprocessing stage ensured that both datasets shared a common temporal structure, providing a suitable foundation for subsequent visualisations and analysis.
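The caveat about monthly averages masking extremes can be illustrated with a short sketch. The daily temperatures below are made up for illustration (they are not from the weather file), and the 30 °C threshold for a "hot day" is an arbitrary assumption. The two hypothetical months have the same mean but very different extremes, which is exactly the information a monthly mean discards.

```r
# Hypothetical daily temperatures for two months (values are made up)
daily <- data.frame(
  Date = as.Date(c("2025-07-01", "2025-07-02", "2025-07-03",
                   "2025-08-01", "2025-08-02", "2025-08-03")),
  temp = c(18, 19, 32, 22, 23, 24)
)
daily$month <- format(daily$Date, "%Y-%m")

# Monthly mean (as in the report) plus two summaries that keep the extremes
monthly_mean <- aggregate(temp ~ month, data = daily, FUN = mean)
monthly_max  <- aggregate(temp ~ month, data = daily, FUN = max)
hot_days     <- aggregate(temp ~ month, data = daily,
                          FUN = function(x) sum(x >= 30))  # assumed threshold

monthly <- data.frame(
  month     = monthly_mean$month,
  temp_mean = monthly_mean$temp,
  temp_peak = monthly_max$temp,
  hot_days  = hot_days$temp
)
monthly
```

Keeping a peak value or a count of extreme days alongside the mean is one possible way to retain some short-term information at monthly resolution.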
df_merged <- dfc_monthly %>%
left_join(dft_monthly, by = "month")
With left_join(), the two datasets are merged so that every month of crime data is kept and weather information is added wherever a matching month exists.
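Because a left join keeps crime months even when no weather record matches, it can be worth checking for unmatched months after merging. The sketch below uses two small made-up tables (`crime_toy` and `weather_toy` are hypothetical) and base R's merge() with all.x = TRUE, which behaves like a left join here; it is an illustration of the check, not part of the original analysis.

```r
# Hypothetical monthly tables (values are made up)
crime_toy <- data.frame(
  month = c("2025-01", "2025-02", "2025-03"),
  crime_count = c(10, 12, 9),
  stringsAsFactors = FALSE
)
weather_toy <- data.frame(
  month = c("2025-01", "2025-02"),
  temp_avg = c(4.1, 5.3),
  stringsAsFactors = FALSE
)

# Base R equivalent of a left join: keep every row of crime_toy
merged_toy <- merge(crime_toy, weather_toy, by = "month", all.x = TRUE)

# Months that have crime data but no matching weather record
unmatched <- merged_toy$month[is.na(merged_toy$temp_avg)]
unmatched
```

An empty result for the real data would confirm that every crime month found a weather match.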
# Quick summary to see the merged data
summary(df_merged)
##     month            crime_count       temp_avg         temp_max
##  Length:12          Min.   :408.0   Min.   : 3.455   Min.   : 6.319
##  Class :character   1st Qu.:460.5   1st Qu.: 6.945   1st Qu.:10.947
##  Mode  :character   Median :493.5   Median :10.687   Median :15.285
##                     Mean   :496.3   Mean   :11.053   Mean   :15.550
##                     3rd Qu.:531.2   3rd Qu.:15.059   3rd Qu.:20.527
##                     Max.   :598.0   Max.   :18.526   Max.   :24.006
##     temp_min          precipitation       sunshine          wind
##  Min.   : 0.04839   Min.   :0.1290   Min.   :1.827   Min.   :13.74
##  1st Qu.: 2.75829   1st Qu.:0.7726   1st Qu.:2.238   1st Qu.:14.71
##  Median : 5.69398   Median :1.2300   Median :6.180   Median :16.35
##  Mean   : 6.09279   Mean   :1.3140   Mean   :5.137   Mean   :15.93
##  3rd Qu.: 9.54333   3rd Qu.:1.5323   3rd Qu.:7.243   3rd Qu.:16.99
##  Max.   :12.70968   Max.   :3.3133   Max.   :8.703   Max.   :17.74
Frequency table of crime categories
# Frequency table of crime categories
crime_table <- dfc_clean %>%
count(category, sort = TRUE)
crime_table
## category n
## 1 violent-crime 2439
## 2 shoplifting 709
## 3 anti-social-behaviour 590
## 4 public-order 452
## 5 criminal-damage-arson 403
## 6 other-theft 348
## 7 vehicle-crime 291
## 8 drugs 197
## 9 burglary 140
## 10 bicycle-theft 102
## 11 other-crime 89
## 12 robbery 80
## 13 theft-from-the-person 63
## 14 possession-of-weapons 53
Before moving to the graphical analysis, it is useful to examine the frequency of crime categories directly. The table shows that crime is not evenly distributed across offence types, with some categories appearing far more often than others. This provides an initial indication that any overall relationship between weather and crime may be driven more strongly by certain types of offences than by others.
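To make the unevenness across offence types concrete, the counts can be turned into shares. The sketch below re-enters the counts from the frequency table above by hand (no new data is introduced) and uses base R only; the object names are illustrative.

```r
# Counts copied from the frequency table above
crime_counts <- c(
  "violent-crime" = 2439, "shoplifting" = 709,
  "anti-social-behaviour" = 590, "public-order" = 452,
  "criminal-damage-arson" = 403, "other-theft" = 348,
  "vehicle-crime" = 291, "drugs" = 197, "burglary" = 140,
  "bicycle-theft" = 102, "other-crime" = 89, "robbery" = 80,
  "theft-from-the-person" = 63, "possession-of-weapons" = 53
)

total_crimes <- sum(crime_counts)

# Share of all recorded crimes in each category, in percent
shares <- crime_counts / total_crimes
round(100 * shares, 1)

# Combined share of the five most frequent categories
top5_share <- sum(sort(shares, decreasing = TRUE)[1:5])
top5_share
```

With the real data frame, the same shares could be obtained from crime_table by dividing n by sum(n).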
Bar plot to show crime over time
library(ggplot2)
ggplot(df_merged, aes(x = month, y = crime_count)) +
  geom_col(fill = "steelblue") +
  labs(title = "Monthly Crime Counts in Colchester (2025)",
       x = "Month", y = "Number of Crimes") +
  theme_minimal()
To complement the monthly bar chart, a density plot was used to examine
the overall distribution of monthly crime counts.
ggplot(df_merged, aes(x = crime_count)) +
  geom_density(fill = "lightblue", alpha = 0.5) +
  labs(title = "Density Plot of Monthly Crime Counts",
       x = "Monthly Crime Count", y = "Density") +
  theme_minimal()
The density plot suggests that monthly crime counts are concentrated
around the middle range of observed values, with relatively few months
showing extremely low or extremely high totals. This indicates that
crime levels fluctuate across the year, but not in a completely
irregular way.
Temperature over time
ggplot(df_merged, aes(x = month, y = temp_avg, group = 1)) +
  geom_line(color = "red") +
  geom_point() +
  labs(title = "Average Monthly Temperature",
       x = "Month", y = "Temperature (°C)") +
  theme_minimal()
Precipitation over time
ggplot(df_merged, aes(x = month, y = precipitation, group = 1)) +
  geom_line(color = "blue") +
  geom_point() +
  labs(title = "Average Monthly Precipitation",
       x = "Month", y = "Rainfall (mm)") +
  theme_minimal()
Average monthly wind speed
ggplot(df_merged, aes(x = month, y = wind, group = 1)) +
  geom_line(color = "darkgreen") +
  geom_point() +
  labs(title = "Average Monthly Wind Speed",
       x = "Month", y = "Wind (km/h)") +
  theme_minimal()
Following the data preparation stage, the crime and weather datasets
were merged using a common monthly time variable, as stated earlier.
This allowed for a direct comparison between crime levels and average
weather conditions across the year.
An initial overview of the data was conducted to understand general patterns. Monthly crime counts were visualised to identify any noticeable fluctuations over time. At the same time, key weather variables, including temperature, precipitation, and wind speed, were plotted to observe seasonal trends. These initial visualisations provide a foundation for exploring potential relationships between crime and weather. In particular, they allow for a preliminary assessment of whether periods of higher crime coincide with specific weather conditions, such as warmer temperatures or lower rainfall.
An initial exploration of the data reveals several interesting patterns. For instance, monthly crime counts show a noticeable increase from the beginning of the year, rising from relatively low levels in January to a peak around late spring and summer. This might initially suggest that crime becomes more frequent as the weather improves, supporting the intuitive idea that people are more active during warmer months.
However, the relationship is not entirely straightforward. While crime appears to increase alongside temperature in the first half of the year, this pattern does not perfectly hold throughout. For example, although temperature reaches its peak in July, crime does not show a corresponding maximum at exactly the same point, and relatively high crime levels persist even as temperatures begin to decline later in the year. This suggests that temperature alone may not fully explain variations in crime.
When examining precipitation, no clear relationship with crime levels emerges. Rainfall fluctuates considerably throughout the year, with some of the lowest values observed in early spring and the highest toward late autumn. Despite this variation, crime levels do not appear to decrease consistently during wetter periods, indicating that rain alone is unlikely to be a strong deterrent.
Similarly, wind speed shows only minor variation across the year and does not display any obvious association with crime patterns. If anything, it seems that wind is largely ignored by both the weather system and, perhaps unsurprisingly, by those committing crimes.
Overall, these initial observations suggest that while weather conditions, particularly temperature, may have some influence on crime patterns, the relationship is not simple or uniform. This motivates a more detailed investigation into whether specific types of crime respond differently to weather conditions.
pairs(df_merged[, c("crime_count", "temp_avg", "precipitation", "wind")],
pch = 19, cex = 0.8)
As an initial step in examining the relationships among the main
variables, a pair plot was produced for crime count, temperature,
precipitation, and wind speed. This provides a compact overview of the
direction and strength of the associations. The plot suggests a clearer
positive relationship between crime and temperature than between crime
and either precipitation or wind, which appear much weaker and less
structured.
Crime vs Temperature
G_temp <- ggplot(df_merged, aes(x = temp_avg, y = crime_count)) +
  geom_point(color = "red", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Crime vs Temperature",
       x = "Average Temperature (°C)", y = "Crime Count") +
  theme_minimal()
library(plotly)
## Warning: package 'plotly' was built under R version 4.5.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
ggplotly(G_temp)
## `geom_smooth()` using formula = 'y ~ x'
Crime vs Rainfall
G_rain <- ggplot(df_merged, aes(x = precipitation, y = crime_count)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Crime vs Rainfall",
       x = "Precipitation (mm)", y = "Crime Count") +
  theme_minimal()
G_rain
## `geom_smooth()` using formula = 'y ~ x'
Crime vs Wind
G_wind <- ggplot(df_merged, aes(x = wind, y = crime_count)) +
  geom_point(color = "darkgreen", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Crime vs Wind Speed",
       x = "Wind (km/h)", y = "Crime Count") +
  theme_minimal()
G_wind
## `geom_smooth()` using formula = 'y ~ x'
# Keep only the variables used in the correlation matrix
data_corr <- df_merged %>%
  select(crime_count, temp_avg, precipitation, wind)
Corr <- cor(data_corr)
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.5.3
G_corr <- ggcorrplot(Corr, hc.order = TRUE, type = "lower", lab = TRUE)
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the ggcorrplot package.
## Please report the issue at <https://github.com/kassambara/ggcorrplot/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
G_corr
ggplotly(G_corr)
To further investigate the relationship between weather conditions and crime levels, scatter plots were used to compare monthly crime counts with key weather variables, including temperature, precipitation, and wind speed. Linear trend lines were added to highlight any underlying patterns.
The relationship between crime and temperature appears to be the most notable. The scatter plot shows a clear upward trend, suggesting that higher temperatures are associated with increased levels of crime. This observation is supported by the correlation analysis, which indicates a relatively strong positive relationship between temperature and crime. In simple terms, as the weather gets warmer, crime tends to increase, although, of course, temperature is unlikely to be the only factor at play.
In contrast, the relationship between crime and precipitation is weaker and appears to follow a slight negative trend. Higher levels of rainfall are generally associated with lower crime counts, although the effect is not particularly strong. This suggests that while rain might discourage certain activities (criminal or otherwise), its overall impact is limited.
Wind speed, on the other hand, shows almost no meaningful relationship with crime. The scatter plot reveals no clear pattern, and the correlation coefficient is very close to zero. It seems that, regardless of how windy it is, crime carries on more or less unaffected.
These findings suggest that temperature plays a more prominent role in influencing crime patterns than other weather variables considered in this analysis. However, the variability observed in the data indicates that weather alone does not fully explain crime behaviour, reinforcing the need to explore additional factors, such as crime type and location. It is also important to note that the analysis is based on only twelve monthly observations, which limits the strength and reliability of the statistical relationships identified. These results represent associations rather than causal relationships, and it cannot be concluded that changes in temperature directly cause changes in crime levels.
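The point about only twelve monthly observations can be made more tangible with a confidence interval for the correlation. The sketch below uses made-up monthly values (`temp_demo` and `crime_demo` are hypothetical and are not the report's data) and the base R function cor.test(); it is meant only to show how the uncertainty around a correlation from twelve points could be inspected.

```r
# Hypothetical monthly values (made up for illustration; not the report's data)
temp_demo  <- c(4, 5, 8, 10, 14, 17, 19, 18, 15, 11, 7, 5)
crime_demo <- c(41, 44, 47, 52, 58, 55, 57, 54, 53, 50, 46, 45)

# Pearson correlation with a 95% confidence interval
ct <- cor.test(temp_demo, crime_demo, method = "pearson")
ct$estimate   # sample correlation
ct$conf.int   # interval around it; typically wide when n is small

# Rank-based alternative, less sensitive to a single unusual month
rho <- cor(temp_demo, crime_demo, method = "spearman")
rho
```

Applied to df_merged, the same call would be cor.test(df_merged$temp_avg, df_merged$crime_count). Even a wide interval would still describe association only, not causation.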
# Monthly crime counts per category
dfc_cat_monthly <- dfc_clean %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month, category) %>%
  summarise(count = n(), .groups = "drop")

# Merge with weather
df_cat_merged <- dfc_cat_monthly %>%
  left_join(dft_monthly, by = "month")

# Five most frequent categories overall
top_categories <- dfc_cat_monthly %>%
  group_by(category) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total)) %>%
  slice(1:5) %>%
  pull(category)

# Filter Data
df_top <- df_cat_merged %>%
  filter(category %in% top_categories)
Plot by category
G_cat <- ggplot(df_top, aes(x = temp_avg, y = count)) +
  geom_point(color = "darkred") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  facet_wrap(~category, scales = "free_y") +
  labs(title = "Crime vs Temperature by Category",
       x = "Temperature (°C)", y = "Crime Count") +
  theme_minimal()
Fig_cat <- ggplotly(G_cat)
## `geom_smooth()` using formula = 'y ~ x'
Fig_cat
Interactive versions of selected figures are embedded directly in the HTML version of this report. These can be explored by hovering over points, zooming into regions of interest, and inspecting individual observations more closely.
Boxplots
ggplot(df_top, aes(x = category, y = count, fill = category)) +
  geom_boxplot() +
  labs(title = "Distribution of Crime Counts by Category",
       x = "Crime Type", y = "Count") +
  theme_minimal()
To continue investigating whether weather affects different types of crime in distinct ways, the analysis was extended by examining crime counts by category. The results provide a much clearer and more nuanced picture than the aggregate analysis alone.
The relationship between temperature and crime varies noticeably across categories. Violent crime shows the strongest association with temperature, with a clear upward trend indicating that incidents increase as conditions become warmer. This pattern is particularly pronounced compared to other categories, suggesting that this type of crime may be more sensitive to changes in environmental conditions.
Shoplifting also exhibits a noticeable positive relationship with temperature, although the trend is less steep than that observed for violent crime. This may reflect increased activity in public and commercial spaces during warmer periods, leading to more opportunities for such offences.
Other categories, such as anti-social behaviour and public order offences, display weaker positive trends. While there is some indication that these crimes become more frequent in warmer weather, the relationship is less consistent and shows greater variability.
In contrast, criminal damage and arson show little to no clear relationship with temperature. The data points are more scattered, and the trend line suggests only a weak association, indicating that these offences are likely influenced by factors other than weather conditions.
These findings demonstrate that the previously observed relationship between temperature and total crime is largely driven by specific categories rather than being a universal effect. In other words, while warmer weather may coincide with higher levels of certain types of crime, it does not affect all criminal activity in the same way.
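The visual comparison across categories could be backed by a correlation computed separately for each category. The sketch below uses a small made-up long-format table (`cat_toy`, with hypothetical categories "type-a" and "type-b") and base R only; it illustrates the per-group calculation and does not reproduce the report's figures.

```r
# Hypothetical long-format data: one row per month and category (made up)
cat_toy <- data.frame(
  category = rep(c("type-a", "type-b"), each = 6),
  temp_avg = rep(c(4, 8, 12, 16, 18, 10), times = 2),
  count    = c(20, 24, 29, 33, 36, 26,   # rises with temperature
               15, 14, 16, 15, 14, 16),  # roughly flat
  stringsAsFactors = FALSE
)

# Correlation between temperature and count within each category
cor_by_cat <- sapply(
  split(cat_toy, cat_toy$category),
  function(d) cor(d$temp_avg, d$count)
)
cor_by_cat
```

With the real data, the same pattern could be applied to df_top; each coefficient would again rest on only twelve monthly points per category.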
Crime over time (with smoothing)
ggplot(df_merged, aes(x = as.Date(paste0(month, "-01")), y = crime_count)) +
  geom_line(color = "steelblue") +
  geom_point() +
  geom_smooth(se = FALSE, color = "black") +
  labs(title = "Crime Trends Over Time",
       x = "Month", y = "Crime Count") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Temperature over time (with smoothing)
ggplot(df_merged, aes(x = as.Date(paste0(month, "-01")), y = temp_avg)) +
  geom_line(color = "red") +
  geom_point() +
  geom_smooth(se = FALSE, color = "black") +
  labs(title = "Temperature Trends Over Time",
       x = "Month", y = "Temperature (°C)") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
To examine whether the observed relationship between temperature and
crime reflects a direct influence or a broader seasonal pattern, time
series plots were analysed for both variables.
Both crime counts and temperature display a clear seasonal trend throughout the year. Crime levels increase from the beginning of the year, reaching a peak in late spring, before gradually declining toward the end of the year. Temperature follows a similar overall pattern, rising steadily to a peak during the summer months and then decreasing thereafter.
However, a closer inspection reveals that the two variables do not align perfectly. While temperature reaches its highest point in mid-summer, crime peaks earlier in the year, around late spring. This mismatch suggests that although temperature and crime follow a broadly similar seasonal pattern, temperature alone does not fully explain the timing of crime fluctuations.
This finding indicates that the apparent relationship between temperature and crime may be partially driven by shared seasonal dynamics rather than a direct causal effect. In other words, both variables are influenced by time, but they do not move in perfect synchrony.
Overall, the time series analysis reinforces the idea that while temperature is associated with crime levels, it is unlikely to be the sole determining factor. Other influences, such as social behaviour, routine activity patterns, or location-specific factors, are likely to play an important role in shaping crime trends over time.
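The mismatch in timing between the two series can also be checked numerically by locating the peak month of each. The sketch below uses made-up monthly series (`crime_ts` and `temp_ts` are hypothetical, chosen so that the peaks differ) and base R's which.max(); it illustrates the comparison rather than reporting the actual peaks.

```r
# Hypothetical monthly series, January to December (made up)
months   <- month.abb
crime_ts <- c(40, 43, 47, 52, 58, 56, 55, 53, 51, 48, 45, 42)
temp_ts  <- c(4, 5, 8, 11, 14, 17, 19, 18, 15, 11, 7, 5)

# Month in which each series reaches its maximum
peak_crime <- months[which.max(crime_ts)]
peak_temp  <- months[which.max(temp_ts)]

# Number of months by which the temperature peak follows the crime peak
lag_months <- which.max(temp_ts) - which.max(crime_ts)

c(crime = peak_crime, temperature = peak_temp)
lag_months
```

For the real data, df_merged$crime_count and df_merged$temp_avg would take the place of the toy vectors, provided the rows are in calendar order.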
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.5.3
leaflet(dfc_clean) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = 3,
    color = "red",
    stroke = FALSE,
    fillOpacity = 0.5
  )
To complement the analysis, the crime incidents were plotted on a Leaflet map according to their recorded locations. The map gives an overview of how crime is distributed across Colchester. Incidents are heavily concentrated in specific areas, particularly the town centre and other urban areas, while the outskirts have fewer incidents.
This highlights an aspect of the data that the previous visualisations could not show. Although the earlier results pointed to an association between crime and weather, particularly temperature, the map indicates that location also plays an important role: where crimes occur appears to depend on spatial factors rather than on weather. This is consistent with the earlier conclusion that crime in an area is unlikely to be attributable to a single factor and more plausibly reflects the combined effect of several. It should also be recognised that the map represents raw incident locations without accounting for underlying population density or exposure, meaning that higher concentrations may partly reflect areas with greater activity rather than inherently higher crime risk.
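The concentration visible on the map could be quantified by counting incidents per grid cell. The sketch below uses made-up coordinates (`pts` is hypothetical, with points placed roughly around Colchester) and an assumed grid defined by rounding latitude and longitude to two decimal places. These are raw counts, so the caveat about population density and exposure applies equally; normalising them would need an additional dataset that is not part of this report.

```r
# Hypothetical incident coordinates (made up, roughly around Colchester)
pts <- data.frame(
  lat  = c(51.8891, 51.8893, 51.8897, 51.8952, 51.8702),
  long = c(0.9012, 0.9018, 0.9004, 0.9105, 0.8801)
)

# Assign each incident to a grid cell by rounding to 2 decimal places
# (the cell size is an arbitrary choice for illustration)
pts$cell <- paste(round(pts$lat, 2), round(pts$long, 2), sep = ", ")

# Incidents per cell, busiest cell first
cell_counts <- sort(table(pts$cell), decreasing = TRUE)
cell_counts
```

With the real data, dfc_clean$lat and dfc_clean$long would replace the toy coordinates.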
This analysis was designed with the aim of determining whether weather conditions, specifically temperature, have any bearing on crime trends in Colchester in 2025. Through this analysis, a number of notable findings were established.
The analysis indicates that temperature has a positive relationship with crime trends in general, with higher crime levels generally observed during warmer periods. On the other hand, precipitation and wind levels were found to have little bearing on crime trends, with little to no relationship observed.
In a general sense, this would seem to support the common notion that good weather means more activity, which in turn means more crime. However, the analysis by category indicates that this is not uniformly the case: certain types of crime have a stronger relationship with temperature than others, while some show little to no relationship at all.
Additionally, the time series analysis shows the effect of seasonality. Temperature and crime counts follow a broadly similar pattern throughout the year, but the two variables do not reach their peaks at the same time. This indicates that the relationship between them might be explained by general seasonal effects rather than by a direct link.
Finally, the spatial visualisation provides yet another perspective. Crime incidents are concentrated in particular areas, mostly in and around the town centre. This indicates that geographical and environmental factors might be very important; location may be as important as the weather, if not more so, in explaining crime levels. For future work, it could be beneficial to use a crime dataset with daily observations rather than monthly data, in order to allow for a more detailed comparison with the daily weather measurements.
In conclusion, the weather, and temperature in particular, is associated with crime levels in this dataset. Nevertheless, it is not the only factor. Crime levels are better explained by a combination of factors, including time, location, and the type of crime. In other words, while a warm day might create the conditions for certain types of crime to increase, it is far from the only piece of the puzzle. This analysis highlights the importance of combining exploratory techniques with critical interpretation, as apparent patterns in the data may reflect underlying structural factors rather than direct causal effects.