NYPD Arrest Rate vs Weather Variables

Author

Rashad Long

Introduction

The NYPD Arrests datasets provide a detailed record of arrests in New York City from 2006 onward. They serve as a valuable resource for understanding crime patterns and trends within the city. The data cover a wide range of information, including the demographic details of individuals arrested, the types of crimes committed, and the locations where arrests occurred.

Investigating the correlation between arrest rates and weather variables can yield insights that inform decisions across several domains. This analysis has the potential to influence public policy, policing strategies, and the allocation of public resources.

Data

NYPD Arrests Data (Historic) - A breakdown of every arrest effected in NYC by the NYPD from 2006 through the end of the previous calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents one arrest and includes information about the type of crime and the location and time of enforcement.

NYPD Arrest Data (Year to Date) - This is a breakdown of every arrest effected in NYC by the NYPD during the current year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, the location and time of enforcement.

Weather Data - Visual Crossing Weather is a low-cost, easy-to-use source of historical and forecast weather data. Its Weather API is designed to integrate easily into any app or code, and the data is used daily by a diverse customer base including business analysts, data scientists, insurance professionals, energy producers, construction planners, and academics. For this analysis, the API supplies daily weather for New York City from 2006-01-01 through 2024-05-04.

# Install required packages if not already available
if (!require("readr"))
  install.packages("readr")
library(readr)

if (!require("tidyverse"))
  install.packages("tidyverse")
library(tidyverse)

if (!require("infer"))
  install.packages("infer")
library(infer)

if (!require("ggplot2"))
  install.packages("ggplot2")
library(ggplot2)

if (!require("RSocrata"))
  install.packages("RSocrata")
library(RSocrata)

if (!require("httr"))
  install.packages("httr")
library(httr)

Import Data

First I load the data into data frames: the two NYPD datasets (historic, then year to date), followed by the weather data from the Visual Crossing API.

  arrest_key arrest_date pd_cd   law_code law_cat_cd arrest_boro
1  236791704  2021-11-22   581 PL 2225001          M           M
2  237354740  2021-12-04   153 PL 1302502          F           B
3  236081433  2021-11-09   681 PL 2601001          M           Q
4   32311380  2007-06-18   511 PL 2200300          M           Q
5  192799737  2019-01-26   177 PL 1306503          F           M
6  193260691  2019-02-06  <NA> PL 2203400          F           M
  arrest_precinct jurisdiction_code age_group perp_sex      perp_race
1              28                 0     45-64        M          BLACK
2              41                 0     25-44        M WHITE HISPANIC
3             113                 0     25-44        M          BLACK
4              27                 1     18-24        M          BLACK
5              25                 0     45-64        M          BLACK
6              14                 0     25-44        M        UNKNOWN
  x_coord_cd y_coord_cd           latitude           longitude lon_lat.type
1     997427     230378 40.799008797000056  -73.95240854099995        Point
2    1013232     236725 40.816391847000034  -73.89529641399997        Point
3    1046367     186986  40.67970040800003  -73.77604736799998        Point
4       <NA>       <NA>               <NA>                <NA>         <NA>
5    1000555     230994 40.800694331000045 -73.941109285999971        Point
6     986685     215375 40.757839003000072 -73.991212110999982        Point
  lon_lat.coordinates                            pd_desc ky_cd       ofns_desc
1 -73.95241, 40.79901                               <NA>  <NA>            <NA>
2 -73.89530, 40.81639                             RAPE 3   104            RAPE
3 -73.77605, 40.67970         CHILD, ENDANGERING WELFARE   233      SEX CRIMES
4                NULL CONTROLLED SUBSTANCE, POSSESSION 7   235 DANGEROUS DRUGS
5 -73.94111, 40.80069                       SEXUAL ABUSE   116      SEX CRIMES
6 -73.99121, 40.75784                               <NA>  <NA>            <NA>
  arrest_key arrest_date pd_cd                        pd_desc ky_cd
1  279764521  2024-01-01   109       ASSAULT 2,1,UNCLASSIFIED   106
2  279766985  2024-01-01   905    INTOXICATED DRIVING,ALCOHOL   347
3  279766989  2024-01-01   922 TRAFFIC,UNCLASSIFIED MISDEMEAN   348
4  279767308  2024-01-01   918               RECKLESS DRIVING   348
5  279767573  2024-01-01   109       ASSAULT 2,1,UNCLASSIFIED   106
6  279768912  2024-01-01   268             CRIMINAL MIS 2 & 3   121
                       ofns_desc   law_code law_cat_cd arrest_boro
1                 FELONY ASSAULT PL 1200501          F           Q
2 INTOXICATED & IMPAIRED DRIVING VTL11920U2          M           M
3       VEHICLE AND TRAFFIC LAWS VTL0511001          M           S
4       VEHICLE AND TRAFFIC LAWS VTL1212000          M           K
5                 FELONY ASSAULT PL 1200502          F           B
6 CRIMINAL MISCHIEF & RELATED OF PL 1450502          F           Q
  arrest_precinct jurisdiction_code age_group perp_sex      perp_race
1             114                 0     45-64        M          WHITE
2               6                 0     25-44        M          BLACK
3             120                 0     18-24        M          BLACK
4              76                 0     25-44        M          BLACK
5              41                 0       <18        M BLACK HISPANIC
6             111                 0     18-24        M          WHITE
  x_coord_cd y_coord_cd           latitude          longitude
1    1002279     222018          40.776047         -73.934905
2     981400     206384 40.733152570631795 -74.01028350206072
3     964114     166065  40.62246362389224 -74.07253516606062
4     984493     190398  40.68927525699562 -73.99912377311799
5    1015773     240835  40.82765564227593 -73.88609567466663
6    1047645     217681          40.763933         -73.771149
  geocoded_column.type geocoded_column.coordinates
1                Point         -73.93491, 40.77605
2                Point         -74.01028, 40.73315
3                Point         -74.07254, 40.62246
4                Point         -73.99912, 40.68928
5                Point         -73.88610, 40.82766
6                Point         -73.77115, 40.76393
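# Read the Visual Crossing API key from an environment variable instead of hard-coding it
# (the variable name VISUAL_CROSSING_API_KEY is an assumption; set it in your .Renviron)
api_key <- Sys.getenv("VISUAL_CROSSING_API_KEY")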

# Base URL for the weather data API
base_url <- "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/"

# Define location and date range
location <- "New%20York%20City%2CUSA"
start_date <- "2006-01-01"
end_date <- "2024-05-04"

# Build the API request URL
url <- paste0(
  base_url,
  location,
  "/",
  start_date,
  "/",
  end_date,
  "?unitGroup=metric&include=days&key=",
  api_key,
  "&contentType=csv"
)

# Send GET request and handle errors
response <- GET(url)

if (status_code(response) != 200) {
  stop(paste0("Unexpected status code: ", status_code(response)))
}

# Read CSV data from response content
weather_data <- read_csv(response$content)

head(weather_data)
# A tibble: 6 × 33
  name      datetime   tempmax tempmin  temp feelslikemax feelslikemin feelslike
  <chr>     <date>       <dbl>   <dbl> <dbl>        <dbl>        <dbl>     <dbl>
1 New York… 2006-01-01     5.8    -0.1   2.9          5.6         -2.5       2  
2 New York… 2006-01-02     7.6     3.1   5            6.9          0.5       3.4
3 New York… 2006-01-03     4.5     1.9   2.8          0.3         -3.5      -2.3
4 New York… 2006-01-04     2.9    -1.7   1.1          0.9         -6.8      -2.4
5 New York… 2006-01-05     9.9     2.7   5.8          7.2          0.8       3.8
6 New York… 2006-01-06     5.4    -1     3.2          5.4         -5.8      -0.1
# ℹ 25 more variables: dew <dbl>, humidity <dbl>, precip <dbl>,
#   precipprob <dbl>, precipcover <dbl>, preciptype <chr>, snow <dbl>,
#   snowdepth <dbl>, windgust <dbl>, windspeed <dbl>, winddir <dbl>,
#   sealevelpressure <dbl>, cloudcover <dbl>, visibility <dbl>,
#   solarradiation <dbl>, solarenergy <dbl>, uvindex <dbl>, severerisk <dbl>,
#   sunrise <dttm>, sunset <dttm>, moonphase <dbl>, conditions <chr>,
#   description <chr>, icon <chr>, stations <chr>
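
The `location` string in the request above is already percent-encoded by hand. As a small sketch, base R's `URLencode()` can produce the same encoding from a readable place name. The variable names `place`, `encoded`, and `request_path` are illustrative and not part of the original analysis.

```r
# Illustrative only: derive the percent-encoded location from a readable name.
place <- "New York City,USA"

# reserved = TRUE also encodes reserved characters such as the comma
encoded <- URLencode(place, reserved = TRUE)

# Assemble the path portion of the request the same way the report does
request_path <- paste0(encoded, "/", "2006-01-01", "/", "2024-05-04")
print(request_path)
```

Encoding the name programmatically makes it easier to change the location later without editing escape sequences by hand.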

Tidy and Manipulate Data

With the data loaded, we can begin cleaning and reshaping it, starting with the NYPD datasets and then moving on to the weather dataset.

NYPD Datasets

  • Combined historic and year to date arrest data
  • Made all column names lowercase
  • Removed every column except arrest_date, arrest_boro, age_group, perp_sex, and perp_race
  • Changed name of arrest_date column to date
  • Adjusted the arrest_boro column to display the full names
  • Set date column as date object
# Combine historic and year to date arrest data
arrest_data <- bind_rows(arrest_data_ytd, historic_arrest_data)

# Make all column names lowercase
colnames(arrest_data) <- tolower(colnames(arrest_data))

# Remove every column except arrest_date, arrest_boro, age_group, perp_sex, and perp_race
arrest_data <-
  arrest_data |> select(arrest_date, arrest_boro, age_group, perp_sex, perp_race)

# Change name of arrest_date column to date
colnames(arrest_data)[colnames(arrest_data) == "arrest_date"] <-
  "date"

# Adjust the arrest_boro column to display the full names
arrest_data$arrest_boro <- case_when(
  arrest_data$arrest_boro == "M" ~ "Manhattan",
  arrest_data$arrest_boro == "B" ~ "Bronx",
  arrest_data$arrest_boro == "K" ~ "Brooklyn",
  arrest_data$arrest_boro == "Q" ~ "Queens",
  arrest_data$arrest_boro == "S" ~ "Staten Island",
  TRUE ~ arrest_data$arrest_boro
)

# Set date column as date object (the values are already in year-month-day form,
# so no format string is needed)
arrest_data$date <- as.Date(arrest_data$date)

head(arrest_data)
        date   arrest_boro age_group perp_sex      perp_race
1 2024-01-01        Queens     45-64        M          WHITE
2 2024-01-01     Manhattan     25-44        M          BLACK
3 2024-01-01 Staten Island     18-24        M          BLACK
4 2024-01-01      Brooklyn     25-44        M          BLACK
5 2024-01-01         Bronx       <18        M BLACK HISPANIC
6 2024-01-01        Queens     18-24        M          WHITE
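
The borough recoding above can also be expressed as a lookup table. The following is a minimal base R sketch on toy input, assuming the same five codes as the report; the names `boro_names`, `codes`, and `full_names` are illustrative, and "X" is a made-up code used to show the fallback.

```r
# Named lookup vector: borough code -> full name (same mapping as the report)
boro_names <- c(M = "Manhattan", B = "Bronx", K = "Brooklyn",
                Q = "Queens", S = "Staten Island")

# Toy input; "X" stands in for an unexpected code
codes <- c("Q", "M", "S", "K", "B", "X")

# Look up each code; codes without a mapping come back as NA
full_names <- unname(boro_names[codes])

# Keep the original value where no mapping exists (mirrors the TRUE ~ fallback)
unmapped <- is.na(full_names)
full_names[unmapped] <- codes[unmapped]
print(full_names)
```

A lookup vector keeps the mapping in one place, which can be convenient if the same codes need translating in more than one data frame.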

Weather data

  • Removed unneeded columns
  • Converted temperatures to Fahrenheit
  • Converted the precip column to inches
  • Renamed the datetime column to date
  • Created a new variable called “isRain”
  • Removed the conditions column
  • Set date column as date object
# Remove unneeded columns
weather_data <-
  weather_data |>  select(datetime, temp, precip, humidity, conditions)
head(weather_data)
# A tibble: 6 × 5
  datetime    temp precip humidity conditions            
  <date>     <dbl>  <dbl>    <dbl> <chr>                 
1 2006-01-01   2.9   1.76     76.1 Snow, Rain, Overcast  
2 2006-01-02   5    11.1      75   Rain, Partially cloudy
3 2006-01-03   2.8  31.3      88   Rain, Overcast        
4 2006-01-04   1.1   0        69.7 Partially cloudy      
5 2006-01-05   5.8   1.28     75.9 Rain, Partially cloudy
6 2006-01-06   3.2   0        60.3 Overcast              
# Convert temperatures to Fahrenheit
weather_data$temp <- (weather_data$temp * 9 / 5) + 32

# Convert precip variable to inches
weather_data$precip <- round(weather_data$precip * 0.0393701, 3)

# Adjust variable names
colnames(weather_data)[colnames(weather_data) == "datetime"] <-
  "date"

# Create new variable called "isRain"
weather_data$isRain <-
  grepl("rain", weather_data$conditions, ignore.case = TRUE)

# Remove conditions column
weather_data <- weather_data |> select(-conditions)

# Ensure the date column is a Date object (readr already parsed it as a date)
weather_data$date <- as.Date(weather_data$date)

head(weather_data)
# A tibble: 6 × 5
  date        temp precip humidity isRain
  <date>     <dbl>  <dbl>    <dbl> <lgl> 
1 2006-01-01  37.2  0.069     76.1 TRUE  
2 2006-01-02  41    0.435     75   TRUE  
3 2006-01-03  37.0  1.23      88   TRUE  
4 2006-01-04  34.0  0         69.7 FALSE 
5 2006-01-05  42.4  0.051     75.9 TRUE  
6 2006-01-06  37.8  0         60.3 FALSE 
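
As a quick check of the conversions above, the sketch below applies the same formulas to two rows copied from the printed weather table (2006-01-01 and 2006-01-04). The data frame `toy_weather` and the output names are illustrative only.

```r
# Two rows from the weather table as printed in the report (metric units)
toy_weather <- data.frame(
  temp = c(2.9, 1.1),        # degrees Celsius
  precip = c(1.76, 0),       # millimetres
  conditions = c("Snow, Rain, Overcast", "Partially cloudy"),
  stringsAsFactors = FALSE
)

# Celsius to Fahrenheit
temp_f <- toy_weather$temp * 9 / 5 + 32

# Millimetres to inches, rounded to three decimals
precip_in <- round(toy_weather$precip * 0.0393701, 3)

# TRUE when the conditions text mentions rain (case-insensitive)
is_rain <- grepl("rain", toy_weather$conditions, ignore.case = TRUE)

print(data.frame(temp_f, precip_in, is_rain))
```

The results (37.22°F and 0.069 in for the first row, 33.98°F and 0 in for the second) agree with the converted table shown above.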

The final step is to join the two datasets using the date column as the key. The combined dataset is the foundation for the subsequent analysis, allowing us to explore potential relationships between arrest rates and weather conditions.

# Left join
combined_data <- left_join(arrest_data, weather_data, by = "date")
head(combined_data[order(combined_data$date), ], 20)
             date   arrest_boro age_group perp_sex      perp_race  temp precip
998703 2006-01-01      Brooklyn     25-44        M          BLACK 37.22  0.069
998716 2006-01-01     Manhattan     18-24        M WHITE HISPANIC 37.22  0.069
998720 2006-01-01        Queens     18-24        F WHITE HISPANIC 37.22  0.069
998847 2006-01-01      Brooklyn       <18        M          BLACK 37.22  0.069
998874 2006-01-01     Manhattan     25-44        M          BLACK 37.22  0.069
998878 2006-01-01     Manhattan       <18        M          WHITE 37.22  0.069
998885 2006-01-01      Brooklyn     18-24        M WHITE HISPANIC 37.22  0.069
998905 2006-01-01      Brooklyn     18-24        M          BLACK 37.22  0.069
998907 2006-01-01        Queens     18-24        M          BLACK 37.22  0.069
998931 2006-01-01         Bronx     25-44        F          BLACK 37.22  0.069
998951 2006-01-01     Manhattan     25-44        M          BLACK 37.22  0.069
999012 2006-01-01      Brooklyn     25-44        M          BLACK 37.22  0.069
999014 2006-01-01      Brooklyn     45-64        M          BLACK 37.22  0.069
999118 2006-01-01 Staten Island       <18        F          WHITE 37.22  0.069
999137 2006-01-01         Bronx     25-44        M WHITE HISPANIC 37.22  0.069
999171 2006-01-01     Manhattan     45-64        M          WHITE 37.22  0.069
999181 2006-01-01         Bronx     18-24        M WHITE HISPANIC 37.22  0.069
999199 2006-01-01        Queens     25-44        M WHITE HISPANIC 37.22  0.069
999223 2006-01-01      Brooklyn     18-24        M          BLACK 37.22  0.069
999272 2006-01-01     Manhattan     18-24        M          WHITE 37.22  0.069
       humidity isRain
998703     76.1   TRUE
998716     76.1   TRUE
998720     76.1   TRUE
998847     76.1   TRUE
998874     76.1   TRUE
998878     76.1   TRUE
998885     76.1   TRUE
998905     76.1   TRUE
998907     76.1   TRUE
998931     76.1   TRUE
998951     76.1   TRUE
999012     76.1   TRUE
999014     76.1   TRUE
999118     76.1   TRUE
999137     76.1   TRUE
999171     76.1   TRUE
999181     76.1   TRUE
999199     76.1   TRUE
999223     76.1   TRUE
999272     76.1   TRUE
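
After a left join, arrest rows whose date has no weather record end up with NA in the weather columns. The sketch below shows one way to count such rows, written in base R with `merge()` so it runs without extra packages. The toy data frames and their values are hypothetical stand-ins for `arrest_data` and `weather_data`.

```r
# Toy stand-ins for arrest_data and weather_data (hypothetical values)
arrests <- data.frame(
  date = as.Date(c("2024-05-03", "2024-05-04", "2024-05-05"))
)
weather <- data.frame(
  date = as.Date(c("2024-05-03", "2024-05-04")),
  isRain = c(FALSE, TRUE)
)

# all.x = TRUE keeps every arrest row, like left_join()
joined <- merge(arrests, weather, by = "date", all.x = TRUE)

# Rows without a weather match carry NA in the weather columns
n_unmatched <- sum(is.na(joined$isRain))
print(n_unmatched)
```

A nonzero count here would explain why later steps need to drop rows with a missing `isRain` value.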

To facilitate our exploratory analysis, we will construct a new data-frame that aggregates daily arrest counts from the combined dataset.

# Aggregate the data by date
total_arrest_data <- combined_data |>
  group_by(date) |>
  mutate(arrests = n())

total_arrest_data <- total_arrest_data |>
  distinct(date, .keep_all = TRUE)
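
The aggregation above keeps the demographic columns of whichever arrest happens to come first on each date, even though they no longer describe the daily row. As an alternative sketch, the daily table can be built from counts plus the per-date weather values only. This version uses base R on hypothetical toy data; with dplyr, `count(combined_data, date, name = "arrests")` would give the same counts.

```r
# Toy stand-in for combined_data: one row per arrest (hypothetical values)
combined <- data.frame(
  date = as.Date(c("2006-01-01", "2006-01-01", "2006-01-01",
                   "2006-01-02", "2006-01-02")),
  temp = c(37.22, 37.22, 37.22, 41, 41),
  isRain = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Count rows per date; each row is one arrest
daily_counts <- aggregate(list(arrests = combined$date),
                          by = list(date = combined$date),
                          FUN = length)

# One weather row per date (weather values repeat within a date)
daily_weather <- unique(combined[, c("date", "temp", "isRain")])

# Daily table: date, arrest count, and that day's weather
daily <- merge(daily_counts, daily_weather, by = "date")
print(daily)
```

The resulting table has exactly one row per date and only columns that are meaningful at the daily level.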

Exploratory Data Analysis

Precipitation vs Average Daily Arrests

Summary of Average Daily Arrests

As the summary shows, daily arrests average about 869 (median 812), with a minimum of 91 and a maximum of 1,773.

# Summary Arrests
summary(total_arrest_data$arrests)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   91.0   598.0   812.0   868.6  1171.0  1773.0 

Visualization of the summary of daily arrests

# Visualize the summary of daily arrests
total_arrest_data |>
  ggplot(aes(x = arrests)) +
  geom_boxplot(fill = 'blue', color = 'black') +
  labs(title = "Distribution of Daily Arrests",
       x = "Arrests",
       y = NULL) +
  theme_minimal()

The side-by-side box plots suggest that daily arrests tend to be higher on days without rain than on days with rain.

# Side-by-side box plots of daily arrests by rain status
total_arrest_data |>
  ggplot(aes(x = isRain, y = arrests, fill = isRain)) +
  geom_boxplot() +
  labs(title = "Daily Arrests by Rain",
       x = "Rain",
       y = "Daily Arrests") +
  theme_minimal()

We can also calculate the average daily arrests and the total arrests for rainy and non-rainy days.

# Calculate the average daily arrests for rainy and non-rainy days
total_arrest_data |>
  group_by(isRain) |>
  summarise(avg_arrests = mean(arrests, na.rm = TRUE))
# A tibble: 2 × 2
  isRain avg_arrests
  <lgl>        <dbl>
1 FALSE         909.
2 TRUE          822.
# Calculate total arrests for rainy and non-rainy days
total_arrest_data |>
  group_by(isRain) |>
  summarise(total_arrests = sum(arrests, na.rm = TRUE))
# A tibble: 2 × 2
  isRain total_arrests
  <lgl>          <int>
1 FALSE        3260994
2 TRUE         2528149

Null Hypothesis (H₀): There is no difference in the average daily arrests between when it rains and when it does not rain.

Alternative Hypothesis (H₁): There is a difference in the average daily arrests between when it rains and when it does not rain.

# Conduct hypothesis test to determine if there is a significant difference in the average daily arrests when it rains and when it does not rain
obs_diff <- total_arrest_data |> 
  drop_na(isRain) |> 
  specify(arrests ~ isRain) |> 
  calculate(stat = "diff in means", order = c("TRUE", "FALSE"))

# Simulate
null_dist <- total_arrest_data |> 
  drop_na(isRain) |> 
  specify(arrests ~ isRain) |> 
  hypothesize(null = "independence") |> 
  generate(reps = 1000, type = "permute") |> 
  calculate(stat = "diff in means", order = c("TRUE", "FALSE"))

# Visualize
ggplot(data = null_dist, aes(x = stat)) +
  geom_histogram(bins = 20, color = "#006BB6", fill = "#F58426", linewidth =1.5) +
  labs(title = "Distribution of Test Statistic (Null Hypothesis)",
       x = "Test Statistic",
       y = "Frequency") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.margin = margin(unit = "points", b = 15, l = 15, r = 15, t = 10)
  )

# Get p-value
null_dist |>
  get_p_value(obs_stat = obs_diff, direction = "two_sided")
# A tibble: 1 × 1
  p_value
    <dbl>
1       0

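A reported p-value of 0 means that none of the 1,000 permuted differences was as extreme as the observed difference, so the true p-value is very small (below roughly 1 in 1,000) rather than exactly zero. This is strong evidence against the null hypothesis: the observed gap of roughly 87 fewer arrests per day on rainy days (about 822 versus 909) is unlikely to be due to random chance. In other words, the data provide strong evidence that rain is associated with lower daily arrest counts.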
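
Because a simulation-based p-value of exactly 0 only reflects the number of permutations, a common convention is to report (count + 1) / (reps + 1) instead. The sketch below illustrates this with a hand-rolled permutation test in base R on hypothetical toy data. It is not the computation that infer performs internally, and all names and values are illustrative.

```r
set.seed(42)  # reproducible shuffles

# Toy daily counts (hypothetical values), with rainy days shifted lower
rain <- rep(c(TRUE, FALSE), each = 50)
arrests <- c(rnorm(50, mean = 820, sd = 50), rnorm(50, mean = 910, sd = 50))

# Observed difference in means (rain minus no rain)
obs <- mean(arrests[rain]) - mean(arrests[!rain])

# Null distribution: shuffle the rain labels and recompute the difference
reps <- 1000
perm <- replicate(reps, {
  shuffled <- sample(rain)
  mean(arrests[shuffled]) - mean(arrests[!shuffled])
})

# Count permutations at least as extreme as the observed statistic
n_extreme <- sum(abs(perm) >= abs(obs))

p_naive <- n_extreme / reps                  # can be exactly 0
p_adjusted <- (n_extreme + 1) / (reps + 1)   # never 0: treats the observed data as one permutation

print(c(naive = p_naive, adjusted = p_adjusted))
```

With 1,000 permutations the adjusted value cannot fall below 1/1001, which makes the resolution limit of the simulation explicit.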

Correlation between temperature and arrests

# Visualize the distribution of arrests
hist(total_arrest_data$arrests, main = "Distribution of Arrests", xlab = "Arrests")
abline(v = mean(total_arrest_data$arrests), col = "#F58426", lwd = 2)
abline(v = median(total_arrest_data$arrests), col = "#006BB6", lwd = 2)
legend("topleft", legend = c("Mean", "Median"), col = c("#F58426", "#006BB6"), lwd = 2)

Looking at the scatterplot, most of the points fall within what one might call the “temperate” range of 40°F to 80°F. This hints that there may be a relationship between temperature and the number of arrests, one that appears positive up until the extreme temperatures. The models that follow test this impression.

# Visualize the relationship between temperature and arrests
total_arrest_data |> 
  ggplot(aes(x = temp, y = arrests)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Temperature vs. Arrests",
       x = "Temperature (°F)",
       y = "Arrests") +
  theme_minimal()

A linear model does not show a relationship between temperature and arrests. The p-value for the temperature coefficient is 0.799, which is not significant, and the R-squared is essentially zero. This indicates that temperature does not have a significant linear effect on the number of arrests.

# Fit a linear model to the data
lm_model <- lm(arrests ~ temp, data = total_arrest_data)
summary(lm_model)

Call:
lm(formula = arrests ~ temp, data = total_arrest_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-775.51 -270.84  -56.25  301.97  905.63 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 864.91282   15.05695  57.443   <2e-16 ***
temp          0.06532    0.25638   0.255    0.799    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 351.5 on 6663 degrees of freedom
Multiple R-squared:  9.741e-06, Adjusted R-squared:  -0.0001403 
F-statistic: 0.0649 on 1 and 6663 DF,  p-value: 0.7989

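The Pearson correlation between temperature and arrests is about 0.003, which is effectively zero. The 95 percent confidence interval (-0.021 to 0.027) includes zero, so there is no evidence of a linear relationship between the two variables.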

# Correlation between temperature and arrests
cor(total_arrest_data$temp, total_arrest_data$arrests)
[1] 0.003121016
# Run correlation test
cor.test(total_arrest_data$temp, total_arrest_data$arrests)

    Pearson's product-moment correlation

data:  total_arrest_data$temp and total_arrest_data$arrests
t = 0.25476, df = 6663, p-value = 0.7989
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.02088890  0.02712733
sample estimates:
        cor 
0.003121016 
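
The linear model and the correlation test above report the same t statistic and p-value (0.255 and 0.7989), and the R-squared (9.741e-06) is the square of the correlation. This is a general property of simple linear regression with one predictor. The sketch below checks it on simulated toy data; the variables `x`, `y`, and `fit` are illustrative and unrelated to the arrest data.

```r
set.seed(1)  # reproducible toy data

# Simulated predictor and an unrelated noisy response (hypothetical values)
x <- runif(200, min = 20, max = 90)
y <- 870 + rnorm(200, sd = 350)

# Simple linear regression and Pearson correlation on the same data
fit <- lm(y ~ x)
r <- cor(x, y)

# p-value of the slope from lm() and p-value from cor.test()
slope_p <- summary(fit)$coefficients["x", "Pr(>|t|)"]
cor_p <- cor.test(x, y)$p.value

# R-squared equals the squared correlation; the two p-values coincide
print(c(r_squared = summary(fit)$r.squared, r_sq_from_cor = r^2))
print(c(slope_p = slope_p, cor_p = cor_p))
```

In practice this means the correlation test adds no new evidence beyond the linear model; it restates the same result on a different scale.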

Non-Linearity between temperature and arrests

# Fit a quadratic model to the data
quad_model <- lm(arrests ~ poly(temp, 2), data = total_arrest_data)
summary(quad_model)

Call:
lm(formula = arrests ~ poly(temp, 2), data = total_arrest_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-761.55 -270.18  -57.11  303.06  905.93 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     868.589      4.305 201.770   <2e-16 ***
poly(temp, 2)1   89.542    351.445   0.255    0.799    
poly(temp, 2)2 -504.248    351.445  -1.435    0.151    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 351.4 on 6662 degrees of freedom
Multiple R-squared:  0.0003187, Adjusted R-squared:  1.854e-05 
F-statistic: 1.062 on 2 and 6662 DF,  p-value: 0.3459

Next, let's overlay smoothing splines on the scatterplot and then refit the quadratic model using raw polynomial terms.

# Scatterplot with a smoothing spline (2 degrees of freedom)
plot(total_arrest_data$temp, total_arrest_data$arrests, main = "Temperature vs. Arrests", xlab = "Temperature (°F)", ylab = "Arrests")
spline2 <- smooth.spline(total_arrest_data$temp, total_arrest_data$arrests, df = 2)
lines(spline2, col = "red")

# Same scatterplot with a more flexible spline (3 degrees of freedom)
plot(total_arrest_data$temp, total_arrest_data$arrests, main = "Temperature vs. Arrests", xlab = "Temperature (°F)", ylab = "Arrests")
spline3 <- smooth.spline(total_arrest_data$temp, total_arrest_data$arrests, df = 3)
lines(spline3, col = "red")

# Build polynomial model
poly.model <- lm(arrests ~ temp+I(temp^2), data = total_arrest_data)
summary(poly.model)

Call:
lm(formula = arrests ~ temp + I(temp^2), data = total_arrest_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-761.55 -270.18  -57.11  303.06  905.93 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 805.46543   44.08360  18.271   <2e-16 ***
temp          2.46006    1.68863   1.457    0.145    
I(temp^2)    -0.02184    0.01522  -1.435    0.151    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 351.4 on 6662 degrees of freedom
Multiple R-squared:  0.0003187, Adjusted R-squared:  1.854e-05 
F-statistic: 1.062 on 2 and 6662 DF,  p-value: 0.3459
# Visualize
total_arrest_data |> 
  ggplot(aes(x = temp, y = arrests)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Temperature vs. Arrests",
       x = "Temperature (°F)",
       y = "Arrests") +
  theme_minimal()
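
The two quadratic fits above, `poly(temp, 2)` and `temp + I(temp^2)`, report identical residuals, R-squared, and F-statistic because they are the same model written in two parameterizations (orthogonal versus raw polynomial terms). Only the individual coefficients differ. The sketch below demonstrates this on simulated toy data; the data and object names are illustrative.

```r
set.seed(7)  # reproducible toy data

# Simulated temperatures and a noisy quadratic response (hypothetical values)
temp <- runif(300, min = 10, max = 95)
arrests <- 800 + 2.5 * temp - 0.02 * temp^2 + rnorm(300, sd = 100)

# Orthogonal polynomial terms
orth_fit <- lm(arrests ~ poly(temp, 2))

# Raw polynomial terms
raw_fit <- lm(arrests ~ temp + I(temp^2))

# Both parameterizations span the same column space, so the fits agree
same_fit <- isTRUE(all.equal(unname(fitted(orth_fit)), unname(fitted(raw_fit))))
same_r2 <- isTRUE(all.equal(summary(orth_fit)$r.squared,
                            summary(raw_fit)$r.squared))
print(c(same_fit = same_fit, same_r2 = same_r2))
```

The raw form gives coefficients in the original temperature units, while the orthogonal form avoids the strong correlation between `temp` and `temp^2`.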

Conclusion

I set out to investigate the relationship between weather and the number of arrests in New York City. I compared daily arrests on rainy and non-rainy days with a permutation test, and I used a linear model and a quadratic polynomial model to analyze the relationship between temperature and the number of arrests.

The rain comparison shows a clear difference: days without rain average about 909 arrests, compared with about 822 on rainy days, and the permutation test indicates that this difference is unlikely to be due to chance. This could be because people are more likely to stay indoors when it rains. For temperature, the scatterplot suggested more arrests in the moderate range of 40°F to 80°F, possibly because people are more likely to be outside when the temperature is moderate, but neither the linear model (p = 0.799) nor the quadratic model (p = 0.346) found a significant relationship. Overall, the data show no linear correlation between temperature and the number of arrests in New York City, while rain is associated with fewer arrests. Another model might capture the relationship with temperature better.