Cambridge Crime Report

This dataset comprises crime incidents reported in the City of Cambridge, as featured in the Cambridge Police Department’s Annual Crime Reports. The dataset description gives the coverage as 1980 to 2008, but the records loaded below run mainly from 2009 to May 2024, with only a handful dated earlier. Each record gives the crime type, the date and time, the reporting area, the neighborhood, and the location of the incident.

AIM

To identify crime trends in Cambridge and the most reported crime type in each neighborhood.

QUESTIONS

i Which crime occurred most?

ii Highest crime report by neighborhood.

iii Crime trend over the years.

iv Neighborhood crime share using pie chart.

v Which month has the highest crime for each year?

vi Monthly crime trend for each year.

Loading Necessary Packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.5.2

Data Manipulation/Transformation

Importing data

crime_report<-read.csv("C:/Users/HomePC/Desktop/Raheem.R/Crime_Reports G.csv", na.strings = c("NA", ""))
head(crime_report) # show the first 6 rows of the data set
##   File.Number         Date.of.Report                     Crime.Date.Time
## 1  2009-01323 02/21/2009 09:53:00 AM            02/21/2009 09:20 - 09:30
## 2  2009-01324 02/21/2009 09:59:00 AM 02/20/2009 22:30 - 02/21/2009 10:00
## 3  2009-01327 02/21/2009 12:32:00 PM 02/19/2009 21:00 - 02/21/2009 12:00
## 4  2009-01331 02/21/2009 03:05:00 PM            02/21/2009 15:00 - 15:10
## 5  2009-01346 02/22/2009 05:02:00 AM                    02/22/2009 05:02
## 6  2009-01357 02/22/2009 09:39:00 PM            02/22/2009 21:39 - 21:45
##                Crime Reporting.Area    Neighborhood
## 1            Threats            105  East Cambridge
## 2         Auto Theft           1109 North Cambridge
## 3        Hit and Run           1109 North Cambridge
## 4     Larceny (Misc)           1303 Strawberry Hill
## 5                OUI            105  East Cambridge
## 6 Aggravated Assault           1109 North Cambridge
##                            Location
## 1        100 OTIS ST, Cambridge, MA
## 2     400 RINDGE AVE, Cambridge, MA
## 3     400 RINDGE AVE, Cambridge, MA
## 4     0 NORUMBEGA ST, Cambridge, MA
## 5 FIFTH ST & GORE ST, Cambridge, MA
## 6     400 RINDGE AVE, Cambridge, MA
tail(crime_report) # show the last 6 rows of the data set
##       File.Number         Date.of.Report                     Crime.Date.Time
## 95918  2024-03751 05/07/2024 12:48:00 PM 05/03/2024 12:47 - 05/07/2024 12:47
## 95919  2024-03755 05/07/2024 01:13:00 PM            05/04/2024 12:00 - 18:00
## 95920  2024-03756 05/07/2024 02:41:00 PM            05/07/2024 14:40 - 14:41
## 95921  2024-03777 05/07/2024 08:13:00 PM            05/07/2024 15:00 - 19:15
## 95922  2024-03806 05/08/2024 04:09:00 PM            05/07/2024 04:00 - 04:05
## 95923  2024-03824 05/09/2024 10:23:00 AM            05/05/2024 11:30 - 13:00
##                    Crime Reporting.Area   Neighborhood
## 95918            Forgery            411         Area 4
## 95919    Larceny from MV            411         Area 4
## 95920           Accident            611  Mid-Cambridge
## 95921 Larceny of Bicycle            411         Area 4
## 95922    Larceny from MV           1005 West Cambridge
## 95923        Hit and Run           1204      Highlands
##                                            Location
## 95918            100 BISHOP ALLEN DR, Cambridge, MA
## 95919            100 BISHOP ALLEN DR, Cambridge, MA
## 95920 MASSACHUSETTS AVE & PEABODY ST, Cambridge, MA
## 95921                  0 COLUMBIA ST, Cambridge, MA
## 95922                    0 FOSTER PL, Cambridge, MA
## 95923          200 Alewife Brook Pky, Cambridge, MA
str(crime_report) # to show the internal structure and data types
## 'data.frame':    95923 obs. of  7 variables:
##  $ File.Number    : chr  "2009-01323" "2009-01324" "2009-01327" "2009-01331" ...
##  $ Date.of.Report : chr  "02/21/2009 09:53:00 AM" "02/21/2009 09:59:00 AM" "02/21/2009 12:32:00 PM" "02/21/2009 03:05:00 PM" ...
##  $ Crime.Date.Time: chr  "02/21/2009 09:20 - 09:30" "02/20/2009 22:30 - 02/21/2009 10:00" "02/19/2009 21:00 - 02/21/2009 12:00" "02/21/2009 15:00 - 15:10" ...
##  $ Crime          : chr  "Threats" "Auto Theft" "Hit and Run" "Larceny (Misc)" ...
##  $ Reporting.Area : int  105 1109 1109 1303 105 1109 501 501 1108 105 ...
##  $ Neighborhood   : chr  "East Cambridge" "North Cambridge" "North Cambridge" "Strawberry Hill" ...
##  $ Location       : chr  "100 OTIS ST, Cambridge, MA" "400 RINDGE AVE, Cambridge, MA" "400 RINDGE AVE, Cambridge, MA" "0 NORUMBEGA ST, Cambridge, MA" ...
summary(crime_report) # give a quick statistical summary
##  File.Number        Date.of.Report     Crime.Date.Time       Crime          
##  Length:95923       Length:95923       Length:95923       Length:95923      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Reporting.Area   Neighborhood         Location        
##  Min.   : 101.0   Length:95923       Length:95923      
##  1st Qu.: 406.0   Class :character   Class :character  
##  Median : 604.0   Mode  :character   Mode  :character  
##  Mean   : 632.5                                        
##  3rd Qu.: 912.0                                        
##  Max.   :1304.0                                        
##  NA's   :8
dim(crime_report)
## [1] 95923     7

Dropping a column

I removed Date.of.Report because another column, Crime.Date.Time, already records when the crime occurred.

crime_report<-select(crime_report,-Date.of.Report) 

Handling missing values

The Crime column contains entries labelled Admin Error, which means the crime type was not stated, so I removed those rows after dropping the rows with missing values.

# HANDLING MISSING VALUES
colSums(is.na(crime_report))
##     File.Number Crime.Date.Time           Crime  Reporting.Area    Neighborhood 
##               0              11               0               8               8 
##        Location 
##             295
# removing using na.omit

crime_report <- na.omit(crime_report)

#checking for the name of the crime
table(crime_report$Crime)
## 
##               Accident            Admin Error     Aggravated Assault 
##                   2871                   2857                   2399 
##   Annoying & Accosting                  Arson             Auto Theft 
##                    148                    148                   1941 
##       Commercial Break     Commercial Robbery         Counterfeiting 
##                   1014                    368                    257 
##             Disorderly       Domestic Dispute     Drinking in Public 
##                    494                     10                    362 
##                  Drugs           Embezzlement    Extortion/Blackmail 
##                   1084                    157                    164 
##              Flim Flam                Forgery               Gambling 
##                   2583                   6109                      4 
##             Harassment            Hit and Run               Homicide 
##                   1521                   9121                     21 
##             Housebreak      Indecent Exposure             Kidnapping 
##                   3804                    390                     34 
##         Larceny (Misc)  Larceny from Building        Larceny from MV 
##                    702                   4464                   7369 
##    Larceny from Person Larceny from Residence     Larceny of Bicycle 
##                   3228                   3953                   6292 
##       Larceny of Plate    Larceny of Services Liquor Possession/Sale 
##                    434                    303                     74 
##    Mal. Dest. Property         Missing Person        Noise Complaint 
##                   6130                   1953                    165 
##                    OUI       Peeping & Spying            Phone Calls 
##                    590                     81                    527 
##           Prostitution    Rec. Stol. Property Sex Offender Violation 
##                     73                    326                     94 
##            Shoplifting         Simple Assault               Stalking 
##                   5610                   4201                     41 
##         Street Robbery     Suspicious Package         Taxi Violation 
##                   1223                   1125                    404 
##                Threats            Trespassing      Violation of H.O. 
##                   2766                    746                    298 
##      Violation of R.O.         Warrant Arrest      Weapon Violations 
##                      7                   4405                    167
# Remove rows where Crime is "Admin Error"
crime_report <- crime_report %>%
  filter(Crime != "Admin Error")

# Check that they are gone
table(crime_report$Crime)
## 
##               Accident     Aggravated Assault   Annoying & Accosting 
##                   2871                   2399                    148 
##                  Arson             Auto Theft       Commercial Break 
##                    148                   1941                   1014 
##     Commercial Robbery         Counterfeiting             Disorderly 
##                    368                    257                    494 
##       Domestic Dispute     Drinking in Public                  Drugs 
##                     10                    362                   1084 
##           Embezzlement    Extortion/Blackmail              Flim Flam 
##                    157                    164                   2583 
##                Forgery               Gambling             Harassment 
##                   6109                      4                   1521 
##            Hit and Run               Homicide             Housebreak 
##                   9121                     21                   3804 
##      Indecent Exposure             Kidnapping         Larceny (Misc) 
##                    390                     34                    702 
##  Larceny from Building        Larceny from MV    Larceny from Person 
##                   4464                   7369                   3228 
## Larceny from Residence     Larceny of Bicycle       Larceny of Plate 
##                   3953                   6292                    434 
##    Larceny of Services Liquor Possession/Sale    Mal. Dest. Property 
##                    303                     74                   6130 
##         Missing Person        Noise Complaint                    OUI 
##                   1953                    165                    590 
##       Peeping & Spying            Phone Calls           Prostitution 
##                     81                    527                     73 
##    Rec. Stol. Property Sex Offender Violation            Shoplifting 
##                    326                     94                   5610 
##         Simple Assault               Stalking         Street Robbery 
##                   4201                     41                   1223 
##     Suspicious Package         Taxi Violation                Threats 
##                   1125                    404                   2766 
##            Trespassing      Violation of H.O.      Violation of R.O. 
##                    746                    298                      7 
##         Warrant Arrest      Weapon Violations 
##                   4405                    167
# checking back the missing value
sum(is.na(crime_report)) # this has been cleaned
## [1] 0
head(crime_report)
##   File.Number                     Crime.Date.Time              Crime
## 1  2009-01323            02/21/2009 09:20 - 09:30            Threats
## 2  2009-01324 02/20/2009 22:30 - 02/21/2009 10:00         Auto Theft
## 3  2009-01327 02/19/2009 21:00 - 02/21/2009 12:00        Hit and Run
## 4  2009-01331            02/21/2009 15:00 - 15:10     Larceny (Misc)
## 5  2009-01346                    02/22/2009 05:02                OUI
## 6  2009-01357            02/22/2009 21:39 - 21:45 Aggravated Assault
##   Reporting.Area    Neighborhood                          Location
## 1            105  East Cambridge        100 OTIS ST, Cambridge, MA
## 2           1109 North Cambridge     400 RINDGE AVE, Cambridge, MA
## 3           1109 North Cambridge     400 RINDGE AVE, Cambridge, MA
## 4           1303 Strawberry Hill     0 NORUMBEGA ST, Cambridge, MA
## 5            105  East Cambridge FIFTH ST & GORE ST, Cambridge, MA
## 6           1109 North Cambridge     400 RINDGE AVE, Cambridge, MA
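The two cleaning steps each remove a different set of rows, and it helps to log how many each one drops. A minimal sketch on a made-up data frame (the `toy` values are hypothetical and not taken from the dataset):

```r
library(dplyr)

# Hypothetical toy data: one missing Location and one "Admin Error" entry
toy <- data.frame(
  Crime    = c("Threats", "Admin Error", "Auto Theft", "Forgery"),
  Location = c("100 OTIS ST", "0 FOSTER PL", NA, "0 COLUMBIA ST"),
  stringsAsFactors = FALSE
)

n_start    <- nrow(toy)
no_na      <- na.omit(toy)                          # drops rows with ANY missing value
n_after_na <- nrow(no_na)
cleaned    <- filter(no_na, Crime != "Admin Error") # drops the placeholder category
n_final    <- nrow(cleaned)

cat("Dropped by na.omit:", n_start - n_after_na, "\n")
cat("Dropped as Admin Error:", n_after_na - n_final, "\n")
```

`na.omit()` also stores the indices of the rows it removed in an `na.action` attribute, so `length(attr(no_na, "na.action"))` returns the same count after the fact.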

Converting to date format

I used the lubridate package to convert the Crime.Date.Time column to a date-time format that R recognises, keeping only the start of each time range. I also extracted the year, hour, month, and weekday of each crime. Missing values, including those in the Reporting.Area column, were already removed in the previous step.

crime_report <- crime_report %>%
  mutate(

    # Clean Crime.Date.Time: keep only the start of each range
    Crime.Date.Time = str_trim(str_extract(Crime.Date.Time, "^[^-]+")),
    Crime.Date.Time = parse_date_time(Crime.Date.Time,
                                      orders = c("mdy HM", "mdy HMS", "mdy H", "mdy")),
    
    # Extract useful parts AFTER cleaning
    Crime_Year = year(Crime.Date.Time),
    Crime_Hour = hour(Crime.Date.Time),
    Crime_Month = month(Crime.Date.Time, label = TRUE),
    Crime_Weekday = wday(Crime.Date.Time, label = TRUE)
  )
# Arranging year and month for crime date

crime_report <- crime_report %>%
  arrange(Crime_Year, Crime_Month)
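To see what the extraction step does, here is a small sketch applied to the three shapes of Crime.Date.Time visible in the `head()` output above. It uses a single `"mdy HM"` order for simplicity, which is an assumption that holds for these three strings but not necessarily for every row in the file:

```r
library(stringr)
library(lubridate)

# The three shapes of Crime.Date.Time seen in the first rows of the data
raw <- c(
  "02/21/2009 09:20 - 09:30",
  "02/20/2009 22:30 - 02/21/2009 10:00",
  "02/22/2009 05:02"
)

# Keep everything before the first hyphen, then trim the trailing space
start_txt <- str_trim(str_extract(raw, "^[^-]+"))

# Parse as month/day/year hour:minute (time zone defaults to UTC)
start_dt <- parse_date_time(start_txt, orders = "mdy HM")

data.frame(raw, start_txt, year = year(start_dt), hour = hour(start_dt))
```

The pattern `^[^-]+` only works because the dates use `/` as a separator. If a date were written as `02-21-2009`, the match would stop after the month.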

Converting character to factor

crime_report$Crime<-as.factor(crime_report$Crime)
crime_report$Neighborhood<-as.factor(crime_report$Neighborhood)
crime_report$Location<-as.factor(crime_report$Location)
crime_report$Crime.Date.Time<-as.factor(crime_report$Crime.Date.Time)
str(crime_report)
## 'data.frame':    92755 obs. of  10 variables:
##  $ File.Number    : chr  "2010-04009" "2012-03794" "2018-03218" "2012-04627" ...
##  $ Crime.Date.Time: Factor w/ 86994 levels "1980-01-01","1980-01-01 12:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Crime          : Factor w/ 53 levels "Accident","Aggravated Assault",..: 16 25 16 16 16 15 44 16 16 16 ...
##  $ Reporting.Area : int  1002 606 1101 702 304 1002 607 1012 101 1106 ...
##  $ Neighborhood   : Factor w/ 13 levels "Agassiz","Area 4",..: 13 7 9 11 6 13 7 13 4 9 ...
##  $ Location       : Factor w/ 5079 levels "0 ABERDEEN AVE, Cambridge, MA",..: 577 1937 77 1809 1134 1384 1030 1229 1696 1452 ...
##  $ Crime_Year     : num  1980 1980 1990 1993 1993 ...
##  $ Crime_Hour     : int  0 12 12 9 19 14 0 0 10 19 ...
##  $ Crime_Month    : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 1 6 1 10 1 12 1 1 3 ...
##  $ Crime_Weekday  : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 3 6 6 6 1 4 7 1 4 ...
##  - attr(*, "na.action")= 'omit' Named int [1:311] 172 480 768 1337 1513 2105 2155 3028 3337 3401 ...
##   ..- attr(*, "names")= chr [1:311] "172" "480" "768" "1337" ...

I converted the categorical columns from character to factor, because R represents categorical data as factors.
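The code above also turns Crime.Date.Time into a factor (86,994 levels in the `str()` output), which discards its date-time class. One possible alternative, sketched here on a hypothetical two-row stand-in for `crime_report`, converts only the truly categorical columns with `across()` and leaves the date-time column untouched:

```r
library(dplyr)

# Hypothetical two-row stand-in for crime_report after the date step
toy <- tibble(
  Crime           = c("Threats", "Auto Theft"),
  Neighborhood    = c("East Cambridge", "North Cambridge"),
  Crime.Date.Time = as.POSIXct(c("2009-02-21 09:20:00", "2009-02-20 22:30:00"),
                               tz = "UTC")
)

toy <- toy %>%
  mutate(across(c(Crime, Neighborhood), as.factor))  # only the categorical columns

str(toy)
```

Keeping Crime.Date.Time as a date-time means it can still be sorted, filtered by range, or passed to lubridate functions later.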

Exploratory Data Analysis

Top 10 crimes

# Group by crime type and count which type occurred most

crime_summary<- crime_report %>%
  group_by(Crime) %>%
  summarise(Total_Cases=n()) %>% #counting how many times the crime appears
  arrange(desc(Total_Cases)) #sorting from highest to the lowest
#Extracting top 10 crimes
top_crimes<-crime_summary %>%
  slice_head(n = 10) # selecting top 10
#viewing top_crimes
top_crimes
## # A tibble: 10 × 2
##    Crime                  Total_Cases
##    <fct>                        <int>
##  1 Hit and Run                   9121
##  2 Larceny from MV               7369
##  3 Larceny of Bicycle            6292
##  4 Mal. Dest. Property           6130
##  5 Forgery                       6109
##  6 Shoplifting                   5610
##  7 Larceny from Building         4464
##  8 Warrant Arrest                4405
##  9 Simple Assault                4201
## 10 Larceny from Residence        3953
# Plot top 10
ggplot(top_crimes, aes(x = reorder(Crime, Total_Cases), y = Total_Cases, fill = Crime)) +
  geom_col(show.legend = FALSE) +  # Draw bars without legend
  geom_text(aes(label = Total_Cases), hjust = -0.1, size = 3.5) +  # Add data labels
  coord_flip() +  # Flip axes for horizontal bars
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +  # Add padding to avoid label cutoff
  theme_minimal() +  # Use minimal theme
  labs(title = "Top 10 Reported Crimes in Cambridge",
       x = "Crime Type",
       y = "Total Cases")

I grouped the data by crime type, counted the cases, and sorted the result from highest to lowest. I then kept the 10 most frequent crimes and plotted them on a bar chart.

Hit and run was the most reported crime in Cambridge (9,121 cases), followed by larceny from motor vehicles (7,369) and larceny of bicycles (6,292).
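The group, count, and sort steps can also be written in one call with `count(sort = TRUE)`. The sketch below uses hypothetical toy incidents and adds a `Share` column (a name introduced here for illustration) so each crime's percentage of all cases is visible next to its count:

```r
library(dplyr)

# Hypothetical toy incidents
toy <- tibble(Crime = c("Hit and Run", "Forgery", "Hit and Run",
                        "Shoplifting", "Hit and Run", "Forgery"))

top_toy <- toy %>%
  count(Crime, name = "Total_Cases", sort = TRUE) %>%            # group, count, sort descending
  mutate(Share = round(Total_Cases / sum(Total_Cases) * 100, 1)) %>% # percent of all cases
  slice_head(n = 2)                                              # keep the top 2

top_toy
```

The share is computed before `slice_head()` so that it is a percentage of all incidents, not only of the rows that are kept.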

Highest crime by Neighborhood

# Highest crime by Neighborhood

# Top 5 Neighborhoods and Top 5 Crimes
top_crime_plot <- crime_report %>%
  # Summarize counts per neighborhood and crime
  group_by(Neighborhood, Crime) %>%
  summarise(Total_Cases = n(), .groups = "drop") %>%
  # Keep only top 5 neighborhoods and top 5 crimes overall
  filter(Neighborhood %in% (crime_report %>% count(Neighborhood) %>% top_n(5, n) %>% pull(Neighborhood)),
         Crime %in% (crime_report %>% count(Crime) %>% top_n(5, n) %>% pull(Crime))) %>%
  arrange(desc(Total_Cases))

top_crime_plot
## # A tibble: 25 × 3
##    Neighborhood    Crime              Total_Cases
##    <fct>           <fct>                    <int>
##  1 North Cambridge Hit and Run               1124
##  2 Cambridgeport   Larceny from MV           1082
##  3 Cambridgeport   Larceny of Bicycle         994
##  4 Cambridgeport   Hit and Run                962
##  5 East Cambridge  Hit and Run                955
##  6 Mid-Cambridge   Hit and Run                940
##  7 Mid-Cambridge   Larceny from MV            901
##  8 East Cambridge  Forgery                    866
##  9 Area 4          Hit and Run                840
## 10 North Cambridge Larceny from MV            796
## # ℹ 15 more rows
# Visualize with facet wrap by Crime and sort within each facet
ggplot(top_crime_plot, aes(x = reorder_within(Neighborhood, Total_Cases, Crime), y = Total_Cases, fill = Crime)) +
  geom_col(show.legend = FALSE) +  # Draw bars without legend
  coord_flip() +  # Flip axes for horizontal bars
  facet_wrap(~ Crime, scales = "free_y", ncol = 2) +  # Facet by crime type
  scale_x_reordered() +  # Fix axis labels after reorder_within
  theme_minimal() +  # Use minimal theme
  labs(
    title = "Top 5 Neighborhoods by Top 5 Crimes",
    x = "Neighborhood",
    y = "Total Cases"
  )

OBSERVATIONS

1 East Cambridge has the most forgery reports (866) among the five neighborhoods shown, although hit and run (955) is still its most reported crime. A likely reason is that the neighborhood contains many banks, businesses, and retail stores, and frequent financial transactions increase the opportunity for forgery.

2 In North Cambridge, hit and run is the most reported of the five crimes shown (1,124 cases, the largest single count in the table). This is probably due to heavy traffic flow, large parking areas, and frequent interaction between commuter traffic and residential streets, which increase the chance of collisions after which a driver flees the scene.

3 In Cambridgeport, larceny (1,082 cases from motor vehicles and 994 of bicycles) and malicious destruction of property are the leading crimes. This is probably due to busy commercial activity and dense street parking, which make it easy to break into cars unnoticed.
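The table above is limited to the top 5 crimes and top 5 neighborhoods. To answer the question of the highest crime in every neighborhood directly, one option is `slice_max()` within each neighborhood. A minimal sketch on hypothetical toy data (neighborhoods `A` and `B` are placeholders):

```r
library(dplyr)

# Hypothetical toy incidents in two placeholder neighborhoods
toy <- tibble(
  Neighborhood = c("A", "A", "A", "B", "B", "B"),
  Crime = c("Hit and Run", "Hit and Run", "Forgery",
            "Forgery", "Forgery", "Hit and Run")
)

top_by_nbhd <- toy %>%
  count(Neighborhood, Crime, name = "Total_Cases") %>%   # cases per neighborhood and crime
  group_by(Neighborhood) %>%
  slice_max(Total_Cases, n = 1, with_ties = FALSE) %>%   # keep the largest row per group
  ungroup()

top_by_nbhd
```

Setting `with_ties = FALSE` returns exactly one row per neighborhood. Leaving it at the default would keep every crime tied for first place.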

Crime trend over the years

I looked at the crime trend over the years. The yearly counts change sharply around 2009: between 1980 and 2008 the dataset holds only a handful of records per year, so proper data collection appears to have started around 2009. The near-zero counts before 2009 therefore reflect limited or inconsistent reporting rather than an actual absence of crime, and the trend plot below uses 2009 onward.

# Summarize yearly crime counts
crime_summary <- crime_report %>%
  group_by(Crime_Year) %>%
  summarise(Total_Cases = n()) %>%
  arrange(Crime_Year)
crime_summary
## # A tibble: 30 × 2
##    Crime_Year Total_Cases
##         <dbl>       <int>
##  1       1980           2
##  2       1990           1
##  3       1993           2
##  4       1995           1
##  5       1999           1
##  6       2000           5
##  7       2001          14
##  8       2002           4
##  9       2003           2
## 10       2004          16
## # ℹ 20 more rows
# Keep 2009 onward, then plot the yearly crime trend
crime_report1 <- crime_report %>% filter(Crime_Year > 2008)
head(crime_report1)
##   File.Number     Crime.Date.Time                 Crime Reporting.Area
## 1  2009-00323 2009-01-10 23:00:00 Larceny from Building            707
## 2  2009-00330 2009-01-13 07:45:00            Housebreak            509
## 3  2009-00340 2009-01-13 08:24:00           Hit and Run            509
## 4  2009-00341 2009-01-10 12:00:00       Larceny from MV            801
## 5  2009-00345 2009-01-13 09:00:00           Hit and Run           1007
## 6  2009-00352 2009-01-06 15:00:00 Larceny from Building            708
##     Neighborhood                           Location Crime_Year Crime_Hour
## 1      Riverside 100 Mount Auburn St, Cambridge, MA       2009         23
## 2  Cambridgeport     200 MAGAZINE ST, Cambridge, MA       2009          7
## 3  Cambridgeport        0 GRANITE ST, Cambridge, MA       2009          8
## 4        Agassiz     100 KIRKLAND ST, Cambridge, MA       2009         12
## 5 West Cambridge       0 HILLIARD ST, Cambridge, MA       2009          9
## 6      Riverside            0 JFK ST, Cambridge, MA       2009         15
##   Crime_Month Crime_Weekday
## 1         Jan           Sat
## 2         Jan           Tue
## 3         Jan           Tue
## 4         Jan           Sat
## 5         Jan           Tue
## 6         Jan           Tue
crime_summary1 <- crime_report1 %>% 
  group_by(Crime_Year) %>%
  summarise(Total_Cases = n()) %>%
  arrange(Crime_Year)
crime_summary1
## # A tibble: 16 × 2
##    Crime_Year Total_Cases
##         <dbl>       <int>
##  1       2009        6515
##  2       2010        6474
##  3       2011        6433
##  4       2012        6144
##  5       2013        6285
##  6       2014        6179
##  7       2015        6041
##  8       2016        5668
##  9       2017        5436
## 10       2018        5409
## 11       2019        5395
## 12       2020        5731
## 13       2021        5484
## 14       2022        5956
## 15       2023        6820
## 16       2024        2566
ggplot(crime_summary1, aes(x = Crime_Year, y = Total_Cases)) +
  geom_line(color = "red3", linewidth = 1, group = 1) +
  geom_point(color = "black", size = 1) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_continuous(limits = c(2009, 2024), breaks = 2009:2024) +
  labs(
    title = "Yearly Crime Trend in Cambridge",
    x = "Year",
    y = "Number of Crimes"
  )

The steep drop in 2024 in this plot is not a real fall in crime. The Kaggle dataset is incomplete for that year: its last records are dated May 2024.
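To put numbers on the trend, the year-over-year percentage change can be computed with `lag()`. The sketch below re-enters the yearly totals printed above and leaves out 2024 because it is a partial year. `Pct_Change` is a column name introduced here for illustration:

```r
library(dplyr)

# Yearly totals copied from the crime_summary1 output above
yearly <- tibble(
  Crime_Year  = 2009:2024,
  Total_Cases = c(6515, 6474, 6433, 6144, 6285, 6179, 6041, 5668,
                  5436, 5409, 5395, 5731, 5484, 5956, 6820, 2566)
)

yoy <- yearly %>%
  filter(Crime_Year < 2024) %>%   # 2024 is partial (records end in May)
  mutate(Pct_Change = round(
    (Total_Cases - lag(Total_Cases)) / lag(Total_Cases) * 100, 1
  ))

tail(yoy, 3)
```

The first year has no predecessor, so its `Pct_Change` is `NA`. On these figures 2023 rose about 14.5% over 2022.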

Neighborhood Crime share

This pie chart shows how crime is distributed across neighborhoods. Some neighborhoods stand out as hot spots, led by Cambridgeport, which has the largest share of reported crime.

# Summarize total crimes by neighborhood and arrange by percent
crime_total <- crime_report1 %>%
  count(Neighborhood, name = "Total_Cases") %>%
  mutate(Percent = round(Total_Cases / sum(Total_Cases) * 100, 1)) %>%
  arrange(Percent)  # Arrange in ascending order

crime_total
##        Neighborhood Total_Cases Percent
## 1   Strawberry Hill        1528     1.7
## 2               MIT        1933     2.1
## 3           Agassiz        2529     2.7
## 4         Highlands        2565     2.8
## 5  Inman/Harrington        5605     6.1
## 6           Peabody        5806     6.3
## 7    West Cambridge        9000     9.7
## 8         Riverside        9193     9.9
## 9            Area 4        9219    10.0
## 10    Mid-Cambridge        9527    10.3
## 11  North Cambridge        9712    10.5
## 12   East Cambridge       11868    12.8
## 13    Cambridgeport       14051    15.2
others <- data.frame("Neighborhood" = "Others", "Total_Cases" = sum(crime_total$Total_Cases[1:7]), "Percent" = sum(crime_total$Percent[1:7]))
crime_totpie <- rbind(crime_total[8:13,], others)

# Create the pie chart
ggplot(crime_totpie, aes(x = "", y = Percent, fill = Neighborhood)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(Percent, "%")),
            position = position_stack(vjust = 0.5), size = 4, color = "black") +
  labs(title = "Crime Distribution by Neighborhood",
       fill = "Neighborhood") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Cambridgeport has the largest share of reported crime (15.2%). A plausible explanation is that it is one of the densest neighborhoods, and more people means more opportunity for crime, especially larcenies and malicious destruction of property. Its major transportation routes may also make it easier to move stolen vehicles.
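One detail in the pie chart code: the "Others" slice is built by adding up percentages that were already rounded, which gives 31.4% and makes the slices total 100.1%. A sketch of an alternative, using the neighborhood totals printed above, lumps the counts first and rounds once at the end. The `Group` column is a name introduced here for illustration:

```r
library(dplyr)

# Neighborhood totals copied from the crime_total output above
crime_total <- tibble(
  Neighborhood = c("Strawberry Hill", "MIT", "Agassiz", "Highlands",
                   "Inman/Harrington", "Peabody", "West Cambridge",
                   "Riverside", "Area 4", "Mid-Cambridge",
                   "North Cambridge", "East Cambridge", "Cambridgeport"),
  Total_Cases  = c(1528, 1933, 2529, 2565, 5605, 5806, 9000,
                   9193, 9219, 9527, 9712, 11868, 14051)
)

crime_totpie <- crime_total %>%
  # Keep the 6 largest neighborhoods by name, lump the rest into "Others"
  mutate(Group = if_else(min_rank(desc(Total_Cases)) <= 6,
                         Neighborhood, "Others")) %>%
  group_by(Group) %>%
  summarise(Total_Cases = sum(Total_Cases), .groups = "drop") %>%
  # Round only once, after the counts have been combined
  mutate(Percent = round(Total_Cases / sum(Total_Cases) * 100, 1))

crime_totpie
```

Computed this way, "Others" comes to 31.3% and the seven slices sum to 100%.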

Which month had the highest crime for each year?

crime_report %>%
  count(Crime_Year, Crime_Month) %>%
  ggplot(aes(x = Crime_Month, y = Crime_Year, fill = n)) +
  geom_tile(color = "white") +
  # Use multiple distinct colors for better contrast
  scale_fill_gradientn(
    colours = c("blue", "cyan", "yellow", "orange", "red", "darkred"),
    values = scales::rescale(c(0, 100, 200, 300, 400, 600)),  # Adjust based on your data range
    guide = "colorbar"
  ) +
  theme_minimal() +
  labs(
    title = "Heatmap: Crime Frequency by Year and Month",
    x = "Month",
    y = "Year",
    fill = "Total Crimes"
  )

This heatmap shows crime frequency by year and month. Blue signifies low crime frequency and dark red signifies high crime frequency. The red hotspot zone between 2009 and 2023 marks the years and months in which Cambridge had consistently high crime activity, especially in spring (April-June) and fall (October-November). The blue areas in earlier years mostly reflect low or incomplete data.

Before 2009 the data contains only a handful of records per year.
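The heatmap shows the pattern visually, but the peak month of each year can also be listed as a table. A minimal sketch on hypothetical toy data, assuming the same `Crime_Year` and `Crime_Month` columns as `crime_report`:

```r
library(dplyr)

# Hypothetical toy data with the same column names as crime_report
toy <- tibble(
  Crime_Year  = c(2009, 2009, 2009, 2010, 2010, 2010),
  Crime_Month = factor(c("Jan", "Jun", "Jun", "Oct", "Oct", "Jan"),
                       levels = month.abb, ordered = TRUE)
)

peak_month <- toy %>%
  count(Crime_Year, Crime_Month) %>%           # incidents per year and month
  group_by(Crime_Year) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%   # busiest month in each year
  ungroup()

peak_month
```

On the real data this would be best applied to years with complete monthly records, since a partial year can only report the peak among the months it covers.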

Monthly crime trend for each year with complete data

# Filter to years with complete monthly data
complete_years <- crime_report1 %>%
  count(Crime_Year, Crime_Month) %>%
  count(Crime_Year) %>%
  filter(n == 12) %>%
  pull(Crime_Year)

# Use only complete years
crime_report1 %>%
  filter(Crime_Year %in% complete_years) %>%
  count(Crime_Year, Crime_Month) %>%
  ggplot(aes(x = Crime_Month, y = n, group = Crime_Year)) +
  geom_line(color = "red", linewidth = 1) +
  facet_wrap(~ Crime_Year, ncol = 4) +
  theme_minimal() +
  labs(
    title = "Monthly Crime Trend by Year (Complete Years Only)",
    x = "Month",
    y = "Crime Count"
  )

CONCLUSION

As a result of my exploratory analysis,

i Hit and run is the most reported crime.

ii Cambridgeport has the largest share of crime by neighborhood in the pie chart.

iii From 1980 to 2008 the dataset contains almost no records. Consistent recording starts in 2009, from which point the rise and fall of crime can be followed. Reported crime peaked in 2023, the highest yearly total in the data (6,820 cases), while the sharp fall in 2024 is due to a partial record for that year.