2. This code calculates total crimes per year, total crimes per area
per year, and the average number of crimes per month in each year,
enabling me to analyze trends and changes in crime over time. The line
plot shows that the average monthly crime count increased in 2022,
remained stable in 2023, and then decreased in 2024.
# Total crimes per year
total_crime_per_year <- crime %>%
  group_by(year_occ) %>%
  summarise(Totalcrime = n(), .groups = "drop")
# Total crimes per area per year
total_crime_by_area <- crime %>%
  group_by(year_occ, AREA) %>%
  summarise(CrimePerArea = n(), .groups = "drop")
# Average crimes per month per year
average_crime_per_month <- crime %>%
  # month() is from lubridate; the year already exists as year_occ
  mutate(Month = month(`DATE OCC`)) %>%
  # Count crimes in each month of each year
  group_by(year_occ, Month) %>%
  summarise(CrimesPerMonth = n(), .groups = "drop") %>%
  # Average the monthly counts within each year
  group_by(year_occ) %>%
  summarise(AverageCrimePerMonth = mean(CrimesPerMonth, na.rm = TRUE))
# linewidth replaces size for lines in ggplot2 3.4.0 and later
ggplot(average_crime_per_month, aes(x = year_occ, y = AverageCrimePerMonth)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "red", size = 2) +
  labs(title = "Average Crime Per Month",
       x = "Year",
       y = "Average Crimes Per Month")
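The yearly average above is a mean over whichever months appear in the data, so a year with fewer recorded months is averaged over fewer values. A minimal base-R sketch of that check follows; the `toy` table, its `month` column, and all of its values are made up for illustration and are not the real crime data.

```r
# Hypothetical toy data standing in for the crime table: one row per incident.
toy <- data.frame(
  year_occ = c(rep(2023, 24), rep(2024, 6)),
  month    = c(rep(1:12, each = 2), rep(1:3, each = 2))
)

# Count incidents per year and month.
per_month <- aggregate(n ~ year_occ + month,
                       data = transform(toy, n = 1), FUN = sum)

# Number of months observed in each year, and the mean monthly count.
months_seen   <- aggregate(month ~ year_occ, data = per_month, FUN = length)
avg_per_month <- aggregate(n ~ year_occ, data = per_month, FUN = mean)

summary_tbl <- merge(months_seen, avg_per_month, by = "year_occ")
names(summary_tbl) <- c("year_occ", "MonthsObserved", "AverageCrimePerMonth")
print(summary_tbl)
```

Reporting the number of observed months next to each average makes it easier to tell a full year from a partial one when reading the line plot.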

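3. Testing whether total crimes per year correlate with the average
number of crimes per month. The correlation coefficient is effectively 1
(95% confidence interval 0.9999997 to 1), which means there’s a perfect
positive linear relationship between total crimes per year and average
crimes per month. This is expected by construction: the monthly average
is the yearly total divided by the number of months in that year, so the
result does not add independent information about crime trends.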
# Merge total crimes and average monthly crimes per year
crime_summary <- total_crime_per_year %>%
  left_join(average_crime_per_month, by = "year_occ")
# Pearson correlation test
cor_test <- cor.test(crime_summary$Totalcrime, crime_summary$AverageCrimePerMonth)
print(cor_test)
##
## Pearson's product-moment correlation
##
## data: crime_summary$Totalcrime and crime_summary$AverageCrimePerMonth
## t = 7616.4, df = 4, p-value = 1.783e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9999997 1.0000000
## sample estimates:
## cor
## 1
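A correlation this close to 1 is what one would expect when one variable is a rescaled copy of the other. The sketch below illustrates this with made-up yearly totals (the `total_crime` values are hypothetical, not taken from the crime data) and assumes every year has twelve observed months.

```r
# Hypothetical yearly totals; not taken from the crime data.
total_crime <- c(120000, 150000, 180000, 175000, 160000)

# Assumption: twelve observed months per year, so the monthly
# average is simply the yearly total divided by 12.
average_per_month <- total_crime / 12

# Dividing by a positive constant does not change the Pearson
# correlation, so the two series are perfectly correlated.
r <- cor(total_crime, average_per_month)
print(r)
```

In the output above the estimate is extremely close to 1 but the t statistic is finite, which would be consistent with at least one year covering a different number of months. That reading is an inference from the printed output, not something checked against the data.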
4. The histogram below shows the distribution of victim ages. The
most frequent age recorded is 0, which occurs because, in certain crime
categories such as vehicle theft or vandalism, the data set records the
age of the property rather than the age of a person. Apart from that
spike at 0, the most common victim ages fall between the late 20s and
early 30s.
hist(crime$`Vict Age`,
     main = "Histogram of Victim Age",
     xlab = "Age",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
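Because the spike at age 0 dominates the plot, it can help to draw a second histogram without it. This is a minimal sketch using a made-up `vict_age` vector in place of crime$`Vict Age`; the values and the choice to drop ages of 0 or below are assumptions for illustration.

```r
# Hypothetical ages standing in for the victim age column;
# 0 is the placeholder value discussed above.
vict_age <- c(0, 0, 0, 0, 25, 27, 29, 30, 31, 33, 45, 62)

# Keep only positive, non-missing ages so the spike at 0
# does not dominate the plot.
known_age <- vict_age[!is.na(vict_age) & vict_age > 0]

hist(known_age,
     main = "Histogram of Victim Age (age 0 excluded)",
     xlab = "Age",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
```

Showing both versions keeps the placeholder records visible in the first plot while letting the second show the shape of the remaining ages.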

5. Question: Is there a significant difference in the number of
crimes between Area 1 and Area 3?
The results show that Area 1 had a higher average yearly crime count
(11,611.67) than Area 3 (9,573.50). However, the p-value (0.55) from the
Welch two-sample t-test indicates that the difference is not
statistically significant at the 5% level, and the 95% confidence
interval for the difference (-5,425 to 9,502) includes 0. Therefore, the
yearly counts do not provide evidence of a real difference in crime
levels between these two areas.
# Create a subset with only Area 1 and Area 3
crime_subset <- crime %>%
  filter(AREA %in% c("01", "03")) %>%
  mutate(AREA = factor(AREA))
# Count crimes per year per area
crime_counts <- crime_subset %>%
  group_by(AREA, year_occ) %>%
  summarise(TotalCrimes = n(), .groups = "drop")
# Welch t-test comparing Area 1 vs Area 3
t.test_result <- t.test(TotalCrimes ~ AREA, data = crime_counts)
print(t.test_result)
##
## Welch Two Sample t-test
##
## data: TotalCrimes by AREA
## t = 0.61315, df = 9.4664, p-value = 0.5542
## alternative hypothesis: true difference in means between group 01 and group 03 is not equal to 0
## 95 percent confidence interval:
## -5425.367 9501.701
## sample estimates:
## mean in group 01 mean in group 03
## 11611.67 9573.50
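The group means, p-value, and confidence interval quoted in the text can be read from the test object rather than copied from the printout. The sketch below shows this on a made-up `counts` table (the yearly values are hypothetical, not the real crime counts); the 0.05 threshold is the conventional choice and is an assumption here.

```r
# Hypothetical yearly counts for two areas; not the values from the crime data.
counts <- data.frame(
  AREA = factor(rep(c("01", "03"), each = 6)),
  TotalCrimes = c(9000, 12000, 14000, 13500, 12500, 8500,
                  7000, 10000, 11000, 11500, 10500, 7500)
)

# t.test() with a formula runs Welch's unequal-variance test by default.
res <- t.test(TotalCrimes ~ AREA, data = counts)

group_means <- res$estimate   # mean yearly count in each area
p_value     <- res$p.value    # two-sided p-value
ci          <- res$conf.int   # 95% CI for the difference in means

# Assumed significance threshold of 0.05.
significant <- p_value < 0.05
print(group_means)
print(ci)
```

Pulling the numbers from `res` keeps the written interpretation in step with the output if the data or the selected areas change.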