Crime In LA

1. Extracting crimes for 2024 and creating a scatter plot. The scatter plot shows the total number of crimes for each area code in red, and the blue dashed line shows the average number of crimes across all areas. The area with the highest crime count in 2024 is area 01 (Central); the area with the lowest crime count is area 16 (Foothill).

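library(dplyr)      # mutate(), filter(), group_by(), summarise()
library(lubridate)  # year() and month()
library(ggplot2)    # plotting

#extracting the year each crime occurred and the year it was reported
crime <- crime %>%
  mutate(
    year_occ = year(`DATE OCC`),
    year_rptd = year(`Date Rptd`)
  )
#keeping only crimes that occurred in 2024
crime2024 <- crime %>%
  filter(year_occ == 2024)
#counting crimes per area and creating a scatterplot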
crimeRate2024 <- crime2024 %>%
  group_by(AREA) %>%
  summarise(total_crimes=n(), .groups = "drop")
mean_crime <- mean(crimeRate2024$total_crimes, na.rm = TRUE)
ggplot(crimeRate2024, aes(x = AREA, y= total_crimes))+
  geom_point(color="red", size=3)+
  geom_hline(yintercept = mean_crime, linetype="dashed", color="blue")+
  labs(title = "Crime count in 2024 by Area Code",
       x="Area Code",
       y="Number of Crimes")
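
The highest and lowest areas can also be read from the summary table instead of the plot. The following is a minimal sketch, assuming a table shaped like crimeRate2024; the table crimeRate2024_toy and its counts are invented for illustration and are not the real 2024 values.

```r
library(dplyr)

# Toy stand-in for crimeRate2024; these counts are invented for illustration
crimeRate2024_toy <- tibble(
  AREA = c("01", "03", "16"),
  total_crimes = c(900L, 600L, 300L)
)

# Area with the most recorded crimes
highest_area <- crimeRate2024_toy %>%
  slice_max(total_crimes, n = 1)

# Area with the fewest recorded crimes
lowest_area <- crimeRate2024_toy %>%
  slice_min(total_crimes, n = 1)

print(highest_area)
print(lowest_area)
```

Running the same two calls on the real crimeRate2024 table would give the areas named in the paragraph above without reading them off the chart.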

2. This code calculates total crimes per year, total crimes per area per year, and the average number of crimes per month for each year, which lets me analyze trends and changes in crime over time. The line plot shows that the average monthly crime count increased in 2022, remained stable in 2023, and then decreased in 2024.

#Total crimes per year
total_crime_per_year <-crime %>%
  group_by(year_occ) %>%
  summarise(Totalcrime= n(), .groups='drop')

#Total crimes per area per year 
total_crime_by_area <- crime%>%
  group_by(year_occ, AREA )%>%
  summarise(CrimePerArea= n(), .groups = 'drop')

#Average crime per month per year
average_crime_per_month <- crime %>%
  mutate(Month = month(`DATE OCC`)) %>%
  group_by(year_occ, Month) %>% #counting crimes per month
  summarise(CrimesPerMonth = n(), .groups = 'drop') %>%
  group_by(year_occ) %>% #averaging the monthly counts within each year
  summarise(AverageCrimePerMonth = mean(CrimesPerMonth, na.rm = TRUE))


ggplot(average_crime_per_month, aes(x = year_occ, y = AverageCrimePerMonth)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "red", size = 2) +
  labs(title = "Average Crime Per Month",
       x = "Year",
       y = "Average Crimes Per Month")
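
A yearly average of monthly counts is only comparable across years when each year covers the same number of months. The sketch below shows one way to check this, using invented dates; the table crime_toy and the column name date_occ are placeholders for the real data and its `DATE OCC` column.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the crime data; these dates are invented for illustration
crime_toy <- tibble(
  date_occ = as.Date(c("2023-01-15", "2023-02-10", "2023-02-20",
                       "2024-01-05", "2024-01-25"))
)

# Count how many distinct months have at least one record in each year
months_per_year <- crime_toy %>%
  mutate(year_occ = year(date_occ), month_occ = month(date_occ)) %>%
  group_by(year_occ) %>%
  summarise(
    months_with_data = n_distinct(month_occ),
    total_crimes = n(),
    .groups = "drop"
  )

print(months_per_year)
```

A year with fewer than 12 months of data in this table would be a partial year, and its average should be interpreted with that in mind.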

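3. Testing whether total crimes per year and average crimes per month are correlated. The correlation coefficient is essentially 1 (95% confidence interval 0.9999997 to 1), which indicates a near-perfect positive linear relationship between total crimes per year and average crimes per month. This is expected rather than a new finding: the average monthly count is the yearly total divided by the number of months with records, so the two variables are almost proportional by construction.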

# Merge total crimes and average monthly crimes per year
crime_summary <- total_crime_per_year %>%
  left_join(average_crime_per_month, by ="year_occ")
#Finding correlation
cor_test <- cor.test(crime_summary$Totalcrime, crime_summary$AverageCrimePerMonth)
print(cor_test)

## 
##  Pearson's product-moment correlation
## 
## data:  crime_summary$Totalcrime and crime_summary$AverageCrimePerMonth
## t = 7616.4, df = 4, p-value = 1.783e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9999997 1.0000000
## sample estimates:
## cor 
##   1
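
A short sketch of why a correlation this close to 1 arises when one variable is computed from the other. The yearly totals below are invented for illustration, and every year is assumed to have records in all 12 months.

```r
# Invented yearly totals for illustration only; not the real LA figures
total_crime <- c(120000, 150000, 180000, 175000, 140000, 90000)

# If every year has 12 months of data, the monthly average is just total / 12
average_per_month <- total_crime / 12

# Pearson correlation between a variable and a positive multiple of itself
r <- cor(total_crime, average_per_month)
print(r)
```

Because average_per_month is a constant multiple of total_crime, the correlation equals 1 up to floating-point rounding regardless of the totals chosen.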

4. The histogram below shows the distribution of victim ages. The most frequently recorded age is 0. This most likely happens because, for crime categories such as vehicle theft or vandalism, there is no individual victim or the victim's age is unknown, so the data set records 0 as a placeholder rather than a real age. Apart from these zeros, the most common victim ages fall between the late 20s and early 30s.

hist(crime$`Vict Age`,
     main = "Histogram of Victim Age",
     xlab = "Age",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
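
To see the age distribution without the spike at 0, the placeholder values can be dropped before plotting. This is a minimal sketch on invented ages; treating every value of 0 or below as missing is an assumption, and vict_age_toy stands in for the `Vict Age` column.

```r
# Invented ages for illustration; 0 is assumed to mean "no age recorded"
vict_age_toy <- c(0, 0, 0, 25, 28, 31, 34, 47, 62)

# Keep only ages that look like real values
valid_age <- vict_age_toy[vict_age_toy > 0]

hist(valid_age,
     main = "Histogram of Victim Age (age 0 excluded)",
     xlab = "Age",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
```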

5. Question: Is there a significant difference in the number of crimes between Area 1 and Area 3?

The results show that Area 1 had a higher average yearly crime count (about 11,612) than Area 3 (about 9,574). However, the p-value (0.55) is well above 0.05, and the 95% confidence interval for the difference (-5,425 to 9,502) includes 0, so the difference is not statistically significant. Therefore, the data do not provide evidence of a real difference in yearly crime levels between these two areas.

#Creating a subset with only Area 1 and Area 3
crime_subset <- crime %>%
  filter(AREA %in% c("01","03")) %>%
  mutate(AREA= factor(AREA))
#count crimes per year per area
crime_counts <-crime_subset %>%
  group_by(AREA,  year_occ) %>%
  summarise(TotalCrimes= n(), .groups= "drop")
#Welch t-test comparing yearly crime counts in Area 1 vs Area 3
t.test_result <- t.test(TotalCrimes ~ AREA, data =crime_counts)
print(t.test_result)

## 
##  Welch Two Sample t-test
## 
## data:  TotalCrimes by AREA
## t = 0.61315, df = 9.4664, p-value = 0.5542
## alternative hypothesis: true difference in means between group 01 and group 03 is not equal to 0
## 95 percent confidence interval:
##  -5425.367  9501.701
## sample estimates:
## mean in group 01 mean in group 03 
##         11611.67          9573.50
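
The group means, p-value, and confidence interval quoted in the interpretation can be pulled directly from the object that t.test() returns. The sketch below uses invented yearly counts in a table named crime_counts_toy, which is a stand-in for crime_counts.

```r
# Invented yearly counts for illustration; not the real LA figures
crime_counts_toy <- data.frame(
  AREA = factor(rep(c("01", "03"), each = 4)),
  TotalCrimes = c(100, 120, 110, 130, 90, 95, 105, 100)
)

# Welch two-sample t-test (the default for t.test with two groups)
welch <- t.test(TotalCrimes ~ AREA, data = crime_counts_toy)

group_means <- welch$estimate  # mean yearly count in each area
p_value <- welch$p.value       # two-sided p-value
conf_int <- welch$conf.int     # 95% interval for the difference in means

# At the 5% level the difference is significant only if p < 0.05,
# which is the same as the 95% interval excluding 0
significant <- p_value < 0.05

print(group_means)
print(p_value)
print(conf_int)
print(significant)
```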

Yangchen Lhamo