The analysis of crime cases data

Exploratory Data Analysis (EDA) and Statistical Inference

Author

Madonsela, Nkosinathi (231717962)

Published

2025

Code
knitr::opts_chunk$set(echo = TRUE, message = FALSE, cache = TRUE) 

# packages

1 Introduction

This project analyses crime case data to understand the factors that influence crime cases and to provide insights into how the impact of crime can be minimised.

2 Data

The dataset includes the following variables.

  • id : Numerical Identification of the crime case.

  • region : Categorical data stating the region in which the crime occurred.

  • complaint_type : Categorical data describing the type of complaint based on priority, e.g. alpha for lower priority and charlie for medium priority.

  • number_of_arrests : Numerical data for the number of suspects arrested for each crime case.

  • incident_impact_score : Numerical score weighting the impact of the crime case.

  • response_time : Numerical data representing the time it took the police task team to respond to the crime incident.

Code
library(tidyverse)
library(knitr)
Crime_data <- read_csv("crime_data.csv")
kable(head(Crime_data))
id  region         complaint_type  number_of_arrests  incident_impact_score  response_time
 1  KwaZulu-Natal  alpha                           4                    5.8          26.20
 2  KwaZulu-Natal  charlie                         5                    7.7          24.93
 3  Eastern Cape   alpha                           7                    6.6          27.59
 4  KwaZulu-Natal  charlie                         5                    6.7          24.47
 5  Western Cape   alpha                           9                    6.9          11.18
 6  KwaZulu-Natal  charlie                         7                    6.0          26.12

3 Descriptive Statistics/ EDA

Perform a descriptive / exploratory data analysis. Include at least three graphs of different types.

Code
library(tidyverse)
library(knitr)
library(skimr)  # provides skim_with() and sfl()
Crime_data <- read_csv("crime_data.csv")

# Custom skimmer: append = FALSE replaces the default numeric summaries
# with the statistics listed below
my_skim <- skim_with(
  numeric = sfl(Average = mean,
                Standard_Deviation = sd,
                Minimum = min,
                Percentile_25th = ~ quantile(., .25, type = 6),
                Median = median,
                Percentile_75th = ~ quantile(., .75, type = 6),
                Maximum = max),
  append = FALSE
)

# Summary statistics
my_skim(Crime_data)
Data summary

Name: Crime_data
Number of rows: 232
Number of columns: 6
Column type frequency: character 2, numeric 4
Group variables: None

Variable type: character

skim_variable   n_missing  complete_rate  min  max  empty  n_unique  whitespace
region                  0              1    7   13      0         6           0
complaint_type          0              1    5    7      0         3           0

Variable type: numeric

skim_variable          n_missing  complete_rate  Average  Standard_Deviation  Minimum  Percentile_25th  Median  Percentile_75th  Maximum
id                             0              1   116.50               67.12     1.00            58.25  116.50           174.75   232.00
number_of_arrests              0              1     6.16                2.11     0.00             5.00    6.00             7.00    12.00
incident_impact_score          0              1     6.69                1.15     2.90             6.00    6.70             7.40     9.40
response_time                  0              1    23.35               13.07     6.82            13.77   21.09            27.77    80.59
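
One way to meet the three-graph requirement is sketched below with ggplot2 (part of tidyverse): a histogram of response_time, a boxplot of incident_impact_score by region, and a bar chart of complaint_type. The choice of variables and plot types is an assumption, not a confirmed part of this report. So that the block runs on its own, it falls back to the six rows shown in Section 2 when crime_data.csv is not available; that fallback is for illustration only.

```r
library(ggplot2)

# Full data if the file is present; otherwise the six rows shown in
# Section 2 (illustration only, not suitable for drawing conclusions)
Crime_data <- if (file.exists("crime_data.csv")) {
  read.csv("crime_data.csv")
} else {
  data.frame(
    id = 1:6,
    region = c("KwaZulu-Natal", "KwaZulu-Natal", "Eastern Cape",
               "KwaZulu-Natal", "Western Cape", "KwaZulu-Natal"),
    complaint_type = c("alpha", "charlie", "alpha",
                       "charlie", "alpha", "charlie"),
    number_of_arrests = c(4, 5, 7, 5, 9, 7),
    incident_impact_score = c(5.8, 7.7, 6.6, 6.7, 6.9, 6.0),
    response_time = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
  )
}

# Graph 1: histogram showing the shape of the response time distribution
p_hist <- ggplot(Crime_data, aes(x = response_time)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of response time",
       x = "Response time", y = "Number of cases")

# Graph 2: boxplot comparing impact scores across regions
p_box <- ggplot(Crime_data, aes(x = region, y = incident_impact_score)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Incident impact score by region",
       x = "Region", y = "Incident impact score")

# Graph 3: bar chart counting cases per complaint type
p_bar <- ggplot(Crime_data, aes(x = complaint_type)) +
  geom_bar() +
  labs(title = "Cases by complaint type",
       x = "Complaint type", y = "Number of cases")

print(p_hist)
print(p_box)
print(p_bar)
```

The histogram is worth including because the summary table already hints at right skew in response_time: the mean (23.35) exceeds the median (21.09) and the maximum (80.59) lies far above the 75th percentile (27.77).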

4 Statistical Inference

This section is for statistical inference.

4.1 Estimation

Use an estimation method to fit a distribution to a discrete numerical variable.
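
As a minimal sketch, number_of_arrests (the only discrete count in the dataset apart from id) could be modelled with a Poisson distribution and its rate estimated by maximum likelihood. The Poisson choice is an assumption made here for illustration. The summary statistics give a mean of 6.16 and a standard deviation of 2.11 (variance about 4.45), so the counts are less spread out than a Poisson model implies, and the fit should be checked rather than taken for granted. The block uses only base R and falls back to the six values shown in Section 2 when crime_data.csv is not available.

```r
# number_of_arrests: full data if the file is present, otherwise the six
# values shown in Section 2 (illustration only)
arrests <- if (file.exists("crime_data.csv")) {
  read.csv("crime_data.csv")$number_of_arrests
} else {
  c(4, 5, 7, 5, 9, 7)
}

# Negative Poisson log-likelihood as a function of the rate lambda
neg_loglik <- function(lambda) -sum(dpois(arrests, lambda, log = TRUE))

# Numerical maximum likelihood estimate of lambda
fit <- optimize(neg_loglik, interval = c(0.01, max(arrests) + 1))
lambda_hat <- fit$minimum

# For the Poisson model the MLE has a closed form (the sample mean),
# so the two values should agree up to the optimiser's tolerance
print(c(numerical = lambda_hat, closed_form = mean(arrests)))
```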

4.1.1 Goodness-of-fit Test

Provide a plot to compare the fit of the observed and fitted data. Perform goodness-of-fit test.
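
The sketch below shows one possible way to do both steps for a Poisson fit to number_of_arrests: a side-by-side bar plot of observed and fitted frequencies, followed by a chi-square goodness-of-fit test. The Poisson model and the bin boundaries (3 or fewer, 4 to 9 individually, 10 or more) are assumptions chosen for illustration, and the degrees of freedom are reduced by one because the rate is estimated from the data. It uses base R only and falls back to the six values shown in Section 2 when crime_data.csv is not available, in which case the expected counts are far too small for the test to be meaningful.

```r
# number_of_arrests: full data if the file is present, otherwise the six
# values shown in Section 2 (illustration only)
arrests <- if (file.exists("crime_data.csv")) {
  read.csv("crime_data.csv")$number_of_arrests
} else {
  c(4, 5, 7, 5, 9, 7)
}

n <- length(arrests)
lambda_hat <- mean(arrests)  # Poisson maximum likelihood estimate

# Assumed bins: <=3, 4, 5, 6, 7, 8, 9, >=10
bin_labels <- c("<=3", as.character(4:9), ">=10")
binned <- cut(arrests, breaks = c(-Inf, 3:9, Inf), labels = bin_labels)
observed <- as.vector(table(binned))

# Fitted Poisson probability of each bin and the expected counts
probs <- c(ppois(3, lambda_hat),
           dpois(4:9, lambda_hat),
           ppois(9, lambda_hat, lower.tail = FALSE))
expected <- n * probs

# Plot observed against fitted frequencies
barplot(rbind(observed, expected), beside = TRUE, names.arg = bin_labels,
        legend.text = c("Observed", "Fitted Poisson"),
        xlab = "Number of arrests", ylab = "Number of cases")

# Chi-square statistic; df = bins - 1 - number of estimated parameters
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

if (any(expected < 5)) {
  message("Some expected counts are below 5; consider merging bins.")
}
print(c(chi_sq = chi_sq, df = df, p_value = p_value))
```

A small p-value would indicate that the Poisson model does not describe the arrest counts well, in which case another discrete distribution could be tried.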

4.2 Interval Estimation

Provide a confidence interval.
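
As an illustration, a 95% confidence interval for the mean response_time can be built from the t distribution. The choice of variable and confidence level is an assumption. The block computes the interval by hand and then with t.test() so the two can be compared; it falls back to the six values shown in Section 2 when crime_data.csv is not available.

```r
# response_time: full data if the file is present, otherwise the six
# values shown in Section 2 (illustration only)
response_time <- if (file.exists("crime_data.csv")) {
  read.csv("crime_data.csv")$response_time
} else {
  c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
}

n <- length(response_time)
x_bar <- mean(response_time)
se <- sd(response_time) / sqrt(n)   # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

# Interval: sample mean plus or minus critical value times standard error
ci_manual <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)

# The same interval from the built-in one-sample t procedure
ci_builtin <- t.test(response_time, conf.level = 0.95)$conf.int

print(ci_manual)
print(ci_builtin)
```

With the full 232 cases the sample is large enough for the t interval to be a reasonable approximation even though response_time appears right-skewed in the summary table.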

4.3 Hypothesis testing

Perform a hypothesis test of your choosing.
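
One candidate test, sketched here under stated assumptions, is a Welch two-sample t-test of whether the mean response_time differs between alpha and charlie complaints. The null hypothesis is that the two means are equal and the alternative is that they differ. The choice of test and of these two complaint types is an assumption; the data contain a third complaint type, which the sketch filters out. The block falls back to the six rows shown in Section 2 when crime_data.csv is not available.

```r
# Full data if the file is present; otherwise the six rows shown in
# Section 2 (illustration only)
crime <- if (file.exists("crime_data.csv")) {
  read.csv("crime_data.csv")[, c("complaint_type", "response_time")]
} else {
  data.frame(
    complaint_type = c("alpha", "charlie", "alpha",
                       "charlie", "alpha", "charlie"),
    response_time = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
  )
}

# Keep only the two complaint types being compared
two_types <- subset(crime, complaint_type %in% c("alpha", "charlie"))

# Welch t-test: does not assume equal variances in the two groups
welch <- t.test(response_time ~ complaint_type, data = two_types,
                var.equal = FALSE)
print(welch)
```

If the skew in response_time is a concern, wilcox.test() with the same formula is a rank-based alternative.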

5 Summary of Insights

Summarise key findings from your EDA and statistical analysis.

Note: no new information need be presented here; it is a summary of insights already noted in previous sections.

6 Conclusion

Conclude report.

6.1 Recommendations

What would you recommend based on your findings?