Analysis of Crime Case Data

Code

knitr::opts_chunk$set(echo = TRUE, message = FALSE, cache = TRUE) 

# packages

1 Introduction

The purpose of this project is to analyse crime data, understand the factors that influence crime cases, and provide insights into how the impact of crime can be minimised.

2 Data

The dataset contains the following variables.

id : Numerical identifier of the crime case.
region : Categorical variable giving the region in which the crime occurred.
complaint_type : Categorical variable describing the type of complaint by priority, e.g. alpha for lower priority and charlie for medium priority.
number_of_arrests : Numerical variable giving the number of suspects arrested for each crime case.
incident_impact_score : Numerical variable giving the weight of the impact of the crime case.
response_time : Numerical variable giving the time it took the police task team to respond to the crime incident.
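
The variable descriptions above can be turned into an explicit column specification, so that a value of the wrong type is caught when the file is read. The following is a minimal sketch, not the report's own loading code: the name crime_spec is hypothetical, treating id and number_of_arrests as integers is an assumption, and the literal text stands in for crime_data.csv using the six rows printed by head(Crime_data). Reading from literal text via I() assumes readr 2.0 or later.

```r
library(readr)

# Column types assumed from the variable descriptions; adjust if the file differs.
crime_spec <- cols(
  id = col_integer(),
  region = col_character(),
  complaint_type = col_character(),
  number_of_arrests = col_integer(),
  incident_impact_score = col_double(),
  response_time = col_double()
)

# Stand-in input: the first six rows of the file, as printed by head(Crime_data).
# In the report, pass "crime_data.csv" instead of this literal text.
sample_csv <- I("id,region,complaint_type,number_of_arrests,incident_impact_score,response_time
1,KwaZulu-Natal,alpha,4,5.8,26.20
2,KwaZulu-Natal,charlie,5,7.7,24.93
3,Eastern Cape,alpha,7,6.6,27.59
4,KwaZulu-Natal,charlie,5,6.7,24.47
5,Western Cape,alpha,9,6.9,11.18
6,KwaZulu-Natal,charlie,7,6.0,26.12
")

crime_sample <- read_csv(sample_csv, col_types = crime_spec)

# Any value that does not match its declared type would be listed by problems().
stopifnot(nrow(problems(crime_sample)) == 0)
```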

Code

library(tidyverse)
library(knitr)
Crime_data <- read_csv("crime_data.csv")
kable(head(Crime_data))

id	region	complaint_type	number_of_arrests	incident_impact_score	response_time
1	KwaZulu-Natal	alpha	4	5.8	26.20
2	KwaZulu-Natal	charlie	5	7.7	24.93
3	Eastern Cape	alpha	7	6.6	27.59
4	KwaZulu-Natal	charlie	5	6.7	24.47
5	Western Cape	alpha	9	6.9	11.18
6	KwaZulu-Natal	charlie	7	6.0	26.12

3 Descriptive Statistics/ EDA

Perform a descriptive / exploratory data analysis. Include at least three graphs of different types.

Code

library(tidyverse)
library(knitr)
library(skimr) # provides skim_with() and sfl(), used below
Crime_data <- read_csv("crime_data.csv")

my_skim <- skim_with(
  numeric = sfl(Average = mean, 
                Standard_Deviation = sd, 
                Minimum = min, 
                Percentile_25th = ~ quantile(., .25, type = 6),
                Median = median,
                Percentile_75th = ~ quantile(., .75, type = 6),
                Maximum = max),
  append = FALSE
)

# Summary statistics
my_skim(Crime_data)

Data summary
Name	Crime_data
Number of rows	232
Number of columns	6
_______________________
Column type frequency:
character	2
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
region	0	1	7	13	0	6	0
complaint_type	0	1	5	7	0	3	0

Variable type: numeric

skim_variable	complete_rate	Average	Standard_Deviation	Minimum	Percentile_25th	Median	Percentile_75th	Maximum
id	1	116.50	67.12	1.00	58.25	116.50	174.75	232.00
number_of_arrests	1	6.16	2.11	0.00	5.00	6.00	7.00	12.00
incident_impact_score	1	6.69	1.15	2.90	6.00	6.70	7.40	9.40
response_time	1	23.35	13.07	6.82	13.77	21.09	27.77	80.59
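
The section asks for at least three graphs of different types. One possible set is sketched below: a histogram, a boxplot by group, and a bar chart. The choice of variables and the object names (crime_sample, p_hist, p_box, p_bar) are assumptions for illustration. To keep the block self-contained it uses only the six rows printed by head(Crime_data) in Section 2; in the report the full Crime_data object would take the place of crime_sample.

```r
library(ggplot2)

# Stand-in data: the six rows printed by head(Crime_data) in Section 2.
# In the report itself, use the full Crime_data object instead.
crime_sample <- data.frame(
  id = 1:6,
  region = c("KwaZulu-Natal", "KwaZulu-Natal", "Eastern Cape",
             "KwaZulu-Natal", "Western Cape", "KwaZulu-Natal"),
  complaint_type = c("alpha", "charlie", "alpha", "charlie", "alpha", "charlie"),
  number_of_arrests = c(4, 5, 7, 5, 9, 7),
  incident_impact_score = c(5.8, 7.7, 6.6, 6.7, 6.9, 6.0),
  response_time = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
)

# Graph 1: histogram showing the shape of the response time distribution.
p_hist <- ggplot(crime_sample, aes(x = response_time)) +
  geom_histogram(bins = 10) +
  labs(title = "Distribution of response time",
       x = "Response time", y = "Number of cases")

# Graph 2: boxplot comparing response time across regions.
p_box <- ggplot(crime_sample, aes(x = region, y = response_time)) +
  geom_boxplot() +
  labs(title = "Response time by region",
       x = "Region", y = "Response time")

# Graph 3: bar chart of how often each arrest count occurs.
p_bar <- ggplot(crime_sample, aes(x = number_of_arrests)) +
  geom_bar() +
  labs(title = "Number of arrests per case",
       x = "Number of arrests", y = "Number of cases")

print(p_hist)
print(p_box)
print(p_bar)
```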

4 Statistical Inference

This section is for statistical inference.

4.1 Estimation

Use an estimation method to fit a distribution to a discrete numerical variable.
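
As an illustration of this step, the sketch below fits a Poisson distribution to number_of_arrests by maximum likelihood. The choice of a Poisson model is an assumption, not something the report prescribes: the summary table reports a mean of 6.16 and a standard deviation of 2.11 (variance of about 4.45), so the equal mean and variance that a Poisson model implies should be checked rather than taken for granted. The arrests vector holds only the six values printed by head(Crime_data); in the report it would be Crime_data$number_of_arrests. The names neg_loglik and lambda_hat are hypothetical.

```r
# Stand-in data: number_of_arrests from the six rows shown by head(Crime_data).
# In the report, use Crime_data$number_of_arrests instead.
arrests <- c(4, 5, 7, 5, 9, 7)

# Negative log-likelihood of a Poisson(lambda) model for the observed counts.
neg_loglik <- function(lambda, x) {
  -sum(dpois(x, lambda = lambda, log = TRUE))
}

# Numerical maximum likelihood: minimise the negative log-likelihood.
fit <- optimize(neg_loglik, interval = c(0.01, 20), x = arrests)
lambda_hat <- fit$minimum

# For the Poisson model the MLE has a closed form, the sample mean,
# which gives a direct check on the numerical result.
lambda_closed_form <- mean(arrests)

cat("Numerical MLE:  ", round(lambda_hat, 3), "\n")
cat("Closed-form MLE:", round(lambda_closed_form, 3), "\n")
```

The numerical search is not needed for a Poisson model, but the same pattern carries over to distributions without a closed-form estimator, such as the negative binomial.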

4.1.1 Goodness-of-fit Test

Provide a plot comparing the observed data with the fitted distribution, then perform a goodness-of-fit test.
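
A chi-square goodness-of-fit test for a fitted Poisson model could be sketched as follows. Because the test needs more than a handful of observations, the arrests vector here is simulated purely as a stand-in (232 values, the number of rows reported in the data summary, drawn with the reported mean of 6.16); it is not the real data, and in the report it would be replaced by Crime_data$number_of_arrests. The Poisson model, the bin boundaries, and all object names are assumptions made for illustration.

```r
# SIMULATED stand-in data, not the real dataset: replace with
# Crime_data$number_of_arrests in the report.
set.seed(1)
arrests <- rpois(232, lambda = 6.16)

# Fit the Poisson model; the maximum likelihood estimate is the sample mean.
lambda_hat <- mean(arrests)

# Group the counts into bins, pooling the sparse tails: <=3, 4, ..., 9, >=10.
bins <- cut(arrests, breaks = c(-Inf, 3:9, Inf),
            labels = c("<=3", 4:9, ">=10"))
observed <- table(bins)

# Probability of each bin under the fitted Poisson distribution.
probs <- c(ppois(3, lambda_hat),
           dpois(4:9, lambda_hat),
           ppois(9, lambda_hat, lower.tail = FALSE))
expected <- length(arrests) * probs

# Plot observed against fitted frequencies side by side.
barplot(rbind(Observed = as.vector(observed), Fitted = expected),
        beside = TRUE, names.arg = names(observed), legend.text = TRUE,
        xlab = "Number of arrests", ylab = "Frequency",
        main = "Observed vs fitted Poisson frequencies")

# Chi-square statistic; one degree of freedom is lost for the total count
# and one more for the estimated parameter lambda.
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

cat("Chi-square =", round(chi_sq, 2), "on", df, "df, p-value =",
    round(p_value, 4), "\n")
```

The p-value is computed by hand rather than with chisq.test() because chisq.test() does not reduce the degrees of freedom for the estimated parameter.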

4.2 Interval Estimation

Provide a confidence interval.
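
One option, sketched here under assumptions, is a 95% t-based confidence interval for the mean response time. The inputs are the summary statistics already reported for response_time (232 rows, mean 23.35, standard deviation 13.07). The choice of variable and confidence level is illustrative. Since the reported maximum (80.59) is far above the median (21.09), the distribution appears right-skewed, so the interval relies on the sample being large enough for the sampling distribution of the mean to be approximately normal.

```r
# Summary statistics for response_time, taken from the summary table above.
n     <- 232    # number of rows
x_bar <- 23.35  # sample mean
s     <- 13.07  # sample standard deviation

conf_level <- 0.95

# Critical value from the t distribution with n - 1 degrees of freedom.
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Margin of error and the resulting interval.
margin <- t_crit * s / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)

print(round(ci, 2))

# With the raw data loaded, the equivalent call would be:
# t.test(Crime_data$response_time, conf.level = 0.95)$conf.int
```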

4.3 Hypothesis testing

Perform a hypothesis test of your choosing.
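
As one possible choice, the sketch below runs a Welch two-sample t-test of whether mean response time differs between alpha and charlie complaints (H0: the two means are equal). The question, the 5% significance level, and the object names are assumptions for illustration. The data frame holds only the six rows printed by head(Crime_data), which is far too few to support a conclusion; in the report the full Crime_data would be used. The data summary lists three distinct complaint types, so the full data must be restricted to the two being compared.

```r
# Stand-in data: the six rows shown by head(Crime_data).
# In the report, use the full Crime_data object instead.
crime_sample <- data.frame(
  complaint_type = c("alpha", "charlie", "alpha", "charlie", "alpha", "charlie"),
  response_time  = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
)

# Keep only the two complaint types being compared.
two_types <- subset(crime_sample, complaint_type %in% c("alpha", "charlie"))

# Welch two-sample t-test (t.test() does not assume equal variances by default).
test_result <- t.test(response_time ~ complaint_type, data = two_types)

# Decision at the chosen significance level.
alpha_level <- 0.05
decision <- if (test_result$p.value < alpha_level) {
  "reject H0"
} else {
  "fail to reject H0"
}

print(test_result)
cat("Decision at the", alpha_level, "level:", decision, "\n")
```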

5 Summary of Insights

Summarise key findings from your EDA and statistical analysis.

Note: no new information need be presented here; this is a summary of insights already noted in previous sections.

6 Conclusion

Conclude the report.

6.1 Recommendations

What would you recommend based on your findings?