1 Introduction
The purpose of this project is to analyse crime data, understand the factors influencing crime cases, and provide insights into how the impact of crime can be minimised.
2 Data
The dataset includes the following columns.
id : Numerical identifier of the crime case.
region : Categorical data stating the region in which the crime occurred.
complaint_type : Categorical data describing the type of complaint based on priority, e.g. alpha for lower priority and charlie for medium priority.
number_of_arrests : Numerical data for the number of suspects arrested in each crime case.
incident_impact_score : Numerical data giving the weight of the impact of the crime case.
response_time : Numerical data representing the time it took the police task team to respond to the crime incident.
| id | region | complaint_type | number_of_arrests | incident_impact_score | response_time |
|---|---|---|---|---|---|
| 1 | KwaZulu-Natal | alpha | 4 | 5.8 | 26.20 |
| 2 | KwaZulu-Natal | charlie | 5 | 7.7 | 24.93 |
| 3 | Eastern Cape | alpha | 7 | 6.6 | 27.59 |
| 4 | KwaZulu-Natal | charlie | 5 | 6.7 | 24.47 |
| 5 | Western Cape | alpha | 9 | 6.9 | 11.18 |
| 6 | KwaZulu-Natal | charlie | 7 | 6.0 | 26.12 |
3 Descriptive Statistics / EDA
Perform descriptive analysis / exploratory data analysis. Include at least 3 graphs (of different types).
library(tidyverse)
library(knitr)
library(skimr)  # provides skim_with() and sfl()
# Load the crime dataset
Crime_data <- read_csv("crime_data.csv")
# Custom skim function: replace the default numeric summaries with the statistics below
my_skim <- skim_with(
  numeric = sfl(Average = mean,
                Standard_Deviation = sd,
                Minimum = min,
                Percentile_25th = ~ quantile(., .25, type = 6),
                Median = median,
                Percentile_75th = ~ quantile(., .75, type = 6),
                Maximum = max),
  append = FALSE
)
# Summary statistics
my_skim(Crime_data)
| Name | Crime_data |
|---|---|
| Number of rows | 232 |
| Number of columns | 6 |
| Column type frequency: | |
| character | 2 |
| numeric | 4 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| region | 0 | 1 | 7 | 13 | 0 | 6 | 0 |
| complaint_type | 0 | 1 | 5 | 7 | 0 | 3 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | Average | Standard_Deviation | Minimum | Percentile_25th | Median | Percentile_75th | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 116.50 | 67.12 | 1.00 | 58.25 | 116.50 | 174.75 | 232.00 |
| number_of_arrests | 0 | 1 | 6.16 | 2.11 | 0.00 | 5.00 | 6.00 | 7.00 | 12.00 |
| incident_impact_score | 0 | 1 | 6.69 | 1.15 | 2.90 | 6.00 | 6.70 | 7.40 | 9.40 |
| response_time | 0 | 1 | 23.35 | 13.07 | 6.82 | 13.77 | 21.09 | 27.77 | 80.59 |
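One possible way to meet the three-graph requirement is sketched here with ggplot2. It uses only the six preview rows shown in Section 2, re-typed by hand into a data frame called `preview` (a name introduced here for illustration). The chart choices are also just an assumption about what might be informative. For the real analysis, `Crime_data` would take the place of `preview`.

```r
library(ggplot2)

# Only the six preview rows from Section 2 (illustration, not the full 232 cases).
preview <- data.frame(
  id = 1:6,
  region = c("KwaZulu-Natal", "KwaZulu-Natal", "Eastern Cape",
             "KwaZulu-Natal", "Western Cape", "KwaZulu-Natal"),
  complaint_type = c("alpha", "charlie", "alpha", "charlie", "alpha", "charlie"),
  number_of_arrests = c(4, 5, 7, 5, 9, 7),
  incident_impact_score = c(5.8, 7.7, 6.6, 6.7, 6.9, 6.0),
  response_time = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
)

# 1. Bar chart: number of cases per region.
p_bar <- ggplot(preview, aes(x = region)) +
  geom_bar() +
  labs(x = "Region", y = "Number of cases")

# 2. Box plot: spread of response time within each complaint type.
p_box <- ggplot(preview, aes(x = complaint_type, y = response_time)) +
  geom_boxplot() +
  labs(x = "Complaint type", y = "Response time")

# 3. Scatter plot: impact score against number of arrests.
p_scatter <- ggplot(preview, aes(x = number_of_arrests, y = incident_impact_score)) +
  geom_point() +
  labs(x = "Number of arrests", y = "Incident impact score")
```

Printing `p_bar`, `p_box` or `p_scatter` in an R Markdown chunk renders the corresponding figure.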
4 Statistical Inference
This section is for statistical inference.
4.1 Estimation
Use an estimation method to fit a distribution to a discrete numerical variable.
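As a minimal sketch of what this step could look like, the block below fits a Poisson distribution to `number_of_arrests` by maximum likelihood. Both the choice of variable and the choice of a Poisson model are assumptions made for illustration. For a Poisson model the maximum likelihood estimate of the rate is the sample mean, and the code checks this numerically. Only the six preview values from Section 2 are used.

```r
# Six preview values of number_of_arrests from Section 2 (illustration only).
arrests <- c(4, 5, 7, 5, 9, 7)

# Negative Poisson log-likelihood as a function of the rate lambda.
neg_loglik <- function(lambda, x) -sum(dpois(x, lambda, log = TRUE))

# Closed-form maximum likelihood estimate: the sample mean.
lambda_hat <- mean(arrests)

# Numerical check: minimising the negative log-likelihood should agree.
lambda_num <- optimize(neg_loglik, interval = c(0.01, 50), x = arrests)$minimum

print(c(closed_form = lambda_hat, numerical = lambda_num))
```

On the full dataset the same estimate would be the reported average of 6.16. The summary table also gives a standard deviation of 2.11, so the sample variance (about 4.45) is below the mean, which is worth keeping in mind when judging whether a Poisson model is adequate.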
4.1.1 Goodness-of-fit Test
Provide a plot comparing the observed data with the fitted distribution. Perform a goodness-of-fit test.
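The following sketch shows one way to compare observed and fitted counts and to run a chi-square goodness-of-fit test for a Poisson model. The helper `poisson_gof()` and its `lo` and `hi` pooling cut-offs are hypothetical names introduced here. The input vector is a simulated stand-in (not the real data), generated only so that the block runs on its own. In the report it would be replaced by `Crime_data$number_of_arrests`.

```r
library(ggplot2)

# ASSUMPTION: simulated stand-in, NOT the real data. The size (232) and rate (6.16)
# are taken from the summary table; replace x with Crime_data$number_of_arrests.
set.seed(1)
x <- rpois(232, lambda = 6.16)

# Chi-square goodness-of-fit for a Poisson model.
# Values <= lo and >= hi are pooled so that expected counts are not too small.
poisson_gof <- function(x, lo, hi) {
  stopifnot(hi - lo >= 2)
  lambda_hat <- mean(x)                      # maximum likelihood estimate of the rate
  mid <- (lo + 1):(hi - 1)
  labels <- c(paste0("<=", lo), as.character(mid), paste0(">=", hi))
  observed <- c(sum(x <= lo),
                vapply(mid, function(v) sum(x == v), numeric(1)),
                sum(x >= hi))
  probs <- c(ppois(lo, lambda_hat),
             dpois(mid, lambda_hat),
             ppois(hi - 1, lambda_hat, lower.tail = FALSE))
  expected <- length(x) * probs
  statistic <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1 - 1             # one extra df lost for the estimated rate
  list(labels = labels, observed = observed, expected = expected,
       statistic = statistic, df = df,
       p_value = pchisq(statistic, df, lower.tail = FALSE))
}

gof <- poisson_gof(x, lo = 3, hi = 10)

# Side-by-side bars of observed and fitted counts per bin.
fit_df <- data.frame(
  bin = factor(rep(gof$labels, 2), levels = gof$labels),
  series = rep(c("Observed", "Fitted Poisson"), each = length(gof$labels)),
  count = c(gof$observed, gof$expected)
)
p_fit <- ggplot(fit_df, aes(x = bin, y = count, fill = series)) +
  geom_col(position = "dodge") +
  labs(x = "Number of arrests", y = "Number of cases", fill = NULL)

print(gof[c("statistic", "df", "p_value")])
```

The statistic is computed by hand rather than with `chisq.test()` because the latter uses the number of bins minus one as degrees of freedom and does not subtract the extra degree of freedom for the estimated rate. Because the input here is simulated from a Poisson distribution, the printed result says nothing about the actual crime data.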
4.2 Interval Estimation
Provide a confidence interval.
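As an illustration, a 95% t-based confidence interval for the mean of `response_time` can be computed directly from the summary statistics in Section 3 (n = 232, average 23.35, standard deviation 13.07). The choice of variable and of the 95% level are assumptions made for this sketch.

```r
# Summary statistics for response_time, copied from the skim table in Section 3.
n <- 232
x_bar <- 23.35
s <- 13.07

conf_level <- 0.95                       # assumed confidence level
alpha <- 1 - conf_level

se <- s / sqrt(n)                        # standard error of the mean
t_crit <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical value

ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
print(round(ci, 2))
```

This gives an interval of roughly 21.7 to 25.0. With the raw data loaded, `t.test(Crime_data$response_time)$conf.int` performs the same calculation. Since `response_time` is right-skewed (median 21.09, maximum 80.59), the interval relies on the sample being large enough for the sample mean to be approximately normal.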
4.3 Hypothesis testing
Perform a hypothesis test of your choosing.
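One candidate test, sketched below, is a Welch two-sample t-test of whether mean response time differs between alpha and charlie complaints. The choice of test and hypotheses is an assumption for illustration. The data are only the six preview rows from Section 2, which is far too few to support any conclusion.

```r
# Six preview rows from Section 2 (illustration only).
preview <- data.frame(
  complaint_type = c("alpha", "charlie", "alpha", "charlie", "alpha", "charlie"),
  response_time = c(26.20, 24.93, 27.59, 24.47, 11.18, 26.12)
)

# H0: mean response time is the same for alpha and charlie complaints.
# H1: the two means differ.
# t.test() applies the Welch correction (unequal variances) by default.
test <- t.test(response_time ~ complaint_type, data = preview)

print(test$estimate)   # group means
print(test$p.value)    # two-sided p-value
```

The formula interface of `t.test()` requires exactly two groups. The full dataset has three complaint types (`n_unique` is 3 in the summary table), so it would first need to be filtered to two types, or a test for more than two groups such as `aov()` or `kruskal.test()` could be used instead.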
5 Summary of Insights
Summarise key findings from your EDA and statistical analysis.
Note: no new information needs to be presented here; this is a summary of insights already noted in previous sections.
6 Conclusion
Conclude the report.
6.1 Recommendations
What would you recommend based on your findings?