| id | region | complaint_type | number_of_arrests | incident_impact_score | response_time |
|---|---|---|---|---|---|
| 1 | KwaZulu-Natal | alpha | 4 | 5.8 | 26.20 |
| 2 | KwaZulu-Natal | charlie | 5 | 7.7 | 24.93 |
| 3 | Eastern Cape | alpha | 7 | 6.6 | 27.59 |
| 4 | KwaZulu-Natal | charlie | 5 | 6.7 | 24.47 |
| 5 | Western Cape | alpha | 9 | 6.9 | 11.18 |
| 6 | KwaZulu-Natal | charlie | 7 | 6.0 | 26.12 |
1 Introduction
The purpose of this project is to analyse crime data, understand the factors influencing the crime cases and provide insights on how we can minimize crime impact.
2 Data
The dataset includes the following variables.
id : Numerical identifier of the crime case.
region : Categorical data stating the region in which the crime occurred.
complaint_type : Categorical data describing the type of complaint based on priority, i.e. alpha for lower priority and charlie for medium priority.
number_of_arrests : Numerical data for the number of suspects arrested for each crime case.
incident_impact_score : Numerical data giving the weight of the impact of the crime case.
response_time : Numerical data representing the time it took the police task team to respond to the crime incident.
3 Descriptive Statistics/ EDA
Perform descriptive analysis/ exploratory data analysis. Include at least 3 graphs (of different types)
| Property | Value |
|---|---|
| Name | Crime_data |
| Number of rows | 232 |
| Number of columns | 6 |
| Column type frequency: | |
| character | 2 |
| numeric | 4 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| region | 0 | 1 | 7 | 13 | 0 | 6 | 0 |
| complaint_type | 0 | 1 | 5 | 7 | 0 | 3 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | Average | Standard_Deviation | Minimum | Percentile_25th | Median | Percentile_75th | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 116.50 | 67.12 | 1.00 | 58.25 | 116.50 | 174.75 | 232.00 |
| number_of_arrests | 0 | 1 | 6.16 | 2.11 | 0.00 | 5.00 | 6.00 | 7.00 | 12.00 |
| incident_impact_score | 0 | 1 | 6.69 | 1.15 | 2.90 | 6.00 | 6.70 | 7.40 | 9.40 |
| response_time | 0 | 1 | 23.35 | 13.07 | 6.82 | 13.77 | 21.09 | 27.77 | 80.59 |
The data set contains 232 crime records, which provides a reasonable sample size for this exploratory investigation. There are no missing values (n_missing is 0 and complete_rate is 1 for every variable), so the data is complete and ready for analysis without the need for imputation.
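The completeness check and summary statistics described above can be sketched in Python using only the standard library. This is a minimal illustration rather than the report's own code: it uses the six sample rows shown at the top of the report instead of the full 232-row data set, so its output will not match the summary table, and the helper name `summarise` and the use of `None` for a missing value are assumptions.

```python
import statistics

# Numeric columns from the six sample rows at the top of the report.
# Assumption: a missing value would be represented as None.
sample = {
    "number_of_arrests": [4, 5, 7, 5, 9, 7],
    "incident_impact_score": [5.8, 7.7, 6.6, 6.7, 6.9, 6.0],
    "response_time": [26.20, 24.93, 27.59, 24.47, 11.18, 26.12],
}


def summarise(values):
    """Summarise one numeric column: missing count, mean, spread and quartiles."""
    present = [v for v in values if v is not None]
    # "inclusive" interpolates linearly between the observed order statistics.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "n_missing": len(values) - len(present),
        "complete_rate": len(present) / len(values),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
    }


summary = {name: summarise(values) for name, values in sample.items()}

for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Running the same function over all 232 records would be the natural way to cross-check the figures in the summary table.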
The data set includes both numeric variables (e.g., number of arrests, incident impact score, response time) and categorical variables (e.g., region, complaint type). Each of these will be discussed in turn.
3.0.1 Numeric Variables
- number_of_arrests
This is a discrete variable representing how many arrests were made in connection with each complaint. Arrests range from 0 to 12, with a mean of 6.16 and a median of 6 per complaint. The standard deviation of 2.11 and the interquartile range of 5 to 7 indicate that most complaints lead to a handful of arrests, while a few result in considerably more, possibly reflecting larger or more serious incidents.
- incident_impact_score
A continuous variable reflecting the severity or seriousness of the incident, measured on a numeric scale. The mean score is 6.69 (median 6.70) with a standard deviation of 1.15, so variability across complaints is relatively low. Most incidents fall within a similar impact range (the middle half lies between 6.0 and 7.4), although scores extend from 2.9 to 9.4, and the highest values may indicate highly severe cases.
- response_time
A continuous variable representing the time taken to respond to an incident (measured in minutes). Response times vary greatly: the summary table shows a minimum of 6.82 and a maximum of 80.59, with a mean of 23.35 and a median of 21.09. Regionally, responses range from around 11 minutes (Western Cape) to over 60 minutes (Northern Cape). This wide spread suggests disparities in policing efficiency between regions, and faster responses appear to be associated with higher arrest rates.
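Skewness provides a numerical summary of the shape of a numeric variable's distribution: values near zero indicate approximate symmetry, positive values a longer right tail, and negative values a longer left tail. In this data, number_of_arrests (0.20) and incident_impact_score (-0.23) are close to symmetric, whereas response_time (1.93) is strongly right-skewed, consistent with its mean (23.35) exceeding its median (21.09). The skewness of the numeric variables in the crime data is summarised below.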
| number_of_arrests_skewness | incident_impact_score_skewness | response_time_skewness |
|---|---|---|
| 0.2033865 | -0.2263861 | 1.933137 |
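For readers who want to see how such a skewness figure is obtained, the following Python sketch computes the moment-based coefficient, the third central moment divided by the second central moment raised to the power 1.5. The report does not state which formula or package produced the values above, so this choice is an assumption; bias-corrected variants give slightly different numbers. The input is number_of_arrests from the six sample rows only, so the result will not equal the full-data value.

```python
def skewness(values):
    """Moment-based sample skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# number_of_arrests from the six sample rows at the top of the report.
arrests = [4, 5, 7, 5, 9, 7]
arrests_skew = skewness(arrests)

# A perfectly symmetric sample has zero skewness.
symmetric_skew = skewness([1, 2, 3, 4, 5])

print(round(arrests_skew, 3), symmetric_skew)
```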
4 Statistical Inference
4.1 Estimation
Use an estimation method to fit a distribution to a discrete numerical variable.
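One possible approach, sketched here in Python, is to fit a Poisson distribution to number_of_arrests by maximum likelihood, where the estimated rate equals the sample mean. The choice of a Poisson model is an assumption made for illustration, not the report's stated method, and the input is limited to the six sample rows. The summary table reports a variance of about 4.45 (2.11 squared) against a mean of 6.16, whereas a Poisson model assumes the two are equal, so the fit should be checked before it is relied on.

```python
import math

# number_of_arrests from the six sample rows at the top of the report.
arrests = [4, 5, 7, 5, 9, 7]

# The maximum likelihood estimate of the Poisson rate is the sample mean.
lambda_hat = sum(arrests) / len(arrests)


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Expected number of cases for each arrest count from 0 to 12,
# the range reported in the summary table.
expected_counts = {
    k: len(arrests) * poisson_pmf(k, lambda_hat) for k in range(13)
}

print(f"lambda_hat = {lambda_hat:.3f}")
for k, expected in expected_counts.items():
    print(k, round(expected, 3))
```

The expected counts can then be set against the observed frequencies, which is the comparison a goodness-of-fit test formalises.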
4.1.1 Goodness-of-fit Test
Provide a plot comparing the observed data with the fitted distribution. Perform a goodness-of-fit test.
4.2 Interval Estimation
Provide a confidence interval.
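As an illustration, a 95% confidence interval for the mean response time can be sketched in Python from the values in the summary table (n = 232, mean 23.35, standard deviation 13.07). The sketch assumes a large-sample normal approximation, which is a reasonable but unverified choice here given the right skew of response_time; the 95% level is also an assumption.

```python
import math
from statistics import NormalDist

# response_time summary values taken from the numeric summary table.
n = 232
mean = 23.35
sd = 13.07

confidence = 0.95  # assumed confidence level
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
standard_error = sd / math.sqrt(n)
margin = z * standard_error
lower, upper = mean - margin, mean + margin

print(f"{confidence:.0%} CI for mean response time: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval runs from roughly 21.7 to 25.0 minutes.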
4.3 Hypothesis testing
Perform a hypothesis test of your choosing.
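As one example of what such a test could look like, the Python sketch below runs a two-sided large-sample z-test on the mean response time using the values in the summary table. The null value of 20 minutes is a hypothetical benchmark chosen purely for illustration and does not come from the report, and the function name `one_sample_z_test` is likewise an assumption.

```python
import math
from statistics import NormalDist


def one_sample_z_test(mean, sd, n, null_mean):
    """Two-sided large-sample z-test of H0: population mean == null_mean."""
    standard_error = sd / math.sqrt(n)
    z = (mean - null_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# response_time summary values from the numeric summary table.
# HYPOTHETICAL: the 20-minute benchmark is illustrative, not from the report.
z_stat, p_value = one_sample_z_test(mean=23.35, sd=13.07, n=232, null_mean=20.0)

print(f"z = {z_stat:.2f}, p = {p_value:.5f}")
```

A small p-value would indicate that the mean response time differs from the chosen benchmark; with a different benchmark the conclusion could change.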
5 Summary of Insights
Summarise key findings from your EDA and statistical analysis.
Note: no new information need be presented here, it is a summary of insights already noted in previous sections.
6 Conclusion
Conclude report.
6.1 Recommendations
What would you recommend based on your findings?