Predicting Crime Rates from Income

Analysing the relationship between average weekly earnings and crime statistics in Australia

Amanjeet Singh Randhawa s3869950

25th October 2020

Introduction

Crime is a multifaceted phenomenon and remains a constant topic of discussion.

Exploring the factors influencing the crime rate at any given time could provide valuable information for important policy decisions.

Here we explore the relationship between average weekly earnings and the crime rate in Australia for the period 2010-2019.

Problem Statement

Can we predict the crime rate using earnings data?

Using simple linear regression and correlation, we will examine the strength of the relationship between crime rate and earnings.
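For reference, the two quantities used throughout can be written as follows. The notation is introduced here for illustration only: \(x_i\) denotes average weekly earnings and \(y_i\) the crime rate per 100 persons for observation \(i\), with \(\bar{x}\) and \(\bar{y}\) their sample means.

\[r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}\]

\[y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\]

Pearson's \(r\) measures the strength of the linear association, while the slope \(\beta_1\) gives the expected change in crime rate for each additional dollar of weekly earnings.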

Data Collection

Crime [2], wage [3] and population [4] data were collected from the Australian Bureau of Statistics website on 24th October 2020. After importing into R, the data were tidied and reshaped into a format appropriate for statistical analysis. This included:

Data Characterization

The final dataset contained 6 variables:

Visualising the data

# Translucent fill colours and darker median colours, one per box
c3 <- rainbow(10, alpha = 0.2)
c2 <- rainbow(10, v = 0.7)
# Distribution of crime rate per 100 persons within each state
boxplot(Crime_percapita ~ State, data = Combined_mutated,
        names = c("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"),
        main = "Australian Crime rate per 100 persons by State",
        ylab = "Crime rate per 100 persons",
        xlab = "State", col = c3, outcol = c3, medcol = c2)

Visualising cont.

library(dplyr)    # provides %>%
library(ggplot2)
# Weekly earnings over time, one line per state
Earnings_plot <- Combined_mutated %>% ggplot(aes(x = Year, y = Earnings, color = State)) +
  geom_point() + geom_line(aes(group = State)) +
  labs(title = "Australian Average Weekly Earnings by Year")
Earnings_plot1 <- Earnings_plot + labs(y = "Earnings (AUD)", x = "Year")

Earnings_plot1

Crime rate Descriptive Statistics

Combined_mutated %>%
  group_by(State) %>%
  summarise(Min    = min(Crime_percapita, na.rm = TRUE),
            Median = median(Crime_percapita, na.rm = TRUE),
            Max    = max(Crime_percapita, na.rm = TRUE),
            Mean   = mean(Crime_percapita, na.rm = TRUE),
            SD     = sd(Crime_percapita, na.rm = TRUE))

Hypothesis Testing

plot(Combined_mutated$Earnings, Combined_mutated$Crime_percapita, 
     main = "Crime rate by Average weekly earnings",
     xlab = "Average Weekly Earnings(AUD)", ylab = "Crime rate per 100 persons")
abline(lm(Combined_mutated$Crime_percapita~Combined_mutated$Earnings), col = "red")

Pearson’s r correlation

r <- cor(Combined_mutated$Earnings, Combined_mutated$Crime_percapita)
CIr(r = r, n = 80, level = .95)
## [1] 0.2379259 0.5975268
bivariate<-as.matrix(dplyr::select(Combined_mutated, Earnings,Crime_percapita))
cor(bivariate)
##                  Earnings Crime_percapita
## Earnings        1.0000000       0.4349074
## Crime_percapita 0.4349074       1.0000000

From the plot we observe a positive linear relationship between crime rate and earnings. Furthermore, the positive correlation was statistically significant, \(r = .43, p<.001\), 95% CI [.24, .60].
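The output above reports the confidence interval but not the test behind the stated p-value. As a minimal sketch, both can be recovered from the reported \(r\) and \(n\) alone. It assumes the usual t-test for a correlation and a Fisher z interval, which reproduces the interval printed by CIr(). The variable names are illustrative.

```r
# Reported values from the output above
r <- 0.4349074
n <- 80

# t statistic and two-sided p-value for H0: rho = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)
signif(p_value, 3)
round(ci, 4)
```

When the raw data are available, cor.test(Combined_mutated$Earnings, Combined_mutated$Crime_percapita) returns the same statistic, p-value and interval in one call.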

We can therefore continue with linear regression.

Linear Regression

Earningscrimemodel <- lm(Crime_percapita ~ Earnings, data = Combined_mutated)
plot(Earningscrimemodel)

From these plots we can confirm:

It is therefore safe to continue with linear regression.
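Calling plot() on an lm object produces four diagnostic plots, which are easier to compare in a 2 x 2 grid. The sketch below uses R's built-in cars dataset as a stand-in, because Combined_mutated is not reproduced here. The Shapiro-Wilk test is an additional numeric check that we assume for illustration; it was not part of the original analysis.

```r
# Stand-in model on a built-in dataset (NOT the crime data)
fit <- lm(dist ~ speed, data = cars)

# Show the four default diagnostic plots in one 2 x 2 panel
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout

# Optional numeric check of residual normality (illustrative addition)
shapiro_result <- shapiro.test(residuals(fit))
shapiro_result$p.value
```

Replacing fit with Earningscrimemodel applies the same layout to the model fitted above.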

Hypothesis Testing Cont.

\[H_0: \text{Crime rate and earnings data do not fit the linear regression model}\]

\[H_A: \text{Crime rate and earnings data fit the linear regression model}\]

## 
## Call:
## lm(formula = Crime_percapita ~ Earnings, data = Combined_mutated)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.16349 -0.94573 -0.03454  0.86428  2.98666 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.2830818  1.0123460   0.280    0.781    
## Earnings    0.0035971  0.0008433   4.266 5.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.17 on 78 degrees of freedom
## Multiple R-squared:  0.1891, Adjusted R-squared:  0.1787 
## F-statistic: 18.19 on 1 and 78 DF,  p-value: 5.547e-05

From the summary, we can confirm \(p<.001\); therefore, we reject \(H_0\). Thus, there was statistically significant evidence that the data fit a linear regression model.
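To make the fitted coefficients concrete, the sketch below rebuilds the regression line from the estimates in the summary. The helper predict_crime_rate() and the earnings value of 1000 AUD are illustrative assumptions, not part of the original analysis.

```r
# Coefficient estimates copied from the model summary above
intercept <- 0.2830818
slope     <- 0.0035971

# Hypothetical helper: predicted crimes per 100 persons for given weekly earnings (AUD)
predict_crime_rate <- function(earnings) {
  intercept + slope * earnings
}

# Expected change in crime rate for each additional 100 AUD of weekly earnings
change_per_100 <- slope * 100

change_per_100
predict_crime_rate(1000)  # illustrative earnings value
```

On these estimates, each additional 100 AUD in average weekly earnings is associated with roughly 0.36 more recorded crimes per 100 persons. With the fitted model object available, predict(Earningscrimemodel, newdata = data.frame(Earnings = 1000)) gives the same kind of prediction.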

Conclusion

After calculating a Pearson’s correlation coefficient, we found a positive linear relationship between crime rate and earnings. Furthermore, the positive correlation was statistically significant, \(r = .43, p<.001\), 95% CI [.24, .60].

After assessing the bivariate relationship between crime rate and average weekly earnings, there was evidence of a positive linear relationship. The regression model was statistically significant, \(F(1, 78) = 18.19\), \(p<.001\), and explained 18.9% of the variability in crime rates, \(R^2 = 0.189\).
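As a consistency check, in simple linear regression with a single predictor the coefficient of determination equals the squared correlation, and the F statistic equals the squared t statistic of the slope. Using the values reported in the outputs above:

\[R^2 = r^2 = 0.4349^2 \approx 0.189\]

\[F = t^2 = \left(\frac{0.0035971}{0.0008433}\right)^2 \approx 4.2655^2 \approx 18.19\]

Both agree with the model summary, so the correlation and regression results describe the same relationship.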

Discussion

As average weekly earnings increased, so did crime rates. This is contrary to the expectation that crime rate would decrease as wages went up.

Strengths of the study include the recording methods used, which yielded large sample sizes for each measure. Furthermore, as the data were collected by government agencies, they likely represent the population accurately.

Limitations include the fact that different states may define crimes differently, which would affect the number of crimes recorded. Furthermore, as assault cases were omitted, the results may vary if assault rates differ by state.

Using average weekly earnings data introduces further limitations. Earnings may be skewed by the composition of the workforce in any given recorded period. Additionally, the presence of a wealth gap would likely skew the findings.

In future analysis, controlling for the distribution of earnings as well as crime type would help elucidate the quality and strength of the relationship. Furthermore, exploring the impact of additional predictor variables would provide a more complete picture of the crime rate.

References

[1] https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
[2] https://www.abs.gov.au/statistics/people/crime-and-justice/recorded-crime-victims-australia/latest-release
[3] https://www.abs.gov.au/methodologies/average-weekly-earnings-australia-methodology/may-2020#glossary
[4] http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ERP_QUARTERLY#