Predicting Crime Rates from Income

Analysing the relationship between average weekly earnings and crime statistics in Australia

Amanjeet Singh Randhawa s3869950

25th October 2020

RPubs link information

An online presentation of the findings can be found here:

Introduction

Crime is a multifaceted phenomenon and remains a constant topic of discussion.

Exploring the factors influencing the crime rate at any given time could provide valuable information for important policy decisions.

Here we will explore the relationship between average weekly earnings and the crime rate in Australia for the period 2010-2019.

Problem Statement

Can we predict the crime rate using earnings data?

Using simple linear regression and correlation, we will examine the strength of the relationship between crime rate and earnings.

Data Collection

Crime [2], wage [3] and population [4] data were collected from the Australian Bureau of Statistics website on 24th October 2020. After importing into R, the data were tidied and manipulated to produce a format appropriate for statistical analysis. This included:

Removing empty or superfluous observations.
Removing assault cases for all states as these statistics were not reported for Victoria.
Producing a format in line with ‘Tidy Data Principles’[1].
Scanning for missing or special values.
Combining observations across months into yearly averages.
Merging tables and producing a variable for crime rate.
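
The averaging, merging and crime rate steps above can be sketched in base R as follows. This is a minimal illustration only: the data frames, column layouts and values are hypothetical stand-ins, not the ABS data or the code used for this analysis.

```r
# Hypothetical within-year earnings observations for two states (illustrative values only)
earnings_periods <- data.frame(
  State    = rep(c("A", "B"), each = 4),
  Year     = rep(c(2010, 2010, 2011, 2011), times = 2),
  Earnings = c(1000, 1020, 1050, 1070, 900, 910, 940, 960)
)

# Hypothetical yearly crime counts and populations
crime_yearly <- data.frame(
  State        = rep(c("A", "B"), each = 2),
  Year         = rep(c(2010, 2011), times = 2),
  Total_crimes = c(5000, 5200, 3000, 2900),
  Population   = c(100000, 102000, 80000, 81000)
)

# Combine the observations within each state and year into a yearly average
earnings_yearly <- aggregate(Earnings ~ State + Year,
                             data = earnings_periods, FUN = mean)

# Merge the tables on their shared keys
combined <- merge(crime_yearly, earnings_yearly, by = c("State", "Year"))

# Crime rate per 100 persons
combined$Crime_percapita <- combined$Total_crimes / combined$Population * 100

# Scan the merged table for missing values
stopifnot(!anyNA(combined))
print(combined)
```

Merging on both State and Year keeps one row per state-year pair, which is the unit of observation used in the rest of the analysis.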

Data Characterization

The final dataset contained 6 variables:

State: The eight states and territories of Australia. A factor variable with eight levels.
Year: A factor variable of 10 levels representing the observation years from 2010-2019.
Total crimes: The sum of reported crimes excluding assaults.
Earnings: The average weekly earnings recorded.
Population: The number of persons normally residing in each state or territory at the time of recording.
Crime per capita: A numerical variable representing the crime rate per 100 persons. Calculated from the total crimes and population variables.

Visualising the data

# Translucent fill colours and darker median lines for the boxes
c3 <- rainbow(10, alpha = 0.2)
c2 <- rainbow(10, v = 0.7)
# Crime rate by state; names must be in the same order as the levels of State
boxplot(Crime_percapita ~ State, data = Combined_mutated,
        names = c("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"),
        main = "Australian Crime rate per 100 persons by State",
        ylab = "Crime rate per 100 persons",
        xlab = "State", col = c3, outcol = c3, medcol = c2)

From the plot we observe an outlier for ACT. Upon further investigation, this corresponds to the observation for 2010. As this could indicate an important feature of the data, it is included in the final dataset.
The Northern Territory and Western Australia have the highest median crime rates, while Tasmania has the lowest.

Visualising cont.

library(dplyr)    # provides %>%
library(ggplot2)  # provides ggplot() and the geoms

# Average weekly earnings by year, one line per state
Earnings_plot <- Combined_mutated %>% ggplot(aes(x = Year, y = Earnings, color = State)) +
  geom_point() + geom_line(aes(group = State)) +
  labs(title = "Australian Average Weekly Earnings by Year")
Earnings_plot1 <- Earnings_plot + labs(y = "Earnings (AUD)", x = "Year")

Earnings_plot1

Average weekly earnings increased in all states at a fairly constant rate.
The Australian Capital Territory and Northern Territory had the greatest average weekly earnings, while Tasmania consistently had the lowest.

Crime rate Descriptive Statistics

# Summary statistics of the crime rate for each state
Combined_mutated %>%
  group_by(State) %>%
  summarise(Min    = min(Crime_percapita, na.rm = TRUE),
            Median = median(Crime_percapita, na.rm = TRUE),
            Max    = max(Crime_percapita, na.rm = TRUE),
            Mean   = mean(Crime_percapita, na.rm = TRUE),
            SD     = sd(Crime_percapita, na.rm = TRUE))

Hypothesis Testing

plot(Combined_mutated$Earnings, Combined_mutated$Crime_percapita, 
     main = "Crime rate by Average weekly earnings",
     xlab = "Average Weekly Earnings(AUD)", ylab = "Crime rate per 100 persons")
abline(lm(Combined_mutated$Crime_percapita~Combined_mutated$Earnings), col = "red")

Pearson’s r correlation

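library(psychometric)  # provides CIr()

# Pearson's r and its 95% confidence interval (n = 80 state-year observations)
r <- cor(Combined_mutated$Earnings, Combined_mutated$Crime_percapita)
CIr(r = r, n = 80, level = .95)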

## [1] 0.2379259 0.5975268

bivariate<-as.matrix(dplyr::select(Combined_mutated, Earnings,Crime_percapita))
cor(bivariate)

##                  Earnings Crime_percapita
## Earnings        1.0000000       0.4349074
## Crime_percapita 0.4349074       1.0000000

From the plot we observe a positive linear relationship between crime rate and earnings. The correlation was moderate in strength and statistically significant, \(r = .43, p<.001\), 95% CI [.24, .60].
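
The interval and the significance of r can be checked by hand from r and n alone. The sketch below assumes the interval is based on the Fisher z-transformation, which appears consistent with the values printed above, and tests \(H_0: \rho = 0\) with a t statistic on n - 2 degrees of freedom. The variable names are illustrative.

```r
# Values reported above; n = 80 is eight states times ten years
r <- 0.4349074
n <- 80

# Fisher z-transformation: atanh(r) is approximately normal with SE 1 / sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# t-test of H0: rho = 0 on n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(round(ci, 2))      # 95% confidence interval for r
print(round(t_stat, 2))  # test statistic
print(p_value < 0.001)   # TRUE if significant at the .001 level
```

Because the model has a single predictor, this t statistic is the same one that a simple linear regression of crime rate on earnings reports for the slope.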

We can therefore continue with linear regression.

Linear Regression

Earningscrimemodel <- lm(Crime_percapita ~ Earnings, data = Combined_mutated)
plot(Earningscrimemodel)

From these plots we can confirm:

Independence, on the assumption that the data recordings for each state were independent.
Linearity from the scatter plot and from the plot of residuals vs fitted, as the relationship is relatively flat.
Normality of residuals from the normal Q-Q plot.
Constant variance from the scale-location plot.

It is therefore safe to continue with linear regression.
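
To show which diagnostic plot speaks to which assumption, the plots can be requested one at a time through the `which` argument of `plot()`. The sketch below uses R's built-in `cars` data purely as a stand-in; it is not the crime data, and `fit` is an illustrative name.

```r
# Stand-in model on R's built-in `cars` data (NOT the crime data)
fit <- lm(dist ~ speed, data = cars)

# Lay the four diagnostic plots out in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit, which = 1)  # Residuals vs Fitted: linearity (trend line roughly flat)
plot(fit, which = 2)  # Normal Q-Q: normality of residuals (points near the line)
plot(fit, which = 3)  # Scale-Location: constant variance (even vertical spread)
plot(fit, which = 5)  # Residuals vs Leverage: influential observations
par(op)               # restore the previous plotting layout
```

Calling `plot()` on a fitted model without `which` draws these same four plots in sequence.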

Hypothesis Testing Cont.

\[H_0: \text{Crime rate and earnings data do not fit the linear regression model}\]

\[H_A: \text{Crime rate and earnings data fit the linear regression model}\]

## 
## Call:
## lm(formula = Crime_percapita ~ Earnings, data = Combined_mutated)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.16349 -0.94573 -0.03454  0.86428  2.98666 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.2830818  1.0123460   0.280    0.781    
## Earnings    0.0035971  0.0008433   4.266 5.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.17 on 78 degrees of freedom
## Multiple R-squared:  0.1891, Adjusted R-squared:  0.1787 
## F-statistic: 18.19 on 1 and 78 DF,  p-value: 5.547e-05

From the summary, we can confirm \(p<.001\); therefore we reject \(H_0\). Thus, there was statistically significant evidence that the data fit a linear regression model.
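
The quantities in the summary are linked, and the fitted line can be read directly from the coefficients. With one predictor, the overall F-test is equivalent to testing whether the slope is zero, so the F statistic is the square of the slope's t statistic. The sketch below works only from the rounded values printed in the summary; the earnings figure of 1200 AUD is a hypothetical input chosen for illustration.

```r
# Coefficients as reported in the summary output
intercept <- 0.2830818
slope     <- 0.0035971
slope_se  <- 0.0008433

# With a single predictor, the slope's t statistic squared equals the F statistic
t_stat <- slope / slope_se
f_stat <- t_stat^2

# p-value of the F-test on 1 and 78 degrees of freedom
p_value <- pf(f_stat, df1 = 1, df2 = 78, lower.tail = FALSE)

# Expected change in crime rate (per 100 persons) for a 100 AUD rise in weekly earnings
change_per_100 <- slope * 100

# Fitted crime rate at hypothetical weekly earnings of 1200 AUD
predicted <- intercept + slope * 1200

print(c(t = t_stat, F = f_stat, p = p_value))
print(c(change_per_100 = change_per_100, predicted = predicted))
```

Read this way, the slope says that each additional 100 AUD in average weekly earnings is associated with roughly 0.36 more recorded crimes per 100 persons, within the range of earnings observed.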

Conclusion

After calculating a Pearson’s correlation coefficient, we found a positive linear relationship between crime rate and earnings. The correlation was statistically significant, \(r = .43, p<.001\), 95% CI [.24, .60].

After assessing the bivariate relationship between crime rate and average weekly earnings, there was evidence of a positive linear relationship. The regression model was statistically significant, \(F(1, 78) = 18.19\), \(p<.001\), and explained 18.9% of the variability in crime rates, \(R^2 = 0.189\).

Discussion

As average weekly earnings increased, so did crime rates. This is contrary to the expectation that crime rate would decrease as wages went up.

Strengths of the study include the recording methods used, which draw on large sample sizes for each measure. Furthermore, as the data were collected by Government agencies, they likely provide an accurate picture of the population.

Limitations include the fact that different states may define crimes differently, which would affect the number of crimes recorded. Furthermore, because assault cases were omitted, the results may be biased if assault rates differ between states.

Using average weekly earnings introduces further limitations. Average earnings may be skewed by the composition of the workforce in any given recording period, and the presence of a wealth gap would likely distort any findings based on averages.

In future analysis, controlling for the distribution of earnings as well as crime type would help elucidate the quality and strength of the relationship. Furthermore, exploring the impact of additional predictor variables would provide a more complete picture of the crime rate.

References

[1] https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
[2] https://www.abs.gov.au/statistics/people/crime-and-justice/recorded-crime-victims-australia/latest-release
[3] https://www.abs.gov.au/methodologies/average-weekly-earnings-australia-methodology/may-2020#glossary
[4] http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ERP_QUARTERLY#