COVID-19 and Crime in Western Australia
Did the lockdown associated with COVID-19 have a statistically significant impact on burglaries in Western Australia?
S3866666 Christopher Gallen
Last updated: 25 October, 2020
Data - How is the data distributed?
- The histogram of the monthly burglary totals shows an approximately normal distribution, with some possible outliers at the lower end
# Plot histogram of burglaries with a normal distribution overlay
hist(crime$`Burglary Total`, main="Distribution of Burglaries", xlab="Burglaries", freq=FALSE)
curve(dnorm(x, mean=mean(crime$`Burglary Total`), sd=sd(crime$`Burglary Total`)),
col="red", lwd=2, n=100, add=TRUE)
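A normal Q-Q plot is one complementary way to check the approximate normality seen in the histogram. Points lying close to the reference line support normality, and low outliers show up as points well below the line at the left tail. The sketch below uses only base R's `qqnorm()` and `qqline()`. The helper name `qq_check()` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: normal Q-Q plot with a reference line through the quartiles
qq_check <- function(values, main = "Normal Q-Q Plot of Burglaries") {
  qq <- qqnorm(values, main = main, ylab = "Burglaries")
  qqline(values, col = "red", lwd = 2)
  # Return theoretical (x) and sample (y) quantiles invisibly for inspection
  invisible(qq)
}

# Usage with the slide's data (assumes `crime` is loaded as above):
# qq_check(crime$`Burglary Total`)
```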

Data - Variables
- The total number of burglaries and the combined month-and-year variable were extracted from the complete dataset
- New variables were then created to store the month and year separately
- The ‘Burglary Total’ and ‘Year’ variables are numeric, while the ‘Month’ variable is an ordered factor variable with 12 levels.
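# Load packages providing mutate() and year()/month()
library(dplyr)
library(lubridate)
# Keep only `Month and Year` (column 1) and `Burglary Total` (column 33)
crime <- crime[, c(1, 33)]
# Create new variables for year (numeric) and month (ordered factor, full names)
crime <- mutate(crime, Year = year(`Month and Year`))
crime <- mutate(crime, Month = month(`Month and Year`, label = TRUE, abbr = FALSE))
# Drop the combined date column
crime <- crime[, -1]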
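For readers without lubridate, the same two variables can be derived in base R. This is a sketch assuming the `Month and Year` column is a `Date` (or can be coerced with `as.Date()`). The helper name `add_year_month()` and the toy rows are hypothetical. Using the built-in `month.name` constant as the factor levels keeps all 12 months in calendar order even if some months are missing from the data, which mirrors the ordered factor that `month(label = TRUE, abbr = FALSE)` returns (lubridate's labels, unlike `month.name`, follow the current locale).

```r
# Hypothetical base-R helper mirroring year() and month(label = TRUE, abbr = FALSE)
add_year_month <- function(df, date_col = "Month and Year") {
  dates <- as.Date(df[[date_col]])
  df$Year <- as.numeric(format(dates, "%Y"))           # numeric year, e.g. 2020
  month_num <- as.integer(format(dates, "%m"))         # month number 1 to 12
  df$Month <- factor(month.name[month_num],
                     levels = month.name, ordered = TRUE)  # ordered, 12 levels
  df
}

# Toy example (hypothetical values, not the WA data)
toy <- data.frame(`Month and Year` = as.Date(c("2020-03-01", "2020-04-01")),
                  `Burglary Total` = c(2900, 1800),
                  check.names = FALSE)
toy <- add_year_month(toy)
str(toy)
```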
Data - When did these outliers occur?
- Given that the burglary data approximates a normal distribution, the normal (z) scores method was used to identify the three outliers, flagging months whose z-score exceeded 3 in absolute value.
- These outliers were found to be from April, May and June 2020 and were separated from the remaining dataset.
# Identify outliers using normal (z) scores from the outliers package
library(outliers)
z.scores <- crime$`Burglary Total` %>% scores(type="z")
z.scores %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.30369 -0.59174 0.09832 0.00000 0.61522 2.00945
# Show the months with |z| > 3
crime[which(abs(z.scores)>3),]
# Extract outliers and save as "covid"
covid <- crime[c(which(abs(z.scores)>3)),]
# Save remainder as "preCovid"
preCovid <- crime[-c(which(abs(z.scores)>3)),]
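The z-score rule can also be written out directly in base R, which makes the |z| > 3 threshold explicit. This sketch assumes `scores(type = "z")` computes (x − mean) / sd, as the outliers package documents for normal scores. The helper name `split_by_z()` and the toy data are hypothetical. It splits with a logical index rather than `-c(which(...))`: negative indexing with an empty `which()` result selects zero rows, so if no outliers were found the original approach would silently leave `preCovid` empty.

```r
# Hypothetical helper: split a data frame by the |z| > threshold rule
split_by_z <- function(df, col, threshold = 3) {
  x <- df[[col]]
  z <- (x - mean(x)) / sd(x)            # standard (normal) scores
  flagged <- abs(z) > threshold
  list(outliers  = df[flagged, , drop = FALSE],
       remainder = df[!flagged, , drop = FALSE],
       z         = z)
}

# Toy data: 20 typical months and one unusually low month (hypothetical values)
toy <- data.frame(`Burglary Total` = c(rep(c(2950, 3050), 10), 1500),
                  check.names = FALSE)
parts <- split_by_z(toy, "Burglary Total")
nrow(parts$outliers)   # 1
nrow(parts$remainder)  # 20
```

With no flagged rows, `remainder` keeps every row instead of none.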
Descriptive Statistics and Visualisation
- The histogram below shows the burglary data with the outliers removed
hist(preCovid$`Burglary Total`, main="Distribution of Burglaries", xlab="Burglaries", freq=FALSE)
# Fit the normal curve to the same (outlier-free) data that is plotted
curve(dnorm(x, mean=mean(preCovid$`Burglary Total`), sd=sd(preCovid$`Burglary Total`)),
col="red", lwd=2, n=100, add=TRUE)
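Because the overlay's mean and standard deviation must come from the same vector that is plotted, a small wrapper removes the chance of mixing two data sets in one call. This is a sketch using only base R graphics (`hist()`, `curve()`, `dnorm()`). The name `hist_with_normal()` is hypothetical. Inside `curve()`, `x` refers to the grid of plotting points, so the data argument is named `values` to avoid confusion.

```r
# Hypothetical helper: histogram with a normal curve fitted to the same data
hist_with_normal <- function(values, main = "Distribution of Burglaries",
                             xlab = "Burglaries") {
  m <- mean(values)
  s <- sd(values)
  hist(values, main = main, xlab = xlab, freq = FALSE)
  # `x` here is curve()'s own grid; m and s come from this function's scope
  curve(dnorm(x, mean = m, sd = s), col = "red", lwd = 2, n = 100, add = TRUE)
  invisible(c(mean = m, sd = s))
}

# Usage with the slide's data (assumes `preCovid` exists as above):
# hist_with_normal(preCovid$`Burglary Total`)
```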

Hypothesis Testing Cont.
- Although the sample is small (only three months), the 99.9999% confidence interval for the mean monthly number of burglaries from January 2007 to March 2020 (2793.2 to 3054.7) does not capture the mean of the April to June 2020 months, nor any of those three months individually. This suggests there is a statistically significant difference between the number of burglaries recorded during April to June 2020 and during January 2007 to March 2020.
# Apply t-test using a confidence interval of 99.9999%
t.test(preCovid$`Burglary Total`, conf.level = 0.999999)
##
## One Sample t-test
##
## data: preCovid$`Burglary Total`
## t = 113.84, df = 158, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 99.9999 percent confidence interval:
## 2793.176 3054.711
## sample estimates:
## mean of x
## 2923.943
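The comparison described in the bullet above can be made explicit in code: compute the confidence interval for the pre-COVID mean and check whether each April to June 2020 value falls outside it. This sketch takes the interval from `t.test()$conf.int` and cross-checks it against the textbook formula mean ± t(1 − α/2, n − 1) · s / √n. The helper name `outside_ci()` and the toy numbers are hypothetical. Note that this interval describes uncertainty in the pre-COVID mean; it is narrower than the spread expected for individual months.

```r
# Hypothetical helper: which new values fall outside a t-based CI for the mean of x?
outside_ci <- function(x, new_values, conf.level = 0.999999) {
  ci <- t.test(x, conf.level = conf.level)$conf.int
  # Cross-check: mean +/- t quantile * standard error
  n <- length(x)
  half_width <- qt(1 - (1 - conf.level) / 2, df = n - 1) * sd(x) / sqrt(n)
  manual <- mean(x) + c(-1, 1) * half_width
  stopifnot(isTRUE(all.equal(as.numeric(ci), manual)))
  list(ci = as.numeric(ci),
       outside = new_values < ci[1] | new_values > ci[2])
}

# Toy example (hypothetical values): three new months against a stable baseline
baseline <- rep(c(2950, 3050), 10)
res <- outside_ci(baseline, c(1900, 2000, 3000))
res$outside  # TRUE TRUE FALSE
```

Applied to the slide's data, `outside_ci(preCovid$`Burglary Total`, covid$`Burglary Total`)` would return the interval shown in the output above together with one flag per month.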