Introduction

Since the outbreak of COVID-19 at the beginning of 2020, Victoria has experienced six lockdowns.
The first began on 30 March 2020 and the last ended on 21 October 2021, totalling 263 days.
The underlying logic of this report is that being forced to stay at home for long periods may cause more mental health issues and increase conflict between family members.
Another issue discussed is whether the impact of the pandemic differs between local government areas (LGAs).
This report uses Victoria Police records of family incidents to examine whether the pandemic has affected how often they occur.
The data cover the period from 2017 to 2021, with each year's figures collected at the end of June.
Based on the timeline of Victoria's lockdowns, the 2018 and 2021 data are chosen for comparison.

Problem Statement

The question of this report is whether the number of family incidents increased after the lockdowns.
Descriptive statistics and visualisations will be used to analyse the difference between the pre- and post-pandemic periods.
A t-test will be conducted to check whether the difference is statistically significant.

Figure. Source: Department of Health

Data

Searching for the term “family incident Victoria” on Google leads to the information needed for this report.
The data were obtained from the Crime Statistics Agency, Victoria State Government.
Source: https://www.crimestatistics.vic.gov.au/crime-statistics/latest-victorian-crime-data/family-incidents-2

Data Cont.

The dataset contains 79 observations and six variables, and the column name “lga” is short for “local government area”.
Variables:
- “2017” to “2021” simply stand for years;
- “region” is the region in which the local government area is located; it has four levels, but it is not relevant to the analysis, so no further preprocessing is applied to this variable.
The data are loaded and displayed below.

# read_xlsx() comes from the readxl package (not readr); dplyr provides %>%
library(readxl)
library(dplyr)
family <- read_xlsx("/Users/noalgreen/Desktop/familyincidents.xlsx")
family %>% head(3)

Data Cont. Subsetting

Subset the data into a new data set that only includes 2018 and 2021 for further analysis.

library(tidyr)  # provides pivot_longer()
# keep the LGA name and the 2018 and 2021 counts
familynew <- family %>% select(lga, `2018`, `2021`)
# convert the data to long format for further analysis
familynew2 <- familynew %>% pivot_longer(cols = c(`2018`, `2021`), names_to = "year", values_to = "case_counting")
familynew2 %>% head(5)
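For readers unfamiliar with pivot_longer(), the base-R sketch below reproduces the same wide-to-long reshape on hypothetical toy values (three made-up areas, not Crime Statistics Agency figures), so it runs without the Excel file. Each LGA ends up with one row per year, in the same row order pivot_longer() produces.

```r
# Hypothetical toy data: made-up areas and counts, NOT taken from the CSA dataset
wide <- data.frame(
  lga = c("Area A", "Area B", "Area C"),
  `2018` = c(100, 250, 40),
  `2021` = c(120, 300, 35),
  check.names = FALSE,      # keep the year columns named "2018" and "2021"
  stringsAsFactors = FALSE
)

year_cols <- c("2018", "2021")

# Base-R equivalent of pivot_longer(cols = c(`2018`, `2021`),
# names_to = "year", values_to = "case_counting")
long <- data.frame(
  lga = rep(wide$lga, each = length(year_cols)),
  year = rep(year_cols, times = nrow(wide)),
  # t() turns the 3 x 2 count matrix into 2 x 3, so reading it column by column
  # gives Area A 2018, Area A 2021, Area B 2018, ...
  case_counting = as.vector(t(as.matrix(wide[, year_cols]))),
  stringsAsFactors = FALSE
)

long
```

The long layout is what lets ggplot2 and t.test() treat `year` as a grouping variable, as in the boxplot and the formula `case_counting ~ year` used later.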

Descriptive Statistics and Visualisation

The boxplot below compares the distribution of family incident counts across LGAs in 2018 and 2021.
Notably, 2021 has four outliers, all extremely high, whereas 2018 has only two, and both are lower than those in 2021.
2021 also has a larger interquartile range (IQR).

library(ggplot2)
BoxPlot <- ggplot(data = familynew2, aes(x = year, y = case_counting)) + geom_boxplot(aes(fill = year))
# red points mark the group means (`fun` replaces the deprecated `fun.y`)
BoxPlot + labs(title = "Family incident counts per LGA", x = "Year", y = "Case numbers") + stat_summary(fun = mean, colour = "red", geom = "point")
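geom_boxplot() draws each box from Q1 to Q3 and flags as outliers any points more than 1.5 × IQR beyond the box (Tukey's rule). As a sketch, applying that rule to the quartiles reported in the summary table in the next section gives the cut-offs behind the outliers described above. The quartile values are copied from that table; both the table and ggplot2 use R's default (type 7) quantiles.

```r
# Quartiles copied from the summary statistics table (R's default type-7 quantiles)
q1 <- c(`2018` = 190.5, `2021` = 297.0)
q3 <- c(`2018` = 1453.5, `2021` = 1663.5)

# Interquartile range: the height of each box
iqr <- q3 - q1

# Tukey's rule used by geom_boxplot(): points beyond these fences are drawn as outliers
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

iqr
upper_fence
lower_fence
```

Both lower fences are negative. Because counts cannot be negative, every outlier in the plot is necessarily on the high side.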

Descriptive Statistics Cont.

The summary statistics are shown in the table below, and the distributions in the two years are clearly different.
The minimum is similar (14 in 2018 and 15 in 2021), but the 2021 maximum is much higher (5,487 versus 4,382); the original data show that the counts for some local government areas were stable.
All the other statistics are higher in 2021, so at this stage it is reasonable to hypothesise that the pandemic has had an impact on family incidents in Victoria.

familynew2 %>% group_by(year) %>% summarise(Min = min(case_counting,na.rm = TRUE),
                                           Q1 = quantile(case_counting,probs = .25,na.rm = TRUE),
                                           Median = median(case_counting, na.rm = TRUE),
                                           Q3 = quantile(case_counting,probs = .75,na.rm = TRUE),
                                           Max = max(case_counting,na.rm = TRUE),
                                           Mean = mean(case_counting, na.rm = TRUE),
                                           SD = sd(case_counting, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(case_counting))) -> table1
knitr::kable(table1)

year	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
2018	14	190.5	733	1453.5	4382	961.8228	930.4594	79	0
2021	15	297.0	915	1663.5	5487	1179.8481	1148.0870	79	0
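To quantify how much higher the 2021 statistics are, this sketch computes the absolute and percentage change of each statistic from 2018 to 2021. The values are copied from the table above.

```r
# Summary statistics copied from the table above
stats_2018 <- c(Min = 14, Q1 = 190.5, Median = 733, Q3 = 1453.5, Max = 4382, Mean = 961.8228)
stats_2021 <- c(Min = 15, Q1 = 297.0, Median = 915, Q3 = 1663.5, Max = 5487, Mean = 1179.8481)

abs_change <- stats_2021 - stats_2018   # change in counts
rel_change <- abs_change / stats_2018   # change as a proportion of the 2018 value

round(rbind(absolute = abs_change, percent = 100 * rel_change), 1)
```

On these figures the median rises by about 25% and the mean by about 23%.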

Hypothesis Testing

Independence assumption: the data for each year were collected independently, and the same measurement (the count of recorded family incidents) is used in both years, so the observations are treated as independent.
Hypothesis: the data of 2018 and 2021 have a linear relationship.
Linearity: the scatter plot shows a linear relationship between the 2018 and 2021 data, and the model summary reports a Multiple R-squared of 0.967, which is close to 1 and also indicates a strong linear relationship.

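# regress the 2018 counts on the 2021 counts across LGAs
model1 <- lm(familynew$`2018` ~ familynew$`2021`, data = familynew)
# plot in the same orientation as the model (x = 2021, y = 2018) so abline() draws the fitted line correctly;
# base plot() returns NULL, so plot() and abline() are separate calls rather than joined with `+`
plot(familynew$`2021`, familynew$`2018`, xlab = "2021", ylab = "2018")
abline(model1, col = "red")

model_summary <- summary(model1)
model_summary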

## 
## Call:
## lm(formula = familynew$`2018` ~ familynew$`2021`, data = familynew)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -572.26  -42.27  -12.50   65.40  566.14 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      21.51743   27.52234   0.782    0.437    
## familynew$`2021`  0.79697    0.01677  47.524   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 170 on 77 degrees of freedom
## Multiple R-squared:  0.967,  Adjusted R-squared:  0.9666 
## F-statistic:  2259 on 1 and 77 DF,  p-value: < 2.2e-16
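With a single predictor, the Multiple R-squared equals the squared Pearson correlation between the two variables, and the F-statistic equals the square of the slope's t value (here 47.524² ≈ 2258.5, consistent with the reported 2259). The sketch below checks both identities on R's built-in `cars` dataset, which serves only as a stand-in because the report's data file is not bundled here.

```r
# Built-in 'cars' data (stopping distance vs speed), used purely as a stand-in dataset
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)

r2_from_lm  <- fit_summary$r.squared          # "Multiple R-squared" in the printout
r2_from_cor <- cor(cars$dist, cars$speed)^2   # squared Pearson correlation

# For a single predictor, the overall F statistic equals the squared t value of the slope
f_from_lm <- unname(fit_summary$fstatistic["value"])
t_slope   <- coef(fit_summary)["speed", "t value"]

c(r2_from_lm = r2_from_lm, r2_from_cor = r2_from_cor)
c(F = f_from_lm, t_squared = t_slope^2)
```

Applied to the report's model, an R-squared of 0.967 therefore corresponds to a correlation of about 0.98 between the 2018 and 2021 counts.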

Hypothesis Testing Cont.

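The hypotheses for the two-sample t-test, where \(\mu_1\) and \(\mu_2\) are the mean numbers of family incidents per LGA in 2018 and 2021 respectively, are: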

\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 \ne \mu_2\]

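Because `var.equal = FALSE`, R runs Welch's version of the two-sample t-test. For reference, let \(\bar{x}_j\), \(s_j\) and \(n_j\) denote the sample mean, standard deviation and number of LGAs in year \(j\), with \(j = 1\) for 2018 and \(j = 2\) for 2021. The test statistic and its approximate degrees of freedom are:

\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]

\[\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}\]

\(H_0\) is rejected at the 5% level when \(|t|\) exceeds the 0.975 quantile of the t distribution with \(\nu\) degrees of freedom.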

t.test(
  case_counting ~ year,
  data = familynew2,
  var.equal = FALSE,
  alternative = "two.sided"
  )

## 
##  Welch Two Sample t-test
## 
## data:  case_counting by year
## t = -1.3113, df = 149.58, p-value = 0.1918
## alternative hypothesis: true difference in means between group 2018 and group 2021 is not equal to 0
## 95 percent confidence interval:
##  -546.5550  110.5044
## sample estimates:
## mean in group 2018 mean in group 2021 
##           961.8228          1179.8481

Discussion

The first step of the report used a boxplot, which showed extreme outliers in 2021 and suggested a general trend that family incidents in Victoria became more serious after the pandemic began.
Then a linear regression of the 2018 counts on the 2021 counts across local government areas was fitted; the strong linear relationship (R-squared of 0.967) suggests that the pandemic's effect was broadly consistent across LGAs.
The two-sample t-test shows a higher mean in 2021 (1,179.8) than in 2018 (961.8), in the same direction as the first step; however, its p-value of 0.1918 is above 0.05, so the difference is not statistically significant at the 5% level, and the test alone does not confirm that the pandemic made family-related incidents more serious.
The strengths of the analysis are that the objective is clear, the variables are easy to control, the data are easy to obtain, and it is relatively straightforward to reach a conclusion.
The limitations are that the sample size is limited and the data only show the trend in Victoria; in addition, lockdown periods differed around the world, so it would be much harder to control the variables if a more general trend were sought. Future investigations could expand the sample range.

Math1324 Assignment2

How does the pandemic affect family incidents in Victoria?

Introduction

Problem Statement

Data

Data Cont.

Data Cont. Subsetting

Descriptive Statistics and Visualisation

Descriptive Statistics Cont.

Hypothesis Testing

Hypothesis Testing Cont.

Discussion

References