MATH1324 Assignment 3

Fatal Crashes

Kashish Kohli-s3794337
Mohammad Nasir Uddin-s3713833
Zawar Shah-s3733102

Last updated: 27 October, 2019

Introduction

Road deaths in Australia is a big concern for the government and the people here. Sometimes due to vehicle manufacturer’s fault or sometimes due to the driver’s carelessness, people have suffered a lot. The accident type could single vehicle, multiple vehicles, or pedestrian. Everytime a road accident happens, the chances of death are 50%. That’s why the name: Fatal Crashes.

Although, the road death rate has fallen to its lowest in four years due to some strict policies, but still they are in 1000s.[1]

Problem Statement

Data

Data Cont.

Here, we have loaded the data in the dataframe. For the simplicity of this experiment, the columns which were not required were removed prior to loading of the dataset.

dataframe <- read_csv("Fatal Crashes.csv", col_types = cols(Time = col_time(format = "%H:%M")))

As we are only interested in the analysis for years 2015, 2016 and 2017, we have filtered the Years column.

filtered_data <- dataframe %>% filter(Year %in% c(2015, 2016, 2017))

In the next step, we have factored the variables in the dataset.

sampled_data <- filtered_data %>% mutate(
  Month = factor(Month,
                 levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
                 ordered = TRUE),
  State = factor(State, levels = c("NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT")),
  Dayweek = factor(Dayweek, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
                   ordered = TRUE),
  Crash_Type = factor(Crash_Type, levels = c("Single vehicle", "Multiple vehicle", "Pedestrian")),
  Christmas_Period = factor(Christmas_Period, levels = c("Yes", "No")),
  Easter_Period = factor(Easter_Period, levels = c("Yes", "No")),
  Year = factor(Year, levels = c(2015, 2016, 2017), ordered = TRUE))

Data Cont.

Now, removing the NA values

which(is.na(sampled_data$Christmas_Period))
## [1] 99
which(is.na(sampled_data$Easter_Period))
## [1] 99

As there is only one column for NAs, we have removed it from the dataset.

sampled_data <- filter(sampled_data,  !is.na(sampled_data$Christmas_Period))
sampled_data <- filter(sampled_data,  !is.na(sampled_data$Easter_Period))

Descriptive Statistics and Visualisation

This barplot depicts the number of fatalities in different states. Here, we can see that the highest number of fatalities are recorded in NSW.

barplot(prop.table(table(sampled_data$State)), main = "Proportion distribution of death in states", xlab = "State/Territory", ylab="Proportion", col = c("skyblue", "lightgreen", "yellow"), ylim=c(0, 0.8))

Descriptive Statistics Cont.

This barplot depicts the number of fatalities by crash type. Here, we can see that the highest number of accidents are recorded for Multiple vehicle crash type and the least involve Pedestrian crash type.

barplot(prop.table(table(sampled_data$Crash_Type)), main = "Proportion distribution of death in crash type", xlab = "Crash type", ylab="Proportion", col = c("skyblue", "lightgreen", "yellow"))

Descriptive Statistics Cont.

In the next barplots, we can see the accidents distribution by Month, DayWeek and Year. For Month, we can see that the highest number is recorded in December whereas lowest in February. For DayWeek, we can see that highest number of accidents are recorded on Saturday. For Year, 2015 recorded the highest number.

par(mfrow=c(1, 4))
barplot(prop.table(table(sampled_data$Month)), col = c("skyblue", "lightgreen", "yellow"), ylab = "Proportion", xlab = "Months")
barplot(prop.table(table(sampled_data$Dayweek)), col = c("skyblue", "lightgreen", "yellow", "red"), ylab = "Proportion", xlab = "Days of week")
barplot(prop.table(table(sampled_data$Year)), col = c("skyblue", "lightgreen", "yellow", "red"), ylab = "Proportion", xlab = "Years")
hist(sampled_data$Speed_Limit, main = "Histogram for frequency of accidents happening in speed limits", xlab = "speed limits", ylab = "Frequency", ylim = c(0, 500), col = c("skyblue", "lightgreen", "yellow"))

From the histogram, it could be seen that the highest number of accidents are recorded in the areas where speed limits are 90-100.

Descriptive Statistics Cont.

From the boxplot, the speed limit distribution can be seen.

boxplot(sampled_data$Speed_Limit, ylab = "Speed", col = "red", main = "Boxplot for speed limit", ylim = c(0, 150))

Descriptive Statistics Cont.

The speed limit summary can be seen throught the textual method below. The mean can be noted as: 80.518666, median:80, standard deviation: 21.52277.

stats_summary_speed_limit <- sampled_data %>%
  summarise(
    Min = min(Speed_Limit),
    Q1 = quantile(Speed_Limit)[2],
    Median = median(Speed_Limit),
    Mean = mean(Speed_Limit),
    Q3 = quantile(Speed_Limit)[4],
    Max = max(Speed_Limit),
    SD = sd(Speed_Limit),
    IQR = IQR(Speed_Limit),
    Total = n(),
    Missing = sum(is.na(Speed_Limit))
  ) -> table2
knitr::kable(table2)
Min Q1 Median Mean Q3 Max SD IQR Total Missing
10 60 80 80.51866 100 130 21.52277 40 1340 0

Descriptive Statistics Cont.

The overall summary of the variables can be seen below:

Using the summary function we can check the number of accidents roughly according to variables

table <- sampled_data %>% summary()

knitr::kable(table)
Crash ID State Month Year Dayweek Time Crash_Type Speed_Limit Christmas_Period Easter_Period
Min. :20151001 NSW :1032 December :151 2015:516 Sunday :210 Length:1340 Single vehicle :545 Min. : 10.00 Yes: 56 Yes: 12
1st Qu.:20152018 VIC : 198 August :143 2016:458 Monday :169 Class1:hms Multiple vehicle:566 1st Qu.: 60.00 No :1284 No :1328
Median :20161154 NT : 61 October :136 2017:366 Tuesday :172 Class2:difftime Pedestrian :229 Median : 80.00 NA NA
Mean :20160674 ACT : 25 November :132 NA Wednesday:184 Mode :numeric NA Mean : 80.52 NA NA
3rd Qu.:20171031 TAS : 24 July :118 NA Thursday :186 NA NA 3rd Qu.:100.00 NA NA
Max. :20172144 QLD : 0 September:108 NA Friday :180 NA NA Max. :130.00 NA NA
NA (Other): 0 (Other) :552 NA Saturday :239 NA NA NA NA NA

Hypothesis Testing

NOTE: The significance level in every test is 0.05.

One sample t-test

The mean speed limit where fatalities happened is 80. \[H_0: \mu_1 = 80 \] The mean speed limit where fatalities happened is not 80. \[H_A: \mu_1 \ne 80\]

t.test(sampled_data$Speed_Limit, mu = 80, alternative="two.sided")
## 
##  One Sample t-test
## 
## data:  sampled_data$Speed_Limit
## t = 0.88213, df = 1339, p-value = 0.3779
## alternative hypothesis: true mean is not equal to 80
## 95 percent confidence interval:
##  79.36524 81.67207
## sample estimates:
## mean of x 
##  80.51866

Hypthesis Testing Cont.

Chi-square goodness of fit test

table(sampled_data$Year) %>% prop.table()
## 
##      2015      2016      2017 
## 0.3850746 0.3417910 0.2731343
pop_prop <- c(1/3, 1/3, 1/3)
chi1 <- chisq.test(table(sampled_data$Year), p = pop_prop)
chi1
## 
##  Chi-squared test for given probabilities
## 
## data:  table(sampled_data$Year)
## X-squared = 25.618, df = 2, p-value = 2.736e-06
chi1$observed
## 
## 2015 2016 2017 
##  516  458  366
chi1$expected
##     2015     2016     2017 
## 446.6667 446.6667 446.6667

Hypthesis Testing Cont.

Chi-square test of association

H0: There is no association of crash type dependent on weekday (independence) HA: There is an association of crash type dependent on weekday (dependence)

table(sampled_data$Crash_Type, sampled_data$Dayweek) %>% prop.table() -> table_proportion
knitr::kable(table_proportion)
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
Single vehicle 0.0858209 0.0507463 0.0440299 0.0447761 0.0544776 0.0485075 0.0783582
Multiple vehicle 0.0507463 0.0537313 0.0597015 0.0686567 0.0582090 0.0597015 0.0716418
Pedestrian 0.0201493 0.0216418 0.0246269 0.0238806 0.0261194 0.0261194 0.0283582
chi2 <- chisq.test(table(sampled_data$Crash_Type, sampled_data$Dayweek))
chi2
## 
##  Pearson's Chi-squared test
## 
## data:  table(sampled_data$Crash_Type, sampled_data$Dayweek)
## X-squared = 28.973, df = 12, p-value = 0.003976
chi2$observed
##                   
##                    Sunday Monday Tuesday Wednesday Thursday Friday
##   Single vehicle      115     68      59        60       73     65
##   Multiple vehicle     68     72      80        92       78     80
##   Pedestrian           27     29      33        32       35     35
##                   
##                    Saturday
##   Single vehicle        105
##   Multiple vehicle       96
##   Pedestrian             38
chi2$expected
##                   
##                      Sunday   Monday  Tuesday Wednesday Thursday   Friday
##   Single vehicle   85.41045 68.73507 69.95522  74.83582 75.64925 73.20896
##   Multiple vehicle 88.70149 71.38358 72.65075  77.71940 78.56418 76.02985
##   Pedestrian       35.88806 28.88134 29.39403  31.44478 31.78657 30.76119
##                   
##                     Saturday
##   Single vehicle    97.20522
##   Multiple vehicle 100.95075
##   Pedestrian        40.84403

Hypthesis Testing Cont.

Simple Linear Regression - Pedestrian type increase in 2017

acc_sample <- sampled_data %>% filter(Year == 2017, Crash_Type == "Pedestrian")
month_table <- table(acc_sample$Month)

as.vector(month_table)
##  [1] 4 2 4 3 7 5 9 4 5 3 5 5
df <- data.frame(month = c(1:12), Pedestrian = month_table)
names(df)[3] <- paste("Pedestrian")

maxmodel <- lm(Pedestrian ~ month, data = df)
maxmodel %>% summary()
## 
## Call:
## lm(formula = Pedestrian ~ month, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1072 -0.9796 -0.2296  0.1183  4.2704 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.8485     1.1742   3.277  0.00832 **
## month         0.1259     0.1595   0.789  0.44844   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.908 on 10 degrees of freedom
## Multiple R-squared:  0.0586, Adjusted R-squared:  -0.03554 
## F-statistic: 0.6224 on 1 and 10 DF,  p-value: 0.4484

Discussion

References

-[1] The Guardianhttps://www.theguardian.com/australia-news/2019/jan/17/australias-road-deaths-at-lowest-level-in-four-years

-[2] Australian Government,Department of Infrastructure, Transport, Cities and Regional Development https://www.bitre.gov.au/statistics/safety/