BUS2 194A - Project Task 7: R Markdown

1. Introduction

Fatalities related to driving under the influence (DUI) have been a public safety concern for decades. DUI contributes to thousands of deaths every year, despite awareness campaigns and laws intended to prevent these accidents. In this project, we analyze data on DUI-related fatalities in California to determine whether stricter laws and prevention efforts help prevent these types of accidents.

2. Problem Statement

Alcohol-impaired driving remains a leading cause of fatal road accidents, resulting in tragic deaths all over California. Despite efforts to combat the problem, these accidents continue to occur at alarming rates, often because drivers do not think about the consequences. This project analyzes patterns and contributing factors of alcohol-related accidents to assess whether stronger enforcement and policies are reducing these preventable incidents.

3. Objectives

Our main objective is to address the high number of DUI-related deaths among people over the age of 21. Specifically, the project aims to identify whether, and by how much, prevention efforts have moved the number of these accidents up or down.

4. Method: Hypothesis Test: Paired T-Test

Load data set

# install.packages("readxl")  # To read Excel files
# install.packages("dplyr")   # For data manipulation
# install.packages("tidyr")   # For tidying data

library(readxl)
library(dplyr)
library(tidyr)
data <- read_excel(file.choose())
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
head(data)
## # A tibble: 6 × 6
##   `Summary Table`                                  ...2  ...3  ...4  ...5  ...6 
##   <chr>                                            <chr> <chr> <chr> <chr> <chr>
## 1 Number of DUI Crashes involving Alcohol and Dru… <NA>  <NA>  <NA>  <NA>  <NA> 
## 2 County                                           Aver… 2022… <NA>  2023… <NA> 
## 3 <NA>                                             <NA>  DUI   % of  DUI   % of 
## 4 <NA>                                             <NA>  Cras… All   Cras… All  
## 5 <NA>                                             <NA>  <NA>  Cras… <NA>  Cras…
## 6 SISKIYOU                                         0.214 46.0  0.219 40.0  0.209

Clean the data file

cleaned_data <- data %>%
  slice(-(1:5))

cleaned_data <- cleaned_data %>%
  select(-c(...2, ...4, ...6))

cleaned_data <- cleaned_data %>%
  rename("County" = "Summary Table", "2022" = "...3", "2023" = "...5")

cleaned_data$`2022` <- as.numeric(cleaned_data$`2022`)
cleaned_data$`2023` <- as.numeric(cleaned_data$`2023`)

head(cleaned_data)
## # A tibble: 6 × 3
##   County    `2022` `2023`
##   <chr>      <dbl>  <dbl>
## 1 SISKIYOU      46     40
## 2 MENDOCINO     78     73
## 3 PLUMAS        24     12
## 4 TUOLUMNE      71     59
## 5 NEVADA        75     71
## 6 CALAVERAS     52     45

Summarize the data

summary(cleaned_data)
##     County               2022             2023        
##  Length:58          Min.   :   2.0   Min.   :   5.00  
##  Class :character   1st Qu.:  47.5   1st Qu.:  44.25  
##  Mode  :character   Median : 115.5   Median : 124.00  
##                     Mean   : 356.0   Mean   : 340.41  
##                     3rd Qu.: 306.8   3rd Qu.: 310.00  
##                     Max.   :4342.0   Max.   :4077.00
Interpretation:

In 2022:
Los Angeles has the highest number of DUI crashes: 4,342
Alpine has the lowest number of DUI crashes: 2
Average: 356 DUI crashes per county across the 58 counties in California

In 2023:
Los Angeles has the highest number of DUI crashes: 4,077
Modoc has the lowest number of DUI crashes: 5
Average: 340 DUI crashes per county across the 58 counties in California
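
summary() reports the extreme values but not which county they belong to. A minimal sketch of how the counties could be looked up, assuming a table shaped like cleaned_data. It uses only the six rows printed by head(cleaned_data) above, so its results are not the statewide extremes, and sample_data is a name introduced here for illustration:

```r
# Only the six rows shown by head(cleaned_data); NOT the full 58-county table
sample_data <- data.frame(
  County = c("SISKIYOU", "MENDOCINO", "PLUMAS", "TUOLUMNE", "NEVADA", "CALAVERAS"),
  `2022` = c(46, 78, 24, 71, 75, 52),
  `2023` = c(40, 73, 12, 59, 71, 45),
  check.names = FALSE  # keep the year columns named "2022" and "2023"
)

# which.max()/which.min() return the row index of the largest/smallest value
highest_2022 <- sample_data$County[which.max(sample_data$`2022`)]
lowest_2022  <- sample_data$County[which.min(sample_data$`2022`)]
highest_2023 <- sample_data$County[which.max(sample_data$`2023`)]
lowest_2023  <- sample_data$County[which.min(sample_data$`2023`)]

print(c(highest_2022, lowest_2022, highest_2023, lowest_2023))
```

Running the same four lookups on cleaned_data itself would name the counties behind the Min. and Max. values in the summary.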

Null and Alternate Hypothesis

Stating the parameters:

  µ1 is the mean number of DUI crashes per county in 2022
  µ2 is the mean number of DUI crashes per county in 2023
  
    µd = µ1 - µ2
  H0: µd ≤ 0
  Ha: µd > 0
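
Because the hypotheses are stated in terms of the mean difference, the paired test amounts to a one-sample t-test on the per-county differences. The sketch below illustrates this on the six counties printed by head(cleaned_data) earlier. These are illustrative values only, not the full data set, and x_2022, x_2023, d, n and t_manual are names introduced here:

```r
# Six counties from head(cleaned_data); illustrative only, not the 58-county data
x_2022 <- c(46, 78, 24, 71, 75, 52)
x_2023 <- c(40, 73, 12, 59, 71, 45)

d <- x_2022 - x_2023   # per-county difference, 2022 minus 2023
n <- length(d)

# Paired t statistic computed by hand: mean difference over its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# The same test two ways: paired on both columns, one-sample on the differences
paired     <- t.test(x_2022, x_2023, paired = TRUE, alternative = "greater")
one_sample <- t.test(d, mu = 0, alternative = "greater")

print(all.equal(unname(paired$statistic), t_manual))
print(all.equal(unname(paired$statistic), unname(one_sample$statistic)))
```

Both comparisons agree, and the degrees of freedom are n - 1 in either form.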

Summarizing the difference data

cleaned_data$Difference <- cleaned_data$`2022` - cleaned_data$`2023`
head(cleaned_data)
## # A tibble: 6 × 4
##   County    `2022` `2023` Difference
##   <chr>      <dbl>  <dbl>      <dbl>
## 1 SISKIYOU      46     40          6
## 2 MENDOCINO     78     73          5
## 3 PLUMAS        24     12         12
## 4 TUOLUMNE      71     59         12
## 5 NEVADA        75     71          4
## 6 CALAVERAS     52     45          7
summary(cleaned_data$Difference)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -56.00   -2.00    5.50   15.62   19.00  265.00
Interpretation:

For Riverside: DUI crashes increased the most compared to other counties (a rise of 56)
For Los Angeles: DUI crashes decreased the most compared to other counties (a reduction of 265)

Visualize the data - histogram

hist(cleaned_data$Difference,
     main = "Histogram of Differences in DUI Crashes between 2022 and 2023",
     xlab = "Difference in DUI crashes (2022 minus 2023)",
     ylab = "Number of counties",
     col = "red",
     border = "black",
     breaks = 60
)

Interpretation:

The distribution is right-skewed: most differences fall between -50 and 50
Most of the differences are positive, showing a decrease in DUI crashes from 2022 to 2023
Outliers are likely related to counties with larger populations, such as Los Angeles, Orange, Riverside, and San Diego. Counties with more people record more crashes, so their year-to-year changes are larger in absolute terms
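
One common way to make "outlier" concrete is Tukey's 1.5 × IQR rule. This rule is not part of the original analysis; the sketch below simply applies it to the quartiles and extremes reported by summary(cleaned_data$Difference) above, with variable names introduced here for illustration:

```r
# Values copied from the summary of the Difference column
q1       <- -2
q3       <- 19
min_diff <- -56
max_diff <- 265

iqr         <- q3 - q1            # interquartile range: 21
lower_fence <- q1 - 1.5 * iqr     # -33.5
upper_fence <- q3 + 1.5 * iqr     # 50.5

# Values beyond the fences count as outliers under this rule
min_is_outlier <- min_diff < lower_fence
max_is_outlier <- max_diff > upper_fence
print(c(lower_fence, upper_fence))
print(c(min_is_outlier, max_is_outlier))
```

Under this rule, both the largest increase (-56) and the largest decrease (265) fall outside the fences.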

Perform the statistical test

test <- t.test(cleaned_data$`2022`, cleaned_data$`2023`,
               mu = 0, paired = TRUE,
               alternative = "greater", conf.level = 0.95)
test
## 
##  Paired t-test
## 
## data:  cleaned_data$`2022` and cleaned_data$`2023`
## t = 2.605, df = 57, p-value = 0.005849
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  5.594517      Inf
## sample estimates:
## mean difference 
##        15.62069
Interpretation:

We reject H0 because the p-value (0.0058) ≤ α (0.05)
H0: µd ≤ 0 (rejected)
Ha: µd > 0 (supported)

→ There is statistically significant evidence that the mean number of DUI crashes per county in 2022 is greater than in 2023
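
As a cross-check, the p-value and the lower confidence bound can be recomputed from the t statistic, degrees of freedom and mean difference printed in the test output. This is a sketch using those rounded values, so the results agree with the output only approximately; the variable names are introduced here:

```r
# Values copied from the paired t-test output above
t_stat    <- 2.605
df        <- 57
mean_diff <- 15.62069
alpha     <- 0.05

# One-sided p-value: area to the right of t under the t distribution
p_value <- pt(t_stat, df, lower.tail = FALSE)

# Standard error implied by the output, then the one-sided 95% lower bound
se          <- mean_diff / t_stat
lower_bound <- mean_diff - qt(1 - alpha, df) * se

print(p_value)
print(lower_bound)
print(p_value <= alpha)   # TRUE means H0 is rejected at the 5% level
```

Because the lower bound is above 0, the confidence interval leads to the same decision as the p-value.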

Possible contributing factor: New DUI laws in California in 2023, including harsher penalties such as felony charges and vehicle impoundment, may have encouraged a reduction in DUI crashes.

6. Conclusion & Recommendation

  Conclusion:
  
There is statistically significant evidence that the mean number of DUI crashes in 2022 is greater than in 2023.

New DUI laws in California in 2023, including harsher penalties such as felony charges and vehicle impoundment, appear to have been effective in reducing DUI crashes.

  Recommendation:

1/ The new DUI law appears to have had a positive impact; therefore, it should be consistently enforced to minimize DUI-related accidents. Developing new laws and adjusting existing regulations based on this success could significantly reduce DUI-related accidents, not just in California but nationwide.

2/ Drinking and driving is never worth it. With ride-share services such as Uber and Lyft available to anyone, people can avoid the risk of DUI fatalities.

3/ Public awareness campaigns and educational programs can help people understand the dangers and consequences of driving under the influence, thereby helping to prevent DUI incidents and develop safe driving habits.
