2024-02-04

## Cincinnati Car Accidents February 2024

## Introduction

As a rideshare (Uber/Lyft) driver who primarily drives on Saturday nights, I wanted to analyze data on car accidents in Cincinnati. In this analysis I explored the following:

1. Were there more accidents on Saturdays than on other days of the week in Cincinnati in 2023?
2. Because I pick up a lot of nightlife patrons on Saturday nights, were there more accidents downtown (zip code 45202), where there are more nightlife establishments?
3. How did the number of accidents vary by year from 2013 to 2023?

The data was downloaded into Excel from https://data.cincinnati-oh.gov/ and then read into R as a data frame.

## First Statistical Analysis

After the data was extracted, a t statistic analysis was done to see whether more accidents took place on Saturdays, and separately whether more took place in zip code 45202 in Cincinnati. For the first analysis, the mean number of 2023 accidents per day of the week for Sunday through Friday was found to be 4233.83, and the number of accidents on Saturdays was 4057.
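As a minimal sketch of how these two numbers can be read off a by-day summary table, the R snippet below uses a small made-up table. The counts and the day labels (including `"SAT"`) are hypothetical placeholders, not the Cincinnati data, and the names `by_day`, `sat_count`, and `other_mean` are illustrative only.

```r
# Hypothetical by-day table; counts and labels are made up for illustration.
by_day <- data.frame(
  DAYOFWEEK = c("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"),
  abundance = c(10, 20, 30, 40, 50, 60, 35)
)

# Count for the day being tested (Saturday)
sat_count <- by_day$abundance[by_day$DAYOFWEEK == "SAT"]

# Mean of the remaining six days (Sunday through Friday)
other_mean <- mean(by_day$abundance[by_day$DAYOFWEEK != "SAT"])

sat_count
other_mean
```

Selecting the rows by label rather than by position keeps the calculation correct even if the days are sorted differently in the real table.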

Next, the standard deviation was found for Sunday through Friday, using the equation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

where $x_i$ is the accident count for each day, $\mu$ is the Sunday through Friday mean, and $n = 6$. The standard deviation was found to be 513.8807. To continue with the t statistic analysis, the standard error of the mean is found by dividing the standard deviation by the square root of $n$, that is $\sigma/\sqrt{n}$. The t value is calculated as

$$t = \frac{x - \mu}{\sigma/\sqrt{n}}$$

where $x$ is the Saturday count. The calculated t value had a magnitude of 0.84, and the critical t value for 5 degrees of freedom at a risk level of 5 percent is 2.57. Since the calculated t value is less than the critical value, we fail to reject the null hypothesis: the number of accidents on Saturdays is not statistically different from the number on Sunday through Friday.
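The arithmetic above can be checked with a short R sketch that uses only the summary values reported in the text. It assumes a two-sided test, which is what the quoted critical value of 2.57 corresponds to. The variable names are illustrative.

```r
# Summary values as reported in the text
mu    <- 4233.83   # mean accidents per day of the week, Sunday through Friday
x_sat <- 4057      # accidents on Saturdays
sigma <- 513.8807  # standard deviation, Sunday through Friday
n     <- 6         # number of days in the Sunday through Friday group

se     <- sigma / sqrt(n)               # standard error of the mean
t_calc <- (x_sat - mu) / se             # negative because Saturday is below the mean
t_crit <- qt(1 - 0.05 / 2, df = n - 1)  # assumed two-sided critical value at 5 percent

round(abs(t_calc), 2)
round(t_crit, 2)
abs(t_calc) < t_crit  # TRUE means the null hypothesis is not rejected
```

Rounded to two decimals, this reproduces the 0.84 and 2.57 quoted above. The sign of `t_calc` is negative because the Saturday count is below the mean of the other days.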

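## Second Statistical Analysis

For the second analysis, the sample mean was found with the following R code, which removes the row for zip code 45202 before taking the mean of the remaining counts:

y = summarize(accidents2023byzip, abundance = n())
y1 = y[-c(2),]  # take out zip code 45202 (row 2)
mean(y1$abundance)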
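Dropping a row by position only works while zip code 45202 happens to sit in the second row. A slightly more defensive sketch, shown here on a made-up table, removes the zip code by value instead. The counts and the zip codes other than 45202 are hypothetical, and the sketch assumes the column names `ZIP` and `abundance` used in the script.

```r
# Hypothetical per-zip counts; the real table comes from summarize().
y <- data.frame(
  ZIP = c("45201", "45202", "45203", "45204"),
  abundance = c(10, 100, 20, 30)
)

# Drop zip code 45202 by value, so the result does not depend on row order
y1 <- y[y$ZIP != "45202", ]

# Mean of the remaining zip codes, taken on the count column only
mean_others <- mean(y1$abundance)
mean_others
```

Taking the mean of the `abundance` column, rather than of the whole table, matters because `mean()` on a data frame returns `NA` with a warning.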

The mean for the other 39 zip codes in Cincinnati was 658.3333 and the standard deviation was 646.63. With $n$ being 39, the standard error of the mean was found to be 103. The t value was calculated as

$$t = \frac{x - \mu}{\sigma/\sqrt{n}}$$

where $x$ is the accident count for zip code 45202 and $\mu$ is the mean of the other zip codes. The calculated t value of 30.48 is far larger than the critical t value (about 2.02) for 38 degrees of freedom at a 0.05 risk level, so we reject the null hypothesis: zip code 45202 had significantly more accidents, and therefore a higher risk of being in an accident, than the other zip codes in Cincinnati.
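The standard error and the critical value for this second test can be checked the same way. The sketch below uses only the figures reported in the text and again assumes a two-sided test at the 0.05 level. The variable names are illustrative.

```r
# Summary values as reported in the text
sigma  <- 646.63  # standard deviation of the other zip codes
n      <- 39      # number of other zip codes
t_calc <- 30.48   # calculated t value

se     <- sigma / sqrt(n)               # about 103.5; the text reports it as 103
t_crit <- qt(1 - 0.05 / 2, df = n - 1)  # assumed two-sided critical value, 38 df

round(se, 1)
round(t_crit, 2)
t_calc > t_crit  # TRUE means the null hypothesis is rejected
```

The calculated value exceeds the critical value by a wide margin, so the conclusion does not depend on whether the test is taken as one-sided or two-sided.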

## R Script Used To Reach Conclusions

{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# Load packages for reading, wrangling, and plotting the data
library(readxl)
library(dplyr)
library(plotly)
library(ggplot2)
# Read the accident data exported to Excel
Cincycaraccidents = readxl::read_excel("/Users/laudenbergert/Onedrive/Documents/Cincycaraccidents.xlsx")

# Keep only 2023 crashes, then the day-of-week and zip code columns
accidents2023 <- Cincycaraccidents %>% filter(grepl("2023", CRASHDATE))
accidents2023bydayandzip = select(accidents2023, DAYOFWEEK, ZIP)
accidents2023bydayandzipsum = accidents2023bydayandzip %>% group_by(ZIP)
accidents2023byzip <- group_by(accidents2023bydayandzipsum, ZIP)

# Count accidents by zip code, by zip code and day, and by day of the week
summarize(accidents2023byzip, abundance = n())
accidents2023byzipandday <- group_by(accidents2023bydayandzipsum, ZIP, DAYOFWEEK)
summarize(accidents2023byzipandday, abundance = n())
accidents2023byday <- group_by(accidents2023bydayandzipsum, DAYOFWEEK)
summarize(accidents2023byday, abundance = n())


# Extract the year from CRASHDATE and count accidents per year
accidentsbyyear <- select(Cincycaraccidents, CRASHDATE)
accidentsbyyear2 = mutate(accidentsbyyear, YEAR = substr(CRASHDATE, 1, 4))
accidentbyyears2a <- group_by(accidentsbyyear2, YEAR)
summarize(accidentbyyears2a, abundance = n())

# Histogram of accidents by year (plotly)
figy = plot_ly(x = accidentbyyears2a$YEAR, type = "histogram")

# Bar chart of accidents by year
barbyyear <- ggplot(data = accidentsbyyear2, aes(x = YEAR)) +
  geom_bar(stat = "count", width = 0.7, fill = "red") +
  theme_minimal()

# Bar chart of 2023 accidents by day of the week
barbyday2023 <- ggplot(data = accidents2023byday, aes(x = DAYOFWEEK)) +
  geom_bar(stat = "count", width = 0.7, fill = "red") +
  theme_minimal()


# Axis titles for the zip code histogram
xax <- list(
  title = "Zip Code",
  titlefont = list(family = "Modern Computer Roman")
)
yax <- list(
  title = "Number of Accidents",
  titlefont = list(family = "Modern Computer Roman")
)

# Histogram of 2023 accidents by zip code (plotly)
figz <- plot_ly(x = accidents2023byzip$ZIP, type = "histogram") %>%
  layout(xaxis = xax, yaxis = yax)

# Accident counts per zip code; drop row 2 (zip code 45202)
y = summarize(accidents2023byzip, abundance = n())
y1 = y[-c(2),]

## 2023 Cincinnati Car Accidents by Day of the Week Plot

barbyday2023

## 2023 Cincinnati Car Accidents by Zip Code

figz

## Cincinnati Car Accidents by Year Plot

barbyyear