knitr::include_graphics("/Users/natalia/Stats/Datasets/carimage.jpeg")

jimglaserlaw.com

When looking for research questions and datasets, I tried to find ones that relate to me and the people around me. Many of my peers and I have recently begun driving, and any information that explains patterns in the severity of car crashes could be useful to a new or even an experienced driver. The dataset I chose was originally part of a very large dataset with over 7 million observations. Luckily, the creators had already posted a sampled version with 500k observations. It consists of records of car crashes in the U.S. from 2018-2023. The variables I am using are severity, the severity of a crash on a scale of 1-4; distancemi, the length of road, in miles, affected by the accident; weather_condition; and state. To clean the data, I started by making all the column names lowercase and removing parentheses and percentage symbols. I then created a new data frame by selecting only certain columns, removing rows with NAs in severity, distancemi, or temperaturef, and converting the categorical variables into factors.

#setting the working directory, allowing access to my data set
setwd("/Users/natalia/Stats/Datasets")

#calling in the libraries
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

#creating a new variable that reads my data
accidents <- read_csv("US_Accidents_March23_sampled_500k.csv")
## Rows: 500000 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (17): ID, Source, Description, Street, City, County, State, Zipcode, Co...
## dbl  (13): Severity, Start_Lat, Start_Lng, End_Lat, End_Lng, Distance(mi), T...
## lgl  (13): Amenity, Bump, Crossing, Give_Way, Junction, No_Exit, Railway, Ro...
## dttm  (3): Start_Time, End_Time, Weather_Timestamp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#making sure all the column names are lowercase with no parentheses, and no percentage symbols
colnames(accidents) <- tolower(colnames(accidents))
colnames(accidents) <- gsub("[()]", "", colnames(accidents))
colnames(accidents) <- gsub("[%]", "", colnames(accidents))

#creating a new variable, taking my data and then selecting only certain columns, removing NAs, and finally creating new columns by turning variables into categories
accidents3 <- accidents %>%
  select(severity, distancemi, temperaturef, humidity, wind_speedmph,
         city, county, state, country,
         weather_condition, wind_direction, precipitationin,
         sunrise_sunset, traffic_signal) %>%
  filter(!is.na(severity), !is.na(distancemi), !is.na(temperaturef)) %>%
  mutate(across(c(city, county, state, country,
                weather_condition, wind_direction, precipitationin,
                sunrise_sunset, traffic_signal), as.factor))

#printing the new data
accidents3
## # A tibble: 489,534 × 14
##    severity distancemi temperaturef humidity wind_speedmph city     county state
##       <dbl>      <dbl>        <dbl>    <dbl>         <dbl> <fct>    <fct>  <fct>
##  1        2      0               77       62             5 Zachary  East … LA   
##  2        2      0.056           45       48             5 Sterling Loudo… VA   
##  3        2      0.022           68       73            13 Lompoc   Santa… CA   
##  4        2      1.05            27       86            15 Austin   Mower  MN   
##  5        2      0.046           42       34             0 Bakersf… Kern   CA   
##  6        2      0               42       58            13 Peabody  Essex  MA   
##  7        2      0               35       89             0 Gold Hi… Jacks… OR   
##  8        2      0.047           90       55            12 Panama … Bay    FL   
##  9        2      0.038           91       39             7 Dallas   Dallas TX   
## 10        2      1.30            63       78            10 Indiana… Marion IN   
## # ℹ 489,524 more rows
## # ℹ 6 more variables: country <fct>, weather_condition <fct>,
## #   wind_direction <fct>, precipitationin <fct>, sunrise_sunset <fct>,
## #   traffic_signal <fct>
#linear regression model, how does weather condition impact severity of a crash
lm_model2 <- lm(severity ~ weather_condition,data=accidents3)

#creating a summary of the model
summary(lm_model2)
## 
## Call:
## lm(formula = severity ~ weather_condition, data = accidents3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3872 -0.2214 -0.1447 -0.1299  1.9106 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                    2.06250    0.11930  17.288
## weather_conditionBlowing Dust / Windy         -0.12132    0.16622  -0.730
## weather_conditionBlowing Snow                  0.15179    0.14020   1.083
## weather_conditionBlowing Snow / Windy          0.04276    0.13501   0.317
## weather_conditionClear                         0.30971    0.11932   2.596
## weather_conditionCloudy                        0.09683    0.11932   0.812
## weather_conditionCloudy / Windy                0.11468    0.12019   0.954
## weather_conditionDrifting Snow / Windy        -0.06250    0.49190  -0.127
## weather_conditionDrizzle                       0.22945    0.12247   1.874
## weather_conditionDrizzle / Windy               0.93750    0.49190   1.906
## weather_conditionDrizzle and Fog               0.07543    0.14862   0.508
## weather_conditionFair                          0.06736    0.11931   0.565
## weather_conditionFair / Windy                  0.08499    0.11971   0.710
## weather_conditionFog                           0.08225    0.11945   0.689
## weather_conditionFog / Windy                  -0.04363    0.13613  -0.321
## weather_conditionFreezing Drizzle             -0.06250    0.35791  -0.175
## weather_conditionFreezing Rain                -0.06250    0.18691  -0.334
## weather_conditionFunnel Cloud                  0.10417    0.22845   0.456
## weather_conditionHail                         -0.06250    0.24450  -0.256
## weather_conditionHaze                          0.16186    0.11950   1.355
## weather_conditionHaze / Windy                  0.02693    0.12683   0.212
## weather_conditionHeavy Drizzle                 0.18750    0.14956   1.254
## weather_conditionHeavy Freezing Drizzle       -0.06250    0.49190  -0.127
## weather_conditionHeavy Rain                    0.19494    0.11975   1.628
## weather_conditionHeavy Rain / Windy            0.11023    0.12769   0.863
## weather_conditionHeavy Sleet                  -0.06250    0.35791  -0.175
## weather_conditionHeavy Snow                    0.13686    0.12233   1.119
## weather_conditionHeavy Snow / Windy            0.06713    0.13583   0.494
## weather_conditionHeavy T-Storm                 0.09874    0.12078   0.818
## weather_conditionHeavy T-Storm / Windy         0.10855    0.13126   0.827
## weather_conditionHeavy Thunderstorms and Rain  0.33750    0.12479   2.704
## weather_conditionIce Pellets                   0.06250    0.20664   0.302
## weather_conditionLight Blowing Snow            1.93750    0.49190   3.939
## weather_conditionLight Drizzle                 0.18274    0.11995   1.523
## weather_conditionLight Drizzle / Windy         0.12798    0.15836   0.808
## weather_conditionLight Freezing Drizzle        0.35129    0.12981   2.706
## weather_conditionLight Freezing Fog            0.27083    0.13583   1.994
## weather_conditionLight Freezing Rain           0.29248    0.12337   2.371
## weather_conditionLight Freezing Rain / Windy   0.10417    0.22845   0.456
## weather_conditionLight Hail                   -0.06250    0.49190  -0.127
## weather_conditionLight Haze                   -0.06250    0.49190  -0.127
## weather_conditionLight Ice Pellets             0.47596    0.17819   2.671
## weather_conditionLight Rain                    0.18154    0.11935   1.521
## weather_conditionLight Rain / Windy            0.11245    0.12119   0.928
## weather_conditionLight Rain Shower             0.08750    0.16006   0.547
## weather_conditionLight Rain Shower / Windy    -0.39583    0.30024  -1.318
## weather_conditionLight Rain Showers            0.39904    0.17819   2.239
## weather_conditionLight Rain with Thunder       0.15292    0.12047   1.269
## weather_conditionLight Sleet                   0.15489    0.15535   0.997
## weather_conditionLight Snow                    0.17360    0.11942   1.454
## weather_conditionLight Snow / Windy            0.05945    0.12161   0.489
## weather_conditionLight Snow and Sleet         -0.06250    0.18691  -0.334
## weather_conditionLight Snow and Sleet / Windy -0.06250    0.30024  -0.208
## weather_conditionLight Snow Grains            -0.06250    0.49190  -0.127
## weather_conditionLight Snow Shower             0.13750    0.24450   0.562
## weather_conditionLight Snow Showers            1.43750    0.35791   4.016
## weather_conditionLight Snow with Thunder      -0.06250    0.30024  -0.208
## weather_conditionLight Thunderstorms and Rain  0.45101    0.12249   3.682
## weather_conditionLight Thunderstorms and Snow  0.93750    0.35791   2.619
## weather_conditionLow Drifting Snow             0.93750    0.49190   1.906
## weather_conditionMist                          0.20207    0.12351   1.636
## weather_conditionMist / Windy                 -0.06250    0.49190  -0.127
## weather_conditionMostly Cloudy                 0.15888    0.11932   1.332
## weather_conditionMostly Cloudy / Windy         0.10402    0.12015   0.866
## weather_conditionN/A Precipitation             0.08456    0.12389   0.683
## weather_conditionOvercast                      0.32468    0.11934   2.721
## weather_conditionPartial Fog                  -0.06250    0.30024  -0.208
## weather_conditionPartial Fog / Windy           0.93750    0.49190   1.906
## weather_conditionPartly Cloudy                 0.15382    0.11933   1.289
## weather_conditionPartly Cloudy / Windy         0.10289    0.12076   0.852
## weather_conditionPatches of Fog                0.11719    0.12298   0.953
## weather_conditionRain                          0.19729    0.11948   1.651
## weather_conditionRain / Windy                  0.14482    0.12499   1.159
## weather_conditionRain Shower                  -0.06250    0.49190  -0.127
## weather_conditionRain Showers                 -0.06250    0.24450  -0.256
## weather_conditionSand / Dust Whirlwinds        0.13750    0.24450   0.562
## weather_conditionScattered Clouds              0.31515    0.11938   2.640
## weather_conditionShallow Fog                   0.08601    0.12394   0.694
## weather_conditionShowers in the Vicinity       0.11728    0.12959   0.905
## weather_conditionSleet                        -0.06250    0.22845  -0.274
## weather_conditionSleet / Windy                -0.06250    0.49190  -0.127
## weather_conditionSmall Hail                    0.43750    0.19237   2.274
## weather_conditionSmoke                         0.17000    0.12049   1.411
## weather_conditionSmoke / Windy                 0.65179    0.21626   3.014
## weather_conditionSnow                          0.18263    0.12028   1.518
## weather_conditionSnow / Windy                  0.10633    0.13112   0.811
## weather_conditionSnow and Sleet                0.18750    0.14956   1.254
## weather_conditionSnow and Sleet / Windy       -0.06250    0.26677  -0.234
## weather_conditionSnow and Thunder             -0.06250    0.49190  -0.127
## weather_conditionSnow Grains                   0.93750    0.49190   1.906
## weather_conditionSqualls                      -0.06250    0.49190  -0.127
## weather_conditionSqualls / Windy               0.43750    0.35791   1.222
## weather_conditionT-Storm                       0.15983    0.12021   1.330
## weather_conditionT-Storm / Windy               0.13750    0.13318   1.032
## weather_conditionThunder                       0.07787    0.12034   0.647
## weather_conditionThunder / Windy               0.14680    0.13975   1.050
## weather_conditionThunder / Wintry Mix         -0.06250    0.30024  -0.208
## weather_conditionThunder and Hail             -0.06250    0.49190  -0.127
## weather_conditionThunder in the Vicinity       0.10063    0.12011   0.838
## weather_conditionThunderstorm                  0.35069    0.12257   2.861
## weather_conditionThunderstorms and Rain        0.35417    0.12576   2.816
## weather_conditionTornado                      -0.06250    0.49190  -0.127
## weather_conditionVolcanic Ash                 -0.06250    0.49190  -0.127
## weather_conditionWidespread Dust               0.02841    0.18691   0.152
## weather_conditionWidespread Dust / Windy      -0.06250    0.30024  -0.208
## weather_conditionWintry Mix                    0.07135    0.12053   0.592
## weather_conditionWintry Mix / Windy            0.05515    0.16622   0.332
##                                               Pr(>|t|)    
## (Intercept)                                    < 2e-16 ***
## weather_conditionBlowing Dust / Windy         0.465459    
## weather_conditionBlowing Snow                 0.278968    
## weather_conditionBlowing Snow / Windy         0.751449    
## weather_conditionClear                        0.009443 ** 
## weather_conditionCloudy                       0.417065    
## weather_conditionCloudy / Windy               0.339991    
## weather_conditionDrifting Snow / Windy        0.898895    
## weather_conditionDrizzle                      0.060992 .  
## weather_conditionDrizzle / Windy              0.056669 .  
## weather_conditionDrizzle and Fog              0.611763    
## weather_conditionFair                         0.572357    
## weather_conditionFair / Windy                 0.477753    
## weather_conditionFog                          0.491098    
## weather_conditionFog / Windy                  0.748570    
## weather_conditionFreezing Drizzle             0.861376    
## weather_conditionFreezing Rain                0.738094    
## weather_conditionFunnel Cloud                 0.648412    
## weather_conditionHail                         0.798243    
## weather_conditionHaze                         0.175572    
## weather_conditionHaze / Windy                 0.831839    
## weather_conditionHeavy Drizzle                0.209948    
## weather_conditionHeavy Freezing Drizzle       0.898895    
## weather_conditionHeavy Rain                   0.103562    
## weather_conditionHeavy Rain / Windy           0.387993    
## weather_conditionHeavy Sleet                  0.861376    
## weather_conditionHeavy Snow                   0.263266    
## weather_conditionHeavy Snow / Windy           0.621163    
## weather_conditionHeavy T-Storm                0.413612    
## weather_conditionHeavy T-Storm / Windy        0.408247    
## weather_conditionHeavy Thunderstorms and Rain 0.006841 ** 
## weather_conditionIce Pellets                  0.762304    
## weather_conditionLight Blowing Snow           8.19e-05 ***
## weather_conditionLight Drizzle                0.127636    
## weather_conditionLight Drizzle / Windy        0.419016    
## weather_conditionLight Freezing Drizzle       0.006807 ** 
## weather_conditionLight Freezing Fog           0.046169 *  
## weather_conditionLight Freezing Rain          0.017750 *  
## weather_conditionLight Freezing Rain / Windy  0.648412    
## weather_conditionLight Hail                   0.898895    
## weather_conditionLight Haze                   0.898895    
## weather_conditionLight Ice Pellets            0.007561 ** 
## weather_conditionLight Rain                   0.128237    
## weather_conditionLight Rain / Windy           0.353456    
## weather_conditionLight Rain Shower            0.584614    
## weather_conditionLight Rain Shower / Windy    0.187378    
## weather_conditionLight Rain Showers           0.025131 *  
## weather_conditionLight Rain with Thunder      0.204295    
## weather_conditionLight Sleet                  0.318757    
## weather_conditionLight Snow                   0.146043    
## weather_conditionLight Snow / Windy           0.624936    
## weather_conditionLight Snow and Sleet         0.738094    
## weather_conditionLight Snow and Sleet / Windy 0.835100    
## weather_conditionLight Snow Grains            0.898895    
## weather_conditionLight Snow Shower            0.573865    
## weather_conditionLight Snow Showers           5.91e-05 ***
## weather_conditionLight Snow with Thunder      0.835100    
## weather_conditionLight Thunderstorms and Rain 0.000231 ***
## weather_conditionLight Thunderstorms and Snow 0.008810 ** 
## weather_conditionLow Drifting Snow            0.056669 .  
## weather_conditionMist                         0.101821    
## weather_conditionMist / Windy                 0.898895    
## weather_conditionMostly Cloudy                0.183009    
## weather_conditionMostly Cloudy / Windy        0.386640    
## weather_conditionN/A Precipitation            0.494920    
## weather_conditionOvercast                     0.006517 ** 
## weather_conditionPartial Fog                  0.835100    
## weather_conditionPartial Fog / Windy          0.056669 .  
## weather_conditionPartly Cloudy                0.197361    
## weather_conditionPartly Cloudy / Windy        0.394190    
## weather_conditionPatches of Fog               0.340627    
## weather_conditionRain                         0.098691 .  
## weather_conditionRain / Windy                 0.246603    
## weather_conditionRain Shower                  0.898895    
## weather_conditionRain Showers                 0.798243    
## weather_conditionSand / Dust Whirlwinds       0.573865    
## weather_conditionScattered Clouds             0.008293 ** 
## weather_conditionShallow Fog                  0.487678    
## weather_conditionShowers in the Vicinity      0.365464    
## weather_conditionSleet                        0.784406    
## weather_conditionSleet / Windy                0.898895    
## weather_conditionSmall Hail                   0.022952 *  
## weather_conditionSmoke                        0.158279    
## weather_conditionSmoke / Windy                0.002579 ** 
## weather_conditionSnow                         0.128923    
## weather_conditionSnow / Windy                 0.417380    
## weather_conditionSnow and Sleet               0.209948    
## weather_conditionSnow and Sleet / Windy       0.814766    
## weather_conditionSnow and Thunder             0.898895    
## weather_conditionSnow Grains                  0.056669 .  
## weather_conditionSqualls                      0.898895    
## weather_conditionSqualls / Windy              0.221571    
## weather_conditionT-Storm                      0.183665    
## weather_conditionT-Storm / Windy              0.301872    
## weather_conditionThunder                      0.517573    
## weather_conditionThunder / Windy              0.293502    
## weather_conditionThunder / Wintry Mix         0.835100    
## weather_conditionThunder and Hail             0.898895    
## weather_conditionThunder in the Vicinity      0.402161    
## weather_conditionThunderstorm                 0.004222 ** 
## weather_conditionThunderstorms and Rain       0.004859 ** 
## weather_conditionTornado                      0.898895    
## weather_conditionVolcanic Ash                 0.898895    
## weather_conditionWidespread Dust              0.879195    
## weather_conditionWidespread Dust / Windy      0.835100    
## weather_conditionWintry Mix                   0.553871    
## weather_conditionWintry Mix / Windy           0.740065    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4772 on 487051 degrees of freedom
##   (2376 observations deleted due to missingness)
## Multiple R-squared:  0.03421,    Adjusted R-squared:  0.034 
## F-statistic: 162.8 on 106 and 487051 DF,  p-value: < 2.2e-16

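Most significant p-values: 8.19e-05 (Light Blowing Snow), 5.91e-05 (Light Snow Showers), and 0.000231 (Light Thunderstorms and Rain).

Equation (showing only these three terms; each weather variable is 1 if that condition was recorded and 0 otherwise): estimated severity = 2.0625 + 1.9375(LightBlowingSnow) + 1.4375(LightSnowShowers) + 0.4510(LightThunderstormsAndRain)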

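The regression model suggests that Light Blowing Snow, Light Snow Showers, and Light Thunderstorms and Rain are the most statistically significant predictors of accident severity. Each coefficient is a difference from the reference weather category (the factor level that does not appear in the coefficient table), whose average severity is the intercept, approximately 2.06. Relative to that baseline, estimated severity is higher by 1.94 for Light Blowing Snow, 1.44 for Light Snow Showers, and 0.45 for Light Thunderstorms and Rain. The first two estimates have large standard errors (0.49 and 0.36), which suggests those categories contain very few crashes, so they should be interpreted with caution.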
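To make the meaning of these coefficients concrete, here is a minimal sketch on a small made-up data frame (the name `toy` and all of its values are invented for illustration and are not taken from the accident data). It shows that when `lm()` is fit on a single factor, the intercept equals the mean of the reference level and every other coefficient equals that level's mean minus the reference mean. It also counts the rows per level, since a level with only one or two rows can produce a large coefficient.

```r
# Toy data (hypothetical values, not from the accidents dataset)
toy <- data.frame(
  severity = c(2, 2, 3, 2, 4, 2, 2, 3),
  weather  = factor(c("Clear", "Clear", "Clear", "Fog", "Light Snow",
                      "Fog", "Clear", "Fog"))
)

# Same kind of model as above: severity explained by one factor
toy_model <- lm(severity ~ weather, data = toy)

# Group means and group sizes for comparison
group_means  <- tapply(toy$severity, toy$weather, mean)
group_counts <- table(toy$weather)

# Intercept = mean of the reference level
# (the first level alphabetically, here "Clear")
coef(toy_model)[["(Intercept)"]]
group_means[["Clear"]]

# Each other coefficient = that level's mean minus the reference mean
coef(toy_model)[["weatherLight Snow"]]
group_means[["Light Snow"]] - group_means[["Clear"]]

# Levels with very few rows deserve extra caution
group_counts
```

On the real data, `table(accidents3$weather_condition)` would give the same kind of count for each weather condition.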

On the other hand, the R-squared value is very low (about 0.034), meaning that although these factors are statistically significant, weather condition explains only about 3.4% of the variation in severity.

#creating a histogram of the model's residuals
hist(residuals(lm_model2))

As we can see, the histogram of residuals is not symmetric around 0 and is clearly skewed (the median residual is about -0.14, while the largest is 1.91), which further suggests that the model is not very useful. Although weather condition is something to consider when looking at the severity of a car crash, it is important to keep in mind other factors that may not be included in the dataset. For example, the speed of a car could be a strong predictor of severity.
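One caveat when reading a residual histogram: for a least-squares model that includes an intercept, the residuals always average to zero, so asymmetry shows up in the median and the tails rather than in the mean. A minimal sketch on made-up data (`toy` and its values are hypothetical, not from the accident data):

```r
# Toy data (hypothetical values): mostly severity 2 with a few higher values
toy <- data.frame(
  severity = c(2, 2, 2, 2, 3, 2, 2, 4, 2, 2),
  weather  = factor(rep(c("Clear", "Rain"), each = 5))
)

toy_model <- lm(severity ~ weather, data = toy)
res <- residuals(toy_model)

# The mean of the residuals is zero up to rounding error
mean(res)

# The median is below zero here because a few large positive
# residuals pull the distribution to the right
median(res)

# Histogram of the residuals, as in the report
hist(res)
```

The same comparison of `mean()` and `median()` could be run on `residuals(lm_model2)`.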

# Calculate average distance by severity
severity_dist <- accidents3 %>%
  group_by(severity) %>%
  summarise(avg_distance = mean(distancemi, na.rm = TRUE))

colorsss <- c("1" = "#4d5d82", "2" = "#94608f", "3" = "#ffb759", "4" = "#b02e4a")

# Create a bar plot
ggplot(severity_dist, aes(x = as.factor(severity), y = avg_distance, fill = as.factor(severity))) +
  geom_col(width = 0.7) +
  scale_fill_manual(values = colorsss, name = "Severity Level") +
  labs(
    title = "Crash Severity and Distance",
    x = "Crash Severity Level",
    y = "Average Crash Distance (miles)",
    caption = "U.S. Accident Dataset, 2018–2023"
  ) +
  theme_classic() 

https://public.tableau.com/views/AverageDistanceandServerityforCarCrashesBasedonU_S_State/Sheet1?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

I originally wanted to compare the severity of crashes with the weather condition at the time, as I thought this would be an obvious factor. However, the R-squared value was very low, meaning that although some weather conditions are statistically significant, they explain only a small share of the variation in severity. The histogram of residuals is also clearly skewed rather than symmetric around 0, which further suggests that the model is not very useful. Although weather condition is something to consider when looking at the severity of a car crash, it is important to keep in mind other factors that may not be included in the dataset; for example, the speed of a car could be a strong predictor of severity. In fact, it was difficult to find any factor that could actually predict the severity of a crash.

Although I did not find many trends in car crash severity, one state stood out: South Dakota had both a high average severity and a high average distance. This could be due to other factors that my data does not show. In the future, I would like to focus on individual states rather than on the U.S. as a whole. From this analysis I've realized that states and their conditions are so different that a nationwide view is too vague. Specifically, I would like to start with South Dakota. I expect I could find trends specific to each state and even include weather condition, as I originally wanted to.
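A possible starting point for the state-level follow-up is a per-state summary that reports the number of crashes alongside the averages, since a state with few rows can look extreme by chance. This is only a sketch on a made-up data frame (`toy_states` and its values are hypothetical); with the real data, `accidents3` would take its place.

```r
library(dplyr)

# Toy data (hypothetical values, not from the accidents dataset)
toy_states <- tibble(
  state      = c("SD", "SD", "CA", "CA", "CA", "TX", "TX"),
  severity   = c(3, 4, 2, 2, 3, 2, 2),
  distancemi = c(2.5, 1.5, 0.1, 0.0, 0.5, 0.3, 0.7)
)

# Count crashes and average severity and distance for each state,
# then sort so the highest average severity comes first
state_summary <- toy_states %>%
  group_by(state) %>%
  summarise(
    n_crashes    = n(),
    avg_severity = mean(severity),
    avg_distance = mean(distancemi)
  ) %>%
  arrange(desc(avg_severity))

state_summary
```

Keeping `n_crashes` in the table makes it easier to judge whether a high average reflects many crashes or only a handful.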