Final Project Part 1

DSA406_001_SP25_FP1_ryalsaid

Author

Rommie Alsaidi

Published

February 18, 2025

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

1. Import dataset using read.csv

data <- read.csv("US_Accidents/US_Accidents_March23_sampled_500k.csv") #read dataset into document

inspect dataset

head(data) #returns the first couple of rows to see columns and values

         ID  Source Severity                    Start_Time
1 A-2047758 Source2        2           2019-06-12 10:10:56
2 A-4694324 Source1        2 2022-12-03 23:37:14.000000000
3 A-5006183 Source1        2 2022-08-20 13:13:00.000000000
4 A-4237356 Source1        2           2022-02-21 17:43:04
5 A-6690583 Source1        2           2020-12-04 01:46:00
6 A-1101469 Source2        2           2021-03-29 07:03:58
                       End_Time Start_Lat  Start_Lng  End_Lat    End_Lng
1           2019-06-12 10:55:58  30.64121  -91.15348       NA         NA
2 2022-12-04 01:56:53.000000000  38.99056  -77.39907 38.99004  -77.39828
3 2022-08-20 15:22:45.000000000  34.66119 -120.49282 34.66119 -120.49244
4           2022-02-21 19:43:23  43.68059  -92.99332 43.68057  -92.97222
5           2020-12-04 04:13:09  35.39548 -118.98518 35.39548 -118.98600
6           2021-03-29 08:51:01  42.53208  -70.94427       NA         NA
  Distance.mi.
1        0.000
2        0.056
3        0.022
4        1.054
5        0.046
6        0.000
                                                         Description
1           Accident on LA-19 Baker-Zachary Hwy at Lower Zachary Rd.
2 Incident on FOREST RIDGE DR near PEPPERIDGE PL Drive with caution.
3       Accident on W Central Ave from Floradale Ave to Western Ave.
4             Incident on I-90 EB near REST AREA Drive with caution.
5               RP ADV THEY LOCATED SUSP VEH OF 20002 - 726 CRAWFORD
6                                Accident on Forest St at Lowell St.
            Street        City           County State    Zipcode Country
1       Highway 19     Zachary East Baton Rouge    LA 70791-4610      US
2  Forest Ridge Dr    Sterling          Loudoun    VA 20164-2813      US
3    Floradale Ave      Lompoc    Santa Barbara    CA      93436      US
4       14th St NW      Austin            Mower    MN      55912      US
5       River Blvd Bakersfield             Kern    CA 93305-2649      US
6        Lowell St     Peabody            Essex    MA 01960-4275      US
    Timezone Airport_Code   Weather_Timestamp Temperature.F. Wind_Chill.F.
1 US/Central         KBTR 2019-06-12 09:53:00             77            77
2 US/Eastern         KIAD 2022-12-03 23:52:00             45            43
3 US/Pacific         KLPC 2022-08-20 12:56:00             68            68
4 US/Central         KAUM 2022-02-21 17:35:00             27            15
5 US/Pacific         KBFL 2020-12-04 01:54:00             42            42
6 US/Eastern         KBVY 2021-03-29 06:53:00             42            35
  Humidity... Pressure.in. Visibility.mi. Wind_Direction Wind_Speed.mph.
1          62        29.92             10             NW               5
2          48        29.91             10              W               5
3          73        29.79             10              W              13
4          86        28.49             10            ENE              15
5          34        29.77             10           CALM               0
6          58        29.37             10              W              13
  Precipitation.in. Weather_Condition Amenity  Bump Crossing Give_Way Junction
1                 0              Fair   False False    False    False    False
2                 0              Fair   False False    False    False    False
3                 0              Fair   False False    False    False    False
4                 0        Wintry Mix   False False    False    False    False
5                 0              Fair   False False    False    False    False
6                 0              Fair   False False    False    False    False
  No_Exit Railway Roundabout Station  Stop Traffic_Calming Traffic_Signal
1   False   False      False   False False           False           True
2   False   False      False   False False           False          False
3   False   False      False   False False           False           True
4   False   False      False   False False           False          False
5   False   False      False   False False           False          False
6   False   False      False   False False           False           True
  Turning_Loop Sunrise_Sunset Civil_Twilight Nautical_Twilight
1        False            Day            Day               Day
2        False          Night          Night             Night
3        False            Day            Day               Day
4        False            Day            Day               Day
5        False          Night          Night             Night
6        False            Day            Day               Day
  Astronomical_Twilight
1                   Day
2                 Night
3                   Day
4                   Day
5                 Night
6                   Day

str(data) #returns the structure of the dataset as well as the variable names and types

'data.frame':   500000 obs. of  46 variables:
 $ ID                   : chr  "A-2047758" "A-4694324" "A-5006183" "A-4237356" ...
 $ Source               : chr  "Source2" "Source1" "Source1" "Source1" ...
 $ Severity             : int  2 2 2 2 2 2 2 2 2 2 ...
 $ Start_Time           : chr  "2019-06-12 10:10:56" "2022-12-03 23:37:14.000000000" "2022-08-20 13:13:00.000000000" "2022-02-21 17:43:04" ...
 $ End_Time             : chr  "2019-06-12 10:55:58" "2022-12-04 01:56:53.000000000" "2022-08-20 15:22:45.000000000" "2022-02-21 19:43:23" ...
 $ Start_Lat            : num  30.6 39 34.7 43.7 35.4 ...
 $ Start_Lng            : num  -91.2 -77.4 -120.5 -93 -119 ...
 $ End_Lat              : num  NA 39 34.7 43.7 35.4 ...
 $ End_Lng              : num  NA -77.4 -120.5 -93 -119 ...
 $ Distance.mi.         : num  0 0.056 0.022 1.054 0.046 ...
 $ Description          : chr  "Accident on LA-19 Baker-Zachary Hwy at Lower Zachary Rd." "Incident on FOREST RIDGE DR near PEPPERIDGE PL Drive with caution." "Accident on W Central Ave from Floradale Ave to Western Ave." "Incident on I-90 EB near REST AREA Drive with caution." ...
 $ Street               : chr  "Highway 19" " Forest Ridge Dr" "Floradale Ave" "14th St NW" ...
 $ City                 : chr  "Zachary" "Sterling" "Lompoc" "Austin" ...
 $ County               : chr  "East Baton Rouge" "Loudoun" "Santa Barbara" "Mower" ...
 $ State                : chr  "LA" "VA" "CA" "MN" ...
 $ Zipcode              : chr  "70791-4610" "20164-2813" "93436" "55912" ...
 $ Country              : chr  "US" "US" "US" "US" ...
 $ Timezone             : chr  "US/Central" "US/Eastern" "US/Pacific" "US/Central" ...
 $ Airport_Code         : chr  "KBTR" "KIAD" "KLPC" "KAUM" ...
 $ Weather_Timestamp    : chr  "2019-06-12 09:53:00" "2022-12-03 23:52:00" "2022-08-20 12:56:00" "2022-02-21 17:35:00" ...
 $ Temperature.F.       : num  77 45 68 27 42 42 35 90 91 63 ...
 $ Wind_Chill.F.        : num  77 43 68 15 42 35 35 90 91 63 ...
 $ Humidity...          : num  62 48 73 86 34 58 89 55 39 78 ...
 $ Pressure.in.         : num  29.9 29.9 29.8 28.5 29.8 ...
 $ Visibility.mi.       : num  10 10 10 10 10 10 10 10 10 10 ...
 $ Wind_Direction       : chr  "NW" "W" "W" "ENE" ...
 $ Wind_Speed.mph.      : num  5 5 13 15 0 13 0 12 7 10 ...
 $ Precipitation.in.    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Weather_Condition    : chr  "Fair" "Fair" "Fair" "Wintry Mix" ...
 $ Amenity              : chr  "False" "False" "False" "False" ...
 $ Bump                 : chr  "False" "False" "False" "False" ...
 $ Crossing             : chr  "False" "False" "False" "False" ...
 $ Give_Way             : chr  "False" "False" "False" "False" ...
 $ Junction             : chr  "False" "False" "False" "False" ...
 $ No_Exit              : chr  "False" "False" "False" "False" ...
 $ Railway              : chr  "False" "False" "False" "False" ...
 $ Roundabout           : chr  "False" "False" "False" "False" ...
 $ Station              : chr  "False" "False" "False" "False" ...
 $ Stop                 : chr  "False" "False" "False" "False" ...
 $ Traffic_Calming      : chr  "False" "False" "False" "False" ...
 $ Traffic_Signal       : chr  "True" "False" "True" "False" ...
 $ Turning_Loop         : chr  "False" "False" "False" "False" ...
 $ Sunrise_Sunset       : chr  "Day" "Night" "Day" "Day" ...
 $ Civil_Twilight       : chr  "Day" "Night" "Day" "Day" ...
 $ Nautical_Twilight    : chr  "Day" "Night" "Day" "Day" ...
 $ Astronomical_Twilight: chr  "Day" "Night" "Day" "Day" ...

Dataset Description

Our dataset has 500,000 rows and 46 columns. Most columns are stored as chr or num, although several of the chr columns (Amenity through Turning_Loop) only hold "True"/"False" values and are really Boolean. We have a unique identifier called ID. It is fairly long, but with 500,000 rows (and still over 300,000 after cleaning) it is simpler to keep it than to build our own key.
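As a minimal sketch of how the "True"/"False" text columns could be turned into real logical values, the example below uses a small made-up data frame (toy, with hypothetical rows) rather than the full dataset. The column names match the real data, but which columns to convert is an assumption based on the str() output above.

```r
# Toy stand-in for the accident data (rows are made up for illustration)
toy <- data.frame(
  ID = c("A-1", "A-2", "A-3"),
  Junction = c("False", "True", "False"),
  Traffic_Signal = c("True", "False", "True"),
  stringsAsFactors = FALSE
)

# Columns that hold "True"/"False" as text (assumed list; extend as needed)
flag_cols <- c("Junction", "Traffic_Signal")

# Compare against "True" so each column becomes a logical vector
toy[flag_cols] <- lapply(toy[flag_cols], function(x) x == "True")

str(toy)
```

With logical columns, expressions such as sum(toy$Junction) or mean(toy$Traffic_Signal) give counts and shares directly.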

Source of the data.

Kaggle: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents

What is the dataset about?

This is a countrywide car accident dataset that covers 49 states of the USA. The accident data were collected from February 2016 to March 2023, using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by various entities, including the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks.

What are your motivations for exploring this dataset?

Although this subject was not my first choice, I haven’t been able to find a dataset that fits what I originally wanted to do. As for this subject, I have been in a car accident before and it shook me up for a while. I won’t say that it was hard to drive afterwards, but it definitely sits in the back of my mind when I do drive. Being able to understand the common causes of car accidents could help ease my mind.

What questions do you want to answer? (broad)

What attributes tend to be most associated with severe car accidents?

Hypothesis

Severe car accidents are more common in rainy junctions than any other situation.
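One way this hypothesis could be checked is sketched below on a small made-up data frame. Two choices here are assumptions and not part of the original analysis: an accident counts as "severe" when Severity is 3 or higher, and a condition counts as "rainy" when Weather_Condition contains the text "Rain".

```r
# Toy stand-in for the accident data (values are made up for illustration)
toy <- data.frame(
  Severity = c(2, 3, 4, 2, 2, 3, 2, 4),
  Weather_Condition = c("Light Rain", "Rain", "Fair", "Fair",
                        "Heavy Rain", "Cloudy", "Rain", "Fair"),
  Junction = c("True", "True", "False", "True",
               "False", "False", "True", "False"),
  stringsAsFactors = FALSE
)

# Flag accidents that happened in rain AND near a junction
toy$Rainy_Junction <- grepl("Rain", toy$Weather_Condition) & toy$Junction == "True"

# Assumed definition of a severe accident: Severity of 3 or 4
toy$Severe <- toy$Severity >= 3

# Share of severe accidents inside and outside rainy junctions
severe_share <- tapply(toy$Severe, toy$Rainy_Junction, mean)
print(severe_share)
```

Comparing shares rather than raw counts matters because rainy junctions make up only a small part of all accidents, so their raw counts will always be low.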

Biases

One bias is that I have been in a car accident myself, so I may lean toward explanations that match my own situation. I also bring prior assumptions about driving, namely that rain and intersections create volatile driving situations.

Data Dictionary

The data dictionary below lists each variable in the dataset with its data type and a short description.


Variable Name	Data Type	Description

ID	String	Unique identifier for the accident record.

Source	String	Origin of the raw accident data.

Severity	Integer	Severity level of the accident, indicating its impact on traffic.

Start_Time	DateTime	Start time of the accident in local time zone.

End_Time	DateTime	End time when the accident’s impact on traffic was dismissed.

Start_Lat	Float	Latitude of the accident’s start point.

Start_Lng	Float	Longitude of the accident’s start point.

End_Lat	Float	Latitude of the accident’s end point.

End_Lng	Float	Longitude of the accident’s end point.

Distance(mi)	Float	Length of the road extent affected by the accident.

Description	String	Human-provided description of the accident.

Street	String	Street name where the accident occurred.

City	String	City where the accident occurred.

County	String	County where the accident occurred.

State	String	State where the accident occurred.

Zipcode	String	Zip code of the accident location.

Country	String	Country where the accident occurred.

Timezone	String	Timezone based on the accident’s location.

Airport_Code	String	Closest airport-based weather station to the accident location.

Weather_Timestamp	DateTime	Timestamp of the weather observation record in local time.

Temperature(F)	Float	Temperature at the time of the accident.

Wind_Chill(F)	Float	Wind chill at the time of the accident.

Humidity(%)	Float	Humidity percentage at the time of the accident.

Pressure(in)	Float	Atmospheric pressure at the time of the accident.

Visibility(mi)	Float	Visibility distance at the time of the accident.

Wind_Direction	String	Direction from which the wind was blowing.

Wind_Speed(mph)	Float	Wind speed at the time of the accident.

Precipitation(in)	Float	Precipitation amount at the time of the accident.

Weather_Condition	String	Weather condition during the accident.

Amenity	Boolean	Presence of an amenity near the accident location.

Bump	Boolean	Presence of a speed bump or hump near the accident location.

Crossing	Boolean	Presence of a crossing near the accident location.

Give_Way	Boolean	Presence of a give way sign near the accident location.

Junction	Boolean	Presence of a junction near the accident location.

No_Exit	Boolean	Presence of a no exit sign near the accident location.

Railway	Boolean	Presence of a railway near the accident location.

Roundabout	Boolean	Presence of a roundabout near the accident location.

Station	Boolean	Presence of a station near the accident location.

Stop	Boolean	Presence of a stop sign near the accident location.

Traffic_Calming	Boolean	Presence of traffic calming measures near the accident location.

Traffic_Signal	Boolean	Presence of a traffic signal near the accident location.

Turning_Loop	Boolean	Presence of a turning loop near the accident location.

Sunrise_Sunset	String	Period of the day based on sunrise/sunset.

Civil_Twilight	String	Period of the day based on civil twilight.

Nautical_Twilight	String	Period of the day based on nautical twilight.

Astronomical_Twilight	String	Period of the day based on astronomical twilight.

Data Cleaning

checking for null values

na_count_per_column <- colSums(is.na(data)) #count the total na values in each columns

print(na_count_per_column[na_count_per_column > 0]) #print total na

          End_Lat           End_Lng    Temperature.F.     Wind_Chill.F. 
           220377            220377             10466            129017 
      Humidity...      Pressure.in.    Visibility.mi.   Wind_Speed.mph. 
            11130              8928             11291             36987 
Precipitation.in. 
           142616

Given these results, I am going to delete the End_Lat, End_Lng, and Wind_Chill.F. columns, and then delete all rows in which any of the remaining variables is null. I will still have over 300,000 rows of data.

data <- subset(data, select = -c(End_Lat, End_Lng, Wind_Chill.F.)) #delete unnecessary columns
data <- na.omit(data) #omit all rows with NA values

recheck nulls to ensure validity

na_count_per_column <- colSums(is.na(data))
print(na_count_per_column[na_count_per_column > 0])

named numeric(0)
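One caveat on this check: with its default settings, read.csv() keeps blank cells in character columns as empty strings ("") rather than NA, so is.na() does not count them. The sketch below shows the difference on a small made-up data frame; whether the real dataset contains such blanks is not shown above, so this is only a way to look.

```r
# Toy stand-in with one blank per character column and one true NA
toy <- data.frame(
  City = c("Austin", "", "Lompoc"),
  Weather_Condition = c("Fair", "Rain", ""),
  Temperature.F. = c(27, NA, 68),
  stringsAsFactors = FALSE
)

# is.na() only sees the numeric NA
na_counts <- colSums(is.na(toy))
print(na_counts)

# Count empty strings in the character columns separately
chr_cols <- vapply(toy, is.character, logical(1))
blank_counts <- colSums(toy[chr_cols] == "")
print(blank_counts)

# Optionally turn blanks into NA so na.omit() treats them as missing
toy[chr_cols] <- lapply(toy[chr_cols], function(x) {
  x[x == ""] <- NA
  x
})
na_after <- colSums(is.na(toy))
```

An alternative is to pass na.strings = c("", "NA") to read.csv() so blanks become NA at import time.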

Data Visualizations

What is the distribution of accident severity?

Understanding the frequency of each accident severity level provides a foundational view of the dataset and helps determine where prevention efforts may be most effective. To explore this, I created a bar chart using ggplot2 to visualize the count of incidents across severity levels 1 through 4. The results show that the majority of accidents fall under Severity Level 2. Because Severity in this dataset measures an accident's impact on traffic rather than injuries, this means most accidents cause a moderate disruption to traffic. These findings suggest that targeting the causes of Level 2 accidents could lead to the most widespread improvements in road safety.

library(ggplot2)

#plot the distribution of the severity of the accidents
ggplot(data, aes(x = factor(Severity), fill = factor(Severity))) +
  geom_bar() +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Moderate accidents are most common",
       x = "Severity Level",
       y = "Count",
       fill = "Severity Level") +
  theme_minimal()

How does accident frequency vary by hour and weekday?

Identifying when accidents are most likely to occur is key for scheduling interventions such as traffic patrols, public safety announcements, or infrastructure changes. I extracted the hour and weekday from the Start_Time field using lubridate and plotted a heat map to examine accident frequency over time. The visualization revealed clear spikes during weekday rush hours—especially between 7–9 AM and 3–6 PM—implying that commuter traffic is a major factor in accident occurrence. These time-based trends can inform better planning of city resources and suggest that interventions should be concentrated during these high-risk windows.

# Extract hour and weekday from Start_Time
data <- data %>%
  mutate(
    Hour = hour(Start_Time),
    Day = wday(Start_Time, label = TRUE)  # ordered factor of day labels, Sun through Sat
  )

# Count number of accidents for each day-hour pair
heat_data <- data %>%
  count(Day, Hour)

# Plot the heat map
ggplot(heat_data, aes(x = Hour, y = Day, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Accidents", option = "C") +
  labs(
    title = "Clear Spike in Accidents During Commute Hours",
    x = "Hour of Day",
    y = "Day of Week"
  ) +
  theme_minimal()
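Start_Time is stored as text, and some values carry a trailing ".000000000" while others do not. A minimal sketch of one way to parse it into a date-time once, before extracting the hour and weekday, is shown below. The two sample strings are copied from the head() output above; stripping the fractional seconds and using tz = "UTC" to keep the recorded local clock time are assumptions, not steps from the original analysis.

```r
library(lubridate)

# Two Start_Time values in the two formats seen in the data
raw_times <- c("2019-06-12 10:10:56", "2022-12-03 23:37:14.000000000")

# Drop any trailing fractional seconds so every value has the same format
clean_times <- sub("\\.[0-9]+$", "", raw_times)

# Parse to POSIXct; "UTC" is used only so the clock time is kept unchanged
parsed <- ymd_hms(clean_times, tz = "UTC")

hour(parsed)                # hour of day: 10 and 23
wday(parsed, label = TRUE)  # weekday as an ordered factor
```

Parsing once and storing the result in a new column avoids re-parsing the text every time hour(), wday(), or month() is called.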

What happens to accident frequency when weather conditions change?

Weather is often assumed to be a major cause of traffic accidents, but it’s important to validate whether that assumption holds true in the data. To explore this, I counted the number of accidents associated with each unique weather condition and visualized the top 10 using a horizontal bar chart. Surprisingly, the vast majority of accidents happened under clear or mildly cloudy conditions like “Fair” and “Mostly Cloudy,” rather than during storms or snow. This finding challenges conventional wisdom and suggests that driver behavior and traffic density during normal weather may be more influential than the weather itself in causing accidents.

# Count accidents per weather condition and keep the 10 most frequent
weather_counts <- data %>%
  count(Weather_Condition, name = "Count") %>%
  slice_max(Count, n = 10)  # slice_max() supersedes top_n()

ggplot(weather_counts, aes(x = reorder(Weather_Condition, Count), y = Count, fill = Weather_Condition)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_brewer(palette = "Paired") +
  labs(title = "Top 10 Weather Conditions During Accidents",
       x = "Weather Condition",
       y = "Number of Accidents",
       fill = "Weather Condition") +
  theme_minimal()

How does average accident severity differ across cities?

While some cities may experience a high number of accidents, others may be more prone to severe incidents. This distinction is important for making localized improvements in road safety. I used aggregate() to calculate the average severity for each city, filtered to include only cities with at least 100 accidents, and plotted the top 10 cities by average severity using a bar chart. For comparison, a second bar chart shows the top 10 cities by total accident count. Cities like Saint Louis, Lansing, and Chicago ranked highest in severity, even though they don’t lead in total accident count. This indicates that certain urban environments may have underlying risk factors that lead to more dangerous outcomes, warranting further investigation.

# Calculate average severity per city
avg_severity <- aggregate(Severity ~ City, data = data, mean)

# Calculate count per city to filter out cities with small sample sizes
city_counts <- table(data$City)

# Look up each city's count by name and store it as a plain integer column
avg_severity$Count <- as.integer(city_counts[avg_severity$City])

# Keep only cities with at least 100 accidents
avg_severity_filtered <- avg_severity[avg_severity$Count >= 100, ]

# Get top 10 cities by average severity
top10 <- head(avg_severity_filtered[order(-avg_severity_filtered$Severity), ], 10)

# Plot
library(ggplot2)
ggplot(top10, aes(x = reorder(City, Severity), y = Severity, fill = Severity)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis_c() +
  labs(
    title = "Top 10 Cities by Average Accident Severity",
    x = "City",
    y = "Average Severity"
  ) +
  theme_minimal()
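The same per-city summary can also be written as a single dplyr pipeline that computes the count and the average together. The sketch below runs on a small made-up data frame, and the threshold is lowered to 2 so the toy data survives the filter. The names min_accidents, city_summary, and Avg_Severity are illustrative, not from the original code.

```r
library(dplyr)

# Toy stand-in for the accident data (values are made up for illustration)
toy <- data.frame(
  City = c("A", "A", "A", "B", "B", "C"),
  Severity = c(2, 3, 4, 2, 2, 4)
)

min_accidents <- 2  # the report uses 100 on the real data

# Count and average per city in one pass, then filter and sort
city_summary <- toy %>%
  group_by(City) %>%
  summarise(Count = n(), Avg_Severity = mean(Severity)) %>%
  filter(Count >= min_accidents) %>%
  arrange(desc(Avg_Severity))

print(city_summary)
```

City "C" has the highest severity in the toy data but only one accident, so the filter removes it. This is the same reason the report excludes cities with small samples.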

city_counts <- table(data$City)

# Convert to data frame
city_counts_df <- as.data.frame(city_counts)
colnames(city_counts_df) <- c("City", "Count")

# Sort by count (descending) and take top 10
top10_cities <- head(city_counts_df[order(-city_counts_df$Count), ], 10)

# Plot
library(ggplot2)
ggplot(top10_cities, aes(x = reorder(City, Count), y = Count, fill = Count)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis_c() +
  labs(
    title = "Top 10 Cities by Number of Accidents",
    x = "City",
    y = "Accident Count"
  ) +
  theme_minimal()

Is there a time in the year in which we see a spike in accidents?

This seasonal analysis shows that accidents peak during the winter months, followed closely by fall, while spring and summer see noticeably fewer incidents. The elevated accident count in winter may be driven by a combination of hazardous road conditions like ice and snow, reduced daylight, and increased travel around the holidays. Fall’s higher numbers could be influenced by back-to-school traffic and early seasonal weather changes. These findings suggest that colder seasons pose greater risks for drivers, and it’s in our best interest to focus road safety campaigns, resource planning, and traffic management efforts during these times of year.

# Add a Month column
data$Month <- month(data$Start_Time, label = TRUE)
ggplot(data, aes(x = Month)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Monthly Distribution of Accidents",
       x = "Month",
       y = "Number of Accidents") +
  theme_minimal()

# create seasons
data$Season <- factor(
  ifelse(month(data$Start_Time) %in% c(12, 1, 2), "Winter",
  ifelse(month(data$Start_Time) %in% c(3, 4, 5), "Spring",
  ifelse(month(data$Start_Time) %in% c(6, 7, 8), "Summer", "Fall"))),
  levels = c("Winter", "Spring", "Summer", "Fall")
)

# Plot accident count by season
library(ggplot2)

ggplot(data, aes(x = Season)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Early Sunsets Could be a Factor in Accident Spikes",
       x = "Season",
       y = "Number of Accidents") +
  theme_minimal()

Hypothesis

Based on the exploratory analysis, I observed that most accidents occurred during weekday rush hours, particularly between 7–9 AM and 3–6 PM, and under fair or mildly cloudy weather conditions. Additionally, accident frequency peaked during the winter months, followed by fall, while spring and summer saw noticeably fewer incidents. This seasonal trend suggests that factors like holiday travel, shorter daylight hours, and winter road conditions may contribute to increased accident risk, but even then, most accidents still occurred during clear weather.

These patterns challenge the common assumption that adverse weather is the primary cause of accidents and instead point to traffic volume and time of day as stronger contributors. From this, I hypothesize that accident frequency is more strongly influenced by traffic patterns than by weather conditions. This hypothesis is meaningful because it can help city planners, public safety officials, and traffic engineers prioritize interventions where they will have the greatest impact, targeting congestion and peak traffic hours rather than focusing solely on weather-related responses.

To fully test this hypothesis, additional data such as hourly traffic volume, congestion levels, and road type classifications would be needed. Analytical methods like multivariate regression and time series modeling would help isolate the effects of traffic versus weather. If the hypothesis is true, efforts should focus on managing traffic flow during high-volume periods; if false, more attention should be placed on preparing for and mitigating weather-related hazards.
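To illustrate what the regression step might look like, the sketch below fits a Poisson regression of accident counts on a rush-hour flag and a bad-weather flag. The counts are made up, the variable names Rush_Hour and Bad_Weather are hypothetical, and a real test would also need the traffic-volume data mentioned above. This is only an outline of the method under those assumptions.

```r
# Made-up accident counts for each combination of rush hour and bad weather
toy_counts <- data.frame(
  Accidents   = c(12, 30, 14, 33, 10, 28, 13, 31),
  Rush_Hour   = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  Bad_Weather = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Poisson regression: models the log of the expected count as a sum of effects
fit <- glm(Accidents ~ Rush_Hour + Bad_Weather,
           family = poisson(), data = toy_counts)

# Exponentiated coefficients are rate ratios (e.g. rush hour vs. not)
rate_ratios <- exp(coef(fit))
print(rate_ratios)
```

In this toy example the rush-hour rate ratio is much larger than the bad-weather one. On real data, the comparison would only be meaningful after accounting for how many vehicles are on the road in each condition.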

Executive Summary

This analysis explores patterns and risk factors associated with vehicle accidents using a national dataset. Key exploratory findings reveal that most accidents occur during weekday rush hours—specifically between 7–9 AM and 3–6 PM—suggesting a strong relationship between traffic congestion and accident frequency. Contrary to popular assumptions, the majority of accidents take place in fair or mildly cloudy weather, not during rain, fog, or snow. Seasonal analysis further supports this insight: Winter has the highest accident count, followed by Fall, while Spring and Summer experience fewer incidents overall. This suggests that factors such as holiday travel, shorter daylight hours, and increased congestion in colder months may play a role.

Based on these patterns, we hypothesize that traffic volume and time of day are more influential in accident frequency than adverse weather conditions. This hypothesis has practical implications for traffic engineers, public safety officials, and urban planners. If accurate, it would shift the focus of safety efforts away from weather-specific interventions toward congestion management strategies such as optimized traffic signal timing, enforcement during peak hours, or improved public transportation access.

To rigorously test this hypothesis, additional data is needed—specifically, real-time or historical traffic volume, congestion levels, and road type classifications. Statistical methods such as multivariate regression and time series modeling would help isolate the effects of traffic versus environmental factors on accident frequency.

For stakeholders, the main takeaway is that predictable human patterns—such as commuting times and seasonal travel—may drive accidents more than unpredictable weather events. Focusing resources on these high-risk time windows and travel periods could lead to meaningful reductions in accident rates.

library(ggplot2)
library(lubridate)

# Create necessary time features
data$Hour <- hour(data$Start_Time)
data$Weekday <- wday(data$Start_Time, label = TRUE)
data$Is_Weekday <- !data$Weekday %in% c("Sat", "Sun")
data$Month <- month(data$Start_Time)

# Assign seasons based on month
data$Season <- factor(
  ifelse(data$Month %in% c(12, 1, 2), "Winter",
  ifelse(data$Month %in% c(3, 4, 5), "Spring",
  ifelse(data$Month %in% c(6, 7, 8), "Summer", "Fall"))),
  levels = c("Winter", "Spring", "Summer", "Fall")
)

# Filter for weekdays only
weekday_data <- data[data$Is_Weekday == TRUE, ]

# Count accidents by hour and season
accidents_by_hour_season <- as.data.frame(table(weekday_data$Hour, weekday_data$Season))
colnames(accidents_by_hour_season) <- c("Hour", "Season", "Accidents")
accidents_by_hour_season$Hour <- as.numeric(as.character(accidents_by_hour_season$Hour))

# Plot
ggplot(accidents_by_hour_season, aes(x = Hour, y = Accidents, color = Season, group = Season)) +
  geom_line(linewidth = 1.2) +  # linewidth replaces size for lines as of ggplot2 3.4.0
  geom_point(size = 2) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Cold Dark Commutes",
    x = "Hour of Day",
    y = "Number of Accidents",
    color = "Season"
  ) +
  theme_minimal()