Our research question is ‘How do factors such as weather and location affect the rate of traffic accidents?’
Many car accidents still occur on the roads, and governments take a range of measures to prevent them.
Article 1) Heavy snow closes 42 schools across Shropshire (https://naver.me/FvEdkQJl)
This article reports that heavy snow caused a large number of car accidents. In addition to road controls, measures such as closing schools were taken. Weather is therefore one of several factors that affect traffic conditions.
Article 2) M5 Motorway lanes closed due to flooding after heavy rain (https://naver.me/Fcm8vg3h)
In the UK today, roads are regularly affected by weather such as heavy rain. When extreme weather is forecast, road controls are therefore put in place in advance, which can reduce car accidents. However, insight based on statistics would help when making road control decisions. It would also let us tell drivers which weather conditions, seasons, and regions call for extra attention.
Accordingly, we will classify accidents by variables such as weather and season and find out under which conditions the most accidents occur. We will also display the locations of accidents on a map to find out where they occur most frequently.
As a result, the final graphics will help us understand and highlight areas where extra attention to traffic accidents is needed.
We expect five benefits from analyzing this data:
1) Accident prevention and life protection: accidents can be prevented by understanding their causes.
2) Efficient resource allocation (police, guard rails): police patrols, emergency services, and road protections can be concentrated where accident rates are high or under conditions that make accidents more likely.
3) Policy development: the analysis provides useful information for urban and transportation infrastructure design.
4) Response to environmental change: extreme weather is increasing due to climate change. Understanding the relationship between weather and traffic accidents helps transportation policies adapt to new environmental conditions.
5) Driver safety training: the results can be used to educate drivers on safer driving habits.
The dataset we use is “Road traffic accidents”, published by Leeds City Council and downloaded from the UK government open data website (https://www.data.gov.uk/). It includes a total of 18 variables.
# First, we need to load the dataset. Since the data is saved in CSV format, the values are separated by commas, so we set sep = ",".
traffic <- read.table("Traffic%20accidents_2019_Leeds.csv",sep=",",header=TRUE)
glimpse(traffic)
## Rows: 1,907
## Columns: 18
## $ Reference.Number <chr> "58F1730", "58F1730", "58F1730", "58F1730", "58F1…
## $ Grid.Ref..Easting <int> 436147, 436147, 436147, 436147, 436147, 436147, 4…
## $ Grid.Ref..Northing <int> 434957, 434957, 434957, 434957, 434957, 434957, 4…
## $ Number.of.Vehicles <int> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 3, 3…
## $ Accident.Date <chr> "15/08/2019", "15/08/2019", "15/08/2019", "15/08/…
## $ Time..24hr. <int> 1812, 1812, 1812, 1812, 1812, 1812, 1812, 1007, 1…
## $ X1st.Road.Class <int> 3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 3, 6, 6, 6, 6, 3, 3…
## $ X1st.Road.Class...No <chr> "A6120", "A6120", "A6120", "A6120", "A6120", "A61…
## $ Road.Surface <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Lighting.Conditions <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 4, 1, 1, 1, 1, 1…
## $ Weather.Conditions <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Local.Authority <chr> "E08000035", "E08000035", "E08000035", "E08000035…
## $ Vehicle.Number <int> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2…
## $ Type.of.Vehicle <int> 11, 11, 11, 11, 11, 11, 11, 1, 9, 1, 1, 9, 9, 5, …
## $ Casualty.Class <int> 2, 2, 2, 2, 2, 2, 2, 1, 3, 1, 1, 3, 1, 1, 2, 1, 2…
## $ Casualty.Severity <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 3, 3…
## $ Sex.of.Casualty <int> 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1…
## $ Age.of.Casualty <int> 6, 9, 39, 5, 8, 48, 57, 54, 3, 45, 44, 16, 38, 35…
Below we introduce only the variables we are going to use.
The first variable is “Casualty Severity.” It has three levels (1 – Fatal, 2 – Serious, 3 – Slight). Only levels 1 and 3 will be used for the overall comparison by severity.
The second variable is “Accident Date.” It gives the date of the accident, and we will use it to determine the season.
The next variable is “Weather Conditions.” It has nine categories (1 – Fine without high winds, 2 – Raining without high winds, 3 – Snowing without high winds, 4 – Fine with high winds, 5 – Raining with high winds, 6 – Snowing with high winds, 7 – Fog or mist if hazard, 8 – Other, 9 – Unknown).
The last variables are “Grid Ref: Easting” and “Grid Ref: Northing.” These are coordinates, which we will use to locate each accident. Together, these variables provide the essential data for analyzing the causes and effects of accidents.
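The numeric weather codes listed above can be turned into readable labels before plotting. The following is only a minimal sketch in base R: the names `weather_labels` and `label_weather` are hypothetical and not part of the original analysis, and the input codes are toy values rather than rows from the dataset.

```r
# Lookup from numeric weather code to the label given in the variable description
weather_labels <- c(
  "1" = "Fine without high winds",
  "2" = "Raining without high winds",
  "3" = "Snowing without high winds",
  "4" = "Fine with high winds",
  "5" = "Raining with high winds",
  "6" = "Snowing with high winds",
  "7" = "Fog or mist if hazard",
  "8" = "Other",
  "9" = "Unknown"
)

# Hypothetical helper: convert integer codes into a labelled factor
label_weather <- function(codes) {
  factor(codes,
         levels = as.integer(names(weather_labels)),
         labels = unname(weather_labels))
}

# Toy input (not taken from the dataset)
example_codes <- c(1, 2, 8, 1)
labelled <- label_weather(example_codes)
table(labelled)
```

Keeping all nine levels in the factor means that categories with no accidents still appear (with a count of zero) in tables and legends.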
Here are bar plots that give a brief summary of the variables we are using. Note that only variables with an int or double data type are drawn as bar plots.
# variables that have a double or int data type (Date is excluded)
s_variables <- c("Grid.Ref..Easting", "Grid.Ref..Northing", "Weather.Conditions", "Casualty.Severity")
# all_of() selects the columns named in an external character vector
s_traffic <- traffic %>%
  select(all_of(s_variables))
# reshaping the data frame into long format in order to draw a graph for each variable
traffic_long <- s_traffic %>%
  pivot_longer(cols = everything(),
               names_to = "variable",
               values_to = "value")
# drawing bar plots for each variable
data_info1 <- ggplot(traffic_long, aes(x = value, y = after_stat(count), fill = variable)) +
  geom_bar() +
  facet_wrap(~ variable, scales = "free") +
  labs(title = "Bar Graphs of Each Variable", x = "Value", y = "Count") +
  theme_minimal()
data_info1
ggsave(here("data_info1.png"), plot = data_info1, bg = "white")
## Saving 7 x 5 in image
Here is a more detailed graph showing descriptive statistics of the dataset.
# variables that have a double or int data type (Date is excluded)
graph_traffic <- s_traffic %>%
pivot_longer(cols = everything(), names_to = "variable", values_to = "value") %>%
group_by(variable) %>%
summarise(
mean = mean(value),
sd = sd(value),
min = min(value),
max = max(value)
)
graph_data <- graph_traffic %>%
pivot_longer(cols = c(mean, min, max), names_to = "stat", values_to = "stat_value")
data_info2 <- ggplot(graph_data, aes(x = stat, y = stat_value)) +
geom_point(aes(color = stat), size = 3) +
geom_errorbar(
data = filter(graph_data, stat == "mean"),
aes(ymin = stat_value - sd, ymax = stat_value + sd),
width = 0.2, color = "black") +
facet_wrap(~variable, scales = "free") +
labs(
title = "Mean, Min, Max with SD for Each Variable",
x = "Statistic",
y = "Value"
) +
theme_minimal() +
theme(legend.position = "bottom")
data_info2
ggsave(here("data_info2.png"), plot = data_info2, bg = "white")
## Saving 7 x 5 in image
The full dataset contains more columns than we need, so we first selected only the columns we are going to use (“Grid.Ref..Easting”, “Grid.Ref..Northing”, “Accident.Date”, “Weather.Conditions”, “Casualty.Severity”).
The dates in the data frame are stored as character strings, so we create a new column, named “date”, in which they are converted to the Date type.
An extra column named “month” is also added. This column stores only the month, as a number.
# variables that we are going to use (mostly):
variables <- c("Grid.Ref..Easting", "Grid.Ref..Northing", "Accident.Date", "Weather.Conditions", "Casualty.Severity")
# all_of() selects the columns named in an external character vector
traffic_v <- traffic %>%
  select(all_of(variables))
traffic_date <- traffic_v %>%
mutate(date = as.Date(Accident.Date, format = "%d/%m/%Y"))
traffic_month <- traffic_date %>%
mutate(month = month(date))
traffic_month_v <- traffic_month
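A quick sanity check on the date conversion can be useful, because `as.Date()` returns `NA` for any string that does not match the given format. The sketch below uses base R only and toy strings (not rows from the dataset); the names `raw_dates`, `parsed`, `months`, and `n_failed` are illustrative.

```r
# Toy dates in the same day/month/year layout as Accident.Date;
# the last entry is deliberately not a date
raw_dates <- c("15/08/2019", "01/12/2019", "unknown")

# Parse with the same format string used in the analysis
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")

# Extract the month as an integer using base R
months <- as.integer(format(parsed, "%m"))

# Count entries that could not be parsed
n_failed <- sum(is.na(parsed))
n_failed
```

If `n_failed` were greater than zero on the real data, the affected rows would end up without a month and therefore without a season.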
We further divided the dataset according to severity.
# dataset with only Casualty.Severity = 1
traffic_1v <- traffic_month_v %>%
filter(Casualty.Severity == 1)
#dataset with only Casualty.Severity = 2
traffic_2v <- traffic_month_v %>%
filter(Casualty.Severity == 2)
#dataset with only Casualty.Severity = 3
traffic_3v <- traffic_month_v %>%
filter(Casualty.Severity == 3)
First of all, we make a scatterplot using only the coordinate data from the UK traffic dataset. We set the x-axis to the Easting coordinates and the y-axis to the Northing coordinates, so the graph visualizes the locations where the accidents happened.
Some further pre-processing is needed: we are going to draw graphs for accidents of severity 1 and severity 3, so we only need the Casualty.Severity variable and the coordinate variables.
variables_figure1 <- c("Grid.Ref..Easting", "Grid.Ref..Northing", "Casualty.Severity")
# all_of() selects the columns named in an external character vector
traffic_figure1 <- traffic_month_v %>%
  select(all_of(variables_figure1))
With the dataset reduced to the selected variables, the scatterplot is drawn using ggplot.
filtered_traffic <- traffic_figure1 %>%
filter(Casualty.Severity %in% c(1, 3))
coord_scatter <- ggplot(data = filtered_traffic, mapping = aes(x = Grid.Ref..Easting, y = Grid.Ref..Northing, color = as.factor(Casualty.Severity)))
figure1_1 <- coord_scatter +
geom_point(alpha = 0.6) +
scale_color_manual(values = c("red", "lightblue")) +
labs(color = "Casualty Severity")
figure1_1
ggsave(here("figure1_1.png"), plot = figure1_1, bg = "white")
## Saving 7 x 5 in image
However, identifying the exact locations of these accidents is challenging. Including a map of the UK alongside the scatterplot would offer valuable context and improve the visualization.
** reference : https://jcoliver.github.io/learn-r/017-open-street-map.html
** country map source : rnaturalearth and rnaturalearthdata, Made with Natural Earth.
** roadmap data source : https://www.openstreetmap.org/about
# preparing to draw UK map:
# saving UK map data into variable 'uk':
uk <- ne_countries(country = "united kingdom", type = "countries")
# preparing to draw UK road map:
# references: https://jcoliver.github.io/learn-r/017-open-street-map.html
# OpenStreetMap Data : https://www.openstreetmap.org/about
# getbb() returns the bounding box of the chosen area;
# we need the bounding box of the "Leeds" region of the UK.
# in order to draw roads, add_osm_feature() is used.
# with the "key = highway" option, we load road data.
# however, the road data is composed of many types of roads,
# so with the "value" option only the large, important roads are chosen.
# value = c("motorway", "trunk", "primary", "secondary")
# "motorway" roads are highways
# "trunk" roads are major roads that are not motorways
# "primary" roads connect city to city
# "secondary" roads are smaller than primary roads
# osmdata_sf() returns the data in sf format.
# in order to draw a map using geom_sf(), the data must be in sf format.
leeds_highway <- getbb(place_name = "Leeds") %>%
opq(timeout = 50) %>%
add_osm_feature(key = "highway", value = c("motorway", "trunk", "primary", "secondary")) %>%
osmdata_sf()
# drawing the map:
# the drawn map is saved in "street_plot" variable
street_plot <- ggplot() +
  geom_sf(data = leeds_highway$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          linewidth = 0.2) +  # line width is set with linewidth in ggplot2 >= 3.4.0
  theme_minimal()
Before drawing the map, additional data pre-processing is required because the coordinates in the dataset are stored in the Easting and Northing system. In order to plot these points on the map we just created, we must first convert them into the Latitude/Longitude coordinate system. To convert Easting/Northing coordinates to Longitude/Latitude, we need EPSG codes. For the UK, the conversion requires EPSG:27700 (British National Grid) as the source and EPSG:4326 (WGS 84) as the target.
# converting Easting/Northing system into Longitude/Latitude system:
sf_traffic1 <- st_as_sf(traffic_figure1, coords = c("Grid.Ref..Easting", "Grid.Ref..Northing"), crs = 27700)
sf_traffic_transformed1 <- st_transform(sf_traffic1, crs = 4326)
traffic_coords1 <- cbind(
traffic_figure1 %>% select(Casualty.Severity), # keeping Casualty.Severity
st_coordinates(sf_traffic_transformed1) # adding coordinates
)
filtered_traffic1_1and3 <- traffic_coords1 %>%
filter(Casualty.Severity %in% c(1, 3))
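To make the coordinate conversion described above concrete, here is a minimal sketch that converts a single point. It assumes the sf package is installed and uses the Easting/Northing pair of the first record shown in the `glimpse()` output. The names `one_point`, `pt_bng`, `pt_wgs`, and `lon_lat` are illustrative only.

```r
library(sf)

# Easting/Northing of the first record shown in the glimpse() output
one_point <- data.frame(easting = 436147, northing = 434957)

# Declare the coordinates as British National Grid (EPSG:27700) ...
pt_bng <- st_as_sf(one_point, coords = c("easting", "northing"), crs = 27700)

# ... and re-project them to WGS 84 longitude/latitude (EPSG:4326)
pt_wgs <- st_transform(pt_bng, crs = 4326)

# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude)
lon_lat <- st_coordinates(pt_wgs)
lon_lat
```

The resulting longitude should be a small negative number (west of Greenwich) and the latitude a little under 54, which is consistent with a location in the Leeds area.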
maps:
# entire UK map:
figure1_2 <- ggplot() +
geom_sf(data = uk, col = "#555555",
fill = "#DDDDDD", lwd = 0.3) +
geom_point(data = filtered_traffic1_1and3, aes(x = X, y = Y, color = factor(Casualty.Severity)),
size = 2, alpha = 0.6) +
facet_wrap(~Casualty.Severity) +
labs(
x = "Longitude",
y = "Latitude",
title = "Traffic Accidents in UK by Severity",
color = "Severity"
) +
theme_minimal()
figure1_2
ggsave(here("figure1_2.png"), plot = figure1_2, bg = "white")
## Saving 7 x 5 in image
# roadmap of Leeds:
# street_plot is the road map drawn before
figure1_3 <- street_plot +
geom_point(data = filtered_traffic1_1and3, aes(x = X, y = Y, color = factor(Casualty.Severity)),
size = 2, alpha = 0.6) +
facet_wrap(~Casualty.Severity) +
labs(
x = "Longitude",
y = "Latitude",
title = "Traffic Accidents in UK (in Leeds region) by Severity",
color = "Severity"
) +
theme_minimal()
ggsave(here("figure1_3.png"), plot = figure1_3, bg = "white")
## Saving 7 x 5 in image
figure1_3
Since the traffic data is concentrated in the Leeds region, plotting the scatterplot on the entire UK map resulted in all the points appearing clustered together. This makes it difficult to identify the precise locations of the accidents. Therefore, focusing on detailed roadmaps of the Leeds region is necessary to provide clearer insights.
Based on the maps, accidents with severity level 1 were relatively few and were concentrated in the city center. Accidents with severity level 3 were far more frequent and also clustered in the city center. In addition, many level 3 accidents were observed on specific roads outside the center, for instance near (1.7W, 53.90N) and (1.37W, 53.93N), and along the road that stretches horizontally around 53.73N.
The second figure explores whether the season has any effect on the occurrence of traffic accidents. Based on the dates (months) in the dataset, March to May are classified as spring, June to August as summer, September to November as autumn, and December to February as winter.
A dual (grouped) bar chart compares the number of ‘fatal’ and ‘slight’ traffic accidents (Casualty.Severity = 1 and 3) in each season.
# add a new column named season
traffic_season<- traffic_month_v%>%
mutate(season = case_when(
month %in% c(3, 4, 5) ~ "Spring",
month %in% c(6, 7, 8) ~ "Summer",
month %in% c(9, 10, 11) ~ "Autumn",
month %in% c(12, 1, 2) ~ "Winter",
TRUE ~ NA_character_ ))%>%
filter(Casualty.Severity %in% c(1, 3))
# count accidents per season and severity
traffic_summary <- traffic_season %>%
  group_by(season, Casualty.Severity) %>%
  summarise(count = n(), .groups = "drop")
# Modify the order of the seasons to spring, summer, autumn, and winter.
traffic_summary <- traffic_summary %>%
mutate(season = factor(season, levels = c("Spring", "Summer", "Autumn", "Winter")))
# dual (grouped) bar chart
Figure_2 <- ggplot(data = traffic_summary,
                   mapping = aes(x = season, y = count, fill = factor(Casualty.Severity), group = Casualty.Severity)) +
  geom_col(position = "dodge") +
  labs(x = "", y = "Number of Accidents", fill = "Casualty Severity",
       title = "Trends of Casualty Severity by Season") +
  scale_fill_manual(values = c("red", "lightblue"),
                    labels = c("Fatal", "Slight")) +
  theme_minimal() +
  coord_flip()
Figure_2
ggsave(here("figure2.png"), plot = Figure_2, bg = "white")
## Saving 7 x 5 in image
Overall, the number of slight traffic accidents is much higher than the number of fatal accidents in all four seasons. Slight accidents are most common in autumn and decrease in winter, while the numbers in spring and summer are similar.
Fatal traffic accidents occur most frequently in winter. However, due to the large difference in the number of slight and fatal accidents, the changes in the frequency of fatal accidents are not easily noticeable on a single chart.
To compare seasonal differences more closely, we separate the accidents into two plots, one for slight accidents (Casualty Severity = 3) and one for fatal accidents (Casualty Severity = 1). Because the two counts differ so much, the trend in the smaller group is hard to see on a shared axis, so facet_wrap() was used with a free y-axis to show the number of each type of accident clearly across seasons.
# For each severity level, flag the season with the highest number of traffic accidents.
traffic_summary_filter <- traffic_summary %>%
group_by(Casualty.Severity) %>%
mutate(is_max = ifelse(count == max(count), "Max", "Others")) %>%
ungroup()
# Plot. We want to highlight the season with the highest number of traffic accidents, so we color the bar for that season in pink and use grey for the other seasons to make the chart clearer.
Figure2_2 <- ggplot(data = traffic_summary_filter,
mapping = aes(x = season,
y = count,
fill = is_max))
Figure2_2 + geom_col(position = "dodge", alpha = 0.9) +
geom_text(data = subset(traffic_summary_filter, is_max == "Max"),
aes(label = count),
position = position_dodge(width = 1.2),
vjust = -0.5,
size = 3) +
labs(x = "",
y = "Number of Accidents",
color = "Casualty Severity",
title = "Trends of Casualty Severity by Season") +
facet_wrap(~Casualty.Severity,
scales = "free_y",
labeller = labeller(Casualty.Severity = c("1" = "Fatal", "3" = "Slight"))) +
theme_minimal() +
theme(strip.text = element_text(size = 10),
plot.title = element_text(size = 16, face = "bold")) +
scale_fill_manual(values = c("Max" = "pink", "Others" = "grey")) +
guides(fill = "none") -> Figure2_2
Figure2_2
ggsave(here("figure2_2.png"), plot = Figure2_2, bg = "white")
## Saving 7 x 5 in image
For fatal traffic accidents (Casualty Severity = 1), the number of incidents in winter is noticeably higher than in the other seasons, while the number is relatively low in spring and autumn. Winter is therefore the peak period for fatal traffic accidents in this dataset, and effective preventive measures should be taken during this season.
For slight traffic accidents (Casualty Severity = 3), autumn has the highest number of incidents, with a total of 436.
Maps: Slight traffic accidents (Casualty Severity = 3) occur most often in autumn, and fatal traffic accidents (Casualty Severity = 1) occur most often in winter. The following maps address two questions: where do slight accidents happen in autumn, and where do fatal accidents happen in winter?
#Plot the geographic locations of fatal accidents (Casualty Severity == 1) in the UK during winter.
season_1_max <- traffic_season%>%
select("Grid.Ref..Easting","Grid.Ref..Northing","Casualty.Severity","season")%>%
filter(Casualty.Severity == 1& season == "Winter")
# converting Easting/Northing system into Longitude/Latitude system:
sf_traffic1_season<- st_as_sf(season_1_max, coords = c("Grid.Ref..Easting", "Grid.Ref..Northing"), crs = 27700)
sf_season_transformed1 <- st_transform(sf_traffic1_season, crs = 4326)
season_1 <- cbind(
season_1_max %>% select("Casualty.Severity"),
st_coordinates(sf_season_transformed1)
)
The following plots fatal accidents (Casualty.Severity == 1) during winter on the Leeds road map. Locations of fatal traffic accidents are marked with red dots.
Figure2_3 <- street_plot +
geom_point(data = season_1, aes(x = X, y = Y),
size = 2, alpha = 0.6, color = "red") +
labs(
x = "Longitude",
y = "Latitude",
title = "Traffic Accidents (Severity 1) in UK, Leeds streets",
subtitle =":Most Frequent Locations in Winter"
)
Figure2_3
ggsave(here("figure2_3.png"), plot = Figure2_3, bg = "white")
## Saving 7 x 5 in image
Although fatal traffic accidents are spread out, they appear to be mostly concentrated at intersections where two roads cross (crossroads).
Plotting the geographic locations of slight accidents (Casualty Severity == 3) in the Leeds area during autumn.
season_3_max <- traffic_season%>%
select("Grid.Ref..Easting","Grid.Ref..Northing","Casualty.Severity","season")%>%
filter(Casualty.Severity == 3& season == "Autumn")
#converting Easting/Northing system into Longitude/Latitude system:
sf_traffic3_season<- st_as_sf(season_3_max, coords = c("Grid.Ref..Easting", "Grid.Ref..Northing"), crs = 27700)
sf_season_transformed3 <- st_transform(sf_traffic3_season, crs = 4326)
season_3 <- cbind(
season_3_max %>% select("Casualty.Severity"),
st_coordinates(sf_season_transformed3 )
)
The following plots slight accidents (Casualty.Severity == 3) during autumn on the Leeds road map. Locations of slight traffic accidents are marked with light blue dots. geom_rect() was used to highlight the area where these accidents are most concentrated.
Figure2_4 <- street_plot +
  geom_point(data = season_3, aes(x = X, y = Y),
             size = 2, alpha = 0.5, color = "lightblue") +
  # linewidth (not size) sets the border width of the rectangle in ggplot2 >= 3.4.0
  geom_rect(aes(xmin = -1.6, xmax = -1.5, ymin = 53.78, ymax = 53.83),
            fill = NA, color = "blue", linewidth = 1, linetype = "dashed") +
  geom_text(aes(x = -1.49, y = 53.839,
                label = "Most frequent location"),
            size = 4, color = "black", hjust = 0, vjust = 1) +
  labs(
    x = "Longitude",
    y = "Latitude",
    title = "Traffic Accidents (Severity 3) in UK, Leeds streets",
    subtitle = ":Most Frequent Locations in Autumn"
  )
Figure2_4
ggsave(here("figure2_4.png"), plot = Figure2_4, bg = "white")
## Saving 7 x 5 in image
Most slight traffic accidents in autumn are concentrated in certain areas, but some are more spread out. Notably, slight accidents occurred most often in the area highlighted with the blue box. In autumn, special attention should therefore be given to slight accidents in this area, and preventive measures should be taken.
Figure 3 focuses on the weather variables.
First, the frequency of traffic accidents of severity 1 for each weather category is computed and stored in the table named ‘weather_counts1’. The same is then done for accidents of severity 3, stored in the table named ‘weather_counts3’.
only_weather1 <- traffic_1v %>% select(`Weather.Conditions`)
weather_counts1 <- table(only_weather1)
weather_counts1
## Weather.Conditions
## 1 2 8
## 19 2 1
only_weather3 <- traffic_3v %>% select(`Weather.Conditions`)
weather_counts3 <- table(only_weather3)
weather_counts3
## Weather.Conditions
## 1 2 3 4 5 6 7 8 9
## 1330 172 3 17 20 1 1 6 1
Pie charts:
Having counted the number of accidents for each weather category, we now draw a graph to find out under which weather condition the most accidents occurred. We used pie charts for an easy comparison, and we look at the charts for severity 1 and severity 3 side by side.
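The percentages shown in the pie chart legends can also be derived directly from a frequency table, without typing the counts by hand. The sketch below uses base R only and rebuilds the severity 1 weather codes from the counts printed above (19, 2, and 1 for codes 1, 2, and 8). The names `fatal_weather`, `fatal_counts`, and `fatal_pct` are illustrative.

```r
# Rebuild the severity 1 weather codes from the counts printed above
fatal_weather <- rep(c(1, 2, 8), times = c(19, 2, 1))

# Frequency table of the weather codes
fatal_counts <- table(fatal_weather)

# prop.table() divides each count by the total; multiply by 100 for percentages
fatal_pct <- round(prop.table(fatal_counts) * 100, 1)
fatal_pct
```

With the real data, the same two calls could be applied to `weather_counts1` and `weather_counts3`, so the legend values would stay in sync with the tables if the dataset changed.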
# severity == 3
weather_conditions <- c(1330, 172, 3, 17, 20, 1, 1, 6, 1)
names(weather_conditions) <- 1:9
weather_conditions2 <- c(19, 2, 1)
names(weather_conditions2) <- c("1", "2", "8")
# percentage calculation
percentages1 <- round(weather_conditions / sum(weather_conditions) * 100, 1)
percentages2 <- round(weather_conditions2 / sum(weather_conditions2) * 100, 1)
custom_colors1 <- c("red", "orange", "yellow", "lightgreen", "darkgreen", "blue", "navy", "purple", "magenta")
custom_colors2 <- c("red", "orange", "purple")
# Saving the pie charts as PNG
png("figure3.png", width = 800, height = 400)
par(mfrow = c(1, 2)) # setting the layout
# first pie chart, severity == 3
pie(
weather_conditions,
labels = NULL,
main = "Weather Conditions (severity == 3)",
col = custom_colors1
)
legend("topleft",
legend = paste(names(weather_conditions), "(", percentages1, "%)", sep = ""),
fill = custom_colors1,
cex = 0.8
)
# second pie chart, severity == 1
pie(
weather_conditions2,
labels = NULL,
main = "Weather Conditions (severity == 1)",
col = custom_colors2
)
legend("topleft",
legend = paste(names(weather_conditions2), "(", percentages2, "%)", sep = ""),
fill = custom_colors2,
cex = 0.8
)
# resetting the layout to a single plot
par(mfrow = c(1, 1))
# Close the plotting device to save the file
dev.off()
## png
## 2
The analysis shows that, for both severity levels 1 and 3, the highest frequency of incidents occurred when weather == 1 (the category “Fine without high winds”), which represents calm, fine weather. This finding led us to the conclusion that such weather, while seemingly safe, might inadvertently contribute to more accidents because drivers become overly relaxed and less vigilant behind the wheel.
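One way to probe this interpretation further is to look at shares rather than raw counts, since fine weather also accounts for most records overall. The sketch below is only indicative and is not part of the original analysis. It uses the counts from the two tables printed above, ignores severity 2, and the fatal counts are very small. The names `fatal`, `slight`, and `fatal_share` are illustrative.

```r
# Counts copied from the weather_counts1 and weather_counts3 tables above
fatal  <- c(fine = 19, rain = 2)      # weather codes 1 and 2, severity 1
slight <- c(fine = 1330, rain = 172)  # weather codes 1 and 2, severity 3

# Percentage of fatal casualties among fatal + slight casualties, per weather code
fatal_share <- round(fatal / (fatal + slight) * 100, 2)
fatal_share
```

A comparison like this separates "how often does each weather condition appear in the data" from "how severe are the accidents under that condition", which raw counts alone cannot do.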