A Comparative Study on Road Accidents

Li Anqi (23069701), Rithana Gunasegaran (23085726), Sara Javed Sani (23063175), Sidharrthan A/L Pichyalagan (23068657), Lin Liy Yee (17007181)

1.0 Project Background

Road accidents are a persistent problem worldwide, with the World Health Organization (WHO) estimating that over 1.35 million people die each year due to road traffic accidents. In the United Kingdom, road accidents are a significant cause of injury and death, with the Royal Society for the Prevention of Accidents (RoSPA) reporting that in 2020, there were 1,460 fatalities and 24,831 serious injuries due to road accidents. The causes of road accidents are multifaceted and include factors such as driver behavior, vehicle safety, road design, and environmental conditions.

The primary objective of this study is to analyze the “Road Accidents Dataset” to identify trends and patterns in road accidents and to investigate the relationship between the severity of casualties and the Index of Multiple Deprivation (IMD) decile of their home areas. This analysis aims to provide insights into the factors contributing to road accidents and to inform strategies for improving road safety.

The “Road Accidents Dataset” provides a comprehensive collection of information on road accidents reported over multiple years. This dataset includes details on the accident status, vehicle and casualty references, demographics, and severity of casualties. Key attributes in the dataset cover pedestrian details, casualty types, road maintenance worker involvement, and the Index of Multiple Deprivation (IMD) decile for casualties’ home areas.

Previous studies have identified various factors contributing to road accidents, including driver behavior, vehicle safety, road design, and environmental conditions. For example, the WHO attributes the majority of road traffic deaths and injuries to human factors such as speeding, drunk driving, and failure to wear seatbelts. Similarly, the European Commission reports that road accidents remain a significant cause of injury and death, with the majority of road deaths occurring on rural roads.

2.0 Data Cleaning

The data cleaning process for the “Road Accidents Dataset” involved several steps to ensure the quality and integrity of the data. The first step was to read the CSV file into R using the read.csv function. This allowed us to access the data for further analysis. Next, we viewed the structure of the dataset using the str function. This provided us with information about the number of observations and variables in the dataset.

The dataset consisted of 62674 observations and 19 variables. We checked for missing values using sum(is.na(df.road)), which returned zero, indicating that there were no missing values in the dataset. Rows with unknown values were then removed to preserve data quality: the columns to check were specified, and any row containing -1 in one of them was filtered out.

Functions were defined to map age bands to intervals, classify casualty home areas, casualty classes, casualty severity, and casualty gender. These functions were used to create new columns with descriptive labels. The functions were applied to the dataset using the mutate function from the dplyr package. This created new columns with descriptive labels, making it easier to analyze and interpret the data.

In summary, the data cleaning process removed rows with unknown values, defined labelling functions for the categorical codes, and applied them to the dataset. The cleaned dataset was then used for further analysis and visualization.

3.0 Data Modeling

3.1 Regression

3.1.1 Data Preparation

  • Splitting the Data: Split the cleaned dataset into a training set (70% of the data) and a testing set (30% of the data).

  • Defining Predictors: Define the predictors as the variables that will be used to predict the class of casualties. These include vehicle_reference, sex_of_casualty, age_of_casualty, age_band_of_casualty, casualty_severity, pedestrian_location, pedestrian_movement, car_passenger, bus_or_coach_passenger, pedestrian_road_maintenance_worker, casualty_home_area_type, and casualty_imd_decile.

3.1.2 Model Training

  • Random Forest Model:

Train a random forest model using the training set. The model is trained with 500 trees and the number of variables tried at each split is set to the square root of the number of predictors.

3.1.3 Model Evaluation

  • Mean Squared Error (MSE):

Calculate the MSE as the average of the squared differences between the predicted and actual values.

  • Root Mean Squared Error (RMSE):

Calculate the RMSE as the square root of the MSE, providing a measure of the average magnitude of the errors.

  • R-squared:

Calculate the R-squared value, indicating the proportion of the variability in the target variable that is explained by the model. The formulas for all three metrics are given below.
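
For reference, the three metrics above take their standard forms, where $y_i$ denotes an actual value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the actual values, and $n$ the number of test observations:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$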

3.1.4 Model Visualization

  • Density Plot of Predicted and Actual Values: Visualize the density plot of the predicted and actual values to check if the model captures the underlying distribution of the data.

  • Box Plot of Predicted and Actual Values: Visualize the box plot of the predicted and actual values to check if the model’s predictions are consistent with the actual values.

  • Density Plot of Residuals: Visualize the density plot of the residuals to check if the model has significant bias and if the residuals are centered around zero.

3.2 Classification

The classification process for predicting casualty severity involves the following steps:

3.2.1 Data Preparation

  • Drop columns that won’t be used for prediction, such as collision_index, collision_reference, lsoa_of_casualty, age_interval, casualty_home.area_desc, casualty_class_desc, casualty_severity_desc, casualty_sex_desc, and car_passenger_desc.
  • Convert the casualty_severity column to a factor variable.
  • Separate the features (X) and the target variable (y).
  • Identify the numeric and categorical columns in the feature set (X).
  • Fill in missing values for numeric columns using the median, and for categorical columns using the mode.
  • Encode the categorical variables as factors.
  • Standardize the numeric features by centering and scaling (a preprocessing sketch follows this list).
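
Since Section 4.3 ultimately only shows the column-dropping and factor-conversion steps, the following is a minimal sketch (not the code actually run) of how the imputation, encoding, and scaling steps could be carried out in R. The Mode() helper is a hypothetical utility, and df.road_clean_modified is the prepared data frame from Section 4.3.

# Minimal preprocessing sketch (assumed data frame: df.road_clean_modified)
library(dplyr)
library(caret)

# Hypothetical helper: most frequent value of a vector
Mode <- function(x) names(sort(table(x), decreasing = TRUE))[1]

# Separate the target variable (y) and the features (X)
y <- as.factor(df.road_clean_modified$casualty_severity)
X <- df.road_clean_modified %>% select(-casualty_severity)

# Identify numeric and categorical columns
num_cols <- names(X)[sapply(X, is.numeric)]
cat_cols <- setdiff(names(X), num_cols)

# Impute missing values: median for numeric, mode for categorical
X <- X %>%
  mutate(across(all_of(num_cols), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)),
         across(all_of(cat_cols), ~ ifelse(is.na(.), Mode(.), .)))

# Encode the categorical variables as factors
X <- X %>% mutate(across(all_of(cat_cols), as.factor))

# Standardize the numeric features by centering and scaling
pre_proc <- preProcess(X[num_cols], method = c("center", "scale"))
X[num_cols] <- predict(pre_proc, X[num_cols])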

3.2.2 Train-Test Split

  • Split the data into a training set (70%) and a testing set (30%) using the createDataPartition function from the caret package.
  • Combine the X_train and y_train datasets for training.

3.2.3 Model Training

  • Initialize a list of different classification models, including Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and Neural Network.
  • Train each model using the train function from the caret package, with casualty_severity as the target variable and the remaining features as predictors (a sketch is given after this list).
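
The results in Section 4.3 ultimately use multinomial logistic regression, so the snippet below is only a sketch of how this multi-model step could look with caret. The method strings ("rpart", "rf", "gbm", "svmRadial", "nnet") are the usual caret wrappers for the five model families listed above, and train_data is assumed to be the combined training set from 3.2.2; the underlying packages (rpart, randomForest, gbm, kernlab, nnet) need to be installed.

# Sketch: fit several classifiers on shared cross-validation folds
library(caret)

set.seed(123)
folds <- createFolds(train_data$casualty_severity, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds)

model_methods <- c(DecisionTree     = "rpart",
                   RandomForest     = "rf",
                   GradientBoosting = "gbm",
                   SVM              = "svmRadial",
                   NeuralNetwork    = "nnet")

models <- lapply(model_methods, function(m) {
  train(casualty_severity ~ ., data = train_data,
        method = m, trControl = ctrl)
})

# Compare resampled accuracy and kappa across the fitted models
summary(resamples(models))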

3.2.4 Model Evaluation

  • The classification models will be evaluated on the testing set to assess their performance in predicting the casualty severity.

  • Appropriate evaluation metrics, such as accuracy, precision, recall, and F1-score, will be calculated to compare the models’ performance (the per-class definitions are given below).
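
For a given class $c$, writing $TP_c$, $FP_c$, and $FN_c$ for the true positives, false positives, and false negatives of that class, these metrics take their standard per-class forms (the macro-averaged values reported in Section 4.3 are simply their means over the three classes):

$$
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2\,\mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
$$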

3.2.5 Feature Importance

  • For the best-performing model, the feature importance will be analyzed to identify the most predictive variables in the model.
  • This information can provide insights into the key factors that influence the severity of road accidents (a brief sketch of this step follows).
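
Feature importance is not computed in the results section, so the lines below are only a sketch of how it could be extracted from a fitted caret model; best_model is a placeholder name for whichever train() object performs best.

# Sketch: variable importance for the best-performing model
library(caret)

importance <- varImp(best_model)  # best_model: a fitted train() object (placeholder)
print(importance)
plot(importance, top = 10)        # show the ten most influential predictors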

4.0 Results

4.1 Data Cleaning

 \[ \]:

# Read CSV
df.road <- read.csv("dft-road-casualty-statistics-casualty-provisional-mid-year-unvalidated-2023.csv")

# View the structure of the dataset
str(df.road)
'data.frame':  62674 obs. of  19 variables:
 $ collision_index                   : chr  "2023010419171" "2023010419183" "2023010419183" "2023010419189" ...
 $ collision_year                    : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ collision_reference               : chr  "010419171" "010419183" "010419183" "010419189" ...
 $ vehicle_reference                 : int  1 2 3 1 2 2 1 1 1 1 ...
 $ casualty_reference                : int  1 1 2 1 1 1 1 1 1 1 ...
 $ casualty_class                    : int  3 1 2 1 1 1 3 1 1 3 ...
 $ sex_of_casualty                   : int  2 1 2 1 1 1 1 1 1 1 ...
 $ age_of_casualty                   : int  20 25 38 50 34 24 65 22 20 33 ...
 $ age_band_of_casualty              : int  4 5 7 8 6 5 9 5 4 6 ...
 $ casualty_severity                 : int  3 3 3 3 3 3 3 3 3 3 ...
 $ pedestrian_location               : int  5 0 0 0 0 0 4 0 0 8 ...
 $ pedestrian_movement               : int  1 0 0 0 0 0 4 0 0 1 ...
 $ car_passenger                     : int  0 0 2 0 0 0 0 0 0 0 ...
 $ bus_or_coach_passenger            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pedestrian_road_maintenance_worker: int  0 0 0 0 0 0 0 0 0 0 ...
 $ casualty_type                     : int  0 9 9 9 1 3 0 1 9 0 ...
 $ casualty_home_area_type           : int  1 1 -1 1 1 -1 1 2 1 1 ...
 $ casualty_imd_decile               : int  10 3 -1 5 2 -1 5 10 3 8 ...
 $ lsoa_of_casualty                  : chr  "E01030370" "E01001546" "-1" "E01002443" ...

From the 19 variables, we note that the majority of the data types are integers. However, many of these integers encode categorical values, so we continue cleaning the data by defining descriptive labels for them.

 \[ \]:

summary(df.road)
 collision_index    collision_year collision_reference vehicle_reference
 Length:62674       Min.   :2023   Length:62674        Min.   :  1.000  
 Class :character   1st Qu.:2023   Class :character    1st Qu.:  1.000  
 Mode  :character   Median :2023   Mode  :character    Median :  1.000  
                    Mean   :2023                       Mean   :  1.468  
                    3rd Qu.:2023                       3rd Qu.:  2.000  
                    Max.   :2023                       Max.   :992.000  
 casualty_reference casualty_class  sex_of_casualty  age_of_casualty 
 Min.   : 1.000     Min.   :1.000   Min.   :-1.000   Min.   : -1.00  
 1st Qu.: 1.000     1st Qu.:1.000   1st Qu.: 1.000   1st Qu.: 22.00  
 Median : 1.000     Median :1.000   Median : 1.000   Median : 34.00  
 Mean   : 1.375     Mean   :1.491   Mean   : 1.358   Mean   : 36.95  
 3rd Qu.: 1.000     3rd Qu.:2.000   3rd Qu.: 2.000   3rd Qu.: 51.00  
 Max.   :70.000     Max.   :3.000   Max.   : 9.000   Max.   :102.00  
 age_band_of_casualty casualty_severity pedestrian_location pedestrian_movement
 Min.   :-1.000       Min.   :1.000     Min.   :-1.0000     Min.   :0.0000     
 1st Qu.: 5.000       1st Qu.:3.000     1st Qu.: 0.0000     1st Qu.:0.0000     
 Median : 6.000       Median :3.000     Median : 0.0000     Median :0.0000     
 Mean   : 6.315       Mean   :2.785     Mean   : 0.8087     Mean   :0.6634     
 3rd Qu.: 8.000       3rd Qu.:3.000     3rd Qu.: 0.0000     3rd Qu.:0.0000     
 Max.   :11.000       Max.   :3.000     Max.   :10.0000     Max.   :9.0000     
 car_passenger     bus_or_coach_passenger pedestrian_road_maintenance_worker
 Min.   :-1.0000   Min.   :-1.00000       Min.   :-1.00000                  
 1st Qu.: 0.0000   1st Qu.: 0.00000       1st Qu.: 0.00000                  
 Median : 0.0000   Median : 0.00000       Median : 0.00000                  
 Mean   : 0.2246   Mean   : 0.06234       Mean   : 0.03721                  
 3rd Qu.: 0.0000   3rd Qu.: 0.00000       3rd Qu.: 0.00000                  
 Max.   : 9.0000   Max.   : 9.00000       Max.   : 2.00000                  
 casualty_type    casualty_home_area_type casualty_imd_decile
 Min.   :-1.000   Min.   :-1.00           Min.   :-1.00      
 1st Qu.: 1.000   1st Qu.: 1.00           1st Qu.: 2.00      
 Median : 9.000   Median : 1.00           Median : 4.00      
 Mean   : 9.508   Mean   : 1.05           Mean   : 4.26      
 3rd Qu.: 9.000   3rd Qu.: 1.00           3rd Qu.: 7.00      
 Max.   :98.000   Max.   : 3.00           Max.   :10.00      
 lsoa_of_casualty  
 Length:62674      
 Class :character  
 Mode  :character  
                   
                   
                   

Proceeding with this, we check for missing values.

 \[ \]:

sum(is.na(df.road))

0

At this point, we have noted that there are no missing values within the dataset.

However, rows containing -1 values are removed, since -1 denotes an unknown value that would affect modelling.

 \[ \]:

# Install and load the dplyr package
install.packages("dplyr")
library(dplyr)

# Specify the columns to check for -1
columns_to_check <- c("vehicle_reference", "sex_of_casualty", "pedestrian_road_maintenance_worker",
                      "pedestrian_location", "lsoa_of_casualty", "casualty_type",
                      "casualty_imd_decile", "casualty_home_area_type", "car_passenger",
                      "age_of_casualty", "age_band_of_casualty")

# Filter out rows with -1 in any of the specified columns
df.road_clean <- df.road %>%
  filter(!if_any(all_of(columns_to_check), ~ . == -1) & sex_of_casualty != 9)

# Print the first few rows of the cleaned data frame to verify
head(df.road_clean)
str(df.road_clean)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
collision_index collision_year collision_reference vehicle_reference casualty_reference casualty_class sex_of_casualty age_of_casualty age_band_of_casualty casualty_severity pedestrian_location pedestrian_movement car_passenger bus_or_coach_passenger pedestrian_road_maintenance_worker casualty_type casualty_home_area_type casualty_imd_decile lsoa_of_casualty
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
1 2023010419171 2023 010419171 1 1 3 2 20 4 3 5 1 0 0 0 0 1 10 E01030370
2 2023010419183 2023 010419183 2 1 1 1 25 5 3 0 0 0 0 0 9 1 3 E01001546
3 2023010419189 2023 010419189 1 1 1 1 50 8 3 0 0 0 0 0 9 1 5 E01002443
4 2023010419191 2023 010419191 2 1 1 1 34 6 3 0 0 0 0 0 1 1 2 E01004679
5 2023010419198 2023 010419198 1 1 3 1 65 9 3 4 4 0 0 0 0 1 5 E01023593
6 2023010419201 2023 010419201 1 1 1 1 22 5 3 0 0 0 0 0 1 2 10 E01026413
'data.frame':  52065 obs. of  19 variables:
 $ collision_index                   : chr  "2023010419171" "2023010419183" "2023010419189" "2023010419191" ...
 $ collision_year                    : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ collision_reference               : chr  "010419171" "010419183" "010419189" "010419191" ...
 $ vehicle_reference                 : int  1 2 1 2 1 1 1 1 1 1 ...
 $ casualty_reference                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ casualty_class                    : int  3 1 1 1 3 1 1 3 1 1 ...
 $ sex_of_casualty                   : int  2 1 1 1 1 1 1 1 2 1 ...
 $ age_of_casualty                   : int  20 25 50 34 65 22 20 33 58 40 ...
 $ age_band_of_casualty              : int  4 5 8 6 9 5 4 6 9 7 ...
 $ casualty_severity                 : int  3 3 3 3 3 3 3 3 3 3 ...
 $ pedestrian_location               : int  5 0 0 0 4 0 0 8 0 0 ...
 $ pedestrian_movement               : int  1 0 0 0 4 0 0 1 0 0 ...
 $ car_passenger                     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bus_or_coach_passenger            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pedestrian_road_maintenance_worker: int  0 0 0 0 0 0 0 0 0 0 ...
 $ casualty_type                     : int  0 9 9 1 0 1 9 0 9 9 ...
 $ casualty_home_area_type           : int  1 1 1 1 1 2 1 1 1 1 ...
 $ casualty_imd_decile               : int  10 3 5 2 5 10 3 8 5 2 ...
 $ lsoa_of_casualty                  : chr  "E01030370" "E01001546" "E01002443" "E01004679" ...

 \[ \]:

# Get the total number of lines in the dataset
total_lines <- nrow(df.road)

print(total_lines)

# Get the total number of lines in the clean dataset
total_lines_clean <- nrow(df.road_clean)

print(total_lines_clean)

# Proportion of rows removed by the cleaning step
result <- 1 - (total_lines_clean / total_lines)

# Print the result
print(result)
[1] 62674
[1] 52065
[1] 0.1692727

4.1.1 Defining the variables

The variables defined describe age, casualty home area, casualty class, casualty severity, casualty gender, and car passenger status. They were defined to help us proceed to data visualisation.

Age

 \[ \]:

# Define a function to map age bands to intervals
map_age_band <- function(age_band) {
  if (age_band == 1) {
    return("0 - 5")
  } else if (age_band <= 10 && age_band > 0) {
    lower_bound <- (age_band - 1) * 5 + 1
    upper_bound <- age_band * 5
    return(paste(lower_bound, "-", upper_bound))
  } else if (age_band == -1) {
    return("Unknown")
  } else {
    return("> 50")
  }
}

# Add a new column using mutate
df.road_clean <- df.road_clean %>%
  mutate(age_interval = sapply(age_band_of_casualty, map_age_band))

head(df.road_clean)
A data.frame: 6 × 20
collision_index collision_year collision_reference vehicle_reference casualty_reference casualty_class sex_of_casualty age_of_casualty age_band_of_casualty casualty_severity pedestrian_location pedestrian_movement car_passenger bus_or_coach_passenger pedestrian_road_maintenance_worker casualty_type casualty_home_area_type casualty_imd_decile lsoa_of_casualty age_interval
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <chr> <chr>
1 2023010419171 2023 010419171 1 1 3 2 20 4 3 5 1 0 0 0 0 1 10 E01030370 16 - 20
2 2023010419183 2023 010419183 2 1 1 1 25 5 3 0 0 0 0 0 9 1 3 E01001546 21 - 25
3 2023010419189 2023 010419189 1 1 1 1 50 8 3 0 0 0 0 0 9 1 5 E01002443 36 - 40
4 2023010419191 2023 010419191 2 1 1 1 34 6 3 0 0 0 0 0 1 1 2 E01004679 26 - 30
5 2023010419198 2023 010419198 1 1 3 1 65 9 3 4 4 0 0 0 0 1 5 E01023593 41 - 45
6 2023010419201 2023 010419201 1 1 1 1 22 5 3 0 0 0 0 0 1 2 10 E01026413 21 - 25

Casualty Home Area

 \[ \]:

# Define a function to classify casualty home area
casualty_home.area_desc <- function(casualty_home_area_type) {
  if (casualty_home_area_type == 1) {
    return("Urban")
  } else if (casualty_home_area_type == 2) {
    return("Semi-urban")
  } else if (casualty_home_area_type == 3) {
    return("Rural")
  } else {
    return("Other")
  }
}

# Apply the function to create a new column
df.road_clean <- df.road_clean %>%
  mutate(casualty_home.area_desc = sapply(casualty_home_area_type, casualty_home.area_desc))

# Display the first few rows to check the result
head(df.road_clean)
A data.frame: 6 × 21
collision_index collision_year collision_reference vehicle_reference casualty_reference casualty_class sex_of_casualty age_of_casualty age_band_of_casualty casualty_severity ⋯ pedestrian_movement car_passenger bus_or_coach_passenger pedestrian_road_maintenance_worker casualty_type casualty_home_area_type casualty_imd_decile lsoa_of_casualty age_interval casualty_home.area_desc
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> ⋯ <int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr>
1 2023010419171 2023 010419171 1 1 3 2 20 4 3 ⋯ 1 0 0 0 0 1 10 E01030370 16 - 20 Urban
2 2023010419183 2023 010419183 2 1 1 1 25 5 3 ⋯ 0 0 0 0 9 1 3 E01001546 21 - 25 Urban
3 2023010419189 2023 010419189 1 1 1 1 50 8 3 ⋯ 0 0 0 0 9 1 5 E01002443 36 - 40 Urban
4 2023010419191 2023 010419191 2 1 1 1 34 6 3 ⋯ 0 0 0 0 1 1 2 E01004679 26 - 30 Urban
5 2023010419198 2023 010419198 1 1 3 1 65 9 3 ⋯ 4 0 0 0 0 1 5 E01023593 41 - 45 Urban
6 2023010419201 2023 010419201 1 1 1 1 22 5 3 ⋯ 0 0 0 0 1 2 10 E01026413 21 - 25 Semi-urban

Casualty classes

 \[ \]:

# Define a function to classify casualty classes
casualty_class_desc <- function(casualty_class) {
  if (casualty_class == 1) {
    return("Driver")
  } else if (casualty_class == 2) {
    return("Passenger")
  } else if (casualty_class == 3) {
    return("Pedestrian")
  } else {
    return("Other")
  }
}

# Apply the function to create a new column
df.road_clean <- df.road_clean %>%
  mutate(casualty_class_desc = sapply(casualty_class, casualty_class_desc))

# Define a function to classify casualty severity
casualty_severity_desc <- function(casualty_severity) {
  if (casualty_severity == 1) {
    return("Fatal")
  } else if (casualty_severity == 2) {
    return("Serious")
  } else if (casualty_severity == 3) {
    return("Slight")
  } else {
    return("Other")
  }
}

# Apply the function to create a new column
df.road_clean <- df.road_clean %>%
  mutate(casualty_severity_desc = sapply(casualty_severity, casualty_severity_desc))

# Define a function to classify casualty gender
casualty_sex_desc <- function(sex_of_casualty) {
  if (sex_of_casualty == 1) {
    return("Male")
  } else if (sex_of_casualty == 2) {
    return("Female")
  } else {
    return("Unknown")
  }
}

# Apply the function to create a new column
df.road_clean <- df.road_clean %>%
  mutate(casualty_sex_desc = sapply(sex_of_casualty, casualty_sex_desc))

# Define a function to classify car passenger
car_passenger_desc <- function(car_passenger) {
  if (car_passenger == 0) {
    return("Yes - Car")
  } else {
    return("No - Others")
  }
}

# Apply the function to create a new column
df.road_clean <- df.road_clean %>%
  mutate(car_passenger_desc = sapply(car_passenger, car_passenger_desc))

# Display the first few rows to check the result
head(df.road_clean)

# Check the summary of the cleaned data frame
str(df.road_clean)
A data.frame: 6 × 25
collision_index collision_year collision_reference vehicle_reference casualty_reference casualty_class sex_of_casualty age_of_casualty age_band_of_casualty casualty_severity ⋯ casualty_type casualty_home_area_type casualty_imd_decile lsoa_of_casualty age_interval casualty_home.area_desc casualty_class_desc casualty_severity_desc casualty_sex_desc car_passenger_desc
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> ⋯ <int> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2023010419171 2023 010419171 1 1 3 2 20 4 3 ⋯ 0 1 10 E01030370 16 - 20 Urban Pedestrian Slight Female Yes - Car
2 2023010419183 2023 010419183 2 1 1 1 25 5 3 ⋯ 9 1 3 E01001546 21 - 25 Urban Driver Slight Male Yes - Car
3 2023010419189 2023 010419189 1 1 1 1 50 8 3 ⋯ 9 1 5 E01002443 36 - 40 Urban Driver Slight Male Yes - Car
4 2023010419191 2023 010419191 2 1 1 1 34 6 3 ⋯ 1 1 2 E01004679 26 - 30 Urban Driver Slight Male Yes - Car
5 2023010419198 2023 010419198 1 1 3 1 65 9 3 ⋯ 0 1 5 E01023593 41 - 45 Urban Pedestrian Slight Male Yes - Car
6 2023010419201 2023 010419201 1 1 1 1 22 5 3 ⋯ 1 2 10 E01026413 21 - 25 Semi-urban Driver Slight Male Yes - Car
'data.frame':  52065 obs. of  25 variables:
 $ collision_index                   : chr  "2023010419171" "2023010419183" "2023010419189" "2023010419191" ...
 $ collision_year                    : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ collision_reference               : chr  "010419171" "010419183" "010419189" "010419191" ...
 $ vehicle_reference                 : int  1 2 1 2 1 1 1 1 1 1 ...
 $ casualty_reference                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ casualty_class                    : int  3 1 1 1 3 1 1 3 1 1 ...
 $ sex_of_casualty                   : int  2 1 1 1 1 1 1 1 2 1 ...
 $ age_of_casualty                   : int  20 25 50 34 65 22 20 33 58 40 ...
 $ age_band_of_casualty              : int  4 5 8 6 9 5 4 6 9 7 ...
 $ casualty_severity                 : int  3 3 3 3 3 3 3 3 3 3 ...
 $ pedestrian_location               : int  5 0 0 0 4 0 0 8 0 0 ...
 $ pedestrian_movement               : int  1 0 0 0 4 0 0 1 0 0 ...
 $ car_passenger                     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bus_or_coach_passenger            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pedestrian_road_maintenance_worker: int  0 0 0 0 0 0 0 0 0 0 ...
 $ casualty_type                     : int  0 9 9 1 0 1 9 0 9 9 ...
 $ casualty_home_area_type           : int  1 1 1 1 1 2 1 1 1 1 ...
 $ casualty_imd_decile               : int  10 3 5 2 5 10 3 8 5 2 ...
 $ lsoa_of_casualty                  : chr  "E01030370" "E01001546" "E01002443" "E01004679" ...
 $ age_interval                      : chr  "16 - 20" "21 - 25" "36 - 40" "26 - 30" ...
 $ casualty_home.area_desc           : chr  "Urban" "Urban" "Urban" "Urban" ...
 $ casualty_class_desc               : chr  "Pedestrian" "Driver" "Driver" "Driver" ...
 $ casualty_severity_desc            : chr  "Slight" "Slight" "Slight" "Slight" ...
 $ casualty_sex_desc                 : chr  "Female" "Male" "Male" "Male" ...
 $ car_passenger_desc                : chr  "Yes - Car" "Yes - Car" "Yes - Car" "Yes - Car" ...

4.1.2 Data Visualisation

 \[ \]:

hist(df.road_clean$age_of_casualty,
     main = "Distribution of Casualty Age",
     xlab = "Age",
     col = "skyblue",
     border = "white")


# Convert age_interval to a factor variable with custom levels
df.road_clean$age_interval <- factor(df.road_clean$age_interval, levels = c("0 - 5", "6 - 10", "11 - 15", "16 - 20", "21 - 25", "26 - 30", "31 - 35", "36 - 40", "41 - 45", "46 - 50", "> 50"))

# Tabulate the counts of each age interval
age_interval_counts <- table(df.road_clean$age_interval)

# Create a bar plot
barplot(age_interval_counts,
        main = "Distribution of Age Intervals",
        xlab = "Age Intervals",
        ylab = "Count",
        col = "skyblue",
        border = "white",
        las = 3)  # Rotate x-axis labels by 90 degrees

From the above histogram, we note that the majority of road casualties are adults in their 20s to 40s.

 \[ \]:

# Tabulate the counts of each casualty class description
class_counts <- table(df.road_clean$casualty_class_desc)

# Bar plot of casualty class descriptions
barplot(class_counts,
        main = "Frequency of Casualty Types",
        xlab = "Casualty Class",
        ylab = "Frequency",
        col = "skyblue",
        border ="white",
        names.arg = names(class_counts))

# Load necessary libraries
library(dplyr)
library(RColorBrewer)

# Calculate frequency of each casualty class description
class_counts <- table(df.road_clean$casualty_class_desc)

# Define a pastel color palette
pastel_palette <- brewer.pal(n = 5, name = "Pastel1")

# Create a pie chart
pie(class_counts,
    main = "Distribution of Casualty Class Descriptions",
    col = pastel_palette,
    cex = 0.8,
    labels = paste(names(class_counts), ": ", class_counts))

# Add a legend
legend("topright",
       legend = names(class_counts),
       fill = pastel_palette,
       cex = 0.8,
       title = "Casualty Class Desc.")

The dataset is mostly made up of driver casualties, followed by passengers and finally pedestrians.

 \[ \]:

# Show a table of casualty severity codes and their descriptions
severity_table <- table(df.road_clean$casualty_severity, df.road_clean$casualty_severity_desc)
severity_table


barplot(table(df.road_clean$casualty_class),
        main = "Frequency of Casualty Classes",
        xlab = "Casualty Class",
        ylab = "Frequency",
        col = "skyblue")


boxplot(df.road_clean$age_of_casualty ~ df.road_clean$casualty_severity,
        main = "Age Distribution by Casualty Severity",
        xlab = "Casualty Severity",
        ylab = "Age",
        col = "skyblue")

severity_by_class <- table(df.road_clean$casualty_class, df.road_clean$casualty_severity)
barplot(severity_by_class,
        main = "Casualty Severity by Casualty Class",
        xlab = "Casualty Class",
        ylab = "Frequency",
        col = c("skyblue", "salmon","purple"),
        legend = rownames(severity_by_class))
    Fatal Serious Slight
  1   551       0      0
  2     0    9768      0
  3     0       0  41746

 \[ \]:

# Count the frequency of each combination of casualty severity and gender
severity_gender_counts <- table(df.road_clean$casualty_sex_desc, df.road_clean$casualty_severity_desc)

# Plot the bar chart
barplot(severity_gender_counts,
        beside = TRUE,  # To plot bars side by side
        main = "Casualty Severity by Gender",
        xlab = "Frequency",
        ylab = "Casualty Severity",
        col = c("skyblue", "salmon","purple"),  # Specify colors for bars
        legend = rownames(severity_gender_counts),  # Add legend
        args.legend = list(title = "Gender"))  # Add legend title

 \[ \]:

# Install the gplots package if not already installed
install.packages("gplots")

# Load the gplots package
library(gplots)

# Filter out non-numeric columns
numeric_df <- df.road_clean[sapply(df.road_clean, is.numeric)]

# Remove columns with zero variance
numeric_df <- numeric_df[, sapply(numeric_df, function(x) var(x, na.rm = TRUE) != 0)]

# Calculate correlation matrix
correlation_matrix <- cor(numeric_df, use = "pairwise.complete.obs")

# Create the heatmap with correlation values
heatmap.2(correlation_matrix,
          main = "Correlation Heatmap of Numerical Variables",
          col = cm.colors(256),
          trace = "none",
          symm = TRUE,
          density.info = "none",
          key = TRUE,
          keysize = 1.2,
          key.title = NA,
          key.xlab = "Correlation",
          cexRow = 0.8,
          cexCol = 0.8,
          labRow = rownames(correlation_matrix),
          labCol = colnames(correlation_matrix),
          notecol = "black",
          notecex = 0.8,
          margins = c(10, 10))
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘bitops’, ‘gtools’, ‘caTools’



Attaching package: ‘gplots’


The following object is masked from ‘package:stats’:

    lowess

Based on the heatmap above, we can see notable relationships between the following:

pedestrian_location and casualty_class: a positive correlation between these variables indicates that the class of the casualty (driver, passenger, or pedestrian) is related to where the pedestrian was located at the time of the accident. This is expected, since pedestrian_location only takes a non-zero value when the casualty is a pedestrian.

casualty_class and car_passenger: the heatmap also indicates a correlation between the casualty class and whether the casualty was a car passenger. This is largely because passenger casualties (casualty_class = 2) are exactly the cases in which car_passenger is recorded, so the two columns partly encode the same information; in addition, passengers are themselves vulnerable to injury, depending on seating position, type of collision, and so on.

 \[ \]:

str(df.road_clean)
summary(df.road_clean)
'data.frame':  52065 obs. of  25 variables:
 $ collision_index                   : chr  "2023010419171" "2023010419183" "2023010419189" "2023010419191" ...
 $ collision_year                    : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ collision_reference               : chr  "010419171" "010419183" "010419189" "010419191" ...
 $ vehicle_reference                 : int  1 2 1 2 1 1 1 1 1 1 ...
 $ casualty_reference                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ casualty_class                    : int  3 1 1 1 3 1 1 3 1 1 ...
 $ sex_of_casualty                   : int  2 1 1 1 1 1 1 1 2 1 ...
 $ age_of_casualty                   : int  20 25 50 34 65 22 20 33 58 40 ...
 $ age_band_of_casualty              : int  4 5 8 6 9 5 4 6 9 7 ...
 $ casualty_severity                 : int  3 3 3 3 3 3 3 3 3 3 ...
 $ pedestrian_location               : int  5 0 0 0 4 0 0 8 0 0 ...
 $ pedestrian_movement               : int  1 0 0 0 4 0 0 1 0 0 ...
 $ car_passenger                     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bus_or_coach_passenger            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pedestrian_road_maintenance_worker: int  0 0 0 0 0 0 0 0 0 0 ...
 $ casualty_type                     : int  0 9 9 1 0 1 9 0 9 9 ...
 $ casualty_home_area_type           : int  1 1 1 1 1 2 1 1 1 1 ...
 $ casualty_imd_decile               : int  10 3 5 2 5 10 3 8 5 2 ...
 $ lsoa_of_casualty                  : chr  "E01030370" "E01001546" "E01002443" "E01004679" ...
 $ age_interval                      : Factor w/ 11 levels "0 - 5","6 - 10",..: 4 5 8 6 9 5 4 6 9 7 ...
 $ casualty_home.area_desc           : chr  "Urban" "Urban" "Urban" "Urban" ...
 $ casualty_class_desc               : chr  "Pedestrian" "Driver" "Driver" "Driver" ...
 $ casualty_severity_desc            : chr  "Slight" "Slight" "Slight" "Slight" ...
 $ casualty_sex_desc                 : chr  "Female" "Male" "Male" "Male" ...
 $ car_passenger_desc                : chr  "Yes - Car" "Yes - Car" "Yes - Car" "Yes - Car" ...
 collision_index    collision_year collision_reference vehicle_reference
 Length:52065       Min.   :2023   Length:52065        Min.   :  1.00   
 Class :character   1st Qu.:2023   Class :character    1st Qu.:  1.00   
 Mode  :character   Median :2023   Mode  :character    Median :  1.00   
                    Mean   :2023                       Mean   :  1.47   
                    3rd Qu.:2023                       3rd Qu.:  2.00   
                    Max.   :2023                       Max.   :992.00   
                                                                        
 casualty_reference casualty_class  sex_of_casualty age_of_casualty 
 Min.   : 1.000     Min.   :1.000   Min.   :1.000   Min.   :  0.00  
 1st Qu.: 1.000     1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 23.00  
 Median : 1.000     Median :1.000   Median :1.000   Median : 35.00  
 Mean   : 1.342     Mean   :1.456   Mean   :1.382   Mean   : 38.06  
 3rd Qu.: 1.000     3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.: 51.00  
 Max.   :70.000     Max.   :3.000   Max.   :2.000   Max.   :102.00  
                                                                    
 age_band_of_casualty casualty_severity pedestrian_location pedestrian_movement
 Min.   : 1.000       Min.   :1.000     Min.   : 0.0000     Min.   :0.0000     
 1st Qu.: 5.000       1st Qu.:3.000     1st Qu.: 0.0000     1st Qu.:0.0000     
 Median : 6.000       Median :3.000     Median : 0.0000     Median :0.0000     
 Mean   : 6.519       Mean   :2.791     Mean   : 0.7703     Mean   :0.6357     
 3rd Qu.: 8.000       3rd Qu.:3.000     3rd Qu.: 0.0000     3rd Qu.:0.0000     
 Max.   :11.000       Max.   :3.000     Max.   :10.0000     Max.   :9.0000     
                                                                               
 car_passenger    bus_or_coach_passenger pedestrian_road_maintenance_worker
 Min.   :0.0000   Min.   :-1.00000       Min.   :0.0000                    
 1st Qu.:0.0000   1st Qu.: 0.00000       1st Qu.:0.0000                    
 Median :0.0000   Median : 0.00000       Median :0.0000                    
 Mean   :0.2087   Mean   : 0.05232       Mean   :0.0362                    
 3rd Qu.:0.0000   3rd Qu.: 0.00000       3rd Qu.:0.0000                    
 Max.   :9.0000   Max.   : 4.00000       Max.   :2.0000                    
                                                                           
 casualty_type    casualty_home_area_type casualty_imd_decile
 Min.   : 0.000   Min.   :1.000           Min.   : 1.000     
 1st Qu.: 1.000   1st Qu.:1.000           1st Qu.: 2.000     
 Median : 9.000   Median :1.000           Median : 5.000     
 Mean   : 9.523   Mean   :1.281           Mean   : 4.918     
 3rd Qu.: 9.000   3rd Qu.:1.000           3rd Qu.: 7.000     
 Max.   :98.000   Max.   :3.000           Max.   :10.000     
                                                             
 lsoa_of_casualty    age_interval   casualty_home.area_desc casualty_class_desc
 Length:52065       26 - 30:11035   Length:52065            Length:52065       
 Class :character   31 - 35: 8441   Class :character        Class :character   
 Mode  :character   36 - 40: 6827   Mode  :character        Mode  :character   
                    21 - 25: 5695                                              
                    16 - 20: 5338                                              
                    41 - 45: 5193                                              
                    (Other): 9536                                              
 casualty_severity_desc casualty_sex_desc  car_passenger_desc
 Length:52065           Length:52065       Length:52065      
 Class :character       Class :character   Class :character  
 Mode  :character       Mode  :character   Mode  :character  
                                                             
                                                             
                                                             
                                                             

4.1.3 Summary of data cleaning

  1. We have removed rows with negative (unknown) ages, which contributed 2.2% of the entire population.
  2. We have added descriptive variable definitions for easy reference.
  3. We have noted that the bus_or_coach_passenger column overlaps with the car_passenger column, so it could be dropped to avoid repetition in the data.

4.2 Regression Modeling and Evaluation

Question: How can the class of casualties in road accidents be predicted based on accident characteristics, vehicle information, and other casualty demographics?

casualty_class value:

  • 1 - driver
  • 2 - passenger
  • 3 - pedestrian

 \[ \]:

# Install the package for splitting the train set and test set from the cleaned dataset
install.packages("caret")
library(caret)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘listenv’, ‘parallelly’, ‘future’, ‘globals’, ‘shape’, ‘future.apply’, ‘numDeriv’, ‘progressr’, ‘SQUAREM’, ‘diagram’, ‘lava’, ‘prodlim’, ‘proxy’, ‘iterators’, ‘Rcpp’, ‘clock’, ‘gower’, ‘hardhat’, ‘ipred’, ‘timeDate’, ‘e1071’, ‘foreach’, ‘ModelMetrics’, ‘plyr’, ‘pROC’, ‘recipes’, ‘reshape2’


Loading required package: ggplot2

Loading required package: lattice

 \[ \]:

# Create a subset for modeling
modelSet <- subset(df.road_clean, select = c(vehicle_reference, casualty_class, sex_of_casualty, age_of_casualty,
age_band_of_casualty, casualty_severity, pedestrian_location, pedestrian_movement, car_passenger, bus_or_coach_passenger,
pedestrian_road_maintenance_worker, casualty_home_area_type, casualty_imd_decile))

# Set random seed for reproducibility
set.seed(123)

# Create training set index with 70% of the data
trainIndex <- createDataPartition(modelSet$casualty_class, p = 0.7, list = FALSE)

# Split the data into training and testing sets based on the index
trainSet <- modelSet[trainIndex, ]
testSet <- modelSet[-trainIndex, ]

# View the number of rows in each dataset
nrow(trainSet)
nrow(testSet)

36446

15619

 \[ \]:

# Install the package for random forest modeling
install.packages("randomForest")
library(randomForest)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

randomForest 4.7-1.1

Type rfNews() to see new features/changes/bug fixes.


Attaching package: ‘randomForest’


The following object is masked from ‘package:ggplot2’:

    margin


The following object is masked from ‘package:dplyr’:

    combine

 \[ \]:

# Define the predictors
predictors <- setdiff(names(trainSet), "casualty_class")

# Train the Random Forest model
# Convert the data to a data.frame for randomForest function
trainSet <- as.data.frame(trainSet)
testSet <- as.data.frame(testSet)

# Create the formula for the Random Forest model
formula <- as.formula(paste("casualty_class", "~", paste(predictors, collapse = " + ")))

# Train the model
set.seed(123)  # For reproducibility
rf_model <- randomForest(formula, data = trainSet, ntree = 500, mtry = floor(sqrt(length(predictors))), importance = TRUE)

# Summarize the model
print(rf_model)

# Make predictions on the test set
predictions <- predict(rf_model, testSet)
Warning message in randomForest.default(m, y, ...):
“The response has five or fewer unique values.  Are you sure you want to do regression?”
Call:
 randomForest(formula = formula, data = trainSet, ntree = 500,      mtry = floor(sqrt(length(predictors))), importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 3

          Mean of squared residuals: 0.01283855
                    % Var explained: 97.58

 \[ \]:

# Evaluate the model

# Calculate Mean Squared Error (MSE)
actuals <- testSet[["casualty_class"]]
mse <- mean((predictions - actuals)^2)

# Calculate Root Mean Squared Error (RMSE)
rmse <- sqrt(mse)

# Calculate R-squared
ss_total <- sum((actuals - mean(actuals))^2)
ss_residual <- sum((actuals - predictions)^2)
r_squared <- 1 - (ss_residual / ss_total)

# Print evaluation metrics
cat("Mean Squared Error:", mse, "\n")
cat("Root Mean Squared Error:", rmse, "\n")
cat("R-squared:", r_squared, "\n")
Mean Squared Error: 0.01388033 
Root Mean Squared Error: 0.1178148 
R-squared: 0.9737804 

Low MSE and Low RMSE: These metrics indicate that the prediction errors are small on average, meaning the model's predictions are very close to the actual values.

High R-squared: This indicates that the model explains a large proportion of the variability in the target variable, suggesting a very good fit.

In the context of predictive modeling, a low Mean Squared Error (MSE) and a low Root Mean Squared Error (RMSE) indicate that the model’s predictions are, on average, very close to the actual values. MSE measures the average squared difference between predicted and actual values, while RMSE expresses the average magnitude of these errors in the same units as the target variable. When both MSE and RMSE are low, the model’s predictions are accurate and deviate little from the true values.

Additionally, a high R-squared (R²) value indicates that the model can explain a large proportion of the variability observed in the target variable. R² ranges from 0 to 1, with 1 indicating that the model perfectly predicts the target variable based on the predictors. A high R-squared suggests that the model fits the data well, capturing a significant portion of the variance in the target variable and thereby demonstrating its efficacy in explaining the relationships between predictors and the outcome.

Together, low MSE, low RMSE, and high R-squared collectively signify that the predictive model performs well in terms of accuracy and explanatory power, making it a reliable tool for making predictions and understanding the underlying relationships within the data.

 \[ \]:

# Install ggplot2 for visualization
install.packages("ggplot2")
library(ggplot2)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

 \[ \]:

# Density Plot of Predicted and Actual Values
plot(density(actuals),
     main = "Density Plot of Actual vs Predicted Values",
     xlab = "Values",
     col = "salmon", lwd = 2)
lines(density(predictions), col = "purple", lwd = 2)
legend("topright", legend = c("Actual", "Predicted"), col = c("salmon", "purple"), lwd = 2)

# Box Plot of Predicted and Actual Values
boxplot(actuals, predictions,
        names = c("Actual", "Predicted"),
        main = "Box Plot of Actual vs Predicted Values",
        col = c("salmon", "purple"))

# Density Plot of Residuals
residuals <- actuals - predictions
ggplot(data.frame(residuals), aes(x = residuals)) +
  geom_density(fill = "purple", alpha = 0.5) +
  labs(title = "Density Plot of Residuals",
       x = "Residuals",
       y = "Density")

4.2.1 Output Discussion

1. Density Plot of Actual vs. Predicted Values

The peaks and shapes of both densities are very similar, which means that the predicted values follow the same distribution as the actual values.

Small deviations may exist, but the model captures the overall underlying distribution of the actual values well.

2. Box Plot of Actual vs. Predicted Values

Both distributions appear similar, indicating that the predictions are closely aligned with the actual values.

There are no extreme outliers, suggesting that the model's predictions are consistent with the actual data.

3. Density Plot of Residuals

The residuals are centered around zero, indicating that the model does not have significant bias.

The peak at zero is sharp, suggesting that most of the residuals are small, which is a good sign of model accuracy.

The spread of the residuals is relatively narrow, indicating that the model's predictions are consistently close to the actual values.

These visualizations collectively provide insights into the model’s performance in predicting casualty classes in road accidents, showcasing its ability to closely match actual data distributions and minimize prediction errors effectively.

4.3 Classification Modeling and Evaluation

Question: Can machine learning techniques effectively predict casualty severity, and which variables are most predictive in the model?

Since casualty severity is a multinomial (multi-class) classification problem, multinomial logistic regression can be used.

 \[ \]:

# Load necessary libraries
if(!require(caret)) {
    install.packages("caret")
    library(caret)
}

if(!require(nnet)) {
    install.packages("nnet")
    library(nnet)
}
Loading required package: e1071

Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
✔ readr     2.1.5     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ purrr::%||%()           masks base::%||%()
✖ randomForest::combine() masks dplyr::combine()
✖ dplyr::filter()         masks stats::filter()
✖ dplyr::lag()            masks stats::lag()
✖ purrr::lift()           masks caret::lift()
✖ randomForest::margin()  masks ggplot2::margin()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: gbm

Warning message in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
“there is no package called ‘gbm’”
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

Loaded gbm 2.1.9

This version of gbm is no longer under development. Consider transitioning to gbm3, https://github.com/gbm-developers/gbm3

Loading required package: nnet

 \[ \]:

head(df.road_clean)
collision_index collision_year collision_reference vehicle_reference casualty_reference casualty_class sex_of_casualty age_of_casualty age_band_of_casualty casualty_severity ⋯ casualty_type casualty_home_area_type casualty_imd_decile lsoa_of_casualty age_interval casualty_home.area_desc casualty_class_desc casualty_severity_desc casualty_sex_desc car_passenger_desc
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> ⋯ <int> <int> <int> <chr> <fct> <chr> <chr> <chr> <chr> <chr>
1 2023010419171 2023 010419171 1 1 3 2 20 4 3 ⋯ 0 1 10 E01030370 16 - 20 Urban Pedestrian Slight Female Yes - Car
2 2023010419183 2023 010419183 2 1 1 1 25 5 3 ⋯ 9 1 3 E01001546 21 - 25 Urban Driver Slight Male Yes - Car
3 2023010419189 2023 010419189 1 1 1 1 50 8 3 ⋯ 9 1 5 E01002443 36 - 40 Urban Driver Slight Male Yes - Car
4 2023010419191 2023 010419191 2 1 1 1 34 6 3 ⋯ 1 1 2 E01004679 26 - 30 Urban Driver Slight Male Yes - Car
5 2023010419198 2023 010419198 1 1 3 1 65 9 3 ⋯ 0 1 5 E01023593 41 - 45 Urban Pedestrian Slight Male Yes - Car
6 2023010419201 2023 010419201 1 1 1 1 22 5 3 ⋯ 1 2 10 E01026413 21 - 25 Semi-urban Driver Slight Male Yes - Car

 \[ \]:

# Drop columns that won't be used for prediction
columns_to_drop <- c('collision_index', 'collision_reference', 'lsoa_of_casualty',
                     'age_interval', 'casualty_home.area_desc', 'casualty_class_desc',
                     'casualty_severity_desc', 'casualty_sex_desc', 'car_passenger_desc')
df.road_clean_modified <- df.road_clean %>% select(-one_of(columns_to_drop))

# Convert the outcome variable to a factor if it is not already.

df.road_clean_modified$casualty_severity <- as.factor(df.road_clean_modified$casualty_severity)

 \[ \]:

# Split data into training and testing sets
set.seed(123)
train_index <- sample(seq_len(nrow(df.road_clean_modified)), size = 0.7*nrow(df.road_clean_modified))
train_data <- df.road_clean_modified[train_index, ]
test_data <- df.road_clean_modified[-train_index, ]

 \[ \]:

# Train the multinomial logistic regression model
multinom_model <- multinom(casualty_severity ~ ., data = train_data)
# weights:  51 (32 variable)
initial  value 40038.924861 
iter  10 value 20200.784998
iter  20 value 19995.322055
iter  30 value 19252.683059
iter  40 value 18961.170221
iter  50 value 18958.209543
iter  60 value 18947.713461
final  value 18947.521949 
converged

 \[ \]:

# Make predictions
predictions <- predict(multinom_model, newdata = test_data)

 \[ \]:

# Evaluate the model
conf_matrix <- confusionMatrix(predictions, test_data$casualty_severity)
print(conf_matrix)
Confusion Matrix and Statistics

          Reference
Prediction     1     2     3
         1     0     0     0
         2     2     6     7
         3   163  2924 12516

Overall Statistics
                                         
               Accuracy : 0.8018         
                 95% CI : (0.7954, 0.808)
    No Information Rate : 0.8018         
    P-Value [Acc > NIR] : 0.5128         
                                         
                  Kappa : 0.0026         
                                         
 Mcnemar's Test P-Value : <2e-16         

Statistics by Class:

                     Class: 1  Class: 2 Class: 3
Sensitivity           0.00000 0.0020478 0.999441
Specificity           1.00000 0.9992907 0.002585
Pos Pred Value            NaN 0.4000000 0.802153
Neg Pred Value        0.98944 0.8126001 0.533333
Prevalence            0.01056 0.1876040 0.801831
Detection Rate        0.00000 0.0003842 0.801383
Detection Prevalence  0.00000 0.0009604 0.999040
Balanced Accuracy     0.50000 0.5006692 0.501013

 \[ \]:

# Overall accuracy
accuracy <- conf_matrix$overall['Accuracy']
cat("Accuracy: ", accuracy, "\n")

# Class-specific metrics
class_metrics <- conf_matrix$byClass

# Macro-averaged metrics
macro_avg_precision <- mean(class_metrics[, 'Precision'], na.rm = TRUE)
macro_avg_recall <- mean(class_metrics[, 'Recall'], na.rm = TRUE)
macro_avg_f1 <- mean(class_metrics[, 'F1'], na.rm = TRUE)
macro_avg_specificity <- mean(class_metrics[, 'Specificity'], na.rm = TRUE)

cat("Macro-Averaged Precision: ", macro_avg_precision, "\n")
cat("Macro-Averaged Recall: ", macro_avg_recall, "\n")
cat("Macro-Averaged F1 Score: ", macro_avg_f1, "\n")
cat("Macro-Averaged Specificity: ", macro_avg_specificity, "\n")
Accuracy:  0.8017672 
Macro-Averaged Precision:  0.6010767 
Macro-Averaged Recall:  0.3338296 
Macro-Averaged F1 Score:  0.4470349 
Macro-Averaged Specificity:  0.6672918 

4.3.1 Evaluation of Classification Model

Referring to the confusion matrix, the model performs poorly for Class 1, with no correct predictions. Class 2 has a handful of correct predictions but is also misclassified frequently. Class 3 is predicted well, owing to its high prevalence in the dataset.

The model achieved an accuracy of 80.18%, indicating high overall correctness in its predictions, but this is inflated by the prevalence of Class 3: the accuracy is no better than the no-information rate.

Kappa is very low (0.0026), indicating poor agreement beyond what would be expected by chance.

The macro-averaged precision of 0.6010767 indicates that, on average, the model's precision across all classes is about 60.11%. A precision of 60.11% suggests that when the model predicts a certain class, it is correct 60.11% of the time on average.

The recall, at 33.38%, indicates that the model correctly identifies only a third of the actual instances for each class on average, highlighting a significant shortfall in sensitivity.

The F1 score, which balances precision and recall, stands at 44.70%, reflecting the overall harmonic mean of these two metrics and underscoring the trade-off between them.

The specificity of 66.73% shows how well the model correctly identifies cases that do not belong to a particular class. Overall, these results suggest that although the model is generally good at making correct predictions, it has difficulty in correctly identifying all the actual instances of each class.

In conclusion, the model has low recall and moderate F1 score, despite having high accuracy and moderate precision. This might be due to class imbalance where one class is much more frequent than the others in the training data. More complex algorithms such as random forest and gradient boosting machine models with class weight adjustment might be more suitable for the prediction.
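
As a possible follow-up (not run in this study), the sketch below shows one way such an adjustment could look, using a class-weighted random forest. The weights are illustrative placeholders rather than tuned values, and train_data and test_data are the splits from Section 4.3.

# Sketch: class-weighted random forest for the imbalanced severity classes
library(randomForest)
library(caret)

set.seed(123)
rf_weighted <- randomForest(casualty_severity ~ .,
                            data    = train_data,
                            ntree   = 500,
                            classwt = c("1" = 50, "2" = 5, "3" = 1))  # illustrative class priors

pred_weighted <- predict(rf_weighted, newdata = test_data)
confusionMatrix(pred_weighted, test_data$casualty_severity)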

5.0 Conclusion

In this comparative study on road accidents using machine learning, we explored predictive models to analyze road casualties using a comprehensive dataset, predicting the class of casualty with a random forest and the casualty severity with multinomial logistic regression. Our findings highlight the potential of machine learning techniques for this task, with variables describing casualty demographics, the casualty’s role in the collision, and accident details contributing to model performance. At the same time, the evaluation showed that severity prediction achieved high overall accuracy but was limited by class imbalance, which skews predictions towards the most common (slight) severity class.

Furthermore, our study underscored the importance of data-driven approaches in understanding the underlying factors contributing to road accidents. By leveraging advanced analytics, we not only identified significant predictors but also gained insights that can inform targeted interventions and policy decisions aimed at improving road safety and reducing casualties. Moving forward, continued research in this domain could focus on enhancing model robustness, incorporating real-time data streams, and expanding predictive capabilities to further enhance road accident prevention strategies.

Overall, this study contributes to the growing body of knowledge on applying machine learning in transportation safety, emphasizing its potential to mitigate risks and promote safer road environments for all stakeholders involved.

6.0 Reference List

  • R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

  • R Documentation. (2024). confusionMatrix (caret). https://www.rdocumentation.org/packages/caret/versions/6.0-96/topics/confusionMatrix