Analysis of Airline Safety Dataset

Data Preparation

library(tidyverse)

Motivation

Motivation: The motivation behind this analysis goes beyond simple counts of accidents and incidents. We aim to delve into the rates of accidents, such as accidents per 100,000 flight miles. This metric allows us to account for changes in flight activity over time. For instance, if the number of miles flown doubles, but the accident count only increases by 20%, it could indicate an improvement in flight safety.

This approach aligns with industry best practices and allows for a more nuanced evaluation of safety measures, providing valuable insights for both the aviation industry and the broader public. Our goal is not only to identify patterns but also to contribute to the ongoing efforts to enhance aviation safety and inform decision-making in the industry.

Data collection

The dataset is hosted on kaggle so we are using from kaggle

Type of study

Additionally, we will explore various metrics beyond accident counts, including rates of fatalities and potentially other criteria. By examining these multiple dimensions, we can gain a more holistic understanding of airline safety and identify trends that might be obscured by raw counts alone.

Data Source

The data set has been made available by kaggle at the following link

Dataset Descriptions

airline —————–> Airline (asterisk indicates that regional subsidiaries are included) avail_seat_km_per_week —-> Available seat kilometers flown every week incidents_85_99 ———–>Total number of incidents, 1985–1999 fatal_accidents_85_99——> Total number of fatal accidents, 1985–1999 fatalities_85_99 ——–>Total number of fatalities, 1985–1999 incidents_00_14 ———-> Total number of incidents, 2000–2014 fatal_accidents_00_14—–> Total number of fatal accidents, 2000–2014 fatalities_00_14———> Total number of fatalities, 2000–2014

Relevant summary statistics

Import Dataset

# Load necessary libraries
library(readr)

# Load the dataset
dataset  <- read_csv("airline-safety_csv.csv")

## Rows: 56 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): airline
## dbl (7): avail_seat_km_per_week, incidents_85_99, fatal_accidents_85_99, fat...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(dataset)

# Summary statistics
summary(dataset)

##    airline          avail_seat_km_per_week incidents_85_99 
##  Length:56          Min.   :2.594e+08      Min.   : 0.000  
##  Class :character   1st Qu.:4.740e+08      1st Qu.: 2.000  
##  Mode  :character   Median :8.029e+08      Median : 4.000  
##                     Mean   :1.385e+09      Mean   : 7.179  
##                     3rd Qu.:1.847e+09      3rd Qu.: 8.000  
##                     Max.   :7.139e+09      Max.   :76.000  
##  fatal_accidents_85_99 fatalities_85_99 incidents_00_14  fatal_accidents_00_14
##  Min.   : 0.000        Min.   :  0.0    Min.   : 0.000   Min.   :0.0000       
##  1st Qu.: 0.000        1st Qu.:  0.0    1st Qu.: 1.000   1st Qu.:0.0000       
##  Median : 1.000        Median : 48.5    Median : 3.000   Median :0.0000       
##  Mean   : 2.179        Mean   :112.4    Mean   : 4.125   Mean   :0.6607       
##  3rd Qu.: 3.000        3rd Qu.:184.2    3rd Qu.: 5.250   3rd Qu.:1.0000       
##  Max.   :14.000        Max.   :535.0    Max.   :24.000   Max.   :3.0000       
##  fatalities_00_14
##  Min.   :  0.00  
##  1st Qu.:  0.00  
##  Median :  0.00  
##  Mean   : 55.52  
##  3rd Qu.: 83.25  
##  Max.   :537.00

# Checking missing values
sapply(dataset, function(x) sum(is.na(x)))

##                airline avail_seat_km_per_week        incidents_85_99 
##                      0                      0                      0 
##  fatal_accidents_85_99       fatalities_85_99        incidents_00_14 
##                      0                      0                      0 
##  fatal_accidents_00_14       fatalities_00_14 
##                      0                      0

# Visualizing missing values

library(naniar)

## Warning: package 'naniar' was built under R version 4.3.2

gg_miss_var(dataset)

# Assuming your dataset is stored in the 'dataset' variable

# Check for missing values
missing_values <- sapply(dataset, function(x) sum(is.na(x)))

# Identify columns with missing values
columns_with_missing <- names(which(missing_values > 0))

# Impute missing values using mean for numeric columns
for (col in columns_with_missing) {
  if (is.numeric(dataset[[col]])) {
    mean_value <- mean(dataset[[col]], na.rm = TRUE)
    dataset[[col]][is.na(dataset[[col]])] <- mean_value
  } else {
    # If it's a non-numeric column, you might use another strategy like imputing with the most frequent value
    most_frequent_value <- names(sort(table(dataset[[col]], decreasing = TRUE)))[1]
    dataset[[col]][is.na(dataset[[col]])] <- most_frequent_value
  }
}

# Verify that missing values are filled
sapply(dataset, function(x) sum(is.na(x)))

##                airline avail_seat_km_per_week        incidents_85_99 
##                      0                      0                      0 
##  fatal_accidents_85_99       fatalities_85_99        incidents_00_14 
##                      0                      0                      0 
##  fatal_accidents_00_14       fatalities_00_14 
##                      0                      0

head(dataset)

# Calculate accidents per 100,000 flight miles
dataset$accidents_per_100k_miles <- 
  (dataset$fatal_accidents_85_99 + dataset$fatal_accidents_00_14) / 
  (dataset$avail_seat_km_per_week / 100000)

# Visualize accidents per 100,000 flight miles
hist(dataset$accidents_per_100k_miles, 
     main = "Accidents per 100,000 Flight Miles",
     xlab = "Accidents per 100,000 Miles")

# Calculate a new metric, e.g., fatalities per incident
dataset$fatalities_per_incident <- 
  (dataset$fatalities_85_99 + dataset$fatalities_00_14) / 
  (dataset$incidents_85_99 + dataset$incidents_00_14)

plot(dataset$incidents_85_99 + dataset$incidents_00_14, 
     dataset$fatalities_per_incident,
     main = "Fatalities per Incident",
     xlab = "Total Incidents",
     ylab = "Fatalities per Incident")

# Perform statistical analysis on the new metric
t_test_result <- t.test(dataset$fatalities_per_incident)
print(t_test_result)

## 
##  One Sample t-test
## 
## data:  dataset$fatalities_per_incident
## t = 4.744, df = 54, p-value = 1.575e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  11.25900 27.74091
## sample estimates:
## mean of x 
##  19.49996

# Create a scatter plot of accidents over time
plot(dataset$incidents_85_99 + dataset$incidents_00_14, 
     dataset$avail_seat_km_per_week, 
     main = "Accidents Over Time",
     xlab = "Accidents",
     ylab = "Available Seat Kilometers (in 100 million)")

# Boxplot for selected columns
boxplot(dataset[, c("avail_seat_km_per_week", "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99")])

# Identify outliers using z-scores
outliers <- as.data.frame(boxplot.stats(dataset$avail_seat_km_per_week)$out)
outliers

# Assuming your dataset is stored in the 'dataset' variable

# Define lower and upper percentiles
lower_percentile <- 0.05
upper_percentile <- 0.95

# Identify numeric columns for outlier removal
numeric_columns <- sapply(dataset, is.numeric)

# Loop through numeric columns and remove outliers
for (col in names(numeric_columns)[numeric_columns]) {
  lower_limit <- quantile(dataset[[col]], lower_percentile, na.rm = TRUE)
  upper_limit <- quantile(dataset[[col]], upper_percentile, na.rm = TRUE)
  
  # Remove outliers
  dataset[[col]] <- ifelse(dataset[[col]] < lower_limit, NA,
                           ifelse(dataset[[col]] > upper_limit, NA, dataset[[col]]))
}

# Verify that outliers are removed
summary(dataset)

##    airline          avail_seat_km_per_week incidents_85_99
##  Length:56          Min.   :3.014e+08      Min.   : 1.00  
##  Class :character   1st Qu.:4.970e+08      1st Qu.: 2.00  
##  Mode  :character   Median :8.029e+08      Median : 4.00  
##                     Mean   :1.156e+09      Mean   : 5.54  
##                     3rd Qu.:1.727e+09      3rd Qu.: 7.75  
##                     Max.   :3.427e+09      Max.   :21.00  
##                     NA's   :6              NA's   :6      
##  fatal_accidents_85_99 fatalities_85_99 incidents_00_14  fatal_accidents_00_14
##  Min.   :0.00          Min.   :  0.00   Min.   : 0.000   Min.   :0.0000       
##  1st Qu.:0.00          1st Qu.:  0.00   1st Qu.: 1.000   1st Qu.:0.0000       
##  Median :1.00          Median : 34.00   Median : 3.000   Median :0.0000       
##  Mean   :1.66          Mean   : 90.85   Mean   : 3.321   Mean   :0.6182       
##  3rd Qu.:3.00          3rd Qu.:159.00   3rd Qu.: 5.000   3rd Qu.:1.0000       
##  Max.   :7.00          Max.   :407.00   Max.   :11.000   Max.   :2.0000       
##  NA's   :3             NA's   :3        NA's   :3        NA's   :1            
##  fatalities_00_14 accidents_per_100k_miles fatalities_per_incident
##  Min.   :  0.00   Min.   :0.0000000        Min.   : 0.0000        
##  1st Qu.:  0.00   1st Qu.:0.0000533        1st Qu.: 0.3368        
##  Median :  0.00   Median :0.0001817        Median : 8.4833        
##  Mean   : 34.32   Mean   :0.0002579        Mean   :13.9006        
##  3rd Qu.: 46.00   3rd Qu.:0.0002981        3rd Qu.:19.3357        
##  Max.   :283.00   Max.   :0.0012106        Max.   :70.7500        
##  NA's   :3        NA's   :3                NA's   :4

# Assuming you want to impute missing values with the mean for numeric columns
library(dplyr)

dataset <- dataset %>%
  mutate(across(where(is.numeric), ~ifelse(is.na(.), mean(., na.rm = TRUE), .)))
# Assuming you want to remove rows with any missing values
dataset <- na.omit(dataset)

# Correlation matrix
cor_matrix <- cor(dataset[, c("avail_seat_km_per_week", "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99")])

# Visualization of correlation matrix

library(corrplot)

## corrplot 0.92 loaded

corrplot(cor_matrix, method = "circle", type = "upper", order = "hclust")

# Min-Max Scaling
min_max_scaling <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply Min-Max Scaling to numeric columns
dataset <- as.data.frame(lapply(dataset[, -1], min_max_scaling))

# Add back the 'airline' column
dataset$airline <- dataset$airline

# Display the normalized data
print(dataset)

##    avail_seat_km_per_week incidents_85_99 fatal_accidents_85_99
## 1             0.006248332           0.050             0.0000000
## 2             0.286799875           0.227             0.2371968
## 3             0.027014349           0.250             0.0000000
## 4             0.094552926           0.100             0.1428571
## 5             0.500415714           0.050             0.0000000
## 6             0.864797889           0.650             0.5714286
## 7             0.181710906           0.050             0.1428571
## 8             0.130808150           0.100             0.0000000
## 9             0.212459263           0.200             0.0000000
## 10            0.126916394           0.300             0.2857143
## 11            0.492729802           0.100             0.1428571
## 12            0.273548518           1.000             0.7142857
## 13            0.018194348           0.000             0.0000000
## 14            0.030572231           0.200             0.4285714
## 15            0.921037847           0.150             0.0000000
## 16            0.729910478           0.227             0.0000000
## 17            0.163779904           0.550             0.8571429
## 18            0.037311124           0.050             0.1428571
## 19            0.079711939           0.100             0.1428571
## 20            0.273548518           0.227             0.2371968
## 21            0.082018511           0.350             0.4285714
## 22            0.010901321           0.000             0.1428571
## 23            0.059895012           0.227             0.7142857
## 24            0.065624116           0.000             0.0000000
## 25            0.099827825           0.450             0.4285714
## 26            0.000000000           0.000             0.0000000
## 27            0.061596419           0.227             0.0000000
## 28            0.278970109           0.150             0.1428571
## 29            0.407288570           0.100             0.1428571
## 30            0.273548518           0.050             0.0000000
## 31            0.503394122           0.300             0.1428571
## 32            0.458583736           0.550             0.7142857
## 33            0.224176819           0.100             0.2857143
## 34            1.000000000           0.250             0.1428571
## 35            0.236081962           0.100             0.1428571
## 36            0.015097957           0.350             0.4285714
## 37            0.035719055           0.300             0.5714286
## 38            0.517110972           0.000             0.0000000
## 39            0.273548518           0.200             0.4285714
## 40            0.122103618           0.200             0.0000000
## 41            0.178645564           0.300             0.2857143
## 42            0.664121151           0.050             0.2857143
## 43            0.112033889           0.050             0.1428571
## 44            0.952001105           0.000             0.0000000
## 45            0.007744657           0.050             0.1428571
## 46            0.157183360           0.050             0.1428571
## 47            0.273548518           0.100             0.1428571
## 48            0.386482564           0.350             0.4285714
## 49            0.101675445           0.227             0.0000000
## 50            0.448433708           0.350             0.5714286
## 51            0.526284712           0.350             0.4285714
## 52            0.273548518           0.900             0.2371968
## 53            0.689345568           0.750             1.0000000
## 54            0.103580687           0.300             0.4285714
## 55            0.225227231           0.000             0.0000000
## 56            0.041304645           0.400             0.1428571
##    fatalities_85_99 incidents_00_14 fatal_accidents_00_14 fatalities_00_14
## 1       0.000000000      0.00000000             0.0000000      0.000000000
## 2       0.314496314      0.54545455             0.5000000      0.310954064
## 3       0.000000000      0.09090909             0.0000000      0.000000000
## 4       0.157248157      0.45454545             0.0000000      0.000000000
## 5       0.000000000      0.18181818             0.0000000      0.000000000
## 6       0.194103194      0.54545455             1.0000000      0.121274752
## 7       0.808353808      0.36363636             0.5000000      0.558303887
## 8       0.000000000      0.45454545             0.5000000      0.024734982
## 9       0.000000000      0.45454545             0.5000000      0.310954064
## 10      0.122850123      0.36363636             0.0000000      0.000000000
## 11      0.002457002      0.63636364             0.0000000      0.000000000
## 12      0.248157248      0.30188679             0.3090909      0.121274752
## 13      0.000000000      0.09090909             0.0000000      0.000000000
## 14      0.793611794      0.00000000             0.0000000      0.000000000
## 15      0.000000000      0.54545455             0.0000000      0.000000000
## 16      0.000000000      0.18181818             0.0000000      0.000000000
## 17      0.223216355      0.18181818             0.5000000      0.795053004
## 18      0.039312039      0.00000000             0.0000000      0.000000000
## 19      0.115479115      0.00000000             0.0000000      0.000000000
## 20      1.000000000      0.30188679             1.0000000      0.180212014
## 21      0.692874693      0.36363636             0.5000000      0.049469965
## 22      0.009828010      0.09090909             0.0000000      0.000000000
## 23      0.410319410      0.45454545             1.0000000      0.325088339
## 24      0.000000000      0.00000000             0.0000000      0.000000000
## 25      0.638820639      0.36363636             1.0000000      0.077738516
## 26      0.000000000      0.27272727             0.5000000      0.505300353
## 27      0.000000000      0.09090909             0.0000000      0.000000000
## 28      0.363636364      0.45454545             0.0000000      0.000000000
## 29      0.223216355      0.00000000             0.0000000      0.000000000
## 30      0.000000000      0.18181818             1.0000000      1.000000000
## 31      0.007371007      0.09090909             0.0000000      0.000000000
## 32      0.223216355      0.09090909             0.0000000      0.000000000
## 33      0.051597052      0.00000000             0.0000000      0.000000000
## 34      0.004914005      0.27272727             0.0000000      0.000000000
## 35      0.083538084      0.27272727             1.0000000      0.121274752
## 36      0.574938575      0.90909091             1.0000000      0.162544170
## 37      0.181818182      0.18181818             0.5000000      0.003533569
## 38      0.000000000      0.45454545             0.0000000      0.000000000
## 39      0.125307125      0.27272727             0.0000000      0.000000000
## 40      0.000000000      0.54545455             0.5000000      0.388692580
## 41      0.769041769      1.00000000             0.0000000      0.000000000
## 42      0.014742015      0.18181818             0.5000000      0.293286219
## 43      0.390663391      0.09090909             0.0000000      0.000000000
## 44      0.000000000      0.72727273             0.0000000      0.000000000
## 45      0.034398034      0.36363636             0.0000000      0.000000000
## 46      0.562653563      0.27272727             0.0000000      0.000000000
## 47      0.007371007      0.09090909             0.5000000      0.010600707
## 48      0.240786241      0.63636364             1.0000000      0.664310954
## 49      0.000000000      0.00000000             0.0000000      0.000000000
## 50      0.756756757      0.18181818             0.5000000      0.003533569
## 51      0.157248157      0.72727273             1.0000000      0.296819788
## 52      0.783783784      0.30188679             1.0000000      0.385159011
## 53      0.550368550      1.00000000             1.0000000      0.081272085
## 54      0.420147420      0.09090909             0.0000000      0.000000000
## 55      0.000000000      0.00000000             0.0000000      0.000000000
## 56      0.201474201      0.18181818             0.0000000      0.000000000
##    accidents_per_100k_miles fatalities_per_incident
## 1                0.00000000             0.000000000
## 2                0.21301452             0.037231750
## 3                0.00000000             0.000000000
## 4                0.13839057             0.113074205
## 5                0.00000000             0.000000000
## 6                0.16498274             0.293992933
## 7                0.19005141             0.196474833
## 8                0.11631141             0.012367491
## 9                0.08556659             0.124381625
## 10               0.23667608             0.064246707
## 11               0.04486199             0.001413428
## 12               0.12638988             0.192300539
## 13               0.00000000             0.000000000
## 14               0.62431395             0.913074205
## 15               0.00000000             0.000000000
## 16               0.00000000             0.000000000
## 17               0.71101611             0.767289248
## 18               0.19761930             0.113074205
## 19               0.15005033             0.221436985
## 20               0.17721123             0.134864547
## 21               0.59244359             0.348645465
## 22               0.24624212             0.028268551
## 23               0.21301452             0.122025913
## 24               0.00000000             0.000000000
## 25               0.67335562             0.284704695
## 26               0.27407757             0.505300353
## 27               0.00000000             0.000000000
## 28               0.07040676             0.232430310
## 29               0.05247142             0.196474833
## 30               0.59550848             1.000000000
## 31               0.04406440             0.005300353
## 32               0.23810999             0.462082088
## 33               0.16487873             0.098939929
## 34               0.02410644             0.003140950
## 35               0.23846339             0.196474833
## 36               0.21301452             0.219866510
## 37               1.00000000             0.117785630
## 38               0.00000000             0.000000000
## 39               0.83801089             0.090106007
## 40               0.12094412             0.141342756
## 41               0.19216922             0.245779348
## 42               0.10425710             0.314487633
## 43               0.12678607             0.749116608
## 44               0.00000000             0.000000000
## 45               0.25370317             0.032979976
## 46               0.10421561             0.647349823
## 47               0.63693076             0.021201413
## 48               0.27366045             0.269493522
## 49               0.00000000             0.196474833
## 50               0.24254558             0.436749117
## 51               0.21222317             0.130742049
## 52               0.11569976             0.183317272
## 53               0.30273101             0.129302447
## 54               0.39643301             0.302120141
## 55               0.00000000             0.000000000
## 56               0.19188975             0.105364600

# Install and load the necessary library

library(caTools)

## Warning: package 'caTools' was built under R version 4.3.2

set.seed(123)  # Set seed for reproducibility
split <- sample.split(dataset$incidents_00_14, SplitRatio = 0.7)

train_data <- subset(dataset, split == TRUE)
test_data <- subset(dataset, split == FALSE)

# Decision Tree with limited depth
library(rpart)
library(randomForest)

## Warning: package 'randomForest' was built under R version 4.3.2

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

dt_model_fast <- rpart(incidents_00_14 ~ ., data = train_data, method = "class", maxdepth = 10)

# Random Forest
rf_model_fast <- randomForest(incidents_00_14 ~ ., data = train_data, ntree = 100)

# Predictions on the test set
dt_predictions_fast <- predict(dt_model_fast, test_data, type = "class")
rf_predictions_fast <- predict(rf_model_fast, test_data)

# Confusion Matrix
confusion_matrix_dt_fast <- table(dt_predictions_fast, test_data$incidents_00_14)
confusion_matrix_rf_fast <- table(rf_predictions_fast, test_data$incidents_00_14)

# Compare Confusion Matrices
print("Confusion Matrix for Faster Decision Tree:")

## [1] "Confusion Matrix for Faster Decision Tree:"

print(confusion_matrix_dt_fast)

##                     
## dt_predictions_fast  0 0.0909090909090909 0.181818181818182 0.272727272727273
##   0                  2                  3                 1                 1
##   0.0909090909090909 1                  0                 0                 0
##   0.181818181818182  0                  0                 0                 0
##   0.272727272727273  0                  0                 0                 0
##   0.30188679245283   0                  0                 1                 0
##   0.363636363636364  0                  0                 0                 0
##   0.454545454545455  0                  0                 0                 0
##   0.545454545454545  0                  0                 0                 0
##   0.636363636363636  0                  0                 0                 0
##   0.727272727272727  0                  0                 0                 0
##   0.909090909090909  0                  0                 0                 0
##   1                  0                  0                 0                 0
##                     
## dt_predictions_fast  0.30188679245283 0.363636363636364 0.454545454545455
##   0                                 0                 1                 1
##   0.0909090909090909                0                 0                 0
##   0.181818181818182                 0                 0                 0
##   0.272727272727273                 0                 0                 0
##   0.30188679245283                  1                 0                 1
##   0.363636363636364                 0                 0                 0
##   0.454545454545455                 0                 0                 0
##   0.545454545454545                 0                 0                 0
##   0.636363636363636                 0                 0                 0
##   0.727272727272727                 0                 0                 0
##   0.909090909090909                 0                 0                 0
##   1                                 0                 0                 0
##                     
## dt_predictions_fast  0.545454545454545 0.636363636363636 0.727272727272727 1
##   0                                  0                 1                 1 0
##   0.0909090909090909                 0                 0                 0 1
##   0.181818181818182                  0                 0                 0 0
##   0.272727272727273                  0                 0                 0 0
##   0.30188679245283                   1                 0                 0 0
##   0.363636363636364                  0                 0                 0 0
##   0.454545454545455                  0                 0                 0 0
##   0.545454545454545                  0                 0                 0 0
##   0.636363636363636                  0                 0                 0 0
##   0.727272727272727                  0                 0                 0 0
##   0.909090909090909                  0                 0                 0 0
##   1                                  0                 0                 0 0

print("Confusion Matrix for Random Forest:")

## [1] "Confusion Matrix for Random Forest:"

print(confusion_matrix_rf_fast)

##                    
## rf_predictions_fast 0 0.0909090909090909 0.181818181818182 0.272727272727273
##   0.119878954378954 1                  0                 0                 0
##   0.166934203637034 0                  0                 0                 0
##   0.20487270957554  1                  0                 0                 0
##   0.214646369353917 0                  0                 0                 1
##   0.225266056794359 0                  0                 0                 0
##   0.227013347763348 0                  0                 0                 0
##   0.229343434343434 0                  1                 0                 0
##   0.229791261673337 0                  1                 0                 0
##   0.243386125404993 0                  1                 0                 0
##   0.252924433009339 1                  0                 0                 0
##   0.299328088578088 0                  0                 0                 0
##   0.358843053173242 0                  0                 0                 0
##   0.366833333333334 0                  0                 1                 0
##   0.415569754145226 0                  0                 0                 0
##   0.427694110920526 0                  0                 1                 0
##   0.493007146941109 0                  0                 0                 0
##   0.552459691252144 0                  0                 0                 0
##                    
## rf_predictions_fast 0.30188679245283 0.363636363636364 0.454545454545455
##   0.119878954378954                0                 0                 0
##   0.166934203637034                0                 0                 0
##   0.20487270957554                 0                 0                 0
##   0.214646369353917                0                 0                 0
##   0.225266056794359                0                 0                 0
##   0.227013347763348                0                 0                 1
##   0.229343434343434                0                 0                 0
##   0.229791261673337                0                 0                 0
##   0.243386125404993                0                 0                 0
##   0.252924433009339                0                 0                 0
##   0.299328088578088                0                 0                 0
##   0.358843053173242                0                 1                 0
##   0.366833333333334                0                 0                 0
##   0.415569754145226                1                 0                 0
##   0.427694110920526                0                 0                 0
##   0.493007146941109                0                 0                 0
##   0.552459691252144                0                 0                 1
##                    
## rf_predictions_fast 0.545454545454545 0.636363636363636 0.727272727272727 1
##   0.119878954378954                 0                 0                 0 0
##   0.166934203637034                 0                 1                 0 0
##   0.20487270957554                  0                 0                 0 0
##   0.214646369353917                 0                 0                 0 0
##   0.225266056794359                 0                 0                 0 1
##   0.227013347763348                 0                 0                 0 0
##   0.229343434343434                 0                 0                 0 0
##   0.229791261673337                 0                 0                 0 0
##   0.243386125404993                 0                 0                 0 0
##   0.252924433009339                 0                 0                 0 0
##   0.299328088578088                 0                 0                 1 0
##   0.358843053173242                 0                 0                 0 0
##   0.366833333333334                 0                 0                 0 0
##   0.415569754145226                 0                 0                 0 0
##   0.427694110920526                 0                 0                 0 0
##   0.493007146941109                 1                 0                 0 0
##   0.552459691252144                 0                 0                 0 0

Conclusion

When It comes to the total number of fatalities based on the data set I relied on the T-test to give me the answer and the results are below.

Data Information:
    Data: dataset$fatalities_per_incident
    Sample Mean: 19.49996

T-Test Results:
    t-Value: 4.744
    Degrees of Freedom (df): 54
    p-Value: 1.575e-05 (very small)

Confidence Interval:
    95% Confidence Interval: (11.25900, 27.74091)

Hypothesis Testing:
    Null Hypothesis ($H_0$): The true mean of fatalities_per_incident is equal to 0.
    Alternative Hypothesis ($H_a$): The true mean of fatalities_per_incident is not equal to 0.

Interpretation:
    The t-test statistic (4.744) is associated with a very small p-value (1.575e-05), which is below the commonly used significance level of 0.05.
    Therefore, you would reject the null hypothesis.
    The 95% confidence interval (11.25900, 27.74091) indicates the range within which you are reasonably confident the true mean of fatalities_per_incident lies.

In conclusion, based on this t-test, there is evidence to suggest that the true mean of fatalities_per_incident is significantly different from 0. The sample mean is 19.49996, and the 95% confidence interval provides a range for the true population mean.

Which means for every fatality per incident falls between 11% to 27%.