library(tidyverse)
This approach aligns with industry best practices and allows for a more nuanced evaluation of safety measures, providing valuable insights for both the aviation industry and the broader public. Our goal is not only to identify patterns but also to contribute to the ongoing efforts to enhance aviation safety and inform decision-making in the industry.
The dataset is hosted on Kaggle, so we work with the copy downloaded from there.
Additionally, we will explore various metrics beyond accident counts, including rates of fatalities and potentially other criteria. By examining these multiple dimensions, we can gain a more holistic understanding of airline safety and identify trends that might be obscured by raw counts alone.
The dataset has been made available on Kaggle; its columns are described below.
airline                -> Airline (asterisk indicates that regional subsidiaries are included)
avail_seat_km_per_week -> Available seat kilometers flown every week
incidents_85_99        -> Total number of incidents, 1985–1999
fatal_accidents_85_99  -> Total number of fatal accidents, 1985–1999
fatalities_85_99       -> Total number of fatalities, 1985–1999
incidents_00_14        -> Total number of incidents, 2000–2014
fatal_accidents_00_14  -> Total number of fatal accidents, 2000–2014
fatalities_00_14       -> Total number of fatalities, 2000–2014
# Load necessary libraries
library(readr)
# Load the dataset
dataset <- read_csv("airline-safety_csv.csv")
## Rows: 56 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): airline
## dbl (7): avail_seat_km_per_week, incidents_85_99, fatal_accidents_85_99, fat...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dataset)
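As noted above, raw counts ignore exposure. One illustrative rate normalizes 2000–2014 fatalities by approximate total seat kilometers over that window; the 52 weeks × 15 years scaling and the name fatality_rate_00_14 are our choices for this sketch, not part of the dataset.
# Illustrative: fatalities per billion available seat-km over 2000-2014
# (52 weeks/year * 15 years approximates total exposure in the window)
fatality_rate_00_14 <- dataset$fatalities_00_14 /
  (dataset$avail_seat_km_per_week * 52 * 15 / 1e9)
head(fatality_rate_00_14)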
# Summary statistics
summary(dataset)
## airline avail_seat_km_per_week incidents_85_99
## Length:56 Min. :2.594e+08 Min. : 0.000
## Class :character 1st Qu.:4.740e+08 1st Qu.: 2.000
## Mode :character Median :8.029e+08 Median : 4.000
## Mean :1.385e+09 Mean : 7.179
## 3rd Qu.:1.847e+09 3rd Qu.: 8.000
## Max. :7.139e+09 Max. :76.000
## fatal_accidents_85_99 fatalities_85_99 incidents_00_14 fatal_accidents_00_14
## Min. : 0.000 Min. : 0.0 Min. : 0.000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.: 1.000 1st Qu.:0.0000
## Median : 1.000 Median : 48.5 Median : 3.000 Median :0.0000
## Mean : 2.179 Mean :112.4 Mean : 4.125 Mean :0.6607
## 3rd Qu.: 3.000 3rd Qu.:184.2 3rd Qu.: 5.250 3rd Qu.:1.0000
## Max. :14.000 Max. :535.0 Max. :24.000 Max. :3.0000
## fatalities_00_14
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 0.00
## Mean : 55.52
## 3rd Qu.: 83.25
## Max. :537.00
# Checking missing values
sapply(dataset, function(x) sum(is.na(x)))
## airline avail_seat_km_per_week incidents_85_99
## 0 0 0
## fatal_accidents_85_99 fatalities_85_99 incidents_00_14
## 0 0 0
## fatal_accidents_00_14 fatalities_00_14
## 0 0
# Visualizing missing values
library(naniar)
## Warning: package 'naniar' was built under R version 4.3.2
gg_miss_var(dataset)
# Check for missing values in each column
missing_values <- sapply(dataset, function(x) sum(is.na(x)))
# Identify columns with missing values
columns_with_missing <- names(which(missing_values > 0))
# Impute missing values using mean for numeric columns
for (col in columns_with_missing) {
  if (is.numeric(dataset[[col]])) {
    # Numeric column: impute with the column mean
    mean_value <- mean(dataset[[col]], na.rm = TRUE)
    dataset[[col]][is.na(dataset[[col]])] <- mean_value
  } else {
    # Non-numeric column: impute with the most frequent value (the mode)
    most_frequent_value <- names(sort(table(dataset[[col]]), decreasing = TRUE))[1]
    dataset[[col]][is.na(dataset[[col]])] <- most_frequent_value
  }
}
# Verify that missing values are filled
sapply(dataset, function(x) sum(is.na(x)))
## airline avail_seat_km_per_week incidents_85_99
## 0 0 0
## fatal_accidents_85_99 fatalities_85_99 incidents_00_14
## 0 0 0
## fatal_accidents_00_14 fatalities_00_14
## 0 0
head(dataset)
# Calculate fatal accidents per 100,000 available seat kilometers per week
# (note: the denominator is in kilometers, despite "miles" in the column name)
dataset$accidents_per_100k_miles <-
  (dataset$fatal_accidents_85_99 + dataset$fatal_accidents_00_14) /
  (dataset$avail_seat_km_per_week / 100000)
# Visualize the distribution of this rate
hist(dataset$accidents_per_100k_miles,
     main = "Fatal Accidents per 100,000 Seat Kilometers",
     xlab = "Fatal Accidents per 100,000 Seat km")
# Calculate a new metric, e.g., fatalities per incident
dataset$fatalities_per_incident <-
(dataset$fatalities_85_99 + dataset$fatalities_00_14) /
(dataset$incidents_85_99 + dataset$incidents_00_14)
plot(dataset$incidents_85_99 + dataset$incidents_00_14,
dataset$fatalities_per_incident,
main = "Fatalities per Incident",
xlab = "Total Incidents",
ylab = "Fatalities per Incident")
# Perform statistical analysis on the new metric
t_test_result <- t.test(dataset$fatalities_per_incident)
print(t_test_result)
##
## One Sample t-test
##
## data: dataset$fatalities_per_incident
## t = 4.744, df = 54, p-value = 1.575e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 11.25900 27.74091
## sample estimates:
## mean of x
## 19.49996
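As a sanity check, the reported interval can be reproduced by hand from the t-distribution; a minimal sketch in base R (na.omit() drops the NaN produced for any airline with zero incidents in both periods):
# Recompute the 95% CI: mean +/- t_crit * standard error
x <- na.omit(dataset$fatalities_per_incident)
se <- sd(x) / sqrt(length(x))
t_crit <- qt(0.975, df = length(x) - 1)
c(mean(x) - t_crit * se, mean(x) + t_crit * se)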
# Scatter plot of total incidents against airline size (available seat kilometers)
plot(dataset$incidents_85_99 + dataset$incidents_00_14,
     dataset$avail_seat_km_per_week,
     main = "Incidents vs. Available Seat Kilometers",
     xlab = "Total Incidents",
     ylab = "Available Seat Kilometers per Week")
# Boxplot for selected columns
boxplot(dataset[, c("avail_seat_km_per_week", "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99")])
# Identify outliers in weekly seat kilometers using the boxplot (1.5 * IQR) rule
outliers <- as.data.frame(boxplot.stats(dataset$avail_seat_km_per_week)$out)
outliers
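A z-score rule is a common alternative to the IQR rule used above. A minimal sketch, assuming the conventional |z| > 3 cutoff (the cutoff is a convention, not dictated by the data):
# Flag airlines whose weekly seat-km lies more than 3 SDs from the mean
z_scores <- as.vector(scale(dataset$avail_seat_km_per_week))
dataset$airline[abs(z_scores) > 3]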
# Define lower and upper percentiles for trimming extreme values
lower_percentile <- 0.05
upper_percentile <- 0.95
# Identify numeric columns for outlier removal
numeric_columns <- sapply(dataset, is.numeric)
# Loop through numeric columns and replace extreme values with NA
for (col in names(numeric_columns)[numeric_columns]) {
  lower_limit <- quantile(dataset[[col]], lower_percentile, na.rm = TRUE)
  upper_limit <- quantile(dataset[[col]], upper_percentile, na.rm = TRUE)
  # Values outside the 5th-95th percentile range are set to NA
  dataset[[col]] <- ifelse(dataset[[col]] < lower_limit, NA,
                           ifelse(dataset[[col]] > upper_limit, NA, dataset[[col]]))
}
# Verify that outliers are removed
summary(dataset)
## airline avail_seat_km_per_week incidents_85_99
## Length:56 Min. :3.014e+08 Min. : 1.00
## Class :character 1st Qu.:4.970e+08 1st Qu.: 2.00
## Mode :character Median :8.029e+08 Median : 4.00
## Mean :1.156e+09 Mean : 5.54
## 3rd Qu.:1.727e+09 3rd Qu.: 7.75
## Max. :3.427e+09 Max. :21.00
## NA's :6 NA's :6
## fatal_accidents_85_99 fatalities_85_99 incidents_00_14 fatal_accidents_00_14
## Min. :0.00 Min. : 0.00 Min. : 0.000 Min. :0.0000
## 1st Qu.:0.00 1st Qu.: 0.00 1st Qu.: 1.000 1st Qu.:0.0000
## Median :1.00 Median : 34.00 Median : 3.000 Median :0.0000
## Mean :1.66 Mean : 90.85 Mean : 3.321 Mean :0.6182
## 3rd Qu.:3.00 3rd Qu.:159.00 3rd Qu.: 5.000 3rd Qu.:1.0000
## Max. :7.00 Max. :407.00 Max. :11.000 Max. :2.0000
## NA's :3 NA's :3 NA's :3 NA's :1
## fatalities_00_14 accidents_per_100k_miles fatalities_per_incident
## Min. : 0.00 Min. :0.0000000 Min. : 0.0000
## 1st Qu.: 0.00 1st Qu.:0.0000533 1st Qu.: 0.3368
## Median : 0.00 Median :0.0001817 Median : 8.4833
## Mean : 34.32 Mean :0.0002579 Mean :13.9006
## 3rd Qu.: 46.00 3rd Qu.:0.0002981 3rd Qu.:19.3357
## Max. :283.00 Max. :0.0012106 Max. :70.7500
## NA's :3 NA's :3 NA's :4
# Impute the NAs introduced by the outlier step with column means (numeric columns)
library(dplyr)
dataset <- dataset %>%
mutate(across(where(is.numeric), ~ifelse(is.na(.), mean(., na.rm = TRUE), .)))
# Drop any rows that still contain missing values
dataset <- na.omit(dataset)
# Correlation matrix
cor_matrix <- cor(dataset[, c("avail_seat_km_per_week", "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99")])
# Visualization of correlation matrix
library(corrplot)
## corrplot 0.92 loaded
corrplot(cor_matrix, method = "circle", type = "upper", order = "hclust")
# Min-Max Scaling
min_max_scaling <- function(x) {
(x - min(x)) / (max(x) - min(x))
}
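One caveat with this function: a constant column makes the denominator zero and yields NaN. A guarded variant is sketched below (our addition; the pipeline here keeps the original function, since all of these columns vary):
# Min-max scaling that maps constant columns to 0 instead of NaN
min_max_scaling_safe <- function(x) {
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x, na.rm = TRUE)) / rng
}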
# Preserve the 'airline' column before the scaling step drops it
airline_names <- dataset$airline
# Apply Min-Max Scaling to the numeric columns (all but the first column)
dataset <- as.data.frame(lapply(dataset[, -1], min_max_scaling))
# Add back the 'airline' column
dataset$airline <- airline_names
# Display the normalized data
print(dataset)
## avail_seat_km_per_week incidents_85_99 fatal_accidents_85_99
## 1 0.006248332 0.050 0.0000000
## 2 0.286799875 0.227 0.2371968
## 3 0.027014349 0.250 0.0000000
## 4 0.094552926 0.100 0.1428571
## 5 0.500415714 0.050 0.0000000
## 6 0.864797889 0.650 0.5714286
## 7 0.181710906 0.050 0.1428571
## 8 0.130808150 0.100 0.0000000
## 9 0.212459263 0.200 0.0000000
## 10 0.126916394 0.300 0.2857143
## 11 0.492729802 0.100 0.1428571
## 12 0.273548518 1.000 0.7142857
## 13 0.018194348 0.000 0.0000000
## 14 0.030572231 0.200 0.4285714
## 15 0.921037847 0.150 0.0000000
## 16 0.729910478 0.227 0.0000000
## 17 0.163779904 0.550 0.8571429
## 18 0.037311124 0.050 0.1428571
## 19 0.079711939 0.100 0.1428571
## 20 0.273548518 0.227 0.2371968
## 21 0.082018511 0.350 0.4285714
## 22 0.010901321 0.000 0.1428571
## 23 0.059895012 0.227 0.7142857
## 24 0.065624116 0.000 0.0000000
## 25 0.099827825 0.450 0.4285714
## 26 0.000000000 0.000 0.0000000
## 27 0.061596419 0.227 0.0000000
## 28 0.278970109 0.150 0.1428571
## 29 0.407288570 0.100 0.1428571
## 30 0.273548518 0.050 0.0000000
## 31 0.503394122 0.300 0.1428571
## 32 0.458583736 0.550 0.7142857
## 33 0.224176819 0.100 0.2857143
## 34 1.000000000 0.250 0.1428571
## 35 0.236081962 0.100 0.1428571
## 36 0.015097957 0.350 0.4285714
## 37 0.035719055 0.300 0.5714286
## 38 0.517110972 0.000 0.0000000
## 39 0.273548518 0.200 0.4285714
## 40 0.122103618 0.200 0.0000000
## 41 0.178645564 0.300 0.2857143
## 42 0.664121151 0.050 0.2857143
## 43 0.112033889 0.050 0.1428571
## 44 0.952001105 0.000 0.0000000
## 45 0.007744657 0.050 0.1428571
## 46 0.157183360 0.050 0.1428571
## 47 0.273548518 0.100 0.1428571
## 48 0.386482564 0.350 0.4285714
## 49 0.101675445 0.227 0.0000000
## 50 0.448433708 0.350 0.5714286
## 51 0.526284712 0.350 0.4285714
## 52 0.273548518 0.900 0.2371968
## 53 0.689345568 0.750 1.0000000
## 54 0.103580687 0.300 0.4285714
## 55 0.225227231 0.000 0.0000000
## 56 0.041304645 0.400 0.1428571
## fatalities_85_99 incidents_00_14 fatal_accidents_00_14 fatalities_00_14
## 1 0.000000000 0.00000000 0.0000000 0.000000000
## 2 0.314496314 0.54545455 0.5000000 0.310954064
## 3 0.000000000 0.09090909 0.0000000 0.000000000
## 4 0.157248157 0.45454545 0.0000000 0.000000000
## 5 0.000000000 0.18181818 0.0000000 0.000000000
## 6 0.194103194 0.54545455 1.0000000 0.121274752
## 7 0.808353808 0.36363636 0.5000000 0.558303887
## 8 0.000000000 0.45454545 0.5000000 0.024734982
## 9 0.000000000 0.45454545 0.5000000 0.310954064
## 10 0.122850123 0.36363636 0.0000000 0.000000000
## 11 0.002457002 0.63636364 0.0000000 0.000000000
## 12 0.248157248 0.30188679 0.3090909 0.121274752
## 13 0.000000000 0.09090909 0.0000000 0.000000000
## 14 0.793611794 0.00000000 0.0000000 0.000000000
## 15 0.000000000 0.54545455 0.0000000 0.000000000
## 16 0.000000000 0.18181818 0.0000000 0.000000000
## 17 0.223216355 0.18181818 0.5000000 0.795053004
## 18 0.039312039 0.00000000 0.0000000 0.000000000
## 19 0.115479115 0.00000000 0.0000000 0.000000000
## 20 1.000000000 0.30188679 1.0000000 0.180212014
## 21 0.692874693 0.36363636 0.5000000 0.049469965
## 22 0.009828010 0.09090909 0.0000000 0.000000000
## 23 0.410319410 0.45454545 1.0000000 0.325088339
## 24 0.000000000 0.00000000 0.0000000 0.000000000
## 25 0.638820639 0.36363636 1.0000000 0.077738516
## 26 0.000000000 0.27272727 0.5000000 0.505300353
## 27 0.000000000 0.09090909 0.0000000 0.000000000
## 28 0.363636364 0.45454545 0.0000000 0.000000000
## 29 0.223216355 0.00000000 0.0000000 0.000000000
## 30 0.000000000 0.18181818 1.0000000 1.000000000
## 31 0.007371007 0.09090909 0.0000000 0.000000000
## 32 0.223216355 0.09090909 0.0000000 0.000000000
## 33 0.051597052 0.00000000 0.0000000 0.000000000
## 34 0.004914005 0.27272727 0.0000000 0.000000000
## 35 0.083538084 0.27272727 1.0000000 0.121274752
## 36 0.574938575 0.90909091 1.0000000 0.162544170
## 37 0.181818182 0.18181818 0.5000000 0.003533569
## 38 0.000000000 0.45454545 0.0000000 0.000000000
## 39 0.125307125 0.27272727 0.0000000 0.000000000
## 40 0.000000000 0.54545455 0.5000000 0.388692580
## 41 0.769041769 1.00000000 0.0000000 0.000000000
## 42 0.014742015 0.18181818 0.5000000 0.293286219
## 43 0.390663391 0.09090909 0.0000000 0.000000000
## 44 0.000000000 0.72727273 0.0000000 0.000000000
## 45 0.034398034 0.36363636 0.0000000 0.000000000
## 46 0.562653563 0.27272727 0.0000000 0.000000000
## 47 0.007371007 0.09090909 0.5000000 0.010600707
## 48 0.240786241 0.63636364 1.0000000 0.664310954
## 49 0.000000000 0.00000000 0.0000000 0.000000000
## 50 0.756756757 0.18181818 0.5000000 0.003533569
## 51 0.157248157 0.72727273 1.0000000 0.296819788
## 52 0.783783784 0.30188679 1.0000000 0.385159011
## 53 0.550368550 1.00000000 1.0000000 0.081272085
## 54 0.420147420 0.09090909 0.0000000 0.000000000
## 55 0.000000000 0.00000000 0.0000000 0.000000000
## 56 0.201474201 0.18181818 0.0000000 0.000000000
## accidents_per_100k_miles fatalities_per_incident
## 1 0.00000000 0.000000000
## 2 0.21301452 0.037231750
## 3 0.00000000 0.000000000
## 4 0.13839057 0.113074205
## 5 0.00000000 0.000000000
## 6 0.16498274 0.293992933
## 7 0.19005141 0.196474833
## 8 0.11631141 0.012367491
## 9 0.08556659 0.124381625
## 10 0.23667608 0.064246707
## 11 0.04486199 0.001413428
## 12 0.12638988 0.192300539
## 13 0.00000000 0.000000000
## 14 0.62431395 0.913074205
## 15 0.00000000 0.000000000
## 16 0.00000000 0.000000000
## 17 0.71101611 0.767289248
## 18 0.19761930 0.113074205
## 19 0.15005033 0.221436985
## 20 0.17721123 0.134864547
## 21 0.59244359 0.348645465
## 22 0.24624212 0.028268551
## 23 0.21301452 0.122025913
## 24 0.00000000 0.000000000
## 25 0.67335562 0.284704695
## 26 0.27407757 0.505300353
## 27 0.00000000 0.000000000
## 28 0.07040676 0.232430310
## 29 0.05247142 0.196474833
## 30 0.59550848 1.000000000
## 31 0.04406440 0.005300353
## 32 0.23810999 0.462082088
## 33 0.16487873 0.098939929
## 34 0.02410644 0.003140950
## 35 0.23846339 0.196474833
## 36 0.21301452 0.219866510
## 37 1.00000000 0.117785630
## 38 0.00000000 0.000000000
## 39 0.83801089 0.090106007
## 40 0.12094412 0.141342756
## 41 0.19216922 0.245779348
## 42 0.10425710 0.314487633
## 43 0.12678607 0.749116608
## 44 0.00000000 0.000000000
## 45 0.25370317 0.032979976
## 46 0.10421561 0.647349823
## 47 0.63693076 0.021201413
## 48 0.27366045 0.269493522
## 49 0.00000000 0.196474833
## 50 0.24254558 0.436749117
## 51 0.21222317 0.130742049
## 52 0.11569976 0.183317272
## 53 0.30273101 0.129302447
## 54 0.39643301 0.302120141
## 55 0.00000000 0.000000000
## 56 0.19188975 0.105364600
# Load the library used for the train/test split
library(caTools)
## Warning: package 'caTools' was built under R version 4.3.2
set.seed(123) # Set seed for reproducibility
split <- sample.split(dataset$incidents_00_14, SplitRatio = 0.7)
train_data <- subset(dataset, split == TRUE)
test_data <- subset(dataset, split == FALSE)
# Decision Tree with limited depth
library(rpart)
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.3.2
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
# Note: incidents_00_14 is continuous after min-max scaling, so method = "class"
# makes rpart treat every distinct value as its own class (hence the fractional
# class labels in the confusion matrix below); airline is an identifier, not a predictor
dt_model_fast <- rpart(incidents_00_14 ~ . - airline, data = train_data, method = "class", maxdepth = 10)
# Random Forest (fits a regression, since the target is numeric)
rf_model_fast <- randomForest(incidents_00_14 ~ . - airline, data = train_data, ntree = 100)
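Before evaluating predictions, it can be informative to see which predictors the forest leans on; this step is our addition, using the importance() and varImpPlot() helpers that ship with randomForest:
# Inspect variable importance for the fitted forest
importance(rf_model_fast)
varImpPlot(rf_model_fast)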
# Predictions on the test set
dt_predictions_fast <- predict(dt_model_fast, test_data, type = "class")
rf_predictions_fast <- predict(rf_model_fast, test_data)
# Confusion matrices (predictions tabulated against actual values)
confusion_matrix_dt_fast <- table(dt_predictions_fast, test_data$incidents_00_14)
confusion_matrix_rf_fast <- table(rf_predictions_fast, test_data$incidents_00_14)
# Compare Confusion Matrices
print("Confusion Matrix for Faster Decision Tree:")
## [1] "Confusion Matrix for Faster Decision Tree:"
print(confusion_matrix_dt_fast)
##
## dt_predictions_fast 0 0.0909090909090909 0.181818181818182 0.272727272727273
## 0 2 3 1 1
## 0.0909090909090909 1 0 0 0
## 0.181818181818182 0 0 0 0
## 0.272727272727273 0 0 0 0
## 0.30188679245283 0 0 1 0
## 0.363636363636364 0 0 0 0
## 0.454545454545455 0 0 0 0
## 0.545454545454545 0 0 0 0
## 0.636363636363636 0 0 0 0
## 0.727272727272727 0 0 0 0
## 0.909090909090909 0 0 0 0
## 1 0 0 0 0
##
## dt_predictions_fast 0.30188679245283 0.363636363636364 0.454545454545455
## 0 0 1 1
## 0.0909090909090909 0 0 0
## 0.181818181818182 0 0 0
## 0.272727272727273 0 0 0
## 0.30188679245283 1 0 1
## 0.363636363636364 0 0 0
## 0.454545454545455 0 0 0
## 0.545454545454545 0 0 0
## 0.636363636363636 0 0 0
## 0.727272727272727 0 0 0
## 0.909090909090909 0 0 0
## 1 0 0 0
##
## dt_predictions_fast 0.545454545454545 0.636363636363636 0.727272727272727 1
## 0 0 1 1 0
## 0.0909090909090909 0 0 0 1
## 0.181818181818182 0 0 0 0
## 0.272727272727273 0 0 0 0
## 0.30188679245283 1 0 0 0
## 0.363636363636364 0 0 0 0
## 0.454545454545455 0 0 0 0
## 0.545454545454545 0 0 0 0
## 0.636363636363636 0 0 0 0
## 0.727272727272727 0 0 0 0
## 0.909090909090909 0 0 0 0
## 1 0 0 0 0
print("Confusion Matrix for Random Forest:")
## [1] "Confusion Matrix for Random Forest:"
print(confusion_matrix_rf_fast)
##
## rf_predictions_fast 0 0.0909090909090909 0.181818181818182 0.272727272727273
## 0.119878954378954 1 0 0 0
## 0.166934203637034 0 0 0 0
## 0.20487270957554 1 0 0 0
## 0.214646369353917 0 0 0 1
## 0.225266056794359 0 0 0 0
## 0.227013347763348 0 0 0 0
## 0.229343434343434 0 1 0 0
## 0.229791261673337 0 1 0 0
## 0.243386125404993 0 1 0 0
## 0.252924433009339 1 0 0 0
## 0.299328088578088 0 0 0 0
## 0.358843053173242 0 0 0 0
## 0.366833333333334 0 0 1 0
## 0.415569754145226 0 0 0 0
## 0.427694110920526 0 0 1 0
## 0.493007146941109 0 0 0 0
## 0.552459691252144 0 0 0 0
##
## rf_predictions_fast 0.30188679245283 0.363636363636364 0.454545454545455
## 0.119878954378954 0 0 0
## 0.166934203637034 0 0 0
## 0.20487270957554 0 0 0
## 0.214646369353917 0 0 0
## 0.225266056794359 0 0 0
## 0.227013347763348 0 0 1
## 0.229343434343434 0 0 0
## 0.229791261673337 0 0 0
## 0.243386125404993 0 0 0
## 0.252924433009339 0 0 0
## 0.299328088578088 0 0 0
## 0.358843053173242 0 1 0
## 0.366833333333334 0 0 0
## 0.415569754145226 1 0 0
## 0.427694110920526 0 0 0
## 0.493007146941109 0 0 0
## 0.552459691252144 0 0 1
##
## rf_predictions_fast 0.545454545454545 0.636363636363636 0.727272727272727 1
## 0.119878954378954 0 0 0 0
## 0.166934203637034 0 1 0 0
## 0.20487270957554 0 0 0 0
## 0.214646369353917 0 0 0 0
## 0.225266056794359 0 0 0 1
## 0.227013347763348 0 0 0 0
## 0.229343434343434 0 0 0 0
## 0.229791261673337 0 0 0 0
## 0.243386125404993 0 0 0 0
## 0.252924433009339 0 0 0 0
## 0.299328088578088 0 0 1 0
## 0.358843053173242 0 0 0 0
## 0.366833333333334 0 0 0 0
## 0.415569754145226 0 0 0 0
## 0.427694110920526 0 0 0 0
## 0.493007146941109 1 0 0 0
## 0.552459691252144 0 0 0 0
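Because incidents_00_14 is continuous after min-max scaling, the random forest is effectively a regression model, and the sparse confusion matrices above are hard to read. A regression-style error metric such as RMSE, sketched below as our suggestion rather than part of the original analysis, summarizes the fit more directly:
# Root mean squared error of the random-forest predictions on the test set
rmse_rf <- sqrt(mean((rf_predictions_fast - test_data$incidents_00_14)^2))
print(rmse_rf)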
To assess fatalities per incident in this dataset, I relied on a one-sample t-test; the results are summarized below.
Data Information:
- Data: dataset$fatalities_per_incident
- Sample mean: 19.49996

T-Test Results:
- t-value: 4.744
- Degrees of freedom (df): 54
- p-value: 1.575e-05 (very small)

Confidence Interval:
- 95% confidence interval: (11.25900, 27.74091)

Hypothesis Testing:
- Null hypothesis ($H_0$): the true mean of fatalities_per_incident is equal to 0.
- Alternative hypothesis ($H_a$): the true mean of fatalities_per_incident is not equal to 0.
Interpretation:
The t statistic (4.744) has a very small p-value (1.575e-05), well below the conventional significance level of 0.05.
We therefore reject the null hypothesis.
The 95% confidence interval (11.25900, 27.74091) is the range within which we can be reasonably confident the true mean of fatalities_per_incident lies.
In conclusion, this t-test provides strong evidence that the true mean of fatalities_per_incident differs from 0. The sample mean is 19.49996, and the 95% confidence interval brackets the plausible values of the population mean.
In other words, the average number of fatalities per incident is estimated to fall between roughly 11.3 and 27.7.