The beauty of this RMarkdown approach is that any analysis we know we’ll want in the future can be coded now. By replacing only a few lines of code at the start, we can evaluate an entirely different or new dataset with the same tailored approach. Switching datasets (e.g., combining years or adding new BARD data) lets us rerun the same analysis with a few clicks.
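As a rough sketch of what “replacing only a few lines at the start” can look like, the chunk below keeps every dataset-specific choice in one place. The helper `prepare_deaths()`, the toy data frames, and the rule used to derive `Overview_deaths` from `Overview` are assumptions made for illustration; the report’s actual setup chunk is not shown here.

```r
library(dplyr)

# Hypothetical helper: keep only records with at least one reported death.
# (Assumption: this is how Overview_deaths relates to Overview.)
prepare_deaths <- function(raw) {
  raw %>% filter(!is.na(NumberDeaths), NumberDeaths > 0)
}

# --- The only lines to edit when switching datasets ----------------------
# Toy data frames stand in for real BARD extracts (e.g. one per year).
year_a <- data.frame(State = c("FL", "TX"), NumberDeaths = c(1, 0))
year_b <- data.frame(State = c("CA", "FL"), NumberDeaths = c(2, 1))
Overview <- bind_rows(year_a, year_b)  # e.g. combining years
# --------------------------------------------------------------------------

# Everything downstream refers only to these two objects.
Overview_deaths <- prepare_deaths(Overview)
```

Because later chunks only ever refer to `Overview` and `Overview_deaths`, none of them need to change when the inputs do.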
The primary goals of this approach are:
The first thing we’re going to do is check the dataset for missing observations across variables. The function below returns a graph showing how much data is missing for each variable across the entire dataset, which helps us decide which variables can be analyzed in the following sections.
check_missing_variables(Overview_deaths) # checking which variables are missing data
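The definition of `check_missing_variables()` is not shown in this report. For readers who want something similar, here is a minimal sketch of a helper that computes the share of missing values per column and plots it. The names `missing_share()` and `plot_missing_sketch()` and the toy data are hypothetical and are not the report’s own implementation.

```r
library(ggplot2)

# Percent of NA values in each column of a data frame.
missing_share <- function(df) {
  data.frame(
    variable    = names(df),
    pct_missing = unname(colMeans(is.na(df))) * 100
  )
}

# Horizontal bar chart of missingness, least-missing variable at the bottom.
plot_missing_sketch <- function(df) {
  ggplot(missing_share(df),
         aes(x = reorder(variable, pct_missing), y = pct_missing)) +
    geom_col(fill = "dodgerblue") +
    coord_flip() +
    labs(x = "Variable", y = "Percent missing")
}

# Toy data: column a is half missing, b is a quarter missing, c is complete.
toy    <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"), c = 1:4)
shares <- missing_share(toy)
p      <- plot_missing_sketch(toy)
```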
Now that we know what we can work with, let’s make some basic tables and crosstabs, primarily with the count function. (We’re only doing this for a few variables to illustrate the capabilities.)
Overview_deaths %>%
  count(WaterConditions)
## # A tibble: 5 x 2
##   WaterConditions     n
##   <chr>           <int>
## 1 Calm              293
## 2 Choppy            157
## 3 Rough              66
## 4 Unknown            76
## 5 Very rough          9
Overview_deaths %>%
  count(NumberDeaths)
## # A tibble: 4 x 2
##   NumberDeaths     n
##          <dbl> <int>
## 1            1   542
## 2            2    50
## 3            3     6
## 4            4     3
table(Overview_deaths$WaterConditions, Overview_deaths$NumberDeaths)
##
##                1   2   3   4
##   Calm       264  24   3   2
##   Choppy     140  13   3   1
##   Rough       58   8   0   0
##   Unknown     72   4   0   0
##   Very rough   8   1   0   0
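A crosstab like this is often easier to read with totals and row percentages. The sketch below rebuilds the table from the printed counts so that it runs on its own; in the report the same two calls could be applied directly to `table(Overview_deaths$WaterConditions, Overview_deaths$NumberDeaths)`.

```r
# Crosstab copied from the printed output
# (rows: water conditions, columns: number of deaths per record).
xt <- as.table(matrix(
  c(264, 24, 3, 2,
    140, 13, 3, 1,
     58,  8, 0, 0,
     72,  4, 0, 0,
      8,  1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    WaterConditions = c("Calm", "Choppy", "Rough", "Unknown", "Very rough"),
    NumberDeaths    = c("1", "2", "3", "4")
  )
))

# Add a "Sum" row and column.
with_totals <- addmargins(xt)

# Percent of records within each water condition (each row sums to 100).
row_shares <- round(100 * prop.table(xt, margin = 1), 1)
```

The margins also serve as a quick consistency check: the row and column totals should match the two single-variable counts.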
Now, the interesting part: the graphs.
df_WaterConditions <- Overview_deaths %>%
  group_by(WaterConditions) %>%
  filter(WaterConditions != "Unknown") %>%
  summarise(counts = n())
ggplot(df_WaterConditions, aes(x = WaterConditions, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(y = "Total Deaths", x = "Water Conditions", title = "Number of Fatalities by Water Conditions at Time of Incident")
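One thing to keep in mind when reading these bar heights: `n()` counts rows. If each row of `Overview_deaths` is one incident (which the `NumberDeaths` column suggests, though the report does not state it), the bars show fatal incidents, and the number of deaths is the sum of `NumberDeaths`. The sketch below rebuilds a stand-in dataset from the crosstab printed earlier and compares the two; `toy_deaths` and `by_condition` are illustrative names.

```r
library(dplyr)

# Cell counts copied from the WaterConditions x NumberDeaths crosstab.
cells <- data.frame(
  WaterConditions = rep(c("Calm", "Choppy", "Rough", "Unknown", "Very rough"), each = 4),
  NumberDeaths    = rep(1:4, times = 5),
  n_rows          = c(264, 24, 3, 2,  140, 13, 3, 1,  58, 8, 0, 0,  72, 4, 0, 0,  8, 1, 0, 0)
)

# Expand to one row per record, mimicking the shape of Overview_deaths.
toy_deaths <- cells[rep(seq_len(nrow(cells)), cells$n_rows),
                    c("WaterConditions", "NumberDeaths")]

by_condition <- toy_deaths %>%
  filter(WaterConditions != "Unknown") %>%
  group_by(WaterConditions) %>%
  summarise(
    incidents = n(),               # rows, i.e. what the bar chart plots
    deaths    = sum(NumberDeaths)  # people, if each row is one incident
  )
```

Under that assumption, plotting `deaths` instead of `incidents` would make a “Total Deaths” axis label literally true.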
df_drownings <- Overview_deaths %>%
  group_by(NumberDrownings) %>%
  summarise(counts = n())
ggplot(df_drownings, aes(x = NumberDrownings, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(y = "Total Deaths", x = "Number of Drownings Reported", title = "Number of Fatalities")
df_DayofWeek <- Overview_deaths %>%
  group_by(DayofWeek) %>%
  summarise(counts = n())
ggplot(df_DayofWeek, aes(x = DayofWeek, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(y = "Total Number of Deaths", x = "", title = "Number of Fatalities by Day of the Week")
df_CauseofDeath <- Overview_deaths %>%
  group_by(CauseCat) %>%
  summarise(counts = n())
ggplot(df_CauseofDeath, aes(x = CauseCat, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(y = "Total Number of Deaths", x = "Primary Cause of Accident", title = "Number of Fatalities by Accident Cause") +
  theme(axis.text.x = element_text(angle = 15, hjust = 1))
df_TimeCat <- Overview_deaths %>%
  group_by(TimeCat) %>%
  summarise(counts = n())
ggplot(df_TimeCat, aes(x = TimeCat, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  # TimeCat codes 1-13 map to the time windows below (13 = unknown)
  scale_x_discrete(limits = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
                   labels = c("12:00am to 2:30am", "2:31am to 4:30am", "4:31am to 6:30am", "6:31am to 8:30am",
                              "8:31am to 10:30am", "10:31am to 12:30pm", "12:31pm to 2:30pm", "2:31pm to 4:30pm",
                              "4:31pm to 6:30pm", "6:31pm to 8:30pm", "8:31pm to 10:30pm", "10:31pm to 11:59pm",
                              "Unknown")) +
  theme(axis.text.x = element_text(angle = 55, hjust = 1)) +
  labs(x = "Time Observed", y = "Number of Fatalities", title = "Number of Fatalities by Time of Day")
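The axis labels imply that `TimeCat` is coded 1 to 13, with 13 meaning unknown. How the report derives that code is not shown. One possible way, assuming the incident time is available as minutes after midnight (a hypothetical input), is to bin it with `cut()`; the function name `time_to_cat()` is made up for this sketch.

```r
# Map minutes after midnight (0-1439) to the 13 time-of-day categories
# used on the x-axis; anything missing or out of range becomes 13 ("Unknown").
time_to_cat <- function(minutes) {
  # Upper bounds of the windows: 2:30am = 150, then every 2 hours, then 11:59pm = 1439.
  breaks <- c(-1, seq(150, 1350, by = 120), 1439)
  code   <- cut(minutes, breaks = breaks, labels = FALSE)  # intervals are (lower, upper]
  ifelse(is.na(code), 13L, code)
}

# 12:00am, 2:30am, 2:31am, 12:30pm, 12:31pm, 11:59pm, and a missing time.
example_codes <- time_to_cat(c(0, 150, 151, 750, 751, 1439, NA))
```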
df_BodyofWaterType <- Overview_deaths %>%
  group_by(TypeOfBodyOfWater) %>%
  summarise(counts = n())
ggplot(df_BodyofWaterType, aes(x = TypeOfBodyOfWater, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(y = "Total Deaths", x = "Body of Water", title = "Number of Fatalities by Body of Water Type") +
  theme(axis.text.x = element_text(angle = 55, hjust = 1))
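The chunks above all repeat the same count-then-plot pattern, so in the spirit of writing code once and reusing it, the pattern could be wrapped in a small helper. This is a sketch only: `plot_counts()` is a hypothetical name, the generic y-axis label is a choice made here, and the toy data stands in for `Overview_deaths`.

```r
library(dplyr)
library(ggplot2)

# Count rows per level of `var` and draw a labelled bar chart.
plot_counts <- function(data, var, x_label, title) {
  data %>%
    count({{ var }}, name = "counts") %>%
    ggplot(aes(x = {{ var }}, y = counts)) +
    geom_col(fill = "dodgerblue") +
    geom_text(aes(label = counts), vjust = -0.3) +
    labs(x = x_label, y = "Count", title = title)
}

# Toy stand-in for Overview_deaths.
toy <- data.frame(DayofWeek = c("Saturday", "Saturday", "Sunday"))
p   <- plot_counts(toy, DayofWeek, "", "Number of Fatalities by Day of the Week")
```

Plot-specific tweaks such as rotated axis text can still be added afterwards, for example `p + theme(axis.text.x = element_text(angle = 55, hjust = 1))`.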
There are many listed primary accident causes (AccidentCause1). Let’s get a quick sense of which causes are most common before graphing them below.
table(Overview_deaths$AccidentCause1)
##
## Alcohol use Dam/lock
## 87 6
## Drug use Equipment failure
## 5 7
## Excessive speed Failure to vent
## 21 1
## Force of wake/wave Hazardous waters
## 11 58
## Hull failure Ignition of fuel or vapor
## 5 4
## Improper anchoring Improper loading
## 3 16
## Improper lookout Machinery failure
## 23 9
## Missing/inadequate aids to navigation Navigation rules violation
## 1 15
## Operator inattention Operator inexperience
## 50 42
## Other Overloading
## 51 15
## People on gunwale, bow, or transom Restricted vision
## 12 2
## Sharp turn Starting in gear
## 7 1
## Sudden medical condition Unknown
## 18 93
## Weather
## 38
Okay, given that, let’s graph the four most common specific causes (setting aside “Other” and “Unknown”).
df_MainCause <- Overview_deaths %>%
  group_by(AccidentCause1) %>%
  filter(AccidentCause1 %in% c("Alcohol use", "Operator inattention",
                               "Operator inexperience", "Hazardous waters")) %>%
  summarise(counts = n())
ggplot(df_MainCause, aes(x = AccidentCause1, y = counts)) +
  geom_bar(fill = "dodgerblue", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(title = "Primary Accident Cause of Fatalities", y = "Total Number of Deaths", x = "Primary Accident Cause")
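Since the report is meant to be rerun on new datasets, the list of causes could also be picked from the data instead of being typed in. The sketch below uses a subset of the counts printed in the table above and selects the four most frequent causes after dropping “Other” and “Unknown”; the object names are illustrative, and `slice_max()` requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Subset of the AccidentCause1 counts printed above.
causes <- c("Alcohol use" = 87, "Hazardous waters" = 58, "Operator inattention" = 50,
            "Operator inexperience" = 42, "Other" = 51, "Unknown" = 93,
            "Weather" = 38, "Improper lookout" = 23)

cause_counts <- data.frame(AccidentCause1 = names(causes),
                           counts = unname(causes))

# Drop the catch-all categories, then keep the four largest counts.
top_causes <- cause_counts %>%
  filter(!AccidentCause1 %in% c("Other", "Unknown")) %>%
  slice_max(counts, n = 4)
```

With the counts above, this returns the same four causes that the chunk filters on by name.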
To illustrate how ad hoc requests can be handled in code, I have created the example below. Let’s say we want to see how much each state reported in total damages.
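# Sum reported damages within each state (count() would only tally rows, not add up amounts)
Total_Damage_State <- Overview %>%
  filter(TotalDamage > 0) %>%
  group_by(State) %>%
  summarise(TotalDamage = sum(TotalDamage))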
# This will quickly inform us of the states with the highest damage amounts recorded
Total_Damage_State_Threshold <- Total_Damage_State %>%
  # filter(TotalDamage >= 10000) %>%
  ggplot(aes(x = State, y = TotalDamage)) +
  geom_col(fill = "dodgerblue") +
  scale_y_continuous(labels = scales::comma)
Total_Damage_State_Threshold1 <- Total_Damage_State %>%
  # filter(TotalDamage >= 10000) %>%
  ggplot(aes(x = reorder(State, -TotalDamage), y = TotalDamage)) +
  geom_col(fill = "dodgerblue") +  # a single bar layer; geom_col() already uses stat = "identity"
  scale_y_continuous(labels = scales::comma) +
  labs(x = "State (Acronym)", y = "Total Damages Reported", title = "Total Damages Reported by State") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
Total_Damage_State_Threshold1
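When totalling a numeric column per group, note that tallying rows and summing amounts give different results. A toy comparison with made-up values (not BARD data):

```r
library(dplyr)

# Made-up damage amounts for two states.
toy <- data.frame(State       = c("FL", "FL", "FL", "TX"),
                  TotalDamage = c(500, 500, 2000, 300))

# count() tallies how many rows share each (amount, state) pair ...
tallied <- toy %>% count(TotalDamage, State)

# ... whereas a per-state total needs the amounts added up.
summed <- toy %>%
  group_by(State) %>%
  summarise(TotalDamage = sum(TotalDamage))
```

Here `tallied` has three rows, because the two $500 records in FL collapse into one row with `n = 2`, while `summed` has one row per state holding that state’s total.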
Lastly, given that we have robust coordinate data for fatalities in BARD, let’s get a quick glimpse of what this looks like once plotted.