library(tidyverse)
tx_injuries <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-10/tx_injuries.csv") %>%
  # Group parks by franchise: e.g., Six Flags parks in different locations all become "Six Flag"
  mutate(park = case_when(
    str_detect(name_of_operation, regex("Six Flag", ignore_case = TRUE)) ~ "Six Flag",
    str_detect(name_of_operation, regex("Splashtown", ignore_case = TRUE)) ~ "Splashtown",
    str_detect(name_of_operation, regex("Skygroup", ignore_case = TRUE)) ~ "Skygroup",
    str_detect(name_of_operation, regex("Schlitterbahn", ignore_case = TRUE)) ~ "Schlitterbahn",
    str_detect(name_of_operation, regex("Typhoon", ignore_case = TRUE)) ~ "Typhoon",
    TRUE ~ name_of_operation
  )) %>%
  # Lump the least common factor levels into "Other"
  mutate(park = fct_lump(park, 5),
         alleged_injury = fct_lump(alleged_injury, 4)) %>%
  # Filter out "Other" and NA in park and alleged_injury
  filter(park != "Other",
         alleged_injury != "Other" & !is.na(alleged_injury))
Hint: You can choose any data you like but can’t take one that is already taken by other groups.
The data set records amusement park injuries; we chose the one focused on Texas.
injury_report_rec = Unique Record ID
name_of_operation = Company name
city = City
st = State (all TX)
injury_date = Injury date - note there are some different formats
ride_name = Ride Name
serial_no = Serial number of ride
gender = Gender of the injured individual
age = Age of the injured individual
body_part = Body part injured
alleged_injury = Alleged injury - type of injury
cause_of_injury = Approximate cause of the injury (free text)
other = Anecdotal information in addition to cause of injury
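The note on injury_date means the column cannot be parsed with a single format string. A minimal sketch of one way to handle it, assuming (this is not confirmed by the data dictionary above) that the values are either month/day/year strings or Excel-style serial day numbers; `parse_injury_date` is a hypothetical helper name:

```r
# Hypothetical helper: parse a character vector that mixes two date formats.
# Assumption: values are either "m/d/Y" strings or Excel serial day numbers.
parse_injury_date <- function(x) {
  x <- as.character(x)
  is_serial <- grepl("^[0-9]+$", x)  # all-digit strings are treated as serials
  out <- as.Date(rep(NA_character_, length(x)))
  # Excel serial numbers count days from 1899-12-30
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Anything that is not m/d/Y (e.g. "n/a") becomes NA
  out[!is_serial] <- as.Date(x[!is_serial], format = "%m/%d/%Y")
  out
}

# Made-up example values, not taken from the real data
parsed <- parse_injury_date(c("2/12/2013", "41275", "n/a", NA))
print(parsed)
```

If the assumption holds, the helper could be applied with mutate(injury_date = parse_injury_date(injury_date)); values it cannot parse stay NA and should be inspected by hand.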
Hint: Create at least two plots. 4.3 SCATTERPLOT DOES NOT WORK
tx_injuries %>%
  mutate(park = factor(park, levels = c("Splashtown","Skygroup","Schlitterbahn","Typhoon","Six Flag"))) %>%
  ggplot(aes(park, fill = alleged_injury)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Amusement Parks in Texas by Injuries",
       subtitle = "Top Five Most Popular Amusement Parks by Four Most Common Types of Injuries",
       caption = "source: https://data.world/amillerbernd/texas-amusement-park-accidents/workspace/file?filename=Amusement-Park-Injuries-xlsxCleaned.xls",
       fill = "Type of Injuries",
       y = "Number of Injuries",
       x = "Amusement Parks")
The first plot shows the top five most popular amusement parks in Texas. Which park is the most dangerous as indicated by the data? What is your rationale? Can you tell by simply looking at this plot? Or what other information do you need to answer this question?
-Most dangerous park is likely Six Flags
-Only park that has reports of contusions, dislocations, lacerations, and pain
-The high number is in part due to the fact that Six Flags is a larger park than the rest
-Still the most dangerous because the data is only from one Six Flags location
HOW WE REACHED THIS CONCLUSION:
-The chart has a legend that gives a color to each type of injury, which is how we can see Six Flags has each type of injury reported
-The horizontal axis of the chart shows the number of injuries
-2nd highest was Typhoon with 24 injuries; Six Flags had 90
-Even though Six Flags is a larger park with more rides that can result in injury, the number is so much higher that it is likely still the most dangerous park. Typhoon is most likely a close second
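The totals quoted above can be computed directly instead of being read off the bars. A minimal sketch, assuming a data frame with the park column created earlier; the helper name `injuries_per_park` and the toy rows are made up for illustration and are not the real data:

```r
library(dplyr)

# Hypothetical helper: number of injury reports per park, largest first.
injuries_per_park <- function(df) {
  df %>%
    count(park, name = "n_injuries", sort = TRUE)
}

# Toy rows for illustration only (not the real data)
toy <- tibble(
  park = c("Six Flag", "Six Flag", "Six Flag", "Typhoon", "Typhoon", "Splashtown"),
  alleged_injury = c("Pain", "Laceration", "Pain", "Contusion", "Pain", "Pain")
)

park_counts <- injuries_per_park(toy)
print(park_counts)
```

Calling injuries_per_park(tx_injuries) on the real data should return the per-park totals behind the bars, which makes the comparison between parks easier to state exactly.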
tx_injuries %>%
  mutate(park = factor(park, levels = c("Splashtown","Skygroup","Schlitterbahn","Typhoon","Six Flag"))) %>%
  ggplot(aes(park, fill = alleged_injury)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(title = "Amusement Parks in Texas by Injuries",
       subtitle = "Top Five Most Popular Amusement Parks by Four Most Common Types of Injuries",
       caption = "source: https://data.world/amillerbernd/texas-amusement-park-accidents/workspace/file?filename=Amusement-Park-Injuries-xlsxCleaned.xls",
       fill = "Type of Injuries",
       y = "Proportion",
       x = "Amusement Parks")
The second plot shows the proportion of each type of injury per amusement park. Is there any information that might be useful to the park management? What would you recommend?
-Useful because they can improve safety based on what types of injuries are common
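The proportions shown in the second plot can also be tabulated, which may be easier for management to act on than a stacked bar. A minimal sketch, assuming the park and alleged_injury columns from the earlier code; `injury_shares` and the toy rows are hypothetical and not the real data:

```r
library(dplyr)

# Hypothetical helper: share of each injury type within each park.
injury_shares <- function(df) {
  df %>%
    count(park, alleged_injury) %>%
    group_by(park) %>%
    mutate(share = n / sum(n)) %>%  # shares sum to 1 within each park
    ungroup()
}

# Toy rows for illustration only (not the real data)
toy <- tibble(
  park = c("Six Flag", "Six Flag", "Six Flag", "Typhoon", "Typhoon"),
  alleged_injury = c("Pain", "Laceration", "Pain", "Contusion", "Pain")
)

shares <- injury_shares(toy)
print(shares)
```

Each row gives one park and injury type with its count and within-park share, which are the same quantities that geom_bar(position = "fill") draws.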
WHAT MAY BE MORE HELPFUL:
-A plot that shows what particular rides have the most injuries
-They could use this information to focus on the rides that result in injuries, which would do a better job of improving overall safety
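The ride-level plot suggested above could be sketched as follows. This is only one possible approach, assuming the ride_name column from the data dictionary; the helper names `top_rides` and `plot_top_rides`, the default of ten rides, and the toy rows are all made up for illustration:

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: the rides with the most injury reports.
top_rides <- function(df, n_rides = 10) {
  df %>%
    filter(!is.na(ride_name)) %>%
    count(ride_name, name = "n_injuries", sort = TRUE) %>%
    slice_head(n = n_rides)
}

# Hypothetical helper: horizontal bar chart of those rides.
plot_top_rides <- function(df, n_rides = 10) {
  top_rides(df, n_rides) %>%
    ggplot(aes(reorder(ride_name, n_injuries), n_injuries)) +
    geom_col() +
    coord_flip() +
    labs(x = "Ride", y = "Number of Injuries")
}

# Toy rows for illustration only (not the real data)
toy <- tibble(
  ride_name = c("Coaster A", "Coaster A", "Slide B", "Coaster A", "Slide B", "Wave Pool", NA)
)

ride_counts <- top_rides(toy, n_rides = 2)
print(ride_counts)
ride_plot <- plot_top_rides(toy, n_rides = 2)
```

On the real data, plot_top_rides(tx_injuries) would be the call to try. Ride names are free text, so the same ride may appear under several spellings and some cleaning may be needed before the counts are reliable.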