Import data

library(tidyverse)
tx_injuries <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-10/tx_injuries.csv") %>%
  # Group parks by franchise: e.g., every Six Flags location is labeled "Six Flag"
  mutate(park = case_when(
    str_detect(name_of_operation, regex("Six Flag", ignore_case = TRUE)) ~ "Six Flag",
    str_detect(name_of_operation, regex("Splashtown", ignore_case = TRUE)) ~ "Splashtown",
    str_detect(name_of_operation, regex("Skygroup", ignore_case = TRUE)) ~ "Skygroup",
    str_detect(name_of_operation, regex("Schlitterbahn", ignore_case = TRUE)) ~ "Schlitterbahn",
    str_detect(name_of_operation, regex("Typhoon", ignore_case = TRUE)) ~ "Typhoon",
    TRUE ~ name_of_operation
  )) %>%
  # Lump least common factor levels into Other
  mutate(park = fct_lump(park, 5),
         alleged_injury = fct_lump(alleged_injury, 4)) %>%
  # Drop the Other level and NA values in both park and alleged_injury
  filter(park != "Other",
         alleged_injury != "Other" & !is.na(alleged_injury))

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

Description of the data and definition of variables

The data records amusement park injuries. We chose the dataset that covers Texas.

injury_report_rec = Unique Record ID

name_of_operation = Company name

city = City

st = State (all TX)

injury_date = Injury date - note there are some different formats

ride_name = Ride Name

serial_no = Serial number of ride

gender = Gender of the injured individual

age = Age of the injured individual

body_part = Body part injured

alleged_injury = Alleged injury - type of injury

cause_of_injury = Approximate cause of the injury (free text)

other = Anecdotal information in addition to cause of injury
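The note on injury_date says the column mixes formats. Below is a minimal sketch of one way to standardize it. It assumes, without having confirmed this against the file, that values are either month/day/year text or Excel serial day numbers. The helper parse_injury_date and the sample values are hypothetical and not part of the original analysis.

```r
# Hypothetical sample values; the real column may contain other formats.
raw_dates <- c("2/12/2013", "11/3/2014", "41690", "n/a")

parse_injury_date <- function(x) {
  # Text such as "2/12/2013"; anything that does not match becomes NA.
  out <- as.Date(x, format = "%m/%d/%Y")
  # Assumption: all-digit strings are Excel serial day numbers,
  # which count days from 1899-12-30.
  is_serial <- grepl("^[0-9]+$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  out
}

parse_injury_date(raw_dates)
```

If this assumption holds for the real data, the same helper could be applied inside mutate(), for example mutate(injury_date = parse_injury_date(injury_date)). Entries that match neither pattern stay NA and should be inspected by hand.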

Visualize data

Hint: Create at least two plots. 4.3 SCATTERPLOT DOES NOT WORK

tx_injuries %>%
  mutate(park = factor(park, levels = c("Splashtown","Skygroup","Schlitterbahn","Typhoon","Six Flag"))) %>%
  ggplot(aes(park, fill = alleged_injury)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Amusement Parks in Texas by Injuries",
       subtitle = "Top Five Most Popular Amusement Parks by Four Most Common Types of Injuries",
       caption = "source: https://data.world/amillerbernd/texas-amusement-park-accidents/workspace/file?filename=Amusement-Park-Injuries-xlsxCleaned.xls",
       fill = "Type of Injuries",
       y = "Number of Injuries",
       x = "Amusement Parks") 

The first plot shows the top five most popular amusement parks in Texas. Which park is the most dangerous as indicated by the data? What is your rationale? Can you tell by simply looking at this plot? If not, what other information do you need to answer this question?

-Most dangerous park is likely Six Flags

-Only park that has reports of contusions, dislocations, lacerations, and pain

-The high number is partly because Six Flags is a larger park than the rest

-Still likely the most dangerous, but the "Six Flag" bar combines every Six Flags location in the data (see the import code), not a single park

HOW WE REACHED THIS CONCLUSION:

-The chart has a legend that gives a color to each type of injury, which is how we can see Six Flags has each type of injury reported

-The horizontal axis of the chart shows the number of injuries

-Six Flags had 90 injuries; the second highest was Typhoon with 24

-Even though Six Flags is a larger park with more rides that can result in injury, its count is so much higher that it is likely still the most dangerous park. Typhoon most likely ranks second
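The counts read off the chart can also be tabulated directly. This is a small sketch, not part of the original analysis. The helper injuries_by_park is a hypothetical name, and it assumes a data frame with a park column like the one created in the import step. It reports raw counts and each park's share of the reports. It cannot adjust for attendance, because the dataset has no visitor numbers.

```r
library(dplyr)

# Count injury reports per park and express each count as a share of the total.
injuries_by_park <- function(injuries) {
  injuries %>%
    count(park, name = "n_injuries", sort = TRUE) %>%
    mutate(share = n_injuries / sum(n_injuries))
}
```

Calling injuries_by_park(tx_injuries) would list the parks from most to fewest reports. Judging which park is the most dangerous would still require attendance figures from another source.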

tx_injuries %>%
  mutate(park = factor(park, levels = c("Splashtown","Skygroup","Schlitterbahn","Typhoon","Six Flag"))) %>%
  ggplot(aes(park, fill = alleged_injury)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(title = "Amusement Parks in Texas by Injuries",
       subtitle = "Top Five Most Popular Amusement Parks by Four Most Common Types of Injuries",
       caption = "source: https://data.world/amillerbernd/texas-amusement-park-accidents/workspace/file?filename=Amusement-Park-Injuries-xlsxCleaned.xls",
       fill = "Type of Injuries",
       y = "Proportion",
       x = "Amusement Parks") 

The second plot shows the proportion of each type of injury per amusement park. Is there any information that might be useful to the park management? What would you recommend?

-Useful because they can improve safety based on what types of injuries are common

WHAT MAY BE MORE HELPFUL:

-A plot that shows what particular rides have the most injuries

-Management could use this information to focus on the rides that cause the most injuries, which would do more to improve overall safety
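The ride-level plot suggested above could look roughly like the sketch below. It is an illustration under assumptions, not a finished analysis. The helper names top_rides and plot_top_rides are hypothetical, and the code assumes the data frame has the park and ride_name columns described earlier.

```r
library(dplyr)
library(ggplot2)

# Keep the rides with the most injury reports
# (ties can return more than n_rides rows).
top_rides <- function(injuries, n_rides = 10) {
  injuries %>%
    filter(!is.na(ride_name)) %>%
    count(park, ride_name, name = "n_injuries") %>%
    slice_max(n_injuries, n = n_rides)
}

# Horizontal bar chart of those rides, colored by park.
plot_top_rides <- function(injuries, n_rides = 10) {
  top_rides(injuries, n_rides) %>%
    ggplot(aes(reorder(ride_name, n_injuries), n_injuries, fill = park)) +
    geom_col() +
    coord_flip() +
    labs(x = "Ride", y = "Number of Injuries", fill = "Amusement Park")
}
```

Calling plot_top_rides(tx_injuries) would draw the chart for the filtered data. Ride names are free text, so spelling variants of the same ride may need to be merged before the counts can be trusted.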

Hide the messages, but display the code and its results on the webpage.

List names of all group members (both first and last name) at the top of the webpage.

Use the correct slug.