Data Wrangling and Combining (from project 3)

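library(tidyverse)  # readr, stringr, dplyr, ggplot2 (lubridate is core in tidyverse >= 2.0)
library(lubridate)  # ymd(); attached explicitly for older tidyverse versions
library(plotly)     # ggplotly()
library(DT)         # datatable()

read_disaster_data <- function(datapath) {
  # The first line of each file ends with the year; keep it for the Year column
  firstline <- read_lines(datapath, n_max = 1)
  year1 <- str_split_i(firstline, " ", -1)

  # Skip the two preamble lines so that line 3 supplies the column headers
  disasters <- read_csv(datapath, skip = 2, show_col_types = FALSE)

  disasters |>
    mutate(Year = as.numeric(year1),
           # Drop the first parenthesised part of each event name
           Name = str_remove(Name, "\\s*\\(.*?\\)"),
           `Begin Date` = ymd(`Begin Date`),
           `End Date` = ymd(`End Date`),
           # Convert costs to billions (the later rename relies on this unit)
           `CPI-Adjusted Cost` = `CPI-Adjusted Cost` / 1000,
           `Unadjusted Cost` = `Unadjusted Cost` / 1000) |>
    # Length of each event, as a difftime in days
    mutate(Duration = `End Date` - `Begin Date`)
}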
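To make the steps inside `read_disaster_data()` concrete, here is a base-R sketch (swapping `readLines()`, `read.csv()`, `sub()` and `as.Date()` in for the tidyverse calls) that writes a small hypothetical file in the assumed layout: a first line ending in the year, a second line to discard, then the column headers. The file contents, event names, and values are invented for illustration, and the `/ 1000` step assumes the raw costs are in millions of dollars, which is what the later rename to "(in billions)" implies.

```r
# Base-R sketch of the read_disaster_data() steps on a HYPOTHETICAL file.
# The layout (year at the end of line 1, a throwaway line 2, headers on
# line 3) and every value below are assumptions made for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Billion-Dollar Disaster Events 2022",
  "Hypothetical note line",
  "Name,Disaster,Begin Date,End Date,CPI-Adjusted Cost,Unadjusted Cost",
  "Hurricane X (September 2022),Tropical Cyclone,20220923,20221002,2500,2400",
  "Storm Y (Spring 2022),Severe Storm,20220405,20220407,1200.5,1150"
), path)

# Year: the last space-separated token of the first line
first_line <- readLines(path, n = 1)
tokens <- strsplit(first_line, " ", fixed = TRUE)[[1]]
year <- as.numeric(tokens[length(tokens)])

# skip = 2 drops the preamble so line 3 becomes the header row;
# check.names = FALSE keeps names such as "Begin Date" unchanged
df <- read.csv(path, skip = 2, check.names = FALSE)

df$Year <- year
# Remove the first parenthesised part (lazy match, like str_remove())
df$Name <- sub("\\s*\\(.*?\\)", "", df$Name, perl = TRUE)
# Dates are stored as YYYYMMDD numbers in this hypothetical file
df$`Begin Date` <- as.Date(as.character(df$`Begin Date`), format = "%Y%m%d")
df$`End Date` <- as.Date(as.character(df$`End Date`), format = "%Y%m%d")
# Millions -> billions (assumed source unit)
df$`CPI-Adjusted Cost` <- df$`CPI-Adjusted Cost` / 1000
df$`Unadjusted Cost` <- df$`Unadjusted Cost` / 1000
# Subtracting two Dates gives a difftime measured in days
df$Duration <- df$`End Date` - df$`Begin Date`

print(df)
```

Note that the subtraction counts the days between the two dates, so an event that begins and ends on the same day has a duration of 0.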

# List the paths of every CSV file in the data folder
paths <- list.files("data", pattern = "[.]csv$", full.names = TRUE)

# Preallocate a list; each entry will hold the tibble for one year's file
tibblelist <- vector("list", length = length(paths))

# Read and clean each file (seq_along() is safe even when paths is empty)
for (i in seq_along(paths)) {
  tibblelist[[i]] <- read_disaster_data(paths[[i]])
}

# Stack all the yearly tibbles into one
combined_data <- bind_rows(tibblelist)

combined_data <- combined_data |>
  rename(`CPI-Adjusted Cost (in billions)` = `CPI-Adjusted Cost`,
         `Unadjusted Cost (in billions)` = `Unadjusted Cost`)

write_csv(combined_data, "combined_data.csv")
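The preallocated loop followed by `bind_rows()` can also be written as "map a reader over the paths, then stack the results". Below is a base-R sketch of that pattern, swapping `lapply()` and `do.call(rbind, ...)` in for the loop and `bind_rows()`. The reader `read_one()` and the two files it reads are hypothetical stand-ins for `read_disaster_data()` and the files in `data/`; with the tidyverse loaded, `bind_rows(lapply(paths, read_disaster_data))` expresses the same idea.

```r
# Base-R sketch: map a reader over every CSV path, then stack the results.
# read_one() and the files written below are HYPOTHETICAL stand-ins.
data_dir <- file.path(tempdir(), "data_sketch")
dir.create(data_dir, showWarnings = FALSE)

# Write two small files, one per year
for (yr in c(2021, 2022)) {
  write.csv(data.frame(Name = paste("Event", yr), Cost = yr / 1000),
            file.path(data_dir, paste0("events-", yr, ".csv")),
            row.names = FALSE)
}

read_one <- function(path) {
  df <- read.csv(path)
  df$Source <- basename(path)  # record which file each row came from
  df
}

paths <- list.files(data_dir, pattern = "[.]csv$", full.names = TRUE)

# lapply() returns one data frame per path; over an empty vector it returns
# an empty list, whereas a for loop over 1:length(x) would run for i = 1, 0
tibble_list <- lapply(paths, read_one)

# rbind() needs identical column names in every piece; dplyr::bind_rows()
# instead fills columns missing from some pieces with NA
combined <- do.call(rbind, tibble_list)
print(combined)
```

The `Source` column is optional, but it makes it easy to trace a suspicious row back to the file it came from.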

Stacked Bar Chart

plot1 <- ggplot(combined_data) +
  geom_bar(aes(x=Year, fill=Disaster)) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(y="Number of Events",
       title="Frequency of Disasters by Year") +
  scale_fill_viridis_d(option="C") +
  scale_x_continuous(breaks = seq(1980, 2024, by=4))

ggplotly(plot1)
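With only an `x` aesthetic, `geom_bar()` uses `stat = "count"`, so each bar's height is the number of rows (events) for that year, and `fill = Disaster` splits that count into stacked segments; these are the counts `ggplotly()` shows on hover. A base-R sketch of the same tabulation on a few invented rows (the tidyverse equivalent would be `dplyr::count(combined_data, Year, Disaster)`):

```r
# HYPOTHETICAL rows standing in for combined_data (one row per event)
events <- data.frame(
  Year     = c(2020, 2020, 2020, 2021, 2021),
  Disaster = c("Flooding", "Severe Storm", "Severe Storm",
               "Drought", "Severe Storm")
)

# Counts per (Year, Disaster): the height of each stacked segment
segment_counts <- table(events$Year, events$Disaster)
print(segment_counts)

# Counts per Year: the total height of each bar
bar_totals <- table(events$Year)
print(bar_totals)
```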

Line Plot

cost_per_year <- combined_data |>
  group_by(Year) |>
  summarize(cost = sum(`CPI-Adjusted Cost (in billions)`))

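plot2 <- ggplot(cost_per_year, aes(x=Year, y=round(cost, 2))) +
  # ggplot2 ignores the `text` aesthetic (and may warn about it),
  # but ggplotly() uses it as the hover tooltip
  geom_point(color="red",
             aes(text=paste("<b>Year:</b>", Year,
                            "<br><b>Cost (in billions):</b>", round(cost, 2)))) +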
  geom_line(color="red") +
  theme_minimal() +
  labs(y="Cost in Billions",
       title="Total Cost of Disasters by Year") +
  scale_x_continuous(breaks = seq(1980, 2024, by=4))

ggplotly(plot2, tooltip = "text")
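`cost_per_year` collapses `combined_data` to one row per year holding the summed CPI-adjusted cost, and each point's tooltip is built from that row. A base-R sketch of the same reduction with `tapply()` on invented per-event costs, including the tooltip string in the same HTML format the plot passes to `ggplotly()`:

```r
# HYPOTHETICAL per-event costs in billions (stand-ins for combined_data)
events <- data.frame(
  Year = c(2020, 2020, 2021, 2021, 2021),
  Cost = c(1.25, 2.5, 0.5, 1, 3)
)

# One total per year, like group_by(Year) |> summarize(cost = sum(...))
totals <- tapply(events$Cost, events$Year, sum)
cost_per_year <- data.frame(Year = as.numeric(names(totals)),
                            cost = as.vector(totals))

# Tooltip text in the same format used for the text aesthetic above
cost_per_year$text <- paste("<b>Year:</b>", cost_per_year$Year,
                            "<br><b>Cost (in billions):</b>",
                            round(cost_per_year$cost, 2))
print(cost_per_year)
```

As with `summarize()`, `sum()` returns `NA` for a year if any of its costs is missing; passing `na.rm = TRUE` would change that, if the data has gaps.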

datatable(combined_data,
          rownames = FALSE,
          filter = "top")

Interactivity

I decided to make the plots from Portfolio 3 interactive because, when I recreated them, I wasn't happy leaving them static. In the static versions I had to get creative to show the counts in a way that was easy for the user to read: putting the count above each bar helped, but it did not look clean. The line plot had a label for each of its 44 points, which looked cluttered even after adjusting the font size. Making the plots interactive meant that I no longer had to print a value above each bar or point, and that I did not have to split up the bar charts, since all the information is available by hovering over individual pieces.

Although the original visual is interactive, it condenses all the information into one plot and displays a lot to the user at once, which felt overwhelming. Splitting it into two separate visuals conveys the same message but is easier on the eyes.

I also think it is worthwhile to include the complete dataset as a filterable table, because it holds a lot of specific information that the plots do not show, such as the name of each event, the number of deaths, and the time period in which it took place. The table gives the user a chance to explore the data and perform analysis that is not possible from the plots alone.

proj4-writeup

Shalim Montes
