Introduction

Overview

Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are currently being collected for areas all over the world, and policy decisions are based on the most recent analyses of data extracted from huge online repositories. Because the electronic production and storage of information keeps growing, analysts often face a feeling of “information overload” or inundation during quantitative decision making. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. In addition, data scientists are called upon to present data syntheses and develop questions or ideas based on their data exploration. This lab should take you through the major steps in data exploration and presentation.

Objective

The objective of this lab is to survey the available data, plan, design, and create an information dashboard/presentation that not only explores the data but helps you develop questions based on that data exploration. To accomplish this task you will have to complete a number of steps:

  1. Identify what information interests you about climate change.
  2. Find, collect, organize, and summarize the data necessary to create your data exploration plan.
  3. Design and create the most appropriate visualizations (no fewer than 5) to explore the data and present that information.
  4. Finally, organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
  5. Develop four questions or ideas about climate change from your visualizations.

Dates & Deliverables

Upload an HTML file to Canvas. The dashboard must include the source_code = TRUE or source_code = embed parameter; use whichever one displays the source code.

The due date for this lab is 17 November 2025 at 23h59. This assignment is worth 75 points, 3x a normal homework; the additional time should allow you to spend the necessary effort on this assignment.

You are welcome to work in groups of \(\leq 2\) people. However, each person in a group must submit their own HTML file on Canvas for grading!

Methods Help

There are lots of places to get climate data to answer your questions. The simplest is the NOAA National Centers for Environmental Information (https://www.ncdc.noaa.gov/), which hosts all kinds of data (regional, global, marine). The front page of the NOAA website also links to other websites that have climate data, such as (https://www.climate.gov/), (https://www.weather.gov/), and (https://www.drought.gov/drought/). You don’t have to use all of them, but it might be helpful to browse them to get ideas for developing your questions.

Alternatively, and more professionally, there are tons of packages that allow you to access data from R. See here for a great primer on accessing NOAA data with ‘R’. It is also a good introduction to API keys and their use.

Annual Counts

Annual Counts

This step plot shows how the number of tornadoes changes from one year to the next, making sudden jumps or drops much easier to see. Unlike a bar chart, the step pattern clearly highlights year-to-year variability and reveals long-term shifts in tornado activity across the United States.

U.S. Tornado Counts per Year (1950–2024)

Seasonality

Seasonality

This chart compares the long-term monthly average of U.S. tornadoes with the average from the last 10 years. Tornado activity peaks in late spring and early summer, while winter has very few events. The recent pattern closely matches the long-term trend, showing that tornado season remains centered in April through June.

Monthly Tornado Seasonality in the U.S.

Recent Tracks Map

Recent Tracks Map

This map shows how many tornadoes each state experienced in the last five years. Most recent tornado activity is concentrated in the central Plains and Southeast, especially in states like Texas, Oklahoma, Kansas, Alabama, and Mississippi. Western states have far fewer tornadoes. The map clearly highlights where tornado risks have been highest in recent years.

Recent Tornado Tracks by State (Last 5 Years)

Intensity

Intensity

This interactive dygraph shows how the average intensity of U.S. tornadoes has changed since 1950. The smooth line helps reveal long-term shifts, while the range selector allows zooming into specific periods. Although year-to-year values fluctuate, the visualization highlights patterns in the strength of tornadoes over time.

Average Tornado Intensity Over Time (1950–2024)

Casualties

Casualties

This chart shows the total tornado-related casualties each year from 1950 to 2024. The numbers change a lot from year to year, with some years showing major spikes due to large or destructive tornado outbreaks. While many years have low casualty totals, the data highlights how the human impact of tornadoes can vary widely depending on where and how severe the storms are.

Tornado Casualties in the United States (1950–2024)

Questions and Conclusion

Questions

  1. Is tornado activity increasing or decreasing over time?

Answer: Tornado counts vary from year to year, but reporting has generally increased over the long term. This may reflect improved detection technologies, better reporting practices, and possible influences from changing climate conditions.

  2. Does tornado activity follow a seasonal pattern?

Answer: Yes. Tornadoes show a strong seasonal cycle, with the highest activity in April–June. Winter months consistently show the lowest activity, and recent years follow the same overall pattern.

  3. Which regions experienced the most tornadoes in the last five years?

Answer: The highest activity occurred across the central Plains and the Southeast, especially in Texas, Oklahoma, Kansas, Alabama, and Mississippi. These areas include well-known tornado regions such as Tornado Alley and Dixie Alley.

  4. Are most tornadoes high-intensity events?

Answer: No. Most tornadoes fall into lower intensity categories, with strong and violent tornadoes being relatively rare. However, when high-intensity tornadoes do occur, they contribute disproportionately to damage and casualties.

Conclusion

This dashboard highlights several clear patterns in U.S. tornado behavior. Annual and seasonal analyses reveal when tornadoes are most frequent and how activity fluctuates over time. Geographic patterns show that the central Plains and Southeast consistently face the greatest tornado risk. Intensity and casualty data further demonstrate that while most tornadoes are weak, stronger events, though rare, cause significant human and economic impacts. Together, these visualizations provide an evidence-based understanding of tornado trends and support better preparedness, risk assessment, and future climate-related research.

---
title: "Lab 2 Data Exploration and Visualization Climate Change"
author: "Mounika Bhandari" 
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(sf)
library(httr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)
library(maps)
library(scales)

knitr::opts_chunk$set(
  echo = FALSE,     
  message = FALSE,  
  warning = FALSE,
  cache = TRUE
)

```

{data-width=650}
-----------------------------------------------------------------------

# Table of Contents {.sidebar}

* Introduction

* Annual Counts

* Seasonality

* Recent Tracks Map

* Intensity

* Casualties

* Questions and Conclusion

# **Introduction**

Row {data-height=230}
-------------------------------------

### Overview

Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are currently being collected for areas all over the world, and policy decisions are based on the most recent analyses of data extracted from huge online repositories. Because the electronic production and storage of information keeps growing, analysts often face a feeling of “information overload” or inundation during quantitative decision making. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. In addition, data scientists are called upon to present data syntheses and develop questions or ideas based on their data exploration. This lab should take you through the major steps in data exploration and presentation.

### Objective

The objective of this lab is to survey the available data, plan, design, and create an information dashboard/presentation that not only explores the data but helps you develop questions based on that data exploration. To accomplish this task you will have to complete a number of steps:

1. Identify what information interests you about climate change. 
2. Find, collect, organize, and summarize the data necessary to create your data exploration plan.
3. Design and create the most appropriate visualizations (no fewer than 5) to explore the data and present that information.
4. Finally, organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
5. Develop four questions or ideas about climate change from your visualizations.

### Dates & Deliverables

Upload an HTML file to Canvas. The dashboard must include the source_code = TRUE or source_code = embed parameter; use whichever one displays the source code.

The due date for this lab is 17 November 2025 at 23h59. This assignment is worth 75 points, 3x a normal homework; the additional time should allow you to spend the necessary effort on this assignment.

You are welcome to work in groups of $\leq 2$ people. However, each person in a group must submit their own HTML file on Canvas for grading!

### Methods Help

There are lots of places to get climate data to answer your questions. The simplest is the NOAA National Centers for Environmental Information (<https://www.ncdc.noaa.gov/>), which hosts all kinds of data (regional, global, marine). The front page of the NOAA website also links to other websites that have climate data, such as (<https://www.climate.gov/>), (<https://www.weather.gov/>), and (<https://www.drought.gov/drought/>). You don’t have to use all of them, but it might be helpful to browse them to get ideas for developing your questions.

Alternatively, and more professionally, there are tons of packages that allow you to access data from R. See here for a great primer on accessing NOAA data with ‘R’. It is also a good introduction to API keys and their use.
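
Since API keys come up here, the following is a minimal sketch of one common way to keep a key out of the dashboard source: reading it from an environment variable. The variable name `NOAA_API_TOKEN` and the helper `get_api_token()` are assumptions made for illustration and are not part of any NOAA package.

```r
# Hypothetical helper: read an API token from an environment variable.
# The variable name "NOAA_API_TOKEN" is an assumption for this sketch.
get_api_token <- function(var = "NOAA_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is not set. ",
         "Add it to ~/.Renviron and restart R.", call. = FALSE)
  }
  token
}

# Demo only: set a placeholder value, then read it back
Sys.setenv(NOAA_API_TOKEN = "demo-token-not-real")
token <- get_api_token()
```

Keeping the key in `~/.Renviron` instead of the `.Rmd` file matters for this lab in particular, because the dashboard embeds its own source code.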



```{r load_spc_tornado_paths_fixed}
library(sf)
library(dplyr)
library(lubridate)

url <- "https://www.spc.noaa.gov/gis/svrgis/zipped/1950-2024-torn-aspath.zip"

zip_path <- tempfile(fileext = ".zip")
download.file(url, zip_path, mode = "wb", quiet = TRUE)
unz_dir <- file.path(tempdir(), "spc_tornado_unz")
dir.create(unz_dir, showWarnings = FALSE)
unzip(zip_path, exdir = unz_dir)

# Locate the shapefile inside the extracted archive and fail early if none is found
shp <- list.files(unz_dir, pattern = "\\.shp$", full.names = TRUE, recursive = TRUE)
stopifnot(length(shp) >= 1)
tw <- suppressWarnings(sf::st_read(shp[1], quiet = TRUE))

# ---- flexible column mapping ----
names(tw) <- tolower(names(tw))

# Find likely columns
year_col  <- intersect(c("yr", "year"), names(tw))
month_col <- intersect(c("mo", "month"), names(tw))
mag_col   <- intersect(c("mag", "f_scale", "ef_scale"), names(tw))
inj_col   <- intersect(c("inj", "injuries"), names(tw))
fat_col   <- intersect(c("fat", "fatalities"), names(tw))

# Mutate safely
tw <- tw %>%
  mutate(
    year  = if (length(year_col)) as.integer(.data[[year_col[1]]]) else NA_integer_,
    month = if (length(month_col)) as.integer(.data[[month_col[1]]]) else NA_integer_,
    mag   = if (length(mag_col))  as.integer(.data[[mag_col[1]]])  else NA_integer_,
    inj   = if (length(inj_col))  as.integer(.data[[inj_col[1]]])  else 0L,
    fat   = if (length(fat_col))  as.integer(.data[[fat_col[1]]])  else 0L
  ) %>%
  filter(!is.na(year))

latest_year <- max(tw$year, na.rm = TRUE)
message("✅ Tornado Paths loaded successfully. Records: ", nrow(tw),
        " | Years covered: ", min(tw$year, na.rm=TRUE), "–", latest_year)
tw_df <- sf::st_drop_geometry(tw)

```

# **Annual Counts**

Column {data-width="250"}
------------------------
### **Annual Counts**

This step plot shows how the number of tornadoes changes from one year to the next, making sudden jumps or drops much easier to see. Unlike a bar chart, the step pattern clearly highlights year-to-year variability and reveals long-term shifts in tornado activity across the United States.
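
The year-to-year variability described here can also be put into numbers. The sketch below uses a small made-up data frame (`annual_demo`, hypothetical values rather than SPC counts) to compute the change from each year to the next and pick out the largest jump or drop.

```r
# Toy annual counts (hypothetical values for illustration, not SPC data)
annual_demo <- data.frame(
  year      = 2015:2020,
  tornadoes = c(1100, 980, 1420, 1120, 1510, 1080)
)

# Year-over-year change; the first year has no predecessor, so it is NA
annual_demo$change <- c(NA, diff(annual_demo$tornadoes))

# Row with the largest absolute jump or drop (which.max() skips the NA)
largest_jump <- annual_demo[which.max(abs(annual_demo$change)), ]
print(largest_jump)
```

The same two lines could be applied to a real table of annual counts, assuming it has one row per year sorted in ascending order.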

Row
------------------------
### **U.S. Tornado Counts per Year (1950–2024)**

```{r}
annual <- tw_df |> count(year, name = "tornadoes")

ggplot(annual, aes(x = year, y = tornadoes)) +
  geom_step(color = "steelblue", linewidth = 1) +
  geom_point(color = "darkblue", size = 1.5) +
  labs(
    title = "U.S. Tornado Counts per Year (1950–2024)",
    subtitle = "Step plot showing year-to-year changes in tornado frequency",
    x = NULL,
    y = "Number of Tornadoes"
  ) +
  theme_minimal()

```

# **Seasonality**

Column {data-width="250"}
------------------------
### **Seasonality**

This chart compares the long-term monthly average of U.S. tornadoes with the average from the last 10 years. Tornado activity peaks in late spring and early summer, while winter has very few events. The recent pattern closely matches the long-term trend, showing that tornado season remains centered in April through June.
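
One way to check a statement like "the season is centered in April through June" is to compute the peak month and the share of the annual total that falls in those three months. The sketch below does this on a made-up monthly climatology (`climo_demo`, hypothetical averages rather than SPC data).

```r
# Toy monthly climatology (hypothetical averages, not SPC data)
climo_demo <- data.frame(
  month       = 1:12,
  climatology = c(20, 30, 80, 160, 250, 200, 110, 70, 60, 50, 50, 20)
)

# Month with the highest average count
peak_month <- month.abb[climo_demo$month[which.max(climo_demo$climatology)]]

# Share of the annual total that falls in April through June
spring_share <- sum(climo_demo$climatology[climo_demo$month %in% 4:6]) /
  sum(climo_demo$climatology)

cat("Peak month:", peak_month, "\n")
cat("April-June share:", round(100 * spring_share, 1), "%\n")
```

Running the same calculation once on the long-term averages and once on the last 10 years would give two shares that can be compared directly.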

Row
------------------------
### **Monthly Tornado Seasonality in the U.S.**


```{r seasonality, echo=FALSE}
# Monthly seasonality: climatology vs last 10 years

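# use tw_df (no geometry)
monthly <- tw_df %>%
  dplyr::filter(!is.na(month)) %>%
  dplyr::count(year, month, name = "n") %>%
  # add explicit zero counts so months with no tornadoes in a year still enter the averages
  tidyr::complete(year, month = 1:12, fill = list(n = 0L))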

# long-term average per month
climo <- monthly %>%
  dplyr::group_by(month) %>%
  dplyr::summarise(climatology = mean(n), .groups = "drop")

# average of last 10 years
recent10 <- monthly %>%
  dplyr::filter(year >= max(year, na.rm = TRUE) - 9) %>%
  dplyr::group_by(month) %>%
  dplyr::summarise(recent = mean(n), .groups = "drop")

climo %>%
  dplyr::left_join(recent10, by = "month") %>%
  tidyr::pivot_longer(-month, names_to = "series", values_to = "value") %>%
  ggplot(aes(x = month, y = value, color = series)) +
  geom_line(linewidth = 1) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(
    title    = "Monthly Tornado Seasonality in the U.S.",
    subtitle = "Climatology (1950–2024) vs mean of last 10 years",
    x        = "Month",
    y        = "Average number of tornadoes"
  ) +
  theme_minimal()
```

# **Recent Tracks Map**

Column {data-width="250"}
------------------------
### **Recent Tracks Map**

This map shows how many tornadoes each state experienced in the last five years. Most recent tornado activity is concentrated in the central Plains and Southeast, especially in states like Texas, Oklahoma, Kansas, Alabama, and Mississippi. Western states have far fewer tornadoes. The map clearly highlights where tornado risks have been highest in recent years.

Row
------------------------
### **Recent Tornado Tracks by State (Last 5 Years)**


```{r recent_tornado_choropleth, echo=FALSE}
library(dplyr)
library(sf)
library(ggplot2)
library(maps)

# Make sure we have a non-spatial data frame
tw_df <- sf::st_drop_geometry(tw)

# Latest year in the data
latest_year <- max(tw_df$year, na.rm = TRUE)

# Try to detect the state column
state_col <- intersect(c("st", "state", "stf"), names(tw_df))[1]

if (is.na(state_col)) {
  plot.new()
  text(0.5, 0.5, "No state column found in tornado data.")
} else {
  # 🔹 Last 5 years, ALL tornadoes (no EF filter)
  recent_all <- tw_df %>%
    filter(year >= latest_year - 4)
  
  # Build state name column compatible with maps::map("state")
  if (all(nchar(recent_all[[state_col]]) == 2, na.rm = TRUE)) {
    # 2-letter state abbreviations
    recent_all <- recent_all %>%
      mutate(
        st_abbr    = toupper(.data[[state_col]]),
        state_name = tolower(state.name[match(st_abbr, state.abb)])
      )
  } else {
    # assume full state names already
    recent_all <- recent_all %>%
      mutate(state_name = tolower(.data[[state_col]]))
  }
  
  # Count tornadoes per state
  state_counts <- recent_all %>%
    filter(!is.na(state_name)) %>%
    count(state_name, name = "tornadoes_last5yr")
  
  # Load US states map
  usa <- maps::map("state", fill = TRUE, plot = FALSE)
  usa_sf <- sf::st_as_sf(usa)
  
  # Join counts to map (usa_sf$ID is lower-case state name)
  map_df <- usa_sf %>%
    left_join(state_counts, by = c("ID" = "state_name"))
  
  # Plot choropleth
  ggplot(map_df) +
    geom_sf(aes(fill = tornadoes_last5yr), color = "gray60", linewidth = 0.2) +
    scale_fill_viridis_c(
      option   = "magma",
      na.value = "gray95",
      name     = "Tornadoes\n(last 5 years)"
    ) +
    labs(
      title    = paste0("Tornadoes by State (Last 5 Years, ", latest_year - 4, "–", latest_year, ")"),
      subtitle = "Counts of all reported tornadoes by state",
      x = NULL,
      y = NULL
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold"),
      legend.position = "right"
    )
}

```

# **Intensity**

Column {data-width="250"}
------------------------
### **Intensity**

This interactive dygraph shows how the average intensity of U.S. tornadoes has changed since 1950. The smooth line helps reveal long-term shifts, while the range selector allows zooming into specific periods. Although year-to-year values fluctuate, the visualization highlights patterns in the strength of tornadoes over time.
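
Because yearly averages fluctuate, a centered moving average is one simple way to make long-term shifts easier to see. The sketch below applies a 5-year window to a made-up series (`avg_intensity_demo`, hypothetical values); the window length is an arbitrary choice for illustration.

```r
# Toy yearly average intensities (hypothetical values, not SPC data)
avg_intensity_demo <- c(1.2, 1.0, 1.1, 0.9, 0.8, 0.7, 0.9, 0.6, 0.5, 0.6)

# Centered 5-year moving average.
# stats::filter() is called with its namespace because dplyr::filter() masks it.
window <- 5
smoothed <- as.numeric(
  stats::filter(avg_intensity_demo, rep(1 / window, window), sides = 2)
)

# The first and last two values are NA because the window is incomplete there
print(round(smoothed, 2))
```

The smoothed values could be added to the time series as a second column so that both the raw and the smoothed line are visible.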


Row
------------------------
### **Average Tornado Intensity Over Time (1950–2024)**

```{r intensity_dygraph, echo=FALSE}
library(dplyr)
library(dygraphs)
library(xts)

# Compute yearly average intensity
# (the SPC database description lists -9 as the code for an unknown rating, so drop it)
intensity_ts <- tw_df %>%
  filter(!is.na(mag), mag >= 0) %>%
  group_by(year) %>%
  summarise(avg_intensity = mean(mag), .groups = "drop")

# Convert to xts time-series
intensity_xts <- xts(intensity_ts$avg_intensity, order.by = as.Date(paste0(intensity_ts$year, "-01-01")))

# Plot dygraph
dygraph(intensity_xts, main = "Average Tornado Intensity Over Time (1950–2024)") %>%
  dyAxis("y", label = "Average Intensity (F/EF)") %>%
  dyOptions(strokeWidth = 2, colors = "darkred") %>%
  dyRangeSelector()


```


# **Casualties**

Column {data-width="250"}
------------------------
### **Casualties**

This chart shows the total tornado-related casualties each year from 1950 to 2024. The numbers change a lot from year to year, with some years showing major spikes due to large or destructive tornado outbreaks. While many years have low casualty totals, the data highlights how the human impact of tornadoes can vary widely depending on where and how severe the storms are.
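
The spike years mentioned here can be flagged with a simple robust rule. The sketch below marks years whose total exceeds the median plus three median absolute deviations; both the rule and the data frame `cas_demo` (hypothetical values) are assumptions for illustration, not part of the dashboard.

```r
# Toy yearly casualty totals (hypothetical values, not SPC data)
cas_demo <- data.frame(
  year             = 2001:2008,
  total_casualties = c(1100, 700, 1850, 400, 750, 5900, 900, 800)
)

# Robust threshold: median plus three median absolute deviations
threshold <- median(cas_demo$total_casualties) +
  3 * mad(cas_demo$total_casualties)

# Years whose total exceeds the threshold
spike_years <- cas_demo$year[cas_demo$total_casualties > threshold]
print(spike_years)
```

A median-based rule is used instead of mean and standard deviation because a single extreme year would otherwise inflate the threshold.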

Row
------------------------
### **Tornado Casualties in the United States (1950–2024)**

```{r casualties_per_year, echo=FALSE}
library(dplyr)
library(ggplot2)

# Build yearly casualty totals
casualties <- tw_df %>%
  mutate(
    injuries   = ifelse(is.na(inj), 0, inj),
    fatalities = ifelse(is.na(fat), 0, fat),
    total_cas  = injuries + fatalities
  ) %>%
  group_by(year) %>%
  summarise(total_casualties = sum(total_cas), .groups = "drop")

# Plot
ggplot(casualties, aes(x = year, y = total_casualties)) +
  geom_col(fill = "indianred") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Tornado Casualties per Year in the U.S. (1950–2024)",
    subtitle = "Combined injuries and fatalities",
    x = NULL,
    y = "Total Casualties"
  ) +
  theme_minimal()
```

# **Questions and Conclusion**

### Questions

1. Is tornado activity increasing or decreasing over time?

Answer: Tornado counts vary from year to year, but reporting has generally increased over the long term. This may reflect improved detection technologies, better reporting practices, and possible influences from changing climate conditions.

2. Does tornado activity follow a seasonal pattern?

Answer: Yes. Tornadoes show a strong seasonal cycle, with the highest activity in April–June. Winter months consistently show the lowest activity, and recent years follow the same overall pattern.

3. Which regions experienced the most tornadoes in the last five years?

Answer: The highest activity occurred across the central Plains and the Southeast, especially in Texas, Oklahoma, Kansas, Alabama, and Mississippi. These areas include well-known tornado regions such as Tornado Alley and Dixie Alley.

4. Are most tornadoes high-intensity events?

Answer: No. Most tornadoes fall into lower intensity categories, with strong and violent tornadoes being relatively rare. However, when high-intensity tornadoes do occur, they contribute disproportionately to damage and casualties.
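
The answer to question 4 concerns the distribution of ratings rather than their average, so a share-by-category table is a natural check. The sketch below groups ratings into weak (0 to 1), strong (2 to 3), and violent (4 to 5) on a made-up vector (`mag_demo`, hypothetical values), dropping the -9 code that marks an unknown rating.

```r
# Toy F/EF ratings (hypothetical values, not SPC data); -9 marks an unknown rating
mag_demo <- c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, -9, 0, 1, 4, 0)

# Drop unknown ratings before computing shares
valid <- mag_demo[mag_demo >= 0]

# Group ratings: weak = 0-1, strong = 2-3, violent = 4-5
category <- cut(
  valid,
  breaks = c(-Inf, 1, 3, 5),
  labels = c("weak", "strong", "violent")
)

# Share of tornadoes in each category
shares <- prop.table(table(category))
print(round(100 * shares, 1))
```

Applied to the real `mag` column, a table like this would show directly how rare strong and violent tornadoes are.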

### Conclusion

This dashboard highlights several clear patterns in U.S. tornado behavior. Annual and seasonal analyses reveal when tornadoes are most frequent and how activity fluctuates over time. Geographic patterns show that the central Plains and Southeast consistently face the greatest tornado risk. Intensity and casualty data further demonstrate that while most tornadoes are weak, stronger events, though rare, cause significant human and economic impacts. Together, these visualizations provide an evidence-based understanding of tornado trends and support better preparedness, risk assessment, and future climate-related research.