1 Introduction
Big picture question: How do typical home values (ZHVI) vary across U.S. places with different frequencies of storm events, and how have these patterns changed over time?
Why this matters: housing values relate to household wealth, insurance risks, and regional planning. Understanding whether areas with more frequent severe-weather reports look systematically different in typical prices can inform homeowners, insurers, and local governments. As background, Zillow’s ZHVI provides a model-based “typical” home value series that is widely used for regional comparisons and trend tracking (see Zillow Research ZHVI). On the weather side, NOAA’s NCEI compiles administrative storm reports from the National Weather Service (NWS), creating a long-running public record of event types, locations, and damage metrics (see the NCEI Storm Events Database).
What we aim to show: we do descriptive comparisons (not causal claims). We summarize storm exposure at the state level and compare distributions of metro-level home values across “lower / medium / higher” exposure groups. We also show time structure where appropriate (dates as an intermediate tool per INFO 201).
2 Analysis tied to Dataset 1: Storm Events (NOAA NCEI)
2.1 Questions
How many storm events are recorded by state in the selected year?
Which states appear in the top of the distribution (exposure proxy) for that year?
2.2 Storm Events Data
Collector/organization. NOAA National Centers for Environmental Information (compiled from NWS reports). Collection page. https://catalog.data.gov/dataset/ncdc-storm-events-database2
File used in this render (example year 1950). https://www.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/StormEvents_details-ftp_v1.0_d1950_c20250520.csv.gz
Population & sample. Population of interest is U.S. places’ storm exposure. The sample is an administrative listing of reported storm events (e.g., tornado, hail, flood) with timing, place, and impacts for the covered year. This render uses 1950, which is tornado-heavy relative to later years; definitions/reporting evolve over time.
This administrative listing offers nationwide, standardized event records suitable for simple exposure summaries by place.
#| fig.alt: Bar chart of top 10 states by number of recorded storm events in 1950; shows relative exposure only for that year.
p_states <- state_exposure |>
  slice_max(events_1950, n = 10) |>
  arrange(events_1950) |>
  ggplot(aes(x = events_1950, y = reorder(STATE, events_1950))) +
  geom_col() +
  labs(
    title = "Top 10 States by Recorded Storm Events (1950 file)",
    x = "Number of events (1950)",
    y = "State",
    caption = "Source: NOAA NCEI Storm Events (details-ftp_v1.0_d1950_c20250520.csv.gz)"
  )
plotly::ggplotly(p_states)
Interpretation: This offers a relative exposure ranking for 1950 only. Because reporting practices and hazard mixes change over time, this is a limited snapshot.
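As a minimal sketch of the two checks behind this ranking (events per state, and how strongly one hazard dominates the year), the base R below works on a small made-up table. The `storm_toy` data frame and its values are assumptions for illustration only; the column names `STATE` and `EVENT_TYPE` are assumed to match the storm file.

```r
# Toy stand-in for the storm details table (values are made up).
storm_toy <- data.frame(
  STATE      = c("TEXAS", "TEXAS", "OKLAHOMA", "KANSAS", "TEXAS", "KANSAS"),
  EVENT_TYPE = c("Tornado", "Tornado", "Tornado", "Hail", "Tornado", "Tornado")
)

# Events per state, sorted from most to fewest, keeping at most the top 10.
state_counts <- sort(table(storm_toy$STATE), decreasing = TRUE)
top_states <- head(state_counts, 10)
print(top_states)

# Share of each event type, to see how dominated the year is by one hazard.
type_share <- prop.table(table(storm_toy$EVENT_TYPE))
print(round(type_share, 2))
```

Running the same two tabulations on the real file would show both the ranking and how much of it is driven by a single event type.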
3 Analysis of Dataset 2: Metro Home Values (Zillow ZHVI)
3.1 Questions
What is the distribution of current (latest month) metro ZHVI values?
Do median values differ across storm-exposure groups derived from Dataset 1?
3.2 ZHVI Data
Collector/organization. Zillow Research. CSV (smoothed, SA, metro). https://files.zillowstatic.com/research/public_csvs/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv
Population & sample. Population: U.S. metro housing stock. Sample: model-based monthly typical value per metro (includes homes not for sale). Many series begin ~2000.
ZHVI provides a consistent, metro-level ‘typical value’ series that supports distributional comparisons across regions.
The file used in this render contains 895 rows and 318 columns.
3.3 Data Preparation
We keep the latest month and rename it to zhvi for clearer labeling in plots (INFO 201 select() and rename(); R4DS “Data transformation”).
latest_col <- names(zillow)[ncol(zillow)]
# keep only columns that exist in this file
z_latest <- zillow |>
  select(RegionName, StateName, !!latest_col) |>
  rename(zhvi = !!latest_col)
summary(z_latest$zhvi)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  47226  181825  242259  286820  338468 1568567
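Taking the last column as the latest month relies on the file's column order. The sketch below shows one way to confirm that choice by parsing the column names as dates. It assumes the monthly columns are named in `YYYY-MM-DD` form; `zillow_toy` and its values are made up for illustration.

```r
# Toy wide table in the same shape as the Zillow file (values are made up).
zillow_toy <- data.frame(
  RegionName   = c("Metro A", "Metro B"),
  StateName    = c("NY", "TX"),
  `2024-11-30` = c(300000, 250000),
  `2024-12-31` = c(305000, 252000),
  check.names  = FALSE
)

# Identify date-like column names and pick the most recent one by date.
is_date <- grepl("^\\d{4}-\\d{2}-\\d{2}$", names(zillow_toy))
date_cols <- names(zillow_toy)[is_date]
latest_by_date <- date_cols[which.max(as.numeric(as.Date(date_cols)))]

# Compare with the positional choice (last column).
latest_by_position <- names(zillow_toy)[ncol(zillow_toy)]
stopifnot(identical(latest_by_date, latest_by_position))
print(latest_by_date)
```

If the two choices ever disagree, the `stopifnot()` call stops the render instead of silently plotting an older month.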
We first check the overall distribution. Home values are typically right-skewed (here the mean, 286,820, sits well above the median, 242,259), so we interpret center with the median.
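A quick way to see why the median is the safer summary under right skew is to compare it with the mean on a small vector containing one very expensive market. The values in `zhvi_toy` are made up for illustration and are not taken from the Zillow file.

```r
# Made-up metro values with one very expensive market, to mimic right skew.
zhvi_toy <- c(150000, 180000, 210000, 240000, 260000, 330000, 1500000)

center <- c(median = median(zhvi_toy), mean = mean(zhvi_toy))
print(center)

# A mean well above the median signals a long right tail.
mean_to_median <- center[["mean"]] / center[["median"]]
print(round(mean_to_median, 2))

# On a log10 scale the extreme value pulls the mean far less.
log_center <- c(median = median(log10(zhvi_toy)), mean = mean(log10(zhvi_toy)))
print(round(log_center, 3))
```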
p_hist <- ggplot(z_latest, aes(x = zhvi)) +
  geom_histogram(binwidth = 50000, boundary = 0, closed = "left") +
  labs(
    title = paste0("Distribution of Metro ZHVI - ", latest_col),
    x = "Typical home value (USD)",
    y = "Count",
    caption = "Source: Zillow Research ZHVI (metro, smoothed, seasonally adjusted)."
  )
plotly::ggplotly(p_hist)
We compare metro ZHVI distributions across state storm-exposure groups. Exposure is state-level; ZHVI is metro-level. We join by state, then create tertiles (lower/medium/higher) of exposure and make a box plot (aligns with peer feedback suggesting a distributional comparison).
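The tertile step can be sketched on toy counts. The helper `rank_tertile()` below is a hypothetical base R stand-in that behaves similarly to `dplyr::ntile(x, 3)` when the number of rows is divisible by three; it is not the function used in the report. The second case shows a property worth keeping in mind: a rank-based split always returns three groups, even when every value is identical.

```r
# Rank-based split into three equal-sized groups, similar in spirit to
# dplyr::ntile(x, 3); ties are broken by row order.
rank_tertile <- function(x) {
  ceiling(rank(x, ties.method = "first") * 3 / length(x))
}

group_labels <- c("Lower exposure", "Medium exposure", "Higher exposure")

# Case 1: distinct counts, so groups follow the values.
events <- c(5, 40, 12, 0, 33, 21)
groups <- factor(rank_tertile(events), levels = 1:3, labels = group_labels)
print(data.frame(events, groups))

# Case 2: every count is 0 (e.g., after replacing unmatched NAs with 0).
# The split still returns three groups, but they only reflect row order.
all_zero <- rep(0, 6)
print(rank_tertile(all_zero))
```

For this reason it helps to confirm that the counts actually vary before reading anything into the group labels.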
# A. exposure by STATE from the storm file (full state names)
state_exposure <- storm |>
  count(STATE, name = "events_1950")

# B. keep the Zillow columns you need (this file has StateName, not State)
latest_col <- names(zillow)[ncol(zillow)]
z_latest <- zillow |>
  select(RegionName, StateName, !!latest_col) |>
  rename(zhvi = !!latest_col)

# C. JOIN explicitly: StateName (Zillow) ↔ STATE (storm)
z_joined <- z_latest |>
  left_join(state_exposure, by = c("StateName" = "STATE")) |>
  mutate(
    exposure_group = ntile(replace_na(events_1950, 0), 3),
    exposure_group = factor(
      exposure_group,
      levels = c(1, 2, 3),
      labels = c("Lower exposure", "Medium exposure", "Higher exposure")
    )
  )

"StateName" %in% names(z_latest)
"STATE" %in% names(state_exposure)
# A tibble: 1 × 2
  metros matched
   <int>   <int>
1    895       0
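A matched count of 0 out of 895 is what one would expect if the two key columns store states in different formats. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that the storm file holds full uppercase names (e.g., "OKLAHOMA") while the Zillow file holds two-letter abbreviations (e.g., "OK"). Under that assumption, the sketch below translates names with R's built-in `state.name` and `state.abb` vectors before joining. All data frames and values are made up for illustration.

```r
# Toy stand-ins for the two tables (values are made up for illustration).
storm_toy <- data.frame(STATE = c("OKLAHOMA", "TEXAS", "TEXAS", "NEW YORK"))
zillow_toy <- data.frame(
  RegionName = c("New York, NY", "Dallas, TX", "Tulsa, OK", "Fresno, CA"),
  StateName  = c("NY", "TX", "OK", "CA")
)

# Count events per state, then translate full names to postal abbreviations
# using the built-in state.name / state.abb vectors (50 states only).
exposure_toy <- as.data.frame(table(STATE = storm_toy$STATE),
                              responseName = "events",
                              stringsAsFactors = FALSE)
exposure_toy$StateName <- state.abb[match(exposure_toy$STATE, toupper(state.name))]

# Left join on the shared abbreviation key and report how many metros matched.
joined_toy <- merge(zillow_toy, exposure_toy[, c("StateName", "events")],
                    by = "StateName", all.x = TRUE)
matched <- sum(!is.na(joined_toy$events))
print(c(metros = nrow(joined_toy), matched = matched))
```

Because `state.name` covers only the 50 states, entries such as the District of Columbia or territories would come back as `NA` and need separate handling.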
We use a box plot (as suggested in peer feedback) to compare the full distributions of metro ZHVI across lower/medium/higher exposure groups instead of only comparing means.
p_box <- z_joined |>
  dplyr::filter(!is.na(exposure_group), !is.na(zhvi)) |>
  ggplot(aes(x = exposure_group, y = zhvi, group = exposure_group)) + # explicit group
  geom_boxplot() +
  labs(
    title = paste0("Metro ZHVI by State Storm Exposure Group (", latest_col, ")"),
    x = "State storm exposure (tertiles from 1950 counts)",
    y = "Metro ZHVI (USD)",
    caption = "Sources: NOAA NCEI Storm Events (1950 file) joined to Zillow Research ZHVI (latest month)."
  )
p_box
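Interpretation. If medians differ across groups, there is an association between where more events were recorded in 1950 and typical metro values today. This reading only holds if metros are actually matched to state counts: the join check above reports 0 of 895 metros matched, in which case every count is replaced with 0 and the tertile labels reflect row order rather than storm exposure. Because exposure is from a single (tornado-heavy) year and geographies differ (state vs. metro), treat this as descriptive and non-causal.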
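To put numbers next to the box plot, a per-group table of size, median, and interquartile range can be computed directly. The sketch below uses a made-up `joined_toy` table with shortened group labels; it illustrates the summary step and does not reproduce the report's results.

```r
# Toy joined table: one row per metro with its exposure group (made-up values).
joined_toy <- data.frame(
  exposure_group = factor(
    c("Lower", "Lower", "Medium", "Medium", "Higher", "Higher", "Higher"),
    levels = c("Lower", "Medium", "Higher")
  ),
  zhvi = c(210000, 250000, 230000, 410000, 180000, 260000, 300000)
)

# Group size, median, and interquartile range for each exposure group.
group_n      <- table(joined_toy$exposure_group)
group_median <- tapply(joined_toy$zhvi, joined_toy$exposure_group, median)
group_iqr    <- tapply(joined_toy$zhvi, joined_toy$exposure_group, IQR)

group_summary <- data.frame(
  n      = as.vector(group_n),
  median = as.vector(group_median),
  iqr    = as.vector(group_iqr),
  row.names = names(group_median)
)
print(group_summary)
```

Reporting the group sizes alongside the medians makes it easier to spot groups that are too small to compare.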
4 General Conclusions
We provided a transparent descriptive linkage: state-level storm exposure (1950) → metro ZHVI distributions (latest).
A box plot yields a clearer distributional comparison than side-by-side means and reflects peer feedback.
Bias/uncertainty. One-year exposure likely underestimates multi-hazard patterns; state-to-metro join can introduce aggregation bias; ZHVI is model-based with coverage differences.
Future directions. Aggregate multi-year exposure (e.g., per-capita or severity-weighted), harmonize geographies (county/metro), and incorporate time windows with INFO 201 dates tools.
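As a rough sketch of the multi-year, per-capita idea, the base R below averages yearly counts per state and scales them by population. Both tables, including every count and population figure, are hypothetical; a real version would need an actual population source and several years of storm files.

```r
# Made-up yearly event counts and populations; both tables are hypothetical.
events_by_year <- data.frame(
  STATE  = c("TX", "TX", "TX", "OK", "OK", "OK"),
  year   = c(2019, 2020, 2021, 2019, 2020, 2021),
  events = c(900, 1100, 1000, 400, 500, 300)
)
population <- data.frame(
  STATE = c("TX", "OK"),
  pop   = c(29000000, 4000000)
)

# Average events per year over the window, then scale per 100,000 residents.
multi_year <- aggregate(events ~ STATE, data = events_by_year, FUN = mean)
multi_year <- merge(multi_year, population, by = "STATE")
multi_year$events_per_100k <- multi_year$events / multi_year$pop * 1e5
print(multi_year)
```

In this toy example the state with fewer raw events has the higher per-capita rate, which is the kind of reordering a per-capita measure is meant to reveal.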
5 Project Summary
Group Members: Varnika Dokka; Ruby Xia; Jacob O’Connor
Joins (intermediate) — Needed to attach state storm-exposure counts to metro rows to enable a distributional comparison across exposure groups; there isn’t a simpler way to align those geographies. Reference: https://r4ds.hadley.nz/joins.html.
Dates (intermediate) — Parsed event timestamps with lubridate::mdy_hms() so time-aware checks/filters are possible. Reference: https://r4ds.hadley.nz/datetimes.html.
Plotly & DT (outside class tools) — ggplotly() for light chart interactivity and DT::datatable() for searchable tables; learned from Plotly R “Getting Started” and RStudio DT docs.
7 Group Work Description
Ruby, Jacob, and I worked together for the duration of the project. All three of us came up with a topic of interest during our lab section. I pitched the idea of focusing on storm events data and connecting it with Zillow housing data, a link our TA, Renee, helped us work out. We looked for datasets and settled on two that I found and shared with the group. For the analysis, we worked together to figure out how to load the data, then split the work: Jacob and I mostly wrote the code for the Storm Events data, and Ruby wrote the code for the Zillow housing data, with all three of us pitching in where needed. Ruby also helped fix errors in the report (bar charts, etc.), while Jacob and I recorded the presentation for our dashboard. We each drafted the text for our own parts, combined the drafts into a full report, and then all double-checked and rendered it. I am happy with our group and with how we assigned tasks and finished the report together.