Final Project DATA607 - Beyond Magnitude: Analyzing Earthquake Risk Through Seismic Activity and Population Exposure

Author

Muhammad Suffyan Khan

Published

May 9, 2026

1 1. Executive Summary

This project analyzes earthquake risk by combining two different types of data sources: earthquake event data from the USGS Earthquake Catalog API/GeoJSON source and country-level population data from the World Bank Population, total (SP.POP.TOTL) CSV dataset. The main idea is that earthquake risk should not be interpreted only through magnitude or event counts. A more useful risk view should also consider how many people may be exposed in affected countries or regions.

The workflow follows a reproducible data science process: acquire data, clean and transform the raw records, create useful features, join earthquake activity with population exposure, conduct exploratory and statistical analysis, build a custom exposure scoring system, and present findings through visualizations and interpretation. The project uses geospatial mapping with leaflet and a custom earthquake exposure score as features beyond the standard class workflow.

2 2. Introduction and Motivation

Earthquakes are often discussed in terms of magnitude, but magnitude alone does not fully describe risk. A strong earthquake in a remote area may expose fewer people than a moderate earthquake near a highly populated region. For this reason, this project combines seismic activity with population data to examine earthquake exposure from a broader data science perspective.

The goal is to answer the following question:

Which countries or regions appear most exposed when earthquake activity is combined with population size, and does this change how earthquake risk is interpreted compared with magnitude or event count alone?

3 3. Research Questions

This analysis is guided by four research questions:

  1. Which countries or regions experienced the highest number of earthquakes during the study period?
  2. Which countries or regions had the highest earthquake activity after adjusting for population size?
  3. How do magnitude, depth, event frequency, and population combine into an exposure-based risk ranking?
  4. Does the custom exposure score provide a different interpretation than earthquake count alone?

4 4. Data Sources

4.1 4.1 USGS Earthquake Catalog API

The first data source is the USGS Earthquake Catalog API. The project uses a GeoJSON query to retrieve earthquake events with information such as event time, magnitude, depth, latitude, longitude, place description, tsunami flag, and event significance.

For this project, the study period is limited to one full year to keep the scope manageable and reproducible. The analysis uses earthquakes with magnitude 4.5 or greater because these events are more likely to be meaningful for regional exposure analysis and reduce noise from very small events.

4.2 4.2 World Bank Population CSV

The second data source is the World Bank Population, total indicator (SP.POP.TOTL). The raw CSV file is stored in the GitHub repository and loaded through a raw GitHub URL. The World Bank dataset contains yearly population values by country.

4.3 4.3 Why These Sources Work Together

The USGS data captures seismic activity, while the World Bank data captures country-level population exposure. Combining these sources allows the project to compare earthquake activity alone with population-adjusted and exposure-weighted metrics.

5 5. Data Science Workflow

The project follows this workflow:

  1. Acquire earthquake data from the USGS API and population data from the World Bank CSV.
  2. Clean event records, timestamps, magnitude values, depth values, and country identifiers.
  3. Transform earthquake timestamps into dates, classify magnitudes and depths, reshape population data from wide to long format, and select the latest available population year.
  4. Join earthquake records with country boundaries and population data.
  5. Analyze event frequency, magnitude distribution, population-adjusted rates, and custom exposure scores.
  6. Visualize earthquake patterns using bar charts, scatterplots, and an interactive map.
  7. Conclude which countries or regions appear most exposed and discuss limitations.

6 6. Loading Libraries

library(tidyverse)
library(jsonlite)
library(lubridate)
library(scales)
library(sf)
library(rnaturalearth)
library(leaflet)
library(broom)
library(knitr)
library(htmltools)
library(viridis)

options(scipen = 999)

7 7. Data Acquisition

7.1 7.1 Load USGS Earthquake Data

This section pulls earthquake event records from the USGS Earthquake Catalog API. The query uses GeoJSON format, a one-year study period, and a minimum magnitude of 4.5.

study_start <- "2025-01-01"
study_end <- "2025-12-31"
minimum_magnitude <- 4.5

usgs_url <- paste0(
  "https://earthquake.usgs.gov/fdsnws/event/1/query?",
  "format=geojson",
  "&starttime=", study_start,
  "&endtime=", study_end,
  "&minmagnitude=", minimum_magnitude,
  "&orderby=time"
)

earthquake_json <- jsonlite::fromJSON(usgs_url, flatten = TRUE)

earthquake_features <- earthquake_json$features

coordinate_tbl <- tibble(
  longitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 1),
  latitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 2),
  depth_km = purrr::map_dbl(earthquake_features$geometry.coordinates, 3)
)

earthquakes_raw <- earthquake_features %>%
  as_tibble() %>%
  bind_cols(coordinate_tbl) %>%
  transmute(
    event_id = id,
    magnitude = properties.mag,
    place = properties.place,
    event_time_utc = lubridate::as_datetime(properties.time / 1000, tz = "UTC"),
    updated_time_utc = lubridate::as_datetime(properties.updated / 1000, tz = "UTC"),
    tsunami = properties.tsunami,
    significance = properties.sig,
    status = properties.status,
    alert = properties.alert,
    event_url = properties.url,
    longitude,
    latitude,
    depth_km
  )

knitr::kable(
  head(earthquakes_raw, 10),
  caption = "Sample of raw earthquake events from the USGS API"
)
Sample of raw earthquake events from the USGS API
event_id magnitude place event_time_utc updated_time_utc tsunami significance status alert event_url longitude latitude depth_km
us7000rlm7 4.8 69 km E of Yamada, Japan 2025-12-30 23:51:36 2026-04-01 22:48:43 0 354 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlm7 142.7562 39.4639 35.000
us7000rr5p 4.5 173 km NNE of Tobelo, Indonesia 2025-12-30 22:36:09 2026-04-01 22:49:07 0 312 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5p 128.8654 3.0448 10.000
us7000rllj 4.8 117 km E of Miyako, Japan 2025-12-30 22:19:46 2026-04-01 22:48:43 0 354 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rllj 143.3045 39.5984 27.735
us7000rlks 4.5 72 km NNE of Gorontalo, Indonesia 2025-12-30 20:55:06 2026-04-01 22:48:42 0 312 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlks 123.3440 1.1328 10.000
us7000rlkk 5.3 26 km SSE of Boyuibe, Bolivia 2025-12-30 20:26:04 2026-04-01 22:48:42 0 432 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlkk -63.1477 -20.6323 553.819
us7000rr5f 4.5 75 km ESE of Pondaguitan, Philippines 2025-12-30 16:45:05 2026-04-01 22:49:07 0 312 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5f 126.8425 6.2037 80.527
us7000rlim 4.6 60 km SSW of Lhokseumawe, Indonesia 2025-12-30 13:43:09 2026-04-01 22:48:40 0 326 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlim 96.9010 4.6976 10.182
us7000rr5c 4.6 25 km NE of Hingatungan, Philippines 2025-12-30 13:40:00 2026-04-01 22:49:06 0 326 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5c 125.3714 10.7262 10.000
us7000rli5 5.1 87 km SSW of Panguna, Papua New Guinea 2025-12-30 11:22:07 2026-04-01 22:48:40 0 400 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli5 155.1950 -7.0521 10.000
us7000rli1 4.5 251 km E of Levuka, Fiji 2025-12-30 10:58:04 2026-04-01 22:48:40 0 312 reviewed NA https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli1 -178.3075 -18.1477 536.568

The USGS API data is now in a tidy format with one row per earthquake event. The longitude, latitude, and depth values were extracted from the nested GeoJSON coordinate field.

7.2 7.2 Load World Bank Population Data

The World Bank file has metadata rows before the actual table begins, so the code skips the first four rows. This keeps the raw file unchanged while making the analysis reproducible.

population_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
country_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
indicator_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Indicator_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"

population_raw <- readr::read_csv(
  population_url,
  skip = 4,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

country_metadata <- readr::read_csv(
  country_metadata_url,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

indicator_metadata <- readr::read_csv(
  indicator_metadata_url,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

knitr::kable(
  head(population_raw, 8),
  caption = "Sample of raw World Bank population data"
)
Sample of raw World Bank population data
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 …71
Aruba ABW Population, total SP.POP.TOTL 54922 55578 56320 57002 57619 58190 58694 58990 59069 59052 58950 58781 58047 58299 58349 58295 58368 58580 58776 59191 59909 60563 61276 62228 62901 61728 59931 59159 59331 60443 62753 65896 69005 73685 77595 79805 83021 86301 88451 89659 90588 91439 92074 93128 95138 97635 99405 100150 100917 101604 101838 102591 104110 105675 106807 107906 108727 108735 108908 109203 108587 107700 107310 107359 107995 NA NA
Africa Eastern and Southern AFE Population, total SP.POP.TOTL 130075728 133534923 137171659 140945536 144904094 149033472 153281203 157704381 162329396 167088245 171984985 177022314 182126556 187524135 193186642 198914573 204802976 210680842 217074286 223974122 230792729 238043099 245822010 253644643 261458202 269450407 277621771 286067346 294498625 302939121 311748681 320442961 329082707 338324002 347441809 356580375 366138524 375646235 385505757 395750933 406156661 416807868 427820358 439173286 450928044 463076637 475606210 488580707 502070763 516003448 530308387 544737983 559609961 575202699 590968990 607123269 623369401 640058741 657801085 675950189 694446100 713090928 731821393 750491370 769280888 NA NA
Afghanistan AFG Population, total SP.POP.TOTL 9035043 9214083 9404406 9604487 9814318 10036008 10266395 10505959 10756922 11017409 11290128 11567667 11853696 12157999 12469127 12773954 13059851 13340756 13611441 13655567 13169311 11937581 10991378 10917982 11190221 11426852 11420074 11387818 11523298 11874088 12045660 12238879 13278974 14943172 16250794 17065836 17763266 18452091 19159996 19887785 20130327 20284307 21378117 22733049 23560654 24404567 25424094 25909852 26482622 27466101 28284089 29347708 30560034 31622704 32792523 33831764 34700612 35688935 36743039 37856121 39068979 40000412 40578842 41454761 42647492 NA NA
Africa Western and Central AFW Population, total SP.POP.TOTL 97630925 99706674 101854756 104089175 106388440 108772632 111246953 113795019 116444636 119203521 122086536 125072948 128176494 131449942 134911581 138569918 142337272 146258576 150402616 154721711 159166518 163762473 168585118 173255157 177880746 182811038 187889141 193104347 198485027 204062274 209566031 215178709 221191375 227246778 233360104 239801875 246415446 253207584 260297834 267506298 274968446 282780717 290841795 299142845 307725100 316588476 325663158 334984176 344586109 354343844 364358270 374790143 385360349 396030207 406992047 418127845 429454743 440882906 452195915 463365429 474569351 485920997 497387180 509398589 521764076 NA NA
Angola AGO Population, total SP.POP.TOTL 5231654 5301583 5354310 5408320 5464187 5521981 5581386 5641807 5702699 5763685 5852788 5991102 6174262 6388528 6613367 6842947 7074664 7317829 7576734 7847207 8133872 8435607 8751648 9082983 9425917 9779120 10139450 10497858 10861291 11238562 11626360 12023529 12423712 12827135 13249764 13699778 14170973 14660413 15159370 15667235 16194869 16747208 17327699 17943712 18600423 19291161 20015279 20778561 21578655 22414773 23294825 24218352 25177394 26165620 27160769 28157798 29183070 30234839 31297155 32375632 33451132 34532429 35635029 36749906 37885849 NA NA
Albania ALB Population, total SP.POP.TOTL 1608800 1659800 1711319 1762621 1814135 1864791 1914573 1965598 2022272 2081695 2135479 2187853 2243126 2296752 2350124 2404831 2458526 2513546 2566266 2617832 2671997 2726056 2784278 2843960 2904429 2964762 3022635 3083605 3142336 3227943 3286542 3266790 3247039 3227287 3207536 3187784 3168033 3148281 3128530 3108778 3089027 3060173 3051010 3039616 3026939 3011487 2992547 2970017 2947314 2927519 2913021 2905195 2860708 2816902 2773767 2731293 2689469 2648285 2607733 2567801 2528480 2489762 2451636 2414095 2377128 NA NA
Andorra AND Population, total SP.POP.TOTL 9510 10283 11086 11915 12764 13634 14626 15837 17176 18555 19977 21442 22957 24523 26130 27773 29436 31097 32735 34306 35782 37169 38764 40565 42273 43825 45734 47653 49416 50620 52597 56667 60200 63272 64612 63912 63984 64700 65382 65710 65685 65852 66506 69486 74325 77421 79585 81877 83495 83888 80706 77783 76834 75194 73737 72174 72181 73763 75162 76474 77380 78364 79705 80856 81938 NA NA
Arab World ARB Population, total SP.POP.TOTL 91540853 93931683 96428599 99038509 101729760 104494008 107403055 110491709 113712171 117053219 120469263 123950810 127535560 131335660 135282861 139495837 144014135 148599336 153568663 159054118 164657321 170015485 175370356 180973288 186733957 192563243 198509771 204524595 210269696 215969068 223184294 228402316 234321565 240926205 247626514 253952538 260229121 266565183 273094199 279709634 286548096 293580252 300691509 307815105 315060277 322628451 330526310 338947976 347684619 356386673 363568534 372054934 381030177 390954281 400231008 410190679 419808341 428315886 435998060 444281315 453723239 460646603 471352066 482105978 492612632 NA NA

8 8. Data Cleaning and Transformation

8.1 8.1 Clean Earthquake Records

This section converts timestamps into useful date fields and creates magnitude and depth categories. These transformations support both descriptive analysis and later exposure scoring.

earthquakes_clean <- earthquakes_raw %>%
  filter(
    !is.na(magnitude),
    !is.na(latitude),
    !is.na(longitude),
    !is.na(depth_km)
  ) %>%
  mutate(
    event_date = as.Date(event_time_utc),
    event_year = lubridate::year(event_time_utc),
    event_month = lubridate::month(event_time_utc, label = TRUE, abbr = TRUE),
    magnitude_category = case_when(
      magnitude < 5.0 ~ "Light: 4.5-4.9",
      magnitude < 6.0 ~ "Moderate: 5.0-5.9",
      magnitude < 7.0 ~ "Strong: 6.0-6.9",
      TRUE ~ "Major/Great: 7.0+"
    ),
    magnitude_category = factor(
      magnitude_category,
      levels = c(
        "Light: 4.5-4.9",
        "Moderate: 5.0-5.9",
        "Strong: 6.0-6.9",
        "Major/Great: 7.0+"
      )
    ),
    depth_category = case_when(
      depth_km < 70 ~ "Shallow: <70 km",
      depth_km < 300 ~ "Intermediate: 70-299 km",
      TRUE ~ "Deep: 300+ km"
    ),
    depth_category = factor(
      depth_category,
      levels = c("Shallow: <70 km", "Intermediate: 70-299 km", "Deep: 300+ km")
    ),
    magnitude_weight = case_when(
      magnitude < 5.0 ~ 1,
      magnitude < 6.0 ~ 2,
      magnitude < 7.0 ~ 4,
      TRUE ~ 8
    ),
    depth_weight = case_when(
      depth_km < 70 ~ 1.20,
      depth_km < 300 ~ 1.00,
      TRUE ~ 0.80
    ),
    event_weight = magnitude_weight * depth_weight
  )

knitr::kable(
  earthquakes_clean %>%
    select(event_id, event_date, magnitude, magnitude_category, depth_km, depth_category, event_weight, place) %>%
    head(10),
  caption = "Cleaned earthquake records with magnitude and depth categories"
)
Cleaned earthquake records with magnitude and depth categories
event_id event_date magnitude magnitude_category depth_km depth_category event_weight place
us7000rlm7 2025-12-30 4.8 Light: 4.5-4.9 35.000 Shallow: <70 km 1.2 69 km E of Yamada, Japan
us7000rr5p 2025-12-30 4.5 Light: 4.5-4.9 10.000 Shallow: <70 km 1.2 173 km NNE of Tobelo, Indonesia
us7000rllj 2025-12-30 4.8 Light: 4.5-4.9 27.735 Shallow: <70 km 1.2 117 km E of Miyako, Japan
us7000rlks 2025-12-30 4.5 Light: 4.5-4.9 10.000 Shallow: <70 km 1.2 72 km NNE of Gorontalo, Indonesia
us7000rlkk 2025-12-30 5.3 Moderate: 5.0-5.9 553.819 Deep: 300+ km 1.6 26 km SSE of Boyuibe, Bolivia
us7000rr5f 2025-12-30 4.5 Light: 4.5-4.9 80.527 Intermediate: 70-299 km 1.0 75 km ESE of Pondaguitan, Philippines
us7000rlim 2025-12-30 4.6 Light: 4.5-4.9 10.182 Shallow: <70 km 1.2 60 km SSW of Lhokseumawe, Indonesia
us7000rr5c 2025-12-30 4.6 Light: 4.5-4.9 10.000 Shallow: <70 km 1.2 25 km NE of Hingatungan, Philippines
us7000rli5 2025-12-30 5.1 Moderate: 5.0-5.9 10.000 Shallow: <70 km 2.4 87 km SSW of Panguna, Papua New Guinea
us7000rli1 2025-12-30 4.5 Light: 4.5-4.9 536.568 Deep: 300+ km 0.8 251 km E of Levuka, Fiji

8.2 8.2 Convert Earthquake Points to Spatial Data

To connect earthquakes with countries, the latitude and longitude values are converted into spatial point features. Natural Earth country boundaries are then used to assign country and continent information to each event.

world_countries <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") %>%
  sf::st_make_valid() %>%
  select(
    country = admin,
    iso_a3,
    continent,
    geometry
  )

earthquake_points <- earthquakes_clean %>%
  sf::st_as_sf(
    coords = c("longitude", "latitude"),
    crs = 4326,
    remove = FALSE
  )

earthquakes_with_country <- sf::st_join(
  earthquake_points,
  world_countries,
  join = sf::st_within,
  left = TRUE
) %>%
  sf::st_drop_geometry()

country_match_summary <- earthquakes_with_country %>%
  summarise(
    total_events = n(),
    matched_to_country = sum(!is.na(iso_a3)),
    unmatched_or_ocean_events = sum(is.na(iso_a3)),
    matched_percent = matched_to_country / total_events
  )

knitr::kable(
  country_match_summary,
  digits = 3,
  caption = "Summary of earthquake events matched to country boundaries"
)
Summary of earthquake events matched to country boundaries
total_events matched_to_country unmatched_or_ocean_events matched_percent
8546 1487 7059 0.174

Some earthquake events occur offshore or in oceanic regions, so they may not fall within a country boundary. These events are retained for global descriptive analysis but excluded from country-level population joins.

8.3 8.3 Clean and Reshape World Bank Population Data

The World Bank population dataset is originally wide, with one column for each year. This section transforms it into a tidy long format and selects the latest available population year.

population_long <- population_raw %>%
  select(
    country_name_wb = `Country Name`,
    iso_a3 = `Country Code`,
    indicator_name = `Indicator Name`,
    indicator_code = `Indicator Code`,
    matches("^\\d{4}$")
  ) %>%
  pivot_longer(
    cols = matches("^\\d{4}$"),
    names_to = "population_year",
    values_to = "population"
  ) %>%
  mutate(
    population_year = as.integer(population_year),
    population = as.numeric(population)
  )

latest_population_year <- population_long %>%
  filter(!is.na(population)) %>%
  summarise(latest_year = max(population_year)) %>%
  pull(latest_year)

population_latest <- population_long %>%
  filter(population_year == latest_population_year) %>%
  filter(!is.na(population)) %>%
  select(iso_a3, country_name_wb, population_year, population)

country_metadata_clean <- country_metadata %>%
  transmute(
    iso_a3 = `Country Code`,
    wb_region = Region,
    income_group = IncomeGroup,
    table_name = TableName
  )

population_latest <- population_latest %>%
  left_join(country_metadata_clean, by = "iso_a3") %>%
  filter(!is.na(wb_region), wb_region != "Aggregates")

knitr::kable(
  head(population_latest, 10),
  caption = paste("Cleaned World Bank population data for latest available year:", latest_population_year)
)
Cleaned World Bank population data for latest available year: 2024
iso_a3 country_name_wb population_year population wb_region income_group table_name
ABW Aruba 2024 107995 Latin America & Caribbean High income Aruba
AFG Afghanistan 2024 42647492 Middle East & North Africa Low income Afghanistan
AGO Angola 2024 37885849 Sub-Saharan Africa Lower middle income Angola
ALB Albania 2024 2377128 Europe & Central Asia Upper middle income Albania
AND Andorra 2024 81938 Europe & Central Asia High income Andorra
ARE United Arab Emirates 2024 10986400 Middle East & North Africa High income United Arab Emirates
ARG Argentina 2024 45696159 Latin America & Caribbean Upper middle income Argentina
ARM Armenia 2024 3033500 Europe & Central Asia Upper middle income Armenia
ASM American Samoa 2024 46765 East Asia & Pacific High income American Samoa
ATG Antigua and Barbuda 2024 93772 Latin America & Caribbean High income Antigua and Barbuda

This transformation satisfies the requirement for data reshaping because the World Bank population data is converted from a wide year-based format into a tidy long format before the latest year is selected.

9 9. Joined Country-Level Analysis Dataset

This section aggregates earthquake activity by country and joins it with population data. Several exposure metrics are created, including earthquakes per million people, weighted activity per million people, and a custom exposure index.

earthquake_country_summary <- earthquakes_with_country %>%
  filter(!is.na(iso_a3)) %>%
  group_by(iso_a3, country, continent) %>%
  summarise(
    earthquake_count = n(),
    avg_magnitude = mean(magnitude, na.rm = TRUE),
    max_magnitude = max(magnitude, na.rm = TRUE),
    avg_depth_km = mean(depth_km, na.rm = TRUE),
    shallow_event_count = sum(depth_category == "Shallow: <70 km", na.rm = TRUE),
    strong_or_major_count = sum(magnitude >= 6.0, na.rm = TRUE),
    major_event_count = sum(magnitude >= 7.0, na.rm = TRUE),
    weighted_seismic_activity = sum(event_weight, na.rm = TRUE),
    .groups = "drop"
  )

country_exposure <- earthquake_country_summary %>%
  left_join(population_latest, by = "iso_a3") %>%
  filter(!is.na(population), population > 0) %>%
  mutate(
    population_millions = population / 1000000,
    earthquakes_per_million = earthquake_count / population_millions,
    weighted_activity_per_million = weighted_seismic_activity / population_millions,
    population_exposure_factor = log10(population),
    raw_exposure_index = weighted_seismic_activity * population_exposure_factor,
    exposure_score_0_100 = scales::rescale(raw_exposure_index, to = c(0, 100))
  ) %>%
  arrange(desc(exposure_score_0_100))

knitr::kable(
  country_exposure %>%
    select(
      country, continent, population_year, population_millions, earthquake_count,
      max_magnitude, weighted_seismic_activity, earthquakes_per_million,
      exposure_score_0_100
    ) %>%
    head(15),
  digits = 2,
  caption = "Top 15 countries by custom earthquake exposure score"
)
Top 15 countries by custom earthquake exposure score
country continent population_year population_millions earthquake_count max_magnitude weighted_seismic_activity earthquakes_per_million exposure_score_0_100
China Asia 2024 1408.97 146 7.1 217.4 0.10 100.00
Indonesia Asia 2024 283.49 151 6.4 229.0 0.53 97.31
Ethiopia Africa 2024 132.06 110 5.9 157.2 0.83 64.09
Papua New Guinea Oceania 2024 10.58 100 6.6 146.2 9.45 51.51
Chile South America 2024 19.76 95 6.4 128.4 4.81 46.96
Myanmar Asia 2024 54.50 64 7.7 98.0 1.17 37.96
Argentina South America 2024 45.70 87 5.9 98.0 1.90 37.58
Philippines Asia 2024 115.84 64 6.0 91.2 0.55 36.81
Afghanistan Asia 2024 42.65 50 6.2 82.0 1.17 31.28
Peru South America 2024 34.22 63 5.9 76.8 1.84 28.91
Iran Asia 2024 91.57 50 5.6 71.6 0.55 28.47
Russia Europe 2024 143.53 40 5.7 58.4 0.28 23.75
United States of America North America 2024 340.11 27 7.0 53.4 0.08 22.70
Japan Asia 2024 123.98 45 5.3 54.2 0.36 21.85
Turkey Asia 2024 85.52 33 6.1 53.2 0.39 21.01

The custom exposure score combines weighted seismic activity with a population exposure factor. The score is scaled from 0 to 100 so countries can be compared more easily. This is not a measure of actual damage or loss; it is an analytical index designed to compare relative exposure based on available data.

10 10. Exploratory Data Analysis

10.1 10.1 Overall Earthquake Summary

overall_summary <- earthquakes_clean %>%
  summarise(
    study_start = min(event_date),
    study_end = max(event_date),
    total_events = n(),
    average_magnitude = mean(magnitude),
    median_magnitude = median(magnitude),
    maximum_magnitude = max(magnitude),
    average_depth_km = mean(depth_km),
    median_depth_km = median(depth_km)
  )

knitr::kable(
  overall_summary,
  digits = 2,
  caption = "Overall earthquake summary for the study period"
)
Overall earthquake summary for the study period
study_start study_end total_events average_magnitude median_magnitude maximum_magnitude average_depth_km median_depth_km
2025-01-01 2025-12-30 8546 4.81 4.7 8.8 53.27 10

10.2 10.2 Earthquakes by Magnitude Category

magnitude_category_summary <- earthquakes_clean %>%
  count(magnitude_category, name = "event_count") %>%
  mutate(percent = event_count / sum(event_count))

knitr::kable(
  magnitude_category_summary,
  digits = 3,
  caption = "Earthquake count by magnitude category"
)
Earthquake count by magnitude category
magnitude_category event_count percent
Light: 4.5-4.9 6420 0.751
Moderate: 5.0-5.9 1982 0.232
Strong: 6.0-6.9 128 0.015
Major/Great: 7.0+ 16 0.002

11 11. Visualizations Describing the Data

11.1 11.1 Magnitude Distribution

ggplot(earthquakes_clean, aes(x = magnitude)) +
  geom_histogram(binwidth = 0.1, boundary = 0, color = "white") +
  labs(
    title = "Distribution of Earthquake Magnitudes",
    subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
    x = "Magnitude",
    y = "Number of Earthquakes"
  ) +
  theme_minimal()

This plot describes the overall magnitude distribution and helps validate that most events in the study period are in the lower end of the selected magnitude range.

11.2 11.2 Earthquake Count by Magnitude Category

ggplot(magnitude_category_summary, aes(x = magnitude_category, y = event_count)) +
  geom_col() +
  geom_text(aes(label = event_count), vjust = -0.3, size = 3.5) +
  labs(
    title = "Earthquake Events by Magnitude Category",
    x = "Magnitude Category",
    y = "Number of Earthquakes"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

11.3 11.3 Magnitude and Depth Relationship

ggplot(earthquakes_clean, aes(x = depth_km, y = magnitude)) +
  geom_point(alpha = 0.45) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Earthquake Depth and Magnitude",
    x = "Depth (km)",
    y = "Magnitude"
  ) +
  theme_minimal()

This scatterplot explores whether deeper earthquakes tend to have different magnitudes than shallower events. The trend line provides a simple visual check before formal statistical analysis.

11.4 11.4 Top Countries by Earthquake Count

top_countries_by_count <- country_exposure %>%
  slice_max(earthquake_count, n = 15) %>%
  mutate(country = fct_reorder(country, earthquake_count))

ggplot(top_countries_by_count, aes(x = earthquake_count, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquake Event Count",
    subtitle = "Country-level events matched using earthquake coordinates and country boundaries",
    x = "Number of Earthquakes",
    y = "Country"
  ) +
  theme_minimal()

This plot shows where earthquakes occurred most often, but it does not yet account for population exposure.

12 12. Visualizations Supporting the Conclusion

12.1 12.1 Top Countries by Custom Exposure Score

top_countries_by_exposure <- country_exposure %>%
  slice_max(exposure_score_0_100, n = 15) %>%
  mutate(country = fct_reorder(country, exposure_score_0_100))

ggplot(top_countries_by_exposure, aes(x = exposure_score_0_100, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquake Exposure Score",
    subtitle = "Score combines weighted seismic activity with a population exposure factor",
    x = "Exposure Score (0-100)",
    y = "Country"
  ) +
  theme_minimal()

This graphic supports the main conclusion because it ranks countries using both seismic activity and population exposure, not event count alone.

12.2 12.2 Population and Weighted Seismic Activity

country_exposure %>%
  filter(earthquake_count >= 3) %>%
  ggplot(aes(x = population_millions, y = weighted_seismic_activity)) +
  geom_point(aes(size = exposure_score_0_100), alpha = 0.65) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10(labels = scales::comma) +
  labs(
    title = "Population Size and Weighted Seismic Activity",
    subtitle = "Countries with more events and larger populations can rank higher in exposure-based analysis",
    x = "Population in Millions (log scale)",
    y = "Weighted Seismic Activity",
    size = "Exposure Score"
  ) +
  theme_minimal()

This scatterplot compares population size with weighted seismic activity. It helps show why exposure analysis can differ from a simple ranking by event count.

12.3 12.3 Earthquakes per Million People

top_per_million <- country_exposure %>%
  filter(population_millions >= 1) %>%
  slice_max(earthquakes_per_million, n = 15) %>%
  mutate(country = fct_reorder(country, earthquakes_per_million))

ggplot(top_per_million, aes(x = earthquakes_per_million, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquakes per Million People",
    subtitle = "Population-adjusted rate using countries with at least 1 million people",
    x = "Earthquakes per Million People",
    y = "Country"
  ) +
  theme_minimal()

This chart provides a population-adjusted view of earthquake activity. It is useful because countries with smaller populations can appear very different after adjustment.

13 13. Feature Beyond Class: Static and Interactive Geospatial Mapping

This project includes both a static world map and an interactive leaflet map as geospatial features beyond the standard course workflow. The static map provides a reliable visualization of global earthquake locations in the rendered report, while the interactive leaflet map allows the user to explore individual earthquake events by clicking on markers. Each marker includes event-level information such as magnitude, depth, date, place, and matched country when available.

leaflet_data <- earthquakes_with_country %>%
  filter(!is.na(latitude), !is.na(longitude)) %>%
  mutate(
    popup_text = paste0(
      "<strong>Magnitude:</strong> ", round(magnitude, 1), "<br>",
      "<strong>Depth:</strong> ", round(depth_km, 1), " km<br>",
      "<strong>Date:</strong> ", event_date, "<br>",
      "<strong>Place:</strong> ", htmltools::htmlEscape(place), "<br>",
      "<strong>Country:</strong> ",
      ifelse(is.na(country), "Offshore/Unmatched", htmltools::htmlEscape(country))
    )
  )
world_map <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")

ggplot() +
  geom_sf(data = world_map, fill = "gray95", color = "gray70", linewidth = 0.2) +
  geom_point(
    data = leaflet_data,
    aes(x = longitude, y = latitude, size = magnitude),
    alpha = 0.45
  ) +
  scale_size_continuous(range = c(1, 5)) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Global Earthquake Locations",
    subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
    x = "Longitude",
    y = "Latitude",
    size = "Magnitude"
  ) +
  theme_minimal()

leaflet(leaflet_data, height = 600) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = ~scales::rescale(magnitude, to = c(3, 9)),
    stroke = FALSE,
    fillOpacity = 0.55,
    popup = ~popup_text,
    label = ~paste0("M", round(magnitude, 1), " - ", place)
  ) %>%
  addLegend(
    position = "bottomright",
    title = "Earthquakes M4.5+",
    labels = c("Circle size increases with magnitude"),
    colors = "gray40"
  )

Together, the static and interactive maps support the geographic interpretation of the project by showing that earthquake activity is spatially clustered rather than evenly distributed across the world. The static map ensures that the location pattern is visible in the published report, while the interactive map provides an additional way to inspect event-level details.

14 14. Statistical Analysis

14.1 14.1 Correlation Analysis

This section evaluates relationships among population size, earthquake count, weighted seismic activity, and the custom exposure score.

correlation_data <- country_exposure %>%
  select(
    population_millions,
    earthquake_count,
    avg_magnitude,
    max_magnitude,
    avg_depth_km,
    weighted_seismic_activity,
    earthquakes_per_million,
    exposure_score_0_100
  )

correlation_results <- tibble(
  comparison = c(
    "Population vs Earthquake Count",
    "Population vs Weighted Seismic Activity",
    "Earthquake Count vs Exposure Score",
    "Weighted Seismic Activity vs Exposure Score",
    "Earthquakes per Million vs Exposure Score"
  ),
  correlation = c(
    cor(correlation_data$population_millions, correlation_data$earthquake_count, use = "complete.obs"),
    cor(correlation_data$population_millions, correlation_data$weighted_seismic_activity, use = "complete.obs"),
    cor(correlation_data$earthquake_count, correlation_data$exposure_score_0_100, use = "complete.obs"),
    cor(correlation_data$weighted_seismic_activity, correlation_data$exposure_score_0_100, use = "complete.obs"),
    cor(correlation_data$earthquakes_per_million, correlation_data$exposure_score_0_100, use = "complete.obs")
  )
)

knitr::kable(
  correlation_results,
  digits = 3,
  caption = "Correlation analysis for earthquake exposure variables"
)
Correlation analysis for earthquake exposure variables
comparison correlation
Population vs Earthquake Count 0.363
Population vs Weighted Seismic Activity 0.387
Earthquake Count vs Exposure Score 0.986
Weighted Seismic Activity vs Exposure Score 0.995
Earthquakes per Million vs Exposure Score -0.062

The correlation analysis supports the project by testing whether the exposure score is more closely aligned with raw event frequency, weighted seismic activity, population-adjusted rates, or population size.

14.2 14.2 Regression Model

This regression model evaluates how population size, earthquake frequency, average magnitude, and average depth relate to the custom exposure index. The model uses a log transformation because exposure values are right-skewed.

regression_data <- country_exposure %>%
  filter(
    earthquake_count >= 2,
    population > 0,
    raw_exposure_index > 0
  ) %>%
  mutate(
    log_exposure_index = log1p(raw_exposure_index),
    log_population = log10(population)
  )

exposure_model <- lm(
  log_exposure_index ~ log_population + earthquake_count + avg_magnitude + avg_depth_km,
  data = regression_data
)

model_summary <- broom::tidy(exposure_model)
model_fit <- broom::glance(exposure_model)

knitr::kable(
  model_summary,
  digits = 4,
  caption = "Regression model coefficients for log exposure index"
)
Regression model coefficients for log exposure index
term estimate std.error statistic p.value
(Intercept) -5.0216 3.5261 -1.4241 0.1609
log_population 0.2340 0.1233 1.8978 0.0638
earthquake_count 0.0295 0.0027 10.7404 0.0000
avg_magnitude 1.5182 0.6708 2.2633 0.0282
avg_depth_km 0.0013 0.0018 0.7294 0.4693
knitr::kable(
  model_fit %>% select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs),
  digits = 4,
  caption = "Regression model fit statistics"
)
Regression model fit statistics
r.squared adj.r.squared sigma statistic p.value df nobs
0.7845 0.7666 0.6602 43.6883 0 4 53

The regression model provides statistical support for the conclusion by evaluating whether population and earthquake activity variables are associated with the exposure index.

14.3 14.3 Comparing Event Count Ranking vs Exposure Score Ranking

A key goal of the project is to show whether exposure analysis changes interpretation compared with simple earthquake counts. This section compares country rankings by earthquake count and by exposure score.

ranking_comparison <- country_exposure %>%
  mutate(
    count_rank = min_rank(desc(earthquake_count)),
    exposure_rank = min_rank(desc(exposure_score_0_100)),
    rank_change = count_rank - exposure_rank
  ) %>%
  arrange(exposure_rank) %>%
  select(
    country,
    continent,
    population_millions,
    earthquake_count,
    count_rank,
    exposure_score_0_100,
    exposure_rank,
    rank_change
  )

knitr::kable(
  ranking_comparison %>% head(20),
  digits = 2,
  caption = "Comparison of earthquake count ranking and exposure score ranking"
)
Comparison of earthquake count ranking and exposure score ranking
country continent population_millions earthquake_count count_rank exposure_score_0_100 exposure_rank rank_change
China Asia 1408.97 146 2 100.00 1 1
Indonesia Asia 283.49 151 1 97.31 2 -1
Ethiopia Africa 132.06 110 3 64.09 3 0
Papua New Guinea Oceania 10.58 100 4 51.51 4 0
Chile South America 19.76 95 5 46.96 5 0
Myanmar Asia 54.50 64 7 37.96 6 1
Argentina South America 45.70 87 6 37.58 7 -1
Philippines Asia 115.84 64 7 36.81 8 -1
Afghanistan Asia 42.65 50 10 31.28 9 1
Peru South America 34.22 63 9 28.91 10 -1
Iran Asia 91.57 50 10 28.47 11 -1
Russia Europe 143.53 40 13 23.75 12 1
United States of America North America 340.11 27 15 22.70 13 2
Japan Asia 123.98 45 12 21.85 14 -2
Turkey Asia 85.52 33 14 21.01 15 -1
Mexico North America 130.86 25 17 15.28 16 1
Colombia South America 52.89 26 16 12.97 17 -1
Ecuador South America 18.14 24 19 12.76 18 1
Canada North America 41.29 19 20 10.79 19 1
Pakistan Asia 251.27 19 20 10.41 20 0

A positive rank change means a country ranks higher under the exposure score than under simple earthquake count. This helps demonstrate how population exposure can change the interpretation of earthquake risk.

15 15. Main Findings

top_count_country <- country_exposure %>%
  slice_max(earthquake_count, n = 1, with_ties = FALSE)

top_exposure_country <- country_exposure %>%
  slice_max(exposure_score_0_100, n = 1, with_ties = FALSE)

top_per_million_country <- country_exposure %>%
  filter(population_millions >= 1) %>%
  slice_max(earthquakes_per_million, n = 1, with_ties = FALSE)

summary_findings <- tibble(
  finding = c(
    "Highest earthquake count",
    "Highest custom exposure score",
    "Highest earthquakes per million people among countries with at least 1 million people"
  ),
  country = c(
    top_count_country$country,
    top_exposure_country$country,
    top_per_million_country$country
  ),
  value = c(
    round(top_count_country$earthquake_count, 2),
    round(top_exposure_country$exposure_score_0_100, 2),
    round(top_per_million_country$earthquakes_per_million, 2)
  )
)

knitr::kable(
  summary_findings,
  caption = "Summary of main country-level findings"
)
Summary of main country-level findings
finding country value
Highest earthquake count Indonesia 151.00
Highest custom exposure score China 100.00
Highest earthquakes per million people among countries with at least 1 million people Papua New Guinea 9.45
cat(
  "The country with the highest earthquake count in this study period was",
  top_count_country$country,
  "with",
  top_count_country$earthquake_count,
  "matched earthquake events. The country with the highest custom exposure score was",
  top_exposure_country$country,
  "with an exposure score of",
  round(top_exposure_country$exposure_score_0_100, 2),
  ". The highest earthquakes-per-million rate among countries with at least 1 million people was observed in",
  top_per_million_country$country,
  "with",
  round(top_per_million_country$earthquakes_per_million, 2),
  "earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk."
)

The country with the highest earthquake count in this study period was Indonesia with 151 matched earthquake events. The country with the highest custom exposure score was China with an exposure score of 100 . The highest earthquakes-per-million rate among countries with at least 1 million people was observed in Papua New Guinea with 9.45 earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk.

The main finding is that earthquake risk interpretation changes depending on the metric used. Countries with the largest number of earthquakes are not always the same as countries with the highest population-adjusted rates or the highest exposure score. This supports the motivation for analyzing earthquake risk beyond magnitude and event count alone.

16 16. Challenges Encountered

One major challenge in this project was connecting earthquake events to countries. The USGS data provides latitude and longitude coordinates and a text place description, but the place field is not always a clean country name. Some events occur offshore, near islands, or in oceanic regions. To address this, the project used geospatial country boundaries from Natural Earth and assigned country information using a spatial join. Events that did not fall within a country boundary were treated as offshore or unmatched events and excluded from the country-population join.

A second challenge was the structure of the World Bank population file. The raw CSV contains metadata rows before the table and stores years as separate columns. To solve this, the project skipped the metadata rows and transformed the population data from wide to long format before selecting the latest available year.

17 17. Limitations

This project uses country-level population as a broad exposure measure. This is useful for comparing countries, but it does not show the exact number of people near each earthquake epicenter. A more advanced version of this project could use gridded population density data to estimate people living within a specific radius of each earthquake.

The custom exposure score is also an analytical index rather than a direct measure of damage, deaths, or economic loss. Actual earthquake impact depends on many factors not included here, such as building quality, emergency preparedness, time of day, local soil conditions, tsunami risk, and distance from population centers.

Finally, offshore earthquakes are important for tsunami risk, but many are not assigned to a country through the spatial join. This analysis focuses mainly on country-level seismic exposure and population context.

18 18. Conclusion

This project shows that earthquake risk should be analyzed beyond magnitude alone. By combining USGS earthquake activity with World Bank population data, the analysis provides multiple ways to compare seismic exposure: raw event counts, population-adjusted rates, weighted seismic activity, and a custom exposure score.

The results demonstrate that the interpretation of risk can change depending on whether population exposure is included. A country with many earthquakes may not have the highest population-adjusted exposure, while a highly populated country with fewer but stronger events may rank higher using an exposure-based score. This supports the main conclusion that population exposure adds important context to earthquake risk analysis.

19 19. Reproducibility Notes

This project is designed to be reproducible:

  • The USGS earthquake data is acquired from a public API/GeoJSON query.
  • The World Bank population CSV files are stored in the project GitHub repository.
  • The World Bank data is loaded using raw GitHub URLs rather than local file paths.
  • All cleaning, transformation, analysis, and visualization steps are performed in this Quarto document.
  • The project avoids references to local OneDrive, Desktop, or Downloads paths.