Final Project DATA607 - Beyond Magnitude: Analyzing Earthquake Risk Through Seismic Activity and Population Exposure

Author

Muhammad Suffyan Khan

Published

May 9, 2026

1 1. Executive Summary

This project analyzes earthquake risk by combining two different types of data sources: earthquake event data from the USGS Earthquake Catalog API/GeoJSON source and country-level population data from the World Bank Population, total (SP.POP.TOTL) CSV dataset. The main idea is that earthquake risk should not be interpreted only through magnitude or event counts. A more useful risk view should also consider how many people may be exposed in affected countries or regions.

The workflow follows a reproducible data science process: acquire data, clean and transform the raw records, create useful features, join earthquake activity with population exposure, conduct exploratory and statistical analysis, build a custom exposure scoring system, and present findings through visualizations and interpretation. The project uses geospatial mapping with leaflet and a custom earthquake exposure score as features beyond the standard class workflow.

2 2. Introduction and Motivation

Earthquakes are often discussed in terms of magnitude, but magnitude alone does not fully describe risk. A strong earthquake in a remote area may expose fewer people than a moderate earthquake near a highly populated region. For this reason, this project combines seismic activity with population data to examine earthquake exposure from a broader data science perspective.

The goal is to answer the following question:

Which countries or regions appear most exposed when earthquake activity is combined with population size, and does this change how earthquake risk is interpreted compared with magnitude or event count alone?

3 3. Research Questions

This analysis is guided by four research questions:

Which countries or regions experienced the highest number of earthquakes during the study period?
Which countries or regions had the highest earthquake activity after adjusting for population size?
How do magnitude, depth, event frequency, and population combine into an exposure-based risk ranking?
Does the custom exposure score provide a different interpretation than earthquake count alone?

4 4. Data Sources

4.1 4.1 USGS Earthquake Catalog API

The first data source is the USGS Earthquake Catalog API. The project uses a GeoJSON query to retrieve earthquake events with information such as event time, magnitude, depth, latitude, longitude, place description, tsunami flag, and event significance.

For this project, the study period is limited to one full year to keep the scope manageable and reproducible. The analysis uses earthquakes with magnitude 4.5 or greater because these events are more likely to be meaningful for regional exposure analysis and reduce noise from very small events.

4.2 4.2 World Bank Population CSV

The second data source is the World Bank Population, total indicator (SP.POP.TOTL). The raw CSV file is stored in the GitHub repository and loaded through a raw GitHub URL. The World Bank dataset contains yearly population values by country.

4.3 4.3 Why These Sources Work Together

The USGS data captures seismic activity, while the World Bank data captures country-level population exposure. Combining these sources allows the project to compare earthquake activity alone with population-adjusted and exposure-weighted metrics.

5 5. Data Science Workflow

The project follows this workflow:

Acquire earthquake data from the USGS API and population data from the World Bank CSV.
Clean event records, timestamps, magnitude values, depth values, and country identifiers.
Transform earthquake timestamps into dates, classify magnitudes and depths, reshape population data from wide to long format, and select the latest available population year.
Join earthquake records with country boundaries and population data.
Analyze event frequency, magnitude distribution, population-adjusted rates, and custom exposure scores.
Visualize earthquake patterns using bar charts, scatterplots, and an interactive map.
Conclude which countries or regions appear most exposed and discuss limitations.

6 6. Loading Libraries

library(tidyverse)
library(jsonlite)
library(lubridate)
library(scales)
library(sf)
library(rnaturalearth)
library(leaflet)
library(broom)
library(knitr)
library(htmltools)
library(viridis)

options(scipen = 999)

7 7. Data Acquisition

7.1 7.1 Load USGS Earthquake Data

This section pulls earthquake event records from the USGS Earthquake Catalog API. The query uses GeoJSON format, a one-year study period, and a minimum magnitude of 4.5.

study_start <- "2025-01-01"
study_end <- "2025-12-31"
minimum_magnitude <- 4.5

usgs_url <- paste0(
  "https://earthquake.usgs.gov/fdsnws/event/1/query?",
  "format=geojson",
  "&starttime=", study_start,
  "&endtime=", study_end,
  "&minmagnitude=", minimum_magnitude,
  "&orderby=time"
)

earthquake_json <- jsonlite::fromJSON(usgs_url, flatten = TRUE)

earthquake_features <- earthquake_json$features

coordinate_tbl <- tibble(
  longitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 1),
  latitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 2),
  depth_km = purrr::map_dbl(earthquake_features$geometry.coordinates, 3)
)

earthquakes_raw <- earthquake_features %>%
  as_tibble() %>%
  bind_cols(coordinate_tbl) %>%
  transmute(
    event_id = id,
    magnitude = properties.mag,
    place = properties.place,
    event_time_utc = lubridate::as_datetime(properties.time / 1000, tz = "UTC"),
    updated_time_utc = lubridate::as_datetime(properties.updated / 1000, tz = "UTC"),
    tsunami = properties.tsunami,
    significance = properties.sig,
    status = properties.status,
    alert = properties.alert,
    event_url = properties.url,
    longitude,
    latitude,
    depth_km
  )

knitr::kable(
  head(earthquakes_raw, 10),
  caption = "Sample of raw earthquake events from the USGS API"
)

Sample of raw earthquake events from the USGS API
event_id	magnitude	place	event_time_utc	updated_time_utc	significance	status	alert	event_url	longitude	latitude	depth_km
us7000rlm7	4.8	69 km E of Yamada, Japan	2025-12-30 23:51:36	2026-04-01 22:48:43	354	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlm7	142.7562	39.4639	35.000
us7000rr5p	4.5	173 km NNE of Tobelo, Indonesia	2025-12-30 22:36:09	2026-04-01 22:49:07	312	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5p	128.8654	3.0448	10.000
us7000rllj	4.8	117 km E of Miyako, Japan	2025-12-30 22:19:46	2026-04-01 22:48:43	354	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rllj	143.3045	39.5984	27.735
us7000rlks	4.5	72 km NNE of Gorontalo, Indonesia	2025-12-30 20:55:06	2026-04-01 22:48:42	312	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlks	123.3440	1.1328	10.000
us7000rlkk	5.3	26 km SSE of Boyuibe, Bolivia	2025-12-30 20:26:04	2026-04-01 22:48:42	432	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlkk	-63.1477	-20.6323	553.819
us7000rr5f	4.5	75 km ESE of Pondaguitan, Philippines	2025-12-30 16:45:05	2026-04-01 22:49:07	312	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5f	126.8425	6.2037	80.527
us7000rlim	4.6	60 km SSW of Lhokseumawe, Indonesia	2025-12-30 13:43:09	2026-04-01 22:48:40	326	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlim	96.9010	4.6976	10.182
us7000rr5c	4.6	25 km NE of Hingatungan, Philippines	2025-12-30 13:40:00	2026-04-01 22:49:06	326	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5c	125.3714	10.7262	10.000
us7000rli5	5.1	87 km SSW of Panguna, Papua New Guinea	2025-12-30 11:22:07	2026-04-01 22:48:40	400	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli5	155.1950	-7.0521	10.000
us7000rli1	4.5	251 km E of Levuka, Fiji	2025-12-30 10:58:04	2026-04-01 22:48:40	312	reviewed	NA	https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli1	-178.3075	-18.1477	536.568

The USGS API data is now in a tidy format with one row per earthquake event. The longitude, latitude, and depth values were extracted from the nested GeoJSON coordinate field.

7.2 7.2 Load World Bank Population Data

The World Bank file has metadata rows before the actual table begins, so the code skips the first four rows. This keeps the raw file unchanged while making the analysis reproducible.

population_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
country_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
indicator_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Indicator_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"

population_raw <- readr::read_csv(
  population_url,
  skip = 4,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

country_metadata <- readr::read_csv(
  country_metadata_url,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

indicator_metadata <- readr::read_csv(
  indicator_metadata_url,
  show_col_types = FALSE
) %>%
  select(-starts_with("Unnamed"))

knitr::kable(
  head(population_raw, 8),
  caption = "Sample of raw World Bank population data"
)

Sample of raw World Bank population data
Country Name	Country Code	Indicator Name	Indicator Code	1960	1961	1962	1963	1964	1965	1966	1967	1968	1969	1970	1971	1972	1973	1974	1975	1976	1977	1978	1979	1980	1981	1982	1983	1984	1985	1986	1987	1988	1989	1990	1991	1992	1993	1994	1995	1996	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025	…71
Aruba	ABW	Population, total	SP.POP.TOTL	54922	55578	56320	57002	57619	58190	58694	58990	59069	59052	58950	58781	58047	58299	58349	58295	58368	58580	58776	59191	59909	60563	61276	62228	62901	61728	59931	59159	59331	60443	62753	65896	69005	73685	77595	79805	83021	86301	88451	89659	90588	91439	92074	93128	95138	97635	99405	100150	100917	101604	101838	102591	104110	105675	106807	107906	108727	108735	108908	109203	108587	107700	107310	107359	107995	NA	NA
Africa Eastern and Southern	AFE	Population, total	SP.POP.TOTL	130075728	133534923	137171659	140945536	144904094	149033472	153281203	157704381	162329396	167088245	171984985	177022314	182126556	187524135	193186642	198914573	204802976	210680842	217074286	223974122	230792729	238043099	245822010	253644643	261458202	269450407	277621771	286067346	294498625	302939121	311748681	320442961	329082707	338324002	347441809	356580375	366138524	375646235	385505757	395750933	406156661	416807868	427820358	439173286	450928044	463076637	475606210	488580707	502070763	516003448	530308387	544737983	559609961	575202699	590968990	607123269	623369401	640058741	657801085	675950189	694446100	713090928	731821393	750491370	769280888	NA	NA
Afghanistan	AFG	Population, total	SP.POP.TOTL	9035043	9214083	9404406	9604487	9814318	10036008	10266395	10505959	10756922	11017409	11290128	11567667	11853696	12157999	12469127	12773954	13059851	13340756	13611441	13655567	13169311	11937581	10991378	10917982	11190221	11426852	11420074	11387818	11523298	11874088	12045660	12238879	13278974	14943172	16250794	17065836	17763266	18452091	19159996	19887785	20130327	20284307	21378117	22733049	23560654	24404567	25424094	25909852	26482622	27466101	28284089	29347708	30560034	31622704	32792523	33831764	34700612	35688935	36743039	37856121	39068979	40000412	40578842	41454761	42647492	NA	NA
Africa Western and Central	AFW	Population, total	SP.POP.TOTL	97630925	99706674	101854756	104089175	106388440	108772632	111246953	113795019	116444636	119203521	122086536	125072948	128176494	131449942	134911581	138569918	142337272	146258576	150402616	154721711	159166518	163762473	168585118	173255157	177880746	182811038	187889141	193104347	198485027	204062274	209566031	215178709	221191375	227246778	233360104	239801875	246415446	253207584	260297834	267506298	274968446	282780717	290841795	299142845	307725100	316588476	325663158	334984176	344586109	354343844	364358270	374790143	385360349	396030207	406992047	418127845	429454743	440882906	452195915	463365429	474569351	485920997	497387180	509398589	521764076	NA	NA
Angola	AGO	Population, total	SP.POP.TOTL	5231654	5301583	5354310	5408320	5464187	5521981	5581386	5641807	5702699	5763685	5852788	5991102	6174262	6388528	6613367	6842947	7074664	7317829	7576734	7847207	8133872	8435607	8751648	9082983	9425917	9779120	10139450	10497858	10861291	11238562	11626360	12023529	12423712	12827135	13249764	13699778	14170973	14660413	15159370	15667235	16194869	16747208	17327699	17943712	18600423	19291161	20015279	20778561	21578655	22414773	23294825	24218352	25177394	26165620	27160769	28157798	29183070	30234839	31297155	32375632	33451132	34532429	35635029	36749906	37885849	NA	NA
Albania	ALB	Population, total	SP.POP.TOTL	1608800	1659800	1711319	1762621	1814135	1864791	1914573	1965598	2022272	2081695	2135479	2187853	2243126	2296752	2350124	2404831	2458526	2513546	2566266	2617832	2671997	2726056	2784278	2843960	2904429	2964762	3022635	3083605	3142336	3227943	3286542	3266790	3247039	3227287	3207536	3187784	3168033	3148281	3128530	3108778	3089027	3060173	3051010	3039616	3026939	3011487	2992547	2970017	2947314	2927519	2913021	2905195	2860708	2816902	2773767	2731293	2689469	2648285	2607733	2567801	2528480	2489762	2451636	2414095	2377128	NA	NA
Andorra	AND	Population, total	SP.POP.TOTL	9510	10283	11086	11915	12764	13634	14626	15837	17176	18555	19977	21442	22957	24523	26130	27773	29436	31097	32735	34306	35782	37169	38764	40565	42273	43825	45734	47653	49416	50620	52597	56667	60200	63272	64612	63912	63984	64700	65382	65710	65685	65852	66506	69486	74325	77421	79585	81877	83495	83888	80706	77783	76834	75194	73737	72174	72181	73763	75162	76474	77380	78364	79705	80856	81938	NA	NA
Arab World	ARB	Population, total	SP.POP.TOTL	91540853	93931683	96428599	99038509	101729760	104494008	107403055	110491709	113712171	117053219	120469263	123950810	127535560	131335660	135282861	139495837	144014135	148599336	153568663	159054118	164657321	170015485	175370356	180973288	186733957	192563243	198509771	204524595	210269696	215969068	223184294	228402316	234321565	240926205	247626514	253952538	260229121	266565183	273094199	279709634	286548096	293580252	300691509	307815105	315060277	322628451	330526310	338947976	347684619	356386673	363568534	372054934	381030177	390954281	400231008	410190679	419808341	428315886	435998060	444281315	453723239	460646603	471352066	482105978	492612632	NA	NA

8 8. Data Cleaning and Transformation

8.1 8.1 Clean Earthquake Records

This section converts timestamps into useful date fields and creates magnitude and depth categories. These transformations support both descriptive analysis and later exposure scoring.

earthquakes_clean <- earthquakes_raw %>%
  filter(
    !is.na(magnitude),
    !is.na(latitude),
    !is.na(longitude),
    !is.na(depth_km)
  ) %>%
  mutate(
    event_date = as.Date(event_time_utc),
    event_year = lubridate::year(event_time_utc),
    event_month = lubridate::month(event_time_utc, label = TRUE, abbr = TRUE),
    magnitude_category = case_when(
      magnitude < 5.0 ~ "Light: 4.5-4.9",
      magnitude < 6.0 ~ "Moderate: 5.0-5.9",
      magnitude < 7.0 ~ "Strong: 6.0-6.9",
      TRUE ~ "Major/Great: 7.0+"
    ),
    magnitude_category = factor(
      magnitude_category,
      levels = c(
        "Light: 4.5-4.9",
        "Moderate: 5.0-5.9",
        "Strong: 6.0-6.9",
        "Major/Great: 7.0+"
      )
    ),
    depth_category = case_when(
      depth_km < 70 ~ "Shallow: <70 km",
      depth_km < 300 ~ "Intermediate: 70-299 km",
      TRUE ~ "Deep: 300+ km"
    ),
    depth_category = factor(
      depth_category,
      levels = c("Shallow: <70 km", "Intermediate: 70-299 km", "Deep: 300+ km")
    ),
    magnitude_weight = case_when(
      magnitude < 5.0 ~ 1,
      magnitude < 6.0 ~ 2,
      magnitude < 7.0 ~ 4,
      TRUE ~ 8
    ),
    depth_weight = case_when(
      depth_km < 70 ~ 1.20,
      depth_km < 300 ~ 1.00,
      TRUE ~ 0.80
    ),
    event_weight = magnitude_weight * depth_weight
  )

knitr::kable(
  earthquakes_clean %>%
    select(event_id, event_date, magnitude, magnitude_category, depth_km, depth_category, event_weight, place) %>%
    head(10),
  caption = "Cleaned earthquake records with magnitude and depth categories"
)

Cleaned earthquake records with magnitude and depth categories
event_id	event_date	magnitude	magnitude_category	depth_km	depth_category	event_weight	place
us7000rlm7	2025-12-30	4.8	Light: 4.5-4.9	35.000	Shallow: <70 km	1.2	69 km E of Yamada, Japan
us7000rr5p	2025-12-30	4.5	Light: 4.5-4.9	10.000	Shallow: <70 km	1.2	173 km NNE of Tobelo, Indonesia
us7000rllj	2025-12-30	4.8	Light: 4.5-4.9	27.735	Shallow: <70 km	1.2	117 km E of Miyako, Japan
us7000rlks	2025-12-30	4.5	Light: 4.5-4.9	10.000	Shallow: <70 km	1.2	72 km NNE of Gorontalo, Indonesia
us7000rlkk	2025-12-30	5.3	Moderate: 5.0-5.9	553.819	Deep: 300+ km	1.6	26 km SSE of Boyuibe, Bolivia
us7000rr5f	2025-12-30	4.5	Light: 4.5-4.9	80.527	Intermediate: 70-299 km	1.0	75 km ESE of Pondaguitan, Philippines
us7000rlim	2025-12-30	4.6	Light: 4.5-4.9	10.182	Shallow: <70 km	1.2	60 km SSW of Lhokseumawe, Indonesia
us7000rr5c	2025-12-30	4.6	Light: 4.5-4.9	10.000	Shallow: <70 km	1.2	25 km NE of Hingatungan, Philippines
us7000rli5	2025-12-30	5.1	Moderate: 5.0-5.9	10.000	Shallow: <70 km	2.4	87 km SSW of Panguna, Papua New Guinea
us7000rli1	2025-12-30	4.5	Light: 4.5-4.9	536.568	Deep: 300+ km	0.8	251 km E of Levuka, Fiji

8.2 8.2 Convert Earthquake Points to Spatial Data

To connect earthquakes with countries, the latitude and longitude values are converted into spatial point features. Natural Earth country boundaries are then used to assign country and continent information to each event.

world_countries <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") %>%
  sf::st_make_valid() %>%
  select(
    country = admin,
    iso_a3,
    continent,
    geometry
  )

earthquake_points <- earthquakes_clean %>%
  sf::st_as_sf(
    coords = c("longitude", "latitude"),
    crs = 4326,
    remove = FALSE
  )

earthquakes_with_country <- sf::st_join(
  earthquake_points,
  world_countries,
  join = sf::st_within,
  left = TRUE
) %>%
  sf::st_drop_geometry()

country_match_summary <- earthquakes_with_country %>%
  summarise(
    total_events = n(),
    matched_to_country = sum(!is.na(iso_a3)),
    unmatched_or_ocean_events = sum(is.na(iso_a3)),
    matched_percent = matched_to_country / total_events
  )

knitr::kable(
  country_match_summary,
  digits = 3,
  caption = "Summary of earthquake events matched to country boundaries"
)

Summary of earthquake events matched to country boundaries
total_events	matched_to_country	unmatched_or_ocean_events	matched_percent
8546	1487	7059	0.174

Some earthquake events occur offshore or in oceanic regions, so they may not fall within a country boundary. These events are retained for global descriptive analysis but excluded from country-level population joins.

8.3 8.3 Clean and Reshape World Bank Population Data

The World Bank population dataset is originally wide, with one column for each year. This section transforms it into a tidy long format and selects the latest available population year.

population_long <- population_raw %>%
  select(
    country_name_wb = `Country Name`,
    iso_a3 = `Country Code`,
    indicator_name = `Indicator Name`,
    indicator_code = `Indicator Code`,
    matches("^\\d{4}$")
  ) %>%
  pivot_longer(
    cols = matches("^\\d{4}$"),
    names_to = "population_year",
    values_to = "population"
  ) %>%
  mutate(
    population_year = as.integer(population_year),
    population = as.numeric(population)
  )

latest_population_year <- population_long %>%
  filter(!is.na(population)) %>%
  summarise(latest_year = max(population_year)) %>%
  pull(latest_year)

population_latest <- population_long %>%
  filter(population_year == latest_population_year) %>%
  filter(!is.na(population)) %>%
  select(iso_a3, country_name_wb, population_year, population)

country_metadata_clean <- country_metadata %>%
  transmute(
    iso_a3 = `Country Code`,
    wb_region = Region,
    income_group = IncomeGroup,
    table_name = TableName
  )

population_latest <- population_latest %>%
  left_join(country_metadata_clean, by = "iso_a3") %>%
  filter(!is.na(wb_region), wb_region != "Aggregates")

knitr::kable(
  head(population_latest, 10),
  caption = paste("Cleaned World Bank population data for latest available year:", latest_population_year)
)

Cleaned World Bank population data for latest available year: 2024
iso_a3	country_name_wb	population_year	population	wb_region	income_group	table_name
ABW	Aruba	2024	107995	Latin America & Caribbean	High income	Aruba
AFG	Afghanistan	2024	42647492	Middle East & North Africa	Low income	Afghanistan
AGO	Angola	2024	37885849	Sub-Saharan Africa	Lower middle income	Angola
ALB	Albania	2024	2377128	Europe & Central Asia	Upper middle income	Albania
AND	Andorra	2024	81938	Europe & Central Asia	High income	Andorra
ARE	United Arab Emirates	2024	10986400	Middle East & North Africa	High income	United Arab Emirates
ARG	Argentina	2024	45696159	Latin America & Caribbean	Upper middle income	Argentina
ARM	Armenia	2024	3033500	Europe & Central Asia	Upper middle income	Armenia
ASM	American Samoa	2024	46765	East Asia & Pacific	High income	American Samoa
ATG	Antigua and Barbuda	2024	93772	Latin America & Caribbean	High income	Antigua and Barbuda

This transformation satisfies the requirement for data reshaping because the World Bank population data is converted from a wide year-based format into a tidy long format before the latest year is selected.

9 9. Joined Country-Level Analysis Dataset

This section aggregates earthquake activity by country and joins it with population data. Several exposure metrics are created, including earthquakes per million people, weighted activity per million people, and a custom exposure index.

earthquake_country_summary <- earthquakes_with_country %>%
  filter(!is.na(iso_a3)) %>%
  group_by(iso_a3, country, continent) %>%
  summarise(
    earthquake_count = n(),
    avg_magnitude = mean(magnitude, na.rm = TRUE),
    max_magnitude = max(magnitude, na.rm = TRUE),
    avg_depth_km = mean(depth_km, na.rm = TRUE),
    shallow_event_count = sum(depth_category == "Shallow: <70 km", na.rm = TRUE),
    strong_or_major_count = sum(magnitude >= 6.0, na.rm = TRUE),
    major_event_count = sum(magnitude >= 7.0, na.rm = TRUE),
    weighted_seismic_activity = sum(event_weight, na.rm = TRUE),
    .groups = "drop"
  )

country_exposure <- earthquake_country_summary %>%
  left_join(population_latest, by = "iso_a3") %>%
  filter(!is.na(population), population > 0) %>%
  mutate(
    population_millions = population / 1000000,
    earthquakes_per_million = earthquake_count / population_millions,
    weighted_activity_per_million = weighted_seismic_activity / population_millions,
    population_exposure_factor = log10(population),
    raw_exposure_index = weighted_seismic_activity * population_exposure_factor,
    exposure_score_0_100 = scales::rescale(raw_exposure_index, to = c(0, 100))
  ) %>%
  arrange(desc(exposure_score_0_100))

knitr::kable(
  country_exposure %>%
    select(
      country, continent, population_year, population_millions, earthquake_count,
      max_magnitude, weighted_seismic_activity, earthquakes_per_million,
      exposure_score_0_100
    ) %>%
    head(15),
  digits = 2,
  caption = "Top 15 countries by custom earthquake exposure score"
)

Top 15 countries by custom earthquake exposure score
country	continent	population_year	population_millions	earthquake_count	max_magnitude	weighted_seismic_activity	earthquakes_per_million	exposure_score_0_100
China	Asia	2024	1408.97	146	7.1	217.4	0.10	100.00
Indonesia	Asia	2024	283.49	151	6.4	229.0	0.53	97.31
Ethiopia	Africa	2024	132.06	110	5.9	157.2	0.83	64.09
Papua New Guinea	Oceania	2024	10.58	100	6.6	146.2	9.45	51.51
Chile	South America	2024	19.76	95	6.4	128.4	4.81	46.96
Myanmar	Asia	2024	54.50	64	7.7	98.0	1.17	37.96
Argentina	South America	2024	45.70	87	5.9	98.0	1.90	37.58
Philippines	Asia	2024	115.84	64	6.0	91.2	0.55	36.81
Afghanistan	Asia	2024	42.65	50	6.2	82.0	1.17	31.28
Peru	South America	2024	34.22	63	5.9	76.8	1.84	28.91
Iran	Asia	2024	91.57	50	5.6	71.6	0.55	28.47
Russia	Europe	2024	143.53	40	5.7	58.4	0.28	23.75
United States of America	North America	2024	340.11	27	7.0	53.4	0.08	22.70
Japan	Asia	2024	123.98	45	5.3	54.2	0.36	21.85
Turkey	Asia	2024	85.52	33	6.1	53.2	0.39	21.01

The custom exposure score combines weighted seismic activity with a population exposure factor. The score is scaled from 0 to 100 so countries can be compared more easily. This is not a measure of actual damage or loss; it is an analytical index designed to compare relative exposure based on available data.

10 10. Exploratory Data Analysis

10.1 10.1 Overall Earthquake Summary

overall_summary <- earthquakes_clean %>%
  summarise(
    study_start = min(event_date),
    study_end = max(event_date),
    total_events = n(),
    average_magnitude = mean(magnitude),
    median_magnitude = median(magnitude),
    maximum_magnitude = max(magnitude),
    average_depth_km = mean(depth_km),
    median_depth_km = median(depth_km)
  )

knitr::kable(
  overall_summary,
  digits = 2,
  caption = "Overall earthquake summary for the study period"
)

Overall earthquake summary for the study period
study_start	study_end	total_events	average_magnitude	median_magnitude	maximum_magnitude	average_depth_km	median_depth_km
2025-01-01	2025-12-30	8546	4.81	4.7	8.8	53.27	10

10.2 10.2 Earthquakes by Magnitude Category

magnitude_category_summary <- earthquakes_clean %>%
  count(magnitude_category, name = "event_count") %>%
  mutate(percent = event_count / sum(event_count))

knitr::kable(
  magnitude_category_summary,
  digits = 3,
  caption = "Earthquake count by magnitude category"
)

Earthquake count by magnitude category
magnitude_category	event_count	percent
Light: 4.5-4.9	6420	0.751
Moderate: 5.0-5.9	1982	0.232
Strong: 6.0-6.9	128	0.015
Major/Great: 7.0+	16	0.002

11 11. Visualizations Describing the Data

11.1 11.1 Magnitude Distribution

ggplot(earthquakes_clean, aes(x = magnitude)) +
  geom_histogram(binwidth = 0.1, boundary = 0, color = "white") +
  labs(
    title = "Distribution of Earthquake Magnitudes",
    subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
    x = "Magnitude",
    y = "Number of Earthquakes"
  ) +
  theme_minimal()

This plot describes the overall magnitude distribution and helps validate that most events in the study period are in the lower end of the selected magnitude range.

11.2 11.2 Earthquake Count by Magnitude Category

ggplot(magnitude_category_summary, aes(x = magnitude_category, y = event_count)) +
  geom_col() +
  geom_text(aes(label = event_count), vjust = -0.3, size = 3.5) +
  labs(
    title = "Earthquake Events by Magnitude Category",
    x = "Magnitude Category",
    y = "Number of Earthquakes"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

11.3 11.3 Magnitude and Depth Relationship

ggplot(earthquakes_clean, aes(x = depth_km, y = magnitude)) +
  geom_point(alpha = 0.45) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Earthquake Depth and Magnitude",
    x = "Depth (km)",
    y = "Magnitude"
  ) +
  theme_minimal()

This scatterplot explores whether deeper earthquakes tend to have different magnitudes than shallower events. The trend line provides a simple visual check before formal statistical analysis.

11.4 11.4 Top Countries by Earthquake Count

top_countries_by_count <- country_exposure %>%
  slice_max(earthquake_count, n = 15) %>%
  mutate(country = fct_reorder(country, earthquake_count))

ggplot(top_countries_by_count, aes(x = earthquake_count, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquake Event Count",
    subtitle = "Country-level events matched using earthquake coordinates and country boundaries",
    x = "Number of Earthquakes",
    y = "Country"
  ) +
  theme_minimal()

This plot shows where earthquakes occurred most often, but it does not yet account for population exposure.

12 12. Visualizations Supporting the Conclusion

12.1 12.1 Top Countries by Custom Exposure Score

top_countries_by_exposure <- country_exposure %>%
  slice_max(exposure_score_0_100, n = 15) %>%
  mutate(country = fct_reorder(country, exposure_score_0_100))

ggplot(top_countries_by_exposure, aes(x = exposure_score_0_100, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquake Exposure Score",
    subtitle = "Score combines weighted seismic activity with a population exposure factor",
    x = "Exposure Score (0-100)",
    y = "Country"
  ) +
  theme_minimal()

This graphic supports the main conclusion because it ranks countries using both seismic activity and population exposure, not event count alone.

12.2 12.2 Population and Weighted Seismic Activity

country_exposure %>%
  filter(earthquake_count >= 3) %>%
  ggplot(aes(x = population_millions, y = weighted_seismic_activity)) +
  geom_point(aes(size = exposure_score_0_100), alpha = 0.65) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10(labels = scales::comma) +
  labs(
    title = "Population Size and Weighted Seismic Activity",
    subtitle = "Countries with more events and larger populations can rank higher in exposure-based analysis",
    x = "Population in Millions (log scale)",
    y = "Weighted Seismic Activity",
    size = "Exposure Score"
  ) +
  theme_minimal()

This scatterplot compares population size with weighted seismic activity. It helps show why exposure analysis can differ from a simple ranking by event count.

12.3 12.3 Earthquakes per Million People

top_per_million <- country_exposure %>%
  filter(population_millions >= 1) %>%
  slice_max(earthquakes_per_million, n = 15) %>%
  mutate(country = fct_reorder(country, earthquakes_per_million))

ggplot(top_per_million, aes(x = earthquakes_per_million, y = country)) +
  geom_col() +
  labs(
    title = "Top Countries by Earthquakes per Million People",
    subtitle = "Population-adjusted rate using countries with at least 1 million people",
    x = "Earthquakes per Million People",
    y = "Country"
  ) +
  theme_minimal()

This chart provides a population-adjusted view of earthquake activity. It is useful because countries with smaller populations can appear very different after adjustment.

13 13. Feature Beyond Class: Static and Interactive Geospatial Mapping

This project includes both a static world map and an interactive leaflet map as geospatial features beyond the standard course workflow. The static map provides a reliable visualization of global earthquake locations in the rendered report, while the interactive leaflet map allows the user to explore individual earthquake events by clicking on markers. Each marker includes event-level information such as magnitude, depth, date, place, and matched country when available.

leaflet_data <- earthquakes_with_country %>%
  filter(!is.na(latitude), !is.na(longitude)) %>%
  mutate(
    popup_text = paste0(
      "<strong>Magnitude:</strong> ", round(magnitude, 1), "<br>",
      "<strong>Depth:</strong> ", round(depth_km, 1), " km<br>",
      "<strong>Date:</strong> ", event_date, "<br>",
      "<strong>Place:</strong> ", htmltools::htmlEscape(place), "<br>",
      "<strong>Country:</strong> ",
      ifelse(is.na(country), "Offshore/Unmatched", htmltools::htmlEscape(country))
    )
  )
world_map <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")

ggplot() +
  geom_sf(data = world_map, fill = "gray95", color = "gray70", linewidth = 0.2) +
  geom_point(
    data = leaflet_data,
    aes(x = longitude, y = latitude, size = magnitude),
    alpha = 0.45
  ) +
  scale_size_continuous(range = c(1, 5)) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Global Earthquake Locations",
    subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
    x = "Longitude",
    y = "Latitude",
    size = "Magnitude"
  ) +
  theme_minimal()

leaflet(leaflet_data, height = 600) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = ~scales::rescale(magnitude, to = c(3, 9)),
    stroke = FALSE,
    fillOpacity = 0.55,
    popup = ~popup_text,
    label = ~paste0("M", round(magnitude, 1), " - ", place)
  ) %>%
  addLegend(
    position = "bottomright",
    title = "Earthquakes M4.5+",
    labels = c("Circle size increases with magnitude"),
    colors = "gray40"
  )

Together, the static and interactive maps support the geographic interpretation of the project by showing that earthquake activity is spatially clustered rather than evenly distributed across the world. The static map ensures that the location pattern is visible in the published report, while the interactive map provides an additional way to inspect event-level details.

14 14. Statistical Analysis

14.1 14.1 Correlation Analysis

This section evaluates relationships among population size, earthquake count, weighted seismic activity, and the custom exposure score.

correlation_data <- country_exposure %>%
  select(
    population_millions,
    earthquake_count,
    avg_magnitude,
    max_magnitude,
    avg_depth_km,
    weighted_seismic_activity,
    earthquakes_per_million,
    exposure_score_0_100
  )

correlation_results <- tibble(
  comparison = c(
    "Population vs Earthquake Count",
    "Population vs Weighted Seismic Activity",
    "Earthquake Count vs Exposure Score",
    "Weighted Seismic Activity vs Exposure Score",
    "Earthquakes per Million vs Exposure Score"
  ),
  correlation = c(
    cor(correlation_data$population_millions, correlation_data$earthquake_count, use = "complete.obs"),
    cor(correlation_data$population_millions, correlation_data$weighted_seismic_activity, use = "complete.obs"),
    cor(correlation_data$earthquake_count, correlation_data$exposure_score_0_100, use = "complete.obs"),
    cor(correlation_data$weighted_seismic_activity, correlation_data$exposure_score_0_100, use = "complete.obs"),
    cor(correlation_data$earthquakes_per_million, correlation_data$exposure_score_0_100, use = "complete.obs")
  )
)

knitr::kable(
  correlation_results,
  digits = 3,
  caption = "Correlation analysis for earthquake exposure variables"
)

Correlation analysis for earthquake exposure variables
comparison	correlation
Population vs Earthquake Count	0.363
Population vs Weighted Seismic Activity	0.387
Earthquake Count vs Exposure Score	0.986
Weighted Seismic Activity vs Exposure Score	0.995
Earthquakes per Million vs Exposure Score	-0.062

The correlation analysis supports the project by testing whether the exposure score is more closely aligned with raw event frequency, weighted seismic activity, population-adjusted rates, or population size.

14.2 14.2 Regression Model

This regression model evaluates how population size, earthquake frequency, average magnitude, and average depth relate to the custom exposure index. The model uses a log transformation because exposure values are right-skewed.

regression_data <- country_exposure %>%
  filter(
    earthquake_count >= 2,
    population > 0,
    raw_exposure_index > 0
  ) %>%
  mutate(
    log_exposure_index = log1p(raw_exposure_index),
    log_population = log10(population)
  )

exposure_model <- lm(
  log_exposure_index ~ log_population + earthquake_count + avg_magnitude + avg_depth_km,
  data = regression_data
)

model_summary <- broom::tidy(exposure_model)
model_fit <- broom::glance(exposure_model)

knitr::kable(
  model_summary,
  digits = 4,
  caption = "Regression model coefficients for log exposure index"
)

Regression model coefficients for log exposure index
term	estimate	std.error	statistic	p.value
(Intercept)	-5.0216	3.5261	-1.4241	0.1609
log_population	0.2340	0.1233	1.8978	0.0638
earthquake_count	0.0295	0.0027	10.7404	0.0000
avg_magnitude	1.5182	0.6708	2.2633	0.0282
avg_depth_km	0.0013	0.0018	0.7294	0.4693

knitr::kable(
  model_fit %>% select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs),
  digits = 4,
  caption = "Regression model fit statistics"
)

Regression model fit statistics
r.squared	adj.r.squared	sigma	statistic	p.value	df	nobs
0.7845	0.7666	0.6602	43.6883	0	4	53

The regression model provides statistical support for the conclusion by evaluating whether population and earthquake activity variables are associated with the exposure index.

14.3 14.3 Comparing Event Count Ranking vs Exposure Score Ranking

A key goal of the project is to show whether exposure analysis changes interpretation compared with simple earthquake counts. This section compares country rankings by earthquake count and by exposure score.

ranking_comparison <- country_exposure %>%
  mutate(
    count_rank = min_rank(desc(earthquake_count)),
    exposure_rank = min_rank(desc(exposure_score_0_100)),
    rank_change = count_rank - exposure_rank
  ) %>%
  arrange(exposure_rank) %>%
  select(
    country,
    continent,
    population_millions,
    earthquake_count,
    count_rank,
    exposure_score_0_100,
    exposure_rank,
    rank_change
  )

knitr::kable(
  ranking_comparison %>% head(20),
  digits = 2,
  caption = "Comparison of earthquake count ranking and exposure score ranking"
)

Comparison of earthquake count ranking and exposure score ranking
country	continent	population_millions	earthquake_count	count_rank	exposure_score_0_100	exposure_rank	rank_change
China	Asia	1408.97	146	2	100.00	1	1
Indonesia	Asia	283.49	151	1	97.31	2	-1
Ethiopia	Africa	132.06	110	3	64.09	3	0
Papua New Guinea	Oceania	10.58	100	4	51.51	4	0
Chile	South America	19.76	95	5	46.96	5	0
Myanmar	Asia	54.50	64	7	37.96	6	1
Argentina	South America	45.70	87	6	37.58	7	-1
Philippines	Asia	115.84	64	7	36.81	8	-1
Afghanistan	Asia	42.65	50	10	31.28	9	1
Peru	South America	34.22	63	9	28.91	10	-1
Iran	Asia	91.57	50	10	28.47	11	-1
Russia	Europe	143.53	40	13	23.75	12	1
United States of America	North America	340.11	27	15	22.70	13	2
Japan	Asia	123.98	45	12	21.85	14	-2
Turkey	Asia	85.52	33	14	21.01	15	-1
Mexico	North America	130.86	25	17	15.28	16	1
Colombia	South America	52.89	26	16	12.97	17	-1
Ecuador	South America	18.14	24	19	12.76	18	1
Canada	North America	41.29	19	20	10.79	19	1
Pakistan	Asia	251.27	19	20	10.41	20	0

A positive rank change means a country ranks higher under the exposure score than under simple earthquake count. This helps demonstrate how population exposure can change the interpretation of earthquake risk.

15 15. Main Findings

top_count_country <- country_exposure %>%
  slice_max(earthquake_count, n = 1, with_ties = FALSE)

top_exposure_country <- country_exposure %>%
  slice_max(exposure_score_0_100, n = 1, with_ties = FALSE)

top_per_million_country <- country_exposure %>%
  filter(population_millions >= 1) %>%
  slice_max(earthquakes_per_million, n = 1, with_ties = FALSE)

summary_findings <- tibble(
  finding = c(
    "Highest earthquake count",
    "Highest custom exposure score",
    "Highest earthquakes per million people among countries with at least 1 million people"
  ),
  country = c(
    top_count_country$country,
    top_exposure_country$country,
    top_per_million_country$country
  ),
  value = c(
    round(top_count_country$earthquake_count, 2),
    round(top_exposure_country$exposure_score_0_100, 2),
    round(top_per_million_country$earthquakes_per_million, 2)
  )
)

knitr::kable(
  summary_findings,
  caption = "Summary of main country-level findings"
)

Summary of main country-level findings
finding	country	value
Highest earthquake count	Indonesia	151.00
Highest custom exposure score	China	100.00
Highest earthquakes per million people among countries with at least 1 million people	Papua New Guinea	9.45

cat(
  "The country with the highest earthquake count in this study period was",
  top_count_country$country,
  "with",
  top_count_country$earthquake_count,
  "matched earthquake events. The country with the highest custom exposure score was",
  top_exposure_country$country,
  "with an exposure score of",
  round(top_exposure_country$exposure_score_0_100, 2),
  ". The highest earthquakes-per-million rate among countries with at least 1 million people was observed in",
  top_per_million_country$country,
  "with",
  round(top_per_million_country$earthquakes_per_million, 2),
  "earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk."
)

The country with the highest earthquake count in this study period was Indonesia with 151 matched earthquake events. The country with the highest custom exposure score was China with an exposure score of 100 . The highest earthquakes-per-million rate among countries with at least 1 million people was observed in Papua New Guinea with 9.45 earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk.

The main finding is that earthquake risk interpretation changes depending on the metric used. Countries with the largest number of earthquakes are not always the same as countries with the highest population-adjusted rates or the highest exposure score. This supports the motivation for analyzing earthquake risk beyond magnitude and event count alone.

16 16. Challenges Encountered

One major challenge in this project was connecting earthquake events to countries. The USGS data provides latitude and longitude coordinates and a text place description, but the place field is not always a clean country name. Some events occur offshore, near islands, or in oceanic regions. To address this, the project used geospatial country boundaries from Natural Earth and assigned country information using a spatial join. Events that did not fall within a country boundary were treated as offshore or unmatched events and excluded from the country-population join.

A second challenge was the structure of the World Bank population file. The raw CSV contains metadata rows before the table and stores years as separate columns. To solve this, the project skipped the metadata rows and transformed the population data from wide to long format before selecting the latest available year.

17 17. Limitations

This project uses country-level population as a broad exposure measure. This is useful for comparing countries, but it does not show the exact number of people near each earthquake epicenter. A more advanced version of this project could use gridded population density data to estimate people living within a specific radius of each earthquake.

The custom exposure score is also an analytical index rather than a direct measure of damage, deaths, or economic loss. Actual earthquake impact depends on many factors not included here, such as building quality, emergency preparedness, time of day, local soil conditions, tsunami risk, and distance from population centers.

Finally, offshore earthquakes are important for tsunami risk, but many are not assigned to a country through the spatial join. This analysis focuses mainly on country-level seismic exposure and population context.

18 18. Conclusion

This project shows that earthquake risk should be analyzed beyond magnitude alone. By combining USGS earthquake activity with World Bank population data, the analysis provides multiple ways to compare seismic exposure: raw event counts, population-adjusted rates, weighted seismic activity, and a custom exposure score.

The results demonstrate that the interpretation of risk can change depending on whether population exposure is included. A country with many earthquakes may not have the highest population-adjusted exposure, while a highly populated country with fewer but stronger events may rank higher using an exposure-based score. This supports the main conclusion that population exposure adds important context to earthquake risk analysis.

19 19. Reproducibility Notes

This project is designed to be reproducible:

The USGS earthquake data is acquired from a public API/GeoJSON query.
The World Bank population CSV files are stored in the project GitHub repository.
The World Bank data is loaded using raw GitHub URLs rather than local file paths.
All cleaning, transformation, analysis, and visualization steps are performed in this Quarto document.
The project avoids references to local OneDrive, Desktop, or Downloads paths.

20 20. References and Data Source Links

USGS Earthquake Catalog API: https://earthquake.usgs.gov/fdsnws/event/1/
World Bank Population, total indicator (SP.POP.TOTL): https://data.worldbank.org/indicator/SP.POP.TOTL
Natural Earth country boundary data accessed through the rnaturalearth R package
R packages used: tidyverse, jsonlite, lubridate, scales, sf, rnaturalearth, leaflet, broom, knitr, htmltools, and viridis