library(tidyverse)
library(jsonlite)
library(lubridate)
library(scales)
library(sf)
library(rnaturalearth)
library(leaflet)
library(broom)
library(knitr)
library(htmltools)
library(viridis)
options(scipen = 999)Final Project DATA607 - Beyond Magnitude: Analyzing Earthquake Risk Through Seismic Activity and Population Exposure
1 1. Executive Summary
This project analyzes earthquake risk by combining two different types of data sources: earthquake event data from the USGS Earthquake Catalog API/GeoJSON source and country-level population data from the World Bank Population, total (SP.POP.TOTL) CSV dataset. The main idea is that earthquake risk should not be interpreted only through magnitude or event counts. A more useful risk view should also consider how many people may be exposed in affected countries or regions.
The workflow follows a reproducible data science process: acquire data, clean and transform the raw records, create useful features, join earthquake activity with population exposure, conduct exploratory and statistical analysis, build a custom exposure scoring system, and present findings through visualizations and interpretation. The project uses geospatial mapping with leaflet and a custom earthquake exposure score as features beyond the standard class workflow.
2 2. Introduction and Motivation
Earthquakes are often discussed in terms of magnitude, but magnitude alone does not fully describe risk. A strong earthquake in a remote area may expose fewer people than a moderate earthquake near a highly populated region. For this reason, this project combines seismic activity with population data to examine earthquake exposure from a broader data science perspective.
The goal is to answer the following question:
Which countries or regions appear most exposed when earthquake activity is combined with population size, and does this change how earthquake risk is interpreted compared with magnitude or event count alone?
3 3. Research Questions
This analysis is guided by four research questions:
- Which countries or regions experienced the highest number of earthquakes during the study period?
- Which countries or regions had the highest earthquake activity after adjusting for population size?
- How do magnitude, depth, event frequency, and population combine into an exposure-based risk ranking?
- Does the custom exposure score provide a different interpretation than earthquake count alone?
4 4. Data Sources
4.1 4.1 USGS Earthquake Catalog API
The first data source is the USGS Earthquake Catalog API. The project uses a GeoJSON query to retrieve earthquake events with information such as event time, magnitude, depth, latitude, longitude, place description, tsunami flag, and event significance.
For this project, the study period is limited to one full year to keep the scope manageable and reproducible. The analysis uses earthquakes with magnitude 4.5 or greater because these events are more likely to be meaningful for regional exposure analysis and reduce noise from very small events.
4.2 4.2 World Bank Population CSV
The second data source is the World Bank Population, total indicator (SP.POP.TOTL). The raw CSV file is stored in the GitHub repository and loaded through a raw GitHub URL. The World Bank dataset contains yearly population values by country.
4.3 4.3 Why These Sources Work Together
The USGS data captures seismic activity, while the World Bank data captures country-level population exposure. Combining these sources allows the project to compare earthquake activity alone with population-adjusted and exposure-weighted metrics.
5 5. Data Science Workflow
The project follows this workflow:
- Acquire earthquake data from the USGS API and population data from the World Bank CSV.
- Clean event records, timestamps, magnitude values, depth values, and country identifiers.
- Transform earthquake timestamps into dates, classify magnitudes and depths, reshape population data from wide to long format, and select the latest available population year.
- Join earthquake records with country boundaries and population data.
- Analyze event frequency, magnitude distribution, population-adjusted rates, and custom exposure scores.
- Visualize earthquake patterns using bar charts, scatterplots, and an interactive map.
- Conclude which countries or regions appear most exposed and discuss limitations.
6 6. Loading Libraries
7 7. Data Acquisition
7.1 7.1 Load USGS Earthquake Data
This section pulls earthquake event records from the USGS Earthquake Catalog API. The query uses GeoJSON format, a one-year study period, and a minimum magnitude of 4.5.
study_start <- "2025-01-01"
study_end <- "2025-12-31"
minimum_magnitude <- 4.5
usgs_url <- paste0(
"https://earthquake.usgs.gov/fdsnws/event/1/query?",
"format=geojson",
"&starttime=", study_start,
"&endtime=", study_end,
"&minmagnitude=", minimum_magnitude,
"&orderby=time"
)
earthquake_json <- jsonlite::fromJSON(usgs_url, flatten = TRUE)
earthquake_features <- earthquake_json$features
coordinate_tbl <- tibble(
longitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 1),
latitude = purrr::map_dbl(earthquake_features$geometry.coordinates, 2),
depth_km = purrr::map_dbl(earthquake_features$geometry.coordinates, 3)
)
earthquakes_raw <- earthquake_features %>%
as_tibble() %>%
bind_cols(coordinate_tbl) %>%
transmute(
event_id = id,
magnitude = properties.mag,
place = properties.place,
event_time_utc = lubridate::as_datetime(properties.time / 1000, tz = "UTC"),
updated_time_utc = lubridate::as_datetime(properties.updated / 1000, tz = "UTC"),
tsunami = properties.tsunami,
significance = properties.sig,
status = properties.status,
alert = properties.alert,
event_url = properties.url,
longitude,
latitude,
depth_km
)
knitr::kable(
head(earthquakes_raw, 10),
caption = "Sample of raw earthquake events from the USGS API"
)| event_id | magnitude | place | event_time_utc | updated_time_utc | tsunami | significance | status | alert | event_url | longitude | latitude | depth_km |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| us7000rlm7 | 4.8 | 69 km E of Yamada, Japan | 2025-12-30 23:51:36 | 2026-04-01 22:48:43 | 0 | 354 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlm7 | 142.7562 | 39.4639 | 35.000 |
| us7000rr5p | 4.5 | 173 km NNE of Tobelo, Indonesia | 2025-12-30 22:36:09 | 2026-04-01 22:49:07 | 0 | 312 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5p | 128.8654 | 3.0448 | 10.000 |
| us7000rllj | 4.8 | 117 km E of Miyako, Japan | 2025-12-30 22:19:46 | 2026-04-01 22:48:43 | 0 | 354 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rllj | 143.3045 | 39.5984 | 27.735 |
| us7000rlks | 4.5 | 72 km NNE of Gorontalo, Indonesia | 2025-12-30 20:55:06 | 2026-04-01 22:48:42 | 0 | 312 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlks | 123.3440 | 1.1328 | 10.000 |
| us7000rlkk | 5.3 | 26 km SSE of Boyuibe, Bolivia | 2025-12-30 20:26:04 | 2026-04-01 22:48:42 | 0 | 432 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlkk | -63.1477 | -20.6323 | 553.819 |
| us7000rr5f | 4.5 | 75 km ESE of Pondaguitan, Philippines | 2025-12-30 16:45:05 | 2026-04-01 22:49:07 | 0 | 312 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5f | 126.8425 | 6.2037 | 80.527 |
| us7000rlim | 4.6 | 60 km SSW of Lhokseumawe, Indonesia | 2025-12-30 13:43:09 | 2026-04-01 22:48:40 | 0 | 326 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rlim | 96.9010 | 4.6976 | 10.182 |
| us7000rr5c | 4.6 | 25 km NE of Hingatungan, Philippines | 2025-12-30 13:40:00 | 2026-04-01 22:49:06 | 0 | 326 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rr5c | 125.3714 | 10.7262 | 10.000 |
| us7000rli5 | 5.1 | 87 km SSW of Panguna, Papua New Guinea | 2025-12-30 11:22:07 | 2026-04-01 22:48:40 | 0 | 400 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli5 | 155.1950 | -7.0521 | 10.000 |
| us7000rli1 | 4.5 | 251 km E of Levuka, Fiji | 2025-12-30 10:58:04 | 2026-04-01 22:48:40 | 0 | 312 | reviewed | NA | https://earthquake.usgs.gov/earthquakes/eventpage/us7000rli1 | -178.3075 | -18.1477 | 536.568 |
The USGS API data is now in a tidy format with one row per earthquake event. The longitude, latitude, and depth values were extracted from the nested GeoJSON coordinate field.
7.2 7.2 Load World Bank Population Data
The World Bank file has metadata rows before the actual table begins, so the code skips the first four rows. This keeps the raw file unchanged while making the analysis reproducible.
population_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
country_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
indicator_metadata_url <- "https://raw.githubusercontent.com/suffyankhan77/Final_Project-DATA-607/refs/heads/main/data/raw/Metadata_Indicator_API_SP.POP.TOTL_DS2_en_csv_v2_127039.csv"
population_raw <- readr::read_csv(
population_url,
skip = 4,
show_col_types = FALSE
) %>%
select(-starts_with("Unnamed"))
country_metadata <- readr::read_csv(
country_metadata_url,
show_col_types = FALSE
) %>%
select(-starts_with("Unnamed"))
indicator_metadata <- readr::read_csv(
indicator_metadata_url,
show_col_types = FALSE
) %>%
select(-starts_with("Unnamed"))
knitr::kable(
head(population_raw, 8),
caption = "Sample of raw World Bank population data"
)| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | …71 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aruba | ABW | Population, total | SP.POP.TOTL | 54922 | 55578 | 56320 | 57002 | 57619 | 58190 | 58694 | 58990 | 59069 | 59052 | 58950 | 58781 | 58047 | 58299 | 58349 | 58295 | 58368 | 58580 | 58776 | 59191 | 59909 | 60563 | 61276 | 62228 | 62901 | 61728 | 59931 | 59159 | 59331 | 60443 | 62753 | 65896 | 69005 | 73685 | 77595 | 79805 | 83021 | 86301 | 88451 | 89659 | 90588 | 91439 | 92074 | 93128 | 95138 | 97635 | 99405 | 100150 | 100917 | 101604 | 101838 | 102591 | 104110 | 105675 | 106807 | 107906 | 108727 | 108735 | 108908 | 109203 | 108587 | 107700 | 107310 | 107359 | 107995 | NA | NA |
| Africa Eastern and Southern | AFE | Population, total | SP.POP.TOTL | 130075728 | 133534923 | 137171659 | 140945536 | 144904094 | 149033472 | 153281203 | 157704381 | 162329396 | 167088245 | 171984985 | 177022314 | 182126556 | 187524135 | 193186642 | 198914573 | 204802976 | 210680842 | 217074286 | 223974122 | 230792729 | 238043099 | 245822010 | 253644643 | 261458202 | 269450407 | 277621771 | 286067346 | 294498625 | 302939121 | 311748681 | 320442961 | 329082707 | 338324002 | 347441809 | 356580375 | 366138524 | 375646235 | 385505757 | 395750933 | 406156661 | 416807868 | 427820358 | 439173286 | 450928044 | 463076637 | 475606210 | 488580707 | 502070763 | 516003448 | 530308387 | 544737983 | 559609961 | 575202699 | 590968990 | 607123269 | 623369401 | 640058741 | 657801085 | 675950189 | 694446100 | 713090928 | 731821393 | 750491370 | 769280888 | NA | NA |
| Afghanistan | AFG | Population, total | SP.POP.TOTL | 9035043 | 9214083 | 9404406 | 9604487 | 9814318 | 10036008 | 10266395 | 10505959 | 10756922 | 11017409 | 11290128 | 11567667 | 11853696 | 12157999 | 12469127 | 12773954 | 13059851 | 13340756 | 13611441 | 13655567 | 13169311 | 11937581 | 10991378 | 10917982 | 11190221 | 11426852 | 11420074 | 11387818 | 11523298 | 11874088 | 12045660 | 12238879 | 13278974 | 14943172 | 16250794 | 17065836 | 17763266 | 18452091 | 19159996 | 19887785 | 20130327 | 20284307 | 21378117 | 22733049 | 23560654 | 24404567 | 25424094 | 25909852 | 26482622 | 27466101 | 28284089 | 29347708 | 30560034 | 31622704 | 32792523 | 33831764 | 34700612 | 35688935 | 36743039 | 37856121 | 39068979 | 40000412 | 40578842 | 41454761 | 42647492 | NA | NA |
| Africa Western and Central | AFW | Population, total | SP.POP.TOTL | 97630925 | 99706674 | 101854756 | 104089175 | 106388440 | 108772632 | 111246953 | 113795019 | 116444636 | 119203521 | 122086536 | 125072948 | 128176494 | 131449942 | 134911581 | 138569918 | 142337272 | 146258576 | 150402616 | 154721711 | 159166518 | 163762473 | 168585118 | 173255157 | 177880746 | 182811038 | 187889141 | 193104347 | 198485027 | 204062274 | 209566031 | 215178709 | 221191375 | 227246778 | 233360104 | 239801875 | 246415446 | 253207584 | 260297834 | 267506298 | 274968446 | 282780717 | 290841795 | 299142845 | 307725100 | 316588476 | 325663158 | 334984176 | 344586109 | 354343844 | 364358270 | 374790143 | 385360349 | 396030207 | 406992047 | 418127845 | 429454743 | 440882906 | 452195915 | 463365429 | 474569351 | 485920997 | 497387180 | 509398589 | 521764076 | NA | NA |
| Angola | AGO | Population, total | SP.POP.TOTL | 5231654 | 5301583 | 5354310 | 5408320 | 5464187 | 5521981 | 5581386 | 5641807 | 5702699 | 5763685 | 5852788 | 5991102 | 6174262 | 6388528 | 6613367 | 6842947 | 7074664 | 7317829 | 7576734 | 7847207 | 8133872 | 8435607 | 8751648 | 9082983 | 9425917 | 9779120 | 10139450 | 10497858 | 10861291 | 11238562 | 11626360 | 12023529 | 12423712 | 12827135 | 13249764 | 13699778 | 14170973 | 14660413 | 15159370 | 15667235 | 16194869 | 16747208 | 17327699 | 17943712 | 18600423 | 19291161 | 20015279 | 20778561 | 21578655 | 22414773 | 23294825 | 24218352 | 25177394 | 26165620 | 27160769 | 28157798 | 29183070 | 30234839 | 31297155 | 32375632 | 33451132 | 34532429 | 35635029 | 36749906 | 37885849 | NA | NA |
| Albania | ALB | Population, total | SP.POP.TOTL | 1608800 | 1659800 | 1711319 | 1762621 | 1814135 | 1864791 | 1914573 | 1965598 | 2022272 | 2081695 | 2135479 | 2187853 | 2243126 | 2296752 | 2350124 | 2404831 | 2458526 | 2513546 | 2566266 | 2617832 | 2671997 | 2726056 | 2784278 | 2843960 | 2904429 | 2964762 | 3022635 | 3083605 | 3142336 | 3227943 | 3286542 | 3266790 | 3247039 | 3227287 | 3207536 | 3187784 | 3168033 | 3148281 | 3128530 | 3108778 | 3089027 | 3060173 | 3051010 | 3039616 | 3026939 | 3011487 | 2992547 | 2970017 | 2947314 | 2927519 | 2913021 | 2905195 | 2860708 | 2816902 | 2773767 | 2731293 | 2689469 | 2648285 | 2607733 | 2567801 | 2528480 | 2489762 | 2451636 | 2414095 | 2377128 | NA | NA |
| Andorra | AND | Population, total | SP.POP.TOTL | 9510 | 10283 | 11086 | 11915 | 12764 | 13634 | 14626 | 15837 | 17176 | 18555 | 19977 | 21442 | 22957 | 24523 | 26130 | 27773 | 29436 | 31097 | 32735 | 34306 | 35782 | 37169 | 38764 | 40565 | 42273 | 43825 | 45734 | 47653 | 49416 | 50620 | 52597 | 56667 | 60200 | 63272 | 64612 | 63912 | 63984 | 64700 | 65382 | 65710 | 65685 | 65852 | 66506 | 69486 | 74325 | 77421 | 79585 | 81877 | 83495 | 83888 | 80706 | 77783 | 76834 | 75194 | 73737 | 72174 | 72181 | 73763 | 75162 | 76474 | 77380 | 78364 | 79705 | 80856 | 81938 | NA | NA |
| Arab World | ARB | Population, total | SP.POP.TOTL | 91540853 | 93931683 | 96428599 | 99038509 | 101729760 | 104494008 | 107403055 | 110491709 | 113712171 | 117053219 | 120469263 | 123950810 | 127535560 | 131335660 | 135282861 | 139495837 | 144014135 | 148599336 | 153568663 | 159054118 | 164657321 | 170015485 | 175370356 | 180973288 | 186733957 | 192563243 | 198509771 | 204524595 | 210269696 | 215969068 | 223184294 | 228402316 | 234321565 | 240926205 | 247626514 | 253952538 | 260229121 | 266565183 | 273094199 | 279709634 | 286548096 | 293580252 | 300691509 | 307815105 | 315060277 | 322628451 | 330526310 | 338947976 | 347684619 | 356386673 | 363568534 | 372054934 | 381030177 | 390954281 | 400231008 | 410190679 | 419808341 | 428315886 | 435998060 | 444281315 | 453723239 | 460646603 | 471352066 | 482105978 | 492612632 | NA | NA |
8 8. Data Cleaning and Transformation
8.1 8.1 Clean Earthquake Records
This section converts timestamps into useful date fields and creates magnitude and depth categories. These transformations support both descriptive analysis and later exposure scoring.
earthquakes_clean <- earthquakes_raw %>%
filter(
!is.na(magnitude),
!is.na(latitude),
!is.na(longitude),
!is.na(depth_km)
) %>%
mutate(
event_date = as.Date(event_time_utc),
event_year = lubridate::year(event_time_utc),
event_month = lubridate::month(event_time_utc, label = TRUE, abbr = TRUE),
magnitude_category = case_when(
magnitude < 5.0 ~ "Light: 4.5-4.9",
magnitude < 6.0 ~ "Moderate: 5.0-5.9",
magnitude < 7.0 ~ "Strong: 6.0-6.9",
TRUE ~ "Major/Great: 7.0+"
),
magnitude_category = factor(
magnitude_category,
levels = c(
"Light: 4.5-4.9",
"Moderate: 5.0-5.9",
"Strong: 6.0-6.9",
"Major/Great: 7.0+"
)
),
depth_category = case_when(
depth_km < 70 ~ "Shallow: <70 km",
depth_km < 300 ~ "Intermediate: 70-299 km",
TRUE ~ "Deep: 300+ km"
),
depth_category = factor(
depth_category,
levels = c("Shallow: <70 km", "Intermediate: 70-299 km", "Deep: 300+ km")
),
magnitude_weight = case_when(
magnitude < 5.0 ~ 1,
magnitude < 6.0 ~ 2,
magnitude < 7.0 ~ 4,
TRUE ~ 8
),
depth_weight = case_when(
depth_km < 70 ~ 1.20,
depth_km < 300 ~ 1.00,
TRUE ~ 0.80
),
event_weight = magnitude_weight * depth_weight
)
knitr::kable(
earthquakes_clean %>%
select(event_id, event_date, magnitude, magnitude_category, depth_km, depth_category, event_weight, place) %>%
head(10),
caption = "Cleaned earthquake records with magnitude and depth categories"
)| event_id | event_date | magnitude | magnitude_category | depth_km | depth_category | event_weight | place |
|---|---|---|---|---|---|---|---|
| us7000rlm7 | 2025-12-30 | 4.8 | Light: 4.5-4.9 | 35.000 | Shallow: <70 km | 1.2 | 69 km E of Yamada, Japan |
| us7000rr5p | 2025-12-30 | 4.5 | Light: 4.5-4.9 | 10.000 | Shallow: <70 km | 1.2 | 173 km NNE of Tobelo, Indonesia |
| us7000rllj | 2025-12-30 | 4.8 | Light: 4.5-4.9 | 27.735 | Shallow: <70 km | 1.2 | 117 km E of Miyako, Japan |
| us7000rlks | 2025-12-30 | 4.5 | Light: 4.5-4.9 | 10.000 | Shallow: <70 km | 1.2 | 72 km NNE of Gorontalo, Indonesia |
| us7000rlkk | 2025-12-30 | 5.3 | Moderate: 5.0-5.9 | 553.819 | Deep: 300+ km | 1.6 | 26 km SSE of Boyuibe, Bolivia |
| us7000rr5f | 2025-12-30 | 4.5 | Light: 4.5-4.9 | 80.527 | Intermediate: 70-299 km | 1.0 | 75 km ESE of Pondaguitan, Philippines |
| us7000rlim | 2025-12-30 | 4.6 | Light: 4.5-4.9 | 10.182 | Shallow: <70 km | 1.2 | 60 km SSW of Lhokseumawe, Indonesia |
| us7000rr5c | 2025-12-30 | 4.6 | Light: 4.5-4.9 | 10.000 | Shallow: <70 km | 1.2 | 25 km NE of Hingatungan, Philippines |
| us7000rli5 | 2025-12-30 | 5.1 | Moderate: 5.0-5.9 | 10.000 | Shallow: <70 km | 2.4 | 87 km SSW of Panguna, Papua New Guinea |
| us7000rli1 | 2025-12-30 | 4.5 | Light: 4.5-4.9 | 536.568 | Deep: 300+ km | 0.8 | 251 km E of Levuka, Fiji |
8.2 8.2 Convert Earthquake Points to Spatial Data
To connect earthquakes with countries, the latitude and longitude values are converted into spatial point features. Natural Earth country boundaries are then used to assign country and continent information to each event.
world_countries <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") %>%
sf::st_make_valid() %>%
select(
country = admin,
iso_a3,
continent,
geometry
)
earthquake_points <- earthquakes_clean %>%
sf::st_as_sf(
coords = c("longitude", "latitude"),
crs = 4326,
remove = FALSE
)
earthquakes_with_country <- sf::st_join(
earthquake_points,
world_countries,
join = sf::st_within,
left = TRUE
) %>%
sf::st_drop_geometry()
country_match_summary <- earthquakes_with_country %>%
summarise(
total_events = n(),
matched_to_country = sum(!is.na(iso_a3)),
unmatched_or_ocean_events = sum(is.na(iso_a3)),
matched_percent = matched_to_country / total_events
)
knitr::kable(
country_match_summary,
digits = 3,
caption = "Summary of earthquake events matched to country boundaries"
)| total_events | matched_to_country | unmatched_or_ocean_events | matched_percent |
|---|---|---|---|
| 8546 | 1487 | 7059 | 0.174 |
Some earthquake events occur offshore or in oceanic regions, so they may not fall within a country boundary. These events are retained for global descriptive analysis but excluded from country-level population joins.
8.3 8.3 Clean and Reshape World Bank Population Data
The World Bank population dataset is originally wide, with one column for each year. This section transforms it into a tidy long format and selects the latest available population year.
population_long <- population_raw %>%
select(
country_name_wb = `Country Name`,
iso_a3 = `Country Code`,
indicator_name = `Indicator Name`,
indicator_code = `Indicator Code`,
matches("^\\d{4}$")
) %>%
pivot_longer(
cols = matches("^\\d{4}$"),
names_to = "population_year",
values_to = "population"
) %>%
mutate(
population_year = as.integer(population_year),
population = as.numeric(population)
)
latest_population_year <- population_long %>%
filter(!is.na(population)) %>%
summarise(latest_year = max(population_year)) %>%
pull(latest_year)
population_latest <- population_long %>%
filter(population_year == latest_population_year) %>%
filter(!is.na(population)) %>%
select(iso_a3, country_name_wb, population_year, population)
country_metadata_clean <- country_metadata %>%
transmute(
iso_a3 = `Country Code`,
wb_region = Region,
income_group = IncomeGroup,
table_name = TableName
)
population_latest <- population_latest %>%
left_join(country_metadata_clean, by = "iso_a3") %>%
filter(!is.na(wb_region), wb_region != "Aggregates")
knitr::kable(
head(population_latest, 10),
caption = paste("Cleaned World Bank population data for latest available year:", latest_population_year)
)| iso_a3 | country_name_wb | population_year | population | wb_region | income_group | table_name |
|---|---|---|---|---|---|---|
| ABW | Aruba | 2024 | 107995 | Latin America & Caribbean | High income | Aruba |
| AFG | Afghanistan | 2024 | 42647492 | Middle East & North Africa | Low income | Afghanistan |
| AGO | Angola | 2024 | 37885849 | Sub-Saharan Africa | Lower middle income | Angola |
| ALB | Albania | 2024 | 2377128 | Europe & Central Asia | Upper middle income | Albania |
| AND | Andorra | 2024 | 81938 | Europe & Central Asia | High income | Andorra |
| ARE | United Arab Emirates | 2024 | 10986400 | Middle East & North Africa | High income | United Arab Emirates |
| ARG | Argentina | 2024 | 45696159 | Latin America & Caribbean | Upper middle income | Argentina |
| ARM | Armenia | 2024 | 3033500 | Europe & Central Asia | Upper middle income | Armenia |
| ASM | American Samoa | 2024 | 46765 | East Asia & Pacific | High income | American Samoa |
| ATG | Antigua and Barbuda | 2024 | 93772 | Latin America & Caribbean | High income | Antigua and Barbuda |
This transformation satisfies the requirement for data reshaping because the World Bank population data is converted from a wide year-based format into a tidy long format before the latest year is selected.
9 9. Joined Country-Level Analysis Dataset
This section aggregates earthquake activity by country and joins it with population data. Several exposure metrics are created, including earthquakes per million people, weighted activity per million people, and a custom exposure index.
earthquake_country_summary <- earthquakes_with_country %>%
filter(!is.na(iso_a3)) %>%
group_by(iso_a3, country, continent) %>%
summarise(
earthquake_count = n(),
avg_magnitude = mean(magnitude, na.rm = TRUE),
max_magnitude = max(magnitude, na.rm = TRUE),
avg_depth_km = mean(depth_km, na.rm = TRUE),
shallow_event_count = sum(depth_category == "Shallow: <70 km", na.rm = TRUE),
strong_or_major_count = sum(magnitude >= 6.0, na.rm = TRUE),
major_event_count = sum(magnitude >= 7.0, na.rm = TRUE),
weighted_seismic_activity = sum(event_weight, na.rm = TRUE),
.groups = "drop"
)
country_exposure <- earthquake_country_summary %>%
left_join(population_latest, by = "iso_a3") %>%
filter(!is.na(population), population > 0) %>%
mutate(
population_millions = population / 1000000,
earthquakes_per_million = earthquake_count / population_millions,
weighted_activity_per_million = weighted_seismic_activity / population_millions,
population_exposure_factor = log10(population),
raw_exposure_index = weighted_seismic_activity * population_exposure_factor,
exposure_score_0_100 = scales::rescale(raw_exposure_index, to = c(0, 100))
) %>%
arrange(desc(exposure_score_0_100))
knitr::kable(
country_exposure %>%
select(
country, continent, population_year, population_millions, earthquake_count,
max_magnitude, weighted_seismic_activity, earthquakes_per_million,
exposure_score_0_100
) %>%
head(15),
digits = 2,
caption = "Top 15 countries by custom earthquake exposure score"
)| country | continent | population_year | population_millions | earthquake_count | max_magnitude | weighted_seismic_activity | earthquakes_per_million | exposure_score_0_100 |
|---|---|---|---|---|---|---|---|---|
| China | Asia | 2024 | 1408.97 | 146 | 7.1 | 217.4 | 0.10 | 100.00 |
| Indonesia | Asia | 2024 | 283.49 | 151 | 6.4 | 229.0 | 0.53 | 97.31 |
| Ethiopia | Africa | 2024 | 132.06 | 110 | 5.9 | 157.2 | 0.83 | 64.09 |
| Papua New Guinea | Oceania | 2024 | 10.58 | 100 | 6.6 | 146.2 | 9.45 | 51.51 |
| Chile | South America | 2024 | 19.76 | 95 | 6.4 | 128.4 | 4.81 | 46.96 |
| Myanmar | Asia | 2024 | 54.50 | 64 | 7.7 | 98.0 | 1.17 | 37.96 |
| Argentina | South America | 2024 | 45.70 | 87 | 5.9 | 98.0 | 1.90 | 37.58 |
| Philippines | Asia | 2024 | 115.84 | 64 | 6.0 | 91.2 | 0.55 | 36.81 |
| Afghanistan | Asia | 2024 | 42.65 | 50 | 6.2 | 82.0 | 1.17 | 31.28 |
| Peru | South America | 2024 | 34.22 | 63 | 5.9 | 76.8 | 1.84 | 28.91 |
| Iran | Asia | 2024 | 91.57 | 50 | 5.6 | 71.6 | 0.55 | 28.47 |
| Russia | Europe | 2024 | 143.53 | 40 | 5.7 | 58.4 | 0.28 | 23.75 |
| United States of America | North America | 2024 | 340.11 | 27 | 7.0 | 53.4 | 0.08 | 22.70 |
| Japan | Asia | 2024 | 123.98 | 45 | 5.3 | 54.2 | 0.36 | 21.85 |
| Turkey | Asia | 2024 | 85.52 | 33 | 6.1 | 53.2 | 0.39 | 21.01 |
The custom exposure score combines weighted seismic activity with a population exposure factor. The score is scaled from 0 to 100 so countries can be compared more easily. This is not a measure of actual damage or loss; it is an analytical index designed to compare relative exposure based on available data.
10 10. Exploratory Data Analysis
10.1 10.1 Overall Earthquake Summary
overall_summary <- earthquakes_clean %>%
summarise(
study_start = min(event_date),
study_end = max(event_date),
total_events = n(),
average_magnitude = mean(magnitude),
median_magnitude = median(magnitude),
maximum_magnitude = max(magnitude),
average_depth_km = mean(depth_km),
median_depth_km = median(depth_km)
)
knitr::kable(
overall_summary,
digits = 2,
caption = "Overall earthquake summary for the study period"
)| study_start | study_end | total_events | average_magnitude | median_magnitude | maximum_magnitude | average_depth_km | median_depth_km |
|---|---|---|---|---|---|---|---|
| 2025-01-01 | 2025-12-30 | 8546 | 4.81 | 4.7 | 8.8 | 53.27 | 10 |
10.2 10.2 Earthquakes by Magnitude Category
magnitude_category_summary <- earthquakes_clean %>%
count(magnitude_category, name = "event_count") %>%
mutate(percent = event_count / sum(event_count))
knitr::kable(
magnitude_category_summary,
digits = 3,
caption = "Earthquake count by magnitude category"
)| magnitude_category | event_count | percent |
|---|---|---|
| Light: 4.5-4.9 | 6420 | 0.751 |
| Moderate: 5.0-5.9 | 1982 | 0.232 |
| Strong: 6.0-6.9 | 128 | 0.015 |
| Major/Great: 7.0+ | 16 | 0.002 |
11 11. Visualizations Describing the Data
11.1 11.1 Magnitude Distribution
ggplot(earthquakes_clean, aes(x = magnitude)) +
geom_histogram(binwidth = 0.1, boundary = 0, color = "white") +
labs(
title = "Distribution of Earthquake Magnitudes",
subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
x = "Magnitude",
y = "Number of Earthquakes"
) +
theme_minimal()This plot describes the overall magnitude distribution and helps validate that most events in the study period are in the lower end of the selected magnitude range.
11.2 11.2 Earthquake Count by Magnitude Category
ggplot(magnitude_category_summary, aes(x = magnitude_category, y = event_count)) +
geom_col() +
geom_text(aes(label = event_count), vjust = -0.3, size = 3.5) +
labs(
title = "Earthquake Events by Magnitude Category",
x = "Magnitude Category",
y = "Number of Earthquakes"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 25, hjust = 1))11.3 11.3 Magnitude and Depth Relationship
ggplot(earthquakes_clean, aes(x = depth_km, y = magnitude)) +
geom_point(alpha = 0.45) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Relationship Between Earthquake Depth and Magnitude",
x = "Depth (km)",
y = "Magnitude"
) +
theme_minimal()This scatterplot explores whether deeper earthquakes tend to have different magnitudes than shallower events. The trend line provides a simple visual check before formal statistical analysis.
11.4 11.4 Top Countries by Earthquake Count
top_countries_by_count <- country_exposure %>%
slice_max(earthquake_count, n = 15) %>%
mutate(country = fct_reorder(country, earthquake_count))
ggplot(top_countries_by_count, aes(x = earthquake_count, y = country)) +
geom_col() +
labs(
title = "Top Countries by Earthquake Event Count",
subtitle = "Country-level events matched using earthquake coordinates and country boundaries",
x = "Number of Earthquakes",
y = "Country"
) +
theme_minimal()This plot shows where earthquakes occurred most often, but it does not yet account for population exposure.
12 12. Visualizations Supporting the Conclusion
12.1 12.1 Top Countries by Custom Exposure Score
top_countries_by_exposure <- country_exposure %>%
slice_max(exposure_score_0_100, n = 15) %>%
mutate(country = fct_reorder(country, exposure_score_0_100))
ggplot(top_countries_by_exposure, aes(x = exposure_score_0_100, y = country)) +
geom_col() +
labs(
title = "Top Countries by Earthquake Exposure Score",
subtitle = "Score combines weighted seismic activity with a population exposure factor",
x = "Exposure Score (0-100)",
y = "Country"
) +
theme_minimal()This graphic supports the main conclusion because it ranks countries using both seismic activity and population exposure, not event count alone.
12.2 12.2 Population and Weighted Seismic Activity
country_exposure %>%
filter(earthquake_count >= 3) %>%
ggplot(aes(x = population_millions, y = weighted_seismic_activity)) +
geom_point(aes(size = exposure_score_0_100), alpha = 0.65) +
geom_smooth(method = "lm", se = FALSE) +
scale_x_log10(labels = scales::comma) +
labs(
title = "Population Size and Weighted Seismic Activity",
subtitle = "Countries with more events and larger populations can rank higher in exposure-based analysis",
x = "Population in Millions (log scale)",
y = "Weighted Seismic Activity",
size = "Exposure Score"
) +
theme_minimal()This scatterplot compares population size with weighted seismic activity. It helps show why exposure analysis can differ from a simple ranking by event count.
12.3 12.3 Earthquakes per Million People
top_per_million <- country_exposure %>%
filter(population_millions >= 1) %>%
slice_max(earthquakes_per_million, n = 15) %>%
mutate(country = fct_reorder(country, earthquakes_per_million))
ggplot(top_per_million, aes(x = earthquakes_per_million, y = country)) +
geom_col() +
labs(
title = "Top Countries by Earthquakes per Million People",
subtitle = "Population-adjusted rate using countries with at least 1 million people",
x = "Earthquakes per Million People",
y = "Country"
) +
theme_minimal()This chart provides a population-adjusted view of earthquake activity. It is useful because countries with smaller populations can appear very different after adjustment.
13 13. Feature Beyond Class: Static and Interactive Geospatial Mapping
This project includes both a static world map and an interactive leaflet map as geospatial features beyond the standard course workflow. The static map provides a reliable visualization of global earthquake locations in the rendered report, while the interactive leaflet map allows the user to explore individual earthquake events by clicking on markers. Each marker includes event-level information such as magnitude, depth, date, place, and matched country when available.
leaflet_data <- earthquakes_with_country %>%
filter(!is.na(latitude), !is.na(longitude)) %>%
mutate(
popup_text = paste0(
"<strong>Magnitude:</strong> ", round(magnitude, 1), "<br>",
"<strong>Depth:</strong> ", round(depth_km, 1), " km<br>",
"<strong>Date:</strong> ", event_date, "<br>",
"<strong>Place:</strong> ", htmltools::htmlEscape(place), "<br>",
"<strong>Country:</strong> ",
ifelse(is.na(country), "Offshore/Unmatched", htmltools::htmlEscape(country))
)
)
world_map <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
ggplot() +
geom_sf(data = world_map, fill = "gray95", color = "gray70", linewidth = 0.2) +
geom_point(
data = leaflet_data,
aes(x = longitude, y = latitude, size = magnitude),
alpha = 0.45
) +
scale_size_continuous(range = c(1, 5)) +
coord_sf(expand = FALSE) +
labs(
title = "Global Earthquake Locations",
subtitle = paste0("USGS earthquakes from ", study_start, " to ", study_end, ", magnitude ", minimum_magnitude, "+"),
x = "Longitude",
y = "Latitude",
size = "Magnitude"
) +
theme_minimal()leaflet(leaflet_data, height = 600) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addCircleMarkers(
lng = ~longitude,
lat = ~latitude,
radius = ~scales::rescale(magnitude, to = c(3, 9)),
stroke = FALSE,
fillOpacity = 0.55,
popup = ~popup_text,
label = ~paste0("M", round(magnitude, 1), " - ", place)
) %>%
addLegend(
position = "bottomright",
title = "Earthquakes M4.5+",
labels = c("Circle size increases with magnitude"),
colors = "gray40"
)Together, the static and interactive maps support the geographic interpretation of the project by showing that earthquake activity is spatially clustered rather than evenly distributed across the world. The static map ensures that the location pattern is visible in the published report, while the interactive map provides an additional way to inspect event-level details.
14 14. Statistical Analysis
14.1 14.1 Correlation Analysis
This section evaluates relationships among population size, earthquake count, weighted seismic activity, and the custom exposure score.
correlation_data <- country_exposure %>%
select(
population_millions,
earthquake_count,
avg_magnitude,
max_magnitude,
avg_depth_km,
weighted_seismic_activity,
earthquakes_per_million,
exposure_score_0_100
)
correlation_results <- tibble(
comparison = c(
"Population vs Earthquake Count",
"Population vs Weighted Seismic Activity",
"Earthquake Count vs Exposure Score",
"Weighted Seismic Activity vs Exposure Score",
"Earthquakes per Million vs Exposure Score"
),
correlation = c(
cor(correlation_data$population_millions, correlation_data$earthquake_count, use = "complete.obs"),
cor(correlation_data$population_millions, correlation_data$weighted_seismic_activity, use = "complete.obs"),
cor(correlation_data$earthquake_count, correlation_data$exposure_score_0_100, use = "complete.obs"),
cor(correlation_data$weighted_seismic_activity, correlation_data$exposure_score_0_100, use = "complete.obs"),
cor(correlation_data$earthquakes_per_million, correlation_data$exposure_score_0_100, use = "complete.obs")
)
)
knitr::kable(
correlation_results,
digits = 3,
caption = "Correlation analysis for earthquake exposure variables"
)| comparison | correlation |
|---|---|
| Population vs Earthquake Count | 0.363 |
| Population vs Weighted Seismic Activity | 0.387 |
| Earthquake Count vs Exposure Score | 0.986 |
| Weighted Seismic Activity vs Exposure Score | 0.995 |
| Earthquakes per Million vs Exposure Score | -0.062 |
The correlation analysis supports the project by testing whether the exposure score is more closely aligned with raw event frequency, weighted seismic activity, population-adjusted rates, or population size.
14.2 14.2 Regression Model
This regression model evaluates how population size, earthquake frequency, average magnitude, and average depth relate to the custom exposure index. The model uses a log transformation because exposure values are right-skewed.
regression_data <- country_exposure %>%
filter(
earthquake_count >= 2,
population > 0,
raw_exposure_index > 0
) %>%
mutate(
log_exposure_index = log1p(raw_exposure_index),
log_population = log10(population)
)
exposure_model <- lm(
log_exposure_index ~ log_population + earthquake_count + avg_magnitude + avg_depth_km,
data = regression_data
)
model_summary <- broom::tidy(exposure_model)
model_fit <- broom::glance(exposure_model)
knitr::kable(
model_summary,
digits = 4,
caption = "Regression model coefficients for log exposure index"
)| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5.0216 | 3.5261 | -1.4241 | 0.1609 |
| log_population | 0.2340 | 0.1233 | 1.8978 | 0.0638 |
| earthquake_count | 0.0295 | 0.0027 | 10.7404 | 0.0000 |
| avg_magnitude | 1.5182 | 0.6708 | 2.2633 | 0.0282 |
| avg_depth_km | 0.0013 | 0.0018 | 0.7294 | 0.4693 |
knitr::kable(
model_fit %>% select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs),
digits = 4,
caption = "Regression model fit statistics"
)| r.squared | adj.r.squared | sigma | statistic | p.value | df | nobs |
|---|---|---|---|---|---|---|
| 0.7845 | 0.7666 | 0.6602 | 43.6883 | 0 | 4 | 53 |
The regression model provides statistical support for the conclusion by evaluating whether population and earthquake activity variables are associated with the exposure index.
14.3 14.3 Comparing Event Count Ranking vs Exposure Score Ranking
A key goal of the project is to show whether exposure analysis changes interpretation compared with simple earthquake counts. This section compares country rankings by earthquake count and by exposure score.
ranking_comparison <- country_exposure %>%
mutate(
count_rank = min_rank(desc(earthquake_count)),
exposure_rank = min_rank(desc(exposure_score_0_100)),
rank_change = count_rank - exposure_rank
) %>%
arrange(exposure_rank) %>%
select(
country,
continent,
population_millions,
earthquake_count,
count_rank,
exposure_score_0_100,
exposure_rank,
rank_change
)
knitr::kable(
ranking_comparison %>% head(20),
digits = 2,
caption = "Comparison of earthquake count ranking and exposure score ranking"
)| country | continent | population_millions | earthquake_count | count_rank | exposure_score_0_100 | exposure_rank | rank_change |
|---|---|---|---|---|---|---|---|
| China | Asia | 1408.97 | 146 | 2 | 100.00 | 1 | 1 |
| Indonesia | Asia | 283.49 | 151 | 1 | 97.31 | 2 | -1 |
| Ethiopia | Africa | 132.06 | 110 | 3 | 64.09 | 3 | 0 |
| Papua New Guinea | Oceania | 10.58 | 100 | 4 | 51.51 | 4 | 0 |
| Chile | South America | 19.76 | 95 | 5 | 46.96 | 5 | 0 |
| Myanmar | Asia | 54.50 | 64 | 7 | 37.96 | 6 | 1 |
| Argentina | South America | 45.70 | 87 | 6 | 37.58 | 7 | -1 |
| Philippines | Asia | 115.84 | 64 | 7 | 36.81 | 8 | -1 |
| Afghanistan | Asia | 42.65 | 50 | 10 | 31.28 | 9 | 1 |
| Peru | South America | 34.22 | 63 | 9 | 28.91 | 10 | -1 |
| Iran | Asia | 91.57 | 50 | 10 | 28.47 | 11 | -1 |
| Russia | Europe | 143.53 | 40 | 13 | 23.75 | 12 | 1 |
| United States of America | North America | 340.11 | 27 | 15 | 22.70 | 13 | 2 |
| Japan | Asia | 123.98 | 45 | 12 | 21.85 | 14 | -2 |
| Turkey | Asia | 85.52 | 33 | 14 | 21.01 | 15 | -1 |
| Mexico | North America | 130.86 | 25 | 17 | 15.28 | 16 | 1 |
| Colombia | South America | 52.89 | 26 | 16 | 12.97 | 17 | -1 |
| Ecuador | South America | 18.14 | 24 | 19 | 12.76 | 18 | 1 |
| Canada | North America | 41.29 | 19 | 20 | 10.79 | 19 | 1 |
| Pakistan | Asia | 251.27 | 19 | 20 | 10.41 | 20 | 0 |
A positive rank change means a country ranks higher under the exposure score than under simple earthquake count. This helps demonstrate how population exposure can change the interpretation of earthquake risk.
15 15. Main Findings
top_count_country <- country_exposure %>%
slice_max(earthquake_count, n = 1, with_ties = FALSE)
top_exposure_country <- country_exposure %>%
slice_max(exposure_score_0_100, n = 1, with_ties = FALSE)
top_per_million_country <- country_exposure %>%
filter(population_millions >= 1) %>%
slice_max(earthquakes_per_million, n = 1, with_ties = FALSE)
summary_findings <- tibble(
finding = c(
"Highest earthquake count",
"Highest custom exposure score",
"Highest earthquakes per million people among countries with at least 1 million people"
),
country = c(
top_count_country$country,
top_exposure_country$country,
top_per_million_country$country
),
value = c(
round(top_count_country$earthquake_count, 2),
round(top_exposure_country$exposure_score_0_100, 2),
round(top_per_million_country$earthquakes_per_million, 2)
)
)
knitr::kable(
summary_findings,
caption = "Summary of main country-level findings"
)| finding | country | value |
|---|---|---|
| Highest earthquake count | Indonesia | 151.00 |
| Highest custom exposure score | China | 100.00 |
| Highest earthquakes per million people among countries with at least 1 million people | Papua New Guinea | 9.45 |
cat(
"The country with the highest earthquake count in this study period was",
top_count_country$country,
"with",
top_count_country$earthquake_count,
"matched earthquake events. The country with the highest custom exposure score was",
top_exposure_country$country,
"with an exposure score of",
round(top_exposure_country$exposure_score_0_100, 2),
". The highest earthquakes-per-million rate among countries with at least 1 million people was observed in",
top_per_million_country$country,
"with",
round(top_per_million_country$earthquakes_per_million, 2),
"earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk."
)The country with the highest earthquake count in this study period was Indonesia with 151 matched earthquake events. The country with the highest custom exposure score was China with an exposure score of 100 . The highest earthquakes-per-million rate among countries with at least 1 million people was observed in Papua New Guinea with 9.45 earthquakes per million people. These differences show why population-adjusted and exposure-weighted analysis can change the interpretation of earthquake risk.
The main finding is that earthquake risk interpretation changes depending on the metric used. Countries with the largest number of earthquakes are not always the same as countries with the highest population-adjusted rates or the highest exposure score. This supports the motivation for analyzing earthquake risk beyond magnitude and event count alone.
16 16. Challenges Encountered
One major challenge in this project was connecting earthquake events to countries. The USGS data provides latitude and longitude coordinates and a text place description, but the place field is not always a clean country name. Some events occur offshore, near islands, or in oceanic regions. To address this, the project used geospatial country boundaries from Natural Earth and assigned country information using a spatial join. Events that did not fall within a country boundary were treated as offshore or unmatched events and excluded from the country-population join.
A second challenge was the structure of the World Bank population file. The raw CSV contains metadata rows before the table and stores years as separate columns. To solve this, the project skipped the metadata rows and transformed the population data from wide to long format before selecting the latest available year.
17 17. Limitations
This project uses country-level population as a broad exposure measure. This is useful for comparing countries, but it does not show the exact number of people near each earthquake epicenter. A more advanced version of this project could use gridded population density data to estimate people living within a specific radius of each earthquake.
The custom exposure score is also an analytical index rather than a direct measure of damage, deaths, or economic loss. Actual earthquake impact depends on many factors not included here, such as building quality, emergency preparedness, time of day, local soil conditions, tsunami risk, and distance from population centers.
Finally, offshore earthquakes are important for tsunami risk, but many are not assigned to a country through the spatial join. This analysis focuses mainly on country-level seismic exposure and population context.
18 18. Conclusion
This project shows that earthquake risk should be analyzed beyond magnitude alone. By combining USGS earthquake activity with World Bank population data, the analysis provides multiple ways to compare seismic exposure: raw event counts, population-adjusted rates, weighted seismic activity, and a custom exposure score.
The results demonstrate that the interpretation of risk can change depending on whether population exposure is included. A country with many earthquakes may not have the highest population-adjusted exposure, while a highly populated country with fewer but stronger events may rank higher using an exposure-based score. This supports the main conclusion that population exposure adds important context to earthquake risk analysis.
19 19. Reproducibility Notes
This project is designed to be reproducible:
- The USGS earthquake data is acquired from a public API/GeoJSON query.
- The World Bank population CSV files are stored in the project GitHub repository.
- The World Bank data is loaded using raw GitHub URLs rather than local file paths.
- All cleaning, transformation, analysis, and visualization steps are performed in this Quarto document.
- The project avoids references to local OneDrive, Desktop, or Downloads paths.
20 20. References and Data Source Links
- USGS Earthquake Catalog API:
https://earthquake.usgs.gov/fdsnws/event/1/ - World Bank Population, total indicator (
SP.POP.TOTL):https://data.worldbank.org/indicator/SP.POP.TOTL - Natural Earth country boundary data accessed through the
rnaturalearthR package - R packages used:
tidyverse,jsonlite,lubridate,scales,sf,rnaturalearth,leaflet,broom,knitr,htmltools, andviridis