Import Dataset
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
library(ggplot2)
# Load the dataset
earthquake <- read_csv("earthquakes.csv")
## Rows: 123 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): month, area, region
## dbl (4): year, day, richter, deaths
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Introduction
What are the strongest and most dangerous earthquakes in recorded history? Earthquakes are natural disasters that have claimed countless lives. The dataset I’ll be using for this project is Earthquakes from OpenIntro.org, which contains a selection of notable earthquakes worldwide from 1900 to 1999.
To answer this question, I’ll focus on two variables: richter, the magnitude of each earthquake on the Richter scale, and deaths, the number of deaths it caused. The Richter scale is a logarithmic measure of earthquake strength: each one-unit increase in magnitude corresponds to a tenfold increase in the amplitude of the recorded seismic waves. I’ll also create a new variable, intensity, that groups earthquakes into categories based on their Richter magnitude.
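Because the Richter scale is logarithmic, differences in magnitude become multiplicative differences in recorded amplitude. The short sketch below makes that concrete. `amplitude_ratio()` is a hypothetical helper written only for illustration, and it assumes the simple textbook relation (amplitude ratio = 10 raised to the magnitude difference, for readings taken at the same distance).

```r
# Hypothetical helper: under the textbook Richter relation, a difference of
# d magnitude units corresponds to a 10^d ratio in recorded wave amplitude
amplitude_ratio <- function(m_high, m_low) {
  10^(m_high - m_low)
}

amplitude_ratio(8, 7)      # one unit apart: 10 times the amplitude
amplitude_ratio(9.5, 7.9)  # about 40 times the amplitude
```

Under this simple relation, the 9.5 earthquake in Chile (1960) produced roughly 40 times the wave amplitude of the 7.9 Chimbote earthquake (1970), even though the Chimbote earthquake was far deadlier.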
Data Analysis
For my data analysis, I will use exploratory data analysis to identify which earthquakes in history were the strongest and the deadliest. First, I’ll clean the data by removing rows with missing values (NAs) in the deaths and richter variables. Next, I’ll create a new column, intensity, that categorizes each earthquake by its Richter magnitude. To answer my question, I’ll use functions such as filter(), mutate(), summary(), arrange(), group_by(), and summarize(). Finally, I’ll use a scatter plot to visualize the relationship between magnitude and deaths, with points colored by intensity.
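Before dropping rows, it can help to check how many values are missing in each column. The sketch below uses a small hypothetical tibble (not the real data) to show that check, and to confirm that `tidyr::drop_na()` on the same two columns keeps the same rows as two chained `filter(!is.na(...))` calls.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values (not from the dataset)
toy <- tibble(
  richter = c(9.5, NA, 7.9, 8.1),
  deaths  = c(1655, 200, NA, 12000)
)

# Number of missing values in each column
colSums(is.na(toy))

# Two filter() calls, one per column
filtered <- toy |>
  filter(!is.na(deaths)) |>
  filter(!is.na(richter))

# drop_na() restricted to the same two columns
dropped <- toy |>
  drop_na(deaths, richter)

filtered
dropped
```

Both approaches keep only the rows where neither value is missing; `drop_na()` is simply shorter when several columns are involved.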
# Drop rows with a missing death toll or magnitude
earthquakes_no_na <- earthquake |>
  filter(!is.na(deaths)) |>
  filter(!is.na(richter))
# Classify each earthquake by its Richter magnitude
earthquakes_no_na <- earthquakes_no_na |>
  mutate(intensity = case_when(
    richter < 5 ~ "Low",
    richter >= 5 & richter < 6 ~ "Moderate",
    richter >= 6 & richter < 7 ~ "Strong",
    richter >= 7 & richter < 8 ~ "Major",
    richter >= 8 ~ "Great"
  ))
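As an alternative to `case_when()`, base R’s `cut()` can produce the same bins. One possible advantage is that it returns a factor whose levels follow magnitude order, whereas a character column is sorted alphabetically (Great, Major, Moderate, Strong) in grouped summaries and plot legends. The magnitudes below are illustrative; 4.9 is hypothetical, since no earthquake in this data falls below 5.

```r
library(dplyr)

# Illustrative magnitudes (4.9 is hypothetical)
magnitudes <- c(9.5, 8.0, 7.9, 7.0, 6.8, 5.5, 4.9)

# Same bins as above: [-Inf, 5), [5, 6), [6, 7), [7, 8), [8, Inf)
# right = FALSE closes each interval on the left, so 7.0 falls in "Major"
intensity_cut <- cut(
  magnitudes,
  breaks = c(-Inf, 5, 6, 7, 8, Inf),
  labels = c("Low", "Moderate", "Strong", "Major", "Great"),
  right = FALSE
)

# case_when() stops at the first matching condition, so each lower bound
# is already implied by the conditions that come before it
intensity_cw <- case_when(
  magnitudes < 5 ~ "Low",
  magnitudes < 6 ~ "Moderate",
  magnitudes < 7 ~ "Strong",
  magnitudes < 8 ~ "Major",
  magnitudes >= 8 ~ "Great"
)

identical(as.character(intensity_cut), intensity_cw)  # TRUE
levels(intensity_cut)  # levels in magnitude order, not alphabetical
```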
summary(earthquakes_no_na)
## year month day richter
## Min. :1902 Length:121 Min. : 1.0 Min. :5.500
## 1st Qu.:1933 Class :character 1st Qu.:10.0 1st Qu.:6.800
## Median :1960 Mode :character Median :17.0 Median :7.200
## Mean :1956 Mean :16.9 Mean :7.143
## 3rd Qu.:1980 3rd Qu.:25.0 3rd Qu.:7.500
## Max. :1999 Max. :31.0 Max. :9.500
## area region deaths intensity
## Length:121 Length:121 Min. : 3 Length:121
## Class :character Class :character 1st Qu.: 1250 Class :character
## Mode :character Mode :character Median : 2790 Mode :character
## Mean : 17683
## 3rd Qu.: 8000
## Max. :700000
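`summary()` reports only Length, Class, and Mode for character columns such as `intensity`, so it does not show how many earthquakes fall into each category (although the minimum magnitude of 5.5 does indicate that no earthquake is classified as “Low”). A minimal sketch with a hypothetical toy column shows two ways to get the counts: `count()`, and converting the column to a factor before calling `summary()`.

```r
library(dplyr)

# Hypothetical toy column (illustrative values, not the real data)
toy <- tibble(intensity = c("Great", "Major", "Major", "Strong", "Moderate", "Major"))

# summary() on a character vector: only Length, Class, and Mode
summary(toy$intensity)

# count() tabulates each category
intensity_counts <- toy |> count(intensity)
intensity_counts

# A factor makes summary() report counts per level
summary(factor(toy$intensity))
```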
# Rank earthquakes from strongest to weakest magnitude
earthquakes_byMagnitude <- earthquakes_no_na |>
  arrange(desc(richter))
earthquakes_byMagnitude
## # A tibble: 121 × 8
## year month day richter area region deaths intensity
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 1960 May 21 9.5 South Chile 1655 Great
## 2 1964 March 27 9.2 Alaska United … 131 Great
## 3 1906 January 31 8.8 Esmeraldas (off coast) Ecuador 1000 Great
## 4 1906 August 17 8.6 Valparaiso Chile 3882 Great
## 5 1950 August 15 8.6 Assam India 1526 Great
## 6 1933 March 2 8.4 Sanriku Japan 2990 Great
## 7 1907 October 21 8.1 Central Asia 12000 Great
## 8 1934 January 15 8.1 Bihar India-N… 10700 Great
## 9 1946 December 29 8.1 Honshu Japan 1362 Great
## 10 1931 August 10 8 Xinjiang China 10000 Great
## # ℹ 111 more rows
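`arrange(desc(richter))` sorts every row. When only the top few events are needed, `dplyr::slice_max()` is a compact alternative, with one subtlety: by default it keeps ties, so it can return more than `n` rows. The sketch below copies the five strongest rows printed above (the region column is left out because one of its values is truncated in the printout).

```r
library(dplyr)

# The five strongest events, copied from the magnitude ranking above
strongest <- tibble(
  year    = c(1960, 1964, 1906, 1906, 1950),
  area    = c("South", "Alaska", "Esmeraldas (off coast)", "Valparaiso", "Assam"),
  richter = c(9.5, 9.2, 8.8, 8.6, 8.6),
  deaths  = c(1655, 131, 1000, 3882, 1526)
)

# Top 4 by magnitude; with_ties = TRUE (the default) keeps both 8.6 events
top4 <- strongest |> slice_max(richter, n = 4)
nrow(top4)  # 5 rows, because of the tie at 8.6

# with_ties = FALSE returns exactly n rows
top4_strict <- strongest |> slice_max(richter, n = 4, with_ties = FALSE)
nrow(top4_strict)  # 4
```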
# Rank earthquakes from most to fewest deaths
earthquakes_byDeaths <- earthquakes_no_na |>
  arrange(desc(deaths))
earthquakes_byDeaths
## # A tibble: 121 × 8
## year month day richter area region deaths intensity
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 1970 May 31 7.9 Chimbote Peru 700000 Major
## 2 1976 July 28 7.5 Tangshan China 255000 Major
## 3 1920 December 16 7.8 Gansu China 200000 Major
## 4 1923 September 1 7.9 Yokohama Japan 142800 Major
## 5 1948 October 5 7.3 Ashgabat Turkmenistan 110000 Major
## 6 1908 December 28 7.2 Messina Italy 72000 Major
## 7 1927 May 22 7.6 Gansu China 40900 Major
## 8 1990 June 20 7.4 West Iran 40000 Major
## 9 1939 December 26 7.8 Erzincan Turkey 32700 Major
## 10 1915 January 13 7 Avezzano Italy 32610 Major
## # ℹ 111 more rows
# Average death toll within each intensity category
avg_deaths <- earthquakes_no_na |>
  group_by(intensity) |>
  summarize(mean_deaths = mean(deaths))
avg_deaths
## # A tibble: 4 × 2
## intensity mean_deaths
## <chr> <dbl>
## 1 Great 4896.
## 2 Major 29303.
## 3 Moderate 2958.
## 4 Strong 3538.
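Mean death tolls are sensitive to a handful of extreme events, such as the 700,000 deaths recorded for Chimbote, Peru. The sketch below adds the group size and the median alongside the mean. It uses only the death tolls from the two rankings printed above (the ten strongest events, all “Great”, and the ten deadliest, all “Major”), so its numbers illustrate the idea rather than reproduce full-data results; on the real data, the same `summarize()` call would run on `earthquakes_no_na`.

```r
library(dplyr)

# Death tolls from the two printed top-10 rankings (subset only)
subset_rows <- tibble(
  intensity = rep(c("Major", "Great"), each = 10),
  deaths = c(
    700000, 255000, 200000, 142800, 110000, 72000, 40900, 40000, 32700, 32610,
    1655, 131, 1000, 3882, 1526, 2990, 12000, 10700, 1362, 10000
  )
)

# Group size, mean, and median per intensity category
robust_summary <- subset_rows |>
  group_by(intensity) |>
  summarize(
    n = n(),
    mean_deaths = mean(deaths),
    median_deaths = median(deaths)
  )

robust_summary
```

On this subset, the Major median is 91,000 against a mean of 162,601, and the Great median is 2,322.5 against a mean of 4,524.6. In both groups the mean sits well above the typical event.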
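ggplot(earthquakes_no_na, aes(x = richter, y = deaths, color = intensity)) +
  geom_point(size = 3) +
  # Named values tie each color to its category; breaks sets the legend order.
  # (Unnamed values and labels are matched to the alphabetical levels, which
  # would label Moderate points as "Strong" and vice versa.)
  scale_color_manual(
    values = c(Great = "#FF6A6A", Major = "#FFA07A",
               Strong = "#FFB90F", Moderate = "#C1FFC1"),
    breaks = c("Great", "Major", "Strong", "Moderate")
  ) +
  labs(title = "Scatterplot of Earthquake Magnitude vs. Deaths (1900-1999)",
       x = "Magnitude (Richter Scale)", y = "Deaths") +
  theme_minimal()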
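Because a few death tolls are orders of magnitude larger than the rest, most points in the scatter plot are squeezed near zero. One option, sketched below on the ten deadliest events printed earlier (a subset, not the full data), is a log10 y-axis. A rank-based Spearman correlation is also included as one way to quantify the relationship between magnitude and deaths, since it is less sensitive to extreme values than Pearson’s correlation. Neither is part of the original analysis.

```r
library(ggplot2)

# Magnitude and deaths for the ten deadliest events printed above (subset only)
deadliest <- data.frame(
  richter = c(7.9, 7.5, 7.8, 7.9, 7.3, 7.2, 7.6, 7.4, 7.8, 7.0),
  deaths  = c(700000, 255000, 200000, 142800, 110000,
              72000, 40900, 40000, 32700, 32610)
)

# Spearman correlation works on ranks (ties receive average ranks)
rho <- cor(deadliest$richter, deadliest$deaths, method = "spearman")
rho

# A log10 y-axis keeps the largest tolls from flattening the other points
p <- ggplot(deadliest, aes(x = richter, y = deaths)) +
  geom_point(size = 3) +
  scale_y_log10() +
  labs(x = "Magnitude (Richter Scale)", y = "Deaths (log10 scale)") +
  theme_minimal()
p
```

To apply the same idea to the full analysis, replace `deadliest` with `earthquakes_no_na` and keep `color = intensity` in `aes()`.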
Conclusion and Future Directions
From my findings, the deadliest earthquakes fell in the “Major” intensity range (magnitude 7.0 to 7.9): this group averaged about 29,303 deaths per event, compared with about 4,896 for the “Great” group (magnitude 8.0 and above). In other words, a stronger earthquake did not necessarily cause more deaths. The Major average is also pulled up by a few catastrophic events, such as Chimbote, Peru (1970, 700,000 deaths) and Tangshan, China (1976, 255,000 deaths), so a median would give a more typical picture. One possible explanation for the gap is location: areas hit by the very strongest earthquakes may be less densely populated, although this dataset does not include population data to test that idea. Future research could extend the data to include earthquakes from the 2000s to the present day. Another direction is to examine how preparedness, such as infrastructure built to withstand intense and frequent earthquakes, could reduce fatalities.
References
“Earthquakes.” Data Sets, www.openintro.org/data/index.php?data=earthquakes. Accessed 17 Oct. 2025.