Import dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded

# Load the dataset
earthquake <- read_csv("earthquakes.csv")
## Rows: 123 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): month, area, region
## dbl (4): year, day, richter, deaths
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Introduction

What are the strongest and most dangerous earthquakes in recorded history? Earthquakes are natural disasters that have claimed countless lives. The dataset I’ll be using for this project is Earthquakes from OpenIntro.org, which contains a selection of notable earthquakes worldwide from 1900 to 1999.

To answer this question, I’ll focus on two variables: richter, the magnitude of each earthquake on the Richter scale, and deaths, the number of people killed. The Richter scale measures an earthquake’s magnitude on a logarithmic scale, so each whole-number increase corresponds to a tenfold increase in the measured wave amplitude. I will also create a new variable, intensity, that categorizes each earthquake by its Richter magnitude.
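Because the Richter scale is logarithmic, equal steps in magnitude are not equal steps in strength. The short sketch below assumes the standard textbook relationships (a one-unit increase in magnitude multiplies the measured wave amplitude by 10 and the radiated energy by roughly 10^1.5, about 31.6). The helper names amplitude_ratio() and energy_ratio() are hypothetical and are not part of the dataset or any package.

```r
# Hypothetical helpers (not from the dataset or a package). They assume the
# standard relationships: amplitude grows by 10^dM and radiated energy by
# roughly 10^(1.5 * dM) for a magnitude difference dM.
amplitude_ratio <- function(m_big, m_small) 10^(m_big - m_small)
energy_ratio    <- function(m_big, m_small) 10^(1.5 * (m_big - m_small))

amplitude_ratio(8, 7)         # 10: one magnitude step, ten times the amplitude
amplitude_ratio(9.5, 7.5)     # 100: two steps, a hundred times the amplitude
round(energy_ratio(8, 7), 1)  # 31.6: one step, roughly 31.6 times the energy
```

In other words, a magnitude 9.5 earthquake is not slightly stronger than a 7.5 one; under these assumptions its waves are about 100 times larger.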

Data Analysis

For my data analysis, I will use exploratory data analysis to identify which earthquakes in the dataset were the strongest and the deadliest. First, I’ll clean the data by removing rows with missing (NA) values in deaths or richter. Next, I’ll create a new column that categorizes each earthquake’s intensity based on the Richter scale. I’ll use functions such as ‘filter’, ‘mutate’, ‘arrange’, ‘group_by’, ‘summarize’, and ‘summary’ to answer my question. Finally, I’ll use a scatter plot to show the relationship between earthquake magnitude and the number of deaths, with each point colored by intensity.

Clean the data by removing rows with missing (NA) deaths or richter values

# Drop any row where deaths or richter is missing (NA)
earthquakes_no_na <- earthquake |>
  filter(!is.na(deaths), !is.na(richter))
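The cleaning step drops a row if either deaths or richter is missing, while missing values in other columns are left alone. A minimal sketch on a small hypothetical tibble (not the OpenIntro data) shows this behaviour, and shows that tidyr’s drop_na() restricted to the same two columns is an equivalent shorthand.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the OpenIntro dataset): row 2 is missing richter,
# row 3 is missing deaths, and row 4 is missing only area.
toy <- tibble(
  richter = c(7.9, NA, 6.5, 8.1),
  deaths  = c(700, 50, NA, 12),
  area    = c("A", "B", "C", NA)
)

# One filter() call with two conditions keeps rows where BOTH are present
via_filter <- toy |>
  filter(!is.na(deaths), !is.na(richter))

# drop_na() limited to the same two columns is an equivalent shorthand;
# the NA in area does not remove row 4
via_drop_na <- toy |>
  drop_na(deaths, richter)

via_filter
nrow(via_drop_na)  # 2
```

Calling drop_na() with no column names would instead drop rows with an NA in any column, which is stricter than the cleaning step used here.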

Create a new column, intensity, that categorizes each earthquake by its magnitude

# Bin magnitude into categories: each lower bound is inclusive, each upper bound exclusive
earthquakes_no_na <- earthquakes_no_na |>
  mutate(intensity = case_when(
    richter < 5 ~ "Low",
    richter >= 5 & richter < 6 ~ "Moderate",
    richter >= 6 & richter < 7 ~ "Strong",
    richter >= 7 & richter < 8 ~ "Major",
    richter >= 8 ~ "Great"
  ))
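The categories are half-open intervals, so a magnitude of exactly 6.0 is “Strong”, not “Moderate”. Because case_when() checks its conditions in order, the richter >= 5 & style lower bounds are redundant. The sketch below uses made-up magnitudes that sit on the bin boundaries to check that a simplified case_when() and base R’s cut() with right = FALSE assign the same labels.

```r
library(dplyr)

# Hypothetical magnitudes, including the exact boundaries 5, 6, 7 and 8
richter <- c(4.9, 5, 5.99, 6, 7, 7.95, 8, 9.5)

# Simplified rules: conditions are checked top to bottom, so each one only
# needs its upper bound
by_case_when <- case_when(
  richter < 5 ~ "Low",
  richter < 6 ~ "Moderate",
  richter < 7 ~ "Strong",
  richter < 8 ~ "Major",
  richter >= 8 ~ "Great"
)

# cut() with right = FALSE builds intervals of the form [a, b)
by_cut <- cut(
  richter,
  breaks = c(-Inf, 5, 6, 7, 8, Inf),
  labels = c("Low", "Moderate", "Strong", "Major", "Great"),
  right  = FALSE
)

identical(by_case_when, as.character(by_cut))  # TRUE
levels(by_cut)
```

One design difference: cut() returns a factor whose levels follow magnitude order (Low to Great), so tables and plot legends built from it sort by strength rather than alphabetically.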

Summary of the data, followed by the earthquakes sorted from highest to lowest magnitude and from highest to lowest deaths

summary(earthquakes_no_na)
##       year         month                day          richter     
##  Min.   :1902   Length:121         Min.   : 1.0   Min.   :5.500  
##  1st Qu.:1933   Class :character   1st Qu.:10.0   1st Qu.:6.800  
##  Median :1960   Mode  :character   Median :17.0   Median :7.200  
##  Mean   :1956                      Mean   :16.9   Mean   :7.143  
##  3rd Qu.:1980                      3rd Qu.:25.0   3rd Qu.:7.500  
##  Max.   :1999                      Max.   :31.0   Max.   :9.500  
##      area              region              deaths        intensity        
##  Length:121         Length:121         Min.   :     3   Length:121        
##  Class :character   Class :character   1st Qu.:  1250   Class :character  
##  Mode  :character   Mode  :character   Median :  2790   Mode  :character  
##                                        Mean   : 17683                     
##                                        3rd Qu.:  8000                     
##                                        Max.   :700000
earthquakes_byMagnitude <- earthquakes_no_na |>
  arrange(desc(richter))

earthquakes_byMagnitude
## # A tibble: 121 × 8
##     year month      day richter area                   region   deaths intensity
##    <dbl> <chr>    <dbl>   <dbl> <chr>                  <chr>     <dbl> <chr>    
##  1  1960 May         21     9.5 South                  Chile      1655 Great    
##  2  1964 March       27     9.2 Alaska                 United …    131 Great    
##  3  1906 January     31     8.8 Esmeraldas (off coast) Ecuador    1000 Great    
##  4  1906 August      17     8.6 Valparaiso             Chile      3882 Great    
##  5  1950 August      15     8.6 Assam                  India      1526 Great    
##  6  1933 March        2     8.4 Sanriku                Japan      2990 Great    
##  7  1907 October     21     8.1 Central                Asia      12000 Great    
##  8  1934 January     15     8.1 Bihar                  India-N…  10700 Great    
##  9  1946 December    29     8.1 Honshu                 Japan      1362 Great    
## 10  1931 August      10     8   Xinjiang               China     10000 Great    
## # ℹ 111 more rows
earthquakes_byDeaths <- earthquakes_no_na |>
  arrange(desc(deaths))

earthquakes_byDeaths
## # A tibble: 121 × 8
##     year month       day richter area     region       deaths intensity
##    <dbl> <chr>     <dbl>   <dbl> <chr>    <chr>         <dbl> <chr>    
##  1  1970 May          31     7.9 Chimbote Peru         700000 Major    
##  2  1976 July         28     7.5 Tangshan China        255000 Major    
##  3  1920 December     16     7.8 Gansu    China        200000 Major    
##  4  1923 September     1     7.9 Yokohama Japan        142800 Major    
##  5  1948 October       5     7.3 Ashgabat Turkmenistan 110000 Major    
##  6  1908 December     28     7.2 Messina  Italy         72000 Major    
##  7  1927 May          22     7.6 Gansu    China         40900 Major    
##  8  1990 June         20     7.4 West     Iran          40000 Major    
##  9  1939 December     26     7.8 Erzincan Turkey        32700 Major    
## 10  1915 January      13     7   Avezzano Italy         32610 Major    
## # ℹ 111 more rows
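When only the top few rows are needed, dplyr’s slice_max() is a shorthand for arrange(desc(...)) followed by taking the first rows. By default it keeps ties, which matters here because several earthquakes share the same magnitude (for example, two at 8.6). A sketch on hypothetical rows:

```r
library(dplyr)

# Hypothetical rows (not the OpenIntro data) with a tie at 8.6
quakes <- tibble(
  region  = c("P", "Q", "R", "S"),
  richter = c(9.5, 8.6, 8.6, 7.2)
)

# Ask for the top 2 magnitudes; with_ties = TRUE (the default) keeps both
# rows tied at 8.6, so 3 rows come back
top_ties <- quakes |> slice_max(richter, n = 2)

# with_ties = FALSE returns exactly n rows
top_exact <- quakes |> slice_max(richter, n = 2, with_ties = FALSE)

top_ties
top_exact
```

On the real data, earthquakes_no_na |> slice_max(deaths, n = 5) would list the five deadliest earthquakes without sorting the whole table.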

Table of average deaths by intensity

avg_deaths <- earthquakes_no_na |>
  group_by(intensity) |>
  summarize(mean_deaths = mean(deaths))

avg_deaths
## # A tibble: 4 × 2
##   intensity mean_deaths
##   <chr>           <dbl>
## 1 Great           4896.
## 2 Major          29303.
## 3 Moderate        2958.
## 4 Strong          3538.
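Means are sensitive to extreme values, and deaths is highly skewed: the summary above shows a median of 2,790 but a maximum of 700,000. Reporting the group size and median alongside the mean is a cheap robustness check. The sketch uses hypothetical numbers; on the real data, the same summarize() call could replace the one that builds avg_deaths.

```r
library(dplyr)

# Hypothetical death tolls: group "A" contains one extreme event
toy <- tibble(
  intensity = c("A", "A", "A", "B", "B", "B"),
  deaths    = c(700000, 100, 200, 3000, 4000, 5000)
)

by_group <- toy |>
  group_by(intensity) |>
  summarize(
    n             = n(),            # number of earthquakes in each group
    mean_deaths   = mean(deaths),   # pulled upward by the single extreme event
    median_deaths = median(deaths)  # robust to that event
  )

by_group
```

Here group “A” has the much larger mean but the smaller median, because one event dominates its average.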

Scatter plot showing the relationship between Richter magnitude and deaths

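ggplot(earthquakes_no_na, aes(x = richter, y = deaths, color = intensity)) +
  geom_point(size = 3) +
  # Name each colour so it is matched to its category explicitly. Unnamed
  # values are assigned in alphabetical order (Great, Major, Moderate, Strong),
  # so a reordered labels vector would mislabel Moderate and Strong.
  scale_color_manual(values = c(Great = "#FF6A6A", Major = "#FFA07A",
                                Strong = "#FFB90F", Moderate = "#C1FFC1"),
                     breaks = c("Great", "Major", "Strong", "Moderate")) +
  labs(title = "Scatterplot of Earthquake Magnitude vs. Deaths (1900-1999)",
       x = "Magnitude (Richter Scale)", y = "Deaths") +
  theme_minimal()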
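Because deaths range from 3 to 700,000, most points are squeezed against the bottom of a linear y-axis. One option, sketched below on hypothetical data, is a log10 y-axis; a rank-based Spearman correlation then gives a single number for the magnitude and deaths relationship that one extreme event cannot dominate. The toy data frame is an assumption standing in for earthquakes_no_na.

```r
library(ggplot2)

# Hypothetical data (not the OpenIntro dataset)
toy <- data.frame(
  richter   = c(5.6, 6.5, 7.1, 7.8, 9.0),
  deaths    = c(10, 900, 5000, 400000, 2000),
  intensity = c("Moderate", "Strong", "Major", "Major", "Great")
)

p <- ggplot(toy, aes(x = richter, y = deaths, color = intensity)) +
  geom_point(size = 3) +
  # log10 axis spreads out small and large death tolls; comma labels
  # avoid scientific notation on the axis
  scale_y_log10(labels = scales::label_comma()) +
  labs(x = "Magnitude (Richter Scale)", y = "Deaths (log scale)") +
  theme_minimal()

p

# Spearman correlation works on ranks, so a single extreme death toll
# cannot dominate it the way it can dominate Pearson's r
rho <- cor(toy$richter, toy$deaths, method = "spearman")
rho  # 0.7
```

To apply this to the real data, replace toy with earthquakes_no_na; the sign and size of the resulting correlation are not known in advance and should be read from the output.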

Conclusion and Future Directions

From my findings, the deadliest earthquakes were in the “Major” intensity range (magnitude 7.0 to 7.9): all ten of the deadliest earthquakes in the table sorted by deaths are “Major”, and that category has the highest average death toll (about 29,303, compared with about 4,896 for “Great”). Even though the earthquakes in the “Great” category were stronger, they did not necessarily result in more deaths. I believe location is the reason: areas struck by the highest-magnitude earthquakes may be less densely populated, although this dataset has no population data to test that. Future research on this topic could include more recent data, from 2000 to the present. Another direction is how preparedness, such as infrastructure built to withstand intense and frequent earthquakes, could reduce earthquake fatalities.

References

“Earthquakes.” Data Sets, www.openintro.org/data/index.php?data=earthquakes. Accessed 17 Oct. 2025.