Assignment 1_DATA 607

Author

Heleine Fouda

Introduction

This analysis uses earthquakes, a select set of notable earthquakes from 1900 to 1999 compiled by The World Almanac and Book of Facts: 2011. The data set is available from the OpenIntro data sets. The earthquakes data frame contains 7 variables and 123 observations.

Research questions:

In this document, I’m interested in exploring two things:

  1. The strength of the relationship between the magnitude of earthquakes on the Richter scale and their death tolls;
  2. The relationship between geographic regions and the frequency and/or lethality of earthquakes.

The data

# label: load - packages
library(ggplot2)
library(openintro)
library(tidyverse)
library(gt)

A quick view of the data set:

# label: glimpse - data set
data("earthquakes")
glimpse(earthquakes)
Rows: 123
Columns: 7
$ year    <dbl> 1902, 1902, 1903, 1903, 1905, 1906, 1906, 1906, 1906, 1907, 19…
$ month   <chr> "April", "December", "April", "May", "April", "January", "Marc…
$ day     <dbl> 19, 16, 28, 28, 4, 31, 16, 18, 17, 21, 28, 23, 9, 3, 13, 30, 1…
$ richter <dbl> 7.5, 6.4, 7.0, 5.8, 7.5, 8.8, 6.8, 7.7, 8.6, 8.1, 7.2, 7.3, 7.…
$ area    <chr> "Quezaltenango and San Marco", "Uzbekistan", "Malazgirt", "Gol…
$ region  <chr> "Guatemala", "Russia", "Turkey", "Turkey", "India", "Ecuador",…
$ deaths  <dbl> 2000, 4700, 3500, 1000, 19000, 1000, 1250, 3000, 3882, 12000, …

Structure of the data set

# label: structure - earthquakes
str(earthquakes)
spc_tbl_ [123 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year   : num [1:123] 1902 1902 1903 1903 1905 ...
 $ month  : chr [1:123] "April" "December" "April" "May" ...
 $ day    : num [1:123] 19 16 28 28 4 31 16 18 17 21 ...
 $ richter: num [1:123] 7.5 6.4 7 5.8 7.5 8.8 6.8 7.7 8.6 8.1 ...
 $ area   : chr [1:123] "Quezaltenango and San Marco" "Uzbekistan" "Malazgirt" "Gole" ...
 $ region : chr [1:123] "Guatemala" "Russia" "Turkey" "Turkey" ...
 $ deaths : num [1:123] 2000 4700 3500 1000 19000 ...
 - attr(*, "spec")=
  .. cols(
  ..   Year = col_double(),
  ..   Month = col_character(),
  ..   Day = col_double(),
  ..   Richter = col_double(),
  ..   Area = col_character(),
  ..   Region = col_character(),
  ..   Deaths = col_double()
  .. )

Column names in the data set

# label: columns - names
names(earthquakes)
[1] "year"    "month"   "day"     "richter" "area"    "region"  "deaths" 

First rows of the data set

# label: columns- head
head(earthquakes)
# A tibble: 6 × 7
   year month      day richter area                        region    deaths
  <dbl> <chr>    <dbl>   <dbl> <chr>                       <chr>      <dbl>
1  1902 April       19     7.5 Quezaltenango and San Marco Guatemala   2000
2  1902 December    16     6.4 Uzbekistan                  Russia      4700
3  1903 April       28     7   Malazgirt                   Turkey      3500
4  1903 May         28     5.8 Gole                        Turkey      1000
5  1905 April        4     7.5 Kangra                      India      19000
6  1906 January     31     8.8 Esmeraldas (off coast)      Ecuador     1000

Analysis

Just how lethal are earthquakes, in general?

The histogram below plots the natural log of the death toll and shows an overall high lethality for the earthquakes in this data set.

# Label: Earthquakes - lethality- 
# Fig- cap: Histogram of earthquakes lethality over the years.
ggplot(earthquakes, aes(x = log(deaths))) +
  geom_histogram()
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
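The log transform is used because death tolls span several orders of magnitude. As a minimal sketch (not the original chunk), the same idea can be expressed by keeping deaths on the x axis and log-scaling the axis itself, which keeps the tick labels in units of deaths. The names `quakes` and `p_log` and the choice of 20 bins are illustrative assumptions.

```r
library(ggplot2)
library(openintro)

# Keep only rows with a recorded, positive death toll so the log scale is defined.
quakes <- subset(earthquakes, !is.na(deaths) & deaths > 0)

# bins = 20 is an arbitrary illustrative choice; setting it explicitly also
# silences the default "pick better value with binwidth" message.
p_log <- ggplot(quakes, aes(x = deaths)) +
  geom_histogram(bins = 20) +
  scale_x_log10() +
  labs(x = "Deaths (log10 scale)", y = "Number of earthquakes")
p_log
```

Filtering the missing values up front also avoids the "Removed 2 rows" warning, since nothing is left for ggplot2 to drop.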

Below is the average magnitude of earthquakes on the Richter scale:

# Richter scale average

earthquakes|>
  summarize(avg =mean(richter))
# A tibble: 1 × 1
    avg
  <dbl>
1  7.13

Average death toll of earthquakes

The deaths column contains 2 missing values, so mean() returns NA unless they are dropped with na.rm = TRUE:

earthquakes|>
  summarize(avg = mean(deaths, na.rm = TRUE))
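Because deaths has missing values, any summary of it needs them excluded explicitly. The sketch below, under the assumption that the data are loaded from openintro as in this document, counts the missing values and reports the median next to the mean, since a handful of very large death tolls can dominate the mean. The column names `n_missing`, `mean_deaths` and `median_deaths` are illustrative.

```r
library(dplyr)
library(openintro)

death_summary <- earthquakes |>
  summarise(
    n_missing     = sum(is.na(deaths)),            # rows with no recorded death toll
    mean_deaths   = mean(deaths, na.rm = TRUE),    # sensitive to a few very large events
    median_deaths = median(deaths, na.rm = TRUE)   # more robust to those extremes
  )
death_summary
```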

Let’s now examine the distribution of the death toll within the data frame:

# label: Death - toll- distribution
# fig - cap: Histogram of death toll
ggplot(earthquakes, aes(x = deaths)) +
  geom_histogram()
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

Measuring the central tendency of earthquake magnitudes on the Richter scale

# label: mean - median of earthquakes magnitude
earthquakes|>
  summarise(mean_magnitude = mean(richter), median_magnitude = median(richter),
            n = n())
# A tibble: 1 × 3
  mean_magnitude median_magnitude     n
           <dbl>            <dbl> <int>
1           7.13              7.2   123

Data transformation:

Arranging the cases of the data set in ascending order:

# label : Arrange- asc - richter scale order
earthquakes_1 <- earthquakes|>
  arrange(richter)
view(earthquakes_1)
# Label: Arrange - asc - deaths 
earthquakes_2 <- earthquakes_1|>
  arrange(deaths)
view(earthquakes_2)

Arranging observations in descending order:

# label: arrange - desc- Richter scale
earthquakes_3 <- earthquakes_2|>
  arrange(desc(richter))
view(earthquakes_3)
# label: arrange - desc- deaths
earthquakes_4 <- earthquakes_3|>
  arrange(desc(deaths))
view(earthquakes_4)
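Note that each arrange() call re-sorts the whole data frame, so in a chain of separate arrange() steps only the last ordering is retained. A minimal sketch of two alternatives follows: passing several keys to one arrange() call so that the second key breaks ties, and using slice_max() to pull the top rows directly. The object names `by_deaths_then_richter` and `top5_deadliest` are illustrative, not part of the original analysis.

```r
library(dplyr)
library(openintro)

# Sort by death toll first; ties are broken by magnitude, both descending.
# arrange() always places missing values last.
by_deaths_then_richter <- earthquakes |>
  arrange(desc(deaths), desc(richter))

# slice_max() returns the rows with the largest values without a separate sort.
top5_deadliest <- earthquakes |>
  slice_max(deaths, n = 5, with_ties = FALSE)

top5_deadliest
```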

Renaming data set columns

earthquakes_5 <- earthquakes_4|>
  rename(Year = year, 
         Month = month,
        Richter_scale = richter,
        Region = region,
        Deaths = deaths)
view(earthquakes_5)

The table below shows the 15 deadliest earthquakes in the data set (1900 to 1999).

# Label: The 15 most deadly earthquakes in descending order
# fig-cap: tbl of the 15 deadliest earthquakes in the data set
# earthquakes_5 is already sorted by descending deaths, so the first 15 rows are the deadliest
earthquakes_new <- earthquakes_5|>
  slice_head(n=15) |>
  select(Year, Month, Richter_scale, Region, Deaths)
gt(earthquakes_new)
Year  Month      Richter_scale  Region        Deaths
1970  May        7.9            Peru          700000
1976  July       7.5            China         255000
1920  December   7.8            China         200000
1923  September  7.9            Japan         142800
1948  October    7.3            Turkmenistan  110000
1908  December   7.2            Italy          72000
1927  May        7.6            China          40900
1990  June       7.4            Iran           40000
1939  December   7.8            Turkey         32700
1915  January    7.0            Italy          32610
1935  May        7.6            Pakistan       30000
1939  January    7.8            Chile          28000
1988  December   6.8            Armenia        25000
1976  February   7.5            Guatemala      23000
1974  May        6.8            China          20000
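The second research question, on how regions relate to the frequency and lethality of earthquakes, can be approached with a grouped summary. The sketch below is one possible starting point rather than part of the original analysis. It counts earthquakes per region and totals their recorded deaths. The names `region_summary`, `n_quakes` and `total_deaths` are illustrative assumptions.

```r
library(dplyr)
library(openintro)

region_summary <- earthquakes |>
  group_by(region) |>
  summarise(
    n_quakes     = n(),                       # frequency: earthquakes listed per region
    total_deaths = sum(deaths, na.rm = TRUE), # lethality: recorded deaths, missing values skipped
    .groups = "drop"
  ) |>
  arrange(desc(n_quakes))

head(region_summary, 10)
```

Since the data set is a select list of notable earthquakes, these counts describe the list itself and not the true regional frequency of earthquakes.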

Exploring the correlation between the magnitude of earthquakes and their lethality:

Examining the relationship between the Richter magnitude of an earthquake and its death toll using a scatterplot:

# label: fig - scatterplot Richter scale - death toll
# fig-cap: scatterplot of Richter scale - death toll relationship
ggplot(earthquakes, aes(x = richter, y = deaths, color = deaths))+ 
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).

Below, the relationship between the Richter magnitude of an earthquake and its death toll is examined with a fitted linear model. geom_smooth() takes the fitting function through its method argument:

ggplot(earthquakes, aes(x = richter, y = deaths))+ 
  geom_smooth(method = "lm")
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
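The fitted line can also be inspected numerically. A minimal sketch, assuming a simple linear regression of deaths on richter, is given below. In simple linear regression the t-statistic for the slope is identical to that of the Pearson correlation test on the same complete cases, so the slope row here should agree with the Pearson test reported further down. The names `fit` and `slope_row` are illustrative.

```r
library(openintro)

# Simple linear regression of death toll on magnitude.
# lm() drops rows with a missing death toll by default (na.action = na.omit).
fit <- lm(deaths ~ richter, data = earthquakes)

# Estimate, standard error, t value and p-value for the slope.
slope_row <- summary(fit)$coefficients["richter", ]
slope_row
```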

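Below are the results of a Pearson correlation test, which shows a weak positive correlation (about 0.16) between the Richter magnitude of an earthquake and its death toll. With a p-value of 0.078, the correlation is not statistically significant at the 5 percent level.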

# label: Correlation - deaths - Richter scale
cor.test(earthquakes$deaths, earthquakes$richter)

    Pearson's product-moment correlation

data:  earthquakes$deaths and earthquakes$richter
t = 1.776, df = 119, p-value = 0.07829
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.01833438  0.32972723
sample estimates:
     cor 
0.160688 
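Pearson's correlation measures linear association and is sensitive to extreme values, and the death tolls here range from single digits to hundreds of thousands. As a hedged sketch of two common robustness checks (neither is part of the original analysis), the block below computes a rank-based Spearman correlation and a Pearson correlation on log-transformed deaths. The names `spearman` and `pearson_log` are illustrative.

```r
library(openintro)

# Rank-based correlation: less sensitive to the few very large death tolls.
# exact = FALSE avoids the warning about exact p-values when ranks are tied.
spearman <- cor.test(earthquakes$deaths, earthquakes$richter,
                     method = "spearman", exact = FALSE)

# Pearson correlation after log-transforming deaths.
# Assumption: every recorded death toll is greater than 0, so log10() is finite.
pearson_log <- cor.test(log10(earthquakes$deaths), earthquakes$richter)

# cor.test() uses complete cases only, so rows with missing deaths are skipped.
c(spearman = unname(spearman$estimate),
  pearson_log10 = unname(pearson_log$estimate))
```

If both estimates stay close to the original 0.16, the weak association is unlikely to be an artifact of a few extreme events.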

Key Findings

  1. Earthquakes remained extremely lethal in many parts of the world throughout the 20th century, despite advances in science and technology (see the first histogram).

  2. The average magnitude of the earthquakes in the data set is about 7.13 on the Richter scale.

  3. There is only a weak linear relationship between the Richter magnitude of earthquakes and their lethality.

  4. The lowest magnitude recorded in the data set between 1900 and 1999 was 5.5. That earthquake occurred in El Salvador on October 10, 1986, yet killed 1,000 people. The highest magnitude recorded was 9.5; it occurred on May 21, 1960 in Chile and caused 1,655 deaths.

  5. The least deadly earthquake in the data set occurred in the United States on June 28, 1992, in an area called Landers. Its magnitude was 7.3 on the Richter scale, but only 3 people died. The deadliest earthquake is recorded with 700,000 deaths. It occurred in Peru, in an area called Chimbote, on May 31, 1970.

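  6. The Pearson correlation test yields a weak positive correlation of about 0.16 between the Richter magnitude of earthquakes and their death toll. The 95 percent confidence interval (-0.018 to 0.330) includes 0 and the p-value is 0.078, so the correlation is not statistically significant at the 5 percent level.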

  7. The closeness of the mean (7.13) and the median (7.2) suggests that the distribution of earthquake magnitudes is roughly symmetric.

  8. The scatterplot and the fitted trend line of death toll against Richter magnitude do not point to a strong linear pattern, which is consistent with the weak correlation.

Conclusion & Recommendations

This analysis has shown a weak connection between the magnitude of earthquakes and their death toll. The heavy lethality of many earthquakes seems to stem from other factors that have not been examined in this paper, such as the social and economic conditions of the affected populations and countries, population density, poverty, and the nature and quality of housing. Therefore, the paper recommends the following:

  1. People should be discouraged from building and living in geographic areas that are known to be at risk.
  2. In regions with a high risk of earthquakes, states and local communities must invest in earthquake-proof buildings and/or adequately inform populations of the risks involved in living in certain geographic regions.
  3. When it comes to earthquakes, prevention, information and preparedness are key to avoiding heavy losses of human life.