# label: load - packages
library(ggplot2)
library(openintro)
library(tidyverse)
library(gt)
Assignment 1_DATA 607
Introduction
This analysis uses earthquakes, a select set of notable earthquakes from 1900 to 1999 compiled by The World Almanac and Book of Facts: 2011. The data set is available from the OpenIntro data sets. The earthquakes data frame contains 7 variables and 123 observations.
Research questions:
In this document, I am interested in exploring two things:
- The strength of the relationship between the magnitude of earthquakes on the Richter scale and their death tolls;
- The relationship between geographic regions and the frequency and/or lethality of earthquakes.
The data
A quick view of the data set:
# label: glimpse - data set
data("earthquakes")
glimpse(earthquakes)
Rows: 123
Columns: 7
$ year <dbl> 1902, 1902, 1903, 1903, 1905, 1906, 1906, 1906, 1906, 1907, 19…
$ month <chr> "April", "December", "April", "May", "April", "January", "Marc…
$ day <dbl> 19, 16, 28, 28, 4, 31, 16, 18, 17, 21, 28, 23, 9, 3, 13, 30, 1…
$ richter <dbl> 7.5, 6.4, 7.0, 5.8, 7.5, 8.8, 6.8, 7.7, 8.6, 8.1, 7.2, 7.3, 7.…
$ area <chr> "Quezaltenango and San Marco", "Uzbekistan", "Malazgirt", "Gol…
$ region <chr> "Guatemala", "Russia", "Turkey", "Turkey", "India", "Ecuador",…
$ deaths <dbl> 2000, 4700, 3500, 1000, 19000, 1000, 1250, 3000, 3882, 12000, …
Structure of the data set
# label: structure - earthquakes
str(earthquakes)
spc_tbl_ [123 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ year : num [1:123] 1902 1902 1903 1903 1905 ...
$ month : chr [1:123] "April" "December" "April" "May" ...
$ day : num [1:123] 19 16 28 28 4 31 16 18 17 21 ...
$ richter: num [1:123] 7.5 6.4 7 5.8 7.5 8.8 6.8 7.7 8.6 8.1 ...
$ area : chr [1:123] "Quezaltenango and San Marco" "Uzbekistan" "Malazgirt" "Gole" ...
$ region : chr [1:123] "Guatemala" "Russia" "Turkey" "Turkey" ...
$ deaths : num [1:123] 2000 4700 3500 1000 19000 ...
- attr(*, "spec")=
.. cols(
.. Year = col_double(),
.. Month = col_character(),
.. Day = col_double(),
.. Richter = col_double(),
.. Area = col_character(),
.. Region = col_character(),
.. Deaths = col_double()
.. )
Column names in the data set
# label: columns - names
names(earthquakes)
[1] "year" "month" "day" "richter" "area" "region" "deaths"
First rows of the data set
# label: columns - head
head(earthquakes)
# A tibble: 6 × 7
year month day richter area region deaths
<dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 1902 April 19 7.5 Quezaltenango and San Marco Guatemala 2000
2 1902 December 16 6.4 Uzbekistan Russia 4700
3 1903 April 28 7 Malazgirt Turkey 3500
4 1903 May 28 5.8 Gole Turkey 1000
5 1905 April 4 7.5 Kangra India 19000
6 1906 January 31 8.8 Esmeraldas (off coast) Ecuador 1000
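The second research question (regions versus frequency and lethality) could be approached with a grouped summary. The sketch below is only an illustration under stated assumptions: it rebuilds the six rows printed above as a small data frame called quakes_sample (a hypothetical name, not part of the original analysis) and counts earthquakes and total deaths per region with dplyr. Applied to the full earthquakes data frame, the same pipeline would give one row per region.

```r
library(dplyr)

# Hypothetical sample: the six rows printed by head(earthquakes) above.
quakes_sample <- data.frame(
  region  = c("Guatemala", "Russia", "Turkey", "Turkey", "India", "Ecuador"),
  richter = c(7.5, 6.4, 7.0, 5.8, 7.5, 8.8),
  deaths  = c(2000, 4700, 3500, 1000, 19000, 1000)
)

# One row per region: number of earthquakes and total deaths,
# sorted so the most frequently hit regions come first.
by_region <- quakes_sample |>
  group_by(region) |>
  summarise(
    n_quakes     = n(),
    total_deaths = sum(deaths, na.rm = TRUE)
  ) |>
  arrange(desc(n_quakes), desc(total_deaths))

print(by_region)
```

With only six rows this says nothing about regional risk; it just shows the shape of the summary.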
Analysis
Just how lethal are earthquakes, in general?
The histogram below shows the distribution of the natural log of the death toll. The 2 earthquakes with no recorded death toll are dropped from the plot.
# label: Earthquakes - lethality
# fig-cap: Histogram of earthquake lethality (log of deaths) over the years.
ggplot(earthquakes, aes(x = log(deaths))) +
  geom_histogram()
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
Below is the average magnitude of earthquakes on the Richter scale.
# Richter scale average
earthquakes |>
  summarize(avg = mean(richter))
# A tibble: 1 × 1
avg
<dbl>
1 7.13
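Average death toll of earthquakes
Two earthquakes have no recorded death toll, so the missing values are dropped with na.rm = TRUE. With na.rm left unset, mean() returns NA.
earthquakes |>
  summarize(avg = mean(deaths, na.rm = TRUE))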
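In R, mean() returns NA whenever its input contains a missing value, unless na.rm = TRUE is set. The following is a minimal sketch of that behaviour, not the author's code: deaths_sample holds the six death counts printed by head(earthquakes), plus one NA added here purely for illustration.

```r
# Six death counts from head(earthquakes), plus one illustrative missing value.
deaths_sample <- c(2000, 4700, 3500, 1000, 19000, 1000, NA)

# Without na.rm, the single NA propagates to the result.
mean_with_na <- mean(deaths_sample)

# With na.rm = TRUE, the mean is taken over the six observed values only.
mean_without_na <- mean(deaths_sample, na.rm = TRUE)

# Worth reporting alongside the mean: how many values were dropped.
n_missing <- sum(is.na(deaths_sample))

print(c(mean_with_na, mean_without_na, n_missing))
```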
Let’s now examine the distribution of the death toll within the data frame.
# label: Death - toll - distribution
# fig-cap: Histogram of death toll
ggplot(earthquakes, aes(x = deaths)) +
  geom_histogram()
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
Measuring the central tendency of earthquake magnitude on the Richter scale
# label: mean - median of earthquakes magnitude
earthquakes |>
  summarise(mean_magnitude = mean(richter),
            median_magnitude = median(richter),
            n = n())
# A tibble: 1 × 3
mean_magnitude median_magnitude n
<dbl> <dbl> <int>
1 7.13 7.2 123
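A mean close to the median suggests, but does not prove, a roughly symmetric distribution. One hedged way to look a little further is the quartile (Bowley) skewness, which compares how far the first and third quartiles sit from the median; values near 0 indicate symmetry. The sketch below uses only the six magnitudes printed by head(earthquakes), and bowley_skew is a helper name introduced here for illustration.

```r
# Six magnitudes from head(earthquakes); an illustrative subset only.
richter_sample <- c(7.5, 6.4, 7.0, 5.8, 7.5, 8.8)

# Quartile (Bowley) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).
# Negative values mean the lower half is more spread out than the upper half.
bowley_skew <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
}

sample_mean   <- mean(richter_sample)
sample_median <- median(richter_sample)
sample_skew   <- bowley_skew(richter_sample)

print(c(mean = sample_mean, median = sample_median, bowley = sample_skew))
```

On the full data, bowley_skew(earthquakes$richter) or a histogram of richter would give a fairer picture than the mean and median alone.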
Data transformation:
Arranging cases of the data set in ascending order:
# label: Arrange - asc - Richter scale
earthquakes_1 <- earthquakes |>
  arrange(richter)
view(earthquakes_1)
# label: Arrange - asc - deaths
earthquakes_2 <- earthquakes_1 |>
  arrange(deaths)
view(earthquakes_2)
Arranging observations in descending order:
# label: Arrange - desc - Richter scale
earthquakes_3 <- earthquakes_2 |>
  arrange(desc(richter))
view(earthquakes_3)
# label: Arrange - desc - deaths
earthquakes_4 <- earthquakes_3 |>
  arrange(desc(deaths))
view(earthquakes_4)
Re-styling data set columns
earthquakes_5 <- earthquakes_4 |>
  rename(Year = year,
         Month = month,
         Richter_scale = richter,
         Region = region,
         Deaths = deaths)
view(earthquakes_5)
The table below shows the 15 deadliest earthquakes in the data set (1900 to 1999).
# label: The 15 deadliest earthquakes in descending order
# fig-cap: Table of the 15 deadliest earthquakes, 1900 to 1999
earthquakes_new <- earthquakes_5 |>
  slice_head(n = 15) |>
  select(Year, Month, Richter_scale, Region, Deaths)
gt(earthquakes_new)

| Year | Month | Richter_scale | Region | Deaths |
|---|---|---|---|---|
| 1970 | May | 7.9 | Peru | 700000 |
| 1976 | July | 7.5 | China | 255000 |
| 1920 | December | 7.8 | China | 200000 |
| 1923 | September | 7.9 | Japan | 142800 |
| 1948 | October | 7.3 | Turkmenistan | 110000 |
| 1908 | December | 7.2 | Italy | 72000 |
| 1927 | May | 7.6 | China | 40900 |
| 1990 | June | 7.4 | Iran | 40000 |
| 1939 | December | 7.8 | Turkey | 32700 |
| 1915 | January | 7.0 | Italy | 32610 |
| 1935 | May | 7.6 | Pakistan | 30000 |
| 1939 | January | 7.8 | Chile | 28000 |
| 1988 | December | 6.8 | Armenia | 25000 |
| 1976 | February | 7.5 | Guatemala | 23000 |
| 1974 | May | 6.8 | China | 20000 |
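The table above depends on the data already being sorted by arrange(desc(deaths)) before slice_head() is called. An alternative sketch, assuming dplyr 1.0.0 or later, is slice_max(), which picks the rows with the largest values directly and returns them in descending order. The example uses the six rows printed by head(earthquakes) as a hypothetical quakes_sample data frame and keeps the top 3; on the full data the call would be slice_max(deaths, n = 15).

```r
library(dplyr)

# Hypothetical sample: the six rows printed by head(earthquakes).
quakes_sample <- data.frame(
  year    = c(1902, 1902, 1903, 1903, 1905, 1906),
  region  = c("Guatemala", "Russia", "Turkey", "Turkey", "India", "Ecuador"),
  richter = c(7.5, 6.4, 7.0, 5.8, 7.5, 8.8),
  deaths  = c(2000, 4700, 3500, 1000, 19000, 1000)
)

# Keep the 3 deadliest rows; no prior arrange() is needed,
# and rows with NA in deaths are never selected ahead of observed values.
top3 <- quakes_sample |>
  slice_max(deaths, n = 3) |>
  select(year, region, richter, deaths)

print(top3)
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when death counts are equal at the cutoff.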
Exploring the correlation between the magnitude of earthquakes and their lethality:
Examining the relationship between the Richter magnitude of an earthquake and its death toll using a scatterplot
# label: fig - scatterplot Richter scale - death toll
# fig-cap: Scatterplot of the relationship between Richter scale and death toll
ggplot(earthquakes, aes(x = richter, y = deaths, color = deaths)) +
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).
Below, the relationship between the Richter magnitude of an earthquake and its death toll is examined with a linear model:
ggplot(earthquakes, aes(x = richter, y = deaths)) +
  geom_smooth(method = "lm")
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
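In ggplot2, geom_smooth(method = "lm") draws the same least-squares line that lm() fits, so the numbers behind such a plot can be inspected by fitting the model directly. The sketch below is an illustration only: it uses the six rows printed by head(earthquakes) as a hypothetical quakes_sample data frame, so its coefficients are not the ones for the full data set.

```r
# Hypothetical sample: magnitude and deaths from head(earthquakes).
quakes_sample <- data.frame(
  richter = c(7.5, 6.4, 7.0, 5.8, 7.5, 8.8),
  deaths  = c(2000, 4700, 3500, 1000, 19000, 1000)
)

# Simple linear regression of death toll on magnitude.
fit <- lm(deaths ~ richter, data = quakes_sample)

# Slope: estimated change in deaths per one-unit increase in magnitude.
slope <- coef(fit)[["richter"]]

# R squared: share of the variance in deaths explained by magnitude.
# In a one-predictor model it equals the squared Pearson correlation.
r_squared <- summary(fit)$r.squared

print(coef(fit))
print(r_squared)
```

On the full data, lm(deaths ~ richter, data = earthquakes) drops the rows with missing deaths by default, which matches the 2 rows removed in the plot.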
Below are the results of a Pearson correlation test. It shows a weak positive correlation (r = 0.16) between the Richter magnitude of an earthquake and its death toll, and the correlation is not statistically significant at the 5% level (p = 0.078).
# label: Correlation - deaths - Richter scale
cor.test(earthquakes$deaths, earthquakes$richter)
Pearson's product-moment correlation
data: earthquakes$deaths and earthquakes$richter
t = 1.776, df = 119, p-value = 0.07829
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.01833438 0.32972723
sample estimates:
cor
0.160688
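The figures in this output can be reproduced by hand from the reported correlation and degrees of freedom, which makes clear how they relate to each other. The sketch below assumes only the two reported values (cor = 0.160688, df = 119) and uses the standard formulas: the t statistic for a Pearson correlation, and the Fisher z-transformation for the confidence interval.

```r
r  <- 0.160688   # sample correlation reported by cor.test()
df <- 119        # reported degrees of freedom, n - 2
n  <- df + 2     # 121 complete pairs (123 rows minus 2 with missing deaths)

# t statistic for H0: true correlation = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```

Because the interval contains 0 and the p-value exceeds 0.05, the data are compatible with no linear association between magnitude and death toll.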
Key Findings
- Earthquakes remain extremely lethal in many parts of the world, despite advances in science and technology (see the first histogram).
- The average magnitude of the earthquakes in the data set is 7.13 on the Richter scale.
- There is very little linear relationship between the Richter magnitude of earthquakes and their lethality.
- The lowest magnitude in the data set is 5.5. That earthquake occurred in El Salvador on October 10, 1986, and killed 1,000 people. The highest magnitude is 9.5; that earthquake occurred in Chile on May 21, 1960, and caused 1,655 deaths.
- The least deadly earthquake in the data set occurred in the United States on June 28, 1992, in an area called Landers. Its magnitude was 7.3 on the Richter scale, but only 3 people died. The deadliest earthquake caused 700,000 deaths. It occurred in Peru, in an area called Chimbote, on May 31, 1970.
- The Pearson correlation test gives a weak positive correlation of 0.16 between the Richter magnitude of earthquakes and their death toll. The 95 percent confidence interval (-0.02 to 0.33) includes zero and the p-value is 0.078, so the correlation is not statistically significant at the 5% level.
- The closeness of the mean (7.13) and the median (7.2) of earthquake magnitude suggests a roughly symmetric distribution of magnitudes. It does not by itself establish perfect symmetry.
- The scatterplot shows no clear linear pattern between death toll and Richter magnitude. A fitted trend describes the relationship between the two variables and says nothing about whether either of them is normally distributed.
Conclusion & Recommendations
This analysis has shown an extremely weak connection between the magnitude of earthquakes and their death toll. The heavy lethality of most earthquakes seems to find its source in other factors (which have not been examined in this paper), such as the social and economic conditions of the populations and countries affected, population density, poverty, and the nature and quality of housing. Therefore, the paper recommends the following:
- People should be discouraged from building and living in geographic areas that are known to be at risk.
- In regions with a high risk of earthquakes, states and local communities must invest in earthquake-proof buildings and/or adequately inform populations of the risk involved in living in certain geographic regions.
- When it comes to earthquakes, prevention, information and preparedness are key to avoiding heavy losses of human life.