Earthquakes project 1 Data110

Author

Walter Hinkley

Earthquakes

Richter Scale vs Deaths

Introduction

Earthquakes are a result of seismic activity in the lithosphere of our planet. The lithosphere is the crust and the upper part of the Earth's mantle. Seismic energy waves of different sizes cause the ground to quake or shake, and they range from very slight to very extreme. These waves are recorded by a seismograph, and their magnitude is then expressed on the Richter Scale, which indicates the severity of each earthquake. I have a data set that covers the 20th century and lists 123 of the larger earthquakes from 1902 to 1999. My question while examining the data is: does the Richter Scale measurement correlate with the number of deaths in each earthquake? My source for the data set is the World Almanac and Book of Facts: 2011, available at https://www.openintro.org/data/index.php?data=earthquakes.

A breakdown of Richter Scale magnitudes and their frequency is as follows.

Figure: Richter Scale Severity and Frequency Chart

Load Libraries and Data Set

library(GGally)

Loading required package: ggplot2

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

library(DataExplorer)
library(psych)


Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ psych::%+%()    masks ggplot2::%+%()
✖ psych::alpha()  masks ggplot2::alpha()
✖ dplyr::filter() masks plotly::filter(), stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
earth <- read_csv('earthquakes.csv')

Rows: 123 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): month, area, region
dbl (4): year, day, richter, deaths

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

earth

# A tibble: 123 × 7
    year month      day richter area                        region        deaths
   <dbl> <chr>    <dbl>   <dbl> <chr>                       <chr>          <dbl>
 1  1902 April       19     7.5 Quezaltenango and San Marco Guatemala       2000
 2  1902 December    16     6.4 Uzbekistan                  Russia          4700
 3  1903 April       28     7   Malazgirt                   Turkey          3500
 4  1903 May         28     5.8 Gole                        Turkey          1000
 5  1905 April        4     7.5 Kangra                      India          19000
 6  1906 January     31     8.8 Esmeraldas (off coast)      Ecuador         1000
 7  1906 March       16     6.8 Chia-i                      Taiwan          1250
 8  1906 April       18     7.7 San Francisco               United States   3000
 9  1906 August      17     8.6 Valparaiso                  Chile           3882
10  1907 October     21     8.1 Central                     Asia           12000
# ℹ 113 more rows

Rescale the deaths Variable in earth to Show Deaths (Thousands)

earthb <- earth %>% mutate(deaths = deaths/1000)
earthb

# A tibble: 123 × 7
    year month      day richter area                        region        deaths
   <dbl> <chr>    <dbl>   <dbl> <chr>                       <chr>          <dbl>
 1  1902 April       19     7.5 Quezaltenango and San Marco Guatemala       2   
 2  1902 December    16     6.4 Uzbekistan                  Russia          4.7 
 3  1903 April       28     7   Malazgirt                   Turkey          3.5 
 4  1903 May         28     5.8 Gole                        Turkey          1   
 5  1905 April        4     7.5 Kangra                      India          19   
 6  1906 January     31     8.8 Esmeraldas (off coast)      Ecuador         1   
 7  1906 March       16     6.8 Chia-i                      Taiwan          1.25
 8  1906 April       18     7.7 San Francisco               United States   3   
 9  1906 August      17     8.6 Valparaiso                  Chile           3.88
10  1907 October     21     8.1 Central                     Asia           12   
# ℹ 113 more rows
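The step above overwrites `deaths` in place, so the original counts are no longer available in `earthb`. As a minimal sketch of an alternative, the rescaled values could be stored in a separate column. The column name `deaths_thousands` and the data frame `quakes` are hypothetical, and the three rows are copied from the first rows of the printed tibble; base R is used so the snippet runs without extra packages.

```r
# Illustrative sample: first three rows of the tibble printed above
quakes <- data.frame(
  year    = c(1902, 1902, 1903),
  richter = c(7.5, 6.4, 7.0),
  deaths  = c(2000, 4700, 3500)
)

# Keep the raw counts and add the rescaled values as a new column
# (deaths_thousands is a hypothetical name, not used in the original analysis)
quakes$deaths_thousands <- quakes$deaths / 1000

print(quakes)
```

Keeping both columns makes it harder to mix up units later, for example when labeling a plot legend.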

Reorder Data Set by Deaths

# Sort by the rescaled deaths column, largest first (NA values are placed last)
earthb <- earthb[order(earthb$deaths, decreasing = TRUE),]
earthb

# A tibble: 123 × 7
    year month       day richter area     region       deaths
   <dbl> <chr>     <dbl>   <dbl> <chr>    <chr>         <dbl>
 1  1970 May          31     7.9 Chimbote Peru          700  
 2  1976 July         28     7.5 Tangshan China         255  
 3  1920 December     16     7.8 Gansu    China         200  
 4  1923 September     1     7.9 Yokohama Japan         143. 
 5  1948 October       5     7.3 Ashgabat Turkmenistan  110  
 6  1908 December     28     7.2 Messina  Italy          72  
 7  1927 May          22     7.6 Gansu    China          40.9
 8  1990 June         20     7.4 West     Iran           40  
 9  1939 December     26     7.8 Erzincan Turkey         32.7
10  1915 January      13     7   Avezzano Italy          32.6
# ℹ 113 more rows

Remove Rows with NAs from Data Set

earthc <- earthb%>%
  drop_na()
earthc

# A tibble: 116 × 7
    year month       day richter area     region       deaths
   <dbl> <chr>     <dbl>   <dbl> <chr>    <chr>         <dbl>
 1  1970 May          31     7.9 Chimbote Peru          700  
 2  1976 July         28     7.5 Tangshan China         255  
 3  1920 December     16     7.8 Gansu    China         200  
 4  1923 September     1     7.9 Yokohama Japan         143. 
 5  1948 October       5     7.3 Ashgabat Turkmenistan  110  
 6  1908 December     28     7.2 Messina  Italy          72  
 7  1927 May          22     7.6 Gansu    China          40.9
 8  1990 June         20     7.4 West     Iran           40  
 9  1939 December     26     7.8 Erzincan Turkey         32.7
10  1915 January      13     7   Avezzano Italy          32.6
# ℹ 106 more rows
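Dropping every row that has an NA in any column removed 7 of the 123 rows. Before doing so, it can help to see which columns hold the missing values, and whether rows are being lost because of columns the analysis never uses. The sketch below uses base R on a small made-up sample: the NA positions are hypothetical and chosen only for illustration, and `quakes` is not an object from the analysis.

```r
# Hypothetical sample; the NA placements are invented for illustration
quakes <- data.frame(
  richter = c(7.9, 7.5, NA, 7.2),
  deaths  = c(700, 255, 200, 72),          # in thousands
  area    = c("Chimbote", NA, "Gansu", "Messina")
)

# Count missing values per column
na_per_column <- colSums(is.na(quakes))
print(na_per_column)

# Drop rows with an NA in ANY column (same idea as drop_na() with no arguments)
all_complete <- quakes[complete.cases(quakes), ]

# Drop rows only when the columns used in the model are missing
model_complete <- quakes[complete.cases(quakes[, c("richter", "deaths")]), ]

nrow(all_complete)
nrow(model_complete)
```

In this sample the second approach keeps one more row, because a missing `area` does not affect a model that only uses `richter` and `deaths`.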

Create new Data Frame with Needed Columns

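# Select the needed columns by name (same columns as positions 1, 4, 6, 7)
earth2 <- earthb[, c("year", "richter", "region", "deaths")]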
earth2

# A tibble: 123 × 4
    year richter region       deaths
   <dbl>   <dbl> <chr>         <dbl>
 1  1970     7.9 Peru          700  
 2  1976     7.5 China         255  
 3  1920     7.8 China         200  
 4  1923     7.9 Japan         143. 
 5  1948     7.3 Turkmenistan  110  
 6  1908     7.2 Italy          72  
 7  1927     7.6 China          40.9
 8  1990     7.4 Iran           40  
 9  1939     7.8 Turkey         32.7
10  1915     7   Italy          32.6
# ℹ 113 more rows

Create Interactive Scatterplot with Corresponding Variables

p <- ggplot(earth2, aes(x=year, y = richter, color = deaths))+
  geom_point(size=2)+
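  # Color limits stop at 250, so the two largest tolls (255 and 700 thousand)
  # fall outside the scale and are drawn in the NA color (grey by default)
  scale_color_distiller(palette="Spectral",breaks=c(0,150,250),
                        limits = c(0,250))+
  labs(x="Year",
       y = "Richter Scale Measurement",
       color = "Deaths (thousands)",
       title = "Richter Scale vs Deaths Comparison",
       caption = "World Almanac and Book of Facts: 2011")+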
  coord_cartesian(xlim = c(1900, 2000), ylim = c(5.5, 9.2))
p <- ggplotly(p)
p

This scatterplot uses x = year, y = Richter Scale, and color = deaths. I did this because a few outliers in deaths made a plot with deaths on an axis look very much like a line, since most earthquakes in the data have between 0 and 10,000 deaths. Moving deaths to color gives a more readable visual. For my next plots I will adjust the axis limits and add a regression line and correlation.
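A common alternative for handling a few very large values, which is not what this project does, is to put deaths on a logarithmic scale. The sketch below only illustrates the arithmetic: the death tolls (in thousands) are copied from rows printed earlier in this document, and the vector names are hypothetical.

```r
# Death tolls in thousands, taken from rows printed earlier in this document
deaths_thousands <- c(700, 255, 200, 110, 72, 40, 4.7, 1)

# Convert back to raw counts and take log base 10
log_deaths <- log10(deaths_thousands * 1000)
print(round(log_deaths, 2))

# On the raw scale the largest value is 700 times the smallest;
# on the log scale the whole range covers fewer than 3 units
raw_ratio <- max(deaths_thousands) / min(deaths_thousands)
log_span  <- diff(range(log_deaths))
print(raw_ratio)
print(log_span)
```

In ggplot2 the same idea could be applied to an axis with `scale_x_log10()`, assuming no earthquake in the plotted data has zero deaths, since the logarithm of zero is not finite.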

Create New ggplot with Red Smoother Line and a Confidence Interval

p2 <- ggplot(earth2, aes(x=deaths, y = richter, color = year))+
  geom_point(size=2)+
  coord_cartesian(xlim = c(0, 50), ylim = c(5.5, 9.2))+
  geom_smooth(color="red")+
  theme_classic()+
  labs(title = "Richter Scale vs Deaths",
       caption = "World Almanac and Book of Facts: 2011")
p2

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Add a Regression Line with its own Confidence Interval

p3 <- p2 + geom_smooth(method = 'lm', formula = y~x)
p3

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).
Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).

Warning: The following aesthetics were dropped during statistical transformation:
colour.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Create a Correlation Value Between Richter Scale and Deaths

cor(earthc$richter, earthc$deaths)

[1] 0.1589805

This shows a very weak positive correlation (about 0.16) between the Richter Scale measurement and deaths.
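Because the death tolls include a few extreme outliers, a rank-based (Spearman) correlation is sometimes reported alongside the default Pearson value, since ranks are less sensitive to extreme values. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses only the ten deadliest earthquakes printed earlier, so its results will not match the value computed on all 116 rows.

```r
# Ten deadliest earthquakes as printed earlier (deaths in thousands).
# The fourth value is shown as "143." in the tibble, so 143 is a rounded figure.
richter <- c(7.9, 7.5, 7.8, 7.9, 7.3, 7.2, 7.6, 7.4, 7.8, 7.0)
deaths  <- c(700, 255, 200, 143, 110, 72, 40.9, 40, 32.7, 32.6)

# Pearson (the default in cor) measures linear association
pearson <- cor(richter, deaths)

# Spearman works on ranks, so a single huge death toll has less influence
spearman <- cor(richter, deaths, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Running both on the full `earthc` data would show whether the weak relationship is mostly a product of the outliers.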

fit1 <- lm(richter ~ deaths, data = earthc)
summary(fit1)


Call:
lm(formula = richter ~ deaths, data = earthc)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4570 -0.3402 -0.0374  0.3586  2.3592 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 7.1382677  0.0675391 105.691   <2e-16 ***
deaths      0.0015592  0.0009069   1.719   0.0883 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7058 on 114 degrees of freedom
Multiple R-squared:  0.02527,   Adjusted R-squared:  0.01672 
F-statistic: 2.956 on 1 and 114 DF,  p-value: 0.08827

Write the Regression Equation

The fitted equation is Richter = 0.0016(Deaths) + 7.14, with Deaths measured in thousands.
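To see what this equation implies, the coefficients from the `summary(fit1)` output can be plugged in by hand. This is a minimal sketch: `predict_richter` is a hypothetical helper name, and the intercept and slope are copied from the coefficient table above, with deaths in thousands.

```r
# Coefficients copied from the summary(fit1) output
intercept <- 7.1382677
slope     <- 0.0015592

# Hypothetical helper: fitted Richter value for a death toll in thousands
predict_richter <- function(deaths_thousands) {
  intercept + slope * deaths_thousands
}

# Fitted values for 1,000, 10,000, and 100,000 deaths
fitted_values <- predict_richter(c(1, 10, 100))
print(round(fitted_values, 4))
```

Going from 1,000 to 100,000 deaths moves the fitted Richter value by only about 0.15, which shows how flat the line is.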

Explain the P value and R-squared value in relation to this model

The p-value to the right of deaths is 0.0883. It has no asterisks, which means it is above the 0.05 level, so deaths is not a statistically significant predictor of the Richter Scale measurement at that level. The R-squared value of 0.0253 means that the model explains about 2.5% of the variation in the Richter Scale measurement, while about 97.5% of the variation is not explained by this model.
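The numbers in the model summary are tied together, and checking them by hand can make the output easier to read. In a regression with one predictor, R-squared is the square of the correlation, the F-statistic is the square of the t value, and the p-value comes from the t distribution. The sketch below uses only values printed above; the variable names are hypothetical, and small differences from the printed output come from rounding.

```r
# Values copied from the output above
r        <- 0.1589805   # cor(earthc$richter, earthc$deaths)
t_deaths <- 1.719       # t value for deaths
n        <- 116         # rows in earthc (114 residual degrees of freedom + 2)

# R-squared is the squared correlation in a one-predictor model
r_squared <- r^2

# Adjusted R-squared penalizes for the number of predictors (one here)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)

# With one predictor, the F-statistic equals the squared t value
f_stat <- t_deaths^2

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_deaths), df = n - 2)

print(round(c(r_squared = r_squared, adj_r_squared = adj_r_squared,
              f_stat = f_stat, p_value = p_value), 4))
```

These agree with the printed Multiple R-squared (0.02527), Adjusted R-squared (0.01672), F-statistic (2.956), and p-value (0.0883) up to rounding.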

Conclusion

I see very little evidence that the Richter Scale measurement helps determine the number of deaths in a major earthquake. The data set records death tolls for earthquakes with Richter Scale measurements from 5.5 up to 9.5. If I had to guess, location and strength of infrastructure would be better predictors of the number of deaths. Also, in looking into a few of the bigger earthquakes with higher death totals, I found that there were other contributing factors such as floods and landslides.

Another factor I found interesting is that there is some controversy about the data on other websites. Other sources dispute the death tolls of some of the earthquakes. I would need to dive deeper to see if I could find more data to support this data set or its numbers. I would also like to see additional variables that would help explain the higher death counts. With better data, a better answer might be possible.

I also would have liked to add a photo of the damage for each point in my visualization. A picture can really give an idea of the damage and severity of an earthquake. Another interesting idea would be to add geographical locations to each earthquake; a map showing locations and strengths would be interesting. I do know that the borders of the Pacific Ocean are a common area for earthquakes to occur.