Earthquakes are the result of seismic activity in Earth's lithosphere, which is made up of the crust and the uppermost part of the mantle. When energy is released there, it travels as seismic waves that make the ground shake. These waves range from very slight to very extreme. They are recorded by a seismograph and then measured on the Richter scale, which rates the magnitude of each earthquake. I have a data set that covers the 20th century and lists 123 major earthquakes from 1902 to 1999. My question while examining the data is: does the Richter scale measurement correlate with the number of deaths in each earthquake? My source for the data set is the World Almanac and Book of Facts: 2011 (https://www.openintro.org/data/index.php?data=earthquakes).
A breakdown of Richter scale magnitudes and how often earthquakes of each size occur is as follows.
[Figure: Richter Scale Severity and Frequency Chart]
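A key property behind this chart is that the Richter scale is logarithmic: each whole-number step means about ten times the measured wave amplitude and roughly 31.6 times the energy released. The short R sketch below assumes the commonly cited Gutenberg-Richter energy relation log10(E) = 1.5M + 4.8 (E in joules); that relation and the helper names `richter_energy_joules()` and `energy_ratio()` are illustrative assumptions, not part of the original analysis.

```r
# Sketch only: assumes the Gutenberg-Richter energy relation
# log10(E) = 1.5 * M + 4.8, with E in joules (not from the original analysis)
richter_energy_joules <- function(magnitude) {
  10^(1.5 * magnitude + 4.8)
}

# Hypothetical helper: how many times more energy magnitude m1 releases than m2
energy_ratio <- function(m1, m2) {
  richter_energy_joules(m1) / richter_energy_joules(m2)
}

# One whole step on the scale (about 31.6 times the energy)
energy_ratio(7, 6)

# Esmeraldas 1906 (8.8) versus Gole 1903 (5.8), three steps apart
energy_ratio(8.8, 5.8)
```

Under this relation, the 8.8 Esmeraldas earthquake released about 31,600 times the energy of the 5.8 Gole earthquake, which is one reason magnitude alone can look so disconnected from the death toll.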
Load Libraries and Data Set
library(GGally)
library(DataExplorer)
library(psych)
library(plotly)
Rows: 123 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): month, area, region
dbl (4): year, day, richter, deaths
earth
# A tibble: 123 × 7
year month day richter area region deaths
<dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 1902 April 19 7.5 Quezaltenango and San Marco Guatemala 2000
2 1902 December 16 6.4 Uzbekistan Russia 4700
3 1903 April 28 7 Malazgirt Turkey 3500
4 1903 May 28 5.8 Gole Turkey 1000
5 1905 April 4 7.5 Kangra India 19000
6 1906 January 31 8.8 Esmeraldas (off coast) Ecuador 1000
7 1906 March 16 6.8 Chia-i Taiwan 1250
8 1906 April 18 7.7 San Francisco United States 3000
9 1906 August 17 8.6 Valparaiso Chile 3882
10 1907 October 21 8.1 Central Asia 12000
# ℹ 113 more rows
Convert Deaths in earth to Thousands
# A tibble: 123 × 7
year month day richter area region deaths
<dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 1902 April 19 7.5 Quezaltenango and San Marco Guatemala 2
2 1902 December 16 6.4 Uzbekistan Russia 4.7
3 1903 April 28 7 Malazgirt Turkey 3.5
4 1903 May 28 5.8 Gole Turkey 1
5 1905 April 4 7.5 Kangra India 19
6 1906 January 31 8.8 Esmeraldas (off coast) Ecuador 1
7 1906 March 16 6.8 Chia-i Taiwan 1.25
8 1906 April 18 7.7 San Francisco United States 3
9 1906 August 17 8.6 Valparaiso Chile 3.88
10 1907 October 21 8.1 Central Asia 12
# ℹ 113 more rows
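The code for this step is not shown above. Judging from the printed values, deaths was divided by 1000. A minimal base R sketch of that step follows; `earth_sample` is a hypothetical name holding five rows copied from the first table (deaths as originally recorded).

```r
# Hypothetical sample: five rows copied from the printed earth table
earth_sample <- data.frame(
  year    = c(1902, 1902, 1903, 1903, 1905),
  richter = c(7.5, 6.4, 7.0, 5.8, 7.5),
  region  = c("Guatemala", "Russia", "Turkey", "Turkey", "India"),
  deaths  = c(2000, 4700, 3500, 1000, 19000),
  stringsAsFactors = FALSE
)

# Overwrite deaths with the count in thousands, matching the rescaled table
earth_sample$deaths <- earth_sample$deaths / 1000
earth_sample
```

With the tidyverse loaded, `mutate(deaths = deaths / 1000)` from dplyr would be the equivalent step in a pipeline.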
Sort the Data Set by Deaths, Highest First
# A tibble: 123 × 7
year month day richter area region deaths
<dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 1970 May 31 7.9 Chimbote Peru 700
2 1976 July 28 7.5 Tangshan China 255
3 1920 December 16 7.8 Gansu China 200
4 1923 September 1 7.9 Yokohama Japan 143.
5 1948 October 5 7.3 Ashgabat Turkmenistan 110
6 1908 December 28 7.2 Messina Italy 72
7 1927 May 22 7.6 Gansu China 40.9
8 1990 June 20 7.4 West Iran 40
9 1939 December 26 7.8 Erzincan Turkey 32.7
10 1915 January 13 7 Avezzano Italy 32.6
# ℹ 113 more rows
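The sorting code is also not shown above. One way to produce this order in base R is `order()` on the negated deaths column. The sketch uses a hypothetical data frame, `quakes_top`, built from five rows printed in the sorted table, deliberately shuffled.

```r
# Hypothetical sample: five rows from the sorted table, deaths in thousands
quakes_top <- data.frame(
  year   = c(1908, 1976, 1970, 1948, 1920),
  region = c("Italy", "China", "Peru", "Turkmenistan", "China"),
  deaths = c(72, 255, 700, 110, 200),
  stringsAsFactors = FALSE
)

# Negating the key sorts from largest to smallest death toll
quakes_sorted <- quakes_top[order(-quakes_top$deaths), ]
rownames(quakes_sorted) <- NULL  # reset row names for tidy printing
quakes_sorted
```

In dplyr, `arrange(desc(deaths))` expresses the same ordering.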
Remove Rows with NAs from the Data Set
earthc <- earthb %>% drop_na()
earthc
# A tibble: 116 × 7
year month day richter area region deaths
<dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 1970 May 31 7.9 Chimbote Peru 700
2 1976 July 28 7.5 Tangshan China 255
3 1920 December 16 7.8 Gansu China 200
4 1923 September 1 7.9 Yokohama Japan 143.
5 1948 October 5 7.3 Ashgabat Turkmenistan 110
6 1908 December 28 7.2 Messina Italy 72
7 1927 May 22 7.6 Gansu China 40.9
8 1990 June 20 7.4 West Iran 40
9 1939 December 26 7.8 Erzincan Turkey 32.7
10 1915 January 13 7 Avezzano Italy 32.6
# ℹ 106 more rows
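`drop_na()` comes from tidyr. A base R counterpart is `complete.cases()`. Like `drop_na()` called with no column names, it drops a row when any column is missing, not only the columns used later. It may therefore be worth checking which columns the 7 removed rows (123 down to 116) were missing. `drop_na(richter, deaths)` would limit the check to the two analysis columns. The data frame below is a hypothetical toy, not rows from the earthquake data.

```r
# Hypothetical toy data: only the first row has no missing values
toy <- data.frame(
  richter = c(7.5, NA, 6.4),
  deaths  = c(2, 3.5, NA)
)

# complete.cases() is TRUE for rows with no NA in any column
keep <- complete.cases(toy)
toy_clean <- toy[keep, ]

sum(!keep)        # rows dropped: 2
nrow(toy_clean)   # rows kept: 1
```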
Create new Data Frame with Needed Columns
earth2 <- earthb[, c(1, 4, 6, 7)]
earth2
# A tibble: 123 × 4
year richter region deaths
<dbl> <dbl> <chr> <dbl>
1 1970 7.9 Peru 700
2 1976 7.5 China 255
3 1920 7.8 China 200
4 1923 7.9 Japan 143.
5 1948 7.3 Turkmenistan 110
6 1908 7.2 Italy 72
7 1927 7.6 China 40.9
8 1990 7.4 Iran 40
9 1939 7.8 Turkey 32.7
10 1915 7 Italy 32.6
# ℹ 113 more rows
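Selecting columns by position works. Selecting them by name is easier to read, and it keeps working if the column order changes. In this sketch, `earth_demo` is a hypothetical one-row stand-in with the same seven columns as earthb, filled with the first row printed earlier (deaths in thousands).

```r
# Hypothetical stand-in with the same column layout as earthb
earth_demo <- data.frame(
  year = 1902, month = "April", day = 19, richter = 7.5,
  area = "Quezaltenango and San Marco", region = "Guatemala", deaths = 2,
  stringsAsFactors = FALSE
)

# Same result as earth_demo[, c(1, 4, 6, 7)], but selected by column name
earth2_demo <- earth_demo[, c("year", "richter", "region", "deaths")]
names(earth2_demo)
```

With dplyr, `select(earthb, year, richter, region, deaths)` does the same.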
Create an Interactive Scatterplot of Year, Richter Scale, and Deaths
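# Deaths (in thousands) are mapped to color instead of an axis
p <- ggplot(earth2, aes(x = year, y = richter, color = deaths)) +
  geom_point(size = 2) +
  # Values above 250 (e.g., Chimbote, 700) fall outside the limits and use the scale's NA color
  scale_color_distiller(palette = "Spectral", breaks = c(0, 150, 250), limits = c(0, 250)) +
  labs(x = "Year", y = "Richter Scale Measurement", color = "Deaths (thousands)",
       title = "Richter Scale vs Deaths Comparison",
       caption = "World Almanac and Book of Facts: 2011") +
  coord_cartesian(xlim = c(1900, 2000), ylim = c(5.5, 9.2))
p <- ggplotly(p)  # convert to an interactive plotly chart
p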
The scatterplot puts year on the x-axis, the Richter scale measurement on the y-axis, and deaths on the color scale. A few death tolls are extreme outliers, and most earthquakes in the data killed between 0 and 10,000 people. With deaths on an axis, the points were squeezed into what looked like a line. Moving deaths to color gives a clearer picture. For the next plots, I narrow the axis limits so the regression line and correlation are easier to see.
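Another option is to keep deaths on an axis but use a log scale. This spreads out the many small tolls while still showing the few huge ones. This is a suggested alternative, not what the plot above does. A quick base R check, using death tolls (in thousands) copied from rows printed earlier:

```r
# Death tolls in thousands, copied from rows printed in this section
deaths_k <- c(2, 4.7, 3.5, 1, 19, 1, 1.25, 3, 700, 255, 200, 110, 72)

# Raw scale: the largest toll is many times the typical (median) toll
max(deaths_k) / median(deaths_k)

# log10 scale: the same values fit in a range less than three units wide
range(log10(deaths_k))
```

In ggplot2, adding `scale_x_log10()` to a plot with deaths on the x-axis applies the same idea.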
Create a New ggplot with a Red LOESS Smoother and a Confidence Interval
p2 <- ggplot(earth2, aes(x = deaths, y = richter, color = year)) +
  geom_point(size = 2) +
  # coord_cartesian() zooms without dropping points, so the smoother still uses all rows
  coord_cartesian(xlim = c(0, 50), ylim = c(5.5, 9.2)) +
  geom_smooth(color = "red") +
  theme_classic() +
  labs(title = "Richter Scale vs Deaths", caption = "World Almanac and Book of Facts: 2011")
p2
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Add a Regression Line with its own Confidence Interval
p3 <- p2 + geom_smooth(method = 'lm', formula = y ~ x)
p3
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).
Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).
Warning: The following aesthetics were dropped during statistical transformation:
colour.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Calculate the Correlation Between Richter Scale and Deaths
cor(earthc$richter, earthc$deaths)
[1] 0.1589805
The result, r ≈ 0.16, shows a very weak positive correlation between the Richter scale measurement and deaths.
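This sketch writes out Pearson's r by hand, which is what `cor()` computes by default. It also shows Spearman's rank correlation. Spearman is a swapped-in technique, not used in the original analysis, that is less driven by extreme values such as Chimbote's 700 thousand deaths. Running `cor(earthc$richter, earthc$deaths, method = "spearman")` on the full data would be one way to check the result. The sample holds only the ten rows printed in the sorted table, with values as printed (some rounded), so its numbers will not match r = 0.159. `pearson_r()` is a hypothetical helper.

```r
# Ten (richter, deaths) pairs from the sorted table; deaths in thousands, as printed
richter <- c(7.9, 7.5, 7.8, 7.9, 7.3, 7.2, 7.6, 7.4, 7.8, 7.0)
deaths  <- c(700, 255, 200, 143, 110, 72, 40.9, 40, 32.7, 32.6)

# Pearson's r: sum of co-deviations divided by the product of the spreads
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

pearson_r(richter, deaths)                 # same value as cor(richter, deaths)
cor(richter, deaths, method = "spearman")  # Pearson's r computed on the ranks
```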
fit1 <- lm(richter ~ deaths, data = earthc)
summary(fit1)
Call:
lm(formula = richter ~ deaths, data = earthc)
Residuals:
Min 1Q Median 3Q Max
-1.4570 -0.3402 -0.0374 0.3586 2.3592
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.1382677 0.0675391 105.691 <2e-16 ***
deaths 0.0015592 0.0009069 1.719 0.0883 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7058 on 114 degrees of freedom
Multiple R-squared: 0.02527, Adjusted R-squared: 0.01672
F-statistic: 2.956 on 1 and 114 DF, p-value: 0.08827
Write the Regression Equation
The fitted regression equation is Richter = 0.0016 × Deaths + 7.14, where Deaths is measured in thousands.
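Plugging numbers into the fitted equation shows how small the effect is. This sketch copies the coefficients from summary(fit1). `predict_richter()` is a hypothetical helper. `predict(fit1, newdata = data.frame(deaths = c(1, 700)))` would give the same values from the model object itself.

```r
# Coefficients copied from summary(fit1); deaths are in thousands
intercept <- 7.1382677
slope     <- 0.0015592

# Hypothetical helper: predicted Richter measurement for a given death toll
predict_richter <- function(deaths_thousands) {
  intercept + slope * deaths_thousands
}

# 1 thousand deaths versus 700 thousand deaths (Chimbote, Peru, the largest toll)
predict_richter(c(1, 700))
```

Even the largest death toll in the data raises the predicted measurement by only about 1.1 above the intercept.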
Explain the P-Value and R-Squared Value for This Model
The p-value for deaths is 0.0883. It is flagged only with '.', which means it is significant at the 0.1 level but not at the conventional 0.05 level, so deaths is not a meaningful predictor of the Richter scale measurement in this model. The R-squared value of 0.0253 means that only about 2.5% of the variation in Richter scale measurements is explained by this model, while about 97.5% of the variation is left unexplained.
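The R-squared and p-value are tied directly to the correlation found earlier. In a simple linear regression, R-squared equals r², and the t statistic for the slope is t = r√(n − 2) / √(1 − r²). This sketch recomputes both from r = 0.1589805 and the n = 116 rows in earthc, using only base R's `pt()`.

```r
# Values reported by cor() and summary(fit1) above
r <- 0.1589805
n <- 116

r_squared <- r^2                               # about 0.0253 (Multiple R-squared)
t_stat    <- r * sqrt(n - 2) / sqrt(1 - r^2)   # about 1.72 (t value for deaths)
p_value   <- 2 * pt(-abs(t_stat), df = n - 2)  # about 0.088 (Pr(>|t|))

round(c(r_squared = r_squared, t = t_stat, p = p_value), 4)
```

These formulas are symmetric in the two variables. Fitting deaths ~ richter, the direction the research question asks about, would give the same R-squared, t, and p-value, with only the slope and intercept changing.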
Conclusion
I see very little evidence that the Richter scale measurement helps determine the number of deaths in a major earthquake. This data set records deaths for earthquakes measuring from 5.5 up to 9.5 on the Richter scale. If I had to guess, location and the strength of local infrastructure would be better predictors of the number of deaths. Looking into a few of the earthquakes with the highest death tolls, I also found other contributing factors, such as floods and landslides. Some of these figures are also disputed: other websites give different death tolls for some of the earthquakes. I would need to dig deeper to find more data that supports or corrects the numbers in this data set. I would also like additional variables that help explain why some earthquakes have higher death counts; with better data, a better answer might be possible. I would also have liked to add a photo of the damage for each point in my visualization, since a picture can really convey the damage and severity of an earthquake. Another interesting idea would be to add the geographic location of each earthquake and show locations and strengths on a map. I do know that the borders of the Pacific Ocean are a common area for earthquakes to occur.