Earthquakes Data Set

Numeric Summary of The Data

  • First, we load the tidyverse, which provides the functions we use to transform and present the data.

    library(tidyverse)
    ## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ## ✔ dplyr     1.1.4     ✔ readr     2.1.5
    ## ✔ forcats   1.0.0     ✔ stringr   1.5.1
    ## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
    ## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
    ## ✔ purrr     1.0.2     
    ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ## ✖ dplyr::filter() masks stats::filter()
    ## ✖ dplyr::lag()    masks stats::lag()
    ## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
  • Second, we read the Earthquakes Data Set from its CSV file into memory.

quakes <- read_delim("./quakes.csv")
## Rows: 18334 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (8): magType, net, id, place, type, status, locationSource, magSource
## dbl  (12): latitude, longitude, depth, mag, nst, gap, dmin, rms, horizontalE...
## dttm  (2): time, updated
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  • We can now get a summary of the data that we have using the code below:

    summary(quakes[c("mag", "depth")])
    ##       mag            depth       
    ##  Min.   :5.000   Min.   : -1.01  
    ##  1st Qu.:5.100   1st Qu.: 10.00  
    ##  Median :5.200   Median : 13.63  
    ##  Mean   :5.341   Mean   : 53.02  
    ##  3rd Qu.:5.500   3rd Qu.: 45.53  
    ##  Max.   :8.300   Max.   :670.81
  • From the summary above we can note the following insights.

    • The maximum depth recorded for all earthquakes between 2013 and 2023 is 670.81 km and the minimum depth is -1.01 km. The negative minimum raises the question of whether the depth of an earthquake can legitimately be negative.

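    • The mean depth is 53.02 km, well above the median of 13.63 km and even above the third quartile of 45.53 km. This gap indicates that the depth distribution is strongly right-skewed: most earthquakes are shallow, and a small number of very deep events pull the mean upward.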

    • The summary above does not show which magnitude scales (the magType column) were used to measure the earthquakes, which would be relevant to us. We can list the unique values as follows:

    unique_mag_types <- quakes |>
      select(magType) |>
      distinct()
    unique_mag_types
    ## # A tibble: 13 × 1
    ##    magType
    ##    <chr>  
    ##  1 mb     
    ##  2 mwb    
    ##  3 mw     
    ##  4 mww    
    ##  5 mwr    
    ##  6 mwc    
    ##  7 ms     
    ##  8 ml     
    ##  9 Md     
    ## 10 Ml     
    ## 11 ms_20  
    ## 12 mwp    
    ## 13 Mi
  • We now have the types of magnitude scales used. However, we notice one problem: some labels differ only in letter case (for example, ml and Ml), so the same scale is treated as two different types, which is incorrect. We can modify the code above to convert the labels to lower case before taking the distinct values:

unique_mag_types <- quakes |>
  select(magType) |>
  mutate(magType = tolower(magType)) |>
  distinct()
unique_mag_types
## # A tibble: 12 × 1
##    magType
##    <chr>  
##  1 mb     
##  2 mwb    
##  3 mw     
##  4 mww    
##  5 mwr    
##  6 mwc    
##  7 ms     
##  8 ml     
##  9 md     
## 10 ms_20  
## 11 mwp    
## 12 mi
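
The two depth observations from the summary (a negative minimum, and a mean well above the median) can be checked directly. The sketch below is only an illustration: `toy_quakes` is a small made-up tibble standing in for `quakes`, and its values are assumptions, not records from the data set.

```r
library(dplyr)

# Hypothetical stand-in for `quakes`; values are invented for illustration.
toy_quakes <- tibble(
  mag   = c(5.0, 5.2, 5.1, 6.3, 5.5),
  depth = c(-1.0, 10.0, 13.5, 600.0, 35.0)
)

depth_check <- toy_quakes |>
  summarise(
    n_negative   = sum(depth < 0, na.rm = TRUE),   # records with a negative depth
    mean_depth   = mean(depth, na.rm = TRUE),      # pulled upward by deep events
    median_depth = median(depth, na.rm = TRUE)     # robust to a few deep events
  )
depth_check
```

Replacing `toy_quakes` with `quakes` would apply the same check to the real data, assuming the column is named `depth` as in the summary above.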

Novel Questions to Investigate

Novel Question 1

  • Is there a correlation between the depth at which an earthquake originates and its magnitude? Here we want to investigate whether earthquakes that originate deeper below the earth’s surface have more energy, or vice versa.

    • For this question, we can use the cor() function to evaluate the correlation between depth and magnitude within each magnitude type, as shown in the code snippet below:
quakes |>
  drop_na(mag, depth, magType) |>
  group_by(magType) |>
  summarise(
    # Use `=` (not `<-`) so the summary columns get clean names
    Correlation = cor(mag, depth),
    Count = n()
    )
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `Correlation = cor(mag, depth)`.
## ℹ In group 6: `magType = "ms"`.
## Caused by warning in `cor()`:
## ! the standard deviation is zero
## # A tibble: 13 × 3
##    magType Correlation Count
##    <chr>         <dbl> <int>
##  1 Md         NA           1
##  2 Mi          1           2
##  3 Ml         NA           1
##  4 mb          0.00838  8346
##  5 ml          0.143      55
##  6 ms         NA           3
##  7 ms_20      -0.533       3
##  8 mw          0.0748    158
##  9 mwb         0.183     593
## 10 mwc        -0.0647    228
## 11 mwp         0.139       6
## 12 mwr        -0.157     231
## 13 mww         0.127    8707
  • From the results above, we see that the groups with relatively high correlations (for example, Mi with 1 and ms_20 with -0.533) contain very few records, which makes those estimates unreliable. For that reason, we keep only the groups with more than 100 records. The code is adjusted as below:
quakes |>
  drop_na(mag, depth, magType) |>
  group_by(magType) |>
  filter(n() > 100) |>   # keep only groups with more than 100 records
  summarise(
    Correlation = cor(mag, depth),
    Count = n()
    )
## # A tibble: 6 × 3
##   magType Correlation Count
##   <chr>         <dbl> <int>
## 1 mb          0.00838  8346
## 2 mw          0.0748    158
## 3 mwb         0.183     593
## 4 mwc        -0.0647    228
## 5 mwr        -0.157     231
## 6 mww         0.127    8707
  • From the results above, we see that the strongest correlation is 0.183 (for mwb), approximately \(0.2\), which indicates only a weak correlation between depth and magnitude of the recorded earthquakes. We therefore conclude that there is little or no linear association between the two variables.
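
Because cor() computes Pearson's coefficient by default, the conclusion above concerns linear association only. One possible follow-up, sketched here under assumptions, is to compare it with a rank-based (Spearman) coefficient after merging magnitude-type labels that differ only by case. `toy_quakes` is a made-up stand-in for `quakes`, and its values are not taken from the data set.

```r
library(dplyr)

# Hypothetical stand-in for `quakes`; values are invented for illustration.
toy_quakes <- tibble(
  magType = c("mb", "mb", "mb", "mb", "Ml", "ml", "ml", "ml"),
  mag     = c(5.0, 5.1, 5.3, 5.6, 5.0, 5.2, 5.4, 5.9),
  depth   = c(10, 35, 20, 600, 12, 8, 15, 30)
)

rank_cor <- toy_quakes |>
  mutate(magType = tolower(magType)) |>   # merge labels that differ only by case
  group_by(magType) |>
  summarise(
    Pearson  = cor(mag, depth),                       # linear association
    Spearman = cor(mag, depth, method = "spearman"),  # monotonic association
    Count    = n()
  )
rank_cor
```

A Spearman coefficient is less sensitive to a few very deep events than Pearson's, which may matter given how skewed the depth values are; whether it changes the conclusion for the real data would need to be checked.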

Novel Question 2

  • Where are strong earthquakes (magnitude 6.0 or greater) located geographically?

# Filter for strong earthquakes
strong_quakes <- quakes |>
  filter(mag >= 6.0)

# Plotting the geographic distribution of strong earthquakes
ggplot(strong_quakes, aes(x = longitude, y = latitude)) +
  geom_point(aes(color = mag), alpha = 0.6) +
  labs(title = "Geographic Distribution of Strong Earthquakes",
       x = "Longitude", y = "Latitude") +
  scale_color_viridis_c() +  # This adds a color gradient based on magnitude
  theme_minimal()
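
Plotting longitude against latitude on plain Cartesian axes can distort distances. As a hedged variation on the plot above, the sketch below adds ggplot2's coord_quickmap(), which sets an approximate map aspect ratio. `toy_quakes` is a made-up stand-in for `quakes`; its coordinates and magnitudes are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `quakes`; coordinates and magnitudes are invented.
toy_quakes <- tibble(
  longitude = c(142.4, -72.9, 95.9, -178.2, 28.9),
  latitude  = c(38.3, -36.1, 3.3, -17.9, 39.1),
  mag       = c(7.1, 6.4, 5.6, 6.0, 5.2)
)

# Same threshold as the plot above
strong_toy <- toy_quakes |>
  filter(mag >= 6.0)

p <- ggplot(strong_toy, aes(x = longitude, y = latitude)) +
  geom_point(aes(color = mag), alpha = 0.6) +
  scale_color_viridis_c() +
  coord_quickmap() +   # approximate map aspect ratio for lon/lat data
  labs(x = "Longitude", y = "Latitude") +
  theme_minimal()
p
```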

Novel Question 3

  • Is there a relationship between the magnitude of an earthquake and the number of stations (nst) that recorded it?

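ggplot(quakes, aes(x = mag, y = nst)) +
  # alpha belongs to the point layer, and the decimal separator is a period
  geom_point(aes(color = mag), alpha = 0.6) +
  labs(
    title = "Relationship between Magnitude and Number of Stations",
    x = "Magnitude",
    y = "Number of Stations"
  ) +
  scale_color_viridis_c() +  # matches the color aesthetic used by the points
  theme_minimal()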
## Warning: Removed 14759 rows containing missing values or values outside the scale range
## (`geom_point()`).
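
The warning above reports 14759 removed rows, out of the 18334 rows read earlier, so the plot shows only a minority of the data. Assuming those rows were dropped because nst is missing, the count can be made explicit and the rows removed deliberately before plotting. The sketch below uses a made-up tibble, `toy_quakes`, as a stand-in for `quakes`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for `quakes`; values are invented for illustration.
toy_quakes <- tibble(
  mag = c(5.0, 5.3, 6.1, 5.7, 7.0),
  nst = c(NA, 40, NA, 85, 120)
)

# How many records have no station count?
n_missing <- sum(is.na(toy_quakes$nst))

# Drop incomplete rows explicitly so the plot raises no warning
plot_data <- toy_quakes |>
  drop_na(mag, nst)

p <- ggplot(plot_data, aes(x = mag, y = nst)) +
  geom_point(aes(color = mag), alpha = 0.6) +
  scale_color_viridis_c() +
  theme_minimal()
p
```

Reporting `n_missing` alongside the plot makes it clear how much of the data the figure actually represents.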

Conclusion

We explored the Earthquakes Data Set and obtained key numeric summaries of magnitude and depth, cleaned up the magnitude-type labels, measured the correlation between depth and magnitude within each magnitude type, and plotted the geographic distribution of strong earthquakes and the relationship between magnitude and number of stations. Overall, this was really interesting. However, even more insights can be derived by exploring the data further.