Earthquake Analysis

  • To Study the behavior of earthquake around the world (year 2018) and perform statistical and exploratory analysis based on data from http://earthquake.usgs.gov
  • Different factors will be analyzed
  •       1. What regions have the most magnitude of earthquakes?
          2. How the trend changes with time?
          3. Correlation between different factors like magnitutude,depth, rms?
          4. Latitude and Longitde and the relatin between the depth of the earthquake
          5. Countries most affected by earthquake on year 2018

    Data Sources

  • API - The primary data will be obtained from http://earthquake.usgs.gov using webservice API (https://earthquake.usgs.gov/fdsnws/event/1/query?)
  • CSV - Some of the supporting data files are fetched from github
  • Approach and libraries

  • This project follows the data science workflow, the data is first fetched from API and from github location and then it got transformed in order to present in the form the dataframe or graphs
  • The following libraries are used
  •       ggmap - map the world
          RColorBrewer - color coding of graphs
          tidyverse - data cleansing and transformation
          knitr - table visvulizations
          ggplot2 - plot graphs
          GGally - Advanced graphs
          sqldf - SQL operations
          shiny - to build the shiny app
          leaflet - generating maps
          treemap - generating treemaps

    # Load libraries
    library(tidyverse)
    library(ggmap)
    library(RColorBrewer)
    library(knitr)
    library(kableExtra)
    library(ggplot2)
    library(GGally)
    library(sqldf)
    library(shiny)
    library(leaflet)
    library(RCurl)
    library(treemap)

    Analysis

    Data from API

    #Getting data from API
    e1 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-01-01&endtime=2018-02-20", sep = ",", stringsAsFactors = F)
    e2 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-02-20&endtime=2018-04-14", sep = ",", stringsAsFactors = F)
    e3 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-04-15&endtime=2018-05-30", sep = ",", stringsAsFactors = F)
    e4 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-05-31&endtime=2018-06-24", sep = ",", stringsAsFactors = F)
    e5 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-06-25&endtime=2018-07-15", sep = ",", stringsAsFactors = F)
    e6 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-07-16&endtime=2018-08-05", sep = ",", stringsAsFactors = F)
    e7 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-08-06&endtime=2018-09-15", sep = ",", stringsAsFactors = F)
    e8 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-09-16&endtime=2018-11-05", sep = ",", stringsAsFactors = F)
    e9 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-11-06&endtime=2018-12-06", sep = ",", stringsAsFactors = F)
    e10 <- read.csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-12-07&endtime=2018-12-31", sep = ",", stringsAsFactors = F)
    #earthquake <- read.csv("C:/Users/aisha/Dropbox/CUNY/Semester1/DATA607_Data_Acquisition_and_Management/Project/earthquake.csv", sep = ",", stringsAsFactors = F)
    
    #Binding data
    earthquake <- bind_rows("group 1" = e1, "group 2" = e2,"group 3" = e3,"group 4" = e4,"group 5" = e5,"group 6" = e6,"group 7" = e7,"group 8" = e8, "group 9" = e9, "group 10" = e10, .id = "groups") %>% mutate(mag = replace_na(mag, 0))
    
    #Getting data from github
    station_mapping <- read.csv("https://raw.githubusercontent.com/thasleem1/DATA607/master/country_mapping.csv", sep = ",", stringsAsFactors = F)
    
    #Transformating data
    quakes1 <- left_join(earthquake, station_mapping, by = c('locationSource')) %>% rename(lat = latitude) %>% rename(long = longitude) %>% mutate(mag = replace_na(mag, 0)) %>% mutate(stations = replace_na(stations, 0)) %>% rename(places = `place.y`)
    
    #Preview data
    kable(data.frame(head(quakes1))) %>%
      kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
      row_spec(0, bold = T, color = "white", background = "#ea7872") %>%
        scroll_box(width = "100%", height = "300px")
    groups time lat long depth mag magType nst gap dmin rms net id updated place.x type horizontalError depthError magError magNst status locationSource magSource places stations
    group 1 2018-02-19T23:59:53.130Z 33.48650 -116.4577 7.96 0.83 ml 18 145.00 0.04547 0.1000 ci ci38112256 2018-02-20T21:52:02.750Z 21km ESE of Anza, CA earthquake 0.22 0.47 0.112 10 reviewed ci ci California 2
    group 1 2018-02-19T23:43:53.644Z 38.44040 -118.2282 1.90 0.70 ml 4 190.87 0.05800 0.1057 nn nn00622355 2018-02-20T16:51:19.329Z 35km ESE of Hawthorne, Nevada earthquake NA 7.50 0.000 1 reviewed nn nn Nevada 6
    group 1 2018-02-19T23:43:26.250Z 33.47833 -116.4647 5.63 1.98 ml 82 28.00 0.05550 0.1900 ci ci38112248 2018-02-20T19:49:27.160Z 21km ESE of Anza, CA earthquake 0.14 0.37 0.123 66 reviewed ci ci California 2
    group 1 2018-02-19T23:43:03.330Z 56.18700 -148.6738 20.32 2.90 ml NA 239.00 2.67100 0.7200 us us2000d906 2018-05-22T02:00:18.040Z 288km SE of Kodiak, Alaska earthquake 10.50 11.50 0.052 48 reviewed us us Asia and Europe 4
    group 1 2018-02-19T23:42:41.270Z 33.48100 -116.4595 5.49 1.81 ml 54 77.00 0.05085 0.2300 ci ci38112240 2018-02-20T19:06:11.738Z 22km ESE of Anza, CA earthquake 0.25 0.67 0.208 26 reviewed ci ci California 2
    group 1 2018-02-19T23:40:30.180Z 33.47700 -116.4650 5.97 2.81 ml 115 25.00 0.05674 0.2100 ci ci38112232 2018-02-20T22:41:43.237Z 21km ESE of Anza, CA earthquake 0.11 0.34 0.159 125 reviewed ci ci California 2

    Magnitude

  • Magnitude is a number that characterizes the relative size of an earthquake, which is based on measurement of the maximum motion recorded by a seismograph
  • A Magnitude of 4.5 or higher is considered as dangerous may cause more damage
  • # 4.5
    earthquake45 <- subset(earthquake, earthquake$mag >= 4.5)
    earthquake45.sorted <- arrange(earthquake45, desc(mag))
    earthquake45.mod <- select(earthquake45.sorted, latitude, longitude, mag)
    inside <- filter(earthquake45.mod, between(longitude, -90, 90), between(latitude, -180, 180))
    earthquake45.mod <- setdiff(earthquake45.mod, inside)
    
    # all
    earthquake0 <- subset(earthquake, earthquake$mag > 0)
    
    earthquake0.sorted <- arrange(earthquake0, desc(mag))
    earthquake0.mod <- select(earthquake0.sorted, latitude, longitude, mag)
    inside <- filter(earthquake0.mod, between(longitude, -90, 90), between(latitude, -180, 180))
    earthquake0.mod <- setdiff(earthquake0.mod, inside)
    
    wmap <- borders("world", colour = "gray50", fill = "gray50")
    
    # mag >= 4.5$ 
    earthquake45_map <- ggplot() + wmap
    earthquake45_map <- earthquake45_map + geom_point(data = earthquake45.mod, aes(x = as.numeric(longitude),
        y = as.numeric(latitude), colour = mag)) + ggtitle("Earthquakes with Mag > 4.5 - Dangerous") +
        xlab("Longitude") + ylab("Latitude") +
      theme(plot.title = element_text(hjust = 0.5))
    myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
    sc1 <- scale_colour_gradientn(colours = myPalette(100), limits = c(4.5, 8))
    earthquake45_map + sc1

    # mag > 0 (All recorded seismic events)
    earthquake0_map <- ggplot() + wmap
    earthquake0_map <- earthquake0_map + geom_point(data = earthquake0.mod, aes(x = as.numeric(longitude), y = as.numeric(latitude),
        colour = mag)) + ggtitle("Earthquakes with Mag > 0 Seismic Events") +
        xlab("Longitude") + ylab("Latitude")+
      theme(plot.title = element_text(hjust = 0.5))
    myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
    sc2 <- scale_colour_gradientn(colours = myPalette(100), limits = c(0, 8))
    earthquake0_map + sc2

    Trend Analysis

    Let’s find out how the earthquake behaved on monthly basis based on region

    earthquake %>%
      group_by(date = as.Date(time)) %>%
      summarise(count = n())  %>%
      ggplot() +
      geom_point(mapping = aes(x = date, y = count, colour = "red")) +
      scale_x_date(name = 'Month', date_breaks = '1 month', date_labels = '%b') + labs(title = "2018 Earthquake - Month", y= "Number of Earthquake")+ theme(legend.position = "none") +
      theme(plot.title = element_text(hjust = 0.5))

    geo_zone <- function(lat) {
      
      levels <- c("tropics", "subtropics", "temperate", "frigid")
      regions <- factor(x = levels, levels = levels, ordered =T)
      
      if (abs(lat) < 23.5) {    result <- regions[1]  }
      else if (abs(lat) < 35) {    result <- regions[2]  }
      else if (abs(lat) < 66.5) {    result <- regions[3]  }
      else {    result <- regions[4]  }
      return(result)
    }
    
    earthquake$zone <-unlist(lapply(earthquake$latitude, geo_zone))
    
    zone1 <- earthquake %>%
      group_by(date = as.Date(time), zone = earthquake$zone) %>%
      summarise(count = n())
    
    ggplot(zone1, aes(x = date, y = count, color = zone1$zone)) +
      geom_point() +
      scale_x_date(name = 'date', date_breaks = '1 month', date_labels = '%b') + 
      scale_color_brewer(palette="Set1") + labs(title = "Impact on Earthquake by zone level", y= "Number of Earthquake", fill = "zone") +
      theme(plot.title = element_text(hjust = 0.5))
  • The observation shows that most of the earthquake happened between Jun to August on the year 2018
  • Temperate zone are most vulnerable to earthquake however for the summer time it is tropical region
  • Factor Analysis

    Some of the factors included as below
  • depth - Depth of the event in kilometers
  • mag - The magnitude for the event
  • rms - The root-mean-square (RMS) travel time residual, in sec, using all weights
  • nst - The total number of seismic stations used to determine earthquake location

  • df <- earthquake[, c("depth", "mag","rms","nst")]
    df <- df[complete.cases(df),]
    ggpairs(df) + labs(title = "2018 Earthquake - Corr.") +   theme(plot.title = element_text(hjust = 0.5))

    ggplot(data = quakes1, mapping = aes(x = lat, y = long)) +
      geom_point() + geom_point(alpha = 0.1, aes(color = stations))+ labs(title = "latitude vs Longitude") +   theme(plot.title = element_text(hjust = 0.5))

    ggplot(data = quakes1, mapping = aes(x = depth, y = lat)) +
      geom_point() + geom_point(alpha = 0.1, aes(color = stations))+ labs(title = "Depth vs Latitude") +   theme(plot.title = element_text(hjust = 0.5))

    ggplot(data = quakes1, mapping = aes(x = depth, y = long)) +
      geom_point() + geom_point(alpha = 0.1, aes(color = stations))+ labs(title = "Depth vs Longitude") +   theme(plot.title = element_text(hjust = 0.5))

  • The correlation graph shows, there no much correlation on factors impacting earthquake
  • (-100,25) to (-100,50) are the places where most stations captured the earthquakes
  • The depth, latitude and longitude of stations affect the magnitude of earthquake. But it can be seen that there are still variations related to it
  • Countries Impacted

    Treemap showing how different places are impacted by earthquake magnitude

    places <-  sqldf("select places,printf(\"%.2f\",avg(mag)) as avg_mag from quakes1 group by places order by avg(mag)")
    places$avg_mag <- as.numeric(places$avg_mag) 
    
    treemap(dtf = places,
            index=c("places"),
            vSize="avg_mag",
            vColor="avg_mag",
            palette="Spectral",
            type="value",
            border.col=c("grey70", "grey90"),
            fontsize.title = 18,
            algorithm="pivotSize",
            title ="Earthquakes by Places - Magnitude",
            title.legend="Magnitude")

    Models/Techniques

    Earthquake prediction can be split into two types:
    1. Statistical prediction (background seismic hazard) based on previous events and likely future recurrence - uses instrumental catalogue, archaeological record, geological (Quaternary) record
    2. Deterministic prediction (the place, magnitude and time of a future event from observation of earthquake precursors)

    Some of the Earthquake models are
  • SVR - Support Vector Regression
  • HNN - Hybrid Neural Networks

    There is NO stable model identified yet to predict the earthquakes,most of them are in research stage

  • Apart from Earthquake

    Interesting finds apart from earthquake as part of this analysis

    other <-  sqldf("select type,printf(\"%.2f\",avg(mag)) as avg_mag,count(type) as count from quakes1 where type <> \"earthquake\" and type <> \"other event\" group by type order by avg(mag),count(type)")
    other$avg_mag <- as.numeric(other$avg_mag)  
    
    ggplot(data=other, aes(x= reorder(type,avg_mag), y=avg_mag, fill=count)) +  geom_bar(stat="identity") + coord_flip() + geom_text(aes(label=avg_mag), vjust=0, colour = "red", fontface = "bold") + ggtitle("Other Events") + theme(plot.title = element_text(hjust = 0.15)) + theme(legend.position = "none") + xlab("Event") + ylab("Magnitude")

    Tectonics Plate

    Tectonic earthquakes occur at plate tectonic boundaries. Tectonic plates are constantly moving slowly, but sometimes friction between them causes them lock together and become unable to move. The rest of the plates carry on moving, which leads to increased pressure on the locked section. Eventually, the locked section succumbs to the pressure, and the plates move past each other rapidly. This movement causes a tectonic earthquake. The waves of released energy move through the Earth’s crust and cause the shaking we feel at an earthquake site
    Some study says that mountains are the results of tectonic earthquakes

    Challenges

    1. The data was taken from API, it has limitation of getting 20000 records at a time, so i have split the records by doing multiple API calls to bind the together
    2. Initially the shiny app was designed to fetch data directly from API, but due to limitation (20000 records), and interactive parameter changes in the shiny app will drastically slowdown the performance, then I have redesigned the shiny app to pull data from R dataframe instead of directly from API

    Conclusion

    1. Earthquakes are mostly occurring on temperate regions and high on summer season on tropical regions
    2. Asia is most targeted on year 2018
    3. (-100,25) to (-100,50) are the places where most stations captured the earthquakes
    4. There is no stable predictive model built for earthquake
    5. Climate Change and Tectonics plate movement may change the earthquake behavior